SCIONLab next version initial update #158

Merged 32 commits into upstream on Mar 8, 2024
Conversation

juagargi
Member

@juagargi juagargi commented Jan 19, 2024

Following instructions on https://gitlab.inf.ethz.ch/PRV-PERRIG/scionlab/scionlab-docs/-/wikis/Biweekly-SCIONLab-Update

Notes:

TODO:



matzf and others added 30 commits October 30, 2023 16:51
…to#4424)

Bump google.golang.org/grpc from 1.57.0 to 1.57.2 due to a
security patch.
Skipping 1.57.1, because that contained a bug.

From vulnerability alert "gRPC-Go HTTP/2 Rapid Reset vulnerability":

> Impact
>
> In affected releases of gRPC-Go, it is possible for an attacker to send
> HTTP/2 requests, cancel them, and send subsequent requests, which is
> valid by the HTTP/2 protocol, but would cause the gRPC-Go server to
> launch more concurrent method handlers than the configured maximum
> stream limit.
>
> Patches
>
> This vulnerability was addressed by #6703 and has been included in patch
> releases: 1.56.3, 1.57.1, 1.58.3. It is also included in the latest
> release, 1.59.0.
>
> Along with applying the patch, users should also ensure they are using
> the grpc.MaxConcurrentStreams server option to apply a limit to the
> server's resources used for any single connection.
Improve metrics for router packet processing:

* Classify packets by size bucket: log2(size)
* Classify packets by traffic type: incoming, outgoing, as-transit-in/out, br-transit.
* Add process-level on-cpu and sleep time accounting
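The log2(size) bucketing mentioned above can be sketched as follows. This is illustrative only; the router's actual label scheme may differ:

```go
package main

import (
	"fmt"
	"math/bits"
)

// sizeBucket returns the power-of-two bucket for a packet size, e.g. sizes
// 64..127 all map to bucket 64. Bucketing by log2(size) keeps the metric
// label cardinality small. (Illustrative sketch.)
func sizeBucket(size int) int {
	if size <= 0 {
		return 0
	}
	return 1 << (bits.Len(uint(size)) - 1)
}

func main() {
	for _, s := range []int{40, 64, 100, 128, 1500} {
		fmt.Printf("size=%d bucket=%d\n", s, sizeBucket(s))
	}
	// Prints buckets 32, 64, 64, 128, 1024 respectively.
}
```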
Refactor the remaining targets that depended on @cgrindel_bazel_starlib//updatesrc.
The repository has now been completely migrated to @aspect_bazel_lib//lib:write_source_files.

During the refactor, we dropped the functionality to write the source files in
rules_openapi. Instead, the caller side can decide how to write the source
files. This makes the rules more composable and easier to use. The functionality
has been moved to a macro in private/mgmtapi/api.bzl, which is used inside of
scionproto/scion.
Drop the `go_embed_data` custom rules, those were deprecated for a long time.
Drop most of the custom `rules_openapi` bazel rules.
Users are recommended to use their own invocations of the JS tools.
The interface label value of every metric instance had been lost to a refactoring in scionproto#4422.
The oapi-codegen libraries went through a reorganization; the
repositories for the oapi-codegen build tool was separated from the
runtime libraries.
In our normal code dependencies, the oapi-codegen build tool no longer
shows up. Import it in a dummy .go file as an easy way to ensure that we
have all the appropriate dependencies (via the `go.mod`/`go_deps.bzl`).
This leaves some clutter in the `go.mod` file which is not aesthetically
pleasing, but is not a real issue due to dependency graph 
pruning.
The alternative solution would have been to manage the dependencies of
the `rules_openapi` bazel rules explicitly in a `rules_openapi_dependencies` 
macro. This turns out to be rather cumbersome, and can lead to
unexpected or even invalid dependency version combinations, because
`rules_go` does not have a way to separate the dependencies of the build
tools from the dependencies of the code.

Aside; fix `cfg = "exec"` for the oapi-codegen buildtool so that this
tool is built for the environment of the builder, not the target
platform.
Simplify the usage of the various docker-compose configurations by including the project name in the configuration file. This has been supported for a while now in docker-compose v2. This allows dropping the `-p`/`--project-name` from all `docker-compose` incantations.

Also, streamline the docker-compose files generated by the topogen scripts; remove the explicit `container_name` configurations and drop all the explicit `scion_` prefixes -- managing these prefixes is docker-compose's job. 
Silence the warnings about "SCION_EXPERIMENTAL_... variable is not set. Defaulting to a blank string." by using the variable expansion syntax to explicitly default to a blank string.

Also, drop the `docker compose` `--compatibility` flag. The compatibility flag affects what word separator is used in the
container name and it has long been deprecated. We don't usually rely on specific container names, so this should be
fine. In some exceptional cases where specific container names are more practical (the containers for bazel-remote-cache and go-module-proxy, due to special casing in the CI scripts), `container_name` is set explicitly.
`--always` will substitute an abbreviated commit hash if there's no tag in sight. Better than failing with a cryptic message.
…cionproto#4437)

* Enhance the end2end_integration programs as follows:
  * end2end_integration has a slightly finer-grain role-based pair selection capability.
  * end2end_integration outputs an extra result line that provides precise start/stop timestamps for the whole test.
* process_metrics exposes the number of cores used by Go.
* Add end2endblast: similar to end2end but it plays a game other than pingpong: packetblast. It keeps logging and tracing to a minimum. The server side responds to very few of the pings (1/256), just enough to prove that the circuit works.
* Add an integration test called router_benchmark:
  * relies on an especially crafted topology: router_bm.topo
  * uses end2end_integration with command end2endblast over that topology as a load test for the router
  * extracts and logs relevant performance metrics after the test
  * compares performance with a set of expectations if executed by CI.

As a benchmark program, this is only a first approximation. Extending the end2end suite provided a quick solution but isn't ideal: it uses too many cores at once due to requiring 5 routers actively running for one of the tests. More work is in progress to replace this with some leaner setup where only one router is actually running.
Update the docstring to follow the semantics of negative means undefined.
The `supervisorctl shutdown` command does not wait for the supervisor
or its child processes to terminate.
This could occasionally lead to situations where SCION processes were
still running after the `scion.sh stop` command returned. In the CI
system, which immediately proceeds to bundle up the log files, this led
to `tar` reporting an error that the log file was still being written
to.
Fixed by invoking `supervisorctl stop all`, which does block, to
terminate all child processes before `shutdown`.
Bazel and bazelisk do support ARM64 by now. Lift the restriction on AMD64 and, while we're at it, update bazelisk to the latest version.
The default target remains unchanged but is renamed to `build-dev`.
The target `build` now only builds the main SCION services.
When copying metric labels, we assumed that `append` always
creates a copy of the label array. This is not necessarily the case:
two metrics may end up sharing the same underlying array of labels,
so a change to one of them also overwrites the labels of the other.
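The aliasing pitfall described above is easy to reproduce with plain slices. A minimal sketch of the bug and the copy-based fix:

```go
package main

import "fmt"

func main() {
	base := make([]string, 0, 4)
	base = append(base, "iface")

	// Both appends fit within base's spare capacity, so a and b share
	// the same underlying array: the second append clobbers the first.
	a := append(base, "metric_a")
	b := append(base, "metric_b")
	fmt.Println(a[1], b[1]) // metric_b metric_b — a[1] was overwritten

	// Fix: copy into a fresh slice so each label set owns its array.
	c := append(append([]string{}, base...), "metric_c")
	d := append(append([]string{}, base...), "metric_d")
	fmt.Println(c[1], d[1]) // metric_c metric_d
}
```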
…scionproto#4453)

I found a bug in the SCION daemon. If you create a SCION daemon connection and call one of the drkey endpoints without first setting the configuration value "drkey_level2_db.Connection" to something meaningful, the SCION daemon will encounter an error like the following and crash:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xb82d33]
```
This bug can easily be reproduced by running a code snippet like this for a SCION daemon that has no drkey configured:
```go
sd, _ := daemon.NewService(daemonAddr).Connect(ctx)
sd.DRKeyGetASHostKey(ctx, drkey.ASHostMeta{})
```
The problem is that in the file "daemon/internal/servers/grpc.go", when the drkey endpoints are called, DaemonServer.DRKeyClient is nil if it is not configured as described above.

Fixed by explicitly checking whether DRKey is available and erroring out if it is not.
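The fix amounts to a nil guard before the client is used. A minimal sketch, with illustrative names standing in for the real types in daemon/internal/servers/grpc.go:

```go
package main

import (
	"errors"
	"fmt"
)

type drkeyClient struct{} // stand-in for the daemon's DRKey client

type daemonServer struct {
	DRKeyClient *drkeyClient // nil when drkey_level2_db is not configured
}

// getASHostKey sketches the fix: error out instead of dereferencing nil.
// (Names are illustrative, not the actual handler signature.)
func (s *daemonServer) getASHostKey() error {
	if s.DRKeyClient == nil {
		return errors.New("DRKey is not available: no drkey_level2_db configured")
	}
	return nil // would delegate to s.DRKeyClient here
}

func main() {
	s := &daemonServer{} // DRKey not configured
	fmt.Println(s.getASHostKey())
}
```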
log.FromCtx could panic if a custom Logger was used and a span was
present in the context. This commit allows the custom logger to
implement the `WithOptions(...zap.Option) Logger` method so that the
CallerSkip can still be applied. If the logger can't be cast to
anything suitable, the caller skip is not applied, but we also don't
panic anymore.
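The mechanism can be sketched as a comma-ok type assertion on an optional capability interface. The names below (`optionLogger`, `WithCallerSkip`) are illustrative stand-ins so the sketch does not depend on zap; the real method is `WithOptions(...zap.Option) Logger`:

```go
package main

import "fmt"

// Logger is a stand-in for the package's logger interface.
type Logger interface{ Infof(format string, args ...any) }

// optionLogger is the optional capability a custom Logger may implement
// so a caller-skip adjustment can still be applied.
type optionLogger interface {
	Logger
	WithCallerSkip(skip int) Logger
}

type plainLogger struct{}

func (plainLogger) Infof(format string, args ...any) {}

// withSkip applies the caller skip when the logger supports it, and
// otherwise returns the logger unchanged instead of panicking.
func withSkip(l Logger, skip int) Logger {
	if ol, ok := l.(optionLogger); ok {
		return ol.WithCallerSkip(skip)
	}
	return l
}

func main() {
	l := withSkip(plainLogger{}, 1) // unsupported: returned as-is, no panic
	fmt.Printf("%T\n", l)
}
```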
…nproto#4444)

This is an alternative to router_benchmark and is meant to eventually replace it.
Its main feature is that it only runs a single router. As a result, its measured performance is far less dependent on the number of cores available on the test system.

It is heavily inspired by the braccept test, in that:
* it relies on the test harness to create an isolated network and veth interfaces to connect the test driver to the router.
* it manufactures packets that are fed directly into the router's interfaces and it optionally captures packets for verification in the same manner.
* it assumes a custom topology and cannot be usefully run on any other topology.

It differs from braccept in that:
* It does far less verification of the forwarded packets (it's braccept's job to do that).
* It is considerably simplified, mainly because of the lighter verifications.
* It relies on a simple address-assignment scheme so code maintainers don't have to keep tens of addresses in mind.
* It does not assume that the router runs in a container network; it could also talk to a real router if given the names and MAC addresses of the relevant interfaces. The test harness (or the user) is responsible for supplying the interface names and MAC addresses to the test driver.
* It does not require the test harness (or the user) to have any understanding of the topology (only to configure the router with it). The specifics of the captive network to be configured are supplied by the test driver.
* The test harness doesn't need the "pause" container anymore.

The similarity between this and braccept means that, in the future, we could re-converge them. Notably, the benefits of the simple addressing scheme would make the braccept test cases easier to maintain or expand.

Collateral: the required Go version was bumped to 1.21.

Fixes scionproto#4442
…to#4454)

This test has been superseded by the router_newbenchmark test.
One component is preserved for its usefulness as an experiment/debugging
tool: the end2endblast component of the end2end suite.
…to#4455)

Since the original "router_benchmark" has been deleted, the "new" prefix is
no longer necessary.
…scionproto#4456)

In the router: compute a better default for the number of processors based on multiple observations. Given enough streams to occupy all processors, performance is degraded if there are as many processors as cores; best performance is obtained by setting aside two cores for other tasks. Making this the default simplifies the benchmarking process by not having to configure an override for each type of target machine being tested.
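The heuristic described here (cores minus two, floored at one) can be sketched as follows. This is illustrative; the router's actual computation may differ:

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultProcessors leaves two cores for other tasks, but never
// goes below one packet processor. (Illustrative sketch.)
func defaultProcessors(numCores int) int {
	p := numCores - 2
	if p < 1 {
		p = 1
	}
	return p
}

func main() {
	fmt.Println(defaultProcessors(runtime.NumCPU()))
	fmt.Println(defaultProcessors(8)) // 6
	fmt.Println(defaultProcessors(2)) // 1
}
```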

In the benchmark: run the test with many streams. This reflects reality and allows a very parallel router to perform at its best without distorting the performance of a less-parallel one. Make sure that the number of streams is likely to be a multiple of the number of cores, so the cores are loaded evenly in spite of there being typically fewer streams than in real life. (TBD: learn the router's internal processor count and just use that.)

As a side effect:
* Adjust CI's expectations for the benchmark
* Add automatic retries (2) for failed CI tests.
* Allow retrying successful CI tests (helps investigating performance variations).
* Add cpu_info to the router_benchmark output (same reason).
…cionproto#4457)

* testing: improved stream synthesis in router benchmark

Changes:
* Update the flowID of packets (and a couple of checksums) on the fly (just before
  sending) as opposed to pre-constructing a different packet for each stream.
* As a result, the test cases are simplified; they no longer need to provide one
  packet per flowID.
* In passing, deleted the dumping of /proc/cpuinfo in the log. That has served its purpose.
* In the test harness, left in the form of comments, a suggestion of core partitioning. Not
  sure how to apply that in general. This is what provides the best results on my machine.

* testing: simplify the checksum updates out of the blaster.

Also enable core-partitioning to see how it behaves on the CI system.

* testing: Add check for proper saturation in the router benchmark.

In passing: fixed a couple of function names (camel → snake).

* testing: fix expectation for droppage in the router benchmark

Was expressed in pkts instead of %.

* testing: cosmetic change to the output of expected drop ratio.

* testing: linter friendship

* testing: include a cpu selection in the router benchmark

Also: fixed the broken prom query string.

* testing: please the linter.

* testing: log lscpus on the CI system.

* testing: fix bad string formatting.

* testing: Improve core selection in the router benchmark.

Added an L2 cache criterion. The test harness will prefer configurations where either the cores share no L2 cache or all cores share the same one.

* testing: make the router benchmark test harness compatible with more variants
of lscpu (hopefully).

* testing: log the CPU serial number in the router benchmark.

This is done as an attempt at detecting live migration.

* build: added dmidecode as a required package.

* build: Complete the list of files that affect the list of packages to install.

Three files were missing, most notably the pkgs.txt files, which are typically the ones
modified when adding a dependency. As a result, new packages weren't getting installed.

* build: Delete use of dmidecode.

This proved useless as a means of spotting migrations. The information doesn't
even change from instance to instance.

* build: fix trailing blank line.

* router: reduce the benchmarks expectations for saturation.

After fixing an arithmetic error, it seems we are not able to reliably cause the
router to drop more than 1% of the packets (and sometimes not even that).
Until we figure out how to reach higher levels, making sure that other PRs
aren't blocked.

* testing: fixed bugs in router_benchmark core selection code.

* testing: update the expectations of the router benchmark.

After switching to non-overbooked aws instances, performance has become much better
and repeatable.
Build debian packages for amd64, arm64, i386 and armel.
- There are separate packages for router, control, daemon, dispatcher,
gateway and tools (scion and scion-pki).
- The packages include systemd unit files to run the services. For the
daemon, dispatcher and gateway one instance per host is supported by the
systemd service, and a default configuration file is included. For
router and control, multiple instances per host are supported, and as a
consequence of this, no default configuration file is provided.
- Currently, there is no man page contained in the packages. We should
be able to build these from our existing manuals, but it seems to
require a significant amount of fiddling to get something useful.

Building the .deb packages uses bazel with `rules_pkg`.
The target `//dist:deb_all` cross-builds packages for the default set of
target platforms. Alternatively, the target `//dist:deb` allows building
all packages for the current target platform. This current platform
can be set with the `--platforms` bazel option (see
https://github.com/bazelbuild/rules_go#how-do-i-cross-compile for more
details).
- To increase reuse of build results while cross-building, some internal
targets related to openapi forcibly ignore the target platform.
- The package version is based on the current git tag. `rules_pkg` can
include this version _in_ the package metadata, but bazel _cannot_ spit
out appropriately named package files. This is addressed by copying and
renaming the package files after build in a make target `make dist-deb`.

Add installation documentation for the packages and the systemd units.
As a side effect, slightly reorganize the build documentation, trying to
give a simpler path to just building the binaries without installing the
entire development setup.
)

Dissecting TLV options crashes the Wireshark plugin if the `data` field
is empty. This always happens for Pad1 options, and for PadN options if
`data_len` is 0.
Documenting that we've bumped the buildkite instance type from t3 to
m6i.
The main reason for the switch is that the "burstable" nature of the
t*-instances makes them extremely unsuitable for running benchmarks. The
m6i are also much faster, but not much more expensive.
Also, the agent pool is now at most 6 agents. Spinning up 10 agents and
having to warm up all the caches, and then scaling back in after a
single build, seemed excessive.
Make the segment signer pluggable. This decouples the beacon extender
from the trust signer implementation and allows plugging in different
implementations.
ETags can be used so as not to transfer large prefix sets over and over
again.

ETags may or may not be supported by a particular gateway
implementation. This change is fully forward and backward compatible.
Implementations that do not support ETags should just ignore the field.
* Add Carbon Intensity Information to beacons.
First crude version, just to allow adding _some_ numbers.
Note: the code paths are not merged with the bandwidth ones, although
carbon intensity is currently processed identically, because changes to
the representation of the carbon intensity information are to be expected.

* Add Carbon Intensity to combinator, snet.Path and showpaths

* Resolved conflict in daemon protobuf file by incrementing field
  number. Recreated protobuf generated file.
* Resolved conflicts in other places, due to the diff not matching the original,
  by adding the carbon intensity field in the data
  structures/initializations again to the upstream ones.
@juagargi juagargi changed the base branch from scionlab to upstream February 15, 2024 19:01

@tzaeschke tzaeschke left a comment


LGTM, modulo the fact that my review was quite superficial. I hope it is still more helpful than not.


lint-doc: lint-doc-mdlint

lint-doc-mdlint:
$(info ==> $@)
@FILES=$$(find -type f -iname '*.md' -not -path "./rules_openapi/tools/node_modules/*" -not -path "./.github/**/*" | grep -vf tools/md/skipped); \
@FILES=$$(find -type f -iname '*.md' -not -path "./private/mgmtapi/tools/node_modules/*" -not -path "./.github/**/*" | grep -vf tools/md/skipped); \


Not exactly sure what this does, but the path ./private/mgmtapi/tools/node_modules/* doesn't seem to exist. Is that expected?

// |
// AS3 (br3) ---+
//
// We're only executing and monitoring br1a. All the others are a fiction (except for the knowledge


"a fiction" -> "fictional" ?

//
// The metrics we add serve to estimate the available cpu time; that is, the amount of CPU that the
// scheduler granted to the process, independently of whether it ran with it or slept on it. The
// goal is to measure what the overall performance of the process would be if it was never


"measure" -> "estimate"?

// preempted. For example, this could be expressed as some_output/available_cpu_time.
//
// At a given time, a given thread is either running, runnable, or sleeping. When running it
// consumes exactly one core. When runnable, it is being deprived of exactly one core (because Go


"exactly one core" -> I guess this ignores shared caches and other resources that slow down neighbor cores, sometimes considerably; in effect one core "consumes" more than just its share of the machine? Also, I guess a process may trigger CPU operations that are still going on even if the process is preempted (in effect the process is using machine time without using core time)?

Anyway, depending on what the measurement is used for, these effects can probably be ignored.

carbonIntensity.Inter = make(map[common.IFIDType]uint64)
for ifid := range as.IFIDs {
if ifid != outIF {
carbonIntensity.Intra[common.IFIDType(ifid)] = g.CarbonIntensity(ifid, outIF)


So if ifid != outIf && isPeer() == true, the carbon intensity is added to both intra and inter?

@@ -40,6 +40,8 @@ message StaticInfoExtension {
LatencyInfo latency = 1;
// Approximate, maximum bandwidth for paths based on this ASEntry.
BandwidthInfo bandwidth = 2;
// Aproximate carbon intensity.


"Aproximate" -> "Approximate"

@@ -0,0 +1,326 @@
// Copyright 2023 SCION Association


Maybe in future (not as part of this PR) we could consider a more compact (and machine readable) license header: https://spdx.dev/learn/handling-license-info/
E.g. one liner: /* SPDX-License-Identifier: Apache-2.0 */

@juagargi juagargi marked this pull request as ready for review February 29, 2024 09:05
@juagargi juagargi requested a review from matzf as a code owner February 29, 2024 09:05
@juagargi juagargi changed the title Draft: SCIONLab next version initial update SCIONLab next version initial update Feb 29, 2024
Member Author

@juagargi juagargi left a comment


:LGTM:

Reviewable status: 0 of 220 files reviewed, 7 unresolved discussions / 0 of 1 LGTMs obtained / 0 of 1 approvals obtained


Makefile line 149 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

Not exactly sure what this does, but the path ./private/mgmtapi/tools/node_modules/* doesn't seem to exist. Is that expected?

It seems that master upstream fails for the lint make target. Can you ask upstream?


acceptance/router_benchmark/brload/main.go line 1 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

Maybe in future (not as part of this PR) we could consider a more compact (and machine readable) license header: https://spdx.dev/learn/handling-license-info/
E.g. one liner: /* SPDX-License-Identifier: Apache-2.0 */

Can you open an issue upstream?


acceptance/router_benchmark/cases/topo.go line 36 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

"a fiction" -> "fictional" ?

Upstream?


pkg/private/processmetrics/processmetrics_linux.go line 21 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

"measure" -> "estimate"?

Upstream?


pkg/private/processmetrics/processmetrics_linux.go line 25 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

"exactly one core" -> I guess this ignores shared caches and other resources that slow down neighbor cores, sometimes considerably; in effect one core "consumes" more than just its share of the machine? Also, I guess a process may trigger CPU operations that are still going on even if the process is preempted (in effect the process is using machine time without using core time)?

Anyway, depending on what the measurement is used for, these effects can probably be ignored.

Upstream?


pkg/private/xtest/graph/graph.go line 654 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

So if ifid != outIf && isPeer() == true, the carbon intensity is added to both intra and inter?

Yes. But the outIF will never be that of a peering link, because PCBs never exit through peering links.


proto/control_plane/v1/seg_extensions.proto line 43 at r1 (raw file):

Previously, tzaeschke (Tilmann) wrote…

"Aproximate" -> "Approximate"

done.


@tzaeschke tzaeschke left a comment


Reviewed 219 of 220 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @matzf)

@juagargi juagargi merged commit 941a27e into upstream Mar 8, 2024
3 checks passed
@juagargi juagargi deleted the scionlab_nextversion branch March 8, 2024 17:57