Optimize `Scannable#name()` and related logic #3901

Stephan202 · 2024-10-03T10:34:52Z

Performance is improved in two ways:

By invoking Scannable#stepName() only when Attr.NAME is not
explicitly set.
By optimizing Traces#extractOperatorAssemblyInformationParts, to
which several Scannable#stepName() implementations delegate.

The Scannable#name() logic may be executed many times, e.g. if a hot
code path uses {Mono,Flux}#log or Micrometer instrumentation. The
added benchmark shows that for large stack traces, the new Traces
implementation is several orders of magnitude more efficient in terms of
compute and memory resource utilization.

Deferral of invocation of Scannable#stepName() assumes that said
method does not have side-effects. This is true for all built-in
implementations.

While there, improve two existing benchmarks by utilizing the black hole
to which benchmark method return values are implicitly sent.

This logic may be executed many times, e.g. if a hot code path uses `{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark shows that for large stack traces the new implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization. While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.

Stephan202

Added some comments with context. This PR relates to #3900.

Stephan202 · 2024-10-03T10:35:28Z

benchmarks/src/main/java/reactor/core/publisher/MonoAllBenchmark.java

+	public Boolean measureThroughput() {
+		return Flux.range(0, rangeSize)


Here and in MonoCallableBenchmark: as stated, these are unrelated improvements. Can be pulled into a separate PR if desired.

Stephan202 · 2024-10-03T10:38:41Z

benchmarks/src/main/java/reactor/core/publisher/TracesBenchmark.java

+@BenchmarkMode({Mode.AverageTime})
+@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
+@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
+@Fork(value = 1)
+@OutputTimeUnit(TimeUnit.NANOSECONDS)
+@State(Scope.Benchmark)
+public class TracesBenchmark {


The configuration and setup for this benchmark match those of the existing benchmarks, including explicit specification of default parameters. Can be cleaned up a bit more if desired.

As for the impact of this PR, the following shows the before- and after output of ./gradlew jmh --include="TracesBenchmark" --profilers="gc":

Before:

Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units TracesBenchmark.measureThroughput 0 0 avgt 5 93.082 ± 5.671 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 8607.367 ± 515.275 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 840.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 117.000 counts TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 153.000 ms TracesBenchmark.measureThroughput 0 10 avgt 5 398.660 ± 16.663 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 10 avgt 5 7081.158 ± 294.949 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 10 avgt 5 2960.001 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 10 avgt 5 96.000 counts TracesBenchmark.measureThroughput:gc.time 0 10 avgt 5 124.000 ms TracesBenchmark.measureThroughput 0 100 avgt 5 2999.075 ± 89.655 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 100 avgt 5 7364.662 ± 220.010 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 100 avgt 5 23160.019 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 100 avgt 5 100.000 counts TracesBenchmark.measureThroughput:gc.time 0 100 avgt 5 131.000 ms TracesBenchmark.measureThroughput 0 1000 avgt 5 27916.269 ± 1601.038 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 1000 avgt 5 7751.034 ± 442.501 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1000 avgt 5 226865.228 ± 0.070 B/op TracesBenchmark.measureThroughput:gc.count 0 1000 avgt 5 105.000 counts TracesBenchmark.measureThroughput:gc.time 0 1000 avgt 5 133.000 ms TracesBenchmark.measureThroughput 10 0 avgt 5 402.076 ± 11.693 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 0 avgt 5 7476.144 ± 216.947 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 0 avgt 5 3152.001 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 0 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 10 0 avgt 5 131.000 ms TracesBenchmark.measureThroughput 10 10 avgt 5 626.244 ± 31.574 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 10 avgt 5 8053.582 ± 403.074 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 10 avgt 5 5288.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 10 avgt 5 109.000 counts TracesBenchmark.measureThroughput:gc.time 10 10 avgt 5 140.000 ms TracesBenchmark.measureThroughput 10 100 avgt 5 3331.927 ± 253.038 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 100 avgt 5 7608.535 ± 564.012 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 100 avgt 5 26576.019 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 100 avgt 5 104.000 counts TracesBenchmark.measureThroughput:gc.time 10 100 avgt 5 137.000 ms TracesBenchmark.measureThroughput 10 1000 avgt 5 27321.302 ± 1180.729 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 1000 avgt 5 7992.724 ± 345.188 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 1000 avgt 5 228969.037 ± 0.044 B/op TracesBenchmark.measureThroughput:gc.count 10 1000 avgt 5 109.000 counts TracesBenchmark.measureThroughput:gc.time 10 1000 avgt 5 140.000 ms TracesBenchmark.measureThroughput 100 0 avgt 5 3119.034 ± 70.765 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 0 avgt 5 7555.751 ± 170.859 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 0 avgt 5 24712.018 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 0 avgt 5 102.000 counts TracesBenchmark.measureThroughput:gc.time 100 0 avgt 5 131.000 ms TracesBenchmark.measureThroughput 100 10 avgt 5 3384.598 ± 134.383 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 10 avgt 5 7894.374 ± 310.075 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 10 avgt 5 28016.020 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 10 avgt 5 107.000 counts TracesBenchmark.measureThroughput:gc.time 100 10 avgt 5 141.000 ms TracesBenchmark.measureThroughput 100 100 avgt 5 5920.906 ± 192.896 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 100 avgt 5 7678.427 ± 250.022 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 100 avgt 5 47672.060 ± 0.002 B/op TracesBenchmark.measureThroughput:gc.count 100 100 avgt 5 104.000 counts TracesBenchmark.measureThroughput:gc.time 100 100 avgt 5 141.000 ms TracesBenchmark.measureThroughput 100 1000 avgt 5 28805.156 ± 1329.846 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 1000 avgt 5 8211.734 ± 378.784 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 1000 avgt 5 248017.124 ± 0.052 B/op TracesBenchmark.measureThroughput:gc.count 100 1000 avgt 5 111.000 counts TracesBenchmark.measureThroughput:gc.time 100 1000 avgt 5 147.000 ms TracesBenchmark.measureThroughput 1000 0 avgt 5 27765.904 ± 1098.007 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 0 avgt 5 8341.364 ± 333.465 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 0 avgt 5 242849.078 ± 0.042 B/op TracesBenchmark.measureThroughput:gc.count 1000 0 avgt 5 113.000 counts TracesBenchmark.measureThroughput:gc.time 1000 0 avgt 5 145.000 ms TracesBenchmark.measureThroughput 1000 10 avgt 5 28772.385 ± 1965.006 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 10 avgt 5 8115.825 ± 550.707 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 10 avgt 5 244809.121 ± 0.076 B/op TracesBenchmark.measureThroughput:gc.count 1000 10 avgt 5 111.000 counts TracesBenchmark.measureThroughput:gc.time 1000 10 avgt 5 145.000 ms TracesBenchmark.measureThroughput 1000 100 avgt 5 31094.765 ± 1220.979 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 100 avgt 5 8049.444 ± 315.635 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 100 avgt 5 262450.032 ± 0.080 B/op TracesBenchmark.measureThroughput:gc.count 1000 100 avgt 5 109.000 counts TracesBenchmark.measureThroughput:gc.time 1000 100 avgt 5 142.000 ms TracesBenchmark.measureThroughput 1000 1000 avgt 5 55008.078 ± 3723.012 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 1000 avgt 5 8252.861 ± 557.526 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 1000 avgt 5 475940.138 ± 0.280 B/op TracesBenchmark.measureThroughput:gc.count 1000 1000 avgt 5 112.000 counts TracesBenchmark.measureThroughput:gc.time 1000 1000 avgt 5 148.000 ms

After

Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units TracesBenchmark.measureThroughput 0 0 avgt 5 28.283 ± 2.319 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 12142.383 ± 996.110 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 360.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 165.000 counts TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 218.000 ms TracesBenchmark.measureThroughput 0 10 avgt 5 48.606 ± 4.176 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 10 avgt 5 11305.217 ± 943.994 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 10 avgt 5 576.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 10 avgt 5 154.000 counts TracesBenchmark.measureThroughput:gc.time 0 10 avgt 5 211.000 ms TracesBenchmark.measureThroughput 0 100 avgt 5 47.248 ± 2.156 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 100 avgt 5 11626.888 ± 524.566 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 100 avgt 5 576.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 100 avgt 5 158.000 counts TracesBenchmark.measureThroughput:gc.time 0 100 avgt 5 204.000 ms TracesBenchmark.measureThroughput 0 1000 avgt 5 47.641 ± 4.503 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 0 1000 avgt 5 11535.133 ± 1068.897 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1000 avgt 5 576.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 0 1000 avgt 5 157.000 counts TracesBenchmark.measureThroughput:gc.time 0 1000 avgt 5 199.000 ms TracesBenchmark.measureThroughput 10 0 avgt 5 47.941 ± 1.134 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 0 avgt 5 10821.406 ± 255.277 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 0 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 0 avgt 5 147.000 counts TracesBenchmark.measureThroughput:gc.time 10 0 avgt 5 182.000 ms TracesBenchmark.measureThroughput 10 10 avgt 5 48.681 ± 3.065 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 10 avgt 5 10658.850 ± 672.502 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 10 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 10 avgt 5 145.000 counts TracesBenchmark.measureThroughput:gc.time 10 10 avgt 5 189.000 ms TracesBenchmark.measureThroughput 10 100 avgt 5 48.309 ± 3.576 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 100 avgt 5 10741.652 ± 778.802 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 100 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 100 avgt 5 147.000 counts TracesBenchmark.measureThroughput:gc.time 10 100 avgt 5 202.000 ms TracesBenchmark.measureThroughput 10 1000 avgt 5 46.582 ± 2.587 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 10 1000 avgt 5 11138.595 ± 611.256 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 1000 avgt 5 544.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 10 1000 avgt 5 152.000 counts TracesBenchmark.measureThroughput:gc.time 10 1000 avgt 5 202.000 ms TracesBenchmark.measureThroughput 100 0 avgt 5 46.127 ± 2.642 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 0 avgt 5 11248.700 ± 638.495 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 0 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 0 avgt 5 153.000 counts TracesBenchmark.measureThroughput:gc.time 100 0 avgt 5 201.000 ms TracesBenchmark.measureThroughput 100 10 avgt 5 46.186 ± 2.518 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 10 avgt 5 11234.035 ± 610.123 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 10 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 10 avgt 5 153.000 counts TracesBenchmark.measureThroughput:gc.time 100 10 avgt 5 199.000 ms TracesBenchmark.measureThroughput 100 100 avgt 5 48.736 ± 1.891 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 100 avgt 5 10645.285 ± 416.594 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 100 avgt 5 544.000 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 100 avgt 5 145.000 counts TracesBenchmark.measureThroughput:gc.time 100 100 avgt 5 185.000 ms TracesBenchmark.measureThroughput 100 1000 avgt 5 46.312 ± 2.365 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 100 1000 avgt 5 11203.368 ± 573.569 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 1000 avgt 5 544.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 100 1000 avgt 5 152.000 counts TracesBenchmark.measureThroughput:gc.time 100 1000 avgt 5 198.000 ms TracesBenchmark.measureThroughput 1000 0 avgt 5 46.600 ± 2.220 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 0 avgt 5 11133.818 ± 527.892 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 0 avgt 5 544.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1000 0 avgt 5 152.000 counts TracesBenchmark.measureThroughput:gc.time 1000 0 avgt 5 201.000 ms TracesBenchmark.measureThroughput 1000 10 avgt 5 45.805 ± 1.806 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 10 avgt 5 11326.617 ± 446.037 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 10 avgt 5 544.002 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1000 10 avgt 5 154.000 counts TracesBenchmark.measureThroughput:gc.time 1000 10 avgt 5 206.000 ms TracesBenchmark.measureThroughput 1000 100 avgt 5 48.450 ± 2.778 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 100 avgt 5 10709.195 ± 616.791 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 100 avgt 5 544.003 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1000 100 avgt 5 145.000 counts TracesBenchmark.measureThroughput:gc.time 1000 100 avgt 5 191.000 ms TracesBenchmark.measureThroughput 1000 1000 avgt 5 51.759 ± 2.299 ns/op TracesBenchmark.measureThroughput:gc.alloc.rate 1000 1000 avgt 5 10023.884 ± 443.024 MB/sec TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 1000 avgt 5 544.004 ± 0.001 B/op TracesBenchmark.measureThroughput:gc.count 1000 1000 avgt 5 136.000 counts TracesBenchmark.measureThroughput:gc.time 1000 1000 avgt 5 174.000 ms

Note how performance of the new implementation is almost independent of the input, while the old implementation scales fairly poorly.

Stephan202 · 2024-10-03T10:44:11Z