Conntrack latency measurement recipe #376

Closed
wants to merge 6 commits

Conversation

enhaut
Member

@enhaut enhaut commented Sep 2, 2024

Description

Still WIP, opening just to get some feedback.

This MR adds support for a "latency on conntrack cache miss" recipe, which is supposed to measure latency during conntrack cache misses. That is done by measuring the latency of a data transfer over a newly opened TCP connection (uncached) and comparing it to samples from a cached connection. The difference between the cached and uncached transfer is, most of the time, an order of magnitude.
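To illustrate the idea (this is not code from the MR): a minimal sketch that measures the round-trip latency of the first transfer over a freshly opened TCP connection and compares it to later transfers over the same, already-tracked connection. `HOST`, `PORT`, and the echo service are hypothetical placeholders.

```
# Minimal sketch of the idea behind the recipe, not the actual LNST code:
# the first data transfer over a brand new TCP connection is "uncached" from
# conntrack's point of view, later transfers over the same connection should
# hit the cached entry. HOST/PORT point to a hypothetical echo service on the DUT.
import socket
import time

HOST, PORT = "192.0.2.1", 5001  # hypothetical DUT echo service

def round_trip(sock: socket.socket) -> float:
    """Send one byte, wait for the echo, return the elapsed time in seconds."""
    start = time.perf_counter()
    sock.sendall(b"x")
    sock.recv(1)
    return time.perf_counter() - start

with socket.create_connection((HOST, PORT)) as s:
    uncached = round_trip(s)                      # first transfer on a new connection
    cached = [round_trip(s) for _ in range(100)]  # samples over the cached entry

print(f"uncached: {uncached:.6f}s, cached avg: {sum(cached)/len(cached):.6f}s")
```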

Tests

Results.merge_with() test script
from lnst.RecipeCommon.Perf.Results import SequentialPerfResult, ParallelPerfResult, PerfInterval

# no recursion branch test
multi_parallel = ParallelPerfResult()
sequential0 = SequentialPerfResult()
sequential1 = SequentialPerfResult()

for j in range(1):
   sequential0.append(PerfInterval(j, 0.1, "s", (j)*0.1+100))
   sequential1.append(PerfInterval(j+100, 0.1, "s", (j)*0.1+100))
multi_parallel.append(sequential0)
multi_parallel.append(sequential1)


multi_parallel2 = ParallelPerfResult()
sequentiall0 = SequentialPerfResult()
sequentiall1 = SequentialPerfResult()
for j in range(1):
   sequentiall0.append(PerfInterval(j, 0.1, "s", (j)*0.1+100))
   sequentiall1.append(PerfInterval(j+100, 0.1, "s", (j)*0.1+100))
multi_parallel2.append(sequentiall0)
multi_parallel2.append(sequentiall1)

r = multi_parallel.merge_with(multi_parallel2)

assert type(r) == type(multi_parallel)
assert type(r[0]) == type(multi_parallel[0])
assert sequential0[0] in r[0]
assert sequentiall0[0] in r[0]

assert sequential1[0] in r[1]
assert sequentiall1[0] in r[1]


# recursion branch test
seq_container0 = SequentialPerfResult()
seq_container0.append(sequential0)
seq_container1 = SequentialPerfResult()
seq_container1.append(sequential1)

seq_container2 = SequentialPerfResult()
seq_container2.append(sequentiall0)
seq_container3 = SequentialPerfResult()
seq_container3.append(sequentiall1)

multi_parallel = ParallelPerfResult()
multi_parallel.append(seq_container0)
multi_parallel.append(seq_container1)

multi_parallel2 = ParallelPerfResult()
multi_parallel2.append(seq_container2)
multi_parallel2.append(seq_container3)

r = multi_parallel.merge_with(multi_parallel2)
print(r)

assert type(r) == type(multi_parallel)
assert type(r[0]) == type(multi_parallel[0])
assert type(r[0][0]) == type(multi_parallel[0][0])

assert sequential0[0] in r[0][0]
assert sequentiall0[0] in r[0][0]

assert sequential1[0] in r[1][0]
assert sequentiall1[0] in r[1][0]

Just a split of methods into multiple smaller ones, so they can be
reused by other parts of LNST as well.
The method slices raw samples by index.
The method `PerfList.merge_with()` merges two `PerfList` objects with
the same structure.
E.g. the following results container:

```
ParallelPerfResult(
  SequentialPerfResult(
    PerfInterval(value=1)
  ),
  SequentialPerfResult(
    PerfInterval(value=2)
  )
)
```

merged with itself will result in:

```
ParallelPerfResult(
  SequentialPerfResult(
    PerfInterval(value=1),
    PerfInterval(value=1)
  ),
  SequentialPerfResult(
    PerfInterval(value=2),
    PerfInterval(value=2)
  )
)
```

It simply merges all the `PerfList` layers, all the way down to the `PerfInterval`s.
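Roughly, the merge behaves like the following sketch, written with plain Python lists instead of the `PerfList` subclasses (the real `merge_with()` preserves the concrete container types):

```
# Simplified illustration of the merge behavior described above, using plain
# lists; the real merge_with() keeps SequentialPerfResult/ParallelPerfResult types.
def merge_lists(a, b):
    if a and isinstance(a[0], list):
        # both structures still contain nested containers -> recurse pairwise
        return [merge_lists(x, y) for x, y in zip(a, b)]
    # innermost layer reached -> concatenate the samples
    return a + b

assert merge_lists([[1], [2]], [[1], [2]]) == [[1, 1], [2, 2]]
```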
Currently the only way to save scalar samples/results is
to use PerfInterval and {Sequential,Parallel}PerfResult.
However, these are meant to store vector/multidimensional
data (value and duration), and all the calculations
they do expect vector data.
Added support for measuring latency (in the background) during
a (flow) test.

The latency is measured over a single long-lived TCP connection
that gathers `latency_packets_count - 1` samples after the measurement
starts. It then runs the `cache_poison_tool` function, which is supposed
to somehow poison the cache, and then the last sample is gathered.

`LatencyMeasurementResults` then distinguishes between uncached
and cached latency, which refer to the first and last samples
and the middle samples, respectively.
The problem this is trying to solve is separating samples
that were gathered while the DUT was able to cache everything it needed
from samples where the DUT hit lots of cache misses during connection
handling.
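In pseudocode, the flow described above could look roughly like this; `collect_sample()` and `cache_poison_tool()` stand in for the real tools and are only illustrative:

```
# Rough sketch of the measurement flow described in the commit message;
# collect_sample() and cache_poison_tool() are placeholders, not LNST APIs.
def measure(latency_packets_count, cache_poison_tool, collect_sample):
    samples = []
    # gather latency_packets_count - 1 samples over the long-lived connection
    for _ in range(latency_packets_count - 1):
        samples.append(collect_sample())
    # poison the cache (e.g. flush the conntrack table on the DUT) ...
    cache_poison_tool()
    # ... and gather the last sample, which should hit the cache-miss path
    samples.append(collect_sample())
    # first and last samples are "uncached", the middle ones are "cached"
    uncached = [samples[0], samples[-1]]
    cached = samples[1:-1]
    return uncached, cached
```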

sq latency measurement results
This is supposed to test the latency of conntrack during a cache miss

sq ct latency on cache miss recipe
@@ -171,6 +171,25 @@ def time_slice(self, start, end):
        )
        return result

    def samples_slice(self, slicer: callable):
Collaborator

I still think this probably shouldn't be part of the generic Perf.Results.PerfList class; instead you probably just want this functionality as a helper in the tests that use the specific type of slicing that you have here.

E.g. with some setups you could have:

SequentialPerfResult
    SequentialPerfResult: [PerfInterval, ...]
    SequentialPerfResult: [PerfInterval, ...]

which would mean that you ran 2 repetitions of a stream test, for example... with the slice you would cut each repetition to a shorter one,

but

ParallelPerfResult
    SequentialPerfResult: [PerfInterval, ...]
    SequentialPerfResult: [PerfInterval, ...]

has a completely different meaning, since now the streams are parallel.

Basically I'm not sure whether having this method work recursively wouldn't lead to confusing situations depending on how you organize the recursive hierarchy of PerfList type objects... so it may be more relevant to instead have a "specific" helper function in a place which is informed about the hierarchy it is working with, and where we can write specific enough documentation that the use case is understood.
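For example (an illustrative sketch, not code from this MR), such a helper written against the two-level hierarchy described above could look like:

```
# Hypothetical test helper: slice each stream of a known two-level hierarchy
# (a ParallelPerfResult of SequentialPerfResults of PerfIntervals) by index.
from lnst.RecipeCommon.Perf.Results import SequentialPerfResult, ParallelPerfResult

def slice_stream_samples(parallel_result, slicer):
    """Apply `slicer` (e.g. lambda samples: samples[1:-1]) to each stream."""
    sliced = ParallelPerfResult()
    for stream in parallel_result:
        new_stream = SequentialPerfResult()
        for sample in slicer(list(stream)):
            new_stream.append(sample)
        sliced.append(new_stream)
    return sliced
```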

@@ -155,6 +155,26 @@ def __setitem__(self, i, item):

        super(PerfList, self).__setitem__(i, item)

    def merge_with(self, iterable):
Collaborator

And this, again, I would have as a separate helper method somewhere else.

Maybe at some point, when these methods are proven to be generic, we could collect them into some "common" module that acts on PerfList objects, but I don't think this should be included in the PerfList class itself.

class ParallelScalarResult(ParallelPerfResult):
    @property
    def average(self):
        samples_count = sum([len(i) for i in self])
Collaborator

This won't work if:

ParallelScalarResult([ScalarSample, ...])

as ScalarSample doesn't support `len()`.
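One way to address this (a sketch of the reviewer's point; `_count_samples` is a hypothetical helper, not part of the MR) would be to count nested containers by their length and bare samples as 1; the `average` property could then use `samples_count = _count_samples(self)` instead of the `len()`-only sum.

```
# Hypothetical helper: nested PerfList containers contribute their length,
# bare ScalarSample items (which have no len()) count as a single sample.
from lnst.RecipeCommon.Perf.Results import PerfList

def _count_samples(items):
    return sum(len(i) if isinstance(i, PerfList) else 1 for i in items)
```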

@@ -40,6 +40,43 @@ def end_timestamp(self):
    def time_slice(self, start, end):
        raise NotImplementedError()

class ScalarSample(PerfResult):
Collaborator

Not sure about this implementation considering it's still using duration and has time slicing... at the very least this should be refactored so that the common PerfInterval and ScalarSample property getters are in some common class...

from lnst.Controller.RecipeResults import ResultType


class LatencyMeasurement(BaseFlowMeasurement):
Collaborator

We discussed at the tech meeting that this could maybe work as a standard measurement which simply measures latency at a regular interval and is combined with an additional measurement that is "more primary" in the overall PerfRecipeConfiguration, which has a "10 second" quiet period and then executes the poisoning. That way you will get the following hierarchy:

1. cpu measurement: [....]
2. cpu measurement: [....]
3. latency measurement:   [10, 1, 1, 1, 1, 1, 100, 100, ...]
4. poisoning measurement: [0 , 0, 0, 0, 0, 100, 100, ...]

and afterwards you can postprocess these results to get:

1. cpu measurement: [....]
2. cpu measurement: [....]
3. latency measurement - start:    [10]
4. latency measurement - middle:   [1, 1, 1, 1, 1]
5. latency measurement - poisoned: [100, 100, ...]

and evaluate each of these individually.
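A sketch of that postprocessing step (assuming the latency and poisoning series are sampled at the same interval so they align index by index, which is an assumption, not something defined in this MR):

```
# Split the raw latency series into start / middle / poisoned parts based on
# when the parallel "poisoning" measurement first reports activity.
def split_latency(latency, poisoning):
    # first index at which the poisoning measurement reports non-zero activity
    poison_start = next(
        (i for i, p in enumerate(poisoning) if p != 0), len(latency)
    )
    start = latency[:1]               # first sample: connection setup
    middle = latency[1:poison_start]  # steady state, cache hits
    poisoned = latency[poison_start:] # after the cache was poisoned
    return start, middle, poisoned

# e.g. split_latency([10, 1, 1, 1, 1, 1, 100, 100],
#                    [ 0, 0, 0, 0, 0, 0, 100, 100])
# -> ([10], [1, 1, 1, 1, 1], [100, 100])
```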

@enhaut
Member Author

enhaut commented Oct 18, 2024

Closing this, we internally agreed we actually don't have a customer use case for this test.

@enhaut enhaut closed this Oct 18, 2024