Remove tetragon_msg_op_total metric #2856

Draft: lambdanis wants to merge 1 commit into main from pr/lambdanis/observer-metrics-op

Conversation

lambdanis
Contributor

@lambdanis lambdanis commented Sep 1, 2024

tetragon_msg_op_total counted events per opcode in the ring buffer queue. It wasn't particularly useful, because other metrics already expose similar numbers:

  • tetragon_bpf_missed_events_total counting missed events per opcode in BPF
  • tetragon_observer_ringbuf_queue_events_received_total counting total events received in the ring buffer queue
  • tetragon_events_total counting events per event type in gRPC

If needed, we can later add an opcode label to the metrics that count events in the observer:

  • tetragon_observer_ringbuf_events_received_total
  • tetragon_observer_ringbuf_queue_events_received_total
  • tetragon_observer_ringbuf_queue_events_lost_total
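To illustrate what an opcode label would add, here is a minimal sketch in plain Python. It is not Tetragon code: the `LabeledCounter` class is a toy stand-in for a labeled Prometheus counter, and the opcode values are hypothetical. It shows that a labeled counter keeps one series per opcode, and that summing over the label recovers today's unlabeled total.

```python
from collections import Counter


class LabeledCounter:
    """Toy stand-in for a Prometheus counter with a single 'opcode' label."""

    def __init__(self, name):
        self.name = name
        self._series = Counter()  # one series per opcode label value

    def inc(self, opcode, amount=1):
        # Each distinct opcode gets its own monotonically increasing series.
        self._series[opcode] += amount

    def value(self, opcode):
        # An opcode that was never observed reads as 0.
        return self._series[opcode]

    def total(self):
        # Summing over the label recovers the unlabeled total.
        return sum(self._series.values())


# Hypothetical opcode values, used only for illustration.
received = LabeledCounter("tetragon_observer_ringbuf_queue_events_received_total")
for opcode in [5, 5, 7, 13, 5]:
    received.inc(opcode)
```

The trade-off is cardinality: every opcode becomes a separate time series, which is only worth it if per-opcode numbers are actually needed at that stage.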

We could also add a metric counting all events (not only missed ones) per opcode in BPF. However, it's unclear whether per-opcode counts would be useful: the ring buffer and the events queue shouldn't discriminate between event types, so total counts of successful and missed events at each stage should be enough to troubleshoot capacity issues. There is still a per-event-type counter at the last stage, for monitoring overall data volume.
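The capacity argument can be sketched as follows, assuming each stage exposes a received total and a lost total. The stage names and sample numbers below are hypothetical and only illustrate the calculation; they are not measurements from Tetragon.

```python
def loss_ratio(received, lost):
    """Fraction of events dropped at a stage, out of all events that reached it."""
    seen = received + lost
    return lost / seen if seen else 0.0


# Hypothetical counter samples per stage: (received, lost).
stages = {
    "ringbuf": (10_000, 0),
    "ringbuf_queue": (9_500, 500),
}

# Per-stage totals are enough to find where events are being dropped,
# without knowing which opcodes they belong to.
ratios = {stage: loss_ratio(r, l) for stage, (r, l) in stages.items()}
bottleneck = max(ratios, key=ratios.get)
```

Here the stage with the highest loss ratio is the one that needs more capacity, regardless of the event types involved.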

@lambdanis lambdanis added area/metrics Related to prometheus metrics kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note/minor This PR introduces a minor user-visible change labels Sep 1, 2024
@lambdanis lambdanis requested review from mtardy and a team as code owners September 1, 2024 18:29

netlify bot commented Sep 1, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 74212a7
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/66d4b28d56f49e0008bc0344
😎 Deploy Preview https://deploy-preview-2856--tetragon.netlify.app

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
@lambdanis lambdanis force-pushed the pr/lambdanis/observer-metrics-op branch from 74212a7 to 7791130 Compare September 1, 2024 18:32
@lambdanis lambdanis marked this pull request as draft September 2, 2024 08:19
@lambdanis
Contributor Author

Converting to draft as I need to rethink how these metrics should be used.
