Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken trace trees with tracing_instrumentations=request #11294

Closed
1 task done
ybasket opened this issue Jul 26, 2023 · 6 comments · Fixed by #11468 or #11484
Closed
1 task done

Broken trace trees with tracing_instrumentations=request #11294

ybasket opened this issue Jul 26, 2023 · 6 comments · Fixed by #11468 or #11484

Comments

@ybasket
Copy link

ybasket commented Jul 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.3.0

Current Behavior

With tracing_instrumentations=request, Kong will only report the request span via the OpenTelemetry plugin. The problem hereby is that it will also inject an invalid parent span ID (maybe the one of the balancer span?) into the balancer request, leading to broken trace graphs like this:

Screenshot 2023-07-26 at 10 43 57

With tracing_instrumentations=all, the span tree is not broken:

Screenshot 2023-07-26 at 11 02 58

Expected Behavior

With tracing_instrumentations=request (and any other limited set), Kong should inject the span ID of the request span as parent span ID into the balancer request so that the trace tree is not broken.

Steps To Reproduce

  1. Set up OpenTelemetry plugin with Jaeger export (other collectors should show same problem)
  2. Send request to an instrumented service
  3. Check trace in Jaeger

Anything else?

If I'm not mistaken, this was working correctly on earlier versions of Kong 3.x and broke with the update to 3.3

@samugi
Copy link
Member

samugi commented Aug 7, 2023

@ybasket thank you for reporting this, it does sound like a bug indeed.
Just to confirm: does tracing_instrumentations=request,balancer address the broken traces problem for you?

@ybasket
Copy link
Author

ybasket commented Aug 7, 2023

@ybasket thank you for reporting this, it does sound like a bug indeed.
Just to confirm: does tracing_instrumentations=request,balancer address the broken traces problem for you?

I tested with tracing_instrumentations=all (which on this test created additional spans for the OTEL plugin and the balancer) and it worked. I'm unfortunately not able to test the exact tracing_instrumentations=request,balancer in that setup soon, but from what I've seen, I would expect it to fix the issue.

@ybasket
Copy link
Author

ybasket commented Aug 11, 2023

@samugi I had a chance to test now and confirm that tracing_instrumentations=request,balancer does indeed fix the issue:
Screenshot 2023-08-11 at 10 53 17

Interestingly and visible in the screenshot, the balancer span regularly exceeds its parent span, seems as the parent is closed too early (before the balancer span finishes).

@samugi
Copy link
Member

samugi commented Aug 11, 2023

@ybasket thank you for confirming that. The behavior you describe is due to the information regarding the root span end time being reported with millisecond precision, while the balancer span is more precise, so the rounding gap is visible and should never exceed 1 ms. Definitely something that can be improved as well though.

@ybasket
Copy link
Author

ybasket commented Aug 11, 2023

@ybasket thank you for confirming that. The behavior you describe is due to the information regarding the root span end time being reported with millisecond precision, while the balancer span is more precise, so the rounding gap is visible and should never exceed 1 ms. Definitely something that can be improved as well though.

Interesting. Maybe the root span's end could be reported with "ceil" on the rounding, so that it reports the millisecond after, not before? It would be as inaccurate as the current behaviour, but avoid the warning in Jaeger and alike. But I don't know any implementation details, so maybe this is nonsense 🤷

@samugi
Copy link
Member

samugi commented Aug 28, 2023

@ybasket the linked PR will take care of the original "broken trace" issue reported here, I am treating the precision issue separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants