
fix: Propagate context from async producer #538

Merged · 18 commits into open-telemetry:main · Jul 14, 2023

Conversation

@plantfansam (Contributor) commented on Jun 30, 2023

I'd like to marinate on this one for a couple days, but I think that any "... send" spans generated from the do_loop method of ruby-kafka's AsyncProducer will be root spans. This is because we do not explicitly pass context between AsyncProducer#perform and Producer#perform. To demonstrate, consider the traces generated by this code, excerpted from ruby_kafka.rb:

```ruby
# Add message to producer buffer
async_producer = kafka.async_producer(delivery_threshold: 1, delivery_interval: 0.001)

tracer.in_span("hello async") do |span|
  async_producer.produce('async example', topic: topic, headers: { foo: :bar }, create_time: Time.now)
end

producer.deliver_messages

# Process messages
count = 0
kafka.each_message(topic: topic) do |message|
  count += 1
  break if count == 2 # we only generate 2 messages
end
```

Traces for async message production without the new patch

"Hello async" parent span has no child spans

[screenshot]

"... send" span is parent of consumer span, but does not point to "Hello async" as its parent

[screenshot]

Traces for async message production with the new patch

[screenshot]

Background

Some background on how AsyncProducer works (AFAICT): the AsyncProducer has a background thread that watches an instance variable called @queue. #produce pushes messages onto @queue, and the background thread pops them off and handles them (i.e. calls @producer.produce).
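
Here is a minimal sketch of that pattern as I understand it (illustrative only; the class, method, and variable names are condensed stand-ins, not the actual ruby-kafka source):

```ruby
# Condensed stand-in for ruby-kafka's AsyncProducer: a queue plus a worker
# thread that replays enqueued operations against a synchronous producer.
class SketchAsyncProducer
  def initialize(sync_producer)
    @queue = Queue.new
    @producer = sync_producer
    @worker = Thread.new { do_loop } # background thread watching @queue
  end

  # Called on the application thread; only enqueues the message.
  def produce(value, **options)
    @queue << [:produce, value, options]
  end

  def shutdown
    @queue << [:shutdown]
    @worker.join
  end

  private

  # Runs on the background thread, so any span started from here has no
  # relationship to the span that was active when #produce was called.
  def do_loop
    loop do
      operation, value, options = @queue.pop
      case operation
      when :produce  then @producer.produce(value, **options)
      when :shutdown then break
      end
    end
  end
end
```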

Strategy

This PR attempts to get the correct parent on "... send" spans by:

  1. Injecting the current trace context into headers during AsyncProducer#produce
  2. Trying to extract the trace context from those headers in Producer#produce and using it as the parent for the "... send" span (we only do this if the current span is INVALID, indicating that Producer#produce is running in the context of do_loop). A rough sketch of both steps follows this list.
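
Hypothetical code for those two steps (not the exact diff in this PR; the module names, keyword arguments, and the `tracer` helper are stand-ins):

```ruby
require "opentelemetry/sdk"

module AsyncProducerPatch
  # Step 1: capture the caller's context in the message headers before the
  # message crosses the thread boundary via @queue.
  def produce(value, topic:, headers: {}, **options)
    OpenTelemetry.propagation.inject(headers)
    super(value, topic: topic, headers: headers, **options)
  end
end

module ProducerPatch
  # Step 2: on the background thread there is no active span, so rebuild the
  # parent context from the headers injected in step 1.
  def produce(value, topic:, headers: {}, **options)
    ctx =
      if OpenTelemetry::Trace.current_span == OpenTelemetry::Trace::Span::INVALID
        # Running inside do_loop: no active span, so trust the headers.
        OpenTelemetry.propagation.extract(headers)
      else
        OpenTelemetry::Context.current
      end

    OpenTelemetry::Context.with_current(ctx) do
      tracer.in_span("#{topic} send", kind: :producer) do
        super(value, topic: topic, headers: headers, **options)
      end
    end
  end

  private

  # Stand-in for the tracer the instrumentation would normally provide.
  def tracer
    OpenTelemetry.tracer_provider.tracer("sketch-ruby-kafka")
  end
end
```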

We could, alternatively, patch AsyncProducer#produce to push the current OpenTelemetry context onto @queue when it's adding messages to it, then pop that off the @queue in do_loop, but then we'd have to override the method signature for perform, which I don't really want to do. It almost feels appropriate to use a propagator to pass context between these two methods, but maybe that's Friday afternoon talking.
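
For what it's worth, that alternative might look roughly like this (illustrative only, building on the condensed SketchAsyncProducer above; not what this PR implements):

```ruby
# Carry the caller's context through @queue itself instead of through the
# message headers.
class SketchAsyncProducer
  def produce(value, **options)
    # Enqueue the current OpenTelemetry context alongside the message.
    @queue << [:produce, value, options, OpenTelemetry::Context.current]
  end

  private

  def do_loop
    loop do
      operation, value, options, ctx = @queue.pop
      case operation
      when :produce
        # Restore the caller's context before producing, so the "... send"
        # span picks up the right parent.
        OpenTelemetry::Context.with_current(ctx) do
          @producer.produce(value, **options)
        end
      when :shutdown
        break
      end
    end
  end
end
```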

Thoughts? Feelings? Warnings?

@arielvalentin (Collaborator) commented:

@plantfansam do you have a sample app and/or screenshots of this in action?

@plantfansam (Contributor, Author) commented:

@arielvalentin I'll get some screenshots and/or try to set up an example app today.

@plantfansam (Contributor, Author) commented:

@arielvalentin I updated the PR with a docker-compose.yml with instructions and added some screenshots/better explanation to PR description! Let me know if anything's confusing...

@arielvalentin (Collaborator) commented:

@plantfansam we have a docker-compose file in the top level directory.

Is there a reason not to add the example directory there? https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/docker-compose.yml

Review thread on this excerpt from the patch diff:

```ruby
OpenTelemetry.propagation.inject(headers)
super
# If context is unset, try to inject headers injected by async producer
ctx = if OpenTelemetry::Trace.current_span == OpenTelemetry::Trace::Span::INVALID
```

A Collaborator commented:

🤔 would checking if this is a `recording?` span be more appropriate?

@plantfansam (Contributor, Author) commented on Jul 7, 2023:

I think so: 00cd9e8.

Now the logic is: if there's already a recording span when we produce, use that as the context; otherwise, try to extract context from the headers.
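
In code, that decision is roughly the following (a sketch of the intent rather than the literal diff in 00cd9e8; `context_for_send_span` is a hypothetical helper name):

```ruby
require "opentelemetry/sdk"

# Prefer a live recording span; otherwise fall back to whatever context
# AsyncProducer#produce injected into the `headers` carrier.
def context_for_send_span(headers)
  if OpenTelemetry::Trace.current_span.recording?
    OpenTelemetry::Context.current
  else
    OpenTelemetry.propagation.extract(headers)
  end
end
```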

There's maybe an edge case where the active span when #produce runs has been sampled out, but produce is being called with headers: containing a trace context that indicates a sampled-in parent. Should we (info-level) log a warning in that circumstance?

@robbkidd (Member) commented:

> Should we (info-level) log a warning in that circumstance?

What would the info log message say?

@plantfansam (Contributor, Author) commented:

Sorry, I totally missed this @robbkidd! I've changed the code a bit since I made this comment so I don't think a log is necessary...

@plantfansam (Contributor, Author) commented:

> Is there a reason not to add the example directory there? https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/docker-compose.yml

Not conceptually, no, I just can't validate it with docker-compose (long story). podman-compose doesn't let me run `podman-compose run` against services that have a depends_on: block, for whatever reason. I think the config would look like this:

```yaml
  ex-instrumentation-ruby_kafka:
    <<: *base
    working_dir: /app/instrumentation/ruby_kafka/example
    depends_on:
      - kafka
      - jaeger
    # ...
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686"
      - "4318:4318" # open up this port for OTLP export
```

@arielvalentin (Collaborator) commented:

> Not conceptually, no, I just can't validate it with docker-compose (long story). podman-compose doesn't let me run `podman-compose run` against services that have a depends_on: block, for whatever reason.

Does that mean you are not able to use the other examples with podman?

@plantfansam (Contributor, Author) commented:

> Does that mean you are not able to use the other examples with podman?

Yeah, when run from the root of the repo, that is the case (e.g. podman-compose run ex-instrumentation-bunny). podman-compose up works fine, so running instrumentation specs that rely on dependent services is no big deal.

@arielvalentin (Collaborator) commented:

Ok that's good to know. Would you mind writing up an issue for that?

@arielvalentin (Collaborator) commented:

Calling @robbkidd for exploratory testing!

@robbkidd self-assigned this on Jul 9, 2023
@plantfansam (Contributor, Author) commented:

I'd like to add a few more tests to this PR this morning to ensure that all args are passed properly from async_producer#produce to producer#produce across all supported Ruby versions.

@robbkidd (Member) commented:

Well, I like that child spans are connected to parents now.

I was able to run the example with docker-compose from the root of the repo.

  • I updated the exporter endpoint in example/ruby_kafka.rb to http://otelcol:4318/v1/traces.
  • I added the following service to the root docker-compose.yml:

    ```yaml
      ex-instrumentation-ruby_kafka:
        <<: *base
        image: ruby:3.1-slim # protobuf is upset with the musl environment of *base's ruby alpine image
        depends_on:
          - kafka
        working_dir: /app/instrumentation/ruby_kafka/example
    ```

  • Start it all up with `dcupd ex-instrumentation-ruby_kafka otelcol jaeger`
  • Run the demo with `dce ex-instrumentation-ruby_kafka bash -c "bundle && bundle exec ruby ruby_kafka.rb"`

And I see complete traces in local Jaeger:

[screenshot]

@plantfansam (Contributor, Author) commented:

I have updated the PR to have the synchronous producer always trust the headers it receives. My contention is that if someone is injecting headers into produce, they know what they're doing. It still looks fine in dev:

[screenshots]
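
Spelled out, "always trust the headers" means something like the following in the synchronous producer (a sketch of the behavior, not the merged diff; `tracer`, `topic`, and `headers` stand in for the values available inside the patch):

```ruby
require "opentelemetry/sdk"

tracer  = OpenTelemetry.tracer_provider.tracer("sketch-ruby-kafka")
topic   = "my-topic"
headers = { "traceparent" => "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" }

# Use whatever trace context the headers carry as the parent of the
# "<topic> send" span. With the default W3C propagator, extraction returns
# the current context unchanged when no traceparent is present, so plain
# synchronous use keeps its existing parent.
ctx = OpenTelemetry.propagation.extract(headers)

OpenTelemetry::Context.with_current(ctx) do
  tracer.in_span("#{topic} send", kind: :producer) do
    # deliver the message here
  end
end
```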

plantfansam and others added 3 commits July 13, 2023 12:42
…uby_kafka/patches/producer.rb

Co-authored-by: Francis Bogsanyi <francis.bogsanyi@shopify.com>
@arielvalentin merged commit 0dd8ac5 into open-telemetry:main on Jul 14, 2023
42 checks passed
@arielvalentin changed the title from "Propagate context from async producer" to "fix: Propagate context from async producer" on Jul 14, 2023