Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Simple/Batch processor to concurrently invoke Exporter.Export() #4231

Open
cijothomas opened this issue Sep 26, 2024 · 4 comments
Open
Assignees
Labels
spec:logs Related to the specification/logs directory triage:accepted:ready-with-sponsor

Comments

@cijothomas
Copy link
Member

What are you trying to achieve?

Current spec restricts Simple/Batch processors from calling exporter.export() concurrently. This can be limiting in some scenarios:

  1. When Exporting to Windows ETW, Linux user_events etc., the exporters can handle concurrent calls. This is important to get highest performance. These exporters are to be paired with SimpleProcessor, as they cannot afford to have the contention that is typical in the BatchProcessors. However, without SimpleProcessor concurretly calling export(), we wont get the required perf due to contention introduced to syncronize export() calls.
  2.  Various discussions in Clarify spans export concurrency #4205 shows the need for concurrently calling export() from BatchProcessors as well.

In short, there is need for allowing both Simple,Batch processors to call export() concurrently.

To keep back-compatibility, we can do something like:

  1. Allow exporters to advertise if their export() methods are okay to be called concurrently. If no explicit advertisement, then default to false.
  2. Relax restriction in exporting processors (Simple/Batch), to call export() concurrently, if the paired exporter has advertised that they support being called concurrently.
  3. Spec can be silent on "how" exporters advertise this, and leave it for implementations.

Additional context.

Previous attempt to make this happen, but only for SimpleProcessor: #4163

Opened for Logs, but equally applicable for Tracing too.

@cijothomas cijothomas added the spec:logs Related to the specification/logs directory label Sep 26, 2024
@CodeBlanch
Copy link
Member

CodeBlanch commented Oct 14, 2024

What's interesting is Logs spec has some language which could be seen as permissive:

The SDK MUST allow users to implement and configure custom processors and
decorate built-in processors for advanced scenarios such as enriching with
attributes.

If we consider changing the concurrency of the simple processor an advanced scenario 😄

I have a prototype in .NET of @cijothomas's idea:

  1. Allow exporters to advertise if their export() methods are okay to be called concurrently. If no explicit advertisement, then default to false.
  2. Relax restriction in exporting processors (Simple/Batch), to call export() concurrently, if the paired exporter has advertised that they support being called concurrently.
  3. Spec can be silent on "how" exporters advertise this, and leave it for implementations.

Prototype: open-telemetry/opentelemetry-dotnet#5758

If users decorate a log or span exporter like this...

[ConcurrencyModes(ConcurrencyModes.Multithreaded | ConcurrencyModes.Reentrant)]
public class MyLogExporter : BaseExporter<LogRecord>
{
    // Not shown
}

[ConcurrencyModes(ConcurrencyModes.Multithreaded | ConcurrencyModes.Reentrant)]
public class MySpanExporter : BaseExporter<Activity>
{
    // Not shown
}

Then the simple processor will allow Export calls concurrently to that exporter.

PS: We have the same design (attribute-based) already in .NET for Push/Pull metrics exporter decoration.

@austinlparker
Copy link
Member

I think this would be valuable in a few different scenarios I can think of outside the ones that were identified -- especially around export of OTLP-JSON to an intermediate transmission medium (e.g., writing to a Lambda buffer where the OTLP will be reconstructed and forwarded by an extension layer)

@trask
Copy link
Member

trask commented Oct 15, 2024

@open-telemetry/technical-committee this seems reasonable, are you ok if we mark it as triage:accepted:needs-sponsor?

@jmacd
Copy link
Contributor

jmacd commented Oct 16, 2024

The TC accepts this issue and I have volunteered to act as sponsor. The topic of concurrent export is something I have a lot of experience with across OTel, and I have implemented concurrent batch processor logic in both the SDK and the Collector.

As I understand it, we have several concerns here:

Enabling Exporter concurrency

Can the exporter accept concurrent calls? This is a "capabilities" question, one that can be answered in language-specific ways. For example, some languages have marker traits, some use interfaces for the same, and so on. Each language should come up with an idiomatic way for the SDK to determine whether the exporter is willing to accept concurrency.

Enabling Processor concurrency

The functional requirement is that Export() can be called concurrently, and it appears useful for both classes of processor in today's specification. However, we cannot always permit unlimited concurrency, and SDKs need some sort of limit or users would not trust them. I have seen two forms of limit that can apply, and both are successful: we can limit actively-used memory, or we can limit concurrency and batch size. The most important aspect of that is that the Exporter is able to impede the Processor to prevent it from consuming more resources when the limit is reached.

Simple processor case

The simple processor is special because it calls Export synchronously while blocking the caller. Without concurrency, each call to the simple exporter blocks until the exporter can give it service. If we permit exporter concurrency in this case, users will get what they expect: their logs and spans are sent one-by-one, and the greater the concurrency factor, the less likely they are to be queued waiting for export.

In this case, I would not expect unlimited concurrency to be a problem in most cases. When the user has selected a simple processor, it means the exporter directly impedes the caller. The total resource footprint of the SDK, Processor, and Exporter in this case is proportional to the application's usage of telemetry APIs, thus whatever limits the application has for itself carry over to the SDK and its resources. In other words, we expect a constant-factor increase in memory usage resulting from each call site, but callers are blocked while this happens, so the application naturally limits itself and the simple processor may not need to impose additional concurrency limits.

In this case, I am not convinced that it it is worthwhile to support configurable concurrency -- users are probably best served by an unlimited-concurrency simple processor if the exporter supports it, and users are not well-served by a single-concurrent-export processor. However, we have a stable specification so I would not wish to change the existing behavior. This suggests we can could:

  1. add a new type (e.g., SimpleConcurrentProcessor with unlimited concurrency)
  2. add a new type (e.g., LimitedConcurrencyProcessor with configurable concurrency limit)
  3. add a new concurrency-limit parameter for the existing Simple processor that defaults to 1

I favor the first option because it is the only truly simple alternative. IMO we should not call a processor with a concurrency limit "simple".

Batch Span/Log processor case

Ignoring the Metrics SDK, discussed below, there are several ways to limit resource impact. The most important outcome from selecting the batch processor option is that telemetry data will drop instead of exceeding the configured resource footprint. Here are some ways we can limit resources to control when dropping happens:

  1. By memory size (sum of pending item sizes)
  2. By item count (number of pending spans or logs)
  3. By export concurrency limit with a batch-size limit (either item size or item count)

From experience, I favor the first two approaches. When you have a limit on pending items or bytes, there's no great need to also limit concurrency. However, the third option is the easiest to implement and we can make an incremental change to the batch processor specification to add concurrency under this approach. So, while recognizing that pending item-count or bytes-count limits are closer to how a user thinks about governing SDK resources, extending the batch processor with a configurable concurrency limit (default = 1) makes sense.

Metrics case

The OTel metrics SDK specification does not directly lend itself to concurrent exports, because concurrency introduces a risk of partially-complete data covering an interval. Today's specification has an all-or-none quality which is, I think, a good default. However, I've seen cases where an SDK emits so many metrics that a single collection cycle can exceed maximum-request-size limits in a typical pipeline. When this is the case, a user may desire to split the export into smaller requests and export them concurrently. Equivalent functionality exists in the OpenTelemetry collector, where the batch processor component is capable of both splitting and combining requests into batches with a maximum item count.

If users are interested, we could work on specifications for push-based metrics exporters to support splitting requests so that concurrent export is supported, however there is no existing "Metrics Processor" concept in our specification.

Additional resources

There are two different kinds of batching functionality in the Collector today, and both have a problem with concurrency. I have proposed fixing the batch processor as follows:
open-telemetry/opentelemetry-collector#11308

Others are investigating how to extend the Collector's exporterhelper-based batcher with concurrency, but there is a problem with code organization standing in the way.
open-telemetry/opentelemetry-collector#10368

Lightstep uses custom OTel metrics and tracing SDKs. Our SDKs have replaced the OTel-Go metrics and trace exporters with pipelines based on OTel Collector components. Therefore, our SDKs can be configured with the same batching parameters that the collector has, and we use this both for concurrency (e.g., of spans) and for splitting (e.g., of metrics). In order for this to work, we have developed a custom "concurrentbatchprocessor" component for the collector, which we are currently trying to contribute back to the upstream batch processor component:
https://github.com/open-telemetry/otel-arrow/blob/main/collector/processor/concurrentbatchprocessor/README.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:logs Related to the specification/logs directory triage:accepted:ready-with-sponsor
Projects
Status: No status
Development

No branches or pull requests

6 participants