
Is it possible to declare multiple elasticsearch appenders with different jackson mixins? #41

romanmar1 opened this issue Mar 22, 2020 · 14 comments

@romanmar1

If yes, is that efficient resource-wise?

@rfoltyns (Owner)

Yes, it's possible.

Resource-wise - it depends. Usually it's not efficient, as every Elasticsearch appender instance spawns several threads (batch emitter, metric printers, IO threads) and if you use pooled buffers, it also allocates at least as many bytes as you configure.

It depends what you need. Could you describe what exactly you want to achieve?

E.g. if you'd like to have mixins-per-index, I'd recommend programmatic config with:

  • one AsyncBatchDelivery, with one ItemSourceFactory and ObjectWriter per set of mixins, used like this:
asyncBatchDelivery.add(indexName, itemSourceFactory.create(logEvent, objectWriter));

or

  • one AsyncBatchDelivery, with one JacksonJsonLayout per set of mixins (sharing a single ItemSourceFactory!), used like this:
asyncBatchDelivery.add(indexName, jacksonJsonLayout.serialize(logEvent));
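As a pure-Java sketch of the first option, with the library types stubbed out (Delivery and MixinSetRouter below are hypothetical stand-ins, not log4j2-elasticsearch classes): a single shared delivery, plus one serializer per set of mixins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the shared AsyncBatchDelivery.
interface Delivery {
    void add(String indexName, String payload);
}

// Routes every event through one shared delivery, picking a
// per-mixin-set serializer (standing in for ObjectWriter).
final class MixinSetRouter {
    private final Delivery sharedDelivery;
    private final Map<String, Function<Object, String>> writers;

    MixinSetRouter(Delivery sharedDelivery, Map<String, Function<Object, String>> writers) {
        this.sharedDelivery = sharedDelivery;
        this.writers = writers;
    }

    void append(String indexName, String mixinSet, Object logEvent) {
        // analogous to:
        // asyncBatchDelivery.add(indexName, itemSourceFactory.create(logEvent, objectWriter))
        sharedDelivery.add(indexName, writers.get(mixinSet).apply(logEvent));
    }
}
```

The point of the sketch is only the shape: one delivery instance shared by all routes, many serializers.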

@romanmar1 (Author)

I see.

I'll change the focus of my question a bit (or a lot):
Suppose I need to write logs for different components to different indices with different payloads,
and I want to be as resource-efficient as possible (for example: instantiate only one JestClient, or one Apache async HTTP client, with everything shared).
Mixins are no longer relevant to the question (I think I'll use PatternLayout instead).

How would I do that?

@rfoltyns (Owner)

If you want to use the same appender instance for multiple indices, for now, you'll have to write your own IndexNameFormatter tailored to your needs and derive the index name from message content or context data.

I think that a context-data-based formatter with StrLookup should do the trick. I'll add it in a future release.
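The custom-formatter idea can be sketched in plain Java. The names below are hypothetical stand-ins (the real IndexNameFormatter interface and LogEvent type live in log4j2-elasticsearch and Log4j2); the sketch only shows deriving an index name from per-event context data, with a fallback:

```java
import java.util.Map;

// Hypothetical stand-in for a custom IndexNameFormatter: the index name
// is read from the event's context data, falling back to a fixed name.
final class ContextIndexNameFormatter {
    private final String contextKey;
    private final String fallback;

    ContextIndexNameFormatter(String contextKey, String fallback) {
        this.contextKey = contextKey;
        this.fallback = fallback;
    }

    // contextData stands in for LogEvent.getContextData()
    String format(Map<String, String> contextData) {
        String name = contextData.get(contextKey);
        return (name != null && !name.isEmpty()) ? name : fallback;
    }
}
```

A real implementation would implement the library's IndexNameFormatter interface and take the map from the LogEvent instead of a raw Map.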

If you need to be resource efficient, use the hc module. It allocates several times less memory than jest, has lower CPU overhead, and further improvements are on the way.

@romanmar1 (Author)

That's not exactly what I want.
Using the same Elasticsearch appender while deriving the index name from context would be analogous to using the same RollingFileAppender and deriving the file name from context, which to me seems like the wrong kind of solution.

I want to use different appenders, possibly with different filters and everything else Log4j2 has to offer, and be resource efficient (specifically, to be able to share the same client, or delivery mechanism).

PS: Can I use hc with PatternLayout? I think I tried that and it didn't work (something about supporting the ItemSource API only).

@rfoltyns (Owner)

You can achieve that only if you configure multiple appenders programmatically. Filters are applied before LogEvent gets to the appender, so you should be fine there.

Any layouts that produce raw Strings - including PatternLayout - are not compatible with hc; only ItemSourceLayouts are, and I'm not planning to change that. JacksonJsonLayout is the only implementation at the moment.

@romanmar1 (Author)

romanmar1 commented Mar 23, 2020

Will you consider adding support for that via configuration files?

@rfoltyns (Owner)

Short answer - no, never.

With Log4j2, each appender has its own lifecycle - when requested to stop, all underlying components must stop reliably and release all of their resources, especially running threads, connections and thread locals(!) if used. They can't leak.

With config files, even if you manage to share resources between multiple appenders, the main challenge is detecting that the last appender has stopped and the shared resource can be released. Log4j2 controls the appender lifecycle, so a static registry with a usage count is one of the ideas. Possible, but horrible - I'll never add support for that.

However, programmatic config will allow you to do it - this library is highly extensible by design. You can extend the ElasticsearchAppender.stop() and ElasticsearchAppender.createItemAppenderFactory() methods and control the lifecycle of AsyncBatchDelivery on your own. Have a look at the architecture. Maybe you'll find a better way.

How many appenders would you like to have?

@romanmar1 (Author)

I'll probably start with 3 and go up from there, up to 10, give or take.

I understand your concerns.
I might add that, at least initially, the solution to this problem doesn't have to be complete, IMHO.
The demands can be more relaxed.

In the scenario where a couple of Elasticsearch appenders share a common delivery mechanism, it seems unlikely that one would want to stop some of them and not the others. Assuming that, you can forgo the reference-count mechanism and close the first appender completely. That means the other appenders will stop working; when they are subsequently closed, they dispose of resources only if those aren't already disposed.

Making this a partial capability will not jeopardize the library, I think. On the contrary, you'd be adding what I think is a common and important feature. Elasticsearch is a very common logging platform nowadays, and using Elasticsearch appenders should be as easy as using file appenders, without taking a heavy toll on resources.

@rfoltyns (Owner)

It seems unlikely that one would want to stop some of them, and not the others

You're right. Still, a Lifecycle.stop() call on the first appender must not cause instability in the others.

close the first appender completely, which will mean for the other appenders that they will stop working

That's exactly why I'd rather sacrifice a few megabytes of heap and close them properly, in order, allowing all in-flight logs to be delivered safely. I don't want to jeopardize your data(!) If a graceful shutdown can be executed, it should be.

The demands can be more relaxed

They can, as long as it doesn't lead to data loss or instability of the client application.

without taking a heavy toll on resources

I agree - that's why I started to work on pooled buffers and hc, delaying other key features. The difference between the hc and default jest module overhead is huge.


The partial solution is there already - programmatic config with user-provided handling of AsyncBatchDelivery lifecycle.

@romanmar1 (Author)

That's exactly why I'd rather sacrifice a few megabytes of heap, close them properly, in order, allowing all logs in flight to be delivered safely. I don't want to jeopardize your data(!) If graceful shutdown can be executed, it should be.

It's logging data, and the fact that it is being channeled directly to ES already puts it, theoretically, in some jeopardy. It's a calculated risk.

They can as long as it doesn't lead to data loss and instability of client application.

Data loss, perhaps; I don't see how it would lead to instability.

There are other options as well. For example, two types of Elasticsearch appenders: the type you have now, and a decorating Elasticsearch appender that holds a reference to an inner appender of the first type. The decorating appender would use the inner appender (by actually decorating ItemAppender) and only needs to pass the index name through. That would make it quite easy, I think, to implement reference counting on the inner appender: whenever a decorating appender takes a reference to an inner appender, the counter goes up; on dispose, it goes down; at zero, the inner appender is disposed. Even simpler, you could enforce the order of disposal: decorating appenders first, inner appenders last.
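A minimal sketch of the proposed reference counting, in plain Java with hypothetical names (nothing here comes from the library): the shared resource is disposed exactly once, when the last holder releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed scheme: each decorating appender
// retains the shared inner resource on start and releases it on stop;
// only the last release actually disposes it.
final class RefCountedResource implements AutoCloseable {
    private final AtomicInteger refs = new AtomicInteger(0);
    private final Runnable onLastRelease; // e.g. stop the shared AsyncBatchDelivery
    private volatile boolean disposed = false;

    RefCountedResource(Runnable onLastRelease) {
        this.onLastRelease = onLastRelease;
    }

    void retain() {
        if (disposed) throw new IllegalStateException("already disposed");
        refs.incrementAndGet();
    }

    @Override
    public void close() {
        // dispose only when the last user is done, and only once
        if (refs.decrementAndGet() == 0 && !disposed) {
            disposed = true;
            onLastRelease.run();
        }
    }
}
```

The alternative mentioned above - a fixed disposal order (decorators first, inner appender last) - avoids the counter entirely at the cost of coupling shutdown order to configuration.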

The partial solution is there already - programmatic config with user-provided handling of AsyncBatchDelivery lifecycle.

Well, yes and no. I need to write logs not only to ES but to other destinations as well. Also, the project I'm currently working on already has a number of Log4j2 XML configurations, and I don't want the logging configuration to live in several places (code and XML). I'm a big fan of programmatic configuration myself, but that's not an acceptable reason to move everything to code configuration.

I agree, that's why I started to work on pooled buffers and hc delaying other key features - the difference between hc and default jest module overhead is huge.

Can you provide some absolute metrics for this? Perhaps from tests that you conducted?

@rfoltyns (Owner)

It's a calculated risk

I'm not willing to increase it.

i don't see how it will lead to instability

If resources are not closed properly, or are closed prematurely, some classes may not be unloaded in hot-deploy environments (this applies to most dangling resources).

Decorating appender get an inner appender by ref

Options are already available with Log4j2: AsyncAppender, RoutingAppender, RewriteAppender. All of them can ref a shared appender.

Here are measurements of GC times and memory allocation rate, taken from SmokeTest.programmaticConfigTest (jest) and SmokeTest.programmaticBufferedConfigTest (hc) at revision 946ac3e: ~10,000 logs per second (each log 200 characters long), 2 GB heap, 10,000,000 logs sent. The log is allocated once, to measure the library's overhead.

Shenandoah (default, no tuning):

Metric                    jest           hc
No. of minor collections  132            12
Pause interval            ~8 sec         ~120 sec
Total Pause Time          477 ms         104 ms
Avg. Pause Time           1.72 ms        2.60 ms
Allocation Rate           185.81 MB/sec  11.45 MB/sec
Allocated Total           187.71 GB      11.2 GB
GC Throughput             99.954%        99.99%

CMS (default, no tuning):

Metric                    jest           hc
No. of minor collections  341            21
Pause interval            ~3 sec         ~50 sec
Total Pause Time          3 sec 910 ms   430 ms
Avg. Pause Time           11.4 ms        18.7 ms
Allocation Rate           172.97 MB/sec  11.17 MB/sec
Allocated Total           174.17 GB      11.06 GB
GC Throughput             99.621%        99.958%

@romanmar1 (Author)

romanmar1 commented Mar 25, 2020

Options are already available with Log4j2: AsyncAppender, RoutingAppender, RewriteAppender. All of them can ref shared appender.

I don't see how they solve the problem we discussed. Through them it's possible to implement context-based routing, but they don't solve the problem of sharing the delivery mechanism across different Elasticsearch indices.

Thanks for the performance metrics. They will help me decide which client to use.

@rfoltyns (Owner)

You're welcome! :) Always a pleasure to do these kinds of comparisons.

As mentioned earlier, you can use your own custom IndexNameFormatter based on context data, and either:

  • RewriteAppender with a custom RewritePolicy parametrized with indexName
  • or a custom MessageFactory that creates Messages with an indexName context parameter; not perfect, because one may prefer to use SLF4J; however, since it's at the class level, it allows parametrization at any level of granularity (class, package, service, group, etc.) with a properties supplier of your choice
  • or ThreadContext (formerly MDC), which is the simplest, but requires careful cleanup (thread locals); all parametrization options apply here as well
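For the ThreadContext option, the "careful cleanup" caveat is the important part. A self-contained analogue of the set-then-always-restore pattern (a plain ThreadLocal standing in for Log4j2's ThreadContext; all names here are hypothetical):

```java
// ThreadContext is effectively a thread-local map; if a value is set and
// never removed, it leaks into unrelated log events on pooled threads.
// This scope restores the previous value (or clears) on close.
final class ScopedIndexName implements AutoCloseable {
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private final String previous;

    ScopedIndexName(String indexName) {
        previous = CURRENT.get();
        CURRENT.set(indexName);
    }

    @Override
    public void close() {
        if (previous == null) {
            CURRENT.remove(); // avoid a thread-local leak
        } else {
            CURRENT.set(previous);
        }
    }
}
```

With the real API the same shape is ThreadContext.put("indexName", ...) in a try block and ThreadContext.remove("indexName") in finally.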

IMHO, these are reasonably safe solutions to the problem you'd like to solve. All of these interfaces are battle-proven, well tested, and allow you to use everything Log4j2 has to offer.

I can add an optional IndexNameResolver with StrLookup to the current formatters in 1.5. That would allow you to achieve a lot with minimal changes. Do you think this could be of any use to you?

@romanmar1 (Author)

RewriteAppender with custom RewritePolicy parametrized with indexName

This one will do, together with the IndexNameResolver and StrLookup.
Thanks.
