
Filebeat documentation related to parsing logs from containers is lacking and easily confuses people #37741

C0rn3j opened this issue Jan 25, 2024 · 0 comments
Labels: Team:Elastic-Agent

The container parser (at least with the docker format) silently concatenates the log content of every entry that does not end with a newline, and only once an entry with a final newline arrives does the accumulated content get parsed and emitted.

This is the usual JSON log from docker:

{"log":"This is a log\n","stream":"stdout","time":"2024-01-24T19:09:55.568450771Z"}

This is an abnormal one without a newline, which I ran into while debugging an issue; it made the debugging process hell:

{"log":"This is a log","stream":"stdout","time":"2024-01-24T19:09:55.568450771Z"}

This means that the following logfile:

{"log":"This is a log","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"This is a log2","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"This is a log3","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}
{"log":"{\"message\":\"Starts synchronization of instances.\",\"request_id\":null}}\n","stream":"stdout","time":"2024-01-24T16:42:51.200033705Z"}

when read through this input:

```yaml
- type: filestream
  id: docker-logs
  paths:
    - /tmp/filebeat-repro/docker.log
  parsers:
    - container:
        format: docker
```

will end up producing this single event:

```json
{
  "@timestamp": "2024-01-23T19:09:55.568Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.12.0"
  },
  "input": {
    "type": "filestream"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "host": {
    "name": "vyvoj"
  },
  "agent": {
    "version": "8.12.0",
    "ephemeral_id": "538597a4-ab3a-46b0-9c35-594a82fac18d",
    "id": "2b02cf81-00ee-4dbc-ac18-89b6e2abd0ff",
    "name": "vyvoj",
    "type": "filebeat"
  },
  "log": {
    "file": {
      "path": "/tmp/filebeat-repro/docker.log",
      "device_id": "1048580",
      "inode": "15861855"
    },
    "offset": 0
  },
  "stream": "stdout",
  "message": "This is a logThis is a log2This is a log3{\"message\":\"Starts synchronization of instances.\",\"request_id\":null}}\n"
}
```

If this is the expected behavior, which it may well be, I would expect it to be documented here:
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_container
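For reference, the buffering behavior described above can be reproduced with a short sketch. This is an illustration of the observed behavior only, not Filebeat's actual implementation:

```python
import json

def docker_parse(lines):
    """Illustrate the observed behavior: 'log' payloads are buffered and
    concatenated until one ends in a newline, then emitted as one message."""
    events, buffer = [], ""
    for line in lines:
        entry = json.loads(line)
        buffer += entry["log"]
        if buffer.endswith("\n"):
            events.append({"stream": entry["stream"], "message": buffer})
            buffer = ""
    return events

logfile = [
    r'{"log":"This is a log","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}',
    r'{"log":"This is a log2","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}',
    r'{"log":"This is a log3\n","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}',
]
events = docker_parse(logfile)
print(len(events))             # 1 -- the three entries collapse into a single event
print(events[0]["message"])    # This is a logThis is a log2This is a log3
```

Three separate on-disk entries produce one event because only the last payload carries a trailing newline.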


Onto my second issue.

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_container
> Use the container parser to extract information from containers log files. It parses lines into common message lines, extracting timestamps too.

Based on what I am getting from the parser here, CRI-O logs look like this:

```
2022-02-04T18:14:43.219493781+01:00 stdout F Starting up on port 80
```

Meanwhile, docker by default logs JSON like this:

{"log":"This is a log\n","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}

It is not immediately obvious from this that docker logs get JSON-decoded and that `log` becomes `message`:

```
"message": "This is a log\n",
```

It is a bit odd that the parser keeps the final newline in `message`, given that without the newline the entry does not even parse correctly.
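To make the difference between the two formats concrete, here is a rough sketch of how each maps onto a common event shape. This is an illustration based on the example lines above, not Filebeat's code:

```python
import json

def normalize(line, fmt):
    """Map one raw container log line to a common {time, stream, message} shape."""
    if fmt == "docker":
        # docker json-file driver: one JSON object per line, payload under "log"
        entry = json.loads(line)
        return {"time": entry["time"], "stream": entry["stream"], "message": entry["log"]}
    if fmt == "cri":
        # CRI (CRI-O/containerd): "<timestamp> <stream> <P|F> <message>"
        time, stream, _flag, message = line.split(" ", 3)
        return {"time": time, "stream": stream, "message": message}
    raise ValueError(f"unknown format: {fmt}")

docker_event = normalize(
    '{"log":"This is a log\\n","stream":"stdout","time":"2024-01-23T19:09:55.568450771Z"}',
    "docker",
)
cri_event = normalize(
    "2022-02-04T18:14:43.219493781+01:00 stdout F Starting up on port 80",
    "cri",
)
print(repr(docker_event["message"]))  # 'This is a log\n' -- trailing newline kept
print(cri_event["message"])           # Starting up on port 80
```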


On to issue number 3.

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#filebeat-input-filestream-ndjson

> These options make it possible for Filebeat to decode logs structured as JSON messages. Filebeat processes the logs line by line, so the JSON decoding only works if there is one JSON object per message.
>
> The decoding happens before line filtering. You can combine JSON decoding with filtering if you set the message_key option.
> This can be helpful in situations where the application logs are wrapped in JSON objects, like when using Docker.

The ndjson documentation suggests that it is useful for docker logs, but the container parser already exists for that purpose, and if someone tries to use ndjson on top of the container parser, they end up with a confusing error message about failed JSON decoding.
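To spell out the trap: for docker log files the container parser alone is enough, and ndjson is only relevant for JSON that the application itself writes inside the container. A hedged sketch (paths and id are illustrative):

```yaml
- type: filestream
  id: docker-logs
  paths:
    - /var/lib/docker/containers/*/*.log
  parsers:
    - container:
        format: docker
    # Do NOT add an ndjson parser for the docker wrapper itself: the container
    # parser has already decoded {"log": ..., "stream": ..., "time": ...} and
    # put the payload into `message`. Stacking ndjson on top would try to
    # JSON-decode plain text like "This is a log\n" and fail.
```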


https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html

Fourth problem, deprecations and the container input.

The log input is deprecated, and the container input is a wrapper around the log input.

One is marked as deprecated, the other isn't.

In fact it isn't even mentioned that one CAN migrate to filestream with the container parser.

Vice versa, the filestream documentation does not mention that it is the replacement for the container input.

There is a related issue on this point, #34393, but it is now mostly about introducing a take_over option for the input.

Not even autodiscovery uses the container input anymore (#35984), yet people setting up new Filebeat inputs for container logs are not warned or informed at all about the options and differences.
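For reference, a migration from the container input to filestream might look like the following sketch (option names per the filestream docs; paths and id are illustrative, and registry state is not carried over, which is exactly the take_over gap tracked in #34393):

```yaml
# Old: container input (a wrapper around the deprecated log input)
#- type: container
#  paths:
#    - '/var/log/containers/*.log'

# New: filestream input with the container parser
- type: filestream
  id: container-logs
  paths:
    - '/var/log/containers/*.log'
  prospector.scanner.symlinks: true  # kubernetes container log paths are usually symlinks
  parsers:
    - container:
        format: auto   # or docker / cri
```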


Why it matters

All of these issues combined result in people doing double or triple JSON parsing and ending up thoroughly confused.

Here is an issue where the OP (and many people in the comments, me included) tries to double-parse the log field, which no longer exists at that point: #20053

Here is an issue where the OP does exactly the same thing, and on top of that tries to triple-parse it with the decode_json_fields processor: elastic/ecs-logging-java#43

Here is an issue where the OP attempts to triple-parse the log field again: SO64860153

There are many threads about these problems throughout the Filebeat forums too.
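The failure mode in all of these reports is the same and is easy to demonstrate. Below is a simplified sketch of the event shape after the container parser has run (reduced from the full output shown earlier):

```python
import json

# Simplified event after the container parser has run on a docker log line:
event = {"stream": "stdout", "message": "This is a log\n"}

# Attempt 1: decode the original 'log' field again -- it no longer exists.
assert "log" not in event

# Attempt 2: JSON-decode 'message' -- it is plain text, so this fails too.
try:
    json.loads(event["message"])
except json.JSONDecodeError as err:
    print(f"double-parse fails: {err}")
```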


Actionable TL;DR:

- (1) Confirm/deny that the container parser with the docker format is expected to concatenate the `log` fields of entries without a trailing newline until an entry with one arrives, and document/fix the behavior.
- (2) Document the expected log formats for the docker and cri format options in the container parser.
- (2) Document the supported logging drivers for the docker format; it seems to support only json-file and not, for example, the local driver, as per this forum post. The filebeat.yml reference documentation is better than the website doc here.
- (2) Confirm/deny that the final newline is kept for default docker JSON logs in the container parser and document/fix the behavior.
- (2) Make it very obvious that the container parser turns the `log` field of default docker JSON logs into a `message` field.
- (3) Put a warning on the filestream ndjson parser documentation that the container parser ALREADY does what the ndjson example suggests doing, as that example leads people to double parsing.
- (4) Deprecate the container input in the documentation.
- (4) Recommend the filestream input with the container parser instead of the container input in the container input documentation.
- (4) Document in the filestream documentation that the container filestream parser is the replacement for the container input, and mention differences, if any.
- (4) Document in the container input documentation that migration from the container input is currently not possible via take_over and is in the works.
- (5) Bonus issue: fix the unescaped asterisks in the container input documentation that are wrongly bolding text.