Commit

update docs
ethe committed Oct 10, 2023
1 parent 866369d commit 6ab3ee6
Showing 5 changed files with 20 additions and 78 deletions.
24 changes: 0 additions & 24 deletions docs/404.html

This file was deleted.

14 changes: 3 additions & 11 deletions docs/_config.yml
@@ -13,21 +13,13 @@
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: Your awesome title
email: your-email@example.com
description: >- # this means to ignore newlines until "baseurl:"
Write an awesome description for your new site here. You can edit this
line in _config.yml. It will appear in your document head meta (for
Google search results) and in your feed.xml site description.
baseurl: "" # the subpath of your site, e.g. /blog
url: "" # the base hostname & protocol for your site, e.g. http://example.com
title: STRUCTURED LOGS IS AMAZING, BUT YOU DON'T NEED TO PREPARE FOR IT
email: gotzehsing@gmail.com
github_username: ethe

github: [metadata]

# Build settings
markdown: kramdown
remote_theme: pages-themes/minimal@v0.2.0
remote_theme: "just-the-docs/just-the-docs"
plugins:
- jekyll-remote-theme # add this line to the plugins list if you already have one

25 changes: 0 additions & 25 deletions docs/_posts/2023-10-07-welcome-to-jekyll.markdown

This file was deleted.

18 changes: 0 additions & 18 deletions docs/about.md

This file was deleted.

17 changes: 17 additions & 0 deletions docs/index.md
@@ -1,3 +1,9 @@
---
layout: minimal
---

# STRUCTURED LOGS ARE USEFUL, AND YOU DON'T NEED TO PREPARE FOR THEM

Ever found yourself in the same predicament as I have? As a backend developer, I often face a dilemma when diagnosing a service: I regret not having collected and structured more logs in advance for insertion into Elasticsearch, while also panicking at the complexity of grep, awk, and sed commands.

Recently, however, I think I have found a turning point, and I have made a new attempt based on some recent innovations. I've created a command-line tool that uses an LLM's automatic structuring capabilities to structure logs after the fact, and combines an in-process, local OLAP database, a Python REPL, and NumPy / pandas into a quick and powerful workstation for querying and processing. You can [check out the results here](https://github.com/ethe/bakalog). Below are some of my thoughts on the issue of log processing.
@@ -102,6 +108,17 @@ After clustering the logs, we need to identify the log patterns to facilitate th
└────────────────────────────────────┘ └──────────────────────────────────────────────┘
```

An LLM is a perfect fit for this job: LLMs perform exceptionally well at pattern recognition and summarization. I first tried submitting every log line to GPT-4 for pattern recognition, [but the LLM was too slow](https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow). Ultimately, I chose to give GPT-4 only samples from each log group, prompt it to generate regular expressions, and use those expressions to match logs directly. Only when a log matches none of the regular expressions does further clustering and summarization occur. The advantage of this approach is that after a brief cold-start period, the vast majority of logs are processed via regular expressions and no longer rely on GPT-4.
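
The regex-first routing described above might look roughly like the following. This is a minimal sketch, not bakalog's actual code: the pattern names, the two regular expressions, and the `match_log` / `route_logs` helpers are all hypothetical, and the hand-written patterns stand in for the ones GPT-4 would generate from samples.

```python
import re

# Hypothetical patterns, standing in for regexes an LLM would generate
# from samples of each log cluster.
PATTERNS = {
    "login": re.compile(
        r"^user (?P<user>\w+) logged in from (?P<ip>\d+\.\d+\.\d+\.\d+)$"
    ),
    "timeout": re.compile(
        r"^request (?P<request_id>\w+) timed out after (?P<ms>\d+)ms$"
    ),
}


def match_log(line, patterns):
    """Return (pattern_name, extracted_fields) for the first matching regex, else None."""
    for name, pattern in patterns.items():
        m = pattern.match(line)
        if m:
            return name, m.groupdict()
    return None


def route_logs(lines, patterns):
    """Split lines into structured rows and leftovers.

    Leftovers are the lines that would go on to clustering and
    LLM summarization; that step is not implemented here.
    """
    structured, unmatched = [], []
    for line in lines:
        result = match_log(line.strip(), patterns)
        if result is None:
            unmatched.append(line)
        else:
            name, fields = result
            structured.append({"pattern": name, **fields})
    return structured, unmatched


structured, unmatched = route_logs(
    [
        "user alice logged in from 10.0.0.1",
        "request ab12 timed out after 300ms",
        "disk full on /dev/sda1",
    ],
    PATTERNS,
)
```

Here the first two lines are structured without any LLM call, and only the third lands in `unmatched`. Once the leftovers have produced new regular expressions, those can be added to `PATTERNS`, which is why the LLM is needed less and less after the cold start.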

In the end, we have a complete tool:

1. Use regular expressions to match logs and extract their variables;
2. Cluster unmatched logs using a text embedding model;
3. Use GPT-4 to extract regular expressions from samples of each log cluster;
4. Automatically create DuckDB tables and open IPython/Jupyter.

You can see the final demonstration in [bakalog/README.md](https://github.com/ethe/bakalog#what-it-does).

## WHAT'S NEXT?

This is just a Proof of Concept (PoC), but it has the potential to grow into a real, local ad-hoc log analysis platform: