diff --git a/docs/404.html b/docs/404.html
deleted file mode 100644
index c472b4e..0000000
--- a/docs/404.html
+++ /dev/null
@@ -1,24 +0,0 @@
----
-layout: default
----
-
-
-
-
-
-404
-
-
-Page not found :(
-
-The requested page could not be found.
-
diff --git a/docs/_config.yml b/docs/_config.yml
index 0089889..49bf9d5 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -13,21 +13,13 @@
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
-title: Your awesome title
-email: your-email@example.com
-description: >- # this means to ignore newlines until "baseurl:"
- Write an awesome description for your new site here. You can edit this
- line in _config.yml. It will appear in your document head meta (for
- Google search results) and in your feed.xml site description.
-baseurl: "" # the subpath of your site, e.g. /blog
-url: "" # the base hostname & protocol for your site, e.g. http://example.com
+title: STRUCTURED LOGS ARE AMAZING, BUT YOU DON'T NEED TO PREPARE FOR IT
+email: gotzehsing@gmail.com
github_username: ethe
-github: [metadata]
-
# Build settings
markdown: kramdown
-remote_theme: pages-themes/minimal@v0.2.0
+remote_theme: "just-the-docs/just-the-docs"
plugins:
- jekyll-remote-theme # add this line to the plugins list if you already have one
diff --git a/docs/_posts/2023-10-07-welcome-to-jekyll.markdown b/docs/_posts/2023-10-07-welcome-to-jekyll.markdown
deleted file mode 100644
index b19ce36..0000000
--- a/docs/_posts/2023-10-07-welcome-to-jekyll.markdown
+++ /dev/null
@@ -1,25 +0,0 @@
----
-layout: post
-title: "Welcome to Jekyll!"
-date: 2023-10-07 18:46:14 +0800
-categories: jekyll update
----
-You’ll find this post in your `_posts` directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run `jekyll serve`, which launches a web server and auto-regenerates your site when a file is updated.
-
-To add new posts, simply add a file in the `_posts` directory that follows the convention `YYYY-MM-DD-name-of-post.ext` and includes the necessary front matter. Take a look at the source for this post to get an idea about how it works.
-
-Jekyll also offers powerful support for code snippets:
-
-{% highlight ruby %}
-def print_hi(name)
- puts "Hi, #{name}"
-end
-print_hi('Tom')
-#=> prints 'Hi, Tom' to STDOUT.
-{% endhighlight %}
-
-Check out the [Jekyll docs][jekyll-docs] for more info on how to get the most out of Jekyll. File all bugs/feature requests at [Jekyll’s GitHub repo][jekyll-gh]. If you have questions, you can ask them on [Jekyll Talk][jekyll-talk].
-
-[jekyll-docs]: https://jekyllrb.com/docs/home
-[jekyll-gh]: https://github.com/jekyll/jekyll
-[jekyll-talk]: https://talk.jekyllrb.com/
diff --git a/docs/about.md b/docs/about.md
deleted file mode 100644
index 8b4e0b2..0000000
--- a/docs/about.md
+++ /dev/null
@@ -1,18 +0,0 @@
----
-layout: page
-title: About
-permalink: /about/
----
-
-This is the base Jekyll theme. You can find out more info about customizing your Jekyll theme, as well as basic Jekyll usage documentation at [jekyllrb.com](https://jekyllrb.com/)
-
-You can find the source code for Minima at GitHub:
-[jekyll][jekyll-organization] /
-[minima](https://github.com/jekyll/minima)
-
-You can find the source code for Jekyll at GitHub:
-[jekyll][jekyll-organization] /
-[jekyll](https://github.com/jekyll/jekyll)
-
-
-[jekyll-organization]: https://github.com/jekyll
diff --git a/docs/index.md b/docs/index.md
index 829c03a..f546877 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,3 +1,9 @@
+---
+layout: minimal
+---
+
+# STRUCTURED LOGS ARE USEFUL, AND YOU DON'T NEED TO PREPARE FOR IT
+
Have you ever found yourself in the same predicament as me? As a backend developer, I often face a dilemma when diagnosing a service: I regret not having collected and structured more logs in advance for ingestion into Elasticsearch, yet I'm daunted by the complexity of grep, awk and sed commands.
Recently, however, I think I've found a turning point: building on some recent innovations, I've tried a new approach. I've created a command-line tool that uses an LLM's automatic structuring capabilities to structure logs after the fact, and combines a local, in-process OLAP database, a Python REPL, and NumPy / Pandas into a quick and powerful workstation for querying and processing logs. You can [check out the results here](https://github.com/ethe/bakalog). Below are some of my thoughts on log processing.
@@ -102,6 +108,17 @@ After clustering the logs, we need to identify the log patterns to facilitate th
└────────────────────────────────────┘ └──────────────────────────────────────────────┘
```
+An LLM is a perfect fit for this job: LLMs perform exceptionally well at pattern recognition and summarization. I once tried submitting every log line to GPT-4 for pattern recognition, [but the LLM was too slow](https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow). Ultimately, I chose to give GPT-4 only samples from each log group, prompting it to generate regular expressions, and then used those regular expressions to match logs directly. Only when a log matches none of the regular expressions does it go through further clustering and summarization. The advantage of this approach is that after a brief cold-start period, the vast majority of logs are handled by regular expressions alone and no longer rely on GPT-4.
+
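A minimal sketch of the regex-first dispatch described above, in Python: each line is tried against a set of regular expressions, named groups become the extracted variables, and only lines that match nothing fall through to the slower clustering and LLM path. The patterns and log lines are made up for illustration (they are not regexes that GPT-4 actually produced for bakalog), and `match_line` / `dispatch` are hypothetical helper names, not bakalog's API.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical patterns standing in for the regexes GPT-4 would generate from
# samples; named groups become the extracted log variables.
PATTERNS: dict[str, re.Pattern] = {
    "http_request": re.compile(
        r"^(?P<ts>\S+) INFO (?P<method>GET|POST) (?P<path>\S+) "
        r"(?P<status>\d{3}) (?P<ms>\d+)ms$"
    ),
    "db_timeout": re.compile(
        r"^(?P<ts>\S+) WARN query timed out after (?P<secs>\d+)s on (?P<table>\w+)$"
    ),
}


def match_line(line: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (pattern name, extracted variables) for the first matching regex, or None."""
    for name, pattern in PATTERNS.items():
        m = pattern.match(line)
        if m:
            return name, m.groupdict()
    return None


def dispatch(lines: list[str]) -> tuple[dict[str, list[dict[str, str]]], list[str]]:
    """Split lines into regex-matched records and leftovers that still need clustering."""
    matched: dict[str, list[dict[str, str]]] = {}
    unmatched: list[str] = []
    for line in lines:
        result = match_line(line)
        if result is None:
            # Cold path: only these lines go on to clustering and the LLM.
            unmatched.append(line)
        else:
            name, fields = result
            matched.setdefault(name, []).append(fields)
    return matched, unmatched


logs = [
    "2023-10-07T10:00:01Z INFO GET /api/users 200 12ms",
    "2023-10-07T10:00:02Z WARN query timed out after 30s on orders",
    "2023-10-07T10:00:03Z ERROR disk /dev/sda1 is 97% full",
]
matched, unmatched = dispatch(logs)
print(matched["http_request"][0]["status"])  # → 200
print(unmatched)
```

Because a matched line costs only a few regex attempts, the fall-through list should shrink quickly once the common patterns exist, which is what makes the short cold-start period pay off.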
+In the end, we have a complete tool:
+
+1. Use regular expressions to match logs and extract their variables;
+2. Cluster unmatched logs using a text embedding model;
+3. Use GPT-4 to generate regular expressions from samples of each log cluster;
+4. Automatically create DuckDB tables and open IPython/Jupyter.
+
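Step 4 can be sketched the same way: one table per log pattern, with the regex's named groups as columns, followed by plain SQL over the result. To keep the sketch runnable with the standard library only, it uses Python's built-in `sqlite3` in place of DuckDB, which the tool actually uses. The table name, pattern, sample lines, and the `load_pattern_table` helper are hypothetical, and every column is stored as `TEXT` rather than with an inferred type.

```python
import re
import sqlite3

# Hypothetical pattern (as if generated by the LLM stage) and sample log lines.
PATTERN = re.compile(
    r"^(?P<ts>\S+) INFO (?P<method>GET|POST) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<ms>\d+)ms$"
)
LOGS = [
    "2023-10-07T10:00:01Z INFO GET /api/users 200 12ms",
    "2023-10-07T10:00:02Z INFO POST /api/orders 500 340ms",
    "2023-10-07T10:00:03Z INFO GET /api/users 200 8ms",
    "2023-10-07T10:00:04Z ERROR disk /dev/sda1 is 97% full",  # no match, skipped
]


def load_pattern_table(con: sqlite3.Connection, table: str,
                       pattern: re.Pattern, lines: list) -> int:
    """Create `table` with one TEXT column per named group and insert every match."""
    # Named groups in the order they appear in the regex.
    columns = sorted(pattern.groupindex, key=pattern.groupindex.get)
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    con.execute(f'CREATE TABLE "{table}" ({col_defs})')

    rows = [
        tuple(m.group(c) for c in columns)
        for m in map(pattern.match, lines)
        if m is not None
    ]
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


con = sqlite3.connect(":memory:")
inserted = load_pattern_table(con, "http_request", PATTERN, LOGS)
summary = con.execute(
    'SELECT path, COUNT(*), AVG(CAST(ms AS INTEGER)) '
    'FROM "http_request" GROUP BY path ORDER BY path'
).fetchall()
print(inserted)  # → 3
print(summary)   # → [('/api/orders', 1, 340.0), ('/api/users', 2, 10.0)]
```

The design point is that the schema falls out of the regex itself, so no table has to be declared up front before the logs can be queried.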
+You can see the final demonstration in [bakalog/README.md](https://github.com/ethe/bakalog#what-it-does).
+
## WHAT'S NEXT?
This is just a proof of concept (PoC), but it has the potential to grow into a real localized ad-hoc log analysis platform: