Commit

rename to bakalog
ethe committed Oct 10, 2023
1 parent 9b32c9b commit 811a2de
Showing 8 changed files with 20 additions and 20 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -37,4 +37,4 @@ jobs:
- name: Test Installation
run: |
pip install -e '.'
-python -m log2row --help
+python -m bakalog --help
22 changes: 11 additions & 11 deletions README.md
@@ -1,4 +1,4 @@
-# Log2Row
+# BakaLog
A command-line tool that detects, extracts log templates, and structures logs to in-process database, leveraging template patterns generated by GPT-4.

## What it does?
@@ -33,10 +33,10 @@ If you have several kinds of logs are mixed together (samples from [loghub](http
[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 6
```

-Log2Row could detect and extract several log templates, it would take a while, all extracted log would be stored into an embedded in-memory DB [DuckDB](http://duckdb.org/docs/archive/0.9.0/), and opens an IPython REPL:
+BakaLog could detect and extract several log templates, it would take a while, all extracted log would be stored into an embedded in-memory DB [DuckDB](http://duckdb.org/docs/archive/0.9.0/), and opens an IPython REPL:

```ipython
-↳ OPENAI_API_KEY="***" python -m log2row run "loghub/Apache/*.log" --gpt-base https://api.openai.com/v1 --max-lines 0 --buf-size 1MB --threshold 0.9
+↳ OPENAI_API_KEY="***" python -m bakalog run "loghub/Apache/*.log" --gpt-base https://api.openai.com/v1 --max-lines 0 --buf-size 1MB --threshold 0.9
Python 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.16.0 -- An enhanced Interactive Python. Type '?' for help.
@@ -172,29 +172,29 @@ DuckDB also supports saving results to various output types such as CSV, JSON, a
+ + > pattern flow
```

-Log2Row processes all logs through a list of regex patterns. If a log matches a pattern successfully, it's grouped and variables are inserted into DuckDB. If a log doesn't match any patterns, it's buffered. These buffered logs are used to detect log communities via a text embedding model. Samples from each community are then sent to GPT-4 to extract their regex patterns.
+BakaLog processes all logs through a list of regex patterns. If a log matches a pattern successfully, it's grouped and variables are inserted into DuckDB. If a log doesn't match any patterns, it's buffered. These buffered logs are used to detect log communities via a text embedding model. Samples from each community are then sent to GPT-4 to extract their regex patterns.

-The pattern flow isn't part of the main processing, which means that after an initial bootstrap, the processing speed increases significantly. Thus, the longer Log2Row runs, the higher the logs/sec rate it has.
+The pattern flow isn't part of the main processing, which means that after an initial bootstrap, the processing speed increases significantly. Thus, the longer BakaLog runs, the higher the logs/sec rate it has.
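
The main flow described above can be sketched roughly as follows. This is a minimal illustration, not BakaLog's actual code: the `route` helper, the `PATTERNS` list, the named groups, and the second sample line are all hypothetical, and the DuckDB insert and the embedding/GPT-4 steps are left out. The pattern is anchored with `^` and `$`, as the extraction prompt in `extract.py` requests.

```python
import re

# Hypothetical pattern of the kind GPT-4 might return for the Apache sample above.
PATTERNS = [
    re.compile(
        r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\] "
        r"mod_jk child workerEnv in error state (?P<state>\d+)$"
    ),
]


def route(line, patterns, buffer):
    """Return the extracted variables if a pattern matches, else buffer the line."""
    for pattern in patterns:
        match = pattern.match(line)
        if match:
            # In the real tool these variables would be inserted into DuckDB.
            return match.groupdict()
    # Unmatched lines wait here for clustering and pattern extraction.
    buffer.append(line)
    return None


buffer = []
matched = route(
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 6",
    PATTERNS,
    buffer,
)
# Made-up line that no known pattern covers yet.
unmatched = route("[Sun Dec 04 04:52:16 2005] [notice] an unseen message", PATTERNS, buffer)
```

Once a pattern for the buffered lines has been extracted and appended to `PATTERNS`, later lines of that kind take the fast matching path, which is presumably why throughput rises after the bootstrap phase.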

## How to install it?
```
-↳pip install git@https://github.com/ethe/log2row.git
+↳pip install git@https://github.com/ethe/bakalog.git
-↳ python3 -m log2row
-Usage: python -m log2row [OPTIONS] COMMAND [ARGS]...
+↳ python3 -m bakalog
+Usage: python -m bakalog [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
-clean log2row cache all extracted patterns to each files as default,...
+clean bakalog cache all extracted patterns to each files as default,...
run
```

*Logs2Array requires Python3.9+*

## How much does it cost?
-Log2Row uses GPT-4 to extract the regex of log community, each extraction would costs hundreds to thousands tokens of GPT-4. This means each log community detection would costs 0.01$ to 0.1$.
+BakaLog uses GPT-4 to extract the regex of log community, each extraction would costs hundreds to thousands tokens of GPT-4. This means each log community detection would costs 0.01$ to 0.1$.
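
To see how a token count maps onto that price range, here is a back-of-the-envelope sketch. The per-token prices are an assumption (GPT-4 8K list prices around the time of this commit), and the token counts and the `extraction_cost` helper are hypothetical; check current pricing before relying on the numbers.

```python
# Assumed GPT-4 (8K context) prices in USD per 1,000 tokens; not taken from the README.
PROMPT_PRICE = 0.03
COMPLETION_PRICE = 0.06


def extraction_cost(prompt_tokens, completion_tokens):
    """Estimate the USD cost of one regex-extraction request."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE
        + completion_tokens / 1000 * COMPLETION_PRICE
    )


# Hypothetical small and large requests ("hundreds to thousands" of tokens).
small = extraction_cost(prompt_tokens=300, completion_tokens=50)
large = extraction_cost(prompt_tokens=2500, completion_tokens=400)
print(f"small: ${small:.3f}, large: ${large:.3f}")
```

Under these assumptions both requests land inside the 0.01$ to 0.1$ range quoted above.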

## What is next?
- [ ] auto-detect multiple parts of log templates
@@ -211,4 +211,4 @@ Log2Row uses GPT-4 to extract the regex of log community, each extraction would
- [ ] GPT-3.5 compatible

## More information
-Currently, log2row is still in early stage. If you are interested in it, let's discuss it on [Hacker News](https://news.ycombinator.com/item?id=37789903)
+Currently, bakalog is still in early stage. If you are interested in it, let's discuss it on [Hacker News](https://news.ycombinator.com/item?id=37789903)
File renamed without changes.
10 changes: 5 additions & 5 deletions log2row/__main__.py → bakalog/__main__.py
@@ -7,10 +7,10 @@
from IPython import embed
from rich.logging import RichHandler

-from log2row import Match, Sink, collect
-from log2row.cluster import Cluster
-from log2row.extract import extract
-from log2row.util import Memory, parse_size
+from bakalog import Match, Sink, collect
+from bakalog.cluster import Cluster
+from bakalog.extract import extract
+from bakalog.util import Memory, parse_size


@click.group()
@@ -19,7 +19,7 @@ def main():


@main.command(
-help="log2row cache all extracted patterns to each files as default, clean the cache as needed."
+help="bakalog cache all extracted patterns to each files as default, clean the cache as needed."
)
def clean():
files = glob(f"{Memory.PATH}/*")
File renamed without changes.
2 changes: 1 addition & 1 deletion log2row/extract.py → bakalog/extract.py
@@ -115,7 +115,7 @@ def get_messages(samples):
"role": "system",
"content": "You are a senior regular expression developer."
"Create a regular expression which could match, group and extract the log template of several logs below."
-"Exactly the same part between logs must be the part of template."
+"The exactly same part between logs must be the part of template."
"Pattern should start with `^` and end with `$`",
},
{
2 changes: 1 addition & 1 deletion log2row/util.py → bakalog/util.py
@@ -43,7 +43,7 @@ def register(self, type, encoder):

class Memory(Singleton):
home = os.environ["HOME"]
-PATH = f"{home}/.log2row"
+PATH = f"{home}/.bakalog"

def __init__(
self,
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,5 +1,5 @@
[tool.poetry]
-name = "log2row"
+name = "bakalog"
version = "0.1.0"
description = ""
authors = ["Gwo Tzu-Hsing <gotzehsing@gmail.com>"]
