Optimize code structure
Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
jaelgu committed Nov 17, 2023
1 parent 40edc67 commit cfec748
Showing 80 changed files with 60 additions and 107 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/pylint.yml
@@ -21,6 +21,5 @@ jobs:
- name: Python pylint
run: |
pip install pylint==2.10.2
pylint --rcfile=.pylintrc --output-format=colorized src_towhee
pylint --rcfile=.pylintrc --output-format=colorized src_langchain
pylint --rcfile=.pylintrc --output-format=colorized src
pylint --rcfile=.pylintrc --output-format=colorized offline_tools
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,2 +1,5 @@
**/__pycache__
**/tmp
**/*.egg-info
**/*.db
**/build
4 changes: 2 additions & 2 deletions Contributing.md
@@ -65,8 +65,8 @@ If you're interested in contributing to the `zilliztech/akcio` codebase, follow
4. During development, you might want to run `pylint`. You can do so with one of the commands below:
```bash
$ pip install pylint==2.10.2
$ pylint --rcfile=.pylintrc --output-format=colorized src_towhee
$ pylint --rcfile=.pylintrc --output-format=colorized src_langchain
$ pylint --rcfile=.pylintrc --output-format=colorized src.towhee
$ pylint --rcfile=.pylintrc --output-format=colorized src.langchain
$ pylint --rcfile=.pylintrc --output-format=colorized offline_tools
```

20 changes: 10 additions & 10 deletions README.md
@@ -71,34 +71,34 @@ It also supports different integrations of LLM service and databases:

The option using Towhee simplifies the process of building a system by providing [pre-defined pipelines](https://towhee.io/tasks/pipeline). These built-in pipelines require less coding and make system building much easier. If you require customization, you can either simply modify configuration or create your own pipeline with rich options of [Towhee Operators](https://towhee.io/tasks/operator).
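For illustration only, a minimal sketch of what running one of Towhee's pre-defined pipelines looks like; the pipeline name `sentence_embedding` is taken from Towhee's public pipeline catalog and is not specific to this project:

```python
# Minimal sketch (assumes towhee>=1.1.0 is installed); 'sentence_embedding'
# is a built-in Towhee pipeline name used here purely as an example.
from towhee import AutoPipes

embedding_pipeline = AutoPipes.pipeline('sentence_embedding')
result = embedding_pipeline('How does Akcio build its knowledge base?')
print(result.get())  # the embedding produced by the pipeline
```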

- [Pipelines](./src_towhee/pipelines)
- [Pipelines](./src.towhee/pipelines)
- **Insert:**
The insert pipeline builds a knowledge base by saving documents and corresponding data in database(s).
- **Search:**
The search pipeline enables the question-answering capability powered by information retrieval (semantic search and optional keyword match) and LLM service.
- **Prompt:** a prompt operator prepares messages for LLM by assembling system message, chat history, and the user's query processed by template.

- [Memory](./src_towhee/memory):
The memory storage stores chat history to support context in conversation. (available: [most SQL](./src_towhee/memory/sql.py))
- [Memory](./src.towhee/memory):
The memory storage stores chat history to support context in conversation. (available: [most SQL](./src.towhee/memory/sql.py))


### Option 2: LangChain

The option using LangChain employs the use of [Agent](https://python.langchain.com/docs/modules/agents) in order to enable LLM to utilize specific tools, resulting in a greater demand for LLM's ability to comprehend tasks and make informed decisions.
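As a rough sketch of how these pieces compose (the retrieval tool below is a hypothetical placeholder; the `ChatAgent`/`AgentExecutor` calls mirror the usage visible in the test diffs further down):

```python
# Sketch only: wire the project's ChatAgent into a LangChain AgentExecutor.
# `lookup_docs` is a hypothetical placeholder tool, and the ChatLLM/ChatAgent
# argument names are assumed from the examples elsewhere in this repository.
from langchain.agents import AgentExecutor, Tool

from src.langchain.agent import ChatAgent
from src.langchain.llm.openai_chat import ChatLLM

def lookup_docs(query: str) -> str:
    """Placeholder retrieval tool; a real deployment would query the vector store."""
    return 'retrieved context for: ' + query

llm = ChatLLM(openai_api_key='sk-...')  # placeholder key
tools = [Tool(name='Search', func=lookup_docs, description='Retrieve project documents.')]
agent = ChatAgent.from_llm_and_tools(llm=llm, tools=tools)
agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools)
answer = agent_chain.run(input='What is Akcio?', chat_history=[])
```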

- [Agent](./src_langchain/agent)
- [Agent](./src.langchain/agent)
- **ChatAgent:** agent ensembles all modules together to build up qa system.
- Other agents (todo)
- [LLM](./src_langchain/llm)
- [LLM](./src.langchain/llm)
- **ChatLLM:** large language model or service to generate answers.
- [Embedding](./src_langchain/embedding/)
- [Embedding](./src.langchain/embedding/)
- **TextEncoder:** encoder converts each text input to a vector.
- Other encoders (todo)
- [Store](./src_langchain/store)
- [Store](./src.langchain/store)
- **VectorStore:** vector database stores document chunks in embeddings, and performs document retrieval via semantic search.
- **ScalarStore:** optional, database stores metadata for each document chunk, which supports additional information retrieval. (available: [Elastic](src_langchain/store/scalar_store/es.py))
- **ScalarStore:** optional, database stores metadata for each document chunk, which supports additional information retrieval. (available: [Elastic](src.langchain/store/scalar_store/es.py))
- **MemoryStore:** memory storage stores chat history to support context in conversation.
- [DataLoader](./src_langchain/data_loader/)
- [DataLoader](./src.langchain/data_loader/)
- **DataParser:** tool loads data from given source and then splits documents into processed doc chunks.

## Deployment
@@ -228,7 +228,7 @@ The option using LangChain employs the use of [Agent](https://python.langchain.c

## Load data

The `insert` function in [operations](./src_langchain/operations.py) loads project data from url(s) or file(s).
The `insert` function in [operations](./src.langchain/operations.py) loads project data from url(s) or file(s).
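A hypothetical usage sketch follows; the parameter names and return value of `insert` are assumptions for illustration and may differ from the actual function:

```python
# Hypothetical sketch: parameter names (data_src, project) and the returned
# chunk count are assumptions, not the verified signature.
from src.langchain.operations import insert

count = insert(data_src='https://towhee.io', project='akcio_demo')
print(f'Inserted {count} document chunks into project "akcio_demo".')
```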

There are 2 options to load project data:

10 changes: 5 additions & 5 deletions config.py
@@ -115,7 +115,7 @@
raise NotImplementedError

RERANK_CONFIG = {
'rerank': True, # or False
'rerank': False, # or False
'rerank_model': rerank_model,
'threshold': 0.0,
'rerank_device': -1 # -1 will use cpu
@@ -126,7 +126,7 @@
'chunk_size': 300
}

QUESTIONGENERATOR_CONFIG = {
'model_name': 'gpt-3.5-turbo',
'temperature': 0,
}
# QUESTIONGENERATOR_CONFIG = {
# 'model_name': 'gpt-3.5-turbo',
# 'temperature': 0,
# }
File renamed without changes.
4 changes: 2 additions & 2 deletions gradio_demo.py
@@ -17,9 +17,9 @@
'The service should start with either "--langchain" or "--towhee".'

if USE_LANGCHAIN:
from src_langchain.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
from src.langchain.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
if USE_TOWHEE:
from src_towhee.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
from src.towhee.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413


def create_session_id():
4 changes: 2 additions & 2 deletions main.py
@@ -40,10 +40,10 @@
'The service should start with either "--langchain" or "--towhee".'

if USE_LANGCHAIN:
from src_langchain.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
from src.langchain.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
chat = partial(chat, enable_agent=ENABLE_AGENT)
if USE_TOWHEE:
from src_towhee.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
from src.towhee.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
if ENABLE_MONITER:
from moniter import enable_moniter # pylint: disable=C0413
from prometheus_client import generate_latest, REGISTRY # pylint: disable=C0413
2 changes: 1 addition & 1 deletion offline_tools/insert.py
@@ -7,7 +7,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_langchain.embedding import TextEncoder # pylint: disable=C0413
from src.langchain.embedding import TextEncoder # pylint: disable=C0413
from offline_tools.generator_questions import get_output_csv # pylint: disable=C0413
from offline_tools.utils.stackoverflow_json2csv import stackoverflow_json2csv # pylint: disable=C0413
from offline_tools.utils.load_npy import langchain_load # pylint: disable=C0413
2 changes: 1 addition & 1 deletion offline_tools/utils/load_npy.py
@@ -5,7 +5,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_langchain.store import DocStore # pylint: disable=C0413
from src.langchain.store import DocStore # pylint: disable=C0413


class DBReader(object):
1 change: 1 addition & 0 deletions requirements.txt
@@ -9,6 +9,7 @@ gradio>=3.30.0
fastapi
uvicorn
towhee>=1.1.0
pydantic<2.0
pymilvus
elasticsearch>=8.0.0
prometheus-client
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -47,8 +47,7 @@ agent = ChatAgent.from_llm_and_tools(
# Define a chain
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
verbose=False
tools=tools
)

# Run a test
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -43,7 +43,6 @@ def chat(session_id, project, question, enable_agent=False):
agent=agent,
tools=tools,
memory=memory_db.memory,
verbose=False
)
try:
final_answer = agent_chain.run(input=question)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion src_towhee/memory/sql.py → src/towhee/memory/sql.py
@@ -8,7 +8,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))

from src_towhee.base import BaseMemory # pylint: disable=C0413
from src.towhee.base import BaseMemory # pylint: disable=C0413
from config import MEMORYDB_CONFIG # pylint: disable=C0413


4 changes: 2 additions & 2 deletions src_towhee/operations.py → src/towhee/operations.py
@@ -4,8 +4,8 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src_towhee.memory import MemoryStore # pylint: disable=C0413
from src.towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src.towhee.memory import MemoryStore # pylint: disable=C0413


logger = logging.getLogger(__name__)
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -14,9 +14,9 @@
RERANK_CONFIG, QUERY_MODE, INSERT_MODE,
DATAPARSER_CONFIG
)
from src_towhee.base import BasePipelines # pylint: disable=C0413
from src_towhee.pipelines.search import build_search_pipeline # pylint: disable=C0413
from src_towhee.pipelines.insert import build_insert_pipeline # pylint: disable=C0413
from src.towhee.base import BasePipelines # pylint: disable=C0413
from src.towhee.pipelines.search import build_search_pipeline # pylint: disable=C0413
from src.towhee.pipelines.insert import build_insert_pipeline # pylint: disable=C0413


class TowheePipelines(BasePipelines):
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,13 +1,8 @@
import os
import sys
import unittest

from langchain.agents import AgentExecutor, Tool
from langchain.llms.fake import FakeListLLM

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.agent import ChatAgent
from src.langchain.agent import ChatAgent


class TestChatAgent(unittest.TestCase):
@@ -25,8 +20,7 @@ class TestChatAgent(unittest.TestCase):
def test_run_chat_agent(self):
agent_executor = AgentExecutor.from_agent_and_tools(
agent=self.chat_agent,
tools=self.tools,
verbose=False
tools=self.tools
)
final_answer = agent_executor.run(input='whats 2 + 2', chat_history=[])
assert final_answer == self.responses[1]
Original file line number Diff line number Diff line change
@@ -1,13 +1,8 @@
import os
import sys
import unittest

from langchain.schema import AgentAction, AgentFinish

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.agent.prompt import FORMAT_INSTRUCTIONS
from src_langchain.agent.output_parser import OutputParser
from src.langchain.agent.prompt import FORMAT_INSTRUCTIONS
from src.langchain.agent.output_parser import OutputParser


class TestOutputParser(unittest.TestCase):
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,16 +1,13 @@
import io
import os
import sys
import tempfile
import unittest
from unittest.mock import patch

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.data_loader import DataParser
from src.langchain.data_loader import DataParser


class TestDataParser(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,10 +1,6 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.data_loader.data_splitter import MarkDownSplitter
from src.langchain.data_loader.data_splitter import MarkDownSplitter


class TestMarkDownSplitter(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

import numpy as np

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))
from src_langchain.embedding.langchain_huggingface import TextEncoder
from src.langchain.embedding.langchain_huggingface import TextEncoder


class TestLangchainHuggingface(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

import numpy as np

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))
from src_langchain.embedding.openai_embedding import TextEncoder
from src.langchain.embedding.openai_embedding import TextEncoder


class TestOpenAIEmbedding(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,11 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

from langchain.schema import HumanMessage

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))

MOCK_ANSWER = 'mock answer'

@@ -18,7 +15,7 @@ def __call__(self, prompt):

with patch('transformers.pipeline') as mock_pipelines:
mock_pipelines.return_value = MockGenerateText()
from src_langchain.llm.dolly_chat import ChatLLM
from src.langchain.llm.dolly_chat import ChatLLM

chat_llm = ChatLLM(model_name='mock', device='cpu', )
messages = [HumanMessage(content='hello')]
Original file line number Diff line number Diff line change
@@ -1,11 +1,7 @@
import os
import sys
import unittest
from unittest.mock import patch
from langchain.schema import HumanMessage, AIMessage

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))


class TestERNIE(unittest.TestCase):
def test_generate(self):
@@ -27,7 +23,7 @@ def test_generate(self):
)
mock_post.return_value = mock_res

from src_langchain.llm.ernie import ChatLLM
from src.langchain.llm.ernie import ChatLLM

EB_API_TYPE = 'mock_type'
EB_ACCESS_TOKEN = 'mock_token'
Original file line number Diff line number Diff line change
@@ -1,13 +1,9 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))


class TestOpenAIChat(unittest.TestCase):
def test_init(self):
from src_langchain.llm.openai_chat import ChatLLM
from src.langchain.llm.openai_chat import ChatLLM
chat_llm = ChatLLM(openai_api_key='mock-key')
self.assertEqual(chat_llm.__class__.__name__, 'ChatLLM')

Empty file.
1 change: 1 addition & 0 deletions tests/unit_tests/src/towhee/akcio_ut.txt
@@ -0,0 +1 @@
This is test content.
Empty file.
Original file line number Diff line number Diff line change
@@ -1,11 +1,8 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_towhee.base import BaseMemory # pylint: disable=C0413
from src_towhee.memory.sql import MemoryStore # pylint: disable=C0413
from src.towhee.base import BaseMemory
from src.towhee.memory.sql import MemoryStore


class TestSql(unittest.TestCase):
Empty file.
Original file line number Diff line number Diff line change
@@ -1,19 +1,15 @@
import unittest
from unittest.mock import patch

import json
import sys
import os

from milvus import MilvusServer

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from config import ( # pylint: disable=C0413
CHAT_CONFIG, TEXTENCODER_CONFIG,
VECTORDB_CONFIG, RERANK_CONFIG,
)
from src_towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src.towhee.pipelines import TowheePipelines # pylint: disable=C0413

milvus_server = MilvusServer()

