Pf 220: Make retrieval tool configurable (#228)
* Working configurable retriever

* Fix issue with tool config deserilaization

* Make retrieval tool configurable

* update documentation

---------

Co-authored-by: Dan Orlando <daorlando@webmd.net>
danorlando and Dan Orlando authored Oct 26, 2024
1 parent dd79094 commit 0b07b1c
Showing 12 changed files with 286 additions and 77 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -115,7 +115,7 @@ services:
# Use the command: `docker exec -it ollama ollama pull <model>` to pull a model
# ollama:
# image: ollama/ollama
# container_name: embed-ollama
# container_name: personaflow-ollama
# ports:
# - 11435:11434
# volumes:
66 changes: 52 additions & 14 deletions docs/assistants.md
@@ -1,12 +1,13 @@
# Local Assistants
# Local Assistants API

The assistants API allows for database-backed assistants to be created and run fully local.
The assistants API allows for database-backed assistants to be created and run fully locally.

There are three base architectures:
Current architectures:

- Chat: Just LLM
- Chat Retrieval/RAG
- Chat Retrieval (RAG)
- Agent (RAG + Tools)
- Corrective RAG Agent (in progress)

New agent architectures that include things like self-reflection, multi-agent collaboration, and complex control flows (e.g. customer support) are implemented relatively easily using [LangGraph](https://langchain-ai.github.io/langgraph/).
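
As a rough illustration only (this snippet is not taken from the repository), a new architecture boils down to a LangGraph `StateGraph` whose nodes and edges define the control flow; the state schema, node names, and placeholder logic below are invented for the example:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    question: str
    answer: str


def call_model(state: State) -> dict:
    # Placeholder node: a real architecture would call an LLM or a tool here.
    return {"answer": f"echo: {state['question']}"}


builder = StateGraph(State)
builder.add_node("model", call_model)
builder.set_entry_point("model")
builder.add_edge("model", END)
graph = builder.compile()

print(graph.invoke({"question": "hello"}))
```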

@@ -48,9 +49,24 @@ In this example, there are three assistant architectures: agent, chatbot, and ch

Some may find the keys in the `configurable` object peculiar. Although the scheme may seem complex at first, it actually simplifies things considerably: it binds the assistant configuration to predefined schemas for each architecture, and it scales as new LangGraph agent pipelines are added without requiring a continuously expanding assistants model.
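
For reference, the annotated keys namespace each field by architecture type. The sketch below shows roughly what that looks like; the exact key names come from the architecture schemas, so treat them as illustrative rather than definitive:

```python
# Illustrative only: annotated keys are namespaced by the architecture type.
config = {
    "configurable": {
        "type": "agent",
        "type==agent/agent_type": "GPT 4o Mini",
        "type==agent/system_message": "You are a helpful assistant.",
        "type==agent/retrieval_description": "Can be used to look up information.",
        "type==agent/tools": [],
        "type==agent/interrupt_before_action": True,
    }
}
```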

However, it is important to note that these type annotations are _not_ required. An assistant configuration that is loaded into the run manager will still work without them as long as the type is specified, i.e.:

```json
{
  "configurable": {
    "type": "agent",
    "tools": [],
    "interrupt_before_action": true,
    "agent_type": "GPT 4o Mini",
    "system_message": "You are a helpful assistant.",
    "retrieval_description": "Can be used to look up information."
  }
}
```

## Tools

A small set of tools are currently available, with more tools planned:
A large focus has been placed on high-quality, easily configurable knowledge base RAG, so only a small set of tools is currently available, with more planned:

- Vector Retrieval
- DuckDuckGo Search
@@ -64,16 +80,38 @@ A small set of tools are currently available, with more tools planned:
- Wikipedia
- PubMed

Tools are added to an assistant as an array of objects, like so:
Tools are added to an assistant as an array of objects. The `config` field is where the tool-specific configuration is stored. For example, you can set options such as the encoder configuration, reranking, and the vector database index/collection name to use. If you are using multiple vector databases, you can also specify which one the retrieval tool should use.


```json
{
  "id": "retrieval",
  "type": "retrieval",
  "name": "Retrieval",
  "description": "Look up information in uploaded files.",
  "config": {}
}
"tools": [
  {
    "type": "retrieval",
    "description": "Look up information in uploaded files.",
    "name": "Retrieval",
    "config": {
      "encoder": {
        "provider": "ollama",
        "dimensions": 384,
        "encoder_model": "all-minilm"
      },
      "vector_database": {
        "type": "qdrant",
        "config": {
          "host": "http://localhost:6333",
          "api_key": "123456789"
        }
      },
      "enable_rerank": false,
      "index_name": "test"
    },
    "multi_use": false
  }
],
```
Notes:
- The retrieval config is a means of overriding the defaults set via environment variables.
- Retrieval and the Action Server are the only tools that currently use the `config` field.
- `multi_use` is for tools that are multi-purpose, such as the Sema4.ai Action Server or Connery.
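
As a sketch of how a tool configuration like the one above might be sent when creating an assistant over HTTP (the base URL, the `/api/v1/assistants` path, and the payload fields outside `tools` are assumptions here, not something this commit confirms):

```python
import requests

BASE_URL = "http://localhost:9000"  # hypothetical host/port; adjust to your deployment

# Hypothetical payload shape; the "tools" entry mirrors the example above.
payload = {
    "name": "Docs assistant",
    "config": {
        "configurable": {
            "type": "agent",
            "agent_type": "GPT 4o Mini",
            "system_message": "You are a helpful assistant.",
            "retrieval_description": "Can be used to look up information.",
            "tools": [
                {
                    "type": "retrieval",
                    "name": "Retrieval",
                    "description": "Look up information in uploaded files.",
                    "config": {
                        "encoder": {
                            "provider": "ollama",
                            "dimensions": 384,
                            "encoder_model": "all-minilm",
                        },
                        "enable_rerank": False,
                        "index_name": "test",
                    },
                    "multi_use": False,
                }
            ],
        }
    },
}

resp = requests.post(f"{BASE_URL}/api/v1/assistants", json=payload)
resp.raise_for_status()
print(resp.json())
```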


### Creating New Tools
65 changes: 60 additions & 5 deletions docs/rag.md
@@ -1,6 +1,6 @@
# Document Processing and RAG

PersonaFlow's document processing system can be used as part of the Assistants flow or as a standalone system for processing and querying structured or unstructured data. The system is designed to be modular and can be easily extended to support other embedding models and vector databases.
AgentStack's document processing system can be used as part of the Assistants flow or as a standalone system for processing and querying structured or unstructured data. The system is designed to be modular and can be easily extended to support other embedding models and vector databases.

Processing of data is handled primarily by the embedding service, which orchestrates the partitioning, chunking, embedding, and upserting of documents. The splitting of documents can be done by title, where title elements in the document are identified and used as split points, or splits can be created according to the semantic similarity of the surrounding content. Splits can also be made using the standard recursive method with a chunk overlap.
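
For intuition, the recursive strategy with overlap behaves roughly like LangChain's `RecursiveCharacterTextSplitter`, shown below purely as an illustration; it is not necessarily the exact splitter the embedding service uses, and the chunk sizes are arbitrary:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Illustrative recursive splitting with a chunk overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

text = "..."  # contents of a parsed document would go here
chunks = splitter.split_text(text)
for chunk in chunks:
    print(len(chunk), chunk[:60])
```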

@@ -57,7 +57,7 @@ The `/api/v1/rag/ingest` endpoint takes an array of file IDs. An optional webhoo
- For use-cases that involve multiple collections across different vector stores, the `vector_database` and `index_name` fields can be used to specify the location where the embeddings should be stored. This applies to both the ingest and query endpoints.

```json
...
(...)
"vector_database": {
  "type": "qdrant",
  "config": {
@@ -66,12 +66,66 @@ The `/api/v1/rag/ingest` endpoint takes an array of file IDs. An optional webhoo
  }
},
"index_name": "custom"
...
(...)
```

## Querying
## Assistant Retrieval
To use retrieval augmented generation (RAG) via an assistant, you can create an assistant using one of the architectures that support it. The `chat_retrieval` architecture has knowledge base retrieval built into the assistant. The standard `agent` architecture performs RAG by including the Retrieval tool in the assistant configuration, like so:

Document retrieval can be done independent of an assistant by calling the `/api/v1/rag/query` endpoint. This is a POST request with a JSON payload containing the query parameters. Here is an example of a query payload:
```json
(...)
"tools": [
  {
    "type": "retrieval",
    "description": "Look up information in uploaded files.",
    "name": "Retrieval",
    "config": {
      "encoder": {
        "provider": "ollama",
        "dimensions": 384,
        "encoder_model": "all-minilm"
      },
      "enable_rerank": true,
      "index_name": "test"
    },
    "multi_use": false
  }
],
(...)
```
Currently, the supported encoders include Cohere, OpenAI, Ollama, Azure OpenAI, and Mistral, but any encoder that is supported by LangChain can be added by creating an adapter class:

```python
from typing import List
from pydantic import Field
from semantic_router.encoders import BaseEncoder
from langchain_community.embeddings import OllamaEmbeddings
from stack.app.core.configuration import settings


class OllamaEncoder(BaseEncoder):
    name: str = Field(default="all-minilm")
    score_threshold: float = Field(default=0.5)
    type: str = Field(default="ollama")
    dimensions: int = Field(default=384)
    embeddings: OllamaEmbeddings = Field(default=None)

    class Config:
        arbitrary_types_allowed = True

    def __init__(self, **data):
        super().__init__(**data)
        if "name" in data:
            self.embeddings = OllamaEmbeddings(
                model=self.name, base_url=settings.OLLAMA_BASE_URL
            )

    def __call__(self, docs: List[str]) -> List[List[float]]:
        return self.embeddings.embed_documents(docs)
```
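
Using the adapter then looks something like this (the model name is a placeholder, and a running Ollama server is required for the call to succeed):

```python
# Instantiate the adapter and embed a couple of documents.
encoder = OllamaEncoder(name="all-minilm")
vectors = encoder(["PersonaFlow supports local embeddings.", "Ollama runs models locally."])
print(len(vectors), len(vectors[0]))  # number of documents, embedding dimensions
```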

## Stand-alone Querying

Document retrieval can also be done independently of an assistant by calling the `/api/v1/rag/query` endpoint. This is a POST request with a JSON payload containing the query parameters; a full request sketch follows the field notes below. Here is an example of a query payload:

```json
{
@@ -99,3 +153,4 @@ Document retrieval can be done independent of an assistant by calling the `/api/
- `vector_database`: This block is optional but is useful when collections are held across different vector databases. If omitted, these details will be obtained from environment variables.
- `thread_id`: This is an optional parameter and can be used to tie the query to an existing conversation ID for logging purposes.
- `enable_rerank`: Whether or not to rerank the query results. Currently requires a Cohere API key if true (local reranking will be included in an upcoming release).
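
A minimal sketch of calling the endpoint from Python (the base URL and any field names beyond those documented above are assumptions):

```python
import requests

BASE_URL = "http://localhost:9000"  # hypothetical host/port; adjust to your deployment

# Field names mirror the options described above; "query" is assumed for the query text.
payload = {
    "query": "What does the onboarding document say about security training?",
    "index_name": "custom",
    "enable_rerank": False,
    "thread_id": "1e8dcb4f-0000-4000-8000-123456789abc",
}

resp = requests.post(f"{BASE_URL}/api/v1/rag/query", json=payload)
resp.raise_for_status()
print(resp.json())
```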

53 changes: 38 additions & 15 deletions stack/app/agents/configurable_agent.py
@@ -150,6 +150,35 @@ class ConfigurableAgent(RunnableBinding):
    thread_id: Optional[str] = None
    user_id: Optional[str] = None

    def _create_tool(
        self,
        tool: Union[dict, Tool],
        assistant_id: Optional[str],
        thread_id: Optional[str],
        retrieval_description: str,
    ) -> Union[Tool, list[Tool]]:
        """Helper method to create tool instances."""
        if isinstance(tool, dict):
            tool_type = AvailableTools(tool["type"])
        else:
            tool_type = tool.type

        if tool_type == AvailableTools.RETRIEVAL:
            if assistant_id is None or thread_id is None:
                raise ValueError(
                    "Both assistant_id and thread_id must be provided if Retrieval tool is used"
                )
            config = tool.config if isinstance(tool, Tool) else tool.get("config", {})
            return get_retrieval_tool(
                assistant_id, thread_id, retrieval_description, config
            )
        else:
            tool_obj = (
                tool if isinstance(tool, Tool) else self._convert_dict_to_tool(tool)
            )
            tool_config = tool_obj.config or {}
            return TOOLS[tool_obj.type](**tool_config)

    def __init__(
        self,
        *,
@@ -166,23 +195,17 @@ def __init__(
    ) -> None:
        settings = get_settings()
        others.pop("bound", None)

        _tools = []
        for _tool in tools:
            if _tool["type"] == AvailableTools.RETRIEVAL:
                if assistant_id is None or thread_id is None:
                    raise ValueError(
                        "Both assistant_id and thread_id must be provided if Retrieval tool is used"
                    )
                _tools.append(
                    get_retrieval_tool(assistant_id, thread_id, retrieval_description)
                )
        for tool in tools:
            created_tool = self._create_tool(
                tool, assistant_id, thread_id, retrieval_description
            )
            if isinstance(created_tool, list):
                _tools.extend(created_tool)
            else:
                tool_config = _tool.get("config", {})
                _returned_tools = TOOLS[_tool["type"]](**tool_config)
                if isinstance(_returned_tools, list):
                    _tools.extend(_returned_tools)
                else:
                    _tools.append(_returned_tools)
                _tools.append(created_tool)

        _agent = get_agent_executor(
            _tools, agent, system_message, interrupt_before_action
        )