Pf 220: Make retrieval tool configurable (#228)
* Working configurable retriever

* Fix issue with tool config deserilaization

* Make retrieval tool configurable

* update documentation

---------

Co-authored-by: Dan Orlando <daorlando@webmd.net>
danorlando and Dan Orlando authored Oct 26, 2024
1 parent dd79094 commit 0b07b1c
Showing 12 changed files with 286 additions and 77 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -115,7 +115,7 @@ services:
# Use the command: `docker exec -it ollama ollama pull <model>` to pull a model
# ollama:
# image: ollama/ollama
# container_name: embed-ollama
# container_name: personaflow-ollama
# ports:
# - 11435:11434
# volumes:
66 changes: 52 additions & 14 deletions docs/assistants.md
@@ -1,12 +1,13 @@
# Local Assistants
# Local Assistants API

The assistants API allows for database-backed assistants to be created and run fully local.
The assistants API allows for database-backed assistants to be created and run fully locally.

There are three base architectures:
Current architectures:

- Chat: Just LLM
- Chat Retrieval/RAG
- Chat Retrieval (RAG)
- Agent (RAG + Tools)
- Corrective RAG Agent (in progress)

New agent architectures that include things like self-reflection, multi-agent collaboration, and complex control flows (e.g. customer support) are implemented relatively easily using [LangGraph](https://langchain-ai.github.io/langgraph/).
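
As a rough illustration only (this snippet is not taken from the repository), a new architecture boils down to a LangGraph `StateGraph` whose nodes and edges define the control flow; the state schema, node names, and placeholder logic below are invented for the example:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    question: str
    answer: str


def call_model(state: State) -> dict:
    # Placeholder node: a real architecture would call an LLM or a tool here.
    return {"answer": f"echo: {state['question']}"}


builder = StateGraph(State)
builder.add_node("model", call_model)
builder.set_entry_point("model")
builder.add_edge("model", END)
graph = builder.compile()

print(graph.invoke({"question": "hello"}))
```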

@@ -48,9 +49,24 @@ In this example, there are three assistant architectures: agent, chatbot, and ch

Some may find the keys in the `configurable` object peculiar. Although the scheme may seem complex at first, it actually simplifies things considerably: it binds the assistant configuration to predefined schemas for each architecture, and it scales as new LangGraph agent pipelines are added without requiring a continuously expanding assistants model.
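
For reference, the annotated keys namespace each field by architecture type. The sketch below shows roughly what that looks like; the exact key names come from the architecture schemas, so treat them as illustrative rather than definitive:

```python
# Illustrative only: annotated keys are namespaced by the architecture type.
config = {
    "configurable": {
        "type": "agent",
        "type==agent/agent_type": "GPT 4o Mini",
        "type==agent/system_message": "You are a helpful assistant.",
        "type==agent/retrieval_description": "Can be used to look up information.",
        "type==agent/tools": [],
        "type==agent/interrupt_before_action": True,
    }
}
```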

However, it is important to note that these type annotations are _not_ required. An assistant configuration that is loaded into the run manager will still work without them as long as the type is specified, i.e.:

```json
{
  "configurable": {
    "type": "agent",
    "tools": [],
    "interrupt_before_action": true,
    "agent_type": "GPT 4o Mini",
    "system_message": "You are a helpful assistant.",
    "retrieval_description": "Can be used to look up information."
  }
}
```

## Tools

A small set of tools are currently available, with more tools planned:
A large focus has been placed on high-quality, easily configurable knowledge base RAG, so only a small set of tools is currently available, with more planned:

- Vector Retrieval
- DuckDuckGo Search
@@ -64,16 +80,38 @@ A small set of tools are currently available, with more tools planned:
- Wikipedia
- PubMed

Tools are added to an assistant as an array of objects, like so:
Tools are added to an assistant as an array of objects. The `config` field is where the tool-specific configuration is stored. For example, you can set options such as the encoder configuration, reranking, and the vector database index/collection name to use. If you are using multiple vector databases, you can also specify which one the retrieval tool should use.


```json
{
  "id": "retrieval",
  "type": "retrieval",
  "name": "Retrieval",
  "description": "Look up information in uploaded files.",
  "config": {}
}
"tools": [
  {
    "type": "retrieval",
    "description": "Look up information in uploaded files.",
    "name": "Retrieval",
    "config": {
      "encoder": {
        "provider": "ollama",
        "dimensions": 384,
        "encoder_model": "all-minilm"
      },
      "vector_database": {
        "type": "qdrant",
        "config": {
          "host": "http://localhost:6333",
          "api_key": "123456789"
        }
      },
      "enable_rerank": false,
      "index_name": "test"
    },
    "multi_use": false
  }
],
```
Notes:
- The retrieval config is a means of overriding the defaults set via environment variables.
- Retrieval and the Action Server are the only tools that currently use the `config` field.
- `multi_use` is for tools that are multi-purpose, such as the Sema4.ai Action Server or Connery.
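
As a sketch of how a tool configuration like the one above might be sent when creating an assistant over HTTP (the base URL, the `/api/v1/assistants` path, and the payload fields outside `tools` are assumptions here, not something this commit confirms):

```python
import requests

BASE_URL = "http://localhost:9000"  # hypothetical host/port; adjust to your deployment

# Hypothetical payload shape; the "tools" entry mirrors the example above.
payload = {
    "name": "Docs assistant",
    "config": {
        "configurable": {
            "type": "agent",
            "agent_type": "GPT 4o Mini",
            "system_message": "You are a helpful assistant.",
            "retrieval_description": "Can be used to look up information.",
            "tools": [
                {
                    "type": "retrieval",
                    "name": "Retrieval",
                    "description": "Look up information in uploaded files.",
                    "config": {
                        "encoder": {
                            "provider": "ollama",
                            "dimensions": 384,
                            "encoder_model": "all-minilm",
                        },
                        "enable_rerank": False,
                        "index_name": "test",
                    },
                    "multi_use": False,
                }
            ],
        }
    },
}

resp = requests.post(f"{BASE_URL}/api/v1/assistants", json=payload)
resp.raise_for_status()
print(resp.json())
```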


### Creating New Tools
65 changes: 60 additions & 5 deletions docs/rag.md
@@ -1,6 +1,6 @@
# Document Processing and RAG

PersonaFlow's document processing system can be used as part of the Assistants flow or as a standalone system for processing and querying structured or unstructured data. The system is designed to be modular and can be easily extended to support other embedding models and vector databases.
AgentStack's document processing system can be used as part of the Assistants flow or as a standalone system for processing and querying structured or unstructured data. The system is designed to be modular and can be easily extended to support other embedding models and vector databases.

Processing of data is handled primarily by the embedding service, which orchestrates the partitioning, chunking, embedding, and upserting of documents. The splitting of documents can be done by title, where title elements in the document are identified and used as split points, or splits can be created according to the semantic similarity of the surrounding content. Splits can also be made using the standard recursive method with a chunk overlap.
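
For intuition, the recursive strategy with overlap behaves roughly like LangChain's `RecursiveCharacterTextSplitter`, shown below purely as an illustration; it is not necessarily the exact splitter the embedding service uses, and the chunk sizes are arbitrary:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Illustrative recursive splitting with a chunk overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

text = "..."  # contents of a parsed document would go here
chunks = splitter.split_text(text)
for chunk in chunks:
    print(len(chunk), chunk[:60])
```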

@@ -57,7 +57,7 @@ The `/api/v1/rag/ingest` endpoint takes an array of file IDs. An optional webhoo
- For use-cases that involve multiple collections across different vector stores, the `vector_database` and `index_name` fields can be used to specify the location where the embeddings should be stored. This applies to both the ingest and query endpoints.

```json
...
(...)
"vector_database": {
  "type": "qdrant",
  "config": {
@@ -66,12 +66,66 @@ The `/api/v1/rag/ingest` endpoint takes an array of file IDs. An optional webhoo
  }
},
"index_name": "custom"
...
(...)
```

## Querying
## Assistant Retrieval
To use retrieval augmented generation (RAG) via an assistant, you can create an assistant using one of the architectures that support it. The `chat_retrieval` architecture has knowledge base retrieval built into the assistant. The standard `agent` architecture performs RAG by including the Retrieval tool in the assistant configuration, like so:

Document retrieval can be done independent of an assistant by calling the `/api/v1/rag/query` endpoint. This is a POST request with a JSON payload containing the query parameters. Here is an example of a query payload:
```json
(...)
"tools": [
  {
    "type": "retrieval",
    "description": "Look up information in uploaded files.",
    "name": "Retrieval",
    "config": {
      "encoder": {
        "provider": "ollama",
        "dimensions": 384,
        "encoder_model": "all-minilm"
      },
      "enable_rerank": true,
      "index_name": "test"
    },
    "multi_use": false
  }
],
(...)
```
Currently, the supported encoders include Cohere, OpenAI, Ollama, Azure OpenAI, and Mistral, but any encoder that is supported by LangChain can be added by creating an adapter class:

```python
from typing import List
from pydantic import Field
from semantic_router.encoders import BaseEncoder
from langchain_community.embeddings import OllamaEmbeddings
from stack.app.core.configuration import settings


class OllamaEncoder(BaseEncoder):
    name: str = Field(default="all-minilm")
    score_threshold: float = Field(default=0.5)
    type: str = Field(default="ollama")
    dimensions: int = Field(default=384)
    embeddings: OllamaEmbeddings = Field(default=None)

    class Config:
        arbitrary_types_allowed = True

    def __init__(self, **data):
        super().__init__(**data)
        if "name" in data:
            self.embeddings = OllamaEmbeddings(
                model=self.name, base_url=settings.OLLAMA_BASE_URL
            )

    def __call__(self, docs: List[str]) -> List[List[float]]:
        return self.embeddings.embed_documents(docs)
```
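
Using the adapter then looks something like this (the model name is a placeholder, and a running Ollama server is required for the call to succeed):

```python
# Instantiate the adapter and embed a couple of documents.
encoder = OllamaEncoder(name="all-minilm")
vectors = encoder(["PersonaFlow supports local embeddings.", "Ollama runs models locally."])
print(len(vectors), len(vectors[0]))  # number of documents, embedding dimensions
```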

## Stand-alone Querying

Document retrieval can also be done independently of an assistant by calling the `/api/v1/rag/query` endpoint. This is a POST request with a JSON payload containing the query parameters; a full request sketch follows the field notes below. Here is an example of a query payload:

```json
{
@@ -99,3 +153,4 @@ Document retrieval can be done independent of an assistant by calling the `/api/
- `vector_database`: This block is optional but is useful when collections are held across different vector databases. If omitted, these details will be obtained from environment variables.
- `thread_id`: This is an optional parameter and can be used to tie the query to an existing conversation ID for logging purposes.
- `enable_rerank`: Whether or not to rerank the query results. Currently requires a Cohere API key if true (local reranking will be included in an upcoming release).
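
A minimal sketch of calling the endpoint from Python (the base URL and any field names beyond those documented above are assumptions):

```python
import requests

BASE_URL = "http://localhost:9000"  # hypothetical host/port; adjust to your deployment

# Field names mirror the options described above; "query" is assumed for the query text.
payload = {
    "query": "What does the onboarding document say about security training?",
    "index_name": "custom",
    "enable_rerank": False,
    "thread_id": "1e8dcb4f-0000-4000-8000-123456789abc",
}

resp = requests.post(f"{BASE_URL}/api/v1/rag/query", json=payload)
resp.raise_for_status()
print(resp.json())
```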

53 changes: 38 additions & 15 deletions stack/app/agents/configurable_agent.py
@@ -150,6 +150,35 @@ class ConfigurableAgent(RunnableBinding):
    thread_id: Optional[str] = None
    user_id: Optional[str] = None

    def _create_tool(
        self,
        tool: Union[dict, Tool],
        assistant_id: Optional[str],
        thread_id: Optional[str],
        retrieval_description: str,
    ) -> Union[Tool, list[Tool]]:
        """Helper method to create tool instances."""
        if isinstance(tool, dict):
            tool_type = AvailableTools(tool["type"])
        else:
            tool_type = tool.type

        if tool_type == AvailableTools.RETRIEVAL:
            if assistant_id is None or thread_id is None:
                raise ValueError(
                    "Both assistant_id and thread_id must be provided if Retrieval tool is used"
                )
            config = tool.config if isinstance(tool, Tool) else tool.get("config", {})
            return get_retrieval_tool(
                assistant_id, thread_id, retrieval_description, config
            )
        else:
            tool_obj = (
                tool if isinstance(tool, Tool) else self._convert_dict_to_tool(tool)
            )
            tool_config = tool_obj.config or {}
            return TOOLS[tool_obj.type](**tool_config)

    def __init__(
        self,
        *,
@@ -166,23 +195,17 @@ def __init__(
    ) -> None:
        settings = get_settings()
        others.pop("bound", None)

        _tools = []
        for _tool in tools:
            if _tool["type"] == AvailableTools.RETRIEVAL:
                if assistant_id is None or thread_id is None:
                    raise ValueError(
                        "Both assistant_id and thread_id must be provided if Retrieval tool is used"
                    )
                _tools.append(
                    get_retrieval_tool(assistant_id, thread_id, retrieval_description)
                )
        for tool in tools:
            created_tool = self._create_tool(
                tool, assistant_id, thread_id, retrieval_description
            )
            if isinstance(created_tool, list):
                _tools.extend(created_tool)
            else:
                tool_config = _tool.get("config", {})
                _returned_tools = TOOLS[_tool["type"]](**tool_config)
                if isinstance(_returned_tools, list):
                    _tools.extend(_returned_tools)
                else:
                    _tools.append(_returned_tools)
                _tools.append(created_tool)

        _agent = get_agent_executor(
            _tools, agent, system_message, interrupt_before_action
        )