This project walks you through building a system that searches for and retrieves articles using Retrieval-Augmented Generation (RAG). It relies on well-known tools and libraries to manage the data and run the similarity search.
- LangChain - simplifies building applications that use LLMs.
- OpenAI - APIs for advanced AI models and embeddings.
- Pandas - library providing high-performance, easy-to-use data structures.
- NumPy - fundamental package for scientific computing with Python.
- Matplotlib - plotting library for the Python programming language and its numerical mathematics extension NumPy.
- ChromaDB - modern vector database designed for high-performance similarity searches, making it ideal for applications that involve matching and retrieving large volumes of data quickly.
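To make the retrieval step of RAG more concrete, here is a minimal sketch of similarity search over embeddings, written in plain Python. It is not ChromaDB's API and not the code used in the notebook: the titles, the three-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are all made up for illustration. A real setup would get embeddings from a model and delegate the search to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k titles whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in indexed.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "Intro to neural networks": [0.9, 0.1, 0.0],
    "Cooking pasta at home": [0.0, 0.2, 0.9],
    "Training deep models": [0.8, 0.3, 0.1],
}

query = [1.0, 0.2, 0.0]  # stands in for an embedded user question
results = top_k(query, toy_index, k=2)
print(results)
```

In a RAG pipeline, the retrieved articles would then be passed to the language model as context for generating an answer.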
First, make sure Git is installed on your system so you can clone the project repository. You will also need Python; this project uses Python 3.10.12.
Clone the repository and navigate to the project directory:
git clone https://github.com/mweglowski/article_retrieval_system.git
cd article_retrieval_system
Install the required Python packages:
pip install chromadb pandas numpy langchain openai matplotlib python-dotenv tiktoken lark langchain-openai
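After installing, it can help to confirm that the environment is ready before opening the notebook. The sketch below uses only the standard library to check the Python version and to see whether each package from the command above can be found. The helper names `missing_packages` and `python_ok` are hypothetical, and treating 3.10 as a minimum version is an assumption; the project itself only states that it uses Python 3.10.12.

```python
import sys
from importlib.util import find_spec

# Assumption: any Python >= 3.10 is acceptable (the project uses 3.10.12).
REQUIRED_PYTHON = (3, 10)

# pip package name -> top-level module name used in import statements
MODULES = {
    "chromadb": "chromadb",
    "pandas": "pandas",
    "numpy": "numpy",
    "langchain": "langchain",
    "openai": "openai",
    "matplotlib": "matplotlib",
    "python-dotenv": "dotenv",
    "tiktoken": "tiktoken",
    "lark": "lark",
    "langchain-openai": "langchain_openai",
}


def missing_packages(modules=MODULES):
    """Return the pip names of packages whose module cannot be located."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


def python_ok(required=REQUIRED_PYTHON):
    """True if the running interpreter is at least the required version."""
    return sys.version_info[:2] >= required


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages() or "none")
```

If any packages are reported as missing, re-run the `pip install` command above in the same environment that Jupyter uses.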
Navigate to the main.ipynb file in the Jupyter Notebook interface to begin.
To start using the Article Retrieval System, follow these simple steps:
- Set Up Your Environment - make sure to install all the required libraries as mentioned in the Quickstart section. This prepares your system for running the notebook.
- Follow the Notebook - open the main.ipynb file in Jupyter Notebook. This file contains clear, step-by-step instructions on how to use each part of the system.
- Run the Code - execute the code cells in the order they appear. By running these cells, you can see how the system operates and ensure that everything is working correctly.
Everything you need to get started is thoroughly explained within the main.ipynb notebook, making it easy to get up and running without prior knowledge of the system’s inner workings.