
RAG application to track and analyze Safaricom M-PESA transactions using LLMs.


derak-isaack/Mpesa-statement-chatbot


M-PESA RAG application

OpenAI Python Jupyter

A RAG application that uses OpenAI embeddings to let users query their Safaricom M-PESA PDF statements and analyze their transaction patterns.

The application is built on llama-index for Retrieval-Augmented Generation (RAG). The statement text is embedded with OpenAI embeddings and held in a vector index, which is used for similarity search.
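The pipeline described above can be sketched with llama-index roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual code: it assumes llama-index 0.10 or later (the `llama_index.core` namespace), an `OPENAI_API_KEY` environment variable, and a PDF that can be read without a password. The helper name `build_query_engine` and the value `similarity_top_k=3` are illustrative choices.

```python
from pathlib import Path


def build_query_engine(pdf_path):
    """Index one M-PESA statement PDF and return a llama-index query engine."""
    pdf = Path(pdf_path)
    if not pdf.is_file():
        raise FileNotFoundError(f"Statement not found: {pdf}")

    # Imported lazily so the path check above works without llama-index installed.
    # Assumes llama-index >= 0.10, where these names live in llama_index.core.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Load the PDF into Document objects (plain text extraction, no table cleanup).
    documents = SimpleDirectoryReader(input_files=[str(pdf)]).load_data()

    # Embed the chunks and keep the vectors in an in-memory index.
    # By default llama-index uses OpenAI embeddings and reads OPENAI_API_KEY.
    index = VectorStoreIndex.from_documents(documents)

    # The query engine retrieves the most similar chunks and passes them to the LLM.
    return index.as_query_engine(similarity_top_k=3)
```

With a statement saved under a hypothetical name such as `mpesa_statement.pdf`, calling `build_query_engine("mpesa_statement.pdf").query("How much did I spend on airtime?")` would return the model's answer together with the retrieved source chunks.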

The model sometimes returns wrong answers during retrieval and similarity search, so refining the queries or parsing the PDF with LlamaParse is necessary to produce quality results.

Improving accuracy

To improve query results in any RAG application, it is essential to clean the data first. One tool for this is LlamaParse. Its main goal is to parse and clean your data, ensuring that it is of good quality before it is passed to any downstream LLM use case such as advanced RAG. Its API includes 1000 free pages; see the LlamaParse docs for details and for code snippets.

Below is a snippet showing the benefits of using the LlamaParser: LlamaParser

Running the command pip install -r requirements.txt installs all the required dependencies, including LlamaParse. One additional dependency is needed to use the parser: the nest-asyncio library, which can be installed with the command pip install nest-asyncio.

After parsing with LlamaParse, query results improve significantly, especially for documents that contain tables and figures.
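A minimal sketch of the parsing step, assuming the `llama-parse` and `nest-asyncio` packages are installed and that you have a LlamaCloud API key. The helper name `parse_statement` is hypothetical, and the choice of `result_type="markdown"` is an assumption made here because markdown output keeps tables as tables.

```python
def parse_statement(pdf_path, api_key):
    """Parse a statement PDF into markdown documents with LlamaParse."""
    if not api_key:
        raise ValueError("A LlamaCloud API key is required")

    # Lazy imports: both packages are third-party (llama-parse, nest-asyncio).
    import nest_asyncio
    from llama_parse import LlamaParse

    # LlamaParse runs async code internally; nest_asyncio lets it run inside
    # an already-running event loop such as a Jupyter notebook.
    nest_asyncio.apply()

    # Markdown output preserves table rows and columns as markdown tables.
    parser = LlamaParse(api_key=api_key, result_type="markdown")

    # Returns a list of llama-index Document objects ready for indexing.
    return parser.load_data(pdf_path)
```

The returned documents can be passed to `VectorStoreIndex.from_documents` in place of the output of a plain PDF reader.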

Approach II

Alternatively, instead of embedding the whole PDF with OpenAI embeddings, the tabula-py library can be used to extract tables from PDFs and convert them to CSVs. The library is a simple Python wrapper for tabula-java, and its documentation covers all the approaches comprehensively. Because it wraps a Java library, it requires Java to be installed.

Below is a snippet showing how the library achieves this: tabula-py
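The conversion step might look like the sketch below, assuming the `tabula-py` package (imported as `tabula`) and a Java runtime on the PATH. The helper names `csv_path_for` and `pdf_tables_to_csv` are hypothetical, and writing the CSV next to the PDF is a choice made for this example only.

```python
import shutil
from pathlib import Path


def csv_path_for(pdf_path):
    """Return the CSV path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".csv")


def pdf_tables_to_csv(pdf_path):
    """Extract every table in the PDF into a single CSV file."""
    # tabula-py shells out to tabula-java, so fail early if Java is missing.
    if shutil.which("java") is None:
        raise RuntimeError("Java was not found on PATH; tabula-py needs it")

    import tabula  # provided by the tabula-py package

    out_path = csv_path_for(pdf_path)
    # pages="all" scans the whole statement instead of only the first page.
    tabula.convert_into(str(pdf_path), str(out_path), output_format="csv", pages="all")
    return out_path
```

If the tables are needed in memory rather than on disk, `tabula.read_pdf(path, pages="all")` returns a list of pandas DataFrames instead.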

After conversion to CSV, PandasAI can be used to query the data with natural-language prompts. Its documentation is also comprehensive, with a ton of code snippets and examples for querying:

  • Excel files
  • Google sheets
  • CSVs

Below are the code snippets: CSV Excel Google Sheets
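As a rough sketch of querying the converted CSV, the example below assumes the PandasAI 1.x/2.x API (`SmartDataframe` and `pandasai.llm.OpenAI`); newer releases have changed the interface, so check the documentation for the installed version. The helper name `ask_statement` is hypothetical.

```python
def ask_statement(csv_path, question, api_token):
    """Ask a natural-language question about a statement CSV via PandasAI."""
    if not question.strip():
        raise ValueError("question must not be empty")

    # Lazy imports: pandas and pandasai are third-party packages.
    # Assumes the PandasAI 1.x/2.x API; later versions differ.
    import pandas as pd
    from pandasai import SmartDataframe
    from pandasai.llm import OpenAI

    df = pd.read_csv(csv_path)

    # The LLM turns the prompt into pandas code that runs against the DataFrame.
    llm = OpenAI(api_token=api_token)
    smart_df = SmartDataframe(df, config={"llm": llm})
    return smart_df.chat(question)
```

A call such as `ask_statement("statement.csv", "Which month had the highest total withdrawals?", token)` returns whatever PandasAI produces for the prompt, typically a number, a string, or a DataFrame.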

The application also uses several API keys, which serve as credentials for interacting with the generative AI models.
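One common way to supply such credentials is through environment variables rather than hard-coding them in a notebook. The sketch below assumes the two variable names `OPENAI_API_KEY` and `LLAMA_CLOUD_API_KEY`, which are the defaults read by the OpenAI and LlamaParse clients. The repository may expect different names, and `load_api_keys` is a hypothetical helper.

```python
import os

# Assumed variable names; adjust to match the keys the application needs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def load_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the required API keys, failing loudly if any are missing."""
    env = os.environ if env is None else env

    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))

    return {name: env[name] for name in required}
```

Calling `load_api_keys()` at the top of a notebook surfaces a missing key immediately instead of partway through indexing.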
