Initial public commit

IntelLabs · Aug 2, 2024 · 225ca88
1 parent 3fc9203
Showing 97 changed files with 3,468 additions and 3 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+/.python-version
+/outputs/
+__pycache__/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,23 @@
+repos:
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 24.4.2
+    hooks:
+      - id: black
+        args: ["-l", "90"]
+  - repo: https://github.com/pycqa/isort
+    rev: 5.13.2
+    hooks:
+      - id: isort
+  - repo: https://github.com/pycqa/flake8
+    rev: 7.1.0
+    hooks:
+      - id: flake8
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: check-docstring-first
+      - id: check-added-large-files
+      - id: check-yaml
+        args: ["--unsafe", "mkdocs.yml"]
+      - id: check-merge-conflict
diff --git a/LICENSE b/LICENSE
@@ -186,7 +186,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright [yyyy] [name of copyright owner]
+   Copyright 2024 Intel Corporation
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

diff --git a/README.md b/README.md
@@ -1,2 +1,81 @@
-# RAGFoundry
-Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.
+<div align="center">
+    <img src="assets/rag_foundry.png" width="500"/>
+</div>
+
+----------
+
+A framework for enhancing LLMs for RAG use-cases by enabling users to create data-augmented datasets for tuning and
+evaluation of LLMs, using RAG workflows.
+
+
+**RAG Foundry** is a library designed to improve LLMs' ability to use external information by fine-tuning models on
+specially created RAG-augmented datasets. The library helps create the data for training, given a RAG technique, helps
+easily train models using parameter-efficient fine-tuning (PEFT), and finally can help users measure the improved
+performance using various RAG-specific metrics. The library is modular; workflows are customizable using configuration
+files.
+
+## Installation
+Clone locally and run:
+
+```sh
+pip install -r requirements.txt
+```
+
+## Overview
+
+The RAG Foundry framework facilitates fast prototyping and experimentation with various RAG settings and configurations,
+including data selection and filtering, processing, retrieval, ranking, query manipulation, prompt generation, training,
+inference, output processing and evaluation. The library comprises four modules: dataset creation, training,
+inference and evaluation.
+
+* **Dataset Creation**: The processing module creates datasets, persisting RAG interactions, to be used for RAG training
+and inference. RAG interactions include dataset loading, column normalization, data aggregation (few-shot creation),
+information retrieval using external tools and frameworks, API integration, template-based prompt creation and any other
+form of pre-processing. The data is saved in a consistent, model-independent, input-output format, along with all other
+fields and metadata. See [Processing.md](docs/processing.md).
+
+* **Training**: using PEFT for efficient training and TRL (e.g. supervised fine-tuning), users can train any model on the augmented
+datasets. Training is done on the completions. Models can be pushed to HF Hub. See [Training.md](docs/training.md).
+
+* **Inference**: generating predictions using the augmented datasets with trained or untrained LLMs. See [Inference.md](docs/inference.md).
+
+* **Evaluation**: running evaluation on the generated output from the inference module. Users can provide a list of
+metrics to run; custom metrics can be implemented easily. Current metrics include EM, F1, ROUGE, BERTScore, Deepeval,
+RAGAS, HF `evaluate` and classification. Metrics can be *local*—run on each example, or *global*—run on the entire
+dataset, e.g. recall. Metrics can utilize any feature in the dataset, like retrieval results, reasoning,
+citations and attributions, not just the input and output texts. See [Evaluation.md](docs/evaluation.md).
+
+
+## Running
+The 4 modules are represented as scripts: `processing.py`, `training.py`, `inference.py` and `evaluation.py` at the top
+level. Every call has the form `python SCRIPT options...`.
+
+The library utilizes the [Hydra](https://hydra.cc/docs/intro/) configuration tool; it enables the use of hierarchical
+configurations, easy overriding of values from the CLI and the ability to run multiple jobs remotely (e.g. integrations with
+SLURM and Ray). It represents a *configuration-as-code* approach, as it can instantiate python classes according to
+configuration (the `_target_` keyword indicates the python class to use in a given context).
+
+There are default configurations for each module in the [configs](./configs/) folder. A configuration file can be
+overridden like so:
+
+```sh
+python processing.py -cp configs/paper -cn processing-asqa-retrieval
+```
+
+Individual keywords can be overridden as well:
+```sh
+python processing.py -cp configs/paper -cn processing-asqa-retrieval   \
+       output_path=/store/data/here                                    \
+       hfhub_tag=my_org/my_data
+```
+
+For a complete set of configurations, **reproducing the experimentation in the paper with the ASQA dataset**, see the
+configurations in the [Paper](./configs/paper) folder.
+
+## License
+
+The code is licensed under the [Apache 2.0 License](LICENSE).
+
+## Disclaimer
+
+This is not an official Intel product.
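Review note on the README's *configuration-as-code* paragraph: the `_target_` mechanism can be illustrated with a minimal standard-library sketch. The helper `instantiate_from_config` is hypothetical and only mimics the basic idea; Hydra's own instantiation handles considerably more (nested configs, interpolation, partial objects). A standard-library class stands in for a RAG Foundry class so the snippet runs on its own.

```python
import importlib
from typing import Any


def instantiate_from_config(config: dict[str, Any]) -> Any:
    """Build the object named by `_target_`, passing the remaining keys as kwargs.

    Simplified stand-in for illustration; this is not Hydra's implementation.
    """
    params = dict(config)  # copy so the caller's config is not mutated
    target = params.pop("_target_")
    module_path, _, attr_name = target.rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr_name)
    return cls(**params)


# A standard-library class is used so the sketch runs without RAG Foundry installed.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
fraction = instantiate_from_config(config)
print(fraction)  # 3/4
```

Swapping the dotted path in `_target_` is what lets a YAML file select a different class without code changes.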
diff --git a/assets/rag_foundry.png b/assets/rag_foundry.png
diff --git a/configs/evaluation.yaml b/configs/evaluation.yaml
@@ -0,0 +1,42 @@
+answer_processor:
+  _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
+  capture_pattern:          # "<ANSWER>: (.*)"
+  stopping_pattern:         # "[,.;]"
+
+metrics:
+  - _target_: ragfoundry.evaluation.metrics.HFEvaluate
+    metric_names: [rouge]
+  - _target_: ragfoundry.evaluation.metrics.EM
+  - _target_: ragfoundry.evaluation.metrics.StringEM
+  - _target_: ragfoundry.evaluation.metrics.F1
+  - _target_: ragfoundry.evaluation.metrics.BERTScore
+    model: microsoft/deberta-large-mnli
+  - _target_: ragfoundry.evaluation.metrics.Semantic
+    model: vectara/hallucination_evaluation_model
+  - _target_: ragfoundry.evaluation.metrics.Classification
+    mapping: {"yes": 1, "no": 0, "maybe": 2}
+    else_value: 2
+  - _target_: ragfoundry.evaluation.deep.Faithfulness
+    azure_endpoint:
+    azure_deployment:
+    api_version:
+  - _target_: ragfoundry.evaluation.deep.Relevancy
+    azure_endpoint:
+    azure_deployment:
+    api_version:
+    embeddings: BAAI/bge-small-en-v1.5
+
+
+key_names:
+  generated: generated
+  label: answer
+  query: query
+  context: context
+
+results_file: my-evaluation.yaml
+generated_file: inference.jsonl
+data_file: my-processed-data.jsonl
+limit:
+use_wandb:
+experiment:
+wandb_entity:
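Review note on `answer_processor`: the commented values for `capture_pattern` and `stopping_pattern` suggest a two-stage clean-up of the generated text. The sketch below shows one plausible reading, assuming the capture group is extracted first and the result is then cut at the first stopping match. The function `extract_answer` is hypothetical; the actual `RegexAnswer` class is not part of this excerpt and may behave differently.

```python
import re
from typing import Optional


def extract_answer(
    text: str,
    capture_pattern: Optional[str] = None,
    stopping_pattern: Optional[str] = None,
) -> str:
    """Keep the first capture group (if any), then cut at the first stopping match."""
    if capture_pattern:
        match = re.search(capture_pattern, text)
        if match:
            text = match.group(1)
    if stopping_pattern:
        text = re.split(stopping_pattern, text, maxsplit=1)[0]
    return text.strip()


generated = "The context mentions two cities. <ANSWER>: Paris, the capital of France."
print(extract_answer(generated, "<ANSWER>: (.*)", "[,.;]"))  # Paris
print(extract_answer(generated) == generated)  # True: empty patterns leave the text as-is
```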
diff --git a/configs/external/haystack/qdrant.yaml b/configs/external/haystack/qdrant.yaml
@@ -0,0 +1,25 @@
+components:
+    retriever:
+        init_parameters:
+            document_store:
+                type: haystack_integrations.document_stores.qdrant.document_store.QdrantDocumentStore
+                init_parameters:
+                    url: qdrant.url.com
+                    port: 6333
+                    index: wikipedia
+                    embedding_dim: 768
+                    similarity: dot_product
+                    write_batch_size: 50
+            top_k: 10
+        type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantEmbeddingRetriever
+    text_embedder:
+        init_parameters:
+            batch_size: 64
+            model: BAAI/llm-embedder
+            prefix: "Represent this query for retrieving relevant documents: "
+            device:
+        type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
+connections:
+    - receiver: retriever.query_embedding
+      sender: text_embedder.embedding
+
diff --git a/configs/inference.yaml b/configs/inference.yaml
@@ -0,0 +1,26 @@
+model:
+    _target_: ragfoundry.models.hf.HFInference
+    model_name_or_path: microsoft/Phi-3-mini-128k-instruct
+    load_in_4bit: false
+    load_in_8bit: true
+    device_map: auto
+    torch_dtype:
+    trust_remote_code: true
+    instruction: ragfoundry/processing/prompts/prompt_instructions/qa.txt
+    instruct_in_prompt: false
+    lora_path:
+    generation:
+        do_sample: false
+        max_new_tokens: 50
+        max_length:
+        temperature:
+        top_k:
+        top_p:
+        return_full_text: false
+
+data_file: my-processed-data.jsonl
+generated_file: model-predictions.jsonl
+input_key: prompt
+generation_key: output
+target_key: answer
+limit:
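Review note on `configs/inference.yaml`: several keys (`max_length`, `temperature`, `top_k`, `top_p`, `limit`) are left empty, which YAML loads as null. A minimal sketch of one way such a config could be consumed follows, assuming unset generation options are dropped so library defaults apply and `limit` caps the number of records read. Both helpers are hypothetical; how `HFInference` and the inference script actually treat these values is not shown in this diff.

```python
import json
from itertools import islice
from typing import Any, Iterable, Optional


def drop_unset(options: dict[str, Any]) -> dict[str, Any]:
    """Remove keys whose value is None (an empty YAML value loads as None)."""
    return {key: value for key, value in options.items() if value is not None}


def read_jsonl(lines: Iterable[str], limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Parse JSON Lines records, stopping after `limit` records when a limit is set."""
    records = (json.loads(line) for line in lines if line.strip())
    return list(islice(records, limit))


generation = {"do_sample": False, "max_new_tokens": 50, "max_length": None, "temperature": None}
print(drop_unset(generation))  # {'do_sample': False, 'max_new_tokens': 50}

raw = ['{"prompt": "Q1", "answer": "A1"}', '{"prompt": "Q2", "answer": "A2"}']
prompts = [record["prompt"] for record in read_jsonl(raw, limit=1)]  # input_key: prompt
print(prompts)  # ['Q1']
```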
diff --git a/configs/paper/evaluation-asqa-long.yaml b/configs/paper/evaluation-asqa-long.yaml
@@ -0,0 +1,29 @@
+answer_processor:
+    _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
+    capture_pattern: "<ANSWER>: (.*)"
+    stopping_pattern:
+
+metrics:
+    - _target_: ragfoundry.evaluation.deep.Faithfulness
+      azure_endpoint: azure.endpoint.com
+      azure_deployment: GPT-4-32k-Bot
+      api_version: 2024-05-01-preview
+    - _target_: ragfoundry.evaluation.deep.Relevancy
+      azure_endpoint: azure.endpoint.com
+      azure_deployment: GPT-4-32k-Bot
+      api_version: 2024-05-01-preview
+      embeddings: BAAI/bge-small-en-v1.5
+
+key_names:
+    generated: text
+    label: answers
+    query: query
+    context: positive_passages
+
+results_file: asqa-context-dev-generated-results.yaml
+generated_file: asqa-context-dev-generated.jsonl
+data_file: asqa-context-dev.jsonl
+limit:
+use_wandb:
+experiment:
+wandb_entity:
diff --git a/configs/paper/evaluation-asqa-short.yaml b/configs/paper/evaluation-asqa-short.yaml
@@ -0,0 +1,20 @@
+answer_processor:
+    _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
+    capture_pattern: "<ANSWER>: (.*)"
+    stopping_pattern:
+
+metrics:
+    - _target_: ragfoundry.evaluation.metrics.StringEM
+
+key_names:
+    generated: text
+    label: answer-short
+    query: query
+
+results_file: evaluation-asqa-baseline.yaml
+generated_file: asqa-baseline-dev-generated.jsonl
+data_file: asqa-baseline-dev.jsonl
+limit:
+use_wandb:
+experiment:
+wandb_entity:
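Review note on `StringEM`: the metric's implementation is not part of this excerpt. The sketch below shows one common formulation of string-containment exact match for ASQA-style short answers: the fraction of answer groups for which at least one alias appears in the normalized generated text. The shape of the `answer-short` field (a list of alias lists) and the normalization steps are assumptions made for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def string_em(generated: str, answer_groups: list[list[str]]) -> float:
    """Fraction of answer groups with at least one alias contained in the generated text."""
    if not answer_groups:
        return 0.0
    haystack = normalize(generated)
    hits = sum(
        any(normalize(alias) in haystack for alias in group) for group in answer_groups
    )
    return hits / len(answer_groups)


generated = "The film was released in 1994 in the United States."
groups = [["1994"], ["U.S.", "United States"], ["Canada"]]
print(round(string_em(generated, groups), 3))  # 0.667
```

In the README's terms this is a *local* metric: it is computed per example, and a dataset-level score would be the mean over examples.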
diff --git a/configs/paper/inference-asqa.yaml b/configs/paper/inference-asqa.yaml
@@ -0,0 +1,27 @@
+model:
+    _target_: ragfoundry.models.hf.HFInference
+    model_name_or_path: microsoft/Phi-3-mini-128k-instruct
+    load_in_4bit: false
+    load_in_8bit: true
+    device_map: auto
+    torch_dtype:
+    trust_remote_code: true
+    instruction: ragfoundry/processing/prompts/prompt_instructions/qa.txt
+    instruct_in_prompt: false
+    lora_path:
+    generation:
+        do_sample: false
+        max_new_tokens: 50
+        max_length:
+        temperature:
+        top_k:
+        top_p:
+        return_full_text: false
+
+data_file: asqa-baseline-dev.jsonl
+generated_file: asqa-baseline-dev-generated.jsonl
+input_key: prompt
+generation_key: output
+target_key: answers
+limit:
+
diff --git a/configs/paper/processing-asqa-baseline.yaml b/configs/paper/processing-asqa-baseline.yaml
@@ -0,0 +1,20 @@
+name: asqa_baseline
+cache: false
+output_path: .
+hfhub_tag:
+steps:
+    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
+      inputs: dev
+      filename: asqa-dev.jsonl
+
+    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
+      inputs: dev
+      prompt_file: ragfoundry/processing/prompts/qa-short.txt
+      output_key: prompt
+      mapping:
+            query: query
+
+    - _target_: ragfoundry.processing.global_steps.output.OutputData
+      inputs: dev
+      prefix: asqa-baseline
+
diff --git a/configs/paper/processing-asqa-context.yaml b/configs/paper/processing-asqa-context.yaml
@@ -0,0 +1,30 @@
+name: asqa_context
+cache: false
+output_path: .
+hfhub_tag:
+steps:
+    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
+      inputs: train
+      filename: asqa-train.jsonl
+
+    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
+      inputs: dev
+      filename: asqa-dev.jsonl
+
+    - _target_: ragfoundry.processing.local_steps.context.DocumentsJoiner
+      inputs: [train, dev]
+      docs_key: positive_passages
+      k: 5
+
+    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
+      inputs: [train, dev]
+      prompt_file: ragfoundry/processing/prompts/qa.txt
+      output_key: prompt
+      mapping:
+            question: query
+            context: positive_passages
+
+    - _target_: ragfoundry.processing.global_steps.output.OutputData
+      inputs: [train, dev]
+      prefix: asqa-context
+
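Review note on the processing steps: the `DocumentsJoiner` and `TextPrompter` steps in this config can be read as two per-example transformations. The sketch below assumes passages are plain strings, that the joined text is written back under `docs_key`, and that `mapping` goes from template placeholder to example field. The functions and the template text are hypothetical; the real step classes and `prompts/qa.txt` are not shown in this excerpt.

```python
from typing import Any

Example = dict[str, Any]


def join_documents(example: Example, docs_key: str, k: int) -> Example:
    """Replace the list under `docs_key` with its first k items joined into one string."""
    joined = "\n".join(example[docs_key][:k])
    return {**example, docs_key: joined}


def build_prompt(
    example: Example, template: str, mapping: dict[str, str], output_key: str
) -> Example:
    """Fill template placeholders; `mapping` is placeholder name -> example field name."""
    values = {placeholder: example[field] for placeholder, field in mapping.items()}
    return {**example, output_key: template.format(**values)}


# Hypothetical template text standing in for the prompt file.
template = "Question: {question}\nContext: {context}\nAnswer:"
example = {"query": "Who wrote Hamlet?", "positive_passages": ["Doc A", "Doc B", "Doc C"]}

example = join_documents(example, docs_key="positive_passages", k=2)
example = build_prompt(
    example,
    template,
    mapping={"question": "query", "context": "positive_passages"},
    output_key="prompt",
)
print(example["prompt"])
```

Each function returns a new dict rather than mutating its input, which keeps every step's output inspectable when steps are chained.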
diff --git a/configs/paper/processing-asqa-cot-dev.yaml b/configs/paper/processing-asqa-cot-dev.yaml
@@ -0,0 +1,26 @@
+name: asqa_cot_dev
+cache: false
+output_path: .
+hfhub_tag:
+steps:
+    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
+      inputs: dev
+      filename: asqa-dev.jsonl
+
+    - _target_: ragfoundry.processing.local_steps.context.DocumentsJoiner
+      inputs: dev
+      docs_key: positive_passages
+      k: 5
+
+    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
+      inputs: dev
+      prompt_file: ragfoundry/processing/prompts/cot.txt
+      output_key: prompt
+      mapping:
+            question: query
+            context: positive_passages
+
+    - _target_: ragfoundry.processing.global_steps.output.OutputData
+      inputs: dev
+      prefix: asqa-cot
+