Initial public commit
danielfleischer committed Aug 2, 2024
1 parent 3fc9203 commit 225ca88
Showing 97 changed files with 3,468 additions and 3 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
/.python-version
/outputs/
__pycache__/
23 changes: 23 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,23 @@
repos:
  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 24.4.2
    hooks:
      - id: black
        args: ["-l", "90"]
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.0
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: check-docstring-first
      - id: check-added-large-files
      - id: check-yaml
        args: ["--unsafe", "mkdocs.yml"]
      - id: check-merge-conflict
2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

-Copyright [yyyy] [name of copyright owner]
+Copyright 2024 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
83 changes: 81 additions & 2 deletions README.md
@@ -1,2 +1,81 @@
-# RAGFoundry
-Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.
<div align="center">
<img src="assets/rag_foundry.png" width="500"/>
</div>

----------

A framework for enhancing LLMs for RAG use-cases by enabling users to create data-augmented datasets for tuning and
evaluation of LLMs, using RAG workflows.


**RAG Foundry** is a library designed to improve LLMs' ability to use external information by fine-tuning models on
specially created RAG-augmented datasets. Given a RAG technique, the library helps create the training data, train
models using parameter-efficient fine-tuning (PEFT), and measure the resulting performance using various RAG-specific
metrics. The library is modular, and workflows are customizable using configuration files.

## Installation
Clone locally and run:

```sh
pip install -r requirements.txt
```

## Overview

The RAG Foundry framework facilitates fast prototyping and experimentation with various RAG settings and configurations,
including data selection and filtering, processing, retrieval, ranking, query manipulation, prompt generation, training,
inference, output processing and evaluation. The library comprises four modules: dataset creation, training,
inference and evaluation.

* **Dataset Creation**: The processing module creates datasets, persisting RAG interactions, to be used for RAG training
and inference. RAG interactions include dataset loading, column normalization, data aggregation (few-shot creation),
information retrieval using external tools and frameworks, API integration, template-based prompt creation and any other
form of pre-processing. The data is saved in a consistent, model-independent, input-output format, along with all other
fields and metadata. See [Processing.md](docs/processing.md).

* **Training**: users can train any model on the augmented datasets, using PEFT for efficient training and TRL (e.g.
supervised fine-tuning). Training is done on the completions. Models can be pushed to HF Hub. See [Training.md](docs/training.md).

* **Inference**: generating predictions using the augmented datasets with trained or untrained LLMs. See [Inference.md](docs/inference.md).

* **Evaluation**: running evaluation on the generated output from the inference module. Users can provide a list of
metrics to run; custom metrics can be implemented easily. Current metrics include EM, F1, ROUGE, BERTScore, Deepeval,
RAGAS, HF `evaluate` and classification. Metrics can be *local*—run on each example, or *global*—run on the entire
dataset, e.g. recall. Metrics can utilize any feature in the dataset, like retrieval results, reasoning,
citations and attributions, not just the input and output texts. See [Evaluation.md](docs/evaluation.md).
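
The difference between local and global metrics can be illustrated with a small sketch. The function names below (`exact_match`, `run_local`, `recall_global`) and the `generated`/`label` field names are hypothetical and are not the library's API; the sketch only assumes that a local metric scores one example at a time, while a global metric needs the whole dataset at once.

```python
from statistics import mean


def exact_match(generated: str, label: str) -> float:
    """Local metric: score a single example (1.0 if the normalized strings match)."""
    return float(generated.strip().lower() == label.strip().lower())


def run_local(examples: list[dict]) -> float:
    """Apply the local metric per example, then average the per-example scores."""
    return mean(exact_match(ex["generated"], ex["label"]) for ex in examples)


def recall_global(examples: list[dict], positive: str = "yes") -> float:
    """Global metric: recall needs counts over the whole dataset, not one example."""
    relevant = [ex for ex in examples if ex["label"] == positive]
    if not relevant:
        return 0.0
    hits = sum(ex["generated"] == positive for ex in relevant)
    return hits / len(relevant)


examples = [
    {"generated": "yes", "label": "yes"},
    {"generated": "no", "label": "yes"},
    {"generated": "no", "label": "no"},
    {"generated": "maybe", "label": "yes"},
]
local_score = run_local(examples)       # mean of per-example exact matches
global_score = recall_global(examples)  # recall of the "yes" class over all examples
```

A per-example score can be stored next to each record, whereas a quantity such as recall only makes sense once all predictions are available.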


## Running
The 4 modules are represented as scripts: `processing.py`, `training.py`, `inference.py` and `evaluation.py` at the top
level. Every call has the form `python SCRIPT options...`.

The library utilizes the [Hydra](https://hydra.cc/docs/intro/) configuration tool; it enables the use of hierarchical
configurations, easy overriding of values from the CLI and the ability to run multiple jobs remotely (e.g. integrations with
SLURM and Ray). It represents a *configuration-as-code* approach, as it can instantiate Python classes according to
configuration (the `_target_` keyword indicates the Python class to use in a given context).
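
To make the `_target_` idea concrete, here is a minimal standard-library sketch of instantiating a class from a dictionary. It only mimics the concept; Hydra's own `hydra.utils.instantiate` does considerably more. The `instantiate` helper below is written for illustration, and `fractions.Fraction` merely stands in for a real pipeline class.

```python
import importlib
from typing import Any


def instantiate(config: dict[str, Any]) -> Any:
    """Build an object from a config dict whose `_target_` names a class by dotted path."""
    kwargs = dict(config)  # copy so the caller's config is not mutated
    module_path, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # Nested dicts that carry their own `_target_` are instantiated recursively.
    for key, value in kwargs.items():
        if isinstance(value, dict) and "_target_" in value:
            kwargs[key] = instantiate(value)
    # Remaining keys are passed to the constructor as keyword arguments.
    return cls(**kwargs)


# A standard-library class stands in for a pipeline step here.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
step = instantiate(config)
```

In this reading, swapping one class for another only requires changing the `_target_` string and its arguments in the YAML file, not the Python code.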

There are default configurations for each module in the [configs](./configs/) folder. A configuration file can be
overridden like so:

```sh
python processing.py -cp configs/paper -cn processing-asqa-retrieval
```

Individual keywords can be overridden as well:
```sh
python processing.py -cp configs/paper -cn processing-asqa-retrieval \
    output_path=/store/data/here \
    hfhub_tag=my_org/my_data
```
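
Conceptually, each `key=value` argument replaces one entry of the loaded configuration. The sketch below illustrates that idea with plain dictionaries. The `apply_overrides` helper is hypothetical, keeps all values as strings, and assumes intermediate keys hold dictionaries. Hydra's own override grammar is richer (typed values, list and dict syntax, `+key` additions).

```python
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Apply `dotted.key=value` overrides in place, creating nested dicts as needed."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the hierarchy, creating missing intermediate levels.
            node = node.setdefault(name, {})
        node[leaf] = raw_value  # values are kept as strings in this sketch
    return config


config = {"output_path": ".", "hfhub_tag": None, "model": {"load_in_8bit": "true"}}
apply_overrides(
    config,
    [
        "output_path=/store/data/here",
        "hfhub_tag=my_org/my_data",
        "model.load_in_8bit=false",
    ],
)
```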

For a complete set of configurations, **reproducing the experimentation in the paper with the ASQA dataset**, see the
configurations in the [Paper](./configs/paper) folder.

## License

The code is licensed under the [Apache 2.0 License](LICENSE).

## Disclaimer

This is not an official Intel product.
Binary file added assets/rag_foundry.png
42 changes: 42 additions & 0 deletions configs/evaluation.yaml
@@ -0,0 +1,42 @@
answer_processor:
    _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
    capture_pattern: # "<ANSWER>: (.*)"
    stopping_pattern: # "[,.;]"

metrics:
    - _target_: ragfoundry.evaluation.metrics.HFEvaluate
      metric_names: [rouge]
    - _target_: ragfoundry.evaluation.metrics.EM
    - _target_: ragfoundry.evaluation.metrics.StringEM
    - _target_: ragfoundry.evaluation.metrics.F1
    - _target_: ragfoundry.evaluation.metrics.BERTScore
      model: microsoft/deberta-large-mnli
    - _target_: ragfoundry.evaluation.metrics.Semantic
      model: vectara/hallucination_evaluation_model
    - _target_: ragfoundry.evaluation.metrics.Classification
      mapping: {"yes": 1, "no": 0, "maybe": 2}
      else_value: 2
    - _target_: ragfoundry.evaluation.deep.Faithfulness
      azure_endpoint:
      azure_deployment:
      api_version:
    - _target_: ragfoundry.evaluation.deep.Relevancy
      azure_endpoint:
      azure_deployment:
      api_version:
      embeddings: BAAI/bge-small-en-v1.5


key_names:
    generated: generated
    label: answer
    query: query
    context: context

results_file: my-evaluation.yaml
generated_file: inference.jsonl
data_file: my-processed-data.jsonl
limit:
use_wandb:
experiment:
wandb_entity:
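
The `capture_pattern` and `stopping_pattern` options above suggest a two-stage clean-up of the raw model output before metrics are computed. The class below is a hypothetical stand-in written for illustration. It is not the actual `RegexAnswer` implementation, whose exact behaviour is not shown in this excerpt. It assumes the capture pattern has one group and that the text is cut at the first match of the stopping pattern.

```python
import re
from typing import Optional


class RegexAnswerSketch:
    """Hypothetical stand-in: extract an answer span from raw model output."""

    def __init__(
        self,
        capture_pattern: Optional[str] = None,
        stopping_pattern: Optional[str] = None,
    ) -> None:
        self.capture_pattern = capture_pattern
        self.stopping_pattern = stopping_pattern

    def __call__(self, text: str) -> str:
        # Keep only the first capture group if a pattern is configured and matches;
        # otherwise the text is left untouched.
        if self.capture_pattern:
            match = re.search(self.capture_pattern, text, flags=re.DOTALL)
            if match:
                text = match.group(1)
        # Truncate at the first occurrence of the stopping pattern, if any.
        if self.stopping_pattern:
            text = re.split(self.stopping_pattern, text, maxsplit=1)[0]
        return text.strip()


processor = RegexAnswerSketch(capture_pattern="<ANSWER>: (.*)", stopping_pattern="[,.;]")
answer = processor("Reasoning goes here. <ANSWER>: Paris, the capital of France.")
```

With both options left empty, as in the default configuration, this sketch would pass the generated text through unchanged apart from stripping whitespace.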
25 changes: 25 additions & 0 deletions configs/external/haystack/qdrant.yaml
@@ -0,0 +1,25 @@
components:
  retriever:
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.qdrant.document_store.QdrantDocumentStore
        init_parameters:
          url: qdrant.url.com
          port: 6333
          index: wikipedia
          embedding_dim: 768
          similarity: dot_product
          write_batch_size: 50
      top_k: 10
    type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantEmbeddingRetriever
  text_embedder:
    init_parameters:
      batch_size: 64
      model: BAAI/llm-embedder
      prefix: "Represent this query for retrieving relevant documents: "
      device:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
connections:
  - receiver: retriever.query_embedding
    sender: text_embedder.embedding

26 changes: 26 additions & 0 deletions configs/inference.yaml
@@ -0,0 +1,26 @@
model:
    _target_: ragfoundry.models.hf.HFInference
    model_name_or_path: microsoft/Phi-3-mini-128k-instruct
    load_in_4bit: false
    load_in_8bit: true
    device_map: auto
    torch_dtype:
    trust_remote_code: true
    instruction: ragfoundry/processing/prompts/prompt_instructions/qa.txt
    instruct_in_prompt: false
    lora_path:
    generation:
        do_sample: false
        max_new_tokens: 50
        max_length:
        temperature:
        top_k:
        top_p:
        return_full_text: false

data_file: my-processed-data.jsonl
generated_file: model-predictions.jsonl
input_key: prompt
generation_key: output
target_key: answer
limit:
29 changes: 29 additions & 0 deletions configs/paper/evaluation-asqa-long.yaml
@@ -0,0 +1,29 @@
answer_processor:
    _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
    capture_pattern: "<ANSWER>: (.*)"
    stopping_pattern:

metrics:
    - _target_: ragfoundry.evaluation.deep.Faithfulness
      azure_endpoint: azure.endpoint.com
      azure_deployment: GPT-4-32k-Bot
      api_version: 2024-05-01-preview
    - _target_: ragfoundry.evaluation.deep.Relevancy
      azure_endpoint: azure.endpoint.com
      azure_deployment: GPT-4-32k-Bot
      api_version: 2024-05-01-preview
      embeddings: BAAI/bge-small-en-v1.5

key_names:
    generated: text
    label: answers
    query: query
    context: positive_passages

results_file: asqa-context-dev-generated-results.yaml
generated_file: asqa-context-dev-generated.jsonl
data_file: asqa-context-dev.jsonl
limit:
use_wandb:
experiment:
wandb_entity:
20 changes: 20 additions & 0 deletions configs/paper/evaluation-asqa-short.yaml
@@ -0,0 +1,20 @@
answer_processor:
    _target_: ragfoundry.processing.answer_processors.regex.RegexAnswer
    capture_pattern: "<ANSWER>: (.*)"
    stopping_pattern:

metrics:
    - _target_: ragfoundry.evaluation.metrics.StringEM

key_names:
    generated: text
    label: answer-short
    query: query

results_file: evaluation-asqa-baseline.yaml
generated_file: asqa-baseline-dev-generated.jsonl
data_file: asqa-baseline-dev.jsonl
limit:
use_wandb:
experiment:
wandb_entity:
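
For orientation, one plausible reading of a string-based exact-match metric over short answers is sketched below. This is an assumption made for illustration, not the definition used by `ragfoundry.evaluation.metrics.StringEM`: the hypothetical `string_em` function treats the label as a flat list of reference strings and reports the fraction that occur, case-insensitively, in the generated text.

```python
def string_em(generated: str, references: list[str]) -> float:
    """Fraction of reference strings found (case-insensitively) in the generated text."""
    if not references:
        return 0.0
    text = generated.lower()
    # Each reference counts once if it appears anywhere in the generated text.
    found = sum(ref.lower() in text for ref in references)
    return found / len(references)


score = string_em(
    "The film was released in 1994 in the US and in 1995 in Europe.",
    ["1994", "1995", "1996"],
)
```

A substring-based score of this kind rewards long answers that mention every reference, which is one reason it is commonly paired with other metrics.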
27 changes: 27 additions & 0 deletions configs/paper/inference-asqa.yaml
@@ -0,0 +1,27 @@
model:
    _target_: ragfoundry.models.hf.HFInference
    model_name_or_path: microsoft/Phi-3-mini-128k-instruct
    load_in_4bit: false
    load_in_8bit: true
    device_map: auto
    torch_dtype:
    trust_remote_code: true
    instruction: ragfoundry/processing/prompts/prompt_instructions/qa.txt
    instruct_in_prompt: false
    lora_path:
    generation:
        do_sample: false
        max_new_tokens: 50
        max_length:
        temperature:
        top_k:
        top_p:
        return_full_text: false

data_file: asqa-baseline-dev.jsonl
generated_file: asqa-baseline-dev-generated.jsonl
input_key: prompt
generation_key: output
target_key: answers
limit:

20 changes: 20 additions & 0 deletions configs/paper/processing-asqa-baseline.yaml
@@ -0,0 +1,20 @@
name: asqa_baseline
cache: false
output_path: .
hfhub_tag:
steps:
    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
      inputs: dev
      filename: asqa-dev.jsonl

    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
      inputs: dev
      prompt_file: ragfoundry/processing/prompts/qa-short.txt
      output_key: prompt
      mapping:
          query: query

    - _target_: ragfoundry.processing.global_steps.output.OutputData
      inputs: dev
      prefix: asqa-baseline

30 changes: 30 additions & 0 deletions configs/paper/processing-asqa-context.yaml
@@ -0,0 +1,30 @@
name: asqa_context
cache: false
output_path: .
hfhub_tag:
steps:
    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
      inputs: train
      filename: asqa-train.jsonl

    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
      inputs: dev
      filename: asqa-dev.jsonl

    - _target_: ragfoundry.processing.local_steps.context.DocumentsJoiner
      inputs: [train, dev]
      docs_key: positive_passages
      k: 5

    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
      inputs: [train, dev]
      prompt_file: ragfoundry/processing/prompts/qa.txt
      output_key: prompt
      mapping:
          question: query
          context: positive_passages

    - _target_: ragfoundry.processing.global_steps.output.OutputData
      inputs: [train, dev]
      prefix: asqa-context
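
The step sequence above (join the top `k` passages, then fill a prompt template through a `mapping`) can be sketched for a single example as follows. The functions `join_documents` and `fill_prompt` are hypothetical, not the library's classes. The sketch assumes passages are plain strings and that the template uses `str.format`-style placeholders. The template text is also made up, since the real `qa.txt` is not shown in this excerpt.

```python
from typing import Any

Example = dict[str, Any]


def join_documents(example: Example, docs_key: str, k: int) -> Example:
    """Replace a list of passages with one string built from the first k passages."""
    example[docs_key] = "\n".join(example[docs_key][:k])
    return example


def fill_prompt(
    example: Example, template: str, mapping: dict[str, str], output_key: str
) -> Example:
    """Fill template placeholders; mapping goes placeholder name -> example field."""
    values = {placeholder: example[field] for placeholder, field in mapping.items()}
    example[output_key] = template.format(**values)
    return example


# Hypothetical template standing in for the prompt file.
template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
example = {
    "query": "Who wrote Dune?",
    "positive_passages": [
        "Dune is a 1965 novel by Frank Herbert.",
        "Dune won the Hugo Award.",
    ],
}
example = join_documents(example, docs_key="positive_passages", k=5)
example = fill_prompt(
    example,
    template,
    mapping={"question": "query", "context": "positive_passages"},
    output_key="prompt",
)
```

Under these assumptions the `mapping` keys name the template placeholders and the values name dataset fields, which is why the same dataset can feed templates that call the question either `query` or `question`.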

26 changes: 26 additions & 0 deletions configs/paper/processing-asqa-cot-dev.yaml
@@ -0,0 +1,26 @@
name: asqa_cot_dev
cache: false
output_path: .
hfhub_tag:
steps:
    - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
      inputs: dev
      filename: asqa-dev.jsonl

    - _target_: ragfoundry.processing.local_steps.context.DocumentsJoiner
      inputs: dev
      docs_key: positive_passages
      k: 5

    - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
      inputs: dev
      prompt_file: ragfoundry/processing/prompts/cot.txt
      output_key: prompt
      mapping:
          question: query
          context: positive_passages

    - _target_: ragfoundry.processing.global_steps.output.OutputData
      inputs: dev
      prefix: asqa-cot
