Official implementation for "In-context Example Selection with Influences". We introduce in-context influences as a way to select examples for few-shot in-context learning. Authors: Tai Nguyen and Eric Wong.
- Todo - Release influence scores for all tasks and code for baselines
- 04/18/2023 - Repository release
- 04/06/2023 - Blog post release
Create a new conda environment using `environment.yml`. The env is called "icl-influences" by default.

```bash
conda env create -f environment.yml
conda activate icl-influences
```

Alternatively, feel free to use the `Dockerfile` to build your own Docker image.
The directory `data-train400-dev200` holds the subsampled data from our paper.
We conducted experiments on 9 SuperGLUE tasks.
To redownload these datasets from HuggingFace, please run the following command.

```bash
python data_download.py
```
In addition to downloading, the script automatically samples a specified number of examples for train/dev/test data splits.
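As a rough illustration of this step, the sketch below subsamples fixed-size train/dev splits from a HuggingFace dataset. The task name, split sizes, and seed are placeholder assumptions; `data_download.py` is the authoritative implementation.

```python
# Illustrative only: subsample fixed-size splits from a HuggingFace dataset.
# Task name, split sizes, and fields are assumptions, not the repo's exact logic.
from datasets import load_dataset

def subsample(task="super_glue", subset="boolq", n_train=400, n_dev=200, seed=0):
    """Download a task and keep a fixed random subset per split."""
    raw = load_dataset(task, subset)
    train = raw["train"].shuffle(seed=seed).select(range(n_train))
    dev = raw["validation"].shuffle(seed=seed).select(range(n_dev))
    return train, dev

train, dev = subsample()
print(len(train), len(dev))  # 400 200
```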
To compute in-context influences for a specific task and model, we first need to obtain a number of "training runs".
The following script 1) obtains the training runs, and 2) computes influence scores for both influence-based methods discussed in Section 3.1.
By default, we write training run results to `out/` and influence scores to `influence_scores.jsonl`.
```bash
python icl_influence.py --task=hellaswag \
    --model_name_or_path=facebook/opt-6.7b \
    --shot=46 \
    --iterations=650 \
    --cache_dir=<HF_CACHE_DIR>
```
In the above script, note that we pass in:

- `--shot`: The number of examples used in each few-shot prompt
- `--iterations`: The number of training runs evaluated on the Dev set
- `--cache_dir`: (Optional) Directory for caching all models downloaded from HuggingFace
We recommend setting `--shot` to the maximum number of examples that fit in the context window; larger prompts mean fewer iterations are needed for good coverage of all train examples.
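For intuition, a training run here is one random few-shot prompt evaluated on the Dev set, and an example's in-context influence contrasts Dev performance with and without that example in the prompt (Section 3.1). The sketch below is a simplified mean-difference aggregation under an assumed record format; the authoritative computation is in `icl_influence.py`.

```python
# Simplified aggregation of per-example influence from training runs.
# Each run is (example_ids_in_prompt, dev_accuracy); record format is assumed.
from collections import defaultdict

def influence_scores(runs):
    """Influence = mean dev accuracy with the example in the prompt minus without it."""
    with_ex, without_ex = defaultdict(list), defaultdict(list)
    all_ids = {i for ids, _ in runs for i in ids}
    for ids, acc in runs:
        present = set(ids)
        for i in all_ids:
            (with_ex if i in present else without_ex)[i].append(acc)
    return {
        i: sum(with_ex[i]) / len(with_ex[i]) - sum(without_ex[i]) / len(without_ex[i])
        for i in all_ids
        if with_ex[i] and without_ex[i]
    }

# Tiny toy example: three runs over three train examples.
scores = influence_scores([([0, 1], 0.62), ([1, 2], 0.58), ([0, 2], 0.66)])
print(scores)  # roughly {0: 0.06, 1: -0.06, 2: 0.0}
```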
After influence scores are computed, run evaluation as follows.
```bash
python evaluate.py --task=hellaswag \
    --model_name_or_path=facebook/opt-6.7b \
    --split=test \
    --method=incontext_influence_positive \
    --resource_file=influence_scores.jsonl \
    --cache_dir=<HF_CACHE_DIR>
```
The script picks a pre-defined number of examples k for each task, defined in `evaluate.SHOT_MAP` (the same settings as in the in-context influence computation).
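For intuition, the `incontext_influence_positive` method roughly corresponds to keeping the examples with the largest positive influence. The sketch below assumes `influence_scores.jsonl` holds one JSON record per train example with `example_id` and `influence` fields; the actual field names and selection logic in `evaluate.py` may differ.

```python
# Illustrative top-k selection by influence score; field names are assumptions.
import json

def top_k_examples(path="influence_scores.jsonl", k=8):
    """Return IDs of the k train examples with the highest influence scores."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    records.sort(key=lambda r: r["influence"], reverse=True)
    return [r["example_id"] for r in records[:k]]
```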
To run the pipeline on your own task:

- Add a method to `data_download.py` for downloading your own data. Keep the data fields similar to the current datasets.
- Add the task type of your newly added task to `task_config.json`.
  - If the task type is outside of Multi-choice and Binary classification (i.e. "free-form" text generation), you should also modify the inference and encode methods in `utils.py`.
  - As an alternative to accuracy, you can also define your own evaluation metric by modifying `icl_datamodel.py`.
- Add a new prompt template to `templates.py` (see the sketch after this list).
- Rerun the same pipeline.
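To illustrate the prompt-template step, here is a minimal sketch of what a template for a hypothetical yes/no task might look like. The function name, field names, and return format are assumptions; match whatever interface `templates.py` actually expects.

```python
# Hypothetical template for a yes/no task; adapt field names and the return
# format to the conventions used by the existing templates in templates.py.
def my_task_template(example):
    """Map one data record to (prompt string, gold label text)."""
    prompt = f"Question: {example['question']}\nAnswer:"
    label = " yes" if example["label"] == 1 else " no"
    return prompt, label
```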
We currently include working pipelines for 4 autoregressive model families: GPT-2, OPT, GPT-J/NeoX, and LLaMA.
To save on memory, we load all models in half precision (`fp16`) wherever possible.
For LLaMA, please include the path to your converted weights, following HF's official guide.
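For reference, half-precision loading with Hugging Face `transformers` typically looks like the sketch below. The repository wraps model loading in its own utilities, so this is only an assumed illustration of the `fp16` setting.

```python
# Assumed illustration of fp16 loading; the repo's own loading code may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"
cache_dir = "<HF_CACHE_DIR>"  # placeholder, as in the commands above

tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision wherever possible
    cache_dir=cache_dir,
)
```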
If you find our work helpful, please cite:
```bibtex
@article{nguyen2023incontextinfluences,
  author  = {Nguyen, Tai and Wong, Eric},
  title   = {In-context Example Selection with Influences},
  journal = {arXiv},
  year    = {2023}
}
```