Steps for running the Ruler Eval

1. Install the packages listed in requirements.txt.
2. Edit the default.yaml file of each task as needed. In particular, set max_seq_length to the context length you want to evaluate (64k, 32k, etc.).
   a. https://github.com/vkaul11/ruler/blob/main/data/niah/conf/simulation/default.yaml
   b. https://github.com/vkaul11/ruler/blob/main/data/qa/conf/simulation/default.yaml
   c. https://github.com/vkaul11/ruler/blob/main/data/variable_tracking/conf/simulation/default.yaml
3. Set the model_id, auth_key and url used for evaluation in https://github.com/vkaul11/ruler/blob/main/eval/conf/config.yaml
4. Run https://github.com/vkaul11/ruler/blob/main/run_all_tasks.sh to run all three tasks; it prints the metric per example and the average metric. To run a single task and get its metrics, run one of the following instead:
   1) NIAH task: https://github.com/vkaul11/ruler/blob/main/run_niah.sh
   2) Variable tracking task: https://github.com/vkaul11/ruler/blob/main/run_variable_tracking.sh
   3) QA task: https://github.com/vkaul11/ruler/blob/main/run_qa.sh
5. The eval directory will also contain the predictions and errors written out for each task.
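
Editing max_seq_length in the three default.yaml files can be scripted. The following is a minimal sketch and not part of the repository. It assumes each file contains a line of the form `max_seq_length: <value>` (the actual layout of the configs is not shown here), and the names `set_max_seq_length` and `update_configs` are hypothetical. The value 65536 assumes that 64k means 64 * 1024 tokens; use a different integer if the configs expect something else.

```python
import re
from pathlib import Path

# Config paths relative to a local clone, taken from the URLs in the steps above.
CONFIG_PATHS = [
    "data/niah/conf/simulation/default.yaml",
    "data/qa/conf/simulation/default.yaml",
    "data/variable_tracking/conf/simulation/default.yaml",
]

# Assumption: the key appears as "max_seq_length: <value>" on a single line.
# Group 1 keeps indentation and the key, group 2 keeps any trailing comment.
_PATTERN = re.compile(r"^([ \t]*max_seq_length[ \t]*:[ \t]*)\S+(.*)$", re.MULTILINE)


def set_max_seq_length(yaml_text: str, new_length: int) -> str:
    """Return yaml_text with every max_seq_length value replaced by new_length."""
    updated, count = _PATTERN.subn(
        lambda m: f"{m.group(1)}{new_length}{m.group(2)}", yaml_text
    )
    if count == 0:
        raise ValueError("no max_seq_length line found")
    return updated


def update_configs(repo_root: str, new_length: int) -> None:
    """Rewrite max_seq_length in place in each of the three task configs."""
    for rel_path in CONFIG_PATHS:
        path = Path(repo_root) / rel_path
        path.write_text(set_max_seq_length(path.read_text(), new_length))


if __name__ == "__main__":
    # Hypothetical file content, used only to demonstrate the text rewrite.
    sample = "num_samples: 10\nmax_seq_length: 4096  # tokens\n"
    print(set_max_seq_length(sample, 65536))
```

To apply it to a clone, call `update_configs("path/to/ruler", 65536)` from the parent directory of the clone, then check the resulting diff before running the tasks. A text-level rewrite is used here so that comments and formatting in the YAML files are left untouched.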
