
🎵 MuSiQue: Multi-hop Questions via Single-hop Question Composition

Repository for our TACL 2022 paper "MuSiQue: Multi-hop Questions via Single-hop Question Composition"

Data

MuSiQue is distributed under a CC BY 4.0 License.

Usage Caution: If you're using any of our seed single-hop datasets (SQuAD, T-REx, Natural Questions, MLQA, Zero Shot RE) in any way (e.g., pretraining on them), please note that MuSiQue was created by composing questions from these seed datasets. Single-hop questions used in MuSiQue's dev/test sets may therefore occur in the training sets of these seed datasets. To help avoid information leakage, we are releasing the IDs of the single-hop questions that are used in the MuSiQue dev/test sets. Once you download the data below, these IDs and the corresponding questions will be in data/dev_test_singlehop_questions_v1.0.json. If you use our seed single-hop datasets in any way in your model, please be sure to avoid using any single-hop question whose ID is present in this file.

To download MuSiQue, either run the following script or download it manually from here.

bash download_data.sh

The result will be stored in the data/ directory. It contains (i) the train, dev and test sets of MuSiQue-Ans and MuSiQue-Full, and (ii) the single-hop questions and IDs from the source datasets (squad, natural questions, trex, mlqa, zerore) that are part of the dev or test sets of MuSiQue.
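
To act on the usage caution above, seed-dataset training examples can be filtered against the released IDs. The sketch below is only an illustration: the layout of data/dev_test_singlehop_questions_v1.0.json is not described here, so it assumes you have already collected the IDs into a Python set, and it assumes each seed example is a dict with an "id" key. The helper name drop_leaked_examples is hypothetical and not part of this repository.

```python
from typing import Dict, Iterable, List, Set


def drop_leaked_examples(
    examples: Iterable[Dict],
    blocked_ids: Set[str],
    id_key: str = "id",  # assumption: seed examples store their ID under this key
) -> List[Dict]:
    """Keep only the examples whose ID is not in the blocked set."""
    return [ex for ex in examples if str(ex.get(id_key)) not in blocked_ids]


# Toy data; real IDs would come from dev_test_singlehop_questions_v1.0.json.
seed_train = [{"id": "q1", "question": "..."}, {"id": "q2", "question": "..."}]
clean_train = drop_leaked_examples(seed_train, blocked_ids={"q2"})
```

Adjust id_key (and the ID normalization) to whatever the seed dataset and the released file actually use.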

Predictions

We're releasing the model predictions (in the official format) for 4 models on the dev sets of MuSiQue-Ans and MuSiQue-Full. To get them, you can run the following script or download them manually from here.

bash download_predictions.sh

Evaluation

You can use evaluate_v1.0.py to evaluate your predictions against the ground truth. For example:

python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl

These are the results you would get on the MuSiQue-Answerable and MuSiQue-Full validation sets for each of the four models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model).

# MuSiQue-Answerable
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.423, "support_f1": 0.676}

python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.473, "support_f1": 0.723}

python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.456, "support_f1": 0.778}

python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.497, "support_f1": 0.792}

# MuSiQue-Full
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.406, "support_f1": 0.325, "group_answer_sufficiency_f1": 0.22, "group_support_sufficiency_f1": 0.252}

python evaluate_v1.0.py predictions/musique_full_v1.0_dev_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.486, "support_f1": 0.522, "group_answer_sufficiency_f1": 0.344, "group_support_sufficiency_f1": 0.42}

python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.463, "support_f1": 0.75, "group_answer_sufficiency_f1": 0.321, "group_support_sufficiency_f1": 0.447}

python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.498, "support_f1": 0.777, "group_answer_sufficiency_f1": 0.328, "group_support_sufficiency_f1": 0.431}
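
The eight commands above differ only in the dataset and model parts of the file names, so they can be generated in a loop. This is a minimal sketch, assuming it is run from the repository root after both download scripts have finished. build_eval_command and run_all_evaluations are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from typing import List

DATASETS = ["ans", "full"]
MODELS = [
    "end2end_model",
    "select_answer_model",
    "step_execution_by_end2end_model",
    "step_execution_by_select_answer_model",
]


def build_eval_command(dataset: str, model: str) -> List[str]:
    """Build the evaluate_v1.0.py invocation for one dataset/model pair."""
    predictions = f"predictions/musique_{dataset}_v1.0_dev_{model}_predictions.jsonl"
    ground_truth = f"data/musique_{dataset}_v1.0_dev.jsonl"
    return [sys.executable, "evaluate_v1.0.py", predictions, ground_truth]


def run_all_evaluations() -> None:
    """Run the evaluation script once per dataset/model pair."""
    for dataset in DATASETS:
        for model in MODELS:
            command = build_eval_command(dataset, model)
            print(" ".join(command[1:]))  # show which files are being scored
            subprocess.run(command, check=True)
```

Calling run_all_evaluations() should print each metrics dictionary in the same order as the listing above.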

Leaderboard

We have two leaderboards for MuSiQue: MuSiQue-Answerable and MuSiQue-Full.

Once you have the test set predictions in the official format, you only need to upload the files to the leaderboards above. Feel free to contact me (Harsh) if you have any questions.

Models and Experiments

We've released the code that we used for the experiments in the paper. If you're interested in trying our trained models, training them from scratch, viewing their predictions, or generating predictions from your own trained model, follow the steps below.

Installations

# Set env.
conda create -n musique python=3.8 -y && conda activate musique

# Set allennlp in root directory
git clone https://github.com/allenai/allennlp
cd allennlp
git checkout v2.1.0
git apply ../allennlp.diff # small diff to get longformer global attention to work correctly.
cd ..

pip install allennlp==2.1.0 # we only need dependencies of allennlp
pip uninstall -y allennlp

pip install gdown==v4.5.1
python -m nltk.downloader stopwords

pip uninstall -y transformers
pip install transformers==4.7.0 # we used this version of transformers
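
Because the steps above pin specific versions, a quick check of the active environment can catch a missed pin before training. This is a sketch rather than part of the repository: find_version_mismatches is a hypothetical helper, and it only covers the packages installed from PyPI above (allennlp is used from the cloned source tree, so it is left out).

```python
from importlib import metadata
from typing import Dict, Optional

# Pins taken from the installation commands above.
EXPECTED_VERSIONS = {"transformers": "4.7.0", "gdown": "4.5.1"}


def find_version_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version or None} for every pin that is not met."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

Inside the musique environment, find_version_mismatches(EXPECTED_VERSIONS) should return an empty dict if both pins were applied.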

Download Raw Data

Our models were developed using a different (non-official) format of the dataset files. So to run our code, you'll first need to download the dataset files in the raw format.

python download_raw_data.py

Note that the officially released data and the raw data used here differ only in format (e.g., they use different names for JSON fields) and are not qualitatively different. Take a look at raw_data_to_official_format.py if you're interested.

We've done experiments on 4 datasets (MuSiQue-Ans, MuSiQue-Full, HotpotQA-20K, 2WikiMultihopQA-20K) with 4 multihop models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model) where possible. See Table 1. You can explore each combination using the instruction toggle below.

For each combination, you'll see instructions on how to (i) download a trained model, (ii) train a model from scratch, (iii) download model prediction/s, and (iv) generate predictions with a trained or a downloaded model.

Our models are implemented in allennlp. If you're familiar with it, using the code should be pretty straightforward. The only difference is that instead of the allennlp command, we use run.py as the entry point, which mainly loads allennlp_lib to load our allennlp code (readers, models, predictors, etc.).

MuSiQue-Answerable

End2End Model [EE]

Experiment Name

end2end_model_for_musique_ans_dataset

Download model

python download_models.py end2end_model_for_musique_ans_dataset

Train from scratch

python run.py train experiment_configs/end2end_model_for_musique_ans_dataset.jsonnet \
                    --serialization-dir serialization_dir/end2end_model_for_musique_ans_dataset

Download prediction/s

python download_raw_predictions.py end2end_model_for_musique_ans_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/end2end_model_for_musique_ans_dataset/model.tar.gz \
                      raw_data/musique_ans_dev.jsonl \
                      --output-file serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl

Select+Answer Model [SA]

The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.

Experiment Name

# Selector Model
select_and_answer_model_selector_for_musique_ans

Download model

python download_models.py select_and_answer_model_selector_for_musique_ans

Train from scratch

python run.py train experiment_configs/select_and_answer_model_selector_for_musique_ans.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_ans

Download prediction/s

python download_raw_predictions.py select_and_answer_model_selector_for_musique_ans

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
                      raw_data/musique_ans_train.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_train.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
                      raw_data/musique_ans_dev.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer Model
select_and_answer_model_answerer_for_musique_ans

Download model

python download_models.py select_and_answer_model_answerer_for_musique_ans

Train from scratch

python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_ans.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_ans

Download prediction/s

python download_raw_predictions.py select_and_answer_model_answerer_for_musique_ans

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
                      serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
                      --output-file serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl
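
The --output-file values in the commands above follow a visible pattern: when the input comes from raw_data/, the output keeps the input's file name; when the input is another model's prediction file, the output name is the input path with "/" replaced by "__". The sketch below reproduces that pattern as read off these commands. prediction_output_path is a hypothetical helper and is not claimed to be how run.py itself derives paths.

```python
from pathlib import PurePosixPath


def prediction_output_path(experiment_name: str, input_path: str) -> str:
    """Mirror the output-file naming visible in the predict commands above."""
    path = PurePosixPath(input_path)
    if path.parts[0] == "raw_data":
        # Raw dataset file: keep just the file name.
        file_name = path.name
    else:
        # Output of an upstream model: flatten the whole path into one name.
        file_name = "__".join(path.parts)
    return f"serialization_dir/{experiment_name}/predictions/{file_name}"


selector_output = prediction_output_path(
    "select_and_answer_model_selector_for_musique_ans",
    "raw_data/musique_ans_dev.jsonl",
)
answerer_output = prediction_output_path(
    "select_and_answer_model_answerer_for_musique_ans",
    selector_output,
)
```

Chaining the two calls this way gives the selector output path first and then the longer answerer output path used in the conversion command.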

Execution by End2End Model [EX(EE)]

The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full

Download model

python download_models.py execution_model_decomposer_for_musique_ans_and_full

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
                      raw_data/musique_ans_dev.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Executor Model
execution_by_end2end_model_for_musique_ans

Download model

python download_models.py execution_by_end2end_model_for_musique_ans

Train from scratch

python run.py train experiment_configs/execution_by_end2end_model_for_musique_ans.jsonnet \
                    --serialization-dir serialization_dir/execution_by_end2end_model_for_musique_ans

Download prediction/s

python download_raw_predictions.py execution_by_end2end_model_for_musique_ans

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_end2end_model_for_musique_ans/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
                      --output-file serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
                      --predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl
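
The --predictor-args value above appears to be a JSON object passed as one single-quoted shell argument, which is easy to mistype by hand. As a hedged sketch, the string can be generated from Python keyword arguments instead. format_predictor_args is a hypothetical helper, and the key names are copied from the command above.

```python
import json
import shlex


def format_predictor_args(**options) -> str:
    """Serialize options as compact JSON, quoted for use as one shell argument."""
    compact_json = json.dumps(options, separators=(",", ":"))
    return shlex.quote(compact_json)


executor_args = format_predictor_args(
    predict_answerability=False,
    skip_distractor_paragraphs=False,
    use_predicted_decomposition=True,
)
print("--predictor-args " + executor_args)
```

In the MuSiQue-Full commands later in this document, the same keys are used with predict_answerability set to true.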

Execution by Select+Answer Model [EX(SA)]

The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full

Download model

python download_models.py execution_model_decomposer_for_musique_ans_and_full

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
                      raw_data/musique_ans_dev.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_musique_ans

Download model

python download_models.py execution_by_select_and_answer_model_selector_for_musique_ans

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_ans.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_ans

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
                      raw_data/musique_ans_single_hop_version_train.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_train.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
                      raw_data/musique_ans_single_hop_version_dev.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_musique_ans

Download model

python download_models.py execution_by_select_and_answer_model_answerer_for_musique_ans

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_ans.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_ans

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
                      --predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz","num_select":3}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl

MuSiQue-Full

End2End Model [EE]

Experiment Name

end2end_model_for_musique_full_dataset

Download model

python download_models.py end2end_model_for_musique_full_dataset

Train from scratch

python run.py train experiment_configs/end2end_model_for_musique_full_dataset.jsonnet \
                    --serialization-dir serialization_dir/end2end_model_for_musique_full_dataset

Download prediction/s

python download_raw_predictions.py end2end_model_for_musique_full_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/end2end_model_for_musique_full_dataset/model.tar.gz \
                      raw_data/musique_full_dev.jsonl \
                      --output-file serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl

Select+Answer Model [SA]

The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.

Experiment Name

# Selector Model
select_and_answer_model_selector_for_musique_full

Download model

python download_models.py select_and_answer_model_selector_for_musique_full

Train from scratch

python run.py train experiment_configs/select_and_answer_model_selector_for_musique_full.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_full

Download prediction/s

python download_raw_predictions.py select_and_answer_model_selector_for_musique_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
                      raw_data/musique_full_train.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_train.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
                      raw_data/musique_full_dev.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer Model
select_and_answer_model_answerer_for_musique_full

Download model

python download_models.py select_and_answer_model_answerer_for_musique_full

Train from scratch

python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_full.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_full

Download prediction/s

python download_raw_predictions.py select_and_answer_model_answerer_for_musique_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_full/model.tar.gz \
                      serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
                      --output-file serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl

Execution by End2End Model [EX(EE)]

The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full

Download model

python download_models.py execution_model_decomposer_for_musique_ans_and_full

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
                      raw_data/musique_full_dev.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Executor Model
execution_by_end2end_model_for_musique_full

Download model

python download_models.py execution_by_end2end_model_for_musique_full

Train from scratch

python run.py train experiment_configs/execution_by_end2end_model_for_musique_full.jsonnet \
                    --serialization-dir serialization_dir/execution_by_end2end_model_for_musique_full

Download prediction/s

python download_raw_predictions.py execution_by_end2end_model_for_musique_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_end2end_model_for_musique_full/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
                      --output-file serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
                      --predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl

Execution by Select+Answer Model [EX(SA)]

The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full

Download model

python download_models.py execution_model_decomposer_for_musique_ans_and_full

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
                      raw_data/musique_full_dev.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_musique_full

Download model

python download_models.py execution_by_select_and_answer_model_selector_for_musique_full

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_full.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
                      raw_data/musique_full_single_hop_version_train.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_train.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
                      raw_data/musique_full_single_hop_version_dev.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_musique_full

Download model

python download_models.py execution_by_select_and_answer_model_answerer_for_musique_full

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_full.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_full

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
                      --predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz","num_select":3}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl

HotpotQA

End2End Model [EE]

Experiment Name

end2end_model_for_hotpotqa_20k_dataset

Download model

python download_models.py end2end_model_for_hotpotqa_20k_dataset

Train from scratch

python run.py train experiment_configs/end2end_model_for_hotpotqa_20k_dataset.jsonnet \
                    --serialization-dir serialization_dir/end2end_model_for_hotpotqa_20k_dataset

Download prediction/s

python download_raw_predictions.py end2end_model_for_hotpotqa_20k_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/end2end_model_for_hotpotqa_20k_dataset/model.tar.gz \
                      raw_data/hotpotqa_dev_20k.jsonl \
                      --output-file serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl

Select+Answer Model [SA]

The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.

Experiment Name

# Selector Model
select_and_answer_model_selector_for_hotpotqa_20k

Download model

python download_models.py select_and_answer_model_selector_for_hotpotqa_20k

Train from scratch

python run.py train experiment_configs/select_and_answer_model_selector_for_hotpotqa_20k.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k

Download prediction/s

python download_raw_predictions.py select_and_answer_model_selector_for_hotpotqa_20k

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
                      raw_data/hotpotqa_train_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_train_20k.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
                      raw_data/hotpotqa_dev_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer Model
select_and_answer_model_answerer_for_hotpotqa_20k

Download model

python download_models.py select_and_answer_model_answerer_for_hotpotqa_20k

Train from scratch

python run.py train experiment_configs/select_and_answer_model_answerer_for_hotpotqa_20k.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k

Download prediction/s

python download_raw_predictions.py select_and_answer_model_answerer_for_hotpotqa_20k

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/model.tar.gz \
                      serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl
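
The answerer's output file name above appears to be derived from the selector's prediction path by replacing each `/` with `__`. A minimal Python sketch of that convention follows. It is inferred only from the commands in this README, and the helper name `downstream_output_path` is hypothetical, not part of the repository:

```python
from pathlib import PurePosixPath


def downstream_output_path(downstream_experiment: str, upstream_predictions: str) -> str:
    """Build the output path for a model that consumes another model's predictions.

    Assumption: the file name is the upstream path with "/" replaced by "__",
    as the commands in this README suggest.
    """
    flattened = "__".join(PurePosixPath(upstream_predictions).parts)
    return str(
        PurePosixPath("serialization_dir") / downstream_experiment / "predictions" / flattened
    )


selector_predictions = (
    "serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k"
    "/predictions/hotpotqa_dev_20k.jsonl"
)
print(downstream_output_path(
    "select_and_answer_model_answerer_for_hotpotqa_20k", selector_predictions
))
```

The printed path can be passed to `--output-file` and later to `raw_predictions_to_official_format.py`, so both commands refer to the same file.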

2WikiMultihopQA

End2End Model [EE]

Experiment Name

end2end_model_for_2wikimultihopqa_20k_dataset

Download model

python download_models.py end2end_model_for_2wikimultihopqa_20k_dataset

Train from scratch

python run.py train experiment_configs/end2end_model_for_2wikimultihopqa_20k_dataset.jsonnet \
                    --serialization-dir serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset

Download prediction/s

python download_raw_predictions.py end2end_model_for_2wikimultihopqa_20k_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/model.tar.gz \
                      raw_data/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl
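
The download and train commands in this section follow one pattern, with only the experiment name changing. As a rough illustration, the hypothetical helper `experiment_commands` below builds argument lists that mirror those commands. It runs nothing and is not part of the repository:

```python
import shlex


def experiment_commands(experiment_name: str) -> dict:
    """Return the download/train commands for one experiment as argv lists."""
    serialization_dir = f"serialization_dir/{experiment_name}"
    return {
        "download_model": ["python", "download_models.py", experiment_name],
        "train": [
            "python", "run.py", "train",
            f"experiment_configs/{experiment_name}.jsonnet",
            "--serialization-dir", serialization_dir,
        ],
        "download_predictions": ["python", "download_raw_predictions.py", experiment_name],
    }


commands = experiment_commands("end2end_model_for_2wikimultihopqa_20k_dataset")
for step, argv in commands.items():
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(f"{step}: {shlex.join(argv)}")
```

Each argument list could also be handed to `subprocess.run(argv, check=True)` from the repository root. The predict commands are left out on purpose, because their input file and `--predictor` differ per model.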

Select+Answer Model [SA]

The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.

Experiment Name

# Selector Model
select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset

Download model

python download_models.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset

Train from scratch

python run.py train experiment_configs/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset

Download prediction/s

python download_raw_predictions.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
                      raw_data/2wikimultihopqa_train_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_train_20k.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
                      raw_data/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer Model
select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset

Download model

python download_models.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset

Train from scratch

python run.py train experiment_configs/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset.jsonnet \
                    --serialization-dir serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset

Download prediction/s

python download_raw_predictions.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset

Predict with a trained or a downloaded model

python run.py predict serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/model.tar.gz \
                      serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl \
                      --predictor transformer_rc --batch-size 16 --cuda-device 0 --silent

# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl

Execution by End2End Model [EX(EE)]

The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_2wikimultihopqa

Download model

python download_models.py execution_model_decomposer_for_2wikimultihopqa

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
                      raw_data/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Executor Model
execution_by_end2end_model_for_2wikimultihopqa

Download model

python download_models.py execution_by_end2end_model_for_2wikimultihopqa

Train from scratch

python run.py train experiment_configs/execution_by_end2end_model_for_2wikimultihopqa.jsonnet \
                    --serialization-dir serialization_dir/execution_by_end2end_model_for_2wikimultihopqa

Download prediction/s

python download_raw_predictions.py execution_by_end2end_model_for_2wikimultihopqa

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
                      --predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl

Execution by Select+Answer Model [EX(SA)]

The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.

Experiment Name

# Decomposer Model
execution_model_decomposer_for_2wikimultihopqa

Download model

python download_models.py execution_model_decomposer_for_2wikimultihopqa

Train from scratch

python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
                    --serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa

Download prediction/s

python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
                      raw_data/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --predictor question_translator --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_2wikimultihopqa

Download model

python download_models.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_2wikimultihopqa.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
                      raw_data/2wikimultihopqa_single_hop_version_train_20k.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_train_20k.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
                      raw_data/2wikimultihopqa_single_hop_version_dev.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_dev.jsonl \
                      --predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent

Experiment Name

# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_2wikimultihopqa

Download model

python download_models.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa

Train from scratch

python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa.jsonnet \
                    --serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa

Download prediction/s

python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa

Predict with a trained or a downloaded model

python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/model.tar.gz \
                      serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
                      --output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
                      --predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
                      --predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz","num_select":3}'


# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl
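
The `--predictor-args` value is a JSON object passed as a single shell argument. The sketch below builds it programmatically rather than by hand, using only the keys that appear in the command above. The function name `executor_predictor_args` is an assumption for illustration:

```python
import json
import shlex


def executor_predictor_args(selector_experiment: str, num_select: int = 3) -> str:
    """Serialize the predictor arguments used in the EX(SA) predict command."""
    args = {
        "predict_answerability": False,
        "skip_distractor_paragraphs": False,
        "use_predicted_decomposition": True,
        "selector_model_path": f"serialization_dir/{selector_experiment}/model.tar.gz",
        "num_select": num_select,
    }
    # Compact separators reproduce the spacing-free form used in this README.
    return json.dumps(args, separators=(",", ":"))


predictor_args = executor_predictor_args(
    "execution_by_select_and_answer_model_selector_for_2wikimultihopqa"
)
# shlex.quote wraps the JSON in single quotes for safe use on a shell command line.
print("--predictor-args", shlex.quote(predictor_args))
```

Generating the string this way avoids quoting mistakes and keeps `selector_model_path` consistent with the selector's serialization directory.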

Citation

If you use this in your work, please cite us:

@article{trivedi2021musique,
  title={{M}u{S}i{Q}ue: Multihop Questions via Single-hop Question Composition},
  author={Trivedi, Harsh and Balasubramanian, Niranjan and Khot, Tushar and Sabharwal, Ashish},
  journal={Transactions of the Association for Computational Linguistics},
  year={2022},
  publisher={MIT Press}
}
