Code, models, and data for Zhou et al. 2022: Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts. We are currently working on making this package easier to use; any advice is welcome.
We provide the following data:
- `comparable_text_pretrain.txt.zip` (Google Drive): Distant supervision data that we used to pre-train DecompT5, as described in Section 3.
- `data/decomposition_train.txt`: The decomposition supervision we used to train the decomposition model in DecompEntail (on top of DecompT5).
- `data/entailment_train.txt`: The entailment supervision we used to train the entailment model in DecompEntail (on top of T5-3b).
- `data/strategyqa/*`: StrategyQA train/dev/test splits we used for experiments.
- `data/hotpotqa/*`: HotpotQA binary questions we used for experiments.
- `data/overnight/*`: Overnight data used for experiments.
- `data/torque/*`: Torque data used for experiments.
We provide several trained model weights used in our paper, hosted on the Huggingface hub. For multi-seed experiments, we release the weights from one randomly selected seed.
- `CogComp/l2d`: T5-large trained on `comparable_text_pretrain.txt`.
- `CogComp/l2d-decomp`: DecompT5 trained on `data/decomposition_train.txt`, used in the DecompEntail pipeline.
- `CogComp/l2d-entail`: T5-3b trained on `data/entailment_train.txt`, used in the DecompEntail pipeline.
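For a quick check that a released checkpoint loads correctly, the weights can be used with the `transformers` library. The snippet below is a minimal sketch: the example question and decoding parameters are illustrative assumptions, and the exact input format expected by `CogComp/l2d-decomp` is defined by the training scripts rather than this example.

```python
# Minimal sketch: load a released checkpoint from the Huggingface hub and
# generate a decomposition for one question. The input format and decoding
# settings here are assumptions -- see seq2seq/train_decompose.sh for the
# format actually used during training.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "CogComp/l2d-decomp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

question = "Would a pear sink in water?"  # illustrative example question
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```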
The code is divided into two separate packages, each with slightly different dependencies, as listed in the corresponding `requirements.txt`.
The `seq2seq` package can be used to reproduce DecompT5 and its related experiments in Section 5 of the paper. It is also used to train and evaluate the entailment model used in DecompEntail. We provide a few use-case examples as shell scripts:
- `seq2seq/train_decompose.sh`: Train CogComp/l2d-decomp.
- `seq2seq/train_entailment.sh`: Train CogComp/l2d-entail.
- `seq2seq/eval_entailment.sh`: Evaluate the entailment model.
In addition, we provide the generation and evaluation code for the Overnight and Torque experiments in `seq2seq/gen_seq.py`:
- To generate the top 10 candidates, use `gen_output()`.
- To evaluate the generated candidates, use `evaluate_top()`.

See code comments for more detail.
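As a rough illustration of what top-10 candidate generation involves (the actual implementation, arguments, and file handling live in `seq2seq/gen_seq.py`), beam search with `num_return_sequences=10` yields ranked candidates. The model name, input string, and decoding parameters below are placeholder assumptions, not the settings used in the paper.

```python
# Illustrative sketch of top-10 candidate generation with beam search.
# This is not the repo's gen_output(); the model, beam size, and lengths
# are placeholder assumptions -- see seq2seq/gen_seq.py for the real code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("CogComp/l2d")
model = AutoModelForSeq2SeqLM.from_pretrained("CogComp/l2d")

source = "example input sequence"  # replace with an Overnight/Torque instance
inputs = tokenizer(source, return_tensors="pt")
candidates = model.generate(
    **inputs,
    num_beams=10,
    num_return_sequences=10,  # keep the top 10 beams as candidates
    max_length=128,
)
for cand in candidates:
    print(tokenizer.decode(cand, skip_special_tokens=True))
```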
The DecompEntail pipeline can be run with the following steps:
1. Generate decompositions given raw questions.
   - This can be done by `generate_decomposition()` in `decompose/gen_facts.py`. See comments for more detail.
2. Format the generated decompositions into l2d-entail readable form.
   - This can be done by `format_to_entailment_model()` in `decompose/gen_facts.py`.
3. Run l2d-entail to get entailment scores.
   - This can be done by `seq2seq/eval_entailment.sh`, replacing the input file with the output file from the previous step.
   - If you aggregate over different seeds, concatenate the output files into one file and use it as the input to the script.
4. Majority vote to derive final labels based on the entailment scores (a minimal sketch of the vote is shown after this list).
   - The previous step outputs two files, `eval_probs.txt` and `eval_results_lm.txt`. Replace the path in `decompose/evaluator.py` and compute accuracy.
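The exact file formats consumed by `decompose/evaluator.py` are specific to this repo, but the final step is conceptually a majority vote over per-seed entailment predictions. Below is a minimal sketch assuming one binary prediction (0/1) per line in each seed's output file; the file names and layout are hypothetical.

```python
# Minimal majority-vote sketch over per-seed entailment predictions.
# Assumes each seed file has one binary prediction (0/1) per line, aligned
# across files -- the actual format is defined in decompose/evaluator.py.
from collections import Counter

def majority_vote(prediction_files):
    per_seed = []
    for path in prediction_files:
        with open(path) as f:
            per_seed.append([int(line.strip()) for line in f])
    # Vote across seeds for each instance; ties are broken arbitrarily.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_seed)]

# Hypothetical per-seed output files.
labels = majority_vote(["eval_results_seed0.txt",
                        "eval_results_seed1.txt",
                        "eval_results_seed2.txt"])
print(labels[:10])
```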
Please cite the following paper:
```bibtex
@inproceedings{ZRYR22,
    author = {Ben Zhou and Kyle Richardson and Xiaodong Yu and Dan Roth},
    title = {Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts},
    booktitle = {EMNLP},
    year = {2022},
}
```