This file describes the steps for (1) downloading the dataset, (2) processing the dataset, (3) training, and (4) evaluation.
To download and process the dataset, run the following command from the main directory:
bash prepare_nmt_dataset.sh wmt14_en_fr
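Before training, it can help to confirm that the preparation step produced output. The sketch below assumes the processed data ends up in `data-bin/wmt14_en_fr/`, which is inferred from the evaluation command in this README rather than from the script itself. The helper name `dataset_is_prepared` is hypothetical.

```python
from pathlib import Path


def dataset_is_prepared(root=".", name="wmt14_en_fr"):
    """Return True if <root>/data-bin/<name> exists and is non-empty.

    Assumption: prepare_nmt_dataset.sh writes its output to data-bin/<name>,
    as suggested by the path used in the evaluation command.
    """
    data_dir = Path(root) / "data-bin" / name
    return data_dir.is_dir() and any(data_dir.iterdir())


if __name__ == "__main__":
    if dataset_is_prepared():
        print("Dataset directory found.")
    else:
        print("Dataset directory missing or empty; run prepare_nmt_dataset.sh first.")
```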
To train a model on a single node comprising 8 V100 GPUs (each with 32 GB of memory), you can use the following command:
python nmt_wmt14_en2fr.py --d-m 256
where `--d-m` is the model dimension. In our experiments, we have only tested `--d-m` values of 128, 256, 384, 512, and 640.
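To run several of the tested model dimensions in turn, the training command can be generated programmatically. This is a minimal sketch: `build_train_command` and `TESTED_D_M` are hypothetical names, and the only flag used is the `--d-m` option shown above. It prints the commands rather than launching them.

```python
# Values of --d-m that this README reports as tested.
TESTED_D_M = (128, 256, 384, 512, 640)


def build_train_command(d_m, script="nmt_wmt14_en2fr.py"):
    """Return the training command as an argument list (hypothetical helper)."""
    if d_m not in TESTED_D_M:
        raise ValueError(f"d_m={d_m} has not been tested; choose one of {TESTED_D_M}")
    return ["python", script, "--d-m", str(d_m)]


if __name__ == "__main__":
    # Print one command per tested dimension; pass a list to subprocess.run to launch it.
    for d_m in TESTED_D_M:
        print(" ".join(build_train_command(d_m)))
```

Rejecting untested values up front avoids starting a long run with a configuration that the results here do not cover.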
To evaluate a model, you can use the following command:
python generate.py data-bin/wmt14_en_fr/ --path <results_dir>/checkpoint_best.pt --beam 5 --lenpen 0.9 --remove-bpe --batch-size 128 --quiet
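The `<results_dir>` placeholder has to be filled in for each run. The sketch below builds the evaluation command for a given results directory, using only the flags and default values shown in the command above. The helper name `build_eval_command` and the example directory are hypothetical.

```python
from pathlib import PurePosixPath


def build_eval_command(results_dir, data_dir="data-bin/wmt14_en_fr/",
                       beam=5, lenpen=0.9, batch_size=128):
    """Return the evaluation command as an argument list (hypothetical helper)."""
    # The best checkpoint is expected inside the results directory.
    checkpoint = PurePosixPath(results_dir) / "checkpoint_best.pt"
    return [
        "python", "generate.py", data_dir,
        "--path", str(checkpoint),
        "--beam", str(beam),
        "--lenpen", str(lenpen),
        "--remove-bpe",
        "--batch-size", str(batch_size),
        "--quiet",
    ]


if __name__ == "__main__":
    # "results/d_m_256" is an illustrative path, not one defined by this repository.
    print(" ".join(build_eval_command("results/d_m_256")))
```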
We obtain the following results:
| Model dimension (d_m) | Parameters | BLEU | Training Logs |
|---|---|---|---|
| 128 | 8.19 M | 34.1 | Link |
| 256 | 13.89 M | 37.3 | Link |
| 384 | 23.35 M | 38.3 | Link |
| 512 | 36.86 M | 39.6 | Link |
| 640 | 54.14 M | 40.5 | Link |
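One way to read the table is to ask how much BLEU each additional million parameters buys when moving from one model dimension to the next. The sketch below computes that ratio from the numbers in the table; the function name `marginal_bleu_per_million` is hypothetical, and the ratio is only a rough summary, not a metric used elsewhere in this README.

```python
# (d_m, parameters in millions, BLEU), copied from the table above.
RESULTS = [
    (128, 8.19, 34.1),
    (256, 13.89, 37.3),
    (384, 23.35, 38.3),
    (512, 36.86, 39.6),
    (640, 54.14, 40.5),
]


def marginal_bleu_per_million(results):
    """For each consecutive pair of rows, return (larger d_m, BLEU gain / parameter gain)."""
    gains = []
    for (_, p0, b0), (d1, p1, b1) in zip(results, results[1:]):
        gains.append((d1, (b1 - b0) / (p1 - p0)))
    return gains


if __name__ == "__main__":
    for d_m, gain in marginal_bleu_per_million(RESULTS):
        print(f"up to d_m={d_m}: {gain:.3f} BLEU per extra million parameters")
```

On these numbers the ratio shrinks at every step, so each increase in model dimension yields a smaller BLEU gain per added parameter than the previous one.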