English-Vietnamese Translation using Transformer

This is our final project for the Statistical Learning course at VNUHCM - University of Science.

Our report, which includes a detailed explanation of the Transformer, is available on Google Drive.

1. Installation

CUDA 11.x must be installed.

Installation commands:

conda create -n trans python=3.9
conda activate trans
bash install.sh

Download the trained weights from this link and put them in the ./runs folder.

Expected structure:

./runs/
    |-- <folder_name>/
        |-- config.yaml
        |-- best.pt
        |-- src_field.pt
        |-- trg_field.pt
        |-- ...
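Before launching the app, it can help to confirm that a downloaded run folder contains the files listed above. The following is a minimal sketch; `missing_run_files` is a hypothetical helper, not part of this repo, and it checks only the four named files (the "..." entries are ignored).

```python
from pathlib import Path

# Files named in the expected ./runs/<folder_name>/ layout.
# The "..." entries in that layout are not checked.
REQUIRED_FILES = ("config.yaml", "best.pt", "src_field.pt", "trg_field.pt")


def missing_run_files(run_dir):
    """Return the required file names that are absent from run_dir (hypothetical helper)."""
    run_path = Path(run_dir)
    return [name for name in REQUIRED_FILES if not (run_path / name).is_file()]
```

For example, `missing_run_files("./runs/<folder_name>")` returns an empty list when all four files are present.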

Then start the Streamlit app:

streamlit run app.py ./runs/<folder_name>


2. Training

We use the TED Talk English-Vietnamese dataset (provided in this repo by pbcquoc). Download the dataset from this link and put it in the ./data folder.

Expected structure:

./data/
    |-- train.en
    |-- train.vi
    |-- val.en
    |-- val.vi
    |-- test.en
    |-- test.vi
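Each split is a pair of files with the same prefix. The sketch below shows one way to read a split into sentence pairs, assuming (not confirmed by this README) that the files are UTF-8 with one sentence per line and that line i of the .en file is the translation of line i of the .vi file. `load_parallel_split` is a hypothetical name.

```python
from pathlib import Path


def load_parallel_split(data_dir, split):
    """Read <split>.en and <split>.vi and pair them line by line (hypothetical helper)."""
    data_path = Path(data_dir)
    en_lines = (data_path / f"{split}.en").read_text(encoding="utf-8").splitlines()
    vi_lines = (data_path / f"{split}.vi").read_text(encoding="utf-8").splitlines()
    # Assumed alignment: each English line has exactly one Vietnamese counterpart.
    if len(en_lines) != len(vi_lines):
        raise ValueError(
            f"{split}: {len(en_lines)} English lines vs {len(vi_lines)} Vietnamese lines"
        )
    return list(zip(en_lines, vi_lines))
```

A length mismatch usually means a file was truncated or downloaded incorrectly, so the sketch fails early instead of silently misaligning sentences.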

You can modify the model architecture, optimizer hyperparameters, etc. in the config file ./configs/_base_.yaml. Then run:

python ./tools/train.py \
    --config_path ./configs/_base_.yaml \
    --device cuda:0
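The training command takes two flags. As an illustration only, a command-line interface accepting them could be declared with Python's argparse as below; `build_train_parser` is hypothetical, and the real tools/train.py may define these options differently or accept more of them.

```python
import argparse


def build_train_parser():
    """Hypothetical parser mirroring the two flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train the English-Vietnamese Transformer")
    # Path to the YAML file holding model and optimizer settings.
    parser.add_argument("--config_path", required=True)
    # PyTorch device string, e.g. "cuda:0" or "cpu".
    parser.add_argument("--device", required=True)
    return parser
```

Passing `--device cpu` instead of `cuda:0` is the usual way to run such a script on a machine without a GPU, though training a Transformer on CPU is slow.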

3. Evaluation

To evaluate a trained model, pass the run folder produced by training:

python tools/eval.py \
    --runs_path ./runs/2023-06-28_16-57-19 \
    --beam-size 3 \
    --device cuda:0

You can also add the --run-train-set argument to evaluate on the training set as well, but this takes a long time to complete.
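The --beam-size flag suggests that evaluation decodes translations with beam search. The following is a generic, framework-free sketch of beam search, not this project's implementation: `next_token_log_probs` is a hypothetical callback standing in for the decoder, `<eos>` is an assumed end-of-sentence token, and there is no length penalty, so shorter outputs may be favored.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search(
    next_token_log_probs: Callable[[Sequence[str]], Dict[str, float]],
    beam_size: int = 3,
    eos: str = "<eos>",
    max_len: int = 20,
) -> List[str]:
    """Return the highest-scoring token sequence found by beam search.

    next_token_log_probs(prefix) must return {token: log_prob} for the next token.
    """
    # Each hypothesis is (cumulative log-probability, tokens so far).
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, log_p in next_token_log_probs(tokens).items():
                candidates.append((score + log_p, tokens + [token]))
        # Keep only the beam_size best extensions across all live hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break

    # Fall back to unfinished hypotheses if none reached <eos> within max_len.
    pool = finished or beams
    if not pool:
        return []
    _, best_tokens = max(pool, key=lambda c: c[0])
    return [t for t in best_tokens if t != eos]
```

With beam_size=1 this reduces to greedy decoding. A larger beam keeps alternatives whose first token is less likely but whose full sequence scores higher, at the cost of roughly beam_size times more decoder calls per step.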