I wrote an attention-based seq2seq model for neural machine translation. It can run on multiple GPUs (one PC with multiple GPUs).

My data was downloaded from nlp.stanford.edu/projects/nmt/ and the model was trained on the small English-Vietnamese dataset. Pickled data is available if you want it.
- `build_dict.py`: preprocesses the dataset. I preprocessed the input dataset into pickle files: strings are converted into int32 ids and sentence pairs are filtered by length (3~50). Note your file path. A sketch of this step is shown after this list.
- `config.py`: some model parameters.
- `model_topbah.py`: Bahdanau attention on the top layer of the decoder and encoder.
- `train_vi.py`: entry point for training. Set up your own parameters at the beginning of this file. At line 37, set gpu_id like `gpus = "5,6,7"`, with no spaces in the string.
- `gpuloader.py`: data loader for multi-GPU training.
- `dataloader.py`: data loader that feeds data into `tf.placeholder`. Not used.
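
For reference, here is a minimal sketch of the preprocessing idea described above. The function and variable names are illustrative placeholders, not the actual code in `build_dict.py`: tokens are mapped to int32 ids, and sentence pairs outside the 3~50 length range are dropped before pickling.

```python
import pickle
import numpy as np

def encode(sentence, vocab, unk_id=0):
    # Map each token to its id; unknown tokens fall back to unk_id.
    return np.array([vocab.get(tok, unk_id) for tok in sentence.split()],
                    dtype=np.int32)

def preprocess(src_lines, tgt_lines, src_vocab, tgt_vocab,
               min_len=3, max_len=50):
    # Keep only pairs whose source and target lengths are both in [min_len, max_len].
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        s, t = encode(src, src_vocab), encode(tgt, tgt_vocab)
        if min_len <= len(s) <= max_len and min_len <= len(t) <= max_len:
            pairs.append((s, t))
    return pairs

# Example usage (paths and vocab dicts are placeholders; note your file path):
# with open("train.en") as f_src, open("train.vi") as f_tgt:
#     pairs = preprocess(f_src, f_tgt, src_vocab, tgt_vocab)
# with open("train.pkl", "wb") as f:
#     pickle.dump(pairs, f)
```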
- You can try Luong attention as well, but I didn't get good results with Luong; it is not so easy to train. RMSProp and Adam need a small learning rate (like 0.001), while SGD needs a bigger one (like 1.0), but SGD is much harder to train.
- The best result was `output att=False, rmsp, (lr=0.001, start_decay=8000, 0.8)`, which got BLEU = 20.5% on tst2012.vi without beam search.
- The decode phase is not tested.
- I spent a lot of time writing the multi-GPU training: how to feed in the data, and how to compute the loss and gradients? One common answer is sketched below.
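
Below is a minimal sketch of the usual tower pattern (the same idea as the cifar10 multi-GPU example linked in the references): replicate the model on each GPU, compute per-tower losses and gradients, then average the gradients and apply them once. `batch_for_gpu` and `build_model` here are tiny stand-ins, not this repo's functions, and the learning-rate schedule only approximates the (lr=0.001, start_decay=8000, 0.8) setting (the decay interval is assumed).

```python
import os
import tensorflow as tf

# Same GPU-selection format as train_vi.py: a comma-separated string, no spaces.
gpus = "5,6,7"
os.environ["CUDA_VISIBLE_DEVICES"] = gpus
num_gpus = len(gpus.split(","))

global_step = tf.train.get_or_create_global_step()
# Rough equivalent of (lr=0.001, start_decay=8000, 0.8): hold lr at 0.001,
# then decay by 0.8 every 1000 steps after step 8000 (decay_steps is assumed).
lr = tf.train.exponential_decay(0.001, tf.maximum(global_step - 8000, 0),
                                decay_steps=1000, decay_rate=0.8, staircase=True)
opt = tf.train.RMSPropOptimizer(lr)

def batch_for_gpu(i):
    # Placeholder input pipeline: in the repo this role is played by gpuloader.py.
    return tf.placeholder(tf.float32, [None, 16], name="x_gpu%d" % i)

def build_model(x):
    # Stand-in for the real seq2seq tower: one shared variable and a scalar loss,
    # just so the gradient-averaging pattern below is runnable.
    w = tf.get_variable("w", [16, 1])
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

def average_gradients(tower_grads):
    # tower_grads: one list of (grad, var) pairs per GPU; average grads variable-wise.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars if g is not None]
        grad = tf.reduce_mean(tf.stack(grads, axis=0), axis=0)
        averaged.append((grad, grads_and_vars[0][1]))
    return averaged

tower_grads, tower_losses = [], []
for i in range(num_gpus):
    # AUTO_REUSE shares the model variables across towers; each tower gets its own batch.
    with tf.device("/gpu:%d" % i), tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        loss_i = build_model(batch_for_gpu(i))
        tower_losses.append(loss_i)
        tower_grads.append(opt.compute_gradients(loss_i))

train_op = opt.apply_gradients(average_gradients(tower_grads),
                               global_step=global_step)
loss = tf.reduce_mean(tower_losses)
```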
My code, especially `model.py` and `train.py`, is not well organized; it may be updated if I have spare time, and I may add comments. I also want to use the `tf.data.Dataset` API, but I wonder how to set validation_per_train_step; one possible pattern is sketched below.
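
One common way to handle that with `tf.data` is a feedable iterator: build separate train and validation datasets, switch between them through a string handle, and run a validation pass every N training steps. A minimal sketch, where the toy data, `loss`, and `train_op` are placeholders rather than this repo's code:

```python
import numpy as np
import tensorflow as tf

def make_dataset(pairs, batch_size, shuffle):
    # pairs: list of (src_ids, tgt_ids); pad each batch to its longest sentence.
    ds = tf.data.Dataset.from_generator(
        lambda: iter(pairs), (tf.int32, tf.int32),
        (tf.TensorShape([None]), tf.TensorShape([None])))
    if shuffle:
        ds = ds.shuffle(10000)
    return ds.padded_batch(batch_size, padded_shapes=([None], [None])).repeat()

# Toy data so the sketch runs; in practice, load the pickled pairs instead.
train_pairs = [(np.arange(5, dtype=np.int32), np.arange(6, dtype=np.int32))] * 100
val_pairs = train_pairs[:10]

train_ds = make_dataset(train_pairs, 8, shuffle=True)
val_ds = make_dataset(val_pairs, 8, shuffle=False)

handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, train_ds.output_types, train_ds.output_shapes)
src, tgt = iterator.get_next()  # feed these into the model instead of tf.placeholder

# Stand-ins for the real graph:
loss = tf.reduce_mean(tf.cast(src, tf.float32))
train_op = tf.no_op()

validate_every = 1000  # i.e. validation_per_train_step
with tf.Session() as sess:
    train_handle = sess.run(train_ds.make_one_shot_iterator().string_handle())
    val_handle = sess.run(val_ds.make_one_shot_iterator().string_handle())
    for step in range(5000):
        sess.run(train_op, feed_dict={handle: train_handle})
        if step % validate_every == 0:
            print("val loss:", sess.run(loss, feed_dict={handle: val_handle}))
```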
- https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py # good for beginners
- https://github.com/tensorflow/nmt/tree/master/nmt
- https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate