[Official] Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

This repository contains the code for the paper "Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks", presented at EMNLP 2022.

Reproducibility Checklist

We used "bert-base-multilingual-cased", which has a vocabulary of about 120,000 tokens and about 180M parameters.
We used a GeForce RTX 3090 GPU. Training MUSC on XNLI, the most time-consuming task, takes about 2 days.
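
The model size quoted above can be sanity-checked with a little arithmetic. The sketch below is not part of this repository: it assumes the hyperparameters of the publicly released "bert-base-multilingual-cased" configuration (vocabulary of 119,547 tokens, hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 segment types) and counts the encoder weights including the pooler, without any task-specific head. The function name `count_bert_parameters` is hypothetical.

```python
def count_bert_parameters(
    vocab_size=119_547,    # assumed: public config of bert-base-multilingual-cased
    hidden=768,
    layers=12,
    intermediate=3072,
    max_positions=512,
    type_vocab=2,
):
    """Count the weights of a BERT encoder (embeddings, layers, pooler)."""

    def linear(n_in, n_out):
        # Weight matrix plus bias vector of a dense layer.
        return n_in * n_out + n_out

    layer_norm = 2 * hidden  # scale and shift vectors

    # Word, position, and segment embeddings share one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections plus LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm

    # Feed-forward block: expand, project back, then LayerNorm.
    feed_forward = (
        linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
    )

    pooler = linear(hidden, hidden)

    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    total = count_bert_parameters()
    print(f"{total:,} parameters")
```

Under these assumptions the count comes to roughly 178M, consistent with the "about 180M" figure, and more than half of it sits in the word-embedding table because of the large multilingual vocabulary.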

How to start

Run every step from the repository root directory.

Set conda env

cd data
bash install_tools.sh

Download datasets

For the MLDoc dataset, refer to https://github.com/facebookresearch/MLDoc
For the MARC dataset, refer to https://docs.opendata.aws/amazon-reviews-ml/readme.html
For the XTREME datasets (XNLI, PAWS-X):

source activate fsxlt
conda install -c conda-forge transformers
pip install networkx==1.11

cd data
bash scripts/download_data.sh

MUSC (refer to the exps folder)

source activate fsxlt
pip install -r requirements.txt

Contact

Jaehoon Oh: jhoon.oh@kaist.ac.kr
Jongwoo Ko: jongwoo.ko@kaist.ac.kr

References

How Multilingual is Multilingual BERT?
Cross-lingual Language Model Pretraining
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (https://github.com/google-research/xtreme)
A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters (https://github.com/fsxlt/code)
https://github.com/fsxlt/buckets
