EMOTTS: Multilingual Emotion-Controlled Voice Cloning Text-to-Speech System

EMOTTS is a TTS model based on VITS that controls the emotion of the output speech through natural-language descriptions and the speaker identity through reference audio.

Create Env

conda create -n emo python=3.8
conda activate emo

Install packages

pip install -r requirements.txt
python env.py

Download Pre-trained Model

Download the model from this link, then put the files into /chinese-roberta-wwm-ext
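If the checkpoint behind the link is the publicly released hfl/chinese-roberta-wwm-ext (an assumption; the repo does not confirm the exact checkpoint here), it can also be fetched with Hugging Face transformers and saved into the expected folder:

```python
# Hedged sketch: fetch chinese-roberta-wwm-ext from the Hugging Face Hub.
# Assumes the repo expects the public "hfl/chinese-roberta-wwm-ext" weights;
# if the linked checkpoint differs, use the repo's download link instead.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

tokenizer.save_pretrained("chinese-roberta-wwm-ext")
model.save_pretrained("chinese-roberta-wwm-ext")
```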

Collecting Data

Collect the data by this.

Preprocessing

Use this code to complete the following preprocessing steps (a hedged sketch of steps 1 and 2 follows the commands below):

  1. Convert the audio to a single channel, resample it to 22050 Hz, and save it in WAV format.
  2. Merge and slice the audio into 10-second segments.
  3. Use ASR to transcribe the speech to text.
  4. Store the audio, emotion, and text in three folders with matching file names.
# Store each audio path with its corresponding text and emotion, then split into training and validation sets.
python getdata.py
python split.py
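For reference, here is a minimal sketch of steps 1 and 2 (downmixing/resampling and 10-second slicing), assuming librosa and soundfile are available. It is an illustration only, not the repo's getdata.py, and the ASR step is omitted:

```python
# Hedged preprocessing sketch: convert audio to mono 22050 Hz WAV and
# slice it into 10 s segments. Illustrative only; not the repo's getdata.py.
from pathlib import Path

import librosa
import soundfile as sf

def preprocess(in_path: str, out_dir: str, sr: int = 22050, seg_s: float = 10.0):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # librosa resamples to `sr` and downmixes to mono on load.
    wav, _ = librosa.load(in_path, sr=sr, mono=True)
    hop = int(sr * seg_s)
    for k in range(0, len(wav), hop):
        sf.write(out / f"seg_{k // hop:04d}.wav", wav[k:k + hop], sr)
```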

Build Monotonic Alignment Search

cd monotonic_align
python setup.py build_ext --inplace
cd ..
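The extension compiled here is the monotonic alignment search (MAS) used by VITS and Glow-TTS: a dynamic program that finds the highest-likelihood monotonic alignment between text tokens and spectrogram frames. A slow pure-Python illustration of the same idea (not the repo's Cython implementation):

```python
# Hedged sketch of monotonic alignment search; the repo's Cython extension
# performs the same dynamic program in compiled loops.
import numpy as np

def monotonic_alignment_search(log_p):
    """Best monotonic path through a (T_text, T_mel) log-likelihood matrix:
    each mel frame is assigned to one text token, indices never decrease."""
    T_text, T_mel = log_p.shape
    Q = np.full((T_text, T_mel), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, T_mel):
        for i in range(min(j + 1, T_text)):  # token i unreachable before frame i
            stay = Q[i, j - 1]
            move = Q[i - 1, j - 1] if i > 0 else -np.inf
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the last token/frame to recover the binary path.
    path = np.zeros(log_p.shape, dtype=np.int64)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and j > 0 and Q[i - 1, j - 1] >= Q[i, j - 1]:
            i -= 1
    return path
```

The pure-Python version is far too slow for training-time use, which is why the Cython extension must be built before training.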

Training

python train.py -c path/to/json -m model

Here -c points to the training configuration JSON and -m names the model/run directory (the usual VITS train.py convention; check train.py for the exact flags).

Inference

python infer.py

Per the model description, inference conditions on a reference audio for the speaker identity and a natural-language description for the emotion; see infer.py for the expected inputs.
