# Deep Speech 2: End-to-End Speech Recognition


This repository contains an implementation of the paper *Deep Speech 2: End-to-End Speech Recognition* using Lightning AI ⚡. Deep Speech 2 (2015) was a state-of-the-art automatic speech recognition (ASR) model that transcribes speech into text and is trained end to end with deep learning.

## Installation

1. Clone the repository:

   ```bash
   git clone --recursive https://github.com/LuluW8071/Deep-Speech-2.git
   cd Deep-Speech-2
   ```

2. Install PyTorch and the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Ensure you have PyTorch and Lightning AI installed.

## Dataset

This implementation supports LibriSpeech. The dataset is downloaded and preprocessed automatically during training.
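Purely for reference, the sketch below shows how a LibriSpeech split can be fetched with `torchaudio`'s built-in dataset class. This is an assumption about the underlying mechanism; the repository's own data module may do it differently.

```python
# Reference sketch (assumption): fetching a LibriSpeech split with
# torchaudio's built-in loader; the repo's data pipeline may differ.
import torchaudio

# Downloads and extracts the split into ./data/LibriSpeech on first use
train_set = torchaudio.datasets.LIBRISPEECH(
    "./data", url="train-clean-100", download=True
)

waveform, sample_rate, transcript, *_ = train_set[0]
print(sample_rate, transcript)
```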

## Usage

### Training

> [!IMPORTANT]
> Before training, make sure you have set your Comet ML API key and project name in the environment variable file `.env`.
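For reference, a minimal `.env` could look like the following. The variable names are an assumption based on Comet ML's standard environment variables; check the repository's code for the exact keys it reads.

```
# Hypothetical .env contents -- variable names assume Comet ML's defaults
COMET_API_KEY=your_comet_api_key
COMET_PROJECT_NAME=deep-speech-2
```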

To train the Deep Speech 2 model with the default configuration, run:

```bash
python3 train.py
```

You can customize the training parameters by passing arguments to `train.py`. Refer to the table below for the available hyperparameters and training configurations.

| Argument | Description | Default |
| --- | --- | --- |
| `-g`, `--gpus` | Number of GPUs per node | `1` |
| `-w`, `--num_workers` | Number of CPU workers | `8` |
| `-db`, `--dist_backend` | Distributed backend to use for training | `ddp_find_unused_parameters_true` |
| `--epochs` | Number of total epochs to run | `50` |
| `--batch_size` | Batch size | `32` |
| `-lr`, `--learning_rate` | Learning rate | `1e-5` |
| `--checkpoint_path` | Checkpoint path to resume training from | `None` |
| `--precision` | Training precision | `16-mixed` |
For example:

```bash
python3 train.py \
  -g 4 \
  -w 8 \
  --epochs 10 \
  --batch_size 64 \
  -lr 2e-5 \
  --precision 16-mixed \
  --checkpoint_path path_to_checkpoint.ckpt
```

## Model Architecture

*Figure: Deep Speech 2 model architecture.*
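For orientation, here is a simplified PyTorch sketch of the architecture described in the paper: a convolutional front-end over spectrograms, a stack of bidirectional GRUs, and a linear output layer producing per-frame logits for CTC. The layer sizes below are illustrative assumptions, not the exact values used in this repository.

```python
# Simplified Deep Speech 2 sketch; layer sizes are illustrative assumptions.
import torch.nn as nn

class DeepSpeech2(nn.Module):
    def __init__(self, n_feats=128, rnn_hidden=512, n_rnn_layers=5, n_classes=29):
        super().__init__()
        # Convolutional front-end over (batch, 1, features, time) spectrograms;
        # Hardtanh(0, 20) is the clipped ReLU used in the paper.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        rnn_input = 32 * (n_feats // 4)  # channels x downsampled feature dim
        self.rnn = nn.GRU(rnn_input, rnn_hidden, num_layers=n_rnn_layers,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(rnn_hidden * 2, n_classes)  # per-frame logits for CTC

    def forward(self, x):                        # x: (batch, 1, n_feats, time)
        x = self.conv(x)                         # (batch, 32, n_feats//4, time//2)
        b, c, f, t = x.shape
        x = x.view(b, c * f, t).transpose(1, 2)  # (batch, time, features)
        x, _ = self.rnn(x)                       # (batch, time, 2*rnn_hidden)
        return self.fc(x)                        # (batch, time, n_classes)
```

For CTC training, the logits are typically log-softmaxed and permuted to `(time, batch, classes)` before being passed to `nn.CTCLoss`.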

## References

- Dario Amodei et al., *Deep Speech 2: End-to-End Speech Recognition in English and Mandarin*, 2015. [arXiv:1512.02595](https://arxiv.org/abs/1512.02595)