
A research project that aims to create an application built on end-to-end models. These models take in an audio signal and directly output transcriptions. Using deep learning models, the project built an automatic end-to-end speech recognition pipeline for Swahili.


nebasam/Speech-to-text-deep-learning-model

 
 


Speech Recognition

Live Transcription of Swahili Audio to Swahili Text


Introduction

The World Food Programme (WFP) wants to collect nutritional information about food bought and sold in Kenya. Selected participants install an app on their mobile phones, and whenever they buy food, they use their voice to activate the app and register, in Swahili, the list of items they have bought. The app is expected to transcribe their speech to text live and organize the information in a database in an easy-to-process form.

Objective

This project builds, trains and deploys a deep learning model that transcribes Swahili audio to Swahili text.

How to start

  • Machine Setup:

First, make sure Python 3 is installed.

Next, clone the repository:

git clone https://github.com/10Academy-Group-4/Week-4

Finally, install the requirements. The command below is written for Anaconda users; otherwise, replace pip with pip3 and python with python3:

pip install -r requirements.txt

  • Docker:

This is a containerized Flask application. A Docker image with all prerequisites installed is published on Docker Hub. To use it:

Pull the Docker image:

docker pull nebasam/stt-swahili

Run the Docker image, publishing the container's port 33507 on the same port of the host:

docker run --rm -it -p 33507:33507/tcp nebasam/stt-swahili:latest
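Once the container is running, the -p 33507:33507/tcp flag makes the container's port 33507 reachable on the host's port 33507. The sketch below only checks that something accepts TCP connections on that port. It assumes the container runs on the local machine, and it deliberately does not call any route of the Flask app, since those routes are not documented here; the function name is hypothetical.

```python
import socket


def is_service_up(host: str = "localhost", port: int = 33507, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket; closing it
        # immediately is enough to confirm that something is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("stt-swahili container reachable:", is_service_up())
```

A plain TCP check keeps the sketch independent of the app's URLs; once the port answers, the web interface can be opened in a browser.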

Data

Data Features

Input features (X): audio clips of spoken words
Target labels (y): text transcripts of what was spoken
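End-to-end models usually predict text one character at a time, so the target labels (y) have to be converted into integer sequences before training. The following is a minimal sketch assuming a character-level vocabulary with id 0 reserved for padding or the blank symbol. The function names and the sample Swahili phrases are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List, Tuple


def build_vocab(transcripts: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map every character seen in the transcripts to an integer id.

    Id 0 is reserved for padding/blank (an assumption; adjust to the model's needs).
    """
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(text: str, char_to_id: Dict[str, int]) -> List[int]:
    """Convert a transcript to character ids, skipping characters not in the vocabulary."""
    return [char_to_id[ch] for ch in text.lower() if ch in char_to_id]


def decode(ids: List[int], id_to_char: Dict[int, str]) -> str:
    """Convert character ids back to text, ignoring the reserved id 0."""
    return "".join(id_to_char[i] for i in ids if i != 0)


# Made-up example transcripts: "I have bought rice", "sugar and flour".
transcripts = ["nimenunua mchele", "sukari na unga"]
char_to_id, id_to_char = build_vocab(transcripts)
ids = encode("mchele na sukari", char_to_id)
print(ids)
print(decode(ids, id_to_char))
```

In practice the vocabulary would be built once from the full training transcripts and saved, so that training and inference use the same mapping.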

Directory Structure

  • Artifacts: a directory containing artifacts, such as metadata files, generated throughout the project
  • Notebook: a directory containing notebooks that describe the functionality of the classes used for metadata generation and preprocessing
  • Scripts: a directory containing scripts for metadata generation, preprocessing and feature extraction
  • test_data: a directory containing the data used to run tests on every commit or merge to the main branch
  • tests: a directory containing the test code run on every commit or merge to the main branch
  • data.dvc: the DVC file used to version the data
  • requirements.txt: the project's dependencies
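The metadata generation scripts themselves are not reproduced here. As a hedged sketch of what such a step can look like, the snippet below pairs WAV files with their transcripts and records each clip's duration using Python's standard wave module. The transcript file format (one "<utterance_id> <transcript>" per line), the directory layout and the key/duration/text field names are all assumptions made for illustration.

```python
import wave
from pathlib import Path
from typing import Dict, List


def wav_duration(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def generate_metadata(audio_dir: Path, transcript_file: Path) -> List[Dict[str, object]]:
    """Pair each WAV file with its transcript and duration.

    Hypothetical layout: each line of transcript_file is "<utterance_id> <transcript>"
    and the matching audio lives at audio_dir/<utterance_id>.wav.
    """
    records: List[Dict[str, object]] = []
    for line in transcript_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        utt_id, _, text = line.strip().partition(" ")
        wav_path = audio_dir / f"{utt_id}.wav"
        if not wav_path.exists():
            continue  # transcript without matching audio
        records.append({
            "key": str(wav_path),
            "duration": wav_duration(wav_path),
            "text": text.strip(),
        })
    return records
```

Recording durations alongside the transcripts makes it easy to, for example, drop clips that are too long for the model before training.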

Testing

Python's built-in unittest library was used to test the functions and classes in the project. A .travis.yml file was added to automate testing of every commit or merge to the main branch. The data used for testing is in the test_data directory.
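A minimal sketch of a test written with unittest in the style described above. The normalize_transcript helper is hypothetical and stands in for one of the project's own preprocessing functions.

```python
import unittest


def normalize_transcript(text: str) -> str:
    """Hypothetical preprocessing helper: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


class TestNormalizeTranscript(unittest.TestCase):
    def test_lowercases_text(self):
        self.assertEqual(normalize_transcript("Mchele"), "mchele")

    def test_collapses_whitespace(self):
        self.assertEqual(normalize_transcript("  sukari   na\tunga "), "sukari na unga")

    def test_empty_string(self):
        self.assertEqual(normalize_transcript(""), "")
```

Saved in the tests directory under a name matching test*.py (for example a hypothetical tests/test_preprocessing.py), tests like this can be run locally with `python -m unittest discover -s tests` before pushing.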

Modelling

To see how the models are set up and investigated, take a look at the Models, WordError and Augmentation notebooks.
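The WordError notebook is not reproduced here. As a hedged illustration of word error rate (WER), the standard speech recognition metric, the sketch below counts the minimum number of word substitutions, deletions and insertions that turn the hypothesis into the reference, then divides by the number of reference words. It is a plain Levenshtein distance over words, not necessarily the implementation used in the notebook.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (single-row Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word ("na") out of four reference words: prints 0.25
print(word_error_rate("nimenunua mchele na sukari", "nimenunua mchele sukari"))
```

Because insertions are counted, WER can exceed 1.0 when the model outputs many extra words.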

Deployment

The user interface was built with Flask. The model was dockerized and deployed to Heroku at https://swahili-stt.herokuapp.com/

Contributors

  1. Michael Darko Ahwireng
  2. Toyin Hawau Olamide
  3. Nebiyu Samuel
  4. Sibitenda Harriet
  5. Same Michael
  6. Mubarak Sani
  7. Khairat Ayinde



Languages

  • Jupyter Notebook 91.4%
  • Python 6.9%
  • Other 1.7%