STT-data-collection

The purpose of this challenge is to build a data engineering pipeline that can record millions of Amharic speakers reading digital texts on in-app and web platforms.

Table of contents

Introduction
Installation
Folders
Technologies
Contributors

Introduction

There are many text corpora for Amharic and Swahili. Our client, 10 Academy, wants to gather a vast amount of quality audio data from different applications by displaying sentences from a text corpus and recording users as they read the displayed text. The goal is to build a robust, large-scale, fault-tolerant, highly available Kafka cluster that can be used to post a sentence and receive an audio file.
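The "post a sentence, receive an audio file" exchange could be sketched as two message types, one per direction. This is a minimal illustration, not the project's actual schema: the topic names, field names, and helper functions below are all assumptions, and the sketch only covers serialization, so it runs without a Kafka broker.

```python
import base64
import json
from dataclasses import asdict, dataclass

# Hypothetical topic names; the repository does not specify them.
TEXT_TOPIC = "text-prompts"
AUDIO_TOPIC = "audio-recordings"


@dataclass
class TextPrompt:
    """A sentence sent to a client app for a speaker to read aloud."""
    prompt_id: str
    language: str
    sentence: str


@dataclass
class AudioReply:
    """A recording returned by the client, linked to its prompt."""
    prompt_id: str
    audio_b64: str  # raw audio bytes, base64-encoded so they fit in JSON


def encode(message) -> bytes:
    """Serialize a message dataclass to UTF-8 JSON bytes."""
    return json.dumps(asdict(message), ensure_ascii=False).encode("utf-8")


def decode_text_prompt(raw: bytes) -> TextPrompt:
    """Rebuild a TextPrompt from bytes produced by encode()."""
    return TextPrompt(**json.loads(raw.decode("utf-8")))


def decode_audio_reply(raw: bytes) -> AudioReply:
    """Rebuild an AudioReply from bytes produced by encode()."""
    return AudioReply(**json.loads(raw.decode("utf-8")))


def make_reply(prompt: TextPrompt, audio: bytes) -> AudioReply:
    """Pair recorded audio with the prompt it answers."""
    return AudioReply(prompt.prompt_id, base64.b64encode(audio).decode("ascii"))


def audio_bytes(reply: AudioReply) -> bytes:
    """Recover the raw audio bytes from a reply."""
    return base64.b64decode(reply.audio_b64)
```

The bytes returned by `encode` are the kind of payload a Kafka producer client would send as a message value. Carrying `prompt_id` in both messages lets a consumer match each recording to the sentence that was read.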

Installation

Kafka installation
Airflow installation
Spark installation

Folders

data :
notebooks :
scripts :
tests :
