The purpose of this challenge is to build a data engineering pipeline that allows recording millions of Amharic speakers reading digital texts in-app and web platforms.
There are many text corpuses for Amharic and Swahili. Our client 10 academy wants to gather vast amount of quality audio data from diffrent applications by displaying text corpus and record users reading the displayed text. And build robust, large scale, fault tolerant, highly available Kafka cluster that can be used to post a sentence and receive an audio file.
- kafka installation
- airflow installation
- spark installation
- data :
- notebooks :
- scripts :
- tests :