Data-Eng-Tut

Data warehouse tech stack with MySQL, dbt, Airflow, and Spark

A local tech stack for building an ELT data pipeline of any kind. The stack consists of MySQL, PostgreSQL, dbt, Spark, Airflow, and Docker.

Business Need

You and your colleagues have come together to create an AI startup that deploys sensors to businesses and collects data on all activities in a company, from people's interactions to the smart appliances installed on site, as well as environmental and other relevant readings. Your startup is responsible for installing all the required sensors, receiving a stream of data from them, and analysing that data to provide key insights to the business. The objective of your contract with the client is to reduce the cost of running the client's facility while increasing the livability and productivity of its workers.

In this challenge you are tasked with creating a scalable data warehouse tech stack that will help you provide the AI service to the client. By the end of this project, you should produce a tool that can serve as the basis for the data warehouse needs of your startup.

Tasks

Complete the following tasks:

- Create a DAG in Airflow that uses the Bash or Python operator to load the data files into your database. Think about a useful separation of Prod, Dev, and Staging (see the first sketch after this list).

- Connect dbt to your DWH and write transformation code for the data, which you can execute via the Bash or Python operator in Airflow (second sketch below). Write proper documentation for your data models and access the dbt docs UI for presentation.

- Check additional dbt modules that can support you with data quality monitoring, e.g. great_expectations, dbt_expectations, or re-data (third sketch below).

- Connect the reporting environment and create a dashboard out of this data.
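A minimal sketch of the loading DAG, assuming Airflow 2.x with a MySQL warehouse reachable at host `mysql`; the DAG id, schema, file, and table names are placeholders, not the repository's actual code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="load_sensor_data",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Bulk-load a raw CSV into the staging schema. Credentials come from
    # the MYSQL_PWD environment variable; all names here are placeholders.
    # Separating Prod/Dev/Staging can be as simple as parameterising the
    # target schema per Airflow environment.
    load_raw_data = BashOperator(
        task_id="load_raw_data",
        bash_command=(
            "mysql --local-infile=1 -h mysql -u root staging -e "
            "\"LOAD DATA LOCAL INFILE '/data/sensor_data.csv' "
            "INTO TABLE sensor_readings "
            "FIELDS TERMINATED BY ',' IGNORE 1 LINES;\""
        ),
    )
```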
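A sketch of the dbt step, assuming the dbt project is mounted at `/dbt` inside the Airflow worker and its `profiles.yml` already points at the warehouse:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="run_dbt_transformations",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the transformation models against the warehouse.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /dbt && dbt run --profiles-dir .",
    )
    # Build the static site that backs the dbt docs UI.
    dbt_docs = BashOperator(
        task_id="dbt_docs_generate",
        bash_command="cd /dbt && dbt docs generate --profiles-dir .",
    )
    dbt_run >> dbt_docs
```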
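For the data quality task, one option is to declare dbt_expectations tests in the models' `schema.yml` files and run them as a gate in the same pipeline. The sketch below assumes the package is listed in the project's `packages.yml`:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_quality_checks",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # dbt deps installs the packages declared in packages.yml
    # (e.g. dbt_expectations); dbt test then runs every declared test.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /dbt && dbt deps && dbt test --profiles-dir .",
    )
```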

⛏️ Built Using

  • Python - Programming language
  • Airflow - Orchestration and automation
  • dbt - Data transformation using SQL SELECT statements
  • Redash - Dashboard environment
  • Spark - Big data loading and transformation
  • Superset - Dashboard environment (migrated from Redash)
