
# Project: Build a Data Warehouse using MySQL, Airflow and dbt

This project builds a scalable data warehouse tech stack to support an AI service for a client. The data used is sensor data in CSV format; the same data is also available as Parquet files from Data (ucdavis.edu). An ELT pipeline extracts the data from the links on that site, stages it in a MySQL database, and transforms it with dbt. The whole process is automated with Airflow.

## Table of Contents

- [Project Structure](#project-structure)
- [ELT Pipeline](#elt-pipeline)
- [Built With](#built-with)
- [License](#license)

## Project Structure

```
airflow
└── dags
    ├── create_station_Summary.sql   # database/table creation using MySQL
    ├── insert_station_summary.sql   # SQL file for loading data
    ├── load_data_airflow.py         # loads data
    └── dbt_airflow.py               # transforms data using dbt
DBT
└── models
    └── merged_station.sql           # SQL file for transforming tables
```

## ELT Pipeline

### load_data_airflow.py

The ELT pipeline builder runs two tasks (a sketch of the DAG follows the list):

1. `create_tables`: creates the database and tables in MySQL, automated with Airflow
2. `load_tables`: loads the raw data from the CSV files into the staging tables, automated with Airflow
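A minimal sketch of how these two tasks could be wired together, assuming the Airflow MySQL provider is installed; the DAG id, schedule, and connection id are assumptions, not taken from the repo:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator

with DAG(
    dag_id="load_station_data",         # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: create the database/tables from the checked-in SQL file.
    create_tables = MySqlOperator(
        task_id="create_tables",
        mysql_conn_id="mysql_default",  # assumed Airflow connection id
        sql="create_station_Summary.sql",
    )

    # Step 2: load the raw CSV data into the staging tables.
    load_tables = MySqlOperator(
        task_id="load_tables",
        mysql_conn_id="mysql_default",
        sql="insert_station_summary.sql",
    )

    create_tables >> load_tables  # create first, then load
```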

### dbt_airflow.py

Transforms the staged tables using the SQL model files under `DBT/models` and automates the runs with Airflow, as sketched below.
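A minimal sketch of how the dbt run could be triggered from Airflow with a `BashOperator`; the project and profiles directories are assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_transform",             # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the dbt models (e.g. merged_station.sql) against the staging tables.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )
```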

## Built With

- MySQL
- Apache Airflow
- dbt

## License

MIT