
# Your Future Bona Fide Work Experience and Externship Projects to Impress Recruiters

## Embark on Your Future Portfolio Journey: Data Engineering and ML Engineering Projects with Microsoft-Partnered Companies

This project builds an ETL pipeline that provides temperature, population, and immigration statistics for different cities. Data is extracted from multiple datasets, transformed with Apache Spark, and written out as JSON files. The JSON files are uploaded to S3 and loaded into a Redshift database via Apache Airflow, where reusable tasks perform further transformation and loading into normalized fact and dimension tables. Data-quality checks are run to ensure accuracy and integrity.
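A hedged sketch of the Spark transform step is shown below; the dataset paths and column names are assumptions for illustration, not the project's actual schema:

```python
# Minimal sketch: read raw city datasets, aggregate and join with Spark,
# and write JSON files for later upload to S3 and COPY into Redshift.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("city_stats_etl").getOrCreate()

# Hypothetical input files and columns.
temps = spark.read.csv("data/temperatures.csv", header=True, inferSchema=True)
population = spark.read.csv("data/population.csv", header=True, inferSchema=True)

city_stats = (
    temps.groupBy("city")
         .agg(F.avg("avg_temp").alias("avg_temperature"))
         .join(population.select("city", "total_population"), on="city")
)

# The JSON output is what Airflow later stages to S3 and loads into Redshift.
city_stats.write.mode("overwrite").json("output/city_stats_json")
```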


The purpose of this project was to build a dynamic ETL data pipeline with automation and monitoring. The pipeline is built from reusable tasks that allow for easy backfills. It uses custom operators to stage the data, fill the data warehouse, and run a data-quality check as the final step to catch any discrepancies in the datasets.
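As a minimal sketch of one such reusable custom operator, here is a data-quality check in standard Airflow style; the class name, connection ID, and table list are illustrative, not the project's exact code:

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

class DataQualityOperator(BaseOperator):
    """Fail the DAG run if any target table is empty."""

    def __init__(self, redshift_conn_id, tables, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.tables = tables

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = hook.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {table} is empty")
            self.log.info("Data quality check passed for %s", table)
```

Because the operator is parameterized by connection and table list, the same task can be reused across DAGs and re-run cleanly during backfills.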


The purpose of this project is to build an ETL pipeline that extracts song data from an S3 bucket and transforms it to make it suitable for analysis. The results can feed business-intelligence and visualization apps that help the analytics team better understand which songs are commonly listened to on the app.
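The extract step for this kind of pipeline typically stages S3 data into Redshift with a `COPY` command. The sketch below assumes a reachable cluster; the bucket, IAM role, and connection details are placeholders:

```python
import psycopg2

# Hypothetical staging query: bucket path and IAM role are placeholders.
COPY_STAGING_SONGS = """
    COPY staging_songs
    FROM 's3://my-bucket/song_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
    FORMAT AS JSON 'auto';
"""

conn = psycopg2.connect(
    "host=<redshift-endpoint> dbname=dev user=awsuser password=<secret> port=5439"
)
with conn, conn.cursor() as cur:
    cur.execute(COPY_STAGING_SONGS)  # commits on successful exit from the block
conn.close()
```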


The purpose of this project was to build an ETL pipeline that extracts song and log data from an S3 bucket, processes it with Spark, and loads it back into S3 as a set of dimensional tables stored as Spark Parquet files. This helps analysts continue finding insights into what their users are listening to.
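A minimal sketch of this kind of Spark job follows; the bucket names and the columns chosen for the songs dimension table are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("song_log_etl").getOrCreate()

# Hypothetical input layout: nested JSON song files under an S3 prefix.
song_data = spark.read.json("s3a://input-bucket/song_data/*/*/*/*.json")

# Build a songs dimension table and write it back to S3 as Parquet,
# partitioned so analysts can prune by year and artist.
songs_table = (
    song_data.select("song_id", "title", "artist_id", "year", "duration")
             .dropDuplicates(["song_id"])
)
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://output-bucket/songs/")
```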

The purpose of this project was to create a Kafka server to produce data and to ingest that data through Spark Structured Streaming.
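The ingest side might look like the following sketch, which reads a Kafka topic with Spark Structured Streaming and echoes the payload to the console; the broker address and topic name are placeholders:

```python
from pyspark.sql import SparkSession

# Note: the Kafka source requires the spark-sql-kafka package on the classpath.
spark = SparkSession.builder.appName("kafka_ingest").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string to inspect it.
query = (
    stream.selectExpr("CAST(value AS STRING)")
    .writeStream.outputMode("append")
    .format("console")
    .start()
)
query.awaitTermination()
```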


In this project, I constructed a streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, I built an event pipeline around Kafka that lets me simulate and display the status of train lines in real time.
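As a hedged sketch of the producer side of such a pipeline, using `confluent_kafka`; the topic name and event fields below are hypothetical, not the project's actual schema:

```python
import json
from confluent_kafka import Producer

# Placeholder broker address for a local development cluster.
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Hypothetical simulated train-arrival event.
event = {"station": "Clark/Lake", "line": "blue", "status": "arriving"}
producer.produce("org.cta.train.arrivals", value=json.dumps(event).encode("utf-8"))

# Block until all buffered messages are delivered to the broker.
producer.flush()
```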

## Kafka Internals