
IMDB Movie Data ETL Pipeline using S3, Glue, Redshift, EventBridge, SNS


ShikhaYadav123/AWS-Glue-IMDB-Data-Quality-ETL-Pipeline


IMDB Data Quality ETL Pipeline :

Technology Used :

Amazon Web Services (AWS): S3 (Simple Storage Service), Glue Crawler, Glue Data Catalog, Glue Visual ETL, Redshift, CloudWatch, EventBridge, SNS (Simple Notification Service)

Overview :

I have created an ETL pipeline that extracts data from a source S3 bucket, transforms it with AWS Glue to ensure data quality and consistency, and loads it into Redshift for further analysis and reporting.

Architecture :

[Figure: architecture diagram]

Features :

1) Data Extraction: Stored the raw IMDB movie data in Amazon S3 for secure, scalable storage, and extracted its metadata using a Glue Crawler.

2) Data Transformation: Used AWS Glue to transform the data and apply data quality and consistency rules. This included handling missing values, converting data types, and applying business rules.

3) Data Loading: Loaded the transformed data into the destination Redshift table for analysis, and wrote the bad data to an S3 bucket for further analysis.

4) Automation and Monitoring: Integrated Amazon CloudWatch for logging and for monitoring ETL job performance. Set up Amazon EventBridge to trigger ETL jobs based on specific events.

5) Alerting and Notifications: Configured Amazon SNS to send notifications on ETL job statuses and failures, so that issues are reported promptly and can be resolved quickly.
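The transformation and loading steps above (features 2 and 3) can be sketched in plain Python. This is only an illustration of the idea of routing rows by data quality rules, not the Glue Visual ETL job from this repository. The column names `title` and `imdb_score`, the 0 to 10 score range, and the rule names are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def check_row(row: Row) -> List[str]:
    """Return the names of the (hypothetical) quality rules this row fails."""
    failures: List[str] = []

    # Rule 1: handle missing values, here a missing or blank title.
    title = row.get("title")
    if title is None or not str(title).strip():
        failures.append("missing_title")

    # Rule 2: data type conversion, the score must parse as a number.
    try:
        score = float(row.get("imdb_score"))
    except (TypeError, ValueError):
        failures.append("invalid_imdb_score")
    else:
        # Rule 3: an assumed business rule, scores lie between 0 and 10.
        if not 0.0 <= score <= 10.0:
            failures.append("imdb_score_out_of_range")

    return failures


def split_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into good rows (for Redshift) and bad rows (for S3)."""
    good: List[Row] = []
    bad: List[Row] = []
    for row in rows:
        failures = check_row(row)
        if failures:
            # Keep the failed rule names with the row for later analysis.
            bad.append({**row, "failed_rules": failures})
        else:
            # Store the score with its converted numeric type.
            good.append({**row, "imdb_score": float(row["imdb_score"])})
    return good, bad


if __name__ == "__main__":
    sample = [
        {"title": "Example A", "imdb_score": "8.3"},
        {"title": "", "imdb_score": "7.0"},
        {"title": "Example C", "imdb_score": "n/a"},
    ]
    good_rows, bad_rows = split_rows(sample)
    print(len(good_rows), "good rows,", len(bad_rows), "bad rows")
```

In the pipeline itself, the good rows correspond to the data loaded into the Redshift table and the bad rows to the data written to the S3 bucket.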

[Figures: imdb-glue-visual-ETL, Redshift-table]
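For the alerting feature (5), one common way to connect Glue job statuses to SNS is an EventBridge rule that matches "Glue Job State Change" events. The sketch below builds such an event pattern and formats a notification from a matching event. The job name `imdb-etl-job` and both helper functions are hypothetical, and the actual rule configuration used in this repository may differ.

```python
import json
from typing import Any, Dict


def glue_job_event_pattern(job_name: str) -> Dict[str, Any]:
    """Build an EventBridge event pattern for terminal states of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "state": ["SUCCEEDED", "FAILED", "TIMEOUT", "STOPPED"],
        },
    }


def format_notification(event: Dict[str, Any]) -> Dict[str, str]:
    """Turn a Glue job state change event into an SNS subject and message."""
    detail = event.get("detail", {})
    job = detail.get("jobName", "unknown-job")
    state = detail.get("state", "UNKNOWN")
    message = json.dumps(
        {
            "jobName": job,
            "state": state,
            "jobRunId": detail.get("jobRunId"),
            "message": detail.get("message"),
        },
        indent=2,
    )
    return {"Subject": f"Glue job {job}: {state}", "Message": message}


if __name__ == "__main__":
    # "imdb-etl-job" is a placeholder name, not the real job name.
    print(json.dumps(glue_job_event_pattern("imdb-etl-job"), indent=2))
    sample_event = {
        "source": "aws.glue",
        "detail-type": "Glue Job State Change",
        "detail": {"jobName": "imdb-etl-job", "state": "FAILED"},
    }
    print(format_notification(sample_event)["Subject"])
```

With boto3, the pattern could be passed as `EventPattern=json.dumps(...)` to the EventBridge client's `put_rule`, and the formatted subject and message to the SNS client's `publish` together with a `TopicArn`. Both calls require AWS credentials and are left out of the sketch so that it runs locally.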

Dataset Used :

Dataset link: https://www.kaggle.com/datasets/thedevastator/netflix-imdb-scores
