An ETL pipeline that moves an uploaded flat file from GCS, masks PII, stores the result in BigQuery, and creates a report in Looker.


GCP_ETL_Project

First, we create a fake employee dataset in Python using the Faker library, then upload it to a Google Cloud Storage bucket from the same Python program. Next, we use Wrangler in Data Fusion to concatenate columns and mask Personally Identifiable Information (PII). Finally, we load the resulting table into BigQuery and create a report in Looker.
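The Wrangler step itself is configured in the Data Fusion UI, but the transformation it performs can be sketched in plain Python. The column names below (`first_name`, `ssn`, and so on) are illustrative assumptions, not the project's actual schema:

```python
# Plain-Python sketch of the Wrangler transform: concatenate the name
# columns and mask PII fields. In the real pipeline this is done inside
# Data Fusion Wrangler, not in Python.

def mask_value(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`."""
    if len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

def wrangle(row: dict) -> dict:
    """Concatenate name columns into full_name and mask SSN/phone."""
    return {
        "full_name": f"{row['first_name']} {row['last_name']}",
        "ssn": mask_value(row["ssn"]),
        "phone": mask_value(row["phone"]),
        "department": row["department"],
    }

row = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "ssn": "123-45-6789",
    "phone": "555-867-5309",
    "department": "Engineering",
}
print(wrangle(row))  # full_name joined, SSN/phone masked to last 4 chars
```

Keeping only the last four characters of a masked field preserves enough information for spot-checking joins while hiding the identifier itself.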

Screenshot of the pipeline

Finally, we automate the workflow using Apache Airflow in Cloud Composer.
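A minimal sketch of what that Composer DAG could look like. All resource names (bucket, pipeline, instance, region) and the `upload_fake_employees` callable are placeholders, not this project's actual configuration; the operator shown is the Google provider's standard one for triggering a Data Fusion pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionStartPipelineOperator,
)

def upload_fake_employees() -> None:
    """Placeholder for the Faker-based generate-and-upload script."""
    ...

# Names below are illustrative placeholders.
with DAG(
    dag_id="employee_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="generate_and_upload",
        python_callable=upload_fake_employees,
    )
    transform_load = CloudDataFusionStartPipelineOperator(
        task_id="start_datafusion_pipeline",
        pipeline_name="etl-pipeline",    # placeholder pipeline name
        instance_name="datafusion-dev",  # placeholder instance name
        location="us-central1",          # placeholder region
    )
    extract >> transform_load            # run upload before the pipeline
```

The Data Fusion pipeline already handles the transform and the BigQuery load, so the DAG only needs to stage the file and trigger it.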
