This repo contains all material for Udacity's Data Engineer Nanodegree.
It is structured around the following projects:
- Data modeling with PostgreSQL
- Data modeling with Apache Cassandra
- Data warehousing in the AWS cloud with Amazon Redshift
- ETL with Apache Spark in the AWS cloud on Elastic MapReduce (EMR)
- Data pipelines with Apache Airflow
- Capstone project: streaming and processing tweets and climate data with tweepy, Kinesis, Comprehend, S3, and Redshift