First project for Big Data course held at Roma Tre University (Jupyter Notebook; updated Jun 26, 2019)
Project for Cloud Computing course (A.Y. 2018/2019)
Apache Spark sandbox on GCP and Amazon EMR.
Monte Carlo stock simulation using Apache Spark.
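A minimal sketch of the per-path logic such a Monte Carlo stock simulation typically parallelizes with Spark (all parameters and function names here are hypothetical, not taken from that project; in PySpark the paths would be distributed with something like `sc.parallelize(range(n_paths)).map(...)`). Each path follows a simple daily-return random walk:

```python
import random
import statistics

def simulate_path(s0, mu, sigma, days, rng):
    # One simulated price path: start at s0, apply a random
    # daily return drawn from N(mu, sigma) for each trading day.
    price = s0
    for _ in range(days):
        price *= 1 + rng.gauss(mu, sigma)
    return price

def monte_carlo(s0, mu, sigma, days, n_paths, seed=42):
    # Average the final prices over many independent paths.
    rng = random.Random(seed)
    finals = [simulate_path(s0, mu, sigma, days, rng) for _ in range(n_paths)]
    return statistics.mean(finals)

# Example: estimated expected price after one trading year (252 days),
# with a small positive daily drift and 1% daily volatility.
estimate = monte_carlo(s0=100.0, mu=0.0004, sigma=0.01, days=252, n_paths=2000)
```

In Spark, only the aggregation step changes: the per-path function stays the same and the mean is computed across the cluster instead of in a local list.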
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.
Implements a work queue for Dataproc Workflow Template executions.
Data is fetched from StackExchange, transformed using Pig, queried and stored in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.
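The TF-IDF computation that project performs in Hive reduces to term frequency times inverse document frequency. A plain-Python sketch of the formula (illustrative only; the actual project expresses this as HiveQL over StackExchange posts):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists, e.g. one list per user's posts.
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # tf-idf(t) = (count of t / doc length) * log(N / df(t))
        scores.append({t: (tf[t] / total) * math.log(n / df[t]) for t in tf})
    return scores

docs = [["spark", "hive"], ["hive", "pig"]]
# "hive" appears in every document, so its idf is log(2/2) = 0.
scores = tf_idf(docs)
```

Terms common to all documents score zero, which is exactly why TF-IDF surfaces the distinctive vocabulary of each top user rather than shared filler words.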
Hadoop Google DataProc DIO study
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.
Processes large amounts of data and implements complex data analyses using Spark. The dataset, made available by Google, covers a cluster of 12,500 machines and its activity over 29 days.
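A typical analysis over such a cluster trace is a keyed aggregation, e.g. average CPU usage per machine. A plain-Python sketch of that logic (field names hypothetical; in Spark this would be a `reduceByKey` or `groupBy().avg()` over (machine_id, usage) pairs parsed from the trace files):

```python
from collections import defaultdict

def avg_usage_per_machine(records):
    # records: iterable of (machine_id, cpu_usage) pairs,
    # as might be parsed from the cluster-trace CSV files.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for machine_id, usage in records:
        sums[machine_id] += usage
        counts[machine_id] += 1
    # Per-machine mean usage over the observed interval.
    return {m: sums[m] / counts[m] for m in sums}

records = [("m1", 0.2), ("m1", 0.4), ("m2", 0.6)]
averages = avg_usage_per_machine(records)  # m1 averages ~0.3, m2 ~0.6
```

The Spark version distributes the same sum/count accumulation across executors, which is what makes the 29-day, 12,500-machine trace tractable.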
Marketing Campaign Data Analysis Using Apache Spark (PySpark)
Car Insurance Cold Calls Data Analysis using Apache Hive
Movie Rating Analysis using Apache Spark (PySpark)