Movie Rating Analysis using Apache Spark (pyspark)
Updated Nov 8, 2023 - Jupyter Notebook
Car Insurance Cold Calls Data Analysis using Apache Hive
Marketing Campaign Data Analysis Using Apache Spark (PySpark)
Processes large amounts of data and implements complex data analyses using Spark. The dataset, made available by Google, describes a cluster of 12,500 machines and the activity on that cluster over 29 days.
E-commerce pipelines on GCP. Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. Batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.
Hadoop Google DataProc DIO study
Data is fetched from StackExchange, transformed using Pig, queried and stored in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.
Implements a work queue for Dataproc Workflow Template executions
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.
Monte Carlo stock simulation using Apache Spark.
Apache Spark sandbox on GCP and Amazon EMR.
Project for Cloud Computing course (A.Y. 2018/2019)
First project for Big Data course held at Roma Tre University