First project for Big Data course held at Roma Tre University (Jupyter Notebook; updated Jun 26, 2019)
Project for Cloud Computing course (A.Y. 2018/2019)
Apache Spark sandbox on GCP and Amazon EMR.
Monte Carlo stock simulation using Apache Spark.
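A minimal sketch of the per-path logic such a Monte Carlo stock simulation typically parallelizes with Spark (all parameters and function names here are hypothetical, not taken from that project; in PySpark the paths would be distributed with something like `sc.parallelize(range(n_paths)).map(...)`). Each path follows a simple daily-return random walk:

```python
import random
import statistics

def simulate_path(s0, mu, sigma, days, rng):
    # One simulated price path: start at s0, apply a random
    # daily return drawn from N(mu, sigma) for each trading day.
    price = s0
    for _ in range(days):
        price *= 1 + rng.gauss(mu, sigma)
    return price

def monte_carlo(s0, mu, sigma, days, n_paths, seed=42):
    # Average the final prices over many independent paths.
    rng = random.Random(seed)
    finals = [simulate_path(s0, mu, sigma, days, rng) for _ in range(n_paths)]
    return statistics.mean(finals)

# Example: estimated expected price after one trading year (252 days),
# with a small positive daily drift and 1% daily volatility.
estimate = monte_carlo(s0=100.0, mu=0.0004, sigma=0.01, days=252, n_paths=2000)
```

In Spark, only the aggregation step changes: the per-path function stays the same and the mean is computed across the cluster instead of in a local list.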
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.
Implements a work queue for Dataproc Workflow Template executions.
Data is fetched from StackExchange, transformed using Pig, queried and stored in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.
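The TF-IDF computation that project performs in Hive reduces to term frequency times inverse document frequency. A plain-Python sketch of the formula (illustrative only; the actual project expresses this as HiveQL over StackExchange posts):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists, e.g. one list per user's posts.
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # tf-idf(t) = (count of t / doc length) * log(N / df(t))
        scores.append({t: (tf[t] / total) * math.log(n / df[t]) for t in tf})
    return scores

docs = [["spark", "hive"], ["hive", "pig"]]
# "hive" appears in every document, so its idf is log(2/2) = 0.
scores = tf_idf(docs)
```

Terms common to all documents score zero, which is exactly why TF-IDF surfaces the distinctive vocabulary of each top user rather than shared filler words.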
Hadoop Google DataProc DIO study
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.
Processes large amounts of data and implements complex data analyses using Spark. The dataset, made available by Google, covers a cluster of 12,500 machines and its activity over 29 days.
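A typical analysis over such a cluster trace is a keyed aggregation, e.g. average CPU usage per machine. A plain-Python sketch of that logic (field names hypothetical; in Spark this would be a `reduceByKey` or `groupBy().avg()` over (machine_id, usage) pairs parsed from the trace files):

```python
from collections import defaultdict

def avg_usage_per_machine(records):
    # records: iterable of (machine_id, cpu_usage) pairs,
    # as might be parsed from the cluster-trace CSV files.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for machine_id, usage in records:
        sums[machine_id] += usage
        counts[machine_id] += 1
    # Per-machine mean usage over the observed interval.
    return {m: sums[m] / counts[m] for m in sums}

records = [("m1", 0.2), ("m1", 0.4), ("m2", 0.6)]
averages = avg_usage_per_machine(records)  # m1 averages ~0.3, m2 ~0.6
```

The Spark version distributes the same sum/count accumulation across executors, which is what makes the 29-day, 12,500-machine trace tractable.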
Marketing Campaign Data Analysis Using Apache Spark (PySpark)
Car Insurance Cold Calls Data Analysis using Apache Hive
Movie Rating Analysis using Apache Spark (PySpark)