gkarozis/Spark

Spark

This repository uses Apache Spark to compute queries. Apache Spark offers two basic APIs for implementing queries: the RDD API and the DataFrame API / Spark SQL.
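
As a rough illustration of how the two APIs differ, the sketch below computes the same aggregation (the average of a numeric value per key) once with the RDD API and once with Spark SQL. The input layout (headerless CSV rows of the form key,value), the column names, the view name records, and all function names are assumptions made for this example; they are not taken from the queries in this repository.

```python
def parse_line(line):
    """Turn a 'key,value' CSV line into (key, (value, 1)) for averaging."""
    key, value = line.split(",")
    return key, (float(value), 1)


def add_pairs(a, b):
    """Combine two (sum, count) pairs into one."""
    return a[0] + b[0], a[1] + b[1]


def average_per_key_rdd(spark, path):
    """RDD API: the computation is spelled out as explicit map/reduce steps."""
    lines = spark.sparkContext.textFile(path)
    sums = lines.map(parse_line).reduceByKey(add_pairs)
    return sums.mapValues(lambda p: p[0] / p[1]).collect()


def average_per_key_sql(spark, path):
    """Spark SQL: the same result is declared as a query over a temp view."""
    # Hypothetical schema, given as a DDL string.
    df = spark.read.csv(path, schema="key STRING, value DOUBLE")
    df.createOrReplaceTempView("records")
    rows = spark.sql(
        "SELECT key, AVG(value) AS avg_value FROM records GROUP BY key"
    ).collect()
    return [(r["key"], r["avg_value"]) for r in rows]
```

Both functions expect an existing SparkSession, for example one obtained with SparkSession.builder.getOrCreate() from pyspark.sql. The RDD version works on raw lines and plain Python functions, while the Spark SQL version leaves the execution plan to Spark's optimizer.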

The code files contain the queries implemented in Apache Spark. The Q2-Q5.py files implement the queries with the RDD API, and the Q2b-Q5b.py files implement them with Spark SQL; in both cases the dataset is read in CSV format. The Q2c-Q5c.py files also use Spark SQL, but read the dataset in Parquet format. The conversion from CSV to Parquet is done in Parquet.py.
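
A CSV-to-Parquet conversion step could look roughly like the following. This is a minimal sketch and not the contents of Parquet.py: the function names, the header and inferSchema options, and the rule for deriving the output path are all assumptions.

```python
import os


def parquet_path_for(csv_path):
    """Derive an output path by swapping the .csv extension for .parquet."""
    root, _ext = os.path.splitext(csv_path)
    return root + ".parquet"


def csv_to_parquet(spark, csv_path, header=True):
    """Read a CSV file into a DataFrame and write it back out as Parquet."""
    # inferSchema makes Spark scan the data to guess column types (assumed here).
    df = spark.read.csv(csv_path, header=header, inferSchema=True)
    out_path = parquet_path_for(csv_path)
    df.write.mode("overwrite").parquet(out_path)
    return out_path


def load_parquet(spark, parquet_path):
    """Parquet stores the schema with the data, so no read options are needed."""
    return spark.read.parquet(parquet_path)
```

Given a SparkSession named spark, calling csv_to_parquet(spark, "data/input.csv") would write data/input.parquet (a hypothetical path). Since Parquet is a columnar format that carries its own schema, queries that read it do not need to parse text or infer column types on each run.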
