Spark

This repository uses Apache Spark to compute queries. Apache Spark offers two basic APIs for implementing queries: the RDD API and the DataFrame API / Spark SQL.
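To make the difference between the two APIs concrete, here is a minimal sketch of one aggregation written both ways. It is not taken from the repository's queries. The input layout (a headerless CSV with a category in the first column and an amount in the second), the column names `category` and `amount`, the view name `sales`, and the function names are all assumptions made for illustration.

```python
from operator import add


def parse_line(line):
    """Split one CSV line into (category, amount).

    Assumption: fields contain no quoted commas, so a plain split is enough.
    """
    fields = line.split(",")
    return fields[0], float(fields[1])


def total_by_category_rdd(spark, path):
    """RDD API: read raw text lines, parse them by hand, sum amounts per key."""
    lines = spark.sparkContext.textFile(path)
    return lines.map(parse_line).reduceByKey(add).collect()


def total_by_category_sql(spark, path):
    """DataFrame API / Spark SQL: load the CSV as a table and query it."""
    # Hypothetical schema, given as a DDL string instead of being inferred.
    df = spark.read.csv(path, schema="category STRING, amount DOUBLE")
    df.createOrReplaceTempView("sales")
    query = "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
    return spark.sql(query).collect()
```

Both functions expect an existing `SparkSession` as `spark`. The RDD version spells out the parsing and the key-value reduction step by step, while the SQL version only declares the result and leaves the execution plan to Spark's optimizer.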

The code folder contains the queries implemented in Apache Spark. The Q2-Q5.py files implement the queries with the RDD API, and the Q2b-Q5b.py files implement them with Spark SQL; both sets read the dataset in CSV format. The Q2c-Q5c.py files also use Spark SQL but read the dataset in Parquet format. The conversion from CSV to Parquet is done in Parquet.py.
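A CSV-to-Parquet conversion of the kind described above could look roughly like the following sketch. It does not reproduce the contents of Parquet.py. The function names, the headerless-CSV assumption, and the choice to write the output next to the input are all hypothetical.

```python
import os


def parquet_path_for(csv_path):
    """Return the output path: same location, .parquet instead of .csv."""
    root, _ext = os.path.splitext(csv_path)
    return root + ".parquet"


def csv_to_parquet(spark, csv_path):
    """Read a CSV file with Spark and write it back out in Parquet format."""
    # Assumption: the file has no header row. inferSchema=True makes Spark
    # take an extra pass over the data to guess the column types.
    df = spark.read.csv(csv_path, header=False, inferSchema=True)
    out_path = parquet_path_for(csv_path)
    # Overwrite any earlier output so the conversion can be re-run safely.
    df.write.mode("overwrite").parquet(out_path)
    return out_path
```

Because Parquet stores the schema together with the data, the converted dataset can later be loaded with `spark.read.parquet(path)` without any schema inference.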