This project uses Apache Spark to compute queries. Apache Spark offers two basic APIs for implementing queries: the RDD API and the DataFrame API / Spark SQL.
The code files contain the queries implemented in Apache Spark. The Q2-Q5.py files implement the queries with the RDD API, and the Q2b-Q5b.py files implement them with Spark SQL. In both cases the dataset is read in CSV format. The Q2c-Q5c.py files also implement the queries with Spark SQL, but read the dataset in Parquet format. The conversion from CSV to Parquet is done in the Parquet.py file.