Explanations of all the Spark SQL, RDD, DataFrame, and Dataset examples in this project are available at https://sparkbyexamples.com/. All of these examples are written in Scala and tested in our development environment.

Table of Contents (Spark Examples in Scala)

Spark RDD Examples

  • Create a Spark RDD using Parallelize
  • Spark – Read multiple text files into single RDD?
  • Spark load CSV file into RDD
  • Different ways to create Spark RDD
  • Spark – How to create an empty RDD?
  • Spark RDD Transformations with examples
  • Spark RDD Actions with examples
  • Spark Pair RDD Functions
  • Spark Repartition() vs Coalesce()
  • Spark Shuffle Partitions
  • Spark Persistence Storage Levels
  • Spark RDD Cache and Persist with Example
  • Spark Broadcast Variables
  • Spark Accumulators Explained
  • Convert Spark RDD to DataFrame | Dataset

Spark SQL Tutorial

  • Spark Create DataFrame with Examples
  • Spark DataFrame withColumn
  • Ways to Rename column on Spark DataFrame
  • Spark – How to Drop a DataFrame/Dataset column
  • Working with Spark DataFrame Where Filter
  • Spark SQL “case when” and “when otherwise”
  • Collect() – Retrieve data from Spark RDD/DataFrame
  • Spark – How to remove duplicate rows
  • How to Pivot and Unpivot a Spark DataFrame
  • Spark SQL Data Types with Examples
  • Spark SQL StructType & StructField with examples
  • Spark schema – explained with examples
  • Spark Groupby Example with DataFrame
  • Spark – How to Sort DataFrame column explained
  • Spark SQL Join Types with examples
  • Spark DataFrame Union and UnionAll
  • Spark map vs mapPartitions transformation
  • Spark foreachPartition vs foreach | what to use?
  • Spark DataFrame Cache and Persist Explained
  • Spark SQL UDF (User Defined Functions)
  • Spark SQL DataFrame Array (ArrayType) Column
  • Working with Spark DataFrame Map (MapType) column
  • Spark SQL – Flatten Nested Struct column
  • Spark – Flatten nested array to single array column
  • Spark explode array and map columns to rows

Spark SQL Functions

  • Spark SQL String Functions Explained
  • Spark SQL Date and Time Functions
  • Spark SQL Array functions complete list
  • Spark SQL Map functions – complete list
  • Spark SQL Sort functions – complete list
  • Spark SQL Aggregate Functions
  • Spark Window Functions with Examples

Spark Data Source API

  • Spark Read CSV file into DataFrame
  • Spark Read and Write JSON file into DataFrame
  • Spark Read and Write Apache Parquet
  • Spark Read XML file using Databricks API
  • Read & Write Avro files using Spark DataFrame
  • Using Avro Data Files From Spark SQL 2.3.x or earlier
  • Spark Read from & Write to HBase table | Example
  • Create Spark DataFrame from HBase using Hortonworks
  • Spark Read ORC file into DataFrame
  • Spark 3.0 Read Binary File into DataFrame

Spark Streaming & Kafka

  • Spark Streaming – Different Output modes explained
  • Spark Streaming files from a directory
  • Spark Streaming – Reading data from TCP Socket
  • Spark Streaming with Kafka Example
  • Spark Streaming – Kafka messages in Avro format
  • Spark SQL Batch Processing – Produce and Consume Apache Kafka Topic