Spark and Python for Big Data with PySpark

PySpark was set up for this course using any one of the following methods:

  1. Ubuntu + Spark + Python on Virtual Box
  2. Amazon EC2 with Python and Spark
  3. Databricks Notebook System
  4. AWS EMR Notebook (Not Free)

Machine learning techniques implemented using PySpark:

  1. Linear Regression
  2. Logistic Regression
  3. Tree Methods
     i. Decision Trees
     ii. Random Forests
     iii. Gradient Boosted Trees
  4. K-means Clustering
  5. Recommender Systems
  6. Natural Language Processing
  7. Spark Streaming with Twitter data