Spark and Python for Big Data with PySpark. Distributed Machine Learning

mgamzec/Big-Data-with-Pyspark-in-Python

Spark and Python for Big Data with PySpark

PySpark was set up for this course using one of the following methods:

  1. Ubuntu + Spark + Python on Virtual Box
  2. Amazon EC2 with Python and Spark
  3. Databricks Notebook System
  4. AWS EMR Notebook (Not Free)

Machine learning techniques implemented using PySpark:

  1. Linear Regression
  2. Logistic Regression
  3. Tree Methods
     i. Decision Trees
     ii. Random Forests
     iii. Gradient Boosted Trees
  4. K-means Clustering
  5. Recommender Systems
  6. Natural Language Processing
  7. Spark Streaming via Twitter
