Spark-nlp-Pyspark

This project builds a Spark ML Pipeline with PySpark to perform natural language processing on a dataset. The pipeline uses Spark NLP annotators to produce part-of-speech (POS) and named entity recognition (NER) annotations.

Getting Started

To get started with the project, you need Spark and PySpark installed on your machine. You also need to import the necessary libraries and load the pretrained models for English.
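As a rough illustration of the imports and pretrained English models mentioned above, here is a minimal sketch of how such a pipeline might be assembled. The model names (pos_anc, glove_100d, ner_dl), the column names, and the helper build_pipeline are assumptions made for illustration; the project's actual annotators and models are not listed in this README.

```python
# Assumed pretrained English model names from the Spark NLP Models Hub.
# These are illustrative choices, not necessarily the ones this project uses.
PRETRAINED_MODELS = {
    "pos": "pos_anc",            # part-of-speech tagger
    "embeddings": "glove_100d",  # word embeddings required by the NER model
    "ner": "ner_dl",             # named entity recognizer
}


def build_pipeline(text_col="text"):
    """Return an unfitted Spark ML Pipeline that adds 'pos' and 'ner' columns.

    Hypothetical helper: it expects an active Spark NLP session, because
    loading pretrained models needs a running Spark context.
    """
    # Imported inside the function so the module itself loads even when
    # PySpark and Spark NLP are not yet installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        NerDLModel,
        PerceptronModel,
        Tokenizer,
        WordEmbeddingsModel,
    )

    # Wrap the raw text column into Spark NLP's document annotation type.
    document = DocumentAssembler().setInputCol(text_col).setOutputCol("document")
    # Split each document into tokens.
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    # Tag each token with its part of speech.
    pos = (
        PerceptronModel.pretrained(PRETRAINED_MODELS["pos"], "en")
        .setInputCols(["document", "token"])
        .setOutputCol("pos")
    )
    # Compute the embeddings that the NER model consumes.
    embeddings = (
        WordEmbeddingsModel.pretrained(PRETRAINED_MODELS["embeddings"], "en")
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )
    # Label each token with an entity tag.
    ner = (
        NerDLModel.pretrained(PRETRAINED_MODELS["ner"], "en")
        .setInputCols(["document", "token", "embeddings"])
        .setOutputCol("ner")
    )
    return Pipeline(stages=[document, tokenizer, pos, embeddings, ner])
```

With a session started through sparknlp.start() and a DataFrame df that has a text column, build_pipeline().fit(df).transform(df) would be expected to add the pos and ner columns.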

Prerequisites

Installing

To install Spark and PySpark, follow the instructions on their respective websites. To install the Spark NLP library, run the following command in a notebook cell (the leading ! runs it as a shell command; omit it when installing from a terminal):

!pip install spark-nlp

Running the Application

Run the application by executing the script file that contains the pipeline. The pipeline reads the input dataset and prints the transformed DataFrame, showing only the POS and NER columns. As a bonus, only the result attribute of each annotation is shown. The result attributes of NER and POS are then collected, and the relationship between the detected entities and their part-of-speech tags is analyzed and explained.
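The analysis step can be illustrated with a small, Spark-free sketch. It assumes the collected POS and NER results are token-aligned lists of strings and that the NER tags follow the IOB convention (B-PER, I-PER, O). The helper name pos_by_entity and the sample tags are hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict


def pos_by_entity(pos_tags, ner_tags):
    """Count which POS tags occur inside each entity type.

    Both lists must have one entry per token, in the same order.
    """
    if len(pos_tags) != len(ner_tags):
        raise ValueError("POS and NER results must be token-aligned")

    counts = defaultdict(Counter)
    for pos, ner in zip(pos_tags, ner_tags):
        if ner == "O":  # token is outside any entity
            continue
        entity_type = ner.split("-", 1)[-1]  # "B-PER" -> "PER"
        counts[entity_type][pos] += 1
    return {entity: dict(tags) for entity, tags in counts.items()}


# Hypothetical collected results for a six-token sentence.
pos_tags = ["NNP", "NNP", "VBZ", "IN", "NNP", "."]
ner_tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]

summary = pos_by_entity(pos_tags, ner_tags)
print(summary)
```

A summary of this kind makes it easy to check, for example, whether the detected entities are mostly tagged as proper nouns.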

GhaidaaShtayeh/spark-nlp-Pyspark

Folders and files

Latest commit

History

Repository files navigation

Spark-nlp-Pyspark

Getting Started

Prerequisites

Installing

Running the Application

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages