- Downloaded the data from Kaggle; it has a shape of (1600000, 6), i.e. 1,600,000 rows and 6 columns.
- Cleaned the data & removed all the stopwords using the stopwords corpus from NLTK.
- Stemmed the tweets using the Porter Stemmer & stored the result as a separate feature.
- Performed train_test_split on the stemmed data & filled in any missing values arising from this.
- Transformed the stemmed text data into vectors using the TfidfVectorizer from sklearn.
- Fitted a Logistic Regression model on the 80% training split.
- Evaluated the model on the 20% test split. The test accuracy turned out to be 78%, whereas the train accuracy was 81%.
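
The steps above can be sketched roughly as follows. This is a minimal illustration on a handful of made-up tweets, not the repository's actual code: the toy data, the small hard-coded stopword set (a stand-in for NLTK's English stopword list so the sketch runs offline), the regex cleaning rule, the `stem_tweet` helper, and the `random_state` and `max_iter` values are all assumptions. The missing-value handling mentioned above is left out because the toy data has none.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: tiny stand-in stopword set so the sketch runs without downloads.
# With NLTK data installed, nltk.corpus.stopwords.words("english") could be used.
STOPWORDS = {"i", "the", "a", "is", "this", "so", "my", "it", "was", "and", "to"}

stemmer = PorterStemmer()


def stem_tweet(text):
    """Keep letters only, lowercase, drop stopwords, and Porter-stem each token."""
    tokens = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS)


# Hypothetical toy data: 1 = positive, 0 = negative.
tweets = [
    "I love this sunny morning",
    "This movie was amazing and fun",
    "So happy with my new phone",
    "Great game, loved every minute",
    "Wonderful dinner with friends",
    "I hate waiting in traffic",
    "This is the worst service ever",
    "So sad and tired today",
    "Terrible weather ruined my trip",
    "Awful headache, feeling miserable",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Stemmed text kept as its own feature, separate from the raw tweets.
stemmed = [stem_tweet(t) for t in tweets]

# 80/20 split; stratify keeps both classes in the test set on this tiny sample.
X_train, X_test, y_train, y_test = train_test_split(
    stemmed, labels, test_size=0.2, stratify=labels, random_state=2
)

# Fit TF-IDF on the training text only, then reuse the vocabulary for the test text.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train_vec))
test_acc = accuracy_score(y_test, model.predict(X_test_vec))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary out of the features, which is one common way to avoid leakage. The accuracies printed on this toy sample say nothing about the 78% and 81% figures reported above.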
dkamp007/Twitter_sentiment_analysis