Twitter_sentiment_analysis

Downloaded the data from Kaggle. The dataset has a shape of (1600000, 6), i.e. 1.6 million rows and 6 columns.
Cleaned the data and removed all stopwords using the stopwords module from NLTK.
Stemmed the tweets with the Porter Stemmer and stored the result as a separate feature.
Performed train_test_split on the stemmed data and filled in any missing values arising from this.
Transformed the stemmed text data into vectors using the TfidfVectorizer from sklearn.
Trained a Logistic Regression model on the 80% training split.
Evaluated the model on the remaining 20% test split. The test accuracy was 78%, while the train accuracy was 81%.
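
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy `tweets` and `labels`, the helper name `stem_tweet`, the small `STOPWORDS` set (a stand-in for NLTK's English stopword list, which needs a one-time `nltk.download("stopwords")`), and parameters such as `stratify`, `random_state` and `max_iter` are all assumptions made for the example.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: tiny stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "the", "is", "it", "this", "i", "to", "and", "so", "of", "my", "was"}

stemmer = PorterStemmer()


def stem_tweet(text):
    """Keep letters only, lowercase, drop stopwords, stem the remaining words."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOPWORDS)


# Hypothetical toy data: 1 = positive, 0 = negative.
tweets = [
    "I love this phone, it is amazing",
    "What a wonderful and happy day",
    "This movie was great, loved it",
    "Best coffee of my life, so good",
    "Feeling happy and thankful today",
    "I hate waiting, this is terrible",
    "Worst service, so bad and slow",
    "My flight was cancelled, awful day",
    "This is a horrible and sad ending",
    "Feeling sick and tired of the rain",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Store the stemmed text as its own feature.
stemmed = [stem_tweet(t) for t in tweets]

# 80/20 split; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    stemmed, labels, test_size=0.2, stratify=labels, random_state=2
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train_vec))
test_acc = accuracy_score(y_test, model.predict(X_test_vec))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Fitting the vectorizer on the training split alone keeps vocabulary and IDF weights from leaking test information into the model. The accuracies printed for this toy data say nothing about the 78% / 81% figures reported above.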
