mandarmakhi/DataScience-R-code


DataScience (R-code)

This repository contains R code executed on various datasets in RStudio. I hope it is helpful for anyone who wants to build a career in Data Science or Big Data. I am a beginner in this field, so please forgive any silly mistakes. Suggestions by email for improving the analysis are always welcome. E-mail: mandarmakhi007@gmail.com

You will need RStudio to run all of the code, so install it first and then go through the code below. To download RStudio, click here.

To begin with the basics of Data Science, go through the Practice(Basics) folder in the repository.

Practice(Basics)

1. Basics practice.r

2. Confidence Interval Confidence_Interval.r

3. Probability Probability.r


Next comes descriptive statistics, a core part of Exploratory Data Analysis (EDA).

Descriptive Statistics - Exploratory Data Analysis (EDA)

1. Carbon Dioxide (CO2) Descriptive_Stats_CO2.r

2. Air Quality Descriptive_Stats_airquality.r
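As a rough idea of what a descriptive-statistics pass looks like, here is a minimal sketch on the `airquality` dataset that ships with base R. It is not a copy of the scripts listed above; the particular summaries chosen (missing-value counts, mean and standard deviation of `Ozone`, mean `Temp` per month) are only assumptions about what a first look might include.

```r
# airquality ships with base R (package "datasets"), so nothing needs to be downloaded.
data(airquality)

# Structure and per-column summaries (min, quartiles, mean, max, NA count)
str(airquality)
summary(airquality)

# Count missing values per column; Ozone and Solar.R contain NAs
missing_counts <- colSums(is.na(airquality))
print(missing_counts)

# Central tendency and spread for Ozone, ignoring missing values
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
ozone_sd   <- sd(airquality$Ozone, na.rm = TRUE)
print(c(mean = ozone_mean, sd = ozone_sd))

# Mean temperature for each month in the data
temp_by_month <- aggregate(Temp ~ Month, data = airquality, FUN = mean)
print(temp_by_month)
```

Passing `na.rm = TRUE` matters here: without it, `mean()` and `sd()` return `NA` for any column that has missing values.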


Now let's go through various algorithms.

1. Hypothesis Testing

  1. Hypothesis Testing Hypothesis Testing.r

2. Linear Regression

A. Simple Linear Regression

1. Newspaper Data NewspaperData.CSV Newspaper_LinearRegression.r

2. Waist Circumference - Adipose Tissue WC-AT.csv WC-AT_LinearRegression.r

B. Multiple Linear Regression

1. Cars Cars.csv Cars_Multi_Linear_Regression.r

2. Corolla Toyota_Corolla.csv Toyota_Multi_Linear_Regression.r


3. Logistic Regression

  1. Claimants Claimants.csv Logistic Regression.r

4. Association Rule

1. Titanic Titanic.csv Titanic_Association_Rule.r


5. Principal Component Analysis (PCA) - Combines Related Columns

1. Cat Cat.jpg Example1_PCA.r

2. University Universities.csv Universities_PCA.r
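To make "combines related columns" concrete, here is a minimal PCA sketch using base R's `prcomp()`. It uses the built-in `USArrests` dataset as a stand-in, since the contents of Universities.csv are not shown here; treat the dataset and the choice to keep two components as assumptions for illustration.

```r
# USArrests ships with base R: 50 states, 4 numeric columns.
data(USArrests)

# center/scale. standardise each column so no variable dominates because of its units
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original column contributes to each component
print(pca$rotation)

# Scores: every row expressed on the new component axes; keep the first two
scores <- pca$x[, 1:2]
print(head(scores))
```

Correlated columns end up loading on the same component, which is why a few components can often replace many original columns.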


6. Clustering - Combines Related Rows

1. Universities Univesities.csv Universities_Heirarchical_Clustering.r K-Means_Clustering.r
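The two clustering approaches named in the file list above, hierarchical and k-means, can be sketched with base R alone. This example assumes the built-in `iris` measurements as a stand-in for the universities data, and the choice of three clusters is an assumption made for illustration.

```r
data(iris)

# Use only the numeric columns and standardise them so distances are comparable
features <- scale(iris[, 1:4])

# Hierarchical clustering: build a tree from pairwise distances, then cut it into 3 groups
dist_matrix <- dist(features, method = "euclidean")
hc <- hclust(dist_matrix, method = "complete")
hc_groups <- cutree(hc, k = 3)

# K-means: partition rows into 3 groups; nstart repeats the random initialisation
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 25)

# Cross-tabulate the two groupings to see where they agree
print(table(hierarchical = hc_groups, kmeans = km$cluster))
```

Cluster labels are arbitrary, so group 1 from one method need not correspond to group 1 from the other; the cross-table shows the overlap.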


7. Survival Analysis

1.Unemployment Survival_Unemployment.csv Survival_Unemployment.r


Now let's look at various supervised machine learning algorithms (techniques).

1. Decision Tree

Two ensemble techniques are commonly applied to decision trees: bagging and boosting.

  1. Example 1 DecisionTree.r Decision_tree_Bagging.r Decision_Tree_Bagging+Boosting.r
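As an illustration of bagging with decision trees, here is a minimal hand-rolled sketch. It assumes the `rpart` package (a recommended package included with standard R installations) and the built-in `iris` data; the 70/30 split and the 25 trees are arbitrary choices, and the scripts listed above may well use a different package or dataset.

```r
library(rpart)  # assumption: rpart is available, as in standard R installations

data(iris)
set.seed(123)

# Hold out 30% of the rows for testing
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

n_trees <- 25

# Bagging: fit each tree on a bootstrap sample (rows drawn with replacement)
trees <- lapply(seq_len(n_trees), function(b) {
  boot_rows <- sample(nrow(train), replace = TRUE)
  rpart(Species ~ ., data = train[boot_rows, ], method = "class")
})

# One column of predicted class labels per tree, one row per test observation
votes <- sapply(trees, function(tree) {
  as.character(predict(tree, newdata = test, type = "class"))
})

# Majority vote across the trees for each test row
bagged_pred <- apply(votes, 1, function(row) names(which.max(table(row))))

accuracy <- mean(bagged_pred == test$Species)
print(accuracy)
```

Boosting differs in that trees are fitted one after another, each giving more weight to the rows the previous trees got wrong, rather than independently on bootstrap samples.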

2. K-Nearest Neighbour(KNN)

  1. Cancer KNN.csv K-Nearest_Neighbour.r

3. Random Forest

  1. Iris Available in R Datasets random_forest.r

4. Artificial Neural Networks

  1. Concrete concrete.csv Concrete_Neural_Network.r

5. Support Vector Machine(SVM)

  1. Letter Data LetterData.csv LetterData_Support_Vector_Machine.r

6. Naive Bayes Classifier

  1. SMS Spam sms_spam.csv Naive_Bayes_Sms_Spam.r

7. Forecasting Analysis

  1. Amtrak Amtrak.csv | Predict_new.xlsx | Amtrak_Forecasting.r

  2. Aviation Aviation.csv Aviation_Exponential_Smooting_Forecasting.r


8. NLP - Natural Language Processing (Text Mining)

There are two approaches: emotion mining and sentiment analysis.

Both require lists of positive words and negative words.

  1. Emotion Mining Amazon Nokia Lumia Reviews.txt Emotion_Mining_Amazon.r

  2. Sentiment Analysis McD_Small.csv Sentiment Analysis_McD.r
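To show how the positive and negative word lists are used, here is a minimal word-counting sketch in base R. The word lists and reviews below are tiny, made-up examples, and `score_sentiment` is a hypothetical helper; a real analysis would load full lexicons from files and would likely use a text-mining package.

```r
# Hypothetical, deliberately tiny word lists for illustration only
positive_words <- c("good", "great", "love", "excellent", "fast")
negative_words <- c("bad", "poor", "hate", "slow", "broken")

# Made-up reviews for illustration only
reviews <- c(
  "Great phone, I love the camera",
  "Battery is bad and the screen is broken",
  "It is a phone"
)

# Hypothetical helper: positive matches minus negative matches
score_sentiment <- function(text, pos, neg) {
  # Lower-case, replace punctuation with spaces, split on whitespace
  clean <- gsub("[[:punct:]]", " ", tolower(text))
  words <- strsplit(clean, "\\s+")[[1]]
  sum(words %in% pos) - sum(words %in% neg)
}

scores <- vapply(reviews, score_sentiment, numeric(1),
                 pos = positive_words, neg = negative_words,
                 USE.NAMES = FALSE)
print(scores)
```

A score above zero suggests a positive review, below zero a negative one, and zero means no listed words were found or the matches cancelled out.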


Web Scraping

If you want to extract the reviews of a particular product from Amazon, run the code below in RStudio.

This code works only for products on Amazon.

Scraping code differs from site to site.

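# Install the packages once; skip these lines if they are already installed.
install.packages("rvest")
install.packages("magrittr")

library(rvest)
library(magrittr)

# Amazon Reviews #############################
# The URL must end with the page-number parameter name,
# because paste(..., sep = "=") appends "=1", "=2", ... to it.
aurl <- "URL of Product Reviews page"
amazon_reviews <- NULL
for (i in 1:10) {
  # Download and parse review page i
  page <- read_html(paste(aurl, i, sep = "="))
  # Extract the text of every review on the page
  reviews <- page %>%
    html_nodes(".review-text") %>%
    html_text()
  amazon_reviews <- c(amazon_reviews, reviews)
}
# Number of reviews collected across the 10 pages
length(amazon_reviews)
write.table(amazon_reviews, "apple.txt", row.names = FALSE)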

I have run this code to extract reviews of the Apple MacBook Air. Do check it out.


After going through the basics, we will now apply the algorithms to different datasets.

Implementation of Algorithms on Datasets

1. Hypothesis Testing

  1. Buyer Ratio .pptx BuyerRatio.csv BuyerRatio.r

  2. Customer Order Form .pptx Customer+OrderForm.csv Customer+OrderForm.r

  3. Cutlet Diameter .pptx Cutlets.csv Cutlet_Hyp_Test.r

  4. Fantaloons .pptx Fantaloons.csv Fantaloons.r

  5. Lab .pptx LabTAT.csv Lab_Hyp_Anova_test.r


2. Linear Regression

A. Simple Linear regression

  1. Calories Consumed .txt Calories_Consumed.csv Calories_Simple_Linear.r

  2. Delivery Time Data .txt Delivery_Time.csv Delivery_Simple_Linear_Regression.r

  3. Employee Data .txt Emp_Data.csv Emp_Simple_Linear.r

  4. Salary Data .txt Salary_Data.csv Salary_Simple_Linear.r


B. Multi Linear Regression

  1. 50 Startup .txt 50_Startups.csv 50_Startup_Multi_Linear.r

  2. Computer Data .txt Computer_Data.csv Computer_Data_Multi_Linear.r

  3. Toyota Corolla .txt ToyotaCorolla.csv ToyotaCorolla_Multi_Linear.r


3. Logistic Regression

  1. Bank .txt Bank-Full.csv Bank_logistic_Regression.r

  2. Credit Card .txt Creditcard.csv Creditcard_Logistic_regression.r


4. Association Rule

  1. Books .txt Book.csv Book.r

  2. Groceries [.txt](https://github.com/mandarmakhi/DataScience-R-code/blob/master/2.%20Algorithms%20on%20Datasets/Association%20Rule/groceries/Problem_Statment.txt) Groceries.csv Groceries.r

  3. Movies .txt My_Movies.csv My_Movies.r


5. Clustering

  1. Crime Data .txt Crime_Data.csv Crime_Data_Clustering.r
  2. East West Airlines .txt EastWestAirlines.xlsx EastWestAirlines_Cluster.r

6. Principal Component Analysis (PCA)

  1. Wine .txt Wine.csv Wine_PCA.r

Supervised Machine Learning Algorithms


1. Decision Tree

  1. Company Data .txt Company_Data.csv Company_Data.r

  2. Fraud Check .txt Fraud_Check.csv Fraud_Check.r


2. Random Forest

  1. Company Data .txt Company_Data.csv Company_Data.r

  2. Fraud Check .txt Fraud_Check.csv Fraud_Check.r

  3. Iris .pdf Available in R Dataset Iris.r


3. K-Nearest Neighbour (KNN) Classifier

  1. Glass Data .txt Glass.csv Glass.r

  2. Zoo .txt Zoo.csv Zoo.r


4. Artificial Neural Network (NN)

  1. 50 Startups 50_Startups.csv 50_Startups.r
  2. Concrete Concrete.csv Concrete.r
  3. ForestFires Forestfires.csv Forestfires.r

5. Support Vector Machine(SVM)

  1. Forest Fires .txt Forestfires.csv Forestfires.r
  2. Salary Data .txt Salary_Data_Train.csv, Salary_Data_Test.csv SalaryData.r

6. Naive Bayes Classifier

  1. Salary_Data .txt SalaryData_Train.csv, SalaryData_Test.csv SalaryData.r
  2. Sms Data .txt Sms_Raw_NB.csv Sms_Raw_NB.r

7. Forecasting Analysis

  1. Airlines Data .txt Airlines+Data.xlsx Airlines+Data.r
  2. Coca Cola Sales .txt CocaCola_Sales_Rawdata.xlsx CocaCola_Sales_Rawdata.r
  3. Plastic Sales .txt PlasticSales.csv PlasticSales.r

8. NLP - Natural Language Processing(Text Mining)

You need positive-word, negative-word, and stop-word lists for this analysis.

  1. Amazon iphone Review .txt iphone Reviews.txt Amazon_iphone_Reviews.r
  2. IMDB Money heist WebSeries Review .txt Money heist_Reviews.txt Money Heist.r
