Skip to content

This repositiry is for my university project for Bayesian Inference and Computation course. The task was to perform a logistic regression on a true dataset. This dataset is related to red variants of the Portuguese "Vinho Verde" wine. Link to project page: https://carimo198.github.io/bayesian-inference-vinho-verde/

Notifications You must be signed in to change notification settings

carimo198/bayesian-inference-vinho-verde

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bayesian Analysis project with Vinho Verde red variant wine

This repositiry is for my university project for Bayesian Inference and Computation course. The task was to perform a logistic regression on a true dataset. This dataset is related to red variants of the Portuguese "Vinho Verde" wine. The dataset is described in the publication by Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties.

The dataset for this task can be accessed via winequality-red.csv. The input variables (based on physicochemical tests) are:

  • fixed acidity
  • volatile acidity
  • citric acid
  • residual sugar
  • chlorides
  • free sulfur dioxide
  • total sulfur dioxide
  • density
  • pH
  • sulphates
  • alcohol

The output variable (based on sensory data) is quality (a score between 0 and 10).

Analysis was conducted by using R. For this project, we were required to perform the following tasks:

  1. Read the dataset into R. Check if there are missing values (NA) and, in case there are, remove them.
  2. Implement a logistic regression. Suppose we consider "good" a wine with quality above 6.5 (included).
  3. Run a frequentist analysis on the logistic model, using the glm() function. What are the significant coefficients?
  4. Estimate the probabilities of having a "success": fix each covariate at its mean level, and compute the probabilities for a wine to score "good" varying total.sulfur.dioxide, and plot the results.
  5. Perform a Bayesian analysis of the logistic model for the dataset, i.e. approximate the posterior distributions of the regression coefficients, following these steps:
  • Write an R function for the log posterior distribution.
  • Fix the number of simulation at 10^4.
  • Choose 4 different initialisations for the coefficients.
  • For each initialisation, run a Metropolis–Hastings algorithm.
  • Plot the chains for each coefficients (the 4 chains on the same plot) and comment.
  1. Approximate the posterior predictive distribution of an unobserved variable characterised by:
  • fixed acidity: 7.5
  • volatile acidity: 0.6
  • citric acid: 0.0
  • residual sugar: 1.70
  • chlorides: 0.085
  • free sulfur dioxide: 5
  • total sulfur dioxide: 45
  • density: 0.9965
  • pH: 3.40
  • sulphates: 0.63
  • alcohol: 12 Plot the approximate posterior predictive distribution.
  1. Use the metrop() function available in the mcmc package to perform the same analysis on the posterior distribution you have approximated for task 5. Choose again 10^4 simulations and compare the results with the results obtained with your code. (Here a visual comparison of the chains is enough to get full mark).

My grade for this project was 40/40 marks.

About

This repositiry is for my university project for Bayesian Inference and Computation course. The task was to perform a logistic regression on a true dataset. This dataset is related to red variants of the Portuguese "Vinho Verde" wine. Link to project page: https://carimo198.github.io/bayesian-inference-vinho-verde/

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages