Data Science Portfolio

A collection of data-related projects.

Projects

Determinants of Amsterdam House Prices

Data was obtained from NVM (Dutch Real Estate Agent Organization). Hedonic regression shows that some house features have a negative linear relation with predicted house price, and that apartments cost more than other house types in Amsterdam. A neural network regression model was tested against the baseline linear regression model and achieved 10% higher accuracy.
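As a rough illustration of the hedonic approach, the sketch below fits an ordinary least squares model in which price depends on floor area and an apartment dummy. The data, the feature names, and the helper functions are hypothetical and are not taken from the project or the NVM dataset; the project itself may use a different specification (for example, log prices).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so row operations touch both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: (floor area in m2, is_apartment dummy).
houses = [(50, 1), (80, 1), (120, 0), (60, 0), (100, 1), (90, 0)]
# Prices are generated from an exact linear rule so the fit is easy to check.
prices = [100 + 3 * area + 40 * apt for area, apt in houses]

intercept, per_m2, apartment_premium = fit_ols(houses, prices)
```

In a hedonic model, each coefficient is read as the implicit price of one feature with the others held fixed, so `apartment_premium` here plays the role of the apartment effect described above.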

Machine Learning Course UvA 2017/2018

A collection of projects completed during the University of Amsterdam's 2017/2018 on-campus Machine Learning course.

Graded Assignments

  1. Manual implementation of Logistic Regression. The model is trained and tested on the MNIST dataset.
  2. Evaluation and comparison of parametric Logistic Regression and the non-parametric K-NN algorithm for classifying the MNIST dataset. K-NN predicts better on this dataset, reaching the higher accuracy score of 0.93. It also achieves higher precision, recall, and F-scores than Logistic Regression.
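For readers unfamiliar with what a manual implementation involves, here is a minimal sketch of binary logistic regression trained with batch gradient descent. It is not the assignment code: MNIST is a ten-class problem, while this example uses a hypothetical two-feature, two-class toy set, and the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss for binary labels."""
    n_features = len(xs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # Classify as 1 when the predicted probability is at least 0.5.
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data: label is 1 when the two features sum to more than 1.
xs = [(0.0, 0.1), (0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (0.7, 0.9), (1.0, 0.6)]
ys = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(xs, ys)
predictions = [predict(weights, bias, x) for x in xs]
```

Extending this to MNIST would typically mean either one-vs-rest training of ten such models or a softmax output layer; which one the assignment used is not stated here.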

Final Project: Wine Quality Classification

This was the final project for the UvA Machine Learning course. Using a Kaggle wine dataset, my team and I compared the performance of three classifiers: Logistic Regression, SVM, and Random Forest. We found that, for this specific problem, our Random Forest classifier performed best of the three.

Salary Prediction

In this project, I created a prediction model for employee salaries in the USA based on job characteristics. The aim was to find out how these features affect the salaries of various jobs and to model employee salaries. I created and compared several regression models: Linear Regression, Random Forest Regression, and Gradient Boosting. My best model, Gradient Boosting Regression with hyperparameter tuning, performed much better than the baseline linear regression model, with an average MSE of 357.
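The sketch below shows one common way an "average MSE" can be computed when comparing regression models: k-fold cross-validation, averaging the held-out error across folds. Whether the project used this exact procedure is an assumption. The data, the fold count, and the two stand-in models (a mean predictor and a one-feature line) are hypothetical and much simpler than the models named above.

```python
def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def k_fold_mse(xs, ys, fit, k=3):
    """Average held-out MSE over k contiguous folds.

    `fit(train_xs, train_ys)` must return a function mapping x to a prediction.
    """
    n = len(xs)
    scores = []
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        train_xs = xs[:start] + xs[stop:]
        train_ys = ys[:start] + ys[stop:]
        model = fit(train_xs, train_ys)
        preds = [model(x) for x in xs[start:stop]]
        scores.append(mean_squared_error(ys[start:stop], preds))
    return sum(scores) / k


def fit_mean(train_xs, train_ys):
    # Baseline that ignores the features and predicts the training mean.
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean


def fit_line(train_xs, train_ys):
    # One-feature least-squares line, standing in for a stronger model.
    mx = sum(train_xs) / len(train_xs)
    my = sum(train_ys) / len(train_ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_xs, train_ys)) / sum(
        (x - mx) ** 2 for x in train_xs
    )
    return lambda x: my + slope * (x - mx)


# Hypothetical data: years of experience vs. salary in thousands of dollars.
years = [1, 7, 3, 9, 5, 2, 8, 4, 6]
salary = [42, 71, 50, 83, 61, 47, 76, 57, 66]

baseline_mse = k_fold_mse(years, salary, fit_mean)
line_mse = k_fold_mse(years, salary, fit_line)
```

Scoring every candidate with the same folds keeps the comparison fair, since each model is evaluated on identical held-out rows.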

Customer Credit Data: Segregating Customers into Defaulters vs Non-Defaulters to minimize credit risk

This project analyzes the credit card and borrowing behavior of Indonesian customers. I cleaned the data and found several insights into the characteristics of users who default on loans. I implemented a Random Forest classification model, but it did not improve on the baseline classification result of 92.65%. However, the model's feature importances confirmed my earlier findings about the characteristics of defaulters.
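A high baseline like this is typical of imbalanced data. The sketch below assumes the baseline is a majority-class predictor, which the description above does not state, and uses hypothetical labels with about 7% defaulters to show why accuracy alone can hide a model's value.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def recall(actual, predicted, positive=1):
    """Share of actual positives that the predictions flagged."""
    true_pos = sum(
        1 for a, p in zip(actual, predicted) if a == positive and p == positive
    )
    actual_pos = sum(1 for a in actual if a == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical labels: 1 = defaulter, 0 = non-defaulter.
labels = [0] * 93 + [1] * 7
always_non_defaulter = [0] * len(labels)

baseline_accuracy = majority_baseline_accuracy(labels)
baseline_recall = recall(labels, always_non_defaulter)
```

Under this assumption the baseline scores high accuracy while catching no defaulters at all, so metrics such as recall on the defaulter class may be a more informative way to compare it with the Random Forest.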