Machine Learning Exercises

Practical exercises (TPs) in the field of machine learning.

💬 Description 📁 Data 👨🏻‍💻 Code

TP-1: Anscombe's quartet
Analysis of the effect of outliers and of the importance of data visualization.
Anscombe's datasets (source). Jupyter Notebook
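
The point of the quartet can be sketched in a few lines. This is a minimal illustration, not the notebook's code: it embeds Anscombe's published values and uses hypothetical helper names (`pearson`, `summarize`) to show that the four sets share nearly identical summary statistics even though their scatter plots look very different.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that hide the differences between the four sets."""
    return {"mean_x": mean(xs), "mean_y": mean(ys), "r": pearson(xs, ys)}


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Plotting each set (for example with a scatter plot per set) is what reveals the curve in set II and the single outliers in sets III and IV.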

TP-2.1: Data visualization
General exploratory analysis to find data showing abnormal behavior.
Health and epidemiological situation of the municipality of Bahía Blanca, Argentina (source). Jupyter Notebook

TP-2.2: Parametric classifier
Design of a minimum-error classifier and analysis of its performance as the mean and standard deviation of the generated data vary.
Pseudo-randomly generated, Gaussian-distributed data. Jupyter Notebook
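
As a rough sketch of the idea (not the notebook's implementation), a minimum-error classifier for two one-dimensional Gaussian classes picks the class with the larger prior-weighted likelihood. The class parameters, sample sizes, and function names below are assumptions chosen for illustration.

```python
from statistics import NormalDist


def classify(x, class_a, class_b, prior_a=0.5):
    """Return 'a' or 'b': the class with the larger prior-weighted likelihood."""
    score_a = prior_a * class_a.pdf(x)
    score_b = (1.0 - prior_a) * class_b.pdf(x)
    return "a" if score_a >= score_b else "b"


def empirical_error(class_a, class_b, n_per_class=5000, seed=0):
    """Fraction of misclassified samples drawn from both classes (equal priors)."""
    samples_a = class_a.samples(n_per_class, seed=seed)
    samples_b = class_b.samples(n_per_class, seed=seed + 1)
    wrong = sum(classify(x, class_a, class_b) != "a" for x in samples_a)
    wrong += sum(classify(x, class_a, class_b) != "b" for x in samples_b)
    return wrong / (2 * n_per_class)


# Assumed example: unit-variance classes centred at 0 and 2.
CLASS_A = NormalDist(mu=0.0, sigma=1.0)
CLASS_B = NormalDist(mu=2.0, sigma=1.0)

ERROR = empirical_error(CLASS_A, CLASS_B)
# With equal priors and equal sigmas the boundary is the midpoint x = 1,
# so the theoretical error is P(N(0, 1) > 1), which equals Phi(-1).
THEORY = NormalDist().cdf(-1.0)

if __name__ == "__main__":
    print(f"empirical error: {ERROR:.4f}, theoretical error: {THEORY:.4f}")
```

Moving the means closer together or increasing the standard deviations increases the overlap between the classes, and the error rate rises accordingly.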

TP-3.1: KNN Overview
Construction of k-nearest neighbors (KNN) classifiers and evaluation of their performance as training parameters vary.
Random samples from a normal (Gaussian) distribution. Jupyter Notebook

TP-3.2: KNN GridSearch
Evaluation of hyperparameters and their combinations for a k-nearest neighbors (KNN) classifier. K-fold cross-validation is used to measure the influence of the data on the model.
Random samples from a normal (Gaussian) distribution. Jupyter Notebook
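
The combination of grid search and K-fold cross-validation can be sketched without any external library. This is an illustrative, standard-library version under assumed settings (two synthetic Gaussian classes, five folds, a small grid of `k` values); the notebook itself may rely on a library implementation instead.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, k_neighbors, n_folds=5):
    """Mean accuracy over n_folds interleaved folds."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]
                   for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / n_folds


def grid_search(X, y, k_values, n_folds=5):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: k_fold_accuracy(X, y, k, n_folds) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Assumed synthetic data: two 2-D Gaussian classes centred at (0, 0) and (3, 3).
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
y = [0] * 100 + [1] * 100

BEST_K, SCORES = grid_search(X, y, k_values=[1, 3, 5, 7])

if __name__ == "__main__":
    print("best k:", BEST_K)
    print("mean accuracy per k:", SCORES)
```

Comparing the per-fold scores, rather than only their mean, gives a sense of how sensitive each setting is to the particular samples used for training.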

TP-3.3: Spotify songs
Development and tuning of a k-nearest neighbors (KNN) classifier to predict whether a given song will be liked or not. Feature engineering is used to select the features that contribute the most information to the model.
More than 2000 Spotify songs from a specific user marked as liked or disliked (source). Jupyter Notebook

Fog event forecasting
Comparison of ensemble methods for predicting the occurrence of a fog event within the next hour. Bagging and boosting algorithms are implemented for this purpose, including some basic hyperparameter tuning.
Meteorological data from the Ezeiza (Buenos Aires, Argentina) weather station with hourly measurements from 1979 to 2011 (source). Jupyter Notebook

TP-5: Customer segmentation
Construction of clustering algorithms to segment customers based on their annual consumption patterns across product categories. The silhouette coefficient is used to evaluate each model's performance.
Clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories (source). Jupyter Notebook
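
For reference, the silhouette coefficient of a sample is `(b - a) / max(a, b)`, where `a` is its mean distance to the other members of its own cluster and `b` is the smallest mean distance to the members of any other cluster. The function below is a minimal standard-library sketch of that definition with Euclidean distance; the function name and the toy data are assumptions, and the notebook may use a library routine instead.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # The distance from the point to itself is 0, so divide by len(own) - 1.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return mean(scores)


# Toy example: two tight, well-separated 1-D clusters.
EXAMPLE_POINTS = [(0.0,), (1.0,), (10.0,), (11.0,)]
EXAMPLE_LABELS = [0, 0, 1, 1]
EXAMPLE_SCORE = silhouette_score(EXAMPLE_POINTS, EXAMPLE_LABELS)

if __name__ == "__main__":
    print(f"silhouette: {EXAMPLE_SCORE:.4f}")
```

Scores close to 1 indicate compact, well-separated clusters, while scores near 0 or below suggest overlapping or misassigned samples.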

TP-6: Boston housing prices
Construction of regression algorithms to predict property sales prices in the city of Boston. Feature selection techniques are implemented to reduce data dimensionality.
Boston Housing dataset with 506 observations and 14 features describing housing prices (source). Jupyter Notebook