A multiclass classifier that identifies the species of iris flowers based on the length and width of their sepals and petals.
The data is already available here. The dataset contains only 150 observations across 3 species, with 50 observations per species.
The data has no NULL or duplicate values, and the classes are balanced.
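
The checks described above can be sketched as follows. This is only an illustration: it assumes the copy of Iris bundled with scikit-learn (`load_iris`) stands in for the project's data file, and the variable names are hypothetical. The bundled copy may differ slightly from the file used here, so the duplicate count in particular may not match.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the project's file.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

# Total number of missing cells across the whole frame.
null_count = int(df.isnull().sum().sum())

# Number of rows that exactly repeat an earlier row.
duplicate_count = int(df.duplicated().sum())

# Observations per species (encoded as 0, 1, 2), to check class balance.
class_counts = df["target"].value_counts().sort_index()

print("rows:", len(df))
print("null values:", null_count)
print("duplicate rows:", duplicate_count)
print("observations per class:", class_counts.to_dict())
```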
The dataset is used to train Logistic Regression, Decision Tree, Support Vector Machine (SVM), and Random Forest classification models.
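
A minimal sketch of how four such models might be trained and scored with scikit-learn. The 80/20 split, the `random_state` values, `max_iter=1000`, and the otherwise default hyperparameters are assumptions made for illustration; the settings actually used in this project are not stated, so the printed scores may differ from the reported ones.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assumption: 80/20 split with a fixed seed; the original split is not stated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: default hyperparameters apart from max_iter and the seeds.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model and record (training accuracy, testing accuracy).
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

Comparing the two numbers per model is what makes overfitting visible: a training accuracy well above the testing accuracy is the usual warning sign.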
The table below shows the baseline accuracy together with the training and testing accuracy of each model.
Model | Baseline Accuracy | Training Accuracy | Testing Accuracy |
---|---|---|---|
Logistic Regression | 0.341 | 0.967 | 0.967 |
Decision Tree | 0.341 | 1.000 | 0.967 |
SVM | 0.341 | 0.967 | 0.967 |
Random Forest | 0.341 | 0.967 | 0.967 |
All the models beat the baseline, and their performance is almost identical. The Decision Tree reaches 1.000 on the training set but 0.967 on the test set, which suggests that its flexibility may lead it to overfit.
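
The baseline is not defined here. One common reading, assumed in this sketch, is the accuracy obtained by always predicting the most frequent class in the training set. With three nearly balanced classes that value sits just above one third, which is in line with the reported figure. The split parameters below are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Assumption: 80/20 split with a fixed seed; the original split is not stated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: "baseline" means always predicting the majority training class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

# Accuracy equals the share of the majority class in the training labels.
baseline_train_accuracy = baseline.score(X_train, y_train)
print(f"baseline accuracy: {baseline_train_accuracy:.3f}")
```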
From the image, we can see that Sepal Length and Sepal Width (first two plots) are approximately normally distributed, while Petal Length and Petal Width (last two plots) are bimodal.
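
The shape of these distributions can also be inspected without a plot. The sketch below prints a rough text histogram per feature, assuming scikit-learn's bundled Iris copy and an arbitrary choice of 10 bins; it is not the code that produced the image.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the project's file.
iris = load_iris()
X = iris.data  # columns: sepal length, sepal width, petal length, petal width

# Assumption: 10 equal-width bins per feature, chosen only for illustration.
histograms = {}
for column, name in enumerate(iris.feature_names):
    counts, edges = np.histogram(X[:, column], bins=10)
    histograms[name] = counts
    print(name)
    # One row per bin: the bin range followed by one '#' per observation.
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        print(f"  {left:4.1f} to {right:4.1f} | {'#' * int(count)}")
```

In the petal length output, an empty bin should separate a small-valued group from the rest, which is what gives the distribution its two peaks.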