Hello!
This project is the final for my STAT228: Introduction to Data Science course at Simmons University. With this project, I’d like to bring together everything I’ve learned throughout the semester, as well as some additional data science methods that I have taught myself outside of class. (The primary non-curriculum methods I am using come from the package tmap, which I learned about through a LinkedIn post by Joachim Schork, a data science educator and consultant from Germany.)
The premise of my project is to predict and analyze Women’s Empowerment Index (WEI) scores for countries. I aim to find the best-performing model for predicting WEI scores by comparing a variety of ensemble methods. There are two datasets I’m interested in using here:
- The first dataset is the wei dataset, which stands for Women’s Empowerment Index. It is sourced from Human Development Reports and can be accessed at https://www.kaggle.com/datasets/iamsouravbanerjee/women-empowerment-index. It contains WEI scores for 114 countries.
- The second dataset is the led dataset, which stands for Life Expectancy Dataset. It is sourced from the World Health Organization and can be accessed at https://www.kaggle.com/datasets/augustus0498/life-expectancy-who/data. This dataset contains important health-centered data on every country in the world.
I’d like to join the two datasets on the common variable “Country” and analyze WEI scores in relation to each country’s health factors. I am also interested in creating several maps that visualize WEI scores alongside these health-based factors.
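As a rough sketch of the join step, the example below merges two small stand-in tables on “Country” using base R. Everything apart from the key name “Country” is an assumption made for illustration: the column names `wei_score` and `life_expectancy`, the country labels, and all values are placeholders, not data from either source. The real datasets would be read from their downloaded files instead.

```r
# Toy stand-ins for the two datasets. All values and every column name
# other than "Country" are placeholders, not real data.
wei <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  wei_score = c(0.50, 0.60, 0.70),
  stringsAsFactors = FALSE
)
led <- data.frame(
  Country = c("Country A", "Country B", "Country D"),
  life_expectancy = c(70, 75, 80),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: keeps only countries present in both tables.
joined <- merge(wei, led, by = "Country")

# Countries in wei with no match in led. In practice this is often caused by
# spelling or naming differences between sources.
unmatched <- setdiff(wei$Country, led$Country)

print(joined)
print(unmatched)
```

Checking the unmatched countries before modeling is worthwhile, since an inner join silently drops any country whose name is spelled differently in the two sources.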