This project analyzes suspicious midterm and year work scores for a whole class, looking for patterns and trying to work out how the scores were evaluated.
To understand where the data came from, some context is needed. The year work score is out of 40 marks: 30 come from the midterm and the remaining 10 are based on the student's work over the semester. In this case, midterm scores (out of 30) were revealed to the class first. Year work scores (out of 40) were revealed afterwards, with some nonsensical results. Lastly, bonus marks were added to compensate for the error.
The objective of this project is to get an idea of what might have happened during the evaluation of the total year work score, how it affected the class, and whether the added bonus actually made up for the error.
The dataset was originally collected in text form from real college students' data. It was later cleaned and transformed into 3 Excel workbooks, with 231 entries, which are:
- Students List which contains the ID, Name and Gender of each student in the class.
- Midterm Scores which contains the following columns:
  - ID: ID of each student in the class.
  - MID: Midterm score.
- Year Work Scores which contains the following columns:
  - ID: ID of each student.
  - TOT: Year work score (before the added bonus).
  - NEW_TOT: Year work score (after the added bonus).
Other attributes are generated from the data in the project, which are:
- diff: The difference between year work scores (before adding the bonus) and midterm scores.
- bonus: The added bonus.
- yw_diff: The difference between total year work scores (after adding the bonus) and midterm scores.
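The derived attributes can be sketched as plain arithmetic on the three score columns. This is a minimal illustration rather than the notebook's code: the records are made-up toy values, `add_derived` is a hypothetical helper, and computing the bonus as `NEW_TOT - TOT` is an assumption inferred from the column descriptions.

```python
# Toy records with made-up values; the real workbooks hold 231 entries.
records = [
    {"ID": 1, "MID": 27, "TOT": 20, "NEW_TOT": 30},
    {"ID": 2, "MID": 20, "TOT": 25, "NEW_TOT": 28},
    {"ID": 3, "MID": 15, "TOT": 30, "NEW_TOT": 31},
]


def add_derived(row):
    """Return a copy of the row with diff, bonus and yw_diff added."""
    out = dict(row)
    # Year work score before the bonus, minus the midterm score.
    out["diff"] = row["TOT"] - row["MID"]
    # Assumption: the bonus is the gap between the new and old totals.
    out["bonus"] = row["NEW_TOT"] - row["TOT"]
    # Year work score after the bonus, minus the midterm score.
    out["yw_diff"] = row["NEW_TOT"] - row["MID"]
    return out


derived = [add_derived(r) for r in records]
for row in derived:
    print(row["ID"], row["diff"], row["bonus"], row["yw_diff"])
```

Under these definitions, `yw_diff` is always equal to `diff + bonus`.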
The project was made with Python 3.9 in a Jupyter notebook. All the graphs were made using the matplotlib and seaborn modules.
- Minimum: 1
- Average score: 20.3
- Maximum: 27
- Most students got between 20 and 26
We notice that the distribution of the year work scores does not resemble the distribution of the midterm scores, so some sort of scaling must have been applied to the scores.
- Minimum: 9
- Average score: 25.58
- Maximum: 40
- Most students got 20
Female students have a slightly higher average score than male students.
Plotting each student's midterm score against their year work score shows a high-density cluster around 20-25 on the midterm and 20-30 on year work. The correlation coefficient is only 0.177, so the two scores are close to unrelated. This raises a red flag, since the midterm makes up 30 of the 40 year work marks and high midterm scores should lead to high year work scores.
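For reference, a correlation coefficient like the one quoted above can be computed as follows. The text does not say which coefficient was used, so Pearson's is assumed here; `pearson` is a hypothetical helper and the score lists are made-up toy values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Toy values standing in for the MID and TOT columns.
mid = [27, 20, 15, 22, 24]
tot = [20, 25, 30, 22, 28]
print(round(pearson(mid, tot), 3))
```

A value near 0 means the two columns move almost independently, which is the situation the 0.177 figure describes.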
Around 35% of the students got a year work score equal to or lower than their midterm score.
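A share like this is a simple count over paired scores. The sketch below uses made-up toy values and a hypothetical helper name, `share_at_or_below`.

```python
def share_at_or_below(mid, year_work):
    """Percentage of students whose year work score is <= their midterm score."""
    hits = sum(1 for m, y in zip(mid, year_work) if y <= m)
    return 100 * hits / len(mid)


# Toy values standing in for the MID and TOT columns.
mid = [27, 20, 15, 22]
tot = [20, 25, 30, 22]
print(share_at_or_below(mid, tot))
```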
- Minimum: -7
- Average score: 5.25
- Maximum: 25
We notice a stronger, inverse correlation (-0.468) between midterm scores and the difference between year work scores and midterm scores. This means that students who got higher marks in the midterm tend to be the ones whose year work score is lower than their midterm score.
A notable case is the student with the highest midterm score (27), who also has the lowest difference (-7), meaning a year work score of just 20 out of 40.
Bonus marks were added differently for each student with the following frequency:
One bonus mark is the most frequently added bonus. Only one student didn't get any bonus marks, and only 3 students got more than 10 bonus marks.
- Minimum: 0
- Average added bonus: 5
- Maximum: 10
The difference between the final year work scores (after adding bonus) and the midterm scores has the following frequency:
- Minimum: 1
- Average added marks: 10.26
- Maximum: 26
Female students got higher average added marks than male students.
Having taken a deeper dive into the midterm scores and the added bonus, we are now interested in answering the following questions:
- Did the added bonus fix the distribution of year work scores? If so, how?
- Compared to the ideal situation (each student having their year work score = their midterm score + 10), how did the bonus marks benefit students?
- Given that they varied from one student to another, what criteria were the bonus marks based on?
After adding bonus marks, the year work scores distribution becomes as follows:
Adding bonus marks obviously increased the mean score of the class. The distribution now also resembles the distribution of the midterm scores more closely.
- Minimum: 14
- Average score: 30.58
- Maximum: 40
To better understand how the bonus affected the class, let's divide the students into 4 sections based on their scores:
Although the lowest 2 sections and the highest section changed drastically, most students still got between 20 and 30 before and after adding the bonus.
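The sectioning can be sketched as a simple binning step. The section boundaries are not listed in the text, so four equal bands over the 40-mark scale are assumed here; `section_of` and `section_counts` are hypothetical helpers and the scores are made-up toy values.

```python
# Assumed section edges: four equal bands over the 40-mark scale.
UPPER_EDGES = [10, 20, 30, 40]
LABELS = ["0-10", "11-20", "21-30", "31-40"]


def section_of(score):
    """Return the label of the band a score falls into."""
    if not 0 <= score <= 40:
        raise ValueError(f"score out of range: {score}")
    for upper, label in zip(UPPER_EDGES, LABELS):
        if score <= upper:
            return label


def section_counts(scores):
    """Count how many scores fall into each band, keeping empty bands."""
    counts = {label: 0 for label in LABELS}
    for score in scores:
        counts[section_of(score)] += 1
    return counts


# Toy values standing in for the TOT and NEW_TOT columns.
before = [9, 14, 20, 25, 40]
after = [14, 20, 25, 30, 40]
print(section_counts(before))
print(section_counts(after))
```

Comparing the two dictionaries band by band shows how many students moved between sections once the bonus was added.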
So far, we've covered the statistics of the class as a whole. Now, let's compare those year work scores to what should have happened ideally, which is each student having 10 marks (the rest of the year work marks) added to their midterm score, to see how many students would be satisfied with their score. In this ideal scenario, the average year work score is 30.32, which is approximately equal to the average year work score after adding the bonus.
First, comparing the ideal situation to the old year work scores (as shown in the figure below) reveals that 66.24% of the students have fewer marks than they would have had if they were evaluated ideally.
The graph shows the difference between the ideal scores and the year work scores before adding the bonus. The green/red colors indicate how satisfied or unsatisfied a student would be, depending on the difference between the ideal score and the year work score.
Secondly, comparing the ideal scores to the year work scores after adding bonus marks shows a small improvement, since 50.22% of the students would still be unsatisfied with their scores.
It is clear that even though the average year work score increased by adding the bonus, around 50% of the students got lower scores than they should have got ideally. This might have happened because the bonus marks were not added equally to all the students.
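The comparison against the ideal scenario can be sketched as follows. The ideal score is the midterm score plus 10, as defined above; the helper name `share_below_ideal` is hypothetical and the lists are made-up toy values, not the class data.

```python
REST_OF_YEAR_WORK = 10  # marks outside the midterm (40 - 30)


def share_below_ideal(mid, year_work):
    """Percentage of students scoring strictly below midterm + 10."""
    below = sum(
        1 for m, y in zip(mid, year_work) if y < m + REST_OF_YEAR_WORK
    )
    return 100 * below / len(mid)


# Toy values standing in for the MID, TOT and NEW_TOT columns.
mid = [27, 20, 15, 22]
tot = [20, 25, 30, 32]
new_tot = [30, 30, 31, 33]

print(share_below_ideal(mid, tot))      # before the bonus
print(share_below_ideal(mid, new_tot))  # after the bonus
```

A student who ends up exactly at the ideal score counts as satisfied here, which is an assumption about how ties were treated.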
To understand the criteria on which the bonus marks were based, we should investigate the relationship between the bonus marks and three other values: year work scores, midterm scores, and the difference between year work scores and midterm scores.
Note: For all the visualizations below, the line represents the average added bonus for each score. The dot radius represents how many students got that score & bonus.
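The two quantities behind these plots, the average bonus per score (the line) and the number of students per score and bonus pair (the dot radius), can be sketched like this. `bonus_summary` is a hypothetical helper and the values are made up for illustration.

```python
from collections import Counter, defaultdict


def bonus_summary(scores, bonuses):
    """Return (average bonus per score, student count per (score, bonus))."""
    by_score = defaultdict(list)
    for score, bonus in zip(scores, bonuses):
        by_score[score].append(bonus)
    # Average bonus for each distinct score, sorted by score (the line).
    mean_bonus = {s: sum(b) / len(b) for s, b in sorted(by_score.items())}
    # Number of students sharing the same score and bonus (the dot radius).
    pair_counts = Counter(zip(scores, bonuses))
    return mean_bonus, pair_counts


# Toy values standing in for a score column and the bonus column.
scores = [20, 20, 25, 25, 25]
bonuses = [1, 3, 5, 5, 2]
mean_bonus, pair_counts = bonus_summary(scores, bonuses)
print(mean_bonus)
print(pair_counts)
```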
Visualizing added bonus with the old year work scores:
The graph shows a weak correlation between the added bonus and year work scores.
Visualizing added bonus with midterm scores:
A positive correlation shows up, meaning students who got high marks on their midterm tend to be the ones who got more bonus marks.
Visualizing added bonus with the difference between year work scores and midterm scores:
This is the highest (inverse) correlation so far, which suggests that the bonus marks were assigned based on the difference between each student's year work score and midterm score.
- The distribution of year work scores before adding the bonus strongly suggests that midterm scores were scaled down and other marks were factored in during the evaluation of the final year work score.
- Most of the students who got high scores on the midterm also got a year work score lower than their midterm score.
- Female students have slightly better scores than male students.
- Adding bonus marks increased the average year work score by 5 marks (12.5% of the 40 total marks).
- Although adding the bonus increased the average score of the class, 50% of the students got lower scores than they should have got ideally.
- The students who got the most bonus marks are those whose year work score was lower than their midterm score, and most of these students already had higher midterm scores. Therefore, the bonus benefited students with high midterm scores more than the rest of the class, and this is what raised the average score of the class.
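As a quick check of the 12.5% figure, the arithmetic below uses the two class averages reported earlier (25.58 before the bonus and 30.58 after). The percentage is taken relative to the 40-mark total; the gain relative to the old average is shown only for comparison.

```python
TOTAL_MARKS = 40
mean_before = 25.58  # average year work score before the bonus
mean_after = 30.58   # average year work score after the bonus

gain = mean_after - mean_before
share_of_total = 100 * gain / TOTAL_MARKS  # gain as a share of 40 marks
relative_gain = 100 * gain / mean_before   # gain relative to the old average

print(round(gain, 2), round(share_of_total, 1), round(relative_gain, 1))
```

The two percentages differ because they use different denominators, so it helps to state which one is meant.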