Commit
1 parent
9a52cf6
commit 7a4ec98
Showing 66 changed files with 3,003 additions and 210 deletions.
Binary file removed
BIN
-27.8 KB
_images/4e268b209ab2ea3f2e72422e9bec47f6c998a48fb494e326e33756fc17cb4e9f.png
Binary file not shown.
Binary file added
BIN
+27.7 KB
_images/a318d2119336d94c638413cedc2cb1f37d2e690f22748b0f6672e555b1fd25bf.png
Binary file removed
BIN
-16.5 KB
_images/ac3e3534978cccd2eb6ba714e7d9cff13a7b9a4b4d38b3b82e31cc7acf8fc2e5.png
Binary file not shown.
Binary file added
BIN
+22.3 KB
_images/cb4ba6dd979009882fa9449a570a15db32ac73a50d2d04cc56875d2c768d4afc.png
Binary file added
BIN
+27.8 KB
_images/d19e1393f89f5806de6d1ac50487cb41ad1279f4e86fcbd9b791cd7d77fd5e9b.png
Binary file added
BIN
+16.5 KB
_images/e46da72d78ef33239fd09a8e58186baec3c7286dfc0be10dd51966609d3d0251.png
Binary file removed
BIN
-22.6 KB
_images/ece1396cebcee3f0b95dc34de915455aceeda4106380a3dd234eb180bf9cb86c.png
Binary file not shown.
Binary file removed
BIN
-27.8 KB
_images/f983f1226a0e19abd0a3212b23db57576ea933ea238f54c00a04a53700c03d40.png
Binary file not shown.
@@ -0,0 +1,95 @@
# Assignment 6: Auditing Algorithms

__Due: 2023-10-18__

Eligible skills:
- evaluate level 1
- construct level 2
- summarize level 1, 2
- visualize level 1, 2

## Related notes

- [](../notes/2023-10-12)
<!-- - [](../notes/2023-03-02) -->

## About the data

We have provided a copy of a [reconstructed version](https://github.com/socialfoundations/folktables) of the [Adult](https://archive.ics.uci.edu/dataset/2/adult) dataset, a popular benchmark for training machine learning models. The reconstruction comes from a [recent paper](https://arxiv.org/abs/2108.04884) about the risks of that dataset. The classic [Adult](https://archive.ics.uci.edu/dataset/2/adult) dataset is used to predict whether a person makes more or less than $50k.

The researchers reconstructed the Adult dataset with the actual value of the income. We trained models to predict `income>=$10k`, `income>=$20k`, etc. For each target (`>=10k`, `>=20k`, ..., `>=90k`), we used three different learning algorithms, nicknamed 'LR', 'GPR', and 'RPR'.

- `adult_models_only.csv` has the models' predictions.
- `adult_reconstruction_bin.csv` has the data.

Both data files include a unique identifier column.

```{admonition} Think Ahead
Why might the dataset have more samples in it than the model predictions file?
```
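
Because both files share an identifier column, they can be joined row by row before auditing. The following is a minimal sketch on toy data, not the real files: the column names `id`, `sex`, and `LR_10k` are assumptions, so check the actual headers with `.columns` first.

```python
import pandas as pd

# Toy stand-ins for the two files. The column names here are assumptions,
# not the real headers of the provided CSVs.
data_df = pd.DataFrame({"id": [0, 1, 2, 3], "sex": ["F", "M", "F", "M"]})
models_df = pd.DataFrame({"id": [1, 2, 3], "LR_10k": [1, 0, 1]})

# A left merge keeps every data row; indicator=True adds a `_merge` column
# recording whether each row found a match in the predictions.
merged = data_df.merge(models_df, on="id", how="left", indicator=True)
match_counts = merged["_merge"].value_counts()
print(match_counts)

# An inner merge keeps only rows that have both features and predictions,
# which is the set of rows an audit can actually score.
audit_df = data_df.merge(models_df, on="id", how="inner")
print(audit_df)
```

Inspecting `match_counts` is one way to see how many rows of the data have no prediction attached.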

## Complete an audit

Thoroughly audit two randomly selected models. If you load `adult_models_only.csv` into `models_df`, the following will give you two random, but replicable, columns:
```
import numpy as np

my_num = 0  # replace 0 with a number of your choice
np.random.seed(my_num)  # fixing the seed makes the selection replicable
# models_df.columns also contains the identifier column; pick a different number if it is selected
# replace=False keeps the same column from being drawn twice
models_to_audit = np.random.choice(models_df.columns, 2, replace=False)
```

In your audit, use at least three different performance metrics. Compare and contrast performance on those metrics across racial or gender groups.

Include easy-to-read tables of your performance metrics, along with interpretations of each model's overall performance and any disparities, written so that a general audience could understand them.

Which of the two models do you think would be better to deploy?
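
The per-group comparison described in this section could be sketched as follows. This is an illustration on toy data under stated assumptions: `y_true`, `y_pred`, and `group` are hypothetical column names, and the three metrics shown are only one possible choice.

```python
import pandas as pd

# Toy data; the column names and values are made up for illustration.
df = pd.DataFrame({
    "y_true": [1, 0, 1, 0, 1, 0, 1, 0],
    "y_pred": [1, 0, 0, 0, 1, 1, 1, 1],
    "group":  ["a", "a", "a", "a", "b", "b", "b", "b"],
})

def metrics(g):
    """Return accuracy, true positive rate, and false positive rate for one subset."""
    tp = ((g["y_true"] == 1) & (g["y_pred"] == 1)).sum()
    tn = ((g["y_true"] == 0) & (g["y_pred"] == 0)).sum()
    fp = ((g["y_true"] == 0) & (g["y_pred"] == 1)).sum()
    fn = ((g["y_true"] == 1) & (g["y_pred"] == 0)).sum()
    return pd.Series({
        "accuracy": (tp + tn) / len(g),
        "true_pos_rate": tp / (tp + fn),
        "false_pos_rate": fp / (fp + tn),
    })

# One row per group, one column per metric.
by_group = pd.DataFrame({name: metrics(g) for name, g in df.groupby("group")}).T
# Add the overall performance as a final row for comparison.
by_group.loc["overall"] = metrics(df)
print(by_group)
```

Keeping the metric computation in one function means the same code scores every group and the overall data, so differences in the table reflect the model rather than differences in how each row was computed.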

## Extend your Audit

```{note}
Optional
(for more achievements, deeper understanding, or more practice)
```

Use functions and loops to build a dataset about the performance of the different models so that you can answer the following questions:

1. Which model (target and learning algorithm) has the best accuracy?
1. Which target value has the least average disparity by race? By gender?
1. Which learning algorithm has the least average disparity by race? By gender?
1. Which model (target and learning algorithm) do you think is overall the best?

```{list-table} Example table format
* - y
  - model
  - score
  - value
  - subset
* - >=10k
  - LR
  - accuracy
  - .873
  - overall
* - >=20k
  - RPR
  - false_pos_rate
  - .873
  - men
```

This table does not contain real data; it shows the headers with example values to illustrate what each column means.
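
One way to build a table in this format with functions and loops is sketched below. The data is a toy frame, and the column names `sex`, `true_10k`, `LR_10k`, and `RPR_10k` are assumptions for illustration, not the real headers.

```python
import pandas as pd

# Toy frame; the label and prediction column names are assumed.
df = pd.DataFrame({
    "sex": ["men", "men", "women", "women"],
    "true_10k": [1, 0, 1, 1],
    "LR_10k": [1, 0, 0, 1],
    "RPR_10k": [1, 1, 1, 1],
})

def accuracy(frame, truth_col, pred_col):
    """Fraction of rows where the prediction matches the true label."""
    return (frame[truth_col] == frame[pred_col]).mean()

rows = []
for model in ["LR", "RPR"]:
    for target in ["10k"]:
        truth_col, pred_col = f"true_{target}", f"{model}_{target}"
        # Score the whole frame and each subgroup with the same function.
        subsets = {"overall": df, **{name: g for name, g in df.groupby("sex")}}
        for subset_name, frame in subsets.items():
            rows.append({
                "y": f">={target}",
                "model": model,
                "score": "accuracy",
                "value": accuracy(frame, truth_col, pred_col),
                "subset": subset_name,
            })

# One row per (target, model, metric, subset) combination.
results = pd.DataFrame(rows)
print(results)
```

Collecting dictionaries in a list and converting once at the end keeps the loop simple; adding more metrics would mean one more loop level over scoring functions.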

```{hint}
For this step, make separate DataFrames and then merge them together to earn construct. If you don't need construct, you can build the dataset as a single DataFrame. For visualize, use appropriate groupings.
```
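
The merge suggested in the hint might look like the following sketch. The two frames and all numbers in them are made-up placeholders, and the key columns follow the example table format above; treat the shape of the frames as an assumption rather than a required design.

```python
import pandas as pd

# Two separately built frames, one per metric. All values are placeholders.
acc_df = pd.DataFrame({
    "y": [">=10k", ">=10k"],
    "model": ["LR", "LR"],
    "subset": ["men", "women"],
    "accuracy": [0.9, 0.8],
})
fpr_df = pd.DataFrame({
    "y": [">=10k", ">=10k"],
    "model": ["LR", "LR"],
    "subset": ["men", "women"],
    "false_pos_rate": [0.1, 0.3],
})

# Merge on the shared key columns so each row carries both metrics.
keys = ["y", "model", "subset"]
wide = acc_df.merge(fpr_df, on=keys)

# Melt to the long format with one score name and one value per row.
long = wide.melt(id_vars=keys, var_name="score", value_name="value")
print(long)
```

The long format is convenient for grouped plots, since `subset` and `score` can each be mapped to a visual dimension.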