Run the COMPAS Recidivism Classifier notebook demo linked from the Google What-If Tool page: https://colab.research.google.com/github/pair-code/what-if-tool/blob/master/WIT_COMPAS.ipynb. You do not need to change the code; just select “Run all” from the Runtime menu.
Under the “Invoke What-If Tool for test data and the trained models” step, you can reproduce the results ProPublica found in its analysis by doing the following:
Select the “Performance & Fairness” tab.
In the “Ground Truth Feature” dropdown menu, select “recidivism_within_2_years”.
In the “Slice by” dropdown menu, select “race”.
Under the “Fairness” option, select “Equal accuracy”.
As you can see in the “Equal accuracy thresholds for 6 values of race” window, the different slices have very similar accuracy rates, but different false positive and false negative rates.
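To make this concrete, here is a minimal sketch with invented confusion-matrix counts (not values taken from the notebook) showing how two groups can have identical accuracy while their false positive and false negative rates differ:

```python
# Illustrative sketch with made-up counts; each tuple is (TP, FP, TN, FN)
# for one hypothetical group, not data from the COMPAS notebook.
groups = {
    "group_a": (50, 10, 30, 10),
    "group_b": (30, 10, 50, 10),
}

for name, (tp, fp, tn, fn) in groups.items():
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct predictions / all predictions
    fpr = fp / (fp + tn)  # false positive rate: actual negatives wrongly flagged
    fnr = fn / (fn + tp)  # false negative rate: actual positives missed
    print(f"{name}: accuracy={accuracy:.2f}, FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Both groups come out 80% accurate, yet group_a has the higher false positive rate and group_b the higher false negative rate, which mirrors the pattern ProPublica reported across racial groups.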
Provide definitions for the terms false negative rate, false positive rate, and accuracy rate.
In your own words, define bias in the context of this dataset. Then define fairness in the same context. What is the difference between the two terms?
Of the three rates (false negative, false positive, and accuracy), which rate should be considered to help mitigate bias? Why?
Of the three rates (false negative, false positive, and accuracy), which rate should be considered to ensure fairness? Why?
Are the rates you selected for bias and fairness the same? Why or why not?
In the What-If threshold window, change the threshold values to help mitigate bias. What happens to the other two rates? Do the corresponding results impact any groups negatively? (Include a screenshot of the thresholds selected; the sketch after the next step illustrates the trade-off to expect.)
In the What-If threshold window, change the threshold values to ensure fairness. What happens to the other two rates? Do the corresponding results impact any groups negatively? (Include a screenshot of the thresholds selected.) [Note: if you selected the same rate for bias and fairness, there is no need to rerun the analysis; just mention that here.]
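As background for both threshold exercises above, here is a minimal numpy sketch, using synthetic scores and labels rather than the COMPAS model’s outputs, of why moving a classification threshold trades false positives against false negatives:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic risk scores: actual positives tend to score higher than
# actual negatives. Invented numbers, not the notebook's model outputs.
labels = np.array([0] * 500 + [1] * 500)
scores = np.concatenate([rng.normal(0.4, 0.15, 500),   # negatives cluster low
                         rng.normal(0.6, 0.15, 500)])  # positives cluster high

for threshold in (0.3, 0.5, 0.7):
    preds = (scores >= threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    tn = np.sum((preds == 0) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    print(f"threshold={threshold}: "
          f"accuracy={(tp + tn) / len(labels):.2f}, "
          f"FPR={fp / (fp + tn):.2f}, FNR={fn / (fn + tp):.2f}")
```

Lowering the threshold catches more actual positives (lower FNR) but flags more actual negatives (higher FPR); raising it does the reverse. The per-slice thresholds in the What-If Tool let you make exactly this trade-off separately for each group.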
Based on your assessment and definitions, does mitigating bias and ensuring fairness at the same time seem like a difficult task? Why or why not?
Do you think your assessment and definitions would apply if a different dataset were selected? Why or why not?