Formal Group Fairness and Accuracy in Automated Decision Making
Abstract
1. Introduction
2. Fairness in Machine Learning
2.1. Relevance of Fairness in Automated Decision Making
2.2. Fairness and Group Fairness Measures
2.3. Individual Fairness
3. Sources of Bias in the Machine Learning Pipeline
3.1. Sources of Bias in the Data Collection Phase
3.2. Model Development
3.3. Model Evaluation
3.4. Model Deployment
4. Bias Mitigation Strategies
4.1. Pre-Processing Strategies
4.2. In-Processing and Post-Processing Strategies
5. Accuracy and Fairness
5.1. The Trade-Off Assumption between Accuracy and Fairness
5.2. State-of-the-Art Approaches
6. Data and Methodology
- Optimized modeling: Does maximizing accuracy simultaneously lead to an increase in fairness? Enhancing accuracy with respect to ACC and BACC is expected to decrease the corresponding error rate and, consequently, to simultaneously increase fairness [12].
- Discrimination threshold: Does adapting the threshold of classification in the interest of maximizing accuracy improve fairness? The LRC is optimized with respect to the F1 score. Adapting the threshold of classification is expected to increase accuracy. Furthermore, balancing the samples representing the privileged and unprivileged groups might simultaneously increase fairness.
- Bias mitigation: How does the objective of maximizing fairness impact accuracy? To analyze the effect of bias mitigation on accuracy and fairness, the popular mitigation strategy reweighing (RW) was applied (a minimal reweighing sketch follows this list). If, as commonly assumed in prior research, the trade-off between accuracy and fairness is unavoidable, accuracy is expected to decrease when fairness is maximized.
- Feature set size: Does the number of features used to make a decision influence accuracy and fairness? Increasing the feature set size, corresponding to greater knowledge being available about an individual, is expected to increase accuracy and fairness [11].
- Training set size: Does the quantity of available training data influence accuracy and fairness? Increasing the training set size is expected to increase accuracy but to also negatively impact fairness, according to [11].
- Data augmentation: How do accuracy and fairness change if synthetic data points are added to the data set? The addition of realistic, synthetic data points to account for different labels with respect to the unprivileged and privileged groups is assumed to increase fairness and accuracy simultaneously [10].
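To make the reweighing (RW) strategy referred to above concrete, the following is a minimal sketch of the reweighing scheme of Kamiran and Calders [21] written in plain pandas/scikit-learn. The column names (`race` as the protected attribute, `two_year_recid` as the label) follow the COMPAS feature set used in this study, but the concrete implementation in the experiments (e.g., via the AIF360 toolkit [35]) may differ.

```python
import pandas as pd

def reweighing_weights(protected: pd.Series, label: pd.Series) -> pd.Series:
    """Instance weights w(a, y) = P_expected(a, y) / P_observed(a, y)
    = (P(A = a) * P(Y = y)) / P(A = a, Y = y), following Kamiran and Calders [21].
    Group/label combinations that are under-represented receive weights > 1."""
    weights = pd.Series(1.0, index=label.index)
    for a in protected.unique():
        for y in label.unique():
            cell = (protected == a) & (label == y)
            if cell.any():
                expected = (protected == a).mean() * (label == y).mean()
                observed = cell.mean()
                weights.loc[cell] = expected / observed
    return weights

# Hypothetical usage on a pre-processed COMPAS frame `df` with 0/1-encoded columns:
# w = reweighing_weights(df["race"], df["two_year_recid"])
# The weights can then be passed to the classifier, e.g.
# sklearn.linear_model.LogisticRegression().fit(X, y, sample_weight=w)
```

The weights leave the data itself unchanged; only the influence of each instance on the training loss is rescaled, which is why reweighing is counted among the pre-processing mitigation strategies.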
6.1. Data Set Selection
6.2. Data Pre-Processing
- COMPAS: {race, sex, age, juv_felt_count (Juvenile felony count), juv_mids_count (Juvenile misdemeanor count), juv_other_count (Juvenile other offenses count), priors_count (Prior offenses count), c_charge_degree (Charge degree of original crime), two_year_recid (Rearrested within two years)};
- LAW: {race, gender, lsat (The student’s LSAT score), ugpa (The student’s undergraduate GPA), zfygpa (The first-year law school GPA), DOB_yr (Year of birth), zgpa_felt_count (The cumulative law school GPA), family_income (The student’s family income bracket), part_time (Working full- or part-time), cluster_tier (Tier of school), weighted_lsat_ugpa (Weighted LSAT and UGPA score), pass_bar (Passed bar exam on first try)}. A minimal pre-processing sketch based on these feature sets is given below.
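As an illustration of this pre-processing step, the sketch below restricts the COMPAS data to the feature set listed above and binarizes the protected and categorical attributes. The file path, the raw attribute values, and the convention of treating Caucasian defendants as the privileged group are assumptions in line with common COMPAS pre-processing, not necessarily the authors’ exact procedure.

```python
import pandas as pd

COMPAS_FEATURES = ["race", "sex", "age", "juv_felt_count", "juv_mids_count",
                   "juv_other_count", "priors_count", "c_charge_degree"]
COMPAS_LABEL = "two_year_recid"

# Hypothetical file path; the column names follow the feature list above.
df = pd.read_csv("compas.csv")[COMPAS_FEATURES + [COMPAS_LABEL]].dropna()

# Keep the two race groups considered in the study and encode all binary
# attributes as 0/1 (1 = assumed privileged group / felony charge).
df = df[df["race"].isin(["African-American", "Caucasian"])]
df["race"] = (df["race"] == "Caucasian").astype(int)
df["sex"] = (df["sex"] == "Female").astype(int)
df["c_charge_degree"] = (df["c_charge_degree"] == "F").astype(int)

X, y = df[COMPAS_FEATURES], df[COMPAS_LABEL]
```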
6.3. Model Selection
6.4. Fairness Measures
6.5. Accuracy Measures
6.6. Performance Stability
6.7. Reweighing
6.8. Feature and Training Set Size
6.9. Data Augmentation
7. Results
7.1. Baseline Results
7.2. Optimized Modeling
7.3. Discrimination Threshold
7.4. Reweighing
7.5. Feature Set Size
7.6. Training Set Size
7.7. Data Augmentation
8. Discussion
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Table A1. Features of the COMPAS data set.

Feature Name | Type | Value | Description
---|---|---|---
race | Binary | {African-American, Caucasian} | Race |
sex | Binary | {Male, Female} | Gender |
age | Numerical | [18–96] | Age in years |
juv_felt_count | Numerical | [0–20] | Juvenile felony count |
juv_mids_count | Numerical | [0–13] | Juvenile misdemeanor count |
juv_other_count | Numerical | [0–17] | Juvenile other offenses count |
priors_count | Numerical | [0–38] | Prior offenses count |
c_charge_degree | Binary | {F, M} | Charge degree of original crime |
two_year_recid | Binary | {0, 1} | Rearrested within two years |
Table A2. Features of the LAW data set.

Feature Name | Type | Value | Description
---|---|---|---
race | Binary | {Non-White, White} | Race |
gender | Binary | {Female, Male} | Gender |
lsat | Numerical | [11–48] | The student’s LSAT score |
ugpa | Numerical | [1.5–4] | The student’s undergraduate GPA |
zfygpa | Numerical | [−3.35–3.48] | The first year law school GPA |
DOB_yr | Numerical | [10–71] | Year of birth |
zgpa_felt_count | Numerical | [−6.44–4.01] | The cumulative law school GPA |
family_income | Categorical | 5 categories | The student’s family income bracket
part_time | Binary | {0,1} | Working full- or part-time |
cluster_tier | Categorical | 6 categories | Tier of school
weighted_lsat_ugpa | Numerical | [288.95–1000] | Weighted LSAT and UGPA score |
pass_bar | Binary | {0, 1} | Passed bar exam on first try |
References
- O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Broadway Books: New York, NY, USA, 2016.
- Žliobaitė, I. Measuring discrimination in algorithmic decision making. Data Min. Knowl. Discov. 2017, 31, 1060–1089.
- Cooper, A.F.; Abrams, E.; Na, N. Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, Online, 19–21 May 2021; pp. 46–54.
- Corbett-Davies, S.; Pierson, E.; Feller, A.; Goel, S.; Huq, A. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 797–806.
- Menon, A.K.; Williamson, R.C. The cost of fairness in binary classification. In Proceedings of the Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; pp. 107–118.
- Zhao, H.; Gordon, G. Inherent tradeoffs in learning fair representations. J. Mach. Learn. Res. 2022, 23, 2527–2552.
- Friedler, S.A.; Scheidegger, C.; Venkatasubramanian, S. On the (im)possibility of fairness. arXiv 2016, arXiv:1609.07236.
- Dutta, S.; Wei, D.; Yueksel, H.; Chen, P.Y.; Liu, S.; Varshney, K. Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing. In Proceedings of the International Conference on Machine Learning, Online, 13–18 July 2020; pp. 2803–2813.
- Berk, R.; Heidari, H.; Jabbari, S.; Kearns, M.; Roth, A. Fairness in criminal justice risk assessments: The state of the art. Sociol. Methods Res. 2021, 50, 3–44.
- Sharma, S.; Zhang, Y.; Ríos Aliaga, J.M.; Bouneffouf, D.; Muthusamy, V.; Varshney, K.R. Data augmentation for discrimination prevention and bias disambiguation. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–8 February 2020; pp. 358–364.
- Zhang, J.M.; Harman, M. “Ignorance and Prejudice” in Software Fairness. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 25–28 May 2021; pp. 1436–1447.
- Hellman, D. Measuring algorithmic fairness. Va. Law Rev. 2020, 106, 811–866.
- Wick, M.; Tristan, J.B. Unlocking fairness: A trade-off revisited. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Fish, B.; Kun, J.; Lelkes, Á.D. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, FL, USA, 5–7 May 2016; pp. 144–152.
- Rodolfa, K.T.; Lamba, H.; Ghani, R. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Nat. Mach. Intell. 2021, 3, 896–904.
- Narayanan, A. Translation tutorial: 21 fairness definitions and their politics. In Proceedings of the Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; Volume 1170, p. 3.
- Verma, S.; Rubin, J. Fairness definitions explained. In Proceedings of the 2018 IEEE/ACM International Workshop on Software Fairness (Fairware), Gothenburg, Sweden, 28–29 May 2018; pp. 1–7.
- Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
- Chen, J.; Kallus, N.; Mao, X.; Svacha, G.; Udell, M. Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 339–348.
- Barocas, S.; Hardt, M.; Narayanan, A. Fairness in machine learning. NIPS Tutor. 2017, 1, 2.
- Kamiran, F.; Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 2012, 33, 1–33.
- Dunkelau, J.; Leuschel, M. Fairness-Aware Machine Learning—An Extensive Overview; Universität Düsseldorf: Düsseldorf, Germany, 2019.
- Larson, J.; Mattu, S.; Kirchner, L.; Angwin, J. How we analyzed the COMPAS recidivism algorithm. ProPublica 2016, 9, 5.
- Besse, P.; del Barrio, E.; Gordaliza, P.; Loubes, J.M.; Risser, L. A survey of bias in machine learning through the prism of statistical parity. Am. Stat. 2021, 76, 188–198.
- Buolamwini, J.; Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; pp. 77–91.
- Kim, J.S.; Chen, J.; Talwalkar, A. FACT: A diagnostic for group fairness trade-offs. In Proceedings of the International Conference on Machine Learning, Online, 13–18 July 2020; pp. 5264–5274.
- Zhang, J.M.; Harman, M.; Ma, L.; Liu, Y. Machine learning testing: Survey, landscapes and horizons. IEEE Trans. Softw. Eng. 2020, 48, 1–36.
- Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; Zemel, R. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, Cambridge, MA, USA, 8–10 January 2012; pp. 214–226.
- Hardt, M.; Price, E.; Srebro, N. Equality of opportunity in supervised learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 2017, 5, 153–163.
- Grgic-Hlaca, N.; Zafar, M.B.; Gummadi, K.P.; Weller, A. The case for process fairness in learning: Feature selection for fair decision making. In Proceedings of the NIPS Symposium on Machine Learning and the Law, Barcelona, Spain, 5–10 December 2016; Volume 1, p. 2.
- Kusner, M.J.; Loftus, J.; Russell, C.; Silva, R. Counterfactual fairness. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Suresh, H.; Guttag, J. A framework for understanding sources of harm throughout the machine learning life cycle. In Proceedings of the Equity and Access in Algorithms, Mechanisms, and Optimization, New York, NY, USA, 5–9 October 2021; pp. 1–9.
- Zemel, R.; Wu, Y.; Swersky, K.; Pitassi, T.; Dwork, C. Learning fair representations. In Proceedings of the International Conference on Machine Learning, Miami, FL, USA, 4–7 December 2013; pp. 325–333.
- Bellamy, R.K.; Dey, K.; Hind, M.; Hoffman, S.C.; Houde, S.; Kannan, K.; Lohia, P.; Martino, J.; Mehta, S.; Mojsilovic, A.; et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv 2018, arXiv:1810.01943.
- Mellin, W. Work with new electronic ‘brains’ opens field for army math experts. Hammond Times 1957, 10, 66.
- Babbage, C. Chapter VIII—Of the analytical engine. In Passages from The Life of a Philosopher; Longman: London, UK, 1864; pp. 112–141.
- Calders, T.; Žliobaitė, I. Why unbiased computational processes can lead to discriminative decision procedures. In Discrimination and Privacy in the Information Society; Springer: Berlin/Heidelberg, Germany, 2013; pp. 43–57.
- Suresh, H.; Guttag, J.V. A framework for understanding unintended consequences of machine learning. arXiv 2019, arXiv:1901.10002.
- Danks, D.; London, A.J. Algorithmic Bias in Autonomous Systems. In Proceedings of the International Joint Conference on Artificial Intelligence, Melbourne, VIC, Australia, 19–25 August 2017; Volume 17, pp. 4691–4697.
- Selbst, A.D.; Boyd, D.; Friedler, S.A.; Venkatasubramanian, S.; Vertesi, J. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 59–68.
- Silva, S.; Kenney, M. Algorithms, platforms, and ethnic bias: An integrative essay. Phylon 2018, 55, 9–37.
- Kamiran, F.; Calders, T. Classifying without discriminating. In Proceedings of the 2009 2nd International Conference on Computer, Control and Communication, Karachi, Pakistan, 17–18 February 2009; pp. 1–6.
- Calmon, F.; Wei, D.; Vinzamuri, B.; Natesan Ramamurthy, K.; Varshney, K.R. Optimized pre-processing for discrimination prevention. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Feldman, M.; Friedler, S.A.; Moeller, J.; Scheidegger, C.; Venkatasubramanian, S. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 259–268.
- Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; Zemel, R. The variational fair autoencoder. arXiv 2015, arXiv:1511.00830.
- Hajian, S.; Domingo-Ferrer, J.; Martinez-Balleste, A. Rule protection for indirect discrimination prevention in data mining. In Proceedings of the International Conference on Modeling Decisions for Artificial Intelligence, Changsha, China, 28–30 July 2011; pp. 211–222.
- Edwards, H.; Storkey, A. Censoring representations with an adversary. arXiv 2015, arXiv:1511.05897.
- Kozodoi, N.; Jacob, J.; Lessmann, S. Fairness in credit scoring: Assessment, implementation and profit implications. Eur. J. Oper. Res. 2022, 297, 1083–1094.
- Noriega-Campero, A.; Bakker, M.A.; Garcia-Bulle, B.; Pentland, A. Active Fairness in Algorithmic Decision Making. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19), Honolulu, HI, USA, 27–28 January 2019; pp. 77–83.
- Wightman, L.F. LSAC National Longitudinal Bar Passage Study; Law School Admission Council: Newtown, PA, USA, 1998.
- Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M.; Klein, M. Logistic Regression; Springer: New York, NY, USA, 2002.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Gradient Boosting for Classification. 2011. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html (accessed on 12 May 2022).
- Bengfort, B.; Bilbro, R. Yellowbrick: Visualizing the scikit-learn model selection process. J. Open Source Softw. 2019, 4, 1075.
- Friedler, S.A.; Scheidegger, C.; Venkatasubramanian, S.; Choudhary, S.; Hamilton, E.P.; Roth, D. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 329–338.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Cross-Validation: Evaluating Estimator Performance. 2011. Available online: https://scikit-learn.org/stable/modules/cross_validation.html (accessed on 12 May 2022).
- Hoffman, S.C. The AIF360 Team Adds Compatibility with Scikit-Learn. 2020. Available online: https://developer.ibm.com/blogs/the-aif360-team-adds-compatibility-with-scikit-learn/ (accessed on 12 May 2022).
- Mohri, M.; Rostamizadeh, A.; Talwalkar, A. Foundations of Machine Learning; MIT Press: Cambridge, MA, USA, 2018.
- Quy, T.L.; Roy, A.; Iosifidis, V.; Ntoutsi, E. A survey on datasets for fairness-aware machine learning. arXiv 2021, arXiv:2110.00530.
Measure | Definition | Statistical Measures | Criterion
---|---|---|---
Statistical Parity [28] | $P(\hat{Y}=1 \mid A=0) = P(\hat{Y}=1 \mid A=1)$ | $\hat{Y}$, $A$ | Independence
Conditional Statistical Parity [4] | $P(\hat{Y}=1 \mid L=l, A=0) = P(\hat{Y}=1 \mid L=l, A=1)$ | $\hat{Y}$, $A$, $L$ | Independence
Equalized Odds [29] | $P(\hat{Y}=1 \mid Y=y, A=0) = P(\hat{Y}=1 \mid Y=y, A=1),\ y \in \{0,1\}$ | $\hat{Y}$, $A$, $Y$ | Separation
Equalized Opportunity [29] | $P(\hat{Y}=1 \mid Y=1, A=0) = P(\hat{Y}=1 \mid Y=1, A=1)$ | $\hat{Y}$, $A$, $Y$ | Separation
Predictive Parity [30] | $P(Y=1 \mid \hat{Y}=1, A=0) = P(Y=1 \mid \hat{Y}=1, A=1)$ | $Y$, $\hat{Y}$, $A$ | Sufficiency

Here, $Y$ denotes the true label, $\hat{Y}$ the predicted label, $A$ the protected attribute ($A=1$: privileged group, $A=0$: unprivileged group), and $L$ a set of legitimate attributes.
 | Actual Positive | Actual Negative
---|---|---
Predicted Positive | True Positive (TP) | False Positive (FP)
Predicted Negative | False Negative (FN) | True Negative (TN)
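As a concrete illustration of how the group fairness measures reported in the result tables can be derived from these quantities, the following is a minimal sketch computing the statistical parity difference (SPD) and the average odds difference (AOD) between the unprivileged and the privileged group. The sign convention (unprivileged minus privileged, so that negative values indicate a disadvantage for the unprivileged group) follows common implementations such as AIF360 [35] and is an assumption about, not a copy of, the code used by the authors.

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Selection rate, TPR, and FPR restricted to the group given by the boolean mask."""
    yt, yp = y_true[mask], y_pred[mask]
    sel = yp.mean()                                             # P(Y_hat = 1 | group)
    tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan     # TP / (TP + FN)
    fpr = yp[yt == 0].mean() if (yt == 0).any() else np.nan     # FP / (FP + TN)
    return sel, tpr, fpr

def spd_aod(y_true, y_pred, protected, privileged_value=1):
    """SPD and AOD for binary labels/predictions and a binary protected attribute."""
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    priv = protected == privileged_value
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, ~priv)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, priv)
    spd = sel_u - sel_p                                # statistical parity difference
    aod = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))    # average odds difference
    return spd, aod
```

Both measures equal zero for a perfectly group-fair classifier, which is why the relative changes in the result tables are reported as improvements toward zero.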
Measure | Paper | Concept |
---|---|---|
Fairness Through Awareness | [28] | Similarity |
Fairness Through Unawareness | [31,32] | Similarity |
Counterfactual Fairness | [32] | Causality |
Data Set | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---
COMPAS | 0.675 | 0.669 | 0.713 | −0.261 | −0.230 |
LAW | 0.952 | 0.566 | 0.975 | −0.061 | −0.125 |
Data Set | Objective | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---|---
COMPAS | ACC + BACC | 0.675 (=) | 0.669 (=) | 0.714 (+0.14%) | −0.259 (+0.77%) | −0.228 (+0.87%)
LAW | ACC | 0.952 (=) | 0.562 (−0.71%) | 0.975 (=) | −0.056 (+8.2%) | −0.117 (+6.4%)
LAW | BACC | 0.793 (−16.7%) | 0.813 (+43.64%) | 0.879 (−9.85%) | −0.410 (−572.13%) | −0.288 (−130.4%)
Data Set | Threshold | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---|---
COMPAS | 0.42 | 0.653 (−3.26%) | 0.636 (−4.93%) | 0.732 (+2.66%) | −0.180 (+31.03%) | −0.156 (+32.17%) |
LAW | 0.45 | 0.952 (=) | 0.551 (−2.65%) | 0.975 (=) | −0.042 (+31.15%) | −0.092 (+26.4%) |
Data Set | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---
COMPAS | 0.655 (−2.96%) | 0.656 (−1.94%) | 0.699 (−1.96%) | 0.043 (+116.48%) | 0.076 (+133.04%) |
LAW | 0.950 (−0.21%) | 0.731 (+29.15%) | 0.974 (−0.1%) | −0.005 (+91.8%) | 0.014 (+111.2%) |
Data Set | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---
COMPAS | 0.658 (−2.08%) | 0.651 (−2.11%) | 0.700 (−1.82%) | 0.094 (+135.47%) | 0.093 (+139.57%)
LAW | 0.951 (−0.11%) | 0.535 (−4.46%) | 0.975 (=) | 0.001 (+101.72%) | 0.015 (+113.16%)
Data Set | Data | ACC | BACC | F1 | SPD | AOD
---|---|---|---|---|---|---
COMPAS | Train + Test | 0.676 (+0.15%) | 0.676 (+1.05%) | 0.714 (+0.14%) | 0.001 (+100.38%) | 0.002 (+100.87%)
COMPAS | Train | 0.674 (−0.15%) | 0.673 (+0.60%) | 0.704 (−1.26%) | −0.225 (+13.79%) | −0.193 (+16.09%)
LAW | Train + Test | 0.952 (=) | 0.768 (+35.69%) | 0.975 (=) | 0.000 (+100%) | −0.003 (+97.6%)
LAW | Train | 0.952 (=) | 0.762 (+34.63%) | 0.975 (=) | −0.059 (+3.28%) | −0.116 (+7.2%)
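For the data augmentation experiments summarized above, one plausible reading of the procedure, in the spirit of Sharma et al. [10], is to add a synthetic copy of every instance with the protected attribute flipped while keeping all other features and the label unchanged, so that both groups are represented identically; whether this matches the authors’ exact augmentation, and whether it was applied to the training data only or to training and test data as distinguished in the table, is an assumption. A minimal sketch:

```python
import pandas as pd

def augment_flip_protected(df: pd.DataFrame, protected: str) -> pd.DataFrame:
    """Return the original data plus a synthetic copy of every row in which the
    binary (0/1-encoded) protected attribute is flipped; labels and all other
    features are kept unchanged."""
    flipped = df.copy()
    flipped[protected] = 1 - flipped[protected]
    return pd.concat([df, flipped], ignore_index=True)

# Hypothetical usage: augment only the training split of the pre-processed COMPAS data.
# train_aug = augment_flip_protected(train_df, protected="race")
```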