1. Introduction
The deployment of algorithmic risk assessment instruments in criminal justice systems has accelerated dramatically over the past two decades, fundamentally altering how sentencing decisions are informed and justified. Tools such as the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), the Public Safety Assessment (PSA), and the Level of Service Inventory-Revised (LSI-R) now influence the liberty interests of millions of defendants annually across multiple jurisdictions [1,2]. Proponents argue that these instruments enhance objectivity, reduce inter-judge disparities, and improve resource allocation within correctional systems. Critics counter that algorithmic sentencing tools perpetuate historical biases, operate as inscrutable “black boxes,” and undermine the individualized assessment that due process demands [3,4].
The tension between predictive utility and interpretive transparency has emerged as a central challenge in this domain. Machine learning models with superior predictive performance often achieve it through complex, nonlinear transformations that resist human comprehension. Conversely, models designed for transparency may sacrifice predictive accuracy, potentially compromising public safety objectives. This apparent trade-off has prompted some scholars to question whether algorithmic sentencing can ever satisfy the dual imperatives of accuracy and accountability [5].
This paper argues that the framing of explainability as a constraint rather than a competing objective offers a productive path forward. Rather than treating transparency as one value to be balanced against others, we propose that explainability requirements should function as non-negotiable architectural constraints within which predictive systems must operate. This approach reflects the constitutional reality that due process is not merely one policy preference among many, but a fundamental requirement that circumscribes permissible governmental action.
The contribution of this work is threefold. First, we develop a theoretical framework situating explainability requirements within the broader normative architecture of criminal sentencing, drawing upon constitutional doctrine, administrative law principles, and emerging AI governance frameworks. Second, we propose a technical architecture for explainability-constrained sentencing models that integrates multiple complementary explanation mechanisms calibrated to judicial decision-making contexts. Third, we assess the practical viability of this framework through comparative analysis of existing risk assessment instruments and evaluation against emerging regulatory requirements, particularly those articulated in the European Union Artificial Intelligence Act.
An important distinction must be drawn between technical design feasibility and institutional adoptability. While this paper demonstrates that explainability-constrained sentencing models are technically feasible—achieving competitive predictive performance while satisfying transparency requirements—institutional adoption faces distinct and independently formidable challenges that warrant systematic analysis.
Research on judicial technology adoption identifies several categories of institutional resistance. First, epistemic conservatism: courts exhibit documented reluctance to adopt technically sophisticated tools, reflecting both professional norms favoring established procedures and legitimate concerns about judicial capacity to evaluate algorithmic recommendations critically [6]. Second, organizational inertia: court administrative structures, procurement processes, and workflow routines create path dependencies that resist technological disruption regardless of technical merit. Third, professional identity concerns: judges may perceive algorithmic decision support as encroaching upon judicial discretion—a core professional value—even when such tools are positioned as advisory rather than determinative. Fourth, accountability ambiguity: the introduction of algorithmic intermediaries complicates traditional accountability structures, creating uncertainty about responsibility allocation when algorithmic recommendations prove erroneous.
We emphasize that all claims in this paper regarding judicial comprehension, cognitive load management, and decision-making improvement are theoretically motivated design objectives grounded in cognitive science literature rather than empirically demonstrated outcomes. Design feasibility establishes what is possible; institutional adoption depends upon judicial training, organizational culture, appellate oversight, stakeholder acceptance, and resource allocation. Empirical validation of judicial engagement with the proposed explanation architecture—through controlled experiments, field studies, and structured expert feedback—constitutes a critical priority for future research, as discussed in Section 7. This paper primarily addresses design feasibility, while Section 6 provides implementation guidance addressing adoption barriers. The success of explainable AI-assisted sentencing ultimately depends not only on sound technical design but also on institutional reforms ensuring that judges possess both the capacity and the incentives to engage meaningfully with algorithmic recommendations rather than deferring uncritically to technological authority.
The structure of the paper proceeds as follows. Section 2 reviews the existing literature on algorithmic risk assessment in criminal justice, with particular attention to empirical evaluations of predictive validity, fairness properties, and the mathematical impossibility results that constrain achievable fairness guarantees. Section 3 examines the legal and regulatory landscape governing AI-assisted sentencing, including constitutional due process requirements under State v. Loomis and emerging statutory frameworks. Section 4 presents our proposed framework for explainability-constrained sentencing models, detailing both the technical architecture and the underlying design rationale. Section 5 provides a comparative analysis of framework performance against existing alternatives. Section 6 discusses implementation pathways and offers policy guidance. Section 7 concludes with reflections on the broader implications for algorithmic governance in high-stakes domains.
6. Implementation Pathways and Policy Guidance
6.3. Toward a Coherent Policy Framework
Drawing together the preceding analysis, we articulate a coherent policy framework for jurisdictions considering AI-assisted sentencing systems. These recommendations constitute guidance informed by theoretical analysis, computational experimentation, and synthesis of existing regulatory requirements rather than validated best practices tested in operational judicial settings. Empirical evaluation of these recommendations through pilot programs and implementation studies is an essential precondition for their adoption as established practice. The framework proceeds from the premise that algorithmic risk assessment can play a legitimate informational role in sentencing, but that legitimacy depends upon satisfaction of transparency, accountability, and fairness requirements that current proprietary systems fail to meet.
The starting point should be a presumption favoring interpretable model architectures. Complex black-box systems should be permitted only where demonstrably superior predictive performance can be shown to outweigh transparency costs—a burden that current evidence suggests will rarely be met for tabular risk assessment data. This presumption reverses current practice, which permits opacity by default and treats transparency as an optional enhancement.
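To make this presumption concrete, the following minimal sketch shows how a jurisdiction might test whether the "demonstrably superior performance" burden is actually met on its own data. It assumes the publicly available ProPublica COMPAS release (compas-scores-two-years.csv) and a small illustrative feature subset; logistic regression stands in for the interpretable baseline and gradient boosting for the black box, neither of which is prescribed by the framework.

```python
# A minimal sketch, assuming the ProPublica COMPAS release and an illustrative
# feature subset, of the comparison a jurisdiction would run before permitting
# a black-box model: does it demonstrably outperform an interpretable baseline?
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")  # hypothetical local copy
features = ["age", "priors_count", "juv_fel_count", "juv_misd_count"]
X, y = df[features], df["two_year_recid"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

interpretable = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

auc_glass = roc_auc_score(y_te, interpretable.predict_proba(X_te)[:, 1])
auc_black = roc_auc_score(y_te, black_box.predict_proba(X_te)[:, 1])
print(f"interpretable AUC = {auc_glass:.3f}, black box AUC = {auc_black:.3f}")
# Under the proposed presumption, opacity is permissible only if the gap
# (auc_black - auc_glass) is large enough to outweigh transparency costs.
```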
Explanation requirements should be mandatory and multi-layered, encompassing global model descriptions that enable assessment of overall model behavior; local explanations identifying factors contributing to individual predictions; counterfactual analyses revealing prediction sensitivity; and uncertainty quantification enabling calibrated reliance. These explanation requirements should be enforceable through procedural rights, with defendants able to challenge sentences where required explanations are absent or inadequate.
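The sketch below illustrates what the four mandated layers might look like in code, using a sparse linear model so that each layer is exact rather than approximated. The synthetic training data, feature names, and counterfactual probe are illustrative stand-ins, not a prescribed implementation; in practice each layer would be rendered into a standardized template for judicial audiences.

```python
# A self-contained sketch of the four explanation layers (global, local,
# counterfactual, uncertainty) over a sparse linear model. Synthetic data and
# feature names are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["age", "priors_count", "juv_fel_count"]
X_train = rng.normal(size=(500, 3))  # stand-in for preprocessed real features
y_train = (X_train @ np.array([0.2, 0.9, 0.5]) + rng.normal(size=500) > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def explain(x: np.ndarray) -> dict:
    # Layer 1, global: a sparse linear model is its own description.
    global_desc = dict(zip(features, model.coef_[0].round(3)))
    # Layer 2, local: each factor's additive contribution to this log-odds.
    local = dict(zip(features, (model.coef_[0] * x).round(3)))
    # Layer 3, counterfactual: sensitivity to one fewer prior conviction.
    x_cf = x.copy()
    x_cf[1] -= 1.0
    delta = model.predict_proba([x_cf])[0, 1] - model.predict_proba([x])[0, 1]
    # Layer 4, uncertainty: a bootstrap interval over refitted models.
    draws = []
    for _ in range(200):
        idx = rng.integers(0, len(X_train), len(X_train))
        m = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        draws.append(m.predict_proba([x])[0, 1])
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return {"global": global_desc, "local": local,
            "counterfactual_delta": round(float(delta), 3),
            "risk_interval_95": (round(float(lo), 3), round(float(hi), 3))}

print(explain(X_train[0]))
```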
Human oversight must be substantive rather than nominal. The Loomis framework correctly requires judges to articulate independent justifications not dependent upon risk scores, but this requirement can be circumvented through pro forma compliance. Meaningful oversight requires that judges engage with the substance of algorithmic recommendations, considering both the factors identified and the limitations acknowledged. Training, institutional culture, and appellate review should reinforce expectations of genuine deliberation.
Ongoing validation and accountability mechanisms should be mandatory rather than discretionary. Annual validation studies, publicly reported, should assess predictive validity and fairness metrics on deployment populations. Independent oversight bodies should have authority to audit system performance, investigate complaints, and mandate corrective action. Sunset provisions should require periodic legislative reauthorization, ensuring democratic accountability for continued system use.
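One way such a publicly reported annual validation might be computed is sketched below. The score and outcome arrays are synthetic stand-ins for a deployment-year cohort, and the metric choices (AUC, Brier score, maximum calibration gap) are illustrative rather than mandated measures.

```python
# A sketch, on synthetic stand-in data, of an annual validation report for a
# deployment-year cohort; metric choices are illustrative, not mandated.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(1)
scores = rng.uniform(size=2000)                            # model risk estimates
outcomes = (rng.uniform(size=2000) < scores).astype(int)   # observed recidivism

frac_pos, mean_pred = calibration_curve(outcomes, scores, n_bins=10)
report = {
    "cohort_size": int(len(outcomes)),
    "auc": round(roc_auc_score(outcomes, scores), 3),
    "brier_score": round(brier_score_loss(outcomes, scores), 3),
    "max_calibration_gap": round(float(np.abs(frac_pos - mean_pred).max()), 3),
}
print(report)  # degradation beyond agreed bounds would trigger corrective review
```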
Fairness monitoring should be continuous and responsive. Given the impossibility results establishing that the principal fairness criteria cannot all be satisfied simultaneously, jurisdictions must make explicit choices about which fairness properties to prioritize. These choices should be subject to public deliberation and periodic reconsideration. When monitoring reveals disparities exceeding specified thresholds, investigation and corrective action should be mandatory.
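The sketch below illustrates threshold-based disparity monitoring under one possible prioritization, false-positive-rate parity. The group labels, decision cutoff, and tolerance are hypothetical policy parameters that a jurisdiction would set through the public deliberation described above.

```python
# A sketch of continuous disparity monitoring under false-positive-rate parity;
# group labels, cutoff, and tolerance are hypothetical policy parameters.
import numpy as np
import pandas as pd

def false_positive_rate(y_true: np.ndarray, y_flag: np.ndarray) -> float:
    negatives = y_true == 0
    return float(y_flag[negatives].mean()) if negatives.any() else float("nan")

def monitor(df: pd.DataFrame, cutoff: float = 0.5, tolerance: float = 0.05):
    flagged = (df["score"] >= cutoff).astype(int)
    rates = {group: false_positive_rate(sub["recid"].to_numpy(),
                                        flagged[sub.index].to_numpy())
             for group, sub in df.groupby("group")}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap, gap > tolerance  # True mandates investigation

rng = np.random.default_rng(2)
demo = pd.DataFrame({"group": rng.choice(["A", "B"], size=1000),
                     "score": rng.uniform(size=1000),
                     "recid": rng.integers(0, 2, size=1000)})
print(monitor(demo))
```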
These recommendations collectively establish a framework treating algorithmic sentencing as a privilege contingent upon demonstrated compliance with transparency, accountability, and fairness requirements—not as a technological inevitability to which legal systems must simply adapt.
7. Conclusions
The deployment of algorithmic risk assessment in criminal sentencing presents both opportunities and perils. The opportunity lies in the potential for more consistent, evidence-informed decisions that allocate correctional resources effectively while minimizing unnecessary incarceration. The peril lies in the prospect of opaque systems that perpetuate historical biases, resist accountability, and undermine the individualized assessment that justice demands.
As argued in Section 1 and Section 4.1, treating explainability as a non-negotiable architectural constraint—rather than a value to be traded against predictive performance—offers a productive path forward. The proposed framework provides preliminary evidence, based on analysis of one benchmark dataset, that interpretable model architectures—specifically, Generalized Additive Models with pairwise interactions—can achieve predictive performance comparable to black-box alternatives while satisfying constitutional due process requirements and emerging regulatory mandates. Whether this finding generalizes across jurisdictions, populations, and prediction targets requires prospective multi-site validation that the present study cannot provide. The impossibility theorems governing algorithmic fairness establish that normative choices about error distribution are unavoidable; transparent architectures render these choices visible and subject to democratic deliberation rather than concealing them within proprietary systems.
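As one concrete instance of this architecture, the sketch below trains an Explainable Boosting Machine, the GA2M implementation from the InterpretML library [33], on the ProPublica COMPAS release. The feature list and interaction budget are illustrative choices, and attribute names may vary across library versions.

```python
# A minimal sketch of the GA2M architecture using InterpretML's Explainable
# Boosting Machine [33]; feature list and interaction budget are illustrative.
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")
features = ["age", "priors_count", "juv_fel_count", "c_charge_degree", "sex"]
X, y = df[features], df["two_year_recid"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# interactions=10 caps the number of learned pairwise terms (the "2" in GA2M).
ebm = ExplainableBoostingClassifier(interactions=10, random_state=0)
ebm.fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, ebm.predict_proba(X_te)[:, 1]), 3))

# Every learned term is inspectable: univariate shape functions plus pairwise
# interaction surfaces, each renderable for a judicial audience.
print(ebm.term_names_)  # attribute name in recent InterpretML versions
```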
The framework presented here offers both immediate actionable contributions and a foundation for longer-term research development. Short-term contributions suitable for immediate implementation include:
- adoption of inherently interpretable model architectures (GA2M, rule lists, sparse linear models) in jurisdictions currently using or considering algorithmic risk assessment;
- development of standardized explanation templates calibrated to judicial audiences, with attention to cognitive constraints and decision-making contexts;
- implementation of fairness monitoring protocols with transparent public reporting of performance disparities across demographic groups and explicit articulation of chosen fairness criteria; and
- establishment of governance structures—including independent auditing authority, defendant access rights, and periodic revalidation requirements—ensuring ongoing accountability rather than one-time certification.

These contributions require no fundamental technical breakthroughs and can be implemented with existing methods and computational infrastructure.
Longer-term research agendas building upon this foundation include:
- prospective validation studies assessing framework performance in operational judicial settings rather than retrospective datasets, measuring both predictive accuracy and downstream sentencing outcomes;
- empirical evaluation of judicial comprehension and appropriate reliance through controlled experiments, field studies, and structured interviews with judicial actors;
- extension to intersectional fairness analysis examining performance across multiple simultaneously relevant protected characteristics (race, gender, age, socioeconomic status) rather than binary demographic comparisons;
- development of methods enabling dynamic model updating as populations and risk factors evolve while maintaining transparency and accountability;
- integration with case management and supervision systems to assess whether risk assessment improves not only classification accuracy but also ultimate outcomes through targeted intervention; and
- comparative analysis across jurisdictions examining how local context (judicial culture, sentencing norms, available resources) moderates framework effectiveness.

These longer-term research directions require sustained empirical investigation, institutional collaboration, and iterative refinement based on deployment experience.
Several limitations of the current analysis warrant explicit acknowledgment. Most significantly, the framework evaluation relies entirely upon retrospective analysis of a single dataset—the Broward County COMPAS dataset—rather than prospective validation across multiple jurisdictions. While this dataset is the standard benchmark in the algorithmic fairness literature, results demonstrate technical feasibility rather than general validity. Population characteristics, criminal justice practices, data quality, and base rates vary substantially across jurisdictions, and model performance observed in Broward County may not transfer to other contexts. Prospective multi-site validation studies are essential before claims of general applicability can be supported.
The fairness analysis is further limited by its focus on binary racial classifications (Black versus White), which cannot capture the complexity of intersectional identities, multiracial populations, or compounding disadvantages arising from simultaneous membership in multiple marginalized groups. Future research should examine fairness across multiple simultaneously relevant protected characteristics including race, ethnicity, gender, age, and socioeconomic status. The usability claims rest upon theoretical foundations from cognitive science literature rather than empirical testing with judicial actors; controlled usability studies with judges, presentence investigators, and defense attorneys are necessary to validate design assumptions.
More fundamentally, this paper has addressed the design of risk assessment systems without engaging the prior normative question of whether predictive considerations should influence sentencing at all. Even a perfectly accurate, transparent, and fair risk assessment system would face philosophical objections regarding the propriety of punishment calibrated to predicted future conduct rather than past actions. These objections, grounded in retributivist commitments to proportionality and desert, fall outside the scope of the present analysis but deserve serious engagement in ongoing debates about evidence-based sentencing.
The stakes of these debates could hardly be higher. Criminal sentencing decisions determine who loses liberty and for how long—the most consequential exercise of state power over individuals in peacetime governance. If algorithmic systems are to play a role in these decisions, they must be systems that affected individuals can understand, challenge, and ultimately trust. The framework presented here represents one approach to achieving that essential transparency while preserving the predictive benefits that motivate algorithmic risk assessment. Whether the trade-offs embodied in this approach are acceptable is ultimately a question for democratic deliberation, not technical optimization. What technical analysis can contribute is clarity about what trade-offs exist, what institutional arrangements might navigate them, and what immediate steps can be taken while longer-term research proceeds—contributions this paper has attempted to provide.
Author Contributions
Conceptualization, J.S. and T.S.; methodology, J.S.; software, T.S.; validation, J.S. and T.S.; formal analysis, J.S.; investigation, J.S.; resources, J.S. and T.S.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S. and T.S.; visualization, T.S.; supervision, J.S.; project administration, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Shandong Province Key Educational Reform Project “Legal Services for All: Research on the Training Model of Applied Legal Talents”, grant number Z2023015; Shandong Province Social Science Special Project “Research on the Criminal Liability Dilemma and Solutions for Generative AI ‘Hallucinations’”, grant number 25CFZJ21; Shandong University of Political Science and Law Educational Reform Project “Research on the Mechanism and Path of ‘Legal Services for All’ Empowering the Construction of Ideological and Political Courses in Political Science and Law Universities”, grant number 2025JG001; and the Shandong University of Political Science and Law Dingshan Talent Project 2024.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. The primary dataset used for model development and validation is the publicly available ProPublica COMPAS dataset (Broward County, Florida, 2013–2014), accessible at https://github.com/propublica/compas-analysis (accessed on 10 January 2025). All preprocessing code, trained models, and analysis scripts will be made available in a public GitHub repository upon manuscript acceptance. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Desmarais, S.L.; Singh, J.P. Risk Assessment Instruments Validated and Implemented in Correctional Settings in the United States; Council of State Governments Justice Center: Lexington, KY, USA, 2013; Available online: https://csgjusticecenter.org/wp-content/uploads/2020/02/Risk-Assessment-Instruments-Validated-and-Implemented-in-Correctional-Settings-in-the-United-States.pdf (accessed on 21 January 2026).
- Kehl, D.; Guo, P.; Kessler, S. Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing; Responsive Communities Initiative; Berkman Klein Center for Internet & Society, Harvard Law School: Cambridge, MA, USA, 2017; Available online: http://nrs.harvard.edu/urn-3:HUL.InstRepos:33746041 (accessed on 21 January 2026).
- Starr, S.B. Evidence-Based Sentencing and the Scientific Rationalization of Discrimination. Stanf. Law Rev. 2014, 66, 803–872. [Google Scholar]
- Harcourt, B.E. Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age; University of Chicago Press: Chicago, IL, USA, 2007; ISBN 978-0-226-31614-7. [Google Scholar]
- Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
- Stevenson, M.T.; Doleac, J.L. Algorithmic Risk Assessment in the Hands of Humans; IZA Discussion Paper No. 12853; Institute of Labor Economics (IZA): Bonn, Germany, 2022. [Google Scholar]
- Gottfredson, S.D.; Moriarty, L.J. Statistical Risk Assessment: Old Problems and New Applications. Crime Delinq. 2006, 52, 178–200. [Google Scholar] [CrossRef]
- Burgess, E.W. Factors Determining Success or Failure on Parole. In The Workings of the Indeterminate Sentence Law and the Parole System in Illinois; Bruce, A.A., Harno, A.J., Burgess, E.W., Landesco, J., Eds.; State Board of Parole: Springfield, IL, USA, 1928; pp. 221–234. [Google Scholar]
- Andrews, D.A.; Bonta, J.; Wormith, J.S. The Recent Past and Near Future of Risk and/or Need Assessment. Crime Delinq. 2006, 52, 7–27. [Google Scholar] [CrossRef]
- Angwin, J.; Larson, J.; Mattu, S.; Kirchner, L. Machine Bias: There’s Software Used across the Country to Predict Future Criminals. And It’s Biased against Blacks. ProPublica. 23 May 2016. Available online: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (accessed on 21 January 2026).
- DeMichele, M.; Baumgartner, P.; Wenger, M.; Barber-Rioja, V.; Comfort, M.; Mack, N. The Public Safety Assessment: A Re-Validation and Assessment of Predictive Utility and Differential Prediction by Race and Gender in Kentucky. Criminol. Public Policy 2020, 19, 409–431. [Google Scholar] [CrossRef]
- Andrews, D.A.; Bonta, J. The Psychology of Criminal Conduct, 5th ed.; Anderson Publishing: New York, NY, USA, 2010. [Google Scholar]
- Singh, J.P.; Grann, M.; Fazel, S. A Comparative Study of Violence Risk Assessment Tools: A Systematic Review and Metaregression Analysis of 68 Studies Involving 25,980 Participants. Clin. Psychol. Rev. 2011, 31, 499–513. [Google Scholar] [CrossRef] [PubMed]
- Fazel, S.; Singh, J.P.; Doll, H.; Grann, M. Use of Risk Assessment Instruments to Predict Violence and Antisocial Behaviour in 73 Samples Involving 24,827 People: Systematic Review and Meta-Analysis. BMJ 2012, 345, e4692. [Google Scholar] [CrossRef] [PubMed]
- Dressel, J.; Farid, H. The Accuracy, Fairness, and Limits of Predicting Recidivism. Sci. Adv. 2018, 4, eaao5580. [Google Scholar] [CrossRef] [PubMed]
- Lin, Z.; Jung, J.; Goel, S.; Skeem, J. The Limits of Human Predictions of Recidivism. Sci. Adv. 2020, 6, eaaz0652. [Google Scholar] [CrossRef] [PubMed]
- Dieterich, W.; Mendoza, C.; Brennan, T. COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity; Technical Report; Northpointe Inc. Research Department: Traverse City, MI, USA, 2016. [Google Scholar]
- Kleinberg, J.; Mullainathan, S.; Raghavan, M. Inherent Trade-Offs in the Fair Determination of Risk Scores. In Proceedings of the Innovations in Theoretical Computer Science Conference (ITCS 2017), Berkeley, CA, USA, 9–11 January 2017. [Google Scholar] [CrossRef]
- Chouldechova, A. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 2017, 5, 153–163. [Google Scholar] [CrossRef]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
- Citron, D.K.; Pasquale, F. The Scored Society: Due Process for Automated Predictions. Wash. Law Rev. 2014, 89, 1–33. [Google Scholar]
- Supreme Court of the United States. Gardner v. Florida, 430 U.S. 349. Available online: https://supreme.justia.com/cases/federal/us/430/349/ (accessed on 2 February 2026).
- Supreme Court of Wisconsin. State v. Loomis, 881 N.W.2d 749. Available online: https://harvardlawreview.org/print/vol-130/state-v-loomis/ (accessed on 2 February 2026).
- Supreme Court of the United States. Loomis v. Wisconsin, 137 S. Ct. 2290 (cert. denied). Available online: https://law.justia.com/cases/wisconsin/supreme-court/2016/2015ap000157-cr.html (accessed on 2 February 2026).
- European Parliament; Council of the European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024, L 2024/1689, 1–144. [Google Scholar]
- European Parliament. EU AI Act: First Regulation on Artificial Intelligence. Available online: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (accessed on 21 January 2026).
- Edwards, L.; Veale, M. Slave to the Algorithm? Why a “Right to an Explanation” Is Probably Not the Remedy You Are Looking for. Duke Law Technol. Rev. 2017, 16, 18–84. [Google Scholar]
- European Parliament; Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data (General Data Protection Regulation). Off. J. Eur. Union 2016, L 119, 1–88. [Google Scholar]
- Article 29 Data Protection Working Party. Guidelines on Automated Individual Decision-Making and Profiling for the Purposes of Regulation 2016/679 (WP251rev.01); European Commission: Brussels, Belgium, 2018. [Google Scholar]
- Wachter, S.; Mittelstadt, B.; Floridi, L. Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation. Int. Data Priv. Law 2017, 7, 76–99. [Google Scholar] [CrossRef]
- Lou, Y.; Caruana, R.; Gehrke, J.; Hooker, G. Accurate Intelligible Models with Pairwise Interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 623–631. [Google Scholar] [CrossRef]
- Nori, H.; Jenkins, S.; Koch, P.; Caruana, R. InterpretML: A Unified Framework for Machine Learning Interpretability. arXiv 2019, arXiv:1909.09223. [Google Scholar] [CrossRef]
- Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harv. J. Law Technol. 2018, 31, 841–887. [Google Scholar] [CrossRef]
- Zeng, J.; Ustun, B.; Rudin, C. Interpretable Classification Models for Recidivism Prediction. J. R. Stat. Soc. Ser. A 2017, 180, 689–722. [Google Scholar] [CrossRef]
- Dietvorst, B.J.; Simmons, J.P.; Massey, C. Algorithm Aversion: People Erroneously Avoid Algorithms after Seeing Them Err. J. Exp. Psychol. Gen. 2015, 144, 114–126. [Google Scholar] [CrossRef] [PubMed]