Efficiency vs. Equity: A Structured Interdisciplinary Review of AI in Criminal Justice Risk Assessments

Atkinson, Gentry; Casagrande, Katlyn

doi:10.3390/info17060574

Open Access Review

by Gentry Atkinson ¹,* and Katlyn Casagrande ²,*

¹ Department of Computer Science, St. Edward’s University, Austin, TX 78704, USA
² School of Social Sciences, Utah State University, Logan, UT 84322, USA
* Authors to whom correspondence should be addressed.

Information 2026, 17(6), 574; https://doi.org/10.3390/info17060574 (registering DOI)

Submission received: 9 April 2026 / Revised: 3 June 2026 / Accepted: 4 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue The Role of Artificial Intelligence for Diversity, Equity, and Inclusion)

Abstract

Risk assessment tools are used in criminal justice to evaluate an individual’s likelihood of reoffending. There is a growing discussion around the use of artificial intelligence (AI) and machine learning (ML) in algorithmic risk assessment (ARA). This survey examines the use of ARA in criminal justice and its potential for bias. Through a structured interdisciplinary review of recent research on the impact of ARA, this investigation examines the tools currently being used and whether there is evidence that ARA tools contribute to bias. Included papers were collected from Google Scholar and the ACM Digital Library; they were published in 2015 or later, discuss AI, and focus on the adult justice system in the US. This search yielded 56 studies. In total, 79% of the surveyed literature concluded that AI and ML can or do contribute to biased performance in risk assessment. The two most frequently recorded sources of bias were the use of historical court records as training data and the use of variables or features that correlate strongly with race, gender, age, or other protected attributes. This result, however, relies heavily on a small number of real-world observations, most notably the COMPAS dataset collected in Broward County. The recorded benefits of ARA included efficiency and resource utilization. The use of AI-derived risk assessment tools is growing and holds the potential to affect many lives. It is important to examine and consider the implications of their use, especially with respect to bias and fairness in criminal justice decision-making.

Keywords: risk assessment; fairness; artificial intelligence; algorithmic governance

1. Introduction

Algorithmic governance can offer many potential benefits to governments in the United States. These range from improving the efficiency of resource allocation to addressing issues of systemic fairness. This is particularly true in the criminal justice system, where small changes in a defendant’s perceived risk may result in longer sentences, higher bail, and a lower likelihood of parole. Risk assessment tools are used in criminal justice to evaluate an individual’s likelihood of reoffending by calculating a risk score based on personal history, criminal record, and other causal factors [1,2]. Algorithmic risk assessments (ARAs) have been developed to improve the accuracy and efficiency of these tools and are currently used in sentencing and corrections throughout the United States.

Several recent works have discussed the usefulness and potential biases of these tools. However, it is difficult to reach any solid conclusions about ARA tools themselves because the nature of the tools varies widely from state to state and is often not open to public review. The specifics of ARA tools are also rapidly changing, meaning that solid conclusions from just a few years ago may need to be re-reviewed for consistency in the current judicial landscape. The authors discussing this subject also vary widely, as does the nature of the publications.

This structured interdisciplinary review examined and cataloged recent works on ARA with a particular focus on the intersection that these tools have with the field of artificial intelligence (AI), and how the introduction of AI into this field might impact the reality and perception of biased performance by these tools.

Because risk assessments and their place in criminal justice are rapidly changing and hold the potential to impact a great number of human lives, there is a need for an up-to-date analysis of ARA tools in the wild with a focus on applications of artificial intelligence, which, like algorithmic governance, is a rapidly evolving field. In the recent past, AI was a niche field whose awareness was largely limited to computer science and neuroscience publications. Now, however, AI is widely discussed and applied. Some ARA tools are now beginning to incorporate, or claim to incorporate, AI. The extent to which these claims are true is not entirely clear, owing largely to a lack of a unified definition for AI across the many disparate disciplines that are currently engaged in reviewing and discussing ARA tools. Similarly, machine learning (ML) is a rapidly advancing field that is widely discussed by the academic community without a shared understanding of its abilities and limitations. If ML is commonly being used as a method for ARA, that has implications for the usefulness and fairness of these tools and is worthy of greater public awareness.

Developers of AI tools promise a substantial number of benefits to society. As seen in Figure 1, a wide range of potential benefits have been identified by our included sources, with the most common being a reduction in the costs associated with the criminal justice system or better distribution of resources. Other benefits include reducing the bias of human judgment [3], doing a better job of predicting recidivism [4], and reducing the number of individuals held in jails and prisons [5]. Although most sources recognize that the broad promises of AI-enabled risk assessment have not been fully realized yet [6], this list has been sufficient to attract substantial interest both from academic researchers and from government administrators.

While it is easy to get the impression that risk assessment tools exist wholly inside of courtrooms and could only ever impact the lives of criminal defendants, in recent years, these tools have begun to leave the courtroom and be more commonly applied in the neighborhoods of American cities. Some law enforcement agencies are using AI and statistical tools in predictive policing practices to focus police presence and resources on areas identified by these tools as more likely to experience crime. Bias potentially introduced by these tools substantially impacts the lives of citizens who find their neighborhoods either bereft of policing resources or suddenly under increased legal scrutiny. These tools and tactics are being applied more broadly in recent years, and there is a need for outside review to ensure that they are being applied ethically and that the tools themselves have been rigorously reviewed, both for predictive effectiveness and for fairness.

The need for review is particularly urgent when ML is being used as a tool for developing and administering risk assessments. ML finds patterns in data without direction or insight from human analysts, which puts it at greater risk of capturing bias recorded in historical data, misaligning our tools with their intentions. ML, and the related field of deep learning, are currently at the forefront of AI research, and they are likely influencing the development of modern risk assessment tools. If this is true, there is an additional need for verification by impartial human analysts to be certain that our tools are not performing in ways that were not intended. This review has investigated AI as it is being applied in American courtrooms, many of which are slow to adopt emerging technologies. Although AI and ML have advanced substantially in recent years, the works studied here present AI as it is being used.

This paper addressed gaps in the current discussion around AI in ARA by answering four research questions using a thorough and structured interdisciplinary survey of recent literature to build a more unified understanding of the uses and impact of ARA tools in the current American criminal justice system:

Is machine learning being widely used as a technique for algorithmic risk assessment in the United States criminal justice system?
What specific tools are being used for risk assessment at this time?
Does algorithmic risk assessment contribute to bias or issues of fairness in the criminal justice system (i.e., in sentencing, setting bail, or granting parole)?
What, if any, is the evidence that is commonly being used to suggest that algorithmic risk assessment tools contribute to bias?

These questions were addressed by systematically collecting recent literature that specifically discusses the applications of ARA in the adult criminal justice system in the United States. Our specific inclusion criteria are discussed at greater length in Section 4 of this paper. The collected papers were reviewed to develop an understanding of the current uses of ARA in the United States judicial system and the effect that AI is having on those tools.

The remainder of this paper is organized as follows: Section 2 discusses other works related to this paper, Section 3 narrowly defines relevant terms, Section 4 lays out the inclusion and exclusion criteria for our structured collection of scholarly works, Section 5 presents the results of our analysis, Section 6 gives room for some discussion of issues not specifically covered by our research questions, and Section 7 documents our conclusions.

2. Related Work

The United States’ criminal justice system has a long history of bias and a growing reliance on ARA, making it crucial to understand any influences that AI and ML may have on accuracy and fairness. There has been a growing amount of work on this topic from researchers in law, computer science, criminal justice, and other related fields. Our study is distinct in part because it comes from an interdisciplinary collaboration between researchers from computer science and criminal justice. Interdisciplinary research is valuable in many ways, especially for topics such as this, which integrates the expertise and needs of two very different fields. When considering the application of computer science algorithmic tools to the policies and procedures in the criminal justice system, the collaboration of experts in both fields allows for a more fully informed discussion.

Much of the existing research in the fields of criminal justice and computer science that explores issues of bias struggles with unclear and inconsistent definitions of the term [4]. Our study is unique in that we have included an explanation of the statistical and social meaning and use of the terms bias and fairness. Because of this, we are able to offer a more comprehensive discussion about both the reality and perception of bias and fairness in ARA tools. By exploring the different definitions of bias and fairness as they relate to ARA, this work aims to clarify some of the misunderstandings surrounding bias in the applications of ARA in criminal justice. Furthermore, by focusing on the use of AI and AI-enabled ARA products, we present the particular contributions that AI makes to the potential biases of risk assessment tools, rather than the human contributions as recorded in historical data used for training AI.

Several AI-focused reviews of the general field of ARA exist in previous literature. Of these, some have focused on specific ARA tools or coding challenges [7], while others focus only on the predictive power of AI tools in risk assessment [8,9], rather than the fairness and bias of those tools. Other work [10] has explored the use of algorithms in criminal risk assessment without a specific focus on AI and ML. This multi-disciplinary review has adopted a scope that is wider than a particular municipality or context and has chosen to analyze AI in ARA specifically with bias in mind, drawing on both mathematical definitions of bias grounded in common practices in ML and an understanding of bias as it is discussed in social and criminal studies. While previous work typically studies ARA from either the perspective of the law or through mathematics and computer science, this interdisciplinary study captures both viewpoints. Literature has been collected from both legal review and technology-focused publications.

3. Defining Algorithmic Risk Assessment

3.1. Deterministic Methods

Risk need assessment (RNA) tools are used in almost every decision point in the criminal justice system with the goal of offering a uniform, evidence-based approach to properly target resources and increase public safety [11,12]. There has been rapid and expansive growth in RNA tools over the last hundred years, with research in the early 2000s indicating that anywhere from 75% to 97% of probation, parole, and community corrections agencies were using a risk assessment tool [13]. Modern RNA and ARA tools grew out of a need to standardize risk assessments that relied on clinical and professional judgments.

The first generation of RNA tools was subjective, based on semi-structured interviews, and held the potential for bias (conscious or unconscious) and stereotyping [12]. The second, third, and fourth generations introduced standardized scoring sheets based on key variables statistically linked to the outcomes of interest. Second-generation RNA tools focused on risk prediction using criminal history. The third and fourth generation tools introduced treatment needs and criminogenic factors, respectively. Actuarial and statistical assessment models became the standard, as they are considered more accurate, consistent, and efficient than human judgment, and less likely to introduce bias [13].

The fifth generation of risk assessment tools, introduced in the late 1990s and early 2000s, brought in ML to streamline decisions and attempt to increase predictive accuracy [12,13]. ARA tools focus on static risk items and variables that can be pulled from automated records. The focus on risk can be seen as a reversion to second-generation tools, but some have argued that this is justified given the ease of administration and reduction in human bias [12]. Others have argued that ARA tools can reinforce human bias and inequality, as the historical data used to train models often reflect biased policing and judicial practices [14].

3.2. Artificial Intelligence

The definition of AI is extremely flexible, seemingly being extended to any product on the market. In a technical setting, an artificially intelligent agent is defined as one that is capable of observing its environment and updating a stored representation of it [15] or one that can generate optimal outcomes in terms of resource utilization using observation and reasoning [16]. This can include large, complicated systems, but might also include simple statistical tools. Artificial intelligence is often a question of application as much as it is a description of the underlying techniques.

Although AI has advanced rapidly in the 21st century, it is important to maintain our understanding of the term as it exists in the field of computer science. While deep learning and large language models (LLMs) have captured a large portion of research attention, the term applies to many statistical and algorithmic techniques. As such, the term artificial intelligence is often applied to algorithmic risk assessments, even those that do not incorporate deep learning or large language models. While AI-driven risk assessments are not responsive or conversational, they do learn the importance of features in collected data that determine the outcome, generally a risk score for an individual in the legal system. This is distinct from actuarial tools, where feature weights are determined by analysts at the time that the tools are created.

3.3. Machine Learning

A large suite of statistical tools for pattern analysis shares the name “machine learning”. Broadly, machine learning is an approach to AI that relies on iterative, statistical tools. Although the term has been in use since the 1940s [16], the increasing power of computers in the 21st century has made ML the preferred methodology for advancing the study and applications of AI [17]. Actuarial tools rely on human insight for their development, including identifying relevant features and deciding the mathematical function used to model the relationship between those features. In contrast, machine learning allows computer programs to develop a model of real-world data through an iterative process of refinement [18].
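The contrast between analyst-set weights and iteratively learned weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features (`prior_arrests`, `age_under_25`), the ten invented records, the hand-picked point values, and the choice of logistic regression fitted by gradient descent. None of it is drawn from the reviewed tools or sources.

```python
import math

# Synthetic, invented records: (prior_arrests, age_under_25, reoffended).
# These values are hypothetical and come from no real dataset.
RECORDS = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 0, 0),
    (2, 1, 1), (3, 0, 1), (3, 1, 1), (4, 0, 1), (5, 1, 1),
]


def actuarial_score(prior_arrests, age_under_25):
    """Actuarial style: an analyst fixes the weights when the tool is built."""
    return 2 * prior_arrests + 1 * age_under_25  # hand-chosen point values


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(records, lr=0.1, epochs=2000):
    """ML style: weights start at zero and are refined from recorded outcomes."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x1, x2, y in records:
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        # Average-gradient step on the logistic loss.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


weights, bias = fit_logistic(RECORDS)


def learned_risk(prior_arrests, age_under_25):
    """Predicted probability of the recorded outcome under the fitted model."""
    return sigmoid(weights[0] * prior_arrests + weights[1] * age_under_25 + bias)


print("analyst-set weights:", (2, 1))
print("learned weights:    ", [round(v, 2) for v in weights], "bias:", round(bias, 2))
```

In the actuarial function the point values never change unless an analyst edits them, whereas the fitted weights depend entirely on the outcomes recorded in the training records. If those recorded outcomes reflect biased past decisions, the learned weights would inherit that pattern.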

ML is most effectively applied to situations where the amount of data collected is very large, and known outcomes have been recorded for a large percentage of the data [17]. When this condition is satisfied, it is possible for ML tools to find relationships between features that might have been invisible to a human analyst. Neural networks are the current focus of machine learning, with deep neural networks having provided tools like large language models and highly nuanced computer vision in recent years.

However, many far simpler tools should still be considered ML. Logistic regressions, linear regressions, and decision trees are all still recognized as machine learning models. This is important because they still suffer from the same tendency to capture historical biases when trained on data that do not align with current social views or with the intended outcomes of the tool. Deep learning models infrequently appear in ARAs [19], but the simple tools share many of the same dangers as larger models [20].

As computing power and support for heterogeneous computing techniques have increased, the size, and often the predictive power, of neural networks has increased substantially. LLMs have stepped past the need for large labeled datasets [21], giving them access to greater quantities of training data to support higher parameter counts and greater reasoning ability. However, deep neural networks and LLMs are not being used in criminal risk assessments [19]. While a belief does exist in the literature that deep learning will be applied to risk assessment in the future [22], the risks, lack of interpretability [23], and need for large training datasets [24] currently outweigh the potential benefits.

3.4. Bias and Fairness

In statistics, bias is defined as a preference for one decision over another given the same data [25]. This can be a banal or even desired outcome when there are discrepancies in the impact of predictions. However, when tests show a preference for one prediction over another along the lines of legally protected features like race, gender, or age, it can raise serious questions about the real-world usability of the test.

Socially, bias is defined as personal, unfounded judgment, or prejudice [26]. The public often thinks about bias in the criminal justice system as an issue of explicit, or conscious, bias, but research has consistently identified implicit bias as the root problem [27]. Racial bias, for example, is often discounted by the American public under the belief that the law is neutral and any disproportionate representation in the criminal justice system by a racial minority is the result of disproportionate involvement in criminal activity. The racial disparity, however, cannot be sufficiently explained by higher crime rates among minority racial groups [28]. A more accurate explanation of such racial disparity is implicit bias in the administration of the criminal justice system and the compounding effect of past discriminatory decisions and policies [27,29].

By relying on historical data, ARA tools have the potential to perpetuate implicit biases within the criminal justice system [14]. Additionally, an offender’s criminal record may reflect biased or unjust treatment of the individual by various social institutions [30]. ARA tools use criminal records as a major factor in the calculation, thereby solidifying the effect of said bias. As a result, ARA tools may be perceived by researchers and the general public as inherently biased or introducing greater bias than other RNA tools, when the reality might simply be the reinforcement of existing implicit biases.

Fairness is a concept that bridges the social and statistical definitions of bias. There are two broad categories of fairness measurements: group and individual [25]. Group fairness asserts that groups defined by features that are not used for predictions (like race or gender) should experience similar outcomes [31], while individual fairness asserts that individuals with similar characteristics should experience similar outcomes [32]. A naive approach to assuring fairness is to omit protected attributes from training data. Unfortunately, it is possible for information about these features to slip in through “proxy variables”, which correlate strongly with the protected features [33].
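The proxy-variable problem and a group fairness check can be sketched in a few lines. This is an illustrative example under stated assumptions: the ten records are invented, the two groups have identical reoffense rates by construction, and the rule `flag_high_risk` is hypothetical and reads only a proxy feature (`area`), never the protected attribute.

```python
from collections import defaultdict

# Synthetic, invented records: (group, lives_in_heavily_policed_area, reoffended).
# "group" stands in for a protected attribute that the rule below never reads.
RECORDS = [
    ("A", 1, 1), ("A", 1, 0), ("A", 1, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 0, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1), ("B", 0, 0),
]


def flag_high_risk(area):
    """Hypothetical rule that uses only the proxy variable."""
    return area == 1


def rates_by_group(records):
    """Return the flag rate and false positive rate for each group."""
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "neg": 0, "false_pos": 0})
    for group, area, reoffended in records:
        c = counts[group]
        flagged = flag_high_risk(area)
        c["n"] += 1
        c["flagged"] += flagged
        if reoffended == 0:
            c["neg"] += 1
            c["false_pos"] += flagged
    return {
        g: {
            "flag_rate": c["flagged"] / c["n"],
            "false_positive_rate": c["false_pos"] / c["neg"],
        }
        for g, c in counts.items()
    }


RATES = rates_by_group(RECORDS)
# Difference in flag rates between groups (a demographic-parity style gap).
PARITY_GAP = RATES["A"]["flag_rate"] - RATES["B"]["flag_rate"]

for group, r in sorted(RATES.items()):
    print(group, r)
print("parity gap:", round(PARITY_GAP, 2))
```

Although the protected attribute is omitted from the rule, group A is flagged three times as often as group B and bears all of the false positives, because the proxy feature is unevenly distributed across the groups in this toy data.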

Mathematically, fairness can be defined in many different ways [34]. Unfortunately, these definitions can, at times, be conflicting [22]. Without a unified metric for measuring fairness, it is impossible to quantitatively measure the fairness of the many tools for ARA that are currently available [4].
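One way to see how such definitions can conflict is through a standard identity relating the entries of a binary confusion matrix. The notation below is an assumption introduced here for illustration and does not appear in the surveyed sources: $p$ is the base rate of the outcome within a group, PPV the positive predictive value, FPR the false positive rate, and FNR the false negative rate.

```latex
% With N individuals in a group and base rate p:
%   TP = (1 - FNR) p N,   FP = FPR (1 - p) N,   PPV = TP / (TP + FP).
% Solving PPV = TP / (TP + FP) for FP gives FP = TP (1 - PPV) / PPV, so
\[
  \mathrm{FPR}
  = \frac{p}{1 - p}
    \cdot \frac{1 - \mathrm{PPV}}{\mathrm{PPV}}
    \cdot \left(1 - \mathrm{FNR}\right).
\]
```

If two groups have different base rates $p$ and a tool has the same PPV in both, the right-hand side can only match across groups when FNR differs between them. Equal PPV, equal FPR, and equal FNR therefore cannot all hold at once unless the base rates are equal or the tool predicts perfectly.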

4. Source Collection

Risk assessment has been an important discussion in the world of criminal justice for many decades, and the recent inclusion of AI-enabled tools has only made this discussion more relevant. To ensure that the collected works thoroughly represented this discussion, this team developed specific inclusion and exclusion criteria. Our intention in defining these criteria was to find as many sources as possible that contributed a variety of viewpoints, while still remaining focused on the scientific and legal frameworks of ARA. All of our queries and source retrievals occurred in January and April of 2026. The protocol for this review is not registered, but care has been taken to develop a source collection protocol that is thorough, complete, and resistant to bias. The search was conducted jointly by both authors, who also reviewed the collected sources; either author could exclude a work that failed to meet the inclusion criteria or that met the exclusion criteria.

4.1. Inclusion Criteria

To ensure that this review is both complete and consistent, the following inclusion criteria were defined for works included in this analysis:

Published in the year 2015 or later
Published in a peer-reviewed, highly regarded journal or conference with a focus on law, criminal justice, or computer science
Specifically focuses on the application of ARA in the United States legal system
Applied transparent investigative techniques
Focuses on the adult, rather than juvenile, legal system

Each of these criteria was carefully considered and selected to generate a complete collection of high-quality academic work across several disciplines. Algorithmic risk assessments and their use in the law are not limited to any one field of study, so casting a wide net was necessary. Sources were primarily gathered from Google Scholar and the ACM Digital Library, using search queries that are discussed at greater length later in this section.

Although the use of ARA in courts of law in the United States dates back to the 1970s [22], it is also true that artificial intelligence as a field of academic study has changed substantially within the last decade. Therefore, this review has chosen to focus exclusively on works published in 2015 or later. This has unfortunately removed some excellent prior work from our analysis, but it has given the best summary of the AI tools that are currently in use.

Each discipline under the umbrella of academia has its own practices for judging the quality of published material. This can make it difficult to define exact criteria for selecting or rejecting work in a multi-disciplinary analysis. Ultimately, the onus has been placed on the particular practitioners in each field by relying on the peer review process for including works published in scientific journals or conferences, and by selecting legal works from law reviews associated with established educational institutions. The scientific standard of peer review has been used in this investigation as our quality assessment for included works. This has eliminated some excellent research published in popular media, but it has shielded the authors from the need to subjectively assess works that have not passed peer review. Overall, this quality assessment has yielded rigorous works of consistently high merit. Publications that received peer review in an academic publication were treated by the authors as being of equal validity for the purposes of collecting our results.

The decision was made to limit the focus of the review specifically to the use of ARA in the United States legal system. Although algorithmic tools are now being applied in courts around the world, each legal system has its own context, laws, history, and potential biases. Conclusions that may be clear using data collected from one country do not necessarily hold in another. As such, the decision was made to focus on the United States as it is a nation that has drawn a large number of academic observations, and it is the nation with which the authors are familiar in a legal context.

The juvenile justice system in the United States is a large and diverse collection of entities that deserves just as much attention as the adult justice system. However, additional legal protections govern data collection involving children, and ethical considerations surround juvenile justice that do not apply to adults. Different philosophies of justice and punishment prevail within the juvenile system, and it operates differently from the adult system. As a result, many risk assessment tools used for juveniles include criteria not part of tools used with adults, and vice versa. Therefore, the decision was made to focus exclusively on the adult criminal justice system. The analysis presented in this work focuses on the uses of AI in risk assessment rather than on the actual applications of justice, so limiting the investigation to the adult justice system gives us the strongest and most consistent conclusions.

4.2. Exclusion Criteria

The specified search criteria yielded an extremely large number of papers, some of which would not have served the intended purpose of this investigation well. In anticipation of this, several criteria were also defined to remove written works that otherwise would have been included.

The intent of this investigation is to collect and summarize the works of other authors. Therefore, works which were themselves summaries, e.g., survey papers or reviews of existing literature, have been excluded. Rather than recording in this work the words of one author given by another, we have gone straight to the source whenever possible.

Because the process of peer review was relied upon heavily to identify high-quality academic literature, self-published works have been excluded from this study. It is an unfortunate fact that many excellent papers have been published directly on the internet and have garnered many citations, but, due to the multi-disciplinary nature of this study, reviewing self-published work from one field while excluding it from another would introduce potential bias.

The focus of this investigation is to collect the informative works of many authors over a ten-year period. Authors whose intention is to persuade the reader may intentionally or unintentionally conceal some of the facts as part of their discussion or might introduce bias into the results of this review. Therefore, papers whose intention is primarily persuasive have been excluded from this investigation.

Finally, AI and algorithmic governance are both fields that are developing extremely quickly. The cadence of academic research means that original works in these subjects are generally introduced first as journal or conference papers. Textbooks tend to lag behind the rapid advances of research and generally convey information that has already been expressed in an earlier publication. Because of this, book chapters have been excluded from this survey.

4.3. Query Terms

The term ARA is not universally accepted. AI and actuarial tools are known by a large number of names. To ensure that a complete list of initial papers was collected, the following list of synonyms for AI or algorithmic tools was collected: predictive algorithms, algorithmic decision-making, AI-driven, AI-enabled, statistical risk assessment, algorithmic scoring, predictive analytics, evidence-based sentencing, pretrial risk assessment instruments, recidivism prediction instruments, algorithmic governance, and models of recidivism. This list was compiled by conducting an initial review of the literature before collecting our list of included papers.

Each database search query was structured in the same way, by selecting one of the listed search terms along with criminal, justice, and risk assessment. An example query would be:

predictive algorithms AND (criminal OR justice) AND “risk assessment”

One query was constructed for each search term, for a total of 12 queries in each database; the template shown above was used for every query to both Google Scholar and the ACM Digital Library. No database-specific search strings were used. Many papers appeared in multiple queries, and duplicate sources were removed before any further analysis was conducted. Once all 24 queries were run, a total of 188 articles were identified. Of these, 20 were removed as duplicates, leaving 168. Sixty-three results were removed based on our exclusion criteria before retrieval, resulting in 105 retrieved articles. Of the 105 retrieved, 49 articles were excluded, giving a final sample size for review of 56 articles. The count of sources at each step can be seen in Figure 2. The complete list of included sources is provided in Appendix A. This list, along with the authors’ analysis and the code used for visualization, is linked in the Data Availability Statement.
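The query template and the screening counts reported above can be reproduced with a short sketch. The function and variable names are hypothetical, and the arithmetic assumes that the 188 figure is counted before the 20 duplicates are removed, which is the reading under which the reported totals reconcile.

```python
# Synonyms for AI or algorithmic tools, as listed in Section 4.3.
SEARCH_TERMS = [
    "predictive algorithms", "algorithmic decision-making", "AI-driven",
    "AI-enabled", "statistical risk assessment", "algorithmic scoring",
    "predictive analytics", "evidence-based sentencing",
    "pretrial risk assessment instruments",
    "recidivism prediction instruments", "algorithmic governance",
    "models of recidivism",
]
DATABASES = ["Google Scholar", "ACM Digital Library"]


def build_query(term):
    """Fill the shared template with one search term (hypothetical helper)."""
    return f'{term} AND (criminal OR justice) AND "risk assessment"'


# One query per term per database: 12 terms x 2 databases.
QUERIES = [(db, build_query(term)) for db in DATABASES for term in SEARCH_TERMS]

# Screening flow, using the counts reported in the text.
identified = 188
after_dedup = identified - 20   # duplicates removed
retrieved = after_dedup - 63    # excluded before retrieval
included = retrieved - 49       # excluded after retrieval

print(len(QUERIES), "queries; first:", QUERIES[0][1])
print("identified", identified, "-> deduplicated", after_dedup,
      "-> retrieved", retrieved, "-> included", included)
```

Running the same template against both databases keeps the search reproducible, at the cost of not using any database-specific syntax.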

5. Results

The 56 included sources were analyzed using tabulation in an Excel spreadsheet, which can be found using the link in the Data Availability Statement. Visualizations were generated using Seaborn 0.13 in Python 3.12.

5.1. Question 1: Is Machine Learning Being Used as a Tool for Risk Assessment in the United States?

ML, especially deep learning, promises to detect complex patterns in large datasets without human intervention. While this promises human-level or better detection of high-risk offenders within our criminal justice system, without the need for expensive and time-consuming data curation and analysis by human beings, it also presents some particular challenges. ML systems rely on historical training data to be able to make accurate predictions about the future, and so might also capture biased decisions made by officers and judges in the past. There is also a question of transparency, as many ML-enabled systems do not deliver an interpretable decision-making process. Rather, they only deliver a final judgment.

Of the 56 included sources, 42 specifically discussed ML as a technique for developing or administering risk assessments. As presented in Figure 3a, this indicates that 75% of surveyed sources chose to discuss ML as it applies to risk assessment. This demonstrates that there is widespread interest in adapting ML tools to address the concerns of this problem space. However, as discussed in Section 3.3, ML tools fall into a very wide spectrum. None of the surveyed sources made the specific claim that deep learning techniques are currently being applied to the problem of risk assessment in the courtroom.

ML was tested as an experimental technique in several of our surveyed sources. These have included applying random forests to trial data from Virginia [35] and gradient-boosted trees to data from New York City [5]. However, as shown in Figure 3b, the tools applied in these experiments tend to be fairly simple algorithms from classical ML, rather than newer techniques like deep neural networks. Some techniques from classical machine learning offer greater interpretability than deep learning, which generally suffers from the “black box” problem [36], meaning that the internal workings of the models are not observable or verifiable. Transparency and interpretability are highly valued in risk assessment tools, which may be why classical ML techniques are preferred in this context.

Although the state of the art is advancing quickly in AI, legal systems in the United States have been slow to adopt newer tools drawn from deep learning, including LLMs [19]. While there is a perception that deep learning will be used as a risk assessment tool in courtrooms in the future [22], currently, many ARAs are derived from simple ML techniques [6]. Even though LLMs and other techniques from natural language processing are not currently being used for risk assessments, they do have applications in the courtroom for transcribing and summarizing conversations [3] and for obfuscating race-related information in prosecutorial documents [37].

5.2. Question 2: What Specific Technologies Are Being Applied for Risk Assessment?

There is very little consensus amongst states as to which risk assessments should be employed in courtroom settings. Frequently, states choose to develop their own actuarial tools, using data that are not publicly available. However, some tools have emerged in the literature as attracting more substantial attention and more frequently serving as the baseline for tools developed by state commissions.

As shown in Figure 4, COMPAS [38] is the most widely discussed ARA tool, appearing in 36 of the 56 reviewed sources. The next most widely discussed tool is the Public Safety Assessment (PSA) [39], which was developed by the federal government in 2018 and was discussed by 19 out of 56 sources. The Level of Service Inventory-Revised (LSI-R) [40] was also developed by the federal government in 1995, as an update to the earlier version of the LSI, and 11 of our sources discussed either the LSI-R or the LSI. The next two most popular tools, the Virginia Pretrial Risk Assessment Instrument (VPRAI) [41] and the Colorado Pretrial Risk Assessment Tool (CPAT) [42], were both developed at the state level, and both were discussed in 8 of our 56 sources. The sixth most popular tool for risk assessment in the courtroom is the Post-Conviction Risk Assessment (PCRA) [43], another federally developed tool, discussed in 4 of our sources.

A wide range of other risk assessments exist, some of which were discussed in Section 3.1. However, each of those tools was discussed in fewer than four of our total sources. COMPAS is generally presented as a caution against unintentional bias in risk assessment tools [44], while the other tools are presented in both a positive and a negative light.

AI and ML have been influential on the development of risk assessments, even if the newest tools of AI have yet to make an appearance. The burden of training models on datasets of limited size [6] and the need to use models that are relatively interpretable [17] mean that most ARAs have been built from relatively simple models, with logistic regressions being the most commonly referenced in the literature. Notably absent from risk assessments are deep neural networks and LLMs, both of which have become extremely popular in AI recently. Deep learning has shown no additional predictive power in this particular problem space, requires more training data [24], and exacerbates existing concerns regarding interpretability and transparency [33]. The additional data needs of deep learning approaches are compounded by the need for risk assessments to be calibrated for local populations [45].

5.3. Question 3: Does Algorithmic Risk Assessment Contribute to Bias or Issues of Fairness?

Of our surveyed sources, 44 individual sources indicated that AI and actuarial tools do or can contribute to biased performance in risk assessment, while the remaining 12 sources did not, as shown in Figure 5. The specific biases identified by the authors of our sources are race [14,22,33,46], gender [6,14,47], age [48,49], and economic status [17,50,51,52]. This finding does not indicate that biased and unfair performance is inevitable in the use of algorithmic tools. Rather, many sources described ARA tools as failing to correct the human biases in the justice system. Varying definitions of bias and fairness have also contributed to disagreement in the literature on this subject [53].

Broadly speaking, there were two significant sources of human bias identified. The first of these was bias present in training data collected in historical court records [3,36,54]. Even ARA tools that are not specifically identified as ML are frequently developed by statisticians based on an analysis of historical records, and so can still be impacted if or when those data have captured the biased performance of human risk assessors. The second way that human bias might affect ARA tools is in the interpretation of generated risk scores [55]. Judges with access to risk assessment tools are not always beholden to them and are usually free to utilize an individual’s risk score in whatever way they feel is appropriate at the time. In many municipalities, there is no guarantee that individuals with the same risk score will have the same outcome [35].

The second concern shows that ARA tools can often give the color of objectivity to an assessment, without actually mitigating the bias introduced by human assessors into the process [56]. This type of false negative is concerning because it delays legitimate reforms of risk assessments in criminal justice. If risk assessments are believed to be a purely mechanical or computational process, and therefore immune to expressing bias along lines of race, gender, age, and wealth, then those biased assessments are more likely to be accepted at face value, rather than investigated for any potential expressions of unfairness.

Many of the collected sources indicate that human judgment is the most troubling source of bias in ARA. Unfair sentencing practices in historical data can cause skewed performance in risk assessments [17], and many tools leave the interpretation of risk scores up to judges and prosecutors [57]. For AI-derived tools to continue gaining acceptance in American courtrooms, it will have to be shown that these tools can correct rather than capture the potential biases of human judgment.

5.4. Question 4: What, If Any, Is the Evidence and Apparent Source of Bias in Algorithmic Risk Assessment?

Overwhelmingly, the COMPAS dataset collected in Broward County [58] is used as evidence amongst our sources to suggest that ARA tools do or can exhibit biased and unfair performance when used in the real world. From our included papers, 36 (64%) specifically use the COMPAS study as evidence for or against the existence of bias in ARA tools. Of the 44 sources that specifically stated that ARA can or does contribute to bias, 32 (73%) of them cited the COMPAS study as a source, while only 12 (27%) of them made the claim that ARA shows bias without using COMPAS as an example. The COMPAS dataset is now more than a decade old, and it is not clear that its performance is representative of the ARA tools currently in use by American courts. This limitation is discussed further in Section 6.1.
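The proportions quoted in this section follow directly from the reported counts. The short sketch below recomputes them; the helper name `share` is an illustrative assumption, and rounding to the nearest whole percent is assumed to match the convention used in the text.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


TOTAL_SOURCES = 56   # included sources
BIAS_SOURCES = 44    # sources stating that ARA can or does contribute to bias

print(share(36, TOTAL_SOURCES))            # sources using the COMPAS study as evidence
print(share(32, BIAS_SOURCES))             # bias-claiming sources that cite COMPAS
print(share(12, BIAS_SOURCES))             # bias-claiming sources that do not
print(share(BIAS_SOURCES, TOTAL_SOURCES))  # overall share reporting bias
```

Under these assumptions the four values are 64, 73, 27, and 79, which agree with the percentages reported in the text and in the abstract.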

Another case presented as evidence in exploring the potential biases of risk assessment tools was Kentucky’s adoption of pretrial risk-assessment tools as a means of bail reform in 2011 [4]. This move resulted in a small decrease in pretrial detention rates in Kentucky, along with a commensurate small increase in failure-to-appear and pretrial crime rates [4]. Furthermore, the inconsistent interpretation and application of the pretrial risk-assessment tools resulted in a greater benefit being realized by white defendants than by black defendants [4]. This disappointing outcome led Kentucky to move towards the PSA risk-assessment tool in 2013, although this move did not appreciably improve outcomes [4]. It should be noted that while the period of data collection for this study was outside our focus area, several papers discussing these events were published that fit our inclusion criteria.

Like Kentucky, Virginia is often held up as an example of biased performance by ARA tools. Observations of sentencing rates in Virginia have shown that their own use of risk assessments as a sentencing tool in the 2000s resulted in longer sentences for black and young defendants [35], a fact which was documented by the Virginia Criminal Sentencing Commission. However, the fact that black defendants were sentenced more harshly during this time, even with the same risk score [55], shows that the tool itself may not have been at fault in this case. Rather, as is often the case, the tools appear to have been used to apply the color of objectivity to a biased assessment made by a human being.

Other examples of underperformance in ARA tools have resulted in the tools themselves being replaced with updated versions. Two examples of tools that were updated after several years of use are the LSI, which later became the LSI-R, and the Static Risk Assessment (SRA), which was later replaced with the SRA2. While the LSI was updated largely to improve its predictive performance [47], the SRA had to be updated because, as a purely data-driven tool, it had captured some confusing and illogical patterns in its training data. In particular, because individuals who are convicted of murder are less frequently re-arrested, individuals convicted of murder were given a lower risk score by the SRA [47]. While this observation might be true in purely numerical terms, the tool failed to make the connection that convicted murderers often spend the rest of their lives in prison and therefore do not appear in later arrest records.

In a field that is progressing as rapidly as AI, it can be difficult to accept results based on decade-old data collection. However, a full examination of the concept-to-courtroom pipeline shows that this decade-old data is often the best available data. The risk assessments themselves undergo a lengthy period of development and validation [47], and then must operate in the wild for several years before large datasets can be collected. Court records must be collected by independent researchers, which itself is a slow process. All things considered, risk assessment tools are usually several years old before their performance can be carefully examined by outside parties.

5.5. Observed Sources of Bias

There is an extremely large number of factors that might cause biased performance by an actuarial or AI tool. These range from factors intrinsic to the tools themselves, to problems or difficulties with collected real-world data, to the place that these tools occupy in the world. Figure 6 summarizes this investigation’s observations on the most commonly documented sources of bias in ARA tools.

By far, the most frequently recorded concern is bias in the historical court records used as training data for tools founded in statistical techniques [34]. This concern is why it is so important for the public to be aware of the usage of ML in risk assessment. Tools that learn patterns in their training data without human guidance have a particular need for human review before they can be used with confidence in real courts [17]. ML algorithms train to minimize their loss, rather than maximize their benefit, and it is clear that human knowledge grounded in ethics needs to be part of the process of judgment at some point.

Many of the perceived sources of bias are more strongly connected to the human presence within the justice system. While many sources express the view that AI, as an inanimate program, can only acquire bias from human-sourced data [3], the concern remains that ARA limits advances in fairness in our justice system. Sufficiently large datasets for training ML span a long data collection period and risk capturing unfair or unconstitutional practices [17]. Continued improvements in AI-enabled risk assessments require the regular collection of fresh data.

The second most commonly recorded concern is the use of variables or features that correlate so strongly with race, gender, age, or other protected attributes as to be essentially replacements for those values in the predictive features given to risk-assessment algorithms [59]. It is commonly understood that some attributes describing defendants should be omitted from consideration in judging the risk presented by a defendant, either because they are constitutionally protected in the United States or because they are immaterial to the assessment. However, the inclusion of certain other attributes, referred to as “Proxy Variables” in Figure 6, can, in effect, introduce these protected attributes into the consideration. Examples of proxies identified by our included sources would be a count of stops by police [60], income [61], employment history [17], education [62], ZIP code [61], or an individual’s first name [62]. Proxy variables can be difficult to identify without independent exploratory factor analysis, again highlighting the importance of human judgment and review in the process of developing ARA tools.
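One simple way to see how a proxy variable can stand in for a protected attribute is to measure how well the protected attribute can be guessed from the candidate feature alone. The sketch below is a hedged illustration and not the exploratory factor analysis mentioned above: the records are entirely synthetic, and the field names `zip_code` and `group` and the function `proxy_strength` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def proxy_strength(records, feature, protected):
    """Return (accuracy, baseline) for guessing `protected` from `feature` alone.

    accuracy: share of records guessed correctly when the most common protected
              value within each feature value is always predicted.
    baseline: share guessed correctly when the overall most common protected
              value is always predicted, ignoring the feature.
    """
    by_value = defaultdict(Counter)
    overall = Counter()
    for row in records:
        by_value[row[feature]][row[protected]] += 1
        overall[row[protected]] += 1
    n = sum(overall.values())
    baseline = max(overall.values()) / n
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / n, baseline


# Entirely synthetic, hypothetical records: two ZIP codes, two groups.
records = (
    [{"zip_code": "A", "group": "x"}] * 45
    + [{"zip_code": "A", "group": "y"}] * 5
    + [{"zip_code": "B", "group": "x"}] * 5
    + [{"zip_code": "B", "group": "y"}] * 45
)

accuracy, baseline = proxy_strength(records, "zip_code", "group")
print(accuracy, baseline)
```

In this synthetic example, the ZIP code alone recovers the group for 90% of records against a 50% baseline. A model given the ZIP code has therefore effectively been given the protected attribute, even though that attribute was never included as a feature.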

The remaining sources of bias each appear in our collected sources far less frequently than training data and proxy variables. Other concerns include intentional or unintentional actions by the tools’ developers that introduce bias into their products; the hiding or rationalizing of human bias in the justice system; inconsistent interpretation of scores or of answers to questions on risk surveys; and the need for tools to be regionally calibrated. Of these concerns, very few point to issues that are inherent to ARA tools themselves. One source noted that with correct and unbiased training data, even COMPAS performs without bias [36].

Some of the concerns expressed by our sources are intrinsic to AI techniques. Many ML models are “black boxes” in that, while their outputs can easily be read and understood by human beings, the reasoning behind those outputs is opaque [17]. This is not true of all ML models, but it does indicate that specifically interpretable models should be applied if ML is going to be used in ARA.

It is important to note that interpretability and black-box models are frequently identified as concerns by authors discussing ARA, even when the ML models being used in risk assessments, like decision trees or logistic regressions, are inherently more interpretable than neural networks, which are not being used in courtroom risk assessments. While frameworks exist to improve the interpretability of complex ML [63], these frameworks have done little to quell the concerns of authors in the legal space. The high stakes of criminal justice make courts slow to adopt new technologies [52], and the state of the art in AI has little impact on criminal risk assessment. Additional concerns exist around risk assessments developed by private industry, like COMPAS, where the outcomes of the risk assessment may lack transparency, not because of any inherent restriction of ML, but rather because of trade secrets [64].

5.6. The Potential Benefits of AI

The use of AI-enabled risk assessment tools promises many potential benefits to criminal justice system actors (i.e., defendants, judges, prosecutors, etc.), and to society as a whole. This observation is shown in Figure 7. By far, the most commonly referenced benefit was efficiency, with 38 (68%) of our sources claiming that the use of AI-enabled ARA tools could improve resource utilization or lower costs within the criminal justice system. Interestingly, it has been claimed that this improvement in efficiency is possible while also reducing the overall crime rate. One study showed that the application of simple AI tools to the task of predicting recidivism can reduce crime by 24.8% while also reducing the incarcerated population by 42.0% [5]. This study is widely cited to show AI’s synergistic potential for identifying high-risk offenders for incarceration, allowing offenders with lower risk scores to be given alternative sanctions.

Most interestingly for this paper, 34 (61%) of our sources listed a reduction in bias as a potential benefit of AI-enabled ARA tools. It is important to remember that this benefit is a possibility that is largely dependent on the implementation of the ARA tool being used. However, this demonstrates that there is a common and widespread claim that ARA can be used to make the process of criminal justice less biased. Twenty-six (46%) of our sources also listed better predictive power compared to human judges as one of the benefits of ARA. Although many sources expressed concern about AI being used to remove human judgment from the justice system, a great number of sources have also expressed that AI tools, correctly implemented and trained, could make better predictions with less bias than human judges.

The claimed benefits of AI and ARA extend beyond the walls of the courtroom. Seventeen (30%) of our sources identified ARA as a tool for increasing the objectivity of the justice system, while 14 (25%) specifically identified ARA as a tool for increasing the transparency of the justice system. ARA is often identified as a tool that supports reforms within the justice system. Twelve (21%) sources identified social or criminal justice reform as a potential benefit of ARA, and 7 (13%) specified that ARA can be used as a replacement for cash bail, allowing defendants identified as low-risk to await their trials outside of jail without the need for them to have access to large cash reserves.

6. Discussion

6.1. COMPAS—A Cautionary Tale

Out of 56 included sources in this study, 36 (64%) specifically discuss COMPAS. It is the most discussed risk assessment and the most frequently used as evidence of bias in ARA. Of the 44 sources that specifically make the claim that ARA tools are contributing to bias, 32 (73%) of them used COMPAS as evidence in support of their claim. COMPAS was the subject of a 2016 ProPublica article [65], which gave it a much higher public profile than other ARA tools. More importantly, a large dataset of criminal defendants from 2013 to 2014 in Broward County, which was using COMPAS, was compiled using public record requests [58]. This made statistical analysis of COMPAS’s performance possible in a way that was not always possible for other tools. These analyses uncovered that COMPAS recommended 24% longer sentences for young defendants [35] and that COMPAS was more than twice as likely to incorrectly identify a black defendant as a future criminal as compared to a white defendant [58], although Northpointe, the company that developed COMPAS, has disputed claims of bias in COMPAS [58].
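A disparity of the kind described above, in which one group is more often incorrectly flagged as a future offender, is commonly expressed as a difference in false positive rates. The sketch below shows that calculation on hypothetical counts. The numbers are invented for illustration and are not the Broward County figures, and the function name is an assumption.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of people who did not reoffend but were labeled high risk."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts of non-reoffending defendants in two groups.
# These are NOT taken from the COMPAS dataset.
fpr_group_a = false_positive_rate(false_positives=40, true_negatives=60)
fpr_group_b = false_positive_rate(false_positives=20, true_negatives=80)

# A ratio above 1 means group A is wrongly flagged more often than group B.
disparity_ratio = fpr_group_a / fpr_group_b
print(fpr_group_a, fpr_group_b, disparity_ratio)
```

In this invented example, non-reoffending members of group A are flagged as high risk twice as often as those in group B. Other fairness definitions, such as calibration within groups, can lead to different conclusions on the same data, which is one reason the varying definitions of bias noted in Section 5.3 produce disagreement.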

While the abundance of data related to COMPAS makes it one of the easiest tools to analyze, it does not necessarily make it the most representative ARA tool. The dataset is more than 12 years old as of the writing of this paper, and many reforms have been made to risk assessment, some inspired by the shortcomings identified in COMPAS. While the original analysis of COMPAS was very important at the time that it was first conducted, it is no longer clear that it accurately reflects the landscape of AI or ARA as they exist now. If anything, COMPAS now stands as an example of the harms and biases that can arise from ARA and other AI tools when trained on biased data, rather than as an example of the harms and biases that do currently exist within the justice system. More research is needed that focuses on and evaluates tools other than COMPAS to better understand issues of bias in current ARA tools.

6.2. Reactions to AI in the Real World

While AI-enabled ARA tools show potential for improving and supporting the criminal justice system, one common limiting factor is the unwillingness of human beings to engage with those tools [55]. While AI-derived tools might be better predictors of recidivism than judges [6] and can be less biased than human actors [6], these benefits are meaningless if the recommendations of AI systems are ignored. Studies of risk assessment tools in use in the real world have found that judges frequently ignore the recommendations of those systems [55,66] and that judges sometimes interpret identical risk scores differently based on the race of defendants [57].

The public reaction to ARA is not always positive. Civil rights groups have expressed concerns regarding their usage [67]. Even where these tools might contribute to more fairness in the judicial process, the public is often skeptical and slow to accept tools that come with documented risks [24]. Likewise, judges express reluctance to adopt new risk assessments in their courtrooms [55], expressing concerns that risk assessments limit judicial options [55] and undermine elected officials. Despite the slowness of the public and the judiciary to support ARA, legislative groups often view them favorably as a means to correct historical unfairness [4]. Because of the disconnect between political and judicial priorities, risk assessment tools often take many years to develop [24], and their risk scores are often ignored by judges [55].

6.3. How These Tools Are Used Outside the Courtroom

AI is an extremely influential field of study both inside and outside of the courtroom. Although deep learning algorithms are not being widely used for ARA [19], they have begun to appear in the newer field of predictive policing [20]. Several products currently on the market purport to indicate neighborhoods that are more likely to experience crime [58]. Some of the products currently offered for use by police departments include Gotham, HunchLab, and PredPol. Of these, Gotham is made by Palantir [33], which is headquartered in Miami, FL.

Efficiency and effective use of limited resources are the most widely cited benefits of AI’s use in the courtroom, with 38 sources (68%) indicating that AI can contribute to better resource allocation. However, there is also substantial concern for introducing or perpetuating bias with these products [33]. Neighborhoods that have historically experienced over-policing generally have higher arrest rates without higher crime rates [17]. These inflated arrest rates can be interpreted by ML models as areas needing additional policing, which further inflates the arrest rates in those neighborhoods, a phenomenon called the “ratchet effect” [68]. Although ML has been shown to predict crime and recidivism with greater reliability than humans [49], it is also clear that these products require additional testing and public scrutiny before they can reliably be used in the real world.
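The feedback loop behind the ratchet effect can be illustrated with a toy simulation. The sketch below is a deliberately simplified model under stated assumptions and is not drawn from the cited sources: two neighborhoods have the same true crime rate, recorded arrests are proportional to patrol presence, and after each round a fixed fraction of patrols is moved toward whichever neighborhood recorded more arrests. All names and parameter values are hypothetical.

```python
def simulate_ratchet(initial_share, crime_rate, shift, rounds):
    """Track neighborhood A's share of patrols when patrols follow recorded arrests.

    initial_share: starting fraction of patrols assigned to neighborhood A
    crime_rate:    true crime rate, assumed identical in both neighborhoods
    shift:         fraction of the other area's patrols moved each round
    rounds:        number of reallocation rounds to simulate
    """
    share_a = initial_share
    history = [share_a]
    for _ in range(rounds):
        # Recorded arrests depend on patrol presence, not on any difference in crime.
        arrests_a = share_a * crime_rate
        arrests_b = (1 - share_a) * crime_rate
        if arrests_a > arrests_b:
            share_a += shift * (1 - share_a)   # move patrols from B to A
        elif arrests_b > arrests_a:
            share_a -= shift * share_a         # move patrols from A to B
        history.append(share_a)
    return history


# Neighborhood A starts slightly over-policed (55% of patrols).
history = simulate_ratchet(initial_share=0.55, crime_rate=0.1, shift=0.1, rounds=5)
print([round(s, 3) for s in history])
```

In this toy model, a small initial imbalance in patrols grows every round even though the two neighborhoods have identical underlying crime rates, because the arrest counts that drive reallocation are themselves a product of the earlier allocation.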

Of our surveyed sources, five (9%) expressed concerns that AI algorithms do not have the same transparency and interpretability as classical statistical tools. For many ML algorithms and neural networks, it is not clear what weight they have given to individual features [4]. This problem raises serious concerns about the legality and usability of these programs [4], in addition to the concerns regarding bias. Explainable AI is a field of study that produces AI models whose reasoning is transparent to outside scrutiny [69]. Because the predictive policing software currently on the market is closed-source and proprietary [70], it is impossible to know whether interpretable models are being used, or even what models are being used.

6.4. Limitations of This Study

Several decisions were made to limit the scope of this paper. These decisions were intended to produce a cohesive body of work for review, but have also created some limitations on the usability of this structured interdisciplinary review. First, we have focused entirely on the justice system within the United States. Second, we have chosen to focus exclusively on the adult justice system, omitting several papers whose focus was on the juvenile justice system. Finally, we have limited the time over which papers were collected to one where AI has been influential. While ARA tools have existed at least since the 1970s [22], many of the early tools were simple actuarial tools rather than AI or ML. This review summarizes the conclusions presented by the collected sources, and while some observations may apply to broader populations, we make no explicit claim of that.

Although care was taken to ensure that this review is complete by using broad queries to search well-supplied sources of literature, it is possible that some appropriate publications were not collected during our query process. Queries and paper assessments were performed and reviewed by both authors independently to help ensure the completeness of this review. No formal inter-rater agreement statistic was calculated, though the authors discussed and reviewed each other’s assessments informally. A sensitivity analysis allowing for an additional ten percent of missing, includable papers suggests that the actual percentage of papers expressing concerns over bias in ARA may be as high as 80.5% or as low as 71.4%, both of which are consistent with the overall conclusion of this study that a majority of polled sources express concern that algorithmic techniques will increase the overall bias in criminal justice. The other results presented in Section 5 are similarly robust to the presence of un-collected sources.
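The bounds quoted for the sensitivity analysis can be reproduced under one plausible reading of the procedure, which the text does not spell out: a further ten percent of includable papers is assumed to exist, and those papers are assumed either all to express concern about bias (upper bound) or all not to (lower bound). The count of 44 concerned sources is taken from Section 5.4; the function name and this interpretation are assumptions.

```python
def sensitivity_bounds(concerned, total, missing_fraction):
    """Return (low, high) shares of concerned sources if some papers were missed.

    low:  every missing paper is assumed NOT to express concern
    high: every missing paper is assumed to express concern
    """
    missing = total * missing_fraction
    low = concerned / (total + missing)
    high = (concerned + missing) / (total + missing)
    return low, high


low, high = sensitivity_bounds(concerned=44, total=56, missing_fraction=0.10)
print(round(100 * low, 1), round(100 * high, 1))
```

Under these assumptions the bounds come out at 71.4% and 80.5%, matching the figures reported above. The missing-paper count is fractional here (5.6 papers), which is acceptable for a bound but would need rounding in an analysis of actual additional sources.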

7. Conclusions and Future Work

This analysis shows that there is a predominant concern of bias in ARA, with 79% of sources expressing the concern that risk assessment tools do or can contribute to bias within the criminal justice system. At the same time, there is a common belief that AI can help alleviate this concern, with 61% of sources recognizing that reducing bias is one potential benefit of AI. While there is substantial interest in ML, with 75% of sources discussing its use, it appears that it is relatively simple tools like random forests and logistic regressions that are primarily being used in criminal risk assessment tools, rather than deep learning approaches.

Over the past few decades, there has been a push for sentencing guidelines and other tools that reduce judicial discretion when making decisions about pre-trial detention and custodial sentences [71]. This concern about explicit and implicit bias within human decision-making has led to an increased use of data-driven models with the assumption that the technology cannot harbor such pre-conceived biases. The problem noted by many of our sources is that training ARA tools on existing data perpetuates existing disparities. The financial inaccessibility of bail, for example, increases the likelihood of pre-trial detention, which increases the likelihood of conviction and future crime [66]. Criminal justice researchers have also established that racial bias permeates arrest records, as Black Americans and other minority groups are disproportionately stopped, searched, and arrested by police for the same behaviors exhibited by White Americans [71,72,73].

Criminal history is a reflection not just of personal behavior, but of a system that disproportionately affects people based on race, ethnicity, class, gender, age, and other protected characteristics. Risk assessment tools, especially ARA, rely heavily on criminal records and other historic data that make people of color appear riskier than their White counterparts [71]. This results in ARA predictions that are correct when viewed through the lens of the training data, but still reflect bias inherent in said data.

ARA is a rapidly changing technique, both in law and in computer science. There is a need for the academic literature in both fields to continue to monitor and evaluate new tools as they are developed and adopted. By 2020, more than 20 states had already adopted ARA as part of their courtroom procedures [64]. While some states are using federal or commercial tools, many have developed their own tools, suggesting a widespread need for review of the bias and effectiveness of these disparate tools. Similarly, predictive policing tools need additional scrutiny, both through the lenses of theory and legal philosophy as well as independent testing of their specific implementations. Algorithmic governance is an issue that is poised to impact the lives of all people.

Author Contributions

Conceptualization, G.A. and K.C.; methodology, G.A. and K.C.; software, G.A.; validation, G.A. and K.C.; formal analysis, G.A. and K.C.; investigation, G.A. and K.C.; resources, G.A. and K.C.; data curation, G.A. and K.C.; writing—original draft preparation, G.A. and K.C.; writing—review and editing, G.A. and K.C.; visualization, G.A.; supervision, G.A. and K.C.; project administration, G.A. and K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

A complete list of the considered sources used in this work, along with the code used for their analysis, can be found at https://github.com/gentry-atkinson/AI-in-Risk-Assessment (accessed on 25 May 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ARA	Algorithmic Risk Assessment
COMPAS	Correctional Offender Management Profiling for Alternative Sanctions
CPAT	Colorado Pretrial Risk Assessment Tool
LSI-R	Level of Service Inventory-Revised
ML	Machine Learning
PCRA	Post-Conviction Risk Assessment
PSA	Public Safety Assessment
RNA	Risk Need Assessment
SRA	Static Risk Assessment
VPRAI	Virginia Pretrial Risk Assessment Instrument

Appendix A. Included Publications

Table A1. The complete list of included papers, containing 56 sources. Inclusion criteria are given in Section 4.

Title	Author(s)	Year	Journal
AI in Corrections	Rowland, Matthew G., Amit Shah, and Ashit Chandra	2023	Federal Probation
AI In Criminal Justice: Implications For Justice, Fairness, and Potential Biases	Kaur, Ramandeep	2023	Nyaayshastra Law Review
Algorithmic Decision Making and the Cost of Fairness	Corbett-Davies, Sam, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq	2017	Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining
Algorithmic governance from the bottom up	Bloch-Wehba, Hannah	2022	BYU Law Review
Algorithmic risk assessment in the hands of humans	Stevenson, Megan T., and Jennifer L. Doleac	2024	American Economic Journal: Economic Policy
Algorithmic risk governance: Big data analytics, race and information activism in criminal justice debates	Hannah-Moffat, Kelly	2019	Theoretical Criminology
Algorithms and the individual in criminal law	Jorgensen, Renée	2022	Canadian Journal of Philosophy
Algorithms in practice: Comparing web journalism and criminal justice	Christin, Angèle	2017	Big Data & Society
Artificial intelligence, due process and criminal sentencing	Villasenor, John, and Virginia Foggo	2020	Michigan State Law Review
Assessing risk assessment in action	Stevenson, Megan	2018	Minnesota Law Review
Bail or jail? Judicial versus algorithmic decision-making in the pretrial system	Elyounes, Doaa Abu	2020	The Columbia Science and Technology Law Review
Bias in, bias out	Mayson, Sandra G.	2018	The Yale Law Journal
Blind Justice: Algorithmically Masking Race in Charging Decisions	Chohlas-Wood, Alex, Joe Nudell, Keniel Yao, Zhiyuan Lin, Julian Nyarko, and Sharad Goel	2021	Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
Citizen Decisions and Machine Predictions: Coethnicity, Artificial Intelligence and Co-Production	Anastasopoulos, L. Jason, and Micah Gell-Redman	2024	New England Area Political Psychology Meeting
Constitutional dimensions of predictive algorithms in criminal justice	Brenner, Michael, Jeannie Suk Gersen, Michael Haley, Matthew Lin, Amil Merchant, Richard Jagdishwar Millett, Suproteem K. Sarkar, and Drew Wegner	2020	Harvard Civil Rights-Civil Liberties Law Review
Designed to fit: The development and validation of the STRONG-R recidivism risk assessment	Hamilton, Zachary, Alex Kigerl, Michael Campagna, Robert Barnoski, Stephen Lee, Jacqueline Van Wormer, and Lauren Block	2016	Criminal Justice and Behavior
Disparate interactions: An algorithm-in-the-loop analysis of fairness in risk assessments	Green, Ben, and Yiling Chen	2019	Proceedings of the Conference on Fairness, Accountability, and Transparency
Evaluating algorithmic risk assessment	Hamilton, Melissa	2021	New Criminal Law Review
Evaluating the evidence in algorithmic evidence-based decision-making: the case of US pretrial risk assessment tools	König, Pascal D., and Tobias D. Krafft	2021	Current Issues in Criminal Justice
Evidence-based sentencing and scientific evidence	Martínez-Garay, Lucía	2023	Frontiers in Psychology
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments	Chouldechova, Alexandra	2017	Big Data
Fair risk assessments: A precarious approach for criminal justice reform	Green, Ben	2018	5th Workshop on Fairness, Accountability, and Transparency in Machine Learning
Fairness, accountability and transparency: notes on algorithmic decision-making in criminal justice	Chiao, Vincent	2019	International Journal of Law in Context
Formalizing Fairness: Statistical Measures of Parity for Recidivism Prediction Instruments	Song, Joshua	2023	Michigan Technology Law Review
Fragile algorithms and fallible decision-makers: lessons from the justice system	Ludwig, Jens, and Sendhil Mullainathan	2021	Journal of Economic Perspectives
Ghosting the machine: Judicial resistance to a recidivism risk assessment instrument	Pruss, Dasha	2023	2023 ACM Conference on Fairness, Accountability, and Transparency
Human decisions and machine predictions	Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan	2018	The Quarterly Journal of Economics
Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction	Grgic-Hlaca, Nina, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller	2018	Proceedings of the 2018 World Wide Web Conference
In pursuit of interpretable, fair and accurate machine learning for criminal recidivism prediction	Wang, Caroline, Bin Han, Bhrij Patel, and Cynthia Rudin	2023	Journal of Quantitative Criminology
Inherent trade-offs in the fair determination of risk scores	Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan	2017	Leibniz International Proceedings in Informatics
Interpretable classification models for recidivism prediction	Zeng, Jiaming, Berk Ustun, and Cynthia Rudin	2017	Journal of the Royal Statistical Society Series A: Statistics in Society
Judging risk	Garrett, Brandon L., and John Monahan	2020	California Law Review
Life, liberty, and trade secrets: Intellectual property in the criminal justice system	Wexler, Rebecca	2018	Stanford Law Review
Machine learning forecasts of risk to inform sentencing decisions	Berk, Richard, and Jordan Hyatt	2015	Federal Sentencing Reporter
On chances and risks of security related algorithmic decision making systems	Zweig, Katharina A., Georg Wenzelburger, and Tobias D. Krafft	2018	European Journal for Security Research
Paths to digital justice: Judicial robots, algorithmic decision-making, and due process	Fortes, Pedro Rubim Borges	2020	Asian Journal of Law and Society
Predictive Analytics and Risk Assessment: A Logical Response to Intimate Partner Homicide	Ross, Lee E.	2017	International Journal of Criminal and Forensic Science
Pretrial risk assessment instruments in practice: The role of judicial discretion in pretrial reform	Copp, Jennifer E., William Casey, Thomas G. Blomberg, and George Pesta	2022	Criminology & Public Policy
Pretrial risk assessment instruments in the US criminal justice system—what lessons can be learned for the European Union	Novokmet, Ante, Zvonimir Tomičić, and Zoran Vinković	2022	International Journal of Law and Information Technology
Reprogramming recidivism: The First Step Act and algorithmic prediction of risk	Cyphert, Amy B	2020	Seton Hall Law Review
Risk assessment in criminal sentencing	Monahan, John, and Jennifer L. Skeem	2016	Annual Review of Clinical Psychology
Risk scores, label bias, and everything but the kitchen sink	Zanger-Tishler, Michael, Julian Nyarko, and Sharad Goel	2024	Science Advances
Smart Justice? Making sense of the rise of algorithm-based pre-trial risk assessment in criminal justice through ‘legal models’	Wenzelburger, Georg, Karen Yeung, and Kathrin Hartmann	2025	Digital Society
Technologies of crime prediction: The reception of algorithms in policing and criminal courts	Brayne, Sarah, and Angèle Christin	2021	Social Problems
The accuracy, fairness, and limits of predicting recidivism	Dressel, Julia, and Hany Farid	2018	Science Advances
The effect of risk assessment scores on judicial behavior and defendant outcomes	Sloan, CarlyWill, George Naufal, and Heather Caspers	2025	Journal of Human Resources
The impact of algorithmic risk assessments on human predictions and its analysis via crowdsourcing studies	Fogliato, Riccardo, Alexandra Chouldechova, and Zachary Lipton	2021	Proceedings of the ACM on Human-Computer Interaction
The institutional life of algorithmic risk assessment	Solow-Niederman, Alicia, YooJung Choi, and Guy Van den Broeck	2019	Berkeley Technology Law Journal
The intersection of race and algorithmic tools in the criminal legal system	Southerland, Vincent M	2020	Maryland Law Review
The intuitive-override model: Nudging judges toward pretrial risk assessment instruments	DeMichele, Matthew, Megan Comfort, Kelle Barrick, and Peter Baumgartner	2021	Federal Probation
The need for transparency in the age of predictive sentencing algorithms	Carlson, Alyssa M	2017	Iowa Law Review
The use of artificial intelligence in gauging the risk of recidivism	Hillman, Noel L	2019	The Judges Journal
This thing called fairness: Disciplinary confusion realizing a value in technology	Mulligan, Deirdre K., Joshua A. Kroll, Nitin Kohli, and Richmond Y. Wong	2019	Proceedings of the ACM on Human-Computer Interaction
Time for a change: Examining the relationships between recidivism-free time, recidivism risk, and risk assessment	Frisch-Scott, Nicole E., and Kiminori Nakamura	2022	Justice Quarterly
Uncertainty, risk and the use of algorithms in policy decisions: A case study on criminal justice in the USA	Hartmann, Kathrin, and Georg Wenzelburger	2021	Policy Sciences
Using Daubert to evaluate evidence-based sentencing	Hopkinson, Charlotte	2017	Cornell Law Review


Figure 1. The number of sources that discuss each potential benefit of AI systems. The sources do not necessarily conclude that these benefits have been realized, only that they are recognized as benefits that could be achieved.

Figure 2. The PRISMA flowchart of source collection, review, and inclusion.

Figure 3. (a) The proportions of included sources that indicated ML is being used in criminal justice risk assessments. (b) The count of ML tools applied to ARA.

Figure 4. The count of sources that discussed the six most popular ARA tools. COMPAS was privately developed, while the others have been developed by public entities.

Figure 5. The relative portions of sources that claim that ARA tools can or do contribute to biased performance in court systems in the United States.

Figure 6. Counts of the most frequently observed sources of bias in ARA tools as documented in our included sources.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Atkinson, G.; Casagrande, K. Efficiency vs. Equity: A Structured Interdisciplinary Review of AI in Criminal Justice Risk Assessments. Information 2026, 17, 574. https://doi.org/10.3390/info17060574

