1. Introduction
Recidivism prediction has become one of the most visible applications of data-driven decision support in criminal justice [1,2,3,4]. Risk assessment instruments are now used, or actively considered, in pretrial release, sentencing, parole, prison classification, and rehabilitation planning because they promise more consistent, evidence-informed, and scalable decisions than unaided human judgment [1,5,6]. At the same time, the use of predictive systems in criminal justice raises fundamental concerns about fairness, transparency, accountability, and public legitimacy. The relevant literature therefore spans doctrinal critiques of risk construction and sentencing use, empirical studies of predictive validity and bias, technical work on interpretable and fairness-aware modeling, and broader reviews of AI governance in criminal justice [7,8,9,10,11].
The first challenge is that recidivism is not a uniquely defined target. Across the literature, recidivism has been operationalized as rearrest, reconviction, reimprisonment, reincarceration, or a new charge within a specified follow-up window, and these choices materially affect event rates, prediction difficulty, and the meaning of model error [12,13]. Some studies emphasize general post-release reoffending, whereas others focus on violent recidivism, offense-specific recidivism, or return to a particular correctional setting. Recent work further shows that the importance of predictors may vary across one-, two-, three-, and five-year horizons, across adult and juvenile populations, and across general, violent, or offense-specific outcomes [8,14,15,16,17]. Consequently, recidivism prediction is not a single fixed learning problem but a family of context-dependent prediction tasks shaped by legal definitions, institutional objectives, and operational time frames. This point is especially important in high-stakes applications, because a model optimized for one definition of recidivism may be inappropriate for another, even when the same data source is used.
The field has also evolved substantially in methodological terms. Early actuarial approaches and conventional statistical models used demographic and criminal-history variables to generate simple risk scores, often through additive or linear structures that were readily interpretable but limited in expressive power. Contemporary work has expanded this design space considerably: researchers have proposed Bayesian regression and survival models, random forests, support vector machines, gradient boosting systems, decision-tree optimization procedures, neural networks, additive neural models, cluster-aware deep learning pipelines, and fairness-aware multi-objective optimization frameworks [2,18,19,20,21,22,23,24,25,26,27]. This diversification reflects two realities. First, recidivism datasets are heterogeneous, frequently imbalanced, and shaped by local correctional practices. Second, there is no universally dominant model family: performance depends on the target definition, data quality, class balance, and institutional context. As a result, recidivism prediction research increasingly asks not only whether machine learning can improve discrimination but also which model is appropriate for a specific criminal justice task.
The modern fairness debate in this area was largely catalyzed by controversies surrounding COMPAS and related tools [10,12,13]. The literature has shown that such systems may produce unequal false-positive and false-negative patterns across demographic groups, especially when evaluated on race and gender subpopulations. Yet the same literature makes clear that fairness in recidivism prediction is inherently multi-dimensional: calibration, demographic parity, equal opportunity, predictive equality, predictive parity, and balanced error rates cannot, in general, be satisfied simultaneously, particularly when the base rates of reoffending differ across groups [17,28,29,30,31,32,33]. As a result, there is no single, universally accepted mathematical notion of fairness for this problem. Fairness-aware recidivism modeling must instead confront explicit trade-offs among partially incompatible criteria, each reflecting a different normative view about justice, acceptable risk allocation, and the proper role of algorithms in criminal justice.
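To make the incompatibility concrete, the sketch below uses a well-known identity from the fairness literature (due to Chouldechova) that links a group's base rate p, positive predictive value (PPV), false-negative rate (FNR), and false-positive rate (FPR): FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR). It is an illustration added here, not code from the cited works, and the numbers are arbitrary. If two groups share the same PPV (predictive parity) and the same FNR but have different base rates, their FPRs must differ, so predictive parity and balanced error rates cannot both hold.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """FPR implied by a group's base rate, PPV, and FNR (Chouldechova's identity)."""
    return base_rate / (1.0 - base_rate) * (1.0 - ppv) / ppv * (1.0 - fnr)


def fpr_from_counts(n: int, base_rate: float, ppv: float, fnr: float) -> float:
    """Build a confusion matrix with the given properties and compute FPR directly."""
    positives = n * base_rate
    negatives = n - positives
    tp = positives * (1.0 - fnr)   # reoffenders correctly flagged
    fp = tp * (1.0 - ppv) / ppv    # non-reoffenders flagged, fixed by the PPV
    return fp / negatives


# Same PPV (0.6) and FNR (0.3) in both groups, but different base rates:
# the implied FPR is about 0.467 for group A and 0.2 for group B.
for group, p in [("group A", 0.5), ("group B", 0.3)]:
    print(group, implied_fpr(p, 0.6, 0.3), fpr_from_counts(1000, p, 0.6, 0.3))
```

The second function is only a cross-check: it constructs counts that satisfy the stated PPV and FNR and recovers the same FPR as the closed-form identity.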
This recognition moved the field beyond binary debates about whether a model is “fair” or “unfair”. More recent work examines fairness interventions across the full machine learning pipeline, including the pre-processing, in-processing, and post-processing stages. Reweighing, adversarial learning, disparate impact removal, reject-option classification, equalized-odds optimization, and related methods have also been applied to recidivism prediction. Integrative studies suggest that isolated debiasing interventions are often insufficient, whereas multi-phase approaches can sometimes improve fairness substantially [33,34,35,36,37]. However, these studies also show that the effectiveness of fairness interventions depends strongly on the datasets and metrics used: a method that improves one fairness criterion may worsen another or reduce performance on a different dataset. Fairness-aware recidivism prediction is therefore best understood as a constrained or multi-objective optimization problem rather than as a simple post hoc correction procedure. The implication for model development is direct: fairness should be embedded in the learning objective and in the model selection strategy, not treated as an afterthought.
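As an illustration of the pre-processing interventions listed above, the sketch below implements reweighing in the sense of Kamiran and Calders: each training instance with group g and label y receives the weight P(G = g)·P(Y = y)/P(G = g, Y = y), which makes the weighted label distribution independent of the group attribute. The toy group and label lists are invented for illustration and do not come from any recidivism dataset.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight w(g, y) = P(g) * P(y) / P(g, y) for every instance (Kamiran-Calders reweighing)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (y = 1) inside one group."""
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Invented toy data: positives are over-represented in group "M" before reweighing.
groups = ["M"] * 6 + ["F"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
for g in ("M", "F"):
    # Both groups end up at about 0.5, the overall positive rate.
    print(g, weighted_positive_rate(groups, labels, weights, g))
```

The weights are then passed to any learner that accepts per-sample weights; the method leaves features and labels untouched, which is why it is classed as a pre-processing intervention.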
Interpretability forms the second major pillar of current research. In high-stakes settings involving structured data, several influential studies argue that post hoc explanations of opaque models are an insufficient substitute for models that are understandable by design. This concern is reinforced by empirical and conceptual work showing that black-box explanations can be unstable, persuasive without being faithful, or difficult to contest in legally meaningful ways [38,39,40,41,42,43,44,45,46]. More importantly, the interpretability literature does not simply equate interpretability with simplicity. Rather, it emphasizes multiple dimensions, including simulatability, transparency of feature effects, faithful local reasoning, stable global structure, and practical usability for human decision makers. This has led to renewed interest in sparse scoring systems, interpretable classification rules, generalized additive models, explainability-constrained architectures, and neural additive models that selectively incorporate interactions while preserving human-readable component effects [13,40,41,42,43,47,48].
A recurring result across the existing literature is that interpretability need not come at a substantial cost in predictive quality. Several comparative studies show that interpretable models can perform as well as black-box alternatives and proprietary tools such as COMPAS or the Arnold PSA when trained on structured recidivism data [5,30,31,42,45]. Other works propose interpretable output constraints, Shapley-based importance analyses across near-optimal models, and explanation architectures that combine global structure with exact local decomposition and counterfactual reasoning [21,39,41]. Collectively, these contributions have weakened the assumption that accuracy and interpretability lie on opposite ends of a fixed trade-off curve. Instead, they suggest that much depends on whether the prediction task involves structured tabular data, whether the model class is chosen deliberately, and whether explanation is treated as an intrinsic modeling requirement rather than as a visualization layer added after training.
Closely related to interpretability is the broader framework of trustworthy AI. Systematic reviews in the recidivism domain repeatedly identify fairness, transparency, privacy and data protection, accountability, human oversight, technical robustness, and social acceptability as core requirements for responsible deployment [10,11,15,29,38,44]. These dimensions matter because a model may achieve acceptable discrimination metrics yet still be inappropriate in practice if data provenance is unclear, its decision logic is not auditable, or its outputs encourage uncritical human reliance. In this sense, interpretability should be treated not merely as a communication aid but as part of the governance structure required for audit, contestability, and responsible human oversight. Socio-legal critiques go further, warning that explanation layers attached to opaque systems may convert uncertainty into institutional justification, thereby redistributing responsibility without providing reasons that are scientifically or legally reviewable [43]. For criminal justice applications, this means that technical explanation must be aligned with procedural rights, reviewability, and practical avenues for contesting adverse decisions.
A further concern is transportability and alignment with the intended institutional task. Studies of race and geography suggest that, in some machine learning settings, location can influence predictive performance more consistently than race, implying that models developed in one jurisdiction may not generalize reliably to another [25,30,49]. Other work argues that predictive accuracy alone is not enough if the system is meant to support treatment allocation or policy design rather than only risk ranking; from this perspective, the objective should shift from pure prediction toward learning actionable decision policies [22]. Additional studies on mental illness, intimate partner violence, cognitive-emotion regulation, and juvenile offending show that subgroup-specific recidivism prediction often requires different feature spaces, operational definitions, and interpretability settings [16,18,20,26]. These findings suggest that recidivism modeling should be treated as a context-sensitive problem in which fairness, interpretability, and deployment goals are aligned with the intended institutional task.
Data quality and class imbalance present additional technical barriers, as recidivism datasets are often imbalanced, fragmented across institutions, and constrained by privacy or legal restrictions. Recent work addresses these issues through feature-selection pipelines, SMOTE-based balancing, clustering, hyperparameter optimization, synthetic-data generation, and direct AUC-oriented objectives [14,27,40,50]. Open-source replications and synthetic-data studies also highlight the importance of reproducibility, privacy-preserving experimentation, and methodological transparency [35,50]. However, gains in predictive performance do not, by themselves, resolve fairness or interpretability concerns. For example, some studies report high apparent accuracy on specialized or local datasets, whereas others caution that data imbalance, outcome definition, and deployment context can distort naive evaluations. Likewise, work on mental illness suggests that clinically salient variables may add limited predictive value over crime-related and demographic features, even though they remain important for treatment planning and legal–ethical analysis [26]. These observations reinforce the need for model designs that balance predictive power with transparent reasoning and fairness-sensitive evaluation rather than optimizing performance in isolation.
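For readers less familiar with SMOTE-based balancing, the sketch below shows the core interpolation step of the original SMOTE algorithm (Chawla et al.): each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors. It is a minimal NumPy illustration on invented two-dimensional data, not the pipeline used in the cited studies; in practice, a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn package would normally be preferred.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic rows by interpolating between minority samples and their neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples; a point is not its own neighbor.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # For each new row, pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    partner = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


# Invented minority-class data: 8 points with 2 features.
X_minority = np.random.default_rng(1).normal(size=(8, 2))
X_synthetic = smote(X_minority, n_new=20, k=3)
print(X_synthetic.shape)  # (20, 2)
```

Because every synthetic row is a convex combination of two real minority rows, balancing of this kind can raise minority recall without changing the feature space, but it does not by itself address the fairness concerns discussed above.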
The human side of recidivism prediction is equally important. Public attitude studies show that people often underestimate the error rates of algorithmic systems while demanding very low error tolerance in high-stakes contexts [14]. Other work on crowd perceptions and human–algorithm interaction indicates that perceived fairness depends not only on statistical metrics but also on which predictors are viewed as legitimate, how the system is explained, and whether people retain meaningful oversight [12,36]. In professional settings, algorithmic support can improve human predictions in some groups, especially among targeted or trained users, but practitioners remain reluctant to fully endorse automated recidivism assessment [12]. Instead, such tools are often viewed as aids for standardization, training, cross-checking, and capacity extension. This has direct implications for model design: a fairness-aware and interpretable model is valuable not only because it can be inspected statistically but also because it can support calibrated, reviewable, and contestable human decision processes.
A review of the literature reveals a clear research gap. Existing studies often optimize one or two properties at a time: some emphasize predictive performance, others focus on fairness auditing, others prioritize explainability, and others examine legal or ethical implications [9,31,33,38,44]. Comparatively few studies integrate these objectives into a single design framework for structured recidivism data. Even when fairness and interpretability are both discussed, fairness is frequently treated as a post hoc evaluation layer, and interpretability as a reporting feature rather than as a structural principle embedded in the model itself.
Based on the above analysis, the main contributions of this work are as follows. First, it introduces three recidivism datasets from three European countries. Two of them were provided by official authorities, while the third was synthesized from existing statistical distributions using dedicated data generation methods. Given that structured recidivism datasets are rare in the existing literature, this contribution can support further research on recidivism and the development of machine learning models that quantify its properties and characteristics. Second, it develops a model that remains operationally interpretable. Third, it evaluates the model not only in terms of predictive performance but also with respect to fairness-relevant criteria. Fourth, it contributes to the broader transition from opaque risk scoring toward algorithmic systems that are technically effective, normatively defensible, and institutionally trustworthy. Fifth, it situates the technical design within the wider literature on criminal justice analytics, trustworthy AI, and the responsible deployment of AI software, thereby linking model construction to auditability, contestability, and human oversight. In this way, the study is positioned at the intersection of predictive modeling, interpretable machine learning, algorithmic fairness, and criminal justice governance.
The remainder of this paper is organized as follows. Section 2 presents the methodological framework. Section 3 describes the three recidivism datasets acquired from Bulgaria, Greece, and Portugal, detailing their attribute structures, output definitions, and the methodology used to generate the Portuguese dataset. Section 4 presents the experimental analysis, organized into several simulation cases. Finally, Section 5 concludes the paper.
3. Description of the Datasets
In this section, we present three new recidivism datasets acquired in the current work from three countries, namely, Bulgaria, Greece, and Portugal. The Bulgarian and Greek datasets were obtained from the respective official authorities and were anonymized in full compliance with the EU General Data Protection Regulation (GDPR). For Portugal, no official data set was available. The Portuguese data set was therefore generated artificially from publicly available Portuguese statistics and the Greek data set: the generation process combined the data distributions derived from the public Portuguese statistics with the corresponding Greek distributions. The final variables, data instances, and simulation outcomes for the Portuguese data set were thoroughly reviewed by a group of experts, including lawyers, establishment officers, and judges, which supports both the realism of the generated data and the generalizability of the experimental findings reported in Section 4.
Herein, recidivism is specifically defined as reincarceration following release from prison. This definition excludes rearrests that do not result in custody, technical probation violations (unless they lead to custodial admission), and purely administrative returns. The operationalization differs slightly across national contexts because of differences in data set structure. For the Bulgarian data set, data constraints led the authorities to construct an event-based proxy measure that distinguishes custodial admission from probation admission. In the Greek data set, recidivism is defined as reincarceration for a new offense within a defined post-release time window (typically three years). Finally, in the Portuguese data set, recidivism risk is estimated through structured administrative variables aligned with validated criminological predictors. These differences reflect a deliberate adaptation to national institutional realities rather than a conceptual divergence.
Table 1, Table 2 and Table 3 list the attribute names and types for the Bulgarian, Greek, and Portuguese datasets, respectively. The three datasets contain 4940, 12,422, and 4418 instances, respectively, and each instance corresponds to one individual.
The Bulgarian data set includes 17 input attributes and one output attribute, called “Recidivism Risk Assessment”. Note that this output does not directly encode reincarceration as a binary event; instead, it records the reincarceration risk assessment carried out by the Bulgarian official authorities, which takes two levels, low and medium.
The Greek data set includes 11 input variables and two outputs, called “Recidivism within 3 Years” and “Recidivism”. The first indicates reincarceration within a three-year interval after an individual’s prison release, while the second indicates reincarceration at any point in the individual’s life span after prison release. Note that the three-year output is widely considered a reliable index for predicting recidivism from an individual’s attributes.
The Portuguese data set includes nine input attributes and two output attributes, which are defined in the same way as in the Greek data set.
In view of the above tables, we can distinguish between two types of attributes, namely, static and dynamic. Static attributes refer to factors that are fixed or not meaningfully modifiable in the short term, such as gender, nationality, age at exiting prison, criminal patterns, and structural offense characteristics. Static factors consistently show a strong statistical association with recidivism. Age, for instance, follows the well-established “age–crime curve”, where criminal involvement peaks in late adolescence and early adulthood and declines thereafter. Criminal patterns, such as the type or category of crime, are among the most powerful predictors of future criminal justice contact. In general, the advantage of static variables lies in their objectivity and reproducibility: they reduce subjective interpretation and are reliably documented in administrative datasets. Dynamic attributes, on the other hand, correspond to factors that can change over time and may be influenced by intervention, such as employment status, educational status, and institutional behavior. Dynamic factors are highly relevant for rehabilitation-oriented policy, but they are often inconsistently recorded across national administrative systems. For this reason, the current work relies primarily on static predictors, supplemented by a limited set of dynamic factors that capture socio-economic vulnerability.
Regarding the output variables, the Bulgarian data provide no information for building a model whose output is the individual’s reincarceration risk within a three-year interval after prison release. For the other two cases, the data structure is defined directly in terms of the two output attributes, namely “Recidivism” and “Recidivism within 3 Years”. Therefore, for the Bulgarian case, only one classification model is constructed to predict the reincarceration risk, whereas for each of the Greek and Portuguese cases, two models are constructed: the first for the “Recidivism within 3 years” output and the second for the “Recidivism” output.
4. Experiments and Discussion
Herein, we apply the methodology proposed in Section 2 to the datasets described in Section 3 in order to estimate and mitigate gender bias. The sensitive attribute is therefore gender, which is divided into the Male group, defining the privileged group, and the Female group, defining the unprivileged group.
As reported in the previous section, three datasets are available, for Bulgaria, Greece, and Portugal. The Bulgarian data set has only one binary output, which corresponds to the reincarceration risk assessment and is called “Recidivism” or “Recidivism risk assessment”. The other two datasets include two outputs, called “Recidivism within 3 Years” and “Recidivism”; the first concerns reincarceration within a three-year interval after an individual’s prison release, and the second concerns reincarceration at any point in the individual’s life span after prison release.
The objectives of the experimental analysis are as follows: (a) to study the model’s accuracy before and after bias mitigation; (b) to study the bias against gender before and after the mitigation process based on the Equalized Odds (EO) fairness criterion; (c) to study the Predictive Parity (PP) fairness criterion; and (d) to study the interpretability of the model after the EO-based mitigation process.
To evaluate the behavior without bias mitigation, we build the 1D-CNN model using the cross-entropy objective function defined in Equations (15) and (21); this baseline model is therefore used for bias estimation. In contrast, we use the custom objective function given in Equation (25) to create a model that performs bias mitigation, and we study the model’s behavior after mitigation.
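Equation (25) is defined in Section 2 and is not reproduced here. Purely to illustrate the general idea of adding a fairness term to cross-entropy, the sketch below assumes one common form: binary cross-entropy plus a regularization weight `lam` times an Equalized Odds penalty, where the penalty is the absolute gap between the Female and Male groups in "soft" FPR and FNR (computed from predicted probabilities instead of hard labels, so that the term is differentiable). The function names, the soft-rate surrogate, and the exact penalty are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np


def soft_error_rates(p: np.ndarray, y: np.ndarray):
    """Differentiable surrogates of FPR and FNR computed from predicted probabilities."""
    soft_fpr = p[y == 0].mean()          # average score given to non-reoffenders
    soft_fnr = (1.0 - p[y == 1]).mean()  # average "miss" mass on reoffenders
    return soft_fpr, soft_fnr


def fairness_penalized_bce(p, y, female, lam, eps=1e-7):
    """Assumed objective: binary cross-entropy + lam * (|soft DFPR| + |soft DFNR|)."""
    p = np.clip(p, eps, 1.0 - eps)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    fpr_m, fnr_m = soft_error_rates(p[~female], y[~female])
    fpr_f, fnr_f = soft_error_rates(p[female], y[female])
    penalty = abs(fpr_f - fpr_m) + abs(fnr_f - fnr_m)
    return bce + lam * penalty


y = np.array([0, 1, 0, 1])
female = np.array([False, False, True, True])
p_fair = np.array([0.2, 0.8, 0.2, 0.8])    # identical behavior in both groups
p_biased = np.array([0.2, 0.8, 0.6, 0.8])  # higher score for the female non-reoffender
# The difference below is the penalty term alone (about 0.4).
print(fairness_penalized_bce(p_biased, y, female, lam=1.0) - fairness_penalized_bce(p_biased, y, female, lam=0.0))
```

With `lam = 0` the objective reduces to plain cross-entropy, which mirrors the role of the baseline model; larger values trade accuracy for smaller error-rate gaps, consistent with the accuracy drop reported in Section 4.1.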
In view of Table 1, Table 2 and Table 3, the 1D-CNN models are built as follows. For each of the following outputs, two models are trained, one before and one after the mitigation process: (i) the “Recidivism risk assessment” output of the Bulgarian data set; (ii) the “Recidivism within 3 years” output of the Greek data set; (iii) the “Recidivism” output of the Greek data set; (iv) the “Recidivism within 3 years” output of the Portuguese data set; and (v) the “Recidivism” output of the Portuguese data set. Thus, in total, we create five models to quantify the behavior before bias mitigation and five models to evaluate the status after bias mitigation.
To train the 1D-CNN models, we used the stochastic gradient descent (SGD) algorithm with ReLU activation functions, while the learning rate, the maximum number of epochs, and the batch size were set to 0.0001, 2000, and 50, respectively. The 1D-CNN parameters were as follows: number of kernels = 16, kernel size = 2, padding = “same”, input_shape = (m, 1), where m is the number of input attributes, and input channels = 1. Thus, the convolutional layer has 48 parameters in total (i.e., 16 × 2 × 1 = 32 weights and 16 biases). The regularization parameter in Equation (25) was determined through an iterative process: its initial value was very small (i.e., favoring the presence of bias), and it was increased by a predefined step size at each iteration. The iteration stops when the reductions of the fairness term in Equation (25) obtained in two consecutive iterations are close to each other. The final values of the regularization parameter for the Bulgarian, Greek, and Portuguese datasets were 10, 5, and 7, respectively. Finally, for each simulation case, the original data set was divided into a training set containing 70% of the data and a testing set containing the remaining 30%.
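The iterative selection of the regularization parameter can be sketched as follows. `fairness_term_at` is a hypothetical stand-in for "train the model with Equation (25) at this parameter value and return the value of its fairness term"; here it is replaced by an invented, exponentially decaying toy function so that the loop runs on its own. The initial value, step size, and tolerance are also illustrative assumptions, not the values used in the experiments.

```python
import math


def select_regularization(fairness_term_at, lam0=0.1, step=1.0, tol=1e-3, max_iter=100):
    """Increase the weight until two consecutive reductions of the fairness term are close."""
    lam = lam0
    prev_value = fairness_term_at(lam)
    prev_drop = None
    for _ in range(max_iter):
        lam += step
        value = fairness_term_at(lam)
        drop = prev_value - value  # reduction achieved by this increase
        if prev_drop is not None and abs(drop - prev_drop) < tol:
            return lam
        prev_value, prev_drop = value, drop
    return lam  # fall back to the last value if the criterion is never met


def toy_fairness_term(lam):
    """Hypothetical stand-in for training with Equation (25) and reading off the fairness term."""
    return 0.3 * math.exp(-0.5 * lam)


print(select_regularization(toy_fairness_term))  # about 10.1 for this toy curve
```

The stopping rule only compares successive reductions, so it halts once further increases of the weight yield diminishing fairness gains; each call to the stand-in would correspond to a full training run in a real setting.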
To carry out the statistical analysis, each of the ten 1D-CNN models was run 100 times with different initializations. The results reported in the following subsections concern the testing data.
Based on this setting, we performed four experimental cases, called Experiment 1, Experiment 2, Experiment 3, and Experiment 4, which are presented in the following subsections.
4.1. Experiment 1: Descriptive Statistics of the Accuracy Performance
This case concerns the study of the model’s accuracy before and after the mitigation process. As a first step, in addition to the 1D-CNN model, we tested two further models, namely, the standard XGBoost algorithm and a standard MLP neural network. The MLP consisted of two hidden layers with 10 and 5 nodes, respectively, with ReLU activation functions.
Table 4 reports the accuracy obtained on the testing data. The corresponding results for the 1D-CNN model are reported in Table 5 and Table 6.
These results justify our choice to employ the 1D-CNN in the design of our algorithmic architecture. The first reason is its accuracy. Comparing the results in Table 4 with those in Table 5 and Table 6 for the “Before Bias Mitigation” case, we observe that the 1D-CNN clearly outperformed the XGBoost and MLP models, except for the Portuguese data in the “Within 3 Years Recidivism Prediction” case, where XGBoost slightly outperformed the 1D-CNN. This suggests that convolutional kernels are well suited to describing and quantifying criminal variables. The second reason concerns the use of the Kernel SHAP algorithm for the interpretability analysis. In this regard, Kernel SHAP fits better with the inherent layered structure of the 1D-CNN than with the respective algorithmic structures of the XGBoost and MLP models.
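Kernel SHAP estimates Shapley values by sampling feature coalitions and solving a weighted linear regression. For a handful of features, the quantity it approximates can be computed exactly by enumeration, which the sketch below does. It assumes the simplest value function, in which features outside a coalition are replaced by a single baseline vector (the `shap` library's Kernel SHAP instead averages over a background dataset), and it uses an invented linear scoring function in place of the trained 1D-CNN. The cost grows exponentially with the number of attributes, which is why sampling-based estimators are used in practice.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    m = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def score(z):
    """Invented risk score over three attributes (not the paper's model)."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


phi = exact_shapley(score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # for a linear model, phi_i = w_i * (x_i - baseline_i), i.e. about [2.0, -2.0, 1.5]
```

The attributions sum to score(x) − score(baseline), which is the additivity property that makes Shapley-based explanations auditable feature by feature.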
We now turn to Table 5 and Table 6, which refer exclusively to the proposed algorithmic structure. Recall that, unlike Equation (21), Equation (25) includes fairness constraints. The following conclusions can be drawn. First, the results indicate similar behavioral trends across the models. Second, the best performance is achieved on the Portuguese data set and the worst on the Greek data set. Third, implementing the mitigation process through the customized objective function in Equation (25) reduces accuracy: in all experimental cases, the accuracy drops for the debiased models. This reduction was expected, because the fairness constraints in Equation (25) conflict with pure accuracy maximization. A more rigorous analysis of this issue is presented at the end of Section 4.2.
4.2. Experiment 2: Inferential Statistics for Bias Estimation and Mitigation
In this experiment, we evaluate the descriptive statistics of the ten 1D-CNN models (i.e., five before and five after bias mitigation).
Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 illustrate the results. These figures include the following information: (a) the FPRs of the Male and Female groups and the difference between the two, called DFPR (difference in FPR), before and after the bias mitigation process; and (b) the FNRs of the Male and Female groups and the difference between the two, called DFNR (difference in FNR), before and after the bias mitigation process. DFPR and DFNR are given by Equations (22) and (23), respectively. For the subsequent discussion, recall that figures labeled “Before Bias Mitigation” refer to the bias estimation process, while figures labeled “After Bias Mitigation” correspond to the bias mitigation process.
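The per-group rates shown in these figures can be computed as sketched below. The sketch assumes DFPR = FPR(Female) − FPR(Male) and DFNR = FNR(Female) − FNR(Male); Equations (22) and (23) may use a different sign convention or absolute values, so the differences here are illustrative. The arrays are invented toy data.

```python
import numpy as np


def group_error_rates(y_true, y_pred, mask):
    """FPR and FNR restricted to the individuals selected by a boolean mask."""
    yt, yp = y_true[mask], y_pred[mask]
    fpr = np.mean(yp[yt == 0] == 1)  # non-reoffenders predicted to reoffend
    fnr = np.mean(yp[yt == 1] == 0)  # reoffenders predicted not to reoffend
    return float(fpr), float(fnr)


def equalized_odds_gaps(y_true, y_pred, female):
    """Assumed convention: DFPR and DFNR computed as Female minus Male."""
    fpr_m, fnr_m = group_error_rates(y_true, y_pred, ~female)
    fpr_f, fnr_f = group_error_rates(y_true, y_pred, female)
    return fpr_f - fpr_m, fnr_f - fnr_m


y_true = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0])
female = np.array([False] * 4 + [True] * 4)
print(equalized_odds_gaps(y_true, y_pred, female))  # (0.5, 0.5)
```

Equalized Odds holds exactly when both gaps are zero; repeating this computation over the 100 runs of each model yields the FPR and FNR distributions compared in the following analysis.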
These figures show that, in the “Before Bias Mitigation” case, the relative behavior of the Male and Female groups in predicting recidivism is consistent across all simulations. Similar conclusions hold for the “After Bias Mitigation” case.
We now examine the cases before and after bias mitigation in turn.
First, we study the model predictions without bias mitigation (i.e., before the bias mitigation process). Here, the main purpose is to perform bias estimation and provide rigorous statistical evidence.
Remark 1: In all datasets, the FPR mean values for the Female group are larger than the FPR mean values for the Male group, i.e., FPR_Mean (Male) < FPR_Mean (Female).
Remark 2: For the Greek and Portuguese datasets, the FNR mean values for the Female group are smaller than the FNR mean values for the Male group, i.e., FNR_Mean (Male) > FNR_Mean (Female).
Remark 3: For the Bulgarian data set, the FNR mean values for the Female group are larger than the FNR mean values for the Male group, i.e., FNR_Mean (Male) < FNR_Mean (Female).
Next, we carry out rigorous statistical inference on the distributions reported in Figure 2a,c, Figure 3a,c, Figure 4a,c, Figure 5a,c and Figure 6a,c, considering only the FPR and FNR values (i.e., the DFPR and DFNR values are not considered).
To check the normality of these distributions, we employed the well-known Shapiro–Wilk test, with the null hypothesis “The population follows a normal distribution”.
Table 7 reports the obtained results for pairs of Male/Female distributions, because the inferential tests that follow compare these pairs.
In view of Table 7, the inferential analysis used the t-test for the cases where the normality null hypothesis was not rejected and the Mann–Whitney U test for the cases where it was rejected. In all comparisons, the null hypothesis is “The two populations, corresponding to the Male and Female groups, have the same central tendency”, which is interpreted as the two distributions being equal.
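The test-selection rule described above can be sketched with SciPy as follows. The 0.05 level matches the text; using Welch's t-test (unequal variances) rather than Student's t-test, and requiring both samples of a pair to pass the Shapiro–Wilk test before the t-test is used, are assumptions of this sketch. The two samples are simulated stand-ins for 100 per-run FPR values per group, not the paper's results.

```python
import numpy as np
from scipy import stats


def compare_groups(male_rates, female_rates, alpha=0.05):
    """Shapiro-Wilk on both samples, then a t-test or a Mann-Whitney U test."""
    both_normal = (stats.shapiro(male_rates).pvalue > alpha
                   and stats.shapiro(female_rates).pvalue > alpha)
    if both_normal:
        result = stats.ttest_ind(male_rates, female_rates, equal_var=False)
        test_name = "Welch t-test"
    else:
        result = stats.mannwhitneyu(male_rates, female_rates, alternative="two-sided")
        test_name = "Mann-Whitney U"
    return test_name, result.pvalue


rng = np.random.default_rng(42)
fpr_male = rng.normal(loc=0.20, scale=0.02, size=100)    # simulated per-run FPRs
fpr_female = rng.normal(loc=0.30, scale=0.02, size=100)
name, p_value = compare_groups(fpr_male, fpr_female)
print(name, p_value < 0.05)
```

Falling back to the rank-based Mann–Whitney U test avoids relying on normality when the Shapiro–Wilk test rejects it, at the cost of testing a difference in location rather than in means.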
Table 8 summarizes the findings of our analysis. All obtained p-values are less than 0.05; therefore, the above null hypothesis is rejected in all simulation cases.
This means that, in all simulation cases, the FPR and FNR distributions of the Male group differ from those of the Female group. This outcome strongly supports the preceding analysis regarding the existence of bias against the Female group.
To interpret these results, we proceed with the following analysis.
FPR is defined as the number of individuals who are predicted to reoffend but do not, divided by the total number of individuals who do not reoffend. Hence, a classifier discriminates against a specific group when it assigns higher FPR values to that group. Thus, in all datasets and in all simulations depicted in Figure 2a, Figure 3a, Figure 4a, Figure 5a and Figure 6a and summarized in Table 7 and Table 8, the resulting classification models appear to discriminate against the Female group.
FNR is defined as the number of individuals who are predicted not to reoffend but do, divided by the total number of individuals who reoffend. Consequently, a classifier discriminates against a specific group of individuals when it assigns lower FNR values to that group, since the reoffenders of that group are less often given the benefit of a low-risk prediction than those of the other group. Thus, for the Greek and Portuguese datasets and in all simulations depicted in Figure 3c, Figure 4c, Figure 5c and Figure 6c and in Table 7 and Table 8, the resulting classification models appear to discriminate against the Female group.
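Both definitions can be computed per group from hard 0/1 predictions, as in the following sketch; the function name and the toy labels are illustrative assumptions, not the paper's code.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Return {group: (FPR, FNR)} computed from hard 0/1 labels and predictions."""
    rates = {}
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        fp = np.sum((yp == 1) & (yt == 0))  # predicted to reoffend, did not
        tn = np.sum((yp == 0) & (yt == 0))
        fn = np.sum((yp == 0) & (yt == 1))  # predicted not to reoffend, did
        tp = np.sum((yp == 1) & (yt == 1))
        # FPR uses all non-reoffenders; FNR uses all reoffenders of the group.
        rates[str(g)] = (float(fp / (fp + tn)), float(fn / (fn + tp)))
    return rates
```

Comparing the two tuples across groups gives the FPR and FNR disparities discussed above.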
Based on the above discussion, Definition 1, and Equations (17)–(20), we can verify that for the Greek and Portuguese datasets, the developed classification models discriminate against the Female group by violating the Equalized Odds requirement. This implies that machine learning models trained without bias mitigation appear to be strongly biased against the unprivileged group.
As far as the Bulgarian dataset is concerned, the FPR results show clear discrimination against the Female group, whereas the FNR results do not (see Remark 3). However, as shown next, the models also appear to discriminate against the Female group in this case.
To interpret these conclusions, we adopt the concept of base rate (BR) and proceed with the following analysis. The BR is defined as the proportion of individuals in a population who reoffend. In all datasets, women tend to have lower reoffending BRs than men. In addition, the comparative FPRs and FNRs indicate that the models both overestimate and underestimate risk for women, which implies that they do not predict risk consistently for this group. These conclusions reflect structural differences between the two groups’ criminal histories: far fewer women reoffend, yet the error rates of the predictions for women differ from those for men. Therefore, it is consistent to conclude that, for all datasets, the models created to predict recidivism appear to be biased against the Female group, violating the Equalized Odds fairness criterion.
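A minimal sketch of the base-rate comparison, assuming binary labels and hard predictions: comparing each group's BR with its predicted positive rate shows whether a model over- or under-predicts risk for that group on aggregate. The helper name and toy data are hypothetical.

```python
import numpy as np

def base_rate_vs_predicted(y_true, y_pred, group):
    """Return {group: (BR, predicted positive rate)} for binary labels/predictions."""
    out = {}
    for g in np.unique(group):
        mask = group == g
        br = float(np.mean(y_true[mask]))    # share of the group that reoffends
        ppr = float(np.mean(y_pred[mask]))   # share the model flags as reoffending
        out[str(g)] = (br, ppr)
    return out
```

A predicted positive rate above the BR signals aggregate overestimation of risk for that group, and one below it signals underestimation.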
Next, we study model predictions with bias mitigation. To accomplish this task, we perform rigorous statistical inference to study the distributions reported in Figure 2b,d, Figure 3b,d, Figure 4b,d, Figure 5b,d and Figure 6b,d, considering only the FPR and FNR values (i.e., we do not consider the DFPR and DFNR values).
The methodology used to create the models is presented in Section 2.1, Section 2.2 and Section 2.3. As in the previously reported simulations, Table 9 reports the normality check for the populations and Table 10 the inference tests. Again, failure to reject the normality hypothesis in Table 9 implies the use of the t-test in Table 10, whereas rejection of normality in Table 9 implies the use of the Mann–Whitney U test in Table 10.
In view of the above tables, it can be concluded that the differences in FPRs and FNRs between the Male and Female groups are substantially reduced in all simulation cases. In addition, some p-values in Table 10 indicate that the null hypothesis cannot be rejected, meaning that no statistically significant disparity remains and the bias has effectively been mitigated. On the other hand, several p-values indicate that the null hypothesis is rejected, which means that the mitigation was not fully accomplished. Based on this observation, we emphasize the following remark: the optimization process does not guarantee a global minimum. Thus, our intention was to minimize the FPR and FNR differences in order to mitigate the bias, not to eliminate it. In that direction, Table 10 indicates that in some cases the optimization found near-global minimum solutions, while in other cases it did not.
In any case, the differences in FNRs and FPRs have been substantially reduced. Therefore, it is consistent to conclude that the constrained optimization process had a strong effect and yielded the best fairness results obtained in our experiments.
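The constrained training discussed above relies on a differentiable surrogate of the Equalized Odds gaps. Because Equations (17)–(21) are not reproduced in this section, the following NumPy sketch assumes one common soft formulation: predicted probabilities replace hard predictions, so the soft FPR is the expected number of false positives over the number of negatives, and the soft FNR is the expected number of false negatives over the number of positives. The penalty form, the group encoding (0 = Male, 1 = Female), and the function names are hypothetical.

```python
import numpy as np

def soft_fpr_fnr(y_true, p, eps=1e-7):
    """Soft FPR/FNR: probabilities p stand in for hard 0/1 predictions."""
    neg, pos = 1.0 - y_true, y_true
    fpr = np.sum(neg * p) / (np.sum(neg) + eps)          # expected FP / negatives
    fnr = np.sum(pos * (1.0 - p)) / (np.sum(pos) + eps)  # expected FN / positives
    return fpr, fnr

def eo_penalty(y_true, p, group):
    """Sum of absolute FPR and FNR gaps between group 0 and group 1."""
    g0, g1 = group == 0, group == 1
    fpr0, fnr0 = soft_fpr_fnr(y_true[g0], p[g0])
    fpr1, fnr1 = soft_fpr_fnr(y_true[g1], p[g1])
    return abs(fpr0 - fpr1) + abs(fnr0 - fnr1)
```

In training, such a penalty would be multiplied by a regularization weight, added to the classification loss, and differentiated with an autodiff framework; squaring the gaps instead of taking absolute values gives a smoother gradient near zero.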
Finally, we focus on the trade-off between fairness and predictive performance. Figure 7 illustrates the mean accuracies obtained by the 1D-CNN on the three datasets for the recidivism prediction case, and Figure 8 illustrates the respective values for the within-3-years recidivism prediction case, for the models without and with bias mitigation.
These figures show that the accuracies of the models with bias mitigation are considerably lower than those of the models without bias mitigation. Therefore, they directly quantify the trade-off between fairness and the accuracy of the models.
4.3. Experiment 3: Implementation and Study of the Predictive Parity Fairness Criterion
In this section, we analyze the Predictive Parity (PP) fairness criterion and its impact on developing fair 1D-CNN classifiers for the three datasets. This criterion assesses whether the positive predictions of an ML model are equally reliable across the privileged and unprivileged groups. As such, the main requirement is to obtain equal values of the Positive Predictive Value (PPV) across groups. The current analysis supplements the previous analysis, which was based on the EO fairness criterion, and is intended to broaden our understanding of bias estimation and mitigation.
Given the nomenclature of Section 2.2, the PPV is

$$\mathrm{PPV} = \frac{TP}{TP + FP},$$

where TP is the number of true positives and FP the number of false positives. The values of TP and FP can be approximated in a differentiable manner using the following soft approach [52]
and
Thus, the PPV can be approximated as follows
where
is a small positive number. As a result, in our case, the PP fairness criterion is expressed by the following equation
which can be rewritten as
Thus, the resulting constrained condition is
Recalling that the main task is to minimize the
in Equation (21), the constrained optimization problem becomes
Again, the above constrained problem is solved by minimizing the following regularized objective
To solve the above problem, we again use the 1D-CNN, with the same structure and parameter selection as reported at the beginning of Section 4. The only difference is that, in this case, the values of the regularization parameter were found to be 5, 3, and 5 for the Bulgarian, Greek, and Portuguese datasets, respectively. As before, 100 runs were executed for each experimental simulation.
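Since the soft TP and FP expressions from [52] are not reproduced here, the following sketch assumes the common surrogate TP ≈ Σ y_i p_i and FP ≈ Σ (1 − y_i) p_i, where p_i is the predicted probability, with a small constant in the PPV denominator. It adds `lam` times the absolute PPV gap between groups to a binary cross-entropy loss, mirroring the regularized formulation described above; the paper's exact objective may differ, and all names are hypothetical.

```python
import numpy as np

def soft_ppv(y_true, p, eps=1e-7):
    """Differentiable PPV surrogate: probabilities replace hard 0/1 predictions."""
    tp = np.sum(y_true * p)          # soft true positives
    fp = np.sum((1.0 - y_true) * p)  # soft false positives
    return tp / (tp + fp + eps)      # eps guards against division by zero

def bce(y_true, p, eps=1e-7):
    """Mean binary cross-entropy with clipped probabilities."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def pp_regularized_loss(y_true, p, group, lam):
    """BCE plus lam * |PPV(group 0) - PPV(group 1)|; returns (loss, gap)."""
    g0, g1 = group == 0, group == 1
    gap = abs(soft_ppv(y_true[g0], p[g0]) - soft_ppv(y_true[g1], p[g1]))
    return bce(y_true, p) + lam * gap, gap
```

Here `lam` plays the role of the regularization parameter reported above; larger values push the optimizer harder toward equal PPVs at the expense of the classification loss.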
Figure 9, Figure 10 and Figure 11 depict the results obtained for the three datasets without and with bias mitigation.
In our recidivism case, the PPV indicates the proportion of individuals who actually reoffended among all those whom the classifier predicts would reoffend. Thus, a Predictive Parity imbalance is a strong indicator that the classifier behaves differently for groups with different characteristics [2]. In view of this remark, Figure 9a, Figure 10a,c and Figure 11a,c reveal a direct imbalance between the PPV values of the Male and Female groups when no bias mitigation is applied. However, these imbalances are not as strong as those in Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 for the “Before Bias Mitigation” cases.
Considering the implementation of both the EO and PP fairness criteria, we conclude that their combined effect strongly supports the claim that the proposed methodology can produce fair recidivism classification predictions.
4.4. Kernel SHAP Configuration, Attribution Analysis, and Fairness Interpretation
For the interpretability analysis, the Kernel SHAP explainer was configured as follows. A random background dataset of 1000 instances, sampled without replacement from the training partition, was used to approximate the marginal feature distributions required for coalition value estimation, as described in Equation (28). This background size balances approximation fidelity against computational cost for the feature dimensionalities of the three datasets considered. For each explained instance, 2000 coalition samples were drawn to fit the weighted least-squares surrogate model in Equation (29); the sampling scheme follows the Kernel SHAP weighting function defined in Equation (30), which assigns the largest weights to coalitions that are either very small or nearly complete, because these coalitions are the most informative about the marginal contribution of individual features. A logit link function was applied to the model’s sigmoid output prior to attribution, so that the resulting SHAP values are additive in log-odds space, as expressed in Equation (32). Feature attributions were computed on the held-out test set for each of the 100 independent runs, and the resulting mean absolute SHAP values were aggregated across runs to obtain stable global importance rankings. All computations were performed using the KernelExplainer class, and check_additivity was enabled to verify local accuracy for each explained instance.
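The configuration above can be illustrated with a minimal from-scratch sketch of Kernel SHAP, written with NumPy rather than the shap library's KernelExplainer. It assumes a small number of features, so it enumerates all coalitions instead of drawing the 2000 sampled coalitions used here; it uses an identity link instead of the logit link; and it replaces the infinite weight of the empty and full coalitions with a large finite weight (1e6), a common approximation. The function names are hypothetical.

```python
import itertools
from math import comb

import numpy as np

def shapley_kernel_weight(m, s):
    """Kernel SHAP weight of a coalition of size s among m features."""
    if s == 0 or s == m:
        return 1e6  # large finite stand-in for the infinite weight
    return (m - 1) / (comb(m, s) * s * (m - s))

def kernel_shap(f, x, background):
    """Exact Kernel SHAP by enumerating all 2^m coalitions (small m only)."""
    m = x.shape[0]
    rows, targets, weights = [], [], []
    for s in range(m + 1):
        for subset in itertools.combinations(range(m), s):
            z = np.array([1.0 if i in subset else 0.0 for i in range(m)])
            mask = z == 1.0
            # Coalition features take x's values; the others keep background values,
            # so averaging over background rows estimates the coalition value.
            mixed = background.copy()
            mixed[:, mask] = x[mask]
            targets.append(f(mixed).mean())
            rows.append(np.concatenate(([1.0], z)))  # intercept + indicators
            weights.append(shapley_kernel_weight(m, s))
    Z, y, w = np.array(rows), np.array(targets), np.array(weights)
    # Weighted least squares: solve (Z^T W Z) beta = Z^T W y.
    ZtW = Z.T * w
    beta = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return beta[0], beta[1:]  # base value E[f(X)], per-feature attributions
```

For a linear model f(x) = w·x + b, the attributions reduce to w_i (x_i − mean of background feature i), which makes a convenient sanity check for the sketch.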
Figure 12 reports the grouped mean absolute SHAP values after aggregation of the one-hot encoded feature families. The dominant contributors are reducing punishment with work, sentence fulfilled, exemption from serving the sentence, and sentence for multiple crimes, followed by grouped marital status and grouped education level. By contrast, gender and nationality have negligible grouped importance, while days in prison and days of penalty have almost zero contribution at the global level. This result is important for the fairness analysis because it suggests that, in the fairness-constrained model, the global prediction structure is driven primarily by sentence-administration and legal-status variables rather than by direct use of the protected attribute.
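The grouping of one-hot encoded feature families can be sketched as follows, assuming that per-instance SHAP values are first summed within each family (valid because SHAP values are additive) and then averaged in absolute value over instances. The text does not state whether absolute values are taken before or after summing, so this ordering is an assumption, as are the `name=value` column-naming convention and the helper name.

```python
import numpy as np

def grouped_mean_abs_shap(shap_values, feature_names, family_of):
    """Sum SHAP values within each feature family per instance, then mean |.|."""
    families = sorted({family_of(n) for n in feature_names})
    out = {}
    for fam in families:
        cols = [i for i, n in enumerate(feature_names) if family_of(n) == fam]
        summed = shap_values[:, cols].sum(axis=1)  # one value per instance
        out[fam] = float(np.mean(np.abs(summed)))
    return out
```

Summing before taking absolute values lets opposite-signed dummy contributions cancel within a family; taking absolute values first would instead report a larger importance whenever that happens.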
Figure 13 provides a representative local waterfall explanation for a high-risk prediction. Starting from the baseline value E[f(X)] = 0.32, the predicted output increases to f(x) = 0.808 mainly due to the combined positive effects of non-exemption from serving the sentence, sentence fulfilled, reducing punishment with work, and sentence for multiple crimes. Conditional early release and the duration-related variables contribute only weakly, while gender does not appear among the dominant local drivers. Therefore, at the case level, the explanation is again dominated by operational and sentence-related attributes rather than by the protected characteristic itself.
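By the local-accuracy (additivity) property of SHAP, the attributions shown in the waterfall must account for the entire gap between the baseline and the prediction. Writing φ_i for the attribution of feature i and M for the number of features (notation assumed here), the case in Figure 13 gives:

```latex
f(x) = \mathbb{E}[f(X)] + \sum_{i=1}^{M} \phi_i
\quad\Longrightarrow\quad
\sum_{i=1}^{M} \phi_i = 0.808 - 0.32 = 0.488
```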
The above explanations complement, rather than replace, the Equalized Odds analysis presented in Section 4.2. Equalized Odds quantifies fairness at the group level through disparities in false positive and false negative rates, whereas SHAP provides an audit of the feature pathways through which individual predictions are formed. In this sense, the weak direct contribution of gender in both the global and local explanations is consistent with the fairness-constrained training objective. The scientific value of the SHAP analysis lies in showing that the improved fairness outcomes are accompanied by a decision structure that places negligible direct weight on gender, while also revealing the operational variables through which residual disparities could still arise. This makes SHAP a complementary tool for fairness auditing, contestability, and model governance in criminal justice applications.
5. Conclusions
This study developed and evaluated a fairness-aware and interpretable framework for recidivism prediction, integrating a 1D Convolutional Neural Network, a custom Equalized Odds-constrained loss function, and Kernel SHAP-based explanations. The framework was applied to three distinct institutional datasets from Bulgaria, Greece, and Portugal, covering five prediction tasks and evaluated across 100 independent runs per model, yielding statistically robust conclusions. The experimental results confirm that machine learning models trained without fairness constraints exhibit significant discriminatory behavior against the Female group across all datasets and prediction horizons. Specifically, statistically significant differences in false positive rates and false negative rates between male and female offenders were detected in every baseline model, providing strong evidence that standard classification objectives can reproduce and amplify structural biases present in historical criminal justice data. This finding holds regardless of national context or outcome definition, underscoring the systemic nature of the problem.
The fairness-constrained models achieved a substantial reduction in gender-based error rate disparities. In several experimental cases, the null hypothesis of equal distributions between the Male and Female groups could not be rejected following bias mitigation, indicating that no statistically detectable disparity in predictive treatment remained. Where statistically significant differences persisted, their magnitude was considerably diminished relative to the baseline. As expected, the introduction of fairness constraints involved a reduction in overall classification accuracy, reflecting the genuine tension between unconstrained predictive optimization and equitable treatment across demographic groups. This trade-off is both theoretically anticipated and operationally manageable in the criminal justice context, where fairness and legitimacy are at least as important as aggregate performance.
Kernel SHAP analysis provided case-level and global interpretability of the constrained models, identifying sentence-administration and legal-status attributes, such as reducing punishment with work, sentence fulfilled, exemption from serving the sentence, and sentence for multiple crimes, as the dominant drivers of individual risk predictions, while gender contributed negligibly. This finding provides a basis for auditing model behavior, supporting contestability, and informing institutional oversight. Crucially, interpretability was embedded as a structural principle of the framework rather than appended as a post hoc visualization layer, aligning with the requirements of trustworthy AI deployment in high-stakes settings.
Taken together, these results demonstrate that fairness, interpretability, and competitive predictive performance can be simultaneously pursued within a unified design framework for structured criminal justice data.
Future efforts will extend this framework to consider additional protected attributes such as nationality, employment status, and age. We will also examine cross-jurisdictional transferability more systematically, incorporate dynamic predictors as their longitudinal recording improves, and explore the integration of counterfactual reasoning to further support procedural rights and practical contestability in deployment contexts.