Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities

Arias-Serna, María Andrea; Quiza-Montealegre, Jhon Jair; Móntes-Gómez, Luis Fernando; Clavijo, Leandro Uribe; Orozco-Duque, Andrés Felipe

doi:10.3390/jrfm18090489

Open AccessArticle

Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities

by

María Andrea Arias-Serna

^1,*

,

Jhon Jair Quiza-Montealegre

¹

,

Luis Fernando Móntes-Gómez

¹,

Leandro Uribe Clavijo

¹ and

Andrés Felipe Orozco-Duque

²

¹

Faculty of Engineering, Universidad de Medellín, Cra. 87 #30-65, Medellín 050026, Colombia

²

Department of Applied Sciences, Instituto Tecnológico Metropolitano, Cl. 73 #76A-354, Medellín 050034, Colombia

^*

Author to whom correspondence should be addressed.

J. Risk Financial Manag. 2025, 18(9), 489; https://doi.org/10.3390/jrfm18090489

Submission received: 7 July 2025 / Revised: 27 August 2025 / Accepted: 29 August 2025 / Published: 2 September 2025

(This article belongs to the Section Financial Technology and Innovation)

Download

Browse Figures

Versions Notes

Abstract

This paper proposes a methodology for implementing a custom-developed explainability model for credit rating using behavioral data registered during the lifecycle of the borrowing that can replicate the score given by the regulatory model for the solidarity economy in Colombia. The methodology integrates continuous behavioral and financial variables from over 17,000 real credit histories into predictive models based on ridge regression, decision trees, random forests, XGBoost, and LightGBM. The models were trained and evaluated using cross-validation and RMSE metrics. LightGBM emerged as the most accurate model, effectively capturing nonlinear credit behavior patterns. To ensure interpretability, SHAP was used to identify the contribution of each feature to the model predictions. The presented model using LightGBM predicted the credit risk assessment in accordance with the regulatory model used by the Colombian Superintendence of the Solidarity Economy, with a root-mean-square error of 0.272 and an R² score of 0.99. We propose an alternative framework using explainable machine learning models aligned with the internal ratings-based approach under Basel II. Our model integrates variables collected throughout the borrowing lifecycle, offering a more comprehensive perspective than the regulatory model. While the regulatory framework adjusts itself generically to national regulations, our approach explicitly accounts for borrower-specific dynamics.

Keywords:

credit risk modeling; explainable machine learning; internal ratings-based approach; LightGBM; SHAP values

1. Introduction

Credit risk management is of fundamental importance in ensuring the stability of financial institutions, as it determines their capacity to absorb losses arising from credit defaults (Jorion, 2000). In this context, international standards, particularly those developed by the Comité de Supervisión Bancaria de Basilea (2011), have played a main role in the development of regulatory frameworks that guide the identification, measurement, and mitigation of financial risks. The Basel II Accord represented a significant landmark in the field of credit risk modeling, introducing three distinct approaches for this purpose: the standardized approach, the foundation internal ratings-based (IRB) approach, and the advanced IRB approach. Each method provides varying degrees of flexibility in estimating the components of expected loss probability of default (PD), loss given default (LGD), and exposure at default (EAD). While the standardized approach relies mainly on external ratings, the IRB methodologies enable institutions to advantage behavioral and transactional data to enhance expected loss modeling. The fundamental distinction between the foundation and advanced IRB methods lies in the degree of autonomy afforded to banks in estimating risk components. Although advanced methodologies require significant investment in infrastructure and analytical capacity, they allow institutions to achieve more accurate models, reduce excess capital holdings, and strengthen their ability to manage risks and foster sustainable growth.

In the Colombian context, institutions supervised by the Superintendence of the Solidarity Economy (SES) are required to conduct continuous evaluations of the credit risk associated with their loan portfolios. This evaluation must be performed both at origination and at regular intervals during the loan lifecycle, across all lending modalities (consumer, housing, commercial, and microcredit). To support these processes, the SES has developed a reference model that serves as a standardized guide for rating credit portfolios and estimating expected losses (Superintendencia de la Economia Solidaria, 2024). While this regulatory model ensures consistency across entities, it is constrained by its reliance on standardized frameworks that do not consider the operational, social, and financial heterogeneity of solidarity-based institutions.

The Colombian solidarity sector encompasses a diverse array of financial entities, including savings and credit cooperatives, employee funds, and mutual associations. These organizations are characterized by their not-for-profit orientation, democratic governance, and strong community ties, resulting in operational dynamics that differ markedly from those of commercial banks. Heterogeneity in institutional size, membership composition, credit products, and data availability poses unique challenges for credit risk assessment. Consequently, standardized rating models, often designed for traditional banking environments, fail to capture the social and financial specificities of these institutions. Tailored credit scoring models, particularly those employing explainable machine learning techniques, are therefore essential to ensure accurate risk assessments, fair access to credit, and alignment with both regulatory requirements and cooperative principles. Furthermore, the current regulatory model relies predominantly on binary variables (e.g., default/non-default, presence/absence of collateral). While operationally straightforward, this approach limits the explanatory power of risk assessments by disregarding informative continuous variables (e.g., income, tenure, payment history, or debt level) and by constraining the identification of complex patterns associated with default probability. These technical limitations hinder decision-making and may lead to inefficiencies in credit risk management.

Recent credit risk research has delivered a rich toolkit for risk classification and prediction, ranging from logistic regression models to more sophisticated methods such as random forests, gradient boosting, and LightGBM, all of which exhibit high predictive accuracy (Gatla, 2023; Aguilar-Valenzuela, 2024; Machado & Karray, 2022; Sharma et al., 2022). Additionally, various studies incorporate model-agnostic interpretability frameworks such as SHapley Additive exPlanations (SHAP) (Bussmann et al., 2021; Li & Wu, 2024). However, most existing works concentrate on default prediction at origination, leaving the post-disbursement phase—where lifetime PD, LGD, and EAD must be estimated to support forward-looking provisioning—relatively underexplored (Jacobs, 2020; Botha et al., 2025). This gap is even wider in the Colombian solidarity sector, where limited data volumes and regulatory constraints provide little empirical evidence for the application of machine learning frameworks to provisioning models. Few studies, such as that of Bermudez Vera et al. (2025), have attempted to predict default without fully integrating behavioral data across the loan lifecycle, leaving significant opportunities for methodological advances.

Within this broader landscape, Gambacorta et al. (2024) provide evidence from a Chinese fintech firm showing that machine learning models and non-traditional data improve credit risk prediction, particularly during periods of economic stress. Their findings underscore the potential of combining advanced analytics with contextualized datasets to enhance resilience in credit scoring. Complementing this perspective, Alsuhabi (2024) introduced a novel Topp–Leone exponentiated exponential distribution for financial data, offering new insights into risk modeling through innovative statistical frameworks. Together, these contributions highlight a global research trend toward integrating non-traditional data, advanced models, and distributional innovations into credit risk management. However, applications in cooperative and solidarity-based financial systems, particularly in Latin America, remain scarce.

Given this scenario, there is a clear need to develop more adaptive and technically robust credit scoring models tailored to the operational realities of solidarity institutions. The objective of this research is therefore to apply explainable machine learning techniques to credit rating in the Colombian solidarity sector, and to propose a methodology aligned with the principles of the Basel II IRB approach. To this end, we analyze a dataset of 17,518 members from a cooperative, applying both linear and tree-based regression models, including LightGBM. Model performance is evaluated using root-mean square error (RMSE), while interpretability is ensured through the SHAP framework. The findings demonstrate that models incorporating continuous variables drawn from real institutional data can generate credit ratings comparable to those produced by the SES regulatory model, while improving transparency and predictive accuracy. The main novelty of this research lies in the adaptation and validation of explainable machine learning models (specifically LightGBM combined with SHAP) for credit risk rating in the Colombian solidarity sector.

LightGBM outperforms linear methods such as ridge regression due to its ability to capture nonlinear relationships, and the SHAP analysis provides actionable insights by tracing the influence of predictors on individual scores. This interpretability, combined with high predictive performance, positions LightGBM as not only a technically sound alternative but also a management-oriented tool that fulfills traceability and accountability requirements in modern risk management systems. This work thus offers a technically robust and management-oriented tool that strengthens credit risk management in solidarity-based financial institutions, fostering regulatory alignment while respecting the sector’s social and operational specificities.

The remainder of this article is organized as follows: Section 2 presents the dataset, outlines the regulatory model used in the Colombian solidarity sector, and provides a brief introduction to the machine learning models employed. Section 3 evaluates the predictive performance of the models—regularized linear regression (ridge), decision trees, random forests, and LightGBM—and analyzes the contribution of each variable to the prediction of default. Section 4 provides a brief discussion of recent regulatory reforms in the Colombian solidarity sector, illustrating how the proposed models align with the logic of the 2025 regulatory changes. Section 5 presents the conclusions.

2. Materials and Methods

2.1. Data Construction

The dataset comprises the data of 17,518 individual members and was generated through the systematic integration of multiple monthly files containing variables relevant to credit risk analysis. Specifically, 37 files, corresponding to the individual credit portfolio reports, were consolidated, covering the 36 months preceding the evaluation date as well as the evaluation month itself. The features were computed to capture the customer’s realized credit behavior over a fixed look-back window and then collapse that time information into a single scalar value. Because the time dimension is fully aggregated, the final modeling table is strictly cross-sectional: one observation per customer (we used only a single cut-off date), no sequential ordering retained, and no forward-looking information introduced.

In addition to the credit portfolio reports, the data integration process incorporated auxiliary files detailing information on members, employees, and debtors associated with the sale of goods and services, as well as the individual savings report and the individual member contributions report. In total, 40 different reports were merged into a unified data structure. The integration was executed using the unique loan identification number as the relational key across files, ensuring both consistency and data integrity throughout the process. The compiled data pertains exclusively to consumer loans, including both debit and non-debit credits, spanning an approximate three-year timeframe.

For modeling purposes, the consolidated dataset was segmented into two distinct files: one corresponding to credits with debit consumer loans, comprising 6660 records; and another for credits with non-debit consumer loans, encompassing 10,858 records. Although the raw database initially included 96 variables, the selection of predictive features for the non-debit credits was guided by the 12 variables specified in the official regulatory model, as detailed in Appendix A. This methodological approach ensured both regulatory alignment and relevance in the construction of predictive models.

2.2. Regulatory Reference Model

This section presents the regulatory model proposed by the Superintendence of the Solidarity Economy (SES) for estimating credit risk ratings, as applied to both non-debit and debit consumer credit portfolios (Superintendencia de la Economia Solidaria, 2024). The regulatory credit rating model (Z) is the result of a linear statistical model that estimates a borrower’s credit risk level. This score is obtained from a linear combination of variables that describe the borrower’s behavior and characteristics, considering factors such as contributions, savings, loans, and length of service, among others. It is also assigned a specific weight based on technical studies and historical analyses of the institution’s clients. The rating models are described below.

Regulatory Model: Non-Debit Consumer Credit Portfolio

Z = −1.8017 − 0.3758 × EA − 1.1475 × AP + 0.4934 × REEST − 0.387 × CUENAHO − 1.0786 ×
CDAT − 0.0167 × PER + 0.3204 × ENTIDAD1 − 0.8419 × SALPRES +0.1271 × ANTIPRE1
− 0.3912 × ANTIPRE2 − 0.4892 × VIN2 + 0.7877 × MORA1230+ 2.5651 × MORA1260
+ 0.696 × MORA2430 + 2.908 × MORA2460 + 0.8114 × MORA3615

(1)

where EA: status of the member, AP: contributions, REEST: restructured loan, CUENAHO: savings account, CDAT: CDAT, PER: contribution balance, SALPRES: balance/loan ratio, ANTIPRE1: bonding time less than or equal to one month, ANTIPRE2: bonding time greater than 36 months, VIN2: bonding time greater than 120 months, MORA1230: maximum arrears in the last 12 months between 31 and 60 days, MORA1260: maximum arrears in the last 12 months greater than 60 days, MORA2430: maximum arrears in the last 24 months between 31 and 60 days, MORA2460: maximum arrears in the last 24 months greater than 60 days, MORA3615: maximum arrears in the last 36 months between 1 and 15 days, and Z: credit risk rating.

Regulatory Model: Debit Consumer Credit Portfolio

Z = −2.2504 − 0.8444 × EA − 1.0573 × AP + 1.0715 × T C − 1.3854 × COOC DAT +
0.7833 × ANTIPRE1 + 0.8526 × MORA15 + 1.4445 × MORA1230 + 1.3892 × MORA1260 +
0.2823 × MORA2430 + 0.7515 × MORA2460 − 0.6632 × SIN MORA + 1.2362 × MORTRIM

(2)

where TC: type of credit installment, COOCDAT: “Cooperativa de ahorro y crédito,” MORA15: maximum arrears in the last 12 months between 16 and 30 days, SINMORA: if the debtor did not present arrears in the last 36 months, and MORTRIM: one or more arrears of between 31 and 60 days in the last 3 months.

2.3. Proposed Models

Figure 1 shows the project workflow. We split the entire dataset into two sets: a training set and a testing set. We implemented a k-fold cross-validation to tune the hyperparameters of the regression models. The cross-validation also allowed us to validate the variability in the model’s prediction for different training datasets. A randomized search function was used to perform the hyperparameter tuning by trying random combinations (in a defined range) instead of testing all possibilities.

Once the model is trained, we can use it to generate predictions on new data, such as the test dataset. We measured performance using the root-mean square error (RMSE) and R-squared (R²). Finally, we implemented SHAP (SHapley Additive exPlanations) to explain the contribution of each feature to an individual prediction.

The following models were implemented: ridge linear regression, decision trees, random forest, XGBoost (3.0.3), and LightGBM (4.6.0). They are briefly described below.

2.3.1. Ridge

Ridge is an extension of linear regression that penalizes the sum of the squares of the coefficients using the L2-norm (McDonald, 2009). Before the implementation of ridge, all of the features were scaled by the power transformation method (I. K. Yeo et al., 2014). In the ridge implementation, we tuned the hyperparameter alpha, which controls the regularization strength. This controls how much a machine learning model is penalized for being too complex, helping it avoid overfitting. For the randomized search, we used a sample from a log-uniform distribution between 10⁻⁵ and 10⁵. Small alpha values indicate minimal simplification of the model, while large values result in a model that can very easily avoid overfitting. Randomized search helps efficiently find an alpha that best balances fitting the data well and keeping the model general (Bischl et al., 2021).

2.3.2. Decision Trees

A decision tree is a predictive model that splits data into branches based on feature values. In this model, we tuned the cost–complexity pruning alpha hyperparameter (ccp_alpha). This controls how much a decision tree is pruned, which means cutting back the tree to make it simpler. We tuned it using a log-uniform distribution between 10⁻⁵ and 10⁵. A small ccp_alpha keeps the tree large and detailed, while a large ccp_alpha cuts off more branches, making the tree smaller and less complex. This process helps find the best balance between overfitting (being too complex) and underfitting (being too simple) (Friedman et al., 2009).

2.3.3. Random Forest

A random forest model builds many decision trees and combines their predictions to make more accurate and stable forecasts. In this setup, we tuned two parameters: the number of trees built (n_estimators, ranging from 1 to 200), and the degree of simplification of the trees by pruning them (ccp_alpha, varying between minimal and high values, with the same distribution used for the decision tree model). In this case, the hyperparameter tuning helps find the best balance between making detailed predictions and keeping the model general enough to work well on new data (Geurts et al., 2006).

2.3.4. XGBoost

XGBoost is an optimized gradient-boosting framework that builds an ensemble of decision trees one stage at a time, sequentially and additively, using second-order gradient information for fast convergence and built-in regularization to reduce overfitting (Chen & Guestrin, 2016). In this model, we tuned the following parameters: colsample_bytree, which controls the fraction of features randomly selected at each tree’s construction; gamma, which defines the minimum loss reduction required to make a further partition on a leaf node; learning_rate, which controls the speed of learning; max_delta_step, which restricts the maximum weight change allowed in a boosting step; max_depth, which limits how deep the individual trees can grow; min_child_weight, which sets the minimum sum of instance weights in a child node; n_estimators, which specifies the total number of boosting rounds (trees); reg_alpha (L1) and reg_lambda (L2), which are regularization parameters; and finally, subsample, which defines the fraction of training instances used to grow each tree.

2.3.5. LightGBM

LightGBM constructs models using multiple decision trees within a gradient-boosting framework, progressively and efficiently reducing prediction errors, leveraging histograms, leaf-based growth, and intelligent sampling and clustering techniques (Meng et al., 2017). In this model, we tuned some hyperparameters that are equivalent to those adjusted in XGBoost: colsample_bytree, learning_rate, max_depth, n_estimators, reg_alpha (L1 regularization), and reg_lambda (L2 regularization). Other adjusted hyperparameters included min_child_samples, which defined the minimum number of data points required in a leaf; num_leaves, which defined the maximum number of leaves per tree; and subsample, which defined the fraction of training data randomly sampled for each tree.

2.3.6. SHAP

SHAP (SHapley Additive exPlanations) is a machine learning tool that explains how much each feature contributes to the model’s prediction. It helps understand which inputs are most important and whether they have a positive or negative impact on the prediction, making complex models easier to interpret and trust (Lundberg & Lee, 2017).

3. Results

For the practical implementation of the model, the risk analysis team should begin by requesting from the institution the data corresponding to the 12 variables that make up the model (as described in Appendix A). These variables are related to payment and delinquency behavior and must be collected from a monthly historical record covering the three years prior to the evaluation date. The features related to the historical records of the borrowers are collapsed into a single scalar value: for instance, MORA12, representing the maximum delinquency recorded over the past 12 months. Next, the proposed models should be applied, and the SHAP analysis should be incorporated as described in the following subsections.

3.1. Exploratory Analysis

The purpose of the exploratory data analysis (EDA) was to gain an initial understanding of the dataset’s structural characteristics and to investigate the relationships between the independent variables and the target variable Z (credit rating). To this end, univariate descriptive statistics were computed, and the distributions of variables were visualized through histograms and scatter plots. The associations between features and the target were assessed using the Spearman rank correlation coefficient, which is well suited for detecting monotonic (including nonlinear) relationships.

Based on the results of this analysis, variables that contributed little to predictive performance were excluded. Specifically, the variables TipoCuota and Activo were removed from the debit consumer credit dataset, while Reestr was eliminated from the non-debit consumer credit dataset. This refinement of the feature space aimed to improve the data quality for subsequent modeling steps and to mitigate issues such as multicollinearity and model overfitting.

Figure 2 displays the heatmap corresponding to the Spearman correlation matrix. This visualization captures the monotonic relationships between all predictor variables and the target variable, Z (credit rating). As shown, the target variable does not exhibit strong correlations with most of the predictors; many of the correlation values are represented in light blue or white tones, indicating weak or negligible associations. Features EA and TC show near-zero correlation coefficients, suggesting that they have minimal or no direct influence on the target variable. In contrast, a cluster of futures (MORA1230, MORA1260, MORA2430, MO-RA2460, MORTRIM, and MORA15) exhibits strong positive intercorrelations, as evidenced by the presence of deep red tones. This pattern suggests a high degree of co-movement among different default indicators: increases in default within a one-time window tend to be associated with increased default across other temporal ranges.

3.2. Data Preparation for Model Implementation

The selection of predictor variables was based on the regulatory model, which employs 12 variables for debit consumer credit and 14 for non-debit consumer credit. Unlike the regulatory approach, which predominantly uses binary variables, the proposed model utilizes their continuous values. This methodological shift enables a more nuanced characterization of individual financial behavior, thereby avoiding the loss of information inherent in variable binarization.

Two variables—TC (credit type) and EA (active status)—were excluded due to a lack of variability. The EA variable is highly imbalanced, with 6,658 observations coded as 1 and only two as 0, while TC offers no meaningful differentiation across observations. Both were deemed irrelevant and removed from further analysis. Additionally, the delinquency-related variables (MORA1230, MORA1260, MORA2430, MORA2460, MORATRIM, and MORA15) exhibited strong positive intercorrelations, indicating significant redundancy among them. To address this, they were consolidated into a single variable, MORA12, representing the maximum delinquency recorded over the past 12 months. This transformation reduces multicollinearity while retaining the most informative aspect of recent payment behavior, thereby enhancing the model’s representativeness.

A detailed description of all variables and the modifications applied is provided in Appendix A. However, the most relevant adjustments are summarized below.

For variables reflecting the member’s financial relationship and solvency, the proposed model replaces binary indicators with continuous metrics. For example, instead of using a binary variable to indicate the presence of savings, contributions, or term deposits (CDATs), the model incorporates the actual balance at the time of evaluation, offering a more precise representation of the member’s financial engagement with the institution. Regarding seniority, the model replaces categorical groupings with a continuous variable that measures the length of affiliation in months. This adjustment enhances the granularity of the analysis, allowing for a more accurate assessment of how tenure impacts credit behavior. For payment behavior, a continuous variable capturing the maximum delinquency observed during the evaluation period is used, rather than classifying arrears into predefined risk thresholds as mandated by Supersolidaria. This approach yields a more detailed view of credit history and enhances the model’s predictive accuracy. These refinements result in a reduction in the explanatory variables: from 12 to 9 in the debit consumer credit dataset, and from 14 to 10 in the non-debit consumer credit dataset.

3.3. Models for Debit Consumer Credit

For this credit line, several regression models, including both linear and decision tree-based models, were evaluated. The best results obtained are presented below.

3.3.1. Linear Regression Model

The linear model that produced the best results was a ridge regression model with a λ of 0.010 and power transformation of the numerical variables. With this model, a mean validation RMSE of 0.338, a test RMSE of 0.344, and an R² of 0.640 were obtained. The equation resulting from this model was

Z = −3.864 − 0.022 × AP − 0.153 × ANTI + 0.178 × MORA12 − 0.154 × COOCDAT −
0.800 × SINMORA

(3)

All variables were transformed using the Yeo–Johnson method (I. Yeo & Johnson, 2000), except for the SINMORA variable, which is binary.

As previously observed in Equations (1) and (2), the regulatory model incorporates a large number of explanatory features. In contrast, the model based on ridge regression, developed using real historical data, presents a simpler structure, retaining only those variables that proved statistically significant under the L2 penalty.

The residuals of this model had a mean of 5.59 × 10⁻⁴ and a standard deviation of 0.344.

3.3.2. Tree-Based Models

With a single decision tree, the best model obtained had a complexity coefficient (ccp_alpha) of 1.317 × 10⁻². With this model, the mean validation RMSE was 0.248, the test RMSE was 0.248, and the R² in the test set was 0.813. It is worth clarifying that neither this model nor any tree-based model was preprocessed for variables.

The most important feature of this model was MORA12, with a score of 0.674, followed by ANTI (score = 0.136) and SINMORA (score = 0.101). Finally, the COOCDAT score was 0.089, and the AP score was 0. The residuals of this model had a mean of 1.06 × 10⁻² and a standard deviation of 0.248.

Random forest models were also evaluated, with the best performer having 226 estimators and a ccp_alpha of 1.02 × 10⁻². With this model, the mean validation RMSE was 0.237, the test RMSE was 0.240, and the R² in the test set was 0.825. The most important feature of this model was MORA12, with a score of 0.673, followed by ANTI (score = 0.140) and SINMORA (score = 0.097). Finally, the COOCDAT score was 0.090, and the AP score was 0. The residuals of this model had a mean of 7.99 × 10⁻³ and a standard deviation of 0.24.

XGBoost models were also evaluated. Of these, the one with which the best results were obtained had the following hyperparameters: colsample_bytree: 0.755, gamma: 0.170, learning_rate: 0.156, max_delta_step: 2, max_depth: 4, min_child_weight: 8, n_estimators: 422, reg_alpha: 1.009 × 10⁻², reg_lambda: 4.787, and subsample: 0.624. With this model, the mean validation RMSE was 0.219, the test RMSE was 0.228, and the R² in the test set was 0.841. In this model, the residuals’ mean was 3.23 × 10⁻³, and the standard deviation was 0.227. The most important feature of this model was SINMORA, with a score of 0.593, followed by MORA12 (score = 0.170) and COOCDAT (score = 0.166). Lastly, the ANTI feature had a score of 0.054, and AP was 0.018.

The last family of models evaluated was the LightGBM models. Of these, the one that gave the best results had the following hyperparameters: colsample_bytree: 0.648, learning_rate: 0.071, max_depth: 13, min_child_samples: 30, n_estimators: 134, num_leaves: 74, reg_alpha: 1.592, reg_lambda: 0.277, and subsample: 0.518. With this model, the mean validation RMSE was 0.215, the test RMSE was 0.223, and the R² in the test set 0.849. In this model, the residuals had a mean of 4.27 × 10⁻³ and a standard deviation of 0.222. The most important variable in this model was MORA12 (normalized score = 0.541), followed by SINMORA (normalized score = 0.191) and ANTI (normalized score = 0.133). Finally, the COOCDAT score was 0.093, and the AP score was 0.043.

Table 1 summarizes the results of the metrics obtained with each model evaluated. It can be observed that the model yielding the best results is LightGBM. However, the other ensemble models evaluated (random forests and XGBoost) also provided similar results in terms of the RMSE metric.

It can be observed that the ridge model tends to make more or less accurate predictions for values less than or equal to −3, but for values greater than −3 it tends to predict values close to −3. On the other hand, in assembly models, it can be observed in Figure 3 that they tend to make more or less accurate predictions for values less than or equal to −1, but for values greater than this they tend to predict values close to −1.

Figure 4 shows the relative importance of the features in each assembly model evaluated. It can be seen that, for three of these models, the most important feature is MORA12, and only for the XGBoost model is it SINMORA. What all models agree on is that the AP feature is the least important, to the point that, for decision tree models and random forests, it has zero importance. Although they are not directly comparable with the sizes of the coefficients of each feature in the linear ridge model, it is interesting to note that, in this model, the highest coefficient, in terms of absolute value, corresponds to the SINMORA feature (0.800), followed by MORA12 (0.178) and COOCDAT (0.154), coinciding with the order of importance of the XGBoost model.

3.3.3. SHAP Global Analysis of the LightGBM Model

Given that the LightGBM model yielded the best results, and that it is essential for the entity to understand how the model’s characteristics influence its predictions, an interpretability analysis was conducted using SHAP. This analysis facilitates the justification of decisions to associates or users, thereby promoting trust and confidence among them.

First, a global interpretability analysis was conducted, the results of which are shown in Figure 5, to understand how the variables impact the predictions. Low values in the ANTI and SINMORA features result in an increased predicted value, indicating a higher risk. Additionally, high values in the MORA12 feature also increase the prediction value. Finally, high values in COOCDAT cause the prediction value to decrease; that is, subjects with high values in this feature tend to have a better risk rating.

A similar analysis, conducted only for high-risk loans (B rating or higher), reveals that, in this case, the most significant characteristics are MORA12 and SINMORA, with the other characteristics having a minor impact.

Comparing the order of importance of the characteristics obtained from global SHAP analyses with those from LightGBM’s method reveals a difference. This difference is explained by the fact that, while LightGBM calculates the relative importance of the features from the gain provided by each of them to optimize the loss function, SHAP calculates the marginal contribution of each feature, but this can be affected if there are strongly correlated variables (Holzinger et al., 2022), as is the case in this case with the MORA12 and SINMORA features. We chose to leave both variables, despite their high correlation, because excluding any of them ostensibly degraded the performance of all models.

3.3.4. SHAP Local Analysis of LightGBM Model

With SHAP, local analysis can also be performed, identifying how each characteristic influences a prediction. For example, Figure 6 illustrates the case of a subject with low risk. For this subject, the prediction was −4.823, which is a lower value than the base prediction, which is −4.6 (the base value corresponds to the average of the values of the target variable). It can be observed that the lowest prediction is due to the values of the characteristics ANTI (2185, a high value), SINMORA (1, a high value), and MORA12 (0, a low value). On the other hand, the value of COOCDAT (0, a low value) causes the value of the prediction to rise; that is, it goes in the opposite direction to the final prediction.

It is important to note that, in a local analysis, the order of relative importance of the features does not necessarily coincide with the order of the global analysis.

3.4. Models for Non-Debit Consumer Credit

For non-debit consumer credit, several regression models, both linear and based on decision trees, were evaluated. The best results obtained are presented below.

3.4.1. Linear Regression Model

The linear model that yielded the best results was a ridge model with λ = 0.164, a power transformation of the numerical variables, and the elimination of outliers (observations above the 99th percentile) for the AP and SALPRES variables. With this model, a mean validation RMSE of 1.30, a test RMSE of 1.34, and an R² in the test set of 0.768 were obtained. The equation resulting from this model was

Z = −0.733 + 2.104 × MORA12 + 0.098 × SALPRES − 0.245 × ANTI − 0.380 × AP − 1.421 × CTIVO

(4)

The residuals of this model had a mean of 3.01 × 10⁻² and a standard deviation of 1.344.

3.4.2. Tree-Based Models

With a single decision tree, the best model obtained had a complexity coefficient (ccp_alpha) of 2.21 × 10⁻². With this model, the mean validation RMSE was 0.486, the test RMSE was 0.476, and the adjusted R² in the test set was 0.971. It is worth clarifying that neither this model nor the following ones underwent any preprocessing of variables.

The most important variable in this model is MORA12, with a score of 0.974, followed by ANTI (score = 0.015) and AP (score = 0.012). Finally, the score of SALPRES and Active was 0. The residuals of this model had a mean of −1.50 × 10⁻² and a standard deviation of 0.476.

With random forest models, the one that gave the best results had 316 estimators and a ccp_alpha of 2.96 × 10⁻⁴. With this model, the mean validation RMSE was 0.296, the test RMSE was 0.280, and the R² in the test set was 0.990. In this model, the residuals’ mean was −5.24 × 10⁻⁴, and the standard deviation was 0.280. The most important feature of this model was MORA12, with a score of 0.955, followed by ANTI (score = 0.021) and Active (score = 0.009). Finally, the AP feature had a score of 0.008, and SALPRES had a score of 0.007.

With XGBoost models, the one that gave the best results had the following hyperparameters: colsample_bytree: 0.982, gamma: 1.555, learning_rate: 0.152, max_delta_step: 4, max_depth: 9, min_child_weight: 8, n_estimators: 266, reg_alpha: 1.986 × 10⁻³, reg_lambda: 2.616, and subsample: 0.987. With this model, the mean validation RMSE was 0.332, the test RMSE was 0.294, and the R² in the test set was 0.989. In this model, the residuals’ mean was 6.80 × 10⁻⁵, and the standard deviation was 0.295. The most important feature of this model was MORA12, with a score of 0.918, followed by Active (score = 0.034) and AP (score = 0.021). Finally, SALPRES had a score of 0.006.

The last model evaluated, and the one that gave the best results, was a LightGBM regression model. With this model, the mean validation RMSE was 0.288, the test RMSE was 0.276, and the adjusted R² in the test set was 0.990. In this model, the residuals’ mean was 1.75 × 10⁻³, and the standard deviation was 0.272. The values of the hyperparameters tuned in this model were as follows: colsample_bytree: 0.963, learning_rate: 0.245, max_depth: 11, min_child_samples: 16, n_estimators: 168, num_leaves: 47, reg_alpha: 6.660, reg_lambda: 0.938, and subsample: 0.818. The most important feature of this model was MORA12, with a normalized score of 0.966, followed by ANTI (score = 0.017) and AP (score = 0.014). Lastly, the SALPRES feature had a score of 0.003, and Active had a score of 0.000.

Table 2 summarizes the results of the metrics obtained with each model evaluated. It can be observed that the model yielding the best results is LightGBM. However, the other ensemble models evaluated (random forests and XGBoost) also provided satisfactory results in terms of the RMSE metric.

Figure 7 shows the scatter plots of the actual values versus the values estimated by the different models. It can be observed that the ridge and decision tree models exhibit high variability in their predictions, as evidenced by their high RMSE values. However, paradoxically, this variability is not reflected in the R² values. On the other hand, assembly models tend to make more accurate predictions and exhibit a lower level of variability, particularly for high values of Z (greater than 0).

Despite the higher R², ensemble models are not overfitted, as a visual inspection reveals a high correlation between the estimated and real values, albeit not a perfect one. In addition, the validation and test RMSEs are similar in all cases, which would not be the case if the model were overfitted.

Figure 8 shows the relative importance of the features in each assembly model evaluated. It can be seen that, in all of these models, the most important feature is MORA12. It should also be noted that, in general, all other features have little importance in the models.

Although they are not directly comparable with the sizes of the coefficients of each feature in the linear ridge model, it is interesting to note that, in this model, the highest coefficient, in terms of absolute value, corresponds to the MORA12 feature (2.104), followed by Activo (1.421) and AP (0.380), coinciding (again) with the order of importance of the XGBoost model.

3.4.3. Global SHAP Analysis of LightGBM Model

Regarding the debit consumer portfolio, an interpretability analysis was conducted using SHAP. The global analysis (see Figure 9) shows that low values of MORA12 have a negative impact on the value of Z, while high values have a positive impact. Low values of ANTI and AP also have a positive impact, albeit not as significant as that of MORA12. However, low SALPRES values negatively impact the Z value.

On the other hand, for subjects with an elevated level of risk, it can be observed that the most important characteristic remains MORA12; however, ANTI and AP exchange their positions, although the direction of the impacts remains the same as in the general case.

In this case, it can be observed that there is a coincidence in the order of importance of the characteristics given by the LightGBM and SHAP methods in the set of all subjects. Regardless, in the case of high-risk subjects, SHAP finds that the second most important characteristic is AP and not ANTI.

3.4.4. SHAP Local Analysis of LightGBM Model

Figure 10 corresponds to a SHAP waterfall plot visualization, which enables the decomposition and analysis of the prediction generated by the model for a particular individual, allowing for the accurate interpretation of the contribution of each explanatory feature to the model’s results.

For example, we will present a local analysis conducted on a high-risk individual. The model has a base prediction expectation of −2.073. From this reference point, the marginal contributions of each customer feature are identified, which modify this prediction until a final value of 2.031 is reached. The feature with the most significant impact is MORA12 (whose value is 270 days (about 9 months) of default), which generates a positive contribution of +3.94 units to the prediction, evidencing a strong association between high levels of default in the last 12 months and the increase in the credit risk rating assigned by the model. This feature dominates the explanation of the result, suggesting that the history of recent defaults is the primary determinant of the risk profile in this case. Other features have marginal impacts: ANTI (permanence in the institution of 1360 days) increases the prediction by +0.22.

On the other hand, the AP feature (with a value of USD 216,274 in available contributions) has a mitigating effect, with a contribution of −0.12, a financial feature that reduces the perceived risk. Finally, SALPRES (ratio of balance to loan value 0.778) provides a small positive contribution (+0.05) without significantly affecting the model’s decision. The weighted sum of these effects enables the model to adjust its prediction from the average value to an individualized output, which, in this case, represents an adverse credit rating.

4. Discussion

Recent regulatory reforms in Colombia’s solidarity sector reflect a global trend of alignment with the Basel II and III agreements. There is a significant transition between the reference model used in 2024 and the new model adopted in 2025 for credit risk assessment. The 2024 model, described in Section 3 using Equations (1) and (2), incorporates multiple binary variables that represent the structural characteristics of the associate. On the other hand, the 2025 model, as seen in Equations (5) and (6), focuses more clearly on variables related to credit behavior, especially those associated with non-performing loans.

Debit consumer credit 2025:

Z = −1.523 − 2.081 × EA + 2.495 × MORA1230 + 3.062 × MORA1260 − 0.575 × SINMORA
+ 0.319 × MORA2430N + 1.615 × MORA3660

(5)

Non-debit consumer credit 2025:

Z = −1.523 − 2.081 × EA + 2.495 × MORA1230 +3.062 × MORA1260 − 0.575 ×
SINMORA + 0.319 × MORA2430N + 1.615 × MORA3660

(6)

The findings of this study highlight the practical and regulatory value of explainable machine learning models in improving credit risk assessment within Colombia’s solidarity financial sector. The ridge regression model, which prioritizes variables directly associated with borrowers’ repayment behavior, particularly their delinquency history, shows a clear alignment with the internal ratings-based (IRB) principles of Basel II. By leveraging continuous variables and avoiding the discretization common in traditional regulatory models (e.g., binary indicators for default in specific timeframes), Ridge retains the granularity of the original data, aligning with the logic of the 2025 regulatory update by the Superintendence of the Solidarity Economy (SES), which encourages the use of behavioral variables over rigid rule-based inputs. Ridge’s simplicity, transparency, and low dimensionality make it especially appropriate for cooperatives with limited technical capacity, offering a viable entry point for the gradual adoption of advanced internal risk models.

In contrast, the LightGBM model stands out for its good predictive performance. It achieved an RMSE of 0.224 and an adjusted R² of 0.847 in the debit portfolio, and an RMSE of 0.272 with an adjusted R² of 0.990 in the non-debit portfolio. These results substantially surpass those of linear models and simple decision trees, confirming LightGBM’s capacity to capture complex interactions and nonlinear patterns that traditional models often miss. This outcome is consistent with the existing literature on the superior predictive power of boosting methods such as LightGBM and XGBoost in credit scoring (Gatla, 2023). Furthermore, LightGBM provides operational flexibility by allowing the model to be retrained periodically or updated with new indicators without awaiting formal regulatory revisions, an advantage in dynamic credit environments.

However, it is necessary to take the results obtained in the R² metric with some caution, since several previous studies have shown its inconvenience to evaluate nonlinear predictive models such as ours, since it tends to give very inflated results (Sapra, 2014; Book & Young, 2006). Despite this, we decided to report this result because it is a widely used metric in economics and finance. Consequently, all of our analyses were based on the results of the RMSE metric, which does not have the problems of R², and on the visual analysis of the scatter plots of the estimated values versus the actual values in the test set.

Another important consideration is that, since the Superintendency model applies the sigmoid function to the target variable (Z) and then discretizes it into five ranges to grant a risk level rating, it is likely that, for subjects whose estimated values of Z are close to the limits between ranges, the level of risk estimated with our models will be different from that obtained with the models of the Superintendency.

Comparing the models obtained for the two credit modalities reveals that we achieved better results in the non-debit consumer credit modality, not due to the higher R² values that they present, but rather because they perform better with values in the upper Z range, which corresponds to a higher level of risk. We also found that, in this type of credit, all models, and even SHAP, agree that the most relevant characteristic is MORA12, and that, in contrast to this, the others are less important. In the modality of debit consumer credit, there are divergences in the order of relative importance of the characteristics. Thus, while for three models the most important feature is MORA12, for the other two models (and for SHAP) the most important feature is SINMORA. These differences may be due, among other reasons, to the high collinearity between these variables, which we attempted to solve by excluding some of them or applying techniques such as PCA to obtain new linearly independent characteristics, but we did not obtain good results.

A common concern with high-performing machine learning models is their lack of interpretability. This limitation was effectively addressed through the integration of SHAP (SHapley Additive exPlanations), which decomposes each prediction into the individual contributions of its explanatory variables. SHAP values make it possible to meet supervisory expectations of transparency and accountability by enabling auditors and regulators to trace the rationale behind each credit decision, including adverse ratings or rejections. Additionally, SHAP enables institutions to audit the influence of sensitive variables and identify and mitigate potential biases. SHAP-based explanations offer actionable value. Institutions can provide personalized feedback to applicants, target financial education efforts toward specific risk factors, and refine credit policies based on behavioral insights. SHAP values also help adjudicate borderline credit decisions, providing a defendable basis for approvals or denials during reviews or appeals. This level of explainability aligns with principles of social accountability, traceability, and cooperative member engagement, further reinforcing the ethical dimension of the proposed framework. Although SHAP has been primarily applied in this study for interpretability and internal decision support, its potential in regulatory and governance contexts is increasingly recognized in both academic and applied settings. For example, Bussmann et al. (2021) showed that SHAP can be integrated into model governance workflows to document variable importance and ensure consistency in decision-making processes. In our context, SHAP values can be used to generate standardized explanation reports for internal audit trails, identifying the primary risk drivers for each credit decision. These explanations can serve as supporting documentation during supervisory reviews or appeals, particularly in borderline or adverse decisions. A stylized use case is presented in Figure 6 (Section 3.3.4), where SHAP values are used to justify a low-risk classification by detailing the contribution of each feature. This level of detail could be integrated into credit committee reports, automated decision dashboards, or client-facing disclosure formats, reinforcing institutional accountability.

The ridge and LightGBM models can be applied at every stage of the loan lifecycle. During origination, they enhance credit assessments with richer behavioral inputs while maintaining interpretability. In the monitoring phase, they support dynamic risk reassessment using updated borrower data, enabling early identification of deterioration. Critically, the models allow for the estimation of expected credit loss (ECL), leading to more accurate provisioning practices and stronger financial management. In the collection and resolution phase, risk estimates inform targeted recovery strategies. While easing capital adequacy requirements under the IRB approach is a long-term regulatory incentive, the immediate value of the proposed framework lies in improving credit decision quality, portfolio oversight, and borrower engagement.

It is important to emphasize that the implementation of internal models, as proposed in this study, does not inherently result in an automatic or unjustified reduction in regulatory capital requirements. Under the Basel II IRB approach, such a reduction is only warranted when internal models demonstrate superior accuracy in estimating expected credit losses, thereby reducing uncertainty and improving risk-adjusted solvency. Although this study does not include a quantitative simulation of capital impacts (an area identified for future research), the high explanatory power of LightGBM (with R² values near 0.99) indicates that the model captures a significant portion of credit loss variability, laying a technical foundation for more efficient capital use. However, adopting these models must not lead to a weakening of prudential standards. Instead, it allows for more effective capital allocation, redirecting resources to lower-risk segments or strengthening risk mitigation strategies. Provided that models are robust, validated, and subject to adequate supervision, capital optimization can be achieved without increasing credit risk or compromising institutional soundness. Ultimately, the proposed explainable ML framework complements rather than contradicts regulatory standards, enabling a shift toward more adaptive, transparent, and ethically grounded credit risk management in the solidarity sector.

5. Conclusions

In this paper, we propose an alternative model to compute a credit rating for the borrowers of the financial cooperatives in Colombia. Notably, the LightGBM model demonstrates a superior ability to capture the complexity of credit behavior in this sector. One of the main strengths of this approach, in comparison with the reference model, is the use of continuous financial variables, which enable the detection of subtle differences between partners. However, a key point to highlight in terms of practical applicability is the incorporation of SHAP analysis, which facilitates the interpretation of the credit scores generated by LightGBM. SHAP could enable cooperatives to identify the variables that have the most significant impact on the score, and to communicate this information clearly to members, managers, control entities, and supervisors. This feature makes LightGBM not only a highly accurate model but also a tool aligned with the principles of traceability and accountability required in modern risk management systems.

The implementation of explainable machine learning models across the loan lifecycle enhances credit risk management by improving risk assessment, enabling dynamic monitoring through expected credit loss estimation and better provisioning, and supporting tailored recovery strategies. Unlike the reference model, which defines parameters for adjusting the model to national regulations in a generic way, our model was established based on information specific to a solidarity sector entity. Our methodology enables us to identify which variables influenced a specific rating or which ones are most sensitive to the entity in the credit management process.

This research opens multiple lines: the incorporation of alternative data (such as transactional information or digital behaviors), the use of deep learning models combined with interpretability based on SHAP, and the multicenter validation of the model using data from different cooperatives across the country, developing complementary models that estimate the expected loss by integrating the dimensions of exposure to non-compliance, the severity of the loss, and the probability of non-compliance at the associate or segment level, based on the scores obtained from the current models.

Author Contributions

Conceptualization, M.A.A.-S. and L.F.M.-G.; methodology, M.A.A.-S.; software, J.J.Q.-M., A.F.O.-D., and L.U.C.; validation, J.J.Q.-M., A.F.O.-D. and L.U.C.; formal analysis, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C. investigation, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C.; resources, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C.; data curation, M.A.A.-S., J.J.Q.-M. and L.U.C., writing—original draft preparation, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C.; writing—review and editing, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C.; visualization, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C. supervision, M.A.A.-S.; project administration, M.A.A.-S.; funding acquisition, M.A.A.-S., L.F.M.-G., J.J.Q.-M., A.F.O.-D. and L.U.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Definition of Variables

Table A1. Variables indicating the member’s relationship and solvency.

Variable	Regulatory Model Definition	Proposed Model
EA	Corresponds to the member’s status in the evaluation month; it takes the value 1 if the member is “active”; otherwise, it takes the value 0.	Corresponds to the member’s status in the evaluation month; it takes the value 1 if the member is “active”; otherwise, it takes the value 0.
AP	The debtor has contributions in the solidarity organization; it takes the value 1 if the amount of contributions is greater than zero; otherwise, it takes the value 0.	Total contribution balance in the evaluation month.
REEST	It takes the value 1 if the loan is restructured; otherwise, the value is 0.	Takes the value 2 or 4.
CUENAHO	If the debtor has a “Savings Account” product with a balance >1 and active status in the solidarity organization, it takes the value 1; otherwise, it takes the value 0.	Savings account balance in the evaluation month.
CDAT	If the debtor has an active “CDAT” product in the solidarity organization, it takes the value 1; otherwise, it takes the value 0.	CDT balance in the evaluation month.
COOCDAT	If the organization type is “Savings and Credit Cooperative” and the member has an active CDAT with the cooperative, it takes the value 1; otherwise, the value is 0.	CDT balance in the evaluation month.
PER	If the debtor has “Permanent Savings” in the solidarity organization, it takes the value 1; otherwise, it takes the value 0.	Permanent savings balance in the evaluation month.
SALPRES	If the balance/loan ratio is less than 20% it takes the value 1; otherwise, the value is 0.	Total balance/loan.
TC	Type of loan installment; it takes the value 1 if the installment is not fixed; otherwise, it takes the value 0.	Not considered in the model, since it is a constant.

Table A2. Seniority variables.

Variable	Regulatory Model Definition	Proposed Model
ANTIPRE1	Corresponds to the length of time the member has been affiliated with the organization at the date the loan was requested; if it is less than or equal to one month, it takes the value 1; otherwise, it takes the value zero.	Corresponds to the length of time the member has been affiliated with the organization.
ANTIPRE2	Corresponds to the length of time the member has been affiliated with the organization at the date the loan was requested; if it is more than 36 months, it takes the value 1; otherwise, it takes the value 0.	Corresponds to the length of time the member has been affiliated with the organization.
VIN2	Corresponds to the length of time the member has been affiliated with the solidarity organization; if it is more than 120 months, it takes the value 1; otherwise, it takes the value 0.	Corresponds to the length of time the member has been affiliated with the organization.

Table A3. Maximum arrears variables.

Variable	Regulatory Model Definition	Proposed Model
MORA1230	If the maximum arrears in the last 12 months are between 31 and 60 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 12 months.
MORA1260	If the maximum arrears in the last 12 months are greater than 60 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 12 months.
MORA15	If the maximum arrears in the last 12 months are between 16 and 30 days, it takes the value 1; otherwise, it takes the value 0.	Corresponds to the maximum arrears in the last 12 months.
MORA2430	If the maximum arrears in the last 24 months are between 31 and 60 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 24 months.
MORA2460	If the maximum arrears in the last 24 months are greater than 60 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 24 months.
MORA2430N	If the maximum arrears in the last 24 months are between 31 and 60 days and MORA1230 is equal to 0, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 24 months.
MORA3615	If the maximum arrears in the last 36 months are between 1 and 15 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 36 months.
MORA3660	If the maximum arrears in the last 36 months are between 31 and 60 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 36 months.
SINMORA	If the debtor did not present any arrears in the last 36 months, it takes the value 1; otherwise, the value is 0.	Takes the value 1 if the variable Max-arrears-36 is equal to zero; otherwise, it takes the value 0.
MORA315	If the maximum arrears in the last 3 months are between 16 and 30 days, it takes the value 1; otherwise, the value is 0.	Corresponds to the maximum arrears in the last 3 months.
MORTRIM	Takes the value 1 if the debtor presented one or more delinquencies of between 31 and 60 days in the last 3 months; otherwise, it takes the value 0.	Corresponds to the maximum arrears in the last 3 months.

References

Aguilar-Valenzuela, G. R. (2024). Algoritmos para machine learning utilizados en la Gestión de Riesgo Crediticio en Perú: Machine learning algorithms used in credit risk management in Peru. Micaela Revista de Investigación-UNAMBA, 5(1), 5–30. [Google Scholar] [CrossRef]
Alsuhabi, H. (2024). The new Topp-Leone exponentied exponential model for modeling financial data. Mathematical Modelling and Control, 4, 44–63. [Google Scholar] [CrossRef]
Bermudez Vera, I. M., Mosquera Restrepo, J., & Manotas-Duque, D. F. (2025). Data mining for the adjustment of credit scoring models in solidarity economy entities: A methodology for addressing class imbalances. Risks, 13(2), 20. [Google Scholar] [CrossRef]
Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., & Boulesteix, A.-L. (2021). Hyperparameter optimization: Foundations, algorithms, best practices and open challenges. Reseñas Interdisciplinarias de Wiley and Knowledge Discovery, 13(2), 1–43. [Google Scholar] [CrossRef]
Book, S. A., & Young, P. H. (2006). The trouble with R². Journal of Parametrics, 25(1), 87–114. [Google Scholar] [CrossRef]
Botha, A., Verster, T., & Breedt, R. (2025). Modelling the term-structure of default risk under IFRS 9 within a multistate regression framework. arXiv, arXiv:2502.14479. [Google Scholar]
Bussmann, N., Giudici, P., Marinelli, D., & Papenbrock, J. (2021). Explainable machine learning in credit risk management. Computational Economics, 57(1), 203–216. [Google Scholar] [CrossRef]
Chen, T., & Guestrin, C. (2016, August 13–17). XGBoost: A scalable tree boosting system. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794), San Francisco, CA, USA. [Google Scholar]
Comité de Supervisión Bancaria de Basilea. (2011). Basilea III: Marco regulatorio global para reforzar los bancos y los sistemas bancarios (versión revisada). Obtenido de Banco de Pagos Internacionales. Available online: https://www.bis.org (accessed on 28 August 2025).
Friedman, J. H., Robert, T., & Trevor, H. (2009). Elements of statistical learning. Universidad de Pennsylvania. [Google Scholar]
Gambacorta, L., Huang, Y., Qiu, H., & Wang, J. (2024). How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm. Journal of Financial Stability, 73, 101284. [Google Scholar] [CrossRef]
Gatla, T. R. (2023). Machine learning in credit risk assessment: Analyzing how machine learning models are transforming the assessment of credit risk for loans and credit cards. Journal of Emerging Technologies and Innovative Research, 10, 746–750. [Google Scholar]
Geurts, P., Damien, E., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. [Google Scholar] [CrossRef]
Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K. R., & Samek, W. (Eds.). (2022). General pitfalls of model-agnostic interpretation methods for machine learning models. In xxAI—Beyond explainable AI. xxAI 2020 (Vol. 13, pp. 39–68). Lecture Notes in Computer Science. Springer. [Google Scholar]
Jacobs, M., Jr. (2020). A holistic model validation framework for current expected credit loss (CECL) model development and implementation. International Journal of Financial Studies, 8(2), 27. [Google Scholar] [CrossRef]
Jorion, P. (2000). Valor En Riesgo: El nuevo paradigma para el control de riesgos con derivados. Scribd. [Google Scholar]
Li, H., & Wu, W. (2024). Loan default predictability with explainable machine learning. Finance Research Letters, 60, 104–867. [Google Scholar] [CrossRef]
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 1–11. [Google Scholar]
Machado, M., & Karray, S. (2022). Assessing credit risk of commercial customers using hybrid machine learning algorithms. Expert Systems with Applications, 200, 116–889. [Google Scholar] [CrossRef]
McDonald, G. (2009). Ridge regression. Wiley Interdisciplinary Reviews: Com-Putational Statistics, 1(1), 93–100. [Google Scholar]
Meng, K., Finley, T., Wang, T., Chen, W., Ma, W., & Liu, T. Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in Neural Information Pro-Cessing Systems, 30, 1–9. [Google Scholar]
Sapra, R. L. (2014). Using R2 with caution. Current Medicine Research and Practice, 4(3), 130–134. [Google Scholar] [CrossRef]
Sharma, A. K., Li, L. H., & Ahmad, R. (2022, November 18). Default risk prediction using random forest and xgboosting classifier. 2021 International Conference on Security and Information Technologies with AI, Internet Computing and Big-Data Applications (Vol. 314, pp. 91–101), Taichung City, Taiwan. [Google Scholar]
Superintendencia de la Economia Solidaria. (2024). Sistema de administración del riesgo de crédito. SARC. [Google Scholar]
Yeo, I., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrik, 87(4), 954–959. [Google Scholar] [CrossRef]
Yeo, I. K., Johnson, R. A., & Deng, X. (2014). An empirical characteristic function approach to selecting a transformation to normality. Communications for Statistical Applications and Methods, 21(3), 213–224. [Google Scholar] [CrossRef]

Figure 1. Project workflow.

Figure 2. (a) Heatmap of Spearman correlation coefficients for debit consumer credit. (b) Heatmap of Spearman correlation coefficients for non-debit consumer credit.

Figure 3. Scatter plots of the estimated values for each model versus the actual values in the test set, for debit consumer credit.

Figure 4. Relative importance of each feature in the assembly models evaluated for the credit with debit modality.

Figure 5. Global SHAP analysis of the LightGBM model for debit consumer credit: (a) All subjects. (b) High-risk subjects (Grade B or higher).

Figure 6. SHAP local analysis of a low-risk subject in debit consumer credit.

Figure 7. Scatter plots of the estimated values for each model versus the actual values in the test set, for non-debit consumer credit.

Figure 8. Relative importance of each feature in the assembly models evaluated for non-debit credit.

Figure 9. Global SHAP analysis of LightGBM model for non-debit credits: (a) All subjects. (b) High-risk subjects (Grade B or higher).

Figure 10. SHAP local analysis of a high-risk subject in non-debit credit.

Table 1. Metric values obtained by each of the models evaluated for debit consumer credit.

Metric	Ridge	Decision Tree	Random Forest	XGBoost	LightGBM
Mean Validation RMSE	0.338	0.260	0.237	0.219	0.215
Test RMSE	0.344	0.248	0.240	0.228	0.223
Test R²	0.640	0.813	0.825	0.841	0.849

Table 2. Metric values obtained by each of the models evaluated for the non-debit credits.

Metric	Ridge	Decision Tree	Random Forest	XGBoost	LightGBM
Mean Validation RMSE	1.300	0.486	0.296	0.332	0.288
Test RMSE	1.340	0.476	0.280	0.294	0.276
Test R²	0.768	0.971	0.990	0.989	0.990

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Arias-Serna, M.A.; Quiza-Montealegre, J.J.; Móntes-Gómez, L.F.; Clavijo, L.U.; Orozco-Duque, A.F. Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities. J. Risk Financial Manag. 2025, 18, 489. https://doi.org/10.3390/jrfm18090489

AMA Style

Arias-Serna MA, Quiza-Montealegre JJ, Móntes-Gómez LF, Clavijo LU, Orozco-Duque AF. Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities. Journal of Risk and Financial Management. 2025; 18(9):489. https://doi.org/10.3390/jrfm18090489

Chicago/Turabian Style

Arias-Serna, María Andrea, Jhon Jair Quiza-Montealegre, Luis Fernando Móntes-Gómez, Leandro Uribe Clavijo, and Andrés Felipe Orozco-Duque. 2025. "Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities" Journal of Risk and Financial Management 18, no. 9: 489. https://doi.org/10.3390/jrfm18090489

APA Style

Arias-Serna, M. A., Quiza-Montealegre, J. J., Móntes-Gómez, L. F., Clavijo, L. U., & Orozco-Duque, A. F. (2025). Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities. Journal of Risk and Financial Management, 18(9), 489. https://doi.org/10.3390/jrfm18090489

Article Menu

Explainable Machine Learning Models for Credit Rating in Colombian Solidarity Sector Entities

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Construction

2.2. Regulatory Reference Model

2.3. Proposed Models

2.3.1. Ridge

2.3.2. Decision Trees

2.3.3. Random Forest

2.3.4. XGBoost

2.3.5. LightGBM

2.3.6. SHAP

3. Results

3.1. Exploratory Analysis

3.2. Data Preparation for Model Implementation

3.3. Models for Debit Consumer Credit

3.3.1. Linear Regression Model

3.3.2. Tree-Based Models

3.3.3. SHAP Global Analysis of the LightGBM Model

3.3.4. SHAP Local Analysis of LightGBM Model

3.4. Models for Non-Debit Consumer Credit

3.4.1. Linear Regression Model

3.4.2. Tree-Based Models

3.4.3. Global SHAP Analysis of LightGBM Model

3.4.4. SHAP Local Analysis of LightGBM Model

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Definition of Variables

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI