Article

From Crisis to Algorithm: Credit Delinquency Prediction in Peru Under Critical External Factors Using Machine Learning

by
Jomark Noriega
1,2,*,†,
Luis Rivera
1,3,
Jorge Castañeda
4,† and
José Herrera
1,5
1
Facultad de Ingeniería de Sistemas e Informática, Escuela de Posgrado, Universidad Nacional Mayor de San Marcos, Campus Ciudad Universitaria, Calle Germán Amézaga 375, Lima 15801, Peru
2
Financiera QAPAQ, Lima 150120, Peru
3
Center of Sciences and Technology (CCT), Laboratory of Mathematical Sciences (LCMAT), State University of North Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28015-602, Brazil
4
Facultad de Ingeniería, Escuela de Ingeniería Informática, Universidad San Ignacio de Loyola, Campus 2 La Molina, Av. Fontana 550, La Molina 15026, Peru
5
Facultad de Ciencias Informáticas, Departamento de Lenguajes y Programación, Escuela de Postgrado, Universidad Pablo de Olavide, Campus Carretera de Utrera km 1, Andalucía, 41013 Sevilla, Spain
*
Author to whom correspondence should be addressed.
†
These authors contributed equally to this work.
Data 2025, 10(5), 63; https://doi.org/10.3390/data10050063
Submission received: 19 February 2025 / Revised: 21 April 2025 / Accepted: 25 April 2025 / Published: 28 April 2025
(This article belongs to the Section Information Systems and Data Management)

Abstract

Robust credit risk prediction in emerging economies increasingly demands the integration of external factors (EFs) beyond borrowers’ control. This study introduces a scenario-based methodology to incorporate EF—namely COVID-19 severity (mortality and confirmed cases), climate anomalies (temperature deviations, weather-induced road blockages), and social unrest—into machine learning (ML) models for credit delinquency prediction. The approach is grounded in a CRISP-DM framework, combining stationarity testing (Dickey–Fuller), causality analysis (Granger), and post hoc explainability (SHAP, LIME), along with performance evaluation via AUC, ACC, KS, and F1 metrics. The empirical analysis uses nearly 8.2 million records compiled from multiple sources, including 367,000 credit operations granted to individuals and microbusiness owners by a regulated Peruvian financial institution (FMOD) between January 2020 and September 2023. These data also include time series of delinquency by economic activity, external factor indicators (e.g., mortality, climate disruptions, and protest events), and their dynamic interactions assessed through Granger causality to evaluate both the intensity and propagation of external shocks. The results confirm that EF inclusion significantly enhances model performance and robustness. Time-lagged mortality (COVID MOV) emerges as the most powerful single predictor of delinquency, while compound crises (climate and unrest) further intensify default risk—particularly in portfolios without public support. Among the evaluated models, CNN and XGB consistently demonstrate superior adaptability, defined as their ability to maintain strong predictive performance across diverse stress scenarios—including pandemic, climate, and unrest contexts—and to dynamically adjust to varying input distributions and portfolio conditions. Post hoc analyses reveal that EF effects dynamically interact with borrower income, indebtedness, and behavioral traits. This study provides a scalable, explainable framework for integrating systemic shocks into credit risk modeling. The findings contribute to more informed, adaptive, and transparent lending decisions in volatile economic contexts, relevant to financial institutions, regulators, and risk practitioners in emerging markets.

1. Introduction

Currently, artificial intelligence (AI), machine learning (ML), and process digitization are fundamental elements of daily life. This trend began in the 1990s with the global expansion of the Internet [1] and notably intensified during the COVID-19 pandemic [2]. Simultaneously, the increase in the use of online credit has generated large volumes of data [3], creating an imperative for the financial industry to develop advanced technological tools for risk prediction and effective uncertainty management [4] that incorporate exogenous variables into the process [5]. Presently, scoring models and machine learning techniques are common for risk prediction, applying innovations based on exhaustive data analysis, data-driven innovation (DDI) [6], and the big data paradigm—characterized by the use of massive, high-dimensional, and heterogeneous datasets for predictive modeling and decision-making [7].
In the context of rising demand, credit risk levels show a significant increase [8,9]. Despite the growing integration of DDI and ML in the financial sector, substantial challenges persist, leaving numerous applications unexplored. The application of ML techniques during unpredictable events, such as the 2008 housing crisis [2,10], the COVID-19 pandemic [11], and the impacts of climate change [12], has exposed critical limitations in risk assessment methodologies and revealed considerable uncertainties in their outputs [2,13,14,15]. These challenges underscore the need to examine the influence of external factors (EF) on the accuracy and robustness of ML models in credit risk evaluation [6,16], particularly concerning the inclusion of sequential or time-series data, which are commonly encountered in credit risk scenarios [17]. Moreover, these issues emphasize the need to ensure explainability in ML-driven predictions [18], both to facilitate informed decision-making by stakeholders and to maintain transparency for all parties involved [19].
The main objective of this article is to address the integration of variables sensitive to external risk factors into machine learning models for credit risk assessment, a critical yet insufficiently explored topic in the academic context. Peru, as a case study, represents a particularly relevant scenario: it was the country with the highest COVID-19 death rate per 100,000 inhabitants during the pandemic [20]; it faced the periodic and devastating effects of the El Niño phenomenon [12]; and it experienced intense social unrest following the attempted coup in December 2022 [21]. These events have directly impacted the microcredit market by targeting small businesses, a sector highly vulnerable to risk volatility [12].
While the influence of external factors—such as macroeconomic variables and pandemic-related disruptions—on credit risk has been the subject of substantial academic research, few studies have explored the simultaneous integration of multiple external shocks (pandemics, climate anomalies, and social unrest) into ML-based credit risk models, especially in emerging economies such as Peru. This study addresses that gap, offering valuable insights into the applicability and extrapolation of multifactor risk models in vulnerable financial systems [12]. To this end, EF-related variables are integrated into credit risk models through causality and stationarity tests [22]. These variables are derived from public time-series data and combined with proprietary credit data provided by a regulated Peruvian financial institution—hereafter referred to as FMOD—used under a confidentiality agreement. This strategy not only enhances the predictive power of ML models but also establishes a replicable methodology for integrating diverse EF indicators into future risk prediction studies.
The FMOD dataset plays a pivotal role in this study. It incorporates a unique variable that identifies loans supported by government intervention during periods influenced by EF, such as climate change, social unrest, and the COVID-19 pandemic. This variable allows for the detection and detailed analysis of the potential biases introduced by mitigation programs designed to address economic disruptions. By explicitly accounting for these interventions, the FMOD dataset provides a nuanced understanding of how state-led programs shape credit risk dynamics and borrower behavior under varying external conditions.
Moreover, the FMOD dataset includes monthly updated records of delinquency days, enabling granular analysis across distinct periods. This temporal precision facilitates an in-depth exploration of the interplay between EF and default patterns, thus enhancing the methodological rigor of credit risk assessment. By integrating time-series data with real-world financial indicators, the FMOD dataset offers a robust and replicable framework for evaluating the influence of EF and government programs on financial stability and credit behavior in emerging markets.
Finally, this study evaluates the impact of EF on the predictive accuracy and explainability of ML models. Particular attention is paid to identifying the most influential factors when incorporating external variables into predictive frameworks. By addressing these challenges, this research not only fills a critical gap in the existing literature but also advances the understanding of the effects of EF on credit risk in emerging economies. This contributes actionable insights for policymakers and financial institutions aiming to enhance resilience in volatile economic environments.

2. Related Work

Credit risk, an inherent component of financial activities, arises when borrowers fail to meet their credit obligations, typically due to inability rather than unwillingness to pay [23]. This form of credit deterioration is strongly associated with demographic and macroeconomic variables—such as household income and employment—which are, in turn, sensitive to external systemic shocks [9,23]. Enhancing the explainability of these complex relationships is essential to support data-driven decision-making for policymakers, financial institutions, and affected communities.
The recent literature has increasingly focused on the role of exogenous compound shocks—namely, pandemics, climate change, and social unrest—as emerging sources of credit instability. These phenomena often co-occur and interact in nonlinear ways, producing cascading effects on borrower behavior, portfolio quality, and systemic financial resilience.
  • COVID-19 and Credit Risk: The COVID-19 pandemic exposed critical limitations in traditional credit risk models. Government interventions—such as moratoriums, liquidity injections, and public credit guarantees—generated structural breaks in borrower behavior, which challenged the assumptions of pre-pandemic models and led to inaccurate estimations of default risk [11]. These disruptions particularly affected financially vulnerable groups as noted in recent empirical analyses [24].
    Several studies have highlighted the necessity of using adaptive, data-driven approaches—such as deep learning and regularized regression models—to handle the nonlinearity and volatility introduced by systemic shocks [25,26,27]. In the Peruvian context, unemployment shocks and loan restructuring policies were found to be key drivers of delinquency in consumer and microfinance credit portfolios during the pandemic period [28,29]. Evidence also suggests that regularized machine learning models, including Lasso and Ridge regression, offer robust predictive performance under uncertainty, particularly in large-scale government programs such as “Reactiva Perú” [27].
  • Climate-Related Risks: Climate change poses both physical risks (e.g., floods and droughts) and transition risks (e.g., carbon pricing and regulatory shifts), each with distinct implications for credit stability. Physical hazards can disrupt supply chains, damage infrastructure, and reduce household and firm-level repayment capacity [30], while transition risks expose high-emission sectors to market penalties and regulatory uncertainty [31,32].
    Empirical evidence shows that extreme climate events elevate loan impairment rates and increase provisioning requirements, particularly for microfinance institutions operating in vulnerable regions [12]. These dynamics underscore the urgency of integrating climate stress testing, early-warning systems, and network-aware modeling into credit risk frameworks [33,34].
    The authors of [35] analyze the interaction between macroeconomic shocks and climate-driven economic pressures within the Peruvian context, using a vector autoregression (VAR) model with national time-series data. Their findings suggest that expansionary monetary policies during periods of climate-related stress may inadvertently elevate the credit risk of microfinance portfolios. This reinforces the need to develop predictive frameworks that account for both environmental and policy-related drivers of credit instability in emerging economies.
  • Social Unrest and Political Instability: Social unrest—often triggered by inequality, political crises, or economic shocks—can significantly amplify credit risk by disrupting income flows, consumption patterns, and investor confidence. Empirical evidence from Chile shows that the 2019 protests led to a measurable increase in household default probability, with partial mitigation through pandemic-era relief programs [36]. Similarly, civil unrest has been shown to induce volatility in financial markets, exemplified by reverse herding behaviors among investors [37], and to influence regulatory responses such as foreclosure bans, which may inadvertently elevate long-term credit risk [38]. In the Middle East and North Africa (MENA) region, waves of political instability have prompted structural reforms such as bank privatization, producing mixed outcomes in terms of credit risk and financial system resilience [39].
    While regional and global analyses are increasingly available, country-level studies for Peru remain scarce. However, recent academic research suggests that social and health crises in Peru prompted a behavioral shift among microentrepreneurs, particularly those in lower-poverty districts, who increasingly substituted personal loans with government-backed business credit during times of elevated uncertainty. This substitution pattern reflects a form of borrower adaptation and highlights the indirect impact of public policy interventions—such as state guarantee programs—on the structure and resilience of credit portfolios in emerging markets [29].
  • Compound External Shocks: An emerging body of research highlights that the simultaneous occurrence of exogenous stressors—such as pandemics, climate anomalies, and social unrest—can generate nonlinear, systemic risks that conventional credit risk models are ill-equipped to anticipate [40]. These compound shocks interact across temporal and spatial dimensions, producing cascading effects on borrower behavior, institutional solvency, and portfolio stability. Moreover, societal vulnerabilities such as institutional fragility, inequality, and economic informality intensify the transmission channels and feedback loops of these risks, particularly in emerging economies.
    To address this complexity, researchers have proposed integrated modeling frameworks that incorporate early-warning signals, tipping points, and network-aware propagation mechanisms spanning environmental, economic, and political domains [40]. These multidimensional approaches are considered essential for the development of next-generation credit risk systems capable of operating under deep systemic uncertainty.
    In the Peruvian context, recent analyses emphasize the need for credit scoring systems to account for borrower heterogeneity under compound volatility conditions [41]. Empirical studies support the use of hybrid models that integrate Explainable Artificial Intelligence (XAI) techniques—such as SHapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME)—into traditional risk evaluation processes, thereby enhancing model transparency and predictive performance in highly unstable environments [23]. These insights reinforce the urgency of designing robust, adaptive credit assessment tools tailored to complex and evolving risk landscapes.
Despite the growing body of work on machine learning applications for credit scoring in Peru, most studies focus on conventional financial variables without incorporating external systemic stressors. Prior research has evaluated multiple machine learning algorithms for microcredit assessment in rural areas but typically omits exogenous factors such as pandemic shocks or climate anomalies [42]. This reveals a gap in studies that simultaneously integrate multiple external variables into ML-based credit risk models, particularly within the Peruvian financial system.
The present study addresses this limitation by incorporating time-series-based exogenous indicators—including COVID-19 deaths, temperature anomalies, and climate-induced road blockades—into a wide range of ML models. We assess their influence on both predictive accuracy and model interpretability.

3. Materials and Methods

3.1. Research Questions and Data Description

Based on this assessment, this article poses the research questions outlined in Table 1, centering on the interplay between external factors and credit risk assessment. Particular emphasis is placed on the importance of explainability in ML-driven models, as understanding how EF impacts credit risk is critical for decision-making and actionable insights.
To explore these relationships, a study focusing on Peru from January 2020 to September 2023 was conducted. The analysis was based on a comprehensive dataset comprising nearly 8.2 million records, as summarized in Table 2. This dataset integrates diverse EF, such as COVID-19 positive cases and deaths, road blockages, and temperature anomalies, along with financial delinquency (see Table 3) and credit activity data (see Table 4), ensuring a multifaceted approach to understanding credit risk dynamics under external influences.
Figure 1 illustrates the trends of various external factors, including COVID-19 cases (positive and dead), social unrest, road blockages due to social unrest and weather, and temperature anomalies. Peaks in certain factors, such as COVID-19 cases and social unrest, were visible during specific periods, indicating their significant occurrence and potential impact on other events or systems.
The main features of the credit dataset provided by FMOD are summarized in Table 5. This summary complements the structural descriptions in Table 3 and Table 4.
Table 3 presents the structure of the time-series dataset used to analyze credit delinquency patterns across economic activities. Each row corresponds to an aggregated daily observation, capturing both the average and maximum number of overdue days among loans, grouped by income source, loan purpose, and economic sector. These variables are essential for modeling how different segments of the economy responded to external shocks over time.
By combining these features, the study constructed temporal indicators of financial stress, which were then aligned with a time series of external factors (e.g., COVID-19 indicators and climate anomalies). This temporal alignment enabled the application of causality analyses—such as Granger causality tests—and enhanced the interpretability of the model’s behavior under compound external disruptions.
This approach not only evaluates the predictive accuracy of machine learning models but also examines their capacity to provide transparent explanations for the relationships uncovered, ensuring their practical relevance for decision-making in volatile and complex environments.
Additionally, Table 4 summarizes the set of features used in the individual-level credit risk modeling. These include socio-demographic variables, financial indicators, and normalized business metrics. Each row corresponds to a credit evaluated during the study period, enriched with external factor indicators to support scenario-based experimentation.
Among the set of features, two representative variables were constructed with monthly frequency for this study: ‘DelinquencyDays_MMYYYY’, which captures the number of overdue days per month, and ‘ExternalFactorImpact_MMYYYY’, which flags the presence of external stressors—such as COVID-19, climate anomalies, or social unrest—during the same period. This temporal alignment enabled a dynamic analysis of credit behavior under external disruptions, strengthening both the predictive modeling and the interpretability of the results.
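To make this construction concrete, the following minimal sketch (Python/pandas) illustrates how per-loan monthly observations could be pivoted into ‘DelinquencyDays_MMYYYY’ columns and combined with per-activity external-factor flags to form ‘ExternalFactorImpact_MMYYYY’. The input column names (loan_id, activity, month, overdue_days, impacted) are illustrative assumptions, not the institution's actual schema.

```python
import pandas as pd

def add_monthly_columns(observations: pd.DataFrame, ef_flags: pd.DataFrame) -> pd.DataFrame:
    """observations: one row per loan and month with columns
    ['loan_id', 'activity', 'month', 'overdue_days'], month formatted as 'MMYYYY'.
    ef_flags: one row per activity and month with a binary 'impacted' column."""
    # Wide matrix of overdue days per loan and month
    delinq = observations.pivot_table(index="loan_id", columns="month",
                                      values="overdue_days", aggfunc="max")
    delinq.columns = [f"DelinquencyDays_{m}" for m in delinq.columns]

    # Propagate the activity-level external-factor flag to each loan and month
    merged = observations.merge(ef_flags, on=["activity", "month"], how="left")
    impact = merged.pivot_table(index="loan_id", columns="month",
                                values="impacted", aggfunc="max").fillna(0).astype(int)
    impact.columns = [f"ExternalFactorImpact_{m}" for m in impact.columns]

    return delinq.join(impact)
```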

3.2. Cross-Industry Standard Process for Data Mining

To evaluate the influence of EF on credit risk, we adopt the Cross-Industry Standard Process for Data Mining (CRISP-DM), a widely used framework that structures data mining projects into six iterative stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment [43]. This methodological foundation provides a robust and flexible structure for guiding machine learning workflows across domains, and has also been successfully applied in prior studies analyzing external factors during crisis scenarios [44,45].
In the context of this study, the application of CRISP-DM allowed us to address two specific challenges associated with modeling credit risk under compound external shocks. First, during the Data Understanding and Preparation phases, the incorporation of exogenous time-series variables—such as pandemic indicators, climate anomalies, and social unrest—required additional preprocessing steps, including stationarity testing, temporal alignment, and Granger causality analysis. Second, during the Modeling and Evaluation phases, we conducted comparative experiments across multiple scenarios (with and without EF), enabling us to assess the marginal contribution and explanatory power of these external variables under volatile conditions.
As illustrated in Figure 2, the CRISP-DM framework provided a structured and iterative foundation for organizing each phase of the experiment. The process began with Business and Data Understanding, which guided the initial dataset exploration and external factor characterization. During Data Preparation, we performed normalization, standardization, and the integration of exogenous time-series variables. This integration was based on a monthly alignment strategy.
Importantly, the ‘ExternalFactorImpact_MMYYYY’ variable was not assigned uniformly across the dataset. Instead, it was derived by evaluating the declared economic activity of each credit and determining whether that activity exhibited a statistically significant relationship with a given external factor. This was performed using Granger causality tests between the monthly time series of average delinquency for each economic activity and the corresponding external factor series (e.g., COVID-19 deaths and road blockages). Only when a causal relationship was validated did the external factor impact flag become active for loans in that activity during the affected periods.
Each loan record was then linked to the corresponding external factor conditions (e.g., COVID-19, road blockages, and temperature anomalies) based on the month of disbursement or performance observation. This was operationalized through two representative monthly variables: ‘DelinquencyDays_MMYYYY’, capturing overdue status, and ‘ExternalFactorImpact_MMYYYY’, flagging the presence of external stressors. This structure enabled the inclusion of both defaulting and non-defaulting loans, allowing the model to learn from the evolution of payment behavior under varying external conditions.
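The following sketch illustrates one way the Granger-based activation of the external-factor flag could be implemented with statsmodels. The inputs (a DataFrame of monthly average delinquency per economic activity and a monthly external-factor series), the lag budget, and the function name are assumptions for illustration; the study's actual pipeline may differ in detail.

```python
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def ef_impact_flags(delinq_by_activity: pd.DataFrame, ef_series: pd.Series,
                    max_lag: int = 3, alpha: float = 0.05) -> dict:
    """delinq_by_activity: monthly average delinquency days, one column per activity.
    ef_series: monthly series of one external factor (e.g., COVID-19 deaths).
    Returns {activity: True/False}: does the EF Granger-cause the activity's delinquency?"""
    flags = {}
    for activity, delinq in delinq_by_activity.items():
        pair = pd.concat([delinq, ef_series], axis=1).dropna()  # column order: [effect, cause]
        if len(pair) <= max_lag + 1:
            flags[activity] = False
            continue
        results = grangercausalitytests(pair, maxlag=max_lag, verbose=False)
        # Significant if any tested lag yields p < alpha on the SSR F-test
        p_values = [results[lag][0]["ssr_ftest"][1] for lag in results]
        flags[activity] = min(p_values) < alpha
    return flags

# Loans whose declared activity is flagged receive ExternalFactorImpact_MMYYYY = 1
# for the months in which the external factor was active.
```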
In the Modeling and Evaluation stages, we compared predictive performance across parallel scenarios—with and without EF—to assess their marginal contribution. Finally, during the Evaluation phase, Explainable Artificial Intelligence (XAI) techniques, specifically SHAP and LIME, were incorporated to enhance transparency and facilitate the interpretation of model decisions concerning external shocks.

3.2.1. Workflow Description

  • Dataset Definition and Feature Selection: The process begins by determining the relevant datasets and conducting a comprehensive analysis of the phenomena to be evaluated (EF in this study). Feature selection was guided by established methodologies [46], identifying attributes with high relevance to credit risk prediction, including both internal (e.g., loan characteristics) and external (e.g., EF) variables. The selected features were refined using feature importance techniques, such as Recursive Feature Elimination (RFE) and Mutual Information (MI), ensuring that only the most significant predictors were retained.
  • Data Preparation: Preprocessing involved standardizing and normalizing the dataset (using sum normalization) to homogenize the data scales. Additionally, missing data were handled using multiple imputation techniques to maintain dataset integrity [46].
  • Integration of External Factors: To evaluate the impact of EF on credit risk prediction, additional variables corresponding to EF (e.g., COVID-19, temperature anomalies, and social unrest) were incorporated into the dataset. These variables were derived from time-series analyses and causal relationships, validated through statistical methods such as the Dickey–Fuller test and the Granger causality test [47].
    The Dickey–Fuller test was employed to verify the stationarity of the time-series data, ensuring that relationships between variables were not spurious and that the models yielded reliable results [47]. The stationarity of time series is critical, particularly when external factors such as COVID-19 cases, mortality rates, and roadblocks caused by weather events influence the temporal dynamics. The test is described mathematically as
     $\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \cdots + \delta_p \Delta Y_{t-p} + \epsilon_t$ (1)
     The terms represent the following:
     • $\Delta Y_t$: first difference of the time series ($Y_t - Y_{t-1}$);
     • $\alpha$: a constant;
     • $\beta t$: a trend term;
     • $Y_{t-1}$: lagged value of the time series;
     • $\delta_i$: coefficients of the lagged differences;
     • $\epsilon_t$: the error term;
     • $p$: the number of lags.
    Following the confirmation of stationarity, the Granger causality test was applied to evaluate the causal influence of these external factors on economic activities. The test, widely used in econometric studies, is defined as
     $Y_t = \beta_0 + \sum_{i=1}^{k} \beta_i Y_{t-i} + \sum_{j=1}^{k} \gamma_j X_{t-j} + \eta_t$ (2)
     The terms represent the following:
     • $X_{t-j}$: lagged values of series $X_t$;
     • $\beta_0$, $\beta_i$, and $\gamma_j$: the coefficients;
     • $\eta_t$: the error term.
     Lag variables were calculated to capture delayed effects, with lag periods tailored to each EF (e.g., quarterly for temperature anomalies and monthly for social unrest and COVID-19). These statistical methods ensured rigorous validation of the causal relationships between EF and the economic activities that influence credit defaults (a minimal sketch of the stationarity test and lag construction appears after this list).
  • ML Model Training and Hyperparameter Tuning: Advanced ML models were trained on datasets both with and without EF to assess their predictive contributions. Hyperparameter tuning was conducted using grid search and Bayesian optimization [48] to enhance the model performance. To address the class imbalance in the dataset, techniques such as the Synthetic Minority Oversampling Technique (SMOTE) [49] and class-weighted loss functions were employed [50,51].
  • Evaluation and Explainability (XAI): Models were evaluated using metrics such as Accuracy (ACC), Area Under the Curve (AUC), Kolmogorov–Smirnov (KS) and F1-score (F1) across multiple folds (10-fold cross-validation). To enhance interpretability, post hoc explainability techniques such as SHAP and LIME were applied [52]. These methods highlight the most influential features in predicting credit delinquency both before and after incorporating EF [19].
  • Comparison of Scenarios: The workflow was executed in parallel for scenarios excluding EF and those incorporating EF. This allowed for a direct comparison of the model performance and the added value of EF inclusion.
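As a complement to the external-factor integration step above, the sketch below shows how the stationarity check (Equation (1)) and the EF-specific lag construction could be implemented. The thresholds, column names, and the "indeterminate" fallback are illustrative assumptions rather than the study's exact configuration.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def classify_stationarity(series: pd.Series, alpha: float = 0.05) -> str:
    """Augmented Dickey-Fuller test on a (normalized) delinquency series.
    Returns 'stationary', 'non-stationary', or 'indeterminate'."""
    clean = series.dropna()
    if len(clean) < 30 or clean.nunique() <= 1:
        return "indeterminate"
    try:
        adf_stat, p_value, *_ = adfuller(clean, regression="ct")  # constant + trend, as in Eq. (1)
        return "stationary" if p_value < alpha else "non-stationary"
    except Exception:
        return "indeterminate"

def add_lagged_ef(df: pd.DataFrame) -> pd.DataFrame:
    """Add lagged external-factor columns with EF-specific lags
    (one quarter for temperature anomalies, one month for unrest and COVID-19)."""
    out = df.copy()
    out["temp_anomaly_lag3m"] = out["temp_anomaly"].shift(3)
    out["unrest_lag1m"] = out["unrest_events"].shift(1)
    out["covid_deaths_lag1m"] = out["covid_deaths"].shift(1)  # the 'COVID MOV' shift
    return out
```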

3.2.2. Key Enhancements in the Current Study

  • The integration of explainability methods (SHAP and LIME) provides actionable insights for decision-makers in financial institutions.
  • Comprehensive feature selection processes were implemented, ensuring that only the most predictive attributes were retained for model development.
  • Rigorous hyperparameter optimization to maximize model performance and robustness.
By employing this adapted CRISP-DM workflow, this study contributes a systematic and replicable methodology for evaluating EF in credit risk contexts, particularly during crises.
Figure 3 depicts the impact of COVID-19 on economic activities by comparing delinquency rates with the economic impacts over time. The graph highlights fluctuations in economic impact trends, particularly during pandemic peaks, and distinguishes patterns between financial and personal impacts (F + P) and movement-related impacts (CR Impact F + Mov).
Figure 4 illustrates the impact of social unrest and climatic factors on economic activity from December 2022 to September 2023. It compares delinquency rates with the economic impact of social unrest, weather, and temperature anomalies, and shows significant spikes during periods of intensified external disruptions.

3.2.3. Research Strategy

The evaluation period for this study is defined based on the behavior of the monthly classification variables, starting from the onset of the impact of EF until stabilization, as shown in Figure 3 and Figure 4. Stabilization is defined as the point at which the slope of the curve representing the proportion of bad payers (red line) begins to level off.
Evaluation Periods
  • COVID-19: The evaluation period spans March 2020 to January 2022. Loans disbursed until October 2021 are included to account for a 60-day grace period before delinquencies appeared, ensuring unbiased results (Figure 3).
    During the analysis of the COVID-19 external factors, it was identified that the trends in the number of positive cases and deaths often showed opposing behaviors. To address this, an additional experiment was conducted in which the death curve was shifted one period to the right, referred to as COVID MOV. This adjustment considered the estimated incubation period of the disease and the time to potential fatality, typically ranging from 2 to 3 weeks. By aligning the death data with this temporal delay, the analysis aimed to capture the causal relationship between disease progression and its impact on credit delinquency rates. This methodological adjustment is illustrated in Figure 3 (green dotted line).
  • Social Unrest and Climate Factors: Data from December 2022 to September 2023 were analyzed, with loans disbursed up to June 2023 considered under the same grace period logic (Figure 4).
In addition, the dataset includes a variable that identifies whether a credit was backed by a government guarantee. Based on this variable, two experimental configurations were defined: one including all government-backed credits (WGB_CR), and another excluding them (WOGB_CR), to evaluate potential biases introduced by public credit support programs.
To assess the influence of external stressors on credit risk, multiple experimental scenarios were constructed using different combinations of observed external factors (see Table 6). These include a baseline scenario SFE, as well as scenarios incorporating COVID-19-related metrics such as fatalities (F), confirmed positive cases (P), and a combined F + P scenario. Additionally, a variation labeled COVID MOV was considered, shifting fatality data forward by one month to account for incubation and reporting delays.
Other scenarios focused on sociopolitical and environmental disruptions, including social unrest indicators (U), temperature anomalies (T), and road blockages caused by extreme weather (W). Combined scenarios such as T + W (climatic anomalies and road disruptions) and U + T + W (integrating social unrest, climate variability, and climate-induced blockages) were also included to evaluate compound effects.
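For illustration, the scenario definitions in Table 6 can be represented as feature-set mappings, as in the following sketch; the external-factor column names are hypothetical placeholders rather than the dataset's actual labels.

```python
# Hypothetical external-factor column names; the actual dataset labels differ.
EF_COLUMNS = {
    "F": ["covid_deaths"],
    "P": ["covid_positive_cases"],
    "MOV": ["covid_deaths_lag1m"],   # fatalities shifted forward by one month
    "U": ["unrest_events"],
    "T": ["temp_anomaly"],
    "W": ["weather_road_blockages"],
}

SCENARIOS = {
    "SFE": [],                                        # baseline, no external factors
    "F": EF_COLUMNS["F"],
    "P": EF_COLUMNS["P"],
    "F+P": EF_COLUMNS["F"] + EF_COLUMNS["P"],
    "COVID MOV": EF_COLUMNS["MOV"],
    "U": EF_COLUMNS["U"],
    "T": EF_COLUMNS["T"],
    "W": EF_COLUMNS["W"],
    "T+W": EF_COLUMNS["T"] + EF_COLUMNS["W"],
    "U+T+W": EF_COLUMNS["U"] + EF_COLUMNS["T"] + EF_COLUMNS["W"],
}

def scenario_features(base_features: list[str], scenario: str) -> list[str]:
    """Append the scenario's external-factor columns to the internal credit features."""
    return base_features + SCENARIOS[scenario]
```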
Data Preprocessing
  • Data normalization and standardization were applied to homogenize the scales [53].
  • Features with correlations above 90% were removed to prevent redundancy.
  • SMOTE was used to balance the dataset [54].
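A minimal sketch of these preprocessing steps is given below, using min–max scaling as a stand-in for the normalization step and a pairwise filter for the 90% correlation rule; SMOTE itself is applied later, inside each cross-validation fold (see the evaluation sketch in this section).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(X: pd.DataFrame, corr_threshold: float = 0.90) -> pd.DataFrame:
    """Scale features to a common range and drop one feature of each highly correlated pair."""
    X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X),
                            columns=X.columns, index=X.index)
    corr = X_scaled.corr().abs()
    to_drop = set()
    for i, col_i in enumerate(corr.columns):
        for col_j in corr.columns[i + 1:]:
            if corr.loc[col_i, col_j] > corr_threshold:
                to_drop.add(col_j)          # keep the first feature of the pair
    return X_scaled.drop(columns=sorted(to_drop))
```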
Machine Learning Models
We implemented the following models to evaluate the binary classification tasks, prioritizing interpretability and predictive performance:
  • Traditional Models: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and k-Nearest Neighbors (KNN) [54].
  • Advanced Models: Perceptron (PERCEP), Multi-layer Perceptron (MLP), Random Forest (RF) [3], Ridge Regression (Ridge) [55], and Extreme Gradient Boosting (XGB) [56].
  • Explainable Models:
    Explainable Neural Network (XNN), which directly integrates interpretability into the model architecture [17].
    Explainable Boosting Machines (EBM), a method that combines high accuracy with feature interpretability through additive modeling [57].
  • Feedforward Neural Networks: Variants of Feedforward Neural Networks (FFNN) were explored:
    Standard Feedforward Neural Network (FFN): A classical architecture used as a baseline for comparison.
    Feedforward Neural Network with Sparse Regularization (FFNSP): Incorporates sparsity constraints to enhance generalization and reduce overfitting.
    Feedforward Neural Network with Local Penalty Regularization (FFNLP): Local penalty terms are applied to balance feature contributions and improve interpretability.
  • Deep Learning: Convolutional Neural Networks (CNN) were included for their ability to capture complex relationships in both spatial and temporal data [25]. Although CNNs are traditionally applied to image or time-series data, in this study each tabular instance was reshaped into a 2D tensor of shape (features, 1), allowing the use of 1D convolutional layers to capture localized patterns across the feature vector. This approach has been successfully applied in previous studies involving structured tabular data [58].
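The following minimal sketch, assuming a Keras/TensorFlow implementation, shows how tabular instances reshaped to (features, 1) can be fed to 1D convolutional layers; the layer sizes and hyperparameters are illustrative, not the configuration tuned in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_tabular_cnn(n_features: int) -> tf.keras.Model:
    """1D CNN over the feature vector: each instance is reshaped to (n_features, 1)."""
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        layers.Conv1D(16, kernel_size=3, padding="same", activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),   # probability of "bad payer"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model

# X has shape (n_samples, n_features); reshape before fitting:
# model.fit(X.reshape(-1, n_features, 1), y, epochs=20, batch_size=256)
```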
Each model was trained and evaluated using a stratified 10-fold cross-validation protocol, ensuring balanced class representation across folds [26]. In each fold, 90% of the data was used for model development and 10% was held out for testing. Within the development portion, 75% was used for training and 25% for internal validation, enabling unbiased hyperparameter tuning. We employed a combination of grid search and Bayesian optimization to identify optimal model configurations.
All performance metrics—including ACC, AUC, F1, and KS—were computed separately on the validation and test partitions for each fold. The final results reported in the manuscript correspond to the average values obtained across the 10 test folds.
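A sketch of this evaluation protocol is shown below for a scikit-learn-style classifier exposing predict_proba; the KS statistic is computed as the maximum separation between the TPR and FPR curves, and names such as model_factory are illustrative assumptions.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov statistic: maximum separation between TPR and FPR."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.max(tpr - fpr))

def evaluate(model_factory, X, y, n_splits=10, seed=42):
    """X, y: NumPy arrays; model_factory: callable returning an unfitted classifier."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = []
    for dev_idx, test_idx in skf.split(X, y):
        X_dev, y_dev, X_test, y_test = X[dev_idx], y[dev_idx], X[test_idx], y[test_idx]
        # 75% train / 25% internal validation within the development portion
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_dev, y_dev, test_size=0.25, stratify=y_dev, random_state=seed)
        X_tr, y_tr = SMOTE(random_state=seed).fit_resample(X_tr, y_tr)  # balance train folds only
        model = model_factory()
        model.fit(X_tr, y_tr)
        metrics = {}
        for split, Xs, ys in [("Val", X_val, y_val), ("Test", X_test, y_test)]:
            proba = model.predict_proba(Xs)[:, 1]
            pred = (proba >= 0.5).astype(int)
            metrics.update({f"ACC_{split}": accuracy_score(ys, pred),
                            f"AUC_{split}": roc_auc_score(ys, proba),
                            f"KS_{split}": ks_statistic(ys, proba),
                            f"F1_{split}": f1_score(ys, pred)})
        folds.append(metrics)
    # Average each metric across the 10 folds, as reported in the tables
    return {k: float(np.mean([f[k] for f in folds])) for k in folds[0]}
```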
Explainability with XAI
To address concerns about interpretability, SHAP and LIME were applied to identify the most influential features in predicting credit delinquency both before and after incorporating EF. Models such as XNN and EBM provided native interpretability, complementing these post hoc methods.
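The post hoc explanations can be produced along the following lines; xgb_model, X_train, X_test, and feature_names are placeholders for the fitted model and data partitions of a given fold, not objects defined in this study.

```python
import shap
from lime.lime_tabular import LimeTabularExplainer

# SHAP global importance for a tree-based model such as XGB
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, feature_names=feature_names, max_display=10)

# LIME local explanation for a single credit
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names,
    class_names=["good payer", "bad payer"], mode="classification")
explanation = lime_explainer.explain_instance(
    X_test[0], xgb_model.predict_proba, num_features=10)
print(explanation.as_list())   # (feature condition, weight) pairs
```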
Statistical Evaluation
The impact of EF on model performance was assessed using Student's t-test (Equation (3)), applied to metrics such as ACC, AUC, KS, and F1 from the cross-validation folds, with a significance level of $\alpha = 0.05$:
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ (3)
The terms represent the following:
  • $\bar{X}$: mean of the metric across the ten folds;
  • $\mu$: hypothetical mean under the null scenario;
  • $s$: standard deviation of the metric;
  • $n$: number of folds ($n = 10$).
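Operationally, Equation (3) can be applied to the fold-wise differences between scenarios with and without EF, which is equivalent to a paired t-test. The sketch below uses SciPy; the AUC values shown are illustrative numbers only, not results from this study.

```python
import numpy as np
from scipy import stats

# Per-fold AUC values from the 10 test folds (illustrative numbers, not study results)
auc_without_ef = np.array([0.71, 0.70, 0.72, 0.69, 0.71, 0.70, 0.73, 0.70, 0.71, 0.72])
auc_with_ef    = np.array([0.74, 0.73, 0.75, 0.72, 0.74, 0.73, 0.76, 0.72, 0.74, 0.75])

# One-sample t-test of the fold-wise differences against mu = 0,
# equivalent to a paired t-test between the two scenarios (Equation (3)).
diff = auc_with_ef - auc_without_ef
t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 5%: {p_value < 0.05}")
```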
Economic Activity Relevance
The dataset, comprising 367,000 loans (see Table 2), was analyzed based on its concentration across economic activities. This analysis is particularly relevant, as the impact of EF on these activities amplifies exposure to credit risk. The distribution of loans is as follows:
  • 80% Concentration: 80% of the loans are concentrated in 78 economic activities.
  • 99% Concentration: 99% of the loans are concentrated in 326 economic activities.
  • 100% Concentration: 100% of the loans are distributed across 660 economic activities.
Figure 5 illustrates these groupings, highlighting the increased dispersion and noise as the number of activities increases. To ensure a comprehensive analysis, the results were further examined by including and excluding loans affected by government-backed programs, thereby isolating the potential biases introduced by such interventions.
Research Limitations and Biases
While this study provides valuable insights into the influence of EF on credit delinquency predictions, several limitations and potential biases must be considered:
  • Context-Specific Findings: The analysis is based on data from the Peruvian context, which may limit the generalizability to other regions with different economic, social, and environmental dynamics. Future research should incorporate external validation by using datasets from diverse geographical areas to evaluate the transferability of the proposed methodology.
  • Training Data Bias: Dataset imbalances and the under-reporting of delinquency during government interventions may introduce biases. Although SMOTE has been used to address class imbalances, residual biases related to data collection practices or policy impacts may persist.
  • Model Transparency: While explainability techniques such as SHAP, LIME, and interpretable models (e.g., XNN and EBM) have been employed, the use of black-box models like CNN and XGB remains a challenge. This lack of inherent transparency could hinder their applicability to financial institutions, where explainability is crucial for regulatory compliance and stakeholder trust.
  • Temporal Assumptions: The assumption that loans disbursed near the evaluation cutoff reflect the grace period impacts may not fully account for the delayed delinquency effects, particularly for loans with extended repayment terms.
  • Data Quality and Noise: External factor measurements and institutional datasets may contain noise or inaccuracies that could affect model performance, despite preprocessing steps such as normalization and standardization. Future work should explore advanced noise-reduction techniques.
  • Researcher Influence: Prior domain knowledge and experience can introduce subtle biases in model selection, EF, or preprocessing decisions. To mitigate this, the study adhered to best practices such as k-fold cross-validation, rigorous statistical testing, and methodological transparency.
Despite these limitations, the methodological rigor applied in this study, including robust preprocessing, comprehensive statistical validation, and the integration of explainable models, provides a solid foundation for future research. The findings highlight the need for the further exploration of EF impacts in broader contexts, paving the way for more robust, interpretable, and generalizable credit risk assessment frameworks.

4. Results

4.1. Stationarity Analysis and Dataset Integration

In this analysis, “economic activities” refer to the internal sector classification codes provided by the financial institution FMOD. These codes represent a customized segmentation structure used to group clients based on their type of business or productive sector, and are derived from operational risk models. Although not fully equivalent to official international standards, these codes are conceptually aligned with the International Standard Industrial Classification of All Economic Activities (ISIC Rev. 4), maintained by the United Nations Statistics Division. (https://unstats.un.org/unsd/classifications/Econ/isic, accessed on 5 April 2025).
Specifically, the evaluation window from April 2018 to February 2020 (as shown in Table 7) contains 718 unique activity codes, each linked to credit operations that remained active during that period. Other periods in the table include subsets of these activities, depending on whether data on overdue loans were available within each respective evaluation window.
To assess the influence of EF on economic activity delinquency, we first applied the Dickey–Fuller stationarity test (see Equation (1)) to the normalized daily time series covering the period from January 2016 to September 2023 (see Table 2). These time series represent the number of delinquency days associated with the economic activities under study, examined both in periods with EF and without EF [47].

Key Stationarity Findings

Table 7 shows that, in periods without EF, between 90% and 91% of the activities exhibit non-stationary delinquency patterns, while 6–7% exhibit stationarity, and around 3% are indeterminate. In contrast, periods with EF present a lower proportion of non-stationary activities (75–85%) and a higher proportion of stationary (6–22%) or indeterminate patterns (2–9%). These differences suggest that exogenous shocks, such as pandemics, social unrest, or climatic disruptions, can induce more persistent or complex delinquency behaviors, consistent with similar observations in other emerging economies.
The “indeterminate” classification, though a minority, highlights cases where delinquency patterns are less predictable or fall outside the thresholds for stationarity detection. This suggests that EF may not only influence stationarity directly but may also introduce complexity in delinquency patterns, making it harder to classify definitively.
This behavior may be explained by the risk management practices of financial institutions during stable periods. In the absence of external shocks, institutions are more capable of identifying early signs of payment delays and can adjust their credit exposure accordingly—often by suspending or restricting lending to high-risk segments. These interventions disrupt emerging trends, resulting in more irregular default behavior that tends to deviate from stationary assumptions due to abrupt shifts in mean or variance.
Conversely, during systemic external shocks—such as pandemics or climate disruptions—multiple economic sectors experience distress concurrently, reducing the institution’s ability to react. This leads to prolonged and homogeneous delinquency patterns across broader borrower groups, increasing the likelihood of stationarity in the associated time series.
These findings are summarized in Table 7, which provides a detailed breakdown of the stationarity classifications across the different periods of evaluation. The results underscore the dynamic impact of external factors on the stationarity of delinquency days in economic activities, reinforcing the need to account for these factors in economic analyses and credit risk prediction models.

4.2. Definition of Evaluated Scenarios and Causality Testing

To evaluate the seasonal impact of EF on the default behavior of economic activities, the Granger causality test (see Equation (2)) was applied to the time series representing the days of default for each economic activity, in conjunction with the time series of each EF. For simplicity, a binary value of 1 was assigned when the p-value was less than 0.05, indicating statistically significant causality, and 0 otherwise. This binary classification was performed every month to effectively capture temporal variations.
The results of this causality analysis were subsequently integrated into the credit dataset and associated with the corresponding economic activity and period under analysis. Additionally, a global binary classification variable was calculated and appended to the dataset to identify the payment quality of credit holders (good or bad payers) based on monthly delinquency data. This enriched dataset provided the foundation for applying Algorithm 1 to the established datasets (see Figure 5) to compute the ACC, AUC, KS, and F1 metrics for the evaluated machine learning models. These models were assessed across the scenarios listed in Table 6. Based on the values obtained, the differences between the scenarios that include EF and SFE scenarios were calculated as shown in Table 8.
Algorithm 1 Dataset evaluation using ML algorithms.
  EVALUATE(data, target, features, model_name)
      1. Remove collinear variables (correlation > 90%)
      2. Perform Stratified k-Fold Cross-Validation (k = 10)
      3. In each fold:
         a. Use 90% for training, 10% for testing
         b. Split training set: 75% train, 25% validation
         c. Apply SMOTE for class balancing on the training set
         d. Train selected model: LDA, QDA, Perceptron, MLP, Ridge, RF, XGB, KNN, DL, CNN, XNN, FFN (Standard, Sparse, Local Penalty), EBM
         e. Compute metrics: AUC_Val, ACC_Val, AUC_Test, ACC_Test, KS_Val, KS_Test, F1_Val, F1_Test
      return Mean metrics across 10 folds
  EVALUATE_ALL_MODELS(data, target, features)
      1. Iterate over models and feature sets
      2. Evaluate on datasets: 1. D100, 2. D99, 3. D80
      return DataFrame with results
  EVALUATE_XAI_BEST_MODELS(best_Model)
      1. Evaluate SHAP
      2. Evaluate LIME
      return DataFrame with SHAP and LIME results
  FEATURES_EXPLANATION(best_Model)
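A compact sketch of the EVALUATE_ALL_MODELS driver is given below, reusing the fold-evaluation routine sketched in Section 3.2.3; the model factories, hyperparameters, and dataset keys are illustrative assumptions.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Subset of the model factories listed in Algorithm 1
MODELS = {
    "LDA": LinearDiscriminantAnalysis,
    "RF": lambda: RandomForestClassifier(n_estimators=300, random_state=42),
    "XGB": lambda: XGBClassifier(eval_metric="logloss", random_state=42),
}

def evaluate_all_models(datasets, target, feature_sets, evaluate):
    """datasets: {'1. D100': df, '2. D99': df, '3. D80': df};
    feature_sets: {scenario_name: [column names]};
    evaluate: the fold-level routine sketched earlier."""
    rows = []
    for ds_name, df in datasets.items():
        for scenario, cols in feature_sets.items():
            X, y = df[cols].to_numpy(), df[target].to_numpy()
            for model_name, factory in MODELS.items():
                metrics = evaluate(factory, X, y)
                rows.append({"dataset": ds_name, "scenario": scenario,
                             "model": model_name, **metrics})
    return pd.DataFrame(rows)
```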
The performance metrics (ACC, AUC, KS and F1) derived from the application of machine learning models for the binary classification of good and bad payers, evaluated across datasets stratified by relevance to established scenarios, are presented in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15 and Table A16 in Appendix A. These tables provide a detailed comparative analysis of the scenarios with and without the inclusion of EF related to COVID-19, social unrest, and climate change.
Additionally, Table A17, Table A18, Table A19, Table A20, Table A21 and Table A22 in Appendix A identify the machine learning models that achieved statistically significant improvements in their classification metrics when the EFs were incorporated. These tables delineate the conditions under which specific models demonstrate enhanced predictive capabilities, providing critical insights into the sensitivity and adaptability of various machine learning approaches to external factors.
The comprehensive dataset from these experiments forms a robust foundation for addressing the research questions that underpin this study. Given the extensive volume of information presented, the subsequent discussion focuses on the most significant results, with direct references to the relevant tables to highlight critical findings and their implications for predictive modeling under diverse scenarios.
  • RQ1. How did the COVID-19 pandemic, through factors such as positive cases and mortality rates, influence credit defaults in different economic activities?
The first research question is focused on understanding the impact of the COVID-19 pandemic on credit delinquency, particularly through the inclusion of variables such as confirmed infections and mortality. A comparative analysis was conducted between baseline scenarios SFE and those influenced by pandemic conditions, including a time-shifted mortality scenario (COVID MOV). Across all evaluation metrics—AUC, ACC, KS, and F1—the results consistently indicate a marked shift in credit risk behavior during pandemic periods.
Among the pandemic-related variables, mortality rates—especially when temporally aligned to reflect delayed economic effects—proved to be the most influential predictor of default. This effect was most evident in the 3. D80 dataset, which encompasses high-contact economic sectors that are particularly vulnerable to pandemic restrictions and health crises. In this scenario, the inclusion of lag-adjusted mortality data significantly enhanced model performance, suggesting that delayed health outcomes capture the real-world propagation of credit risk better than contemporaneous case counts alone.
The presence of government-backed credit mechanisms also played a decisive role. In datasets where such guarantees were included (WGB_CR), the predictive impact of the pandemic was partially cushioned, resulting in lower volatility and more stable classification outcomes. Conversely, in non-guaranteed portfolios (WOGB_CR), default risks increased substantially, reflecting the lack of institutional buffers.
In terms of model performance, CNN and XGB emerged as the most effective across all partitions, consistently achieving top-tier predictive accuracy and discrimination. While models such as EBM demonstrated improvements in specific scenarios—particularly under external stressors—others like XNN exhibited limited effectiveness, especially in pandemic-related contexts, which constrained their applicability despite their inherent interpretability features.
These findings are further reinforced by the interpretability analyses shown in Figure 6, where SHAP and LIME techniques reveal how the relative importance of predictive features shifts across scenarios. Variables related to debt burden and origination characteristics gained or lost prominence depending on whether pandemic indicators were present, reflecting the models’ dynamic response to exogenous health shocks.
In summary, the COVID-19 pandemic had a substantial and measurable effect on credit delinquency patterns. Mortality—particularly when time-shifted—proved to be the most impactful factor, and its inclusion substantially improved model robustness. Government interventions, while partially effective, could not fully suppress the delinquency risks in the most exposed segments. These insights underscore the value of integrating lag-sensitive external variables into credit risk modeling, especially for sectors with heightened structural vulnerability.
Figure 6 provides interpretability insights into the predictive behavior of the XGB and CNN models under both baseline (SFE) and COVID-affected (F + P) scenarios, using SHAP and LIME explainability techniques.
Figure 6a,b illustrate the SHAP and LIME results for the XGB model, respectively. Under the SFE scenario, features such as MicroOrigination_Score, Overindebtedness_Score, and MaxInternalAmount exhibit the greatest impact on prediction outputs. In the F + P setting, variables directly linked to pandemic conditions—such as 2021-04PositiveCases and 2020-10Deceased—emerge alongside financial indicators, suggesting an integration of external health shocks into the model's risk prioritization.
Figure 6c,d display the corresponding SHAP and LIME analyses for the CNN model. Similar to XGB, the CNN model assigns greater relevance to features such as MaxInternalAmount and MicroOrigination_Score under the baseline (SFE) scenario. In the F + P scenario, feature importance shifts toward pandemic-related variables and demographic indicators (e.g., 2020-09PositiveCases, 2020-09Deceased, DAResDirection, Gender), reflecting the model’s responsiveness to contextual changes introduced by COVID-19.
Together, these visualizations underscore the dynamic interaction between model architecture and scenario context, revealing how external stressors—particularly COVID-19—reshape the prioritization of financial and behavioral risk indicators within advanced machine learning models.
  • RQ2. To what extent do climate change indicators, such as temperature anomalies and road blockages due to weather, impact credit delinquency patterns?
The analysis of climate-related external factors—specifically temperature anomalies (T) and weather-induced road blockages (W)—reveals a measurable and statistically significant influence on credit delinquency behavior. While the magnitude of this effect is generally more moderate compared to pandemic-related variables, it remains consistent across scenarios and data partitions.
Table A9, Table A10, Table A11 and Table A12 (WGB_CR) and Table A13, Table A14, Table A15 and Table A16 (WOGB_CR) show notable improvements in classification metrics such as ACC, AUC, KS, and F1 upon including climate variables. These gains are particularly prominent in the absence of government guarantees, where borrowers are more directly affected by logistical disruptions. In portfolios without policy support (WOGB_CR), the effect of weather-related blockages is accentuated, often contributing to delays in income flows and operational closures—factors that increase credit risk exposure.
From a comparative perspective, the predictive gains attributed to climate anomalies may not be as large as those observed under pandemic mortality scenarios, yet they are highly context dependent. For instance, improvements in AUC of up to 3% and accuracy gains of over 30% have been recorded in neural network models, especially when the economic activities under analysis are concentrated in geographically exposed sectors. This supports the findings of [30], who demonstrated that road infrastructure disruptions can propagate liquidity shocks and elevate default risk, particularly in informal or underserved lending markets.
Figure 7 illustrates how the XGB model integrates climate variables into its decision structure, ranking them alongside key behavioral and demographic attributes. This interpretability result reaffirms that exogenous environmental stressors, while external to borrower control, are internalized by advanced algorithms as significant contributors to credit delinquency prediction.
In sum, the inclusion of climate indicators in credit risk models enhances their ability to reflect real-world disruptions and borrower vulnerability. The observed performance improvements are not only statistically valid but also economically meaningful, underscoring the importance of incorporating climate-related risks into financial predictive systems, particularly in emerging markets subject to environmental volatility.
Figure 7 provides interpretability insights into the behavior of the XGB model under climate-related stress conditions. Figure 7a presents the SHAP analysis comparing the baseline scenario SFE and the scenario that includes both temperature anomalies and weather-induced road blockages (T + W). The visualization highlights the ten most influential features in each setting, with color gradients indicating the direction and intensity of their contribution to the model's predictions. High feature values are represented in red and low values in blue, allowing a clear interpretation of how variable magnitudes relate to increased or decreased credit risk.
Figure 7b displays the corresponding LIME analysis for the same dataset (CLIMATE_FACTOR_WOGB_CR). In both scenarios, DTResDirection emerges as the most critical driver of classification outcomes. Notably, under the T + W scenario, Gender gains relevance, suggesting that climate-related externalities may interact differently with sociodemographic characteristics. This contrast between the SFE and T + W contexts reinforces the need to account for latent vulnerabilities when modeling credit risk under environmental stress.
Together, these interpretability results confirm that the inclusion of climate factors not only enhances model accuracy but also reconfigures the internal prioritization of risk features. This dynamic adaptation reflects the capacity of machine learning models like XGB to absorb and reflect external shocks through learned decision boundaries.
  • RQ3. What is the relationship between credit delinquency and social unrest, considering disruptions to economic activities and societal stability?
Social unrest—manifested through protests, strikes, and other large-scale disruptions—can severely affect economic activity by limiting mobility, reducing consumer demand, and disrupting informal and formal markets. These effects become particularly salient in borrower segments highly dependent on local or face-to-face interactions.
Our analysis confirms that the inclusion of social unrest indicators (U) in the datasets yields statistically significant improvements in credit delinquency prediction. These improvements are evident in both WGB_CR and WOGB_CR portfolios, although the magnitude of the effect is more pronounced in the latter. This distinction highlights the protective role of policy guarantees, which tend to buffer the financial impact of systemic disruptions.
In terms of model responsiveness, CNN and XGB exhibit the highest sensitivity to unrest scenarios, achieving notable gains in AUC, ACC, KS, and F1. Classical models, including LDA and Ridge, also show moderate but statistically consistent improvements. Notably, in the D80 relevance group, which concentrates on high-contact economic activities, the accuracy and discrimination capacity of the models improve substantially under social unrest scenarios, suggesting that these sectors are disproportionately vulnerable to institutional instability.
Interpretability analyses provide further insight. As shown in Figure 8, SHAP values for CNN and XGB highlight a shift in feature relevance when social unrest variables are introduced. Features such as Overindebtedness_Score and MaxInternalAmount become more prominent, reflecting the increased importance of financial stress indicators in volatile sociopolitical contexts.
In summary, the presence of social unrest introduces measurable and context-sensitive volatility in credit delinquency behavior. Its integration into predictive models not only improves classification performance but also enhances the capacity to identify borrower segments at heightened risk. These findings support the existing literature on the financial impacts of sociopolitical instability and underscore the importance of incorporating such variables into credit risk assessments, particularly in emerging economies where unrest events are recurrent and unevenly absorbed by the population.
Figure 8 presents the SHAP feature importance analysis for the XGB and CNN models under the “U scenarios” in the SOCIAL_UNREST_WOGB_CR dataset. The top 10 influential features are shown for each model, with Overindebtedness_Score and MaxInternalAmount being particularly significant in both, highlighting their impact on the model predictions.
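For reference, the four evaluation metrics cited in this section can be computed from predicted default probabilities as in the minimal scikit-learn sketch below; the scores are synthetic placeholders. Here the KS statistic is taken as the maximum separation between the cumulative score distributions of good and bad payers, which equals the maximum of TPR minus FPR along the ROC curve.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

def evaluate(y_true, y_prob, threshold=0.5):
    """Return AUC, ACC, KS, and F1 for a binary delinquency classifier."""
    y_pred = (y_prob >= threshold).astype(int)
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "ACC": accuracy_score(y_true, y_pred),
        "KS": float(np.max(tpr - fpr)),   # max distance between the two cumulative distributions
        "F1": f1_score(y_true, y_pred),
    }

# Synthetic example scores
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_prob = np.clip(0.3 * y_true + rng.uniform(0, 0.7, 1000), 0, 1)
print(evaluate(y_true, y_prob))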
  • RQ4. How do the combined effects of external factors (COVID-19, climate change, and social unrest) contribute to variations in credit delinquency, and what are the most influential factors?
The combination of external factors—pandemic severity, climatic anomalies, and sociopolitical instability—amplifies the complexity and unpredictability of credit behavior. When analyzed jointly, these variables generate compounded stress conditions that exert a stronger influence on delinquency patterns than any of them in isolation. This effect is particularly evident in the absence of government-backed guarantees (WOGB_CR), where borrowers are fully exposed to systemic shocks.
Empirical results from multifactor scenarios, such as U + T + W and Climate + Unrest, show a higher density of statistically significant gains across all evaluation metrics. Notably, mortality indicators (especially in lag-adjusted form, i.e., COVID MOV) remain the dominant predictor of default, reflecting the profound economic dislocation associated with fatal health outcomes. Climatic variables, although more moderate in their individual contribution, reinforce vulnerability in certain borrower segments, particularly when infrastructure is disrupted by weather-related events. Social unrest introduces further pressure by destabilizing labor income and limiting business continuity.
Model behavior under combined factor scenarios reveals distinct patterns. XGB consistently ranks as the most resilient and adaptive model, achieving significance across all multifactor experiments. CNN also demonstrates strong performance, although with some sensitivity to the structure of the input data and the intensity of external stressors. While XNN was included as an interpretable neural model, its performance was inconsistent, limiting its contribution beyond methodological comparison. Classical models such as Ridge and LDA continue to offer baseline robustness, particularly in datasets with partial shielding (e.g., WGB_CR), although their gains remain smaller and more scenario dependent.
Figure 9 supports these findings by showing the SHAP analyses for CNN, XGB, and XNN in scenarios combining all external factors. Variables such as MaxInternalAmount, Overindebtedness_Score, and DAResDirection consistently emerge as top predictors, reflecting how borrower financial behavior interacts with macro-level disruptions. The shifting prominence of these features across models underscores the necessity of interpretability tools to contextualize model predictions in complex environments.
In conclusion, the combined influence of external shocks intensifies credit risk dynamics and reveals structural fragilities in both the borrower base and the credit portfolios. The models’ ability to detect and adapt to these stressors is contingent not only on the algorithmic design but also on how exogenous variables are incorporated, preprocessed, and aligned temporally. These findings reinforce the importance of building integrated, context-aware credit risk models—particularly in economies where multiple external threats converge and the margin for institutional response is limited.
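As an illustration of how such multifactor scenarios can be assembled, the sketch below joins monthly external-factor series onto credit records by observation period and selects a factor subset per scenario. The column names, frequency, and scenario dictionary are assumptions for exposition and do not reproduce the study's actual data pipeline.

import pandas as pd

# Hypothetical monthly external-factor series (names and values are illustrative).
ef = pd.DataFrame({
    "period": pd.period_range("2020-01", "2023-09", freq="M"),
    "covid_mortality": 0.0,   # fatalities per period
    "temp_anomaly": 0.0,      # temperature deviation from climatology
    "road_blockages": 0,      # weather-induced blockage events
    "unrest_events": 0,       # recorded protests and strikes
})

# Credit records tagged with the period in which repayment behavior is observed.
credits = pd.DataFrame({
    "credit_id": [101, 102],
    "period": pd.PeriodIndex(["2021-03", "2022-11"], freq="M"),
})

# Scenario definitions: which external factors enter each experiment.
scenarios = {
    "SFE": [],                                                 # no external factors
    "T + W": ["temp_anomaly", "road_blockages"],               # climate only
    "U + T + W": ["unrest_events", "temp_anomaly", "road_blockages"],
}

def build_scenario(credits, ef, factor_cols):
    """Left-join the selected external-factor columns onto the credit records."""
    return credits.merge(ef[["period"] + factor_cols], on="period", how="left")

combined = build_scenario(credits, ef, scenarios["U + T + W"])
print(combined)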

5. Discussion and Research Implications

This study examined how credit delinquency prediction is affected by the incorporation of EF, including pandemic severity, climate anomalies, and social unrest. The findings demonstrate that these variables introduce significant and context-dependent shifts in credit behavior, particularly in scenarios without government-backed guarantees. At the same time, the process revealed methodological challenges that are increasingly relevant in the application of ML in dynamic financial environments.
The first challenge relates to the quality and availability of data on external disruptions. Publicly available records on COVID-19 mortality, weather events, and protest activity often lack consistency and timeliness, complicating their integration into high-frequency credit datasets. Similar issues have been noted in prior empirical efforts conducted in Peru, where the construction of reliable time series for economic modeling has required substantial preprocessing [28,29]. In this study, the alignment of mortality data through temporal shifts proved essential, as it revealed the delayed but pronounced influence of pandemic fatalities on repayment capacity.
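A minimal pandas sketch of this temporal alignment is given below; the monthly values are invented, and the one-period shift mirrors the COVID MOV construction used in the experiments.

import pandas as pd

# Illustrative monthly mortality series (values are invented).
mort = pd.Series(
    [120, 340, 560, 480, 300, 210],
    index=pd.period_range("2020-03", periods=6, freq="M"),
    name="covid_mortality",
)

# COVID MOV: shift by one period so that month t carries the fatalities
# observed in month t-1, capturing their delayed effect on repayment capacity.
mort_mov = mort.shift(1).rename("covid_mortality_mov")

aligned = pd.concat([mort, mort_mov], axis=1)
print(aligned)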
Another source of complexity was the divergence between credit portfolios with and without public guarantees. Government-backed loans mitigated part of the risk during systemic shocks but also altered the underlying statistical distribution of defaults. This distinction made model generalization more challenging but also more necessary for understanding the buffering effects of financial policy. Prior studies in Latin America suggest that the performance of ML classifiers can vary significantly depending on whether policy instruments like emergency credit programs are present [27].
Overfitting risks were also non-trivial, particularly when combining multiple exogenous factors in deep learning architectures. Although regularization and cross-validation strategies were implemented, the complexity of models such as CNN and XNN occasionally reduced transparency. While post hoc interpretability tools like SHAP and LIME help uncover shifts in feature relevance across scenarios, they do not entirely resolve concerns regarding model opacity—especially in contexts where regulatory clarity and explainability are essential [30,59].
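The sketch below illustrates the kind of regularized, cross-validated evaluation referred to here, using stratified 10-fold validation on synthetic data with a ridge (L2-regularized) baseline and a depth- and subsample-constrained XGBoost model; the hyperparameter values are placeholders, not the study's settings.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a credit dataset enriched with external factors.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.7, 0.3], random_state=1)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# L2 regularization as a classical, low-variance baseline.
ridge_auc = cross_val_score(RidgeClassifier(alpha=1.0), X, y, cv=cv, scoring="roc_auc")

# Gradient boosting with capacity controls (depth, learning rate, subsampling) to limit overfitting.
xgb_model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                              subsample=0.8, colsample_bytree=0.8)
xgb_auc = cross_val_score(xgb_model, X, y, cv=cv, scoring="roc_auc")

print(f"Ridge AUC: {ridge_auc.mean():.3f} +/- {ridge_auc.std():.3f}")
print(f"XGB AUC:   {xgb_auc.mean():.3f} +/- {xgb_auc.std():.3f}")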
Despite these challenges, several insights emerge. Mortality-related variables—especially when time-lagged—exert the strongest influence on credit behavior, particularly in high-contact economic sectors. Climate-related disruptions, though more moderate in effect, consistently improved predictive accuracy, especially in vulnerable portfolios. Social unrest also introduces nonlinear shocks, further reinforcing the need for adaptive models that account for real-world volatility.
The integration of multiple external stressors revealed a compounding effect on delinquency, a finding that conventional econometric approaches may fail to capture. Tree-based and neural models, particularly XGB and CNN, showed the greatest adaptability in this regard. At the same time, classical models like Ridge maintained stable performance under certain constrained scenarios, suggesting a role for hybrid modeling strategies.
In the Peruvian context, these findings align with recent efforts to modernize credit risk assessment through advanced analytics. Several local studies highlight the value of incorporating contextual variables and borrower segmentation into scoring systems, particularly in the wake of the pandemic and its economic aftermath [29,35]. Internationally, there is growing consensus around the importance of multidimensional modeling frameworks that include environmental and social dimensions alongside traditional financial indicators.
Overall, this study contributes to the ongoing development of risk prediction systems capable of responding to complex and evolving realities. It underscores the importance of aligning exogenous variables not only temporally but also conceptually with the credit cycle. Future implementations of ML in finance—particularly in emerging economies—should prioritize data interoperability, scenario partitioning, and explainable design to ensure both accuracy and transparency in risk-based decision-making.

6. Conclusions and Future Research

This study has demonstrated that credit delinquency prediction improves significantly when EF—such as COVID-19 severity, climate anomalies, and social unrest—are systematically integrated into machine learning workflows. Using a multi-scenario design and high-resolution time series, the models captured both the direct and delayed effects of these external shocks across various economic activities and portfolio compositions.
By incorporating explanatory tools such as SHAP and LIME, and validating the results with formal stationarity and causality tests, the study provided both predictive and interpretive value. Notably, the performance of key metrics—AUC, ACC, KS, and F1—was significantly enhanced in models exposed to time-shifted mortality and climate-related variables. This confirmed that delayed systemic shocks (e.g., COVID-related deaths) exert stronger predictive influence than contemporaneous infection rates, and that road infrastructure disruptions propagate liquidity risks, particularly in non-guaranteed portfolios.
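As a reference for this validation step, the sketch below applies the augmented Dickey-Fuller test and a Granger causality test from statsmodels to a synthetic delinquency-rate series and a synthetic mortality series; the data, lag order, and variable names are illustrative only.

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, grangercausalitytests

rng = np.random.default_rng(7)
n = 45  # monthly observations, e.g., January 2020 to September 2023

# Synthetic placeholders: an external-factor series and a delinquency-rate series.
mortality = pd.Series(np.abs(rng.normal(300, 80, n)), name="mortality")
delinquency = pd.Series(
    0.02 + 0.00005 * mortality.shift(1).fillna(mortality.mean()) + rng.normal(0, 0.002, n),
    name="delinquency",
)

# Augmented Dickey-Fuller: null hypothesis of a unit root (non-stationarity).
adf_stat, p_value, *_ = adfuller(delinquency)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# Granger causality: does lagged mortality help predict delinquency?
data = pd.concat([delinquency, mortality], axis=1)
res = grangercausalitytests(data[["delinquency", "mortality"]], maxlag=2)
p_lag1 = res[1][0]["ssr_ftest"][1]
print(f"Granger causality p-value at lag 1: {p_lag1:.3f}")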
The comparative evaluation revealed that CNN and XGB consistently outperformed other models across multiple relevance partitions and scenarios. While EBM proved especially useful in contexts requiring greater model transparency, XNN served as an exploratory reference for intrinsically interpretable architectures, although its predictive effectiveness was limited in high-volatility scenarios. Moreover, the role of government-backed guarantees emerged as central: their presence moderated the effects of external crises, whereas their absence magnified the volatility and default risks observed in the models.
Overall, these findings offer a scenario-driven framework for understanding how systemic shocks affect credit behavior. This framework is replicable and adaptable to emerging economies facing concurrent crises, combining domain-sensitive data transformations (e.g., lag structures) with advanced modeling and interpretability techniques. Importantly, the integration of KS and F1 metrics deepens the evaluation of not just model accuracy but also the discriminatory power and balance between false positives and false negatives—factors critical to financial risk management and regulatory compliance.
Looking forward, several research directions are both relevant and necessary. First, the use of static lag structures could be extended through adaptive or dynamic lag modeling, particularly in the case of gradual-onset crises such as climate change or prolonged social instability. Second, cross-regional replication would test the generalizability of these findings in other vulnerable markets, where the interaction between policy buffers and external shocks may differ. Third, in-depth analysis of credit policies—such as emergency moratoria or targeted guarantees—would enable the finer-grained modeling of institutional effectiveness under stress. Finally, explainability remains a methodological frontier: while SHAP and LIME provide insights, future work could explore intrinsically interpretable models or causal frameworks that support regulatory transparency and stakeholder trust.
In conclusion, this research confirms the value of integrating multidimensional exogenous variables into credit risk modeling, offering a robust and explainable approach to capturing volatility in increasingly complex economic environments. As global financial systems face overlapping crises, such tools will become indispensable for promoting resilience, fairness, and informed decision-making in credit allocation.

Author Contributions

Conceptualization, J.N. and J.H.; methodology, J.N.; validation, J.N., L.R. and J.H.; formal analysis, J.N., J.C. and J.H.; investigation, J.N.; resources, J.N.; writing—original draft preparation, J.N.; writing—review and editing, J.N., L.R., J.C. and J.H.; visualization, J.N. and J.C.; supervision, J.H.; project administration, J.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this study are partly publicly available and partly proprietary. Public datasets (COVID-19 cases, deaths, road blockages, and temperature anomalies) are accessible through the official URLs listed in Table 2. Proprietary financial datasets (credit delinquency activity and credits with external factors) were used under confidentiality agreements and have been anonymized and aggregated. Metadata, modeling programs, and experimental results have been deposited and are openly available at Zenodo: https://zenodo.org/records/14890903. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Results Tables

Table A1. Average AUC and ACC across 10 folds of ML models with COVID EF WGB_CR, by evaluated scenario and relevance groups.
AUC (%) and ACC (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WGB_CR
1. D100
cnn72.8273.6374.0974.090.821.281.2767.4970.0770.1870.142.582.692.65
ebm81.1381.2381.4281.260.100.290.1374.8474.9475.0574.930.090.200.08
ffn72.3072.5672.3973.380.260.101.0866.7465.2466.1566.14−1.50−0.59−0.60
ffnlp72.1172.3572.4172.720.250.300.6166.2867.5565.6465.191.27−0.64−1.09
ffnsp72.3172.3171.4272.770.00−0.890.4766.4066.4564.6765.590.05−1.73−0.81
knn67.2166.9266.9366.93−0.29−0.28−0.2863.2363.0062.9762.98−0.22−0.26−0.24
lda67.2368.0168.5468.250.781.311.0263.0963.7464.0963.770.651.000.67
mlp69.3070.8371.1770.471.531.871.1760.8263.2663.5862.872.442.762.06
percep57.9658.4459.8859.460.481.921.5044.5857.9956.3749.3113.4111.784.73
qda67.4466.9865.8167.17−0.46−1.63−0.2752.5054.0154.1353.721.521.641.22
rf81.7481.5181.2181.46−0.23−0.53−0.2975.5175.4475.2275.48−0.07−0.29−0.03
ridge66.9467.5568.0567.810.621.120.8762.9363.3963.7063.430.460.770.50
xgb80.7681.0081.1081.070.230.330.3074.7774.9875.0374.980.210.260.22
xnn50.1450.3850.1050.370.24−0.050.2253.4257.9962.7654.224.579.340.80
2. D99
cnn72.7573.6074.2773.570.861.520.8367.0470.0370.4269.972.983.372.93
ebm81.0481.2481.3781.240.190.330.2074.7774.9375.0474.940.160.280.18
ffn72.6972.3871.5071.55−0.31−1.20−1.1466.0765.9465.8665.75−0.14−0.21−0.32
ffnlp71.7172.8172.1672.451.090.440.7465.2766.8366.2865.941.561.020.67
ffnsp72.5873.1172.5872.990.53−0.000.4163.9365.3466.8166.171.412.872.24
knn67.1066.8866.8866.88−0.22−0.21−0.2163.2463.0263.0263.04−0.23−0.22−0.21
lda67.1767.9268.4668.190.751.291.0263.0763.6864.0463.780.600.960.70
mlp69.7968.9369.6268.70−0.86−0.16−1.0863.8364.1259.7664.960.29−4.071.13
percep55.0957.6754.1956.992.58−0.901.9055.4858.5653.9350.603.07−1.55−4.88
qda67.3266.9065.7467.08−0.42−1.58−0.2352.4153.9854.0253.631.571.621.22
rf81.6881.3381.0281.30−0.35−0.66−0.3875.6375.3275.0975.38−0.31−0.54−0.24
ridge66.8567.4767.9767.740.621.120.8962.9463.3963.6863.490.450.730.55
xgb80.7280.9680.9780.950.240.250.2474.8174.9774.8974.860.160.080.04
xnn50.4750.2550.4850.59−0.230.000.1157.5053.5559.2254.37−3.951.71−3.14
3. D80
cnn72.6873.1873.7973.500.501.110.8266.3870.2870.6570.443.904.264.06
ebm80.7381.0381.1681.050.290.430.3174.9275.1675.2975.190.240.370.27
ffn71.7672.1672.3872.830.410.621.0763.2466.9266.1967.183.682.953.94
ffnlp71.1472.0071.6671.970.860.510.8365.7366.9066.4565.441.160.72−0.29
ffnsp70.8772.0372.2972.131.151.421.2564.0765.6266.6965.651.542.611.58
knn66.5766.5766.5766.57−0.010.000.0062.9862.9562.9562.96−0.02−0.03−0.01
lda67.3367.9068.2968.070.570.960.7463.4563.8864.2064.020.430.750.57
mlp67.9668.8568.6169.920.890.661.9658.0963.1662.4360.735.074.342.64
percep57.4455.3959.2756.66−2.051.83−0.7852.8857.2250.1656.664.35−2.713.78
qda67.6966.5164.1166.30−1.18−3.58−1.3952.7654.2852.6853.731.52−0.080.97
rf81.1881.0380.6880.96−0.15−0.51−0.2275.4775.5775.2675.520.11−0.200.05
ridge66.6767.2367.5967.400.560.920.7363.0663.3563.7363.550.290.670.49
xgb80.2080.7580.7680.690.540.560.4974.7475.1375.1575.140.390.410.39
xnn50.3550.0749.8950.03−0.28−0.47−0.3363.2663.8060.0753.460.54−3.19−9.80
Note: This table summarizes the average AUC and ACC metrics obtained across 10 folds for machine learning models applied in binary classification scenarios (good and bad payers). The analysis is focused on datasets enriched with EF related to COVID-19, specifically scenarios including credits with governmental benefits (WGB_CR). Key findings include the following: 1. Models such as EBM and RF demonstrate robust performance, achieving the highest AUC values (above 81%), indicating strong discriminative capability for credit classification under EF-influenced scenarios. 2. The CNN model shows marked improvement in ACC, with increases of up to 4.26 percentage points when incorporating EF, highlighting its sensitivity to the enriched dataset. 3. Conversely, models such as KNN and XNN show limited or negative responsiveness to EF enrichment, with minimal improvements or declines in metrics. 4. The inclusion of governmental benefits as part of the COVID-19 EF notably enhances the classification performance for most models, emphasizing the critical role of such interventions in predictive modeling.
Table A2. Average AUC and ACC across 10 folds of ML models with COVID EF WOGB_CR, by evaluated scenario and relevance groups.
AUC (%) and ACC (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WOGB_CR
1. D100
cnn72.9274.1074.1774.281.181.251.3567.5372.9372.6773.115.415.155.58
ebm81.3881.2981.4581.31−0.090.07−0.0776.5476.4176.5176.48−0.13−0.03−0.06
ffn72.7172.6073.3472.89−0.110.630.1867.4667.2967.4968.33−0.170.030.88
ffnlp72.1672.3271.6873.160.15−0.491.0066.8666.5663.5166.75−0.30−3.35−0.11
ffnsp72.4873.4473.7573.360.961.270.8865.1868.2667.0269.483.091.844.30
knn66.2165.9865.9865.99−0.23−0.23−0.2263.1962.8462.8262.84−0.35−0.36−0.35
lda66.0866.6867.1666.890.601.080.8163.1863.6463.9363.650.460.750.47
mlp70.9270.0771.0368.89−0.850.11−2.0360.2067.9663.4562.297.763.252.09
percep55.9454.8857.0455.72−1.061.10−0.2254.7759.5650.9257.584.78−3.852.81
qda66.8265.9764.7566.18−0.84−2.07−0.6450.7252.6652.5952.061.941.861.33
rf79.5379.0978.7578.99−0.44−0.78−0.5375.3475.3475.1775.210.01−0.16−0.13
ridge65.9566.4466.8966.660.500.940.7163.1463.4463.7763.540.300.630.40
xgb80.4380.6080.6480.680.170.210.2576.0076.1676.1376.240.160.130.24
xnn50.1450.7150.1550.190.560.000.0562.4866.6462.3754.504.17−0.10−7.98
2. D99
cnn73.0473.9574.1674.280.911.121.2468.5272.5672.9372.914.034.404.38
ebm81.2381.3181.4481.330.080.220.1076.3176.3976.4976.480.070.180.17
ffn72.5072.9572.4473.370.46−0.050.8764.3865.9364.9168.101.540.533.72
ffnlp72.7272.9273.3672.790.200.640.0768.0165.2969.5570.48−2.721.542.47
ffnsp73.1973.1272.3273.13−0.07−0.87−0.0667.8861.7568.8365.88−6.130.95−2.00
knn65.9466.0066.0066.000.060.060.0662.9262.5862.5862.59−0.35−0.34−0.33
lda66.0066.6767.1566.900.671.150.9063.1563.6364.0063.670.480.850.52
mlp68.7970.0670.2170.581.271.421.7970.5764.2064.2468.64−6.37−6.33−1.93
percep55.2856.6655.6456.591.370.361.3154.4556.4949.8055.102.04−4.650.65
qda66.7265.9564.6966.12−0.77−2.03−0.6050.5752.6552.7752.162.082.201.60
rf79.4178.9678.6078.91−0.46−0.81−0.5175.3175.1974.9075.24−0.13−0.41−0.07
ridge65.8566.4266.8766.650.571.020.8063.1363.4563.8263.580.320.690.45
xgb80.3580.6680.6580.610.320.300.2676.1376.2876.2176.250.150.080.12
xnn50.0850.7450.0049.900.67−0.08−0.1862.2265.3470.2966.023.128.073.80
3. D80
cnn73.2973.3273.5873.050.030.29−0.2468.0772.8673.2872.644.785.204.56
ebm80.7581.0581.1381.020.300.370.2776.3776.7276.7476.690.350.370.32
ffn71.1973.0572.6873.081.861.491.8863.4966.5366.5168.703.043.025.22
ffnlp72.2572.5771.9273.000.32−0.330.7464.3964.9664.6462.940.570.25−1.45
ffnsp71.7273.0772.2772.451.350.550.7268.2964.3964.7665.33−3.90−3.53−2.96
knn65.3765.3665.3665.37−0.01−0.01−0.0062.5462.5262.5162.53−0.02−0.03−0.00
lda65.8566.3566.7066.520.500.850.6763.2763.6463.9363.750.370.650.48
mlp69.1071.0769.1970.081.960.090.9762.9759.3852.3260.10−3.59−10.65−2.87
percep55.7156.8057.8655.391.092.16−0.3151.8155.3252.0550.983.510.24−0.83
qda66.6565.2963.0265.30−1.36−3.63−1.3550.9153.1250.7552.152.21−0.161.25
rf78.7178.5378.1278.56−0.18−0.59−0.1575.2975.5475.3975.650.250.100.36
ridge65.4865.9766.2666.130.480.780.6563.1463.3763.6563.580.230.510.43
xgb79.5780.1680.1680.200.590.590.6375.9276.4576.4476.450.540.520.54
xnn50.7650.5749.8949.89−0.19−0.87−0.8656.7155.7971.1966.89−0.9214.4810.18
Note: This table reports the average AUC and ACC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with COVID-19-related EF, but excluding credits with government-backed benefits (WOGB_CR). Notable observations include the following: 1. Models such as EBM and XGB maintain robust predictive capabilities, achieving AUC values above 80%, with minimal performance fluctuation in the absence of government-benefit data with EF. 2. Both CNN and MLP exhibit marked improvements in ACC, with the CNN model reaching gains of up to 5.58 percentage points in certain scenarios, thereby underscoring the impact of EF enrichment on classification accuracy. 3. The exclusion of government-benefit information adversely affects some models (e.g., RF, QDA, and XNN), which register declines in both AUC and ACC. This variability suggests that these models are more sensitive to the absence of such interventions. 4. Overall, these findings reveal that omitting data on governmental support introduces notable variability and potential biases in the models’ predictive performance—while some models capitalize on reduced bias, others suffer from diminished input granularity. Collectively, these results illuminate the nuanced effects of excluding government-benefit data on credit risk classification and emphasize the integral role that such interventions play in mitigating bias and enhancing model accuracy.
Table A3. Average KS and F1 across 10 folds of ML models with COVID EF WGB_CR, by evaluated scenario and relevance groups.
KS (%) and F1 (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
1. D100
cnn36.0036.3437.6537.160.341.651.1674.0478.4378.3078.284.394.264.24
ebm46.6948.6649.0248.591.972.331.9080.4981.3681.4881.420.860.990.92
ffn34.1034.2934.8134.480.190.710.3870.1573.6170.7272.973.460.572.82
ffnlp35.4435.1334.9534.57−0.30−0.49−0.8769.6271.8671.9269.802.242.300.18
ffnsp7.505.346.9511.19−2.16−0.553.6847.1469.0932.3955.4921.95−14.758.35
knn26.0425.5325.4925.50−0.51−0.55−0.5469.5569.3169.2769.28−0.24−0.28−0.26
lda24.3225.2426.2825.800.931.971.4869.9270.2570.5670.280.330.640.36
mlp30.6728.9430.3730.44−1.73−0.30−0.2365.2974.9957.9068.999.70−7.393.70
percep13.1413.6415.8415.290.502.702.1529.2662.0459.0540.4332.7829.7911.17
qda26.0225.5523.2525.70−0.47−2.77−0.3249.5553.0954.1652.103.544.612.55
rf49.2748.8848.1948.76−0.39−1.07−0.5081.7082.1182.1382.150.410.430.45
ridge24.3325.2226.2825.790.881.951.4630.2730.9431.7331.390.681.461.12
xgb47.8848.2148.3148.120.330.430.2481.0381.4581.4881.450.430.450.42
xnn0.680.010.250.04−0.67−0.43−0.6473.6179.3372.1071.515.72−1.50−2.09
2. D99
cnn35.9636.5736.0136.500.610.040.5371.6578.7174.9478.327.053.296.67
ebm46.3348.3948.6548.272.062.321.9480.6781.3381.4781.270.660.790.60
ffn34.1035.2134.9034.311.110.800.2271.2272.9570.9973.041.73−0.221.82
ffnlp35.6433.8536.0334.20−1.780.39−1.4471.8470.9472.0269.87−0.900.18−1.97
ffnsp3.653.422.395.27−0.23−1.261.6230.7837.4038.0836.476.627.305.69
knn25.8525.6125.6225.65−0.24−0.23−0.2069.7069.3269.3369.35−0.38−0.37−0.35
lda24.2425.1726.1425.730.931.901.4970.0070.3070.5870.370.300.580.37
mlp28.4429.2630.1631.090.811.712.6466.9868.2269.2571.161.242.274.18
percep9.5612.659.0012.143.09−0.562.5855.6064.3853.5047.508.78−2.10−8.10
qda25.8625.5223.1025.68−0.34−2.77−0.1849.6053.0654.0352.023.464.432.42
rf48.9448.2948.0948.41−0.65−0.85−0.5281.7182.0681.9982.130.350.280.42
ridge24.2225.1726.1425.720.961.921.5030.2631.1331.9531.590.871.701.33
xgb47.7848.0448.0148.070.250.220.2981.1181.4781.4481.390.360.330.29
xnn0.470.260.460.21−0.21−0.00−0.2656.1564.2764.6556.028.128.51−0.13
3. D80
cnn34.9935.9435.8836.670.950.891.6773.5878.9077.7978.715.314.205.13
ebm45.9748.0548.3148.002.082.352.0381.0681.9482.0681.980.891.000.92
ffn31.0032.9833.9932.501.982.981.5073.8465.8369.2572.99−8.01−4.59−0.85
ffnlp32.1432.5334.7033.980.392.551.8367.5672.3975.0873.504.837.525.94
ffnsp4.749.544.964.684.800.22−0.0636.5634.0937.5228.57−2.470.96−7.99
knn25.0925.0725.0825.07−0.02−0.01−0.0269.8169.7869.7769.79−0.03−0.04−0.02
lda24.1824.9925.5225.470.821.351.2970.4970.7671.0670.930.270.570.44
mlp29.3228.1230.9930.30−1.211.670.9766.4675.9273.2265.009.466.76−1.46
percep12.469.7715.0411.38−2.692.59−1.0849.7659.2544.5159.759.49−5.259.99
qda26.2524.5220.3824.33−1.73−5.87−1.9251.2455.0353.5854.363.802.343.12
rf48.1648.2147.4447.940.06−0.72−0.2281.8782.6682.6082.600.790.730.73
ridge24.1724.9825.5325.450.811.361.2831.3332.1832.8832.360.851.551.03
xgb47.0347.8347.6647.800.800.630.7781.4082.0282.0582.020.620.650.62
xnn1.731.530.210.06−0.20−1.51−1.6739.4971.3072.9572.2531.8133.4532.75
Note: This table summarizes the average KS and F1 metrics obtained across 10 folds for machine learning models applied in binary classification scenarios (good and bad payers). The analysis is focused on datasets enriched with EF related to COVID-19, specifically scenarios including credits with governmental benefits (WGB_CR). Key findings include the following: 1. Models such as EBM and RF demonstrate consistently strong KS and F1 scores, indicating robust performance under EF-influenced scenarios. 2. The CNN model shows high F1 improvements, with increases of over 4 percentage points, highlighting its sensitivity to enriched datasets. 3. Conversely, models such as KNN and XNN show limited or negative responsiveness to EF enrichment. 4. The inclusion of governmental benefits as part of the COVID-19 EF notably improves the classification metrics, underscoring the importance of contextual variables in predictive modeling.
Table A4. Average KS and F1 across 10 folds of ML models with COVID EF WOGB_CR, by evaluated scenario and relevance groups.
KS (%) and F1 (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WOGB_CR
1. D100
cnn35.3936.9238.5537.231.523.151.8474.2282.1982.2581.727.978.037.50
ebm47.0148.8249.2148.701.822.211.6983.3383.9683.9883.970.630.650.64
ffn35.6335.5635.6435.15−0.060.02−0.4873.8073.8970.7166.740.09−3.09−7.06
ffnlp35.1934.3536.4936.33−0.841.301.1474.4572.7673.7373.33−1.69−0.72−1.12
ffnsp0.005.001.232.585.001.232.5841.3170.6774.2530.4829.3632.94−10.84
knn24.8924.3424.3324.34−0.55−0.56−0.5671.0770.6970.6770.69−0.38−0.40−0.38
lda22.9523.9324.6124.390.981.651.4371.6671.8772.1871.930.210.520.27
mlp29.5330.3127.9533.210.78−1.583.6776.5776.4257.4965.01−0.15−19.07−11.55
percep10.329.3212.2110.34−1.001.890.0255.3365.6048.9764.2110.27−6.368.88
qda25.5523.9321.3123.99−1.62−4.24−1.5651.0654.9955.6853.723.924.622.66
rf45.8645.2144.5744.64−0.65−1.29−1.2282.6983.4283.3383.520.730.640.84
ridge22.9623.9324.6024.330.971.641.3732.9333.4734.1533.830.541.220.90
xgb47.3247.6747.7047.640.340.380.3183.3283.6983.6983.730.370.370.41
xnn1.520.590.490.38−0.94−1.04−1.1440.0359.1471.8250.7319.1131.7910.70
2. D99
cnn35.8837.5135.5737.841.63−0.311.9674.7782.0781.9181.997.307.147.22
ebm46.5848.9349.3748.822.352.792.2483.3583.9883.9983.970.630.640.62
ffn36.1434.3736.1935.45−1.770.05−0.6970.6776.8674.8669.896.194.18−0.78
ffnlp35.5734.9235.6034.61−0.650.04−0.9675.9771.1368.2772.44−4.84−7.70−3.53
ffnsp0.4011.952.334.7611.541.934.3541.3749.0854.6043.607.7113.232.22
knn24.2023.9423.9523.97−0.27−0.25−0.2370.9370.4770.4770.48−0.46−0.46−0.45
lda22.9923.7824.5024.210.791.521.2271.7071.8972.2472.020.190.550.32
mlp29.0829.6630.3430.190.581.261.1170.3253.4967.3060.32−16.83−3.02−10.00
percep10.4111.6310.3111.611.22−0.101.2058.5862.0347.1058.863.45−11.480.28
qda25.4323.9721.2323.91−1.46−4.20−1.5250.9755.0156.0353.984.045.063.01
rf45.5245.0944.3044.72−0.43−1.22−0.8082.8183.4283.3883.400.600.570.59
ridge22.9823.7224.4724.220.741.491.2332.9133.4134.1933.870.511.280.96
xgb47.1347.6647.7347.790.530.610.6783.4883.8283.7783.790.340.280.31
xnn1.020.000.000.03−1.02−1.01−0.9874.6174.3082.5566.02−0.327.94−8.59
3. D80
cnn35.7336.3137.1336.870.581.401.1576.6381.9982.3282.065.365.705.43
ebm45.5547.9748.3047.922.432.752.3883.6284.4384.4584.420.810.830.81
ffn33.4135.0034.0534.861.590.641.4577.4975.9576.7775.96−1.53−0.71−1.53
ffnlp33.3834.8735.5335.391.482.142.0069.6175.8176.7672.696.207.153.08
ffnsp0.014.620.213.164.600.203.1558.3252.3966.6632.04−5.948.33−26.28
knn23.4223.3923.3523.40−0.03−0.06−0.0270.8470.8270.8170.83−0.02−0.03−0.01
lda22.5923.2723.7523.420.681.160.8372.0072.1972.4372.370.190.420.36
mlp30.4232.0229.3531.941.60−1.071.5264.5681.4859.4877.5816.92−5.0913.01
percep10.3911.0113.1810.160.632.79−0.2350.3158.9354.1850.578.613.870.26
qda24.9522.5018.7622.76−2.45−6.18−2.1952.4256.9554.0655.544.531.643.12
rf44.1444.1943.5043.920.05−0.64−0.2283.0583.9083.9183.910.850.860.87
ridge22.5723.2623.7323.410.681.160.8333.3934.2734.8334.400.881.441.01
xgb45.7946.9446.7847.091.150.991.3083.5784.2284.2284.230.660.650.66
xnn0.740.470.970.51−0.270.23−0.2366.6759.7670.1367.23−6.913.460.56
Note: This table reports the average KS and F1 across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with COVID-19-related EF but excluding credits with government-backed benefits (WOGB_CR). Notable observations include the following: 1. Models such as EBM and XGB maintain robust predictive capabilities, achieving KS values above 45%, with minimal performance fluctuation in the absence of government-benefit data with EF. 2. Both CNN and MLP exhibit marked improvements in F1, with the CNN model reaching gains of up to 8.03 percentage points in certain scenarios, thereby underscoring the impact of EF enrichment on classification accuracy. 3. The exclusion of government-benefit information adversely affects some models (e.g., RF, QDA, XNN), which register declines in both KS and F1. This variability suggests that these models are more sensitive to the absence of such interventions. 4. Overall, these findings reveal that omitting data on governmental support introduces notable variability and potential biases in the models’ predictive performance—while some models capitalize on reduced bias, others suffer from diminished input granularity. Collectively, these results illuminate the nuanced effects of excluding government-benefit data on credit risk classification and emphasize the integral role that such interventions play in mitigating bias and enhancing model accuracy.
Table A5. Average AUC and ACC across 10 folds of ML models with COVID MOV EF WGB_CR, by evaluated scenario and relevance groups.
AUC (%) and ACC (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WGB_CR
1. D100
cnn72.6373.3474.2574.040.711.611.4166.8569.8070.3669.982.953.513.13
ebm81.1381.2381.4081.270.100.280.1474.8474.9474.9374.950.090.090.10
ffn71.7372.7973.1173.021.051.371.2864.0167.5665.1365.753.551.121.73
ffnlp71.5871.9672.8372.170.381.250.5864.9665.4465.4366.570.480.471.61
ffnsp72.3572.1773.0772.46−0.180.720.1064.9366.3265.7966.451.390.861.52
knn67.2166.9266.8567.03−0.29−0.37−0.1863.2363.0062.9263.03−0.22−0.30−0.20
lda67.2368.0168.5668.270.781.331.0463.0963.7464.1263.850.651.030.76
mlp69.3070.8371.6270.011.532.310.7160.8263.2662.9360.512.442.12−0.31
percep57.9658.4458.1656.560.480.20−1.4044.5857.9947.0854.0313.412.509.45
qda67.4466.9865.8667.20−0.46−1.58−0.2452.5054.0154.0853.821.521.581.32
rf81.7481.5181.2081.46−0.23−0.54−0.2875.5175.4475.2975.47−0.07−0.22−0.04
ridge66.9467.5568.0767.830.621.140.8962.9363.3963.7863.550.460.850.62
xgb80.7681.0081.0581.090.230.280.3374.7774.9875.0075.050.210.230.28
xnn50.2150.4250.6949.860.210.47−0.3553.5264.6258.1162.3211.094.598.79
2. D99
cnn72.8073.4074.0673.790.601.250.9868.6070.0769.7970.221.481.191.63
ebm81.0481.2481.4081.270.190.360.2374.7774.9375.0174.950.160.240.19
ffn72.8272.5072.9072.94−0.330.080.1165.0767.3965.8167.262.310.742.19
ffnlp72.9372.3373.5071.57−0.600.57−1.3564.7966.3366.3666.761.541.561.96
ffnsp71.8872.7372.1372.970.850.251.0966.2366.6165.9767.170.38−0.260.94
knn67.1066.8867.1066.91−0.220.00−0.1963.2463.0263.1763.06−0.23−0.08−0.18
lda67.1767.9268.5368.220.751.361.0563.0763.6864.1263.790.601.050.72
mlp69.7968.9369.5769.17−0.86−0.22−0.6263.8364.1260.7758.270.29−3.06−5.56
percep55.0957.6753.7959.212.58−1.304.1255.4858.5655.1348.253.07−0.36−7.23
qda67.3266.9065.8067.15−0.42−1.52−0.1652.4153.9853.8953.671.571.491.26
rf81.6881.3381.1881.41−0.35−0.50−0.2775.6375.3275.3275.40−0.31−0.31−0.23
ridge66.8567.4768.0467.780.621.190.9362.9463.3963.7763.490.450.820.55
xgb80.7280.9681.1081.110.240.380.4074.8174.9775.1975.000.160.380.19
xnn49.9150.1150.0049.880.200.09−0.0256.2562.8662.6656.146.626.42−0.11
3. D80
cnn71.9873.1873.1973.241.211.211.2666.5170.2770.0670.113.753.553.60
ebm80.7381.0381.1781.060.290.440.3274.9275.1675.2775.200.240.350.28
ffn71.9771.9572.6271.74−0.020.65−0.2365.3864.7665.9667.04−0.620.591.67
ffnlp70.3172.0872.5170.731.772.200.4266.2367.1166.0365.580.89−0.20−0.65
ffnsp70.2771.8972.1672.291.621.892.0266.2065.4364.9165.77−0.77−1.29−0.43
knn66.5766.5766.5766.57−0.01−0.01−0.0062.9862.9562.9562.97−0.02−0.03−0.01
lda67.3367.9068.2968.070.570.960.7463.4563.8864.2064.100.430.750.65
mlp67.9668.8568.4767.870.890.52−0.0858.0963.1661.0958.345.072.990.25
percep57.4455.3956.6857.06−2.05−0.76−0.3852.8857.2259.5757.334.356.694.45
qda67.6966.5164.0966.23−1.18−3.60−1.4652.7654.2852.6154.331.52−0.151.57
rf81.1881.0380.6480.99−0.15−0.55−0.2075.4775.5775.2575.520.11−0.210.05
ridge66.6767.2367.5967.400.560.920.7363.0663.3563.7463.540.290.680.48
xgb80.2080.7580.7780.800.540.570.6074.7475.1375.2775.180.390.520.44
xnn50.4150.3950.0450.27−0.02−0.36−0.1453.2465.6960.3363.6112.457.0910.37
Note: This table reports the average AUC and ACC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with COVID-19-related EF under the COVID MOV scenario, where the time series for fatalities was shifted by one period. Notable observations include the following: 1. Models such as EBM, RF, and XGB demonstrate strong predictive performance, achieving AUC values exceeding 80% across all scenarios, with minimal impact from the addition of EF data. 2. The CNN model exhibits consistent improvements in both AUC and ACC, with accuracy gains of up to 3.75 percentage points, highlighting its robustness to temporal adjustments in EF data. 3. Models such as PERCEP and XNN show significant sensitivity to the temporal shift, with marked fluctuations in both AUC and ACC. These results suggest potential challenges in handling shifted time-series data, possibly because of their reliance on specific temporal patterns. 4. Among the classical algorithms, LDA and RIDGE show modest but consistent improvements across scenarios, underscoring their adaptability to enriched datasets with temporal modifications. 5. Overall, the COVID MOV experiment underscores the critical influence of temporal changes in external factors. This experiment demonstrates that variables derived from the same EF but influenced by their temporal dynamics must be carefully considered in the experimental design to optimize the predictive performance of ML models. These findings highlight the nuanced relationship between temporal adjustments in EF data and model performance, emphasizing the importance of robust experimental frameworks for mitigating variability and maximizing predictive accuracy.
Table A6. Average AUC and ACC across 10 folds of ML models with COVID MOV EF WOGB_CR, by evaluated scenario and relevance groups.
AUC (%) and ACC (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WOGB_CR
1. D100
cnn73.3174.1874.6174.100.871.300.8069.2672.8373.0572.783.573.803.52
ebm81.3881.2981.4981.40−0.090.110.0276.5476.4176.6676.52−0.130.12−0.02
ffn71.6673.0873.4573.191.421.791.5365.0869.3466.1262.944.261.03−2.14
ffnlp73.3273.0373.6671.79−0.290.34−1.5368.1370.6166.0363.722.48−2.09−4.41
ffnsp73.5172.1073.2072.72−1.41−0.30−0.7968.4766.4464.6767.81−2.03−3.80−0.66
knn66.2165.9865.8965.90−0.23−0.32−0.3163.1962.8462.6562.70−0.35−0.54−0.48
lda66.0866.6867.1866.950.601.100.8763.1863.6464.0063.700.460.820.52
mlp70.9270.0770.0269.70−0.85−0.90−1.2260.2067.9661.7461.317.761.541.10
percep55.9454.8855.1957.55−1.06−0.751.6154.7759.5650.0956.794.78−4.682.02
qda66.8265.9764.7466.34−0.84−2.08−0.4850.7252.6652.8052.241.942.081.51
rf79.5379.0978.4979.01−0.44−1.03−0.5275.3475.3474.9375.340.01−0.400.00
ridge65.9566.4466.9066.710.500.950.7763.1463.4463.7963.560.300.650.42
xgb80.4380.6080.6580.700.170.220.2676.0076.1676.1876.340.160.180.34
xnn49.9650.0049.8549.920.04−0.11−0.0558.1966.1765.9157.937.987.72−0.27
2. D99
cnn73.0674.0374.6274.770.971.561.7168.6672.7673.1273.244.104.474.58
ebm81.2381.3181.4581.340.080.220.1176.3176.3976.5176.450.070.200.14
ffn71.4073.3173.2073.081.911.801.6865.4567.4264.3965.511.98−1.050.06
ffnlp72.7972.7973.3272.590.010.53−0.2067.1567.4266.7166.480.27−0.44−0.66
ffnsp71.9471.9573.5371.600.011.59−0.3360.0564.2166.1365.704.176.085.65
knn65.9466.0065.8165.690.06−0.13−0.2562.9262.5862.6762.59−0.35−0.25−0.33
lda66.0066.6767.1166.880.671.110.8863.1563.6364.0063.680.480.850.53
mlp68.7970.0669.6169.621.270.820.8370.5764.2058.6154.19−6.37−11.96−16.38
percep55.2856.6656.6556.401.371.371.1254.4556.4957.6960.592.043.246.14
qda66.7265.9564.7566.15−0.77−1.97−0.5750.5752.6552.6051.892.082.031.32
rf79.4178.9678.6178.90−0.46−0.80−0.5175.3175.1974.9475.23−0.13−0.37−0.08
ridge65.8566.4266.8266.640.570.970.7963.1363.4563.8263.490.320.690.35
xgb80.3580.6680.6180.560.320.260.2276.1376.2876.1576.200.150.020.07
xnn49.8949.8950.0050.220.000.120.3353.9270.0966.2266.5716.1712.3012.65
3. D80
cnn72.5873.3674.0473.730.781.461.1566.4272.6673.5873.196.257.166.78
ebm80.7581.0581.1181.060.300.360.3176.3776.7276.7376.730.350.360.36
ffn72.4772.1573.0771.92−0.320.59−0.5564.0867.1965.9168.573.111.834.49
ffnlp70.2672.5673.1771.302.312.921.0461.7567.3766.6463.255.624.891.50
ffnsp71.9972.7072.6672.220.720.670.2369.5467.2468.9967.02−2.30−0.56−2.52
knn65.3765.3665.3665.37−0.01−0.01−0.0062.5462.5262.5162.54−0.02−0.030.01
lda65.8566.3566.7066.510.500.850.6663.2763.6463.9463.750.370.660.47
mlp69.1071.0770.6371.451.961.532.3462.9759.3860.6565.13−3.59−2.332.16
percep55.7156.8057.6559.301.091.943.5951.8155.3250.6958.553.51−1.126.74
qda66.6565.2963.2365.24−1.36−3.42−1.4150.9153.1251.3952.642.210.491.73
rf78.7178.5378.0878.54−0.18−0.63−0.1775.2975.5475.3675.570.250.080.28
ridge65.4865.9766.2666.120.480.780.6363.1463.3763.6363.570.230.490.43
xgb79.5780.1680.2180.220.590.640.6575.9276.4576.5176.390.540.590.47
xnn50.3850.2350.5650.42−0.150.180.0467.9863.9269.1959.87−4.061.20−8.11
Note: This table reports the average AUC and ACC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with COVID-19-related EF under the COVID MOV scenario, where the time series for fatalities was shifted by one period and government-backed credits were excluded. Notable observations include the following: 1. Models such as EBM, RF, and XGB maintain consistent predictive performance, achieving AUC values above 80% across scenarios, with minimal impact from the exclusion of government-backed credits. 2. The CNN model demonstrates substantial improvements in ACC, with gains of up to 4.58 percentage points, further emphasizing its robustness in adapting to temporal adjustments and reduced data granularity. 3. Neural network-based models such as MLP and XNN exhibit contrasting behaviors; while MLP shows significant declines in both AUC and ACC in certain scenarios, XNN achieves notable gains in accuracy (up to 16.17 percentage points) under specific conditions. 4. Classical algorithms such as LDA and RIDGE consistently show moderate improvements, reflecting their adaptability to datasets that exclude external financial support. 5. Models such as FFNLP and FFNSP are sensitive to the removal of government-backed credits, with performance fluctuations in both AUC and ACC, underscoring the importance of these credits in predictive accuracy for some architectures. 6. Overall, the results reveal that excluding government-backed credits introduces variability in predictive performance, with tree-based models (RF, XGB) and the CNN model demonstrating greater resilience, whereas some neural network models face challenges adapting to the reduced dataset complexity. These findings highlight the nuanced effects of excluding government-backed credits on model performance and demonstrate the introduced bias. They also highlighted the need for robust EF and scenario-specific tuning to optimize the predictive accuracy.
Table A7. Average KS and F1 across 10 folds of ML models with COVID MOV EF WGB_CR, by evaluated scenario and relevance groups.
KS (%) and F1 (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WGB_CR
1. D100
cnn35.9336.4037.9337.610.482.011.6873.5078.4178.7478.254.915.244.75
ebm46.6948.6648.9048.671.972.211.9880.4981.3681.4481.310.860.940.82
ffn34.6934.0835.4234.02−0.610.73−0.6771.6271.2071.9671.10−0.420.34−0.53
ffnlp35.1033.5835.2835.56−1.520.170.4670.6270.7369.8370.580.11−0.79−0.03
ffnsp4.957.365.474.802.410.52−0.1544.4335.6752.8443.98−8.768.41−0.45
knn26.0425.5325.3825.68−0.51−0.65−0.3669.5569.3169.2369.29−0.24−0.31−0.26
lda24.3225.2426.3325.910.932.011.5969.9270.2570.6470.390.330.720.47
mlp30.6731.7630.3530.771.09−0.320.0965.2969.1764.4268.993.88−0.873.70
percep13.1413.6413.1212.610.50−0.02−0.5329.2662.0436.1553.4532.786.8924.19
qda26.0225.5523.4325.72−0.47−2.59−0.3049.5553.0954.0852.183.544.532.63
rf49.2748.8848.0848.57−0.39−1.18−0.7081.7082.1182.1382.100.410.430.40
ridge24.3325.2226.3225.890.881.991.5630.2730.9431.7631.650.681.491.38
xgb47.8848.2148.1148.150.330.230.2781.0381.4581.4781.480.430.440.46
xnn0.030.640.020.040.61−0.010.0163.6073.1247.5963.549.52−16.01−0.06
2. D99
cnn35.8636.3537.8137.220.491.961.3772.2378.0778.2778.145.846.045.91
ebm46.3348.3948.8248.142.062.491.8180.6781.3381.5081.390.660.820.72
ffn33.4934.5035.5635.311.012.071.8266.8272.0870.0172.455.263.195.63
ffnlp34.7132.8235.2534.09−1.890.55−0.6270.7073.1471.9969.722.441.29−0.98
ffnsp2.497.292.463.074.81−0.030.5930.0135.2536.7338.745.256.728.73
knn25.8525.6125.9025.66−0.240.06−0.1969.7069.3269.4669.38−0.38−0.24−0.32
lda24.2425.1726.2625.780.932.011.5470.0070.3070.6770.380.300.670.38
mlp28.4429.2632.1429.490.813.701.0466.9868.2271.8163.571.244.83−3.42
percep9.5612.659.5214.943.09−0.045.3855.6064.3856.3538.098.780.75−17.51
qda25.8625.5223.2325.59−0.34−2.63−0.2749.6053.0653.7252.023.464.122.42
rf48.9448.2948.0548.64−0.65−0.89−0.3081.7182.0682.1382.150.350.420.45
ridge24.2225.1726.2725.780.962.051.5630.2631.1331.7631.650.871.501.40
xgb47.7848.0448.2448.430.250.450.6581.1181.4781.6581.490.360.540.38
xnn0.230.100.850.97−0.130.610.7479.3247.9176.8075.84−31.41−2.52−3.48
3. D80
cnn34.9436.1536.8736.251.211.931.3169.3978.6279.1779.289.239.779.88
ebm45.9748.0548.3048.012.082.342.0581.0681.9482.0681.930.891.000.88
ffn33.8730.6933.2933.45−3.18−0.59−0.4270.9870.9862.9568.350.00−8.03−2.63
ffnlp32.0133.2033.7433.501.191.721.4972.9571.8173.4569.45−1.140.50−3.51
ffnsp10.7010.027.704.07−0.68−3.00−6.6351.2664.3948.9755.3613.13−2.294.10
knn25.0925.0725.0725.08−0.02−0.02−0.0169.8169.7869.7769.80−0.03−0.04−0.01
lda24.1824.9925.5725.430.821.401.2670.4970.7671.0970.900.270.600.41
mlp29.3228.1230.8931.28−1.211.571.9566.4675.9269.7662.119.463.30−4.35
percep12.469.7710.9711.87−2.69−1.49−0.5849.7659.2564.8261.349.4915.0611.58
qda26.2524.5220.4824.00−1.73−5.78−2.2551.2455.0353.4155.553.802.184.31
rf48.1648.2147.3547.910.06−0.81−0.2581.8782.6682.6282.640.790.750.76
ridge24.1724.9825.5525.420.811.381.2531.3332.1832.8932.310.851.560.98
xgb47.0347.8347.7247.710.800.680.6881.4082.0282.1482.060.620.740.66
xnn1.790.001.820.45−1.790.03−1.3473.5780.3178.8272.136.755.26−1.43
Note: This table reports the average KS and F1 across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with COVID-19-related EF under the COVID MOV scenario, where the time series for fatalities was shifted by one period. Notable observations include the following: 1. Models such as EBM, RF, and XGB demonstrate strong predictive performance, achieving KS values exceeding 45% and F1 scores above 80% across most scenarios, with minimal impact from the addition of EF data. 2. The CNN model exhibits consistent improvements in both KS and F1, with F1 score gains of up to 9.88 percentage points, highlighting its robustness to temporal adjustments in EF data. 3. Models such as PERCEP and XNN show significant sensitivity to the temporal shift, with marked fluctuations in both KS and F1. These results suggest potential challenges in handling shifted time-series data, possibly because of their reliance on specific temporal patterns. 4. Among the classical algorithms, LDA and RIDGE show modest but consistent improvements across scenarios, underscoring their adaptability to enriched datasets with temporal modifications. 5. The FFNSP model displays extreme variability in performance, particularly in the F1 scores, ranging from 4.07% to 64.39% across different scenarios. 6. Overall, the COVID MOV experiment underscores the critical influence of temporal changes in external factors. This experiment demonstrates that variables derived from the same EF but influenced by their temporal dynamics must be carefully considered in the experimental design to optimize the predictive performance of ML models. These findings highlight the nuanced relationship between temporal adjustments in EF data and model performance, emphasizing the importance of robust experimental frameworks for mitigating variability and maximizing predictive accuracy.
Table A8. Average KS and F1 across 10 folds of ML models with COVID MOV EF WOGB_CR, by evaluated scenario and relevance groups.
KS (%) and F1 (%). For each metric, the scenario columns are SFE, P, F + P, and F, followed by the result columns P - SFE, F + P - SFE, and F - SFE.
WOGB_CR
1. D100
cnn36.1137.3638.5237.331.252.421.2375.9782.0182.1981.776.056.225.80
ebm47.0148.8249.1048.981.822.101.9783.3383.9684.0284.020.630.680.69
ffn35.8133.7535.8635.51−2.060.05−0.3075.1871.7174.1674.06−3.47−1.02−1.13
ffnlp35.8434.8834.5335.82−0.95−1.31−0.0272.4373.5375.6467.771.103.21−4.66
ffnsp4.667.410.510.002.75−4.16−4.6653.2951.7141.2557.73−1.58−12.044.44
knn24.8924.3423.9123.98−0.55−0.98−0.9171.0770.6970.5470.62−0.38−0.54−0.46
lda22.9523.9324.6124.240.981.651.2871.6671.8772.2071.980.210.540.31
mlp29.5330.3132.0228.570.782.48−0.9676.5776.4269.6162.90−0.15−6.96−13.66
percep10.329.329.5412.82−1.00−0.782.5055.3365.6046.2861.2810.27−9.055.95
qda25.5523.9321.3524.43−1.62−4.20−1.1251.0654.9955.9853.963.924.922.89
rf45.8645.2143.9344.97−0.65−1.93−0.8982.6983.4283.3683.390.730.670.70
ridge22.9623.9324.5824.210.971.621.2532.9333.4734.1033.690.541.160.76
xgb47.3247.6747.6447.480.340.320.1683.3283.6983.7383.820.370.400.50
xnn1.020.020.000.17−1.00−1.02−0.8543.9166.0174.2565.9722.1030.3422.06
2. D99
cnn36.4936.4637.8637.35−0.031.370.8574.6981.7482.0181.967.067.327.27
ebm46.5848.9349.1048.742.352.522.1683.3583.9884.0384.010.630.680.66
ffn35.4635.0135.5235.71−0.450.070.2673.0468.6875.9775.23−4.362.932.19
ffnlp32.0834.4436.3333.572.364.251.4975.4769.0473.5473.22−6.43−1.93−2.25
ffnsp0.512.384.644.671.874.124.1541.3737.0735.6245.06−4.31−5.763.68
knn24.2023.9423.8723.68−0.27−0.33−0.5270.9370.4770.6070.55−0.46−0.32−0.37
lda22.9923.7824.4324.270.791.441.2871.7071.8972.2671.940.190.560.24
mlp29.0829.6630.5030.280.581.431.2070.3253.4968.3273.70−16.83−1.993.39
percep10.4111.6312.0610.781.221.650.3758.5862.0361.8966.683.453.318.10
qda25.4323.9721.1524.20−1.46−4.28−1.2450.9755.0155.7453.514.044.772.54
rf45.5245.0944.2345.05−0.43−1.29−0.4782.8183.4283.3683.350.600.550.54
ridge22.9823.7224.4324.250.741.441.2632.9133.4134.2533.940.511.351.03
xgb47.1347.6647.7447.500.530.620.3783.4883.8283.7483.750.340.260.27
xnn0.820.000.120.00−0.81−0.69−0.8266.4874.3082.4782.537.8215.9916.05
3. D80
cnn35.5036.0436.8436.650.531.341.1573.1382.6282.5581.959.499.428.83
ebm45.5547.9748.2747.982.432.722.4383.6284.4384.4684.400.810.840.79
ffn34.4234.0834.1134.17−0.34−0.31−0.2576.3369.5674.9767.61−6.76−1.35−8.71
ffnlp33.6732.1735.1034.43−1.491.430.7671.0672.6770.8574.391.60−0.213.32
ffnsp2.260.019.655.17−2.257.402.9137.7016.6650.7155.02−21.0413.0117.31
knn23.4223.3923.3723.41−0.03−0.05−0.0170.8470.8270.8170.84−0.02−0.030.00
lda22.5923.2723.7323.450.681.140.8672.0072.1972.4172.400.190.410.40
mlp30.4232.0231.3429.191.600.92−1.2364.5681.4861.7271.1116.92−2.846.55
percep10.3911.0112.6514.680.632.264.2950.3158.9350.5363.938.610.2213.62
qda24.9522.5019.0522.58−2.45−5.90−2.3752.4256.9555.2056.384.532.773.96
rf44.1444.1943.6243.980.05−0.52−0.1683.0583.9083.8583.860.850.800.82
ridge22.5723.2623.7423.450.681.170.8733.3934.2734.7834.400.881.401.01
xgb45.7946.9447.0747.151.151.281.3683.5784.2284.2884.180.660.710.62
xnn0.250.000.010.05−0.25−0.25−0.2066.5074.9966.6766.628.490.180.12
Note: This table presents the average KS and F1 scores across ten folds for binary credit classification (good vs. bad payers) in datasets excluding credits with government-backed benefits (WOGB_CR) under the COVID MOV scenario, where the time series for fatalities was shifted by one period. Key findings include the following: 1. Models like EBM, RF, and XGB maintain robust performance, with KS values consistently above 45% and F1 scores exceeding 83% across all scenarios. The temporal shift in EF data has minimal impact on these models, demonstrating their stability. 2. The FFNSP model exhibits extreme performance fluctuations, particularly in D80, with KS ranging from 0.01% to 9.65% and F1 from 16.66% to 55.02%. This suggests sensitivity to both temporal shifts and dataset characteristics. 3. The XNN model shows significant variability in D100 and D99, with F1 scores ranging from 43.91% to 74.25% but stabilizing in D80. This indicates that its performance is highly dependent on the temporal alignment of EF data. 4. LDA and RIDGE exhibit modest but consistent improvements, with F1 scores increasing by up to 1.40 percentage points when EF data are included, highlighting their reliability in unbalanced datasets. 5. The MLP model shows significant declines in D99 (F1 drop of 16.83 percentage points in P - SFE), suggesting potential overfitting or sensitivity to specific temporal configurations. 6. The WOGB_CR results reveal more pronounced variability compared to WGB_CR, particularly for models like PERCEP and FFNSP, underscoring the challenges of unbalanced datasets in temporal shift scenarios. These observations emphasize the importance of dataset balance and temporal alignment in model performance, particularly for algorithms sensitive to data distribution shifts. The results advocate for careful model selection and scenario testing in credit risk applications involving temporal EF adjustments.
Table A9. Average AUC across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WGB_CR, by evaluated scenario and relevance groups.
AUC (%). Scenario columns: SFE, T + W, T, U + T + W, U + T, U + W, U, W. Result columns: T + W - SFE, T - SFE, U + T + W - SFE, U + T - SFE, U + W - SFE, U - SFE, W - SFE.
WGB_CR
1. D100
cnn62.0564.7959.9364.1160.5962.6460.9064.562.74−2.122.05− 1.460.59−1.152.51
ebm75.0874.9975.0775.0175.0675.1975.1575.03−0.10−0.01−0.07− 0.020.110.07−0.05
ffn67.4267.3967.0767.3367.5567.9567.5967.12−0.03−0.35−0.090.130.530.17−0.30
ffnlp67.6667.4267.9567.8567.4767.8967.5868.24−0.240.290.20− 0.190.23−0.080.58
ffnsp67.4568.1766.8368.2267.2167.1168.1767.530.72−0.620.77− 0.24− 0.330.720.08
knn61.0861.2261.2461.2561.2561.2461.2461.260.140.170.180.170.160.170.18
lda65.5565.8465.9265.9365.9866.0365.9965.830.300.370.390.430.490.440.28
mlp65.4064.9064.3963.9164.5465.0164.7964.58−0.51−1.01−1.49− 0.86− 0.39−0.62−0.82
percep54.8956.4555.8253.6552.7955.1753.0355.261.560.93−1.24− 2.090.28−1.860.37
qda67.2167.1466.5866.4266.2064.9865.5567.07−0.07−0.63−0.79− 1.01− 2.23−1.67−0.15
rf76.8377.0276.6776.7376.6176.3276.5476.810.19−0.16−0.10− 0.21− 0.50−0.28−0.02
ridge64.3064.6164.7164.7764.8264.8964.8564.620.300.400.470.520.590.540.32
xgb74.2175.3175.5475.2475.3675.5175.5275.471.091.331.031.151.301.311.26
xnn50.2751.0749.3750.7151.7452.5150.0750.350.79−0.900.441.472.24−0.200.08
2. D99
cnn62.5163.7161.4362.8861.6861.3261.3564.211.20−1.080.38− 0.83− 1.19−1.161.71
ebm74.8174.9074.9974.9875.0175.0875.0574.970.090.180.160.190.260.230.16
ffn68.0168.4268.1567.8567.5667.8367.7367.590.410.14−0.17− 0.45− 0.18−0.29−0.42
ffnlp67.8867.3567.8366.7868.2768.4067.3468.33−0.53−0.05−1.100.390.52−0.540.45
ffnsp67.7166.8568.1367.9967.8368.0867.1267.98−0.860.410.270.120.36−0.590.26
knn60.9261.1261.0161.0361.0361.0261.0261.030.200.100.120.110.100.100.11
lda65.5465.8865.9665.9666.0066.0766.0365.870.340.420.420.470.530.490.33
mlp64.6363.5664.8164.2665.5766.0262.2364.00−1.070.18−0.360.941.39−2.40−0.63
percep53.0857.1253.7855.4656.7155.1055.5656.064.040.702.383.622.012.482.98
qda67.1967.1366.6266.4266.2165.0165.5667.08−0.06−0.57−0.77− 0.98− 2.18−1.63−0.11
rf76.6576.8876.8576.9176.7576.5176.7676.890.230.200.260.09− 0.140.110.24
ridge64.3064.6264.7364.7864.8364.9164.8764.640.320.430.470.520.610.560.34
xgb74.2175.1675.6375.4375.4375.5175.3775.360.951.421.231.221.301.171.16
xnn50.7851.4951.6250.0050.7050.8850.3250.510.720.85−0.77− 0.080.11−0.45−0.27
3. D80
cnn62.2363.2959.6664.7261.5861.9861.4863.531.05−2.572.49− 0.66− 0.25−0.751.30
ebm74.7074.9075.1075.0275.0575.1575.1775.030.190.400.310.350.450.460.33
ffn66.8766.3567.6968.2367.4267.7368.1867.09−0.530.821.360.550.861.310.22
ffnlp68.5268.0667.9867.6868.1067.9867.1767.56−0.46−0.54−0.83− 0.42− 0.53−1.34−0.95
ffnsp67.7467.3167.3467.7067.4968.5267.8267.37−0.43−0.40−0.04− 0.250.780.08−0.37
knn61.0561.0461.0361.0461.0461.0461.0561.05−0.01−0.02−0.01− 0.01− 0.02−0.00−0.00
lda66.1866.2966.3566.3966.4366.4866.4266.260.110.170.210.240.290.240.08
mlp64.3663.7265.6064.2264.1965.1164.7165.86−0.641.24−0.14− 0.170.750.351.50
percep49.7853.4252.7151.4651.6553.6553.8950.503.642.931.681.873.874.110.72
qda67.3867.2766.7866.7166.5465.5466.1667.33−0.10−0.59−0.66− 0.84− 1.84−1.22−0.05
rf76.5076.8076.7576.8376.7976.4476.6076.850.300.250.340.29− 0.060.100.36
ridge64.8965.0165.1365.2365.2765.3365.2765.040.120.230.330.380.440.380.15
xgb74.1875.1475.6475.3775.5075.5975.3575.500.961.461.191.321.411.171.32
xnn51.3950.0050.0350.4850.0050.0050.8249.76−1.39−1.37−0.92− 1.40− 1.39−0.58−1.64
Note: This table reports the average AUC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change under scenarios that include government-backed guarantees (WGB_CR). Key findings include the following: 1. Models such as EBM, RF, and XGB consistently achieve AUC values greater than 75%, showing robust predictive performance. 2. Neural network models such as CNN benefit significantly from EF enrichment, with gains of up to 2.74 percentage points. 3. Classical models such as LDA and RIDGE demonstrate resilience to EF variability, with minimal AUC fluctuation. 4. Some models (MLP, XNN, QDA) show sensitivity to EF shifts, highlighting potential challenges with scenario-specific performance. 5. Including WGB_CR data appears to stabilize performance, with tree-based and neural network models benefiting the most.
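The fold-averaged AUC values in these tables follow the standard 10-fold procedure sketched below; the estimator, feature matrix, and class balance here are illustrative assumptions rather than the exact configuration used in the study.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Illustrative stand-in for an EF-enriched design matrix and default labels.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85], random_state=0)

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"Average AUC over 10 folds: {100 * np.mean(aucs):.2f}%")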
Table A10. Average ACC across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WGB_CR, by evaluated scenario and relevance groups.
ACC (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WGB_CR
1. D100
cnn52.6165.4184.9483.7584.3885.0185.1183.9512.8032.3231.1431.7732.4032.5031.34
ebm85.1785.0485.0084.9985.0585.0085.0184.98−0.13−0.16−0.17−0.12− 0.17−0.16−0.19
ffn64.4469.2357.3962.2659.0668.6163.6466.314.79−7.05−2.18−5.384.18−0.801.88
ffnlp67.8971.1366.2262.9264.4068.4367.2857.413.24−1.67−4.97−3.490.54−0.61−10.48
ffnsp63.3365.6972.2265.1062.5267.3065.6664.662.368.891.77−0.813.972.331.33
knn64.2864.3264.2664.2864.2764.2664.2564.260.04−0.020.00−0.01− 0.02−0.03−0.01
lda66.6266.8066.9966.8566.9467.0867.0366.840.180.370.220.320.460.400.22
mlp64.7561.5565.9470.3764.7562.9360.4755.55−3.201.195.62−0.00− 1.82−4.28−9.20
percep47.7645.1045.3555.4964.6746.9755.8046.33−2.66−2.427.7316.91− 0.798.03−1.43
qda63.1864.9666.3564.0664.2663.0563.8366.241.783.160.881.07− 0.130.653.06
rf83.9385.3485.6685.5585.6585.6685.6785.621.411.721.611.721.731.741.68
ridge65.5265.7465.9465.9466.0166.0566.0165.970.220.430.430.500.540.490.45
xgb83.7584.7884.9484.9284.9384.9985.0685.011.031.191.171.171.241.301.26
xnn69.5971.1876.4974.4677.0856.1862.8374.821.596.904.877.49−13.41−6.765.23
2. D99
cnn54.3761.2784.9880.9883.8685.0584.9883.426.9030.6126.6129.4830.6830.6129.05
ebm85.1885.0185.0084.9684.9984.9984.9884.98−0.17−0.18−0.22−0.19− 0.19−0.20−0.20
ffn64.9463.7860.0765.0457.6064.2962.5969.11−1.16−4.870.10−7.34− 0.65−2.354.16
ffnlp63.6761.1565.3467.5164.2664.0866.4366.05−2.521.673.830.590.412.762.37
ffnsp60.1964.8662.7764.3965.2769.7165.1762.924.672.594.205.089.524.982.73
knn64.3364.2564.1164.1464.1264.1164.1264.13−0.09−0.22−0.19−0.21− 0.22−0.21−0.21
lda66.6366.8566.9466.8266.8767.0466.9866.830.220.300.190.240.410.350.19
mlp50.9471.5170.3165.5064.5764.0268.4467.0020.5719.3714.5613.6313.0817.5116.06
percep56.2837.1858.0449.4742.8645.7347.0446.14−19.101.76−6.80−13.41−10.54−9.24−10.13
qda65.6364.6666.7264.4764.6463.3864.1466.64−0.971.09−1.17−0.99− 2.25−1.491.01
rf83.7785.3585.6085.4385.6185.6385.6185.551.591.841.671.841.871.851.78
ridge65.5665.7265.9265.9266.0166.0666.0365.960.160.360.360.450.500.470.40
xgb83.7184.7785.0184.8884.9784.9584.9384.981.061.301.181.261.241.221.27
xnn72.4678.3075.6978.1080.9973.4980.0671.885.843.225.648.531.037.60−0.59
3. D80
cnn55.0760.3284.8582.7585.0683.2185.1381.135.2629.7827.6830.0028.1430.0626.07
ebm84.9285.0285.0785.0285.0585.0885.0785.060.100.150.110.130.160.150.14
ffn64.8066.6067.4467.1568.3758.2766.2165.291.802.642.353.57− 6.531.410.49
ffnlp61.2662.8161.6669.2463.2963.7463.7165.191.550.407.982.042.482.453.93
ffnsp62.5670.9471.4564.2666.0461.7964.6462.418.388.891.703.48− 0.772.08−0.15
knn64.3864.3764.3564.3764.3664.3464.3564.37−0.01−0.04−0.02−0.02− 0.04−0.03−0.01
lda66.8267.0667.1467.1067.1667.3367.2267.020.240.320.270.340.500.400.20
mlp74.2161.5958.7661.6260.9259.3162.2755.87−12.62−15.45−12.60−13.30−14.90−11.94−18.34
percep75.0653.3454.0564.3370.0557.2961.2965.05−21.72−21.01−10.72−5.01−17.77−13.77−10.01
qda64.3865.2466.1462.8663.2362.2462.9966.210.861.76−1.53−1.16− 2.15−1.391.83
rf83.5685.2385.5785.4785.5985.5885.5685.481.672.011.922.042.022.011.93
ridge65.6365.9566.1466.1466.1966.3666.2266.130.310.500.500.560.730.590.50
xgb83.5784.7585.0384.9085.0184.9985.0484.961.171.461.321.441.421.461.38
xnn54.6478.1678.2079.4185.1971.1281.6161.0723.5223.5624.7730.5516.4826.976.43
Note: This table reports the average ACC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change under scenarios including government-backed guarantees (WGB_CR). Key findings include the following: 1. Tree-based models (EBM, RF, XGB) achieve accuracy values exceeding 84%, demonstrating stability and adaptability. 2. Neural networks such as CNN exhibit significant gains, with improvements of up to 32.50 percentage points in specific scenarios. 3. Classical models (LDA, RIDGE) show modest but consistent improvements, while KNN maintains stable performance. 4. The PERCEP model displays high variability, with gains of up to 8.03 percentage points but declines in other scenarios. 5. Scenarios combining social unrest and climate change EFs lead to notable gains for specific models (CNN, XNN). 6. Including WGB_CR enhances stability and consistency, particularly benefiting neural networks and tree-based models. As in the other appendix tables, the result columns report each scenario's difference from the SFE baseline in percentage points (see the sketch below).
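The following minimal sketch shows how the result columns relate to the scenario columns, using the CNN ACC values for D100 from the table above; small rounding differences can appear because the published deltas are derived from unrounded fold averages.

import pandas as pd

# Average ACC (%) per scenario for one model and relevance group (CNN, D100, WGB_CR).
acc = pd.Series({"SFE": 52.61, "T": 65.41, "U": 83.75})

# Each result column is the scenario metric minus the SFE baseline, in percentage points.
deltas = (acc.drop("SFE") - acc["SFE"]).round(2)
print(deltas)  # T: 12.80, U: 31.14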
Table A11. Average KS across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WGB_CR, by evaluated scenario and relevance groups.
KS (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WGB_CR
1. D100
cnn22.7122.4715.1923.1017.0718.7620.0922.50−0.24−7.510.39−5.64−3.94−2.62−0.21
ebm34.9337.0736.9036.4536.7636.9536.8736.672.141.971.511.832.021.941.74
ffn26.4227.0425.3128.0727.6627.6927.8526.730.61−1.111.651.241.271.430.31
ffnlp26.4125.8927.3326.4926.7526.8327.9427.92−0.530.910.080.340.421.521.50
ffnsp17.9112.9014.6413.0215.1511.0711.4914.58−5.01−3.27−4.89−2.76−6.84−6.42−3.33
knn17.1117.1417.0817.0917.0917.0617.0817.140.03−0.03−0.01−0.02−0.05−0.020.03
lda22.7623.2923.2923.3423.4723.4423.3823.250.520.520.580.710.680.620.48
mlp22.7324.1721.9521.9324.4423.0223.9623.371.44−0.78−0.801.710.291.230.65
percep13.1816.4613.9912.8812.2214.1210.6914.193.280.81−0.30−0.960.94−2.491.01
qda27.3527.4925.7726.2125.7523.3124.4526.940.14−1.58−1.14−1.59−4.04−2.90−0.41
rf40.3940.7639.5439.8939.8839.5739.3740.180.37−0.85−0.50−0.51−0.82−1.02−0.21
ridge22.7423.2623.2423.2823.4023.4323.3323.180.520.500.540.660.690.590.44
xgb37.0938.4738.6937.9938.2938.7238.5838.471.391.600.901.201.631.491.38
xnn2.410.043.911.410.752.460.162.97−2.371.50−1.00−1.660.05−2.250.55
2. D99
cnn19.0822.9011.3319.5113.7814.8113.9417.883.82−7.750.43−5.29−4.27−5.14−1.19
ebm0.000.000.000.000.000.000.000.000.000.000.000.000.000.000.00
ffn26.1624.9724.2925.9424.8925.9424.7126.08−1.18−1.86−0.22−1.27−0.22−1.44−0.08
ffnlp27.1827.7126.9627.2827.5727.1427.0126.910.52−0.220.090.38−0.05−0.17−0.27
ffnsp9.3613.2716.9611.0614.2714.7010.5316.623.917.601.704.915.341.177.26
knn14.0513.5413.2713.2913.2313.2213.2513.28−0.51−0.78−0.76−0.82−0.83−0.80−0.77
lda21.3321.7322.2322.0222.1922.4122.2922.130.400.900.690.871.080.960.80
mlp20.8721.9824.8022.3322.5321.6922.9123.031.113.931.461.660.832.042.16
percep9.8010.019.579.377.6410.4211.5910.760.21−0.23−0.43−2.160.621.790.96
qda26.4726.4024.9725.8225.1222.1323.4326.14−0.06−1.49−0.65−1.34−4.34−3.04−0.33
rf37.8437.8136.7137.1636.9836.3936.6937.31−0.03−1.13−0.67−0.86−1.44−1.15−0.53
ridge21.3121.7022.2122.0022.1722.3922.2722.120.390.900.690.871.090.970.82
xgb34.5836.3335.9336.2636.3636.6236.4636.591.761.351.681.782.041.882.01
xnn2.520.002.860.021.770.980.411.09−2.520.34−2.50−0.76−1.54−2.11−1.43
3. D80
cnn22.2021.6814.4422.9717.8917.1618.4020.04−0.52−7.760.77−4.31−5.04−3.80−2.16
ebm35.0836.8136.8836.9837.0237.1437.0736.761.731.801.891.932.061.991.68
ffn27.3827.8027.5026.8926.5028.0728.2027.700.410.12−0.49−0.880.690.820.31
ffnlp27.3826.8126.4527.8927.2827.8228.3126.94−0.57−0.930.51−0.100.440.93−0.44
ffnsp13.6116.8317.3614.5813.1816.9814.9316.653.223.750.97−0.433.371.323.03
knn17.3517.3517.3217.3317.3317.3017.3217.33−0.00−0.03−0.03−0.02−0.05−0.04−0.02
lda23.8424.0724.0824.2924.4524.4924.4624.000.230.250.450.610.650.630.16
mlp23.8823.8321.4623.3423.3224.2621.6724.16−0.05−2.42−0.54−0.570.38−2.210.28
percep10.7912.5012.7310.4611.9613.2010.3212.191.711.93−0.331.172.40−0.481.39
qda28.1427.8326.4026.5826.3424.6325.6027.61−0.31−1.74−1.56−1.80−3.50−2.54−0.52
rf39.9639.9140.1340.5139.8839.5639.8240.59−0.040.180.55−0.07−0.39−0.140.63
ridge23.7923.9824.0524.2224.3824.4224.4323.920.190.260.430.590.630.640.13
xgb36.7437.8538.6938.4538.7738.9138.2938.571.121.951.712.042.171.561.83
xnn4.453.601.650.370.040.000.011.55−0.85−2.80−4.08−4.41−4.44−4.44−2.90
Note: This table reports the average KS across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change under scenarios that include government-backed guarantees (WGB_CR). Key findings include the following: 1. Models such as RF, XGB, and EBM consistently achieve high KS values, showing robust predictive performance. 2. The RF model shows the highest KS scores across most scenarios, with values around 40%. 3. Classical models such as LDA and RIDGE demonstrate stable performance across different scenarios. 4. Some models (CNN, MLP, and XNN) show significant variability across scenarios, highlighting potential sensitivity to EF shifts. 5. Including WGB_CR data appears to stabilize performance for some models, while others show large fluctuations between scenarios.
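The KS values in these tables follow the usual credit-scoring definition: the maximum separation between the cumulative score distributions of good and bad payers, equivalent to the maximum of TPR − FPR over all thresholds. A minimal sketch, assuming predicted default probabilities are available, is shown below.

import numpy as np
from sklearn.metrics import roc_curve

def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov separation: max(TPR - FPR) over all score thresholds."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.max(tpr - fpr))

# Illustrative usage with toy labels (1 = bad payer) and model scores.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.10, 0.20, 0.30, 0.80, 0.70, 0.25, 0.90, 0.15, 0.60, 0.40])
print(f"KS = {100 * ks_statistic(y_true, y_score):.2f}%")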
Table A12. Average F1 across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WGB_CR, by evaluated scenario and relevance groups.
F1 (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WGB_CR
1. D100
cnn65.6067.9291.9490.8683.9991.9491.6791.072.3226.3425.2618.3926.3426.0725.47
ebm91.2291.6791.7091.6491.6791.6891.6691.670.440.470.420.450.460.440.45
ffn77.1072.8576.6070.2577.9875.8575.6469.85−4.25−0.50−6.850.88−1.25−1.46−7.25
ffnlp69.9779.1871.6175.0373.5080.8676.0571.799.211.645.063.5310.896.081.82
ffnsp48.7444.6942.6464.1451.6856.7547.9040.48−4.05−6.1015.402.948.01−0.83−8.25
knn76.1676.1476.0876.1076.0976.0876.0776.09−0.01−0.07−0.05−0.07−0.08−0.08−0.07
lda77.0677.1977.3577.3777.4277.4577.4277.400.130.290.310.360.390.360.34
mlp66.5571.5083.0079.7968.0764.9472.5769.974.9516.4413.241.52−1.616.013.42
percep50.7143.5346.5061.1871.8650.1360.2946.99−7.18−4.2110.4721.15−0.589.58−3.72
qda74.5176.1377.4675.3675.5774.5875.2377.311.622.950.851.060.070.722.80
rf90.8791.9492.1292.0492.1592.1592.1592.101.061.251.161.271.281.271.22
ridge43.8344.5944.9044.8245.1545.4245.0844.590.761.080.991.331.591.250.76
xgb90.9091.6291.7391.7191.7291.7691.8091.770.730.840.810.820.860.900.87
xnn79.9591.8965.7486.1990.5478.7391.0466.3111.94−14.226.2410.59−1.2211.09−13.64
2. D99
cnn65.5970.4593.3591.6492.4593.3793.3091.604.8627.7526.0426.8627.7827.7126.01
ebm0.000.000.000.000.000.000.000.000.000.000.000.000.000.000.00
ffn75.9277.6578.8675.9675.8280.2473.2574.401.732.940.04−0.104.32−2.66−1.52
ffnlp66.4677.1374.1472.2076.0078.5068.2778.5410.677.675.749.5412.041.8112.07
ffnsp52.7333.0154.6948.2850.9647.1459.3842.15−19.721.97−4.45−1.77−5.596.65−10.58
knn77.6277.2377.2077.2177.2077.1977.2077.22−0.39−0.42−0.41−0.42−0.43−0.42−0.40
lda78.0378.1878.3578.3078.3778.4578.3878.230.150.320.270.350.430.360.20
mlp72.5358.4773.7470.6183.6173.2875.2266.40−14.061.22−1.9211.080.762.69−6.13
percep54.9967.6676.2674.0579.1667.3860.3563.7512.6621.2619.0624.1712.395.358.76
qda75.2276.0776.5274.3574.5772.8673.4476.070.851.30−0.86−0.65−2.36−1.780.85
rf92.1593.2893.4093.3293.3993.4093.3893.351.131.261.181.241.261.241.21
ridge44.1844.4644.9044.7844.9345.3445.1244.670.290.730.600.751.160.950.49
xgb92.4893.1893.2893.2293.3093.3093.2593.270.700.810.740.830.820.770.79
xnn73.0392.0376.3473.5588.1786.0390.9386.0119.003.310.5215.1413.0017.9012.98
3. D80
cnn65.0367.7491.8691.3692.0092.0192.0088.432.7126.8326.3426.9726.9826.9823.41
ebm91.0991.6091.7091.6491.6891.7291.7191.690.510.610.550.590.630.620.60
ffn78.1578.2871.1273.9576.1774.2674.6075.330.12−7.03−4.20−1.98−3.90−3.55−2.83
ffnlp73.6974.9378.6474.9075.6379.3075.3373.381.244.941.201.945.611.64−0.31
ffnsp41.0244.8354.2157.4136.1955.6649.2143.533.8113.2016.39−4.8314.658.192.51
knn76.2276.2176.1976.2076.2076.1876.1976.21−0.01−0.03−0.01−0.02−0.03−0.02−0.01
lda77.0977.3177.4977.5077.5477.6777.5877.500.230.400.420.450.590.490.42
mlp57.5569.5477.7767.2880.2962.8673.7366.2612.0020.229.7322.755.3216.198.71
percep84.4957.7661.3870.6077.0766.0370.1271.36−26.73−23.11−13.89−7.42−18.46−14.37−13.13
qda75.5676.3777.2374.1974.5673.7674.4177.240.811.68−1.37−0.99−1.80−1.151.68
rf90.6491.8692.0992.0192.0992.1492.1092.041.221.451.381.451.501.471.40
ridge44.3544.9845.2945.2245.5645.7645.4044.910.640.940.881.211.411.060.57
xgb90.7891.5991.7891.6991.7791.7691.7891.730.811.000.910.980.981.000.95
xnn80.6472.2385.1881.2573.6292.0173.6061.87−8.414.540.61−7.0211.37−7.04−18.77
Note: This table reports the average F1 across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change under scenarios including government-backed guarantees (WGB_CR). Key findings include the following: 1. The CNN model shows dramatic improvements in F1 score (up to +27.75 percentage points) in certain scenarios, particularly those combining multiple EF types. 2. Models such as EBM and RF maintain consistently high F1 scores (above 90%) across most scenarios. 3. The FFNSP model exhibits the most variability, with improvements of up to +16.39 percentage points in some scenarios but declines in others. 4. Classical models (LDA, RIDGE) show modest but stable performance across scenarios. 5. Some models (PERCEP, XNN) show extreme variability, with both large gains and losses depending on the scenario. 6. The inclusion of WGB_CR appears to benefit neural network models in particular, while tree-based models maintain their high performance.
Table A13. Average AUC across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WOGB_CR, by evaluated scenario and relevance groups.
AUC (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WOGB_CR
1. D100
cnn62.3561.9657.8762.3458.8959.0058.3461.89−0.39−4.48−0.01− 3.46−3.35−4.00−0.46
ebm73.7373.7173.9273.8773.9174.0374.0273.89−0.030.190.130.180.290.290.15
ffn65.1666.2566.5066.8166.6166.5066.4966.621.091.341.651.451.351.341.46
ffnlp66.4566.0665.4666.7766.6066.7465.9966.40−0.39−0.990.320.140.28−0.46−0.06
ffnsp66.2666.2766.9067.0565.9267.0066.8266.190.010.650.79− 0.340.740.57−0.07
knn58.6858.5458.6358.6258.6258.6158.6058.63−0.14−0.05−0.06− 0.06−0.07−0.08−0.05
lda64.6064.8464.9665.0065.0265.0665.0464.890.240.350.390.410.460.430.29
mlp63.7261.4964.3664.9664.0664.7761.7764.21−2.230.641.240.341.05−1.950.49
percep55.3254.5153.4755.1454.9656.4053.9556.10−0.81−1.85−0.18− 0.361.07−1.370.77
qda66.3566.3965.7465.8065.5264.1664.7666.210.04−0.60−0.54− 0.83−2.19−1.58−0.14
rf74.6874.8374.4774.7274.7074.3874.4774.770.14−0.220.030.01−0.31−0.220.09
ridge63.4063.6563.8063.9063.9363.9863.9663.740.260.410.500.530.590.560.34
xgb72.5373.9773.9773.7273.8373.9074.0274.011.441.441.181.301.371.481.48
xnn51.6650.1349.9350.5250.5551.2549.9650.32−1.54−1.73−1.15− 1.11−0.41−1.71−1.34
2. D99
cnn62.4962.4058.0861.8159.5259.1959.2362.85−0.10−4.41−0.68− 2.97−3.31−3.260.35
ebm73.6873.8573.6473.5873.6873.8073.7073.600.17−0.04−0.10− 0.000.120.03−0.08
ffn66.8566.0666.5465.7666.1666.0965.6666.50−0.79−0.31−1.09− 0.69−0.76−1.19−0.35
ffnlp66.0766.5366.5865.8966.3566.9466.2066.110.470.51−0.180.280.870.130.04
ffnsp66.5166.7866.4966.6367.0966.9066.4566.190.28−0.020.130.580.39−0.06−0.31
knn58.6558.4558.3558.3658.3558.3558.3558.36−0.20−0.29−0.28− 0.30−0.30−0.29−0.29
lda64.5964.8064.9464.9865.0065.0465.0264.870.220.350.390.410.460.430.28
mlp63.4964.0564.2062.8464.3664.8062.9863.310.550.71−0.650.861.30−0.52−0.18
percep53.1551.8752.0151.7250.7053.4454.6553.33−1.28−1.14−1.43− 2.450.291.500.18
qda66.3466.2865.6365.6765.4064.0264.6266.09−0.06−0.71−0.67− 0.94−2.32−1.72−0.25
rf74.5475.0274.7174.8374.6174.4774.4974.830.480.170.290.08−0.07−0.050.29
ridge63.3863.6163.7863.8963.9263.9663.9463.710.230.400.510.540.580.560.34
xgb72.6073.7373.7473.6773.8973.9273.9173.931.131.141.081.291.321.311.33
xnn49.5951.8951.6950.4750.0050.2952.0450.652.292.100.880.410.702.451.05
3. D80
cnn61.7360.5858.1060.9159.2358.8958.8861.34−1.14−3.62−0.82− 2.50−2.84−2.85−0.39
ebm73.4973.6973.8173.7473.8573.8973.8973.800.190.320.250.360.400.400.31
ffn66.7466.5866.1665.4166.5966.3866.4765.86−0.16−0.58−1.32− 0.15−0.36−0.26−0.88
ffnlp66.2966.3766.1366.8566.4966.6167.2565.600.09−0.160.570.210.320.96−0.69
ffnsp66.9066.4266.5566.9665.5966.6766.7665.94−0.48−0.360.06− 1.31−0.24−0.14−0.97
knn58.5658.5558.5658.5458.5458.5558.5558.57−0.010.00−0.02− 0.02−0.01−0.010.01
lda65.1165.2165.2865.3565.3765.4065.3765.220.090.170.230.260.290.260.11
mlp64.2764.9862.5463.2065.2765.2165.1063.600.71−1.73−1.071.000.940.83−0.67
percep51.7651.3953.7754.1950.5553.0653.6652.41−0.372.012.43−1.211.311.900.65
qda66.4966.3665.8466.0265.7564.6165.2566.33−0.14−0.65−0.48−0.74−1.88−1.25−0.16
rf74.3074.7074.4974.5874.6074.2174.2274.660.410.200.290.31−0.09−0.070.37
ridge63.9064.0164.1464.2764.3164.3564.3064.070.110.240.370.410.450.400.17
xgb72.4373.6474.0173.7173.7674.0573.8674.051.211.581.281.331.621.431.62
xnn50.4050.5550.0049.9750.0050.0049.9450.660.15−0.40−0.43−0.40−0.40−0.470.25
Note: This table reports the average AUC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change, excluding credits with government-backed guarantees (WOGB_CR). Key findings include the following: 1. Tree-based models (EBM, RF, and XGB) demonstrate strong and consistent performance, with AUC values above 73%. 2. Neural networks (CNN, XNN) show greater sensitivity to the exclusion of government-backed credits, with notable performance declines in some scenarios. 3. Classical models (LDA, RIDGE) show modest but stable improvements, whereas KNN maintains consistent performance. 4. Scenarios including social unrest and climate change factors (U, T + W) highlight the adaptability of tree-based models, while neural networks exhibit higher variability. 5. The exclusion of government-backed credit data affects the predictive performance of some models, particularly neural network-based models, underscoring the value of enriched datasets in maintaining predictive accuracy.
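The WGB_CR and WOGB_CR views differ only in whether credit operations backed by government guarantees are retained. A minimal filtering sketch is shown below; the binary flag gov_backed is a hypothetical column name, not the dataset's actual field.

import pandas as pd

# Hypothetical credit table with a flag marking government-backed guarantees.
credits = pd.DataFrame({
    "credit_id": [1, 2, 3, 4],
    "gov_backed": [1, 0, 0, 1],
    "default_flag": [0, 1, 0, 0],
})

wgb_cr = credits                               # view retaining government-backed credits
wogb_cr = credits[credits["gov_backed"] == 0]  # view excluding government-backed credits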
Table A14. Average ACC across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WOGB_CR, by evaluated scenario and relevance groups.
ACC (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WOGB_CR
1. D100
cnn56.6360.4787.6584.3787.6987.6782.4184.513.8431.0227.7431.0631.0425.7827.88
ebm87.7887.6987.6587.6387.6487.6487.6587.65−0.09−0.13−0.15−0.14−0.14−0.13−0.13
ffn70.0770.6067.9664.9966.8268.5068.2668.250.53−2.11−5.08−3.24−1.56−1.81−1.82
ffnlp64.1065.6772.3267.6666.1865.7765.8066.701.578.223.562.081.661.692.59
ffnsp67.2163.5360.9369.0467.5360.9766.5769.85−3.68−6.281.820.31−6.24−0.642.63
knn65.2465.2064.9865.0065.0064.9564.9664.99−0.03−0.26−0.24−0.24−0.28−0.27−0.24
lda67.2467.3167.5267.4367.4567.6467.5567.490.070.280.190.210.400.300.25
mlp58.5870.5356.0766.4467.4256.9463.9356.4711.95−2.507.878.84−1.635.36−2.11
percep48.8446.3862.4236.7863.4840.9056.6442.35−2.4613.58−12.0614.64−7.947.80−6.49
qda62.8164.2465.3763.1863.4161.3762.0965.051.432.550.360.59−1.44−0.732.23
rf85.7287.4487.6387.5387.6887.6787.6587.631.731.911.821.961.951.931.91
ridge66.1866.3166.6066.5866.6766.7766.6366.500.130.430.400.490.590.450.33
xgb86.1887.2987.5087.4187.4987.4987.4987.511.111.321.231.311.301.311.32
xnn72.0156.0587.5781.7177.3680.5379.9577.88−15.9615.569.705.368.527.945.88
2. D99
cnn66.2954.6687.5983.5787.6787.7087.5387.04−11.6321.3017.2821.3821.4121.2420.76
ebm87.8287.6987.6287.5987.6287.6487.6287.63−0.13−0.20−0.22−0.20−0.18−0.20−0.19
ffn67.1370.0962.2964.6068.4567.8171.2767.152.96−4.84−2.531.320.684.140.02
ffnlp58.2768.3265.6865.0267.6465.6260.5469.2310.057.416.769.377.352.2710.96
ffnsp64.1067.7367.5066.2866.3866.3169.1865.033.643.402.182.292.215.080.93
knn65.3864.9264.8964.8964.8964.8764.8964.90−0.45−0.49−0.48−0.49−0.50−0.49−0.47
lda67.2467.4067.5067.4267.4167.5467.5067.400.160.260.180.170.300.260.16
mlp64.9161.2668.5166.4658.0567.9664.9265.38−3.653.591.55−6.863.040.010.46
percep51.5561.7769.5466.6572.9360.9353.9257.2010.2217.9915.1021.389.382.375.64
qda63.2764.2864.7362.3462.5460.4961.2064.281.011.46−0.93−0.73−2.77−2.071.02
rf85.7387.5187.6487.5987.6687.6887.6687.621.781.921.861.931.961.941.89
ridge66.1666.3866.6166.5566.6466.7466.6666.440.220.450.390.480.580.500.28
xgb86.2087.3287.4887.3887.5287.5187.4287.461.121.271.181.321.311.211.26
xnn63.7868.4762.7981.8287.4084.2064.9482.324.69−0.9918.0423.6220.431.1618.54
3. D80
cnn62.3658.8587.5784.5287.4287.6887.7182.55−3.5025.2222.1625.0625.3225.3520.20
ebm87.5187.5887.6487.6287.6287.6587.6387.640.070.130.100.110.130.120.12
ffn63.8358.2969.6266.2069.8771.7062.7262.91−5.545.792.376.047.87−1.12−0.92
ffnlp58.4661.3871.0170.0064.7466.2862.1263.302.9212.5511.546.287.823.664.84
ffnsp65.8962.6163.4463.9372.8962.8365.3266.60−3.28−2.45−1.967.00−3.06−0.570.71
knn65.0765.0665.0365.0565.0465.0265.0365.06−0.00−0.03−0.02−0.03−0.05−0.03−0.01
lda67.3367.5967.7667.5867.6967.8567.7467.560.260.430.250.360.520.410.23
mlp67.9365.9372.3168.1263.2166.3659.3859.67−2.004.380.19−4.72−1.57−8.55−8.26
percep60.0862.6150.0753.0462.3263.7253.0264.282.53−10.01−7.032.243.64−7.064.20
qda62.4763.6564.2760.9461.3359.3560.2863.881.181.81−1.53−1.14−3.12−2.191.41
rf85.4587.3687.6887.5887.7087.7087.6587.601.912.222.132.252.252.202.14
ridge66.3266.5866.7366.7266.8466.9366.7866.640.260.410.400.520.610.460.31
xgb85.9587.2187.5187.3887.4687.5187.4987.471.261.551.421.511.561.541.52
xnn66.3381.9287.7480.0387.7480.1979.8869.2815.5921.4113.6921.4113.8613.552.95
Note: This table reports the average ACC across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change, excluding credits with government-backed guarantees (WOGB_CR). Key findings include the following: 1. Tree-based models (EBM, RF, XGB) demonstrate robust and consistent performance, with ACC values above 87% across most scenarios. 2. Neural networks (CNN, XNN, MLP) exhibit greater sensitivity to the exclusion of government-backed credits, with significant performance variations in combined scenarios (U + T + W), particularly for XNN (ACC reductions of up to 21% in some scenarios). 3. Classical models (LDA, RIDGE) maintain stable performance with minimal changes (±0.2%), achieving consistent ACC values of approximately 67%. 4. Perceptron-based models show notable performance reductions in specific scenarios, such as U + T − SFE, with ACC dropping to 48.84% (−2.46% compared with datasets including government-backed credits). 5. Scenarios enriched with combined external factors (U + T + W) highlight the adaptability and stability of tree-based models, whereas neural network models show greater variability. 6. The exclusion of government-backed credit data significantly affects the predictive performance of certain models, particularly neural networks, emphasizing the importance of enriched datasets in maintaining predictive accuracy.
Table A15. Average KS across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WOGB_CR, by evaluated scenario and relevance groups.
KS (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WOGB_CR
1. D100
cnn18.9419.9812.1019.8712.8814.3713.0717.531.04−6.840.93−6.06−4.57−5.87−1.41
ebm33.2234.5134.7434.8334.9035.0034.9834.561.291.521.611.681.781.761.35
ffn25.6825.8926.5426.0625.9525.9924.6424.540.210.870.380.270.31−1.04−1.14
ffnlp25.0425.9225.9526.4225.6326.3925.5925.240.880.911.380.601.350.550.20
ffnsp9.8411.594.9610.0711.0513.189.468.221.75−4.880.231.213.34−0.38−1.62
knn14.2213.6414.0514.0614.0514.0314.0414.07−0.58−0.17−0.16−0.17−0.19−0.18−0.15
lda21.3421.6822.0522.2622.3622.3522.2821.960.340.710.921.031.010.940.62
mlp21.8622.1922.8020.7023.3622.8620.8621.430.330.94−1.171.491.00−1.01−0.44
percep12.6011.9110.3111.8712.7613.6311.6413.77−0.70−2.29−0.740.151.02−0.961.16
qda26.3126.6125.0425.7125.1522.2923.5926.490.30−1.27−0.60−1.16−4.02−2.720.18
rf37.8937.8436.7937.3737.1136.3236.6437.27−0.05−1.10−0.52−0.78−1.58−1.25−0.63
ridge21.3221.6522.0222.2522.3322.3322.2421.960.330.700.931.011.010.920.64
xgb34.7636.8036.9436.1536.2536.5336.5836.632.042.191.401.491.771.831.87
xnn3.370.263.181.092.800.031.040.00−3.11−0.20−2.29−0.58−3.34−2.33−3.37
2. D99
cnn19.0822.9011.3319.5113.7814.8113.9417.883.82−7.750.43−5.29−4.27−5.14−1.19
ebm33.1034.5235.3035.0335.2735.5835.5234.901.422.201.932.182.482.421.81
ffn26.1624.9724.2925.9424.8925.9424.7126.08−1.18−1.86−0.22−1.27−0.22−1.44−0.08
ffnlp26.2524.5726.4525.6725.7225.9726.4925.55−1.680.20−0.58−0.53−0.280.24−0.70
ffnsp9.609.9512.999.6813.8213.258.606.580.353.390.084.223.65−1.00−3.02
knn14.0513.5413.2713.2913.2313.2213.2513.28−0.51−0.78−0.76−0.82−0.83−0.80−0.77
lda21.3321.7322.2322.0222.1922.4122.2922.130.400.900.690.871.080.960.80
mlp20.8721.9824.8022.3322.5321.6922.9123.031.113.931.461.660.832.042.16
percep9.8010.019.579.377.6410.4211.5910.760.21−0.23−0.43−2.160.621.790.96
qda26.4726.4024.9725.8225.1222.1323.4326.14−0.06−1.49−0.65−1.34−4.34−3.04−0.33
rf37.8437.8136.7137.1636.9836.3936.6937.31−0.03−1.13−0.67−0.86−1.44−1.15−0.53
ridge21.3121.7022.2122.0022.1722.3922.2722.120.390.900.690.871.090.970.82
xgb34.5836.3335.9336.2636.3636.6236.4636.591.761.351.681.782.041.882.01
xnn2.670.920.031.053.531.610.001.08−1.75−2.64−1.620.86−1.06−2.67−1.60
3. D80
cnn19.5021.1512.7420.3914.5014.5613.9319.921.66−6.760.90−5.00−4.93−5.570.42
ebm33.0734.4734.9235.0035.1135.0135.1334.921.391.851.932.041.942.061.84
ffn24.9225.4525.3525.4425.5026.1626.5024.640.530.430.520.581.241.58−0.28
ffnlp25.5425.9124.8125.2826.5826.3726.4024.380.37−0.73−0.261.040.830.86−1.16
ffnsp13.6515.6610.1110.3611.8612.6011.6511.452.02−3.54−3.29−1.79−1.05−2.00−2.20
knn13.7813.7813.8213.7813.7713.7913.8413.86−0.010.040.00−0.010.000.060.08
lda22.2422.4522.7422.9523.1123.0022.8922.600.210.500.710.870.760.650.36
mlp23.2521.9521.1919.7321.1722.7420.3622.79−1.30−2.05−3.52−2.08−0.51−2.89−0.46
percep10.5610.8611.4811.9411.4912.4012.1610.830.290.921.380.931.841.600.27
qda27.1426.8525.4226.0625.7923.2224.1526.57−0.29−1.72−1.08−1.35−3.92−2.99−0.57
rf37.2137.1436.8937.3637.3736.4636.6236.98−0.07−0.320.150.15−0.76−0.60−0.23
ridge22.2222.4522.7522.9423.0822.9922.8622.580.240.530.730.860.770.640.36
xgb34.3536.1536.5136.1936.2936.8936.2436.871.802.161.841.942.541.892.52
xnn3.960.040.004.340.801.052.690.01−3.92−3.960.39−3.16−2.91−1.27−3.94
Note: This table reports the average KS across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change, excluding credits with government-backed guarantees (WOGB_CR). Key findings include the following: 1. Tree-based models (EBM, RF, and XGB) demonstrate strong and consistent performance, with KS values above 33%. 2. Neural networks (CNN, XNN) show greater sensitivity to the exclusion of government-backed credits, with notable performance declines in some scenarios. 3. Classical models (LDA, RIDGE) show modest but stable improvements, whereas KNN maintains consistent performance. 4. Scenarios including social unrest and climate change factors (U, T + W) highlight the adaptability of tree-based models, while neural networks exhibit higher variability. 5. The exclusion of government-backed credit data affects the predictive performance of some models, particularly neural network-based models, underscoring the value of enriched datasets in maintaining predictive accuracy.
Table A16. Average F1 across 10 folds of ML models with SOCIAL UNREST and CLIMATE CHANGE EF WOGB_CR, by evaluated scenario and relevance groups.
F1 (%)
Scenario columns: SFE, T, T + W, U, U + T, U + T + W, U + W, W
Result columns (difference vs. SFE, in percentage points): T − SFE, T + W − SFE, U − SFE, U + T − SFE, U + T + W − SFE, U + W − SFE, W − SFE
WOGB_CR
1. D100
cnn74.5272.2293.4392.9393.3193.4493.1891.34−2.2918.9118.4218.8018.9318.6616.83
ebm92.9493.3193.3293.2993.3193.3293.3293.310.380.380.350.370.380.380.37
ffn78.1577.3471.5871.2580.0973.7279.5980.88−0.81−6.57−6.901.94−4.431.442.73
ffnlp78.3978.2678.6776.4777.8073.8280.7983.01−0.120.28−1.92−0.59−4.562.414.62
ffnsp55.2535.4636.6340.1848.2947.6636.3953.66−19.80−18.62−15.07−6.96−7.59−18.86−1.59
knn77.5077.4477.2777.2977.2977.2577.2677.29−0.06−0.23−0.21−0.21−0.25−0.24−0.22
lda78.0378.1178.3578.3478.4078.4878.3678.260.080.320.310.370.450.330.24
mlp76.4164.9072.2676.0270.0363.7770.7172.75−11.51−4.16−0.39−6.38−12.65−5.71−3.67
percep54.0851.7866.3038.4068.9444.2465.5544.69−2.3112.22−15.6814.85−9.8411.47−9.39
qda74.8076.0377.0875.1175.3573.6674.2576.731.232.280.310.55−1.14−0.551.93
rf92.1593.2793.3893.3493.4093.4093.3893.371.121.231.191.251.251.231.22
ridge44.2244.4245.0544.8745.0745.5545.2644.850.200.830.650.851.331.040.63
xgb92.4693.1693.2993.2393.2893.2993.2993.300.700.830.770.820.830.820.83
xnn60.5773.9281.6983.0482.6183.9978.5184.1013.3521.1322.4722.0523.4217.9423.54
2. D99
cnn65.5970.4593.3591.6492.4593.3793.3091.604.8627.7526.0526.8627.7827.7126.01
ebm92.9493.2893.3293.3093.3293.3393.3493.310.330.380.360.380.390.400.37
ffn75.9277.6578.8675.9675.8280.2473.2574.401.732.940.04−0.104.32−2.66−1.52
ffnlp73.8380.3475.9779.0376.6574.4874.7974.326.512.145.202.820.650.960.49
ffnsp61.0537.5359.1539.2655.6436.8951.3127.41−23.52−1.90−21.80−5.42−24.17−9.74−33.64
knn77.6277.2377.2077.2177.2077.1977.2077.22−0.39−0.42−0.41−0.42−0.43−0.42−0.40
lda78.0378.1878.3578.3078.3778.4578.3878.230.150.320.270.350.430.360.20
mlp72.5358.4773.7470.6183.6173.2875.2266.40−14.061.22−1.9211.080.762.69−6.13
percep54.9967.6676.2674.0579.1667.3860.3563.7512.6621.2619.0624.1712.395.358.76
qda75.2276.0776.5274.3574.5772.8673.4476.070.851.30−0.86−0.65−2.36−1.780.85
rf92.1593.2893.4093.3293.3993.4093.3893.351.131.251.181.241.261.231.20
ridge44.1844.4644.9044.7844.9345.3445.1244.670.290.730.600.751.160.950.49
xgb92.4893.1893.2893.2293.3093.3093.2593.270.700.810.740.830.820.770.79
xnn87.5887.3984.0983.0573.3069.0593.4587.28−0.19−3.49−4.53−14.28−18.535.87−0.30
3. D80
cnn59.6673.3693.3592.3593.2993.3793.4090.9213.7033.7032.7033.6333.7133.7431.26
ebm92.8593.2593.3493.3293.3393.3493.3593.330.390.480.460.470.490.490.47
ffn74.5377.0981.3179.8579.3776.6674.8275.942.556.785.324.832.130.291.41
ffnlp74.5875.4476.2876.4471.2076.5176.8776.480.871.701.86−3.381.932.291.90
ffnsp34.0744.0746.4667.6132.9651.4833.2146.9110.0012.3933.53−1.1217.40−0.8612.83
knn77.3477.3377.3177.3277.3177.3077.3177.33−0.00−0.03−0.01−0.02−0.04−0.03−0.01
lda78.1178.3278.4578.4478.5278.5778.4778.360.200.340.320.410.460.360.24
mlp61.3971.0781.4287.3677.1868.3478.1877.469.6720.0325.9715.786.9516.7916.07
percep63.4166.3854.5060.0365.7972.4858.1269.262.97−8.91−3.382.389.06−5.295.85
qda74.4275.4976.1273.0873.4671.7772.5675.711.071.70−1.34−0.96−2.65−1.861.29
rf91.9693.2093.3993.3593.4093.3993.3893.361.241.431.391.441.441.421.40
ridge44.9145.2145.4745.5245.6645.9745.7445.270.300.560.610.751.060.830.36
xgb92.3293.1293.3093.2193.2793.3093.2893.270.800.980.900.950.980.970.95
xnn76.2993.1993.4783.0677.5678.1084.4184.1516.9017.176.771.261.818.127.86
Note: This table reports the average F1 across ten folds for binary credit classification (good vs. bad payers) in datasets enriched with EF related to social unrest and climate change, excluding credits with government-backed guarantees (WOGB_CR). Key findings include the following: 1. Tree-based models (EBM, RF, XGB) demonstrate robust and consistent performance, with F1 values above 92% across most scenarios. 2. Neural networks (CNN, XNN, MLP) exhibit greater sensitivity to the exclusion of government-backed credits, with significant performance variations in combined scenarios (U + T + W). 3. Classical models (LDA, RIDGE) maintain stable performance with minimal changes, achieving consistent F1 values. 4. Perceptron-based models show notable performance reductions in specific scenarios, such as U + T − SFE. 5. Scenarios enriched with combined external factors (U + T + W) highlight the adaptability and stability of tree-based models, whereas neural network models show greater variability. 6. The exclusion of government-backed credit data significantly affects the predictive performance of certain models, particularly neural networks, emphasizing the importance of enriched datasets in maintaining predictive accuracy.
Table A17. Models with statistically significant improvements for AUC, ACC, KS, and F1 metrics with COVID EF, considering WGB_CR and WOGB_CR, by evaluated scenario and relevance groups—Scenarios.
WGB_CR | WOGB_CR
AUC ACC KS F1 TOTAL AUC ACC KS F1 TOTAL
F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P
1. D100
cnnYY-YYYYY-YYY8YYYYYYYYYYYY12
ebmYYYYYYYYYYYY12YYYYYYYYYYYY12
ffn-----Y------1----YY------2
ffnlp------------0------------0
ffnsp------------0----YY------2
ldaYYY-Y-YYYYYY9YY-YY-YY-YY-7
mlp-----Y------1Y-----------1
percep----YY----YY4------------0
qda---YYY---YYY6---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YY-YY-YY-YY-8
xgb---YYY---YYY6---YYY---YYY6
xnn------------0----Y-----Y-1
2. D99
cnn---Y-Y---Y-Y3Y-YYYYY-YYYY9
ebmYYYYYYYYYYYY12YYYYYYYYYYYY12
ffn-----Y------1-----Y------1
ffnlp------------0------------0
ffnsp------------0--Y---------1
ldaYYYYY-YYYYYY10YY-YY-YY-YY-7
percep------------0------------0
qda---YYY---YYY6---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YY-YY-YY-YY-8
xgb---YYY---YYY6YYYYYYYYYYYY12
xnn------------0------------0
3. D80
cnnY-YYYYYYYYYY10YY-YYYYY-YYY9
ebmYYYYYYYYYYYY12YYYYYYYYYYYY12
ffn------------0------------0
ffnlp-Y--Y--Y--Y-3----Y-----Y-1
ffnsp------------0--Y---------1
ldaYYYYYYYYYYYY12YYYYYYYYYYYY12
mlp------------0---Y-Y------2
percep-Y-----Y----1-Y-----Y----1
qda---YYY---YYY6---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YYYYYYYYYYYY12
xgbYYYYYYYYYYYY12YYYYYYYYYYYY12
xnn---YYY---YYY6----Y-----Y-1
Note: This table reports the machine learning models (CNN, EBM, FFN, etc.) that exhibited statistically significant improvements in their classification metrics (AUC, ACC, KS, and F1) when EF related to COVID-19 was included. The evaluation considers scenarios with and without government-backed guarantees (WGB_CR and WOGB_CR). Key findings include the following: 1. Tree-based models (XGB, RF) demonstrate consistent performance, with statistical significance in most scenarios, particularly in F1 and ACC metrics. 2. Neural networks (CNN, FFN) show variability, excelling in WGB_CR but with mixed results in WOGB_CR. 3. Classical models (LDA, RIDGE) maintain robust performance across all metrics, especially in WGB_CR. 4. Model-specific trends are as follows: - EBM achieves perfect significance in all metrics for both groups. - PERCEP and MLP are scenario dependent, with limited improvements. 5. Dataset impact: WOGB_CR introduces higher variability, particularly for KS and F1 metrics.
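Table A17 flags models whose fold-level metrics improve significantly once the COVID EF are added. The appendix does not restate the specific test, so the sketch below uses a one-sided paired Wilcoxon signed-rank test over the ten per-fold values as one plausible choice; the per-fold numbers are invented for illustration.

import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold AUC values (ten folds) for one model,
# estimated without and with the COVID external factors.
auc_baseline = np.array([72.1, 73.0, 72.5, 72.8, 73.2, 72.0, 72.9, 73.1, 72.4, 72.7])
auc_with_ef  = np.array([73.4, 74.1, 73.0, 73.9, 74.0, 73.2, 73.8, 74.2, 73.1, 73.6])

# One-sided paired test: does adding EF increase the metric across folds?
stat, p_value = wilcoxon(auc_with_ef - auc_baseline, alternative="greater")
print(f"p-value = {p_value:.4f}, significant at 5%: {p_value < 0.05}")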
Table A18. Models with statistically significant improvements for AUC, ACC, KS, and F1 metrics with COVID MOV EF, considering WGB_CR and WOGB_CR, by evaluated scenario and relevance groups—Scenarios.
WGB_CR | WOGB_CR
AUC ACC KS F1 TOTAL AUC ACC KS F1 TOTAL
F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P F F + P P
1. D100
cnnYYYYYYYY-YYY10YYYYYYYYYYYY12
ebm------YYYYYY6------------0
ldaYYYYYYYYYYY-11YY-YY-YY-YY-7
percep-----Y-----Y2------------0
qda---YYY---YYY6---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YY-YY-YY-YY-8
xgbY--Y-Y---YYY5---YYY---YYY6
2. D99
cnnYYYYYYYY-YYY10YY-YYYYY-YYY9
ebm-Y----YYYYYY6YYYYYYYYYYYY12
ldaYYYYYYYYYYY-11YY-YY-YY-YY-7
mlp-Y-----Y----2------------0
percepY-----Y-----2------------0
qda---YYY---YYY6---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YY-YY-YY-YY-8
xgb-Y--Y----YYY4-YYYYY-YYYYY9
ffn------------0---YY--Y-YY-5
ffnlp------------0-Y-----Y----2
3. D80
cnnYYYYYYYYYYYY12-Y-YYY-Y-YYY7
ebmYYYYYYYYYYYY12YYYYYYYYYYYY12
ffnlp-Y-----Y----2-Y-----Y----2
ldaYYYYYYYYYYYY12YYYYYYYYYYYY12
qda---Y-Y---YYY5---YYY---YYY6
rf---YYY---YYY6---YYY---YYY6
ridgeYYYYYYYYYYYY12YYYYYYYYYYYY12
xgbYYYYYYYYYYYY12YYYYYYYYYYYY12
mlp------------0-----Y------1
ffnsp------------0-Y-----Y----2
Note: This table reports the machine learning models (CNN, EBM, FFN, etc.) that exhibited statistically significant improvements in their classification metrics (AUC, ACC, KS, and F1) when EF related to COVID-19 MOV were included. This analysis introduced a temporal shift in the mortality data and considered scenarios with and without government-backed guarantees (WGB_CR and WOGB_CR). Key findings include the following: 1. Tree-based models (XGB, RF): These models demonstrate consistent and robust performance, achieving statistical significance in all relevance groups (F, F + P, P) under WGB_CR and in most scenarios under WOGB_CR; the temporal shift appears to further stabilize performance. 2. Neural network models (CNN): CNN consistently achieves statistical significance across all scenarios under both datasets (WGB_CR and WOGB_CR), underscoring its adaptability to enriched datasets and to the temporal shift introduced by the COVID MOV scenario. 3. Classical models (LDA, RIDGE): These models exhibit stable and statistically significant improvements, particularly in scenarios enriched with external factors; their performance remains robust across most scenarios, with or without government-backed guarantees. 4. Scenario-specific improvements: Models such as FFN, FFNLP, and MLP exhibit improvements only in specific scenarios, indicating higher sensitivity to dataset composition and relevance group combinations. 5. Impact of dataset composition: The exclusion of government-backed guarantees (WOGB_CR) amplifies performance variability across models, particularly for neural networks and classical methods, suggesting a nuanced dependency on data characteristics for achieving statistical significance. 6. Temporal shift effects: The inclusion of a temporal shift in the COVID mortality data (MOV) enhances the ability of certain models (CNN, XGB) to achieve statistically significant improvements, highlighting the importance of temporally adjusted external factors in predictive modeling. Overall, the findings underscore the critical role of enriched datasets and temporal adjustments in improving predictive accuracy, particularly for neural networks and tree-based models, while highlighting model-specific sensitivities to data composition and experimental design.
Table A19. Models with statistically significant improvements for AUC and ACC metrics with SOCIAL UNREST and CLIMATE CHANGE EF, considering WGB_CR, by evaluated scenario and relevance groups—Scenarios.
WGB_CR
AUC ACC TOTAL
T T + W U U + T U + T + W U + W W T T + W U U + T U + T + W U + W W
1. D100
cnnY-Y---YYYYYYYY10
ffn-------Y------1
ffnsp--Y--Y--Y-----3
lda--------------0
percep----------Y---1
qda-------YY----Y3
rf-------YYYYYYY7
ridge----------YYY-3
xgbYYYYYYYYYYYYYY14
xnn----Y---------1
2. D99
cnn--------YYYYYY6
ffnlp--------------0
ffnsp-----------Y--1
lda-----------YY-2
mlp----Y--YY---YY5
percepY--Y-Y--------3
qda--------------0
rf-------YYYYYYY7
ridge----Y---YYYYYY7
xgbYYYYYYYYYYYYYY14
xnn--------------0
3. D80
cnn--Y-----YYYYYY7
ebmYYYYYYYYYYYYYY14
ffn--Y--Y--------2
ffnlp---------Y----1
ffnsp-------YY-----2
ldaYYYYYYYYYYYYYY14
mlp--------------0
percepY---Y---------2
qda-------YY----Y3
rfYYYY--YYYYYYYY12
ridgeYYYYYYYYYYYYYY14
xgbYYYYYYYYYYYYYY14
xnn--------YYY-Y-4
Note: This table reports the machine learning models (CNN, EBM, FFN, etc.) that exhibited statistically significant improvements in their classification metrics (AUC and ACC) when EF related to social unrest and climate change were included. The evaluation considered scenarios in the presence of government-backed guarantees (WGB_CR). Key findings include the following: 1. Tree-based models (XGB, RF): These models demonstrate consistent and robust performance across most scenarios, achieving statistical significance in all relevance groups (T, U, W, and their combinations). Notably, XGB achieved statistical significance in all the evaluated scenarios, highlighting its adaptability to diverse external factors. 2. Neural network models (CNN): CNN achieved statistically significant improvements across multiple scenarios, particularly in combinations of temperature (T) and other external factors. Its performance underscores its sensitivity to enriched datasets and its potential to adapt to complex external factor scenarios. 3. Classical models (LDA, RIDGE): While LDA showed limited significance in this analysis, RIDGE exhibited consistent improvements in scenarios that included climate-related factors (T + W, U + T + W) and achieved statistical significance across all combinations, demonstrating its robustness when government-backed guarantees were present. 4. Scenario-specific improvements: Models such as EBM, FFN, and FFNLP demonstrate significance only in selected scenarios; for instance, EBM consistently achieved significance in scenarios involving a combination of social unrest and climate factors, suggesting context-dependent performance. 5. Impact of external factors (W, T): The inclusion of climate-related factors (T) and social unrest (U) enhanced the predictive capabilities of several models, particularly tree-based and neural network approaches; models that combined these factors (U + T + W) exhibited notable improvements in statistical performance. 6. Comprehensive performance of XGB and RIDGE: Both models consistently demonstrated robust results across all relevance groups and scenarios, making them highly reliable for predictive tasks involving external factors such as social unrest and climate change. These findings highlight the importance of incorporating external factors into predictive modeling frameworks. In particular, tree-based models and neural networks benefit from these enriched datasets, achieving significant improvements in their predictive accuracy and generalization performance under complex scenarios.
Table A20. Models with statistically significant improvements for AUC and ACC metrics with SOCIAL UNREST and CLIMATE CHANGE EF, considering WOGB_CR, by evaluated scenario and relevance groups—Scenarios.
WOGB_CR
AUC ACC TOTAL
T T + W U U + T U + T + W U + W W T T + W U U + T U + T + W U + W W
1. D100
cnn--------YYYYYY6
ffn--Y-----------1
ffnsp-YY-----------2
lda-YYYYY-----Y--6
percep--------------0
qda-------YY----Y3
rf-------YYYYYYY7
ridge-YYYYYY-YYYYYY12
xgbYYYYYYYYYYYYYY14
xnn--------Y-----1
2. D99
cnn--------YYYYYY6
ffnlp-------YYYYY-Y6
ffnsp--------------0
lda--------Y--Y--2
mlp--------------0
percep----------Y---1
qda-------YY----Y3
rfY------YYYYYYY8
ridge--------YYYYY-5
xgbYYYYYYYYYYYYYY14
xnn-Y---Y---YY---4
3. D80
cnn--------YYYYYY6
ebmYYYYYYYYYYYYYY14
ffn--------Y--Y--2
ffnlp-----Y--YY----3
ffnsp--------------0
ldaYYYYYYYYYYYYYY14
mlp---Y----------1
percep--------------0
qda-------YY----Y3
rfY-----YYYYYYYY9
ridgeYYYYYYYYYYYYYY14
xgbYYYYYYYYYYYYYY14
xnn--------Y-Y---2
Note: This table reports the machine learning models (CNN, EBM, FFN, etc.) that exhibited statistically significant improvements in their classification metrics (AUC and ACC) when EF related to social unrest and climate change were included. The evaluation considers scenarios without government-backed guarantees (WOGB_CR). Key findings include the following: 1. Tree-based models (XGB, RF): These models demonstrate consistent and robust performance across most scenarios, achieving statistical significance in all relevance groups (T, U, W, and their combinations). Notably, XGB achieved statistical significance in all the evaluated scenarios, further confirming its adaptability to diverse external factors. 2. Neural network models (CNN): CNN achieves statistically significant improvements in scenarios incorporating temperature-related factors (T) and their combinations, particularly in the presence of external factors such as social unrest (U); however, its overall performance decreases slightly compared with the WGB_CR dataset. 3. Classical models (LDA, RIDGE): RIDGE continues to show consistent improvements across most scenarios, achieving statistical significance in a variety of combinations, especially those involving temperature (T + W, U + T + W); LDA exhibits moderate performance under WOGB_CR scenarios, with improvements observed primarily in social unrest and climate change combinations (U + T + W). 4. Scenario-specific improvements: Models such as EBM, FFN, and FFNLP demonstrate limited statistical significance, with their performance varying considerably across scenarios, suggesting a strong dependence on the context and the specific dataset composition. 5. Impact of external factors (W, T, U): The inclusion of combined factors, particularly temperature (T) and social unrest (U), significantly enhances the predictive capabilities of models such as XGB and RF; the WOGB_CR dataset amplifies the variability of improvements, particularly for neural networks and classical models. 6. Performance of XGB and RIDGE: These models are among the most reliable, with XGB achieving statistical significance across all relevance groups and scenarios and RIDGE across most of them; their robustness and adaptability make them strong candidates for predictive tasks in datasets without government-backed guarantees. These findings underscore the importance of incorporating external factors into predictive modeling frameworks, particularly in datasets with complex characteristics such as the absence of government-backed guarantees. Tree-based models and neural networks demonstrate the most significant benefits from these enriched datasets, further emphasizing their value for addressing challenging prediction tasks.
Table A21. Models with statistically significant improvements for KS and F1 metrics with climate change EF, considering WGB_CR, by evaluated scenario and relevance groups.
WGB_CR
KS F1 TOTAL
T T + W U U + T U + T + W U + W W T T + W U U + T U + T + W U + W W
1. D100
cnn--------YYYYYY6
ebmYYYYYYYYYYYYYY14
ffnlp-----YYY---YY-5
ffnsp---------Y----1
mlp--------YY----2
percep----------Y---1
qda-------YY----Y3
rf-------YYYYYYY7
ridge-------YYYYYYY7
xgbYY-YYYYYYYYYYY12
2. D99
cnnY-------YYYYYY7
ffnlp-------Y--YY-Y4
ffnsp-Y------------1
lda----YY--Y-YYY-6
qda-------YY-----2
rf-------YYYYYYY7
ridge----YY--YYYYYY8
xgbYYYYYYYYYYYYYY14
3. D80
cnn--------YYYYYY6
ebmYYYYYYYYYYYYYY14
ffnlp-----Y--------1
lda--YYYY-YYYYYYY11
mlp--------Y-Y---2
qda-------YY----Y3
rf--Y---YYYYYYYY9
ridge---YYY-YYYYYYY10
xgbYYYYYYYYYYYYYY14
Note: This table reports the machine learning models that exhibited statistically significant improvements in KS and F1 metrics when climate change external factors were included. Key findings include the following: 1. Top performers: EBM and XGB achieve perfect scores (14/14) in the 3. D80 group, and RIDGE and LDA show strong consistency across scenarios. 2. Scenario variations: in 1. D100, EBM dominates (14/14) while CNN shows only F1 improvements; in 2. D99, XGB maintains perfect performance while FFNLP shows scenario-specific significance; in 3. D80, most models perform better, especially on the F1 metric. 3. Metric differences: KS improvements are less common than F1 improvements, and XGB shows the most balanced performance across both metrics. 4. Additional models: FFNLP and FFNSP show scenario-specific significance, and MLP demonstrates limited but notable improvements.
Table A22. Models with statistically significant improvements for KS and F1 metrics with climate change EF, considering WOGB_CR, by evaluated scenario and relevance groups.
WOGB_CR
KS F1 TOTAL
T T + W U U + T U + T + W U + W W T T + W U U + T U + T + W U + W W
1. D100
cnn--------YYYYYY6
ebmYYYYYYYYYYYYYY14
lda-YYYYYY-YYYYY-11
qda-------YY----Y3
rf-------YYYYYYY7
ridge-YYYYYY-YYYYYY12
xgbYYYYYYYYYYYYYY14
2. D99
cnnY-------YYYYYY7
ebmYYYYYYYYYYYYYY14
lda----YY--Y-YYY-6
qda-------YY-----2
rfY------YYYYYYY8
ridge----YY--YYYYYY8
xgbYYYYYYYYYYYYYY14
ffn-------YY--Y--3
3. D80
cnn-------YYYYYYY7
ebmYYYYYYYYYYYYYY14
ffnlp---Y----------1
ffnsp---------Y----1
knn-----YY-------2
ldaYYYYYYYYYYYYYY14
mlp--------YY--YY4
qda-------YY----Y3
rf-------YYYYYYY7
ridgeYYYYYYYYYYYYYY14
xgbYYYYYYYYYYYYYY14
xnn-------YY-----2
Note: This table reports the machine learning models that exhibited statistically significant improvements in KS and F1 metrics when climate change external factors were included, without government-backed guarantees. Key observations include the following: 1. Consistent top performers: EBM, XGB, and RIDGE maintain perfect scores (14/14) in multiple scenarios, and LDA shows particularly strong performance in the 3. D80 group. 2. Dataset impact: WOGB_CR shows more variability in model performance compared with WGB_CR, and KNN and XNN reach significance only in the WOGB_CR results. 3. Metric patterns: F1 improvements are more frequent than KS improvements across all models, and RF consistently shows better F1 than KS performance. 4. Additional findings: FFNLP and FFNSP show scenario-specific significance, MLP demonstrates limited but notable improvements in 3. D80, and KNN reaches significance only in specific scenarios. 5. Scenario analysis: 3. D80 shows the most consistent improvements across models, 1. D100 has the fewest KS improvements overall, and 2. D99 reveals notable variation in model performance.

Appendix B. Categorical Variable Mappings

Table A23. Mapping for credit portfolio segment.
Code | Description | Count | %
3 | Consumer: Individual retail clients with personal loans | 254,694 | 69.33
1 | Microenterprise: Clients with small business activities | 77,437 | 21.08
2 | Small enterprise: SMEs with higher credit exposure than microenterprises | 35,106 | 9.56
4 | Corporate: Large-scale borrowers or commercial entities | 109 | 0.03
Table A24. Mapping for gender.
Code | Description | Count | %
2 | Female | 225,489 | 61.38
1 | Male | 141,857 | 38.62
Table A25. Mapping for occupation code.
Code | Description | Count | %
1 | Primary self-employed occupation (e.g., informal or freelance worker) | 201,792 | 54.93
4 | Employee in private or public sector | 105,709 | 28.78
12 | Independent professional (e.g., technician and artisan) | 39,932 | 10.87
11 | Merchant or small business owner | 11,945 | 3.25
Other values (codes 2, 3, 5, 6, 7, 8, 9, 10) | 8968 | 2.44
Table A26. Mapping for education level.
Code | Description | Count | %
1 | Data not available | 209,809 | 57.11
5 | Complete primary education | 55,491 | 15.11
8 | Complete secondary education | 53,552 | 14.58
6 | Incomplete primary education | 17,122 | 4.66
4 | Non-university technical education | 14,686 | 4.00
3 | Incomplete university education | 14,665 | 3.99
7 | Complete university education | 1015 | 0.28
2 | Incomplete secondary education | 1006 | 0.27
Table A27. Mapping for employment code.
Code | Description | Count | %
15 | Merchant/Shopkeeper | 267,270 | 72.76
19 | Technician/Technologist | 22,470 | 6.12
16 | Street Vendor or Salesperson | 17,692 | 4.82
31 | Gas Fitter/Plumber | 11,628 | 3.17
80 | Upholsterer | 6957 | 1.89
66 | Employee (general services) | 4889 | 1.33
Other values (codes 1–94) | 36,440 | 9.91
Table A28. Mapping for residence ownership code.
Code | Description | Count | %
3 | Rented property | 247,847 | 67.47
2 | Owned home | 95,633 | 26.03
4 | Family-owned residence (shared) | 23,863 | 6.50
1 | Missing or unclassified | 3 | 0.00
Table A29. Mapping for residence type code.
Code | Description | Count | %
9 | Multifamily housing or apartment | 142,378 | 38.76
5 | Detached or single-family house | 121,216 | 33.00
10 | Shared family housing | 55,366 | 15.07
7 | Informal housing or makeshift dwelling | 23,913 | 6.51
11 | Non-residential property used as home | 23,143 | 6.30
Other values (codes 1, 2, 3, 4, 6, 8) | 1330 | 0.36
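When these categorical codes are used as model inputs, they can be mapped to their descriptions (or kept as integer categories) during preprocessing. A minimal sketch using the gender mapping from Table A24 is shown below; the column name gender_code is hypothetical.

import pandas as pd

# Mapping taken from Table A24: 1 = Male, 2 = Female.
GENDER_MAP = {1: "Male", 2: "Female"}

# Hypothetical borrower table holding the raw categorical code.
borrowers = pd.DataFrame({"gender_code": [2, 1, 2, 2]})
borrowers["gender"] = borrowers["gender_code"].map(GENDER_MAP)
print(borrowers)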

Figure 1. Time series visualization of external factors from January 2020 to September 2023. The plot includes normalized series (min–max scaled) for external factors such as COVID-19 confirmed cases, COVID-19 deaths, protest-related road blockages, weather-induced road blockages, and temperature anomalies. The y-axis ranges from 0 to 1, representing relative intensity over time for each factor. This normalization facilitates the comparison of dynamic patterns across factors with different units and magnitudes.
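A minimal sketch of the min–max scaling described in the Figure 1 caption is given below; the file name and column layout are assumptions, and only the scaling rule itself is taken from the caption.

```python
# Sketch: put each external-factor series on a common 0-1 scale for visual comparison.
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescales a series to [0, 1]; a constant series maps to 0."""
    rng = series.max() - series.min()
    return (series - series.min()) / rng if rng != 0 else series * 0.0

# ef = pd.read_csv("external_factors_monthly.csv", parse_dates=["Date"], index_col="Date")
# ef_scaled = ef.apply(min_max_scale)   # one curve per factor, all in [0, 1]
```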
Figure 2. CRISP-DM framework applied to EF-driven credit risk modeling. This diagram illustrates how the CRISP-DM phases structure the experimental workflow, from data understanding and preparation—including the monthly alignment and integration of exogenous time-series variables—to scenario-based modeling (with and without external factors) and the use of explainable AI techniques (SHAP and LIME) during evaluation. The variables DelinquencyDays_MMYYYY and ExternalFactorImpact_MMYYYY are constructed to support a time-aware analysis of credit behavior and its relationship to validated external shocks across economic sectors.
Figure 3. Impact on economic activities by external factor COVID-19. This figure presents normalized credit risk impact values across economic activity types under different pandemic-related external factor scenarios. The vertical axis represents a normalized metric (ranging from 0 to 1) that quantifies the relative intensity of impact observed for each external factor. External factor notations: F = COVID-19 Deaths, P = COVID-19 Positive Cases, F + P = Combined Impact of COVID-19 Deaths and Positive Cases, F + Mov = Combined Impact including Mobility Restrictions.
Figure 4. Impact on economic activities by external factors: social unrest and climatic events. This figure presents normalized credit risk impact values across economic activity types under different external factor scenarios. The vertical axis represents a normalized impact metric (ranging from 0 to 1) that reflects the relative intensity of the effect across sectors. External factor notations: U = social unrest, T = temperature anomaly, W = weather-induced road blockages, U + T + W = combined effect of social unrest, temperature anomaly, and road blockages.
Figure 5. Relevance of Economic Activity.
Figure 6. Top 10 features compared between the scenario without external factors (SFE) and the COVID-19 stress scenario (F + P). (a) XGB SHAP analysis, (b) XGB LIME analysis, (c) CNN SHAP analysis, and (d) CNN LIME analysis. Legend: F = COVID-19 deaths; P = COVID-19 positive cases; F + P = combined scenario; Mov = mobility restrictions; WGB_CR = With Government-Backed Credit; WOGB_CR = Without Government-Backed Credit.
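For readers who want to reproduce a ranking of this kind, the sketch below derives a top-10 global SHAP ranking for a fitted XGBoost model from mean absolute SHAP values. It is a generic reconstruction under assumed inputs (a fitted model and an evaluation frame X_eval per scenario), not the authors' exact configuration.

```python
# Hedged sketch: top-k SHAP feature ranking for a fitted XGBoost classifier.
import numpy as np
import shap
from xgboost import XGBClassifier

def top_shap_features(model: XGBClassifier, X_eval, k: int = 10):
    """Returns the k features with the largest mean absolute SHAP value on X_eval."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_eval)      # shape: (n_samples, n_features)
    mean_abs = np.abs(shap_values).mean(axis=0)      # global importance per feature
    order = np.argsort(mean_abs)[::-1][:k]
    return [(X_eval.columns[i], float(mean_abs[i])) for i in order]

# Comparing the two rankings (with vs. without the COVID-19 factors) shows how the
# EF columns displace or re-rank the borrower-level features.
```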
Figure 7. Top 10 features between scenarios without external factors (SFE) and with climate external factors (T + W): (a) XGB SHAP analysis. (b) XGB LIME analysis. Legend: SFE = Scenario without external factors; T = Temperature Anomaly; W = Climate Blockage; T + W = Combined scenarios; WGB_CR = With Government-Backed Credit; WOGB_CR = Without Government-Backed Credit.
Figure 8. SHAP XGB and CNN: Top 10 Features in scenarios with social unrest (U). Legend: U = social unrest, represented by events of civil protest and political instability registered in 2023; WGB_CR = with government-backed credit; WOGB_CR = without government-backed credit.
Figure 9. SHAP CNN, XGB, and XNN: Top 10 features in scenarios combining external factors. Legend: U = social unrest (e.g., protests and road blockages); T = temperature anomalies; W = climate blockages; U + T + W = combined scenario including all three external factors; WOGB_CR = credit risk portfolio excluding government-backed loans.
Table 1. Research questions.
Id | Question
RQ1 | How did the COVID-19 pandemic, through factors such as positive cases and mortality rates, influence credit defaults in different economic activities?
RQ2 | To what extent do climate change indicators, such as temperature anomalies and road blockages due to weather, impact credit delinquency patterns?
RQ3 | What is the relationship between credit delinquency and social unrest, considering disruptions to economic activities and societal stability?
RQ4 | How do the combined effects of external factors (COVID-19, climate change, and social unrest) contribute to variations in credit delinquency, and what are the most influential factors?
Note: This table outlines the research questions (RQs) focusing on how external factors such as COVID-19, climate change, and social unrest influence credit default and delinquency patterns. It explores their combined effects and identifies the most influential factors.
Table 2. Datasets.
It. | Dataset and EF | Source | Rows (×1000) | Use to Evaluate
1 | COVID-19 Positive Cases | https://www.datosabiertos.gob.pe/ | 4523 | Time series of positive COVID cases
2 | COVID-19 Deaths | https://www.datosabiertos.gob.pe/ | 1064 | Time series of deaths from COVID
3 | Road Blockages | https://www.gob.pe/mtc | 271 | Time series of roadblocks due to social unrest and climate factors
4 | Temperature Anomaly | https://www.senamhi.gob.pe/ | 22 | Time series of the temperature anomaly climatic factor
5 | Delinquency Activity | Model Finance | 1945 | Time series of days of non-payment of economic activities related to credits
6 | Credits with EF | Model Finance | 367 | Labeled observations with external factors
Total rows (×1000): 8192
Note: This table summarizes the datasets used in the study, including their sources, the number of rows (in thousands), and their role in modeling credit risk under the influence of EF such as the COVID-19 pandemic, climate anomalies, and social unrest. Public datasets are accessible through the listed official URLs and are also included in the public repository. Proprietary financial data, used under a confidentiality agreement, have been anonymized and aggregated to comply with data protection regulations. The repository additionally contains the metadata, modeling programs, and experimental results used in this study: https://zenodo.org/records/14890903.
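The sketch below illustrates one way the public EF series and the delinquency-by-activity series (Table 3) can be aligned at monthly frequency before further analysis. The file names are placeholders, and the aggregation assumes one record per reported event, so it should be adapted to the actual layout of each source.

```python
# Sketch: align a public EF series with the delinquency-by-activity series at month-end.
import pandas as pd

covid_deaths = pd.read_csv("covid_deaths.csv", parse_dates=["Date"])          # dataset 2 (assumed layout)
delinquency = pd.read_csv("delinquency_activity.csv", parse_dates=["Date"])   # dataset 5 (assumed layout)

# Count reported deaths per calendar month (assumes one row per reported death).
deaths_m = covid_deaths.resample("M", on="Date").size().rename("covid_deaths")

# Average overdue days per economic activity and month, using the Table 3 variables.
delinq_m = (delinquency
            .groupby([pd.Grouper(key="Date", freq="M"), "EconomicActivityCode"])
            ["AvgOverdueDays"].mean()
            .reset_index())

# Month-aligned panel combining delinquency and the external-factor series.
panel = delinq_m.merge(deaths_m.reset_index(), on="Date", how="left")
```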
Table 3. Description of variables in the time-series dataset on credit delinquency by economic activity.
Variable | Type | Description
Date | date | Date of credit data registration (YYYY-MM-DD).
EconomicActivityCode | int | Identifier for the borrower's economic activity.
IncomeSourceCode | int | Code representing the borrower's income source (e.g., self-employed, salaried).
LoanPurposeCode | int | Identifier for the purpose or destination of the loan.
AvgOverdueDays | int | Average number of overdue days among loans in the grouped record.
MaxOverdueDays | int | Maximum number of overdue days observed in the group.
LoanCount | int | Number of loans aggregated in the same activity group and date.
Table 4. Summary of variables used in the modeling experiments.
Variable | Type | Description
PortfolioSegment | int | Portfolio segment category (see Appendix Table A23)
InterestRate | dec | Effective interest rate applied to the credit
LoanAmount | dec | Approved loan amount
LoanTermDays | int | Actual loan term in days
MaxInternalAmount | dec | Maximum historical internal debt of the client
GovernmentBackedID | bit | Indicates if the loan was government-backed (1 = yes)
GenderID | int | Code representing borrower gender (see Appendix Table A24)
NormalizedSalary | dec | Client's salary normalized by total portfolio
NormalizedTotalAssets | dec | Business asset value normalized by total
NormalizedBusinessEquity | dec | Business equity normalized by total
NormalizedAvailableLiquidity | dec | Business liquidity normalized by total
NormalizedNetBusinessIncome | dec | Net business income normalized by total
BankingHistoryMonths | int | Number of months with banking history
MicroOriginationScore | int | Microcredit origination score
OverIndebtednessScore | int | Over-indebtedness score
ProspectionSourceID | int | Credit acquisition channel code
CreditDestinationID | int | Intended use or destination of the credit
NormalizedDisbursementAge | int | Client's age at disbursement (normalized)
OccupationCode | int | Code representing the borrower's occupation (see Appendix Table A25)
Dependents | int | Number of reported dependents
EducationLevelCode | int | Education level code (see Appendix Table A26)
EmploymentCode | int | Employment status code (see Appendix Table A27)
ResidenceStartYear | int | Year when current residence started
ResidenceOwnershipCode | int | Type of residence ownership (see Appendix Table A28)
ResidenceTypeCode | int | Type of residence (see Appendix Table A29)
DisbursementOfficeID | int | Office code where the loan was disbursed
DelinquencyDays_MMYYYY | int | Number of days past due per month from January 2020 to September 2023
ExternalFactorImpact_MMYYYY | bit | Indicates impact of external factor per month from January 2020 to September 2023
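The last two rows of Table 4 are families of monthly columns. The sketch below shows one way such wide columns can be built from a long-format monthly record; the input frame and its column names (CreditID, OverdueDays, EFImpact) are hypothetical and stand in for the institution's internal layout.

```python
# Sketch: pivot long-format monthly records into DelinquencyDays_MMYYYY / ExternalFactorImpact_MMYYYY columns.
import pandas as pd

def to_wide(monthly: pd.DataFrame) -> pd.DataFrame:
    """Expects one row per credit and month with columns CreditID, Date, OverdueDays, EFImpact."""
    monthly = monthly.assign(tag=monthly["Date"].dt.strftime("%m%Y"))
    days = monthly.pivot_table(index="CreditID", columns="tag",
                               values="OverdueDays", aggfunc="max")
    days.columns = [f"DelinquencyDays_{c}" for c in days.columns]
    flags = monthly.pivot_table(index="CreditID", columns="tag",
                                values="EFImpact", aggfunc="max")
    flags.columns = [f"ExternalFactorImpact_{c}" for c in flags.columns]
    return days.join(flags).reset_index()
```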
Table 5. Descriptive statistics of the credit dataset used for modeling (FMOD).
Indicator | Value
Effective Interest Rate (Average) | 69.98%
Average Loan Amount | S/. 5,554
Average Loan Term (in days) | 364
Number of Credit Records | 367,346
Number of Government-Backed Loans | 21,769
Number of Non-Government-Backed Loans | 345,577
Number of Loans to Male Borrowers | 141,857
Number of Loans to Female Borrowers | 225,489
Loan Balance with Less Than 30 Days Past Due (Jan 2020) | S/. 299,040,327.46
Loan Balance with 30–120 Days Past Due (Jan 2020) | S/. 27,787,259.90
Delinquency Rate Over 30 Days (Jan 2020) | 8.50%
Note. According to the credit classification policy of FMOD, credits with more than 120 days of delinquency are classified as “charged-off” and fully provisioned (100%). As such, they are excluded from the calculation of the delinquency rate.
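As a quick consistency check, the reported 8.50% rate matches the ratio of the 30–120-day balance to the sum of the two balances listed above (with charged-off loans excluded), assuming that is how the rate is defined:

```python
# Arithmetic check of the January 2020 delinquency rate reported in Table 5.
balance_current = 299_040_327.46   # less than 30 days past due
balance_overdue = 27_787_259.90    # 30-120 days past due
rate = balance_overdue / (balance_current + balance_overdue)
print(f"{rate:.2%}")               # prints 8.50%
```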
Table 6. Acronyms of evaluated scenarios.
Item | Scenario | Acronym
1 | Without EF | SFE
2 | EF COVID Dead | F
3 | EF COVID Positive | P
4 | EF COVID Dead and Positive | F + P
5 | EF Social Unrest | U
6 | EF Temperature Anomaly | T
7 | EF Road Blockage Weather (RBW) | W
8 | EF Social Unrest and Temperature Anomaly | U + T
9 | EF Temperature Anomaly and RBW | T + W
10 | EF Social Unrest and RBW | U + W
11 | EF Social Unrest, Temperature Anomaly, and RBW | U + T + W
12 | With government-backed credit | WGB_CR
13 | Without government-backed credit | WOGB_CR
Note: This table lists the acronyms for the evaluated scenarios, such as social unrest (U), temperature anomaly (T), and road blockages (RBW). Combined scenarios such as U + T and U + T + W are also defined.
Table 7. Economic activity delinquency stationarity.
Class. | Without EF | Without EF | With EF | With EF
Eval. Start | April 2018 | January 2019 | April 2020 | December 2022
Eval. End | February 2020 | December 2019 | September 2023 | June 2023
(#) Non-Stat. | 647 | 625 | 465 | 404
(#) Stat. | 49 | 40 | 137 | 29
(#) Indeterm. | 22 | 21 | 15 | 41
(%) Non-Stat. | 90% | 91% | 75% | 85%
(%) Stat. | 7% | 6% | 22% | 6%
(%) Indeterm. | 3% | 3% | 2% | 9%
(#) Total | 718 | 686 | 617 | 474
(%) Total | 100% | 100% | 100% | 100%
Note: This table compares the stationarity of credit delinquency series by economic activity, under scenarios with and without EF, across different evaluation periods. The first column (April 2018–February 2020) includes the full set of 718 distinct economic activities associated with credit operations that remained active during that period. The other columns reflect subsets of those activities, based on the data availability of each corresponding evaluation window. The symbol (#) indicates the number of activities in each category.
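A hedged sketch of how such a stationarity classification can be produced with the augmented Dickey–Fuller test is shown below; the significance threshold and the rule used to mark a series as indeterminate are illustrative assumptions, not necessarily the authors' exact criteria.

```python
# Sketch: classify a delinquency series per economic activity as stationary, non-stationary, or indeterminate.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def classify_stationarity(series: pd.Series, alpha: float = 0.05) -> str:
    """Augmented Dickey-Fuller classification with an illustrative indeterminate rule."""
    series = series.dropna()
    if len(series) < 12 or series.nunique() <= 1:
        return "indeterminate"          # too short or constant to test reliably
    p_value = adfuller(series, autolag="AIC")[1]
    return "stationary" if p_value < alpha else "non-stationary"

# counts = {activity: classify_stationarity(s) for activity, s in delinq_by_activity.items()}
```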
Table 8. Acronyms of the results for the evaluated scenarios.
Item | Results | Acronym
1 | Result F minus SFE | F - SFE
2 | Result P minus SFE | P - SFE
3 | Result F + P minus SFE | F + P - SFE
4 | Result U minus SFE | U - SFE
5 | Result T minus SFE | T - SFE
6 | Result W minus SFE | W - SFE
7 | Result U + T minus SFE | U + T - SFE
8 | Result T + W minus SFE | T + W - SFE
9 | Result U + W minus SFE | U + W - SFE
10 | Result U + T + W minus SFE | U + T + W - SFE
Note: This table provides acronyms for the results of the evaluated scenarios, comparing the differences between scenarios like “Result F minus SFE” and “Result U + T + W minus SFE”. These comparisons are systematically abbreviated for clarity.
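The sketch below shows how such scenario-minus-baseline results can be tabulated once per-scenario metrics are available; the metric values in the example are hypothetical placeholders.

```python
# Sketch: compute scenario deltas of the form "F - SFE" from per-scenario metric results.
import pandas as pd

# metrics[scenario] = {"KS": ..., "F1": ...}; the numbers below are hypothetical.
metrics = {
    "SFE":       {"KS": 0.42, "F1": 0.61},
    "F":         {"KS": 0.47, "F1": 0.64},
    "U + T + W": {"KS": 0.45, "F1": 0.63},
}

table = pd.DataFrame(metrics).T                      # rows = scenarios, columns = metrics
deltas = table.drop(index="SFE") - table.loc["SFE"]  # each scenario minus the SFE baseline
deltas.index = [f"{s} - SFE" for s in deltas.index]  # labels matching the Table 8 acronyms
```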
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
