Peer-Review Record

From Crisis to Algorithm: Credit Delinquency Prediction in Peru Under Critical External Factors Using Machine Learning

by Jomark Noriega 1,2,*,†, Luis Rivera 1,3, Jorge Castañeda 4,† and José Herrera 1,5
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 19 February 2025 / Revised: 21 April 2025 / Accepted: 25 April 2025 / Published: 28 April 2025
(This article belongs to the Section Information Systems and Data Management)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors of the manuscript From Crisis to Algorithm: Credit Delinquency Prediction… investigate how selected external credit-risk factors (i.e., those beyond the control of loan borrowers) affect credit delinquency prediction using machine learning techniques. Among these external factors are the COVID-19 pandemic, temperature anomalies, weather-related transport blockages, and social unrest. On the credit-risk side, the authors analyze, among others, the number of government-backed loans that would otherwise default and the length of delinquency periods broken down by type of economic activity. As a result of their study, the authors identify factors that significantly impact the predictability of credit risk.

The study is well planned and carried out. Notably, the authors provide a comprehensive list of determinants that may limit the robustness of their approach. However, a key weakness of the study is a lack of clarity in the presentation of results. This issue stems from the overly frequent use of bullet lists, unclear rules for applying bold and italic fonts within these lists, and an ambiguous rationale for dividing the summary into three separate sections: 6 (Challenges), 7 (Discussion), and 8 (Conclusions), which could have been consolidated into a single summary.

A similar lack of clarity is evident in Section 2 (Related Work), which does not seem to align with its title. Instead of providing the expected literature review, an aspect that is not given sufficient attention, the section primarily presents research questions and datasets. Another shortcoming of the manuscript is the absence of a comparison between the outcomes and conclusions of the study and those found in the available literature (if any exists).

Minor remarks:

  1. English month abbreviations should be used in Figures 3 & 4 (horizontal axes), and English feature names in Figures 6-9 (vertical axes) and in the related text passages.
  2. Figures 6-9 require significantly extended captions. In their present form, the plots are rather difficult to follow.
  3. The letters denoting factors (F, P, U, T, W, etc.) should be associated with their full names in figure captions wherever it applies. Listing them in Tables 4 & 5 is not sufficient.
  4. All acronyms must be defined in text when they are used for the first time.
  5. How can the authors explain why the external factors tend to cause a decrease in non-stationarity of the analyzed datasets (Table 3)? Don’t they introduce additional spikes or trends, which could do exactly the opposite?
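The intuition behind remark 5 can be made concrete with a toy diagnostic (a sketch on synthetic data, not the authors' dataset): injecting a persistent drift of the kind an external shock might induce moves a series away from mean-stationarity, even under a crude split-half check.

```python
import random

def halves_mean_gap(series):
    """Crude mean-stationarity probe: difference between the means of
    the second and first halves of the series. Near zero for a
    mean-stationary series; large when a persistent trend is present."""
    half = len(series) // 2
    first = sum(series[:half]) / half
    second = sum(series[half:]) / (len(series) - half)
    return second - first

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(1000)]        # stationary baseline
shocked = [x + 0.01 * t for t, x in enumerate(noise)]    # noise plus a shock-like drift

print(abs(halves_mean_gap(noise)) < 0.3)   # True: no drift detected
print(halves_mean_gap(shocked) > 4)        # True: the drift breaks stationarity
```

This supports the reviewer's expectation that external-factor spikes or trends should, if anything, increase non-stationarity; a formal answer would use a standard test such as the augmented Dickey-Fuller test on the actual series.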

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents experimental results from applying multiple machine learning algorithms to model credit risk for a dataset of Peruvian loans under external factors (EF). Since the exploration of external factors such as economic conditions and COVID-19 has been widely studied, the key contribution of this paper is the application to a novel Peruvian dataset and the use of machine learning with EF. This is interesting, but the article currently has several flaws that need to be addressed:

  1. There is a Section 2 called “Related Work”, but it is not related work: it describes the RQs and data. A separate literature review is required.
  2. In Section 1, the authors claim “a significant gap persists in academic research on the impact of EF on credit risk assessment”, but this is untrue. The effect of external economic conditions has been studied for almost two decades and a search on Google Scholar on “macroeconomic factor credit risk” will reveal many relevant papers. Additionally, searching for “Covid-19 credit risk” also reveals a rich source of articles. This literature is relevant to this study and so should be reviewed, and the RQs and results positioned in relation to existing literature (e.g. what is new? How do results align to previous results? And so on).
  3. Similarly, one of the novelties of this article is application to Peruvian data but no literature for credit risk in Peru is given. A quick search of “Peru credit risk” in Google Scholar yields several relevant papers that can be reviewed as background material.
  4. Although the article has 51 references, many of the references lack a clear connection to the specific topics covered by the article and it would be better to remove them and replace them with more relevant literature such as those mentioned in the two points above.
  5. In Table 2, what is “Model Finance”? Can a website be given? Can the data be made available to other researchers?
  6. In Section 3, include a citation for CRISP-DM.
  7. In Section 3 and Figure 2, I do not see clearly how this relates to CRISP-DM: how does it connect to the six CRISP-DM processes? (see https://www.datascience-pm.com/crisp-dm-2/). Why does CRISP-DM need to be “adapted”? Surely, EF can fit into the Data Processing and Modeling processes.
  8. Details of the credit data need to be given: What type of loans are they? (secured/unsecured, consumer/business, residential, etc.?). What is the default rate? What is the loan term? What variables are available for model build? Distribution of variables? Locations in Peru (urban/rural?). On page 9, it says there are 367,000 loans but in the Abstract it is 8.3 million, so which is right? How is data presented: one row of data, or rows for each payment period (i.e. panel data)?
  9. It is very important that the method by which EF time-series data is integrated with credit data is clearly described. For example, was it linked to time of default event in credit data? (in which case how was it linked if there was no default?). Authors must give details because the way this is done will have a big impact on results.
  10. The authors try a very large number of machine learning algorithms, but given the RQs in Table 1, it is not clear to me why. It would have been reasonable to choose a few robust models like XGB and run the experiments with them. There is already a good existing literature benchmarking machine learning for credit scoring (typically XGB and ensemble methods do well).
  11. In any case, the authors need to properly define what each of the machine learning algorithms are. For example, although “ebm” seems to be one of the best models used, it is undefined and I have no idea what it is.
  12. On page 13, “demonstrated scenario-specific improvements…”: please explain what this means.
  13. Accuracy is used as a performance measure but this is typically not used in credit scoring because of class imbalance (e.g. if default rate is very low, it is easy to get a high accuracy just by predicting all as non-default). Instead, use an alternative like Kolmogorov-Smirnov or F1-score.
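The class-imbalance point above can be illustrated with a small sketch (toy numbers, not the paper's data): at a 2% default rate, a degenerate model that predicts no defaults at all reaches 98% accuracy yet is useless, which the F1-score for the default class immediately exposes (KS would similarly expose it by comparing score distributions).

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive (default) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy portfolio with a 2% default rate
y_true = [0] * 98 + [1] * 2
all_good = [0] * 100          # degenerate model: "nobody defaults"

print(accuracy(y_true, all_good))  # 0.98, looks excellent
print(f1_score(y_true, all_good))  # 0.0, the model detects no defaults
```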

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have done much work to improve the manuscript by addressing my critical remarks and completing the missing elements. However, in their response to my remark 5 I see confusing statements.

  1. "When signs of delinquency arise, they may limit or suspend lending to affected segments, thereby mitigating future volatility and effectively cutting off persistent trends. This active management leads to reduced stationarity in observed default behavior."
  2. "As a result, delinquencies tend to persist across broader population groups and over longer periods, increasing the temporal dependence in the series and thus producing more stationary patterns."

Ad. 1. If trends are removed, stationarity is expected to strengthen rather than to be reduced.

Ad. 2. If temporal dependencies are increased, stationarity is expected to weaken rather than to increase.

The authors are asked to clarify what they mean when using their unconventional statements.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The first revision of this article is a substantial change from the first submitted version. It represents a good improvement and deals with most of my previous comments. The exposition is much clearer and the background deeper. However, there remain several areas of concern, mostly of a minor nature.

  1. In the Abstract, it states that “CNN, XGB and XNN consistently demonstrated superior adaptability”. What is meant by “adaptability”? Does it merely mean they give superior performance, or something else?
  2. On page 2, what is the “BIG DATA concept”? And is there a reason it is in capital letters? Please rephrase this to make it clear what is meant.
  3. On page 2, the data is referred to as the “financial model (FMOD)”: this is confusing because data is not a model. Do the authors mean “Model Finance”, i.e., the company the data is sourced from?
  4. Table 1 does not belong in the Introduction; it would be better placed in Section 3.1 (Materials) and linked to Tables 4 & 5.
  5. Page 3: “Finally, this study evaluated”: to be consistent with the tense in the remainder of the paragraph, perhaps this should be “evaluates”.
  6. Page 4, line 8: “This study analyzes …”: put the citation at the beginning: “This study [35] analyzes …”.
  7. Figure 1: What kind of visualization is that? A plot of frequencies over time, or something else? What is the scale on the vertical axis? Are all graphs on the same scale?
  8. Figures 1 & 3: What does “FE” mean at the beginning of the external events’ names?
  9. Table 5: In the description, what does “see equivalence sheet” refer to? Please explain.
  10. Equations (1), (2) & (3): The “Where” should be “would”.
  11. Page 10: Provide a reference for the “Bayesian optimization” method used for hyperparameter setting.
  12. Figures 3 & 4: What exactly is shown on the vertical axis?
  13. Page 13, line 3: “shifting fatality data forward by one period”: what is meant by “one period”? One month, one quarter, or something else?
  14. CNN is used in this study, but since it takes training data in a 2D grid, which works for images, how was the tabular data for this study fed as input into the CNN?
  15. Page 13: “using a 10-fold cross-validation split (60% training, 20% validation, 20% testing)”: I am not sure this makes sense, since cross-validation itself splits training data into training and validation (10 times). Sometimes cross-validation is used for grid search and a separate test set is held out for testing. However, the results tables in Appendix A report the cross-validation results, so the validation sets from cross-validation seem to be used for testing. This is fine, but then the question is: what is used for hyperparameter tuning? The authors need to be more precise about their use of data for training, grid search and testing.
  16. Section 4: I am confused about what the “economic activities” are that the authors refer to. Are they from the credit data? In Table 6, for example, what are the 718 economic activities for 2018.Apr to 2020.Feb?
  17. It is not clear to me why the authors give enthusiastic reports of XNN, since it does not have good performance overall. In particular, for COVID scenarios the AUC is typically around 0.5, which indicates an ineffective model. For this reason, the XNN results in Fig 6(a) & (b) do not have any value (why explain variables in an ineffective model?).
  18. Appendix A: Suggest this is called something like “Results Tables”, rather than “Additional Tables”, since the tables are important/central for the analysis, and not just additional.
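The cross-validation concern raised above can be sketched as one conventional protocol (an illustration of the reviewer's suggestion, not the authors' actual setup): hold out a test set once for final reporting, then run k-fold cross-validation on the remaining development data for hyperparameter tuning only.

```python
def kfold_indices(n, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        val = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, val

n_loans = 100
test_idx = list(range(80, 100))   # 20% held out once; touched only for final reporting
dev_idx = list(range(80))         # remaining 80% cross-validated for tuning

folds = list(kfold_indices(len(dev_idx), 10))
print(len(folds))                                       # 10 tuning folds
print(all(not set(tr) & set(va) for tr, va in folds))   # True: train/validation never overlap
```

Under this protocol the cross-validation scores drive model selection, while performance is reported only on the untouched test set; reporting the cross-validation scores themselves, as Appendix A appears to do, leaves open what was used for tuning.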

Author Response

Please see the attachment.

Author Response File: Author Response.docx
