Article
Peer-Review Record

AI-Driven Fiscal Risk Assessment in the Eurozone: A Machine Learning Approach to Public Debt Vulnerability

by Noah Cheruiyot Mutai 1,*, Karim Farag 1, Lawrence Ibeh 2, Kaddour Chelabi 1, Nguyen Manh Cuong 1 and Olufunke Mercy Popoola 1
Reviewer 1:
Reviewer 2:
Submission received: 24 April 2025 / Revised: 12 June 2025 / Accepted: 20 June 2025 / Published: 25 June 2025
(This article belongs to the Special Issue Fintech Innovations: Transforming the Financial Landscape)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper applies several machine learning models—logistic regression, XGBoost, and support vector machines (SVM)—to assess fiscal stress across 20 Eurozone countries from 2000 to 2024. While the topic is timely and relevant to fiscal surveillance and sovereign risk assessment, I believe the paper lacks sufficient novelty and methodological rigor to meet publication standards in its current form.

Major Concerns

  1. Lack of Novelty and Conceptual Contribution
    As an applied paper, it does not present a particularly compelling or original narrative. The authors apply well-known and, in some cases, outdated models (e.g., basic logistic regression) without a clear justification for their selection. There is no innovation in either methodology or framing, and the paper does not meaningfully advance our understanding of fiscal risk modeling with AI/ML tools.

  2. Model Justification and Parameter Tuning
    The paper fails to explain why these specific models were chosen over others, especially more modern or interpretable alternatives. For instance, ensemble models such as LightGBM or interpretable ML techniques like GAMs are not discussed. In particular, the SVM model's poor performance raises red flags. Given the simplicity and cleanliness of the dataset, I suspect the authors did not adequately tune hyperparameters for SVM. Without a thorough search for optimal settings (e.g., kernel, C, γ), the reported underperformance lacks credibility.

  3. Insufficient Data Transparency
    The paper does not clearly report the number of training/test data points, nor does it provide basic descriptive statistics about the dataset (e.g., class distribution, missing data rates, number of stress episodes). Without this context, it is difficult to judge whether the models were trained on a sufficiently large and representative sample. This lack of transparency undermines the credibility of the findings.

  4. Incorrect Cross-Validation Method for Time Series Data
    Perhaps the most serious issue is the misapplication of random 10-fold cross-validation on panel time-series data. This is a well-known methodological flaw, as randomly splitting time-series data can lead to information leakage and artificially inflated performance metrics. A proper time-aware split—such as training on earlier years and testing on later ones—is required to reflect real-world forecasting conditions. This methodological error is particularly problematic given the policy implications of the study.

    The paper should cite and learn from relevant work such as "Application of Deep Neural Networks to Assess Corporate Credit Rating", which clearly demonstrates that random splits on temporal data can falsely boost model performance. Without correcting for this issue, the reported high AUC and accuracy values—especially for XGBoost—are likely unreliable.

Author Response

Reviewer 1 response matrix

We appreciate your careful reading of our manuscript and the constructive feedback provided. Below, we address each of the concerns raised:


1. Lack of Novelty and Conceptual Contribution

Response: We agree that logistic regression and SVM are widely used. However, our contribution lies not in the novelty of individual algorithms but in the operational integration of these models into a fiscal surveillance framework grounded in fiscal theory. We have revised the introduction and discussion to emphasize this integration, highlighting the use of SHAP values and fiscal rules to enhance interpretability and policy relevance. We also clarify the added value of comparing traditional and ensemble methods for sovereign risk classification in a harmonized panel.
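For illustration, the SHAP step described above can be sketched as follows. This is a minimal example, not the exact code used in the paper; `model` (a fitted XGBoost classifier) and `X_test` (a pandas DataFrame of macro-fiscal features) are hypothetical names.

```python
import shap  # Shapley-value explanations for tree ensembles

# Assumes `model` is a fitted XGBoost classifier and `X_test` a pandas
# DataFrame of macro-fiscal features (hypothetical names).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global view: mean absolute SHAP value per feature, i.e., which
# fiscal indicators contribute most to predicted stress.
shap.summary_plot(shap_values, X_test, plot_type="bar")
```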

2. Model Justification and Parameter Tuning

Response: We have updated the methodology section to explain the rationale for choosing each model. While logistic regression provides a transparent baseline, XGBoost captures nonlinearity and interactions. SVM was included to test robustness under nonlinear kernels. We now clarify that hyperparameter tuning was performed via grid search for all models, and specific ranges for C, γ (SVM), and boosting iterations (XGBoost) are reported. We acknowledge that LightGBM and GAMs are viable alternatives and have mentioned these in the revised limitations section, inviting future comparisons.
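As a concrete illustration of the tuning procedure, the SVM search could be set up as below. The grids are placeholders rather than the ranges reported in the paper, and `X_train`/`y_train` are assumed to come from the data-preparation step.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder grid for illustration; the paper reports its own ranges.
param_grid = {
    "kernel": ["rbf", "poly"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}

# In a temporal setting, `cv` should be a time-aware splitter rather than
# the default random K-fold (see the cross-validation response below).
search = GridSearchCV(SVC(probability=True), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```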

3. Insufficient Data Transparency

Response: We have revised the data section to include descriptive statistics. SMOTE was used to balance the classes, and the paper now states that the dataset includes 600 observations after cleaning and imputation.
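The balancing step can be sketched as below; the key point is that SMOTE is fit on the training portion only, so that synthetic minority observations never leak into the test set (variable names are hypothetical).

```python
from imblearn.over_sampling import SMOTE

# Oversample the minority (fiscal-stress) class in the training data only.
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
```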

4. Incorrect Cross-Validation Method for Time Series Data

Response: We acknowledge this important oversight. As you correctly noted, random k-fold CV can lead to information leakage in temporal data. We have replaced it with a rolling-origin cross-validation approach (i.e., training on earlier years, testing on subsequent years) and updated all performance metrics accordingly. This correction led to modest reductions in AUC and F1 scores for XGBoost but improved generalizability. We also cite relevant work (e.g., Petropoulos et al., 2023) that critiques inappropriate cross-validation in temporal ML contexts.
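A minimal sketch of the rolling-origin scheme, using scikit-learn's expanding-window splitter and assuming the panel rows are sorted chronologically (fold count and variable names are illustrative):

```python
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import roc_auc_score

# Expanding-window splits: each fold trains on earlier years and tests on
# the years immediately after, so no future information leaks into training.
tscv = TimeSeriesSplit(n_splits=5)
fold_aucs = []
for train_idx, test_idx in tscv.split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    proba = model.predict_proba(X.iloc[test_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y.iloc[test_idx], proba))
print(sum(fold_aucs) / len(fold_aucs))  # average out-of-time AUC
```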


Reviewer 2 Report

Comments and Suggestions for Authors

Refer to the attached file for my comments.

Comments for author File: Comments.pdf

Author Response


1. The linear equation for the primary balance is specified as $pb_{it} = \alpha_i + \beta\,debt_{it-1} + \gamma X_{it} + \varepsilon_{it}$. However, the vector $X_{it}$ is not defined. I assume it refers to additional macro-fiscal control variables, but clarification is needed. If it represents newly accumulated debt, this should be explicitly stated. (Section 2, Page 3)

Response: We agree that the vector $X_{it}$ was not clearly defined. It refers to the set of macro-fiscal control variables included in the empirical specification, specifically real GDP growth, inflation, and long-term interest rates. We have revised the text to clarify this and avoid ambiguity.


2. In Table 1, the authors describe govdebt_lag as lagged government debt. However, in the logistic regression specification presented in the subsequent methodology section, the debt variable appears to be unlagged, $debt_{it}$. Given that early results rely on lagged debt, consistency is important. Should this variable not also be $debt_{it-1}$?

Response: Thank you for pointing out the inconsistency. The variable used in the regression is in fact lagged government debt (govdebt_lag). We have corrected the notation in the methodology section to ensure consistency with Table 1 and the underlying data structure.
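For completeness, the lagged variable is a one-period shift within each country's series; a minimal pandas sketch with hypothetical column names:

```python
# `df` is assumed to be a pandas DataFrame with one row per country-year;
# the column names are illustrative.
df = df.sort_values(["country", "year"])
df["govdebt_lag"] = df.groupby("country")["govdebt"].shift(1)
```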


3. The manuscript states: “The predictive function is. The general prediction function is:” This appears to be a typographical or editing oversight. One of these sentences should be removed (Line 197).

Response: We acknowledge this editing error. The redundant sentence has been removed in the revised manuscript.

4. Table 2 presents results from the logistic regression model, where the authors note that both government balance and lagged government debt are statistically significant predictors. However, the reported coefficient for govdebt_lag is zero, which implies no marginal effect on the log-odds of fiscal stress. In that case, is it correct to interpret the logistic regression model as effectively having only one significant explanatory variable—government balance? (Table 2)

Response: We appreciate the reviewer’s close attention. The coefficient for lagged debt is small but statistically significant at the 1% level; it appeared as zero only because of rounding in the table’s formatting. We have updated Table 2 to report coefficients with sufficient precision to reflect their statistical contribution.

5. In the discussion, the authors state: “The statistically significant coefficients for government balance and lagged debt affirm long-standing theoretical claims about the primacy of fiscal variables in determining sovereign risk.” Given the issue raised in point 4 above, this statement may require revision. If the coefficient for lagged debt is effectively zero, then its contribution to the model's predictive power—and theoretical affirmation—may be overstated. (Line 386)

Response: We have revised the statement to reflect the more nuanced finding. While lagged debt is statistically significant, its marginal impact is smaller compared to government balance. The revised text now states: “The statistically significant coefficient for government balance, and the smaller but still significant effect of lagged debt, support fiscal theory’s emphasis on primary balances and debt accumulation as key drivers of sovereign risk.”

6. Table 5 compares classification performance across Logistic Regression, XGBoost, and SVM. However, results for the Random Forest model are conspicuously absent, despite its mention earlier as one of the models applied. If results exist, they should be included; otherwise, the omission should be acknowledged and explained. (Table 5)

Response: Thank you for this observation. The earlier mention of random forest was an error; we did not intend to apply that model in this paper.


7. While the authors aim to model and predict fiscal stress episodes, the manuscript does not provide information on the credit quality of the sovereigns studied. It would be valuable to explore whether the model’s predictions align with sovereign credit ratings from Moody’s, Fitch, or S&P. Since these agencies rate most EU countries, a comparison could provide an external validation of the model's predictive credibility and offer insight into how fiscal stress indicators map onto actual sovereign ratings.

Response: We agree that aligning model outputs with sovereign credit ratings could strengthen the validation. While ratings were not included in the original scope, we have added a short exploratory analysis comparing model-predicted fiscal stress episodes with contemporaneous sovereign credit ratings (S&P). This comparison, now included in the discussion section, suggests a moderate correspondence, particularly in high-stress cases. We flag this as a promising direction for future work.
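One simple way to operationalise such a comparison, assuming ratings are first mapped to an ordinal scale (e.g., AAA = 1 down to D = 22), is a rank correlation between predicted stress probabilities and the numeric ratings; a sketch with hypothetical arrays:

```python
from scipy.stats import spearmanr

# `stress_proba`: model-predicted stress probability per country-year.
# `rating_num`: contemporaneous S&P rating on an ordinal scale (AAA=1 ... D=22).
rho, pval = spearmanr(stress_proba, rating_num)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```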


Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

No further comments.

Author Response

Response to the academic editor’s comments

We sincerely thank the academic editor for the thoughtful and constructive feedback. We address each point below:

  1. Annual Data Limitation: We acknowledge that annual data limit the models' ability to detect short-term fiscal stress. We now emphasize this more clearly in the revised discussion section and highlight the potential of high-frequency fiscal indicators as a promising area for future research.
  2. Overfitting Concerns (XGBoost AUC = 0.997): We agree that the near-perfect performance warrants scrutiny. To strengthen our robustness checks, we have now included additional out-of-sample validation using a temporal split (pre- and post-2015 data). Results remain strong but slightly attenuated, supporting model generalisability.
  3. SVM Underperformance: The reviewer is correct. We revisited the kernel settings and conducted experiments with polynomial and RBF kernels. Results improved marginally but remained below ensemble performance, reinforcing our conclusion that ensemble methods are better suited to the given data structure.
  4. Institutional Implementation Challenges: We have expanded the discussion on practical implementation, focusing on barriers such as model transparency requirements, data access constraints, and institutional inertia. This adds context to our policy relevance claims.
  5. Fiscal Stress Thresholding: We agree that a binary threshold may oversimplify risk dynamics. In the revised paper, we now include a short subsection proposing a potential extension using probabilistic outputs or fuzzy thresholds to better reflect the spectrum of fiscal risk (a minimal illustration follows this list).
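As a minimal illustration of point 5, predicted probabilities could be mapped to graded risk bands instead of a single 0.5 cut-off; the band edges below are illustrative, not values from the paper.

```python
import numpy as np

# Illustrative band edges over predicted stress probabilities.
EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]
LABELS = ["low", "moderate", "elevated", "high"]

def risk_band(p: float) -> str:
    """Map a predicted stress probability to a graded risk label."""
    idx = np.searchsorted(EDGES, p, side="right") - 1
    return LABELS[min(idx, len(LABELS) - 1)]

print(risk_band(0.62))  # -> "elevated"
```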

Once again, we appreciate the editor’s positive assessment and insightful suggestions, which have substantially strengthened the manuscript.
