Research on External Risk Prediction of Belt and Road Initiative Major Projects Based on Machine Learning
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Literature Review
1.2.1. Risk Assessment and Prediction in BRI Major Projects
1.2.2. Methodological Limitations of Traditional Risk Prediction Approaches
1.2.3. Advantages of Machine Learning for Risk Prediction
1.2.4. Research Gap
1.3. Research Objectives
- (1) To construct a comprehensive, specialized indicator system for quantifying external risks in the distinctive context of BRI major projects.
- (2) To establish an objective weighting framework, using the entropy-weighted TOPSIS method, that assigns weights to risk indicators and computes composite risk scores, providing a quantifiable, data-driven foundation for risk assessment.
- (3) To develop and validate a stacking ensemble learning model that integrates multiple base machine learning (ML) algorithms, namely Ridge Regression, XGBoost, Random Forest (RF), Support Vector Regression (SVR), and Gradient Boosting Regression Trees (GBRT), to deliver stronger predictive performance than any individual model or traditional approach.
- (4) To provide a data-driven decision-support tool that can enhance risk early-warning capabilities and inform strategic policy-making for the sustainable implementation of BRI major projects.
2. Materials and Methods
2.1. Constructing the External Risk Indicator System for BRI Major Projects
2.2. Data Sources and Preprocessing
2.3. Calculation of External Risk Scores for BRI Major Projects
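Objective (2) names the entropy-weighted TOPSIS method for turning indicator values into composite risk scores. The sketch below illustrates that technique under stated assumptions; it is not the authors' code. It assumes min-max normalization, a per-indicator direction flag (`higher_is_riskier`, a hypothetical name) that flips indicators where a larger raw value means lower risk, and a score defined as relative closeness to the highest-risk ideal, so that 1 means maximal external risk. The helper names `entropy_weights` and `entropy_topsis` are illustrative.

```python
import numpy as np


def entropy_weights(z: np.ndarray) -> np.ndarray:
    """Entropy weights for an (m, n) matrix of non-negative normalized values."""
    m = z.shape[0]
    col_sum = z.sum(axis=0)
    # Column-wise proportions p_ij; an all-zero column is treated as uniform,
    # which gives it maximal entropy and therefore zero weight.
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    p = np.where(col_sum > 0, z / safe_sum, 1.0 / m)
    # e_j = -(1 / ln m) * sum_i p_ij * ln p_ij, with 0 * ln 0 taken as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)
    d = 1.0 - e  # divergence: more dispersion across observations -> more weight
    return d / d.sum()


def entropy_topsis(x: np.ndarray, higher_is_riskier: np.ndarray):
    """Return (weights, scores); scores lie in [0, 1], higher = closer to the worst case."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Min-max scaling, then flip "higher is safer" indicators so every column
    # points in the direction of more risk (assumed orientation rule).
    z = (x - lo) / span
    z = np.where(higher_is_riskier, z, 1.0 - z)
    w = entropy_weights(z)
    v = z * w  # weighted normalized decision matrix
    worst, best = v.max(axis=0), v.min(axis=0)  # highest- and lowest-risk ideals
    d_worst = np.sqrt(((v - worst) ** 2).sum(axis=1))
    d_best = np.sqrt(((v - best) ** 2).sum(axis=1))
    denom = d_worst + d_best
    # Relative closeness to the worst case; degenerate rows (denom 0) get 0.
    scores = np.divide(d_best, denom, out=np.zeros_like(denom), where=denom > 0)
    return w, scores


if __name__ == "__main__":
    # Hypothetical toy data: 4 country-year observations x 3 indicators.
    X = np.array([[1.0, 10.0, 0.2],
                  [2.0, 20.0, 0.4],
                  [3.0, 15.0, 0.1],
                  [4.0, 5.0, 0.3]])
    # Second column is treated like "Economic Freedom": higher means lower risk.
    weights, scores = entropy_topsis(X, np.array([True, False, True]))
    print(weights.round(3), scores.round(3))
```

Indicators that vary more across observations receive larger entropy weights, so the weighting is driven by the data rather than by expert judgment, which is the property objective (2) relies on.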
2.4. Screening Key Risk Indicators
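The robustness table on filter thresholds (0.3, 0.35 and 0.4, retaining 9, 9 and 7 indicators) implies a cut-off rule for screening. A minimal sketch, assuming the threshold is applied to the absolute Pearson correlation between each indicator and the composite risk score; the exact criterion is an assumption here, and `screen_indicators` is a hypothetical helper.

```python
import numpy as np


def screen_indicators(x: np.ndarray, score: np.ndarray, names: list, threshold: float) -> list:
    """Keep indicators whose |Pearson r| with the composite risk score is >= threshold."""
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(x[:, j], score)[0, 1]  # Pearson correlation coefficient
        if abs(r) >= threshold:  # absolute value keeps strong negative links too
            kept.append(name)
    return kept


if __name__ == "__main__":
    score = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    # Hypothetical indicators: perfectly aligned, perfectly inverse, uncorrelated.
    x = np.column_stack([score, -score, [1.0, -1.0, 1.0, -1.0, 1.0]])
    names = ["aligned", "inverse", "noise"]
    # Sweep the same thresholds as the robustness check.
    for t in (0.3, 0.35, 0.4):
        print(t, screen_indicators(x, score, names, t))
```

Taking the absolute value matters for indicators oriented so that higher values mean lower risk (for example, Economic Freedom), which would otherwise be discarded despite a strong negative correlation.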
2.5. PCA for Dimensionality Reduction
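The PCA loading table lists the nine screened indicators plus Risk_Score against PC1 to PC4. The sketch below computes loadings from the singular value decomposition of z-score standardized data; standardization is an assumption, as this section does not state the preprocessing, and `pca_loadings` is a hypothetical helper. By our arithmetic the reported PC1 column has unit length (squared loadings sum to about 1.0), consistent with loadings reported as unit eigenvector entries; the script checks this. Eigenvector signs are arbitrary, so a re-run may flip an entire column.

```python
import numpy as np


def pca_loadings(x: np.ndarray, n_components: int):
    """Return (loadings, explained_variance_ratio) for PCA on z-scored columns.

    loadings[j, k] is the weight of variable j on component k (a unit-length
    eigenvector of the correlation matrix), matching a variables-by-PCs table.
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Right singular vectors of the standardized data are the PC directions.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    var = s ** 2 / (z.shape[0] - 1)  # eigenvalues of the correlation matrix
    ratio = var / var.sum()
    return vt[:n_components].T, ratio[:n_components]


# PC1 column of the reported loading table (Political Stability ... Risk_Score).
pc1_table = np.array([0.295699, 0.390943, 0.375636, 0.253387, 0.337441,
                      0.176418, 0.345625, 0.320299, -0.261529, 0.344939])
print(round(float(np.sum(pc1_table ** 2)), 4))  # sum of squared loadings

rng = np.random.default_rng(0)
demo = rng.normal(size=(60, 5))  # hypothetical data, not the study's
loadings, ratio = pca_loadings(demo, 3)
print(loadings.shape, ratio.round(3))
```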
3. Model Selection and Construction
3.1. Stacked Ensemble Regressor
3.2. Introduction to Each Base Model
1. Ridge Regression Model
2. XGBoost Model
3. Random Forest Algorithm
4. Support Vector Regression Model
5. Gradient Boosting Regression Trees
6. Stacking Ensemble Model
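The two-level architecture listed above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal sketch under several assumptions: the meta-learner (`LinearRegression`) is a placeholder because this section does not name the one the study used; hyperparameters are defaults or placeholders rather than the tuned values; and the data come from `make_regression` with nine features, a hypothetical stand-in for the nine screened indicators. An XGBoost base learner is added only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Hypothetical data: 9 features stand in for the screened risk indicators.
X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Level-0 (base) learners; settings are placeholders, not the tuned values.
base_learners = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("svr", SVR(kernel="linear", C=1.0)),
    ("gbrt", GradientBoostingRegressor(random_state=0)),
]
try:  # optional XGBoost base learner, only if the package is available
    from xgboost import XGBRegressor
    base_learners.append(("xgb", XGBRegressor(n_estimators=200, learning_rate=0.05, max_depth=4)))
except ImportError:
    pass

# Level-1 meta-learner (assumed) trained on 5-fold out-of-fold base predictions.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=LinearRegression(), cv=5)
stack.fit(X_train, y_train)
test_r2 = r2_score(y_test, stack.predict(X_test))
print(f"Stacking test R2: {test_r2:.4f}")
```

With `cv=5`, the meta-learner is fit on out-of-fold predictions from the base learners, so the second level learns how to combine models from predictions they did not see during their own training; the final base learners are then refit on the full training set.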
4. Model Prediction Results and Evaluation
4.1. Hyperparameter Tuning
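The search strategy behind the tuned values in the hyperparameter table is not described in this section. The sketch below assumes scikit-learn's `RandomizedSearchCV` with 5-fold cross-validation and MSE scoring, shown for the Ridge model only; the log-uniform range for `alpha`, the number of iterations, and the synthetic data are hypothetical.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, RandomizedSearchCV

# Hypothetical data standing in for the screened indicators and risk scores.
X, y = make_regression(n_samples=200, n_features=9, noise=15.0, random_state=1)

search = RandomizedSearchCV(
    estimator=Ridge(),
    # Assumed search space: alpha drawn log-uniformly between 0.001 and 100.
    param_distributions={"alpha": loguniform(1e-3, 1e2)},
    n_iter=30,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes, so MSE is negated
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
best_cv_mse = -search.best_score_
print(f"best alpha = {best_alpha:.3f}, CV MSE = {best_cv_mse:.2f}")
```

The same pattern extends to the other models by swapping the estimator and the distribution dictionary, for example `n_estimators`, `max_depth` and `min_samples_split` for `RandomForestRegressor`.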
4.2. Prediction Results
4.3. Model Evaluation
4.4. Robustness Checks
4.4.1. Sensitivity Analysis to Data Partitioning
4.4.2. Model Configuration Robustness
4.4.3. Sensitivity Analysis
4.4.4. Model Validation and Robustness Check
5. Discussion
5.1. Theoretical Implications of the Predictive Model
5.2. Comparison with Existing Literature
5.3. Limitations and Future Research
6. Conclusions
6.1. Summary of Findings
6.2. Practical Implications and Policy Recommendations
6.3. Contributions to Sustainable Development
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Dimension | Key Indicators | Data Source (Acronym) |
|---|---|---|
| Political Risk | Political Stability | WGI |
| | Regulatory Quality | WGI |
| | Rule of Law | WGI |
| | Military in Politics | ICRG |
| | Corruption | ICRG |
| Social-Cultural Risk | Labor Market Instability | ESG |
| | Religious Tensions | ICRG |
| | Ethnic Tensions | ICRG |
| | Cultural Distance | Hofstede |
| | Internal Conflict | ICRG |
| | External Conflict | ICRG |
| | Crime Prevalence | NUMBEO |
| Economic Risk | Price Risk | WDI |
| | Economic Scale | WDI |
| | Level of Economic Development | WDI |
| | GDP Growth Rate Risk | ICRG |
| | Debt Level | ESG |
| | Exchange Rate Risk | ICRG |
| | Economic Freedom | EFI |
| | International Liquidity Risk | ICRG |
| | Solvency Risk | ICRG |
| Legal, Environmental and Institutional Risk | Law and Order | ICRG |
| | Air Pollution Exposure | ESG |
| | Carbon Emission Intensity | ESG |
| | Human Capital Development | CPIA |
| | Business Regulatory Environment | CPIA |
| | Public Resource Allocation Equity | CPIA |
| | Fiscal Policy Risk | CPIA |
Key Indicator | PC1 | PC2 | PC3 | PC4 |
---|---|---|---|---|
Political Stability | 0.295699 | −0.314095 | −0.295487 | 0.435434 |
Regulatory Quality | 0.390943 | −0.200437 | 0.107837 | 0.172709 |
Rule of Law | 0.375636 | −0.229324 | −0.07037 | 0.360275 |
Military in Politics | 0.253387 | −0.135788 | 0.498755 | −0.399742 |
Corruption | 0.337441 | −0.108654 | 0.137377 | −0.184799 |
Economic Scale | 0.176418 | 0.688609 | 0.251653 | 0.245889 |
Level of Economic Development | 0.345625 | 0.109165 | −0.416008 | −0.281951 |
Economic Freedom | 0.320299 | −0.157308 | 0.333152 | −0.232687 |
Carbon Emission Intensity | −0.261529 | −0.226545 | 0.523682 | 0.456902 |
Risk_Score | 0.344939 | 0.464938 | 0.082776 | 0.232307 |
Model | Optimal Hyperparameter Combination |
---|---|
Ridge | alpha = 11.7 (regularization strength) |
XGBoost | learning_rate = 0.02, n_estimators = 498, max_depth = 15, subsample = 0.6 |
RandomForest | n_estimators = 222, max_depth = 29, min_samples_split = 4 |
SVR | kernel = 'linear', gamma = 0.01, epsilon = 0.01, C = 0.01 |
GBRT | learning_rate = 0.1, n_estimators = 206, max_depth = 4 |
Model | MSE | RMSE | R² |
---|---|---|---|
Ridge | 0.00036 | 0.01888 | 0.96483 |
XGBoost | 0.00043 | 0.02083 | 0.95720 |
RandomForest | 0.00050 | 0.02238 | 0.95058 |
SVR | 0.00036 | 0.01888 | 0.96482 |
GBRT | 0.00039 | 0.01985 | 0.96112 |
Stacking | 0.00034 | 0.01857 | 0.96597 |
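The metrics in the table follow the standard definitions MSE = (1/n) Σ (yᵢ − ŷᵢ)², RMSE = √MSE, and R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)². The sketch below implements these formulas (`regression_metrics` is a hypothetical helper) and checks the reported rows: by our arithmetic, squaring each RMSE and rounding to five decimals reproduces the tabulated MSE for every model, so the two columns are mutually consistent.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MSE, RMSE and R2 (coefficient of determination) for one prediction vector."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    sse = float(np.sum(resid ** 2))                     # residual sum of squares
    sst = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    mse = sse / y_true.size
    return {"MSE": mse, "RMSE": mse ** 0.5, "R2": 1.0 - sse / sst}


# (MSE, RMSE) pairs as reported in the results table.
reported = {
    "Ridge": (0.00036, 0.01888), "XGBoost": (0.00043, 0.02083),
    "RandomForest": (0.00050, 0.02238), "SVR": (0.00036, 0.01888),
    "GBRT": (0.00039, 0.01985), "Stacking": (0.00034, 0.01857),
}
# RMSE squared and rounded to 5 decimals should reproduce the reported MSE.
consistent = {name: abs(round(rmse ** 2, 5) - mse) < 1e-9
              for name, (mse, rmse) in reported.items()}
print(consistent)
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```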
Filter Threshold | Number of Selected Indicators | Model Performance (R²) |
---|---|---|
0.3 | 9 | 0.96597 |
0.35 | 9 | 0.96597 |
0.4 | 7 | 0.96119 |
Fold | R² on Validation Set |
---|---|
1 | 0.97547 |
2 | 0.97047 |
3 | 0.97618 |
4 | 0.97768 |
5 | 0.97690 |
6 | 0.97442 |
7 | 0.97522 |
8 | 0.97390 |
9 | 0.97766 |
10 | 0.97638 |
Average | 0.97543 |
Std. Dev. | 0.00204 |
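The summary rows of the 10-fold table can be recomputed from the fold values with Python's standard `statistics` module. By our arithmetic, the reported Std. Dev. (0.00204) matches the population standard deviation (divisor n), while the sample standard deviation (divisor n − 1) would be about 0.00215; the table therefore appears to use the population convention.

```python
import statistics

# Validation-fold R2 for each of the 10 folds, as reported in the table.
fold_r2 = [0.97547, 0.97047, 0.97618, 0.97768, 0.97690,
           0.97442, 0.97522, 0.97390, 0.97766, 0.97638]

mean_r2 = statistics.fmean(fold_r2)
pop_sd = statistics.pstdev(fold_r2)     # population SD: divides by n
sample_sd = statistics.stdev(fold_r2)   # sample SD: divides by n - 1
print(f"mean={mean_r2:.5f}  pop_sd={pop_sd:.5f}  sample_sd={sample_sd:.5f}")
```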
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, S.; Wang, C. Research on External Risk Prediction of Belt and Road Initiative Major Projects Based on Machine Learning. Sustainability 2025, 17, 9089. https://doi.org/10.3390/su17209089