# Simulation and Reconstruction of Runoff in the High-Cold Mountains Area Based on Multiple Machine Learning Models

^{1}

^{2}

^{3}

^{*}

## Abstract

**:**

_{XY}) and random forest feature importance (FI) (ρ

_{XY}= 0.63, FI = 0.723), followed by soil temperature (ρ

_{XY}= 0.63, FI = 0.043), precipitation, hours of sunshine, wind speed, relative humidity, and atmospheric circulation were weakly correlated. A total of 12 elements were selected as the machine learning input data. (2) Comparing the results of the Yurungkash River runoff simulated by eight machine learning methods, we found that the gradient boosting and random forest methods performed best, followed by the AdaBoost and Bagging methods, with Nash–Sutcliffe efficiency coefficients (NSE) of 0.84, 0.82, 0.78, and 0.78, while the support vector regression (NSE = 0.68), ridge (NSE = 0.53), K-nearest neighbor (NSE = 0.56), and linear regression (NSE = 0.51) were simulated poorly. (3) The application of four machine learning methods, gradient boosting, random forest, AdaBoost, and bagging, to simulate the runoff of the Kalakash River for 1978–1998 was generally outstanding, with the NSE exceeding 0.75, and the results of reconstructing the runoff data for the missing period (1999–2019) could well reflect the characteristics of the intra-annual and inter-annual changes in runoff.

## 1. Introduction

^{8}m

^{3}and 3.3 × 10

^{8}m

^{3}, respectively, over the past 50 years, leading to greater flooding [4,5,6]. Arid and semi-arid mountainous areas in northwest China are highly vulnerable to glacial snowmelt flooding under extreme climate change [7]. Floods can cause damage to local transportation facilities, downstream river runoff processes, and related hydrological forecasting systems such as the bursting of the Kayagil glacial dam in the Kunlun Mountains of Xinjiang [8,9]. Achieving sustainable use and scientific management of water resources in the HCMA is directly linked to the green and high quality development of the region downstream [10]. Runoff simulation and reconstruction can provide a rational decision-making basis for optimizing the allocation and use of water resources in the HCMA, thus reducing and preventing the occurrence of natural disasters such as floods, which is of great importance for ensuring downstream ecological security and promoting economic and social development [11,12].

## 2. Research Area and Data

#### 2.1. Research Area

^{4}km

^{3}, the average air temperature of the basin is 10.6 °C, the precipitation is 38.4 mm, and the average runoff of the Tongguziluoke Hydrological Station is 21.95 × 10

^{8}m

^{3}. There are more than 1300 glaciers in the Yurungkash River Basin, with a total area of 2958.31 km

^{2}, total coverage of 20.30%, and the glacier reserves are 410.32 km [39]. The geographical location of the Kalakash River is between 77°25′~80° E and 34°52′~38°04′ N, originating from the Tuanjie Peak in the Karakoram Mountains, the total length of 808 km, the source of the highest mountain elevation of 6662 m, the average temperature for many years 11.3 °C, the annual precipitation of 36.5 mm, Wuluwati Hydrological Station cross-section of the measured multi-year average runoff of 21.51 × 10

^{8}m

^{3}. For the Kalakash River Basin, there are more than 1900 glaciers with a total area of 2163.17 km

^{2}, total coverage of 10.83%, glacier reserves 156.09 km

^{3}(Table 1) [40]. The two tributaries pass through the oasis plains of Hotan, Karakax and Lop Counties, connect with the Hotan River at Kuoshilashi, then flow for 319 km and join the Tarim River at Xiaojiake [41]. The simulation and reconstruction of runoff in the Hotan River Basin can provide recommendations for future water resources planning and management in the mountainous basin, which is of great significance for safeguarding the ecological balance of the green corridor zone of the Hotan River as well as promoting the socio-economic development of arid and semi-arid regions in China [42,43].

#### 2.2. Data

## 3. Research Methods

#### 3.1. Runoff Simulation and Reconstruction Modelling

#### 3.2. Feature Selection

#### 3.3. Evaluation Parameters

_{0}refers to the actual observed runoff, Q

_{m}refers to the runoff simulated by the model, $\overline{{\mathrm{Q}}_{0}}$ represents the mean of the actual observed value, the t superscript represents the moment t, and T represents the total number of time series.

## 4. Results and Analyses

#### 4.1. Feature Analysis

#### 4.1.1. Pearson Correlation Coefficient

_{XY}= 0.63) and T_mean (ρ

_{XY}= 0.63) were strongly correlated with runoff, P20_20 (ρ

_{XY}= 0.18), Wind (ρ

_{XY}= 0.21), and Sunshine (ρ

_{XY}= 0.22) were moderately correlated, and RH (ρ

_{XY}= 0.02) was weakly correlated. The Hotan River Basin belongs to the HCMA, with less clouds, more glaciers and snow, scarce precipitation, and runoff mainly from glaciers and snowmelt recharge [81]. Therefore, the T_mean and DT_mean are more important for runoff than precipitation and had the strongest correlation. Wind speed and sunshine hours influenced the temperature changes and indirectly contributed to glacier snowmelt, with a slightly higher correlation than P20_20. RH had less of an effect on runoff due to the dry and cold climate in the HCMA. The atmospheric circulation effects of WPSHI (ρ

_{XY}= 0.24) and AMO (ρ

_{XY}= 0.13) were nearly moderately correlated with runoff, and PDO (ρ

_{XY}= 0.06), ENSO (ρ

_{XY}= 0.04), NAO (ρ

_{XY}= 0.03), and AO (ρ

_{XY}= 0.01) were weakly correlated with runoff. Atmospheric circulation can influence global climate change and had a non-negligible influence on the formation and generation of runoff in the HCMA, but the correlation was small and can be used as an input feature to improve the credibility of runoff predictions.

#### 4.1.2. Random Forest Feature Importance

#### 4.2. Runoff Simulation of the Yurungkash River

#### 4.3. Runoff Simulation and Reconstruction of the Kalakash River

#### 4.3.1. Runoff Simulation of the Kalakash River

#### 4.3.2. Runoff Reconstruction of the Kalakash River

## 5. Discussion

## 6. Conclusions

- (1)
- Temperature is the most important driver of runoff changes in the mountainous areas upstream of the Hotan River, followed by precipitation, hours of sunshine, wind speed, and weak correlation of atmospheric circulation. The random forest features were ranked in order of importance as T_mean > RH > DT_mean > Wind > WSPHI > PDO > Sun > NAO > P20_20 > AO > ENSO > AMO, with a total of 12 elements selected as the machine learning training input data.
- (2)
- Machine learning (ML) methods can successfully simulate runoff changes in the HCMA. Comprehensive runoff curve coincidence and evaluation parameters using gradient boosting, random forest, AdaBoost, and bagging showed obvious advantages over several other ML methods with NSE of 0.84, 0.82, 0.78, and 0.78, respectively, and the other four methods performed the simulation poorly.
- (3)
- The four methods including random forest were applied to simulate the runoff of the Kalakash River from 1978 to 1998 with good results, and the Nash–Sutcliffe efficiency coefficients exceeded 0.75. The reconstruction results of the runoff data of the missing period (1999–2019) reflected the intra-annual and inter-annual variations of the runoff characteristics.

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

ML Models | Parameters |
---|---|

Random Forest | Bootstrap = True, criterion = ‘mse’, max_depth = 100, max_samples = 490, n_estimators = 1000, random_state = 99 |

Gradient Boosting | n_estimators = 2000, learning_rate = 0.01, max_depth = 15, max_features = ‘sqrt’, alpha = 0.9 |

SVR | Verbose = False, degree = 3, coef0 = 0.0, kernel = ‘rbf’, tol = 0.001, epsilon = 0.1, max_iter = −1, shrinking = True, cache_size = 200 |

AdaBoost | n_estimators = 50, learning_rate = 1.0, random_stat e = None, base_estimator = None, loss = ‘linear’ |

KNN | n_neighbors = 4, weights = ‘uniform’, metric_params = None, n_jobs = None, p = 2, algorithm = ‘auto’ |

Bagging | n_estimators = 90, oob_score = True, random_state = 90, max_samples = 490 |

Ridge | Normalize = False, fit_intercept = True, alpha = 1.0, copy_X = True, max_iter = None, tol = 0.001, solver = ‘auto’, random_state = None |

Linear Regression | fit_intercept = True, normalize = False, copy_X = True, n_jobs = None, positive = False |

## References

- Luo, M.; Liu, T.; Meng, F.; Duan, Y.; Bao, A.; Xing, W.; Feng, X.; De Maeyer, P.; Frankl, A. Identifying climate change impacts on water resources in Xinjiang, China. Sci. Total Environ.
**2019**, 676, 613–626. [Google Scholar] [CrossRef] [PubMed] - Rangecroft, S.; Harrison, S.; Anderson, K.; Magrath, J.; Castel, A.P.; Pacheco, P. Climate change and water resources in arid mountains: An example from the Bolivian Andes. Ambio
**2013**, 42, 852–863. [Google Scholar] [CrossRef] [PubMed] - Zhang, J.; Xu, B.; Gu, Z.; Lv, Y.; Yin, Z.; Guo, X.; Li, L. Coupling of river discharges and alpine glaciers in arid Central Asia. Quat. Int.
**2023**, 667, 19–28. [Google Scholar] [CrossRef] - Yang, B.; Du, W.; Li, J.; Bao, A.; Ge, W.; Wang, S.; Lyu, X.; Gao, X.; Cheng, X. The Influence of Glacier Mass Balance on River Runoff in the Typical Alpine Basin. Water
**2023**, 15, 2762. [Google Scholar] [CrossRef] - Wang, C.; Xu, J.; Chen, Y.; Bai, L.; Chen, Z. A hybrid model to assess the impact of climate variability on streamflow for an ungauged mountainous basin. Clim. Dyn.
**2018**, 50, 2829–2844. [Google Scholar] [CrossRef] - Jiang, J.; Cai, M.; Xu, Y.; Fang, G. The changing trend of flooding in the Aksu River basin. J. Glaciol. Geocryol
**2021**, 43, 1–10. [Google Scholar] - Wang, X.; Chen, R.; Li, K.; Yang, Y.; Liu, J.; Liu, Z.; Han, C. Trends and Variability in Flood Magnitude: A Case Study of the Floods in the Qilian Mountains, Northwest China. Atmosphere
**2023**, 14, 557. [Google Scholar] [CrossRef] - Sommer, C.; Malz, P.; Seehaus, T.C.; Lippl, S.; Zemp, M.; Braun, M.H. Rapid glacier retreat and downwasting throughout the European Alps in the early 21st century. Nat. Commun.
**2020**, 11, 3209. [Google Scholar] [CrossRef] - Wang, J.; Liu, D.W.; Tian, S.N.; Ma, J.L.; Wang, L.X. Coupling reconstruction of atmospheric hydrological profile and dry-up risk prediction in a typical lake basin in arid area of China. Sci. Rep.
**2022**, 12, 6535. [Google Scholar] [CrossRef] - Mo, K.L.; Chen, Q.W.; Chen, C.; Zhang, J.Y.; Wang, L.; Bao, Z.X. Spatiotemporal variation of correlation between vegetation cover and precipitation in an arid mountain-oasis river basin in northwest China. J. Hydrol.
**2019**, 574, 138–147. [Google Scholar] [CrossRef] - Yan, L.; Lei, Q.W.; Jiang, C.; Yan, P.T.; Ren, Z.; Liu, B.; Liu, Z.J. Climate-informed monthly runoff prediction model using machine learning and feature importance analysis. Front. Environ. Sci.
**2022**, 10, 1049840. [Google Scholar] [CrossRef] - Xiao, C.; Zhong, Y.; Wu, Y.; Bai, H.; Li, W.; Wu, D.; Wang, C.; Tian, B. Applying Reconstructed Daily Water Storage and Modified Wetness Index to Flood Monitoring: A Case Study in the Yangtze River Basin. Remote Sens.
**2023**, 15, 3192. [Google Scholar] [CrossRef] - Wang, F.; Shao, W.; Yu, H.J.; Kan, G.Y.; He, X.Y.; Zhang, D.W.; Ren, M.L.; Wang, G. Re-evaluation of the power of the mann-kendall test for detecting monotonic trends in hydrometeorological time series. Front. Earth Sci.
**2020**, 8, 14. [Google Scholar] [CrossRef] - Abebe, S.A.; Qin, T.; Zhang, X.; Yan, D. Wavelet transform-based trend analysis of streamflow and precipitation in Upper Blue Nile River basin. J. Hydrol. Reg. Stud.
**2022**, 44, 101251. [Google Scholar] [CrossRef] - Huang, T.T.; Wang, Z.H.; Wu, Z.Y.; Xiao, P.Q.; Liu, Y. Attribution analysis of runoff evolution in Kuye River Basin based on the time-varying budyko framework. Front. Earth Sci.
**2023**, 10, 1092409. [Google Scholar] [CrossRef] - Quang, N.H.; Loc, H.H.; Park, E. Characterizing sediment load variability in the red river system using empirical orthogonal function analysis: Implications for water resources management in data poor regions. J. Hydrol.
**2023**, 624, 129891. [Google Scholar] [CrossRef] - Feng, Z.; Niu, W.; Tang, Z.; Xu, Y.; Zhang, H. Evolutionary artificial intelligence model via cooperation search algorithm and extreme learning machine for multiple scales nonstationary hydrological time series prediction. J. Hydrol.
**2021**, 595, 126062. [Google Scholar] [CrossRef] - Amiri, S.N.; Khoshravesh, M.; Valashedi, R.N. Assessing the effect of climate and land use changes on the hydrologic regimes in the upstream of Tajan river basin using SWAT model. Appl. Water Sci.
**2023**, 13, 130. [Google Scholar] [CrossRef] - Zhao, Y.; Chen, Y.; Zhu, Y.; Xu, S. Evaluating the Feasibility of the Liuxihe Model for Forecasting Inflow Flood to the Fengshuba Reservoir. Water
**2023**, 15, 1048. [Google Scholar] [CrossRef] - Liu, J.; Liu, T.; Bao, A.; De Maeyer, P.; Kurban, A.; Chen, X. Response of hydrological processes to input data in high alpine catchment: An assessment of the Yarkant River Basin in China. Water
**2016**, 8, 181. [Google Scholar] [CrossRef] - Luo, M.; Meng, F.; Liu, T.; Duan, Y.; Frankl, A.; Kurban, A.; De Maeyer, P. Multi–model ensemble approaches to assessment of effects of local Climate Change on water resources of the Hotan River Basin in Xinjiang, China. Water
**2017**, 9, 584. [Google Scholar] [CrossRef] - He, C.; Chen, F.; Long, A.; Qian, Y.; Tang, H. Improving the precision of monthly runoff prediction using the combined non-stationary methods in an oasis irrigation area. Agric. Water Manag.
**2023**, 279, 108161. [Google Scholar] [CrossRef] - Perrin, C.; Oudin, L.; Andreassian, V.; Rojas-Serna, C.; Michel, C.; Mathevet, T. Impact of limited streamflow data on the efficiency and the parameters of rainfall—Runoff models. Hydrol. Sci. J.
**2007**, 52, 131–151. [Google Scholar] [CrossRef] - Lu, M.; Hou, Q.; Qin, S.; Zhou, L.; Hua, D.; Wang, X.; Cheng, L. A Stacking Ensemble Model of Various Machine Learning Models for Daily Runoff Forecasting. Water
**2023**, 15, 1265. [Google Scholar] [CrossRef] - Mohammadi, B. A review on the applications of machine learning for runoff modeling. Sustain. Water Resour. Manag.
**2021**, 7, 98. [Google Scholar] [CrossRef] - Hao, R.; Bai, Z. Comparative Study for Daily Streamflow Simulation with Different Machine Learning Methods. Water
**2023**, 15, 1179. [Google Scholar] [CrossRef] - Rizeei, H.M.; Pradhan, B.; Saharkhiz, M.A. An integrated fluvial and flash pluvial model using 2D high-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS. Complex Intell. Syst.
**2019**, 5, 283–302. [Google Scholar] [CrossRef] - Langhammer, J. Flood Simulations Using a Sensor Network and Support Vector Machine Model. Water
**2023**, 15, 2004. [Google Scholar] [CrossRef] - Vaheddoost, B.; Safari, M.J.S.; Yilmaz, M.U. Rainfall-runoff simulation in ungauged tributary streams using drainage area ratio-based multivariate adaptive regression spline and random forest hybrid models. Pure Appl. Geophys.
**2023**, 180, 365–382. [Google Scholar] [CrossRef] - Nasiboglu, R.; Nasibov, E. WABL method as a universal defuzzifier in the fuzzy gradient boosting regression model. Expert Syst. Appl.
**2023**, 212, 118771. [Google Scholar] [CrossRef] - Noble, W.S. What is a support vector machine? Nat. Biotechnol.
**2006**, 24, 1565–1567. [Google Scholar] [CrossRef] [PubMed] - Ratnasingam, S.; Muñoz-Lopez, J. Distance Correlation-Based Feature Selection in Random Forest. Entropy
**2023**, 25, 1250. [Google Scholar] [CrossRef] - Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobotics
**2013**, 7, 21. [Google Scholar] [CrossRef] - Niu, W.-J.; Feng, Z.-K.; Xu, Y.-S.; Feng, B.-F.; Min, Y.-W. Improving prediction accuracy of hydrologic time series by least-squares support vector machine using decomposition reconstruction and swarm intelligence. J. Hydrol. Eng.
**2021**, 26, 04021030. [Google Scholar] [CrossRef] - Fu, X.; Shen, B.; Dong, Z.; Zhang, X. Assessing the impacts of changing climate and human activities on streamflow in the Hotan River, China. J. Water Clim. Chang.
**2020**, 11, 166–177. [Google Scholar] [CrossRef] - Wei, X.; Long, A.; Yin, Z.; Jiawen, Y. Simulation of response of glacier runoff to climate change in the Hotan River Basin. Water Resour. Prot.
**2022**, 38, 137–144. [Google Scholar] - Xu, Y. A study of comprehensive evaluation of the water resource carrying capacity in the arid area: A case study in the Hetian river basin of Xinjiang. J. Nat. Resour.
**1993**, 8, 229–237. [Google Scholar] - Fan, M.; Xu, J.; Chen, Y.; Li, W. Modeling streamflow driven by climate change in data-scarce mountainous basins. Sci. Total Environ.
**2021**, 790, 148256. [Google Scholar] [CrossRef] - Wang, X.; Luo, Y.; Sun, L.; Shafeeque, M. Different climate factors contributing for runoff increases in the high glacierized tributaries of Tarim River Basin, China. J. Hydrol. Reg. Stud.
**2021**, 36, 100845. [Google Scholar] [CrossRef] - Guo, H.; Ling, H.; Xu, H.; Guo, B. Study of suitable oasis scales based on water resource availability in an arid region of China: A case study of Hotan River Basin. Environ. Earth Sci.
**2016**, 75, 984. [Google Scholar] [CrossRef] - Tan, K.; Wang, X.; Gao, H. Analysis of ecological effects of comprehensive treatment in the Tarim River Basin using remote sensing data. Min. Sci. Technol.
**2011**, 21, 519–524. [Google Scholar] [CrossRef] - Xue, X.; Mi, Y.; Li, Z.; Chen, Y. Long-term trends and sustainability analysis of air temperature and precipitation in the Hotan River Basin. Resour. Sci.
**2008**, 30, 1833–1838. [Google Scholar] - Luo, M.; Liu, T.; Meng, F.; Duan, Y.; Huang, Y.; Frankl, A.; De Maeyer, P. Proportional coefficient method applied to TRMM rainfall data: Case study of hydrological simulations of the Hotan River Basin (China). J. Water Clim. Chang.
**2017**, 8, 627–640. [Google Scholar] [CrossRef] - Liu, P.; Jiang, Z.; Li, Y.; Lan, F.; Sun, Y.; Yue, X. Quantitative Study on Improved Budyko-Based Separation of Climate and Ecological Restoration of Runoff and Sediment Yield in Nandong Underground River System. Water
**2023**, 15, 1263. [Google Scholar] [CrossRef] - Nuber, S.; Rae, J.W.; Zhang, X.; Andersen, M.B.; Dumont, M.D.; Mithan, H.T.; Sun, Y.; De Boer, B.; Hall, I.R.; Barker, S. Indian Ocean salinity build-up primes deglacial ocean circulation recovery. Nature
**2023**, 617, 306–311. [Google Scholar] [CrossRef] [PubMed] - Ye, Y.; Li, Z.; Li, X.; Li, Z. Projection and Analysis of Floods in the Upper Heihe River Basin under Climate Change. Atmosphere
**2023**, 14, 1083. [Google Scholar] [CrossRef] - Zhang, Q.; Shen, Z.; Pokhrel, Y.; Farinotti, D.; Singh, V.P.; Xu, C.-Y.; Wu, W.; Wang, G. Oceanic climate changes threaten the sustainability of Asia’s water tower. Nature
**2023**, 615, 87–93. [Google Scholar] [CrossRef] - Wang, J.; Sun, X.; Cheng, Q.; Cui, Q. An innovative random forest-based nonlinear ensemble paradigm of improved feature extraction and deep learning for carbon price forecasting. Sci. Total Environ.
**2021**, 762, 143099. [Google Scholar] [CrossRef] [PubMed] - Shah, S.H.; Angel, Y.; Houborg, R.; Ali, S.; McCabe, M.F. A random forest machine learning approach for the retrieval of leaf chlorophyll content in wheat. Remote Sens.
**2019**, 11, 920. [Google Scholar] [CrossRef] - Nguyen, J.M.; Jézéquel, P.; Gillois, P.; Silva, L.; Ben Azzouz, F.; Lambert-Lacroix, S.; Juin, P.; Campone, M.; Gaultier, A.; Moreau-Gaudry, A. Random forest of perfect trees: Concept, performance, applications and perspectives. Bioinformatics
**2021**, 37, 2165–2174. [Google Scholar] [CrossRef] - Saravanan, S.; Abijith, D.; Reddy, N.M.; Parthasarathy, K.; Janardhanam, N.; Sathiyamurthi, S.; Sivakumar, V. Flood susceptibility mapping using machine learning boosting algorithms techniques in Idukki district of Kerala India. Urban Clim.
**2023**, 49, 101503. [Google Scholar] [CrossRef] - Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat.
**2001**, 29, 1189–1232. [Google Scholar] [CrossRef] - Kurani, A.; Doshi, P.; Vakharia, A.; Shah, M. A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann. Data Sci.
**2023**, 10, 183–208. [Google Scholar] [CrossRef] - Zhong, W.; Du, L. Predicting Traffic Casualties Using Support Vector Machines with Heuristic Algorithms: A Study Based on Collision Data of Urban Roads. Sustainability
**2023**, 15, 2944. [Google Scholar] [CrossRef] - Ren, J.; Zhao, H.; Zhang, L.; Zhao, Z.; Xu, Y.; Cheng, Y.; Wang, M.; Chen, J.; Wang, J. Design optimization of cement grouting material based on adaptive boosting algorithm and simplicial homology global optimization. J. Build. Eng.
**2022**, 49, 104049. [Google Scholar] [CrossRef] - Wang, C.; Xu, S.; Yang, J. Adaboost algorithm in artificial intelligence for optimizing the IRI prediction accuracy of asphalt concrete pavement. Sensors
**2021**, 21, 5682. [Google Scholar] [CrossRef] [PubMed] - Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep.
**2022**, 12, 6256. [Google Scholar] [CrossRef] - Wang, F.; Zhen, Z.; Wang, B.; Mi, Z. Comparative study on KNN and SVM based weather classification models for day ahead short term solar PV power forecasting. Appl. Sci.
**2017**, 8, 28. [Google Scholar] [CrossRef] - Garcia, S.; Derrac, J.; Cano, J.; Herrera, F. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell.
**2012**, 34, 417–435. [Google Scholar] [CrossRef] [PubMed] - Rajan, M. An efficient Ridge regression algorithm with parameter estimation for data analysis in machine learning. SN Comput. Sci.
**2022**, 3, 171. [Google Scholar] [CrossRef] - Wang, X.; Wang, X.; Ma, B.; Li, Q.; Wang, C.; Shi, Y. High-performance reversible data hiding based on ridge regression prediction algorithm. Signal Process.
**2023**, 204, 108818. [Google Scholar] [CrossRef] - Hothorn, T.; Lausen, B. Double-bagging: Combining classifiers by bootstrap aggregation. Pattern Recognit.
**2003**, 36, 1303–1309. [Google Scholar] [CrossRef] - Wang, Q.; Luo, Z.; Huang, J.; Feng, Y.; Liu, Z. A novel ensemble method for imbalanced data learning: Bagging of extrapolation-SMOTE SVM. Comput. Intell. Neurosci.
**2017**, 2017, 1827016. [Google Scholar] [CrossRef] [PubMed] - James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. Linear regression. In An Introduction to Statistical Learning:With Applications in Python; Springer: Berlin/Heidelberg, Germany, 2023; pp. 69–134. [Google Scholar]
- Parashar, A.; Parashar, A.; Ding, W.; Shabaz, M.; Rida, I. Data Preprocessing and Feature Selection Techniques in Gait Recognition: A Comparative Study of Machine Learning and Deep Learning Approaches. Pattern Recognit. Lett.
**2023**, 172, 65–73. [Google Scholar] [CrossRef] - Wang, W.; Jing, H.; Guo, X.; Dou, B.; Zhang, W. Analysis of Water and Salt Spatio-Temporal Distribution along Irrigation Canals in Ningxia Yellow River Irrigation Area, China. Sustainability
**2023**, 15, 12114. [Google Scholar] [CrossRef] - Zhao, Y.; Zhu, W.; Wei, P.; Fang, P.; Zhang, X.; Yan, N.; Liu, W.; Zhao, H.; Wu, Q. Classification of Zambian grasslands using random forest feature importance selection during the optimal phenological period. Ecol. Indic.
**2022**, 135, 108529. [Google Scholar] [CrossRef] - Alduailij, M.; Khan, Q.W.; Tahir, M.; Sardaraz, M.; Alduailij, M.; Malik, F. Machine-learning-based DDoS attack detection using mutual information and random forest feature importance method. Symmetry
**2022**, 14, 1095. [Google Scholar] [CrossRef] - Fu, H.; Shen, Y.; Liu, J.; He, G.; Chen, J.; Liu, P.; Qian, J.; Li, J. Cloud detection for FY meteorology satellite based on ensemble thresholds and random forests approach. Remote Sens.
**2018**, 11, 44. [Google Scholar] [CrossRef] - Han, H.; Morrison, R.R. Improved runoff forecasting performance through error predictions using a deep-learning approach. J. Hydrol.
**2022**, 608, 127653. [Google Scholar] [CrossRef] - Vu, M.; Raghavan, S.V.; Liong, S.-Y. SWAT use of gridded observations for simulating runoff–a Vietnam river basin study. Hydrol. Earth Syst. Sci.
**2012**, 16, 2801–2811. [Google Scholar] [CrossRef] - Lu, X.; Li, J.; Liu, Y.; Li, Y.; Huo, H. Quantitative Precipitation Estimation in the Tianshan Mountains Based on Machine Learning. Remote Sens.
**2023**, 15, 3962. [Google Scholar] [CrossRef] - Yapo, P.O.; Gupta, H.V.; Sorooshian, S. Automatic calibration of conceptual rainfall-runoff models: Sensitivity to calibration data. J. Hydrol.
**1996**, 181, 23–48. [Google Scholar] [CrossRef] - Gu, X.; Yang, G.; He, X.; Zhao, L.; Li, X.; Li, P.; Liu, B.; Gao, Y.; Xue, L.; Long, A. Hydrological process simulation in Manas River Basin using CMADS. Open Geosci.
**2020**, 12, 946–957. [Google Scholar] [CrossRef] - Lee, J.; Noh, J. Development of a One-Parameter New Exponential (ONE) Model for Simulating Rainfall-Runoff and Comparison with Data-Driven LSTM Model. Water
**2023**, 15, 1036. [Google Scholar] [CrossRef] - Xu, Y.; Hu, C.; Wu, Q.; Jian, S.; Li, Z.; Chen, Y.; Zhang, G.; Zhang, Z.; Wang, S. Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. J. Hydrol.
**2022**, 608, 127553. [Google Scholar] [CrossRef] - Feng, Z.-K.; Niu, W.-J.; Wan, X.-Y.; Xu, B.; Zhu, F.-L.; Chen, J. Hydrological time series forecasting via signal decomposition and twin support vector machine using cooperation search algorithm for parameter identification. J. Hydrol.
**2022**, 612, 128213. [Google Scholar] [CrossRef] - Patro, E.R.; De Michele, C.; Avanzi, F. Future perspectives of run-of-the-river hydropower and the impact of glaciers’ shrinkage: The case of Italian Alps. Appl. Energy
**2018**, 231, 699–713. [Google Scholar] [CrossRef] - Taylor, G.P.; Loikith, P.C.; Aragon, C.M.; Lee, H.; Waliser, D.E. CMIP6 model fidelity at simulating large-scale atmospheric circulation patterns and associated temperature and precipitation over the Pacific Northwest. Clim. Dyn.
**2023**, 60, 2199–2218. [Google Scholar] [CrossRef] - Fahu, C.; Tingting, X.; Yujie, Y.; Shengqian, C.; Feng, C.; Wei, H.; Jie, C. Discussion on the problem of “warming and humidification” and its future trend in the arid area of Northwest China. Sci. China Earth Sci.
**2023**, 53, 1246–1262. [Google Scholar] [CrossRef] - Zhao, Q.; Ye, B.; Ding, Y.; Zhang, S.; Yi, S.; Wang, J.; Shangguan, D.; Zhao, C.; Han, H. Coupling a glacier melt model to the Variable Infiltration Capacity (VIC) model for hydrological modeling in north-western China. Environ. Earth Sci.
**2013**, 68, 87–101. [Google Scholar] [CrossRef] - El Bilali, A.; Abdeslam, T.; Ayoub, N.; Lamane, H.; Ezzaouini, M.A.; Elbeltagi, A. An interpretable machine learning approach based on DNN, SVR, Extra Tree, and XGBoost models for predicting daily pan evaporation. J. Environ. Manag.
**2023**, 327, 116890. [Google Scholar] [CrossRef] [PubMed] - Li, X.; Zhang, L.; Zeng, S.; Tang, Z.; Liu, L.; Zhang, Q.; Tang, Z.; Hua, X. Predicting Monthly Runoff of the Upper Yangtze River Based on Multiple Machine Learning Models. Sustainability
**2022**, 14, 11149. [Google Scholar] [CrossRef] - Wang, G.; Hao, X.; Yao, X.; Wang, J.; Li, H.; Chen, R.; Liu, Z. Simulations of Snowmelt Runoff in a High-Altitude Mountainous Area Based on Big Data and Machine Learning Models: Taking the Xiying River Basin as an Example. Remote Sens.
**2023**, 15, 1118. [Google Scholar] [CrossRef] - Tang, H.; Zhang, F.; Zeng, C.; Wang, L.; Zhang, H.; Xiang, Y.; Yu, Z. Simulation of Runoff through Improved Precipitation:The Case of Yamzho Yumco Lake in the Tibetan Plateau. Water
**2023**, 15, 490. [Google Scholar] [CrossRef] - Aksan, F.; Suresh, V.; Janik, P.; Sikorski, T. Load Forecasting for the Laser Metal Processing Industry Using VMD and Hybrid Deep Learning Models. Energies
**2023**, 16, 5381. [Google Scholar] [CrossRef] - Guo, J.; Liu, Y.; Zou, Q.; Ye, L.; Zhu, S.; Zhang, H. Study on optimization and combination strategy of multiple daily runoff prediction models coupled with physical mechanism and LSTM. J. Hydrol.
**2023**, 624, 129969. [Google Scholar] [CrossRef] - Jin, Q.; Sun, Y.; Liu, Z.; He, S. Multidimensional tensor strategy for the inverse analysis of in-service bridge based on SHM data. Innov. Infrastruct. Solut.
**2023**, 8, 228. [Google Scholar] [CrossRef] - Khandelwal, A.; Xu, S.; Li, X.; Jia, X.; Stienbach, M.; Duffy, C.; Nieber, J.; Kumar, V. Physics guided machine learning methods for hydrology. arXiv
**2020**, arXiv:2012.02854. [Google Scholar] [CrossRef]

**Figure 1.**The locations of the Xinjiang Uygur Autonomous Region (

**A**) and the Hotan River Basin (

**B**) in China are shown. The Hotan River Basin (

**C**) includes the Hotan (HT) Meteorological Station, Tongguzilouke (TGZLK), and Wuluwati (WLWT) Hydrological Stations.

**Figure 5.**Feature analysis: Pearson correlation coefficient (

**a**) and random forest feature importance (

**b**).

**Figure 6.**Comparison of the time series curves of the simulation results and actual observations of runoff in the Yurungkash River from 1999 to 2019.

**Figure 7.**Box plot (

**A**) of the eight ML models simulating runoff from the YurungKash River versus the measured runoff and the Taylor diagram (

**B**) of the evaluation parameters (NSE, RMSE, MAE) for the eight ML models.

**Figure 8.**Comparison of the time series curves of the simulation results and actual observations of runoff in the Kalakash River from 1978 to 1998.

**Figure 9.**Box plot (

**A**) of the four ML models simulating runoff from the Kalakash River versus the measured runoff and the Taylor diagram (

**B**) of the evaluation parameters (NSE, RMSE, MAE) for the four ML models.

**Figure 10.**Measured runoff curve of the 1958–1998 data (

**a**). Reconstructed runoff curve of the 1999–2019 data (

**b**).

River Basin | Lengths (km) | Area (×10 ^{4} km^{3}) | Temperature (°C) | Precipitation (mm) | Runoff (×10^{8} m^{3}) | Glacier Area (km^{2}) |
---|---|---|---|---|---|---|

Yurungkash | 513 | 1.98 | 10.6 | 38.4 | 21.95 | 2958.31 |

Kalakash | 808 | 2.66 | 11.3 | 36.5 | 21.51 | 2163.17 |

Data Types | Input Data Name | Time Span | Obtaining Sources |
---|---|---|---|

Meteorological data | Air temperature (T_mean) | 1958–2019 | Hotan Meteorological Station (HT) |

Soil temperature (DT_mean) | |||

Total precipitation (P20_20) | |||

Relative humidity (RH) | |||

Sunshine hours (Sun) | |||

Wind speed (Wind) | |||

Hydrological data | Yurungkash River runoff | 1958–2019 | Tongguziluoke Hydrographic Station (TGZLK) |

Kalakash River runoff | 1958–1998 | Wuluwati Hydrographic Station (WLWT) | |

Atmospheric circulation data | El Niño-Southern Oscillation (ENSO) | 1958–2019 | Global Climate Observing System (GCOS) https://www.psl.noaa.gov/gcos_wgsp/ (accessed on 5 July 2023) |

Pacific Decadal Oscillation (PDO) | |||

Arctic Oscillation (AO) | |||

Atlantic Multidecadal Oscillation (AMO) | |||

North Atlantic Oscillation (NAO) | |||

Western Pacific Subtropical High Pressure Intensity (WPSHI) |

ML Models | The Core Idea | Strengths and Weaknesses | Reference |
---|---|---|---|

Random Forest (RF) | Randomly and independently select a subset of samples to construct multiple decision trees for training, input unknown data, predict each decision tree, and use voting or averaging to obtain the final regression results. | It can better prevent the overfitting phenomenon and overcome the problem of too large a feature dimension, with simple model structure, short training time, high efficiency, strong generalization ability, and good robustness. However, for the sample set with too much data noise, it is easy to produce the overfitting phenomenon. | [48,49,50] |

Gradient Boosting | The training process first finds a model with weak prediction accuracy, gradually reduces the residuals by adding a predictor, and calculates the residual value between the predicted and actual values of the model to achieve the purpose of improving accuracy, using the iterative principle to use the appropriate loss function, develops a strong learner based on the weak learner, and performs prediction simulation. | The training effect is better, not easy to produce overfitting, with the advantages of high interpretability, high learning efficiency, minimal prediction error, and high stability. However, it requires careful parameter tuning and longer training time. | [51,52] |

Support Vector Regression (SVR) | Using support vector machines to fit curves for regression analysis, finding a plane to which all the data in the set are closest, minimizing the risk to the expected value, is a machine learning regression algorithm based on support vector machines. | It can effectively solve the regression problem of high-dimensional features, only needing to use part of the support vector to do the decision of the hyperplane, with high accuracy and resolution. However, it is very sensitive to missing data and not very applicable when the sample size is very large. | [53,54] |

AdaBoost | Multiple weakly learned classifiers are learned by changing the weights of the training samples, and then these weakly learned classifiers are assembled to form a strongest learner for linear fitting to achieve the purpose of predictive simulation. | It can solve multi-class single-label and multi-label problems with high accuracy, highly flexible in use, and fully considers the weight of each classifier. However, the number of classifiers is not well set, and the imbalance of experimental data will lead to a decrease in prediction accuracy and a longer training time. | [55,56] |

K-Nearest Neighbor (KNN) | KNN scans the set of training samples to find the training sample that is most like the test sample, and then votes to determine the class of the test sample based on the class of the most similar training sample, or votes weighted by the degree of similarity between each sample and the test sample to obtain the result. | KNN is the simplest model in the learner. Based on the KNN regression algorithm, there is no need to consider boundary instances. However, using KNN is more computationally intensive, has low prediction accuracy when the samples are unbalanced, slow to predict, and not very interpretable. | [57,58,59] |

Ridge | An improvement on the method of least squares estimation. The core idea is to determine the value of the regular term coefficient parameter K. It is dedicated to solving the covariance data partitioning problem and is a regularized regression model. | Enough to effectively reduce the data overfitting phenomenon, the prediction of unknown data is more robust and the obtained regression coefficients are more in line with the mathematical reality, but its fitting ability is easily limited and the underfitting phenomenon may occur. | [60,61] |

Bagging | The input randomized uniformly selected dataset is trained in multiple rounds to construct weak learners with differences and parallel relationships, which are combined to obtain the final strong learner. | Bagging can be used directly to solve multi-classification and regression problems; by reducing the variance of the classifier, it improves the flourish error and can effectively prevent overfitting. However, underfitting can occur. | [62,63] |

Linear Regression | Based on regression analysis in mathematics, a straight line is used to describe the relationship more accurately between one or more independent variables and the dependent data. The input data are trained and processed in an algorithmic language to produce a simple prediction value. | The algorithm is simple, fast, and interpretable; however, it can only be used for regression problems, lacks some logic, has a low accuracy of predicted value, and is prone to overfitting. | [64] |

ML Methods | NSE | PBIAS (%) | RMSE | MAE |
---|---|---|---|---|

Random Forest | 0.82 | 4.89 | 1.32 | 0.70 |

Gradient Boosting | 0.84 | 9.44 | 1.24 | 0.65 |

SVR | 0.68 | 24.95 | 1.74 | 1.11 |

AdaBoost | 0.78 | −14.42 | 1.38 | 0.81 |

KNN | 0.56 | 13.94 | 2.03 | 1.10 |

Bagging | 0.78 | 11.07 | 1.42 | 0.75 |

Ridge | 0.53 | 34.13 | 2.11 | 1.42 |

Linear Regression | 0.51 | 39.99 | 2.14 | 1.48 |

ML Methods | NSE | PBIAS (%) | RMSE | MAE |
---|---|---|---|---|

Random Forest | 0.78 | −19.17 | 1.08 | 0.61 |

Gradient Boosting | 0.78 | −18.00 | 1.06 | 0.59 |

Bagging | 0.76 | −19.71 | 1.11 | 0.62 |

AdaBoost | 0.75 | −31.13 | 1.13 | 0.74 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, S.; Sun, M.; Wang, G.; Yao, X.; Wang, M.; Li, J.; Duan, H.; Xie, Z.; Fan, R.; Yang, Y.
Simulation and Reconstruction of Runoff in the High-Cold Mountains Area Based on Multiple Machine Learning Models. *Water* **2023**, *15*, 3222.
https://doi.org/10.3390/w15183222

**AMA Style**

Wang S, Sun M, Wang G, Yao X, Wang M, Li J, Duan H, Xie Z, Fan R, Yang Y.
Simulation and Reconstruction of Runoff in the High-Cold Mountains Area Based on Multiple Machine Learning Models. *Water*. 2023; 15(18):3222.
https://doi.org/10.3390/w15183222

**Chicago/Turabian Style**

Wang, Shuyang, Meiping Sun, Guoyu Wang, Xiaojun Yao, Meng Wang, Jiawei Li, Hongyu Duan, Zhenyu Xie, Ruiyi Fan, and Yang Yang.
2023. "Simulation and Reconstruction of Runoff in the High-Cold Mountains Area Based on Multiple Machine Learning Models" *Water* 15, no. 18: 3222.
https://doi.org/10.3390/w15183222