# Study on the Snowmelt Flood Model by Machine Learning Method in Xinjiang


## Abstract


## 1. Introduction

^{2}, accounting for 47.97% of China’s ice reserves [2]. The Tianshan Mountains span the entire territory of Xinjiang and are the source of many international rivers. The cross-border river network is complex and dense, and it is one of the areas with the most prominent cross-border river problems in the world, accounting for 20.0~40.0% of the total runoff in the Tianshan area [3]. Although snowmelt water is important for river runoff recharge, rapid snowmelt may also cause flood disasters. Snowmelt floods often carry a large amount of ice and may be accompanied by secondary disasters such as mudslides and landslides, causing great damage [4]. At the same time, because of seasonal frozen soil, the melting process of the snow cover is uncertain, resulting in frequent snowmelt floods in spring, which seriously threaten the safety of people’s lives and property [5].

## 2. Materials and Methods

#### 2.1. Study Area

^{2}, a longest confluence path of 26 km, a maximum altitude of 4147 m, and a maximum drop of 2440 m. Figure 1 shows the geographical location of the Lianggoushan catchment. The Lianggoushan catchment has a temperate continental climate, characterized by long sunshine hours, large temperature differences between day and night, obvious vertical differences, abundant precipitation, a short frost-free period, sharp temperature changes in spring, rapid cooling in autumn, and a great disparity between winter and summer. The annual average temperature is 5.4 °C, the minimum temperature is −36.5 °C, and the maximum temperature is 37.1 °C; the average minimum temperature in January is below −10 °C, which makes this a severe cold area. The annual average sunshine is 2795 h, and the annual average precipitation is 561.7 mm. The snow thickness is 34.7 cm, the average wind speed in the area is 2.9 m/s, and the wind direction is mainly northwest. The frost-free period is generally about 104 days, the depth of the seasonal frozen soil layer is generally about 0.9 m, and the maximum frozen soil depth is 1.0 m.

#### 2.2. Data Collection

#### 2.3. Modeling Approaches

The water level residual is defined as the difference between the water levels of two consecutive days:

$$dZ = Z_{t+1} - Z_{t} \quad (1)$$

where $Z_{t+1}$ and $Z_{t}$ are the daily average water levels of the watershed on day $t+1$ and day $t$, respectively. When the model was applied and evaluated, the residual $dZ$ and the water level $Z_{t}$ of the previous day were superimposed to obtain $Z_{t+1}$. In this study, $t$ was taken as 30; that is, the data of the first 30 days were used to predict the water level residual on the 31st day. The model structure is shown in Figure 3.
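The residual formulation can be illustrated with a minimal sketch. It assumes the daily inputs are already aligned in plain Python lists; the names `make_residual_samples`, `restore_level`, and `WINDOW` are hypothetical and do not come from the study.

```python
from typing import List, Tuple

WINDOW = 30  # number of preceding days used as model input (t = 30 in the text)


def make_residual_samples(
    features: List[List[float]], levels: List[float], window: int = WINDOW
) -> List[Tuple[List[List[float]], float, float]]:
    """Build (input window, last known level Z_t, target residual dZ) samples.

    features[k] holds the input elements for day k and levels[k] is the
    daily average water level on day k. Both lists must have equal length.
    """
    if len(features) != len(levels):
        raise ValueError("features and levels must have the same length")
    samples = []
    for end in range(window, len(levels)):
        x_window = features[end - window:end]  # the `window` days before day `end`
        z_prev = levels[end - 1]               # Z_t, the last observed level
        dz = levels[end] - z_prev              # residual dZ = Z_{t+1} - Z_t
        samples.append((x_window, z_prev, dz))
    return samples


def restore_level(z_prev: float, dz_pred: float) -> float:
    """Superimpose a predicted residual on the previous day's water level."""
    return z_prev + dz_pred
```

A regression model would be trained to map each input window to `dz`, and `restore_level` would convert its prediction back into a water level.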

#### 2.3.1. Element Screening

- (1) Pearson coefficient

Given $X = \{x_{1}, x_{2}, \ldots, x_{n}\}$ and $Y = \{y_{1}, y_{2}, \ldots, y_{n}\}$, $R(X,Y)$ was used to define their degree of correlation (see Formula (2)).
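As a minimal sketch, $R(X,Y)$ can be computed as follows, assuming the standard definition of the Pearson correlation coefficient (covariance divided by the product of the standard deviations). The function name `pearson` is illustrative only.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R(X, Y) of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Two elements whose coefficient is close to ±1 carry almost the same linear information, which is the basis for keeping only one of a highly correlated group.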

- (2) Principal component analysis

- (3) Factor analysis

Let $X = \{X_{1}, X_{2}, \ldots, X_{m}\}$ satisfy Formula (3):

Here, $f = \{f_{1}, f_{2}, \ldots, f_{m}\}$ is an m-dimensional vector ($m \le n$), and each component of $f$ is a common factor. $\overline{\epsilon}$ reflects some inherent characteristics of the dataset and is an unobservable hidden variable. $A = \{a_{i,j} \mid 1 \le i \le n, 1 \le j \le m\}$ and $U = \{u_{1}, u_{2}, \ldots, u_{n}\}$ are the loading matrix and specific factors, respectively. $a_{i,j}$ reflects the importance of the j-th common factor $f_{j}$, and $u_{i}$ reflects the unique features of each sample $X_{i}$. In factor analysis, weighted least squares and regression methods can be used to calculate the factor scores of each common factor, in order to evaluate the importance of each factor.

#### 2.3.2. Machine Learning Methods

- (1) Support Vector Regression (SVR)

Given $X = \{X_{1}, X_{2}, \ldots, X_{n}\}$, define the output function as follows:

Here, $X_{i}$ is the i-th independent variable; $f(X_{i})$ is the model output; $X_{j*}$ is the support vector selected by the model (selected from all independent variables during the model training phase); $nSV$ is the number of support vectors (not greater than the number of independent variable groups); $\omega$ and $b$ are coefficients; and $\psi(X_{i}, X_{j*})$ is a kernel function.
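The symbols defined above match the usual kernel expansion of a trained SVR, $f(X_i) = \sum_{j=1}^{nSV} \omega_j \psi(X_i, X_{j*}) + b$. Assuming that form, a trained model could be evaluated as in the sketch below. The RBF kernel, the `gamma` parameter, and the function names are illustrative assumptions rather than details taken from the study.

```python
import math
from typing import Callable, Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Radial basis function kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_output(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    kernel: Callable[[Sequence[float], Sequence[float]], float] = rbf_kernel,
) -> float:
    """Evaluate sum_j weights[j] * kernel(x, support_vectors[j]) + bias."""
    if len(support_vectors) != len(weights):
        raise ValueError("one weight is required per support vector")
    return sum(w * kernel(x, sv) for w, sv in zip(weights, support_vectors)) + bias
```

Swapping `kernel` for a linear, polynomial, or sigmoid function changes only $\psi$; the rest of the expansion stays the same.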

- (2) Random Forest (RF)

- (3) K-Nearest Neighbor (KNN)

The distance between $X = \{x_{1}, x_{2}, \ldots, x_{n}\}$ and $Y = \{y_{1}, y_{2}, \ldots, y_{n}\}$ in KNN is defined by the following formula:
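A minimal sketch of KNN regression follows. It assumes the Euclidean distance, which is a common default but is not confirmed by the surviving text, and it averages the targets of the `k` closest training samples. The function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(
    query: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    k: int = 2,
) -> float:
    """Average the targets of the k training samples closest to `query`."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by distance to the query and keep the k nearest.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(query, train_x[i]))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k
```

A larger `k` smooths the prediction by averaging over more neighbors, at the cost of blurring local detail.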

- (4) Artificial Neural Network (ANN)

- (5) Recurrent Neural Network (RNN)

For a time series $X = \{X_{1}, X_{2}, \ldots, X_{T}\}$, the recurrent unit of an RNN at time $t$ is expressed by the following formula:
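As an illustration, the sketch below implements a basic (Elman-style) recurrent unit, $h_t = \tanh(W_x X_t + W_h h_{t-1} + b)$. This standard form, the tanh activation, and the names `rnn_step` and `rnn_forward` are assumptions; the study's exact formulation is not recoverable from the text.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x_t: Vector, h_prev: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    wx = matvec(w_x, x_t)
    wh = matvec(w_h, h_prev)
    return [math.tanh(p + q + c) for p, q, c in zip(wx, wh, b)]


def rnn_forward(series: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Run the unit over X_1..X_T from a zero state and return the last hidden state."""
    h = [0.0] * len(b)
    for x_t in series:
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h
```

The hidden state carries information from earlier days forward, which is what lets the unit use a 30-day input window as a sequence rather than as a flat vector.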

- (6) Long Short-Term Memory Neural Network (LSTM)

For a time series $X = \{X_{1}, X_{2}, \ldots, X_{T}\}$, each LSTM unit has a dedicated memory cell; at time $t$, the cell state of the LSTM unit is $c$ and the output hidden state is $h$. The forget gate $f$, input gate $i$, and output gate $o$ are used to control the model’s access to the memory cell. The calculation process of an LSTM unit is as follows:
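For reference, the commonly used LSTM formulation with a forget gate is given below. It is offered as an assumption about the intended equations, not as the authors' exact notation: the weight matrices $W$ and $U$, the biases $b$, the candidate state $\tilde{c}_t$, the logistic function $\sigma$, and the element-wise product $\odot$ are symbols introduced here.

$$
\begin{aligned}
f_t &= \sigma(W_f X_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i X_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o X_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c X_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

In this form the forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate decides how much of the cell state is exposed as $h_t$.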

#### 2.3.3. Evaluation Criteria

Here, $Q_{s}(i)$ is the simulated flow at the i-th moment, and $Q_{o}(i)$ is the observed flow at the i-th moment.

The coefficient of determination ($R^{2}$) measures the explanatory proportion of independent variables and reflects the goodness of fit of the regression equation. The value range of $R^{2}$ is [0, 1]. When $R^{2}$ is close to 0, the correlation is low. When $R^{2}$ is closer to 1, the correlation is higher. Its calculation formula is as follows:

Here, $\overline{Q}_{s}$ is the average simulated flow, and $\overline{Q}_{o}$ is the average measured flow.
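Both criteria can be sketched in a few lines. RMSE follows its standard definition. For $R^{2}$, two common forms are shown because the text does not settle which one was used: the squared correlation between simulated and observed series, which uses both means and is bounded by [0, 1], and the form $1 - SS_{res}/SS_{tot}$, which can become negative, as the strongly negative values reported for the sigmoid-kernel SVR would require. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def r2_correlation(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Squared correlation of simulated and observed series (range [0, 1])."""
    mean_s = sum(sim) / len(sim)
    mean_o = sum(obs) / len(obs)
    cov = sum((o - mean_o) * (s - mean_s) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_s == 0 or var_o == 0:
        raise ValueError("undefined for a constant series")
    return cov ** 2 / (var_s * var_o)


def r2_determination(sim: Sequence[float], obs: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; negative when the model is worse than the mean."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for s, o in zip(sim, obs))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot
```

The two $R^{2}$ forms agree for a well-fitted model but diverge for a poor one, so it is worth stating which form is reported alongside any table of results.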

## 3. Results and Discussion

#### 3.1. Element Screening

#### 3.2. Machine Learning Results

#### 3.3. Selection of Hyperparameters

#### 3.4. Result Analysis

$R^{2}$ of 0.999 and 0.970, respectively. The next best results were from RF, whose RMSEs in the training period and the test period were 0.012 and 0.072, respectively; $R^{2}$ values were 0.999 and 0.969, respectively. From an application point of view, RF may be a better choice, because as long as the number of estimators is set large enough, a well-performing model can be obtained through training. The LSTM model requires more work on model structure design and parameter tuning.

## 4. Conclusions

- (1) We used the Pearson coefficient, principal component analysis, and factor analysis to screen 14 kinds of meteorological observation data from the JINGHE and BAYANBULAK stations, and finally selected 5 elements for modeling: average sea level pressure, average wind speed, and snow cover depth at JINGHE, and average station pressure and snow cover depth at BAYANBULAK. In terms of the Pearson coefficient, the average temperature, average dew point, and average sea level pressure had a very high linear correlation. When constructing the model, we treated them as approximately equivalent and retained only one.
- (2) SVR, RF, KNN, ANN, RNN, and LSTM were selected to construct 24 sets of models with different hyperparameters. Among all the models, LSTM had the best results: the RMSEs in the training period and the testing period were 0.011 and 0.071, respectively, and the $R^{2}$ values were 0.999 and 0.970, respectively. Next best were the results of RF, whose RMSEs in the training period and the test period were 0.012 and 0.072, respectively; $R^{2}$ values were 0.999 and 0.969, respectively. LSTM performed best, but it had more hyperparameters to optimize and requires more work on model structure design. From an application point of view, RF may be a better choice, because as long as the number of estimators is set large enough, a model with good performance can be obtained.
- (3) According to the contribution rates of the RF model, the contribution of meteorological elements to the predictions was higher, and the contribution of rainfall in the basin was lower. In the LSTM predictions, the average error of each month was relatively stable, mostly within ±0.01 m, but the errors fluctuated greatly in March and April. The choice of fitting target is very important when modeling. The results obtained by directly fitting the water level were not ideal. Fitting the water level residuals instead (i.e., the difference between future water levels and known water levels), and calculating future water levels from the predicted residuals, significantly improves the accuracy of the simulation.
- (4) The purpose of this study was to explore a hydrological forecast method that can be used in practical work under limited data conditions. Hydrological sensors have been widely deployed in Xinjiang, and over time more hydrological data will become available for modeling. For areas with rich hydrological data, there are more and better choices when modeling: physical models, distributed models, or combinations of different types of models can yield richer conclusions and results. Therefore, the method proposed in this study is a temporary solution for when hydrological data are limited, and research on snowmelt models and on forecasting and early warning technologies in Xinjiang should continue.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Zhao, Y.; Deng, X.L.; Li, Q.; Yang, Q.; Huo, W. Characteristics of the Extreme Precipitation Events in the Tianshan Mountains in Relation to Climate Change. J. Glaciol. Geocryol. **2010**, 32, 927–934.
2. Xu, L.P.; Li, P.H.; Li, Z.Q.; Zhang, Z.Y.; Wang, P.Y.; Xu, C.H. Advances in research on changes and effects of glaciers in Xinjiang mountains. Adv. Water Sci. **2020**, 31, 946–959.
3. Chen, Y.N.; Li, Z.; Fang, G.H. Changes of key hydrological elements and research progress of water cycle in the Tianshan Mountains, Central Asia. Arid Land Geogr. **2022**, 45, 1–8.
4. Cui, M.Y.; Zhou, G.; Zhang, D.H.; Zhang, S.Q. Global snowmelt flood disasters and their impact from 1900 to 2020. J. Glaciol. Geocryol. **2022**, 44, 1898–1911.
5. Wei, T.F.; Liu, Z.H.; Wang, Y. Effect on Snowmelt Water Outflow of Snow-covered Seasonal Frozen Soil. Arid Zone Res. **2015**, 32, 435–441.
6. Wu, S.F.; Liu, Z.H.; Qiu, J.H. Analysis of the Characteristics of Snowmelt Flood and Previous Climate Snow Condition in North Xinjiang. J. China Hydrol. **2006**, 26, 84–87.
7. Huai, B.J.; Li, Z.Q.; Sun, M.P.; Xiao, Y. Snowmelt runoff model applied in the headwaters region of Urumqi River. Arid Land Geogr. **2013**, 36, 41–48.
8. Muattar, S.; Ding, J.L.; Abudu, S.; Cui, C.L.; Anwar, K. Simulation of Snowmelt Runoff in the Catchments on Northern Slope of the Tianshan Mountains. Arid Zone Res. **2016**, 33, 636–642.
9. Yu, Q.Y.; Hu, C.H.; Bai, Y.G.; Lu, Z.L.; Cao, B.; Liu, F.Y.; Liu, C.S. Application of snowmelt runoff model in flood forecasting and warning in Xinjiang. Arid Land Geogr. **2023**, 1–15.
10. Dang, S.Z.; Liu, C.M. Modification of SNTHERM Albedo Algorithm and Response from Black Carbon in Snow. Adv. Mat. Res. **2011**, 281, 147–150.
11. Bartelt, P.; Lehning, M. A physical SNOWPACK model for the Swiss avalanche warning. Cold Reg. Sci. Technol. **2002**, 35, 123–145.
12. Wang, W.C.; Zhao, Y.W.; Tu, Y.; Dong, R.; Ma, Q.; Liu, C.J. Research on Parameter Regionalization of Distributed Hydrological Model Based on Machine Learning. Water **2023**, 15, 518.
13. Vafakhah, M.; Sedighi, F.; Javadi, M.R. Modeling the Rainfall-Runoff Data in Snow-Affected Watershed. Int. J. Comput. Electr. Eng. **2014**, 6, 40.
14. Thapa, S.; Zhao, Z.; Li, B.; Lu, L.; Fu, D.; Shi, X.; Tang, B.; Qi, H. Snowmelt-Driven Streamflow Prediction Using Machine Learning Techniques (LSTM, NARX, GPR, and SVR). Water **2020**, 12, 1734.
15. Himan, S.; Ataollah, S.; Somayeh, R.; Shahrokh, A.; Binh, T.P.; Fatemeh, M.; Marten, G.; John, J.C.; Dieu, T.B. Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm. Geosci. Front. **2021**, 12, 101100.
16. Wang, G.; Hao, X.; Yao, X.; Wang, J.; Li, H.; Chen, R.; Liu, Z. Simulations of Snowmelt Runoff in a High-Altitude Mountainous Area Based on Big Data and Machine Learning Models: Taking the Xiying River Basin as an Example. Remote Sens. **2023**, 15, 1118.
17. Yang, R.; Zheng, G.; Hu, P.; Liu, Y.; Xu, W.; Bao, A. Snowmelt Flood Susceptibility Assessment in Kunlun Mountains Based on the Swin Transformer Deep Learning Method. Remote Sens. **2022**, 14, 6360.
18. Zhou, G.; Cui, M.Y.; Li, Z.; Zhang, S.Q. Dynamic evaluation of the risk of the spring snowmelt flood in Xinjiang. Arid Zone Res. **2021**, 38, 950–960.
19. Waldmann, P. On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction. Front. Genet. **2019**, 10, 899.
20. Jackson, J.E. A User’s Guide to Principal Components; Wiley: Hoboken, NJ, USA, 1992.
21. Horn, J.L. A rationale and test for the number of factors in factor analysis. Psychometrika **1965**, 30, 179–185.
22. Okkan, U.; Serbes, Z.A. Rainfall–runoff modeling using least squares support vector machines. Environmetrics **2012**, 23, 549–564.
23. Panahi, M.; Sadhasivam, N.; Pourghasemi, H.R.; Rezaie, F.; Lee, S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J. Hydrol. **2020**, 588, 125033.
24. Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
25. Li, X.N.; Zhang, Y.J.; She, Y.J.; Chen, L.W.; Chen, J.X. Estimation of impervious surface percentage of river network regions using an ensemble leaning of CART analysis. Remote Sens. Land Resour. **2013**, 25, 174–179.
26. Juna, A.; Umer, M.; Sadiq, S.; Karamti, H.; Eshmawi, W.; Mohamed, A.; Ashraf, I. Water Quality Prediction Using KNN Imputer and Multilayer Perceptron. Water **2022**, 14, 2592.
27. Lippmann, R.P. An introduction to computing with neural nets. IEEE ASSP Mag. **1988**, 4, 4–22.
28. Robert, H.N. Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Washington, DC, USA, 18–22 June 1988.
29. Wang, S.S.; Xu, P.B.; Hu, S.Y.; Wang, K. Research on a Deep Learning Based Model for Predicting Mountain Flood Water Level in Small Watersheds. Comput. Knowl. Technol. **2022**, 18, 89–91.
30. Gao, W.L.; Gao, J.X.; Yang, L.; Wang, M.J.; Yao, W.H. A Novel Modeling Strategy of Weighted Mean Temperature in China Using RNN and LSTM. Remote Sens. **2021**, 13, 3004.
31. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
32. Liu, X.; Zhao, N.; Guo, J.Y.; Guo, B. Prediction of monthly precipitation over the Tibetan Plateau based on LSTM neural network. J. Geo-Inf. Sci. **2020**, 22, 1617–1629.

**Figure 8.** Statistics of water level error. (**a**) Water level error in training. (**b**) Water level error in testing.

**Figure 9.** Element importance. (**a**) The importance of elements in the first 30 days. (**b**) Accumulated importance of each element.

Station | Data Source | Item | Description | Mean Value | Precision and Unit |
---|---|---|---|---|---|
Lianggoushan | Hydrological department | Z | Water level | 2216.56 | 0.01 m |
Lianggoushan | Hydrological department | DYP | Precipitation | 1.8 | 0.1 mm |
JINGHE | GSOD | TEMP(J) | Average temperature | 49.4 | 0.1 °F |
JINGHE | GSOD | DEWP(J) | Average dew point | 30.0 | 0.1 °F |
JINGHE | GSOD | SLP(J) | Average sea level pressure | 1021.2 | 0.1 mb |
JINGHE | GSOD | STP(J) | Average station pressure | 981.2 | 0.1 mb |
JINGHE | GSOD | WDSP(J) | Average wind speed | 4.1 | 0.1 knots |
JINGHE | GSOD | PRCP(J) | Daily precipitation | 0.14 | 0.01 inches |
JINGHE | GSOD | SNDP(J) | Snow cover depth | 0.2 | 0.1 inches |
BAYANBULAK | GSOD | TEMP(B) | Average temperature | 26.1 | 0.1 °F |
BAYANBULAK | GSOD | DEWP(B) | Average dew point | 15.0 | 0.1 °F |
BAYANBULAK | GSOD | SLP(B) | Average sea level pressure | 1029.2 | 0.1 mb |
BAYANBULAK | GSOD | STP(B) | Average station pressure | 758.2 | 0.1 mb |
BAYANBULAK | GSOD | WDSP(B) | Average wind speed | 5.6 | 0.1 knots |
BAYANBULAK | GSOD | PRCP(B) | Daily precipitation | 0.05 | 0.01 inches |
BAYANBULAK | GSOD | SNDP(B) | Snow cover depth | 1.0 | 0.1 inches |

Item | Com 1 | Com 2 | Com 3 | Com 4 | Com 5 |
---|---|---|---|---|---|
TEMP(J) | 0.91 | 0.26 | 0.11 | −0.21 | −0.16 |
TEMP(B) | 0.85 | 0.28 | 0.04 | −0.41 | −0.15 |
DEWP(J) | 0.90 | 0.21 | 0.05 | −0.14 | −0.15 |
DEWP(B) | 0.85 | 0.30 | 0.01 | −0.36 | −0.13 |
SLP(J) | −0.95 | −0.16 | 0.16 | 0.08 | 0.12 |
SLP(B) | −0.82 | −0.32 | 0.19 | 0.40 | 0.14 |
STP(J) | −0.95 | −0.13 | 0.22 | 0.05 | 0.11 |
STP(B) | −0.11 | −0.16 | 0.97 | −0.02 | −0.03 |
WDSP(J) | 0.35 | 0.70 | −0.08 | 0.00 | −0.05 |
WDSP(B) | 0.18 | 0.62 | −0.14 | −0.25 | −0.09 |
PRCP(J) | −0.24 | −0.14 | −0.14 | 0.23 | −0.14 |
PRCP(B) | 0.05 | 0.03 | −0.06 | 0.01 | −0.02 |
SNDP(J) | −0.26 | −0.10 | −0.03 | 0.09 | 0.78 |
SNDP(B) | −0.45 | −0.14 | 0.00 | 0.58 | 0.21 |

Algorithm | Setting Items | Hyperparameter | Training RMSE | Training $R^{2}$ | Testing RMSE | Testing $R^{2}$ |
---|---|---|---|---|---|---|
SVR | Kernel function | kernel = linear | 0.041 | 0.985 | 0.082 | 0.960 |
SVR | Kernel function | kernel = rbf | 0.033 | 0.990 | 0.075 | 0.967 |
SVR | Kernel function | kernel = poly | 0.036 | 0.988 | 0.078 | 0.964 |
SVR | Kernel function | kernel = sigmoid | 5884 | $-3.2 \times 10^{8}$ | 3251 | $-6.2 \times 10^{8}$ |
RF | Estimator number | Estimators = 10 | 0.014 | 0.998 | 0.073 | 0.969 |
RF | Estimator number | Estimators = 50 | 0.013 | 0.998 | 0.072 | 0.969 |
RF | Estimator number | Estimators = 100 | 0.012 | 0.999 | 0.072 | 0.970 |
RF | Estimator number | Estimators = 500 | 0.012 | 0.999 | 0.072 | 0.969 |
KNN | Neighbor number | Neighbors = 2 | 0.016 | 0.997 | 0.071 | 0.970 |
KNN | Neighbor number | Neighbors = 10 | 0.029 | 0.992 | 0.071 | 0.970 |
KNN | Neighbor number | Neighbors = 30 | 0.033 | 0.990 | 0.070 | 0.971 |
KNN | Neighbor number | Neighbors = 100 | 0.035 | 0.989 | 0.070 | 0.971 |
ANN | Number of neurons and layers | 16 × 16 | 0.038 | 0.986 | 0.074 | 0.968 |
ANN | Number of neurons and layers | 32 × 32 | 0.040 | 0.985 | 0.078 | 0.964 |
ANN | Number of neurons and layers | 64 × 64 | 0.031 | 0.991 | 0.075 | 0.967 |
ANN | Number of neurons and layers | 256 × 256 | 0.022 | 0.995 | 0.075 | 0.967 |
RNN | Number of neurons and layers | 1024 | 0.011 | 0.999 | 0.083 | 0.959 |
RNN | Number of neurons and layers | 64 × 32 | 0.010 | 0.999 | 0.076 | 0.966 |
RNN | Number of neurons and layers | 128 × 64 × 32 | 0.012 | 0.999 | 0.076 | 0.966 |
RNN | Number of neurons and layers | 256 × 128 × 64 × 32 | 0.011 | 0.999 | 0.075 | 0.967 |
LSTM | Number of neurons and layers | 1024 | 0.013 | 0.998 | 0.076 | 0.966 |
LSTM | Number of neurons and layers | 64 × 32 | 0.012 | 0.999 | 0.072 | 0.969 |
LSTM | Number of neurons and layers | 128 × 64 × 32 | 0.012 | 0.999 | 0.073 | 0.968 |
LSTM | Number of neurons and layers | 256 × 128 × 64 × 32 | 0.010 | 0.999 | 0.071 | 0.970 |

Algorithm | Training RMSE | Training $R^{2}$ | Testing RMSE | Testing $R^{2}$ |
---|---|---|---|---|
SVR | 0.033 | 0.990 | 0.075 | 0.967 |
RF | 0.012 | 0.999 | 0.072 | 0.969 |
KNN | 0.016 | 0.997 | 0.071 | 0.970 |
ANN | 0.022 | 0.995 | 0.075 | 0.967 |
RNN | 0.011 | 0.999 | 0.075 | 0.967 |
LSTM | 0.010 | 0.999 | 0.071 | 0.970 |

Month | Mean Water Level (m) | LSTM Error (Mean) | LSTM Error (Min) | LSTM Error (Max) |
---|---|---|---|---|
Jan. | 2216.29 | −0.009 | −0.014 | 0.000 |
Feb. | 2216.24 | 0.001 | −0.068 | 0.144 |
Mar. | 2216.22 | −0.005 | −0.465 | 0.102 |
Apr. | 2216.47 | −0.003 | −0.361 | 0.240 |
May | 2216.45 | 0.000 | −0.132 | 0.171 |
Jun. | 2216.97 | 0.004 | −0.092 | 0.101 |
Jul. | 2216.92 | 0.002 | −0.061 | 0.096 |
Aug. | 2216.74 | 0.008 | −0.037 | 0.048 |
Sep. | 2216.59 | −0.009 | −0.014 | −0.003 |
Oct. | 2216.33 | −0.010 | −0.015 | −0.004 |
Nov. | 2216.33 | −0.011 | −0.018 | −0.007 |
Dec. | 2216.53 | −0.010 | −0.017 | −0.007 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhou, M.; Lu, W.; Ma, Q.; Wang, H.; He, B.; Liang, D.; Dong, R.
Study on the Snowmelt Flood Model by Machine Learning Method in Xinjiang. *Water* **2023**, *15*, 3620.
https://doi.org/10.3390/w15203620
