Daily Runoff Forecasting in the Middle Yangtze River Using a Long Short-Term Memory Network Optimized by the Sparrow Search Algorithm
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Data
2.2. Coupled Model Construction and Scenario Design
2.2.1. LSTM
2.2.2. SSA
2.2.3. SSA-LSTM
- Feature selection and sequence decomposition preprocessing. To improve prediction performance and efficiency, the original dataset was first preprocessed. The Pearson correlation coefficient was computed between each candidate feature and the prediction target (runoff). Features with stronger correlations, such as discharges at key hydrological stations (e.g., Chenglingji (0.96) and Yichang (0.86)), were retained to form the input feature set. This step removes redundant information and thereby improves training efficiency and generalization capability. Subsequently, variational mode decomposition (VMD) was applied to the target runoff series [40]. VMD is an adaptive technique for decomposing non-stationary signals and is widely used in hydrology to separate multi-scale temporal characteristics [41]. The number of intrinsic mode functions (IMFs), denoted as K, is a critical VMD parameter. Both K and the maximum number of VMD iterations were determined by the SSA automated parameter optimization process. The optimized K converged to 5, so the runoff series was decomposed into five IMFs. A parallel modeling framework was then implemented: each of the five IMFs served as an independent prediction target, and a dedicated SSA-LSTM model was trained for each IMF. The final discharge forecast was obtained by summing the predictions of the five IMF-specific models. Modeling each IMF separately lets each model focus on the variability at a single time scale, which is intended to improve the accuracy of the final prediction.
- Hyperparameter encoding and search space definition. The key structural hyperparameters of the LSTM were encoded into the position vector of each sparrow individual in the SSA. The search space was defined as follows: number of hidden layers, [1, 3]; number of neurons in each hidden layer, [2, 50]; and dropout rate, [0.001, 0.5]. The initial learning rate was fixed at 0.001. Each hyperparameter maps directly to one dimension of the sparrow's position vector, and the algorithm searches within the predefined bounds for that dimension.
- Iterative optimization and fitness evaluation. The SSA population iteratively updated its positions according to the SSA update rules. In each generation, an LSTM model was constructed and trained for every sparrow individual using the hyperparameter combination encoded in its position vector. Each position vector encoded four key structural hyperparameters: the number of hidden layers, the numbers of neurons in the first and second layers, and the dropout rate. The fitness of each individual was the Mean Absolute Error (MAE) calculated on the validation set, with lower values indicating better individuals.
- Final model training. When the SSA reached the maximum number of iterations or met the convergence condition, the best hyperparameter combination found by the search was substituted into the LSTM architecture. After 20 generations of optimization with a population size of 10, the algorithm converged to a configuration comprising a single hidden layer with 32 neurons and a dropout rate of 0.001. All other parameters were held constant. With this architecture, the model was trained on the training and validation sets to obtain the final, best-performing SSA-LSTM prediction model.
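The encoding and fitness evaluation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the bounds come from the text, but the names `BOUNDS`, `decode`, `fitness`, and the callable `train_and_predict` are hypothetical, and reusing the second-layer width for a third layer is an assumption, since the text lists neuron counts for only two layers.

```python
# Search bounds from the text: hidden layers [1, 3], neurons [2, 50], dropout [0.001, 0.5].
# Assumed dimension order: (n_layers, neurons_1, neurons_2, dropout).
BOUNDS = [(1, 3), (2, 50), (2, 50), (0.001, 0.5)]


def clip(value, low, high):
    """Force a coordinate back inside its search interval."""
    return max(low, min(high, value))


def decode(position):
    """Map a real-valued sparrow position to a discrete LSTM configuration."""
    layers, n1, n2, dropout = (
        clip(v, lo, hi) for v, (lo, hi) in zip(position, BOUNDS)
    )
    n_layers = int(round(layers))
    widths = [int(round(n1)), int(round(n2))]
    # Assumption: any layer beyond the second reuses the second-layer width.
    while len(widths) < n_layers:
        widths.append(widths[-1])
    return {"n_layers": n_layers, "neurons": widths[:n_layers], "dropout": dropout}


def mae(y_true, y_pred):
    """Mean absolute error, used here as the SSA fitness value."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fitness(position, train_and_predict, y_val):
    """Validation MAE for one sparrow.

    `train_and_predict` is a hypothetical callable that builds and trains an
    LSTM from a configuration dict and returns its validation-set forecasts.
    """
    return mae(y_val, train_and_predict(decode(position)))
```

For example, `decode([1.2, 31.7, 10.0, 0.0001])` yields one hidden layer with 32 neurons and a dropout rate of 0.001, since the out-of-range dropout coordinate is clipped to the lower bound.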
2.2.4. Model Evaluation Metrics
3. Results and Discussion
3.1. Model Calibration and Validation
3.2. Comparison of Model Prediction Performance
3.3. Predictive Performance Across Different Flow Regimes
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Algorithm A1: SSA-VMD-LSTM Prediction Model |
|---|
| Input: Historical hydrological and meteorological data |
| Output: Predicted future runoff values |
| # Step 1: Data Preprocessing and Feature Selection |
| 1.1 Select features with the highest Pearson correlation coefficients |
| 1.2 Split data into training, validation, and testing sets |
| # Step 2: VMD Parameter Optimization and Decomposition |
| 2.1 Optimize VMD parameters using SSA |
| 2.2 Decompose runoff series into K IMF components |
| # Step 3: For each IMF component, optimize LSTM hyperparameters |
| For each IMF component k: |
| 3.1 Prepare input: [selected features, IMF_k] |
| 3.2 Use SSA to optimize LSTM hyperparameters (layers, neurons, dropout); fitness function: MAE on validation set |
| 3.3 Obtain optimal LSTM architecture for IMF_k |
| # Step 4: Model Training and Prediction |
| For each IMF component k: |
| 4.1 Train LSTM with optimized architecture for IMF_k |
| 4.2 Predict IMF_k values on test set |
| # Step 5: Result Reconstruction and Evaluation |
| 5.1 Final prediction = IMF_1_prediction + … + IMF_K_prediction |
| 5.2 Evaluate using RMSE, MAE, NSE metrics on test set |
| Return predicted runoff values |
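Steps 1.1 and 5.1 of Algorithm A1 are simple enough to illustrate directly. The sketch below is a hedged example, not the authors' code: the function names and the `threshold` cut-off are assumptions (the article does not state a numerical selection threshold), and VMD, SSA, and LSTM training are deliberately left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)


def select_features(features, target, threshold=0.8):
    """Step 1.1: keep features whose |r| with the target meets the cut-off.

    `threshold` is an assumed value chosen for illustration only.
    """
    selected = {}
    for name, series in features.items():
        r = pearson(series, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


def reconstruct(imf_predictions):
    """Step 5.1: sum the per-IMF forecasts element-wise."""
    return [sum(values) for values in zip(*imf_predictions)]
```

With K per-IMF forecast lists of equal length, `reconstruct` returns a single runoff forecast of that length; for instance, `reconstruct([[1, 2], [10, 20], [100, 200]])` gives `[111, 222]`.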
References
- Li, X.; Ye, X.; Li, Z.; Zhang, D. Hydrological drought in two largest river-connecting lakes in the middle reaches of the Yangtze River, China. Hydrol. Res. 2023, 54, 82–98.
- An, C.; Fang, H.; Zhang, L.; Su, X.; Fu, X.; Huang, H.Q.; Parker, G.; Hassan, M.A.; Meghani, N.A.; Anders, A.M.; et al. Poyang and Dongting Lakes, Yangtze River: Tributary lakes blocked by main-stem aggradation. Proc. Natl. Acad. Sci. USA 2022, 119, e2101384119.
- Wang, Y.; Guo, S.; Xiang, X.; Li, C.; Li, N. Impact of Three Gorges Reservoir Operation on Water Level at Jiujiang Station and Poyang Lake in the Yangtze River. Hydrology 2025, 12, 52.
- Zhan, P.; Jiang, L.; Liu, K.; Chen, T.; Fan, C.; Zeng, F.; Song, C. Unveiling the Floodplain River-Lake Hydrological Interactions by SWOT Observations. Geophys. Res. Lett. 2025, 52, e2025GL118459.
- Han, Y.; Zhong, P.; Zhu, F.; Zhang, Y.; Song, Z.; Wang, Y.; Wang, B.; Xu, C.; Ben, M.; Li, M. Spatiotemporal attribution of runoff changes in the upper Yangtze River Basin using the SWAT+ model. J. Hydrol. Reg. Stud. 2025, 61, 102753.
- Shen, C. A trans-disciplinary review of deep learning research for water resources scientists. Water Resour. Res. 2018, 54, 8558–8593.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009; pp. 28–32.
- Granata, F.; Di Nunno, F.; de Marinis, G. Stacked machine learning algorithms and bidirectional long short-term memory networks for multi-step ahead streamflow forecasting: A comparative study. J. Hydrol. 2022, 613, 128431.
- Di Nunno, F.; Zhu, S.; Ptak, M.; Sojka, M.; Granata, F. A stacked machine learning model for multi-step ahead prediction of lake surface water temperature. Sci. Total Environ. 2023, 890, 164323.
- Ye, X.; Liu, F.; Zhang, Z.; Xu, C. Quantifying the Impact of Compounding Influencing Factors to the Water Level Decline of China’s Largest Freshwater Lake. J. Water Resour. Plan. Manag. 2020, 146, 5020006.
- Granata, F.; Di Nunno, F. Neuroforecasting of daily streamflows in the UK for short- and medium-term horizons: A novel insight. J. Hydrol. 2023, 624, 129888.
- Deulkar, A.M.; Dixit, P.R.; Londhe, S.N.; Jain, R.K. Comparative Assessment of Artificial Neural Networks (ANNs), Long Short Term Memory Network (LSTM) and Hydrologic Engineering Centre-Hydrologic Modelling System (HEC-HMS) for Runoff Modelling. Water Resour. Manag. 2025, 39, 2049–2068.
- Asif, M.; Kuglitsch, M.M.; Pelivan, I.; Albano, R. Review and Intercomparison of Machine Learning Applications for Short-term Flood Forecasting. Water Resour. Manag. 2025, 39, 1971–1991.
- Huang, S.; Xia, J.; Wang, Y.; Wang, W.; Zeng, S.; She, D.; Wang, G. Coupling Machine Learning into Hydrodynamic Models to Improve River Modeling with Complex Boundary Conditions. Water Resour. Res. 2022, 58, e2022WR032183.
- Yao, S.; He, Z. River and Lake Evolution of the Middle and Lower Yangtze River Basin and Its Impacts. J. Chang. River Sci. Res. Inst. 2025, 42, 1.
- Xiang, Z.; Yan, J.; Demir, I. A Rainfall-Runoff Model With LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res. 2020, 56, e2019WR025326.
- Zhang, X.; Zheng, Z.; Wang, K. Prediction of runoff in the upper Yangtze River based on CEEMDAN-NAR model. Water Supply 2021, 21, 3307–3318.
- Zhu, S.; Hrnjica, B.; Ptak, M.; Choiński, A.; Sivakumar, B. Forecasting of water level in multiple temperate lakes using machine learning models. J. Hydrol. 2020, 585, 124819.
- Fan, H.; Jiang, M.; Xu, L.; Zhu, H.; Cheng, J.; Jiang, J. Comparison of Long Short Term Memory Networks and the Hydrological Model in Runoff Simulation. Water 2020, 12, 175.
- Zhang, Q.; Zhan, C.; Wang, Y.; Zhang, H.; Lin, Z. A coupled model applied to complex river-lake systems. Hydrol. Sci. J. 2024, 69, 95–105.
- Yan, X.; Mohammadian, A.; Khelifa, A. Modeling spatial distribution of flow depth in fluvial systems using a hybrid two-dimensional hydraulic-multigene genetic programming approach. J. Hydrol. 2021, 600, 126517.
- Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrol. Earth Syst. Sci. 2019, 23, 5089–5110.
- Song, X.; Chen, Z.; Liu, H.; Shen, Y. Comparative Evaluation of Runoff Forecasting Methods in Xiangjiang River Basin at Multiple Temporal Scales. J. Chang. River Sci. Res. Inst. 2025, 42, 33–40.
- Chu, H.; Jiang, Y.; Wang, Z. A Grid-Based Long Short-Term Memory Framework for Runoff Projection and Uncertainty in the Yellow River Source Area Under CMIP6 Climate Change. Water 2025, 17, 750.
- Liu, J.; Xu, T.; Lu, C. VMDI-LSTM-ED: A novel enhanced decomposition ensemble model incorporating data integration for accurate non-stationary daily streamflow forecasting. J. Hydrol. 2025, 653, 132769.
- Zhang, J.; Zhu, Y.; Zhang, X.; Ye, M.; Yang, J. Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. 2018, 561, 918–929.
- Pu, Z.; Yan, J.; Chen, L.; Li, Z.; Tian, W.; Tao, T.; Xin, K. A hybrid Wavelet-CNN-LSTM deep learning model for short-term urban water demand forecasting. Front. Environ. Sci. Eng. 2022, 17, 22.
- Paul, V.; Ramesh, R.; Sreeja, P.; Jarin, T.; Sujith Kumar, P.S.; Ansar, S.; Ashraf, G.A.; Pandey, S.; Said, Z. Hybridization of long short-term memory with Sparrow Search Optimization model for water quality index prediction. Chemosphere 2022, 307, 135762.
- Hutter, F.; Kotthoff, L.; Vanschoren, J. Automated Machine Learning: Methods, Systems, Challenges; Springer Nature: Cham, Switzerland, 2019; p. 8.
- Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
- Wei, X.; Chen, M.; Zhou, Y.; Ran, L.; Shi, R.; Zou, J. Mid-Long Term Hydrological Forecasting Based on Multiple Hybrid Models. J. Chang. River Sci. Res. Inst. 2025, 42, 24–31.
- Mei, X.; Dai, Z.; Du, J.; Chen, J. Linkage between Three Gorges Dam impacts and the dramatic recessions in China’s largest freshwater lake, Poyang Lake. Sci. Rep. 2015, 5, 18197.
- Huang, S.; Xia, J.; Zeng, S.; Wang, Y.; She, D. Effect of Three Gorges Dam on Poyang Lake water level at daily scale based on machine learning. J. Geogr. Sci. 2021, 31, 1598–1614.
- Meng, Y.; Jiang, L.; Du, E.; Zhang, X.; Wang, W.; Wang, L. A New Understanding of the Poyang Lake-Yangtze River Interaction: A Backwater Effect on the Yangtze River Perspective. Geophys. Res. Lett. 2025, 52, e2025GL114807.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Barzegar, R.; Aalami, M.T.; Adamowski, J. Coupling a hybrid CNN-LSTM deep learning model with a Boundary Corrected Maximal Overlap Discrete Wavelet Transform for multiscale Lake water level forecasting. J. Hydrol. 2021, 598, 126196.
- Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022.
- Yin, Z.; Xu, T.; Ye, H.; Wang, L.; Liang, L. Reconstruction of Daily Runoff Series in Data-Scarce Areas Based on Physically Enhanced Seq-to-Seq-Attention-LSTM Model. Water 2025, 17, 3396.
- Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
- Gan, M.; Lai, X.J.; Guo, Y.; Lu, Z.; Chen, Y.P.; Pan, S.Q.; Pan, H.D.; Chu, A. Unravelling the spatiotemporal variation in the water levels of Poyang Lake with the variational mode decomposition model. Hydrol. Process. 2024, 38, e15239.
- Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
- Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91.
- Nedkov, S.; Campagne, S.; Borisova, B.; Krpec, P.; Prodanova, H.; Kokkoris, I.P.; Hristova, D.; Le Clec’H, S.; Santos-Martin, F.; Burkhard, B.; et al. Modeling water regulation ecosystem services: A review in the context of ecosystem accounting. Ecosyst. Serv. 2022, 56, 101458.
- Wei, X.; Chen, M.E.; Zhou, Y.L.; Zou, J.H.; Ran, L.B.; Shi, R.B. Research on optimal selection of runoff prediction models based on coupled machine learning methods. Sci. Rep. 2024, 14, 32008.
- Yang, L.; Zeng, S.; Xia, J.; Wang, Y.; Huang, R.; Chen, M. Effects of the Three Gorges Dam on the downstream streamflow based on a large-scale hydrological and hydrodynamics coupled model. J. Hydrol. Reg. Stud. 2022, 40, 101039.
- Bing, J.P.; Deng, P.X.; Zhang, D.D.; Liu, X. Influence of Three Gorges Reservoir operation on hydrological regime of Poyang Lake. Yangtze River 2020, 51, 87–93.










| Data Type | Data Content | Data Source |
|---|---|---|
| Daily Hydrological Data (2009–2016) | Daily outflow data of the Yichang and Jiujiang hydrological stations | Changjiang Water Resources Commission of the Ministry of Water Resources (Wuhan, China) |
| | Daily outflow data of the Chenglingji and Xiantao hydrological stations | Hubei Hydrology and Water Resources Center (Wuhan, China) |
| | TGD discharge data | China Three Gorges Corporation (Wuhan, China) |
| Daily Rainfall Data (2009–2016) | Daily rainfall data from meteorological stations in the middle reaches of the Yangtze River basin | China Meteorological Data Center, http://data.cma.cn/ (accessed from 1 January 2015) |

| Model | NSE | RMSE (m³/s) | MAE (m³/s) | KGE | R² |
|---|---|---|---|---|---|
| LSTM | 0.94 | 3409.26 | 1902.78 | 0.84 | 0.95 |
| SSA-LSTM | 0.98 | 1729.33 | 926.72 | 0.93 | 0.99 |
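The NSE, RMSE, MAE, and KGE columns in the table above can be computed from paired observed and simulated discharge series. The sketch below uses the standard definitions from Nash and Sutcliffe (1970) and Gupta et al. (2009), both of which appear in the reference list; whether the authors used exactly these forms is an assumption, and the function names are illustrative.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, sim):
    """Root mean square error, in the units of the series (e.g., m³/s)."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error, in the units of the series."""
    return _mean([abs(o - s) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)]) / (so * ss)
    alpha = ss / so  # variability ratio
    beta = ms / mo   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A perfect forecast gives NSE = KGE = 1 and RMSE = MAE = 0. A forecast that is offset by a constant keeps a correlation of 1 but is penalized through the bias term of KGE and the squared-error term of NSE.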
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, Q.; Dong, Y.; Zhan, C.; Wang, Y.; Wang, H.; Zou, H. Daily Runoff Forecasting in the Middle Yangtze River Using a Long Short-Term Memory Network Optimized by the Sparrow Search Algorithm. Water 2026, 18, 364. https://doi.org/10.3390/w18030364

