Research on Annual Runoff Prediction Model Based on Adaptive Particle Swarm Optimization–Long Short-Term Memory with Coupled Variational Mode Decomposition and Spectral Clustering Reconstruction

Abstract: Accurate medium- and long-term runoff prediction models play crucial guiding roles in regional water resources planning and management. However, because annual runoff series vary strongly and contain few samples, conventional machine learning models struggle to capture their features, which results in inadequate prediction accuracy. To address this difficulty, firstly, the variational mode decomposition (VMD) method is adopted to decompose the annual runoff series into multiple intrinsic mode function (IMF) components and residual sequences, and the spectral clustering (SC) algorithm is applied to classify and reconstruct the IMFs. Secondly, an annual runoff prediction model based on the adaptive particle swarm optimization–long short-term memory network (APSO-LSTM) is constructed. Finally, on the basis of the APSO-LSTM model, the decomposed and clustered IMFs are predicted separately, and the predicted results are integrated to obtain the final annual runoff forecast. Decomposing and clustering the annual runoff series reduces the non-stationarity and complexity of the series and suppresses the endpoint effect of modal decomposition, which in turn improves the accuracy of machine-learning-based annual runoff prediction. The method is applied to four hydrological stations along the upper reaches of the Fen River in Shanxi Province, China, and the results are compared with those obtained from other methods. The results show that the proposed method is significantly superior to the other methods. Compared with the APSO-LSTM model and with the APSO-LSTM models applied to annual runoff sequences processed by VMD alone or by wavelet packet decomposition (WPD) alone, the proposed method reduces the RMSE by 40.95–80.28%, 25.26–57.04%, and 15.49–40.14%, and the MAE by 24.46–80.53%, 16.50–59.30%, and 16.58–41.80%, respectively, in annual runoff prediction. The research has important reference significance for annual runoff prediction and hydrological prediction in areas with data scarcity.


Water 2024, 16, 1179

Introduction
In recent years, the frequent occurrence of extreme climate events has exerted a profound influence on the global water cycle [1]. Extreme precipitation [2], extreme drought [3], and extreme flooding [4] pose a great threat to human lives, property, and safety. An accurate runoff forecast plays a vital role in water resource management. By predicting the runoff for the next year, it helps decision makers plan the utilization and distribution of water resources, cope with possible extreme climate events such as droughts or floods, and prevent or mitigate the corresponding disasters. Therefore, an increasing number of scholars [5,6] have conducted detailed studies on methods for runoff prediction. However, because runoff series are non-linear and non-stationary and are generated by extremely complex mechanisms [7], it is still a difficult task to forecast medium- and long-term runoff accurately. Currently, the widely employed models for runoff forecasting include physically based hydrological models and data-driven hydrological models [8]. Physically based hydrological models generally combine the processes of meteorological elements and use traditional runoff generation and confluence theory to predict runoff. Their drawbacks are apparent: meteorological data are difficult to acquire, and traditional hydrological theories involve numerous empirical parameters that must be determined [9]. In contrast, data-driven hydrological models do not require an explicit description of the hydrological physical process; they simply combine precipitation, evapotranspiration, and runoff data to predict runoff. Data-driven hydrological models can be further divided into two types, namely, those based on mathematical statistics [10] and those based on machine learning [11]. Among them, machine learning models represented by the LSTM model [12], the support vector machine (SVM) model [13], and the extreme gradient boosting (XGB) model [14] have consistently outperformed physically based hydrological models in medium- and long-term runoff forecasting. Therefore, this type of method has gained favor among numerous scholars.
To further improve the runoff prediction ability of machine learning models, current studies mainly focus on two aspects. On the one hand, they optimize and improve the parameters and internal mechanisms of machine learning models. Examples are employing optimization algorithms such as particle swarm optimization to fine-tune the sensitive parameters of machine learning models [15], adding appropriate attention mechanisms [8,16], and incorporating multiple time scales into machine learning models [17]. On the other hand, they reduce the complexity of the data by integrating the "decomposition-prediction-reconstruction" strategy from the field of time series prediction [18,19]: complex sequences are partitioned, according to certain mathematical rules, into multiple intrinsic mode function (IMF) components with simple characteristics and residual sequences. Subsequently, all of the IMFs are predicted and reconstructed to obtain the final prediction results. The "decomposition-prediction-reconstruction" strategy can further explore the data characteristics of runoff series, thereby effectively improving forecast accuracy [7,20]. Nevertheless, related research findings [21,22] indicate that decomposition methods introduce a boundary effect into the sequences, which limits further improvement in prediction accuracy to some extent. To address this issue, periodic extension and quadratic decomposition methods have been proposed and applied [23][24][25], with relatively favorable results. However, for annual runoff forecasting, the potential of the two methods remains underutilized. The challenge lies in capturing the periodic traits of annual runoff series, primarily because the series are short. Direct extension may therefore distort the data, whereas the quadratic decomposition method may exacerbate the endpoint problem. Consequently, determining appropriate preprocessing methods for limited-length, complex, and non-stationary original annual runoff series, so that their features are extracted effectively before forecasting, becomes pivotal in further enhancing the prediction accuracy of machine learning models. As research has deepened, it has been found that, by combining clustering algorithms, a finite-length time series can be decomposed into IMF components with simple features while the endpoint effect accompanying the decomposition is reduced, which aligns well with the requirements of the problem outlined in this article.
On account of this background, the clustering algorithm is employed in this research to classify and reconstruct the decomposed annual runoff IMFs, and a new annual runoff prediction model, termed VMDSC-APSO-LSTM, is constructed on the basis of the basic prediction model APSO-LSTM. On the one hand, processing the annual runoff IMFs with clustering algorithms avoids the requirement on the length of the runoff sequences that periodic extension methods impose. On the other hand, the number of reconstructed annual runoff IMFs is reduced, which helps mitigate the boundary effects induced by decomposition algorithms.
In summary, to improve the prediction accuracy of annual runoff series and to reduce the endpoint effects, a comprehensive annual runoff prediction model, VMDSC-APSO-LSTM, is proposed in this study; it is based on the APSO-LSTM model and couples the spectral clustering (SC) algorithm with the variational mode decomposition (VMD) method. Taking four hydrological stations in the upper reaches of the Fenhe River in China as research objects, the annual runoff predictions of VMDSC-APSO-LSTM are compared with those of three other models to examine the effectiveness and applicability of the proposed method.
Considering the aforementioned discussion, the novelty of this study can be summarized in three parts. Firstly, a method for extracting complex time series features by coupling the SC algorithm and the VMD method is proposed. Secondly, a comprehensive annual runoff prediction model, VMDSC-APSO-LSTM, built on the SC algorithm and the VMD method is put forward. Finally, the effectiveness and superiority of the proposed method are confirmed via a case study. In addition, the research findings can be applied to annual runoff forecasts for other regions and even to forecasting tasks in other fields whose time series are characterized by significant spatial heterogeneity and limited sequence length.

VMD Model
The VMD model is an adaptive, completely non-recursive approach to sequence decomposition proposed by Dragomiretskiy et al. [26]. This method can effectively reduce the "modal confusion" that occurs in empirical mode decomposition-type algorithms, and it performs well on non-stationary and non-linear complex signal sequences [27]. Therefore, in this study, the method is utilized to decompose the annual runoff series and extract key information from the complex series. The primary principles of the model are as follows:

1. Establish a variational problem: The marginal spectrum of each modal function $a_k(t)$ is obtained by applying the Hilbert transform; subsequently, an exponential term of the modal center frequency $b_k$ is incorporated to modulate the spectrum of $a_k(t)$ to the baseband. Finally, the bandwidth of each mode is determined by using the Gaussian smoothing method, and a variational problem with constraints is formulated as follows:

$$T = \min_{\{a_k\},\{b_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * a_k(t) \right] e^{-j b_k t} \right\|_2^2 \right\}, \quad \text{s.t.} \; \sum_{k} a_k(t) = f(t) \tag{1}$$

where $T$ represents the objective function of the variational problem, $a_k$ denotes the k-th modal function, $b_k$ signifies the center frequency of the k-th modal function, $\delta(t)$ is the Dirac distribution, $*$ denotes convolution, $(\delta(t) + j/(\pi t)) * a_k(t)$ yields a one-sided spectrum, and $f(t)$ is the original runoff sequence.

2. Solve the variational problem: The aforementioned constrained problem is converted to an unconstrained problem by employing the penalty factor $\alpha$ and the Lagrange multiplier $\lambda$:

$$L(\{a_k\},\{b_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * a_k(t) \right] e^{-j b_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} a_k(t) \right\|_2^2 + \left\langle \lambda(t),\; f(t) - \sum_{k} a_k(t) \right\rangle \tag{2}$$

The Parseval theorem is a fundamental theorem in signal processing and Fourier analysis. It states that the energy of a signal is the same in the time domain and in the frequency domain. Therefore, a problem posed in the time domain can be solved in the frequency domain. For a signal $f(t)$ and its Fourier transform $F(\omega)$, the Parseval theorem can be expressed as

$$\int_{-\infty}^{\infty} |f(t)|^2 \, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega$$

By using the Parseval theorem, the spectral characteristics of a signal can be observed from a frequency domain perspective. In short, such a transformation exposes, in the frequency domain, features of the runoff sequence that are not apparent in the time domain, making them easier for a deep learning model to exploit.
The modal function $\hat{a}_k^{n+1}$, center frequency $b_k^{n+1}$, and Lagrange multiplier $\hat{\lambda}^{n+1}$ in Equation (2) are iteratively updated through the alternating-direction method of multipliers, and the iterative formulae are as follows:

$$\hat{a}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{a}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha \left( \omega - b_k^{n} \right)^2}$$

$$b_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{a}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{a}_k^{n+1}(\omega) \right|^2 d\omega}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{a}_k^{n+1}(\omega) \right)$$

where $\hat{\ }$ indicates the frequency domain form obtained by the Fourier transform of the signal, $\tau$ denotes the noise tolerance, and $n$ stands for the number of iterations.
The expression for the iteration termination condition is the following:

$$\sum_{k} \frac{\left\| \hat{a}_k^{n+1} - \hat{a}_k^{n} \right\|_2^2}{\left\| \hat{a}_k^{n} \right\|_2^2} < \varepsilon \tag{5}$$

in which $\varepsilon$ represents the convergence tolerance, which is set to $10^{-7}$ in this study.
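As a small numerical illustration of two ingredients used above, the following sketch checks the discrete analogue of the Parseval relation and evaluates the power-weighted mean frequency that appears in the center-frequency update. It is only a sketch under stated assumptions: the test signal, the function names `dft` and `center_frequency`, and the use of a naive DFT in plain Python are choices made here for self-containment, not part of the original study.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2); adequate for short annual series."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def center_frequency(x):
    """Power-weighted mean frequency over the positive half of the spectrum.

    Frequencies are expressed in cycles per sample, so the result lies in (0, 0.5].
    """
    n = len(x)
    spectrum = dft(x)
    positive_bins = range(1, n // 2 + 1)
    power = [abs(spectrum[k]) ** 2 for k in positive_bins]
    weighted = sum((k / n) * p for k, p in zip(positive_bins, power))
    return weighted / sum(power)


# Hypothetical single mode: a pure oscillation at 0.1 cycles per sample.
n_samples = 60
mode = [math.cos(2 * math.pi * 0.1 * t) for t in range(n_samples)]

# Discrete Parseval relation: sum |x|^2 equals (1/N) * sum |X|^2.
energy_time = sum(v * v for v in mode)
energy_freq = sum(abs(c) ** 2 for c in dft(mode)) / n_samples

estimated_center = center_frequency(mode)
```

For a mode whose energy sits in a single frequency bin, the weighted mean coincides with that frequency, which is why the update can be read as re-centering each mode on the bulk of its own spectrum.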

Spectral Clustering (SC) Model
Although the VMD model is a non-recursive decomposition method, the truncation of the signal and the use of the Hilbert transform can lead to certain boundary effects [21]. To suppress the error accumulation that this effect causes in the prediction process, this study classifies the IMFs obtained from the decomposition with a clustering model and then integrates the IMFs group by group. This reduces the number of endpoint predictions that the prediction model has to make, thereby limiting error accumulation. Considering the large number of data points in each IMF, the mean and variance are selected as the characteristic values of each IMF in the clustering model to form the point set.
The common clustering algorithms can be mainly classified into six categories, namely, prototype clustering, density clustering, hierarchical clustering, grid clustering, model clustering, and spectral clustering. Among them, SC does not require the clusters to be convex, spherical, or of any other specific shape. Considering the unknown nature of each IMF sample after modal decomposition, the SC algorithm is adopted to classify the IMFs in this paper.
SC is a clustering algorithm that evolved from graph theory. The main idea of SC is to treat all data as points in space, where these points can be connected to each other by edges. The edge weight between two points that are far apart is low, whereas it is high for points that are close. The graph composed of all data points is then cut so that the sum of edge weights between different subgraphs is as low as possible while the sum of edge weights within each subgraph is as high as possible, which achieves the clustering of the data points. The principle of SC is as follows. First of all, suppose the weight between two points is $\omega_{ij}$; then, for any point, the corresponding degree $d_i$ can be defined as the sum of the weights of all edges connected to it, which can be expressed as

$$d_i = \sum_{j=1}^{n} \omega_{ij} \tag{6}$$

Define the subset of point set $V$ as $A$. The sum of the degrees of all vertices in subset $A$ is denoted by $vol(A)$:

$$vol(A) = \sum_{i \in A} d_i \tag{7}$$

The degree matrix $D$ can be constructed from the degree of each point. Only the main diagonal of $D$ has non-zero values. The expression of matrix $D$ is

$$D = \mathrm{diag}(d_1, d_2, \ldots, d_n) \tag{8}$$

Secondly, by calculating the similarity matrix $S$ formed by these points, the adjacency matrix $W$ can be obtained. The exact calculation methods of the similarity matrix $S$ and adjacency matrix $W$ are not discussed at length here; the detailed procedure can be found in Reference [28]. Consequently, the Laplacian matrix $L$ can be calculated with the following expression:

$$L = D - W \tag{9}$$

Finally, the indicator vector $h_j$ is introduced and the normalized cut (NCut) is performed, in which $h_j$ is an n-dimensional vector with components

$$h_{ij} = \begin{cases} 0, & v_i \notin A_j \\ \dfrac{1}{\sqrt{vol(A_j)}}, & v_i \in A_j \end{cases} \tag{10}$$

and the cut is obtained by transforming it into an optimization problem:

$$\arg\min_{H} \; \mathrm{tr}\left( H^{T} L H \right), \quad \text{s.t.} \; H^{T} D H = I \tag{11}$$

where $v_i$ represents the points in the point set $V$, $I$ is the identity matrix, and $H$ denotes the matrix formed by the optimal indicator vectors. The k-means clustering algorithm is chosen as the base algorithm for SC: k-means clustering is performed on the rows of $H$ to obtain the final result of SC.
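The graph quantities defined above can be sketched in a few lines of plain Python. The example below builds Gaussian edge weights for four points, forms the degrees and the Laplacian, and evaluates the normalized-cut value of two candidate partitions. The point coordinates, the kernel width `sigma`, and all function names are hypothetical choices for illustration; the eigen-decomposition and k-means steps of a full SC implementation are deliberately left out.

```python
import math


def rbf_weights(points, sigma):
    """Symmetric edge weights: close points get weights near 1, distant ones near 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-dist_sq / (2.0 * sigma ** 2))
    return w


def degrees(w):
    """Degree of each vertex: the sum of the weights of its edges."""
    return [sum(row) for row in w]


def laplacian(w):
    """Unnormalized graph Laplacian L = D - W."""
    d = degrees(w)
    n = len(w)
    return [[(d[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def ncut(w, labels):
    """Normalized cut: for each cluster, weight leaving it divided by its volume."""
    d = degrees(w)
    total = 0.0
    for cluster in set(labels):
        inside = [i for i, lab in enumerate(labels) if lab == cluster]
        outside = [i for i, lab in enumerate(labels) if lab != cluster]
        cut = sum(w[i][j] for i in inside for j in outside)
        vol = sum(d[i] for i in inside)
        total += cut / vol
    return total


# Hypothetical (mean, variance) features forming two well-separated pairs.
features = [(0.0, 0.1), (0.1, 0.2), (5.0, 5.1), (5.1, 5.0)]
W = rbf_weights(features, sigma=1.0)
L = laplacian(W)

ncut_natural = ncut(W, [0, 0, 1, 1])  # keeps each close pair together
ncut_mixed = ncut(W, [0, 1, 0, 1])    # splits both close pairs
```

The natural grouping cuts only the weak edges between the distant pairs and therefore has the smaller NCut value, which is the quantity the optimization problem seeks to minimize.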

APSO-LSTM Model
The LSTM model is a recurrent neural network (RNN) that is suitable for capturing important dependencies between events separated by long intervals in sequential data. The model mitigates the vanishing and exploding gradients that affect the hidden layer variables of a conventional RNN. The hidden state of LSTM includes the hidden layer variables and the memory cells. The memory cells of LSTM are illustrated in Figure 1. In this study, a Dropout layer is set in the LSTM model to reduce the model's excessive dependence on training data and decrease the risk of overfitting.
Related studies [29][30][31] have shown that utilizing heuristic optimization algorithms to optimize the parameters of the LSTM model can effectively improve its accuracy in runoff prediction. Particle swarm optimization (PSO) is a swarm intelligence optimization algorithm inspired by the study of bird flocking behavior. The basic idea of the PSO algorithm is to find the optimal solution through collaboration and information sharing among individuals in a population. The main process is to generate a set of random particles (random solutions) and then search for the optimal solution by iteration. In each iteration, the particles update their positions by continually approaching the local and global optima. After obtaining the local and global optima, each particle updates its velocity and position by Equations (12) and (13):

$$v_i(t+1) = v_i(t) + c_1 r_1 \left( b_{pi} - x_i(t) \right) + c_2 r_2 \left( b_{gi} - x_i(t) \right) \tag{12}$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{13}$$

where $i$ is the index of the particle, $i = 1, 2, \ldots, N$; $v_i(t)$ and $v_i(t+1)$ are the velocities of the i-th particle at times $t$ and $t+1$; $c_1$ and $c_2$ are learning factors; $r_1$ and $r_2$ are random numbers between 0 and 1; $b_{pi}$ and $b_{gi}$ signify the local and global optima, respectively; and $x_i(t)$ and $x_i(t+1)$ are the positions of the i-th particle at times $t$ and $t+1$. Typically, the velocity is capped at a maximum value $v_{max}$: when the velocity reaches this maximum, $v_i(t) = v_{max}$.
To better obtain optimization results and prevent the search from falling into local optima, some scholars have made improvements to the PSO algorithm [32,33]. For example, an inertia factor ω, which belongs to (0, 1), is introduced to construct an adaptive weighted particle swarm optimization algorithm (APSO). Generally, as the weight value increases, the global optimization ability strengthens while the local optimization ability diminishes. Conversely, as the weight value decreases, the global optimization ability weakens, and the local optimization ability strengthens. For minimization problems, the update of ω primarily follows an adaptive strategy defined in terms of the following quantities: $\omega_{min}$ stands for the minimum value of the weight, which is set to 0.4; $\omega_{max}$ denotes the maximum value of the weight, which is set to 0.9; $f$ is the fitness value of each particle; $\bar{f}$ represents the average fitness value of all particles; $f_{max}$ is the maximum particle fitness; and $f_{min}$ is the minimum particle fitness. Because certain parameters of the LSTM model are sensitive during actual training, the APSO algorithm is utilized to optimize them, and the range of each optimization parameter is shown in Table 1. The APSO algorithm helps prevent the search from falling into a local optimal solution and from converging prematurely during the optimization process. Using the mean square error (MSE) between predicted and measured values as the objective function, an APSO-LSTM annual runoff prediction model is constructed. The process is outlined in Figure 2, and the main steps are as follows:

1. Initialization of model parameters: The initial matrix of each optimization parameter in the LSTM algorithm is constructed, and the initial values of the other insensitive parameters, as well as the population size, population dimension, learning factors, etc., in the APSO algorithm are determined.

2. The particle population X (learning rate, LSTM layer, max epochs) is randomly generated, and the initial velocity and initial position of the particles are defined.

3. The values of the LSTM parameters are assigned. The model networks under the different parameters are trained, and each training process is recorded.

4. According to the fitness function, the fitness value of each particle is calculated and compared to select the optimal particle fitness value. The velocity and position of each particle are updated according to Equations (12) and (13), respectively.

5. When the selected maximum number of iterations has been reached, the minimum value of MSE at that time is taken as the optimization result of the objective function. The optimal particle position is the output. The obtained parameters are assigned to the LSTM model, and the trained, optimized model is then used to predict the runoff volume.
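The steps above can be sketched as a compact search loop. In the sketch below, a simple sphere function stands in for the LSTM training error, so no network is trained. The adaptive inertia rule is one commonly used form for minimization problems (particles with below-average error get a weight scaled between ω_min and ω_max, the rest get ω_max) and is an assumption here, not necessarily the exact strategy used in the study. The values of `c1`, `c2`, the swarm size, the bounds, and the velocity cap are likewise hypothetical.

```python
import random


def sphere(x):
    """Stand-in objective (assumption): replaces the MSE of a trained LSTM."""
    return sum(v * v for v in x)


def apso(objective, dim=2, n_particles=30, n_iter=200, bounds=(-5.0, 5.0),
         c1=1.5, c2=1.5, w_min=0.4, w_max=0.9, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # velocity cap, an illustrative choice
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    fit = [objective(p) for p in x]
    pbest = [p[:] for p in x]
    pbest_fit = fit[:]
    g = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = x[g][:], fit[g]
    history = [gbest_fit]

    for _ in range(n_iter):
        f_min = min(fit)
        f_avg = sum(fit) / n_particles
        for i in range(n_particles):
            # Adaptive inertia: good particles search locally, poor ones globally.
            if fit[i] <= f_avg and f_avg > f_min:
                w = w_min + (w_max - w_min) * (fit[i] - f_min) / (f_avg - f_min)
            else:
                w = w_max
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, vel))
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            fit[i] = objective(x[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = x[i][:], fit[i]
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best_position, best_value, best_history = apso(sphere)
```

To tune an LSTM in the same way, each particle would encode the hyperparameters (for example learning rate, number of layers, and maximum epochs), and `objective` would train the network and return the validation MSE; the loop itself would not change.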

VMDSC-APSO-LSTM Model
Integrating the advantages of the basic methods mentioned above, with the purpose of solving the significant boundary effect of VMD on small-sample sequences, a new annual runoff prediction model based on decomposition and clustering is proposed in this paper, namely, the VMDSC-APSO-LSTM model. The VMDSC-APSO-LSTM model leverages VMD to extract key feature information from the runoff series, aggregates the initial IMFs by SC, and forecasts and integrates the IMFs through APSO-LSTM. The structure of the VMDSC-APSO-LSTM model is depicted in Figure 3.
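The final "forecast each component, then integrate" stage of this structure can be sketched as follows. The function names and the toy components are hypothetical, and a persistence rule (repeat the last value) is used as a stand-in for the trained APSO-LSTM forecaster, purely so that the sketch runs on its own.

```python
def persistence_forecaster(component, horizon):
    """Stand-in forecaster (assumption): repeats the last observed value."""
    return [component[-1]] * horizon


def forecast_by_components(components, forecaster, horizon):
    """Forecast every reconstructed component separately and sum the forecasts."""
    total = [0.0] * horizon
    for component in components:
        prediction = forecaster(component, horizon)
        total = [a + b for a, b in zip(total, prediction)]
    return total


# Two hypothetical reconstructed components of a short series.
components = [[1.0, 2.0], [0.5, 0.25]]
combined_forecast = forecast_by_components(components, persistence_forecaster, horizon=2)
```

Any callable with the same `(component, horizon)` signature could be passed in place of the persistence rule, which keeps the integration step independent of the forecasting model.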

Evaluation of the Model
For a comparative analysis of the accuracy of the proposed model, VMDSC-APSO-LSTM, three additional models, APSO-LSTM, VMD-APSO-LSTM, and WPD-APSO-LSTM, are added for comparison. The APSO-LSTM, VMD-APSO-LSTM, WPD-APSO-LSTM, and VMDSC-APSO-LSTM models are numbered S1, S2, S3, and S4, respectively, as outlined in Table 2. Among them, the APSO-LSTM model is an improved machine learning model leveraging optimization algorithms, serving as the foundational model in this study. The VMD-APSO-LSTM model is the focus of improvement in this work, with the enhanced model being the VMDSC-APSO-LSTM model proposed in this research. To further examine the effectiveness of the proposed VMDSC-APSO-LSTM model in addressing boundary effects, the WPD-APSO-LSTM model is introduced as a control.

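The Nash-Sutcliffe efficiency coefficient (NSE), root mean square error (RMSE), and mean absolute error (MAE) are selected as evaluation indicators for the models, and each evaluation indicator is calculated as follows:

$$NSE = 1 - \frac{\sum_{t=1}^{n} \left( Q_0(t) - Q_p(t) \right)^2}{\sum_{t=1}^{n} \left( Q_0(t) - \bar{Q}_0 \right)^2}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Q_0(t) - Q_p(t) \right)^2}$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| Q_0(t) - Q_p(t) \right|$$

where $Q_0(t)$ is the measured annual runoff volume, m³; $Q_p(t)$ denotes the predicted annual runoff volume, m³; $\bar{Q}_0$ is the mean of the measured annual runoff volumes over the test period, m³; and $n$ represents the number of years in the test period.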
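A minimal sketch of the three indicators in plain Python is given below. The function names and the four-value series are hypothetical and serve only to show how the indicators behave; they are not data from the study.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def rmse(observed, predicted):
    """Root mean square error, in the units of the runoff volume."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, in the units of the runoff volume."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical measured and predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 2.0, 3.0, 3.0]

scores = (nse(measured, forecast), rmse(measured, forecast), mae(measured, forecast))
```

Because RMSE squares the errors before averaging, it penalizes a few large misses (such as poor endpoint predictions) more heavily than MAE does, which is why the two are reported together.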

Study Area and Data
Fenhe River, the second largest tributary of the Yellow River, China, is depicted in

Decomposition of Annual Runoff Series
It is shown in related work [35] that the sensitive parameters of the VMD model are mainly the number of decomposition layers K and the penalty factor α. When K is chosen appropriately, the components decomposed by the VMD method reflect the frequency components contained in the original signal. If K is not chosen properly, under-decomposition or over-decomposition will occur. Regarding the parameter α, as its value increases, the convergence speed of the VMD decomposition tends to first accelerate and then decelerate; that is, the running speed of the model is neither simply proportional nor inversely proportional to α. At the same time, a higher value of α reduces the likelihood of modal confusion in the results of VMD decomposition.
To sum up, it is necessary to determine the value of α first. Manual tuning is used: with the number of decomposition layers held constant, α is increased by a fixed step; the search stops when the mean absolute error between the reconstructed data and the original data begins to increase, and the last α value is taken as the optimal penalty factor. The optimal α value of each station is shown in Table 3. The main methods for determining the number of decomposition layers K are the empirical value method and the center frequency method [36]. The center frequency method is adopted in this study, and the process to determine the number of decomposition layers at each station is shown in Figure 5.
From Figure 5, it can be seen that as the number of decomposition layers increases, the center frequencies of the IMFs gradually move closer to each other. For the Zhaishang Station, when the number of decomposition layers is set to 5, the center frequency of IMF5 is 0.4634; when the number of decomposition layers is set to 6, the center frequency of IMF6 equals 0.4720. The center frequency increases by less than 0.01, which indicates that when the runoff sequence of Zhaishang Station is decomposed into five layers, the feature information contained in the sequence has essentially been extracted; if the number of decomposition layers is increased further, "modal mixing" may appear. Therefore, for the sequence of Zhaishang Station, five is selected as the number of decomposition layers. Similarly, by analyzing the runoff series of the other stations, it is found that five decomposition layers is also the appropriate choice for each of them.
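The selection rule described above can be written as a short function. The sketch assumes that the center frequency of the highest-frequency IMF has already been computed for each candidate K; the function name `select_k` and the dictionary layout are hypothetical, while the two frequencies and the 0.01 threshold are the Zhaishang Station values quoted in the text.

```python
def select_k(last_center_freq, threshold=0.01):
    """Return the smallest K whose successor raises the top center frequency by less than threshold.

    last_center_freq maps a candidate K to the center frequency of its last IMF.
    If no increase falls below the threshold, the largest candidate is returned.
    """
    candidates = sorted(last_center_freq)
    for k, k_next in zip(candidates, candidates[1:]):
        if last_center_freq[k_next] - last_center_freq[k] < threshold:
            return k
    return candidates[-1]


# Values reported for Zhaishang Station: IMF5 at K = 5 and IMF6 at K = 6.
zhaishang = {5: 0.4634, 6: 0.4720}
chosen_k = select_k(zhaishang)
```

Here the increase from K = 5 to K = 6 is 0.0086, below the threshold, so five layers are retained.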
Combining the parameters determined above, the annual runoff series for the four stations are decomposed, respectively, and the results are depicted in Figure 6.

Clustering Grouping of IMFs
In this research, the adjacency matrix in the SC algorithm is calculated with the Gaussian radial basis function (RBF) kernel. For the standard deviation of the RBF, the integer value of the average standard deviation of the clustering points is used. After multiple adjustment experiments, the number of clusters is consistently determined to be three. Once each parameter is determined, the decomposition results of each station are analyzed by SC, with the IMF mean as the horizontal coordinate and the variance as the vertical coordinate. The results are illustrated in Figure 7. Figure 7a shows that, for the Zhaishang Station, IMF1 and IMF3 form the first category, IMF2 and IMF4 the second category, and IMF5 the third category. For the Lancun Station, IMF1 and IMF2 belong to the first category, IMF3 and IMF4 to the second category, and IMF5 to the third category, as shown in Figure 7b. For the Fenhe Reservoir Station, IMF1 forms the first category, IMF2 and IMF4 the second category, and IMF3 and IMF5 the third category, as shown in Figure 7c. For the Shangjinyou Station, IMF1 forms the first category, IMF2 and IMF4 the second category, and IMF3 and IMF5 the third category, as shown in Figure 7d. The IMFs belonging to the same category are combined into new IMFs, which are used in the subsequent prediction process. Clustering and grouping the IMFs, on the one hand, reduces the impact of the endpoint effect while retaining all the information extracted by VMD; on the other hand, it reduces the number of IMFs and hence the computational scale of the prediction model, which in turn shortens the overall prediction time.
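The feature extraction and the regrouping step can be sketched as below. The five short component series are hypothetical toy values, and the function names are illustrative; only the category assignment (IMF1 and IMF3, IMF2 and IMF4, IMF5) is taken from the Zhaishang Station result described above.

```python
from statistics import mean, pvariance


def imf_features(imfs):
    """(mean, variance) of every IMF: the two coordinates used as clustering points."""
    return [(mean(component), pvariance(component)) for component in imfs]


def reconstruct_by_category(imfs, categories):
    """Sum the IMFs that share a category, returning one series per category."""
    length = len(imfs[0])
    grouped = {}
    for component, category in zip(imfs, categories):
        accumulator = grouped.setdefault(category, [0.0] * length)
        for t, value in enumerate(component):
            accumulator[t] += value
    return [grouped[category] for category in sorted(grouped)]


# Hypothetical toy IMFs (three time steps each), for illustration only.
toy_imfs = [
    [1.0, 2.0, 3.0],     # IMF1
    [0.5, 0.5, 0.5],     # IMF2
    [2.0, 0.0, -1.0],    # IMF3
    [0.25, 0.25, 0.25],  # IMF4
    [4.0, 4.0, 4.0],     # IMF5
]
zhaishang_categories = [1, 2, 1, 2, 3]

points = imf_features(toy_imfs)
new_imfs = reconstruct_by_category(toy_imfs, zhaishang_categories)
```

Because the regrouping is a plain sum, the reconstructed components still add up to the same series as the original five IMFs; only the number of series that must be forecast drops from five to three.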

Prediction Results and Discussion
The prediction results of each model for the different hydrological stations are illustrated in Figure 8. Figure 8 shows that the S1 (APSO-LSTM) model can basically capture the future trend of runoff but performs poorly in predicting the runoff variation process. Compared with S1, the S2 (VMD-APSO-LSTM) model significantly improves the prediction of the runoff change process and reflects the trend of the runoff process during the test period more accurately. This indicates that processing the runoff sequences with a decomposition method yields data with better linearity, thereby reducing the difficulty of model prediction and improving the prediction accuracy. However, because of the small sample size of the annual runoff series, the endpoint effect of the S2 model causes significant deviations in the predicted results at the two ends of the test period, namely, 1995 and 2000, for each station. The S3 (WPD-APSO-LSTM) model also accurately reflects the trend of runoff process changes during the test period, and its overall prediction results at the boundaries are better than those of S2, but it still exhibits significant flaws. For example, the boundary prediction for Lancun Station in 2000 is negative, as shown in Figure 8b. Compared with the above three models, the S4 (VMDSC-APSO-LSTM) model proposed in this study not only grasps the trend of the runoff process more precisely but also performs well at the boundaries of the test period, especially at Zhaishang Station, Lancun Station, and Fenhe Reservoir Station, where its forecasting results are superior to those of S3 throughout.
To further clarify the performance of each model at the different stations, the fit between the predicted and measured values during the test period at each station is analyzed, and the evaluation indexes of each model are calculated. The results are shown in Figure 9 and Table 4, respectively. Figure 9 shows that the S4 model has the most compact scatter, with an adjusted R² of 0.95, followed by the S2, S3, and S1 models, with adjusted R² values of 0.93, 0.92, and 0.66, respectively. These results indicate that the S4 model proposed in this article delivers the best overall performance in annual runoff prediction for the four stations.

A further observation of Table 4 shows that the overall performance of the four models in this study is ranked as S4 > S3 > S2 > S1. Specifically, compared to S3, the S4 model reduces RMSE by 15.49-40.14% and MAE by 16.58-41.80%. Compared to S2, the S4 model reduces RMSE and MAE by 25.26-57.04% and 16.50-59.30%, respectively. These results illustrate that the S4 model, which combines the VMD and SC algorithms, is more suitable for annual runoff prediction with relatively few runoff samples. It effectively alleviates the significant endpoint effect of the S2 model when the annual runoff series is short, and the improvement is even greater than that achieved by the S3 model. In addition, compared with the basic model S1, the S4 model reduces RMSE and MAE to a greater extent, by 40.95-80.28% and 24.46-80.53%, respectively. This indicates that the synergistic effect of decomposition, clustering, and the LSTM model can compensate for the inadequacy of the individual LSTM model, reduce the complexity of the annual runoff series, and improve the forecasting accuracy. The NSE values in Table 4 show that the S4 model achieves an NSE greater than 0.70 at every station, with particularly high values of 0.87 and 0.90 at Zhaishang Station and Lancun Station, respectively. According to the standard for hydrological information and hydrological forecasting [37], the accuracy of the S4 model predictions at the four stations has reached level B (good, NSE ≥ 0.70).
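For reference, the indicators discussed above can be computed as in the following minimal sketch. It uses the standard textbook definitions of RMSE, MAE, and NSE together with a relative-reduction helper; the function names and the toy observed and predicted values are illustrative assumptions and are not taken from Table 4.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def reduction_pct(baseline, improved):
    """Percentage by which an error metric drops relative to a baseline model."""
    return 100.0 * (baseline - improved) / baseline


# Toy six-year test period with made-up values (NOT the paper's data).
observed = [3.0, 5.0, 4.0, 6.0, 2.0, 4.0]
pred_baseline = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]  # predicts the observed mean everywhere
pred_improved = [3.5, 4.5, 4.0, 5.0, 3.0, 4.0]

rmse_cut = reduction_pct(rmse(observed, pred_baseline), rmse(observed, pred_improved))
mae_cut = reduction_pct(mae(observed, pred_baseline), mae(observed, pred_improved))
nse_improved = nse(observed, pred_improved)
meets_level_b = nse_improved >= 0.70  # threshold quoted in the text for level B
```

A model that always predicts the observed mean has an NSE of zero, which is why NSE is a convenient yardstick for judging whether a forecast adds skill beyond the long-term average.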
In summary, in the prediction of annual runoff, the VMDSC-APSO-LSTM model proposed in this paper outperforms the other comparative models. This can be attributed to several factors: (1) The VMD method can better capture the essential features of the annual runoff series, such as the trend and period, by decomposing the non-stationary annual runoff series with spatial heterogeneity into multiple modal components. (2) Integrating the VMD and SC algorithms enables the recombination of the decomposed modal components, thus minimizing the number of decomposition layers. The reduction in layers decreases the number of loop iterations in the program, which boosts the computational speed of the combined model and helps mitigate the adverse endpoint effects introduced by the modal components. It can be seen that a prediction model combining the decomposition method, the clustering method, the heuristic optimization algorithm, and deep learning technology can achieve a more satisfactory effect in annual runoff prediction.

Conclusions
Annual runoff series have a small sample size and suffer from a serious endpoint effect during decomposition. Starting from VMD theory and aiming to suppress the endpoint effect, this paper proposes an annual runoff prediction model called VMDSC-APSO-LSTM, which couples the VMD and SC methods on the basis of an improved machine learning algorithm. The model combines two efficient signal-processing methods, VMD and SC. By decomposing and clustering the annual runoff series, it captures the key feature information of the series more accurately and ultimately improves the prediction accuracy. The proposed model and three comparative models are applied to the annual runoff prediction of four hydrological stations in the upper reaches of the Fenhe Basin. The results show that the VMDSC-APSO-LSTM model proposed in this paper is significantly more accurate than the other comparative models. The proposed model addresses, to a certain extent, the methodological barriers encountered in predicting runoff with limited sample data, providing valuable insights for annual runoff prediction and other short-term hydrological element prediction.

Figure 1. The internal operation structure of LSTM.

2.4. VMDSC-APSO-LSTM Model
To integrate the advantages of the basic methods mentioned above and to solve the significant boundary effect problem of VMD in small-sample sequences, a new annual runoff prediction model based on decomposition and clustering is proposed in this paper, namely, the VMDSC-APSO-LSTM model. The VMDSC-APSO-LSTM model leverages VMD to extract key feature information from the runoff sequence, aggregates the initial IMFs by SC, and forecasts and integrates the IMFs through APSO-LSTM. The structure of the VMDSC-APSO-LSTM model is depicted in Figure 3.
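The final forecast-and-integrate stage of this structure can be sketched as below. This is only an outline under stated assumptions: the reconstructed components are taken as given, and `persistence_forecaster` is a hypothetical stand-in that repeats the last value of each component. It replaces APSO-LSTM purely to keep the example self-contained and is not the forecaster used in the paper.

```python
from typing import Callable, Dict, List, Sequence

# A forecaster maps (component history, horizon) to a list of predicted values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def persistence_forecaster(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical stand-in for APSO-LSTM: repeat the last observed value."""
    return [history[-1]] * horizon


def forecast_and_integrate(
    components: Dict[int, Sequence[float]],
    horizon: int,
    forecaster: Forecaster,
) -> List[float]:
    """Forecast every reconstructed component separately, then sum the forecasts."""
    total = [0.0] * horizon
    for series in components.values():
        part = forecaster(series, horizon)
        total = [t + p for t, p in zip(total, part)]
    return total


# Toy reconstructed components with made-up values (NOT the paper's data).
components = {
    0: [1.0, 2.0, 3.0],
    1: [0.5, -0.5, 0.5],
    2: [10.0, 10.0, 11.0],
}
runoff_forecast = forecast_and_integrate(
    components, horizon=2, forecaster=persistence_forecaster
)
```

Passing the forecaster as an argument keeps the integration step independent of the model choice, so a trained network per component could be substituted without changing the summation logic.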

The four hydrological stations located in the upper reaches of the Fenhe River basin, namely, Zhaishang Station, Lancun Station, Fenhe Reservoir Station, and Shangjingyou Station, are used as the subjects of study. The annual runoff series measured at each hydrological station from 1958 to 2000 are divided into a training set and a testing set, with the training set covering 1958 to 1994 and the testing set covering 1995 to 2000. To prevent information leakage, the training set is further divided into a new training set and a validation set [34]. The new training set covers 1958 to 1988, and the validation set covers 1989 to 1994.
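The chronological split described above can be written as a short sketch. The year boundaries come from the text; the function name `split_by_year` and the placeholder runoff values are assumptions made only for illustration.

```python
def split_by_year(series, train_end, val_end):
    """Split a {year: value} series chronologically into train, validation, and test parts."""
    train = {y: v for y, v in series.items() if y <= train_end}
    val = {y: v for y, v in series.items() if train_end < y <= val_end}
    test = {y: v for y, v in series.items() if y > val_end}
    return train, val, test


# Placeholder annual runoff values for 1958-2000 (NOT measured data).
annual_runoff = {year: float(year - 1957) for year in range(1958, 2001)}

train_set, val_set, test_set = split_by_year(annual_runoff, train_end=1988, val_end=1994)
```

With these boundaries the 43-year record yields 31 training years, 6 validation years, and 6 test years, and no shuffling is applied, so the temporal order is preserved.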

Figure 4. Location of study area and hydrological stations.

Figure 5. The results of center frequencies for each IMF at different decomposition layers.

Figure 6. Results of the variational mode decomposition at each station.

Figure 8. Annual runoff prediction results for each station.

Figure 9. Scatter plot for prediction results of each model.

Table 1. Preferred range of parameters.

Table 2. Implication of each model.

Table 3. Table of values for penalty coefficient α.

Table 4. Performance indicators of each model.