1. Introduction
Accelerating urbanization and industrialization have significantly degraded water environment quality, particularly through excessive nitrogen and phosphorus nutrients, the frequent occurrence of black and odorous water bodies, and habitat destruction, problems that are gradually intensifying and seriously threaten basin ecological security. As one of the most important sections of the Yangtze River Basin, the Shanghai section bears significant responsibility for managing and protecting ecological environmental quality while supporting economic and social development. Efficient and accurate modeling and prediction of regional water quality changes have therefore become important tools for water resource regulation and pollution control.
Among the various water quality indicators, dissolved oxygen (DO) is regarded as one of the most representative and critical, because it reflects the self-purification capacity of a water body and the survival status of aquatic organisms [1,2]. Changes in DO are influenced by multiple factors such as temperature, pH, total nitrogen, and total phosphorus, and exhibit strongly non-linear and non-stationary behavior. High-precision prediction of DO not only helps environmental management departments implement scientific regulation but also provides a data foundation for decision support in basin ecological compensation and for early warning of pollution sources.
Traditional machine learning methods [3,4,5,6,7,8,9,10] exhibit limitations in predicting DO when complex functional relationships are involved. These methods typically assume that the relationships in the data are linear, making it difficult to capture the complex, non-linear dependence of dissolved oxygen on environmental factors. They are also highly sensitive to noise and outliers, which can destabilize the prediction results. Deep learning, in contrast, possesses strong data learning and generalization capabilities, enabling efficient approximation of high-dimensional functions and extraction of hidden information within the data [11]. Among deep learning models, recurrent neural networks (RNNs) [12] are particularly effective for parameter prediction in time series analysis. Long Short-Term Memory networks (LSTMs) [13] and Gated Recurrent Units (GRUs) [14], through the introduction of gating mechanisms, significantly enhance the ability to process long sequences. Cao et al. [15] proposed a DO prediction model for pond aquaculture based on K-means clustering and a GRU network; although this model improved prediction accuracy, it still suffers from the vanishing gradient problem inherent to GRU structures and is less effective on large-scale datasets. To further address the vanishing gradient problem, researchers have integrated neural network models with attention mechanisms. Qin et al. [16] proposed a dual-stage attention-based recurrent neural network (DA-RNN) that employs temporal and feature attention layers to dynamically emphasize crucial patterns, substantially improving long-term dependency modeling. Yang et al. [17] proposed a hybrid model integrating Convolutional Neural Networks with Long Short-Term Memory networks (CNN-LSTM), which uses convolutional layers to extract key features and an attention mechanism to weight time series features, thereby enhancing prediction accuracy. In recent years, the attention mechanism has also been widely applied in fields such as air pollution prediction and water quality prediction [18,19,20,21], demonstrating excellent time series modeling capability and cross-domain adaptability.
However, the predictive performance of neural network models is also influenced by parameter settings. At present, model parameters are typically selected based on researchers' experience and iterative debugging, which consumes significant computational and human resources. To address this issue, intelligent optimization algorithms are gradually being introduced into hyperparameter configuration: by iteratively searching for the optimal parameters, they avoid the low prediction accuracy and time-consuming tuning caused by empirical parameter selection. The Fruit Fly Optimization Algorithm (FOA) is widely used because of its fast convergence and high efficiency [22,23,24]. Ref. [25] improves the stability and prediction accuracy of the BP neural network by enhancing the fruit fly optimization algorithm. In addition, water quality prediction faces the problems of data non-stationarity and vulnerability to exogenous variables. To address this, researchers have introduced signal decomposition to reduce data fluctuation and non-stationarity. Refs. [26,27] combined data decomposition techniques with neural network models, using Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD), respectively, to decompose water quality data before feeding it into prediction models, achieving relatively good results. However, EMD lacks a rigorous theoretical foundation and may suffer from endpoint effects and mode mixing during decomposition; EEMD, although alleviating mode mixing to a certain extent, has not eliminated it completely and also suffers from slow processing speed and overfitting to noise [28]. Variational Mode Decomposition (VMD) can decompose complex, non-stationary sequences into multiple relatively stationary modal components while effectively avoiding mode mixing [29], overcoming the shortcomings of EMD and EEMD.
In summary, this study introduces a hybrid predictive model integrating Variational Mode Decomposition (VMD), an Improved Fruit Fly Optimization Algorithm (IFOA), and an Attention-based Gated Recurrent Unit (Attention-GRU) neural network. The VMD method decomposes data into sub-modal components with finite bandwidths. The IFOA-Attention-GRU model is then applied to each component for prediction, and the final prediction results are obtained by superimposing the predicted values of each component. Experimental results show that this model can accurately predict dissolved oxygen in the Shanghai region of the Yangtze River Basin, providing an important reference for environmental monitoring in this region.
2. Materials and Methods
2.1. Attention-GRU Network Model
Compared with traditional GRU networks, the Attention-GRU network [30] introduces an attention mechanism that performs adaptive weighted processing of the input sequence, enabling the model to identify and focus on the key environmental factors affecting dissolved oxygen. Through a dynamic weight allocation mechanism, it effectively overcomes the limitations of traditional models in processing time series data. In particular, in basin environments where dissolved oxygen concentrations are influenced by multiple factors, the network can adjust the weights according to the actual contribution of each factor, thereby predicting the dynamic variation of dissolved oxygen more accurately. Its structure is shown in Figure 1.
The network workflow is as follows:
Initially, the input $x_t$ at the current time step is combined with the hidden state $h_{t-1}$ from the preceding time step. This combination undergoes a linear transformation followed by a tanh activation to produce an intermediate variable $e_t$:

$$e_t = \tanh\left(W_e\left[x_t, h_{t-1}\right] + b_e\right)$$

Subsequently, the intermediate variable $e_t$ is multiplied by the weight vector $w$ and normalized via the softmax function, yielding the attention weight vector $\alpha_t$ for the current time step:

$$\alpha_t = \frac{\exp\left(w^{\top} e_t\right)}{\sum_{s=1}^{S} \exp\left(w^{\top} e_s\right)}$$

The attention weight vector $\alpha_t$ is multiplied by the input vector $x_t$, yielding the weighted input vector $\tilde{x}_t$:

$$\tilde{x}_t = \alpha_t \odot x_t$$

In this formulation, $S$ denotes the length of the input sequence.

Finally, the weighted input vector $\tilde{x}_t$ and the hidden state $h_{t-1}$ from the preceding time step are fed into the GRU layer. Through its gating mechanisms, the GRU manages the flow of information across time steps and updates the hidden state $h_t$:

$$z_t = \sigma\left(W_z \tilde{x}_t + U_z h_{t-1} + b_z\right)$$

$$r_t = \sigma\left(W_r \tilde{x}_t + U_r h_{t-1} + b_r\right)$$

$$\tilde{h}_t = \tanh\left(W_h \tilde{x}_t + U_h\left(r_t \odot h_{t-1}\right) + b_h\right)$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

Here $z_t$ and $r_t$ denote the update gate and reset gate, respectively; $W_z$, $W_r$, and $W_h$ represent the weight matrices for the input, while $U_z$, $U_r$, and $U_h$ denote the weight matrices for the hidden state; finally, $b_z$, $b_r$, and $b_h$ correspond to the bias vectors.
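To make this workflow concrete, the following is a minimal PyTorch sketch of the attention-weighted GRU described above, under the reading that the attention weights are computed per time step over the input features; the layer names, sizes, and the final fully connected output layer are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class AttentionGRU(nn.Module):
    """Sketch: per-step feature attention followed by a GRU cell."""
    def __init__(self, n_features, hidden_size=64):
        super().__init__()
        self.hidden_size = hidden_size
        # Linear map + tanh over [x_t, h_{t-1}] produces the intermediate variable e_t.
        self.attn = nn.Linear(n_features + hidden_size, n_features)
        self.gru_cell = nn.GRUCell(n_features, hidden_size)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                                   # x: (batch, seq_len, n_features)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        for t in range(seq_len):
            x_t = x[:, t, :]
            e_t = torch.tanh(self.attn(torch.cat([x_t, h], dim=1)))  # intermediate variable
            alpha_t = torch.softmax(e_t, dim=1)                      # attention weights
            x_tilde = alpha_t * x_t                                  # weighted input
            h = self.gru_cell(x_tilde, h)                            # gated hidden-state update
        return self.fc(h)                                            # one-step-ahead DO estimate

# Example: 8 environmental factors, a window of 12 time steps, batch of 16
model = AttentionGRU(n_features=8)
print(model(torch.randn(16, 12, 8)).shape)   # torch.Size([16, 1])
```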
2.2. Variational Mode Decomposition (VMD)
Variational Mode Decomposition (VMD) represents an effective methodology for time series signal processing, enabling the decomposition of a raw time series signal into a series of intrinsic mode functions (IMFs), each characterized by finite bandwidths and distinct center frequencies. This decomposition facilitates the extraction of underlying periodic characteristics within the data, thereby simplifying the signal’s complexity and mitigating non-stationarity inherent in the time series. The process is outlined as follows:
(1) Variational Problem Formulation
Following VMD, the initial signal is decomposed into K constituent subsequences, each defined by a unique center frequency and a finite bandwidth. The primary objective is to minimize the sum of the bandwidths of the individual modes. The constrained variational problem is formulated as:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \right\}$$

subject to the constraint

$$\sum_{k=1}^{K} u_k(t) = f(t)$$

where $u_k(t)$ is the function of each mode, $\omega_k$ is the center frequency of each mode, $\delta(t)$ is the Dirac function, $*$ denotes the convolution operation, and $f(t)$ is the original signal.
(2) Transformation into an unconstrained optimization problem
Introducing the augmented Lagrangian function, the constrained problem is transformed into an unconstrained optimization problem with the following expression:

$$L\left(\{u_k\}, \{\omega_k\}, \lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $\alpha$ is the penalty parameter and $\lambda$ is the Lagrange multiplier.
(3) Solve the center frequency and the components.
Combining the alternating direction method of multipliers with Parseval's theorem and the Fourier transform, the center frequencies and modal components are obtained by iteratively solving for the saddle point of the Lagrangian function. The specific iterative formulas are as follows.

Modal component update equation:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha \left(\omega - \omega_k\right)^2}$$

Center frequency update formula:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}$$

Lagrange multiplier update formula:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right)$$

where $\tau$ is the noise tolerance, and $\hat{u}_k(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k(t)$, $f(t)$, and $\lambda(t)$, respectively.
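As an illustration of how such a decomposition can be carried out in practice, the sketch below uses the open-source vmdpy package (assumed to be installed); the synthetic series and the parameter values other than K are placeholders, not the settings used in this study.

```python
import numpy as np
from vmdpy import VMD          # assumes `pip install vmdpy`

# Placeholder series standing in for the monitored DO record.
do_series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.3 * np.random.randn(2000)

alpha = 2000    # bandwidth-constraint (penalty) parameter
tau = 0.0       # noise tolerance for the Lagrange-multiplier update
K = 6           # number of modes (chosen via the center-frequency observation method)
DC = 0          # do not force a DC mode
init = 1        # initialize center frequencies uniformly
tol = 1e-7      # convergence tolerance

u, u_hat, omega = VMD(do_series, alpha, tau, K, DC, init, tol)
# u:     (K, N) time-domain modal components (IMFs)
# omega: center-frequency trajectory of each mode over the iterations
print(u.shape, np.round(omega[-1], 3))
```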
2.3. Adaptive Mesh Division Optimization Fruit Fly Algorithm (IFOA)
The fruit fly algorithm has few parameters and high computational efficiency, but its fixed search span makes it prone to local optima and slow convergence, affecting the precision of the final optimization. Therefore, this paper proposes an Improved Fruit Fly Optimization Algorithm based on adaptive grid partitioning, which dynamically adjusts the grid size to achieve fine-grained search in high-fitness areas. Compared with traditional optimization algorithms, IFOA can rapidly explore globally in the early stage and conduct fine local searches in the later stage, effectively avoiding premature convergence issues, providing the Attention-GRU network with optimal parameter combinations, and enhancing the prediction accuracy of the model.
The grid size is adjusted adaptively over the iterations: it is scaled down by an attenuation coefficient and refined according to the fitness of the fruit fly at its current position, so that regions with high fitness values are divided more finely. The specific steps are as follows (a compact code sketch is given after the list):
- (1)
Set the number of fruit flies N, the maximum grid size $G_{\max}$, and the scent source concentration constant D. The position of each fruit fly is denoted as $X_i$ and its odor concentration judgment value as $S_i$, where $i = 1, 2, \ldots, N$.
- (2)
Adaptive gridding of the search space is performed, with each grid cell treated as a candidate solution. The odor concentration judgment value $S_i$ is calculated from the distance between the fruit fly position $X_i$ and the location of the target (scent) source, and the best solution is selected within each grid.
- (3)
Dynamically adjust the grid size according to the fitness value of the optimal solution;
- (4)
Continue searching and adjusting the grid size within the new grid. If the average change in the location of the optimal solution over Y consecutive iterations is less than a set threshold $\epsilon$,

$$\frac{1}{Y} \sum_{j = t - Y + 1}^{t} \left\| X^{*\,j} - X^{*\,j-1} \right\| < \epsilon$$

then the algorithm is deemed to have converged and the search stops. Here $X^{*\,t}$ denotes the location of the global optimal solution at the $t$-th iteration;
- (5)
Repeat steps 2 and 3 until the search stops when the location of the global optimal solution does not change in Y consecutive iterations.
- (6)
When the algorithm terminates, the optimal solutions in all grids are compared, and the solution with the highest fitness value is selected as the global optimal solution. During the search, the fruit fly position is updated by adding a random perturbation within the current grid to the previous position:

$$X_i^{t+1} = X_i^{t} + \mathrm{rand}(a)$$

where $X_i^{t}$ and $X_i^{t+1}$ denote the coordinates of the $i$-th fruit fly at the $t$-th and $(t+1)$-st iterations, respectively, and rand(a) generates a random number.
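The following Python sketch follows the steps above in spirit: global random exploration whose span shrinks over the iterations to refine high-fitness regions, with an average-movement stopping rule. The function and parameter names (g_max, gamma, patience) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def ifoa_search(fitness, bounds, n_flies=30, g_max=1.0, gamma=0.9,
                max_iter=100, eps=1e-4, patience=5):
    """Adaptive-grid fruit fly search: maximize `fitness` over the box `bounds`."""
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    best = lo + (hi - lo) * np.random.rand(len(bounds))   # initial swarm center
    best_fit = fitness(best)
    grid = g_max * (hi - lo)                              # initial search span per dimension
    moves = []
    for t in range(max_iter):
        # Each fly samples a candidate inside the current grid around the best position.
        cand = np.clip(best + grid * (2 * np.random.rand(n_flies, len(bounds)) - 1), lo, hi)
        fits = np.array([fitness(c) for c in cand])
        k = int(np.argmax(fits))                          # highest "smell concentration"
        if fits[k] > best_fit:
            moves.append(float(np.linalg.norm(cand[k] - best)))
            best, best_fit = cand[k], fits[k]
        else:
            moves.append(0.0)
        grid = grid * gamma                               # finer grid around high-fitness regions
        # Stop when the best solution has barely moved over the last `patience` iterations.
        if t + 1 >= patience and np.mean(moves[-patience:]) < eps:
            break
    return best, best_fit

# Example: maximize a toy fitness (negative squared distance to the point (0.3, 0.7))
best, val = ifoa_search(lambda x: -np.sum((x - np.array([0.3, 0.7])) ** 2),
                        bounds=[(0, 1), (0, 1)])
```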
2.4. VMD-IFOA-Attention-GRU Model
To enhance the model’s predictive accuracy on complex, high-frequency, and non-linear feature data, this study combines the Variational Mode Decomposition (VMD) technique with the Improved Fruit Fly Optimization Algorithm (IFOA) and the Attention-GRU model, and proposes a new model: VMD-IFOA-Attention-GRU. The procedure is as follows:
First, VMD decomposes the original data $X(t)$ into n subsequences, each containing different frequency components, which helps extract potential periodic and dynamic change features. The decomposition expression is:

$$X(t) \approx \sum_{i=1}^{n} \mathrm{IMF}_i(t) + r(t)$$

where $\mathrm{IMF}_i(t)$ is the $i$-th subsequence obtained from the VMD decomposition, whose superposition approximately describes the trend of the original sequence $X(t)$, and $r(t)$ is the remaining trend term.
Each decomposed subsequence is separately input into the Attention-GRU model for training, and the Attention-GRU parameters, including the batch size, the number of hidden layers, and the learning rate, are optimized using the Improved Fruit Fly Optimization Algorithm (IFOA), ensuring that the model fully captures the features of each subsequence.
The final prediction results are derived by aggregating all subsequence forecasting outputs. The structure of the VMD-IFOA-Attention-GRU network is shown in
Figure 2.
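The overall procedure can be summarized in the following Python sketch, which chains the VMD call, the IFOA-style search, and per-component Attention-GRU training. Here make_windows and train_attention_gru are hypothetical helper functions standing in for the usual data windowing and model training code, and the hyperparameter search ranges are illustrative.

```python
import numpy as np

def predict_do(series, K=6):
    """Decompose, tune and train one Attention-GRU per IMF, then sum the forecasts."""
    imfs, _, _ = VMD(series, 2000, 0.0, K, 0, 1, 1e-7)       # VMD as in Section 2.2
    total = None
    for imf in imfs:                                          # one sub-model per component
        X_tr, y_tr, X_te = make_windows(imf, lookback=12)     # hypothetical windowing helper
        # Fitness for the IFOA search: negative validation RMSE of an Attention-GRU trained
        # with the candidate hyperparameters (log10 learning rate, hidden units).
        def neg_val_rmse(theta):
            return -train_attention_gru(X_tr, y_tr, lr=10 ** theta[0],
                                        hidden=int(theta[1]), return_val_rmse=True)
        best, _ = ifoa_search(neg_val_rmse, bounds=[(-4.0, -1.0), (8, 128)])
        pred = train_attention_gru(X_tr, y_tr, X_test=X_te,
                                   lr=10 ** best[0], hidden=int(best[1]))
        total = pred if total is None else total + pred       # superimpose component forecasts
    return total
```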
4. Results
4.1. Model Performance Evaluation Indexes
To comprehensively assess the predictive performance and generalization capability of the model, we employed the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and coefficient of determination (R²) as evaluation metrics. Lower values of RMSE and MAE, coupled with a higher R² value, indicate superior model accuracy:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$

where $n$ is the number of samples, $y_i$ represents the original data, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values.
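For reference, these three metrics can be computed directly in NumPy as in the short sketch below, a straightforward transcription of the formulas above rather than code from the study.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2
```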
4.2. Analysis of VMD Results
Before VMD decomposition, the number of decomposition modes k needs to be determined, and its value directly affects the accuracy of the decomposition. In this paper, the center frequency observation method is used to evaluate the decomposition under different k values: starting from k = 2, k is gradually increased and the distribution of the center frequencies is analyzed, as shown in
Table 2. When k is increased to 6, the center frequency of the last modal component remains stable at about 0.484; if k is increased further, spurious noise components are easily generated. Therefore, the optimal number of decomposition modes k in this paper is determined to be 6. The VMD decomposition results are shown in
Figure 4.
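The center-frequency observation can be automated with a small loop over candidate k values, as in the sketch below, which reuses the VMD call from Section 2.2 on the DO series (variable names are illustrative): once the highest center frequency stops changing appreciably as k grows, additional modes mainly capture noise.

```python
import numpy as np

# Screen candidate mode numbers by inspecting the converged center frequencies.
for K in range(2, 9):
    _, _, omega = VMD(do_series, 2000, 0.0, K, 0, 1, 1e-7)
    print(K, np.round(omega[-1], 3))   # last row = final center frequency of each mode
```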
The low-frequency component IMF1 reflects the overall trend of the original sequence over time; the mid-frequency components IMF2 to IMF5 contain the detailed information of the original sequence, demonstrating VMD’s capability in multi-scale feature extraction; the high-frequency component IMF6 has a higher fluctuation frequency, showing the most significant improvement in fitting effect for DO abnormal fluctuation segments. By separating multi-frequency signals through VMD, the model can focus on modeling relatively stable variation patterns within each frequency band, thereby significantly improving prediction accuracy.
4.3. Comparative Analysis of Different Models
To validate the enhanced efficacy of the proposed model, it was compared with the following baseline models, and the fitting results of each model are shown in
Table 3 and
Figure 5.
From the comparative experiment group (a), it can be observed that compared to the VMD-IFOA-LSTM model, the VMD-IFOA-Attention-GRU model shows a higher degree of alignment between the prediction curve and the measured curve. Not only is the overall trend well-aligned, but even in areas of intense short-term fluctuations, such as sudden increases or rapid decreases in DO levels, the model can still provide relatively accurate responses, demonstrating strong capability in capturing fluctuations.
In groups (b) and (c), the prediction curves of PCA-IFOA-Attention-GRU and IFOA-Attention-GRU models are relatively smooth, with poor fitting capability in peak and valley regions, easily resulting in underestimation or delayed response phenomena. In contrast, after introducing VMD, our proposed model accurately extracts the fluctuation trend information of the data, preserves key variation characteristics, and thus achieves superior fitting precision even during dramatic data fluctuations.
In group (d), the comparison among IFOA-Attention-GRU, FOA-Attention-GRU, and Attention-GRU indicates that although traditional FOA possesses certain parameter optimization capabilities, its convergence stability and result accuracy are inferior to the optimized IFOA. The improved IFOA enhances local search precision through an adaptive grid partitioning strategy, resulting in more optimal model parameter combinations, ultimately achieving significant improvements in three evaluation metrics: MAE, RMSE, and R2.
In summary, the integration of VMD for the decomposition of preprocessed dissolved oxygen data, coupled with the optimization of base model hyperparameters via an IFOA, demonstrably enhances model performance and elevates predictive accuracy.
4.4. Analysis of Ablation Experiments
In order to assess the independent and joint contributions of the VMD and the IFOA to the model performance, the following ablation experiments were designed in this study.
Table 4 shows the role of each model in the ablation experiments.
As shown in
Table 5, the individual introduction of either VMD or IFOA can improve prediction accuracy to some extent. However, the improvement becomes more significant when both VMD and IFOA modules are integrated. VMD transforms the complex and non-stationary DO time series into simpler and more stationary components, which capture the intrinsic variation characteristics of the data. These decomposed components provide a clearer signal structure for IFOA to perform optimization, enabling it to find the optimal hyperparameter combination for the Attention-GRU model. As a result, the MAE and RMSE are reduced by 47.7% and 57.5%, respectively, while the R² increases by 27.8%. These findings indicate that the combined application of VMD and IFOA significantly enhances the model’s prediction accuracy and overall performance.
In addition, according to
Figure 6, after adding only the VMD or the IFOA module, the model predictions are noticeably weaker and more data points deviate from the 45° reference line; with both VMD and IFOA introduced (VMD-IFOA-Attention-GRU), model performance improves markedly: the data points are more concentrated, their dispersion is clearly reduced, and the fit to the 45° line is substantially better.
In summary, it can be seen that the introduction of the VMD module and the IFOA module plays an important role in improving the prediction accuracy of the model, and when they are used in combination, the prediction accuracy of the model can be further improved. The VMD reduces the interference of noise on the prediction by improving the smoothness of the data, whereas the IFOA improves the model’s fitting ability through adaptive parameter optimization. The experimental results show that the joint use of VMD and IFOA not only significantly improves the prediction accuracy but also enhances the robustness and generalization ability of the model under complex data conditions.
5. Conclusions
The VMD-IFOA-Attention-GRU hybrid model proposed in this study aims to improve the accuracy of dissolved oxygen prediction in the Shanghai section of the Yangtze River. By decomposing the original dissolved oxygen data through VMD, highly characteristic and relatively stable intrinsic mode functions are obtained, which facilitates subsequent prediction models to fully explore the fluctuation trends and detailed features of data during short-term oscillation processes, significantly enhancing the accuracy of dissolved oxygen prediction.
Experiments demonstrate that this model outperforms various comparative models in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and coefficient of determination (R2), exhibiting excellent prediction accuracy, extreme value response capability, and generalization performance. Ablation experiments further indicate that VMD and IFOA modules have a significant synergistic effect in improving model performance.
The VMD-IFOA-Attention-GRU framework constructed in this study is not only applicable to dissolved oxygen prediction tasks, but its structural advantages and modeling approach can also be extended to the prediction of other important water quality indicators, showing wide adaptability, especially in complex time series data modeling tasks with strong noise disturbances and short-term non-linear mutation characteristics.
To further enhance the application scope of the model, future work could focus on deeply integrating this model with existing environmental monitoring platforms or decision support systems, exploring its application potential in multi-site collaborative prediction, heterogeneous water body adaptation, and real-time warning scenarios, thereby further improving the model’s engineering practicality and environmental governance support value.