1. Introduction
As global water pollution intensifies, water quality monitoring and prediction have become key research areas in environmental science and urban management [1]. This study focuses on the efficient simulation and prediction of the spatial and temporal distribution of dissolved oxygen (DO) concentration, a key indicator in typical urban water bodies such as bays. In semi-enclosed water bodies, DO concentration is influenced by a variety of interacting factors, including tidal exchange, freshwater runoff, water temperature, salinity stratification, and anthropogenic impacts such as domestic sewage discharge, nutrient (nitrogen and phosphorus) loading, and chemical oxygen demand (COD) emissions. These combined natural and human-induced effects result in strong spatiotemporal nonlinearity in DO variation, making accurate prediction particularly challenging [2]. Traditional water quality monitoring mainly relies on field sampling and laboratory analysis. Although the data are reliable, this approach is time-consuming, costly, and unsuitable for large-scale, high-precision monitoring. To address this issue, computational fluid dynamics (CFD) methods have been introduced to predict DO distribution and provide a detailed description of physical processes. However, CFD methods depend heavily on boundary conditions and initial parameters, are inefficient for high-resolution long-term simulations, and struggle to meet the practical need for rapid prediction and response.
In recent years, artificial intelligence technology has developed rapidly, and machine learning, an algorithmic framework capable of automatically learning patterns from data and making predictions, has been widely applied in environmental science, weather forecasting, hydrological simulation, and other fields [3,4,5]. Among these, deep learning, an important branch of machine learning, has demonstrated outstanding performance in time-series prediction and spatiotemporal feature extraction tasks due to its multi-layer neural network structure and powerful feature extraction capabilities. Deep learning offers a potential technological pathway for the efficient simulation of water quality parameters and is expected to break through the bottleneck of traditional methods in terms of efficiency and accuracy, enabling efficient simulation and prediction of DO concentration distribution [6].
Traditional experimental methods have accumulated significant achievements in DO concentration detection. D’Autilia et al. [7] captured short-term oscillation patterns of DO using local maximum time distance analysis. Dubuc et al. [8] and Hishe et al. [9] used multi-parameter probes to measure dissolved oxygen concentrations in urbanized wetlands and rivers, respectively. Beadle [10] improved the Winkler method to estimate oxygen content in polluted waters effectively. Wilkin et al. [11] compared colorimetric, electrode, and modified Winkler titration methods and confirmed the latter’s high-precision advantage under strict data requirements. In addition, Wittkampf et al. [12] developed silicon thin-film sensors, Li et al. [13] introduced luminescent oxygen quenching sensors, and Hydes et al. [14] utilized Optode sensors, all of which demonstrated good stability and linear response characteristics. These methods are accurate and have a solid theoretical foundation; however, they rely on point measurements, making it difficult to comprehensively reflect the spatiotemporal distribution characteristics of complex water bodies. Moreover, high-precision instruments are complex to operate and costly.
Numerical simulation methods expand the spatiotemporal dimension of DO dynamics by coupling environmental factors with physical equations. Abbaspour et al. [15] simulated watershed hydrology and material transport using the SWAT program. Almeida et al. [16] conducted a 19-year long-term water quality simulation of a shallow eutrophic lagoon using the SWAT and CE-QUAL-W2 models. Hull et al. [17] and Antonopoulos et al. [18] used continuous models and one-dimensional layered models to analyze the effects of variables such as water temperature and radiation on DO. Fang et al. [19] used the minlake96 model, and Stefan et al. [20] used vertical transport equations to quantify the correlation between climate and water quality. Carlsson et al. [21] and Chapelle et al. [22] used ecological models to reveal seasonal variations in oxygen concentration and nitrogen–oxygen flux mechanisms. Mandal et al. [23] and Park et al. [24] modeled the diurnal effects of wind speed, light, and aquatic plants using differential equations and the MACRIV model. Hoque et al. [25] proposed the COPSTZ model, focusing on the impact of reduced dissolved oxygen on plankton. In addition, Curbani et al. [26] used CFD to evaluate the dissolved oxygen balance and control processes in the Vitória Island estuarine system in Brazil, and Piehl et al. [27] employed the MOM-ERGOM model to investigate the spatiotemporal variations in seasonal hypoxia and assess oxygen indicators in the western Baltic Sea. Numerical simulations can characterize multi-factor coupling mechanisms, but they depend on precise parameterization and high computational resources and have limited ability to express complex nonlinear relationships.
Machine learning methods break through the limitations of traditional physical frameworks through data-driven modeling. Kisi et al. [28] showed that Bayesian model averaging (BMA) outperforms traditional statistical models in DO prediction. Ahmed [29] and Zare Abyaneh [30] verified the strong fitting ability of ANN for biochemical indicators. Ramaraj and Sivakumar [31] compared the performance of ANFIS, LSTM, and NAR neural networks in water quality prediction. Elkiran et al. [32] and Ay et al. [33] confirmed the robustness of the ANFIS and RBNN models in multi-step predictions, while Faruk [34] improved time-series prediction accuracy using an ARIMA-neural network hybrid framework. However, existing research mainly focuses on single models, which have certain limitations in capturing the spatiotemporal dynamics of DO under tidal influences.
Based on the above analysis, this paper proposes a hybrid deep learning model combining Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) for predicting the spatiotemporal distribution of DO concentration in Shenzhen Bay. Compared with single models used in existing studies, the LSTM-GRU Hybrid Model demonstrates distinct advantages both theoretically and practically. From a theoretical perspective, the model leverages the strengths of LSTM in modeling long-term dependencies in time series and the computational efficiency of GRU, thereby overcoming the traditional trade-off between prediction accuracy and computational cost. From a practical perspective, comparative experiments with LSTM, Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Recurrent Neural Network (RNN) models confirm its applicability under complex hydrological conditions, offering a feasible solution to meet the dual requirements of real-time performance and accuracy in water quality prediction for bay areas. This study not only deepens the understanding of deep learning mechanisms in predicting complex water bodies but also provides an efficient, intelligent, data-driven predictive tool for coastal ecological environment management.
This study uses Shenzhen Bay as a typical research area. As an important nearshore area of the Pearl River Estuary, Shenzhen Bay is influenced by urban development, pollution pressure, and marine ecological changes. Its water quality directly affects the health of the ecosystem and regional sustainable development. Additionally, the bay’s strong tidal dynamics and frequent human interference present complex hydrodynamic structures, making it a typical and challenging testing ground for deep learning algorithms. This study not only provides a technical solution for ecological early warning in coastal areas but also helps validate the ability of deep learning models to handle complex environmental system data, contributing to the theoretical refinement and methodological innovation of deep learning in environmental science.
2. Methodology
2.1. Overall Research Approach
This study constructs an LSTM-GRU Hybrid Model to predict the spatiotemporal distribution of DO concentration in Shenzhen Bay. The overall research framework is shown in Figure 1. First, based on the actual conditions of the study area, data preparation is conducted, including the collection of the necessary hydrodynamic data. The model is set up by selecting appropriate physical parameters, and a hydrodynamic model is established based on Delft3D FM (Deltares, Delft, The Netherlands). The model is then validated to ensure the accuracy and reliability of the hydrodynamic simulation results. Building upon the hydrodynamic model, a water quality model framework is further developed, with dissolved oxygen as the primary water quality factor for simulation, followed by model validation. Based on the numerical simulation results from the hydrodynamic and water quality models, the LSTM-GRU Hybrid Model is constructed. The dissolved oxygen concentration data from four key boundary points in the research area over the past six days are used as input (the statistical characteristics of the input and target variables are shown in Table 1). A total of 793,520 data samples are used, randomly split by time index into training, validation, and test sets at a ratio of 6:2:2. The trained model is then applied to predict the spatial distribution of DO concentration across the entire study area. Through model optimization, including hyperparameter tuning, the final model’s performance is evaluated to ensure that the constructed model has good prediction accuracy and stability.
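The windowing and 6:2:2 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `make_windows` and `split_indices`, the one-record-per-day toy data, the alignment of each target field with the last record of its window, and the fixed random seed are all assumptions.

```python
import random

def make_windows(boundary, field, window):
    """Pair each run of `window` consecutive boundary records with the DO
    field at the time of the last record in that run (assumed alignment)."""
    inputs, targets = [], []
    for end in range(window, len(boundary) + 1):
        inputs.append(boundary[end - window:end])
        targets.append(field[end - 1])
    return inputs, targets

def split_indices(n_samples, parts=(6, 2, 2), seed=0):
    """Shuffle the sample indices and cut them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(parts)
    n_train = n_samples * parts[0] // total   # integer arithmetic avoids
    n_val = n_samples * parts[1] // total     # floating-point rounding
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

# Toy data (made-up values): 20 records, 4 boundary points, 3 field cells.
boundary = [[7.0 + 0.1 * t + 0.01 * p for p in range(4)] for t in range(20)]
field = [[6.5 + 0.1 * t + 0.02 * c for c in range(3)] for t in range(20)]

X, y = make_windows(boundary, field, window=6)
train_idx, val_idx, test_idx = split_indices(len(X))
print(len(X), len(train_idx), len(val_idx), len(test_idx))
```

Because the indices are shuffled before splitting, samples from neighbouring time steps can land in different subsets; a chronological split would be the alternative if leakage between overlapping windows is a concern.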
2.2. Study Area and Data Sources
2.2.1. Study Area
Shenzhen Bay is located in the southern part of Guangdong Province, China, between the city of Shenzhen and the Hong Kong Special Administrative Region. It is a typical semi-enclosed water body and is part of the Pearl River Estuary water system. As shown in Figure 2, Shenzhen Bay stretches east to west with a narrow north–south width. It is influenced by the runoff from the Pearl River Estuary, tidal dynamics, land-based runoff inputs, and monsoon climate. The hydrodynamic conditions of Shenzhen Bay are complex, with relatively limited water exchange capacity, making the water quality environment sensitive to changes. In recent years, with the acceleration of urbanization in the region, the population density around Shenzhen Bay has increased, and industrial activities have become more frequent. This has led to an increase in land-based pollutant inputs, resulting in noticeable changes in the spatiotemporal distribution of dissolved oxygen concentration. The water quality situation urgently requires monitoring and improvement. Shenzhen Bay not only has important ecological functions but also serves as a key area for socio-economic activities between Guangdong and Hong Kong. Therefore, in-depth research on the variation patterns of dissolved oxygen concentration in Shenzhen Bay is crucial for improving the management of the bay’s water environment and safeguarding the health of its ecosystem.
2.2.2. Data Sources
The meteorological data and water quality data used in this study were obtained from monitoring stations for the period from October 2021 to June 2022. The daily temperature data and daily relative humidity data were sourced from the Shenzhen Meteorological Bureau’s automatic station monitoring system, while the daily wind speed and direction data, daily cloud cover data, and daily average atmospheric pressure data were provided by the Hong Kong Observatory. Solar radiation data were collected from the Hong Kong Kong-King’s Bay Meteorological Station. Water quality monitoring data were obtained from the Shenzhen Coastal Waters Special Monitoring Table, with monthly monitoring data from nine water quality monitoring points (LH016 to LH024) between October 2021 and June 2022. The locations of these monitoring points are shown in Figure 2. The underwater topography data were sourced from previous studies [35]. The five main river mouths selected for this study are Shenzhen River, Dasha River, Fengtang River, Houhai River, and Xiaosha River. River flow was based on multi-year average runoff data; for rivers without multi-year average runoff data, runoff was estimated based on river width, as shown in Table 2.
2.3. Physical Representation
The governing equations of the Delft3D hydrodynamic and water quality models in the vertical sigma coordinate system and horizontal curvilinear coordinate system are as follows:
(1) Continuity equation
In the equation, u and v represent the flow velocities in the ξ and η directions, respectively (unit: m·s−1); Q denotes the flow rate change per unit area due to drainage or precipitation (unit: m·s−1); d is the depth below the reference surface (unit: m), ζ represents the free surface elevation above the reference plane (unit: m), and d + ζ is the total water depth (unit: m); t denotes time (unit: s).
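For reference, the depth-averaged continuity equation in orthogonal curvilinear coordinates is commonly written in the form below. This is a sketch based on the variables defined above; the metric coefficients $\sqrt{G_{\xi\xi}}$ and $\sqrt{G_{\eta\eta}}$ (the coordinate transformation factors in the ξ and η directions) are not defined in the text and are assumed here, and the right-hand side is written as $Q$ alone so that its unit matches the stated m·s−1.

```latex
% Assumed standard form; G_{\xi\xi} and G_{\eta\eta} are not defined in the source text.
\begin{equation}
\frac{\partial \zeta}{\partial t}
+ \frac{1}{\sqrt{G_{\xi\xi}}\sqrt{G_{\eta\eta}}}
  \frac{\partial \left[ (d+\zeta)\, u \sqrt{G_{\eta\eta}} \right]}{\partial \xi}
+ \frac{1}{\sqrt{G_{\xi\xi}}\sqrt{G_{\eta\eta}}}
  \frac{\partial \left[ (d+\zeta)\, v \sqrt{G_{\xi\xi}} \right]}{\partial \eta}
= Q
\end{equation}
```

On a uniform Cartesian grid both metric coefficients equal 1, and the expression reduces to the familiar balance between the rate of change of the free surface and the divergence of the depth-integrated flux.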
(2) Momentum equation
The momentum equation in the ξ direction is:
The momentum equation in the η direction is:
In the equations, w represents the flow velocity in the vertical direction (unit: m·s−1); σ is the proportional vertical coordinate; ρ0 is the density of water (unit: kg·m−3); Pξ and Pη represent the hydrostatic pressure gradients in the ξ and η directions, respectively (unit: kg·m−2·s−2); Fξ and Fη represent the effects of horizontal Reynolds stresses causing imbalance in the ξ and η directions, respectively (unit: m·s−2); Mξ and Mη represent the effects of external factors on momentum in the ξ and η directions, respectively (unit: m·s−2); νV is the vertical eddy viscosity coefficient (unit: m2·s−1).
(3) Convection-diffusion-reaction equation
In the equation, C is the concentration (unit: gO2·m−3), Dx, Dy, and Dz are the diffusion coefficients in the x, y, and z directions, respectively (unit: m2·s−1); S is the source (unit: gO2·m−3·s−1), and fR is the reaction term (unit: gO2·m−3·s−1).
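With the symbols defined above, a convection-diffusion-reaction equation of this kind is usually written as shown below. This is a sketch of the standard Cartesian form, assuming that u, v, and w here denote the velocity components in the x, y, and z directions; every term has the unit gO2·m−3·s−1, consistent with the units given for S and fR.

```latex
% Assumed standard Cartesian form; u, v, w are taken as the x, y, z velocity components.
\begin{equation}
\frac{\partial C}{\partial t}
+ u \frac{\partial C}{\partial x}
+ v \frac{\partial C}{\partial y}
+ w \frac{\partial C}{\partial z}
= \frac{\partial}{\partial x}\left( D_x \frac{\partial C}{\partial x} \right)
+ \frac{\partial}{\partial y}\left( D_y \frac{\partial C}{\partial y} \right)
+ \frac{\partial}{\partial z}\left( D_z \frac{\partial C}{\partial z} \right)
+ S + f_R
\end{equation}
```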
2.4. Numerical Experiments
This study uses a structured grid with an average grid size of approximately 150 m. After orthogonality testing, the cosine values of the node angles are all less than 0.02, meeting the model’s accuracy requirements. The vertical dimension uses a sigma coordinate system divided into 10 layers, and the time step is set to 30 s to balance model stability and computational efficiency. The temperature-salinity module selects the Composite model, which comprehensively considers the horizontal temperature distribution, vertical water-atmosphere interface transfer, and solar radiation absorption rate. The initial conditions are set as a cold start, with a water level of 0 m, salinity of 31 ppt, and temperature of 27 °C. The model boundaries include the entrance of Shenzhen Bay and five main river mouths. The boundary water level at the bay entrance is derived from tidal harmonic analysis based on historical data, while salinity and water temperature are taken from the data of the nearshore monitoring point LH016. For the river mouth boundary conditions, the flow is calculated based on the runoff data (see Table 2), and water quality data are organized based on river water quality evaluation data. Regarding model parameters, the horizontal eddy viscosity and diffusion coefficient are both 1 m²·s⁻¹, and the vertical eddy viscosity and diffusion coefficient are both 5 × 10⁻⁵ m²·s⁻¹. The gravitational acceleration is set to 9.81 m·s⁻², and the water density is 1000 kg·m⁻³. The water transparency is represented by a Secchi depth of 2 m, and the meteorological conditions are based on daily meteorological data from October 2021 to June 2022. The water quality model selects dissolved oxygen as the simulated substance, coupled with hydrodynamic simulation results. The initial conditions are obtained through spatial interpolation of the monitoring data from LH016 to LH024, and the dissolved oxygen boundary conditions at the bay entrance and river mouths are set according to water quality observation data.
2.5. LSTM-GRU Hybrid Model
This study develops the LSTM-GRU Hybrid Model to predict the distribution of dissolved oxygen concentration in the Shenzhen Bay area. The LSTM-GRU Hybrid Model combines the advantages of LSTM networks and GRU networks to enhance the spatiotemporal sequence modeling capability and prediction accuracy. The model architecture is shown in Figure 3, and the hyperparameter settings are provided in Table 3. The model achieves temporal feature extraction through a three-layer cascading structure of LSTM(200)-GRU(150)-LSTM(100). The first layer, an LSTM with 200 neurons, retains the full time-step output to capture long-term dependencies. As the core feature extraction layer, its 200 neurons provide sufficient parameter capacity, preventing underfitting caused by an insufficient number of units and ensuring the capture of subtle temporal variations. The intermediate GRU layer with 150 units preserves the computational efficiency of GRU while forming a parameter-decreasing structure with the preceding LSTM, thereby reducing feature redundancy. The final layer, an LSTM with 100 neurons, compresses the time dimension and outputs the features of the final time step. Serving as the feature integration layer, it fuses and refines the features extracted by the previous two layers, with 100 neurons enabling a focus on the most critical temporal representations. To prevent overfitting, a Dropout layer (rate 0.5) and a Batch Normalization layer are inserted after each recurrent layer, enhancing the model’s generalization through random neuron deactivation and feature distribution standardization. Training uses the RMSprop optimizer combined with an Early Stopping strategy, minimizing the mean squared error loss for at most 800 epochs. By combining hybrid networks, regularization constraints, and adaptive training mechanisms, the model efficiently captures complex time series patterns.
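The parameter-decreasing structure of the LSTM(200)-GRU(150)-LSTM(100) cascade can be checked with a short calculation. The sketch below uses the textbook parameter counts for LSTM and GRU cells; the helper names are hypothetical, and the input width of 4 (one DO value per boundary point at each time step) is an assumption, since the exact input shape is not stated here.

```python
def lstm_params(units, input_dim):
    # Four gate blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and one bias vector.
    return 4 * (units * (input_dim + units) + units)

def gru_params(units, input_dim):
    # Three gate blocks (update, reset, candidate), classic single-bias form.
    return 3 * (units * (input_dim + units) + units)

N_FEATURES = 4  # assumption: one DO value per boundary point per time step

layer_params = {
    "LSTM(200)": lstm_params(200, N_FEATURES),
    "GRU(150)": gru_params(150, 200),   # fed by the 200 LSTM outputs
    "LSTM(100)": lstm_params(100, 150), # fed by the 150 GRU outputs
}
for name, count in layer_params.items():
    print(f"{name}: {count:,} trainable parameters")
```

Under these assumptions the counts are 164,000, 157,950, and 100,400, so each recurrent layer is smaller than the one before it. Implementations that keep separate input and recurrent bias vectors in the GRU (for example, Keras with `reset_after=True`) add 3 × 150 further parameters to the middle layer, which does not change the ordering. Dropout adds no trainable parameters, and Batch Normalization adds only a small per-feature set.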
2.6. Reference Methods
To ensure the fairness of the comparative experiments, all deep learning baseline models were trained with consistent hyperparameter settings: the LeakyReLU activation function was adopted, mean squared error (MSE) was used as the loss function, the Adam optimizer was employed for parameter updates, the maximum number of training epochs was set to 800, and model performance was monitored on the validation set to prevent overfitting. The only differences among the models lie in their network architectures, which are described in detail below.
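Monitoring the validation loss under a cap on the number of epochs is typically implemented as patience-based early stopping. The sketch below shows that logic on a list of per-epoch validation losses; the function name, the patience of 3 epochs, and the loss values are illustrative assumptions, not settings reported in this study.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs have passed without
    the validation loss improving on the best value by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # The loss kept improving often enough: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch

# Made-up validation MSE history: improvement stalls after epoch 2.
history = [0.90, 0.50, 0.30, 0.31, 0.32, 0.33, 0.20]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

In this toy history, training halts at epoch 5 and the weights from epoch 2 would be restored, so the later improvement at epoch 6 is never seen; a larger patience trades extra training time for a lower risk of stopping too soon.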
2.6.1. LSTM Model
This paper uses the standard Long Short-Term Memory (LSTM) model as a reference algorithm to predict the distribution of dissolved oxygen concentrations in the Shenzhen Bay area. The network architecture adopts a two-layer LSTM stacked structure. The first layer of LSTM is configured with 200 neurons and a tanh activation function, retaining the time-step sequence output to facilitate inter-layer information transfer, followed by a Dropout regularization technique to prevent overfitting. The second layer of LSTM is reduced to 150 neurons, outputting the feature vector of the final time step, achieving feature space compression through a decrease in the number of neurons at each layer. The output layer uses a Dense fully connected layer to match the dimensions of the target variable.
2.6.2. MLP Model
The Multi-Layer Perceptron (MLP) model is a typical feedforward neural network composed of multiple layers of neurons and is a commonly used deep learning model. In this study, the MLP model is introduced as a reference algorithm. The network architecture adopts a two-layer fully connected structure, with the first layer configured with 200 neurons and using the ReLU activation function to achieve nonlinear high-order feature mapping. The second layer reduces the number of neurons to 150, progressively compressing the feature dimensions. Dropout regularization with a high rate of 80% is applied after each layer to force neurons to learn independently and suppress the risk of overfitting. The output layer consists of a Dense fully connected layer matching the target variable dimensions.
2.6.3. KNN Model
K-Nearest Neighbors (KNN) regression is an instance-based learning method that predicts outcomes by calculating the distance between the input sample and the samples in the training set, selecting the nearest neighbors, and then averaging their outputs. In this study, the KNN regression model makes predictions based on local similarity in the feature space. Given a test sample, the model searches for the 5 samples in the training set that are closest in Euclidean distance and uses the arithmetic mean of the target values of these neighbors as the predicted output. The input feature dimensions are consistent with the standardized training set, and the input time series are flattened from their original sequential structure into static feature vectors. The model is initialized using KNeighborsRegressor(n_neighbors = 5), which by default uses Euclidean distance as the similarity measure in the feature space. During training, the standardized training data are fitted to the KNN regression model, and during prediction, the dissolved oxygen concentration field is generated by averaging the targets of the nearest neighbors.
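The prediction rule described above (five nearest neighbours by Euclidean distance, arithmetic mean of their targets) can be sketched without any library. This is an illustrative re-implementation with made-up data and a scalar target for brevity, not the scikit-learn code used in the study; the function name `knn_predict` is hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training samples closest to `query`."""
    # Pair each distance with its sample index so sorting keeps track of both.
    distances = sorted(
        (math.dist(x, query), i) for i, x in enumerate(train_X)
    )
    nearest = [i for _, i in distances[:k]]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy training set: flattened feature vectors and scalar DO targets.
train_X = [[float(i), float(i)] for i in range(10)]
train_y = [float(i) for i in range(10)]

prediction = knn_predict(train_X, train_y, query=[4.0, 4.0], k=5)
print(prediction)
```

For the query at (4, 4), the five nearest samples are those with indices 2 to 6, so the prediction is the mean of their targets. Because the output is a plain average, the model cannot extrapolate beyond the range of target values seen in training.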
2.6.4. RNN Model
The Recurrent Neural Network (RNN) model can handle the temporal dependencies in sequential data through the hidden state propagation mechanism. In this study, the RNN model consists of two layers of SimpleRNN units. The first layer contains 200 neurons, using tanh as the activation function to preserve the full temporal sequence features, followed by a Dropout layer to prevent overfitting. The second layer of the SimpleRNN unit contains 150 neurons, outputting the final timestep features, followed by a Dropout layer. The output layer is a fully connected structure, with the number of nodes matching the target output dimension.
2.7. Performance Metrics
To evaluate the performance of the predictive models, five commonly used statistical indicators were applied: the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), standard deviation (SD), and bias (Bias). Their calculation formulas are as follows:
where yᵢ, ŷᵢ, and ȳ denote the observed, predicted, and mean observed values, respectively, and n is the number of samples.
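The five indicators can be computed as in the sketch below, which uses the usual textbook definitions. Two conventions are assumptions here, since they vary between papers: SD is taken as the population standard deviation of the predicted values, and Bias as the mean of predicted minus observed. The function name and the sample values are illustrative.

```python
import math

def evaluate(observed, predicted):
    """Return R2, MAE, RMSE, SD, and Bias for paired observations/predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        # Assumed convention: spread of the predictions around their own mean.
        "SD": math.sqrt(sum((p - mean_pred) ** 2 for p in predicted) / n),
        # Assumed convention: positive Bias means over-prediction on average.
        "Bias": sum(errors) / n,
    }

# Made-up DO values in gO2/m3.
obs = [6.0, 7.0, 8.0, 9.0]
pred = [6.5, 7.0, 8.0, 8.5]
metrics = evaluate(obs, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the two errors of opposite sign cancel, giving a Bias of zero even though MAE and RMSE are non-zero, which is why Bias is reported alongside the magnitude-based metrics rather than in place of them.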
4. Discussion
This study proposes an LSTM-GRU Hybrid Model to predict the distribution of DO concentration in semi-enclosed water bodies. The model integrates the advantages of LSTM in modeling long-term temporal dependencies with the computational efficiency and convergence speed of the GRU structure. It not only captures long-term dependencies in time series data but also effectively handles noise and nonlinear features in the data, significantly improving prediction accuracy and stability. This allows for efficient and accurate prediction of DO concentration distribution in the study area. The model not only improves the spatial and temporal resolution of simulations but also provides new ideas and methods for modeling dynamic water quality changes in complex coastal environments.
Compared to traditional experimental measurements and CFD numerical simulations, machine learning methods offer significant advantages in this study. Traditional in situ monitoring relies on discrete point sampling, which is limited by the density of device deployment and measurement frequency, making it difficult to achieve high-frequency observation over large-scale spatial regions and long time periods. CFD methods can simulate hydrodynamic-ecological coupling processes, but their computational efficiency is constrained by grid resolution, parameterization schemes, and boundary condition uncertainties. In complex water bodies like Shenzhen Bay, which are affected by multiple factors such as tides, runoff, and pollution discharge, CFD simulations can take hours to complete. In contrast, the LSTM-GRU Hybrid Model, through a data-driven approach, learns the spatiotemporal correlations in the data, enabling the analysis of nonlinear interactions without explicitly constructing physical equations. This reduces the prediction time to seconds while maintaining prediction accuracy, improving computational efficiency by one order of magnitude and demonstrating its strong potential to replace and supplement traditional methods.
Compared to other machine learning models, such as the KNN, MLP, RNN, and LSTM models, the LSTM-GRU Hybrid Model shows significant advantages in the training, validation, and testing stages, as shown in Figure 11. The error metrics of each algorithm are presented in Table 4. On all three datasets, the LSTM-GRU Hybrid Model’s standard deviation (SD) is closer to the reference values, its R² is close to 0.99, and its RMSE is below 0.09 gO₂/m³, indicating superior fitting ability and prediction accuracy for dissolved oxygen concentration. In comparison, the performance of the KNN, MLP, RNN, and LSTM models is slightly inferior.
Figure 12 shows the kernel density estimation of each algorithm. The LSTM-GRU Hybrid Model outperforms the other models, with its predicted values closely concentrated near the ideal fit line and a relatively narrow error distribution. This further validates the superiority of the LSTM-GRU Hybrid Model in modeling dissolved oxygen concentration in semi-enclosed water bodies. This advantage may be attributed to the complementary nature of the LSTM and GRU structures. LSTM excels in handling long-term dependencies, while GRU performs better in training efficiency and convergence speed. The combination of the two alleviates overfitting and training complexity while maintaining prediction accuracy. In this study, the LSTM-GRU Hybrid Model shows a convergence speed comparable to that of the other reference models but demonstrates stronger expressive power in feature extraction and temporal modeling, better adapting to the complex hydrodynamics and biogeochemical processes of Shenzhen Bay and thus improving overall prediction performance.
In addition, this study compares its results with existing machine learning research. For example, Kim et al. [36] employed an LSTM model to predict the British Columbia Water Quality Index (BCWQI) in Lake Päijänne, Finland, and the results showed that the model outperformed ANN, SVR, and RF, achieving an R² of 0.91 and an RMSE of 0.11. Similarly, this study verifies the advantages of deep learning in capturing the nonlinear dynamic characteristics of water bodies. Compared with the standalone LSTM model, however, the proposed LSTM-GRU Hybrid Model further improves prediction accuracy and stability, achieving an R² close to 0.99 and an RMSE below 0.09 gO₂/m³ in DO prediction for semi-enclosed coastal areas. This demonstrates its stronger adaptability and robustness in highly nonlinear and noisy marine environments.
This study achieved good prediction results in simulating dissolved oxygen concentration in the Shenzhen Bay area and validated the effectiveness of the LSTM-GRU Hybrid Model. However, the model primarily relies on data trained under normal conditions and lacks specialized modeling and response mechanisms for abnormal events (such as heavy rainfall, red tides, and other extreme situations). In addition, strong physical forcing factors such as tidal height, current velocity, wind speed, air temperature, salinity, radiation, and river inflow were not explicitly incorporated; instead, the DO concentrations at four key boundary points over the past six days were selected as the primary input. The rationale for this choice is as follows: first, boundary DO concentrations largely integrate the combined effects of water exchange, meteorological conditions, and biological processes, thereby indirectly reflecting the influence of multiple physical and chemical drivers; second, Shenzhen Bay is a semi-enclosed water body with relatively stable hydrodynamic conditions, and the DO time series at boundary points is highly representative of the spatiotemporal distribution throughout the bay; and third, to balance model complexity and generalization capability, simplified input features were adopted to validate the feasibility and effectiveness of prediction under limited driving information. In addition, this study conducted a feature importance analysis, and the contribution rates of the boundary points were 31.6%, 38.6%, 16.3%, and 13.5%, respectively, revealing the differences in the model’s sensitivity to DO concentrations at different boundary points. 
Future research will further incorporate tidal, salinity, and meteorological data to construct a multimodal input framework, enhance the collection and modeling of extreme event data, and allow the model to maintain good adaptability and robustness even under sudden environmental changes, further promoting the application of intelligent prediction models in water environment simulation and management.