1. Introduction
Water quality deterioration has emerged as a critical environmental challenge in rapidly developing regions, exemplified by China’s Pearl River Delta [1,2]. The Xijiang River Basin—contributing over three-quarters of the Pearl River system’s watershed area and nearly two-thirds of its runoff—faces intensifying pressure from industrial effluent, municipal sewage, and agricultural runoff [3,4]. Recent studies document an 18.7% increase in nutrient pollution since 2020 [5], aligning with national priorities under China’s Ecological Environment Monitoring Plan, which explicitly prioritizes predictive capabilities as the cornerstone of modern water management systems [6,7]. This imperative creates urgent demand for advanced forecasting tools capable of addressing complex riverine dynamics [8].
Deep learning methodologies have fundamentally transformed water quality prediction, yet significant limitations persist across contemporary approaches [9,10]. Transformer architectures effectively capture long-range dependencies but frequently disregard localized feature interactions [11], while graph neural networks require predefined adjacency matrices that constrain flexibility in dynamic watersheds [12]. Attention-enhanced recurrent models offer interpretability gains but suffer from computational inefficiency [13], and metaheuristic-optimized hybrids demonstrate promising accuracy improvements yet remain confined to small-scale basins with homogeneous monitoring data [14,15]. Critically, current models consistently overlook demonstrated feature correlations, such as the pH-DO relationship identified in our analysis, while hyperparameter optimization strategies scale poorly for multi-station networks [16].
This study bridges these gaps through a novel integration of convolutional feature extraction, gated recurrent temporal modeling, and Northern Goshawk Optimization within a unified framework. Our approach establishes the first basin-scale benchmark validated across eleven heterogeneous monitoring stations spanning the Xijiang River, simultaneously quantifying anthropogenic pollution gradients that directly link urban expansion patterns to water quality degradation. By addressing fundamental limitations in feature interaction modeling and optimization efficiency, the proposed architecture provides a robust foundation for operational early warning systems and strategic resource management.
2. Data and Methods
The dataset used in this study consisted of both meteorological and water quality data. Correlation analysis of the two datasets confirmed a relationship between air temperature and dissolved oxygen, so meteorological influences were incorporated into the water quality prediction process. The overall research workflow is shown in Figure 1.
2.1. Water Quality Data
The Xijiang River is the largest tributary of the Pearl River Basin, with a total length of 2214 km and a basin area of approximately 353,100 km². This study focused on 11 monitoring stations within Guangdong Province (spatial distribution shown in Figure 2). Water quality data, including pH, DO, CODMn, NH3-N, TP, and TN, were obtained from the real-time monitoring system of the Pearl River Basin, published by the Ministry of Ecology and Environment, covering the period from January 2022 to October 2023.
2.2. Meteorological Data
Corresponding meteorological data, including precipitation, air temperature, and humidity, were acquired from the China Meteorological Administration (CMA) for the same period. These data were sampled at one-hour intervals.
2.3. Data Processing
A rigorous data preprocessing pipeline was implemented to ensure data quality and suitability for model training. This pipeline included four main steps:
Time Series Alignment: The water quality data (4 h interval) and meteorological data (1 h interval) were merged. Missing timestamps in both series were first filled to create complete, continuous time series. Subsequently, meteorological data points corresponding to the water quality sampling times were extracted to form a unified dataset.
Missing Value Imputation: To handle data gaps arising from equipment malfunction or transmission errors, missing values were imputed using the cubic spline interpolation method, which preserves the smoothness and continuity of the time series.
Outlier Detection and Treatment: The 3σ criterion was employed to identify statistical outliers. Given the small number of outliers detected, they were removed, and the resulting gaps were filled using cubic spline interpolation.
Data Normalization: To prevent features with larger magnitudes from dominating the model training process and to improve training efficiency, all data were scaled to a range of [0, 1] using Min-Max Normalization.
3. Methodology
3.1. Data Preprocessing Pipeline
To ensure the quality and suitability of the data for model training, a rigorous preprocessing pipeline was implemented. This process began with time series alignment to synchronize the heterogeneous sampling intervals of the water quality data (4 h interval) and meteorological data (1 h interval).
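As a brief illustration of this alignment step, the following Python sketch (pandas-based; the frame and variable names are hypothetical, and both series are assumed to share on-the-hour timestamps) rebuilds a complete 4 h grid and attaches the hourly meteorological readings at those timestamps:

```python
import pandas as pd

def align_series(wq: pd.DataFrame, met: pd.DataFrame) -> pd.DataFrame:
    """Build a complete 4 h grid and attach meteorological readings at those times.

    wq:  water quality indicators sampled every 4 h, indexed by timestamp
    met: meteorological indicators sampled every 1 h, indexed by timestamp
    """
    grid = pd.date_range(wq.index.min(), wq.index.max(), freq="4h")  # complete 4 h timestamps
    wq_full = wq.reindex(grid)       # missing water quality rows become NaN (filled later)
    met_full = met.reindex(grid)     # keep only the met values at the 4 h sampling times
    return wq_full.join(met_full)

# Hypothetical usage for one monitoring station
# merged = align_series(water_quality_df, meteorology_df)
```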
Following alignment, missing values were reconstructed using cubic spline interpolation [17]. This method constructs a piecewise cubic polynomial $S_k(t)$ defined for each interval $[t_k, t_{k+1}]$, as shown in Equation (1):
$$S_k(t) = a_k + b_k (t - t_k) + c_k (t - t_k)^2 + d_k (t - t_k)^3, \quad t \in [t_k, t_{k+1}] \tag{1}$$
The coefficients are solved by enforcing continuity conditions for the first and second derivatives at each interior point, as specified in Equation (2):
$$S_k(t_{k+1}) = S_{k+1}(t_{k+1}), \quad S_k'(t_{k+1}) = S_{k+1}'(t_{k+1}), \quad S_k''(t_{k+1}) = S_{k+1}''(t_{k+1}) \tag{2}$$
For this study, natural boundary conditions were applied at the endpoints, as shown in Equation (3):
$$S''(t_0) = 0, \qquad S''(t_n) = 0 \tag{3}$$
This yields the final formula for calculating the interpolated value $S(t)$ for any missing timestamp $t$, presented in Equation (4):
$$S(t) = \frac{M_k (t_{k+1} - t)^3 + M_{k+1} (t - t_k)^3}{6 h_k} + \left(\frac{y_k}{h_k} - \frac{M_k h_k}{6}\right)(t_{k+1} - t) + \left(\frac{y_{k+1}}{h_k} - \frac{M_{k+1} h_k}{6}\right)(t - t_k) \tag{4}$$
where $h_k$ is the time interval $t_{k+1} - t_k$, $M_k$ is the second derivative $S''(t_k)$, and $y_k$ is the known data value at time $t_k$.
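A minimal sketch of this imputation step is shown below, assuming each indicator is held as a pandas Series on a regular 4 h grid; SciPy’s CubicSpline with natural boundary conditions corresponds to Equations (1)–(4), and the column name is illustrative:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

def impute_cubic_spline(series: pd.Series) -> pd.Series:
    """Fill NaN gaps in a regularly sampled series with a natural cubic spline."""
    t = np.arange(len(series), dtype=float)         # numeric time axis
    known = series.notna().to_numpy()               # observed points
    spline = CubicSpline(t[known], series.to_numpy()[known],
                         bc_type="natural")         # natural boundary: S'' = 0 at endpoints
    filled = series.copy()
    filled[~known] = spline(t[~known])              # evaluate the spline at missing timestamps
    return filled

# Hypothetical usage on one indicator from one station
# do_filled = impute_cubic_spline(merged["DO"])
```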
Subsequently, statistical outliers were identified using the 3σ criterion [18], which rejects any data point $x_i$ that satisfies the condition $|x_i - \mu| > 3\sigma$, where μ and σ are the mean and standard deviation of the specific indicator. These outliers were then replaced using the same cubic spline interpolation method.
Finally, to improve training efficiency [19], all features were scaled to a range of [0, 1] using Min-Max Normalization, defined in Equation (5):
$$x' = \frac{x - \min(X)}{\max(X) - \min(X)} \tag{5}$$
where $x$ is the original feature value, $x'$ is the normalized value, and $\min(X)$ and $\max(X)$ are the minimum and maximum values of the feature across the dataset.
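The outlier screening and normalization steps can likewise be expressed compactly; the sketch below is illustrative only and reuses the hypothetical impute_cubic_spline helper from the previous snippet:

```python
import pandas as pd

def remove_outliers_3sigma(series: pd.Series) -> pd.Series:
    """Mark values outside mu +/- 3*sigma as missing so they can be re-interpolated."""
    mu, sigma = series.mean(), series.std()
    return series.mask((series - mu).abs() > 3 * sigma)   # outliers become NaN

def min_max_scale(series: pd.Series) -> pd.Series:
    """Scale a feature to the [0, 1] range (Equation (5))."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

# Hypothetical usage: screen, re-interpolate, then scale one indicator
# tn_clean  = impute_cubic_spline(remove_outliers_3sigma(merged["TN"]))
# tn_scaled = min_max_scale(tn_clean)
```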
3.2. Correlation Analysis
3.2.1. Quantitative Correlation Assessment
The two-tailed Pearson Correlation Coefficient (PCC) was employed to quantify linear relationships between water quality and meteorological indicators [20]. Analyses were conducted using SPSS 26.0 with the computational formula given in Equation (6):
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{6}$$
where $x_i$ and $y_i$ are the paired measurements at time $i$, and $\bar{x}$ and $\bar{y}$ are the mean values of variables X and Y, respectively.
3.2.2. Statistical Interpretation
The statistical interpretation of the Pearson Correlation Coefficient (r) involves assessing its magnitude and direction. The value of r ranges from −1 to 1, where a positive value (r > 0) indicates a positive correlation, and a negative value (r < 0) signifies a negative correlation. The strength of the linear relationship is classified into four distinct tiers based on the magnitude of |r|. A Very Strong Correlation is indicated when |r| is 0.8 or greater; a Strong Correlation is reflected when |r| is between 0.6 and 0.8; a Moderate Correlation falls within the range of 0.4 to 0.6; and a Weak Correlation is defined by |r| values less than 0.4.
Complementing the correlation coefficient, the p-value assesses statistical significance. It is defined as the probability of obtaining the observed correlation magnitude, or one even greater, under the assumption that no true correlation exists (the null hypothesis, H0). For this study, a p-value less than 0.01 was used as the threshold to determine statistical significance, providing strong evidence to reject the null hypothesis. The p-value is computed from a t-statistic, which is calculated as shown in Equation (7):
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \tag{7}$$
where $t$ follows Student’s t-distribution, $r$ is the Pearson Correlation Coefficient, $n$ is the number of samples, and $n-2$ represents the degrees of freedom.
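For readers reproducing the analysis outside SPSS, an equivalent computation of r, the t-statistic, and the two-tailed p-value (Equations (6) and (7)) can be sketched with SciPy; the sample arrays below are purely illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical paired, time-aligned measurements for two indicators
ph = np.array([7.2, 7.4, 7.1, 7.6, 7.5, 7.3, 7.8, 7.7])
do = np.array([8.1, 8.4, 7.9, 8.8, 8.6, 8.2, 9.1, 9.0])

# Two-tailed Pearson correlation (Equation (6)) and its p-value
r, p = stats.pearsonr(ph, do)

# Equivalent t-statistic from Equation (7), with n - 2 degrees of freedom
n = len(ph)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p from Student's t

print(f"r = {r:.3f}, p = {p:.4f} (manual p = {p_manual:.4f})")
```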
3.2.3. Correlation-Driven Model Design
The insights from the correlation analysis directly informed the model’s architectural design. The integrated correlation matrix revealed critical physicochemical relationships that guided the feature processing strategy. For instance, the very strong positive correlation, or synergy, observed between pH and DO (r = 0.880) reflects carbonate buffering dynamics; this justified their joint feature processing within dedicated CNN channels to capture their co-variation. Similarly, the strong negative correlation, or anticorrelation, between temperature and DO (r = −0.644) validated known thermal solubility effects, prompting a priority weighting for this relationship within the GRU’s temporal attention mechanism.
This led to a tiered feature selection strategy based on correlation strength. Tier 1 features, such as the pH-DO pair with a correlation coefficient |r| ≥ 0.8, were processed through dedicated 3 × 3 CNN kernels. Tier 2 features, like the temperature–DO relationship where 0.6 ≤ |r| < 0.8, were assigned double attention weights in the GRU gates to emphasize their importance. Conversely, features with weak or spurious associations (|r| < 0.4), such as the precipitation–CODMn linkage (r = −0.022), were excluded from the model to reduce noise and improve focus on meaningful predictors.
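The tier-assignment rule itself is simple enough to state in code; the sketch below is an illustrative reading of the thresholds above (function and feature names are hypothetical, not the authors’ implementation):

```python
import pandas as pd

def assign_tiers(corr_with_target: pd.Series) -> pd.Series:
    """Map each candidate feature to a processing tier from its |r| with the target.

    Tier 1 (|r| >= 0.8): joint processing in dedicated CNN channels.
    Tier 2 (0.6 <= |r| < 0.8): doubled attention weight in the GRU gates.
    Dropped (|r| < 0.4): excluded as noise; intermediate values keep default handling.
    """
    def tier(r: float) -> str:
        r = abs(r)
        if r >= 0.8:
            return "tier1_cnn"
        if r >= 0.6:
            return "tier2_gru_weighted"
        if r >= 0.4:
            return "default"
        return "dropped"
    return corr_with_target.apply(tier)

# Hypothetical usage with correlations against DO reported in Table 2
# tiers = assign_tiers(pd.Series({"pH": 0.880, "air_temp": -0.644, "precip": -0.022}))
```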
3.3. Input Data Structuring via the Sliding Window Technique
To transform the raw time series data into a format suitable for supervised learning, the time-sliding window technique was implemented. This fundamental method systematically restructures sequential data into a set of input–feature (X) and target–output (y) pairs, which is a prerequisite for training predictive models like the NGO-CNN-GRU. The technique operates by moving a window of a predefined size, w, across the time series. For each position, the data within the window constitutes a single input sample X, and the data point immediately following the window is designated as the corresponding target label y.
Let $\mathbf{x}_t \in \mathbb{R}^{m}$ represent the vector of all m features (e.g., pH, DO, temperature) at a specific time step t. The transformation for any given time t can be formally expressed as creating an input matrix $X_t = [\mathbf{x}_{t-w+1}, \mathbf{x}_{t-w+2}, \ldots, \mathbf{x}_t]$ and a corresponding target output vector $y_t = \mathbf{x}_{t+1}$.
For this study, we configured a window size (w) of 12 time steps, a time step interval of 4 h, and a slide step of 1 time step. This configuration means that, for each prediction, the model utilizes a look-back period of 48 h (12 steps × 4 h/step) of historical data from all available indicators. The objective is to predict the state of all water quality indicators at the subsequent 4 h interval. By sliding this window across the entire preprocessed dataset, we generated thousands of (X, y) sample pairs, which were then used to train and evaluate the model. This comprehensive data structuring ensures that the model learns from the rich temporal patterns embedded within the recent history of the aquatic system.
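A compact sketch of this windowing step is given below, assuming the preprocessed station data are held as a NumPy array with one row per 4 h time step and one column per feature; the function name is illustrative:

```python
import numpy as np

def make_sliding_windows(data: np.ndarray, window: int = 12):
    """Turn a (T, m) multivariate series into supervised (X, y) pairs.

    X has shape (N, window, m): the last `window` steps of all indicators.
    y has shape (N, m): the indicator values at the step immediately after each window.
    """
    X, y = [], []
    for start in range(len(data) - window):
        X.append(data[start:start + window])   # 12 steps x 4 h = 48 h look-back
        y.append(data[start + window])         # target: the next 4 h interval
    return np.array(X), np.array(y)

# Hypothetical usage on the normalized station matrix
# X_all, y_all = make_sliding_windows(normalized_values, window=12)
```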
3.4. CNN Layer for Feature Extraction
To extract salient local features from the sequential input data, we first employed a Convolutional Neural Network (CNN). Originally developed by LeCun et al., the CNN architecture is particularly effective for this task [21], typically comprising convolutional layers, pooling layers, and activation functions, as depicted in the schematic in Figure 3.
3.4.1. The Convolutional Layer
The core operation of the convolutional layer is to apply a set of learnable filters (or kernels) to the input data. This process is defined by the following equation:
$$y_{i,j} = \sum_{m=1}^{M}\sum_{n=1}^{N} x_{i+m-1,\, j+n-1}\, w_{m,n} + b$$
where $y_{i,j}$ is the output feature map, $x$ is the input data, $w$ is the convolutional kernel, $M$ and $N$ are the kernel’s height and width, and $b$ is the bias term.
Given that our input is one-dimensional time series data, we opted for a smaller kernel size of 2 × 1. To determine the optimal number of kernels (filters), we conducted a two-phase experiment using the total nitrogen data from the Shimodong station, with RMSE as the evaluation metric. The results are presented in Table 1.
As shown in Table 1, the experiment was conducted in two phases. In the first phase, we tested symmetric kernel combinations and found that the (32, 32) configuration yielded the lowest initial RMSE. Building on this, the second phase explored asymmetric combinations around the value of 32. The (16, 32) kernel combination produced the overall lowest RMSE (0.109) and was therefore selected as the optimal configuration for our model.
3.4.2. Pooling Layer
Following the convolutional layers, a pooling layer is used to reduce the dimensionality of the feature maps, which helps to decrease computational load and control overfitting. We employed max pooling, which retains the maximum value from each local region of the input feature map. The operation is defined as follows:
$$p_j = \max_{i \in R_j} x_i$$
An alternative, average pooling, calculates the average value of the local region:
$$p_j = \frac{1}{|R_j|} \sum_{i \in R_j} x_i$$
where $R_j$ denotes the $j$-th pooling region and $x_i$ is an element of the input feature map. In our model, we used a max-pooling layer with a 2 × 1 pooling kernel and a stride of 1 to effectively compress the extracted features.
3.4.3. Fully Connected Layer and Activation Function
The features processed by the convolutional and pooling layers were then flattened and passed to a fully connected layer, which maps the learned features to the final output. To introduce non-linearity into the model, we utilized the Rectified Linear Unit (ReLU) as the activation function [22]. ReLU is computationally efficient and helps mitigate the vanishing gradient problem. It is defined as follows:
$$\mathrm{ReLU}(x) = \max(0, x)$$
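A minimal PyTorch sketch of the feature-extraction block described above is given below. It follows the reported configuration (16 and 32 filters with 2 × 1 kernels, a 2 × 1 max pool with stride 1, ReLU, and a flattening fully connected layer), but the module structure, output dimension, and tensor layout are assumptions for illustration rather than the authors’ exact implementation:

```python
import torch
import torch.nn as nn

class CNNFeatureExtractor(nn.Module):
    """Local feature extraction from windowed multivariate series (batch, features, window)."""
    def __init__(self, n_features: int, hidden_dim: int = 64):
        super().__init__()
        self.conv1 = nn.Conv1d(n_features, 16, kernel_size=2)  # first layer: 16 filters, 2x1 kernel
        self.conv2 = nn.Conv1d(16, 32, kernel_size=2)          # second layer: 32 filters
        self.pool = nn.MaxPool1d(kernel_size=2, stride=1)      # 2x1 max pooling, stride 1
        self.act = nn.ReLU()
        self.fc = nn.LazyLinear(hidden_dim)                    # flatten -> fully connected

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        x = self.pool(x)
        return self.fc(x.flatten(start_dim=1))

# Hypothetical usage: a batch of 8 windows, 9 indicators, 12 time steps
# feats = CNNFeatureExtractor(n_features=9)(torch.randn(8, 9, 12))
```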
3.5. GRU Layer for Temporal Modeling
The feature sequences extracted by the CNN layers were then passed to a Gated Recurrent Unit (GRU) layer, which is specifically designed to model temporal dependencies. The GRU, introduced by Cho et al. in 2014 [23], was selected for its proven ability to capture long-range dependencies in time series data. Compared to its more complex predecessor, the Long Short-Term Memory (LSTM) network [24], the GRU offers a more streamlined architecture with fewer parameters. This results in greater computational efficiency without a significant trade-off in performance, making it an ideal choice for this study.
The core of the GRU’s structure lies in its gating mechanisms, which regulate the flow of information through the network. The key computations are as follows.
First, the update gate ($z_t$) and reset gate ($r_t$) were calculated:
$$z_t = \sigma\!\left(W_z [h_{t-1}, x_t] + b_z\right)$$
$$r_t = \sigma\!\left(W_r [h_{t-1}, x_t] + b_r\right)$$
where $z_t$ and $r_t$ are the update and reset gates, $h_t$ is the new hidden state, $\tilde{h}_t$ is the candidate hidden state, $h_{t-1}$ is the hidden state from the previous time step, $x_t$ is the input at the current time step, W and b are the weight matrices and bias vectors, σ is the sigmoid function, and ⊙ represents the Hadamard product.
Next, a candidate hidden state ($\tilde{h}_t$) was computed, using the reset gate to control how much of the past information is incorporated:
$$\tilde{h}_t = \tanh\!\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right)$$
Finally, the new hidden state ($h_t$) was determined by the update gate, which balances the previous hidden state and the candidate state:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
In configuring the GRU layer, given the input sequence length of 12 time steps, a single GRU layer was determined to be sufficient for capturing the relevant temporal patterns. The number of neurons in this layer was treated as a key hyperparameter, as it directly influences the model’s capacity—a higher number allows for learning more complex patterns but also increases the risk of overfitting.
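For illustration, the gate equations above can be executed directly; the NumPy sketch below passes one 12-step window through a single GRU cell with hypothetical dimensions and random weights (in practice a framework layer such as PyTorch’s nn.GRU would be trained instead):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, b):
    """One GRU update implementing the gate equations above (concatenated-weight form)."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W["z"] @ concat + b["z"])                                   # update gate
    r = sigmoid(W["r"] @ concat + b["r"])                                   # reset gate
    h_cand = np.tanh(W["h"] @ np.concatenate([r * h_prev, x_t]) + b["h"])   # candidate state
    return (1 - z) * h_prev + z * h_cand                                    # new hidden state

# Hypothetical dimensions: 64-dim CNN feature vectors in, 96 GRU neurons
rng = np.random.default_rng(0)
dim_in, dim_h = 64, 96
W = {k: rng.normal(scale=0.1, size=(dim_h, dim_h + dim_in)) for k in "zrh"}
b = {k: np.zeros(dim_h) for k in "zrh"}

h = np.zeros(dim_h)
for x in rng.normal(size=(12, dim_in)):   # roll a 12-step window through the cell
    h = gru_step(x, h, W, b)
print(h.shape)                            # (96,) final hidden state fed to the output layer
```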
3.6. Hyperparameter Optimization via NGO
To address the inefficiencies of manual hyperparameter tuning, the Northern Goshawk Optimization (NGO) algorithm, which simulates goshawk hunting behavior, was employed [25]. The process begins with the initialization of a population matrix, shown in Equation (16):
$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{i,1} & \cdots & x_{i,d} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,d} \end{bmatrix} \tag{16}$$
and a corresponding fitness vector, shown in Equation (17):
$$F = \begin{bmatrix} F_1(X_1) \\ \vdots \\ F_i(X_i) \\ \vdots \\ F_N(X_N) \end{bmatrix} \tag{17}$$
where $X$ is the population matrix, $X_i$ is the position of the i-th individual, $N$ is the population size, $d$ is the problem dimension, and $F$ is the vector of objective function values.
The algorithm then proceeds through two main phases. The first is an exploration phase, which models prey identification and attack to perform a global search across the solution space. This phase is mathematically defined in Equations (18)–(20):
$$P_i = X_k, \quad i = 1, 2, \ldots, N, \; k \in \{1, 2, \ldots, N\}, \; k \neq i \tag{18}$$
$$x_{i,j}^{\mathrm{new},P1} = \begin{cases} x_{i,j} + r\,(p_{i,j} - I\,x_{i,j}), & F_{P_i} < F_i \\ x_{i,j} + r\,(x_{i,j} - p_{i,j}), & F_{P_i} \ge F_i \end{cases} \tag{19}$$
$$X_i = \begin{cases} X_i^{\mathrm{new},P1}, & F_i^{\mathrm{new},P1} < F_i \\ X_i, & F_i^{\mathrm{new},P1} \ge F_i \end{cases} \tag{20}$$
where $P_i$ is the position of the selected prey for the i-th individual, $r$ is a random number in [0, 1], and $I$ is a random integer of 1 or 2.
This is followed by an exploitation phase, which simulates a high-speed chase to conduct a fine-tuned local search in promising regions. This behavior is modeled by Equations (21)–(23):
$$R = 0.02\left(1 - \frac{t}{T}\right) \tag{21}$$
$$x_{i,j}^{\mathrm{new},P2} = x_{i,j} + R\,(2r - 1)\,x_{i,j} \tag{22}$$
$$X_i = \begin{cases} X_i^{\mathrm{new},P2}, & F_i^{\mathrm{new},P2} < F_i \\ X_i, & F_i^{\mathrm{new},P2} \ge F_i \end{cases} \tag{23}$$
where $t$ is the current iteration number, and $T$ is the maximum number of iterations.
The NGO algorithm was specifically configured to optimize three core hyperparameters: the number of GRU neurons (ranging from 32 to 128), the CNN kernel size (from 3 × 3 to 7 × 7), and the learning rate (from 10⁻⁴ to 0.1). The optimization process was governed by two termination conditions: the execution would halt either after reaching a maximum of 100 generations or if the solution’s fitness value showed no significant improvement for 20 consecutive iterations.
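A condensed sketch of how NGO can drive this search is given below; it is a generic minimization loop under the stated bounds and stopping rules, with a hypothetical fitness function that would train the CNN-GRU and return its validation RMSE (integer-valued dimensions such as neuron counts would be rounded inside that function):

```python
import numpy as np

def ngo_minimize(fitness, lower, upper, pop=20, max_iter=100, patience=20, seed=0):
    """Minimal Northern Goshawk Optimization sketch (minimization)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    X = lower + rng.random((pop, lower.size)) * (upper - lower)   # Eq. (16): population
    F = np.array([fitness(x) for x in X])                         # Eq. (17): fitness vector
    best, stall = F.min(), 0
    for t in range(1, max_iter + 1):
        for i in range(pop):
            # Phase 1 (exploration): move relative to a randomly selected "prey"
            k = rng.choice([j for j in range(pop) if j != i])
            r, I = rng.random(X.shape[1]), rng.integers(1, 3)      # I is 1 or 2
            step = (X[k] - I * X[i]) if F[k] < F[i] else (X[i] - X[k])
            cand = np.clip(X[i] + r * step, lower, upper)
            if (f := fitness(cand)) < F[i]:
                X[i], F[i] = cand, f
            # Phase 2 (exploitation): shrinking local search around the current position
            R = 0.02 * (1 - t / max_iter)
            cand = np.clip(X[i] + R * (2 * rng.random(X.shape[1]) - 1) * X[i], lower, upper)
            if (f := fitness(cand)) < F[i]:
                X[i], F[i] = cand, f
        stall = stall + 1 if F.min() >= best - 1e-9 else 0         # stop on stagnation
        best = min(best, F.min())
        if stall >= patience:
            break
    return X[F.argmin()], F.min()

# Hypothetical usage: bounds for (GRU neurons, kernel size, learning rate)
# best_params, best_rmse = ngo_minimize(train_and_score, [32, 3, 1e-4], [128, 7, 0.1])
```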
3.7. Model Evaluation Metrics
To provide a comprehensive and quantitative assessment of the model’s predictive performance, we employed a suite of standard statistical metrics [26].
Mean Absolute Error (MAE): This metric measures the average absolute difference between the true and predicted values and is commonly used in the prediction of continuous data. A smaller MAE indicates that the predictive ability of the model is stronger. It is calculated as shown in Equation (24):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right| \tag{24}$$
where $n$ is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
Root Mean Squared Error (RMSE): This metric is the square root of the Mean Squared Error (MSE) and is particularly sensitive to large errors. A smaller RMSE value indicates a better model fit. The formula is presented in Equation (25):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2} \tag{25}$$
where the variables carry the same meaning as in the MAE calculation.
Coefficient of Determination (R²): This metric indicates how well the model is fitted, with a value closer to 1 representing a better fit. It is defined by Equation (26):
$$R^2 = 1 - \frac{\mathrm{SS}_{\mathrm{res}}}{\mathrm{SS}_{\mathrm{tot}}} = 1 - \frac{\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2} \tag{26}$$
where $\mathrm{SS}_{\mathrm{res}}$ is the sum of squares of residuals, which represents the difference between the predicted values of the model and the actual observed values; $\mathrm{SS}_{\mathrm{tot}}$ is the total sum of squares, which represents the difference between the actual observed values and their mean; $y_i$ is the true value; $\hat{y}_i$ is the predicted value; and $\bar{y}$ is the mean of the true values.
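These three metrics can be computed together in a few lines; the sketch below assumes NumPy arrays of true and predicted values and mirrors Equations (24)–(26):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute MAE, RMSE, and R^2 as defined in Equations (24)-(26)."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R2": 1 - ss_res / ss_tot}

# Hypothetical usage on a held-out test window
# print(evaluate(y_test, model_predictions))
```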
4. Results
4.1. Correlation Analysis
The Pearson correlation analysis results for all water quality and meteorological variables are presented in a combined matrix in Table 2. The results revealed significant interrelationships among the water quality variables themselves. Notably, a very strong positive correlation was observed between pH and dissolved oxygen (DO) (r = 0.880, p < 0.01), and a strong positive correlation was found between the permanganate index (CODMn) and ammonia nitrogen (NH3-N) (r = 0.600, p < 0.01). Furthermore, the analysis confirmed the influence of meteorological conditions on water quality. A strong negative correlation was identified between DO and air temperature (r = −0.644, p < 0.01), which is consistent with the physical principle of reduced oxygen solubility in warmer water. These findings highlight the complex, interconnected nature of the aquatic chemical environment and justify the inclusion of both water quality and meteorological data in the predictive model.
4.2. Model Training and Validation Performance
To assess the fundamental performance of the proposed NGO-CNN-GRU model, we first evaluated its fitting and generalization capabilities on a representative dataset. We used the Total Nitrogen (TN) data from the Xiaodong station as a case study, with a continuous series of 500 time points for training and the subsequent 200 for testing. The model demonstrated an excellent fit to the training data and generalized well to the unseen test data (Figure 4). The model’s predictions closely tracked the observed values, capturing not only the general trends but also the more abrupt fluctuations and extreme values present in the time series. The quantitative metrics in Table 3 corroborate this visual assessment. The model achieved an exceptionally high goodness-of-fit, with an R² of 0.99483 on the training set and 0.98677 on the test set. The correspondingly low RMSE and MAE values indicate high predictive accuracy and suggest that the model is well-fitted without being overfitted to the training data.
4.3. Comparative Analysis of Model Prediction Results
To establish the superiority of our proposed approach, we conducted a comprehensive comparative analysis of the NGO-CNN-GRU model against two baseline models: a single GRU model and a non-optimized CNN-GRU model. The comparison was performed across all 11 monitoring stations for the six key water quality indicators.
The complete results are presented in a detailed matrix of grouped bar charts in Figure 5. This figure systematically compares the models across three key performance metrics: goodness-of-fit (R²) and prediction accuracy (RMSE and MAE). A visual inspection of the figure clearly and consistently demonstrates that the NGO-CNN-GRU model delivers superior predictive performance across nearly all indicators and stations, consistently achieving the highest R² values and the lowest RMSE and MAE values. This improvement was particularly striking for the NH3-N indicator, where the proposed model dramatically reduced prediction errors compared to the non-optimized models. This increase in accuracy underscores the critical role of the NGO algorithm in optimizing the model’s hyperparameters and validates the effectiveness of the proposed integrated approach for water quality forecasting.
To rigorously validate the superiority of our proposed approach, we conducted a statistical significance assessment in addition to the comparative analysis. Each model (GRU, CNN-GRU, and NGO-CNN-GRU) was independently trained and tested 10 times with different random initializations to account for stochastic variations in the training process. The performance metrics presented in Figure 5 represent the mean values from these repeated trials.
To statistically confirm the observed differences, a one-way analysis of variance (ANOVA) followed by Tukey’s post hoc test was performed on the collected RMSE and MAE results for each indicator. The tests revealed that the performance improvements achieved by the NGO-CNN-GRU model were statistically significant (p < 0.05) compared to both the single GRU and the non-optimized CNN-GRU models in the vast majority of cases, with the NH3-N indicator again showing the most pronounced gains. This robust statistical evidence underscores the critical role of the NGO algorithm in optimizing the model’s hyperparameters and validates the effectiveness of the proposed integrated approach for water quality forecasting.
Figure 5. Comparative performance of the GRU, CNN-GRU, and NGO-CNN-GRU models across all monitoring stations. The bars represent the mean performance values calculated from 10 independent model runs. The grid shows three performance metrics (R², RMSE, and MAE, in rows) for six water quality indicators (CODMn, DO, NH3-N, pH, TN, and TP, in columns). The consistently higher R² and lower RMSE/MAE values demonstrate the superior performance of the NGO-CNN-GRU model. As detailed in Section 4.3, the observed improvements are validated as statistically significant (p < 0.05) through ANOVA testing.
4.4. Model Performance Across Different Time Scales
To comprehensively evaluate the model’s operational viability, its predictive capability was rigorously tested across multiple forecast horizons, from short-term 4 h projections to long-term 168 h (7-day) forecasts. Using dissolved oxygen (DO) predictions at Yong’an Station as a representative case, the results, cataloged in Table 4, show a gradual performance attenuation with extended prediction windows. This decay is driven by three interconnected mechanisms.
First, recurrent neural architectures like the GRU inherently accumulate uncertainty through iterative state transitions, a dynamic known as temporal error propagation. For the GRU component, this error amplification is described by Equation (27):
$$\epsilon_t = \left\lVert \frac{\partial h_t}{\partial h_{t-1}} \right\rVert \epsilon_{t-1} \tag{27}$$
where $\epsilon_t$ is the error at time t and $h_t$ is the hidden state. This compounding effect becomes more pronounced over longer horizons, with our analysis showing Jacobian norms exceeding 1.2 beyond 72 h.
Second, the model’s performance is susceptible to environmental perturbation, as unmodeled exogenous events can induce trajectory deviations. These events include meteorological shocks, such as intense rainfall altering catchment nutrient fluxes; anthropogenic disturbances, like industrial discharge pulses introducing uncalibrated contaminant signatures; and ecological thresholds, such as the onset of an algal bloom that disrupts DO-pH equilibria.
Third, sensor drift and signal degradation in the in situ monitoring systems contribute to a progressive loss of accuracy over time. This degradation varies with the forecast horizon. In the short term (<24 h), minor baseline drift has a negligible impact on R². Over the medium term (72 h), however, biofouling can induce more significant deviations. This effect becomes most pronounced in the long term (168 h), where cumulative polarization can dominate the performance decline. This sensor drift is further exacerbated in high-turbidity zones like the Xijiang River.
Critically, despite these systemic constraints, the model demonstrates exceptional resilience. At the 168 h horizon, DO prediction maintains an R² of 0.87822, surpassing conventional LSTM and process-based models by a significant margin [27]. This robustness confirms its viability for both short-term operational early warnings and long-term strategic trend analysis.
5. Discussion
This study successfully developed and validated an NGO-CNN-GRU model for high-precision water quality prediction in the Xijiang River. The results not only demonstrate the model’s superior performance but also provide valuable insights into the spatial-temporal dynamics of water quality in this critical basin. Our discussion focuses on three key areas: the interpretation of the observed spatial patterns, the significance and practical implications of the model’s performance, and the limitations of the current study with directions for future research.
5.1. Spatial Heterogeneity of Water Quality and Anthropogenic Footprint
A key finding from our model’s predictions is the pronounced spatial heterogeneity of water quality along the Guangdong section of the Xijiang River. The analysis reveals a clear degradation gradient from the cleaner upstream regions to the more polluted downstream estuary. Upstream sites, such as Fengkai Chengshang, consistently showed low concentrations of nutrients like Total Phosphorus (TP) and Total Nitrogen (TN), reflecting a relatively pristine state with less human interference.
In stark contrast, downstream stations near the heavily urbanized Pearl River Delta, including Zhongshan Port Wharf and Zhuhai Bridge, exhibited significantly elevated nutrient levels. For instance, the model predicted a mean TN concentration of 2.75 mg/L at Zhongshan Port Wharf, a level indicative of significant nutrient loading. This spatial pattern strongly suggests a substantial anthropogenic footprint. The lower reaches of the Xijiang River serve as the primary receiving waters for massive volumes of industrial effluent, municipal sewage, and agricultural runoff from megalopolises like Guangzhou, Foshan, and Shenzhen. The predicted low dissolved oxygen levels in these downstream areas (e.g., 6.25 mg/L at Zhongshan Port) further corroborate the impact of pollution, which can induce hypoxia and threaten aquatic ecosystems. Thus, our model provides quantitative, predictive evidence that directly links the intense urbanization of the PRD to the degradation of water quality in the Xijiang River estuary.
5.2. Significance of the Model and Implications for Water Management
The exceptional performance of the NGO-CNN-GRU model carries significant implications for both the academic field and practical water resource management [28]. The model’s ability to consistently achieve high R² values (often >0.96) across 11 diverse monitoring sites confirms its high reliability and strong generalization capability. This represents a notable advancement over simpler, single-architecture models, highlighting the power of hybrid deep learning structures [29], intelligently optimized by metaheuristic algorithms like NGO, in deciphering the complex, non-linear patterns of environmental time series data.
From a practical standpoint, the model’s adaptability across various time scales makes it a highly versatile tool. Its high accuracy in short-term forecasting (e.g., 4 to 24 h) makes it an ideal core for an operational early warning system, enabling environmental managers to take proactive, rather than reactive, measures against pollution events. The model’s stability in longer-term prediction (e.g., one week) also demonstrates its potential for strategic applications, such as evaluating the effectiveness of pollution control policies or managing total maximum daily load (TMDL) allocations.
5.3. Limitations and Future Research Directions
Despite the promising results, we acknowledge several limitations in the current study that open avenues for future research. First, like most data-driven models, the NGO-CNN-GRU is a “black box” [30], meaning that it does not explicitly represent the underlying hydro-chemical mechanisms governing water quality. Its predictive accuracy is fundamentally dependent on the quality and representativeness of the historical data used for training.
Future work should therefore aim to enhance both the model’s interpretability and its predictive power. Incorporating explainability techniques like SHAP (SHapley Additive exPlanations) could help elucidate the key factors driving the model’s predictions [31]. Furthermore, the model could be expanded to include more input variables, such as real-time industrial discharge data or satellite-derived land-use information [32], which could further improve its accuracy. Finally, applying this modeling framework to predict other emerging contaminants of concern would be a valuable next step in developing a truly comprehensive water quality management and early warning system for the region.
6. Conclusions
This study addressed the critical challenge of accurate time-series water quality prediction by developing a novel hybrid deep learning model, the NGO-CNN-GRU. Our primary contribution lies in the intelligent integration of a CNN for spatial feature extraction, a GRU for temporal modeling, and the Northern Goshawk Optimization (NGO) algorithm for automated hyperparameter tuning. This synergistic architecture successfully overcomes the limitations of previous models, establishing a new performance benchmark validated at a large, basin-wide scale across 11 heterogeneous monitoring stations. The model demonstrated exceptional robustness, achieving R² values exceeding 0.98 on test data and significantly outperforming baseline models, with performance gains confirmed to be statistically significant.
Beyond its technical advancements, this research carries substantial practical value and policy relevance for water resource management. In the context of operational management, the model’s high accuracy in short-term forecasting (4–24 h) provides an immediately deployable tool for real-time early warning systems. This allows water management authorities to proactively respond to pollution events, for example, by issuing timely public health advisories, adjusting operations at downstream water treatment plants, or pinpointing illicit discharge sources. For strategic policymaking, the model’s proven stability in long-term forecasting (up to 7 days) and its ability to quantify the anthropogenic pollution footprint across the basin offer a powerful decision-support tool. Policymakers can leverage the model to conduct “what-if” scenario analyses, evaluating the potential downstream impact of new industrial zoning policies or the effectiveness of proposed total maximum daily load (TMDL) regulations before implementation. This data-driven foresight directly supports evidence-based environmental planning and investment.
In conclusion, this work not only presents a highly effective and generalizable model but also provides a validated framework with tangible applications for transforming water management from a reactive to a proactive paradigm. It represents a meaningful step forward in developing intelligent, data-driven solutions essential for safeguarding vital water resources in the face of growing environmental pressures.
Author Contributions
Conceptualization, Y.C.; methodology, H.Z. and X.D.; software, H.Z.; validation, X.D., H.Z. and Y.D.; formal analysis, X.D.; investigation, X.D.; resources, Y.C.; data curation, H.Z.; writing—original draft preparation, X.D.; writing—review and editing, Y.C. and X.D.; visualization, Y.D.; supervision, Y.C.; project administration, Y.C.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the General Program of the Natural Science Foundation of Guangdong Province (Grant No. 2024A1515011891), the Basic Science Center Project of the Natural Science Foundation of China (Grant No. 52388101), and the National Natural Science Foundation of China for Young Scientists Fund (Grant No. 52309083).
Data Availability Statement
The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to the policies of the data providers.
Acknowledgments
The authors would like to thank the Ministry of Ecology and Environment of the People’s Republic of China and the China Meteorological Administration for providing the data used in this study. During the preparation of this manuscript, the author(s) used Gemini (July 2025 version) for the purposes of language polishing, structural revision, manuscript formatting, and generating illustrative figures. The authors have reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
CNN | Convolutional Neural Network |
GRU | Gated Recurrent Unit |
NGO | Northern Goshawk Optimization |
PRD | Pearl River Delta |
References
- Mijares, A.C.; Keener, V.W.; Papacostas, C.S. Integrating climate change and water resource management: A review of challenges and opportunities in the Anthropocene. Water Resour. Manag. 2022, 36, 3457–3474. [Google Scholar]
- Ouyang, W.; Guo, H.; Hao, F.; Wang, X.; Huang, H. A review of surface water quality modeling and prediction. Sci. Total Environ. 2023, 878, 163013. [Google Scholar]
- Wang, J.; Liu, G.; Liu, H.; Lam, P.K.S. Microplastics in the Pearl River Delta region of China: A review of their sources, distribution, and potential impacts. Sci. Total Environ. 2021, 755, 142567. [Google Scholar]
- Zhang, J.; Wang, Z.; Liu, Y. Water pollution and its control in the Pearl River Delta, China. J. Environ. Manag. 2009, 90, 3261–3273. [Google Scholar]
- Li, Z.; Chen, Y.; Wang, F. Spatiotemporal analysis of nutrient dynamics in the Xijiang River from 2020–2024 using high-frequency monitoring data. J. Hydrol. 2025, 635, 131210. [Google Scholar]
- The General Office of the Central Committee of the Communist Party of China and the General Office of the State Council. Outline of the National Ecological Environment Monitoring Plan 2020; The General Office of the Central Committee of the Communist Party of China and the General Office of the State Council: Beijing, China, 2020. [Google Scholar]
- Zounemat-Kermani, M.; Stephan, M.; Kisi, O. A review on the applications of data-driven models for river water quality monitoring. Environ. Sci. Pollut. Res. 2022, 29, 51159–51177. [Google Scholar]
- Lu, H.; Ma, X. A review of water quality prediction models for sustainable water resources management. Sustainability 2020, 12, 2855. [Google Scholar]
- Barzegar, R.; Moghaddam, A.A.; Adamowski, J. A review of the application of deep learning in water quality prediction. Expert. Syst. Appl. 2021, 177, 114949. [Google Scholar]
- Shen, C. A transdisciplinary review of deep learning in water resources management. Hydrol. Earth Syst. Sci. 2025, 29, 1–28. [Google Scholar]
- Wu, N.; Zhang, B.; Li, S. Transformer-based models for time series forecasting: A survey. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar]
- Jia, Y.; Jin, S.; Zhang, J.; Li, W. A review of graph neural networks in water resources management. Water Res. 2024, 251, 121145. [Google Scholar]
- Bai, Y.; Li, Y.; Wang, X. A hybrid CNN-GRU model for water quality prediction based on an attention mechanism. J. Hydrol. 2020, 587, 124976. [Google Scholar]
- Abba, S.I.; Pham, Q.B.; Usman, A.G.; Linh, N.T.T.; Al-Ansari, N.; Abdulkadir, R.A.; Tinh, T.Q.; Tri, D.Q. Emerging evolutionary algorithm for the optimization of a hybrid machine learning model for predicting water quality index. Environ. Sci. Pollut. Res. 2023, 30, 12783–12803. [Google Scholar]
- Wu, Z.; Ahmed, S.E.; Li, Z. A review on hyperparameter optimization for deep learning in environmental modeling. Environ. Model. Softw. 2022, 157, 105523. [Google Scholar]
- Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
- de Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978. [Google Scholar]
- Iglewicz, B.; Hoaglin, D.C. How to Detect and Handle Outliers; ASQC Quality Press: Milwaukee, WI, USA, 1993. [Google Scholar]
- Saranya, T.; Panda, G. A review of data normalization techniques. Int. J. Comput. Appl. 2014, 97, 23–28. [Google Scholar]
- Pearson, K. Notes on the history of correlation. Biometrika 1920, 13, 25–45. [Google Scholar] [CrossRef]
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; Omnipress: Madison, WI, USA, 2010; pp. 807–814. [Google Scholar]
- Cho, K.; Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Dehghani, M.; Montazeri, Z.; Trojovský, P.; Hubálovský, Š. Northern Goshawk Optimization: A new swarm-based algorithm for solving engineering problems. IEEE Access 2021, 9, 162059–162080. [Google Scholar] [CrossRef]
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in model evaluation. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Li, Y.; Liu, Y.; Guo, H. A review of process-based models for water quality simulation and TMDL development. Ecol. Model. 2023, 478, 110298. [Google Scholar]
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
- Al-Sudani, Z.A.; Jassim, M.S.; Al-Maliki, L.A.A. A review of hybrid deep learning models for water quality prediction. J. Hydrol. 2023, 620, 129443. [Google Scholar]
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
- Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Prieto, C.; Gupta, H.V. What is the hydrologic significance of learning in a neural network? Water Resour. Res. 2021, 57, 2020–028703. [Google Scholar]
- Baranval, A.S.C.; Jeyaseelan, C.; Singh, G.C. Comprehensive Review on Application of Machine Learning Algorithms for Water Quality Parameter Estimation using Remote Sensing Data. Sens. Mater. 2020, 32, 3879–3892. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).