Article

Rapid Prediction Approach for Water Quality in Plain River Networks: A Data-Driven Water Quality Prediction Model Based on Graph Neural Networks

1
State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, China
2
Nanjing Hydraulic Research Institute, Nanjing 210029, China
3
Sichuan Province Zipingpu Development Corporation Limited, Chengdu 610091, China
*
Author to whom correspondence should be addressed.
Water 2025, 17(17), 2543; https://doi.org/10.3390/w17172543
Submission received: 9 July 2025 / Revised: 19 August 2025 / Accepted: 26 August 2025 / Published: 27 August 2025

Abstract

With the rapid development of socioeconomics and the continuous advancement of urbanization, water environment issues in plain river networks have become increasingly prominent. Accurate and reliable water quality (WQ) predictions are a prerequisite for water pollution warning and management. Data-driven modeling offers a promising approach for WQ prediction in plain river networks. However, existing data-driven models suffer from inadequate capture of spatiotemporal (ST) dependencies and misalignment between direct prediction strategy assumptions and actual data characteristics, limiting prediction accuracy. To address these limitations, this study proposes a spatiotemporal graph neural network (ST-GNN) that integrates four core modules. Experiments were performed within the Chengdu Plain river network, with performance comparisons against five baseline models. Results suggest that ST-GNN achieves rapid and accurate WQ prediction for both short-term and long-term horizons, reducing prediction errors (MAE, RMSE, MAPE) by up to 46.62%, 37.68%, and 45.67%, respectively. Findings from the ablation experiments and autocorrelation analysis further confirm the positive contribution of the core modules in capturing ST dependencies and eliminating data autocorrelation. This study establishes a novel data-driven model for WQ prediction in plain river networks, supporting early warning and pollution control while providing insights for water environment research.

1. Introduction

Plain river network regions feature unique and complex hydrological and geomorphological characteristics and are widely found in the middle and lower reaches of the Yangtze River and the Pearl River Delta in China [1]. Historically, human settlements have been concentrated near water sources, and many of China’s major urban agglomerations are distributed in these regions. The water quality (WQ) of rivers is crucial for maintaining the health and stability of ecosystems as well as regional water resource security [2]. However, plain river network areas face numerous challenges, including low river flow and severe water pollution. With the rapid development of socioeconomics and the continuous advancement of urbanization, water environment issues in plain river networks have become increasingly prominent.
WQ modeling and prediction, as important tools for early warning of water pollution, play a vital role in water resource management [3]. Accurate and reliable WQ prediction is essential for establishing early warning mechanisms for decision-making and implementing pollution control measures. However, rivers are complex nonlinear systems, with WQ data exhibiting nonlinearity and nonstationarity that complicate prediction [4]. Additionally, dynamic interactions among WQ indicators and the multifaceted factors influencing WQ changes further hinder accurate and reliable prediction [5]. Therefore, there is an urgent need for advanced models to enhance the accuracy and reliability of WQ prediction in plain river network areas. River WQ prediction models are generally categorized into mechanistic and data-driven models.
Mechanistic models, based on the principles of mass conservation and hydrodynamics, mathematically describe pollutant generation, transport, and transformation. Over the past several decades, mechanistic models have undergone significant advancements and now serve as powerful decision-making tools for water resource and environmental management. Commonly used mechanistic models include QUAL2E [6], QUAL2K [7], EFDC [8], and SWAT [9]. Despite their ability to simulate the complex dynamic processes of river WQ, mechanistic models still face challenges in achieving precise WQ predictions, especially in plain river network areas, where their application is significantly constrained. First, mechanistic models typically require large volumes of high-precision data, including hydrological, topographical, land-use, WQ, and pollution source load data; in complex basins, such data can be difficult to obtain or of poor quality, complicating model development [10]. Second, the migration and transformation mechanisms of pollutants in water bodies are not fully understood; many processes are difficult to describe via deterministic mathematical methods, and many model parameters cannot be precisely measured or calibrated, resulting in relatively low accuracy [11,12]. Third, prediction accuracy and reliability suffer from highly uncertain and unpredictable boundary conditions [13]. Fourth, mechanistic simulations are computationally intensive and time-consuming [14], and their slow response precludes real-time operational use in decision-making contexts [15].
Data-driven models differ from mechanistic models by not explicitly simulating water pollutant migration and transformation processes. Instead, they establish mapping relationships between input variables and output results through mathematical and computational techniques. Advancements in satellite remote sensing technology and monitoring sensor networks drive exponential growth in earth science data, establishing a robust foundation for data-driven modeling [16]. Data-driven models have gained widespread application in WQ prediction [17,18,19,20], with machine learning approaches including artificial neural networks (ANNs) [21], support vector regression (SVR) [22], and random forests (RFs) [23] learning effective information from data to improve accuracy. Nevertheless, limitations persist in capturing long-term temporal dependencies and eliminating prediction lags. The rise of deep learning architectures, such as convolutional neural networks (CNNs) [24], recurrent neural networks (RNNs) [25], long short-term memory networks (LSTMs) [26], and Transformers [27], provides promising approaches to effectively modeling complex temporal dependencies.
Notably, despite the numerous successful applications of data-driven models based on deep learning in the field of WQ prediction, two significant challenges remain for predicting WQ in plain river networks:
(1) When addressing WQ issues in plain river networks, variations in WQ are influenced not only by the internal characteristics of the time series (i.e., temporal dependencies) but also by the interdependencies between sequences (i.e., spatial dependencies). Modeling the high-dimensional and complex ST relationships of multisource data requires capturing temporal dependencies while also addressing spatial dependencies between sequences [16]. Previous studies have focused primarily on modeling temporal dependencies, often neglecting spatial dependencies. Modeling in a single dimension may overlook critical information regarding WQ, thereby compromising predictive performance.
(2) In WQ prediction, performance is also influenced by the model’s prediction strategy. Existing studies primarily employ recursive or direct forecasting strategies [28]. The recursive strategy uses prior predicted values as subsequent inputs to enable multistep forecasting. Although it accounts for label data autocorrelation, this strategy accumulates prediction errors over time, reducing multistep prediction accuracy [29]. The direct strategy employs a multi-input multi-output approach to generate multistep predictions, minimizing error accumulation. This strategy is widely adopted in deep learning because of its high accuracy, rapid computation speed, and ease of implementation [30]. However, the direct strategy assumes that label data are mutually independent, whereas label data often exhibit significant autocorrelation. This misalignment between the strategy’s assumption and actual data characteristics limits the model’s predictive performance.
To address these challenges, this study first proposes a data-driven model based on graph neural networks (GNNs) for WQ prediction. ST dependencies in plain river networks are captured by constructing adaptive multiperiod enhancement, temporal period dependency, and multivariate spatial dependency modules. Second, we introduce a hybrid loss function module incorporating time-frequency loss to mitigate label autocorrelation impacts. Third, we evaluate the proposed model’s performance and compare it with other models through a case study, demonstrating its superior predictive performance. Furthermore, an ablation experiment and autocorrelation analysis are used to validate the positive contributions of the core modules in resolving the aforementioned challenges.
This study comprises five sections: Section 1 presents the research background, state of the art, and objectives; Section 2 describes methods and proposed model; Section 3 details the case study, including the study area, dataset, and baseline models; Section 4 discusses results and implications; Section 5 concludes the research.

2. Methods and Model

2.1. Basic Theory

2.1.1. Multivariate Time Series Forecasting

Data-driven WQ prediction is essentially a specific application of multivariate time series (MTS) forecasting [31]. In MTS forecasting tasks, the input is a historical time series $X = \{x_1^t, \dots, x_N^t\}_{t=1}^{Seq} \in \mathbb{R}^{Seq \times N}$, where $x_i^t$ denotes the value of the $i$-th variable at the $t$-th time step, $N > 1$ represents the dimensionality of the variables, and $Seq$ specifies the size of the lookback time window. The label time series is denoted as $Y = \{x_1^t, \dots, x_N^t\}_{t=Seq+1}^{Seq+T} \in \mathbb{R}^{T \times N}$, where $T \ge 1$ is the forecast horizon. The MTS forecasting task can be defined as constructing a model $g: \mathbb{R}^{Seq \times N} \rightarrow \mathbb{R}^{T \times N}$ that uses the historical time series $X$ to generate predictions $\hat{Y} = \{\hat{x}_1^t, \dots, \hat{x}_N^t\}_{t=Seq+1}^{Seq+T} \in \mathbb{R}^{T \times N}$ that approximate the label time series $Y$.
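To make the notation concrete, the sliding-window construction of $(X, Y)$ pairs can be sketched as follows (a minimal illustration; the function name `make_windows` and the toy dimensions are ours, not from the paper):

```python
import numpy as np

def make_windows(series: np.ndarray, seq: int, horizon: int):
    """Slice a multivariate series of shape (time, N) into (X, Y) pairs.

    X has shape (samples, seq, N); Y has shape (samples, horizon, N).
    """
    total, n_vars = series.shape
    xs, ys = [], []
    for start in range(total - seq - horizon + 1):
        xs.append(series[start:start + seq])                    # lookback window
        ys.append(series[start + seq:start + seq + horizon])    # label window
    return np.stack(xs), np.stack(ys)

data = np.random.rand(100, 9)            # e.g. 9 WQ indicators over 100 steps
X, Y = make_windows(data, seq=24, horizon=6)
print(X.shape, Y.shape)                  # (71, 24, 9) (71, 6, 9)
```

Each sample's label window starts exactly where its lookback window ends, matching the index ranges $t = 1, \dots, Seq$ and $t = Seq+1, \dots, Seq+T$ above.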

2.1.2. Graph Neural Network

GNNs represent a category of deep learning models engineered to process graph-structured data by leveraging neural networks. These models are designed to extract and exploit features and patterns inherent in graph-structured data by learning the complex dependencies between nodes and edges within the graph. This enables the fulfillment of graph learning tasks such as clustering, classification, prediction, segmentation, and generation [32]. The concept of GNNs was initially introduced by Gori et al. in 2005 [33], who utilized recurrent neural networks to process various graph types, including undirected graphs, directed graphs, labeled graphs, and cyclic graphs. Subsequently, in 2009, Scarselli et al. [34] refined the GNN algorithm.
Early GNN architectures were primarily based on RNNs, employing basic feature mapping and node aggregation operations to generate node vector representations. However, these early iterations were limited by the sequential processing nature of RNNs and their constrained ability to extract local features, which together hindered their effectiveness in handling the high-dimensional and complex graph data encountered in real-world scenarios. To address these limitations, Defferrard et al. [35] proposed the application of convolutional neural networks to graphs. By ingeniously adapting convolutional operators, they introduced graph convolutional networks (GCNs), which facilitated the implementation of translation invariance, local perception, and weight sharing on graphs, mirroring the capabilities of CNNs. GCNs have also demonstrated significant efficacy in MTS prediction, where they can capture the interactions between multivariate time series [36].
GNNs as an emerging branch of neural networks facilitate the transmission and aggregation of information between nodes within a graph structure, enabling the modeling of temporal dependencies while also emphasizing the relevance of spatial dimensions [37]. GNNs have demonstrated their application potential across various domains, such as WQ prediction for water supply systems [38], estimating soil heavy metal content [39], and forecasting urban water demand [40]. These studies validate the effectiveness of GNNs in modeling complex ST dependencies, providing solutions to the challenges faced by data-driven models in plain river networks.
A graph consists of nodes and the edges connecting them and can be represented as $G = (V, E)$, where the set of nodes is denoted as $V = \{v_1, v_2, \dots, v_N\}$ and the set of edges as $E = \{e_1, e_2, \dots, e_m\}$. A graph can also be represented via a quintuple $G(V, E, A, X, D)$, where $A \in \mathbb{R}^{N \times N}$ represents the adjacency matrix of the graph, $X \in \mathbb{R}^{N \times F}$ represents the feature matrix of the nodes, and $D \in \mathbb{R}^{N \times N}$ represents the degree matrix. $N$ and $F$ denote the number of nodes and the feature dimension of the nodes, respectively.
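As an illustration of the quintuple representation, the sketch below builds $A$, $D$, and $X$ for a toy four-node graph and applies one symmetrically normalized propagation step in the style of a GCN (the normalization $D^{-1/2} A D^{-1/2}$ is a common convention, not necessarily the exact form used in this paper):

```python
import numpy as np

# Toy undirected graph with 4 nodes and edges (0-1, 1-2, 2-3, 0-3).
N, F = 4, 3                            # number of nodes, feature dimension
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[i, j] = A[j, i] = 1              # adjacency matrix
D = np.diag(A.sum(axis=1))             # degree matrix
X = np.random.rand(N, F)               # node feature matrix

# One symmetric-normalized propagation step: D^{-1/2} A D^{-1/2} X
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
H = D_inv_sqrt @ A @ D_inv_sqrt @ X
print(H.shape)                         # (4, 3): aggregated node features
```

Each row of `H` is a degree-weighted average of that node's neighbors, which is the basic mechanism by which GCNs share information along edges.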

2.2. Construction of Spatiotemporal Graph Neural Network Water Quality Prediction Model

2.2.1. Overall Model Architecture

Owing to its temporal and multisource characteristics, an MTS inherently exhibits ST dependencies, including both temporal dependencies within each time series and spatial dependencies between different time series. Therefore, capturing both the internal dependencies within a time series and the interdependencies among multiple time series is crucial for accurate river network WQ prediction [41,42]. This study proposes a data-driven WQ prediction model based on graph neural networks, referred to as the Spatiotemporal Graph Neural Network (ST-GNN). The model employs an adaptive multiperiod enhancement module to extract time series at key periodic scales as model input, a temporal period dependency module to capture the temporal dependencies of the MTS, and a multivariate spatial dependency module to capture its spatial dependencies, thereby modeling the ST relationships. Finally, a hybrid loss function module is utilized to mitigate the label autocorrelation effects associated with the direct prediction strategy. The detailed architecture of the model is shown in Figure 1.

2.2.2. Adaptive Multiperiod Enhancement Module

Feature-rich data are fundamental for modeling the ST dependencies of WQ in plain river networks. The adaptive multiperiod enhancement module is designed to extract time series data at key periodic scales, thereby constructing model inputs. Sensor data in environmental monitoring often exhibit high noise levels and contain multiple periodic components. Time-domain trend decomposition requires preset periodic parameters, demonstrates sensitivity to noise, struggles to separate multiple periodic components, and suffers from lower computational efficiency [43]. In contrast, the fast Fourier transform (FFT) [44] offers superior noise robustness and lower computational complexity, making it particularly suitable for analyzing complex datasets with embedded multi-periodic components [45,46]. Therefore, this study employs the FFT for extracting key periodic scales. The implementation involves several steps:
(1) The time series is converted from the time domain to the frequency domain via FFT. This step facilitates the analysis of potential periodic scales within the time series. By calculating the amplitude of the N-dimensional time series at various frequencies and averaging the amplitudes across variable dimensions, the top z frequencies corresponding to the highest amplitude values are selected as T o p F .
$F = \mathrm{Mean}_{i=1}^{N}\left(\mathrm{Amp}\left(\mathrm{FFT}(X)\right)\right)$

$TopF = \{f_1, f_2, \dots, f_z\} = \mathrm{HTL}(F)$

In this context, $\mathrm{FFT}(\cdot)$ denotes the computation of the FFT, $\mathrm{Amp}(\cdot)$ refers to the frequency amplitudes, and $F$ is the average amplitude at each frequency. The $\mathrm{HTL}(\cdot)$ function is used to sort these values from high to low, filtering out $TopF$.
(2) The periodic scale $p_z = Seq / f_z$, $z \in \{1, 2, \dots, Z\}$, corresponding to each high-amplitude frequency in $TopF$ is calculated, and the kernel size and stride of the one-dimensional average pooling $\mathrm{Avgpool1D}(\cdot)$ are both set to the periodic scale $p_z$. The time series sampled at the $z$-th periodic scale can be represented as:

$X_z = \mathrm{Avgpool1D}(X)_{kernel = stride = p_z}$

In this equation, $X_z \in \mathbb{R}^{seq_P(z) \times N}$, the length of the time series corresponding to the $z$-th periodic scale is $seq_P(z) = \lfloor Seq / p_z \rfloor$, and $\lfloor \cdot \rfloor$ indicates the rounding (floor) operation.
(3) The time series sampled at each periodic scale are concatenated along the time dimension to obtain a multiperiodic-scale time series $X' = \mathrm{Concat}(X_1, X_2, \dots, X_Z) \in \mathbb{R}^{Seq' \times N}$, with a multiperiodic time series length of $Seq' = \sum_{z=1}^{Z} seq_P(z)$. A multilayer perceptron (MLP) is then applied to increase the dimensionality of the data, resulting in multichannel data ($channel\ dim = C$). The purpose of expanding the data into multiple channels is to enhance the local features of each periodic scale, positively influencing the capture of temporal and spatial dependencies. Ultimately, the shape of $X'$ is expanded to $\mathbb{R}^{Seq' \times N \times C}$.
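Steps (1)–(3) can be sketched with NumPy's FFT as follows (a minimal illustration; the helper names `top_periods` and `avg_pool_1d` and the toy signal are ours, and the final MLP channel expansion is omitted):

```python
import numpy as np

def top_periods(x: np.ndarray, z: int, seq: int):
    """Pick the z dominant periodic scales of a (seq, N) series via FFT.

    Mirrors the module's steps: average amplitudes over variable
    dimensions, take the top-z frequencies, convert each to a period.
    """
    amp = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)  # mean amplitude per freq
    amp[0] = 0.0                                       # ignore the DC component
    top_f = np.argsort(amp)[-z:][::-1]                 # highest-amplitude freqs
    return [max(1, seq // int(f)) for f in top_f]      # p_z = Seq / f_z

def avg_pool_1d(x: np.ndarray, p: int):
    """Average-pool along time with kernel = stride = p."""
    steps = x.shape[0] // p
    return x[:steps * p].reshape(steps, p, -1).mean(axis=1)

np.random.seed(0)
seq = 96
t = np.arange(seq)
x = (np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(seq))[:, None]
periods = top_periods(x, z=2, seq=seq)           # dominant period is 24
sampled = [avg_pool_1d(x, p) for p in periods]
concat = np.concatenate(sampled, axis=0)         # multiperiodic-scale series
```

For the noisy 24-step sinusoid above, the top frequency bin is $f = 96/24 = 4$, so the first recovered period is 24, and the pooled series at that scale has $\lfloor 96/24 \rfloor = 4$ time steps.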

2.2.3. Temporal Period Dependency Module

The temporal period dependency module is designed to utilize the time series $X'$ across multiple periodic scales to extract clearer temporal dependencies within the time series. The temporal dependency graph can be represented as $G^{time} = (V^{time}, E^{time})$, where $V^{time} = \{v_1^{time}, v_2^{time}, \dots, v_{Seq'}^{time}\}$ denotes the time nodes at different periodic scales. Here, $v_i^{time}$ represents the $i$-th time node, and $E^{time} \in \mathbb{R}^{Seq' \times Seq'}$ represents the weight relationships between time nodes, including weights for both nodes within the same periodic scale and those across different periodic scales. Two vectors, $vector_1^{time}$ and $vector_2^{time}$, are generated from $V^{time}$ for initialization:

$E^{time} = \mathrm{Softmax}\left(\mathrm{ReLU}\left(vector_1^{time} \times vector_2^{time}\right)\right)$

In this equation, the activation function $\mathrm{ReLU}(\cdot)$ constrains the correlation within the range [0, 1], and $\mathrm{Softmax}(\cdot)$ maps the input vector to a probability distribution.
(1) Capturing dependencies across different periodic scales
For any given time node, the number of related time nodes at shorter periodic scales should exceed that at longer periodic scales. For a time node $v_i^{time}$, the number of related nodes at the $z$-th periodic scale is denoted as $m_z = \lceil M / p_z \rceil$, where $p_z$ represents the length of the $z$-th periodic scale, $\lceil \cdot \rceil$ indicates the ceiling function, and $M$ is a constant that ensures that time series at shorter periodic scales contribute more significantly to the temporal dependencies. The adjacent time nodes of $v_i^{time}$ at the $z$-th periodic scale can be expressed as:

$Nbr_z^{period}(v_i^{time}) = \mathrm{HTL}_{m_z}\left(E_z^{time}(v_i^{time})\right)$

In this equation, $E_z^{time}(v_i^{time})$ denotes the relevant weights of the $i$-th time node $v_i^{time}$ at the $z$-th periodic scale, and $\mathrm{HTL}_{m_z}(\cdot)$ is utilized to extract the $m_z$ nodes with the highest weights. Through this method, the number of adjacent nodes across different periodic scales can be effectively controlled.
(2) Temporal Trend Capture
To ensure the ability to capture temporal trends, $Nbr^{trend}(v_i^{time})$ is designed to retain the associations of adjacent time nodes within the same periodic scale:

$Nbr^{trend}(v_i^{time}) = \left\{ v_j^{time} \,\middle|\, |i - j| \le 1,\ period(v_i^{time}) = period(v_j^{time}) \right\}$

In this equation, the trend set of $v_i^{time}$ consists of its adjacent time nodes (i.e., $|i - j| \le 1$) that share the same periodic scale, where $period(\cdot)$ returns the periodic scale of a time node. This method effectively preserves the temporal trends within the same periodic scale.
(3) Normalization of Adjacent Node Weights
The set of adjacent nodes for time node $v_i^{time}$ can be represented as $Nbr(v_i^{time}) = Nbr^{period}(v_i^{time}) \cup Nbr^{trend}(v_i^{time})$. $E^{time}$ denotes the correlation weights of the time nodes, which are renormalized as follows:

$E^{time}[i, j] = \begin{cases} \dfrac{E^{time}[i, j]}{\sum_{v_k \in Nbr(v_i^{time})} E^{time}[i, k]}, & \text{if } v_j \in Nbr(v_i^{time}) \\ 0, & \text{otherwise} \end{cases}$

In this equation, $E^{time}[i, j]$ represents the correlation weight between $v_i^{time}$ and $v_j^{time}$. During this step, insignificant correlations are filtered out, and a constrained set of adjacent nodes $Nbr(v_i^{time})$ is retained for each node. Renormalization is employed to maintain the correlations between time nodes, ultimately constructing the expression of temporal dependencies.
(4) Aggregation of Temporal Dependencies
By utilizing the temporal dependency graph $G^{time} = (V^{time}, E^{time})$, temporal dependencies at different periodic scales are captured on the basis of a stacked $L$-layer GCN:

$H_i^{time,L} = \sum_{z=1}^{Z} \sum_{v_j \in Nbr_z^{period}(v_i^{time})} V_j^{time,L-1} \cdot E^{time}[i, j] + \sum_{v_j \in Nbr^{trend}(v_i^{time})} V_j^{time,L-1} \cdot E^{time}[i, j]$

$V_i^{time,L} = \sigma\left(\mathrm{cat}\left(V_i^{time,L-1}, H_i^{time,L}\right) \cdot W^{matrix}\right)$

In these equations, $H_i^{time,L}$ represents the aggregation over the adjacent time nodes $Nbr(v_i^{time})$ of the node features from the previous GCN layer, where the adjacent nodes include those from different periodic scales $Nbr^{period}(v_i^{time})$ and those from the same periodic scale $Nbr^{trend}(v_i^{time})$. $V_i^{time,L}$ denotes the features of the current time node, updated from the aggregation $H_i^{time,L}$ and the previous-layer features $V_i^{time,L-1}$. $\sigma$ is the GELU activation function, and $W^{matrix}$ is a learnable matrix. Finally, $V_i^{time,L}$ is normalized to obtain the output.
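The aggregation and update equations above can be illustrated with a small NumPy sketch (hypothetical helper names and toy sizes; a tanh-approximated GELU stands in for the activation, and period/trend neighbors are merged into one neighbor list per node):

```python
import numpy as np

def gelu(x):
    # GELU activation, tanh approximation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def gcn_layer(V, E, neighbors, W):
    """One layer: aggregate weighted neighbor features, then update.

    V: (nodes, C) features from the previous layer;
    E: (nodes, nodes) renormalized weights;
    neighbors: neighbor-index list per node; W: (2C, C) learnable matrix.
    """
    out = np.zeros_like(V)
    for i, nbrs in enumerate(neighbors):
        h_i = sum(V[j] * E[i, j] for j in nbrs)             # H_i: aggregation
        out[i] = gelu(np.concatenate([V[i], h_i]) @ W)      # update via cat + W
    return out

np.random.seed(0)
n, c = 5, 4
V = np.random.rand(n, c)                                    # node features
E = np.random.rand(n, n); E /= E.sum(axis=1, keepdims=True) # toy weights
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # trend-style |i-j|<=1
W = np.random.rand(2 * c, c)
V_next = gcn_layer(V, E, neighbors, W)
print(V_next.shape)                                         # (5, 4)
```

Stacking this layer $L$ times, as the paper does, lets each time node see progressively more distant neighbors through repeated aggregation.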

2.2.4. Multivariate Spatial Dependency Module

A WQ indicator at a monitoring site is not only correlated with other WQ indicators at the same site but also significantly influenced by WQ indicators from hydrologically connected or geographically adjacent sites. Additionally, hydrometeorological conditions and other external auxiliary drivers critically impact WQ. The multivariate spatial dependency module aims to capture the spatial dependency relationships among multivariate variables. The multisource data at the monitoring sites can be regarded as spatial nodes, denoted as $V^{spatio} = \{v_1^{spatio}, v_2^{spatio}, \dots, v_N^{spatio}\}$, where $v_i^{spatio}$ represents the time series of the $i$-th variable. The spatial dependency relationships among multivariate variables can be represented as $G^{spatio} = (V^{spatio}, E^{spatio})$, where $E^{spatio} \in \mathbb{R}^{N \times N}$ indicates the weight relationships among these multivariate variables. $E^{spatio}$ encompasses multisource data from different monitoring sites, facilitating the generalization of spatial dependency relationships among the multisource data across different monitoring locations. Two vectors, $vector_1^{spatio}$ and $vector_2^{spatio}$, are generated from $V^{spatio}$ for initialization:

$E^{spatio} = \mathrm{Softmax}\left(\mathrm{Tanh}\left(vector_1^{spatio} \times vector_2^{spatio}\right)\right)$

In this equation, the activation function $\mathrm{Tanh}(\cdot)$ constrains the correlation within the range [−1, 1], and $\mathrm{Softmax}(\cdot)$ maps the input vector to a probability distribution.
(1) Selection of Spatial Adjacent Nodes
Let $E^{spatio}(v_i^{spatio})$ represent the relationship weights for nodes related to $v_i^{spatio}$. The spatial nodes with the highest correlation values are selected as adjacent nodes $Nbr_H^{spatio}$, characterized by their homogeneity, whereas those with the lowest correlation are designated as adjacent nodes $Nbr_L^{spatio}$, characterized by their heterogeneity. For $v_i^{spatio}$, the set of spatial adjacent nodes can be expressed as:

$Nbr_H^{spatio} = \mathrm{Top}_H^{spatio}\left(E^{spatio}(v_i^{spatio})\right)$

$Nbr_L^{spatio} = \mathrm{Bottom}_L^{spatio}\left(E^{spatio}(v_i^{spatio})\right)$

In these equations, $\mathrm{Top}_H^{spatio}$ selects the $H$ nodes with the highest weights as homogeneous adjacent nodes, whereas $\mathrm{Bottom}_L^{spatio}$ selects the $L$ nodes with the lowest weights as heterogeneous adjacent nodes.
(2) Normalization of Adjacent Node Weights
$E^{spatio}$ represents the relevant weights of the spatial nodes, which are renormalized as follows:

$E^{spatio}[i, j] = \begin{cases} \dfrac{E^{spatio}[i, j]}{\sum_{v_k \in Nbr_H^{spatio}(v_i)} E^{spatio}[i, k]}, & \text{if } v_j \in Nbr_H^{spatio}(v_i^{spatio}) \\ \dfrac{1 - E^{spatio}[i, j]}{\sum_{v_k \in Nbr_L^{spatio}(v_i)} \left(1 - E^{spatio}[i, k]\right)}, & \text{if } v_j \in Nbr_L^{spatio}(v_i^{spatio}) \\ 0, & \text{otherwise} \end{cases}$

The normalization process preserves the relationships between homogeneous and heterogeneous spatial nodes, where the relevant weights of homogeneous nodes are positively correlated with their relevance, whereas the relevant weights of heterogeneous nodes are negatively correlated with their relevance. Furthermore, this normalization method filters out all spatial nodes other than the adjacent nodes, creating a sparse matrix that effectively reduces the computational burden while retaining the key information within the graph structure.
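The top-$H$/bottom-$L$ selection and renormalization for a single node's weight row can be sketched as follows (the function name `spatial_weights` and the toy sizes are illustrative):

```python
import numpy as np

def spatial_weights(E_row: np.ndarray, H: int, L: int):
    """Renormalize one node's spatial weight row.

    Top-H nodes (homogeneous) keep weights proportional to E;
    bottom-L nodes (heterogeneous) get weights proportional to 1 - E;
    all other entries are zeroed, yielding a sparse row.
    """
    order = np.argsort(E_row)
    top = order[-H:]                 # highest-correlation neighbors
    bottom = order[:L]               # lowest-correlation neighbors
    out = np.zeros_like(E_row)
    out[top] = E_row[top] / E_row[top].sum()
    out[bottom] = (1 - E_row[bottom]) / (1 - E_row[bottom]).sum()
    return out

np.random.seed(1)
row = np.random.rand(10)             # toy softmax-like weights in (0, 1)
w = spatial_weights(row, H=3, L=2)
print(np.count_nonzero(w))           # 5 nonzero entries (3 + 2)
```

Note that each of the two neighbor groups is normalized to sum to 1 separately, so the sparse row sums to 2; only the within-group proportions carry information.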
(3) Aggregation of Spatial Dependency Relationships
By utilizing the spatial dependency graph $G^{spatio} = (V^{spatio}, E^{spatio})$, spatial dependency relationships among multivariate variables are captured on the basis of a stacked $L$-layer GCN:

$H_i^{spatio,L} = \sum_{v_j \in Nbr_H^{spatio}(v_i^{spatio})} V_j^{spatio,L-1} \cdot E^{spatio}[i, j] + \sum_{v_j \in Nbr_L^{spatio}(v_i^{spatio})} V_j^{spatio,L-1} \cdot E^{spatio}[i, j]$

$V_i^{spatio,L} = \sigma\left(\mathrm{cat}\left(V_i^{spatio,L-1}, H_i^{spatio,L}\right) \cdot W^{matrix}\right)$

In these equations, $H_i^{spatio,L}$ denotes the aggregation over the spatial adjacent nodes of the node features $V^{spatio,L-1}$ from the previous GCN layer; these adjacent nodes include the homogeneous spatial adjacent nodes $Nbr_H^{spatio}(v_i^{spatio})$ and the heterogeneous spatial adjacent nodes $Nbr_L^{spatio}(v_i^{spatio})$. $V_i^{spatio,L}$ represents the current spatial node features, updated from the aggregation $H_i^{spatio,L}$ and the previous-layer spatial node features $V_i^{spatio,L-1}$. $\sigma$ is the GELU activation function, and $W^{matrix}$ is a learnable matrix. Finally, $V_i^{spatio,L}$ is normalized to obtain the output.

2.2.5. Prediction Strategy

The model employs a direct prediction strategy to achieve outcome forecasting. The final predicted results can be expressed as follows:

$\hat{Y} = \{\hat{x}_1^t, \dots, \hat{x}_N^t\}_{t=Seq+1}^{Seq+T} = \mathrm{Linear}_T\left(\mathrm{Linear}_C\left(V^{spatio,L}\right)\right)$

In this equation, the decoder consists of two linear layers: the first linear layer maps the channel dimension $C$ to 1, and the second linear layer maps the input sequence length to the prediction step $T$.
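Shape-wise, the two-layer linear decoder can be sketched with plain matrix products (random weights stand in for learned parameters; the dimensions are illustrative, not taken from the paper):

```python
import numpy as np

np.random.seed(0)
seq_p, n_vars, C, T = 30, 9, 16, 6    # sequence length, variables, channels, horizon

V = np.random.rand(seq_p, n_vars, C)  # output of the spatial dependency module
W_c = np.random.rand(C, 1)            # Linear_C: channel dimension C -> 1
W_t = np.random.rand(seq_p, T)        # Linear_T: sequence length -> horizon T

h = (V @ W_c).squeeze(-1)             # (seq_p, n_vars): channels collapsed
Y_hat = (h.T @ W_t).T                 # (T, n_vars): all steps predicted at once
print(Y_hat.shape)                    # (6, 9)
```

Because all $T$ steps come out of a single matrix product, no predicted value is fed back as an input, which is what makes this a direct (rather than recursive) strategy.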

2.2.6. Hybrid Loss Function Module

(1) Fourier transform
The Fourier transform (FT) [47] is a mathematical technique for transforming signals from the time domain to the frequency domain. This transformation is based on the orthogonal properties of the Fourier basis function set, allowing time series to be decomposed into these basis functions. Specifically, the Fourier coefficient associated with frequency $k$, which represents the projection of the time series onto frequency $k$, can be obtained through an inner product operation. Given a multivariate time series $X = \{x_1^t, \dots, x_N^t\}_{t=1}^{Seq} \in \mathbb{R}^{Seq \times N}$, for the $N$-th univariate time series $x_N^t$, its projection related to frequency $k$ can be expressed as:

$F_k(x_N^t) = \sum_{t=0}^{Seq-1} x_N^t \cdot e^{-j 2\pi k t / Seq}, \quad 0 \le k \le Seq - 1$

Here, $e$ is the base of the natural logarithm, and $j$ is the imaginary unit ($j^2 = -1$). Computing this projection over all $0 \le k \le Seq - 1$ yields the FT, denoted as $F = \mathcal{F}(x_N^t)$.
(2) Hybrid loss function
For a prediction task with a lookback time window of $Seq$ and a prediction horizon of $T$, the label time series is $Y = \{x_1^t, \dots, x_N^t\}_{t=Seq+1}^{Seq+T} \in \mathbb{R}^{T \times N}$, and the generated prediction results are $\hat{Y} = \{\hat{x}_1^t, \dots, \hat{x}_N^t\}_{t=Seq+1}^{Seq+T} \in \mathbb{R}^{T \times N}$. To eliminate the influence of label autocorrelation effects in direct multistep forecasting, a loss function is introduced that integrates errors from the time domain and the frequency domain. First, the time-domain error is computed from the model’s output, which can be expressed as:

$Loss_{tmp} = \mathrm{MAE}\left(Y - \hat{Y}\right)$

Here, $Loss_{tmp}$ is the time-domain loss function, which can use common loss functions such as the mean absolute error (MAE) or the mean squared error (MSE).
Next, the label sequence and output results undergo an FT to compute the frequency-domain error, which can be expressed as:

$Loss_{feq} = \left\| F - \hat{F} \right\| = \left\| \mathcal{F}(Y) - \mathcal{F}(\hat{Y}) \right\|$

Here, $Loss_{feq}$ is the frequency-domain loss function, where $F = \mathcal{F}(Y)$ and $\hat{F} = \mathcal{F}(\hat{Y})$ represent the FT applied to the label sequence $Y$ and the predicted results $\hat{Y}$, respectively. $\|\cdot\|$ denotes the modulus operation applied to each element in the complex matrix; for a complex number $z = a + bi$, $\|z\| = \sqrt{a^2 + b^2}$.
Finally, the time-domain and frequency-domain errors are fused, and the hybrid loss function can be expressed as:

$Loss_{hyb} = \beta \cdot Loss_{tmp} + (1 - \beta) \cdot Loss_{feq}, \quad 0 \le \beta \le 1$

Here, $Loss_{hyb}$ is the hybrid loss function, where $\beta$ represents the time-frequency domain error adjustment coefficient. When $\beta$ is set to 1, $Loss_{hyb}$ reduces to the time-domain error, whereas when $\beta$ is set to 0, $Loss_{hyb}$ reduces to the frequency-domain error.
The hybrid loss function preserves the efficient computational characteristics of the direct prediction strategy and inherits its advantages in multitask processing. Moreover, it mitigates the impact of label sequence autocorrelation, significantly enhancing the model’s predictive ability.
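Under these definitions, a minimal NumPy version of the hybrid loss might look like this (the mean reduction over the frequency-domain moduli is our simplification, as the paper does not specify the reduction):

```python
import numpy as np

def hybrid_loss(y, y_hat, beta=0.5):
    """Hybrid loss: beta * time-domain MAE + (1 - beta) * frequency error.

    The frequency term is the mean modulus of the difference between the
    FFTs of the label and predicted sequences, taken along time (axis 0).
    """
    loss_tmp = np.mean(np.abs(y - y_hat))                        # time domain (MAE)
    loss_feq = np.mean(np.abs(np.fft.fft(y, axis=0)
                              - np.fft.fft(y_hat, axis=0)))      # frequency domain
    return beta * loss_tmp + (1 - beta) * loss_feq

np.random.seed(0)
y = np.random.rand(6, 9)                       # labels: T steps, N variables
y_hat = y + 0.01 * np.random.randn(6, 9)       # slightly perturbed prediction
print(hybrid_loss(y, y, beta=0.5))             # 0.0 for a perfect prediction
```

Because the FT is linear, a small constant time shift in the prediction shows up as a phase error at every frequency, which is how the frequency term penalizes the lagged-but-accurate predictions that a pure time-domain MAE tolerates.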

3. Case Study

3.1. Study Area

The Chengdu Plain River Network is a typical plain river network in Southwest China (as shown in Figure 2). It is located in the middle reaches of the Min River and has evolved from a conventional irrigation district into a complex water system that integrates various ecosystems, including irrigation areas, rural areas, and urban areas. The region features a complex distribution of reservoirs, ponds, weirs, and canals, with more than 150 rivers spanning a total length of approximately 1500 km. The water volume of the Chengdu Plain River Network is primarily controlled by the Zipingpu Reservoir and is also regulated by the Dujiangyan Canal Head Project.
The main stream of the Min River flows from Wenxian County into Chengdu, where it splits into the inner and outer river systems through the Dujiangyan canal head. The Chengdu Plain River Network is characterized by a “single input and single output, with internal chaos.” Its topological structure can be summarized as a single main stream input and output, with the internal network showing highly complex, nonlinear features. In a neural network, the mapping relationship between the input and output layers is achieved through nonlinear transformations of the hidden layers. The complex connections and state transitions within these hidden layers resemble the internal water system structure of the Chengdu Plain River Network, both of which exhibit high uncertainty and dynamics. On the basis of this structural similarity, this study uses GNNs to simulate the complex connections and interactions within the river network.

3.2. Station Selection

Based on the structural characteristics of the Chengdu Plain River Network, eight WQ monitoring stations were selected. These stations cover key geographic regions across the entire river network. The selected WQ monitoring stations include the Weimen Bridge (W1), Dujiangyan (W2), Sanyi Bridge (W3), 201 Hospital (W4), Erjiang Temple (W5), Huanglongxi (W6), Yuedianzi Lower (W7), and Pengshan Minjiang Bridge (W8) stations.
In addition, the WQ of a river network is closely related to the hydrodynamic conditions, meteorological conditions, and operation of the river network’s control structures. Therefore, this study also considers the operation of key river control projects, the hydrological conditions within the river network, and the meteorological processes in the model. The following additional sites were selected: one key reservoir, Zipingpu Reservoir (Dam_zpp); two hydrological monitoring stations, Wangjianglou (H1) and Pengshan (H2); and two meteorological monitoring stations, Wenjiang (M1) and Shuangliu (M2). Details of the various monitoring stations are provided in Supplementary Materials.

3.3. Dataset and Data Preprocessing

WQ monitoring data for the study were collected from the China National Environmental Monitoring Center and the Sichuan Provincial Environmental Monitoring Center. The monitoring data were recorded every 4 h. The key WQ parameters include 9 critical indicators: water temperature (WT), pH, electrical conductivity (EC), turbidity (NTU), chemical oxygen demand (CODMn), total phosphorus (TP), total nitrogen (TN), ammonia nitrogen (NH3-N), and dissolved oxygen (DO).
Additionally, hydrological, meteorological, and reservoir operation data were collected from the Yangtze River Hydrological Real-Time Monitoring System, the China Meteorological Data Service Center, and the Zipingpu Reservoir Dispatch Center, respectively, with a data acquisition frequency of once every 24 h. Because these data are recorded less frequently than the WQ data and their variations are gradual, linear interpolation was applied to temporally extend them to the 4 h WQ time step, making full use of the available data, increasing the number of training samples, and aligning them with the WQ monitoring resolution.
Owing to sensor malfunctions or other nontechnical factors, some of the monitoring stations’ data records had missing values. To ensure the completeness of the dataset, missing data were filled in using linear interpolation.
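As a concrete illustration of this resampling step (with hypothetical daily values and pandas as an assumed tool; the paper does not name its preprocessing library), daily records can be upsampled to the 4 h WQ step and gaps filled linearly like so:

```python
import pandas as pd

# Hypothetical daily hydrological readings (24 h acquisition frequency).
daily = pd.Series(
    [120.0, 126.0, 118.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Upsample to the 4 h WQ monitoring step: insert empty 4 h slots,
# then fill them by linear interpolation, as in the preprocessing above.
four_hourly = daily.resample("4h").asfreq().interpolate(method="linear")
```

The same `interpolate` call also covers the missing-value repair mentioned for sensor malfunctions, since gaps inside an otherwise regular series are filled identically.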
The dataset spans from January 2020 to December 2023 and comprises approximately 420,000 entries. Details of the dataset are provided in Table 1 below. The dataset was split into training, validation, and testing sets at a 7:2:1 ratio to ensure the model’s generalizability and robustness.

3.4. Prediction Tasks

Huanglongxi (W6) is located at the exit point of the Chengdu Plain inner river network. Its WQ directly reflects the overall WQ status of the Chengdu Plain River Network, making it a key monitoring point for environmental managers. Therefore, Huanglongxi is selected as the target station for prediction.
CODMn, TP, TN, and DO represent oxygen-consuming pollutants, nutrient levels, eutrophication degree, and the self-purification capacity of the water body, respectively. These WQ indicators are of significant reference value for water environment management and ecological protection; thus, they are chosen as the target prediction parameters in this study.
The goal of the model is to predict the WQ at the Huanglongxi station over both short- and long-term horizons to meet various environmental management needs. The prediction horizons are set to 1, 6, 12, 24, and 42 time steps, corresponding to predictions for the next 4 h, 1 day, 2 days, 4 days, and 7 days, respectively. Among these tasks, the short-term prediction tasks forecast the WQ for the next 4 h and 1 day, whereas the long-term prediction tasks forecast the WQ for the next 2 days, 4 days, and 7 days. Owing to the long-term trends in river network WQ data, the model’s lookback window is set to 42 time steps (7 days). The prediction task information for the model is summarized in Table 2. The ST-GNN is implemented in Python 3.8 and developed and tested with the PyTorch 1.7.1 deep learning framework. The remaining model hyperparameters are detailed in Table 3.
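The windowing described above, a fixed lookback window sliding over the series to produce direct multi-step samples, can be sketched as follows (a minimal illustration; `make_windows` is not from the paper, and the real model consumes multivariate station data rather than a single series):

```python
import numpy as np

def make_windows(series, lookback=42, horizon=42):
    """Build (input, label) pairs for the direct multi-step strategy:
    each sample maps `lookback` past steps to the next `horizon` steps."""
    X, Y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i : i + lookback])
        Y.append(series[i + lookback : i + lookback + horizon])
    return np.array(X), np.array(Y)
```

Shorter horizons (1, 6, 12, or 24 steps) are obtained simply by changing the `horizon` argument.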

3.5. Baseline Models

In this study, the ST-GNN is compared with five baseline models based on deep learning architectures, including LSTM [48], the gated recurrent unit (GRU) [49,50], transformer, DLinear [51], and TimesNet [52], to validate the superiority of the ST-GNN in terms of prediction performance. To ensure a fair comparison of model performance, the prediction tasks and dataset splits for the baseline models are set to be consistent with those used in the ST-GNN. Furthermore, the hyperparameters and network structures of the baseline models are optimized to ensure that they achieve the best performance in the experiments.

3.6. Evaluation Metrics

To evaluate the prediction performance of the models, three widely used metrics are employed: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). The formulas are as follows:
$$\mathrm{MAE} = \frac{1}{T_{\mathrm{test}}} \sum_{t=1}^{T_{\mathrm{test}}} \left| Y_t - \hat{Y}_t \right|$$
$$\mathrm{RMSE} = \sqrt{ \frac{1}{T_{\mathrm{test}}} \sum_{t=1}^{T_{\mathrm{test}}} \left( Y_t - \hat{Y}_t \right)^2 }$$
$$\mathrm{MAPE} = \frac{1}{T_{\mathrm{test}}} \sum_{t=1}^{T_{\mathrm{test}}} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| \times 100\%$$
Here, T_test represents the total number of time steps in the test dataset, whereas Y_t and Ŷ_t denote the actual and predicted values of the time series at the t-th time step, respectively.
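The three metrics translate directly into code; a minimal NumPy sketch (function names are ours, not from the paper):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0
```

Note that MAPE is undefined where the observed value Y_t is zero, which is rarely an issue for WQ indicators such as DO or TN but is worth guarding against in practice.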

4. Results and Discussion

4.1. Spatiotemporal Characteristics of Water Quality Data

For WQ issues in plain river networks, the variation in WQ is influenced not only by the internal characteristics of the time series (i.e., temporal dependence) but also by the interseries dependency (i.e., spatial dependence). Modeling solely in one dimension leads to critical information regarding WQ changes being overlooked. The ST dependencies in the river network can be expressed as follows:
Temporal Dependence: Changes in river network WQ exhibit significant periodic patterns, which may be closely related to seasonal cycles and human activities.
Spatial Dependence: The topological structure of the river network determines the direction and path of the water flow, which, in turn, exerts a substantial influence on trends in river WQ changes. In river networks, upstream WQ variations directly impact downstream WQ, and changes in WQ parameters at adjacent monitoring stations may follow similar patterns [53]. In addition, hydrological and meteorological factors can also influence trends in river WQ changes.
To explore the temporal dependence of WQ data, this study applies the t-distributed stochastic neighbor embedding (t-SNE) algorithm [54] for dimensionality reduction and then uses the kernel density estimation method [55] to visually analyze the data distributions of the training and testing datasets. As shown in Figure 3a, the data exhibit clear and stable temporal patterns, indicating pronounced periodic features. Additionally, the data distributions of both the training and testing sets show a high degree of similarity. Furthermore, this study employs the Spearman correlation coefficient [56] to analyze the spatial dependence of WQ indicators across different monitoring stations. The analysis results are presented in Figure 3b, where the correlations between monitoring stations W3~W4 and W5~W8 indicate significant correlations between the WQ indicators of adjacent stations. In contrast, the correlation between monitoring stations W1~W2 is relatively weak. Moreover, significant correlations are observed between different WQ indicators at the same monitoring station. Specifically, CODMn is positively correlated with TP and TN, whereas DO is negatively correlated with CODMn, TP, and TN. The inherent ST characteristics within WQ data provide the foundation for modeling such dependencies.
It is worth noting that the ST-GNN has two prerequisites. First, sufficient available data is required for model construction. Second, the data should exhibit ST characteristics; otherwise, the model cannot capture such dependencies effectively. Once these conditions are met, the ST-GNN can be applied to most plain river network areas, not only the Chengdu Plain river network.

4.2. Hyperparameter Sensitivity Analysis

In this study, a hyperparameter sensitivity analysis is conducted on the four core modules of the ST-GNN: the adaptive multiperiod enhancement module, the temporal periodic dependence module, the multivariate spatial dependence module, and the hybrid loss function module. The impact of various hyperparameters on the model’s predictive performance is monitored, and on the basis of the results, adjustments are made to the model’s hyperparameters to achieve optimal prediction accuracy. The experimental results for different WQ indicators show similar trends; therefore, only the hyperparameter experimental results for CODMn are analyzed here.

4.2.1. Top Frequency z

In the adaptive multiperiod enhancement module, the hyperparameter z directly affects the construction of the multiperiod scale sequence X′ = Concat(X_1, X_2, …, X_z) ∈ ℝ^{Seq×N}. A larger value of z increases the number of frequency-domain amplitudes on which the model focuses, thereby increasing the number of key periodic scales in X′. This hyperparameter thus regulates the sensitivity of the adaptive multiperiod enhancement, controlling its enhancement effect on the sequence data.
The hyperparameter z takes values from 1 to 10. The experimental results (as shown in Figure 4) indicate that as z increases, the prediction error tends to decrease. However, once z exceeds 5, the model’s predictive performance becomes less sensitive to changes in z. This phenomenon suggests that moderately focusing on the key periodic scales of the sequence during modeling has a positive effect on capturing ST dependencies, thereby effectively improving the model’s prediction accuracy.
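A minimal sketch of how top-z periodic scales can be read off the FFT amplitude spectrum (an assumption about the module’s mechanics in the spirit of TimesNet-style period detection; `top_z_periods` is illustrative, not the paper’s code):

```python
import numpy as np

def top_z_periods(x, z=5):
    """Return the z frequency indices with the largest FFT amplitude
    (excluding the zero-frequency mean term) and their period lengths."""
    amp = np.abs(np.fft.rfft(x))
    amp[0] = 0.0                         # ignore the DC (mean) component
    freqs = np.argsort(amp)[-z:][::-1]   # top-z frequencies, strongest first
    periods = len(x) // freqs            # period length in time steps
    return freqs, periods
```

For a 4 h sampling step, a recovered period of 6 steps would correspond to a daily cycle, and 42 steps to a weekly one.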

4.2.2. Time Node M

In the temporal period dependence module, the number of time nodes m_z is controlled by the hyperparameter M. Specifically, a smaller value of M means that the model considers fewer time-related nodes, which, especially in cases where multiperiod interactions are significant, may limit the model’s ability to capture complex periodic patterns. Conversely, a larger value of M means that the model attends to more time nodes, increasing its sensitivity to periodic changes but also increasing the model’s complexity, which may lead to the risk of overfitting.
The hyperparameter M ranges from 5 to 50. The experimental results (as shown in Figure 5) indicate that as M increases, the prediction error decreases. However, when M exceeds 15, further increases in M have a limited impact on the prediction performance. This phenomenon suggests that during the process of capturing temporal dependencies, the model tends to focus on strong interactions between related nodes and that paying attention to more nodes does not significantly increase the prediction accuracy.

4.2.3. Spatial Adjacent Nodes H (L)

In the spatial dependence module, the number of spatially adjacent nodes is controlled by the hyperparameters H and L, where H represents the number of homogeneous nodes and L the number of heterogeneous nodes. A smaller value of H (L) means that the model focuses on fewer spatial nodes, which may limit its ability to capture complex spatial patterns. Conversely, a larger value of H (L) makes the model attend to more spatial nodes, enhancing its ability to capture spatial dependencies but also increasing the model’s complexity.
The hyperparameter H (L) ranges from 5 to 50. The experimental results (as shown in Figure 6) indicate that as H (L) increases, the prediction error decreases. However, once H (L) exceeds 20, the model’s prediction performance becomes less sensitive to changes in H (L). This finding suggests that, when capturing spatial dependencies, the model need only focus on strongly correlated spatially adjacent nodes to capture spatial dependency features effectively.

4.2.4. Time–Frequency Domain Error Adjustment Coefficient β

In the hybrid loss function module, β is the time–frequency domain error adjustment coefficient. By controlling the weight of the frequency-domain loss in the hybrid loss function, β helps eliminate the effects of label autocorrelation in the direct prediction strategy, thereby improving the model’s performance in long-term prediction tasks.
This hyperparameter ranges from 0 to 1. The experimental results (as shown in Figure 7) indicate that when the prediction step size is 1, introducing a frequency domain loss does not improve model performance because of the absence of label autocorrelation effects. In most other prediction tasks, when β is close to 1, the model’s prediction error reaches its minimum. This finding suggests that achieving an appropriate balance between time-domain and frequency-domain losses and placing appropriate emphasis on frequency-domain loss can effectively eliminate the impact of label autocorrelation and enhance model prediction performance.

4.3. Model Prediction Performance Comparison

To validate the predictive performance of the ST-GNN, this study presents a comparison analysis of the ST-GNN and other baseline models in both short- and long-term prediction tasks. The results of the model prediction performance are shown in Figure 8. The experimental results demonstrate that the ST-GNN can effectively capture the ST dependence characteristics of the river network’s WQ, achieving the best prediction accuracy in the vast majority of both short- and long-term prediction tasks. Details of the experimental results are provided in Supplementary Materials.
In all prediction tasks, compared with those of models based on recurrent neural networks (LSTM and GRU), the ST-GNN reduces the MAE, RMSE, and MAPE by 46.62%, 37.68%, and 45.67%, respectively. Compared with those of the transformer model, the reductions in the MAE, RMSE, and MAPE are 39.89%, 31.48%, and 37.87%, respectively. Compared with those of the TimesNet model, the ST-GNN reduces the MAE, RMSE, and MAPE by 20.04%, 15.83%, and 19.63%, respectively. Finally, compared with those of the DLinear model, the reductions in the MAE, RMSE, and MAPE are 7.49%, 2.70%, and 10.16%, respectively.
Compared with that of the other models, the ST-GNN achieves a greater performance improvement in short-term prediction tasks (4-h and 1-day ahead). Specifically, compared with those of models based on recurrent neural networks (LSTM and GRU), the MAPE is reduced by 55.72%; compared with that of the transformer model, the MAPE is reduced by 46.82%; compared with that of TimesNet, the MAPE is reduced by 31.37%; and compared with that of DLinear, the MAPE is reduced by 9.54%. For long-term prediction tasks (2 days, 4 days, and 7 days ahead), the improvement in predictive performance is relatively small. Compared with those of models based on recurrent neural networks (LSTM and GRU), the MAPE is reduced by 39.66%; compared with that of the transformer model, the MAPE is reduced by 33.06%; compared with that of TimesNet, the MAPE is reduced by 13.37%; and compared with that of DLinear, the MAPE is reduced by 10.54%.
The ST-GNN demonstrates varying levels of performance improvement across different WQ indicators. In the DO prediction task, the performance improvement is the greatest, with the MAPE being reduced by 4.30% to 66.83% compared with that of the baseline models. In the TP prediction task, the MAPE is reduced by 12.67% to 39.58%; in the TN prediction task, the MAPE is reduced by 7.10% to 43.23%; and in the CODMn prediction task, the MAPE is reduced by 5.01% to 36.20%.
Notably, the ST-GNN achieves a significant improvement in prediction accuracy for the DO task. This can be attributed to two factors: (1) DO levels exhibit more distinct cyclical variations than other WQ indicators do, and these changes are influenced primarily by temperature fluctuations [57]. (2) The adaptive multiperiod enhancement technique used in the ST-GNN effectively identifies and extracts temporal features at key periodic scales, while the temporal period dependency module is specifically designed to capture temporal dependency. This modeling strategy enables the ST-GNN to accurately predict the cyclical variations in DO. In contrast, fluctuations in CODMn, TP, and TN are mainly driven by non-seasonal discharges of organic and nutrient loads, including point-source emissions [58], non-point-source pollution [59], and storm runoff [60]. Their variation lacks obvious periodic patterns, so the ST-GNN struggles to capture such patterns effectively, resulting in prediction performance that is less satisfactory than for DO.
Due to data availability constraints, we were unable to include such environmental data in model training. However, it can be envisaged that incorporating data such as land use and real-time pollution source loading would help the model better capture the variation patterns of WQ in river networks and improve its prediction performance.

4.4. Long-Term Prediction Ability of Models

To provide a more intuitive comparison of the prediction performance, the long-term prediction results of the ST-GNN and baseline models are visualized, as shown in Figure 9. Specifically, the model uses historical data from the past 42 time steps (7 days) to predict the WQ change trends for the next 42 time steps (7 days).
Despite variations in the data’s temporal patterns, the ST-GNN is able to effectively predict long-term trends. In particular, for the prediction of DO, which exhibits periodic fluctuations, the ST-GNN performs exceptionally well. For parameters such as CODMn, TP, and TN, although the data show less distinct periodic fluctuations, the ST-GNN still manages to predict the long-term trend with considerable accuracy, demonstrating a significant improvement in prediction performance compared with that of the baseline models.
Figure 10 illustrates the correlation between the maximum variation magnitudes of WQ indicators and prediction errors in the testing set. The results reveal a positive correlation, with TP prediction errors exhibiting the highest sensitivity to maximum variation magnitude and DO the lowest. When maximum variation magnitudes remain below critical thresholds (CODMn: 60%, DO: 30%, TP: 75%, TN: 75%), prediction errors are acceptable. Beyond these thresholds, errors increase substantially, indicating limitations in extreme-scenario forecasting and necessitating cautious application. Because high-magnitude variation samples are underrepresented in the training and validation sets, the model inadequately learns the patterns of WQ variation under extreme scenarios, constraining its generalization performance [61]. Mitigation strategies include augmenting extreme-scenario samples in the training data or fine-tuning the model via transfer learning [62,63].

4.5. Ablation Experiment Results

In this study, an ablation experiment was conducted to assess the contribution of each module in the ST-GNN to the prediction accuracy. The experiment was performed under identical training iterations and hyperparameter conditions. The experimental setup includes (1) removing the adaptive multiperiod enhancement module (Ampe); (2) removing the temporal period dependency module (Temp); (3) removing the multivariate spatial dependency module (Spat); and (4) removing the hybrid loss function module (Loss). These models are referred to as Wo/Ampe, Wo/Temp, Wo/Spat, and Wo/Loss, respectively. Since the experimental results for different WQ indicators exhibit similar patterns, only the ablation results for the CODMn prediction task are analyzed.
As shown in Figure 11, the ablation results indicate that the core modules make a positive contribution to the prediction performance. The MAPE of the ST-GNN was reduced by an average of 15.45%, 14.96%, 15.36%, and 12.03% compared to the Wo/Ampe, Wo/Temp, Wo/Spat, and Wo/Loss models, respectively. In general, as the prediction horizon increases, the negative impact of removing core modules on the prediction accuracy becomes more pronounced, highlighting the importance of these core modules in long-term prediction tasks. Among all the ablation experiments, the Wo/Ampe model showed the most significant decrease in prediction accuracy, emphasizing that rich and effective data are fundamental to constructing deep learning models. While the Temp, Spat, and Loss modules have varying degrees of impact on model performance across different prediction horizons, their overall positive influence on model performance is consistent, suggesting that the ST dependencies in the data should be incorporated during modeling. Furthermore, when the direct prediction strategy is used, attention should be given to the potential negative effects of label autocorrelation on long-term predictions.

4.6. Elimination of Label Autocorrelation

The direct strategy is widely adopted in deep learning frameworks. However, it has certain limitations, particularly with respect to the autocorrelation present within the label sequence. This strategy assumes the independence of the label sequence, thus neglecting the internal autocorrelation, as indicated by the yellow arrow in Figure 12a, and only models the mapping relationship from input to output, as shown by the black arrow in Figure 12a. This assumption is inconsistent with the actual characteristics of the data, potentially limiting the model’s predictive performance.
To validate the effect of the hybrid loss function in eliminating label sequence autocorrelation, we visualized the autocorrelation of the label sequence in both the time and frequency domains using the Spearman correlation coefficient [56]. The results are shown in Figure 12b and Figure 12c, respectively. In Figure 12b, the off-diagonal elements exhibit significant correlations, confirming the presence of autocorrelation in the time domain, which contradicts the assumption of independence in the direct multistep prediction strategy. By applying the hybrid loss function and performing a Fourier transform on the data, the label sequence is converted from the time domain to the frequency domain, and the correlations of the off-diagonal elements are significantly reduced, as shown in Figure 12c. This change indicates that the hybrid loss function effectively eliminates the impact of label sequence autocorrelation on predictions, allowing the model to avoid biases caused by autocorrelation when the direct multistep prediction strategy is used, thereby increasing prediction accuracy.
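The lag-wise Spearman autocorrelation that the direct strategy ignores can be checked numerically; a simple rank-based sketch (ties ignored; `lag_spearman` is illustrative, not the paper’s code):

```python
import numpy as np

def lag_spearman(x, lag):
    """Spearman autocorrelation at a given lag: Pearson correlation
    between the ranks of x[:-lag] and x[lag:] (ties ignored)."""
    a, b = x[:-lag], x[lag:]
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(rank_a, rank_b)[0, 1]
```

Values near 1 at small lags, as WQ series typically show, mean successive label steps are far from independent, which is exactly the violation of the direct strategy’s assumption discussed above.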

5. Conclusions and Outlook

To address the inadequate capture of spatiotemporal dependencies in existing data-driven models and the misalignment between direct prediction strategy assumptions with actual data characteristics, this study proposed the ST-GNN to improve WQ prediction performance for plain river networks. First, the adaptive multiperiod enhancement module was designed to extract data at key periodic scales, providing feature-rich data support for modeling ST dependencies. Second, the temporal period dependency module focused on mining cyclical patterns in WQ sequences across different time scales, while the multivariate spatial dependency module analyzed coupling characteristics among multiple monitoring indicators. Their integration significantly enhanced the capability to capture ST dependencies in WQ prediction. Third, the hybrid loss function module effectively mitigated the interference caused by label autocorrelation during the model training process.
The ST-GNN achieved high accuracy in both short- and long-term prediction tasks, as demonstrated in the case study of the Chengdu Plain river network. Compared with the baseline models, the ST-GNN reduced prediction errors (MAE, RMSE, MAPE) by up to 46.62%, 37.68%, and 45.67%, respectively. Although different WQ indicators exhibit distinct temporal patterns, the model could predict long-term trends, and it particularly excelled for periodically fluctuating indicators. The ablation results showed that as the prediction horizon increases, the removal of core modules has a more significant negative impact on the prediction accuracy, highlighting the importance of these modules. Visualization of the autocorrelation analysis demonstrated that the hybrid loss function effectively mitigated label autocorrelation. The proposed ST-GNN introduces a novel approach for WQ prediction in plain river networks, providing a valuable reference for early warning of water pollution, pollution source management, and water resource allocation in plain river networks.
Future studies can be expanded in two directions. First, for data with less obvious periodic characteristics, more effective prediction models can be designed by incorporating advanced feature extraction techniques and dynamic modeling methods, thereby improving prediction accuracy. Second, when the WQ prediction model is coupled with a water resource allocation model, multimodel fusion and collaborative optimization techniques can be employed to ensure that the WQ prediction results are directly applied to water resource allocation decisions, guiding regional WQ management and thereby enhancing the scientific and effective management of regional water resources.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17172543/s1, Table S1. Details of the monitoring stations in the study area. Table S2. Models prediction performance.

Author Contributions

Conceptualization, M.Y.; Data curation, W.Z.; Formal analysis, M.Y.; Funding acquisition, Y.L., L.Z. and W.Z.; Investigation, Y.L. and W.Z.; Methodology, M.Y.; Project administration, Y.L., L.Z. and J.L.; Supervision, Y.L. and L.Z.; Validation, X.Z.; Writing—original draft, M.Y.; Writing—review & editing, M.Y., Y.L., L.Z., X.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Key Research and Development Program, China (Grant No. 2022YFC3202403). The authors declare that this study received funding from the Scientific Research Program of Sichuan Zipingpu Development Co., Ltd. (Grant No. ZPPC2020[R]-02). The funder (Wenjie Zhao) had the following involvement with the study: Data curation, Funding acquisition and Investigation.

Data Availability Statement

The data are not publicly available due to the continuation of a follow-up study by the authors.

Acknowledgments

The authors are grateful to the editors and the anonymous reviewers for their insightful comments and suggestions.

Conflicts of Interest

Author Wenjie Zhao was employed by the company Sichuan Province Zipingpu Development Corporation Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lai, Z.; Li, S.; Lv, G.; Pan, Z.; Fei, G. Watershed Delineation Using Hydrographic Features and a DEM in Plain River Network Region: Watershed Delineation in Plain River Network Region. Hydrol. Process. 2016, 30, 276–288.
  2. Li, L.; Knapp, J.L.A.; Lintern, A.; Ng, G.-H.C.; Perdrial, J.; Sullivan, P.L.; Zhi, W. River Water Quality Shaped by Land–River Connectivity in a Changing Climate. Nat. Clim. Chang. 2024, 14, 225–237.
  3. Xu, R.; Hu, S.; Wan, H.; Xie, Y.; Cai, Y.; Wen, J. A Unified Deep Learning Framework for Water Quality Prediction Based on Time-Frequency Feature Extraction and Data Feature Enhancement. J. Environ. Manag. 2024, 351, 119894.
  4. Huan, S. A Novel Interval Decomposition Correlation Particle Swarm Optimization-Extreme Learning Machine Model for Short-Term and Long-Term Water Quality Prediction. J. Hydrol. 2023, 625, 130034.
  5. Avila, R.; Horn, B.; Moriarty, E.; Hodson, R.; Moltchanova, E. Evaluating Statistical Model Performance in Water Quality Prediction. J. Environ. Manag. 2018, 206, 910–919.
  6. Paliwal, R.; Sharma, P.; Kansal, A. Water Quality Modelling of the River Yamuna (India) Using QUAL2E-UNCAS. J. Environ. Manag. 2007, 83, 131–144.
  7. Zhang, R.; Qian, X.; Li, H.; Yuan, X.; Ye, R. Selection of Optimal River Water Quality Improvement Programs Using QUAL2K: A Case Study of Taihu Lake Basin, China. Sci. Total Environ. 2012, 431, 278–285.
  8. Tang, T.J.; Yang, S.; Peng, Y.; Yin, K.; Zou, R. Eutrophication Control Decision Making Using EFDC Model for Shenzhen Reservoir, China. Water Resour. 2017, 44, 308–314.
  9. Douglas-Mankin, K.R.; Srinivasan, R.; Arnold, J.G. Soil and Water Assessment Tool (SWAT) Model: Current Developments and Applications. Trans. ASABE 2010, 53, 1423–1431.
  10. Wellen, C.; Kamran-Disfani, A.-R.; Arhonditsis, G.B. Evaluation of the Current State of Distributed Watershed Nutrient Water Quality Modeling. Environ. Sci. Technol. 2015, 49, 3278–3290.
  11. Cui, F.; Park, C.; Kim, M. Application of Curve-Fitting Techniques to Develop Numerical Calibration Procedures for a River Water Quality Model. J. Environ. Manag. 2019, 249, 109375.
  12. Jiang, L.; Li, Y.; Zhao, X.; Tillotson, M.R.; Wang, W.; Zhang, S.; Sarpong, L.; Asmaa, Q.; Pan, B. Parameter Uncertainty and Sensitivity Analysis of Water Quality Model in Lake Taihu, China. Ecol. Model. 2018, 375, 1–12.
  13. Huang, Y.; Cai, Y.; Dai, C.; He, Y.; Wan, H.; Guo, H.; Zhang, P. An Integrated Simulation-Optimization Approach for Combined Allocation of Water Quantity and Quality under Multiple Uncertainties. J. Environ. Manag. 2024, 363, 121309.
  14. Talukdar, P.; Kumar, B.; Kulkarni, V.V. A Review of Water Quality Models and Monitoring Methods for Capabilities of Pollutant Source Identification, Classification, and Transport Simulation. Rev. Environ. Sci. Biotechnol. 2023, 22, 653–677.
  15. Ilampooranan, I.; Van Meter, K.J.; Basu, N.B. A Race Against Time: Modeling Time Lags in Watershed Response. Water Resour. Res. 2019, 55, 3941–3959.
  16. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat. Deep Learning and Process Understanding for Data-Driven Earth System Science. Nature 2019, 566, 195–204.
  17. Wu, J.; Wang, Z. A Hybrid Model for Water Quality Prediction Based on an Artificial Neural Network, Wavelet Transform, and Long Short-Term Memory. Water 2022, 14, 610.
  18. Chen, K.; Chen, H.; Zhou, C.; Huang, Y.; Qi, X.; Shen, R.; Liu, F.; Zuo, M.; Zou, X.; Wang, J.; et al. Comparative Analysis of Surface Water Quality Prediction Performance and Identification of Key Water Parameters Using Different Machine Learning Models Based on Big Data. Water Res. 2020, 171, 115454.
  19. Najah Ahmed, A.; Binti Othman, F.; Abdulmohsin Afan, H.; Khaleel Ibrahim, R.; Ming Fai, C.; Shabbir Hossain, M.; Ehteram, M.; Elshafie, A. Machine Learning Methods for Better Water Quality Prediction. J. Hydrol. 2019, 578, 124084.
  20. Ahmed, U.; Mumtaz, R.; Anwar, H.; Shah, A.A.; Irfan, R.; García-Nieto, J. Efficient Water Quality Prediction Using Supervised Machine Learning. Water 2019, 11, 2210.
  21. Wu, W.; Dandy, G.C.; Maier, H.R. Protocol for Developing ANN Models and Its Application to the Assessment of the Quality of the ANN Model Development Process in Drinking Water Quality Modelling. Environ. Model. Softw. 2014, 54, 108–127.
  22. Mahmoudi, N.; Orouji, H.; Fallah-Mehdipour, E. Integration of Shuffled Frog Leaping Algorithm and Support Vector Regression for Prediction of Water Quality Parameters. Water Resour. Manag. 2016, 30, 2195–2211.
  23. Xu, J.; Xu, Z.; Kuang, J.; Lin, C.; Xiao, L.; Huang, X.; Zhang, Y. An Alternative to Laboratory Testing: Random Forest-Based Water Quality Prediction Framework for Inland and Nearshore Water Bodies. Water 2021, 13, 3262.
  24. Baek, S.-S.; Pyo, J.; Chun, J.A. Prediction of Water Level and Water Quality Using a CNN-LSTM Combined Deep Learning Approach. Water 2020, 12, 3399.
  25. Zhang, Y.-F.; Fitch, P.; Thorburn, P.J. Predicting the Trend of Dissolved Oxygen Based on the kPCA-RNN Model. Water 2020, 12, 585.
  26. Hien Than, N.; Dinh Ly, C.; Van Tat, P. The Performance of Classification and Forecasting Dong Nai River Water Quality for Sustainable Water Resources Management Using Neural Network Techniques. J. Hydrol. 2021, 596, 126099.
  27. Sun, W.; Chang, L.-C.; Chang, F.-J. Deep Dive into Predictive Excellence: Transformer’s Impact on Groundwater Level Prediction. J. Hydrol. 2024, 636, 131250.
  28. Bao, Y.; Xiong, T.; Hu, Z. Multi-Step-Ahead Time Series Prediction Using Multiple-Output Support Vector Regression. Neurocomputing 2014, 129, 482–493.
  29. Chevillon, G. Direct Multi-Step Estimation and Forecasting. J. Econ. Surv. 2007, 21, 746–785.
  30. Taieb, S.B.; Atiya, A.F. A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting. IEEE Trans. Neural Netw. Learning Syst. 2016, 27, 62–76. [Google Scholar] [CrossRef] [PubMed]
  31. Hu, J.; Zheng, W. Multistage Attention Network for Multivariate Time Series Prediction. Neurocomputing 2020, 383, 122–137. [Google Scholar] [CrossRef]
  32. Wu, B.; Liang, X.; Zhang, S.; Xun, R. Advancesand Applications in Graph Neural Network. Chin. J. Comput. 2022, 45, 35–68. [Google Scholar]
  33. Gori, M.; Monfardini, G.; Scarselli, F. A New Model for Learning in Graph Domains. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Montreal, QC, Canada, 31 July–4 August 2005; IEEE: New York, NY, USA, 2005; Volume 2, pp. 729–734. [Google Scholar]
  34. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  35. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the Advances in Neural Information Processing Systems29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R., Eds.; Neural Information Processing Systems: San Diego, CA, USA, 2016. [Google Scholar]
  36. Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640. [Google Scholar] [CrossRef]
  37. Bloemheuvel, S.; van den Hoogen, J.; Atzmueller, M. Graph Construction on Complex Spatiotemporal Data for Enhancing Graph Neural Network-Based Approaches. Int. J. Data Sci. Anal. 2024, 18, 157–174. [Google Scholar] [CrossRef]
  38. Li, Z.; Liu, H.; Zhang, C.; Fu, G. Real-Time Water Quality Prediction in Water Distribution Networks Using Graph Neural Networks with Sparse Monitoring Data. Water Res. 2024, 250, 121018. [Google Scholar] [CrossRef]
  39. Li, P.; Hao, H.; Zhang, Z.; Mao, X.; Xu, J.; Lv, Y.; Chen, W.; Ge, D. A Field Study to Estimate Heavy Metal Concentrations in a Soil-Rice System: Application of Graph Neural Networks. Sci. Total Environ. 2022, 832, 155099. [Google Scholar] [CrossRef]
  40. Zanfei, A.; Brentan, B.M.; Menapace, A.; Righetti, M.; Herrera, M. Graph Convolutional Recurrent Neural Networks for Water Demand Forecasting. Water Resour. Res. 2022, 58, e2022WR032299. [Google Scholar] [CrossRef]
  41. Xiao, Y.; Yin, H.; Zhang, Y.; Qi, H.; Zhang, Y.; Liu, Z. A Dual-stage Attention-based conv-LSTM Network for Spatio-temporal Correlation and Multivariate Time Series Prediction. Int. J. Intell. Syst. 2021, 36, 2036–2057. [Google Scholar] [CrossRef]
  42. Fan, J.; Zhang, K.; Huang, Y.; Zhu, Y.; Chen, B. Parallel Spatio-Temporal Attention-Based TCN for Multivariate Time Series Prediction. Neural Comput. Appl. 2023, 35, 13109–13118. [Google Scholar] [CrossRef]
  43. Hogrefe, C.; Vempaty, S.; Rao, S.T.; Porter, P.S. A Comparison of Four Techniques for Separating Different Time Scales in Atmospheric Variables. Atmos. Environ. 2003, 37, 313–325. [Google Scholar] [CrossRef]
  44. Welch, P. The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms. IEEE Trans. Audio Electroacoust. 1967, 15, 70–73. [Google Scholar] [CrossRef]
  45. Hastaoglu, K.O.; Poyraz, F.; Erdogan, H.; Tiryakioglu, İ.; Ozkaymak, C.; Duman, H.; Gül, Y.; Guler, S.; Dogan, A.; Gul, Y. Determination of Periodic Deformation from InSAR Results Using the FFT Time Series Analysis Method in Gediz Graben. Nat. Hazards 2023, 117, 491–517. [Google Scholar] [CrossRef]
  46. Karatay, S. Estimation of Frequency and Duration of Ionospheric Disturbances over Turkey with IONOLAB-FFT Algorithm. J. Geodesy 2020, 94, 89. [Google Scholar] [CrossRef]
  47. Griffin, D.; Lim, J. Signal Estimation from Modified Short-Time Fourier Transform. IEEE Trans. Acoust. Speech Signal Process. 1984, 32, 236–243. [Google Scholar] [CrossRef]
  48. Liu, P.; Wang, J.; Sangaiah, A.K.; Xie, Y.; Yin, X. Analysis and Prediction of Water Quality Using LSTM Deep Neural Networks in IoT Environment. Sustainability 2019, 11, 2058. [Google Scholar] [CrossRef]
  49. Zhang, X.; Chen, X.; Zheng, G.; Cao, G. Improved Prediction of Chlorophyll-a Concentrations in Reservoirs by GRU Neural Network Based on Particle Swarm Algorithm Optimized Variational Modal Decomposition. Environ. Res. 2023, 221, 115259. [Google Scholar] [CrossRef] [PubMed]
  50. Peng, L.; Wu, H.; Gao, M.; Yi, H.; Xiong, Q.; Yang, L.; Cheng, S. TLT: Recurrent Fine-Tuning Transfer Learning for Water Quality Long-Term Prediction. Water Res. 2022, 225, 119171. [Google Scholar] [CrossRef]
  51. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? arXiv 2022, arXiv:2205.13504. [Google Scholar] [CrossRef]
  52. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv 2023, arXiv:2210.02186. [Google Scholar]
  53. Zhang, Y.; Rashid, A.; Guo, S.; Jing, Y.; Zeng, Q.; Li, Y.; Adyari, B.; Yang, J.; Tang, L.; Yu, C.-P.; et al. Spatial Autocorrelation and Temporal Variation of Contaminants of Emerging Concern in a Typical Urbanizing River. Water Res. 2022, 212, 118120. [Google Scholar] [CrossRef] [PubMed]
  54. van der Maaten, L.; Hinton, G. Visualizing Data Using T-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  55. Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. Kernel Density Estimation via Diffusion. Ann. Statist. 2010, 38, 2916–2957. [Google Scholar] [CrossRef]
  56. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72–101. [Google Scholar] [CrossRef]
  57. Zhi, W.; Ouyang, W.; Shen, C.; Li, L. Temperature Outweighs Light and Flow as the Predominant Driver of Dissolved Oxygen in US Rivers. Nat. Water 2023, 1, 249–260. [Google Scholar] [CrossRef]
  58. Bouriqi, A.; Ouazzani, N.; Deliege, J.-F. Modeling the Impact of Urban and Industrial Pollution on the Quality of Surface Water in Intermittent Rivers in a Semi-Arid Mediterranean Climate. Hydrology 2024, 11, 150. [Google Scholar] [CrossRef]
  59. Kaiser, D.; Unger, D.; Qiu, G.; Zhou, H.; Gan, H. Natural and Human Influences on Nutrient Transport through a Small Subtropical Chinese Estuary. Sci. Total Environ. 2013, 450–451, 92–107. [Google Scholar] [CrossRef]
  60. Xiao, Y.; Zhang, C.; Zhang, T.; Luan, B.; Liu, J.; Zhou, Q.; Li, C.; Cheng, H. Transport Processes of Dissolved and Particulate Nitrogen and Phosphorus over Urban Road Surface during Rainfall Runoff. Sci. Total Environ. 2024, 948, 174905. [Google Scholar] [CrossRef] [PubMed]
  61. Nguyen, K.T.N.; François, B.; Balasubramanian, H.; Dufour, A.; Brown, C. Prediction of Water Quality Extremes with Composite Quantile Regression Neural Network. Environ. Monit. Assess. 2023, 195, 284. [Google Scholar] [CrossRef]
  62. Chia, M.Y.; Koo, C.H.; Huang, Y.F.; Di Chan, W.; Pang, J.Y. Artificial Intelligence Generated Synthetic Datasets as the Remedy for Data Scarcity in Water Quality Index Estimation. Water Resour. Manag. 2023, 37, 6183–6198. [Google Scholar] [CrossRef]
  63. Chen, S.; Huang, J.; Wang, P.; Tang, X.; Zhang, Z. A Coupled Model to Improve River Water Quality Prediction towards Addressing Non-Stationarity and Data Limitation. Water Res. 2024, 248, 120895. [Google Scholar] [CrossRef]
Figure 1. The ST-GNN architecture comprises an adaptive multiperiod enhancement module, a temporal period dependency module, a multivariate spatial dependency module, and a hybrid loss function module.
Figure 2. Study area: Chengdu Plain River Network and the geographical locations of monitoring stations.
Figure 3. Spatiotemporal characteristics of WQ data: (a) Temporal periodicity features after dimensionality reduction using t-SNE. (b) Correlations of WQ indicators among different monitoring stations.
Figure 4. Impact of hyperparameter z on model prediction accuracy: (a) MAE. (b) RMSE. (c) MAPE.
Figure 5. Impact of hyperparameter M on model prediction accuracy: (a) MAE. (b) RMSE. (c) MAPE.
Figure 6. Impact of hyperparameter H (L) on model prediction accuracy: (a) MAE. (b) RMSE. (c) MAPE.
Figure 7. Impact of hyperparameter β on model prediction accuracy: (a) MAE. (b) RMSE. (c) MAPE.
Figure 8. Radar chart of model prediction performance: a larger pentagon area indicates higher prediction accuracy.
Figure 9. Long-term prediction comparison results across all models (partial data): (a) CODMn. (b) DO. (c) TP. (d) TN.
Figure 10. Correlation between maximum variation magnitudes of WQ indicators and prediction errors in the testing set.
Figure 11. Results of the ablation experiment: (a) MAE. (b) RMSE. (c) MAPE.
Figure 12. (a) Label autocorrelation effect in direct multistep prediction; (b) autocorrelation within the label sequence in the time domain; (c) label autocorrelation eliminated in the frequency domain.
Table 1. Detailed information about the dataset.

| Data Type | Number of Stations | Sampling Frequency | Data Length | Indicators | Data Volume |
|---|---|---|---|---|---|
| Water Quality Data | 8 | 4 h | 5806 | pH, DO, CODMn, NH3-N, TP, TN, WT, EC, NTU | 418,032 |
| Hydrological Data | 2 | 1 day | 968 | Water level, Flow rate | 1936 |
| Reservoir Operation Data | 1 | 1 day | 968 | Reservoir water level, Discharge flow | 1936 |
| Total | 11 | - | - | - | 421,904 |
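As a quick arithmetic check of the volumes reported in Table 1, the water quality volume equals stations × records × indicators, while the hydrological and reservoir rows each equal records × their two indicators (an assumption inferred from the reported totals, not stated in the paper):

```python
# Sanity check of the data volumes in Table 1.
# The per-row volume formulas below are assumptions inferred from the totals.

wq = 8 * 5806 * 9          # water quality: 8 stations x 5806 records x 9 indicators
hydro = 968 * 2            # hydrological: 968 records x 2 indicators
reservoir = 968 * 2        # reservoir operation: 968 records x 2 indicators

total = wq + hydro + reservoir
print(wq, hydro, reservoir, total)  # 418032 1936 1936 421904
```

The computed values match the tabulated volumes (418,032; 1936; 1936; 421,904).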
Table 2. Prediction task division of the ST-GNN.

| Prediction Task Category | Retrospective Time | Prediction Time Step | Target Prediction Indicators |
|---|---|---|---|
| Short-term Prediction | 42 (7 days) | 1 (4 h) | CODMn, DO, TP, TN |
| | 42 (7 days) | 6 (1 day) | CODMn, DO, TP, TN |
| | 42 (7 days) | 12 (2 days) | CODMn, DO, TP, TN |
| Long-term Prediction | 42 (7 days) | 24 (4 days) | CODMn, DO, TP, TN |
| | 42 (7 days) | 42 (7 days) | CODMn, DO, TP, TN |
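The task division above amounts to sliding-window sample construction: a fixed 42-step (7-day, at 4 h sampling) lookback paired with horizons from 1 to 42 steps. A minimal sketch of this windowing (the helper `make_windows` is illustrative, not from the paper):

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 42, horizon: int = 1):
    """Split a (T, F) time series into (X, y) pairs: X holds `lookback`
    past steps, y holds the next `horizon` steps."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        y.append(series[t:t + horizon])
    return np.stack(X), np.stack(y)

# 4-hourly data, 7-day lookback (42 steps), the five horizons in Table 2
data = np.random.rand(500, 4)        # 500 time steps, 4 target WQ indicators
for horizon in (1, 6, 12, 24, 42):   # 4 h, 1 d, 2 d, 4 d, 7 d ahead
    X, y = make_windows(data, lookback=42, horizon=horizon)
    print(horizon, X.shape, y.shape)
```

Each window pair then feeds the model as one training sample; longer horizons simply widen the label block y.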
Table 3. Hyperparameter details for the ST-GNN.

| Symbol | Hyperparameter Description | Setting Range |
|---|---|---|
| d_model | Number of GCN layers | 32 |
| dropout | Dropout rate | 0.1 |
| C | Data dimensionality expansion | 8 |
| batch_size | Batch size | 4 |
| train_epochs | Number of training epochs | 10 |
| learning_rate | Learning rate | 0.01 |
| z | Number of cycles | 1–10 |
| L | Number of heterogeneous nodes | 5–30 |
| M | The coefficient for time-adjacent node selection per cycle | 5–40 |
| β | Frequency loss ratio | 0–1 |
| H | Number of homogeneous nodes | 5–30 |
| Lossfeq_mode | Frequency domain loss transformation model | FFT |
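The hyperparameters β (frequency loss ratio, 0–1) and Lossfeq_mode (FFT) suggest a hybrid loss blending time-domain and frequency-domain errors. The exact formulation of the ST-GNN hybrid loss is not reproduced here; the following is a plausible sketch under that assumption:

```python
import numpy as np

def hybrid_loss(pred: np.ndarray, true: np.ndarray, beta: float = 0.5) -> float:
    """Blend a time-domain MSE with a frequency-domain (FFT) error,
    weighted by beta (the 'frequency loss ratio' of Table 3).
    This is an illustrative sketch, not the paper's exact loss."""
    time_loss = np.mean((pred - true) ** 2)
    freq_loss = np.mean(np.abs(np.fft.rfft(pred, axis=0) -
                               np.fft.rfft(true, axis=0)) ** 2)
    return float((1 - beta) * time_loss + beta * freq_loss)

pred = np.sin(np.linspace(0.0, 6.28, 42)).reshape(-1, 1)
true = pred + 0.1
print(hybrid_loss(pred, true, beta=0.3))
```

With β = 0 the loss reduces to plain MSE, and with β = 1 the model is penalized only for spectral mismatch, which is how the frequency term can suppress label autocorrelation effects discussed with Figure 12.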

Share and Cite

MDPI and ACS Style

Yuan, M.; Li, Y.; Zhang, L.; Zhao, W.; Zhang, X.; Li, J. Rapid Prediction Approach for Water Quality in Plain River Networks: A Data-Driven Water Quality Prediction Model Based on Graph Neural Networks. Water 2025, 17, 2543. https://doi.org/10.3390/w17172543