A Hybrid Improved Dual-Channel and Dual-Attention Mechanism Model for Water Quality Prediction in Nearshore Aquaculture

Liu, Wenjing; Wang, Ji; Li, Zhenhua; Lu, Qingjie

doi:10.3390/electronics14020331


by Wenjing Liu ¹,², Ji Wang ¹,²,*, Zhenhua Li ¹,² and Qingjie Lu ¹,²

¹ School of Electronic and Information Engineering, Guangdong Ocean University, Zhanjiang 524088, China

² Guangdong Province Smart Ocean Sensor Network and Equipment Engineering Technology Research Center, Guangdong Ocean University, Zhanjiang 524088, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(2), 331; https://doi.org/10.3390/electronics14020331

Submission received: 3 December 2024 / Revised: 26 December 2024 / Accepted: 14 January 2025 / Published: 15 January 2025


Abstract

The aquatic environment in aquaculture serves as the foundation for the survival and growth of aquatic animals, while a high-quality water environment is a necessary condition for promoting efficient and healthy aquaculture development. To effectively guide early warnings and the regulation of water quality in aquaculture, this study proposes a predictive model based on a dual-channel and dual-attention mechanism, namely, the DDA-ResNet-LSTM model. This model encompasses two parallel feature extraction channels: a residual network (ResNet) and long short-term memory (LSTM), with dual-attention mechanisms integrated into each channel to enhance the model’s feature representation capabilities. The proposed model is trained, validated, and tested using water quality and meteorological parameter data collected by an offshore farm environmental monitoring system. The results demonstrate that the proposed dual-channel structure and dual-attention mechanism can significantly improve the predictive performance of the model. The prediction accuracy for pH, dissolved oxygen (DO), and salinity (SAL) (with Nash coefficients of 0.9361, 0.9396, and 0.9342, respectively) is higher than that for chemical oxygen demand (COD), ammonia nitrogen (NH₃-N), nitrite (NO₂⁻), and active phosphate (AP) (with Nash coefficients of 0.8578, 0.8542, 0.8372, and 0.8294, respectively). Compared to the single-channel model DA-ResNet (ResNet integrated with the proposed dual-attention mechanism), the Nash coefficients for predicting pH, DO, SAL, COD, NH₃-N, NO₂⁻, and AP increase by 12.76%, 12.58%, 11.68%, 18.35%, 19.32%, 16%, and 14.99%, respectively. Compared to the single-channel DA-LSTM model (LSTM integrated with the proposed dual-attention mechanism), the corresponding increases in Nash coefficients are 9.15%, 9.93%, 9.11%, 10.91%, 10.11%, 10.39%, and 10.2%, respectively. Compared to the ResNet-LSTM (ResNet and LSTM in parallel) model without the attention mechanism, the improvements in Nash coefficients are 1.91%, 2.4%, 0.74%, 3.41%, 2.71%, 3.55%, and 4.13%, respectively. The predictive performance of the model fulfills the practical requirements for accurate forecasting of water quality in nearshore aquaculture.

Keywords: nearshore aquaculture; water quality prediction; dual-channel; dual-attention mechanism

1. Introduction

Water quality management is crucial in aquaculture, as it directly relates to the health and growth efficiency of the farmed fish. Good water quality can help reduce the occurrence of diseases, improve feed conversion rates, promote fish growth, and ultimately enhance economic benefits [1,2]. Therefore, during the process of aquaculture, great importance should be attached to water quality management to ensure that water conditions are appropriate, stable, and safe. Water quality management involves monitoring and maintaining indicators such as dissolved oxygen, pH value, ammonia nitrogen, and nitrite within appropriate ranges in the water body, as well as reducing water pollution through reasonable feed and substrate management [3,4].

The water quality parameters in aquaculture are the result of the interaction of various physical, chemical, and biological processes. These processes are intertwined and mutually influential, causing the water quality parameters to exhibit characteristics such as nonlinearity, coupling, and time-variability [5]. Water quality parameters can be forecasted by developing a series of coupled differential equations or dynamic models that mirror changes in water quality. However, this methodology necessitates a thorough consideration of the interactions among diverse water quality indicators and the influence of environmental factors on water quality, rendering it highly theoretically sophisticated and complex. Consequently, it demands considerable professional expertise and computational resources for its formulation and resolution. In recent years, artificial intelligence technology, especially deep learning algorithms, has demonstrated powerful capabilities in modeling complex nonlinear systems, making it widely used in tasks such as water quality parameter prediction, stock price prediction, traffic flow prediction, and other similar applications [6,7,8,9,10]. Deep learning models possess multi-layered nonlinear structures and nonlinear activation functions, which enable them to capture high-dimensional features and nonlinear relationships in data, and achieve complex function approximation.

Recurrent neural networks (RNNs), based on deep learning, are neural network architectures specifically designed for processing sequential data. They possess feedback connections that enable the network to utilize information from previous time steps when processing current inputs, making them highly effective in handling time-series data with significant temporal dependencies. As one of the most popular variants of RNNs, long short-term memory (LSTM) networks effectively address the issues of gradient vanishing and gradient explosion that arise during RNN training and have become a mainstream approach for time-series prediction [11]. Huan et al. [12] proposed a DO prediction model that combines gradient boosting decision trees (GBDTs) with LSTM networks. Chen et al. [13] established an LSTM network and its attention-based model (AT-LSTM) to predict water quality in the Burnett River in Australia. The research results indicated that the incorporation of the attention mechanism improved the prediction performance of the LSTM model. Wu et al. [14] proposed a novel hybrid DO prediction model based on LSTM optimized using an improved sparrow search algorithm (ISSA). Wang et al. [15] introduced a short-term water quality prediction model based on variational mode decomposition (VMD) and an improved grasshopper optimization algorithm (IGOA) to optimize LSTM neural networks. Arepalli et al. [16] presented a lightweight spatial shared attention LSTM (SSA-LSTM) model for the accurate prediction of hypoxic conditions. Bi et al. [17] proposed a water quality prediction model that combines VMD, a bidirectional input attention mechanism, an encoder–decoder, and bidirectional long short-term memory (Bi-LSTM) fusion.

LSTM possesses powerful capabilities in extracting temporal features, but it has limitations in extracting local features from input data. A Convolutional Neural Network (CNN), on the other hand, is another specially designed deep learning model. By combining convolutional operations with deep hierarchical structures, a CNN can automatically extract local features from data and build higher-level abstract representations layer by layer. It excels in processing grid-like data such as images, videos, and audio. Residual networks (ResNets) are an important variant of CNNs, which introduce residual blocks to address the degradation problem in training deep networks. In recent years, many scholars have embarked on exploring the integration of CNNs or ResNets with LSTM, aiming to fully leverage the strengths of both to construct more sophisticated and powerful models for processing spatiotemporal data with grid structures [18,19,20,21].

Barzegar et al. [22] first proposed a hybrid CNN-LSTM model for predicting water quality parameters. The results demonstrated that the hybrid model outperformed individual models (LSTM, CNN, support vector regression (SVR), and decision tree (DT) models) in predicting DO and Chlorophyll-a (Chl-a). Tan et al. [23] constructed a neural network model combining CNN and LSTM networks to predict the DO. Experimental results showed that this model provided more accurate predictions, especially in terms of peak fitting, compared to the traditional LSTM. Wang et al. (2024) [24] proposed a hybrid water quality prediction model based on ensemble empirical mode decomposition (EEMD), which combines a CNN and BiLSTM. The results showed that the proposed model improved the R² index by 5%, 7%, and 5%, respectively, compared to the suboptimal model, in predicting the index at 4 h, 1 day, and 2 days. In the aforementioned literature, a CNN and LSTM are connected in series. Firstly, the CNN learns the sequential features of the input, and then the extracted features are passed to LSTM. Finally, LSTM is utilized to handle long-distance dependencies for predicting the target value. Additionally, many scholars have attempted to integrate the attention mechanism into the series-connected CNN-LSTM model. Zhang et al. [25] integrated a spatial attention mechanism (SAM) and a temporal attention mechanism (TAM) into the CNN-LSTM model to build a multi-index and time-series prediction model for surface water quality. The results indicate that the model incorporating the attention mechanism outperforms the CNN-LSTM model. Furthermore, the model that integrates two attention mechanisms exhibits superior prediction performance compared to the CNN-LSTM model with only a single attention mechanism. Wang et al. [26] proposed a novel coupled model, AC-BiLSTM, which combines CNN and BiLSTM with an attention mechanism (AM), to address the discontinuous dynamic changes in DO over long time series. 
Compared to the BiLSTM and CNN-BiLSTM models, AC-BiLSTM exhibits superior performance based on evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination. Additionally, AC-BiLSTM possesses a stronger capability to capture global dependencies. The results from the aforementioned literature indicate that incorporating an attention mechanism module into time-series prediction models can significantly enhance the model’s ability to capture key information and dynamic changes in the data. By computing the correlation weights between different time steps or features, the attention mechanism allows the model to adaptively focus on the parts that have the greatest impact on the prediction results, thereby improving the accuracy of the predictions.

Concurrently, in other domains of time-series prediction, researchers have embarked on exploring hybrid models with a parallel structure that integrates a CNN and LSTM. Qiu et al. (2023) [19] introduced a model named differential attention residual network long short-term memory (DARLNet) specifically for predicting epileptic seizures. This model comprises two parallel channels: ResNet and LSTM. The ResNet is responsible for capturing local correlation features from the input electroencephalogram (EEG) signals, whereas LSTM handles the extraction of temporal dependency features. Ultimately, the high-level seizure features extracted from both channels are concatenated and fed into a fully connected (FC) layer for feature fusion and seizure detection. The findings reveal that, in comparison to several existing seizure detection methods, this model demonstrates superior prediction performance.

In summary, the hybrid model combining a CNN and an RNN (particularly LSTM) has emerged as a new research trend in the field of time-series prediction. The hybrid model performs better due to its ability to simultaneously extract both local and global features from time-series data, and the parallel structure of the CNN and LSTM offers advantages over the serial structure. Furthermore, integrating an attention mechanism into the hybrid model enables the model to focus more on important features or critical information in the input data, thereby improving the model’s accuracy and efficiency. In light of this, this paper proposes a hybrid model named DDA-ResNet-LSTM for water quality parameter prediction in offshore aquaculture, which incorporates dual-channel and dual-attention mechanisms. Unlike previous water quality prediction models, DDA-ResNet-LSTM adopts a parallel structure combining a ResNet and LSTM. Additionally, the attention mechanism in this model is designed to be more comprehensive. In the ResNet channel, the Gramian Angular Field (GAF) method is utilized to convert one-dimensional time-series data into two-dimensional grid point data that CNNs excel at processing. Combined with dual mechanisms of channel attention and spatial attention, the model can adaptively and dynamically focus on environmental variables and critical moments that have a greater impact on the predicted parameter, thereby enhancing its feature representation capabilities. In the LSTM channel, a recall gate attention mechanism is introduced at the input end to enhance the temporal correlation of the data, and a global attention mechanism is introduced at the output end to highlight the influence of certain important moments throughout the entire time series, thus improving the model’s ability to extract temporal features. Finally, a fully connected layer is used to reduce the dimensionality of the features extracted from both channels, resulting in an end-to-end DDA-ResNet-LSTM hybrid model. This design not only fully leverages the advantages of a CNN and LSTM but also further enhances the model’s prediction performance by incorporating an attention mechanism, providing a new and effective method for water quality prediction in offshore aquaculture.

2. Materials and Methods

2.1. Study Area and Data Collection

2.1.1. Data Source Introduction

The research team has independently designed an offshore aquaculture environment monitoring system. This system integrates water quality and meteorological sensors for data acquisition and utilizes LoRa + 5G technology to transmit the data to a server for storage, processing, and analysis. Ultimately, users can access real-time information about the aquaculture environment through information terminals such as mobile phones, tablets, and computers, receive timely alerts for anomalies, and remotely control corresponding equipment based on monitoring results. The overall architecture of the monitoring system is shown in Figure 1.

The monitoring system was deployed in a gravity-type deep-sea cage located in the offshore area of Xilian, Xuwen County, Zhanjiang City, Guangdong Province, China, as shown in Figure 2. The observed water parameters included water temperature (WT), salinity (SAL), chemical oxygen demand (COD), pH, dissolved oxygen (DO), ammonia nitrogen (NH₃-N), nitrite (NO₂⁻), and active phosphate (AP). The observed meteorological parameters included relative humidity (RH), atmospheric pressure (PRESS), wind speed (WS), wind direction (WD), solar radiation (SR), and precipitation (PRECIP). The data collection period spans from 1 May 2023 to 30 September 2023, totaling 153 days. Water quality parameters were collected every 30 min, while meteorological parameters were collected every hour.

2.1.2. Data Preprocessing

During the data collection process, factors such as the aquaculture environment, sensor malfunctions, and network signal fluctuations can lead to a small number of missing and abnormal values in the sample data. The mean smoothing method was employed to eliminate abnormal data, and the linear interpolation method was used to fill in missing values. Additionally, data series with different dimensions can affect the final model’s prediction performance. Therefore, before model training and testing, all variables were subjected to min–max normalization. The specific processing steps are as follows:

(1) Removal of outliers: The box plot method is used to identify abnormal data. A data value that falls below $Q_{1} - 1.5 \cdot IQR$ or above $Q_{3} + 1.5 \cdot IQR$ is considered an outlier. Here, $Q_{1}$ and $Q_{3}$ are the lower and upper quartiles of the data, and $IQR = Q_{3} - Q_{1}$ is the difference between them.
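The box plot rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `iqr_outlier_mask` and the sample DO readings are hypothetical, and flagged values are simply turned into gaps so that a later interpolation step can fill them.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical DO readings (mg/L) containing one obvious sensor spike.
do_readings = np.array([6.1, 6.3, 6.2, 6.4, 15.0, 6.0, 6.2])
mask = iqr_outlier_mask(do_readings)
cleaned = np.where(mask, np.nan, do_readings)  # flagged values become gaps
```

Only the spike at 15.0 falls outside the fences here; the remaining readings are left unchanged.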

(2) Missing value handling: Linear interpolation is used for completion. If there are a large number of missing data points, data from the same time of day on days with similar weather conditions or adjacent days are used to fill in the gaps. The formula for linear interpolation is given in Equation (1):

x_{k + i} = x_{k} + \frac{i (x_{k + j} - x_{k})}{j}, \quad 0 < i < j

(1)

In the formula, $i$, $j$, and $k$ denote time points; $x_{k}$ and $x_{k + j}$ represent the environmental parameter values collected at time points $k$ and $k + j$, respectively; and $x_{k + i}$ denotes the missing value at time point $k + i$.
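As a hedged sketch of Equation (1), the snippet below fills interior gaps with `numpy.interp`, which evaluates the same straight-line formula between the nearest valid neighbours. The function name `fill_gaps_linear` is an assumption, and the fallback for long gaps (borrowing data from similar days) is not covered.

```python
import numpy as np

def fill_gaps_linear(series):
    """Fill NaN gaps by linear interpolation between the nearest valid samples."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    valid = ~np.isnan(series)
    # For an interior gap this equals x_k + i * (x_{k+j} - x_k) / j.
    # Leading or trailing gaps are held at the nearest valid value.
    return np.interp(idx, idx[valid], series[valid])

# x_k = 7.0, x_{k+3} = 8.5, two missing points in between.
filled = fill_gaps_linear([7.0, np.nan, np.nan, 8.5])
```

With $j = 3$, the two missing points are filled with 7.5 and 8.0, as Equation (1) gives for $i = 1$ and $i = 2$.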

(3) Normalization: The complete data, after removing abnormal values and filling in the missing values, are subjected to min–max normalization using Equation (2):

\bar{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(2)

In the equation, $\bar{x}$ denotes the normalized data, $x$ denotes the original data, and $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values of the original data, respectively.
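A minimal sketch of Equation (2) follows. The helper names are hypothetical, and the inverse transform is included only because predictions usually have to be mapped back to physical units; the text does not state which subset of the data the extrema are taken from, so passing the full array here is an assumption.

```python
import numpy as np

def min_max_fit(data):
    """Return per-column minimum and maximum of a (samples, variables) array."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)

def min_max_apply(x, x_min, x_max):
    """Equation (2): scale each column to the range [0, 1]."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def min_max_invert(x_norm, x_min, x_max):
    """Map normalized values back to the original units."""
    return x_norm * (x_max - x_min) + x_min

# Hypothetical rows of (pH, salinity) readings.
data = np.array([[8.0, 30.0], [8.2, 32.0], [7.9, 31.0]])
x_min, x_max = min_max_fit(data)
scaled = min_max_apply(data, x_min, x_max)
restored = min_max_invert(scaled, x_min, x_max)
```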

After removing outliers and filling in missing values, the meteorological data with larger sampling intervals were linearly interpolated to align with the water quality parameters. Table 1 presents the statistical distribution of the eight water quality parameters and six meteorological parameters after data preprocessing.

2.1.3. Sample Production

When preparing experimental samples, the first step is to determine the input and output of the model. This paper focuses on predicting seven water quality parameters, that is, all of the observed parameters except sea water temperature. These parameters interact through complex physical, chemical, and biological processes, forming a coupled system that is also directly and indirectly affected by meteorological conditions. Feeding all meteorological parameters into every prediction model would increase model complexity and the computational load during training and inference, prolong runtime, and may cause overfitting and reduced generalization ability. Because some meteorological parameters have only a weak influence on specific water quality parameters, the Pearson correlation coefficient between the target water quality parameter and each meteorological parameter is calculated, and only the four most strongly correlated meteorological factors are retained as inputs for that parameter’s model. Ultimately, for pH, dissolved oxygen, chemical oxygen demand, and salinity, the meteorological inputs are relative humidity, atmospheric pressure, solar radiation, and precipitation. For ammonia nitrogen, nitrite, and active phosphate, the meteorological inputs are wind speed, wind direction, solar radiation, and precipitation. Table 2 lists the inputs and outputs of each water quality parameter model.
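The correlation-based selection can be sketched as follows. This is an illustration under stated assumptions: the function name `top_k_by_pearson` is hypothetical, ranking by the absolute value of the coefficient is assumed (the text does not say how negative correlations are treated), and the data are synthetic rather than the monitoring records.

```python
import numpy as np

def top_k_by_pearson(target, candidates, names, k=4):
    """Rank candidate columns by |Pearson r| with the target; keep the top k names."""
    scores = {}
    for name, column in zip(names, np.asarray(candidates, dtype=float).T):
        scores[name] = abs(np.corrcoef(target, column)[0, 1])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Synthetic stand-in data: six meteorological series and one target series
# that depends on four of them (coefficients are arbitrary).
rng = np.random.default_rng(0)
names = ["RH", "PRESS", "WS", "WD", "SR", "PRECIP"]
met = rng.normal(size=(500, 6))
target = (2.0 * met[:, 0] - 1.5 * met[:, 1] + 1.0 * met[:, 4]
          + 0.8 * met[:, 5] + 0.1 * rng.normal(size=500))
selected, scores = top_k_by_pearson(target, met, names)
```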

Subsequently, samples were created for each prediction model based on the defined inputs and outputs. For a sample at time t, the input consists of the water quality and meteorological parameters from the previous 48 h, and the output is the value of the target water quality parameter at time t + 2. With 153 days of data, a total of 7341 samples were generated. These samples were then divided into a training set, a validation set, and a test set in a ratio of 70:15:15, resulting in 5139, 1101, and 1101 samples, respectively.
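The sample construction can be sketched as a sliding window. Several details are assumptions made for illustration: the window is taken to end at time t, `window` and `horizon` are counted in sampling steps, and the split is chronological (the text does not say whether samples were shuffled). The helper names are hypothetical.

```python
import numpy as np

def make_windows(features, target, window, horizon):
    """Pair `window` consecutive input steps ending at t with the target at t + horizon."""
    X, y = [], []
    for t in range(window - 1, len(target) - horizon):
        X.append(features[t - window + 1 : t + 1])
        y.append(target[t + horizon])
    return np.stack(X), np.array(y)

def chronological_split(n_samples, ratios=(0.70, 0.15)):
    """Return slices for the training, validation, and test sets (70:15:15 by default)."""
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return (slice(0, n_train),
            slice(n_train, n_train + n_val),
            slice(n_train + n_val, n_samples))

# Toy data: 20 time steps, 3 input variables.
features = np.arange(60, dtype=float).reshape(20, 3)
target = np.arange(20, dtype=float)
X, y = make_windows(features, target, window=4, horizon=2)

# Applying the split to the 7341 samples reported above.
train_idx, val_idx, test_idx = chronological_split(7341)
```

Rounding 70% and 15% of 7341 reproduces the reported set sizes of 5139, 1101, and 1101 samples.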

2.2. Construction of Water Quality Prediction Model

2.2.1. Gramian Angular Field (GAF)

The GAF is an image encoding technique that converts one-dimensional time-series data into two-dimensional images. It offers advantages such as preserving temporal information, enhancing feature extraction efficiency, and simplifying data processing. It treats each data point in the one-dimensional time-series data as a point in a vector space, calculates the cosine values of the angles between these points, and then maps these cosine values onto the pixels of a two-dimensional image, thereby generating an image that reflects the dynamic and periodic characteristics of the time series. The specific implementation steps are as follows:

(1) Data preprocessing: Standardize the original one-dimensional time-series data to normalize them within the range [0,1] in order to eliminate the influence of different dimensions on the results.
(2) Construct vector space: Treat the time-series data as vectors in a vector space.
(3) Calculate inner products: Compute the inner products between these vectors to form a Gram matrix. The Gram matrix is a symmetric matrix whose elements reflect the similarity between data at different time points.
(4) Calculate angles: Based on the Gram matrix, calculate the angles between the vectors at different time points.
(5) Generate angle matrix: Convert the angles into values between 0 and 1 to generate a new angle matrix.
(6) Image generation: Use the angle matrix as the pixel values to generate a two-dimensional image. Each pixel value in the image corresponds to the cosine value of the angle between different time points in the time-series data, thereby preserving the temporal information and dynamic characteristics of the data.
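The encoding can be sketched with the commonly used summation variant of the GAF, in which each rescaled value is mapped to an angle by arccos and pixel (i, j) is the cosine of the sum of two angles. This is an assumption about the variant: the steps above do not state whether the summation or difference form was used, and the function name and final rescaling to [0, 1] are illustrative choices.

```python
import numpy as np

def gramian_angular_field(series):
    """Encode a 1-D series as a 2-D image with pixel (i, j) = cos(phi_i + phi_j)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())       # rescale to [0, 1]
    phi = np.arccos(np.clip(x, 0.0, 1.0))         # angular encoding of each time point
    field = np.cos(phi[:, None] + phi[None, :])   # symmetric matrix in [-1, 1]
    return (field + 1.0) / 2.0                    # map to [0, 1] pixel values

# A 48 h window sampled every 30 min gives 96 points, hence a 96 x 96 image.
window = np.sin(np.linspace(0.0, 4.0 * np.pi, 96))
image = gramian_angular_field(window)
```

Each input variable yields one such image, so a multivariate window could be stacked into a multi-channel input for the ResNet; that stacking is likewise an assumption here.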

2.2.2. Residual Neural Network (ResNet)

Traditional neural networks enhance their feature extraction capabilities by increasing their width and depth. However, as the number of layers increases, issues such as gradient vanishing, gradient exploding, and degradation problems emerge. The ResNet addresses these issues by introducing residual connections (also known as shortcut connections) between the original input and output, enabling the network to learn residuals.

The residual block is the core building unit in the ResNet, used to construct deep networks. A residual block typically includes an input layer, convolutional layers, a shortcut connection (also known as a skip connection), and an output layer, as shown in Figure 3. The input layer receives feature maps from the previous layer. The convolutional layers extract features through convolution operations, and, between multiple convolutional layers, there are also batch normalization and ReLU activation functions to accelerate the training process and improve model stability. The shortcut connection directly connects the input layer to the output of the convolutional layers, forming a residual connection. The output layer sums the output of the convolutional layers with the result of the shortcut connection to obtain the output of the residual block.
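The shortcut idea can be shown with a deliberately simplified sketch. To keep it short, the convolutions are replaced by plain matrix multiplications and batch normalization is omitted, so this is not the block in Figure 3; only the structure "output = ReLU(F(x) + x)" is illustrated, and all names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute ReLU(F(x) + x) with F(x) = ReLU(x W1) W2 as the residual branch."""
    residual = relu(x @ w1) @ w2   # stand-in for conv -> BN -> ReLU -> conv -> BN
    return relu(residual + x)      # shortcut connection adds the input back

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(2, 8))        # batch of 2 feature vectors
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
out = residual_block(x, w1, w2)

# With zero weights the block reduces to the identity for non-negative inputs.
identity_out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

The zero-weight case shows why degradation is eased: a block that learns nothing still passes its input through unchanged.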

2.2.3. Long Short-Term Memory Neural Network

Long short-term memory (LSTM), by introducing unique gating mechanisms and memory cells, is capable of capturing dependencies over long time spans. The basic structural unit within LSTM is illustrated in Figure 4. The gating mechanisms consist of a forget gate, an input gate, and an output gate, which determine the flow and update of information at each time step. The forget gate decides which information in the memory cell should be forgotten. The input gate determines how much new information from the current input will be added to the memory cell. The output gate decides which information in the memory cell should be output as the hidden state for the current time step and passed to the next time step. The memory cell is the core of LSTM; through the gating mechanisms, it can selectively save, update, or delete information, thereby transmitting long-term information without being affected by the vanishing gradient problem.

The operations of an LSTM unit at time step t can be described using Equations (3)–(8):

f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f})

(3)

i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i})

(4)

{\tilde{C}}_{t} = \tanh (W_{C} [h_{t - 1}, x_{t}] + b_{C})

(5)

C_{t} = f_{t} * C_{t - 1} + i_{t} * {\tilde{C}}_{t}

(6)

o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o})

(7)

h_{t} = o_{t} * \tanh (C_{t})

(8)

Here, $f_{t}$, $i_{t}$, and $o_{t}$ are the outputs of the forget, input, and output gates, respectively; $\tilde{C}_{t}$ is the candidate memory cell state; $C_{t - 1}$ and $C_{t}$ are the memory state from the previous time step and the updated memory state at the current time step, respectively; $h_{t - 1}$ and $h_{t}$ are the hidden states at the previous and current time steps, respectively; $x_{t}$ is the input at the current time step; $W$ and $b$ are the weight matrices and bias terms, respectively; σ and $\tanh$ denote the sigmoid activation function and the hyperbolic tangent function, respectively; and $*$ denotes element-wise multiplication.
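Equations (3)–(8) translate almost line by line into NumPy. The sketch below performs a single time step; the dictionary layout of the parameters and the sizes used in the example are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (3)-(8)."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # Eq. (3) forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # Eq. (4) input gate
    c_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # Eq. (5) candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # Eq. (6) memory update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # Eq. (7) output gate
    h_t = o_t * np.tanh(c_t)                                    # Eq. (8) hidden state
    return h_t, c_t

# Hypothetical sizes: 3 input variables, 4 hidden units.
n_in, n_hidden = 3, 4
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden), params)
```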

2.2.4. Attention Mechanism

An attention mechanism is a method that mimics human visual and cognitive systems, allowing neural networks to focus on relevant parts of the input data during processing. By introducing an attention mechanism, neural networks can automatically learn and selectively pay attention to important information in the input, thereby improving the model’s performance and generalization capabilities.

ResNet-Dual-Attention

The convolutional block attention module (CBAM) is a lightweight and efficient attention mechanism module that combines both spatial and channel dimensions. Embedding the CBAM in ResNet can help ResNet better focus on important features, enhance its feature representation ability, and improve its efficiency and accuracy in processing information. Figure 5a depicts the overall structure after adding the CBAM to ResNet. Given an intermediate feature map, the CBAM sequentially infers attention maps along the two independent channel and spatial dimensions; then, it multiplies the attention maps with the input feature map to perform adaptive feature refinement.

The channel attention module, as shown in Figure 5b, generates a channel attention feature map using the inter-channel relationships of features, focusing on “what” is meaningful in the input image. Firstly, the spatial information of a feature map is aggregated through the average pooling and max pooling operations to generate two different spatial context descriptors. These are then fed into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer to generate a channel attention map. The calculation is shown in Equation (9):

M_{c}(F) = σ(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = σ(W_{1}(W_{0}(F_{avg}^{C})) + W_{1}(W_{0}(F_{max}^{C})))

(9)

where $F_{avg}^{C}$ and $F_{max}^{C}$ denote the features subjected to average pooling and max pooling, respectively, and $W_{1}$ and $W_{0}$ are the weight matrices of the MLP.
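A minimal NumPy sketch of Equation (9) is given below for a single feature map of shape (channels, height, width). The ReLU after $W_{0}$ follows the original CBAM design and is not written out in Equation (9); the reduction ratio and all names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def shared_mlp(descriptor, w0, w1):
    """Shared two-layer perceptron: W1 · ReLU(W0 · descriptor)."""
    return w1 @ np.maximum(w0 @ descriptor, 0.0)

def channel_attention(feature_map, w0, w1):
    """Equation (9): one attention weight per channel of a (C, H, W) feature map."""
    avg_desc = feature_map.mean(axis=(1, 2))   # F_avg^C, shape (C,)
    max_desc = feature_map.max(axis=(1, 2))    # F_max^C, shape (C,)
    return sigmoid(shared_mlp(avg_desc, w0, w1) + shared_mlp(max_desc, w0, w1))

rng = np.random.default_rng(0)
channels, reduction = 8, 4                      # hypothetical sizes
feature_map = rng.normal(size=(channels, 6, 6))
w0 = rng.normal(scale=0.1, size=(channels // reduction, channels))
w1 = rng.normal(scale=0.1, size=(channels, channels // reduction))

m_c = channel_attention(feature_map, w0, w1)
refined = m_c[:, None, None] * feature_map      # channel-wise reweighting
```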

The spatial attention module, as shown in Figure 5c, generates a spatial attention map using the spatial interrelationships of features, complementing channel attention by focusing on “where” the informative parts are. Firstly, the average pooling and max pooling operations are performed along the channel axis, and the results are concatenated to generate an efficient feature descriptor. This descriptor is then input into a standard convolutional layer to produce a 2D spatial attention map. The calculation is shown in the following equation:

M_{s}(F) = σ(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) = σ(f^{7 \times 7}([F_{avg}^{S}; F_{max}^{S}]))

(10)

Here, $7 \times 7$ denotes the size of the convolutional kernel, and $F_{avg}^{S}$ and $F_{max}^{S}$ represent the features subjected to average pooling and max pooling along the channel axis, respectively.
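Equation (10) can be sketched in the same style. The convolution is written as an explicit loop with zero padding so that the example needs only NumPy; the bias term is omitted, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Cross-correlate a (C, H, W) input with a (C, k, k) kernel using zero padding."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(feature_map, kernel):
    """Equation (10): one attention weight per spatial position of a (C, H, W) map."""
    avg_map = feature_map.mean(axis=0)         # F_avg^S, shape (H, W)
    max_map = feature_map.max(axis=0)          # F_max^S, shape (H, W)
    stacked = np.stack([avg_map, max_map])     # concatenation along the channel axis
    return sigmoid(conv2d_same(stacked, kernel))

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 10, 10))
kernel = rng.normal(scale=0.05, size=(2, 7, 7))   # 7 x 7 kernel over the 2 pooled maps

m_s = spatial_attention(feature_map, kernel)
refined = m_s[None, :, :] * feature_map           # position-wise reweighting
```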

LSTM-Dual-Attention

In the hidden layer of LSTM, each time step receives an input and the cell state from the previous time step, meaning that the current state of the layer depends on the state from the previous moment. For tasks with large temporal spans in time dependency, this “one-step” temporal dependency may limit the LSTM’s ability to model the dynamic characteristics of sequential signals. To achieve balanced control between learning short-term dependencies and long-term representations, in this study, a self-attention mechanism is introduced into the LSTM, with an additional recall gate added to capture the correlation between the current memory state and a certain past period, allowing the network to learn the influence weight of the state from a certain past period on the current state. Thus, by improving the recurrent transition function of the memory state, the update of the current memory state depends not only on the current input and the hidden state from the previous step but also on the fusion of the previous memory and the memory from a certain past period. The structure of the LSTM unit with the added recall gate is shown in Figure 6. The operation of the recall gate is shown in Equation (11), and the update of the current memory state is shown in Equation (12).

r_{f} = softmax (f_{t} \cdot {(C_{1 : t - 1})}^{T}) \cdot (C_{1 : t - 1})

(11)

C_{t} = i_{t} * {\tilde{C}}_{t} + LayerNorm (C_{t - 1} + r_{f})

(12)
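Equations (11) and (12) can be sketched as follows. The sketch assumes that the memory history $C_{1 : t - 1}$ is stored as an array with one row per past time step and that layer normalization has no learnable scale or shift; both are simplifications, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def recall_update(f_t, i_t, c_tilde, c_history):
    """Memory update with a recall gate.

    c_history has shape (t - 1, d); its last row is C_{t-1}.
    """
    weights = softmax(c_history @ f_t)                      # attention over past states
    r_f = weights @ c_history                               # Eq. (11) recalled memory
    c_t = i_t * c_tilde + layer_norm(c_history[-1] + r_f)   # Eq. (12)
    return c_t, weights

rng = np.random.default_rng(0)
d, steps = 4, 6
c_history = rng.normal(size=(steps, d))
f_t = rng.uniform(size=d)
i_t = rng.uniform(size=d)
c_tilde = np.tanh(rng.normal(size=d))
c_t, weights = recall_update(f_t, i_t, c_tilde, c_history)
```

The returned `weights` show how strongly each past memory state contributes to the current update.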

Furthermore, in this study, a global temporal attention mechanism is added to the output end of the LSTM, that is, the last layer of the LSTM structure. This mechanism assigns weights to the hidden layer outputs at different time steps and concatenates the weighted sum of these temporal vectors with the output of the last time step; the concatenated vector serves as the output of the LSTM channel. The attention-augmented network can thus highlight the importance of temporal information at different time steps, selectively focus on important information, and reduce the interference of unimportant information. The specific calculation steps are shown in Equations (13)–(18):

Q = ω_{Q} H_{D}

(13)

V_{t} = ω_{V} O_{t}

(14)

K_{t} = ω_{K} O_{t}

(15)

e_{t} = \frac{Q K_{t}^{T}}{\sqrt{d_{k}}}

(16)

a_{t} = \frac{\exp (e_{t})}{\sum_{j = 0}^{n} \exp (e_{j})}

(17)

z (Q, K, V) = \sum_{t = 0}^{n} a_{t} V_{t}

(18)

Here, $Q$, $K_{t}$, and $V_{t}$ represent the query, key, and value matrices, respectively. $d_{k}$ is the feature dimension of both the key and query matrices. $O_{t}$ denotes the output at time step $t$, and $H_{D}$ is the hidden state at the final time step. $ω_{Q}$, $ω_{K}$, and $ω_{V}$ are the parameters of the neural network, which are updated during backpropagation. Finally, the weighted sum of the outputs at each time step is calculated to obtain the feature vector $z$ with attention.
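Equations (13)–(18) can be sketched in NumPy as shown below. It is assumed here that $H_{D}$ equals the LSTM output at the last time step and that the attended vector is concatenated with that last output, as described in the text; the projection sizes and names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def global_temporal_attention(outputs, w_q, w_k, w_v):
    """Attend over all time steps using the final state as the query.

    outputs has shape (T, d_h), one row O_t per time step.
    """
    h_last = outputs[-1]                  # H_D, assumed equal to the last output
    q = w_q @ h_last                      # Eq. (13)
    v = outputs @ w_v.T                   # Eq. (14), one value per time step
    k = outputs @ w_k.T                   # Eq. (15), one key per time step
    e = k @ q / np.sqrt(k.shape[1])       # Eq. (16) scaled scores
    a = softmax(e)                        # Eq. (17) attention weights
    z = a @ v                             # Eq. (18) weighted sum of values
    return np.concatenate([z, h_last]), a

rng = np.random.default_rng(0)
T, d_h, d_k = 12, 8, 4
outputs = rng.normal(size=(T, d_h))
w_q = rng.normal(scale=0.1, size=(d_k, d_h))
w_k = rng.normal(scale=0.1, size=(d_k, d_h))
w_v = rng.normal(scale=0.1, size=(d_k, d_h))
features, attention = global_temporal_attention(outputs, w_q, w_k, w_v)
```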

2.2.5. DDA-ResNet-LSTM Hybrid Model

The proposed hybrid model comprises two parallel feature extraction channels: the ResNet and LSTM. In the ResNet channel, the GAF method is utilized to convert one-dimensional time-series data into two-dimensional grid point data that are well suited for processing by the ResNet. The ResNet is combined with the dual mechanisms of channel attention and spatial attention, enabling the model to adaptively and dynamically focus on environmental variables and critical moments that have a greater impact on the predicted water quality parameter, thereby enhancing the model’s feature representation ability. In the LSTM channel, a recall gate attention mechanism is introduced at the input end to enhance the temporal correlation of the data, and a global attention mechanism is introduced at the output end to emphasize the influence of certain important moments throughout the time series, comprehensively strengthening the model’s extraction of temporal features. Subsequently, fully connected layers are used for feature dimensionality reduction, and batch normalization is introduced to improve the network’s generalization performance. Finally, an end-to-end Dual-Channel and Dual-Attention ResNet-LSTM hybrid model, abbreviated as DDA-ResNet-LSTM, is established for water quality prediction, as shown in Figure 7.

2.3. Setting of Model Hyperparameters

After the model is constructed, the next step is to train the model. The computer configuration used in this experiment is Windows 11, with an i9-13900KF CPU, RTX 4090 GPU, and 64 GB of RAM. The integrated development environments are Anaconda3 and Matlab R2020a, and the programming languages are Python and Matlab.

The purpose of model training is to determine the optimal weight parameters and bias term sets in each neuron so that the entire neural network model achieves the highest degree of fitting to the learning target. Before training the model, it is necessary to first set some hyperparameters. These hyperparameters are not updated during the training process but ultimately directly affect the model’s prediction performance and operational efficiency by influencing the neural network’s structure and training process. In this study, a random search method was used to determine the optimal combination of hyperparameters for the model, with 50 iterations of random search. Table 3 lists the hyperparameter value ranges for the network model.

Using the hyperparameter combinations provided by the random search and the pre-divided training set, the DDA-ResNet-LSTM model was trained. Supervised learning was employed to train the model, with the root mean square error (RMSE) function serving as the loss function. The mathematical definition of the RMSE function is shown in Equation (19):

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\tilde{y}}_{i})}^{2}}

(19)

In the equation, y_{i} represents the actual value, {\tilde{y}}_{i} represents the model’s predicted value, and N represents the number of training batch samples. An end-to-end learning approach was adopted, where the weights of the neural network were continuously adjusted through the forward propagation and backpropagation of gradients. The iteration stopped when the preset number of iterations or training target was reached, completing the neural network training. Each combination of hyperparameters corresponded to a trained model, and this process continued until the search was completed. Finally, using the pre-divided validation dataset, the models were validated, and the validation results were measured using the mean squared error (MSE). The model with the smallest MSE was retained as the optimal model.
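The search procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in this study: the candidate values are those listed in Table 3, whereas the names `random_search`, `toy_validation_mse`, and the dictionary keys are hypothetical, and the toy scoring function merely stands in for training a model with the RMSE loss and measuring its MSE on the validation set.

```python
import random

# Candidate values as listed in Table 3.
SEARCH_SPACE = {
    "lstm_hidden_nodes": [72, 96, 120, 144],
    "learning_rate": [0.001, 0.005, 0.01, 0.05, 0.1],
    "batch_size": [16, 32, 64, 128],
    "max_epochs": [25, 50, 75, 100],
}


def random_search(evaluate, n_iter=50, seed=0):
    """Draw n_iter random combinations and keep the one with the lowest score.

    evaluate(params) is assumed to train one model with the given
    hyperparameters and return its validation MSE.
    """
    rng = random.Random(seed)
    best_params, best_mse = None, float("inf")
    for _ in range(n_iter):
        # One independent draw per hyperparameter (sampling with replacement).
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        mse = evaluate(params)
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


def toy_validation_mse(params):
    """Hypothetical stand-in for training plus validation; not the real model."""
    return params["learning_rate"] + 1.0 / params["batch_size"]


best_params, best_mse = random_search(toy_validation_mse)
```

Because the grid in Table 3 contains 4 × 5 × 4 × 4 = 320 combinations, 50 random draws evaluate only a fraction of it, which is the usual trade-off of random search against an exhaustive grid search.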

Following the above optimization and training methods, the final optimal hyperparameters for the DDA-ResNet-LSTM model were obtained, as shown in Table 4.

2.4. Model Evaluation

The prediction performance of the model was evaluated using the root mean square error (RMSE), mean absolute percentage error (MAPE), and Nash–Sutcliffe efficiency (NSE) coefficients. The calculation formulae are shown in Equations (19)–(21).

MAPE = \frac{1}{N} \sum_{i = 1}^{N} |\frac{y_{i} - {\tilde{y}}_{i}}{y_{i}}|

(20)

NSE = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\tilde{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(21)

In the formulae, y_{i} denotes the actual value, \bar{y} denotes the mean of the actual values, {\tilde{y}}_{i} denotes the predicted value of the model, and N represents the number of data points in the dataset used to evaluate the model’s performance. A smaller MAPE and RMSE, along with a higher NSE, indicate better prediction performance of the model.
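For reference, the three metrics in Equations (19)–(21) can be written directly in Python. This is a minimal sketch with hypothetical function names and toy data; MAPE is returned as a fraction rather than a percentage, which matches the form of Equation (20), and it is undefined when any actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, Eq. (19)."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction, Eq. (20)."""
    n = len(actual)
    return sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / n


def nse(actual, predicted):
    """Nash-Sutcliffe efficiency, Eq. (21)."""
    mean_actual = sum(actual) / len(actual)
    residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    spread = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - residual / spread


# Toy data (hypothetical): only the last prediction is off, by 2 units.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.0, 4.0, 6.0, 10.0]
scores = (rmse(observed, predicted), mape(observed, predicted), nse(observed, predicted))
```

An NSE of 1 corresponds to a perfect prediction, whereas an NSE of 0 means the model is no better than always predicting the mean of the observations.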

3. Results

3.1. Prediction Performance of DDA-ResNet-LSTM

3.1.1. The Performance of the Model on the Test Set

For each water quality parameter, the 1101 test set samples were input into the corresponding optimized DDA-ResNet-LSTM model to obtain the prediction results. The performance metrics of the models were calculated using Equations (19)–(21) and are listed in Table 5. The prediction accuracy for pH, dissolved oxygen (DO), and salinity (SAL) (with Nash coefficients of 0.9361, 0.9396, and 0.9342, respectively) is higher than that for chemical oxygen demand (COD), ammonia nitrogen (NH₃-N), nitrite (NO₂⁻), and active phosphate (AP) (with Nash coefficients of 0.8578, 0.8542, 0.8372, and 0.8294, respectively). Figure 8a–g presents scatter plots of the predicted versus actual values for the seven water quality parameters. The plots show a relatively tight and orderly distribution of data points, indicating that the proposed DDA-ResNet-LSTM model achieved high prediction accuracy for all seven water quality parameters. Figure 9 shows the min–max normalized errors between predicted and observed values for the seven water quality parameters on the test set.

3.1.2. Model Comparison

Ablation Experiments

The main architecture of the proposed DDA-ResNet-LSTM fusion model consists of two parallel branches, the ResNet and LSTM, with dual-attention mechanisms added to each branch. To demonstrate the parallel fusion effect of the ResNet and LSTM, the designed comparison models include Group A: ResNet, LSTM, DA-ResNet, and DA-LSTM. To illustrate the role of the dual-attention mechanism in the fusion model, the designed comparison models include Group B: ResNet-LSTM, DAResNet-LSTM, and ResNet-DALSTM. Brief introductions to each model are as follows:

ResNet: This model removes all attention mechanism structures from the DDA-ResNet-LSTM model and retains only the ResNet part.

LSTM: This model removes all attention mechanism structures from the DDA-ResNet-LSTM model and retains only the LSTM part.

DA-ResNet: This model removes LSTM and its related attention mechanism modules from the DDA-ResNet-LSTM model.

DA-LSTM: This model removes ResNet and its related attention mechanism modules from the DDA-ResNet-LSTM model.

ResNet-LSTM: This model removes all attention mechanism structures from the DDA-ResNet-LSTM model.

DAResNet-LSTM: This model removes the attention mechanism structure of the LSTM module from the DDA-ResNet-LSTM model.

ResNet-DALSTM: This model removes the attention mechanism structure of the ResNet module from the DDA-ResNet-LSTM model.

The prediction performance of each model on the test set is shown in Table 6 and Table 7, corresponding to Group A and Group B, respectively. It can be observed from Table 6 that the prediction accuracy of the single ResNet, the single LSTM, and their respective single-channel models with attention mechanisms added is lower than that of the proposed parallel dual-channel and dual-attention mechanism model. Specifically, compared to the single-channel model DA-ResNet (ResNet integrated with the proposed dual-attention mechanism), the Nash coefficients for predicting pH, DO, SAL, COD, NH₃-N, NO₂⁻, and AP increase by 12.76%, 12.58%, 11.68%, 18.35%, 19.32%, 16%, and 14.99%, respectively. Compared to the single-channel DA-LSTM model (LSTM integrated with the proposed dual-attention mechanism), the corresponding increases in Nash coefficients are 9.15%, 9.93%, 9.11%, 10.91%, 10.11%, 10.39%, and 10.2%, respectively. This suggests that the parallel fusion of the ResNet and LSTM helps improve the model’s prediction performance. Figure 10 presents the prediction performance of the models in Group B. It can be seen from Table 7 that the addition of dual-attention mechanisms to both ResNet and LSTM can enhance the model’s prediction performance. Compared to the ResNet-LSTM (ResNet and LSTM in parallel) model without the attention mechanism, the improvements in Nash coefficients are 1.91%, 2.4%, 0.74%, 3.41%, 2.71%, 3.55%, and 4.13%, respectively.
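The percentages quoted above are relative gains in the Nash coefficient, i.e., (NSE of the proposed model − NSE of the baseline) / NSE of the baseline. The short sketch below recomputes them for the DA-ResNet baseline from the values reported in Tables 5 and 6; the variable and function names are hypothetical and are introduced only for illustration.

```python
PARAMETERS = ["pH", "DO", "SAL", "COD", "NH3-N", "NO2-", "AP"]

# NSE of the proposed DDA-ResNet-LSTM model (Table 5).
NSE_PROPOSED = [0.9361, 0.9396, 0.9342, 0.8578, 0.8542, 0.8372, 0.8294]
# NSE of the single-channel DA-ResNet model (Table 6).
NSE_DA_RESNET = [0.8302, 0.8346, 0.8365, 0.7248, 0.7159, 0.7217, 0.7213]


def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


gains = {
    name: round(relative_gain_percent(new, base), 2)
    for name, new, base in zip(PARAMETERS, NSE_PROPOSED, NSE_DA_RESNET)
}
```

Replacing `NSE_DA_RESNET` with the DA-LSTM or ResNet-LSTM rows of Tables 6 and 7 gives the other two sets of percentages in the same way.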

Comparative Experiments

To validate the overall prediction effectiveness of the DDA-ResNet-LSTM fusion model proposed in this paper, a comparative study was conducted using time-series prediction models proposed in the literature in the last two years, including the DARLNet [19], Attention-ResNet-LSTM [21], CNN-CBAM-LSTM [25], and AC-BiLSTM [26] models. Brief introductions to each model are as follows:

DARLNet [19]: This model is composed of a parallel integration of a ResNet with a channel attention mechanism and LSTM.

Attention-ResNet-LSTM [21]: This model is composed of a serial integration of a ResNet, which incorporates a channel attention mechanism, and LSTM, which incorporates a global attention mechanism.

CNN-CBAM-LSTM [25]: This model is constructed by connecting a CNN and LSTM in series, with a temporal attention mechanism and a spatial attention mechanism sequentially integrated between them.

AC-BiLSTM [26]: This model is built by serially connecting a CNN and BiLSTM and incorporating a global attention mechanism within it.

The prediction performance of the various models on the test set is shown in Figure 10. The green line in Figure 10 represents the prediction performance of the proposed model, indicating that it outperforms the models in references [19,21,25,26]. This demonstrates the effectiveness of each module and of their integration in this hybrid model in improving the accuracy of water quality predictions.

3.2. Application of the Model

Taking the prediction of various water quality parameters on 1 October 2023 as the experimental subject, the prediction results of the proposed model are shown in Figure 11. It can be observed that, for the three parameters of pH, DO, and SAL, the model is able to capture the trends of the real data, sensitively identify subtle fluctuations in the data, and maintain high prediction accuracy. For COD, NH₃-N, NO₂⁻, and AP, while the model can effectively predict the overall trends of the data, there are larger errors in predicting inflection points and sudden changes.

4. Discussion

4.1. Analysis of the Model’s Prediction Effectiveness for Different Water Quality Parameters

As can be seen from Table 5 and Figure 8, the model proposed in this paper exhibits higher prediction accuracy for pH, DO, and SAL, while the prediction accuracy for COD, NH₃-N, NO₂⁻, and AP is relatively lower. pH, DO, and SAL vary relatively steadily in water bodies, are less affected by external factors, and follow relatively simple patterns, so the model can capture their regularities more easily. In contrast, NH₃-N, NO₂⁻, and AP are also influenced by various complex factors such as biological activity and pollutant discharge. Because many of these factors interact, the changes in these parameters are more drastic and complex, which makes them more difficult for the model to predict.

4.2. Model Contribution of Parallel Fusion Structure of ResNet and LSTM

As shown in Table 6, the model consisting of the parallel fusion of a ResNet and LSTM demonstrates superior prediction performance to a single ResNet, single LSTM, and the serial fusion models of a ResNet and LSTM. A ResNet possesses powerful local spatial feature extraction capabilities, but it lacks the ability to learn the overall temporal features of time series. Conversely, LSTM excels in extracting temporal features but has limitations in extracting local features. When these two models are coupled in parallel, the spatial local features extracted by the ResNet and the overall features extracted by LSTM are combined through a fully connected network. This approach allows for the comprehensive capture of multivariate time-series data features, thereby improving the prediction accuracy and robustness of the model.

When the ResNet and LSTM are fused in series, during the process of the ResNet extracting local features using convolutional operations, these operations typically only focus on features within a local window without considering the temporal relationship between these features and the current time point. This results in the extracted features lacking temporal dependence, inevitably leading to the loss of some temporal correlation information. In contrast, the parallel fusion approach avoids this issue by allowing the ResNet and LSTM to process the data simultaneously, preserving the temporal relationships in the features extracted by both models and enabling a more comprehensive understanding of the data.

4.3. Model Contribution of Dual-Attention Mechanisms

As seen in Table 7, the dual-attention mechanisms in both the ResNet and LSTM contribute to enhancing the prediction accuracy of the model. The channel attention mechanism in the ResNet dynamically and adaptively allocates weights to the 13 water quality and meteorological environmental parameters, emphasizing the role of variables that have a greater impact on the predicted parameter and suppressing unimportant information. The spatial attention mechanism focuses on the local features of these variables, highlighting the influence of critical moments that significantly affect the predicted parameter while inhibiting those that are less important.

The recall gate attention mechanism added to the input of LSTM enables the model to continuously focus on the impact of previous time periods on the current moment, maintaining and enhancing the temporal correlation between data points. The global self-attention mechanism added to the output dynamically adjusts the weights of the influence of different time steps on the current predicted water parameter. In so doing, the model can better understand the temporal dependencies within the data and make more accurate predictions.

In summary, the dual-attention mechanisms in both the ResNet and LSTM work synergistically to improve the model’s ability to capture the complex relationships between the different water quality parameters and meteorological parameters. By emphasizing important features and moments and by maintaining temporal correlations, the model is able to provide more accurate water quality predictions.

5. Conclusions

This study developed a data-driven model named DDA-ResNet-LSTM, which uses water quality and meteorological data from the past 24 h to predict seven water quality parameters (pH, DO, SAL, COD, NH₃-N, NO₂⁻, and AP) 2 h ahead in real time. The proposed DDA-ResNet-LSTM model combines the advantages of a CNN-based network and an RNN-based network by using a ResNet and LSTM to extract spatial correlations and temporal dependencies in parallel. The Gramian Angular Field (GAF) method is utilized to convert one-dimensional time-series data into two-dimensional grid point data that are well suited for processing by a ResNet. The ResNet is combined with the dual mechanisms of channel attention and spatial attention, enabling the model to adaptively and dynamically focus on environmental variables and critical moments that have a greater impact on the predicted water quality parameter. The LSTM incorporates a recall gate attention mechanism at the input end to enhance the temporal correlation of the data and a global attention mechanism at the output end to emphasize the influence of certain important moments throughout the time series.

The reliability of the proposed model was confirmed by the observation data from the offshore aquaculture environment monitoring system. The prediction accuracy of the proposed DDA-ResNet-LSTM model for pH, dissolved oxygen (DO), and salinity (SAL) (with Nash coefficients of 0.9361, 0.9396, and 0.9342, respectively) is higher than that for chemical oxygen demand (COD), ammonia nitrogen (NH₃-N), nitrite (NO₂⁻), and active phosphate (AP) (with Nash coefficients of 0.8578, 0.8542, 0.8372, and 0.8294, respectively). The dual-channel structure and dual-attention mechanism proposed in this paper can significantly improve the predictive performance of the model. Compared to the single-channel model DA-ResNet (ResNet integrated with the proposed dual-attention mechanism), the Nash coefficients for predicting pH, DO, SAL, COD, NH₃-N, NO₂⁻, and AP increase by 12.76%, 12.58%, 11.68%, 18.35%, 19.32%, 16%, and 14.99%, respectively. Compared to the single-channel DA-LSTM model (LSTM integrated with the proposed dual-attention mechanism), the corresponding increases in Nash coefficients are 9.15%, 9.93%, 9.11%, 10.91%, 10.11%, 10.39%, and 10.2%, respectively. Compared to the ResNet-LSTM (ResNet and LSTM in parallel) model without the attention mechanism, the improvements in Nash coefficients are 1.91%, 2.4%, 0.74%, 3.41%, 2.71%, 3.55%, and 4.13%, respectively. The predictive performance of the model meets the practical needs for precise prediction of water quality in offshore aquaculture.

In future research, we will focus on optimizing model hyperparameters and refining the design of the model in terms of input and output time ranges. Furthermore, we will broaden the temporal scope of our research by incorporating data from diverse environmental conditions. Specifically, we plan to cluster the data based on weather patterns and then develop prediction models for each cluster of data in order to enhance the applicability and accuracy of our model.

Author Contributions

W.L.: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review and editing. J.W.: Conceptualization, Methodology, Writing—review and editing. Z.L.: Software, Visualization, Writing—review and editing. Q.L.: Data curation, Visualization, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Key R&D Program of Shaanxi Province (2023-ZDLGY-15); Program for scientific research start-up funds of Guangdong Ocean University (060302112309); General Project of National Natural Science Foundation of China (62401162); New Generation Information Technology Special Project in Key Fields of Ordinary Universities in Guangdong Province (2020ZDZX3008); Key Special Project in the Field of Artificial Intelligence in Guangdong Province (2019KZDZX1046); Guangdong Youth Fund Project (2023A15151110770); Zhanjiang Marine Youth Talent Innovation Project (2023E0010).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cuenco, M.L.; Stickney, R.R.; Grant, W.E. Fish bioenergetics and growth in aquaculture ponds: II. Effects of interactions among size, temperature, dissolved oxygen, unionized ammonia and food on growth of individual fish. Ecol. Model. 1985, 27, 191–206.
2. Abdel-Tawwab, M.; Monier, M.N.; Hoseinifar, S.H.; Faggio, C. Fish response to hypoxia stress: Growth, physiological, and immunological biomarkers. Fish Physiol. Biochem. 2019, 45, 997.
3. Neilan, R.M.; Rose, K. Simulating the effects of fluctuating dissolved oxygen on growth, reproduction, and survival of fish and shrimp. J. Theor. Biol. 2014, 343, 54–68.
4. Jiang, X.; Dong, S.; Liu, R.; Huang, M.; Dong, K.; Ge, J. Effects of temperature, dissolved oxygen, and their interaction on the growth performance and condition of rainbow trout (Oncorhynchus mykiss). J. Therm. Biol. 2021, 98, 102928.
5. Sun, Y.; Lv, F.; Chen, Z. Spatial-temporal distribution and dynamics of dissolved oxygen in an adjacent area of the Changjiang estuary. Mar. Sci. 2021, 45, 86–96.
6. Shahi, T.B.; Shrestha, A.; Neupane, A. Stock Price Forecasting with Deep Learning: A Comparative Study. Mathematics 2020, 8, 1441.
7. Wai, K.P.; Chia, M.Y.; Koo, C.H.; Huang, Y.F.; Chong, W.C. Applications of deep learning in water quality management: A state-of-the-art review. J. Hydrol. 2022, 613, 128332.
8. Irwan, D.; Ali, M.; Ahmed, A.N. Predicting Water Quality with Artificial Intelligence: A Review of Methods and Applications. Arch. Computat. Methods Eng. 2023, 30, 4633–4652.
9. Fafoutellis, P.; Vlahogianni, E.I. Unlocking the Full Potential of Deep Learning in Traffic Forecasting Through Road Network Representations: A Critical Review. Data Sci. Transp. 2023, 5, 23.
10. Zhong, K.F.; Zhang, J.S.; Niu, W.J. A state-of-the-art review of long short-term memory models with applications in hydrology and water resources. Appl. Soft Comput. 2024, 167, 112352.
11. Heddam, S.; Kim, S.; Mehr, A.D. Predicting dissolved oxygen concentration in river using new advanced machines learning: Long-short term memory (LSTM) deep learning. In Computers in Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2022; pp. 1–20.
12. Huan, J.; Li, H.; Li, M.B. Prediction of dissolved oxygen in aquaculture based on gradient boosting decision tree and long short-term memory network: A study of Chang Zhou fishery demonstration base, China. Comput. Electron. Agric. 2020, 175, 105530.
13. Chen, H.; Yang, J.; Fu, X.; Zheng, Q.; Song, X.; Fu, Z. Water Quality Prediction Based on LSTM and Attention Mechanism: A Case Study of the Burnett River, Australia. Sustainability 2022, 14, 13231.
14. Wu, Y.; Sun, L.; Sun, X. A hybrid XGBoost-ISSA-LSTM model for accurate short-term and long-term dissolved oxygen prediction in ponds. Environ. Sci. Pollut. Res. 2022, 29, 18142–18159.
15. Wang, Z.; Wang, Q.; Wu, T. A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM. Front. Environ. Sci. Eng. 2023, 17, 88.
16. Arepalli, P.G.; Naik, K.J. A deep learning-enabled IoT framework for early hypoxia detection in aqua water using light weight spatially shared attention-LSTM network. J. Supercomput. 2024, 80, 2718–2747.
17. Bi, J.; Chen, Z.; Yuan, H.; Zhang, J. Accurate water quality prediction with attention-based bidirectional LSTM and encoder–decoder. Expert Syst. Appl. 2024, 238, 121807.
18. Lu, J.; Jiang, M.; Zhang, Y.; Lv, W.; Li, T. Typhoon Track Prediction Based on TimeForce CNN-LSTM Hybrid Model. In Computer and Information Science. ICIS 2022, Studies in Computational Intelligence; Lee, R., Ed.; Springer: Cham, Switzerland, 2022; p. 1055.
19. Qiu, X.; Yan, F.; Liu, H. A difference attention ResNet-LSTM network for epileptic seizure detection using EEG signal. Biomed. Signal Process. Control 2023, 83, 104652.
20. Uluocak, I.; Bilgili, M. Daily air temperature forecasting using LSTM-CNN and GRU-CNN models. Acta Geophys. 2024, 72, 2107–2126.
21. Yu, S.; Zhang, Z.; Wang, S.; Huang, X.; Lei, Q. A performance-based hybrid deep learning model for predicting TBM advance rate using Attention-ResNet-LSTM. J. Rock Mech. Geotech. Eng. 2024, 16, 65–80.
22. Barzegar, R.; Aalami, M.T.; Adamowski, J. Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2020, 34, 415–433.
23. Tan, W.; Zhang, J.; Wu, J.; Lan, H.; Liu, X.; Xiao, K. Application of CNN and long short-term memory network in water quality predicting. Intell. Autom. Soft Comput. 2022, 34, 1943–1958.
24. Wang, Z.; Duan, L.; Shuai, D. Research on water environmental indicators prediction method based on EEMD decomposition with CNN-BiLSTM. Sci. Rep. 2024, 14, 1676.
25. Zhang, M.; Zhang, Z.; Wang, X. The Use of Attention-Enhanced CNN-LSTM Models for Multi-Indicator and Time-Series Predictions of Surface Water Quality. Water Resour. Manag. 2024, 38, 6103–6119.
26. Wang, X.; Tang, X.; Zhu, M.; Liu, Z.; Wang, G. Predicting abrupt depletion of dissolved oxygen in Chaohu lake using CNN-BiLSTM with improved attention mechanism. Water Res. 2024, 261, 122027.

Figure 1. Offshore aquaculture environment monitoring system based on LoRa + 5G.

Figure 2. The location of the study area.

Figure 3. Structure of a residual block.

Figure 4. Structure diagram of LSTM unit.

Figure 5. (a) Overview of ResNet + CBAM. (b) Overview of channel attention module. (c) Overview of spatial attention module.

Figure 6. LSTM unit’s structure with recall gate.

Figure 7. Flowchart of DDA-ResNet-LSTM for water quality prediction.

Figure 8. Scatter plots of predicted versus observed values for 7 water quality parameters on the test set. (a) pH; (b) DO; (c) SAL; (d) COD; (e) NH₃-N; (f) NO₂⁻; (g) AP.

Figure 9. Boxplot of the min–max normalized errors between predicted and observed values for 7 water quality parameters on the test set. The positions of the ends of the rectangular box in the diagram correspond to the upper and lower quartiles (Q₃ and Q₁) of the data, respectively. The line segment inside the rectangular box represents the median of the data. The uppermost and lowermost line segments are located at Q₃ + 1.5IQR and Q₁ − 1.5IQR, respectively, where IQR (Interquartile Range) is Q₃ − Q₁, representing the cut-off points for outliers. The red plus signs indicate the presence of outliers.

Figure 10. Spider plots of prediction performance metrics for different water quality prediction models for comparison experiment. (a) RMSE; (b) MAPE; (c) NSE.

Figure 11. Comparison of predicted and observed values for 7 water quality parameters on 1 October 2023. The blue line represents the observed values, while the red line represents the predicted values. (a) pH; (b) DO; (c) SAL; (d) COD; (e) NH₃-N; (f) NO₂⁻; (g) AP.

Table 1. Statistical distribution of the data after preprocessing.

Category	Indicators	Mean ± SD	Range
Water quality parameters	Water temperature/°C	26.748 ± 3.421	18.83~33.78
	Salinity/‰	27.482 ± 3.421	25.26~35.73
	Chemical oxygen demand (COD)/(mg·L⁻¹)	2.251 ± 1.748	0.13~3.42
	pH value	7.880 ± 0.207	7.28~8.96
	Dissolved oxygen/(mg·L⁻¹)	7.325 ± 2.236	3.15~11.96
	Ammonia nitrogen/(mg·L⁻¹)	0.132 ± 0.062	0.02~0.43
	Nitrite/(mg·L⁻¹)	0.036 ± 0.014	0.016~0.18
	Active phosphate/(mg·L⁻¹)	0.126 ± 0.028	0.05~0.203
Meteorological parameters	Relative humidity/%	86.827 ± 7.835	70.93~93.74
	Pressure/kPa	101.428 ± 0.801	99.44~102.13
	Wind speed/(km·h⁻¹)	17.846 ± 7.862	6.50~61.35
	Wind direction/(°)	186.935 ± 68.727	15.85~360
	Solar radiation/(W·m⁻²)	759.956 ± 605.235	0.0~2031.67
	Rainfall/mm	3.527 ± 10.753	0.0~56.73

Table 2. Input corresponding to the prediction of different water quality parameters.

Output	pH	DO	COD	SAL	NH₃-N	NO₂⁻	AP
Input	WT	WT	WT	WT	WT	WT	WT
	pH	pH	pH	pH	pH	pH	pH
	DO	DO	DO	DO	DO	DO	DO
	COD	COD	COD	COD	COD	COD	COD
	SAL	SAL	SAL	SAL	SAL	SAL	SAL
	NH₃-N	NH₃-N	NH₃-N	NH₃-N	NH₃-N	NH₃-N	NH₃-N
	NO₂⁻	NO₂⁻	NO₂⁻	NO₂⁻	NO₂⁻	NO₂⁻	NO₂⁻
	AP	AP	AP	AP	AP	AP	AP
	RH	RH	RH	RH	SR	SR	SR
	PRESS	PRESS	PRESS	PRESS	PRECIP	PRECIP	PRECIP
	SR	SR	SR	SR	WS	WS	WS
	PRECIP	PRECIP	PRECIP	PRECIP	WD	WD	WD

Table 3. Hyperparameter value ranges of the network model.

Hyperparameter	Search Range
Number of hidden layer nodes in the LSTM	[72, 96, 120, 144]
Learning rate	[0.001, 0.005, 0.01, 0.05, 0.1]
Batch of samples	[16, 32, 64, 128]
Max epochs	[25, 50, 75, 100]

Table 4. Results of hyperparameter optimization for the model.

Predicted Water Quality Parameter	Number of Hidden Layer Nodes in the LSTM	Learning Rate	Batch of Samples	Max Epochs
pH	96	0.005	64	50
DO	96	0.005	64	50
COD	72	0.005	64	75
SAL	72	0.005	64	75
NH₃-N	120	0.005	128	75
NO₂⁻	120	0.005	128	75
AP	120	0.005	128	75

Table 5. Performance of DDA-ResNet-LSTM for different predicted parameters.

Evaluation Metrics	pH	DO	SAL	COD	NH₃-N	NO₂⁻	AP
RMSE	0.0512	0.4146	0.5839	0.3315	0.0219	0.0047	0.0108
MAPE	0.0052	0.0402	0.0178	0.1446	0.1205	0.0971	0.0667
NSE	0.9361	0.9396	0.9342	0.8578	0.8542	0.8372	0.8294

Table 6. Prediction performance parameters of different water quality prediction models in ablation experiment Group A.

Evaluation Metrics	Model	pH	DO	SAL	COD	NH₃-N	NO₂⁻	AP
RMSE	ResNet	0.0843	0.7866	0.8167	0.5387	0.0594	0.0078	0.0406
	DA-ResNet	0.0824	0.7321	0.8025	0.5274	0.0572	0.0076	0.0381
	LSTM	0.0746	0.6219	0.7452	0.4983	0.0515	0.0071	0.0328
	DA-LSTM	0.0732	0.6024	0.7306	0.4896	0.0492	0.0068	0.0306
	DDA-ResNet-LSTM	0.0512	0.4146	0.5839	0.3315	0.0219	0.0047	0.0108
MAPE	ResNet	0.0080	0.0673	0.0472	0.3972	0.3564	0.3021	0.0953
	DA-ResNet	0.0078	0.0668	0.0457	0.3854	0.3427	0.2965	0.0942
	LSTM	0.0073	0.0627	0.0412	0.3465	0.2993	0.2534	0.0893
	DA-LSTM	0.0071	0.0612	0.0396	0.3342	0.2794	0.2476	0.0879
	DDA-ResNet-LSTM	0.0052	0.0402	0.0178	0.1446	0.1205	0.0971	0.0667
NSE	ResNet	0.8287	0.8294	0.8293	0.7021	0.7120	0.7176	0.7158
	DA-ResNet	0.8302	0.8346	0.8365	0.7248	0.7159	0.7217	0.7213
	LSTM	0.8512	0.8502	0.8502	0.7689	0.7639	0.7531	0.7486
	DA-LSTM	0.8576	0.8547	0.8562	0.7734	0.7758	0.7584	0.7523
	DDA-ResNet-LSTM	0.9361	0.9396	0.9342	0.8578	0.8542	0.8372	0.8294

Table 7. Prediction performance parameters of different water quality prediction models in ablation experiment Group B.

Evaluation Metrics	Model	pH	DO	SAL	COD	NH₃-N	NO₂⁻	AP
RMSE	ResNet-LSTM	0.0585	0.4626	0.6238	0.3883	0.0332	0.0056	0.0169
	DAResNet-LSTM	0.0559	0.4425	0.6023	0.3693	0.0305	0.0052	0.0139
	ResNet-DALSTM	0.0523	0.4278	0.5896	0.3486	0.0287	0.0049	0.0122
	DDA-ResNet-LSTM	0.0512	0.4146	0.5839	0.3315	0.0219	0.0047	0.0108
MAPE	ResNet-LSTM	0.0058	0.0484	0.0247	0.2088	0.1778	0.1393	0.0743
	DAResNet-LSTM	0.0055	0.0435	0.0225	0.1794	0.1427	0.1023	0.0713
	ResNet-DALSTM	0.0053	0.0426	0.0193	0.1621	0.1396	0.0984	0.0692
	DDA-ResNet-LSTM	0.0052	0.0402	0.0178	0.1446	0.1205	0.0971	0.0667
NSE	ResNet-LSTM	0.9186	0.9176	0.9273	0.8295	0.8317	0.8085	0.7965
	DAResNet-LSTM	0.9287	0.9275	0.9203	0.8402	0.8412	0.8214	0.8112
	ResNet-DALSTM	0.9303	0.9321	0.9286	0.8495	0.8496	0.8296	0.8186
	DDA-ResNet-LSTM	0.9361	0.9396	0.9342	0.8578	0.8542	0.8372	0.8294

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
