1. Introduction
Water quality management is crucial in aquaculture, as it directly relates to the health and growth efficiency of the farmed fish. Good water quality can help reduce the occurrence of diseases, improve feed conversion rates, promote fish growth, and ultimately enhance economic benefits [1,2]. Therefore, during the process of aquaculture, great importance should be attached to water quality management to ensure that water conditions are appropriate, stable, and safe. Water quality management involves monitoring and maintaining indicators such as dissolved oxygen, pH value, ammonia nitrogen, and nitrite within appropriate ranges in the water body, as well as reducing water pollution through reasonable feed and substrate management [3,4].
The water quality parameters in aquaculture are the result of the interaction of various physical, chemical, and biological processes. These processes are intertwined and mutually influential, causing the water quality parameters to exhibit characteristics such as nonlinearity, coupling, and time-variability [5]. Water quality parameters can be forecasted by developing a series of coupled differential equations or dynamic models that mirror changes in water quality. However, this approach requires thorough consideration of the interactions among diverse water quality indicators and of the influence of environmental factors on water quality, which makes it theoretically complex. Consequently, it demands considerable professional expertise and computational resources to formulate and solve. In recent years, artificial intelligence technology, especially deep learning algorithms, has demonstrated powerful capabilities in modeling complex nonlinear systems, making it widely used in tasks such as water quality parameter prediction, stock price prediction, and traffic flow prediction [6,7,8,9,10]. Deep learning models possess multi-layered structures with nonlinear activation functions, which enable them to capture high-dimensional features and nonlinear relationships in data and to approximate complex functions.
Recurrent neural networks (RNNs) are deep learning architectures specifically designed for processing sequential data. They possess feedback connections that enable the network to utilize information from previous time steps when processing current inputs, making them highly effective in handling time-series data with significant temporal dependencies. As one of the most popular variants of RNNs, long short-term memory (LSTM) networks effectively mitigate the vanishing- and exploding-gradient problems that arise during RNN training and have become a mainstream approach for time-series prediction [11]. Huan et al. [12] proposed a dissolved oxygen (DO) prediction model that combines gradient boosting decision trees (GBDTs) with LSTM networks. Chen et al. [13] established an LSTM network and its attention-based model (AT-LSTM) to predict water quality in the Burnett River in Australia. The research results indicated that the incorporation of the attention mechanism improved the prediction performance of the LSTM model. Wu et al. [14] proposed a novel hybrid DO prediction model based on LSTM optimized using an improved sparrow search algorithm (ISSA). Wang et al. [15] introduced a short-term water quality prediction model based on variational mode decomposition (VMD) and an improved grasshopper optimization algorithm (IGOA) to optimize LSTM neural networks. Arepalli et al. [16] presented a lightweight spatial shared attention LSTM (SSA-LSTM) model for the accurate prediction of hypoxic conditions. Bi et al. [17] proposed a water quality prediction model that combines VMD, a bidirectional input attention mechanism, an encoder–decoder, and bidirectional long short-term memory (Bi-LSTM) fusion.
LSTM possesses powerful capabilities in extracting temporal features, but it has limitations in extracting local features from input data. A convolutional neural network (CNN), on the other hand, is another specially designed deep learning model. By combining convolutional operations with deep hierarchical structures, a CNN can automatically extract local features from data and build higher-level abstract representations layer by layer. It excels in processing grid-like data such as images, videos, and audio. Residual networks (ResNets) are an important variant of CNNs, which introduce residual blocks to address the degradation problem in training deep networks. In recent years, many scholars have embarked on exploring the integration of CNNs or ResNets with LSTM, aiming to fully leverage the strengths of both to construct more sophisticated and powerful models for processing spatiotemporal data with grid structures [18,19,20,21].
Barzegar et al. [22] first proposed a hybrid CNN-LSTM model for predicting water quality parameters. The results demonstrated that the hybrid model outperformed individual models (LSTM, CNN, support vector regression (SVR), and decision tree (DT) models) in predicting DO and chlorophyll-a (Chl-a). Tan et al. [23] constructed a neural network model combining CNN and LSTM networks to predict DO. Experimental results showed that this model provided more accurate predictions, especially in terms of peak fitting, compared to the traditional LSTM. Wang et al. (2024) [24] proposed a hybrid water quality prediction model based on ensemble empirical mode decomposition (EEMD), which combines a CNN and BiLSTM. The results showed that the proposed model improved the R² index by 5%, 7%, and 5% compared to the suboptimal model for predictions at 4 h, 1 day, and 2 days, respectively. In the aforementioned literature, the CNN and LSTM are connected in series. First, the CNN learns the sequential features of the input, and the extracted features are then passed to the LSTM. Finally, the LSTM handles long-distance dependencies to predict the target value. Additionally, many scholars have attempted to integrate the attention mechanism into the series-connected CNN-LSTM model. Zhang et al. [25] integrated a spatial attention mechanism (SAM) and a temporal attention mechanism (TAM) into the CNN-LSTM model to build a multi-index and time-series prediction model for surface water quality. The results indicate that the model incorporating the attention mechanism outperforms the CNN-LSTM model. Furthermore, the model that integrates two attention mechanisms exhibits superior prediction performance compared to the CNN-LSTM model with only a single attention mechanism. Wang et al. [26] proposed a novel coupled model, AC-BiLSTM, which combines CNN and BiLSTM with an attention mechanism (AM), to address the discontinuous dynamic changes in DO over long time series. Compared to the BiLSTM and CNN-BiLSTM models, AC-BiLSTM exhibits superior performance based on evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). Additionally, AC-BiLSTM possesses a stronger capability to capture global dependencies. The results from the aforementioned literature indicate that incorporating an attention mechanism module into time-series prediction models can significantly enhance the model’s ability to capture key information and dynamic changes in the data. By computing the correlation weights between different time steps or features, the attention mechanism allows the model to adaptively focus on the parts that have the greatest impact on the prediction results, thereby improving the accuracy of the predictions.
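As an illustration of the weighting step described above, the following minimal Python sketch computes attention weights over the time steps of a short sequence. It is not the mechanism used by any of the cited models: the dot-product scoring, the function names `softmax` and `attend`, and the toy hidden states are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Weight each time step by its dot-product similarity to `query`.

    hidden_states: list of T feature vectors, one per time step.
    query: a single feature vector of the same length.
    Returns (weights, context); the weights sum to 1 and `context`
    is the weighted sum of the hidden states.
    """
    # Assumed scoring function: plain dot product between state and query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for three time steps with two features each.
states = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
weights, context = attend(states, query=[1.0, 0.0])
```

In this toy case the second time step receives the largest weight because it is most similar to the query, which is the sense in which the model "focuses" on the most influential moments.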
Concurrently, in other domains of time-series prediction, researchers have embarked on exploring hybrid models with a parallel structure that integrates a CNN and LSTM. Qiu et al. (2023) [19] introduced a model named differential attention residual network long short-term memory (DARLNet) specifically for predicting epileptic seizures. This model comprises two parallel channels: ResNet and LSTM. The ResNet is responsible for capturing local correlation features from the input electroencephalogram (EEG) signals, whereas LSTM handles the extraction of temporal dependency features. Ultimately, the high-level seizure features extracted from both channels are concatenated and fed into a fully connected (FC) layer for feature fusion and seizure detection. The findings reveal that, in comparison to several existing seizure detection methods, this model demonstrates superior prediction performance.
In summary, the hybrid model combining a CNN and an RNN (particularly LSTM) has emerged as a new research trend in the field of time-series prediction. The hybrid model performs better due to its ability to simultaneously extract both local and global features from time-series data, and a parallel arrangement of the CNN and LSTM can offer advantages over a serial one. Furthermore, integrating an attention mechanism into the hybrid model enables the model to focus more on important features or critical information in the input data, thereby improving the model’s accuracy and efficiency. In light of this, this paper proposes a hybrid model named DDA-ResNet-LSTM for water parameter prediction in offshore aquaculture, which incorporates dual-channel and dual-attention mechanisms. Unlike previous water quality prediction models, DDA-ResNet-LSTM adopts a parallel structure combining a ResNet and LSTM. Additionally, the attention mechanism in this model is designed to be more comprehensive. In the ResNet channel, the Gramian Angular Field (GAF) method is utilized to convert one-dimensional time-series data into two-dimensional grid point data that CNNs excel at processing. Combined with dual mechanisms of channel attention and spatial attention, the model can adaptively and dynamically focus on environmental variables and critical moments that have a greater impact on the predicted parameter, thereby enhancing its feature representation capabilities. In the LSTM channel, a recall gate attention mechanism is introduced at the input end to enhance the temporal correlation of the data, and a global attention mechanism is introduced at the output end to highlight the influence of certain important moments throughout the entire time series, thus improving the model’s ability to extract temporal features. Finally, a fully connected layer is used to reduce the dimensionality of the features extracted from both channels, resulting in an end-to-end DDA-ResNet-LSTM hybrid model.
This design not only fully leverages the advantages of a CNN and LSTM but also further enhances the model’s prediction performance by incorporating an attention mechanism, providing a new and effective method for dissolved oxygen prediction in offshore aquaculture.
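The GAF conversion mentioned above can be sketched as follows. This is a minimal, dependency-free illustration and not the implementation used in this work: it assumes the summation variant of the Gramian Angular Field, min-max rescaling to [-1, 1], and a hypothetical function name `gramian_angular_field`. The sample pH-like values are invented for the example.

```python
import math


def gramian_angular_field(series):
    """Return the Gramian Angular Summation Field of a 1-D series.

    series: list of floats with at least two distinct values.
    The result is a T x T nested list whose (i, j) entry is
    cos(phi_i + phi_j), where phi = arccos(rescaled value).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every point.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point drift just outside [-1, 1].
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


# Hypothetical four-point series (values chosen only for illustration).
image = gramian_angular_field([6.8, 7.1, 7.4, 7.0])
```

Each variable's window of T observations becomes a T x T symmetric image in which entry (i, j) encodes the relation between time steps i and j, which is the grid-like form that convolutional layers are designed to process.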
5. Conclusions
This study developed a data-driven model named DDA-ResNet-LSTM, which uses historical water quality data and meteorological data from the past 24 h to make real-time predictions of seven water quality parameters 2 h in the future, i.e., pH, DO, SAL, COD, NH3-N, NO2−, and AP. The proposed DDA-ResNet-LSTM model combines the advantages of a CNN-based network and an RNN-based network by using a ResNet and LSTM to extract spatial correlations and temporal dependencies in parallel. The Gramian Angular Field (GAF) method is used to convert one-dimensional time-series data into two-dimensional grid point data that are well suited for processing by a ResNet. In addition, the ResNet is combined with the dual mechanisms of channel attention and spatial attention, enabling the model to adaptively and dynamically focus on environmental variables and critical moments that have a greater impact on the predicted water quality parameter. The LSTM incorporates a recall gate attention mechanism at the input end to enhance the temporal correlation of the data and a global attention mechanism at the output end to emphasize the influence of certain important moments throughout the time series.
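The input-output arrangement described above (24 h of history, a target 2 h ahead) can be illustrated with a simple sliding-window routine. The sketch below is only an assumption-laden example: the function name `make_windows`, the hourly sampling interval, and the synthetic rows are hypothetical and are not taken from this study's data pipeline.

```python
def make_windows(rows, target_index, history_steps, horizon_steps):
    """Pair each window of past observations with a future target value.

    rows: chronologically ordered list of feature vectors, one per time step.
    target_index: position of the predicted parameter within each row.
    history_steps: number of past steps in each input window.
    horizon_steps: number of steps between the window's last row and the target.
    """
    samples = []
    last_start = len(rows) - history_steps - horizon_steps
    for start in range(last_start + 1):
        end = start + history_steps  # exclusive end of the input window
        target_row = rows[end - 1 + horizon_steps]
        samples.append((rows[start:end], target_row[target_index]))
    return samples


# Assumed hourly sampling: 24 steps of history, target 2 steps ahead.
# Each synthetic row holds [time index, a made-up parameter value].
rows = [[float(t), 7.0 + 0.01 * t] for t in range(30)]
samples = make_windows(rows, target_index=1, history_steps=24, horizon_steps=2)
```

With a different sampling interval, `history_steps` and `horizon_steps` would be scaled accordingly (for example, 48 and 4 for 30 min data).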
The reliability of the proposed model was confirmed by the observation data from the offshore aquaculture environment detection system. The prediction accuracy of the proposed DDA-ResNet-LSTM model for pH, dissolved oxygen (DO), and salinity (SAL) (with Nash coefficients of 0.9361, 0.9396, and 0.9342, respectively) is higher than that for chemical oxygen demand (COD), ammonia nitrogen (NH3-N), nitrite (NO2−), and active phosphate (AP) (with Nash coefficients of 0.8578, 0.8542, 0.8372, and 0.8294, respectively). The dual-channel structure and dual-attention mechanism proposed in this paper can significantly improve the predictive performance of the model. Compared to the single-channel model DA-ResNet (ResNet integrated with the proposed dual-attention mechanism), the Nash coefficients for predicting pH, DO, SAL, COD, NH3-N, NO2−, and AP increase by 12.76%, 12.58%, 11.68%, 18.35%, 19.32%, 16%, and 14.99%, respectively. Compared to the single-channel DA-LSTM model (LSTM integrated with the proposed dual-attention mechanism), the corresponding increases in Nash coefficients are 9.15%, 9.93%, 9.11%, 10.91%, 10.11%, 10.39%, and 10.2%, respectively. Compared to the ResNet-LSTM (ResNet and LSTM in parallel) model without the attention mechanism, the improvements in Nash coefficients are 1.91%, 2.4%, 0.74%, 3.41%, 2.71%, 3.55%, and 4.13%, respectively. The predictive performance of the model meets the practical needs for precise prediction of water quality in offshore aquaculture.
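For readers who wish to reproduce this kind of evaluation, the sketch below computes a Nash coefficient under the assumption that the term refers to the standard Nash-Sutcliffe efficiency, i.e., one minus the ratio of the squared prediction error to the variance of the observations about their mean. The function name `nash_sutcliffe` and the sample values are hypothetical, and the excerpt does not state whether the reported percentage gains are relative or absolute differences, so no gain calculation is shown.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    A value of 1 indicates a perfect match; 0 means the model is no better
    than always predicting the mean of the observations.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations must not all be identical")
    return 1.0 - residual / spread


# Made-up observations used only to exercise the function.
obs = [1.0, 2.0, 3.0, 4.0]
perfect_score = nash_sutcliffe(obs, obs)
mean_score = nash_sutcliffe(obs, [2.5, 2.5, 2.5, 2.5])
```

Because the coefficient is normalized by the variance of the observations, it allows parameters with different units and ranges (such as pH and COD) to be compared on a common scale.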
In future research, we will focus on optimizing model hyperparameters and refining the design of the model in terms of input and output time ranges. Furthermore, we will broaden the temporal scope of our research by incorporating data from diverse environmental conditions. Specifically, we plan to cluster the data based on weather patterns and then develop prediction models for each cluster of data in order to enhance the applicability and accuracy of our model.