Dissolved Oxygen Forecasting in Aquaculture: A Hybrid Model Approach

Dissolved oxygen (DO) concentration is a vital parameter that indicates water quality. We present short-term DO forecasting using time series analysis on data collected from an aquaculture pond. Such forecasts can provide the data support for an early warning system and improved management of aquaculture farms. Conventional forecasting approaches are commonly characterized by low accuracy and poor generalization. In this article, we present a novel hybrid DO concentration forecasting method based on ensemble empirical mode decomposition (EEMD) and a long short-term memory (LSTM) neural network (NN). With this method, the integrity of the sensor data is first improved through linear interpolation and moving average filtering in a data preprocessing step. Next, the EEMD algorithm decomposes the original sensor data into multiple intrinsic mode functions (IMFs). Finally, feature selection retains the IMFs that correlate strongly with the original sensor data, and these are used as inputs to the NN. The hybrid EEMD-based LSTM forecasting model is then constructed, and its performance on the training and validation sets is compared with the observed real sensor data. To evaluate the accuracy of the forecasted results, four statistical performance indices were adopted: mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE). Results for the short term (12 h) and the long term (1 month) are encouraging, indicating the suitability of this technique for forecasting DO values.


Introduction
Water quality (WQ) is usually determined by the general composition of water in relation to its physical, biological, and chemical properties [1,2]. Dissolved oxygen (DO) is undoubtedly one of the most important components for the survival of aquatic life, and DO concentration is an important WQ indicator of water pollution in the aquaculture ecosystem [3]. In aquaculture farms, the required DO concentration typically depends on the fish species and the water temperature. However, concentrations below 3 mg/L cause stress in aquatic species, increasing mortality and disease in most species. If such a drop in DO concentration can be accurately forecasted, aquaculture farmers can take early remediation actions to avert losses by increasing the DO concentration, for example, through activating an aeration system [4,5].
Given the crucial role of DO in aquafarming [6], short-term forecasting of DO concentration is critical for good WQ management in aquaculture. It enables aquaculture farmers to foresee falling DO concentrations in their farms and gives them time to take remedial action to avoid loss of or damage to aquatic life. It is therefore imperative that short-term forecasting models are adopted in aquaculture for regular monitoring and control of DO concentration, to prevent the death of aquatic life that results from low DO concentration. In other words, the need for efficient DO concentration forecasting models in the aquaculture industry cannot be overemphasized.
Given the above-mentioned importance of water quality monitoring [7][8][9], especially of DO concentration for improved productivity in aquaculture, this paper proposes a hybrid prediction model to address the challenge of poor DO concentration forecasting accuracy in aquaculture management. The hybrid model combines the ensemble empirical mode decomposition (EEMD) technique with a long short-term memory (LSTM) neural network (NN). The EEMD-based LSTM method decomposes the original sensor data into multiple intrinsic mode functions (IMFs), which are used to improve DO concentration forecasting accuracy.
The rest of the paper is organized as follows: Section 2 presents the related literature review. Section 3 describes the methodology used in this study. Section 4 contains the experiments and results. Section 5 presents the general discussion, and Section 6 concludes the paper.

Related Literature Review
Several models based on different prediction methods have been developed for DO concentration forecasting in aquaculture ecosystems [10][11][12][13][14][15][16]. Xiao et al. [10] applied a back propagation (BP) NN with a combination of purelin, logsig, and tansig activation functions to propose a prediction model for DO concentration in aquaculture. Wijayanti [11] proposed a forecasting model based on a smooth support vector machine (SSVM) for short-term forecasting of aquaculture water quality. Guo et al. [12] proposed a numeric forecasting model for DO status through two-stage training for classification-driven regression (CDR). Xue et al. [13] applied a neural network and a decision tree to build a forecasting and warning system for DO concentration in carp aquaculture; in practical application, their system uses the neural network and the decision tree to forecast DO concentration and to conduct early warning by value forecasting and rule-based reasoning, respectively. Liu et al. [14] proposed a prediction model for water quality in smart mariculture with a deep bi-directional stacked simple recurrent unit (Bi-S-SRU) learning network. Yan et al. [15] applied a deep belief network and least squares support vector regression (LSSVR) to propose a forecasting model based on cross-section water quality. Furthermore, Liu et al. [16] used support vector regression (SVR) to propose a hybrid forecasting approach with genetic algorithm optimization for the DO content of aquaculture ponds.
Although the studies reported above have shown good performance, they share a common weakness: they are limited to single-scale features of the dataset used for training, i.e., each study only captures the surface features of its dataset. However, studies have shown that multi-scale prediction methods can obtain more features of the forecasted signals by decomposing the original signal into several sub-sequences [17,18]. Each sub-sequence reveals a disparate intrinsic feature of the original signal. The empirical mode decomposition (EMD) method is usually applied to decompose an original signal into its intrinsic multi-scale characteristics [19]. Prediction methods based on a signal's multi-scale characteristics are widely applied in fields such as short-term rainfall forecasting [20], short-term traffic flow prediction [21][22][23], and short-term wind power forecasting [24,25]. In the field of water quality forecasting for aquaculture environments, Li et al. [17] applied the ensemble empirical mode decomposition method to propose an efficient hybrid model for DO concentration forecasting in aquaculture based on the multi-scale features of the original signal, in order to increase the forecasting accuracy of DO content [26]. Their experimental results established that the EEMD method is reliable and effective for forecasting DO concentration in intensive aquafarming. From this literature, it can be seen that hybrid forecasting models based on the multi-scale features of the original datasets are not only effective and reliable, but also well suited to water quality forecasting in aquaculture. Hence, our study develops a novel, accurate hybrid forecasting model for DO content prediction in aquaculture environments by combining the strengths of the EEMD method with an LSTM [27] neural network.
Although a similar study to our proposed hybrid EEMD-based LSTM forecasting model was carried out by Li et al. [17], as shown above, their study applied least squares support vector regression (LSSVR), a back propagation (BP) neural network, and a radial basis function (RBF) neural network. In principle, a linear system is solvable by LSSVR [25], but its limitation lies in handling large datasets because of its high computational complexity, typically of order O(n^3) (where n is the size of the training set). This severely limits the benefit of applying LSSVR in large-scale applications. Additionally, like the multilayer perceptron neural network (MLPNN), the artificial NNs used in [17], namely the back-propagation neural network (BPNN) and the radial basis function neural network (RBFNN), share the long-term dependency problem. In this study, we use an LSTM neural network because of its ability to overcome these problems. Hence, as opposed to the existing solutions, our proposed hybrid EEMD-based LSTM water quality forecasting model addresses the research gap identified above, using the EEMD method and an LSTM neural network.

Proposed Model Design
The proposed forecasting model combines the EEMD method and the LSTM neural network technique to form the hybrid EEMD-based LSTM forecasting model. The main implementation processes of the EEMD method and the LSTM neural network are described in the following subsections.

Ensemble Empirical Mode Decomposition
The EMD method [28] is a widely applied non-linear, adaptive signal decomposition method. The EMD algorithm has demonstrated great potential in decomposing non-stationary and non-linear time series data into IMFs and a residual through an iterative process, each component having its own intrinsic time scale properties [17]. Ensemble EMD (EEMD) is an improved version of EMD, developed by Torres et al. [29], aimed at overcoming the intrinsic mode-mixing drawback of the conventional EMD algorithm [17].
EEMD is a noise-aided time series data analysis method. In EEMD, white noise is added to enable the separation of contrasting time series scales, which in turn improves the decomposition efficiency of the EMD method. The added white noise comprises components of disparate scales that systematically fill the entire time-frequency space. As this uniformly distributed white noise is introduced to the signal, the disparate scale components of the signal are spontaneously projected onto the proper scales of reference established by the Gaussian white noise. Since every decomposed component of a single trial contains both the signal and the added white noise, each individual trial yields a noisy result. However, the white noise can be almost completely cancelled out by taking the ensemble mean over all trials, because the white noise added in each trial is independent of the others [30]. Consequently, the ensemble mean represents the actual underlying components of the water quality time series data. In other words, the EEMD method sums the components across trials and adopts the average as the true decomposition result. This resolves the mode-mixing drawback of the conventional EMD method and makes EEMD a useful method for extracting the underlying, crucial components of water quality time series data.
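The noise-cancellation property of the ensemble mean can be illustrated with a short numerical sketch (a toy example, not the authors' implementation): averaging many independently noise-corrupted copies of a signal recovers the signal, because the added white noise shrinks roughly as 1/sqrt(M).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 500))   # clean "signal"
M = 200                                      # ensemble number of trials
noise_amp = 0.2                              # amplitude of the added white noise

# each trial adds an independent Gaussian white-noise realization
trials = np.stack([x + noise_amp * rng.standard_normal(x.size) for _ in range(M)])

# the ensemble mean cancels the noise: residual std is about noise_amp / sqrt(M)
ensemble_mean = trials.mean(axis=0)
```

With M = 200 trials the residual noise standard deviation is about 0.2/sqrt(200) ≈ 0.014, so the ensemble mean is nearly indistinguishable from the clean signal, while any single trial remains visibly noisy.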
For the DO time series data x(t), the EEMD method proceeds as follows.
Stage 1: Initialize the ensemble number M and the amplitude of the added Gaussian white noise.
Stage 2: Perform the m-th trial by adding a distinct white noise W_m(t) to x(t), generating the noise-augmented time series x_m(t) = x(t) + W_m(t).
Stage 3: Determine all the local minima and maxima of x_m(t) and use them to generate the lower and upper envelopes with cubic spline interpolation.
Stage 4: Compute the mean m_1(t) of the lower and upper envelopes.
Stage 5: Calculate the difference h_1(t) = x_m(t) − m_1(t) between the signal and the envelope mean computed in Stage 4.
Stage 6: Check whether h_1(t) satisfies the two IMF properties: (i) the number of zero crossings and extrema must be equal or differ by at most 1 over the entire data x(t), and (ii) at any given point, the mean value of the envelopes defined by the local minima and local maxima must be zero. If h_1(t) satisfies both properties, take it as the first IMF component C_1(t); otherwise, treat h_1(t) as the signal and repeat Stages 3 through 5.
Stage 7: Separate the residue R_1(t) = x_m(t) − C_1(t) from the rest of the data. Let the residue R_1(t) be a new signal and sift out the remaining IMFs by repeating Stages 3 through 7 n times, until a stopping criterion is satisfied. The stopping criterion can be either of the following: (i) the residue R_n(t) is reduced to a monotonic function from which no more IMFs can be extracted; or (ii) the residue R_n(t) or the IMF component C_i(t) becomes smaller than a predetermined value. After the EEMD decomposition process, the signal x_m(t) can be expressed as the sum of the IMF components and the final residue:
x_m(t) = Σ_{i=1}^{n} C_i(t) + R_n(t),
where n and C_i(t) denote the total number of IMF components and the i-th IMF, respectively, and R_n(t) represents the final residue.
Averaging over the M trials gives the ensemble IMFs, C_i(t) = (1/M) Σ_{m=1}^{M} C_{i,m}(t), and the ensemble residue, R_n(t) = (1/M) Σ_{m=1}^{M} R_{n,m}(t). The original DO time series data are thus efficiently decomposed by EEMD into n ensemble IMFs and a single ensemble residue. The IMF components in each frequency band are individually different and change with the variation of the DO dataset x(t), while the ensemble residue reflects the general trend of the DO dataset x(t).
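One sifting iteration (Stages 3–5) can be sketched with NumPy and SciPy's cubic spline. This is a simplified illustration: the endpoint handling is a crude boundary fix (real EMD implementations use more careful boundary treatment), and the function name is ours, not the paper's.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(t, x):
    """One sifting iteration: envelope mean (Stage 4) subtracted from the signal (Stage 5)."""
    # Stage 3: interior local maxima and minima of the signal
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    # cubic-spline upper/lower envelopes (endpoints appended as a crude boundary fix)
    up = CubicSpline(t[[0] + maxima + [-1]], x[[0] + maxima + [-1]])(t)
    lo = CubicSpline(t[[0] + minima + [-1]], x[[0] + minima + [-1]])(t)
    m1 = (up + lo) / 2.0   # Stage 4: mean of the envelopes, m_1(t)
    return x - m1          # Stage 5: candidate IMF h_1(t)
```

Repeating this on its own output until the IMF conditions of Stage 6 hold yields C_1(t); subtracting that from the signal gives the residue of Stage 7.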

LSTM Neural Network
As depicted in Figure 1, a set of repeating chained blocks enables the LSTM NN to learn time series information effectively. The horizontal line (from C_{t−1} to C_t) that runs from one block to the next at the top of the diagram is called the cell state. The cell state runs across the blocks with only slight linear interactions, ensuring that state information is maintained [31]. Additionally, the LSTM NN uses three different gates (see ❶, ❷, and ❸ in Figure 1). The gates use sigmoid and tanh layers with pointwise multiplication to allow for the selective passing of information [32]. With the aid of the gates, which act as a multi-level feature selector, the LSTM NN can memorize or forget information. Other artificial NNs share the long-term dependency problem; the LSTM NN, by contrast, is designed to overcome it by using gates that control the processes of memorizing necessary information and forgetting unnecessary information. This helps the LSTM NN forecast the output of the next time step based on the features of the previous time steps. The equations below illustrate the calculations involved.
(a) Forget gate equation:
F_t = σ(W_f · [H_{t−1}, X_t] + b_f),
where F_t is a vector with values from 0 to 1, and σ, W_f, and b_f represent the logistic sigmoid function, the weight matrix, and the bias of the forget gate, respectively. The sigmoid layer determines whether new information is necessary and should be used for the update, or is unnecessary and should be ignored. The tanh function then weights each value that passes, assigning an importance ranging from −1 to 1. Similar operations are repeated in the input and output gates.
(b) Input gate equations:
I_t = σ(W_i · [H_{t−1}, X_t] + b_i),
C̃_t = tanh(W_c · [H_{t−1}, X_t] + b_c).
(c) Output gate equations:
O_t = σ(W_o · [H_{t−1}, X_t] + b_o),
H_t = O_t ⊙ tanh(C_t).
(d) Cell state equation:
C_t = F_t ⊙ C_{t−1} + I_t ⊙ C̃_t,
where W_i and W_o denote the weight matrices, and b_i and b_o the bias vectors, of the input and output gates, respectively; tanh is the hyperbolic tangent function; and ⊙ denotes pointwise multiplication.
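The gate equations can be written directly as a single NumPy cell step. This is an illustrative sketch (the weight layout and function names are ours, not the paper's); each line mirrors one of the equations (a)–(d).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step over the concatenated input [H_{t-1}, X_t]."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])   # (a) forget gate F_t
    i = sigmoid(W["i"] @ z + b["i"])   # (b) input gate I_t
    g = np.tanh(W["c"] @ z + b["c"])   # (b) candidate cell state
    o = sigmoid(W["o"] @ z + b["o"])   # (c) output gate O_t
    c = f * c_prev + i * g             # (d) cell state C_t
    h = o * np.tanh(c)                 # (c) hidden state H_t
    return h, c
```

Because O_t lies in (0, 1) and tanh(C_t) in (−1, 1), every component of H_t stays strictly inside (−1, 1), which is what makes the gate a smooth, multi-level feature selector.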

Novel Hybrid EEMD-Based LSTM Model
The proposed novel hybrid EEMD-based LSTM forecasting model is depicted in Figure 2. In the proposed model, the real DO concentration dataset is first decomposed by EEMD into several components to improve the forecast accuracy. The detailed procedure illustrated in Figure 2 comprises three crucial steps. In the first step, the DO time series data x(t) are decomposed into several IMFs and a residual item R_N(t) by the EEMD algorithm; this decomposition is performed through the iterative sifting process described above. In the second step, each IMF and the residual item are normalized and forecasted by the LSTM NN. Finally, the individual forecasting results of the LSTM NN are reverse-normalized and then combined by summation to obtain the final forecasted values, as illustrated in Figure 2. When the final forecasted result is obtained, the performance of the proposed novel hybrid EEMD-based LSTM forecasting model is evaluated using four statistical metrics: mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE).
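The normalize → forecast → reverse-normalize → sum flow of the hybrid model can be sketched as a small pipeline. The sketch below substitutes a naive persistence forecaster for the trained LSTM (`forecast_fn` is a hypothetical placeholder), so it shows only the data flow, not the authors' trained model.

```python
import numpy as np

def hybrid_forecast(components, forecast_fn):
    """Steps 2 and 3 of the hybrid model: forecast each normalized EEMD
    component, then reverse-normalize and sum the per-component forecasts."""
    total = 0.0
    for comp in components:        # IMFs plus the residual item
        mu, sd = comp.mean(), comp.std()
        z = (comp - mu) / sd       # normalization
        z_hat = forecast_fn(z)     # stand-in for the per-component LSTM
        total += z_hat * sd + mu   # reverse normalization, then summation
    return total

persistence = lambda z: z[-1]      # naive placeholder forecaster
```

With persistence as the forecaster, the pipeline simply returns the sum of the last value of each component, i.e. the last value of the reconstructed signal, which is a useful sanity check for the plumbing.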

Water Quality Dataset Acquisition
The data used for the experiments were collected from the Laizhou Mingbo mariculture farm in Laizhou City, Shandong Province, China. A team of researchers from the Chinese Agriculture University visited the Laizhou Mingbo farm in autumn 2019 and collected the water quality data; they were responsible for deploying the sensors and for their maintenance over the four-month duration of the data collection. The collected raw data consist of six water quality parameters: DO, pH, water temperature, salinity, ammonia, and nitrogen. This choice of parameters was governed by the availability of suitable sensors and the monitoring needs of the mariculture farm. In the present study, we investigated forecasting techniques using the DO datasets; the reasons for this choice are indicated in Sections 1 and 2 above.

Data Normalization
Let X denote the data matrix at a given time step within the time series, where x_ij represents the j-th feature of the i-th week, and N and M denote the length and the number of features of the time series data, respectively. The dataset was normalized by removing the dimension of each feature using the normalization equation
x̃_ij = (x_ij − mean(x_·j)) / std(x_·j),
where x̃_ij denotes the normalized data, mean(x_·j) represents the mathematical expectation of x_·j, and std(x_·j) represents the standard deviation of x_·j.
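The normalization equation amounts to a per-feature z-score, which can be written in a few lines (the helper name is ours):

```python
import numpy as np

def zscore(X):
    """Column-wise z-score: subtract each feature's mean, divide by its std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transformation, each feature (column) has mean 0 and standard deviation 1, so no feature dominates the LSTM inputs by virtue of its physical units.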

Problem Formulation
Assume T = (t_1, t_2, ⋯, t_n) is the set of time steps and F = (f_1, f_2, ⋯, f_m) is the set of features of the time series data. Then, at time t, the feature values can be given as a matrix X = [x_{t,f}]_{n×m}. In X, at time t, let x_{t,k} and x_t represent the value of the target sequence and the feature sequence from which the forecast target is selected, respectively. Our goal is to forecast the short-term future trend of the target sequence, expressed as ŷ = x_{(t+Δ),k} for Δ ∈ {1, 2, 3, ⋯}.

Performance Evaluation Metrics
Problem Formulation

Assume $\mathcal{T} = (t_1, t_2, \cdots, t_N)$ is the set of time steps and $\mathcal{F} = (f_1, f_2, \cdots, f_M)$ is the set of features of the time series data. Then, at time $t$, the values of the features can be given as a matrix $X = [x_{n,m}]_{N \times M}$ (see (14)). In $X$, at time $t$, let $x_t$ and $x_t^m$ represent the value of the target sequence and of the selected feature sequence of the forecast target, respectively. Our goal is to forecast the short-term future trend of the target sequence, expressed as $\hat{y} = (x_{t+k}),\ \forall k \in \{1, 2, 3, \cdots, K\}$.

Performance Evaluation Metrics

Four performance evaluation metrics, MAE, MSE, RMSE, and MAPE, were adopted to measure the forecasting accuracy of the new model. These error metrics quantify the difference between the forecasted values and the original (real DO concentration) values: the smaller the difference, the better the performance of the proposed hybrid EEMD-based LSTM forecasting model. The error metrics are given as:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|V_i - F_i\right|$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - F_i\right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(V_i - F_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{V_i - F_i}{V_i}\right|$$

where $N$ denotes the number of data points, and $V_i$ and $F_i$ represent the real and forecasted values, respectively.
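As a concrete reference, the four metrics can be computed directly from paired lists of real and forecasted values. The following minimal Python sketch implements the formulas above (function and variable names are our own, not from the paper):

```python
import math

def forecast_errors(real, forecast):
    """Compute MAE, MSE, RMSE and MAPE for paired real/forecasted values."""
    if len(real) != len(forecast) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(real)
    mae = sum(abs(v - f) for v, f in zip(real, forecast)) / n
    mse = sum((v - f) ** 2 for v, f in zip(real, forecast)) / n
    rmse = math.sqrt(mse)
    # MAPE assumes no real value is zero (true for DO concentrations in mg/L)
    mape = 100.0 / n * sum(abs(v - f) / abs(v) for v, f in zip(real, forecast))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```

For example, `forecast_errors([5.0, 4.0], [4.5, 4.5])` yields an MAE of 0.5 mg/L and a MAPE of 11.25%.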

Discussions
Our study used hourly centered average values of the DO concentration time series data. EEMD, a powerful technique for signal decomposition, was used to decompose the non-linear and non-stationary signal. Decomposing the DO concentration time series sensor data is an integral part of the proposed hybrid EEMD-based LSTM model for forecasting short-term DO concentration values. The EEMD technique decomposed the original DO concentration sensor time series into eight (8) relatively stable IMFs (IMF 1 to IMF 8) and one residual item (see Figure 4). The amplitude of the Gaussian white noise added in EEMD was set to 0.2. All the sub-band signals extracted by EEMD were used in the decomposition step of the proposed hybrid model, and the EEMD trend was extracted through the summation of the low-frequency IMFs. Figure 5 depicts the non-stationary continuous signal of the real DO concentration, composed of sinusoidal waves with a distinct change in frequency. Figure 3 shows the distribution of the collected DO concentration raw dataset; it can be observed that at a certain period the DO concentration dropped to its lowest levels of 4.0 mg/L and 3.9 mg/L. The proposed hybrid model was applied to this region, and the result is presented in Figure 6.
Figure 6 shows the area where the DO concentration decreased to a meagre 4.0 mg/L and then 3.9 mg/L before recovering towards a DO concentration of up to 5.25 mg/L. The graphs in Figures 7 and 8 reveal that the new hybrid model provided good results and successfully forecasted the DO concentration with a high level of accuracy for both the short-term and the long-term forecast. It is also noteworthy that Figure 8 illustrates fluctuations in the range of 7.0 mg/L to 8.5 mg/L, which are not critical for aquaculture, since aquatic life only suffers harm, with a possibility of death, when the DO concentration decreases below such levels. Figure 8 also shows error graphics of the proposed hybrid EEMD-based LSTM model's short- and long-term forecast performance.
Table 1 lists the error statistics for both short-term and long-term DO content forecasting with our proposed hybrid model. The steep increase in error between the short-term and long-term forecasts indicates that the nearer the forecast horizon, the higher the forecast accuracy, and vice versa. Figure 9 presents the forecasting error statistics as bar charts for both short-term and long-term DO concentration, further emphasizing the increase in prediction error as the forecasting period increases. In Table 2, the performance of the proposed hybrid EEMD-based LSTM NN is compared with related hybrid water quality prediction models based on a sparse auto-encoder (SAE) and LSTM NN, SAE-BPNN, a single LSTM, and a BPNN, developed by Li et al. [27]. The tabulated error statistics indicate that our proposed hybrid model outperforms the other models listed in Table 2 in terms of the error margin of the forecasted data. This performance gain over the related prediction models arises because our hybrid model applies the EEMD method to effectively decompose the original signal into its constituent intrinsic sub-sequences. Consequently, the proposed hybrid multi-scale forecasting model can extract more features through the decomposition process, which further improves forecasting accuracy. Among the models proposed in [27], the hybrid SAE-LSTM model demonstrated the lowest prediction error; however, the error statistics in Table 2 indicate that our hybrid EEMD-based LSTM outperforms the SAE-LSTM model, owing to the potential of the applied EEMD method.
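The trend-extraction step mentioned above (summing the low-frequency IMFs) can be sketched in a few lines. EEMD conventionally orders IMFs from highest to lowest frequency, so the trend is the sum of the last few IMFs plus the residual; the cut-off `k` and the function names here are our assumptions for illustration, not fixed by the paper:

```python
def eemd_trend(imfs, residual, k=3):
    """Approximate the signal trend by summing the k lowest-frequency IMFs
    (the last k in EEMD's high-to-low frequency ordering) plus the residual.
    imfs: list of IMFs, each a list of samples; residual: list of samples."""
    low_freq = imfs[-k:] if k > 0 else []
    trend = list(residual)  # start from the residual item
    for imf in low_freq:
        for t in range(len(trend)):
            trend[t] += imf[t]
    return trend
```

With the eight IMFs reported in the paper, one would pass all eight and choose `k` to cover only the slowly varying components.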

Conclusions
This paper proposes a hybrid prediction model to solve the challenge of poor DO concentration forecasting accuracy [17,26,27] in aquaculture management.The hybrid model was designed by combining the EEMD technique with LSTM NN.The applied EEMD method allowed for the decomposition of the original sensor data into multiple IMFs.Furthermore, this method is used to carefully select IMFs that are strongly correlated with the original sensor data through a feature selection process, and integrate both into inputs for the neural network.The actual experimental WQ data from a fish farm show that the hybrid model provides good results and outperforms related models with high accuracy, as indicated by error metrics shown in Table 2.For future work, a hybrid EEMD-based multi-variate prediction model can be explored to propose a more comprehensive water quality forecasting and analysis.Additionally, more WQ measuring sites will also be considered to expand this model.

Stage 8 :
If m < M, add a different realization of white noise in each trial and repeatedly execute Stage 2 to Stage 7, consecutively incrementing m (m = m + 1) until m = M. Stage 9: Determine the i-th ensemble mean C_i of the M trials for each IMF, by way of the expression

$$C_i = \frac{1}{M}\sum_{m=1}^{M} C_{i,m}$$
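Stage 9's ensemble averaging can be illustrated concretely: given the IMFs extracted in each of the M noise-added trials, the final i-th IMF is the element-wise mean across trials. A minimal pure-Python sketch (the names are ours, assuming all trials yield aligned IMFs of equal length):

```python
def ensemble_mean_imfs(trial_imfs):
    """Compute C_i = (1/M) * sum_m C_{i,m} over M trials.

    trial_imfs: list of M trials; each trial is a list of IMFs,
    each IMF a list of samples (all trials aligned in shape)."""
    m_trials = len(trial_imfs)
    n_imfs = len(trial_imfs[0])
    means = []
    for i in range(n_imfs):
        length = len(trial_imfs[0][i])
        means.append([
            sum(trial[i][k] for trial in trial_imfs) / m_trials
            for k in range(length)
        ])
    return means
```

In practice, mismatched IMF counts across trials must be handled (e.g. by padding with zero IMFs), a detail omitted here.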

Figure 1 .
Figure 1.The structure of long short-term memory (LSTM) neural networks (NN) blocks with their corresponding symbols and meanings.

After pre-processing the sensor data, 12,852 groups of chronological data were used to construct the hybrid EEMD-based LSTM forecasting model. The collected water quality data are divided into two parts: the first 75% of the dataset is used for training the developed hybrid model, and the last 25% is used for testing, in order to analyze the forecasting performance of the proposed novel hybrid EEMD-based LSTM model. The collected raw data are shown in Figure 3. The structural representation of the data set can be expressed as follows:

$$X = [x_{n,m}]_{N \times M} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{bmatrix} \tag{14}$$

where $X$ represents the data matrix at a given time step within the time series, $x_{nm}$ represents the $m$-th feature of the $n$-th week, and $N$ and $M$ denote the length and the number of features of the time series data, respectively. Finally, the data set was normalized by removing the dimension of each feature using the normalization equation:

$$\tilde{x} = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)} \tag{15}$$

where $\tilde{x}$ denotes the normalized data, $\mathrm{mean}(\cdot)$ represents the mathematical expectation, and $\mathrm{std}(\cdot)$ represents the standard deviation.
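The z-score normalization in (15) and the 75/25 chronological split described above can be sketched as follows (pure Python, per feature column; the helper names are ours):

```python
import math

def zscore(column):
    """Normalize one feature column: (x - mean) / std, as in (15)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows: first 75% for training, last 25% for testing.
    No shuffling, so the temporal order of the series is preserved."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Keeping the split chronological (rather than random) matters for forecasting: the model must be evaluated on data strictly later than its training window.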

Figure 3 .
Figure 3. Four months' distribution of the DO concentration raw data.


Figure 5 .
Figure 5. Non-stationary continuous signal composed of sinusoidal waves with a distinct change in frequency.

Figure 6 .
Figure 6. Short-term (6 h) forecast result measured against actual dissolved oxygen (DO) concentration values.

Figure 7 .
Figure 7. Short-term (12 h) forecast result measured against actual DO concentration values.

Figure 8 .
Figure 8.Long-term forecast result measured against actual DO concentration values.

Figure 9 .
Figure 9. Forecasting error statistics for short-term and long-term DO concentration.

Stage 6: If h_1(t) satisfies the IMF properties, then C_1(t) = h_1(t) becomes the first IMF component of the signal x_m(t). Otherwise, replace x_m(t) with h_1(t) and return to Stage 3.

Table 1 .
Error Statistics for Short-term and Long-term DO Content Forecasting.

Table 2 .
Performance Comparison with Related Forecasting Models *.