Design of an Intelligent Variable-Flow Recirculating Aquaculture System Based on Machine Learning Methods

Abstract: A recirculating aquaculture system (RAS) can reduce water and land requirements for intensive aquaculture production. However, a traditional RAS uses a fixed circulation flow rate for water treatment. In general, the water in an RAS is highly turbid only when the animals are fed and when they excrete. Therefore, RAS water quality regulation technology based on process control is proposed in this paper. The intelligent variable-flow RAS was designed based on the circulating pump-drum filter linkage working model. Machine learning methods were introduced to develop the intelligent regulation model to maintain a clean and stable water environment. Results showed that the long short-term memory network achieved the highest accuracy (training set 100%, test set 96.84%) and F1-score (training 100%, test 93.83%) among the artificial neural networks. Optimization methods including grid search, cuckoo search, least squares, and the genetic algorithm were used to improve the classification ability of support vector machine models. Results showed that all support vector machine models passed cross-validation and could meet accuracy standards. In summary, the genetic algorithm support vector machine model (accuracy: training 100%, test 98.95%; F1-score: training 100%, test 99.17%) is suitable as an optimal variable-flow regulation model for an intelligent variable-flow RAS.


Introduction
With global economic growth, consumer demand for seafood products is also increasing. However, fishery productivity is facing a massive challenge of declining resources due to environmental pollution and overfishing [1]. The recirculating aquaculture mode is an effective solution to maintain the supply of seafood products and support the modern and sustainable development of the aquaculture industry while decreasing ecological impact [2]. A recirculating aquaculture system (RAS) can offer a high degree of environmental control

Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 15

The primary purpose of the present study was to develop the circulating pump-drum filter linkage working technique using machine learning methods. Water quality indicators and the backwash frequency of the drum filter were used as primary indicators in developing a variable-flow model. An intelligent variable-flow RAS can rapidly remove suspended solids and reduce ammonia and nitrite generation from the source.

Experimental RAS
The experimental RAS used the recirculating aquaculture system of Dalian Huixin Titanium Equipment Development Co., Ltd. (Dalian city, China) for breeding L. vannamei. Figure 1a shows the schematic of the experimental RAS control system. The control system collected the water quality indicators through the connected sensors. Water quality changes could be monitored in real time, and the centrifugal pump was controlled by variable-frequency operation using a flow regulation model based on machine learning. The variable-flow circulation caused different trends in the drum filter backwash frequency during the unit period (0.5 h). The water quality indicators were used to train the regulation strategy model for variable-flow circulation. The water treatment equipment included biofilters, a micro-screen drum filter, an ultraviolet generator, ozone generators, foam fractionators, and oxygenation cones. Figure 1b shows the actual indoor workshop. The RAS contained 10 circular FRP tanks with a diameter of 1.8 m and a depth of 1.4 m, with a total water volume of 35 m³. Shrimp were fed five times a day during the culture period with a 36% protein commercial feed (Dale 2# shrimp commercial feeds, Dale, Inc., Yantai, China). During the early stage of shrimp culture, the amount of feed accounted for 5-8% of the total biomass of shrimp. The amount of feed was reduced over time and accounted for 3.7-5% of the total biomass by the end of the culture process. The whole culture process lasted for 90 days, with a culture density of 800 individuals/m³ and a final yield of 525 kg of shrimp.


Variable-Flow Experiment Design
Turbidity (NTU) is mainly influenced by water flow fluctuations and can only reflect the instantaneous transparency of the water body. This study proposes a technique for detecting turbidity in an RAS based on a micro-screen drum filter. The backwash frequency of the drum filter within a unit period (0.5 h) was used to represent overall RAS turbidity, and the variable-flow regulation model was constructed using the backwash frequency and various water quality data. The variable-flow regulation model can determine the operating frequency of the centrifugal pump for the next period using real-time data from the current period. The intelligent variable-flow RAS technology is implemented by controlling the RAS circulation rate by changing the circulating pump flow rate. The primary purpose of the variable-flow RAS is to implement a linkage control technology to model the relationship between the micro-screen drum filter backwash frequency and the circulation flow rate.
The total flow rate of the circulating pump was set to three levels: 55, 65, and 75 m³/h. The circulation rate was operated with a cycle of 24 h. A cycle started with a circulation rate of 55 m³/h and was adjusted to 65 m³/h after an interval of 24 h and then to 75 m³/h after the same interval (24 h). The drum filter controller collected backwash data every 0.5 h. Turbidity sensors were placed at the main return pipeline to monitor and record overall RAS water turbidity. Water quality indicators, including water temperature (T), dissolved oxygen (DO), pH, and salinity, were measured in real time using YSI ProPlus portable sensors. Total suspended solids (TSS), total ammonia nitrogen (TAN), and nitrite nitrogen (NO2-N) were measured daily with a Palintest 7500 water quality analyzer.
The circulating pump was set to three circulating levels: slow (55 m³/h), medium (65 m³/h), and fast (75 m³/h). In the variable-flow RAS, the circulation rate was maintained at a medium level, and the control system read water quality indicators and backwash times from sensors at every unit period. The circulation rate for the next period could then be adjusted to the slow or fast level. The adjustment could proceed in two directions: upshift and downshift. In the drum filter controller program, the backwash frequency was recorded for 48 periods in a day, using 0.5 h as a period. The upshift/downshift of the circulating pump for the next period was determined from the current water quality sensor readings, the current backwash frequency, and the current circulating level. A water gauge controlled the drum filter backwash, so the backwash frequency reflects water turbidity in the RAS. Downshifts (−1) and upshifts (+1) of circulating pump frequency were used as indicators of circulation levels. The water quality indicators, current circulating pump frequency, and the drum filter backwash frequency were chosen as independent variables, and the downshift (−1)/upshift (+1) data were taken as the dependent variable. As the whole culture process lasted for 90 days in the RAS, the total circulation rate was set to 55 m³/h for the first 30 days, 65 m³/h for the middle 30 days, and 75 m³/h for the last 30 days. Establishing a variable-flow circulation strategy was the core task of the experiment, and therefore the circulation rate regulation model was constructed using the optimal classification model based on machine learning to control the variable-flow circulation rate in the RAS.
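The per-period backwash count described above can be sketched in a few lines of Python. This is only an illustration: the log format (seconds since midnight for each backwash event) and the function name `backwash_counts` are assumptions, not details of the drum filter controller program.

```python
from collections import Counter

PERIOD_SECONDS = 30 * 60   # one unit period = 0.5 h (from the text)
PERIODS_PER_DAY = 48       # 24 h / 0.5 h (from the text)


def backwash_counts(event_seconds):
    """Count backwash events in each 0.5 h period of one day.

    event_seconds: iterable of event times in seconds since midnight
    (hypothetical log format). Events outside the day are ignored.
    """
    day_length = PERIODS_PER_DAY * PERIOD_SECONDS
    counts = Counter(
        int(t // PERIOD_SECONDS) for t in event_seconds if 0 <= t < day_length
    )
    return [counts.get(p, 0) for p in range(PERIODS_PER_DAY)]


# Made-up event times: three backwashes in the first half hour, and so on.
events = [120, 900, 1750, 1800, 5400, 86399]
per_period = backwash_counts(events)
```

Each entry of `per_period` is the backwash frequency for one unit period and can serve as one input feature of the regulation model.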
As shown in Figure 2, the drum filter controller was used to collect the backwash frequency, circulation flow rate, and water quality data that were then uploaded to the industrial PC through the RS485 protocol. The embedded system was connected to the industrial computer. The dataset was processed with the optimal machine learning model in the industrial computer to regulate pump frequency for the next period and feed it back to the embedded system, so that the RAS circulation flow rate could be regulated intelligently.
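How a classifier output of +1 or −1 might be turned into the flow rate for the next period can be sketched as follows. The three levels come from the text; the function name `next_flow_rate` and the clamping at the slowest and fastest level are assumptions made for this illustration.

```python
LEVELS_M3_PER_H = (55, 65, 75)  # slow, medium, fast (from the text)


def next_flow_rate(current, shift):
    """Return the circulation flow rate (m3/h) for the next 0.5 h period.

    current: one of the three levels above.
    shift: classifier output, +1 (upshift) or -1 (downshift).
    Staying at the slowest/fastest level when no further shift is
    possible is an assumption, not a rule stated in the paper.
    """
    if shift not in (-1, 1):
        raise ValueError("shift must be -1 or +1")
    idx = LEVELS_M3_PER_H.index(current) + shift
    idx = max(0, min(len(LEVELS_M3_PER_H) - 1, idx))
    return LEVELS_M3_PER_H[idx]


rate_after_upshift = next_flow_rate(65, +1)    # medium -> fast
rate_after_downshift = next_flow_rate(65, -1)  # medium -> slow
```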

Artificial Neural Networks (ANNs)
ANNs are statistical learning algorithms that can predict and approximate functions when given sufficient input data [26]. ANNs are inspired by the biological neural networks of the human brain. An ANN is usually composed of interconnected neurons that process the inputs and adapt to various situations. ANNs are suitable not only for machine learning but also for pattern recognition. Therefore, ANNs have become a popular way of inferring a function from observations in the case of complex data. Figure 3a shows a typical ANN structure, including input, hidden, and output layers.
In this study, several ANN methods, including the backpropagation neural network (BPNN), extreme learning machine (ELM), probabilistic neural network (PNN), and long short-term memory (LSTM) neural network, were used to develop variable-flow models. The BPNN and ELM are feedforward neural networks with no cycles or loops. Information propagates in one direction, forward from the input layer, through the hidden layer, and then to the output layer, in a feedforward neural network.


The activation function can introduce a nonlinear factor to the neuron so that the ANN can approximate any nonlinear function. In the present study, a sigmoid function was adopted in the BPNN model and ELM model. For the sigmoid activation function, it holds that

$$f(x) = \frac{1}{1 + e^{-x}}$$

where the output of the sigmoid function is between 0 and 1. For the binary classification task, the output of the sigmoid is assigned to the positive class or the negative class when the output satisfies a certain probability condition.

Figure 3b shows the schematic of the LSTM network. The LSTM network is a special RNN focusing on long sequences of data [27]. A standard LSTM unit comprises a cell, an input gate, an output gate, and a forget gate to solve the long-term dependency problem. Long-term memory information is stored during three steps (forgetting, remembering, and outputting) in an LSTM. In the present study, a rectified linear unit (ReLU) function was applied in the LSTM model. The ReLU function is described as

$$f(x) = \max(0, x)$$

which means that

$$f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$

The convergence rate of the stochastic gradient descent obtained by the ReLU function is much faster than that obtained by the tanh/sigmoid function. However, the learning rate should be set appropriately to prevent neurons in the network from losing their activation ability.
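The two activation functions can be written out directly. The sketch below uses the standard definitions, sigmoid f(x) = 1/(1 + e^(−x)) and ReLU f(x) = max(0, x). The 0.5 decision threshold in `classify` is an assumption, since the text only refers to "a certain probability condition".

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    """Rectified linear unit: x for positive inputs, 0 otherwise."""
    return max(0.0, x)


def classify(x, threshold=0.5):
    """Map a pre-activation value to +1 (upshift) or -1 (downshift).

    The 0.5 threshold is an assumed probability condition.
    """
    return 1 if sigmoid(x) >= threshold else -1
```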
In this study, the parameters of the LSTM training process were set as follows: sequence input layer = 9, initial learning rate = 0.01, learning rate drop factor = 0.1, batch size = 128, number of training epochs = 200, hidden layer = 1 (with 32 hidden units). Adaptive moment estimation (Adam) was chosen as the optimization method. The output size of the fully connected layer was set to 2 for the binary classification task.

Figure 3c shows the architecture of a typical PNN, which was first proposed by Dr. D.F. Specht [28]. As a branch of the radial basis network, the PNN has the advantages of a simple learning process and fast training time. PNN models can also be implemented well in hardware since the number of neurons in each layer is fixed. Generally, a PNN contains four layers: input layer, pattern layer, summation layer, and output layer. The input layer simply distributes the input to the neurons in the pattern layer. Each pattern layer neuron computes its output with a Gaussian function when receiving x from the input layer. It holds that

$$f_g(x) = \frac{1}{(2\pi)^{n/2}\sigma^{n}} \cdot \frac{1}{l_g} \sum_{i=1}^{l_g} \exp\left(-\frac{\sum_{j=1}^{n} (x_j - x_{ij})^2}{2\sigma^2}\right)$$

where l_g denotes the total number of samples of class g, n is the dimension of the input feature vector, σ represents the smoothing parameter, and x_ij represents the j-th data of the i-th neuron of class g. The summation layer connects the pattern layer units of each class, and the output layer then outputs the category with the highest score in the summation layer.

K-fold cross-validation is useful for preventing models trained on small datasets from overfitting but is not often used in deep learning. The dataset is divided equally into k parts. In each round, one fold is used as the validation subset and the remaining folds are used to train the ANN. In this study, we introduced 4-fold cross-validation to evaluate the machine learning models. The evaluation indicators were all calculated by averaging the 4-fold cross-validation results.
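The four PNN layers can be illustrated with a minimal sketch in plain Python. It assumes the standard Gaussian (Parzen window) form of the pattern layer; the training points, the smoothing parameter value, and the function names are made up for the example and are not taken from the study.

```python
import math


def pnn_scores(x, patterns_by_class, sigma):
    """Summation-layer score of input x for every class.

    patterns_by_class: dict mapping a class label to its list of
    training vectors (one pattern-layer neuron per vector).
    """
    n = len(x)
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n
    scores = {}
    for label, patterns in patterns_by_class.items():
        total = 0.0
        for p in patterns:
            # Pattern layer: Gaussian of the squared distance to one sample.
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: average over the samples of this class.
        scores[label] = total / (len(patterns) * norm)
    return scores


def pnn_predict(x, patterns_by_class, sigma=0.5):
    """Output layer: return the class with the highest score."""
    scores = pnn_scores(x, patterns_by_class, sigma)
    return max(scores, key=scores.get)


# Toy two-feature data, labels +1 (upshift) and -1 (downshift).
train = {1: [(0.9, 0.8), (1.0, 1.1)], -1: [(0.1, 0.2), (0.0, -0.1)]}
```

No iterative training is involved: the "learning" step is simply storing the training vectors, which is why the text describes PNN training as fast.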

Support Vector Machine (SVM)
An SVM seeks the best compromise between model complexity and learning ability and thereby achieves excellent generalization when dealing with limited sample information [29]. In SVM applications, choosing the appropriate kernel function and suitable parameters is crucial for prediction accuracy. For linearly separable binary classification, the principal function of an SVM is to find the optimal hyperplane that divides all samples with maximum margin. For linear problems, consider separating the two classes of the training vector set

$$D = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \quad x \in \mathbb{R}^n, \quad y \in \{-1, 1\}$$

The plane was assumed as

$$\langle w, x \rangle + b = 0$$

The vectors are optimally separated when they are classified without error and with maximum margin. Because this description of the plane contains redundancy, a canonical hyperplane is assumed, where w and b are constrained by

$$\min_i \left| \langle w, x_i \rangle + b \right| = 1$$

The classification hyperplane in canonical form must satisfy the following constraints:

$$y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \quad i = 1, \ldots, l$$

The distance d(w, b; x) of a point x from the hyperplane is

$$d(w, b; x) = \frac{\left| \langle w, x \rangle + b \right|}{\lVert w \rVert}$$

The final hyperplane that optimally separates the samples is the one that minimizes

$$\Phi(w) = \frac{1}{2} \lVert w \rVert^2$$

For nonlinear classification, the idea of SVM is to map the samples to a high-dimensional space with a mapping φ, where the nonlinear problem is transformed into a linear one using a kernel function, at which point the weight w is expressed as

$$w = \sum_{i=1}^{l} \alpha_i y_i \varphi(x_i)$$

where α_i are the Lagrange multipliers. Introducing the relaxation variable ξ (ξ ≥ 0) describing the function interval, the optimization equation under the kernel approach is expressed as

$$\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i \left( \langle w, \varphi(x_i) \rangle + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

The model is described as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right), \quad K(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle$$

In the present study, the SVM model was adopted to control the inverter frequency to improve circulating pump operating efficiency under different water quality conditions. The SVM is a machine learning algorithm with a high generalization ability for classifying and predicting small samples. As upshifting and downshifting of the circulating pump is a binary problem, using the water quality indicators as variables can provide good generalization ability for the model.
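For the linear case, the canonical constraint y_i(⟨w, x_i⟩ + b) ≥ 1 and the point-to-hyperplane distance |⟨w, x⟩ + b| / ‖w‖ are easy to check numerically. The sketch below uses the standard textbook formulas; the weight vector, bias, and sample points are invented for illustration only.

```python
import math


def decision_value(w, b, x):
    """Compute <w, x> + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Distance of point x from the hyperplane <w, x> + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm_w


def satisfies_margin(w, b, samples):
    """True if every (x, y) pair satisfies y * (<w, x> + b) >= 1."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in samples)


def margin_width(w):
    """Width 2 / ||w|| of the margin of a canonical hyperplane."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane and labeled points (+1 / -1).
w, b = (1.0, 1.0), -3.0
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((1.0, 1.0), -1)]
```

Minimizing ‖w‖²/2 under these constraints is equivalent to maximizing `margin_width(w)`, which is why the smallest-norm canonical hyperplane is the optimal one.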
Support vector classification (SVC) can be used as the core algorithm for developing drum filter-circulating pump linkage technology. However, there is no international standard for selecting optimal parameters, and the parameter selection principles are based on dataset performance and the construction of a more reliable solution through cross-validation methods [30,31]. Here, we used the Gaussian kernel function in resolving the nonlinear support vector classification task:

$$K(x_i, x_j) = \exp\left( -g \lVert x_i - x_j \rVert^2 \right)$$

For the SVM model, the penalty parameter C and RBF kernel parameter g need to be decided to improve the classification accuracy. In the present study, several optimizing algorithms, including grid search (GS), least squares method (LS), genetic algorithm (GA), and cuckoo search (CS) algorithm, were applied to improve the classification performance of the SVM model. The parameters of GA were set as follows: max generation = 300, population size = 50, generation gap = 0.9, range of parameter c = (0, 100), range of parameter g = (0, 1000). For the CS algorithm, the parameters were set as follows: iteration = 300, number of nests = 20, probability = 0.25. The best parameters of GS and LS methods were obtained through the traversal method; the ranges of c and g were set as (0, 100) and (0, 1000), respectively. K-fold cross-validation was utilized in the SVM models to prevent overfitting, and the evaluation indicators were calculated using averaging. The optimal SVM model can be determined by comparing the evaluation indicators of classification results from different algorithms.
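To make the role of the kernel parameter g concrete, here is a minimal sketch of a Gaussian (RBF) kernel and the resulting decision rule. It assumes the LIBSVM-style form K(x_i, x_j) = exp(−g‖x_i − x_j‖²); the support vectors, multipliers, and bias are toy values, not results of the study, and the optimization of c and g by GS, LS, GA, or CS is not shown.

```python
import math


def rbf_kernel(xi, xj, g):
    """Gaussian kernel exp(-g * ||xi - xj||^2) (assumed LIBSVM-style form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-g * sq_dist)


def svc_predict(x, support_vectors, alphas, labels, b, g):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b, returned as +1 or -1."""
    s = b
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        s += alpha * y * rbf_kernel(sv, x, g)
    return 1 if s >= 0 else -1


# Toy model: one support vector per class (hypothetical values).
svs = [(1.0, 1.0), (0.0, 0.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
```

A larger g makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; this is one reason the optimized (c, g) pairs found by different search methods can differ widely while giving similar accuracy.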

Data Processing for Variable-Flow Regulation
Ranges of the water quality data and backwash frequency from the measurements at three total circulation rates in RAS are shown in Table 1. Variable-flow regulation was realized through the frequency of the circulating pump. Upshifting and downshifting of the circulating pump inverter, the two classes of the classifier, were labeled as 1 (upshift) and −1 (downshift) in the dataset. In order to develop the variable-flow regulation models based on the machine learning methods, water quality indicators, current circulation flow rate, and current backwash frequency were used as input variables, and regulating data (upshift/downshift) for the next period (0.5 h) were used as output variables. Upshift/downshift data were labeled manually. The labeling principle was derived from the variable-flow experiments under three circulation rates in RAS. The binary classification models can be applied to the variable-flow regulation strategy, and the current data for water quality indicators and backwash frequency can be used to determine the total circulation rate for the next period through the classification models. A total of 375 samples were collected in the experiment, of which 280 were used as the training set and 95 as the test set. The training data were normalized after data pre-processing. The first step in developing the machine learning models was to simplify the explanatory variables by principal component analysis (PCA). PCA can reduce the complexity of the dataset and reveal hidden structures. The simplified principal components can be used as valid indicators to develop models. Figure 4 illustrates that the simplified variables reduced the original dataset from nine dimensions (water quality indicators) to three dimensions and could reflect 99% of the information in the original independent variables.
However, the key components extracted from the original data were compressed and mapped to another space, and the simplified variables were not directly related to the original data [32]. Nevertheless, in the present study, PCA successfully provided the optimal reduced representation for the data. The new dataset could then be used to develop machine learning models with a less complex computation process.
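The choice of three principal components follows from a cumulative explained-variance criterion, which can be sketched as below. The eigenvalues are invented purely to illustrate a 99% threshold and are not the values behind Figure 4; min-max scaling is likewise an assumption, since the text does not state which normalization was used.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] (assumed normalization method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def components_for_variance(eigenvalues, threshold=0.99):
    """Smallest number of leading principal components whose cumulative
    explained-variance ratio reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues for nine input variables.
example_eigenvalues = [6.2, 2.1, 0.62, 0.03, 0.02, 0.01, 0.01, 0.005, 0.005]
n_components = components_for_variance(example_eigenvalues)
```

With these made-up eigenvalues the first three components already carry more than 99% of the total variance, which mirrors the reduction from nine to three dimensions reported above.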


Results of the ANN Models
ANN classification models, including GA-BP, ELM, PNN, and LSTM, were used to adjust the circulating pump's frequency. The upshifting operation of circulating pump frequency was labeled as 1, and downshifting operation was labeled as −1. The classification process was regarded as a binary classification problem. The classification accuracy of both training set and test set data was calculated. For the BPNN model, the GA algorithm was applied to optimize the model performance. Models were tested by cross-validation to prevent the overfitting problem. ANN models were implemented by programming in Python 3.8.5 [33]. For the BPNN model, the maximum epoch was set to 1000 iterations, and the learning rate was set to 0.01 during the training process. The GA method optimized the BPNN model with the lowest error rate (2.59%) at 25 generations. The GA-BP model had the best validation performance (0.12) at epoch 142. For the LSTM model training process, loss and accuracy gradually converged after 350 iterations. The accuracy of the training set reached 100% when the loss was below 0.05. Table 2 presents the evaluations of the ANN classification models. Results showed that the training accuracy of all the ANN models was beyond 90%. PNN and LSTM achieved the most accurate classification (100%). For the test set, the LSTM model had a 96.84% accuracy rate; however, the accuracy rates of other models were less than 90%. Thus, the optimal model was identified as the LSTM model, with the highest accuracy for both the training set (100%) and test set (96.84%) among the ANN models.

Results of the SVM Models
The SVM models were developed in Python 3.8.5. As classification accuracy is directly related to the optimal parameters of the SVM model, we used several optimizing methods to determine the penalty parameter c and the kernel parameter g in the present study. Table 3 shows the optimizing methods for SVM models. The optimized parameters were determined by grid search, least squares, cuckoo search, and the genetic algorithm. As Table 3 shows, the classification accuracy of the SVM models remained at relatively high levels. The least squares method had 94.29% accuracy, and the other methods all had 100% accuracy for the training set. For the test set, the genetic algorithm optimized support vector machine (GA-SVM) model misclassified one sample among 95 (accuracy 98.95%). The grid search optimized support vector machine (GS-SVM) and the cuckoo search optimized support vector machine (CS-SVM) both misclassified two samples (97.89%). For the least squares support vector machine (LS-SVM), the test set results exhibited lower accuracy (96.84%) than the other methods. Thus, the GA-SVM was identified as the optimal SVM classification model through comprehensive comparison. Table 3 shows that the four search algorithms optimized the parameters (penalty c and kernel radius g). Although the accuracy could be maintained at a high level, the ranges of the optimized parameters of the SVM models were quite different. Therefore, it was necessary to further select the SVM model through evaluation indicators.
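The reported test accuracies follow directly from the number of misclassified samples in the 95-sample test set, as the short check below shows. The error counts of one and two are stated in the text; the count of three for LS-SVM is inferred here from its 96.84% accuracy rather than stated.

```python
TEST_SET_SIZE = 95  # size of the test set (from the text)


def accuracy_percent(n_errors, n_samples=TEST_SET_SIZE):
    """Accuracy in percent, rounded to two decimals."""
    return round(100.0 * (n_samples - n_errors) / n_samples, 2)


ga_svm = accuracy_percent(1)     # one misclassified sample
gs_cs_svm = accuracy_percent(2)  # two misclassified samples
ls_svm = accuracy_percent(3)     # three errors (inferred from 96.84%)
```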

Model Evaluation
The confusion matrix, which comprehensively reflects the performance of a classifier, yields many evaluation indicators. Here, accuracy, precision, recall, and F1-score were used to evaluate the classification performance of the binary classifiers. The SVM model was estimated by 4-fold cross-validation, and the indicators were computed by averaging over the folds. Accuracy is the ratio of correctly classified samples to all samples, regardless of class. Recall is the ratio of correctly classified positive samples to all actual positive samples, and precision is the ratio of correctly classified positive samples to all samples classified as positive. The F1-score combines precision and recall into a single indicator and can be used to weigh the pros and cons of the classification models comprehensively. Table 4 shows the results of model evaluation indicators for machine learning classifiers. Figure 5 shows the histograms of the evaluation indicators (accuracy and F1-score) of the training set and test set from machine learning classification models. According to the summaries of the model evaluation indicators, GA-SVM shows both higher accuracy and F1-score than the other machine learning methods. Accuracy reflects the overall classification correctness of the model. The F1-score is the harmonic mean of precision and recall, and the results show that the GA-SVM classifier can be considered an optimal model for drum filter-circulating pump linkage technology in a variable-flow RAS because its indicators satisfied the criteria.
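The four indicators can be computed from the confusion matrix counts with the standard definitions, as in the following sketch. The label vectors are made up for illustration, upshift (+1) is assumed to be the positive class, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classifier."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


def mean_over_folds(fold_results):
    """Average each indicator over the cross-validation folds."""
    keys = fold_results[0].keys()
    return {k: sum(r[k] for r in fold_results) / len(fold_results) for k in keys}


# Made-up labels: +1 = upshift (assumed positive class), -1 = downshift.
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, 1, -1, -1, -1, 1, 1, -1]
metrics = evaluate(y_true, y_pred)
```

In a 4-fold setting, `evaluate` would be called once per validation fold and the four result dicts passed to `mean_over_folds`.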


Discussion
Feces and residual feed may decompose to organic suspended solids, which further generate TAN and nitrite, harming breeding animals' health. Suspended solids in the RAS also provide surface area that can be colonized by bacteria. As circulation intensity increases, more particles accumulate, which may increase the bacterial carrying capacity of the system. Hence, rapid removal of solid waste is the most critical unit process in an RAS [34]. The traditional method of water quality regulation in an RAS is to act when water quality deteriorates. This approach leads to large fluctuations in the water environment, and the cost of water quality regulation becomes very high, often requiring many water exchanges to control water quality. This study proposes regulation of RAS circulation based on process control technology, relying on the microfilter backwash times in a unit period (0.5 h) as the main parameter to reflect the overall turbidity of the water body. The variable-flow RAS circulation strategy was designed to form microfilter-circulating pump linkage technology based on water quality parameters and backwash times at different flow rates. An intelligent variable-flow regulation model was developed to keep the water clean and quickly and dynamically remove suspended solids.
Related research has proven the significant differences in water quality between the high and low makeup water exchange treatment groups [35]. One study has shown that increasing RAS water circulation can effectively reduce ammonia and nitrite [36]; the higher the circulation level, the lower the ammonia and nitrite mass concentrations became. Moreover, the conversion of nitrite revealed a certain hysteresis, and the ammonia peak appeared earlier than the nitrite peak after feeding was stopped.
RAS solids come mainly from uneaten feed and fecal solids, and the decomposition and mineralization of these solids lead to elevated ammonia and nitrite levels in the RAS [10]. Data such as TAN, NO2-N, and TSS must be obtained by manual measurement and are challenging to obtain by sensors. According to Vinatea et al. [37], TSS tended to accumulate in the intensive L. vannamei culture and was eventually reflected in an increase in NTU. As both turbidity and TSS can reflect the clarity of a liquid, the turbidity parameter was used for modeling in this study. The principal component analysis (PCA) results for dimensionality reduction showed that turbidity, dissolved oxygen, pH, and temperature could be used as the leading indicators for modeling. The variable-flow regulation model obtains the current water quality indicators in real time and then applies these indicators to predict and classify the circulation rate for the next period. The turbidity sensor in turbulent flow had a measured data fluctuation that was too large, and the sensor arrangement position also caused measurement errors. An innovative point of this study is that the drum filter backwash frequency over a certain period was used as one of the critical factors for modeling instead of the momentary RAS water turbidity. Backwash times can effectively replace turbidity reading to reflect overall RAS water turbidity, avoiding the instability of the data collected by the turbidity sensor.
The application of machine learning in aquaculture research has focused mainly on the prediction, classification, and evaluation of water quality indicators such as dissolved oxygen, salinity, pH, ammonia, and nitrite [25]. In the present study, machine learning was instead used to model the variable-flow regulation strategy and to develop the optimal variable-flow regulation model for the RAS. Sensors collected the water quality indicators (DO, pH, temperature, and turbidity), and the water quality indicators, backwash frequency, and circulating pump frequency were obtained through continuous monitoring. The modeling data form a time series collected from the continuously running RAS: water quality indicators, backwash frequency, and total circulation rates were recorded at a fixed time interval throughout the rearing period. Among the ANN methods, the LSTM model was identified as the best regulation model, since its accuracy and F1-score indicated strong classification ability. This is consistent with research showing that LSTM performs well on long time-series data [38]. However, the optimal classification model must be simple enough to run on embedded devices, and the variable-flow adjustment strategy must respond quickly while meeting a high standard of classification accuracy. The SVM models outperformed the LSTM model on all evaluated indicators. Among the four optimization algorithms, the genetic algorithm (GA) yielded the highest accuracy and F1-score in the classification task. As a supervised algorithm, GA-SVM can be applied to effectively adjust water refreshment in the RAS.
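The two evaluation indicators used to compare the classifiers can be computed as in the sketch below. It assumes a macro-averaged F1-score over the flow classes; the averaging scheme is not stated in this section, so that choice and the class labels in the example are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of periods whose flow class was predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels: one "high" period is misclassified as "mid".
y_true = ["low", "low", "high", "high", "mid", "mid"]
y_pred = ["low", "low", "high", "mid", "mid", "mid"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because F1 combines precision and recall per class, it can differ noticeably from accuracy when the flow classes are unevenly represented, which is why both indicators are reported.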
In future work on variable-flow RAS regulation, the data-driven model needs to be extended to continuous variable-flow control by adjusting the circulating pump frequency. A larger quantity of data from the running RAS would improve the availability and robustness of the intelligent variable-flow strategy. A prerequisite for continuous variable-flow control is that the indicators (water quality, backwash frequency, and rearing cycle) be mapped to the ideal circulation volume. Furthermore, the interaction effects among the indicators need to be revealed through experiments and analysis. The ultimate goal is a precise circulation control strategy that achieves rapid water treatment in the RAS without affecting the health of the reared animals.

Conclusions
A variable-flow regulation model was established in the present study to implement the circulating pump-drum filter linkage working technique. Machine learning classification models relating the explanatory variables to the regulation strategy were developed from experimental data. ANN models including GA-BP, LSTM, PNN, and ELM were established. The LSTM model had the highest accuracy (training set 100%, test set 96.84%) and F1-score (training 100%, test 93.83%) and was regarded as the best classification model among the ANN methods. SVM models were developed and optimized using least squares, grid search, cuckoo search, and the genetic algorithm (GA). Results showed that the SVM models required less training time and achieved higher accuracy than the ANN models. Finally, the optimal model was GA-SVM, with the highest classification accuracy (training 100%, test 98.95%) and F1-score (training 100%, test 99.17%). The model showed precise classification performance under cross-validation and was used for the circulating pump-drum filter intelligent linkage working technique.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.