Gaussian Distribution Model for Detecting Dangerous Operating Conditions in Industrial Fish Farming

The development of better monitoring technologies, the early combat of outbreaks, massive mortality, and promoting sustainability are challenges that the aquaculture industry still faces, and the development of solutions for this is an open problem. In this paper, focusing our attention on monitoring technologies as a promising solution to these issues, we report a Gaussian distribution model for detecting dangerous operating conditions in industrial fish farming. This approach allows us to indicate through a 2D image visualization when fish production is under normal, warning, or dangerous operating conditions. Furthermore, our proposed method has promising possibilities for application in the most varied fields of science, given that the mathematical procedure described allows us to discover the fundamental statistical structure of physical, chemical, and biological systems governed by laws of a probabilistic nature.


Introduction
Fish mortality, mainly due to outbreaks, is a factor of great concern for fish farmers and it severely affects production in aquaculture facilities [1][2][3]. Over the past decade, tremendous effort has been devoted to developing solutions for this. For instance, outbreaks have constantly been investigated to better identify important pathological and epidemiological pathways of infections, and determine the etiological agents [4][5][6]. Several effective control strategies to reduce high levels of mortality can be mentioned, such as disruption of bacterial quorum sensing [7], sampling time and surveillance [8], remote physiological monitoring [9], long-term monitoring [10], enriched rearing [11], and postbiotics applications [12]. Besides, strategies for sustainable aquaculture have also been widely discussed these days to reduce financial losses caused by high mortality rates. Hydrogen peroxide and peracetic acid therapeutants, natural feed supplements, and humic substances are just some of the promising possibilities for this purpose [13]. High levels of nitrite are the cause of severe physiological disturbances that result in mass fish mortality [14,15]. Nevertheless, the quantity of nitrite required to kill fish changes with each species and may even delay the occurrence of deaths [16,17]. Besides, if a fish with methemoglobinemia is frightened or otherwise forced to become active, it may die of anoxia [18]. This evidence may suggest a possible explanation for the observation of extremely high mortality rates in just a single day [19]. In this way, appropriate nitrite management approaches have also been investigated [20].
Furthermore, one possibility to have better control of fish health and welfare, and consequently reduce the level of mortality, is to apply recirculation aquaculture systems (RAS), which are intensive aquaculture facility models in which water is partially reused after undergoing treatment [21,22]. Despite all the benefits that RAS can offer, recent investigations reveal that high mortality rates are still observed in RAS facilities, typically resulting from complex interactions between the fish biomass and water chemistry in the hatchery, where small variations may result in inadequate conditions for the welfare of fish. These variations can cause stress on the fish, reduce their feed intake, impair their growth rate, and even cause them to die [23,24]. To optimize fish production, aquaculture facilities have developed to the point of reaching a large industrial scale [25], simultaneously monitoring several physical parameters, such as alkalinity, bromine, nitrogen dioxide, pH, ammonia, salinity, and temperature, among others, to satisfy a wide range of operating requirements [26]. However, with all this monitoring structure, new problems quickly emerged. Among the most relevant are the massive amount of data generated, which requires automated acquisition and specialized analysis [27][28][29], and the need for frequent maintenance of the sensor devices [26].
In order to fill this gap and develop a mechanism that would allow an efficient analysis of all these parameters generated by the modern aquaculture enterprises, in previous work, we developed a decision-making tool for monitoring these parameters [19]. With this data processing, all aquaculture parameters can be mapped simultaneously as well as grouped according to their variability. This methodology is useful not only to better regulate the parameters of the hatchery, but also to identify possible causes of mortality. Furthermore, deep learning is nowadays pointed out as a possible solution for smart fish farming. Nevertheless, this data processing tool is still in a weak artificial intelligence stage and demands massive amounts of labeled data for training, which has become a bottleneck that restricts its application in aquaculture enterprises [30]. Additionally, computer vision models with emphasis on fish detection and behavior analysis have also been investigated to address these issues [31].
The high levels of mortality, the need for better data monitoring and analysis technologies, and the considerable decline in labor intensity in all aquaculture facilities dictated by COVID-19 require an urgent innovation in current production monitoring technologies for this industry. In order to develop an innovative approach that can indicate the current state of the fish hatchery, avoiding high mortality rates, and to assist in the construction of positive changes in aquaculture practices, we report in this paper a Gaussian distribution model for detecting dangerous operating conditions in these aquaculture facilities. To present this approach, this paper is organized as follows. In Section 2, the acquisition of fish hatchery water quality parameters employed to build the approach is presented. Then, the mathematical formalism of our model is described along with the main steps for its implementation. Subsequently, Section 3 describes the results of the proposed method referring to weaning and pre-fattening development phases of Senegalese sole. Next, we discuss the results obtained in Section 4. Finally, the closing remarks are addressed in Section 5.

Acquisition of Fish Hatchery Water Quality Parameters
To carry out this research, data were collected from two of the main phases of the development of Senegalese sole (Solea senegalensis) over two years: 2018 and 2019. The first is the weaning stage, in which the fish larvae change their diet from live food to artificial, inert feed. The second, called pre-fattening, precedes the fattening phase, which is responsible for the fish development until they reach a suitable size and weight for sale and consumption. Figure 1a,b displays the tanks of these two important fish development phases.
The SEA8 group is responsible for this industrial-scale fish farming. In this aquaculture facility, RAS are employed to treat the water, eliminate the waste products excreted by the fish, and add oxygen. The RAS's operating principle is to continuously draw water from the fish tanks and take it to a mechanical filter and then to a biological filter. The water is then aerated and stripped of carbon dioxide before it returns to the fish tanks. Furthermore, all stages of fish growth are monitored daily, i.e., 365 measurements over a year. Part of this process consists of measuring a set of physical and chemical parameters such as temperature, pH, ammonia, bromine, and nitrogen dioxide to assess the water quality in the fish hatcheries. Each measured parameter has its own sensor (for example, a temperature sensor, a salinity sensor, or an ammoniacal nitrogen sensor), and each sensor element is positioned at a critical point of the circuit. Specifically, the weaning phase has 59 quadrangular tanks, with a water column of 20 cm in a dark room, and the fish stay in this phase until they weigh 1 g. The pre-fattening phase has 52 tanks with a length of 12 m and a water column between 15 and 20 cm. The fish stay in this phase until they weigh 40 g. Fish growth is determined with weekly weighings that are also used to calculate the right amount of food that needs to be provided. Periodically, or when the growth of the fish is very heterogeneous, screenings are carried out. These screenings are also used to detect abnormalities and diseases. In the weaning phase, the screenings are done manually, while in the pre-fattening phase, they are done by a machine. A full description of the data set collected and analyzed for the two main Senegalese sole development phases is presented in Table 1.
Table 1. Fish hatchery water quality parameters: pH (0-14); ammoniacal nitrogen (mg/L); ammoniacal nitrogen (mg/L); nitrogen dioxide (mg/L); nitrogen dioxide (mg/L); water renewal rate¹ (m³/kg); transmittance at 400 µm (%); redox (mV); transmittance at 500 µm (%); bromine (mg/L); fish mortality² (dead fish sum); amount of food provided (kg); fish mortality² (dead fish sum).
Legend: ¹ It depends on the amount of food supplied in kg. ² It is not necessarily a parameter for assessing water quality, but it does provide an analysis of production and it was used in this research.

Gaussian Distribution Model Conceptualization
All data analysis was performed in Matlab, and the steps for implementing the Gaussian distribution model are described in order below.

Data Normalization Operation
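Initially, each parameter in the data set collected from the aquaculture facilities was normalized between 0 and 1. This normalization operation is given mathematically by

$$\hat{P}_{z_j} = \frac{P_{z_j} - \min(P)}{\max(P) - \min(P)}$$

Here, $P_{z_j}$ represents a specific monitored parameter, where $z_j : j = 1, \ldots, k$ is the position vector, and $\min(P)$ and $\max(P)$ are the smallest and largest values recorded for that parameter.

Principal component analysis (PCA) [32,33] was then applied to the normalized data. In this research, to find the principal components of an m × n data matrix X, we employ the singular value decomposition (SVD) algorithm [34].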
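The normalization and SVD-based PCA steps can be sketched as follows. The analysis in this paper was carried out in Matlab; the Python/NumPy version here is only an illustration. It assumes the usual min-max form of normalization to the range [0, 1], and the function names, the synthetic data, and the choice of two components are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (one monitored parameter) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

def pca_svd(X, n_components):
    """Return principal-component scores, singular values, and directions of X."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ Vt[:n_components].T
    return scores, s, Vt

# Hypothetical data: 365 daily measurements of 4 parameters (synthetic values).
rng = np.random.default_rng(0)
raw = rng.normal(loc=[7.5, 0.3, 19.0, 35.0], scale=[0.2, 0.1, 1.5, 0.5], size=(365, 4))

normalized = min_max_normalize(raw)
scores, singular_values, Vt = pca_svd(normalized, n_components=2)
```

Each row of `scores` is one day of measurements expressed in the first two principal components; these values are the quantities whose distribution is examined in the next step.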

Model Probability Distribution Identification
The raw data formed by the fish hatchery water quality parameters present a random probabilistic pattern in the cases investigated in this study. In view of this, PCA was applied to reduce the dimensionality of the data, increasing their interpretability while minimizing the loss of information. After further analysis, the resulting principal components were found to follow a well-defined Gaussian distribution with a probability density function given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

In this equation, x, µ, and σ are the principal component values, the mean, and the standard deviation, respectively.
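As a minimal sketch of this identification step, the mean and standard deviation of a set of principal component values can be estimated and the Gaussian density evaluated with Python's standard library. The sample values below are synthetic and purely illustrative, and `fit_gaussian` is a hypothetical helper name; the paper does not state which fitting routine was used.

```python
import random
from statistics import NormalDist

def fit_gaussian(values):
    """Estimate mu and sigma from samples and return the fitted normal distribution."""
    # NormalDist.from_samples uses the sample mean and sample standard deviation.
    return NormalDist.from_samples(values)

# Hypothetical principal-component values (synthetic, for illustration only).
random.seed(1)
pc_values = [random.gauss(0.0, 0.25) for _ in range(365)]

fitted = fit_gaussian(pc_values)
mu, sigma = fitted.mean, fitted.stdev
density_at_mean = fitted.pdf(mu)  # peak height, equal to 1 / (sigma * sqrt(2 * pi))
```

A narrower distribution (smaller σ) gives a taller peak, which is why the standard deviation alone is enough to tell the fitted curves apart.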

2D Visualization of the Gaussian Distribution Model
To make the model easier to read visually, the Gaussian distribution curves that describe the three possible operating conditions of the fish farm were stacked sequentially according to their standard deviation. The top view of these stacked curves forms the 2D visualization of the proposed Gaussian distribution model.
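One way to picture this stacking is to evaluate each Gaussian curve on a common grid and store the curves as rows of a 2D array ordered by standard deviation, so that the array can be displayed as an image. The sketch below is an assumption about how such an image could be assembled, not the procedure used to produce the figures; the standard deviations, grid limits, and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density at x for mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def stacked_gaussian_image(sigmas, mu=0.0, x_min=-1.0, x_max=1.0, n_points=201):
    """Return (xs, image); each image row is one Gaussian curve, ordered by sigma."""
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    image = [[gaussian_pdf(x, mu, s) for x in xs] for s in sorted(sigmas)]
    return xs, image

# Hypothetical standard deviations for the normal, warning, and dangerous conditions.
xs, image = stacked_gaussian_image([0.10, 0.20, 0.35])
```

The resulting `image` can be passed to any plotting routine that renders a 2D array as a color map; rows with a smaller standard deviation appear as a narrow, intense band, and rows with a larger one as a wide, faint band.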

Results
For a better presentation of the results, this section is divided into two subsections. The first evaluates the performance of our method with data from the monitoring of the pre-fattening phase of Senegalese sole. Subsequently, in order to assess the viability of our approach, the second performs the same analysis with data from the weaning phase of these fish.

Data Investigation of the Pre-Fattening Phase of Senegalese Sole Development
The raw data set, which consists of several parameters from the monitoring of the Senegalese sole fish farm, has no well-defined distribution, as shown in Figure 2a. After applying PCA to these raw data, a new set of variables with a well-defined Gaussian distribution (except for the small side tails) is obtained, as exhibited in Figure 2b. Subsequently, this new set of variables was divided into three groups according to fish mortality. The definition of the mortality threshold that describes each of these operating conditions depends on each installation. A large fish farm will have a mortality threshold definition that is different from a small one. In our case, we consider the mortality rate low when it is less than 2%, moderate when it is between 2% and 10%, and high when it exceeds 10%. Figure 3 displays these three groups, each of which describes an operating condition of the fish farm production. From top to bottom is a normal condition with a low mortality rate, followed by a warning condition in which the mortality rises above usual, and, finally, a dangerous condition characterized by high fish mortality rates. These results demonstrate that as the mortality increases, the standard deviation of the Gaussian distribution that represents the data in each of these three groups also grows. Further investigation reveals that there is a well-defined linear relationship between these two quantities, as exhibited in Figure 4a. In this way, an image that indicates the current state of the fish hatchery was designed based on these statistical analyses of Senegalese sole production monitoring data, as shown in Figure 4b.
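The grouping by mortality and the linear relationship between mortality and standard deviation can be illustrated with a short sketch. The thresholds (2% and 10%) are the ones stated above; the (mortality, standard deviation) pairs are hypothetical values chosen to lie on a straight line and are not results from this study, and the function names are illustrative.

```python
def operating_condition(mortality_percent):
    """Map a mortality rate in percent to an operating condition."""
    if mortality_percent < 2.0:
        return "normal"      # low mortality: less than 2%
    if mortality_percent <= 10.0:
        return "warning"     # moderate mortality: between 2% and 10%
    return "dangerous"       # high mortality: exceeds 10%

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical pairs: mean mortality (%) and fitted sigma for each group.
mortality = [1.0, 6.0, 14.0]
sigma = [0.10, 0.20, 0.36]

labels = [operating_condition(m) for m in mortality]
slope, intercept = least_squares_line(mortality, sigma)
```

With a fitted line of this kind, the standard deviation estimated from new monitoring data can be compared against the values that correspond to the 2% and 10% thresholds to place the hatchery in one of the three conditions.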

Data Investigation of the Weaning Phase of the Senegalese Sole Development
The results presented in Figures 2-4 refer to the pre-fattening phase of the Senegalese sole, with data collected throughout 2019. In order to confirm the viability of our proposal, an identical investigation was carried out on the data from the weaning phase of this fish species collected throughout 2018, given that mortality is a problematic factor in this phase of Senegalese sole development [35]. This additional analysis is shown in Figures 5-7, in the same structure and sequence as the previous one. The results of this subsection confirm the effectiveness and the promising potential that our Gaussian distribution model offers not only for the fish farming industry, but also for other biological, physical, and chemical fields.
It is important to emphasize that each fish development phase has different values for the physicochemical monitoring parameters that assess the quality of the water in the fish tanks. Even with this difference, our mathematical model successfully reveals, regardless of the fish development phase, that the system (here represented by the physicochemical parameters) has an internal structure with a very well-defined probabilistic nature, as displayed in Figure 5b, which evolves while preserving this characteristic, as shown in Figure 6, and consequently supports the building of a highly precise Gaussian distribution model, as exhibited in Figure 7. Furthermore, these results suggest that non-standard variations of a wide range of parameters can be monitored with our method. For example, we can identify when the fish's stress level is high (through cortisol measurements [36,37]), when the amount of food provided is not adequate for fish growth, or when the rate of water renewal in the tanks is not ideal for fish welfare, among other possibilities.

Discussion
Our Gaussian distribution model provides three possible working environments: normal, warning, and dangerous. To differentiate between them, we adopted the fish mortality rate as a demarcation criterion. The choice of this parameter as a separation boundary is based on two fundamental justifications. First, mortality is one of the main concerns of fish farmers in several countries, as we have already covered in the introduction section. Second, various physical parameters such as temperature, pH, nitrite, ammonia, salinity, alkalinity, and bromine, among others, are measured daily, and a dangerous health condition for fish does not occur with a change in only one of these parameters, as changing one alters others too. Therefore, an inappropriate situation occurs when all of them assume a specific combination of values that creates an unsuitable environment for the health and welfare of the fish, making them sick and consequently causing them to die. In other words, there is a direct relationship between the general change in all parameters that assess water quality in the fish hatchery and mortality. A deeper discussion about the correlation between the gathered data and the fish mortality can be found in our previous work [19]. However, it is necessary to stress that the proposal specified here takes into account the parameters measured by the intensive aquaculture facilities of Safiestela SA (from the SEA8 group), in Portugal. In this way, the results may differ slightly for fish farming facilities that measure parameters other than those described in this research.
A comparison between the results obtained with the data analysis from the pre-fattening and weaning phases of Senegalese sole reveals some points worth mentioning. First, Figure 5b confirms that after applying the PCA, the resulting principal components have a very well-defined Gaussian distribution. Second, when comparing Figures 4b and 7b, we perceive that the normal and dangerous operating conditions for these two phases of Senegalese sole development are very similar to each other, so that only the warning condition presents a slight discrepancy (a small difference in the Gaussian distribution), produced by the fact that the monitoring parameters are not exactly the same in the two analyses. Furthermore, it is not our intention to determine the fish mortality rate. The purpose of the analysis presented in Figures 4a and 7a is only to demonstrate that we can use the mortality rate as a criterion indicating a dangerous situation for fish. Besides, the practical use of our method does not require any additional instrumentation costs, as it is based on the statistical analysis of physical parameters that are already routinely measured. The application of the PCA to the monitoring data before looking for a standard probability distribution that describes the data is also one of the strengths of our approach. PCA is a multivariate statistical technique that has already been successfully applied in several scientific fields to assist in the detection and interpretation of information hidden in the data. Specifically in our case, with the PCA we were able to determine a Gaussian distribution pattern that was previously not noticeable in the raw data.

Conclusions
In summary, this study presented a Gaussian distribution model for detecting dangerous operating conditions in fish farming facilities. In our approach, the mortality parameter responds to the stimuli caused by the change in the set of physicochemical parameters of the water in the fish tanks in a specific and very linear manner. Therefore, this linear behavior between these quantities was explored as a data processing technique. This approach constitutes a powerful method to assist in aquaculture practices, specifically in the early combat of outbreaks and high mortality rates that constantly generate massive financial losses to the fish market. Ongoing investigations comprise evaluating the approach introduced here on other growth stages of Senegalese sole, as well as on different fish species produced in diversified fish farming environments. Furthermore, our proposed approach has promising opportunities for application in all areas of the natural sciences, as the mathematical procedure reported here allows us to understand, from a probabilistic perspective, the behavior of physical, chemical, and biological systems represented by a set of parameters whose internal structure evolves and is governed by laws of an intrinsically statistical nature.