An Application of Fish Detection Based on Eye Search with Artificial Vision and Artificial Neural Networks

A fish can be detected by means of artificial vision techniques, without human intervention or handling the fish. This work presents an application for detecting moving fish in water by artificial vision, based on the detection of a fish's eye in the image using the Hough algorithm and a Feed-Forward network. In addition, this method of detection is combined with stereo image recording, creating a disparity map to estimate the size of the detected fish. The accuracy and precision of this approach have been tested in several assays with living fish. This technique is a non-invasive method working in real time and it can be carried out at low cost. Furthermore, it could find application in aquariums, in fish farm management, and in counting the number of fish that swim through a fishway. In a fish farm it is important to know how the size of the fish evolves in order to plan the feeding and to decide when the fish can be caught. Our methodology allows fish to be detected and their size and weight estimated as they move underwater, engaging in natural behavior.


Introduction
Object detection has been a much-discussed topic in artificial vision studies, since a good identification of the required object provides the basis for collecting information via image processing in a system of this type. In the proposed work, the object to be located is a live fish that is submerged in water and where the conditions of both light and visibility may be variable. The detection of fish in their natural environment using artificial vision allows the setting aside of techniques which are potentially invasive for the fish, such as the use of sensors [1], subjecting the specimens to stress, or other traditional techniques such as direct vision that involves continuous supervision by an operator. Our technique can be used in several applications, ranging from monitoring the growth of fish in aquaculture in order to adapt their feeding, to estimating differences in body size and condition of migratory fish moving through fish passageways.
The use of video cameras is one of the most powerful methods to detect objects, as it provides a great deal of information and is also one of the cheapest methods. To detect objects under water there are several techniques, one widely used of which is acoustic technology [2,3]. One of the great advantages of acoustic technology is the possibility of detecting objects at many distance ranges.
The proposed technique was first evaluated in a controlled environment, and later experiments were conducted in a real scenario with fish of different sizes and under changing conditions of light and water turbidity. For the video acquisition, two submersible synchronized cameras were used to allow stereoscopic capture and thus be able to measure the fish, once detected. Section 2 of this paper describes two techniques for fish detection: first, eye detection using the Hough algorithm, and second, eye detection using an Artificial Neural Network. We combine both techniques to obtain a better detection capacity under the different conditions that can occur: turbidity, particles suspended in the water, bubbles, and different positions and movements of the fish. The detection is applied in Section 3 to estimate the size of the fish. Section 4 presents the analyses and results of the technique. Finally, Section 5 includes our conclusions, and possible future improvements are discussed.

Materials and Methods
In order to be able to measure fish length or weight, the first step is their detection via the input images. To achieve this objective, several image filtering techniques are applied, as well as background subtraction models, contour detection and cascade classifiers; these methods are additive and complementary. First, the background subtraction model is detailed. After obtaining a rough idea of where the fish is positioned in the images as a result of background subtraction, a validation technique is applied, ensuring that the discovered objects are indeed fish.
The first fish detection technique that will be analyzed is based on fisheye detection using the Hough Transform. Since this approach is highly sensitive to noise in the input images, a second fish detection approach using cascade classifiers is proposed. This approach does not perform fisheye detection, but instead attempts to detect the fish shape and texture. This makes the proposed system more robust against noise and possible false positives. In addition, this technique also poses the opportunity for fish species recognition. In parallel to this technique, a trained ANN is obtained to detect whether there is a fisheye in an image and also a fish-like shape. In this way, combining both techniques, detection capacity is improved.
Before applying these techniques, the image is pre-processed with different filtering techniques (greyscale conversion, Gaussian blur, mean shift) in order to improve the characteristics of the image, and in the next step a background subtraction method is employed (as in [17]). In Figure 1 we show the steps of the algorithm.

Figure 1. Steps of the algorithm: capture stereo images (Image1, Image2); preprocess Image1; background subtraction in Image1; Hough algorithm for fish detection in Image1; trained ANN for fish detection in Image1.

Image Filtering
Since the images from our test scenarios are substantially noisy and the test environment is also of a difficult nature (bubbles, turbidity, etc.), several image filtering techniques are used to try to enhance the processed images and ease the work of the background subtraction and detection techniques. As Appendix A details, the first step is the conversion of the image colors from RGB (Red-Green-Blue) to grayscale (as can be seen in Equation (A1)). Then, in order to reduce noise and improve the robustness of the background subtraction technique, a Gaussian Blur filter is applied to the input images. This filter uses a Gaussian matrix as its underlying kernel, producing output pixel values that depend only on the pixel values in the neighborhood processed by the convolution kernel (Equation (A2)).
Since the input images are too noisy and complex to obtain good circle matches with the Hough Transform, several steps have to be applied to the input images before fisheyes can be detected. The first step computes a Mean Shift Filter (Equation (A3)). In the second step, the edge contrast of the image is enhanced; in this way the Hough transform is made more robust. An unsharp mask is also applied to the images in order to achieve this objective. The sharpening process utilizes a slightly blurred version of the original image (Gaussian blur), which is subsequently subtracted from the original to detect its edges, effectively creating an unsharp mask. Contrast is then increased along these edges (Equation (A4)). Finally, since this type of filter can cause artifacts on edge borders which may lead to double edges, the Laplacian of the original image is also added to the sharpened image (Equations (A5) and (A6)).
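The filtering chain described above (grayscale conversion, Gaussian blur, unsharp mask) can be sketched in numpy as follows. This is a minimal illustration, not the exact implementation of the paper: the kernel size, sigma, sharpening amount and the BT.601 grayscale weights are assumptions for the example.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB image (H, W, 3) to grayscale with ITU-R BT.601 weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2D Gaussian convolution kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(img, kernel):
    """Naive 'same'-size 2D convolution with zero padding."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def unsharp_mask(img, sigma=1.0, amount=1.5):
    """Sharpen by adding back the difference between the image and its blur."""
    blurred = convolve2d(img, gaussian_kernel(5, sigma))
    return img + amount * (img - blurred)
```

In practice a library routine (e.g. an OpenCV Gaussian blur) would replace the naive convolution loop; the sketch only makes the data flow explicit.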

Background Subtraction
As performance is key in our application for detecting fish in real time, the first implemented background subtraction method is the simplest and fastest. This approach uses the Gaussian Blur filter in order to reduce noise in the input images.
Next, this blurred image (img1 in Equation (1)) is compared to the background image (img2 in Equation (1)) using absolute differences:

diff(x, y) = |img1(x, y) − img2(x, y)| (1)

Once the absolute difference is computed, it can be used to infer which pixels belong to the background and which pixels belong to a moving object. Even when using the previously mentioned filters, noise still presents a problem, in the form of false positives. In order to discard as many false positives as possible, a threshold filter is applied that sets small differences to zero. After disregarding small differences in this way, the matches can be expanded to fill holes in the detected objects. This phase involves the dilation of the image with the detected objects, using a 3 × 3 rectangular structuring element as the shape of the pixel neighborhood, over which the maximum is chosen. After the dilation process, the resulting image is run through a contour detection algorithm [18]. The algorithm detects the object contours by border following, in an image with edge pixels that can be the result of, for example, a Canny edge detector [19], yielding a binary image with the detected borders as a result. This algorithm is able to distinguish between interior boundaries and exterior boundaries of zero regions (holes).
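The absolute-difference, threshold and dilation steps can be sketched in numpy as follows. This is an illustrative reimplementation, not the production code; the threshold value is an assumption for the example.

```python
import numpy as np

def background_subtract(frame, background, thresh=25):
    """Absolute difference against the background, then binary threshold:
    1 marks a pixel belonging to a moving object, 0 marks background."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return (diff > thresh).astype(np.uint8)

def dilate(mask, iterations=1):
    """Dilation with a 3x3 rectangular structuring element: each pixel takes
    the maximum of its 3x3 neighborhood, expanding matches to fill holes."""
    out = mask.copy()
    for _ in range(iterations):
        padded = np.pad(out, 1)
        out = np.max(
            [padded[i:i + mask.shape[0], j:j + mask.shape[1]]
             for i in range(3) for j in range(3)], axis=0)
    return out
```

A single foreground pixel, for instance, grows to a full 3 × 3 block after one dilation pass, which is what closes small gaps inside a detected fish silhouette.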
In Figure 2, we can see contours of the two mentioned types; exterior contours are represented by dashed lines, and interior contours by dotted lines. After the object contours are calculated, the minimal up-right bounding rectangle of the detected object can be easily computed. This yields the potential Region of Interest (ROI) of the object. Since the ROI is not guaranteed to contain a fish, it is first validated as a fish, and only when estimated to enclose a fish is the measurement algorithm applied.
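Computing the minimal up-right bounding rectangle of a binary detection mask reduces to taking the extremes of the non-zero pixel coordinates. A minimal sketch (the (x, y, w, h) return convention is an assumption for the example):

```python
import numpy as np

def bounding_roi(mask):
    """Minimal up-right bounding rectangle (x, y, w, h) of the non-zero
    pixels of a binary mask, i.e. the potential ROI of a detected object.
    Returns None when the mask is empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return (int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1))
```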

Hough Eye Detection
As mentioned in the previous section, once a ROI is obtained with a potential fish candidate, further validation that the detected object is a fish is needed. The first method designed to achieve this objective attempts to identify the eye of the fish in order to distinguish fishes from noise, bubbles, vegetation or other types of marine life. This method is deterministic, meaning that no training is required (in contrast to other methods, such as Artificial Neural Networks), since it mainly relies on the detection of circles using the Hough Transform.
The first steps that are performed in order to try to detect fisheyes deal with the enhancing of the input image in order for the Hough Transform to work better. As mentioned in an earlier section, Mean Shift Filtering is applied to reduce noise and an Unsharp Mask to increase edge contrast.
In addition, before applying the Hough Transform to the image, a Maximally Stable Extremal Regions (MSER) extractor [20] is used to detect binary large objects (BLOBs), i.e., groups of connected pixels in an image. This step is necessary due to the issues that the Hough Transform has with noisy and low-contrast images. MSER, in contrast, is robust against blur and scale changes, and is useful when it comes to processing images acquired through real-time sources, such as a camera submerged in a fish tank. It reduces the number of false positives of the Hough Transform, which then processes the resulting binary image containing only the detected white BLOBs against a black background. MSER could also be used to detect the circles pertaining to the fish's eyes directly, but one would need to measure the parameters of the eye shape in advance in order to detect candidates.
The Circle Hough Transform [21,22] provides a more general method that was specifically adapted to detect circles. This method can be used to determine the parameters of a circle when a number of points on its perimeter are known. A circle with radius R and center (a, b) can be described by the parametric equations x = a + R·cos(θ) and y = b + R·sin(θ). For θ in [0, 360) degrees, the points (x, y) trace the perimeter of the circle (Figure 3). This yields a 3D parameter space, in which the circle parameters can be identified by the intersection of many conic surfaces defined by points on the 2D circle. An accumulator matrix is used for tracking the different intersection points. The true center point will be common to all parameter circles and can be found with a Hough accumulator array. Since the radius is not known, the locus of points falls on the surface of a cone as the radius R changes: instead of a circle, each point on the perimeter of the circle in geometric space produces a cone surface in parameter space. Thus, the vector (a, b, R) corresponds to the accumulator cell where most cone surfaces intersect. After the candidate circles are obtained, we still cannot be certain whether they are fisheyes; further processing is needed to ensure that they are indeed eyes.
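The accumulator idea can be illustrated with the following sketch, which votes in a discretized (a, b, R) space; the angular step and the candidate radius set are arbitrary choices for the example, not the parameters used in the experiments.

```python
import numpy as np

def circle_hough(edge_points, shape, radii):
    """Circle Hough Transform: every edge point votes, at each candidate
    radius R, for all centers (a, b) that could have produced it. The cell
    with most votes gives the detected circle parameters."""
    h, w = shape
    acc = np.zeros((len(radii), h, w), dtype=int)
    thetas = np.deg2rad(np.arange(0, 360, 5))
    for (x, y) in edge_points:
        for ri, r in enumerate(radii):
            a = np.round(x - r * np.cos(thetas)).astype(int)
            b = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
            np.add.at(acc, (ri, b[ok], a[ok]), 1)
    ri, by, ax = np.unravel_index(acc.argmax(), acc.shape)
    return ax, by, radii[ri]  # best (a, b, R)
```

Feeding it the edge points of a synthetic circle recovers the circle's center and radius, which is exactly the behavior exploited to locate the circular fisheye.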
Besides, if multiple matches are found, only the best should be selected, since our case study needs to measure the fish and the objective is to detect the fish on its side (only one eye is visible from that position). In order to ascertain whether the detected circles are fisheyes or not, the fisheyes in our case study were observed in order to develop a score system for classifying matched circles. The types of fish that the system was tasked to detect are common fish, whose shape follows the pattern in Figure 4, where the characteristics of their eyes can be observed.
Generally, fish eyes have a sclera [23], also known as the white of the eye (Figure 4). This is the opaque, fibrous, protective outer layer of the eye that contains collagen and elastic fiber. This part has a very characteristic white color that surrounds the pupil of the fisheye. The pupil [24] is a hole located in the center of the eye that allows light to strike the retina. It appears black because light rays entering it are either absorbed directly by the tissues inside the eye, or absorbed after diffuse reflections within the eye, which results in them missing the exit of the narrow pupil. The spherical lens of the fisheye protrudes through the pupil opening of the iris. In contrast, humans have a flattened, camera-like lens sitting below the iris and pupil. The human iris is therefore adjustable according to light intensity, while the fisheye iris is not.
These characteristics make the fisheye an identifying feature that has the potential to allow us to distinguish fish from other objects. The proposed score system accounts for these two factors, the sclera and the pupil of the fisheye. This score system checks the difference between color averages in both regions, since there is usually a high contrast between them as a result of the white and black colors. In order to obtain the scores of the detected candidates, and taking into account that different fishes can have different eye features, the user needs to establish several parameters. These parameters include the pupil percentage, sclera percentage, pupil color target and sclera color target. Figure 5 shows what these parameters represent.
Figure 5. This illustration shows the lines that delimit the sclera and pupil percentages of the candidate match.
The pupil percentage is the percentage of the eye that corresponds to the pupil; the rest of the eye yields the sclera percentage. The two remaining parameters, the pupil color target and the sclera color target, account for the color average of the pupil and the sclera of the fish's eye, respectively. Once the parameters are chosen depending on the fish type, the next step involves iterating over all the pixels in the candidate match and calculating the color averages of the different fisheye features. Assuming that the pupil and sclera can be modelled with ellipses, this is a matter of checking whether a certain pixel is enclosed in the ellipse defined by the pupil and sclera percentages. This can be ascertained using the following inequality:

(x − h)²/r_x² + (y − k)²/r_y² ≤ 1

x and y being the pixel position, h and k the position of the ellipse center, r_x the semi-major axis and r_y the semi-minor axis. If the inequality is satisfied, the pixel is inside the ellipse; otherwise, it is outside. Moreover, knowing whether a pixel is inside the sclera or the pupil is simple: if the pixel is inside both ellipses it belongs to the pupil; if it is only inside the outer ellipse, it belongs to the sclera.
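The ellipse-inclusion test and the pupil/sclera labeling rule can be written directly from the inequality above. A small sketch (the convention that the pupil ellipse is the eye ellipse scaled by the pupil percentage is an assumption for the example):

```python
def inside_ellipse(x, y, h, k, rx, ry):
    """Point-in-ellipse test: ((x-h)/rx)^2 + ((y-k)/ry)^2 <= 1."""
    return ((x - h) / rx) ** 2 + ((y - k) / ry) ** 2 <= 1.0

def classify_pixel(x, y, center, eye_axes, pupil_pct):
    """Label a pixel as 'pupil', 'sclera' or None: inside both ellipses means
    pupil, inside only the outer (eye) ellipse means sclera."""
    h, k = center
    rx, ry = eye_axes
    if inside_ellipse(x, y, h, k, rx * pupil_pct, ry * pupil_pct):
        return "pupil"
    if inside_ellipse(x, y, h, k, rx, ry):
        return "sclera"
    return None
```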
As the pixels are classified as part of the sclera or pupil, their color contributes to either the sclera color average or the pupil color average, which are calculated using the following equations:

avg_s = (1/n_s) Σ img(x_s, y_s) (5)

avg_p = (1/n_p) Σ img(x_p, y_p) (6)

n_s and n_p being the number of pixels in the sclera and pupil, img the input image, and (x_s, y_s), (x_p, y_p) the positions of the sclera and pupil pixels in the input image. After the averages are calculated, they can be compared to the pupil color target or sclera color target, and the differences between the calculated averages and the desired targets are obtained. If the average error exceeds a certain threshold, the match is discarded; if not, it is temporarily kept. In order to obtain the best match, the candidates are sorted according to their error and the one with the lowest error is chosen as the best match. Figure 6 shows an example of a correctly matched fisheye in a noisy underwater image. Finally, if no eye match is present, it can be assumed that the detected object is not a fish, or that the fish is not in a position suitable for automatic measurement; otherwise, it is safe to go ahead and estimate the fish measurements.
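The scoring step can be sketched as follows: average the pixel values inside the pupil and sclera ellipses separately, then measure the distance to the user-chosen color targets (lower score = better candidate). The L1 distance used here is an illustrative assumption, and the sketch assumes the eye's bounding box lies inside the image.

```python
import numpy as np

def eye_score(img, center, eye_axes, pupil_pct, pupil_target, sclera_target):
    """Score an eye candidate by comparing the pupil and sclera color
    averages (Equations (5)/(6)-style sums) against the color targets."""
    h, k = center
    rx, ry = eye_axes
    sum_p = n_p = sum_s = n_s = 0
    # iterate over the bounding box of the outer (eye) ellipse
    for y in range(int(k - ry), int(k + ry) + 1):
        for x in range(int(h - rx), int(h + rx) + 1):
            d = ((x - h) / rx) ** 2 + ((y - k) / ry) ** 2
            if d <= pupil_pct ** 2:      # inner ellipse: pupil
                sum_p += img[y, x]
                n_p += 1
            elif d <= 1.0:               # outer ellipse only: sclera
                sum_s += img[y, x]
                n_s += 1
    avg_p = sum_p / n_p
    avg_s = sum_s / n_s
    return abs(avg_p - pupil_target) + abs(avg_s - sclera_target)
```

A synthetic eye (dark pupil disk inside a bright sclera ring) scores zero against matching targets, while noisy candidates score higher and can be discarded by a threshold.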
Eye detection using the Hough technique has an accuracy rate close to 50% for 9 cm fishes, where the small size of the eye makes it more difficult to locate, whereas for fishes of approximately 1 m the accuracy rate is over 60%. The processing time is less than one second, which allows real-time processing of at least one image per second (as can be seen later in the results). However, the 60% accuracy falls short of the end goal. It is therefore necessary to implement another technique to improve this detection capacity: when the Hough technique is not capable of detecting the fish, Deep Learning is used for its detection, specifically an ANN that classifies whether a fish is present or not. This technique is much more computationally expensive, so it is only applied when the previous algorithm fails.
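The two-stage combination described above, cheap Hough detector first, ANN fallback only on failure, can be sketched as the following control flow (function names and return conventions are illustrative, not the paper's API):

```python
def detect_fish(roi, hough_detector, ann_classifier):
    """Two-stage detection: try the fast Hough eye detector first, and fall
    back to the computationally expensive ANN classifier only when the Hough
    stage finds no eye in the ROI."""
    eye = hough_detector(roi)
    if eye is not None:
        return ("hough", eye)           # eye found: fish detected cheaply
    if ann_classifier(roi):
        return ("ann", None)            # fallback: ANN says a fish is present
    return (None, None)                 # no fish detected by either stage
```

This ordering keeps the average per-frame cost low, since the expensive classifier only runs on the frames the deterministic detector cannot resolve.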

Fish Detection Using a Feed-Forward Artificial Neural Network
In this part of the work we use a Deep Learning technique: a Feed-Forward Artificial Neural Network [25] applied to fish eye detection as a classification problem. The Artificial Neural Network (ANN) indicates whether there is a fish in the image.
The structure of a feed-forward neural network is divided into an input layer, an intermediate block of one or more hidden layers, and an output layer. In this work the intermediate block has only one hidden layer in order to reduce the execution time.
Using ANNs requires a training process as a first step; in this work the training input is a dataset of images classified as "eye" or "no eye" (see the images in Figure A3 in Appendix B). All of these samples are derived from full images of fish within the tank, cropped to the desired size. Images of 100 px in width and height are used to avoid excessive training times (with this size, training the ANN requires 3-4 h). Figure A3 shows a subsample of the images used for the ANN training. The full dataset is composed of 69 eye images and 103 non-eye images. These cropped images are used as input samples to the network, so the network has one input element for each pixel of the image (10,000 PE). 85% of these images (146) are used during training and the rest (26) in the validation phase.
We performed several trials with different architectures and configuration parameters; for example, hidden layers of 500, 1000, 2000, 5000 and 7500 neurons were tested. The best results were achieved using 2000 neurons in the hidden layer and training the ANN for 7500 epochs.
With this configuration, the ANN correctly classified 21 of the 26 validation images (80%) and 145 of the 146 training images (99.31%).
Once the network is trained, it can be applied by means of a sliding-window technique to full images reserved exclusively for testing (they were not used to extract cropped images in either the training or validation phases).
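The sliding-window scan can be sketched as below. The window and step sizes are assumptions for illustration (the crop size of 100 px matches the training images; the stride is hypothetical), and `classify` stands in for the trained ANN.

```python
# Sketch of the sliding-window scan: a fixed-size window is slid across the
# full frame and every crop is classified independently by the trained ANN
# (here `classify` is a hypothetical stand-in taking the window corner).

def sliding_windows(width, height, win=100, step=50):
    """Yield the top-left corner of every window that fits in the frame."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y

def scan(frame_w, frame_h, classify):
    """Return the corners of the windows flagged as containing an eye."""
    return [(x, y) for x, y in sliding_windows(frame_w, frame_h)
            if classify(x, y)]
```

The number of windows grows quadratically with frame size, which is why running the full network on every crop is the slow path of the pipeline.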
Good results are achieved with this technique, although too many false positives are detected, so a post-processing phase is required. Furthermore, the main disadvantage of this process is the time required, since it involves cropping the original image into 100 × 100 pixel segments and executing an ANN of considerable dimensions on each one, which entails a considerable cost in terms of execution time and computational requirements (more than 10 s). This is the main reason the ANN is used only once the Hough transform algorithm has failed to detect the fish.

Size Estimation
After performing detection (with the Hough algorithm or with the Deep Learning technique (ANN)), the silhouette can be located by edge-detection techniques. In this case, the image resulting from background subtraction is used to detect the contours appearing in the image and to locate the one containing the detected eye.
With the aim of using this information to estimate the size of the fish, the technique described in [26] was applied. This technique uses two synchronized submerged cameras, creating a stereo vision system that allows the generation of a disparity map of the scenario. The cameras used in this study are two GoPro 3+ Black, with an angle of view of 107 degrees. To submerge them, they are introduced in a watertight housing which keeps the camera axes parallel and at the same height, with the optical axes separated by 3.5 cm.
The first step of the technique involves performing a calibration of the cameras, for which a template of coplanar points is used and an OpenCV calibration algorithm is applied based on [27][28][29]. Through this process, the camera calibration matrices are generated, and the camera pattern is obtained.
The second step consists of calculating the depth. This calculation requires the calibration information from the previous step in order to perform the transformations (removing distortions, displacements, etc.) required in the acquired images, together with the disparity recorded by the two cameras, so that the same points can be located in the images from both cameras. This search for matching points between the two images is made using the Block Matching artificial vision algorithm [30] and, once matches are found, a disparity map of the image can be generated. The disparity is the difference in the x-axis coordinates of each point between the two images and is inversely proportional to the depth. This is a costly process and, with the aim of reducing the processing time, it is applied only to the region where a fish eye was detected. To estimate the size of the fish, the technique used in [31] is applied, which uses the disparity map and the calibration information (Figure 7). The application of the detection technique proposed in this study decreases the percentage of false positives, so the estimation of the fish size is more accurate, since the size is calculated only at the time when the fish's eye is detected.
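The inverse relation between disparity and depth can be made concrete with the standard stereo triangulation formula Z = f·B/d. This is a hedged sketch, not the pipeline's code: the baseline of 3.5 cm matches the housing described above, but the focal length, disparity and pixel-length values below are purely illustrative.

```python
# Stereo triangulation sketch: depth is inversely proportional to disparity,
# Z = f * B / d, with focal length f (px), baseline B (m) and disparity d (px).
# Baseline 0.035 m matches the housing above; other numbers are illustrative.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth (m) from the stereo disparity (px)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def length_from_pixels(length_px, depth_m, focal_px):
    """Back-project a length measured in pixels to metres at a given depth."""
    return length_px * depth_m / focal_px

z = depth_from_disparity(70, 1000, 0.035)    # a 70 px disparity -> 0.5 m away
fish_len = length_from_pixels(180, z, 1000)  # a 180 px silhouette -> 0.09 m
```

The same silhouette length in pixels maps to very different physical sizes at different depths, which is why the disparity map around the detected eye is needed before the fish size can be estimated.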

Results
To check the system, a database of test videos recorded at different locations was used. Some videos were recorded in a more controlled environment, a laboratory of the Centre for Technological Innovation in Building and Civil Engineering (CITEEC), while others were recorded in environments closer to real scenarios, such as the fish tanks of the Finisterrae Aquarium, A Coruña. Over 3000 images were analyzed.
The performance tests include benchmarks that measure system efficiency in multiple combinations (with fish, without fish, several fish, different fish positions, different combinations of turbidity and luminosity, bubbles, etc.), while the robustness benchmarks test the correctness of the proposed method in terms of the detection and measurement of fish. The developed software was tested on a medium-performance computer. The operating system used to run the tests was Windows 7 × 64, the compiler used to build the code was VC++ v110 and the OpenCV version was 3.0.
The execution of the technique was tested in real time; the execution times of the proposed technique on two videos are shown in the following graphs. Figure 8 shows the system tested against the Video 1 scene. This scene contains a single small test fish (9 cm in length), which swims at different depths. Moreover, this video contains large amounts of noise, bubbles, and changing light, which present a serious challenge for our detectors. The calculation of the disparity map of the detected fish is not too complex, since the fish has bright colors and several identifying features, which allows the algorithm to obtain good results. The performance results of the second test can be seen in Figure 9, where the system is tested against the Video 2 scene. This is a more complex scene, since there are multiple fish swimming in the recorded tank and the fish are 1 m in length. The highest peaks in Figure 8 are produced by changes in light and shadows when the fish is next to the walls of the tank: the background subtraction changes, as Figure 10a,b show, and, as the Hough algorithm fails to detect the fish, the ANN must analyze the images, which takes more processing time.
Both graphs show that, in order to achieve real-time processing, using only the Hough algorithm allows us to obtain 1 frame per second; however, if we want to increase accuracy by using the ANN when this method does not detect the fish, then the runtime peaks during processing exceed one second. This is due to the pre-processing required to reduce noise and the impact of changing light, and to the validation of the results provided by both algorithms (Hough and ANN). Using Deep Learning with the trained ANN when the Hough algorithm fails, the system achieves a 74% accuracy rate on the validation set and almost 100% on the training set.
In the first test the specimens were European perch (Perca fluviatilis) and brown trout (Salmo trutta), while in the second test (Figure 11) they were Atlantic wreckfish (Polyprion americanus). About 3000 images were analyzed in these experiments and the measurements obtained by the system were compared with those obtained manually.
Figure 11 shows a frame of Video 2 with the fish detection algorithm correctly detecting a fish in a difficult image. The small differences in image tones, turbidity, noise and multiple fish in invalid positions present serious challenges that the classifier is able to overcome. The detected fish is highlighted in red, and the candidate ROI in green. Table 1 shows the results obtained by the method proposed in this paper operating in real situations in a tank with a school of fish of different species.
As can be seen, the average size of the fish can be calculated for the different species with good precision: about 6.8% error over the real fish size for European perch, about 10.7% for brown trout, and about 7% for Atlantic wreckfish. The system, combining Deep Learning with artificial vision techniques, has a detection accuracy of 74% for large fish and over 90% for medium fish. No false negatives are obtained, because the fish is always detected when it appears in any of the images (30 frames per s), which is one of the advantages of this technique; however, there is a false-positive ratio, where the algorithm indicates the detection of fish when there are really none. This ratio is higher with large fish, mainly due to changes in brightness, bubbles, shadows and noise that tend to cover larger areas of the image.

Conclusions
In this paper we show an application of artificial intelligence techniques for detecting fish. The advantage of the method proposed in this study is that there is no need for in-depth knowledge of the fish species to detect them. However, like other artificial vision techniques, it remains dependent on good image quality and good background subtraction. Other advantages of this technique are that detection is performed in real time and the recording is made in stereo format, so these data can subsequently be used to calculate the size of the fish.
The Hough Transform method, even using MSER to increase robustness, has difficulties with noisy images and is also highly dependent on light conditions. For this reason, an alternative fish detection method has also been provided: Deep Learning with an ANN. Comparing the Hough Transform method with the use of neural networks, the latter correctly detects the fish in the environment for which it is trained, but when the conditions vary, the ANN must be re-built and re-trained for the new scenario. Other works rely on the shape or color of the fish or background to obtain a segmentation that allows their detection, so a change of species or conditions would make such methods less useful. On the other hand, comparison with patterns has the major drawback of requiring enough images of the species to be detected. Our proposal could perform a search among the available patterns and, using both methods (the Hough Transform algorithm and, when it fails, the ANN), we obtain an accuracy of 74%.
The first step to test the proposed technique was to include more than one species in the same environment, so that several specimens of different species could appear in the same frame. Secondly, the tests were conducted in an environment as close as possible to reality, where turbidity and changes in brightness have a greater impact. The results obtained have been satisfactory: in a real video of 3000 s (50 min) of a school of fish in an aquarium, the method was able to detect 74% of the images in which a fish appeared. This means that it is feasible to build a real-time fish detection system (one image per second) that is capable of efficiently detecting fish and subsequently estimating their size. In this way, it could be applied in practical cases such as fish farms or studies of wild populations, where the objective is to analyze the evolution of the size of a school of fish.

Funding: This project was also supported by the Spanish Ministry of Economy and Competitiveness through the project BIA2017-86738-R and through the funding of the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13-3503) and the European Regional Development Funds (FEDER) by the European Union. Additional support was offered by the Consolidation and Structuring of Competitive Research Units-Competitive Reference Groups (ED431C 2018/49) and Accreditation, Structuring, and Improvement of Consolidated Research Units and Singular Centers (ED431G/01), funded by the Ministry of Education, University and Vocational Training of the Xunta de Galicia endowed with EU FEDER funds. Last, the authors also acknowledge research grants from the Ministry of Economy and Competitiveness, MINECO, Spain (FEDER CTQ2016-74881-P).

Acknowledgments:
The authors would also like to thank the managers and personnel of the Finisterrae Aquarium of A Coruña for their support, technical assistance and for allowing the unrestricted use of the Finisterrae facilities and of the Centre for Technological Innovation in Building and Civil Engineering (CITEEC).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The first step is the conversion of the image colors from RGB to grayscale, using the following formula:

Gray(x, y) = r × R(x, y) + g × G(x, y) + b × B(x, y) (A1)

For testing, the default values provided by OpenCV [27] are maintained (r = 0.299, g = 0.587 and b = 0.114), as they are valid in these cases. However, in other environments they may need to be varied in order to obtain a higher contrast in the frame, facilitating the following steps.
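For a single pixel, the weighted conversion above can be sketched in a few lines. This is a pure-Python illustration of the formula with OpenCV's default weights, not the `cv2.cvtColor` call used in the actual pipeline.

```python
# RGB-to-grayscale conversion as a per-pixel weighted sum, using OpenCV's
# default weights (r = 0.299, g = 0.587, b = 0.114, which sum to 1.0).

def to_gray(rgb, r=0.299, g=0.587, b=0.114):
    """Convert one (R, G, B) pixel (0-255 per channel) to a gray value."""
    R, G, B = rgb
    return r * R + g * G + b * B

white = to_gray((255, 255, 255))  # weights sum to 1, so white stays 255
```

Because the weights sum to 1, the conversion preserves the overall brightness range; changing the weights, as suggested above for other environments, shifts how much each channel contributes to the contrast.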
Next, in order to reduce noise and improve the robustness of the background subtraction technique, a Gaussian Blur filter is applied to the input images. This filter uses a Gaussian matrix as its convolution kernel, so each output pixel value depends only on the pixel values in its neighborhood as weighted by the kernel. The following formula represents the 2D Gaussian applied:

G(x, y) = A × exp(−((x − µ_x)² / (2σ_x²) + (y − µ_y)² / (2σ_y²))) (A2)

where A is the amplitude, µ is the mean and σ is the variance for each of the variables x and y.
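The kernel behind such a blur can be built as below. This is a sketch, not the pipeline's `cv2.GaussianBlur` call; the kernel size and sigma are illustrative values, and the kernel is normalised so that blurring does not change overall brightness.

```python
# Build a small 2D Gaussian convolution kernel: each output pixel of the
# blur is the weighted sum of its neighbourhood using these weights.
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Return a size x size Gaussian kernel, normalised to sum to 1."""
    c = size // 2  # centre index
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

k = gaussian_kernel()  # weights peak at the centre and decay outward
```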
Since the input images are too noisy and complex to obtain good circle matches with the Hough Transform, several steps have to be applied to the input images before fish eyes can be detected with the aforementioned approach. The first step computes a Mean Shift Filter [32,33] in order to smooth textures, reduce noise and therefore make eye color validation easier and more robust. Given an RGB image consisting of n pixels, each with three values (r, g, b), the image is a distribution of points in a three-dimensional space, and a density function can be defined as:

f(x) = (1/n) × Σ_{i=1..n} K(x − x_i) (A3)

where K is a kernel function centered on each sample point x_i. Local maximum points of the density function can be detected by Mean Shift analysis; in Figure A1, the dark circles indicate local maximum points. Any point in the three-dimensional space converges to a local maximum, thereby smoothing the image. In addition, position information can also be utilized so that distant objects are not labeled with the same colors. To accomplish this, the three-dimensional vector (r, g, b) can be expanded to a five-dimensional vector (x, y, r, g, b), which includes location information.
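The convergence of a point toward a local density maximum can be illustrated in one dimension. This is a toy sketch of the mean-shift iteration with a flat (uniform) kernel and hypothetical data, not the 5-dimensional filter used in the pipeline.

```python
# One-dimensional sketch of the mean-shift iteration: a point repeatedly
# moves to the mean of its neighbours within a bandwidth window, converging
# to a local maximum of the point density (a mode).

def mean_shift_1d(points, start, bandwidth=1.0, iters=50):
    """Shift `start` toward the nearest local density maximum of `points`."""
    x = start
    for _ in range(iters):
        neighbours = [p for p in points if abs(p - x) <= bandwidth]
        if not neighbours:
            break
        x = sum(neighbours) / len(neighbours)  # move to the local mean
    return x

# Two clusters of samples: a point starting near the second cluster
# converges to that cluster's mode.
pts = [0.0, 0.1, -0.1, 10.0, 10.1, 9.9]
mode = mean_shift_1d(pts, 9.0)
```

In the filter, every pixel's 5-dimensional vector undergoes the same iteration, so pixels in the same smooth region collapse to the same mode, which is what smooths textures while preserving edges between distant modes.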
Moreover, an unsharp mask was applied to the images to enhance edge contrast, which makes the Hough transform more robust. The sharpening process utilizes a slightly blurred version of the original image (Gaussian blur), which is subtracted from the original to detect its edges, effectively creating an unsharp mask. Contrast is then increased along these edges using the aforementioned mask:

Out(x, y) = 1.5 × Input(x, y) − 0.5 × G(x, y) (A4)

Since this type of filter can cause artifacts on edge borders, which may lead to double edges, the Laplacian of the original image is also added to the sharpened image:

Sharp(x, y) = 1.5 × Input(x, y) − 0.5 × G(x, y) − W × (Input(x, y) × s × L(x, y)) (A5)

W being the weight, s the scale and L(x, y) the Laplacian:

L(x, y) = ∂²Input/∂x² + ∂²Input/∂y² (A6)

Figure A2 shows an example of the transformation to greyscale and the image filtering (input and output).

Appendix B

Figure A3 shows some of the frames extracted from original images that were used in the training and validation phases of the artificial neural network.