Non-Contact Tilapia Mass Estimation Method Based on Underwater Binocular Vision

Abstract: The non-destructive measurement of fish is an important link in intelligent aquaculture, and accurate estimation of fish mass is key to the stable operation of this link. Taking tilapia as the object, this study proposes an underwater tilapia mass estimation method that can accurately estimate the mass of free-swimming tilapia under non-contact conditions. First, image enhancement is performed on the original image, and the depth image is obtained by correcting and stereo matching the enhanced image using binocular stereo vision technology. The fish body is segmented by the SAM model. Then, the segmented fish body is labeled with key points, thus realizing the 3D reconstruction of tilapia. Five mass estimation models are established based on the relationship between the body length and the mass of tilapia, so as to realize the mass estimation of tilapia. The results showed that the average relative errors of the mass estimation models were 5.34% to 7.25%. The coefficient of determination between the final tilapia mass estimates and the manual measurements was 0.99, and the average relative error was 5.90%. The improvement over existing deep learning methods is about 1.54%. This study will provide key technical support for the non-destructive measurement of tilapia, which is of great significance to the information management of aquaculture, the assessment of fish growth condition, and baiting control.


Introduction
The tilapia (Oreochromis sp.) has characteristics such as fast growth, easy cultivation, and rich nutritional content [1]. Accurate measurement of fish length information is crucial for estimating fish mass, bait control, and aquaculture sales [2]. Length and body weight are fundamental biological features of fish populations and can reflect individual physiological states [3]. Fish mass information is essential for various aquaculture management activities. Traditional fish mass measurement methods require capturing fish and measuring their length and weight, which can result in fish mortality, resource waste, and poor measurement consistency [4].
With the continuous development of computer vision technology, a growing number of researchers are working in the field of underwater binocular vision, and non-contact fish mass estimation methods [5] have attracted increasing attention. Abinaya et al. [6] used deep learning techniques, specifically the YOLOv4 model, to detect and analyze fish head, body, and tail features. Completely visible fish (CVF) were identified by a segmentation analysis technique, and the fish biomass was calculated from the estimated length using a length-mass relationship calibration curve. The method showed high accuracy and reliability in estimating fish biomass, especially in cryptic environments. Konovalov et al. [7] collected about 2500 images of golden eye bass and estimated their mass. Two instances of a convolutional neural network (CNN) were trained using the LinkNet-34 architecture. A CNN that regresses the weight directly from the images was also trained to estimate the mass of the fish efficiently and automatically. That study demonstrated the potential for automated mass estimation of golden eye bass using deep learning techniques. Fernandes et al. [8] designed a computer vision system (CVS) that efficiently distinguishes the fish body from the background and fins; the extracted fish body regions could be successfully used to predict fish weight and carcass weight. That study demonstrated the potential of computer vision and deep learning techniques for fast, accurate, and non-invasive measurements in the aquaculture industry. Yu et al. [9] improved the feature pyramid network (FPN) of Mask R-CNN and proposed an effective scheme for the accurate measurement of fish body length and body width in precision agriculture, which achieves highly accurate measurement results in different contexts. In these methods, however, the fish were caught and placed on a platform. Such a process may cause considerable stress to the fish, affecting their growth and even leading to death [10]. Other researchers have conducted their experiments underwater. Sanchez-Torres et al. [11] used a single underwater camera to acquire images in a controlled environment in order to shorten the measurement time and reduce fish stress. The length and mass of the fish were then estimated by image processing and regression analysis techniques. Saberioon et al. [12] developed an automated system using a Kinect as an RGB-D camera to collect depth maps and top-view images of 295 farmed bass of different sizes. The geometric features of the back of the fish and machine learning algorithms were used to estimate the mass of the fish. Hao et al. [13] proposed an unsupervised fin removal method. The fish mass was estimated from the area and the area squared after fully automated caudal fin removal, with a maximum relative error of 8.46%. Although this area-based mass prediction is efficient, it cannot accurately estimate the mass of a fish whose body plane is at an angle to the camera plane.
This study used binocular stereo vision technology to capture images of tilapia, reduced the influence of uneven illumination on the underwater images using a Retinex image enhancement algorithm, conducted stereo matching, segmented the fish bodies using a segmentation platform, marked key point coordinates, and estimated the length of the tilapia using three-dimensional reconstruction technology. Five tilapia length-mass prediction models were established, and the one with the smallest error was selected through analysis and comparison, providing technical support for rapidly estimating the length and mass of freely swimming fish underwater. The main contributions of this paper are as follows:
1. A binocular image dataset of underwater tilapia is established, which can be used for binocular image stereo matching in combination with an underwater image enhancement algorithm. It provides an effective basis for the three-dimensional reconstruction of tilapia.
2. To address non-contact tilapia length estimation with underwater binocular stereo vision, this paper proposes a tilapia length estimation method based on binocular vision, image segmentation, and key pixel marking, with the aim of efficient and low-cost non-contact tilapia length estimation.
3. A regression model of the body length-body mass relationship was developed for predicting the mass of tilapia. The body length is obtained by the body length estimation algorithm, and the mass of the tilapia is estimated by inputting that length into the model. The experimental results showed that the method was highly reliable and could be used for non-contact mass estimation of underwater tilapia.

Experimental Setup
In this experiment, the image data were acquired at the aquaculture plant in the coastal aquaculture base of Shanghai Ocean University. Tilapia, which has a high economic and edible value, was selected as the subject. The underwater binocular vision system used a deepwater binocular camera, model ZF-USB-02B10 (Weihai Zhifan Marine Equipment Technology Company Limited, Weihai, China), with a pixel size of 3.75 × 3.75 × 10⁻⁶ mm² (that is, 3.75 μm × 3.75 μm), a frame rate of 30 fps, a resolution of 2560 × 960 pixels, a focal length of 3.0 mm, and a baseline of 60 mm. The camera was fixed horizontally and connected to a laptop computer via USB 2.0. The laptop had an Intel i5-9300H 2.4 GHz processor, 16 GB of memory, and a GTX 1660 Ti graphics card; the operating system was Windows 11, and the programming environment was Python 3.8. The experimental environment is shown in Figure 1; the tilapia swam freely in a white culture tank measuring 120 cm × 90 cm × 34 cm. Weighing was carried out using an electronic scale with an accuracy of 1 g, connected to a DC power supply.

Data Acquisition
Data were acquired as follows. Water was drawn directly from the breeding pool into a white breeding box, and the tilapia were scooped out and placed on a flat surface. Their body length was measured with a tape measure once they ceased to move. The body length of the tilapia was defined as the straight-line distance from the tip of the snout to the base of the tail fin. Afterward, each tilapia whose length had been measured was placed in a container of mass $m_1$, and the total mass $m_2$ was determined with an electronic scale. Thus, the mass $m$ of the tilapia was as follows:

$$m = m_2 - m_1 \tag{1}$$

To minimize errors, each sample was measured three times, and the average value was taken. This study acquired the body length and mass information for approximately 50 tilapia specimens. These samples were divided into three weight groups, M1 (0-200 g), M2 (200-400 g), and M3 (400-800 g), for analyzing tilapia at different stages of growth. Additionally, 10 tilapia from the samples were sequentially placed in the breeding box for photography. The software AMCap 9.08 (build 63.4) was used to acquire the images, from which 1000 unobstructed images of the tilapia were selected. During image acquisition, the binocular camera was continuously moved to capture the swimming fish from multiple angles.
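The weighing procedure above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, averaging the repeated total readings before subtracting the container mass is an assumption about how the three measurements were combined, and the handling of fish that fall exactly on a group boundary (200 g or 400 g) is also assumed, since the text does not specify it.

```python
from statistics import mean


def fish_mass(container_mass_g, total_masses_g):
    """Net fish mass in grams, following Equation (1): m = m2 - m1.

    total_masses_g holds the repeated scale readings of container plus fish;
    their mean is used as m2 (assumed averaging scheme).
    """
    return mean(total_masses_g) - container_mass_g


def mass_group(mass_g):
    """Map a mass in grams to the M1/M2/M3 groups named in the text.

    Boundary values are assigned to the lower group (an assumption).
    """
    if mass_g <= 0 or mass_g > 800:
        raise ValueError("mass outside the 0-800 g range covered by the groups")
    if mass_g <= 200:
        return "M1"
    if mass_g <= 400:
        return "M2"
    return "M3"
```

For example, a 150 g container with readings of 452 g, 450 g, and 451 g gives a net mass of 301 g, which falls in group M2.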

Image Processing and Analysis
The main flow of this experiment is shown in Figure 2. First, underwater camera calibration was performed to obtain the camera parameters for distortion correction. The acquired images were then processed along two branches. In the first branch, the underwater image was enhanced to reduce the effects of uneven illumination, and the enhanced image was stereo-matched to calculate the disparity and obtain the depth information of the image. In the second branch, the acquired images were segmented to facilitate the labeling of key points in the next step. After labeling, the pixel body length of the tilapia could be calculated. The depth information and pixel body length were then combined to reconstruct the tilapia in three dimensions and calculate its actual body length. Finally, the relationship between the body length and the mass of tilapia was modeled, and the mass of a tilapia was obtained by inputting its estimated body length into the model.

Camera Calibration
The refraction of light in underwater environments can lead to image distortion. To ensure the accuracy of the results, the binocular camera requires binocular calibration and distortion correction underwater [14]. Zhang's method [15,16] was used, with a calibration plate consisting of a 12 × 9 grid of squares on an aluminum substrate. Each square measured 30 mm × 30 mm.
With the camera position fixed, the position of the calibration plate was changed repeatedly, and several groups of images at different angles and positions were captured. The relative position of the binocular camera to the calibration board is shown in Figure 3. Forty of these images were selected, split programmatically into separate left and right views, and stored in separate folders. The left and right views were calibrated automatically using tools in MATLAB 2017b. The corner point detection results for the calibration board are shown in Figure 4. The optical distortion in the images can be corrected using Equations (2)-(4).
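For illustration, the radial-tangential lens distortion model that Zhang-style calibration tools commonly estimate can be sketched as follows. It is assumed, not confirmed, that this matches the form of Equations (2)-(4); the coefficient names k1, k2, k3, p1, p2 and both function names are assumptions introduced here, and the coordinates are normalized image coordinates rather than pixels.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion
    to an ideal normalized image point (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def undistort(x_d, y_d, k1, k2, k3, p1, p2, iterations=20):
    """Invert the model by fixed-point iteration, starting from the
    distorted point; adequate for the mild distortion assumed here."""
    x, y = x_d, y_d
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (x_d - dx) / radial
        y = (y_d - dy) / radial
    return x, y
```

The forward model has no closed-form inverse, which is why the correction step is written as an iteration; with all coefficients set to zero both functions reduce to the identity.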
In Equations (2)-(4), $(x, y)$ are the coordinates of the original image. The histogram of the reprojection error of the first MATLAB 2017b calibration is shown in Figure 5a. The maximum error in the first calibration reached 0.15, and several images clearly contributed most of the error. To further improve the subsequent estimation accuracy, we removed the image groups with large errors. The histogram of the reprojection errors after removal is shown in Figure 5b; the errors were all below 0.1. The calculated camera parameters are shown in Table 1. The rotation matrix R approximates the identity matrix. The first component of the translation vector T represents the distance between the centers of the two cameras; the nominal baseline of the binocular camera was 60 mm, and the calibrated value agrees with it within the error tolerance.

Because the underwater images obtained in the experiment were unevenly illuminated, this study preprocessed the images before stereo matching. A Retinex-based image enhancement algorithm was used to enhance the underwater images, yielding a final image in which the uneven illumination is reduced while the natural appearance of the fish is retained. Several mainstream image enhancement methods were selected for comparison: single-scale Retinex (SSR), multi-scale Retinex (MSR), and multi-scale Retinex with color restoration (MSRCR). A comparison of the underwater enhancement results is shown in Figure 6. Images processed by underwater image enhancement methods also require quantitative analysis. In this study, two commonly used image quality evaluation metrics, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), were used to quantitatively analyze and compare the performance of the different algorithms.
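The single-scale and multi-scale Retinex variants named above share one core operation: subtracting the logarithm of a Gaussian-smoothed image (the illumination estimate) from the logarithm of the image itself. The sketch below shows that operation on a single-channel image stored as nested lists. It is a minimal illustration, not the implementation used in the study: the scale values, the equal weighting of scales, the border clamping, and the omission of the final rescaling to a display range are all assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


def single_scale_retinex(img, sigma):
    """SSR: log(image) minus log(illumination estimate); +1 avoids log(0)."""
    blurred = gaussian_blur(img, sigma)
    return [[math.log(p + 1.0) - math.log(b + 1.0)
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(img, blurred)]


def multi_scale_retinex(img, sigmas=(1.0, 2.0, 4.0)):
    """MSR: equally weighted average of SSR outputs at several scales."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for sigma in sigmas:
        ssr = single_scale_retinex(img, sigma)
        for y in range(height):
            for x in range(width):
                out[y][x] += ssr[y][x] / len(sigmas)
    return out
```

On a uniformly lit flat image the output is zero everywhere, which reflects the intent of the method: slowly varying illumination is removed while local contrast is kept. Real images would call for much larger scales than the small demonstration values used as defaults here.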
In underwater image enhancement, PSNR can be used to determine whether an image has been over-enhanced. Ideally, the enhanced image should stay close to the original scene rather than show unnatural enhancement. A higher PSNR value means that the difference between the enhanced image and the original image is smaller, which usually indicates that the image is of better quality and is not over-enhanced. The PSNR is calculated as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{A_{MAX}^{2}}{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ A(i,j) - B(i,j) \right]^{2}} \right) \tag{5}$$

In the above formula, $A_{MAX}$ denotes the maximum pixel value of the image, $m$ and $n$ denote the numbers of pixel rows and columns of the image, $A(i,j)$ denotes the original image, and $B(i,j)$ denotes the enhanced image. The four sets of images in Figure 6 are numbered from 1 to 4. The PSNR values of the different algorithms in the same environment are shown in Table 2. Table 2 shows that the PSNR values of the MSRCR algorithm were higher than those of the other algorithms for different images in the same environment, indicating that MSRCR performed well on this metric.
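A minimal sketch of the PSNR computation for two single-channel images stored as nested lists is shown below. The function name and the default maximum pixel value of 255 (8-bit images) are assumptions made for the example.

```python
import math


def psnr(original, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    rows = len(original)
    cols = len(original[0])
    squared_error = sum((a - b) ** 2
                        for row_a, row_b in zip(original, enhanced)
                        for a, b in zip(row_a, row_b))
    mse = squared_error / (rows * cols)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the mean squared error sits in the denominator, an enhancement that changes the image less yields a higher PSNR, which is how the metric is read in the comparison above.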
SSIM is a metric used to assess the similarity between two images. It is computed from the means, variances, and covariance of the two images. A larger SSIM value indicates a higher-quality image, and when two images are identical the SSIM is equal to 1. For two images $x$ and $y$, the structural similarity index between them is calculated as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{6}$$

In the above equation, $\mu_x$ and $\mu_y$ denote the means of $x$ and $y$; $\sigma_x^2$ and $\sigma_y^2$ denote the variances of $x$ and $y$; $\sigma_{xy}$ denotes the covariance of $x$ and $y$; and $c_1$ and $c_2$ denote constants. The structural similarity ranges from 0 to 1. Table 3 shows the SSIM values of the different algorithms. As can be seen from Table 3, the SSIM of the MSR algorithm was better than that of the other two algorithms. Because stereo matching requires an image that stays as close as possible to the original, the two evaluation metrics and the visual quality of the result were considered together, and the MSR algorithm was chosen for underwater image enhancement before stereo matching to make the image clearer.
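The structural similarity formula can be sketched for two single-channel images as follows. This version evaluates the statistics once over the whole image, whereas common SSIM implementations average the index over local windows, so the values will differ from library results. The function name is hypothetical, and the constants follow the usual convention c1 = (0.01 L)^2 and c2 = (0.03 L)^2 with L the maximum pixel value, which is an assumption about the constants used in the study.

```python
def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed over all pixels of two images."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((p - mu_x) ** 2 for p in xs) / n
    var_y = sum((q - mu_y) ** 2 for q in ys) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(xs, ys)) / n
    c1 = (k1 * max_value) ** 2  # stabilizes the luminance term
    c2 = (k2 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images give a value of 1, and the value drops as the means, contrasts, or structure of the two images diverge.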
After the underwater camera calibration and image enhancement, a stereo matching algorithm can be used to match each pixel between the left and right camera images. The depth of each point was calculated from its disparity value and converted into a depth image, and the 3D coordinates of the corresponding target points were obtained to complete the 3D reconstruction. The depth of the target point corresponding to a pixel in the image was calculated using the similar triangle principle, as shown in Figure 7. In the figure, $O_l$ and $O_r$ are the optical centers of the left and right cameras, respectively, and $X_l$ and $X_r$ are the horizontal coordinates of the pixels on the left and right imaging planes, respectively. The depth $D$ of the target point $P$ is related to the disparity value $d$ by the following equation:

$$D = \frac{f \cdot B_l}{d \cdot L_p} \tag{7}$$

where $D$ is the depth value, $f$ is the focal length, $B_l$ is the baseline length of the camera, $d$ is the disparity value, and $L_p$ is the edge length of a single pixel of the camera. In this paper, the mainstream semi-global block matching (SGBM) method [17,18] was used as the stereo matching algorithm, with the parameters shown in Table 4. Stereo matching was implemented with the relevant functions integrated in the OpenCV library. The SGBM implementation in OpenCV also integrates the texture filtering of the block matching algorithm in its preprocessing, which helps to remove regions with low texture. An example of the stereo matching results is shown in Figure 8. Only small voids appear in the disparity of the fish body region of the disparity map [19], which basically meets the needs of size measurement.
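The disparity-to-depth conversion can be sketched as below, using the camera values quoted in the experimental setup (3.0 mm focal length, 60 mm baseline, 3.75 μm pixels) as defaults. The function names are hypothetical, the disparity is assumed to be expressed in pixels, and representing unmatched pixels as None is a choice made for this example.

```python
def depth_from_disparity(disparity_px, focal_length_mm=3.0,
                         baseline_mm=60.0, pixel_size_mm=0.00375):
    """Depth in millimetres from a disparity in pixels.

    The disparity is converted to a length on the sensor by multiplying with
    the pixel edge length. Non-positive disparities mark unmatched pixels
    (voids) and are returned as None.
    """
    if disparity_px <= 0:
        return None
    return focal_length_mm * baseline_mm / (disparity_px * pixel_size_mm)


def depth_map_from_disparity(disparity_map, **camera):
    """Apply the conversion to every pixel of a 2D disparity map."""
    return [[depth_from_disparity(d, **camera) for d in row]
            for row in disparity_map]
```

With these defaults, a disparity of 100 pixels corresponds to a depth of 480 mm. If the disparity map comes from OpenCV's StereoSGBM matcher, the raw output is a fixed-point value scaled by 16 and would need to be divided by 16 before this conversion.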

Image Segmentation
Before the image is processed further and the fish body length is estimated, the fish body needs to be recognized and segmented as the foreground. Deep-learning-based semantic segmentation methods adapt better to the environment than traditional image segmentation methods and can overcome the effect of environmental factors on the segmentation results. Because this study used a white breeding tank, the difference between the tilapia and the background was pronounced. To simplify the processing, the Segment Anything Model (SAM) [20] released by Meta AI was used to segment the acquired images, such as the one illustrated in Figure 9. The model was trained on a very large amount of data and can perform hierarchical and full segmentation of images, as shown in Figure 10. The segmented fish body helps in the next step of key point labeling and improves the accuracy of manual labeling.

Methods of Estimating Body Length
The segmented fish body image was labeled with the coordinates of the muzzle $(x_1, y_1)$ and the base of the caudal fin $(x_2, y_2)$, as shown in Figure 11. Because stereo matching does not always yield dense correspondences, a key point may have no valid depth. In such cases, we acquired the depths of multiple points neighboring the key point and took their average as the depth of the key point. The pixel length ($PL$) of the tilapia was calculated by Equation (8):

$$PL = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{8}$$

When acquiring the depth information, we tried to avoid sampling empty areas. The depth values of five different parts of the fish body were acquired, and their average $D_{avg}$ was taken as the depth of the fish body in the image, so as to minimize the error. We then combined the depth information with the triangle similarity principle to calculate the body length ($BL$) of the fish:

$$BL = \frac{PL \cdot L_p \cdot D_{avg}}{f} \tag{9}$$

where $f$ and $L_p$ are the focal length and pixel edge length used in Equation (7). In practice, it is difficult to obtain a high-quality image of the fish at a perfect angle (fish body parallel to the camera lens) [21,22]. In most cases, the body of the fish is not parallel to the lens. As shown in Figure 12, the angle between the fish body and the lens is $\theta$. We need to acquire the depth information of the fish's muzzle and the base of the caudal fin and combine it with their pixel coordinates for 3D reconstruction of the fish body. In the figure, $p_1$ and $p_2$ are the pixel coordinates of the fish's muzzle and caudal fin base along the x-axis, respectively, and $d_1$ and $d_2$ are their respective depth values. Finally, trigonometric relations are used to calculate the real length ($RL$) of the fish body, as given by Equation (10).
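The length estimation steps can be sketched as follows. The helper names are hypothetical. The parallel-fish case uses the similar-triangle relation, while the tilted case is handled here by back-projecting both key points with a pinhole model and taking their 3D distance; that is one standard way to account for the angle and is not necessarily the exact form of Equation (10). The principal point at the centre of a 1280 × 960 view and the neighbourhood radius are assumptions.

```python
import math


def keypoint_depth(depth_map, x, y, radius=2):
    """Depth at a key point; falls back to the mean of valid neighbours
    when the key point itself lies in a void (None or non-positive)."""
    value = depth_map[y][x]
    if value is not None and value > 0:
        return value
    height, width = len(depth_map), len(depth_map[0])
    neighbours = [depth_map[j][i]
                  for j in range(max(0, y - radius), min(height, y + radius + 1))
                  for i in range(max(0, x - radius), min(width, x + radius + 1))
                  if depth_map[j][i] is not None and depth_map[j][i] > 0]
    if not neighbours:
        raise ValueError("no valid depth near the key point")
    return sum(neighbours) / len(neighbours)


def pixel_length(p1, p2):
    """Euclidean distance in pixels between two key points (u, v)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def body_length_parallel(pl_px, depth_avg_mm,
                         focal_length_mm=3.0, pixel_size_mm=0.00375):
    """Body length in mm for a fish parallel to the image plane."""
    return pl_px * pixel_size_mm * depth_avg_mm / focal_length_mm


def back_project(p, depth_mm, principal_point=(640.0, 480.0),
                 focal_length_mm=3.0, pixel_size_mm=0.00375):
    """Pinhole back-projection of pixel p = (u, v) at a given depth."""
    scale = pixel_size_mm * depth_mm / focal_length_mm
    return ((p[0] - principal_point[0]) * scale,
            (p[1] - principal_point[1]) * scale,
            depth_mm)


def body_length_3d(p1, d1_mm, p2, d2_mm, **camera):
    """3D distance in mm between the snout and caudal fin base key points."""
    a = back_project(p1, d1_mm, **camera)
    b = back_project(p2, d2_mm, **camera)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
```

When both key points have the same depth, the 3D distance reduces to the parallel-fish result; when the depths differ, the depth difference adds to the length that the pixel distance alone would suggest.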

Fitting Results
In this study, a mass estimation model for tilapia was developed based on the body length of the tilapia. Five regression models were established: a linear regression model, an exponential regression model, a logarithmic regression model, a power function regression model, and a quadratic term regression model. We manually measured the length and mass of 50 tilapia and fitted the models to these data. The results for the different mass groups are shown in Table 5, where $M$ denotes the mass of the fish, $x$ denotes the actual body length of the tilapia derived from Equation (10), and $R^2$ denotes the coefficient of determination of the model. The fitting results of the five models show that fish length and mass were highly correlated: the coefficient of determination was greater than 0.78 throughout. The exponential, quadratic term, and power models had the highest coefficients of determination, 0.99, for the M1 mass group, but the coefficients of determination of all three models decreased as the fish grew. In contrast, the logarithmic model's coefficient of determination increased with fish mass. Overall, the quadratic term model had the highest coefficient of determination in each mass group and was superior to the other models. Therefore, the quadratic term model was designated as the best-fitting model. The results of the quadratic term fitting for the three mass groups are shown in Figure 13.
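As an illustration of the fitting step, the sketch below fits the quadratic term model M = a·x² + b·x + c by ordinary least squares and computes the coefficient of determination. It assumes the models were fitted by least squares, which the text does not state; the function names are hypothetical, and no measured data from the study are reproduced.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of M = a*x^2 + b*x + c,
    obtained by solving the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]          # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    aug = [[s[4], s[3], s[2], t[2]],
           [s[3], s[2], s[1], t[1]],
           [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


def predict_quadratic(coeffs, x):
    a, b, c = coeffs
    return a * x * x + b * x + c


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Fitting each mass group separately, as in Table 5, amounts to calling `fit_quadratic` once per group with that group's length and mass lists. For body lengths expressed in large units, centring or rescaling x before forming the normal equations would improve numerical conditioning.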

Tilapia Mass Estimation
To further validate the stability of the model, the measured body lengths of the tilapia were input directly into the models. The different mass groups can be interpreted as tilapia at different growth stages. The linear, power function, and quadratic term models were selected for comparison. The estimated values of the corresponding mass groups for the three models are shown in Table 6. The evaluation metrics were root mean square error (RMSE), mean absolute error (MAE), and mean relative error (MRE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{11}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{12}$$

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{y_i} \times 100\% \tag{13}$$

In the above equations, $n$ is the number of samples, $y_i$ denotes the true value, and $\hat{y}_i$ denotes the estimated value. Across the three tilapia mass estimation models, the RMSE ranged from 6.62 to 60.99 g; the lowest value came from the quadratic term model in M1 and the highest from the linear model in M3. The MAE ranged from 4.97 to 54.90 g, again with the lowest value from the quadratic term model in M1 and the highest from the linear model in M3. The linear model gave the poorest estimates, with $R^2$ ranging from 0.89 to 0.90; excluding the group with the smallest mass, its MRE ranged from 5.74% to 9.40%. The power function model performed better than the linear model in the small-mass group, with $R^2$ of 0.87 to 0.99 and MRE of 6.06% to 8.90%. The best predictions came from the quadratic term model, with $R^2$ of 0.91 to 0.99 and the lowest MRE, 5.34% to 7.25%.
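The three error metrics can be sketched directly from their definitions. The function names are hypothetical, and the MRE is returned as a percentage to match how it is reported in the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the measurements."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the measurements."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mre(y_true, y_pred):
    """Mean relative error as a percentage of the true values."""
    n = len(y_true)
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / n
```

Because the MRE divides each error by the true mass, the same absolute error weighs more heavily for small fish, which is worth keeping in mind when comparing the smallest mass group with the larger ones.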
Finally, the mass of the 10 tilapia whose images had been acquired was estimated. After image enhancement, stereo matching, body length estimation, and mass fitting, we calculated the mass of each tilapia. The comparison between the manual measurements and the final estimated masses is shown in Figure 14. The $R^2$ was 0.99, and the MRE was 5.90%. No significant difference was found between the manually measured mass and the estimated mass, so the test showed that the method can be used for the mass estimation of tilapia. Although the results are affected by the water environment and differ somewhat between experiments, we compared our method with some existing methods to show its advantages. The methods compared are the area- and area-squared-based mass estimation model [13] and the deep learning method using the YOLOv4 DLN [6]. As can be seen in Table 7, the MRE of our method was 5.90%, an improvement of 1.54% over the second-best method. The coefficient of determination of our method was 0.99, an improvement of 0.05 over the YOLOv4 DLN deep learning method. Thus, our method performed better than the other two.

Discussion
In some previous studies that used fish body length to estimate fish mass, the fish contour had to be marked manually. Such marking is not only less accurate but also time-consuming and laborious [23,24], and the free swimming of the fish in the water makes it more difficult. Other approaches estimate fish mass from combinations of features [25] or from fish body area [26]. In this study, to improve the accuracy of labeling the mouth and tail end of the fish, the image is segmented first: the body of the fish is accurately segmented, and the segmented image is then labeled. Underwater image enhancement is performed before stereo matching to further improve the matching accuracy. With the rapid development of deep learning, the SAM model released by Meta AI can perform semantic segmentation without the researcher providing any training samples, and the segmentation in this study was based directly on this platform. On the segmented fish body, the coordinates of the fish's muzzle and caudal fin base can be labeled more clearly, and the pixel coordinates can be combined with the depth information to estimate the body length of the fish more accurately.
The experimental results show that the mass of tilapia can be accurately estimated using the single characteristic of fish body length. However, the error may increase as the tilapia grows. This study found that, at the later stage of tilapia growth, body length increased more slowly than mass. At that point, body length alone can no longer accurately estimate the mass of tilapia, and other characteristic parameters such as body height and body width need to be added to further improve the accuracy of tilapia mass estimation.
Although this study created a dataset of tilapia images, the dataset was collected primarily under farm tank conditions. These conditions resulted in a homogeneous experimental context and a single fish species. In the future, to enhance the general applicability of the method, it could be tested in more diverse natural water environments, and images of a wider range of fish species under different background conditions could be collected to verify the applicability and generalizability of the system. In this paper, the key point coordinates of the fish's muzzle and caudal fin base are extracted by manual labeling, which introduces some human error into the calculation of the fish's body length. In subsequent research, a deep learning model could be trained on more complex datasets to detect and label the tip of the snout and the base of the caudal fin automatically, enabling fully automated fish mass estimation. Currently, the system relies on a laptop computer as the platform for image processing and computation, and the images are captured by a binocular camera. Given the relatively large size of the laptop, integrating the system components and the binocular camera into a smaller embedded device can be considered in the future. This would reduce the size and weight of the system and make it more suitable for practical application scenarios. For acquiring image depth information, more advanced devices such as the ZED depth camera could be used, which can avoid the complex calibration and ranging process.

Conclusions
In this paper, a non-contact tilapia mass estimation method based on underwater binocular vision is proposed. First, a platform for acquiring underwater tilapia images was built. The underwater tilapia images were acquired by a binocular vision system, the underwater camera was calibrated, and the distortion was corrected. The images were preprocessed using a Retinex-based image enhancement algorithm and then stereo-matched to obtain depth information. Image segmentation was performed on the acquired tilapia images, and the coordinates of key points were labeled, thus realizing the three-dimensional reconstruction of the underwater tilapia. An estimation model relating the body length and mass of tilapia was established, thus realizing non-contact mass estimation of tilapia. The MRE of the tilapia mass estimation models developed in this paper ranged from 5.34% to 12.14%. Experimental results show that the proposed method has high accuracy and stability. The $R^2$ between the estimated values of the mass estimation model and the manually measured values was 0.99, and the MRE was 5.90%, so the mass estimation of underwater tilapia can be carried out effectively. The tilapia mass estimation model established in this study can estimate the mass of tilapia without contact, which improves measurement efficiency, and it provides technical support for the rapid estimation of the body length and mass of free-swimming tilapia underwater.

Figure 2. Flowchart for mass estimation of tilapia based on underwater binocular vision.

Figure 3. Relative position of binocular camera to calibration board.

Figure 5. Comparison of reprojection error before and after processing: (a) histogram of reprojection error for first calibration; (b) histogram of reprojection error after removal of large errors.

Figure 12. Effect of fish distance and angle variation on pixel body length.

Figure 13. Quadratic term fit plots for different mass groups of tilapia.

Figure 14. Comparison between estimated and manually measured fish mass.

Table 2. PSNR results for different algorithms.

Table 3. SSIM results for different algorithms.

Table 5. Comparison of fittings of different models for tilapia mass.

Table 6. Error analysis of tilapia mass estimation.

Table 7. Comparison of results of different methods for estimating fish mass.