Intelligent Measurement of Morphological Characteristics of Fish Using Improved U-Net

Abstract: In smart mariculture, batch testing of breeding traits is a key issue in the breeding of improved fish varieties. The body length (BL), body width (BW) and body area (BA) of fish are important indicators, of great significance in breeding, feeding and classification. To accurately and intelligently obtain the morphological characteristic sizes of fish in actual scenes, data augmentation is first used to greatly expand the published fish dataset, thereby ensuring the robustness of the training model. Then, an improved U-net segmentation and measurement algorithm is proposed, which uses a 3 × 3 dilated convolution with a dilation rate of 2 and a 1 × 1 convolution to partially replace the 3 × 3 convolutions in the original U-net. This operation enlarges the receptive field of the replaced convolutions and achieves more accurate segmentation of large targets in the scene. Finally, a line fitting method based on least squares is proposed, which is combined with the body shape features of fish and can accurately measure the BL and BW of inclined fish. Experimental results show that the mean Intersection over Union (mIoU) is 97.6% and the average relative error of the area is 0.69%. Compared with the unimproved U-net, the average relative error of the area is reduced to about half. Moreover, with the improved U-net and the line fitting method, the average relative errors of the BL and BW of inclined fish decrease to 0.37% and 0.61%, respectively.


Introduction
Nowadays, artificial intelligence technology is widely used in traditional agricultural production [1,2]. In recent years, precision aquaculture based on artificial intelligence and image processing technology has developed rapidly [3,4]. The production mode of aquaculture has been transformed from an extensive model to an ecological, precise and intensive model. Accurate, automatic and intelligent aquaculture can greatly improve fishery productivity and resource utilization, and is conducive to the protection of the aquaculture ecological environment. Therefore, it is of great significance to accelerate the digitalization, precision and intelligence of fisheries [5,6].
In precision aquaculture, deep learning methods have been widely used. Wageeh, Y. et al. used image enhancement technology and the YOLO model to extract the number and behavior trajectories of fish through underwater cameras [7]. Hu, J. et al. used a YOLO-v3-Lite network with a novel backbone structure to recognize fish behavior [8]. Wu, H. et al. constructed a deep network with a YOLO architecture to detect the bounding boxes of fish and extract the edges of the fish from the bounding boxes; the authors then used the SGBM method to estimate the length and width of the fish [9]. Liu, S. et al. realized online underwater fish detection and tracking by using the YOLO-v3 detection algorithm and a parallel correlation filter [10]. These methods used YOLO-based schemes to detect fish.

The main contributions of this paper are as follows: 1. The operations of contrast transformation and rotation are used to simulate the actual shooting environment, and a large number of training samples are generated for training by appropriate translation and scaling transformations; 2. According to the characteristics of the experimental dataset, the U-net network structure is improved by using a 3 × 3 dilated convolution with a dilation rate of 2 and a 1 × 1 convolution to partially replace the 3 × 3 convolutions in the original network, so that the receptive field of the replaced convolutions is expanded to achieve a more accurate segmentation effect; 3. Combined with the characteristics of the fish body shape, a least squares line fitting method is adopted, which realizes accurate measurement of the BL and BW of inclined fish.
The remainder of this paper is organized as follows: Section 2 briefly introduces the data acquisition and the scheme proposed in this article. Section 3 gives a detailed introduction to the process of data augmentation, the improvement of the U-net network structure, and the least squares line fitting used to obtain the BL and BW. Section 4 describes the experiments in detail and analyzes their results. Section 5 summarizes this paper and discusses future directions. The main abbreviations and symbols used are listed in Table 1.

Data Acquisition
In order to achieve fast, accurate and stable acquisition of fish images, this paper uses a home-made image acquisition device [20]. As shown in Figure 2, the device consists of a standard measuring plate (bottom length 560 mm, width 400 mm) and a mechanical arm. The process of collecting a fish body image is as follows. First, the acquisition camera (OLYMPUS TG-4, f/2.0, focal length: 4 mm, with built-in lens distortion correction) is installed on the end-effector of the mechanical arm and connected to the computer via a data cable. Next, the position of the camera is set by adjusting the mechanical arm so that the field of view covers the bottom of the platform and the camera lens is parallel to the platform. Finally, we place the fish on the measuring plate, keep the camera directly over the fish body, quickly capture the image, and transmit the image data to the computer through the data cable.

Figure 2. Image acquisition device. "1" is the standard measuring plate, "2" is the fixing clamp, "3" is the knob, "4" is the mechanical arm, and "5" is the end-effector.

Proposed Scheme
The gray-filled modules in Figure 3 represent the main work of this paper. Figure 3 shows the flow chart of the proposed scheme for the segmentation and measurement of inclined fish features using U-net with an increased receptive field. First, the original image is acquired through the image acquisition device. Because the images are collected in an ideal environment, the contrast of the image does not change significantly and the fish body is placed nearly horizontally. In order to better simulate the actual environment, contrast transformation, rotation transformation, translation transformation and scaling transformation are performed on the training set, and contrast transformation and rotation transformation are performed on the test set. The main purpose of the translation and scaling transformations on the training set is to generate more training samples. Second, the expanded training set is input to the improved U-net network for training, and a trained model is obtained. Then, the test samples after image processing are input into the trained model to obtain accurately segmented binary images [21]. Next, outer contour acquisition and a least squares line fitting operation are performed to obtain the set of contour points of the fish and the fitted line [22][23][24]. Subsequently, the values of BL and BW are obtained by mathematical derivation. Finally, by comparing the obtained data with the actual morphological data and the binary images labeled on the test set, we can obtain the various indicators for evaluating the performance of the scheme.

Data Augmentation
In order to better simulate the actual processing environment and generate more images for the deep learning network to learn from, data augmentation [25][26][27] is adopted to address the issue of a small dataset. The contrast transformation [28] and the rotation transformation are used to simulate the change of light and the incompletely horizontal placement of fish in actual processing, respectively. In the meantime, the translation transformation and the scaling transformation are used to simulate positional differences in the image where the fish is located and individual differences among fish, respectively.
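As an illustration, the four transformations described below can be sketched in plain NumPy for grayscale images. This is a minimal sketch for exposition, not the augmentation code used in the paper; the function and parameter names (`factor`, `dx`, `dy`, `angle_deg`) are hypothetical.

```python
import numpy as np

def adjust_contrast(img, factor):
    # Scale pixel deviations from the mean brightness; factor in [0.5, 1.5]
    mean = img.mean()
    out = mean + factor * (img.astype(np.float64) - mean)
    return np.clip(np.rint(out), 0, 255).astype(img.dtype)

def translate(img, dx, dy):
    # Shift a 2-D image by (dx, dy) pixels; vacant area is padded
    # with adjacent (edge) values, as described in the text
    h, w = img.shape
    padded = np.pad(img, ((abs(dy), abs(dy)), (abs(dx), abs(dx))), mode="edge")
    y0, x0 = abs(dy) - dy, abs(dx) - dx
    return padded[y0:y0 + h, x0:x0 + w]

def rotate_nn(img, angle_deg):
    # Nearest-neighbour rotation about the image centre; out-of-range
    # source coordinates are clamped, i.e. filled with adjacent values
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    xs = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    ys = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    xs = np.clip(np.rint(xs).astype(int), 0, w - 1)
    ys = np.clip(np.rint(ys).astype(int), 0, h - 1)
    return img[ys, xs]

def zoom_nn(img, factor):
    # Scale about the centre at fixed resolution: an enlarged image is
    # cropped, a reduced one is padded with neighbouring (edge) values
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    ys = np.clip(np.rint(cy + (yy - cy) / factor).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + (xx - cx) / factor).astype(int), 0, w - 1)
    return img[ys, xs]
```

In the experiment the parameters are drawn randomly per image from the ranges given in the following subsections.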

Contrast Transformation
Contrast transformation is an image processing method that changes the contrast of image pixels by changing the brightness value of image pixels, thereby improving the image quality. In the experiment, the contrast transformation is used to simulate the light transformation in the actual environment. Figure 4 shows the effect of contrast transformation. The value of the contrast transformation used in the experiment is set randomly in the interval of [0.5, 1.5].

Rotation Transformation
Rotation transformation rotates the image and fills the vacant part after rotation with adjacent values. The rotation transformation of the fish is used to simulate the phenomenon that the fish is not level in the actual processing environment. Figure 5 shows the effect of rotation transformation.

Translation Transformation
Translation transformation translates the image horizontally and vertically while the resolution of the image remains unchanged; the vacant part after translation is padded with adjacent values. The translation transformation is used to simulate differences in the position of the fish in the image and to produce a large number of training images. In order to better display the translation effect, a slightly larger translation ratio is selected for presentation, and the transformation effect is shown in Figure 6. In the experiment, the ratio of the horizontal or vertical translation length to the picture length is within [0, 0.02].

Scaling Transformation
Scaling transformation randomly scales the length and width of the image, but the resolution of the image does not change. The enlarged image is intercepted and the reduced image is filled with neighboring values. The scaling transformation can be used to generate a large number of samples of different fish sizes for the network to learn from. In order to better display the scaling effect, a slightly larger scaling ratio is used for presentation. Figure 7 shows the effect of scaling transformation. In the experiment, the ratio of scaling transformation to the image is within [0, 0.01].


Improved U-Net Network Structure
U-net [29] is a classic network for segmentation tasks [30], which adopts the encoder-decoder structure and channel-dimension concatenation to integrate multi-scale features [31]. The network is widely used due to its advantages: it supports training a model with a small amount of data, and it has a simple structure, high segmentation accuracy and fast segmentation speed. According to the characteristics of the experimental dataset, in which the fish occupies a large proportion of the image, the U-net network structure is improved to provide a larger receptive field for part of the convolutions. The improved U-net uses a 3 × 3 dilated convolution [32][33][34] with a dilation rate of 2 and a 1 × 1 convolution to partially replace the 3 × 3 convolutions in the original network. Only a partial replacement is made, to avoid too many dilated convolutions, which would lead to the gridding effect [35].
The improved U-net network is shown in Figure 8, in which the red virtual box is the main part of the improvement. The role of dilated convolution in the improved U-net structure is to expand the receptive field [36]. The schematic diagram of its work is shown in Figure 9b. In Figure 9, the stride of convolution is 1 and no padding operation is performed. It can be seen from Figure 9 that by using dilated convolution with a 3 × 3 kernel and a dilation rate of 2, the receptive field of each convolution is amplified from 3 × 3 to 5 × 5. In this way, each convolution output contains a large range of information of the original feature map, and appropriately compensates for some feature loss caused by the pooling operation in the U-net network [37].
For the dilated convolution, the calculation formula of the receptive field is k' = k + (k − 1)(r − 1), where k is the kernel size and r is the dilation rate, and the calculation formula of the output feature size is o = (i + 2p − k')/s + 1, where i is the input size, p is the padding and s is the stride. For a 3 × 3 kernel with a dilation rate of 2, k' = 3 + (3 − 1)(2 − 1) = 5, which corresponds to the 5 × 5 receptive field shown in Figure 9.
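A small NumPy sketch can be used to check these relationships (illustrative only; `dilated_conv2d` is a naive reference implementation for exposition, not the network code):

```python
import numpy as np

def effective_kernel(k, r):
    # Receptive field of a k x k convolution with dilation rate r
    return k + (k - 1) * (r - 1)

def output_size(i, k, r, s=1, p=0):
    # Output feature size for input size i, stride s, padding p
    return (i + 2 * p - effective_kernel(k, r)) // s + 1

def dilated_conv2d(x, w, rate=2):
    # 'valid' 2-D cross-correlation with kernel taps spaced `rate` pixels apart
    k = w.shape[0]
    ke = effective_kernel(k, rate)          # effective kernel size
    H, W = x.shape
    out = np.zeros((H - ke + 1, W - ke + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + ke:rate, j:j + ke:rate] * w).sum()
    return out
```

With a 3 × 3 kernel and rate 2, a 5 × 5 input yields a single output that aggregates nine taps spread over the full 5 × 5 window, matching the enlarged receptive field described above.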
Figure 10a shows the working diagram of a 1 × 1 convolution when both the input and output channels are 1, where the value in the filter is the weight to be learned. Figure 10b shows the working diagram when there are N input channels and one output channel. Therefore, the 1 × 1 convolution in the improved U-net network is equivalent to adding an auto-learnable coefficient to each dilated convolution output and then combining the multi-dimensional information [38]. This operation has the effect of enhancing cross-channel information exchange and non-linearity, thereby achieving a certain degree of adaptive optimization and adjustment of the features collected after expanding the receptive field [39,40].
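The channel-combining role of the 1 × 1 convolution can be illustrated in NumPy for a single output channel (a sketch for exposition; `conv1x1` is a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def conv1x1(x, w):
    # x: (H, W, N) feature maps, w: (N,) learnable channel weights.
    # A 1 x 1 convolution is a per-pixel weighted sum across channels,
    # i.e. exactly the "auto-learnable coefficient" combination above.
    return np.tensordot(x, w, axes=([2], [0]))
```

With w = (1, 0, ..., 0) it selects the first channel; with general learned weights it mixes the N dilated-convolution outputs into one map.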
Figures 11 and 12 show the measurement range standards for the corresponding characteristics according to the actual requirements [41]. In Figure 11, BL and BW represent the body length and body width of the fish, respectively. In Figure 12, the area surrounded by green dots and lines is the required fish area.

Line Fitting Scheme
In Figures 11 and 12, the fish bodies are all in a horizontal state. When the fish body is no longer close to the horizontal state, accurately judging the angle of the fish inclination becomes a challenge. By combining the characteristics of the fish body shape with the precise binary segmentation image obtained by the improved U-net, the idea of microscopicizing the fish body into multiple pixels is proposed. The least squares method [42,43] is used to fit the point set with a straight line, and the angle of the straight line can be regarded as the angle of the fish. At the same time, this method also conforms to the logic by which humans judge the tilt direction of a fish. Figure 13 shows a working diagram of the body length and width of the inclined fish calculated by combining the outer contour detection of the fish and line fitting. All the intersecting lines in Figure 13 are perpendicular. First, the contour point set of the segmented binary image is obtained by using outer contour detection, and the linear equation of MN (y = kx + b) is obtained by using line fitting. Second, the contour point set
is divided into two parts by the straight line MN. For each point set, the shortest distance d from each point to line MN is calculated [44], and the longest of all the distances (AE, BF) and the points that reach the longest distance (point A, point B) are selected. The sum of the lengths of the line segments AE (l1) and BF (l2) is the BW of the fish: BW = l1 + l2.

Then, since the line AE is perpendicular to MN, the slope of the line AE is obtained according to the slope of the line MN. Based on the known coordinates of point A, the equation of the straight line AE can be obtained. Next, taking the same operation as above, the straight line AE divides the contour point set into two parts. The coordinates of the points C and D and the lengths of the line segments CH (l3) and DG (l4) are obtained, respectively. The sum of the lengths of the line segments CH and DG is the BL of the fish: BL = l3 + l4.
Finally, four straight lines are drawn according to the known points A, B, C, D and the corresponding slope of the tangent line for a more intuitive display. The pseudo code description of the line fitting scheme is shown in Algorithm 1.
Algorithm 1: Line fitting scheme.
Step 1: Obtain the body width. Fit the line MN to the contour point set; for each contour point, calculate its distance to MN; on each side of MN, find the longest distance (l1 for segment AE, l2 for segment BF) and the corresponding points A and B; then BW = l1 + l2.
Step 2: Obtain the body length. Through point A, draw AE perpendicular to MN with the foot of the perpendicular at point E; the line AE divides the contour point set into two parts; in the same way, find the longest distances (l3 for segment CH, l4 for segment DG) and the corresponding points C and D; then BL = l3 + l4.
Step 3: Obtain the tilt angle. According to point A, point B, point C, point D and the related slopes, the four tangent lines corresponding to the inclined fish can be obtained. The fish tilt angle is α = arctan k.
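Steps 1 and 3 can be sketched as follows, assuming the outer contour is given as an (N, 2) array of (x, y) points. This is an illustrative sketch of the least squares fitting and width measurement, not the paper's exact code; Step 2 repeats the same width computation along the perpendicular line through point A.

```python
import numpy as np

def fit_line(points):
    # Least squares fit of y = k*x + b to the contour points (the line MN)
    k, b = np.polyfit(points[:, 0], points[:, 1], 1)
    return k, b

def body_width(points, k, b):
    # Signed perpendicular distance of each contour point to MN:
    # positive on one side of the line, negative on the other
    d = (k * points[:, 0] - points[:, 1] + b) / np.hypot(k, 1.0)
    l1, l2 = d.max(), -d.min()            # AE and BF in the paper's notation
    A, B = points[np.argmax(d)], points[np.argmin(d)]
    return l1 + l2, A, B                  # BW = l1 + l2

def tilt_angle(k):
    # Fish tilt angle (radians) from the slope of the fitted line
    return np.arctan(k)
```

For the body length, the same distance computation is applied with the line through A of slope −1/k (perpendicular to MN), yielding l3, l4 and BL = l3 + l4.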

Experimental Environment and Parameter Settings
The experimental environment is the Ubuntu 18.04.1 operating system, a Tesla V100 GPU, the Keras platform and Python 3. To ensure the reliability of the experiment and the adequacy of the network training, we set the batch size to 2, the learning rate to 5 × 10^-6, the number of epochs to 50, and the number of iterations per epoch to 600. For the labels of the dataset, the labelme software [45] is used to obtain the mask images in the experiment. Then each mask image is converted into a binary image in uint8 format to be used as a labeled image for training. For the dataset and code used in the experiment, please see the link in Supplementary Materials.

Evaluation Indicators
In the experiment, to evaluate the effect of network segmentation, both a thorough analysis of the network performance and the needs of actual production are considered. mIoU [46], average accuracy rate, average recall rate [47], and average relative error of the area are used as evaluation indicators. To evaluate the measurement of the BL and BW of fish, the average relative error is used as the evaluation indicator.
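For binary fish/background masks, the per-image indicators can be computed as follows (a minimal sketch; the paper reports these values averaged over all test images):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    # pred, gt: boolean (H, W) masks of the fish region
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union                     # per-image foreground IoU
    precision = inter / pred.sum()          # accuracy rate
    recall = inter / gt.sum()               # recall rate
    # relative error between predicted and true fish areas (pixel counts)
    area_err = abs(int(pred.sum()) - int(gt.sum())) / gt.sum()
    return iou, precision, recall, area_err
```

Averaging each indicator over the 50 test images gives the mIoU, average accuracy rate, average recall rate and average relative error of the area used below.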

Improved U-Net Performance Verification
In order to initially verify the performance of the improved U-net network structure, no rotation transformation is performed on the original dataset. The line chart of the IoU, accuracy rate, recall rate and area relative error of the 50 test images for the two networks is shown in Figure 14. The mIoU, average accuracy rate, average recall rate and average relative error of the area calculated from these results are listed in Table 2. It can be seen from the data that the improved network outperforms U-net, and the average relative error of the area is reduced to about half.

Feature Measurement for Tilted Fish
To simulate the actual production environment, where the fish may be tilted when placed on a conveyor belt or on a fish measuring plate, the training and test sets are randomly rotated within an angle range of [−45°, 45°]. To further verify the applicability of the improved U-net network, two models are generated using the original and improved network training, and 50 test images are tested. The experimental results of segmentation by using U-net and improved U-net are shown in Figure 15, where the detected area is cropped and some obvious segmentation differences are marked with blue frames for better display.

From Figure 15, the improved U-net has a better segmentation effect on the edge of the fish and can accurately segment the required inclined fish BA. The line chart of the IoU, accuracy rate, recall rate and area relative error of the 50 test images for the two networks is shown in Figure 16. As shown in Table 3, the results of mIoU, average accuracy rate, average recall rate and average relative error of the area are calculated. According to the data in Table 3, the improved U-net is still better than U-net, with an mIoU as high as 97.6%. Compared with U-net, the average relative error of the area is again reduced to about half. To measure fish BL and BW, the line fitting scheme in this paper is also compared in the experiment with the commonly used circumscribed rectangle [48] and smallest circumscribed rectangle [49] methods. In order to ensure the accuracy of the experiment, the true length and width of the test fish are expressed by the number of pixels occupied by the BL and BW in the picture, and the average of three manual measurements is used as the standard value. The test result images are shown in Figure 17. As shown in Figure 17, the yellow line represents the result of the line fitting scheme, the red box represents the result of the smallest circumscribed rectangle scheme, and the green box represents the result of the circumscribed rectangle scheme. From Figure 17c,d, when the fish is close to horizontal, the smallest circumscribed rectangle is still inclined at a certain angle, which leads to a large error.
To observe the performance differences between the measurement schemes more intuitively, line charts are drawn with the relative error as the indicator. From Figures 18-20, the line fitting scheme proposed in this paper is superior to the circumscribed rectangle and the smallest circumscribed rectangle schemes, and the improved U-net network is better than the original network. It can be seen from Table 4 that the comprehensive scheme of improved U-net and line fitting can accurately measure the BL and BW of inclined fish: the average relative errors of BL and BW are 0.37% and 0.61%, respectively. Because the line fitting method is used, the specific tilt angle of the fish body can be obtained from the slope of the fitted straight line. The angle parameter is also of great significance for object grabbing [50].

Conclusions and Future Work
The accurately measured morphological characteristic data can serve as an important reference for feeding, fishing, classification and genetic breeding in aquaculture research. This paper proposes an accurate method for measuring the actual size of the length, width and area of the fish body, which is especially effective for measuring the characteristics of a fish body in a tilted state. The proposed method mainly includes a dataset expansion module, a segmentation module using the improved U-net model, and a least squares line fitting module, which together achieve the segmentation of a tilted fish body in the images and accurate measurement of its characteristics. The experimental comparison of various metrics shows that the performance of the measurement system is indeed improved. Specifically, the mIoU of the improved U-net model can reach 97.6%, the average relative error of the fish BA can be reduced to 0.69%, and the average relative errors of the BL and BW can be reduced to 0.37% and 0.61%, respectively. In conclusion, the proposed method can achieve accurate measurement of fish body morphological characteristics in practical applications. In addition, the inclination angle of the fish body obtained in the calculation process can be used as a useful parameter for realizing the automatic capture of fish.
The research of this paper divides the fish body segmentation, the pixel size calculation, and the conversion from pixel size to actual size into three different steps. In our future research, we will consider combining segmentation, pixel size prediction and actual size conversion into one network model in one step. This can greatly improve the efficiency of the measurement process.