Article

Integrated Laser Imaging for Fusiform Fish Measurement in Aquaculture

1 Key Laboratory of Fisheries Remote Sensing, Ministry of Agriculture and Rural Affairs, East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shanghai 200090, China
2 School of Ocean Engineering and Technology, Sun Yat-sen University, Zhuhai 519082, China
* Author to whom correspondence should be addressed.
Fishes 2026, 11(5), 298; https://doi.org/10.3390/fishes11050298
Submission received: 12 March 2026 / Revised: 15 May 2026 / Accepted: 15 May 2026 / Published: 18 May 2026
(This article belongs to the Special Issue Computer Vision Applications for Fisheries and Aquaculture)

Abstract

This paper details the implementation of an integrated engineering framework for the real-time assessment of pose and size in fusiform fish, utilizing laser-camera technology. The design, comprising a camera and laser emitter, leverages laser triangulation for accurately measuring distances between key points, providing a reliable baseline for data comparison. Enhanced with the YOLOv7 model backbone, it includes detection and segmentation features, enabling precise image instance segmentation of fish and laser lines. The system’s dual-network structure, which combines fully connected regression and DSNT-MobileFaceNet networks, efficiently identifies six crucial landmarks on fish, an essential step for detailed pose analysis. This method facilitates the accurate determination of two-dimensional fish posture by analyzing the relative positions of these landmarks. A notable capability of this system is its ability to infer depth information from laser lines on the fish’s body, aiding in the accurate measurement of dimensions such as body length and depth. Empirical results demonstrate the system’s effectiveness, with high mean Average Precision (mAP) values for both object detection (0.9560 for fish, 0.8550 for laser lines) and segmentation (0.9740 for fish, 0.8420 for laser lines). The DSNT-MobileFaceNet network, in particular, shows excellent fitting accuracy with an R² value of 0.9170. The deep learning model achieves an average measurement error of 7.75%, markedly improving upon the baseline error of 14.70%. Overall, this study confirms the proposed system’s capability in accurately assessing fish pose and size. As a rigorous proof of concept validated in a controlled laboratory environment, this work establishes a foundational framework for non-invasive morphological monitoring, suggesting its future applicability in marine biology and aquaculture.
Key Contribution: This study develops an integrated non-invasive laser-camera system that combines instance segmentation with a DSNT-based landmark detection network, achieving a significant reduction in measurement error to 7.75% for real-time 3D posture and morphological assessment of fusiform fish in aquaculture.

1. Introduction

1.1. Background and Related Works

The study of fish measurement is dedicated to scientifically measuring the morphology and anatomy of fish, with the goal of comprehending their morphological characteristics and adaptation to the environment. Fish measurement encompasses multiple aspects, such as length, weight, gills, muscles, and organs. Among these aspects, length and weight represent the oldest and most extensively employed methods in fish measurement research. The data obtained from these measurements serve as fundamental information for investigating fish ecology, behavior, taxonomy, and evolutionary biology.
Fish farming serves as a representative case in fish measurement, where efficient health assessments are essential. However, the current industry heavily relies on manual measurements, which are time-consuming, labor-intensive, and potentially harmful to the fish. The emergence of computer vision-based methods, propelled by advancements in image processing technology, holds great promise for fish measurement. While the utilization of computer vision for fish measurement is not a recent development, notable progress has been made in recent years. As early as 1993, N. Strachan [1] attempted to measure the length of gutted haddock using computer vision techniques, albeit under stringent conditions. The approach involved placing gutted haddock in a designated container on a standard conveyor belt and capturing 512 × 512 resolution images with specific lighting for subsequent analysis. By estimating pixel distances and referencing the size of an object in pixels, precise fish length calculations were achievable given the fixed distance between the fish and the camera. The primary focus of this research approach was accurate target identification in images. In more recent studies, the utilization of deep learning techniques, such as Mask R-CNN [2], has significantly enhanced the measurement performance of computer vision-based methods. C. Yu et al. [3] employed designated equipment to capture images, trained a model using Mask R-CNN, and applied the model to segment fish images and extract morphological features. In controlled backgrounds, the average error for all measured parameters was below 2.8%, with length and width demonstrating average errors below 0.8%. In complex backgrounds, the average errors for all parameters were less than 3%, with length and width exhibiting average errors below 1.8%. The notable accuracy achieved in this approach can be attributed to the adoption of high-performance object segmentation models. 
However, while this method demonstrates high precision in fish measurement, its suitability for health monitoring in fish farming is limited. In aquaculture settings, fish are in constant motion, resulting in variable distances between the fish and the camera equipment. As the distance changes, the representation of a unit pixel in the imaging plane should correspondingly vary. In the absence of depth information, this measurement approach becomes ineffective.
To extract depth information from images, computer vision employs two primary methods: binocular imaging [4,5] and laser-camera combinations [6]. Binocular imaging systems simulate the human visual system’s observation of a scene by utilizing two cameras. By analyzing the disparity between the images captured by each camera, the depth of objects in the scene can be calculated. This approach, with its cost-effectiveness and high resolution, finds extensive applications in extracting three-dimensional spatial distribution information [7,8]. However, it has two major limitations. Firstly, it relies on passive object textures to provide reliable disparity information for depth calculation. Secondly, because it depends on passive texture matching, it is highly sensitive to changes in ambient lighting conditions and water turbidity. When lighting is poor or water is cloudy, surface textures are obscured and stereo depth calculations can fail. Recent advancements in underwater stereo vision utilize keypoint-based and post-detection fusion approaches to significantly improve measurement robustness [9,10,11]. However, passive stereo systems still fundamentally rely on identifying corresponding features on the object’s surface, so reliable feature matching remains challenging when natural textures and keypoints are obscured. In contrast, an active laser system provides a high-contrast reference line that is independent of the natural texture of the fish, which supports reliable depth extraction even when ambient conditions degrade. The second method, the laser-camera combination, is based on laser triangulation: a camera records the position of the laser in the image, and depth is calculated by triangulation.
This method offers advantages such as high accuracy and independence from texture constraints, making it widely used for measuring crop biomass density [12] and conducting three-dimensional scanning in variable field settings [13]. Although the depth resolution of the laser-camera combination is lower than that of binocular imaging systems, data from the experimental tank are affected by lighting, impurities, and water turbidity, which introduce high instability and obscure texture information. Consequently, the laser-camera combination demonstrates greater utility in fish farming applications.
Research on calculating fish morphological data using depth information is relatively limited. Among these studies, the majority have focused on binocular imaging systems. For instance, R. Cheng et al. [14] estimated the length of underwater fish using a binocular imaging system, while H. Liu et al. [15] combined a binocular imaging system with the YOLO [16] V5 network to estimate length and width. Recent studies focusing on large species such as tuna have demonstrated the necessity of robust methods to extract fork length and snout-fork length in unconstrained open-water environments [17,18]. Within the context of laser-assisted measurement, recent foundational work has demonstrated the high viability of combining deep learning with active line lasers. Notably, a 2024 study successfully utilized a YOLOv8 model and a line laser system to measure the bodies of underwater fish in inclined positions [19]. Building upon these excellent advancements, we employ a laser-camera system to extract depth information in this study. We utilize an effective integration of object segmentation and landmark detection algorithms to extract fish body postures and calculate morphological data. The objective of this paper is to establish a foundational framework for non-invasive morphological extraction. Considering the complexity of aquatic morphology, which encompasses different shapes such as Fusiform, Boxy, and Globiform, our system is specifically validated for Fusiform fish as a controlled laboratory proof of concept. Hence, the term ‘fish’ used hereafter refers strictly to Fusiform fish.

1.2. Main Contributions

  • Six-Landmark Definition Scheme. We propose a six-landmark definition scheme specifically designed for fusiform fish. While traditional geometric or skeleton-based modeling provides dense representations, they are computationally expensive and susceptible to underwater noise. Our scheme represents an effective integration of biological necessity and computational efficiency, targeting only the essential anatomical points required for morphometric calculations. To detect these landmarks, we implement a landmark detection network using MobileFaceNet [20] and the DSNT (Differentiable Spatial-to-Numerical Transform) module.
  • Laser Triangulation Method for Fish Measurement. A key contribution of this study is the application of laser triangulation for measuring the posture, body length, fork length, and body depth of fusiform fish. Unlike traditional methods that rely on manual optical calibration and are easily disrupted by water turbidity, our approach effectively integrates laser triangulation with deep-learning instance segmentation to automatically extract 3D morphometric data from 2D pixel coordinates, offering a robust advancement for continuous monitoring. This approach not only improves the accuracy of measurements but also enables the real-time analysis of fish morphology.

2. Materials and Methods

2.1. Materials

2.1.1. Data Collection

The data used in this study were obtained through experimentation, and the processed data are openly available for disciplinary research. Although cameras conveniently capture the overall appearance of an object from a specific angle, they lack depth information. Binocular cameras and laser triangulation have been commonly used to overcome this limitation. In this study, we used laser triangulation in combination with cameras to obtain complete information, including depth. The experimental hardware employed for data collection is illustrated in Figure 1. Both traditional laser triangulation and the proposed method, which combines laser triangulation with computer vision, require only a single camera and a laser emitter. The use of two cameras in the data acquisition device shown in Figure 1 serves two purposes: it doubles the amount of data collected in the same amount of time, and the different angles and positions of the two cameras increase the data coverage.
Notably, Widder et al. [21] demonstrated that red light is significantly less disruptive to fish than white light when observing fish behavior. In the experiments conducted for this paper, which involved edible fish species, the exposure to intense red light was brief and had no significant observable impact on the fish. This is important because it aligns with ethical research practices and safeguards the welfare of the fish used in the study. By utilizing a light source that minimizes disturbance, the study contributes to humane and responsible research methodologies in the field of marine biology.
Eight individual fish representing four edible species (blackfish, catfish, crucian carp, and sea bass) were procured from a local market and reared in a 2 m diameter circular experimental tank. To monitor the fish’s survival and collect data, the data acquisition system was placed inside the experimental tank, as shown in Figure 1. Morphological measurements of these eight fish were recorded manually and used as reference values for the subsequent experiments in this study, as presented in Table 1.

2.1.2. Data Processing

The collected data are used for fish measurement and body condition monitoring with two approaches: traditional laser triangulation and computer vision methods based on laser triangulation principles. This includes target extraction of fish and laser lines, as well as landmark extraction of fish. Hence, the quality of the data significantly impacts the entire experiment. Because a wide-angle camera is used to capture the entire scene inside the experimental tank, the collected images exhibit a certain degree of wide-angle distortion. After a sufficient number of representative images have been collected, the wide-angle images must therefore be corrected first. Once the correction is completed, the object detection and segmentation dataset and the landmark extraction dataset can be created.
To correct the distorted images, we employed the chessboard calibration method [22] in this study. In laboratory conditions, a wide-angle camera was used to capture the distorted images, and we computed the internal parameter matrix and distortion coefficient of the camera by imaging the calibration board. We used the acquired data to correct all images, as shown in Figure 2.
Two datasets were created using the corrected images: an object detection segmentation dataset and a fish landmarks extraction dataset. To mark the object segmentation dataset, we used the Roboflow tool with the polygonal annotation method. This method extracts the minimum rectangular box containing the left and right edge points for the object segmentation dataset, which can be used in object detection networks such as YOLO. We manually completed the annotation of fish and laser lines in a representative subset of 298 images, labeling all fish uniformly as “fish” and laser lines as “light”. To improve the model’s final generalization ability from the data perspective, we augmented the annotated images using Shear, Grayscale, Hue, Brightness, etc. The Shear range was ±15° horizontally and ±15° vertically, and the Grayscale was applied to 25% of the images. The Hue range was between −41° and +41°, and the Brightness range was between −30% and +30%. We divided the enhanced dataset into training, validation, and testing sets. Table 2 shows the distribution of images and labels in each dataset.
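The polygon-to-box step described above can be illustrated with a minimal sketch. The function name, the (x, y, w, h) output convention, and the sample polygon are illustrative assumptions and are not taken from the annotation tool's export format.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x, y, w, h) of the smallest axis-aligned box enclosing the polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


# Hypothetical polygon outlining one fish, in pixel coordinates.
fish_polygon = [(120.0, 80.0), (310.0, 60.0), (420.0, 95.0), (300.0, 140.0)]
fish_box = polygon_to_box(fish_polygon)
```

A box obtained this way can serve both as a detection label and as the crop region for later landmark annotation.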
In the field of face detection, landmarks have been widely studied and applied in various areas, such as expression recognition [23]. However, research on fish landmarks is scarce, and no public datasets are yet available. Fish landmark detection has the potential to be applied in areas such as fish body posture monitoring and health monitoring. This study proposes a fish landmark annotation standard consisting of six key points, which are illustrated in Figure 3. Compared to dense skeleton extraction, which often fails under partial occlusion, this sparse point strategy ensures higher robustness in complex aquaculture environments.
The dataset creation process comprises three main steps. Initially, all fish classes’ minimum rectangular boxes are cropped out based on the annotations in the object detection and segmentation dataset. Subsequently, all fish landmarks are manually annotated according to the standard in Figure 3 using the open-source tool, imgLab. The resulting six landmarks are then arranged in a specific order. Finally, the XML format label file is converted to the TXT format label file, with each line representing a single image containing 17 elements separated by commas. The first element denotes the file name, and the next four elements indicate the starting coordinates (x, y) and width and height (w, h) of the rectangular box. The last 12 elements represent the coordinates of the six landmarks. To ensure the landmark data’s quality, we excluded images from the dataset where less than half of the fish body is visible. This is because these images have little meaning for landmarks. We then split the landmark dataset into training, validation, and test sets in an 8:1:1 ratio, similar to the object detection and segmentation dataset.
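To make the 17-element label layout concrete, the following sketch parses one line of such a TXT file. The interleaved ordering of the landmark coordinates (x1, y1, x2, y2, ...), the class and function names, and the sample line are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple


class FishLabel(NamedTuple):
    file_name: str
    box: Tuple[float, float, float, float]   # (x, y, w, h) of the cropped fish box
    landmarks: List[Tuple[float, float]]     # six (x, y) landmark coordinates


def parse_label_line(line: str) -> FishLabel:
    """Parse one comma-separated label line with exactly 17 fields."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 17:
        raise ValueError(f"expected 17 comma-separated fields, got {len(fields)}")
    numbers = [float(value) for value in fields[1:]]
    box = (numbers[0], numbers[1], numbers[2], numbers[3])
    coords = numbers[4:]
    # Assumed ordering: x1, y1, x2, y2, ..., x6, y6.
    landmarks = [(coords[i], coords[i + 1]) for i in range(0, 12, 2)]
    return FishLabel(fields[0], box, landmarks)


# Hypothetical label line: file name, box (x, y, w, h), then six landmarks.
sample = "fish_0001.jpg,10,20,200,80,12,60,50,30,90,25,150,40,190,55,205,62"
label = parse_label_line(sample)
```

Rejecting lines that do not have exactly 17 fields is a cheap way to catch conversion errors from the XML source before training.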
The dataset preparation involved two distinct scales to serve the dual-network architecture. The object detection dataset consists of 298 full-frame images. The landmark detection dataset comprises 631 cropped regions of interest. This numerical difference occurs because a single full-frame image can yield multiple valid fish crops. Additionally, some fish instances were suitable for bounding-box detection but were excluded from the landmark dataset due to severe posture deformation or occlusion of critical anatomical points. To minimize the risk of data leakage caused by temporal correlation in video frames, we avoided random shuffling of the entire image pool. The training, validation, and test sets were constructed by sampling images from temporally separated video segments. This ensured that nearly identical adjacent frames were not distributed across different sets. However, this dataset size is relatively small for accurate landmark detection. Moreover, upon manual observation, it was noted that some of the 631 images exhibited characteristics such as blurriness and unclear features. To address these limitations and enhance the model’s generalization capability, we applied various data augmentation techniques. These techniques included Random Scale, Random Translate, Random Shear, Random Mask, Random Blur, Random Brightness, Random Rotate, and Random Center Crop. Each augmentation method was executed with a probability of 0.5 and within specific parameter ranges, as depicted in Figure 4.
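One way to keep adjacent frames out of different sets is to assign whole video segments, rather than individual frames, to each split. The sketch below illustrates this idea under stated assumptions: the segment identifiers, the 30-frame segment length, and the 8:1:1 segment ratio are hypothetical and do not describe the exact sampling procedure used in this study.

```python
from typing import Dict, List, Sequence


def split_by_segment(segment_ids: Sequence[str],
                     ratios: Sequence[float] = (0.8, 0.1, 0.1)) -> Dict[str, List[str]]:
    """Assign whole video segments (never single frames) to train/val/test."""
    ordered = sorted(set(segment_ids))        # deterministic segment order
    n_train = int(round(len(ordered) * ratios[0]))
    n_val = int(round(len(ordered) * ratios[1]))
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }


# Hypothetical mapping: 300 frames grouped into segments of 30 consecutive frames.
frame_segments = {f"frame_{i:04d}.jpg": f"seg_{i // 30:02d}" for i in range(300)}
splits = split_by_segment(list(frame_segments.values()))

# Each frame inherits the split of its segment, so neighbours stay together.
frame_split = {frame: name
               for frame, seg in frame_segments.items()
               for name, segs in splits.items() if seg in segs}
```

Because a segment is the unit of assignment, two nearly identical neighbouring frames always land in the same set.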
The primary dataset is derived from only eight individual fish. While digital data augmentation techniques improve algorithmic robustness against noise and lighting variations, they cannot synthesize true biological diversity. Moreover, although we employed temporal separation to prevent adjacent-frame leakage, the limited subject pool means the test set evaluates the model on the same individual fish present in the training set. Therefore, this proof of concept currently demonstrates algorithmic feasibility rather than broad biological generalization. Future large-scale deployments must prioritize subject-independent validation protocols (e.g., leave-one-subject-out cross-validation) using a vastly expanded pool of distinct biological individuals.

2.2. Laser Triangulation for Fish Measurement in the Experimental Tank

Laser triangulation is a commonly used distance measurement method that takes advantage of the propagation characteristics of a laser beam in space. By measuring the laser signal reflected back from a target object, the distance between the target object and the measuring device can be calculated. This method is widely used in fields such as industry, military, and geological exploration. The principle is based on the geometry of a triangle. The measuring device emits a laser beam towards the target object; the beam strikes the surface of the target object and is reflected back to the measuring device. From the emission angle of the laser beam and the angle at which the reflected light is received, two internal angles and the length of one side of a triangle are known. The distance between the target object and the measuring device then follows from the geometry of the triangle.
As shown in Figure 5a, when the laser emitted by the Laser is directed at the target object Obj1, it forms a spot P on the surface of the object. The spot P is then reflected back to the camera and ultimately forms a point P’ on the Imaging Surface. When the object Obj1 moves to the position of Obj2, the imaging position of the spot Q also moves to Q’. Therefore, when the distance between the object and the laser changes from d1 to d2, the corresponding point on the imaging surface also experiences a displacement of ∆d. In fact, there is an inherent relationship between these two variables. As shown in Figure 5b, the laser emitted by the Laser device forms a spot P on the object Obj and is mapped to point P’ on the imaging surface of the camera (N). The angle between the laser ray and the vertical direction is α. On the imaging surface, point Q is located such that the line QN is parallel to the line AP. Therefore, we have △QNP’∽△APN by the corresponding sides and perpendicular lines of similar triangles, which yields q/s = f/x. Since x is composed of x1 and x2, and the focal length of the camera f is often known, x1 can be calculated from the focal length and angle α, and x2 should be determined based on the size of each pixel in the image and the position of P’ with respect to the center of the image. Therefore, the distance d between the laser and the object can be calculated using Equation (1):
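$$d = q \sin\alpha = \frac{f \times s \times \sin\alpha}{\dfrac{f}{\tan\alpha} + S_{\mathrm{pixel}} \times P_{\mathrm{center}}} = \frac{f \times s \times \sin^{2}\alpha}{f \times \cos\alpha + \sin\alpha \times S_{\mathrm{pixel}} \times P_{\mathrm{center}}} \tag{1}$$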
where S_pixel is the edge length of each square pixel on the imaging surface in the final generated image, in units of cm/pixel, and P_center is the pixel distance, in a given direction, from the image center to the point on the imaging surface onto which the target point is mapped, in units of pixels. f represents the focal length of the camera, and s represents the distance between the laser emitter and the camera lens (in the experiment designed in this paper, the laser emitter and the camera lens are kept on the same vertical line). q represents the distance from the target point P on the object to the straight line on which the laser emitter and the camera are located. For ease of calculation, f, s, and q are all expressed in cm. Equation (1) shows that the distance between the laser and a point on the object can be calculated from the five variables f, s, α, S_pixel, and P_center, of which f is a camera parameter that can be obtained from hardware documentation, while the other four can be obtained through measurement.
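As a numerical illustration of the similar-triangle relation q/s = f/x with x = x1 + x2, the sketch below computes q in two algebraically equivalent ways. It assumes x1 = f / tan α and x2 = S_pixel × P_center, which is our reading of the text and of the closed form for q given with Formula (3); all numeric values are hypothetical and chosen only to exercise the formula.

```python
import math


def q_from_similar_triangles(f_cm: float, s_cm: float, alpha_rad: float,
                             s_pixel_cm: float, p_center_px: float) -> float:
    """q = f * s / x, with x = x1 + x2 (assumed: x1 = f / tan(alpha))."""
    x1 = f_cm / math.tan(alpha_rad)      # offset caused by the laser angle
    x2 = s_pixel_cm * p_center_px        # offset of the imaged spot from the image center
    return f_cm * s_cm / (x1 + x2)


def q_closed_form(f_cm: float, s_cm: float, alpha_rad: float,
                  s_pixel_cm: float, p_center_px: float) -> float:
    """Same quantity after multiplying numerator and denominator by sin(alpha)."""
    sin_a, cos_a = math.sin(alpha_rad), math.cos(alpha_rad)
    return f_cm * s_cm * sin_a / (f_cm * cos_a + sin_a * s_pixel_cm * p_center_px)


# Hypothetical parameters: f = 0.4 cm, s = 20 cm, alpha = 60 degrees,
# S_pixel = 0.0003 cm/pixel, P_center = 200 pixels.
params = (0.4, 20.0, math.radians(60.0), 0.0003, 200.0)
q_a = q_from_similar_triangles(*params)
q_b = q_closed_form(*params)
```

Both functions return the same value up to floating-point rounding, which provides a quick consistency check when the formula is implemented.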
Laser triangulation is a mature traditional technology that has been extensively applied in many fields. However, its application in both aquaculture and offshore fisheries is rare. One major reason is that, compared to distance, the fisheries industry is more concerned with information such as body length, fork length, body depth, etc. For aquaculture, fish body shape and health condition are also valuable information. In order to better utilize laser triangulation in fish measurement, we have optimized it in two ways. The first optimization involves relocating the laser emitter from the traditional “I” shape to the center of the system’s rotation, thereby reducing the relative movement velocity of the laser emitter and the water. The second optimization is to extend laser triangulation from distance measurement to measuring the length attribute of objects. In Figure 5b, in addition to calculating the distance d from the laser emitter to the object Obj according to Formula (1), the distance between points P and N can also be obtained by the relationship between △APO and △APN, as shown in Formula (2):
$$|PN| = \sqrt{q^{2} + |ON|^{2}} = \sqrt{q^{2} + \left| s - \sqrt{d^{2} - q^{2}} \right|^{2}} \tag{2}$$
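Formula (2) can be sketched in a few lines, reading it as |PN| = sqrt(q² + |s − sqrt(d² − q²)|²). The function name and the sample values are hypothetical, and the guard on d and q is our own addition to keep the inner square root real.

```python
import math


def distance_pn(q_cm: float, d_cm: float, s_cm: float) -> float:
    """Distance |PN| from Formula (2), using |ON| = |s - sqrt(d^2 - q^2)|."""
    if d_cm < q_cm:
        raise ValueError("d must be at least q so that sqrt(d^2 - q^2) is real")
    on_cm = abs(s_cm - math.sqrt(d_cm ** 2 - q_cm ** 2))
    return math.hypot(q_cm, on_cm)


# Hypothetical values in cm: q = 3, d = 5, s = 10, so |ON| = |10 - 4| = 6.
pn = distance_pn(q_cm=3.0, d_cm=5.0, s_cm=10.0)
```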
The distance between points P and N plays a crucial role in measuring the size of fish. As shown in Figure 6a, the optimized linear laser is fixed on a stand and emits a laser beam onto a plane. Points M and N on the figure represent the head and tail of a fish, respectively. Assuming that the laser beam precisely strikes the fish between points M and N and reflects onto a camera, two different light spots will form on the camera’s imaging plane. By using the laser triangulation method and Formula (2), the lengths of AN and AM can be calculated. In the plane of △AMN, point D can be introduced such that the line AD is perpendicular to line NM. The relationship between point D and line segment MN affects the calculation of the length of segment NM (the length of the target fish). However, since the laser emitter and the camera are kept on the same vertical line, the midpoint of the laser beam and the camera’s line is perpendicular to the NM line. Therefore, point D is the midpoint of the laser beam. There are two possible cases for the relationship between point D and line segment MN: when point D is not on MN (Figure 6b) and when point D is on MN (Figure 6c). The calculation formula for the length of MN in these two cases is given by Formula (3):
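$$|MN| = \begin{cases} \sqrt{s_{d_1}^{2} - q^{2}} - \sqrt{s_{d_2}^{2} - q^{2}}, & \text{point } D \text{ not on } MN \\ \sqrt{s_{d_1}^{2} - q^{2}} + \sqrt{s_{d_2}^{2} - q^{2}}, & \text{point } D \text{ on } MN \end{cases} \tag{3}$$
where
$$q = \frac{f \, s \sin\alpha}{f \cos\alpha + \sin\alpha \times S_{\mathrm{pixel}} \times P_{\mathrm{center}}},$$
d1 and d2 represent the distance between the laser emitter and the points M and N, respectively, while the other variables correspond to those in Figure 5b.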
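The two cases of Formula (3) follow from the Pythagorean theorem applied to the right triangles ADM and ADN. The sketch below assumes that the two distances entering the formula are |AM| and |AN| and that q plays the role of the perpendicular distance |AD|; this interpretation, the function name, and the sample values are assumptions for illustration. The absolute value in the first case is our own addition so that the result does not depend on which endpoint is labeled M.

```python
import math


def segment_length_mn(am_cm: float, an_cm: float, q_cm: float,
                      d_on_segment: bool) -> float:
    """Length |MN| from the distances |AM|, |AN| and the perpendicular distance q."""
    dm = math.sqrt(am_cm ** 2 - q_cm ** 2)   # |DM|, foot of the perpendicular to M
    dn = math.sqrt(an_cm ** 2 - q_cm ** 2)   # |DN|, foot of the perpendicular to N
    # D between M and N: the two legs add; otherwise they subtract.
    return dm + dn if d_on_segment else abs(dm - dn)


# Hypothetical geometry in cm: A is 4 cm from the line through M and N.
length_inside = segment_length_mn(5.0, 5.0, 4.0, d_on_segment=True)
length_outside = segment_length_mn(5.0, math.sqrt(80.0), 4.0, d_on_segment=False)
```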
In our comparative experiments, the baseline traditional values were obtained by applying standard color thresholding to segment the red laser line. The extreme points of the fish were then manually defined in the two-dimensional plane to calculate dimensions via traditional triangulation formulas.

2.3. DL-Driven Framework for Laser-Based Morphometric Measurement

The study integrates deep learning algorithms to automate and refine the extraction of laser triangulation data. It is crucial to note that the deep learning algorithms do not replace the laser; rather, they replace traditional, error-prone manual image processing (e.g., thresholding) to robustly extract the fish landmarks and the laser line from turbid underwater images. The laser module remains the irreplaceable source for calculating absolute physical depth. This approach consists of four sub-steps. The first step, discussed in Section 2.2, involves optical calibration. In the laser-camera system constructed for this study, the position of the laser line in the captured images changes as the target object’s position relative to the camera changes, following a deterministic relationship. Therefore, optical calibration is essential to establish the relationship between the position of the laser line and the actual distance. The second step focuses on detecting the fish and the laser emitted by the laser emitter in the images. Well-established techniques in deep learning’s object detection field can accomplish this step. Additionally, image segmentation can be employed to enhance accuracy and refinement. In the third step, key points (landmarks) of the detected fish are extracted. These landmarks serve as reference points for measuring the fish’s morphological data, where the pixel distances between different landmarks represent the pixel-based manifestation of the morphological data. 
Finally, based on the outcomes of the previous three steps, an algorithm is developed to accurately calculate the fish’s morphological data.

2.3.1. Optical Calibration

The previous analysis illustrates the existence of a deterministic relationship in the laser-camera system between the distance from the object to the camera and the imaging position of the laser line. Theoretical analysis indicates that this relationship is linear. When the object remains at a fixed distance from the camera, the mapping of real-world measurements to pixel values in the captured image becomes well-defined. In this section, three variables, X, Y, and Z, are defined. X represents the distance from the camera to the calibration board (equivalent to the measured object) in centimeters (cm). Y represents the distance from the laser line to the top of the captured image (assuming the emitted laser line is horizontal) in pixels. Z represents the pixel length corresponding to a known physical length of 5 cm on the calibration board. Thus, the previous analysis can be summarized as a deterministic linear relationship between X and Y, as well as between X and Z. The calibration board used in this study is a custom-made steel board, depicted in Figure 7. The holes on either side of the calibration board are designed for underwater calibration. The board features alternating black and yellow lines, dividing it into 10 cm segments to mitigate challenges associated with underwater imaging. During the calibration process, the laser line is directed onto the horizontal line at the center of the calibration board to minimize errors caused by camera distortion.
The investigation first examined the relationship between X and Y. We systematically varied the distance X between the calibration board and the camera from 50 cm to 146 cm in 2 cm increments. For each of the two cameras, we measured the distance Y from the imaged laser line to the top of the image, resulting in 49 sets of calibration data. Theoretical analysis suggests that both the X-Y and X-Z relationships are linear. However, in real-world underwater optical environments, standard in-air calibration principles are fundamentally disrupted by the refractive index of water. The air–glass–water interface alters the apparent focal length and introduces systematic, non-linear refractive distortions, so the linear relationship assumed in basic triangulation is skewed. To compensate for these systematic refractive effects, we incorporated quadratic terms into the distance fitting process as an empirical approximation. While rigorous underwater optical calibration utilizing precise physical models (such as the PinAx model) is ideally required to intrinsically eliminate these distortions, our empirical quadratic correction is an effective means of maintaining accuracy within the constrained, near-field range of our experimental setup. We compared the outcomes of linear and quadratic polynomial fits and employed the coefficient of determination (R²) as the evaluation metric. The calibration results are presented in Figure 8.
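The comparison of linear and quadratic fits by R² can be sketched as follows. The calibration values below are synthetic (an arbitrary curved trend sampled at the 49 distances from 50 cm to 146 cm), not the measured data, and the helper names are hypothetical. The fit is a plain least-squares solution of the normal equations written with the standard library only.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_r_squared(xs: Sequence[float], ys: Sequence[float], degree: int) -> float:
    """Least-squares polynomial fit of the given degree; returns R^2."""
    mean_x = sum(xs) / len(xs)
    scale = max(abs(x - mean_x) for x in xs)
    ts = [(x - mean_x) / scale for x in xs]   # centre and scale for numerical stability
    size = degree + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(size)] for i in range(size)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(size)]
    coeffs = solve_linear_system(a, b)
    preds = [sum(c * t ** k for k, c in enumerate(coeffs)) for t in ts]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the calibration data: 49 distances, curved response.
xs = [float(x) for x in range(50, 147, 2)]
ys = [900.0 - 40000.0 / x for x in xs]
r2_linear = fit_r_squared(xs, ys, degree=1)
r2_quadratic = fit_r_squared(xs, ys, degree=2)
```

On curved data of this kind the quadratic fit yields the higher R², which mirrors the qualitative comparison described above without reproducing the measured values.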
Figure 8 illustrates the data with blue dots representing the recorded measurements. The yellow dashed line represents the result of linear fitting, while the red dashed line represents the result of quadratic polynomial fitting. The R2 values clearly indicate that the quadratic polynomial fitting outperforms the linear fitting for both cameras. However, it is important to note that the quadratic polynomial fitting exhibits high precision only within a specific range, with deviations occurring beyond that range. Since our experimental setup employed a circular fish tank with a diameter of 2 m, the calibration data collection was limited to a certain range. Consequently, for subsequent experiments, only the data within the distance range of 50–150 cm were utilized to ensure measurement accuracy. In the case of larger fish tanks, recalibration to obtain new parameters would be necessary to ensure precise measurements. The relationship between X and Y for the two cameras, obtained through fitting, is expressed by Equation (4).
$$
Y = \begin{cases}
-0.054X^{2} + 16.073X - 652.15, & 50 \le X \le 150, \quad \text{Camera 1} \\
-0.083X^{2} + 26.038X - 1084.8, & 50 \le X \le 150, \quad \text{Camera 2}
\end{cases} \tag{4}
$$
The relationship between the distance X from the measured object to the camera and the pixel length Z corresponding to the calibration board’s unit length (5 cm) can also be determined using a similar approach. However, unlike the calibration of the X-Y relationship, calibrating the X-Z relationship involves obtaining average pixel lengths (Z values) for multiple unit lengths on the calibration board. The calibration results are presented in Figure 9.
The relationship between the distance X from the measured object to the camera and the corresponding pixel length Z is determined through fitting for the two cameras, as expressed by Equation (5).
$$
Z = \begin{cases}
0.0016X^{2} - 4.7134X + 442.87, & 50 \le X \le 150, \quad \text{Camera 1} \\
0.0113X^{2} - 3.5278X + 366.17, & 50 \le X \le 150, \quad \text{Camera 2}
\end{cases} \tag{5}
$$
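To make the use of the two calibration curves concrete, the following sketch chains them: the laser row Y observed in the image is inverted to a distance X, and X is then converted to a physical scale through Z. The coefficient values and signs are an assumed reading of the Camera 2 fits and should be replaced by the user's own calibration. The function names are illustrative and not part of the original implementation.

```python
import numpy as np

# Assumed reading of the Camera 2 fits (highest power first).
Y_COEFFS = (-0.083, 26.038, -1084.8)  # laser row Y(X), in pixels
Z_COEFFS = (0.0113, -3.5278, 366.17)  # Z(X): pixels per 5 cm
X_RANGE = (50.0, 150.0)               # calibrated distance range, cm
UNIT_CM = 5.0                         # physical length represented by Z

def distance_from_laser_row(y_px, coeffs=Y_COEFFS, x_range=X_RANGE, tol=1e-6):
    """Invert Y(X) = a*X^2 + b*X + c for the root inside the calibrated range."""
    a, b, c = coeffs
    roots = np.roots([a, b, c - y_px])
    real = roots[np.isreal(roots)].real
    valid = real[(real >= x_range[0] - tol) & (real <= x_range[1] + tol)]
    if valid.size == 0:
        raise ValueError("laser row maps outside the calibrated range")
    return float(valid[0])

def cm_per_pixel(x_cm, coeffs=Z_COEFFS):
    """Physical size of one pixel at distance x_cm."""
    return UNIT_CM / float(np.polyval(coeffs, x_cm))

x_cm = distance_from_laser_row(689.0)
print(f"distance = {x_cm:.2f} cm, scale = {cm_per_pixel(x_cm):.5f} cm/pixel")
```

Restricting the root to the 50–150 cm interval matters because a quadratic generally has two solutions. With these assumed coefficients the fitted Y(X) is monotonic over that interval, so at most one root is accepted.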
Accurately calculating the real-world morphological data of the target object from pixel positions requires not only establishing the relationships between X-Y and X-Z, but also accurately identifying the target object and laser line in the images. This crucial task can be accomplished by leveraging object detection and segmentation techniques in deep learning.

2.3.2. Object Detection and Object Segmentation

Object detection and object segmentation are well-established techniques in the field of computer vision. Object detection aims to identify the locations of objects in an input image and determine their corresponding classes. Object segmentation can be further categorized into instance segmentation and semantic segmentation. Semantic segmentation assigns a semantic label to each smallest unit, such as a pixel in a 2D image, enabling accurate object category prediction for every pixel. In contrast, instance segmentation focuses on classifying specific objects rather than every pixel, providing masks and class labels for each individual object. This distinction limits the ability of semantic segmentation to differentiate between individual instances of the same class. In this paper’s fish and laser line segmentation task, semantic segmentation is used to classify pixels as fish, laser line, or background, while instance segmentation is employed to generate masks for each fish and laser line. Since the experiment’s goal is to measure the morphological data of individual fish, distinguishing between different instances is crucial. Therefore, the subsequent sections of this paper specifically refer to instance segmentation when using the term “segmentation.”
In the experiments conducted in this paper, object detection is just one of the sub-steps, requiring a strict balance between real-time inference speed and accuracy. At the time of system design, YOLOv7 provided a highly mature and stable backbone for edge deployment. While more recent architectures have emerged, YOLOv7 effectively meets the real-time requirements of our pipeline. We therefore adopt the YOLOv7 backbone network structure and incorporate the ISegment module into its head structure, as shown in Figure 10.
The ISegment module comprises two components: the detection head and the segmentation head. In Figure 10, the “Detect” and “Proto” represent these two components, respectively. The detection module processes the feature map to derive object detection results, while the Proto module generates image segmentation results. By combining the results of image object detection and segmentation, this design achieves the desired outcomes for fish morphological measurement in this study. One key advantage of this design is its utilization of the same feature map to concurrently perform object detection and segmentation, thereby eliminating unnecessary time consumption associated with additional network structures applied to the image.

2.3.3. Landmarks Detection

The detection of landmarks is commonly employed for tasks like human skeletal joint detection and facial feature detection. One example is the widely used 68-point facial landmark localization method, a typical application of landmark detection [24]. Landmark coordinates can be regressed directly through fully connected layers; an alternative approach is based on Gaussian heatmaps. This method utilizes upsampling to combine high-dimensional feature maps with low-dimensional feature maps, generating heatmap outputs with channel numbers corresponding to the number of landmarks and spatial dimensions aligned with the input image. The position coordinates of the landmarks are then obtained by applying the Argmax function to each channel. Several experiments have demonstrated that Gaussian heatmap-based landmark detection outperforms fully connected regression-based detection. Nevertheless, this approach has an inherent error floor: Argmax forces coordinates to integer pixel values, which introduces quantization errors. This can notably impact performance, particularly in low-resolution or blurry underwater images where precision is critical. To overcome these limitations, the Differentiable Spatial-to-Numerical Transform (DSNT) [25] has been proposed for Gaussian heatmap-based detection. Unlike Argmax, DSNT calculates the spatial expected value of the heatmap, enabling fully differentiable, sub-pixel coordinate regression. The DSNT structure also demonstrates strong spatial generalization capability. One of its major advantages is its ease of integration into any fully convolutional network.
MobileFaceNet is a prominent face verification model specifically tailored for high-precision real-time face verification on mobile and embedded devices. Compared to MobileNetV2, MobileFaceNet achieves more than twice the inference speed while maintaining superior accuracy. With a model size of 4 MB, it achieves impressive accuracy rates of 99.55% on LFW and 92.59% on MegaFace. Regarding its structure, the main difference between MobileFaceNet and MobileNetV2 lies in the optimization of global average pooling. In MobileFaceNet, global average pooling is replaced with global depthwise convolution (GDConv). Inspired by this optimization approach, our study also adopts GDConv instead of traditional global average pooling in the design of the basic convolutional neural network. This replacement introduces learnable weights for each position, enhancing the network’s capabilities. In this paper, both the fully connected regression-based landmark detection and DSNT-based landmark detection network architectures are designed based on MobileFaceNet, as depicted in Figure 11.
In Figure 11, DWConv denotes the depthwise convolution operation and Inverted Res denotes the inverted residual module. The parameter dw_num in the Inverted Res module indicates the number of depthwise separable convolution operations within the module. The term “groups” in the Conv module refers to the number of groups in the grouped convolution. Following the fully connected layer, the feature vector has a size of 1 × 16. This is because, in addition to the 6 landmark points, the top-left and bottom-right corners of the minimum bounding box enclosing the fish are also treated as two landmark points, giving a total of 8 points with 2D coordinates and thus 16 numbers. In the DSNT structure, matrices X and Y are obtained through a formula based on the dimensions of the feature matrix:
$$
X_{i,j} = \frac{2j - (n + 1)}{n}, \quad Y_{i,j} = \frac{2i - (m + 1)}{m} \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n),
$$
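where m and n represent the height and width of the feature matrix, respectively (i indexes rows and j indexes columns). The symbol $\langle A, B \rangle_F$ denotes the Frobenius inner product, i.e., element-wise multiplication of corresponding elements in matrices A and B, followed by summation; the predicted landmark coordinates are obtained as the inner products of the normalized heatmap with X and Y. The loss calculated in the DSNT structure is the average of the Euclidean loss and the regression loss.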
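The coordinate computation of DSNT can be sketched in a few lines of NumPy. This is an illustration, not the trained network: the heatmap is normalized here with a softmax, which is one common choice and an assumption on our part, and the sketch takes the feature matrix to have m rows and n columns. The function names `dsnt_grids` and `dsnt` are illustrative.

```python
import numpy as np

def dsnt_grids(m, n):
    """Coordinate matrices for an m-row, n-column heatmap, values in (-1, 1)."""
    i = np.arange(1, m + 1, dtype=float)
    j = np.arange(1, n + 1, dtype=float)
    x_row = (2.0 * j - (n + 1)) / n   # varies along columns
    y_col = (2.0 * i - (m + 1)) / m   # varies along rows
    X = np.tile(x_row, (m, 1))
    Y = np.tile(y_col[:, None], (1, n))
    return X, Y

def dsnt(heatmap):
    """Expected (x, y) of a heatmap after softmax normalization."""
    heatmap = np.asarray(heatmap, dtype=float)
    m, n = heatmap.shape
    z = np.exp(heatmap - heatmap.max())  # shift by the max for stability
    z /= z.sum()
    X, Y = dsnt_grids(m, n)
    # Frobenius inner products <Z, X>_F and <Z, Y>_F
    return float(np.sum(z * X)), float(np.sum(z * Y))

# Two equally strong neighbouring activations: Argmax must pick one pixel,
# whereas the expectation lands halfway between them.
h = np.zeros((4, 4))
h[1, 1] = h[1, 2] = 50.0
print("argmax (row, col):", np.unravel_index(np.argmax(h), h.shape))
print("dsnt (x, y):", dsnt(h))
```

The output coordinates are normalized to the interval (-1, 1). Mapping them back to pixel positions is a linear rescaling by the heatmap width and height.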
In recent years, there has been growing interest in landmark point detection. YOLOv8, a well-known open-source object detection framework, also provides a pre-trained pose model, YOLOv8-Pose [26], for landmark point detection. In our study, we applied this model to fish landmark point detection and compared it with the fully connected regression network and the DSNT-based landmark point detection network.

2.3.4. Algorithm Design of Fish Morphological Data Recognition

In the laser-camera system utilized in this study, fish morphological data detection can be divided into two scenarios. The first scenario occurs when the camera captures the fish, but the laser line does not intersect with the fish’s body. In the second scenario, the camera captures the fish, and the laser line precisely falls on the fish’s body. These scenarios result in different amounts of information obtained by the laser-camera system. In the first scenario, the system operates as a basic underwater camera system without depth information. In this case, the detected landmark points can be used to infer the fish’s two-dimensional body posture. In the second scenario, depth information is available, allowing for the estimation of a third-dimensional offset based on the two-dimensional body posture, thus providing three-dimensional body posture information in the real world. Additionally, in this scenario, calibration information can be utilized to calculate more comprehensive morphological data, such as body length, fork length, body depth, and surface area. Therefore, the algorithm first determines if a laser line is present on the detected fish. If no laser line is detected, the morphological information extraction is performed according to the first scenario. If a laser line is detected, a deeper level of data analysis is conducted by incorporating the depth information.
  • Scenario: Laser Line Not Intersecting with the Fish’s Body
When the laser line does not intersect with the fish’s body, the laser-camera system operates as a basic underwater camera system. In this case, the fish’s two-dimensional body posture is determined by analyzing the relative positions of landmark points (primarily points A and D in Figure 3) obtained through landmark point detection.
Figure 12 illustrates the angle α between the line formed by points A and D and the x-axis. This angle α is used to classify the fish’s two-dimensional body posture into six categories: swimming leftward, swimming diagonally left upward, swimming diagonally right upward, swimming rightward, swimming diagonally right downward, and swimming diagonally left downward. Specifically, when α falls within the range of 0–30° or 330–360°, the fish’s two-dimensional body posture corresponds to swimming leftward. When α falls within the range of 30–90°, the fish’s two-dimensional body posture corresponds to swimming diagonally left upward. For α within the range of 90–150°, the fish’s two-dimensional body posture corresponds to swimming diagonally right upward. In the case of α within the range of 150–210°, the fish’s two-dimensional body posture corresponds to swimming rightward. When α falls within the range of 210–270°, the fish’s two-dimensional body posture corresponds to swimming diagonally right downward. Finally, for α within the range of 270–330°, the fish’s two-dimensional body posture corresponds to swimming diagonally left downward. To ensure consistency and avoid conflicting definitions, the defined angle ranges are left-closed and right-open intervals.
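The six-way classification can be expressed compactly, as the sketch below shows. The interval boundaries follow the text (left-closed, right-open). How α is measured is defined by Figure 12; the helper `heading_angle_deg` assumes it is the direction of the vector from A to D in image coordinates (x to the right, y downward), which reproduces the listed categories but should be treated as our assumption.

```python
import math

POSTURES = (
    "swimming leftward",
    "swimming diagonally left upward",
    "swimming diagonally right upward",
    "swimming rightward",
    "swimming diagonally right downward",
    "swimming diagonally left downward",
)

def heading_angle_deg(a, d):
    """Angle of vector A->D in image coordinates, in degrees (assumed convention)."""
    return math.degrees(math.atan2(d[1] - a[1], d[0] - a[0])) % 360.0

def classify_posture(alpha_deg):
    """Map alpha to one of six postures using left-closed, right-open bins."""
    alpha = alpha_deg % 360.0
    if alpha >= 330.0 or alpha < 30.0:
        return POSTURES[0]
    # Remaining bins are 60 degrees wide and start at 30 degrees.
    return POSTURES[1 + int((alpha - 30.0) // 60.0)]

# Head A to the left of tail D on the same image row: fish swims leftward.
print(classify_posture(heading_angle_deg((10, 10), (110, 10))))
```

Using half-open bins means that a boundary value such as 30° is assigned to exactly one category, which matches the interval convention stated above.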
  • Scenario: Laser Line Intersecting with the Fish’s Body
When the laser line intersects with the fish’s body, depth information is incorporated. In such cases, the z-axis offset, denoted as z, is computed by analyzing the laser points at the fish’s head and tail. A positive z value indicates that the head is closer to the camera than the tail, while a negative z value indicates the opposite. The magnitude of z represents the distance between point A and the midpoint of points C and E along the z-axis. By combining this depth information with the laser line, additional morphological data of value, including body length, fork length, and body depth, can be calculated. However, it should be noted that the laser line appears as a line segment in the generated image, and its distance from the top of the image can vary at different x-axis coordinates. Averaging the values to estimate morphological data can introduce significant errors. Therefore, in this study, when determining body length, fork length, and body depth, the target line segment is divided evenly into n small segments along the x-axis. The length within each small segment is calculated based on the distance from the laser line to the top of the image, and the lengths of these n small segments are summed to obtain the target value.
Figure 13 illustrates the calculation scheme for the fork length when the parameter n is set to 6. The line segment spanning points A and D represents the fork length of the fish. In Figure 13, the fish’s head is positioned closer to the camera than its tail. Consequently, the physical size corresponding to each pixel at the head differs significantly from that at the tail. To mitigate the impact of this variation, the segment AD is divided into six equal parts. Within each part, the physical size associated with each pixel is derived from the position of the laser line, and the lengths of the parts are summed to give the fork length |AD|. To enhance the system’s flexibility, n is a configurable parameter, normally a positive integer smaller than the pixel distance between A and D. Increasing n reduces errors but also increases computation time. Setting n to 0 is a special case in which the program performs pixel-by-pixel calculations. For segments lacking laser coverage, the missing laser position is filled in by interpolation from the nearest laser line, which keeps the resulting error as small as possible.
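The segment-wise summation can be sketched as follows. The three callables `laser_row_at`, `row_to_distance`, and `cm_per_pixel_at` are hypothetical placeholders for the laser-line lookup and the two calibration curves; the demo functions at the bottom are invented purely to make the example runnable. Evaluating the scale at the midpoint of each part is one reasonable choice, not necessarily the one used in the original program.

```python
import numpy as np

def segmented_length_cm(a, d, laser_row_at, row_to_distance, cm_per_pixel_at, n=6):
    """Sum the physical length of n equal parts of the pixel segment A-D.

    laser_row_at(x)      -> laser row (pixels) at image column x
    row_to_distance(y)   -> camera distance (cm) for laser row y
    cm_per_pixel_at(X)   -> physical size of one pixel at distance X
    n == 0 is treated as pixel-by-pixel evaluation.
    """
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    pixel_len = float(np.linalg.norm(d - a))
    if n == 0:
        n = max(1, int(round(pixel_len)))
    t_mid = (np.arange(n) + 0.5) / n          # midpoint of each part
    mids = a + t_mid[:, None] * (d - a)
    part_px = pixel_len / n
    total = 0.0
    for x_mid, _ in mids:
        distance_cm = row_to_distance(laser_row_at(x_mid))
        total += part_px * cm_per_pixel_at(distance_cm)
    return total

# Hypothetical demo: the laser row drifts along x, so the scale changes too.
demo_row = lambda x: 100.0 + 0.1 * x
demo_distance = lambda y: y          # placeholder calibration
demo_scale = lambda X: 0.01 * X      # placeholder cm per pixel
print(segmented_length_cm((0, 50), (600, 50), demo_row, demo_distance, demo_scale, n=6))
```

With a scale that varies linearly along the segment, as in this demo, any n gives the same total. The benefit of a larger n appears when the scale changes non-linearly along the fish body.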
The computation of total length and body depth employs a similar approach to that of calculating fork length, with the difference lying in the selection of distinct landmark points. In this scenario, the presence of target segmentation enables the calculation of the fish’s unilateral surface area. While the application of fish surface area remains constrained within the fisheries field, it serves as a fundamental morphological parameter that holds potential for diverse applications. To determine the unilateral surface area, it is adequate to ascertain the pixel count in each column of the mask and establish its correspondence with the physical world size per unit pixel in the respective column.
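A minimal sketch of the column-wise area computation is given below. It assumes square pixels, so that one pixel at a scale of s cm per pixel covers s² cm²; the per-column scale array `cm_per_pixel_by_col` is a hypothetical input that would come from the laser-based calibration.

```python
import numpy as np

def unilateral_area_cm2(mask, cm_per_pixel_by_col):
    """Approximate one-sided surface area from a binary instance mask.

    mask                : (H, W) array, nonzero where the fish is
    cm_per_pixel_by_col : length-W array, physical size of a pixel per column
    """
    mask = np.asarray(mask, dtype=bool)
    scale = np.asarray(cm_per_pixel_by_col, dtype=float)
    if scale.shape != (mask.shape[1],):
        raise ValueError("need exactly one scale value per mask column")
    counts = mask.sum(axis=0)                  # fish pixels in each column
    return float(np.sum(counts * scale ** 2))  # square pixels assumed

# Toy example: a 2-pixel-tall band across three columns with different scales.
toy_mask = np.zeros((4, 3), dtype=bool)
toy_mask[1:3, :] = True
print(unilateral_area_cm2(toy_mask, [0.5, 1.0, 2.0]))
```

This is a projected area as seen by the camera. It does not correct for body curvature or for the fish being tilted relative to the image plane.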

3. Results

3.1. Neural Network Training Results

3.1.1. Detection and Segmentation Results

The integration of detection and segmentation heads into the YOLOv7 backbone enabled the successful detection and segmentation of both laser and fish targets. Leveraging the robust generalization capability of YOLOv7, the detection and segmentation models converged after 280 epochs of training on a dataset built specifically for detection and segmentation tasks. The progression of the training parameters is visualized in Figure 14.
Adding the segmentation head introduced a Segmentation Loss term to the training process, complementing the conventional Bounding Box Loss, Objectness Loss, and Classification Loss. In addition to precision, recall, and mean average precision (mAP) for object detection, the same set of metrics was tracked for segmentation. Both object detection and segmentation achieved mAP values surpassing 0.85, with object detection demonstrating slightly higher accuracy than segmentation.
The trained model was evaluated on the test set to obtain precision, recall, and mean average precision (mAP) values for each class in object detection and segmentation. Table 3 displays the precision (P), recall (R), and mAP values, providing an assessment of the average classification accuracy.
Table 3. Precision, Recall, and mAP for Object Detection and Segmentation in the Test Sets.
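| Class | Object Detection P | Object Detection R | Object Detection mAP | Segmentation P | Segmentation R | Segmentation mAP |
|---|---|---|---|---|---|---|
| fish | 0.976 | 0.924 | 0.956 | 0.984 | 0.956 | 0.974 |
| light | 0.917 | 0.844 | 0.869 | 0.855 | 0.799 | 0.842 |
| all | 0.946 | 0.884 | 0.912 | 0.920 | 0.877 | 0.908 |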
Table 3 presents the results, which indicate comparable accuracy between the object detection and segmentation models on the test set when compared to the training set, suggesting no significant overfitting. Notably, the fish class demonstrates higher precision and recall in both object detection and segmentation, while the light class achieves accuracy parameters around 0.8, meeting the experimental requirements of this study.
To rigorously validate the effectiveness of our selected architecture, we benchmarked our proposed module against Mask R-CNN. Mask R-CNN is a widely established two-stage instance segmentation baseline. Both models were fine-tuned and evaluated on our specific underwater dataset. The empirical results demonstrated the clear superiority of our approach in this specific domain. As shown in Table 4, the fine-tuned Mask R-CNN baseline achieved a bounding box mAP@0.5 of 0.884 and a segmentation mask mAP@0.5 of 0.770.
Our proposed module achieved higher accuracy on both metrics, with a bounding box mAP@0.5 of 0.912 and a segmentation mask mAP@0.5 of 0.908. The reduced segmentation performance of Mask R-CNN highlights the difficulty of extracting crisp pixel contours from blurry underwater images using traditional two-stage mask heads on limited datasets. Additionally, two-stage models generally incur substantially larger computational overhead than our lightweight single-stage backbone. This comparative analysis supports our methodological design: it indicates that integrating an optimized single-stage detector with a dedicated landmark network provides a more robust and computationally efficient solution for morphometric measurement.

3.1.2. Landmark Detection Results

The accuracy of landmark detection plays a crucial role in ensuring precise measurement of fusiform fish size and pose data in this study. To assess the accuracy and fitting of the model, multiple control experiments were designed and conducted in the experimental section. Landmark detection involves solving a coordinate regression problem. Therefore, standard regression analysis metrics such as Mean Absolute Error (MAE) and R-squared (R2) were used to evaluate the performance of the model. MAE measures the average absolute difference between predicted values and true values. A smaller MAE indicates lower prediction errors and greater proximity to the true values. MAE is intuitively interpretable and less affected by outliers, but it does not capture the relative relationship between predicted and true values. On the other hand, R2 is a statistical measure that assesses how well the model fits the observed data by explaining the variance in the dependent variable. R2 ranges from negative infinity to 1, with values closer to 1 indicating a better fit of the model to the data, while smaller values indicate poorer fit. A negative R2 implies that the model performs worse than simply using average values for prediction. Both fully connected convolutional neural networks and DSNT-based convolutional neural networks were trained for 1000 epochs using both original and augmented data. The variations in MAE and R2 during the training process are depicted in Figure 15.
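For reference, the two metrics can be computed per image as in the sketch below, and then summarized by their minimum, maximum, and mean over the test set. Flattening the eight (x, y) points of an image into one 16-element vector before computing R² is our assumption, and the function name `per_image_metrics` is illustrative.

```python
import numpy as np

def per_image_metrics(true_pts, pred_pts):
    """Return per-image MAE and R^2 for landmark arrays of shape (N, K, 2)."""
    t = np.asarray(true_pts, dtype=float)
    p = np.asarray(pred_pts, dtype=float)
    t = t.reshape(t.shape[0], -1)   # one flattened coordinate vector per image
    p = p.reshape(p.shape[0], -1)
    mae = np.mean(np.abs(t - p), axis=1)
    ss_res = np.sum((t - p) ** 2, axis=1)
    ss_tot = np.sum((t - t.mean(axis=1, keepdims=True)) ** 2, axis=1)
    r2 = 1.0 - ss_res / ss_tot      # can be negative for very poor fits
    return mae, r2

# Toy data: image 0 is predicted perfectly, image 1 collapses to the mean.
truth = [[[0, 0], [1, 1]], [[0, 0], [2, 2]]]
preds = [[[0, 0], [1, 1]], [[1, 1], [1, 1]]]
mae, r2 = per_image_metrics(truth, preds)
for name, values in (("MAE", mae), ("R2", r2)):
    print(name, "min/max/mean:", values.min(), values.max(), values.mean())
```

The second toy image shows why R² complements MAE: predicting the mean coordinate everywhere yields a moderate MAE but an R² of exactly 0.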
The results for two landmark regression methods, namely CNN (fully connected convolutional regression) and DSNT (DSNT-based landmark regression), are presented in Figure 15. The terms “Source” and “Enhanced” refer to the original dataset and the dataset after enhancement, respectively. The DSNT approach involves calculating MAE for the 32 × 32 points obtained from the heatmap, while the fully connected regression calculates MAE based on 8 points. This difference in point calculation leads to noticeable variations in MAE values between CNN and DSNT in Figure 15. When it comes to dataset variations, both the fully connected regression and DSNT-based landmark detection exhibit significant improvements when trained on the enhanced dataset. The R2 evaluation for both methods is based on 8-point coordinates. The DSNT-based landmark detection model performs the best on the training set of the enhanced dataset, while the fully connected regression shows similar R2 values between the original and enhanced datasets, both around 0.8. However, the DSNT approach demonstrates the poorest R2 performance on the original dataset, exhibiting considerable oscillations and a notable number of negative values during the initial 50 epochs. In conclusion, the DSNT-based regression model trained on the enhanced dataset demonstrates the best overall performance.
Both of the landmark detection methods mentioned earlier showed lower performance in terms of fitting when applied to the testing set compared to their performance on the corresponding training sets. The evaluation of model performance was conducted on the untrained testing set using MAE and R2 metrics. The results of the evaluation are presented in Table 5.
Table 5. Performance Evaluation of Landmark Detection Methods on the Test Sets.
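| Model | MAE Min | MAE Max | MAE Mean | R² Min | R² Max | R² Mean |
|---|---|---|---|---|---|---|
| CNN | 0.0086 | 30.7637 | 0.1691 | 0.0643 | 0.9991 | 0.7013 |
| DSNT | 0.0055 | 30.7656 | 0.1431 | 0.0675 | 0.9996 | 0.9170 |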
Table 5 displays the results for two models: CNN (fully connected regression) and DSNT (landmark detection using the DSNT module). The “MAE” and “R²” columns contain sub-columns indicating the minimum, maximum, and mean values on the testing set. The results demonstrate that the DSNT-based landmark detection model outperforms the fully connected regression model, with a lower mean MAE (0.1431 vs. 0.1691) and a higher mean R² (0.9170 vs. 0.7013). However, Table 5 also highlights a limitation: both models fit poorly (R² below 0.1 and MAE above 30) on certain images. Upon examination, these outliers occur when the fish is only partially captured, such as when only half of the fish body is detected. We therefore recommend performing body posture and other morphological measurements only when a complete fish body is detected.
The pre-trained YOLOv8n-Pose model was also applied to the fish landmark detection dataset in this study, but the results were unsatisfactory. Although YOLOv8n-Pose performs both object detection and landmark detection, its landmark evaluation is based on the six actual landmarks, fewer than the eight points used in CNN and DSNT, whose first two points represent the top-left and bottom-right corners of the minimum bounding box. Because the point sets differ, the YOLOv8n-Pose results are not included in Table 5. On the testing set, its average R² for these six landmarks is only 0.3831, considerably lower than the R² values obtained by the fully connected regression and DSNT-based landmark detection networks. Consequently, the DSNT-based landmark detection model serves as the foundation for landmark delineation in subsequent measurements.

3.2. Fish Measurement Results

After completing the fish and laser line detection and fish landmark detection processes, essential data for measuring fish body posture and morphological characteristics are obtained. For each processed image, the object detection model is initially employed to identify all fish present, followed by the landmark detection model to detect landmarks on each fish. Subsequently, it is determined whether a laser line is present on each fish. If no laser line is detected, the two-dimensional body posture of the fish is calculated based on the relative positions of the landmarks. However, if a laser line is detected, the positions of the fish head and tail relative to the camera are determined, enabling the calculation of the z-axis position and measurements such as body length, fork length, and body depth. Finally, the two-dimensional body posture and three-dimensional morphological information are annotated onto the image, as depicted in Figure 16.
Figure 16 showcases the automated computation of body posture and morphological data when the fish’s body is intersected by the laser line (Figure 16a). In cases where the laser line is not projected onto the fish’s body, Figure 16b,c demonstrate the determination of two-dimensional body posture.
Table 6 presents a comparative analysis of morphological data for the eight edible fish species observed in the experiment. The columns TIP-Laser (Traditional Image Processing), DL-Laser (Deep Learning), and Label correspond to data computed using laser triangulation, data derived from deep learning-based methods, and manually measured data, respectively.
The findings presented in Table 6 demonstrate that both underwater laser triangulation and deep learning methods are viable for measuring fish morphological data. However, there are variations in the accuracy of these measurements. Upon analyzing the data in the table, it is observed that underwater laser triangulation exhibits an average error rate of 14.70%, whereas deep learning methods yield an average error rate of 7.75%. It is important to note that when the laser line is absent from a fish’s body, both approaches are unable to compute the morphological data for that particular fish. Consequently, the lack of calculated values for Catfish in Table 6 can be attributed to this limitation.

4. Discussion

4.1. System Positioning and Comparative Advantages

This study conducted research on measuring morphological data of fusiform fish in a laboratory environment. The objective was to provide a foundational solution for health monitoring in the aquaculture industry. Given the current validation within a controlled experimental tank, we position this study as a rigorous proof of concept. It demonstrates the feasibility of combining active laser illumination with deep learning, establishing a baseline with potential for future industry-wide implementation upon broader environmental validation. The proposed approach in this paper offers several advantages over other studies in terms of non-invasiveness, ease of deployment, high accuracy, and applicability to fusiform fish.
Furthermore, regarding practical deployment on living organisms, the potential for stress or avoidance behavior induced by measurement equipment must be carefully managed. Traditional manual measurements often cause severe physical stress and risk of injury. While optical methods are non-contact, intense light sources can still provoke avoidance reactions. Our system addresses this by utilizing a red laser line. As noted in our experimental setup, research has demonstrated that many fish species exhibit significantly lower visual sensitivity and behavioral disruption when exposed to the red spectrum compared to white light [21]. Combined with the instantaneous nature of the measurement, this optical design effectively mitigates light-induced stress. This ensures that the continuous monitoring process remains truly non-invasive and aligns with ethical animal welfare practices in aquaculture.
Real-time health monitoring of fish is crucial in aquaculture. While some existing research focuses on fish identification and segmentation, their descriptions of length estimation accuracy are limited. For example, R. Garcia et al. [27] achieved high segmentation accuracies for individual fish and overlapping fish, but their study lacked detailed information on length estimation accuracy. The rapid development of deep learning has led to high precision in object segmentation, especially with the introduction of the Segment Anything Model [28], which is relevant to large-scale image processing models. The current challenge in morphological measurement lies in measurement precision rather than segmentation accuracy. Some studies with high measurement accuracy perform computer vision-based measurements on fish placed on workbenches or assembly lines after the fish have died. However, these studies are often limited to specific fish species. A. F. Fernandes et al. [29] developed a computer vision system for measuring morphological characteristics and predicting body weight (BW) and carcass weight (CW) in live Nile tilapia. Although their study achieved high accuracy, it was restricted to measuring tilapia on a test bench and lacked flexibility for other fish species. Their approach relied solely on fitting within the workbench setting, resulting in limited universality and ease of deployment. In contrast, the proposed approach in this paper utilizes an assembled laser-camera system that can be placed inside the experimental tank without disrupting fish activities. It allows for real-time monitoring of fish health and only requires recalibration when environmental changes occur. Moreover, the proposed landmark detection algorithm is designed for fusiform fish, demonstrating promising applicability within this morphological category.
In the comparison between the traditional image-processing-based laser measurement and the deep learning-driven laser measurement, the latter demonstrates superior performance. Both approaches rely on the same laser hardware for depth. However, traditional image processing relies on rigid thresholds, making its experimental performance compromised due to noise and water impurities. The deep learning-driven approach effectively integrates instance segmentation and landmark detection to locate the laser line and fish features with much higher robustness. Inaccuracies in both angular and structural distance measurements directly affect measurement precision through conversion formulas. The utilization of more precise measuring instruments could potentially reduce the error rate in this approach. However, any changes in the structural configuration of the hardware model render the existing parameters ineffective. Additionally, impurities in the water can affect the propagation of the laser, leading to deviations from the original derivations. In contrast, the deep learning-driven measurement method establishes relationships between the camera’s distance to the target and the imaging position of the laser line, as well as the correspondence between the camera’s distance to the target and the pixel length representing a unit physical length. These relationships are obtained through underwater calibration. Underwater calibration accounts for potential variations in angular measurements and is considerably less challenging than achieving precise measurements. Regarding environmental robustness, it is important to note the difference in lighting sensitivity between our proposed system and traditional stereo vision. As our system relies on optical cameras for 2D object and landmark detection, these specific deep-learning stages are inevitably affected by extremely low light or severe turbidity. 
However, the most challenging aspect of underwater measurement—depth extraction—is significantly more robust in our design. Stereo vision relies on passive texture matching, which fails when dim lighting obscures fish scales. Conversely, our system employs an active illumination source (the red laser). Extracting a self-illuminated, high-contrast laser line from an image is fundamentally more resilient to ambient lighting variations and lack of object texture than passive disparity matching. Therefore, while our optical cameras require reasonable visibility to detect the fish, the active laser ensures that depth computation remains highly reliable even under sub-optimal illumination.
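The two calibration relationships described above (laser-line image position versus camera-to-target distance, and distance versus pixels per unit length) can be illustrated with a minimal sketch. The calibration samples, the quadratic model, and the names `row_to_distance`, `distance_to_scale`, and `pixel_length_to_cm` are hypothetical placeholders chosen for illustration; they are not the calibration data or fitting functions used in this study.

```python
import numpy as np

# Hypothetical underwater calibration samples (NOT the study's data):
# image row of the laser line (pixels) with the board at known distances (cm).
laser_row_px = np.array([620.0, 540.0, 480.0, 435.0, 400.0])
distance_cm = np.array([60.0, 80.0, 100.0, 120.0, 140.0])
# Pixels per centimetre measured from the board markings at each distance.
px_per_cm = np.array([12.0, 9.0, 7.2, 6.0, 5.14])

# Assumed model: low-order polynomials fitted by least squares.
row_to_distance = np.polynomial.Polynomial.fit(laser_row_px, distance_cm, deg=2)
distance_to_scale = np.polynomial.Polynomial.fit(distance_cm, px_per_cm, deg=2)


def pixel_length_to_cm(length_px, laser_row):
    """Convert a pixel length to centimetres using the laser-line row."""
    distance = float(row_to_distance(laser_row))   # camera-to-target distance
    scale = float(distance_to_scale(distance))     # pixels per centimetre there
    return length_px / scale, distance


if __name__ == "__main__":
    length_cm, distance = pixel_length_to_cm(144.0, 480.0)
    print(f"distance ~ {distance:.1f} cm, length ~ {length_cm:.1f} cm")
```

Because both curves are fitted to data recorded underwater with the assembled device, refraction and mounting-angle effects are folded into the fitted coefficients rather than modelled explicitly, which is the practical advantage the paragraph above describes.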

4.2. Error Analysis and Algorithmic Optimization

In the context of aquaculture health monitoring, research on extracting morphological data such as total length, fork length, and body depth remains insufficient, which hinders industrial application. In this study, the average measurement error for the valid fish data extracted in the breeding ponds was 7.75%, which leaves room for improvement. Several stages of the experimental pipeline can be further optimized, including image enhancement, target segmentation, and landmark detection. First, image enhancement plays a crucial role because it is the first processing stage and directly influences all subsequent steps. Because fish can only be imaged in their growth environment, data acquisition must be conducted underwater, which introduces substantial noise into the collected data. Image quality can be improved from both hardware and software perspectives. On the hardware side, cameras equipped with features such as night vision can be used; on the software side, underwater image denoising algorithms can be employed. In either case, it is essential that image optimization does not compromise the imaging of the laser line.
The second optimization point focuses on target segmentation. While object detection has achieved high accuracy, precise target segmentation in blurry underwater environments remains challenging. As observed in our qualitative results (e.g., Figure 10), the predicted mask boundaries are relatively rough despite high confidence scores. This limitation is precisely why our system does not rely on mask contours or bounding boxes for measuring body length and depth: relying on rough segmentation edges would introduce unacceptable variance. Instead, it supports our design choice to rely on the robust six-landmark structural constraint for critical morphometric calculations. In our current pipeline, the instance segmentation mask is retained primarily to facilitate the estimation of the unilateral surface area and to quickly bound the laser intersection. Nevertheless, inaccuracies in target segmentation still contribute minor errors when determining the exact intersection of the fish and the laser line. One approach to improving target segmentation accuracy is to use the large Segment Anything Model (SAM) [28] as a pretrained model for transfer learning, given its strong segmentation performance.
The third optimization point concerns landmark detection, which directly affects the extracted morphological data. Based on the experimental process in this study, optimizing landmark detection offers the greatest potential for improvement. Notably, cases where the fish's body is only partially captured in the image can lead to significant deviations in landmark detection. Among the landmark detection methods used in this study, the YOLOv8-pose transfer learning model exhibited poor detection performance, the fully connected regression-based landmark detection network was prone to overfitting and lacked robustness, and the DSNT module-based landmark detection network performed best, although there is still considerable room for optimization. Overall, computing the loss on heatmaps tends to yield better robustness.
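To make the heatmap-based idea concrete, the following is a minimal NumPy sketch of the differentiable spatial-to-numerical transform (DSNT) of Nibali et al. [25]: the heatmap is normalised into a probability distribution and the landmark coordinate is taken as its expected value. It is an illustration only, assuming softmax normalisation and a single landmark; the function names `dsnt` and `to_pixels` are hypothetical, and this is not the network code used in this study.

```python
import numpy as np


def dsnt(heatmap_logits):
    """Map one unnormalised heatmap of shape (H, W) to (x, y) in (-1, 1)."""
    h, w = heatmap_logits.shape
    # Softmax over all pixels so the heatmap sums to one.
    z = heatmap_logits - heatmap_logits.max()
    p = np.exp(z)
    p /= p.sum()
    # Pixel-centre coordinates normalised to the open interval (-1, 1).
    xs = (2.0 * np.arange(1, w + 1) - (w + 1)) / w
    ys = (2.0 * np.arange(1, h + 1) - (h + 1)) / h
    # Expected coordinate under the heatmap distribution.
    x = float((p.sum(axis=0) * xs).sum())
    y = float((p.sum(axis=1) * ys).sum())
    return x, y


def to_pixels(coord, size):
    """Convert a normalised coordinate back to a 0-based pixel index."""
    return ((coord + 1.0) * size - 1.0) / 2.0


if __name__ == "__main__":
    logits = np.zeros((8, 8))
    logits[3, 5] = 50.0  # sharp peak at row 3, column 5
    x, y = dsnt(logits)
    print(to_pixels(x, 8), to_pixels(y, 8))
```

Because the output is an expectation rather than an argmax, the coordinate is differentiable with respect to every heatmap value, so a coordinate regression loss can be combined with a heatmap regularisation term during training.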

4.3. Limitations and Future Work

A notable limitation of the current framework is its reliance on isolated single-frame spatial analysis. The system requires at least half of the fish body to be visible to reliably extract the six anatomical landmarks. Under conditions of severe partial visibility or dense fish occlusion, the extraction of these critical landmarks inevitably fails, which disrupts the morphological calculation process. To overcome this limitation in high-density aquaculture environments, future iterations must incorporate temporal tracking mechanisms. By tracking individual fish across sequential video frames, the system could maintain continuous morphological records. When a subject becomes temporarily occluded, the system could fall back on the measurements captured in preceding frames where the full body was visible. This temporal integration will be crucial for maintaining measurement continuity and reliability. In addition, we explicitly acknowledge that the current selection of the six anatomical landmarks is primarily an engineering-oriented heuristic, designed to maximize computational efficiency and system robustness. We have not yet conducted systematic ablation studies or quantitative comparisons against dense skeleton-based approaches. Validating this heuristic through rigorous ablation studies remains an important direction for future methodological refinement.
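The fallback behaviour proposed above can be sketched as follows. This is a hedged illustration of one possible design, not part of the implemented system: it assumes that an external tracker (not shown) supplies a stable `track_id` per fish, and the class name `TrackMeasurements`, the history length, and the use of the median are all hypothetical choices.

```python
from collections import defaultdict, deque
from statistics import median


class TrackMeasurements:
    """Keep recent valid measurements per tracked fish (hypothetical helper)."""

    def __init__(self, history=30):
        # One bounded history of valid fork-length values per track id.
        self._history = defaultdict(lambda: deque(maxlen=history))

    def update(self, track_id, fork_length_cm):
        """Record a frame; pass None when landmark extraction failed."""
        if fork_length_cm is not None:
            self._history[track_id].append(fork_length_cm)
        return self.estimate(track_id)

    def estimate(self, track_id):
        """Median of stored values, or None if the fish was never measured."""
        values = self._history.get(track_id)
        return median(values) if values else None


if __name__ == "__main__":
    tracks = TrackMeasurements()
    tracks.update(1, 24.0)
    tracks.update(1, 26.0)
    # Occluded frame: the estimate falls back on the earlier measurements.
    print(tracks.update(1, None))
```

A median over a bounded window is used here so that a single poor frame does not dominate the record; other aggregation rules would be equally plausible.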
Despite the promising results, we acknowledge several limitations in the current study. Most notably, the dataset remains relatively small (e.g., 16 images in the test set for object detection). Acquiring and meticulously annotating underwater live-fish data with structural landmarks is highly labor-intensive. Consequently, the small test set introduces a risk of statistical variance, and the current system should be viewed as a rigorous proof of concept validated within a controlled laboratory tank. While data augmentation techniques were employed to improve model robustness, they cannot fully replicate the biological and environmental diversity of real-world aquaculture. Moreover, generalizing this system to real-world aquaculture environments requires overcoming several complex environmental variables. First, water properties heavily impact system reliability. High turbidity from organic matter accelerates laser attenuation, and varying salinity levels alter the underwater refractive index. Second, ambient light conditions in outdoor marine farms fluctuate drastically. Intense sunlight or variable cloud cover introduces unpredictable surface reflections and optical noise, which complicates the extraction of the laser line. Third, the physical properties of the fish themselves introduce significant variability. Different species exhibit diverse skin reflectances and scale textures. Some scales cause severe specular highlights, while others absorb the active illumination. Additionally, dense schooling behaviors in real farms create severe occlusions. These combined factors of water, light, and fish properties dictate that our current laboratory proof of concept will require robust physical modeling and advanced optical filtering prior to commercial deployment.
Future work will focus on expanding the dataset across diverse commercial aquaculture environments, incorporating varying water turbidities, different lighting conditions, and a broader range of fusiform species to further validate and enhance the system’s generalization capabilities.
Furthermore, transitioning to real-time commercial deployment introduces strict computational hardware constraints. Our current laboratory validation utilizes high-performance hardware. However, practical aquaculture monitoring requires deploying the system on embedded edge devices within sealed underwater enclosures. These edge computing platforms have severe thermal dissipation limits and strict power budgets. While our YOLOv7 backbone is structurally efficient, running the integrated detection and landmark regression pipeline continuously could lead to thermal throttling. This thermal buildup would subsequently reduce the effective inference frame rate and compromise real-time performance. Therefore, future field deployments will necessitate aggressive model optimization. Techniques such as weight quantization and hardware-specific compilation are essential. These engineering optimizations will ensure the system maintains low latency without exceeding the strict thermal and power constraints of remote underwater hardware.
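As a rough illustration of what weight quantization involves, the sketch below applies symmetric per-tensor int8 quantization to a weight array with NumPy. It is a generic textbook scheme shown under stated assumptions, not the optimization pipeline of this study; the function names `quantize_int8` and `dequantize` are hypothetical, and a real deployment would normally rely on the quantization tooling of the chosen inference framework.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    w = np.array([-1.0, -0.3, 0.0, 0.2, 1.0], dtype=np.float32)
    q, scale = quantize_int8(w)
    print(q, scale, np.abs(dequantize(q, scale) - w).max())
```

Storing weights as int8 reduces their memory footprint to a quarter of float32, at the cost of a rounding error bounded by half the scale per weight; whether that error is acceptable for landmark regression would have to be verified experimentally.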
Transitioning this proof of concept from a near-field tank to a larger commercial farm environment will introduce critical operational considerations. The primary challenge is the anticipated measurement yield. Because our proposed system strictly requires the intersection of the laser line with the fish body to compute morphology, the instantaneous probability of capturing a valid measurement will naturally decrease in vast, unconstrained aquaculture cages. However, since aquaculture environments typically feature high stocking densities, continuous system operation over extended periods can continuously accumulate valid intersections. This temporal accumulation is sufficient to generate statistically reliable morphological distributions for population-level health and biomass estimation. Furthermore, practical deployment will strictly require rigorous in situ recalibration. Water properties such as salinity, temperature, and varying turbidity levels significantly alter the underwater refractive index. Consequently, standard baseline calibrations must be dynamically updated or replaced with robust physical models to maintain depth estimation accuracy across different farm environments. Future research will prioritize these field-deployment challenges.
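The sensitivity to the refractive index mentioned above can be illustrated with Snell's law. The sketch below assumes a ray crossing a single flat interface from air inside the housing into water and ignores the port glass; the refractive index values (about 1.333 for fresh water and about 1.340 for seawater) are approximate, illustrative figures rather than measurements from this study, and `refracted_angle_deg` is a hypothetical helper.

```python
import math


def refracted_angle_deg(incidence_deg, n_in, n_out):
    """Snell's law: n_in * sin(theta_in) = n_out * sin(theta_out)."""
    s = n_in * math.sin(math.radians(incidence_deg)) / n_out
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Approximate, illustrative refractive indices (assumptions).
    for name, n_water in (("fresh water", 1.333), ("seawater", 1.340)):
        angle = refracted_angle_deg(20.0, 1.0, n_water)
        print(f"{name}: 20.00 deg in air -> {angle:.3f} deg in water")
```

Even a small change in the refractive index shifts the ray direction slightly, and in a triangulation geometry such angular shifts translate into depth errors that grow with distance, which is why a calibration recorded in one water body may not transfer directly to another.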
Finally, future research should explore more advanced neural network paradigms. Recent architectures include Vision Transformers [30] and hybrid convolutional and transformer models. Notable examples include DETR [31] and MobileViT [32]. These models offer powerful dynamic attention mechanisms. However, they typically require vast datasets and high computational resources. Therefore, they are not strictly necessary under our current edge deployment constraints. Nevertheless, exploring these physics-guided and attention-enhanced architectures remains a highly promising future direction for handling extreme occlusion.

5. Conclusions

In conclusion, this study demonstrates a deep learning-driven laser measurement framework. It serves as a rigorous proof of concept validated for fusiform fish under controlled laboratory conditions. Within this system, the following objectives were achieved: detection of fusiform fish and laser lines, target segmentation, landmark detection on fusiform fish, and measurement of morphological data, including total length, fork length, and body depth. Future research with substantially larger datasets is required to evaluate its generalization to other marine species and complex commercial environments.

Author Contributions

Conceptualization, S.W. and S.Z.; methodology, Y.S.; software, S.W.; validation, Y.S., Z.W. and T.C.; formal analysis, Y.S.; investigation, S.W.; resources, S.Z.; data curation, T.C.; writing—original draft preparation, S.W.; writing—review and editing, S.Z.; visualization, S.W.; supervision, S.Z.; project administration, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Central Public-interest Scientific Institution Basal Research Fund, East China Sea Fisheries Research Institute (ECSFR), and Chinese Academy of Fishery Sciences (CAFS) (Grant No. 2024TD04).

Institutional Review Board Statement

Ethical review and approval were waived for this study. The subjects were commercially available edible species acquired strictly for non-invasive observational purposes. All experimental procedures involved zero physical contact and caused no harm. Furthermore, the active illumination system deliberately utilized a red laser to minimize visual stress and ensure maximal animal welfare during the brief data collection process.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to institutional data privacy policies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Strachan, N. Length measurement of fish by computer vision. Comput. Electron. Agric. 1993, 8, 93–104.
  2. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
  3. Yu, C.; Fan, X.; Hu, Z.; Xia, X.; Zhao, Y.; Li, R.; Bai, Y. Segmentation and measurement scheme for fish morphological features based on Mask R-CNN. Inf. Process. Agric. 2020, 7, 523–534.
  4. Liao, W.; Liu, P.; Qiao, J.; Zhong, Y.; Huang, H.; Liu, G. Reducing the depth data fluctuation error of the binocular imaging system based on the trapezoidal body calibration diagram. Appl. Opt. 2025, 64, 10552–10563.
  5. Zhang, L.; Zheng, Y.; Liu, Z. Fish mass estimation method based on adaptive parameter tuning and disparity map restoration under binocular vision. Aquac. Eng. 2025, 110, 102535.
  6. Gao, M.; Yin, X.; Yu, Y. Experimental Study on Thrust Prediction for Zebrafish Unsteady Maneuvering at St> 1: A Wake-Vortex-Based Linear Scaling Law. Available SSRN 6206644 2026. in preprint.
  7. Soom, J.; Boavida, I.; Leite, R.; Costa, M.J.; Toming, G.; Leier, M.; Tuhtan, J.A. Open real-time, non-invasive fish detection and size estimation utilizing binocular camera system in a Portuguese river affected by hydropeaking. Ecol. Inform. 2025, 90, 103196.
  8. Wang, G.; Yu, J.; Liu, S.; Xu, W.; Li, X.; Hao, Y.; Li, D. Automatic fish weight estimation and 3D surface reconstruction with a lightweight instance segmentation model. Expert Syst. Appl. 2025, 288, 128275.
  9. Cheng, C.Y.; Lau, C. Edge-Deployable Stereo Vision for Fish Biomass Estimation via Lightweight YOLOv11n-Pose and Dynamic Geometry. Appl. Sci. 2026, 16, 4125.
  10. Seibold, C.; Hilsmann, A.; Eisert, P. Non-invasive Growth Monitoring of Small Freshwater Fish in Home Aquariums via Stereo Vision. arXiv 2026, arXiv:2603.06421.
  11. Zhang, H.; Guo, Y.; Xie, Y.; Zheng, Z. An integrated method for non-intrusive underwater fish measurement based on keypoint detection and stereo vision. Aquac. Int. 2025, 33, 1–28.
  12. Ehlert, D.; Horn, H.-J.; Adamek, R. Measuring crop biomass density by laser triangulation. Comput. Electron. Agric. 2008, 61, 117–125.
  13. Chen, R.; Li, Y.; Xue, G.; Tao, Y.; Li, X. Laser triangulation measurement system with Scheimpflug calibration based on the Monte Carlo optimization strategy. Opt. Express 2022, 30, 25290–25307.
  14. Cheng, R.; Zhang, C.; Xu, Q.; Liu, G.; Song, Y.; Yuan, X.; Sun, J. Underwater fish body length estimation based on binocular image processing. Information 2020, 11, 476.
  15. Liu, H.; Suo, F.; Li, Y.; Xiang, J. Research on A Binocular Fish Dimension Measurement Method Based on Instance Segmentation and Fish Tracking. In Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 15–17 August 2022; IEEE: New York, NY, USA, 2022.
  16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  17. Muñoz-Benavent, P.; Andreu-García, G.; Martínez-Peiró, J.; Puig-Pons, V.; Morillo-Faro, A.; Ordóñez-Cebrián, P.; Atienza-Vanacloig, V.; Pérez-Arjona, I.; Espinosa, V.; Alemany, F. Automated Monitoring of Bluefin Tuna Growth in Cages Using a Cohort-Based Approach. Fishes 2024, 9, 46.
  18. Muñoz-Benavent, P.; Puig-Pons, V.; Andreu-García, G.; Espinosa, V.; Atienza-Vanacloig, V.; Pérez-Arjona, I. Automatic bluefin tuna sizing with a combined acoustic and optical sensor. Sensors 2020, 20, 5294.
  19. Li, J.; Zhang, S.; Li, P.; Dai, Y.; Wu, Z. Research on measuring the bodies of underwater fish with inclined positions using the YOLOv8 model and a line-laser system. Fishes 2024, 9, 206.
  20. Chen, S.; Liu, Y.; Gao, X.; Han, Z. Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices. In Biometric Recognition, Proceedings of the 13th Chinese Conference, CCBR 2018, Urumqi, China, 11–12 August 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 428–438.
  21. Widder, E.; Robison, B.; Reisenbichler, K.; Haddock, S. Using red light for in situ observations of deep-sea fishes. Deep Sea Res. Part I Oceanogr. Res. Pap. 2005, 52, 2077–2085.
  22. De La Escalera, A.; Armingol, J.M. Automatic chessboard detection for intrinsic and extrinsic camera parameter calibration. Sensors 2010, 10, 2027–2044.
  23. So, J.; Han, Y. Facial Landmark-Driven Keypoint Feature Extraction for Robust Facial Expression Recognition. Sensors 2025, 25, 3762.
  24. Tang, C.-T.; Chiu, C.-T.; Chen, W.-J. 3D landmark-based face detection and recognition system for large poses. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; IEEE: New York, NY, USA, 2021.
  25. Nibali, A.; He, Z.; Morgan, S.; Prendergast, L. Numerical coordinate regression with convolutional neural networks. arXiv 2018, arXiv:1801.07372.
  26. Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501.
  27. Garcia, R.; Prados, R.; Quintana, J.; Tempelaar, A.; Gracias, N.; Rosen, S.; Vågstøl, H.; Løvall, K. Automatic segmentation of fish using deep learning with application to fish size measurement. ICES J. Mar. Sci. 2020, 77, 1354–1366.
  28. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y. Segment anything. arXiv 2023, arXiv:2304.02643.
  29. Fernandes, A.F.; Turra, E.M.; De Alvarenga, É.R.; Passafaro, T.L.; Lopes, F.B.; Alves, G.F.; Singh, V.; Rosa, G.J. Deep Learning image segmentation for extraction of fish body measurements and prediction of body weight and carcass traits in Nile tilapia. Comput. Electron. Agric. 2020, 170, 105274.
  30. Wang, M.; Wang, Y.; Islam, M.; Wang, Y.; Wang, Y.; Hwang, J.; Fan, Y. Dual machine learning pinpoints the Radius of Informative Structural Environments in metallic glasses. npj Comput. Mater. 2026, 12, 122.
  31. Shi, Y.; Li, J.; Jia, Y.; Hong, Q. LDA-DETR: A lightweight dynamic attention-enhanced DETR for small object detection. PLoS ONE 2026, 21, e0340977.
  32. Barot, M.; Kim, J.; Won, D.; Yoon, S.W. PhyViT-GAN: Physics-Guided MobileViT-GAN for precise self-alignment image generation. Int. J. Adv. Manuf. Technol. 2026, 1–20.
Figure 1. Hardware device diagram for data collection. (a) displays the data acquisition device constructed using PVC pipes, T-junctions, and cross-junctions. Two underwater Hakester HK90A cameras, each equipped with six high-power white lights for night use, an RJ45 network interface, and support for free networking, were fixed inside the frame along with one custom-made, factory-produced, waterproofed line-laser emitter that projects a 2 m long line at a distance of 2 m from the target. (b–d) show the front, side, and top views of the device, respectively. The frame's bottom measures 380 mm in length and 298 mm in width, and the frame has a height of 638.5 mm. Camera 1 (the upper camera) was angled 20 degrees downwards from the horizontal direction, while Camera 2 (the lower camera) was angled 10 degrees downwards from the horizontal direction.
Figure 2. Distortion correction of wide-angle images.
Figure 3. Standard for labeling fish landmarks with 6 points. The distance between points A and D represents the fork length, while the total length is defined as the distance from point A to the line that connects points C and E. Moreover, the body depth is indicated by the distance between points B and F. These six landmarks’ relative positions effectively reflect the fish’s body shape.
Figure 4. Illustration of Data Augmentation Techniques of the Landmark Datasets. The first column displays five randomly selected image samples from the Landmark dataset. Each row in the figure corresponds to the preview obtained after applying the respective augmentation operation. The last column (RA) showcases preview images generated by randomly combining all data augmentation methods, with each method having a probability of 0.5. Through data augmentation, the sample size increased to 6310. This augmented dataset was divided into training and validation sets in an 8:2 ratio.
Figure 5. Schematic diagram of laser triangulation ranging. (a) The basic principle of displacement, where a change in the object’s distance (from d1 to d2) causes a proportional displacement (Δd) of the reflected laser spot on the imaging surface. (b) The detailed geometric model for depth calculation based on similar triangles. Point A represents the laser emitter, and point N is the optical center of the camera. The solid pink line denotes the active laser beam, while the black lines indicate the optical projection path. Key variables such as focal length (f), baseline distance (s), and imaging surface distances (x1,x2) are utilized to compute the actual depth (d).
Figure 6. Object dimensions measurement based on laser triangulation. (a) The 3D spatial geometric model of the laser-camera system. The red spheres A and B represent the camera and laser emitter, respectively. The solid red line represents the active laser beam intersecting the object between points M and N, while the black dashed lines denote the optical projection paths. (b) The 2D geometric relationship in the plane of triangle AMN when the perpendicular projection point D is not located on the line segment MN. (c) The 2D geometric relationship in the plane of triangle AMN when the perpendicular projection point D is located on the line segment MN.
Figure 7. The custom-made steel calibration board. The alternating black and yellow vertical lines, along with the corresponding numbers, divide the board into 10 cm segments to mitigate challenges associated with underwater imaging. The central horizontal line serves as an alignment guide for the laser line during the calibration process.
Figure 8. Fitting result of X-Y.
Figure 9. Fitting result of X-Z.
Figure 10. Enhanced Object Detection and Segmentation Model with YOLOV7 Backbone. The fish shown in the images are live and free-swimming. Inverted or tilted orientations capture natural, transient dynamic maneuvers within the experimental tank.
Figure 11. Network Architectures for Landmark Detection (Fully Connected and DSNT-based) based on MobileFaceNet. The fish shown in the images are live and free-swimming. Inverted or tilted orientations capture natural, transient dynamic maneuvers within the experimental tank.
Figure 12. Fish Body Posture Classification Based on Landmark Points. The red ellipses denote the predefined anatomical landmarks. Letters A and D represent the snout tip and tail fork, respectively. The angle α is the angle between the line connecting A and D and the horizontal x-axis, and it is used to classify the 2D swimming posture. The arrows indicate the directions of the x and y axes in the spatial coordinate system.
Figure 13. Calculation Scheme for Fork Length with n = 6 Segments. The red dots represent the detected anatomical landmarks, with letters A and D denoting the snout tip and tail fork, respectively. The green dotted line connecting A and D indicates the fork length being measured. The black vertical dotted lines illustrate the division of the segment into six equal parts along the x-axis for localized depth calculation. The thick red line across the fish body is the projected active laser line. The fish shown in the images are live and free-swimming. Inverted or tilted orientations capture natural, transient dynamic maneuvers within the experimental tank.
Figure 14. Parameter changes during the training process. (a) The progression of various training losses, including Bounding Box, Segmentation, Objectness, and Classification Loss. (b) The Precision curves for Objectness and Segmentation. (c) The Recall curves for Objectness and Segmentation. (d) The mean Average Precision (mAP) curves for Objectness and Segmentation.
Figure 15. Variations in MAE and R² during the Training Process for Landmark Detection Models.
Figure 16. Calculation of Body Posture and Morphological Data. (a) Automated computation of 3D morphological data and 2D posture when the fish is intersected by the active red laser line. (b,c) Determination of 2D body posture when the laser line does not project onto the fish. The red dots and corresponding letters (A–F) indicate the six detected anatomical landmarks. The blue text displays the real-time algorithm output. The fish shown in the images are live and free-swimming. Inverted or tilted orientations capture natural, transient dynamic maneuvers within the experimental tank.
Table 1. Morphological data of cultured fish.
| Name | Fork length (cm) | Total length (cm) | Body depth (cm) | Body weight (g) |
|---|---|---|---|---|
| Blackfish | 34 | 38 | 10 | 631.8 |
| Crucian carp 1 | 18.3 | 20.3 | 7.5 | 172.2 |
| Crucian carp 2 | 15 | 16.5 | 5.8 | 91.6 |
| Crucian carp 3 | 17.8 | 19.5 | 7.1 | 159.5 |
| Crucian carp 4 | 24 | 27 | 10.5 | 413.9 |
| Crucian carp 5 | 24 | 29 | 9.5 | 385.8 |
| Sea bass | 23.5 | 26 | 10.5 | 395.5 |
| Catfish | 31.2 | 37.1 | 13.6 | 741.2 |
Table 2. The distribution of images and labels in the target detection and segmentation dataset.
| Partition | Number of images | Fish instances | Light instances |
|---|---|---|---|
| Training set | 669 | 1564 | 1875 |
| Validation set | 59 | 132 | 154 |
| Test set | 16 | 40 | 43 |
| Total | 744 | 1736 | 2072 |
Table 4. Performance comparison between the proposed model and the Mask R-CNN baseline.
| Model | BBox mAP@0.5 | Mask mAP@0.5 |
|---|---|---|
| Mask R-CNN (Fine-tuned) | 0.884 | 0.770 |
| Proposed (YOLOv7-ISegment) | 0.912 | 0.908 |
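Table 6. Comparison of Morphological Data.
| Name | Fork length (cm), TIP-Laser | Fork length (cm), DL-Laser | Fork length (cm), Label | Total length (cm), TIP-Laser | Total length (cm), DL-Laser | Total length (cm), Label | Body depth (cm), TIP-Laser | Body depth (cm), DL-Laser | Body depth (cm), Label |
|---|---|---|---|---|---|---|---|---|---|
| Blackfish | 28.22 | 31.66 | 34 | 29.44 | 35.03 | 38 | 7.80 | 8.55 | 10 |
| Crucian carp 1 | 23.42 | 20.49 | 18.3 | 18.47 | 20.74 | 20.3 | 9.02 | 7.65 | 7.5 |
| Crucian carp 2 | 19.05 | 13.72 | 15 | 16.95 | 14.56 | 16.5 | 6.03 | 5.70 | 5.8 |
| Crucian carp 3 | 21.96 | 17.72 | 17.8 | 20.47 | 18.07 | 19.5 | 6.74 | 5.67 | 7.1 |
| Crucian carp 4 | 20.64 | 25.28 | 24 | 28.27 | 26.85 | 27 | 12.38 | 11.85 | 10.5 |
| Crucian carp 5 | 23.04 | 24.99 | 24 | 25.52 | 26.48 | 29 | 11.97 | 10.79 | 9.5 |
| Sea bass | 19.03 | 25.64 | 23.5 | 20.2 | 28.38 | 26 | 10.81 | 10.94 | 10.5 |
| Catfish | None | None | 31.2 | None | None | 37.1 | None | None | 13.6 |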
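As a rough cross-check, the average error rates quoted in the abstract can be approximated from Table 6. The sketch below assumes that "error rate" means the absolute relative error against the manual label, averaged over all fish and all three measurements, and that the Catfish row is excluded because it has no laser-based values. The names `TABLE_6` and `mean_relative_error` are illustrative only and do not come from the paper.

```python
# Table 6 values in cm, as (TIP-Laser, DL-Laser, manual label) per measurement.
# The Catfish row is omitted: it has no TIP-Laser or DL-Laser values.
TABLE_6 = {
    "Blackfish":      {"fork": (28.22, 31.66, 34.0), "total": (29.44, 35.03, 38.0), "depth": (7.80, 8.55, 10.0)},
    "Crucian carp 1": {"fork": (23.42, 20.49, 18.3), "total": (18.47, 20.74, 20.3), "depth": (9.02, 7.65, 7.5)},
    "Crucian carp 2": {"fork": (19.05, 13.72, 15.0), "total": (16.95, 14.56, 16.5), "depth": (6.03, 5.70, 5.8)},
    "Crucian carp 3": {"fork": (21.96, 17.72, 17.8), "total": (20.47, 18.07, 19.5), "depth": (6.74, 5.67, 7.1)},
    "Crucian carp 4": {"fork": (20.64, 25.28, 24.0), "total": (28.27, 26.85, 27.0), "depth": (12.38, 11.85, 10.5)},
    "Crucian carp 5": {"fork": (23.04, 24.99, 24.0), "total": (25.52, 26.48, 29.0), "depth": (11.97, 10.79, 9.5)},
    "Sea bass":       {"fork": (19.03, 25.64, 23.5), "total": (20.2, 28.38, 26.0), "depth": (10.81, 10.94, 10.5)},
}


def mean_relative_error(table, method_index):
    """Mean absolute relative error (in percent) of one method against the label.

    method_index: 0 for TIP-Laser, 1 for DL-Laser; index 2 is the manual label.
    The averaging scheme is an assumption, not taken from the paper.
    """
    errors = [
        abs(values[method_index] - values[2]) / values[2]
        for fish in table.values()
        for values in fish.values()
    ]
    return 100.0 * sum(errors) / len(errors)


tip_error = mean_relative_error(TABLE_6, 0)
dl_error = mean_relative_error(TABLE_6, 1)
print(f"TIP-Laser: {tip_error:.2f}%   DL-Laser: {dl_error:.2f}%")
```

Under these assumptions the two averages land close to the 14.70% baseline and 7.75% deep learning error rates reported in the abstract, which suggests the reported figures pool fork length, total length, and body depth across the seven measured fish.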
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, S.; Zhang, S.; Shi, Y.; Wu, Z.; Cheng, T. Integrated Laser Imaging for Fusiform Fish Measurement in Aquaculture. Fishes 2026, 11, 298. https://doi.org/10.3390/fishes11050298
