Design and Realization of a Novel Robotic Manta Ray for Sea Cucumber Recognition, Location, and Approach

Manual monitoring and fishing of sea cucumbers are expensive and hazardous. Meanwhile, compared to underwater bionic robots, conventional autonomous underwater robots used for sea cucumber monitoring and capture have drawbacks of their own, including low propulsion efficiency and significant noise. Therefore, this paper is concerned with the design of a robotic manta ray for sea cucumber recognition, localization, and approach. First, the developed robotic manta ray prototype and the system framework applied to real-time target search are elaborated. Second, precise recognition and localization of sea cucumbers are achieved by combining an improved YOLOv5 object detection algorithm with a binocular stereo-matching algorithm. Third, a motion controller is proposed for autonomous 3D monitoring tasks such as depth control, direction control, and target approach motion. Finally, the capabilities of the robot are validated through a series of measurements. Experimental results demonstrate that the improved YOLOv5 object detection algorithm achieves detection accuracies (mAP@0.5) of 88.4% and 94.5% on the URPC public dataset and self-collected dataset, respectively, effectively recognizing and localizing sea cucumbers. Control experiments were conducted, validating the effectiveness of the robotic manta ray’s motion toward sea cucumbers. These results highlight the robot’s capabilities in visual perception, target localization, and approach and lay the foundation to explore a novel solution for intelligent monitoring and harvesting in the aquaculture industry.


Introduction
The high economic value of sea cucumber products has led to the rapid development of sea cucumber aquaculture [1,2]. During the sea cucumber farming process, real-time recognition and localization of sea cucumbers play a vital role in monitoring their growth status and facilitating the capture of farmed sea cucumbers. Currently, underwater manual operations are the primary means for sea cucumber monitoring and harvesting. However, prolonged underwater operations pose significant risks to personnel due to factors such as high pressure and low temperature [3]. Highly intelligent autonomous underwater robots therefore offer a convenient alternative for underwater mobile monitoring and harvesting [4,5]. Traditional autonomous underwater robots are commonly driven by propellers during underwater operations. They are prone to entanglement with aquatic vegetation and suffer from disadvantages such as low propulsion efficiency and high noise, which cause significant disturbance to aquatic organisms [6]. In contrast, fish species have evolved perception, researchers have also conducted corresponding studies on manta rays. The Automation Institute of the Chinese Academy of Sciences developed a robotic manta ray equipped with a visual system and proposed an algorithmic framework for real-time digital video stabilization [19]. Northwestern Polytechnical University achieved manta ray relative positioning by combining improved target detection algorithms and binocular distance measurement using a robotic manta ray equipped with dual cameras [20]. The fusion of visual perception and deep learning techniques in robotic fish will be a future development trend for underwater bio-inspired robots.
With the advancement of deep learning and edge computing technologies, combined with lightweight image processing algorithms, robotic fish with visual perception capabilities can achieve real-time online processing of image data. Object detection is an important means of visual perception for underwater robotic fish. Convolutional neural network-based object detection algorithms can be divided into two-stage and one-stage algorithms. Two-stage algorithms, mainly represented by the RCNN series [21–23], achieve higher detection accuracy but have slower processing speeds. One-stage algorithms, mainly represented by the YOLO series [24–27] and SSD series [28,29], have faster inference speeds. In recent years, owing to its advantages in global feature extraction, the Transformer has been successfully applied to dense prediction tasks [30,31]. For example, the Swin Transformer [32] constructs a pyramid structure with gradually decreasing resolutions to realize Transformer-based feature learning at multiple scales and to extract both short-range and long-range visual information. Experimental results demonstrated the superiority of this algorithm. Exploring lightweight and high-precision object detection algorithms that can be embedded in bio-inspired robotic manta rays is particularly important for enhancing their visual perception capabilities. Additionally, localization algorithms based on binocular vision and semi-global block matching (SGBM) [33,34] can provide stereo visual perception capabilities for underwater robots.
Therefore, the objective of this paper is to design and implement a small bio-inspired manta ray with visual perception capabilities and a rigid-flexible coupled pectoral fin. It aims to enable sea cucumber recognition, localization, and approach, thus establishing the foundation for monitoring the activity status of sea cucumbers and subsequent automated harvesting. The main contributions can be summarized as follows:

1. Designing a novel robotic manta ray with visual perception capabilities and a rigid-flexible coupled pectoral fin.
2. Improving the YOLOv5s object detection algorithm and incorporating a binocular stereo-matching algorithm to achieve accurate sea cucumber identification and localization.
3. Designing a fuzzy PID controller to realize depth control, direction control, and target approach motion control for the robotic manta ray.
The remainder of this paper is organized as follows: Section 2 elaborates on the overall electromechanical design of the bio-inspired manta ray with rigid-flexible coupled pectoral fins. Section 3 introduces the sea cucumber recognition and localization algorithm based on the improved YOLOv5s object detection and SGBM binocular stereo matching. Section 4 focuses on the depth control, direction control, and approach motion control of the manta ray based on localization information. Section 5 presents the experimental results of the sea cucumber recognition and localization algorithm, as well as of the depth control, direction control, and approach motion control of the manta ray. Section 6 discusses the research presented in this paper. Finally, Section 7 concludes the paper.

Overview of Robotic Manta Ray
The manta ray, as a typical fish utilizing the MPF mode of propulsion, exhibits outstanding stability and maneuverability during motion [35]. It also demonstrates remarkable agility and disturbance resistance at low speeds, making it highly suitable for carrying various optoelectronic sensors and performing flexible maneuvers underwater. The undulatory fins of the manta ray inspire the propulsor design of the robotic manta ray.
To ensure the integrity and consistency of the bio-inspired robotic manta ray, a top-down design approach is employed for the mechanical structure design. First, the overall shape of the bio-inspired robotic manta ray is designed from a holistic perspective. Second, considering the practical requirements, functionalities, performance, and constraints of the entire system, the bio-inspired robotic manta ray is decomposed into three separate sub-components: pectoral fins, caudal fin, and body shell. Finally, employing a local design approach, each component module with different functionalities is gradually refined and designed.
The bio-inspired robotic manta ray operates underwater in a marine environment; therefore, the materials used must be lightweight, strong, corrosion-resistant, plastic, and easy to process [36]. Considering the compressive strength and corrosion resistance of resin [37], black resin is chosen for constructing the body shell of the robotic manta ray. This paper analyzes the shape characteristics of manta rays based on the propulsion mode of manta rays in nature and knowledge from biomimetics, and the mechanical structure of the bio-inspired robotic manta ray is rationally simplified. Based on this analysis, the design parameters for the caudal fin, rigid body shell, and rigid-flexible coupled pectoral fins are determined. Figure 1a illustrates the overall rendering of the bio-inspired robotic manta ray, and Figure 1b shows the prototype. Table 1 provides the technical parameters of the bio-inspired robotic manta ray.

Internal Layout of Robotic Manta Ray
The rigid shell of the robotic manta ray provides ample space for accommodating various electronic devices, control components, and batteries. The internal layout, as shown in Figure 2, includes four sets of 7.4 V lithium batteries positioned at the central bottom of the shell to lower the center of gravity and ensure balance. Above the battery compartment, the controller, inertial measurement unit (IMU), and battery level monitoring module are placed at a relatively higher position to protect the electronic components from direct damage in case of accidental water ingress. The attitude sensor is centrally located within the internal space of the shell, accurately capturing the manta ray's posture. The power module is connected to a separate battery compartment through support pillars at the bottom of the shell, providing both convenience of connection and waterproofing functionality. The machine vision computing module, equipped with a Jetson Xavier NX board, is located at the back of the robotic manta ray, powered by a dedicated 14.8 V battery. The two buoyancy balance units, positioned on both sides of the robotic manta ray, serve to adjust the center of gravity, thereby increasing stability and balancing buoyancy forces. The bottom layout of the robotic manta ray is depicted in Figure 3. The waterproof electric switch, charging port, and depth sensor are positioned within the central groove of the robotic manta ray. This design can avoid affecting the overall hydrodynamic performance. The binocular camera, as shown in Figure 3b, is externally mounted on the bottom of the robotic manta ray, facilitating easy disassembly and expansion.

Pectoral Fin Undulation Design
The pectoral fin is the most crucial locomotion organ of the manta ray [38] and serves as the core design element in the robotic manta ray. According to relevant biological research, the complex and flexible deformation of the pectoral fin during stable cruising can be decomposed into the superposition of two orthogonal traveling waves [39]. As shown in Figure 4, traveling wave I propagates from the base to the tip of the pectoral fin along the span direction, while traveling wave II propagates approximately from the head to the tail along a chord parallel to the water flow. By coordinating these two sets of traveling waves, the manta ray achieves efficient and agile motion.
Inspired by this, this paper proposes a bio-inspired manta ray pectoral fin design scheme, where the propulsion mechanism of the pectoral fin employs a simple configuration of two pairs of fin strips and a flexible membrane wing. The overall structure of the pectoral fin is illustrated in Figure 5a. Each pectoral fin is equipped with two digital servos capable of continuous bidirectional rotation from 0 to 180 degrees, enabling independent or synchronized control. This design scheme allows for switching between undulating and flapping propulsion modes. The servo motion of the pectoral fin follows a sinusoidal pattern as described by Equation (1), where ψ_l represents the angular motion of the front servo, ψ_r represents the angular motion of the rear servo, ψ_L0 − ψ_R0 represents the phase difference between the front and rear servos, and θ_L0 − θ_R0 represents the servo bias angle.
(1)
The undulation propulsion mode, depicted in Figure 5b, involves a 0.2 ms delay between the activation of the front and rear fin strips. The two fin strips have equal amplitudes and maintain a certain phase difference, resulting in periodic oscillations that drive the rubber membrane wing to create the undulating motion. The flapping propulsion mode, illustrated in Figure 5c, involves simultaneous activation of both fin strips. The front fin strip has a larger amplitude than the rear fin strip, resulting in a wave motion that gradually decreases from front to back, propelling the rubber membrane wing forward. As the main propulsion actuator in the MPF propulsion mode, the bio-inspired pectoral fin actuator generates stable and smooth thrust, providing the robotic manta ray with precise control forces for subtle adjustments during motion control. The designed motion of the bio-inspired robotic manta ray is achieved by four driving servos that actuate the two pairs of fin strips to perform cyclic oscillations. The rigid-flexible coupling design ensures the correct temporal sequence of pectoral fin motions while incorporating a certain level of passive flexibility to reduce resistance and increase radial force. The soft membrane wing undergoes passive deformation under the combined action of the active fin strips and water damping, generating a propulsion wave that propagates in the opposite direction, propelling the robotic manta ray forward. The front and rear pairs of fin strips enable precise control of the wave motion of the pectoral fin. Compared to other bio-inspired fish pectoral fins, the advantage of the proposed rigid-flexible coupling flapping structure lies in its ability to generate multiple motion modes, providing enhanced maneuverability. It also offers faster flapping motion and greater flexibility in undulating movement, making it well-suited for a wide range of underwater tasks.
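The two propulsion modes described above can be illustrated with a small sketch. It assumes a generic sinusoid of the form amplitude · sin(2πft + phase) + bias for each servo; this form, the function names, and all numeric values (amplitudes, frequency, phase lag, 90° bias) are illustrative assumptions and are not taken from Equation (1) or from the prototype.

```python
import math


def servo_angle(t, amplitude_deg, freq_hz, phase_rad, bias_deg):
    """Angle command in degrees for one fin-strip servo at time t (seconds).

    Assumed generic sinusoid; not the paper's Equation (1).
    """
    return amplitude_deg * math.sin(2.0 * math.pi * freq_hz * t + phase_rad) + bias_deg


def clamp(angle_deg, low=0.0, high=180.0):
    """Keep the command inside the 0 to 180 degree servo range."""
    return max(low, min(high, angle_deg))


def fin_commands(t, mode, freq_hz=1.0, bias_deg=90.0):
    """Return (front, rear) servo angles for one pectoral fin.

    All amplitudes and the phase lag below are hypothetical values.
    """
    if mode == "undulate":
        # Equal amplitudes; the rear strip lags the front strip in phase.
        front = servo_angle(t, 30.0, freq_hz, 0.0, bias_deg)
        rear = servo_angle(t, 30.0, freq_hz, -math.pi / 3.0, bias_deg)
    elif mode == "flap":
        # Both strips in phase; the front amplitude is larger than the rear.
        front = servo_angle(t, 40.0, freq_hz, 0.0, bias_deg)
        rear = servo_angle(t, 20.0, freq_hz, 0.0, bias_deg)
    else:
        raise ValueError("mode must be 'undulate' or 'flap'")
    return clamp(front), clamp(rear)
```

In this sketch, changing the bias shifts the mean fin position and changing the amplitude per side changes thrust, which is the kind of adjustment the controllers in the later sections act on.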

YOLOv5s-ST Network
To improve the detection efficiency of sea cucumbers in practical applications and achieve real-time edge computing with a lightweight network, the YOLOv5s lightweight model is employed as the object detection model in this paper. The YOLOv5s model uses the CSPDarknet53 backbone, which, while stacking convolutional layers, widens the receptive field to capture local information and perform global information mapping based on the local information [40]. However, convolutional neural networks do not possess the same capability as transformers in extracting global feature information based on receptive fields and network depth [41].
To further extract global features from images and improve the accuracy of sea cucumber detection, this paper proposes the YOLOv5s-ST network, which combines YOLOv5s with the Swin Transformer. Introducing Swin Transformer blocks into the structure retains the shift invariance, scale invariance, and receptive field properties of convolutional neural networks while also capturing global information and learning long-range dependencies. The Swin Transformer computes self-attention within local windows and propagates information between windows by alternating regular and shifted window partitions, which effectively reduces the computational overhead of transformer-based dense prediction. This achieves global modeling with good generalization capabilities [32]. The main improvement is the incorporation of Swin Transformer blocks into the two C3 modules of the CSPDarknet53 backbone, referred to as C3STR modules, to extract more advanced semantic features and to strengthen the extraction of globally correlated features from images. The YOLOv5s-ST algorithm framework is illustrated in Figure 6, and the structure of the Swin Transformer block is shown in Figure 7.

Sea Cucumber Positioning Method Based on Binocular Stereo Matching
This paper employs the HBV-1780-2S 2.0 model of a binocular camera, which captures left and right binocular images with a resolution of 640 × 480. The MATLAB Stereo Camera Calibrator toolbox is utilized for calibrating the parameters of the binocular camera. The stereo-matching process is implemented using the Semi-Global Block Matching (SGBM) algorithm.
The key steps for obtaining target information are target recognition, with a focus on object keypoint detection, and the acquisition of target position information. This paper combines the YOLOv5s-ST algorithm with a binocular stereo-vision algorithm to achieve the localization and ranging of specific targets. The flowchart is illustrated in Figure 8. The target detection algorithm identifies the target's category and center point coordinates in the image. Subsequently, the SGBM stereo-matching algorithm is employed to calculate the depth matrix of the target and obtain its three-dimensional coordinates. Finally, the distance to the target is computed, thereby achieving target localization and ranging.
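The last two steps of this pipeline can be sketched as follows, assuming a rectified pinhole stereo model in which depth is Z = f · B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function names, the median-over-box heuristic, and the camera parameters in the usage note are assumptions for illustration, not the paper's calibration or implementation.

```python
import math
import statistics


def robust_disparity(patch):
    """Median of the valid (positive) disparities inside a detection box.

    `patch` is a list of rows of disparity values in pixels.
    Returns None when no valid disparity is available.
    """
    valid = [d for row in patch for d in row if d > 0]
    return statistics.median(valid) if valid else None


def pixel_to_3d(u, v, disparity_px, focal_px, baseline_m, cx, cy):
    """Back-project the box center (u, v) to camera coordinates in meters."""
    if disparity_px is None or disparity_px <= 0:
        return None
    z = focal_px * baseline_m / disparity_px  # depth from disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


def distance_to(point):
    """Euclidean distance from the camera origin to the 3D point."""
    return math.sqrt(sum(c * c for c in point))
```

For example, with hypothetical parameters `focal_px=500`, `baseline_m=0.06`, and a principal point at the center of a 640 × 480 image, a box centered at (320, 240) with a disparity of 30 pixels maps to a point about 1 m straight ahead of the camera.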

Depth, Direction and Approach Control
Depth and direction control based on localization information is essential for the monitoring and operational tasks of the biomimetic robotic manta ray in aquaculture environments. Depth control ensures that the robotic manta ray maintains a specific depth in the water, allowing it to perform monitoring tasks within a designated depth range. This enables stable underwater footage, keeps the focus on important scenes, and facilitates detailed inspection. Direction control allows the robotic manta ray to move and monitor in specific directions, enabling comprehensive monitoring of aquaculture areas. Equipped with a variety of sensors, the biomimetic robotic manta ray can perceive its underwater state. By effectively integrating depth and direction control algorithms, the autonomy and flexibility of the robotic manta ray in water can be enhanced, enabling it to perform various tasks in complex underwater environments.
A fuzzy controller consists of four main components: fuzzification, fuzzy rule base, fuzzy inference, and defuzzification [42]. It is the core of a fuzzy control system. Its primary function is to map the input and output variables to membership functions and use a set of fuzzy rules based on empirical knowledge to determine the output. This improves the responsiveness and stability of the system [43].
In the depth control system, the depth sensor reading and the set depth value are the inputs to the controller. The error e and the rate of change in error ec are calculated based on these inputs. The error and error rate are used in the fuzzy PID controller to compute the corrections to the traditional PID parameters, namely ∆K_p, ∆K_i, and ∆K_d. Similarly, in directional control, the inputs to the fuzzy PID controller are the yaw angle error e and the rate of change in the yaw angle error ec. The fuzzy PID controller for the approach control consists of a depth controller and a directional controller, as illustrated in Figure 9. In the approach control system, the binocular camera sends the three-dimensional spatial coordinates of the sea cucumber to the lower-level controller. The lower-level controller utilizes the fuzzy PID controllers for depth and direction control to adjust the fin strip's bias angle and amplitude for the next motion cycle. In PID control, the initial values of three parameters, K_p, K_i, and K_d, need to be determined. These initial values can be determined using engineering measurement methods. Referring to Equation (2), the PID parameters are adjusted based on the correction information from the fuzzy PID controller.
K_p0, K_i0, and K_d0 represent the initial values of the PID controller, while K_p, K_i, and K_d represent the adjusted output values. The fuzzy inference designed in this study is based on the fuzzification of the error e and the error rate ec, as well as the fuzzy rule base, to derive the fuzzy subsets corresponding to ∆K_p, ∆K_i, and ∆K_d. The three-dimensional surface plots of the fuzzy-inferred output variables are depicted in Figure 10. It can be observed that the output variables change smoothly as the input variables vary, which satisfies the basic requirements of fuzzy control rules. In the figure, different colors represent different values of ∆K_p, ∆K_i, and ∆K_d, with yellow representing larger values and blue representing smaller values.
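The gain-scheduling idea can be sketched in a few lines. The sketch assumes the additive update K = K_0 + ∆K that the definitions above suggest, three triangular membership functions (N, Z, P) on normalized inputs, and weighted-average defuzzification. The rule tables, scaling factors, and class name are untuned placeholders chosen for illustration; they are not the rule base or parameters used in this study.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


LABELS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}


def fuzzify(x):
    """Membership degrees of a normalized input clipped to [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return {name: tri(x, *abc) for name, abc in LABELS.items()}


def table(rows):
    """Build a rule table; rows index the e label, columns the ec label."""
    names = ("N", "Z", "P")
    return {(names[i], names[j]): rows[i][j] for i in range(3) for j in range(3)}


# Hypothetical singleton outputs for each rule (placeholders, not tuned).
DELTA_KP = table([[1.0, 1.0, 0.5], [0.5, 0.0, 0.5], [0.5, 1.0, 1.0]])
DELTA_KI = table([[-1.0, -0.5, 0.0], [0.0, 1.0, 0.0], [0.0, -0.5, -1.0]])
DELTA_KD = table([[0.5, 0.0, -0.5], [0.5, 0.0, 0.5], [-0.5, 0.0, 0.5]])


class FuzzyPID:
    def __init__(self, kp0, ki0, kd0, e_max, ec_max, scales=(0.5, 0.02, 0.1)):
        self.k0 = (kp0, ki0, kd0)
        self.e_max, self.ec_max, self.scales = e_max, ec_max, scales
        self.integral = 0.0
        self.prev_error = None

    def gains(self, e, ec):
        """Return (K_p, K_i, K_d) = initial gains plus fuzzy corrections."""
        mu_e, mu_ec = fuzzify(e / self.e_max), fuzzify(ec / self.ec_max)
        adjusted = []
        for k0, rules, scale in zip(self.k0, (DELTA_KP, DELTA_KI, DELTA_KD), self.scales):
            num = den = 0.0
            for (label_e, label_ec), out in rules.items():
                w = min(mu_e[label_e], mu_ec[label_ec])  # rule firing strength
                num += w * out
                den += w
            delta = scale * num / den if den > 0 else 0.0
            adjusted.append(k0 + delta)
        return tuple(adjusted)

    def update(self, setpoint, measured, dt):
        """One control step; returns the PID output for this cycle."""
        e = setpoint - measured
        ec = 0.0 if self.prev_error is None else (e - self.prev_error) / dt
        self.prev_error = e
        self.integral += e * dt
        kp, ki, kd = self.gains(e, ec)
        return kp * e + ki * self.integral + kd * ec
```

One instance of such a controller could be used for depth (error in meters) and another for yaw (error in degrees), each with its own normalization limits `e_max` and `ec_max`.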

Experiment and Analysis of Sea Cucumber Recognition and Location Algorithms Based on Improved YOLOv5s
The hardware environment used for model training in this study consisted of an Intel i7-1180H CPU and an Nvidia GeForce RTX3060 GPU. The software environment employed Python 3.6.5 and the PyTorch deep learning framework. The training process parameters are shown in Table 2. Two sea cucumber datasets were used in this study to validate the model's detection performance. The first dataset was from the China Underwater Robot Professional Contest (URPC2020) [44], which contains publicly available data. After data cleaning and partitioning, the sea cucumber dataset consisted of 3001 images with a total of 6808 ground truth bounding boxes. The training set contained 2370 images with 5413 ground truth bounding boxes, while the validation set consisted of 631 images with 1395 ground truth bounding boxes. The second sea cucumber dataset was collected from Mingbo Aquaculture Co., Ltd. (Yantai, China). The self-collected dataset was calibrated and divided, resulting in 630 sea cucumber images with a total of 1951 ground truth bounding boxes. The training set consisted of 490 images with 1510 ground truth bounding boxes, and the validation set comprised 140 images with 441 ground truth bounding boxes.
To better validate the effectiveness of the algorithm, this study first conducted experiments on the URPC public dataset. In the same experimental environment, a comparative experiment and performance evaluation were performed using YOLOv5s-ST. Additionally, to address the challenges posed by harsh underwater environments, the relative global histogram stretching (RGHS) [45] image enhancement method was employed to preprocess the images. The comparative results of model training on the public dataset are shown in Figure 11. The figure shows that YOLOv5s-ST detected smaller and more concealed sea cucumbers and detected a larger number of sea cucumbers overall, outperforming YOLOv5s in detection performance. Figure 12 compares the average precision (AP) before and after the improvement. The mAP@0.5 is the mean of the per-category AP when the intersection over union (IoU) threshold is set to 0.5. The purple curve exhibits a greater improvement than the yellow, green, and blue curves. The blue curve rises the slowest, indicating the slowest fitting speed during YOLOv4 training. The yellow and green curves rise faster, indicating that YOLOv5s and YOLOv7 fit faster during training; however, as the number of training rounds increases, their stable mAP@0.5 values are slightly lower than those of the purple and red curves, so their accuracy after full training is lower than that of YOLOv5s-ST and YOLOv5s-ST-RGHS. The specific training results are presented in Table 3. Precision, Recall, and F1-Score are calculated as shown in Equations (3)-(5).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
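For reference, the standard definitions of these metrics and of the IoU used for matching can be written as a minimal sketch. Here TP, FP, and FN are counts of true positives, false positives, and false negatives; the function names and the (x1, y1, x2, y2) box convention are assumptions, and the code is not the evaluation script used in this study.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

At the mAP@0.5 setting, a detection would count as a true positive only when its IoU with an unmatched ground truth box is at least 0.5.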
In Equations (3)-(5), TP (True Positive) is a sample that is correctly predicted as sea cucumber; FN (False Negative) is a sample that is incorrectly predicted as background; TN (True Negative) is a sample that is correctly predicted as background; and FP (False Positive) is a sample that is incorrectly predicted as sea cucumber. For the self-collected dataset, comparative experiments were also conducted, and the detection result comparison is shown in Figure 13. The mAP@0.5 was 93.5% for YOLOv5s. In this study, the model trained on the URPC dataset was used as the pre-trained model for the self-collected dataset. By training with YOLOv5s-ST, the mAP@0.5 improved to 94.3%, an improvement of 0.8 percentage points over the original model. Finally, by applying RGHS for image enhancement on the dataset, the optimal experimental model of this study was obtained, with an mAP@0.5 of 94.5%, an improvement of 1.0 percentage points over the original model. These results met the experimental requirements and demonstrated better performance compared to the training results of YOLOv4 and YOLOv7. The training process and specific experimental results are shown in Figure 14 and Table 4. It can be seen from Figure 14

Experiment and Analysis of Binocular Positioning
To test the localization accuracy of the binocular positioning method, distance measurements were taken every 40 cm. The position of the binocular camera relative to the sea cucumber was measured manually. This experiment was conducted in a recirculating water tank in the fish farming facility. Since the sea cucumbers adhered to the bottom of the tank, the binocular camera was moved during the experiment. The movement may have caused slight variations in the horizontal and vertical directions. Figure 15 illustrates the target recognition and 3D positioning of a single sea cucumber using the binocular camera. By performing 3D positioning and distance measurements on individual targets, the localization accuracy of the aforementioned method was verified. The measurement results are shown in Table 5. The underwater experiments show that the computer vision-based sea cucumber recognition and localization algorithm studied in this paper can accurately locate sea cucumbers in underwater environments. Based on the experimental data, the average relative error in sea cucumber location was 1.97%, with a maximum relative error of 4.21%. This indicates that the proposed method can effectively and accurately localize sea cucumbers, providing reliable target position information for underwater robots.
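The relative error statistics reported above follow from a simple calculation, sketched here under the assumption that the relative error is |measured − actual| / actual for each test distance. The distances in the usage note are made-up values for illustration only and are not the data in Table 5.

```python
def relative_errors(measured, actual):
    """Per-sample relative error |measured - actual| / actual."""
    return [abs(m - a) / a for m, a in zip(measured, actual)]


def summarize(measured, actual):
    """Return (mean relative error, maximum relative error) in percent."""
    errors = relative_errors(measured, actual)
    mean_pct = 100.0 * sum(errors) / len(errors)
    max_pct = 100.0 * max(errors)
    return mean_pct, max_pct
```

With hypothetical manual distances of 0.40, 0.80, and 1.20 m and stereo estimates of 0.41, 0.79, and 1.25 m, `summarize([0.41, 0.79, 1.25], [0.40, 0.80, 1.20])` gives a mean relative error of roughly 2.6% and a maximum of roughly 4.2%.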

Experiment and Analysis of Depth, Direction, and Approach Control
To validate the performance of the depth control, direction control, and approach control of the biomimetic robotic manta ray, a variety of experiments were conducted in an open water environment using the established motion control system. The objective of these experiments was to demonstrate the effectiveness of the depth, direction, and approach control of the system.

Experimental Scheme
The experimental setup for the motion control of the biomimetic robotic manta ray in this study primarily consisted of the biomimetic robotic manta ray and a remote control terminal. The two components communicated and transmitted data through a wireless RF module. Multiple motion control experiments were conducted at the sedimentation pool of Mingbo Aquaculture Company in Laizhou, China. The experimental site is depicted in Figure 16.
In this paper, the Jetson Xavier NX edge computing box was used to realize target detection and binocular stereo matching, and the three-dimensional coordinate information of the sea cucumber target was calculated. The binocular camera transmitted the image data to Jetson Xavier NX for processing, calculated the three-dimensional coordinate information of the target, and transmitted it to the STM32F407 master controller through serial communication. The STM32F407 controlled the bionic manta ray's movement based on the real-time transmission of target coordinate information.
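The hand-off of target coordinates over the serial link can be illustrated with a small framing sketch. The frame layout (two header bytes, three little-endian 32-bit floats, one additive checksum byte) and all names are hypothetical; the paper does not specify the protocol between the Jetson Xavier NX and the STM32F407, and the actual byte stream would be written to whatever serial interface the system uses.

```python
import struct

HEADER = b"\xAA\x55"  # hypothetical start-of-frame marker
FRAME_LENGTH = 2 + 12 + 1  # header + three float32 values + checksum


def checksum(payload):
    """Low byte of the sum of the payload bytes."""
    return sum(payload) & 0xFF


def encode_target(x, y, z):
    """Pack target coordinates (meters) into one frame of bytes."""
    payload = struct.pack("<fff", x, y, z)
    return HEADER + payload + bytes([checksum(payload)])


def decode_target(frame):
    """Return (x, y, z), or None if the frame is malformed or corrupted."""
    if len(frame) != FRAME_LENGTH or frame[:2] != HEADER:
        return None
    payload, received = frame[2:14], frame[14]
    if checksum(payload) != received:
        return None
    return struct.unpack("<fff", payload)
```

A receiver on the microcontroller side would apply the same header and checksum test before passing the coordinates to the depth and direction controllers.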

Experimental Result
In the depth control experiment, the biomimetic robotic manta ray had an initial depth of 0.2 m, and the target depth was 1 m. Figure 17 illustrates the depth variation curve, where the brown dashed line represents the desired depth and the blue solid line represents the actual depth measured by the sensor. The biomimetic robotic manta ray reached the target depth within 7 s and then remained in its vicinity. Several snapshots of the closed-loop depth control experiment are shown in Figure 18. The biomimetic robotic manta ray rapidly reached the desired depth with no significant overshoot and remained stable near the desired depth. The steady-state error was within the range of (−30 mm, 30 mm). This indicates that the depth controller is capable of achieving precise depth control for the biomimetic robotic manta ray.
During the motion of the biomimetic robotic manta ray, there was a risk of collision with obstacles due to the absence of active obstacle avoidance capability. Such collisions can loosen or deform the pectoral fin linkage mechanism, resulting in inconsistent thrust generated by the left and right pectoral fins and thereby affecting the motion posture of the biomimetic robotic manta ray. Therefore, a direction control system was designed to adjust the motion posture of the biomimetic robotic manta ray by altering the flapping amplitude of the left and right pectoral fins. Figure 19 illustrates the yaw angle variation curve (blue solid line) under the influence of the direction control system during straight swimming, with the target yaw angle indicated by the orange dashed line. The biomimetic robotic manta ray reached the target angle within 2 s and then remained in its vicinity. The experimental results demonstrate that, under the effect of the direction control system, the biomimetic robotic manta ray could maintain the desired yaw angle with a steady-state error within the range of (−1°, 1°).
The rapid descent process of the biomimetic robotic manta ray resulted in large changes in the viewing angle, leading to target loss. Since the prototype lacks a gimbal system, it only performed two-dimensional approach motions. The screenshots of the approach motion and underwater sea cucumber localization video sequences are shown in Figure 20. From 0 to 6 s, the manta ray performed the approach motion, and after 6 s it started to move away. Once the biomimetic robotic manta ray detected the three-dimensional spatial coordinates of the sea cucumber, it fine-tuned its motion direction by adjusting the deflection angles of the left and right pectoral fins. In the underwater images, the midpoint of the left boundary was taken as the origin (0, 0) of the coordinate system. The x-values to the right of the origin were all greater than 0, while the y-values were negative above the origin and positive below it. The two-dimensional coordinate changes of the sea cucumber are illustrated in Figure 21. In the y-direction, the distance between the biomimetic robotic manta ray and the sea cucumber initially decreased and then increased. In the x-direction, the biomimetic robotic manta ray adjusted its motion direction toward the sea cucumber target at 2 s and adjusted it further at 5 s. At 6 s, it crossed over the sea cucumber from above and gradually moved away. The distance variation curve between the biomimetic robotic manta ray and the sea cucumber is depicted in Figure 22, showing that the distance initially decreased and then increased, indicating that the biomimetic robotic manta ray first approached and then moved away from the target under the effect of inertia. This confirmed that the biomimetic robotic manta ray could perform approach motions toward the sea cucumber target under the influence of the approach control system.

Discussion
The YOLOv5s-ST object detection algorithm proposed in this paper enhances the model's global feature extraction capability by introducing Swin Transformer blocks. The model comparison experiments in Tables 3 and 4 demonstrate the effectiveness of the improved model, which achieves high detection accuracy. The small number of transformer blocks introduced slightly increases the computational cost, but the impact on the overall computational cost of the model is minimal. Object localization, however, requires binocular stereo matching, which occupies significant computational resources and affects the real-time performance of the overall detection and localization algorithm. To improve the real-time capability of localization, the computational cost of stereo matching needs to be reduced, for example by improving the binocular stereo-matching algorithm or by using local image stereo-matching techniques [20]. Compared with the YOLOv4 algorithm, the YOLOv5 algorithm itself has higher accuracy and model performance. Although the accuracy of the YOLOv7 algorithm is lower than that of YOLOv5s-ST in these experiments, given the strengths of the YOLOv7 algorithm itself, there is still room to improve it to achieve higher detection accuracy.
The biomimetic manta ray robot designed in this paper generates thrust by periodically oscillating two pairs of fins with specific amplitudes and frequencies. By changing the rotation angles of the servos and the phase difference between the two fins, the robot can perform various modes of motion. Its rigid-flexible coupling design has advantages [13,46]: passive deformation of the soft wing generates effective propulsion, allowing the complex flexible deformation of real batoids to be replicated while keeping the overall mechanism lightweight and flexible. Although the swimming of the biomimetic manta ray is relatively stable, its motion still affects the image quality and the success rate of binocular stereo matching during underwater image acquisition. Therefore, the control algorithm needs to be optimized to achieve smoother motion for the manta ray. The depth, direction, and approach control experiments of the manta ray demonstrated effective detection and localization of sea cucumbers in aquaculture ponds. However, the patrol path of the manta ray is still random, as this work represents an initial exploration of applying underwater biomimetic robots to the aquaculture industry. Global path-tracking control needs to be implemented for the manta ray to improve its ability to traverse and monitor an area, enabling efficient applications in underwater biological monitoring and empowering aquaculture.

Conclusions
In this paper, we have designed and implemented a small biomimetic manta ray robot with visual perception capabilities and a rigid-flexible coupled pectoral fin for sea cucumber recognition, localization, and approach. First, the mechanical structure of the manta ray robot was designed as a platform for subsequent underwater monitoring. Second, by improving the YOLOv5 object detection algorithm and integrating it with binocular stereo matching, precise sea cucumber identification and localization were achieved. Finally, a fuzzy PID controller was designed to realize depth control, direction control, and target approach motion control for the manta ray robot. Experimental results demonstrate that the improved YOLOv5 object detection algorithm achieves detection accuracies (mAP@0.5) of 88.4% and 94.5% on the URPC public dataset and self-collected dataset, respectively, effectively recognizing and localizing sea cucumbers. Control experiments validated the effectiveness of the robotic manta ray's motion toward sea cucumbers. Together, these results confirm the usability of the manta ray platform, the accuracy of the improved algorithms, and the effectiveness of the approach motion. This work provides valuable insights for the future development of more intelligent and efficient underwater biomimetic detection platforms, offering a novel solution for intelligent monitoring in aquaculture.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.