Ship Target Detection Algorithm Based on Improved YOLOv5

Abstract: In order to realize the real-time detection of an unmanned fishing speedboat near a ship ahead, a perception platform based on a target visual detection system was established. By controlling the depth and width of the model to analyze and compare training, it was found that the 5s model had a fast detection speed but low accuracy, which was judged to be insufficient for detecting small targets. In this regard, this study improved the YOLOv5s algorithm, in which the initial frame of the target is re-clustered by K-means at the data input end, the receptive field area is expanded at the output end, and the loss function is optimized. The results show that the precision of the improved model's detection for ship images was 98.0%, and the recall rate was 96.2%. Mean average precision (mAP) reached 98.6%, an increase of 4.4% compared to before the improvements, which shows that the improved model can realize the detection and identification of multiple types of ships, laying the foundation for subsequent path planning and automatic obstacle avoidance of unmanned ships.


Introduction
As intelligent platforms that can be used for marine monitoring, unmanned surface ships need to complete complex and orderly autonomous operation tasks such as target recognition and obstacle avoidance when operating at high speeds on complex and uncertain surface environments. Accurate recognition and automatic obstacle avoidance place high requirements on the high-speed information processing capabilities of the vision system of an unmanned ship [1].
In recent years, deep learning has been widely used across the target detection field, including in face recognition [2,3], vehicle detection and recognition [4,5], autonomous driving [6], and the medical industry [7]. Compared to representative traditional algorithms such as the SIFT algorithm [8] proposed by David Lowe, the texture extraction algorithms [9-11], and the HOG algorithm [12] proposed by Navneet Dalal's team, deep learning target detection algorithms have made a great leap in performance and accuracy, and their networks' robustness to scale change and translation has been significantly improved.
Unmanned platforms are developing rapidly and maturing. Equipment such as unmanned aerial vehicles and unmanned ground vehicles has gradually come into wider use, and research on unmanned offshore equipment has begun to receive more attention, especially unmanned surface boats, prompting extensive research on topics such as automatic collision avoidance [13] and path planning [14,15]. Environmental perception and target recognition technology are not only the core keys to realizing the autonomous decision-making and autonomous obstacle avoidance functions of unmanned surface boats, but they also improve the safety guarantee for the navigation of the unmanned boat. Therefore, the establishment of a visual inspection system for ships has become a hot issue for autonomous ships at sea. In terms of ship detection, considering real-time requirements, current mainstream algorithms are divided into two-stage and one-stage algorithms. Among the region-based detection algorithms, Su J. [16] and Wang G. H. [17] used feature enhancement, pre-training model parameter tuning, and fine-tuning of the classification framework to achieve higher detection accuracy with the SSD algorithm for inland watercraft. This kind of detection algorithm is slower because it needs to generate region candidate frames first. In 2016, Redmon proposed YOLO (you only look once) [18]. This kind of regression-based algorithm, which locates and identifies targets in a single pass, achieved outstanding performance in the field of target detection. Yu Y. [19] and Jiang W. Z. [20] improved YOLOv2 and YOLOv3 by adjusting the network structure and changing the input scale to increase mAP to about 80%. However, these methods still leave room for improvement in the detection of small targets under complex maritime conditions.

Hardware Platform
Real-time detection of ship targets has high requirements for accuracy. As the latest representative algorithm of the YOLO series, YOLOv5 is characterized by faster speed, higher recognition accuracy, and smaller file sizes, and it can be carried on mobile devices with lower configurations [21], which gives it high research value. In this research, the model was applied to ship detection based on an unmanned ship platform. Aiming at the problem of poor detection of small targets, the network structure was improved to increase detection accuracy. Figure 1 shows the perception platform based on the target visual detection system, which was an intelligent, water-fishing, unmanned speedboat that integrated water quality detection, automatic bait throwing, automatic obstacle avoidance, unmanned driving, image processing, and other technologies. The mechanical structure of the device was mainly composed of a 304 stainless steel hull and a 304 stainless steel drive shaft seal. The size was 800 mm × 280 mm × 320 mm, and it used a V-shaped bow structure design, which was beneficial for reducing resistance, reducing wake, lowering the center of gravity, enhancing stability, and accommodating more components.

Vision Platform System
This article mainly focuses on image processing for target detection. The image recognition module was an embedded Jetson Nano development board, as shown in Figure 2, which embedded the improved model algorithm that had been trained in advance and realized wireless communication, remote monitoring, and remote control through a 4G network module. The communication system was divided into an unmanned ship terminal, a cloud server terminal, and a client terminal, which realized the transmission and storage of information and could also realize the remote wireless control of the ship. We performed "end-to-end" calculations on the captured videos and pictures and returned the results to the terminal to issue instructions to the ship. The detection steps are shown in Figure 3.


Experimental Environment
The graphics card used was an NVIDIA GeForce GTX 1660 Ti; the CPU was an Intel Core i7-9750H@2.60 GHz six-core with 16 GB of memory. The environment configuration was Windows 10, Python 3.8, PyTorch 1.8.1, and CUDA 10.1, and the framework was TensorFlow. The parameter settings are shown in Table 1.



Principles and Methods
The YOLOv5 model structure is similar to that of the other YOLO algorithms and is divided into four parts: input, backbone, neck, and prediction. Figure 4 shows the main structure of YOLOv5s. The input part can realize data enhancement, adaptive anchor frame calculation, and adaptive image scaling. The feature extraction part mainly adopts the Focus structure, which completes slicing and convolution operations, and the CSP structure, which enhances the learning ability of the feature network. Because the Focus and CBL modules of different networks have different numbers of convolution kernels, and the number of residual modules in the CSP differs, the model can show different performance by controlling the width and depth of the network. The neck part uses FPN and PAN structures, using the information extracted from the backbone to strengthen the network's feature fusion ability. The output layer is divided into three convolutional layer channels, which are calculated through the loss function, and the result is subjected to non-maximum suppression to give the prediction result.
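The slicing operation of the Focus structure mentioned above can be sketched as follows; this is a minimal NumPy illustration (the function name is ours, and the convolution that follows in the real module is omitted):

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: split a (C, H, W) feature map into four
    pixel-interleaved sub-maps and stack them along the channel axis,
    giving (4C, H/2, W/2). No information is lost; a convolution
    would follow in the real module."""
    return np.concatenate([x[:, ::2, ::2], x[:, 1::2, ::2],
                           x[:, ::2, 1::2], x[:, 1::2, 1::2]], axis=0)
```

Slicing this way halves the spatial resolution while quadrupling the channels, so the subsequent convolution sees the full image content at a lower cost.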

Dataset Preparation and Preprocessing
The experiments in this article used both a public dataset and a self-made dataset. The public dataset was the SeaShips dataset, in which the images come from a monitoring system deployed along the coastline, with pictures captured frame by frame from the video. The self-made dataset was collected from common ships on the river.
The mosaic enhancement method was used to randomly select four pictures for random scaling and then randomly distribute them for splicing, which greatly enriched the detection dataset; in particular, the random scaling added many small targets, making the network more robust. The enhanced effect is shown in Figure 5.
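A minimal sketch of the mosaic idea described above, assuming four NumPy images; label remapping is omitted for brevity, and the nearest-neighbour resize stands in for a proper cv2.resize:

```python
import numpy as np

def mosaic(images, out_size=640):
    """Splice four images into one mosaic canvas around a random center.

    Each image is rescaled into one quadrant of the canvas, which is the
    random-scale-and-splice step of mosaic augmentation. The gray fill
    value 114 is a common letterbox convention, used here illustratively.
    """
    assert len(images) == 4
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # random mosaic center point
    xc = np.random.randint(out_size // 4, 3 * out_size // 4)
    yc = np.random.randint(out_size // 4, 3 * out_size // 4)
    corners = [(0, 0, xc, yc), (xc, 0, out_size, yc),
               (0, yc, xc, out_size), (xc, yc, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, corners):
        h, w = y2 - y1, x2 - x1
        # nearest-neighbour resize into the quadrant
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas
```

Because each source image lands at a random scale and position, targets shrink or grow per mosaic, which is what enriches the small-target distribution.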
J. Mar. Sci. Eng. 2021, 9, 908

When the image was zoomed on the input end, there were different black borders around it as well as information redundancy, which affected the training speed. We used Equation (1) to calculate the adaptive zoom, where x and y represent the length and width of the input, respectively; c and d represent the scaled size; e is the original height that needs to be filled; and f is the sum of the two sides that need to be filled.
Figure 6a is an anchor frame distribution map showing the intuitive situation of the data labels. An overall analysis of the target position and target size in the label data produced a target relative position map, shown in Figure 6b, and a target relative size map, shown in Figure 6c. In Figure 6b, the lower left corner of the dataset picture was set as the coordinate origin to establish a rectangular coordinate system, and the relative coordinate values of the abscissa x and the ordinate y were used to evaluate the relative position of the target. The results show that the horizontal positions of the targets run through the entire coordinate axis, while the vertical positions are more concentrated but somewhat discrete.
Figure 6c shows that the width of the target mostly occupied 2~5% of the image width, and the target height mostly occupied 5~8% of the image height.
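The adaptive zoom described for the input end can be sketched as follows. This is a minimal interpretation of the standard YOLOv5 letterbox step, with the function name and stride value illustrative, and it assumes the width is the long side so only the height is padded:

```python
def adaptive_scale(x, y, target=640, stride=32):
    """Sketch of adaptive image scaling (letterbox).

    x, y: width and height of the input image.
    Returns the scaled size (c, d) and the top/bottom padding.
    The raw fill e is reduced modulo the network stride to minimize
    the black border, then split evenly over the two sides.
    """
    r = min(target / x, target / y)    # uniform scale ratio
    c, d = round(x * r), round(y * r)  # scaled size
    e = target - d                     # raw height that needs to be filled
    f = e % stride                     # reduced total fill over both sides
    top, bottom = f // 2, f - f // 2
    return c, d, top, bottom
```

For a 1280 × 720 input and a 640 target, the scale ratio is 0.5, the scaled size is 640 × 360, and only 24 rows of padding remain instead of 280, which is the redundancy reduction the text refers to.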

YOLOv5s Algorithm Network Structure Improvement
It can be seen from the above analysis that there was a large gap between the initial set of regional candidate frames and the distribution of the dataset because the target sample dataset had a rich variety of objects in different sizes, resulting in the insufficient detection of small targets and unbalanced targets. Therefore, the initial frame of the target was clustered first, and the loss function module and the receptive field area were improved.

K-Means Dimensional Clustering
To improve the accuracy of ship identification, directly using the original a priori boxes cannot fully meet demands. Therefore, the K-means clustering algorithm was used to cluster the target frames of the labeled dataset. The purpose was to give the anchor frame and the detection frame a greater intersection-over-union ratio in order to select the best a priori frames. The calculation formula is given in Equation (2), where IOU represents the intersection-over-union ratio of the predicted frame and the true frame. The prior boxes obtained by re-clustering were (12, 16), (17, 39), (30, 52), (54, 60), (33, 26), (126, 183), (227, 283), (373, 326), and (407, 486). The allocation was carried out according to the principle of using large a priori boxes on small-scale feature maps and small a priori boxes on large-scale feature maps.
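The re-clustering step can be sketched as a minimal K-means over (width, height) pairs with the distance d = 1 − IOU, so minimizing the distance maximizes the intersection-over-union with the anchor. Function names are illustrative:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between boxes and centroids, both given as (w, h) pairs
    anchored at a common origin."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labeled box sizes with distance d = 1 - IOU; returns the
    k centroids sorted by area, as in the re-clustered prior boxes."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # minimum distance = maximum IOU
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = boxes[assign == j].mean(axis=0)
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]
```

Using 1 − IOU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why the resulting anchors track the dataset's size distribution.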

Expanding the Receptive Field Area
In many vision tasks, the size of the receptive field is a key issue because each pixel in the output feature map must respond to a large enough area in the image to capture information about large objects. Therefore, we chose to add a maximum pooling layer to the spatial pyramid to improve the fusion of multiple receptive fields, thereby improving the detection accuracy of small targets. The improved structure is shown in Figure 7.
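The pooling pyramid with the extra layer can be sketched as follows. This is a NumPy illustration of the pooling step only (the real SPP module also wraps the pooling in CBL convolutions); the kernel set (3, 5, 9, 13) reflects adding the 3 × 3 filter to the usual 5/9/13 set:

```python
import numpy as np

def maxpool_same(x, k):
    """Max pooling, stride 1, 'same' padding, on a (C, H, W) map."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    C, H, W = x.shape
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp(x, kernels=(3, 5, 9, 13)):
    """Spatial pyramid pooling sketch: concatenate the input with
    max-pooled copies at several kernel sizes along the channel axis.
    The added k=3 branch supplies a small receptive field for small
    targets alongside the larger ones."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in kernels],
                          axis=0)
```

Because all branches use stride 1 with same padding, the spatial size is preserved and only the channel count grows, so the following convolution can fuse responses from several receptive-field sizes at once.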




Figure 7. Comparison before and after pooling layer improvement: (a) macro structure; (b) micro structure.
Figure 7a is the macro structure, which visually shows that a maximum pooling layer has been added. Figure 7b shows the microstructure. In the figure, SPP is a spatial pyramid pooling module, and CBL is a combination module comprising a convolutional layer, a BN layer, and an activation function layer. From a microscopic point of view, we increased the receptive field of the model by adding a 3 × 3 maximum pooling filter.

Improved Loss Function
Equations (2)-(5) are the loss functions of the original YOLOv5 algorithm. GIOU_Loss, which was used for the bounding box, has certain limitations: when the detection box is completely contained within the real box (or vice versa), GIOU degenerates to IOU, and the relative position of the boxes cannot be optimized. For confidence and category loss, the original algorithm uses a two-category cross-entropy loss function, which, to a certain extent, is not conducive to the separation of positive and negative samples.

Loss = GIOU_Loss + Loss_conf + Loss_class (3)

where C represents the smallest bounding rectangle between the detection frame and the prior frame and Q represents the difference between the smallest bounding rectangle and the union of the two boxes. According to the above problems, the improved loss functions of Equations (6)-(8) were adopted. The bounding box of the improved algorithm used the CIOU_Loss loss function to add a restriction mechanism for the aspect ratio so that the prediction box more closely matches the real box. The confidence and category loss functions adopted an improved cross-entropy function, which made the separation of positive and negative samples flexible by changing their weights and reduced the impact of sample imbalance.
The CIOU bounding-box loss is

CIOU_Loss = 1 − IOU + ρ²(b, b_gt)/c² + αv

where ρ(·) is the Euclidean distance between the center points of the detection frame and the prior frame, c is the diagonal length of the smallest enclosing rectangle of the two boxes, and α is the weight coefficient. DIOU considers the overlapping area and the distance between the center points, but not the aspect ratio, so the following parameters are added to the penalty term of DIOU:

v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))², α = v/((1 − IOU) + v)

where v is a parameter for measuring the consistency of the aspect ratio.
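The CIOU computation can be sketched for a single box pair as follows. This follows the standard CIOU formulation; the (x1, y1, x2, y2) box format and the function name are our assumptions:

```python
import math

def ciou_loss(box1, box2):
    """CIOU loss between two boxes given as (x1, y1, x2, y2)."""
    # intersection and union for the IOU term
    iw = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    ih = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = iw * ih
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (a1 + a2 - inter)
    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((box1[0] + box1[2]) / 2 - (box2[0] + box2[2]) / 2) ** 2 + \
           ((box1[1] + box1[3]) / 2 - (box2[1] + box2[3]) / 2) ** 2
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term v and its weight alpha
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike GIOU, the center-distance term keeps a gradient even when one box is nested inside the other, and the αv term penalizes aspect-ratio mismatch.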
where α and γ represent coordination parameters.

Results and Discussion
The evaluation index system of this experiment included mean average precision (mAP), recall rate, and precision rate. The closer the mAP value is to 1, the better the overall performance of the model. There were six types of ships in the dataset used in this study, so the mAP was calculated as the average of the six AP values, each of which is the area enclosed by the recall and precision curves, as in Equation (10). Precision is TP/(TP + FP) and recall is TP/(TP + FN), where TP represents the number of correctly identified ship images, FP represents the number of misrecognized ship images, and FN represents the number of missed ship images.
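The metrics above can be sketched directly from their definitions; the AP integration here uses a simple trapezoidal area under the precision-recall curve (real evaluators often use interpolated variants):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from the counts defined in the text."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve (trapezoidal).
    mAP is then the mean AP over the six ship classes."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * \
              (precisions[i] + precisions[i - 1]) / 2
    return ap
```

A model that keeps precision at 1.0 across all recall levels scores AP = 1.0, which is why mAP values near 1 indicate good overall performance.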

Model Training
By controlling the depth and width of the model, four models could be trained in groups to determine which was suitable for the detection of ships on the water. The four models (s, m, l, x) range from shallow to deep and from narrow to wide. The depth of the model is related to the number of residual components, and the width is related to the number of convolution kernels. The parameter settings are shown in Table 2. The results of group training are shown in Table 3. Although the YOLOv5s model performed slightly worse, the mAP values of the other three models were all around 98%. Each parameter of the 5x model fluctuated strongly in the first 50 epochs; it was judged that the model had great instability for the detection of small targets. The specific situation is shown in Figure 8. The abscissa in the two figures is the epoch, and the ordinate is the value of the loss and mAP@0.5.
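The depth/width control described above can be illustrated with the multipliers used in the published YOLOv5 model configs; the helper function is ours, and the channel rounding follows the usual make-divisible convention:

```python
import math

# depth_multiple scales the number of residual (CSP) blocks;
# width_multiple scales the number of convolution kernels (channels).
# Values follow the published YOLOv5 s/m/l/x configurations.
MULTIPLES = {"s": (0.33, 0.50), "m": (0.67, 0.75),
             "l": (1.00, 1.00), "x": (1.33, 1.25)}

def scale_model(base_depth, base_channels, variant):
    """Apply a variant's depth/width multipliers to a base layer spec."""
    depth_m, width_m = MULTIPLES[variant]
    n = max(round(base_depth * depth_m), 1)           # repeated blocks
    c = math.ceil(base_channels * width_m / 8) * 8    # channels, multiple of 8
    return n, c
```

For example, a base stage with 3 residual blocks and 64 channels becomes 1 block with 32 channels in the s model, which is why 5s is the fastest but the weakest on small targets.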

Among them, the 5l and 5m model detection times were too long and did not have good real-time performance; the 5s model had a short detection time, so it met real-time requirements. A reason for its poor accuracy may be that the model is not effective for small-target recognition and the output frame is biased. This study made improvements for this situation. Figure 9 shows the improved PR curve of the 5s model. It can be seen that the improved model achieved good recognition results for all types of ships, and the AP value for container ships reached 99.5%.
The confusion matrix is displayed in Figure 10. Each column represents a predicted category, with the totals in each column indicating the number of instances predicted as that category and how many of them truly belong to it; each row represents the true category of the data, with the row totals giving the number of instances of that category. The matrix shows good stability in detecting various types of ships. Figure 11 shows comparisons of the pictures before and after model detection.
Through the comparison pictures, it was found that the small passenger-ship target that was not originally recognized was detected after the improvement in Figure 11a, which improved the ability for small-target detection. The original algorithm in Figure 11b misidentified the distant shore as an ore ship; the improved algorithm corrected this misidentification and improved the confidence of the original algorithm for the ore ship.

Improved Model Result Analysis and Comparison
The original algorithm of Figure 11c output multiple sets of prediction boxes, predicting that the target object was a cargo ship, a container ship, or a bulk carrier, but the confidence was low. The improved algorithm improved this situation and made a correct prediction.
Based on the above situation, the algorithm's ability to detect small targets and various types of ships was significantly improved, and the error rate was reduced. Although the detection time increased by 2.2 ms, the mAP increased by 4.4% compared with the original algorithm, which indicates that the improved network can meet real-time and accuracy requirements and shows greater improvement compared to YOLOv2 and YOLOv3. The performance comparison is shown in Table 4.

Conclusions
Autonomous navigation of unmanned ships at sea is inseparable from accurate detection of maritime targets. The images returned by a camera combined with accurate image analysis techniques can provide powerful preconditions for the perception systems of unmanned ships.
This study analyzed four models obtained by adjusting the width and depth of YOLOv5. The results showed that the 5s model had a low accuracy rate, which may be due to insufficient detection capability for small targets. Therefore, improvements that retained its high detection speed were required. By performing K-means dimensional clustering on the target frames of the dataset, adopting mosaic enhancement and image scale transformation at the input end, adding a maximum pooling layer, and optimizing the loss function, the mAP of the improved YOLOv5s reached 98.6%, an increase of 4.4% compared to the original. This alleviated the problem of low detection accuracy for small targets, indicating that the proposed improved method has a better recognition effect and can provide a strong guarantee for the automatic driving of unmanned ships.
This research largely concerns the detection of several common ship types. Multiframe recognition of dynamic targets is the key to dynamic obstacle avoidance at sea. The next step in this research will be to analyze the correlations between the data to identify a variety of other types of targets through transfer learning, improving the generalization ability of the model. This article will provide information support for future research: (1) A combination of ultrasonic and Doppler radars for water and underwater detection combined with an onboard camera and drone shooting to achieve obstacle avoidance and real-time path planning; (2) Weather detection equipment used to transmit weather conditions in a rescue area in real-time; (3) Real-time water depth and velocity detection through echo sounders and ADCP to achieve water rescue in difficult conditions.