An Algorithm for Ship Detection in Complex Observation Scenarios Based on Mooring Buoys

Abstract: Marine mooring buoys, as fixed-point profile observation platforms, are highly susceptible to the threat of ship collisions. Installing cameras on buoys makes it possible to monitor ships and collect evidence effectively. However, images captured by a buoy-mounted camera are often degraded by the continuous shaking of the buoy and by rainy and foggy weather, resulting in problems such as motion blur and rain and fog occlusion. To address these problems, this paper proposes an improved YOLOv8 algorithm. Firstly, the polarized self-attention (PSA) mechanism is introduced to preserve the high-resolution features of the original deep convolutional neural network and counter the loss of spatial image resolution caused by shaking. Secondly, the multi-head self-attention (MHSA) mechanism is introduced in the neck network to weaken the interference of rain and fog backgrounds and improve the feature fusion ability of the network. Finally, in the head network, the model adds an additional small-object detection head to improve the accuracy of small-object detection. Additionally, to enhance the algorithm's adaptability to camera detection scenarios, this paper simulates shaking blur, rain, and foggy conditions. Extensive comparative experiments on a self-made dataset show that the proposed algorithm achieves 94.2% mAP50 and 73.2% mAP50:95 in various complex environments, outperforming other advanced object detection algorithms.


Introduction
A marine mooring buoy is a floating device on the sea surface that has traditionally been used to mark shipping lanes or to indicate potentially hazardous objects, such as coral reefs or underwater shipwrecks [1]. Since the 1920s, it has served as an important multifunctional platform carrying various types of observational equipment for collecting information on the marine environment, meteorological conditions, and navigational safety [2][3][4]. If an ocean buoy is damaged, it not only loses key data but may also collide with ships while drifting, causing secondary accidents [5,6]. Therefore, to reduce damage to marine buoys from ship collisions, the design of a buoy warning and forensic system is crucial.
Traditional buoy monitoring systems use CCTV monitoring or multisensor fusion technology to detect and warn ships. For example, Zheng et al. [7] used a CCTV video surveillance system to continuously collect image information for real-time vessel detection. Zhao et al. [8] designed a control system that combines geomagnetic detection and infrared sensing to detect ship targets around buoys. Chen et al. [9] used underwater acoustic signals to locate ships and provided different levels of warning based on the ship's position relative to the buoy. Hwang et al. [6] used AIS to determine the distance between ships and buoys and then photographed surrounding ships for evidence collection. Although these traditional methods can detect the presence of ships, it is difficult to recognize the type of ship directly at the buoy end; the pictures must be uploaded to the shore station system, where the ship type is subsequently identified.
As artificial intelligence technology develops, it is becoming more commonplace to process photos or videos with deep learning techniques to detect ship targets. However, ship detection at the buoy end has unique features compared with other object detection tasks:
1. Owing to the special application environment, the swaying of the buoy itself causes the ship to appear shaky and blurred in the camera. Key details of the ship are easily lost during the extraction of edges and textures, which degrades detection performance;
2. The marine environment is complex and variable, and cameras are often disturbed by weather such as rain and fog, which can occlude ship targets and pose significant challenges to accurate positioning and recognition;
3. The small ships that most frequently damage buoys tend to appear at smaller scales in the image, which may lead to missed detections. Ship detection algorithms therefore need stronger small-object detection capabilities to effectively identify these small ships.
Currently, deep-learning-based object detection methods are primarily categorized into two-stage and one-stage detection models. A two-stage detector works in two phases: it first generates candidate regions and then refines and classifies them to obtain the final detections. Such models usually achieve high accuracy but slow detection speed. R-CNN [10], Fast R-CNN [11] and Faster R-CNN [12] are representative two-stage models. Gu et al. [13] improved Faster R-CNN by combining scene semantic narrowing and topic narrowing sub-networks to address insufficient ship feature extraction and repeated detection. However, the two-stage approach is not suitable for deployment at the buoy end because of its inherently slower detection speed.
Conversely, one-stage detection models such as YOLO [14][15][16][17][18][19] and SSD [20] eliminate the candidate-box generation step, greatly simplifying the pipeline and reducing computational complexity. These models also perform well on small-object detection in complex environments, making them a suitable choice for ship detection at the buoy end. Zhang et al. [21] improved the YOLOv4 algorithm by introducing Swin Transformer to extract deep features, effectively improving ship detection accuracy, although the resulting model is complex and needs further simplification. Zheng et al. [22] proposed the MC-YOLOv5s algorithm, which uses MobileNetV3-Small as a lightweight feature extraction backbone to improve detection speed. Although these improvements raise detection speed and accuracy to some degree, further optimization is still needed.
In recent years, the attention mechanism has become increasingly important in the field of target detection. It improves the generalization ability of detection models by focusing on regions and features of interest. For example, Shang et al. [23] improved the YOLOv5 algorithm with the convolutional block attention module (CBAM) and coordinate attention (CA), raising maritime target detection accuracy by assigning higher weights to the tensor regions where a target is more likely to appear. Although this method effectively improves ship detection accuracy, it offers no solution for dense ship identification. In response, Wang et al. [24] optimized the YOLOv5 model by combining the CNeB2 module and the separated and enhanced attention module (SEAM), successfully solving the localization difficulties that arise when ships are densely packed. Zhao et al. [25] significantly enhanced performance on occluded and multi-scale targets by integrating the ECA module into YOLOX's backbone network and improving non-maximum suppression, but did not address recognition issues caused by rain, fog, and visual blurring. Si et al. [26] first used the K-means method to improve the clustering of ship data and then introduced the squeeze-and-excitation (SE) mechanism in the feature extraction network to enhance feature extraction. This effectively improves recognition of ships under sea fog conditions, but proposes no measures against shake blur. Wang et al. [27] proposed a feature fusion module incorporating a GT module, which effectively suppresses the noise introduced by shallow features, and introduced an SPD-Conv module to improve detection accuracy on low-resolution images. This method achieves significant results against the resolution reduction caused by shaking, but it is mainly suited to infrared images and thus has limitations for the visible-light images required in this study.

Although the above algorithms achieve high-precision ship detection in fixed surveillance environments, they fail to simultaneously handle shaky platforms and complex environmental conditions at the buoy end. In view of this, this article proposes an improved YOLOv8 object detection algorithm aimed at overcoming camera shake and rain and fog occlusion at the buoy end, thereby accurately identifying ship types there. The primary contributions of this study are as follows:
1. To address the blurring of ship images and the reduction in target spatial resolution caused by camera shake, this study redesigns the bottleneck structure of the C2f module in combination with the polarized self-attention mechanism, reducing blur-induced information loss through dual enhancement of the spatial and channel dimensions and ensuring more comprehensive extraction and fusion of ship features;
2. To handle rain and fog obscuring ship targets in bad sea conditions, this study introduces the multi-head self-attention module into the neck network. By introducing relative position encoding, this approach captures the relative position relationships between features, fully exploits their correlations, effectively weakens the interference of rain and fog, and significantly improves ship detection accuracy;
3. To address the common problem of low accuracy in small-vessel detection, this study combines an independently designed small-target detection head with a larger feature map, extracting richer information from the shallow feature map to enhance the detection accuracy of small ships.

Dataset Description
The dataset used in this study was collected from the Jiaozhou Bay area of Qingdao, China. A HIKVISION DS-2TD5167-50H4/W/GLT camera was used as the acquisition device, as shown in Figure 1. The device has a resolution of 2688 × 1520 (visible-light images) and captures 25 frames per second.
The captured videos were then analyzed and processed, resulting in 3732 images. These images cover six different types of vessels: container ships (CS), general cargo ships (GCS), fishing boats (FB), passenger ships (PS), small wooden boats (SWB) and sailboats (SB). Figure 2 shows the various ship images captured. After collation, however, it was found that the amount of data for some ship types was too small. To enrich the dataset, 162 images from the SMD [28] dataset and 313 images from the SeaShips [29] dataset were added as a supplement. The enriched dataset is named BaiLongOnBoardShips, hereinafter the BLOBS dataset, and contains a total of 4207 images.

After acquiring the images, the LabelImg annotation tool was used to annotate them and generate the corresponding XML files; the number of targets of each type is shown in Table 1. Each XML annotation file was then converted to a txt file. Finally, the dataset was randomly divided into a training set, validation set and test set in a ratio of 8:1:1.
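The random 8:1:1 split can be sketched as follows (the function name and fixed seed are illustrative choices, not details given in the paper):

```python
import random

def split_dataset(image_ids, seed=0):
    """Shuffle a list of image IDs and split it into train/val/test at 8:1:1."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)       # reproducible shuffle
    n = len(ids)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

For the 4207 BLOBS images this yields splits of 3365, 420 and 422 images, with the remainder assigned to the test set.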

Data Preprocessing
Since the camera is mounted on a buoy and experiences a variety of rough sea conditions, this study preprocesses the dataset before training to better simulate actual ship images and to improve the algorithm's generalization ability. Figure 3 shows a sample of the processed images; the specific preprocessing operations are as follows:
1. For the motion blur caused by camera shake, a blur kernel of size 50 × 50 is used for simulation, and the kernel is normalized to ensure that the total brightness of the processed image remains unchanged;
2. For the effect of rainy days, a noise layer is first generated on the input image to simulate raindrops, and the layer is then rotated and stretched to make the simulated raindrops look more realistic and dynamic. Finally, transparency blending is used to merge the simulated raindrop layer with the original image, enhancing the visual impression of a rainy day;
3. To simulate a real foggy day, fog is synthesized around a center point. Specifically, the fog center is placed at the center of the image, the Euclidean distance of each pixel from the fog center is computed, and an attenuation coefficient derived from this distance is multiplied with each pixel's luminance to simulate the natural attenuation of light passing through fog.
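The three degradations above can be sketched in NumPy as follows. The kernel orientation, rain density, streak length, blend ratio and fog constants are illustrative assumptions, and the fog step blends toward a bright "airlight" value (a common haze formulation) rather than reproducing the paper's exact constants:

```python
import numpy as np

def motion_blur(img, ksize=50):
    """Shake blur: a linear motion kernel normalized to sum to 1, so the
    total brightness of the processed image is unchanged."""
    k = np.zeros((ksize, ksize))
    k[ksize // 2, :] = 1.0                      # horizontal motion line
    k /= k.sum()                                # normalization preserves brightness
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(ksize):                     # direct, dependency-free convolution
        for dx in range(ksize):
            if k[dy, dx]:
                out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def add_rain(img, density=0.01, length=8, alpha=0.7, seed=0):
    """Rain: sparse bright noise stretched into streaks, then alpha-blended."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    drops = rng.random((h, w)) < density        # raindrop noise layer
    rain = np.zeros((h, w))
    for dy in range(length):                    # stretch each drop into a streak
        rain[dy:, :] = np.maximum(rain[dy:, :], 255.0 * drops[:h - dy, :])
    return alpha * img + (1.0 - alpha) * rain   # transparency blending

def add_fog(img, strength=0.05, airlight=200.0):
    """Fog: exponential attenuation with Euclidean distance from the centre."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.sqrt((yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2)   # distance to fog centre
    t = np.exp(-strength * d)                   # per-pixel attenuation coefficient
    return img * t + airlight * (1.0 - t)
```

Because the blur kernel sums to 1, a constant-brightness image passes through `motion_blur` unchanged, which is an easy sanity check for the normalization step.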

Overall Network Introduction
YOLOv8 [19], as an advanced target detection model, uses one-stage regression to quickly and efficiently locate and identify targets in images through convolutional operations, achieving a good balance between detection speed and accuracy. The original YOLOv8 network architecture is shown in Figure 4.

However, ship targets are often blurred and heavily occluded in images captured under camera shake and rainy or foggy conditions, and the existing YOLOv8 network structure cannot meet these higher detection demands. To address these problems, this paper takes YOLOv8n as the base model and optimizes it in terms of feature extraction, feature fusion and the small-target detection head.
Firstly, to help the model accurately understand the spatial information of ships, this study embeds the polarized self-attention mechanism into the C2f module, creating a new C2f_PSA module that retains the high-resolution information of the original deep convolutional neural network through polarization filtering. Furthermore, to enhance the model's ability to capture long-range structural information and further improve feature fusion, the MHSA mechanism is introduced into the neck network of YOLOv8, enlarging the receptive field of the network and significantly enhancing its feature expression capability. Finally, to further improve the detection of small ships in complex environments, shallow and deep feature maps are first concatenated to obtain richer feature information, and a smaller-scale detection head is then added. Figure 5 shows the YOLOv8-PMH network architecture proposed in this study, with the improvements highlighted in color.

Backbone

C2f_PSA Module
The key role of an attention mechanism is to let the algorithm focus on the most critical information. Channel attention mechanisms such as SENet [30], ECANet [31] and GCNet [32] work mainly by assigning different weights to different channels while treating all spatial locations identically. The channel attention mechanism therefore ignores differences in the spatial dimension, which may lead to insufficient utilization of information from different regions of the image and degrade the model's ability to identify ships in harsh sea conditions. Although the spatial attention mechanism can capture the location features of important areas well, its ability to distinguish color differences between ships and the sea surface is relatively limited.
Following spatial and channel attention, dual attention mechanisms that integrate both dimensions, such as DANet [33] and CBAM [34], have emerged to further improve model performance. Although these mechanisms combine channel and spatial information, they are sensitive to input noise. The polarized self-attention (PSA) mechanism [35] was therefore proposed to achieve high-quality pixel-wise regression. Compared with existing channel-spatial combinations, PSA does not prefer a specific layout and thus shows greater flexibility and adaptability. Following the principle of the PSA module, this paper fuses channel self-attention and spatial self-attention in parallel to obtain the PSA module shown in Figure 6.

To counter the potential loss of high-resolution information caused by downsampling in the original deep convolutional neural network, PSA attention maintains a dimension of C/2 in the channel branch and [H, W] in the spatial branch. This minimizes the information loss caused by blurring and ensures that the network can still capture ship information efficiently, even when bad sea conditions cause the buoy to rock violently and blur the imaging.
As illustrated in Figure 6, the PSA module first splits the processing of the input feature map X ∈ R^(C×H×W). In the channel branch, the input feature X is first transformed into two parts, q and v, by 1 × 1 convolutions. During this process, the channel dimension of q is compressed, while the channel dimension of v remains at a high C/2 level to retain high-resolution color information. The Softmax function is then applied to enhance the information of q. Next, q and v are matrix-multiplied, and the channel dimension of the result is raised from C/2 back to C by convolution and LayerNorm operations. Finally, the Sigmoid function keeps all parameters between 0 and 1, completing the precise adjustment of the weights. By enhancing the channel information, the model can accurately separate the ship from the sea surface based on color differences. The channel attention weights and the final output of the channel branch are shown in Equations (1) and (2):

A^ch(X) = F_SG[ W_z|θ1( σ_1(W_v(X)) × F_SM(σ_2(W_q(X))) ) ]    (1)

Z^ch = A^ch(X) ⊙^ch X    (2)

Here, W_q, W_v and W_z are the 1 × 1 convolutional layers, σ_1 and σ_2 are tensor reshaping operations, θ_1 is the intermediate parameter for the channel convolution, F_SM(·) is the Softmax operator, × is the matrix dot-product operation, ⊙^ch is the multiplication operator for the channel branch and F_SG(·) is the Sigmoid function.
In the spatial branch of the PSA module, unlike the channel branch, global average pooling is applied to the q feature to compress its spatial dimension to 1 × 1. Next, q and v undergo a matrix dot-product operation, followed by Reshape and Sigmoid operations. By enhancing the spatial branch, the high-resolution spatial information of the ship is well preserved even under shake blur. The calculation of the spatial branch is shown in Equations (3) and (4):

A^sp(X) = F_SG[ σ_3( F_SM(σ_1(F_GP(W_q(X)))) × σ_2(W_v(X)) ) ]    (3)

Z^sp = A^sp(X) ⊙^sp X    (4)

Here, σ_1, σ_2 and σ_3 are tensor reshaping operations, F_GP(·) is the global average pooling operation, and ⊙^sp is the multiplication operator for the spatial branch.
With the two branches connected in parallel, the final output of the PSA module is shown in Equation (5):

PSA_p(X) = Z^ch + Z^sp = A^ch(X) ⊙^ch X + A^sp(X) ⊙^sp X    (5)

This article uses the PSA attention mechanism to improve the C2f module, as shown in Figure 7. Embedding the polarized self-attention mechanism behind the original bottleneck structure forms a new Parallel Polarized module; the backbone network part adopts the residual structure, while the neck network part does not, and the C2f_PSA module is thus constructed. Through this improvement, the model preserves the high-resolution information of ship targets as much as possible, improving their precise localization.
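A minimal NumPy sketch of the parallel PSA layout described above. Modelling the 1 × 1 convolutions as matrix products over the channel axis, the stated weight shapes, and the omission of LayerNorm are simplifying assumptions, not the paper's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def psa_parallel(x, wq_ch, wv_ch, wz, wq_sp, wv_sp):
    """Toy parallel PSA on a (C, H, W) map. Expected weight shapes:
    wq_ch (1, C), wv_ch (C//2, C), wz (C, C//2), wq_sp and wv_sp (C//2, C)."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                      # (C, HW)

    # channel branch: q compressed to 1 channel, v kept at C/2
    q_ch = softmax(wq_ch @ flat, axis=-1)           # (1, HW), Softmax over space
    v_ch = wv_ch @ flat                             # (C/2, HW)
    a_ch = sigmoid(wz @ (v_ch @ q_ch.T))            # (C, 1) channel weights in (0, 1)
    z_ch = a_ch.reshape(C, 1, 1) * x                # channel-weighted features

    # spatial branch: q globally average-pooled to 1x1, v kept at [H, W]
    q_sp = softmax((wq_sp @ flat).mean(axis=1))     # (C/2,), pooled then Softmax
    v_sp = wv_sp @ flat                             # (C/2, HW)
    a_sp = sigmoid(q_sp @ v_sp).reshape(1, H, W)    # spatial weights in (0, 1)
    z_sp = a_sp * x                                 # spatially weighted features

    return z_ch + z_sp                              # parallel composition, Eq. (5)
```

Because both attention maps pass through a Sigmoid, each output element is the input scaled by a factor in (0, 2), which mirrors how PSA reweights rather than replaces features.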

MHSA Module
Transformer was initially used for natural language processing tasks and has since been widely adopted in other fields thanks to its powerful representation capability and global receptive field [36]; representative works include Vision Transformer (ViT) [37], data-efficient image transformers [38] and Swin Transformer [39]. The multi-head self-attention (MHSA) mechanism [40] introduced in this study is the core attention mechanism of the Transformer.
The MHSA mechanism assigns higher weights to key target features by analyzing the correlations between features, while assigning lower weights to irrelevant background features. This improves the feature fusion of the network and significantly reduces the interference of background factors such as rain and fog on ship detection, making the MHSA module well suited to ship recognition in rainy and foggy environments. Therefore, this article introduces the MHSA mechanism into the YOLOv8 ship detection model; the single-layer structure of the module is shown in Figure 8.

The MHSA module uses relative position encodings R_h and R_w to capture the relative position relationships between features. The input X has size H × W × d, where H and W represent the height and width of the feature matrix, and d represents the dimension of each label.
MHSA performs pointwise convolution on the input features to obtain the query encoding W_Q, key encoding W_K and value encoding W_V. Content information is obtained by multiplying the query and key matrices, and position information is obtained by multiplying the relative position encoding with the query matrix. The position and content information are then summed, a Softmax operation is performed, and the result is matrix-multiplied with the value encoding. The final output is shown in Equation (6):

Output = F_SM( q k^T + q r^T ) v    (6)

where q = W_Q(X), k = W_K(X), v = W_V(X) and r = R_h + R_w. The MHSA module takes into account not only content information but also the relative distances between features at different locations, enabling it to correlate cross-object information with location awareness. This design allows the MHSA module to fully exploit the correlations between features, improving the network's focus on ship targets. By computing multiple heads in parallel, the model better understands the environment around the ship and becomes less sensitive to disturbances such as rain and fog, improving its ability and robustness in perceiving complex scenes.
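Under the description above, a single-head slice of this attention can be sketched in NumPy. The function name, the absence of the multi-head split, and the omission of any scaling factor (which practical implementations often add as 1/sqrt(d)) are simplifying assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_2d(x, wq, wk, wv, r_h, r_w):
    """Single-head MHSA over a (H, W, d) feature map.
    wq/wk/wv: (d, d) pointwise (1x1 conv) weights;
    r_h: (H, d) and r_w: (W, d) relative position encodings."""
    H, W, d = x.shape
    q = x.reshape(-1, d) @ wq                  # queries, (HW, d)
    k = x.reshape(-1, d) @ wk                  # keys,    (HW, d)
    v = x.reshape(-1, d) @ wv                  # values,  (HW, d)
    r = (r_h[:, None, :] + r_w[None, :, :]).reshape(-1, d)   # position code, (HW, d)
    content = q @ k.T                          # content information
    position = q @ r.T                         # position information
    attn = softmax(content + position, axis=-1)  # Eq. (6): Softmax(qk^T + qr^T)
    return (attn @ v).reshape(H, W, d)         # weighted sum of values
```

With identical tokens and zero position encodings the attention is uniform, so the output reduces to the mean of the values, which is a quick way to check the Softmax wiring.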

Small Ship Target Detection Head
Because in practical situations it is usually fishing boats and small wooden boats that damage buoys, the self-made BLOBS dataset contains a large number of small vessels. When the original YOLOv8 algorithm is used for detection, its large downsampling factor causes fine-grained information to be lost as the number of network layers increases and the receptive field expands. In this case, information about small-scale targets is spatially aggregated into a single point, which reduces the accuracy of the model during detection.
To address these issues, this study adds a detection layer for smaller targets to the head of YOLOv8. By upsampling the 80 × 80 feature map, the set of detection feature maps is extended from 80 × 80, 40 × 40 and 20 × 20 to 160 × 160, 80 × 80, 40 × 40 and 20 × 20, and the number of detection heads is increased from three to four. These modifications exploit the relatively complete feature information in shallow feature maps and effectively reduce the network's receptive field, making the network pay more attention to micro-features such as the shape and texture of small ships, thereby improving the recognition of small ships under adverse sea conditions.
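The relationship between head grid size and receptive stride can be sketched as follows. This is an illustrative calculation, not the authors' code; it assumes the common 640 × 640 YOLOv8 input, under which the extra 160 × 160 head corresponds to a stride of 4, so each grid cell covers only a 4 × 4 pixel patch and small-ship details are less likely to collapse into one cell.

```python
# Sketch: feature-map sizes of the detection heads for a 640x640 input,
# before and after adding the extra high-resolution head.
def head_grids(input_size, strides):
    """Return (grid_size, stride) pairs for each detection head."""
    return [(input_size // s, s) for s in strides]

original = head_grids(640, [8, 16, 32])      # three heads: 80, 40, 20
extended = head_grids(640, [4, 8, 16, 32])   # four heads: 160, 80, 40, 20

for grid, stride in extended:
    print(f"{grid}x{grid} grid, stride {stride}: "
          f"each cell covers a {stride}x{stride} pixel patch")
```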

Evaluation Metrics
In order to comprehensively evaluate the performance of the YOLOv8-PMH algorithm in visible-image ship detection, precision, recall, average precision (AP) and mAP are selected as evaluation metrics in this study, where the "m" in mAP denotes the average of the AP values over all categories. In addition, mAP50 denotes the mAP value at an IoU threshold of 0.5, while mAP50:95 denotes the mAP value averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. These two metrics are the key indexes for evaluating model performance; a higher mAP implies a better detection effect. The evaluation indexes are expressed as follows:

Precision = TP / (TP + FP), (7)

Recall = TP / (TP + FN), (8)

where TP represents the number of correctly predicted positive samples, i.e., ship targets that the YOLOv8-PMH algorithm correctly detects and locates; FP refers to an object that is actually a negative sample but that the network incorrectly predicts as positive; and FN refers to an object that is actually a positive sample but that the network incorrectly predicts as negative.
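The metric definitions above can be made concrete with a short sketch. The TP/FP/FN counts and the per-threshold AP values below are made-up numbers for illustration only, not results from the paper.

```python
# Illustrative sketch of the evaluation metrics (not the authors' code).
def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)

def map50_95(ap_per_threshold):
    """Mean of the AP values at IoU = 0.50, 0.55, ..., 0.95 (10 values)."""
    assert len(ap_per_threshold) == 10
    return sum(ap_per_threshold) / len(ap_per_threshold)

tp, fp, fn = 90, 10, 15  # hypothetical detection counts
print(f"precision = {precision(tp, fp):.2f}")  # 0.90
print(f"recall    = {recall(tp, fn):.3f}")     # 0.857
```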

Experimental Platform
All experimental code in this study was written in Python using the PyTorch neural network framework, with Python 3.8.10, torch 1.11.1 and CUDA 11.3. An NVIDIA GeForce RTX 3080 graphics card and 50 GB of RAM were used as the computational unit. Under these experimental conditions, the network model was trained; the specific training parameters are shown in Table 2.

Ablation Experiment
In order to verify the effectiveness of the improved method in identifying ship targets under complex sea conditions, seven ablation experiments were conducted on the BLOBS dataset using YOLOv8n as the baseline, analyzing the impact of the different modules on the YOLOv8-PMH algorithm. The comparison results of the ablation experiments are shown in Table 3.
As Table 3 shows, compared with the original YOLOv8 target detection algorithm, adding the PSA mechanism to the network architecture improves mAP50 from 92.2% to 93.1% and mAP50:95 from 70.4% to 71.6%. These gains demonstrate the enhancement brought by the PSA mechanism in both the spatial and channel dimensions, which ensures that the system can still effectively recognize ship information even when the spatial resolution of the image is reduced by shaking. Furthermore, after introducing the MHSA mechanism and the small-object detection head, the experimental results not only show stable performance in detecting large ships but also significantly improved detection accuracy for small ships. The final comprehensive experimental data show that the overall mAP50 increases by 2% and mAP50:95 by 2.8%, which proves that the improvement strategy proposed in this paper delivers measurable gains in detection performance.
In addition, to show the effect of the improved model more intuitively, Figure 9 presents visualization examples of the original YOLOv8 target detection algorithm and the proposed YOLOv8-PMH algorithm. Comparing the images, it can be seen that the reduction in ship spatial resolution and the partial occlusion caused by the shaking of the buoy significantly reduce the recognition accuracy of the YOLOv8 object detection algorithm. The YOLOv8-PMH algorithm handles this situation much better; even in images with low clarity or occlusion, it maintains high detection accuracy. In rainy and foggy weather, especially on foggy days with low visibility, the detail features of small target vessels are faint and difficult to extract, and in practice it is precisely small wooden boats, fishing boats and other small vessels that damage buoys. The original YOLOv8 algorithm predicts the size and position of such targets poorly. In contrast, the YOLOv8-PMH algorithm proposed in this study possesses better feature extraction capabilities and detects all types of vessels more reliably; the improved algorithm therefore enables better evidence collection for such small vessels.
The PR curve takes the precision and recall of the model as its coordinate axes. Because precision and recall constrain each other, the comprehensive performance of a model can be assessed by the area enclosed by its PR curve. In Figure 10, the PR curve of the original YOLOv8 model is shown on the left and that of the YOLOv8-PMH model on the right. By comparison, it is evident that the PR curve of the YOLOv8-PMH algorithm encloses a larger area; in particular, the green curve for the fishing-boat class lies clearly closer to the upper-right corner. This result intuitively demonstrates that the YOLOv8-PMH algorithm significantly improves the detection results.
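The enclosed area of a PR curve is exactly the AP metric. A minimal sketch of how that area can be computed from a set of (recall, precision) points follows; the interpolation scheme (the monotone precision envelope used in COCO-style evaluation) and the sample points are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as the area under the precision envelope of the PR curve."""
    r = np.concatenate([[0.0], recalls, [1.0]])
    p = np.concatenate([[0.0], precisions, [0.0]])
    # make precision monotonically non-increasing (the envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas wherever recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# hypothetical PR points for one class
recalls = np.array([0.2, 0.4, 0.6, 0.8])
precisions = np.array([1.0, 0.9, 0.8, 0.6])
print(f"AP = {average_precision(recalls, precisions):.3f}")  # AP = 0.660
```

A curve hugging the upper-right corner, like the fishing-boat curve in Figure 10, drives this area toward 1.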

Performance Comparison of Multiple Models
In order to fully validate the effectiveness of the improved algorithm, this study conducted several sets of comparative experiments, comparing its performance with a series of strong target detection algorithms, including Faster R-CNN (ResNet50), YOLOv3-tiny, YOLOv4-tiny, YOLOv5n, YOLOv6n and YOLOv8n. The experimental results are shown in Table 4. Based on these comparisons, the YOLOv8-PMH algorithm proposed in this study achieves 94.2% mAP50 and 73.2% mAP50:95, the best overall detection performance on the self-made dataset. When detecting large ships, such as container ships and regular cargo ships, every model performs quite well. However, for smaller vessels such as fishing boats, passenger ships and sailboats, which are more likely to damage buoys, the YOLOv8-PMH algorithm exhibits superior performance.
To highlight the superiority of the proposed algorithm more intuitively, Figure 11 shows visual examples of the YOLOv8-PMH algorithm and other existing algorithms. Comparing the images, YOLOv3-tiny and YOLOv5n perform slightly worse than the other object detection algorithms on images whose resolution is reduced by blur. Faster R-CNN performs well on large targets but poorly on small targets. In the comprehensive analysis, the YOLOv8-PMH algorithm proposed in this paper accurately localizes and identifies the targets in all test scenarios, demonstrating excellent detection performance, and is well suited to the ship warning task at the buoy end.
To further demonstrate the performance of the algorithm on different datasets, the public SeaShips dataset was processed and evaluated with the same method. The experimental results are shown in Table 5 and indicate that the improved algorithm still maintains good detection performance on the public dataset.

Conclusions
This study aims to improve the ship detection capability of ocean buoys in complex weather conditions, in order to provide technical support for subsequent warning and evidence collection at the buoy end. To achieve this goal, this paper proposes an improved YOLOv8 target detection algorithm by simulating the actual working conditions of the buoy platform, which stays in a wet environment at sea for long periods and sways continuously. In the image preprocessing stage, a 50 × 50 blurring kernel is used to simulate the visual blur caused by buoy swaying, and Gaussian noise and center-point synthetic fog are used to simulate rainy and foggy environments. This study then compared the target detection accuracy of the original algorithm and the YOLOv8-PMH algorithm, with the following results:
1. The YOLOv8-PMH algorithm improved with PSA attention maintains C/2 in the channel dimension and [H, W] in the spatial dimension, a mechanism that effectively preserves the high-resolution information in the original deep convolutional neural network and significantly reduces the information loss caused by camera-shake blur. In the ship target recognition task in particular, the C2f module with integrated PSA attention improves the algorithm's mAP50 by 0.9% and its mAP50:95 by 1.2%.
2. The MHSA attention mechanism, based on the Transformer architecture, effectively reduces rain and fog background interference and enhances feature fusion. After introducing this attention into the neck module, the algorithm achieves a 0.7% improvement in mAP50 and a 0.9% improvement in mAP50:95.
3. The newly designed small-ship detection head significantly improves feature extraction for small ships without affecting detection performance for large ship targets. With this improvement, the algorithm's mAP50 rises by 0.8% and its mAP50:95 by 0.9%.
4. After integrating all of the above improvements, compared with the original YOLOv8 algorithm, the mAP50 of the YOLOv8-PMH algorithm increases by 2% and the mAP50:95 by 2.8%.

Figure 2 .
Figure 2. Example of image acquisition of various types of ships.


Figure 3 .
Figure 3. Preprocessing image simulation under different sea conditions. (a) Visual blur processing; (b) simulation of rain impacts; (c) center point synthetic fog treatment.


Figure 9 .
Figure 9. Comparison of detection results in different situations. (a) Fuzzy detection results; (b) rainy day detection results; (c) fog detection results.


Figure 11 .
Figure 11. Comparison of experimental results of YOLOv8-PMH with other algorithms. (a) Fuzzy detection results; (b) rainy day detection results; (c) fog detection results.


Table 1 .
Number of vessels of each type.


Table 2 .
Configuration table for dataset training parameters.

Table 4 .
Comparison of ship detection results between YOLOv8-PMH and other networks.


Table 5 .
Comparison of detection accuracy in SeaShips dataset.