1. Introduction
In recent years, with increasingly frequent maritime political, trade, and cultural exchanges, ships have played an important role as carriers of material and cultural communication. To ensure safe marine environments, smooth traffic, and efficient operation, real-time detection of ship targets has long attracted wide attention. The marine environment is complex and changeable: adverse natural conditions, such as rain, snow, and thunderstorms, make the interpretation of ship targets difficult. In addition, complex island backgrounds, such as intricately distributed islands and reefs, ports and terminals, and maritime facilities and equipment, cause occlusion and overlap interference around ships and reduce image detection accuracy. Moreover, ships at sea vary widely in type and size; small, multiple dynamic targets in particular are both important and difficult targets for sea-surface monitoring, early warning, and tracking.
There are many traditional methods for detecting maritime vessel targets, such as the constant false alarm rate (CFAR) method for SAR images [
1], edge detection in ship infrared images based on an adaptive Canny operator [
2], and wavelet-transform-based detection methods for ordinary optical ship images. SAR-based ship detection exploits the scattering mechanism, using the different beam returns of the ship versus the background or other targets to detect ships. Infrared ship detection has good penetration, is unaffected by ambient brightness, and can interpret ship target information via fuzzy mathematics and spectral classification. Detection based on common optical images can rapidly detect multiple targets in a small sea area with high accuracy, using fuzzy analysis, wavelet transforms, gray-histogram segmentation, and so on. However, traditional sea-surface ship detection methods mostly require target features to be set by hand, which entails a large workload, makes feature design difficult, and leaves detection vulnerable to weather conditions, the marine environment, sea clutter, complex electromagnetic interference, and other factors. SAR images are highly sensitive to polarization mode, weather, wind, etc.; infrared detection is easily disturbed by non-uniform quantization noise and suffers from poor image clarity; and visible-light detection is sensitive to target shape, weather conditions, and whether the target's appearance is fully visible.
The complex marine environment involves not only harsh natural conditions but also a complex electromagnetic environment, formed by interweaving, overlapping radiation sources across frequency bands (navigation signals, communication signals, radio broadcasting, and television signals), which limits target detection methods based on infrared and radar images and makes accurate detection difficult. Occlusion, overlap, and complex island backgrounds all strongly interfere with target detection. Widely distributed long-range small targets occupy few pixels in the image, which further increases processing difficulty. The 21st century has brought rapid development of related technologies in artificial intelligence: deep learning-based target detection networks have gradually matured, offer high-performance detection useful for accurate detection and identification, have been continuously improved since their inception, and have reached detection accuracy comparable to the human eye. Deep learning-based target detection algorithms are divided into two-stage and one-stage algorithms; two-stage algorithms achieve higher detection accuracy, but their detection time is long, making real-time operation difficult.
One-stage object detection networks simplify detection into an end-to-end processing problem, which greatly improves detection efficiency, though their accuracy is slightly lower than that of two-stage networks. Scholars in many fields have carried out long-term research on improving one-stage detection networks. Wang et al. [
3] introduced the CFE (Comprehensive Feature Enhancement) module in CFENet [
4] in the YOLOv3 network and improved the detection performance of small and medium-sized ship objects in visible images; Yang et al. [
5] proposed an anchor-free ship detector based on rotated bounding boxes; Gu Jiaojiao et al. [
6] introduced a multi-scale feature fusion module into a Faster R-CNN network to improve its detection performance; Ma Z. F. [
7] proposed a multi-channel SAR image fusion processing method, which improves the detection accuracy of the YOLOv4 network.
The YOLO series of one-stage target detection algorithms offers high detection accuracy, fast speed, and robustness to interference, especially for small targets. Subsequent improved networks have made the model more streamlined, greatly improving training and detection speed [
8].
The YOLOv11 target detection network, proposed in 2024, better balances detection efficiency and accuracy; both have improved by leaps and bounds compared with previous target detection methods, with outstanding performance on a number of visual processing tasks. In addition, being based on visible images, the YOLOv11 target detection algorithm is unaffected by complex electromagnetic interference, can more accurately interpret occluded targets and small targets, and can more accurately distinguish multiple types of ships. The YOLOv11 algorithm is also more lightweight and suitable for hardware-constrained scenarios; in the future, the network can be migrated to embedded platforms and carried on mobile platforms, such as drones, to realize real-time detection of marine ship information. To further improve accurate detection of multiple types of dynamic targets in complex marine environments, especially under occlusion, overlap, small-target, and dynamic conditions; to improve detection efficiency; and to respect the equipment limitations of UAVs and similar platforms, this paper proposes an improved YOLOv11 ship target detection network and carries out a study based on visible-light images, avoiding the impact of complex electromagnetic and environmental interference on infrared/radar image acquisition equipment at sea. The main work is as follows:
The original YOLOv11 backbone network is replaced with the improved EfficientNetv2 network, and the coordinate attention (CA) mechanism is introduced to improve ship feature learning under complex sea conditions.
To reduce missed and false detections of densely distributed ship targets interfered with by moving objects at sea, the algorithm incorporates the ConvNeXt Block idea into the neck feature pyramid fusion process, allowing targeted segmentation of target feature regions in 2D space. This reduces the influence of high-level semantic noise during context semantic fusion and helps the model extract features more conducive to ship classification.
Since the traditional IoU distance metric is affected by the limited number of pixels and may adversely affect the detection of small ship targets, the WIoU loss function is introduced into the model. This loss function compensates for the small pixel count of small targets during regression loss calculation and thus improves small-target detection performance.
A self-built visible-light ship image dataset was created, covering a variety of complex backgrounds, occlusion and overlap, small-target scenes, ship targets at multiple scales, and images from different angles, to enhance network adaptability.
3. Improved YOLOv11 Dynamic Multi-Type Ship Target Detection Algorithm
To improve accurate and efficient detection of diverse ship targets in complex scenes, this paper focuses on strengthening the backbone network, improving the neck network, and optimizing the loss function to enhance detection performance based on visible-light images.
3.1. EfficientNetv2 + CA-ECNet
Influenced by complex sea conditions, severe weather, and the performance of image acquisition equipment, collected images of ships sailing at sea are prone to uneven lighting and uneven imaging, so target-feature extraction by the YOLOv11 backbone network needs further strengthening. The EfficientNetv2 model integrates a series of new network architecture designs and training strategies that improve both detection performance and training efficiency. In addition, the ECNet network is constructed on top of EfficientNetv2 to further improve the model's ability to learn the characteristics of multiple types of target ships in complex environments.
3.1.1. EfficientNetv2
EfficientNetv2 [
12] is a deep learning model released by Mingxing Tan and Quoc V. Le in 2021 and is the latest version of the EfficientNet series. While maintaining model performance, it trains faster, runs more efficiently, and has a smaller model size. EfficientNetv2 is a significant advance in deep learning with important contributions to real-world applications. The network performance comparison is shown in
Figure 11.
EfficientNetv2 uses a new search method, training-aware NAS, which accounts for the model's training speed during the search rather than evaluating performance only at the end, as other networks do, thereby balancing detection accuracy, training speed, and parameter count to achieve higher accuracy and faster training. The network also tightens the search space by removing unnecessary search items, such as pooling skips, and by focusing on channel sizes.
EfficientNetv2 introduces the Fused-MBConv structure in the shallow layers. Optimized from MBConv, it fuses the expansion convolution and the depthwise convolution, replacing them with a single 3 × 3 standard convolutional layer; this simplifies the network architecture and reduces computing cost, especially in the training phase, which matters for deep learning applications in resource-constrained scenarios. The module structure is shown in
Figure 12.
To better exploit the performance of Fused-MBConv, EfficientNetv2 optimizes the scaling strategy based on the idea of compound scaling, allowing the model to better balance speed and performance. EfficientNetv2 also employs progressive learning, a strategy that accelerates convergence, avoids premature saturation, and improves model performance.
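As a concrete illustration, the Fused-MBConv block described above can be sketched in PyTorch. This is a minimal sketch under stated assumptions (the SE sub-module and stochastic depth of the full EfficientNetv2 block are omitted, and the expansion ratio is a tunable assumption), not the exact implementation used in this paper:

```python
import torch
import torch.nn as nn

class FusedMBConv(nn.Module):
    """Sketch of EfficientNetv2's Fused-MBConv block: the 1x1 expansion
    conv and 3x3 depthwise conv of MBConv are fused into a single 3x3
    standard conv, followed by a 1x1 projection conv."""
    def __init__(self, in_ch, out_ch, expand_ratio=4, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.fused = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride, 1, bias=False),  # fused 3x3 conv
            nn.BatchNorm2d(mid),
            nn.SiLU(),
        )
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, bias=False),  # 1x1 projection conv
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.project(self.fused(x))
        # shortcut only when spatial size and channel count are preserved
        return x + y if self.use_residual else y
```

Because the fused 3 × 3 convolution is a single dense operation, it maps better onto modern accelerators than the expand-then-depthwise pair, which is where the training-speed gain comes from.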
3.1.2. CA Mechanism
In this paper, the EfficientNetv2 network is used as the backbone; it loses some target feature information while generating feature maps at different resolutions. When the network processes feature maps of different receptive fields through convolution operations, it obtains effective target features but also produces interference information and accuracy loss. To improve detection accuracy under limited resources, more resources, namely weight in the neural network, must be allocated to the important target features. An attention mechanism enables the convolutional neural network to extract effective features and discard useless ones, greatly improving the efficiency and accuracy of image processing. Commonly used attention mechanisms include SE, CBAM, and ECA. When computing channel attention, these mechanisms mostly use average pooling or global max pooling, which loses part of the target's spatial information. For ship target detection in complex marine environments, spatial information is particularly important for accurately judging target position and size.
Coordinate attention (CA) [
13] addresses the limited ability of traditional mechanisms to process spatial information without increasing model complexity. It integrates position information into channel attention and computes attention along both the height and width dimensions, accurately capturing the relationships between channels and the spatial distribution of features and more effectively modeling the long-range dependencies on features important to the visual task, which improves the accuracy of feature representation. The mechanism can be flexibly inserted into typical mobile networks, such as MobileNetv2, MobileNet, and EfficientNet, with essentially no additional computational overhead.
The performance of different attention mechanisms is compared in the same backbone network (MobileNetV2), as shown in
Figure 13.
The structure of the CA mechanism is shown in
Figure 14.
The input feature map has dimensions C × H × W, where C is the number of channels and H and W are the height and width. The mechanism applies spatial pooling to encode direction-aware information along the height and width, followed by feature fusion and transformation, batch normalization and nonlinear activation, splitting and 2D convolution, and finally sigmoid activation to recalibrate the weights of the original features; it then outputs a C × H × W feature map that contains both channel information and spatial location information.
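The sequence of steps above can be sketched in PyTorch roughly as follows. This is a minimal sketch based on the published CA design; the reduction ratio `r` and the Hardswish activation are assumptions, not values taken from this paper:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Sketch of coordinate attention: pool along each spatial axis,
    fuse and transform, then split into height- and width-wise weights."""
    def __init__(self, channels, r=16):
        super().__init__()
        mid = max(8, channels // r)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B,C,H,1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B,C,1,W)
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                        # (B,C,H,1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)    # (B,C,W,1)
        # feature fusion + transform + BN + nonlinear activation
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)     # split back into two axes
        ah = torch.sigmoid(self.conv_h(yh))                      # (B,C,H,1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (B,C,1,W)
        return x * ah * aw  # recalibrate with positional channel weights
```

The two 1D pooled vectors preserve position along one axis each, which is why the resulting weights encode spatial location as well as channel importance.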
3.2. Improving the Neck Network (ConvNeXt Block Idea)
The marine environment is complex: widely distributed islands and reefs, marine structures, and smoke screens and clouds all partially occlude ship targets. In densely packed areas, such as ports and terminals, mutual occlusion between ships further complicates detection. The YOLOv11 network is susceptible to occlusion interference when extracting ship image features, causing the model to focus only on local pixel positions. Effectively exploiting context to capture target information would require stacking convolutional layers several times, but directly stacking these layers repeatedly leads to computational inefficiency and makes the model difficult to optimize. In this paper, a new CCB (C3k2 ConvNeXt Block) module is constructed using spanning information connections to avoid this problem. The CCB module consists of the C3k2 module and the ConvNeXt Block module, which together significantly improve the model's ability to capture contextual information.
ConvNeXt [
14] is a computer vision model released by Meta AI researchers in 2022. It explores the potential of CNNs (convolutional neural networks) for image recognition, particularly in comparison with mainstream models such as the Vision Transformer (ViT). The ConvNeXt structure draws on design concepts from the Transformer and can outperform the Transformer while preserving model efficiency.
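For reference, the basic ConvNeXt block borrowed by the CCB module can be sketched in PyTorch as follows. This is a minimal sketch of the published design, not this paper's full CCB module; the layer-scale initial value is an assumption:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Sketch of a ConvNeXt block: 7x7 depthwise conv -> LayerNorm ->
    pointwise expansion (4x) -> GELU -> pointwise projection, with a
    layer-scale parameter and a residual shortcut."""
    def __init__(self, dim, ls_init=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)      # applied channels-last
        self.pw1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(ls_init * torch.ones(dim))

    def forward(self, x):
        y = self.dwconv(x).permute(0, 2, 3, 1)   # (B,H,W,C) channels-last
        y = self.pw2(self.act(self.pw1(self.norm(y))))
        y = (self.gamma * y).permute(0, 3, 1, 2)
        return x + y                             # residual connection
```

The large 7 × 7 depthwise kernel gives each position a wide receptive field cheaply, and the residual shortcut is the "spanning information connection" that lets context propagate without naively stacking convolutional layers.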
3.3. WIoU Loss Function
YOLOv11 uses the EIoU loss function, which replaces the aspect ratio with direct regression on the width and height values. This has several drawbacks. First, the real box and the predicted box may have the same overlap area but different widths and heights, leading to different loss values. Second, it does not effectively balance hard and easy samples, so optimization may overemphasize simple samples and neglect the learning of hard ones. Third, it does not consider the angle of the detection box: if the angle differs from the real box, localization accuracy still suffers even when the overlap area is the same.
WIOU (Wise-IoU) [
15] is a new loss function proposed to improve the bounding-box regression (BBR) loss. It evaluates anchor-box quality using a dynamic non-monotonic focusing mechanism (FM) with gradient gain, which improves the overall performance of the ship detection model by reducing the effect of harmful gradients while preserving the contribution of high-quality anchor boxes.
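A minimal PyTorch sketch of the WIoU v1 formulation is given below for illustration. Boxes are assumed to be in (x1, y1, x2, y2) form, and the dynamic non-monotonic focusing mechanism of the full WIoU v3 is omitted; this is not the exact implementation used in the experiments:

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """WIoU v1 sketch: the plain IoU loss (1 - IoU) is re-weighted by a
    distance-based term R_WIoU computed over the smallest enclosing box;
    the denominator is detached so it contributes no gradient itself."""
    # intersection area
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box width/height
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # distance between box centers
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2
    r_wiou = torch.exp((dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps).detach())
    return r_wiou * (1 - iou)
```

For small targets the enclosing-box term normalizes the center distance, so a few pixels of misalignment are penalized in proportion to target size rather than in absolute pixels.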
In
Figure 16, the overlapping area is as follows:
The improved overall structure diagram of YOLOv11 is shown in
Figure 17.
The algorithm pseudocode is shown in Algorithm 1.
Algorithm 1. Improved YOLOv11 model algorithm.
begin
    // Define the input image size and number of categories
    input_size = (640, 640)
    num_classes = 6
    // Define the sizes of the anchor boxes
    anchors = []
    num_anchors = len(anchors)
    function YOLOv11_improve_detector(input):
        // backbone
        x = Conv(input, 64, 3, 2)
        x = Conv(x, 128, 3, 2)
        x = MBConvBlock_CA(x, 256, 3, 1, 6)
        x = Conv(x, 256, 3, 2)
        x = MBConvBlock_CA(x, 512, 3, 1, 6)
        out1 = x
        x = Conv(x, 512, 3, 2)
        x = C3k2(x, 512, 3, 2)
        out2 = x
        x = Conv(x, 1024, 3, 2)
        x = C3k2(x, 1024, 3, 2)
        x = SPPF(x, 1024, 5)
        x = C2PSA(x, 1024)
        out3 = x
        // neck/head
        x = Upsample(None, 2)
        x = Concat(1)
        x = CCB(512)
        x = PANetFPN(x)
        // output
        output1 = Conv(out1, num_anchors × (num_classes + 5), 1)
        output2 = Conv(out2, num_anchors × (num_classes + 5), 1)
        output3 = Conv(out3, num_anchors × (num_classes + 5), 1)
        return output1, output2, output3
    // Traverse each frame; T is the mp4 file timestamp array
    for t in T do:
        img = frame at timestamp t
        if Judgment_score(YOLOv11_improve_detector(img)) > 0.25 then draw box
    end for
end
4. Experimental Results and Comparative Analysis
4.1. Dataset Construction
To better match the actual marine environment and the viewing angle of image acquisition equipment, and to verify the dynamic detection performance of the improved YOLOv11 algorithm on ship targets against complex island backgrounds, lateral-view visible-light ship images containing multi-angle and multi-scale information were collected through existing dataset selection and web search. The self-built visible-light ship image dataset contains many types of ships and includes occlusion-overlap scenes, complex nearshore island and reef backgrounds, and small-target images, used to train and verify the network's detection performance in the marine environment.
In the dataset, ship types were divided into engineering ships, cargo ships, passenger ships, sailboats, and fishing boats, with a total of 6846 images. Sample images from the dataset are shown in
Figure 18, and the structure is shown in
Table 2.
The dataset was constructed in YOLOv11 format; all images are in .jpg format and are named sequentially from 0000001. The targets in each picture were marked with an annotation tool, using the rectangular box that best fits the outer contour of the ship. Targets with incomplete contours caused by occlusion, overlap, or other reasons were also annotated in detail, ensuring that the network fully learns and trains on targets of different types, shapes, and sizes.
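For illustration, converting a pixel-space annotation box into the normalized label-line format used by YOLO-style datasets can be done as follows. `to_yolo_label` is a hypothetical helper, not part of the paper's annotation toolchain:

```python
def to_yolo_label(cls_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel bounding box (x1, y1, x2, y2) into a YOLO label
    line: class index followed by normalized box center and size."""
    cx = (x1 + x2) / 2 / img_w   # normalized center x
    cy = (y1 + y2) / 2 / img_h   # normalized center y
    w = (x2 - x1) / img_w        # normalized width
    h = (y2 - y1) / img_h        # normalized height
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

Because all coordinates are normalized by image size, the same label remains valid when images are rescaled to the 640 × 640 network input.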
4.2. Experimental Environment
This experiment was completed on the Ubuntu 18.04 operating system using the PyTorch 1.10 deep learning framework in a CUDA 11.1 + cuDNN 8.0.4 + OpenCV 4.5.8 environment, implemented in the Python language. Hardware environment: Intel(R) Xeon(R) Platinum i9-13900K CPU and Nvidia GeForce RTX 4090 GPU. See
Table 3 for details.
4.3. Setting the Experimental Parameters
The input image size in this experiment was 640 × 640, the initial learning rate was 0.01, the learning rate was updated using stochastic gradient descent with a momentum of 0.937, and the weight decay was 0.0005. During training, Mosaic data augmentation read four pictures at a time and then flipped and zoomed them to enrich the detection backgrounds. Label smoothing was set to 0.01 to prevent model overfitting and increase generalization. All models were trained for 500 epochs, with the batch size set to 32 and the number of worker threads to 16.
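The Mosaic step described above can be illustrated with a minimal NumPy sketch that stitches four images into one canvas. `mosaic4` is a hypothetical helper, and the random flip/zoom jitter of full Mosaic augmentation is omitted:

```python
import numpy as np

def mosaic4(imgs, out_size=640):
    """Stitch four HxWx3 uint8 images into a 2x2 mosaic canvas.
    Each image is resized to a quadrant by nearest-neighbour sampling."""
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # (row, col)
    for img, (y, x) in zip(imgs, corners):
        h, w = img.shape[:2]
        ys = np.arange(half) * h // half   # row indices for resize
        xs = np.arange(half) * w // half   # column indices for resize
        canvas[y:y + half, x:x + half] = img[ys][:, xs]
    return canvas
```

Combining four images per sample effectively quadruples the background variety seen per batch, which is the stated motivation for using Mosaic here.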
4.4. Evaluation Indicators
(1) Precision
Precision is the accuracy rate, that is, the proportion of samples predicted as a given class that truly belong to that class.
(2) Recall
Recall, also known as the detection rate, is the proportion of ground-truth targets that are correctly detected as positive, out of the total number of ground-truth targets.
(3) AP (Average precision)
AP is the area under the precision–recall (PR) curve.
(4) mAP (Mean Average Precision)
The mAP is the average of the AP values across all target classes.
(5) Identification speed (FPS)
The FPS is the number of images identified per second, in units of frames/s.
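Under these definitions, the metrics can be sketched as simple helper functions. These are illustrative only (the paper's reported results come from the standard YOLO evaluation tooling), and the trapezoidal AP approximation is an assumption:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, and
    false negatives, matching the definitions above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(precisions, recalls):
    """AP as the area under the PR curve, approximated by the trapezoidal
    rule over (recall, precision) points sorted by recall. mAP is then
    the mean of the per-class APs."""
    pts = sorted(zip(recalls, precisions))
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2
    return ap
```

For example, 8 correct detections with 2 false alarms and 2 missed targets give precision and recall of 0.8 each.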
4.5. Analysis of the Experimental Results
4.5.1. Analysis of the Results of Improved Network Experiments
The improved dynamic ship target detection network was trained on the self-built visible-light ship target dataset, with a training/test split of 8:2. Mosaic data augmentation was used during training, and after 500 epochs, the network gradually converged and stabilized. The network loss curves are shown in
Figure 19.
As the curves show, the loss value is large at the beginning of training, then decreases sharply, and then levels off. The box_loss and dfl_loss curves for the test set gradually flatten after about 50 epochs, and the cls_loss curve after 75 epochs. In general, the test-set loss curves level off earlier than the training-set curves, while fluctuating slightly, but not significantly, more than the training set. This is because the test-set pictures do not appear in the training set and may differ somewhat, which is normal. After 500 epochs of training, the loss curves decline and flatten, and the loss values are small. The decline shows no sharp fluctuations and no overfitting or underfitting; the model's depth, size, and dataset are reasonable, training proceeds well, and the algorithm converges properly.
The PR curve of the network is shown in
Figure 20.
The PR curve plots precision against recall. Higher precision means that most of the samples the model predicts as positive are truly positive; higher recall means the model correctly detects more of the real positive samples. The PR curves of the five ship target classes in the dataset are shown in
Figure 20, with the IoU set to 0.5. The squarer the curve, the more accurate and comprehensive the detection results. The engineering ship and passenger ship classes perform best: their detection accuracy reaches 0.999, and their curves are closer to a square than those of the other classes. The cargo ship's detection accuracy is 0.798, the lowest of the five classes. The mAP of the network reaches 0.897. The blue PR curve shows the algorithm's high average detection accuracy: although precision decreases as recall increases, the model maintains high precision even at high recall. The detection accuracy of the different target types is shown in
Table 4.
As can be seen from the per-class statistics, the network performs excellently on multiple types of ship targets. The average detection accuracy reaches 89.7%, and the detection time for a single picture is about 1.5 ms, which meets real-time requirements. Detection of engineering ships and passenger ships reaches 0.999, mainly because these two target types occupy a large pixel proportion in the image and have obvious image characteristics. The detection accuracy of cargo ships is slightly lower than the other categories, possibly because the cargo ship targets collected in the dataset vary widely in their characteristics. The sailboat class reaches 0.811, reflecting the network's good performance on small-target detection.
The two plots in
Figure 21 show the mean average precision at an IoU threshold of 50% and averaged over IoU thresholds from 50% to 95%. The latter imposes stricter requirements, so its mAP is slightly lower than in the left figure; the mAP50 index is relatively loose. The final average precision stabilizes at about 0.92, showing the model's overall performance under the looser condition, and reaches about 0.8 under the 50–95% condition, indicating that the model can detect and localize targets accurately even under strict IoU conditions.
The actual detection effect of the network is shown in
Figure 22.
In
Figure 22, the actual detection results are shown. In complex environments, the network effectively overcomes interference from the nearshore island background, accurately distinguishes multiple types of ship targets, and performs well on small targets, accurately detecting small-target information in the image. The method also accurately detects ship type and location in scenes with overlapping occlusion.
4.5.2. Comparison Experiments
For comparative testing, typical network algorithms were used to assess the advantages and effectiveness of the proposed network, considering the practical requirements on detection accuracy, speed, and hardware capacity. In the same experimental environment and with the same parameter settings, the YOLOv8n, YOLOv10n, and YOLOv11n networks were trained for 500 epochs, and the network performance was tested; the basic network performance statistics are shown in
Table 5.
Compared with the other network models, the improved algorithm proposed in this paper achieves a higher precision of 93.9% and an average detection accuracy of 89.7%, and its recall of 77.1% is also significantly improved over YOLOv11n. The model has fewer parameters than YOLOv8n and YOLOv10n and higher detection accuracy than YOLOv11n with only a small increase in parameters; the optimized algorithm detects a single image in 1.5 ms, which is faster than Mask R-CNN, CenterNet, and YOLOv10n. Because the improvements deepen the network to strengthen its feature extraction ability, more parameter computation is required, slightly reducing inference speed: the detection speed differs only slightly from YOLOv8n and YOLOv11n, while the detection accuracy is significantly improved, and the detection time still meets real-time requirements.
In summary, the improved YOLOv11 target detection network is optimized in parameter count and processing time, reflecting the speed advantage of a lightweight model, and achieves high-accuracy detection of various ship targets, including occluded and overlapping ships in complex environments and small targets. The model meets the need for accurate target interpretation in the complex marine environment and provides feasible technical support for real-time ship monitoring at sea.
4.5.3. Ablation Experiments
Ablation experiments were conducted on the three proposed improvements to YOLOv11 to verify the function of each module in the network. Considering the computing performance and platform limitations of practical image-processing equipment, the module-improved networks were comprehensively compared in terms of detection accuracy, parameter count, and detection time, with the experimental environment and parameters unchanged and the self-built visible-light ship target dataset as the experimental dataset. The ablation results are shown in
Table 6, where '×' means the module is not used in the network and '√' means it is used.
From
Table 6, it can be seen that, under the same experimental environment, the detection accuracy of the improved YOLOv11 network is 89.7%, an improvement of 5.6% over the original YOLOv11's 84.1%, proving that the improvements efficiently raise the network's detection accuracy. Introducing the EfficientNetv2 module into the original YOLOv11 network optimizes the training mode, strengthens the training and learning process, better balances recognition performance against model efficiency, and improves detection accuracy by 1.8% while adding only a small number of parameters. Introducing the CA mechanism raises detection accuracy by a further 0.7%. By aggregating the input features along the horizontal and vertical directions, the CA mechanism accurately captures the relationships between channels and the spatial distribution of features, effectively improving the accuracy of feature expression; for ship target detection in the complex marine environment, it extracts the spatial information of the ship target more accurately, improving the interpretation of target position and size. Moreover, the CA mechanism assigns specific weights to the feature values of different channels, enhancing attention to useful features and suppressing useless ones, which reduces wasted resources, lowers computational complexity, and reduces the number of network parameters.
The ConvNeXt Block module uses spanning information connections to simplify the network model and reduce the model's overall parameter count, while significantly improving the model's ability to capture contextual information and to handle occluded and interfered targets; detection accuracy improves by another 0.7%. The WIoU loss function uses a dynamic non-monotonic focusing mechanism that preserves the effect of high-quality anchor boxes while reducing harmful gradients, improving the overall performance of the ship detection model and raising detection accuracy by 2.4% without adding parameters. The per-image detection time is 1.5 ms, higher than that of the original network, mainly because introducing the EfficientNetv2 module and the CA mechanism increases the complexity of the algorithm model; the detection speed improves again after the ConvNeXt Block module is introduced (
Figure 23).
The first to the fifth picture correspond to the detection results of the improved network in
Table 6, respectively. The comparison of detection result images shows that, after network optimization, target detection accuracy is significantly improved, target categories are classified more accurately, and small-target detection is better: more small ship targets in the picture are detected, with a lower missed-detection rate. The improved YOLOv11 network detects multiple types of ship targets better in complex environments, is more suitable for complex marine environments, and accurately identifies ships and small targets under occlusion interference. Its detection speed satisfies real-time requirements, and its parameter count is in line with the characteristics of a lightweight network, satisfying application scenarios with limited computational resources, such as unmanned aerial platforms.
In conclusion, comparing the YOLOv11 network with the improved and optimized network, the detection speed meets real-time requirements, and the overall detection accuracy improves by 5.9% with only a small number of added parameters, making the network more adaptable to detecting multiple types of ship targets in complex scenarios. It meets the demand for accurate and efficient target detection in real-time scenarios and provides feasible technical and theoretical support for marine monitoring, ship monitoring, and other tasks.