High-Efficiency and High-Precision Ship Detection Algorithm Based on Improved YOLOv8n

With the development of the intelligent vision industry, ship detection and identification technology has gradually become a research hotspot in marine insurance and port logistics. However, interference from rain, haze, waves, illumination, and other adverse weather means that the robustness and effectiveness of existing detection algorithms remain a continuing challenge. To address this, an improved YOLOv8n algorithm is proposed for detecting ship targets under unpredictable environmental conditions. In the proposed method, an efficient multi-scale attention module (C2f_EMAM) is introduced to integrate context information at different scales, so that the convolutional neural network can generate better pixel-level attention on high-level feature maps. In addition, a fully-concatenate bi-directional feature pyramid network (Concatenate_FBiFPN) replaces the simple superposition/addition of feature maps, which better addresses feature propagation and information flow in target detection. An improved spatial pyramid pooling fast structure (SPPF2+1) is also designed to emphasize low-level pooling features and reduce the pooling depth to suit the information characteristics of ships. In comparison experiments against other mainstream methods on the SeaShips dataset, the proposed algorithm outperformed the other models, achieving 99.4% accuracy, 98.2% precision, 98.5% recall, 99.1% mAP@.5, and 85.4% mAP@.5:.95.


Introduction
As an important part of the comprehensive transportation system, water transportation has become the main mode of bulk cargo transportation because of its low cost, low pollution, large loading capacity, and long average transport distance. However, as the number of ships grows and the water transport environment becomes increasingly complex, accidents such as ship collisions, illegal escapes, overloading, and illegal berthing occur frequently, posing serious threats to the economy, the environment, and people's property. In real ocean surveillance, the huge volume of video data means that manual monitoring can no longer meet the needs of ship detection. In recent years, with the rapid development of GPUs and artificial intelligence, researchers have applied image processing technology to maritime supervision for ship detection and identification. Unlike land traffic, maritime scenes contain background interference such as waves and clutter, and ships vary widely in scale, so existing ship detection methods are prone to false or missed detections. Therefore, effective monitoring and localization of ships in complex situations is an important challenge that urgently needs to be addressed in computer vision.
Currently, traditional ship detection algorithms mainly fall into four categories: background modeling [1,2], temporal difference [3,4], optical flow [5,6], and template matching [7,8]. Most of these algorithms are bottom-up and distinguish targets from background regions using a large number of hand-designed image features (such as grayscale, texture, and edges), so their performance in complex environments is generally unsatisfactory. In the past decade, research on convolutional neural networks has thrived across many fields, and many deep-learning-based object detection algorithms have emerged, such as Faster R-CNN [9], SSD [10], and YOLO [11]. Inspired by these technologies, Lin et al. [12] proposed an improved Faster R-CNN network that further increases detection performance using a squeeze-and-excitation mechanism. Later, considering dense distributions and varying ship sizes, Zhou et al. [13] proposed an R-Libra CNN approach that better integrates high-level semantic information with low-level features. Based on the principle of relay amplification, Li et al. [14] proposed a lightweight Faster R-CNN with better real-time performance. Similarly, Wen et al. [15] proposed a multi-scale SSD network that effectively enhanced feature representation capability and detection efficiency. Yang et al. [16] optimized the prediction-box suppression strategy of SSD to effectively reduce occlusion-related and missed detections of small targets. Compared with the two-stage Faster R-CNN, YOLO is widely regarded as the first single-stage detector: it divides the input image into a grid according to the size of the feature map and assigns a set of bounding-box predictions (anchor boxes in later versions) to each grid cell, which gives it better real-time performance. Subsequently, researchers added new feature extraction networks, passthrough layers, feature pyramids, and other operations to the original YOLO, proposing Lira-YOLO [17], LMO-YOLO [18], LK-YOLO [19], and a series of other improved versions for ship detection, which further improved detection accuracy and reduced inference time.
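The grid assignment described above can be sketched as follows: the grid cell that contains an object's centre is responsible for predicting that object. This is a minimal YOLOv1-style sketch, not part of the proposed method; the 448-pixel input and 7 × 7 grid in the usage line follow the original YOLO paper, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height) in pixels

def responsible_cell(box: Box, img_size: int, grid_size: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box centre."""
    x_c, y_c, _, _ = box
    cell = img_size / grid_size                  # side length of one cell in pixels
    col = min(int(x_c // cell), grid_size - 1)   # clamp centres lying on the far edge
    row = min(int(y_c // cell), grid_size - 1)
    return row, col

def yolo_v1_target(box: Box, img_size: int, grid_size: int) -> Tuple[float, float, float, float]:
    """Centre offset relative to the responsible cell (in [0, 1]) and
    width/height relative to the whole image, as in YOLOv1."""
    row, col = responsible_cell(box, img_size, grid_size)
    cell = img_size / grid_size
    x_c, y_c, w, h = box
    return (x_c / cell - col, y_c / cell - row, w / img_size, h / img_size)

# Usage: a 448x448 input split into a 7x7 grid (64-pixel cells).
print(responsible_cell((200.0, 100.0, 128.0, 64.0), 448, 7))  # (1, 3)
```

Because each cell regresses offsets relative to its own position, localization becomes a direct regression problem, which is what allows a single forward pass to produce all detections.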
Although the abovementioned algorithms have achieved reasonable results in ship detection, they still face challenges in practical applications, mainly: (1) interference from rain, haze, waves, illumination, and other adverse weather; (2) diversity of dimensions such as size and aspect ratio; (3) complexity of orientation, position, posture, and other factors; and (4) interference from background areas such as floating objects, shorelines, and buildings. As shown in Figure 1, these conditions can greatly reduce the detection accuracy of ship targets.
To address these problems, we propose an improved YOLOv8n network that better distinguishes ships from numerous misleading targets. Our algorithm incorporates three modules: an efficient multi-scale attention module, a fully connected weighted feature fusion mechanism, and a spatial pyramid pooling fast structure. The combination of these modules processes images efficiently and achieves 99.4% accuracy, 98.2% precision, 98.5% recall, 99.1% mAP@.5, and 85.4% mAP@.5:.95. The contributions of this study are as follows:
1. The proposed efficient multi-scale attention module (C2f_EMAM) fuses contextual information at different scales and significantly improves the attention of high-level feature maps;
2. The fully-concatenate bi-directional feature pyramid network (Concatenate_FBiFPN) is used to optimize the classification and regression structure and to better address feature propagation and information flow in target detection;
3. The spatial pyramid pooling fast structure (SPPF2+1) is redesigned to emphasize low-level pooling features and learn target features more comprehensively;
4. An ablation study of C2f_EMAM, Concatenate_FBiFPN, and SPPF2+1 is conducted, and the experimental results verify the feasibility and effectiveness of these modules.

Overview of the Improved YOLOv8n
YOLOv1 (You Only Look Once) [11], proposed by Redmon et al. in 2015, is the first version of the YOLO series. Compared with traditional target detection methods, it transforms the bounding-box localization problem into a regression problem: the whole image is used as the network input, and the position and category of each bounding box are regressed directly at the output layer. More recently, YOLOv5 [20] and YOLOv7 [21] have become two of the most popular algorithms in the YOLO series. The main difference between them is that YOLOv5 is a lightweight target detection model compared with the more complex architecture of YOLOv7; it combines a backbone network with a feature-pyramid-based neck and anchor-based detection heads, which reduces computational complexity and the number of parameters while greatly improving speed and accuracy. YOLOv7 adopts a deeper network structure together with new training strategies and architectural techniques, such as the extended efficient layer aggregation network (E-ELAN), which can significantly improve the accuracy and generalization ability of the detector. In 2023, the Ultralytics team released YOLOv8 (Ultralytics, Frederick, MD, USA) [22]. By optimizing the C3 module of YOLOv5, removing the convolutional layer in the up-sampling stage, and using a new loss strategy, YOLOv8 achieves faster detection and more accurate localization of target objects.
Because of these advantages, we chose the lightweight version YOLOv8n (Ultralytics, Frederick, MD, USA) as the baseline. It mainly consists of feature extraction modules (the backbone and neck networks), a recognition branch, and a detection branch. The network architecture of the improved YOLOv8n is shown in Figure 2. First, replacing the original C2f module with the lightweight C2f_EMAM structure aggregates multi-scale spatial structure information, so that the convolutional neural network can generate better pixel-level attention on high-level feature maps. Second, using efficient bi-directional connections and a weighted feature fusion strategy, the neck network is upgraded to Concatenate_FBiFPN so that feature information can propagate in both top-down and bottom-up directions. Finally, the SPPF structure is improved by placing two 5 × 5 max-pooling layers in parallel and applying a second pooling to one of the pooled outputs. This places more emphasis on low-level pooling features while still taking features from different receptive fields into account. Each module is described in detail below.

Architecture of C2f_EMAM
According to [22], YOLOv8 replaces the CSP structure of YOLOv5 with the C2f structure, which adds more residual connections and thus enriches the gradient flow. However, because of the variety of ship types and complex weather disturbances, the robustness and effectiveness of YOLOv8 remain problematic. In deep learning and computer vision, attention mechanisms have proven to be a very effective way to help models focus on the important parts of the input. However, traditional attention mechanisms often operate within a limited spatial range and cannot capture information from the global context. Inspired by Ouyang et al. [23], we add an efficient multi-scale attention module to C2f and propose a new C2f_EMAM structure. Specifically, C2f_EMAM applies a convolution to the input features, followed by the split operation and EMAM; the latter half of the split features then passes through the bottleneck blocks, and all branches are concatenated and passed through a final output convolution. The overall structure of C2f_EMAM is shown in Figure 3. By combining attention mechanisms, convolution kernels of different sizes, and grouped convolution, C2f_EMAM aims to improve model performance while maintaining computational efficiency.
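To make the data flow concrete, the sketch below only tracks tensor shapes through a C2f-style block with an EMA step; it does not implement the convolutions or the attention itself. The 0.5 expansion ratio and the (2 + n)·c concatenation follow the public YOLOv8 C2f design. Placing EMA on the second split half and the `ema_groups` parameter are assumptions made for illustration, since the exact placement is given only in Figure 3.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width) of one feature map

@dataclass
class C2fEMAMSpec:
    c_in: int
    c_out: int
    n_bottlenecks: int = 1
    expansion: float = 0.5   # hidden-width ratio, as in YOLOv8's C2f
    ema_groups: int = 8      # hypothetical group count for the EMA branch

def trace_c2f_emam(spec: C2fEMAMSpec, h: int, w: int) -> List[Tuple[str, Shape]]:
    """Trace feature-map shapes through a C2f block with an EMA step.

    Assumption: EMA is applied to the second split half before the
    bottlenecks; EMA, like the bottlenecks, preserves the tensor shape.
    """
    c = int(spec.c_out * spec.expansion)            # hidden channels per branch
    if c % spec.ema_groups != 0:
        raise ValueError("hidden channels must be divisible by ema_groups")
    steps = [("input", (spec.c_in, h, w)),
             ("cv1 1x1 conv", (2 * c, h, w)),
             ("split -> half 1", (c, h, w)),
             ("split -> half 2", (c, h, w)),
             ("EMA on half 2", (c, h, w))]
    for i in range(spec.n_bottlenecks):             # each bottleneck keeps c channels
        steps.append((f"bottleneck {i + 1}", (c, h, w)))
    steps.append(("concat", ((2 + spec.n_bottlenecks) * c, h, w)))
    steps.append(("cv2 1x1 conv", (spec.c_out, h, w)))
    return steps

for name, shape in trace_c2f_emam(C2fEMAMSpec(c_in=64, c_out=128, n_bottlenecks=2), 80, 80):
    print(f"{name:>16}: {shape}")
```

The trace shows why the attention step is cheap in this position: it operates on only c channels at a fixed spatial size, and the block's output width is set by the final 1 × 1 convolution regardless of how many bottlenecks are stacked.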

Architecture of Concatenate_FBiFPN
Concatenate_FBiFPN is a feature pyramid network that combines the bidirectional cross-scale connections and weighted feature fusion strategies of FPN, PANet, and BiFPN, as shown in Figure 4. Feature information can be propagated from top to bottom and connected efficiently according to fixed rules, which allows deep and shallow features to be aggregated easily and quickly. FPN uses a simple top-down path to combine multi-scale features, and PANet adds a bottom-up secondary fusion path on top of FPN. Building on PANet, BiFPN removes nodes with only one input edge to simplify the network and adds an extra path from the original input to the output node at the same level. Because different input feature maps contribute differently to the fused multi-scale feature maps, Concatenate_FBiFPN is designed to address this problem: it adds skip connections between some input nodes and output nodes, incorporating more features without increasing the number of parameters or the computational cost, to achieve adequate multi-scale feature fusion.
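The weighted fusion that Concatenate_FBiFPN builds on can be illustrated with BiFPN's fast normalized fusion from EfficientDet, where each input I_i receives a learnable non-negative weight and the node output is O = Σ_i w_i I_i / (ε + Σ_j w_j). The sketch below uses plain nested lists as feature maps. `weighted_add` is the standard BiFPN fusion node; `weighted_concat` is an assumed reading of how a "Concatenate" variant replaces the addition (weight each input, then stack along channels), since the exact rule is given only in Figure 4.

```python
from typing import List

FeatureMap = List[List[float]]  # [channel][flattened spatial position]

def normalized_weights(raw: List[float], eps: float = 1e-4) -> List[float]:
    """BiFPN fast normalized fusion: ReLU the raw weights, divide by sum + eps."""
    w = [max(0.0, r) for r in raw]
    total = sum(w) + eps
    return [wi / total for wi in w]

def scale(fmap: FeatureMap, s: float) -> FeatureMap:
    return [[s * v for v in ch] for ch in fmap]

def weighted_add(inputs: List[FeatureMap], raw: List[float]) -> FeatureMap:
    """Standard BiFPN node: weighted element-wise sum (inputs share one shape)."""
    ws = normalized_weights(raw)
    out = scale(inputs[0], ws[0])
    for fmap, wi in zip(inputs[1:], ws[1:]):
        out = [[a + wi * b for a, b in zip(ch_o, ch_i)] for ch_o, ch_i in zip(out, fmap)]
    return out

def weighted_concat(inputs: List[FeatureMap], raw: List[float]) -> FeatureMap:
    """Assumed concatenation variant: scale each input by its normalized
    weight, then stack along the channel axis (only spatial size must match)."""
    ws = normalized_weights(raw)
    out: FeatureMap = []
    for fmap, wi in zip(inputs, ws):
        out.extend(scale(fmap, wi))
    return out

p4 = [[1.0, 2.0]]       # one channel, two positions (toy example)
p5_up = [[3.0, 4.0]]    # upsampled deeper feature at the same resolution
print(weighted_add([p4, p5_up], [1.0, 1.0]))          # ≈ [[2.0, 3.0]] (weights are 1 / 2.0001)
print(len(weighted_concat([p4, p5_up], [1.0, 1.0])))  # 2
```

Addition forces all inputs into the same channel space and mixes them immediately, whereas concatenation keeps each source in its own channels and leaves the mixing to the following convolution; the learned weights still control how strongly each scale contributes.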

Architecture of SPPF2+1
The SPPF module is the spatial pyramid pooling module used in YOLOv5. It pools feature maps at different scales without changing their spatial size, which enlarges the receptive field and improves target detection accuracy. The original SPPF applies pooling three times in sequence and concatenates the results at the end. Although the shallow features are included in the final concatenation, their contribution is weakened. Considering that shallow features retain more image detail, we place two 5 × 5 max-pooling layers in parallel, apply a second pooling to the output of only one of them, and concatenate the resulting features to obtain SPPF2+1. In this way, more attention is paid to the lower-level pooled features while features with different receptive fields are still taken into account, as shown in Figure 5.
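The difference between SPPF and SPPF2+1 lies in which receptive fields reach the final concatenation. The sketch below uses 1D signals so that it runs with the standard library only: stride-1 max pooling with "same" padding keeps the length fixed, and stacking two 5-wide pools is equivalent to a single 9-wide pool. The branch order in `sppf_2plus1` is our reading of the description above and of Figure 5, not the authors' code, and the 1 × 1 convolutions around the pooling are omitted.

```python
from typing import List

NEG_INF = float("-inf")

def maxpool1d(x: List[float], k: int = 5) -> List[float]:
    """Stride-1 max pooling with 'same' padding (padded with -inf, like
    PyTorch max pooling with padding k // 2), so the length is unchanged."""
    p = k // 2
    padded = [NEG_INF] * p + list(x) + [NEG_INF] * p
    return [max(padded[i:i + k]) for i in range(len(x))]

def sppf(x: List[float]) -> List[List[float]]:
    """Original SPPF: three sequential 5-wide pools, concat [x, y1, y2, y3].
    Effective window widths: 1, 5, 9, 13."""
    y1 = maxpool1d(x)
    y2 = maxpool1d(y1)
    y3 = maxpool1d(y2)
    return [x, y1, y2, y3]

def sppf_2plus1(x: List[float]) -> List[List[float]]:
    """SPPF2+1 as read from the text: two parallel 5-wide pools on x, a second
    pooling on only one of them, then concat. Effective widths: 1, 5, 5, 9."""
    a = maxpool1d(x)
    b = maxpool1d(x)   # parallel branch; max pooling has no weights, so b equals a
    b2 = maxpool1d(b)  # secondary pooling on one branch only
    return [x, a, b, b2]

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0, -5.0, 3.0, 5.0]
print(sppf(x)[-1] == maxpool1d(x, 13))  # True: three stacked 5-pools act like one 13-pool
print(sppf_2plus1(x)[-1] == maxpool1d(x, 9))  # True: the deepest SPPF2+1 branch sees 9
```

Under this reading, SPPF2+1 drops the widest (13-wide) context and gives the 5-wide pooled features two slots in the concatenation, which is one concrete way of "emphasizing low-level pooling features and reducing the pooling depth".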

Results
All models were trained in an environment built on the PyTorch deep learning framework, running on Windows 10 with an NVIDIA RTX 6000 GPU (24 GB of memory) and CUDA acceleration. The Adam optimizer [24] was used for training, and TensorBoard was used to visualize the training state of the model by reading the generated training logs. The batch size was set to 16, the learning rate to 0.001, and the number of training epochs to 300. We used the publicly available SeaShips dataset [25] to evaluate the object detection methods. The images in the dataset were divided into training, validation, and test sets at a ratio of 3:1:1. The dataset contains six classes (ore carrier, bulk cargo carrier, general cargo ship, container ship, fishing boat, and passenger ship), which cover the main types of ships encountered at sea.
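A 3:1:1 split of the kind described above can be sketched as follows. The text does not say whether the split was random or stratified by class, so this assumes a simple seeded random split at the image level; the file names and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def split_311(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items and split them into train/val/test at a 3:1:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n = len(shuffled)
    n_train = n * 3 // 5
    n_val = n // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # any rounding remainder goes to test
    return train, val, test

image_ids = [f"{i:06d}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = split_311(image_ids)
print(len(train), len(val), len(test))  # 60 20 20
```

Fixing the seed matters for the ablation study: every variant of the model must be trained and evaluated on exactly the same images for the metric differences to be comparable.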

Evaluation Metrics
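To evaluate the performance of these methods, accuracy, precision [26], recall [27], mAP@.5 [28], and mAP@.5:.95 [29] are used, defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where TP and TN are the numbers of positive and negative samples that are correctly classified, FP is the number of negative samples incorrectly classified as positive, FN is the number of positive samples incorrectly classified as negative, P(R) is precision as a function of recall, and N is the number of classes. mAP@.5 is obtained by computing the AP of each category at an IoU threshold of 0.5 and then averaging over all categories. mAP@.5:.95 is the average mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.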
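The metric definitions in this section can be sketched in plain Python, together with the IoU test that decides whether a detection counts as a true positive and an all-point interpolated AP (the area under the monotone precision envelope, as in the PASCAL VOC protocol from 2010 onward). This is illustrative only; the evaluation code used in the experiments may use a different interpolation scheme.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """All-point interpolated AP. `recalls` must be non-decreasing, i.e. the
    points come from detections sorted by descending confidence."""
    r = [0.0] + list(recalls) + [1.0]
    p = [1.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):  # precision envelope: non-increasing in recall
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# IoU thresholds for mAP@.5:.95: 0.50, 0.55, ..., 0.95 (10 values)
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def map_50_95(map_per_threshold: Sequence[float]) -> float:
    """Average of the class-averaged AP (mAP) over the 10 IoU thresholds."""
    assert len(map_per_threshold) == len(IOU_THRESHOLDS)
    return sum(map_per_threshold) / len(map_per_threshold)

# Two ground-truth ships, three detections ranked by confidence: TP, FP, TP.
print(average_precision([0.5, 0.5, 1.0], [1.0, 0.5, 2 / 3]))  # ≈ 0.8333
```

Because mAP@.5:.95 also scores detections at strict thresholds such as 0.9 and 0.95, it rewards tight box alignment much more than mAP@.5 does, which is why it is consistently the lower of the two numbers in the tables.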

Results of the Improved YOLOv8n
Figure 6 illustrates the training and validation losses and the evaluation metrics of our method during training. As Figure 6a-c shows, after 200 epochs the training loss continued to decrease while the validation loss had flattened, indicating that the model converged within 300 epochs. Figure 6d shows that although precision and recall continue to oscillate after 200 epochs, mAP has flattened.
Table 1 shows the detection metrics for each ship type; the proposed method achieves an mAP@.5 of at least 0.988 for every type. Figure 7 shows that the model accurately locates and classifies each type of ship.

C2f_EMAM
Table 2 compares the recognition results of C2f integrated with the EMAM structure and of the original C2f. Compared with the original YOLOv8n model, C2f_EMAM improves both mAP@.5 and mAP@.5:.95 by 0.002.
Figure 8 shows the change in recall for each ship type before and after integrating EMAM; the model with EMAM improves the recall of all ship types.
To better show the role of the EMAM mechanism, we use Grad-CAM to visualize how the feature heatmap changes after passing through the C2f_EMAM layer. As shown in the first row of Figure 9, after the EMAM structure is added, the extracted features are mostly concentrated around the ship, whereas the original structure still distributes much of its attention over the sea surface and coast. As shown in the second row, the EMAM structure improves the ability to find small targets. Compared with the original structure, EMAM places a higher level of attention on the main objects in the images, namely the ships. Moreover, even where the heat appears weakly distributed, background features are suppressed, as shown in the last row of Figure 9.
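Grad-CAM weights each channel of a chosen layer by the spatially averaged gradient of the target score with respect to that channel, sums the weighted channels, and keeps only the positive part. The stdlib sketch below assumes the activations and gradients have already been extracted (in practice via autograd hooks in PyTorch); it illustrates the heatmap computation, not the authors' visualization pipeline.

```python
from typing import List

Map2D = List[List[float]]

def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM heatmap from one layer's activations A^k and the gradients
    dy/dA^k of the target score y (both indexed [channel][row][col])."""
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient of channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]    # ReLU keeps positive evidence only
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]   # scale to [0, 1] for display
    return cam

# Toy layer with two channels: channel 0 supports the class, channel 1 opposes it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

The ReLU step explains why suppressed background regions appear cold in the heatmaps: channels whose activation lowers the ship score contribute negatively and are clipped away rather than shown.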

Concatenate_FBiFPN
The evaluation metrics of three Concatenate structures are shown in Table 3. The proposed Concatenate_FBiFPN structure effectively improves the overall performance of the model: mAP@.5 increases by 0.002 and mAP@.5:.95 by 0.004.
Figure 10 shows heatmaps of a randomly selected test image at Layer 4 and Layer 13 for the different Concatenate structures. Compared with the original Concatenate and Concatenate_BiFPN structures, Concatenate_FBiFPN shows a higher level of attention on the two test objects, as judged from the higher density of the highlighted areas. This indicates that Concatenate_FBiFPN assigns heavier and more concentrated weights to the targets.

Structure           Accuracy  Precision  Recall  mAP@.5  mAP@.5:.95
Yolov8n-C2f         0.990     0.982      0.979   0.988   0.848
Yolov8n-C2f_EMAM    0.993     0.981      0.982   0.990   0.950

SPPF2+1
Table 4 shows the influence of four SPPF structures on detection accuracy. Compared with the original SPPF structure, SPPF2+1 significantly improves the detection accuracy of the model. However, when the pooling structure is deepened further, detection accuracy decreases, as shown by the metrics of SPPF3+1 and SPPF3+3.
The heatmaps in Figure 11 show that the SPPF2+1 structure focuses strongly on the detected object. By contrast, SPPF3+1 attends to more features in the background region, and in SPPF3+3 the features of the target region become biased or even missing because features are mixed excessively. These results show that the proposed SPPF2+1 structure adapts well to ship target detection and distributes weight appropriately over the detected area.

Discussion
To evaluate the model more comprehensively, we compared our improved YOLOv8n with several classic object detection methods on the same dataset; the results are shown in Table 5. Our improved YOLOv8n achieves an mAP@.5 of 0.991, considerably higher than the other models. In terms of detection speed, its FPS of 83.33 is also the highest, while its number of parameters and GFLOPs are kept to a minimum. These results show that our improved YOLOv8n achieves better overall performance on the SeaShips dataset.
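An FPS figure is the reciprocal of the average per-image processing time, so 83.33 FPS corresponds to about 12 ms per image. The helper below shows this conversion and one simple way to time a model; `infer` is a hypothetical stand-in for a detector, and the text does not state whether the reported FPS includes pre- and post-processing such as NMS.

```python
import time
from typing import Callable, Sequence

def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second for a given average per-image latency in milliseconds."""
    return 1000.0 / latency_ms

def measure_fps(infer: Callable[[object], object], images: Sequence[object],
                warmup: int = 5) -> float:
    """Average throughput of `infer` over `images` after a few warm-up calls."""
    for img in images[:warmup]:
        infer(img)                      # warm-up (caches, lazy init) is not timed
    start = time.perf_counter()
    for img in images:
        infer(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")

print(round(fps_from_latency_ms(12.0), 2))  # 83.33
```

For a GPU model, CUDA kernels run asynchronously, so the device must be synchronized before each clock reading; otherwise the measured time covers only kernel launches and the FPS is overstated.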

Conclusions
This paper proposes a deep learning architecture based on YOLOv8n for ship detection under complex environmental conditions. Our conclusions are as follows. First, the proposed C2f_EMAM integrates context information at different scales and significantly improves the attention of high-level feature maps. Second, incorporating Concatenate_FBiFPN into the model better addresses feature propagation and information flow in target detection. Finally, the redesigned SPPF2+1 emphasizes low-level pooling features and learns target features more comprehensively. In comparison experiments against other models designed for similar purposes, our algorithm achieved 99.4% accuracy, 98.2% precision, 98.5% recall, 99.1% mAP@.5, and 85.4% mAP@.5:.95, outperforming the other compared detection models. These results indicate that the proposed method is of research value for automatic and intelligent ship detection.
However, this study does not fully discuss the representativeness and diversity of the dataset used, and the robustness of the model still needs further improvement. We are currently building a more comprehensive and diverse ship detection dataset and adopting more diverse evaluation metrics to improve the generalization ability and practicality of the model, which will be the focus of our future research.

Figure 1. Challenges and limitations of ship detection. First row: interference from rain, haze, waves, illumination, and other bad weather; second row: diversity of dimensions such as size and aspect ratio; third row: complexity of orientation, position, posture, and other factors; last row: interference from background areas such as floating objects, shorelines, and buildings.

Figure 2. The network architecture of the improved YOLOv8n.

Figure 8. Influence of the C2f_EMAM structure on ship identification recall.

Figure 9.
In summary, integrating EMAM into the C2f structure improves the feature extraction ability of the model.

Table 1. Performance results of the model on the test set.

Table 2. Comparison of detection results of different C2f structures.

Table 3. Comparison of detection results of different Concatenate structures.

Table 4. Comparison of detection results of different SPPF structures.

Table 5. Comparison of detection results of different models.