Article

MrisNet: Robust Ship Instance Segmentation in Challenging Marine Radar Environments

1 State Key Laboratory of Maritime Technology and Safety, Wuhan University of Technology, Wuhan 430063, China
2 National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China
3 Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China
4 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
5 Nanjing Smart Water Transportation Technology Co., Ltd., Nanjing 210028, China
6 Nanjing Port Co., Ltd., Nanjing 210011, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(1), 72; https://doi.org/10.3390/jmse12010072
Submission received: 24 November 2023 / Revised: 22 December 2023 / Accepted: 23 December 2023 / Published: 27 December 2023

Abstract:
In high-traffic harbor waters, marine radar frequently encounters signal interference from various obstructive elements, which poses formidable obstacles to the precise identification of ships. To achieve precise pixel-level ship identification in such complex environments, a customized neural network-based ship segmentation algorithm named MrisNet is proposed. MrisNet employs a lightweight and efficient FasterYOLO network to extract features from radar images at different levels, capturing fine-grained edge information and deep semantic features of ship pixels. To address the limitation that deep features in the backbone network lack detailed shape and structural information, an adaptive attention mechanism is introduced after the FasterYOLO network to enhance crucial ship features. To fully utilize the multi-dimensional feature outputs, MrisNet incorporates a Transformer structure to reconstruct the PANet feature fusion network, allowing for the fusion of contextual information and capturing more essential ship information and semantic correlations. In the prediction stage, MrisNet optimizes the target position loss using the EIoU function, enabling the algorithm to adapt to ship position deviations and size variations, thereby improving segmentation accuracy and convergence speed. Experimental results demonstrate that MrisNet achieves high recall and precision rates of 94.8% and 95.2%, respectively, in ship instance segmentation, outperforming various YOLO and other single-stage algorithms. Moreover, MrisNet has a model parameter size of 13.8 M and a real-time computational cost of 23.5 GFLOPs, demonstrating notable advantages in convolutional efficiency. In conclusion, MrisNet accurately segments ships with different spot features and under diverse environmental conditions in marine radar images. It exhibits outstanding performance, particularly in extreme scenarios and challenging interference conditions, showcasing robustness and applicability.

1. Introduction

Real-time monitoring of busy waterways is a critically important task in maritime management, with positive implications for ensuring ship navigation safety and improving port operational efficiency. Marine radar is widely applied in domains such as ship collision avoidance, weather forecasting, and marine resource monitoring [1,2]. Shore-based marine radar, in particular, plays a significant role, as it enables continuous monitoring of ships over expansive water areas, even under adverse weather and low-visibility conditions. Compared to detection technologies such as the Automatic Identification System (AIS) and Very High Frequency (VHF) radio, marine radar does not rely on real-time information responses from ships, thereby greatly enhancing the speed of obtaining navigation information.
In general, the images presented by marine radar consist of a two-dimensional dataset composed of a series of spots. These spots represent the intensity and positional information of the radar signal’s reflection and echo from objects such as ships, islands, and air masses in space. By analyzing and processing these spots, radar images with different resolutions and clarity can be generated. Additionally, by overlaying and calculating a series of radar images taken at different time intervals, the positions and trajectories of ships can be simulated, resulting in ship spots with varying lengths of trails.
Traditional ship recognition methods for marine radar images mainly involve techniques such as filtering and pattern recognition [3]. These methods demonstrate appropriate suppression capabilities when dealing with relatively simple interference. However, they are poorly suited to ship recognition tasks that involve sparse pixel features, slow-moving targets, and strong interference. In recent years, deep learning techniques have rapidly advanced in the field of object recognition, particularly with the widespread application of Convolutional Neural Networks (CNNs). CNN-based methods have shown more competitive performance in complex scenarios, such as equipment defect detection, autonomous driving, and aerospace applications. Compared to traditional methods, these approaches are capable of extracting deep semantic information from images and achieving more accurate object localization.
Both traditional recognition methods and CNN-based object detection algorithms can only provide rough positional information of the targets. As a derivative method in the field of computer vision, instance segmentation offers higher precision in pixel-level target localization and can effectively distinguish multiple instances. For marine radar images, instance segmentation methods also provide richer ship motion information, such as heading and speed. Therefore, utilizing these methods to process marine radar images can provide more accurate and comprehensive ship identification and tracking information for maritime authorities.
Compared to ship instance segmentation in natural images, performing such tasks in radar images faces more challenges. Firstly, only a small number of spots represent actual moving ships. This significantly affects the accurate classification and localization of ship targets, while interferences such as waves, clouds, rain, and clutter further increase the difficulty of interference removal. Secondly, both long-trail and short-trail ships in radar images are considered small or even tiny objects. Particularly, in cases of dense sailing or crossing navigation, distinguishing between ship spots can be challenging. Lastly, due to the extensive use of embedded devices in radar systems, traditional fractal algorithms and micro-Doppler techniques have been commonly employed for radar signal processing for several decades, despite their relatively limited adaptability in many cases. In contrast, deep learning-based recognition methods generally exhibit preferable performance, but they require more computational resources, and only a fraction of mature algorithms can be directly applied to embedded devices.
In response to the characteristics of marine radar images, this paper proposes a customized CNN-based instance segmentation algorithm called MrisNet. Compared to previous research, this method exhibits significant differences in several aspects.
(1) We enhance the feature network to extract crucial ship features by employing more efficient convolutional modules.
(2) A convolutional enhancement method that incorporates channel correlations is introduced to further enhance the generalization ability of the feature network.
(3) An attention mechanism with contextual awareness is utilized to enhance the multi-scale feature fusion structure, enriching the representation of convolutional features at different levels and effectively integrating micro-level and global-level ship features.
(4) The positioning loss estimation of the predicted box is optimized to improve the precision of ship localization and enhance the segmentation performance for dense ship scenarios.
(5) To evaluate the effectiveness of various algorithms for ship segmentation in radar images, a high-quality dataset called RadarSeg, consisting of 1280 radar images, is constructed.
The remaining sections of this paper are organized as follows: Section 2 provides a brief overview of relevant research on instance segmentation and detection of ships in different imaging scenarios. In Section 3, a customized CNN-based ship segmentation algorithm is proposed. Section 4 presents a comparative analysis of experimental results using various algorithms in marine radar images. Finally, Section 5 summarizes the main contributions of the proposed method and discusses future directions for development.

2. Related Works

Currently, ship recognition in natural scenes and remote sensing images has been widely explored and developed. Early traditional methods primarily employed techniques such as clustering analysis and filtering for ship recognition [4]. However, these methods have obvious limitations and relatively poor adaptability, making it challenging to achieve satisfactory results in different scenarios. Meanwhile, the rapid development of deep learning and neural network-based technologies has brought significant advancements in tasks like instance segmentation. However, in the field of ship recognition in marine radar images, the application of deep learning-related methods remains relatively limited, presenting vast opportunities for further development.
In this section, we first provide an overview of ship recognition techniques in marine radar images, including neural network-based approaches for object detection and instance segmentation. It is worth noting that object detection and instance segmentation share commonalities in feature extraction, object localization, and post-processing; thus, the design principles of object detection algorithms can offer valuable insights for the design of instance segmentation algorithms. Additionally, we review and summarize ship recognition techniques and relevant optimization methods in other scenarios.

2.1. Ship Identification Methods under Radar and Other Scenarios

In recent years, significant advancements have been made in object detection and instance segmentation using CNNs. In 2014, the R-CNN algorithm demonstrated a significant advantage on the PASCAL VOC dataset, gradually establishing the dominance of deep learning-based algorithms in the field of object recognition. As is widely known, single-stage algorithms achieve a good balance between computational speed and recognition accuracy, allow for end-to-end training, and have been widely adopted in various recognition tasks. However, in certain tasks, single-stage algorithms may exhibit slightly lower accuracy compared to two-stage algorithms such as Faster R-CNN [5] and Mask R-CNN [6]. Therefore, the selection of benchmark algorithms and network architectures requires careful consideration and decision-making based on specific application scenarios and requirements.
In the research on ship recognition in marine radar images, significant progress has been made by scholars who have incorporated deep learning methods into the field of object detection. By designing effective algorithm structures and utilizing techniques such as clutter suppression and feature enhancement, even single-stage algorithms can achieve satisfactory recognition results. Chen et al. [7] proposed a ship recognition algorithm based on a dual-channel convolutional neural network (DCCNN) and a false alarm-controllable classifier (FACC) to suppress clutter and accurately extract ship features in images. Furthermore, some studies have achieved high levels of ship recognition accuracy and effective interference suppression using two-stage recognition algorithms. For instance, Chen et al. [8] made several improvements to Faster R-CNN in multiple aspects, including optimizing the backbone network, sample data balancing, and scale normalization, aiming to enhance the algorithm's accuracy and robustness. Additionally, differential neural architecture search, label reassignment, and various types of feature pyramid structures have been widely applied in algorithm design, yielding promising recognition results [9,10,11]. In conclusion, deep neural networks have proven effective in extracting ship features. However, current object detection techniques still have limitations in suppressing complex interference, and there is room for improvement in recognizing dense small-sized ships.
Currently, research on ship segmentation in marine radar images is still relatively limited. It is well known that ships in synthetic aperture radar (SAR) images exhibit small scales, limited effective pixels, and significant background interference, which are similar to the challenges faced in marine radar images. In the context of SAR images, there has been extensive research on ship segmentation, and we can draw upon the technical methods from these studies to inform the design of ship segmentation algorithms for marine radar images.
In ship segmentation under the complex background conditions of SAR images, a well-structured feature extraction network plays a crucial role in enhancing the model's recognition capability. Zhang et al. [12] proposed methods such as multi-resolution feature extraction networks and enhanced feature pyramid networks to ensure more adaptable performance in complex scenarios. Moreover, various attention mechanisms can significantly enhance the extraction of ship features and contextual information. Zhao et al. [13] proposed a ship segmentation method based on a collaborative attention mechanism, which improved the recognition performance of multi-task branches. Typically, instance segmentation networks employ horizontal bounding boxes to fit the objects, which may include redundant information. Moreover, in dense ship scenarios, it becomes challenging to accurately differentiate between individual targets. Yang et al. [14] proposed a novel ship segmentation network that utilized rotated bounding boxes as the segmentation foundation, enclosing the targets along the direction of the ships. In addition, they designed a dual feature alignment module to capture representative features of the ships. Instance segmentation of small-scale or multi-scale ships has been a focal topic in SAR images. Shao et al. [15] proposed a multi-scale ship segmentation method specifically designed for SAR images. Specifically, they achieved more precise target regression by re-designing the input unit, backbone network, and ROI module of Mask R-CNN.
The above-mentioned methods provide valuable references for the design of instance segmentation algorithms in marine radar images, particularly in terms of feature network design, attention mechanism application, and multi-scale segmentation.
In other domains, research on ship segmentation primarily focuses on natural images. Researchers have employed various techniques and methods for algorithm design. Some studies have utilized single-stage algorithms, incorporating efficient backbones, feature fusion structures, and prediction networks to achieve high-precision ship segmentation [16]. Others have adopted two-stage algorithm designs, involving the extraction of target candidate regions followed by classification and fine-grained segmentation to enhance accuracy and robustness [17]. These diverse algorithm design strategies cater to the demands and challenges of various domains.

2.2. Optimization Method for Ship Identification Research

The ship recognition in radar images can be enhanced through the utilization of auxiliary techniques. For instance, employing multi-modal fusion techniques and clutter suppression techniques can result in clearer and higher-resolution radar images, thereby providing more accurate ship information. Guo et al. [18] proposed a method that utilized deep learning techniques to identify targets in marine radar images and achieved consistent fusion of electronic chart and marine radar images by treating these targets as reference points. Mao et al. [19] introduced a marine radar imaging framework based on non-uniform imaging theory, which combined techniques such as beam recursive anti-interference, non-uniform sampling models, and dimensionality reduction iterations for improved computational efficiency and higher-quality imaging results. Zhang et al. [20] employed generative adversarial networks to remove noise in radar images and utilized image registration methods to eliminate imaging discrepancies between radar images and chart data, supporting efficient ship recognition through parallel computation of feature data and image fusion.
Currently, several techniques are available to improve ship recognition models. Among them, constructing datasets that include rich ship features is an effective method for enhancing recognition performance [21]. In addition, ship detection performance can be improved by integrating feature augmentation and key point extraction into the recognition algorithms [22]. Furthermore, through the design of feature enhancement modules and small-target attention mechanisms, substantial improvements in the recognition of small-scale ships can be achieved [23,24]. In certain scenarios, it is necessary to consider the computational limitations of embedded devices, thus requiring the design of efficient lightweight algorithms. Yin et al. [25] addressed this challenge by introducing depthwise separable convolutions and point-wise group convolutions, resulting in a lightweight feature extraction network that balances the accuracy and inference speed of ship recognition.
Based on the above analysis, it can be concluded that leveraging deep learning techniques to design feature networks and prediction structures significantly enhances the effectiveness of ship recognition. Furthermore, compared to traditional methods, deep learning approaches exhibit prominent advantages. Given the presence of numerous small-scale targets in marine radar images, this study proposes a ship segmentation algorithm that combines deep convolutional networks, feature attention mechanisms, and multi-scale feature fusion structures. By integrating different hierarchical features from the images, this algorithm effectively suppresses interference and enables accurate localization and precise segmentation of different types of ships.

3. A Proposed Method

The overall framework of the ship segmentation model for marine radar images proposed in this paper is illustrated in Figure 1. The model consists of components such as the feature extraction network, feature fusion network, and prediction structure.
To be specific, the feature network, called FasterYOLO, adopts a novel convolutional structure that combines efficient convolutional units and adaptive attention mechanisms. The FasterYOLO network is composed of four types of structures, namely FTN (FasterNet) [26], CBS, SimAM (Simple, Parameter-Free Attention Module) [27], and SPPF (Spatial Pyramid Pooling-Fast), which are concatenated in a sequential manner. The FTN structure, an efficient convolutional module, is detailed in Section 3.1.2. CBS, a commonly employed structure in YOLOv5 [28], consists of conventional convolutions, batch normalization layers, and the SiLU function. Additionally, SimAM, a parameter-free attention module, is introduced in Section 3.1.3. Finally, SPPF, a widely adopted feature enhancement method, is utilized. The network reduces redundant computations and improves the extraction of spatial features of the targets, facilitating the model’s understanding of the hierarchical pixel and semantic information of ships.
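For orientation, the CBS unit mentioned above (conventional convolution, batch normalization, and the SiLU activation) can be expressed in a few lines of PyTorch. The kernel size, stride, and channel counts below are illustrative defaults rather than the exact MrisNet configuration, so this is a minimal sketch of the building block, not the authors' implementation.

```python
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU block, as used throughout YOLOv5-style networks.

    Kernel size, stride, and channel counts are illustrative assumptions,
    not the exact MrisNet settings.
    """
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```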
The feature fusion network, CPFPN (Context-sensitive Processing within Feature Pyramid Network), integrates a novel convolutional enhancement method known as the CoT (Contextual Transformer) module, based on the Transformer pattern [29]. CPFPN employs a dual-path feature fusion strategy, combining both bottom-up and top-down approaches, to facilitate the integration of multi-scale features. Notably, the CoT module enhances contextual information of ship targets. It captures global and local dependencies in the feature sequences, thereby avoiding feature loss and degradation.
Finally, in the prediction structure of the proposed algorithm, three object detection branches are designed to target ships at different scales, along with an instance segmentation branch, enabling accurate ship localization and pixel-level segmentation. In particular, the EIoU (Enhanced Intersection over Union) loss [30] is introduced to optimize the loss calculation for predicted bounding box positions. This method improves the convergence speed and accuracy of the predicted boxes, enabling the model to learn feature representations with better generalization ability.

3.1. Feature Extraction Network

As mentioned earlier, marine radar images exhibit unique characteristics such as noise and low resolution, making them significantly different from natural images. Classic feature networks like ResNet-50, ResNet-101, and SENet are typically designed for general natural datasets and may not adapt well to the distinctive image features of radar images, resulting in suboptimal feature extraction performance. Furthermore, these networks often contain a large number of redundant parameters, which can lead to overfitting issues when applied to ship segmentation in marine radar images. Therefore, employing lightweight customized feature networks may be more suitable for such tasks.
Furthermore, due to the generally small-scale of ships in marine radar images, the feature network should possess the ability to perceive small objects. Moreover, to capture different signature features of ships, the feature network should employ suitable convolutional structures and downsampling ratios to obtain feature representations of targets at different scales, thereby enhancing the accuracy of ship segmentation. Based on the aforementioned analysis, the overall framework of the proposed feature extraction network is illustrated in Figure 2. This network is an adaptive neural network that effectively extracts ship features at various levels and achieves more prominent performance.

3.1.1. YOLOv5(S) Network

In this research, we selected YOLOv5(S) [28] as the baseline feature network. Despite its relatively smaller convolutional scale, YOLOv5(S) is still able to perform effectively. The network is illustrated in Figure 3. YOLOv5(S) utilizes a series of downsampling layers to progressively extract image features, adapting to variations in target scale. It employs different sizes of convolutional kernels and pooling layers to design receptive fields at different levels, capturing multi-scale features of the targets while maintaining a lower number of convolutional parameters and computational complexity. This contributes to improving the accuracy and robustness of ship segmentation, particularly enhancing the recognition of small-scale ships in radar images. It is important to note that YOLOv5(S) has fewer convolutional layers and parameters, as well as lower computational complexity; this lightweight design makes it more suitable for instance segmentation tasks in resource-constrained environments.

3.1.2. The Main Architecture of FasterYOLO Network

In marine radar images, image sequences often contain ship regions of various sizes and shapes, with the majority of ships being relatively small in scale. Therefore, it is crucial to capture key ship features with high sensitivity. Additionally, the feature network should possess noise resistance and robustness. To address these requirements, efficient convolutional computation units can be employed to enhance network performance. This paper introduces a simple and fast convolutional method, combining standard convolutions and pointwise convolutions, to design a more efficient feature network.
With the advancement of CNNs, classical computation methods represented by standard convolutions often suffer from redundant calculations, resulting in inefficient increases in model parameters and computational costs. Moreover, a significant amount of ineffective convolutional computation can impact the extraction of crucial features, especially in ship recognition under radar images, where targets are small in scale and feature information is limited. Excessive convolutional calculation can lead to overfitting issues. Research has revealed that convolutional feature maps exhibit high similarity across different channels, and standard convolution unavoidably duplicates the extraction of image features in per-channel computation. Therefore, simplifying the convolution to reduce redundant feature computation not only maintains network performance but also greatly enhances computational speed and efficiency. Specifically, this paper proposes a network structure called FasterYOLO, which incorporates the FasterNet modules [26] after standard convolutions, as illustrated in Figure 4. This improvement aims to enhance the model's ability to extract key ship features and reduce the risk of overfitting.
Compared to conventional convolutional networks, integrating the FasterNet modules in the feature network significantly reduces the computational cost of convolutions. The FasterNet employs PConv (Partial Convolution) to optimize per-channel convolutional computations. In practical terms, PConv convolution is a convolutional structure with multiple branches and spatial feature extraction capabilities. It utilizes partial convolution operations, which means it performs conventional convolutions only on a subset of channels in the input feature maps, thereby reducing the computational workload. Additionally, PConv incorporates a parallel branch structure that performs convolutional operations on different spatial positions of the input feature maps, capturing more spatial feature information.
Figure 5 illustrates a brief comparison of the computational processes of PConv, standard convolution, and depthwise convolution [26]. Depthwise convolution, a classical convolutional variant widely used in various neural networks, has shown significant effectiveness in the MobileNet series. For an input $I \in \mathbb{R}^{c \times h \times w}$, we apply $c$ filters $w \in \mathbb{R}^{k \times k}$ to compute the output $O \in \mathbb{R}^{c \times h \times w}$. In this case, the computational cost of standard convolution is $h \times w \times k^2 \times c^2$, while depthwise convolution only incurs a cost of $h \times w \times k^2 \times c$. It can be observed that depthwise convolution is effective in reducing computational cost. However, it directly leads to a decrease in recognition accuracy and therefore cannot directly replace standard convolution. On the other hand, PConv applies regular convolution to only a subset of $c_p$ input channels, resulting in a computational cost of $h \times w \times k^2 \times c_p^2$. When the partial ratio $r = c_p / c$ is 1/4, the computational cost of PConv is merely 1/16 of that of standard convolution. Consequently, PConv greatly simplifies the computation of standard convolution while maintaining identical input and output dimensions. Without altering the network hierarchy, standard convolution can be directly replaced with PConv.
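The following PyTorch sketch illustrates the partial convolution idea under the assumption of a partial ratio $r = c_p/c = 1/4$: the untouched channels are simply passed through and concatenated back, which is why the cost scales with $c_p^2$ rather than $c^2$. This is a simplified illustration, not the FasterNet reference implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: run a regular k x k convolution over the first
    c_p = r * c channels only and pass the remaining channels through untouched.

    r = 1/4 follows the discussion above and is an assumed default.
    """
    def __init__(self, channels, k=3, r=0.25):
        super().__init__()
        self.c_p = max(1, int(channels * r))
        self.conv = nn.Conv2d(self.c_p, self.c_p, k, padding=k // 2, bias=False)

    def forward(self, x):
        x_conv, x_pass = x[:, :self.c_p], x[:, self.c_p:]
        return torch.cat((self.conv(x_conv), x_pass), dim=1)

# Shape check: a 64-channel feature map keeps its dimensions after PConv.
# PConv(64)(torch.randn(1, 64, 80, 80)).shape  # -> torch.Size([1, 64, 80, 80])
```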

3.1.3. Feature Enhancement Mechanism Based on SimAM Attention

For the feature network, it is crucial to fully leverage the contextual information surrounding ships in radar images to enhance the accuracy of ship segmentation. To fulfill this requirement, attention mechanisms can be introduced or convolutional structures with global contextual awareness can be utilized. By capturing the correlations between ships and their surrounding environments, the network can better comprehend the shape, contour, and semantic information of ships, thereby enhancing the effectiveness of ship segmentation.
In this research, the SimAM method [27], which is constructed based on principles derived from visual neuroscience, is appended after the FasterYOLO network. According to the theory of visual neuroscience, neurons carrying more information are more salient compared to their neighboring neurons when processing visual tasks, and thus, they should be assigned higher weights. In ship segmentation, it is equally important to enhance the neurons in the convolutional network that are responsible for extracting crucial ship features. SimAM captures both spatial and channel attention simultaneously and possesses spatial inhibition capabilities that are translation-invariant. Unlike methods such as SENet and CBAM that focus on designing attention mechanisms through pooling and fully connected layers, SimAM evaluates the importance of each feature based on an energy function derived from neuroscientific principles. It offers better interpretability and does not require the introduction of additional learnable parameters. Consequently, SimAM effectively extracts and enhances salient information of ships in marine radar images. Specifically, SimAM evaluates each neuron in the network by defining an energy function based on linear separability, as shown in Equations (1)–(4).
$$e_t^* = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \tag{1}$$
$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i \tag{2}$$
$$\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M} (x_i - \hat{\mu})^2 \tag{3}$$
$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{e_t^*}\right) \odot X \tag{4}$$
Among them, $t$ is the target neuron, $x_i$ denotes an adjacent neuron, $M$ is the number of neurons, and $\lambda$ is a hyperparameter. A lower energy value $e_t^*$ indicates higher discriminability between the neuron and its neighboring neurons, implying a higher level of importance for that neuron. In Equation (4), we weight the importance of neurons using $1/e_t^*$. The introduction of SimAM enables the feature network to comprehensively assess feature weights, thereby enhancing the representation of crucial ship information and reducing reliance on prior information regarding significant variations in the shape of the targets.
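Equations (1)–(4) translate almost directly into a parameter-free weighting of the feature map. The sketch below follows that formulation, with $\lambda = 10^{-4}$ assumed as a typical default rather than the value used in this study.

```python
import torch

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention following Equations (1)-(4).

    x: feature map of shape (B, C, H, W); lam is the lambda hyperparameter
    (1e-4 is an assumed default). Neurons with lower minimal energy e_t*
    receive larger weights via sigmoid(1 / e_t*).
    """
    _, _, h, w = x.shape
    n = h * w - 1                                   # neighboring neurons per channel
    mu = x.mean(dim=(2, 3), keepdim=True)           # per-channel mean (Eq. 2)
    d = (x - mu) ** 2
    var = d.sum(dim=(2, 3), keepdim=True) / n       # per-channel variance (Eq. 3)
    inv_energy = d / (4 * (var + lam)) + 0.5        # 1 / e_t* derived from Eq. 1
    return x * torch.sigmoid(inv_energy)            # weighting of Eq. 4
```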

3.2. Feature Fusion Network

In marine radar images, the majority of ships exhibit relatively small scales. Therefore, developing an efficient convolutional architecture to capture fine and effective feature representations of ships becomes crucial. In convolutional networks, shallow convolutional features possess higher resolution, providing more detailed spatial information that aids in the recognition of object edge structures. Moreover, shallow features are more robust to image noise and lighting variations. However, shallow features exhibit limited adaptability to transformations such as translation, scaling, and rotation in images. In contrast, deep convolutional features have lower resolution but contain stronger multi-scale and semantic information, which helps filter out the effects of noise and lighting variations in images. However, deep features have relatively weaker capability in extracting fine-grained details from images. Hence, this paper proposes a more efficient feature fusion network that fully integrates feature information extracted from different scales and depths of receptive fields in the images.
Currently, object detection or instance segmentation algorithms based on deep neural networks commonly employ feature pyramid structures to address the challenge of scale variation. Among them, FPN serves as the most widely used feature structure, delivering more adaptive results in both single-stage and two-stage algorithms. FPN realizes the fusion of features at different scales through a top-down feature propagation path. However, high-level features need to undergo multiple intermediate-scale convolutions and be fused with features at these scales before merging with low-level features. In this process, the semantic information of high-level features may be lost or degraded. In contrast, dual-path fusion structures such as PANet compensate for the shortcomings of FPN in preserving high-level features but also introduce the opposite problem, where the detailed information of low-level features may be degraded during fusion. To address this issue, this paper introduces an attention mechanism that integrates rich feature information into the PANet network. This mechanism adaptively learns the importance of different regions in the feature map and weights the fusion of features at different scales, resulting in more comprehensive feature representations. Based on these theoretical considerations, we refer to the proposed feature fusion network as CPFPN, and its overall structure is illustrated in Figure 6.
Specifically, in this research, we employ an attention mechanism based on CoT [31] to extract crucial features of ships in radar images. This mechanism fully utilizes the contextual information of the input data and enhances the expressive power of key features by learning a dynamic attention matrix. In comparison to traditional self-attention mechanisms, the CoT provides a more comprehensive treatment of contextual information. Traditional self-attention mechanisms only interact information in the spatial domain and independently learn correlation information, thereby overlooking rich contextual information among adjacent features and limiting the self-attention learning capability of feature maps. In terms of specific implementation, the CoT module integrates two types of contextual information about the image. Firstly, the input data is encoded through convolutional operations to capture contextual information and generate static contextual representations. Next, dynamic contextual representations are obtained through concatenation and consecutive convolutions. Finally, the static and dynamic contextual representations are fused to form the final output.
In the context of marine radar images, the CoT module effectively captures global and local contextual features of targets by encoding the input feature maps with contextual information. By leveraging contextual information, the network is able to extract more diverse and meaningful features, thereby obtaining additional information about the background surrounding the ships. Furthermore, CoT adaptively allocates attention weights to different features, enabling the model to flexibly acquire the most relevant and salient ship characteristics from the feature maps. Moreover, CoT exhibits the capability to directly substitute the standard 3 × 3 convolutional structure, facilitating its seamless integration into other classical convolutional networks.
The convolutional process of the CoT module, as depicted in Figure 7, operates on the input feature map $X \in \mathbb{R}^{H \times W \times C}$, with keys ($K = X$), queries ($Q = X$), and values ($V = X W_v$) associated with the convolutional feature map [31]. Unlike traditional self-attention methods, the CoT module employs $k \times k$ groups of convolutional operations to extract contextual information. Through this process, the resulting $K^1 \in \mathbb{R}^{H \times W \times C}$ is further utilized as the static contextual representation of the input $X$.
Next, the previously obtained $K^1$ is concatenated with $Q$ and passed through two consecutive convolutional operations to compute the attention matrix $A$:
$$A = [K^1, Q] W_\theta W_\delta \tag{5}$$
Based on the given attention matrix $A$, the enhanced feature $K^2$ is generated and computed according to the following formula. By leveraging the enhanced feature $K^2$, the module becomes capable of capturing dynamic contextual representations of the input $X$. Ultimately, the fusion of these two contextual representations is achieved through a classical attention mechanism, resulting in the final output.
$$K^2 = VA \tag{6}$$
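A condensed sketch of the CoT block is given below, following the steps described above: a grouped $k \times k$ convolution yields the static context $K^1$, two consecutive $1 \times 1$ convolutions on $[K^1, Q]$ yield the attention matrix $A$, and $A$ modulates the values $V$ to produce the dynamic context $K^2$. The original module aggregates $A$ over local neighborhoods with a softmax and multiple heads; a simple sigmoid gate stands in for that step here, so this is an illustrative approximation rather than the exact CoT implementation.

```python
import torch
import torch.nn as nn

class CoTSketch(nn.Module):
    """Simplified Contextual Transformer block: static context K1 from a
    grouped k x k convolution, attention A from two 1 x 1 convolutions over
    [K1, Q], dynamic context K2 = gate(A) * V, output = K1 + K2.

    dim is assumed divisible by `groups`; the sigmoid gate replaces the local
    softmax aggregation of the original module, for illustration only.
    """
    def __init__(self, dim, k=3, groups=4):
        super().__init__()
        self.key_embed = nn.Sequential(                 # K1: static context
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=groups, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True))
        self.value_embed = nn.Sequential(               # V = X * Wv
            nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim))
        self.attn_embed = nn.Sequential(                # A from [K1, Q]
            nn.Conv2d(2 * dim, dim // 2, 1, bias=False),
            nn.BatchNorm2d(dim // 2), nn.ReLU(inplace=True),
            nn.Conv2d(dim // 2, dim, 1))

    def forward(self, x):
        k1 = self.key_embed(x)
        v = self.value_embed(x)
        a = self.attn_embed(torch.cat((k1, x), dim=1))
        k2 = torch.sigmoid(a) * v                       # dynamic context
        return k1 + k2                                  # fuse static + dynamic context
```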

3.3. Ship Prediction Module

As MrisNet maintains the fundamental structure of YOLOv5, there exist slight variations in the calculation of loss for object detection and instance segmentation. The comprehensive loss for object detection consists of three components, i.e., localization loss, confidence loss, and classification loss. These losses are individually multiplied by corresponding weights and ultimately summed to yield the overall loss. Notably, since this study focuses solely on a single category, the classification loss remains consistently zero. The object detection branch predominantly employs the CIoU (Complete Intersection over Union) [32] metric from the IoU series to compute the localization loss and utilizes binary cross-entropy loss for the calculation of the confidence loss.
In contrast to object detection, the comprehensive loss for instance segmentation comprises four components, i.e., localization loss, confidence loss, classification loss, and mask segmentation loss. Similarly, these losses are weighted and summed to yield the final calculation result for the overall loss. Specifically, the calculation process for localization loss, confidence loss, and classification loss is identical to that in the object detection branch. However, for mask segmentation loss, binary cross-entropy loss is predominantly employed to evaluate the disparity between predicted masks and ground truth masks.
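As a back-of-the-envelope sketch, the composite instance segmentation loss described above reduces to a weighted sum of the four terms; the weights below are placeholders, since the exact coefficients are not stated here.

```python
def total_segmentation_loss(loss_box, loss_obj, loss_cls, loss_mask,
                            w_box=0.05, w_obj=1.0, w_cls=0.5, w_mask=1.0):
    """Weighted sum of localization, confidence, classification, and mask losses.

    The weights are illustrative placeholders. With the single 'boat' category
    used in this study, loss_cls remains zero, as noted above.
    """
    return w_box * loss_box + w_obj * loss_obj + w_cls * loss_cls + w_mask * loss_mask
```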
The significance of the localization loss is apparent in both the object detection and instance segmentation branches. In the instance segmentation branch, specifically, the extraction of predicted masks is restricted to the scope of predicted bounding boxes. The predicted bounding boxes, generated through regression using the localization loss, form the basis for calculating the mask segmentation loss. Consequently, this research optimized the calculation process of the localization loss in the instance segmentation branch while preserving the standard calculation of other losses.
Therefore, we introduce a novel evaluation method, referred to as EIoU, as the computation standard for the target’s position loss, aiming to further enhance the localization accuracy of ship bounding box predictions. The loss function takes into account multiple factors such as IoU loss, distance loss, and width-height loss, fully considering the relationship between the predicted box and the ground truth box. This leads to a more stable gradient during the algorithm training. Moreover, the EIoU loss helps the model better understand the spatial distribution and relative positions of the targets, which is crucial for achieving high-precision ship segmentation. Specifically, the EIoU loss is defined as follows:
$$Loss_{EIoU} = 1 - \left( IoU - \frac{\rho^2(b^{gt}, b)}{c^2} - \frac{\rho^2(h^{gt}, h)}{c_h^2} - \frac{\rho^2(w^{gt}, w)}{c_w^2} \right) \tag{7}$$
$$IoU = \frac{S_{pr} \cap S_{gt}}{S_{pr} \cup S_{gt}} \tag{8}$$
wherein $c$ represents the diagonal length of the minimum enclosing rectangle of the predicted box and the ground truth box; $\rho(b^{gt}, b)$ denotes the Euclidean distance between the center points of the two boxes (i.e., the ground truth box and the predicted box); $\rho(h^{gt}, h)$ represents the difference in height between the two boxes; $\rho(w^{gt}, w)$ represents the difference in width between the two boxes; $c_h$ and $c_w$ are the height and width of the minimum enclosing rectangle, respectively; $S_{gt}$ represents the area of the ground truth box; $S_{pr}$ represents the area of the predicted box; and the other key indicators are illustrated in Figure 8.
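For reference, the EIoU loss above can be written out as follows. This is a minimal sketch assuming axis-aligned boxes in (x1, y1, x2, y2) format, with small epsilon terms added purely for numerical stability.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format.

    Implements 1 - IoU plus center-distance, width, and height penalties,
    each normalized by the minimum enclosing box, per Equations (7) and (8).
    """
    # Intersection and union areas
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Minimum enclosing box and its squared diagonal
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Center-distance, width, and height penalties
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])

    return 1 - iou + rho2 / c2 + dw ** 2 / (cw ** 2 + eps) + dh ** 2 / (ch ** 2 + eps)
```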
In this section, we compare the EIoU loss with the classical DIoU [32], CIoU, and SIoU [33] losses and illustrate their computational processes (refer to Figure 8). Specifically, unlike the IoU metric, which solely focuses on the overlapping region, DIoU considers both the distance and the area overlap between the ground truth box and the predicted box, thereby enhancing the regression stability of the predicted boxes. CIoU not only takes into account the positional information and localization errors of the predicted box but also introduces shape information to more accurately measure the target accuracy. However, when the aspect ratios of the two boxes are small, there may be issues of gradient explosion or vanishing. Building upon CIoU, SIoU further considers the angle of the regression vector between the two boxes by defining an angle penalty vector, which encourages the predicted box to quickly converge to the nearest horizontal or vertical axis. However, SIoU defines multiple different IoU thresholds, which presents challenges in the practical training and evaluation of the model. In comparison, the EIoU function exhibits stronger interpretability in capturing the variation of loss during the regression of the predicted box. This leads to more accurate measurement of position, shape, distance, and width-height losses for small-scale ship predicted boxes in radar images, thereby facilitating the training and convergence of the algorithm.

4. A Case Study

4.1. Dataset

A high-quality dataset can significantly improve the empirical outcomes of CNN-based algorithms. Typically, the performance of the model is heavily reliant on the quality and quantity of the training data. A good dataset should encompass multiple scenarios and situations. This research conducted pre-processing on the real data obtained from the JMA5300 marine radar deployed at Zhoushan Port in Zhejiang, resulting in the creation of a high-quality marine radar image dataset named RadarSeg. The dataset comprises 1280 images. As shown in Figure 9, ships in the images are mainly classified into two categories, i.e., long-wake ships and short-wake ships. To be specific, long-wake ships have distinct features that are easy to extract, while the pixel features of short-wake ships are similar to those of interference sources such as reefs, which can hinder the instance segmentation of ships. Moreover, the RadarSeg dataset covers complex background environments, including different weather conditions, harbor environments, and imaging conditions. Additionally, the dataset takes into account factors such as variations in ship heading and traffic flow, and it includes an increased proportion of images depicting small and miniature ships. Considering that different types of ships exhibit similar spot features in marine radar images, all ships in the dataset are uniformly labeled with the “boat” category.
In the RadarSeg dataset, each image has been annotated with ship type labels and tightly fitting bounding boxes along the target edges. These annotations were saved in JSON format and preprocessed to meet the requirements of the YOLO algorithms. We divided all the images into training, validation, and testing sets at an 8:1:1 ratio. The various algorithms were trained on the training and validation sets, and the evaluation metrics and algorithm performance were assessed on the testing set. Additionally, cluster analysis revealed that the pixels occupied by ships account for approximately 0.035% of an entire image. Thus, in marine radar images, ships are predominantly small-scale or even miniature targets.
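A minimal sketch of the 8:1:1 partition described above is given below; the shuffling seed and the assumption of a simple random split are illustrative, as the exact partitioning procedure is not detailed.

```python
import random

def split_radarseg(image_paths, seed=0):
    """Split RadarSeg image paths into training, validation, and testing sets
    at the 8:1:1 ratio described above. The random seed is an arbitrary choice."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(0.8 * len(paths))
    n_val = int(0.1 * len(paths))
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```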

4.2. Training Optimization Methods

This research optimized the training process of the proposed algorithm by improving aspects such as learning rate decay, loss calculation, and data augmentation. Traditional neural network-based algorithms usually employ a series of fixed learning rates for training, which may result in abrupt learning rate decay at different training stages, leading to unstable changes in model momentum and negatively affecting training effectiveness. To address this issue, the Adam optimizer was introduced, which adaptively adjusts the learning rate based on the magnitude of parameter gradients without the need for manual tuning. Additionally, label smoothing was introduced to enhance the generalization performance of the ship segmentation model; this approach reduces the model's reliance on noisy or uncertain information in the training data. Furthermore, data augmentation methods, including random cropping and horizontal flipping, were employed to increase the diversity of the training data and improve the model's robustness and generalization abilities.
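The sketch below collects the training choices described in this subsection together with the hyperparameters reported in Section 4.3; the label-smoothing factor, augmentation probabilities, and the placeholder model are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# Placeholder network; substitute the actual MrisNet model here.
model = nn.Conv2d(3, 16, 3)

# Adam adapts per-parameter learning rates from gradient statistics, avoiding the
# staged manual learning-rate schedules discussed above. The initial learning rate
# of 1e-4 follows Section 4.3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def smooth_binary_labels(targets, eps=0.1):
    """Label smoothing for binary confidence/mask targets (eps is an assumed factor)."""
    return targets * (1.0 - eps) + 0.5 * eps

# Random cropping and horizontal flipping, as described above; the crop scale
# range and flip probability are assumptions.
augment = T.Compose([
    T.RandomResizedCrop(640, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
])
```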

4.3. Experimental Environment and Training Results

The experimental setup relied on the Ubuntu 20.04 operating system and an NVIDIA RTX 3090 graphics card with 24 GB of memory. The CUDA version was 11.1.0, the PyTorch version was 1.9.1, and the Python version was 3.8. We compared the performance of different algorithms on ship segmentation, including the YOLO series and its improved variants, classical two-stage algorithms, a hierarchical algorithm based on the Transformer, and MrisNet. All algorithms were evaluated using the same dataset of ship images. During algorithm training, the input image size was set to 640 × 640 pixels, the momentum was set to 0.9, the batch size was set to 8, and the training was conducted for 800 epochs with an initial learning rate of 10⁻⁴.
Under the aforementioned settings, the training of the various algorithms and the related experiments were carried out. Among them, YOLOv5(S), YOLOv8(S) [34], and MrisNet achieved convergence of the loss values on the training set to 14.91 × 10⁻³, 15.09 × 10⁻³, and 14.52 × 10⁻³, respectively. The experimental results indicated that, when evaluated using the same metrics, MrisNet exhibited lower convergence loss than the standard YOLO algorithms. This illustrates that MrisNet is capable of more accurately identifying ship pixel-level features in radar images. Furthermore, through the analysis of Figure 10, it can be concluded that, under different threshold conditions, MrisNet achieves favorable experimental results across various evaluation metrics, further substantiating its ability to minimize misjudgments and omissions in ship segmentation while accurately locating and identifying targets.

4.4. Comparisons and Discussion

Due to the monolithic nature of the dataset utilized in this research, which consists solely of a single ship category, the calculation of mean Average Precision (mAP) directly corresponds to the Average Precision (AP) value of that specific class. Thus, we selected Recall (R), Precision (P), mAP50, and mAP50-95 as metrics to measure the effectiveness of ship segmentation and object detection. Additionally, we evaluated the convolution parameter count and computational cost using parameters (PARAMs) and floating-point operations (FLOPs). Furthermore, five sets of experiments were designed to assess the practical performance of MrisNet. Firstly, comparative experiments were designed to assess the actual performance of various algorithms across different evaluation metrics, thereby validating the effectiveness of the MrisNet. Moreover, ablation experiments were conducted to analyze the individual improvements in MrisNet and verify the specific effects of different methods. Finally, the adaptability of the MrisNet for ship segmentation in marine radar images was examined through the identification of different categories of radar images.

4.4.1. Experimental Analysis of Different Algorithms

On the constructed RadarSeg dataset, a comparison was conducted between several commonly used standard algorithms and the proposed MrisNet. All the algorithms were trained using the same hyperparameters and tested on the same dataset. Moreover, each algorithm underwent cross-testing, and the Recall, Precision, and mAP values were averaged over three experimental trials. As shown in Table 1 and Table 2, for ship segmentation, our proposed method achieved a recall rate of 94.8% and a precision of 95.2% on the testing images. Compared to other algorithms, our method demonstrated satisfactory experimental results. This is attributed to MrisNet's enhanced capability to accurately identify densely navigating and small-scale ships in radar images. Furthermore, MrisNet demonstrated remarkable performance in object detection, achieving a recall rate of 98% and a precision of 98.6%. This highlights the beneficial ship localization capability of our proposed method. The results also indicate that, in the context of ship instance segmentation, MrisNet surpasses other algorithms with mAP50 and mAP50-95 scores of 0.96 and 0.508, respectively. These findings underscore the notable accuracy of MrisNet in recognizing diverse ship instances. Additionally, MrisNet showed desirable performance in terms of parameter count and real-time computational cost, with values of 13.8 M and 23.5 GFLOPs, respectively. Compared to the standard YOLOv7 [35], MrisNet achieved reductions of 61.77% and 83.44% in model parameters and computational cost, respectively. These results indicate that MrisNet is more suitable for deployment on edge computing devices.
To evaluate the practical performance of MrisNet, multiple single-stage, two-stage, and contour-based instance segmentation algorithms were employed for comparison. The experimental results demonstrate that MrisNet outperforms single-stage algorithms such as YOLACT [36] and SOLOv2 [37] in terms of the core evaluation metrics. This indicates that MrisNet, through its adaptive design in network architecture, loss function, and feature enhancement, extracts ship features from radar images more precisely, resulting in accurate localization and segmentation of ship instances. Owing to the introduction of an additional segmentation branch outside the object detection framework, Mask R-CNN achieves relatively robust precision and segmentation quality. The experimental data reveal that, for ship segmentation, this algorithm surpasses 93% in both accuracy and recall rate. Additionally, it achieves mAP50 and mAP50-95 rates of 0.952 and 0.495, respectively, which are very close to the experimental results of MrisNet. However, due to its high parameter count and computational demands, it exceeds the computational capacity of most maritime devices, making it unsuitable for direct deployment on such devices. Additionally, Deepsnake [38], which combines deep learning with the Snake algorithm from active contour models, enables real-time instance segmentation in various common scenarios. However, in marine radar images, Deepsnake achieves a segmentation precision of only 91.7%, significantly lower than the proposed method. The analysis suggests that the fine and small ship contours in radar images do not provide sufficient information for accurate learning of vertex offsets, thereby affecting the segmentation accuracy and recall of targets.
The Swin Transformer [39], a novel recognition algorithm based on the evolution of the Transformer architecture, achieves a recall of 94% and a precision of 93.7% in ship segmentation. Moreover, the method demonstrates encouraging performance in both the mAP50 and mAP50-95 metrics, outperforming the majority of conventional benchmark algorithms. Compared to MrisNet, it experiences a decrease of 0.8% in recall and 1.5% in precision. The analysis suggests that, due to the limited pixel information of ships in the images and the presence of significant background noise, the fixed-size image patches used in the Swin Transformer may struggle to effectively capture and represent targets of different scales. As a result, the algorithm shows relatively underwhelming performance in recognizing different types of ships and fails to extract detailed ship features accurately, making it difficult to distinguish ships from interfering objects.
Due to the scarcity of research focused on ship segmentation using CNN or Transformer techniques in the context of marine radar images, we selected two instance segmentation algorithms specifically designed for SAR images, namely HTC+ [12] and SA R-CNN [13], to compare against our proposed method, MrisNet, and validate the effectiveness of our approach. As previously discussed, the features of targets in SAR images bear resemblance to the spot-like information of ships in marine radar images. The experimental results indicated that the two modified two-stage algorithms, when applied to radar images, exhibited subpar performance. While they demonstrate relatively suitable performance in terms of object detection, their precision, recall, and mAP metrics in ship segmentation are noticeably weaker than those of MrisNet. This discrepancy can be attributed to the fact that ships in SAR images predominantly represent static targets in harbor areas, which starkly contrasts with the characteristics of ships in radar images. Hence, it is likely that algorithms designed for SAR environments may not be well-suited to the scenarios addressed in our research.
Compared to the standard YOLOv5, YOLOv7, and YOLOv8 series, MrisNet exhibits relatively robust segmentation precision and recall, surpassing various YOLO algorithms, including the relatively high-performing YOLOv5(S). Moreover, MrisNet has significantly fewer parameters and lower computational costs than many deep-layer algorithms in the YOLO series. Additionally, in terms of recall, MrisNet improves by 9.7% and 1.5% over the lightweight YOLOv8(S) and YOLOv5(S), respectively. Moreover, the method significantly outperforms these two classical lightweight algorithms in both the mAP50 and mAP50-95 metrics. This indicates that MrisNet misses fewer ships in marine radar images and shows improved efficiency throughout the experiments. Furthermore, the proposed method demonstrates noteworthy capability in suppressing false positives, accurately identifying interferences that resemble ship features, such as coastal objects, reefs, and clouds. This effectively reduces misidentification rates and enhances the precision of ship segmentation in various scenarios.
To investigate the influence of different levels of image features on ship segmentation, we combined a progressive feature pyramid with MrisNet. This network integrated multi-scale features through a layer-by-layer concatenation approach, preserving deep-level semantic information more effectively than the standard PANet. In the process of algorithm improvement, the feature network and prediction structure of MrisNet were retained, and the original PANet was replaced with the AsymptoticFPN [40] to construct the Mris_APFN algorithm for comparison. The experimental results demonstrated that Mris_APFN achieved a recall rate of 92.3% and a precision of 92%. In addition, the method attains modest scores of 0.935 and 0.452 in terms of the mAP50 and mAP50-95 metrics, respectively. Apparently, it falls short of MrisNet in all evaluation metrics, indicating that Mris_APFN suffers from more instances of missed ships and fails to effectively extract features relevant to ship instances. Further analysis suggests that deep-level features tend to focus more on abstract semantic information while being relatively weaker at extracting contour details. Therefore, for ship segmentation in radar images, it is necessary to appropriately restrict the influence weight of deep-level features to better express the salient features of the targets.

4.4.2. Ablation Experiments

To further validate the practical performance of each improvement method in MrisNet, a comprehensive decomposition analysis was conducted based on the RadarSeg dataset to analyze their impact on ship segmentation. The main experimental process involved step-by-step application of various improvement methods on the standard YOLOv5(S), followed by testing their respective performance metrics. The specific results of the ablation experiments for MrisNet are presented in Table 3.
(1) Analysis of the FasterYOLO network. After the feature network was replaced with FasterYOLO, the experimental results revealed that the improved feature network increased ship segmentation precision by 1.0%, with an improvement in recall as well. Furthermore, the network yields a substantial enhancement of 0.9% and 1.2% in the mAP50 and mAP50-95 metrics, respectively. This indicates that the algorithm's ability to suppress false targets has been enhanced, reducing the misidentification rate and decreasing the probability of target omissions. Furthermore, through multiple experiments, it was observed that FasterYOLO accelerates the convergence of the algorithm, leading to relatively rapid and stable convergence of the loss values on the training and validation sets.
(2) Analysis of SimAM. In this paper, a SimAM attention mechanism was appended after the FasterYOLO feature network, enabling the algorithm to weight the convolutional feature maps based on the principle of similarity. This allows the model to focus more on the relevant feature representations associated with the target's spatial location. As a result, the model exhibits enhanced robustness when faced with difficulties such as low illumination and occlusion in challenging scenarios. Based on the experimental results, the SimAM module enhances the salient features of ships, resulting in improvements in precision, recall, and mAP for the ship segmentation model.
(3) Analysis of the CPFPN network. As mentioned earlier, applying Transformer-based attention within the feature fusion network enables better modeling of the global contextual information in images. According to Table 3, compared with the standard PANet, CPFPN improves the recall and precision of ship segmentation by 0.5% and 0.3%, respectively, and raises the mAP50 and mAP50-95 metrics by 0.7% and 1.3%, yielding a clear improvement in overall ship recognition capability at the cost of a moderate increase in model parameters. In theory, the various interfering factors in radar images, together with the complexity and diversity of objects such as islands and reefs, make it difficult to distinguish ships from the background. By adaptively attending to target features and filtering out noise and redundant objects, the CPFPN network plays a crucial role in the instance segmentation of small-scale ships.
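To make the role of the contextual Transformer attention more concrete, the sketch below outlines a CoT-style block of the kind used in CPFPN (cf. Figure 7 and [31]); the channel width, grouping, and reduction factor are illustrative assumptions and not the exact configuration of MrisNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoTAttention(nn.Module):
    """Contextual Transformer block sketch: a k x k grouped conv extracts static
    context from the keys, which is fused with the query to produce a dynamic
    attention map over the values; static and dynamic contexts are then summed."""
    def __init__(self, dim: int = 256, kernel_size: int = 3, factor: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        self.key_embed = nn.Sequential(            # static context via grouped conv
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=4, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True))
        self.value_embed = nn.Sequential(
            nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim))
        self.attn_embed = nn.Sequential(           # dynamic attention from [static, query]
            nn.Conv2d(2 * dim, 2 * dim // factor, 1, bias=False),
            nn.BatchNorm2d(2 * dim // factor), nn.ReLU(inplace=True),
            nn.Conv2d(2 * dim // factor, kernel_size * kernel_size * dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k_static = self.key_embed(x)                              # static contextual keys
        v = self.value_embed(x).view(b, c, -1)
        attn = self.attn_embed(torch.cat([k_static, x], dim=1))
        attn = attn.view(b, c, self.kernel_size ** 2, h, w).mean(dim=2).view(b, c, -1)
        k_dynamic = (F.softmax(attn, dim=-1) * v).view(b, c, h, w)
        return k_static + k_dynamic                               # fuse static and dynamic context
```

In this view, the static branch captures neighborhood appearance around each ship spot, while the dynamic branch weights the values according to the joint key-query context, which is what allows the fusion network to suppress island- and reef-like clutter.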
(4) Analysis of EIoU loss. In neural network-based algorithms, the position loss function helps the model accommodate variations in target position and scale. The experimental results indicate that the EIoU loss improves ship segmentation precision by approximately 0.6%, with improvements also observed in the other evaluation metrics. To further validate the effectiveness of EIoU, the SIoU and DIoU functions were also compared in this section; in theory, these two loss functions can likewise accelerate model convergence. The models employing these comparative functions achieved reasonable results, but remained below EIoU on all evaluation metrics. This analysis suggests that a position loss that jointly accounts for spatial and shape factors performs better for ship segmentation in radar images.
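For clarity, a minimal sketch of the EIoU position loss following [30] is given below; the corner-format box convention, epsilon value, and mean reduction are assumptions made for illustration.

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU sketch for (x1, y1, x2, y2) boxes: 1 - IoU plus a center-distance
    term and separate width and height penalty terms."""
    # overlap and IoU
    xi1, yi1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    xi2, yi2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (xi2 - xi1).clamp(min=0) * (yi2 - yi1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box dimensions
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    # squared distance between box centers
    rho2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 + \
           ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4
    # width / height differences: the terms EIoU adds on top of DIoU
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])
    loss = 1 - iou + rho2 / (cw ** 2 + ch ** 2 + eps) \
           + dw ** 2 / (cw ** 2 + eps) + dh ** 2 / (ch ** 2 + eps)
    return loss.mean()
```

Penalizing width and height errors separately, rather than through an aspect-ratio term as in CIoU, is what lets the regression adapt to the elongated, variable-length ship trails in radar images.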

4.4.3. Comparisons in Radar Images

We present in Figure 11 the ship segmentation results of MrisNet in marine radar images under different scenarios, focusing on ships with various trail features. From Figure 11a, it can be observed that MrisNet accurately segments ships with long trails in different water areas, demonstrating the algorithm's ability to recognize ships with prominent features, as these targets maintain a certain consistency in position and shape across consecutive frames. In radar images, longer ship trails often exhibit trajectory interruptions or significant curvature, which poses a considerable challenge to the algorithm's adaptability; the experimental results show that MrisNet avoids the degradation of segmentation precision caused by these two situations. Figure 11b reveals that MrisNet achieves satisfactory segmentation of short-trail ships against different backgrounds, without missing or misidentifying small-scale or even tiny-scale targets, indicating a solid capability to extract fine-grained object features and to mitigate interference from terrain, sea conditions, and clutter. Figure 11c demonstrates that MrisNet accurately segments dense ships in radar images and handles complex situations such as head-on or crossing trajectories. The analysis suggests that the adaptive attention mechanisms and efficient convolutional computations adopted in MrisNet enable it to capture salient ship features even in scenarios crowded with small targets.

4.4.4. Comparisons of Small-Scale Ship Segmentation

Instance segmentation of small objects has long been a prominent research direction in visual recognition. To evaluate the performance of different algorithms on small-scale ships, this section conducted experiments in two scenarios, i.e., crossing navigation and dense navigation. These scenarios involve a large number of ships with short trails, and factors such as islands, reefs, and atmospheric disturbances significantly affect model performance. To enhance the credibility of the findings, MrisNet was compared with the standard YOLOv5(L) and YOLOv8(L). As shown in Figure 12, YOLOv8(L) exhibited a higher misidentification rate but fewer omissions, whereas YOLOv5(L) was relatively poor at recognizing small-scale ships, producing more omissions and tending to misidentify ships traveling in opposite directions. Compared with these algorithms, MrisNet localized small-scale ships more accurately, segmented ship contours more finely, and achieved favorable segmentation accuracy for tiny objects. This result highlights that a lightweight algorithm with a rationally constructed network structure can effectively extract ship pixels and contour features from radar images, thereby significantly improving performance on small-scale instance segmentation tasks.

4.4.5. Ship Identification in Extreme Environments

To evaluate the capability of MrisNet for ship segmentation in extreme environments, a subset of extreme scenario images was constructed from the RadarSeg dataset. This subset consists of 200 images, divided into two identification scenarios, i.e., dense-ship and tiny-ship identification. For comparison, YOLACT, YOLOv8(S), and Mask R-CNN were tested on the same images with the same threshold settings. The experimental results in Table 4 indicate that MrisNet performs encouragingly: it achieves higher ship segmentation precision with fewer misidentifications, and also performs better in terms of recall. MrisNet therefore demonstrates satisfactory adaptability to ship segmentation in extreme scenarios.
Figure 13 showcases the instance segmentation results of MrisNet on radar images under various types of interference. The experimental images were extracted from the aforementioned subset of extreme scenario images, covering three common target types prevalent in radar images, i.e., short-trail ships, long-trail ships, and dense ships. The original radar images were corrupted with three types of noise, i.e., salt-and-pepper noise, Gaussian noise, and speckle noise. The study found that speckle and Gaussian noise had the greater impact on radar images, noticeably interfering with trail features: the positional precision of long-trail ships decreased, and small-scale ships risked being missed because their trail features became confounded with the image background. In contrast, salt-and-pepper noise had a minor impact on the effective ship features. Even under salt-and-pepper noise, MrisNet achieved precise segmentation of all ships. Under speckle and Gaussian noise, the confidence of ship segmentation by MrisNet decreased slightly, as did the positioning accuracy of ships in dense navigation scenarios; nevertheless, MrisNet maintained effective ship segmentation in the majority of scenarios.
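For readers reproducing this robustness test, the NumPy sketch below shows one plausible way to synthesize the three noise types; the function name and noise levels are illustrative assumptions and do not correspond to specific parameters reported here.

```python
import numpy as np

def corrupt(img: np.ndarray, kind: str = "gaussian",
            amount: float = 0.02, var: float = 0.01) -> np.ndarray:
    """Apply salt-and-pepper, Gaussian, or speckle noise to an 8-bit radar image
    (illustrative sketch; `amount` and `var` are assumed noise levels)."""
    x = img.astype(np.float32) / 255.0
    rng = np.random.default_rng()
    if kind == "salt_pepper":
        out = x.copy()
        mask = rng.random(x.shape[:2])
        out[mask < amount / 2] = 0.0                    # pepper pixels
        out[mask > 1 - amount / 2] = 1.0                # salt pixels
    elif kind == "gaussian":
        out = x + rng.normal(0.0, var ** 0.5, x.shape)  # additive Gaussian noise
    elif kind == "speckle":
        out = x + x * rng.normal(0.0, var ** 0.5, x.shape)  # multiplicative speckle noise
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```

The multiplicative form of speckle noise explains why it degrades trail features more than salt-and-pepper noise: the perturbation scales with the echo intensity of the trail itself.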

5. Conclusions and Discussion

This paper presents a ship segmentation algorithm, MrisNet, for marine radar images. MrisNet proposes a novel feature extraction network, FasterYOLO, which incorporates efficient convolutional units to accurately extract key ship features from radar images. A simple, parameter-free attention module is then introduced to infer three-dimensional attention weights for the feature maps, optimizing the deep feature output of the network and enhancing ship segmentation at a higher semantic level. In addition, the algorithm integrates a Transformer-based self-attention mechanism into the feature fusion network to model long-sequence image features more accurately, which helps distinguish ship instances from the background environment. In the prediction structure, the position loss of the predicted boxes is computed with the EIoU function, yielding more precise positional information for ships.
Experimental results demonstrate that MrisNet outperforms common algorithms for ship segmentation on marine radar images. MrisNet accurately segments long-trail and short-trail ships in various scenarios, with recall and precision reaching 94.8% and 95.2%, respectively, and it performs acceptably in scenarios involving tiny ships, dense navigation, and complex interference. Furthermore, MrisNet has advantages in model size and real-time computational cost: its parameter size of 13.8 M and computational cost of 23.5 G are significantly lower than those of the deeper YOLO variants, making it more suitable for deployment on maritime monitoring devices. Considering the relatively limited image samples, future research will focus on expanding the RadarSeg dataset to cover a wider range of ship navigation scenarios, and on introducing more interference factors to further enhance the model's robustness.

Author Contributions

Conceptualization, F.M., Z.K. and C.C.; methodology, F.M. and Z.K.; software, Z.K.; validation, Z.K.; formal analysis, F.M. and C.C.; investigation, J.S. and J.D.; resources, J.D.; data curation, J.S.; writing—original draft preparation, Z.K.; writing—review and editing, F.M. and Z.K.; visualization, Z.K.; supervision, F.M. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021YFB1600400, and the National Natural Science Foundation of China, grant number 52171352.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Access to the data will be considered upon request.

Acknowledgments

We are deeply grateful to our colleagues for their exceptional support in developing the dataset and assisting with the experiments.

Conflicts of Interest

Jie Sun is an employee of Nanjing Smart Water Transportation Technology Co., Ltd., and Jizhu Deng is an employee of Nanjing Port Co., Ltd. The remaining authors declare no conflicts of interest.

References

  1. Wei, Y.; Liu, Y.; Lei, Y.; Lian, R.; Lu, Z.; Sun, L. A new method of rainfall detection from the collected X-band marine radar images. Remote Sens. 2022, 14, 3600. [Google Scholar] [CrossRef]
  2. Li, B.; Xu, J.; Pan, X.; Chen, R.; Ma, L.; Yin, J.; Liao, Z.; Chu, L.; Zhao, Z.; Lian, J. Preliminary investigation on marine radar oil spill monitoring method using YOLO model. J. Mar. Sci. Eng. 2023, 11, 670. [Google Scholar] [CrossRef]
  3. Wen, B.; Wei, Y.; Lu, Z. Sea clutter suppression and target detection algorithm of marine radar image sequence based on spatio-temporal domain joint filtering. Entropy 2022, 24, 250. [Google Scholar] [CrossRef] [PubMed]
  4. He, W.; Ma, F.; Liu, X. A recognition approach of radar blips based on improved fuzzy c means. Eurasia J. Math. Sci. Technol. Educ. 2017, 13, 6005–6017. [Google Scholar] [CrossRef]
  5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  6. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  7. Chen, X.; Su, N.; Huang, Y.; Guan, J. False-alarm-controllable radar detection for marine target based on multi features fusion via CNNs. IEEE Sens. J. 2021, 21, 9099–9111. [Google Scholar] [CrossRef]
  8. Chen, X.; Mu, X.; Guan, J.; Liu, N.; Zhou, W. Marine target detection based on Marine-Faster R-CNN for navigation radar plane position indicator images. Front. Inf. Technol. Electron. Eng. 2022, 23, 630–643. [Google Scholar] [CrossRef]
  9. Wang, Y.; Shi, H.; Chen, L. Ship detection algorithm for SAR images based on lightweight convolutional network. J. Indian Soc. Remote Sens. 2022, 50, 867–876. [Google Scholar] [CrossRef]
  10. Li, S.; Fu, X.; Dong, J. Improved ship detection algorithm based on YOLOX for SAR outline enhancement image. Remote Sens. 2022, 14, 4070. [Google Scholar] [CrossRef]
  11. Zhao, C.; Fu, X.; Dong, J.; Qin, R.; Chang, J.; Lang, P. SAR ship detection based on end-to-end morphological feature pyramid network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4599–4611. [Google Scholar] [CrossRef]
  12. Zhang, T.; Zhang, X. HTC+ for SAR ship instance segmentation. Remote Sens. 2022, 14, 2395. [Google Scholar] [CrossRef]
  13. Zhao, D.; Zhu, C.; Qi, J.; Qi, X.; Su, Z.; Shi, Z. Synergistic attention for ship instance segmentation in SAR images. Remote Sens. 2021, 13, 4384. [Google Scholar] [CrossRef]
  14. Yang, X.; Zhang, Q.; Dong, Q.; Han, Z.; Luo, X.; Wei, D. Ship instance segmentation based on rotated bounding boxes for SAR images. Remote Sens. 2023, 15, 1324. [Google Scholar] [CrossRef]
  15. Shao, Z.; Zhang, X.; Wei, S.; Shi, J.; Ke, X.; Xu, X.; Zhan, X.; Zhang, T.; Zeng, T. Scale in scale for SAR ship instance segmentation. Remote Sens. 2023, 15, 629. [Google Scholar] [CrossRef]
  16. Sun, Y.; Su, L.; Yuan, S.; Meng, H. DANet: Dual-branch activation network for small object instance segmentation of ship images. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6708–6720. [Google Scholar] [CrossRef]
  17. Sun, Y.; Su, L.; Luo, Y.; Meng, H.; Li, W.; Zhang, Z.; Wang, P.; Zhang, W. Global Mask R-CNN for marine ship instance segmentation. Neurocomputing 2022, 480, 257–270. [Google Scholar] [CrossRef]
  18. Guo, M.; Guo, C.; Zhang, C.; Zhang, D.; Gao, Z. Fusion of ship perceptual information for electronic navigational chart and radar images based on deep learning. J. Navig. 2019, 73, 192–211. [Google Scholar] [CrossRef]
  19. Mao, D.; Zhang, Y.; Zhang, Y.; Pei, J.; Huang, Y.; Yang, J. An efficient anti-interference imaging technology for marine radar. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5101413. [Google Scholar] [CrossRef]
  20. Zhang, C.; Fang, M.; Yang, C.; Yu, R.; Li, T. Perceptual fusion of electronic chart and marine radar image. J. Mar. Sci. Eng. 2021, 9, 1245. [Google Scholar] [CrossRef]
  21. Dong, G.; Liu, H. A new model-data co-driven method for radar ship detection. IEEE Trans. Instrum. Meas. 2022, 71, 2508609. [Google Scholar] [CrossRef]
  22. Zhang, F.; Wang, X.; Zhou, S.; Wang, Y.; Hou, Y. Arbitrary-oriented ship detection through center-head point extraction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5612414. [Google Scholar] [CrossRef]
  23. Zhang, T.; Wang, W.; Quan, S.; Yang, H.; Xiong, H.; Zhang, Z.; Yu, W. Region-based polarimetric covariance difference matrix for PolSAR ship detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5222016. [Google Scholar] [CrossRef]
  24. Qi, X.; Lang, P.; Fu, X.; Qin, R.; Dong, J.; Liu, C. A regional attention-based detector for SAR ship detection. Remote Sens. Lett. 2022, 13, 55–64. [Google Scholar] [CrossRef]
  25. Yin, Y.; Cheng, X.; Shi, F.; Zhao, M.; Li, G.; Chen, S. An enhanced lightweight convolutional neural network for ship detection in maritime surveillance system. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5811–5825. [Google Scholar] [CrossRef]
  26. Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher flops for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031. [Google Scholar]
  27. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
  28. Jocher, G. YOLOv5 by Ultralytics. Available online: https://github.com/ultralytics/yolov5 (accessed on 21 September 2023).
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 2017; pp. 6000–6010. [Google Scholar]
  30. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  31. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500. [Google Scholar] [CrossRef] [PubMed]
  32. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [Google Scholar] [CrossRef]
  33. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
  34. Jocher, G. YOLO by Ultralytics. Available online: https://github.com/ultralytics/ultralytics (accessed on 21 September 2023).
  35. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  36. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. Yolact: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; pp. 9157–9166. [Google Scholar]
  37. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. Solov2: Dynamic and fast instance segmentation. arXiv 2020, arXiv:2003.10152. [Google Scholar]
  38. Peng, S.; Jiang, W.; Pi, H.; Li, X.; Bao, H.; Zhou, X. Deep snake for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8533–8542. [Google Scholar]
  39. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  40. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. arXiv 2023, arXiv:2306.15988. [Google Scholar]
Figure 1. Overall structure of the proposed MrisNet. In comparison to the standard network of YOLOv5(S), MrisNet introduces several improvements and appropriate innovations. Specifically, in the feature extraction network, MrisNet replaces the original C3 (Cross Stage Partial-Darknet53) module with the FTN module and incorporates a SimAM mechanism. In the feature fusion network, the proposed method replaces the original C3 module with the CoT module. In the prediction head structure, MrisNet replaces the localization loss calculation criterion with the EIoU function.
Figure 2. Overall architecture of the feature network. By aggregating convolutional features from three distinct depths, salient information pertaining to different categories of ship spots, with a particular emphasis on small-scale ships, can be acquired.
Figure 3. Network structure of instance segmentation of YOLOv5(S). It is apparent the standard YOLOv5(S) exhibits conspicuous disparities from the algorithm proposed in this paper. However, MrisNet still retains the sequential connectivity structure of the feature extraction network and the bidirectional fusion structure of FPN (Feature Pyramid Network) and PAN (Path Aggregation Network) in the feature fusion network.
Figure 4. Network structure of FasterYOLO. This research employed four FTN modules to replace the C3 modules in the original YOLOv5, while enhancing the deep-level features through the concatenation of SPPF.
Figure 5. Comparison of several types of convolution methods. It is apparent PConv selectively incorporates conventional computation solely in specific convolutional channels, leading to a substantial reduction in redundant convolutional parameters and computational costs compared to conventional convolutional methods.
Figure 6. Network structure of CPFPN. In this network, four CoT modules replace the C3 modules of the original YOLOv5. The resulting integration of contextual information derived from image features contributes to the salient feature expression of minute ship spots.
Figure 7. Convolution fusion process of CoT module. Through the concatenation of static and dynamic features, CoT aids the model in capturing appearance information (e.g., texture and color) and motion information relevant to ship spots.
Figure 8. Key indicators of EIoU and calculation comparison of different loss functions. DIoU (Distance Intersection over Union) exclusively takes into account the variation in the distance between the center points of the predicted box and the ground truth box. CIoU extends this concept by incorporating considerations for the difference in aspect ratios between the two boxes. Building upon the considerations for area, shape, and distance differences between the boxes, SIoU (Scylla Intersection over Union) introduces the novel factor of angle difference.
Figure 9. Marine radar images. Long-wake ships, when compared to short-wake ships, occupy a larger number of pixels in an image, thereby possessing more pronounced and distinctive feature information.
Figure 10. Various verification results of MrisNet. The convergence curves of these three categories manifest the efficacy of the proposed method in extracting positive samples for ship targets and suppressing false positives.
Figure 11. Segmentation results of MrisNet on marine radar images. The red boxes represent the detected positions of the ship spots; it can be observed that no ship of any type is omitted. The findings illustrate that the proposed method exhibits commendable efficacy across a wide range of radar scenarios.
Figure 12. Comparisons of various algorithms for small-scale ship segmentation. The positions of the ship spots are indicated by the red boxes, revealing that MrisNet achieves lower false positive and false negative rates compared to the benchmark algorithms.
Figure 13. Ship segmentation results of MrisNet under various noises. The detected positions of ship spots are depicted by red boxes; the results underscore the satisfactory ship segmentation performance of MrisNet, which remains robust in the presence of moderate levels of interference.
Table 1. Specific experimental results (i.e., Precision, Recall, mAP50, and mAP50-95) of various algorithms.
Algorithms | Box P | Box R | Box mAP50 | Box mAP50-95 | Mask P | Mask R | Mask mAP50 | Mask mAP50-95
YOLACT | 0.901 | 0.86 | 0.92 | 0.492 | 0.875 | 0.826 | 0.871 | 0.349
SOLOv2 | 0.897 | 0.871 | 0.926 | 0.508 | 0.882 | 0.855 | 0.884 | 0.367
Mask R-CNN | 0.97 | 0.975 | 0.991 | 0.717 | 0.938 | 0.942 | 0.952 | 0.495
Deepsnake | 0.939 | 0.94 | 0.964 | 0.603 | 0.917 | 0.919 | 0.911 | 0.424
Swin-Transformer (T) | 0.979 | 0.976 | 0.991 | 0.725 | 0.937 | 0.94 | 0.953 | 0.519
HTC+ | 0.913 | 0.883 | 0.936 | 0.528 | 0.903 | 0.881 | 0.892 | 0.373
SA R-CNN | 0.946 | 0.958 | 0.97 | 0.594 | 0.911 | 0.905 | 0.919 | 0.401
YOLOv5(N) | 0.957 | 0.961 | 0.981 | 0.618 | 0.913 | 0.908 | 0.92 | 0.406
YOLOv5(S) | 0.962 | 0.965 | 0.988 | 0.655 | 0.932 | 0.933 | 0.94 | 0.477
YOLOv5(M) | 0.966 | 0.968 | 0.988 | 0.669 | 0.923 | 0.924 | 0.933 | 0.467
YOLOv5(L) | 0.965 | 0.971 | 0.986 | 0.668 | 0.924 | 0.925 | 0.937 | 0.466
YOLOv5(X) | 0.967 | 0.971 | 0.989 | 0.671 | 0.925 | 0.924 | 0.939 | 0.463
YOLOv7 | 0.968 | 0.965 | 0.98 | 0.675 | 0.919 | 0.917 | 0.924 | 0.412
YOLOv8(S) | 0.961 | 0.909 | 0.956 | 0.608 | 0.917 | 0.851 | 0.914 | 0.377
YOLOv8(M) | 0.967 | 0.956 | 0.982 | 0.662 | 0.923 | 0.914 | 0.935 | 0.446
YOLOv8(L) | 0.958 | 0.973 | 0.982 | 0.663 | 0.92 | 0.925 | 0.927 | 0.47
YOLOv8(X) | 0.969 | 0.964 | 0.977 | 0.659 | 0.925 | 0.922 | 0.94 | 0.465
Mris_APFN | 0.966 | 0.965 | 0.981 | 0.65 | 0.92 | 0.923 | 0.935 | 0.452
MrisNet | 0.986 | 0.98 | 0.993 | 0.737 | 0.952 | 0.948 | 0.96 | 0.508
Table 2. Specific experimental results (i.e., PARAMs and GFLOPs) of various algorithms.
Algorithms | PARAMs (M) | GFLOPs
SOLOv2 | 61.3 | 232.6
YOLACT | 53.72 | 240.2
HTC+ | 95.53 | 1289.5
SA R-CNN | 53.79 | 101.9
Mask R-CNN | 62.74 | 244.8
Swin-Transformer (T) | 88 | 745
Deepsnake | 16.37 | 25.94
YOLOv5(N) | 1.8 | 6.7
YOLOv5(S) | 7.1 | 25.7
YOLOv5(M) | 20.65 | 69.8
YOLOv5(L) | 45.27 | 146.4
YOLOv5(X) | 84.2 | 264
YOLOv7 | 36.1 | 141.9
YOLOv8(S) | 11.23 | 42.4
YOLOv8(M) | 25.96 | 110
YOLOv8(L) | 43.79 | 220.1
YOLOv8(X) | 68.4 | 343.7
Mris_APFN | 10.3 | 20.6
MrisNet | 13.8 | 23.5
Table 3. Ablation experiments of MrisNet. The presence of an asterisk denotes the model’s adoption of the method corresponding to the leftmost column of the table.
Methods | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7
YOLOv5(S) | * | * | * | * | * | * | *
+FasterYOLO | – | * | * | * | * | * | *
+SimAM | – | – | * | * | * | * | *
+CPFPN | – | – | – | * | * | * | *
+SIoU | – | – | – | – | * | – | –
+DIoU | – | – | – | – | – | * | –
+EIoU | – | – | – | – | – | – | *
Rmask | 0.933 | 0.937 | 0.94 | 0.945 | 0.944 | 0.94 | 0.948
Pmask | 0.932 | 0.942 | 0.943 | 0.946 | 0.941 | 0.938 | 0.952
mAP50 | 0.94 | 0.949 | 0.95 | 0.957 | 0.945 | 0.942 | 0.96
mAP50-95 | 0.477 | 0.489 | 0.492 | 0.505 | 0.49 | 0.488 | 0.508
Table 4. Comparisons of experimental data in extreme scenarios.
Algorithms | Detected Ships | True Ships | False Alarms | Recall | Precision
YOLACT | 1296 | 1117 | 179 | 0.8007 | 0.8619
YOLOv8(S) | 1288 | 1170 | 118 | 0.8387 | 0.9084
Mask R-CNN | 1389 | 1299 | 90 | 0.9312 | 0.9352
MrisNet | 1382 | 1301 | 81 | 0.9326 | 0.9414
