Deep Metric Learning for Fine-Grained Ship Classification in SAR Images with Sidelobe Interference

Zhu, Haibin; Mu, Yaxin; Xie, Wupeng; Xing, Kang; Tan, Bin; Zhou, Yashi; Yu, Zhongde; Cui, Zhiying; Zhang, Chuang; Liu, Xin; Xia, Zhenghuan

doi:10.3390/rs17111835


by Haibin Zhu ¹, Yaxin Mu ¹, Wupeng Xie ², Kang Xing ³, Bin Tan ³, Yashi Zhou ⁴, Zhongde Yu ³, Zhiying Cui ³, Chuang Zhang ³, Xin Liu ³,* and Zhenghuan Xia ³

¹ School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 100101, China

² Artificial Intelligence Institute of China Electronics Technology Group Corporation, Beijing 100041, China

³ Beijing Institute of Satellite Information Engineering, Beijing 100095, China

⁴ China Academy of Space Technology, Beijing 100094, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(11), 1835; https://doi.org/10.3390/rs17111835

Submission received: 1 April 2025 / Revised: 9 May 2025 / Accepted: 20 May 2025 / Published: 24 May 2025


Abstract

Sidelobe interference often causes different targets to exhibit similar features, which reduces fine-grained classification accuracy. The effect is particularly pronounced when the available data are limited. To address these issues, a novel classification framework for sidelobe-affected SAR imagery is proposed. First, a method based on maximum median filtering removes sidelobes by exploiting local grayscale differences between the target and the sidelobe, yielding a high-quality SAR dataset. Second, a deep metric learning network is constructed for fine-grained classification. To improve the classification performance of the network on limited samples, a feature extraction module integrating a lightweight attention mechanism is designed to extract discriminative features, and a hybrid loss function is proposed to strengthen intra-class correlation and inter-class separability. Experimental results on the FUSAR-Ship dataset demonstrate that the method suppresses sidelobes effectively. Furthermore, the proposed framework achieves an accuracy of 84.18% across five ship categories, outperforming existing methods and significantly improving classification performance under sidelobe interference and limited data.

Keywords:

synthetic aperture radar; fine-grained ship classification; sidelobe effect; maximum median filter; deep metric learning


1. Introduction

Synthetic aperture radar (SAR) is unaffected by weather and lighting conditions, enabling the acquisition of high-resolution images under all-day and all-weather conditions. SAR-based target recognition systems have demonstrated substantial superiority over conventional optical image classification in complex environments. Consequently, SAR has been widely used in military surveillance and civilian infrastructure monitoring [1,2,3]. Current research on SAR target classification is predominantly categorized into coarse-grained target classification and fine-grained target classification. The former distinguishes broad object categories (e.g., aircraft, ships, buildings) [4], whereas the latter is concerned with differentiating subordinate categories within a general category (e.g., cargo ships, tankers, and fishery vessels) [5]. Constrained by the limited number of images available and inadequate discriminative features, SAR fine-grained ship classification still faces formidable challenges, resulting in critical target regions being insufficiently captured by conventional models.

The core of SAR fine-grained ship classification lies in extracting inter-class discriminative features from SAR images. In recent years, deep learning (DL) has been widely used in fine-grained target classification due to its powerful feature extraction capabilities. Shamsolmoali et al. [6] proposed an image pyramid network based on rotation-equivariant convolution (REFIPN), which maintains classification performance for ship targets of different sizes by integrating multi-scale features to obtain key semantic information. Toumi et al. [7] introduced a hybrid architecture combining a shallow convolutional neural network (CNN) with a long short-term memory (LSTM) network, which leverages the LSTM’s capacity for sequential data processing to capture both spatial features and temporal dependencies within SAR imagery. Yi et al. [8] developed an essential feature mining network (EFM-Net) that augments discriminative feature representation by integrating local target features with attention-guided characteristics, achieving classification accuracies of 92.6% and 99.3% on the FGSC-23 and FGSCR-42 remote sensing datasets, respectively. Zhao and Lang [9] constructed a dual-branch network (DBN) that employs transfer learning to enhance fine-grained classification in the SAR image domain by incorporating the rich discriminative information inherent in optical remote sensing imagery. Zhao et al. [10] developed an efficient channel attention mechanism and integrated it into a CNN to achieve low-parameter discriminative feature extraction. Chen et al. [11] introduced a SAR ship classification algorithm for limited-sample conditions. Their method mitigates data scarcity by aligning available instances within the feature space, and a detail-enhancement module accentuates high-frequency features and suppresses noise interference.

Although the above methods achieve relatively good classification performance, they still face the following challenges in the fine-grained classification of SAR images.

(1): Existing DL methods typically enhance feature extraction capabilities through the design of network architectures. However, the increased complexity of networks exacerbates the occurrence of overfitting, making them less suitable for classification tasks involving limited sample sizes.
(2): Sidelobe introduces additional strong scattering points around the target in SAR imagery, obscuring the scattering features of the target [12,13]. Consequently, the combination of sidelobe suppression techniques and the DL network can facilitate critical feature extraction, resulting in enhanced classification performance. However, the current research lacks a reasonable framework to integrate sidelobe suppression and DL methods effectively. First, existing sidelobe suppression techniques suffer from excessive computational complexity and limited robustness, rendering them unsuitable for large-scale batch processing. Second, DL approaches often neglect the impact of sidelobe interference on feature extraction, resulting in the inability to achieve optimal performance under the conditions of limited data.

In this article, we propose a novel fine-grained ship classification framework for SAR images affected by sidelobe interference. Initially, an image-domain approach based on maximum median filtering is performed to eliminate various types of sidelobe interference. Subsequently, to ensure the network classification performance under a limited amount of SAR data, a proxy-based deep metric learning network is adopted as the baseline, and a feature extraction module incorporating a lightweight attention mechanism is designed to extract key features while preventing overfitting. Furthermore, a metric classification module with a hybrid loss function is designed to map the features into an embedding space, constraining the model to learn an optimal similarity metric for fine-grained classification. During the model optimization process, the FUSAR-Ship dataset after sidelobe suppression is used for the training and testing phases. The experimental results show that the lightweight network effectively extracts discriminative identification features from limited data, rendering it well-suited for fine-grained classification in few-shot scenarios.

The main contributions of this study can be summarized as follows:

(1): We first apply a sidelobe removal method based on maximum median filtering, which improves dataset quality by analyzing the grayscale differences between sidelobe and adjacent pixels. Compared to traditional methods, it is characterized by low computational complexity and robustness, which makes it particularly suitable for dataset processing.
(2): A novel deep metric learning network is proposed to tackle the challenge of insufficient sample sizes in SAR imagery. The feature extraction module integrates a lightweight attention mechanism aimed at enhancing feature extraction capabilities while simultaneously reducing the risk of overfitting. Furthermore, the metric classification module is designed to improve the model’s classification performance by selecting an appropriate set of metric loss functions for fine-grained classification tasks.

The rest of this study is organized as follows. Section 2 reviews related work on deep metric learning and sidelobe suppression. Section 3 presents the proposed framework, including the sidelobe suppression algorithm based on maximum median filtering and the network architecture for fine-grained classification. Experimental results under different conditions based on the FUSAR-Ship dataset are shown in Section 4. Finally, Section 5 concludes the study.

2. Related Work

In this section, a review of the current mainstream DML methods is provided, followed by discussions of the applicability of existing sidelobe removal techniques for deep learning tasks.

2.1. Deep Metric Learning

Deep metric learning (DML) is a machine learning technique employed for classification or clustering by learning the similarity metric between samples [14]. Current DML techniques primarily consist of two phases: feature extraction and metric classification. For feature extraction, Gao et al. [15] introduced a multi-branch embedding network with a bi-classifier. The multi-branch architecture captures spatial information at multiple scales, while the bi-classifier enforces a more compact clustering of semantically similar samples in the embedding space. Yang et al. [16] developed the Hierarchical Embedding Network with Center Calibration (HENC), integrating object-level and part-level feature extractors to capture hierarchical representations. Experimental evaluations on the OpenSARShip dataset confirm that HENC achieves a classification accuracy of 81.83%.

Metric classification enhances fine-grained classification performance by designing an effective optimization function that integrates the advantages of different loss functions. Xu and Lang [17] developed an inter-class distribution shift (ICDS) to modify the original distribution. Fan et al. [18] introduced the Distribution Structure Learning Loss (DSLL), which quantifies the spatial relationships between negative samples using entropy weights derived from structural distributions, preserving the intrinsic structure of the image feature vectors. He et al. [19] integrated triplet loss, cross-entropy, and Fisher discriminant regularization into a unified objective function, mitigating overfitting and enhancing inter-class separability. Experiments on OpenSARShip show that their method reaches a classification performance of 88.97%. Traditional DML avoids complex network architectures, reducing overfitting risks. However, it often suffers from suboptimal classification performance, primarily because categories are inadequately represented in the embedding space when data are limited.

Proxy-based deep metric learning approximates actual class distributions in the embedding space using trainable proxies. This approach avoids the data dependency of traditional DML by focusing on sample–proxy relationships, making it suitable for limited-sample classification tasks. We adopt this method as the focus of the present research and further enhance its classification accuracy for SAR ship target applications.

2.2. Sidelobe Suppression

Existing sidelobe suppression methods can be categorized into two domains: signal domain and image domain. Signal-domain methods remove sidelobes by processing SAR echo data. Representative algorithms include frequency-domain windowing and spatial variant apodization (SVA). Wang et al. [20] proposed an SVA algorithm based on radar image amplitude, which distinguishes the main lobe from the sidelobe by comparing the amplitude of each sampling point in the complex SAR image with the sum of the amplitudes of the two adjacent Nyquist sampling points. Liu et al. [21] proposed the Double-SVA algorithm for squint SAR, which employs a phase shift term to realign the sidelobe distribution of squint SAR systems, thereby enabling sidelobe suppression across arbitrary squint angles. Simulation experiments further demonstrated that the sidelobe energy was reduced to 3.16% of the main-lobe energy. In contrast to signal-domain approaches, image-domain techniques rely exclusively on image intensities rather than the original echo signals. Representative algorithms include spatial filtering and the Radon transform. Xu and Wang [22] proposed a fine segmentation algorithm for ship targets based on the Radon transform, which uses the characteristics of ship targets in the Radon transform domain to determine the sidelobe region. Since sidelobes appear only at specific angles within the Radon transform domain, sidelobe removal is achieved by selecting an appropriate angle range.

For the task of sidelobe removal in DL, the following challenges have been identified in the existing methods: (1) signal-domain methods involve complex manipulations of raw echo signals or complex-valued SAR images, which results in high computational complexity; (2) although the Radon transform-based algorithm can concentrate sidelobe energy within a localized region of the Radon domain, its effectiveness is limited when handling multiple or cross-shaped sidelobes. The discontinuous nature of the sidelobe leads to inaccuracies in determining the extent of the sidelobes in the Radon domain. Moreover, the overlap between target information and sidelobe components may lead to an irreversible loss of target details. Therefore, restricting the angular range alone in the Radon domain is insufficient for the complete removal of sidelobes; (3) the Radon transform-based algorithm distinguishes the target region by identifying the maximum brightness point in the Radon domain. However, its accuracy is significantly diminished for strong sidelobes and small targets, limiting its applicability to the batch processing of datasets. In our study, we attempt to design a sidelobe suppression method characterized by robustness and low computational complexity.

3. Deep Metric Learning for Fine-Grained Ship Classification in SAR Images with Sidelobe Interference

3.1. Overall Architecture

The overall framework of fine-grained ship classification in SAR images with sidelobe interference is illustrated in Figure 1. Initially, a sidelobe removal method based on maximum median filtering is applied to SAR images to enhance their quality. Subsequently, the de-sidelobed images are fed into a proxy-based deep metric learning network to achieve fine-grained classification. The network architecture comprises a feature extraction module and a metric classification module. The feature extraction module uses ResNet18 as the backbone and incorporates a lightweight attention mechanism to extract the key features of the ship. The metric classification module uses a fully connected layer to map features to the embedding space and leverages the complementarity between classification loss and pairwise loss to jointly constrain the model, which minimizes intra-class feature distances and maximizes inter-class discrepancies, ultimately accomplishing ship target classification.

3.2. Sidelobe Suppression Algorithm Based on Maximum Median Filtering

Sidelobes in SAR ship imagery typically manifest in two distinct patterns: cross-shaped and trailing-shaped. As shown in Figure 2a, trailing sidelobes appear as streaks extending outward from the end of the target. Cross sidelobes, as illustrated in Figure 2b, typically appear as four orthogonal elongated stripes extending from the target.

In this study, a sidelobe removal method based on maximum median filtering is proposed. In contrast to the target region, sidelobes exhibit more significant differences in the grayscale values between adjacent pixels within the local area, showing a “mutation” characteristic. Maximum median filtering [23] is employed to filter out these mutation pixels while preserving target edge information and preventing edge blurring.

The main flow chart of the proposed sidelobe suppression algorithm is shown in Figure 3. First, the SAR image $X$ is preprocessed using morphological filtering to prevent the loss of target edges and internal detail information during the filtering process. Second, the preprocessed image is converted into a binary map using the two-dimensional (2D) Otsu method, enhancing the separability of target regions from the background and facilitating precise segmentation. Subsequently, maximum median filtering is performed on the resulting binarized image $X_{binarize}$:

$$y(m, n) = \max [Z_{1}, Z_{2}, Z_{3}, Z_{4}] \tag{1}$$

$$\begin{array}{l} Z_{1} = \mathrm{median} [X_{binarize}(m, n - N), X_{binarize}(m, n - N + 1), \dots, X_{binarize}(m, n + N)] \\ Z_{2} = \mathrm{median} [X_{binarize}(m - N, n), X_{binarize}(m - N + 1, n), \dots, X_{binarize}(m + N, n)] \\ Z_{3} = \mathrm{median} [X_{binarize}(m - N, n - N), X_{binarize}(m - N + 1, n - N + 1), \dots, X_{binarize}(m + N, n + N)] \\ Z_{4} = \mathrm{median} [X_{binarize}(m - N, n + N), X_{binarize}(m - N + 1, n + N - 1), \dots, X_{binarize}(m + N, n - N)] \end{array}$$

where $y(m, n)$ is the filtering result at the coordinates $(m, n)$, $Z_{1}$ to $Z_{4}$ are the gray median values along the row, column, and two diagonal directions within the sliding window shown in Figure 4, and $N$ is half the length of the sliding window centered at $(m, n)$.
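Equation (1) can be illustrated with a short sketch. The code below is a minimal pure-Python version, not the authors' implementation: the function name `max_median_filter` is hypothetical, and pixels outside the image are treated as zero, which is an assumption because the text does not describe border handling.

```python
from statistics import median


def max_median_filter(img, N):
    """Maximum median filter in the sense of Equation (1).

    img: 2D list of numbers (for example a binarized image).
    N:   half-length of the sliding window, so each line has 2N + 1 pixels.
    """
    rows, cols = len(img), len(img[0])

    def px(m, n):
        # Zero padding outside the image (an assumption of this sketch).
        return img[m][n] if 0 <= m < rows and 0 <= n < cols else 0

    offsets = range(-N, N + 1)
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            z1 = median([px(m, n + k) for k in offsets])      # row
            z2 = median([px(m + k, n) for k in offsets])      # column
            z3 = median([px(m + k, n + k) for k in offsets])  # main diagonal
            z4 = median([px(m + k, n - k) for k in offsets])  # anti-diagonal
            out[m][n] = max(z1, z2, z3, z4)
    return out


# Toy binary image: a 3 x 3 "target" block plus one isolated bright pixel.
image = [[0] * 9 for _ in range(9)]
for r in range(4, 7):
    for c in range(4, 7):
        image[r][c] = 1
image[0][0] = 1

filtered = max_median_filter(image, N=2)
print(filtered[0][0], filtered[5][5])
```

In this toy case the isolated pixel is filtered out because no line through it has a majority of bright pixels, whereas every pixel of the block lies on a row whose median is 1 and is therefore preserved.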

To mitigate the loss of target structural information incurred by the filtering process, the sidelobe rather than the target is treated as the primary component to be extracted. The coarse sidelobe extraction result $z(m, n)$ is obtained by pixel-wise subtraction between the original image and the maximum median filtering result. A threshold $T$ is then applied to $z(m, n)$ to generate a binary map $t(m, n)$ of the potential sidelobe area:

$$z(m, n) = X(m, n) - y(m, n) \tag{2}$$

$$t(m, n) = \begin{cases} 0, & z(m, n) < T \\ 1, & z(m, n) \geq T \end{cases} \tag{3}$$

To determine the target area, we perform a comparative analysis between the original binary map and $t(m, n)$. Finally, a pixel-wise multiplication is executed to segment the target information, effectively separating the target from the sidelobe:

$$X'(m, n) = X(m, n) \odot [X_{binarize}(m, n) - t(m, n)] \tag{4}$$

where $\odot$ represents the Hadamard product, and $X'(m, n)$ denotes the sidelobe-suppressed image. The sidelobe suppression process based on maximum median filtering is illustrated in Figure 5.
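The subtraction, thresholding, and masking steps of Equations (2) to (4) can be sketched as follows. This is an illustrative reading only: the function name `suppress_sidelobe` is hypothetical, the filter output `y` is assumed to be expressed on the same grayscale range as `X` (the text does not state the scaling), and the mask is clamped at zero so that it stays binary, which is also an assumption.

```python
def suppress_sidelobe(X, X_bin, y, T):
    """Apply Equations (2)-(4) pixel by pixel.

    X:     original grayscale image (2D list).
    X_bin: binary map of X (1 = foreground, 0 = background).
    y:     maximum median filtering result, assumed to be on the scale of X.
    T:     threshold applied to the coarse sidelobe map.
    """
    rows, cols = len(X), len(X[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            z = X[m][n] - y[m][n]            # Equation (2): coarse sidelobe
            t = 1 if z >= T else 0           # Equation (3): sidelobe map
            mask = max(X_bin[m][n] - t, 0)   # bracket in Equation (4), clamped
            out[m][n] = X[m][n] * mask       # Hadamard product
    return out


# One target pixel, one sidelobe pixel, one background pixel (made-up values).
X = [[200, 180, 10]]
X_bin = [[1, 1, 0]]
y = [[255, 0, 0]]        # the filter kept only the first pixel
print(suppress_sidelobe(X, X_bin, y, T=127.5))
```

With these made-up values, only the first pixel survives: the second pixel is bright in `X` but absent from `y`, so it is flagged as sidelobe and masked out.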

3.3. Feature Extraction Module

To improve the feature extraction performance in DML, we designed the feature extraction module, as shown in Figure 6. The feature extraction module modifies the first Conv layer of the traditional ResNet18 architecture by replacing it with three sequential layers, each employing a 3 × 3 convolutional kernel with a stride of 1 and padding of 1. This adjustment ensures that the network captures more local details at a shallow layer, enhancing its effectiveness for fine-grained classification tasks on limited datasets.
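The effect of this change on spatial resolution and receptive field can be checked with a few lines of arithmetic. The sketch below assumes the standard ResNet18 stem (a single 7 × 7 convolution with stride 2 and padding 3) and a hypothetical 224 × 224 input; the input size and the helper names are illustrative and do not come from the paper.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of one convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride) convolution layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


input_size = 224  # hypothetical input resolution

# Standard ResNet18 stem: one 7 x 7 convolution, stride 2, padding 3.
standard_size = conv_out_size(input_size, kernel=7, stride=2, padding=3)
standard_rf = receptive_field([(7, 2)])

# Modified stem: three 3 x 3 convolutions, stride 1, padding 1.
modified_size = input_size
for _ in range(3):
    modified_size = conv_out_size(modified_size, kernel=3, stride=1, padding=1)
modified_rf = receptive_field([(3, 1)] * 3)

print(standard_size, standard_rf)   # 112 7
print(modified_size, modified_rf)   # 224 7
```

Under these assumptions the three stacked layers cover the same 7 × 7 receptive field as the original convolution but keep the full input resolution, which is consistent with the stated goal of retaining more local detail in the shallow layers.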

Furthermore, an energy function [24] is introduced to provide the backbone network with a lightweight attention mechanism, which identifies local key features by measuring the linear separability among local descriptors. A local descriptor can be regarded as a discriminative feature in the embedding space at a specific spatial location. Since calculating the energy function requires no additional parameters or network layers, the attention mechanism is added to the backbone network without increasing the risk of overfitting. The energy function of a local descriptor is defined as follows:

$$e_{n} = (y_{n} - \hat{n})^{2} + \frac{1}{R - 1} \sum_{r = 1}^{R - 1} (y_{0} - \hat{z}_{r})^{2} + \lambda w_{n}^{2} \tag{5}$$

where $\hat{n} = w_{n} n + b_{n}$ and $\hat{z}_{r} = w_{n} z_{r} + b_{n}$ are the linear transformations of the target descriptor $n$ and the remaining descriptors $z_{r}$ under channel $q$ ($q = 1, 2, 3$), respectively; $R = H \times W$ is the total number of descriptors in the channel; $y_{n} = 1$ and $y_{0} = -1$ are the labels assigned to the target descriptor and the remaining descriptors; and $\lambda$ is a regularization hyperparameter. Minimizing Equation (5) is equivalent to measuring the linear separability of the target descriptor $n$ from the remaining descriptors. A lower minimum energy indicates greater linear separability, signifying that the descriptor contributes more substantially to the fine-grained classification task.

Theoretically, there are $H \times W$ energy function expressions under a specific channel. To simplify the computation, the analytical solutions for $w_{n}$ and $b_{n}$ are used to simplify the energy function expression:

$$w_{n} = \frac{2 (n - \mu_{n})}{(n - \mu_{n})^{2} + 2 \sigma_{n}^{2} + 2 \lambda} \tag{6}$$

$$b_{n} = - \frac{1}{2} (n + \mu_{n}) w_{n} \tag{7}$$

where $\mu_{n} = \frac{1}{R - 1} \sum_{r = 1}^{R - 1} z_{r}$ and $\sigma_{n}^{2} = \frac{1}{R - 1} \sum_{r = 1}^{R - 1} (z_{r} - \mu_{n})^{2}$ are the mean and variance of all descriptors except $n$. Since each parameter in the expressions for $\mu_{n}$ and $\sigma_{n}^{2}$ is obtained within a single channel, it is reasonable to assume that all descriptors in a single channel follow the same distribution, so the mean and variance can be computed once per channel to reduce computational costs. Substituting Equations (6) and (7) into Equation (5) with $y_{n} = 1$ and $y_{0} = -1$ gives the simplified minimum energy, which is non-negative because Equation (5) is a sum of squares:

$$e_{n}^{*} = \frac{4 (\sigma_{n}^{2} + \lambda)}{(n - \mu_{n})^{2} + 2 \sigma_{n}^{2} + 2 \lambda} \tag{8}$$

It can be observed from Equation (8) that a smaller value of $e_{n}^{*}$ signifies higher discriminability. Consequently, the attention weights corresponding to local key features can be represented as $1 / e_{n}^{*}$. The final refinement step is as follows:

$$\tilde{F} = \mathrm{sigmoid} \left( \frac{1}{e_{all}} \right) \odot F \tag{9}$$

where $F$ and $\tilde{F}$ denote the feature map before and after reweighting, $e_{all}$ collects the $e_{n}^{*}$ of all descriptors across all channels, the sigmoid maps $1 / e_{all}$ to the range [0, 1], and $\odot$ represents the Hadamard product.
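The reweighting in Equation (9) can be sketched for a single channel as follows. This is a minimal illustration rather than the authors' code: it takes the minimum energy as the non-negative quantity $4(\sigma^{2} + \lambda) / ((n - \mu)^{2} + 2\sigma^{2} + 2\lambda)$, computes the mean and variance once per channel as the text suggests, and uses a hypothetical function name and an assumed value of `lam` that the paper does not specify.

```python
import math


def energy_attention(channel, lam=1e-4):
    """Parameter-free attention for one channel (Equations (8) and (9)).

    channel: 2D list (H x W) of activations.
    lam:     regularization term lambda (assumed value, not from the paper).
    Returns (weights, reweighted), both H x W lists.
    """
    values = [v for row in channel for v in row]
    R = len(values)
    mu = sum(values) / R
    # Variance shared by all descriptors of the channel, normalized by R - 1.
    var = sum((v - mu) ** 2 for v in values) / (R - 1)

    weights, reweighted = [], []
    for row in channel:
        w_row, f_row = [], []
        for v in row:
            # 1 / e* from Equation (8): large when v stands out from the rest.
            inv_energy = ((v - mu) ** 2 + 2 * var + 2 * lam) / (4 * (var + lam))
            w = 1.0 / (1.0 + math.exp(-inv_energy))   # sigmoid
            w_row.append(w)
            f_row.append(w * v)                        # Hadamard product
        weights.append(w_row)
        reweighted.append(f_row)
    return weights, reweighted


# A flat channel with one distinctive activation in the center.
channel = [[0.0, 0.0, 0.0],
           [0.0, 4.0, 0.0],
           [0.0, 0.0, 0.0]]
weights, _ = energy_attention(channel)
print(round(weights[1][1], 3), round(weights[0][0], 3))
```

The distinctive center activation receives a larger weight than its neighbors, and no learnable parameter is involved, which matches the motivation given for using the energy function on small datasets.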

3.4. Metric Classification Module

Proxy-based DML exhibits performance limitations in fine-grained classification tasks, primarily because it neglects the rich local relationships between samples. To address this problem, the metric classification module considers classification loss and pairwise loss simultaneously. Classification loss focuses on distinguishing the global distribution information across different categories, while pairwise loss concentrates on the local information that distinguishes different types of samples. A hybrid loss function $L_{total}$ is designed to leverage the complementarity between the two losses, enhancing the classification performance of the trained model:

$$L_{total} = \lambda L_{cls} + (1 - \lambda) L_{pair} \tag{10}$$

where $L_{cls}$ is the classification loss, $L_{pair}$ is the pairwise loss, and $\lambda \in (0, 1)$ controls the relative importance of the two losses in $L_{total}$. Repeated experiments indicate that $L_{total}$ should place greater emphasis on global information, so $\lambda$ should be set to a relatively high value. In this study, $\lambda = 0.8$ is set empirically.

SoftTriple loss [25] was adopted as the classification loss due to its capability to leverage multiple proxies for enhanced representation learning. A specific category of ships usually exhibits multiple local feature patterns, and SoftTriple loss uses multiple proxies to better represent these intra-class differences. It is defined as:

$$s'_{X_{i}, y} = \sum_{n = 1}^{K} \frac{\exp \left( \frac{1}{\gamma} s_{X_{i}, p_{y}^{n}} \right)}{\sum_{k = 1}^{K} \exp \left( \frac{1}{\gamma} s_{X_{i}, p_{y}^{k}} \right)} s_{X_{i}, p_{y}^{n}} \tag{11}$$

$$L_{cls}(X_{i}) = - \log \frac{\exp (\theta (s'_{X_{i}, y_{i}} - \delta))}{\exp (\theta (s'_{X_{i}, y_{i}} - \delta)) + \sum_{j \neq y_{i}} \exp (\theta s'_{X_{i}, j})} + \frac{\tau \sum_{j = 1}^{C} \sum_{t = 1}^{K} \sum_{s = t + 1}^{K} \sqrt{2 - 2 {W_{j}^{s}}^{T} W_{j}^{t}}}{C K (K - 1)} \tag{12}$$

where $X_{i}$ represents the $i$th SAR image sample in a batch and $y_{i}$ its class label, $p_{y}^{n}$ represents the $n$th proxy of class $y$ ($y \in \{1, \dots, C\}$), the constant $\gamma$ controls the smoothing scale, $s_{X_{i}, p_{y}^{n}}$ is the cosine similarity between the sample $X_{i}$ and the proxy $p_{y}^{n}$, and $s'_{X_{i}, y}$ is the weighted sum of the similarities between sample $X_{i}$ and all proxies in class $y$. A higher value of $s'_{X_{i}, y}$ corresponds to a greater probability that the sample $X_{i}$ belongs to category $y$. In Equation (12), $C$ represents the number of categories, $K$ represents the number of proxies within each class, $\delta$ is the margin, $\theta$ is the scale factor, $\tau$ weights the proxy regularization term, and ${W_{j}^{s}}^{T} W_{j}^{t}$ represents the similarity between the $s$th proxy and the $t$th proxy in category $j$.
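The relaxed similarity and the classification term of SoftTriple loss can be sketched for a single sample as follows. This is an illustrative pure-Python version under stated assumptions: the function names are hypothetical, the similarities are passed in directly instead of being computed from embeddings, the proxy regularization term weighted by $\tau$ is omitted, and the default values of `gamma`, `theta`, and `delta` are placeholders rather than the settings used in the paper.

```python
import math


def relaxed_similarity(sims, gamma):
    """Softmax-weighted sum of the similarities to the K proxies of one class."""
    exps = [math.exp(s / gamma) for s in sims]
    total = sum(exps)
    return sum(e / total * s for e, s in zip(exps, sims))


def softtriple_cls_term(sims_per_class, label, gamma=0.1, theta=20.0, delta=0.01):
    """Softmax cross-entropy term of SoftTriple loss for one sample.

    sims_per_class: list of C lists, each holding K cosine similarities.
    label:          index of the true class.
    The proxy regularizer is left out of this sketch.
    """
    relaxed = [relaxed_similarity(s, gamma) for s in sims_per_class]
    logits = [theta * (r - delta) if c == label else theta * r
              for c, r in enumerate(relaxed)]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


# Two classes with two proxies each; the sample is close to one proxy of class 0.
sims = [[0.9, 0.2], [0.1, 0.0]]
print(relaxed_similarity(sims[0], gamma=0.1))
print(softtriple_cls_term(sims, label=0), softtriple_cls_term(sims, label=1))
```

Because the weights are a softmax over the proxies, the relaxed similarity is dominated by the closest proxy of each class, so a ship that matches only one of several appearance patterns of its class is still scored highly.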

Circle loss [26] was selected as the pairwise loss. The core concept of circle loss is to address the imbalance in optimization directions between samples. By introducing dynamic scale factors and margin parameters, circle loss efficiently utilizes the sample distribution and is better suited to handling imbalanced datasets and complex inter-class relationships. It is defined as follows:

$$L_{pair} = \log \left[ 1 + \sum_{j = 1}^{N} \exp \left( \gamma \alpha_{n}^{j} (s_{n}^{j} - m) \right) \sum_{i = 1}^{P} \exp \left( - \gamma \alpha_{p}^{i} (s_{p}^{i} - (1 - m)) \right) \right] \tag{13}$$

where $s_{p}^{i}$ denotes the similarity score of the $i$th sample ($i = 1, \dots, P$) from the positive sample set $p$ in the embedding space, $s_{n}^{j}$ denotes the similarity score of the $j$th sample ($j = 1, \dots, N$) from the negative sample set $n$ in the embedding space, and $\gamma$ and $m$ are hyperparameters that set the scale of the loss and the margin separating the samples, respectively. The factors $\alpha_{p}^{i} = [O_{p} - s_{p}^{i}]_{+}$ and $\alpha_{n}^{j} = [s_{n}^{j} - O_{n}]_{+}$ regulate the step size of the loss function during the iterative process, and $O_{p}$ and $O_{n}$ are the optima of $s_{p}^{i}$ and $s_{n}^{j}$, respectively.
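A minimal sketch of Equation (13) for one anchor is given below. It follows the original circle loss formulation, in which the optima are $O_{p} = 1 + m$ and $O_{n} = -m$; these values, the function name, and the defaults for `gamma` and `m` are assumptions of the sketch, since the text does not state them.

```python
import math


def circle_loss(sp, sn, gamma=32.0, m=0.25):
    """Circle loss for one anchor.

    sp: similarity scores to positive samples.
    sn: similarity scores to negative samples.
    gamma, m: scale and margin (illustrative values, not from the paper).
    """
    o_p, o_n = 1.0 + m, -m          # optima of s_p and s_n (assumed, see lead-in)
    delta_p, delta_n = 1.0 - m, m   # margins appearing in Equation (13)

    neg_term = sum(
        math.exp(gamma * max(s - o_n, 0.0) * (s - delta_n)) for s in sn
    )
    pos_term = sum(
        math.exp(-gamma * max(o_p - s, 0.0) * (s - delta_p)) for s in sp
    )
    return math.log(1.0 + neg_term * pos_term)


well_separated = circle_loss(sp=[0.9], sn=[0.1])
poorly_separated = circle_loss(sp=[0.3], sn=[0.7])
print(well_separated, poorly_separated)
```

The weighting factors grow as a score moves away from its optimum, so the poorly separated pair in the example is penalized far more strongly than the well separated one.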

4. Experimental Results Based on the FUSAR-Ship Dataset

4.1. FUSAR-Ship Dataset

The FUSAR-Ship dataset, established by Hou et al. [27], comprises high-resolution maritime observations extracted from 126 original Gaofen-3 remote sensing images. The dataset covers dual-polarimetric configurations (HH/VV) with a ground spatial resolution of 1.124 m × 1.728 m, acquired in ultra-fine stripmap (UFS) imaging mode. Due to the background complexity of ships in the FUSAR-Ship dataset, rigorous sample selection was implemented to ensure applicability for fine-grained ship classification research. Specifically, 393 ship samples from five categories were retained for the experiments, as shown in Table 1. Of these, 60% were selected as the training set, while the remaining 40% were used as the test set to evaluate the classification performance of the network. Representative slices for each category are shown in Figure 7.

4.2. Sidelobe Suppression Algorithm Effectiveness Analysis

4.2.1. Evaluation Indicators

To verify the sidelobe removal performance of our method, we select the uniformity of intra-region (UR), the dissimilarity of inter-region (DR), and the complexity (C) to evaluate the improvement in image quality. Let the target region in $X'(m, n)$ be denoted as $R_{1}$, and the remaining region as $R_{2}$. UR and DR are calculated as follows [28]:

$$UR = 1 - \frac{1}{M} \sum_{k = 1}^{2} \frac{\sum_{(m, n) \in R_{k}} \left[ X(m, n) - \frac{1}{M_{k}} \sum_{(m, n) \in R_{k}} X(m, n) \right]^{2}}{\left[ \max_{(m, n) \in R_{k}} X(m, n) - \min_{(m, n) \in R_{k}} X(m, n) \right]^{2}} \tag{14}$$

$$DR = \frac{\frac{1}{M_{1}} \sum_{(m, n) \in R_{1}} X(m, n) - \frac{1}{M_{2}} \sum_{(m, n) \in R_{2}} X(m, n)}{\max_{(m, n) \in X} X(m, n) - \min_{(m, n) \in X} X(m, n)} \tag{15}$$

where $M_{k}$ represents the number of pixels in the region $R_{k}$ ($k = 1, 2$), and $M$ is the total number of pixels in the image. UR ∈ [0, 1], where a larger UR value indicates greater uniformity within the two regions, leading to better target extraction quality. DR ∈ [0, 1], where a larger DR value indicates a greater difference between the $R_{1}$ and $R_{2}$ regions, resulting in stronger discrimination and better target extraction quality.
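Equations (14) and (15) can be evaluated with a short routine such as the one below. It is a sketch under assumptions: the function name `region_metrics` is hypothetical, the target region is supplied as a binary mask, both regions must be non-empty, and a region whose pixels are all equal is taken to contribute zero to the sum in Equation (14), a case the formula itself leaves undefined.

```python
def region_metrics(X, mask):
    """Return (UR, DR) for image X and a binary target mask.

    mask[m][n] == 1 marks the target region R1; 0 marks the remainder R2.
    """
    regions = {1: [], 0: []}
    for x_row, m_row in zip(X, mask):
        for value, k in zip(x_row, m_row):
            regions[k].append(value)

    M = len(regions[1]) + len(regions[0])
    means = {}
    ur_sum = 0.0
    for k, vals in regions.items():
        mean = sum(vals) / len(vals)
        means[k] = mean
        spread = max(vals) - min(vals)
        if spread > 0:  # flat region: contributes zero (assumption)
            ur_sum += sum((v - mean) ** 2 for v in vals) / spread ** 2

    all_vals = regions[1] + regions[0]
    ur = 1.0 - ur_sum / M
    dr = (means[1] - means[0]) / (max(all_vals) - min(all_vals))
    return ur, dr


# Tiny made-up image: top row is the target, bottom row is background.
X = [[8, 10],
     [0, 2]]
mask = [[1, 1],
        [0, 0]]
print(region_metrics(X, mask))
```

For this toy image each region contributes 0.5 to the sum in Equation (14), giving UR = 0.75, and the region means differ by 8 over a grayscale range of 10, giving DR = 0.8.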

Complexity serves as a quantitative metric characterizing how well the features of the target region are preserved after sidelobe suppression. It is defined as [28]:

$$C = \frac{L^{2}}{4 \pi S} \tag{16}$$

where $L$ and $S$ represent the perimeter and area of $R_{1}$, respectively. The greater the value of $C$, the more complex the edge structure of the target. Ships typically have regular geometry. Therefore, the smaller the shape complexity, the closer the extracted target area is to the ship’s geometry.
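As a worked illustration of Equation (16), the snippet below evaluates the complexity of two idealized shapes. The shapes and the function name are illustrative choices, not data from the paper.

```python
import math


def shape_complexity(perimeter, area):
    """Complexity C = L^2 / (4 * pi * S) from Equation (16)."""
    return perimeter ** 2 / (4 * math.pi * area)


# A circle of radius 3: the most compact shape.
radius = 3.0
circle_c = shape_complexity(2 * math.pi * radius, math.pi * radius ** 2)

# A 5 x 1 rectangle, a rough stand-in for an elongated hull.
length, width = 5.0, 1.0
rect_c = shape_complexity(2 * (length + width), length * width)

print(round(circle_c, 3), round(rect_c, 3))
```

A circle attains the minimum value of 1, while an elongated rectangle gives a value of about 2.29. Residual sidelobe streaks attached to the target lengthen the perimeter much faster than they add area, which is why a lower value of $C$ points to a cleaner extraction.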

4.2.2. Comparison Results

In the experiments, four representative ship slices were selected from the FUSAR-Ship dataset [27]. Figure 8 presents the processing results of four slices using the conventional Radon transform algorithm [22] and the proposed methodology. The sliding window size in maximum median filtering was adaptively configured according to target dimensions, with larger vessels necessitating larger window sizes. Based on a statistical analysis of ship sizes within the FUSAR-Ship dataset, the half-side length of the window was empirically set to N = 17 pixels. The threshold T was determined as half of the image’s maximum grayscale value.

Figure 8a presents the original SAR image of four samples (Sample ID 1–4), each containing a ship with distinct azimuth orientations, sizes, and sidelobe interference patterns. Figure 8b shows the sidelobe suppression results of the Radon transform algorithm. It can be observed that the trailing sidelobe in Sample 1 is effectively eliminated. However, for Samples 2 and 3, which exhibit multiple sidelobes and cross-shaped sidelobes, accurately determining the angular range becomes challenging, leading to residual sidelobe interference. In the case of the small-scale ship target in Sample 4, the spatial overlap between the sidelobe and the target in the Radon domain results in a loss of structural information, while some edge features of the sidelobe interference are retained.

Figure 8c shows the sidelobe removal results of the proposed method. It is evident that both the cross-shaped sidelobe and trailing sidelobe are more effectively suppressed compared to the Radon transform algorithm, while the structural features of the ship are well preserved. This outcome demonstrates that the proposed algorithm exhibits strong robustness in mitigating sidelobe interference, making it applicable to a wide range of sidelobe scenarios and enhancing its feasibility in dataset applications. Furthermore, it eliminates the requirement for model design or complex data computations, resulting in low computational complexity. Its efficiency is particularly advantageous under the constraints of limited computational resources.

Table 2 presents a quantitative comparison of key indicators between the two sidelobe removal algorithms. The maximum median filtering-based algorithm obtains a higher UR for all four samples and a higher DR for three of them (Sample 2 is the exception, with 0.7996 versus 0.8087), indicating better overall sidelobe suppression than the Radon-transform-based algorithm. Additionally, the complexity C of the images processed by the proposed method is lower than that of the images processed with the Radon transform algorithm for every sample, which suggests that the proposed method effectively eliminates sidelobe interference while preserving the structural characteristics of the ship target.

4.3. Fine-Grained Classification Performance Analysis

4.3.1. Experimental Environment and Sampling Strategy

All experiments reported in this study were conducted in the PyTorch 1.8.0 environment on an NVIDIA GeForce GTX 1650 GPU. In the training phase, the backbone network is initialized with ImageNet pre-trained weights, the optimizer is AdamW, and the learning rates for the backbone network and for the metric loss parameters are set to 3 × 10⁻⁵ and 1 × 10⁻⁴, respectively. The batch size is set to 16.

Due to the issues of class imbalance and insufficient sample sizes in the experimental dataset, data augmentation techniques were implemented, including image intensity scaling, horizontal flipping, and random rotation, to improve the generalization ability of the training model. In addition, a class-weight-based sampling strategy was employed. This is mathematically formalized as:

w_{c} = \frac{1}{\sqrt{N_{c}}}

(17)

P(x_{i}) = \frac{w_{c(i)}}{\sum_{j = 1}^{C} w_{j} N_{j}}, \quad \forall x_{i} \in D_{train}

(18)

where N_c denotes the number of samples in class c, c(i) is the class of sample x_i, and C represents the total number of classes. Smaller weights are assigned to samples from large sample categories, while larger weights are assigned to samples from small sample categories. Because the probability of drawing a given class is then proportional to \sqrt{N_c} rather than to N_c, this weighted sampling strategy narrows the gap between the selection probabilities of different categories, alleviating the class imbalance problem.
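As a worked example of Equations (17) and (18), the following sketch evaluates the sampling probabilities numerically. It assumes, for illustration only, that the class counts of Table 1 stand in for the per-class counts of the training split; the variable names are hypothetical.

```python
from math import sqrt

# Class counts from Table 1, used here as illustrative stand-ins for the
# per-class counts of the training split (an assumption).
counts = {"Cargo": 80, "Carrier": 134, "Fishing": 92, "Other": 41, "Tanker": 46}

# Equation (17): class weight w_c = 1 / sqrt(N_c).
weights = {c: 1 / sqrt(n) for c, n in counts.items()}

# Equation (18): per-sample probability. The denominator sums w_j * N_j over
# the classes, so the probabilities of all samples add up to 1.
normalizer = sum(weights[c] * counts[c] for c in counts)
sample_prob = {c: weights[c] / normalizer for c in counts}

# Probability that one draw lands in class c: N_c * P(x_i) = sqrt(N_c) / normalizer.
class_prob = {c: counts[c] * sample_prob[c] for c in counts}

# Share of each class under uniform sampling, for comparison.
total = sum(counts.values())
uniform_prob = {c: n / total for c, n in counts.items()}

for c in counts:
    print(f"{c:8s} uniform={uniform_prob[c]:.3f} weighted={class_prob[c]:.3f}")
```

With these counts, the ratio between the most and least frequent classes (Carrier and Other) shrinks from 134/41 under uniform sampling to its square root under the weighted scheme, so the imbalance is reduced rather than removed.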

4.3.2. Evaluation Metrics

The present study employs the mean classification accuracy over ten repeated experiments as the evaluation metric, which reduces the random fluctuations in the classification performance of individual experiments. Acc_final is defined as follows:

Acc_{final} = \frac{1}{M} \sum_{j = 1}^{M} Acc_{j}

(19)

where M = 10. For the jth experiment, the classification accuracy Acc_j is defined as:

Acc_{j} = \frac{\left| \{ x \in D_{test} : f_{x} = y_{x} \} \right|}{\left| D_{test} \right|}

(20)

where D_test is the test set, |D_test| is the number of test samples, and x represents an instance in D_test. f_x and y_x denote the predicted label and the true label of x, respectively.
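Equations (19) and (20) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the labels and per-run accuracies are made-up values rather than results from the paper.

```python
def accuracy(predicted, actual):
    """Equation (20): fraction of test samples whose predicted label is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equally long")
    correct = sum(1 for f_x, y_x in zip(predicted, actual) if f_x == y_x)
    return correct / len(actual)

def final_accuracy(run_accuracies):
    """Equation (19): mean accuracy over the M repeated experiments."""
    if not run_accuracies:
        raise ValueError("at least one run is required")
    return sum(run_accuracies) / len(run_accuracies)

# Hypothetical labels for a single run (not data from the paper).
y_true = ["Cargo", "Carrier", "Fishing", "Other", "Tanker"]
y_pred = ["Cargo", "Carrier", "Fishing", "Tanker", "Tanker"]
acc_single = accuracy(y_pred, y_true)  # 4 of the 5 labels match

# Hypothetical accuracies of M = 4 runs; the paper uses M = 10.
acc_final = final_accuracy([0.80, 0.85, 0.75, 0.80])
```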

4.3.3. Hyperparameter Selection

λ is the trade-off hyperparameter of the hybrid loss function L_total, which is used to control the relative importance of the two types of losses. By adjusting the value of λ, the model can find the best balance between classification loss and pairwise loss, achieving good performance on a given dataset.

To justify the choice of λ, a sensitivity analysis was conducted with all other parameters held at the same configuration. Figure 9 presents the hyperparameter selection results for the proposed framework. Increasing λ signifies that classification loss possesses greater importance within the hybrid metric loss function. It is evident that the optimal value of λ is 0.8, achieving an accuracy of 84.18%. When λ > 0.8, the contribution of Circle loss to the hybrid metric loss function diminishes, impeding the model’s ability to fully capture inter-sample relationships and thus preventing optimal performance. When λ < 0.8, the classification performance deteriorates progressively with decreasing λ values. Notably, when λ falls below 0.5, the classification accuracy declines significantly to 81.25%. This result is consistent with the theoretical analysis. Classification loss emphasizes the supervision of class-discriminative information to capture differences between categories, whereas pairwise loss focuses on local information between samples and uses inter-sample differences to fine-tune the decision boundary of the model. An excessively low λ value interferes with the direction of model optimization and leads to overfitting. Thus, we recommend maintaining a sufficiently high λ value to ensure stable model convergence.
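The role of λ can be illustrated with a small sketch. It assumes that the hybrid loss is a convex combination, L_total = λ · L_cls + (1 − λ) · L_pair, which is consistent with the statements above that a larger λ gives more weight to classification loss and less to Circle loss; the exact definition is the one given with the metric classification module, and the function name and loss values here are hypothetical.

```python
def hybrid_loss(cls_loss, pair_loss, lam=0.8):
    """Blend a classification loss and a pairwise loss with trade-off lam.

    Assumed form: lam * cls_loss + (1 - lam) * pair_loss. In the paper the
    two terms are SoftTriple loss and Circle loss; here they are plain numbers.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * cls_loss + (1.0 - lam) * pair_loss

# Hypothetical loss values, chosen only to show how the weighting shifts.
cls_loss, pair_loss = 1.0, 3.0
for lam in (0.5, 0.8, 1.0):
    print(lam, hybrid_loss(cls_loss, pair_loss, lam))
```

Under this assumed form, λ = 1 removes the pairwise term entirely, which matches the observation that inter-sample relationships are captured less well as λ grows beyond 0.8.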

4.3.4. Comparison with SOTA Methods

Due to the limited amount of data in the FUSAR-Ship dataset, existing models with complex network architectures tend to overfit. Therefore, we select eight representative methods, namely, ResNet18 [29], BN-Inception [30], VGG-16 [31], EFM-Net [8], Single Proxy-DML (SP-DML) [32], DSL Loss [18], Combination Loss [19], and UMP+D [33], as baselines against which to evaluate the classification performance of our method.

As shown in Table 3, our method attains an Acc_final of 84.18% on the FUSAR-Ship dataset, which is 6.96% higher than the best competing method, UMP+D (77.22%), and significantly higher than traditional classification methods. In addition, the bold values indicate the best result for each indicator among the compared methods. It can be observed that our approach maintains a parameter number (PN) comparable to that of the lightweight ResNet18 network (11.20 M versus 11.21 M). Although the floating-point operations (FLOPs) are higher than those of ResNet18 (9.02 G versus 2.38 G), the additional computational cost is acceptable relative to the improvement in performance. This performance superiority stems from the sidelobe suppression technique based on maximum median filtering, which effectively mitigates sidelobe interference while preserving the fine-grained features of the ship target, providing high-quality inputs for the subsequent classification network. In addition, the DML-based classification network demonstrates robust few-shot adaptation capabilities through optimized similarity metric learning, enforcing intra-class compactness and inter-class separability in the embedding space, thereby enhancing the fine-grained classification performance.

4.3.5. Ablation Study

To verify the impact of the three major innovations on the final classification performance, a series of ablation studies was conducted under the same parameter configuration. As shown in Table 4, (a) the integration of the feature extraction module into the baseline DML architecture yields a 0.36% improvement in Acc_final, which indicates that the lightweight attention mechanism can exploit the linear separability between local descriptors to obtain attention weights, enabling discriminative feature selection. (b) The incorporation of the metric classification module elevates performance by 0.86%, demonstrating that the hybrid loss function, constructed on the basis of the complementarity between classification loss and pairwise loss, is capable of considering both global and local information in a comprehensive manner, contributing to the further enhancement of the classification performance of the trained model. (c) When sidelobe removal is integrated, Acc_final shows the most substantial improvement of 6.06%, which can be attributed to the fact that the sidelobe removal method effectively suppresses interference caused by sidelobes in the feature extraction process. (d) The full integration of the three innovations achieves the best performance, reaching 84.18% and surpassing the conventional DML implementation by 9.07%. This result demonstrates that the combination of the sidelobe removal method and the DML network can fully exploit the discriminative features of the ship target to derive a more discriminative similarity metric, which, in turn, enhances the fine-grained classification accuracy of the DML network.

Moreover, a comparison of the two groups (Rows 1–2 and Rows 4–5 of Table 4) shows that, when the SAR image undergoes sidelobe removal, the benefit of the feature extraction module is significantly enhanced, with the Acc_final improvement increasing from 0.36% to 1.11%. This phenomenon is attributed to the effectiveness of the energy function being restricted by the image quality. Reduced image quality prevents the energy function from accurately reflecting the actual contribution of the local descriptors, introducing a bias in the calculation of attention weights and consequently weakening the impact of the attention mechanism on fine-grained classification.
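The gains quoted in this subsection follow directly from Table 4 and can be rechecked with a few lines of arithmetic. The sketch below only reuses the Acc_final values of Table 4; the dictionary layout and the helper name are illustrative choices.

```python
# Acc_final (%) from Table 4, keyed by
# (sidelobe removal, feature extraction module, metric classification module).
table4 = {
    (False, False, False): 75.11,
    (False, True, False): 75.47,
    (False, False, True): 75.97,
    (True, False, False): 81.17,
    (True, True, False): 82.28,
    (True, False, True): 82.91,
    (False, True, True): 76.10,
    (True, True, True): 84.18,
}

def gain(after, before):
    """Accuracy difference in percentage points, rounded to two decimals."""
    return round(table4[after] - table4[before], 2)

baseline = (False, False, False)
gains = {
    "feature extraction only": gain((False, True, False), baseline),
    "metric classification only": gain((False, False, True), baseline),
    "sidelobe removal only": gain((True, False, False), baseline),
    "all three": gain((True, True, True), baseline),
    "feature extraction after sidelobe removal": gain(
        (True, True, False), (True, False, False)
    ),
}

for name, value in gains.items():
    print(f"{name}: +{value:.2f}")
```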

4.3.6. Visualization Results

To verify the influence of the sidelobe removal and feature extraction modules on the network’s performance, a heat map visualization analysis is performed on the FUSAR-Ship dataset. The visualizations are generated with Grad-CAM. The luminance of the color within the heat map indicates the contribution of distinct image regions to the classification decision, with higher luminance denoting a more substantial contribution to the final classification outcome.

As depicted in Figure 10, our model demonstrates a superior capacity to extract discriminative features from the ship images. Two observations can be made: (a) following sidelobe suppression, the heat map is more concentrated on the ship region, demonstrating that the sidelobe removal method can effectively mitigate the influence exerted by the sidelobe on subsequent feature extraction. (b) Comparing the heat maps in Figure 10c,d, the highlighted regions in the proposed network’s heat map (Figure 10d) are more tightly focused on the ship region, which indicates that the feature extraction module can accurately identify ship target features.

4.3.7. Confusion Matrix

We present the confusion matrices of the traditional DML network and of our proposed method to evaluate classification performance across different ship categories. The results are shown in Figure 11.

By examining the confusion matrix in Figure 11a, it can be observed that two classes, namely, ‘Carrier’ and ‘Fishing’, achieve excellent classification accuracies of 92.31% and 89.74%, respectively. On the other hand, the ‘Cargo’ type of ships exhibits the lowest classification accuracy, with a rate of only 55.56%. This phenomenon is mainly attributed to the imbalance of samples and the obstruction caused by the sidelobes, which has a greater effect on smaller targets’ geometric feature extraction, resulting in poor classification performance for the ‘Cargo’ type.

According to Figure 11b, our method raises the classification accuracy of the ‘Cargo’ class by 17.17% (from 55.56% to 72.73%). This enhancement confirms our method’s efficacy in eliminating sidelobes and facilitating subsequent ship feature extraction. In addition, for the ‘Carrier’ category, the classification performance of our method is slightly lower than that of the traditional DML network (decreasing from 92.31% to 90.38%). This difference primarily stems from the way category imbalance and the masking of features by the sidelobe affect the traditional network: these factors lead to an overemphasis on the target and sidelobe features associated with the ‘Carrier’ category during model optimization. Consequently, the classification decision is tilted in favor of the ‘Carrier’ category, producing the appearance of superior classification performance for that category. When the dataset is processed with sidelobe removal, the sidelobe features are eliminated, which removes the over-learning of the ‘Carrier’ category. The classification performance then reverts to baseline, exhibiting a modest decline. In future work, we can try to address this limitation using methods such as multi-source feature fusion and transfer learning.
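Per-class accuracies such as those discussed in this subsection are obtained by normalizing the diagonal of a confusion matrix by its row sums. The sketch below assumes that rows index the true class and columns the predicted class; the three-class matrix is hypothetical and is not taken from Figure 11.

```python
def per_class_accuracy(matrix, labels):
    """Row-wise accuracy: correct predictions of a class / samples of that class.

    Assumes rows index the true class and columns the predicted class.
    """
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        result[label] = matrix[i][i] / row_total if row_total else 0.0
    return result

# Hypothetical 3-class confusion matrix (not taken from Figure 11).
labels = ["A", "B", "C"]
matrix = [
    [8, 1, 1],  # true class A
    [2, 6, 2],  # true class B
    [0, 1, 9],  # true class C
]

acc = per_class_accuracy(matrix, labels)
# Overall accuracy: trace of the matrix divided by the total sample count.
overall = sum(matrix[i][i] for i in range(len(labels))) / sum(map(sum, matrix))
```

A class with few samples contributes little to the overall accuracy, which is why per-class values can diverge sharply from Acc_final on an imbalanced dataset.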

5. Conclusions

In this study, we consider the influence of sidelobes on fine-grained target classification in SAR images and present a novel fine-grained classification framework for conditions of sidelobe interference. A method based on maximum median filtering was designed to suppress sidelobes, generating high-quality processed image data. The enhanced SAR imagery was then fed into a deep-metric-learning-based framework, which consists of the feature extraction module and the metric classification module. The feature extraction module, grounded in ResNet18, was combined with a lightweight attention mechanism to enhance the extraction of key features. For the metric classification module, a hybrid loss function that combines SoftTriple loss and Circle loss was employed to constrain the embedding space geometry, guiding the model to optimize inter-class separability and intra-class compactness. Experimental validation conducted on the FUSAR-Ship dataset demonstrates that our sidelobe suppression method, based on maximum median filtering, is suitable for batch processing of SAR images. Additionally, the classification accuracy of our framework reaches 84.18% on the five-class ship classification task, outperforming the state-of-the-art classification methods.

In summary, the experimental verification results show that our proposed framework may offer valuable insights for the development of fine-grained ship classification techniques, particularly in practical applications constrained by limited datasets. The scarcity of training samples constrains the model’s generalization capability. With the development of SAR imaging systems and related technologies in recent years, the number of available SAR images continues to increase. In future work, we will improve the proposed framework’s generalization performance based on larger and more diverse datasets.

Author Contributions

Conceptualization, Y.M. and X.L.; methodology, H.Z., W.X. and K.X.; validation, B.T. and Z.X.; writing—original draft preparation, H.Z.; writing—review and editing, Y.Z., Z.C., C.Z., Z.Y. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62201069 and 62201573.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Wupeng Xie was employed by the company Artificial Intelligence Institute of China Electronics Technology Group Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SAR	Synthetic aperture radar
DL	Deep learning
REFIPN	Image pyramid network based on rotation equivariance convolution
CNN	Convolutional neural network
LSTM	Long short-term memory
EFM-Net	Essential feature mining network
DBN	Dual-branch network
DML	Deep metric learning
HENC	Hierarchical embedding network with center calibration
ICDS	Inter-class distribution shift
DSLL	Distribution structure learning loss
SVA	Spatial variant apodization
UFS	Ultra-fine stripmap
UR	Uniformity of intra-region
DR	Dissimilarity of inter-region
C	Complexity
PN	Parameter number
FLOPs	Floating-point operations
SOTA	State-of-the-art
SP-DML	Single proxy-based deep metric learning

References

1. Bi, H.; Liu, Z.; Deng, J.; Ji, Z.; Zhang, J. Contrastive Domain Adaptation-Based Sparse SAR Target Classification under Few-Shot Cases. Remote Sens. 2023, 15, 469.
2. Wang, L.; Qi, Y.; Mathiopoulos, P.T.; Zhao, C.; Mazhar, S. An Improved SAR Ship Classification Method Using Text-to-Image Generation-Based Data Augmentation and Squeeze and Excitation. Remote Sens. 2024, 16, 1299.
3. Shi, Y.; Du, L.; Guo, Y.; Du, Y.; Li, Y. Unsupervised Domain Adaptation for Ship Classification via Progressive Feature Alignment: From Optical to SAR Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5222517.
4. Shi, J.; Jiang, Z.; Zhang, H. Few-Shot Ship Classification in Optical Remote Sensing Images Using Nearest Neighbor Prototype Representation. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 3581–3590.
5. Wei, X.S.; Song, Y.Z.; Mac Aodha, O.; Wu, J.; Peng, Y.; Tang, J.; Yang, J.; Belongie, S. Fine-Grained Image Analysis With Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8927–8948.
6. Shamsolmoali, P.; Zareapoor, M.; Chanussot, J.; Zhou, H.; Yang, J. Rotation Equivariant Feature Image Pyramid Network for Object Detection in Optical Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5608614.
7. Toumi, A.; Cexus, J.C.; Khenchaf, A.; Abid, M. A Combined CNN-LSTM Network for Ship Classification on SAR Images. Sensors 2024, 24, 7954.
8. Yi, Y.; You, Y.; Li, C.; Zhou, W. EFM-Net: An Essential Feature Mining Network for Target Fine-Grained Classification in Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5606416.
9. Zhao, S.; Lang, H. Improving Deep Subdomain Adaptation by Dual-Branch Network Embedding Attention Module for SAR Ship Classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 8038–8048.
10. Zhao, S.; Li, W.; Shen, F.; You, M. LN-SCNet: A Lightweight Convolutional Neural Network for SAR Ship Classification. IEEE Access 2025, 13, 39394–39404.
11. Chen, Y.; An, W.; Zou, B.; Ren, P. AlignMixup-based ship classification in SAR imagery. Signal Image Video Process. 2025, 19, 252.
12. Chan, Y.K.; Koo, V. An introduction to Synthetic Aperture Radar (SAR). Prog. Electromagn. Res. B 2008, 2, 27–60.
13. Yuan, S.; Yu, Z.; Li, C.; Wang, S. A Novel SAR Sidelobe Suppression Method Based on CNN. IEEE Geosci. Remote Sens. Lett. 2020, 18, 132–136.
14. Kaya, M.; Bilge, H.Ş. Deep Metric Learning: A Survey. Symmetry 2019, 11, 1066.
15. Gao, G.; Wang, M.; Zhou, P.; Yao, L.; Zhang, X.; Li, H.; Li, G. A Multibranch Embedding Network With Bi-Classifier for Few-Shot Ship Classification of SAR Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5201515.
16. Yang, M.; Bai, X.; Wang, L.; Zhou, F. HENC: Hierarchical Embedding Network With Center Calibration for Few-Shot Fine-Grained SAR Target Classification. IEEE Trans. Image Process. 2023, 32, 3324–3337.
17. Xu, Y.; Lang, H. Distribution Shift Metric Learning for Fine-Grained Ship Classification in SAR Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2020, 13, 2276–2285.
18. Fan, L.; Zhao, H.; Zhao, H.; Liu, P.; Hu, H. Distribution Structure Learning Loss (DSLL) Based on Deep Metric Learning for Image Retrieval. Entropy 2019, 21, 1121.
19. He, J.; Wang, Y.; Liu, H. Ship Classification in Medium-Resolution SAR Images via Densely Connected Triplet CNNs Integrating Fisher Discrimination Regularized Metric Learning. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3022–3039.
20. Wang, Q.; Zhu, W.; Li, Z.; Ji, Z.; Sun, Y. Module spatially variant apodization algorithm for enhancing radar images. In Proceedings of the 2012 9th European Radar Conference, Amsterdam, The Netherlands, 31 October–2 November 2012; pp. 294–297.
21. Liu, M.; Li, Z.; Liu, L. A Novel Sidelobe Reduction Algorithm Based on Two-Dimensional Sidelobe Correction Using D-SVA for Squint SAR Images. Sensors 2018, 18, 783.
22. Xu, X.; Wang, X. Fine segmentation of ship targets for high-resolution SAR images based on Radon transform. Appl. Electron. Tech. 2023, 49, 142–148.
23. Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. Signal Data Process. Small Targets 1999, 3809, 74–83.
24. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11863–11874.
25. Qian, Q.; Shang, L.; Sun, B.; Hu, J.; Li, H.; Jin, R. SoftTriple Loss: Deep Metric Learning Without Triplet Sampling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6449–6457.
26. Sun, Y.; Cheng, C.; Zhang, Y.; Zhang, C.; Zheng, L.; Wang, Z.; Wei, Y. Circle Loss: A Unified Perspective of Pair Similarity Optimization. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 6398–6407.
27. Hou, X.; Ao, W.; Song, Q.; Lai, J.; Wang, H.; Xu, F. FUSAR-Ship: Building a high-resolution SAR-AIS matchup dataset of Gaofen-3 for ship detection and recognition. Sci. China Inf. Sci. 2020, 63, 140303.
28. Chabrier, S.; Emile, B.; Rosenberger, C.; Laurent, H. Unsupervised Performance Evaluation of Image Segmentation. EURASIP J. Adv. Signal Process. 2006, 2006, 096306.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
30. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
31. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
32. Movshovitz-Attias, Y.; Toshev, A.; Leung, T.K.; Ioffe, S.; Singh, S. No Fuss Distance Metric Learning Using Proxies. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 360–368.
33. Xu, J.; Lang, H. A Unified Multiple Proxy Deep Metric Learning Framework Embedded With Distribution Optimization for Fine-Grained Ship Classification in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 5604–5620.

Figure 1. The overall framework of the proposed method.

Figure 2. SAR ship target under sidelobe interference: (a) Trailing sidelobe; (b) cross sidelobe.

Figure 3. Flow chart of the proposed sidelobe suppression algorithm.

Figure 4. Schematic diagram of the maximum median filter window.

Figure 5. Results of sidelobe suppression algorithm-based maximum median filtering: (a) original SAR image; (b) morphological filtering; (c) binary image X_binarize; (d) coarse sidelobe extraction result; (e) sidelobe potential area; (f) sidelobe suppression image.

Figure 6. The structure of the feature extraction module.

Figure 7. Representative sample of each class in FUSAR-Ship. (a) Cargo; (b) carrier; (c) fishing; (d) other; (e) tanker.

Figure 8. Comparison of sidelobe removal results from ship slices in SAR images: (a) original SAR image; (b) Radon-transform-based algorithm; (c) sidelobe suppression algorithm-based maximum median filtering.

Figure 9. Variation in classification accuracy with different λ values.

Figure 10. Original ship images and attention maps: (a) original SAR image; (b) visualization results without sidelobe removal; (c) visualization results of DML without the feature extraction module; (d) visualization results of DML with sidelobe removal and the feature extraction module.

Figure 11. Confusion matrices for the best classification results of each framework on the FUSAR dataset: (a) traditional DML network: Acc_final = 75.11%; (b) proposed network: Acc_final = 84.18%.

Table 1. FUSAR-Ship dataset composition.

Class	Number
Cargo	80
Carrier	134
Fishing	92
Other	41
Tanker	46
Total	393

Table 2. Comparison of the sidelobe removal performance of different algorithms.

ID	UR (Radon Transform)	DR (Radon Transform)	C (Radon Transform)	UR (Proposed Method)	DR (Proposed Method)	C (Proposed Method)
1	0.9929	0.8316	7.3556	0.9952	0.8329	4.9709
2	0.9779	0.8087	9.7632	0.9889	0.7996	6.8564
3	0.9923	0.8798	4.9137	0.9946	0.8948	4.8471
4	0.9864	0.7991	12.0521	0.9885	0.8901	4.5120

Table 3. Comparison with SOTA methods on the FUSAR-Ship dataset.

Method	Acc_final (%)	PN (M)	FLOPs (G)
ResNet18 [29]	72.10	11.21	2.38
BN-Inception [30]	72.56	10.34	2.68
VGG-16 [31]	73.21	138.36	20.21
EFM-Net [8]	76.02	87.10	15.21
SP-DML [32]	73.38	11.21	2.38
DSL Loss [18]	75.51	11.21	2.41
Combination Loss [19]	74.69	11.21	2.41
UMP+D [33]	77.22	11.21	2.40
Our Method	84.18	11.20	9.02

Table 4. Ablation study on the FUSAR-Ship Dataset.

Sidelobe Removal	Feature Extraction Module	Metric Classification Module	Acc_final (%)
×	×	×	75.11
×	√	×	75.47
×	×	√	75.97
√	×	×	81.17
√	√	×	82.28
√	×	√	82.91
×	√	√	76.10
√	√	√	84.18

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
