1. Introduction
Target recognition [1,2,3] based on SAR [4,5,6] images holds significant importance in contemporary military operations. Deep learning methods [7,8] have profoundly advanced SAR target recognition [9,10,11], driving significant progress in the field. However, as research deepens, the limitations of deep learning-based SAR target recognition methods are gradually being exposed. Current intelligent algorithms for SAR target recognition are generally based on visual neural networks and depend heavily on sufficient training samples. Acquiring high-quality labeled SAR data is time-consuming and laborious owing to the complexity of the imaging mechanism and the difficulty of visual interpretation. Incorporating target knowledge into deep learning models is the key to breaking through this bottleneck, with two advantages: (i) guiding the model to focus on the key features of the target to enhance its representational capability; and (ii) introducing prior knowledge to alleviate the limitation of insufficient training samples. Thus, an increasing number of researchers have focused on target recognition methods that incorporate prior target features.
Electromagnetic scattering features [12,13] describe the essential properties of SAR image targets and are widely employed in SAR image interpretation. Existing methods for fusing target scattering center [12,14,15] knowledge have been extensively studied and can be broadly categorized into two groups. The first group comprises simple feature fusion approaches, whose core idea is to improve the completeness and distinctiveness of target representations by integrating multiple features. Zhang et al. [16] quantized the Attributed Scattering Center (ASC) features into k feature vectors and designed a network framework to fuse ASC features with deep features. Jiang et al. [12] employed a convolutional neural network (CNN) to perform preliminary classification of input samples and evaluate the reliability of the output results. When the classification results were deemed highly reliable, the recognition results were output directly; otherwise, an ASC matching module was activated to further identify the category of the test sample, and its outcome was taken as the final classification decision. Xiao et al. [17] fused multi-channel features with peak features to enhance the saliency of aircraft, which improved the feature representation and the detection performance for multi-scale aircraft.
The second group comprises SAR target recognition methods based on deep scattering information mining, which construct new scattering feature representations and mine deep target scattering information. Liu et al. [18] divided the ASC features into distinct local feature sets and employed point cloud networks to extract deep scattering information. This approach enabled the deep network to autonomously filter useful information, ultimately yielding a complete target representation through weighted feature fusion. Wen et al. [19] proposed a novel multimodal feature fusion learning framework that captures integrated target features from different domains, fusing features learned from the phase history and scattering domains with image features obtained from an off-the-shelf deep feature extractor for final target recognition. Zhang et al. [20] employed a k-nearest neighbor (KNN) algorithm to transform the ASCs into a local graph structure, followed by an ASC feature association module to generate multi-scale global scattering features. These scattering features were then fused with deep features through weighted integration to enhance feature diversity. Sun et al. [21,22,23,24] realized the prediction of scattering points of ships, airplanes, and other targets based on deep networks for the first time. They further employed different loss functions to jointly constrain scattering information prediction and deep feature extraction, ultimately enabling intelligent scattering feature extraction and fusion. Feng et al. [25] designed an ASC parameter estimation network and obtained a more complete feature representation by fusing multi-level deep scattering features with deep learning features.
These fusion methods undoubtedly enhance the representation and discrimination of features, but they also suffer from the following intractable problems:
- (1)
The networks exploit scattering features insufficiently. Currently, the exploitation of scattering center features relies mainly on the spatial proximity of local scattering points, and techniques such as clustering, point cloud construction, or graph structures are used to extract scattering information. These methods are still insufficient to reveal the deep intrinsic correlations among scattering points. The scattering centers of a target are closely related to its local structure, and their overall distribution reflects the global geometry of the target. Therefore, it is crucial to construct correlations among scattering points based on the visual geometry of the target.
- (2)
Simple feature fusion strategies result in poor feature discrimination. Existing fusion frameworks assume that a priori scattering center features and deep image features are either independent or complementary, and they integrate scattering knowledge through manually assigned weights to obtain discriminative representations. Once the target background environment or input data distribution changes, the original weights may no longer be applicable, leading to a decrease in the discriminability of the fused features.
These two factors create performance bottlenecks for existing methods.
To solve the above problems, a multi-level structured scattering feature fusion network (MSSFF-Net) is proposed in this paper. The graph is an effective tool for modeling the nonlinear representation of scattering point features. Thus, in this paper, graph nodes represent the features of each local scattering point, while edges represent the correlations between individual scattering point features. On this basis, a target scattering structure matching the local and global structure of the target is constructed to mine deep structural scattering information. Furthermore, the information entropy between the features and the target category is utilized as the weight, so that the discriminative information in each feature is fused to enhance the representation ability of the features. Finally, a cosine space classifier is proposed to enhance feature separability by projecting features onto a spherical manifold, while simultaneously establishing correlations between features and azimuth angles to improve feature robustness. The main contributions of this work are as follows:
- (1)
For target scattering information mining, the adaptive k-nearest neighbor method is first used to construct the target scattering structure model. This method matches the geometric structural properties of the target and constructs the intrinsic associations among its local scattering features. Subsequently, a scattering association pyramid network is designed to mine the scattering information of the target layer by layer and obtain complete scattering information.
- (2)
In the feature fusion stage, the information entropy is used as the theoretical basis to quantify the target discrimination information in the deep scattering domain features and deep image features. The measured results are employed as fusion weights to adaptively fuse the target discriminative information in various features.
- (3)
A cosine space classifier is proposed to transfer the features from the mutually coupled and entangled Euclidean space to a separable manifold space by simple feature projection. The robustness of the features to azimuth variations is improved, which enhances their generalization ability.
The rest of this article is organized as follows. Section 2 describes the principles of the attribute scattering center. Section 3 describes the proposed method in detail. The experimental results and discussion are reported in Section 4 and Section 5. Finally, the conclusion is drawn in Section 6.
2. Attribute Scattering Center
In this paper, we employ ASC features and fuse them with deep features to enhance the representational capacity of the features. The ASC theory is introduced below.
The backscatter response in the high-frequency region of the target can be equated to the superposition effect of several localized phenomena, which are referred to as scattering centers. The ASC model is a more complete modeling of target knowledge and is formulated as follows:
$$E(f,\phi;\Theta)=\sum_{i=1}^{p}E_{i}\left(f,\phi;\Theta_{i}\right),$$
where $E(f,\phi;\Theta)$ denotes the backward scattering of the overall target at an azimuth angle $\phi$ and the frequency $f$, and $p$ is the number of scattering centers. For the $i$-th ASC, the backscattered field is shown below:
$$E_{i}\left(f,\phi;\Theta_{i}\right)=A_{i}\cdot\left(j\frac{f}{f_{c}}\right)^{\alpha_{i}}\cdot\exp\left(-j2k\left(x_{i}\cos\phi+y_{i}\sin\phi\right)\right)\cdot\operatorname{sinc}\left(kL_{i}\sin\left(\phi-\bar{\phi}_{i}\right)\right)\cdot\exp\left(-2\pi f\gamma_{i}\sin\phi\right),$$
where $A_{i}$ is the amplitude, $f_{c}$ is the radar center frequency, $L_{i}$ is the size of the localized scattering structure, and $\alpha_{i}$ describes the frequency dependence. If the $i$-th ASC is localized, $L_{i}=0$ and $\bar{\phi}_{i}=0$; otherwise, the ASC is distributed, and $\gamma_{i}=0$. $k=2\pi f/c$ is the wave number, and $c$ is the speed of light. $\phi$ indicates the azimuth angle, which takes values within $[-\phi_{m}/2,\phi_{m}/2]$, and $\phi_{m}$ is the maximum imaging observation angle. $\bar{\phi}_{i}$ denotes the degree of deviation from the imaging azimuth angle of the distributed ASC. $(x_{i},y_{i})$ is the projection of the ASC position on the imaging plane, and $x_{i}$, $y_{i}$ express the position parameters of the ASC in the range and azimuth directions, respectively.
Thus, the entire ASC model of the target can be represented by the parameter set $\Theta=\{\Theta_{1},\Theta_{2},\ldots,\Theta_{p}\}$, where each element $\Theta_{i}=(A_{i},x_{i},y_{i},\alpha_{i},\gamma_{i},L_{i},\bar{\phi}_{i})$, $i=1,2,\ldots,p$. Thus, the ASC features of the target are expressed as a set of feature points.
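As an illustration of this feature-point representation, the short sketch below stores a hypothetical extracted parameter set as a $p\times 7$ array, with columns ordered as in the definition above; all numerical values are placeholders, not real ASC estimates.

```python
import numpy as np

# Hypothetical ASC parameter set: one row per scattering center,
# columns ordered as (A, x, y, alpha, gamma, L, phi_bar).
# Values are placeholders for illustration only.
asc_params = np.array([
    [0.82,  1.3, -0.4,  0.5, 0.00, 0.0, 0.0],   # localized ASC (L = 0)
    [0.47, -0.9,  2.1,  0.0, 0.00, 1.6, 0.2],   # distributed ASC (gamma = 0)
    [0.65,  0.2,  0.8, -0.5, 0.01, 0.0, 0.0],
])

p = asc_params.shape[0]   # number of scattering centers
print(f"{p} ASC feature points, each with 7 attribute parameters")
```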
3. Materials and Methods
3.1. Overall Framework
The network structure proposed in this paper is shown in Figure 1. Specifically, the MSSFF-Net includes the following key modules: the image domain deep feature extraction (IDD-FE) module, the scattering association pyramid (SAP) module, the mutual information-based feature fusion (MI-FF) module, and the cosine space classifier (CSC). By coupling the proposed modules, the feature representation capability and robustness of the model are improved.
Firstly, the proposed method constructs the target scattering structure representation by using the adaptive KNN method. On this basis, the SAP module is proposed to extract hierarchical target electromagnetic scattering information. Secondly, the MI-FF module is proposed to fuse discriminative target information. Specifically, the mutual information (MI) between features and label information is calculated in the training phase and used as weights to realize adaptive fusion between different features. In the testing phase, the average value of the mutual information weights obtained in the training phase is used as the weight to realize feature fusion. Finally, the robustness and separability of the features are enhanced by projecting the features into the spherical manifold space via CSC.
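To summarize the data flow, the following runnable sketch wires toy stand-ins for the four modules together; every sub-module, dimension, and the fixed fusion weights are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MSSFFPipeline(nn.Module):
    """Schematic of the described pipeline; every sub-module here is a toy
    placeholder standing in for IDD-FE, SAP, MI-FF, and CSC."""
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.idd_fe = nn.Sequential(nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(16, feat_dim))         # image-domain features
        self.sap = nn.Sequential(nn.Linear(7, feat_dim), nn.ReLU())  # scattering features (toy)
        self.fuse_w = torch.tensor([0.5, 0.5])                       # MI weights (placeholder)
        self.csc_w = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, sar_image, asc_points):
        f_img = self.idd_fe(sar_image)                   # (B, D)
        f_sct = self.sap(asc_points).mean(dim=1)         # (B, D), pooled over scattering points
        f = self.fuse_w[0] * f_img + self.fuse_w[1] * f_sct   # weighted fusion (fixed here)
        # cosine-space logits: cosine similarity to per-class weight vectors
        logits = nn.functional.normalize(f, dim=1) @ nn.functional.normalize(self.csc_w, dim=1).t()
        return logits

model = MSSFFPipeline()
img = torch.randn(4, 1, 64, 64)    # batch of SAR chips
pts = torch.randn(4, 20, 7)        # batch with 20 ASC feature points each
print(model(img, pts).shape)       # -> torch.Size([4, 10])
```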
In this paper, we focus on mining and fusing deep scattering information; thus, a classical network, i.e., AconvNet [9], is employed to extract deep features from SAR images. The network details are presented in Table 1.
3.2. Scattering Structure Construction
The construction of the scattering structure is the key part of the proposed method, which directly affects the mining of deep scattering information by the model. In this paper, the scattering structure is constructed by graph theory, where the graph nodes represent scattering point features and the edges are correlation maps between scattering point features. Edge features are particularly important and reveal the correlation among different scattering points. Specifically, two main criteria define the edges: (i) how to measure the relationships between scattering points; and (ii) which relationships among scattering points need to be preserved.
3.2.1. Scattering Point Similarity Measure
We assume that there are two nodes $u$ and $v$, with corresponding scattering features $f_{u}$ and $f_{v}$, respectively. Here, the distance between them is defined as their relationship. Commonly used metrics include the Euclidean distance, cosine distance, Hamming distance, and Mahalanobis distance. Their computational principles are shown in Figure 2.
Euclidean distance is a spatial distance metric that keeps nodes physically close to the center node as its neighbors; it is strongly related to the actual physical structure of the target and is consistent with human visual perception. Therefore, in this paper, the correlations between scattering points are measured by Euclidean distances, so that the constructed scattering structure matches the visual geometry of the target.
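A minimal sketch of this distance computation, assuming the scattering points are described by feature vectors (here simply 2-D positions): `torch.cdist` returns the full pairwise Euclidean distance matrix, which serves as the edge-weight candidate for the scattering graph.

```python
import torch

def pairwise_euclidean(points: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distances between scattering-point features.

    points: (N, D) tensor, one row per scattering point; returns an (N, N)
    distance matrix used as the edge-weight candidate for the scattering graph.
    """
    return torch.cdist(points, points, p=2)

# Toy example: 5 scattering points described by their (x, y) positions.
pts = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.5, 1.5]])
dist = pairwise_euclidean(pts)
print(dist.round(decimals=2))
```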
3.2.2. Scattering Point Correlation Construction
In this paper, the adaptive KNN approach [26,27] is implemented to represent the physical information of the target scattering structure, and the modeling results are analyzed. The computational procedure of the adaptive KNN method is as follows:
Step 1: Select the maximum and minimum numbers of neighbors, $K_{max}$ and $K_{min}$, and construct a KNN graph with $K_{max}$ as the initial value.
Step 2: Calculate the in-degree of each vertex $i$ in the KNN graph.
Step 3: Calculate the final number of neighbors $K_{i}$ of node $i$ according to the in-degree obtained in Step 2.
Through the above computational process, the appropriate number of neighbors is matched for each node to generate the representation that is more compatible with the target geometry.
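The sketch below illustrates the three steps under one plausible reading of Step 3 (the per-node neighbor count is interpolated between $K_{min}$ and $K_{max}$ according to the node's relative in-degree in the initial $K_{max}$ graph); the exact rule used in [26,27] may differ.

```python
import numpy as np
from scipy.spatial.distance import cdist

def adaptive_knn_edges(points, k_min=2, k_max=6):
    """Sketch of adaptive-KNN graph construction over scattering points.

    Assumption: each node's final neighbor count K_i is interpolated between
    k_min and k_max according to its in-degree in an initial k_max-NN graph
    (the paper's exact rule may differ).
    """
    n = len(points)
    d = cdist(points, points)                   # Euclidean distances (Section 3.2.1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k_max]      # Step 1: initial k_max-NN graph

    indeg = np.zeros(n, dtype=int)              # Step 2: in-degree of every node
    for i in range(n):
        indeg[knn[i]] += 1

    # Step 3: per-node neighbor count scaled by relative in-degree.
    k_i = (k_min + np.round((k_max - k_min) * indeg / max(indeg.max(), 1))).astype(int)

    edges = [(i, j) for i in range(n) for j in knn[i, :k_i[i]]]
    return edges

pts = np.random.rand(12, 2)                     # toy scattering-point positions
print(len(adaptive_knn_edges(pts)), "directed edges")
```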
3.3. Scattered Association Pyramid Network
Subsequently, the SAP module is proposed to mine deep target scattering information, as shown in Figure 1. The input to the SAP module is the target scattering structure, and the output is the target deep scattering feature. Specifically, the SAP is realized through the stacked Scattered Information Interaction (SII) network, which is implemented with the graph attention mechanism (GAT) [28], the Graph Convolutional Network (GCN) [29], and graph coarsening (GC). The GAT breaks the limitation of the input scattering structure to realize the aggregation of global node information. The GCN aggregates information from neighboring scattering points in the local structure based on the original scattering structure. Moreover, we coarsen the initial scattering structure to capture multi-level, multi-scale target scattering representations. The network structure of the SII module is described in detail below.
The SII module is a proprietary network for processing scattering structure data that captures more comprehensive target knowledge from global and local structures, as shown in Figure 3. The computational principles of the GAT, GCN, and GC are described in detail below.
The GAT learns the attention weight $\alpha_{ij}$ of each node with respect to its neighboring nodes to measure the importance of neighboring node $j$ to node $i$. Firstly, the node features $h_{i}$ are mapped by a learnable shared weight matrix $W$: $z_{i}=Wh_{i}$. Then, the correlation between node $i$ and its neighbor $j$ is calculated, denoted as $e_{ij}$:
$$e_{ij}=a\left(\left[Wh_{i}\,\|\,Wh_{j}\right]\right),$$
where $[\cdot\,\|\,\cdot]$ denotes the feature splicing (concatenation) function, and $a(\cdot)$ maps the spliced features to a number, which is the similarity value between the two nodes. Subsequently, $e_{ij}$ is normalized by the softmax function to obtain the correlation coefficients:
$$\alpha_{ij}=\operatorname{softmax}_{j}\left(e_{ij}\right)=\frac{\exp\left(e_{ij}\right)}{\sum_{k\in\mathcal{N}_{i}}\exp\left(e_{ik}\right)}.$$
Finally, the features of the neighbors are aggregated by weight and the node representation is updated:
$$h_{i}^{\prime}=\sigma\left(\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}Wh_{j}\right),$$
where $\sigma(\cdot)$ is the activation function, such as $\mathrm{ReLU}(\cdot)$.
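The following minimal sketch implements the single-head attention update described above (map, splice, score, softmax-normalize, aggregate); the dense adjacency representation, layer sizes, and the ELU output activation are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadGAT(nn.Module):
    """Minimal single-head graph attention layer in the standard GAT form."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared feature mapping
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # maps spliced features to a scalar

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) 0/1 adjacency with self-loops.
        z = self.W(h)                                      # z_i = W h_i
        N = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(N, N, -1),
                           z.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))        # e_ij = a([z_i || z_j])
        e = e.masked_fill(adj == 0, float('-inf'))         # attend only over neighbors
        alpha = torch.softmax(e, dim=-1)                   # alpha_ij
        return F.elu(alpha @ z)                            # h_i' = sigma(sum_j alpha_ij z_j)

gat = SingleHeadGAT(7, 16)
h = torch.randn(20, 7)                                     # 20 scattering-point features
adj = ((torch.rand(20, 20) > 0.7).float() + torch.eye(20)).clamp(max=1)
print(gat(h, adj).shape)                                   # -> torch.Size([20, 16])
```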
The GCN learns new node representations by fusing information from neighboring nodes, which depends more strongly on the target scattering structure. Obviously, the GCN is fundamentally different from the GAT: the former focuses on capturing local information, while the latter is more concerned with global information aggregation and feature updating. They complement each other to achieve a more comprehensive interaction and update of scattering information. The kernel operation of the GCN is realized by the following equation:
$$H^{(l+1)}=\sigma\left(\hat{A}H^{(l)}W^{(l)}\right),$$
where $H^{(l)}$ is the node feature matrix in layer $l$, $W^{(l)}$ is the learnable weight matrix for layer $l$, and $\hat{A}=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the normalized adjacency matrix. The adjacency matrix $\tilde{A}$ is a highly important a priori input that records the relationship information among all scattering points.
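A compact sketch of the propagation rule above, using the symmetric normalization $\hat{A}=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ with self-loops; the dense adjacency form and layer width are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Single GCN layer: H^(l+1) = sigma(A_hat H^(l) W^(l))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        a_tilde = adj + torch.eye(adj.size(0))                           # add self-loops
        d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
        a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]      # D^-1/2 A D^-1/2
        return torch.relu(a_hat @ self.W(h))

gcn = GCNLayer(16, 16)
print(gcn(torch.randn(20, 16), (torch.rand(20, 20) > 0.7).float()).shape)
```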
Then, the multi-scale target scattering information is obtained by the GC. The coarse-grained structural description of the target is achieved by dividing the graph into k non-intersecting substructures depending on the topology of the graph. Specifically, Adaptive Structure-Aware Pooling (ASAPooling) [30] is used to achieve this objective. ASAPooling determines which nodes can be aggregated by comparing their structural characteristics (e.g., adjacencies and node characteristics) and aggregates similar nodes into the same supernode to maximize the preservation of local structure. The calculation process is shown in Figure 4. In general, the core idea is pooling through adaptive clustering among nodes, which is constrained by two aspects: (i) preserving the local structure of the graph, i.e., dynamically adjusting the clustering by node features and topological information; and (ii) preserving valid node information, i.e., selectively aggregating similar nodes by learning their importance weights.
ASAPooling is an effective graph coarsening method that combines the node features and structural information of the graph to maximize the preservation of important local scattering structural properties during the dimensionality reduction process.
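As a concrete reference, the sketch below stacks the three operations of one SII stage using PyTorch Geometric's GATConv, GCNConv, and ASAPooling operators; the operator choices, the 0.5 pooling ratio, and the assumption that these call signatures match the installed PyTorch Geometric version are ours, not the paper's.

```python
import torch
from torch_geometric.nn import GATConv, GCNConv, ASAPooling  # assumes PyTorch Geometric

class SIIBlock(torch.nn.Module):
    """Sketch of one scattered-information-interaction stage: global aggregation
    (GAT), local aggregation (GCN), then graph coarsening (ASAPooling).
    Operator choices and the 0.5 pooling ratio are illustrative assumptions."""
    def __init__(self, dim):
        super().__init__()
        self.gat = GATConv(dim, dim, heads=1)
        self.gcn = GCNConv(dim, dim)
        self.pool = ASAPooling(dim, ratio=0.5)

    def forward(self, x, edge_index, batch=None):
        x = torch.relu(self.gat(x, edge_index))     # global node-information aggregation
        x = torch.relu(self.gcn(x, edge_index))     # local neighborhood aggregation
        # ASAPooling returns the coarsened graph (node features, edges, weights, batch, perm).
        x, edge_index, _, batch, _ = self.pool(x, edge_index, batch=batch)
        return x, edge_index, batch

block = SIIBlock(16)
x = torch.randn(20, 16)
edge_index = torch.randint(0, 20, (2, 60))          # toy edge list
print(block(x, edge_index)[0].shape)                # roughly half the nodes remain
```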
3.4. Mutual Information-Based Feature Fusion
After obtaining the image domain deep features and target scattering information, it is necessary to fuse them to achieve a more complete target representation. To better integrate and fully utilize their advantages, the adaptive weight feature fusion strategy based on information entropy theory is proposed, i.e., the MI-FF method.
MI is a commonly used information measurement method that indicates the degree of correlation or the amount of information shared between two random variables. The proposed feature fusion method utilizes MI to measure the correlation of each feature with the category and takes the results of the metric as weights to achieve feature fusion.
Assume that the image domain features and the scattering features are $F_{I}$ and $F_{S}$, respectively, and that the category label is $c$. The MI between feature $F_{m}$ ($m\in\{I,S\}$) and the label can be defined as follows:
$$I\left(F_{m};c\right)=\sum_{f\in F_{m}}\sum_{c}p\left(f,c\right)\log\frac{p\left(f,c\right)}{p\left(f\right)p\left(c\right)},$$
where $p(f,c)$ is the joint probability distribution of $F_{m}$ and $c$, and $p(f)$ and $p(c)$ are their marginal probability distributions. The larger the MI, the stronger the discriminative information about the category contained in feature $F_{m}$. The feature weights $\omega_{m}$ are calculated as
$$\omega_{m}=\frac{I\left(F_{m};c\right)}{I\left(F_{I};c\right)+I\left(F_{S};c\right)},\quad m\in\{I,S\}.$$
Then, the discriminative feature representation of the target is obtained as follows:
$$F=\omega_{I}F_{I}+\omega_{S}F_{S}.$$
Notably, this method not only obtains more discriminative target features but also enhances the interpretability of the weights. Therefore, the subjectivity and rigidity associated with manually set weights are avoided in the feature fusion process.
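To make the weighting scheme concrete, the sketch below estimates the two MI terms with scikit-learn's mutual_info_classif (a kNN-based estimator; the estimator actually used in the paper is not specified here) and fuses the two branches with the normalized weights; the feature dimensions and data are toy placeholders.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_fusion_weights(f_img, f_sct, labels):
    """Estimate MI-based fusion weights (sketch; the paper's estimator may differ).

    f_img, f_sct: (N, D) feature matrices from the image and scattering branches.
    labels: (N,) class labels. Per-dimension MI scores are averaged per branch
    and normalized so that the two weights sum to one.
    """
    mi_img = mutual_info_classif(f_img, labels).mean()
    mi_sct = mutual_info_classif(f_sct, labels).mean()
    w_img = mi_img / (mi_img + mi_sct + 1e-12)
    return w_img, 1.0 - w_img

rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=200)
f_img = rng.normal(size=(200, 64)) + y[:, None] * 0.2   # toy, label-correlated features
f_sct = rng.normal(size=(200, 64))
w_i, w_s = mi_fusion_weights(f_img, f_sct, y)
fused = w_i * f_img + w_s * f_sct                       # weighted feature fusion
print(round(w_i, 3), round(w_s, 3))
```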
3.5. Cosine Space Classifier
The above feature extraction and fusion process implements the projection of the SAR image to feature spaces (from the image space to the feature space). Subsequently, the mapping from features to categories needs to be implemented with the help of a classifier. In an ideal feature space, features of different classes are clearly discriminable, which simplifies the classification task. Therefore, selecting a reasonable feature space is an important prerequisite for achieving precise classification.
SAR images have a high-dimensional representation in Euclidean space, but their actual effective information tends to be concentrated on low-dimensional manifolds because of the physical limitations of the imaging process and the geometrical characteristics of the target. Therefore, mapping the features to the manifold space contributes to obtaining the essential attributes of the target. The Structural Similarity Index Measure (SSIM) is an image similarity measure based on the perception of the human visual system, which evaluates overall image similarity by calculating the structural, luminance, and contrast similarity between two images in a localized area. Such a method of comparing image structures, intensities, and variations based on local regions is, in effect, a similarity comparison of the scattering characteristics of different SAR images; it is highly consistent with the objective laws of SAR image scattering characteristics and can clearly describe the data structure of the manifold space.
Figure 5 shows the similarity results in the original SAR image space versus the feature space of the SAR image. Specifically, we measure the similarity between a SAR image (or feature) and other SAR images (or features) at azimuth angles ranging from 0° to 360°. It is found that the projection of the features into the cosine space is strongly consistent with the manifold of the SAR images. Obviously, the tendency of the target scattering properties to vary with azimuth is reflected in this manifold space, enhancing the ability of the features to characterize azimuth information. Thus, the separability of the features in the cosine space is enhanced. Additionally, the feature similarities within the Euclidean space exhibit a disordered pattern, indicating that it is difficult to highlight the critical scattering and azimuth information of the target within the Euclidean space. These comparative results further emphasize the significance of the cosine space classifier.
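The sketch below reproduces the flavor of this analysis: it computes SSIM between a reference chip and chips at other azimuth angles using scikit-image's structural_similarity; the chip size, azimuth sampling, and random data are placeholders standing in for real MSTAR chips.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_vs_azimuth(chips):
    """SSIM between the 0-degree reference chip and every other azimuth sample.

    chips: dict mapping azimuth angle (deg) -> 2-D SAR amplitude chip.
    Returns (angles, similarities) for plotting a curve like the one in Figure 5.
    """
    ref = chips[0]
    angles = sorted(chips)
    sims = [ssim(ref, chips[a], data_range=float(ref.max() - ref.min())) for a in angles]
    return angles, sims

# Toy stand-in: random 64x64 chips every 10 degrees (real chips would come from MSTAR).
rng = np.random.default_rng(1)
chips = {a: rng.random((64, 64)) for a in range(0, 360, 10)}
print(ssim_vs_azimuth(chips)[1][:3])
```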
Therefore, the cosine space classifier is proposed to project the feature space into a spherical manifold space and preserve the critical information of the features. The calculation process is shown below:
$$p\left(c\mid\mathbf{x}\right)=\frac{\exp\left(\cos\theta_{c}\right)}{\sum_{j=1}^{C}\exp\left(\cos\theta_{j}\right)},\qquad\cos\theta_{c}=\frac{\mathbf{W}_{c}^{\top}\mathbf{x}}{\left\|\mathbf{W}_{c}\right\|\left\|\mathbf{x}\right\|},$$
where $c$ is the target category, $C$ is the total number of categories, $\theta_{c}$ is the angle between the feature and the class weight vector, $\mathbf{x}$ is the feature, and $\mathbf{W}_{c}$ is the learnable weight.
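A minimal sketch of the classifier described above: features and per-class weights are L2-normalized so that the logits are cosine similarities, and a softmax over the $C$ classes yields the class probabilities; the temperature (scale) factor is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Cosine-space classifier sketch: logits are cosine similarities between the
    normalized feature and normalized per-class weight vectors."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale                      # temperature; an illustrative assumption

    def forward(self, x):
        cos = F.normalize(x, dim=1) @ F.normalize(self.weight, dim=1).t()
        return self.scale * cos                 # softmax over these gives p(c | x)

clf = CosineClassifier(128, 10)
feats = torch.randn(4, 128)
print(torch.softmax(clf(feats), dim=1).shape)   # -> torch.Size([4, 10])
```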
4. Results
All experiments are conducted on a personal computer equipped with an Intel i9-11900K CPU, an NVIDIA RTX 4090 GPU, and 24 GB of RAM. The implementation is carried out in a Python 3.9 environment using the open-source machine learning framework PyTorch 2.0.1. Additionally, CUDA 11.7 is employed to leverage GPU acceleration for enhanced computational efficiency.
4.1. Dataset Description
In this paper, experimental validations are carried out on the Moving and Stationary Target Acquisition and Recognition (MSTAR) [31] and Full Aspect Stationary Targets-Vehicle (FAST-Vehicle) [32] datasets.
4.1.1. MSTAR
The MSTAR dataset was developed under the sponsorship of the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) to support the evaluation of target recognition methodologies, with an emphasis on completeness, diversity, and standardization. It is a public SAR target recognition dataset comprising SAR images of ground targets captured under varying configurations, including different target types, azimuth angles, and radar depression angles. Specifically, it contains X-band spotlight-mode images at a resolution of 0.3 m, with azimuth angles ranging from 0° to 360°. The dataset includes 10 categories of ground vehicle targets: 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, and ZSU23/4. The optical images of these targets and their corresponding SAR images are shown in Figure 6.
To support targeted studies, researchers commonly construct sub-datasets under standard (SOC) and extended (EOC) operating conditions according to the specific observation settings. Under SOC, the imaging conditions of the training and test datasets are largely consistent, differing only slightly in depression angle (17° for training and 15° for testing). The specific configurations of the training and test sets under SOC are summarized in Table 2. In contrast, the EOC scenarios introduce substantial variations between the training and test sets, such as differences in depression angles and target variants, as detailed in Table 3 and Table 4.
4.1.2. FAST-Vehicle
The FAST-Vehicle dataset comprises vehicle SAR data in X-band spotlight mode, collected using the MiniSAR system developed by Nanjing University of Aeronautics and Astronautics (NUAA). It offers a high spatial resolution of 0.1 m and covers the full azimuth range from 0° to 360°. The copyright of the FAST-Vehicle dataset belongs to NUAA. The SAR images were acquired in March and July 2022, encompassing nine target categories: 62LT, 63APTV, 63CAAV, 63AT, T3485, 591TC, 54TH, 59AG, and J6. The optical images of these targets and their corresponding SAR images are shown in Figure 7.
Compared to the classical SAR ATR dataset MSTAR, the FAST-Vehicle dataset offers SAR imagery under more diverse conditions, including a wider range of imaging parameters and multiple acquisition times, as summarized in Table 5. To comprehensively evaluate the effectiveness of the proposed method, two main groups of experimental conditions are designed as follows:
- (1)
Scenario 1: same acquisition time, different depression angles. The July data at a 26° depression angle are used as the training set, and the July data at 31°, 37°, and 45° are used as the test sets, respectively.
- (2)
Scenario 2: different acquisition times, same depression angle. The July data at 31° and 45° depression angles are used as the training sets and, correspondingly, the March data at 31° and 45° are used as the test sets, respectively.
These experimental conditions are designed to both effectively leverage the dataset’s complexity and rigorously validate the effectiveness of the proposed method.
4.2. Experimental Setup
In this paper, we adopt AconvNet as the backbone network. The Adaptive Moment Estimation (Adam) optimizer is utilized with a learning rate of 0.01. All experiments are trained for 500 epochs with a batch size of 32.
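The stated optimization setup translates into the following PyTorch sketch; the model and dataset are toy placeholders standing in for MSSFF-Net and the SAR datasets.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model standing in for the SAR dataset and MSSFF-Net.
data = TensorDataset(torch.randn(256, 1, 64, 64), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)           # batch size 32
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 10))

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)        # Adam, lr = 0.01
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(500):                                         # 500 training epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```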
4.3. Comparison Experiments on MSTAR Dataset
Firstly, the performance of the algorithm is verified on the MSTAR dataset.
4.3.1. Comparative Experimental Analysis Under SOC
The experimental results are shown in Table 6. The overall recognition performance of the proposed algorithm reaches 99.46% when all the data in the training set are used for training, i.e., almost all samples in the test set are correctly classified. Subsequently, the size of the training set is further reduced to verify the recognition performance of the proposed method under limited training samples. As the training samples decrease, when 20 or 30 samples are randomly selected from each class, the recognition rate of the proposed method remains above 85%. Moreover, with a much smaller sample size, i.e., only 10 training samples per class, the recognition rate of the proposed method still reaches 65.54%, achieving correct classification for most of the test samples. The above experiments verify the basic performance of the proposed method with sufficient training samples and the effectiveness and stability of the algorithm with limited training samples.
Moreover, to verify the superiority of the proposed method, we compare it with several classical methods; the experimental results are shown in Table 6. Specifically, the compared methods are divided into two groups: methods relying solely on SAR images and methods based on feature fusion. The first group includes AlexNet, VGGNet, ResNet, AconvNet, Lm-BN-CNN, and SRC. Among them, the first three are classical methods in the field of computer vision that have been successfully transplanted to SAR image target recognition, while the remaining methods are proprietary network structures designed for SAR target recognition tasks. Clearly, the AlexNet network has the worst recognition performance in all limited-training-sample cases. When each category contains 20 or more training samples, the recognition performances of the VGGNet and ResNet methods are similar; when the number of training samples is reduced further, the recognition performance of the VGGNet method is relatively better. Among the SAR-specific networks, AconvNet maintains the optimal recognition performance, while the Lm-BN-CNN and SRC methods generalize poorly with limited training samples. Therefore, it is reasonable to adopt the AconvNet method as the backbone network for feature extraction in the SAR image domain in this paper.
To further illustrate the superiority of the proposed method, some representative fusion methods are selected for performance comparison in this paper, including FEC, SDF-Net, and ASC-MACN. These methods are implemented to model a priori scattering points by utilizing vectorized modeling, point cloud modeling, and deep modeling, respectively. Under the condition of limited training samples, the performance difference among them is small, around 2%. The proposed method constructs the scattering structure of the target based on the visual geometric properties of the target, while mining and fusing the multi-level scattering information of the target to achieve sufficient and complete target representation. On this basis, the separability among various categories is enhanced by a cosine space classifier. Thus, the proposed method ensures optimal recognition under any training sample setting conditions, which proves the robustness and advancement of the proposed method.
4.3.2. Comparative Experimental Analysis Under EOC
Based on the experimental results in Table 7, several methods with better performance are selected in this section for experiments under EOC, including VGGNet, Lm-BN-CNN, AconvNet, FEC, SDF-Net, and ASC-MACN.
Under EOC1, the depression angle difference between the training and test sets increases further, i.e., the similarity between the training and test samples decreases. The recognition performance of the fusion methods is slightly better when all training data are used for training under EOC1. However, as the training samples are reduced, the methods that use only SAR image data for training achieve better recognition performance. When the training samples are limited, the recognition performance of AconvNet is optimal among the compared methods. The proposed method ensures the best recognition performance under all conditions, which proves the validity and reasonableness of the proposed scattering structure representation and achieves full utilization of the a priori knowledge of the target.
Under EOC2, the target categories are more fine-grained; thus, more detailed information about the target needs to be obtained to achieve precise classification. From Table 8, it can be found that the fusion methods obtain higher recognition performance in the case of limited training samples. The extra target scattering information enhances the network's ability to perceive localized information about the target, making it more discriminative in distinguishing fine-grained target classes. This result is also highly consistent with the general laws of human perception. The proposed method not only fully mines and fuses the scattering information of the target to achieve a complete target representation, but also enhances the separability of the features through the cosine space classifier, thus realizing more superior and robust recognition performance.
4.4. Comparison Experiments on FAST-Vehicle Dataset
To verify the generality of the proposed method, experimental validations are performed on the latest FAST-Vehicle dataset.
4.4.1. Comparative Experimental Analysis Under Scenario 1
Table 9, Table 10, and Table 11 demonstrate the performance of all algorithms under Scenario 1. Due to the poorer image quality of the FAST-Vehicle dataset, the overall performance of all algorithms on this dataset is lower. When all the data in the training set are used for training, the AconvNet method achieves recognition rates of 59.19%, 66.14%, and 53.37% under the EXP1, EXP2, and EXP3 conditions, respectively. Under the EXP1 condition, the fusion of a priori scattering information leads to a performance gain; thus, the fusion methods are superior to AconvNet. When only 50 samples per class are used for training, the complex manner of scattering information mining imposes a burden on the network, making the ASC-MACN method the worst among the fusion methods. In this case, the FEC method achieves the best recognition performance among the compared methods. As the number of training samples increases, the difference in recognition performance between the FEC and ASC-MACN methods becomes small, and the performance of the SDF-Net method gradually improves, becoming optimal among the compared methods when the training samples are sufficient. The proposed method achieves the best recognition performance of 68.55% under the EXP1 condition. It adopts a better-adapted modeling approach and accounts for redundant information in the features that is irrelevant to the classification task, compressing this redundancy through mutual information to further enhance the discriminative information. The following factors work together to achieve the optimal performance of the proposed method: (i) relying on the visual structural attributes to construct the scattering structure association, and mining the multi-level scattering information of the target based on it to enrich the target representation; (ii) quantifying the discriminative information in the features by information entropy, and filtering the redundant information to improve the discrimination of the fused features; and (iii) enhancing the separability of features by projecting them into the spherical manifold space.
Under the EXP2 condition, the recognition performance of the FEC method is the best among all the compared methods at 67.97%, making it the only compared method that performs better than AconvNet. Although the performance of the SDF-Net method is degraded, it still outperforms the ASC-MACN method, with the two reaching 64.58% and 64.15%, respectively. This performance pattern is maintained under the EXP3 experimental condition, in which the three fusion methods achieve 53.72%, 53.26%, and 50.27%, respectively. The proposed method maintains the optimal recognition performance under the Scenario 1 conditions.
In the case of limited samples, the discriminative information provided by the training data is limited, and adding extra target information is a powerful means of enriching the target discriminative information. Under the EXP1 conditions, the fusion methods show significant performance improvements over AconvNet, regardless of whether 150, 100, or 50 training samples per class are available. As the depression angle changes, the performances of the SDF-Net and ASC-MACN algorithms degrade to different degrees, and are even slightly worse than the recognition results of AconvNet. The proposed method relies on a reasonable a priori scattering model knowledge representation strategy, which enhances its ability to utilize the knowledge of the a priori scattering model and achieves robust recognition in the case of limited training samples.
4.4.2. Comparative Experimental Analysis Under Scenario 2
Under the Scenario 2 conditions of FAST-Vehicle, the training and test sets are samples at the same depression angle with different acquisition times, corresponding to the EXP4 and EXP5 experimental conditions. SAR images acquired at different times involve more uncontrollable factors and are more complex; e.g., the surrounding environment changes significantly across seasons, and meteorological conditions may also produce uncontrollable variability in the imaging results. The experimental results under the Scenario 2 conditions are shown in Figure 8 and Figure 9.
Among the compared methods, the SDF-Net method shows superior recognition performance under the limited-training-sample conditions. However, even the FEC and SDF-Net methods show various degrees of performance degradation compared to the AconvNet method. Moreover, as the number of training samples increases, the advantage of the fused-feature methods over AconvNet gradually disappears, and all compared methods show performance degradation. This phenomenon is inseparable from the inconsistent imaging quality of the FAST-Vehicle dataset. The proposed method improves all aspects, from the modeling and mining of scattering information to feature fusion and the classifier: (i) a more complete representation of the target is achieved; (ii) the target discriminative information is further enhanced by the feature fusion strategy; and (iii) the features are projected into the manifold space, which enables better feature distinguishability. Thus, the proposed method maintains the optimal recognition performance even when the data situation is more complex.