1. Introduction
Synthetic aperture radar (SAR) is an active signal-emitting sensor capable of obtaining high-resolution images by utilizing pulse compression technology and the synthetic aperture principle [1,2]. Compared to other sensors, such as optical and infrared sensors, SAR can acquire the scattering information of the target area under all-weather, day-and-night conditions, delivering rich and timely information about the Earth's surface. This makes SAR widely used in a broad range of applications, including land-cover mapping, disaster monitoring, military reconnaissance, and emergency alerting [3,4,5].
The interpretation of SAR images is crucial for their practical applications, and substantial research has been conducted across many domains, such as SAR image despeckling [6], super-resolution [7], target detection and recognition, and multi-sensor image fusion [8,9]. The investigation of SAR characteristics relies on a sufficient number of high-quality SAR samples, especially those focusing on SAR targets. For instance, in the domains of SAR multi-sensor image fusion [10,11,12] and SAR-optical image fusion [13], high-quality SAR samples are indispensable for the effective integration of scene information and the interpretation of SAR scene imagery [14]. One research field with an urgent need for high-quality SAR samples is the automatic target recognition (ATR) of SAR images [15,16,17,18,19,20]. A substantial quantity of SAR target images is essential to extract target features, thereby enhancing recognition rates and facilitating the practical implementation of SAR ATR [21].
However, in practical situations, numerous studies on SAR image interpretation encounter the issue of insufficient datasets. SAR image data acquisition is challenging, time-consuming, and resource-intensive. Consequently, the availability of effective samples is limited, particularly concerning non-cooperative targets. Moreover, the unique imaging mechanism of SAR makes it highly sensitive to target angle offsets, making it hard to collect datasets with full angle information for specific targets. Although the literature has accumulated a large amount of data for SAR target detection (such as MSTAR, SSDD, HRSID, and LS-SSDD [22,23,24,25,26]), the acquisition of such target samples with sufficient angle information remains constrained in practical scenarios, primarily due to counter-reconnaissance technologies. With the continuous advancements in SAR target recognition, many studies have found that multi-azimuth observation can provide richer target features, thereby improving the accuracy and reliability of SAR target recognition [27]. Therefore, SAR image generation with a controllable azimuth can provide detailed features of the target, thereby enhancing the recognizability of SAR images. Nevertheless, there remains a substantial gap between the scarce available SAR samples and the practical requirements.
In response to the challenge of insufficient sample availability, extensive research has been undertaken in recent years [28,29,30,31,32,33,34]. The four primary strategies for acquiring SAR target samples are the following: (i) data augmentation operations, such as rotation, translation, mirroring, and affine transformation; (ii) SAR image synthesis based on 3D modeling; (iii) data expansion utilizing deep generative models; and (iv) measured data collection. Among these strategies, data augmentation operations alter only the geometric shape of SAR images without considering the SAR imaging mechanism; thus, they are incapable of achieving angle-specific data expansion. SAR image synthesis based on 3D modeling requires precise parameters of the targets and involves complex electromagnetic calculations. For typical non-cooperative targets, obtaining precise structural information is unfeasible, and the intricate and laborious nature of electromagnetic calculations significantly increases resource costs. Measured data collection can obtain SAR target images under different actual scenarios with different platforms, and the acquired data are the most authentic and effective. However, such acquisition consumes massive human, material, and time resources, and the number of SAR target images acquired in each experiment is often limited; consequently, it is not a cost-effective way to obtain sufficient SAR data. Additionally, the synthesized images frequently suffer from low resolution and insufficient diversity [35]. Moreover, sample enhancement in SAR ATR typically focuses on increasing the diversity of SAR samples and avoiding the overfitting of classification algorithms [36]. These operations primarily augment images from an image-processing viewpoint; thus, the enhanced images fail to comply with radar imaging mechanisms and do not include novel content. Therefore, these methods cannot substantially augment the informational content of the target.
By contrast, generative approaches employing deep learning technologies have incomparable advantages in terms of sample expansion. Deep generative models simulate data that match the desired sample distribution. They bypass explicit feature extraction, learning effective hidden representations to enhance model performance while avoiding complex electromagnetic calculations. Generative Adversarial Networks (GANs) demonstrate superior performance in image generation, owing to their strong data-fitting capability and versatility [37]. Although extensive studies on SAR image generation using GANs exist, obtaining the required number of SAR images is still challenging, especially in practical applications with very scarce samples of the specific targets. Therefore, performing angle extrapolation for specific targets with limited samples remains a challenging task.
This research presents a sample-efficient and azimuth angle-controllable GAN-based approach, enabling the generation of SAR images within a given angle range. The proposed GAN architecture primarily comprises a generator, an angle synthesis algorithm, and a discriminator. First, the generator uses a local fusion module to merge features from images at different angles, and SAR images within a specified angle range are created using a sparse representation-based algorithm. Next, a similarity discriminator and an azimuth angle predictor based on scattering point topology are introduced: the similarity discriminator distinguishes real from generated images, and the angle predictor checks whether the angle of the output image lies within the given range. In summary, this research presents the following contributions:
A GAN architecture is proposed for angle controllable image generation with limited sample availability. It includes a generator, an angle synthesis algorithm module, and a discriminator, utilizing a local fusion module and a sparse representation-based synthesis algorithm to generate SAR images within a specified angle range.
Two discriminators are introduced, i.e., the similarity discriminator and the azimuth discriminator, to regulate image quality and azimuth angles. The similarity discriminator uses adversarial and local reconstruction losses to enhance network training and guide generation, while the azimuth angle predictor determines image angles via a scattering point topology diagram and learns angle encoding relationships for generating images at specified angles.
The recognition of SAR targets is improved through generative augmentation of the samples. A linear combination of SAR images with missing azimuth angles is achieved utilizing a sparse model. Therefore, the proposed approach not only generates near-authentic high-quality samples, but also enhances the exploitation of SAR-specific target characteristics.
3. Proposed Method
This section introduces the proposed framework of azimuth-controllable SAR target image generation. Initially, we provide a detailed illustration of the proposed azimuth-controllable SAR target image generation framework. Subsequently, we elaborate on the details of various modules introduced within the proposed framework.
3.1. Proposed SAR Image Generation Framework
According to manifold learning theory, data situated within a high-dimensional space generally originate from a low-dimensional manifold [63], and targets in SAR images are no exception. Targets in SAR imagery with continuous azimuth angles can be represented within a low-dimensional manifold [64], indicating that the distribution of targets in SAR images is stable and learnable. Consequently, we introduce a SAR target image generation network that is azimuth-controllable and sample-efficient in training.
The structure of the proposed framework is depicted in Figure 1. The objective is to synthesize SAR images at missing azimuth angles within a given range, utilizing SAR images with known azimuth angles. The network is designed to generate SAR target images at estimated angles through feature fusion and sparse representation techniques. Furthermore, the distribution and precision of these angles are validated using the similarity and angle discriminators.
Initially, the dataset is partitioned into a training set Da and a generation set Db, ensuring that the two sets are disjoint. During the training phase, the model extracts a certain number of images from the training set Da for training, being encouraged to learn transferable generative capabilities and to continuously optimize the angle discriminator. Concurrently, the aim of the angle discriminator is to correlate the target angles with distinct angle labels during this training phase. It should be noted that the angle synthesis algorithm we designed is ‘frozen’ during the training phase; the particular reasons are explained in Section 3.3. During the testing phase, the network generates novel images of a particular category, thus augmenting the target samples.
As depicted in Figure 1, the generator is a conditional generator that comprises an encoder, a decoder, and a local fusion module (LFM). The input images are first fed into the encoder to extract deep features. The LFM then takes these features and a random coefficient vector as inputs, producing semantically aligned fused features. Subsequently, the decoder reconstructs the fused features into images, producing the generation results of the network. Furthermore, the authentic image and the generated image are further processed through the angle synthesis algorithm utilizing sparse representation, obtaining a synthesized image of a given angle range. In parallel with this image generation process, the image discriminators are also trained. The real image and the synthesized image are sent into the similarity discriminator to assess the quality of generation. Subsequently, the angle discriminator verifies that the angle of the synthesized target sample lies within the specified range.
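To make this data flow concrete, the following is a minimal PyTorch-style sketch of the generator pipeline described above; the module interfaces and tensor shapes are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

# Structural sketch of the conditional generator described above. The encoder,
# LFM, and decoder are supplied as generic modules; their internal designs are
# assumptions for illustration only.
class ConditionalGenerator(nn.Module):
    def __init__(self, encoder: nn.Module, lfm: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder   # extracts deep features from the input images
        self.lfm = lfm           # local fusion module (Section 3.2)
        self.decoder = decoder   # reconstructs fused features into an image

    def forward(self, images: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        # images: (k, C, H, W) SAR images with known azimuth angles
        # alpha:  random coefficient vector controlling the feature blending
        feats = self.encoder(images)       # deep features, e.g. (k, c, h, w)
        fused = self.lfm(feats, alpha)     # semantically aligned fused features
        return self.decoder(fused)         # generated SAR target image
```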
3.2. Local Fusion Module (LFM)
A straightforward fusion approach performs a weighted average of the feature maps without semantic alignment, potentially resulting in numerous artifacts. In contrast, we propose a local fusion module, where one feature map is designated as the base and the remaining ones are utilized as references. By performing feature selection and semantic matching, the base and reference features are aligned, reducing artifacts and preventing implausible fusion results. Furthermore, the feature replacement operation contributes to increasing the diversity of the generated images.
Figure 2 illustrates the detailed setup of the proposed LFM module, depicted for a three-image generation setting. Given a set of encoded feature maps, each tensor can be regarded as a set of local representations, one per spatial position. The idea is to randomly select one of the feature maps as the base feature, while taking the remaining feature maps as the reference features. The LFM takes the selected feature map as the base and the rest as a local feature bank to generate fused features. The entire fusion process can be divided into three stages: local selection, local matching, and local replacement.
3.2.1. Local Selection
Given a selected base feature, the first step is to decide which of its local representations should be replaced. At this stage, we randomly select local representations from the spatial positions of the base feature. Random feature selection is employed to increase the diversity of the generation process, which is beneficial for improving the diversity of SAR target sample generation. In particular, n local representations are selected, where n = η × w × h and η ∈ (0, 1] represents the proportion of local representations to be integrated. It is noteworthy that when the chosen η approaches 1, an excessive number of features will be replaced, potentially leading to distortion in the generated target image. Conversely, when η approaches 0, very few features are replaced, resulting in a loss of diversity. Therefore, the determination of an optimal value for η is worth empirical investigation. In this paper, an intermediate value of η is used to preserve diversity while maintaining structural integrity. After feature selection, we obtain a set of n local representations selected from the base feature.
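As an illustration of this step, a minimal sketch of random local selection is given below; the channel-first tensor layout and the function name are assumptions.

```python
import torch

# Sketch of the local selection step under assumed tensor shapes.
# base_feat: (c, h, w) base feature map; eta: proportion of local
# representations to replace. Returns flat spatial indices of the selected
# positions.
def local_selection(base_feat: torch.Tensor, eta: float) -> torch.Tensor:
    c, h, w = base_feat.shape
    n = max(1, int(eta * h * w))      # n = eta * h * w selected positions
    perm = torch.randperm(h * w)      # random spatial positions
    return perm[:n]                   # indices of representations to replace
```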
3.2.2. Local Matching
After completing feature selection on the base feature, we further search the reference features for content that can semantically match the selected base representations. The similarity between each selected base representation and every local representation in the reference features is calculated, constructing a similarity map; the similarity measurement function is implemented with matrix multiplication. The similarity map allows us to find and replace the original local representations in the base feature with the most similar ones from the reference features in the next step. Replacing semantically aligned features can mitigate artifacts arising from the mixture of non-aligned features, which is crucial in the generation of SAR images. The top matching local representations in the reference features form the candidate set for replacement. Note that we also record the position information of each matched local representation, which is further utilized to calculate the local reconstruction loss (see details in Section 3.5.1).
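The matching step can be sketched as follows; the tensor shapes, the use of an inner product as the similarity measure, and the top-1 match per reference feature are assumptions consistent with the description above.

```python
import torch

# Sketch of local matching: for each selected base representation, find the
# most similar local representation in each reference feature map via matrix
# multiplication.
def local_matching(base_feat: torch.Tensor, ref_feats: torch.Tensor,
                   sel_idx: torch.Tensor):
    # base_feat: (c, h, w); ref_feats: (k-1, c, h, w); sel_idx: (n,) flat indices
    c = base_feat.shape[0]
    base_vecs = base_feat.reshape(c, -1)[:, sel_idx]          # (c, n)
    ref_vecs = ref_feats.reshape(ref_feats.shape[0], c, -1)   # (k-1, c, h*w)
    # similarity map via matrix multiplication: (k-1, n, h*w)
    sim = torch.einsum("cn,rcm->rnm", base_vecs, ref_vecs)
    match_idx = sim.argmax(dim=-1)    # best-matching position per reference, (k-1, n)
    # gather the matched reference representations: (k-1, c, n)
    matched = torch.gather(ref_vecs, 2,
                           match_idx.unsqueeze(1).expand(-1, c, -1))
    return matched, match_idx         # positions are kept for the local reconstruction loss
```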
3.2.3. Local Feature Blending
After local selection and local matching, each selected base local representation has a set of replaceable local representations, namely the most similar local representation from each reference feature (as illustrated in Figure 2). All the selected local representations are then replaced with their matching representations from the reference features. Specifically, we blend the matching feature representations with a random coefficient vector at each selected spatial location, so that the initial local representation is retained with a certain proportion while the remaining weight is distributed among the matched reference representations. The selection of random coefficient values increases the diversity of image generation and introduces stochasticity during feature fusion: each spatial location blends base and reference features with unique weights. This controlled randomization diversifies local texture combinations while maintaining global structural integrity, enabling the generation of distinct yet physically plausible SAR targets from identical inputs. Finally, we place all the fused local representations back into the corresponding positions of the base feature; the resulting fused feature is the output of the LFM module.
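A possible realization of the blending step is sketched below; the convex-combination convention for the random coefficients is an assumption.

```python
import torch

# Sketch of local feature blending: the selected base representations and
# their matched reference representations are mixed with random coefficients
# and written back into the base feature map. Assumed convention: the
# coefficient vector sums to 1 at each location, with the first entry kept
# for the base representation.
def local_blending(base_feat: torch.Tensor, matched: torch.Tensor,
                   sel_idx: torch.Tensor) -> torch.Tensor:
    # base_feat: (c, h, w); matched: (k-1, c, n); sel_idx: (n,)
    c, h, w = base_feat.shape
    k_minus_1, _, n = matched.shape
    flat = base_feat.reshape(c, -1).clone()
    # random convex coefficients per selected location: (k, n), columns sum to 1
    coeff = torch.rand(k_minus_1 + 1, n)
    coeff = coeff / coeff.sum(dim=0, keepdim=True)
    fused = coeff[0] * flat[:, sel_idx] + (coeff[1:, None, :] * matched).sum(dim=0)
    flat[:, sel_idx] = fused              # write fused representations back
    return flat.reshape(c, h, w)
```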
3.3. Angle Synthesis Algorithm
We further introduce an image generation algorithm based on sparse coding to synthesize controllable target samples with a specified azimuth angle. Sparse representation leverages the inherent sparsity of SAR scattering structures (e.g., isolated dominant scattering points) to achieve sample-efficient angle synthesis. Unlike optical images, the deterministic scattering patterns in SAR images ensure that sparse combinations yield physically meaningful results without causing texture ambiguity. We represent the expected target image through sparse coefficients over the other atoms in a dictionary, following a simplified synthesis algorithm [64]: the target-angle SAR image is approximated by the product of a comprehensive dictionary, consisting of real images, generated images, and rotated real images (each column vector of the dictionary is an atom), and a sparse representation coefficient vector whose number of non-zero entries is bounded by a sparsity constant.
The process of synthesizing target images can be viewed as a calculation based on the synthesized sparse coefficients and the basic images in the dictionary. Therefore, the primary challenge lies in deriving sparse representation coefficients from known images while incorporating angular information. In the designed computational model, the expected SAR target image with a specified angle is represented as a sparse combination of the dictionary atoms. In the following, we provide a detailed explanation of how the coefficients are derived.
With two known boundary angles, an angular distance measurement is defined between the target angle and the angles of the dictionary images. This measurement is employed to inversely determine the coefficients of the sparse representation: based on the reciprocals of the angular distances between images, the synthesis coefficients can be calculated. We empirically set the number of atoms used for synthesis to 5, comprising two boundary-angle images and three images generated within the specified angle range. By employing two authentic images to delineate the synthesis angle interval and three images to incorporate the necessary internal features, the target-angle image is synthesized utilizing these five images.
The required sparse representation coefficients are then computed from these reciprocal angular distances. Utilizing the introduced sparse model, we can obtain images from different perspectives. It is worth noting that the completeness of the dictionary in the sparse representation is crucial for the final generation quality. Therefore, we enhance the completeness of the dictionary by generating high-quality pseudo-images using the generator. During training, the angle synthesis algorithm is frozen to prevent interference with adversarial optimization, and only the parameters of the generator and discriminators are optimized. In the testing phase, the algorithm is activated to construct a hybrid dictionary from both real images and generator-produced samples, allowing controllable angle extrapolation through sparse representation.
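The following sketch illustrates how a target-angle image could be synthesized from reciprocal-angular-distance coefficients as described; the coefficient normalization and the function signature are assumptions for illustration.

```python
import numpy as np

# Sketch of the angle synthesis step: the target-angle image is formed as a
# sparse linear combination of a small number of dictionary atoms, with
# coefficients proportional to the reciprocal of the angular distance to the
# target angle.
def synthesize_angle(atoms: np.ndarray, atom_angles: np.ndarray,
                     target_angle: float, eps: float = 1e-6) -> np.ndarray:
    # atoms: (K, H, W) dictionary images (real, generated, rotated real)
    # atom_angles: (K,) azimuth angle of each atom in degrees
    diff = np.abs(atom_angles - target_angle)
    diff = np.minimum(diff, 360.0 - diff)        # periodic angular distance
    coeff = 1.0 / (diff + eps)                   # reciprocal-distance weights
    coeff = coeff / coeff.sum()                  # assumed normalization
    return np.tensordot(coeff, atoms, axes=1)    # weighted combination of atoms
```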
3.4. Angle Discriminator
An angle discriminator is further introduced to regularize the synthesis of angle-controllable samples. As illustrated in Figure 3, it consists of two branches: the upper branch predicts the target angle, whereas the lower branch provides the actual angle value. The angle prediction branch is constructed utilizing convolutional layers and fully connected layers, whereas the true-angle branch obtains the actual angle of the target from the corresponding image labels. Finally, we employ a comprehensive loss function combining cosine similarity, periodic angle, and MSE losses (see details in Section 3.5) to optimize the angle prediction.
In the training phase, the angle prediction module exploits features in real images to estimate the angles and continuously refines the generator to reduce discrepancies between the predicted and the actual angles. Simultaneously, the angle predictor learns to encode and map the target azimuth angle. In contrast, during the testing phase, the angle predictor guides the angle synthesis algorithm to generate SAR target images constrained within a specified angle range.
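A minimal sketch of the angle prediction branch (convolutional layers followed by fully connected layers) is given below; the layer sizes and the unit (cos, sin) output representation are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

# Sketch of the angle prediction branch described for the upper path in
# Figure 3. The output is a 2D unit vector (cos, sin) so that azimuth
# periodicity can be handled by the angle losses.
class AnglePredictor(nn.Module):
    def __init__(self, in_ch: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(128, 64),
                                nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.fc(self.features(x))
        return nn.functional.normalize(v, dim=1)   # unit (cos, sin) representation
```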
3.5. Loss Functions
3.5.1. Local Reconstruction Loss
In typical fusion-based approaches for few-shot generation, a weighted reconstruction loss at the image level is employed to impose constraints on the generated images. For example, given a set of input images with the same azimuth angle and a random coefficient vector, the generated image can be learned by minimizing the distance between the generated image and the weighted sum of the authentic images. Within this objective, as the weight assigned to an authentic image increases, the generated image approximates that image more closely and their features exhibit increased similarity. However, features that have not been semantically aligned during the generation phase may result in aliasing artifacts, which are particularly notable in SAR images. An in-depth analysis of this issue, with experimental evidence, is presented in Section 4.
To address this problem, we introduce a local reconstruction loss function. We first record the matching relationship between each selected base local representation and its reference representations, then map the selected features back to the same positions in the original image to obtain a coarsely fused image. Subsequently, we regularize the generated image with a local constraint against this coarsely fused image. The advantage of this approach is that it captures the spatial information in both the reference and base features through weighted fusion, facilitating the generation of images with rich semantics and a reasonable spatial layout. This local fusion strategy, along with the local reconstruction loss, can significantly reduce artifacts and greatly improve the quality of generated SAR images.
Furthermore, to precisely capture the structural characteristics of the images, a multi-scale structural similarity (MSSIM) loss is introduced together with an L1 loss. Combining these objectives with the local constraint yields the comprehensive reconstruction loss.
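A compact sketch of this combined reconstruction objective is shown below, assuming the third-party pytorch_msssim package for the multi-scale SSIM term; the weighting factors are placeholders, not the paper's values.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim   # assumed third-party package for MS-SSIM

# Sketch of the comprehensive reconstruction objective: an L1 constraint
# between the generated image and the coarsely fused image produced by local
# replacement, combined with a multi-scale structural similarity term.
def reconstruction_loss(generated: torch.Tensor, coarse_fused: torch.Tensor,
                        w_local: float = 1.0, w_ssim: float = 0.5) -> torch.Tensor:
    local_l1 = F.l1_loss(generated, coarse_fused)                  # local reconstruction
    ssim_term = 1.0 - ms_ssim(generated, coarse_fused, data_range=1.0)
    return w_local * local_l1 + w_ssim * ssim_term
```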
3.5.2. Adversarial Losses
In the proposed framework, multiple adversarial losses are adopted to enhance the realism and semantic structure of the generated images. The primary adversarial training objective adopts the hinge GAN loss [18], which constrains the generator to produce realistic images that are indistinguishable by the discriminator.
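For reference, the standard hinge GAN losses take the following form (this is the common formulation, not a reproduction of the paper's exact notation).

```python
import torch
import torch.nn.functional as F

# Standard hinge GAN losses for the adversarial objective named above.
def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Discriminator: push real outputs above +1 and fake outputs below -1.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Generator: maximize the discriminator score of generated images.
    return -d_fake.mean()
```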
3.5.3. Label and Angle Loss Functions
To increase the discriminability of generated images, we further introduce a classification loss derived from ACGAN [18], where an auxiliary classifier is employed to categorize the input images into their respective classes. Specifically, the discriminator is responsible for accurately categorizing both authentic and synthetic images, whereas the generator is required to produce images that are classified into the same class as the authentic images.
We further impose an angular constraint to regularize the angle characteristics. This angle loss consists of two components: the cosine similarity loss and the periodic angle loss. The cosine similarity loss ensures directional alignment between predicted and true angles, while the periodic angle loss accounts for the circular nature of azimuth measurements.
First, the cosine similarity loss aims to maximize the cosine similarity between the predicted angle and the true angle. Cosine similarity measures the alignment between two angle vectors, where a value closer to 1 indicates that the predicted and reference angles are more similar. This cosine similarity loss facilitates the convergence of the predicted angle towards the true angle, thereby enhancing the prediction accuracy.
Meanwhile, the periodic angle loss minimizes the angular error while accounting for the periodic nature of angle values. The difference between the predicted angle and the true angle is wrapped by a remainder operation so that it never exceeds half of the angular period, which avoids computational errors caused by angle wrapping near the period boundary.
This dual formulation avoids large errors near the 0°/360° boundary and improves angular fidelity. The two terms are combined to form the total loss function for the angle discriminator, enabling the learning of both global orientation coherence and local cyclic constraints, which is critical for generating physically plausible SAR targets. Integrating all of the above training losses with their respective weights, the entire GAN framework undergoes end-to-end optimization.
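The two angle terms can be sketched as follows, with angles in degrees; pairing a cosine term on the angular difference with a wrapped absolute difference is one possible realization of the description above.

```python
import torch

# Sketch of the cosine similarity and periodic angle terms (angles in degrees).
def cosine_angle_loss(pred_deg: torch.Tensor, true_deg: torch.Tensor) -> torch.Tensor:
    pred = torch.deg2rad(pred_deg)
    true = torch.deg2rad(true_deg)
    cos_sim = torch.cos(pred - true)          # 1 when directions coincide
    return (1.0 - cos_sim).mean()

def periodic_angle_loss(pred_deg: torch.Tensor, true_deg: torch.Tensor) -> torch.Tensor:
    diff = torch.remainder(pred_deg - true_deg, 360.0)   # wrap into [0, 360)
    diff = torch.minimum(diff, 360.0 - diff)             # at most half a period (180 deg)
    return diff.mean()
```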
3.6. Implementation Details
The proposed algorithm is implemented using PyTorch 2.0 and trained on a computational server equipped with an NVIDIA A4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA). We conducted experiments using two types of ground vehicle datasets (see details in Section 4). The images are classified into k categories based on vehicle types, while their labels consist of the azimuth angles and the categories. The model employs explicit category control through label conditioning. During training, we use the first k − 1 categories with their class labels to optimize the class-specific feature learning. In the testing phase, the generator synthesizes images for the held-out k-th category by directly specifying its label. This learning mechanism ensures controllable generation while verifying label consistency. In both datasets, 72 image samples are utilized as the foundation for image generation (k = 72). Notably, the model requires about 12,700 MB of GPU memory for training and 17,400 MB for testing.
To balance training time and performance, the weighting parameters of the loss functions are set to 0.1, 0.5, 0.5, and 0.1, respectively, following common practice in the literature [65]. We empirically set K = 4 in Equation (4) to ensure deterministic angle interpolation (this choice is validated in Section 4.3). Training runs were conducted for 10,000 iterations with a learning rate of 0.0001 for the generator, the similarity discriminator, and the angle discriminator.
4. Experimental Analysis and Discussions
In this section, we conduct a comprehensive experimental evaluation of the proposed algorithm, providing both quantitative and qualitative analyses of its generation performance and discussing its advantages and limitations. Through ablation over multiple parameters and validation of the improvements brought to recognition algorithms, we comprehensively examine the generation capability and its generalization across different datasets.
4.1. Dataset Introduction
In the experimental section, two datasets are employed: the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset and a self-constructed SAR target image dataset. The MSTAR dataset is one of the most widely used benchmarks in the field of SAR target recognition.
MSTAR: The MSTAR dataset was collected and released by Sandia National Laboratories in the United States in the 1990s. This dataset is primarily used for research on the recognition and classification of SAR targets. It includes images of ten types of vehicles: 2S1, BRDM2, BTR60, BTR70, BMP2, D7, T62, T72, ZIL131, and ZSU234. The MSTAR dataset comprises 21,600 images, each with a resolution of 128 × 128 pixels. It contains targets captured at angular intervals ranging from 1° to 2°. We employed its subsets with 1° azimuth sampling to benchmark angular resolution limits. This selection intentionally tests fine-grained angle synthesis under maximal angular density disparity. Some examples of the targets are shown in
Figure 4.
CV-SARID: We have constructed a dataset containing 10 types of SAR vehicle targets, named the Civilian Vehicle SAR Image Dataset (CV-SARID). As shown in
Figure 5, targets in the CV-SARID include Mercedes-Benz sedans (B200), Toyota off-road vehicles (GXR1), trucks (H500), pickup trucks (JX4D), transit vans (JX493), Land Cruiser off-road vehicles (PRA1), Volvo sedans (S90), fire trucks (T5G340), bulky trucks (V5), and vans (W306), differentiated by size and scattering characteristics. The multi-aspect data are collected with pitch angles of 25°, 30°, and 45°, and azimuth angles from 0° to 360° at 5° intervals. Each category comprises 72 images captured at pitch angles of 25° and 30°, while 70 images are captured at a pitch angle of 45°, amounting to a total of 2120 images. The dataset will be made public later.
4.2. Evaluation Metrics
In this section, we briefly introduce the evaluation metrics adopted to assess the generative quality. We use Fréchet Inception Distance (FID), Structural Similarity Index Measure (SSIM), and signal-to-noise ratio (SNR) to quantitatively evaluate the quality of images with angular shifts.
4.2.1. Fréchet Inception Distance (FID)
FID is a commonly used metric to evaluate the difference between the distributions of generated data and authentic data. FID measures the quality of generated images by calculating the distribution difference in the feature space (projected using a pre-trained Inception V3 model), comparing the mean vectors and covariance matrices to compute the Fréchet distance. A lower FID value indicates that the generated images are more similar to the real data.
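For reference, a minimal sketch of the FID computation from pre-extracted Inception V3 features is shown below (the feature extraction itself is assumed to have been done beforehand).

```python
import numpy as np
from scipy import linalg

# Sketch of the FID computation from Inception features of real and generated
# images.
def frechet_inception_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)   # matrix square root
    covmean = covmean.real                                  # drop numerical imaginary part
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```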
4.2.2. Structural Similarity Index Measure (SSIM)
The SSIM evaluates image quality by comparing luminance, contrast, and structure to determine visual similarity, reflecting human visual perception. Thus, the SSIM is commonly applied in image processing and quality assessment for tasks like image denoising, super-resolution, and compression to assess reconstructed and original image similarity. The SSIM effectively evaluates differences between generated and real SAR target images, making it essential for assessing structural integrity in the SAR image generation task.
4.2.3. Signal-to-Noise Ratio (SNR)
In SAR images, the SNR is a crucial quality metric representing the ratio of the signal strength to noise level in the image. A high SNR indicates that the signal component is strong while the noise component is weak, resulting in clearer images with richer details. This is crucial for target detection and recognition in SAR images, as noise can blur target features and cause inaccurate analysis. Consequently, achieving an SNR for generated images that closely approximates that of real images serves as an important indicator.
4.3. Ablation Study
To evaluate the impact of each module in the proposed framework, ablation experiments are conducted to examine the LFM, AZS, and angle discriminator.
Table 1 presents the computational costs of generating a 128 × 128 resolution image, obtained by gradually adding the different components to the vanilla generator. The quantitative results demonstrate progressive increases in the parameter count, training time, and GPU memory consumption as additional components are incorporated into the baseline architecture. The LFM contributes the most significant computational overhead due to its feature alignment and semantic matching operations. The AZS module introduces moderate parameter growth while enabling angle-controllable synthesis through sparse representation, requiring no additional training time. The angle discriminator further adds 1.05 million parameters for angle verification. It is worth noting that inference requires only about 20 s per generated image, demonstrating operational feasibility despite the upfront training costs.
We further present the generative results before and after using the LFM in
Figure 6. The visual results demonstrate that incorporating the LFM significantly enhances the generative quality. The LFM, through its semantic alignment and feature blending, effectively reduces artifacts and improves structural integrity compared to conventional fusion methods. In summary, although employing the LFM increases the computational costs, it substantially improves generation quality by reducing artifacts.
Additionally, the sparsity parameter K in Equation (4) controls the number of dictionary atoms (real and synthetic images) used for angle synthesis. Figure 7 presents the generative results obtained using different K values. Empirical tests on both datasets confirm that K = 4 optimally balances artifact suppression and generation diversity. A lower value, K = 2, results in minimal variation compared to the original image, whereas increasing the parameter to K = 6 introduces significant noise and artifacts.
Figure 8 and Figure 9 further compare the training dynamics with and without the key components. The complete model (Figure 8) shows accelerated convergence and stable loss plateaus, confirming that (1) the local feature replacement (Section 3.2.3) in the LFM mitigates mode collapse through semantic alignment, and (2) the structural losses (MSSIM + L1) maintain target geometry integrity during adversarial optimization. In the loss curves, the stable co-convergence of the generator and discriminator losses with minimal oscillation can also be observed, which further demonstrates resistance to overfitting, a critical advantage in the few-shot training regime.
4.4. Qualitative Evaluation of the Generative Results
Figure 10 presents the generative results on the two datasets. To ensure the validity of the experiments, all model parameters were kept consistent throughout both the training and generation processes. The quantity of image data for both training and testing phases is consistent, with the primary difference lying in the interval of target angles across the two datasets.
Table 2 further evaluates the quantitative results of the generated images. One can observe that our method achieves high performance on both datasets, while the metrics obtained on the MSTAR dataset are overall higher than those obtained on CV-SARID. On the CV-SARID dataset, the proposed method effectively learns the complete scattering information while preserving the overall structural integrity of the vehicles, achieving a certain degree of angular shift. On MSTAR, the generative results exhibit fewer artifacts and clearer scattering points, as demonstrated by the lower FID and higher SNR, as well as higher structural fidelity (e.g., clearer vehicle contours), as demonstrated by the higher SSIM.
This phenomenon is attributed to the increased density of angular variations in MSTAR, which, given an equivalent volume of data, facilitates the learning of a more coherent relationship between angles and images. Consequently, a more accurate angle discriminator is trained.
Table 2 shows that our model performs well on both datasets, with the MSTAR results showing superiority in structural similarity and SNR. A detailed analysis of the angular shift is provided in Section 4.5.
4.5. Generation of Multi-Angle Target Images
Figure 11 presents several examples of the synthetic images with different angles. The angle synthesis algorithm effectively generates high-quality SAR target images within a specified angle range. In
Figure 11a, the proposed approach maintains stable image quality and a good angle range even for vehicles with varied surface structures. However, there are also evident artifacts in the generated images. This phenomenon is attributable to the larger angular intervals in the CV-SARID dataset, which induce more substantial alterations in the structural features and scattering points of the targets as the angle varies. Consequently, more errors occur, complicating the capture of common features during angle synthesis.
In the results on MSTAR in
Figure 11b, the generative quality is more stable. The generated images exhibit not only enhanced structural completeness but also more precise azimuth angles. This can be attributed to the increased discrimination capability of the learned angle discriminator, as well as the optimized training of the generative method with more coherent angle information.
Moreover, as shown in
Figure 12, the MSDA module improves feature resolution in generated results. The first row in
Figure 12 exhibits that objects are more distributed and detailed (e.g., flight trajectories), achieving better consistency with authentic images. Additionally, the MSDA module partially mitigates the presence of artifacts. In the second row in
Figure 12, one can observe that the MSDA module facilitates the learning of more complete scattering features.
4.6. Comparative Experiments
We further compare the proposed method with several state-of-the-art (SOTA) methods, including FIGR, F2GAN, and MatchingGAN. The evaluation is conducted comprehensively along three dimensions: qualitative generation results, quantitative metrics, and visual evaluation. To test the generalization capability across different image scales, experiments are conducted on both datasets.
4.6.1. Qualitative Evaluation
Figure 12 and
Figure 13 present the target images generated by different methods on the two datasets.
Figure 12 presents the generated samples in the CV-SARID dataset. It is evident that the proposed method enhances the preservation of vehicle structural information and generates fewer artifacts in comparison to other approaches. Furthermore, it effectively captures the scattering characteristics of SAR targets.
Figure 13 presents samples generated in the MSTAR dataset. The proposed method also yields improved results by providing clearer distinctions between background and targets and learning more reasonable overall boundary features and appropriate shadow details.
Among the compared models, FIGR is an optimization-based few-shot generation method, which exhibits suboptimal performance across both datasets. This is due to the inherent limitations of optimization-based algorithms and the special characteristics of SAR images. F2GAN and MatchingGAN are both fusion-based few-shot generation models, but they generate many artifacts or fail to capture comprehensive target feature details. The semantic feature alignment in the proposed method substantially diminishes artifacts and improves the structural and background details, leading to significant advantages over the compared methods.
4.6.2. Quantitative Evaluation
Table 3 and
Table 4 present the quantitative comparison with the SOTA methods, where the proposed approach shows a clear advantage across the various metrics. Its performance on the FID metric demonstrates a closer match to the real data distribution, indicating that the difference between the synthetic data and the authentic data is marginal. It also obtains the highest SSIM, indicating that the produced SAR images possess more comprehensive target structures. Additionally, its highest SNR indicates that the generated images are closer to the real data. Based on these three metrics, it can be concluded that the proposed method exhibits a significant advantage in few-shot SAR image generation. The consistent performance gains on military (MSTAR) and civilian (CV-SARID) targets highlight its adaptability to diverse target types. By decoupling geometry-specific dependencies through local scattering fusion and sparse representation, the architecture inherently supports generalization across diverse SAR target categories.
4.7. Generative Data Augmentation for SAR Target Recognition
To evaluate the enhancement to ATR performance, a CNN-based Fea-DA recognition model is employed [65]. Since the recognition of targets in the MSTAR dataset has been extensively studied, the experiments are conducted on the CV-SARID dataset. It contains data at three pitch angles, namely 25 degrees, 30 degrees, and 45 degrees. To demonstrate the generalization capability of our model, we selected images with a 30-degree pitch angle for training and used the images at 25 degrees and 45 degrees for testing. This evaluation setting tests the generalization capability in practical applications. To ensure the validity and accuracy of the experiments, the number of training epochs and experimental trials is kept consistent in each experiment. The proposed method is utilized to generate data for each category at the 30-degree pitch angle, thereby supplementing each class with an additional 50 images (the original number is 72 images per class). Both experiments are repeated for five trials to reduce the impact of random factors.
The experimental results indicate that the data augmentation contributed to significant improvements in the target recognition rate. With the 25-degree pitch angle test data (see Table 5, Figure 14 and Figure 15), the baseline recognition accuracy improved from 88.06% to 94.22% after data augmentation, an increase of 6.16 percentage points. This shows that the generated data fit the real distribution and supplement the missing angle data, thus enhancing classification accuracy.
For the 45-degree pitch angle test data (see Table 6, Figure 16 and Figure 17), the baseline recognition accuracy was 71.45%, which improved to 74.74% with the added data, an increase of 3.29 percentage points. Thus, the generated data also enhance the recognition rate in this setting. We further analyze why the improvement at the 45-degree pitch angle is smaller than that at 25 degrees. The disparity between the 45-degree test samples and the 30-degree training samples is more pronounced than that of the 25-degree samples. With the same amount of data augmentation, a larger divergence between the source and target domains makes the recognition of targets more challenging. Consequently, the observed improvement in the recognition rate is not as substantial.
To conclude, these experiments demonstrate that the generated images not only exhibit high quality relative to the other literature methods, but also enhance the accuracy of SAR target recognition models.
5. Conclusions
In this study, we explored the angle-controllable generation of SAR image targets with few samples and proposed a GAN framework based on feature alignment and angle synthesis algorithms. We conducted semantic alignment to generate images with shared feature characteristics and employed sparse representation to create SAR images within expected angle intervals. Furthermore, in the generation phase, the optimization of the reconstruction loss function was undertaken, alongside the incorporation of structural constraints to improve the overall generation quality. Additionally, an angle discriminator was introduced to manipulate the azimuth angle of generated target samples.
The experimental results indicate that the proposed method is capable of producing SAR images with a coherent structure, reduced artifacts, substantial diversity, and controllable target angles. Through testing on two datasets, stable performance improvements were achieved, not only in terms of image quality but also in terms of the stability of the synthesized target angle. Furthermore, the proposed approach demonstrates its effectiveness as a data augmentation technique to improve the accuracy of SAR target recognition. In the target recognition experiments on the CV-SARID dataset, we conducted data augmentation on images with a 30-degree pitch angle and tested at 25 degrees and 45 degrees, achieving recognition rate improvements of approximately 6% and 3%, respectively.
Two remaining limitations of the proposed method are that it still produces certain artifacts in the generated images and may generate target samples that exceed the specified angle range. Additionally, broader generalization to non-vehicle SAR objects (e.g., ships, infrastructure) remains constrained by current dataset availability, a common limitation in SAR ATR research that warrants community efforts for data diversity. These issues require further research investigations in the future.