PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement

Huang, Jiale; Li, Xiaoyong; Liu, Lei; Shi, Xiaoran; Zhou, Feng

doi:10.3390/rs17173047

Open AccessArticle

PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement

by

Jiale Huang

¹

,

Xiaoyong Li

^2,*

,

Lei Liu

¹

,

Xiaoran Shi

¹

and

Feng Zhou

³

¹

Key Laboratory of Electronic Information Countermeasure and Simulation Technology of Ministry of Education, Xidian University, Xi’an 710071, China

²

Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China

³

School of Aerospace Science and Technology, Xidian University, Xi’an 710071, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(17), 3047; https://doi.org/10.3390/rs17173047

Submission received: 12 July 2025 / Revised: 18 August 2025 / Accepted: 29 August 2025 / Published: 2 September 2025

Download

Browse Figures

Versions Notes

Abstract

Inverse Synthetic Aperture Radar (ISAR) imaging plays a crucial role in reconnaissance and target monitoring. However, the presence of uncertain factors often leads to indistinct component visualization and significant noise contamination in imaging results, where weak scattering components are frequently submerged by noise. To address these challenges, this paper proposes a Phase-Aware Multi-Scale Transformer network (PA-MSFormer) that simultaneously enhances weak component regions and suppresses noise. Unlike existing methods that struggled with this fundamental trade-off, our approach achieved 70.93 dB PSNR on electromagnetic simulation data, surpassing the previous best method by 0.6 dB, while maintaining only 1.59 million parameters. Specifically, we introduce a phase-aware attention mechanism that separates noise from weak scattering features through complex-domain modulation, a dual-branch fusion network that establishes frequency-domain separability criteria, and a progressive gate fuser that achieves pixel-level alignment between high- and low-frequency features. Extensive experiments on electromagnetic simulation and real-measured datasets demonstrate that PA-MSFormer effectively suppresses noise while significantly enhancing target visualization, establishing a solid foundation for subsequent interpretation tasks.

Keywords:

image enhancement; ISAR; image denoising; transformer

1. Introduction

Inverse Synthetic Aperture Radar (ISAR) [1,2,3] imaging technology generates high-resolution two-dimensional images by resolving the relative motion between targets and radar systems, offering irreplaceable applications in critical domains such as military reconnaissance, aerospace target identification, and environmental monitoring. However, constrained by inherent limitations including system noise, atmospheric attenuation, and non-uniform target scattering characteristics, ISAR images exhibit two fundamental defects: First, the weak scattering components of spatial targets often suffer from insufficient visualization; second, interference from speckle noise and background clutter degrades image contrast and blurs structural details. These issues not only compromise visual interpretation but critically hinder the reliability of subsequent tasks such as target detection, classification, and feature extraction [4,5,6,7], necessitating the development of intelligent enhancement methods capable of simultaneously amplifying weak components and suppressing noise with precision.

To address these challenges, existing enhancement methods can be broadly categorized into three approaches:

(1) Traditional Filter-Based Methods: Non-local means filtering [8] and wavelet denoising [9,10] effectively suppress noise but inevitably sacrifice weak scattering components, leading to structural distortion. Histogram equalization [11,12] and gamma correction [13] improve dynamic range but generate over-enhancement artifacts due to poor noise-background separation. These methods lack the ability to selectively enhance weak components while preserving structural integrity.

(2) Statistical Model-Based Methods: Techniques like BayesShrink [14] and Wiener filtering [15,16] rely on fixed thresholds or prior knowledge, exhibiting limited adaptability to complex noise environments. While preserving key information during noise reduction, they fail to enhance low-scattering regions, resulting in critical detail loss [17]. These approaches struggle with the fundamental trade-off between noise suppression and weak component preservation.

(3) Deep Learning-Based Methods: Recent advances have shown promise in ISAR image enhancement [18,19]. Wang et al. [20] developed a deep learning-enhanced ISAR-RID framework requiring extensive prior knowledge, while Qi et al. [21] introduced a recursive residual network with limitations in low-SNR robustness. Liu et al. [22] proposed a complex-valued network constrained by point-scattering assumptions. Conventional CNN models (e.g., U-Net) [23,24] suffer from local receptive fields that cannot distinguish overlapping frequency-domain features between weak components and noise. Transformer-based approaches [25,26] capture long-range dependencies but lack phase-sensitive modeling essential for ISAR complex-valued data processing [27].

A critical limitation across all existing methods is their inability to simultaneously enhance weak scattering components while effectively suppressing noise. Most techniques either over-smooth weak features during noise reduction or amplify noise when enhancing weak components, failing to achieve the delicate balance required for high-quality ISAR image enhancement.

To address these challenges, this study proposes a Phase-Aware Multi-Scale Transformer network (PA-MSFormer) that achieves multimodal feature disentanglement and cross-level collaborative optimization. The method establishes spectral-selective pathways within an encoder-decoder architecture, and balances computational efficiency with enhancement accuracy through lightweight design. Specifically, the proposed multi-scale downsampling fusion (MSDF) module and progressive gated fusion (PGF) module extract multi-level features while enabling pixel-level alignment between high-frequency detail components and low-frequency features. These modules are integrated with phase-aware Transformer blocks, each containing a Phase-Aware Multi-Head Self-Attention (PA-MSA) module and a dual-branch gated feed-forward network (DBGFN). This architecture effectively distinguishes weak scattering regions from noise-corrupted areas, thereby enhancing weak components while suppressing noise interference.

The contributions of this work can be summarized as follows:

1.: We propose a PA-MSA module that implements complex-domain feature modulation through learnable phase difference parameters. By embedding orthogonal phase encoding in query/key vectors, this module effectively separates spectral components of noise and weak scattering features via frequency-phase decomposition.
2.: A DBGFN module is introduced to decouple feature selection from nonlinear mapping through channel-splitting strategies. Dynamic weight adjustment mechanisms distinguish noise suppression paths from weak component enhancement paths, enabling precise feature selection.
3.: This MSDF module employs parallel multi-branch convolutions and max-pooling operations to establish frequency-complementary downsampling pathways. The design achieves joint optimization of dynamic noise suppression and weak scattering component preservation through band-selective sampling.
4.: The PGF module implements deformable transposed convolution for progressive upsampling, combined with channel-attention gating for cross-level feature weighting. This mechanism achieves pixel-level alignment between high-frequency details and low-frequency features during decoding, overcoming the feature misalignment issues inherent in traditional skip connections.

This paper is structured as follows. Section 2 elaborates on the proposed network’s architecture, including its phase-aware attention mechanisms and multi-scale feature decoupling modules. Section 3 evaluates performance via simulations and real-world experiments, benchmarking against state-of-the-art (SOTA) methods in terms of quantitative metrics and visual fidelity. Section 4 concludes with a summary of key contributions and future research directions for improving real-time applicability in radar imaging systems.

2. Materials and Methods

2.1. Enhancement Network

To address the conflicting requirements of weak scattering component enhancement and noise suppression in ISAR imagery, this paper proposes a PA-MSFormer that achieves synergistic optimization through complex-domain feature modulation and cross-scale progressive fusion. As shown in Figure 1a, the network adopts a U-shaped encoder-decoder architecture. The encoder stage employs cascaded TransformerBlocks and MSDF modules to hierarchically extract multi-level features while compressing spatial dimensions. The MSDF module captures multi-frequency information and eliminates redundancy, while TransformerBlocks enhance weak component features through frequency-phase joint modulation. These components collaboratively perform downsampling, multi-scale feature extraction, and critical weak component enhancement for observed ISAR targets.

In the bottleneck layer, stacked phase-aware TransformerBlocks model long-range cross-region dependencies to strengthen global feature associations and weak component responses. The decoder stage implements PGF modules to gradually restore resolution, dynamically weighting cross-level encoder-decoder features via channel-attention gating. Combined with TransformerBlock global modeling, this architecture separates target-related information from noise in ISAR imagery, enabling artifact removal and detail reconstruction.

Given complex-valued ISAR input data containing weak components, real and imaginary parts are processed as dual-channel inputs

I \in R^{H \times W \times 2}

. An initial 3 × 3 convolution embeds the input into high-dimensional features

F \in R^{H \times W \times C}

, which undergo end-to-end processing to produce enhanced complex-valued outputs

\hat{I} \in R^{H \times W \times 2}

while preserving intrinsic phase information.

2.2. Key Architectural Components

TransformerBlock. As shown in Figure 1b, the TransformerBlock consists of two Layer Normalization (LN) layers, a PA-MSA module, and a DBGFN. This block serves as a fundamental computational unit throughout the network, employed in encoder, bottleneck, and decoder stages.

PA-MSA. As shown in Figure 1c, the PA-MSA module enhances traditional self-attention mechanisms by addressing the phase sensitivity of ISAR complex-valued data. Unlike conventional self-attention that directly computes dot-product correlations among queries (Q), keys (K), and values (V), PA-MSA introduces learnable phase difference parameters for complex-domain modulation. This design is motivated by the fundamental physical distinction between weak scattering components and noise in ISAR imagery: weak scattering components follow predictable phase patterns determined by electromagnetic scattering mechanisms, while noise exhibits random phase distribution. By explicitly modeling these phase characteristics, our module achieves frequency-phase decomposition that separates signal from noise without manual filter design.

Design Principle: In ISAR imaging, weak scattering components and noise often occupy overlapping frequency bands but exhibit distinct phase behaviors. Traditional self-attention mechanisms operating on magnitude-only data cannot distinguish these overlapping components. Our approach leverages the complex-valued nature of ISAR data by introducing orthogonal phase modulation - sine modulation for queries enhances contrast of weak scattering components, while cosine modulation for keys suppresses noise interference. This creates a frequency-phase joint representation that enables precise separation of target features from noise.

Input features

X \in R^{H \times W \times C}

are reshaped into token sequences

X \in R^{H W \times C}

, where

H, W

are feature dimensions and C denotes channels. Then is split into h heads:

X = [X_{1}, X_{2}, \cdot \cdot \cdot, X_{h}]

(1)

where

X_{i} \in R^{H W \times d_{h}}, d_{h} = \frac{C}{h}

is the dimension of each attention head. Three bias-free fully connected layers map (

f c

) inputs to

Q_{i} \in R^{H W \times d_{h}}

,

K_{i} \in R^{H W \times d_{h}}

,

V_{i} \in R^{H W \times d_{h}}

:

Q_{i} = f c (X_{i}) W_{Q_{i}}, K_{i} = f c (X_{i}) W_{K_{i}}, V_{i} = f c (X_{i}) W_{V_{i}}

(2)

where

W_{Q_{i}}, W_{K_{i}}, W_{V_{i}} \in R^{C \times d_{i}}

denote learnable parameters.

To amplify discriminability between weak/strong scattering regions and noise, learnable phase parameters

ϕ_{q}, ϕ_{k} \in R^{H \times d_{i}}

are introduced. Sine/cosine functions modulate Q and K:

Q^{'} = Q ⊙ s i n (ϕ_{q}), K^{'} = K ⊙ c o s (ϕ_{k})

(3)

where ⊙ denotes element-wise multiplication. This phase modulation enhances spectral contrast between weak scattering components and noise.

L2-normalized

Q^{'}

and

K^{'}

compute attention weights:

A = V S o f t m a x (\frac{K' Q'}{\sqrt{d_{h}}} \cdot γ)

(4)

where

γ \in R^{H \times 1 \times 1}

is a learnable scaling factor that adaptively adjusts the attention distribution. Multi-head outputs are concatenated, linearly transformed and then plus a positional encoding

P \in R^{H W \times C}

. Finally, we reshaped to produce final features

F_{o u t} \in R^{H \times W \times C}

.

The learnable phase parameters eliminate manual filter design requirements, enabling adaptive focus on weak component frequency bands while suppressing noise spectra.

DBGFN. The second core component of the TransformerBlock is the DBGFN, illustrated in Figure 1d. This module replaces conventional FFNs in Transformer blocks, employing gating mechanisms and depth-wise convolutions to enhance fine-grained feature processing.

Design Principle: Traditional FFN architectures apply uniform nonlinear transformations across all features, failing to distinguish between noise-dominated and signal-rich frequency bands. In radar signal theory, where noise and target scattering exhibit distinct statistical properties in the complex domain, our DBGFN introduces a channel-splitting strategy that decouples these conflicting tasks into two specialized pathways: the gate branch focuses on noise suppression via nonlinear transformations, while the trunk branch preserves weak scattering components with minimal distortion.

Given input features

X \in R^{H \times W \times C}

, the operational pipeline is defined as:

A 1 × 1 convolution expands the input channels, followed by a 5 × 5 depth-wise convolution that extracts localized spatial features. Subsequently, the channels are divided into two branches

F_{g a t e}, F_{t r u n k} \in R^{C \times H \times W}

.

F_{g a t e}, F_{t r u n k} = S p l i t (D W C o n v_{5 \times 5} (C o n v_{1 \times 1} (X)))

(5)

where

C o n v_{1 \times 1} ()

denotes a 1 × 1 convolution,

D W C o n v_{5 \times 5} ()

represents a 5 × 5 depth-wise convolution, and

S p l i t ()

refers to channel-wise feature partitioning.

The gated mechanism is implemented through nonlinear transformation and feature fusion:

F_{g a t e} = τ (F_{1}) ⊙ F_{2}

(6)

where

τ (\cdot)

introduces nonlinearity to enhance feature discriminability, and ⊙ denotes element-wise multiplication, adaptively selecting salient features while suppressing redundancies.

F_{o u t} = C o n v_{1 \times 1} (F_{g a t e})

(7)

where

C o n v_{1 \times 1} ()

restores the original channel dimension. This gated architecture dynamically prioritizes critical feature components, with

F_{o u t}

representing the final refined feature maps.

MSDF. To enhance fine-grained weak component extraction by the PA-MSA module, we design the MSDF module (Figure 2a). MSDF implements frequency-complementary downsampling through mixed convolutional kernels and pooling operations.

Design Principle: In ISAR imagery, weak scattering components and noise exhibit complementary frequency characteristics across different spatial scales. Traditional single-scale downsampling fails to capture this multi-scale information, leading to either insufficient noise suppression or loss of weak component details. Our MSDF module addresses this challenge by implementing parallel processing paths that capture complementary frequency information: the 4 × 4 convolutional path extracts broadband spatial-spectral features critical for weak component preservation, while the max-pooling path emphasizes dominant energy distributions essential for noise suppression.

The input features

X \in R^{H \times W \times C}

are processed via two parallel paths:

F_{conv} = C o n v_{4 \times 4} (X)

(8)

where

C o n v_{4 \times 4} ()

denotes 4 × 4 convolution kernels (stride = 2) for capturing wideband spatial-spectral features, and

F_{conv} \in R^{\frac{H}{2} \times \frac{W}{2} \times 2 C}

.

F_{p o o l} = M a x P o o l_{2 \times 2} (X)

(9)

where

F_{pool} \in R^{\frac{H}{2} \times \frac{W}{2} \times C}

.

The dual-branch outputs are concatenated along the channel dimension and fused via a 1 × 1 convolution for cross-band information interaction:

F_{out} = C o n v_{1 \times 1} ([F_{c o n v} \oplus F_{p o o l}])

(10)

where ⊕ denotes the channel-wise concatenation operation that merges features along the channel dimension.

PGF. In the decoder stage, the PGF module (Figure 2b) implements deformable upsampling chains and channel-attention gating for cross-level feature refinement.

Design Principle: Traditional skip connections in U-Net architectures suffer from feature misalignment due to information loss during downsampling operations, particularly affecting weak scattering component reconstruction in ISAR imagery. Our PGF module employs deformable transposed convolution to enable adaptive spatial sampling, precisely aligning high-frequency details with low-frequency features and overcoming the fixed-grid limitation of standard interpolation. A channel-attention gate then dynamically weights each feature according to its spatial context, selectively preserving weak scattering signatures while suppressing noise artifacts. This design is motivated by radar imaging physics: weak scattering responses exhibit spatially varying characteristics that rigid upsampling strategies inherently fail to capture.

Low-resolution features

F_{d o w n} \in R^{\frac{H}{2} \times \frac{W}{2} \times 2 C}

are first processed through multi-stage transposed convolution upsampling:

F_{u p}^{i} = R E L U (D e F o r m C o n v_{3 \times 3}^{i} (F_{d o w n}))

(11)

where

R e L U (\cdot)

denote the rectified linear unit activation function, i denotes the processing stage, and

D e F o r m C o n v_{3 \times 3}^{i} ()

represents the transposed convolution operation. The progressive upsampling approach avoids detail blurring caused by interpolation.

The upsampled results

F_{u p} \in R^{H \times W \times C}

are then concatenated with skip-connected features

F_{s k i p} \in R^{H \times W \times C}

from the encoder along the channel dimension:

F_{c a t} = C o n c a t (F_{u p}, F_{s k i p})

(12)

where

F_{s k i p} \in R^{H \times W \times C}, F_{c a t} \in R^{H \times W \times 2 C}

. Spatial attention weights are then generated through global average pooling (GAP) and sigmoid activation:

A = σ (G A P (F_{cat}))

(13)

where

A \in R^{H \times W \times 1}

, and

σ ()

denotes the sigmoid function for weight normalization.

The concatenated features are adaptively weighted through a gating mechanism and a 1 × 1 convolution integrates channel-wise information:

F_{g a t e} = A ⊙ F_{c a t}

(14)

F_{out} = C o n v_{1 \times 1} (F_{g a t e})

(15)

where

F_{o u t} \in R^{H \times W \times C}

. The encoder details preserved through skip connections mitigate gradient vanishing issues during network training.

2.3. Loss Function Design

To simultaneously constrain pixel-level reconstruction accuracy and high-frequency detail preservation, we propose a Wavelet-Domain Constrained Hybrid Loss combining L1 reconstruction loss [28] and wavelet high-frequency subband loss:

L_{t o t a l} = λ_{1} L_{L 1} + λ_{2} L_{H F}

(16)

where

L_{L 1}

denotes the L1 reconstruction loss,

L_{H F}

represents the wavelet high-frequency subband loss, and

λ_{1}, λ_{2}

is a balancing coefficient.

The L1 loss directly constrains pixel-level discrepancies between enhanced images and ground-truth labels, defined as:

L_{L 1} = \frac{1}{N} \sum_{i = 1}^{N} ‖ Y_{i} - Y_{g t, i} ‖_{1}

(17)

where N denotes the total pixel count,

Y_{i}

represents network outputs, and

Y_{g t, i}

indicates ground-truth values. Compared to L2 loss, the L1 formulation demonstrates enhanced robustness to outliers and prevents over-smoothing artifacts in reconstructed ISAR imagery.

To enhance high-frequency details corresponding to weak scattering components, we introduce Haar wavelet decomposition to enforce subband feature alignment:

L_{H F} = \frac{1}{3 N} \sum_{s \in {L H, H L, H H}} {\sum_{i = 1} ‖ W_{s} {(Y)}_{i} - W_{s} {(Y_{g t})}_{i} ‖}_{2}^{2}

(18)

where

W_{s} ()

denotes the first-level Haar wavelet decomposition operator extracting specific subbands: LH: Horizontal-vertical high-frequency components; HL: Vertical-horizontal high-frequency components; HH: Diagonal high-frequency components.

By minimizing subband reconstruction errors, the network explicitly learns to preserve texture information associated with weak scattering components. In ISAR imagery, these components typically manifest as high-frequency scattering features. The proposed joint optimization strategy selectively enhances target-related high-frequency components while suppressing noise interference through frequency-domain constraints. The training strategy of PA-MSFormer is depicted in Algorithm 1.

Algorithm 1 Training strategy of PA-MSformer

1:: Initialize: network parameters $θ$ ;
2:: for each epoch = 1, 2, …, N do
3:: for each batch b in dataset $D$ do
4:: Get input reference data $Y_{b}^{gt}$ , simulate the data $Y_{b}^{\deg}$ under two typical degradation scenarios;
5:: Perform forward pass: get reconstructed image $Y = f_{i m g} (Y_{b}^{deg}, θ)$ ;
6:: Compute L1 loss using Equation (17);
7:: Compute wavelet high-frequency subband loss using Equation (18);
8:: Compute total loss using Equation (16);
9:: Perform backward pass: update $θ$ ;
10:: end for
11:: end for

3. Results

3.1. Experimental Configuration

Datasets. To address the lack of public benchmark datasets for ISAR image enhancement tasks, this study constructs a dedicated electromagnetic simulation dataset. Utilizing X-band radar parameters (center frequency: 9.6 GHz, bandwidth: 1 GHz, pulse repetition frequency: 1 kHz), comprehensive electromagnetic simulations are performed on satellite models including Aqua, Beidou-2, Dawn, Hubble Space Telescope (HST), and CALIPSO, as shown in Figure 3. Full 360° aspect angle projections (azimuth step: 1°) generate ground-truth ISAR images with 512 × 512 resolution. Based on ISAR imaging characteristics, two typical degradation scenarios are systematically designed: (1) Weak component degradation: Local structural features are randomly attenuated through controlled scattering coefficient reduction to simulate weak scattering visualization challenges. (2) Composite low-SNR degradation: Gaussian noise, speckle noise, and impulse noise are synthetically superimposed on degraded images to emulate complex noise environments.

For each satellite model, 200 sample pairs are generated per degradation scenario through electromagnetic simulation; the dataset is then partitioned into training (90%) and testing (10%) subsets. Spatial augmentation techniques—including random cropping to 128 × 128 patches, ±90°/180° rotations, and horizontal/vertical flipping are applied during training to enhance data diversity and mitigate overfitting risks.

Comparison Baselines. The proposed PA-MSFormer, an improved UNet-derived architecture, is benchmarked against SOTA methods including Transformer-based frameworks (IGDFormer [29], Retinexformer [30], Restormer [31]), multi-stage networks (MIRNet_v2 [32], MPRNet [33]), and CNN-based models (HINet [34], NAFNet [35]). All competing methods were trained and evaluated on our unified dataset using their publicly available implementations to ensure fair comparisons.

Implementation Details. The PA-MSFormer was implemented in PyTorch [36] with an Adam optimizer [37], employing a cosine annealing learning rate schedule (initial:

2 \times 10^{- 4}

, final:

1 \times 10^{- 6}

) [38] and a batch size of 8. Training spanned 100 epochs on NVIDIA GeForce RTX 4090 GPUs, with model selection based on peak validation peak signal-to-noise ratio (PSNR) performance.

3.2. Simulated Electromagnetic Data Enhancement Comparision Results

To demonstrate the superiority of the proposed method in weak-scattering component enhancement, we compare PA-MSFormer with SOTA methods in unified dataset evaluations. All competing methods were trained using their publicly available implementations.

Quantitative Analysis. Table 1 systematically compares PA-MSFormer with advanced methods in ISAR image enhancement across two degradation scenarios: weak scattering component degradation and composite noise degradation. Performance is quantified using PSNR for reconstruction accuracy and structural similarity index measure (SSIM) [39] for perceptual quality. PSNR measures global error between enhanced and reference images, reflecting weak component recovery and noise suppression capabilities, while SSIM evaluates contrast, luminance, and structural preservation for human visual perception.

In weak scattering degradation scenarios, PA-MSFormer achieves 74.11 dB PSNR and 0.9892 SSIM, significantly outperforming all comparative methods. Compared with the SOTA IGDFformer, the proposed method improves PSNR by 0.17 dB. Under composite noise degradation scenarios, PA-MSFormer maintains superior performance with 70.93 dB PSNR, exceeding MPRNet by 0.6 dB. These results demonstrate its synergistic optimization capability in weak component enhancement and noise suppression, particularly excelling in high-frequency detail reconstruction and structural preservation.

Computational Efficiency. As shown in Figure 4, PA-MSFormer demonstrates dual advantages in parameter efficiency and computational complexity. With 1.59 million (M) parameters, it reduces model size by 73.0% compared to MIRNet_v2 (5.86 M) and maintains 90.0% parameter count of Retinexformer. The computational complexity measured at 16.06 G FLOPS shows 7.1% reduction from Retinexformer (17.29 G), and achieves remarkable compression ratios of 97.0% and 89.3% against MPRNet (533.83 G) and HINet (146.64 G), respectively. The “Time” reported in the table is computed as the average processing time per frame, defined as

T i m e = \frac{T_{total}}{N_{frames}}

(19)

where

T_{total}

denotes the total inference time and

N_{frames}

is the number of test frames. Notably, the proposed architecture achieves the fastest single-frame processing speed among all evaluated methods.

In summary, PA-MSFormer achieves an effective balance among weak component enhancement accuracy, noise suppression capability, and computational efficiency, establishing its competitiveness in radar image processing.

Weak component degradation Qualitative Results. Figure 5 and Figure 6 present visual comparisons of PA-MSFormer against SOTA methods for the enhancement of weak scattering component in ISAR imaging. The validation employs electromagnetic simulation data from Aqua and Beidou-2 satellites, with red and blue bounding boxes highlighting critical detail regions.

As shown in Figure 5, for the Aqua satellite enhancement results, PA-MSFormer achieves optimal reconstruction of the solar panel surface grid structure in the red-boxed region. Its output not only preserves the contrast characteristics of grid textures but also significantly surpasses other methods in structural continuity. Retinexformer produces dim and blurred visual effects in this region due to insufficient enhancement intensity, while MPRNet causes over-enhancement of strong scattering points, leading to imbalanced detail recovery between solar panels and main structures. For the dual-side solar panel structure of Beidou-2 satellite (Figure 6), PA-MSFormer demonstrates uniform and high-fidelity enhancement. Notably, this method successfully retains weak scattering features at solar panel edges while suppressing background noise, achieving synergistic optimization between visual perception and quantitative metrics.

The experiments indicate that PA-MSFormer possesses superior enhancement consistency across target main bodies and auxiliary structures, with reconstruction accuracy for deep-level details (e.g., antenna structures) reaching the current SOTA level. These advantages originate from the adaptive enhancement module and feature decoupling strategy introduced in the network architecture, which achieve balanced optimization between degradation modeling and visual fidelity.

Composite low-SNR degradation Qualitative Results. To validate PA-MSFormer’s enhancement capability under coexisting strong noise and weak scattering components, this study further designs comparative experiments under extreme degradation conditions as shown in Figure 7 and Figure 8. The experiments focus on harsh environments where critical weak scattering features of key structures (e.g., solar panels) are severely overwhelmed by noise, posing dual challenges to both noise suppression and weak component reconstruction. Experimental results demonstrate that PA-MSFormer achieves precise recovery of target structures in extreme degradation scenarios, with significant superiority over existing methods across structural consistency, noise suppression capability, and detail fidelity.

As illustrated in Figure 7 and Figure 8 for the composite low-SNR degradation enhancement task, the input image suffers from near-complete information loss in solar panels and main structures due to strong noise interference. PA-MSFormer implements dual-channel feature decoupling mechanisms to optimize noise suppression and weak component enhancement through path-separated strategies: The red-boxed solar panel grid structure eliminates residual noise while preserving texture details. Among comparative methods, IGDFormer and MPRNet excessively smooth weak component regions during noise suppression, causing substantial structural information loss; MIRNet_v2 and Retinexformer achieve moderate information recovery but exhibit inadequate background noise suppression; HINet shows enhanced performance yet suffers from global noise amplification. This performance disparity highlights the innovation advantage of our noise-signal separation modeling.

In conclusion, PA-MSFormer achieves balanced optimization of performance and efficiency in weak scattering component enhancement and noise suppression tasks for ISAR imaging through innovative phase-aware and multi-scale fusion strategies. Particularly under extreme degradation scenarios, it demonstrates remarkable robustness, not only resolving the noise-signal separation dilemma in traditional methods but also ensuring reconstruction authenticity via physically constrained mechanisms, thereby providing a solution with both theoretical innovation and engineering practicality for radar image processing.

3.3. Measured Data Enhancement Comparision Results

To systematically evaluate PA-MSFormer’s generalization capability in real-world scenarios, this study conducts weak scattering component enhancement experiments based on two publicly available measured ISAR datasets: Yak aircraft and Citation aircraft. For the Yak dataset with inherently high imaging quality, two typical degradation scenarios are constructed by artificially attenuating the echo intensity in the right-wing region. In contrast, the Citation aircraft dataset, which exhibits poor intrinsic imaging quality, is directly fed into the network for enhancement processing.

It should be noted that objective quantitative metrics are statistically analyzed exclusively for the Yak dataset due to its controllable degradation scenarios and evaluable characteristics. Particularly, the experiment strictly adheres to the “train-test data separation” principle: the network training process completely excludes any measured data exposure, with all measured samples serving as independent test sets for enhancement reconstruction via PA-MSFormer and multiple SOTA methods. This rigorous protocol comprehensively validates the proposed method’s robustness and engineering practicality in real radar observation scenarios.

Quantitative Results. Table 2 presents PSNR and SSIM comparisons of competing methods under two degradation scenarios on measured data. PA-MSFormer achieves 52.27 dB PSNR in the conventional degradation scenario, surpassing IGDFormer and Retinexformer by 2.01 dB and 2.00 dB respectively. Under the low-SNR extreme degradation scenario, it achieves 51.72 dB PSNR, demonstrating improvements exceeding 1.5 dB over the current SOTA methods. These results validate that PA-MSFormer achieves precise degradation reconstruction without fine-tuning on measured data, demonstrating superior cross-domain generalization capability.

Qualitative Results. Figure 9 further reveals performance disparities among competing methods through visual analysis. Taking the red-boxed left-wing region of the Yak aircraft as an example, PA-MSFormer’s output not only clearly resolves wing surface texture details but also maintains high consistency in enhancement intensity with the main body region, while preserving the inherent sparsity characteristics of radar echoes.

Figure 10 validates the method’s balanced capability in weak component enhancement and noise suppression. The right-wing enhancement in the red-boxed region demonstrates PA-MSFormer’s superior performance in maintaining visual consistency with the main body while significantly suppressing background noise and retaining edge texture details. Conversely, Retinexformer, Restormer, and IGDFormer exhibit residual background speckles due to inadequate noise suppression, with enhanced outputs generating artificial scattering features. MPRNet and MIRNet_v2 suffer from blurred right-wing regions caused by imbalanced enhancement intensity, deviating from the observed target’s overall visual characteristics. In the blue-boxed tail region, PA-MSFormer’s reconstruction outperforms existing methods in both edge sharpness preservation and authenticity of strong/weak scattering distribution patterns.

For the directly input Citation aircraft measured data (Figure 11), most SOTA methods demonstrate significant over-enhancement phenomena, introducing abundant non-physical artifacts such as local brightness saturation and structural distortions. Compared to reference images, these approaches exhibit notable deviations in scattering intensity distribution and geometric structure consistency. Among them, NAFNet and IGDFormer achieve relatively superior global imaging quality but still suffer from wing component detail loss. In contrast, PA-MSFormer effectively suppresses artifact generation while preserving global target contours, with precise reconstruction of wing-edge scattering features.

3.4. Robustness Validation

To rigorously validate PA-MSFormer’s robustness under extreme degradation conditions, this study establishes a systematic evaluation framework based on Tiangong-model ISAR images. Through systematic multi-level SNR degradation scenarios (SNR range: −10 dB to 5 dB), the noise-corrupted data were subjected to multidimensional comparative testing. A blind testing protocol was implemented, where noisy inputs were directly fed into PA-MSFormer and multiple SOTA methods for enhancement processing. Key metrics including performance stability across noise intensity gradients and detail reconstruction fidelity were comprehensively assessed.

Qualitative Results. Figure 12 visually demonstrates that reconstruction performance diverges significantly as SNR decreases progressively. Under extreme low-SNR conditions (SNR = −10 dB), PA-MSFormer maintains high-fidelity target structure reconstruction with only minor background noise residuals, whereas all comparative methods exhibit varying degrees of detail loss and artifact interference. The proposed method’s robustness advantages under extreme low-SNR scenarios are particularly prominent: Its output preserves weak scattering features (e.g., solar panel edges and wing textures) while achieving visual consistency between enhanced regions and main structures, with background noise suppressed to subthreshold levels for target identification.

Quantitative Results. Figure 13 presents the PSNR value trends of competing methods under multi-level SNR conditions through line plots. Experimental data demonstrate that PA-MSFormer maintains superior reconstruction accuracy even in extreme degradation scenarios (SNR = −10 dB). This result validates the method’s precise weak-component recovery capability under strong noise interference and further confirms its noise suppression performance exhibits significant robustness.

The vulnerabilities of competing approaches become increasingly evident under low-SNR conditions. MPRNet exhibits information loss at SNR = 5 dB, and its residual learning mechanism demonstrates heightened sensitivity to high-frequency noise, causing progressive reconstruction distortion with increasing noise intensity. While MIRNet_v2 achieves moderate background noise suppression, its multi-scale enhancement strategy over-smooths weak scattering regions, leading to feature blurring. Restormer, Retinexformer, and IGDFormer perform adequately at SNR ≥ 0 dB but completely lose weak component recovery capabilities under extreme conditions (SNR = −10 dB), manifested as geometric feature disappearance in right-wing structures.

PA-MSFormer demonstrates significant advantages in robustness validation experiments, particularly exhibiting superior reconstruction performance under extreme low-SNR conditions. Its innovative architectural design provides technical pathways for resolving challenges in weak component enhancement and noise suppression for ISAR imaging, offering substantial theoretical and practical value for radar signal processing applications.

3.5. Ablation Study

To systematically validate the proposed method’s performance in low-SNR ISAR image enhancement tasks involving weak scattering components and quantitatively assess the effectiveness of each critical network component, ablation studies were conducted on the constructed extreme degradation dataset. The experiments focused on analyzing the contributions of the phase-aware TransformerBlock mechanism, multi-scale feature architecture, and cross-level skip connection fusion strategy to overall model performance.

Crucially, this study extends beyond individual module evaluation to explicitly investigate the synergistic effects between key components, revealing how their collaborative operation achieves the fundamental trade-off between weak component enhancement and noise suppression: (1) The PA-MSA and DBGFN modules form a frequency-domain feature disentanglement pipeline, where phase-aware modulation creates the spectral separation necessary for effective channel splitting; (2) The MSDF and PGF modules establish a multi-scale feature continuity framework, where hierarchical feature extraction enables precise pixel-level alignment during reconstruction. These synergies address the fundamental challenge in ISAR enhancement: simultaneously preserving weak scattering components while suppressing noise.

For the PA-MSA module within the phase-aware TransformerBlock mechanism, the necessity of dynamic phase modeling was validated through two modifications: first, by replacing its learnable phase parameters with fixed random phase parameters, and second, by completely removing the phase-aware layer while retaining only the amplitude-aware module. These ablations confirmed dynamic phase modeling’s critical role in reconstructing weak scattering features and its importance in noise-signal separation modeling.

Additionally, substituting the DBGFN module with standard Transformer FFN blocks further validated its unique value in nonlinear phase-aware feature mapping. To evaluate the MSDF and PGF modules in the encoder-decoder framework, control experiments were designed where single-scale convolution operations replaced MSDF modules during downsampling to test their multi-scale weak component modeling capability, and simple convolutional concatenation substituted PGF modules during upsampling to assess their synergistic optimization effects on noise suppression and detail reconstruction.

The balanced capability of weak component enhancement and noise suppression was rigorously verified through dual-dimensional validation using both electromagnetic simulation data and measured data.

In all experimental scenarios, replacing the learnable phase parameters in the PA-MSA module with random non-trainable phase information caused significant PSNR and SSIM degradation. As shown in Table 3, for low-SNR electromagnetic simulation data containing weak scattering components, the PSNR decreased from 70.93 dB to 68.49 dB. Further complete removal of the phase-aware layer while retaining only the amplitude-aware module led to additional performance deterioration. The visual results in Figure 14 demonstrate that introducing random phase information not only amplified background noise but also disrupted structural consistency in weak scattering regions. When the phase-aware layer was entirely removed, the observed targets exhibited background noise amplification with insufficient enhancement of valid regions. As evidenced by the measured data processing results in Figure 15, this approach caused overexposure effects on observation targets and introduced spurious scattering points.

The DBGFN module demonstrates critical synergy in noise suppression and feature enhancement. In low-SNR electromagnetic data, removing the DBGFN module caused a 2.61 dB PSNR drop, while measured data showed a 1.46 dB decrease. As illustrated in Figure 14, although the standard FFN module reconstructs weak scattering regions, its insufficient noise suppression leads to significant residual artifacts. This confirms that the DBGFN module’s frequency-domain sensitivity constraints enable simultaneous preservation of high-frequency target details and effective noise suppression.

Specifically, the PA-MSA module’s orthogonal phase modulation creates enhanced spectral contrast between weak scattering components and noise, providing the essential foundation for DBGFN’s channel-splitting strategy. Without this spectral separation, DBGFN cannot effectively distinguish noise-dominated channels from signal-rich channels, resulting in either over-smoothing of weak features or inadequate noise suppression. This interdependence is evidenced by the combined PA-MSA and DBGFN configuration consistently outperforms any single-module variant across all metrics. Visual evidence from Figure 13 and Figure 14 further demonstrates that isolated use of either module leads to significantly higher noise levels, while their synergistic operation achieves substantially improved noise suppression with preserved weak scattering features.

Removing the MSDF module resulted in 0.88 dB degradation on electromagnetic data and 1.89 dB reduction on measured data. Figure 14 reveals that despite moderate noise suppression, the texture details in weak scattering regions (e.g., solar panels) exhibit notable quality deterioration.

The PGF module removal degraded electromagnetic data by 1.44 dB and measured data by 1.79 dB. Visual results indicate that while the right-wing region remains partially reconstructable, edge sharpness and local texture details suffer significant loss.

The most compelling evidence of MSDF-PGF synergy appears in the measured data results (Figure 13): when both modules are present, the aircraft wing edges maintain sharpness and continuity across the entire structure; when PGF is removed but MSDF remains, the wing edges show discontinuities at certain sections; when MSDF is removed but PGF remains, the edges become blurred despite maintaining overall continuity. This pattern confirms that MSDF provides the multi-scale context necessary for PGF’s precise spatial alignment, while PGF implements the pixel-level fusion that preserves the structural integrity captured by MSDF.

4. Conclusions

To address the challenges of weak scattering component degradation in ISAR imaging and insufficient frequency-domain separation capability in conventional enhancement methods, this study proposes the PA-MSFormer. The proposed architecture achieves breakthroughs in complex-domain phase preservation, multi-scale feature decoupling, and cross-level context modeling. Specifically, the PA-MSA module introduces learnable phase shift parameters to resolve electromagnetic scattering feature distortion caused by traditional amplitude-only processing. Extensive experiments demonstrate that the method achieves significant superiority in both ISAR image detail reconstruction and quantitative accuracy. However, single-frame enhancement inevitably suffers from information loss. Future work will explore dynamic information exploitation from image sequences to further improve performance.

Author Contributions

Conceptualization, J.H.; methodology, J.H. and L.L.; validation, J.H.; formal analysis, J.H. and X.L.; investigation, J.H., X.L. and L.L.; writing—original draft preparation, J.H.; writing—review and editing, J.H., X.L., L.L., X.S. and F.Z.; visualization, J.H.; supervision, F.Z.; project administration, J.H.; funding acquisition, L.L. and F.Z.; All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Program of the National Natural Science Foundation of China under Grant 62531020, in part by the National Natural Science Foundation of China under Grant 62425113, Grant 62501437, Grant 62401429, Grant 62371468, and Grant 62401445, in part by the Postdoctoral Fellowship Program of CPSF under Grant GZC20232048 and GZC20241332, in part by the Fundamental Research Funds for the Central Universities under Grant XJSJ24099, XJSJ24008, ZDRC2207, and ZYTS25144, in part by the National Key Basic Research Project under Grant 2022JCJQZD20600.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments to improve the paper quality.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ISAR	inverse synthetic aperture radar
PA-MSFormer	Phase-Aware Multi-Scale Transformer networ
PA-MSA	Phase-Aware Multi-Head Self-Attention
DBGFN	Dual-Branch Gated Fusion Network
PGF	Progressive Gate Fuser
SOTA	state-of-the-art
LN	Layer Normalization
Q	Queries
K	Keys
V	Values
PSNR	peak signal-to-noise ratio
SSIM	measurement coordinate system

References

Zhao, L.; Wang, J.; Su, J.; Luo, H. Spatial Feature-Based ISAR Image Registration for Space Targets. Remote Sens. 2024, 16, 3625. [Google Scholar] [CrossRef]
Li, W.; Yuan, Y.; Zhang, Y.; Luo, Y. Unblurring ISAR Imaging for Maneuvering Target Based on UFGAN. Remote Sens. 2022, 14, 5270. [Google Scholar] [CrossRef]
Liu, L.; Zhou, Z.; Li, C.; Zhou, F. An Enhanced Sequential ISAR Image Scatterer Trajectory Association Method Utilizing Modified Label Gaussian Mixture Probability Hypothesis Density Filter. Remote Sens. 2025, 17, 354. [Google Scholar] [CrossRef]
Zhou, W.; Liu, L.; Du, R.; Wang, Z.; Shang, R.; Zhou, F. Three-Dimensional Reconstruction of Space Targets Utilizing Joint Optical-and-ISAR Co-Location Observation. Remote Sens. 2025, 17, 287. [Google Scholar] [CrossRef]
Zhang, X.; Ye, P.; Leung, H.; Gong, K.; Xiao, G. Object fusion tracking based on visible and infrared images: A comprehensive review. Inf. Fusion 2020, 63, 166–187. [Google Scholar] [CrossRef]
Yang, S.; Jiang, W.; Tian, B. ISAR Image Matching and 3D Reconstruction Based on Improved SIFT Method. In Proceedings of the 2019 International Conference on Electronic Engineering and Informatics (EEI), Nanjing, China, 8–10 November 2019; pp. 224–228. [Google Scholar] [CrossRef]
Du, R.; Liu, L.; Bai, X.; Zhou, Z.; Zhou, F. Instantaneous Attitude Estimation of Spacecraft Utilizing Joint Optical-and-ISAR Observation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
Wong, A.; Orchard, J. A nonlocal-means approach to exemplar-based inpainting. In Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 2600–2603. [Google Scholar] [CrossRef]
Halidou, A.; Mohamadou, Y.; Ari, A.A.A.; Zacko, E.J.G. Review of wavelet denoising algorithms. Multimed. Tools Appl. 2023, 82, 41539–41569. [Google Scholar] [CrossRef]
Tian, C.; Zheng, M.; Zuo, W.; Zhang, B.; Zhang, Y.; Zhang, D. Multi-stage image denoising with the wavelet transform. Pattern Recognit. 2023, 134, 109050. [Google Scholar] [CrossRef]
Paul, A.; Bhattacharya, P.; Maity, S.P. Histogram modification in adaptive bi-histogram equalization for contrast enhancement on digital images. Optik 2022, 259, 168899. [Google Scholar] [CrossRef]
Zhang, F.; Shao, Y.; Sun, Y.; Gao, C.; Sang, N. Self-supervised low-light image enhancement via histogram equalization prior. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; pp. 63–75. [Google Scholar] [CrossRef]
Sun, Z.-W.; Chen, Z.-Q.; Pei, Q.-Q.; Lai, J.-A. Low-Light Image Enhancement Network based on Gamma Correction and Multi-scale Attention Mechanism. In Proceedings of the 2023 IEEE 7th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 15–17 September 2023; Volume 7, p. 772. [Google Scholar] [CrossRef]
Hashemi, S.; Beheshti, S. Adaptive image denoising by rigorous Bayesshrink thresholding. In Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 28–30 June 2011; pp. 713–716. [Google Scholar] [CrossRef]
Jin, F.; Fieguth, P.; Winger, L.; Jernigan, E. Adaptive Wiener filtering of noisy images and image sequences. In Proceedings of the 2003 International Conference on Image Processing (Cat. No.03CH37429), Barcelona, Spain, 14–17 September 2003; Volume 3, p. III-349. [Google Scholar] [CrossRef]
Salehi, H.; Vahidi, J.; Abdeljawad, T.; Khan, A.; Rad, S.Y.B. A SAR Image Despeckling Method Based on an Extended Adaptive Wiener Filter and Extended Guided Filter. Remote Sens. 2020, 12, 2371. [Google Scholar] [CrossRef]
Limshuebchuey, A.; Duangsoithong, R.; Saejia, M. Comparison of Image Denoising using Traditional Filter and Deep Learning Methods. In Proceedings of the 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Phuket, Thailand, 24–27 June 2020; pp. 193–196. [Google Scholar] [CrossRef]
Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
Pan, X.; Zhan, X.; Dai, B.; Lin, D.; Loy, C.C.; Luo, P. Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7474–7489. [Google Scholar] [CrossRef]
Wang, X.; Dai, Y.; Song, S.; Jin, T.; Huang, X. Deep Learning-Based Enhanced ISAR-RID Imaging Method. Remote Sens. 2023, 15, 5166. [Google Scholar] [CrossRef]
Qi, M.; Li, K.; Qi, X.; Hu, Y. ISAR image enhancement based on a Recursive Residual Network. In Proceedings of the 2023 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 22–24 September 2023; pp. 1391–1396. [Google Scholar] [CrossRef]
Liu, F.; Huang, D.; Guo, X.; Feng, C. SR-DNnet: A Deep Network for Super-Resolution and De-Noising of ISAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6567–6583. [Google Scholar] [CrossRef]
Liu, X.; Hu, J.; Chen, X.; Dong, C. UDC-UNet: Under-display camera image restoration via U-shape dynamic network. In Proceedings of the European Conference on Computer Vision. Springer, Tel Aviv, Israel, 23–27 October 2022; pp. 113–129. [Google Scholar] [CrossRef]
Jin, Z.; Iqbal, M.Z.; Bobkov, D.; Zou, W.; Li, X.; Steinbach, E. A Flexible Deep CNN Framework for Image Restoration. IEEE Trans. Multimed. 2020, 22, 1055–1068. [Google Scholar] [CrossRef]
Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A General U-Shaped Transformer for Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17662–17672. [Google Scholar] [CrossRef]
Zhao, H.; Gou, Y.; Li, B.; Peng, D.; Lv, J.; Peng, X. Comprehensive and Delicate: An Efficient Transformer for Image Restoration. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14122–14132. [Google Scholar] [CrossRef]
Mai, Y.; Zhang, S.; Jiang, W.; Zhang, C.; Liu, Y.; Li, X. ISAR Imaging of Target Exhibiting Micro-Motion With Sparse Aperture via Model-Driven Deep Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
He, X.; Cheng, J. Revisiting L1 loss in super-resolution: A probabilistic view and beyond. arXiv 2022, arXiv:2201.10084. [Google Scholar] [CrossRef]
Wen, Y.; Xu, P.; Li, Z.; Xu, W. An illumination-guided dual attention vision transformer for low-light image enhancement. Pattern Recognit. 2025, 158, 111033. [Google Scholar] [CrossRef]
Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 4–6 October 2023; pp. 12470–12479. [Google Scholar] [CrossRef]
Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–22 June 2022; pp. 5718–5729. [Google Scholar] [CrossRef]
Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning Enriched Features for Fast Image Restoration and Enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1934–1948. [Google Scholar] [CrossRef]
Mehri, A.; Ardakani, P.B.; Sappa, A.D. MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Virtual, 5–9 January 2021; pp. 2703–2712. [Google Scholar] [CrossRef]
Chen, L.; Lu, X.; Zhang, J.; Chu, X.; Chen, C. HINet: Half Instance Normalization Network for Image Restoration. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Virtual, 19–25 June 2021; pp. 182–192. [Google Scholar] [CrossRef]
Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Proceedings of the European conference on computer vision, Tel Aviv, Israel, 23–27 October 2022; pp. 17–33. [Google Scholar] [CrossRef]
Paszke, A. Pytorch: An imperative style, high-performance deep learning library. arXiv 2019, arXiv:1912.01703. [Google Scholar] [CrossRef]
Kingma, D. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2017, arXiv:1608.03983. [Google Scholar] [CrossRef]
Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of the proposed methodology. (a) PA-MSFormer framework architecture. (b) TransformerBlock structure. (c) Internal configuration of the PA-MSA module. (d) Internal configuration of the DBGFN module.

Figure 2. Architectural details of key components. (a) Multi-Scale Downsampling Fusion (MSDF) module. (b) Progressive Gate Fuser(PGF) module implementing deformable upsampling and gated feature fusion.

Figure 3. The Three-dimensional Structural Models, Optical Models, and ISAR Imaging Results of Typical Satellites.

Figure 4. The trade-off between computational complexity (FLOPs) and PSNR for various ISAR image enhancement methods.

Figure 5. Aqua Model’s Visual Comparison under the Weak Component Degradation Scenario: PSNR Metrics Highlighted in Parentheses.

Figure 6. Beidou2 Model’s Visual Comparison under the Weak Component Degradation Scenario: PSNR Metrics Highlighted in Parentheses.

Figure 7. Aqua Model’s Visual Comparison under the Composite Low-SNR Degradation Scenario.

Figure 8. Beidou2 Model’s Visual Comparison under the Composite Low-SNR Degradation Scenario.

Figure 9. Yak aircraft Measured Data’s Visual Comparison under the Weak Component Degradation Scenario.

Figure 10. Yak aircraft Measured Data’s Visual Comparison under the Composite Low-SNR Degradation Scenario.

Figure 11. Citation aircraft Measured Data’s Visual Comparison under the Composite Low-SNR Degradation Scenario.

Figure 12. Weak Scattering Enhancement Visual Comparisons under SNR Variations.

Figure 13. PSNR Evaluation Curve of Weak Scattering Enhancement under SNR Variations.

Figure 14. Visual Comparative Analysis of Electromagnetic Data Ablation Experiments under Composite Low-SNR Degradation Scenario.

Figure 15. Visual Comparative Analysis of Measured Data Ablation Experiments under Composite Low-SNR Degradation Scenario.

Table 1. Quantitative Comparison Results for Electromagnetic Simulation Data Under Two Degradation Scenarios. The highest-performing metrics are marked with red, and the second-highest with blue.

Method	Complexity			Weak Degradation		Low-SNR Degradation
Method	FLOPS (G)	Params (M)	Time (s)	PSNR	SSIM	PSNR	SSIM
NAFNet	14.84	17.11	1.25	66.14	0.9005	65.49	0.8467
HINet	146.64	88.67	1.47	71.62	0.9286	66.90	0.5396
MIRNet_v2	130.49	5.86	1.33	72.07	0.9841	69.47	0.9390
Restormer	104.75	19.93	1.45	73.49	0.9878	69.78	0.9414
MPRNet	533.83	15.74	1.47	73.72	0.9869	70.33	0.9688
Retinexformer	17.29	1.77	1.19	73.75	0.9887	70.20	0.9546
IGDFormer	22.46	2.30	1.18	73.94	0.9881	70.04	0.9653
PA-MSFormer	16.06	1.59	1.13	74.11	0.9892	70.93	0.9710

Table 2. Quantitative Comparison Results for Yak aircraft Measured Data Under Two Degradation Scenarios. The highest-performing metrics are marked with red, and the second-highest with blue.

Method	Weak Degradation		Low-SNR Degradation
Method	PSNR	SSIM	PSNR	SSIM
NAFNet	48.77	0.9926	48.55	0.3797
HINet	47.11	0.9903	46.77	0.6453
MIRNet_v2	49.78	0.9920	49.40	0.9861
Restormer	51.05	0.9955	50.58	0.8943
MPRNet	49.32	0.9917	49.10	0.9906
Retinexformer	50.27	0.9942	50.23	0.9559
IGDFormer	50.26	0.9936	50.06	0.9472
PA-MSFormer	51.43	0.9956	51.07	0.9311

Table 3. Ablation Study on Phase-Aware Transformer Blocks and Feature Fusion Modules.

Configuration	PA-MSA			Modules			Electromagnetic Simulation		Measured Data
Configuration	Learnable	Random	None	DBGFN	MSDF	PGF	PSNR	SSIM	PSNR	SSIM
Ours (Full Model)	🗸	×	×	🗸	🗸	🗸	70.93	0.9653	51.07	0.9311
PA-MSA-Random	×	🗸	×	🗸	🗸	🗸	68.49	0.8191	50.26	0.7754
PA-MSA-None	×	×	🗸	🗸	🗸	🗸	68.69	0.8267	49.28	0.8431
Ours w/o DBGFN	🗸	×	×	×	🗸	🗸	68.32	0.7252	49.61	0.6815
Ours w/o MSDF	🗸	×	×	🗸	×	🗸	70.05	0.9626	49.18	0.7652
Ours w/o PGF	🗸	×	×	🗸	🗸	×	69.49	0.9661	49.28	0.9534

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, J.; Li, X.; Liu, L.; Shi, X.; Zhou, F. PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement. Remote Sens. 2025, 17, 3047. https://doi.org/10.3390/rs17173047

AMA Style

Huang J, Li X, Liu L, Shi X, Zhou F. PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement. Remote Sensing. 2025; 17(17):3047. https://doi.org/10.3390/rs17173047

Chicago/Turabian Style

Huang, Jiale, Xiaoyong Li, Lei Liu, Xiaoran Shi, and Feng Zhou. 2025. "PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement" Remote Sensing 17, no. 17: 3047. https://doi.org/10.3390/rs17173047

APA Style

Huang, J., Li, X., Liu, L., Shi, X., & Zhou, F. (2025). PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement. Remote Sensing, 17(17), 3047. https://doi.org/10.3390/rs17173047

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement

Abstract

1. Introduction

2. Materials and Methods

2.1. Enhancement Network

2.2. Key Architectural Components

2.3. Loss Function Design

3. Results

3.1. Experimental Configuration

3.2. Simulated Electromagnetic Data Enhancement Comparision Results

3.3. Measured Data Enhancement Comparision Results

3.4. Robustness Validation

3.5. Ablation Study

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI