MSDP-Net: A Multi-Scale Domain Perception Network for HRRP Target Recognition

Li, Hongxu; Li, Xiaodi; Xu, Zihan; Jin, Xinfei; Su, Fulin

doi:10.3390/rs17152601


by Hongxu Li, Xiaodi Li *, Zihan Xu, Xinfei Jin and Fulin Su

School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(15), 2601; https://doi.org/10.3390/rs17152601

Submission received: 12 June 2025 / Revised: 16 July 2025 / Accepted: 24 July 2025 / Published: 26 July 2025

(This article belongs to the Special Issue Target Recognition and Detection Based on High Resolution Radar Images)


Abstract

High-resolution range profile (HRRP) recognition serves as a foundational task in radar automatic target recognition (RATR), enabling robust classification under all-day and all-weather conditions. However, existing approaches often struggle to simultaneously capture the multi-scale spatial dependencies and global spectral relationships inherent in HRRP signals, limiting their effectiveness in complex scenarios. To address these limitations, we propose a novel multi-scale domain perception network tailored for HRRP-based target recognition, called MSDP-Net. MSDP-Net introduces a hybrid spatial–spectral representation learning strategy through a multiple-domain perception HRRP (DP-HRRP) encoder, which integrates multi-head convolutions to extract spatial features across diverse receptive fields, and frequency-aware filtering to enhance critical spectral components. To further enhance feature fusion, we design a hierarchical scale fusion (HSF) branch that employs stacked semantically enhanced scale fusion (SESF) blocks to progressively aggregate information from fine to coarse scales in a bottom-up manner. This architecture enables MSDP-Net to effectively model complex scattering patterns and aspect-dependent variations. Extensive experiments on both simulated and measured datasets demonstrate the superiority of MSDP-Net, achieving 80.75% accuracy on the simulated dataset and 94.42% on the measured dataset, highlighting its robustness and practical applicability.

Keywords: inverse synthetic-aperture radar (ISAR); radar automatic target recognition (RATR); high-resolution range profile (HRRP)

1. Introduction

Inverse synthetic-aperture radar (ISAR) is a critical technology for advanced target recognition in military and civilian applications due to its all-day and all-weather capability [1,2]. By exploiting the target’s rotational motion to form a synthetic aperture, ISAR systems enable automatic target characterization without relying on cooperative conditions. However, the generation of ISAR images demands relative motion stability and sufficient data collection periods, limiting real-time applicability in dynamic scenarios.

In contrast, high-resolution range profiles (HRRPs) offer a computationally efficient alternative by capturing the one-dimensional backscattering distribution along the radar line-of-sight (LOS) direction. Unlike ISAR imaging, HRRPs can be readily obtained using single-pulse wideband radar systems, requiring minimal relative motion and computational resources [3,4]. This efficiency has driven growing interest in HRRP-based radar automatic target recognition (RATR) [5,6,7,8,9,10,11].

The existing HRRP recognition methods can be categorized into two kinds: conventional methods [12,13,14,15] and deep learning (DL) methods [16,17,18,19,20,21,22,23,24].

Conventional methods primarily depend on handcrafted features and classical classifiers to recognize various targets. Pilcher et al. [12] extracted five distinct physical features from HH and VV polarization and collaboratively fused multiple nonlinear classifiers to achieve robust maritime target classification. To alleviate the sensitivity to time shifts and azimuth variations, Du et al. [13] devised a template construction strategy based on azimuth-interval-averaged profiles, effectively enhancing the azimuth robustness of HRRP recognition. To address the redundancy inherent in conventional bispectral integration features, Zhang et al. [14] proposed a bispectral feature selection method based on maximizing inter-class separability, which effectively alleviates azimuth dependency. To mitigate the recognition performance degradation induced by a low signal-to-noise ratio (SNR), Du et al. [15] introduced a noise-robust high-resolution range image recognition approach based on dominant scattering center matching, effectively extending the radar system’s recognition range. However, accurately designing handcrafted features relies on extensive domain expertise. Moreover, these kinds of methods are hampered by high computational complexity.

In contrast, DL-based HRRP recognition methods can automatically learn and extract various features from the raw HRRP data, eliminating the requirement for explicit feature engineering [25,26,27,28,29,30,31,32]. Pan et al. [17] first extracted an embedded representation of HRRPs using a CNN. Then, this representation was processed as a time series through a stacked bidirectional recurrent neural network (Bi-RNN) with an attention mechanism to achieve target recognition. Zhang et al. [33] combined the ConvLSTM and self-attention to emphasize the discriminative range cell, improving the classification performance for polarimetric HRRP-based RATR. Chen et al. [16] combined a 1-D convolutional neural network (CNN) with a bidirectional gated recurrent unit (Bi-GRU) to emphasize key local features. However, these RNN-based methods are limited in modeling long-range relationships due to vanishing gradients.

Dong et al. [18] employed a sparse autoencoder (SAE) in conjunction with a Gramian angular field (GAF) transformation to extract HRRP target features, subsequently achieving target recognition through a linear classifier. Gao et al. [19] proposed a structure-aware network that integrated shallow and deep features to capture long-range dependencies and determine differential feature contributions. Zhong et al. [6] proposed a dual self-supervised contrastive learning framework with an online clustering module to learn target aspect-invariant representations, demonstrating improved performance for noncooperative target recognition under aspect-deficient conditions. Chen et al. [20] proposed a graph convolutional network (GCN)-based network to model spatial dependencies among range cells via amplitude-based node vectors and range-relative adjacency matrices, achieving improved accuracy under limited training sample conditions. Zhang et al. [34] proposed a feature-guided transformer-based model to address the challenge of small-training-sample scenarios in polarimetric HRRP-based RATR. However, these methods only extract single-domain features and struggle to exploit cross-scale interactions for enhanced target recognition performance.

To address these issues, we propose a light-weight multi-scale domain perception network for HRRP target recognition, called MSDP-Net. MSDP-Net consists of the domain perception HRRP (DP-HRRP) encoder and the hierarchical scale fusion (HSF) branch. The DP-HRRP encoder employs hybrid spatial–spectral-domain perception to extract both local and global features in HRRP data across multiple scales. The HSF branch leverages the bottom-up structure to hierarchically fuse cross-scale features, sequentially aggregating high-level contexts with low-level details to derive robust feature representations. Within the HSF branch, the semantically enhanced scale fusion (SESF) block first modulates the scale-wise features, enhancing the contribution of the target regions. Then, this block utilizes attention-based projection to map the high-level features into low-level space, adaptively gating and fusing them into a unified embedding. Finally, we aggregate the features of each scale to generate the final prediction. The effectiveness of the proposed network is validated through simulated and measured datasets.

The main contributions of this paper are summarized as follows:

We propose a domain perception-based multi-scale framework for HRRP recognition, which explicitly models the complex interplay between multi-scale spatial patterns and multi-domain spectral characteristics inherent in HRRP data.
We propose a DP block to jointly capture spatial dependencies across multiple receptive fields and spectral relationships through frequency awareness. The DP block employs a dual-path mechanism: the spatial path utilizes a pyramid of dilated convolutions to encode multi-scale spatial correlations, while the spectral path applies frequency-selective filters to analyze the power distribution across radar backscattering spectra.
We design a multi-scale hierarchical fusion branch with progressive bottom-up semantic propagation, which systematically aggregates contextual information from fine-grained to coarse-scale features. The fusion branch employs a cascaded architecture with skip connections that propagate high-level semantics to low-level spatial details, while simultaneously injecting coarse-scale global context back to fine-grained representations.

The remainder of this article is organized as follows: Section 2 presents a detailed mathematical formulation of the HRRP signal model. Section 3 introduces the comprehensive architecture of the proposed network, with subsections elaborating on the DP encoder and HSF branch. Section 4 provides comparative experiments across simulated and measured datasets and ablation studies on the designed modules. Section 5 concludes the paper by summarizing the contributions and proposing future research in RATR under different modalities.

2. Signal Model

HRRP recognition aims to classify targets based on their one-dimensional radar backscatter signals. Despite its importance in radar automatic target recognition, HRRP data pose significant challenges due to the complex and varying spatial structures and spectral characteristics, influenced by aspect angle, noise, and target dynamics. Existing methods often fail to effectively capture multi-scale spatial dependencies and global spectral relationships simultaneously, leading to suboptimal recognition performance. Therefore, this work focuses on designing a multi-scale domain-aware model that can jointly extract and fuse spatial and spectral features from HRRP data, improving classification accuracy and robustness under practical conditions.

For the signal model, a target is typically modeled as a distributed group of electromagnetic scatterers in ISAR imaging, where each scatterer contributes distinct backscattered energy depending on its geometric location, orientation, and material properties. The received radar signal is thus considered as the coherent superposition of electromagnetic waves reflected from individual scatterers across the target’s surface, as illustrated in Figure 1. In pulse-Doppler (PD) radar systems, the transmitter employs a wideband linear frequency-modulated (LFM) signal waveform to enable high range resolution. Therefore, the received signal at the radar receiver is written as

\begin{matrix} S_{1} (t_{r}) = \sum_{i = 1}^{Z} I_{i} rect (\frac{t_{r} - 2 R_{i} / c}{T_{P}}) \\ exp (j 2 π f_{c} (t_{r} - \frac{2 R_{i}}{c})) exp (j π γ {(t_{r} - \frac{2 R_{i}}{c})}^{2}) \end{matrix}

(1)

where t_{r} denotes the fast time. Z and I_{i} denote the number of scatterers and the backscattering coefficient of the i-th scatterer, respectively. R_{i} denotes the distance between the radar and the i-th scatterer, and c denotes the speed of light. f_{c}, T_{P}, and γ denote the center frequency, the pulse width, and the frequency modulation slope, respectively. After dechirp processing, (1) is transformed into the HRRP as

\begin{matrix} S_{2} (f_{r}) = \sum_{i = 1}^{Z} I_{i} sinc [T_{P} (f_{r} + 2 \frac{γ}{c} R_{i})] \cdot exp (- j \frac{4 π f_{c}}{c} R_{i}) \end{matrix}

(2)

where f_{r} denotes the fast frequency. The HRRP provides a one-dimensional representation of a target’s structural characteristics by capturing the spatial distribution and amplitude variations of its scattering centers. Structural differences among targets directly influence the distribution and intensity of their scattering responses, resulting in different HRRP signatures for target recognition tasks.

In addition, Gaussian white noise is added to the amplitude of the HRRP to simulate the noise interference commonly present in practical radar systems as

\begin{matrix} S_{3} (f_{r}) = S_{2} (f_{r}) + w (f_{r}), \quad w (f_{r}) \sim \mathcal{N} (0, σ) \end{matrix}

(3)

where w (f_{r}) denotes the zero-mean Gaussian white noise.
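As an illustration of Eqs. (2) and (3), the following NumPy sketch evaluates the dechirped response of a few point scatterers and optionally adds Gaussian noise to the amplitude. The function name `simulate_hrrp`, the carrier frequency, bandwidth, pulse width, scatterer ranges, and coefficients are hypothetical values chosen for illustration and are not taken from the datasets used in this paper. Ranges are assumed to be measured relative to the dechirp reference, and σ is treated as a standard deviation.

```python
import numpy as np

C_LIGHT = 3.0e8  # speed of light in m/s


def simulate_hrrp(ranges_m, coeffs, f_c, t_p, gamma, f_r, noise_std=0.0, seed=0):
    """Evaluate Eq. (2) on the grid f_r, then add amplitude noise as in Eq. (3)."""
    ranges_m = np.asarray(ranges_m, dtype=float)[:, None]   # shape (Z, 1)
    coeffs = np.asarray(coeffs, dtype=complex)[:, None]     # shape (Z, 1)
    f_r = np.asarray(f_r, dtype=float)[None, :]             # shape (1, F)
    # np.sinc is the normalised sinc, sin(pi x) / (pi x).
    envelope = np.sinc(t_p * (f_r + 2.0 * gamma * ranges_m / C_LIGHT))
    phase = np.exp(-1j * 4.0 * np.pi * f_c * ranges_m / C_LIGHT)
    s2 = np.sum(coeffs * envelope * phase, axis=0)
    # Assumption: noise is real-valued and added to the amplitude |S2|.
    rng = np.random.default_rng(seed)
    amplitude = np.abs(s2) + rng.normal(0.0, noise_std, size=s2.shape)
    return s2, amplitude


# Hypothetical X-band example: 200 MHz bandwidth, 10 microsecond pulse.
f_c, bandwidth, t_p = 10.0e9, 200.0e6, 10.0e-6
gamma = bandwidth / t_p
f_r = np.linspace(-12.0e6, 2.0e6, 1401)  # 10 kHz grid spacing
s2, amplitude = simulate_hrrp([0.0, 30.0, 75.0], [1.0, 0.6, 0.8], f_c, t_p, gamma, f_r)
# Each scatterer peaks at f_r = -2 * gamma * R_i / c, so frequency maps back to range.
peak_range = -f_r[np.argmax(amplitude)] * C_LIGHT / (2.0 * gamma)
```

With `noise_std=0.0` the strongest peak sits at the scatterer with the largest coefficient; passing a positive `noise_std` gives a noisy profile of the kind described by Eq. (3).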

3. Methods

3.1. Method Overview

This section provides a comprehensive description of the proposed MSDP-Net, as illustrated in Figure 2. MSDP-Net comprises the DP-HRRP encoder and the HSF branch. The DP-HRRP encoder is designed to jointly capture local spatial relationships within multiple spatial resolutions and global spectral dependencies within the frequency domain. Then, the HSF branch employs a hierarchical bottom-up fusion strategy to integrate encoder features across scales, employing adaptive gates to propagate critical cross-scale features. Finally, we aggregate all features to generate a final prediction, leveraging the complementary strengths of multi-scale information.

3.2. Domain Perception HRRP (DP-HRRP) Encoder

The proposed DP-HRRP encoder is a composite architecture that integrates scale perception (SP) and frequency perception (FP) modules to exploit both local spatial and global frequency information in HRRPs, as shown in Figure 2a. Specifically, the encoder comprises stacked perception blocks. We utilize patch merging and two sequential convolutional layers to preserve the intrinsic structural relationships of the HRRP and to downsample the spatial resolution. Then, we employ the SP module and FP module to capture spatial-domain and frequency-domain information. A 1 × 1 convolutional layer is applied to fuse the concatenated features and reduce dimensionality. This hybrid-domain encoder preserves fine-grained spatial details while enhancing the encoder’s capacity to model global frequency dependencies, which is critical for HRRP-based target recognition.

As shown in Figure 3a, the SP module introduces multi-head dilated depthwise convolutions (DDWConvs) to focus on different granularity across multiple receptive fields. We first split up the feature maps into K parallel heads along the channel dimensions. Each head employs a unique dilation rate of k, which can be written as

\begin{matrix} \hat{x_{k}} = \mathrm{DDWConv} (x_{k}), k = 1, \dots, K \end{matrix}

(4)

where k and K denote the head index and the total number of heads. \hat{x_{k}} and x_{k} denote the enhanced features and the original features of the k-th head. Then, the features of multiple heads are sequentially divided into M groups, where each group contains one channel from each head to enhance the diversity of features. The group number M is calculated as

\begin{matrix} M = \frac{C}{K} \end{matrix}

(5)

where C denotes the channel number of the input features. Finally, we utilize pointwise convolutions to aggregate inter-group and intra-group information.
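The head splitting of Eq. (4) and the regrouping of Eq. (5) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the kernel size of 3, the zero "same" padding, and the helper names `dilated_depthwise_conv1d` and `scale_perception` are assumptions, and the final pointwise convolutions are omitted.

```python
import numpy as np


def dilated_depthwise_conv1d(x, weights, dilation):
    """Depthwise 1-D convolution with zero 'same' padding.

    x: (channels, length); weights: (channels, kernel_size) with odd kernel_size.
    """
    length = x.shape[1]
    ksize = weights.shape[1]
    pad = dilation * (ksize - 1) // 2
    padded = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for tap in range(ksize):
        start = tap * dilation
        out += weights[:, tap:tap + 1] * padded[:, start:start + length]
    return out


def scale_perception(x, head_weights):
    """Split x into K heads, apply dilation k to head k, then regroup channels."""
    num_heads = len(head_weights)
    channels, length = x.shape
    assert channels % num_heads == 0, "C must be divisible by K"
    groups = channels // num_heads                      # M = C / K, Eq. (5)
    heads = np.split(x, num_heads, axis=0)
    enhanced = [
        dilated_depthwise_conv1d(h, w, dilation=k)      # Eq. (4)
        for k, (h, w) in enumerate(zip(heads, head_weights), start=1)
    ]
    stacked = np.stack(enhanced, axis=0)                # (K, M, N)
    regrouped = stacked.transpose(1, 0, 2)              # (M, K, N): one channel per head
    return regrouped.reshape(channels, length), groups


C, K, N = 8, 4, 16
x = np.arange(C * N, dtype=float).reshape(C, N)
identity = np.tile(np.array([0.0, 1.0, 0.0]), (C // K, 1))  # pass-through kernels
out, groups = scale_perception(x, [identity] * K)
```

With pass-through kernels the output is only a channel permutation, which makes the regrouping visible: group m collects channel m of every head.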

Frequency-domain perception offers a significant perspective for HRRP feature extraction by revealing the scattering center distributions and global spectral variations within the target response. Inspired by [35], we propose the FP module to adaptively enhance global spectral representations. As shown in Figure 3b, we apply a fast Fourier transform (FFT) to project input features into the frequency domain, subsequently decoupling the magnitude and phase components. To emphasize critical spectral components, we employ parallel 1 × 5 and 1 × 7 DWConvs to perform multi-scale enhancement on the magnitude. The modulated magnitude is then recombined with the phase and transformed back to the spatial domain via an inverse FFT (IFFT). A GELU-based gating mechanism is employed to adaptively balance the original and spectral-enhanced features, enhancing the representation of the fused features.
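The frequency-domain path described above can be sketched with NumPy as follows. The text does not specify the exact form of the GELU gate, so the gate used here (GELU of the input weighting the spectral branch, added back to the input) is an assumption, as are the function names, the tanh approximation of GELU, and the use of one kernel shared across channels instead of per-channel depthwise kernels.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def depthwise_filter(x, kernel):
    """Filter each channel (row) of x along the last axis with 'same' zero padding."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])


def spectral_enhance(x, kernel5, kernel7):
    """FFT, split magnitude and phase, filter the magnitude in two branches, IFFT."""
    spectrum = np.fft.fft(x, axis=-1)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    enhanced = depthwise_filter(magnitude, kernel5) + depthwise_filter(magnitude, kernel7)
    # Keep the real part: the filtered spectrum need not stay conjugate-symmetric.
    return np.fft.ifft(enhanced * np.exp(1j * phase), axis=-1).real


def frequency_perception(x, kernel5, kernel7):
    """Assumed gating: x + GELU(x) * spectral branch (not confirmed by the paper)."""
    return x + gelu(x) * spectral_enhance(x, kernel5, kernel7)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))
identity5 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # pass-through 1 x 5 kernel
zero7 = np.zeros(7)                              # disables the 1 x 7 branch
roundtrip = spectral_enhance(x, identity5, zero7)
out = frequency_perception(x, identity5, zero7)
```

With a pass-through kernel the magnitude is unchanged, so the FFT and IFFT round trip returns the input; learned kernels would instead reweight neighbouring frequency bins before the inverse transform.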

3.3. Hierarchical Scale Fusion (HSF) Branch

The proposed HSF branch utilizes a bottom-up design with stacked SESF blocks to enhance cross-scale information propagation, as shown in Figure 4. The SESF block employs a semantic-aware modulation to selectively emphasize the most contributive range cells within each scale, as depicted in Figure 4a. These features are subsequently embedded into the same channel dimensionality utilizing channel projectors to ensure cross-scale compatibility. Then, we integrate scale-wise representations across adjacent scales, leveraging semantic guidance from the high-level scale to guide the refinement of low-level details, as depicted in Figure 4b. Specifically, we utilize a cross-scale attention map ϕ_{L \to L - 1} to map high-level features to the low-level scale, which can be expressed as

\begin{matrix} ϕ_{L \to L - 1} = {[\frac{exp ({\hat{f}}_{L - 1}^{T} \cdot {\hat{f}}_{L})}{\sum_{n = 1}^{N} exp ({\hat{f}}_{L - 1}^{T} \cdot {\hat{f}}_{L})}]}^{T} \end{matrix}

(6)

where {\hat{f}}_{L - 1} and {\hat{f}}_{L} denote the projected (L - 1)-th scale feature and L-th scale feature, respectively. N denotes the total number of range cells within the L-th scale. Then, we aggregate the L-th scale feature into the (L - 1)-th scale to facilitate cross-scale interaction, which is written as

\begin{matrix} {\hat{\hat{f}}}_{L - 1} = G \cdot {\hat{f}}_{L - 1} + (1 - G) \cdot {\hat{f}}_{L} \cdot ϕ_{L \to L - 1} \end{matrix}

(7)

where {\hat{\hat{f}}}_{L - 1} denotes the aggregated (L - 1)-th scale feature. G denotes the adaptive gating coefficient, which is jointly determined by cross-scale features, as shown in Figure 4c. Specifically, we feed the concatenated features into a linear gating module, which consists of two linear layers, two layer normalization (LN) layers, and two activation functions, to obtain the coefficient G. This fusion strategy enables the intermediate representation to incorporate both high-level semantic context and fine-grained structural detail, leading to more informative multi-scale feature representations.
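Eqs. (6) and (7) can be checked for shape consistency with a short NumPy sketch. The feature layout (channels by range cells), the function names, and the treatment of the gate G as a given value in [0, 1] are assumptions made for illustration; the linear gating module that produces G is not reproduced here.

```python
import numpy as np


def cross_scale_attention(f_low, f_high):
    """Eq. (6). f_low: (C, N_low), f_high: (C, N_high); returns phi of shape (N_high, N_low)."""
    scores = f_low.T @ f_high                            # (N_low, N_high)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability only
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # normalise over the N_high range cells
    return weights.T


def fuse_scales(f_low, f_high, gate):
    """Eq. (7): gated sum of the low-level feature and the projected high-level feature."""
    phi = cross_scale_attention(f_low, f_high)
    projected = f_high @ phi                             # (C, N_low)
    return gate * f_low + (1.0 - gate) * projected


rng = np.random.default_rng(0)
f_low = rng.standard_normal((16, 32))    # hypothetical (L - 1)-th scale feature
f_high = rng.standard_normal((16, 16))   # hypothetical L-th scale feature
phi = cross_scale_attention(f_low, f_high)
fused = fuse_scales(f_low, f_high, gate=0.5)
```

Each column of `phi` sums to one, so every low-level range cell receives a convex combination of the high-level range cells before the gate blends the two sources.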

3.4. Final Classification

We first apply adaptive average pooling to compress the spatial resolution of the fused features from different scales, obtaining fixed-length representations. These features are then concatenated along the channel dimension to integrate multi-scale information. Finally, the concatenated feature is passed through a linear classification head to produce the final target classification result.
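The pooling, concatenation, and linear head described above can be sketched as follows. The number of scales, the channel width of 16, the pooled length of 1, and the weight values are hypothetical; only the order of operations follows the text.

```python
import numpy as np


def adaptive_avg_pool1d(x, out_len):
    """Average-pool x of shape (C, N) to (C, out_len) using evenly spread bins."""
    channels, length = x.shape
    pooled = np.empty((channels, out_len))
    for i in range(out_len):
        start = (i * length) // out_len
        end = ((i + 1) * length + out_len - 1) // out_len  # ceiling division
        pooled[:, i] = x[:, start:end].mean(axis=1)
    return pooled


def classify(scale_features, weight, bias, out_len=1):
    """Pool each scale to a fixed length, concatenate, and apply a linear head."""
    pooled = [adaptive_avg_pool1d(f, out_len).reshape(-1) for f in scale_features]
    embedding = np.concatenate(pooled)
    logits = weight @ embedding + bias
    return int(np.argmax(logits)), logits


# Three hypothetical scales with 16 channels each and different lengths.
scale_features = [np.ones((16, 64)), np.ones((16, 32)), np.ones((16, 16))]
weight = np.zeros((10, 48))
weight[3] = 1.0                      # toy head that favours class 3
bias = np.zeros(10)
predicted, logits = classify(scale_features, weight, bias)
pooled_demo = adaptive_avg_pool1d(np.arange(8.0).reshape(1, 8), 4)
```

Because every scale is pooled to the same length, the concatenated embedding has a fixed size regardless of the input resolution at each scale.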

3.5. Loss Function

We employ the classification prediction loss as the loss function of MSDP-Net, which can be written as

\begin{matrix} \mathrm{loss} = L_{C E} (y, \hat{y}) \end{matrix}

(8)

where L_{C E} (y, \hat{y}) = - \sum_{i} y_{i} \log ({\hat{y}}_{i}). i denotes the class index. y and \hat{y} denote the ground truth label and prediction, respectively.
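For reference, Eq. (8) with a one-hot label can be written in a few lines of Python. The softmax step and the small constant `eps` that guards against log(0) are assumptions; the text only gives the cross-entropy expression itself.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (8): -sum_i y_i * log(y_hat_i), with eps guarding against log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


uniform = softmax([0.0, 0.0, 0.0, 0.0])          # four equally likely classes
loss_uniform = cross_entropy([0, 1, 0, 0], uniform)
loss_confident = cross_entropy([0, 1, 0, 0], softmax([0.0, 5.0, 0.0, 0.0]))
```

A uniform prediction over four classes gives a loss of log 4, and the loss shrinks as the probability assigned to the true class grows.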

4. Results and Discussion

4.1. Experimental Setup

To validate the effectiveness of the proposed method, we employed both the simulated and measured maritime target datasets, as shown in Table 1. The measured dataset was acquired using an X-band phased-array radar system. It comprises ten ship types spanning a size range of 60–250 m, including three cargo ships, three cruise liners, two LNG carriers, and two oil tankers, as shown in Figure 5. The simulated dataset comprises eight types of ships, as shown in Figure 6 and Figure 7. Each class in both datasets comprises 2000 samples. These two datasets were split into training, validation, and test sets with a ratio of 70%, 15%, and 15%, respectively. Additionally, we added 10 dB of white Gaussian noise to ensure generalization.
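The split sizes implied by these figures follow from simple arithmetic, sketched below. The helper name `split_counts` is hypothetical, and the sketch assumes the 70/15/15 ratio is applied to the full dataset of each type.

```python
def split_counts(num_classes, samples_per_class=2000, percents=(70, 15, 15)):
    """Return (total, train, validation, test) sample counts for one dataset."""
    total = num_classes * samples_per_class
    train = total * percents[0] // 100
    val = total * percents[1] // 100
    test = total - train - val
    return total, train, val, test


measured = split_counts(10)   # ten measured ship types
simulated = split_counts(8)   # eight simulated ship types
```

Under these assumptions the measured dataset has 20,000 samples (14,000 / 3,000 / 3,000) and the simulated dataset has 16,000 samples (11,200 / 2,400 / 2,400).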

We implemented our network on a Linux workstation with an Intel Xeon CPU and an NVIDIA RTX 1660s GPU (NVIDIA Corporation, Santa Clara, CA, USA). The network architecture was initialized with an input channel dimension of 8 to align with the inherent dimensionality of HRRP data while maintaining computational efficiency. The hidden dimension C_{h} of the SESF block was set to 16 to optimize the trade-off between feature representation capacity and model complexity.

The initial learning rate was set to 0.0005 and decayed after each epoch with the cosine annealing scheduler. The number of epochs was set to 100. The network was trained using the AdamW optimizer with a batch size of 64.
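The learning-rate schedule can be illustrated with the standard cosine annealing formula. The minimum learning rate of 0 and the absence of warm restarts are assumptions, since the text specifies only the initial rate and the number of epochs.

```python
import math


def cosine_annealing_lr(epoch, base_lr=5e-4, total_epochs=100, min_lr=0.0):
    """Learning rate after `epoch` epochs under a single cosine decay cycle."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


schedule = [cosine_annealing_lr(e) for e in range(101)]
```

Under these assumptions the rate starts at 0.0005, passes through 0.00025 at epoch 50, and reaches the minimum at epoch 100.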

4.2. Experimental Results

To validate the performance of the proposed method, we compare MSDP-Net with ASTT-Net [21], TACNN-Net [16], and SA-Net [19]. The results for both the simulated and measured datasets are presented in Table 2 and Table 3, where each experiment was repeated five times to mitigate the randomness of the training process. We employ accuracy to measure overall correctness across all classes. The F_{1} score is utilized to balance precision and recall, which is critical for handling class imbalances. The FLOPs and the number of parameters (Params) are utilized to quantify computational and memory efficiency for practical deployment. The confusion matrices are depicted in Figure 8 and Figure 9.
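Both classification metrics can be computed directly from a confusion matrix, as in the sketch below. Macro averaging of the per-class F_{1} scores is an assumption, since the averaging scheme is not stated, and the 2 × 2 matrix is a toy example rather than a result from this paper.

```python
def accuracy_and_macro_f1(confusion):
    """confusion[i][j] is the number of class-i samples predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    f1_scores = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / n


toy_confusion = [[8, 2], [1, 9]]
acc, macro_f1 = accuracy_and_macro_f1(toy_confusion)
```

For the toy matrix the accuracy is 17/20 = 0.85, while the per-class F_{1} scores are 16/19 and 18/21, which average to roughly 0.85.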

Compared to ASTT-Net, which adopts a discrete wavelet-based swin transformer architecture, MSDP-Net achieves a 2.67% accuracy improvement on the simulated dataset and 2.00% on the measured dataset. This demonstrates the efficacy of integrating the proposed multi-domain perception encoder, which simultaneously captures spatial–spectral interactions that ASTT-Net’s single-domain transformer architecture fails to fully exploit.

When compared with TACNN-Net, which implements attention mechanisms exclusively at the coarsest scale, MSDP-Net exhibits a 1.74% accuracy gain on the simulated data and 1.08% on the measured data. This underscores the advantage of our hierarchical multi-scale feature interaction strategy over TACNN-Net’s scale-restricted attention design, which limits its ability to leverage fine-to-coarse contextual relationships.

Compared to SA-Net, which focuses on spatial structure awareness through conventional convolution blocks, MSDP-Net delivers performance improvements of 1.23% on the simulated dataset and 1.57% on the measured dataset. This validates the necessity of explicitly modeling joint spatial–spectral-domain features through our hybrid encoder, as opposed to SA-Net’s spatial-only feature-encoding paradigm.

These results underscore MSDP-Net’s capacity to facilitate multi-domain feature extraction and exploit complementary information across scales, thereby improving recognition performance.

In addition, it is observed that our model is generally able to differentiate between the broader categories of Cruiseship and Cargoship. However, confusion mainly arises at the subclass level, such as between Cruiseship1 and Cruiseship2, or between certain Cargoship instances and structurally similar Cruiseships. The difficulty in distinguishing these subclasses lies in the high structural similarity and overlapping scattering characteristics when observed from specific aspect angles. HRRP data represent one-dimensional projections of radar backscattering profiles along the line of sight. As a result, fine-grained structural differences between subclasses may not be sufficiently captured, especially when the aspect angle does not emphasize discriminative features.

We employ t-SNE [36], a popular technique for projecting high-dimensional features into a low-dimensional space, to assess how well different approaches separate classes, as shown in Figure 10 and Figure 11. Our network’s embeddings form more distinct clusters than those produced by comparative methods. In the simulated dataset, this translates into markedly clearer boundaries between categories, while in the measured dataset we observe substantial gains in the separation of Cargoship1, Cargoship2, and Cargoship3. These figures visually confirm that our architecture yields feature representations with superior discriminative power.

Moreover, cruise ships and cargo ships often share large, flat surfaces and distributed scattering components, which leads to overlapping patterns in their HRRPs, especially under side-looking or oblique aspect angles. Therefore, all models fail to effectively classify Cruiseship1, Cruiseship2, and Cargoship.

Among the four methods, SA-Net exhibits the smallest parameter count at 0.0602 M, while TACNN-Net and ASTT-Net require 0.2148 M and 0.7764 M parameters, respectively. Our proposed method uses 0.1102 M parameters, roughly 1.8 times SA-Net’s count but about 49% fewer than TACNN-Net and only about 14% of ASTT-Net’s count, effectively balancing representational capacity with model compactness to reduce storage and transmission overhead.

Considering the FLOPs, the proposed method demonstrates the best computational efficiency with only 4.9 M FLOPs, compared with 6.7 M for ASTT-Net, 15.6 M for TACNN-Net, and 6.3 M for SA-Net. Compared to the second-most efficient SA-Net, our method reduces FLOPs by 22%, while achieving a 69% reduction relative to TACNN-Net. This should translate to faster inference and lower energy consumption under identical hardware constraints.
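The relative figures in this comparison can be recomputed from the reported parameter and FLOP counts alone. The sketch below is a plain arithmetic check; the dictionary and function names are illustrative.

```python
params_m = {"SA-Net": 0.0602, "TACNN-Net": 0.2148, "ASTT-Net": 0.7764, "MSDP-Net": 0.1102}
flops_m = {"SA-Net": 6.3, "TACNN-Net": 15.6, "ASTT-Net": 6.7, "MSDP-Net": 4.9}


def ratio(table, other, ours="MSDP-Net"):
    """Size of our model relative to another method (1.0 means equal)."""
    return table[ours] / table[other]


def reduction_percent(table, other, ours="MSDP-Net"):
    """Percentage by which our model is smaller than another method."""
    return 100.0 * (1.0 - ratio(table, other, ours))


params_vs_sa = ratio(params_m, "SA-Net")
params_cut_vs_tacnn = reduction_percent(params_m, "TACNN-Net")
params_share_of_astt = 100.0 * ratio(params_m, "ASTT-Net")
flops_cut_vs_sa = reduction_percent(flops_m, "SA-Net")
flops_cut_vs_tacnn = reduction_percent(flops_m, "TACNN-Net")
```

The parameter count comes out at about 1.83 times that of SA-Net, about 49% below TACNN-Net, and about 14% of ASTT-Net, while the FLOPs are about 22% below SA-Net and about 69% below TACNN-Net.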

Furthermore, all methods fail to effectively distinguish between costa and container1, as shown in both the classification results and t-SNE visualization. This is primarily due to the inherent projection characteristics of HRRPs.

An HRRP represents a one-dimensional projection of a target’s scattering centers along the radar LOS, which makes it highly sensitive to the aspect angle of the target. Although costa and container1 are structurally different ships, their backscattering profiles can appear remarkably similar when observed from certain angles. As a result, all models struggle to discriminate between these two classes under specific projection conditions.

4.3. Ablation Studies

To evaluate the contribution of each module, we performed ablation studies on both the simulated and measured datasets. Four variants were analyzed: w/o DP block, w/o SP module, w/o FP module, and w/o HSF branch. We evaluate the performance of these variants and the proposed network utilizing the accuracy, F_{1} score, Params, and FLOPs, as shown in Table 4. The results show that removing the DP block leads to a degradation of 12.74% in the accuracy and 13.17% in the F_{1} score, demonstrating its role in extracting local and global dependencies. Removing the SP module results in reductions of 2.41% in the accuracy and 2.38% in the F_{1} score, indicating the effectiveness of modeling spatial relationships across multiple receptive fields. Removing the FP module causes performance degradations of 0.45% in the accuracy and 0.42% in the F_{1} score, showing its ability to capture global spectral dependencies. Removing the HSF branch leads to a degradation of 0.76% in the accuracy and 0.81% in the F_{1} score, demonstrating the effectiveness of the bottom-up adaptive scale fusion strategy. These findings demonstrate that combining these modules allows the proposed network to interactively exploit the potential of multi-domain features, achieving improved HRRP target recognition. We also conducted module accumulation experiments, as shown in Table 5. The results demonstrate that the SP and FP modules significantly improve the model performance with only a small increase in parameters and computational cost.

Considering the computational efficiency, introducing the DP block increases the model size from 0.0353 M to 0.1102 M parameters and raises the FLOPs from 1.8 M to 4.9 M, yet yields the largest performance improvement, which confirms its critical role in joint spatial–spectral modeling. The SP module adds 0.0494 M parameters and 1.9 M FLOPs, while the FP module adds 0.0354 M parameters and 1.8 M FLOPs. Finally, the HSF branch adds only a slight computational cost of 0.0023 M parameters and 0.1 M FLOPs. Collectively, these results indicate that our architecture allocates computational resources where they contribute most to recognition performance while keeping the overall cost low.

4.4. The Effect of the Channel Number, the Hidden Dim, and Replacing the DP Block

We observe that increasing the number of channels leads to improved model performance, reflecting enhanced feature representation capability, as shown in Table 6. However, the performance exhibits fluctuations beyond 16 channels, which may be attributed to overfitting or training instability introduced by the higher model complexity. Therefore, we select 8 channels as our initial configuration to balance performance and model stability.

We further investigate the impact of the hidden dimension C_{h} in the SESF block by varying its value, as summarized in Table 7. When C_{h} is reduced to 8, a noticeable degradation is observed across all evaluation metrics. As C_{h} increases beyond 16, the performance exhibits fluctuations without consistent improvement. Therefore, we set C_{h} to 16 to achieve a balance between recognition accuracy and computational efficiency.

Similarly, we replace the proposed DP block with SE and CBAM modules for comparison, as shown in Table 8. While this substitution results in a slight reduction in computational cost and parameter count, the overall performance degrades significantly compared to our original design.

5. Conclusions

In this paper, we propose MSDP-Net, a light-weight multi-scale domain perception network for HRRP recognition that addresses the challenges of capturing multi-scale spatial dependencies and spectral relationships inherent in HRRP data. Unlike conventional approaches that often treat spatial and spectral dimensions independently, MSDP-Net explicitly integrates a hybrid spatial–spectral-based DP-HRRP encoder to holistically model both local structural patterns and long-range global dependencies. This dual-modality encoding mechanism enables the network to effectively disentangle the complex interplay between radar backscattering characteristics and target geometry, which are critical for robust target classification under varying observation conditions.

To further enhance cross-scale feature representation, the network adopts a hierarchical bottom-up fusion strategy that systematically aggregates multi-scale contextual information. By progressively integrating features from fine-grained local details to coarse global structures, this architecture not only preserves discriminative local patterns but also captures large-scale semantic relationships. The dynamic fusion mechanism ensures adaptive emphasis on salient features at different scales, addressing the inherent scale variability present in real-world HRRP data.

Extensive validation experiments were conducted on both simulated and measured datasets. The results demonstrate that MSDP-Net achieves the highest classification accuracy and the lowest FLOPs among the compared methods, with accuracy gains of 1.23–2.67% on the simulated dataset and 1.08–2.00% on the measured dataset. The network’s lightweight design also supports practical applicability for real-time radar systems with limited computational resources.

In addition, due to the physical limitations of HRRP resolution and viewpoint dependency, complete disambiguation may remain difficult. Therefore, our future work will incorporate additional information such as infrared modality, multi-aspect views, and contextual metadata.

Author Contributions

Conceptualization, H.L. and X.L.; methodology, H.L.; software, H.L.; validation, H.L.; investigation, H.L.; resources, F.S.; data curation, H.L.; writing—original draft preparation, H.L. and X.L.; writing—review and editing, H.L.; visualization, H.L., X.L., X.J. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Aeronautical Science Foundation of China under Grant 2024Z073077005.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The source code will be available on GitHub to contribute to the development of remote sensing.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The generation of the HRRP data.

Figure 2. The architecture of the multi-scale domain perception network (MSDP-Net). MSDP-Net consists of the domain perception HRRP (DP-HRRP) encoder and the hierarchical scale fusion (HSF) branch. (a) The DP-HRRP encoder comprises stacked DP blocks that extract multi-scale spatial–spectral features. (b) The HSF branch employs a bottom-up design to dynamically aggregate high-level contexts into low-level features, generating robust representations.

Figure 3. The architectures of the scale perception (SP) module and the frequency perception (FP) module. (a) The SP module uses multi-head dilated depthwise convolutions to capture features within multiple receptive fields and then aggregates the heads with intra-group and inter-group pointwise convolutions. (b) The FP module applies frequency enhancement to model global spectral dependencies and then uses GELU-based gating to adaptively re-weight the feature contributions.
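The two ideas named in the Figure 3 caption, dilated convolutions with several receptive fields and frequency-domain re-weighting followed by a GELU gate, can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the fixed all-ones kernel, the dilation rates (1, 2, 4), the real-valued per-bin weights, and the form of the gate are all assumptions made for illustration, and the pointwise aggregation of heads is omitted.

```python
import cmath
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D correlation with a dilated kernel."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation  # tap position for this weight
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Span, in range cells, covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_head(x, kernel, dilations=(1, 2, 4)):
    """One head per dilation rate (hypothetical rates); a trained model
    would learn a separate kernel per head and channel."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]


def gelu(v):
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def frequency_enhance(x, bin_weights):
    """Scale each DFT bin by a real weight (assumed form), then invert."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n)]
    scaled = [w * s for w, s in zip(bin_weights, spectrum)]
    return [sum(scaled[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def gelu_gate(features, gate_input):
    """Re-weight features element-wise by GELU of a gate signal (assumed form)."""
    return [f * gelu(g) for f, g in zip(features, gate_input)]


profile = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # toy single-scatterer HRRP
heads = multi_head(profile, kernel=[1.0, 1.0, 1.0])
print([receptive_field(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

With a 3-tap kernel, the three heads respond to the single scatterer at offsets of 1, 2, and 4 range cells, which is how dilation widens the receptive field without adding weights. The frequency step acts on every range cell at once, which is why it can capture global dependencies that a short kernel cannot.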

Figure 4. The architecture of the semantically enhanced scale fusion (SESF) block. (a) The SESF block first derives the semantic probability maps from adjacent scale features $f_{L-1}$ and $f_{L}$ to emphasize the target regions. (b) The enhanced feature $\hat{f}_{L}$ is then projected into the $L-1$ scale by calculating the affinity between $\hat{f}_{L}$ and $\hat{f}_{L-1}$. (c) $\hat{\hat{f}}_{L}$ and $\hat{f}_{L-1}$ are adaptively fused through the gate mechanism.
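Steps (b) and (c) of the Figure 4 caption can be read as an affinity-weighted projection followed by a gated blend. The sketch below shows one plausible form of each, under assumptions not taken from the source: the affinity is a softmax over dot products, the gate is a sigmoid producing a convex combination, and all names (`project_by_affinity`, `gated_fuse`, `f_L`, `f_Lm1`) are hypothetical. The semantic probability maps of step (a) are left out.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_by_affinity(f_src, f_dst):
    """Resample f_src (scale L) onto the positions of f_dst (scale L-1).

    Each feature map is a list of per-position channel vectors.
    The softmax dot-product affinity is an assumed form.
    """
    channels = len(f_src[0])
    projected = []
    for query in f_dst:
        affinity = softmax([dot(query, key) for key in f_src])
        projected.append([sum(a * key[c] for a, key in zip(affinity, f_src))
                          for c in range(channels)])
    return projected


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gated_fuse(a, b, gate_logits):
    """Element-wise g * a + (1 - g) * b with g = sigmoid(gate_logit)."""
    fused = []
    for row_a, row_b, row_g in zip(a, b, gate_logits):
        gates = [sigmoid(g) for g in row_g]
        fused.append([g * x + (1.0 - g) * y for g, x, y in zip(gates, row_a, row_b)])
    return fused


f_L = [[1.0, 0.0], [0.0, 1.0]]                            # 2 positions, 2 channels
f_Lm1 = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0], [0.0, 0.0]]  # 4 positions, 2 channels
f_L_proj = project_by_affinity(f_L, f_Lm1)                # now 4 positions
zero_logits = [[0.0, 0.0] for _ in f_Lm1]                 # g = 0.5 everywhere
fused = gated_fuse(f_L_proj, f_Lm1, zero_logits)
```

In a trained network the gate logits would be predicted from the features themselves, so the blend can favor either scale position by position; here they are fixed at zero, which reduces the fusion to a plain average.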

Figure 5. The HRRP data of the measured dataset, with ship lengths ranging from 60 to 250 m. (a) The raw HRRP data of Cargoship1. (b) The raw HRRP data of Cargoship2. (c) The raw HRRP data of Cargoship3. (d) The raw HRRP data of Cruiseship1. (e) The raw HRRP data of Cruiseship2. (f) The raw HRRP data of Cruiseship3. (g) The raw HRRP data of LNGship1. (h) The raw HRRP data of LNGship2. (i) The raw HRRP data of Oilship1. (j) The raw HRRP data of Oilship2.

Figure 6. The scattering models of the simulated dataset, consisting of 8 classes. (a) The scattering model of the costa ship. (b) The scattering model of the Boat ship. (c) The scattering model of the LNG ship. (d) The scattering model of the patrol ship. (e) The scattering model of the transport ship. (f) The scattering model of container ship-1. (g) The scattering model of container ship-2. (h) The scattering model of the owlfelino ship.

Figure 7. The HRRP data of the simulated dataset, consisting of 8 classes. (a) The HRRP data of the costa ship. (b) The HRRP data of the Boat ship. (c) The HRRP data of the LNG ship. (d) The HRRP data of the patrol ship. (e) The HRRP data of the transport ship. (f) The HRRP data of container ship-1. (g) The HRRP data of container ship-2. (h) The HRRP data of the owlfelino ship.

Figure 8. The confusion matrices of the compared methods on the simulated dataset. (a) The confusion matrix of ASTT-Net on the simulated dataset. (b) The confusion matrix of TACNN-Net on the simulated dataset. (c) The confusion matrix of SA-Net on the simulated dataset. (d) The confusion matrix of the proposed MSDP-Net on the simulated dataset.

Figure 9. The confusion matrices of the compared methods on the measured dataset. (a) The confusion matrix of ASTT-Net on the measured dataset. (b) The confusion matrix of TACNN-Net on the measured dataset. (c) The confusion matrix of SA-Net on the measured dataset. (d) The confusion matrix of the proposed MSDP-Net on the measured dataset.

Figure 10. The 2-D t-SNE visualizations of the compared methods on the simulated dataset. (a) The t-SNE visualization of ASTT-Net. (b) The t-SNE visualization of TACNN-Net. (c) The t-SNE visualization of SA-Net. (d) The t-SNE visualization of the proposed MSDP-Net.

Figure 11. The 2-D t-SNE visualizations of the compared methods on the measured dataset. (a) The t-SNE visualization of ASTT-Net. (b) The t-SNE visualization of TACNN-Net. (c) The t-SNE visualization of SA-Net. (d) The t-SNE visualization of the proposed MSDP-Net.

Table 1. The key parameters of the simulated and measured datasets.

Parameter	Simulated Dataset	Measured Dataset
Carrier frequency	9 GHz	X-band
Bandwidth	400 MHz	300 MHz
PRF	800 Hz	1000 Hz
Classes	8	10
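The bandwidths in Table 1 fix the nominal range resolution of each dataset through the standard relation ΔR = c / (2B). The source does not report these values; the short sketch below simply evaluates the textbook formula for an ideal, unwindowed waveform, and the function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_resolution(bandwidth_hz):
    """Nominal range resolution c / (2B) of an ideal waveform, in metres."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


for name, bandwidth in (("simulated", 400e6), ("measured", 300e6)):
    print(f"{name}: {range_resolution(bandwidth):.3f} m")
# simulated: 0.375 m
# measured: 0.500 m
```

Under this idealization, each range cell of a measured profile covers about half a metre, so a ship a few hundred metres long spans several hundred cells along the line of sight, depending on aspect.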

Table 2. The comparative experiments on the simulated dataset.

Method	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
PCA	-	-	54.31	55.07	59.10	54.31
ASTT-Net	0.7764 M	6.7 M	77.68 ± 0.17	77.59 ± 0.12	78.08 ± 0.10	77.67 ± 0.17
TACNN-Net	0.2148 M	15.6 M	79.01 ± 0.53	78.77 ± 0.52	79.13 ± 0.60	79.01 ± 0.53
SA-Net	0.0602 M	6.3 M	79.52 ± 0.19	79.56 ± 0.25	80.25 ± 0.42	79.52 ± 0.19
Proposed	0.1102 M	4.9 M	80.75 ± 0.34	80.74 ± 0.32	80.91 ± 0.31	80.75 ± 0.34

Table 3. The comparative experiments on the measured dataset.

Method	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
PCA	-	-	67.21	67.17	69.26	67.24
ASTT-Net	0.7764 M	6.7 M	77.68 ± 0.17	77.59 ± 0.12	78.08 ± 0.10	77.68 ± 0.17
TACNN-Net	0.2148 M	15.6 M	93.34 ± 0.59	93.37 ± 0.58	93.75 ± 0.95	93.38 ± 0.58
SA-Net	0.0602 M	6.3 M	92.85 ± 1.12	92.89 ± 1.13	93.37 ± 0.95	92.98 ± 1.12
Proposed	0.1102 M	4.9 M	94.42 ± 0.33	94.07 ± 0.34	94.33 ± 0.35	94.13 ± 0.36
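The four metrics reported in the comparative tables can all be derived from a confusion matrix. The sketch below assumes macro averaging over classes, which the source does not state at this point; the function name and the toy two-class matrix are illustrative and are not data from the paper.

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


toy_cm = [[8, 2],
          [1, 9]]  # made-up counts, 10 samples per class
metrics = classification_metrics(toy_cm)
print({k: round(v, 4) for k, v in metrics.items()})
```

When every class has the same number of test samples, macro recall equals accuracy, as it does for this toy matrix. That would be consistent with the nearly identical Accuracy and Recall columns in the simulated-dataset results, although the averaging scheme actually used is an assumption here.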

Table 4. The ablation experiments on the simulated dataset.

Method	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
w/o DP block	0.0353 M	1.8 M	68.70 ± 0.59	68.10 ± 0.72	71.07 ± 0.63	68.70 ± 0.59
w/o SP module	0.0847 M	3.7 M	78.34 ± 0.59	78.36 ± 0.61	79.01 ± 0.69	78.34 ± 0.59
w/o FP module	0.0748 M	3.6 M	80.30 ± 0.29	80.32 ± 0.31	80.61 ± 0.35	80.30 ± 0.29
w/o HSF branch	0.1079 M	4.8 M	79.99 ± 0.44	79.93 ± 0.50	80.22 ± 0.70	79.99 ± 0.44
Proposed	0.1102 M	4.9 M	80.75 ± 0.34	80.74 ± 0.32	80.91 ± 0.31	80.75 ± 0.34
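The contribution of each component can be read from Table 4 as the accuracy lost when that component is removed. The short calculation below uses only the mean accuracies in the table; the variable names are illustrative.

```python
full_accuracy = 80.75  # "Proposed" row of Table 4, accuracy in %
ablated = {
    "w/o DP block": 68.70,
    "w/o SP module": 78.34,
    "w/o FP module": 80.30,
    "w/o HSF branch": 79.99,
}

# Accuracy drop, in percentage points, relative to the full model.
drops = {name: round(full_accuracy - acc, 2) for name, acc in ablated.items()}

for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name}: -{drop:.2f} points")
```

Removing the whole DP block costs about 12 points and removing the SP module about 2.4 points, while the HSF branch and the FP module account for roughly 0.8 and 0.45 points. The last two differences are of the same order as the reported standard deviations (0.29 to 0.44), so they are best read as modest gains.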

Table 5. The ablation experiments on cumulatively adding modules on the simulated dataset.

Method	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
add SP module	0.0726 M	3.5 M	78.96 ± 0.87	78.95 ± 0.86	79.11 ± 0.87	79.17 ± 0.77
add FP module	0.0825 M	3.6 M	78.28 ± 0.70	78.30 ± 0.75	78.49 ± 0.82	78.68 ± 0.84

Table 6. The ablation experiments on the channel number.

Channel Number	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
12	0.2374 M	10.5 M	81.89 ± 0.61	82.10 ± 1.02	81.75 ± 0.78	81.89 ± 0.61
16	0.4136 M	18.21 M	81.14 ± 0.34	81.14 ± 0.46	81.03 ± 0.63	81.14 ± 0.34
20	0.6388 M	28.0 M	81.33 ± 0.94	80.87 ± 0.93	81.02 ± 1.21	81.33 ± 0.94

Table 7. The ablation experiments on the hidden dimension $C_{h}$.

$C_{h}$	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
8	0.1075 M	4.8 M	80.10 ± 0.28	80.11 ± 0.27	80.45 ± 0.38	80.10 ± 0.28
24	0.1135 M	5.1 M	80.67 ± 0.23	80.61 ± 0.24	80.86 ± 0.25	80.67 ± 0.23
32	0.1178 M	5.3 M	80.64 ± 0.31	80.58 ± 0.30	80.74 ± 0.31	80.64 ± 0.31

Table 8. The ablation experiments on replacing the DP block with the SE and CBAM modules on the simulated dataset.

Method	Params	FLOPs	Accuracy (%)	$F_{1}$ Score (%)	Precision (%)	Recall (%)
add SE module	0.0438 M	1.8 M	78.80 ± 0.49	78.79 ± 0.82	78.74 ± 1.30	78.80 ± 0.49
add CBAM module	0.0439 M	1.8 M	78.34 ± 0.80	78.30 ± 0.76	78.52 ± 0.97	78.34 ± 0.80

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
