1. Introduction
Radar technology has consistently played a pivotal role in both military and civilian domains. With the rapid development of modern radar technology, radar imaging has advanced well beyond traditional detection limits. High-resolution techniques, primarily achieved through wideband signal transmission and pulse compression, enable the acquisition of detailed target signatures such as the high-resolution range profile (HRRP), making fine-grained target recognition by radar possible. Consequently, radar automatic target recognition (RATR) has become an important task in radar applications. The core principle of this technology is to extract and analyze features from radar measurements using advanced signal processing algorithms and pattern recognition methods to achieve automatic target classification and identification. HRRP, derived from wideband radar systems, represents the vector sum of target scattering echoes projected along the radar’s radial direction, containing crucial discriminative information such as target shape, structural dimensions, and scattering center distributions [1]. HRRP data offer unique advantages owing to their ease of acquisition, convenient processing, efficient storage, and rich structural information [2]. These characteristics make HRRP an important technical foundation for RATR, offering reliable solutions for efficient and precise target identification.
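To make the "vector sum of scattering echoes projected along the radial direction" concrete, the following toy sketch forms a range profile as the coherent sum of point-scatterer returns binned by range. All numbers (scatterer positions, amplitudes, cell size, wavelength) are hypothetical illustrations, not from any cited system.

```python
import numpy as np

def synthesize_hrrp(ranges_m, amplitudes, n_cells=64, cell_size_m=0.5, wavelength_m=0.03):
    """Toy HRRP: coherent sum of point-scatterer echoes, binned into range cells.

    Each scatterer contributes a complex return exp(-j*4*pi*R/lambda) scaled by
    its amplitude; the profile is the magnitude of the per-cell complex sums.
    """
    profile = np.zeros(n_cells, dtype=complex)
    for R, a in zip(ranges_m, amplitudes):
        cell = int(R / cell_size_m)           # project onto the radial range axis
        if 0 <= cell < n_cells:
            profile[cell] += a * np.exp(-1j * 4 * np.pi * R / wavelength_m)
    return np.abs(profile)

# Three hypothetical scattering centers along the radar line of sight
hrrp = synthesize_hrrp(ranges_m=[5.0, 12.3, 20.1], amplitudes=[1.0, 0.6, 0.9])
```

The resulting 64-cell profile peaks at the range cells occupied by the scattering centers, which is exactly the structural information (shape, dimensions, scatterer distribution) that recognition methods exploit.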
From a methodological perspective, HRRP-based RATR methods can primarily be categorized into three groups: (1) feature extraction-based methods that identify robust features from HRRP signals and leverage designed classifiers for target differentiation, (2) statistical model-based methods that construct probability distribution models using training data and classify target categories by calculating the likelihood probabilities of test samples, and (3) deep learning-based methods that enable end-to-end recognition through neural networks for autonomous feature extraction [3].
In feature extraction-based methods, recognition performance is critically influenced by the effectiveness and accuracy of feature selection. The core challenge lies in constructing feature spaces with translation invariance and intrinsic target separability, the design of which relies heavily on empirical expertise. Existing research has achieved preliminary progress: studies [4,5,6] demonstrated feasibility in target recognition tasks by selecting physically meaningful features with strong representational capabilities from HRRP data. Meanwhile, through the integration of signal processing and pattern recognition theories, these methods have evolved into multidimensional frameworks incorporating approaches such as time-frequency analysis [7,8], sparse representation [9,10], and nonlinear transformations [11,12]. Although such methods exhibit theoretical advantages in characterizing target scattering properties, significant technical limitations persist. Crucially, they generalize poorly in multi-target recognition contexts, being constrained by category-specific feature engineering and by vulnerability to long-tailed distributions, in which dominant majority classes and severely scarce minority classes induce biased learning.
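One classical route to the translation invariance mentioned above is to work with the magnitude of the discrete Fourier spectrum, which is unchanged under circular shifts of the profile. A minimal sketch on synthetic data (this illustrates the general principle, not any specific cited method):

```python
import numpy as np

def shift_invariant_features(hrrp):
    """Magnitude spectrum of an HRRP: invariant to circular range shifts,
    because a shift only adds a linear phase ramp to the DFT."""
    return np.abs(np.fft.fft(hrrp))

rng = np.random.default_rng(0)
x = rng.random(64)                    # a synthetic range profile
x_shifted = np.roll(x, 7)             # simulated range misalignment
f1 = shift_invariant_features(x)
f2 = shift_invariant_features(x_shifted)
# f1 and f2 agree to machine precision despite the 7-cell shift
```

This is why spectrum-magnitude-style features remove the range-alignment sensitivity that raw HRRP amplitudes suffer from.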
Statistical model-based HRRP target recognition methods characterize the distribution patterns of target scattering properties through probabilistic modeling. The core principle involves treating radar echoes as stochastic processes and establishing probabilistic mapping relationships between target types and echo data via statistical inference. Representative statistical models include adaptive Gaussian classifiers, Gamma mixture models, Gamma–Gaussian mixture models, and factor analysis (FA) models. Du et al. systematically developed statistical modeling approaches to address recognition robustness in noisy environments [13,14,15]. Notably, since the introduction of FA models into HRRP statistical recognition in 2008, significant research advances have been achieved in this domain [16,17,18]. These methods, however, fundamentally depend on prior distribution assumptions. This intrinsic constraint creates a theoretical mismatch with HRRP’s inherent non-Gaussian characteristics and multimodality, ultimately limiting the characterization of complex scattering patterns.
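The likelihood-based decision rule underlying these methods can be illustrated with the simplest possible instance: fit one diagonal Gaussian per class and assign a test sample to the class with the highest log-likelihood. This is a deliberate simplification of the mixture and FA models cited above, on synthetic data:

```python
import numpy as np

def fit_diag_gaussians(X_by_class):
    """Per-class maximum-likelihood mean/variance (diagonal Gaussian per class)."""
    return [(X.mean(axis=0), X.var(axis=0) + 1e-6) for X in X_by_class]

def log_likelihood(x, mean, var):
    """Log-density of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x, params):
    """Assign the class whose fitted model gives the highest likelihood."""
    return int(np.argmax([log_likelihood(x, m, v) for m, v in params]))

rng = np.random.default_rng(1)
class0 = rng.normal(0.0, 1.0, size=(200, 8))   # synthetic "class 0" profiles
class1 = rng.normal(3.0, 1.0, size=(200, 8))   # synthetic "class 1" profiles
params = fit_diag_gaussians([class0, class1])
```

The mismatch discussed above appears precisely here: if the true echo distribution is non-Gaussian or multimodal, no choice of `mean` and `var` makes this likelihood faithful.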
Deep learning approaches leverage HRRP sequences’ inherent spatiotemporal characteristics, where temporal dependencies from azimuth variations and spatial correlations across range cells provide complementary discriminative information. Initial efforts focused on temporal modeling through recurrent architectures, achieving notable success in aircraft target recognition [2,19,20]. While existing studies have achieved progress in target recognition through temporal feature modeling with recurrent neural networks, their frameworks exhibit notable limitations by neglecting spatial correlation characteristics within HRRP sequences. Consequently, the effective integration of spatiotemporal joint features in HRRP has emerged as a critical research focus in this field. Wan et al. [21] developed a hybrid recognition framework integrating convolutional neural networks (CNNs) with bidirectional recurrent neural networks (BiRNNs), where CNN modules extract spatial correlation features from HRRP data while BiRNN modules capture temporal dependencies across range cells. Wang et al. [22] combined CNNs with Bidirectional Encoder Representations from Transformers (BERT), employing convolutional modules to characterize local spatial structures and utilizing BERT’s multi-head attention mechanisms to extract temporal features from HRRP sequences. Wu et al. [23] advanced this line of work by constructing a fusion network that first pre-extracts features with BERT, then employs multi-scale CNNs and bidirectional gated recurrent networks to capture local characteristics and long-range dependencies, respectively, and finally concatenates the features to combine the strengths of the different networks. Establishing effective spatiotemporal feature extraction mechanisms that achieve complementary fusion of HRRP’s spatiotemporal characteristics, and extending them to multi-target recognition scenarios, remains a highly valuable research frontier.
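The hybrid designs above share one structural pattern: a per-frame spatial extractor followed by a temporal recurrence over the sequence. The numpy sketch below shows that pattern in its barest form, with random weights, a plain 1D convolution and a simple Elman-style recurrent cell; the sizes and weights are hypothetical and this is not any of the cited architectures.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1d_valid(x, kernel):
    """'Valid' 1D convolution over range cells (spatial feature extraction)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def elman_step(h, x, W_x, W_h):
    """One step of a simple recurrent cell (temporal dependency modeling)."""
    return np.tanh(W_x @ x + W_h @ h)

T, R, H = 10, 64, 16                      # frames, range cells, hidden size
seq = rng.random((T, R))                  # a synthetic HRRP sequence
kernel = rng.standard_normal(5)
feat_dim = R - len(kernel) + 1            # 60 spatial features per frame
W_x = rng.standard_normal((H, feat_dim)) * 0.05
W_h = rng.standard_normal((H, H)) * 0.05

h = np.zeros(H)
for t in range(T):
    spatial = conv1d_valid(seq[t], kernel)   # spatial features of frame t
    h = elman_step(h, spatial, W_x, W_h)     # fold them into the temporal state
# h is the fused spatiotemporal representation a classifier head would consume
```

The limitation criticized above corresponds to dropping one of the two stages: recurrent-only models skip `conv1d_valid`, convolutional-only models skip `elman_step`.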
With the rapid advancement of deep learning and growing academic interest in Transformer methodologies, recent years have witnessed increasing exploration of their applications in HRRP recognition. Zhang et al. [24] developed a feature-guided Transformer model that integrates manually designed features into attention modules to focus on HRRP range cells with rich scattering information, significantly improving recognition accuracy under small-sample conditions. Diao et al. [25] proposed a positional embedding-free CNN–Transformer hybrid architecture optimized for HRRP data characteristics, effectively eliminating the need for traditional positional encoding. Wang et al. [26] employed dual-branch Transformer encoders to separately extract temporal and spatial features, with designed attention fusion mechanisms achieving adaptive feature weighting. Gao et al. [27] introduced polarization preprocessing modules that combine artificial features with CNNs to enhance feature representation, constructing a Vision Transformer-based framework that substantially improves local and temporal feature extraction. Although these studies improve performance through module stacking, they commonly incur overfitting risks caused by excessive parameter counts, and their limited attention to model lightweighting hampers practical deployment on edge devices.
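All of the Transformer variants above rest on scaled dot-product self-attention. A minimal single-head sketch over a short sequence of frame embeddings (random projections, hypothetical sizes) makes the mechanism, and its quadratic-in-sequence-length cost, explicit:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X (T x d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])              # T x T similarity matrix
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(3)
T, d, d_k = 8, 32, 16
X = rng.random((T, d))                                  # 8 frame embeddings
W_q, W_k, W_v = (rng.standard_normal((d, d_k)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
```

The T x T `attn` matrix is where parameter and compute costs accumulate when such blocks are stacked, which is the overfitting and deployment concern raised above.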
Current HRRP radar target recognition research focuses on two typical data environments: (1) recognition under class-balanced conditions, primarily addressing cooperative target identification. Here, controllable detection conditions allow researchers to acquire abundant high-quality samples, facilitating the adoption of complex models such as deep neural networks, whose powerful feature extraction capabilities enable high-precision recognition. (2) Recognition under data-limited and class-imbalanced conditions, mainly targeting non-cooperative target identification. Owing to factors such as long detection ranges and complex electromagnetic environments, effective sample acquisition for such targets is challenging, and the resulting datasets exhibit typical long-tailed distribution characteristics. Notably, in practical applications such as national defense and aerospace, the identification of non-cooperative targets with high observational value is the more urgent demand. These operational environments pose three core challenges: first, target echoes are susceptible to noise interference, leading to significantly reduced signal-to-noise ratios; second, there is a pronounced contradiction between target category diversity and limited training samples; third, measured data inherently exhibit long-tailed distribution properties. Establishing effective recognition models under multi-category imbalanced data conditions has therefore emerged as a critical scientific challenge in advancing the practical application of radar automatic recognition technology.
Researchers are actively exploring diverse technical approaches to address class imbalance in HRRP target recognition. Yin et al. [28] proposed an Adaptive Uniform Manifold Approximation and Projection (AUMAP) segmentation algorithm, which mitigates data imbalance by replacing the CNN loss function with a focal loss formulation. Jia et al. [29] developed a memory-based neural network (MBNN) for imbalanced data conditions: CNN-extracted features are processed through a memory module that records misclassified and low-confidence samples, followed by long short-term memory (LSTM) based fusion of classified samples with buffer-stored similar samples for final decision-making. Zhang et al. [30] introduced an open-set imbalanced recognition network integrating dual-attention mechanisms, memory modules, loss functions, and decoupled training strategies, optimizing intra-class and inter-class similarity constraints via an angular penalty loss. Guo et al. [31] established a transfer learning framework that pretrains models on source domain data and then resets the fully-connected and output layer parameters, and proposed a novel loss function to suppress inter-class bias on measured HRRP datasets with limited samples and class imbalance. Wu et al. [32] created a weighted synthetic minority oversampling technique (SMOTE) algorithm that dynamically allocates synthetic weights based on Euclidean distances among minority samples, combined with a multi-scale CNN–Transformer encoder attention model to enhance multi-level feature classification accuracy. Tian et al. [33] designed a gradient-guided class re-balancing loss (GRB Loss) for space micro-motion targets, which dynamically assigns weights according to accumulated positive–negative gradient ratios at classification nodes, ensuring recognition robustness across varying imbalance ratios.
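The SMOTE-style oversampling used by Wu et al. reduces to one operation: interpolate between a minority sample and one of its nearest minority-class neighbors. Below is a minimal unweighted sketch on synthetic data (the cited method additionally weights the synthesis by Euclidean distances; that refinement is omitted here):

```python
import numpy as np

def smote_samples(X_min, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating toward k-NN neighbors."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                 # pick a minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                  # its k nearest neighbors (skip self)
        j = rng.choice(nn)
        lam = rng.random()                           # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(4)
minority = rng.random((12, 8))                       # scarce minority-class profiles
synthetic = smote_samples(minority, n_new=20)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled set stays inside the minority class's observed region rather than duplicating samples verbatim.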
As a core technology of RATR, HRRP sequence-based target recognition plays a critical role in determining battlefield perception capabilities within complex environments. Nevertheless, existing methodologies exhibit significant technical limitations in feature synergy, computational efficiency, and scenario generalization. First, inadequate collaborative modeling of spatiotemporal features restricts cross-dataset adaptability. While HRRP sequences inherently contain temporal pose evolution characteristics and spatial scattering structural information, current approaches predominantly focus on temporal feature extraction via recurrent neural networks or on spatial correlation mining through convolutional architectures, resulting in inadequate joint representation of spatiotemporal features. This deficiency leads to drastic performance degradation in complex battlefield environments involving variant targets or low signal-to-noise conditions, revealing fundamental weaknesses in generalization robustness. Second, structural redundancy in models elevates overfitting risks. Contemporary mainstream approaches predominantly adopt complex module stacking strategies, such as deep Transformer encoders and multi-branch convolutional architectures, to enhance recognition performance. This architectural complexity causes parameter volumes and computational demands to escalate sharply. The resulting contradiction between the limited computing power of embedded radar devices and resource-intensive model requirements not only hinders deployment feasibility but also exacerbates overfitting under insufficient training data, severely compromising practical utility. Finally, biased data distributions exacerbate generalization deterioration. Current research predominantly concentrates on class-balanced datasets, with insufficient investigation into multi-class long-tailed distribution challenges. Traditional cross-entropy loss functions, which apply uniform weighting to all samples, induce excessive model adaptation to majority classes while neglecting discriminative feature learning for high-value tail categories. This imbalance ultimately triggers catastrophic performance degradation in critical minority-class recognition.
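The uniform weighting of cross-entropy, and the standard remedy of down-weighting well-classified samples, can be shown in a few lines. The binary focal loss below multiplies cross-entropy by a modulating factor (1 - p)^gamma, so confident (typically majority-class) samples contribute far less; the probabilities are hypothetical examples:

```python
import numpy as np

def cross_entropy(p_true):
    """Standard cross-entropy on the probability assigned to the true class:
    every sample is weighted uniformly."""
    return -np.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Focal loss: (1 - p)^gamma modulates cross-entropy, so well-classified
    samples are strongly down-weighted while hard samples keep their loss."""
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

easy = 0.95   # a well-classified (majority-class) sample
hard = 0.10   # a misclassified (minority-class) sample
```

With gamma = 2, the easy sample's loss is suppressed by a factor of (0.05)^2 = 0.0025 while the hard sample keeps 81% of its cross-entropy, which is exactly the rebalancing effect discussed above; gamma = 0 recovers plain cross-entropy.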
To address the above technical challenges in HRRP sequence target recognition, this paper proposes a lightweight spatiotemporal fusion-based (LSTF) HRRP sequence target recognition method. The main innovations are summarized as follows:
1. LSTF-based HRRP sequence target recognition framework: We propose an overall LSTF framework that jointly models the temporal and spatial characteristics of HRRP sequences and fuses the two feature streams at the decision level for target recognition.
2. Lightweight Transformer encoder for temporal modeling: We design a lightweight TGLT-based Transformer encoder that captures temporal dependencies across HRRP frames while keeping the parameter count and computational cost low enough for deployment on resource-constrained platforms.
3. Transform-domain spatial feature extraction network: We develop a transform-domain spatial feature extraction network that integrates the fractional Fourier transform (FrFT) with an enhanced squeeze-and-excitation fully convolutional network (FSCN). By exploiting multi-domain spatial representations, the proposed network enhances the discriminability of scattering energy distributions across target classes at specific fractional orders, leading to improved classification performance.
4. Adaptive loss for long-tail recognition: We propose the AFL-LS loss function, which adapts sample weighting to long-tailed class distributions, thereby strengthening discriminative learning for scarce but high-value minority classes.
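The squeeze-and-excitation recalibration that the FSCN module builds on can be summarized in a few lines: global-average-pool each channel, pass the pooled vector through a small bottleneck MLP, and rescale the channels by the resulting sigmoid gates. The sketch below uses random weights and hypothetical sizes, and illustrates the generic SE block rather than the paper's exact FSCN design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(F, W1, W2):
    """Squeeze-and-excitation on a 1D feature map F (channels x length)."""
    s = F.mean(axis=1)                           # squeeze: global average pool per channel
    g = sigmoid(W2 @ np.maximum(W1 @ s, 0.0))    # excitation: ReLU bottleneck + sigmoid
    return F * g[:, None], g                     # rescale each channel by its gate

rng = np.random.default_rng(5)
C, L, r = 16, 64, 4                              # channels, range-cell length, reduction ratio
F = rng.random((C, L))                           # convolutional features over range cells
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
recal, gates = se_block(F, W1, W2)
```

The gates lie strictly in (0, 1), so the block can only attenuate or preserve channels; in the FSCN setting this lets the network emphasize the fractional-order representations whose scattering energy distributions best separate the target classes.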
The remainder of this paper is organized as follows. Section 2 details the proposed LSTF network architecture, including the overall framework, the TGLT-based temporal feature encoder module, the transform-domain spatial feature fully convolutional module, and the decision fusion and recognition module. Section 3 introduces the proposed AFL-LS function, designed to address class imbalance. Section 4 presents the experimental results and analysis, including dataset descriptions, recognition performance comparisons, ablation studies, and feature visualizations. Finally, Section 5 concludes the paper.
Notably, the proposed method also exhibits strong potential for extension to target detection tasks in high-frequency surface wave radar (HFSWR) systems, particularly in high-resolution maritime surveillance scenarios. For ship detection and discrimination from sea clutter on range–Doppler maps, the lightweight TGLT-based encoder can effectively capture the temporal dynamics of ship echoes, while the transform-domain FSCN module enhances feature separability by exploiting scattering energy distributions in the fractional Fourier domain. Moreover, the AFL-LS loss mitigates the severe imbalance between sparse ship targets and dense sea clutter, thereby improving detection robustness. Benefiting from its lightweight architecture and low computational overhead, the proposed framework is well aligned with the real-time processing requirements of operational HFSWR systems. These characteristics indicate that the proposed method is not limited to HRRP sequence recognition but also holds broader practical value for cross-domain radar target analysis.