AI for Wireless Waveform Recognition: A Survey from a Component Perspective

Zhao, Decan; Yang, Junteng; Zhao, Dongwei; Zhang, Lechi; Xu, Zhenyu; Cao, Anjie; Lin, Wensheng; Cheng, Wenchi; Du, Qinghe; Li, Lixin

doi:10.3390/electronics15102112

Open Access Review


by Decan Zhao ¹, Junteng Yang ², Dongwei Zhao ¹,³, Lechi Zhang ¹,⁴, Zhenyu Xu ⁵, Anjie Cao ⁵, Wensheng Lin ¹,⁶,*, Wenchi Cheng ⁷, Qinghe Du ⁸ and Lixin Li ¹,⁶,*

¹ School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
² Key Laboratory of Radio Spectrum Testing Technology, The State Radio_Monitoring_Center Testing Center, Ministry of Industry and Information Technology, Beijing 100037, China
³ No. 208 Research Institute of China Ordnance Industries, Beijing 102202, China
⁴ Xi’an Hengxiang Control Technology Co., Ltd., Xi’an 710071, China
⁵ Shanghai Institute of Satellite Engineering, Shanghai 201109, China
⁶ DecoreX Intelligent Technologies Co., Ltd., Xi’an 710075, China
⁷ State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
⁸ School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Authors to whom correspondence should be addressed.

Electronics 2026, 15(10), 2112; https://doi.org/10.3390/electronics15102112

Submission received: 27 March 2026 / Revised: 24 April 2026 / Accepted: 8 May 2026 / Published: 14 May 2026

(This article belongs to the Special Issue Innovations in Radio Frequency Technologies, Wireless Communication, and Signal Processing)


Abstract

Electromagnetic signal waveform recognition (ESWR) constitutes a fundamental enabling technology for modern spectrum management, cognitive radio, and electronic warfare applications. Among various ESWR subtasks, automatic modulation recognition (AMR) has attracted the most intensive research efforts and serves as the primary focus of this survey. Over the past decade, deep learning (DL) has fundamentally transformed ESWR by replacing hand-crafted feature engineering with data-driven end-to-end learning paradigms. However, the rapid proliferation of DL-based approaches has resulted in a fragmented research landscape. This paper addresses this gap by proposing a unified system-component framework that decomposes any DL-ESWR system into four foundational modules: (i) dataset construction and data augmentation, (ii) signal representation and preprocessing, (iii) core network architecture, and (iv) training and optimization strategy. Through this systematic lens, we provide a comprehensive review that catalogs the state of the art across recent publications and precisely attributes each innovation to specific modules within our framework. Furthermore, we identify eight core challenges confronting the practical deployment of DL-ESWR systems and systematically analyze how targeted modular innovations address each challenge. A critical analysis of prevalent benchmark datasets reveals significant limitations in channel diversity, modulation coverage, and ecological validity. Finally, we outline seven promising future research directions, including foundation models for wireless signals, physics-informed neural networks, and waveform recognition for emerging communication paradigms, such as semantic communications and integrated sensing and communication (ISAC). This survey aims to provide researchers and practitioners with a structured roadmap for understanding, evaluating, and advancing the field of AI-enabled electromagnetic signal waveform recognition.

Keywords: automatic modulation recognition; deep learning; electromagnetic signal waveform recognition; convolutional neural network; transformer; signal representation; adversarial robustness; open-set recognition; few-shot learning; cognitive radio

1. Introduction

1.1. Background and Significance of ESWR

Electromagnetic signal waveform recognition (ESWR) refers to the process of automatically identifying the intrinsic attributes and modulation patterns embedded within intercepted electromagnetic waves. As wireless communication systems evolve toward the sixth generation (6G) era, the electromagnetic spectrum has become an increasingly contested resource. ESWR serves as a cornerstone technology underpinning a broad spectrum of applications across both civilian and military domains, as illustrated in Figure 1.

In the civilian sector, ESWR enables dynamic spectrum access (DSA) for cognitive radio (CR) networks, where secondary users must identify the modulation schemes of primary users to opportunistically access idle spectrum bands. With the Internet of Things (IoT) paradigm, reliable modulation identification is essential for spectrum management and interference mitigation. Furthermore, integrated sensing and communication (ISAC) in 6G systems demands waveform recognition capabilities extending beyond conventional communication signals to encompass radar waveforms and composite multi-function signals. The authors of [1] proposed a CNN-driven radio frequency identification framework to enable accurate UAV recognition from raw RF signals. The proposed framework exhibits strong robustness under varying UAV types and different SNR conditions, highlighting its effectiveness for practical UAV detection scenarios.

In the military and defense domain, ESWR constitutes a critical component of electronic warfare (EW), electronic intelligence (ELINT), and signal intelligence (SIGINT) systems. The ability to rapidly identify the modulation types of intercepted adversary signals directly impacts threat assessment and the formulation of effective electronic countermeasures (ECM). Modern multi-function radars employ agile waveforms with complex intra-pulse modulation patterns, presenting formidable recognition challenges.

Within the broad umbrella of ESWR, this survey focuses primarily on automatic modulation recognition (AMR) of communication signals while also encompassing radar signal intra-pulse modulation recognition and jamming signal identification as closely related subtasks.

1.2. Limitations of Traditional ESWR Approaches

Traditional ESWR approaches fall into two paradigms: likelihood-based (LB) methods and feature-based (FB) methods. LB methods [2] formulate modulation classification as a hypothesis testing problem, employing the average likelihood ratio test (ALRT), generalized likelihood ratio test (GLRT), or hybrid likelihood ratio test (HLRT) [3,4,5] to determine the most probable modulation type.

The fundamental signal model considers a baseband received signal

y (t) = \int h (τ, t) \cdot x (t - τ) \cdot e^{j (2 π Δ f t + θ)} d τ + w (t)

(1)

where y (t) is the received signal, x (t) is the transmitted signal, h (τ, t) is the time-varying channel impulse response, Δ f and θ represent frequency and phase offsets, and w (t) is additive noise. The AMR problem is formulated as a multiple composite hypothesis test over M candidate modulation types:

H_{k} : r (n) = h (n) * s_{k} (n; θ_{k}) + w (n), k = 1, 2, \dots, M

(2)

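Here, r (n), h (n), and w (n) are the sampled received signal, channel response, and noise, s_{k} (n; θ_{k}) is the transmitted sequence under the k-th modulation hypothesis with unknown parameters θ_{k}, and * denotes convolution. Under the ALRT framework, the unknown parameters are treated as random variables with a known prior, and the decision rule selects the hypothesis that maximizes the log-likelihood averaged over those parameters: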

L (y | H_{i}) = ln \int_{Θ} p (y | H_{i}, Θ) \cdot p (Θ) d Θ

(3)

where Θ is the unknown parameter vector. The GLRT substitutes maximum likelihood estimation when the prior p (Θ) is unavailable:

{\hat{k}}_{GLRT} = arg max_{i} max_{Θ} ln p (y | H_{i}, Θ)

(4)

While theoretically elegant, these methods suffer from prohibitive computational complexity and require accurate knowledge of channel parameters, exhibiting severe degradation when assumed models deviate from actual conditions.
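To make the averaged-likelihood rule of Equation (3) concrete, the following minimal sketch treats the simplest possible case: an ideal AWGN channel, a known noise variance, and transmitted symbols as the only unknown parameter, averaged under a uniform prior over each candidate constellation. The candidate set, the function names, and the 10 dB operating point are illustrative assumptions rather than part of any surveyed method.

```python
import numpy as np

def constellations():
    """Unit-average-power candidate constellations (illustrative set)."""
    bpsk = np.array([1.0, -1.0], dtype=complex)
    qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    qam16 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)
    return {"BPSK": bpsk, "QPSK": qpsk, "16QAM": qam16}

def avg_log_likelihood(y, points, noise_var):
    """ln p(y | H_i) with symbols averaged over a uniform prior on `points`."""
    # Squared distance from every sample to every constellation point.
    d2 = np.abs(y[:, None] - points[None, :]) ** 2
    expo = -d2 / noise_var
    # Log-mean-exp over the constellation, stabilized by the row maximum.
    m = expo.max(axis=1, keepdims=True)
    per_sample = m[:, 0] + np.log(np.mean(np.exp(expo - m), axis=1))
    # The complex-Gaussian normalization is common to all hypotheses.
    return float(np.sum(per_sample - np.log(np.pi * noise_var)))

def alrt_classify(y, noise_var):
    """Return the hypothesis with the largest averaged log-likelihood."""
    scores = {name: avg_log_likelihood(y, pts, noise_var)
              for name, pts in constellations().items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
snr_db = 10.0
noise_var = 10 ** (-snr_db / 10)  # signal power is normalized to 1
tx = rng.choice(constellations()["QPSK"], size=512)
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(512)
                                  + 1j * rng.standard_normal(512))
decision, scores = alrt_classify(tx + noise, noise_var)
print(decision)
```

Even in this idealized setting, the cost grows with the number of samples times the total number of constellation points. Adding unknown phase, frequency offset, or channel taps would require a further integral (ALRT) or inner maximization (GLRT) per hypothesis, which is the source of the complexity noted above.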

FB methods extract discriminative statistical features—such as higher-order cumulants (HOCs), cyclostationary features [6,7], and instantaneous statistics—and employ classifiers including SVMs or ANNs. While computationally simpler, FB methods heavily rely on expert domain knowledge for feature engineering.

LB methods, despite practical limitations, establish asymptotically optimal performance bounds for modulation classification. These bounds serve as benchmarks against which any AMR algorithm should be evaluated. However, as discussed in Section 3.1, the DL-ESWR community has largely neglected this comparative dimension.

Summary of Limitations. We consolidate the limitations of the LB and FB paradigms into five categories, which jointly motivate the transition to data-driven methods: (i) Model-dependence: Both LB and FB methods require explicit channel, noise, and signal models, and any deviation—multipath variations, non-Gaussian noise, or unmodeled hardware impairments—causes severe performance degradation. (ii) Computational intractability: The ALRT involves high-dimensional integration over unknown parameter vectors, often computationally prohibitive for real-time operation with tens of modulation classes. (iii) Parameter-knowledge requirement: LB methods demand accurate prior knowledge of carrier frequency offset, phase offset, symbol rate, and pulse-shaping filter, which are rarely available for non-cooperative signals in electronic warfare or spectrum surveillance. (iv) Feature-engineering bottleneck: FB methods rely on expert-designed features whose discovery for emerging modulation schemes (e.g., generative-model-produced waveforms) requires years of domain expertise. (v) Limited scalability: Both paradigms scale poorly to large modulation sets and to composite multi-function waveforms (e.g., ISAC signals), where the number of required features and hypothesis tests grows combinatorially. Despite these limitations, LB methods’ asymptotic optimality means they remain important as theoretical benchmarks, a role that is often overlooked in the DL-ESWR literature. The deep learning paradigm reviewed next in Section 1.3 directly addresses limitations (i)–(v) by jointly learning features and decision boundaries from data.

1.3. The Deep Learning Paradigm Shift

Deep learning (DL) has fundamentally revolutionized ESWR by introducing a data-driven end-to-end learning paradigm. DL-based methods automatically learn hierarchical feature representations directly from raw signal data, eliminating manual feature engineering while achieving superior performance [8,9]. The seminal work by O’Shea et al. [10,11] first demonstrated the application of CNNs to raw I/Q samples for modulation classification, catalyzing an explosion of research interest.

The advantages of DL-based ESWR include: (1) automatic discovery of complex nonlinear feature representations capturing subtle discriminative patterns at low SNR; (2) joint optimization of feature extraction and classification; and (3) remarkable flexibility through transfer learning, domain adaptation, and incremental learning strategies.

1.4. Related Work and Research Gaps

The rapid growth of DL-ESWR has motivated a number of survey papers, which can be categorized along three complementary dimensions.

Surveys organized by signal type. Several surveys focus on a specific class of signals. Geng et al. [12] provide a comprehensive overview of deep learning applications to radar signal processing. Huynh-The et al. [13] survey deep architecture designs specifically for communication AMR. While offering depth within their chosen scope, these signal-type surveys cannot expose the shared methodological patterns across signal types and typically lack a unifying analytical lens.

Surveys organized by algorithm family. Wang et al. [14] organize the literature by neural network family (CNNs, RNNs, transformers, and GNNs). Useful as an algorithmic taxonomy, this organization conflates architecture-level innovations with orthogonal innovations in data, representation, and training, blurring the sources of performance gains.

Surveys organized by specific research theme. A third group targets a specific theme or module. Peng et al. [15] focus on signal representation and data preprocessing (i.e., primarily Module II in our framework). Wang et al. [16] survey deep transfer-learning methods for automatic modulation classification (primarily Module IV). Such theme-centric surveys illuminate a focused slice but do not provide a system-level view connecting all four modules.

Research gap. Three recurrent limitations are observable across all three categories: (i) no formal system-component decomposition; (ii) no innovation attribution; (iii) no bidirectional challenge↔module mapping.

The present survey. Our work addresses these three gaps: we propose an explicit four-module decomposition; we attribute each surveyed publication to specific module(s), enabling the quantitative observation that Modules III and IV absorb over 70% of innovations while Modules I and II remain underexplored; and we establish a formal challenge–solution mapping that makes deployment-to-research connections explicit.

1.5. Contributions and Organization

This paper makes the following key contributions:

(1) Unified System-Component Framework: We propose a unified analytical framework decomposing any DL-ESWR system into four foundational modules—dataset and data augmentation, signal representation and preprocessing, core network architecture, and training and optimization strategy.

(2) Innovation Attribution and Taxonomy: We attribute innovations in recent publications to specific modules within our framework, revealing that most advances arise from targeted modifications to one or two modules.

(3) Challenge–Solution Mapping: We identify eight core challenges confronting practical DL-ESWR deployment and construct a detailed mapping between challenges and modular innovations.

(4) Critical Dataset Analysis: We critically assess prevalent benchmark datasets, exposing limitations in channel diversity, modulation coverage, and annotation quality.

(5) Forward-Looking Research Roadmap: We delineate seven future research directions, including foundation models, physics-informed neural networks, and waveform recognition for emerging paradigms.

The organization of this paper is shown in Figure 2. Section 2 introduces our framework. Section 3 presents the eight core challenges. Section 4 outlines future directions. Section 5 concludes the paper.

2. A Unified Framework for DL-ESWR and Its Core Modules

This section establishes a universal system-component framework for analyzing DL-ESWR works, as depicted in Figure 3. Any DL-ESWR system can be decomposed into a sequential pipeline:

Raw Signal → [Module I: Dataset and Data Augmentation] → [Module II: Signal Representation and Preprocessing] → [Module III: Core Network Architecture] → [Module IV: Training and Optimization Strategy] → Recognition Result

Figure 3. The proposed unified system-component framework for DL-ESWR. Any DL-ESWR system can be decomposed into four foundational modules: Module I (Dataset and Data Augmentation), Module II (Signal Representation and Preprocessing), Module III (Core Network Architecture), and Module IV (Training and Optimization Strategy). The data flow from raw intercepted signals to final recognition results passes sequentially through these modules.

Each module encapsulates a distinct functional responsibility. Module I governs dataset construction and data augmentation. Module II transforms raw signals into representations that are suitable for neural network processing. Module III encompasses the core neural network architecture. Module IV defines the training paradigm, loss functions, and optimization strategies.

2.1. Module I: Dataset Construction and Data Augmentation

2.1.1. Benchmark Datasets for ESWR

We selected the benchmark datasets reviewed in this section according to four criteria, each tied to deployment relevance. (i) Public availability and licence permissiveness: Only openly released datasets can be independently reproduced and compared across research groups. (ii) Modulation diversity: The dataset should cover both legacy schemes (BPSK, QPSK, QAM, and FM/AM) and modern waveforms (OFDM, LoRa, APSK, and 5G NR) so that classifiers generalize across the heterogeneous modulation zoo rather than overfitting to a narrow subset. (iii) SNR-range coverage: The dataset should span as wide an SNR range as possible (ideally -20 to +30 dB) because deployment SNR varies sharply. Urban multipath and distant emitters push operational SNR below 0 dB, while cooperative short-range links exceed +20 dB; narrow-SNR training leaves blind spots at either extreme. (iv) Channel-model realism: The dataset should extend beyond AWGN to include frequency-selective multipath fading (Rayleigh/Rician), Doppler spread, co-channel interference, and RF-chain non-idealities (CFO, phase noise, and I/Q imbalance); classifiers trained only on AWGN systematically fail on over-the-air captures, and channel fidelity is the most important determinant of the simulation-to-deployment accuracy gap. By these criteria, the RadioML family, HisarMod2019, and DeepSig are the de facto benchmarks for communication AMR, while radar intra-pulse modulation research still relies on laboratory-specific custom datasets.

The limitations of these datasets directly affect model generalization in three ways. First, because RadioML’s channel generator uses simplified multipath and AWGN models, models trained on it overfit to simulator-specific channel statistics and exhibit substantial accuracy drops when tested on over-the-air captures (see Section 3.4). Second, the restricted modulation set (11–24 types) causes false confidence under open-world deployment, where out-of-distribution modulations are misclassified with high softmax confidence. Third, the fixed window length (128 or 1024 samples) creates receptive-field dependencies that break when models are deployed on captures of different durations. We flag these three mechanisms throughout the manuscript and return to them in Section 3.4 (Challenge IV).

Standardized benchmark datasets have been instrumental in catalyzing DL-ESWR progress. The RadioML family holds a preeminent position. The RML2016.10a dataset [10,11] comprises 220,000 samples spanning 11 modulation types across -20 dB to +18 dB SNR with 128 I/Q time steps. The RML2018.01a dataset [11] contains over 2.5 million samples across 24 modulation types with 1024 I/Q time steps. The HisarMod2019 dataset provides 26 modulation types under various SNR conditions. The commonly used datasets are summarized in Table 1.
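As a hedged illustration of how such a benchmark is typically consumed, the sketch below flattens a dictionary keyed by (modulation, SNR) pairs into arrays ready for training. It assumes the commonly distributed RML2016.10a layout, in which each key maps to an array of frames of shape (frames, 2, 128), with 2 dB SNR steps and 1000 frames per key. The class names and the random stand-in data are placeholders, and no real file is read.

```python
import numpy as np

def flatten_radioml_dict(data):
    """Stack a {(modulation, snr): array[n, 2, N]} dict into X, mods, snrs."""
    xs, mods, snrs = [], [], []
    for (mod, snr), frames in sorted(data.items()):
        xs.append(frames)
        mods.extend([mod] * len(frames))
        snrs.extend([snr] * len(frames))
    return np.concatenate(xs), np.array(mods), np.array(snrs)

# Synthetic stand-in with the assumed layout (random values, not real signals).
rng = np.random.default_rng(0)
mod_names = [f"mod{i}" for i in range(11)]  # placeholder class names
snr_levels = list(range(-20, 20, 2))        # -20 ... +18 dB in 2 dB steps
frames_per_key = 5                          # assumed 1000 in the real dataset
fake = {(m, s): rng.standard_normal((frames_per_key, 2, 128)).astype(np.float32)
        for m in mod_names for s in snr_levels}

X, mods, snrs = flatten_radioml_dict(fake)
print(X.shape, len(snr_levels))

# Consistency check on the quoted dataset size: 11 classes x 20 SNRs x 1000 frames.
print(11 * len(snr_levels) * 1000)
```

Keeping the per-sample SNR alongside the label makes it straightforward to report accuracy per SNR level rather than a single pooled figure.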

2.1.2. Data Augmentation Techniques

Data augmentation is a critical strategy for mitigating data scarcity in DL-ESWR.

Basic Signal Transformations: These include rotation of I/Q constellation points, flipping of time-domain waveforms, adding Gaussian noise at varying levels, and applying random phase and frequency offsets. The main design consideration is preserving modulation-invariant properties while generating sufficient distributional diversity; overly aggressive transformations risk crossing class boundaries.
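The basic transformations listed above can be sketched in a few lines of NumPy operating on a complex baseband frame. The function names, the offset range, and the SNR range are illustrative assumptions; in practice they would be tuned so that the augmented frame keeps its class label.

```python
import numpy as np

def rotate_phase(x, theta):
    """Rotate the constellation by a constant phase theta (radians)."""
    return x * np.exp(1j * theta)

def flip_time(x):
    """Reverse the time axis; may not preserve the label for
    direction-sensitive waveforms such as chirps."""
    return x[::-1].copy()

def apply_cfo(x, f_norm):
    """Apply a carrier frequency offset given in cycles per sample."""
    n = np.arange(len(x))
    return x * np.exp(1j * 2 * np.pi * f_norm * n)

def add_awgn(x, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR relative to x."""
    p_noise = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)
    w = rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x))
    return x + np.sqrt(p_noise / 2) * w

def augment(x, rng, max_cfo=1e-3, snr_db_range=(0.0, 20.0)):
    """Draw one random augmentation of a complex baseband frame x."""
    y = rotate_phase(x, rng.uniform(0, 2 * np.pi))
    y = apply_cfo(y, rng.uniform(-max_cfo, max_cfo))
    return add_awgn(y, rng.uniform(*snr_db_range), rng)

rng = np.random.default_rng(0)
# Toy QPSK symbol stream, one sample per symbol (no pulse shaping).
frame = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=128)))
augmented = augment(frame, rng)
print(augmented.shape)
```

Phase rotation and small frequency offsets leave the modulation type unchanged, whereas a large `max_cfo` or a very low SNR can push a frame across class boundaries, which is the risk noted above.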

Generative Model-Based Augmentation: Conditional GAN architectures conditioned on modulation type and SNR [17] generate high-fidelity synthetic signals that boost classifier performance when augmenting limited real datasets. Limitations include training instability and mode-collapse risk, which may silently reduce augmented-sample diversity despite high visual fidelity.

Signal-Specific Augmentation: Domain-informed strategies, such as segment-wise substitution [17,18], replace signal portions with segments from same-class samples, preserving modulation characteristics while introducing intra-class variability. Such approaches outperform generic computer-vision-style augmentations because they respect the sequential structure of modulated symbol streams.
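One plausible reading of segment-wise substitution is sketched below: a randomly positioned segment of a frame is overwritten with the time-aligned segment of another frame from the same class. The cited works may differ in how segments are chosen and aligned, so the function name, the aligned-position choice, and the segment length are assumptions made purely for illustration.

```python
import numpy as np

def segment_substitute(x, donor, seg_len, rng):
    """Replace one random segment of x with the aligned segment of a
    same-class donor frame. Both frames have shape (..., N)."""
    if x.shape != donor.shape:
        raise ValueError("x and donor must have the same shape")
    n = x.shape[-1]
    start = int(rng.integers(0, n - seg_len + 1))
    out = x.copy()
    out[..., start:start + seg_len] = donor[..., start:start + seg_len]
    return out, start

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 128))      # stand-in I/Q frame (rows: I and Q)
donor = rng.standard_normal((2, 128))  # stand-in frame from the same class
mixed, start = segment_substitute(x, donor, seg_len=32, rng=rng)
print(mixed.shape, start)
```

This sketch does not enforce phase or symbol-boundary continuity at the splice points. Whether that matters depends on the modulation and on the representation fed to the network.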

2.1.3. Critical Analysis of Dataset Limitations

Most DL-ESWR progress has been measured on synthetic benchmarks (RML2016, RML2018, HisarMod, and private radar/jamming datasets). These synthetic datasets introduce several structural biases that must be explicitly acknowledged.

Channel-model bias. The RadioML family uses a simplified channel generator that does not capture site-specific scattering, non-Gaussian interference, or non-WSSUS behavior. Models overfit to the generator rather than to the underlying modulation physics.

SNR-definition bias. SNR labels in synthetic datasets are computed analytically; real over-the-air SNR is estimated and includes spatial/temporal variation absent from simulation. The “accuracy-vs-SNR” curve on RadioML therefore does not faithfully predict the accuracy-vs-measured-SNR curve in deployment.

Parameter-coverage bias. Carrier frequency offsets, sampling-rate offsets, and phase noise are typically drawn from narrow distributions (or set to zero). Models frequently collapse outside the training distribution.

Modulation-set bias. RadioML contains 11–24 modulation types, whereas a modern spectrum includes hundreds of waveform variants. The closed-world assumption fundamentally limits generalization.

Temporal-window bias. Fixed 128 or 1024 sample windows bias architectures toward window-sized receptive fields. Real captures have variable durations, causing length-mismatch failures.

Pulse shape and filtering bias. Synthetic signals use ideal raised-cosine pulse shaping with zero excess bandwidth variability; real transmitters exhibit manufacturer-specific filter responses that may serve as classification features (“RF fingerprints”), confounding the intended modulation classification task.

Mitigation pathways. (i) Mixed synthetic–measurement datasets with clearly documented provenance; (ii) evaluation protocols that include both in-distribution and out-of-distribution test splits; (iii) domain randomization during data generation; (iv) explicit reporting of the simulation-to-real accuracy drop as a standard benchmark metric. These pathways connect to the challenge aspect in Section 3.4 and the agenda in Section 4.

Other signal-analysis domains offer instructive templates for how to design the experimental component of a pattern-recognition study with physical rigor. In particular, Manin et al. [19] demonstrate, in the context of essential-oil authentication, the value of combining physics-driven spectral features (FTIR) with statistically principled classification (PCA–LR) rather than relying solely on end-to-end black-box learning. The underlying methodological lesson directly transfers to DL-ESWR: combining physics-informed signal representations (e.g., cyclostationary features and higher-order cumulants) with data-driven classifiers can improve both interpretability and sample efficiency relative to purely end-to-end architectures. This principle reinforces the physics–AI hybrid direction discussed in Section 3.6 and Section 4.4 and motivates extending the DL-ESWR experimental protocol beyond synthetic-only evaluation towards tiered simulation–measurement validation.

In summary, four limitations recur across public datasets. Insufficient channel diversity: most public datasets employ only AWGN or simplified multipath models, leading to “laboratory performance inflation.” Limited modulation coverage: most datasets contain 11–24 modulation types, representing only a fraction of modern schemes. SNR distribution mismatch: sample quality at extreme low SNR levels is often inadequate. Annotation reliability: simulated datasets lack realistic annotation noise. Future efforts should move toward hybrid simulation–measurement paradigms with multi-channel, multi-scenario coverage.

2.1.4. Toward an Ideal ESWR Dataset

Based on the bias analysis of Section 2.1.3 and the deployment-condition analysis of Section 3.4, we outline the characteristics of an ideal DL-ESWR dataset.

Multi-modal provenance. A tiered structure consisting of: (i) a large synthetic base with randomized channel, RF-chain, and pulse-shape parameters, (ii) a medium-sized laboratory cable-connected capture, and (iii) a smaller but carefully curated over-the-air split spanning multiple locations, antenna types, and receiver hardware.

Expanded modulation taxonomy. Beyond the 24 classes in RML2018, the dataset should cover contemporary waveforms: 5G NR numerologies, LoRa chirps, Bluetooth LE variants, DVB-S2X, ATSC 3.0, satellite-IoT protocols (e.g., NB-IoT NTN), and Wi-Fi 6/7 preambles. A minimum of 50–100 classes is necessary to approach real-world spectrum diversity.

Rich metadata. Every sample should carry a complete metadata record: exact SNR (measured and analytical), CFO, phase noise, bandwidth, hardware identifier, location, and time stamp.

Realistic channel conditions. Over-the-air captures should span urban, suburban, indoor, aerial, and maritime propagation. Standard 3GPP channel models should be included in synthetic splits, not only simplistic AWGN.

Standardized splits and evaluation protocols. Pre-defined train/val/test splits including both in-distribution and held-out out-of-distribution partitions, with standardized reporting scripts for per-SNR, per-class, calibration, and open-set metrics.

To address the validity gap between simulated benchmarks and real-world deployment, we propose that future DL-ESWR publications routinely report at least three quantities: (i) in-distribution accuracy on the standard synthetic benchmark (for historical comparability), (ii) out-of-distribution accuracy on a held-out synthetic split with randomized channel and RF-chain parameters (to assess generalization within the simulated domain), and (iii) over-the-air accuracy on at least one publicly shared measurement set (to assess simulation-to-real transfer). The gap between (i) and (iii) directly quantifies the simulation-to-real validation gap and should itself be treated as a headline metric rather than an afterthought.
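The three-quantity report proposed above reduces to a small amount of bookkeeping, sketched here under stated assumptions. The split names `in_dist`, `ood_synth`, and `ota` are hypothetical labels for quantities (i) to (iii), and the labels and predictions are toy values, not results from any model or dataset.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def tiered_report(splits):
    """splits maps a split name to (y_true, y_pred); the gap between the
    in-distribution and over-the-air accuracies is reported as its own entry."""
    acc = {name: accuracy(t, p) for name, (t, p) in splits.items()}
    acc["sim_to_real_gap"] = acc["in_dist"] - acc["ota"]
    return acc

# Toy labels and predictions only.
truth = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
toy = {
    "in_dist":   (truth, [0, 1, 2, 3, 0, 1, 2, 3, 0, 0]),
    "ood_synth": (truth, [0, 1, 2, 0, 0, 1, 2, 0, 0, 0]),
    "ota":       (truth, [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]),
}
report = tiered_report(toy)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting the gap as a named quantity, rather than leaving readers to subtract two numbers from different tables, keeps it visible when results are compared across papers.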

2.2. Module II: Signal Representation and Preprocessing

Module II transforms raw signals into formats amenable to neural network processing. It encompasses only input-level representation construction, while network-internal feature fusion belongs to Module III. Table 2 lists commonly used signal representation methods, and Figure 4 illustrates the taxonomy of signal representation.

2.2.1. Time-Domain Representations

The most direct representation utilizes raw I/Q sample sequences as a 2 \times N matrix, where N is the observation window length (commonly 128 or 1024 samples). O’Shea et al. [20] established the practice of feeding raw I/Q data directly into CNNs, demonstrating that competitive AMR accuracy is achievable without any hand-crafted features. The attraction of raw I/Q is threefold: no information is lost to preprocessing, the computational cost of preprocessing is zero, and the representation is end-to-end differentiable. However, the representation is sensitive to phase rotation and frequency offset, necessitating either random-phase data augmentation or phase-invariant architectures. Amplitude/phase (A/P) representations transform Cartesian coordinates into polar form:

A (n) = \sqrt{r_{I} {(n)}^{2} + r_{Q} {(n)}^{2}}, φ (n) = arctan [\frac{r_{Q} (n)}{r_{I} (n)}]

(5)

A/P representations [21] provide complementary information to I/Q, motivating dual-stream architectures that have been shown to yield 3–5 percentage-point accuracy gains over single-representation baselines.
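A minimal sketch of the Cartesian-to-polar conversion in Equation (5) follows. It uses the four-quadrant `np.arctan2` rather than a plain arctangent of the ratio, which is an implementation choice assumed here so that the phase covers the full range from -π to π and the case of a zero in-phase component is handled.

```python
import numpy as np

def iq_to_ap(iq):
    """Convert a (2, N) I/Q frame into a (2, N) amplitude/phase frame."""
    i, q = iq[0], iq[1]
    amplitude = np.hypot(i, q)   # sqrt(I^2 + Q^2)
    phase = np.arctan2(q, i)     # four-quadrant angle in radians
    return np.stack([amplitude, phase])

# Four unit-amplitude points on the axes of the I/Q plane.
iq = np.array([[1.0, 0.0, -1.0, 0.0],
               [0.0, 1.0, 0.0, -1.0]])
ap = iq_to_ap(iq)
print(ap)
```

A dual-stream design of the kind mentioned above would feed `iq` and `ap` to separate branches. The phase row wraps at ±π, which is worth keeping in mind when normalizing inputs.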

2.2.2. Image-Domain Representations

An influential line of research converts signals into two-dimensional images for classification.

Constellation Diagrams: Constellation density matrices (CDMs) [22] provide discretized representations of the constellation space amenable to CNN processing. CDMs naturally expose modulation order and symmetry but lose phase-transition information, motivating fusion with sequential representations.
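A constellation density matrix can be approximated by a normalized two-dimensional histogram of the I/Q plane, as in the sketch below. The grid size of 32 by 32, the plotting limit of 1.5, and the noise level are assumptions for illustration; published CDM variants may add smoothing or color mapping.

```python
import numpy as np

def constellation_density_matrix(x, bins=32, limit=1.5):
    """Normalized 2D histogram of complex samples over [-limit, limit]^2."""
    hist, _, _ = np.histogram2d(
        x.real, x.imag, bins=bins,
        range=[[-limit, limit], [-limit, limit]])
    total = hist.sum()
    return hist / total if total > 0 else hist

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
symbols = np.tile(qpsk, 256)  # noiseless unit-power QPSK symbols
rng = np.random.default_rng(0)
noisy = symbols + 0.05 * (rng.standard_normal(symbols.size)
                          + 1j * rng.standard_normal(symbols.size))

cdm_clean = constellation_density_matrix(symbols)
cdm_noisy = constellation_density_matrix(noisy)
print(np.count_nonzero(cdm_clean), np.count_nonzero(cdm_noisy))
```

Because the histogram discards sample order, two signals with the same symbol distribution but different transition patterns produce the same matrix, which is the loss of phase-transition information noted above.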

Time–Frequency Images: The short-time Fourier transform (STFT) is

STFT (t, f) = \int r (τ) \cdot g (τ - t) \cdot e^{- j 2 π f τ} d τ

(6)

The Wigner–Ville distribution (WVD) achieves optimal time–frequency concentration

WVD (t, f) = \int r (t + \frac{τ}{2}) \cdot r^{*} (t - \frac{τ}{2}) \cdot e^{- j 2 π f τ} d τ

(7)

but suffers from cross-term interference. The Choi–Williams distribution (CWD) mitigates this:

CWD (t, f) = \int \int \sqrt{\frac{σ}{4 π τ^{2}}} exp (- \frac{σ {(u - t)}^{2}}{4 τ^{2}}) r (u + \frac{τ}{2}) r^{*} (u - \frac{τ}{2}) e^{- j 2 π f τ} d u d τ

(8)

Additional alternatives include the SPWVD and multi-synchrosqueezing transform (MSST) [23]. Adaptive wavelet decomposition [24] generates multi-resolution time–frequency representations [25]. Time–frequency images trade computational cost for explicit spectro-temporal structure and are particularly effective for radar intra-pulse modulation, where instantaneous frequency evolution is the primary discriminative cue.
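As a hedged illustration of a time–frequency image, the sketch below computes a discrete STFT magnitude with a Hann window and applies it to a linear frequency-modulated (LFM) pulse, a typical radar intra-pulse modulation. The window length, hop size, and chirp parameters are assumptions chosen for readability, and frequencies are in cycles per sample.

```python
import numpy as np

def stft_magnitude(x, win_len=64, hop=16):
    """Return |STFT| as a (frequency, time) image and its frequency axis."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[k * hop:k * hop + win_len] * window
                       for k in range(n_frames)])
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len))
    return np.abs(spectrum).T, freqs

# LFM pulse sweeping from -0.25 to +0.25 cycles/sample over 1024 samples.
n = np.arange(1024)
f0, f1 = -0.25, 0.25
chirp = np.exp(1j * 2 * np.pi * (f0 * n + 0.5 * (f1 - f0) / len(n) * n ** 2))

tf_image, freqs = stft_magnitude(chirp)
ridge = freqs[np.argmax(tf_image, axis=0)]  # strongest frequency per time frame
print(tf_image.shape, ridge[0], ridge[-1])
```

The ridge of the image traces the instantaneous frequency of the pulse, which is the cue a 2D CNN exploits. The window length sets the usual trade-off between time and frequency resolution.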

Other Image Representations: The spectral correlation function (SCF) exploits cyclostationary properties:

R_{r}^{α} (τ) = E [r (t + \frac{τ}{2}) \cdot r^{*} (t - \frac{τ}{2}) \cdot e^{- j 2 π α t}]

(9)

S_{r}^{α} (f) = \int R_{r}^{α} (τ) \cdot e^{- j 2 π f τ} d τ

(10)

Higher-order spectral images derived from bispectral analysis [26,27] offer noise suppression advantages. Gramian angular field (GAF) representations [7] encode temporal correlations as angular relationships. These higher-order representations are typically more robust at low SNR but incur additional preprocessing latency, making them a design-time trade-off rather than a universal solution.
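The Gramian angular field mentioned above can be sketched as follows, using the summation variant: a real-valued sequence is rescaled to [-1, 1], mapped to angles through the arccosine, and expanded into a matrix of pairwise cosine sums. Applying it to a single real component, for example the in-phase samples, is an assumption made for illustration, and the input is assumed to be non-constant.

```python
import numpy as np

def gramian_angular_summation_field(x):
    """Map a real 1D sequence to an (N, N) image with entries cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    scaled = 2.0 * (x - lo) / (hi - lo) - 1.0  # rescale to [-1, 1]
    scaled = np.clip(scaled, -1.0, 1.0)        # guard against rounding overshoot
    phi = np.arccos(scaled)
    return np.cos(phi[:, None] + phi[None, :])

samples = np.sin(np.linspace(0.0, 2.0 * np.pi, 32))  # stand-in for one I/Q component
gaf = gramian_angular_summation_field(samples)
print(gaf.shape)
```

The image size grows quadratically with the sequence length, so long observation windows are usually downsampled first, which contributes to the preprocessing latency mentioned above.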

2.2.3. Expert Feature Representations

Higher-order cumulants (HOCs), particularly fourth-order cumulants [6,7], remain widely adopted due to their invariance to Gaussian noise:

C_{42} = Cum (x, x, x^{*}, x^{*}) = E [{| x |}^{4}] - 2 {(E [{| x |}^{2}])}^{2} - {| E [x^{2}] |}^{2}

(11)

C_{p q} = Cum (x, \dots, x, x^{*}, \dots, x^{*}) \equiv 0 for Gaussian x (t) = w (t) when p \geq 3

(12)

Incorporating expert features as auxiliary inputs alongside raw I/Q data can significantly enhance performance [28]. The advantage is a principled inductive bias with minimal added cost; the limitation is that hand-crafted features cannot cover newly emerging modulation schemes without fresh expert effort, motivating a hybrid expert–raw strategy rather than a pure expert-feature approach.
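The sample estimate of Equation (11) is short enough to state directly. The sketch below evaluates it on ideal unit-power constellations and on complex Gaussian noise, assuming zero-mean inputs. The constellation definitions and the sample count are illustrative choices.

```python
import numpy as np

def c42(x):
    """Sample estimate of C42 for a zero-mean complex sequence x."""
    m2 = np.mean(np.abs(x) ** 2)   # E[|x|^2]
    m4 = np.mean(np.abs(x) ** 4)   # E[|x|^4]
    m20 = np.mean(x ** 2)          # E[x^2]
    return float(m4 - 2.0 * m2 ** 2 - np.abs(m20) ** 2)

bpsk = np.array([1.0, -1.0], dtype=complex)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)

rng = np.random.default_rng(0)
gauss = (rng.standard_normal(200_000)
         + 1j * rng.standard_normal(200_000)) / np.sqrt(2)

for name, seq in [("BPSK", bpsk), ("QPSK", qpsk), ("16QAM", qam16), ("noise", gauss)]:
    print(name, round(c42(seq), 3))
```

The three constellations give clearly separated values while the Gaussian sequence gives a value near zero, which is why such a feature remains informative in additive Gaussian noise once it is normalized by the estimated signal power.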

2.2.4. Multi-Modal Representations (Input-Level Fusion)

Multi-modal representation exploits complementary information across different domains by constructing composite network inputs, including dual-stream I/Q and A/P, time–frequency fusion, and hybrid expert–raw representations. Reported gains over best-single-representation baselines are typically 2–4 percentage points on RML2016.10a at the cost of 1.5–2× the preprocessing and memory budget.

Figure 5 shows some examples of different signal representations.

2.3. Module III: Evolution of Core Network Architectures

Module III encompasses the core neural network architecture responsible for mapping input representations to classification decisions. Table 3 compares the recognition performance of representative neural networks. The evolution timeline of core network architectures for DL-ESWR is illustrated in Figure 6.

2.3.1. Foundational Architectures

Convolutional Neural Networks (CNNs): CNNs constitute the most extensively studied architecture family. The pioneering application of two-layer CNNs to raw I/Q data established the field’s foundation. AlexNet, VGGNet, and GoogLeNet architectures [29] have been directly applied to AMR with competitive results. Key design choices within the CNN family are: (i) 1D vs. 2D convolution (1D for raw I/Q; 2D for time–frequency images), (ii) kernel size (small 3–5 taps for local time-domain structure vs. larger 8–16 taps for cross-time dependencies), (iii) pooling strategy (global average pooling is increasingly preferred over fully connected heads for parameter efficiency), and (iv) depth (from the 2-layer O’Shea CNN to >50-layer ResNet variants). The fundamental limitation of CNNs is their locality bias: receptive fields grow only linearly with depth, limiting sensitivity to global modulation-level statistics that span the whole observation window—a limitation that directly motivates the hybrid CNN–transformer and pure-transformer architectures discussed below.
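The linear growth of the receptive field can be checked with the standard recursion for stacked convolutions, sketched below. Layers are described only by kernel size and stride (dilation and pooling are ignored), and the example configurations are assumptions rather than architectures taken from the surveyed works.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of stacked 1D convolutions.

    layers: iterable of (kernel_size, stride) pairs, ordered input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= stride             # spacing between adjacent outputs, in input samples
    return rf

shallow = receptive_field([(3, 1)] * 2)
deep = receptive_field([(3, 1)] * 10)
strided = receptive_field([(3, 2)] * 3)
# Number of stride-1, 3-tap layers needed to span a 128-sample window.
needed = next(d for d in range(1, 200) if receptive_field([(3, 1)] * d) >= 128)
print(shallow, deep, strided, needed)
```

With stride 1 and 3-tap kernels the field grows by two samples per layer, so spanning a 128-sample frame purely through depth takes dozens of layers. Strides, pooling, or dilation shorten this, and attention layers sidestep it by connecting all positions directly.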

Recurrent Neural Networks (RNNs): LSTM [30] and GRU networks [31] capture temporal dependencies in sequential signal data. Bidirectional LSTM variants [32] process sequences in both directions for enhanced temporal modeling. RNN-based AMR models have historically outperformed shallow CNN baselines because recurrent units naturally capture the sequential structure of modulated symbol streams. Typical weaknesses include difficulty in parallelizing training (sequential unroll), vanishing-gradient issues at long observation windows, and sensitivity to phase drift, which have motivated the shift toward attention-based and hybrid architectures.

2.3.2. Hybrid Architectures

Hybrid architectures combine complementary strengths of different network families. The CLDNN architecture [10] cascades CNN layers with LSTM layers, representing the prototypical hybrid design. CNN–LSTM dual-stream structures [33] process data through parallel branches for simultaneous spatial and temporal feature extraction. Multi-stream architectures [34] extend the paradigm with multiple parallel streams. The generalizable principle is that local inductive bias (CNN) and long-range context modeling (LSTM/transformer) are complementary rather than substitutive; hybrid designs empirically yield 3–5 percentage points over single-family baselines on RML2016.10a, although at the cost of larger parameter counts and more complex training.

2.3.3. Advanced Architectures

Deep Residual Networks: Residual learning through skip connections enables significantly deeper networks. ResNet variants (ResNeSt [35] and ConvNeXt [36]) have been widely adopted. The deep residual shrinkage network (DRSN) [37] integrates soft thresholding attention. For LPI radar signal recognition [38], improved ResNeSt architectures demonstrate superior capabilities. Residual designs mitigate gradient vanishing at depth and make it practical to stack tens of layers, which is the primary mechanism by which deep CNNs surpass RNN baselines on long observation windows.

Transformer Architectures: The transformer’s self-attention mechanism captures long-range dependencies. Vision transformer (ViT) variants [39] have been applied to both I/Q sequences and image-domain representations. The CVT-Net architecture [40] employs CNN layers followed by transformer encoders. IQFormer [41] integrates multi-modality fusion within a transformer framework [42]. The multi-scale transformer (MST) [43] introduces cross-scale token fusion. Window attention convolution networks [44] combine local self-attention with convolutional operations. Key design considerations for transformer-based ESWR are: (i) tokenization strategy (patches of I/Q samples, time–frequency image patches, or cross-domain hybrid tokens), (ii) positional encoding (crucial for preserving temporal ordering of I/Q symbols), and (iii) computational cost, which grows quadratically with sequence length and can be partially mitigated by window or sparse attention. Transformers excel at capturing long-range structure in long observation windows but typically require more parameters and training data than pure CNN counterparts, making them less suited for extreme edge deployment unless combined with compression.

Graph Neural Networks (GNNs): GNNs have emerged as a distinctive paradigm for ESWR with three principal use cases: graph-based modeling of signal relationships [45], multi-domain feature graph fusion via STF–GCN [46], and adversarial robustness applications [47]. GNNs naturally encode non-Euclidean structural priors (e.g., correlations between multiple signal representations) that CNNs and transformers cannot express directly; the main design challenge is specifying an appropriate graph construction strategy.

2.3.4. Specialized Architectures

Complex-Valued Neural Networks (CVNNs): Since I/Q signals are inherently complex-valued, CVNNs employ complex-valued weights:

W * h = (W^{R} * h^{R} - W^{I} * h^{I}) + j (W^{R} * h^{I} + W^{I} * h^{R})

(13)

Complex-valued transformers for AMR [48,49] represent the state of the art in this direction. By preserving the phase relationship between I and Q channels throughout the network, CVNNs encode a physically motivated inductive bias and achieve particularly strong results on phase-sensitive modulations (e.g., PSK and QAM variants) at the cost of roughly 2× the computational complexity of a real-valued counterpart.
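As a minimal sketch of Equation (13), the following Python snippet evaluates a complex-valued 1D sliding-window product using only real arithmetic on the separate real and imaginary parts. The function name `complex_conv1d_valid`, the "valid" boundary handling, and the use of cross-correlation (the convention of most DL frameworks) rather than flipped-kernel convolution are assumptions made for illustration; they are not taken from the cited works.

```python
def complex_conv1d_valid(w_re, w_im, h_re, h_im):
    """Slide a complex kernel W over a complex input h (valid mode, no padding).

    Real and imaginary parts are combined exactly as in Eq. (13):
      real part: W^R * h^R - W^I * h^I
      imag part: W^R * h^I + W^I * h^R
    """
    k = len(w_re)
    n_out = len(h_re) - k + 1
    out_re, out_im = [], []
    for n in range(n_out):
        re = 0.0
        im = 0.0
        for t in range(k):
            re += w_re[t] * h_re[n + t] - w_im[t] * h_im[n + t]
            im += w_re[t] * h_im[n + t] + w_im[t] * h_re[n + t]
        out_re.append(re)
        out_im.append(im)
    return out_re, out_im


# Hypothetical kernel and I/Q samples, split into real (I) and imaginary (Q) parts.
w = [1 + 2j, 0.5 - 1j]
h = [1 + 1j, 2 - 1j, -1 + 0.5j]
out_re, out_im = complex_conv1d_valid(
    [c.real for c in w], [c.imag for c in w],
    [c.real for c in h], [c.imag for c in h],
)
```

Each complex multiply-accumulate expands into four real multiplications, which gives some intuition for why complex-valued layers cost more than real-valued layers of the same nominal width.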

2.3.5. Comparative Analysis of Architectural Paradigms

CNNs remain the most mature family, with excellent efficiency. Transformers offer global receptive fields and parallel processing, addressing CNN and RNN limitations. GNNs enable relational reasoning over non-Euclidean representations. CVNNs offer physically motivated advantages for complex-valued signal processing.

Quantitatively informed qualitative trends. While Table 3 compiles RML2016.10a accuracy figures, varying architectures, training strategies, and evaluation protocols across original publications preclude strict numerical ranking without a unified re-evaluation. Nevertheless, clear qualitative trajectories emerge across the literature. Accuracy consistently improves along a CNN → RNN → deep CNN → transformer axis, reflecting the progressive accumulation of stronger inductive biases, such as sequential dependency, deep receptive fields, and global context. Furthermore, the performance frontier (∼63–65%) is currently dominated by hybrid designs combining local convolutions with global attention, alongside specialized networks (GNNs and complex-valued architectures) that exploit structural priors specific to wireless signals. Concurrently, an exploitable accuracy–efficiency trade-off exists: ultra-lightweight architectures sacrifice only 3–5% accuracy while reducing parameters by orders of magnitude, enabling aggressive compression. Finally, despite these robust architectural trajectories, operationally critical metrics like end-to-end latency and energy consumption remain severely under-reported in existing studies.

2.4. Module IV: Training and Optimization Strategies

Module IV governs the training paradigm, loss function design, and optimization strategy. We structure the analysis along goal-oriented dimensions reflecting primary optimization objectives. Figure 7 provides the taxonomy of training and optimization strategies for DL-ESWR, and Table 4 compares the training strategies.

2.4.1. Goal I: Reducing Label Dependency (Annotation Efficiency)

Semi-Supervised Learning: Semi-supervised approaches exploit unlabeled signal data. SemiAMR [50,51] employs corrected pseudo-labels and consistency regularization. Virtual adversarial training (VAT) [52] has been applied to semi-supervised radar signal classification. The dominant semi-supervised framework in DL-ESWR is pseudo-labeling with consistency regularization: a labeled-data-trained model produces pseudo-labels on unlabeled samples, and retraining incorporates a consistency loss requiring stable predictions under input perturbations. Reported results indicate that semi-supervised learning can match fully supervised accuracy with only 20% of the labels [50,51]. Three practical design considerations are: (i) pseudo-label confidence thresholding, (ii) perturbation strength (must remain within the modulation-invariance manifold), and (iii) curriculum scheduling of the threshold as the model improves.
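The confidence-thresholding step of pseudo-labeling can be sketched as follows. This is an illustrative fragment only: the function name `select_pseudo_labels` and the threshold value of 0.95 are hypothetical choices, not settings reported in [50,51].

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Keep only unlabeled samples whose top class probability reaches the threshold.

    probabilities: list of per-sample class-probability lists (e.g., softmax outputs).
    Returns a list of (sample_index, pseudo_label) pairs.
    The threshold value is an illustrative assumption.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Three hypothetical unlabeled samples over three modulation classes.
unlabeled_probs = [
    [0.97, 0.02, 0.01],  # confident -> kept with pseudo-label 0
    [0.50, 0.30, 0.20],  # ambiguous -> discarded
    [0.01, 0.01, 0.98],  # confident -> kept with pseudo-label 2
]
pseudo_labeled = select_pseudo_labels(unlabeled_probs)
```

Only the retained pairs would be added to the training set in the next round; the discarded samples remain available for later rounds.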

Self-Supervised and Contrastive Learning: Self-supervised learning constitutes one of the most rapidly growing directions. The KAN–MAE framework [53] integrates KAN architectures with masked autoencoder pretraining. The core contrastive objective is the InfoNCE loss,

L_{InfoNCE} = - log \frac{exp [sim (z_{i}, z_{i}^{+}) / τ]}{\sum_{j = 1}^{K} exp [sim (z_{i}, z_{j}) / τ]}

(14)

where z_{i} and z_{i}^{+} are embeddings of augmented views of the same sample, sim(·, ·) is a similarity function (commonly cosine similarity), and τ is a temperature parameter. CoCL-Sig [54] constructs pairs through signal-specific augmentations. EET-MoCo [55] combines an efficient embedding transformer with momentum contrast. Hybrid-view frameworks [56] extend contrastive learning with multiple signal views [57]. The success of self-supervised learning hinges on the quality of pretext tasks and augmentation design; poorly chosen augmentations that break modulation-invariant structure can actively harm downstream accuracy.
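A minimal numerical sketch of Equation (14) is given below. It assumes cosine similarity for sim(·, ·) and assumes that the K candidates in the denominator consist of the positive plus a list of negatives; both are common conventions but are assumptions here, as are the function names and the toy two-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor, following Eq. (14).

    Assumption: the denominator runs over the positive and all negatives.
    """
    candidates = [positive] + list(negatives)
    logits = [cosine_similarity(anchor, z) / tau for z in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Toy example: the positive is aligned with the anchor, the negative is orthogonal.
loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], tau=0.5)
```

Lowering τ sharpens the softmax over candidates, so a given similarity margin between the positive and the negatives produces a smaller loss.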

Few-Shot and Meta-Learning: Few-shot techniques enable recognition from a handful of examples. Prototypical networks [58] classify based on distance to class prototypes. The AMR-CapsNet [59] achieves over 80% accuracy using only 3% of training data by leveraging capsule network structures for few-sample AMR. The meta-transformer framework [60] integrates meta-learning with transformers. NAS combined with knowledge transfer [61] has been explored for optimal few-shot architectures. Few-shot AMR is particularly attractive for rare/emerging modulation schemes where large labeled datasets are infeasible; the main limitation is that performance degrades rapidly when the episode distribution at deployment differs from that used during meta-training.
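The prototype-based decision rule can be illustrated with a short sketch. It assumes that embeddings have already been produced by some encoder and uses squared Euclidean distance, the usual choice for prototypical networks; the function names, class labels, and two-dimensional embeddings are hypothetical.

```python
def class_prototypes(support):
    """Compute one prototype per class as the mean of its support embeddings.

    support: dict mapping class label -> list of embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-shot support set with 2-D embeddings.
support_set = {
    "BPSK": [[1.0, 0.0], [0.8, 0.2]],
    "QPSK": [[0.0, 1.0], [0.2, 0.8]],
}
protos = class_prototypes(support_set)
predicted = classify([0.7, 0.1], protos)
```

Because a new class only requires computing one additional mean, this rule extends to unseen modulation types without retraining the encoder, provided the embedding space transfers.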

Zero-Shot Learning: Recent work demonstrates zero-shot AMR by leveraging vision–language models (VLMs) [62] that classify based on textual descriptions. Zero-shot AMR is still in its infancy for ESWR: textual descriptions of modulation schemes are less semantically rich than the natural-image captions VLMs are trained on, which limits current accuracy to well below supervised baselines.

2.4.2. Goal II: Reducing Deployment Cost (Efficiency Orientation)

Knowledge Distillation: Knowledge distillation (KD) transfers learned knowledge from a complex teacher model to a compact student model. The ClST framework [63] combines convolutional transformers with distillation for few-shot AMR. KD is particularly effective in ESWR owing to the relatively small number of classes and typically preserves 95–98% of teacher accuracy at a fraction of the parameter count, making it a preferred path to edge-deployable models.
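For concreteness, a common form of the distillation objective is sketched below: a weighted sum of the hard-label cross-entropy and a temperature-softened KL divergence between teacher and student outputs, scaled by T². This is the widely used generic formulation and is not claimed to be the loss used in ClST [63]; the temperature T = 4, the weight alpha = 0.5, and the function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max subtraction for stability."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * cross_entropy


# A student that matches the teacher pays only the hard-label term.
matched = distillation_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], label=0)
mismatched = distillation_loss([0.0, 2.0, 0.0], [2.0, 0.0, 0.0], label=0)
```

The softened teacher distribution carries information about which modulation classes the teacher finds similar, which is one plausible reason the technique works well when the class count is small.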

Network Pruning: Channel pruning methods [64] remove insignificant convolutional channels [65]. Layer pruning [65] operates at the granularity of entire layers. Structured pruning (channel/layer) yields hardware-friendly acceleration on commodity GPUs and accelerators, whereas unstructured (weight-level) pruning produces higher compression ratios but requires specialized sparse-kernel support to realize speedups.

Federated Learning: Federated learning enables distributed training without transmitting raw data. Federated self-supervised frameworks [66] extend this to heterogeneous settings. GRU-based federated architectures [31] target cognitive radio networks. Federated ESWR is particularly relevant to spectrum monitoring and cognitive radio, where raw I/Q data is sensitive; key challenges are non-IID data distributions across client nodes and communication-efficiency constraints on model-update bandwidth.

2.4.3. Goal III: Enhancing Generalization Capability

Transfer Learning: Transfer learning mitigates domain shift between source and target scenarios. Cross-dataset studies [67,68] demonstrate effective hybrid transfer approaches. Recent work [69] proposes training with audio signals as the source domain and fine-tuning with few modulated signal samples, exploiting SNR as an auxiliary feature to facilitate cross-domain knowledge transfer. The typical design pattern is pretraining on a large simulated corpus followed by fine-tuning on a smaller target-domain dataset; the effectiveness of transfer depends strongly on the alignment between source and target data-generating processes.

Domain Adaptation: Domain adaptation addresses distribution shift through adversarial training or feature disentanglement [70,71]. Unlike transfer learning, domain adaptation explicitly aligns source and target feature distributions without requiring target labels, making it particularly suitable for simulation-to-OTA generalization where OTA data is available but labeling at scale is infeasible.

Multi-Task Learning: Multi-task learning jointly optimizes related objectives such as SNR estimation and DOA estimation alongside AMR. The MIMO-NN architecture [72] jointly performs DOA estimation and AMC. MTL exploits the inductive bias that shared low-level features carry information about multiple signal properties simultaneously, typically improving data efficiency; its success requires careful task-weight balancing to avoid any single task dominating the shared backbone.

2.4.4. Goal IV: Advanced Training Supervision Design

Loss Function Engineering: Focal loss reweights cross-entropy by a modulating factor:

FL (p_{t}) = - α_{t} {(1 - p_{t})}^{γ} \cdot log (p_{t})

(15)

Supervised contrastive loss [57] extends InfoNCE to the supervised setting. The SCLMR framework [73] addresses long-tailed modulation distributions. Loss-function design is a lightweight but highly cost-effective intervention: a better-chosen loss can deliver 1–3 percentage points of accuracy improvement without modifying the network architecture, making it an important design lever for low-SNR and class-imbalanced scenarios.
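Equation (15) can be evaluated directly, as in the sketch below. The default values α_t = 0.25 and γ = 2 are conventional illustrative choices and are not taken from the surveyed works; the function name is hypothetical.

```python
import math


def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss of Eq. (15) for a single sample.

    p_t: predicted probability of the true class, in (0, 1].
    alpha_t, gamma: class weight and focusing parameter (illustrative defaults).
    """
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy sample (p_t = 0.9) versus a hard sample (p_t = 0.1).
easy = focal_loss(0.9)
hard = focal_loss(0.1)
# With gamma = 0 and alpha_t = 1 the expression reduces to plain cross-entropy.
plain_ce = focal_loss(0.9, alpha_t=1.0, gamma=0.0)
```

With γ = 2, the easy sample's cross-entropy is scaled by (1 − 0.9)² = 0.01 before the α_t weighting, whereas the hard sample's is scaled by 0.81, so training gradients concentrate on the hard cases.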

Curriculum Learning: Curriculum strategies control difficulty progression; e.g., SNR-based curricula progressively expose networks to lower-SNR samples. The benefit is improved low-SNR robustness and more stable training convergence; the main design question is whether to use a fixed schedule or an adaptive curriculum driven by instantaneous model performance.
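A fixed SNR-based curriculum can be sketched as a schedule that lowers the minimum admissible SNR linearly over training. The linear shape, the SNR endpoints (18 dB down to −20 dB), and the function names are assumptions made for illustration only.

```python
def min_snr_for_epoch(epoch, total_epochs, snr_high=18.0, snr_low=-20.0):
    """Linearly lower the minimum admissible SNR (dB) from snr_high to snr_low."""
    fraction = min(1.0, epoch / max(1, total_epochs - 1))
    return snr_high + fraction * (snr_low - snr_high)


def curriculum_subset(samples, epoch, total_epochs):
    """Keep samples whose SNR is at or above the current floor.

    samples: list of (snr_db, payload) tuples.
    """
    floor = min_snr_for_epoch(epoch, total_epochs)
    return [sample for sample in samples if sample[0] >= floor]


# Hypothetical dataset index with one sample per SNR level.
dataset = [(-20.0, "a"), (-10.0, "b"), (0.0, "c"), (10.0, "d"), (18.0, "e")]
first_epoch = curriculum_subset(dataset, epoch=0, total_epochs=10)
last_epoch = curriculum_subset(dataset, epoch=9, total_epochs=10)
```

An adaptive variant would replace the epoch-based fraction with a quantity derived from validation performance, which is the design question raised above.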

2.4.5. Goal V: Robust and Open-World Training Strategies

Adversarial Training: The min–max objective seeks parameters minimizing worst-case loss: \min_{θ} E [\max_{δ} L (f_{θ} (x + δ), y)], subject to {∥ δ ∥}_{\infty} \leq ε. Adaptive meta-learning-based adversarial training [74] treats robustness as a meta-learning problem. Adversarial training provides a strong empirical robustness baseline but at a 2–4× training-time cost and typically a 1–3 percentage-point clean-accuracy penalty, motivating joint designs with feature-purification preprocessing in Module II.

Incremental and Continual Learning: PASS-Net [75] and orthogonal pseudo-prototype [76] approaches demonstrate class-incremental learning viability. Open-set domain adaptation [77] addresses simultaneous domain shift and class extension. The central challenge is catastrophic forgetting: accuracy on previously learned classes deteriorates as new ones are added. Practical incremental learners combine rehearsal of stored exemplars with regularization terms that anchor representations of old classes.

2.4.6. Summary of Training Strategy Landscape

The training landscape has evolved from simple supervised learning to a rich ecosystem of specialized strategies, with contrastive and self-supervised approaches representing the fastest-expanding direction.

2.5. Evaluation Metric Taxonomy

Given the heterogeneous demands placed on DL-ESWR systems, a single scalar metric cannot adequately characterize performance. We therefore organize the relevant evaluation metrics into five complementary families, summarized in Table 5.

(i) Accuracy-based metrics. Overall classification accuracy is the most commonly reported metric, but it masks important details. Per-SNR accuracy curves reveal robustness at low SNR, which is operationally critical. Per-class accuracy and confusion matrices expose systematic confusion (e.g., between higher-order QAM variants). Macro-F1 and balanced accuracy account for class imbalance common in radar and jamming datasets.

(ii) Efficiency-based metrics. Beyond parameter count and FLOPs, operational efficiency is more faithfully captured by end-to-end inference latency (measured on the target hardware), throughput (samples per second), peak memory footprint, and energy consumption (Joules per inference). These matter more than theoretical FLOPs for edge deployment.

(iii) Robustness-based metrics. Adversarial robustness is measured by the attack success rate under a fixed perturbation budget ε, the minimum perturbation norm required for a successful attack, and certified accuracy under provable defences. Environmental robustness is measured by cross-dataset accuracy and accuracy under RF impairments.

(iv) Calibration and uncertainty metrics. Expected calibration error (ECE), Brier score, and the area under the risk–coverage curve quantify how well a classifier’s confidence matches its actual accuracy. These are essential for trustworthy deployment (see Section 3.6).

(v) Open-world and incremental metrics. For open-set recognition, the open-set classification rate (correctly classifying known classes while rejecting unknown ones) is more informative than closed-set accuracy alone. The harmonic mean of known-class accuracy and unknown-class rejection rate, and AUROC on the known-vs-unknown discrimination task, are standard open-set metrics. For class-incremental learning, average accuracy across all observed tasks and the forgetting measure quantify catastrophic forgetting.
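As an illustration of the calibration family (iv), the sketch below computes the expected calibration error with equal-width confidence bins and the multi-class Brier score. The bin count of 10, the half-open binning convention, and the function names are assumptions; published implementations differ in such details.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: top-class probability per sample, in [0, 1].
    correct: 1 (or True) if the prediction was right, else 0 (or False).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probabilities and one-hot labels."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total += sum(
            (p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs)
        )
    return total / len(probabilities)


# Four predictions at 90% confidence, three of which are correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
brier = brier_score([[1.0, 0.0], [0.5, 0.5]], [0, 1])
```

In the toy example the classifier claims 90% confidence but is right 75% of the time, so the ECE is 0.15; a perfectly calibrated model would score zero.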

We advocate that future DL-ESWR publications routinely report at least one metric from each applicable family rather than relying solely on average top-1 accuracy.

3. Addressing Practical Deployment Challenges Through Modular Innovation

This section examines eight core challenges confronting practical DL-ESWR deployment. The challenges and corresponding innovation modules are listed in Table 6.

3.1. Challenge I: Core Recognition Accuracy and Feature Discriminability

Achieving high accuracy under low SNR and for spectrally similar modulation types (e.g., 16-QAM vs. 64-QAM) remains the most fundamental challenge.

Module II Innovations: Multi-TFI fusion synthesizes multiple time–frequency distributions into three-channel images. The MIIF-Net framework [78] constructs multi-dimensional image inputs capturing comprehensive signal characteristics.

Module III Innovations: Hybrid CNN–transformer architectures combine local feature extraction with global self-attention. Attention innovations including MSTFA [79] enable adaptive focus on discriminative features. FAE-MSLKNet [80] captures both fine-grained details and broad spectral patterns.

Module IV Innovations: Focal loss (Equation (15)) increases the weight of hard-to-classify samples. Joint loss functions combining cross-entropy with center loss and supervised contrastive losses produce more discriminative feature spaces.

A Critical Gap: Distance to Theoretical Optimality

A fundamental limitation is the near-complete absence of comparisons against information-theoretic bounds. Classical LB methods [3,4,5] establish the Bayesian optimal classifier. The DL-ESWR literature overwhelmingly reports only relative comparisons among proposed models. Establishing the DL-to-theoretical-bound gap would provide far more informative assessments of progress.

Closing the DL-to-theoretical-bound gap requires three coordinated efforts. First, a systematic benchmarking initiative should report, for every new DL-ESWR model, its accuracy against the ALRT/GLRT upper bound under matched channel assumptions. This enables gap-vs-SNR curves that are far more informative than isolated accuracy numbers. Second, theoretical tools such as Rademacher complexity, PAC–Bayes bounds, and neural tangent kernel analysis—mature in the general DL-theory literature—can be adapted to characterize the sample complexity of AMR architectures and guide data-collection priorities. Third, the information-bottleneck perspective provides a principled language for analyzing why certain signal representations (e.g., constellation vs. STFT) are more informative than others for particular modulation families. Progress along these three fronts would transform DL-ESWR from a largely empirical discipline into one grounded in classical signal processing and learning theory.

3.2. Challenge II: Data Scarcity and Annotation Cost

Large-scale accurately labeled signal datasets are prohibitively expensive in many practical scenarios.

Module I Innovations: Data augmentation techniques from basic transformations to GAN-based generation provide the first line of defense.

Module IV Innovations: Semi-supervised and self-supervised learning dramatically reduce label requirements. Transfer learning and domain adaptation enable cross-scenario knowledge transfer. Few-shot and meta-learning provide extreme label efficiency. Zero-shot learning via VLMs in principle eliminates the need for modulation-specific training data entirely, although its current accuracy remains well below supervised baselines (Section 2.4.1).

3.3. Challenge III: Model Efficiency and Edge Deployment

Many high-performance models incur prohibitive costs precluding edge deployment. Table 7 compares the deployment requirements of representative models.

Module III Innovations: Depthwise-separable convolutions replace standard convolutions. MCNet [29] achieves cost-efficient AMC. ULCNN [81] targets UAV deployment. CPPCNet [82] introduces complex-valued partial convolutions for IoT. Lightweight networks [83] combine depthwise-separable convolution with SNR enhancement modules.

Module IV Innovations: Knowledge distillation, pruning, and federated learning reduce deployment costs. DLRT [84] targets real-time SDR processing.

3.3.1. Beyond Parameters and FLOPs: The Missing Deployment Metrics

The existing literature almost exclusively uses parameter count and FLOPs as efficiency proxies, while more operationally relevant metrics—processing latency, throughput, and power consumption—are largely overlooked. Future research should adopt comprehensive deployment evaluation methodologies.

Accuracy and computational efficiency cannot be optimized independently in DL-ESWR; the right balance depends on the deployment hardware, and model choice should scale accordingly. Three distinct operating points are identifiable. (i) Server-class regime: When compute, memory, and power budgets are effectively unconstrained (e.g., infrastructure nodes, back-end processing, and training-time inference), one can afford high-capacity backbones with hundreds of thousands to millions of parameters, selecting purely for peak accuracy. (ii) Mid-edge regime: For cognitive radio gateways, small-cell base stations, and similar devices with tens of MB in memory and moderate power budgets, mid-sized backbones (∼10–100 K parameters) combined with one compression technique (pruning or moderate quantization or knowledge distillation) typically give the best accuracy per unit of hardware cost. (iii) Ultra-edge regime: For UAV payloads, handheld receivers, and IoT-class sensors where memory is measured in kilobytes and power in milliwatts, ultra-compact backbones (<10 K parameters) combined with aggressive compression (structured pruning plus post-training integer quantization, optionally with distillation) are required. The general empirical observation across the literature is that moving from the server-class regime to the ultra-edge regime costs only a few percentage points of average accuracy on standard benchmarks while reducing parameter count by orders of magnitude, meaning that aggressive compression is viable whenever the deployment does not demand peak performance at the lowest SNR. Knowledge distillation, structured pruning, and post-training quantization (Module IV) are the dominant mechanisms for navigating this balance, and the choice among them should be guided by the specific bottleneck (parameter count, latency, or energy) at the target deployment tier.

3.3.2. Scalability Pathways for Real-Time Systems

Real-time ESWR imposes strict constraints: in a cognitive radio network, modulation decisions must be made within the channel coherence time (∼1–10 ms); in electronic warfare, threat recognition latency typically must be below 1 ms. Our four-module framework supports real-time scalability along four complementary axes.

Module II—representation-level acceleration. Using raw I/Q directly avoids the computational overhead of time–frequency transforms (STFT and CWD), which can dominate preprocessing latency. When richer representations are necessary, fixed-size windows, fast approximate transforms, and pre-computed feature caches can reduce latency by 2–5×.

Module III—architectural scalability. Depthwise-separable convolutions reduce FLOPs by 4–9× with marginal accuracy loss. Structured sparsity and block-wise pruning yield hardware-friendly acceleration on GPUs and AI accelerators. Early-exit architectures allow easy samples to exit shallow layers, delivering variable per-sample latency.
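The size of the FLOP reduction follows from simple counting, sketched below for a single layer. The per-output multiply-accumulate (MAC) counts are the standard ones for a dense convolution and for a depthwise convolution followed by a 1×1 pointwise convolution; the layer dimensions and function names are hypothetical.

```python
def standard_conv_macs(kernel_taps, c_in, c_out, n_out):
    """MACs for a dense convolution: every output channel sees every input channel."""
    return kernel_taps * c_in * c_out * n_out


def depthwise_separable_macs(kernel_taps, c_in, c_out, n_out):
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_taps * c_in * n_out
    pointwise = c_in * c_out * n_out
    return depthwise + pointwise


def reduction_factor(kernel_taps, c_in, c_out, n_out=1):
    """Ratio of dense to depthwise-separable cost: K*C_out / (K + C_out)."""
    return standard_conv_macs(kernel_taps, c_in, c_out, n_out) / depthwise_separable_macs(
        kernel_taps, c_in, c_out, n_out
    )


# Hypothetical 1-D layer: 8-tap kernel, 64 -> 64 channels, 128 output samples.
ratio_1d = reduction_factor(8, 64, 64, 128)
# Hypothetical 2-D layer: 3x3 kernel (9 taps), 64 -> 64 channels.
ratio_2d = reduction_factor(9, 64, 64)
```

The ratio simplifies to K·C_out / (K + C_out), so it approaches the number of kernel taps K when the output channel count is large; the two examples give roughly 7.1× and 7.9×.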

Module IV—optimization-time compression. Post-training integer quantization (INT8/INT4) yields 2–4× speedup with <1% accuracy drop on AMR tasks. Knowledge distillation is particularly effective in ESWR owing to the relatively small number of classes.
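The core of post-training integer quantization can be sketched with symmetric per-tensor INT8 quantization of a weight vector. This is one common scheme among several (per-channel scales and asymmetric zero-points are also widely used); the function names and weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weight vector from a trained layer.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error of each weight is bounded by half the scale, which is why layers with a few large outlier weights tend to quantize less gracefully than layers with a compact weight range.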

Hardware-level scalability. Hardware-aware NAS directly optimizes architectures for specific edge targets (FPGAs, ARM CPUs, and dedicated DSPs). Reported end-to-end latencies of ∼0.3 ms on DSP platforms (DLRT [84]) and ∼0.8 ms on server GPUs demonstrate that sub-millisecond inference is already attainable; further gains are expected from joint Module III–Module IV co-design.

Hence, the four-module framework is directly relevant to real-time scalability: the primary bottleneck is localized to at most two modules per deployment scenario, and the modular perspective makes it straightforward to identify and optimize the critical path.

3.4. Challenge IV: Environmental Robustness and Generalization

DL-ESWR models trained on specific conditions often degrade severely in different environments.

Module II Innovations: Anti-noise representations based on HOCs and spectral correlation functions provide inherent noise suppression. CFO-robust designs [85,86] address parameter mismatch.

Module III Innovations: Attention mechanisms [87] enable adaptive focus on discriminative regions. Multi-scale architectures capture characteristics at different resolutions. Noise-aware ensemble learning [88] combines specialized classifiers.

Module IV Innovations: Multi-task learning with SNR estimation provides implicit noise awareness. AFLNet [85] guides cross-channel AMC. The physics-aware framework [86] with prototype consistency addresses noisy labels.

The Gap Between Laboratory and Real-World Acquisition

Real-world signal acquisition introduces numerous distortions absent from standard benchmarks, which we group into four categories. (i) Front-end RF impairments: ADC quantization noise, LNA intermodulation, I/Q mixer imbalance, local-oscillator phase noise, and sampling-clock drift, jointly producing a mismatched I/Q distribution. (ii) Channel-level effects: Time-varying multipath, Doppler, co-channel interference, and impulsive noise, poorly captured by idealized AWGN/Rayleigh models. (iii) Protocol-level variability: Symbol-rate jitter, frame boundaries, pilot sequences, and FEC, absent from continuous simulated waveforms. (iv) Observation-window effects: Partial bursts, overlapping emitters, or short snapshots with duration mismatched to the training window.

Evidence from OTA studies. The simulation-to-OTA gap is directly documented in recent work: O’Shea et al. [89] collected 1.44M OTA samples with two USRP B210 SDRs (independent oscillators, ≈2 ppm drift) and reported that even matched-OTA ResNet training achieves only ∼95.6% at high SNR with sharp degradation at lower SNR; Oncu et al. [90] demonstrated 89.7% real-time OTA radar classification at 200 MSps using a USRP N320 with GPU-accelerated DBSCAN; and Jagannath [91] released RadComOta, the only public OTA dataset with joint modulation-and-signal-type labels for heterogeneous comm/radar waveforms. Together these works confirm that mixed simulation–measurement datasets, physics-informed augmentation, and domain adaptation are necessary elements of a robust deployment pipeline; a standardized OTA benchmark along the lines of RadComOta [91] is essential for community-wide progress.

The thermographic-monitoring community offers three practices that transfer directly to DL-ESWR robustness. (i) Environment-stratified evaluation: Rather than reporting a single aggregate accuracy, thermographic studies routinely stratify results by ambient temperature, emissivity, and lighting, exposing hidden failure modes. The ESWR analogue is stratification by channel type, CFO, and hardware platform, which we advocate as a default reporting practice. (ii) Statistical pre-whitening and normalization: Robust thermographic pipelines apply environment-aware statistical normalization before feeding data to the network. The ESWR analogue is physics-aware preprocessing, such as CFO compensation and adaptive gain control combined with learned features. (iii) Anomaly-aware decision-making: Thermographic pipelines pair the primary classifier with a statistical anomaly detector to flag out-of-distribution samples. The ESWR analogue is the set of uncertainty-aware and open-set extensions discussed in Section 3.6 and Section 3.8.

Mitigation strategies. Mixed simulation–measurement datasets, physics-informed augmentation (injecting realistic RF-chain effects into simulated data), and domain-adaptation techniques are all necessary elements of a robust deployment pipeline. A broader community effort toward standardized over-the-air benchmarks, along the lines of RadComOta [91], is essential.

3.5. Challenge V: Security and Adversarial Robustness

DL models for ESWR exhibit alarming vulnerability to adversarial attacks. The main methods of adversarial attack and defense for DL-ESWR are summarized in Table 8.

3.5.1. Understanding and Implementing Adversarial Attacks

The core FGSM formulation seeks an imperceptible perturbation:

x_{adv} = x + ε \cdot sign (\nabla_{x} L (θ, x, y)), s.t. {∥ x_{adv} - x ∥}_{\infty} \leq ε

(16)

PGD iteratively refines the perturbation: x_{t + 1} = Π_{ε} {x_{t} + α \cdot sign (\nabla_{x} L (θ, x_{t}, y))}, where Π_{ε} denotes projection onto the ℓ_∞ ball of radius ε around the clean input x and α is the step size. DNN-based classifiers are highly susceptible [87] to both white-box and black-box attacks. Universal adversarial perturbations (UAPs) [92], backdoor attacks [93], cross-modal attacks [94], and channel-resilient ensemble attacks [95] have been extensively explored.
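The two attacks can be sketched on a toy binary logistic-regression classifier, for which the input gradient of the cross-entropy loss has the closed form (σ(w·x + b) − y)·w. The model, its weights, and the function names are hypothetical stand-ins for a deep AMR network, where the gradient would instead come from automatic differentiation.

```python
import math


def sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input of a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single-step attack of Eq. (16): move each input by eps along the gradient sign."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated signed-gradient steps, each followed by projection onto the eps-ball."""
    x_t = list(x)
    for _ in range(steps):
        grad = input_gradient(w, b, x_t, y)
        x_t = [xi + alpha * sign(gi) for xi, gi in zip(x_t, grad)]
        # Projection: clip every coordinate to within eps of the clean input.
        x_t = [min(max(xt, x0 - eps), x0 + eps) for xt, x0 in zip(x_t, x)]
    return x_t


# Hypothetical model and clean sample with true label y = 1.
w, b = [1.0, -2.0], 0.0
x_clean, y_true = [0.5, 0.2], 1
x_fgsm = fgsm(w, b, x_clean, y_true, eps=0.1)
x_pgd = pgd(w, b, x_clean, y_true, eps=0.1, alpha=0.05, steps=5)
```

In this toy case the clean logit is 0.1 (correctly classified) and the FGSM sample has a logit of about −0.2, so a perturbation bounded by 0.1 per coordinate already flips the decision.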

3.5.2. Adversarial Defense Strategies

Module II Defense: HFAD [96] removes perturbations via frequency-domain processing. Co-VQMAE [97] combines vector quantization with masked autoencoder reconstruction.

Module IV Defense: Adversarial training remains the most adopted strategy. Meta-learning-based adversarial training [74] enables adaptation to unseen attacks. Multi-distillation defense transfers knowledge from multiple teachers.

3.6. Challenge VI: Interpretability and Trustworthiness

The “black-box” nature of DL models limits adoption in high-reliability applications.

Post Hoc Methods: Decision tree-based surrogate models [98] and Grad-CAM variants highlight influential input regions.

Knowledge–Data Fusion: The knowledge-embedded convolutional transformer [99] incorporates communication-theoretic priors. The authors of [100] established a domain-knowledge and AI-integrated framework for reducing access delay, queuing delay, and transmission delay in time-sensitive wireless networks. Hybrid models [101] fusing DL and domain knowledge combine physics-based extraction with data-driven learning. Data and knowledge dual-driven AMC for 6G [102] provides a systematic integration framework. Three concrete physics–AI integration paradigms have emerged. (i) Physics-informed data augmentation: Physics-based channel simulators (3GPP and ray tracing) inject realistic multipath, fading, and hardware-impairment distributions into augmented samples, providing a structurally accurate inductive bias. (ii) Signal-theory-informed architectures: Networks that embed known signal-processing operations (matched filtering, cyclostationary extraction, and complex-valued convolutions) as non-learnable or lightly learnable layers effectively encode physical priors, as exemplified by [48,49,99]. (iii) Physics-consistent loss functions: Adding regularization terms penalizing predictions inconsistent with known signal invariances (modulation symmetry and pulse-shape constraints) yields representations that generalize better to unseen channels. Systematic comparison of these three paradigms remains an open problem.

Model Uncertainty Quantification

For mission-critical ESWR applications—electronic warfare, spectrum law enforcement, and safety-of-life communications—producing a single predicted label is insufficient; a calibrated uncertainty measure is equally critical. DL-ESWR models have traditionally reported only top-1 accuracy and largely ignored predictive uncertainty, but a growing ESWR-specific research focus now addresses this gap along four methodological directions.

Deep ensembles for AMC. Yang and Sahay [103] train multiple independent CNNs to produce predictive distributions, consistently outperforming single-model baselines under in-distribution, out-of-distribution, and low-SNR conditions.

Bayesian neural networks for incremental AMC. Luu et al. [104] combine frequentist pretraining with Laplace approximation and variational inference, enabling uncertainty-aware incremental classification that refines predictions as samples arrive and detects unseen modulations via confidence patterns.

Open-set recognition via confidence thresholding and OpenMax. Zhou et al. [105] apply confidence-score thresholding and OpenMax to radar-jamming recognition; Xiao et al. [106] extend this to compound jamming with a multi-task, multi-label framework combining time–frequency reconstruction and extreme-value modeling.

Robust Bayesian learning for wireless AI. Zecchin et al. [107] provide calibration guarantees under model misspecification and data outliers, pervasive conditions in wireless settings.
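As a rough illustration of the deep-ensemble direction listed above, the following sketch averages the softmax outputs of several independently trained members and uses the entropy of the averaged distribution as an uncertainty score. It is not the implementation of [103]; the function names and the toy logits are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average the softmax outputs of all ensemble members for one input.

    Returns the predicted class index, the averaged distribution, and its
    entropy in nats (higher entropy means less agreement or confidence).
    """
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    label = max(range(n_classes), key=lambda k: mean[k])
    return label, mean, entropy


# Hypothetical logits from three ensemble members for a 3-class problem.
agree = [[4.0, 0.5, 0.1], [3.8, 0.4, 0.3], [4.2, 0.2, 0.0]]
disagree = [[4.0, 0.5, 0.1], [0.3, 3.9, 0.2], [0.1, 0.4, 4.1]]

label_a, mean_a, h_a = ensemble_predict(agree)
label_d, mean_d, h_d = ensemble_predict(disagree)
print(label_a, round(h_a, 3), round(h_d, 3))
```

When the members disagree, the averaged distribution flattens and its entropy rises, which is the signal such systems can use to flag low-SNR or out-of-distribution inputs.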

Standard calibration metrics (ECE and Brier score) should be routinely reported alongside accuracy. Model uncertainty quantification is an essential ingredient of the “trustworthy AI” future direction; the limited number of DL-ESWR studies explicitly reporting uncertainty remains a significant gap between current research and real-world deployment.
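The two calibration metrics named above can be computed from a model's predicted class probabilities in a few lines. The sketch below assumes equal-width confidence bins for ECE and the multi-class (sum over classes) form of the Brier score; the toy probabilities and labels are invented for illustration.

```python
def brier_score(probs, labels):
    """Multi-class Brier score: mean squared distance to the one-hot label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins over the top-1 confidence."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            # Weight each bin's confidence/accuracy gap by its sample share.
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


# Toy two-class predictions (hypothetical values).
probs = [[0.95, 0.05], [0.85, 0.15], [0.65, 0.35], [0.25, 0.75]]
labels = [0, 0, 1, 1]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Reporting both values per SNR level, rather than a single aggregate, would show whether calibration degrades in the same regime where accuracy does.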

3.7. Challenge VII: System Integration and Cooperative Design

In MIMO, multi-signal overlap, OFDM, and ISAC scenarios, AMR cannot operate in isolation.

Module II Innovations: Blind source separation and frequency-domain sliding window detection [108,109] separate overlapping signal components.

Module III and Module IV Innovations: For MIMO systems, zero-forcing (ZF) equalization separates spatial streams before CNN classification. Cooperative AMC methods [110] design distributed frameworks. For OFDM, learning-driven receiver designs [111] jointly optimize recognition and demodulation. NAS [61] discovers optimal architectures for ISAC scenarios. MIMO-NN architectures [84] demonstrate multi-task cooperative benefits.
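The ZF step mentioned above can be sketched for the simplest case, a square 2x2 channel with perfect channel knowledge, where ZF reduces to multiplying the received vector by the inverse channel matrix. This is an assumption-laden toy (noiseless, known channel, hypothetical channel coefficients); general ZF uses the pseudo-inverse, and practical systems must estimate the channel first.

```python
def zf_equalize_2x2(H, y):
    """Zero-forcing for a square 2x2 channel: x_hat = H^{-1} y."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (near-)singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [inv[0][0] * y[0] + inv[0][1] * y[1],
            inv[1][0] * y[0] + inv[1][1] * y[1]]


# Hypothetical flat-fading channel and one QPSK symbol per transmit antenna.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
x = [1 + 1j, -1 + 1j]

# Noiseless received vector y = H x (the two streams are mixed here).
y = [H[0][0] * x[0] + H[0][1] * x[1],
     H[1][0] * x[0] + H[1][1] * x[1]]

x_hat = zf_equalize_2x2(H, y)  # separated streams, ready for a classifier
print(x_hat)
```

Each recovered stream can then be passed to a single-stream modulation classifier; with noise present, ZF amplifies it when the channel is ill-conditioned, which is one reason joint designs are attractive.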

3.8. Challenge VIII: Open-World Recognition and Class-Incremental Learning

Traditional DL-ESWR models operate under a closed-set assumption.

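Module III Innovations: Reconstruction-based frameworks employ class-information-guided reconstruction [112]. The multi-view discriminant framework [113] integrates multiple feature perspectives. GNN-assisted open-set recognition [47] models known class topology. The prototype-based decision rule accepts the most probable known class only when its posterior probability reaches a threshold δ, and otherwise rejects the sample as unknown; with K denoting the set of known classes, it is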

\hat{y} = \begin{cases} \arg\max_{k \in K} P(y = k \mid x), & \text{if } \max_{k \in K} P(y = k \mid x) \geq \delta \\ \text{Unknown}, & \text{otherwise} \end{cases}

(17)
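A minimal sketch of the decision rule in Equation (17) follows, assuming the posterior P(y = k | x) is obtained from a softmax over classifier logits. The class names, logits, and threshold value are hypothetical and chosen only to exercise both branches.

```python
import math

UNKNOWN = "Unknown"


def open_set_decide(logits, classes, delta):
    """Return the arg-max known class if its posterior is >= delta, else Unknown."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    k = max(range(len(probs)), key=lambda i: probs[i])
    return classes[k] if probs[k] >= delta else UNKNOWN


known = ["BPSK", "QPSK", "16QAM"]          # the known class set K
confident = [5.0, 1.0, 0.5]                # one class clearly dominates
ambiguous = [1.0, 0.9, 1.1]                # nearly flat posterior

print(open_set_decide(confident, known, delta=0.9))
print(open_set_decide(ambiguous, known, delta=0.9))
```

The threshold δ trades false rejections of known classes against false acceptances of unknown ones, so it is usually tuned on a validation split that includes held-out classes.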

Open-set AMC using multiple domain representations [114] constructs comprehensive feature spaces. Figure 8 depicts the differences between open-set recognition, closed-set classification and class-incremental learning, and the comparisons are listed in Table 9.

Module IV Innovations: PASS-Net [75] employs pseudo-classes and stochastic classifiers for class-incremental AMC. Orthogonal pseudo-prototype methods [76] mitigate catastrophic forgetting. Open-set domain adaptation [77] jointly addresses domain shift and class extension. Fine-grained open-set classification [115] leverages self-supervised contrastive pretraining.

4. Future Research Directions and Outlook

Building upon the systematic analysis in Section 3, this section delineates seven promising future research directions. Table 10 summarizes the future directions of DL-ESWR, and the research roadmap is illustrated in Figure 9.

4.1. Universality and Scalability (Extending Challenge VIII)

Future ESWR models must transition from static closed-set classifiers to dynamic open-world systems that are capable of continuous learning. The integration of meta-learning with lifelong learning frameworks represents a promising pathway. Developing universal signal representation spaces where all modulation types can be meaningfully embedded remains a fundamental challenge.

Empirical-validation path. Progress here can be measured by: (i) cross-dataset, cross-domain classification on a unified benchmark combining RadioML, HisarMod, and at least one OTA split; (ii) incremental-learning evaluation reporting both average accuracy across all observed tasks and the forgetting measure; and (iii) open-set classification rate and AUROC on novel modulation classes. A model that retains more than 90% of the joint-training accuracy while being incrementally trained on three or more successive task splits would constitute compelling empirical validation.
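The two incremental-learning quantities in item (ii) can be computed from a lower-triangular accuracy table. The sketch below assumes the common convention that acc[t][j] is the accuracy on task j measured after training on task t, and that forgetting for a task is the drop from its best earlier accuracy to its final accuracy; the numbers are invented.

```python
def average_accuracy(acc):
    """Mean accuracy over all observed tasks after the final training stage."""
    final = acc[-1]
    return sum(final) / len(final)


def forgetting_measure(acc):
    """Average drop from the best earlier accuracy to the final accuracy.

    acc[t] holds t + 1 values: accuracy on tasks 0..t after training task t.
    The most recent task is excluded because it cannot yet be forgotten.
    """
    n_tasks = len(acc)
    if n_tasks < 2:
        return 0.0
    drops = []
    for j in range(n_tasks - 1):
        best_earlier = max(acc[t][j] for t in range(j, n_tasks - 1))
        drops.append(best_earlier - acc[-1][j])
    return sum(drops) / len(drops)


# Hypothetical results for three successive modulation-class splits.
acc = [
    [0.95],
    [0.90, 0.93],
    [0.84, 0.88, 0.91],
]
print(average_accuracy(acc), forgetting_measure(acc))
```

Dividing the final average accuracy by the accuracy of a model trained jointly on all splits gives the retention ratio referred to in the validation target.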

4.2. Extreme Efficiency (Extending Challenge III)

Hardware-aware NAS (HW-NAS) that automatically designs optimal architectures under explicit hardware constraints is a critical direction. AutoML pipelines tailored for ESWR would significantly reduce expert effort in model design and deployment.

Empirical-validation path. Validation should report end-to-end inference latency on at least three edge targets (FPGA, ARM CPU, and dedicated DSP), peak memory footprint, and energy per inference, in addition to parameter count and FLOPs. A candidate architecture that achieves ≤1 ms latency on an ARM Cortex-class processor while staying within three percentage points of a server-grade ResNet baseline on RML2016.10a would constitute convincing validation.
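A latency report of the kind described above could start from a simple timing harness such as the following sketch. It measures host wall-clock time only; peak memory and energy per inference require target-specific tooling that is not shown. The `dummy_classifier` is a stand-in assumption, not a real ESWR model.

```python
import statistics
import time


def benchmark_latency(infer, sample, warmup=10, runs=100):
    """Time repeated single-sample inference and summarize in milliseconds."""
    for _ in range(warmup):          # warm caches before measuring
        infer(sample)
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        times_ms.append((time.perf_counter() - start) * 1e3)
    times_ms.sort()
    p95_index = min(len(times_ms) - 1, int(0.95 * len(times_ms)))
    return {
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[p95_index],
        "max_ms": times_ms[-1],
    }


def dummy_classifier(iq):
    """Placeholder model: thresholds the energy of an I/Q frame."""
    energy = sum(i * i + q * q for i, q in iq)
    return 0 if energy < len(iq) else 1


frame = [(0.1, -0.2)] * 128          # hypothetical 128-sample I/Q frame
stats = benchmark_latency(dummy_classifier, frame, warmup=5, runs=50)
print(stats)
```

Reporting a tail percentile alongside the median matters for real-time spectrum monitoring, where occasional slow inferences can violate a deadline even when the typical latency is acceptable.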

4.3. Trustworthy AI (Extending Challenges V and VI)

Robustness certification methods providing provable guarantees on classifier behavior under bounded perturbations would enhance deployment confidence. Concept-based interpretability approaches grounding decisions in human-understandable signal processing concepts offer a promising direction for intrinsic interpretability.

Empirical-validation path. Validation requires jointly reporting: (i) certified accuracy under an ℓ_2/ℓ_∞ perturbation budget; (ii) expected calibration error (ECE) and Brier score on matched and OOD test splits; and (iii) human-rated interpretability on a sample of predictions using concept-grounded explanations. Results that simultaneously improve certified robustness and calibration without degrading clean accuracy would constitute strong empirical evidence for trustworthy DL-ESWR.
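One way to obtain the certified accuracy in item (i) for the ℓ_2 case is Gaussian randomized smoothing, which is used here purely as an illustrative choice and is not prescribed by the text. Under that technique, a smoothed classifier whose top class has a lower-bounded probability p_A > 0.5 under noise of standard deviation σ is certified within radius σ·Φ⁻¹(p_A). The per-sample records below are invented, and ℓ_∞ certification would need a different method.

```python
from statistics import NormalDist


def certified_radius(p_a_lower, sigma):
    """l2 radius certified by Gaussian smoothing; 0.0 when the model abstains."""
    if p_a_lower <= 0.5:
        return 0.0
    p = min(p_a_lower, 1.0 - 1e-12)   # keep inv_cdf inside its open domain
    return sigma * NormalDist().inv_cdf(p)


def certified_accuracy(records, sigma, radius):
    """Fraction of samples that are correct, not abstained, and certified.

    records: iterable of (is_correct, p_a_lower) pairs, one per test sample.
    """
    hits = 0
    for is_correct, p_a_lower in records:
        if is_correct and p_a_lower > 0.5:
            if certified_radius(p_a_lower, sigma) >= radius:
                hits += 1
    return hits / len(records)


# Hypothetical per-sample results from a smoothed modulation classifier.
records = [(True, 0.99), (True, 0.80), (False, 0.95), (True, 0.55)]
for r in (0.0, 0.25, 1.0):
    print(r, certified_accuracy(records, sigma=0.5, radius=r))
```

Plotting certified accuracy against the radius, at several SNR levels, gives the robustness curve that the validation path asks to be reported next to calibration metrics.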

4.4. Knowledge–Data Fusion (Extending Challenges I and VI)

Physics-informed neural networks (PINNs) incorporating signal propagation models and channel effects as inductive biases promise physically consistent learned representations. Knowledge graphs formalizing relationships between modulation types and channel conditions could provide structured prior knowledge, guiding network design and training.

The integration of physical models with neural networks has been demonstrated to improve both robustness and interpretability in signal-analysis systems. For DL-ESWR specifically, three concrete physics–AI integration paradigms have emerged. (i) Physics-informed data augmentation: Rather than generating synthetic training signals from arbitrary distributions, physics-based channel simulators (e.g., 3GPP and ray tracing) can inject realistic multipath, fading, and hardware-impairment distributions into augmented samples, providing the model with a structurally accurate inductive bias. (ii) Signal-theory-informed architectures: Network designs that embed known signal-processing operations (matched filtering, cyclostationary extraction, and complex-valued convolutions) as non-learnable or lightly learnable layers effectively encode physical priors. The knowledge-embedded convolutional transformer [99] and complex-valued networks [48,49] exemplify this approach. (iii) Physics-consistent loss functions: Adding regularization terms that penalize predictions that are inconsistent with known signal invariances (e.g., modulation symmetry and pulse-shape constraints) yields representations that generalize better to unseen channels. We identify the systematic comparison of these three paradigms and their combination as a core open problem.
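Paradigm (i) can be illustrated with a deliberately simple impairment chain applied to complex baseband samples: a flat Rayleigh fading gain, a carrier frequency offset with random initial phase, and additive white Gaussian noise at a target SNR. This is a toy sketch under stated assumptions, far simpler than the 3GPP or ray-tracing simulators mentioned above; the function name and parameter choices are hypothetical.

```python
import cmath
import math
import random


def augment_iq(samples, snr_db, cfo_norm, rng):
    """Apply flat fading, frequency/phase offset, and AWGN to complex samples.

    cfo_norm is the carrier frequency offset in cycles per sample.
    """
    # Flat Rayleigh fading: one complex Gaussian gain with unit mean power.
    h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    phase0 = rng.uniform(0.0, 2.0 * math.pi)
    faded = [h * s * cmath.exp(1j * (2.0 * math.pi * cfo_norm * n + phase0))
             for n, s in enumerate(samples)]
    # Scale the noise to the requested SNR relative to the faded signal power.
    sig_power = sum(abs(v) ** 2 for v in faded) / len(faded)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    std = math.sqrt(noise_power / 2.0)
    return [v + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for v in faded]


rng = random.Random(0)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(64)]
augmented = augment_iq(qpsk, snr_db=10.0, cfo_norm=0.001, rng=rng)
print(len(augmented), augmented[0])
```

Drawing the fading gain, offset, and SNR afresh for every training sample exposes the network to a distribution of physically plausible distortions rather than to arbitrary perturbations.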

Empirical-validation path. Validation should compare identical architectures trained with and without physics-informed components on both synthetic and OTA data, reporting the generalization gap reduction that is attributable to each paradigm. A physics-informed model that substantially narrows the performance gap between simulation and OTA environments, while producing domain-consistent feature attributions, would constitute compelling validation.

4.5. Foundation Model-Driven Approaches (Extending Challenge II)

Wireless signal foundation models—large-scale models pretrained on diverse signal datasets capturing universal wireless patterns—represent a transformative opportunity. Such models could serve as general-purpose feature extractors for multiple downstream tasks, dramatically reducing per-task data and computational requirements.

Empirical-validation path. To empirically validate this direction, the community should converge on: (i) a pretraining benchmark comprising ≥1M unlabeled multi-domain wireless signals; (ii) a standard fine-tuning protocol testing n-shot classification on 10+ downstream tasks; and (iii) downstream-task average accuracy per labeled sample as the headline metric. A foundation model surpassing task-specific baselines with ≤10 labeled examples per class would constitute compelling validation.

4.6. Multi-Function Signal Waveform Recognition (New Frontier)

ISAC systems introduce qualitatively new challenges where waveforms simultaneously carry communication data and sensing information. Future recognition systems must evolve from identifying single-function signals to characterizing composite multi-function natures.

Empirical-validation path. Validation should use ISAC-oriented datasets such as RadComOta [91], reporting joint modulation-and-signal-type classification accuracy (as opposed to modulation-only accuracy) and confusion-matrix analysis across composite function categories.

4.7. Waveform Recognition for Emerging Communication Paradigms (New Frontier)

Scenario I—Traditional Modulation Carrying Semantic Content: When semantic communication systems employ conventional modulation, existing AMR techniques can identify the underlying modulation, although recognizing the semantic content presents additional challenges.

Scenario II—End-to-End Semantic Waveforms: Deep JSCC systems generate analog-like waveforms, bypassing conventional modulation. Recognizing these requires new frameworks that identify source content type and generative model architecture—a paradigm shift from modulation recognition to generative model fingerprinting.

Beyond semantic communication, RIS-assisted communication, NOMA, and cell-free massive MIMO each introduce unique recognition challenges.

Empirical-validation path. Validation requires constructing task-specific benchmarks for each paradigm (semantic-coded signals, RIS-reflected signals, NOMA superposition signals, and cell-free massive-MIMO uplinks) and reporting recognition accuracy relative to a conventional AMR baseline. A successful waveform-recognition system in this regime should achieve >80% accuracy at SNR ≥ 0 dB on semantic-content identification and should additionally demonstrate generative-model-fingerprinting capability (source architecture identification) with AUROC > 0.9.
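The AUROC target above can be checked with the rank-based definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses a direct pairwise count, which is adequate for small evaluation sets; the scores are invented.

```python
def auroc(scores_pos, scores_neg):
    """Pairwise AUROC: P(score_pos > score_neg) with ties counted as 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical detector scores: higher means "produced by the target generator".
target_scores = [0.9, 0.8, 0.4]
other_scores = [0.7, 0.3, 0.2]
print(auroc(target_scores, other_scores))
```

Because AUROC is threshold-free, it complements the fixed-threshold accuracy figure and remains meaningful when the positive and negative classes are imbalanced.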

5. Conclusions

5.1. Generalizability to Other Signal Processing Domains

Although this survey focuses on wireless signal recognition, the proposed four-module decomposition—data → representation → architecture → training—is a natural organizing principle for any deep learning-based signal classification task. We illustrate its applicability to four related domains.

Audio and acoustic recognition. Module I corresponds to AudioSet-style datasets; Module II corresponds to Mel-spectrogram or MFCC representations; Module III encompasses the same CNN/transformer architectures with domain-specific pooling; Module IV includes contrastive pretraining (wav2vec) and domain adaptation.

Biomedical signal analysis (ECG and EEG). Module II becomes wavelet decompositions or empirical mode decompositions; Module III emphasizes 1D CNNs and temporal convolution networks; Module IV places stronger weight on calibration and uncertainty (clinical stakes) and federated learning (privacy).

Radar and SAR imaging. Although Module II differs sharply (range-Doppler maps and SAR images), Modules III and IV are shared with ESWR, making knowledge transfer between the two fields natural.

Seismic and geophysical signals. Module I involves expert-labeled recordings with severe imbalance; the training strategy innovations cataloged in Module IV (few-shot and semi-supervised) transfer directly.

Thus, the four-module framework is best viewed as a general-purpose lens for analyzing any DL pipeline operating on one-dimensional or low-dimensional physical signals. The specific methods within each module are ESWR-centric, but the structural skeleton is domain-agnostic.

5.2. Summary of Contributions and Findings

This survey has presented a comprehensive review of AI-enabled ESWR through a unified system-component framework. By decomposing the DL-ESWR pipeline into four foundational modules, we have provided a structured perspective enabling precise attribution of innovations across the literature.

Our analysis reveals several insights. First, the majority of innovations arise from targeted modifications to one or two specific modules. Module III and Module IV have collectively attracted over 70% of the research attention, while Modules I and II remain underexplored. Second, the eight core challenges exhibit significant cross-module dependencies. Third, prevalent benchmark datasets have systematic limitations that raise concerns about ecological validity.

The analysis of research trends reveals rapidly accelerating directions: self-supervised and contrastive learning have experienced explosive growth; open-world recognition represents the newest frontier; GNN-based architectures are emerging as a distinctive paradigm.

Looking ahead, the convergence of foundation models, PINNs, trustworthy AI, ISAC, and semantic communication promises to reshape the ESWR landscape. The evolution from static modulation classifiers to dynamic open-world multi-function waveform recognition systems represents the grand challenge and ultimate aspiration of this field.

The practical impact of DL-ESWR research ultimately depends on bridging the gap between algorithmic innovation and system deployment, requiring sustained attention to computational efficiency, robustness, security, and interpretability.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62571450 and 62101450, in part by the Key Research and Development Program of Shaanxi under Grant 2025CY-YBXM-043, in part by the Shanghai Academy of Spaceflight Technology under Grant SAST2025-037, in part by the Open Fund of Intelligent Control Laboratory, and in part by the Open Fund of Key Laboratory of Radio Spectrum Testing Technology, The State Radio Monitoring Center Testing Center, Ministry of Industry and Information Technology.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors acknowledge all sources of support.

Conflicts of Interest

Author Lechi Zhang was employed by the company Xi’an Hengxiang Control Technology Co., Ltd. Authors Wensheng Lin and Lixin Li were employed by the company DecoreX Intelligent Technologies Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Xiao, Y.; Du, Q.; Zhao, Z.; Li, B.; Tang, X.; Zhang, S.; Song, H. RF-based Identification Framework against Unauthorized UAV Networking in Low-Altitude Economy. IEEE Trans. Netw. Sci. Eng. 2026, 13, 6538–6555. [Google Scholar] [CrossRef]
Wang, F.; Wang, X. Fast and Robust Modulation Classification via Kolmogorov-Smirnov Test. IEEE Trans. Commun. 2010, 58, 2324–2332. [Google Scholar] [CrossRef]
Luan, S.; Zhou, J.; Zhu, M.; Zhou, Z.; Ding, Z. KAN-MAE: KAN-Based Masked Autoencoder with Correntropy-Aided Near-Homogeneity Strategy for Automatic Modulation Classification. IEEE Trans. Veh. Technol. 2026, 75, 3429–3433. [Google Scholar] [CrossRef]
Gan, X.; Wang, H.; Li, X.; Liu, Z.; Jiang, H.; Wang, J. Potential Threat to Cognitive Radio Networks: A Black-Box and Label-Consistent Backdoor Attack Against Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 1395–1410. [Google Scholar] [CrossRef]
Zhang, Z.; Wang, C.; Gan, C.; Sun, S.; Wang, M. Automatic Modulation Classification Using Convolutional Neural Network with Features Fusion of SPWVD and BJD. IEEE Trans. Signal Inf. Process. Netw. 2019, 5, 469–478. [Google Scholar] [CrossRef]
Hazza, A.; Shoaib, M.; Alshebeili, S.A.; Fahad, A. An overview of feature-based methods for digital modulation classification. In 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA); IEEE: New York, NY, USA, 2013; pp. 1–6. [Google Scholar] [CrossRef]
Shi, Y.; Xu, H.; Zhang, Y.; Qi, Z.; Wang, D. GAF-MAE: A Self-Supervised Automatic Modulation Classification Method Based on Gramian Angular Field and Masked Autoencoder. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 94–106. [Google Scholar] [CrossRef]
Wang, Y.; Liu, M.; Yang, J.; Gui, G. Data-Driven Deep Learning for Automatic Modulation Recognition in Cognitive Radios. IEEE Trans. Veh. Technol. 2019, 68, 4074–4077. [Google Scholar] [CrossRef]
Peng, S.; Jiang, H.; Wang, H.; Alwageed, H.; Yao, Y. Modulation classification using convolutional Neural Network based deep learning model. In 2017 26th Wireless and Optical Communication Conference (WOCC); IEEE: New York, NY, USA, 2017; pp. 1–5. [Google Scholar] [CrossRef]
West, N.E.; O’Shea, T. Deep architectures for modulation recognition. In 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN); IEEE: New York, NY, USA, 2017; pp. 1–6. [Google Scholar] [CrossRef]
Mendis, G.J.; Wei, J.; Madanayake, A. Deep learning-based automated modulation classification for cognitive radio. In 2016 IEEE International Conference on Communication Systems (ICCS); IEEE: New York, NY, USA, 2016; pp. 1–6. [Google Scholar] [CrossRef]
Geng, Z.; Yan, H.; Zhang, J.; Zhu, D. Deep-learning for radar: A survey. IEEE Access 2021, 9, 141800–141818. [Google Scholar] [CrossRef]
Huynh-The, T.; Pham, Q.V.; Nguyen, T.V.; Nguyen, T.T.; Ruby, R.; Zeng, M.; Kim, D.S. Automatic modulation classification: A deep architecture survey. IEEE Access 2021, 9, 142950–142971. [Google Scholar] [CrossRef]
Wang, T.; Yang, G.; Chen, P.; Xu, Z.; Jiang, M.; Ye, Q. A survey of applications of deep learning in radio signal modulation recognition. Appl. Sci. 2022, 12, 12052. [Google Scholar] [CrossRef]
Peng, S.; Sun, S.; Yao, Y.D. A survey of modulation classification using deep learning: Signal representation and data preprocessing. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 7020–7038. [Google Scholar] [CrossRef] [PubMed]
Wang, X.; Zhao, Y.; Huang, Z. A survey of deep transfer learning in automatic modulation classification. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1357–1381. [Google Scholar] [CrossRef]
Huang, L.; Pan, W.; Zhang, Y.; Qian, L.; Gao, N.; Wu, Y. Data Augmentation for Deep Learning-Based Radio Modulation Classification. IEEE Access 2020, 8, 1498–1506. [Google Scholar] [CrossRef]
Dong, G.; Liu, H. Signal Augmentations Oriented to Modulation Recognition in the Realistic Scenarios. IEEE Trans. Commun. 2023, 71, 1665–1677. [Google Scholar] [CrossRef]
Manin, L.; Oliva, G.; Bianco, M.G.; Hossain, M.M.; Valić, S.; Islam, S.K.; Laganà, F.; Fiorillo, A.S.; Pullano, S.A. Application of FTIR and PCA-LR metabolites recognition for bergamot essential oil authentication. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2026, 352, 127561. [Google Scholar] [CrossRef]
Zhang, F.; Luo, C.; Xu, J.; Luo, Y. An Efficient Deep Learning Model for Automatic Modulation Recognition Based on Parameter Estimation and Transformation. IEEE Commun. Lett. 2021, 25, 3287–3290. [Google Scholar] [CrossRef]
Oikonomou, T.K.; Evgenidis, N.G.; Nixarlidis, D.G.; Tyrovolas, D.; Tegos, S.A.; Diamantoulakis, P.D.; Sarigiannidis, P.G.; Karagiannidis, G.K. CNN-Based Automatic Modulation Classification Under Phase Imperfections. IEEE Wirel. Commun. Lett. 2024, 13, 1508–1512. [Google Scholar] [CrossRef]
Peng, S.; Jiang, H.; Wang, H.; Alwageed, H.; Zhou, Y.; Sebdani, M.M.; Yao, Y. Modulation Classification Based on Signal Constellation Diagrams and Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 718–727. [Google Scholar] [CrossRef]
Song, G.; Jeon, G.; Yoon, D. Deep Learning-Based Automatic Modulation Classification for Composite Modulated Radar Signal Using Time-Frequency Image. In 2024 34th International Telecommunication Networks and Applications Conference (ITNAC); IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
Wang, J.; Yin, Y.; Shi, X.; Zhang, Z.; Zhang, Z. LPI Radar Signal Recognition Method Based on Adaptive Wavelet Decomposition and Deep Residual Shrinkage Network. IEEE Sens. J. 2025, 25, 41618–41633. [Google Scholar] [CrossRef]
Konopko, K.; Grishin, Y.P.; Janczak, D. Radar signal recognition based on time-frequency representations and multidimensional probability density function estimator. In 2015 Signal Processing Symposium (SPSympo); IEEE: New York, NY, USA, 2015; pp. 1–6. [Google Scholar] [CrossRef]
Mi, X.; Chen, X.; Liu, Q.; Hu, D. Radar signals modulation recognition based on bispectrum feature processing. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 1971, p. 012099. [Google Scholar]
Dong, Z.; Lv, F.; Wan, T.; Jiang, K.; Fang, X.; Zhang, L. Radar Signal Modulation Recognition Based on Bispectrum Features and Deep learning. In 2021 International Conference on Computer Engineering and Application (ICCEA); IEEE: New York, NY, USA, 2021; pp. 63–67. [Google Scholar] [CrossRef]
Bai, J.; Liu, X.; Wang, Y.; Xiao, Z.; Chen, F.; Zhou, H.; Jiao, L. Integrating Prior Knowledge and Contrast Feature for Signal Modulation Classification. IEEE Internet Things J. 2024, 11, 21461–21473. [Google Scholar] [CrossRef]
Huynh-The, T.; Hua, C.; Pham, Q.; Kim, D. MCNet: An Efficient CNN Architecture for Robust Automatic Modulation Classification. IEEE Commun. Lett. 2020, 24, 811–815. [Google Scholar] [CrossRef]
Huang, S.; Dai, R.; Huang, J.; Yao, Y.; Gao, Y.; Ning, F.; Feng, Z. Automatic Modulation Classification Using Gated Recurrent Residual Network. IEEE Internet Things J. 2020, 7, 7795–7807. [Google Scholar] [CrossRef]
Saraswathi, V.; Dayana, R. Federated Learning-Driven GRU for Modulation Recognition in Cognitive Radio Networks. In 2025 6th International Conference on Data Intelligence and Cognitive Informatics (ICDICI); IEEE: New York, NY, USA, 2025; pp. 1055–1060. [Google Scholar] [CrossRef]
Hong, D.; Zhang, Z.; Xu, X. Automatic modulation classification using recurrent neural networks. In 2017 3rd IEEE International Conference on Computer and Communications (ICCC); IEEE: New York, NY, USA, 2017; pp. 695–700. [Google Scholar] [CrossRef]
Zhang, Z.; Luo, H.; Wang, C.; Gan, C.; Xiang, Y. Automatic Modulation Classification Using CNN-LSTM Based Dual-Stream Structure. IEEE Trans. Veh. Technol. 2020, 69, 13521–13531. [Google Scholar] [CrossRef]
Qiu, K.; Zheng, S.; Zhang, L.; Lou, C.; Yang, X. DeepSIG: A Hybrid Heterogeneous Deep Learning Framework for Radio Signal Classification. IEEE Trans. Wirel. Commun. 2024, 23, 775–788. [Google Scholar] [CrossRef]
Chen, B.; Wang, X.; Zhu, D.; Yan, H.; Xu, G.; Wen, Y. LPI Radar Signals Modulation Recognition in Complex Multipath Environment Based on Improved ResNeSt. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 8887–8900. [Google Scholar] [CrossRef]
Yin, Y.; Wang, J.; Tang, Y.; Wang, X. ConvNeXt Architecture Enhanced with Modified SCConv for LPI Radar Modulation Classification. In 2025 IEEE 8th International Conference on Electronic Information and Communication Technology (ICEICT); IEEE: New York, NY, USA, 2025; pp. 295–300. [Google Scholar] [CrossRef]
Hantouli, F.; Hall, G.; Brown, D.; Tieman, J.; Chakravarty, S. Implementing ResNet for Real-Time Radar Signal Classification in Electronic Warfare. In 2025 IEEE International Radar Conference (RADAR); IEEE: New York, NY, USA, 2025; pp. 1–6. [Google Scholar] [CrossRef]
Wu, C.; Chen, S.; Sun, G. Automatic Modulation Recognition Framework for LPI Radar Based on CNN and Vision Transformer. In Proceedings of the 2024 8th International Conference on Computer Science and Artificial Intelligence; IEEE: New York, NY, USA, 2024; pp. 170–176. [Google Scholar] [CrossRef]
Dao, T.; Noh, D.; Pham, Q.; Hasegawa, M.; Sekiya, H.; Hwang, W. VT-MCNet: High-Accuracy Automatic Modulation Classification Model Based on Vision Transformer. IEEE Commun. Lett. 2024, 28, 98–102. [Google Scholar] [CrossRef]
Ma, W.; Cai, Z.; Wang, C. A Transformer and Convolution-Based Learning Framework for Automatic Modulation Classification. IEEE Commun. Lett. 2024, 28, 1392–1396. [Google Scholar] [CrossRef]
Shao, M.; Li, D.; Hong, S.; Qi, J.; Sun, H. IQFormer: A Novel Transformer-Based Model with Multi-Modality Fusion for Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1623–1634. [Google Scholar] [CrossRef]
Qi, P.; Zhou, X.; Zheng, S.; Li, Z. Automatic Modulation Classification Based on Deep Residual Networks with Multimodal Information. IEEE Trans. Cogn. Commun. Netw. 2021, 7, 21–33. [Google Scholar] [CrossRef]
Zhang, J.; An, S.; Meng, F.; Liu, Q. MST: A Multi-Scale Transformer Framework with Cross-Scale Token Fusion for Automatic Modulation Recognition. IEEE Wirel. Commun. Lett. 2025, 14, 4112–4116. [Google Scholar] [CrossRef]
Feng, Y.; Peng, K.; Wei, J.; Tang, Z. Window Attention Convolution Network (WACN): A Local Self-Attention Automatic Modulation Recognition Method. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1597–1608. [Google Scholar] [CrossRef]
Liu, Y.; Liu, Y.; Yang, C. Modulation Recognition with Graph Convolutional Network. IEEE Wirel. Commun. Lett. 2020, 9, 624–627. [Google Scholar] [CrossRef]
Shao, M.; Fu, Z.; Li, D.; Zhang, F.; Cai, Y.; Hong, S.; Cao, L.; Peng, Y.; Qi, J. STF-GCN: A Multi-Domain Graph Convolution Network Method for Automatic Modulation Recognition via Adaptive Correlation. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 2036–2050. [Google Scholar] [CrossRef]
Cai, Z.; Wang, C.; Ma, W.; Li, X.; Zhou, R. Lightweight Automatic Modulation Classification Based on Efficient Convolution and Graph Sparse Attention in Low-Resource Scenarios. IEEE Internet Things J. 2025, 12, 3629–3638. [Google Scholar] [CrossRef]
Li, W.; Deng, W.; Wang, K.; You, L.; Huang, Z. A Complex-Valued Transformer for Automatic Modulation Recognition. IEEE Internet Things J. 2024, 11, 22197–22207. [Google Scholar] [CrossRef]
Tu, Y.; Lin, Y.; Hou, C.; Mao, S. Complex-Valued Networks for Automatic Modulation Classification. IEEE Trans. Veh. Technol. 2020, 69, 10085–10089. [Google Scholar] [CrossRef]
Zhang, H.; Zhou, F.; Wu, Q.; Al-Dhahir, N. SSwsrNet: A Semi-Supervised Few-Shot Learning Framework for Wireless Signal Recognition. IEEE Trans. Commun. 2024, 72, 5823–5836. [Google Scholar] [CrossRef]
Guo, Y.; Zhong, D.; Sun, H.; Jiang, Z.; Ye, L.; Deng, Z.; Liu, H. SemiAMR: Semi-Supervised Automatic Modulation Recognition with Corrected Pseudo-Label and Consistency Regularization. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 107–121. [Google Scholar] [CrossRef]
Cai, J.; He, M.; Cao, X.; Gan, F. Semisupervised Radar Intrapulse Signal Modulation Classification with Virtual Adversarial Training. IEEE Internet Things J. 2024, 11, 9929–9940. [Google Scholar] [CrossRef]
Yi, G.; Hao, X.; Yan, X.; Wang, J.; Dai, J. Automatic Modulation Recognition for Radio Frequency Proximity Sensor Signals Based on Masked Autoencoders and Transfer Learning. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 8700–8712. [Google Scholar] [CrossRef]
Hou, K.; Du, X.; Cui, G.; Chen, X.; Zheng, J.; Rong, Y.; Ma, W. A Hybrid Network-Based Contrastive Self-Supervised Learning Method for Radar Signal Modulation Recognition. IEEE Trans. Veh. Technol. 2026, 75, 4437–4451. [Google Scholar] [CrossRef]
Chen, T.; Liu, K.; Huang, Q. EET-MoCo: An Efficient Embedding Transformer with Momentum Contrast Learning for Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 3784–3796. [Google Scholar] [CrossRef]
Fu, Y.; Ma, Y.; Feng, Z.; Yang, S.; Wang, Y. Hybrid-View Self-Supervised Framework for Automatic Modulation Recognition. IEEE Internet Things J. 2025, 12, 7360–7375. [Google Scholar] [CrossRef]
Bai, J.; Wang, X.; Xiao, Z.; Zhou, H.; Ali, T.A.A.; Li, Y.; Jiao, L. Achieving Efficient Feature Representation for Modulation Signal: A Cooperative Contrast Learning Approach. IEEE Internet Things J. 2024, 11, 16196–16211. [Google Scholar] [CrossRef]
Feng, S.; Wang, Y.; Wen, Z.; Xu, L.; Yan, M. Fine-Grained Transductive Prototypical Network-Based Few-Shot Signal Modulation Classification Using Coarse Labels. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 2189–2204. [Google Scholar] [CrossRef]
Li, L.; Huang, J.; Cheng, Q.; Meng, H.; Han, Z. Automatic Modulation Recognition: A Few-Shot Learning Method Based on the Capsule Network. IEEE Wirel. Commun. Lett. 2021, 10, 474–477. [Google Scholar] [CrossRef]
Jang, J.; Pyo, J.; Yoon, Y.; Choi, J. Meta-Transformer: A Meta-Learning Framework for Scalable Automatic Modulation Classification. IEEE Access 2024, 12, 9267–9276. [Google Scholar] [CrossRef]
Zhang, X.; Zhao, H.; Zhu, H.; Adebisi, B.; Gui, G.; Gacanin, H.; Adachi, F. NAS-AMR: Neural Architecture Search-Based Automatic Modulation Recognition for Integrated Sensing and Communication Systems. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1374–1386.
Cao, X. Few-Shot and Zero-Shot Radar Active Jamming Recognition Based on a Vision-Language Model. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 14795–14808.
Hou, D.; Li, L.; Lin, W.; Liang, J.; Han, Z. ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation. IEEE Trans. Wirel. Commun. 2024, 23, 8013–8028.
Chen, Z.; Wang, Z.; Gao, X.; Zhou, J.; Xu, D.; Zheng, S.; Xuan, Q.; Yang, X. Channel Pruning Method for Signal Modulation Recognition Deep Learning Models. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 442–453.
Lu, Y.; Zhu, Y.; Li, Y.; Xu, D.; Lin, Y.; Xuan, Q.; Yang, X. A Generic Layer Pruning Method for Signal Modulation Recognition Deep Learning Models. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 2123–2134.
Xu, F.; Zhu, Y.; Yang, F.; Zhang, X.; Mu, J.; Chen, H. Distributed Modulation Recognition for IoT Devices in Data-Limited Applications. IEEE Internet Things J. 2025, 12, 26740–26752.
Zhao, D.; Li, L.; Lin, W.; Hou, D.; Zhang, X.; Han, Z. FUTUREs: Feature Utility Transfer for Uncooperative Recognition of Electromagnetic Signals Toward 6G Wireless Communications. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 644–657.
Zhang, Z.; Li, H.; Li, Y.; Chen, Z.; Wang, S.; Luo, T. A Hybrid Approach for Cross-Dataset Modulation Recognition of Wireless Interference. IEEE Trans. Commun. 2025, 73, 13677–13690.
Lin, W.; Hou, D.; Huang, J.; Li, L.; Han, Z. Transfer Learning for Automatic Modulation Recognition Using a Few Modulated Signal Samples. IEEE Trans. Veh. Technol. 2023, 72, 12391–12395.
Zhang, M.; Wei, G.; Tang, P.; Ni, X.; Ding, G.; Wang, H. Semi-Supervised Domain Adaptation for Automatic Modulation Recognition in Unseen Scenarios. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1609–1622.
Xing, H.; Wang, S.; Wang, C.; Quan, D.; Xu, Y.; Zhou, H.; Xu, H.; Jiao, L. Disentangling Domain-Invariant Features From Few Samples for Automatic Modulation Classification. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 1525–1538.
Doan, V.; Le, H.; Hoang, V. Multi-In-Multi-Out Neural Network for Joint DOA Estimation and Automatic Modulation Classification. IEEE Commun. Lett. 2025, 29, 1993–1997.
Kong, W.; Jiao, X.; Liu, B.; Xu, Y.; Yang, Q. SCLMR: An End-to-End Network for Long-Tailed Modulation Recognition Based on Supervised Contrastive Learning. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 2871–2884.
Bamdad, A.; Owfi, A.; Afghah, F. Adaptive Meta-learning-based Adversarial Training for Robust Automatic Modulation Classification. In 2025 IEEE International Conference on Communications Workshops (ICC Workshops); IEEE: New York, NY, USA, 2025; pp. 292–297.
Tan, H.; Zhang, Z.; Li, Y.; Shi, X.; Wang, L.; Yang, X.; Zhou, F. PASS-Net: A Pseudo Classes and Stochastic Classifiers-Based Network for Few-Shot Class-Incremental Automatic Modulation Classification. IEEE Trans. Wirel. Commun. 2024, 23, 17987–18003.
Deng, Z.; Luo, C.; Tang, Z.; Pu, X.; Luo, Y. An Orthogonal Pseudo-Prototype and Pseudo-Target-Based Method for Few-Shot Class-Incremental Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 2542–2557.
Yan, X.; Zhong, X.; Wu, H.; Yang, P.; Wang, Q.; Chen, Y. Automatic Composite-Modulation Classification Using Cyclic-Paw-Print Features for Cognitive Aerospace Communications. IEEE Trans. Commun. 2024, 72, 5486–5502.
Liu, G.; Qian, B.; Liu, Z.; Hao, C. A Modulation Recognition Method Based on Adaptive Feature Fusion. In 2025 International Conference on Communication Networks and Smart Systems Engineering (ICCNSE); IEEE: New York, NY, USA, 2025; pp. 244–249.
Yang, H.; Wang, D.; Liu, Y.; Liu, Y. An Automatic Modulation Recognition Model with Multi-Scale Triplet Attention. In 2025 17th International Conference on Communication Software and Networks (ICCSN); IEEE: New York, NY, USA, 2025; pp. 185–189.
Zhao, Y.; Wang, Y.; Zhang, C.; Li, C.; Xiong, Z.; Zhu, L.; Niyato, D. Boosting Robustness in Automatic Modulation Recognition for Wireless Communications. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1635–1648.
Sheng, H.; Chen, K.; Zeng, W.; Geng, H.; Ma, W.; Wang, S. Carrier Frequency Offset Robust Self-Compensating Neural Network. IEEE Trans. Wirel. Commun. 2025, 24, 4073–4085.
Zhou, F.; Ren, J.; Xu, F.; Wang, Y.; Wang, W.; Zhang, P. MATF-Net: Multiscale Attention with Tristream Fusion Network for Radar Modulation Recognition in S-Band. IEEE J. Sel. Areas Sens. 2025, 2, 247–258.
Park, D.; Jeon, M.; Jeong, J.; Sim, I.; Yun, S.; Seo, J.; Kim, H. Noise-Aware Ensemble Learning for Efficient Radar Modulation Recognition. IEEE Internet Things J. 2025, 12, 28937–28949.
Xu, J.L.; Su, W.; Zhou, M. Software-Defined Radio Equipped with Rapid Modulation Recognition. IEEE Trans. Veh. Technol. 2010, 59, 1659–1667.
Xing, H.; Wang, S.; Wang, C.; Quan, D.; Mo, H.; Mei, L.; Zhou, H.; Jiao, L. AFLNet: Auxiliary Feature Learning-Guided Cross-Channel Automatic Modulation Classification. IEEE Trans. Commun. 2025, 73, 13519–13534.
Li, L.; Lin, J.; Zhou, H.; Guo, Y.; Liu, X.; Liu, F.; Jiao, L. A Physics-Aware Collaborative Framework with Prototype Consistency for Noisy Labels Signal Modulation Classification. IEEE Internet Things J. 2025, 12, 44304–44317.
Lin, Y.; Zhao, H.; Ma, X.; Tu, Y.; Wang, M. Adversarial Attacks in Modulation Recognition with Convolutional Neural Networks. IEEE Trans. Reliab. 2021, 70, 389–401.
Bu, K.; He, Y.; Jing, X.; Han, J. Adversarial Transfer Learning for Deep Learning Based Automatic Modulation Classification. IEEE Signal Process. Lett. 2020, 27, 880–884.
O’Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-air deep learning based radio signal classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179.
Oncu, S.; Karakaya, M.; Dalveren, Y.; Kara, A.; Derawi, M. Real-time radar classification based on software-defined radio platforms: Enhancing processing speed and accuracy with graphics processing unit acceleration. Sensors 2024, 24, 7776.
Jagannath, A.; Jagannath, J. Multi-task learning approach for modulation and wireless signal classification for 5G and beyond: Edge deployment via model compression. Phys. Commun. 2022, 54, 101793.
Zhou, X.; Yu, X.; Zhang, Y.; Zeng, W. Exploring Universal Adversarial Attacks on DNN-Based Automatic Modulation Recognition Using Joint Metrics. IEEE Trans. Wirel. Commun. 2025, 24, 4853–4863.
Gan, X.; Wang, H.; Li, X.; Liu, Z.; Jiang, H.; Wang, J. A Multitarget Backdoor Attack Against Automatic Modulation Recognition for IoT Wireless Signals. IEEE Internet Things J. 2025, 12, 27588–27605.
Zhang, R.; Li, Y.; Liu, J. Transferable Anti-Intelligence Recognition Radar Waveform Design Based on Adversarial Attacks. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 3798–3812.
Bao, Z.; Zhang, S.; Yang, S.; Fu, J.; Lin, Y. CCIFE: Channel-Resilient Ensemble Adversarial Attack Against DNN-Based Modulation Classifiers. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 1775–1787.
Zhang, S.; Lin, Y.; Yu, J.; Zhang, J.; Xuan, Q.; Xu, D.; Wang, J.; Wang, M. HFAD: Homomorphic Filtering Adversarial Defense Against Adversarial Attacks in Automatic Modulation Classification. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 880–892.
Zhang, S.; Song, Y.; Wang, S. Collaborative VQMAE: Defense Against Adversarial Attacks in Automatic Modulation Recognition. In ICC 2025 - IEEE International Conference on Communications; IEEE: New York, NY, USA, 2025; pp. 434–439.
Bai, J.; Lian, Y.; Wang, Y.; Ren, J.; Xiao, Z.; Zhou, H.; Jiao, L. An Interpretable Explanation Approach for Signal Modulation Classification. IEEE Trans. Instrum. Meas. 2024, 73, 2514613.
Wang, Z.; Lai, B.; Liu, X.; Feng, Z.; Xiao, L.; Zhou, F. Knowledge Embedded Convolutional Transformer Hybrid Network for Automatic Modulation Classification. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 1725–1738.
Xiao, Y.; Du, Q.; Cheng, W.; Karagiannidis, G.K.; Zhao, Z. Model-ML integrated intelligence in URLLC towards end-to-end delay fulfillment over vehicular networks. IEEE Internet Things Mag. 2023, 6, 62–68.
Hou, C.; Liu, G.; Tian, Q.; Zhou, Z.; Hua, L.; Lin, Y. Multisignal Modulation Classification Using Sliding Window Detection and Complex Convolutional Network in Frequency Domain. IEEE Internet Things J. 2022, 9, 19438–19449.
Chen, K.; Zhang, J.; Chen, S.; Zhang, S.; Zhao, H. Recognition and Estimation for Frequency-Modulated Continuous-Wave Radars in Unknown and Complex Spectrum Environments. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 6098–6111.
Yang, H.; Sahay, R. An Uncertainty Quantification Framework for Deep Learning-Based Automatic Modulation Classification. IEEE Internet Things J. 2025, 13, 5087–5096.
Luu, V.C.; Park, J.; Hong, J.P. Uncertainty-aware incremental automatic modulation classification with Bayesian neural network. IEEE Internet Things J. 2024, 11, 24300–24309.
Zhou, Y.; Shang, S.; Song, X.; Zhang, S.; You, T.; Zhang, L. Intelligent radar jamming recognition in open set environment based on deep learning networks. Remote Sens. 2022, 14, 6220.
Xiao, Y.; Zhang, R.; Yu, X.; Jiang, Y. Open-set recognition of compound jamming signal based on multi-task multi-label learning. IET Radar Sonar Navig. 2024, 18, 1235–1246.
Zecchin, M.; Park, S.; Simeone, O.; Kountouris, M.; Gesbert, D. Robust Bayesian learning for reliable wireless AI: Framework and applications. IEEE Trans. Cogn. Commun. Netw. 2023, 9, 897–912.
Wang, Y.; Gui, J.; Yin, Y.; Wang, J.; Sun, J.; Gui, G.; Gacanin, H.; Sari, H.; Adachi, F. Automatic modulation classification for MIMO systems via deep learning and zero-forcing equalization. IEEE Trans. Veh. Technol. 2020, 69, 5688–5692.
Wang, Y.; Gui, G.; Gacanin, H.; Ohtsuki, T.; Sari, H.; Adachi, F. Transfer Learning for Semi-Supervised Automatic Modulation Classification in ZF-MIMO Systems. IEEE J. Emerg. Sel. Top. Circuits Syst. 2020, 10, 231–239.
Zhang, J.; Liu, M.; Chen, Y.; Zhao, N. Spatial-Frequency Block Coding Automatic Recognition with Non-Gaussian Interference for Cognitive MIMO-OFDM Systems. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 156–167.
Qian, L.P.; Wang, C.; Wang, Q.; Wu, M.; Wu, Y.; Yang, X. OFDM Receiver Design with Learning-Driven Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 429–441.
Zhang, X.; Wang, Z.; Wang, X.; Luo, T.; Xiao, Y.; Fang, B.; Xiao, F.; Luo, D. STARNet: An Efficient Spatiotemporal Feature Sharing Reconstructing Network for Automatic Modulation Classification. IEEE Trans. Wirel. Commun. 2024, 23, 13300–13312.
Hou, J.; Xu, D.; Song, F.; Chen, Z.; Xuan, Q.; Zheng, S.; Lin, Y.; Yang, X. Multi-View Discriminant Framework for Automatic Modulation Open Set Recognition. IEEE Trans. Commun. 2025, 73, 4378–4393.
Flowers, B.; Buehrer, R.M.; Headley, W.C. Evaluating Adversarial Evasion Attacks in the Context of Wireless Communications. IEEE Trans. Inf. Forensics Secur. 2020, 15, 1102–1113.
Feng, Z.; Pei, H.; Yang, S.; Yang, C. Fine-Grained Open Set Signal Modulation Classification via Self-Supervised Pre-Training. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 4011–4025.

Figure 1. Typical application scenarios of ESWR across civilian and military domains.

Figure 2. Organization and content of this survey and its sections.

Figure 4. Taxonomy of signal representation and preprocessing methods in DL-ESWR.

Figure 5. Examples of different signal representations for the same 16-QAM signal sample at SNR = 10 dB: (a) raw I/Q time-domain waveform, (b) constellation diagram, (c) STFT time–frequency image, (d) Choi–Williams distribution (CWD), (e) Gramian angular field (GAF) image, and (f) bispectrum image.

Figure 6. Evolution timeline of core network architectures for DL-ESWR, from foundational CNN/RNN architectures (2016–2018) through hybrid CNN–LSTM designs (2018–2020) to advanced transformer, GNN, and complex-valued architectures (2020–present).

Figure 7. Taxonomy of training and optimization strategies for DL-ESWR.

Figure 8. Conceptual illustration of open-set recognition vs. closed-set classification in AMR. (a) Closed-set scenario: all test samples belong to known classes and are assigned to the closest class boundary. (b) Open-set scenario: unknown class samples (red stars) fall outside known class boundaries and must be correctly rejected. (c) Class-incremental learning: the model progressively incorporates new modulation types (dashed circles) while retaining knowledge of previously learned types.

Figure 9. Research roadmap for the future of DL-ESWR. The left column shows current challenges (Section 3) connected by arrows to corresponding future directions (right column). Dashed arrows indicate entirely new research frontiers that transcend the current challenge framework. The temporal axis indicates expected research maturity progression from near-term (2025–2027) to long-term (2028+) horizons.

Table 1. Comparison of commonly used benchmark datasets for DL-ESWR.

Dataset	Year	Public	Source	Samples	Mod. Types	SNR Range (dB)
RML2016.10a	2016	Yes	Sim.	220 K	11	$-20$ to $+18$
RML2016.10b	2016	Yes	Sim.	1.2 M	10	$-20$ to $+18$
RML2018.01a	2018	Yes	Sim.	2.56 M	24	$-20$ to $+30$
HisarMod2019	2019	Yes	Sim.	780 K	26	$-20$ to $+18$
DeepSig	2018	Yes	Sim.	$\sim 2.5$ M	24	$-20$ to $+30$
Radar (custom)	Varies	No	Sim./meas.	Varies	6–16	$-10$ to $+20$
Jamming (custom)	Varies	No	Sim.	Varies	4–12	$-5$ to $+20$

Table 2. Comparison of signal representation methods in DL-ESWR.

Representation	Domain	Info. Preserved	Noise Robust.	Comp. Cost	Best For
Raw I/Q	Time	Complete (A+P)	Low	Minimal	General AMR
A/P	Time	Explicit A/P	Low	Low	Phase-sensitive mod.
Constellation	Spatial	Symbol-level	Medium	Low	Digital QAM/PSK
STFT	Time–freq.	Joint T–F	Medium	Medium	Radar/wideband
CWD/WVD	Time–freq.	High-res T–F	Medium–High	High	Chirp/LFM signals
Bispectrum	Spectral	Higher-order	High	High	Low-SNR scenarios
HOCs	Statistical	Order statistics	High (Gaussian)	Low	Theory-backed AMR
GAF	Angular	Temporal corr.	Medium	Medium	Self-supervised
Multi-modal	Hybrid	Complementary	High	High	Max. accuracy
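
Two of the representations in Table 2 can be illustrated with a short, standard-library-only sketch: the amplitude/phase (A/P) view of complex I/Q samples, and a Gramian angular field of a real-valued sequence. The function names are hypothetical, and the GAF variant shown is the common "summation" form (min-max scaling to [-1, 1], then cos(phi_i + phi_j)); the surveyed works may use other variants or normalizations.

```python
import math

def amplitude_phase(iq):
    """Convert complex I/Q samples into (amplitude, phase) pairs."""
    return [(abs(s), math.atan2(s.imag, s.real)) for s in iq]

def gramian_angular_field(x):
    """Gramian angular summation field of a real-valued sequence.

    Assumed variant: min-max scale x to [-1, 1], map each value to an
    angle phi = arccos(x), and set image entry (i, j) to cos(phi_i + phi_j).
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        scaled = [0.0] * len(x)  # constant input: no dynamic range to scale
    else:
        scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in x]
    # Clamp to guard against rounding slightly outside arccos's domain.
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]
    phi = [math.acos(v) for v in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]

# Usage: GAF image of the in-phase component of a few I/Q samples.
iq = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
ap = amplitude_phase(iq)
gaf_image = gramian_angular_field([s.real for s in iq])
```

The GAF image is symmetric and grows quadratically with sequence length, which is consistent with the "Medium" computational cost listed for it in the table.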

Table 3. Performance comparison of representative architectures on RML2016.10a.

Architecture	Category	Input	Params	Avg. Acc. (%)	Acc. at 0 dB (%)	Key Innovation
CNN-2 (O’Shea)	CNN	I/Q	$\sim 93$ K	$\sim 56.0$	$\sim 72$	First DL-AMR baseline
ResNet	Deep CNN	I/Q	$\sim 300$ K	$\sim 61.5$	$\sim 82$	Residual learning
CLDNN	Hybrid	I/Q	$\sim 180$ K	$\sim 60.8$	$\sim 80$	CNN+LSTM+DNN
GRU–ResNet	Hybrid	I/Q	$\sim 250$ K	$\sim 62.0$	$\sim 83$	GRU + residual
MCNet	Light CNN	I/Q	$\sim 25$ K	$\sim 60.2$	$\sim 79$	Efficient convolutions
ULCNN	Ultra-light	I/Q	$\sim 5$ K	$\sim 57.5$	$\sim 75$	UAV deployment
VT-MCNet	ViT+CNN	I/Q	$\sim 420$ K	$\sim 63.8$	$\sim 85$	Vision transformer
CVT-Net	CNN+trans.	I/Q	$\sim 500$ K	$\sim 64.5$	$\sim 86$	Conv-Transformer
CV-TRN	Complex ViT	I/Q	$\sim 380$ K	$\sim 63.2$	$\sim 84$	Complex attention
STF–GCN	GNN	Multi	$\sim 450$ K	$\sim 64.3$	$\sim 85$	Graph convolution
CPPCNet	Complex CNN	I/Q	$\sim 12$ K	$\sim 60.5$	$\sim 80$	Complex partial PWC
MST	Multi-scale T	I/Q	$\sim 600$ K	$\sim 65.0$	$\sim 87$	Cross-scale fusion

Table 4. Comparison of training strategies for DL-ESWR.

Strategy	Label Req.	Data Req.	Comp. Overhead	Key Advantage	Pub. Count
Supervised (baseline)	100% labeled	Large	Standard	Simple, proven	60+
Semi-supervised	5–20% labeled	Large (mixed)	Low–medium	Uses unlabeled data	$\sim 10$
Self-supervised/CL	0% (pretrain)	Large unlabeled	High (pretrain)	No labels needed	$\sim 19$
Few-shot/Meta	1–5 per class	Small support	Medium	Fast adaptation	$\sim 12$
Zero-shot (VLM)	0 samples	Text descriptions	Very high	No signal data needed	$\sim 3$
Transfer learning	Limited target	Source + target	Medium	Cross-domain	$\sim 10$
Domain adaptation	0 target labels	Source + unlabeled	Medium	Domain invariance	$\sim 8$
Knowledge distill.	Same as teacher	Same	High (train)	Model compression	$\sim 6$
Federated learning	Distributed	Distributed	Comm. overhead	Privacy preserving	$\sim 5$
Multi-task	Multi-label	Standard	Low	Auxiliary supervision	$\sim 6$

Table 5. Proposed taxonomy of evaluation metrics for DL-ESWR. Metrics are grouped by the performance aspect they characterize.

Family	Representative Metrics	Primary Use Case
Accuracy	Overall acc., per-SNR acc., per-class acc., confusion matrix, macro-F1, balanced acc.	Core recognition performance
Efficiency	Params, FLOPs, latency (ms), throughput, memory (MB), energy (J/inference)	Edge/real-time deployment
Robustness	Attack success rate, min-perturbation norm, certified acc., cross-dataset acc.	Adversarial/environmental
Calibration	ECE, Brier score, negative log-likelihood, risk–coverage AUC	Trustworthy prediction
Open World	Open-set classification rate, AUROC, harmonic mean, forgetting measure	Unknown class & incremental
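
As a concrete reading of three metrics named in Table 5, the following minimal sketch computes macro-F1 (accuracy family) plus the Brier score and expected calibration error (calibration family) from predicted class probabilities. The function names, the equal-width confidence binning, and the toy inputs are illustrative assumptions, not definitions taken from the surveyed papers.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes

def brier_score(probs, y_true):
    """Multi-class Brier score: mean squared distance to the one-hot label."""
    total = 0.0
    for p, t in zip(probs, y_true):
        total += sum((pk - (1.0 if k == t else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(y_true)

def expected_calibration_error(probs, y_true, n_bins=10):
    """ECE with equal-width confidence bins (an assumed, common choice)."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum conf, sum correct
    for p, t in zip(probs, y_true):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == t else 0.0
    n = len(y_true)
    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    return sum(abs(acc / cnt - conf / cnt) * cnt / n
               for cnt, conf, acc in bins if cnt)

# Toy two-class example (hypothetical probabilities and labels).
probs = [[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.25, 0.75]]
labels = [0, 1, 1, 1]
preds = [p.index(max(p)) for p in probs]
scores = (macro_f1(labels, preds, 2),
          brier_score(probs, labels),
          expected_calibration_error(probs, labels))
```

Reporting a calibration metric alongside accuracy matters because a classifier can score well on macro-F1 while still being overconfident on the samples it gets wrong.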

Table 6. Mapping of eight core challenges to primary innovation modules.

Challenge	Primary Modules	Approx. Pubs.	Research Maturity
1. Core Recognition Accuracy	II, III, IV	60+	Mature
2. Data Scarcity & Annotation Cost	I, IV	40+	Active Growth
3. Model Efficiency & Edge Deploy.	III, IV	20+	Moderate
4. Environmental Robustness	II, III, IV	30+	Active Growth
5. Adversarial Security	II, IV	28+	Active Growth
6. Interpretability & Trust	III, IV (ext.)	10+	Emerging
7. System Integration & Co-Design	II, III, IV	15+	Moderate
8. Open World & Incremental	III, IV	8+	Emerging

Table 7. Comparison of lightweight DL-ESWR models for edge deployment.

Model	Params	FLOPs	Acc. (%)	Latency	Platform
ResNet (baseline)	$\sim 300$ K	$\sim 15$ M	$\sim 61.5$	$\sim 0.8$ ms (GPU)	GPU server
MCNet	$\sim 25$ K	$\sim 1.5$ M	$\sim 60.2$	N/R	CR edge
ULCNN	$\sim 5$ K	$\sim 0.5$ M	$\sim 57.5$	N/R	UAV
CPPCNet	$\sim 12$ K	$\sim 0.8$ M	$\sim 60.5$	N/R	IoT
SNR-DSNet	$\sim 406$ K	$\sim 4$ M	$\sim 62.0$	N/R	Edge
LightAMC	$\sim 50$ K	$\sim 2$ M	$\sim 59.8$	N/R	IoT sensor
Pruned ResNet	$\sim 45$ K	$\sim 3$ M	$\sim 60.0$	N/R	FPGA
KD Student	$\sim 30$ K	$\sim 2$ M	$\sim 60.8$	N/R	Mobile
DLRT	$\sim 100$ K	$\sim 5$ M	$\sim 58.0$	$\sim 0.3$ ms (DSP)	SDR

Note: Values are approximate. Acc. = average classification accuracy on RML2016.10a. N/R = not reported.
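
The accuracy-versus-size trade-off in Table 7 can be made explicit with simple arithmetic. The sketch below takes a few of the approximate parameter counts and accuracies from the table and computes, relative to the ResNet baseline, the parameter reduction factor and the accuracy drop in percentage points. The function and variable names are hypothetical, and the outputs inherit the approximate nature of the table values.

```python
# (name, approx. parameter count, approx. average accuracy in %), from Table 7.
BASELINE = ("ResNet", 300_000, 61.5)
MODELS = [
    ("MCNet", 25_000, 60.2),
    ("ULCNN", 5_000, 57.5),
    ("CPPCNet", 12_000, 60.5),
    ("KD Student", 30_000, 60.8),
]

def tradeoff(models, baseline):
    """Return (name, parameter reduction factor, accuracy drop in points)."""
    _, base_params, base_acc = baseline
    return [
        (name, base_params / params, round(base_acc - acc, 2))
        for name, params, acc in models
    ]

for name, reduction, drop in tradeoff(MODELS, BASELINE):
    print(f"{name}: {reduction:.0f}x fewer parameters, {drop:.1f} points lower accuracy")
```

Parameter counts alone do not determine latency or energy on a given platform, so these ratios are only a first-order view of deployment cost.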

Table 8. Summary of adversarial attack and defense methods for DL-ESWR.

Method	Type	Category	Threat Model	Key Mechanism
FGSM	Attack	Evasion	White box	Single-step gradient
PGD	Attack	Evasion	White box	Iterative gradient
C&W	Attack	Evasion	White box	Optimization-based
UAP	Attack	Universal	White/black	Input-agnostic perturbation
Backdoor	Attack	Trojan	Training phase	Trigger-activated misclass.
CCIFE	Attack	Ensemble	Black box	Channel-resilient transfer
CMA	Attack	Cross-modal	Black box	Cross-representation transfer
HFAD	Defense	Preprocessing	Any	Homomorphic filtering
Co-VQMAE	Defense	Preprocessing	Any	VQ + MAE purification
Adv. Training	Defense	Robust Train	Specific attacks	Augment w/ adv. examples
Meta-AT	Defense	Robust Train	Unseen attacks	Meta-learn adaptation
Multi-Distill.	Defense	KD-based	Multiple attacks	Multi-teacher distillation
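
The "single-step gradient" mechanism listed for FGSM in Table 8 can be sketched on a toy model. The example below uses a two-class logistic-regression classifier as a stand-in for a deep network, because its input gradient has a closed form; the weights, input features, and epsilon are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y, w, b, eps):
    """One FGSM step: x_adv = x + eps * sign(d loss / d x).

    For binary cross-entropy with a logistic model, the input gradient
    is (p - y) * w, where p is the predicted probability of class 1.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical feature vector, true label 1, and model parameters.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, -0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.1)
p_clean, p_adv = predict(x, w, b), predict(x_adv, w, b)
```

Each input component moves by at most eps, yet the confidence in the true class drops. PGD, the next row of the table, repeats this kind of step iteratively while projecting back into the eps-ball.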

Table 9. Comparison of open-set and incremental learning methods for AMR.

Method	Task	Approach	Module	Known Acc.	Unknown Det.
OpenMax	Open set	EVT calibration	IV	High	Moderate
CIGR	Open set	Class-guided recon.	III	High	High
Multi-View DF	Open set	Multi-view discrim.	III	High	High
GNN–OSR	Open set	Graph topology	III	Moderate	High
Expert-Know.	Open set	HOC + DL hybrid	II, III	High	High
PASS-Net	Incremental	Pseudo-class + stoch.	IV	High	N/A
Orth. Proto.	Incremental	Orthogonal proj.	IV	High	N/A
OSDA–AMC	Joint OS+DA	Domain + class adapt.	IV	High	Moderate
FG-OSR–SSL	Open set	Self-sup. + fine-tune	IV	High	High
GAN–OSR	Open set	Gen. unknown + OpenMax	I, IV	High	Moderate–High

Table 10. Mapping between future directions and current challenges.

Future Direction	Corresponding Challenge	Type	Key Technologies
Section 4.1 Universality & Scalability	Ch. 8: Open World	Extension	Meta-learning, lifelong learning
Section 4.2 Extreme Efficiency	Ch. 3: Model Efficiency	Extension	HW-NAS, AutoML
Section 4.3 Trustworthy AI	Chs. 5+6: Security+Interp.	Extension	Certified defense, intrinsic XAI
Section 4.4 Knowledge–Data Fusion	Chs. 1+6: Accuracy+Interp.	Extension	PINNs, knowledge graphs
Section 4.5 Foundation Models	Ch. 2: Data Scarcity	Extension	VLMs, wireless FM
Section 4.6 Multi-Function Waveform	—	New Frontier	ISAC, composite signals
Section 4.7 Emerging Paradigms	—	New Frontier	Semantic comm., Deep JSCC

Share and Cite

MDPI and ACS Style

Zhao, D.; Yang, J.; Zhao, D.; Zhang, L.; Xu, Z.; Cao, A.; Lin, W.; Cheng, W.; Du, Q.; Li, L. AI for Wireless Waveform Recognition: A Survey from a Component Perspective. Electronics 2026, 15, 2112. https://doi.org/10.3390/electronics15102112
