ParalIMR: Bypassing Shortcut Learning in Incremental Modulation Recognition via Parallel Reconstruction and Feature Decoupling
Abstract
1. Introduction
- We design an auxiliary reconstruction module using a parallel architecture that decouples denoising from the primary classification task. It enables effective denoising across diverse SNR conditions and modulation orders without propagating overfitting errors to the classifier. Results on the RadioML 2016 and RadioML 2018 datasets [11,12] show that the decoupled parallel structure significantly mitigates catastrophic forgetting, with particularly marked performance gains for high-SNR signals.
- We adapt a segment substitution (SS) strategy to incremental learning to alleviate shortcut learning. The rationale of SS is to break spurious correlations between background artifacts and class labels, compelling the model to prioritize modulation-invariant structural cues over environment-dependent shortcuts.
- We comprehensively evaluate ParalIMR on the RadioML 2016.10a dataset. The results confirm its robust and consistent recognition capability. Comparative experiments against benchmark methods show superior performance, while ablation studies validate the contributions of both the decoupled denoising autoencoder (DAE) branch and the adaptive SS module in mitigating catastrophic forgetting and shortcut learning.
2. Related Work
3. Problem Statement
3.1. Signal Model
3.2. Denoising Problem
3.3. Shortcut Learning Problem
4. The Proposed Method
4.1. Segment Substitution-Based Feature Decoupling
- Discrete Segment Substitution: Let us randomly select discontinuous index , and construct a set . The discrete-replaced signal is then given by
- Continuous Segment Substitution: A segment of length L is randomly selected in signal and replaced with a continuous segment from . Let u and v, , be the random starting indices in and respectively. The continuous substitution can be expressed as follows.
- The signals and should belong to the same modulation category y. The application of substitution thus weakens the channel characteristics without compromising the structural features of the signal.
- The SNR of should be less than or equal to that of . Incorporating sample points with lower SNR can force the model to extract features under more severe noise conditions, thereby enhancing its noise resistance.
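
The two substitution modes and the two donor constraints listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: signals are assumed to be stored as real-valued `(2, N)` I/Q arrays, the discrete mode is assumed to copy donor samples at the same indices, and the names `discrete_substitution`, `continuous_substitution`, and `pick_donor` are hypothetical.

```python
import numpy as np

def discrete_substitution(x, donor, num_points, rng):
    """Replace `num_points` randomly chosen sample positions of x with the
    donor samples at the same positions (index alignment is an assumption)."""
    out = x.copy()
    idx = rng.choice(x.shape[-1], size=num_points, replace=False)
    out[..., idx] = donor[..., idx]
    return out, np.sort(idx)

def continuous_substitution(x, donor, seg_len, rng):
    """Replace a length-`seg_len` segment of x starting at u with a
    contiguous donor segment starting at v (u and v drawn independently)."""
    u = int(rng.integers(0, x.shape[-1] - seg_len + 1))
    v = int(rng.integers(0, donor.shape[-1] - seg_len + 1))
    out = x.copy()
    out[..., u:u + seg_len] = donor[..., v:v + seg_len]
    return out, u, v

def pick_donor(labels, snrs, i, rng):
    """Pick a donor index with the same modulation label as sample i and an
    SNR less than or equal to that of sample i; fall back to i if none exists."""
    candidates = np.flatnonzero((labels == labels[i]) & (snrs <= snrs[i]))
    candidates = candidates[candidates != i]
    return int(rng.choice(candidates)) if candidates.size else i

rng = np.random.default_rng(0)
x = np.zeros((2, 128))      # toy target signal (I row, Q row)
donor = np.ones((2, 128))   # toy donor signal of the same class
x_disc, idx = discrete_substitution(x, donor, num_points=16, rng=rng)
x_cont, u, v = continuous_substitution(x, donor, seg_len=32, rng=rng)
print(idx.size, "points replaced; segment", (u, u + 32), "replaced")
```

Selecting the donor before substitution keeps both constraints (same class, lower or equal SNR) in one place, so either substitution mode can be applied to the returned pair without further checks.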
4.2. Denoise-Robust Classifying
- Shared Encoder: It maps the raw I/Q time-series signal into a high-dimensional latent feature space , wherein semantic abstractions are progressively distilled through hierarchical representation learning;
- Modulation Classifier: It uses to perform fine-grained discrimination among modulation categories;
- Reconstruction Decoder: It implements an invertible mapping from back to the original signal, which provides waveform-level signal recovery.
- 1D Convolution: All 2D convolution kernels are replaced with 1D kernels. This change eliminates spatial redundancy and allows precise modeling of temporal dynamics, such as instantaneous phase shifts, frequency sweeps, and amplitude modulations, via sliding-window operations aligned with the signal’s intrinsic time axis.
- Residual Networks: To mitigate gradient degradation in deep stacks, we integrate residual connections. For an input $x$, the layer learns a residual mapping $F(x)$, yielding output $y = F(x) + x$. The identity shortcut ensures unimpeded gradient propagation to early layers, sustaining stable and efficient parameter updates throughout prolonged incremental training.
- Hierarchical multi-scale encoding: As detailed in Figure 2, the encoder comprises three stages. The number of channels doubles per stage () to enrich semantic expressivity, while temporal resolution is halved via stride-2 convolutions (), progressively compressing redundant temporal sampling. The resulting latent tensor thus balances expressive power and compactness—retaining discriminative high-level semantics while discarding sample-level noise and aliasing.
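
The three design choices above (1D kernels, residual connections, and channel doubling with stride-2 downsampling) can be illustrated with a plain NumPy sketch of one downsampling stage. It is a hedged illustration rather than the authors' network: the kernel size 3 with padding 1 inside the block, the 1×1 projection shortcut, the ReLU placement, and the toy size of 16 channels × 128 samples are all assumptions, and `conv1d` and `residual_block` are hypothetical helper names.

```python
import numpy as np

def conv1d(x, w, stride=1, padding=0):
    """Plain 1D convolution (cross-correlation, no bias).

    x: (C_in, T) input; w: (C_out, C_in, K) kernels. Returns (C_out, T_out).
    """
    c_out, _, k = w.shape
    xp = np.pad(x, ((0, 0), (padding, padding)))  # zero-pad the time axis only
    t_out = (xp.shape[1] - k) // stride + 1
    out = np.empty((c_out, t_out))
    for t in range(t_out):
        window = xp[:, t * stride:t * stride + k]  # (C_in, K) slice along time
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
    return out

def residual_block(x, w1, w2, w_skip=None, stride=1):
    """y = ReLU(F(x) + shortcut(x)); the shortcut is the identity unless a
    projection kernel `w_skip` is supplied to match a changed shape."""
    h = np.maximum(conv1d(x, w1, stride=stride, padding=1), 0.0)
    h = conv1d(h, w2, stride=1, padding=1)
    shortcut = x if w_skip is None else conv1d(x, w_skip, stride=stride)
    return np.maximum(h + shortcut, 0.0)

rng = np.random.default_rng(0)
c, t = 16, 128  # assumed channel count and sequence length
x = rng.standard_normal((c, t))
# Downsampling stage: channels double (C -> 2C), time resolution halves (T -> T/2).
w1 = rng.standard_normal((2 * c, c, 3)) * 0.1
w2 = rng.standard_normal((2 * c, 2 * c, 3)) * 0.1
w_skip = rng.standard_normal((2 * c, c, 1)) * 0.1  # 1x1 projection shortcut
y = residual_block(x, w1, w2, w_skip, stride=2)
print(x.shape, "->", y.shape)  # (16, 128) -> (32, 64)
```

When the residual branch contributes nothing (all-zero kernels) and no projection is used, the block reduces to the identity followed by the ReLU, which is the property that keeps gradients flowing to early layers.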
4.3. Training for Incremental Tasks
Algorithm 1: The proposed Incremental Learning for AMR
5. Experiment and Discussion
5.1. Experimental Setup
1. The RML2016a dataset includes 11 modulation types, i.e., eight digital (BPSK, QPSK, 8PSK, 16QAM, 64QAM, PAM4, CPFSK, GFSK) and three analog (AM-DSB, AM-SSB, WBFM), where the SNR ranges from −20 dB to +18 dB in 2-dB steps (20 levels). Each complex-valued I/Q sample is represented as a real-valued matrix, where the first row encodes the in-phase (I) component and the second row the quadrature (Q) component. For every SNR level, there are exactly 1000 samples per modulation type. In total, the dataset contains 220,000 labeled samples (11 classes × 20 SNR levels × 1000 samples). The initial stage contains 5 modulation classes, followed by two incremental tasks with 3 new classes each. Specifically, the modulation types are categorized as follows: Stage 0 (Classes 0–4): CPFSK, WBFM, AM-DSB, GFSK, 8PSK; Stage 1 (Classes 5–7): PAM4, BPSK, AM-SSB; Stage 2 (Classes 8–10): 16QAM, QPSK, 64QAM.
2. The RML2018 dataset includes 24 modulation types: OOK, 4ASK, 8ASK, BPSK, QPSK, 8PSK, 16PSK, 32PSK, 16APSK, 32APSK, 64APSK, 128APSK, 16QAM, 32QAM, 64QAM, 128QAM, 256QAM, AM-SSB-WC, AM-SSB-SC, AM-DSB-WC, AM-DSB-SC, FM, GMSK, and OQPSK, where the SNR ranges from −20 dB to +30 dB in 2-dB steps (26 levels). Each complex-valued I/Q sample is represented as a real-valued matrix, where the first row encodes the in-phase (I) component and the second row the quadrature (Q) component. For every SNR level, there are exactly 4096 samples per modulation type. In total, the dataset contains 2,555,904 labeled samples (24 classes × 26 SNR levels × 4096 samples). The initial stage contains 12 modulation classes, followed by two incremental tasks with 6 new classes each. Specifically, the modulation types are categorized as follows: Stage 0 (Classes 0–11): 4ASK, QPSK, 64QAM, GMSK, 8ASK, 32QAM, 128APSK, BPSK, 16QAM, AM-SSB-WC, AM-DSB-SC and 16PSK; Stage 1 (Classes 12–17): AM-DSB-WC, 32PSK, 128QAM, 256QAM, FM and OQPSK; Stage 2 (Classes 18–23): 32APSK, AM-SSB-SC, 8PSK, 16APSK, 64APSK and OOK.
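
The class-incremental splits and dataset sizes described above can be checked with a short script. It is a sketch for illustration only: the stage lists are copied from the text, while the helper names `total_samples` and `class_ranges` and the assumption that class indices are assigned consecutively in stage order are ours.

```python
RML2016_SNRS = list(range(-20, 19, 2))  # -20 dB ... +18 dB in 2-dB steps

RML2016_STAGES = [
    ["CPFSK", "WBFM", "AM-DSB", "GFSK", "8PSK"],  # Stage 0
    ["PAM4", "BPSK", "AM-SSB"],                    # Stage 1
    ["16QAM", "QPSK", "64QAM"],                    # Stage 2
]

RML2018_STAGES = [
    ["4ASK", "QPSK", "64QAM", "GMSK", "8ASK", "32QAM", "128APSK",
     "BPSK", "16QAM", "AM-SSB-WC", "AM-DSB-SC", "16PSK"],          # Stage 0
    ["AM-DSB-WC", "32PSK", "128QAM", "256QAM", "FM", "OQPSK"],      # Stage 1
    ["32APSK", "AM-SSB-SC", "8PSK", "16APSK", "64APSK", "OOK"],     # Stage 2
]

def total_samples(stages, n_snr, per_cell):
    """Classes x SNR levels x samples per (class, SNR) cell."""
    n_classes = sum(len(stage) for stage in stages)
    return n_classes * n_snr * per_cell

def class_ranges(stages):
    """(first, last) class index per stage, assuming consecutive labelling."""
    ranges, start = [], 0
    for stage in stages:
        ranges.append((start, start + len(stage) - 1))
        start += len(stage)
    return ranges

print(len(RML2016_SNRS), total_samples(RML2016_STAGES, len(RML2016_SNRS), 1000))
print(total_samples(RML2018_STAGES, 26, 4096))
print(class_ranges(RML2016_STAGES), class_ranges(RML2018_STAGES))
```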
5.2. Parameter Analysis
5.3. Comparative Performance
5.4. Confusion Matrix
5.5. Ablation Study
5.6. Shortcut Learning
- AWGN dataset: Only additive white Gaussian noise is applied, simulating an ideal fading-free channel. This dataset serves as a baseline for evaluating the impact of background noise.
- Rayleigh + AWGN dataset: Signals pass through a Rayleigh fading channel that simulates multipath propagation, and Gaussian white noise is then added, characterizing signal behavior in a purely scattering environment.
- Rician + AWGN dataset: Signals pass through a Rician fading channel, a mixed propagation environment containing both direct and multipath components, and Gaussian white noise is then added to analyze performance when a partial direct path is present.
- Rician + Rayleigh + AWGN dataset: Signals first pass through a Rayleigh channel and then through a Rician channel, simulating a more complex fading scenario, and Gaussian white noise is finally added to evaluate performance in highly dynamic channels.
- Configuration 1 employs the Rician + Rayleigh + AWGN dataset as the training set and the AWGN dataset as the test set, aiming to verify whether the model maintains robust recognition performance under ideal AWGN conditions after training in complex fading channel environments.
- Configuration 2 uses the AWGN dataset as the training set and the Rayleigh + AWGN dataset as the test set, verifying whether the model can maintain stable recognition accuracy in complex channel scenarios with superimposed Rayleigh fading after being trained solely on an AWGN environment.
- Configuration 3 employs the Rayleigh + AWGN dataset as the training set, while the Rician + AWGN dataset acts as the test set, verifying whether the model trained in a channel environment with Rayleigh fading can maintain reliable recognition performance even under a Rician fading channel scenario.
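
The four channel variants and the three train/test configurations above can be sketched with NumPy. This is an illustrative model under stated assumptions, not the authors' simulation setup: fading is assumed to be flat and constant over one frame (a single complex gain), the Rician K-factor of 4 is arbitrary, the SNR is measured relative to the faded signal power, and `awgn`, `rayleigh`, `rician`, `apply_channel`, and `CONFIGS` are hypothetical names.

```python
import numpy as np

def awgn(x, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    p_noise = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)
    noise = rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    return x + np.sqrt(p_noise / 2) * noise

def rayleigh(x, rng):
    """Flat Rayleigh fading: one zero-mean complex Gaussian gain per frame."""
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return h * x

def rician(x, k_factor, rng):
    """Flat Rician fading: direct component plus a scattered component."""
    los = np.sqrt(k_factor / (k_factor + 1))
    scat = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return (los + np.sqrt(1 / (k_factor + 1)) * scat) * x

def apply_channel(x, variant, snr_db, rng, k_factor=4.0):
    """Apply a variant such as 'Rician + Rayleigh + AWGN' to a complex frame.
    Rayleigh is applied before Rician, and noise is always added last."""
    parts = [p.strip() for p in variant.split("+")]
    if "Rayleigh" in parts:
        x = rayleigh(x, rng)
    if "Rician" in parts:
        x = rician(x, k_factor, rng)
    return awgn(x, snr_db, rng)

# (training variant, test variant) for Configurations 1-3
CONFIGS = {
    1: ("Rician + Rayleigh + AWGN", "AWGN"),
    2: ("AWGN", "Rayleigh + AWGN"),
    3: ("Rayleigh + AWGN", "Rician + AWGN"),
}

rng = np.random.default_rng(0)
symbols = rng.integers(0, 4, size=4096)
frame = np.exp(1j * (np.pi / 4) * (2 * symbols + 1))  # unit-power QPSK frame
for cfg, (train_var, test_var) in CONFIGS.items():
    train = apply_channel(frame, train_var, snr_db=10.0, rng=rng)
    test = apply_channel(frame, test_var, snr_db=10.0, rng=rng)
    print(cfg, train_var, "->", test_var, train.shape, test.shape)
```

Because each configuration trains and tests on different channel variants, a model that relies on channel-specific artifacts rather than modulation structure should show a clear accuracy drop on the test variant.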
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhu, Z.; Nandi, A.K. Automatic Modulation Classification: Principles, Algorithms and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2015.
- Zhang, C.; Patras, P.; Haddadi, H. Deep learning in mobile and wireless networking: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 2224–2287.
- Parisi, G.I.; Kemker, R.; Part, J.L.; Kanan, C.; Wermter, S. Continual lifelong learning with neural networks: A review. Neural Netw. 2019, 113, 54–71.
- Goldsmith, A. Wireless Communications; Cambridge University Press: Cambridge, UK, 2005.
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Hadsell, R. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.-A.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
- Miuccio, L.; Panno, D.; Riolo, S. A Wasserstein GAN Autoencoder for SCMA Networks. IEEE Wirel. Commun. Lett. 2022, 11, 1298–1302.
- Geirhos, R.; Jacobsen, J.; Michaelis, C.; Zemel, R.; Brendel, W.; Bethge, M.; Wichmann, F.A. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2020, 2, 665–673.
- Liu, Z.; Li, Y.; Wu, Y.; Gong, Y. Learning to Optimize Resource Allocation in Dynamic Wireless Environments: Embracing the New While Engaging the Old. IEEE Trans. Wirel. Commun. 2025, 24, 7346–7359.
- Qu, Y.; Lu, Z.; Zeng, R.; Wang, J.; Wang, J. Enhancing automatic modulation recognition through robust global feature extraction. IEEE Trans. Veh. Technol. 2024, 74, 4192–4207.
- O’Shea, T.J.; West, N. Radio machine learning dataset generation with GNU Radio. In Proceedings of the GNU Radio Conference, Boulder, CO, USA, 12–16 September 2016.
- O’Shea, T.J.; Roy, T.; Clancy, T.C. Convolutional radio modulation recognition networks. In Proceedings of the 17th International Conference on Engineering Applications of Neural Networks, Aberdeen, UK, 2–5 September 2016; Springer International Publishing: Cham, Switzerland, 2016.
- Krzyston, J.; Bhattacharjea, R.; Stark, A. Complex-Valued Convolutions for Modulation Recognition using Deep Learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), Dublin, Ireland, 7–11 June 2020; IEEE: New York, NY, USA, 2020.
- Lin, S.; Zeng, Y.; Gong, Y. Learning of time-frequency attention mechanism for automatic modulation recognition. IEEE Wirel. Commun. Lett. 2022, 11, 707–711.
- Tunze, G.B.; Huynh-The, T.; Lee, J.-M.; Kim, D.-S. Multi-shuffled convolutional blocks for low-complex modulation recognition. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; IEEE: New York, NY, USA, 2020.
- Emam, A.; Shalaby, M.; Aboelazm, M.A.; Bakr, H.E.A.; Mansour, H.A.A. A comparative study between CNN, LSTM, and CLDNN models in the context of radio modulation classification. In Proceedings of the 12th International Conference on Electrical Engineering (ICEENG), Cairo, Egypt, 7–9 July 2020; IEEE: New York, NY, USA, 2020.
- Chang, S.; Yang, Z.; He, J.; Li, R.; Huang, S.; Feng, Z. A fast multi-loss learning deep neural network for automatic modulation classification. IEEE Trans. Cogn. Commun. Netw. 2023, 9, 1503–1518.
- Jang, J.; Pyo, J.; Yoon, Y.; Choi, J. Meta-transformer: A meta-learning framework for scalable automatic modulation classification. IEEE Access 2024, 12, 9267–9276.
- Lei, J.; Li, Y.; Lo, Y.; Leng, Y.; Lin, Q.; Wu, Y. Understanding Complex-Valued Transformer for Modulation Recognition. IEEE Wirel. Commun. Lett. 2024, 13, 3523–3527.
- Tunze, G.B.; Huynh-The, T.; Lee, J.-M.; Kim, D.-S. Sparsely Connected CNN for Efficient Automatic Modulation Recognition. IEEE Trans. Veh. Technol. 2020, 69, 15557–15568.
- Ma, J.; Hu, M.; Chen, X.; Wan, L.; Wang, J. Few-shot automatic modulation classification via semi-supervised metric learning and lightweight conv-transformer model. IEEE Trans. Cogn. Commun. Netw. 2025, 12, 1012–1024.
- Lyu, Z.; Xiao, M.; Skoglund, M.; Debbah, M.; Poor, H.V. Quantization-Aware Collaborative Inference for Large Embodied AI Models. arXiv 2026, arXiv:2602.13052.
- Rebuffi, S.A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017.
- Zhang, X.; Li, T.; Gong, P.; Liu, R.; Zha, X.; Tang, W. Open set recognition of communication signal modulation based on deep learning. IEEE Commun. Lett. 2022, 26, 1588–1592.
- Montes, C.; Morehouse, T.; Zhou, R. Class-incremental learning for baseband modulation classification: A comparison. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 27–31 May 2024; IEEE: New York, NY, USA, 2024.
- Xu, B.; Wang, H.; Wu, B.; Cui, Z.; Cao, Z. Signal Modulation Recognition via Bias Adjustment-Based Class Incremental Learning. IEEE Sens. J. 2024, 24, 41437–41450.
- Fan, Z.; Tu, Y.; Lin, Y.; Shi, Q. C-SRCIL: Complex-valued Class-Incremental Learning for Signal Recognition. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; IEEE: New York, NY, USA, 2023.
- Fan, Z.; Tu, Y.; Lin, Y.; Shi, Q. Class-Incremental Learning for Recognition of Complex-Valued Signals. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 417–428.
- Deng, Z.; Luo, C.; Tang, Z.; Luo, Y. MSNCIL: A Domain-Agnostic Class-Incremental Learning Method Tailored for Automatic Modulation Recognition. IEEE Commun. Lett. 2025, 29, 1456–1460.
- Wang, G.; Liu, Z.; Zhang, X.; Chen, Y.; Zhang, Y.; Zhu, J. PID: A Parameter-Efficient Isolation Domain-Incremental Learning Framework for Signal Modulation Classification. IEEE Trans. Neural Netw. Learn. Syst. 2026, 37, 1449–1462.
- Tan, H.; Zhang, Z.; Li, Y.; Shi, X.; Wang, L.; Yang, X. PASS-Net: A Pseudo Classes and Stochastic Classifiers-Based Network for Few-Shot Class-Incremental Automatic Modulation Classification. IEEE Trans. Wirel. Commun. 2024, 23, 17987–18003.
- Qi, P.; Zhou, X.; Ding, Y.; Zheng, S.; Jiang, T.; Li, Z. Collaborative and incremental learning for modulation classification with heterogeneous local dataset in cognitive IoT. IEEE Trans. Green Commun. Netw. 2022, 7, 881–893.
- Li, Z.; Hoiem, D. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947.

















| Name | Layer | Settings | Input Dimension | Output Dimension |
|---|---|---|---|---|
| shared encoder | input Conv | Conv1d (kernel size = 3, stride = 1, padding = 0) | | |
| | stage 1 | ResNet (stride = 1) | | |
| | stage 2 | ResNet (stride = 2) | | |
| | stage 3 | ResNet (stride = 2) | | |
| decoder | upsample 1 | kernel size = 4, stride = 2 | | |
| | skip connection | | | |
| | upsample 2 | kernel size = 4, stride = 2 | | |
| | output Conv | kernel size = 4, stride = 1 | | |

| Name | Layer | Settings | Input Dimension | Output Dimension |
|---|---|---|---|---|
| shared encoder | input Conv | Conv1d (kernel size = 3, stride = 1, padding = 0) | | |
| | stage 1 | ResNet (stride = 1) | | |
| | stage 2 | ResNet (stride = 2) | | |
| | stage 3 | ResNet (stride = 2) | | |
| | stage 4 | ResNet (stride = 2) | | |
| decoder | upsample 1 | kernel size = 4, stride = 2 | | |
| | skip connection 1 | | | |
| | upsample 2 | kernel size = 4, stride = 2 | | |
| | skip connection 2 | | | |
| | upsample 3 | kernel size = 4, stride = 2 | | |
| | output Conv1 | kernel size = 3, stride = 1 | | |
| | output Conv2 | kernel size = 3, stride = 1 | | |

| SNR (dB) | Stage 1 iCaRL | Stage 1 SS | Stage 1 PR | Stage 1 PIMR | Stage 2 iCaRL | Stage 2 SS | Stage 2 PR | Stage 2 PIMR | Stage 3 iCaRL | Stage 3 SS | Stage 3 PR | Stage 3 PIMR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| −4 | 86.48 | 87.34 | 86.98 | 88.87 | 70.79 | 73.59 | 70.04 | 75.64 | 60 | 66.53 | 60.43 | 67.1 |
| −2 | 93.28 | 94.12 | 93.59 | 94.29 | 82.29 | 80.94 | 82.11 | 83.51 | 73.44 | 77.63 | 73.66 | 78.01 |
| 0 | 97.86 | 97.88 | 98.2 | 98.54 | 87.93 | 87.17 | 88.1 | 88.32 | 81.63 | 85.19 | 82.96 | 84.32 |
| 2 | 99.11 | 98.85 | 98.93 | 99.32 | 89.63 | 88.99 | 89.68 | 89.78 | 84.41 | 86.56 | 85.36 | 86.26 |
| 4 | 99.25 | 98.91 | 99.06 | 99.43 | 90.17 | 89.18 | 89.72 | 90.31 | 85.44 | 87.39 | 86.49 | 87.16 |
| 6 | 99.29 | 99.22 | 99.22 | 99.49 | 90.67 | 90.74 | 90.79 | 90.85 | 87.37 | 88.61 | 88.79 | 88.79 |
| 8 | 99.17 | 99.16 | 99.16 | 99.3 | 90.14 | 89.32 | 89.61 | 90.3 | 86.37 | 88.15 | 87.97 | 87.94 |
| 10 | 99.45 | 99.09 | 99.25 | 99.55 | 91.15 | 90.2 | 90.92 | 91.08 | 87.59 | 88.38 | 89.12 | 88.69 |
| 12 | 99.46 | 99.72 | 99.56 | 99.62 | 90.83 | 91.07 | 90.68 | 90.82 | 87.46 | 88.96 | 87.8 | 88.81 |
| 14 | 99.3 | 99.34 | 99.34 | 99.37 | 90.8 | 89.69 | 89.49 | 90.57 | 87.31 | 88.12 | 87.62 | 88.17 |
| 16 | 99.27 | 98.44 | 98.36 | 99.34 | 90.49 | 90.32 | 90.12 | 90.54 | 87.71 | 88.38 | 87.67 | 88.96 |
| 18 | 99.24 | 99.33 | 99.41 | 99.43 | 90.55 | 90.41 | 90.11 | 90.21 | 87.11 | 87.31 | 87.85 | 87.95 |
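
The Stage 3 columns of the table above can be summarized with a few lines of arithmetic. The script below is only a reading aid: the values are copied from the iCaRL and PIMR columns of the table, and the choice of these two columns and the variable names are ours.

```python
import numpy as np

snr_db = np.arange(-4, 20, 2)  # -4 dB ... +18 dB, the 12 rows of the table

# Stage 3 accuracies (%) copied from the table
icarl = np.array([60.00, 73.44, 81.63, 84.41, 85.44, 87.37,
                  86.37, 87.59, 87.46, 87.31, 87.71, 87.11])
pimr = np.array([67.10, 78.01, 84.32, 86.26, 87.16, 88.79,
                 87.94, 88.69, 88.81, 88.17, 88.96, 87.95])

gain = pimr - icarl  # per-SNR accuracy difference in percentage points
print("mean iCaRL accuracy:", round(float(icarl.mean()), 2))
print("mean PIMR accuracy:", round(float(pimr.mean()), 2))
print("mean gain:", round(float(gain.mean()), 2))
print("largest gain at SNR (dB):", int(snr_db[np.argmax(gain)]))
```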
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, Z.; Zhou, Z.; Wu, Y. ParalIMR: Bypassing Shortcut Learning in Incremental Modulation Recognition via Parallel Reconstruction and Feature Decoupling. Electronics 2026, 15, 2766. https://doi.org/10.3390/electronics15132766

