1. Introduction
The rapid growth of wireless communication systems and the ever-increasing demand for spectrum efficiency present significant challenges in managing and optimizing wireless frequency resources [1]. Effective wireless signal classification (WSC) is a key enabler in addressing these challenges [2,3]. In this context, advanced techniques such as deep learning (DL) have emerged as promising solutions for WSC, particularly in scenarios with limited labeled data. Effective WSC is vital in various real-world applications, including specific emitter identification (SEI), which distinguishes individual transmitters by extracting radio frequency fingerprint (RFF) features. SEI is essential for military radar communication, interference detection, and cognitive wireless systems, as it leverages hardware-specific discrepancies to provide tamper-resistant and counterfeit-resilient identification. Similarly, accurate modulation classification is critical for optimizing spectrum utilization and ensuring reliable communication in increasingly complex electromagnetic environments. However, traditional WSC methods often rely on prior information and handcrafted feature extraction, which may fail to fully capture the dynamic and intricate characteristics of wireless signals. Consequently, there is a growing need for effective WSC methods that extract and utilize wireless resources more efficiently [4].
With the rapid progression of DL technologies, a growing number of DL-based WSC methodologies have been developed. Unlike traditional machine learning (ML) approaches, DL-based methods excel at automatically extracting high-dimensional features from radio signals through neural networks, which enhances their capability to manage non-cooperative signals within dynamic spectrum environments [5]. Nonetheless, conventional supervised learning (SL) frameworks require a substantial amount of labeled data for training neural networks, a requirement that is often challenging to meet and prohibitively expensive in real-world scenarios [6]. This limitation significantly impedes the practical deployment of SL-based WSC techniques in engineering applications. Consequently, the ability to effectively leverage unlabeled radio signals in data-constrained environments has become increasingly critical.
To address the challenge of limited labeled samples, self-supervised learning (SSL) has emerged as a promising solution that extracts features from unlabeled data. SSL leverages pretext tasks during pre-training to capture high-dimensional representations through encoders, which are then transferred to downstream tasks. By designing various loss functions for the pretext tasks, models can learn comprehensive representations that encapsulate the intrinsic information of the original input. Consequently, in downstream tasks, a simple classifier can achieve effective WSC even with limited labeled samples. Recent studies [7,8] have employed contrastive SSL, primarily relying on the MoCo [9] framework. In addition, the research in [10] has demonstrated remarkable results in WSC by employing contrastive clustering of radio signals at both the instance and emitter levels. These methods construct positive and negative samples through data augmentation and perform contrastive learning on high-dimensional representations. However, they overlook feature-level augmentation strategies, which may limit their ability to capture the complex characteristics of radio signals.
In this study, we introduce feature-level augmented contrastive learning (FLA-CL). FLA-CL consists of two parts: (i) a self-supervised pretraining encoder incorporating feature-level augmentation and (ii) a classifier for downstream tasks. During the self-supervised pretraining phase, we apply both data-level and feature-level augmentations to generate improved positive and negative sample pairs within a high-dimensional space. Guided by the contrastive loss, this allows the network to acquire stronger and more discriminative feature representations. The pretrained encoder's weights are then transferred to downstream tasks, where WSC is achieved using a classification head. Our contributions are as follows:
To address the challenges of feature representation, robustness, and discriminability in complex wireless signal classification tasks, we propose a feature-level augmented self-supervised pretraining strategy. This strategy constructs enriched positive and negative sample pairs directly in the high-dimensional feature space.
The proposed FLA-CL is evaluated on two datasets. The experimental results demonstrate that even with only 10% of the data used for training in the downstream task, our method outperforms other SL and SSL methods in terms of classification accuracy and robustness.
This paper is structured as follows: An overview of the proposed framework is depicted in Figure 1. Section 2 introduces the signal model and formulates the problem. Section 3 provides a detailed explanation of the proposed FLA-CL. In Section 4, we evaluate the effectiveness of the proposed framework using several experiments conducted on two datasets. Lastly, Section 5 offers concluding remarks.
3. The Proposed Framework
The overall framework of the proposed FLA-CL is illustrated in Figure 1. Given a wireless signal, it is first transformed into two augmented views, $s_w$ and $s_s$, through weak and strong augmentation, respectively. Next, these two views are fed into a Transformer-based encoder to obtain the feature representations $x_w$ and $x_s$. Then, to extract more comprehensive representations, $x_s$ is passed through the enhanced prediction head (EPH) to generate the enhanced feature $\hat{x}_s$. Finally, $x_w$ and $\hat{x}_s$ are contrasted in the high-dimensional feature space through the contrastive loss to acquire discriminative features.
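For concreteness, a minimal sketch of one pretraining step implied by this pipeline is given below; the function names (`weak_aug`, `strong_aug`, `encoder`, `eph`, `info_nce`) are illustrative placeholders rather than the authors' implementation.

```python
def pretrain_step(signal, weak_aug, strong_aug, encoder, eph, info_nce):
    """One illustrative FLA-CL pretraining step (names and structure are assumptions).

    signal : batch of raw I/Q samples, e.g. a tensor of shape (B, 2, L)
    """
    s_w = weak_aug(signal)      # weakly augmented view
    s_s = strong_aug(signal)    # strongly augmented view

    x_w = encoder(s_w)          # feature representation of the weak view
    x_s = encoder(s_s)          # feature representation of the strong view
    x_s_hat = eph(x_s)          # feature-level augmentation via the EPH

    # The contrastive loss pulls (x_w, x_s_hat) from the same signal together
    # and pushes apart features coming from different signals in the batch.
    return info_nce(x_s_hat, x_w)
```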
3.1. Contrastive SSL Framework
3.1.1. Transformer-Based Encoder
At the beginning of the encoder, an input encoding layer, comprising residual convolutional blocks and positional encoding layers, is introduced to extract high-frequency information from the radio signal data. This module provides a broader and richer feature space for the subsequent Transformer-based encoder, thereby enhancing the overall performance of the model. In the TCN module, the ReLU function enhances the model's capacity to handle nonlinear functions, encouraging the model to learn and extract high-frequency information. The structure of the residual convolutional block within the TCN module is as follows:

$$ h = \sigma\big(\mathrm{BN}(\mathrm{Conv}(u)) + u\big), $$

where $u$ and $h$ are the block input and output, $\mathrm{Conv}(\cdot)$ denotes the convolutional layer, $\mathrm{BN}(\cdot)$ represents the BatchNorm layer, and $\sigma(\cdot)$ stands for the ReLU function.
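As a rough PyTorch sketch of such a block (channel width and kernel size are arbitrary choices, not values reported in the paper):

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """Conv -> BatchNorm -> ReLU with a residual connection, mirroring the equation above."""
    def __init__(self, channels: int = 64, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # The skip connection preserves the original signal path while the
        # Conv-BN branch extracts high-frequency information.
        return self.relu(self.bn(self.conv(u)) + u)
```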
Although the input encoding layer excels at capturing local spatiotemporal features, it is limited in its ability to integrate global contextual information. To address this, the Informer module is introduced to extract relevant features from long-term sequences across multiple time steps [12]. Compared to traditional Transformer models, the Informer employed in the encoder streamlines the computational process within the multi-head self-attention (MHSA) framework. MHSA is an effective approach for modeling long-range dependencies between different segments. It operates in parallel across multiple attention heads, with each head focusing on a different subspace of the input sequence representation. This allows for more accurate modeling of dependencies between distant elements within the input sequence. The feed-forward network (FFN) between MHSA layers comprises two consecutive linear transformation layers with nonlinear activation functions, which enhance the interaction and transformation of information between MHSA modules, thereby improving the overall representation of the model. Existing studies have demonstrated the effectiveness of the Informer architecture on wireless signal classification tasks [13].
The alternating stacks of multi-layer MHSA and FFN constitute the primary architectural framework of the Informer. This layered design facilitates the gradual refinement and comprehensive understanding of deep modulation information. The process can be expressed as (4) and (5):

$$ \hat{h}^{k} = \mathrm{LN}\big(h^{k-1} + \mathrm{MHSA}(h^{k-1})\big), \tag{4} $$

$$ h^{k} = \mathrm{LN}\big(\hat{h}^{k} + \mathrm{FFN}(\hat{h}^{k})\big), \tag{5} $$

where $\mathrm{LN}(\cdot)$ represents layer normalization, $k$ denotes the index of the layers, and $h^{0}$ and $h^{K}$ denote the output of the TCN and Informer, respectively. For ease of representation, the encoder output $h^{K}$ is abbreviated as $x$.
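A compact PyTorch rendering of one such layer is sketched below; for simplicity it uses standard multi-head attention (`nn.MultiheadAttention`) rather than Informer's ProbSparse attention, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class InformerStyleBlock(nn.Module):
    """One MHSA + FFN layer following Eqs. (4) and (5); ProbSparse attention is
    replaced by standard multi-head attention for brevity."""
    def __init__(self, dim: int = 128, heads: int = 4, ffn_dim: int = 256):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Eq. (4): self-attention sub-layer with residual connection and LayerNorm
        h = self.norm1(h + self.mhsa(h, h, h)[0])
        # Eq. (5): feed-forward sub-layer with residual connection and LayerNorm
        h = self.norm2(h + self.ffn(h))
        return h
```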
3.1.2. Enhanced Prediction Head
As shown in Figure 1C, the encoder maps the features $x_s$ extracted from the strongly augmented input into an enhanced feature space via the EPH, resulting in the enhanced feature $\hat{x}_s$. This feature-level augmentation significantly increases the diversity of positive and negative samples in the feature space, compelling the network to learn more comprehensive feature representations. The structure of the EPH is as follows:

$$ \hat{x}_s = \mathrm{FC}\big(\mathrm{Dropout}(\mathrm{FC}(x_s))\big), $$

where $\mathrm{FC}(\cdot)$ denotes the fully connected layer, and $\mathrm{Dropout}(\cdot)$ denotes the DropOut layer.

By enhancing $x_s$, the distribution range and diversity of positive and negative samples in the feature space are expanded, which strengthens the consistency of sample features belonging to the same class in the feature space. This promotes greater intra-class aggregation and facilitates the model's learning of more robust and comprehensive features.
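A minimal sketch of such a head is shown below, assuming the FC-Dropout-FC layout reconstructed above; the hidden width and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EnhancedPredictionHead(nn.Module):
    """Feature-level augmentation: fully connected layers with dropout in between."""
    def __init__(self, dim: int = 128, hidden: int = 256, p_drop: float = 0.5):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.drop = nn.Dropout(p_drop)   # stochastic perturbation diversifies the feature views
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x_s: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.drop(self.fc1(x_s)))
```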
3.1.3. Contrastive Loss
In contrastive learning, the InfoNCE loss is a critical component. It helps the model learn practical and highly discriminative feature representations in SSL tasks by pulling similar sample pairs closer and pushing dissimilar pairs apart. These learned representations typically lead to improved performance in downstream tasks. The mathematical formula for the InfoNCE loss is as follows:

$$ \mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\!\left( q \cdot k^{+} / T \right)}{\exp\!\left( q \cdot k^{+} / T \right) + \sum_{i=1}^{N} \exp\!\left( q \cdot k_{i}^{-} / T \right)}, $$

where $q$ is the query vector, $k^{+}$ is the positive key vector from the same sample, $\{k_{i}^{-}\}_{i=1}^{N}$ denotes the negative key vectors from different samples, and $T$ is the temperature scaling factor that adjusts the sharpness of the similarity distribution.
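A common in-batch formulation of this loss can be sketched as follows; note that it treats the other samples in the batch as negatives, whereas a MoCo-style implementation would instead draw negatives from a memory queue, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, key: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE with in-batch negatives: query[i] and key[i] form the positive pair."""
    q = F.normalize(query, dim=1)                       # (B, D)
    k = F.normalize(key, dim=1)                         # (B, D)
    logits = q @ k.t() / temperature                    # (B, B) scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```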
3.2. Downstream Tasks
In the downstream classification task, the features extracted by the FLA-CL pretrained encoder are fed into a randomly initialized classification head, which is updated using a small number of labeled samples. The structure of the classification head is identical to that of the EPH, except that the output dimension of the final fully connected layer corresponds to the number of classes. The network is updated using the cross-entropy loss as follows:

$$ \mathcal{L}_{\mathrm{CE}} = -\sum_{c=1}^{C} y_{c} \log \hat{y}_{c}, $$

where $y_{c}$ is the ground truth label and $\hat{y}_{c}$ is the predicted probability for class $c$, with $C$ the number of classes.
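A hedged sketch of one fine-tuning step, using placeholder names for the pretrained encoder and classification head, might look like this:

```python
import torch
import torch.nn as nn

def finetune_step(signals, labels, encoder, cls_head, optimizer):
    """One supervised fine-tuning step on labeled samples (illustrative)."""
    criterion = nn.CrossEntropyLoss()
    features = encoder(signals)        # features from the FLA-CL pretrained encoder
    logits = cls_head(features)        # head output dimension equals the number of classes
    loss = criterion(logits, labels)   # cross-entropy against the ground-truth labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```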
4. Experiment Results and Analysis
4.1. Datasets and Implementation Details
To substantiate the efficacy of the proposed method, comprehensive experiments were conducted on two public datasets: RadioML2016.10a [14] and the ADS-B short dataset [15], the latter acquired in real-world scenes. For brevity, these datasets are hereafter referred to as RMLa and ADS-B, respectively.
4.1.1. RMLa
The RMLa dataset serves as a benchmark for WSC, comprising synthetic signals across 11 modulation schemes, including both digital (e.g., BPSK, QPSK) and analog (e.g., AM-SSB, WBFM) types. It provides 2000 samples per modulation type across signal-to-noise ratios (SNRs) from −20 dB to 18 dB, with each sample comprising 128 I/Q components. This dataset is widely used for evaluating machine learning models in tasks such as modulation recognition under various noise conditions.
4.1.2. ADS-B
The ADS-B dataset consists of automatic dependent surveillance-broadcast signals used for aircraft tracking and identification. It captures real-world aviation data encompassing 198 classes.
The experiments were implemented in the PyTorch v1.13.1 framework and conducted on an Ubuntu server equipped with an NVIDIA RTX 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The experimental data were split into training, validation, and test sets in an 8:1:1 ratio, and the validation and test sets were strictly excluded from every training stage. During the SSL pre-training phase, the unlabeled training data were further divided into training and validation subsets in a 7:3 ratio. The SSL stage was fixed at 100 epochs, and the network was updated using the AdamW optimizer. In the fine-tuning phase, various data augmentation techniques were applied, including time warping, rotation, inversion, and time-domain signal dropout [8]. A portion of the unlabeled data was manually labeled for training during the 50-epoch fine-tuning phase. The model with the best performance on the validation set was selected for testing on the test set, and the learning rate followed a cosine annealing schedule. The initial learning rate for both SSL and SL was set to 0.001.
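As an illustration of what such signal-level augmentations could look like for I/Q data (simplified interpretations rather than the exact routines of [8]; time warping is omitted and the segment length is arbitrary):

```python
import numpy as np

def rotate_iq(x: np.ndarray, theta: float) -> np.ndarray:
    """Rotate an I/Q signal of shape (2, L) by angle theta in the complex plane."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

def invert_iq(x: np.ndarray) -> np.ndarray:
    """Sign inversion of the I/Q components (one possible reading of 'inversion')."""
    return -x

def time_dropout(x: np.ndarray, max_len: int = 16) -> np.ndarray:
    """Zero out a random contiguous time segment (time-domain signal dropout)."""
    x = x.copy()
    length = x.shape[1]
    seg = np.random.randint(1, max_len + 1)
    start = np.random.randint(0, length - seg + 1)
    x[:, start:start + seg] = 0.0
    return x
```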
4.2. Comparison Experiment Results
This study compares the proposed FLA-CL with various SSL methods, such as SemiAMC [7] and TCSSAMR [8], as well as SL methods including HCGDNN [10], PETCGDNN [16], MCLDNN [17], ICAMCNet [18], AWN [19], CVCNN [20], SFS-SEI [21], and the supervised variant of FLA-CL. Table 1 presents the means and standard deviations of the overall accuracy (OA) and kappa scores across the two experiments for all evaluated methods. Notably, the proposed FLA-CL achieved the highest OA on both datasets.
The visualization of our experimental results is shown in Figure 2. Specifically, Figure 2a presents the accuracy curves of different methods on the RMLa dataset under varying SNRs, Figure 2b shows the corresponding accuracy curves on the ADS-B dataset, and Figure 2c illustrates the ablation study of the EPH module on the RMLa dataset.
As shown in Table 1, after SSL pre-training, the proposed FLA-CL demonstrates a 4.76% improvement in OA compared to fine-tuning from random initialization. This confirms that FLA-CL effectively leverages large amounts of unlabeled data to learn broadly applicable features. Compared to other SL and SSL methods, FLA-CL achieves at least a 2.56% improvement in OA. Additionally, as depicted in Figure 2a, FLA-CL outperforms TCSSAMR in the high-SNR region above 0 dB, with an overall OA improvement of 5.55%. This is primarily due to FLA-CL's use of the Transformer-based Informer architecture, which enhances the relationships between key global attention points, thereby boosting network performance. As illustrated in Figure 3, at 0 dB, the proposed method achieves nearly 90% recognition accuracy for most modulation signals. However, due to the high similarity between 16-QAM and 64-QAM in the time domain, the network tends to confuse these two categories.
As shown in Figure 2b and Table 1, the proposed FLA-CL outperforms the CVCNN and SFS-SEI methods, which are specifically designed for emitter recognition, on the ADS-B dataset, with a 3.62% improvement in OA. This demonstrates the excellent generalization ability of our approach across different radio signal datasets. Compared to FLA-CL (SL), our method achieves a 10.26% improvement in OA after self-supervised pre-training, further validating the effectiveness of SSL. Because of the large number of classes in the ADS-B dataset, a full confusion matrix is difficult to visualize; we therefore selected the 10 best-performing and 10 worst-performing classes for confusion matrix visualization. As shown in Figure 4, our proposed method achieves perfect classification on the best-performing classes, and even for the worst-performing classes, it still demonstrates reasonable effectiveness.
4.3. Ablation Experiment Results
To evaluate the impact of the EPH component in the proposed FLA-CL method, an ablation study was conducted by removing the EPH from the framework, denoted as FLA-CL (w/o EPH). The accuracy curves across different SNRs on the RMLa dataset are shown in Figure 2c. In the −20 to −10 dB range, the performance difference between the two variants is minimal, as it is difficult for the network to learn effective feature information amid the heavy noise at extremely low SNRs. However, in the −6 to 0 dB range, FLA-CL achieves an average 5.63% improvement in OA compared to FLA-CL (w/o EPH). This improvement stems from the enhanced feature representation in the high-dimensional space provided by the EPH during SSL pre-training, which allows the model to learn more generalized and robust features. Moreover, although feature-level and data-level augmentations are beneficial for generating the samples required by contrastive learning, they inevitably introduce perturbations that may not be present in real-world environments. The EPH therefore applies dropout to its linear layers, which allows the model to capture subtle differences while dynamically suppressing this augmentation-induced noise. This contributes significantly to the overall performance gain, especially under high-SNR conditions: when background noise is minimal, the noise introduced by augmentation has a more pronounced effect, so reducing its influence on the model is particularly important in this regime.
4.4. Limitations and Future Works
Although the proposed method shows promising results in wireless signal representation, several limitations remain. First, the modulation classification experiments were conducted on simulated signals and did not include data collected from operational communication systems; consequently, real-life impairments such as channel noise, interference, and hardware-induced distortions were only partially covered, which may affect the model's generalization in practical settings. Second, the current framework has limited ability to distinguish between 16-QAM and 64-QAM modulated signals, which share a similar constellation structure and are therefore prone to misclassification. Third, this study primarily focused on classification accuracy and did not consider factors such as computational complexity, model size, and inference efficiency, which are crucial for deployment on edge or embedded devices.
To tackle the limitations mentioned, we propose several avenues for future research. First, we aim to gather real-world signal data using universal software radio peripheral (USRP) transceiver platforms to enhance the dataset with realistic channel conditions and hardware influences, facilitating a more thorough evaluation under practical scenarios. Second, to boost the ability to discriminate between similar modulation types, we plan to investigate advanced feature learning strategies, such as attention-based modules or metric learning approaches, which can more effectively capture subtle differences in signal characteristics. Lastly, to enable deployment on resource-constrained edge devices, we will examine model compression techniques, such as network pruning, quantization, and knowledge distillation, with the goal of creating a lightweight yet efficient model version.
5. Conclusions
This paper presents FLA-CL, an innovative self-supervised learning framework that enhances wireless signal recognition by combining feature-level augmentation with contrastive learning. By constructing improved positive and negative sample pairs in a high-dimensional feature space, the method enables the model to develop robust and discriminative feature representations. Comprehensive experiments on both simulated and real-world data demonstrate the effectiveness of FLA-CL, which yields significant improvements in classification accuracy and robustness compared to existing methodologies. Remarkably, even when only 10% of the labeled data are used for downstream tasks, our approach consistently surpasses other SL and SSL techniques, underlining the efficacy of feature-level augmentation for wireless communication applications.