Refined Semi-Supervised Modulation Classification: Integrating Consistency Regularization and Pseudo-Labeling Techniques

Abstract: Automatic modulation classification (AMC) plays a crucial role in wireless communication by identifying the modulation scheme of received signals, bridging signal reception and demodulation. Its main challenge lies in performing accurate signal processing without prior information. While deep learning has been applied to AMC, its effectiveness largely depends on the availability of labeled samples. To address the scarcity of labeled data, we introduce a novel semi-supervised AMC approach combining consistency regularization and pseudo-labeling. This method capitalizes on the inherent data distribution of unlabeled data to supplement the limited labeled data. Our approach involves a dual-component objective function for model training: one part focuses on the loss from labeled data, while the other addresses the regularized loss for unlabeled data, enhanced through two distinct levels of data augmentation. These combined losses concurrently refine the model parameters. Our method demonstrates superior performance over established benchmark algorithms, such as decision trees (DTs), support vector machines (SVMs), Pi-models, and virtual adversarial training (VAT). It exhibits a marked improvement in recognition accuracy, particularly when the proportion of labeled samples is as low as 1-4%.


Introduction
In the modern era of digital interconnectedness, wireless communication systems are pivotal in forging effective channels that link individuals, devices, and data across vast distances [1,2]. These systems, dependent on the transmission and reception of signals, are critical for seamless communication. Initially, the field of wireless communication was characterized by a restricted suite of modulation schemes and a cooperative nature, where entities engaged in communication mutually agreed on the schemes to be employed. This era did not demand signal recognition at the receiving end. As wireless communication systems have progressed, expanding in diversity and intricacy, the use of multiple modulation schemes has become commonplace. This development underscores the need for precise classification of these varied schemes, a task skillfully managed by automatic modulation classification (AMC) [3,4]. AMC plays an integral role in non-cooperative communication scenarios, acting as a link between signal detection and demodulation. Here, non-cooperative communication refers to instances where an external party gains access to a communication system without prior approval from the original communicating parties and without affecting their ongoing communication. AMC's utility spans military and civilian sectors. In military applications, it is instrumental in identifying and analyzing electromagnetic signals, which can reveal the functionalities of enemy electronics and assess their threat levels. Moreover, AMC processes this intelligence, bolstering reconnaissance efforts. In civilian applications, AMC's ability to distinguish between modulation schemes is fundamental to efficient device communication and vital for smart, efficient, and reliable systems. A clear example is in smart home ecosystems, where wireless communication integrates smart appliances and security systems. Through AMC, these devices can discern and accurately process communication signals, ensuring secure data transmission and operational reliability.
In situations where many parameters of the transmitted data and receiver are not known, such as the signal power, carrier frequency, and phase offset, automatic modulation classification is a challenging task [5]. Traditional methods can be broadly grouped into two categories: likelihood-theory-based AMC (LB-AMC) [6][7][8] and feature-based AMC (FB-AMC) [9][10][11]. The pivotal concept underlying the LB-AMC method is to establish a likelihood function based on the statistical attributes of the signal model, which determines an optimal discrimination threshold. By evaluating the signal's likelihood ratio against this threshold, a modulation scheme can be identified. LB-AMC has been shown to achieve optimal performance in the Bayesian sense, as it minimizes the probability of erroneous classification. Nevertheless, its elevated computational complexity poses a challenge to its practical adoption, as it may not satisfy the requirements of real-time processing and low cost. Notably, when the parameters remain unknown, the recognition accuracy may be significantly affected. Currently, the FB-AMC method has emerged as the predominant approach. Its main principle is to select suitable features, extract them from the received signal, and then classify the extracted feature information with a trained classifier. In comparison with LB-AMC, its algorithmic complexity is significantly diminished. In particular, it can be viewed as a mapping from the signal space to the feature space: it transforms a high-dimensional signal space into a lower-dimensional feature space, simplifying calculations. Additionally, the features extracted from the received signal can exhibit differences between diverse modulation schemes. The features should be precise to achieve superior recognition performance, so the selection of discriminative statistical features is paramount for FB-AMC. The selected features can be grouped into the following categories: time-domain features based on instantaneous amplitude, phase, and frequency [12]; frequency-domain features based on the cyclic spectrum [13] and higher-order cumulants [14]; and transform-domain features based on constellation diagrams [15] and wavelet transforms [16]. Traditional classifiers such as support vector machines (SVMs) [17] and decision trees (DTs) [18] struggle to learn complex feature representations from data. In addition, feature extraction requires manual design, which leads to significant human resource engagement. Feature engineering is a laborious task requiring specialized expertise. Most crucially, the designed features may be confined to specific modulation schemes.
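To make the time-domain feature family concrete, the sketch below computes instantaneous amplitude, phase, and frequency from the analytic signal via the Hilbert transform and summarizes each with a spread statistic. This is an illustrative feature bank, not the exact feature set of the cited works; the feature names are our own.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_features(x, fs):
    """Classic FB-AMC time-domain features from a real signal (illustrative)."""
    z = hilbert(x)                             # analytic signal
    amp = np.abs(z)                            # instantaneous amplitude
    phase = np.unwrap(np.angle(z))             # instantaneous phase
    freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency (Hz)
    amp_n = amp / amp.mean()                   # normalized amplitude
    return {
        "sigma_aa": float(amp_n.std()),        # spread of normalized amplitude
        "sigma_ap": float(phase.std()),        # spread of phase
        "sigma_af": float(freq.std()),         # spread of frequency
    }

# Example: a constant-envelope tone has a nearly flat amplitude profile,
# while FSK-like signals show a much larger frequency spread.
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
tone = np.cos(2 * np.pi * 50 * t)
feats = instantaneous_features(tone, fs)
```

A classifier would then be trained on such feature vectors rather than on the raw waveform.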
Deep learning algorithms have demonstrated substantial advantages in applications such as image recognition, speech recognition, and facial recognition. Concurrently, they are employed in the field of wireless communication for tasks like spectrum prediction [19], specific emitter identification [20][21][22][23][24], and automatic modulation classification [25,26]. Deep learning AMC (DL-AMC) methods are data-driven approaches [26]. The networks used are much deeper than earlier models, which allows them to extract more complex features. Unlike previous approaches, they can automatically extract features from raw data and make effective decisions without manual intervention. These methods include convolutional neural networks (CNNs) [27,28] and recurrent neural networks (RNNs) [29]. The core idea of a CNN is to extract features from input data through a series of convolution layers, pooling layers, and fully connected layers. An RNN is a deep learning model that excels at processing sequential data. Its core idea is to introduce recurrent units into the network, making the output of the network dependent not only on the current step but also on the output of the previous step. For deep learning methods, sample annotation is expensive and their performance depends highly on the quality of the data.
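The convolution-then-pooling pipeline described above can be sketched in plain NumPy. This toy example is not the paper's network; it only illustrates how a convolution layer, a ReLU nonlinearity, and a max-pooling layer transform an input patch.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel and sum the products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per patch."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, -1.0]])   # responds to horizontal intensity changes
# conv -> ReLU -> pool, the basic building block of a CNN feature extractor
features = max_pool(np.maximum(conv2d(image, edge_kernel), 0))
```

Stacking many such blocks, with learned kernels, is what lets a CNN build increasingly abstract features from a time-frequency image.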
Motivated by this, methods such as few-shot learning [30], conditional generative adversarial networks (CGANs) [31], and semi-supervised learning have been considered. Few-shot learning's objective is to develop precise models with fewer samples. A CGAN is a type of generative adversarial network in which a generator model conditionally generates images, enabling discernible image generation of a specified type. Semi-supervised learning employs minimal labeled data, coupled with substantial unlabeled data, for training, as shown in Figure 1. In this paper, we propose a semi-supervised automatic modulation classification (SSL-AMC) method based on consistency regularization and pseudo-labeling. Consistency regularization effectively mitigates overfitting by requiring model output consistency amidst slight data variations. The common method is input perturbation, which involves making small random changes to the input data. This encourages the model to produce stable responses to minor variations. Pseudo-labeling is a semi-supervised learning method that leverages the model's predictions on unlabeled data and attaches them as pseudo-labels to the unlabeled data, overcoming the problem of sparsely labeled data. We use a wide residual network (WRN) as the backbone network. A WRN is a variant of convolutional neural networks that introduces a wider network structure based on residual networks; i.e., it increases the number of channels. Our experimental results demonstrate that the proposed method achieves a high recognition accuracy with limited labeled signal samples. The main contributions of this paper are summarized as follows: • We present a semi-supervised AMC method based on consistency regularization and pseudo-labeling. Consistency regularization is used to encourage the model to extract generalized features from the signal data. Pseudo-labeling is used to construct artificial labels for unlabeled data. Both methods are used in combination to improve the identification performance.

Related Works

Traditional AMC Methods
The likelihood-theory-based approach introduces the basic concept of the likelihood function and explains how to build likelihood functions from the known statistical properties of a signal. It also discusses how to apply the likelihood ratio method under different channel conditions. As a result of the computational complexity of LB-AMC, researchers have turned their attention to FB-AMC. Many feature extraction algorithms exist, such as cyclic features, high-order moments, and high-order cumulants, as well as transform-domain features such as wavelet transforms. Azzouz et al. [32] select time-domain features such as the instantaneous frequency, amplitude, and phase for feature extraction. Their method achieves the identification of modulation schemes such as ASK, BPSK, and FSK. Y. Han et al. [33] select second-order, fourth-order, sixth-order, and eighth-order cyclic cumulants as features. This method achieves recognition of modulation schemes such as ASK, BPSK, QAM, and APSK. C. Chou et al. [34] choose constellation images as the feature; different modulation schemes present different distribution shapes and densities in constellation images. This method can identify BPSK, QPSK, and 8PSK in the presence of inter-symbol interference (ISI). S. Li et al. [35] choose the cyclic spectrum and the quadratic cyclic spectrum as features. The cyclic spectrum can be used to detect the periodic frequency components of the signal, while the quadratic cyclic spectrum is more sensitive to the nonlinear characteristics of the signal. This method can classify PSK, FSK, QAM, MSK, and OFDM modulation schemes in the presence of multipath effects. It can be seen that the computational complexity of FB-AMC is much reduced. However, FB-AMC places high demands on the features, and feature extraction depends on expert experience. Researchers are therefore searching for algorithms that can extract features automatically.

AMC Methods Based on Deep Learning
In [36], an eye diagram of the original signal is used as the input to LeNet-5, thus linking the AMC problem to the field of image recognition. Convolutional networks for radio recognition are proposed in [37]. Simulations on the DeepSig dataset RML2016.10A [37] show that the recognition accuracy is higher than that of FB-AMC. The RML2016.10A dataset is generated in the GNU Radio environment. It includes eleven modulation signals, each covering twenty signal-to-noise ratios (SNRs), with 1000 samples per SNR. In [38], an LSTM-based AMC method is shown to outperform CNN models for small- or medium-sized received signals. In [39], long symbol-rate signals are studied, and it is found that a stacked autoencoder can achieve better performance by increasing the simulation time. Wang et al. [27] design two CNNs to recognize different modulation schemes. The first CNN is trained with IQ-sampled signals. This network can distinguish QAMs from other modulation schemes; however, it cannot distinguish between 16QAM and 64QAM. The second CNN is trained with constellation images and can distinguish between 16QAM and 64QAM. Tu et al. [40] propose a novel CNN-based AMC method. They use a pruning technique to reduce the number of convolution parameters and the floating point operations per second (FLOPs). Experiments show that the pruned convolution layers, compared to the original CNN, reduce the convolution parameters without significant losses in recognition accuracy.

Semi-Supervised Learning and Its Applications
Semi-supervised learning has emerged as an important branch of machine learning. It seeks to enhance models using both labeled and unlabeled data. Semi-supervised learning regards unlabeled data as a valuable source of information, capable of amplifying the model's generalization capability and mitigating the risk of overfitting. It is based on the notion that unlabeled data can provide additional information for model training and enables the discovery of more comprehensive and robust feature representations. In recent years, a number of semi-supervised methods have emerged. Lee et al. [41] proposed a novel pseudo-labeling approach. The key concept of this methodology is to treat the model's predictions for unlabeled signal samples as "labels" for the unlabeled data. This approach alleviates the model overfitting problem triggered by insufficient labeled samples. Its disadvantage is that the predictive ability of the model is weak during the initial phase of training, which may reduce the quality of the pseudo-labels and thus decrease the recognition accuracy. Valpola et al. [42] proposed the mean teacher technique. Its core principle involves generating a teacher network by taking a moving average of the model parameters.
For labeled data, the loss is calculated and the model is updated through back propagation. For unlabeled data, the loss is calculated by utilizing both the student network and the teacher network. The loss is composed of two components: the supervised loss guarantees consistency with the labeled training samples, while the unsupervised loss ensures that the predictions of the student network are as similar as possible to those of the teacher network. By carefully minimizing both parts of the loss, the generalization ability of the model can be improved. David et al. [43] integrated a variety of semi-supervised algorithms, including consistency regularization, entropy minimization, and conventional regularization techniques. Their principal idea is to apply K rounds of data augmentation to unlabeled data. K new data points are thereby acquired, and K diverse predicted probability distributions are derived by feeding them into the identical classifier. These K probability distributions are averaged to obtain the average classification probability. Ultimately, the predicted labels for unlabeled data are determined using a sharpening algorithm.

System Model
The semi-supervised-learning-based AMC system model is shown in Figure 2. At the transmitter side, the input signal i(t) is modulated to obtain a modulated signal s(t). The modulated signal s(t) is affected by noise and channel fading during transmission. At the receiver side, the received signal x(t) can be expressed as:

x(t) = h(t) * s(t) + n(t), (1)

where h(t) denotes the multi-path channel and * stands for the convolution operation.
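The channel model above can be simulated in discrete time by convolving a symbol stream with a few channel taps and adding noise. The QPSK symbols, channel taps, and SNR below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Modulated baseband signal s: a QPSK symbol stream (illustrative).
symbols = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2),
                     size=256)

# Multi-path channel h: a few delayed, attenuated taps (illustrative).
h = np.array([1.0, 0.4 + 0.2j, 0.1])

# Received signal x = h * s + n: channel convolution plus additive noise.
snr_db = 10
signal = np.convolve(symbols, h)
noise_power = np.mean(np.abs(signal) ** 2) / 10 ** (snr_db / 10)
noise = np.sqrt(noise_power / 2) * (rng.standard_normal(signal.shape)
                                    + 1j * rng.standard_normal(signal.shape))
x = signal + noise
```

The AMC task is then to recover the modulation scheme of s from x alone, with h and n unknown.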

Data Preprocessing
The dataset utilized in the experiment comprises twenty modulation schemes, namely BPSK, QPSK, 8PSK, 16QAM, 32QAM, 64QAM, 128QAM, 256QAM, GFSK, CPFSK, PAM4, B-FM, DSB-AM, SSB-AM, APSK, OQPSK, 2ASK, 4ASK, 2FSK, and 4FSK. Each modulation scheme contains 1000 samples. The dimension of each sample is (4096 × 2). These samples are IQ sequence signals. Given the outstanding achievements of deep learning in the image recognition field, numerous research methodologies have recast signal recognition as an image recognition problem. Our manuscript utilizes the short-time Fourier transform (STFT) [44] to transform the diverse modulated signals into time-frequency images. The operating principle of the short-time Fourier transform involves separating the input signal into multiple overlapping windows. The data within each window are referred to as a frame, and the Fourier transform is applied to the data within each frame to obtain frequency-domain information. The output is a two-dimensional matrix in which the horizontal axis represents time and the vertical axis represents frequency. From the time-frequency images, one can discern the energy distribution of the signal at different times and frequencies. The dimension of each resulting time-frequency image is (64, 64, 3), where 64, 64, and 3 refer to the length, width, and number of channels of the time-frequency image, respectively. The processed signal samples can be directly input into the convolutional neural network. The data preprocessing process is illustrated in Figure 3.
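The preprocessing step can be sketched with scipy.signal.stft. The window length, cropping, normalization, and channel replication below are assumptions chosen to produce a 64 × 64 × 3 array; the paper's MATLAB settings may differ.

```python
import numpy as np
from scipy.signal import stft

def to_tf_image(iq):
    """Turn a baseband signal into a 64x64x3 time-frequency image (sketch)."""
    # A 127-sample window yields 64 one-sided frequency bins (127 // 2 + 1).
    f, t, Z = stft(iq.real, fs=1.0, nperseg=127)
    mag = np.abs(Z)[:64, :64]                      # crop to 64x64
    mag = np.pad(mag, ((0, 64 - mag.shape[0]), (0, 64 - mag.shape[1])))
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)  # scale to [0, 1]
    return np.repeat(mag[:, :, None], 3, axis=2)   # replicate into 3 channels

x = np.exp(2j * np.pi * 0.1 * np.arange(4096))     # complex tone at 0.1 cycles/sample
img = to_tf_image(x)
```

For a single tone, the image shows one bright horizontal band at the tone's frequency bin; modulated signals produce richer time-frequency patterns the network can learn from.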

Problem Description
Let X and Y denote the sample space and category space, respectively. x_k ∈ X represents an input sample, i.e., a signal sample in IQ format. y ∈ Y denotes the true category corresponding to the modulation scheme.

AMC Problem
The automatic modulation classification task is an intermediate step between signal reception and demodulation, where the received signal is transformed by the receiver into a baseband complex-valued signal sequence X = {x(0), x(1), . . . , x(K − 1)}, where K is the number of sampling points. The received modulation signals are I/Q signals, consisting of a real (in-phase) part I and an imaginary (quadrature) part Q, which can be expressed as:

x(k) = I(k) + jQ(k), k = 0, 1, . . . , K − 1. (2)

After data preprocessing, we obtain the time-frequency image dataset D_t = {(x_i, y_i)}_{i=1}^{N}. The aim of AMC is to create a mapping function f ∈ F and to minimize its expected error:

ϵ_ex = E_{(x,y)} [L(f(x), y)], (3)
where L(f(x), y) represents the error generated by comparing the predicted value with the true label. However, the expected error cannot be calculated directly because we only have a finite number of signal samples. In this case, we use the training data to represent the entire distribution and thus calculate an approximation of the expected error, the empirical error:

ϵ_em = (1/N) Σ_{i=1}^{N} L(f(x_i), y_i). (4)
However, this empirical error may be affected by factors such as the randomness of data sampling and noise. It cannot fully and accurately reflect the generalization ability of the model. Thus, we must take into account the difference between the expected error and the empirical error, i.e., the generalization error ϵ = |ϵ_em − ϵ_ex|. If the two errors do not differ much, the model generalizes well on both the training data and unknown data. The optimization objective can thus be rewritten as:

f* = argmin_{f ∈ F} ϵ_em(f), subject to the generalization error ϵ being small. (5)

Semi-Supervised AMC Problem
In the semi-supervised AMC problem, the following settings are used. The training dataset is

D = {(x_i^l, y_i^l)}_{i=1}^{L} ∪ {x_j^u}_{j=1}^{N−L}, (6)

where L is the number of labeled training samples and N − L is the number of unlabeled training samples. For ease of illustration, we use D_l = {(x_i^l, y_i^l) | i = 1, . . . , L} to denote the labeled dataset and D_u = {x_j^u | j = 1, . . . , N − L} to denote the unlabeled dataset. The core idea of the semi-supervised AMC problem is to find a mapping function that minimizes the expected error. In contrast to the general machine-learning-based AMC problem, due to the sparse number of labeled signal samples, we cannot compute the expected error directly. This is a frequent concern encountered in semi-supervised tasks. Information about the data distribution can be obtained from the unlabeled samples. Thus, in the semi-supervised AMC task, the expected error can be approximated as:

ϵ ≈ (1/L) Σ_{i=1}^{L} L(f(x_i^l), y_i^l) + λ · (1/(N − L)) Σ_{j=1}^{N−L} L_ul(f(x_j^u)), (7)

where L_ul(·) represents the loss obtained by training on unlabeled signal samples. The remaining terms are identical to those defined above. Both losses use the cross-entropy loss function, which is frequently employed in image classification tasks. λ denotes the weight of the unsupervised loss, which affects the classification proficiency of the model to an extent. However, the above rests on an important assumption: the data distribution of the unlabeled samples must be useful for the AMC problem. Specifically, the labeled and unlabeled data must share the same label space.

Proposed Semi-Supervised AMC Method Based on Consistency Regularization and Pseudo-Labeling
The structure of our proposed semi-supervised AMC method is illustrated in Figure 4. For simplicity, our proposed method is referred to as FixMatch. Semi-supervised learning is commonly employed for tasks where data annotation expenses are considerable or the data contain a substantial number of unlabeled samples, such as medical image analysis and agricultural and environmental surveillance. Semi-supervised learning essentially utilizes unlabeled samples to improve model performance. In this article, the fundamental concept of the proposed method is to incorporate unlabeled signal samples during model training. The specific steps are to train the model using labeled and unlabeled data, respectively, generating supervised and unsupervised losses, and to backpropagate to adjust the model parameters. The following discussion concentrates on how to construct the supervised and unsupervised losses. The initial I/Q signals are preprocessed to obtain time-frequency signal images, which are divided into labeled and unlabeled signal images. The supervised loss is derived by comparing the model's prediction for labeled data with the authentic label. It is represented as follows:

L_s = (1/L) Σ_{i=1}^{L} L_ce(y_i, P(y | x_i^l)), (8)

where P(·) denotes the model's prediction, L denotes the number of labeled samples, y_i denotes the true label of the i-th labeled sample, and L_ce(·) denotes the cross-entropy loss function.
The cross-entropy loss is used in classification tasks to measure the difference between model predictions and true labels. We design the unsupervised loss employing two semi-supervised methods: consistency regularization and pseudo-labeling. We initially modify the unlabeled signal samples twice with varying degrees of perturbation, which we refer to as weak data augmentation and strong data augmentation, respectively. Data augmentation is a technique that expands the training samples; its objective is to fortify the robustness of the model by modifying the data space. "Strong" and "weak" here denote the degree of perturbation of the data. The strongly augmented and weakly augmented images are fed into the model to obtain model predictions. A(·) denotes strong data augmentation, which generally consists of operations such as color transformation and contrast enhancement. α(·) denotes weak data augmentation, which consists of operations such as flipping. Since unlabeled samples do not possess labels, we utilize the weak augmentation branch to devise "pseudo-labels" for these unlabeled samples. The principle of pseudo-labeling [41] is to utilize the model itself to obtain artificial labels for unlabeled samples, typically hard labels, i.e., one-hot labels acquired by employing the argmax function. One-hot labels are only retained when the maximum class probability surpasses a threshold. The formulation for pseudo-labeling is as follows:

ŷ_j = argmax_y P(y | α(x_j^u)). (9)

After obtaining the pseudo-labels, we obtain the model predictions for the strong data augmentation branch. The unsupervised loss is obtained by comparing the model predictions for the strong data augmentation branch with the pseudo-labels. The reason for constructing the unsupervised loss in this way is that it embodies the idea of consistency regularization. Consistency regularization is an important part of current mainstream SSL and is based on the assumption that the same image with different
perturbations through the network will output the same prediction. The unsupervised loss L_ul can be expressed as:

L_ul = (1/(N − L)) Σ_{j=1}^{N−L} 1(max_y P(y | α(x_j^u)) ≥ τ) · L_ce(ŷ_j, P(y | A(x_j^u))), (10)

where N − L represents the number of unlabeled modulation signal samples and τ denotes the confidence threshold. 1(·) is an indicator function that takes the value 1 if the maximum class probability is greater than the threshold and 0 otherwise. The value of the confidence threshold directly affects the quality of the pseudo-labels. Setting the threshold too high results in a small number of samples being used for training, while setting it too low introduces a lot of erroneous information. In this paper, weak data augmentation consists of a standard flip-and-shift operation, with a 50% probability of flipping and a 12.5% probability of shifting. Employing only weak data augmentation could lead to overfitting of the training process and failure to extract crucial features. Strong data augmentation can cause severe distortion of signal images but still retains sufficient features to recognize the modulation schemes. Strong data augmentation applies the Randaugment [45] augmentation strategy and cutout [46] augmentation. The Randaugment methodology is a variant of the Autoaugment [47] strategy, which employs a random sampling strategy to reduce the network's dependence on the degree of coupling between augmentations. Specifically, Randaugment maintains a list of 14 augmentation techniques, along with a range of augmentation magnitudes. N augmentation methods are randomly selected from this list, and a random magnitude M is chosen. Subsequently, the chosen augmentation techniques and magnitude are applied to the training signal image, where each augmentation technique has a 50% probability of being applied. The cutout strategy encourages the model to learn robust features by randomly masking a portion of the training image. It can successfully avert interference from noise or unexpected features. The combination of Randaugment and cutout effectively
suppresses the noise introduced by Randaugment, thereby further enhancing the model's understanding of key features [48]. The overall loss of model training is the sum of the supervised and unsupervised losses:

L = L_s + λ_u · L_ul, (11)

where λ_u denotes the weight of the unsupervised loss. In experiments, this parameter is usually set to 1.0. After calculating the total training loss through forward propagation, back-propagation (BP) is performed. The goal of back-propagation is to compute the gradient of the model's loss function with respect to the model parameters. The gradient represents the rate of change in the loss function for each model parameter and specifies the direction in which the loss function falls fastest in the parameter space. By continuously updating the model parameters, the model's predictions gradually approach the ground truth. In this paper, we select the wide residual network (WRN) [49] as the backbone network for extracting features from time-frequency images. The WRN has residual blocks and skip connections. Each residual block is composed of multiple convolution layers and batch normalization layers. The WRN retains the advantages of residual networks in preventing gradient vanishing.
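The supervised, pseudo-labeling, and consistency steps above reduce to a few lines once the model's class probabilities are available. The NumPy sketch below operates on pre-computed probabilities; the model, the augmentations, and the example numbers are stand-ins, not the paper's network.

```python
import numpy as np

def cross_entropy(p, label):
    """Cross-entropy of a predicted distribution p against a hard label."""
    return -np.log(p[label] + 1e-12)

def fixmatch_loss(p_labeled, labels, p_weak, p_strong, tau=0.95, lambda_u=1.0):
    """Supervised loss plus thresholded pseudo-label consistency loss."""
    # Supervised term: labeled predictions vs. true labels.
    l_s = np.mean([cross_entropy(p, y) for p, y in zip(p_labeled, labels)])

    # Unsupervised term: pseudo-labels from the weak view, applied to the
    # strong view, kept only when the weak view is confident enough.
    l_u, kept = 0.0, 0
    for pw, ps in zip(p_weak, p_strong):
        if pw.max() >= tau:
            l_u += cross_entropy(ps, int(pw.argmax()))
            kept += 1
    l_u = l_u / max(kept, 1)
    return l_s + lambda_u * l_u

# Two labeled samples; two unlabeled (only the first passes the threshold).
p_labeled = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = [0, 1]
p_weak = np.array([[0.97, 0.03], [0.6, 0.4]])
p_strong = np.array([[0.7, 0.3], [0.5, 0.5]])
loss = fixmatch_loss(p_labeled, labels, p_weak, p_strong)
```

The second unlabeled sample contributes nothing because its weak-view confidence (0.6) is below τ = 0.95, which is exactly the filtering behavior of Equation (10).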

Decision-Tree-Based AMC Method
A decision tree is a machine learning algorithm for classification and regression problems. It classifies data based on a set of decision rules; in essence, this approach is an FB-AMC method. The decision tree model is represented as a tree structure, where each internal node represents a feature, each branch represents a decision rule, and each leaf node represents a category label. The construction of a decision tree is based on the principle of recursive partitioning. Starting from the root node, the optimal feature and segmentation threshold are chosen to divide the dataset into different subsets. The same operation is then performed recursively on each subset until a termination condition is reached (e.g., the number of samples in a node falls below a threshold). In this paper, we compute 26 statistical characteristics for the real and imaginary parts of the modulated signal separately, so each sample has 52 statistical characteristics. Finally, a decision tree operating on these 52 statistical characteristics is formed for threshold judgment.
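The feature extraction step can be sketched as follows. The paper does not specify which 26 statistics it uses, so the bank below (moments, quantiles, energy measures) is purely illustrative; it only demonstrates how a 52-dimensional vector is assembled from the I and Q channels.

```python
import numpy as np
from scipy import stats

def channel_stats(v):
    """26 simple statistics for one real-valued channel (illustrative bank;
    the paper's exact 26 statistics are not specified)."""
    basic = [v.mean(), v.std(), stats.skew(v), stats.kurtosis(v),
             v.min(), v.max(), np.median(v), np.ptp(v)]          # 8 values
    moments = [np.mean(v ** k) for k in range(2, 8)]             # 6 raw moments
    quantiles = list(np.percentile(v, np.linspace(5, 95, 10)))   # 10 quantiles
    energy = [np.sum(v ** 2), np.mean(np.abs(np.diff(v)))]       # 2 energy stats
    return np.array(basic + moments + quantiles + energy)        # 26 total

def dt_features(iq):
    """52-dimensional feature vector: 26 stats each for I and Q."""
    return np.concatenate([channel_stats(iq.real), channel_stats(iq.imag)])

rng = np.random.default_rng(1)
sample = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)
features = dt_features(sample)
```

A scikit-learn DecisionTreeClassifier can then be fit on a matrix of such vectors, one row per signal sample.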

VAT-Based AMC Method
Virtual adversarial training (VAT) [50] is a regularization method in semi-supervised learning. VAT is used to enhance the robustness of the conditional label distribution around the input data points to local perturbations. Unlike adversarial training, VAT introduces virtual adversarial directions. These adversarial directions can be defined on unlabeled data points even in the absence of label information.
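The core of VAT can be sketched on a toy linear-softmax model. One step of power iteration approximates the perturbation direction the output distribution is most sensitive to, and the penalty is the KL divergence after perturbing in that direction; no label is needed. This is an illustrative sketch, not the deep-network VAT used as a benchmark in the paper, and W, x, xi, and eps are made-up values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def vat_loss(W, x, xi=1e-6, eps=0.5, n_power=1, seed=0):
    """VAT smoothness penalty for a toy model p(y|x) = softmax(W @ x)."""
    p = softmax(W @ x)
    d = np.random.default_rng(seed).standard_normal(x.shape)
    for _ in range(n_power):
        d = xi * d / (np.linalg.norm(d) + 1e-12)   # tiny probe perturbation
        q = softmax(W @ (x + d))
        d = W.T @ (q - p)                          # grad of KL(p || q) w.r.t. perturbation
    r_adv = eps * d / (np.linalg.norm(d) + 1e-12)  # virtual adversarial direction
    return kl(p, softmax(W @ (x + r_adv)))         # local smoothness penalty

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
x = np.array([0.3, -0.2])
penalty = vat_loss(W, x)
```

Minimizing this penalty over labeled and unlabeled inputs flattens the model's output distribution in its most sensitive local direction, which is the regularization effect VAT exploits.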

Experimental Setup
Our server uses a GeForce RTX 2080 Ti GPU to perform the calculations. Firstly, the original signals with dimensions of (4096, 2) are converted into time-frequency images with dimensions of (64, 64) in MATLAB. The time-frequency images are then fed into Python for subsequent processing. The Python environment uses the torch 1.4.0 deep learning framework, and we use the scikit-learn 0.24.2 toolkit to test the model. The dataset we use contains 20,000 time-frequency images covering twenty modulation schemes. Each modulation scheme includes 1000 time-frequency images, and the dimensions of each time-frequency image are 64 × 64 × 3. We set the proportion of labeled samples to 1%, 2%, 3%, 4%, and 5%, respectively. The specific parameter settings are shown in Table 1.

Experimental Results and Analysis
As shown in Figure 5 and Table 2, our proposed method achieves the best results with a labeled sample proportion of 1-4%. This indicates that semi-supervised learning methods based on pseudo-labeling and consistency regularization perform well in the AMC domain. We find that both semi-supervised learning methods (Pi-model, VAT, and the proposed method) and machine learning methods (decision tree, support vector machine) have a higher classification accuracy than supervised learning methods when the number of labeled samples is small (1-5% of labeled samples). These supervised learning methods use only a backbone network for classification. This suggests that deep-learning-based methods are data-driven, and their classification accuracy suffers greatly when the data samples are insufficient. The machine-learning-based methods are not affected much by the number of data samples. However, they require hand-designed features, which limits the improvement in recognition accuracy. We found that the classification accuracy of both the VAT and Pi-model semi-supervised methods is poor when the sample size is extremely small (1% of labeled samples). At this point, however, the recognition accuracy of our proposed semi-supervised method exceeds 94%. This suggests that our proposed method is suitable for scenarios where the number of labeled samples is extremely low. We also find that the recognition accuracy of all methods increases with the sample size. This is because our proposed method learns the distribution of the data not only from unlabeled data but also from labeled data. The greater the amount of labeled data, the more prior knowledge is gained, and the better the recognition performance of the model.

Limitations and Future Work
There are two limitations to our work, and related future work is proposed accordingly. Firstly, the purpose of this paper is to demonstrate the efficiency of the proposed semi-supervised approach for automatic modulation classification. Our work has only been validated on one dataset. In practical applications, the data size and number of classes are not fixed. Therefore, the generalization ability of our proposed method needs further verification. Secondly, our work only classifies known modulation schemes. In practical applications, collected unlabeled signal samples will inevitably include unknown modulation schemes. In future work, we will consider automatic modulation classification in open-set scenarios.

Conclusions
In practical automatic modulation classification tasks, the amount of labeled data is very low. To solve this problem, in this paper, we propose a semi-supervised AMC method based on consistency regularization and pseudo-labeling. The proposed method transforms the modulation classification problem into an image classification task. We introduce both strong and weak data augmentation for the consistency regularization operations and introduce pseudo-labeling techniques to construct artificial labels. Consistency regularization typically involves two branches handling different perturbations, with a loss function designed to harmonize the predicted outcomes of both branches. Our experimental results show that, compared to five benchmark algorithms, our proposed method achieves a better recognition accuracy when the number of labeled samples is limited. We believe that the proposed approach can help with practical automatic modulation classification tasks.

Figure 1. A semi-supervised learning framework for signal recognition.
n(t) denotes additive noise. Additive noise can be broadly classified into four categories according to its source, namely radio interference, industrial noise, atmospheric interference, and internal noise. The implementation of semi-supervised AMC typically comprises three stages. Initially, we deploy receivers to receive signals from wireless devices, ideally performing data preprocessing on the received signals. Here, processing involves scrutinizing the acquired signals to derive an excellent representation of them (time-frequency images, constellation images, or more complex features). The objective is to construct more precise, robust, and comprehensive models. Subsequently, the supervised and unsupervised losses are constructed independently, following the proposed semi-supervised algorithm. The supervised loss is derived by contrasting the predicted values of labeled samples with their authentic labels. The unsupervised loss is obtained by training on unlabeled signal samples. Specifically, two distinct levels of perturbation are applied to the unlabeled samples, after which two different probability distributions are generated by the model. The unsupervised loss is calculated by comparing these two probability distributions. Both the supervised and unsupervised losses collaboratively update the model parameters. Lastly, the time-frequency signal images for testing are fed into the trained model, which is capable of separating the twenty modulation schemes.

Figure 4. The structure of the proposed semi-supervised AMC method.

Figure 5. Accuracy comparison of different methods.

Table 2. Experimental results when the proportion of labeled samples is 1%.