Diffusion-Based Radio Signal Augmentation for Automatic Modulation Classification

Deep learning has become a powerful tool for automatically classifying the modulations of received radio signals, a task traditionally reliant on manual expertise. However, the effectiveness of deep learning models hinges on the availability of substantial data. Limited training data often result in overfitting, which significantly degrades classification accuracy. Traditional signal augmentation methods such as rotation and flipping have been employed to mitigate this issue, but their effectiveness in enriching datasets is limited. This paper introduces the Diffusion-based Radio Signal Augmentation algorithm (DiRSA), a novel signal augmentation method that significantly enhances dataset scale without compromising signal integrity. Utilizing prompt words for precise signal generation, DiRSA allows for flexible modulation control and expands the training dataset well beyond its original scale. Extensive evaluations demonstrate that DiRSA outperforms traditional signal augmentation techniques such as rotation and flipping. Specifically, when applied with the LSTM model in small dataset scenarios, DiRSA improves modulation classification performance at SNRs above 0 dB by 6%.


Introduction
Automatic modulation classification (AMC) is a pivotal technology in wireless communications that is designed to automatically categorize the modulation categories of received radio signals. Modulation techniques, the methods by which information is encoded for transmission, vary widely, each with distinct advantages, applications, and suitability for different communication environments. Common modulation schemes, such as continuous-phase frequency shift keying (CPFSK), amplitude modulation double sideband (AM-DSB), Gaussian frequency-shift keying (GFSK), M-phase-shift keying (MPSK), M-quadrature amplitude modulation (MQAM), and wideband frequency modulation (WBFM), have been extensively used due to their robustness and effectiveness in traditional communication systems. However, emerging modulation techniques such as chaotic modulation [1] and orthogonal time frequency space (OTFS) modulation [2] are gaining attention for their potential to enhance signal robustness and spectral efficiency in highly dynamic environments. The diverse range of available modulation techniques reflects the evolving landscape of wireless technology, where the choice of modulation can significantly impact the efficiency, reliability, and security of a communication system. Accurately understanding and classifying these modulation categories is therefore of paramount importance for the design and optimization of next-generation wireless communication systems. This backdrop sets the stage for our investigation into AMC, which seeks to harness deep learning algorithms to automatically identify and adapt to various modulation schemes, further enhancing the capabilities of cognitive radios and adaptive communication systems. The ability to accurately and efficiently discern modulation categories is crucial not only for the operation of modern communication systems [3,4] but also for the significant role it plays in advancing areas such as cognitive radio [5], spectrum monitoring [6], and electronic warfare [7].
Currently, AMC research has branched into three principal methodologies: likelihood-based (LB) methods, feature-based (FB) methods [8], and deep learning-based (DL) methods [9]. LB methods, while precise under controlled conditions, often fail to perform well in real-world scenarios due to their high sensitivity to noise and signal distortions. FB methods, though effective at extracting key signal features, are limited by their inability to dynamically adapt to the variability and complexity of modern communication signals. Compared with DL methods, which can automatically extract signal features, the first two approaches appear somewhat outdated; research interest in them has subsided, and deep learning-based methods are now at the forefront of progress in the field. Recent advancements in deep learning, particularly in convolutional neural networks (CNNs) [10], recurrent neural networks (RNNs) [11], and other models [12], have significantly enhanced AMC's capabilities. It has been shown that the radio features extracted by deep learning models resemble the knowledge of human experts [13]. DL has therefore become the mainstream pipeline for AMC [14].
Yet, despite their promise, the success of deep learning models hinges on the availability of large datasets. Insufficient data can result in overfitting, diminishing a model's practical applicability. Previous studies show that small datasets limit the complexity of deep learning models, and training more complex models on small datasets has negative effects [15]. Moreover, for real-world problems, massive training samples are difficult to obtain in non-cooperative scenarios [16,17]. Research on signal augmentation for small datasets is therefore crucial to address the challenges associated with data scarcity. Traditional signal augmentation techniques, such as signal rotation and flipping [18], have been employed to increase the scale of training datasets, which helps mitigate overfitting. However, these methods are limited in the scale of augmentation they can achieve, typically up to 7×, resulting in a capped improvement in performance. Generative models, including generative adversarial networks (GANs) [19], have also been utilized for signal augmentation in the field of AMC. However, compared to rotation and flipping, the scale of augmentation achieved by GANs is relatively modest, and the resulting gains are limited.
In this paper, we introduce DiRSA, a novel solution engineered to significantly enhance the performance and applicability of diffusion models in signal augmentation and AMC. DiRSA adapts the denoising diffusion probabilistic model (DDPM) framework [20] to meet the unique challenges present in AMC. By integrating masking with conditional probabilities, DiRSA extends the DDPM's capabilities to effectively handle complex radio signals, allowing for partial reconstruction and signal augmentation. To guide the modulation category of the augmented signals, we introduce prompt words into DiRSA. This novel feature enables the precise and flexible generation of signal segments corresponding to specific modulation categories. The augmented datasets generated by DiRSA facilitate the training of AMC models with markedly improved accuracy, as verified through thorough testing and evaluation. The main contributions of this paper are summarized as follows:

•
A novel signal augmentation algorithm, DiRSA, which significantly augments the volume of training data available for AMC models without compromising the intrinsic qualities of the radio signals.

•
By incorporating prompt words, DiRSA enhances the accuracy of generating signal data for specific modulation categories, effectively reducing the confusion commonly associated with generative models. This approach is more adaptable than the naive alternative of training a separate model for each modulation category. Utilizing prompt words to dictate the modulation category of generated signals not only simplifies the control mechanism but also significantly decreases the number of model files required, thereby facilitating easier management.

•
Compared to traditional signal augmentation methods, such as rotation and flipping, DiRSA-augmented datasets achieve better modulation classification performance. When using long short-term memory (LSTM) [21], CNN, and convolutional long short-term memory fully connected deep neural network (CLDNN) [22] models, DiRSA performs 2.75-5.92% better than rotation and flipping in small dataset scenarios at SNRs higher than 0 dB.

The DiRSA Algorithm
In this section, we present DiRSA, including its architecture, its prompt-based denoising process, and its self-supervised training method.

Overview of DiRSA
DiRSA innovatively generates new signal samples by reconstructing segments of existing signals from random noise, guided by prompt words that specify the target modulation category. This process facilitates the expansion of signal datasets for training deep learning models, consequently enhancing AMC accuracy. The architecture of DiRSA, depicted in Figure 1, includes signal masking, signal denoising, and signal embedding:

1.

Signal masking. Starting with a radio signal sample $x$ of length $L$, we apply a mask $m = \{m_i \mid i \in \{1, 2, \ldots, L\}\} \in \{0, 1\}^L$ in which $L_m$ consecutive elements are set to 1. Each masked signal sample $x$ is then split into a masked segment $x^m$ and an unmasked segment $x^u$ (a code sketch of this step is given after this overview). As depicted in Figure 1, the masked segments are shown with red lines, while the unmasked segments are displayed with green lines. The masked segment $x^m$ is replaced with pure noise drawn from a standard Gaussian distribution $\mathcal{N}(0, 1)$, resulting in a new noised signal sample $\bar{x}$.

2.
Signal denoising. An adapted DDPM is employed to reconstruct the signal segment $x^m$ from the noise. This process is guided by a prompt word vector $c$, the one-hot encoding of the modulation category. The pair $\{\bar{x}, m\}$ and the prompt word $c$ are input into the DDPM, which outputs $\hat{x}^m$, approximating $x^m$ based on the unmasked segment $x^u$.

3.

Signal embedding. The reconstructed segment $\hat{x}^m$ is then embedded back into the original signal sample $x$ by replacing the original masked segment. This produces a new augmented signal sample $\hat{x}$, shown as blue solid lines in Figure 1, with the original $x^m$ depicted as gray dotted lines.
For each radio signal sample in the original dataset, we generate $K$ different masks and repeat the DiRSA process $K$ times, effectively augmenting the original dataset by a factor of $K$. This expansion coefficient $K$ significantly enhances the diversity and volume of data available for training AMC models.
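As an illustration, the following minimal PyTorch sketch implements the masking-and-noising step described above; the function name and tensor layout are our own assumptions for illustration, not taken from the DiRSA codebase.

```python
import torch

def mask_and_noise(x, L_m, K):
    """Sketch of DiRSA's signal-masking step.

    x: one I/Q sample of shape (2, L).
    L_m: number of consecutive masked elements.
    K: expansion coefficient (number of masks per sample).
    Returns K (noised sample, mask) pairs.
    """
    L = x.shape[-1]
    pairs = []
    for _ in range(K):
        # Choose a random start for the contiguous masked segment.
        start = torch.randint(0, L - L_m + 1, (1,)).item()
        m = torch.zeros(L)
        m[start:start + L_m] = 1.0          # 1 marks masked positions
        # Replace the masked segment with pure N(0, 1) noise.
        x_bar = x.clone()
        x_bar[:, start:start + L_m] = torch.randn(2, L_m)
        pairs.append((x_bar, m))
    return pairs

# Example: one RadioML-style sample of length 128, masked K = 3 times.
pairs = mask_and_noise(torch.randn(2, 128), L_m=15, K=3)
```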

Prompt-Based Signal Denoising
To improve the suitability of diffusion models for signal processing, especially in addressing the unique challenges of AMC, DiRSA makes substantial modifications to the DDPM framework [20]. These modifications are crafted to enhance performance and specificity in signal reconstruction and to enable the application of diffusion models to signal augmentation within the AMC domain. The details are as follows:

1.
Masking and conditional probability: DiRSA employs a novel approach that combines masking with conditional probabilities. This method allows DiRSA to adapt to I/Q signal data and enables the reconstruction of the noise portions of a signal into new, suitable signal segments based on the features of the non-noise portions, thereby achieving signal augmentation.

2.
Prompt word for modulation category: Uniquely, DiRSA incorporates the modulation category as a prompt word within its process. Each sample to be augmented carries a modulation category, which is assigned a unique identifier ranging from 0 to 10. These identifiers are then transformed into a one-hot encoding, where each identifier is represented by an 11-element binary vector in which only the position corresponding to the identifier is set to "1" and all other positions are set to "0" (see the sketch after this list). This one-hot encoded vector is subsequently used as a prompt word in the input to our model. This innovation allows for the stable generation of signal data corresponding precisely to the specified modulation category. By doing so, DiRSA enhances its utility and accuracy in automatic modulation classification, ensuring that the generated signals are consistently aligned with the desired modulation features.
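A minimal sketch of this one-hot encoding follows; the category ordering is an assumption for illustration (any fixed ordering of the 11 RadioML2016.10a categories works).

```python
import torch
import torch.nn.functional as F

# The 11 modulation categories of RadioML2016.10a; each category's
# identifier is taken to be its index in this list (illustrative ordering).
MODS = ["8PSK", "AM-DSB", "AM-SSB", "BPSK", "CPFSK", "GFSK",
        "PAM4", "QAM16", "QAM64", "QPSK", "WBFM"]

def prompt_word(mod: str) -> torch.Tensor:
    """Return the 11-element one-hot prompt word c for a modulation category."""
    idx = torch.tensor(MODS.index(mod))
    return F.one_hot(idx, num_classes=len(MODS)).float()

c = prompt_word("QPSK")  # one-hot vector with a single 1 at the QPSK position
```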

Denoising Process
DiRSA's denoising process unfolds over $T = 50$ steps, in which the noised signal sample $\bar{x}$ is gradually denoised into the augmented signal sample $\hat{x}$. We denote $x_t$ as the resulting signal sample at step $t$, for all $t = T, T-1, \ldots, 0$. We define $x_T = \bar{x}$ and $x_0 = \hat{x}$, and $x^m_t$ as the masked segment of $x_t$. This transformation through denoising is depicted in Figure 2.
As an example, we illustrate the first two steps of the prompt-based denoising process in Figure 2a. When $t = T$, we input $\{x_T, m\}$ into a noise estimation function $\epsilon_\theta(\cdot)$, which estimates the noise at the current step based on the unmasked segment $x^u$ and the prompt word $c$, resulting in $\epsilon_\theta(x^m_T, T \mid x^u, c)$. We then subtract the estimated noise from $x_T$ and obtain $x_{T-1}$ for the next step. We then input $\{x_{T-1}, m\}$ into the noise estimation function again, estimate the noise at $t = T - 1$ as $\epsilon_\theta(x^m_{T-1}, T-1 \mid x^u, c)$, and obtain $x_{T-2}$. By iterating over all steps $t = T, T-1, \ldots, 0$, we obtain the denoised signal sample $x_0$, as shown in Figure 2b. The core concept of the DDPM is to learn a model distribution $p_\theta(\cdot)$ that closely approximates a given data distribution $q(\cdot)$; the latter is elaborated in detail in Section 2.3. Given the trained model, the key step in the denoising process is the recovery of the data distribution of $x^m_{t-1}$ from $x^m_t$, which is assumed to be a Gaussian diffusion process:

$$p_\theta(x^m_{t-1} \mid x^m_t, x^u, c) = \mathcal{N}\big(x^m_{t-1};\, \mu_\theta(x^m_t, t \mid x^u, c),\, \sigma_\theta(t)^2 \mathbf{I}\big), \quad (1)$$

where $\mu_\theta$ and $\sigma_\theta$ are adapted from Ho et al. [20], the conditional $x^u$ is adopted from Tashiro et al. [23], and the conditional prompt word $c$ is newly introduced to fit the modulation label. Specifically, we have

$$\mu_\theta(x^m_t, t \mid x^u, c) = \frac{1}{\sqrt{\alpha_t}}\left(x^m_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x^m_t, t \mid x^u, c)\right), \quad (2)$$

$$\sigma_\theta(t)^2 = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t, \quad (3)$$

where $\beta_t$ is a small positive constant representing the noise level at step $t$, increasing monotonically from $\beta_1 = 0.001$ to $\beta_T = 0.5$ in this paper, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. Hence, the noise estimation function $\epsilon_\theta(\cdot)$ in (2) plays an essential role in DiRSA's denoising process. The pseudocode of the denoising process is shown in Algorithm 1.
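The reverse update of Equations (2) and (3) can be sketched as follows; `betas` stores $\beta_1, \ldots, \beta_T$ at indices $0, \ldots, T-1$, and `eps_hat` stands in for the trained network's output $\epsilon_\theta(x^m_t, t \mid x^u, c)$ (the names are illustrative, not from the released code).

```python
import torch

def reverse_step(x_m_t, t, eps_hat, betas):
    """One reverse diffusion step on the masked segment (Equations (2)-(3))."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    # Mean of p_theta, Equation (2).
    mean = (x_m_t - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps_hat) \
           / torch.sqrt(alphas[t])
    if t == 0:
        return mean          # final step: no noise is added
    # Variance of p_theta, Equation (3).
    var = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]
    return mean + torch.sqrt(var) * torch.randn_like(x_m_t)
```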
We propose a noise estimation function architecture for $\epsilon_\theta(\cdot)$, as illustrated in Figure 3. Drawing inspiration from CSDI [23] and DiffWave [24], our $\epsilon_\theta(\cdot)$ has $N_{layer} = 4$ residual layers. Each layer processes multiple inputs, including the masked signal sample $\{x_t, m\}$, the diffusion embedding of step $t$, and the prompt word $c$. The output from each residual layer feeds into the next via a 1 × 1 convolutional block. All layers are interconnected with skip connections, culminating in the final estimated noise output $\epsilon_\theta(x^m_t, t \mid x^u, c)$, which undergoes convolution and ReLU processing before release. Within each layer, after the masked signal sample $\{x_t, m\}$ is processed through convolution and ReLU activation, it is combined with the preprocessed diffusion embedding of $t$. This combined vector is then enhanced with side information, including time and feature embeddings extracted from $\{x_t, m\}$.
Unlike CSDI and DiffWave, which use separate Transformer layers for temporal and feature dependencies, we employ a single two-layer Transformer tailored for radio signals.
Given that I/Q data comprise only two features, in-phase (I) and quadrature-phase (Q), this streamlined approach captures both temporal and feature dependencies simultaneously, eliminating the need for additional shape adjustments and convolution between separate Transformer layers. This method significantly simplifies the architecture, reduces memory consumption by nearly half, and accelerates training without compromising accuracy. A novel addition in DiRSA is the use of the prompt word $c$ to guide noise estimation and direct the diffusion model's data generation. By performing an element-wise product between the aggregated vector and the preprocessed $c$, the prompt word integrates directly into each dimension of the input data. This deep integration ensures that the prompt words significantly influence the model's processing, aligning it more closely with its intended function [25].
To further elucidate the function of prompt words, consider a specific example. Suppose the input signal is a noisy sample modulated with "QPSK". In this case, the prompt word $c$ is the one-hot vector corresponding to "QPSK". This vector undergoes an embedding process, transforming $c$ into a continuous vector that effectively encapsulates the modulation characteristics. During the denoising process, as the signal passes through the noise estimation function shown in Figure 3, the embedded vector is combined with the input tensors via an element-wise product. This integration influences the noise generated, ensuring that the signal segment produced after denoising predominantly exhibits "QPSK" modulation features. This method not only targets the modulation precisely but also significantly enhances the clarity of the signal.
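The element-wise conditioning described above might be sketched as follows; the module and layer names are illustrative assumptions rather than the DiRSA implementation.

```python
import torch
import torch.nn as nn

class PromptConditioning(nn.Module):
    """Gates residual-layer activations with an embedded prompt word."""

    def __init__(self, n_mods=11, channels=64):
        super().__init__()
        # Embed the one-hot prompt word into a continuous channel-wise vector.
        self.embed = nn.Linear(n_mods, channels)

    def forward(self, h, c):
        # h: residual-layer activations of shape (batch, channels, length)
        # c: one-hot prompt word of shape (batch, n_mods)
        gate = self.embed(c).unsqueeze(-1)   # (batch, channels, 1)
        return h * gate                      # element-wise product
```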

Training DiRSA
The self-supervised learning method used in DiRSA for $\epsilon_\theta(\cdot)$ is inspired by the masked language modeling approach used in BERT [26]. We begin with a masked signal sample $\{x_0, m\}$, introduce random noise $\epsilon$ into the masked segment, and obtain a function input $x_t$ at step $t$. We then train the noise estimation function $\epsilon_\theta(\cdot)$ to accurately estimate the embedded noise $\epsilon$. The training procedure is illustrated in Figure 4. First, Gaussian noise is added to the masked segment $x^m_0$ to generate the noisy masked segment $x^m_t$ for step $t$, following the Markov chain defined as

$$q(x^m_t \mid x^m_{t-1}) = \mathcal{N}\big(x^m_t;\, \sqrt{1 - \beta_t}\, x^m_{t-1},\, \beta_t \mathbf{I}\big). \quad (4)$$

According to Ho et al. [20], the transition probability $q(x^m_t \mid x^m_0)$ is expressed as

$$q(x^m_t \mid x^m_0) = \mathcal{N}\big(x^m_t;\, \sqrt{\bar{\alpha}_t}\, x^m_0,\, (1 - \bar{\alpha}_t) \mathbf{I}\big). \quad (5)$$

This relationship allows $x^m_t$ to be formulated as

$$x^m_t = \sqrt{\bar{\alpha}_t}\, x^m_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}). \quad (6)$$

During training, DiRSA minimizes the loss between $\epsilon_\theta(x^m_t, t \mid x^u, c)$ and $\epsilon$. The loss function is defined as

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0, \epsilon, t}\left[ \left\| \epsilon - \epsilon_\theta(x^m_t, t \mid x^u, c) \right\|_2^2 \right]. \quad (7)$$

This training framework is designed to ensure that the estimated noise closely aligns with the actual noise, thereby facilitating effective signal reconstruction, denoted as $\hat{x}$, from pure noise via the denoising process.
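Under these definitions, one training step might look like the following sketch, where `eps_theta` is the noise estimation network (a hypothetical callable, not the released implementation):

```python
import torch
import torch.nn.functional as F

def training_loss(eps_theta, x_m_0, x_u, c, betas, T=50):
    """One self-supervised DiRSA training step (Equations (6)-(7))."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (1,)).item()   # sample a diffusion step
    eps = torch.randn_like(x_m_0)          # ground-truth Gaussian noise
    # Forward diffusion, Equation (6).
    x_m_t = torch.sqrt(alpha_bar[t]) * x_m_0 \
            + torch.sqrt(1.0 - alpha_bar[t]) * eps
    # MSE between true and estimated noise, Equation (7).
    return F.mse_loss(eps_theta(x_m_t, t, x_u, c), eps)
```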

Dataset and Evaluation Method
In this section, we discuss the radio signal dataset used and the evaluation approach applied to assess DiRSA with different AMC models, including LSTM, CNN, and CLDNN.

Radio Signal Dataset
The RadioML2016.10a [27] open radio signal dataset is utilized in our study. It comprises 220,000 modulated radio signal samples distributed evenly across 11 distinct modulation categories. Each modulation category in the dataset is represented at 20 signal-to-noise ratio (SNR) levels, ranging from −20 dB to 18 dB in 2 dB increments. Each signal sample is configured as a (2, 128) array, representing 128 consecutive modulated in-phase and quadrature-phase samples. All datasets employed in this study were derived through specific divisions and signal augmentation of the RadioML2016.10a dataset.
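For reference, the dataset is commonly distributed as a pickled dictionary keyed by (modulation, SNR) pairs; the loading sketch below relies on that assumption, and the local file name may differ.

```python
import pickle

# RadioML2016.10a is commonly distributed as a pickled dict whose keys are
# (modulation, SNR) pairs and whose values are arrays of shape (1000, 2, 128).
with open("RML2016.10a_dict.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")

mods = sorted({mod for mod, _ in data.keys()})
snrs = sorted({snr for _, snr in data.keys()})
print(len(mods), "modulations,", len(snrs), "SNR levels")  # 11, 20

samples = data[("QPSK", 18)]   # 1000 samples of shape (2, 128)
```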

Augmented Datasets and Models
To investigate DiRSA's performance under varying conditions, we conducted simulations with different scales of the original dataset. Initially, we randomly selected 80% of the data from each modulation category within the RadioML2016.10a dataset to form the training set, while the remaining 20% was split equally into a test set and a validation set. We refer to these datasets collectively as the full sets. Subsequently, a smaller subset was prepared by selecting only 5% of the data from each modulation category as the training set; the test and validation sets are the same as those of the full sets. These are referred to as the small sets. For DiRSA training, the training dataset is masked with length $L_m = 15$.
For AMC model training, we augment our training set using the rotation and flipping method [18], which increases the dataset scale to seven times the original, and label this augmented dataset "Rotation and Flipping". For the full sets, we set K = 2, generating two DiRSA-augmented copies of the "Rotation and Flipping" dataset; including the original "Rotation and Flipping" data, the total augmentation reaches 7 × (K + 1) = 21 times the original dataset scale. Similarly, for the small sets, we set K = 3, and the total dataset is augmented to 7 × (K + 1) = 28 times the original scale. These augmented datasets are labeled "DiRSA". The "Validation" and "Test" datasets serve as the validation and testing datasets for all training datasets. The details are summarized in Table 1.

AMC Models
To evaluate the performance of DiRSA, we compare three different networks and observe their classification accuracy: LSTM, CNN, and CLDNN. As delineated in Table 2, the architectures of LSTM, CNN, and CLDNN are tailored for automatic modulation classification with distinct configurations and layer setups:

1.
LSTM Architecture: The LSTM model [21] consists of two layers, each equipped with 128 cells, an input size of 2, and a dropout rate of 0.5 (a PyTorch sketch is given after this list). This configuration is designed to effectively process sequential data, making it particularly suitable for time series analysis in modulation classification. The network transitions into a fully connected layer that narrows down from 128 to 11 output channels, corresponding to the different modulation categories present in the dataset.

2.
CNN Configuration: The CNN employs three sequentially arranged convolutional layers, each with a kernel size of 3 and ReLU activation, and each followed by a max-pooling layer with a kernel size of 2. This structure is aimed at reducing computational load and simplifying network complexity. Channel depths are progressively increased from 2 to 32, then 64, and finally 128. After the convolutional stages, the network includes two fully connected layers that scale down the features from 2048 (128 × 16, presuming full connectivity from the max-pooled outputs) to 128, and finally to 11 output channels.

3.

CLDNN Model: The CLDNN [12] architecture integrates elements of both the CNN and LSTM models. It starts with two convolutional layers identical to the initial layers of the CNN, which then lead into two LSTM layers configured similarly to those in the standalone LSTM model. This hybrid setup culminates in a fully connected layer that compresses the features to 11 output channels, enhancing the model's capability for nuanced modulation recognition tasks.

These models were chosen for specific reasons. The LSTM network is preferred for its simplicity and proven effectiveness in handling sequential data, such as time series [28], making it ideal for modulated signal classification. While CNNs are commonly used and simple, they are slightly less effective than LSTMs at processing time series I/Q data. The CLDNN, being the most complex of the three, combines the strengths of the CNN and LSTM architectures and exhibits the highest adaptability and fitting capability. The hyperparameters are set as follows: 50 training epochs, a mini-batch size of 400, and an initial learning rate of 0.001, which is halved if the validation loss does not decrease for five consecutive rounds. All models are implemented using PyTorch. The implementation details and training procedures for DiRSA are detailed in the publicly accessible source code, available at https://github.com/YicXu/DiRSA (accessed on 20 May 2024).
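As an illustration, a minimal PyTorch sketch of the LSTM classifier described above follows; reading the class prediction from the final time step is our assumption.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """LSTM AMC model: two layers of 128 cells, input size 2 (I and Q),
    dropout 0.5, and a 128 -> 11 fully connected output layer."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=128,
                            num_layers=2, dropout=0.5, batch_first=True)
        self.fc = nn.Linear(128, 11)

    def forward(self, x):
        # x: (batch, 128, 2) -- 128 time steps of I/Q pairs
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])   # logits over 11 modulation categories

logits = LSTMClassifier()(torch.randn(4, 128, 2))   # shape (4, 11)
```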

Performance
In this section, we evaluate the performance of DiRSA by assessing the efficacy of prompt words and AMC accuracy using different datasets and deep learning models.

Evaluation of Prompt Words
To illustrate the effectiveness of prompt words, we apply DiRSA to different modulation categories, i.e., 8PSK, CPFSK, AM-DSB, and GFSK samples, as shown in Figure 5. We first select a batch from the RadioML2016.10a dataset. Each sample in the batch is augmented with different modulation categories serving as prompt words. In the illustration, the blue line denotes the reconstructed signal segment, and the red dots represent the I/Q data points. A green line, which may not be distinctly visible in densely plotted areas, connects these I/Q data points. For each modulation category shown in Figure 5, we compare the green lines and the blue lines of the same modulation. The augmented segment is consistent with the remaining signal features. Hence, it can be confirmed that the prompt-word-based diffusion model indeed generates signals as expected. Table 3 summarizes the mean absolute error (MAE) of augmented signal samples at SNRs greater than 0 dB; for instance, the MAE for augmented 8PSK samples is 0.588.

To further demonstrate the effectiveness of prompt words in influencing signal augmentation, we present examples using incorrect prompt words in Figure 6. Specifically, all input signals in the figure belong to the "8PSK" modulation category but are processed with various prompt words. In Figure 6a, the correct prompt word $c$ is set to "8PSK", resulting in the reconstructed segments in blue naturally aligning with the remaining signal. Conversely, when the prompt word is changed to "AM-DSB" in Figure 6c, the reconstructed segments differ markedly from the original signals. Similar discrepancies are observed in Figure 6b,d, where the prompt words are "CPFSK" and "GFSK", respectively. The variation in the blue lines across the four subfigures vividly illustrates how prompt words crucially affect the outcomes of the DiRSA algorithm. Corresponding to Figure 6, Table 4 summarizes the MAE of "8PSK" signal samples at SNRs greater than 0 dB under different prompt words. Comparing it with Table 3, it is easy to see that the MAE of samples using the wrong prompt words increases significantly; for instance, the MAE is 1.167 when the prompt word is wrongly set to "AM-DSB". Hence, the prompt word $c$ indeed guides DiRSA's augmentation process.
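The MAE reported in Tables 3 and 4 can be computed as a simple per-segment average; a minimal sketch, with tensor shapes as our assumption:

```python
import torch

def segment_mae(x_m_hat, x_m):
    """Mean absolute error between the reconstructed masked segment
    x_m_hat and the original masked segment x_m (any matching shapes)."""
    return torch.mean(torch.abs(x_m_hat - x_m)).item()

# Example with a masked segment of length L_m = 15 over I and Q channels.
mae = segment_mae(torch.randn(2, 15), torch.randn(2, 15))
```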


AMC Performance
We evaluate the AMC accuracies of different deep learning models, i.e., LSTM, CNN, and CLDNN, in Figures 7, 8, and 9, respectively. All signal augmentation methods improve classification accuracy at SNRs higher than −6 dB compared to the baseline. The accuracy of all methods falls below roughly 40% when the SNR is less than or equal to −8 dB. At such low SNR levels, the noise in the system becomes so pronounced that it substantially obscures the modulated signal features that are crucial for accurate modulation classification [29]. This degradation impedes the AMC model's ability to extract and utilize the features necessary for distinguishing between different modulations. Analysis at this unusable level of accuracy is not meaningful, so we do not further analyze cases where the SNR is too low.
In Figures 7b, 8b, and 9b, we assess the small dataset and its augmented counterparts for all models. In the small dataset scenario, "No augmentation", containing only 50 samples per SNR per modulation category, exhibits poor performance. However, signal augmentation significantly enhances accuracy. Specifically, "DiRSA" surpasses "Rotation and Flipping", with LSTM models trained on "DiRSA" outperforming "Rotation and Flipping" by approximately 5.92% at a 0 dB SNR or higher. Similarly, for the CLDNN and CNN models, "DiRSA" demonstrates superior accuracy compared to "Rotation and Flipping" alone. This demonstrates that while DiRSA is effective across a range of SNRs, it achieves optimal performance at SNRs of 0 dB or higher, confirming DiRSA's higher scalability and effectiveness in enhancing model performance under sparse data conditions.

To further highlight the benefits of DiRSA, we showcase confusion matrices for different signal augmentation methods in Figure 10. As depicted in Figure 10b, following "Rotation and Flipping", there is noticeable confusion between several modulation categories, notably between BPSK and QPSK, AM-DSB and WBFM, and QAM16 and QAM64. The most pronounced confusion occurs between QAM16 and QAM64, with QAM16 achieving only 37% classification accuracy. Conversely, Figure 10c demonstrates that, after applying DiRSA, the confusion between BPSK and QPSK, as well as between QAM16 and QAM64, is significantly mitigated compared to "Rotation and Flipping". Notably, the classification accuracy for QAM16 is enhanced by 50%. Although AM-DSB shows a slight decrease in performance of 5%, WBFM sees a substantial improvement of 35%, and overall confusion across the board is notably reduced.

DiRSA's DDPM has 1,052,993 parameters, as detailed in Table 5, reflecting its intensive computational demands. The complexity of DiRSA's DDPM and of the AMC models is summarized in Table 5. The actual complexity of DiRSA requires an additional T-step cycle on top of the DDPM, as detailed in Algorithm 1.
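The per-SNR accuracies and confusion matrices above can be produced with standard tooling; a self-contained sketch with dummy predictions (the array names and data are illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_snr_accuracy(y_true, y_pred, snrs):
    """Group test predictions by SNR level and report accuracy per level."""
    return {int(s): float((y_true[snrs == s] == y_pred[snrs == s]).mean())
            for s in np.unique(snrs)}

# Dummy predictions over 11 classes and SNRs in {-20, -18, ..., 18}.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 11, 2000)
y_pred = rng.integers(0, 11, 2000)
snrs = rng.choice(np.arange(-20, 20, 2), 2000)

print(per_snr_accuracy(y_true, y_pred, snrs))
# Row-normalized confusion matrix at SNR >= 0 dB, as in Figure 10.
mask = snrs >= 0
cm = confusion_matrix(y_true[mask], y_pred[mask], normalize="true")
```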

Conclusions
This paper introduces DiRSA, an innovative radio signal augmentation method utilizing diffusion models that significantly enhances the volume and quality of datasets for modulation classification. By augmenting datasets to more than 20 times their original scale, DiRSA effectively mitigates the issue of overfitting and improves the robustness of deep learning models, particularly in environments where large-scale data collection is challenging. The integration of prompt words within DiRSA's framework allows for precise signal generation tailored to specific modulation categories, thereby increasing the efficiency and manageability of the training process. Empirical results demonstrate that DiRSA-augmented datasets boost AMC performance significantly, especially in scenarios with limited data.
Building on the results of this study, future work could enhance DiRSA in several ways. Optimizing the diffusion model could reduce computational demands and improve efficiency, potentially enabling real-time AMC applications. Applying DiRSA to a wider variety of AMC models and datasets would help validate its effectiveness across different modulation categories and conditions, broadening its applicability. Finally, the principle of DiRSA could be applied to other tasks, such as reconstructing incomplete signal data or extending the radio signal length.

Figure 1 .
Figure 1. The architecture of DiRSA. The green curves represent unmasked segments $x^u$, the red curves represent masked segments $x^m$, and the blue curves represent the denoised masked segments $\hat{x}^m$.

Figure 2 .
Figure 2. The prompt-based denoising process of DiRSA. The green curves represent unmasked segments $x^u$, the red curves represent masked segments $x^m_t$, and the black curves represent the noise. (a) The prompt-based denoising process when $t = T$ and $t = T - 1$. (b) An illustration of the full denoising process for $t = T, T-1, \ldots, 0$.

Figure 4 .
Figure 4. The training of DiRSA. The green curves represent unmasked segments $x^u$, the red curves represent masked segments $x^m_0$, the purple curves represent noisy masked segments $x^m_t$, the orange curves represent generated noise $\epsilon$, and the brown curves represent estimated noise $\epsilon_\theta(x^m_t, t \mid x^u, c)$.

Figure 6 .
Figure 6. Comparison of augmented 8PSK samples using different prompt words $c$.

Figure 7 .
Figure 7. Comparison of classification accuracy under full and small datasets for the LSTM model.

Figure 8 .
Figure 8. Comparison of classification accuracy under full and small datasets for the CNN model.

Figure 9 .
Figure 9. Comparison of classification accuracy under full and small datasets for the CLDNN model.
Algorithm 1: Signal denoising process for I/Q signals using DiRSA.
Input: Initial signal sample $x_T$, mask $m$, prompt word $c$, total diffusion steps $T$.
Output: Denoised signal sample $x_0$.
1  for $t \leftarrow T - 1$ to $0$ do
2      // Estimate the noise of step $t + 1$
3      foreach residual layer $i$ do
4          $x_{t,i} \leftarrow$ forward pass through residual layer $i$;
5          stack $x_{t,i}$ to the skip connections;
6          input $x_{t,i}$ to the next layer;
7      end
8      $\epsilon_\theta(x^m_{t+1}, t+1 \mid x^u, c) \leftarrow$ stack the skip connections, sum, and normalize;
9      // Obtain the current sample
10     $x_t \leftarrow$ denoise the previous sample; refer to Equation (2);
11     if $t > 0$ then
12         // Add noise to the current sample
13         $\sigma_\theta(t) \leftarrow$ calculate the variance; refer to Equation (3);
14         $x_t \leftarrow x_t +$ a random noise multiplied by $\sigma_\theta(t)$;
15     end
16 end

Table 1 .
Datasets and information.

Table 2 .
AMC models evaluated in this paper.

Table 3 .
MAE for augmentations with correct prompt words.

Table 4 .
MAE for augmentations with different prompt words.

Table 5 .
Summary of complexity.