An Efficient Data Augmentation Method for Automatic Modulation Recognition from Low-Data Imbalanced-Class Regime

Abstract: The application of deep neural networks to address automatic modulation recognition (AMR) challenges has gained increasing popularity. Despite the outstanding capability of deep learning in automatic feature extraction, predictions based on low-data regimes with imbalanced classes of modulation signals generally result in low accuracy due to an insufficient number of training examples, which hinders the wide adoption of deep learning methods in practical applications of AMR. The identification of minority-class samples can be crucial, as they tend to be of higher value. However, in AMR tasks, the problem of imbalanced classes in a low-data regime has received little attention and lacks effective solutions. In this work, we present a practical automatic data augmentation method for radio signals, called SigAugment, which incorporates eight individual transformations and effectively improves the performance of AMR tasks without additional searches. It surpasses existing data augmentation methods and mainstream methods for solving low-data and imbalanced-class problems on multiple datasets. By simply embedding SigAugment into the training pipeline of an existing model, it can achieve state-of-the-art performance on benchmark datasets and dramatically improve the classification accuracy of minority classes in the low-data imbalanced-class regime. SigAugment can be applied uniformly to different types of models and datasets and works right out of the box.


Introduction
With extensive research and application of deep neural networks in various fields, the performance of deep learning-based automatic modulation recognition (AMR) has improved tremendously over traditional signal processing methods [1]. The key enablers of deep learning are the availability of large datasets, the advent of GPUs, and the continued development of deep network architectures [2]. AMR tasks based on deep learning cannot be performed without high-quality labeled data, which are required for both algorithmic research and practical deployment. In existing open-source datasets, the amount of data is the same for all modulation types [1,3,4]. In practice, however, deep learning-based AMR tasks face three real-world challenges: (1) insufficient data; (2) imbalanced classes; and (3) high computational demands when computational power is limited. Sensor data are often skewed by class imbalance, i.e., a small number of classes have a large number of samples, while other classes have only a few. The low-data regime problem has also begun to be addressed [5]. Data are crucial not only in communications but also in high-tech fields such as finance, automation, and blockchain [6][7][8]. Deep learning models struggle to achieve excellent performance when confronted with a class imbalance in a low-data regime, resulting in poor classification performance for minority classes with insufficient data. This is a major challenge because misidentifying a minority class is more costly: its samples tend to be valuable and are rare or expensive to obtain.
A number of recent studies have focused on the problem of insufficient data and imbalanced classes, using approaches such as resampling and reweighting based on the sample size of each class. Resampling is achieved by increasing the sampling frequency of the minority class while decreasing that of the majority class. Upsampling has been the most commonly used method in recent years [9][10][11][12], but it can increase the likelihood of overfitting and significantly increase training complexity, particularly at higher oversampling rates, making it difficult for classifiers to achieve good generalization performance in testing or in practical applications [13,14]. Reweighting sets the sample weights inversely proportional to the class frequencies; its most successful application is in the field of target detection, for balancing background classes and other classes [15]. Recent research on large, real-world long-tail datasets has revealed that this strategy performs poorly [16]. In contrast, data augmentation methods expand the dataset by transforming the original data to generate synthetic samples [17][18][19][20][21]. Data augmentation is a simple and practical technique, and it is more widely used than other methods because it can be effectively applied to various types of class imbalance and low-data regime problems. Although the straightforward application of data augmentation may exacerbate the class imbalance, the method is extremely effective at suppressing overfitting. Furthermore, data augmentation can be used in combination with class rebalancing and module improvement [22]. In this paper, we focus only on effective data augmentation for AMR in the low-data imbalanced-class regime. The balancing method and model design that go with it are beyond the scope of this paper.
Research into data augmentation methods for modulated signals has received little attention in comparison to computer vision and speech [18]. Existing research has shown that data augmentation can significantly improve deep learning model classification performance under class-balanced conditions. The main question at this point is how to create effective data augmentation methods and strategy combinations to improve modulation recognition performance in low-data regimes with class imbalance. Domain specificity, independence from the model and the data distribution, and complexity must all be considered when developing data augmentation methods for modulated signals. Signal samples differ from image data in terms of intrinsic characteristics, the most notable of which are temporal dependency and spatial dependency, where spatial dependency refers to the correlation between the I and Q components [23]. Signal samples can be transformed in both the time and frequency domains, so that corresponding data augmentation methods can be designed in the transformed domains, but the time complexity of these transformations needs to be considered [24]. Only two effective augmentation methods have been identified in current studies, namely rotation and flip [18], and further research is needed to design more effective methods. Existing data augmentation methods have yet to be tested for adaptability to different models and datasets. Data augmentation methods for signal data in simple channel environments, for example, may be ineffective for data in complex channel environments. Data augmentation methods for recurrent neural networks (RNNs) might not work for convolutional neural networks (CNNs), and augmentation methods for frequency domain data might not work for time domain data. Existing research has not investigated data augmentation methods that are effective for multiple types of models. As a result, selecting and combining augmentation algorithms is challenging.
Despite numerous works devoted to providing optimal recognition accuracy, model design solutions, and data processing means for AMR tasks, current research on these tasks suffers from the following problems: (1) not enough attention has been paid to the problems of insufficient data and imbalanced classes, which are prevalent in practical applications; (2) many methods are used to deal with the aforementioned problems in the image domain, but they are all difficult to transfer directly to modulated signal classification tasks; and (3) existing data augmentation methods can vary significantly in their performance across different models.
In this paper, we focus on the problems of AMR tasks in the low-data imbalanced-class regime. Our approach is to use automated data augmentation techniques. The idea is to design a set of individual data transformations based on the characteristics of the modulated signal. The augmentation sequence for the data samples is then randomly selected from this set of transformations. We present four new augmentation transformations and a combination method based on radio signal characteristics: channel shuffle, inversion, split and permutation, scaling, and flip and channel shuffle. Empirical results on several datasets with imbalanced classes and small sample sizes show that our method significantly outperforms existing state-of-the-art (SOTA) methods. Furthermore, extensive validation experiments and ablation studies confirm these findings.
Our approach differs from previous work in the following ways: (1) When compared to RandAugment data augmentation, our approach does not require setting hyperparameters or grid search; additionally, RandAugment targets image data and SigAugment targets communication signal data; (2) When compared to existing individual data augmentation methods, SigAugment automatically selects augmentation sequences during training, which is more adaptive to the model and data; (3) When compared to existing resampling methods, our method implements online data transformation with little to no additional training overhead and is better at combating overfitting and improving model generalisation performance.
We summarize the key contributions of this paper as follows.
• To the best of our knowledge, this is the first study of class imbalance modulation recognition in low-data regimes, and it will serve as a reference for researchers and the community in order to better understand the problem of class imbalance modulation recognition using data augmentation methods;
• We demonstrate that existing rebalancing methods are limited in their ability to solve the class imbalance and small sample size problems in AMR tasks, both because they increase training costs and because they provide very limited improvements in recognition performance;
• We propose a new automated data augmentation method for modulated signal data, called SigAugment, which can be used without additional hyperparameters compared to existing methods and can be adapted to models with different structures. Experimental results demonstrate that SigAugment can outperform existing SOTA methods on different types of datasets. In particular, the gain is significant on small datasets with imbalanced classes.

Related Works
Due to the accumulation of large data volumes and the widespread use of deep neural networks, deep learning-based AMR methods have outperformed traditional methods. Existing research has concentrated on developing deep neural network models based on open-source datasets and conducting studies on improving recognition accuracy, real-time performance, and the trade-off between the two [1,3,4,23,25–34]. Studies on the representation of radio signal data have also begun, which will provide additional domain knowledge for deep learning-based automatic modulation identification methods [24]. Furthermore, data augmentation techniques have been shown to improve model classification accuracy [18,30]. In practice, however, AMR as a time series classification task is frequently subject to class imbalance and a low-data regime problem. For small sample sizes, recent studies have used few-shot learning techniques to solve the modulated signal recognition problem [5]. However, little attention has been paid to the study of class imbalance in modulated signal data, whereas the class imbalance problem has received extensive attention and discussion in the fields of computer vision, speech, and text [13,14,16,35,36]. The synthetic minority oversampling technique (SMOTE), which oversamples the minority class to artificially mitigate the imbalance, is a conventional approach to solving the imbalanced classification problem [37]. This method is widely used, and there are numerous variations of it [9][10][11][12].
However, the resampling strategy may alter the distribution of the original data, compromising the model's performance. Over-sampling, in particular, generates a large amount of redundant data, reducing model training efficiency, and the strategy overfits minority classes, making it difficult to transfer the model's excellent performance on the training set to the test data, often resulting in poor generalization performance on very imbalanced data. On the other hand, downsampling can result in significant information loss in the sample and even underfitting. The cost-sensitive reweighting technique is another alternative. Reweighting is the process of assigning different weights to different classes or samples, primarily to solve the class imbalance problem by reweighting the losses of different classes [15,35]. Samples from minority classes have a higher loss than samples from majority classes because the features learned in minority classes tend to be weaker. This can easily lead to the model overfitting the minority class [36]. A side effect of assigning higher weights to hard samples is the focus on noisy data. Assigning high weights to samples in modulation recognition tasks, which frequently contain a large number of samples with very low signal-to-noise ratios (SNRs), may reduce overall recognition performance. Other approaches are discussed in the recent review [13].
Data augmentation techniques are a simpler and more effective approach that is widely used for tasks such as classification, target detection, and speech recognition [18–21,30,38,39], and they are especially effective for class imbalance and low-data regime problems [40]. To the best of our knowledge, rotation and flip are effective modulation signal augmentation methods [18], and it has been experimentally demonstrated that, with data augmentation, training in a low-data regime can achieve higher recognition accuracy than training on the entire dataset without data augmentation. Modulated signal data augmentation for imbalanced-class and low-data regimes, on the other hand, has not been thoroughly investigated.

Signal Model
A typical communication system is made up of three parts: a transmitter, a channel, and a receiver. To transmit wireless signals over long distances in the air, the transmitter must modulate the low-frequency baseband signal onto the high-frequency carrier signal in a specific way. After receiving the modulated signal, the receiver applies the inverse process (demodulation) to separate the baseband signal from the modulated signal. The identification of the signal modulation type is a critical step in signal demodulation. Signals received by the receiver are typically represented in in-phase and quadrature (IQ) form, i.e., as two signals whose phases are 90 degrees apart. In mathematical terms, the samples obtained after sampling the signals can be expressed as L-dimensional arrays with two channels, denoted as follows:
$$x = \begin{bmatrix} I \\ Q \end{bmatrix},$$
where $I = [i_1, i_2, \ldots, i_L]$ and $Q = [q_1, q_2, \ldots, q_L]$. In addition to representing the signal as IQs, amplitudes and phases (APs) can also be used to represent the signals:
$$x = \begin{bmatrix} A \\ \varphi \end{bmatrix},$$
where $A = \sqrt{I^2 + Q^2}$ and $\varphi = \operatorname{arctan2}(Q, I)$, both computed element-wise. A single sample in a deep learning-based AMR task is typically represented as $(x, y)$, where $y$ denotes the label of sample $x$.
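The conversion between the two representations can be illustrated with a minimal NumPy sketch. The function name `iq_to_ap` and the toy sample are illustrative assumptions, not part of the original work; the sample is assumed to be stored as an array of shape (2, L) with the I component in row 0 and the Q component in row 1.

```python
import numpy as np

def iq_to_ap(x):
    """Convert a (2, L) IQ array into a (2, L) amplitude/phase (AP) array."""
    i, q = x[0], x[1]
    amplitude = np.sqrt(i ** 2 + q ** 2)  # element-wise magnitude
    phase = np.arctan2(q, i)              # four-quadrant arctangent, in radians
    return np.stack([amplitude, phase])

# Toy sample (assumed for illustration): L = 4 points on the unit circle.
x = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])
ap = iq_to_ap(x)
```

Using the four-quadrant arctangent keeps the phase unambiguous over the full circle, which a plain arctangent of Q/I would not.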

Data Augmentation Methods for AMR
Data augmentation is a set of techniques for artificially increasing the amount of data by generating new samples from existing data. This includes applying small transformations to data, combining data samples, and generating new data samples with deep learning models. Data augmentation is frequently used to prevent overfitting, improve model generalization, address class imbalances, and reduce the cost of collecting and labeling data. A number of traditional data augmentation methods have been successfully applied to computer vision and speech-related tasks, achieving SOTA performance. The basic linear data augmentation methods are rotation, scale, shear, and flip, while non-linear data augmentation methods include mosaic [19] and mixup [20]. Non-linear augmentation methods and their combinations are typically more effective than simple linear augmentation methods in tasks such as image classification and target detection. The same data augmentation methods that have been used successfully in the image domain are now widely used in speech. It is standard practice to first transform the raw speech signal into a Mel spectrogram before applying a data augmentation transform [41]. If an end-to-end architecture is used and the raw audio data are directly fed into a deep neural network model, widely used and proven methods for audio data augmentation include pitch shifting, time stretching, and random frequency filtering [21]. In the medical field, various data augmentation methods and combinations of methods for data from wearable sensors have been validated. These data augmentation methods are based on domain knowledge and thus enable efficient expansion of the data.
Modulated signal data augmentation can be seen as the injection of a priori knowledge about the invariant properties of radio signal data for certain transformations. In realistic scenarios, the data used for model training are obtained from a limited number of scenarios, while the target application of the model may exist in different conditions, such as fluctuating channel environments. Therefore, the amount of data has a decisive impact on the performance of the model. On the other hand, deep learning models usually contain huge amounts of parameters in order to enhance their representational capabilities. If there is a mismatch between the amount of data and the size of the model, the model is prone to overfitting. Data augmentation, on the other hand, can expand the input space not covered by the data, thus implicitly increasing the amount of data and extending the diversity of the data, preventing the model from overfitting the training data, and improving the generalization ability of the deep learning model on the test data.
However, the most critical issue facing practical applications is how to design data augmentation methods for the model so as to obtain greater improvements in the model's performance. This process relies on domain knowledge. Modulated signal data are a type of time series data but, unlike electrocardiogram signal data in the medical field [42], they are not used to distinguish an event in a regular signal sequence; instead, the task relies on a pattern of signal sample points over a period of time, which is difficult to detect intuitively in the absence of expert knowledge. Therefore, when designing augmentation methods for modulated signals, we must not only focus on the commonalities with other time-series data but also take into account the unique characteristics of modulated signals. In particular, a modulated signal sample consists of I and Q components, which are closely related at each point in time, as is evident from the AP representation of the signal, and therefore it may be difficult to obtain useful synthetic data by transforming the I or Q components alone.
Based on this principle, we propose four new label-preserving transformations and one combination transformation for modulated signals. Three augmentation methods were proposed in existing work [18], i.e., jitter, rotation, and flip, and five are proposed in this paper, i.e., channel shuffle, inversion, split and permutation, scaling, and flip and channel shuffle. As shown in Figure 1, we randomly selected CPFSK and QPSK samples from the dataset as examples to demonstrate the effect of the eight augmentation transformations. A brief description of each transformation is given below.
The Jitter (Jit) method augments the original samples with noise. Deep neural networks typically overfit when they learn high-frequency features that may or may not be useful. Gaussian noise with a zero mean contains components at almost every frequency, effectively distorting high-frequency features. This also means that lower-frequency components are distorted. Model learning can thus be improved by adding a moderate amount of noise. Unlike previous research, in which Gaussian noise is added with a fixed standard deviation, our augmentation method is sample dependent. In Jit augmentation, the standard deviation $\sigma$ of the Gaussian noise is set to 5% of that of each sample, and the augmented sample $\tilde{x}$ is denoted as follows:
$$\tilde{x} = x + n, \quad n \sim \mathcal{N}(0, \sigma^2).$$
Rotation (Rot) is currently considered to be the most efficient transformation for AMR tasks [18]. Rot rotates the constellation diagram of the signal clockwise by $\theta$, $\theta \in \{0, \pi/2, \pi, 3\pi/2\}$. In terms of the constellation diagram, the rotation transformation is thus spatially invariant. We can then obtain an augmented signal sample $(\tilde{A}, \tilde{\varphi})$ [18].
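The jitter and rotation transformations can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: samples are assumed to be (2, L) IQ arrays, "5% of each sample" is read as 5% of the sample's standard deviation, and the function names, the `ratio` parameter, and the toy signal are hypothetical.

```python
import numpy as np

def jitter(x, rng, ratio=0.05):
    """Add zero-mean Gaussian noise whose std is `ratio` times the sample's std.

    Assumption: the 5% noise level is taken relative to the standard
    deviation of the individual sample.
    """
    sigma = ratio * x.std()
    return x + rng.normal(0.0, sigma, size=x.shape)

def rotate(x, k):
    """Rotate the constellation clockwise by theta = k * pi / 2, k in {0, 1, 2, 3}."""
    theta = k * np.pi / 2
    c, s = np.cos(theta), np.sin(theta)
    clockwise = np.array([[c, s],
                          [-s, c]])  # clockwise rotation matrix
    return clockwise @ x

# Toy sample (assumed for illustration): two periods of a complex exponential.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 128)
x = np.stack([np.cos(t), np.sin(t)])

x_jit = jitter(x, rng)
x_rot = rotate(x, k=1)
```

Rotation leaves the amplitude of every point unchanged and only shifts its phase, which is why the label is preserved.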
Flip inverts the sign of the IQ components [18], so the transformation exists in four cases: the I component is converted to a negative value, the Q component is converted to a negative value, both the I and Q components are converted to negative values, and both the I and Q components are held constant. This can be expressed mathematically as follows:
$$\tilde{x} = \begin{bmatrix} \alpha I \\ \beta Q \end{bmatrix},$$
where $\alpha, \beta \in \{-1, 1\}$.
Channel shuffle (CS) swaps the channels of the IQ array. This transformation does not change the correspondence between the I and Q components at each time point, and the values do not change; only the channels are swapped. Swapping the two channels does not change the labels of the samples, and this transformation reduces the dependence of the model on the relationship between the locations of the IQ components. The augmented data are represented as follows:
$$\tilde{x} = \begin{bmatrix} Q \\ I \end{bmatrix}.$$
Inversion (Inv) reverses the time order of the sequence. This transformation exploits the fact that the AMR task requires discovering patterns within the IQ signal that do not change when the sequence is inverted. Inverting the sequence also reduces the model's reliance on the order of preceding and following points and shifts its focus to the signal's global properties.
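The flip, channel shuffle, and inversion transformations are simple array operations, as the following minimal NumPy sketch shows. The function names and the toy sample are illustrative assumptions; samples are assumed to be (2, L) arrays with I in row 0 and Q in row 1.

```python
import numpy as np

def flip(x, alpha, beta):
    """Multiply the I row by alpha and the Q row by beta, with alpha, beta in {-1, 1}."""
    return np.stack([alpha * x[0], beta * x[1]])

def channel_shuffle(x):
    """Swap the I and Q rows; the pairing at each time point is kept."""
    return x[::-1].copy()

def inversion(x):
    """Reverse the time order of both channels."""
    return x[:, ::-1].copy()

# Toy sample (assumed for illustration) with L = 3.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

x_flip = flip(x, alpha=-1, beta=1)
x_cs = channel_shuffle(x)
x_inv = inversion(x)
```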
Split and permutation (SP) divides the IQ sample into S segments, which are then randomly permuted, with S ranging from 1 to 8. The greater the value of S, the more the intrinsic pattern of the sample in the time series is disrupted, reducing the model's reliance on useless high-frequency features and shifting the model's focus to local period features. However, S should not be too large because, for higher-order modulated signals that require more samples to form the intrinsic pattern, such as QAM64, slicing the signal too thin will prevent the model from effectively capturing the full modulation pattern.
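A minimal NumPy sketch of split and permutation is given below. It assumes that the segments are (nearly) equal in length and that the split is applied along the time axis to both channels jointly, so that the I/Q pairing at each time point is preserved; the function name and the toy sample are hypothetical.

```python
import numpy as np

def split_and_permute(x, num_segments, rng):
    """Split a (2, L) sample into segments along time and shuffle their order."""
    segments = np.array_split(x, num_segments, axis=1)  # nearly equal-length pieces
    order = rng.permutation(num_segments)
    return np.concatenate([segments[k] for k in order], axis=1)

# Toy sample (assumed for illustration): Q is I shifted by 100, so the pairing is easy to check.
rng = np.random.default_rng(0)
x = np.stack([np.arange(16.0), np.arange(16.0) + 100.0])

num_segments = int(rng.integers(1, 9))  # S drawn from {1, ..., 8}
x_sp = split_and_permute(x, num_segments, rng)
```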
The scaling (Sca) transformation scales the IQ components by a random number with a mean of 1 and a variance equal to the sample variance. The scaled sample is as follows:
$$\tilde{x} = \lambda x,$$
where $\lambda$ denotes the random scaling factor.
The flip and channel shuffle (FCS) transformation is a combination of the Flip and CS transformations, which means it can take eight different forms (four sign combinations for each of the two channel orders).
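The scaling and FCS transformations can be sketched as follows. This is an illustration under assumptions: the scaling factor is drawn from a Gaussian distribution (the distribution is not specified in the text), one factor is shared by both channels, and the function names and the toy sample are hypothetical.

```python
import numpy as np

def scaling(x, rng):
    """Scale the whole sample by one random factor with mean 1.

    Assumption: the factor is Gaussian with variance equal to the sample variance.
    """
    factor = rng.normal(1.0, x.std())
    return factor * x

def flip_and_channel_shuffle(x, alpha, beta, swap):
    """Apply Flip with signs (alpha, beta), then optionally swap the two channels."""
    flipped = np.stack([alpha * x[0], beta * x[1]])
    return flipped[::-1].copy() if swap else flipped

# Toy sample (assumed for illustration).
rng = np.random.default_rng(0)
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 7.0]])

x_sca = scaling(x, rng)

# Enumerate all FCS variants: 4 sign combinations x 2 channel orders.
fcs_forms = [flip_and_channel_shuffle(x, a, b, s)
             for a in (1, -1) for b in (1, -1) for s in (False, True)]
```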

SigAugment
The goal of data augmentation is to cover, as far as possible, situations that are not covered by the original input data through efficient transformations that may occur in test situations. It is not feasible to simply stack the data generated by all data augmentation methods together to achieve an increase in data volume. Since each data augmentation method performs differently in different models, stacking the data may compromise the performance of the model. One possible approach is to use an automatic data augmentation strategy [38]. AutoAugment is a method for learning augmentation strategies from data that designs a search space in which each strategy consists of many sub-strategies and uses a search algorithm to find the best strategy that allows the model to produce the highest validation performance on the target dataset. However, the search process of AutoAugment is expensive, and therefore it usually finds optimized hyperparameters through proxy tasks on small datasets where it is doubtful whether these hyperparameters are optimal on the target dataset. RandAugment [39] simplifies the space for augmented policy search and allows the search for hyperparameters to be carried out directly on the target dataset.
Therefore, we propose an efficient automatic data augmentation method for radio signal data named SigAugment. The execution of SigAugment during the model training process is shown in Figure 2. For a batch of IQ samples, SigAugment selects the data augmentation sequence according to one of two alternative strategies: (1) selecting a constant number of transformations from the set of transformations and (2) selecting a random number of transformations from the set of transformations. Depending on the strategy, we name them SigAugment-C and SigAugment-R, respectively. The proposed SigAugment approach consists of two steps:
1. The first step is to define a set of data transformations. In this paper, these transformations include the following: jitter, rotation, flip, channel shuffle, inversion, split and permutation, scaling, and flip and channel shuffle. This set is extensible.
2. The second step is to select either the SigAugment-R or SigAugment-C method to obtain the transformation sequence. If SigAugment-R is used, a transformation sequence containing a random number of transformations is obtained for each sample in each training batch. If SigAugment-C is used, a transformation sequence containing a constant number of transformations is obtained for each sample in each training batch.
With just one line of code, the method can be used as a plug-and-play component for training any deep learning-based AMR model. SigAugment is an online data augmentation method that provides two alternative transform selection methods, SigAugment-C and SigAugment-R, both of which use a fixed transform magnitude. SigAugment-C inherits from RandAugment the use of a fixed number of transformations, while SigAugment-R uses a random number of transformations. Suppose, for example, that the augmentation transformation set contains T = 8 methods and SigAugment-C employs N = 4 transformations. If the effect of the order of augmentation transformations is ignored, the number of possible augmentation combinations for SigAugment-C is $C_T^N$, while the number of possible SigAugment-R combinations is $\sum_{i=0}^{T} C_T^i = 2^T$. In SigAugment, we use the eight data augmentation methods described in the previous section to construct the transformation pool. SigAugment-R randomly generates the N transformations used for augmentation. The value of N is generated randomly at each epoch, and N is the maximum number of transformations that can be chosen, N ∈ [0, T]. Therefore, the number of transformations generated by SigAugment-R may not be the same for each epoch.
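The two selection strategies and the combination counts can be illustrated with a short sketch. It covers only the selection step, operates on transformation names rather than actual signal transforms, and assumes that transformations are drawn without replacement and that N is drawn uniformly from {0, ..., T} for SigAugment-R; the function name `select_sequence` is hypothetical and does not come from the original work.

```python
import math
import random

# Names of the eight transformations in the pool (T = 8).
TRANSFORM_POOL = [
    "jitter", "rotation", "flip", "channel shuffle", "inversion",
    "split and permutation", "scaling", "flip and channel shuffle",
]

def select_sequence(pool, rng, mode, n=4):
    """Pick the transformations to apply.

    mode "C": a constant number n of transformations (SigAugment-C).
    mode "R": a random number of transformations in [0, T] (SigAugment-R).
    Assumption: transformations are sampled without replacement.
    """
    if mode == "R":
        n = rng.randint(0, len(pool))  # both endpoints are inclusive
    return rng.sample(pool, n)

rng = random.Random(0)
seq_c = select_sequence(TRANSFORM_POOL, rng, mode="C", n=4)
seq_r = select_sequence(TRANSFORM_POOL, rng, mode="R")

# Number of unordered combinations for T = 8.
t = len(TRANSFORM_POOL)
combos_c = math.comb(t, 4)                             # C(8, 4)
combos_r = sum(math.comb(t, i) for i in range(t + 1))  # sum over i = 0..8
```

For T = 8 and N = 4, SigAugment-C can produce 70 unordered combinations, whereas SigAugment-R can produce 2^8 = 256, including the empty sequence that leaves the sample unchanged.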

Datasets and Empirical Settings
The RADIOML 2016.10A dataset (2016A) [3] was generated in GNU Radio using the GNU Radio channel model. The 2016A dataset contains 220,000 signal samples in 11 categories. Considering that the AM-SSB signal in the dataset may be channel noise, we removed it from the dataset. In addition, existing studies have shown that the average accuracy of existing models is below 20% when the SNR is below −8 dB [43]. Therefore, we chose an SNR range of −8 dB to 18 dB for the data, and the censored dataset contains 140,000 signal samples. To control the degree of data imbalance, we use the imbalance factor β from [44] to describe the severity of the class imbalance problem. It is defined as the ratio of the number of training samples in the most frequent class to that in the least frequent class, i.e., $\beta = N_{\max}/N_{\min}$. The imbalance factors we use in experiments are 10, 20, and 50. Table 1 shows the composition of each dataset. Each dataset consists of a training set, a validation set, and a testing set. The training and validation sets are used for training sessions. The 2016A and 2016A-1 datasets are class-balanced datasets. The 2016A dataset is the full version of the censored dataset, i.e., without the AM-SSB signals and without signal samples below −8 dB. This dataset is used for comparative analysis with existing methods. We follow the mainstream approach in the existing literature by dividing the dataset into training, validation, and test sets in a 6:2:2 ratio. The second dataset is the baseline dataset under class-balanced conditions, denoted 2016A-1, which contains the same test dataset as the five imbalanced datasets and the remaining 70,000 samples as training and validation sets. The ratio of training to validation sets on the 2016A-1 and imbalanced datasets is 3:1. The 2016A-10, 2016A-10a, 2016A-10b, 2016A-20, and 2016A-50 datasets are class-imbalanced datasets with an imbalance factor β of 10, 10, 10, 20, and 50, respectively.
Table 1. Composition of each dataset.
Dataset      Number of samples per class (10 classes)                      Total
2016A-1      7000  7000  7000  7000  7000  7000  7000  7000  7000  7000    70,000
2016A-10     7000  6300  5600  4900  4200  3500  2800  2100  1400   700    38,500
2016A-10a    3500  3150  2800  2450  2100  1750  1400  1050   700   350    19,250
2016A-10b    1400  1260  1120   980   840   700   560   420   280   140     7700
2016A-20     7000  2800  2450  2100  1750  1400  1120   840   560   350    20,370
2016A-50     7000  1260  1120   980   840   700   560   420   280   140    13,300
The training process is terminated when the validation loss does not decrease after 25 epochs or when the number of training epochs reaches 300. We used the Adam [45] optimizer and a warm-up learning rate with an initial learning rate of 0.001 to minimize the cross-entropy loss. The model was built with TensorFlow [46] and trained with two GPU cards.
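As a quick consistency check, the imbalance factor β = N_max/N_min and the totals can be recomputed from the per-class counts listed in Table 1. The sketch below uses only those counts; the dictionary layout and the function name `imbalance_factor` are illustrative choices.

```python
# Per-class sample counts of the imbalanced datasets, as listed in Table 1.
TABLE_1 = {
    "2016A-10":  [7000, 6300, 5600, 4900, 4200, 3500, 2800, 2100, 1400, 700],
    "2016A-10a": [3500, 3150, 2800, 2450, 2100, 1750, 1400, 1050, 700, 350],
    "2016A-10b": [1400, 1260, 1120, 980, 840, 700, 560, 420, 280, 140],
    "2016A-20":  [7000, 2800, 2450, 2100, 1750, 1400, 1120, 840, 560, 350],
    "2016A-50":  [7000, 1260, 1120, 980, 840, 700, 560, 420, 280, 140],
}

def imbalance_factor(counts):
    """Ratio of the most frequent class size to the least frequent class size."""
    return max(counts) / min(counts)

factors = {name: imbalance_factor(counts) for name, counts in TABLE_1.items()}
totals = {name: sum(counts) for name, counts in TABLE_1.items()}
```

The three datasets with β = 10 share the same class proportions and differ only in the total amount of data, which separates the effect of data volume from that of imbalance.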
We use the average classification accuracy of all classes across all SNRs to evaluate the performance of the algorithms in our tests, as the distribution of classes in the test dataset is balanced. We report the average results of three independent experiments. The standard deviation is also used to measure the stability of the model. Assuming $C$ is the number of classes and $A_{ij}$ is the classification accuracy of the $i$th class in the $j$th experiment on a single model, the comparison accuracy is expressed as follows:
$$\mathrm{Acc} = \frac{1}{3C} \sum_{j=1}^{3} \sum_{i=1}^{C} A_{ij}.$$
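The evaluation metric can be sketched in a few lines. The accuracy values below are made-up placeholders for illustration only, not results from the paper, and the use of the population standard deviation across the three runs is an assumption, since the text does not state which estimator is used.

```python
import statistics

# Hypothetical per-class accuracies A[j][i]: 3 experiments (rows) x C = 2 classes (columns).
A = [
    [0.9, 0.8],
    [0.7, 0.6],
    [0.8, 0.7],
]

# Mean accuracy over classes within each experiment.
per_run = [statistics.mean(run) for run in A]

# Comparison accuracy: average over all classes and all experiments.
mean_acc = statistics.mean(per_run)

# Stability: spread of the per-experiment accuracies (population std, an assumption).
std_acc = statistics.pstdev(per_run)
```

Because every experiment covers the same number of classes, averaging the per-experiment means is equivalent to averaging all entries of A at once.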

Deep Learning Models for Evaluating Data Augmentation Methods
Three representative SOTA models are selected for the evaluation of the proposed data augmentation algorithm. These three models are DAE [27], SE-MSFN [29], and LSTM2 [28]. The reasons for selecting them for the algorithm evaluation are as follows: First, the selected models represent three different types of network structures: DAE, LSTM2, and SE-MSFN are auto-encoder, RNN, and CNN structures, respectively; second, they are superior in performance; DAE is a lightweight neural network for computationally constrained platforms, while LSTM2 and SE-MSFN, as representatives of high-accuracy models, achieved SOTA performance on several modulation recognition benchmark datasets. A detailed description of the selected models is given below.
The Denoising Auto-Encoder (DAE) [27] is a lightweight network structure that is based on the denoising auto-encoder and RNNs. The DAE's inputs are L2-normalized amplitudes, and the classifier's encoder is a two-layer LSTM, while the decoder is a shared, fully connected layer. Since the network output includes both the decoder and the classification output, the loss function includes both the reconstruction and classification losses.
LSTM2 [28] uses amplitudes and phases (APs) as model inputs, as does DAE, and the backbone is stacked with two layers of LSTM. LSTM2 is a simple structure with reasonable classification performance that is widely used in existing studies as a benchmark model. SE-MSFN [29] is a CNN classifier that uses large kernel convolution, multi-scale structure, and a combined attention mechanism to extract features from original IQ samples. It has SOTA performance on several benchmark datasets and has the advantages of fast convergence and high recognition accuracy. At the same time, its large number of parameters and model complexity make it suitable for use on high-capacity platforms.

Comparison Methods
A detailed, empirical comparison of 85 variants of the synthetic minority oversampling technique (SMOTE) is presented in [47]. Based on the comprehensive ranking of over-sampling techniques in that paper, we use the top four SMOTE variants as representatives of upsampling methods: polynome-fit-SMOTE [9], ProWSyn [10], SMOTE-IPF [11], and Lee [12].
Focal loss was first introduced in RetinaNet to address the imbalance between background classes and other classes in object detection tasks [15]. It is useful for classification when the classes are highly imbalanced: it down-weights well-classified examples and focuses training on hard samples, so that a sample misclassified by the classifier incurs a much higher loss than a well-classified one. The focal loss adds a modulating factor $\alpha_t (1 - p_t)^{\gamma}$ to the cross-entropy loss, with a tunable focusing parameter $\gamma$ and a balancing weight $\alpha_t$; $p_t$ is the class probability estimated by the model. Focal loss is defined as follows:
$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
As we have several imbalanced datasets in our experiments, we used the recommended parameter settings for focal loss, i.e., $\alpha_t = 0.25$ and $\gamma = 2.0$. We implement the focal loss with the TensorFlow Addons interface tensorflow_addons.losses.SigmoidFocalCrossEntropy.
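To make the role of the modulating factor concrete, the following is a minimal NumPy sketch of a sigmoid-style focal loss. It is not the TensorFlow Addons implementation used in the experiments; the function name `sigmoid_focal_loss` and the clipping constant `eps` are assumptions made for illustration, and the weighting of negatives by (1 − α) follows the usual α-balanced convention.

```python
import numpy as np

def sigmoid_focal_loss(y_true, p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Element-wise focal loss for probabilities p and binary/one-hot targets."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() stays finite (eps is an illustrative choice).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    # p_t: probability the model assigns to the true outcome of each element.
    p_t = y_true * p + (1.0 - y_true) * (1.0 - p)
    # alpha_t: alpha for positive targets, (1 - alpha) for negative targets.
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    # Cross-entropy -log(p_t), scaled by the modulating factor.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A well-classified positive (p = 0.9) versus a misclassified one (p = 0.1).
easy = sigmoid_focal_loss([1.0], [0.9])
hard = sigmoid_focal_loss([1.0], [0.1])
print(easy, hard)
```

With γ = 0 the modulating factor disappears and the expression reduces to an α-weighted cross-entropy, which is a convenient sanity check for any implementation.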
Data augmentation methods. The data augmentation methods used for comparison include individual data augmentation methods such as rotation, flip, and jitter [18], as well as the five augmentation methods proposed in this paper.
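As an illustration of what such individual transformations look like on raw IQ data, here is a minimal NumPy sketch of rotation, flip, and jitter. The exact definitions and parameter ranges in [18] are not reproduced in this section, so the forms below (rotation by multiples of 90 degrees, sign flip of one channel, additive Gaussian noise), the function names, and the `(2, N)` array layout are assumptions.

```python
import numpy as np

def rotate(iq, k):
    """Rotate the constellation by k * 90 degrees; iq has shape (2, N) = (I, Q)."""
    z = (iq[0] + 1j * iq[1]) * np.exp(1j * k * np.pi / 2)
    return np.stack([z.real, z.imag])

def flip(iq, axis):
    """Negate the I channel (axis=0) or the Q channel (axis=1)."""
    out = iq.copy()
    out[axis] = -out[axis]
    return out

def jitter(iq, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return iq + rng.normal(0.0, sigma, size=iq.shape)

rng = np.random.default_rng(0)
sample = rng.normal(size=(2, 128))       # stand-in for one IQ frame
augmented = jitter(flip(rotate(sample, 1), axis=0), sigma=0.01, rng=rng)
print(augmented.shape)
```

All three operations keep the frame shape and are intended to leave the modulation label unchanged, which is what allows them to be applied on the fly during training.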

Baseline Experiments
We set up three sets of baseline experiments. The goals of the baseline experiments include determining the best hyperparameters, evaluating the performance of the baseline models, and assessing the effect of the datasets on the baseline models.
In the first set of experiments, we validated the accuracy and stability of the selected evaluation models DAE [27], SE-MSFN [29], and LSTM2 [28] for various batch sizes, since hyperparameter settings have a significant impact on the training of deep learning models [48,49], and in our experiments the batch size in particular strongly affected model training. The test results on the 2016A dataset are shown in Figure 3, where the vertical error bars represent the standard deviation over three experiments; the shorter the error bars, the better the model stability. On the balanced dataset, we can see that (1) the recognition accuracy of the three models decreases as the batch size increases; (2) the recognition accuracy of SE-MSFN is higher than that of LSTM2 and DAE; and (3) SE-MSFN and LSTM2 are more stable than the lightweight DAE model. We set the batch size of all subsequent experiments to 128 to ensure a fair comparison and to balance performance, training efficiency, and stability.
In the second set of experiments, we analyze and compare the performance of the baseline models on the datasets 2016A, 2016A-1, 2016A-10, 2016A-20, and 2016A-50. As can be seen from Figure 4, the performance of the baseline models tends to decrease as the degree of data imbalance increases and the amount of data decreases. The best-performing model on each dataset is SE-MSFN, which has not only the highest recognition accuracy throughout but also good stability. SE-MSFN and LSTM2 perform similarly on both the well-balanced and the slightly imbalanced datasets. At imbalance factors of 20 and 50, however, the accuracy of LSTM2 decreases sharply: averaged over the three experiments, the recognition accuracy of SE-MSFN exceeds that of LSTM2 by 15.51% and 14.79%, and LSTM2 is 11.70% and 7.15% lower than the lightweight DAE model, respectively. Figure 5 shows the training process of LSTM2.
The model reached high training accuracy on both the 2016A-1 and 2016A-50 datasets, and the validation loss started to increase at around 40 epochs, while the validation accuracy leveled off. We used an early stopping strategy that monitors the validation loss to avoid overfitting. However, the validation accuracy of the model on 2016A-1 was significantly higher than on 2016A-50. As training progressed, the gap between the model's training and validation accuracy grew significantly larger on 2016A-50 than on 2016A-1, indicating that LSTM2 overfitted more severely on the small-scale imbalanced dataset.
In the third set of experiments, we investigate the per-class performance of the two models with the best average recognition performance (SE-MSFN and LSTM2) on a class-balanced dataset. Figure 6 shows the results. The average recognition accuracy of both models for WBFM signals, which are hard samples, is less than 40%. This is because the WBFM signal is generated from a real audio stream that contains silent periods [28], which makes effective identification difficult for all existing methods. In addition, as can be seen from Figure A1, the models easily confuse QAM16 and QAM64 because they are members of the same modulation family, and QAM16 is a subset of QAM64. When the SNR is below 0 dB, the 8PSK and QPSK signals have a lower recognition accuracy, resulting in an average recognition accuracy of less than 80%. According to our rules for constructing the class-imbalanced data, the signals of the four modulation types QAM16, QAM64, QPSK, and WBFM belong to the minority class, i.e., the tail-end classes of the long-tailed data distribution. As a result, it will be more difficult to identify these four signals in the class-imbalanced datasets.
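The early stopping strategy mentioned above, which monitors the validation loss, can be sketched independently of any training framework. This is only an illustration: the function name `early_stopping_epoch` and the `patience` value are assumptions, since the section does not state the settings used in the experiments.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch index at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the validation loss last improved
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Illustrative (made-up) validation losses that start rising after the third epoch.
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.8, 0.9], patience=3)
print(stop_at)
```

In practice the weights from the epoch with the lowest validation loss are usually restored after stopping, which this sketch does not model.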

Results of Data Augmentation Methods on the Balanced Datasets
We compared the performance of various data augmentation methods on two class-balanced datasets. These methods include Jit, Rot, Flip, CS, Inv, SP, Sca, FCS, and SigAugment. Experimental results are shown in Tables 3 and 4.
Table 3. The average recognition accuracy (%) of the baseline models on the 2016A dataset using various data augmentation methods and the gain (%) when compared to no data augmentation. The top-1 average accuracy and gain have been highlighted.
On the benchmark dataset 2016A, the best baseline recognition accuracy without augmentation is 62.96%, achieved by SE-MSFN, whereas with the SigAugment proposed in this paper, a SOTA average accuracy of 64.44% is achieved on the LSTM2 model without changing the model structure. The average accuracy of the DAE and SE-MSFN models also improved from 58.45% and 62.96% to 62.63% and 63.94%, respectively. The improvement was greater for the lightweight model than for larger models such as SE-MSFN (4.18% vs. 0.98%). Individual augmentation transforms, for the most part, improve the model's recognition accuracy; however, some have a negative effect on a specific model. For example, training SE-MSFN on the 2016A dataset with Rot and FCS decreased the accuracy by 0.24% and 0.68%, respectively. Individual transformations perform differently across models and datasets. The individual transformations that are most helpful for DAE, SE-MSFN, and LSTM2 on the 2016A dataset are Rot, CS, and FCS, in that order, whereas FCS, SP, and Rot are the most helpful for DAE, SE-MSFN, and LSTM2 on the 2016A-1 dataset. In contrast, our proposed SigAugment method works consistently, and the two SigAugment transformation selection modes differ only slightly across datasets and models. This demonstrates that our proposed SigAugment method can be applied effectively to a variety of datasets and deep neural network models.

On the 2016A-1 dataset, the gains from the three baseline models using SigAugment are 5.73%, 3.98%, and 5.48%, respectively, which are 1.55%, 3.00%, and 3.14% higher than the gains obtained on the 2016A dataset. On the 2016A dataset, which is widely used in existing studies, the best available average recognition accuracy of 64.44% is achieved without changing the model structure, using only SigAugment-C to train LSTM2. At 0 dB, the three experiments achieve accuracies of 91.05%, 91.60%, and 91.85%, respectively, compared to 88.30%, 87.05%, and 87.80% without using any data augmentation method. When used to train SE-MSFN, the two augmentation methods Rot and FCS performed significantly differently on the two datasets. Rot and FCS have a negative impact on SE-MSFN recognition performance on the 2016A dataset, which includes signals with extremely low SNRs (SNR < 8 dB), but they perform well on the 2016A-1 dataset. We suspect that applying the Rot and FCS transforms to signals with very low SNRs interferes with model training, which results in poor performance.

Results on Class-Imbalanced Datasets
On the class-imbalanced datasets, the comparison is divided into two parts: the comparison of the proposed methods with the existing representative class rebalancing methods on the class-imbalanced datasets and the comparison of the representative methods in terms of the recognition accuracy of each modulation class.
On the three class-imbalanced datasets, Table 5 compares the performance of the proposed methods to the representative class rebalancing methods and individual transformations. Based on the experimental results in the table, the following discussion and analysis will expand from top to bottom and from left to right.
First, we investigate the performance of each model on each dataset. As the degree of imbalance in the dataset increases, the performance of the baseline models decreases. SE-MSFN outperforms DAE and LSTM2 in terms of stability. The performance gap between the lightweight model DAE and the other two models is larger than on the balanced dataset.
In class rebalancing methods, the focal loss is only useful for recognition when training a specific model on a specific dataset, and it is usually comparable to the baseline model, i.e., when using the cross-entropy loss function. The four up-sampling methods performed very differently. When training LSTM2 on the 2016A-20 dataset, the most helpful methods are Polynom_fit_SMOTE_poly and SMOTE_IPF, with accuracy gains of 14.12% and 14.80%, respectively. In most other cases, the gains in recognition accuracy are negligible, and the models frequently struggle to converge when training LSTM2 and DAE. On the three datasets in that order, we calculate the mean accuracy values for each of the four upsampling methods and then compare them to the baseline method, with gains of 1.74%, 2.58%, −34.43%, −1.28%, 1.12%, 3.63%, −25.66%, −5.32%, and 17.55%. When training DAE and LSTM2, we can see that the up-sampling methods always result in a significant decrease in accuracy on three datasets. The overall improvement in performance is minor and highly variable. Furthermore, the up-sampling methods generate data offline, which can significantly increase the number of training samples and thus reduce model training efficiency.
Our proposed online data augmentation strategies do not explicitly increase the number of samples, but rather transform the original training data at each epoch of training, only increasing the computational cost associated with the transformations. Similarly, we compute the average accuracy of SigAugment-C and SigAugment-R across models and datasets, with gains of 3.70%, 9.78%, 13.24%, 6.65%, 14.01%, 31.46%, 0.62%, 12.60%, and 29.92% on the three datasets in that order. With an average gain of 24.88% across the three datasets, SigAugment had the greatest improvement for the LSTM2 model. Furthermore, the greater the degree of imbalance in the dataset and the smaller the data size, the greater the gain from data augmentation.
This is consistent with the expectation that smaller datasets require more regularization. Furthermore, when we compare the performance of SigAugment on the 2016A-50 dataset to the baseline model on the 2016A dataset, we find that even though the number of samples in the training set is reduced by a factor of 5 while the degree of sample imbalance increases by a factor of 5, on the same test set, our data augmentation strategies achieve recognition accuracies comparable to the baseline model (60.16% vs. 66.05%, 70.93% vs. 71.08%, and 72.61% vs. 70.06%). Figure 7 shows the loss and accuracy of LSTM2 training on the 2016A-50 dataset using the proposed SigAugment-R method. By comparing with Figure 5b, it is clear that the gap between the training loss and validation loss of LSTM2 is quite small by using SigAugment-R. This shows that SigAugment-R is effective in preventing overfitting. In addition, we report the results of our experiments on 2016A-10a and 2016A-10b in the Appendix, and the results are shown in Table A1. We can draw similar conclusions. Comparing the results of the model on 2016A-10b and 2016A-50, we find that the sample size of the head category has a greater impact on the resampling approaches, as it can lead to more severe overfitting problems in the model. In contrast, it has less impact on the individual data augmentation transformations and almost no impact on our proposed automated data augmentation method.
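The difference between offline up-sampling and online augmentation discussed above can be illustrated with a small batch generator. This is a hedged sketch, not the SigAugment selection policy: the function name `online_batches` and the two placeholder transforms are assumptions, and the point is only that every epoch yields the same number of samples while each sample is re-transformed on the fly.

```python
import numpy as np

def online_batches(x, y, transforms, batch_size, rng):
    """Yield one epoch of batches; each sample gets a randomly chosen transform."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        # Transform each sample independently; the dataset itself is never enlarged.
        xb = np.stack([transforms[rng.integers(len(transforms))](x[i]) for i in idx])
        yield xb, y[idx]

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 2, 128))            # ten stand-in IQ frames
y = np.arange(10) % 2                        # stand-in labels
placeholder_transforms = [lambda s: s, lambda s: -s]  # hypothetical transforms

for epoch in range(2):
    seen = sum(len(xb) for xb, _ in online_batches(x, y, placeholder_transforms, 4, rng))
    print(epoch, seen)
```

An offline method such as SMOTE would instead append synthetic samples to `x` and `y` before training, so the per-epoch cost grows with the number of generated samples.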
The individual data augmentation methods are scored according to the following rule: for each model and dataset, we rank all transformations from highest to lowest gain, with the top three receiving one point each and the rest receiving none. With three models and three datasets, the maximum total score is nine. Figure 8 shows the scoring results. The top three scoring transforms are Rot, SP, and FCS, with SP and FCS being new to this paper; the Sca and Jit transforms often fail to outperform other augmentation methods because they add some perturbation to the original signals.
Based on the average recognition accuracy of the models, we conducted a comparative analysis of the models and methods. Figure 9 shows the recognition accuracy of various modulation types at various SNRs on the 2016A-50 dataset. We analyze the model with the highest median average accuracy across the three experiments. The data augmentation methods proposed in this paper can effectively improve the performance of the model on class-imbalanced datasets. In particular, for the minority classes (PAM4, QAM16, QAM64, QPSK, and WBFM), training LSTM2 with SigAugment-R improved the average recognition accuracy on these five classes from 3.23% to 51.86%. When the SNR is above 0 dB, LSTM2 with SigAugment-R increased the average recognition accuracy of all classes from 49.94% to 83.38%.
Figure 10 shows the results for another model, SE-MSFN. Comparing with Figure 9, we see that the WBFM signal is a hard sample for both models. LSTM2 identifies QAM16 better than SE-MSFN, while SE-MSFN has better results for QAM64. Both models show significant recognition gains for minority class samples and high SNR cases after using our data augmentation method. These results show that the proposed SigAugment is able to effectively mitigate the effects of class imbalance and the low-data regime.
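The scoring rule for individual transformations described above can be written down in a few lines. The sketch below is only illustrative: the function name `score_transforms` is an assumption, the gain values are invented placeholders rather than results from Table 5, and ties (which the text does not address) are broken by input order.

```python
def score_transforms(gains, top_k=3):
    """gains: {(model, dataset): {transform: gain}} -> {transform: total points}."""
    scores = {}
    for per_transform in gains.values():
        for name in per_transform:
            scores.setdefault(name, 0)
        # Rank transformations from highest to lowest gain for this model/dataset pair.
        ranked = sorted(per_transform, key=per_transform.get, reverse=True)
        for name in ranked[:top_k]:
            scores[name] += 1  # one point for each top-k placement
    return scores

# Hypothetical gains (%) for two model/dataset pairs, for illustration only.
example_gains = {
    ("model_a", "dataset_1"): {"Rot": 3.0, "SP": 2.0, "FCS": 1.5, "Jit": 0.1},
    ("model_b", "dataset_1"): {"Rot": 1.0, "SP": 2.5, "FCS": -0.3, "Jit": 0.4},
}
example_scores = score_transforms(example_gains)
print(example_scores)
```

With three models and three datasets there are nine such pairs, so a transformation that is always in the top three would reach the maximum score of nine.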

Ablation Study
We use the RandAugment [39] ablation experiment methodology to validate the contribution of each transformation to SigAugment. SE-MSFN models were trained on the 2016A-10, 2016A-10a, 2016A-10b, 2016A-20, and 2016A-50 datasets with SigAugment-R for the ablation study. Table 6 shows the results as the average difference in test accuracy when a specific transformation is added to the transformation set. We can see that almost all of the transformations improve test accuracy, with the SP transformation being the most helpful for SigAugment-R. However, Table 5 shows that the Rot transformation alone achieves top-two average recognition accuracy gains on the three datasets, whereas the gains from the Rot transform within SigAugment-R (0.80%, 1.22%, 0.86%, 0.43%, and 2.98%) are relatively small compared to the other transforms, which does not match our intuitive expectations. Several experiments, however, confirmed this. We believe that this is an open question and that the contribution of individual transforms to combined augmentation merits a more in-depth investigation that is beyond the scope of this paper.
Furthermore, when Tables 5 and 7 are combined, we can see that the joint augmentation method outperforms the use of individual data augmentation methods.
In a real communication environment, the low-data category of modulation is not constant. We construct datasets with different majority and minority classes in two ways.
(1) By reversing the default list of modulation classes, i.e., changing the order from 8PSK, AM-DSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM to WBFM, QPSK, QAM64, QAM16, PAM4, GFSK, CPFSK, BPSK, 8PSK, and AM-DSB.
(2) By randomly shuffling the list while fixing the random seeds during each training session so that all experiments are repeatable; the resulting order of the modulation classes is AM-DSB, CPFSK, WBFM, BPSK, 8PSK, QAM16, PAM4, GFSK, QPSK, and QAM64.
This gives us six new imbalanced datasets. Detailed information can be found in Tables A2 and A3 in Appendix A.3. Based on the previous experiments, the experimental protocol for these six datasets is to evaluate one representative method from each of the rebalancing and data augmentation families, namely Polynom_fit_SMOTE_poly and SigAugment-R. The experimental results are shown in Tables A4 and A5. We obtained results similar to those of the previous experiments. However, we also note that for the DAE model, our proposed SigAugment method is not always effective. This may be related to the encoder structure of a network such as DAE: overly complex data augmentation is difficult for the DAE model to encode and decode effectively.

Discussion
In this paper, we present an efficient data augmentation method for class imbalance modulation recognition in the low-data regime. In particular, we present four new and one combination of label-preserving transformations as well as an efficient automatic data augmentation strategy that does not require a separate search for data augmentation policies. To validate their effectiveness, the proposed methods are thoroughly evaluated and compared using multiple types of datasets and three representative SOTA deep neural network models. Particularly in low-data, imbalanced-class regimes, the proposed method can significantly improve the recognition performance of deep neural network models.
There are still some intriguing unanswered questions. Despite the fact that we provide two methods for obtaining the final augmented sequence, there is no guarantee that the combination of augmentations obtained by these two methods is optimal. Is it then possible to use a multi-stage selection approach, such as selecting the more helpful transformations first, such as Rot and SP, and then randomly selecting other transformations with smaller contributions? Furthermore, questions such as how the individual augmentation methods in the SigAugment transform set interact with one another and whether some of these combinations cancel each other out need to be thoroughly investigated.
All of the transformations used in SigAugment are label-preserving. Label interpolation generation methods, such as mixup [20], have been shown to be more effective than label-preserving methods in a variety of tasks in the vision domain. So, can label interpolation methods be extended to raw IQ or AP data in the communications signal domain? This could be the next area to explore. Based on our preliminary results, direct interpolation of samples and labels for mixing does not work; mixing samples from the same category while preserving the labels, however, is worth investigating.
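One possible reading of the same-category mixing idea raised above is sketched below. It is not a method evaluated in this paper: the function name `same_class_mix` is hypothetical, and drawing the mixing weight from a Beta(α, α) distribution is borrowed from mixup as an assumption.

```python
import numpy as np

def same_class_mix(x, y, alpha, rng):
    """Mix each sample with a random partner of the same class; labels are unchanged."""
    x_mixed = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        partners = np.flatnonzero(y == y[i])   # indices sharing the label of sample i
        j = rng.choice(partners)
        lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1]
        x_mixed[i] = lam * x[i] + (1.0 - lam) * x[j]
    return x_mixed, y

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2, 128))               # stand-in IQ frames
y = np.array([0, 0, 0, 1, 1, 1])               # stand-in integer labels
x_mixed, y_mixed = same_class_mix(x, y, alpha=0.2, rng=rng)
print(x_mixed.shape, y_mixed)
```

Because both partners carry the same label, no label interpolation is needed; whether such mixtures remain valid signals for a given modulation type is exactly the open question raised in the text.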
Based on the results in Table 5, we chose the best-performing class rebalancing method for each of the three baseline models on the 2016A-50 dataset and combined these methods and their corresponding models with the SigAugment method for joint training, as shown in Table 8. We can see that this combined approach does not improve performance but rather lowers recognition accuracy. It is an intriguing phenomenon that the two methods used in the combination each improve the model's performance, but the combination fails. In order to find a solution, it may be worthwhile to examine the mechanism of the combination in greater detail in future work. Furthermore, when we compare Figure 8 to Table 6, we see that the three best transformations when applied individually are Rot, SP, and FCS, whereas when the gain of a transformation is evaluated by adding it to the transformation pool of SigAugment, the top three are SP, Inv, and FCS, which is not the expected result. This also highlights the fact that the relationship between the effectiveness of individual data augmentation methods and their combined overall effect is complicated. As a result, the expansion of the SigAugment transformation pool appears to be more open and full of possibilities, and augmentation techniques that do not work as well when applied independently might achieve notable benefits in SigAugment.

Conclusions
In this paper, we address the AMR task for class imbalance in the low-data regime from the perspective of extending the original data distribution. We do not follow the traditional idea of rebalancing the classes when the AMR tasks face both low-data and imbalanced-class regime problems, because this would lead to severe overfitting of the model, and the performance of different approaches varies widely across different datasets and deep neural networks. As a result, we propose a more fundamental data augmentation approach to address the problem of insufficient data and unbalanced classes. To augment the data online, we propose SigAugment, a practical and effective data augmentation strategy. Based on the 2016A dataset, we created four sub-datasets and thoroughly tested the algorithm on three representative SOTA models. The experimental results validate the effectiveness of the proposed method.
SigAugment can be seamlessly integrated into existing data generation pipelines with just one line of code, and the algorithm can be easily extended by researchers. We hope that the proposed method can be validated and applied in more practical scenarios.
Table A1. On three class-imbalanced datasets, 2016A-10, 2016A-10a, and 2016A-10b, the proposed methods were compared to existing representative class rebalancing methods and individual transformations. The accuracy (%) is calculated as the mean of three experiments. The best-performing individual model results are shown in bold, while the best-performing dataset results are shown in bold and red.