1. Introduction
With the rapid development of wireless communications, modulation recognition has become very important in modern wireless communication systems [1, 2]. As the low-altitude economy booms, unmanned aerial vehicle (UAV) air–ground communication is widely applied in civil and industrial scenarios, making automatic modulation recognition (AMR) a core technical support for the safe operation of UAV systems and standardized governance of low-altitude electromagnetic spectra. The accurate recognition of UAV signal modulation is also critical to avoiding inter-UAV electromagnetic interference and safeguarding the order of the low-altitude communication environment [3].
AMR is often applied in electronic warfare and military software-defined radio [4, 5, 6]. This technique can accurately analyze the parameters of enemy communication UAV signals and provide key intelligence support for electronic countermeasures and target positioning. In the field of electromagnetic spectrum management, AMR can identify illegal UAV signals, malicious interference sources, and unauthorized spectrum occupancy behaviors [7, 8]. It provides core technical support for the compliant scheduling, security protection, and efficient utilization of spectrum resources, and has thus become an important component of the electromagnetic spectrum governance system [9, 10]. In UAV air–ground signal processing and identification tasks under complex low-altitude propagation environments with multi-scale fading and LoS/NLoS switching, data quality and algorithm robustness directly affect the performance of AMR [11, 12]. Therefore, improving modulation recognition accuracy has been a key focus of current research.
The core operation mode of conventional modulation recognition relies heavily on manual intervention [
13]. In these manual-dependent methods, the process of identifying UAV signal modulation types centers entirely on operators. This limitation severely restricts the application effectiveness of these methods in complex communication scenarios. The accuracy and reliability of recognition results depend heavily on the expert cognitive level and subjective judgment tendencies. From the practical operation perspective, different operators vary individually in professional knowledge reserves, signal feature interpretation capabilities, and the degree of experience accumulation—these differences further amplify the instability of recognition outcomes. Meanwhile, with the rapid iteration of communication technologies and the exponential growth of modulation types, the traditional manual experience system can hardly achieve comprehensive coverage. More critically, when faced with scenarios where the number of samples is scarce, manual methods lack sufficient data support to extract universal recognition rules and are unable to cope with the variation patterns of signal features across different scenarios, leading to a cliff-like decline in recognition accuracy. This problem is particularly prominent in environments with low signal-to-noise ratio (SNR) and superimposed multiple interferences [
14]. The gap between the expanding technical scope and the limited experience system has directly led to a systematic decline in recognition accuracy.
Deep learning-based modulation recognition has been widely used in recent years. The authors of [15, 16] applied generative adversarial networks (GANs) to data classification and recognition. In modulation identification, GANs can be used to generate diverse modulated signal data, addressing the scarcity of high-quality data while enhancing the generalization ability of the models [16]. The authors of [17] proposed a mixed-domain feature fusion network (MDFNet) for few-shot modulation identification. This network effectively fuses the time-domain and frequency-domain features of signals, reduces noise during signal feature transmission, and enhances the model's ability to capture valid information. The authors of [18] designed a specific emitter identification (SEI) method based on transfer learning. This method freezes some layers of the complex-valued neural network (CVNN) trained in the source domain, generalizes the learned transferable universal features to new modulation signal identification tasks, and uses the maximum mean discrepancy (MMD) metric as regularization for supervised learning to reduce the distribution difference between the source domain and the target domain in the latent space. The authors of [19] introduced a framework that combines multi-domain contrastive learning with reinforcement learning. The multi-domain representation of signals enhances the richness of features, while the integrated architecture of contrastive learning and reinforcement learning can extract deep features for classification. Notably, most deep learning methods focus on extracting the intrinsic features of the signals themselves, such as time-domain and frequency-domain features, while neglecting the systematic mining and modeling of environmental background features. Environmental background features (e.g., the UAV air–ground channel LoS/NLoS state, multi-scale fading parameters such as the Rice factor and Nakagami parameter, the path loss coefficient, multipath propagation delays, and Doppler shift intensities induced by UAV low-altitude flight) directly affect the propagation characteristics and final manifestations of signals [20]. However, existing methods typically treat such environmental features as pure interference, rather than integrating them into model training as interpretable and valuable auxiliary discriminative information. This insufficient background analysis prevents models from distinguishing intrinsic signal features from environment-induced spurious features, leading to weak adaptability in dynamic and complex electromagnetic environments and increased dependence on large-scale labeled samples.
In the field of deep learning-based modulation recognition, researchers have proposed a variety of specific methods to address key issues such as feature extraction and accuracy improvement. The authors of [
21] proposed a method for orthogonal frequency division multiplexing (OFDM) signal modulation pattern recognition that combines high-order cyclic cumulants with neural networks. They used neural networks to learn manually extracted feature representations, thereby improving recognition accuracy and robustness while reducing the complexity of feature engineering. The authors of [
22] presented an AMR method based on the transformer-context broadcasting (TCB) model. The Transformer model was introduced into AMR to extract the global correlation features of signals; unified attention was manually inserted to increase signal density, enabling higher classification accuracy. The authors of [
23] proposed a convolutional neural network (CNN) model that operates on fast Fourier transform window banks (FWBs) to extract useful symbol lengths in OFDM—these symbol lengths serve as indicators for identifying each OFDM-based wireless communication technology. After extracting the useful OFDM symbol lengths, a deep learning (DL)-based automatic modulation classification (AMC) system was proposed. This system combines FWB with in-phase and quadrature-phase signals to classify both OFDM symbol lengths and single-carrier modulation schemes simultaneously [
24]. However, mainstream deep learning models have limited generalization ability in complex channel environments.
To address these issues, we present a new intelligent modulation recognition method based on small-sample augmentation and soft threshold denoising. As shown in Figure 1, the proposed STF-DRSN framework follows a four-stage pipeline: (1) UAV signal dataset collection, (2) model–data dual-driven dataset expansion to alleviate small-sample scarcity, (3) BL-LSTM-based environmental background learning to capture signal context features, and (4) STF-DRSN-based signal feature estimation and modulation recognition. The detailed design of each module is elaborated in the following sections. Our method differs from existing deep learning-based AMC methods in three core aspects: we use a model–data dual-driven strategy for small-sample processing, embed a BL-LSTM module to improve the environmental adaptability, and adopt an STF-DRSN with adaptive soft thresholding to achieve stable, high-precision recognition at low SNR. The contributions of this paper are summarized as follows:
1. We propose a dual-driven dataset expansion method based on the UAV signal propagation model and received data samples. Unlike most existing methods, which rely solely on measured data augmentation or pure model-based simulation, our approach combines the UAV signal propagation model with measured data augmentation. We can therefore expand the number of training samples while maintaining a high level of feature consistency between the expanded dataset and the original dataset.
2. We construct a background learning-based long short-term memory (BL-LSTM) model for adaptive modulation recognition in complex electromagnetic scenarios. Unlike traditional LSTM-based recognition models, which only extract signal temporal features and neglect environmental background information, our model integrates environmental feature mining into the basic network. Integrating UAV air–ground channel environmental background information into the training model and optimizing the authenticity of the data distribution enhances the adaptability of the proposed modulation recognition method to dynamic air–ground propagation scenarios.
3. We construct a new deep residual shrinkage network based on the soft threshold function (STF-DRSN). Our improvement is to integrate the soft threshold function into each residual block of the deep residual shrinkage network, enabling adaptive denoising and feature enhancement. This construction keeps the modulation recognition accuracy of our method stable across different scenarios. The simulation results show that the average estimation accuracy of the proposed STF-DRSN exceeds 95% under low-SNR conditions, outperforming existing residual network-based methods.
The remainder of this paper is organized as follows. Section 2 introduces the system models. Section 3 presents the model–data dual-driven dataset augmentation. Section 4 presents the training on the augmented dataset based on environmental background information. Section 5 estimates the UAV signal features, and Section 6 reports the simulation results. Finally, Section 7 concludes the paper.
2. System Model
2.1. UAV Air–Ground Signal Propagation Model
A typical low-altitude UAV air–ground communication scenario is illustrated in Figure 2, with the UAV as the transmitter (Tx) and the ground terminal as the receiver (Rx). This propagation model characterizes the physical distortion of the transmitted signal during wireless transmission, and the baseband equivalent expression of the final received signal is given by [17]
$$r(t) = \alpha_{\mathrm{PL}}\,\alpha_{\mathrm{SF}}\,h(t)\,s(t) + n(t),$$
where $s(t)$ is the modulated complex baseband signal transmitted by the UAV, which carries the modulation information to be identified. The model quantifies the cumulative impairment of the wireless channel on the transmitted signal: $\alpha_{\mathrm{PL}}$ denotes the path loss gain, describing the deterministic signal attenuation as the Tx-Rx communication distance increases; $\alpha_{\mathrm{SF}}$ is the log-normal shadow fading gain, characterizing slow signal fluctuations caused by obstacles such as buildings and vegetation along the propagation path; $h(t)$ represents the time-varying small-scale fading gain, which adopts the Rice fading model for line-of-sight (LoS) scenarios with a dominant direct path, and the Rayleigh or Nakagami fading model for non-line-of-sight (NLoS) scenarios with only scattered multipath components, with the Doppler shift caused by UAV mobility integrated into this term. Finally, $n(t)$ is the zero-mean additive white Gaussian noise (AWGN) introduced by the receiver and the surrounding electromagnetic environment.
2.2. UAV Air–Ground Channel Fading and Noise Model
To accurately simulate the UAV low-altitude propagation environment and support subsequent dataset expansion, we adopt widely recognized channel and noise models matching UAV communication scenarios. The log-normal distribution is used for shadow fading, while Rice, Rayleigh and Nakagami distributions are adopted for small-scale fading under different channel states, with time-varying Doppler shift generated based on the harmonic superposition principle. AWGN conforming to theoretical statistical characteristics is generated via the Box–Muller transform method to simulate received signals under different signal-to-noise ratio (SNR) conditions. Based on the above models, we construct the augmented training data as
where
is the modulated UAV air–ground signal with Gaussian white noises added in a low-SNR environment with severe air–ground channel fading,
is the original received air–ground signal,
is the selected UAV air–ground channel multi-scale fading model with
for the LoS state and
for the NLoS state, and
is the generated Gaussian white noises.
,
,
(LoS/NLoS state) and
are taken as a set of augmented training data, with the addition of air–ground channel key parameters (Rice factor
K, Nakagami parameter
m, and path loss coefficient).
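The noise-generation step described above can be sketched as follows. This is a minimal illustration, assuming unit-power complex baseband samples and a target SNR defined as the ratio of average signal power to average noise power; the function names `box_muller` and `add_awgn` are hypothetical and are not taken from the paper.

```python
import math
import random


def box_muller(rng):
    """Return two independent N(0, 1) samples built from two uniform samples."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def add_awgn(iq, snr_db, rng):
    """Add complex white Gaussian noise to reach the requested SNR in dB."""
    signal_power = sum(abs(x) ** 2 for x in iq) / len(iq)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per I/Q branch
    noisy = []
    for x in iq:
        g_i, g_q = box_muller(rng)
        noisy.append(x + complex(sigma * g_i, sigma * g_q))
    return noisy


rng = random.Random(0)
# Hypothetical QPSK-like test sequence of 1024 unit-power samples.
clean = [complex(1, 0), complex(0, 1), complex(-1, 0), complex(0, -1)] * 256
noisy = add_awgn(clean, snr_db=0.0, rng=rng)
```

Scaling each I/Q branch by the square root of half the noise power keeps the total complex noise power equal to the target, so the same routine can be reused for any SNR point of the augmented dataset.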
2.3. Problem Modeling
Considering the small-sample scenario for machine learning-based UAV air–ground signal feature estimation with dynamic LoS/NLoS switching and multi-scale fading in air–ground channels, we propose a modulation recognition method integrating small-sample augmentation and soft-threshold denoising. To effectively alleviate the sample shortage, the dataset is augmented under the model–data dual drive based on the UAV air–ground channel propagation model—this lays a solid data foundation for subsequent high-precision recognition in dynamic air–ground propagation scenarios. During model training, the augmented dataset is leveraged with the integration of environmental background information; this integration further optimizes the adaptation to complex interference environments, avoiding performance deviations caused by disconnected simulated UAV air–ground channel data and real low-altitude propagation data. For signal feature estimation, deep learning techniques are applied to automatically extract intrinsic signal features, ensuring the accuracy of subsequent modulation type identification. Notably, automatic modulation recognition takes the received signal as input and outputs the identified modulation type. Based on this working principle, the modulation recognition problem in interference environments can be expressed as
$$\hat{m} = f(r), \qquad f^{\ast} = \arg\min_{f}\, \mathcal{L}(\hat{m}, m),$$
where $r$ is the received interfered signal, and $m$ is the modulation type adopted for it. We optimize the modulation recognition model $f$ such that the difference between the recognized modulation type $\hat{m}$ of the received signal and the actual modulation type $m$ is minimized under a given criterion $\mathcal{L}$.
2.4. Technical Implementation
We first expand the dataset based on the model and data. By combining ray tracing (RT)-based simulation of UAV air–ground channel physical models with measured air–ground communication data, a high-coverage hybrid dataset is constructed to alleviate the problem of insufficient samples. The UAV air–ground channel physical models can simulate signal characteristics under different low-altitude propagation conditions, while the measured data augmentation improves data diversity through methods such as noise injection and time–frequency transformation, ensuring that the training set covers a wider range of signal scenarios.
Then, we train the model on the dataset using the environmental background information. The prior knowledge of UAV air–ground channels, including environmental noise, multipath effects, LoS/NLoS state, multi-scale fading parameters, and the Doppler frequency shift induced by UAV low-altitude flight, is introduced to optimize the authenticity of the data distribution. This step not only improves the physical rationality of the dataset but also gives the models stronger adaptability in complex electromagnetic environments, avoiding performance degradation caused by excessive differences between simulated UAV air–ground channel data and the real low-altitude propagation environment.
We finally estimate signal features based on deep learning. A hierarchical neural network is used to automatically extract the essential features of signals, achieving high-precision parameter estimation and classification. Compared with traditional methods, deep learning models can adaptively learn the nonlinear features of signals and combine environmental information for more robust reasoning.
The proposed framework unifies data augmentation, environmental background learning, and soft-threshold denoising into a collaborative pipeline, as none of these modules can individually address the multiple challenges in UAV signal modulation recognition. Using data augmentation alone can expand the sample scale, but without effective denoising, it will also amplify noise. Relying solely on background learning makes it difficult to extract stable environmental features under small-sample conditions and leaves the model vulnerable to noise interference. Adopting soft-threshold denoising alone lacks reliable environmental priors, which may either eliminate useful signal components or fail to suppress noise sufficiently.
The model–data dual-driven augmentation provides high-quality and diverse samples for BL-LSTM to learn accurate mappings among noise, channel, and modulation features. The environmental background information extracted by BL-LSTM further serves as critical prior knowledge for the adaptive soft-threshold mechanism in STF-DRSN, enabling dynamic and reasonable adjustment of denoising intensity. Meanwhile, the denoised output from STF-DRSN provides high-quality input for background learning to ensure stability. Through this synergistic cooperation, data augmentation compensates for sample insufficiency, background learning enhances environmental adaptability, and soft-threshold denoising strengthens anti-noise robustness.
The three proposed components are interrelated and form a closed-loop optimization framework: data augmentation provides a foundation for training; the injection of environmental information improves the authenticity and availability of data; and the feedback results of feature estimation can further guide the iterative optimization of data generation and training strategies. This technical chain not only solves the problem of signal processing under small-sample conditions but also provides a full-link solution for intelligent electromagnetic signal analysis, from data generation to model training, which has both theoretical significance and practical application value. The connections between the technologies are shown in Figure 3.
3. Model–Data Dual-Driven Dataset Expansion
3.1. Composite Fading Channel Model
Radio waves undergo reflection, refraction, and diffraction by obstacles during propagation in the UAV low-altitude air–ground propagation environment. The received air–ground signal is typically a superposition of multi-cluster path signals, with each cluster composed of indistinguishable scattering branches. Measured data demonstrate that the shadow fading power, a form of large-scale fading in air–ground channels, follows a log-normal distribution [25], which is expressed as
$$f(\xi) = \frac{1}{\sqrt{2\pi}\,\sigma_{\xi}\,\xi}\exp\!\left(-\frac{(\ln \xi - \mu_{\xi})^{2}}{2\sigma_{\xi}^{2}}\right), \quad \xi > 0,$$
where $\xi$ denotes the shadow fading gain, and $\sigma_{\xi}$ and $\mu_{\xi}$ correspond to the standard deviation and regional mean of shadow fading in the logarithmic domain, respectively.
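We adopt the fading models of the Rayleigh channel and the Rician channel. The Rayleigh model is expressed as
$$f_{\mathrm{Ray}}(r) = \frac{r}{\sigma^{2}}\exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right), \quad r \ge 0,$$
where $f(\cdot)$ denotes the Probability Density Function (PDF), $r$ is the signal envelope, and $\sigma^{2}$ is the variance of each quadrature scattered component. The Rician model is expressed as
$$f_{\mathrm{Rice}}(r) = \frac{r}{\sigma^{2}}\exp\!\left(-\frac{r^{2} + L^{2}}{2\sigma^{2}}\right) I_{0}\!\left(\frac{rL}{\sigma^{2}}\right), \quad r \ge 0,$$
where $L$ is the amplitude of the line-of-sight propagation component, and $I_{0}(\cdot)$ is the zero-order modified Bessel function.
To adapt to the complex small-scale fading characteristics of UAV air–ground channels in non-ideal NLoS scenarios, we also adopt the Nakagami fading model, which is expressed as
$$f_{\mathrm{Nak}}(r) = \frac{2m^{m} r^{2m-1}}{\Gamma(m)\,\Omega^{m}}\exp\!\left(-\frac{m r^{2}}{\Omega}\right), \quad r \ge 0,$$
where $m$ is the Nakagami fading parameter for characterizing the severity of air–ground channel fading, $\Omega$ is the average power of the signal envelope, and $\Gamma(\cdot)$ is the Gamma function. When $m = 1$, the Nakagami fading model reduces to the Rayleigh fading model, which is suitable for the UAV air–ground channel with severe NLoS fading.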
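The reduction of the Nakagami model to the Rayleigh model can be checked in one line. The short derivation below uses the standard textbook forms of both densities; the symbols $r$ (envelope), $\Omega$ (average envelope power), and $\sigma^{2}$ (variance of each quadrature component) are notation assumed here for illustration.

```latex
% Standard Nakagami-m envelope density
f_{\mathrm{Nak}}(r) = \frac{2 m^{m} r^{2m-1}}{\Gamma(m)\,\Omega^{m}}
                      \exp\!\left(-\frac{m r^{2}}{\Omega}\right), \qquad r \ge 0.

% Set m = 1, so that \Gamma(1) = 1
f_{\mathrm{Nak}}(r)\big|_{m=1} = \frac{2r}{\Omega}\exp\!\left(-\frac{r^{2}}{\Omega}\right).

% Substitute \Omega = E[r^{2}] = 2\sigma^{2} for a Rayleigh envelope
f_{\mathrm{Nak}}(r)\big|_{m=1} = \frac{r}{\sigma^{2}}
                                 \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right)
                               = f_{\mathrm{Ray}}(r).
```

Values of $m$ above 1 therefore describe fading milder than Rayleigh, while values between 0.5 and 1 describe fading that is more severe.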
It is worth emphasizing that signals arriving over different propagation paths experience different environments, so the composite fading of each cluster is uncorrelated with that of the others. The impulse response of the time-varying multipath equivalent baseband channel [26] is expressed as
$$h(t,\tau) = \sum_{l=1}^{N_{p}} \beta_{l}\,\alpha_{l}(t)\,\delta(\tau - \tau_{l}),$$
where $N_{p}$ is the number of paths, $\tau_{l}$ is the delay of the $l$-th path, $\beta_{l}$ denotes the UAV air–ground channel path loss for the $l$-th path, and $\alpha_{l}(t)$ represents the air–ground channel multi-scale fading (shadow fading + small-scale fading) for the $l$-th path. For the LoS path, $\alpha_{l}(t)$ is dominated by Rice fading; for the NLoS path, $\alpha_{l}(t)$ is dominated by Rayleigh or Nakagami fading.
In dynamic UAV air–ground communication scenarios, the UAV transmitter maintains low-altitude flight while the ground receiver is fixed or moves at low speed. Because the low-altitude propagation environment varies (for example, through terrain and building occlusion), channel parameters including time delay, maximum Doppler shift, path loss, and LoS/NLoS state are random from one moment to the next, particularly when the UAV flies at high speed. Meanwhile, continuous changes in the scenario geometry, such as UAV flight height and propagation distance, result in regular and continuous variations of time delay, maximum Doppler shift, path loss, and fading parameters.
3.2. A2G Channel Simulation Generation
The simulation process starts with the composite fading model and integrates Gaussian white noise generation and Doppler shift fusion to generate multi-scenario channel data, which lays a foundation for the subsequent dataset fusion with measured data.
As shown in Figure 4, the left modules represent the core steps of simulation, the right panels detail the key methods of each step, and the dark blue highlighted content marks the formula-related technical points. The bottom logic bar summarizes the overall workflow, reflecting the transition from channel physical model to simulated-measured data fusion.
The simulation of wireless channel propagation characteristics is an integrated application of high-speed signal processing technology and computer application technology. The core component for implementing UAV air–ground channel propagation characteristic simulation technology is a digital multipath fading characteristic simulation module centered on a discrete tapped delay line, combined with ray tracing technology for three-dimensional low-altitude scene reconstruction.
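A discrete tapped delay line of the kind mentioned above can be sketched in a few lines. This is a simplified illustration, assuming integer sample delays and time-invariant complex tap gains; in a full simulator each gain would be replaced by a time-varying fading process. The function name `tapped_delay_line` and the example taps are hypothetical.

```python
def tapped_delay_line(signal, taps):
    """Pass a complex baseband sequence through a multipath channel.

    taps is a list of (delay_in_samples, complex_gain) pairs, one per path.
    The output has the same length as the input (the tail is truncated).
    """
    output = [0j] * len(signal)
    for delay, gain in taps:
        for k in range(delay, len(signal)):
            # Each path contributes a delayed, scaled copy of the input.
            output[k] += gain * signal[k - delay]
    return output


# Hypothetical two-path channel: a direct path and a weaker echo two samples later.
example_taps = [(0, 1.0 + 0j), (2, 0.5j)]
impulse = [1 + 0j, 0j, 0j, 0j, 0j]
response = tapped_delay_line(impulse, example_taps)
```

Feeding an impulse through the line returns the tap gains at their delays, which is a quick way to confirm that a configured delay profile matches the intended one.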
We generate Gaussian random variables based on the principle of harmonic superposition to simulate the time-varying small-scale fading with Doppler shift characteristics, which can be expressed as
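$$g(t) = \sqrt{\frac{2}{N}}\sum_{n=1}^{N}\cos\!\left(2\pi f_{d}\,t\cos\theta_{n} + \varphi_{n}\right),$$
where $N$ denotes the number of unresolvable scattering branches, and $f_{d}$, $\theta_{n}$, and $\varphi_{n}$ represent the maximum Doppler frequency, the incident angle of each scattering branch, and the initial phase, respectively.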
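The harmonic superposition principle can be illustrated with a short sum-of-sinusoids generator. The sketch assumes the common form in which each branch contributes a cosine at Doppler frequency $f_d \cos\theta_n$ with a random initial phase, and a $\sqrt{2/N}$ normalization that gives unit average power; the normalization, the 100 Hz maximum Doppler shift, and the function name are assumptions made for illustration.

```python
import math
import random


def harmonic_superposition(t, f_d, angles, phases):
    """Evaluate a sum-of-sinusoids fading sample at time t (seconds)."""
    n = len(angles)
    total = sum(
        math.cos(2.0 * math.pi * f_d * t * math.cos(theta) + phi)
        for theta, phi in zip(angles, phases)
    )
    return math.sqrt(2.0 / n) * total  # unit average power (assumed scaling)


rng = random.Random(1)
NUM_BRANCHES = 32           # number of unresolvable scattering branches
MAX_DOPPLER_HZ = 100.0      # hypothetical maximum Doppler frequency
SAMPLE_RATE_HZ = 8e6        # sampling rate used in the simulation settings

# Random incident angle and initial phase for every scattering branch.
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(NUM_BRANCHES)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(NUM_BRANCHES)]

fading = [
    harmonic_superposition(k / SAMPLE_RATE_HZ, MAX_DOPPLER_HZ, angles, phases)
    for k in range(1024)
]
```

By the central limit theorem the samples approach a Gaussian distribution as the number of branches grows, which is why a few tens of branches are commonly used.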
The log-normal (shadow fading), Rayleigh, and Rice random processes of the A2G channel can each be generated from Gaussian random processes. Among these, a Rayleigh random variable (NLoS small-scale fading) can be expressed as
A log-normal random variable can be expressed as
where
.
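The decomposition into Gaussian variables can be sketched as follows. This illustration assumes the usual constructions: a Rayleigh envelope as the magnitude of two independent zero-mean Gaussians, a Rice envelope obtained by adding a LoS amplitude to the in-phase branch, and a log-normal amplitude gain obtained from a Gaussian value in dB. The function names and the dB-to-amplitude convention are assumptions, not the paper's definitions.

```python
import math
import random


def rayleigh_sample(sigma, rng):
    """Envelope of two independent N(0, sigma^2) quadrature components."""
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def rice_sample(los_amplitude, sigma, rng):
    """Rayleigh construction with a LoS amplitude added to the in-phase branch."""
    in_phase = los_amplitude + rng.gauss(0.0, sigma)
    quadrature = rng.gauss(0.0, sigma)
    return math.hypot(in_phase, quadrature)


def lognormal_shadow_gain(mean_db, sigma_db, rng):
    """Amplitude gain whose value in dB is Gaussian (assumed 20*log10 convention)."""
    shadow_db = rng.gauss(mean_db, sigma_db)
    return 10.0 ** (shadow_db / 20.0)


rng = random.Random(0)
nlos_envelope = [rayleigh_sample(1.0, rng) for _ in range(1024)]
los_envelope = [rice_sample(2.0, 1.0, rng) for _ in range(1024)]
shadow_gain = lognormal_shadow_gain(mean_db=0.0, sigma_db=4.0, rng=rng)
```

Because all three generators draw from the same Gaussian source, a single well-tested noise routine is enough to drive every fading type in the simulator.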
UAV air–ground channel propagation characteristic simulation technology enables the simulation of air–ground channels under various low-altitude propagation scenarios, including static constant-parameter channels, slow time-varying flat channels, Rice fading channels, Rayleigh flat channels, Nakagami fading channels, and bidirectional Doppler channels matching the UAV flight speed. These air–ground channels are applicable to different UAV low-altitude flight scenarios with varying flight heights and LoS/NLoS ratios, and reproduce the respective multi-scale fading characteristics of different propagation scenarios. The simulated channel data are combined with measured air–ground communication data to form the model–data dual-driven expanded dataset, which makes up for the shortage of original measured samples while retaining core feature consistency with the original samples. The expanded dataset covers rich channel state characteristics and provides sufficient, high-quality data support for subsequent training of the UAV signal modulation recognition model.
4. Dataset Training Based on Environmental Background Information
To address the key technical challenges commonly encountered in signal propagation environments, such as low accuracy in signal modulation recognition and poor adaptability to complex environments, we propose a method based on deep learning network training. Specifically, we construct an expanded dataset incorporating rich environmental background information and use signal data with scenario-specific features as network input for systematic training. This method is designed to enhance the model adaptability to complex propagation environments while improving the accuracy of modulation recognition.
To further enhance the model ability to perceive and analyze signal propagation scenarios, on the basis of the aforementioned expanded signal dataset, we construct the background learning-based long short-term memory (BL-LSTM) model for background information extraction and SNR estimation. This model can deeply mine and extract the environmental background features embedded in the signal, and achieve real-time and reliable estimation of the SNR under different propagation scenarios. Ultimately, the model provides critical environmental parameter support for the subsequent optimization of signal modulation recognition accuracy.
4.1. BL-LSTM Model
We construct the BL-LSTM network model oriented to UAV air–ground channel environmental background information learning, with the input layer incorporating air–ground channel scene features (UAV flight height, propagation distance) and channel state features (LoS/NLoS state, fading parameter labels). The structure of the model is shown in Figure 5, which mainly consists of the following parts [27]:
Input Gate: This is governed by a sigmoid activation function to control new information integration into the memory cell. For dynamic environmental contexts, it prioritizes weight allocation to environment-relevant information and reduces weights for irrelevant data to avoid decision interference.
Forget Gate: This is a sigmoid-based gate that regulates the discarding of old memory cell information via 0–1 forgetting weights. It adaptively raises weights for environment-incompatible old information and lowers weights for environment-robust core information.
Cell State: This is the core of BL-LSTM for storing long-term dependent information, updated by the Input and Forget Gates. It dynamically optimizes storage strategies for continuous environmental fluctuations, preserving only current environment-compatible long-term information and short-term fluctuation-robust key data.
Output Gate: This uses a sigmoid function to weight information flow from the memory cell to the hidden state. It adjusts output weights based on environmental noise reduction demands, increasing weights for relevant memory information to prioritize its role in hidden state calculation and boost recognition accuracy.
The working process of the BL-LSTM network is as follows:
1. The forget gate calculates the degree of forgetting for old information;
2. The input gate computes the weight for incorporating newly input information;
3. The cell state is updated based on the weights of the input gate and the forget gate;
4. The output gate calculates the weight for information flow from the cell state to the hidden state, and the hidden state is updated.
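The four steps above follow the update order of a standard LSTM cell, which can be sketched for a scalar state as follows. This is the textbook LSTM update, not the authors' BL-LSTM: the environment-aware weighting described earlier is not specified in enough detail to reproduce, and the parameter layout (a dictionary of `(w_x, w_h, b)` tuples) is a hypothetical choice for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each gate name to a (w_x, w_h, b) tuple (hypothetical layout).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))       # step 1: share of old state kept
    write = sigmoid(pre_activation("input"))         # step 2: weight of new information
    candidate = math.tanh(pre_activation("candidate"))
    c_new = forget * c_prev + write * candidate      # step 3: cell state update
    read = sigmoid(pre_activation("output"))         # step 4: output gate
    h_new = read * math.tanh(c_new)                  # hidden state update
    return h_new, c_new


zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

In the standard formulation the forget gate output is the fraction of the old cell state that is retained, so a value near 0 corresponds to strong forgetting.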
Through the design of such a gating mechanism and memory cells, the BL-LSTM network can effectively capture the relevant dependencies of events in long sequences and achieve excellent performance in sequence modeling tasks with dataset expansion.
4.2. Background Information Extraction
The constructed BL-LSTM network is used to perform background information extraction and SNR estimation on the expanded dataset. The modulated signal with Gaussian noise is expressed as
and the constructed BL-LSTM is utilized to extract background information such as noise from the expanded dataset, which can be expressed as
where
represents the UAV air–ground channel environmental background information extracted from the
n-th modulated air–ground signal,
denotes the extraction function of the Long Short-Term Memory network model, and the extracted background information has the same shape dimension as the waveform of the modulated air–ground signal,
is the parameter of the UAV air–ground channel multi-scale fading model for the
n-th modulated signal, which is consistent with the unit of signal,
is the unitless Gaussian white noises parameter of the
n-th signal, and
is the LoS/NLoS state factor of the air–ground channel. The values of
and
are related to the UAV air–ground channel low-altitude propagation environment characteristics and the attributes of the modulated air–ground signal itself.
Since the proportion weight of Gaussian white noises in the extracted background information is much higher than that of the channel time-varying fading model, the power
of the extracted background information is approximated as the noise power
. Then, the noise power of the
n-th signal within time
is expressed as
The signal power
of the
n-th signal within time
is expressed as
By combining the signal power and noise power, the power-based SNR is estimated, which is expressed as
Based on the above, we perform SNR and UAV air–ground channel fading parameter (Rice factor K/Nakagami parameter m) estimation on the air–ground signal, and input the extracted UAV air–ground channel environmental background information (LoS/NLoS state, fading parameters) and the estimated SNR into the subsequent modulation scheme estimation network to improve the recognition accuracy in dynamic low-altitude propagation scenarios.
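The power-based SNR estimate can be illustrated with a small sketch. It assumes that the noise power is the mean squared magnitude of the extracted background sequence and that the signal power is the received power minus that noise power, which holds when signal and noise are uncorrelated; both choices, and the function names, are assumptions for illustration rather than the paper's exact expressions.

```python
import math


def mean_power(samples):
    """Average power of a complex sequence over the observation window."""
    return sum(abs(v) ** 2 for v in samples) / len(samples)


def estimate_snr_db(received, background):
    """Estimate the SNR in dB from a received sequence and its extracted background."""
    noise_power = mean_power(background)
    # Assumes signal and noise are uncorrelated; floor avoids log of a non-positive value.
    signal_power = max(mean_power(received) - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical toy example: unit-power signal plus unit-power orthogonal background.
signal = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
background = [1j, 1j, -1j, -1j]
received = [s + b for s, b in zip(signal, background)]
snr_db = estimate_snr_db(received, background)
```

Flooring the signal power keeps the estimate finite when the extracted background accounts for nearly all of the received power, which can occur at very low SNR.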
6. Simulation and Result Analysis
In this section, we present the simulation results and analysis to demonstrate the performance of our method. We first introduce the simulation settings, and then present the estimation accuracy of our method. We also compare the estimation accuracy of our method with that of the state of the art.
Model training was performed on a platform with an Intel Xeon Gold 6240R CPU, an NVIDIA RTX 3090 GPU (24 GB), and 64 GB of DDR4 memory. Onboard deployment was tested on an NVIDIA Jetson Xavier NX (6-core Carmel ARM CPU, 384-core Volta GPU, 8 GB shared memory), which emulates the resource-constrained UAV airborne platform. The software environment consists of Ubuntu 20.04 LTS, Python 3.8, PyTorch 1.13.1, CUDA 11.7, and cuDNN 8.5; MATLAB R2023b is used to generate the simulated channel and baseband signals.
We adopt a measured air-to-ground (A2G) channel dataset [32] covering three typical scenarios, with data collected at two different UAV flight heights. We select six typical modulation schemes used in UAV communications (ASK, FSK, BPSK, FM, MSK, and QPSK) and uniformly set their parameters to a 2.4 GHz carrier frequency, a 1 Msym/s symbol rate, an 8 MHz sampling rate (8 samples per symbol), and a fixed baseband IQ sequence length of 1024 points. A raised cosine filter with a roll-off factor of 0.35 is employed for pulse shaping to align with practical UAV communication engineering.
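The relationship between these settings can be checked with a short script. The sketch derives the samples per symbol and the number of symbols per IQ sequence from the stated rates, and evaluates the standard raised cosine impulse response with the stated roll-off factor; the filter span of 6 symbols is a hypothetical choice that the text does not specify.

```python
import math

SAMPLE_RATE_HZ = 8e6       # sampling rate from the simulation settings
SYMBOL_RATE_HZ = 1e6       # symbol rate from the simulation settings
SEQUENCE_LENGTH = 1024     # IQ samples per sequence
ROLL_OFF = 0.35            # raised cosine roll-off factor
SPAN_SYMBOLS = 6           # hypothetical filter span, not given in the text

samples_per_symbol = int(SAMPLE_RATE_HZ / SYMBOL_RATE_HZ)
symbols_per_sequence = SEQUENCE_LENGTH // samples_per_symbol
sequence_duration_s = SEQUENCE_LENGTH / SAMPLE_RATE_HZ


def raised_cosine(t, symbol_period, beta):
    """Raised cosine impulse response, normalized so that h(0) = 1."""
    x = t / symbol_period
    sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-12:
        # Limit value at t = +/- T / (2 * beta), where the closed form is 0/0.
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * beta * x) / denom


half_span = SPAN_SYMBOLS * samples_per_symbol // 2
taps = [
    raised_cosine(k / SAMPLE_RATE_HZ, 1.0 / SYMBOL_RATE_HZ, ROLL_OFF)
    for k in range(-half_span, half_span + 1)
]
```

With 8 samples per symbol, each 1024-point sequence carries 128 symbols and lasts 128 microseconds, and the filter crosses zero at every nonzero multiple of the symbol period, which is the zero intersymbol interference property of the raised cosine pulse.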
6.1. Performance Comparison Before and After Augmentation
In our simulations, we configure the UAV air–ground channel fading type and expansion factor. Data augmentation is then performed on the A2G channel samples extracted from the measured dataset [
32] to generate an expanded dataset with various LoS/NLoS ratios and fading parameters.
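To make the role of the fading parameters concrete, the sketch below draws flat-fading channel gains from the standard Rician model with Rice factor K and applies them to one IQ segment to create faded copies. This is only an assumed, simplified stand-in for the model-driven part of the expansion; it is not the proposed model–data dual-driven strategy, and the names `rician_gain` and `augment`, the block-fading assumption, and the example K values are hypothetical.

```python
import numpy as np


def rician_gain(k_factor, n, rng):
    """Draw n complex channel gains with unit mean power and Rice factor K (linear scale)."""
    los = np.sqrt(k_factor / (k_factor + 1.0))        # deterministic LoS component
    sigma = np.sqrt(1.0 / (2.0 * (k_factor + 1.0)))   # per-dimension std of the scattered part
    scattered = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return los + scattered


def augment(iq, k_factors, copies_per_k, rng):
    """Create faded copies of one IQ segment (block fading: one gain per copy)."""
    out = []
    for k in k_factors:
        for h in rician_gain(k, copies_per_k, rng):
            out.append(h * iq)
    return np.stack(out)


rng = np.random.default_rng(1)
segment = np.ones(16, dtype=complex)                  # placeholder IQ segment
expanded = augment(segment, k_factors=[0.0, 5.0], copies_per_k=5, rng=rng)
print(expanded.shape)  # (10, 16)
```

Setting K = 0 removes the LoS component and gives Rayleigh fading (an NLoS-like state), while a larger K makes the LoS path dominant, so varying K is one simple way to vary the LoS/NLoS mix of the generated samples.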
The IQ visualization diagrams of the six types of signals before and after expansion are shown in
Figure 9.
Then, we calculate the consistency before and after augmentation by comparing, over the N sample points of one signal segment, the amplitude of the i-th sample point in the original signal with the amplitude of the corresponding sample point in the augmented signal. A second measure is computed from D, the number of transformed bits, and M, the total number of bits.
Table 1 presents the core feature consistency of the original UAV air–ground signal training dataset under 10-fold, 20-fold, and 30-fold expansion, where the consistency covers both signal modulation features and UAV air–ground channel fading features. After data expansion with the proposed model–data dual-driven strategy, the feature matching degree between the expanded data and the original data remains above 99%, which meets the preset expansion targets. This is because the dual-driven design controls the core feature attributes of the original data during expansion, which avoids feature deviation and limits the impact of the expansion operations on the integrity of the original data features. It thus ensures a high-quality, stable expanded dataset and provides reliable data support for subsequent model training.
Furthermore, to verify the impact of dataset augmentation on the training stability and convergence of the proposed STF-DRSN, we plot the training loss curves of the model before and after data expansion as shown in
Figure 10.
Figure 10 presents the training loss curves of STF-DRSN before and after applying the dataset augmentation strategy. With augmentation, the model starts from a lower initial loss and converges to a stable loss of 0.28 within only 60 epochs, which is significantly faster than the model trained without augmentation. The loss of the non-augmented STF-DRSN is much higher and fluctuates significantly during the first 30 epochs. This is because the model–data dual-driven augmentation introduced during data preparation exposes the model to a more diverse and realistic data distribution, thereby alleviating overfitting and improving the overall training stability of the network.
Next, we select 100 signals of each type from the dataset that is augmented by 10 times. The formula for calculating the accuracy is as follows:
$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{C} M_{ii}}{\sum_{i=1}^{C} \sum_{j=1}^{C} M_{ij}}$$
where $C$ represents the number of signal categories, and $M$ represents the confusion matrix, a matrix for evaluating the performance of a classification model. The element $M_{ij}$ denotes the number of samples belonging to class $i$ that are classified as class $j$, and $M_{ii}$ refers to the element at the $i$-th row and $i$-th column in the confusion matrix, which corresponds to the number of samples in class $i$ that are correctly classified as class $i$. The numerator is the number of all correctly classified samples, and the denominator is the total number of test samples.
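The accuracy definition above (correct classifications on the diagonal of the confusion matrix divided by the total number of test samples) can be sketched in a few lines. The function name and the toy 3-class counts are illustrative assumptions and are not results from this paper.

```python
def overall_accuracy(confusion):
    """Sum of the diagonal of the confusion matrix divided by the sum of all entries."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy 3-class matrix (illustrative counts only): rows are true classes,
# columns are predicted classes.
toy = [
    [95, 3, 2],
    [4, 90, 6],
    [1, 5, 94],
]
print(f"{overall_accuracy(toy):.4f}")  # 279 / 300 = 0.9300
```

If the test set consists of the 600 selected signals (100 per class across six classes, which is an assumption about the evaluation split), accuracies of 90.83% and 97.83% would correspond to 545 and 587 correctly classified signals, respectively.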
The overall estimation results of the six signal modulation methods are presented in the form of a confusion matrix as shown in
Figure 11.
Figure 11 shows the model’s ability to distinguish between different modulation modes for non-expanded and expanded samples. The overall recognition accuracy is 90.83% before expansion and 97.83% after expansion, an improvement of 7 percentage points that significantly enhances the model’s ability to distinguish different modulation modes. This is because the dataset expansion strategy supplements valid samples, increasing the sample size while preserving the core features of the modulated signals. It helps the model learn the feature differences between modulated signals comprehensively and addresses the insufficient feature learning and judgment deviation caused by too few samples. The strategy therefore lowers the misjudgment rate and improves the modulation recognition accuracy.
6.2. Performance Comparison Before and After Noise Variation
In research on UAV air–ground signal modulation recognition, analyzing signals in an ideal noise-free environment simplifies the initial verification of algorithms, but it is far removed from real UAV low-altitude air–ground communication scenarios. The unavoidable noise in actual air–ground channels makes it difficult to apply conclusions obtained in noise-free environments to practice. Therefore, actively introducing noise during the training phase is essential to align the models with the signal transmission characteristics encountered in UAV low-altitude propagation.
In this subsection, we further optimize the construction scheme of the UAV air–ground channel dataset. The SNR of the background environment of the augmented dataset is reduced by 5 dB to simulate a complex low-altitude propagation environment with severe noise. On this basis, we train on the augmented dataset using the UAV air–ground channel environmental background information to improve the model’s adaptability to scenarios with different noise intensities and fading severities. The confusion matrix showing the results for the six signal modulation methods after this noise variation is presented in
Figure 12.
Figure 12 presents the modulation recognition accuracy of the proposed STF-DRSN across a range of SNR levels. The proposed method achieves robust performance at higher SNRs, but its accuracy drops significantly when the SNR is reduced by 5 dB, showing a clear dependence on the signal-to-noise ratio. The recognition accuracy under high-noise conditions is markedly lower than that in clean-signal environments, with significant performance degradation as the noise floor rises. This is because stronger noise masks the subtle discriminative features of certain modulation signatures, causing the model to confuse modulation modes with high structural similarity and weakening the network’s ability to capture the core physical differences between signal categories. Notably, the inherent similarity between specific modulation pairs further exacerbates this judgment deviation under intensified noise interference.
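The 5 dB SNR reduction can be illustrated with a minimal sketch that adds complex white Gaussian noise at a target SNR. The AWGN model, the function name `add_awgn`, the placeholder tone, and the 15 dB reference level are assumptions made for illustration; the measured dataset contains its own background noise, which this sketch does not reproduce.

```python
import numpy as np


def add_awgn(iq, snr_db, rng):
    """Add complex white Gaussian noise so that signal power / noise power equals snr_db."""
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(iq.shape) + 1j * rng.standard_normal(iq.shape)
    )
    return iq + noise


rng = np.random.default_rng(0)
clean = np.exp(1j * 2 * np.pi * 0.01 * np.arange(1024))  # placeholder unit-power tone
reference = add_awgn(clean, 15.0, rng)        # assumed reference SNR
degraded = add_awgn(clean, 15.0 - 5.0, rng)   # SNR lowered by 5 dB
```

Lowering the SNR by 5 dB multiplies the noise power by 10^0.5, or about 3.16, for the same signal power.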
To demonstrate the independent contribution and necessity of each core module in the proposed framework, we conduct an ablation experiment at SNR = 15 dB. The complete proposed framework serves as the baseline. In each ablated variant, exactly one core module is removed or replaced, while all other training configurations, network parameters, and test conditions remain identical to ensure a fair comparison. The ablation results for each modulation type and the average accuracy are summarized in
Table 2.
Table 2 presents the modulation recognition accuracy of the complete STF-DRSN and its ablated variants. The complete model achieves the highest recognition accuracy across all signal types and shows a measurable gain over every simplified counterpart. The accuracy of the ablated models is consistently lower than that of the full framework, and replacing the adaptive soft thresholding mechanism with a fixed threshold causes the largest degradation, 6.27%. This is because the proposed method forms a collaborative pipeline in which the dual-driven augmentation alleviates overfitting by providing high-fidelity samples and the BL-LSTM module captures essential environmental prior information. This synergy allows the model to extract discriminative features while dynamically suppressing noise-related redundant information, thereby alleviating the performance bottleneck caused by small samples and complex channel distortions.
Based on these results, subsequent research can pursue targeted optimization in two directions. First, the feature extraction module can be improved to strengthen the model’s resistance to noise interference and reduce the weakening effect of noise on the features of modulated signals. Second, dedicated differentiation strategies can be designed for easily confused modulation modes to enhance the model’s ability to distinguish similar features. In this way, the recognition accuracy and stability of the model in complex noise environments can be comprehensively improved.
6.3. The Impact of Classification Algorithms on Recognition Performance
We compare the proposed STF-DRSN with seven benchmarks, i.e., the multi-domain feature enhancement network (MDFEN) [
33], the time-optimized reduced-feature device (TO-RFD) [
34], the deep feature learning network (DFLNET) [
35], the zero-sample signal recovery (ZLSR) [
36], the residual structure neural network (RESNET) [
37], the generative adversarial networks (GANs) [
16], and the Transformer [
22].
Figure 13 presents the average modulation recognition accuracy of different methods versus training sample size. According to references [
17,
19], in low-altitude intelligent UAV communication scenarios, the critical threshold for small samples is defined as 5% of the total samples. In line with our research context, we set 100 training samples per category as the small-sample threshold and vary the sample size from 30 to 200 in the simulation to evaluate our method under small-sample conditions. The proposed STF-DRSN achieves the highest recognition accuracy across all tested sample sizes and remains remarkably stable even under ultra-small-sample conditions (e.g., 30 and 50 samples per class). In contrast, the accuracy of baseline methods such as GANs, DFLNet, and TO-RFD degrades sharply as the sample size decreases. This is because the proposed method integrates a model–data dual-driven augmentation strategy and an adaptive soft thresholding mechanism in the feature extraction process. This combination encourages the network to prioritize core discriminative features while suppressing the detrimental effects of noise and environmental fluctuations, which alleviates the risk of overfitting typically associated with sparse data and removes environment-related redundant information.
Figure 14 illustrates the average modulation recognition accuracy of the compared models over SNRs ranging from 0 dB to 30 dB. The proposed STF-DRSN consistently leads, starting at 92% at 0 dB and approaching 98% at 30 dB. While the Transformer model occupies the second tier, other baselines such as TO-RFD, DFLNet, and GANs exhibit significantly lower classification performance, particularly in the high-noise region. Unlike the seven baseline models, whose feature extraction capabilities are severely suppressed under low-SNR conditions, the proposed STF-DRSN maintains high recognition accuracy even when the signal is heavily distorted by environmental interference. The reason lies in the cooperation between the BL-LSTM background learning module and the adaptive soft thresholding mechanism. By explicitly mining environmental prior information, our method estimates the instantaneous noise level and dynamically adjusts the shrinkage thresholds within each residual block. It can therefore prune noise-related redundant features while preserving the essential high-dimensional signatures of the modulation signals, mitigating the impact of low SNR on decision boundaries.
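The shrinkage step referred to above can be illustrated with a minimal sketch of soft thresholding with a channel-wise adaptive threshold, in the form commonly used in deep residual shrinkage networks (threshold = scale × mean absolute feature value). This is an assumed simplification: in the proposed STF-DRSN the scale would be produced by a learned sub-network informed by the BL-LSTM background information, whereas here it is passed in as a plain array, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink each value toward zero by tau; values with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channelwise_threshold(features, scale):
    """Compute tau_c = scale_c * mean(|x_c|) for features of shape (channels, length).

    scale holds one value in (0, 1) per channel (assumed to come from a
    learned attention branch in a real network).
    """
    mean_abs = np.mean(np.abs(features), axis=-1, keepdims=True)
    return scale[:, None] * mean_abs


features = np.array([[0.1, -0.2, 2.0, -1.5]])   # one channel, four samples
scale = np.array([0.5])
tau = channelwise_threshold(features, scale)    # 0.5 * 0.95 = 0.475
denoised = soft_threshold(features, tau)        # small values are zeroed, large ones shrunk
```

Because the threshold scales with the mean feature magnitude, a noisier input (larger average magnitude of low-level activations) is shrunk more aggressively than a clean one, which is the behavior a fixed threshold cannot provide.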
Figure 15 compares the recognition accuracy of the proposed method with that of the other five algorithms at an SNR of 20 dB. Over the entire test range, the STF-DRSN shows no significant performance decline under slight fluctuations in experimental conditions, demonstrating good performance stability. Moreover, our method maintains a higher overall recognition accuracy than the other five algorithms. This is because combining physical models with measured-data augmentation yields a hybrid dataset with high coverage, which alleviates the problem of insufficient samples. Training on the expanded dataset with environmental background information then introduces prior knowledge such as environmental noise, multipath effects, and Doppler frequency shifts. This makes the data distribution more realistic, improves the physical plausibility of the dataset, and gives the model stronger adaptability in complex electromagnetic environments. Finally, a hierarchical neural network automatically extracts the essential features of signals, achieving high-precision parameter estimation and classification.
6.4. Robustness Verification Under UAV Dynamic Mobility Scenarios
We evaluate the robustness of the proposed scheme against the time-varying channel interference caused by UAV dynamic movement. The dynamic scenario settings strictly follow the UAV signal propagation model established in
Section 2.1: the UAV moves along a straight line at constant speed, the carrier frequency is fixed at 2.4 GHz, and the Doppler shift and time-varying channel gain are calculated from the UAV moving speed. All experiments are conducted at a fixed SNR of 15 dB to ensure a fair comparison between methods. We test the modulation recognition accuracy of the proposed STF-DRSN at typical UAV moving speeds (0 m/s to 40 m/s) and compare it with three representative baseline methods: the standalone BL-LSTM model, the GAN-based automatic modulation classification (AMC) method, and the Transformer-based AMC method. The quantitative results are summarized in
Table 3.
Table 3 presents the modulation recognition accuracy of the proposed STF-DRSN and the three baseline methods at different UAV moving speeds with a fixed SNR of 15 dB, in order to verify the robustness of the proposed scheme against time-varying channel interference caused by UAV mobility. The accuracy of all methods decreases gradually as the UAV moving speed increases, but the STF-DRSN maintains the highest recognition accuracy at every speed, with an accuracy drop of only 4.37%, which is significantly better than the three baseline methods. This is because the STF-DRSN integrates a collaborative pipeline consisting of dual-driven augmentation and the BL-LSTM module: the BL-LSTM module captures essential environmental prior information and dynamically suppresses noise-related redundant information, thereby enhancing the model’s adaptability to time-varying channel interference caused by UAV movement.
The dynamic robustness of the proposed method benefits from the synergistic design of the three core modules. First, the model–data dual-driven augmentation integrates dynamic channel and Doppler shift characteristics into the training dataset, enabling the model to learn in advance how features vary under dynamic movement. Second, the BL-LSTM module extracts dynamic environmental features, including time-varying fading and Doppler shift, providing accurate environmental prior information. Finally, the adaptive soft threshold mechanism of STF-DRSN dynamically adjusts the denoising intensity according to the changing environmental features, suppressing the interference caused by UAV movement while retaining the core modulation features. The experimental results verify that the proposed method adapts well to UAV dynamic mobility scenarios and has strong practical application value.
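For a sense of scale, the Doppler shift in the mobility scenario can be estimated with the standard relation f_d = (v / c) · f_c · cos θ. The sketch below evaluates it at the 2.4 GHz carrier for speeds up to 40 m/s; the intermediate speeds, the worst-case angle θ = 0, and the function names are illustrative assumptions, not the configuration of the propagation model in Section 2.1.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light, m/s
F_CARRIER = 2.4e9        # carrier frequency from the text, Hz
SAMPLE_RATE = 8e6        # sampling rate from the text, Hz
SEQ_LEN = 1024           # IQ samples per segment from the text


def doppler_shift_hz(speed_mps, angle_rad=0.0):
    """Doppler shift for relative speed v and angle theta between velocity and line of sight."""
    return speed_mps / C_LIGHT * F_CARRIER * math.cos(angle_rad)


def phase_drift_cycles(speed_mps):
    """Carrier phase rotation, in cycles, accumulated over one IQ segment."""
    return doppler_shift_hz(speed_mps) * SEQ_LEN / SAMPLE_RATE


for v in (0, 10, 20, 30, 40):  # intermediate speeds are illustrative
    print(f"{v:2d} m/s: f_d = {doppler_shift_hz(v):6.1f} Hz, "
          f"drift = {phase_drift_cycles(v):.3f} cycles")
```

At 40 m/s the shift is roughly 320 Hz, which rotates the carrier phase by about 0.04 of a cycle over one 1024-point segment (128 µs at 8 MHz sampling).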
6.5. Complexity Analysis
To evaluate the engineering deployment feasibility of the proposed STF-DRSN on resource-constrained UAV airborne platforms, this section quantitatively analyzes the model complexity and real-time inference performance. The hardware platform parameters are specified at the beginning of
Section 6, with a fixed input IQ sequence length of 1024 points and a batch size of 1.
The model complexity is measured by two widely adopted indicators: the number of trainable parameters, which reflects the storage overhead of the model and directly determines the memory occupation on the airborne platform, and the floating-point operations (FLOPs) per single-sample inference, which characterize the computational overhead of the model and dominate the computing power required for inference. The real-time inference performance is quantified by the end-to-end single-sample inference time, with the latency tested separately on the server side and the airborne embedded side to comprehensively reflect the real-time processing capability of the proposed model in practical deployment.
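As a rough illustration of how these indicators are obtained, the sketch below counts parameters and FLOPs for a single one-dimensional convolution layer and times a callable with warm-up runs. The layer shape (2 input channels for I and Q, 16 output channels, kernel size 3) is hypothetical and is not taken from the STF-DRSN architecture, and counting one multiply-accumulate as two FLOPs is a common convention rather than a universal one.

```python
import time


def conv1d_params(c_in, c_out, kernel, bias=True):
    """Trainable parameters of a 1-D convolution: weights plus optional biases."""
    return c_out * (c_in * kernel + (1 if bias else 0))


def conv1d_flops(c_in, c_out, kernel, out_len):
    """FLOPs of a 1-D convolution, counting each multiply-accumulate as 2 FLOPs."""
    return 2 * c_in * kernel * c_out * out_len


def mean_latency_ms(fn, warmup=5, runs=50):
    """Average wall-clock time per call in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1e3


# Hypothetical first layer applied to a 1024-point IQ input with "same" padding.
print(conv1d_params(2, 16, 3))        # 16 * (2 * 3 + 1) = 112
print(conv1d_flops(2, 16, 3, 1024))   # 2 * 2 * 3 * 16 * 1024 = 196608
```

Summing these counts over all layers gives the model totals, and passing a function that runs one forward pass with batch size 1 to `mean_latency_ms` gives a single-sample latency estimate on a given platform.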
The quantitative test results of model complexity and inference performance are shown in
Table 4.
Table 4 presents the computational complexity and inference performance of the proposed STF-DRSN on different hardware platforms. The proposed method achieves the lowest memory footprint and the shortest execution latency on all evaluation dimensions, and remains efficient on both server-side and airborne embedded deployments. This is because the adaptive soft-threshold shrinkage mechanism within each residual block allows the model to prune noise-contaminated redundant features at an early stage, thereby alleviating the computational load on airborne hardware resources.
7. Conclusions
In this paper, to address the challenges of small samples, strong noise, and multi-scale fading in UAV low-altitude air–ground communication scenarios, we proposed an enhanced UAV signal modulation estimation method based on small-sample augmentation and soft threshold denoising. First, we developed a model–data dual-driven dataset expansion method by fusing the electromagnetic propagation physical model with measured data, which maintains a core feature consistency of over 99% and can accurately reproduce the LoS/NLoS dynamic switching and Doppler shift characteristics of UAV air–ground channels. Then, we designed a BL-LSTM network trained on the augmented dataset with environmental information, achieving accurate extraction of UAV air–ground channel background features and real-time SNR estimation. Finally, we proposed the STF-DRSN, which integrates the soft threshold anti-interference capability with the attention-driven threshold structure to suppress noise and enhance feature extraction, thus ensuring stable modulation recognition accuracy in UAV dynamic flight scenarios. Experimental results showed that the proposed STF-DRSN outperformed state-of-the-art algorithms by more than 2% in UAV signal modulation estimation accuracy and achieved an average recognition accuracy of over 95% under low-SNR conditions with severe UAV air–ground channel fading. In future work, we will extend the method to more complex modulation types of UAV communication signals and reduce network complexity for real-time processing on UAV onboard terminals.