Soft Threshold Denoising-Based Environmental Adaptive UAV Signal Modulation Recognition for Small-Sample Scenarios

Jin, Fang; Shao, Yang; He, Yunhong; Ye, Zhihao; He, Fangmin; Lin, Zhipeng; Xiao, Han

doi:10.3390/drones10040257


by Fang Jin ¹, Yang Shao ², Yunhong He ², Zhihao Ye ¹, Fangmin He ¹, Zhipeng Lin ²,* and Han Xiao ¹,*

¹ Naval University of Engineering, Wuhan 430033, China
² College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
* Authors to whom correspondence should be addressed.

Drones 2026, 10(4), 257; https://doi.org/10.3390/drones10040257

Submission received: 13 February 2026 / Revised: 30 March 2026 / Accepted: 30 March 2026 / Published: 3 April 2026

(This article belongs to the Special Issue Recent Developments in Artificial Intelligence and Interdisciplinary Research for UAV Application)


Highlights

What are the main findings?

A novel model–data dual-driven method is proposed to effectively expand the UAV signal training dataset while preserving essential feature consistency of the air–ground channel.
A background learning-based long short-term memory (BL-LSTM) model and a deep residual shrinkage network based on the soft threshold function (STF-DRSN) are built to improve the anti-interference capability, thereby significantly boosting the environmental adaptability of UAV modulation recognition.

What are the implications of the main findings?

The proposed method effectively addresses the critical challenge of insufficient training data in UAV small-sample scenarios, establishing a reliable data foundation for accurate modulation recognition.
By successfully mining and integrating background environmental features from air–ground channels, the proposed method substantially enhances the robustness and adaptability to complex low-altitude environments.

Abstract

As a key technology for wireless signal identification, modulation recognition plays an important role in fields such as unmanned aerial vehicle (UAV) communications and low-altitude spectrum management. However, the accuracy of modulation recognition often cannot be guaranteed in scenarios with serious noise interference when only a few samples are available. In this paper, we propose an intelligent modulation recognition method for UAV signals based on small-sample augmentation and soft threshold denoising. We first propose a new dual-driven dataset expansion method that combines the UAV air–ground channel propagation model with the received data samples. Then, we construct a background learning-based long short-term memory (BL-LSTM) model to extract the environmental background features embedded in the UAV signal, including Line-of-Sight (LoS) state, multi-scale fading parameters and Doppler shift characteristics. We integrate this environmental background information into the data training model and optimize the authenticity of the data distribution, which enhances the adaptability of the model. Finally, we construct a deep residual shrinkage network based on the soft threshold function (STF-DRSN). Because soft thresholding resists noise interference, we integrate it into each residual block of the deep residual shrinkage network. Simulation results show that, compared with the state of the art, our method can improve the modulation recognition accuracy of UAV signals in small-sample scenarios.

Keywords:

UAV communications; modulation recognition; deep learning; dataset augmentation; deep residual network

1. Introduction

With the rapid development of wireless communications, modulation recognition has become very important in modern wireless communication systems [1,2]. As the low-altitude economy booms, unmanned aerial vehicle (UAV) air–ground communication is widely applied in civil and industrial scenarios, making automatic modulation recognition (AMR) a core technical support for the safe operation of UAV systems and standardized governance of low-altitude electromagnetic spectra. The accurate recognition of UAV signal modulation is also critical to avoiding inter-UAV electromagnetic interference and safeguarding the order of the low-altitude communication environment [3].

AMR is often applied in electronic warfare and military software-defined radio [4,5,6]. This technique can accurately analyze the parameters of enemy communication UAV signals and provide key intelligence support for electronic countermeasures and target positioning. In the field of electromagnetic spectrum management, AMR can identify illegal UAV signals, malicious interference sources, and unauthorized spectrum occupancy behaviors [7,8]. It provides core technical support for the compliant scheduling, security protection, and efficient utilization of spectrum resources, and thus becomes an important component of the electromagnetic spectrum governance system [9,10]. In UAV air–ground signal processing and identification tasks under complex low-altitude propagation environments with multi-scale fading and LoS/NLoS switching, data quality and algorithm robustness directly affect the performance of AMR [11,12]. Therefore, improving modulation recognition accuracy has been a key focus of current research.

Conventional modulation recognition relies heavily on manual intervention [13]. In these manual-dependent methods, the process of identifying UAV signal modulation types centers entirely on operators, which severely restricts their effectiveness in complex communication scenarios. The accuracy and reliability of recognition results depend heavily on the operator's expertise and subjective judgment. In practice, operators differ in professional knowledge, signal feature interpretation capability, and accumulated experience, and these differences further amplify the instability of recognition outcomes. Meanwhile, with the rapid iteration of communication technologies and the growing number of modulation types, the traditional manual experience system can hardly achieve comprehensive coverage. More critically, when samples are scarce, manual methods lack sufficient data support to extract universal recognition rules and cannot cope with the variation of signal features across different scenarios, leading to a sharp decline in recognition accuracy. This problem is particularly prominent in environments with low signal-to-noise ratio (SNR) and multiple superimposed interferences [14]. The gap between the expanding technical scope and the limited experience system has directly led to a systematic decline in recognition accuracy.

Deep learning-based modulation recognition methods have been widely used in recent years. The authors of [15,16] applied generative adversarial networks (GANs) to data classification and recognition. In modulation identification, GANs can be used to generate diverse modulated signal data, addressing the scarcity of high-quality data while enhancing the generalization ability of the models [16]. The authors of [17] proposed a mixed-domain feature fusion network (MDFNet) for few-shot modulation identification. This network fuses the time-domain and frequency-domain features of signals, provides a degree of noise reduction during signal feature transmission, and enhances the ability of models to capture valid information. The authors of [18] designed an SEI method based on transfer learning. This method freezes some layers of the complex-valued neural network (CVNN) trained in the source domain, generalizes the learned transferable universal features to new modulation signal identification tasks, and uses the maximum mean discrepancy (MMD) metric as regularization for supervised learning to reduce the distribution difference between the source domain and the target domain in the latent space. The authors of [19] introduced a framework that combines multi-domain contrastive learning with reinforcement learning. The multi-domain representation of signals enhances the richness of features, while the integrated architecture of contrastive learning and reinforcement learning can extract deep features for classification. Notably, most deep learning methods focus on extracting the intrinsic features of the signals themselves, such as time-domain and frequency-domain features, while neglecting the systematic mining and modeling of environmental background features.
Environmental background features directly affect the propagation characteristics and final manifestations of signals [20]. Examples include the UAV air–ground channel LoS/NLoS state, multi-scale fading parameters (Rice factor and Nakagami parameter), path loss coefficient, multipath propagation delays, and Doppler shift intensities induced by UAV low-altitude flight. However, existing methods typically treat such environmental features as pure interference redundancy, rather than integrating them into model training as interpretable and valuable auxiliary discriminative information. This insufficient background analysis prevents models from distinguishing intrinsic signal features from environment-induced spurious features, leading to weak adaptability in dynamic and complex electromagnetic environments and increased dependence on large-scale labeled samples.

In the field of deep learning-based modulation recognition, researchers have proposed a variety of specific methods to address key issues such as feature extraction and accuracy improvement. The authors of [21] proposed a method for orthogonal frequency division multiplexing (OFDM) signal modulation pattern recognition that combines high-order cyclic cumulants with neural networks. They used neural networks to learn manually extracted feature representations, thereby improving recognition accuracy and robustness while reducing the complexity of feature engineering. The authors of [22] presented an AMR method based on the transformer-context broadcasting (TCB) model. The Transformer model was introduced into AMR to extract the global correlation features of signals; unified attention was manually inserted to increase signal density, enabling higher classification accuracy. The authors of [23] proposed a convolutional neural network (CNN) model that operates on fast Fourier transform window banks (FWBs) to extract useful symbol lengths in OFDM—these symbol lengths serve as indicators for identifying each OFDM-based wireless communication technology. After extracting the useful OFDM symbol lengths, a deep learning (DL)-based automatic modulation classification (AMC) system was proposed. This system combines FWB with in-phase and quadrature-phase signals to classify both OFDM symbol lengths and single-carrier modulation schemes simultaneously [24]. However, mainstream deep learning models have limited generalization ability in complex channel environments.

To address these issues, we present a new intelligent modulation recognition method based on small-sample augmentation and soft threshold denoising. As shown in Figure 1, the proposed STF-DRSN framework follows a four-stage pipeline: (1) UAV signal dataset collection, (2) model–data dual-driven dataset expansion to alleviate small-sample scarcity, (3) BL-LSTM-based environmental background learning to capture signal context features, and (4) STF-DRSN-based signal feature estimation and modulation recognition. The detailed design of each module is elaborated in the following sections. Our method differs from existing deep learning-based AMC methods in three core aspects: we use a model–data dual-driven strategy for small-sample processing, embed a BL-LSTM module to improve environmental adaptability, and adopt an STF-DRSN with adaptive soft thresholding to achieve stable, high-precision recognition at low SNR. The contributions of this paper are summarized as follows:

1. We propose a dual-driven dataset expansion method based on the UAV signal propagation model and received data samples. Unlike most existing methods, which rely solely on measured data augmentation or on pure model-based simulation, our approach combines the UAV signal propagation model with measured data augmentation. As a result, we can expand the number of training samples while maintaining a high level of feature consistency between the expanded dataset and the original dataset.
2. We construct a background learning-based long short-term memory (BL-LSTM) model for adaptive modulation recognition in complex electromagnetic scenarios. Unlike traditional LSTM-based recognition models, which only extract signal temporal features and neglect environmental background information, our model integrates environmental feature mining into the basic network. By integrating UAV air–ground channel environmental background information into the data training model and optimizing the authenticity of the data distribution, the proposed method adapts better to dynamic air–ground propagation scenarios.
3. We construct a new deep residual shrinkage network based on the soft threshold function (STF-DRSN). Our improvement is to integrate the soft threshold function into each residual block of the deep residual shrinkage network, enabling adaptive denoising and feature enhancement. This construction keeps the modulation recognition accuracy of our method from changing significantly across scenarios. The simulation results show that the average estimation accuracy of the proposed STF-DRSN is more than 95% under low-SNR conditions, outperforming existing residual network-based methods.

The remainder of this paper is organized as follows: Section 2 introduces the system models. Section 3 presents the model–data dual-driven dataset augmentation. Section 4 presents training on the augmented dataset based on environmental background information. Section 5 covers the estimation of UAV signal features, and Section 6 reports the simulation results. Finally, Section 7 concludes the paper.

2. System Model

2.1. UAV Air–Ground Signal Propagation Model

A typical low-altitude UAV air–ground communication scenario is illustrated in Figure 2, with the UAV as the transmitter (Tx) and the ground terminal as the receiver (Rx). This propagation model characterizes the physical distortion of the transmitted signal during wireless transmission, and the baseband equivalent expression of the final received signal is given by [17]

$$r(t) = x(t) \cdot h_{PL} \cdot h_{SF} \cdot h_{SS}(t) + n(t), \tag{1}$$

where $x(t)$ is the modulated complex baseband signal transmitted by the UAV, which carries the modulation information to be identified. The model quantifies the cumulative impairment of the wireless channel on the transmitted signal: $h_{PL}$ denotes the path loss gain, describing the deterministic signal attenuation as the Tx-Rx communication distance increases; $h_{SF}$ is the log-normal shadow fading gain, characterizing slow signal fluctuations caused by obstacles such as buildings and vegetation along the propagation path; $h_{SS}(t)$ represents the time-varying small-scale fading gain, which adopts the Rice fading model for line-of-sight (LoS) scenarios with a dominant direct path, and the Rayleigh or Nakagami fading model for non-line-of-sight (NLoS) scenarios with only scattered multipath components, with the Doppler shift caused by UAV mobility integrated into this term. Finally, $n(t)$ is the zero-mean additive white Gaussian noise (AWGN) introduced by the receiver and the surrounding electromagnetic environment.
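The composition in Eq. (1) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `received_signal`, the QPSK test symbols, the gain values, and the convention of splitting the complex noise power equally between real and imaginary parts are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def received_signal(x, h_pl, h_sf, h_ss, noise_power, rng):
    """Apply Eq. (1): r = x * h_PL * h_SF * h_SS(t) + n(t)."""
    x = np.asarray(x, dtype=complex)
    h_ss = np.asarray(h_ss, dtype=complex)
    # Complex AWGN: noise_power is split equally between real and imaginary parts
    # (an assumed convention).
    n = np.sqrt(noise_power / 2) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x * h_pl * h_sf * h_ss + n

rng = np.random.default_rng(0)
# Hypothetical QPSK symbols standing in for the transmitted signal x(t).
symbols = rng.integers(0, 4, size=1024)
x = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))
h_ss = np.ones_like(x)  # placeholder small-scale gain (no fading)
r = received_signal(x, h_pl=1e-3, h_sf=0.8, h_ss=h_ss, noise_power=0.0, rng=rng)
```

With `noise_power=0.0` the output is simply the transmitted symbols scaled by the product of the large-scale gains, which makes the multiplicative structure of the model easy to inspect.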

2.2. UAV Air–Ground Channel Fading and Noise Model

To accurately simulate the UAV low-altitude propagation environment and support subsequent dataset expansion, we adopt widely recognized channel and noise models matching UAV communication scenarios. The log-normal distribution is used for shadow fading, while Rice, Rayleigh and Nakagami distributions are adopted for small-scale fading under different channel states, with time-varying Doppler shift generated based on the harmonic superposition principle. AWGN conforming to theoretical statistical characteristics is generated via the Box–Muller transform method to simulate received signals under different signal-to-noise ratio (SNR) conditions. Based on the above models, we construct the augmented training data as

$$r_{aug}(t) = r(t) \cdot h_{AG}(t, \xi) + n(t), \tag{2}$$

where $r_{aug}(t)$ is the modulated UAV air–ground signal with Gaussian white noise added in a low-SNR environment with severe air–ground channel fading, $r(t)$ is the original received air–ground signal, $h_{AG}(t, \xi)$ is the selected UAV air–ground channel multi-scale fading model with $\xi = 1$ for the LoS state and $\xi = 0$ for the NLoS state, and $n(t)$ is the generated Gaussian white noise. The quantities $r_{aug}(t)$, $h_{AG}(t, \xi)$, $\xi$ (LoS/NLoS state) and $n(t)$ are taken as one set of augmented training data, together with the key air–ground channel parameters (Rice factor $K$, Nakagami parameter $m$, and path loss coefficient).
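A minimal sketch of Eq. (2) with Box–Muller noise generation follows. The helper names `box_muller` and `augment` are hypothetical, and the SNR is assumed here to be defined relative to the power of the faded signal; the paper does not state its SNR convention in this section.

```python
import numpy as np

def box_muller(n, rng):
    """Return n standard normal samples via the Box-Muller transform."""
    u1 = 1.0 - rng.random(n)  # uniform on (0, 1], avoids log(0)
    u2 = rng.random(n)
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

def augment(r, h_ag, snr_db, rng):
    """Eq. (2): r_aug = r * h_AG + n, with n scaled to a target SNR in dB."""
    faded = np.asarray(r, dtype=complex) * h_ag
    sig_power = np.mean(np.abs(faded) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)  # assumed SNR definition
    n = np.sqrt(noise_power / 2) * (
        box_muller(faded.size, rng) + 1j * box_muller(faded.size, rng)
    )
    return faded + n, n

rng = np.random.default_rng(0)
r = np.exp(1j * 2 * np.pi * 0.05 * np.arange(2048))  # hypothetical received tone
r_aug, n = augment(r, h_ag=0.7, snr_db=-5.0, rng=rng)
```

Returning the noise alongside the augmented signal mirrors the text, where both are stored in the same augmented training record.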

2.3. Problem Modeling

Considering the small-sample scenario for machine learning-based UAV air–ground signal feature estimation with dynamic LoS/NLoS switching and multi-scale fading in air–ground channels, we propose a modulation recognition method integrating small-sample augmentation and soft-threshold denoising. To alleviate the sample shortage, the dataset is augmented in a model–data dual-driven manner based on the UAV air–ground channel propagation model, which lays a solid data foundation for subsequent high-precision recognition in dynamic air–ground propagation scenarios. During model training, the augmented dataset is combined with environmental background information; this further improves adaptation to complex interference environments and avoids performance deviations caused by a mismatch between simulated UAV air–ground channel data and real low-altitude propagation data. For signal feature estimation, deep learning techniques are applied to automatically extract intrinsic signal features, ensuring the accuracy of subsequent modulation type identification. Automatic modulation recognition takes the received signal as input and outputs the identified modulation type. Based on this working principle, the modulation recognition problem in interference environments can be expressed as

$$\min_{M} \Delta(M(\tilde{Y}), m), \tag{3}$$

where $\tilde{Y}$ is the received interfered signal, and $m$ is the modulation type adopted for it. We optimize the modulation recognition model $M(\cdot)$ such that the difference between the recognized modulation type $M(\tilde{Y})$ of the received signal and the actual modulation type $m$ is minimized under a given criterion $\Delta(\cdot)$.
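As one concrete reading of Eq. (3), and purely as an illustrative assumption since the criterion is left unspecified here, the criterion can be taken as the 0-1 loss averaged over a hypothetical labeled set of $S$ received signals $\tilde{Y}_k$ with true modulation types $m_k$. The objective then becomes the empirical misclassification rate:

```latex
\hat{M} = \arg\min_{M} \frac{1}{S} \sum_{k=1}^{S} \mathbb{1}\left[ M(\tilde{Y}_k) \neq m_k \right]
```

In practice a differentiable surrogate such as the cross-entropy loss is usually minimized instead, because the 0-1 loss provides no useful gradient for training.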

2.4. Technical Implementation

We first expand the dataset based on the model and data. By combining ray tracing (RT)-based simulation of UAV air–ground channel physical models with measured air–ground communication data, a high-coverage hybrid dataset is constructed to alleviate the problem of insufficient samples. The UAV air–ground channel physical models can simulate signal characteristics under different low-altitude propagation conditions, while the measured data augmentation improves data diversity through methods such as noise injection and time–frequency transformation, ensuring that the training set covers a wider range of signal scenarios.

Then, we train on the dataset using environmental background information. The prior knowledge of UAV air–ground channels, including environmental noise, multipath effects, LoS/NLoS state, multi-scale fading parameters, and Doppler frequency shift induced by UAV low-altitude flight, is introduced to optimize the authenticity of the data distribution. This step not only improves the physical rationality of the dataset but also gives the models stronger adaptability in complex electromagnetic environments, avoiding performance degradation caused by excessive differences between simulated UAV air–ground channel data and the real low-altitude propagation environment.

We finally estimate signal features based on deep learning. A hierarchical neural network is used to automatically extract the essential features of signals, achieving high-precision parameter estimation and classification. Compared with traditional methods, deep learning models can adaptively learn the nonlinear features of signals and combine environmental information for more robust reasoning.

The proposed framework unifies data augmentation, environmental background learning, and soft-threshold denoising into a collaborative pipeline, as none of these modules can individually address the multiple challenges in UAV signal modulation recognition. Using data augmentation alone can expand the sample scale, but without effective denoising, it will also amplify noise. Relying solely on background learning makes it difficult to extract stable environmental features under small-sample conditions and leaves the model vulnerable to noise interference. Adopting soft-threshold denoising alone lacks reliable environmental priors, which may either eliminate useful signal components or fail to suppress noise sufficiently.

The model–data dual-driven augmentation provides high-quality and diverse samples for BL-LSTM to learn accurate mappings among noise, channel, and modulation features. The environmental background information extracted by BL-LSTM further serves as critical prior knowledge for the adaptive soft-threshold mechanism in STF-DRSN, enabling dynamic and reasonable adjustment of denoising intensity. Meanwhile, the denoised output from STF-DRSN provides high-quality input for background learning to ensure stability. Through this synergistic cooperation, data augmentation compensates for sample insufficiency, background learning enhances environmental adaptability, and soft-threshold denoising strengthens anti-noise robustness.

The three proposed parts are interrelated and form a closed-loop optimization framework: data augmentation provides a foundation for training; the injection of environmental information improves the authenticity and availability of data; and the feedback from feature estimation can further guide the iterative optimization of data generation and training strategies. This technical chain not only addresses signal processing under small-sample conditions but also provides a full-link solution for intelligent electromagnetic signal analysis, from data generation to model training, which has both theoretical significance and practical application value. The connections between the technologies are shown in Figure 3.

3. Model–Data Dual-Driven Dataset Expansion

3.1. Composite Fading Channel Model

Radio waves undergo reflection, refraction and diffraction by obstacles during propagation in the UAV low-altitude air–ground propagation environment. The received air–ground signal is typically a superposition of multi-cluster path signals, with each cluster composed of indistinguishable scattering branches. Measured data demonstrate that the shadow fading power, a form of large-scale fading in air–ground channels, follows a log-normal distribution [25], which is expressed as

$$f_{\beta_i}(\beta) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,\beta}\,\beta} \exp\left(-\frac{(\ln\beta - \mu_{i,\beta})^2}{2\sigma_{i,\beta}^2}\right), \tag{4}$$

where $\beta$ denotes the shadow fading gain, and $\sigma_{i,\beta}$ and $\mu_{i,\beta}$ correspond to the standard deviation and regional mean of shadow fading, respectively.

We adopt the Rayleigh and Rician fading models. The Rayleigh model is expressed as

$$p_{rayl}(r) = \frac{r}{\sigma_0^2} \exp\left(-\frac{r^2}{2\sigma_0^2}\right), \quad r \geq 0, \tag{5}$$

where $p(r)$ denotes the probability density function (PDF). The Rician model is expressed as

$$p_{rici}(r) = \frac{r}{\sigma_0^2} \exp\left(-\frac{r^2 + L^2}{2\sigma_0^2}\right) I_0\left(\frac{rL}{\sigma_0^2}\right), \quad r \geq 0,\ L \geq 0, \tag{6}$$

where $L$ is the amplitude of the line-of-sight propagation component, and $I_0(\cdot)$ is the zero-order modified Bessel function of the first kind.
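Envelopes following Eqs. (5) and (6) can be drawn by taking the magnitude of a complex Gaussian with an added constant LoS term. The sketch below assumes that construction; the function names are hypothetical, and the Rice factor is computed with the standard definition $K = L^2 / (2\sigma_0^2)$, which this section does not spell out.

```python
import numpy as np

def rician_envelope(los_amp, sigma0, size, rng):
    """Envelope |L + sigma0*(g1 + j*g2)|; los_amp = 0 yields a Rayleigh envelope."""
    g = rng.standard_normal(size) + 1j * rng.standard_normal(size)
    return np.abs(los_amp + sigma0 * g)

def rice_factor(los_amp, sigma0):
    """K = L^2 / (2*sigma0^2): LoS power over scattered power (standard definition)."""
    return los_amp ** 2 / (2 * sigma0 ** 2)

rng = np.random.default_rng(0)
r_los = rician_envelope(los_amp=2.0, sigma0=1.0, size=200000, rng=rng)   # LoS state
r_nlos = rician_envelope(los_amp=0.0, sigma0=1.0, size=200000, rng=rng)  # NLoS state
print(rice_factor(2.0, 1.0), np.mean(r_los ** 2), np.mean(r_nlos))
```

The mean power of the Rician envelope is $L^2 + 2\sigma_0^2$, and the Rayleigh mean is $\sigma_0\sqrt{\pi/2}$, so the printed sample statistics should land close to those values.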

To adapt to the complex small-scale fading characteristics of UAV air–ground channels in non-ideal NLoS scenarios, we also adopt the Nakagami fading model, which is expressed as

$$p_{naka}(r) = \frac{2 m^m}{\Gamma(m)\,\Omega^m} r^{2m-1} \exp\left(-\frac{m}{\Omega} r^2\right), \quad r \geq 0,\ m \geq 0.5, \tag{7}$$

where $m$ is the Nakagami fading parameter for characterizing the severity of air–ground channel fading, $\Omega = E[r^2]$ is the average power of the signal envelope, and $\Gamma(\cdot)$ is the Gamma function. When $m = 1$, the Nakagami fading model reduces to the Rayleigh fading model, which is suitable for the UAV air–ground channel with severe NLoS fading.
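The reduction to Rayleigh fading at $m = 1$ can be checked numerically by comparing Eq. (7) with Eq. (5) under $\sigma_0^2 = \Omega / 2$. The sketch below also draws Nakagami envelopes through the standard fact that $r^2$ is Gamma distributed with shape $m$ and scale $\Omega / m$; the function names are hypothetical.

```python
import math
import numpy as np

def rayleigh_pdf(r, sigma0):
    """Eq. (5)."""
    r = np.asarray(r, dtype=float)
    return r / sigma0 ** 2 * np.exp(-r ** 2 / (2 * sigma0 ** 2))

def nakagami_pdf(r, m, omega):
    """Eq. (7)."""
    r = np.asarray(r, dtype=float)
    coeff = 2 * m ** m / (math.gamma(m) * omega ** m)
    return coeff * r ** (2 * m - 1) * np.exp(-m / omega * r ** 2)

def nakagami_samples(m, omega, size, rng):
    """Draw envelopes using r^2 ~ Gamma(shape=m, scale=omega/m)."""
    return np.sqrt(rng.gamma(shape=m, scale=omega / m, size=size))

grid = np.linspace(0.0, 3.0, 50)
# With m = 1 and Omega = 2, Eq. (7) matches Eq. (5) for sigma0 = 1.
max_gap = np.max(np.abs(nakagami_pdf(grid, 1.0, 2.0) - rayleigh_pdf(grid, 1.0)))
samples = nakagami_samples(2.5, 1.5, 200000, np.random.default_rng(0))
```

Larger values of `m` concentrate the envelope around its mean, which corresponds to milder fading.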

It is worth emphasizing that the received signals of different propagation paths experience different environments, resulting in the composite fading of each cluster being uncorrelated with one another. The impulse response corresponding to the time-varying multipath equivalent baseband channel [26] is expressed as

$$\bar{h}(t, \tau) = \sum_{l=1}^{L} \alpha_l \beta_l \delta(\tau - \tau_l), \tag{8}$$

where $L$ here is the number of paths (distinct from the LoS amplitude in Eq. (6)), $\tau_l$ is the delay of the $l$-th path, $\alpha_l$ denotes the UAV air–ground channel path loss for the $l$-th path, and $\beta_l$ represents the air–ground channel multi-scale fading (shadow fading + small-scale fading) for the $l$-th path. For the LoS path, $\beta_l$ is dominated by Rice fading; for the NLoS path, $\beta_l$ is dominated by Rayleigh or Nakagami fading.
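In discrete time, the impulse response of Eq. (8) acts as a tapped delay line. The sketch below assumes that path delays are rounded to integer sample counts and that each $\beta_l$ is constant over the processed block; both are simplifications for illustration, and `apply_multipath` is a hypothetical helper name.

```python
import numpy as np

def apply_multipath(x, alphas, betas, delays):
    """y[k] = sum_l alpha_l * beta_l * x[k - d_l], with integer sample delays d_l."""
    x = np.asarray(x, dtype=complex)
    y = np.zeros(len(x) + max(delays), dtype=complex)
    for a, b, d in zip(alphas, betas, delays):
        y[d:d + len(x)] += a * b * x  # delayed, scaled copy of the input
    return y

# Two-path example: a direct path plus a weaker path delayed by two samples.
y = apply_multipath([1, 2, 3], alphas=[1.0, 0.5], betas=[1.0, 1j], delays=[0, 2])
print(y)
```

For time-varying fading, `betas` would become per-sample arrays such as those produced by a Doppler-shaped fading generator.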

In dynamic UAV air–ground communication scenarios, the UAV transmitter maintains low-altitude flight while the ground receiver is fixed or moving at low speed. Because the low-altitude propagation environment varies, for example through terrain and building occlusion, channel parameters including time delay, maximum Doppler shift, path loss and LoS/NLoS state exhibit random characteristics, and they vary randomly from moment to moment when the UAV is in high-speed low-altitude flight. Meanwhile, continuous changes in the scenario geometry, such as UAV flight height and propagation distance, result in regular and continuous variations of time delay, maximum Doppler shift, path loss and fading parameters.

3.2. A2G Channel Simulation Generation

The simulation process starts with the composite fading model and integrates Gaussian white noise generation and Doppler shift fusion to generate multi-scenario channel data, which lays a foundation for the subsequent dataset fusion with measured data.

As shown in Figure 4, the left modules represent the core steps of simulation, the right panels detail the key methods of each step, and the dark blue highlighted content marks the formula-related technical points. The bottom logic bar summarizes the overall workflow, reflecting the transition from channel physical model to simulated-measured data fusion.

The simulation of wireless channel propagation characteristics is an integrated application of high-speed signal processing technology and computer application technology. The core component for implementing UAV air–ground channel propagation characteristic simulation technology is a digital multipath fading characteristic simulation module centered on a discrete tapped delay line, combined with ray tracing technology for three-dimensional low-altitude scene reconstruction.

We generate Gaussian random variables based on the principle of harmonic superposition to simulate the time-varying small-scale fading with Doppler shift characteristics, which can be expressed as

$$u_i(t) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} \cos\left(2\pi f_{i,d}\, t \cos\alpha_{i,n} + \varphi_{i,n}\right), \tag{9}$$

where $N$ denotes the number of unresolvable scattering branches, and $f_{i,d}$, $\alpha_{i,n}$, and $\varphi_{i,n}$ represent the maximum Doppler frequency, the incident angle of each scattering branch, and the initial phase, respectively.
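Eq. (9) translates directly into a vectorized sum of sinusoids. In the sketch below, the incident angles and initial phases are assumed to be independent and uniform on $[0, 2\pi)$, which is a common choice but is not stated in the text; the function name and the numeric settings are likewise hypothetical.

```python
import numpy as np

def harmonic_superposition(t, f_d, n_branches, rng):
    """Eq. (9): u(t) = sqrt(2/N) * sum_n cos(2*pi*f_d*t*cos(alpha_n) + phi_n)."""
    alpha = rng.uniform(0.0, 2.0 * np.pi, n_branches)  # assumed uniform incident angles
    phi = rng.uniform(0.0, 2.0 * np.pi, n_branches)    # assumed uniform initial phases
    phase = 2.0 * np.pi * f_d * np.outer(t, np.cos(alpha)) + phi
    return np.sqrt(2.0 / n_branches) * np.cos(phase).sum(axis=1)

t = np.arange(5000) / 500.0  # 10 s sampled at 500 Hz
u = harmonic_superposition(t, f_d=100.0, n_branches=64, rng=np.random.default_rng(0))
print(u.shape, np.mean(u ** 2))
```

The $\sqrt{2/N}$ factor normalizes the process to roughly unit power, so the time-averaged power printed above should be close to 1 for a reasonably large number of branches.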

The log-normal (shadow fading), Rayleigh, and Rice random processes of the A2G channel can all be generated from Gaussian random processes. Among these, a Rayleigh random variable (NLoS small-scale fading) can be expressed as

$$\beta_{rayl}(t) = u_1(t) + j u_2(t). \tag{10}$$

A log-normal random variable can be expressed as

$$\beta_{\log}(t) = e^{\sigma_\beta u_0(t) + m_\beta}, \tag{11}$$

where $u_i(t) \sim N(0, 1)$.
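Eqs. (10) and (11) can be illustrated by feeding unit-variance Gaussian sequences into two small helpers. For brevity, the sketch uses independent standard normal samples as stand-ins for $u_0$, $u_1$ and $u_2$ rather than Doppler-shaped processes, and the parameter values are hypothetical.

```python
import numpy as np

def rayleigh_process(u1, u2):
    """Eq. (10): complex gain whose envelope is Rayleigh distributed."""
    return u1 + 1j * u2

def lognormal_process(u0, sigma_beta, m_beta):
    """Eq. (11): shadow fading gain exp(sigma_beta * u0 + m_beta)."""
    return np.exp(sigma_beta * u0 + m_beta)

rng = np.random.default_rng(1)
u0, u1, u2 = rng.standard_normal((3, 100000))  # stand-ins for u_i(t) ~ N(0, 1)
b_log = lognormal_process(u0, sigma_beta=0.5, m_beta=-1.0)
b_ray = rayleigh_process(u1, u2)
print(np.log(b_log).mean(), np.log(b_log).std(), np.mean(np.abs(b_ray) ** 2))
```

Taking the logarithm of the shadow fading gain recovers a Gaussian with mean $m_\beta$ and standard deviation $\sigma_\beta$, and the Rayleigh gain has mean power 2 because each quadrature component has unit variance.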

UAV air–ground channel propagation characteristic simulation technology enables the simulation of air–ground channels under various low-altitude propagation scenarios, including static constant parameter channels, slow time-varying flat channels, Rice fading channels, Rayleigh flat channels, Nakagami fading channels and bidirectional Doppler channels matching UAV flight speed. These air–ground channels are applicable to different UAV low-altitude flight scenarios with varying flight heights and LoS/NLoS ratios, and they reproduce the multi-scale fading characteristics of each propagation scenario. The simulated channel data is combined with measured air–ground communication data to form the model–data dual-driven expanded dataset, which makes up for the shortage of original measured samples while retaining core feature consistency with the original samples. The expanded dataset covers rich channel state characteristics and provides sufficient, high-quality data support for subsequent UAV signal modulation recognition model training.

4. Dataset Training Based on Environmental Background Information

To address the key technical challenges commonly encountered in signal propagation environments, such as low accuracy in signal modulation recognition and poor adaptability to complex environments, we propose a method based on deep learning network training. Specifically, we construct an expanded dataset incorporating rich environmental background information and use signal data with scenario-specific features as network input for systematic training. This method is designed to enhance the model's adaptability to complex propagation environments while improving the accuracy of modulation recognition.

To further enhance the model's ability to perceive and analyze signal propagation scenarios, we build on the expanded signal dataset and construct the background learning-based long short-term memory (BL-LSTM) model for background information extraction and SNR estimation. This model mines and extracts the environmental background features embedded in the signal, and achieves real-time and reliable estimation of the SNR under different propagation scenarios. Ultimately, the model provides critical environmental parameter support for the subsequent optimization of signal modulation recognition accuracy.

4.1. BL-LSTM Model

We construct the BL-LSTM network model oriented to UAV air–ground channel environmental background information learning, with the input layer incorporating air–ground channel scene features (UAV flight height, propagation distance) and channel state features (LoS/NLoS state, fading parameter labels). The structure of the model is shown in Figure 5, which mainly consists of the following parts [27]:

Input Gate: This is governed by a sigmoid activation function to control new information integration into the memory cell. For dynamic environmental contexts, it prioritizes weight allocation to environment-relevant information and reduces weights for irrelevant data to avoid decision interference.
Forget Gate: This is a sigmoid-based gate that regulates the discarding of old memory cell information via 0–1 forgetting weights. It adaptively raises weights for environment-incompatible old information and lowers weights for environment-robust core information.
Cell State: This is the core of BL-LSTM for storing long-term dependent information, updated by the Input and Forget Gates. It dynamically optimizes storage strategies for continuous environmental fluctuations, preserving only current environment-compatible long-term information and short-term fluctuation-robust key data.
Output Gate: This uses a sigmoid function to weight information flow from the memory cell to the hidden state. It adjusts output weights based on environmental noise reduction demands, increasing weights for relevant memory information to prioritize its role in hidden state calculation and boost recognition accuracy.

The working process of the BL-LSTM network is as follows:

1. The forget gate calculates the degree of forgetting for old information;
2. The input gate computes the weight for incorporating newly input information;
3. The cell state is updated based on the weights of the input gate and the forget gate;
4. The output gate calculates the weight for information flow from the cell state to the hidden state, and updates the hidden state.

Through the design of such a gating mechanism and memory cells, the BL-LSTM network can effectively capture the relevant dependencies of events in long sequences and achieve excellent performance in sequence modeling tasks with dataset expansion.
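The four steps above can be sketched as a single time step of a standard LSTM cell. This is a minimal NumPy illustration, not the BL-LSTM implementation: the weight layout (`W`, `U`, `b`), the gate ordering, and the layer sizes are assumptions introduced here, and the environment-aware weighting described above is not modeled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.

    Assumed (hypothetical) layout: W is (4H, D), U is (4H, H), b is (4H,),
    stacked in the order forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # step 1: forget gate
    i = sigmoid(z[H:2 * H])        # step 2: input gate
    g = np.tanh(z[2 * H:3 * H])    # candidate content for the cell state
    o = sigmoid(z[3 * H:4 * H])    # step 4: output gate
    c_t = f * c_prev + i * g       # step 3: cell state update
    h_t = o * np.tanh(c_t)         # step 4: hidden state update
    return h_t, c_t

# Hypothetical sizes: 4 input features per time step, 8 hidden units.
rng = np.random.default_rng(0)
D, H = 4, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(16, D)):   # a 16-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the forget and input gates multiply the old and new content element-wise, each hidden unit can retain or discard information independently of the others.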

4.2. Background Information Extraction

The constructed BL-LSTM network is used to perform background information extraction and SNR estimation on the expanded dataset. The modulated signal with Gaussian noise is expressed as

r_{n} (t) \in r_{aug} (t)

and the constructed BL-LSTM is used to extract background information, such as noise, from the expanded dataset, which can be expressed as

{\tilde{r}}_{n} (t) = B [r_{n} (t)] = B [r (t) \cdot h_{A G} (t, ξ) + n (t)] = α_{n} h_{A G} (t, ξ) + β_{n} n (t),

(12)

where

{\tilde{r}}_{n} (t)

represents the UAV air–ground channel environmental background information extracted from the n-th modulated air–ground signal,

B [\cdot]

denotes the extraction function of the BL-LSTM network model, and the extracted background information has the same dimensions as the waveform of the modulated air–ground signal,

α_{n}

is the parameter of the UAV air–ground channel multi-scale fading model for the n-th modulated signal, expressed in the same unit as the signal,

β_{n}

is the unitless Gaussian white noise parameter of the n-th signal, and

ξ

is the LoS/NLoS state factor of the air–ground channel. The values of

α_{n}

and

β_{n}

are related to the UAV air–ground channel low-altitude propagation environment characteristics and the attributes of the modulated air–ground signal itself.

Since Gaussian white noise accounts for a much larger share of the extracted background information than the time-varying channel fading component, the power

P_{b g}

of the extracted background information is approximated as the noise power

P_{noise}

. Then, the noise power of the n-th signal within time

T_{0}

is expressed as

P_{noise} \approx P_{b g} = \frac{1}{T_{0}} \int_{0}^{T_{0}} {[{\tilde{r}}_{n} (t)]}^{2} d t .

(13)

The signal power

P_{signal}

of the n-th signal within time

T_{0}

can be expressed as

P_{signal} = \frac{1}{T_{0}} \int_{0}^{T_{0}} \{{[r_{n} (t)]}^{2} - {[{\tilde{r}}_{n} (t)]}^{2}\} d t .

(14)

By combining the signal power and noise power, the power-based SNR is estimated, which is expressed as

SNR = \frac{P_{signal}}{P_{noise}} .

(15)

Based on the above, we perform SNR and UAV air–ground channel fading parameter (Rice factor K/Nakagami parameter m) estimation on the air–ground signal, and input the extracted UAV air–ground channel environmental background information (LoS/NLoS state, fading parameters) and the estimated SNR into the subsequent modulation scheme estimation network to improve the recognition accuracy in dynamic low-altitude propagation scenarios.
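Equations (13)–(15) can be illustrated in discrete time by replacing the integrals over the observation window with sample means. The sketch below is only an illustration under stated assumptions: the function name `estimate_snr` is introduced here, the test tone and noise level are hypothetical, and the background is taken to be the true noise sequence, which stands in for an ideal BL-LSTM extraction.

```python
import numpy as np

def estimate_snr(r, background):
    """Power-based SNR from a received signal and its extracted background.

    Discrete-time counterparts of Eqs. (13)-(15): the time averages become
    sample means. np.abs is used so that complex IQ samples also work.
    """
    p_noise = np.mean(np.abs(background) ** 2)                       # Eq. (13)
    p_signal = np.mean(np.abs(r) ** 2 - np.abs(background) ** 2)     # Eq. (14)
    snr = p_signal / p_noise                                         # Eq. (15)
    return snr, 10.0 * np.log10(snr)

# Hypothetical example: a unit-amplitude tone (power 0.5) in noise of power 0.05,
# i.e. a true SNR of 10 dB.
rng = np.random.default_rng(0)
n = np.arange(8192)
clean = np.cos(2.0 * np.pi * 0.05 * n)
noise = rng.normal(scale=np.sqrt(0.05), size=n.size)
received = clean + noise

# Idealized assumption: the extracted background equals the noise sequence.
snr_linear, snr_db = estimate_snr(received, noise)
```

With an imperfect background estimate, any residual signal energy left in the background inflates the noise power and biases the SNR estimate downward.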

5. Signal Feature Estimation Based on Deep Learning

5.1. Channel Threshold Function

Noise reduction methods typically include filtering, statistical signal processing, compressed sensing, and deep learning models. Among them, the noise reduction methods based on residual shrinkage networks (RSNs) have a strong feature extraction capability and satisfactory noise reduction performance. Meanwhile, soft thresholding is an effective approach to mitigate the impact of complex interference [28]. Therefore, we construct a channel threshold function in the deep residual shrinkage network. The generation process of the channel threshold function is illustrated in Figure 6.

We input the modulated UAV air–ground signal r, and extract the four-channel features of the signal through two convolutional neural networks, including the original three time–frequency features and an additional UAV air–ground channel fading feature channel (LoS/NLoS state and fading parameter characteristics), which can be expressed as

u_{h, w, c} = Cov [Cov (r)],

(16)

where h, w and c are the indices of the height, width, and channel of the feature

u_{h, w, c}

respectively, and

Cov (\cdot)

denotes the convolution function. We then concatenate the three-channel features with the background information and perform global pooling to generate a global mean, which can be expressed as

A = G A P |c o n c a t (u_{h, w, c}, {\tilde{r}}_{n}, ξ)|,

(17)

where

c o n c a t

(\cdot)

denotes the concatenation operation,

G A P

| \cdot |

represents the global pooling operation, and

ξ

is the LoS/NLoS state factor of the UAV air–ground channel, incorporated to enhance the model’s adaptability to air–ground channel state changes. The global mean A is input into a two-layer fully connected network, where one layer uses linear activation and the other uses a sigmoid activation function; thus, a scaling factor in the interval (0, 1) corresponding to each global mean is obtained, which can be expressed as

α_{c} = \frac{1}{1 + exp (- z_{c})},

(18)

where

z_{c}

is the feature of the c-th neuron, and

α_{c}

is the c-th scaling parameter. The scaling factor is expanded by two dimensions and then multiplied by the global mean; the final channel threshold function can be expressed as

τ_{c} = α_{c} \times A,

(19)

where

τ_{c}

is the threshold of the c-th channel of the feature r.
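The threshold generation in Equations (17)–(19) can be sketched as follows. This is a simplified illustration under several assumptions: the concatenation with the background information and ξ in Equation (17) is omitted because the source does not specify how the shapes are aligned, the fully connected weights are random placeholders, and `soft_threshold` is the standard soft thresholding function, which is not written out in this section.

```python
import numpy as np

def channel_thresholds(u, W1, b1, W2, b2):
    """Per-channel thresholds from a feature map u of shape (H, W, C)."""
    A = np.mean(np.abs(u), axis=(0, 1))        # global mean of |u| per channel
    z = W2 @ (W1 @ A + b1) + b2                # linear layer, then second layer
    alpha = 1.0 / (1.0 + np.exp(-z))           # Eq. (18): sigmoid scaling in (0, 1)
    return alpha * A, A                        # Eq. (19): tau_c = alpha_c * A

def soft_threshold(u, tau):
    """Standard soft thresholding: shrink magnitudes by tau, zero the rest."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

# Hypothetical feature map: 16 x 16 spatial positions, 4 channels.
rng = np.random.default_rng(0)
C = 4
u = rng.normal(size=(16, 16, C))
W1, b1 = rng.normal(scale=0.5, size=(C, C)), np.zeros(C)
W2, b2 = rng.normal(scale=0.5, size=(C, C)), np.zeros(C)

tau_c, A = channel_thresholds(u, W1, b1, W2, b2)
u_denoised = soft_threshold(u, tau_c)          # tau_c broadcasts over H and W
```

Since the scaling factor lies strictly between 0 and 1, each threshold stays positive and below the mean absolute value of its channel, which prevents the shrinkage from zeroing an entire channel.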

5.2. Attention Mechanism

The attention mechanism focuses on key information via weights. It has three components, i.e., Query (target), Key (identifier), and Value (content), and requires the Query and Key to have the same dimension. The calculation layer usually computes similarity with a scaled dot-product and uses the result to weight the Value. Multi-head attention splits the input into sub-vectors for parallel computation and then fuses the results. The output layer fuses the attention output with other features to adapt to downstream tasks such as classification and generation. The mechanism directly models dependencies between any input positions, serving as a core structure in the NLP and CV fields [29]. The attention mechanism is applied in the network, and its model structure is shown in Figure 7.

To obtain detailed features and effectively suppress irrelevant features caused by air–ground channel fading, for the four-channel features extracted from UAV air–ground signals under different SNRs and different LoS/NLoS states/fading parameters, we perform global pooling on UAV air–ground signals with adaptive weight allocation based on the severity of air–ground channel fading, pass through the first fully connected layer, implement batch normalization, and output the intermediate feature

Z_{1}

via the activation function, which can be expressed as

Q = GAP |u_{h, w, c}| .

(20)

Z_{1} = ReLU (BN (FC (Q))),

(21)

where

FC (\cdot)

denotes the operation of the fully connected layer;

BN (\cdot)

represents the operation of batch normalization; and

ReLU (\cdot)

indicates the operation of the ReLU activation function.

The intermediate feature

Z_{1}

undergoes the second fully connected layer and activation function again, and outputs the weight coefficient

α_{i}

, which can be expressed as

α_{i} = ReLU ({FC}_{2} (Z_{1})) .

(22)

The weight coefficient

α_{i}

is multiplied by the compressed output result of the global pooling layer to obtain the threshold corresponding to each feature

τ_{i}

, which can be expressed as

τ_{i} = α_{i} \times Q .

(23)
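A compact sketch of Equations (20)–(23) is given below. It is an illustration only: the parameter dictionary `p`, the layer widths, and the use of stored mean and variance for batch normalization (inference mode) are assumptions introduced here, and the fading-dependent weight allocation described above is not modeled.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm_inference(x, mean, var, gamma, beta, eps=1e-5):
    """Batch normalization with stored statistics (inference mode)."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def attention_thresholds(u, p):
    """Thresholds tau_i from a feature map u of shape (H, W, C)."""
    q = np.mean(np.abs(u), axis=(0, 1))                    # Eq. (20)
    fc1 = p["W1"] @ q + p["b1"]
    z1 = relu(batch_norm_inference(fc1, p["mean"], p["var"],
                                   p["gamma"], p["beta"]))  # Eq. (21)
    alpha = relu(p["W2"] @ z1 + p["b2"])                   # Eq. (22)
    return alpha * q                                        # Eq. (23)

# Hypothetical sizes and placeholder parameters.
rng = np.random.default_rng(0)
C, hidden = 4, 8
params = {
    "W1": rng.normal(scale=0.5, size=(hidden, C)), "b1": np.zeros(hidden),
    "mean": np.zeros(hidden), "var": np.ones(hidden),
    "gamma": np.ones(hidden), "beta": np.zeros(hidden),
    "W2": rng.normal(scale=0.5, size=(C, hidden)), "b2": np.zeros(C),
}
u = rng.normal(size=(16, 16, C))
tau_i = attention_thresholds(u, params)
```

Unlike the sigmoid in Equation (18), the ReLU in Equation (22) does not bound the weight coefficient above, so the resulting thresholds are non-negative but may exceed the pooled value Q.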

5.3. Update Feature Values

We construct a signal modulation estimation network structure that integrates the gradient optimization capability of deep residual networks (ResNets) [30] with the noise suppression characteristics of channel thresholds [31]. The network structure is shown in Figure 8.

We introduce a residual shrinkage module unit into the modulation recognition network to reduce the interference of noise on training and improve the model’s feature learning ability under low-SNR conditions. The deep learning attention mechanism is combined with the channel threshold function to form a dynamic channel threshold update structure. The threshold corresponding to each feature generated by the attention mechanism (Equation (23)) is added to the generated channel threshold

τ_{c}

, and then element-wise multiplied with the extracted three-channel features

u_{h, w, c}

to obtain the updated feature

U_{h, w, c}

, which can be expressed as

U_{h, w, c} = (τ_{i} + τ_{c} + γ \cdot θ) ⊙ u_{h, w, c},

(24)

where ⊙ denotes element-wise multiplication,

θ

is the UAV air–ground channel fading parameter (Rice factor K/Nakagami parameter m), and

γ

is the modulation coefficient for adapting the feature update to the severity of air–ground channel fading. A UAV air–ground communication signal dataset with different LoS/NLoS ratios and fading parameters is selected for training, and the trained network is used to perform modulation scheme estimation of UAV air–ground communication signals on the updated feature

U_{h, w, c}

in dynamic low-altitude propagation scenarios.
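The feature update in Equation (24) reduces to a per-channel scaling of the feature map. The short sketch below uses hypothetical values for the thresholds, γ, and θ, and assumes that θ is a scalar and that both thresholds are per-channel vectors broadcast over the height and width dimensions.

```python
import numpy as np

def update_features(u, tau_i, tau_c, gamma, theta):
    """Eq. (24): U = (tau_i + tau_c + gamma * theta) ⊙ u.

    u has shape (H, W, C); tau_i and tau_c have shape (C,) and are
    broadcast over the spatial dimensions (an assumption made here).
    """
    scale = tau_i + tau_c + gamma * theta
    return scale * u

# Hypothetical numbers chosen so the result is easy to verify by hand.
u = np.ones((2, 2, 3))
tau_i = np.array([0.1, 0.2, 0.3])
tau_c = np.array([0.1, 0.1, 0.1])
U = update_features(u, tau_i, tau_c, gamma=0.5, theta=2.0)
# Each channel is scaled by 1.2, 1.3 and 1.4 respectively.
```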

6. Simulation and Result Analysis

In this section, we present the simulation results and analysis to demonstrate the performance of our method. We first introduce the simulation settings, and then present the estimation accuracy of our method. We also compare the estimation accuracy of our method with that of the state of the art.

The model training platform uses an Intel Xeon Gold 6240R CPU, an NVIDIA RTX 3090 GPU (24 GB), and 64 GB of DDR4 memory. The onboard deployment test platform is an NVIDIA Jetson Xavier NX, which simulates the resource-constrained UAV airborne platform and is equipped with a 6-core Carmel ARM CPU, a 384-core Volta GPU, and 8 GB of shared memory. The software environment consists of the Ubuntu 20.04 LTS operating system, Python 3.8, the PyTorch 1.13.1 deep learning framework, the CUDA 11.7 parallel computing architecture, and the cuDNN 8.5 deep neural network acceleration library; Matlab R2023b is used to generate the simulated channel and baseband signals.

We adopt a measured air-to-ground (A2G) channel dataset [32] covering three typical scenarios with data collected at two different UAV flight heights. We select six typical modulation schemes used in UAV communications (ASK, FSK, BPSK, FM, MSK, QPSK), uniformly set their parameters to a 2.4 GHz carrier frequency, a 1 Msym/s symbol rate, an 8 MHz sampling rate (8 samples per symbol), and a fixed 1024-point baseband IQ sequence length, and employ a raised cosine filter with a roll-off factor of 0.35 for pulse shaping to align with practical UAV communication engineering.
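To make the baseband settings concrete, the sketch below generates one 1024-point QPSK IQ sequence with 8 samples per symbol and raised cosine pulse shaping (roll-off 0.35). It is a hypothetical NumPy illustration rather than the Matlab generation used in the experiments; the filter span of 8 symbols, the bit mapping, and the function names are assumptions.

```python
import numpy as np

SPS = 8            # samples per symbol (8 MHz sampling, 1 Msym/s)
N_SAMPLES = 1024   # fixed IQ sequence length
BETA = 0.35        # raised cosine roll-off factor

def raised_cosine_taps(beta, sps, span_symbols=8):
    """Raised cosine impulse response sampled at sps samples per symbol."""
    half = span_symbols * sps // 2
    t = np.arange(-half, half + 1) / sps            # time in symbol periods
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)               # t = +/- 1 / (2 * beta)
    safe = np.where(singular, 1.0, denom)
    h = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    h[singular] = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))  # limit value
    return h

def qpsk_baseband(n_samples, sps, beta, rng):
    """Random QPSK symbols, upsampled and pulse shaped."""
    n_sym = n_samples // sps
    bits = rng.integers(0, 2, size=(n_sym, 2))
    symbols = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    upsampled = np.zeros(n_sym * sps, dtype=complex)
    upsampled[::sps] = symbols                      # zero-stuffing
    iq = np.convolve(upsampled, raised_cosine_taps(beta, sps), mode="same")
    return iq, symbols

rng = np.random.default_rng(0)
taps = raised_cosine_taps(BETA, SPS)
iq, symbols = qpsk_baseband(N_SAMPLES, SPS, BETA, rng)
```

The raised cosine pulse is zero at every non-zero multiple of the symbol period, so sampling the shaped waveform at the symbol instants returns the transmitted symbols without inter-symbol interference.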

6.1. Performance Comparison Before and After Augmentation

In our simulations, we configure the UAV air–ground channel fading type and expansion factor. Data augmentation is then performed on the A2G channel samples extracted from the measured dataset [32] to generate an expanded dataset with various LoS/NLoS ratios and fading parameters.

The IQ visualization diagrams of the six types of signals before and after expansion are shown in Figure 9.

Then, we calculate the consistency before and after augmentation as

C_{t} = |1 - \frac{\sum_{i = 1}^{N} |A_{i} - A_{i}^{'}|}{\sum_{i = 1}^{N} |A_{i}|}|,

(25)

where

A_{i}

denotes the amplitude of the i-th sample point in the original signal,

A_{i}^{'}

represents the amplitude of the corresponding sample point in the augmented signal, and N is the total number of sample points in one signal segment. The bit-level consistency is calculated as

C_{b} = 1 - \frac{D}{M},

(26)

where D represents the number of transformed bits, and M denotes the total number of bits.
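Both consistency measures in Equations (25) and (26) are simple ratios, as the following sketch shows. The function names and the toy inputs are hypothetical, and the number of transformed bits D is assumed to be the count of positions where the original and augmented bit sequences differ.

```python
import numpy as np

def amplitude_consistency(a_orig, a_aug):
    """Eq. (25): C_t = |1 - sum|A_i - A_i'| / sum|A_i||."""
    a_orig = np.asarray(a_orig, dtype=float)
    a_aug = np.asarray(a_aug, dtype=float)
    return abs(1.0 - np.sum(np.abs(a_orig - a_aug)) / np.sum(np.abs(a_orig)))

def bit_consistency(bits_orig, bits_aug):
    """Eq. (26): C_b = 1 - D / M, with D the number of changed bits."""
    bits_orig = np.asarray(bits_orig)
    bits_aug = np.asarray(bits_aug)
    d = np.count_nonzero(bits_orig != bits_aug)
    return 1.0 - d / bits_orig.size

# Toy amplitudes: total deviation 0.1 against a total amplitude of 10.
c_t = amplitude_consistency([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 3.9])

# Toy bit sequences: 5 of 1000 bits flipped.
bits = np.zeros(1000, dtype=int)
flipped = bits.copy()
flipped[:5] = 1
c_b = bit_consistency(bits, flipped)
```

In this toy case the amplitude consistency is 0.99 and the bit consistency is 0.995.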

Table 1 presents the core feature consistency of the original UAV air–ground signal training dataset under 10-fold, 20-fold, and 30-fold expansion, where the consistency covers both the signal modulation features and the UAV air–ground channel fading features. We can see that after data expansion using the proposed model–data dual-driven expansion strategy, the feature matching degree between the expanded data and the original data remains stably above 99%, which fully meets the preset expansion indicators. This is because the model–data dual-driven design accurately controls the core feature attributes of the original data during dataset expansion, which avoids feature deviation and reduces the interference of expansion operations on the integrity of the original data features. Thus, it ensures the high quality and stability of the expanded dataset and provides reliable data support for subsequent model training.

Furthermore, to verify the impact of dataset augmentation on the training stability and convergence of the proposed STF-DRSN, we plot the training loss curves of the model before and after data expansion as shown in Figure 10.

Figure 10 presents the training loss curves of STF-DRSN before and after the implementation of the dataset augmentation strategy. We can see that the augmented version achieves a lower initial loss and converges to a stable state of 0.28 within only 60 epochs, demonstrating a significantly faster convergence speed compared to the original model. The loss trajectory of the non-augmented STF-DRSN is much higher and exhibits significant fluctuations during the first 30 epochs. This is because the proposed method introduces the model–data dual-driven augmentation during the data preparation phase, encouraging the model to learn from a more diverse and authentic data distribution, thereby effectively alleviating the impact of overfitting and enhancing the overall training stability of the network.

Next, we select 100 signals of each type from the dataset augmented by a factor of 10. The accuracy is calculated as

Accuracy = \frac{\sum_{i = 1}^{C} M_{i i}}{\sum_{i = 1}^{C} \sum_{j = 1}^{C} M_{i j}},

(27)

where C represents the number of signal categories, and M represents the Confusion Matrix, a matrix for evaluating the performance of a classification model. The element

M_{i j}

denotes the number of samples belonging to class i that are classified as class j, and

M_{i i}

refers to the element at the i-th row and i-th column in the confusion matrix, which corresponds to the number of samples in class i that are correctly classified as class i. The numerator is the number of all correctly classified samples, and the denominator is the total number of test samples.
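Equation (27) is the trace of the confusion matrix divided by the sum of all its entries. A minimal sketch with a hypothetical 3-class confusion matrix (not taken from the experiments) is:

```python
import numpy as np

def accuracy_from_confusion(m):
    """Eq. (27): correctly classified samples over all test samples."""
    m = np.asarray(m, dtype=float)
    return np.trace(m) / np.sum(m)

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
confusion = np.array([
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
])
acc = accuracy_from_confusion(confusion)   # 28 correct out of 30 samples
```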

The overall estimation results of the six signal modulation methods are presented in the form of a confusion matrix as shown in Figure 11.

Figure 11 shows the model’s ability to distinguish between different modulation modes for non-expanded samples and expanded samples. We can see that the overall recognition accuracy is 90.83% before expansion and 97.83% after expansion. The dataset expansion strategy therefore improves the recognition accuracy by 7 percentage points, which significantly enhances the model’s ability to distinguish different modulation modes. This is because the dataset expansion strategy supplements valid samples reasonably, increasing the sample size while preserving the core features of the various modulated signals. It helps the model learn the feature differences of the various modulated signals comprehensively, and addresses the insufficient feature learning and judgment deviation caused by an insufficient number of samples. The strategy thereby lowers the misjudgment rate and improves the modulation recognition accuracy.

6.2. Performance Comparison Before and After Noise Variation

In research on UAV air–ground signal modulation recognition, analyzing signals in an ideal noise-free environment simplifies the initial verification of algorithms, but it is significantly disconnected from real UAV low-altitude air–ground communication scenarios. The unavoidable noise interference in actual air–ground channels makes it difficult to apply conclusions obtained in noise-free environments to practical situations. Therefore, actively introducing noise during the training phase is essential to align the models with the signal transmission characteristics of practical UAV low-altitude propagation.

In this subsection, we further optimize the UAV air–ground channel dataset construction scheme. The SNR of the background environment of the augmented dataset is reduced by 5 dB to simulate a complex low-altitude propagation environment with severe noise. On this basis, we conduct training on the augmented dataset based on UAV air–ground channel environmental background information to improve the model’s adaptability to scenarios with different noise intensities and fading severities. The confusion matrix showing the variation results of the six signal modulation methods is presented in Figure 12.
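One common way to lower the SNR of a dataset by a fixed amount is to add complex white Gaussian noise scaled to a target SNR. The sketch below illustrates this for a 5 dB reduction; it is an assumption about how such a reduction could be realized, not the procedure used in the experiments, and the 15 dB starting point and the test signal are hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add complex white Gaussian noise so that the result has the target SNR."""
    p_signal = np.mean(np.abs(x) ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(p_noise / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x + noise

def measured_snr_db(clean, noisy):
    """SNR in dB computed from the known clean signal."""
    p_signal = np.mean(np.abs(clean) ** 2)
    p_noise = np.mean(np.abs(noisy - clean) ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

rng = np.random.default_rng(1)
# Hypothetical clean signal: unit-modulus QPSK points.
phases = np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, size=20000)
clean = np.exp(1j * phases)

base_snr_db = 15.0                                       # hypothetical starting SNR
noisy_base = add_awgn(clean, base_snr_db, rng)
noisy_reduced = add_awgn(clean, base_snr_db - 5.0, rng)  # SNR lowered by 5 dB

snr_base = measured_snr_db(clean, noisy_base)
snr_reduced = measured_snr_db(clean, noisy_reduced)
```

A 5 dB reduction corresponds to a noise power that is roughly 3.16 times higher at the same signal power.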

Figure 12 presents the modulation recognition accuracy of the proposed STF-DRSN across a range of SNR levels. We can see that the proposed method achieves robust performance at higher SNRs, but its accuracy drops significantly when the SNR decreases by 5 dB, demonstrating a clear dependency on the signal-to-noise ratio. The identification accuracy of the model under high-noise conditions is markedly lower than that in clear-signal environments, showing a significant performance degradation as the noise floor rises. This is because the stronger noise masks the subtle discriminative features of certain modulation signatures, causing the model to confuse modulation modes with high structural similarity and thereby compromising the network’s ability to capture the core physical differences between signal categories. Notably, the inherent similarity between specific modulation pairs further exacerbates this judgment deviation under intensified noise interference.

To demonstrate the independent contribution and necessity of each core module in the proposed framework, we conduct an ablation experiment under the condition of SNR = 15 dB. The complete proposed framework is used as the baseline. Only one core module is removed or replaced separately, while all other training configurations, network parameters, and test conditions remain exactly the same, to ensure the fairness of the comparison. The ablation experiment results of each modulation type and the average accuracy are summarized in Table 2.

Table 2 presents the modulation recognition accuracy of the complete STF-DRSN and its ablated variants. We can see that the complete proposed model achieves the highest identification accuracy across all signal types, and this architecture exhibits a measurable performance gain over all simplified counterparts. The identification accuracy of the ablated models is consistently lower than that of the full framework, with the replacement of the adaptive soft thresholding mechanism by a fixed threshold causing the most significant performance degradation of 6.27%. This is because the proposed method integrates a collaborative pipeline where the dual-driven augmentation alleviates overfitting by providing high-fidelity samples, and the BL-LSTM module captures essential environmental prior information. This synergy encourages the model to extract discriminative features while dynamically suppressing noise-related redundant information, thereby effectively alleviating the performance bottleneck caused by small samples and complex channel distortions.

Based on this, subsequent research can pursue targeted optimization: on the one hand, the feature extraction module can be improved to strengthen the model’s resistance to noise interference and reduce the weakening effect of noise on the features of modulated signals; on the other hand, dedicated differentiation strategies can be designed for easily confused modulation modes to enhance the model’s ability to distinguish similar features. In this way, the recognition accuracy and stability of the model in complex noise environments can be comprehensively improved.

6.3. The Impact of Classification Algorithms on Recognition Performance

We compare the proposed STF-DRSN with seven benchmarks, i.e., the multi-domain feature enhancement network (MDFEN) [33], the time-optimized reduced-feature device (TO-RFD) [34], the deep feature learning network (DFLNET) [35], the zero-sample signal recovery (ZLSR) [36], the residual structure neural network (RESNET) [37], the generative adversarial networks (GANs) [16], and the Transformer [22].

Figure 13 presents the average modulation recognition accuracy of different methods versus training sample size. According to references [17,19], in low-altitude intelligent UAV communication scenarios, the critical threshold for small samples is defined as 5% of the total samples. Aligning with our specific research context, we set 100 training samples per category as the critical threshold of small samples, and vary the sample size from 30 to 200 in the simulation to demonstrate the performance of our method under small-sample conditions. We can see that the proposed STF-DRSN achieves the highest identification accuracy across all tested sample sizes, and it maintains remarkably stable performance even under the ultra-small sample conditions (e.g., 30 and 50 groups per class). The identification accuracy of other baseline methods, such as GANs, DFLNet, and TO-RFD, shows a sharp performance degradation as the sample size decreases. This is because the proposed method integrates a model–data dual-driven augmentation strategy and an adaptive soft thresholding mechanism during the feature extraction process. Such combination encourages the network to prioritize core discriminative features while suppressing the detrimental effects of noise and environmental fluctuations, effectively alleviating the risk of overfitting typically associated with sparse data scenarios and eliminating environment-related redundant information.

Figure 14 illustrates the average modulation recognition accuracy of the compared models across a wide SNR spectrum ranging from 0 dB to 30 dB. It can be observed that the proposed STF-DRSN consistently maintains a dominant lead, initiating with a high baseline of 92% at 0 dB and converging toward 98% at 30 dB. We can also see that, while the Transformer model occupies the second tier, other baseline architectures such as TO-RFD, DFLNet, and GANs exhibit significantly lower classification performance, particularly in the high-noise region. Unlike the other seven models whose feature extraction capabilities are severely suppressed under low-SNR conditions, the proposed STF-DRSN maintains high-fidelity recognition accuracy even when the signal is heavily distorted by environmental interference. The reason lies in the collaborative operation of the BL-LSTM background learning module and the adaptive soft thresholding mechanism. By explicitly mining environmental prior information, our method can accurately estimate the instantaneous noise level and dynamically adjust the shrinkage thresholds within each residual block. Therefore, our method can precisely prune noise-related redundant features while preserving the essential high-dimensional signatures of the modulation signals, effectively neutralizing the impact of low SNR on decision boundaries.

Figure 15 shows the comparison of recognition accuracy between the method proposed in this paper and the other five algorithms under an SNR of 20 dB. We can see that within the entire experimental test range, the STF-DRSN does not experience a significant performance decline caused by slight fluctuations in experimental conditions, demonstrating good performance stability. Moreover, our method maintains an overall higher recognition accuracy when compared with the other five algorithms. This is because, by combining physical models with measured data augmentation, a hybrid dataset with high coverage is constructed, which effectively alleviates the problem of insufficient samples. Then, training based on the expanded dataset incorporating environmental background information further introduces prior knowledge such as environmental noise, multipath effects, and Doppler frequency shifts. This optimizes the authenticity of the data distribution, improves the physical rationality of the dataset, and enables the model to have stronger adaptability in complex electromagnetic environments. Finally, a hierarchical neural network is used to automatically extract the essential features of signals, achieving high-precision parameter estimation and classification.

6.4. Robustness Verification Under UAV Dynamic Mobility Scenarios

We evaluate the robustness of the proposed scheme against the time-varying channel interference caused by UAV dynamic movement. The dynamic scenario settings strictly follow the UAV signal propagation model established in Section 2.1: the UAV moves along a linear uniform trajectory, with the carrier frequency fixed at 2.4 GHz, and the Doppler shift and time-varying channel gain are calculated according to the UAV moving speed. All experiments are conducted under a fixed SNR of 15 dB, to ensure a fair comparison between different methods. We test the modulation recognition accuracy of the proposed STF-DRSN under typical UAV moving speeds (0 m/s to 40 m/s), and compare it with three representative baseline methods: the standalone BL-LSTM model, the GANs-based AMC method, and the Transformer-based AMC method. The quantitative results are summarized in Table 3.
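For orientation, the maximum Doppler shift at a given speed follows from f_d = v f_c / c. The sketch below evaluates it for the 2.4 GHz carrier and applies the shift to a baseband sequence as a phase rotation. It assumes the worst case in which the velocity is aligned with the propagation direction, and it does not reproduce the time-varying channel gain of the propagation model; the function names are introduced here for illustration.

```python
import numpy as np

C_LIGHT = 299_792_458.0   # speed of light in m/s

def max_doppler_hz(speed_mps, carrier_hz=2.4e9):
    """Maximum Doppler shift f_d = v * f_c / c (velocity along the link)."""
    return speed_mps * carrier_hz / C_LIGHT

def apply_doppler(x, f_d, fs):
    """Rotate a baseband sequence by a constant Doppler shift f_d (Hz)."""
    n = np.arange(len(x))
    return x * np.exp(2j * np.pi * f_d * n / fs)

f_d_max = max_doppler_hz(40.0)              # about 320 Hz at 40 m/s
x = np.ones(1024, dtype=complex)            # placeholder baseband sequence
x_shifted = apply_doppler(x, f_d_max, fs=8e6)
```

At the highest tested speed of 40 m/s, the shift is about 320 Hz, which is small relative to the 8 MHz sampling rate, so within one 1024-point segment it appears as a slow phase rotation rather than a visible frequency offset.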

Table 3 presents the modulation recognition accuracy of the proposed STF-DRSN model and three baseline methods under different UAV moving speeds with a fixed SNR of 15 dB, aiming to verify the robustness of the proposed scheme against time-varying channel interference caused by UAV dynamic mobility. It can be observed from the table that all methods show a gradual decrease in modulation recognition accuracy as the UAV moving speed increases, but the STF-DRSN model maintains the highest recognition accuracy across all moving speed scenarios, with a minimal accuracy drop of only 4.37%, which is significantly superior to the three baseline methods. This is because the STF-DRSN integrates a collaborative pipeline consisting of dual-driven augmentation and BL-LSTM module, while the BL-LSTM module effectively captures essential environmental prior information and dynamically suppresses noise-related redundant information, thereby enhancing the model adaptability to time-varying channel interference caused by UAV dynamic movement.

The excellent dynamic robustness of the proposed method benefits from the synergistic design of the three core modules: first, the model–data dual-driven augmentation integrates dynamic channel and Doppler shift characteristics into the training dataset, enabling the model to learn the feature variation rules under dynamic movement in advance; second, the BL-LSTM module effectively extracts the dynamic environmental features including time-varying fading and Doppler shift, providing accurate environmental prior information; finally, the adaptive soft threshold mechanism of STF-DRSN can dynamically adjust the denoising intensity according to the changing environmental features, effectively suppressing the interference caused by UAV movement while retaining the core modulation features. The experimental results fully verify that the proposed method can well adapt to the dynamic mobility scenarios of UAVs, and has strong practical application value.

6.5. Complexity Analysis

To evaluate the engineering deployment feasibility of the proposed STF-DRSN on resource-constrained UAV airborne platforms, this section conducts a quantitative analysis of the model complexity and real-time inference performance. The hardware platform parameters are specified at the beginning of Section 6, with a fixed input IQ sequence length of 1024 points and batch size = 1.

The model complexity is measured by two widely adopted indicators: trainable parameters, which reflect the storage overhead of the model and directly determine the memory occupation on the airborne platform, and floating point operations (FLOPs) per single-sample inference, which characterize the computational overhead of the model and determine the computing power required for inference. Meanwhile, the real-time inference performance is quantified by the end-to-end single-sample inference time, with the inference latency tested separately on the server side and the airborne embedded side, to comprehensively reflect the real-time processing capability of the proposed model in practical deployment.
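The two complexity indicators can be computed layer by layer from the layer dimensions. The sketch below does this for a one-dimensional convolution and a fully connected layer; the layer sizes are hypothetical and do not describe the STF-DRSN architecture, and one multiply-accumulate is counted as two FLOPs, which is one of several conventions in use.

```python
def conv1d_cost(c_in, c_out, kernel, l_out, bias=True):
    """Trainable parameters and FLOPs of a 1-D convolution layer."""
    params = c_out * (c_in * kernel + (1 if bias else 0))
    macs = c_out * c_in * kernel * l_out      # multiply-accumulates
    return params, 2 * macs

def linear_cost(n_in, n_out, bias=True):
    """Trainable parameters and FLOPs of a fully connected layer."""
    params = n_out * (n_in + (1 if bias else 0))
    return params, 2 * n_in * n_out

# Hypothetical two-layer model: a convolution over a 1024-point IQ input
# (2 input channels for I and Q) followed by a 6-class output layer.
layers = [
    conv1d_cost(c_in=2, c_out=16, kernel=3, l_out=1024),
    linear_cost(n_in=16, n_out=6),
]
total_params = sum(p for p, _ in layers)
total_flops = sum(f for _, f in layers)
```

For this toy model the totals are 214 parameters and 196,800 FLOPs, almost all of which come from the convolution because its cost scales with the sequence length.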

The quantitative test results of model complexity and inference performance are shown in Table 4.

Table 4 presents the computational complexity and inference performance of the proposed STF-DRSN under different hardware platforms. We can see that the proposed method achieves the lowest memory footprint and shortest execution latency on all evaluation dimensions, and maintains a remarkably efficient performance on both server-side and airborne embedded deployments. This is because the proposed method introduces an adaptive soft-threshold shrinkage mechanism within each residual block, encouraging the model to prune noise-contaminated redundant features at an early stage, thereby effectively alleviating the impact of excessive computational load on airborne hardware resources.

7. Conclusions

In this paper, aiming at the pain points of small-sample, strong noise and multi-scale fading in UAV low-altitude air–ground communication scenarios, we proposed an enhanced UAV signal modulation estimation method focusing on small-sample augmentation and soft threshold denoising. First, we developed a model–data dual-driven dataset expansion method by fusing the electromagnetic propagation physical model with measured data, which maintains a core feature consistency of over 99% and can accurately reproduce the LoS/NLoS dynamic switching and Doppler shift characteristics of UAV air–ground channels. Then, we designed a BL-LSTM network trained on the augmented dataset with environmental information, achieving the accurate extraction of UAV air–ground channel background features and real-time SNR estimation. Finally, we proposed the STF-DRSN; by integrating the soft threshold anti-interference capability and the attention-driven threshold structure, the network suppresses noise and enhances the feature extraction capability, thus ensuring the stability of modulation recognition accuracy in UAV dynamic flight scenarios. Experimental results showed that the proposed STF-DRSN outperformed state-of-the-art algorithms by more than 2% in UAV signal modulation estimation accuracy, and achieved an average recognition accuracy of over 95% under low-SNR conditions with severe UAV air–ground channel fading. In the future, our work will extend to complex modulation types of UAV communication signals and optimize network complexity for UAV onboard terminal real-time processing.

Author Contributions

Conceptualization, F.H.; Methodology, F.J.; Software, F.J. and Y.S.; Validation, Z.L.; Formal analysis, F.J. and Y.S.; Investigation, Y.H.; Resources, Y.H. and Z.L.; Data curation, F.J.; Writing—original draft, F.J.; Supervision, Z.Y., F.H. and H.X.; Project administration, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The core workflow of unmanned aerial vehicle (UAV) transmission signal detection.

Figure 2. Typical A2G communication scenario.

Figure 3. An illustration of the connections between the three parts of our work.

Figure 4. Flow diagram of A2G composite fading channel simulation.

Figure 5. The schematic diagram of the BL-LSTM network unit structure.

Figure 6. The flowchart of the proposed soft-threshold-based deep residual shrinkage network.

Figure 7. The schematic diagram of the attention mechanism in the proposed network.

Figure 8. Schematic diagram of the signal modulation estimation network structure.

Figure 9. Comparison of IQ visualization diagrams for different types of signals before and after dataset expansion. (a) The results of ASK. (b) The results of FSK. (c) The results of BPSK. (d) The results of FM. (e) The results of MSK. (f) The results of QPSK.

Figure 10. Training loss curves of STF-DRSN before and after dataset augmentation.

Figure 11. The confusion matrices of modulation recognition accuracy before and after data expansion. (a) Modulation recognition accuracy before data expansion. (b) Modulation recognition accuracy after data expansion.

Figure 12. The confusion matrices of modulation recognition accuracy after noise variation. (a) The results at an SNR of 20 dB. (b) The results at an SNR of 15 dB.

Figure 13. Modulation recognition accuracy of different algorithms versus sample size.

Figure 14. Modulation recognition accuracy of different algorithms under different SNRs.

Figure 15. The confusion matrices of modulation recognition of different algorithms. (a) The results of STF-DRSN. (b) The results of Transformer. (c) The results of MDFEN. (d) The results of TO-RFD. (e) The results of DFLNET. (f) The results of GANs.

Table 1. Consistency record table for dataset augmentation.

Augmentation Multiple	Number of Samples Before Augmentation	Number of Samples After Augmentation	Consistency
10	600	6000	99.64%
20	600	12,000	99.13%
30	600	18,000	99.26%

Table 2. Ablation results for the core modules: recognition accuracy (%) at SNR = 15 dB.

Signal Type	Complete Proposed Model	w/o Dual-Driven Augmentation	w/o BL-LSTM Module	w/o Adaptive Soft Threshold
ASK	98.00	94.20	96.00	92.50
FSK	97.00	92.50	88.00	90.30
BPSK	97.00	93.80	92.00	91.60
FM	95.00	90.10	89.00	87.40
MSK	92.00	88.30	91.00	85.70
QPSK	94.00	91.20	90.00	87.90
Average	95.50	91.68	91.00	89.23
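The Average row of Table 2 can be checked directly from the per-class values. The short sketch below recomputes each column average and the resulting drop relative to the complete model. The dictionary keys and variable names are chosen here for illustration, and the values are read as recognition accuracy in percent.

```python
# Per-class values copied from Table 2, in the order
# ASK, FSK, BPSK, FM, MSK, QPSK (SNR = 15 dB).
table2 = {
    "complete model": [98.00, 97.00, 97.00, 95.00, 92.00, 94.00],
    "w/o dual-driven augmentation": [94.20, 92.50, 93.80, 90.10, 88.30, 91.20],
    "w/o BL-LSTM module": [96.00, 88.00, 92.00, 89.00, 91.00, 90.00],
    "w/o adaptive soft threshold": [92.50, 90.30, 91.60, 87.40, 85.70, 87.90],
}

# Column averages, rounded to two decimals as in the table.
averages = {name: round(sum(v) / len(v), 2) for name, v in table2.items()}

# Average drop caused by removing each module.
drops = {
    name: round(averages["complete model"] - avg, 2)
    for name, avg in averages.items()
    if name != "complete model"
}

for name, avg in averages.items():
    print(f"{name}: average {avg:.2f}")
for name, drop in drops.items():
    print(f"{name}: drop of {drop:.2f} points")
```

On these numbers, removing the adaptive soft threshold costs the most on average (6.27 points), followed by the BL-LSTM module (4.50) and the dual-driven augmentation (3.82).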

Table 3. Modulation recognition accuracy (%) under different UAV moving speeds (SNR = 15 dB).

Moving Speed (m/s)	STF-DRSN	BL-LSTM	GANs	Transformer
0 (Static)	96.12	91.35	89.74	90.56
10	95.78	88.62	85.31	86.79
20	94.93	84.17	80.26	82.15
30	93.46	79.53	74.89	77.32
40	91.75	73.28	68.45	71.60
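To make the trend in Table 3 easier to compare across algorithms, the following sketch computes the total accuracy loss between the static case and 40 m/s, and the average loss per m/s. The helper names are illustrative, and the per-m/s figure is only a linear summary of the tabulated end points, not a fitted model.

```python
# Recognition accuracy (%) copied from Table 3 (SNR = 15 dB).
speeds = [0, 10, 20, 30, 40]  # UAV moving speed in m/s
accuracy = {
    "STF-DRSN": [96.12, 95.78, 94.93, 93.46, 91.75],
    "BL-LSTM": [91.35, 88.62, 84.17, 79.53, 73.28],
    "GANs": [89.74, 85.31, 80.26, 74.89, 68.45],
    "Transformer": [90.56, 86.79, 82.15, 77.32, 71.60],
}


def total_drop(values):
    """Accuracy lost between the lowest and highest tabulated speed."""
    return round(values[0] - values[-1], 2)


def drop_per_mps(values):
    """Average accuracy lost per 1 m/s over the tabulated speed range."""
    return round((values[0] - values[-1]) / (speeds[-1] - speeds[0]), 3)


total = {name: total_drop(v) for name, v in accuracy.items()}
slope = {name: drop_per_mps(v) for name, v in accuracy.items()}

for name in accuracy:
    print(f"{name}: {total[name]:.2f} points in total, "
          f"{slope[name]:.3f} points per m/s")
```

Over the tabulated range, STF-DRSN loses 4.37 points, while the three baselines lose between 18.07 and 21.29 points.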

Table 4. The quantitative analysis results.

Evaluation Dimension	Specific Metrics	Value
Training Complexity	Trainable Parameters	12.8 M
	FLOPs per Training Epoch	2.3 G
Computational Inference Time	Single-sample Inference Time	1.24 ms
	Single-sample Inference Time	4.87 ms
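The figures in Table 4 can be turned into rough deployment estimates. The sketch below derives the weight storage and the single-stream throughput. It assumes 32-bit floating-point weights, which the table does not state, and it does not assign the two latency values to specific platforms because the table as reproduced here does not label them.

```python
# Values copied from Table 4.
trainable_params = 12.8e6        # 12.8 M trainable parameters
latencies_ms = [1.24, 4.87]      # the two reported single-sample times

# Assumption (not from the source): weights stored as float32.
BYTES_PER_PARAM = 4

# Storage for the weights alone, in decimal megabytes.
weight_mb = trainable_params * BYTES_PER_PARAM / 1e6

# Samples per second if inputs are processed one at a time.
throughput = [1000.0 / t for t in latencies_ms]

# Ratio between the slower and the faster reported latency.
slowdown = latencies_ms[1] / latencies_ms[0]

print(f"weights: {weight_mb:.1f} MB at {BYTES_PER_PARAM} bytes per parameter")
for t, rate in zip(latencies_ms, throughput):
    print(f"{t} ms per sample -> about {rate:.0f} samples/s")
print(f"slowdown factor: {slowdown:.2f}")
```

Under the float32 assumption the weights occupy about 51.2 MB, and the slower of the two reported latencies still corresponds to roughly 205 samples per second. Activation memory and batching are not covered by this estimate.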

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
