Open-Set UAV Signal Identification Using Learnable Embeddings and Energy-Based Inference

Long, Yudong; Zhou, Huaji; Yu, Wenbo; Ren, Huan; Zhou, Feng; Zhang, Yufei

doi:10.3390/drones10010036


by Yudong Long ¹, Huaji Zhou ², Wenbo Yu ¹,*, Huan Ren ³, Feng Zhou ⁴ and Yufei Zhang ⁵

¹ Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
² National Key Laboratory of Electromagnetic Space Security, Jiaxing 100048, China
³ Development Planning Office, Hangzhou Normal University, Hangzhou 311121, China
⁴ Key Laboratory of Electronic Information Countermeasure and Simulation Technology, Ministry of Education, Xidian University, Xi’an 710071, China
⁵ School of Art and Archaeology, Hangzhou City University, Hangzhou 310015, China
* Author to whom correspondence should be addressed.

Drones 2026, 10(1), 36; https://doi.org/10.3390/drones10010036

Submission received: 26 November 2025 / Revised: 2 January 2026 / Accepted: 4 January 2026 / Published: 6 January 2026

(This article belongs to the Section Drone Communications)


Highlights

What are the main findings?

A geometry-energy joint open-set recognition framework is proposed for robust UAV RF signal identification under low-SNR and high-openness conditions.
Frequency-conditioned temporal modeling enables reliable unknown signal rejection while preserving strong closed-set classification performance.

What are the implications of the main findings?

The proposed GE-OSR framework provides a lightweight and practical solution for real-time low-altitude UAV monitoring and anti-drone systems.
The geometry-energy modeling strategy is general and can be extended to other open-set sensing tasks in dynamic electromagnetic environments.

Abstract

Reliable recognition of unmanned aerial vehicle (UAV) communication signals is essential for low-altitude airspace safety and UAV monitoring. In practical electromagnetic environments, UAV signals exhibit complex time-frequency characteristics, and unknown signal types frequently appear, making open-set recognition necessary. This paper proposes a geometry-energy open-set recognition (GE-OSR) method for UAV signal identification. First, a time-frequency convolutional hybrid network is developed to learn multi-scale representations from raw UAV signals. Then, learnable class embeddings with a dual-constraint embedding loss are introduced to improve feature compactness and separability. In addition, a free-energy alignment loss is introduced to assign low energy to known signals and high energy to unknown ones, forming an adaptive rejection boundary. Experiments under different signal-to-noise ratios (SNRs) and openness levels show that GE-OSR provides stable performance. At 0 dB SNR under high openness, the method improves OSCR by about 2.95% over the recent S3R model and more than 6% over other baselines. These results show that GE-OSR is effective for practical UAV signal identification and unknown signal detection in complex low-altitude environments.

Keywords: unmanned aerial vehicles; RF signal processing; open-set recognition; deep learning; low-altitude security


1. Introduction

UAVs are now widely used across many areas, including smart agriculture [1], remote sensing [2], emergency response [3], logistics [4], and infrastructure inspection [5]. However, the rapid increase in low-altitude UAV operations has brought new challenges for airspace safety. Unauthorized or unknown drones may interfere with navigation or communication systems and raise potential security concerns for critical infrastructure and national defense [6,7]. Therefore, reliable identification of UAV communication signals and effective detection of unknown or suspicious UAVs are essential for maintaining situational awareness in complex electromagnetic environments.

At present, UAV identification is mainly based on acoustic, visual, radar, and radio-frequency (RF) signals. Early works mainly relied on hand-crafted features combined with traditional machine learning models. For example, Muhammad et al. [8] used Mel-frequency cepstral coefficients from UAV acoustic signals and trained support vector machines (SVMs) for recognition. Chu et al. [9] applied histograms of oriented gradients for visual-based detection, while Zhang et al. [10] used short-time Fourier transform and principal component analysis for radar-based classification. For RF-based UAV recognition, Nie et al. [11] combined fractal dimension and dual-spectrum features to extract RF fingerprints for UAV recognition. Methods based on handcrafted features can achieve reasonable performance in relatively controlled experimental conditions. However, their effectiveness is highly dependent on the quality of feature design, and their performance often drops noticeably in the presence of low SNRs, channel distortions, or dynamically changing environments.

Compared with other sensing modalities, RF signals offer several practical advantages. They directly reflect the communication behavior and operational status of UAVs and can be collected passively without strict line-of-sight requirements. This makes them suitable for continuous monitoring and anti-UAV applications. In recent years, deep learning has greatly improved UAV signal recognition, since neural networks can learn hierarchical and discriminative features directly from raw data. In related research fields, studies have explored a variety of network architectures [12,13,14,15] and learning strategies, such as few-shot learning [16,17], semi-supervised learning [18,19], self-supervised learning [20,21], and transfer learning [22,23], to improve model performance in difficult conditions. Such strategies have gradually been introduced into RF signal processing and UAV identification tasks. For instance, Akter et al. [24] used convolutional neural networks (CNNs) with angle-of-arrival features for UAV recognition; Domenico et al. [25] proposed a real-time RF identification framework; and Cai et al. [26] built a lightweight multi-scale CNN for efficient UAV RF fingerprinting.

Since RF signals naturally contain information in both the time and frequency domains, effective representation learning usually requires jointly modeling these two aspects. Classical time-frequency analysis methods, such as the short-time Fourier transform (STFT) [27] and the continuous wavelet transform (CWT) [7], have been widely used to generate two-dimensional representations for CNN-based classifiers. Ozturk et al. [28] applied spectrogram-based CNNs for UAV detection and achieved robust performance across different drone platforms. Zhang et al. [29] proposed a multi-channel physical feature convolution and tri-branch fusion network for automatic modulation recognition. More recently, Dong et al. [30] introduced a second-order synchrosqueezing transform with attention mechanisms for enhanced analysis of non-stationary signals. In other domains, feature modulation techniques have shown strong capability in integrating heterogeneous information. De Vries et al. [31] proposed conditional batch normalization for language-guided visual processing, and Perez et al. [32] introduced FiLM layers that use one modality to modulate features from another. These works suggest that conditioning temporal features on global spectral information may benefit RF signal recognition, although such mechanisms have not been systematically explored for UAV RF signals.

Another line of research focuses on network architectures that can capture both local and global patterns. Conventional CNNs are effective for local feature extraction but are limited by their receptive fields [33]. Transformer-based models [34] address this issue by using self-attention to model long-range dependencies. To combine the advantages of both, hybrid convolution-attention architectures have been proposed. Wu et al. [35] introduced convolutional operations into Vision Transformers, while Dai et al. [36] developed CoAtNet by integrating depthwise convolutions with attention layers. In speech processing, the Conformer architecture [37] alternates between convolution and attention modules and has achieved strong results. For RF signals, Xu et al. [38] designed parallel complex convolution and attention branches for modulation classification; Huynh-The et al. [39] built a lightweight multi-scale convolutional network for UAV fingerprinting; and Dhakal et al. [40] explored frameworks combining physical-layer fingerprints with deep attention mechanisms.

However, in most existing hybrid architectures, time-domain and frequency-domain information are usually processed separately or combined through simple feature fusion. Although such designs allow the network to access information from both domains, they do not explicitly consider how frequency-domain characteristics influence temporal signal patterns. In UAV RF signals, frequency-domain features often reflect the overall transmission state of the signal, while temporal features describe local waveform variations. When the two domains are treated independently, the interaction between global spectral characteristics and local temporal structures may not be fully captured. Instead of directly concatenating features from different domains, we adopt a modulation-based strategy, in which frequency-domain information is used to adjust temporal representations. This design provides a simple way to introduce cross-domain interaction while preserving the original temporal structure, and avoids significantly increasing model complexity.

Beyond feature extraction, the recognition paradigm itself also requires reconsideration. Most existing UAV signal recognition methods assume a closed-set scenario, in which all test classes are seen during training. In real UAV communication environments, this assumption rarely holds, as new modulation types and private protocols often appear. Traditional Softmax-based classifiers [41] must classify each input into a single known class and cannot recognize or reject new signals. To solve this problem, open-set recognition (OSR) has been proposed, aiming to correctly classify known samples while detecting and rejecting unknown ones.

Figure 1 shows the basic concept of OSR. In a closed-set scenario, the model performs well when all test samples come from known classes. However, when unknown samples appear, closed-set models tend to misclassify them as one of the known classes. In contrast, an open-set model can not only correctly recognize known samples but also reject unknown ones. The concept of OSR was first introduced by Scheirer et al. [42]. Early OSR methods were based on traditional classifiers such as SVMs [43], sparse representation [44], and k-nearest neighbors [45]. Later, Bendale et al. [46] proposed OpenMax, extending OSR to deep neural networks. After that, researchers extended OSR by using counterfactual samples [47], generative adversarial networks [48,49], and reciprocal point learning [50,51]. Geng et al. [52] developed an OSR method based on a hierarchical Dirichlet process, and Wang et al. [53] employed energy modeling to construct high-energy regions for unknown detection.

Generally, OSR methods can be roughly divided into generative and discriminative approaches. Generative methods attempt to model the data distribution of known classes and synthesize unknown samples, but they often struggle to generate realistic and diverse data, especially in complex signal environments. Discriminative methods, on the other hand, focus on learning compact and separable feature spaces. Although such methods usually achieve better classification performance, their robustness can still be limited under practical conditions, such as noise contamination, channel drift, and varying RF environments. Moreover, many existing approaches rely on a single latent space and fixed rejection thresholds, which makes it difficult to simultaneously achieve strong feature separability, stable decision boundaries, and adaptive unknown rejection. Although OSR has been applied to other fields such as bias detection [54] and pathogen identification [55], it is still rarely used for UAV RF signals. This task is challenging because UAV signals often share overlapping frequency bands, exhibit subtle inter-class differences caused by hardware imperfections, and show strong non-stationary motion features, which make detecting unseen signals even harder.

To deal with these challenges, this paper proposes the GE-OSR method for UAV signal classification. Unlike previous works, GE-OSR integrates geometric embedding learning with energy-based modeling within a unified framework. This design enables the model to learn both a structured feature geometry and a discriminative energy distribution, leading to better recognition of known samples and more reliable rejection of unknown ones. The main contributions of this paper are summarized as follows:

A time-frequency convolutional hybrid network for UAV signal representation. Considering the complex and non-stationary characteristics of UAV communication signals, a time-frequency convolutional hybrid network is designed to jointly exploit temporal and spectral information, providing stable and representative signal features from raw UAV data.
A geometric embedding mechanism to enhance feature separability. To obtain a more compact and discriminative feature space, a geometric embedding mechanism with learnable class embeddings and dual-constraint loss is introduced, which effectively improves intra-class compactness and inter-class separability.
An energy-based regularization strategy for learning discriminative energy distributions. Faced with the difficulty of distinguishing known and unknown samples in open-set scenarios, an energy-based regularization strategy is adopted, consisting of an explicit energy formulation and its regularization term, and ultimately forming a more discriminative energy landscape.
An adaptive energy threshold for open-set rejection. Instead of relying on a fixed threshold, an adaptive energy thresholding mechanism is introduced, using the empirical energy distribution of known classes, and finally achieving more reliable rejection of unknown signals.

2. Materials and Methods

2.1. Feature Extractor

2.1.1. Time-Frequency Convolutional Hybrid Network

UAV RF signals are usually represented as in-phase and quadrature (IQ) sequences. As these signals evolve over time while distributing energy across multiple frequency components, their underlying structure is jointly governed by time-domain dynamics and frequency-domain properties. To effectively model this coupled time-frequency behavior, we propose a time-frequency convolutional hybrid network that aims to jointly exploit temporal dynamics and frequency-domain characteristics of UAV RF signals in a compact and efficient manner. As illustrated in Figure 2, the network is composed of a front-end frequency-conditioned temporal modulation (FCTM) module and a series of convolutional-Transformer hybrid blocks (CTBlocks), which together form a hierarchical feature extraction backbone.

The FCTM module is designed to introduce frequency-domain context into temporal feature learning. It extracts local temporal features from the IQ sequences while simultaneously summarizing global spectral information, and uses the latter to modulate the former. In this way, frequency-domain characteristics are incorporated as a conditioning signal, enabling the temporal representations to be adapted according to the overall spectral structure of the input signal. On top of the modulated temporal features, multiple CTBlocks are stacked to further enhance representation capability. Each CTBlock combines convolutional operations and Transformer-based modeling to capture both local temporal patterns and long-range dependencies. By interleaving these blocks with temporal downsampling, the network progressively abstracts signal features at different temporal scales. Finally, the extracted feature sequence is aggregated into a fixed-length representation, which is then fed into the classification and open-set recognition modules.

2.1.2. Frequency-Conditioned Temporal Modulation Module

The temporal and spectral components of UAV IQ signals differ substantially in scale, statistical distributions, and noise sensitivity. Directly treating time-domain and frequency-domain representations as independent or equally weighted features often leads to suboptimal representations, as global spectral characteristics may be overwhelmed by local temporal variations. To address this issue, the proposed network adopts a frequency-conditioned temporal modulation strategy, which enables effective interaction between temporal and spectral information without introducing heavy fusion overhead.

As shown in Figure 3, given an input UAV RF signal represented as an IQ sequence, the module processes the signal along two complementary paths to extract temporal features and global spectral information, respectively, and then integrates them in a structured and adaptive manner. Specifically, the temporal path applies one-dimensional convolution followed by batch normalization and non-linear activation to capture local temporal patterns from the raw IQ sequence, producing a temporal feature map that preserves the original time resolution. In parallel, the frequency path computes the fast Fourier transform (FFT) of the same input signal and retains the magnitude spectrum to characterize its frequency-domain behavior. To stabilize the dynamic range and suppress extreme spectral variations, logarithmic compression is applied to the FFT magnitude. The compressed spectral representation is further processed by convolution and adaptive average pooling operations, resulting in a compact global spectral descriptor that summarizes the overall energy distribution across frequency components.

Instead of directly merging time-domain and frequency-domain features, the proposed module uses frequency information to adjust temporal features in a conditioning manner. Specifically, a global spectral descriptor is first extracted from the input signal and then fed into two fully connected layers to generate a channel-wise scaling factor $\gamma$ and a shifting factor $\beta$. These two parameters are applied to the temporal feature map $T$ through an affine transformation:

$$\tilde{T} = \gamma * T + \beta \tag{1}$$

where $*$ denotes channel-wise multiplication with broadcasting along the temporal dimension. In this way, the original temporal structure is preserved, while the feature representation becomes aware of global frequency-related properties. The output of the FCTM module therefore provides frequency-conditioned temporal features, which serve as the input to subsequent temporal modeling stages.
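The modulation in Equation (1) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the temporal feature map is taken as given, band averaging stands in for the convolution and adaptive pooling of the frequency path, and the function name `fctm_modulate`, the two linear heads, the channel count, and the number of bands are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fctm_modulate(iq, feat, w_gamma, b_gamma, w_beta, b_beta, n_bins=16):
    """Frequency-conditioned affine modulation of a temporal feature map.

    iq     : (2, L) real array holding the I and Q rows
    feat   : (C, L) output of the temporal path (assumed to be given)
    w_*, b_* : weights and biases of two hypothetical linear heads
    """
    x = iq[0] + 1j * iq[1]
    # Frequency path: FFT magnitude followed by logarithmic compression.
    log_mag = np.log1p(np.abs(np.fft.fft(x)))
    # Stand-in for conv + adaptive average pooling: mean over n_bins equal bands.
    descriptor = log_mag.reshape(n_bins, -1).mean(axis=1)   # (n_bins,)
    gamma = w_gamma @ descriptor + b_gamma                   # (C,) scaling factors
    beta = w_beta @ descriptor + b_beta                      # (C,) shifting factors
    # Eq. (1): per-channel factors are broadcast along the time axis.
    return gamma[:, None] * feat + beta[:, None]

C, L = 8, 4096                      # illustrative channel count and segment length
iq = rng.standard_normal((2, L))
feat = rng.standard_normal((C, L))
w_g = 0.1 * rng.standard_normal((C, 16))
w_b = 0.1 * rng.standard_normal((C, 16))
T_mod = fctm_modulate(iq, feat, w_g, np.ones(C), w_b, np.zeros(C))
print(T_mod.shape)  # (8, 4096)
```

Because the modulation only rescales and shifts each channel, the time resolution of the feature map is unchanged, which matches the statement that the original temporal structure is preserved.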

2.1.3. Convolutional-Transformer Hybrid Blocks

After frequency-conditioned temporal modulation, the resulting feature sequence still contains both local patterns and long-range dependencies. Convolutional operations are well suited for capturing local temporal variations, whereas Transformer-based self-attention is more effective for modeling global contextual relationships. Based on this observation, the proposed network adopts a stack of CTBlocks for temporal feature modeling.

As shown in Figure 4, each CTBlock includes two parallel paths. The convolutional path uses large-kernel depthwise convolutions followed by pointwise projection to efficiently extract local temporal patterns with relatively low computational cost. Channel expansion is applied before pointwise projection to enrich local representations. In the Transformer path, the input sequence is first transposed and passed through Transformer encoder layers, where the self-attention mechanism captures long-range dependencies and global interactions among different time steps.

The outputs of the two paths are combined using a learnable gating mechanism. A scalar gating parameter, followed by a Sigmoid activation, controls the relative contribution of the convolutional and Transformer branches. This allows the block to adaptively emphasize local or global information depending on the input signal. The fused output is finally added to the original input via a residual connection.
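A minimal sketch of the gated fusion step is given below. The text specifies only a scalar gate passed through a Sigmoid; the convex combination `g * conv + (1 - g) * attn` used here is an assumption, as are the function and argument names.

```python
import numpy as np

def gated_fusion(x, conv_out, attn_out, gate_param):
    """Fuse two branch outputs with a scalar sigmoid gate, then add the residual."""
    g = 1.0 / (1.0 + np.exp(-gate_param))          # sigmoid of the learnable scalar
    fused = g * conv_out + (1.0 - g) * attn_out     # assumed convex combination
    return x + fused                                # residual connection

x = np.zeros((4, 16))                # block input (channels, time)
conv_out = np.ones((4, 16))          # placeholder convolutional-branch output
attn_out = -np.ones((4, 16))         # placeholder Transformer-branch output

balanced = gated_fusion(x, conv_out, attn_out, gate_param=0.0)    # g = 0.5
conv_heavy = gated_fusion(x, conv_out, attn_out, gate_param=50.0)  # g close to 1
```

With `gate_param = 0` the two branches contribute equally, and a large positive value lets the convolutional branch dominate.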

Multiple CTBlocks are stacked and interleaved with temporal downsampling layers to form a hierarchical temporal modeling structure. In earlier blocks, the network mainly focuses on fine-grained local temporal patterns, while deeper blocks gradually capture more abstract and long-range dependencies at coarser temporal resolutions. This progressive design helps the network learn robust multi-scale temporal representations for UAV RF signal classification and subsequent open-set recognition.

2.2. GE-OSR: An Open-Set Recognition Framework Based on Geometry-Energy Joint Modeling

2.2.1. Framework Overview

The GE-OSR framework is designed to address the open-set UAV signal recognition problem by jointly considering the geometric distribution of feature vectors and their associated energy values. The goal of the framework is to correctly classify signals from known UAV classes while identifying and rejecting signals that do not belong to any known category.

As shown in Figure 5, GE-OSR contains four core components. First, a convolution-Transformer feature extractor is used to learn discriminative UAV signal representations by combining time-domain and frequency-domain information. Second, a set of learnable class embeddings is introduced to represent known UAV categories in the feature space. These embeddings are optimized using a dual-constraint embedding loss (DCEL), which encourages features from the same class to cluster together while pushing different classes farther apart. Third, a free energy alignment loss (FEAL) shapes low-energy regions for known classes and high-energy regions for unknowns, providing a reliable energy-based boundary. Finally, during inference, an adaptive energy threshold is applied to distinguish unknown signals from known ones, enabling both closed-set classification and open-set rejection within a unified framework.

2.2.2. Feature Metrics and Class Embedding Initialization

The feature vectors produced by the extractor, denoted as $z_i$, are normalized to unit length so that variations among samples are reflected purely by their angular differences. Cosine similarity is adopted to measure the relationships between features and class embeddings, as it is bounded in [−1, 1] and provides a stable geometric interpretation when defining category boundaries. Given two feature vectors $x$ and $y$, their cosine similarity is defined as:

$$\mathrm{sim}(x, y) = \frac{x \cdot y}{\|x\| \cdot \|y\|} \tag{2}$$

In UAV signal feature space, class distributions are often uneven, and boundaries between nearby categories may become ambiguous, especially under noisy or overlapping conditions. To alleviate this, GE-OSR introduces a set of learnable class embeddings:

$$C = \{c_1, c_2, \dots, c_K\} \tag{3}$$

where $K$ represents the number of known classes, $c_k \in \mathbb{R}^d$, and $d$ is the feature dimension of each sample $z_i$. During initialization, each class embedding $c_j$ is independently sampled from a standard Gaussian distribution and then normalized onto the unit hypersphere:

$$\tilde{c}_j \sim \mathcal{N}(0, I_d), \quad c_j = \frac{\tilde{c}_j}{\|\tilde{c}_j\|_2} \tag{4}$$

This initialization strategy distributes the class embeddings roughly uniformly across the hypersphere, ensuring that they are nearly orthogonal to one another. Such spatial diversity serves as a substantial geometric prior, preventing early-stage collapse of class representations. As training progresses, these embeddings are treated as learnable parameters that gradually adapt through backpropagation, aligning with the empirical feature distributions of the known UAV signal classes.
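The initialization in Equation (4) and the similarity in Equation (2) can be sketched as follows. The values `num_classes=6` and `dim=256` are illustrative assumptions, not settings reported in the text. Near-orthogonality is a property of random directions in high dimensions, so it holds only when the feature dimension is large relative to the number of classes.

```python
import numpy as np

def init_class_embeddings(num_classes, dim, seed=0):
    """Sample each embedding from N(0, I_d) and project it onto the unit sphere."""
    rng = np.random.default_rng(seed)
    c = rng.standard_normal((num_classes, dim))
    return c / np.linalg.norm(c, axis=1, keepdims=True)

def cosine_similarity(x, y):
    """Eq. (2): cosine of the angle between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

emb = init_class_embeddings(num_classes=6, dim=256)
gram = emb @ emb.T                                   # pairwise cosine similarities
off_diag = gram[~np.eye(6, dtype=bool)]              # similarities between different classes
print("largest |cos| between different embeddings:", np.abs(off_diag).max())
```

In this setting the off-diagonal similarities have a standard deviation of roughly 1/sqrt(d), so the embeddings start well separated before training adapts them.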

2.2.3. Dual-Constraint Embedding Loss

To encourage both intra-class compactness and inter-class separability, the proposed DCEL applies explicit geometric regularization in the embedding space. Let $z_i \in \mathbb{R}^d$ denote the feature vector of the $i$-th sample, $y_i$ its label, and $c_k \in \mathbb{R}^d$ the embedding vector corresponding to class $k$. The intra-class term encourages features to remain close to their corresponding class embeddings, thereby improving cohesion among samples of the same category. It is defined as:

$$L_{\mathrm{intra}} = \log\left(1 + \sum_{i : y_i = k} e^{-\alpha(\mathrm{sim}(z_i, c_k) - \delta)}\right) \tag{5}$$

where $\alpha$ controls the sensitivity to deviations, and $\delta$ specifies a soft margin that defines the minimum desired angular proximity between features and their class embeddings. By minimizing $L_{\mathrm{intra}}$, the model continuously pulls features toward their corresponding embeddings, leading to dense and stable intra-class clusters.

To avoid cluster overlap, an inter-class separation term is further introduced to penalize samples that appear excessively similar to incorrect class embeddings:

$$L_{\mathrm{inter}} = \log\left(1 + \sum_{i : y_i \neq k} e^{\alpha(\mathrm{sim}(z_i, c_k) + \delta)}\right) \tag{6}$$

When a feature exhibits high similarity to an unrelated class embedding, $L_{\mathrm{inter}}$ rises sharply, forcing the network to enlarge inter-class margins.

The total loss is then defined as:

$$L_{DCE} = L_{\mathrm{intra}} + L_{\mathrm{inter}} \tag{7}$$

Together, these two terms construct a well-organized embedding space characterized by dense intra-class clusters and clear inter-class separation.
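The two terms of the DCEL can be sketched in NumPy as below. The summation ranges in Equations (5) and (6) are read here as running over all matching (sample, class) pairs in a batch, which is one possible interpretation; the values of `alpha` and `delta` and the function name are hypothetical.

```python
import numpy as np

def dce_loss(z, labels, c, alpha=10.0, delta=0.5):
    """Dual-constraint embedding loss for a batch.

    z: (N, d) features, labels: (N,) integer labels, c: (K, d) class embeddings.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sim = z @ c.T                                     # (N, K) cosine similarities
    pos = np.zeros(sim.shape, dtype=bool)
    pos[np.arange(len(labels)), labels] = True        # pairs with y_i = k
    # Eq. (5): penalize low similarity to the sample's own class embedding.
    l_intra = np.log1p(np.exp(-alpha * (sim[pos] - delta)).sum())
    # Eq. (6): penalize high similarity to the other class embeddings.
    l_inter = np.log1p(np.exp(alpha * (sim[~pos] + delta)).sum())
    return l_intra + l_inter, l_intra, l_inter

c = np.eye(4)                                         # four orthogonal toy embeddings
labels = np.array([0, 1, 2, 3])
loss_aligned, _, _ = dce_loss(c[labels], labels, c)               # features on their own class
loss_swapped, _, _ = dce_loss(c[(labels + 1) % 4], labels, c)     # features on a wrong class
print(loss_aligned < loss_swapped)  # True
```

The toy comparison shows the intended behaviour: features lying on their own class embedding give a lower loss than features lying on another class's embedding.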

2.2.4. Free Energy Alignment Loss

Although $L_{DCE}$ shapes a discriminative feature geometry, it does not by itself provide a direct mechanism for identifying unknown inputs. To address this limitation, we introduce the FEAL, inspired by energy-based out-of-distribution detection methods [56], and adapt it to the cosine-similarity-based embedding geometry. The core idea is that samples from known classes should reside in low-energy regions, whereas unknown samples should be mapped to high-energy regions. Unlike earlier studies that convert Softmax logits into energy, we define energy directly through the cosine distance between a feature vector and the learnable class embeddings, which keeps the formulation consistent with the geometric modeling in the DCEL.

The cosine distance between two vectors $x$ and $y$ is defined as:

$$D(x, y) = 1 - \mathrm{sim}(x, y) \tag{8}$$

For a given feature $z$, its distances to all class embeddings are:

$$D(z, C) = \{D(z, c_1), \dots, D(z, c_K)\} \tag{9}$$

These distances are mapped to a scalar free energy value via a log-sum-exp formulation:

$$E(z) = -\frac{1}{T} \log\left(\sum_{k=1}^{K} e^{-T D(z, c_k)}\right) \tag{10}$$

where $T$ is a temperature parameter controlling the smoothness of the energy landscape.

This formulation can be interpreted as a smooth minimum over distances, where embeddings closer to $z$ dominate the energy value. Consequently, features close to at least one class embedding yield low energy, while features distant from all embeddings result in high energy.
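The smooth-minimum behaviour of Equation (10) can be checked numerically with a short sketch. The temperature `T=10.0`, the toy embeddings, and the function name are illustrative assumptions. For any positive T, the log-sum-exp form implies that E(z) lies between min_k D(z, c_k) − ln(K)/T and min_k D(z, c_k), which the example verifies.

```python
import numpy as np

def free_energy(z, c, T=10.0):
    """Eq. (10): free energy from cosine distances to all class embeddings.

    z: (N, d) features, c: (K, d) class embeddings. Returns (E, D) with
    E of shape (N,) and D of shape (N, K).
    """
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    c = c / np.linalg.norm(c, axis=-1, keepdims=True)
    d = 1.0 - z @ c.T                                 # Eqs. (8)-(9)
    a = -T * d
    m = a.max(axis=1, keepdims=True)                  # shift for a stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return -lse / T, d

c = np.eye(5, 64)                                     # five orthogonal toy embeddings
z = np.stack([c[0],                                   # lies exactly on class 0
              np.eye(64)[10]])                        # orthogonal to every embedding
E, D = free_energy(z, c, T=10.0)
print(E[0] < E[1])  # True
```

The first feature coincides with a class embedding and receives an energy close to zero, while the second is equally far from all embeddings and receives an energy close to its distance of 1.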

To explicitly enforce low-energy alignment for known samples, FEAL is defined as:

$$L_{FEA} = \frac{1}{N} \sum_{i=1}^{N} \left(E(z_i) - E_0\right)^2 \tag{11}$$

where $E_0$ is a predefined target energy level for known samples, and $N$ is the total number of known-class samples. Minimizing $L_{FEA}$ encourages the free energy of known samples to converge around $E_0$, forming compact, stable low-energy zones within the feature space.

Finally, the overall training objective combines the geometric and energy-based losses:

$$L = \lambda_1 L_{DCE} + \lambda_2 L_{FEA} \tag{12}$$

where $\lambda_1$ and $\lambda_2$ balance the contributions of geometric and energy-based constraints.
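Equations (11) and (12) amount to a mean squared deviation from the target energy plus a weighted sum, as the following sketch shows. The target `e0`, the weights `lam1` and `lam2`, and the placeholder value for the DCEL term are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def fea_loss(energies, e0):
    """Eq. (11): mean squared deviation of known-sample energies from the target E_0."""
    energies = np.asarray(energies, dtype=float)
    return float(np.mean((energies - e0) ** 2))

def total_loss(l_dce, l_fea, lam1=1.0, lam2=1.0):
    """Eq. (12): weighted sum of the geometric and energy-based terms."""
    return lam1 * l_dce + lam2 * l_fea

l_fea = fea_loss([0.1, 0.3], e0=0.2)       # each sample deviates from E_0 by 0.1
loss = total_loss(l_dce=2.0, l_fea=l_fea, lam1=1.0, lam2=0.5)
```

Note that the squared penalty is two-sided: energies above the target are pulled down and energies below it are pushed up, so known samples concentrate around E_0 rather than being driven to arbitrarily low energy.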

2.2.5. Adaptive Energy Threshold Discrimination

In dynamic RF environments, the statistical distribution of signal features may shift during training or testing. A fixed energy threshold is therefore prone to bias and unstable classification. To address this issue, we design an adaptive energy thresholding strategy that updates the decision boundary online based on observed batch statistics. During each training iteration, the mean and variance of the free energy for known samples are calculated as:

$$\bar{E} = \frac{1}{N} \sum_{i=1}^{N} E_i, \quad S^2 = \frac{1}{N} \sum_{i=1}^{N} \left(E_i - \bar{E}\right)^2 \tag{13}$$

where $N$ denotes the batch size and $E_i$ represents the free energy of the $i$-th sample. To reduce sensitivity to batch noise, an exponential moving average (EMA) mechanism is applied:

$$\mu_t = (1 - \eta)\mu_{t-1} + \eta \bar{E}, \quad \sigma_t = \sqrt{(1 - \eta)\sigma_{t-1}^2 + \eta S^2} \tag{14}$$

where $\eta \in (0, 1)$ is the update rate. This EMA-based update is widely used to obtain stable second-order statistics under non-stationary training dynamics. It smooths short-term fluctuations while retaining the long-term statistical trend of the overall energy distribution. As a result, it provides a more stable and robust estimate of the global energy landscape, even under dynamic or noisy signal conditions.

Based on the estimated statistics, the rejection threshold is defined as:

$$\tau = \mu_t + \beta \sigma_t \tag{15}$$

where $\beta$ is a hyperparameter controlling the strictness of the boundary. Larger values of $\beta$ result in a more tolerant threshold, while smaller values enforce stricter unknown rejection.

During inference, the deep feature $z$ of each input sample is first extracted, and its free energy $E(z)$ is computed. Classification is then performed based on the adaptive threshold $\tau$:

$$\hat{y} = \begin{cases} \arg\max_{k} \mathrm{sim}(z, c_k), & \text{if } E(z) \leq \tau \\ \text{Unknown}, & \text{if } E(z) > \tau \end{cases} \tag{16}$$

Samples with energy below the threshold are assigned to the most similar known class, whereas samples with high energy are rejected as unknown.
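The threshold update of Equations (13) to (15) and the decision rule of Equation (16) can be sketched as a small tracker class. The initial statistics `mu0` and `sigma0`, the default values of `eta` and `beta`, the class and function names, and the use of `-1` as the "Unknown" label are assumptions made for this illustration.

```python
import numpy as np

class AdaptiveEnergyThreshold:
    """EMA tracker of the known-sample energy mean and standard deviation."""

    def __init__(self, eta=0.1, beta=2.0, mu0=0.0, sigma0=1.0):
        self.eta, self.beta = eta, beta
        self.mu, self.sigma = mu0, sigma0        # assumed initial statistics

    def update(self, batch_energies):
        e = np.asarray(batch_energies, dtype=float)
        e_bar, s2 = e.mean(), e.var()            # Eq. (13), variance with 1/N
        # Eq. (14): exponential moving averages of the mean and the variance.
        self.mu = (1 - self.eta) * self.mu + self.eta * e_bar
        self.sigma = float(np.sqrt((1 - self.eta) * self.sigma ** 2 + self.eta * s2))

    @property
    def tau(self):
        return self.mu + self.beta * self.sigma  # Eq. (15)

def predict(sim_row, energy, tau, unknown=-1):
    """Eq. (16): most similar class if the energy is within the threshold, else unknown."""
    return int(np.argmax(sim_row)) if energy <= tau else unknown

tracker = AdaptiveEnergyThreshold()
for _ in range(200):
    tracker.update([0.1, 0.3])                   # batch mean 0.2, batch variance 0.01

accepted = predict([0.1, 0.9, 0.2], energy=0.3, tau=tracker.tau)
rejected = predict([0.1, 0.9, 0.2], energy=0.9, tau=tracker.tau)
```

With stationary batches the tracked statistics converge to the batch mean and standard deviation, so the threshold settles near 0.2 + 2 × 0.1 = 0.4 in this toy run; the first query is assigned to class 1 and the second is rejected.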

3. Results

3.1. Experimental Setup and Evaluation Protocol

3.1.1. Dataset Description

To evaluate the proposed open-set UAV signal recognition method, UAV RF signals were collected using a self-developed lightweight target intelligent acquisition system. The dataset contains downlink image transmission signals from DJI Phantom 4 RTK UAVs (DJI, Shenzhen, China). These signals correspond to the communication link from the UAV to the remote controller. During normal flight, this link is continuously active and occupies a relatively stable bandwidth. Therefore, it is suitable for RF signal recognition based on deep learning methods.

The collected signals follow DJI’s proprietary OcuSync 2.0 communication protocol and operate in the 5.8 GHz ISM band. The system adopts Time-Division Duplexing (TDD) to separate uplink control commands and downlink video transmission. Due to the proprietary nature of OcuSync 2.0, detailed physical-layer parameters such as FFT size, cyclic prefix length, and modulation order are not available. Accordingly, the signal characteristics were analyzed based on spectrum and time-domain observations. The received signals show a nearly rectangular spectrum with an occupied bandwidth of about 10 MHz, which indicates an OFDM-based multicarrier transmission. Autocorrelation analysis shows that the OFDM symbol duration is about 30 μs. In addition, the measured Peak-to-Average Power Ratio (PAPR) is around 8 dB, which is consistent with typical OFDM signals. Due to the scrambling operations applied in the transmitter, the exact modulation order cannot be directly identified, but the amplitude distribution suggests QAM-like modulation on the subcarriers.
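For reference, the PAPR figure quoted above is conventionally computed as the ratio of peak to mean instantaneous power. The sketch below uses that standard definition on two toy signals; it is not the authors' measurement script, and the function name is hypothetical.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband segment, in dB."""
    p = np.abs(np.asarray(x)) ** 2               # instantaneous power
    return float(10.0 * np.log10(p.max() / p.mean()))

const_env = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 64))   # constant-envelope tone
impulse = np.array([1.0, 0.0, 0.0, 0.0])                     # all power in one sample

papr_tone = papr_db(const_env)      # close to 0 dB
papr_impulse = papr_db(impulse)     # 10*log10(4), about 6.02 dB
```

A constant-envelope signal gives a PAPR near 0 dB, whereas multicarrier signals such as OFDM have pronounced power peaks, which is why a value around 8 dB is read as consistent with OFDM.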

The acquisition system supports a frequency range from 1.5 MHz to 8 GHz, has a real-time bandwidth over 30 MHz, and supports 32 simultaneous channels, allowing signals to be captured across multiple frequency bands. Experiments were conducted indoors to ensure signal stability. The receiving antenna was positioned approximately 20 m from the UAV remote controller. The system’s center frequency was set to 5.8 GHz, and the 32 channels covered from 5.725 GHz to 5.850 GHz. Signals were continuously recorded for 10 s at 30 Msps (complex IQ). The data acquisition process is shown in Figure 6.

The dataset includes signals collected from ten UAV units of the same model. Each UAV unit is treated as one class, which forms a same-model Specific Emitter Identification (SEI) task. To avoid the influence of transmitted content, all UAVs transmitted the same pre-stored image data during the experiments. In addition, to introduce some time-related variations within each class, data from each UAV were collected in four different sessions conducted at different times. Each session contributes roughly one quarter of the samples. With this setting, the model is encouraged to focus on hardware-related signal differences instead of data content or single-session effects.

For data preprocessing, contiguous segments of 4096 complex IQ samples were extracted from the raw recordings. Then, DC offset removal and normalization were applied. At the original sampling rate of 30 Msps, each segment corresponds to about 136 μs in time, which covers more than four OFDM symbol durations. Simple checks in both time and frequency domains show that the signal bandwidth and OFDM structure are still preserved after segmentation. After data screening, 5000 samples were finally retained, with 500 samples for each UAV unit.
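The preprocessing steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: non-overlapping segments and unit-average-power normalization are assumed, since the text does not specify the segment stride or the normalization type, and the names `preprocess` and `segment_duration_us` are hypothetical.

```python
import math

SEG_LEN = 4096       # complex IQ samples per segment (from the text)
SAMPLE_RATE = 30e6   # 30 Msps (from the text)


def segment_duration_us(seg_len=SEG_LEN, fs=SAMPLE_RATE):
    """Duration of one segment in microseconds."""
    return seg_len / fs * 1e6


def preprocess(recording, seg_len=SEG_LEN):
    """Cut contiguous segments, remove the DC offset, scale to unit power.

    Assumptions: segments do not overlap and 'normalization' means
    unit average power per segment.
    """
    segments = []
    for start in range(0, len(recording) - seg_len + 1, seg_len):
        seg = recording[start:start + seg_len]
        dc = sum(seg) / seg_len
        seg = [s - dc for s in seg]
        rms = math.sqrt(sum(abs(s) ** 2 for s in seg) / seg_len)
        if rms > 0:
            seg = [s / rms for s in seg]
        segments.append(seg)
    return segments


# Small synthetic recording with a short segment length, for illustration.
recording = [complex(n % 5 + 1, -(n % 3)) for n in range(20)]
segments = preprocess(recording, seg_len=8)
duration = segment_duration_us()
```

With the values from the text, `segment_duration_us()` evaluates 4096 / 30e6 s, which is about 136.5 μs, or roughly 4.5 symbol durations at 30 μs per OFDM symbol.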

To better understand the differences among UAV units, several RF-related attributes were analyzed using simple statistical methods, including residual carrier frequency offset, I/Q amplitude imbalance, and signal amplitude distribution. Although all UAVs use the same communication protocol and transmit the same payload data, differences in the distributions of these attributes can still be observed among different UAV units. At the same time, for a given UAV unit, these attributes remain relatively stable across different acquisition sessions. This observation indicates that the collected signals contain unit-specific RF characteristics that can be used for RF fingerprint-based identification.

For open-set recognition experiments, the ten UAV classes were divided into known and unknown subsets. Only the known classes were used in the training stage. The unknown classes did not participate in training and were only included in the testing stage to evaluate the open-set recognition performance. Although the dataset was collected from a single UAV model and a proprietary protocol, the RF fingerprint features used in this work mainly come from transmitter hardware imperfections, which are common in practical wireless systems. Therefore, the proposed method may be applicable to other RF signal recognition tasks with similar characteristics.
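The known/unknown partition described above might be set up as in the sketch below. The random, seeded choice of known classes is an assumption (the text does not state how classes were assigned), and the helper name `split_known_unknown` is hypothetical.

```python
import random


def split_known_unknown(class_ids, num_known, seed=0):
    """Partition class IDs into known (train + test) and unknown (test only).

    Assumption: known classes are drawn at random with a fixed seed.
    """
    rng = random.Random(seed)
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    known = sorted(shuffled[:num_known])
    unknown = sorted(shuffled[num_known:])
    return known, unknown


uav_units = list(range(1, 11))          # ten UAV units, one class each
known, unknown = split_known_unknown(uav_units, num_known=4, seed=0)

# Training data would be restricted to the known classes only.
labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 3
train_labels = [c for c in labels if c in known]
```

Samples from the `unknown` classes never enter `train_labels`; they are added only when the test set is assembled.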

3.1.2. Experimental Environment and Implementation Details

All experiments were performed on a workstation with an Intel Core i9-11950H CPU (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA RTX A3000 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The model was implemented using PyTorch 2.2.2 with CUDA 12.1 support. To make the comparison fair, all models were trained and tested on the same dataset partition, using the same random seed. For the known classes, the data were divided into training, validation, and test sets in the ratio of 6:2:2. The detailed hyperparameter settings and implementation configurations are given in Table 1.
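A 6:2:2 partition with a fixed seed can be sketched as below. Applying the split per class (500 samples each) is an assumption, as the text does not say whether the partition is stratified, and `split_622` is a hypothetical helper.

```python
import random


def split_622(items, seed=0):
    """Shuffle with a fixed seed and split into train/val/test at 6:2:2."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Sample indices of one known UAV unit (500 samples per unit in the dataset).
per_class = list(range(500))
train, val, test = split_622(per_class, seed=42)
```

Reusing the same seed for every compared model reproduces the same partition, which is what makes the comparison fair.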

Each experiment was repeated several times, and we report the average results to reduce randomness and improve reproducibility.

3.1.3. Evaluation Metrics

To comprehensively evaluate the model performance for both closed-set classification and open-set recognition, four main metrics were adopted: accuracy (Acc), F1-score, area under the ROC curve (AUROC), and open-set classification rate (OSCR).

Acc: Acc measures the overall proportion of correctly classified known samples and is defined as:

Acc = \frac{TP + TN}{TP + TN + FP + FN}    (17)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. This metric provides an intuitive assessment of closed-set classification performance.

F1-score: The F1-score provides a balanced measure between precision and recall, offering a fair assessment of classification performance, especially when the class distribution is uneven. It is computed as:

Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}    (18)

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}    (19)
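For a multi-class problem, Equations (17)–(19) are usually evaluated per class in a one-vs-rest fashion and then averaged. The sketch below assumes macro averaging, which the text does not specify; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class one-vs-rest F1 scores (assumed averaging)."""
    f1_scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)      # 4 of 6 predictions are correct
f1 = macro_f1(y_true, y_pred)       # mean of per-class F1 = 0.5, 0.8, 2/3
```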

AUROC: AUROC quantifies how well the model distinguishes between known and unknown samples. It corresponds to the area under the receiver operating characteristic (ROC) curve, which captures the relationship between the true positive rate (TPR) and the false positive rate (FPR):

TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}    (20)

AUROC = \int_{0}^{1} TPR(FPR) \, d(FPR)    (21)

A higher AUROC value reflects a stronger ability to separate unknown samples from known ones, as well as a more stable decision boundary.
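The area in Equation (21) can be computed without tracing the ROC curve explicitly, because it equals the probability that a randomly chosen known sample receives a higher score than a randomly chosen unknown sample, with ties counted as one half. The sketch below assumes that a higher score means "more likely known" (for an energy-based detector, the negative free energy would be one such score); the function name is illustrative.

```python
def auroc(known_scores, unknown_scores):
    """Rank-based AUROC: P(score_known > score_unknown), ties count 0.5."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


known_scores = [0.9, 0.8, 0.4]
unknown_scores = [0.7, 0.3]
area = auroc(known_scores, unknown_scores)   # 5 of 6 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples; it is meant for clarity rather than speed.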

OSCR: OSCR jointly evaluates closed-set classification accuracy and open-set rejection performance, providing a unified view of recognition effectiveness. It is defined as:

CCR(θ) = \frac{1}{N_{c}} \sum_{i=1}^{N_{c}} \mathbb{1}(\hat{y}_{i} = y_{i}, S_{i} \geq θ)    (22)

FPR(θ) = \frac{1}{N_{o}} \sum_{j=1}^{N_{o}} \mathbb{1}(S_{j} \geq θ)    (23)

OSCR = \int_{0}^{1} CCR(FPR) \, d(FPR)    (24)

where N_{c} and N_{o} denote the number of closed-set and open-set samples, respectively; S represents the model’s confidence score, and θ is the decision threshold. Higher OSCR values indicate a better overall trade-off between correctly classifying known classes and successfully rejecting unknown ones.
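Equations (22)–(24) can be evaluated by sweeping the threshold over all observed scores and integrating the resulting (FPR, CCR) curve. The following is a minimal sketch that assumes trapezoidal integration and a score for which higher means "more likely known"; the function and argument names are illustrative.

```python
def oscr(known_scores, known_correct, unknown_scores):
    """Area under the CCR-vs-FPR curve obtained by sweeping the threshold.

    known_correct[i] is True when known sample i received the right class.
    """
    n_c, n_o = len(known_scores), len(unknown_scores)
    thresholds = sorted(set(known_scores) | set(unknown_scores), reverse=True)
    points = [(0.0, 0.0)]                      # (FPR, CCR) at an infinite threshold
    for theta in thresholds:
        ccr = sum(
            1 for s, ok in zip(known_scores, known_correct) if ok and s >= theta
        ) / n_c
        fpr = sum(1 for s in unknown_scores if s >= theta) / n_o
        points.append((fpr, ccr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0    # trapezoidal rule
    return area


known_scores = [0.9, 0.8, 0.4]
unknown_scores = [0.7, 0.3]
partly_correct = oscr(known_scores, [True, True, False], unknown_scores)
all_correct = oscr(known_scores, [True, True, True], unknown_scores)
```

When every known sample is classified correctly, CCR coincides with the true positive rate and the value reduces to the AUROC; any closed-set error lowers it, which is why OSCR captures both aspects at once.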

3.2. Closed-Set Recognition Performance

A strong closed-set classification backbone is the prerequisite for reliable open-set recognition. To assess the effectiveness of the proposed feature extractor in terms of accuracy, noise robustness, and parameter efficiency, we compared it with several representative IQ-based electromagnetic signal classification models, including ResNet [57], GRU [58], MCLDNN [59], and IQformer [60].

Figure 7 shows the closed-set recognition accuracy of all models across different SNRs. As expected, performance improves as SNR increases. However, the differences between models become pronounced at low SNR (SNR < 0 dB). Under substantial noise interference, both GRU and IQformer show a noticeable drop in accuracy, while ResNet and MCLDNN remain relatively stable but still underperform the proposed model. Across the entire SNR range, our method maintains the highest accuracy, suggesting stronger resilience and better generalization in noisy communication environments.

At an SNR of 0 dB, detailed quantitative results are reported in Table 2.

As shown in Table 2, the proposed model achieves the highest classification accuracy (97.34%) and F1-score (97.32%) among all compared methods, outperforming ResNet, GRU, MCLDNN, and IQformer. This demonstrates the effectiveness of the proposed feature extraction backbone under low-SNR conditions.

To further evaluate the computational efficiency, Table 3 reports the parameter count, FLOPs, and inference latency of different models. The inference latency is measured as the average forward-pass time per sample under the same experimental setup described in Section 3.1.2.
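The average per-sample forward-pass time could be measured along the following lines. This is a sketch, not the authors' benchmark: `forward` stands in for a model's inference call, and the warm-up and repeat counts are arbitrary assumptions.

```python
import time


def mean_latency_ms(forward, samples, warmup=10, repeats=100):
    """Average wall-clock time of one forward pass, in milliseconds.

    Note: for GPU inference with PyTorch, torch.cuda.synchronize() would be
    needed before reading the clock, since CUDA kernels run asynchronously.
    """
    for s in samples[:warmup]:
        forward(s)                              # warm-up runs are not timed
    count = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for s in samples:
            forward(s)
            count += 1
    elapsed = time.perf_counter() - start
    return elapsed / count * 1e3


def dummy_forward(x):
    """Hypothetical stand-in for a model: sum of squares of the input."""
    return sum(v * v for v in x)


batch = [[1.0] * 256 for _ in range(8)]
latency = mean_latency_ms(dummy_forward, batch, warmup=2, repeats=5)
```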

As shown in Table 3, the proposed model has the smallest number of parameters (0.057 M) and the lowest FLOPs (30.133 M), highlighting its lightweight design. Its inference latency on both CPU and GPU is slightly higher than that of ResNet. This is partly attributable to the frequency-domain processing and Transformer-based components, which introduce additional computational overhead, whereas ResNet consists mainly of convolutional layers that benefit directly from mature hardware acceleration on modern CPU and GPU platforms.

Nevertheless, the inference latency of the proposed model remains within a few milliseconds on both CPU and GPU, which is acceptable for practical UAV signal monitoring applications. Considering its compact model size, moderate computational cost, and consistently superior recognition performance, the proposed method shows strong potential for deployment on resource-constrained platforms, such as edge devices and embedded systems.

To better understand category-level performance, Figure 8 presents the confusion matrices of all models at 0 dB SNR. The proposed model shows strong diagonal dominance, indicating that most samples are correctly classified. Misclassifications are fewer and less severe compared with the baseline models, consistent with the quantitative results.

To provide a feature-level view, Figure 9 visualizes the learned representations using t-SNE. The proposed model produces compact and well-separated clusters, with samples from the same class grouped tightly and clear margins between classes. In contrast, features from the baseline models show noticeable overlap, indicating weaker inter-class separation. This visualization supports the observed quantitative improvements.

In summary, these results show that the proposed model achieves strong closed-set classification. The learned feature space demonstrates compact intra-class clusters and clear inter-class boundaries, which improves accuracy under noisy conditions and provides a solid basis for the subsequent geometry-energy-based open-set recognition.

3.3. Open-Set Recognition Results

To comprehensively evaluate the effectiveness of the proposed GE-OSR framework in open-set recognition tasks, the performance was compared with several baseline methods, including OpenMax [46], ARPL [51], PROSER [61], CSSR [62], and the state-of-the-art S3R [63]. Experiments were conducted along two dimensions: (1) varying levels of openness, and (2) different SNR conditions.

First, the number of known classes was fixed at four, and the SNR was set to 0 dB to examine how the number of unknown classes affects recognition performance. Table 4 presents the AUROC results for each method under different numbers of unknown classes. As the number of unknown classes increases, AUROC values decrease consistently across all methods, reflecting the growing difficulty of open-set recognition at higher openness levels. Among all methods, GE-OSR achieves the highest AUROC scores and exhibits the smallest decline as openness increases. For instance, with six unknown classes, GE-OSR still reaches an AUROC of 96.62 ± 0.85%, demonstrating strong performance even under challenging conditions. The relatively high average AUROC and small standard deviations indicate that GE-OSR is stable and reliable across different openness settings.
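To give a sense of scale for the openness levels discussed above, the sketch below uses the openness measure commonly adopted in the open-set literature, 1 − sqrt(2·N_train / (N_test + N_target)). Whether this exact definition is the one used here is an assumption; the function name is illustrative.

```python
import math


def openness(num_known, num_unknown):
    """Common open-set openness measure (assumed definition).

    Training and target classes are the known classes; test classes are the
    known plus the unknown classes.
    """
    n_train = num_known
    n_target = num_known
    n_test = num_known + num_unknown
    return 1.0 - math.sqrt(2.0 * n_train / (n_test + n_target))


# Four known classes with one to six unknown classes, as in the experiments.
levels = [openness(4, u) for u in range(1, 7)]
```

Under this definition, openness is zero with no unknown classes and grows monotonically as unknown classes are added.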

Table 5 shows the OSCR results under the same experimental setup. As with AUROC, OSCR decreases gradually as the number of unknown classes grows. However, GE-OSR consistently outperforms the other methods, maintaining high OSCR across all openness levels. In particular, when six unknown classes are present, GE-OSR achieves an OSCR of 95.98 ± 0.92%, which is higher than that of all competing approaches, demonstrating a favorable balance between known-class classification and unknown-class rejection.

In the subsequent experiments, the number of known categories was fixed at four and the number of unknown categories at six. Because open-set UAV signal recognition must both classify known categories correctly and reject unknown ones, OSCR, which evaluates both at the same time, is the more suitable metric. Figure 10 therefore shows the OSCR of each method under different SNR conditions, providing a clearer view of how noise affects open-set classification ability.

As shown in Figure 10, the OSCR performance of all models improves with increasing SNR, indicating that noise reduction generally benefits open-set recognition performance. Under low-SNR conditions, the proposed GE-OSR model clearly outperforms the other methods, demonstrating stronger robustness to noise interference. When the SNR becomes moderate or high (≥0 dB), S3R also achieves competitive performance. Nevertheless, GE-OSR still achieves higher performance, reaching an OSCR of about 96% at high SNR. This result suggests that GE-OSR not only performs well under adverse conditions but also exhibits a higher performance upper bound and more stable behavior across different noise environments.

To further illustrate how different methods separate known and unknown samples in the feature space, Figure 11 presents t-SNE visualizations at an SNR of 0 dB.

As shown in Figure 11, the features learned by the proposed GE-OSR framework display the most compact intra-class clusters and the clearest inter-class separations. More importantly, unknown samples are well-isolated from known clusters, forming distinct and independent regions in the feature space. This qualitative visualization provides intuitive support for the superior open-set recognition performance observed in the quantitative results.

3.4. Impact of Threshold Settings on Open-Set Recognition Performance

Threshold selection is critical in open-set recognition because it determines both the classification accuracy of known categories and the rejection rate of unknown signals. To show this trade-off, Figure 12a presents the free-energy distributions for known and unknown samples, and Figure 12b shows how recognition accuracy changes as the rejection threshold varies. Together, these figures illustrate how the adaptive threshold shapes the decision boundary and maintains stable recognition.

As seen in Figure 12a, most of the known samples are located in the low-energy region, while the unknown samples are mainly in the higher-energy region. This shows that there is a clear separation between the known and unknown signals in the energy space. However, some known samples still have relatively high energy. This usually happens when the signal has low SNR, channel distortion, or is close to the class boundary, which makes its features deviate from the class embedding. The dashed line represents the adaptive threshold τ = μ_{t} + β σ_{t}, which is approximately in the valley between the two distributions. This threshold can automatically adjust based on the energy distribution, so it does not need to be set manually and still effectively separates known and unknown samples.
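The thresholding rule τ = μ_t + β σ_t can be sketched as follows. Several details are assumptions made for illustration only: the free energy is written in the common log-sum-exp form over class logits, μ_t and σ_t are tracked as exponential moving averages of known-sample energies, and the values of β and the momentum are arbitrary. The class and function names are hypothetical and do not reproduce the authors' implementation.

```python
import math


def free_energy(logits, temperature=1.0):
    """E(x) = -T * log(sum_k exp(logit_k / T)), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))


class EmaEnergyThreshold:
    """Running threshold tau = mu_t + beta * sigma_t over known-sample energies."""

    def __init__(self, beta=2.0, momentum=0.9):
        self.beta = beta            # assumed value, not taken from the paper
        self.momentum = momentum    # assumed value, not taken from the paper
        self.mean = None
        self.var = 0.0

    def update(self, energies):
        batch_mean = sum(energies) / len(energies)
        batch_var = sum((e - batch_mean) ** 2 for e in energies) / len(energies)
        if self.mean is None:       # first batch initializes the statistics
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = m * self.mean + (1.0 - m) * batch_mean
            self.var = m * self.var + (1.0 - m) * batch_var

    @property
    def tau(self):
        return self.mean + self.beta * math.sqrt(self.var)

    def is_unknown(self, energy):
        return energy > self.tau    # high energy -> reject as unknown


threshold = EmaEnergyThreshold(beta=2.0, momentum=0.9)
threshold.update([1.0, 3.0])        # mean 2.0, variance 1.0 -> tau = 4.0
first_tau = threshold.tau
uniform_energy = free_energy([0.0, 0.0])   # equals -log(2)
```

Because the statistics are updated from the data, no threshold has to be fixed by hand; the momentum controls how quickly τ follows changes in the energy distribution.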

Figure 12b further shows how the recognition performance changes as the threshold varies. At the current threshold, the GE-OSR model achieves 94.00% accuracy on known UAV signals and rejects 98.83% of unknown ones. These results show that the proposed method can achieve a good balance between correctly recognizing known classes and rejecting unfamiliar signals. The open-set confusion matrix is shown in Figure 13.

As shown in Figure 13, the proposed model maintains high classification accuracy across all known UAV categories and successfully rejects approximately 98.8% of unknown samples. Most known classes are clearly separated, although some confusion remains between UAV2 and the unknown category. Specifically, about 22% of UAV2 samples are incorrectly rejected as unknown, and approximately 1% of unknown samples are misclassified as UAV2. This behavior may be caused by partial overlap between the feature distributions of UAV2 and certain unseen signal types in the embedding space.
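The two rates discussed above (accuracy on known signals and rejection rate on unknown ones) follow directly from the thresholded decisions. The sketch below assumes a sample is rejected when its energy exceeds the threshold and uses a hypothetical label `UNKNOWN = -1`; the toy numbers are illustrative, not data from the paper.

```python
UNKNOWN = -1   # hypothetical label for rejected or unseen-class samples


def open_set_predict(closed_set_pred, energy, tau):
    """Keep the closed-set prediction only if the energy is below the threshold."""
    return closed_set_pred if energy <= tau else UNKNOWN


def known_accuracy(y_true, y_pred):
    """Share of known samples that are accepted and assigned the right class."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != UNKNOWN]
    return sum(t == p for t, p in pairs) / len(pairs)


def unknown_rejection_rate(y_true, y_pred):
    """Share of unknown samples that are rejected."""
    preds = [p for t, p in zip(y_true, y_pred) if t == UNKNOWN]
    return sum(p == UNKNOWN for p in preds) / len(preds)


y_true = [0, 0, 1, 1, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN]
closed_preds = [0, 0, 1, 0, 1, 0, 1, 1]
energies = [-5.0, -1.0, -4.0, -6.0, -0.5, 0.0, -3.0, 1.0]
tau = -2.0

y_pred = [open_set_predict(p, e, tau) for p, e in zip(closed_preds, energies)]
acc_known = known_accuracy(y_true, y_pred)
rej_unknown = unknown_rejection_rate(y_true, y_pred)
```

A known sample can be lost in two ways, by a wrong class or by a false rejection, which is the situation reported for UAV2 in Figure 13.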

Despite this limitation, the adaptive energy-thresholding method can still work effectively. It can automatically adjust the decision boundary as the feature distribution changes, ensuring the model’s recognition and rejection abilities remain stable across different noise levels and openness conditions.

3.5. Ablation Studies

To systematically evaluate the contribution of each component in the proposed GE-OSR framework, a series of ablation studies was conducted. Following a step-by-step evaluation strategy, we first examine the feature extraction architecture under the closed-set setting, and then analyze the influence of different open-set modeling components. In this way, the representation learning ability can be evaluated independently of the open-set decision mechanism, making it easier to understand the contribution of each module.

3.5.1. Closed-Set Ablation Study

To evaluate the effectiveness of the proposed conv-Transformer hybrid design, a closed-set ablation study was conducted by comparing different CTBlock variants. The corresponding results are summarized in Table 6. For a fair comparison, all models adopt the same front-end feature extractor and training configuration, and only the internal structure of the CTBlock is changed. Each experiment was repeated five times, and the average classification accuracy along with the standard deviation is reported.

As shown in Table 6, the Conv-only variant achieves an average accuracy of 91.58%, indicating that convolutional operations are effective in capturing local and short-term temporal patterns in UAV RF signals. By contrast, the Transformer-only variant obtains a noticeably lower accuracy of 85.40%, suggesting that relying solely on global self-attention is not sufficient to model fine-grained temporal variations when directly applied to the extracted feature sequences.

For sequential hybrid designs, the Conv-first (convolution followed by self-attention) structure improves the accuracy to 93.46%, demonstrating that applying convolutional feature extraction prior to global attention helps stabilize the input representation for the Transformer. However, the Transformer-first (self-attention followed by convolution) variant performs noticeably worse, with an accuracy of only 81.64%, even lower than the Transformer-only baseline. This result indicates that introducing convolution after self-attention may interfere with the effective utilization of global dependencies and lead to suboptimal feature representations.

The proposed parallel CTBlock achieves the best performance, with an accuracy of 97.34%, consistently outperforming all other variants. By processing local convolutional features and global attention features in parallel and dynamically fusing them through a gated mechanism, the proposed design preserves complementary information from both branches without enforcing a strict processing order. This parallel structure enables the network to jointly model local temporal details and long-range dependencies, resulting in more discriminative feature representations for closed-set UAV signal classification.
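The idea of fusing two parallel branches through a gate can be illustrated with a small numerical sketch. The element-wise sigmoid gate below is an assumed, simplified form chosen for illustration; the function and parameter names are hypothetical, and the snippet does not reproduce the CTBlock itself.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(conv_feat, attn_feat, w_conv, w_attn, bias):
    """Element-wise gate g = sigmoid(w_c*c + w_a*a + b); out = g*c + (1-g)*a.

    Assumption: a per-element gate stands in for a learned gating layer.
    """
    fused = []
    for c, a, wc, wa, b in zip(conv_feat, attn_feat, w_conv, w_attn, bias):
        g = sigmoid(wc * c + wa * a + b)
        fused.append(g * c + (1.0 - g) * a)
    return fused


conv_feat = [0.2, -1.0, 0.5]    # toy local (convolutional) features
attn_feat = [1.0, 0.0, 0.5]     # toy global (self-attention) features
zeros = [0.0, 0.0, 0.0]

# A zero gate input gives g = 0.5, i.e. the plain average of both branches.
balanced = gated_fusion(conv_feat, attn_feat, zeros, zeros, zeros)
# A large positive bias drives g towards 1 and selects the convolutional branch.
conv_dominant = gated_fusion(conv_feat, attn_feat, zeros, zeros, [20.0] * 3)
```

Since the output is a convex combination of the two branches, neither branch has to be processed before the other, which is the property the parallel design relies on.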

Overall, these results validate the design choice of the proposed CTBlock and confirm that parallel conv-Transformer fusion is more effective than either single-branch or sequential hybrid alternatives for modeling UAV RF signals.

3.5.2. Ablation Study on Open-Set Modeling Components

To further analyze how each component of the GE-OSR model contributes to open-set recognition performance, we conducted ablation experiments on three key modules: FCTM, DCEL, and FEAL. These modules were incrementally added to the baseline model to observe their individual influence on AUROC and OSCR.

All experiments were conducted under the same conditions and repeated five times to improve reliability. The mean and standard deviation of the performance are shown in Table 7. For the version without FCTM, we used a 1 × 1 convolutional layer to maintain the same feature dimension, and all other settings remained unchanged.

As shown in Table 7, the basic model achieves an AUROC of 70.09% and an OSCR of 62.35%. After adding FCTM, both metrics increase significantly, by +16.29% in AUROC and +21.80% in OSCR. This shows that time-frequency fusion can extract more useful information and make the model more robust to noise. After adding DCEL, the two metrics increase again (+4.84% AUROC, +5.54% OSCR), indicating that the geometric constraint can bring intra-class features closer and enlarge the inter-class margin. When FEAL is also used, both metrics improve by around 5–6%, indicating that controlling the energy distribution is important for a smoother open-set boundary.
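The intermediate scores implied by these increments can be tallied as below. The sketch assumes that the reported gains are absolute percentage-point increases that add cumulatively; the FEAL step is omitted because its gain is given only approximately.

```python
baseline = {"AUROC": 70.09, "OSCR": 62.35}
reported_gains = [
    ("+FCTM", {"AUROC": 16.29, "OSCR": 21.80}),
    ("+DCEL", {"AUROC": 4.84, "OSCR": 5.54}),
]


def cumulative_scores(baseline, gains):
    """Add each reported gain to the running score (assumed to be additive)."""
    rows = [("baseline", dict(baseline))]
    current = dict(baseline)
    for name, delta in gains:
        current = {m: round(current[m] + delta[m], 2) for m in current}
        rows.append((name, current))
    return rows


rows = cumulative_scores(baseline, reported_gains)
for name, scores in rows:
    print(f"{name:9s} AUROC={scores['AUROC']:.2f}  OSCR={scores['OSCR']:.2f}")
```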

In conclusion, FCTM provides a solid foundation for feature extraction, DCEL helps build better geometric representations across classes, and FEAL clarifies the energy distribution between known and unknown areas. When all three modules are combined, the GE-OSR model demonstrates strong discriminative ability and stable recognition performance across different open-set UAV signal environments.

4. Discussion

The effectiveness of the proposed GE-OSR framework mainly arises from the joint modeling of feature geometry and energy distribution. From the geometric perspective, the DCEL module improves the compactness of feature distribution by pulling samples of the same class closer to their embeddings, so features from the same UAV become more clustered in the embedding space. At the same time, the inter-class constraint enlarges the angular separation between different class embeddings, which helps reduce overlap between classes and improves discrimination, especially under noisy conditions. From the energy perspective, the FEAL module keeps the energy of known samples within a stable, low range, while unknown samples are pushed to higher-energy areas. Samples that are far from all known class embeddings naturally produce higher energy values and are therefore easier to identify as unknown. When these two parts work together, they can build a clear, stable boundary in the geometry-energy space, which helps the model distinguish between known and unseen UAV signal classes more accurately and robustly.
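The geometric and energy intuitions described above can be made concrete with a toy example. Everything in the sketch is an illustrative assumption rather than the DCEL or FEAL definition from the method section: the intra-class term is taken as one minus the cosine similarity to the class embedding, the inter-class term as a hinge on pairwise cosine similarity between embeddings, and the energy as a log-sum-exp over negative squared distances.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intra_class_term(feature, embeddings, label):
    """Small when the feature points in the direction of its class embedding."""
    return 1.0 - cosine(feature, embeddings[label])


def inter_class_term(embeddings, margin=0.0):
    """Mean hinge penalty on pairwise similarity between class embeddings."""
    penalties = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine(embeddings[i], embeddings[j])
            penalties.append(max(0.0, sim - margin))
    return sum(penalties) / len(penalties)


def distance_energy(feature, embeddings, temperature=1.0):
    """Energy from negative squared distances: far from all classes -> high."""
    logits = [
        -sum((a - b) ** 2 for a, b in zip(feature, e)) / temperature
        for e in embeddings
    ]
    m = max(logits)
    return -temperature * (m + math.log(sum(math.exp(z - m) for z in logits)))


embeddings = [[1.0, 0.0], [0.0, 1.0]]      # two toy class embeddings
near_known = [0.9, 0.1]                    # close to class 0
far_unknown = [-5.0, -5.0]                 # far from both embeddings

pull = intra_class_term(near_known, embeddings, label=0)
spread = inter_class_term(embeddings)
energy_known = distance_energy(near_known, embeddings)
energy_unknown = distance_energy(far_unknown, embeddings)
```

In this toy setting the sample far from both embeddings receives a much higher energy than the sample near class 0, which mirrors the separation that the energy-based rejection boundary relies on.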

At the feature representation level, the FCTM module and the CTBlocks work together to improve feature quality. The role of FCTM is to bring global frequency information into temporal feature learning. Specifically, spectral statistics extracted from the input signal are used to modulate time-domain features, so that temporal representations are adjusted according to the overall frequency characteristics of the signal. This process allows the network to adapt its temporal features based on different spectral patterns, rather than treating all signals in the same way. Building on these frequency-aware temporal features, the CTBlocks further model temporal information at different scales. The convolutional branch mainly captures local temporal patterns, such as short-term variations and fine-grained structures, while the self-attention branch focuses on long-range temporal dependencies and global context. By processing these two types of information in parallel, the network can exploit both local and global temporal characteristics without introducing excessive model complexity. Thanks to this design, the proposed model maintains strong recognition performance even under low SNR conditions. At the same time, the model has only about 0.057 million parameters, making it very lightweight and suitable for real-time UAV detection systems, including portable or edge devices often used in smart cities, emergency response, and perimeter surveillance.

However, there are still some limitations. The EMA-based threshold can be slow to react when channel conditions change quickly. The model may also fail to identify signals that are completely different from all known categories. In addition, the inference latency still needs to be reduced to improve the system’s performance in real-time tasks. In the future, we plan to improve the threshold method, enhance generalization to unseen signals, and further simplify the model architecture to meet the strict real-time and operational requirements of practical UAV monitoring applications.

Overall, GE-OSR demonstrates high accuracy, clear interpretability, and strong generalization ability. The model maintains stable open-set recognition performance at low SNRs and with a large number of unknown classes, conditions under which the compared baselines degrade more noticeably. These results indicate that the joint constraint of geometry and energy is an effective strategy for managing complex electromagnetic signal environments. The combination of geometry and energy not only supports robust UAV signal recognition but also provides a promising approach for other intelligent perception tasks in open and dynamic environments, such as anti-drone surveillance, autonomous UAV navigation, and urban airspace management.

5. Conclusions

This paper proposes GE-OSR, an open-set UAV signal recognition framework that jointly models feature geometry and energy distribution. By integrating learnable class embeddings with dual-constraint geometric regularization and free energy alignment, the proposed method achieves stable and reliable discrimination between known and unknown UAV signals. Experimental results demonstrate that GE-OSR consistently outperforms existing methods, achieving up to a 2.96% improvement in OSCR under challenging noise and openness conditions. In addition to its strong recognition performance, GE-OSR is lightweight and interpretable, making it suitable for real-time deployment in UAV monitoring and anti-drone systems. More broadly, the geometry-energy joint modeling strategy introduced in this work provides a promising direction for future intelligent sensing systems. In particular, its ability to distinguish known and unknown UAV signals in complex electromagnetic environments is directly relevant to low-altitude airspace safety, smart city UAV management, and autonomous drone operations.

6. Patents

This work has resulted in the following patent:

Zhou, F.; Long, Y.; Zhou, H.; Ren, H.; Gao, R.; Li, H.; Yi, Z.; Hu, J.; Wang, W. An Open-Set UAV Signal Recognition Method Based on Class Center Learning. China National Intellectual Property Administration (CNIPA), Application No. CN202510455917.9, Publication No. CN120408403A, filed on 11 April 2025, published on 1 August 2025.

Author Contributions

Conceptualization, Y.L. and H.Z.; data curation, Y.L. and H.Z.; formal analysis, W.Y. and H.R.; Funding acquisition, H.Z.; investigation, Y.L.; methodology, Y.L.; project administration, W.Y.; resources, H.Z.; supervision, H.Z. and F.Z.; validation, Y.L., W.Y. and H.R.; visualization, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, H.Z., W.Y., H.R., F.Z. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under grant numbers U2541202, 62231027, and 62471385.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to confidentiality and security restrictions.

Acknowledgments

The authors would like to acknowledge the 36th Research Institute of China Electronics Technology Group Corporation (CETC36) for its support in providing the experimental facilities and equipment for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Acc	Accuracy
AUROC	Area under the receiver operating characteristic curve
CTBlock	Convolutional-Transformer hybrid block
DCEL	Dual-constraint embedding loss
FEAL	Free energy alignment loss
GE-OSR	Geometry-energy open-set recognition
IQ	In-phase and quadrature
OSCR	Open-set classification rate
OSR	Open-set recognition
RF	Radio frequency
SNR	Signal-to-noise ratio
TFFM	Time-frequency feature merging
UAV	Unmanned aerial vehicle

References

Maddikunta, P.K.R.; Hakak, S.; Alazab, M.; Bhattacharya, S.; Gadekallu, T.R.; Khan, W.Z. Unmanned Aerial Vehicles in Smart Agriculture: Applications, Requirements, and Challenges. IEEE Sens. J. 2021, 21, 17608–17619.
Dong, H.; Dong, J.; Sun, S.; Bai, T.; Zhao, D.; Yin, Y.; Shen, X.; Wang, Y.; Zhang, Z.; Wang, Y. Crop Water Stress Detection Based on UAV Remote Sensing Systems. Agric. Water Manag. 2024, 303, 109059.
Surman, K.; Lockey, D. Unmanned Aerial Vehicles and Pre-Hospital Emergency Medicine. Scand. J. Trauma Resusc. Emerg. Med. 2024, 32, 9.
Saunders, J.; Saeedi, S.; Li, W. Autonomous Aerial Robotics for Package Delivery: A Technical Review. J. Field Robot. 2024, 41, 3–49.
Duan, Q.; Chen, B.; Luo, L. Rapid and Automatic UAV Detection of River Embankment Piping. Water Resour. Res. 2025, 61, e2024WR038931.
Tlili, F.; Ayed, S.; Fourati, L.C. Advancing UAV Security with Artificial Intelligence: A Comprehensive Survey of Techniques and Future Directions. Internet Things 2024, 27, 101281.
Medaiyese, O.O.; Ezuma, M.; Lauf, A.P.; Guvenc, I. Wavelet Transform Analytics for RF-Based UAV Detection and Identification System Using Machine Learning. Pervasive Mob. Comput. 2022, 82, 101569.
Anwar, M.Z.; Kaleem, Z.; Jamalipour, A. Machine Learning Inspired Sound-Based Amateur Drone Detection for Public Safety Applications. IEEE Trans. Veh. Technol. 2019, 68, 2526–2534.
Chu, H.; Zhang, D.; Shao, Y.; Chang, Z.; Guo, Y.; Zhang, N. Using HOG Descriptors and UAV for Crop Pest Monitoring. In Proceedings of the Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018; pp. 1516–1519.
Zhang, P.; Yang, L.; Chen, G.; Li, G. Classification of Drones Based on Micro-Doppler Signatures with Dual-Band Radar Sensors. In Proceedings of the Progress in Electromagnetics Research Symposium—Fall (PIERS—FALL), Singapore, 19–22 November 2017; pp. 638–643.
Nie, W.; Han, Z.; Zhou, M.; Xie, L.; Jiang, Q. UAV Detection and Identification Based on WiFi Signal and RF Fingerprint. IEEE Sens. J. 2021, 21, 13540–13550.
Wang, X. Electronic Radar Signal Recognition Based on Wavelet Transform and Convolution Neural Network. Alex. Eng. J. 2022, 61, 3559–3569.
Gu, Z.; Ma, Q.; Gao, X.; You, J.; Cui, T. Direct Electromagnetic Information Processing with Planar Diffractive Neural Network. Sci. Adv. 2024, 10, eado3937.
Chu, T.; Zhou, H.; Ren, Z.; Ye, Y.; Wang, C.; Zhou, F. Intelligent Detection of Low-Slow-Small Targets Based on Passive Radar. Remote Sens. 2025, 17, 961.
Cucchi, M.; Gruener, C.; Petrauskas, L.; Steiner, P.; Tseng, H.; Fischer, A.; Penkovsky, B.; Matthus, C.; Birkholz, P.; Kleemann, H.; et al. Reservoir Computing with Biocompatible Organic Electrochemical Networks for Brain-Inspired Biosignal Classification. Sci. Adv. 2021, 7, eabh0693.
Zhou, H.; Bai, J.; Wang, Y.; Jiao, L.; Zheng, S.; Shen, W.; Xu, J.; Yang, X. Few-Shot Electromagnetic Signal Classification: A Data Union Augmentation Method. Chin. J. Aeronaut. 2022, 35, 49–57.
Sun, Z.; Wang, Z.; Wang, M. SLTRN: Sample-Level Transformer-Based Relation Network for Few-Shot Classification. Neural Netw. 2024, 176, 106344.
Liu, Z.; Pei, W.; Lan, D.; Ma, Q. Diffusion Language-Shapelets for Semi-Supervised Time-Series Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 14079–14087.
Zhou, H.; Jiao, L.; Zheng, S.; Yang, L.; Shen, W.; Yang, X. Generative Adversarial Network-Based Electromagnetic Signal Classification: A Semi-Supervised Learning Framework. China Commun. 2020, 17, 157–169.
Liu, J.; Chen, S. TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 13918–13926.
Hou, K.; Du, X.; Cui, G.; Chen, X.; Zheng, J.; Rong, Y. A Hybrid Network-Based Contrastive Self-Supervised Learning Method for Radar Signal Modulation Recognition. IEEE Trans. Veh. Technol. 2025, 1–15.
Zhou, H.; Wang, X.; Bai, J.; Xiao, Z. Modulation Signal Recognition Based on Selective Knowledge Transfer. In Proceedings of the GLOBECOM 2022–2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 1875–1880.
Wang, M.; Lin, Y.; Tian, Q.; Si, G. Transfer Learning Promotes 6G Wireless Communications: Recent Advances and Future Challenges. IEEE Trans. Reliab. 2021, 70, 790–807.
Akter, R.; Doan, V.; Thien, H.; Kim, D. RFDOA-Net: An Efficient ConvNet for RF-Based DOA Estimation in UAV Surveillance Systems. IEEE Trans. Veh. Technol. 2021, 70, 12209–12214.
Lofù, D.; Gennaro, P.D.; Tedeschi, P.; Noia, T.D.; Sciascio, E.D. URANUS: Radio Frequency Tracking, Classification and Identification of Unmanned Aircraft Vehicles. IEEE Open J. Veh. Technol. 2023, 4, 921–935.
Cai, Z.; Wang, Y.; Jiang, Q.; Gui, G.; Sha, J. Toward Intelligent Lightweight and Efficient UAV Identification with RF Fingerprinting. IEEE Internet Things J. 2024, 11, 26329–26339.
Ouamna, H.; Kharbouche, A.; Madini, Z.; Zouine, Y. Deep Learning-Assisted Automatic Modulation Classification Using Spectrograms. Eng. Technol. Appl. Sci. Res. 2025, 15, 19925–19932.
Ozturk, E.; Erden, F.; Guvenc, I. RF-Based Low-SNR Classification of UAVs Using Convolutional Neural Networks. ITU J. Future Evol. Technol. 2021, 2, 39–52.
Zhang, C.; Luo, J.; Shi, K.; Liu, T.; Ling, C. Multi-Channel Physical Feature Convolution and Tri-Branch Fusion Network for Automatic Modulation Recognition. Electronics 2025, 14, 4847.
Dong, N.; Jiang, H.; Liu, Y.; Zhang, J. Intrapulse Modulation Radar Signal Recognition Using CNN with Second-Order STFT-Based Synchrosqueezing Transform. Remote Sens. 2024, 16, 2582.
De Vries, H.; Strub, F.; Mary, J.; Larochelle, H.; Pietquin, O.; Courville, A. Modulating Early Visual Processing by Language. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 6594–6604. [Google Scholar]
Perez, E.; Strub, F.; De Vries, H.; Dumoulin, V.; Courville, A. FiLM: Visual Reasoning with a General Conditioning Layer. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021. [Google Scholar]
Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying Convolution and Attention for All Data Sizes. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021. [Google Scholar]
Gulati, A.; Qin, J.; Chiu, C.-C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-Augmented Transformer for Speech Recognition. In Proceedings of the Interspeech, Shanghai, China, 25–29 October 2020. [Google Scholar]
Xu, S.; Zhang, D.; Lu, Y.; Xing, Z.; Ma, W. MCCSAN: Automatic Modulation Classification via Multiscale Complex Convolution and Spatiotemporal Attention Network. Electronics 2025, 14, 3192. [Google Scholar] [CrossRef]
Huynh-The, T.; Pham, Q.-V.; Nguyen, T.-V.; Costa, D.B.D.C.; Kim, D.-S. RF-UAVNet: High-Performance Convolutional Network for RF-Based Drone Surveillance Systems. IEEE Access 2022, 10, 49696–49707. [Google Scholar] [CrossRef]
Dhakal, R.; Kandel, L.N. Physical-Layer-Based UAV Identification: A Comprehensive Review. J. Aerosp. Inf. Syst. 2025, 22, 624–643. [Google Scholar] [CrossRef]
Zhang, H.; Zhou, H.; Wang, L.; Zhou, F. Spatial Distribution Feature Extraction Network for Open Set Recognition of Electromagnetic Signal. Comput. Model. Eng. Sci. 2024, 139, 279–296. [Google Scholar] [CrossRef]
Scheirer, W.J.; Rocha, A.d.R.; Sapkota, A.; Boult, T.E. Toward Open Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1757–1772. [Google Scholar] [CrossRef]
Scheirer, W.J.; Jain, L.P.; Boult, T.E. Probability Models for Open Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2317–2324. [Google Scholar] [CrossRef]
Zhang, H.; Patel, V.M. Sparse Representation-Based Open Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1690–1696. [Google Scholar] [CrossRef] [PubMed]
Júnior, P.R.M.; de Souza, R.M.; Werneck, R.d.O.; Oliveira, L.S.; Papa, J.P. Nearest Neighbors Distance Ratio Open-Set Classifier. Mach. Learn. 2017, 106, 359–386. [Google Scholar] [CrossRef]
Bendale, A.; Boult, T.E. Towards Open Set Deep Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1563–1572. [Google Scholar]
Neal, L.; Olson, M.; Fern, X.; Wong, W.; Li, F. Open Set Learning with Counterfactual Images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 620–635. [Google Scholar]
Yang, Y.; Hou, C.; Lang, Y.; Guan, D.; Huang, D.; Xu, J. Open-Set Human Activity Recognition Based on Micro-Doppler Signatures. Pattern Recognit. 2019, 85, 60–69. [Google Scholar] [CrossRef]
Oza, P.; Patel, V.M. C2AE: Class Conditioned Auto-Encoder for Open-Set Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2302–2311. [Google Scholar]
Chen, G.; Zhang, Y.; Liang, X.; Liu, Y.; Lin, L. Learning Open Set Network with Discriminative Reciprocal Points. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 507–522. [Google Scholar]
Chen, G.; Peng, P.; Wang, X.; Tian, Y. Adversarial Reciprocal Points Learning for Open Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8065–8081. [Google Scholar] [CrossRef]
Geng, C.; Chen, S. Collective Decision for Open Set Recognition. IEEE Trans. Knowl. Data Eng. 2022, 34, 192–204. [Google Scholar] [CrossRef]
Wang, H.; Pang, G.; Wang, P.; Zhang, L.; Wei, W.; Zhang, Y. Glocal Energy-Based Learning for Few-Shot Open-Set Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7507–7516. [Google Scholar]
D’Incà, M.; Peruzzo, E.; Mancini, M.; Xu, D.; Goel, V.; Xu, X.; Wang, Z.; Shi, H.; Sebe, N. OpenBias: Open-set Bias Detection in Text-to-Image Generative Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 12225–12235. [Google Scholar]
Zhu, L.; Yang, Y.; Xu, F.; Lu, X.; Shuai, M.; An, Z.; Chen, X.; Li, H.; Martin, F.L.; Vikesland, P.J.; et al. Open-Set Deep Learning-Enabled Single-Cell Raman Spectroscopy for Rapid Identification of Airborne Pathogens in Real-World Environments. Sci. Adv. 2025, 11, eadp7991. [Google Scholar] [CrossRef]
Liu, W.; Wang, X.; Owens, J.D.; Li, Y. Energy-based out-of-distribution detection. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Chung, J.; Gulcehre, C.; Cho, K.H.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
Xu, J.; Luo, C.; Parr, G.; Luo, Y. A Spatiotemporal Multi-Channel Learning Framework for Automatic Modulation Recognition. IEEE Wirel. Commun. Lett. 2020, 9, 1629–1632. [Google Scholar] [CrossRef]
Shao, M.; Li, D.; Hong, S.; Qi, J.; Sun, H. IQFormer: A Novel Transformer-Based Model with Multi-Modality Fusion for Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1623–1634. [Google Scholar] [CrossRef]
Zhou, D.; Ye, H.; Zhan, D. Learning Placeholders for Open-Set Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4399–4408. [Google Scholar]
Huang, H.; Wang, Y.; Hu, Q.; Cheng, M. Class-Specific Semantic Reconstruction for Open Set Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4214–4228. [Google Scholar] [CrossRef]
Yu, N.; Wu, J.; Zhou, C.; Shi, Z.; Chen, J. Open Set Learning for RF-Based Drone Recognition via Signal Semantics. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9894–9909. [Google Scholar] [CrossRef]

Figure 1. Conceptual illustration of open-set recognition: (a) Closed-set recognition: all test samples belong to known classes; (b) Closed-set classifier encountering unknown samples: unknown data are incorrectly assigned to the nearest known class; (c) Open-set recognition: known samples are correctly classified while unknown ones are rejected. The black dashed boundaries represent known classes, whereas the red dashed boundaries represent unknown classes.

Figure 2. Architecture of the proposed time-frequency feature extraction network.

Figure 3. Structure of the frequency-conditioned temporal modulation module.

Figure 4. Structure of the convolutional-Transformer hybrid block.

Figure 5. Overall architecture of the geometry-energy open-set recognition framework. The “?” symbol and gray color denote unknown classes, whereas other colors denote known classes. The class embeddings and the threshold $τ$ are learned during training and fixed for inference.

Figure 6. The data acquisition process.

Figure 7. Closed-set classification accuracy of different models under varying SNR conditions.

Figure 8. Confusion matrices of different models at 0 dB SNR: (a) ResNet; (b) GRU; (c) MCLDNN; (d) IQformer; (e) Proposed model. The proposed approach shows greater diagonal dominance and fewer inter-class confusions than the baselines.

Figure 9. t-SNE visualization of the feature space at 0 dB SNR: (a) ResNet; (b) GRU; (c) MCLDNN; (d) IQformer; (e) Proposed model. The proposed approach yields more compact, separable feature clusters than baseline networks.

Figure 10. OSCR performance curves of different methods under varying SNR conditions.

Figure 11. Feature visualization of known and unknown classes across different open-set recognition methods: (a) OpenMax; (b) ARPL; (c) PROSER; (d) CSSR; (e) S3R; (f) Proposed GE-OSR. The proposed GE-OSR produces compact intra-class clusters and well-separated inter-class boundaries, effectively isolating unknown samples from known feature regions.

Figure 12. Effect of threshold settings on open-set recognition performance: (a) Free energy distributions of known and unknown samples; (b) Accuracy-rejection rate curves across different thresholds.
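
The threshold behavior summarized in Figure 12 can be illustrated with a minimal sketch. It assumes the standard log-sum-exp form of free energy used in energy-based out-of-distribution detection and the temperature $T$ = 10.0 from Table 1; the logits, the threshold value, and the function names `free_energy` and `open_set_predict` are hypothetical, and the scores the paper actually feeds into the energy may be constructed differently.

```python
import numpy as np

def free_energy(logits, temperature=10.0):
    """Free energy E(x) = -T * log(sum_k exp(logit_k / T)), computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)               # subtract the max to avoid overflow
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def open_set_predict(logits, tau, temperature=10.0):
    """Return (labels, energies); label -1 marks a sample rejected as unknown."""
    logits = np.asarray(logits, dtype=float)
    energy = free_energy(logits, temperature)
    labels = np.where(energy > tau, -1, logits.argmax(axis=-1))
    return labels, energy

# Hypothetical logits: one confident sample and one flat (ambiguous) sample.
example_logits = np.array([[10.0, 0.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0, 0.0]])
labels, energy = open_set_predict(example_logits, tau=-15.0)  # tau chosen for illustration
print(labels, energy)
```

In this toy case the confident sample receives the lower energy and is kept, while the flat one exceeds the threshold and is rejected. Moving `tau` trades accuracy on known classes against the rejection rate, which is the trade-off Figure 12(b) plots.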

Figure 13. Open-set recognition confusion matrix.

Table 1. Experimental settings.

Parameter	Value
$α$	32
$δ$	0.1
$T$	10.0
$E_{0}$	−0.1
$λ_{1}$	0.3
$λ_{2}$	1.0
$β$	0.2
batch size	128
optimizer	AdamW
learning rate	0.001

Table 2. Closed-set classification performance of different models at 0 dB SNR.

Model	Acc (%)	F1-Score (%)
ResNet	94.21 ± 1.13	94.30 ± 1.11
GRU	83.24 ± 1.91	83.02 ± 2.08
MCLDNN	95.04 ± 0.98	95.14 ± 1.13
IQformer	88.37 ± 0.36	88.38 ± 0.41
Proposed	97.34 ± 0.41	97.32 ± 0.47

Table 3. Computational complexity and inference latency of different models.

Model	Parameters (M)	FLOPs (M)	CPU Latency (ms)	GPU Latency (ms)
ResNet	0.202	78.226	4.294	0.786
GRU	0.159	620.766	362.403	4.603
MCLDNN	0.241	1068.936	97.277	7.716
IQformer	0.072	144.636	38.118	2.595
Proposed	0.057	30.133	10.911	0.924

Table 4. AUROC (%) Results across Openness Levels (4 Known Classes, 0 dB SNR). Columns 1 to 6 give the number of unknown classes.

Method	1	2	3	4	5	6	Average
OpenMax	97.21 ± 1.56	94.34 ± 1.22	93.82 ± 1.21	91.94 ± 1.59	89.15 ± 1.81	87.81 ± 1.70	92.38 ± 3.48
ARPL	97.57 ± 1.07	93.39 ± 2.96	93.93 ± 2.39	91.97 ± 2.98	92.65 ± 2.68	90.18 ± 1.69	93.28 ± 2.98
PROSER	94.03 ± 1.78	91.03 ± 1.91	91.10 ± 2.12	90.24 ± 2.04	90.05 ± 2.31	89.94 ± 2.40	91.04 ± 2.52
CSSR	97.11 ± 1.11	93.48 ± 1.52	93.39 ± 1.42	92.14 ± 1.62	89.64 ± 1.57	89.32 ± 1.76	92.45 ± 3.15
S3R	97.67 ± 0.78	95.17 ± 1.18	95.66 ± 1.51	95.28 ± 1.06	95.01 ± 1.09	93.82 ± 1.55	95.48 ± 1.39
GE-OSR	98.26 ± 1.09	97.04 ± 0.76	96.98 ± 0.58	96.79 ± 0.64	96.64 ± 0.98	96.62 ± 0.85	97.05 ± 1.01
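
To relate the column index of Table 4 to a numeric openness value, the following sketch uses the widely cited definition openness = 1 − sqrt(2·N_train / (N_target + N_test)). Whether the paper adopts exactly this definition is an assumption, as are the function and variable names; only the counts (4 known classes, 1 to 6 unknown classes) come from the table.

```python
import math

def openness(n_train, n_target, n_test):
    """Openness = 1 - sqrt(2 * n_train / (n_target + n_test)).

    n_train:  number of classes seen in training
    n_target: number of classes to be identified at test time
    n_test:   number of classes present at test time (known + unknown)
    """
    return 1.0 - math.sqrt(2.0 * n_train / (n_target + n_test))

KNOWN = 4  # known classes, as in Table 4
levels = {u: openness(KNOWN, KNOWN, KNOWN + u) for u in range(1, 7)}
for u, value in levels.items():
    print(f"{u} unknown classes -> openness {value:.4f}")
```

Under this assumed definition, openness grows monotonically from roughly 0.057 with one unknown class to roughly 0.244 with six, so the rightmost columns of the table correspond to the hardest setting.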

Table 5. OSCR (%) Results across Openness Levels (4 Known Classes, 0 dB SNR). Columns 1 to 6 give the number of unknown classes.

Method	1	2	3	4	5	6	Average
OpenMax	95.38 ± 0.22	93.35 ± 0.35	91.32 ± 0.36	90.02 ± 0.66	85.94 ± 1.63	84.57 ± 1.93	90.10 ± 3.92
ARPL	97.08 ± 0.94	92.32 ± 2.74	93.04 ± 2.33	91.60 ± 3.67	92.06 ± 2.54	89.37 ± 2.28	92.58 ± 3.21
PROSER	92.72 ± 2.52	88.89 ± 2.92	89.75 ± 2.48	89.98 ± 2.61	89.55 ± 2.51	89.58 ± 1.28	90.08 ± 2.69
CSSR	95.28 ± 0.67	92.36 ± 1.59	91.59 ± 1.11	91.17 ± 2.12	88.97 ± 0.70	86.71 ± 1.84	90.93 ± 3.33
S3R	97.00 ± 1.02	94.67 ± 1.07	94.90 ± 1.79	94.69 ± 0.88	94.52 ± 1.39	93.02 ± 0.65	94.80 ± 1.69
GE-OSR	97.26 ± 0.80	96.21 ± 0.54	96.55 ± 1.01	96.40 ± 1.21	96.18 ± 1.17	95.98 ± 0.92	96.43 ± 1.04
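
For readers who want to reproduce metrics of the kind reported in Tables 4 and 5, the sketch below shows one common way to compute AUROC and OSCR from per-sample scores. It assumes a score where higher means "more likely known" (for example, negative free energy) and trapezoidal integration of the correct-classification-rate versus false-positive-rate curve; the evaluation code used in the paper is not shown here, and all names and toy values are hypothetical.

```python
import numpy as np

def auroc(known_scores, unknown_scores):
    """Probability that a known sample outscores an unknown one (ties count 0.5)."""
    k = np.asarray(known_scores, dtype=float)[:, None]
    u = np.asarray(unknown_scores, dtype=float)[None, :]
    return float(((k > u) + 0.5 * (k == u)).mean())

def oscr(known_scores, known_correct, unknown_scores):
    """Area under the CCR-vs-FPR curve obtained by sweeping the acceptance threshold.

    CCR: fraction of known samples that are accepted AND correctly classified.
    FPR: fraction of unknown samples that are accepted.
    """
    k = np.asarray(known_scores, dtype=float)
    c = np.asarray(known_correct, dtype=bool)
    u = np.asarray(unknown_scores, dtype=float)
    thresholds = np.unique(np.concatenate([k, u]))[::-1]  # sweep from high to low
    ccr, fpr = [0.0], [0.0]
    for t in thresholds:
        ccr.append(float(np.mean(c & (k >= t))))
        fpr.append(float(np.mean(u >= t)))
    ccr, fpr = np.array(ccr), np.array(fpr)
    # Trapezoidal rule over the (FPR, CCR) points.
    return float(np.sum((fpr[1:] - fpr[:-1]) * (ccr[1:] + ccr[:-1]) / 2.0))

# Toy scores: knowns are perfectly separated from unknowns, one known is misclassified.
toy_known = [0.9, 0.8, 0.7]
toy_correct = [True, True, False]
toy_unknown = [0.2, 0.1]
toy_auroc = auroc(toy_known, toy_unknown)
toy_oscr = oscr(toy_known, toy_correct, toy_unknown)
print(toy_auroc, toy_oscr)
```

Because OSCR only credits known samples that are both accepted and correctly classified, it is bounded by closed-set accuracy and penalizes misclassification that AUROC alone would not reveal. This is why the two tables are reported side by side.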

Table 6. Closed-set classification performance of different CTBlock variants at 0 dB SNR.

CTBlock Variant	Conv Branch	Transformer Branch	Fusion Type	Accuracy (%)
Conv-only	√	×	-	91.58 ± 1.02
Transformer-only	×	√	-	85.40 ± 1.39
Conv-first	√	√	Sequential	93.46 ± 0.59
Transformer-first	√	√	Sequential	81.64 ± 1.01
Proposed	√	√	Parallel	97.34 ± 0.41

Table 7. Ablation study of the GE-OSR framework.

FCTM	DCEL	FEAL	AUROC (%)	OSCR (%)
×	×	×	70.09 ± 1.80	62.35 ± 1.34
√	×	×	86.38 ± 1.33	84.15 ± 1.45
√	√	×	91.22 ± 1.17	89.69 ± 1.02
√	√	√	96.62 ± 0.85	95.98 ± 0.92

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
