1.1. Background
With the explosive growth of emerging applications such as the Internet of Things (IoT) and unmanned aerial vehicle (UAV) communication, the density of devices and users in wireless communication systems has increased sharply.
Figure 1 is a schematic diagram of spectrum resource usage in the context of the IoT. As the figure shows, various devices such as intelligent transportation systems and UAVs operate simultaneously to achieve effective information transmission across industries. During transmission, on the one hand, mutual interference between devices occurs frequently because of the inherent scarcity of wireless spectrum resources, the sharing of overlapping frequency bands by multiple systems or users, and the non-ideal characteristics of transmitting and receiving devices, such as frequency offset, phase noise, and nonlinear distortion. On the other hand, competition for spectrum resources among devices and users has intensified, which aggravates the interference. This not only causes spectrum congestion but also introduces significant uncertainty in frequency band occupancy, ultimately worsening the shortage of radio spectrum resources. To cope with this challenge, Cognitive Radio (CR) technology has emerged, which allows primary and secondary users to share spectrum resources [1]. As one of the key technologies of CR, Spectrum Sensing (SS) provides important support for spectrum reuse by monitoring spectrum usage in real time [2].
Traditional spectrum sensing methods include the Energy Detector (ED) [3], the Cyclostationary Feature Detector (CFD) [4], and the Matched Filtering Detector (MFD) [5]. The ED decides whether the spectrum is occupied by measuring the energy of the received signal; it is simple to implement and detects independent, uncorrelated signals well. The CFD exploits the periodicity of the signal for spectrum sensing and performs well on periodic signals. The MFD identifies signals by comparing them with known signal templates. These traditional methods achieve good detection results, but the following problems remain: (1) weak sensing of signals in complex noise environments, which makes misjudgment likely; (2) poor performance on non-periodic signals, which require a longer sensing time and thus reduce sensing efficiency; (3) inability to sense effectively in complex electromagnetic environments such as the IoT.
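As a minimal illustration of the ED principle described above, the sketch below compares the mean energy of the received samples with a threshold. The function name `energy_detect`, the synthetic test signals, and the threshold value are assumptions made for this example only; in practice the threshold is usually derived from the noise power and a target false-alarm probability.

```python
import math
import random


def energy_detect(samples, threshold):
    """Return (occupied, energy), where energy is the mean of |s|^2."""
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    return energy > threshold, energy


# Hypothetical demo data: unit-variance Gaussian noise, with and without a tone.
rng = random.Random(0)
n = 2000
noise_only = [rng.gauss(0.0, 1.0) for _ in range(n)]
with_signal = [rng.gauss(0.0, 1.0) + 2.0 * math.sin(0.2 * math.pi * k) for k in range(n)]

threshold = 1.5  # assumed value, chosen between the noise power and the signal-plus-noise power

print(energy_detect(noise_only, threshold))
print(energy_detect(with_signal, threshold))
```

The detector needs no knowledge of the signal structure, which is why it is simple to implement, but its decision depends entirely on how well the threshold matches the actual noise level.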
Deep Learning (DL) has been widely used in the IoT owing to its powerful feature extraction capability and nonlinear expressiveness [6]. Existing DL methods mainly include convolutional neural networks [7], recurrent neural networks [8], and attention mechanisms [9]. These methods have achieved significant results in IoT communication tasks such as spectrum sensing [10] and modulation mode identification [11]. In general, neural network methods require a large number of labeled samples for the model to generalize well and remain robust. However, with the rapid development of communication technology, the electromagnetic signals in the wireless space are becoming increasingly diverse and sophisticated, which makes samples difficult to acquire and annotation labor-intensive. As a result, the network model cannot be trained adequately, which seriously degrades spectrum sensing performance.
Few-shot Learning (FSL) [12] aims to enable a model to adapt quickly to new tasks from a limited dataset, improving its generalization ability and learning efficiency on new tasks. It can therefore cope with spectrum sensing of wireless signals across multiple scenarios when data samples are scarce. Currently, the most effective way to implement FSL is meta-learning [13], which can be broadly classified into three categories: optimization-based [14], migration-based [15], and metric-based [16] methods. Among them, metric learning is widely used in many fields owing to its simple structure and efficient processing; it mainly includes the matching network [17,18], the prototypical network [19,20], and the relation network [21,22]. The relation network has become a research hotspot owing to its ability to capture nonlinear relationships and its good generalization performance.
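To make the idea of metric-based few-shot classification concrete, the following sketch shows the nearest-prototype rule used by prototypical networks: each class is represented by the mean of its support embeddings, and a query is assigned to the closest prototype. The embeddings and labels are hypothetical and would normally come from a trained feature extractor. A relation network replaces the fixed squared Euclidean distance used here with a learned comparison module.

```python
def prototype(embeddings):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Return the label whose class prototype is nearest to the query.

    `support` maps each label to its list of support embeddings.
    """
    prototypes = {label: prototype(embs) for label, embs in support.items()}
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-way 2-shot episode with 2-D embeddings.
support = {
    "occupied": [[1.0, 0.9], [0.8, 1.1]],
    "idle": [[0.0, 0.1], [0.2, -0.1]],
}
print(classify([0.7, 0.8], support))
print(classify([0.1, 0.2], support))
```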
1.2. Related Work
In recent years, research teams have conducted extensive studies on spectrum sensing and signal recognition in few-shot scenarios, focusing on directions such as metric learning optimization and feature enhancement. This section analyzes existing methods from two perspectives, multi-scale feature utilization capability and frequency-domain feature enhancement design, to clarify the progress and shortcomings of each method in these two core technologies.
Lim et al. [23] proposed the SSL-ProtoNet network architecture, which integrates self-supervised learning and knowledge distillation into the prototypical network and is optimized in three stages. In the pre-training stage, augmented tasks with unlabeled samples are used, and discriminative features are learned via contrastive learning. In the fine-tuning stage, the pre-trained parameters serve as initial weights, and dual losses are fused to prevent overfitting. In the self-distillation stage, a well-trained “teacher model” guides the training of a “student model” through soft labels to improve generalization ability. However, SSL-ProtoNet relies solely on single-scale features for self-supervised clustering and lacks a multi-scale feature extraction mechanism, so it cannot capture the complementary features of wireless signals at different scales. In addition, this method focuses entirely on the self-supervised optimization of spatial features and ignores the frequency-domain properties of wireless signals. When the frequency bands of multiple transmitters overlap, it has difficulty distinguishing the individual transmitter components in the mixed signal.
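The role of soft labels in the self-distillation stage can be illustrated with a generic knowledge-distillation loss. This is a sketch under stated assumptions, not the exact loss of SSL-ProtoNet: the temperature value, the logits, and the function names are hypothetical, and the loss shown is the plain cross-entropy between temperature-softened teacher and student distributions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft labels and the student's prediction."""
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, student_probs))


# Hypothetical logits for a 3-class task.
teacher = [3.0, 1.0, 0.2]
agreeing_student = [3.0, 1.0, 0.2]
disagreeing_student = [0.2, 1.0, 3.0]
print(distillation_loss(agreeing_student, teacher))
print(distillation_loss(disagreeing_student, teacher))
```

The loss is smallest when the student reproduces the teacher's distribution, so minimizing it transfers the relative class similarities encoded in the soft labels rather than only the top-1 decision.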
Feng et al. [24] proposed a Global Information Embedding Network (GIEN), designing a dedicated global feature extraction module for few-shot learning tasks. The core of GIEN is the Global Information Embedding (GIE) module, which performs frequency-domain modulation on the extracted local features using learnable real-valued frequency filters, embedding the global frequency distribution information of signals into the local features. This is a valuable attempt at frequency-domain feature enhancement and partially compensates for the insufficient global feature modeling capability of CNNs. Nevertheless, this method designs the global information embedding mechanism only for single-scale features and does not consider the multi-scale characteristics of wireless signals. In the Internet of Things, the frequency distribution of signals from different transmitters varies significantly, and the features of signals from the same transmitter at different scales are complementary. A single-scale design cannot fully extract these multi-scale features.
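The frequency-domain modulation performed by the GIE module can be sketched as "transform, reweight, transform back". The example below is a simplified one-dimensional illustration and not the authors' implementation: it uses a naive DFT from the standard library, and the filter weights, which would be learnable parameters in a trained network, are fixed hypothetical values.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_filter(features, weights):
    """Scale each frequency bin of a real feature vector by a real-valued weight.

    The weights are assumed symmetric (weights[k] == weights[n - k]) so that
    the result is real; only the real part of the inverse transform is kept.
    """
    filtered = [w * s for w, s in zip(weights, dft(features))]
    return [value.real for value in idft(filtered)]


features = [1.0, 2.0, 3.0, 4.0]
# Hypothetical filter that keeps only the zero-frequency (global mean) component.
print(frequency_filter(features, [1.0, 0.0, 0.0, 0.0]))
```

Because every frequency bin depends on all input positions, reweighting a single bin changes every output position, which is how a frequency filter injects global information into local features.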
Zhang et al. [25] constructed a semi-supervised few-shot framework for wireless signal recognition, combining a Deep Residual Shrinkage Network (DRSN) with a semi-supervised strategy to form a technical process of “noisy feature extraction, semi-supervised fusion, and few-shot classification”. The DRSN uses a soft threshold function to suppress noise and learn discriminative features, while the modular semi-supervised method fuses labeled and unlabeled data via MixMatch to improve classification performance under few-shot conditions. However, this method relies on fixed layers of the deep residual network for feature extraction and lacks a dedicated multi-scale parallel extraction structure, so it cannot actively capture the key information of signals at different scales. Meanwhile, it focuses only on noise suppression of time-domain signals and does not design an enhancement mechanism for the multi-scale frequency-domain features of signals. Time-domain noise suppression alone cannot distinguish the frequency-domain features of different transmitters, which limits feature discrimination ability.
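The soft threshold function mentioned above has a simple closed form: values whose magnitude does not exceed the threshold are set to zero, and all other values are shrunk toward zero by the threshold. The sketch below applies it element-wise; the threshold `tau` is a fixed hypothetical value here, whereas a trained network would supply it.

```python
def soft_threshold(x, tau):
    """Soft thresholding: sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def denoise(features, tau):
    """Apply soft thresholding to every element of a feature vector."""
    return [soft_threshold(v, tau) for v in features]


# Small entries (assumed to be noise) vanish; large entries are shrunk by tau.
print(denoise([3.0, -0.5, 0.5, -3.0], tau=1.0))
```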
Hao et al. [26] proposed a meta-learning-based Multi-Frequency Multiplier ResNet method, referred to as M-MFOR, for few-shot automatic modulation classification. The method designs ResNet branches with different sampling rates to extract the low-frequency coarse-grained features and the high-frequency fine-grained features of signals, respectively. It then integrates the multi-frequency features through a feature fusion module, preserving feature integrity while reducing computational complexity. Compared with traditional meta-learning methods, it improves accuracy by approximately 5% to 7%. However, M-MFOR still has prominent limitations. In terms of frequency-domain feature enhancement, its multi-frequency feature extraction focuses only on the frequency decomposition of time-domain signals and does not include a dedicated enhancement module for the global distribution of frequency-domain features. Thus, it cannot enhance key frequency-domain features or suppress noise interference through weight optimization.
When calculating sample similarity, traditional metric learning methods usually use global features or simply averaged features directly, without considering the local semantic differences of sample features. In the time-frequency diagrams of wireless signals, the feature regions of different transmitters may differ in position, so directly matching global features reduces the accuracy of the similarity calculation. To solve this problem, Hao et al. [27] proposed the Semantic Alignment Metric Learning (SAML) method, which achieves accurate alignment of local semantic features through a “collection-selection” strategy. In few-shot image classification, SAML first calculates the distances between all region pairs to construct a relation matrix for “semantic feature collection”, then adjusts the matrix weights for “semantic feature selection”, and finally outputs sample similarity scores via a multi-layer perceptron, increasing the accuracy of the 5-way 1-shot task to over 80%. However, in terms of multi-scale feature utilization, SAML’s local region decomposition relies on fixed-size feature blocks, so it cannot extract and fuse features of different granularities in parallel. In terms of frequency-domain feature enhancement, it does not process frequency-domain features and focuses only on the semantic alignment of spatial local features, which limits its performance.
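The “collection-selection” idea can be illustrated with a small sketch. It is a simplification under stated assumptions rather than the SAML implementation: cosine similarity stands in for the region-pair relation, a softmax weighting with a hypothetical `sharpness` parameter stands in for the learned selection step, and a weighted mean replaces the multi-layer perceptron.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relation_matrix(query_regions, support_regions):
    """Collection step: similarity between every query/support region pair."""
    return [[cosine(q, s) for s in support_regions] for q in query_regions]


def aligned_score(matrix, sharpness=5.0):
    """Selection step: softmax-weighted mean, so well-matched pairs dominate."""
    flat = [value for row in matrix for value in row]
    weights = [math.exp(sharpness * value) for value in flat]
    return sum(w * v for w, v in zip(weights, flat)) / sum(weights)


query = [[1.0, 0.0], [0.0, 1.0]]
# Same two local patterns as the query, but at swapped positions.
shifted = [[0.0, 1.0], [1.0, 0.0]]
# Different local patterns.
different = [[-1.0, 0.0], [0.0, -1.0]]
print(aligned_score(relation_matrix(query, shifted)))
print(aligned_score(relation_matrix(query, different)))
```

Because all region pairs are compared, the shifted sample still obtains a high score even though its matching regions sit at different positions, which is the situation that direct global matching handles poorly.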
Table 1 summarizes the relevant methods in terms of multi-scale feature utilization capability and frequency-domain feature enhancement capability.
Table 2 describes the core variables of the method proposed in this paper.
Based on the above analysis, although existing few-shot signal recognition methods have made progress in improving feature discriminability, adapting to distribution shifts, and semantic alignment, they still have three problems in the IoT multi-transmitter spectrum sensing scenario: insufficient utilization of multi-scale features, lack of frequency-domain feature enhancement design, and insufficient coordination between multi-scale and frequency-domain features.
To address the above-mentioned problems, this paper proposes a few-shot spectrum sensing method using multi-scale global features. This method utilizes a multi-scale feature extractor and a learnable weight feature enhancer to extract multi-scale global features of signals. The multi-scale feature extractor employs average pooling operations with different rates to extract multi-scale features of signals. On this basis, the learnable weight feature enhancer optimizes the weights of features at different scales in the frequency domain, completes the enhancement of signal features, and thereby achieves the goals of enhancing signal discriminability and improving sensing performance. The main contributions can be summarized as follows:
- (1) A relation network framework based on multi-scale feature enhancement is proposed. By constructing a multi-scale feature extraction module and a frequency-domain feature enhancement module, this framework acquires global features at multiple scales and improves the network’s ability to distinguish wireless signals.
- (2) For multi-scale features, a frequency-domain feature enhancement module is constructed, which extracts global features at different scales. It improves the network’s representation of global features by using multiple learnable parameters to weight the features at different scales.
- (3) Against the background of spectrum sensing in the IoT, multiple sets of comparative experiments with different training sample sizes were carried out, verifying the effectiveness of the model. The method achieves accurate spectrum sensing of multiple radiation sources under few-shot conditions, providing a practical and feasible approach to the problem of scarce spectrum resources in the IoT.
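The two building blocks named in the method overview, average pooling at different rates and learnable weighting in the frequency domain, can be sketched as follows. This is only an illustrative outline under stated assumptions and not the architecture of this paper: the signal is one-dimensional, the pooling rates are hypothetical, and the per-scale frequency weights, which would be learned during training, are fixed to ones so the enhancer acts as an identity.

```python
import cmath


def avg_pool(x, rate):
    """Non-overlapping average pooling with window `rate` (trailing remainder dropped)."""
    return [sum(x[i:i + rate]) / rate for i in range(0, len(x) - rate + 1, rate)]


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def enhance(features, weights):
    """Reweight the frequency bins of a feature vector and return the real part."""
    filtered = [w * s for w, s in zip(weights, dft(features))]
    return [value.real for value in idft(filtered)]


def multi_scale_features(signal, rates, weights_per_scale):
    """Pool the signal at each rate, then enhance each scale with its own weights."""
    return [enhance(avg_pool(signal, rate), weights)
            for rate, weights in zip(rates, weights_per_scale)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rates = (1, 2, 4)  # hypothetical pooling rates
weights = [[1.0] * 8, [1.0] * 4, [1.0] * 2]  # placeholders for learnable parameters
for scale in multi_scale_features(signal, rates, weights):
    print([round(v, 6) for v in scale])
```

Larger pooling rates produce shorter, coarser views of the same signal, and giving each scale its own set of frequency weights lets training emphasize different frequency bins at different granularities.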