1. Introduction
Sensor technology plays a vital role in modern communication systems: the signals captured by sensors contain rich modulation information, and identifying the modulation scheme is central to decoding that information. Signals collected by sensors in wireless communication, radar, or military communication are usually processed with various modulations, which affect the transmission efficiency, anti-interference ability, and information-carrying capacity of the signals. Modulation recognition technology is the bridge between sensor perception and signal analysis: the acquisition accuracy, response speed, and noise immunity of the sensor affect recognition accuracy, while the recognition results can in turn optimize the sensor acquisition strategy and improve system efficiency. In complex electromagnetic environments in particular, the receiving parameters can be dynamically adjusted to reduce invalid data. In these systems, signal analysis and processing is the basis of efficient communication, and modulation recognition is one of its key technologies, automatically identifying the modulation type of the received signal through analysis [
1], providing support for demodulation and data decoding. It is widely applied in fields such as wireless communication, radar, and military communication. Conventional approaches for modulation classification can be broadly categorized into two main groups: likelihood-based and feature-based techniques [
2]. Likelihood-based methods rely on prior knowledge, are computationally complex, and suffer low recognition accuracy in complex environments. Feature-based methods extract the high-order moments [
3], cyclic spectra [
4], cyclic statistics [
5], and other features of the signal, and then apply a classifier for recognition. Although feature-based methods are computationally simple, they depend on a large number of empirically designed features and perform poorly in complex scenarios. This is particularly true in few-sample cases, where it is challenging to extract appropriate features for recognition. Therefore, modulation recognition technology requires further optimization and innovation for complex signal environments.
Deep learning technology has achieved remarkable results in automatic feature extraction and classification. O’Shea et al. [
6] introduced an automatic modulation recognition model based on CNN, which achieved higher recognition performance compared to traditional methods relying solely on manually extracted expert-crafted features. Peng et al. [
7] employed a deep convolutional neural network for automatic modulation recognition. Liu et al. [
8] proposed a DCN-BiLSTM model for AMR, which extracts the phase and amplitude characteristics of the modulated signal through DCN, and the BiLSTM layer solves the long-term correlation problem. Tao et al. [
9] proposed the AG-CNN model, which integrates the dual attention mechanism and Ghost module to improve performance while reducing the number of parameters. Li et al. [
10] proposed the LAGNet model, integrating LSTM and GCN, and used an attention mechanism to enhance signal recognition capability. Elsagheer et al. [
11] combined ResNet with LSTM, improving classification accuracy under high signal-to-noise ratios (SNR). Zhang et al. [
12] proposed a novel network called CTRNet, which combines CNN and Transformer to capture global sequential dependencies and optimize parameter efficiency. Chen et al. [
13] enhanced training performance by generating augmented samples through wavelet transform. Asad [
14] proposed an innovative method based on the linear combination of cumulants (LC) and a genetic algorithm (GA): hypercumulants are used to classify five modulation types, and a k-nearest neighbor (KNN) classifier improves the recognition rate under low SNR, outperforming existing methods. Gao et al. [
15] proposed a framework based on a Mixture of Experts model, where a Transformer module processes low-SNR signals and a ResNet module handles high SNR signals, achieving effective recognition under varying SNR conditions. SCTMR-Net, proposed by An et al. [
16], achieves better performance by using constant wavelet convolution filters in its SCT module, reducing the number of learnable parameters and the dependence on prior channel state information. The method based on MFDE proposed by Li [
17] can effectively reflect the complexity, frequency, and amplitude variations of the signal when extracting features from noisy signals. Esmaiel et al. [
18] proposed a feature extraction method combining Enhanced Variational Mode Decomposition (VMD), Weighted Permutation Entropy (WPE), and Local Tangent Space Alignment (LTSA). The method first decomposes the signal into intrinsic mode functions (IMFs), calculates the WPE of each IMF to enhance the decomposition effect of VMD, and then uses LTSA to reduce the high-dimensional features to two dimensions. Zhang et al. [
19] proposed building a lightweight neural network, MobileViT, driven by clustered constellation images to achieve real-time automatic modulation recognition with superior performance and efficiency. Zhang et al. [
20] proposed a real-time automatic modulation recognition method based on a lightweight mobile radio Transformer, which is constructed by iteratively training the radio Transformer and pruning the redundant weights with information entropy. It learns robust modulation knowledge from multi-modal signal representation and has strong adaptability to communication conditions. The method can be deployed on an unmanned aerial vehicle receiver to realize air-to-air and air-to-ground cognitive communication in low-requirement communication scenarios.
For certain modulation schemes, their performance in specific environments may be highly similar, making it difficult to distinguish them effectively using a single feature. Therefore, multi-feature fusion methods have become crucial for improving recognition accuracy. By integrating multiple feature types, models can more accurately identify modulation schemes under complex environments and noise interference. Liu et al. [
21] proposed a novel AMC method for OTFS systems. They constructed a two-stream convolutional neural network model to capture multi-domain signal features simultaneously, which greatly improves recognition accuracy. Hu et al. [
22] proposed a KB-DBCN network, which combines knowledge-based feature extraction and data-driven methods to process signal features in parallel, and maps and fuses features through fully connected layers. Zhang et al. [
23] proposed a recognition scheme based on multi-modal feature fusion, which constructs a deep convolutional neural network to extract spatial features through RSBU-CW and interacts with temporal features extracted by LSTM to enhance feature diversity. Hao et al. [
24] proposed a modulation classification framework using multi-domain amplitude features, achieving recognition through statistical analysis of envelope variations. Zhao et al. [
25] proposed a hybrid feature extraction method based on time-domain statistical features and high-order cumulants and introduced a new parameter, AT, to significantly improve the modulation recognition performance under low-SNR conditions. Gao et al. [
26] normalized and fused time-frequency entropy features, higher-order statistical features, and network-extracted features using non-negative matrix factorization, achieving effective signal recognition. Tan et al. [
27] proposed a multi-feature fusion approach using IQ signals, AP signals, and time-frequency signals to construct a more comprehensive signal representation, enhancing the feature expression capability and recognition accuracy of automatic modulation recognition methods. Wang et al. [
28] proposed a multi-scale network with multi-channel input, multi-head self-attention, and BiGRU, but it suffers from a large number of parameters and high computational complexity. Liu et al. [
29] proposed a multi-modal data fusion method based on iDCAM, which constructs feature maps utilizing local and global embedding layers, in which the inputs pass through iDCAM modules to capture high-level features and global weights, and modal advantages are combined to improve the recognition effect. Wei et al. [
30] introduced a novel multi-dimensional shrinkage module based on CNN to effectively extract temporal information from raw IQ signals, though it still suffers from relatively high time and space complexity. Jiang et al. [
31] proposed a BLR modulation recognition method, which combines BiLSTM to extract the timing features of IQ data, ResNet-18 to extract the constellation features, and serial feature fusion to mine the complementarity of multi-modal data.
Although existing research has made progress in improving recognition performance under noisy environments, many challenges remain in complex situations. In particular, under high noise interference or poor channel conditions, the performance of modulation recognition models may significantly decrease. To address this critical issue, the Multi-Feature Fusion Channel Attention Transformer (MFCA-Transformer) proposed in this paper achieves performance breakthroughs through the following innovative work:
Multi-dimensional information is extracted from the original dataset, including constellation features, time-frequency features, and power spectrum features. Constructing multi-dimensional feature maps not only provides a more comprehensive representation of the data, fully exploiting its latent features, but also helps the model recognize and classify complex patterns, achieving higher accuracy and effectiveness.
Traditional feature fusion methods, such as direct concatenation or simple convolution, cannot fully utilize global information, so the fused features are neither comprehensive nor precise. In this paper, a Triple Dynamic Feature Fusion (TDFF) module is proposed. By introducing a dynamic mechanism, it adaptively fuses local features at different scales based on global information, improving the quality of feature fusion and enhancing the model's recognition ability in complex environments.
For the issue in Swin Transformer models where there is insufficient information interaction between feature channels in multi-dimensional feature recognition tasks, this paper introduces the Channel Prior Convolutional Attention (CPCA) mechanism, which effectively promotes interaction and information transfer between feature channels, compensating for the deficiencies of traditional models in channel information interaction. As a result, the model’s accuracy and robustness in handling multi-dimensional feature classification tasks are improved.
When dealing with modulation signals under low SNR, label noise may exist, causing the model to overfit the training data, and leading to poor generalization on low-SNR signals. Therefore, label smoothing is introduced into the standard cross-entropy loss function. By adjusting the distribution of the true labels, the smoothed model’s predicted probabilities are made closer to the real confidence, thus enhancing the model’s generalization ability.
2. Methods
2.1. The MFCA-Transformer Recognition Network with Multi-Dimensional Feature Fusion
Based on the aforementioned research, this paper proposes a deep neural network architecture based on multi-dimensional feature input fusion, aiming to effectively improve the accuracy of signal processing and feature extraction. The network architecture primarily consists of four key modules: multi-dimensional feature construction, multi-dimensional feature fusion module, feature extraction module, and attention mechanism module. The structure of the MFCA-Transformer network model is shown in
Figure 1.
After data preprocessing, a multi-dimensional feature representation is constructed as the network input, comprising the constellation diagram of the modulated signal, the time-frequency diagram based on the short-time Fourier transform, and the power spectrum diagram. The preprocessed features first enter the TDFF module, which adaptively integrates features of different dimensions through a dynamic weight allocation mechanism and outputs a fused feature vector of the same dimension. The fused features are fed into a neural network based on the Swin Transformer architecture, which uses a hierarchical self-attention mechanism to model long-distance dependencies globally and effectively captures local and global features in the image or signal through its shifted window mechanism.
In the final stage of feature extraction, the output layer is replaced by the CPCA module. The CPCA module extracts the prior information of the channel dimension through the convolution operation, and optimizes the feature channel by combining with the squeeze-and-excitation mechanism to strengthen the key features and suppress noise interference, so as to further improve the feature discrimination. Finally, the optimized features are mapped to the classification space through the fully connected layer, and then the probability distribution of each class is calculated through the softmax layer to generate the final classification result.
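For readers who prefer code, the overall data flow can be summarized in a minimal PyTorch sketch. The module names, tensor shapes, and class count below are illustrative assumptions rather than the released implementation; the TDFF, Swin backbone, and CPCA components are passed in as placeholders.

```python
import torch
import torch.nn as nn

class MFCATransformerSketch(nn.Module):
    """Illustrative forward pass: TDFF fusion -> Swin backbone -> CPCA -> classifier."""
    def __init__(self, tdff, backbone, cpca, feat_dim=64, num_classes=11):
        super().__init__()
        self.tdff = tdff          # fuses constellation / time-frequency / power-spectrum maps
        self.backbone = backbone  # Swin Transformer feature extractor
        self.cpca = cpca          # channel prior convolutional attention
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, f_const, f_tf, f_psd):
        fused = self.tdff(f_const, f_tf, f_psd)   # (B, C, H, W) fused features
        feats = self.cpca(self.backbone(fused))   # attention-refined features
        feats = feats.mean(dim=(2, 3))            # global average pooling
        return self.head(feats)                   # logits; softmax applied downstream

# smoke test with identity placeholders standing in for the real modules
net = MFCATransformerSketch(tdff=lambda a, b, c: a, backbone=nn.Identity(),
                            cpca=nn.Identity())
logits = net(*[torch.randn(2, 64, 32, 32) for _ in range(3)])  # -> (2, 11)
```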
2.2. Modulated Signal Multi-Dimensional Feature Construction
Modulated signals are primarily categorized into two types: analog modulation and digital modulation. Analog modulation directly encodes the original information onto a carrier signal, whereas digital modulation first converts digital signals into analog form before modulation or directly modulates the digital signals. In wireless communication systems, the transmitter is responsible for converting information into either analog or digital modulated signals for transmission. The channel serves as the medium for information transfer, and the receiver detects and decodes the signals. The received signal can be expressed as follows:
$$r(t) = s(t) \ast h(t) + n(t)$$

where $s(t)$ denotes the transmitted modulated signal, $h(t)$ indicates the channel impulse response, $\ast$ denotes convolution, and $n(t)$ corresponds to the additive noise signal. In this study, additive noise is employed to investigate modulation algorithms under complex channel conditions.
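As a concrete illustration of this signal model, the following sketch synthesizes a received sequence $r = s \ast h + n$ in NumPy; the QPSK source, the two-tap channel, and the 10 dB SNR are arbitrary assumptions chosen for demonstration.

```python
import numpy as np

def awgn(signal, snr_db):
    """Add complex white Gaussian noise so the output has the requested SNR."""
    sig_power = np.mean(np.abs(signal) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (np.random.randn(*signal.shape)
                                        + 1j * np.random.randn(*signal.shape))
    return signal + noise

# transmitted modulated signal s: random QPSK symbols on the unit circle
s = np.exp(1j * (np.pi / 4 + (np.pi / 2) * np.random.randint(0, 4, 1024)))
h = np.array([1.0, 0.2 + 0.1j])                  # assumed channel impulse response
r = awgn(np.convolve(s, h, mode="same"), 10)     # received signal at 10 dB SNR
```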
2.2.1. Constellation Characteristics
Constellation feature representation is the core visualization tool used to describe the characteristics of modulation signals; in essence, it is the set of projections of the signal vectors onto the complex plane. By representing different symbols or signal states on the complex plane, it helps us understand the signal and its characteristics intuitively. For a discrete-time signal $s(n)$, the constellation points are located at coordinates $(I, Q)$, where $I$ (in-phase) and $Q$ (quadrature) denote the orthogonal components. This can be represented mathematically as

$$s(n) = I(n) + jQ(n) = A_m e^{j\phi_m}, \quad m = 1, 2, \dots, M$$

Here, $A_m$ and $\phi_m$ correspond to the discrete amplitude and phase values, respectively, and $M$ represents the modulation order, that is, the number of symbol states in a specific modulation scheme, which quantifies the symbol space complexity of different modulation signals; for example, $M = 16$ for 16QAM and $M = 2$ for BPSK. BPSK has two constellation points, located along the real axis of the complex plane, each representing 1 bit of information; the phases of the constellation points are 0° and 180°, respectively. 16QAM has 16 constellation points, each representing 4 bits of information, combining different amplitude and phase variations to achieve higher transmission rates. PAM4 uses 4 different amplitude levels to represent 2 bits of information, with the constellation points distributed along one dimension of the complex plane according to their amplitude values. The constellation feature map uses an oversampling mechanism to generate intra-symbol samples, with the sampling rate set to 8 times the symbol rate, i.e., each symbol contains 8 sample points. The characteristic constellation diagrams for BPSK, QPSK, CPFSK, GFSK, 16QAM, and PAM4 modulation schemes at high SNR are shown in
Figure 2.
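As a minimal sketch of how such a constellation feature map could be generated, the snippet below draws 16QAM symbols, oversamples them 8x as described above, adds mild noise, and rasterizes the I/Q samples into a 2-D histogram image; the grid size, axis limits, and noise level are assumptions.

```python
import numpy as np

def constellation_image(iq, bins=64, lim=1.5):
    """Rasterize complex samples into a normalized 2-D histogram over the I/Q plane."""
    img, _, _ = np.histogram2d(iq.real, iq.imag, bins=bins,
                               range=[[-lim, lim], [-lim, lim]])
    return img / img.max()

levels = np.array([-3, -1, 1, 3])                # 16QAM: I and Q in {-3, -1, 1, 3}
syms = (np.random.choice(levels, 1000)
        + 1j * np.random.choice(levels, 1000)) / np.sqrt(10)  # unit average power
iq = np.repeat(syms, 8)                          # 8 samples per symbol (8x oversampling)
iq = iq + 0.05 * (np.random.randn(iq.size) + 1j * np.random.randn(iq.size))
feature_map = constellation_image(iq)            # one input channel for the network
```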
2.2.2. Time-Frequency Characteristics
With the continuous advancement of communication technologies, signal modulation schemes have become increasingly complex, rendering traditional time-domain or frequency-domain analysis methods progressively inadequate for identifying these sophisticated modulated signals. Consequently, time-frequency analysis has emerged as a powerful tool for modulation recognition. The short-time Fourier transform (STFT), a common time-frequency analysis method, divides a continuous time-domain signal into several short segments with a sliding window function and applies the Fourier transform to each segment to obtain the time-frequency distribution. The STFT is widely used for time-frequency representation and modulation recognition, providing a stronger ability to characterize the time-varying features of modulated signals. The mathematical formulation of the STFT is

$$\mathrm{STFT}(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(t-\tau)\, e^{-j2\pi f t}\, dt$$

where $x(t)$ is the signal to be analyzed, $w(t-\tau)$ is a window function used to limit the time range of the signal, $f$ is the frequency, $\tau$ is the time variable, and $\int$ denotes the integration operation. The time-frequency characteristics of BPSK, QPSK, 8PSK, GFSK, 16QAM, and PAM4 modulation signals at high SNR are shown in
Figure 3.
The time-frequency distribution of the signal is extracted by STFT. The horizontal axis represents time, the vertical axis corresponds to frequency, and the color brightness reflects the energy density of the signal. The core differences among the six modulation types are reflected in the form of energy distribution and time-frequency concentration. BPSK, with its simple binary phase transitions and few frequency components, occupies a narrow, linear band in the time-frequency plane, its energy concentrated in a single frequency band. As the modulation order increases, the phase transitions of 8PSK become denser, the time-frequency energy band broadens further, and its edges diffuse slightly. The time-frequency energy of GFSK presents a smooth band distribution with no obvious discrete energy peaks. The time-frequency energy band of 16QAM is the widest and most dispersed, with energy density decaying rapidly along the frequency axis, reflecting the multi-dimensional characteristics of high-order modulation. The time-frequency energy of PAM4 is concentrated in the low-frequency band, with amplitude modulation dominated by baseband energy.
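A short sketch of how the time-frequency feature map might be computed with SciPy follows; the BPSK test signal, Hann window, segment length, and overlap are assumed parameters, not values reported by the paper.

```python
import numpy as np
from scipy.signal import stft

bits = np.random.randint(0, 2, 512)
phase = np.pi * np.repeat(bits, 8)               # BPSK phase, 8 samples per symbol
n = np.arange(phase.size)
x = np.exp(1j * (2 * np.pi * 0.1 * n + phase))   # modulated test signal

# Hann window, 128-sample segments, 75% overlap; two-sided since x is complex
f, tau, Z = stft(x, fs=1.0, window="hann", nperseg=128, noverlap=96,
                 return_onesided=False)
tf_map = np.abs(Z) / np.abs(Z).max()             # normalized |STFT| feature map
```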
2.2.3. Power Spectrum Characteristics
The power spectral density of a modulated signal shows the signal's power distribution across frequencies, reflecting its energy variations. It helps evaluate bandwidth usage, spectral efficiency, and the signal's susceptibility to interference and noise. The power spectrum is defined as

$$P(f) = \lim_{T \to \infty} \frac{1}{T} \left| \int_{0}^{T} s(t)\, e^{-j 2\pi f t}\, dt \right|^{2}$$

where $P(f)$ is the power spectrum of the signal $s(t)$, $|\cdot|$ represents the magnitude operation, and $T$ is the observation time of the signal. Different modulation schemes have different power spectrum characteristics, determined by the symbol rate, the bandwidth, and the modulation scheme itself. The power spectra of BPSK-, QPSK-, CPFSK-, GFSK-, 16QAM-, and PAM4-modulated signals at high SNR are shown in
Figure 4.
The horizontal axis is the normalized frequency and the vertical axis is the power spectral density, reflecting the frequency-domain energy distribution and out-of-band radiation of the signal. The core differences among the six modulation types lie in main-lobe width, side-lobe suppression, and spectral efficiency. The main-lobe width of BPSK matches that of QPSK, with high, dense side lobes; the side lobes of QPSK are slightly lower than those of BPSK. The main lobe of CPFSK is narrower than those of BPSK and QPSK, with better side-lobe suppression, reflecting the spectral advantages of constant-envelope modulation. GFSK has very low side lobes because Gaussian filtering suppresses out-of-band radiation, typical of spectrum-friendly modulation. The side lobes of 16QAM attenuate rapidly, with energy concentrated in the main lobe, reflecting high spectral efficiency. PAM4 has the narrowest main lobe and nearly leakage-free side lobes, in contrast to the broadband nature of the other modulations.
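The sketch below estimates such a power spectrum for a rectangular-pulse BPSK signal using Welch's method, a common finite-data approximation of the limit definition above; the signal parameters and FFT segment length are assumptions.

```python
import numpy as np
from scipy.signal import welch

symbols = 2 * np.random.randint(0, 2, 2048) - 1      # BPSK symbols in {-1, +1}
x = np.repeat(symbols.astype(float), 8)              # rectangular pulses, 8 samples/symbol

f, pxx = welch(x, fs=1.0, nperseg=512)               # averaged periodogram estimate
psd_db = 10 * np.log10(pxx / pxx.max())              # normalized PSD in dB
```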
2.3. TDFF Multidimensional Feature Fusion Module
The existing feature fusion methods make insufficient use of global information, resulting in imperfect fused features. In this paper, a Triple Dynamic Feature Fusion (TDFF) module is proposed to optimize feature fusion through the dynamic, adaptive fusion of multi-scale local features [32]. The dynamic mechanism includes channel dynamic selection and spatial dynamic selection: the former selects important feature maps according to global channel information, and the latter calibrates the feature maps according to global spatial information, which enhances detail preservation and global information utilization, improving the model's recognition accuracy and robustness. The TDFF module realizes the adaptive integration of local features through a global context-aware mechanism, as shown in
Figure 5.
The TDFF module employs an adaptive feature selection mechanism guided by global contextual cues to optimize feature integration. First, the feature maps $F_1$, $F_2$, and $F_3$ are concatenated along the channel dimension into the feature $F$:

$$F = \mathrm{Concat}(F_1, F_2, F_3)$$
In order to ensure that subsequent modules can make full use of the fused features, a channel compression mechanism is introduced to recover the original channel dimensionality. The channel compression in TDFF does not simply apply a 1 × 1 × 1 convolution; instead, it uses global channel information to guide the weight parameters $W_c$. Global average pooling (AvgPool) is first applied to the concatenated features to compress the spatial information and focus on channel-level global statistics. A 1 × 1 convolution (Conv1) then compresses the number of channels, and Sigmoid activation generates the channel attention weights:

$$W_c = \sigma\big(\mathrm{Conv}_1(\mathrm{AvgPool}(F))\big)$$

where $W_c$ denotes the channel weight parameter and $\sigma$ is the Sigmoid function. These weights reflect the contribution of different dimensions and channels to the final fusion; the larger the value, the more important the corresponding channel, which describes the weight distribution of the features more effectively.
The fused features are then calibrated by this global channel information: the channel attention weight $W_c$ is multiplied element-wise with the concatenated feature $F$, and a 1 × 1 × 1 convolution transforms the channel dimension so that the multi-dimensional features share the same number of channels, preparing them for subsequent fusion. Under the guidance of channel-wise statistics, the convolutional layer adaptively enhances critical features and filters out non-essential components:

$$F_c = \mathrm{Conv}(W_c \otimes F)$$

where $F_c$ denotes the key channel feature map and $\otimes$ denotes element-wise multiplication.
In order to model the spatial dependence between local feature maps, the input feature maps $F_1$, $F_2$, and $F_3$ are each encoded with a 1 × 1 × 1 convolution. The convolution does not change the spatial dimension of the features but fuses the local features across channels, so the output dimension is consistent with the original feature maps. The encoded feature maps are then added element-wise, forcing the multi-dimensional features to interact in the spatial dimension and mining the spatial associations between features of different dimensions. Sigmoid activation is applied to the summed feature map, mapping the values of all spatial positions into the interval [0, 1] to generate the global spatial attention weight $W_s$:

$$W_s = \sigma\big(\mathrm{Conv}(F_1) \oplus \mathrm{Conv}(F_2) \oplus \mathrm{Conv}(F_3)\big)$$

where $W_s$ denotes the global spatial attention weight and $\oplus$ denotes element-wise addition.
The global spatial attention weight $W_s$ is multiplied pixel-wise with the channel-weighted feature map $F_c$ to obtain the final fused feature:

$$F_{\mathrm{out}} = W_s \otimes F_c$$

This spatial calibration highlights salient spatial regions and thereby reinforces the expression of key features.
To resolve the scale differences among the multi-dimensional features, a two-step scale alignment is adopted: the constellation, time-frequency, and power spectrum features are first mapped to 64 channels by 1 × 1 convolutions to unify the channel dimension, and adaptive average pooling then unifies the spatial dimension to 64 × 64, providing a consistent basis for dynamic fusion. Unlike traditional fusion by direct concatenation or fixed weighting, TDFF realizes dynamic, adaptive fusion of multi-dimensional features through the channel and spatial attention mechanisms, allowing the network to weigh features of different dimensions, channels, and spatial locations according to the characteristics of the input signal.
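The following PyTorch sketch shows one plausible realization of the TDFF computation described above, with the 64-channel alignment already applied; 2-D convolutions stand in for the 1 × 1 × 1 convolutions of the text, and all layer choices are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class TDFF(nn.Module):
    """Sketch of triple dynamic feature fusion: channel weights W_c from pooled
    global statistics, spatial weights W_s from the summed per-input encodings."""
    def __init__(self, channels=64):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(1)                       # global channel statistics
        self.conv1 = nn.Conv2d(3 * channels, 3 * channels, 1)    # -> W_c after sigmoid
        self.reduce = nn.Conv2d(3 * channels, channels, 1)       # channel compression
        self.enc = nn.ModuleList(
            [nn.Conv2d(channels, channels, 1) for _ in range(3)])  # spatial encoders

    def forward(self, f1, f2, f3):
        f = torch.cat([f1, f2, f3], dim=1)                # F = Concat(F1, F2, F3)
        wc = torch.sigmoid(self.conv1(self.avg(f)))       # channel attention W_c
        fc = self.reduce(wc * f)                          # F_c = Conv(W_c * F)
        ws = torch.sigmoid(self.enc[0](f1) + self.enc[1](f2) + self.enc[2](f3))
        return ws * fc                                    # F_out = W_s * F_c

fused = TDFF()(*[torch.randn(2, 64, 64, 64) for _ in range(3)])  # -> (2, 64, 64, 64)
```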
2.4. Feature Extraction Module
The feature extraction part adopts the network architecture of Swin Transformer, which aims to improve the performance in computer vision tasks [
33]. At present, various neural networks are used for image recognition. Compared with other neural networks applied to modulation signal recognition, the Swin Transformer offers strong scalability: it lends itself to modification of the network structure and achieves a favorable accuracy-speed trade-off. The structure of the Swin Transformer usually consists of several stages, each containing several basic modules, namely Swin Transformer blocks [
34]. The Swin Transformer network structure is shown in
Figure 6.
In the Swin Transformer model, the input image is first divided into small patches, and the hierarchical structure then progressively extracts features from the image data. The input feature map of size $H \times W \times C$ is partitioned into $N = (H/P) \times (W/P)$ non-overlapping patches, each of size $P \times P$. Each patch is mapped to a $D$-dimensional embedding through a linear projection, and the resulting image feature can be expressed as

$$Z_0 = \mathrm{Linear}\big(\mathrm{Partition}(X)\big) \in \mathbb{R}^{N \times D}$$

where $Z_0$ is the initial feature representation and $D$ represents the embedding dimension. The hierarchical process in the Swin model is divided into multiple stages, each containing multiple Swin Transformer blocks. The input feature map
$Z_0$ passes through $L$ Swin Transformer blocks with the feature dimension unchanged, and the resulting feature map is denoted $Z_1$:

$$Z_1 = \mathrm{SwinBlock}^{L}(Z_0)$$

The feature maps are then merged by the patch merging operation, which halves the spatial resolution from $H \times W$ to $H/2 \times W/2$ and doubles the number of channels to $2D$, enabling the model to capture a wider range of contextual information:

$$Z_2 = \mathrm{PatchMerging}(Z_1)$$

After patch merging, adjacent patches in $Z_1$ are merged. This step strengthens long-distance dependencies, which is crucial for understanding the global context, further aggregates the features of adjacent patches, and enlarges the receptive field. The dimensions of the feature map $Z_2$ can be expressed as

$$Z_2 \in \mathbb{R}^{(H/2) \times (W/2) \times 2D}$$

The model then repeats this process: the feature map passes through several Swin Transformer blocks at each stage, with patch merging performed at the beginning of each stage. As this process repeats, the feature map undergoes progressive downsampling while its channel depth expands. The Swin Transformer block contains the model's key operations. Window attention applies the self-attention mechanism within each local window:

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices, $d$ is the key dimension, and $B$ is the relative position bias.
After the window self-attention, a two-layer feed-forward network (FFN) processes the features. The FFN uses the GELU activation function and linear layers to enhance the feature representation through a nonlinear transformation, further refining the features produced by self-attention:

$$\mathrm{FFN}(x) = \mathrm{GELU}(xW_1 + b_1)W_2 + b_2$$
Residual connections and layer normalization follow each sub-module, ensuring the stability of deep Swin Transformer training and accelerating convergence. Their function is to stabilize the training process, optimize the gradient flow, and avoid gradient vanishing or explosion during model training. For layer $l$, the residual connection and normalization operations are expressed as

$$\hat{Z}^{l} = \mathrm{MSA}\big(\mathrm{LN}(Z^{l-1})\big) + Z^{l-1}, \qquad Z^{l} = \mathrm{FFN}\big(\mathrm{LN}(\hat{Z}^{l})\big) + \hat{Z}^{l}$$

where $\mathrm{MSA}$ denotes the (shifted) window multi-head self-attention and $\mathrm{LN}$ denotes layer normalization.
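To make the patch merging step concrete, here is a minimal sketch following the published Swin design: 2 × 2 neighboring patches are grouped, their channels concatenated to 4C, and a linear layer projects to 2C. The linear layer is instantiated inline purely for illustration; a real model would register it once.

```python
import torch
import torch.nn as nn

def patch_merging(z):
    """Halve spatial resolution, double channels: (B, H, W, C) -> (B, H/2, W/2, 2C)."""
    B, H, W, C = z.shape
    corners = [z[:, i::2, j::2, :] for i in (0, 1) for j in (0, 1)]
    z = torch.cat(corners, dim=-1)          # group 2x2 neighbors: (B, H/2, W/2, 4C)
    return nn.Linear(4 * C, 2 * C)(z)       # project 4C -> 2C

z1 = torch.randn(1, 56, 56, 96)             # e.g. a stage-1 feature map
z2 = patch_merging(z1)                      # -> (1, 28, 28, 192)
```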
2.5. CPCA Module
Although the Swin Transformer model has powerful feature expression ability, it still has the issue of inadequate information interaction. CPCA [
35] addresses limited inter-channel feature interaction in multi-dimensional recognition. By employing CPCA, the model strengthens attention on crucial image channel features, thereby improving classification accuracy.
The Swin Transformer block in the last stage of the network is replaced by a CPCA module, which captures the interactions between channels. It learns the dependencies among channel features, guiding the model to place greater emphasis on the feature channels containing key information. Unlike traditional convolution, which focuses only on local pixel information, the CPCA module employs specially designed convolution kernels that process inter-channel dependencies, allowing the model to adaptively adjust kernel learning according to the interaction between feature channels. The CPCA module structure is shown in
Figure 7.
The CPCA module adopts an efficient channel prior convolutional attention mechanism that assigns weights across both the channel and spatial dimensions. This is achieved using multi-scale depth-wise strip convolutions together with 1 × 1 convolutions, which efficiently extract spatial relationships while preserving the channel prior. The multi-scale depth-wise strip convolution kernels enable efficient feature extraction at reduced computational cost. The CPCA mechanism integrates two key components, channel attention (CA) and spatial attention (SA), as shown in
Figure 8.
The CA map is produced by the channel attention module, which captures the inter-channel relationships within the features. Following the CBAM approach, spatial information in the feature map is aggregated using average pooling and max pooling, yielding two independent spatial context descriptors. These are fed into a shared multi-layer perceptron (MLP), and the outputs are combined by element-wise summation to form the final channel attention map. To reduce parameter overhead, the shared MLP contains a single hidden layer whose size is $C/r$, where $r$ is the reduction ratio:

$$\mathrm{CA}(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

where $\sigma$ is the Sigmoid function.
The SA map captures cross-dimensional relationships, enabling dynamic weight allocation across both the channel and spatial axes rather than forcing consistency between them. Depth-wise convolution captures spatial relationships among features while preserving the channel relationships and minimizing computational complexity, and a multi-scale structure improves the convolution's ability to capture spatial relations. Finally, a 1 × 1 convolution at the end of the SA module performs channel mixing, producing a more refined attention map:

$$\mathrm{SA}(F) = \mathrm{Conv}_{1\times1}\left(\sum_{i=0}^{3} \mathrm{Branch}_i\big(\mathrm{DwConv}(F)\big)\right)$$

where $\mathrm{DwConv}$ is the depth-wise convolution, $\mathrm{Branch}_i$ represents the $i$-th multi-scale branch, and $\mathrm{Branch}_0$ represents the identity connection.
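A compact PyTorch sketch of the CPCA computation described above follows; the strip-convolution kernel sizes and reduction ratio are illustrative assumptions based on the module description, not the exact configuration of [35].

```python
import torch
import torch.nn as nn

class CPCASketch(nn.Module):
    """Channel attention (shared MLP over avg/max pooled context) followed by
    multi-scale depth-wise strip convolutions for spatial attention."""
    def __init__(self, c, r=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        self.dw = nn.Conv2d(c, c, 5, padding=2, groups=c)        # base depth-wise conv
        self.strips = nn.ModuleList(nn.Sequential(
            nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),
            nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c)) for k in (7, 11))
        self.mix = nn.Conv2d(c, c, 1)                            # 1x1 channel mixing

    def forward(self, x):
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3)))          # avg-pooled context
                           + self.mlp(x.amax(dim=(2, 3))))      # max-pooled context
        x = ca.view(b, c, 1, 1) * x                              # channel prior weighting
        s = self.dw(x)
        sa = s + sum(branch(s) for branch in self.strips)        # identity + strip branches
        return self.mix(sa) * x                                  # refined attention output

out = CPCASketch(64)(torch.randn(2, 64, 16, 16))                 # -> (2, 64, 16, 16)
```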
2.6. Loss Function
For most classification tasks, the standard cross-entropy loss is adopted: the true label is assumed to be a one-hot encoding $y$ (i.e., $y_i = 1$ for the true class and $y_j = 0$ for all other classes). Given the unnormalized scores (logits) $z$ output by the model, the probability distribution $p$ is obtained by softmax:

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$

The standard cross-entropy loss is

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{i=1}^{C} y_i \log p_i$$
However, at low SNR, modulated signals may contain label noise, leading to overfitting during training and degraded generalization on low-SNR signals. Therefore, label smoothing is introduced in this paper. By adjusting the distribution of the true label, the absolutely correct label is softened: the probability of the true category is reduced from 1 to $1 - \varepsilon$, and each of the remaining $C - 1$ categories is increased from 0 to $\varepsilon/(C-1)$. The smoothed label distribution $y^{\mathrm{smooth}}$ is

$$y_i^{\mathrm{smooth}} = \begin{cases} 1 - \varepsilon, & i = t \\ \dfrac{\varepsilon}{C-1}, & i \neq t \end{cases}$$

where $\varepsilon$ is the smoothing factor (usually $\varepsilon = 0.1$), $t$ is the true class, and $C$ is the total number of modulation classes (e.g., QPSK, 16QAM, etc.). Finally, the smoothed labels $y^{\mathrm{smooth}}$ are substituted into the cross-entropy loss:

$$\mathcal{L}_{\mathrm{LS}} = -\sum_{i=1}^{C} y_i^{\mathrm{smooth}} \log p_i$$
When expanded, it separates into a true-category term and a non-true-category term:

$$\mathcal{L}_{\mathrm{LS}} = -(1-\varepsilon)\log p_t - \frac{\varepsilon}{C-1}\sum_{i \neq t} \log p_i$$

where the non-true-category term is $-\frac{\varepsilon}{C-1}\sum_{i \neq t} \log p_i$. Modifying the loss function in this way encourages the model to predict the true class with a probability close to $1 - \varepsilon$ rather than the extreme value of 1, and prevents the model from predicting the other classes with probabilities of exactly 0, preserving a certain probability distribution. By adjusting the distribution of the true label, the smoothed model's predicted probabilities are brought closer to the true confidence, thereby enhancing the model's generalization capability.
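The smoothed loss is straightforward to implement; the sketch below follows the $\varepsilon/(C-1)$ allocation derived above. Note that PyTorch's built-in `F.cross_entropy(..., label_smoothing=eps)` instead spreads $\varepsilon$ over all $C$ classes, so a manual version is shown for fidelity to the formula.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy with smoothed targets: the true class receives 1 - eps,
    each of the remaining C - 1 classes receives eps / (C - 1)."""
    c = logits.size(-1)
    logp = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(logp, eps / (c - 1))          # non-true categories
    smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - eps)   # true category
    return -(smooth * logp).sum(dim=-1).mean()

logits = torch.randn(4, 11)                 # batch of 4 over 11 modulation classes
target = torch.tensor([0, 3, 7, 10])
loss = label_smoothing_ce(logits, target)
```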
4. Conclusions
The modulation signal recognition method based on multi-dimensional feature fusion proposed in this paper demonstrates clear advantages over traditional single-dimensional feature extraction approaches in complex environments. Traditional methods usually rely on a single-dimensional feature, such as a constellation diagram or a time-frequency diagram, to describe the signal, but such features are often insufficient to fully reveal the signal's complexity, and their extraction ability is especially limited in low-SNR environments. This paper improves the multi-dimensional description of the signal by constructing features spanning the constellation, time-frequency, and power spectrum domains. This multi-dimensional representation not only captures the dynamic changes and time-domain characteristics of the signal more comprehensively, but also effectively compensates for the shortcomings of single-dimensional features under low SNR. By introducing the TDFF and CPCA mechanisms, the model makes full use of the signal features in different dimensions, further improving recognition accuracy.
Experimental results on two standard datasets show that the proposed model achieves accurate recognition of various modulation types. Under high-SNR conditions, the recognition accuracy for most modulation types exceeds 95%, and compared with existing deep learning recognition methods, the accuracy of this method is improved by 3% to 14%. This significant improvement not only verifies the potential of multi-dimensional feature extraction for improving recognition accuracy, but also demonstrates the superior anti-noise performance of the proposed method. Compared with traditional single-dimensional feature methods, the proposed multi-dimensional feature fusion strategy generalizes better and can adapt to signal recognition tasks in different noise environments. In short, the intelligent communication signal identification method proposed in this study provides an efficient technical solution for practical applications and shows important application value in key areas such as communication signal reconnaissance, spectrum monitoring, and system optimization. It not only significantly improves the accuracy and robustness of signal recognition in complex electromagnetic environments, but also offers theoretical insights and technical support for research in intelligent radio signal processing.