Article

MCCSAN: Automatic Modulation Classification via Multiscale Complex Convolution and Spatiotemporal Attention Network

School of Computer and Artificial Intelligence, North China University of Technology, Beijing 100144, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3192; https://doi.org/10.3390/electronics14163192
Submission received: 1 July 2025 / Revised: 6 August 2025 / Accepted: 7 August 2025 / Published: 11 August 2025

Abstract

Automatic Modulation Classification (AMC) is vital for adaptive wireless communication, yet it faces challenges in complex environments, including insufficient feature extraction, feature redundancy, and high interclass similarity among modulation schemes. To address these limitations, this paper proposes the Multiscale Complex Convolution Spatiotemporal Attention Network (MCCSAN). The work introduces three key innovations tailored to AMC: a multiscale complex convolutional module that directly processes raw I/Q sequences, preserving critical phase and amplitude information while extracting diverse signal features; a spatiotemporal attention mechanism that dynamically weights time steps and feature channels to suppress redundancy and sharpen the focus on discriminative features; and a combined loss function integrating cross-entropy and center loss to improve intraclass compactness and interclass separability. Evaluated on the RML2018.01A and RML2016.10A datasets across SNR levels from −6 dB to 12 dB, MCCSAN achieves a state-of-the-art peak classification accuracy of 97.03% and an average accuracy improvement of 3.98% over leading methods. The study confirms that integrating complex-domain processing with spatiotemporal attention significantly enhances AMC performance.

1. Introduction

Communication signal recognition technology plays a vital role in mission-critical applications, including operator network surveillance, anti-jamming communication protocols, and user authentication mechanisms [1,2,3]. The fundamental objective of this technology centers on real-time assessment of communication resource allocation patterns to guarantee secure, time-critical, and robust data transmission performance [4]. Automatic Modulation Classification (AMC) plays a crucial role in identifying modulation types over various spectral bands, enabling adaptive communication adjustments and facilitating effective analysis of the electromagnetic environment [5]. Besides this, AMC is vital for extracting digital baseband information from signals, particularly when there is limited knowledge of system parameters. This technique finds broad applications across both military and civilian domains, particularly in cognitive radio and anomaly detection, both of which have recently garnered growing research attention [6,7].
Although deep-learning-based methods are currently popular solutions for AMC, several key challenges remain. First, traditional Convolutional Neural Networks (CNNs) exhibit inherent limitations in processing complex-valued signals. This is primarily due to the degradation of phase information, which results in inadequate extraction of critical features—namely amplitude and phase characteristics—embedded in modulation signals. Such loss significantly compromises the classification accuracy, especially for complex modulation types. Second, the time-frequency representations of signals often contain redundant information. This redundancy hampers the model’s ability to focus on salient features during learning, thereby reducing its effectiveness in discriminating between modulation types. Third, similar modulation schemes tend to exhibit high overlap in the feature space. A single loss function is often insufficient to address this issue, as it fails to enforce both intraclass compactness and interclass separability. As a result, interclass confusion increases and classification performance deteriorates.
To address these challenges, a Multiscale Complex Convolution Spatiotemporal Attention Network (MCCSAN) is proposed (Figure 1). The Multiscale Complex Convolutional (MCC) block is responsible for extracting low-level time-domain features from raw complex-valued I/Q signals. It consists of three parallel complex convolution branches with kernel sizes of 3, 7, and 15, respectively. The Bidirectional Gated Recurrent Unit (BiGRU) component follows the MCC block and models the sequential dependencies and temporal dynamics of the extracted features. The proposed spatiotemporal attention mechanism enables efficient feature selection, reducing redundancy while enhancing the model’s ability to distinguish critical signal features. To tackle the issue of interclass signal similarity inherent in single loss functions, a combined loss function is adopted to improve intraclass compactness and interclass separability, thereby ensuring high classification accuracy and enhanced model interpretability. In summary, the contributions of this paper are as follows.
(1) A multiscale complex convolutional architecture is introduced to tackle the challenge of insufficient feature extraction, where critical features are often lost from the original input signals. Although complex-valued convolution has been applied in other domains, its application to AMC remains underexplored. The proposed feature extraction module directly processes complex-valued signals, retaining both phase and amplitude information, and uses convolution kernels of different sizes to extract multiscale signal features in parallel. This diversity of extracted features significantly enhances classification accuracy.
(2) A novel spatiotemporal attention mechanism is proposed to address the issue of feature redundancy in signals. The hierarchical architecture adjusts the weights of individual channels to emphasize the key channels, and then applies adaptive weighting to both time steps and channels. An attention-based adaptive pooling strategy is utilized to reduce the dimensionality of signal features and lower computational complexity. By extracting essential features, this approach effectively enhances the network's ability to focus on the most relevant information.
(3) A combined loss function, integrating cross-entropy loss and center loss, is tailored for AMC to tackle the challenge of interclass signal similarity. By minimizing the distance between samples of the same class in the feature space, the model enhances the separability between classes. Experiments conducted on the RML2018.01A and RML2016.10A datasets demonstrate that the proposed network improves accuracy by 3.98% over the state of the art, achieving better performance across signal-to-noise ratios (SNRs) from −6 dB to 12 dB.
The remainder of this paper is organized as follows: Related work is introduced in Section 2. Section 3 introduces the multiscale complex convolution spatiotemporal attention network, and our experiments and corresponding results are discussed in Section 4. Section 5 concludes the paper.

2. Related Work

2.1. Likelihood-Based Methods for AMC

Likelihood-Based (LB) approaches interpret modulation classification as a multihypothesis testing problem grounded in probabilistic theory [8,9]. These techniques evaluate the likelihood functions of electromagnetic signals and make decisions by comparing the likelihood ratio against a preset threshold. Representative LB methods include the Average Likelihood Ratio Test (ALRT), Generalized Likelihood Ratio Test (GLRT), and Hybrid Likelihood Ratio Test (HLRT).
In the ALRT, unknown system parameters are treated as random variables with known probability density functions (PDFs), and the mean of their likelihood functions is derived [10]. The GLRT, by contrast, models unknown parameters as fixed quantities: it estimates their values under each hypothesis and substitutes the estimates into the likelihood function for the ratio computation [11]. The HLRT integrates the advantages of the ALRT and GLRT by treating some unknown parameters as random variables and others as fixed quantities [12]. Although this hybrid method performs well under ideal channel conditions, it is computationally intensive, depends on full prior knowledge of the signal, and remains vulnerable to environmental interference.

2.2. Feature-Based Methods for AMC

Feature-Based (FB) methods consist of two sequential steps. First, artificial features are extracted from raw signal data. Subsequently, these features are fed into conventional data-driven machine learning classifiers, where the classifier parameters are optimized for electromagnetic signal modulation recognition. Commonly utilized signal features include instantaneous features, statistical features, and transform-domain features.
Instantaneous features characterize signals by analyzing their amplitude, phase, and frequency attributes [13]. Statistical features, primarily encompassing higher-order moments and cumulants, are designed to counteract the effects of fading channels [14,15]. By adjusting parameters such as order, number of conjugate variables, and delay, these features can effectively suppress phase offsets and fading impairments. Transform-domain features are extracted by computing transformation coefficients and statistical descriptors of the original signal, forming multiscale feature vectors [16].
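For illustration (this example is ours, not drawn from the cited works), the following minimal NumPy sketch estimates two widely used fourth-order cumulant features, C40 and C42, from a normalized complex baseband sequence; the constellations and sample sizes are illustrative assumptions.

```python
import numpy as np

def fourth_order_cumulants(x):
    """Estimate C40 and C42 from a zero-mean, unit-power complex baseband sequence x."""
    m20 = np.mean(x ** 2)            # M20 = E[x^2]
    m21 = np.mean(np.abs(x) ** 2)    # M21 = E[|x|^2]
    m40 = np.mean(x ** 4)            # M40 = E[x^4]
    m42 = np.mean(np.abs(x) ** 4)    # M42 = E[|x|^4]
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c42

# Example: BPSK and QPSK symbols (unit power) give different |C40| values.
rng = np.random.default_rng(0)
bpsk = rng.choice([-1.0, 1.0], size=4096) + 0j
qpsk = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=4096) / np.sqrt(2)
print(np.abs(np.array(fourth_order_cumulants(bpsk))))   # |C40| ~ 2.0 for BPSK
print(np.abs(np.array(fourth_order_cumulants(qpsk))))   # |C40| ~ 1.0 for QPSK
```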
Compared with LB methods, artificial features demand less prior information and exhibit lower algorithmic complexity. However, their ability to represent signal attributes remains limited, as these features typically capture only a single aspect of the signal. This constraint reduces their overall data representation capability.

2.3. Deep-Learning Methods for AMC

Over the past years, deep-learning methods have significantly enhanced the feature representation ability for AMC, leading to improved classification results. The fundamental models of deep learning can be categorized into three primary types: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers (TRN).
In [17], a cost-effective convolutional architecture was proposed to simultaneously learn spatiotemporal signal correlations using diverse asymmetric convolutional kernels. To address the challenges of robustness and real-time performance in complex multipath fading channels, a lightweight convolutional network was developed, which employs a hierarchical classification architecture and optimized network structure [18]. A study in [19] introduced a two-stage approach that combines CNNs with Continuous Wavelet Transform (CWT). The algorithm first utilizes CWT to extract the time-frequency information of modulated signals, converts it into 2D images, and feeds these into CNNs for feature learning, ultimately achieving modulation classification. To overcome the performance bottleneck of AMC under low SNR, a lightweight CNN architecture integrating parallel asymmetric convolutional kernels and skip connections was designed, which significantly improved model performance [20].
Compared with CNNs, RNNs are better suited to time-series feature learning through their memory mechanisms. To address the limitation that CNNs fail to fully exploit the temporal characteristics of signals, a two-layer Gated Recurrent Unit (GRU) network was designed. It takes raw In-Phase/Quadrature (I/Q) signals as input, eliminating the need for tedious manual feature extraction and significantly enhancing classification accuracy [21]. A Long Short-Term Memory (LSTM) architecture was proposed to learn time-domain amplitude and phase information directly from signals, obviating the need for expert-engineered features like high-order cyclic moments [22]. Research [23] introduced a sparse learning algorithm that trains sparsely connected neural networks based on weight statistics and gradient momentum, specifically applied to RNNs. This method effectively reduces network parameters while improving generalization capabilities. To ensure comprehensive signal feature extraction, a hybrid deep-learning network integrating GRU was developed [24]. A lightweight grouped GRU was proposed to reduce parameter count and computational overhead in RNNs, enabling deployment in resource-constrained scenarios [25].
Transformers effectively capture long-range dependencies while avoiding the sequential dependency issues inherent in RNNs. To address the limitations of CNNs and RNNs, namely their inability to effectively capture long-term dependencies and their high parameter counts, a novel Transformer-based network architecture was proposed [26]. In [27], a new Transformer-based method was proposed. It utilizes the attention mechanism to capture global correlations in signal sequences and combines a linear projection layer with a multilayer perceptron for classification. To address economic and privacy constraints in real-world scenarios, a contrastive semisupervised learning framework based on Transformers was proposed [28]. This method uses unlabeled samples for self-supervised contrastive pretraining on a Transformer-based encoder and achieves data augmentation through time warping. A framework based on meta-learning and Transformer was proposed in [29]. Through few-shot learning and a main-sub encoder architecture, the model can quickly adapt to new modulation types with only a small number of samples and support input signals of variable lengths.
Overall, existing deep-learning networks still suffer from feature redundancy and weak temporal correlation. Motivated by earlier research, we propose a novel deep-learning framework that performs multiscale feature extraction on raw I/Q sequences using complex convolutions and captures the temporal characteristics of the sequences through bidirectional recurrent units integrated with a fused attention mechanism. This design combines the advantages of several network architectures and outperforms state-of-the-art AMC methods.

3. Multiscale Complex Convolution Spatiotemporal Attention Network

The proposed MCCSAN is illustrated in this section. Notably, the designed network is built upon BiGRU, and the feature extraction module is implemented using multiscale complex convolution. The spatiotemporal attention mechanism is inserted after the BiGRU layer to capture the interdependencies among signal features. Finally, the network is optimized and evaluated using a joint loss function.

3.1. Multiscale Complex Convolutional Module

For AMC techniques, CNNs are often used for modulation classification tasks. A multiscale complex convolution is applied as a feature extractor in the MCCSAN. Unlike generic complex CNNs, the complex convolution used in our model simulates the behavior of true complex multiplication in the real domain. By computing both direct and cross-term convolutions between the real and imaginary parts of inputs and kernels, the operation captures the amplitude-phase coupling present in complex modulated signals. This formulation goes beyond simple channel-wise separation or concatenation, and enables the model to extract discriminative features that are sensitive to joint I/Q variations.
In the I/Q plane, many modulation types differ by relative phase shifts, amplitude scaling, or trajectory curvature in the complex time series. A real-valued convolution cannot model these geometric transformations in the complex plane unless specially engineered. Complex convolution, however, is naturally equivariant under rotation and scaling: for a phase rotation X(t) → X(t) e^{jθ}, the output transforms consistently as
Y(t) → Y(t) e^{jθ}
This property is essential in AMC, where two samples from the same class may differ only by a global phase shift.
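As a quick sanity check of this property (our own illustrative snippet, not part of the paper's experiments), a linear complex convolution commutes with a global phase rotation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)    # complex input X(t)
w = rng.standard_normal(7) + 1j * rng.standard_normal(7)      # complex kernel W(t)
theta = 0.7

y = np.convolve(x, w, mode="same")                             # Y(t)
y_rot = np.convolve(x * np.exp(1j * theta), w, mode="same")    # output for X(t) e^{j*theta}

# Equivariance: rotating the input rotates the output by the same phase.
assert np.allclose(y_rot, y * np.exp(1j * theta))
```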
Through convolution kernels of different scales, the network can comprehensively extract signal features ranging from short-range characteristics (e.g., signal transitions) to long-range characteristics (e.g., periodic variations), thereby improving network performance (Figure 2).
Complex convolution is used to better preserve complex structural information and avoid information loss [30]. The complex-domain signal can be defined as:
X(t) = X_r(t) + j X_i(t)
where j is the imaginary unit (j = √−1) and X(t) is the input complex signal. Although X(t) is complex-valued, it is stored as a real-valued tensor X(t) ∈ ℝ^{B×L×2C_in}, whose last dimension encodes the real and imaginary parts in separate channels. X_r(t) is the real part of the signal and X_i(t) is the imaginary part, which can be expressed as:
X_r(t) = X[…, :C_in],  X_i(t) = X[…, C_in:]
In this paper, the input signal is an I/Q sequence, so C_in = 1. B is the batch size and L denotes the length of the input sequence, which is 1024.
In communication theory, a complex-valued signal consists of an In-Phase (I) and a Quadrature (Q) component. The instantaneous amplitude and phase can be directly derived from I/Q as:
A(t) = √(I(t)² + Q(t)²)
φ(t) = arctan(Q(t) / I(t))
These two features are critical in distinguishing different modulation types. Therefore, a model that captures not only the individual behavior of I and Q, but also their interdependence, can more effectively retain modulation characteristics.
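As a small illustration (ours, in NumPy), the two quantities can be computed directly from the I/Q components; arctan2 is used here as the quadrant-aware form of arctan(Q/I):

```python
import numpy as np

def instantaneous_amplitude_phase(i, q):
    """Per-sample amplitude A(t) and phase phi(t) from I/Q components."""
    amplitude = np.sqrt(i ** 2 + q ** 2)
    phase = np.arctan2(q, i)   # quadrant-aware arctan(Q/I)
    return amplitude, phase
```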
Complex convolution preserves the inherent amplitude–phase coupling in modulated signals, enabling the network to extract more discriminative and physically meaningful features—particularly under phase shifts or amplitude scaling.
The output after complex convolution is then expressed as:
Y(t) = X(t) ∗ W(t) = (X_r + j X_i) ∗ (W_r + j W_i)
Y(t) = (X_r ∗ W_r − X_i ∗ W_i)(t) + j (X_r ∗ W_i + X_i ∗ W_r)(t)
Here, W(t) represents the complex-valued convolution kernel, where W_r and W_i are its real and imaginary components, respectively. These real-valued learnable parameters are used to compute the complex convolution with the input X(t) according to complex arithmetic rules. After simplification, the real and imaginary parts of the output are:
Y_r(t) = (W_r ∗ X_r) − (W_i ∗ X_i),  Y_i(t) = (W_r ∗ X_i) + (W_i ∗ X_r)
The real and imaginary parts of the output are concatenated along the channel dimension to obtain the final output:
Y = Concat(Y_r, Y_i) ∈ ℝ^{B×L×2C_out}
The SELU activation function is applied to produce a non-linear representation [31]. A parallel structure of complex convolution modules with kernel sizes of 3, 7, and 15 is adopted to achieve multiscale feature extraction. Padding operations are applied to ensure that the input and output lengths remain the same. Complex convolution ensures preservation of amplitude-phase coupling inherent in I/Q signals, which is critical for distinguishing between modulation types. Compared to real-valued convolution, which treats I and Q independently, complex convolution captures joint structure and is more aligned with the physical nature of the signals. In AMC tasks where class differences often manifest through phase rotation or amplitude modulation, this complex-domain modeling provides superior representation capacity.
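A minimal TensorFlow/Keras sketch of this module, written directly from the equations above, is given below. The branch width of 16 complex filters (so that the concatenated real/imaginary output has 32 channels per branch, consistent with Table 1) and the exact layer choices are our assumptions where the paper does not spell them out.

```python
import tensorflow as tf
from tensorflow.keras import layers

def complex_conv1d(x, filters, kernel_size):
    """Complex convolution on a real tensor (B, L, 2*C_in) holding [real | imag] channels.

    Implements Y_r = W_r*X_r - W_i*X_i and Y_i = W_r*X_i + W_i*X_r using two
    real-valued Conv1D layers as the real and imaginary kernel parts.
    """
    c_in = x.shape[-1] // 2
    x_r, x_i = x[..., :c_in], x[..., c_in:]
    conv_r = layers.Conv1D(filters, kernel_size, padding="same", use_bias=False)
    conv_i = layers.Conv1D(filters, kernel_size, padding="same", use_bias=False)
    y_r = conv_r(x_r) - conv_i(x_i)          # reusing conv_r shares the kernel W_r
    y_i = conv_r(x_i) + conv_i(x_r)
    y = layers.Concatenate(axis=-1)([y_r, y_i])   # (B, L, 2*filters)
    return layers.Activation("selu")(y)

def multiscale_complex_block(x, filters=16, kernel_sizes=(3, 7, 15)):
    """Three parallel complex-convolution branches concatenated along the channel axis."""
    branches = [complex_conv1d(x, filters, k) for k in kernel_sizes]
    return layers.Concatenate(axis=-1)(branches)  # (B, L, 3 * 2 * filters) = (B, 1024, 96)
```

Calling conv_r on both X_r and X_i reuses the same real-valued kernel W_r, which is exactly what the complex product requires.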

3.2. Spatiotemporal Attention Module

Popular AMC models often neglect the spatiotemporal dependencies of signal features. In such models, feature channel weights are typically treated equally. However, the purely data-driven deep-learning approach results in significant performance disparities across channels. Most previous AMC methods integrating attention mechanisms have demonstrated that identifying temporal and spatial correlations in signals can significantly enhance recognition accuracy [32,33,34].
Inspired by the above observations, this section proposes a Spatiotemporal Attention (SA) module to enhance classification performance. Unlike traditional attention modules, e.g., Squeeze-and-Excitation (SE) or Convolutional Block Attention Module (CBAM), designed for 2D images, our spatiotemporal attention operates directly on the 1D time-domain radio signal. It preserves the I/Q structure and enables adaptive weighting across both channels and temporal steps. This design is essential for extracting time-dependent features in modulation classification tasks.
Figure 3 illustrates the detailed process of the Spatiotemporal Attention module. The input feature map is X ∈ ℝ^{B×L×C}, where L and C denote the sequence length and the number of channels, respectively, and B represents the batch size.
The spatiotemporal attention module maps the input features to A_t ∈ ℝ^{B×L×128} through a one-dimensional convolution operation, defined by the following equation:
A_t = σ(Conv1D(X; W_t, b_t))
where W_t and b_t denote the temporal convolution kernel and bias, and σ denotes the sigmoid activation function. The output feature A_t provides the weight for each time step.
Simultaneously, the input features are mapped to A_s ∈ ℝ^{B×1×128} through channel-wise average pooling over the time dimension followed by a fully connected layer, formulated as:
X_avg = (1/L) Σ_{l=1}^{L} X[:, l, :] ∈ ℝ^{B×1×C}
A_s = σ(W_s X_avg + b_s)
where W_s and b_s are the weight matrix and bias of the fully connected layer. The output feature A_s encodes the attention weight for each channel.
The spatiotemporal attention matrix A ∈ ℝ^{B×L×128} is derived by element-wise multiplication of the temporal attention weights A_t and the channel attention weights A_s (with A_s broadcast along the time axis). The output feature Y ∈ ℝ^{B×L×128} is then obtained by weighting the input feature map X with A:
A = A_t ⊙ A_s
Y = X ⊙ A
where ⊙ denotes element-wise multiplication and Y is the feature map weighted by the spatiotemporal attention mechanism, capturing both temporal and spatial dependencies. All attention-related variables in this module, including X_avg, A, and Y, are real-valued tensors derived from the concatenated real and imaginary components of the complex input signal.
Notably, an adaptive temporal pooling mechanism based on attention is incorporated after the spatiotemporal attention module. This component enhances computational efficiency and reduces model complexity. The process involves generating weights for each time step via a fully connected layer, followed by a weighted summation over the temporal dimension, expressed as:
W = σ(W_a X + b_a)
V = Σ_{l=1}^{L} W[:, l, :] ⊙ X[:, l, :]
where W_a and b_a denote the weight and bias of the fully connected layer, W[:, l, :] is the temporal pooling weight for the l-th time step, and V is the pooled output tensor that aggregates the attention-weighted temporal information.
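A compact TensorFlow/Keras sketch of the attention and pooling steps above follows the equations for A_t, A_s, W, and V; the channel width of 128 and convolution kernel size of 11 match Table 1, while the remaining details are our reading of the equations rather than the authors' released code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def spatiotemporal_attention(x, channels=128, kernel_size=11):
    """Spatiotemporal attention over a (B, L, channels) feature map.

    Temporal branch: per-time-step weights A_t = sigmoid(Conv1D(X)).
    Channel branch:  per-channel weights  A_s = sigmoid(Dense(mean over time of X)).
    Output:          Y = X * (A_t * A_s), with A_s broadcast along the time axis.
    """
    a_t = layers.Conv1D(channels, kernel_size, padding="same",
                        activation="sigmoid")(x)               # (B, L, C)
    x_avg = tf.reduce_mean(x, axis=1, keepdims=True)           # (B, 1, C)
    a_s = layers.Dense(channels, activation="sigmoid")(x_avg)  # (B, 1, C)
    return x * (a_t * a_s)                                     # (B, L, C)

def adaptive_temporal_pooling(x, channels=128):
    """Attention-based pooling: V = sum over l of W[:, l, :] * X[:, l, :]."""
    w = layers.Dense(channels, activation="sigmoid")(x)        # (B, L, C)
    return tf.reduce_sum(w * x, axis=1)                        # (B, C)
```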

3.3. Joint Loss Function

The loss function in MCCSAN adopts a dynamic weighting strategy that combines cross-entropy loss, suited for multiclass classification, with center loss, which enhances interclass separability to improve overall classification performance [35]. The joint loss function can be defined as:
L = (1/(2σ_1²)) · L_ce + (1/(2σ_2²)) · L_center + log(σ_1 · σ_2)
where σ_1 and σ_2 are learnable weights of the loss function whose optimal values are determined during training, L_ce is the cross-entropy loss, and L_center is the center loss. The term log(σ_1 · σ_2) is a regularizer that prevents the loss from becoming ineffective through unbounded growth of σ_1 and σ_2.
In the cross-entropy loss, C denotes the number of classes in the classification task, and N is the number of samples. The loss can be defined as:
L_ce = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_c^{(i)} log ŷ_c^{(i)}
where ŷ_c^{(i)} is the probability that sample i is predicted by the model to belong to class c. In this paper, it is the output of the softmax function and can be defined as:
ŷ_c = e^{z_c} / Σ_{j=1}^{C} e^{z_j},  c = 1, 2, …, C
where Z = [z_1, z_2, …, z_C] is the raw (logit) output of the model. y_c^{(i)} ∈ {0, 1} is the true label of the sample, represented by one-hot encoding: only the entry corresponding to the true class c* is 1 and the rest are 0. Therefore, for a sample i belonging to class c*, the loss is:
L_ce^{(i)} = −log ŷ_{c*}^{(i)} = −log( e^{z_{c*}^{(i)}} / Σ_{j=1}^{C} e^{z_j^{(i)}} ) = −z_{c*}^{(i)} + log Σ_{j=1}^{C} e^{z_j^{(i)}}
The overall cross-entropy loss function is:
L_ce = (1/N) Σ_{i=1}^{N} [ −z_{c*}^{(i)} + log Σ_{j=1}^{C} e^{z_j^{(i)}} ]
Center loss measures the distance between samples and their corresponding class centers in the feature space, encouraging intraclass compactness and thereby improving class separability. f_i ∈ ℝ^d is the feature vector of sample i; in this paper, the output of the dense layer is taken as f_i. c_{y_i} ∈ ℝ^d is the center vector of class y_i. The overall center loss function is defined as:
L_center = (1/2) Σ_{i=1}^{N} ‖f_i − c_{y_i}‖₂²
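A minimal TensorFlow sketch of this joint objective follows; it reflects our interpretation, since the paper does not specify implementation details. We parameterize log σ² as trainable variables for numerical stability and treat the class centers as trainable weights, which is one common way to realize center loss. The batch-mean reductions are also our choice.

```python
import tensorflow as tf

class JointLoss(tf.keras.layers.Layer):
    """Uncertainty-weighted sum of cross-entropy and center loss (sketch of the equation above)."""

    def __init__(self, num_classes, feat_dim, **kwargs):
        super().__init__(**kwargs)
        self.centers = self.add_weight(name="centers", shape=(num_classes, feat_dim),
                                       initializer="zeros", trainable=True)
        self.log_var_ce = self.add_weight(name="log_var_ce", shape=(), initializer="zeros")
        self.log_var_ct = self.add_weight(name="log_var_ct", shape=(), initializer="zeros")

    def call(self, features, logits, labels):
        # Cross-entropy loss L_ce (averaged over the batch)
        l_ce = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
        # Center loss L_center = 1/2 * ||f_i - c_{y_i}||^2 (averaged over the batch)
        l_center = 0.5 * tf.reduce_mean(
            tf.reduce_sum(tf.square(features - tf.gather(self.centers, labels)), axis=1))
        # L = 1/(2*sigma1^2) * L_ce + 1/(2*sigma2^2) * L_center + log(sigma1 * sigma2)
        return (0.5 * tf.exp(-self.log_var_ce) * l_ce
                + 0.5 * tf.exp(-self.log_var_ct) * l_center
                + 0.5 * (self.log_var_ce + self.log_var_ct))
```

In a full training loop, the layer's output would be added to the model loss (e.g., via a custom training step or model.add_loss) while the classifier is still trained on the softmax output.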

3.4. Radio Signal Model

Wireless channels are often represented by compact stochastic models to capture propagation characteristics [36]. The received baseband signal, typically affected by background noise and common real-world impairments, can be formulated as:
r(t) = α e^{j(ωt + θ)} Σ_n I_n g(t − nT − εT) + w(t),  0 ≤ t ≤ T_0
Here, T_0 denotes the total observation window and T the symbol period. g(t) is the pulse-shaping filter, I_n are the transmitted modulated symbols, and n is the symbol index. w(t) is a zero-mean complex-valued additive white Gaussian noise process. The term α models the channel gain, whereas ω and θ account for the carrier frequency and phase offsets, and ε denotes the residual timing error. The aim of this work is to determine the correct modulation scheme from the observed signal r(t).
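For intuition, a short NumPy sketch that synthesizes such a received signal for QPSK with rectangular pulse shaping is given below; all impairment values (α, ω, θ, SNR) are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
sps = 8                                  # samples per symbol (symbol period T in samples)
num_symbols = 128
alpha, omega, theta = 0.9, 0.01, 0.3     # channel gain, CFO (rad/sample), phase offset
snr_db = 10

# QPSK symbols I_n with rectangular pulse shaping g(t)
symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=num_symbols) / np.sqrt(2)
baseband = np.repeat(symbols, sps)

# Carrier frequency/phase offset and channel gain
n = np.arange(baseband.size)
signal = alpha * np.exp(1j * (omega * n + theta)) * baseband

# Complex AWGN w(t) at the requested SNR
noise_power = np.mean(np.abs(signal) ** 2) / 10 ** (snr_db / 10)
noise = np.sqrt(noise_power / 2) * (rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size))
r = signal + noise                       # received signal r(t)
```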

3.5. Implementation Details

The raw I/Q samples, normalized to unit variance, are directly fed into the multiscale complex convolution layers of MCCSAN, without relying on hand-crafted features or signal pre-processing. The network architecture of MCCSAN is shown in Table 1. To further enhance performance, the Sigmoid-Weighted Linear Unit (SiLU) activation and Dropout are also incorporated [37].
Table 2 presents detailed information on two open-source datasets [38]. MCCSAN is trained for modulation classification using a dataset split of 20% for training, 20% for validation, and 60% for testing. The model is implemented in TensorFlow, and the experiments are conducted on a PC equipped with an Nvidia RTX A2000 12 GB GPU.
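Putting the pieces together, the hedged end-to-end sketch below assembles the pipeline of Table 1 from the blocks defined in the Section 3.1 and 3.2 code above. The optimizer, dropout rate, and GRU width are our assumptions (64 units per direction yields the 128-dimensional BiGRU features listed in Table 1), and the joint loss of Section 3.3 would replace the plain cross-entropy in a full implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_mccsan(num_classes=24, seq_len=1024):
    """Assemble the Table 1 pipeline from the blocks sketched in Sections 3.1 and 3.2."""
    inputs = layers.Input(shape=(seq_len, 2))                   # raw normalized I/Q sequence
    x = multiscale_complex_block(inputs)                        # Section 3.1 sketch
    x = layers.Conv1D(128, 3, padding="same", activation=tf.nn.silu)(x)
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)   # 2 * 64 = 128 features
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
    x = spatiotemporal_attention(x)                             # Section 3.2 sketch
    x = adaptive_temporal_pooling(x)                            # (B, 128)
    x = layers.Dense(128, activation=tf.nn.silu)(x)
    x = layers.Dropout(0.3)(x)                                  # dropout rate assumed
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_mccsan()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```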

4. Experiments

In this section, the proposed method is compared with baseline classifiers under different SNRs (E_s/N_0). Classification performance is evaluated by the probability of correct classification.

4.1. Comparison with Baseline Methods

To demonstrate the effectiveness of the proposed method, MCCSAN is compared with several deep-learning-based baseline models on the RADIOML 2018.01A and RADIOML 2016.10A datasets. For a fair comparison, all baselines, including BiGRU and LSTM, are re-implemented under the same training settings as MCCSAN, and no performance results from prior publications were directly cited in our comparisons.
The comparison results of this experiment are shown in Figure 4. These results demonstrate that MCCSAN improves modulation classification performance by more than 3.98% across SNRs ranging from −6 dB to 12 dB. This improvement is attributed to the multiscale complex convolution and spatiotemporal attention modules, which strengthen feature representation and key feature extraction in deep networks. It is noteworthy that MCCSAN exhibits a substantial advantage at low SNRs. This comparison indicates that enhancing signal feature expression and attention mechanisms can greatly improve the network's classification performance for radio signals. Table 3 presents the average accuracy of the baseline models on the two datasets. MCCSAN achieves peak classification accuracies of 97.03% on RADIOML 2018.01A and 89.34% on RADIOML 2016.10A.
To explore the influence of the multiscale complex convolution and spatiotemporal attention mechanisms, an ablation study is conducted. Figure 5 demonstrates the performance of MCCSAN with and without each of the considered modules. The results indicate that MCCSAN outperforms the versions without the multiscale complex convolution and spatiotemporal attention mechanisms by 1.93% and 2.20%, respectively. Table 4 shows the specific accuracy values of the model under different SNRs. Integrating the multiscale complex convolution as a feature extractor significantly improves the model recognition accuracy. The introduction of the spatiotemporal attention mechanism enables the model to focus on signal components that are critical to feature learning, thereby enhancing its performance.

4.2. Comparison with State-of-the-Art Methods

To evaluate the effectiveness of the proposed multiscale complex convolution and spatiotemporal attention modules, we conducted ablation experiments against commonly used alternatives. Specifically, we replaced the multiscale complex convolution module with a standard real-valued convolution block, and replaced the spatiotemporal attention mechanism with SE and CBAM, respectively. In each case, the rest of the network architecture remained unchanged to ensure a fair comparison.
The experimental results are shown in Figure 6. In Figure 6a, the network using complex convolution consistently outperforms the real-valued convolution, demonstrating its superiority in preserving phase and amplitude information essential for modulation recognition. The results proved that complex convolution is more suitable for I/Q signal processing in AMC tasks.
Figure 6b compares our spatiotemporal attention module with SE and CBAM. The SA module achieves significantly higher classification accuracy across almost the entire SNR range. This result highlights the importance of jointly weighting temporal and channel-wise features, an aspect absent in SE. Furthermore, the spatial attention mechanism in CBAM is primarily designed for 2D image inputs and does not adapt well to 1D signal processing. The performance confirms that SA effectively captures time-dependent modulation patterns and suppresses redundant features, making it better tailored for time-series radio signals. Table 5 specifically compares the differences between SA, SE, and CBAM.
To further demonstrate the effectiveness and generalization of the proposed MCCSAN model, we conducted a comprehensive comparison with several state-of-the-art benchmarks on the RADIOML2018.01A dataset. The selected baselines include MetaLearning [29], Transformer-based AMC [27], Denoising Autoencoder (DAE) [42], and improved convolutional neural-network-based automatic modulation classification network (IC-AMCNet) [43]. All models were evaluated under identical experimental settings and dataset splits to ensure a fair comparison.
Table 5. Comparison of attention modules and their task-specific adaptation.
Module | Domain | Key Operation | Task-Specific Adaptation
SE [44] | Channel | Global average pooling, FC | None
CBAM [45] | Channel + Spatial | AvgPool + MaxPool, Conv2D | Designed for images
SA (ours) | Time + Channel | ReduceMean, Conv1D + Dense | Designed for time series
As shown in Figure 7, MCCSAN consistently achieves the highest classification accuracy across nearly the entire SNR range, especially from 2 dB to 10 dB. While MetaLearning performs competitively at very low SNRs, its accuracy saturates earlier and remains significantly lower in medium-to-high SNR conditions. Similarly, although DAE and IC-AMCNet models perform well above 8 dB, they are outperformed by MCCSAN throughout.

4.3. Classification Performance by Modulation Mode

Figure 8 illustrates the per-class classification accuracy of MCCSAN on the RML2018.01A dataset. As the SNR drops below 10 dB, the classification accuracy of each modulation type declines sharply, whereas it stabilizes at higher SNR levels. Radio signals with low-complexity demodulation and good spectral characteristics, e.g., BPSK, QPSK, and GMSK, remain more distinguishable under low-SNR conditions.
In contrast, higher-order modulation modes place higher demands on the SNR and fail to achieve the desired classification accuracy. Across the modulation modes in the dataset, the average classification accuracy exceeds 90% when the SNR is around 6 dB.
Specifically, the accuracy of the AM-SSB-WC modulation type is significantly lower than that of other modulation types. This is primarily because AM-SSB-WC relies heavily on fine-grained frequency domain details, for which the model lacks specifically designed feature extractors, resulting in reduced recognition accuracy.
To analyze misclassification patterns, Figure 9 presents the confusion matrix of MCCSAN across all 24 classes on RML2018.01A under AWGN conditions with varying SNR levels. The confusion matrix indicates that the major sources of misclassification are between high-order Phase-Shift Keying (PSK) schemes, e.g., 16-PSK and 32-PSK, and high-order Quadrature Amplitude Modulation (QAM) schemes, e.g., 64-QAM, 128-QAM, and 256-QAM, as well as among AM modes, whose with-carrier and suppressed-carrier variants are confused with one another. Because the distinguishing characteristics of high-order signals are easily overwhelmed by noise, and the symbol transitions that indicate the signal type are difficult to capture, these classes remain extremely difficult to identify with existing methods.
Some selected examples are shown in Figure 10 [46], which provides three complementary views of all modulation types in the dataset: time-domain waveforms, frequency-domain spectra, and constellation diagrams. These visualizations offer distinct but related perspectives on modulation characteristics. The time domain illustrates temporal structure, the frequency domain reveals bandwidth and spectral shape, and the constellation diagram shows symbol geometry. Collectively, these views facilitate a deeper understanding of the intrinsic complexity of each modulation scheme and the challenges associated with their classification under noisy conditions.

5. Conclusions

This paper presents a deep-learning framework based on multiscale complex convolution and a spatiotemporal attention mechanism for AMC. By integrating deep learning with complex convolution and attention mechanisms, the proposed model retains original signal characteristics while focusing on more discriminative representations of radio signals. Through the joint loss function, our method can more effectively distinguish signals between different modulation classes.
Experiments on RADIOML 2018.01A and RADIOML 2016.10A have been performed to demonstrate the effectiveness of MCCSAN in AMC. The confusion matrix analysis reveals that misclassifications are mainly concentrated among high-order PSK and QAM schemes, as well as among AM variants with close spectral characteristics. The ablation studies confirm that both the multiscale complex convolution and the spatiotemporal attention contribute significantly to performance gains. In addition, the visualization of signal features in time, frequency, and constellation domains provides further insight into modulation complexity and classification challenges.
In future work, we aim to enhance MCCSAN’s ability to recognize high-order modulation schemes, extend its applicability to time-varying channels, and address challenges in real-world signal identification. Incorporating domain-specific prior knowledge and frequency-aware attention mechanisms is expected to further improve the model’s classification robustness and interpretability in complex electromagnetic environments.

Author Contributions

Conceptualization, S.X. and D.Z.; Methodology, S.X. and D.Z.; Software, D.Z.; Validation, S.X. and W.M.; Formal analysis, S.X.; Investigation, S.X.; Resources, D.Z., Y.L. and Z.X.; Writing—original draft, S.X.; Writing—review & editing, S.X. and D.Z.; Visualization, S.X.; Supervision, D.Z., Z.X. and W.M.; Project administration, D.Z. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 62201010 and R&D Program of Beijing Municipal Education Commission grant number KM202310009003.

Data Availability Statement

The data presented in this study were derived from the following resources available in the public domain: [https://www.deepsig.ai/datasets/ (accessed on 1 June 2025)].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kulin, M.; Kazaz, T.; Moerman, I.; De Poorter, E. End-to-end learning from spectrum data: A deep learning approach for wireless signal identification in spectrum monitoring applications. IEEE Access 2018, 6, 18484–18501. [Google Scholar] [CrossRef]
  2. Zhou, Q.; Niu, Y. From Adaptive Communication Anti-Jamming to Intelligent Communication Anti-Jamming: 50 Years of Evolution. Adv. Intell. Syst. 2024, 6, 2300853. [Google Scholar] [CrossRef]
  3. Chenchev, I.; Aleksieva-Petrova, A.; Petrov, M. Authentication mechanisms and classification: A literature survey. In Intelligent Computing, Proceedings of the 2021 Computing Conference, London, UK, 15–16 July 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 3, pp. 1051–1070. [Google Scholar]
  4. Xia, N.; Chen, H.H.; Yang, C.S. Radio resource management in machine-to-machine communications—A survey. IEEE Commun. Surv. Tutorials 2017, 20, 791–828. [Google Scholar] [CrossRef]
  5. Meng, F.; Chen, P.; Wu, L.; Wang, X. Automatic modulation classification: A deep learning enabled approach. IEEE Trans. Veh. Technol. 2018, 67, 10760–10772. [Google Scholar] [CrossRef]
  6. Krayani, A.; Alam, A.S.; Marcenaro, L.; Nallanathan, A.; Regazzoni, C. Automatic jamming signal classification in cognitive UAV radios. IEEE Trans. Veh. Technol. 2022, 71, 12972–12988. [Google Scholar] [CrossRef]
  7. Xu, Y.; Zheng, K.; Liu, X.; Li, Z.; Liu, J. Cognitive Radio Networks: Technologies, Challenges and Applications. Sensors 2025, 25, 1011. [Google Scholar] [CrossRef] [PubMed]
  8. Wei, W.; Mendel, J. Maximum-likelihood classification for digital amplitude-phase modulations. IEEE Trans. Commun. 2000, 48, 189–193. [Google Scholar] [CrossRef]
  9. Hameed, F.; Dobre, O.A.; Popescu, D.C. On the likelihood-based approach to modulation classification. IEEE Trans. Wirel. Commun. 2009, 8, 5884–5892. [Google Scholar] [CrossRef]
  10. Salam, A.O.A.; Sheriff, R.E.; Al-Araji, S.R.; Mezher, K.; Nasir, Q. A unified practical approach to modulation classification in cognitive radio using likelihood-based techniques. In Proceedings of the 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), Halifax, NS, Canada, 3–6 May 2015; IEEE: New York, NY, USA, 2015; pp. 1024–1029. [Google Scholar]
  11. Panagiotou, P.; Anastasopoulos, A.; Polydoros, A. Likelihood ratio tests for modulation classification. In Proceedings of the MILCOM 2000 Proceedings. 21st Century Military Communications. Architectures and Technologies for Information Superiority (Cat. No. 00CH37155), Los Angeles, CA, USA, 22–25 October 2000; IEEE: New York, NY, USA, 2000; Volume 2, pp. 670–674. [Google Scholar]
  12. Dulek, B. Online hybrid likelihood based modulation classification using multiple sensors. IEEE Trans. Wirel. Commun. 2017, 16, 4984–5000. [Google Scholar] [CrossRef]
  13. Azzouz, E.; Nandi, A. Procedure for automatic recognition of analogue and digital modulations. IEE Proc.-Commun. 1996, 143, 259–266. [Google Scholar] [CrossRef]
  14. Orlic, V.D.; Dukic, M.L. Automatic modulation classification: Sixth-order cumulant features as a solution for real-world challenges. In Proceedings of the 2012 20th Telecommunications Forum (TELFOR), Belgrade, Serbia, 20–22 November 2012; IEEE: New York, NY, USA, 2012; pp. 392–399. [Google Scholar]
  15. Headley, W.C.; Reed, J.D.; da Silva, C.R.C.M. Distributed Cyclic Spectrum Feature-Based Modulation Classification. In Proceedings of the 2008 IEEE Wireless Communications and Networking Conference, Las Vegas, NV, USA, 31 March–3 April 2008; pp. 1200–1204. [Google Scholar]
  16. Hazza, A.; Shoaib, M.; Alshebeili, S.A.; Fahad, A. An overview of feature-based methods for digital modulation classification. In Proceedings of the 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), Sharjah, United Arab Emirates, 12–14 February 2013; pp. 1–6. [Google Scholar]
  17. Huynh-The, T.; Hua, C.H.; Pham, Q.V.; Kim, D.S. MCNet: An efficient CNN architecture for robust automatic modulation classification. IEEE Commun. Lett. 2020, 24, 811–815. [Google Scholar] [CrossRef]
  18. Tekbıyık, K.; Ekti, A.R.; Görçin, A.; Kurt, G.K.; Keçeci, C. Robust and fast automatic modulation classification with CNN under multipath fading channels. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  19. Abdulkarem, A.M.; Abedi, F.; Ghanimi, H.M.; Kumar, S.; Al-Azzawi, W.K.; Abbas, A.H.; Abosinnee, A.S.; Almaameri, I.M.; Alkhayyat, A. Robust automatic modulation classification using convolutional deep neural network based on scalogram information. Computers 2022, 11, 162. [Google Scholar] [CrossRef]
  20. Abd-Elaziz, O.F.; Abdalla, M.; Elsayed, R.A. Deep learning-based automatic modulation classification using robust CNN architecture for cognitive radio networks. Sensors 2023, 23, 9467. [Google Scholar] [CrossRef]
  21. Hong, D.; Zhang, Z.; Xu, X. Automatic modulation classification using recurrent neural networks. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017; pp. 695–700. [Google Scholar]
  22. Rajendran, S.; Meert, W.; Giustiniano, D.; Lenders, V.; Pollin, S. Deep Learning Models for Wireless Signal Classification with Distributed Low-Cost Spectrum Sensors. IEEE Trans. Cogn. Commun. Netw. 2018, 4, 433–445. [Google Scholar] [CrossRef]
  23. Zang, K.; Wu, W.; Luo, W. Deep Sparse Learning for Automatic Modulation Classification Using Recurrent Neural Networks. Sensors 2021, 21, 6410. [Google Scholar] [CrossRef] [PubMed]
  24. Sun, S.; Wang, Y. A novel deep learning automatic modulation classifier with fusion of multichannel information using GRU. EURASIP J. Wirel. Commun. Netw. 2023, 2023, 66. [Google Scholar] [CrossRef]
  25. Hu, X.; Gao, G.; Li, B.; Wang, W.; Ghannouchi, F.M. A Novel Lightweight Grouped Gated Recurrent Unit for Automatic Modulation Classification. IEEE Wirel. Commun. Lett. 2024, 13, 2135–2139. [Google Scholar] [CrossRef]
  26. Hamidi-Rad, S.; Jain, S. Mcformer: A transformer based deep neural network for automatic modulation classification. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–6. [Google Scholar]
  27. Cai, J.; Gan, F.; Cao, X.; Liu, W. Signal modulation classification based on the transformer network. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1348–1357. [Google Scholar] [CrossRef]
  28. Kong, W.; Jiao, X.; Xu, Y.; Zhang, B.; Yang, Q. A transformer-based contrastive semi-supervised learning framework for automatic modulation recognition. IEEE Trans. Cogn. Commun. Netw. 2023, 9, 950–962. [Google Scholar] [CrossRef]
  29. Jang, J.; Pyo, J.; Yoon, Y.I.; Choi, J. Meta-transformer: A meta-learning framework for scalable automatic modulation classification. IEEE Access 2024, 12, 9267–9276. [Google Scholar] [CrossRef]
  30. Hu, Y.; Liu, Y.; Lv, S.; Xing, M.; Zhang, S.; Fu, Y.; Wu, J.; Zhang, B.; Xie, L. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. arXiv 2020, arXiv:2008.00264. [Google Scholar]
  31. Kılıçarslan, S.; Adem, K.; Çelik, M. An overview of the activation functions used in deep learning algorithms. J. New Results Sci. 2021, 10, 75–88. [Google Scholar] [CrossRef]
  32. Zheng, Q.; Tian, X.; Yu, Z.; Yang, M.; Elhanashi, A.; Saponara, S. Robust automatic modulation classification using asymmetric trilinear attention net with noisy activation function. Eng. Appl. Artif. Intell. 2025, 141, 109861. [Google Scholar] [CrossRef]
  33. Liu, T.; Wu, S. Automatic Modulation Classification Based on Time-Attention Mechanism and LSTM Neural Networks. In Proceedings of the 2024 7th International Conference on Information Communication and Signal Processing (ICICSP), Zhoushan, China, 21–23 September 2024; IEEE: New York, NY, USA, 2024; pp. 112–117. [Google Scholar]
  34. Ma, W.; Cai, Z.; Wang, C. A transformer and convolution-based learning framework for automatic modulation classification. IEEE Commun. Lett. 2024, 28, 1392–1396. [Google Scholar] [CrossRef]
  35. Zhang, H.; Zhou, F.; Wu, Q.; Wu, W.; Hu, R.Q. A Novel Automatic Modulation Classification Scheme Based on Multi-Scale Networks. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 97–110. [Google Scholar] [CrossRef]
  36. Goldsmith, A. Wireless Communications; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
  37. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
  38. O’Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-air deep learning based radio signal classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179. [Google Scholar] [CrossRef]
  39. Hou, S.; Dong, Y.; Li, Y.; Yan, Q.; Wang, M.; Fang, S. Multi-domain-fusion deep learning for automatic modulation recognition in spatial cognitive radio. Sci. Rep. 2023, 13, 10736. [Google Scholar] [CrossRef]
  40. Clerico, V.; González-López, J.; Agam, G.; Grajal, J. LSTM framework for classification of radar and communications signals. In Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA, 1–5 May 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
  41. Zhou, J.; Wan, S. Automatic Modulation Recognition via Pruned LSTM-GRU with Multi-Head Attention. In Proceedings of the 2025 3rd International Conference on Communication Networks and Machine Learning, New York, NY, USA, 21–23 February 2025; CNML ’25. pp. 98–102. [Google Scholar]
  42. Ke, Z.; Vikalo, H. Real-Time Radio Technology and Modulation Classification via an LSTM Auto-Encoder. IEEE Trans. Wirel. Commun. 2022, 21, 370–382. [Google Scholar] [CrossRef]
  43. Hermawan, A.P.; Ginanjar, R.R.; Kim, D.S.; Lee, J.M. CNN-Based Automatic Modulation Classification for Beyond 5G Communications. IEEE Commun. Lett. 2020, 24, 1038–1041. [Google Scholar] [CrossRef]
  44. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  45. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  46. Vitthaladevuni, P.K.; Alouini, M.S. A recursive algorithm for the exact BER computation of generalized hierarchical QAM constellations. IEEE Trans. Inf. Theory 2003, 49, 297–307. [Google Scholar] [CrossRef]
Figure 1. Pipeline of the Multiscale Complex Convolution Spatiotemporal Attention Network (MCCSAN).
Figure 2. Single complex convolutional module of MCCSAN. “Real” and “Imag.” represent the real and imaginary parts of the complex I/Q signal, corresponding to the in-phase and quadrature components, respectively.
Figure 3. Spatiotemporal attention module of MCCSAN.
Figure 4. Comparison results of MCCSAN with baseline methods. (a) On RADIOML 2018.01A dataset. (b) On RADIOML 2016.10A dataset.
Figure 5. Comparison results of MCCSAN without each considered module on RADIOML2018.01A.
Figure 6. Comparison results of MCCSAN with different modules on RADIOML2018.01A. (a) Real-valued convolution compared with complex convolution. (b) Spatiotemporal attention compared with SE and CBAM.
Figure 7. Comparison results of MCCSAN with benchmark models on RADIOML2018.01A.
Figure 8. Classification results of MCCSAN for individual modulation mode.
Figure 9. Confusion matrices of MCCSAN at different SNRs: (a) SNR from −20 to 20 dB; (b) SNR of 2 dB; (c) SNR of 6 dB; (d) SNR of 10 dB.
Figure 10. Signal feature visualization of all modulation modes on RADIOML 2018.01A dataset: (a) I/Q time domain examples of all modulation modes on RADIOML 2018.01A dataset. (b) I/Q frequency domain examples of all modulation modes in RADIOML 2018.01A dataset. (c) Radio signal constellations of all modulation modes on RADIOML 2018.01A dataset.
Table 1. The network architecture of MCCSAN.
Layer | Convolution Filter | Output Dimension
Input | - | 1024 × 2
Complex Convolution | 32, 3 × 1 | 1024 × 32
Complex Convolution | 32, 7 × 1 | 1024 × 32
Complex Convolution | 32, 15 × 1 | 1024 × 32
Concatenate | - | 1024 × 96
Convolution1D | 128, 3 × 1 | 1024 × 128
BiGRU | - | 1024 × 128
BiGRU | - | 1024 × 128
Spatiotemporal attention | 128, 11 × 1 | 1024 × 128
Adaptive Temporal Pooling | - | 128
FC | - | 128
FC | - | 24
Total parameters | - | 406,553
Table 2. Detailed information of two open-source datasets.
Dataset | RML2016.10a | RML2018.01a
Modulation Schemes | 11 classes: 8PSK, BPSK, CPFSK, GFSK, PAM4, 16QAM, AM-DSB, AM-SSB, 64QAM, QPSK, WBFM | 24 classes: OOK, 4ASK, 8ASK, BPSK, QPSK, 8PSK, 16PSK, 32PSK, 16APSK, 32APSK, 64APSK, 128APSK, 16QAM, 32QAM, 64QAM, 128QAM, 256QAM, AM-SSB-WC, AM-SSB-SC, AM-DSB-WC, AM-DSB-SC, FM, GMSK, OQPSK
Sample Dimension | 128 × 2 | 1024 × 2
Dataset Size | 220,000 | 2,555,904
SNR Range (dB) | −20:2:18 | −20:2:30
Table 3. Average accuracy comparison with baseline methods from RML2018.01A and RML2016.10A.
Dataset | BiGRU [39] | LSTM2 [40] | LSTM + BiGRU [41] | MCCSAN
RML2018.01A | 66.15% | 64.99% | 65.61% | 70.13%
RML2016.10A | 80.23% | 79.11% | 78.18% | 82.98%
Table 4. Correct classification probability (%) under different SNR conditions on RADIOML2018.01A.
SNR (dB) | Baseline | Baseline + MCC | Baseline + SA | Ours
−6 | 26.36% | 25.98% | 27.13% | 28.13%
−4 | 34.14% | 34.46% | 34.87% | 36.03%
−2 | 43.45% | 44.46% | 44.47% | 46.68%
0 | 53.46% | 55.92% | 55.07% | 58.39%
2 | 63.45% | 66.09% | 65.80% | 70.89%
4 | 75.89% | 79.18% | 78.80% | 83.02%
6 | 85.28% | 89.25% | 89.04% | 90.79%
8 | 90.86% | 94.07% | 93.53% | 94.13%
10 | 93.32% | 96.01% | 94.38% | 96.32%
