Cross-Domain Automatic Modulation Classification Using Multimodal Information and Transfer Learning

Abstract: Automatic modulation classification (AMC) based on deep learning (DL) is gaining increasing attention in dynamic spectrum access for 5G/6G wireless communications. However, inconsistent feature parameters between the training (source) and testing (target) data lead to performance degradation or even failure of existing DL-based AMC. The primary reason for this is the difficulty in obtaining sufficient labeled training data in the target domain. Therefore, we propose a novel cross-domain AMC algorithm based on multimodal information and transfer learning, utilizing abundant unlabeled target domain data. We achieve complementary gains by fusing multimodal information such as amplitude, phase, and spectrum, which are used to train a network. Additionally, we apply domain adversarial neural network technology from transfer learning to learn from a large number of unlabeled data samples in the target domain to address the issue of decreased accuracy in cross-domain AMC caused by differences in sampling rate, signal-to-noise ratio, and channel variations. Furthermore, we introduce class weight weighting and entropy weighting to solve the partial domain adaptation problem, considering that the target domain has fewer modulation signal classes than the source domain. Experimental results on two designed modulation datasets demonstrate improved performance gains, thus validating the effectiveness of the proposed method.


Introduction
Deep learning (DL) for automatic modulation classification (AMC) is gaining increasing attention in cognitive radio and spectrum sensing technologies. This approach can support the refarming of spectrum resources with low utilization, which is crucial for developing 5G/6G wireless communications. AMC [1] refers to the identification of the modulation scheme of an unknown received signal with limited prior knowledge, for use in scenarios such as electromagnetic situational awareness [2], cognitive radio [3], dynamic spectrum access [4], and interference identification [5]. Classical AMC methods can be divided into maximum-likelihood hypothesis testing based on decision theory and pattern recognition based on feature extraction [6]. The likelihood ratio test methods are optimal in terms of Bayesian estimation for their classification results. However, the identification process requires more prior knowledge and has stringent hypothesis constraints. Moreover, a suitable likelihood ratio function for different modulation schemes is required [7,8], and the computational complexity of the likelihood ratio function is high. Therefore, pattern recognition based on feature extraction [9,10] is widely used in practice. However, owing to ever-emerging complex signals and increasingly crowded electromagnetic environments, feature extraction methods face several challenges, including the difficulty of setting manual feature thresholds and of achieving optimal combinations subjectively, resulting in poor adaptability to complex environments, complex modulation schemes, and similar modulation schemes. Furthermore, these methods have low classification accuracy under low signal-to-noise ratio (SNR) conditions.
To address these challenges, DL has recently been applied to AMC. DL methods do not require the manual design or extraction of signal features. Neural networks can adaptively extract and infer modulation signal features that are more robust and generalizable. The mainstream DL technologies include convolutional neural networks (CNNs), recurrent neural networks (RNNs) [11], and some hybrid models, which show superior performance over classical methods in AMC.
Currently, most AMC approaches obtain experimental datasets through three main methods: MATLAB/GNU Radio simulation, data collection in a single real scenario, and direct use of publicly available datasets, such as RML2016 and RML2018 [12]. One part of the generated/sampled/publicly obtained sample data is used to train the neural networks, and another part is used to test and verify the method's effectiveness. In the research process, optimization is primarily conducted on the data or the neural network model to enhance classification performance. Appropriate data preprocessing can maximize the differences between modulation schemes, thus ensuring improved classification by the neural network. The network model and hyperparameters can also be fine-tuned for AMC tasks.
However, existing DL-based AMC algorithms often encounter specific problems and challenges in practical applications. First, many studies rely predominantly on monomodal information from a single representation dimension, disregarding the complementary gains that multimodal information can provide to better adapt to complex scenarios with different SNRs and channel variations. Based on signal representation and preprocessing [13], the existing DL-based AMC algorithms are divided into four categories: feature representation (such as higher-order cumulants (HOCs) [9] and spectral features [10]), image representation (such as the constellation diagram [14], feature point image [15], eye diagram [16], and spectral correlation function image [17]), sequence representation (such as in-phase and quadrature (IQ) sequences [18], amplitude and phase (AP) sequences [19], and fast Fourier transformation sequences [20]), and combined representation [21-23]. An increased modal space has been theoretically proven to provide more comprehensive knowledge that improves network performance [24]. Therefore, the use of multimodal information, such as features, images, and sequences, is inevitable in future AMC algorithms. Second, the training and test data currently used in DL-based AMC algorithms are generated from the same datasets, assuming they come from the same feature space and follow the same feature distribution. However, differences in time, space, transmitter and receiver performance, and channel multipath delay inevitably give rise to notable distinctions in the feature distributions of the source and target domain data (defined as the unsupervised non-partial domain adaptation (NPDA) problem for AMC). When the trained model is directly used to test new data, it performs poorly. In practical scenarios, the difficulty of acquiring accurate labels for data in the target domain hinders the direct utilization of these data for training the network. This is the primary reason for the observed degradation in the performance of trained models. Employing unlabeled data in the target domain for training is a feasible approach to this problem. Finally, existing research assumes the same modulation classes without considering that the modulation classes of the target domain are often fewer than those of the source domain (i.e., the target domain class set is a proper subset of the source domain class set, defined as the unsupervised PDA problem for AMC). Hence, applying non-partial domain-adaptive AMC algorithms directly to such scenarios can result in negative transfer owing to their global alignment strategies, thereby reducing the method's performance.
Several studies have attempted to resolve these issues. Insufficient monomodal information representation capability was addressed by converting signals into a two-dimensional image through time-frequency transformation and combining it with handcrafted features to form joint features [23]. The simulation results revealed that CNN models using a fusion strategy achieve favorable classification performance under low SNR conditions. However, after conversion of the raw IQ sequences into images, the data volume increases exponentially, and the computational complexity of extracting higher-order cumulants and circular features also increases significantly, affecting classification efficiency. To address the problems of deep neural network model mismatch caused by feature distribution differences between the source and target domains, the sharp deterioration in classification performance, and the numerous unutilized and unlabeled target domain data samples, adversarial-based domain adaptation (DA) methods [25,26] have been proposed [27-29]. Discrepancy-based DA methods have been used in [30] for cross-domain AMC on self-built and public datasets [31]. These methods achieved better performance than approaches without transfer learning (TL). However, the above studies did not consider cross-domain AMC under multiple parameter changes, such as sampling rate, SNR, and channel, nor did they conduct in-depth research on unsupervised PDA problems using multimodal information.
We propose a novel multimodal information and TL framework for cross-domain AMC to address the aforementioned issues.The contributions of this study are summarized as follows.
(1) We adopted a multimodal information fusion strategy based on the signal's time-domain and frequency-domain features, which enables us to leverage the complementary benefits of different modalities to improve the network's understanding of the input.
With the same network structure, our approach achieves improved classification performance.
(2) We introduced TL to transfer knowledge from the source domain to the target domain.
By leveraging a large amount of unlabeled data in the target domain and aligning the distribution of modulation signal data between the source and target domains using a domain adversarial neural network (DANN), we proposed an unsupervised DANN method that addresses the problem of unsupervised NPDA when multiple parameters vary between the source and target domains.

(3) We designed a class weight weighting and entropy weighting mechanism to improve the weight of shared-class data samples and effectively address the PDA problem, particularly in scenarios where the number of modulation signal classes in the target domain is smaller than that in the source domain.

(4) We conducted extensive experiments on two datasets explicitly designed to validate the effectiveness of our approach. The results demonstrated that our method achieves higher classification accuracy in different DA tasks compared with the baseline methods.
The remainder of this study is organized as follows. Section 2 introduces the system model, including the cross-domain learning model, the cross-domain AMC model, and the computation of the multimodal feature inputs. Section 3 details the proposed classification approach, including multimodal information fusion, the architecture, and the training steps. Section 4 presents the experimental results and their detailed analysis. Finally, Section 5 concludes this study. The abbreviations and notations used in the article are listed in Tables 1 and 2, respectively.

System Model
We first introduce the cross-domain learning model, then define the cross-domain AMC problem, and finally describe the computation of the multimodal feature inputs adopted in this study.

Cross-Domain Learning Model
The research methodology employed in this study to resolve the cross-domain AMC problem is based on transfer learning principles. In particular, we adopt the DANN to align the training (source) and testing (target) data domains.

Transfer Learning
The major task of TL is to transfer learned knowledge from the source to the target domain to improve the learning process of the target task [25]. Thus, we first define "domain" and "task". A domain D comprises a feature space χ and a marginal probability distribution P(X), in which X = {x_1, ..., x_n}, where x_i is the i-th feature vector in X. Hence, D = {χ, P(X)}. A task T in D comprises a label space γ and a predictive function f(·), in which Y = {y_1, ..., y_n} ∈ γ, where y_i is the i-th label in Y. The predictive (or decision) function is learned from the feature vector and label pairs {x_i, y_i}; it represents the prediction of the corresponding label f(x_i) given instance x_i and can therefore be written as f(x_i) = P(y_i | x_i). Hence, T = {γ, f(·)}. Now, we can formally define TL as follows. Given a source domain D_S with a corresponding source task T_S, and a target domain D_T with a corresponding target task T_T, TL aims to learn the target predictive function f(·) by leveraging the knowledge gained from D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T. Note that D_S ≠ D_T implies that χ_S ≠ χ_T and/or P_S(X) ≠ P_T(X). When χ_S ≠ χ_T, the feature spaces of the source and target domains differ; when P_S(X) ≠ P_T(X), the marginal distributions of the source and target domains differ. Another scenario of TL is T_S ≠ T_T, where γ_S ≠ γ_T and/or P(Y_S | X_S) ≠ P(Y_T | X_T). When γ_S ≠ γ_T, the label spaces of the source and target domains are different; when P(Y_S | X_S) ≠ P(Y_T | X_T), the conditional probability distributions of the source and target domains are different.

DANN
In this study, we treat the source and target domains as a whole and train a domain classifier to achieve feature alignment between the two domains. The objective of the domain classifier is to ensure that the deep features extracted from the source and target domains are aligned in the same feature space. This addresses the parameter sensitivity issue that can cause deep-learning-based modulation recognition methods to fail.
Our method utilizes the DANN to align the distributions of the source and target domains, thus avoiding the manual design of distance losses between the source and target domains. During training, the network spontaneously learns what should be aligned between the two domains and to what extent. This approach typically yields improved results.
DANN [25] is a representation learning approach for DA in which the training and test data come from similar but different distributions. The advantage of DA approaches is the ability to learn a mapping between domains when the target domain data are either fully unlabeled or have few labeled samples. DANN's architecture consists primarily of a feature extractor, a label predictor, and a domain classifier (Figure 1). The learned features are required to be domain-invariant in addition to being discriminative. Therefore, the domain classifier is designed to discriminate whether the underlying features are from the source or the target domain during training. The gradient reversal layer (GRL) [25] propagates the domain classification loss back to the feature extractor, with the weight of the loss function controlled by a hyperparameter λ. Through gradient reversal, the domain adversarial network maximizes the loss of the domain classifier while minimizing the loss of the label predictor. The overall loss function consists of the source classification loss and the domain classification loss. The source domain classification loss is used to ensure that the neural network performs well on the source domain data. The domain classification loss aims to align the data distributions of the source and target domains so that the neural network trained on the source domain can also exhibit good classification performance on the unlabeled target domain. By jointly optimizing the source and domain classification losses, we can train a neural network with good generalization performance, enabling it to perform effectively in the target domain.
The core idea of this approach is to use the domain classifier to guide the feature extractor in learning feature representations that are discriminative for both the source and target domains. By aligning the source and target domain data features, we can resolve the issue of ineffective deep-learning-based modulation recognition caused by parameter sensitivity and improve the model's generalization performance on unlabeled data.
Therefore, DANN must accomplish the following two core tasks during training. The first task is to accurately classify the source domain data to minimize the loss of the label classifier. The second task is to confuse the source and target domain data to maximize the loss of the domain classifier. The objective function of DANN can be represented as follows:

E(θ_f, θ_y, θ_d) = (1/n_s) Σ_{i=1}^{n_s} L_y(G_y(G_f(x_i^s; θ_f); θ_y), y_i^s) − λ [ (1/n_s) Σ_{i=1}^{n_s} L_d(G_d(G_f(x_i^s; θ_f); θ_d), d_i^s) + (1/n_t) Σ_{j=1}^{n_t} L_d(G_d(G_f(x_j^t; θ_f); θ_d), d_j^t) ],
where G_f is the feature extractor with parameters θ_f; G_y is the label predictor in the source domain with parameters θ_y; G_d is the domain classifier with parameters θ_d; and the numbers of samples in the source and target domains are denoted as n_s and n_t, respectively. Further, y_i^s and d_j^{s,t} are the source domain category label (only the data in the source domain have category labels) and the domain label (both the source and target domain data have domain labels), respectively; λ is the weight coefficient; and L_y and L_d represent the cross-entropy loss functions. Training seeks the saddle point (θ̂_f, θ̂_y, θ̂_d) such that

(θ̂_f, θ̂_y) = arg min_{θ_f, θ_y} E(θ_f, θ_y, θ̂_d),
θ̂_d = arg max_{θ_d} E(θ̂_f, θ̂_y, θ_d).

A saddle point can be found as a stationary point of the following gradient updates:

θ_f ← θ_f − u (∂L_y/∂θ_f − λ ∂L_d/∂θ_f),
θ_y ← θ_y − u ∂L_y/∂θ_y,
θ_d ← θ_d − u λ ∂L_d/∂θ_d,
where u is the learning rate. Ref. [25] added the GRL to achieve true end-to-end training and to avoid the two-stage training process in which the generator and discriminator parameters are fixed alternately, as in generative adversarial networks (GANs). Mathematically, we can formally treat the GRL as a "pseudo-function" R(x) defined by two (incompatible) equations describing its forward and backpropagation behavior:

R(x) = x,
dR/dx = −I,

where I is an identity matrix. It is worth noting that Equations (7) and (8) define a trainable network layer that does not require parameter updates. It can be easily implemented using existing deep learning tools, specifically by defining the procedures for forward propagation (identity transformation) and backpropagation (multiplication by −1).
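As a concrete illustration, this forward/backward behavior maps directly onto a few lines of PyTorch (the framework used in Section 4). The following is a minimal sketch, not the authors' released code; the helper name grl and the explicit coefficient argument are our own additions.

```python
# A minimal GRL sketch, assuming the "identity forward / negated backward"
# behavior described above. lambda_ mirrors the hyperparameter lambda.
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)  # identity transformation in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Multiply the incoming gradient by -lambda_ on the way back, so the
        # feature extractor is trained to *maximize* the domain loss.
        return -ctx.lambda_ * grad_output, None

def grl(x, lambda_=1.0):
    return GradientReversal.apply(x, lambda_)
```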
Then, with the GRL inserted between the feature extractor and the domain classifier, we can define the objective "pseudo-function" of (θ_f, θ_y, θ_d) that is optimized by stochastic gradient descent as follows:

Ẽ(θ_f, θ_y, θ_d) = (1/n_s) Σ_{i=1}^{n_s} L_y(G_y(G_f(x_i^s; θ_f); θ_y), y_i^s) + λ [ (1/n_s) Σ_{i=1}^{n_s} L_d(G_d(R(G_f(x_i^s; θ_f)); θ_d), d_i^s) + (1/n_t) Σ_{j=1}^{n_t} L_d(G_d(R(G_f(x_j^t; θ_f)); θ_d), d_j^t) ].

Cross-Domain AMC Model

Consider a labeled source domain dataset D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s} with n_s labeled samples and an unlabeled target domain dataset D_t = {x_i^t}_{i=1}^{n_t} with n_t unlabeled samples. Here, x_i^s represents a single source-domain-modulated signal sample with a corresponding label y_i^s, and x_i^t represents a single target-domain-modulated signal sample without a label. Let us assume that the label space of the source domain contains |C_s| types of radio signal and is denoted as C_s. Similarly, the label space of the target domain contains |C_t| types of radio signal and is denoted as C_t.

Unsupervised NPDA Problem
For the problem of unsupervised NPDA, we assume that the source and target domains have the same number of labels and label types for radio signals, thus indicating that the domains have the same label space. Let p and q denote the marginal probability distributions of the two domains, where p ≠ q. The primary objective is to transfer knowledge from the source to the target domain and align the distribution between the target and source domains. Figure 2 is the schematic of the DL-based AMC method directly applied to the unsupervised NPDA problem of AMC and the expected effect to be achieved using the unsupervised NPDA method.

Unsupervised PDA Problem
For the unsupervised PDA problem, we assume that the label space of the target domain is a proper subset of that of the source domain, i.e., C_t ⊂ C_s. Here, p and q denote the marginal probability distributions of the two domains, where p ≠ q, and p_{C_t} denotes the marginal probability distribution of source domain samples with labels shared by the two domains, which differs from that of the target domain. The main objective of unsupervised PDA is to align the fine-grained shared label distributions.

In addition to the challenges of different distributions between the source and target domains of the modulated signal and the lack of labels in the target domain, adaptation also involves the difficulty of not knowing the shared label space for the source and target domain modulation signals during training, as the label space of the target domain C_t is unknown at that time [32]. This poses two technical challenges. First, directly applying unsupervised NPDA AMC algorithms aligns the global distributions of both domains, thus causing negative transfer due to the existence of outlier classes denoted as C_s\C_t (i.e., signal categories included only in the source domain modulation dataset). Therefore, the matching of outlier classes should be avoided. Second, aligning the distributions of p_{C_t} and q to promote positive transfer is the goal of this study. Thus, eliminating or reducing the impact of outlier classes in the source domain and promoting the transfer of shared classes (i.e., signal categories included in both the source and target domain modulation datasets) from the source domain to the target domain is critical. Figure 3 illustrates this problem, considering a simple case with three modulation signal categories in the source domain and only one in the target domain.

Multimodal Feature Input Calculation
AMC methods based on image representations (such as eye and constellation diagrams) depend on the accurate estimation of signal modulation parameters; thus, they cannot classify noncooperatively received signals. Moreover, the length of the sampled data considerably influences the accuracy of estimating higher-order cumulant features, and their computation is complex.

Therefore, under the noncooperative reception condition, we aimed to reduce computational complexity and data volume while fully using the signal's multimodal features. We used the IQ and AP sequences from the signal's sequential representation, and the spectral amplitude and the squared signal's spectral amplitude from the feature representation, as inputs to the network. Let x(n) denote the baseband complex signal obtained from the received signal. The detailed calculation methods are as follows.
IQ sequence (I): The in-phase and quadrature components of the signal are its real and imaginary parts:

I(n) = Re[x(n)], Q(n) = Im[x(n)].

The first modality, F_iq, comprises the real and imaginary parts of the received signal, expressed as:

F_iq = [I(1), ..., I(N); Q(1), ..., Q(N)].

Spectral amplitude and squared signal's spectral amplitude (S): Calculate the spectral amplitude of the signal as follows:

X(k) = |FFT[x(n)]|,

where |·| denotes the modulus operation. Calculate the squared signal's spectral amplitude as follows:

X_2(k) = |FFT[x²(n)]|.

The second modality, F_spc, consists of the spectral amplitude and the squared signal's spectral amplitude, expressed as:

F_spc = [X(1), ..., X(N); X_2(1), ..., X_2(N)],

where F_spc represents the spectral features of the received signal in the frequency domain, which help DL models recognize frequency- and phase-modulated signals.
AP sequence (A): Calculate the normalized instantaneous amplitude of the signal as follows:

a(n) = |x(n)| / [(1/N) Σ_{n=1}^{N} |x(n)|].

This feature can reflect the amplitude variation of different modulated signals, which is helpful for DL models to recognize amplitude-modulated signals. Calculate the instantaneous phase of the signal as follows:

φ(n) = arctan[Q(n)/I(n)],

where the value of the phase lies within (−π, π]. The third modality, F_ap, comprises the instantaneous amplitude and instantaneous phase, expressed as:

F_ap = [a(1), ..., a(N); φ(1), ..., φ(N)].

Figures 4-7 present the schematic of features extracted from nine modulation schemes, namely 8PSK, BPSK, 2FSK, 4FSK, 2ASK, GFSK, PAM4, QAM16, and QPSK.
The complementarity between the first and third modalities (IQ and AP) has been proven in previous research [33,34]. It has been shown that (1) algorithms that use AP as input data outperform IQ-based algorithms at high SNR but show the opposite results at low SNR; and (2) the features extracted from IQ and AP exhibit complementary characteristics.

Furthermore, selecting features with stronger representational power can enhance the performance of existing deep-learning-based AMC. For instance, when ASK has to be distinguished from other signals, choosing instantaneous amplitude features may yield more effective results. Similarly, when differentiating PSK from other signals, selecting instantaneous phase features may be preferred. When the task is to distinguish FSK from other signals, instantaneous frequency features can be a suitable choice. Lastly, constellation mapping can serve as a feature for differentiating higher-order modulation schemes.
Therefore, the utilization of modal information should be determined based on the specific signal categories and the corresponding feature selection. The multimodal feature input chosen here is provided as an example.
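For illustration, the three modal inputs above can be computed with NumPy as follows. This is a minimal sketch following the formulas in this subsection; the mean-based amplitude normalization and the synthetic example signal are our own assumptions.

```python
# Sketch of the three modal inputs (I, S, A) for a complex baseband sample x.
import numpy as np

def multimodal_features(x):
    # First modality F_iq: in-phase (real) and quadrature (imaginary) parts.
    f_iq = np.stack([x.real, x.imag])                 # shape (2, N)

    # Second modality F_spc: spectral amplitude of x and of its square.
    spec = np.abs(np.fft.fft(x))
    spec_sq = np.abs(np.fft.fft(x ** 2))
    f_spc = np.stack([spec, spec_sq])                 # shape (2, N)

    # Third modality F_ap: normalized instantaneous amplitude and phase.
    amp = np.abs(x)
    amp = amp / np.mean(amp)                          # normalization (assumed: mean)
    phase = np.angle(x)                               # values in (-pi, pi]
    f_ap = np.stack([amp, phase])                     # shape (2, N)

    return f_iq, f_spc, f_ap

# Example: a QPSK-like random sample of 128 points, matching the datasets here.
x = np.exp(1j * np.pi / 4 * (2 * np.random.randint(0, 4, 128) + 1))
f_iq, f_spc, f_ap = multimodal_features(x)
```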

Algorithm 1 summarizes the training of the MMDA-MC: the network parameters (θ_f1, θ_f2, θ_f3, θ_y, and θ_d) are initialized, and the network is then trained for v = 1, 2, ..., epo epochs as described above.

It should be noted that a previous study has shown that employing corresponding network structures for different modalities can lead to better feature representations [35]. However, as neural networks become deeper, the performance differences between various network structures may diminish. Therefore, in this work, we concentrate on addressing the challenge of parameter sensitivity, which can impede the effectiveness of deep-learning-based modulation recognition methods, by leveraging multimodal information and adversarial training. Rather than utilizing multiple network structures to extract deep features from different modal inputs, we have made a deliberate choice to maintain focus and prevent the research from becoming divergent.


Multimodal Fusion AMC Module
The source domain label classification loss is calculated by propagating the loss through backpropagation to each feature extractor after fusing and concatenating the deep features extracted from the input multimodal features (I, S, A). Therefore, the source domain label classification loss can be expressed as:

L_y = (1/n_s) Σ_{i=1}^{n_s} L_CE( G_y([G_f1(I_i^s; θ_f1); G_f2(S_i^s; θ_f2); G_f3(A_i^s; θ_f3)]; θ_y), y_i^s ),

where G_f1, G_f2, and G_f3 are three feature extractors with parameters θ_f1, θ_f2, and θ_f3; G_y is the label predictor with parameter θ_y; G_d is the domain classifier with parameter θ_d; n_s is the number of samples in the source domain; y_i^s is the source domain category label; and L_CE represents the cross-entropy loss function.
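A minimal sketch of this fused classification loss in PyTorch is given below. The module names G_f1-G_f3 and G_y follow the text; passing them as a tuple and their internal structure are our own assumptions.

```python
# Sketch: fuse the three per-modality deep features by concatenation, then
# apply the shared label predictor and cross-entropy on source labels.
import torch
import torch.nn.functional as F

def source_label_loss(G_f, G_y, xs_iq, xs_spc, xs_ap, ys):
    # G_f is a tuple (G_f1, G_f2, G_f3) of per-modality feature extractors.
    feats = [f(x) for f, x in zip(G_f, (xs_iq, xs_spc, xs_ap))]
    fused = torch.cat(feats, dim=1)      # fuse by concatenating deep features
    logits = G_y(fused)                  # label predictor on the fused features
    return F.cross_entropy(logits, ys)   # cross-entropy L_y on source labels
```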

Domain Adversarial Alignment Module
To leverage the complementary benefits of multimodal information, a domain classifier is applied to each mono-modality feature to align the feature distributions between the source and target domains. Thus, the overall domain classification loss can be defined as:

L_d = Σ_{m=1}^{3} [λ_m / (n_s + n_t)] Σ_{x_i ∈ D_s ∪ D_t} L_CE( G_dm(G_fm(x_i^m; θ_fm); θ_dm), d_i ),

where G_dm is the domain classifier applied to the m-th modality with parameter θ_dm, x_i^m denotes the m-th modal input (I, S, or A) of sample x_i, and d_i is its domain label. The loss function of the label classifier applies only to the source domain, whereas the loss function of the domain classifier applies to both the source and target domains.
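The per-modality adversarial loss can be sketched similarly; grl() is the gradient-reversal helper sketched in Section 2, and the per-modality weights λ_m follow the schedule described in Section 4. The function signature is our own assumption.

```python
# Sketch: one domain classifier per modality, each fed through the GRL so the
# adversarial gradient reaches the corresponding feature extractor.
import torch.nn.functional as F

def domain_loss(G_f, G_d, batches, domain_labels, lambdas):
    # batches: per-modality tensors holding both source and target samples;
    # domain_labels: 0 for source samples, 1 for target samples.
    total = 0.0
    for f, d, x, lam in zip(G_f, G_d, batches, lambdas):
        feat = grl(f(x), lam)                       # gradient reversal toward theta_f
        total = total + F.cross_entropy(d(feat), domain_labels)
    return total                                    # summed over the three modalities
```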

Class-Entropy Weighted Multimodality DANN Modulation Classification
Directly applying DANN to unsupervised PDA AMC problems can degrade performance owing to negative transfer caused by outlier classes, which can be reduced by mitigating the influence of outlier classes [32]. Therefore, the main ideas in [32,36-39] include assigning higher class weights to shared-class samples and lower weights to outlier-class samples from the source domain, either in the label predictor or in the domain adversarial classifier. Our approach identifies the modulation schemes of the target domain and assigns higher weights to source domain samples that belong to the same classes as the target domain by introducing a class weight weighting and entropy weighting mechanism into the MMDA-MC model proposed earlier. The modified model is named WMMDA-MC.

The output of the label predictor ŷ_i = G_y(x_i) presents a probability distribution over the source domain label space C_s for each input sample x_i. This distribution effectively describes the likelihood of a sample belonging to a certain class. As the label spaces of outlier classes and shared classes do not overlap, the label predictor should assign a sufficiently low probability of predicting an outlier class for shared-class samples in the target domain. Based on the output of the label predictor for target domain samples, we can determine the weights of each class in the target domain and share these weights with the source domain samples. The impact of prediction errors can be reduced by averaging the SoftMax predictions of all target domain samples. Ultimately, the contribution of each class in the source domain to training can be represented as

η = (1/n_t) Σ_{i=1}^{n_t} ŷ_i^t,

where η is a |C_s|-dimensional vector that quantitatively describes the weights of the different categories in the source domain label space during training. Considering that this vector is obtained from the output of the target domain samples in the label predictor and that the target domain does not include outlier classes, the weights assigned to the outlier classes in η should be noticeably smaller than those assigned to the shared classes.
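A minimal sketch of computing η in PyTorch follows. Averaging the SoftMax outputs over the target set is exactly the procedure described above; the final rescaling by the maximum is an assumption borrowed from common PDA implementations, not something the text specifies.

```python
# Sketch: class weight vector eta from the averaged target SoftMax predictions.
import torch

@torch.no_grad()
def class_weights(G_f, G_y, target_loader):
    probs = []
    for x_iq, x_spc, x_ap in target_loader:
        feats = torch.cat([f(x) for f, x in zip(G_f, (x_iq, x_spc, x_ap))], dim=1)
        probs.append(torch.softmax(G_y(feats), dim=1))
    eta = torch.cat(probs).mean(dim=0)   # |C_s|-dimensional class weight vector
    return eta / eta.max()               # rescaling (an assumption; see lead-in)
```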
In addition to reducing negative transfer, promoting positive transfer from p_{C_t} to q is also important. Multimodal information in the signal can enhance the confidence of the label predictor's predictions, thus enabling more accurate assignment of appropriate weights to the shared and outlier classes. Furthermore, DANN can better facilitate the transfer between the shared classes in the source and target domains.
According to [39,40], for PDA problems, having every sample from the source and target domains participate equally in domain adversarial training is unreasonable. The presence of samples that are difficult to predict accurately and that lie near the decision boundary of the label predictor can negatively impact domain adversarial training. These difficult-to-predict samples are referred to as "hard samples", whereas easily predictable samples are called "soft samples". Figure 9 illustrates soft and hard samples in the simplest binary classification case. The existence of hard samples affects the transfer ability of soft samples; hence, the weight of hard samples in domain adversarial training must be reduced, whereas that of the soft samples must be increased. Hard samples primarily originate from two sources. As outlier and shared classes are orthogonal, no outlier classes exist in the target domain; therefore, outlier classes are comparatively harder to transfer and lack directionality, resulting in more difficulty in accurate prediction. Moreover, the hard-to-transfer samples within the shared classes are fewer, and they can be measured using the conditional entropy criterion defined as

H(ŷ) = −Σ_{c=1}^{|C_s|} ŷ_c log ŷ_c.

According to the optimization principle in [40], in adversarial training, entropy weighting is applied to each sample, as expressed in Equation (24). The total loss function for the PDA problem is modified by incorporating Equations (22)-(24) into Equation (21), as follows,
where η_{y_i^s} represents the weight of each source domain sample, obtained by taking the y_i^s-th value in the vector η.
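The entropy weighting can be sketched as below. The entropy H follows the conditional entropy criterion above; the specific mapping w = 1 + e^(−H) mirrors entropy-conditioned DA methods in the spirit of [40] and is our assumption about the exact form of Equation (24).

```python
# Sketch: per-sample entropy weights for adversarial training.
import torch

def entropy_weight(softmax_out, eps=1e-8):
    # Conditional entropy H(y_hat) of each sample's SoftMax prediction.
    H = -(softmax_out * torch.log(softmax_out + eps)).sum(dim=1)
    # Low entropy (confident "soft" samples) maps to larger weights.
    return 1.0 + torch.exp(-H)
```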
The proposed WMMDA-MC method modified according to this loss function is shown in Figure 10. We present the training and testing steps of the proposed WMMDA-MC in Algorithm 2, whose inputs are the labeled source domain samples {(x_i^s, y_i^s)}_{i=1}^{n_s}, the unlabeled target domain samples {x_i^t}_{i=1}^{n_t}, the learning rate u, and the number of iterations iter, and whose output is the classification accuracy.

Algorithm 2. Training and testing of the WMMDA-MC:

1. Initialize the network parameters θ_f1, θ_f2, θ_f3, θ_y, θ_d1, θ_d2, and θ_d3.
2. Training: For v = 1, 2, ..., iter:
   a. If v = 1 or v is a multiple of the test interval, compute the deep target domain features G_f1(I_i^t; θ_f1), G_f2(S_i^t; θ_f2), and G_f3(A_i^t; θ_f3); concatenate these deep features and input them to G_y to obtain the SoftMax output (target_softmax); take the average of target_softmax to obtain the class weight vector η.
   b. Weight the cross-entropy losses for each source and target domain sample using the respective class weights and entropy weights w(x^s) and w(x^t).
   c. Input the fused source domain features into G_y to calculate L_y.
   d. Add the weighted losses to obtain the total loss L, and update θ_f1, θ_f2, θ_f3, θ_y, θ_d1, θ_d2, and θ_d3 by gradient descent.
   e. Adjust the learning rate u.
   f. If L converges to an extremum or reaches a preset threshold, stop training.
3. Testing: evaluate the classification accuracy on the target domain test set.
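Putting the pieces together, one WMMDA-MC training iteration might look like the following sketch, reusing grl() and entropy_weight() from the earlier sketches; all signatures and variable names are our own assumptions consistent with Algorithm 2, not the authors' released code.

```python
# Sketch of a single class- and entropy-weighted adversarial training step.
import torch
import torch.nn.functional as F

def train_step(G_f, G_y, G_d, optimizer, src, ys, tgt, eta, lam):
    # src/tgt: tuples of the three modal mini-batches (I, S, A) per domain.
    fs = torch.cat([f(x) for f, x in zip(G_f, src)], dim=1)
    ft = torch.cat([f(x) for f, x in zip(G_f, tgt)], dim=1)

    # Class-weighted source label loss: each sample scaled by eta[y^s].
    ce = F.cross_entropy(G_y(fs), ys, reduction="none")
    L_y = (eta[ys] * ce).mean()

    # Entropy weights from the (detached) label predictions of both domains.
    ws = entropy_weight(torch.softmax(G_y(fs), dim=1).detach())
    wt = entropy_weight(torch.softmax(G_y(ft), dim=1).detach())

    # Weighted, modality-wise adversarial domain loss through the GRL.
    L_d = 0.0
    zeros = torch.zeros(ws.size(0), dtype=torch.long)   # source domain label
    ones = torch.ones(wt.size(0), dtype=torch.long)     # target domain label
    for f, d, xs, xt in zip(G_f, G_d, src, tgt):
        ls = F.cross_entropy(d(grl(f(xs), lam)), zeros, reduction="none")
        lt = F.cross_entropy(d(grl(f(xt), lam)), ones, reduction="none")
        L_d = L_d + (ws * ls).mean() + (wt * lt).mean()

    loss = L_y + L_d
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```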

Model Network Structure
The feature extractor used in this study mainly consists of five convolutional layers (conv1, conv2, conv3, conv4, and conv5) and a fully connected layer (fc1) for extracting features from the source and target domains. ReLU is used as the activation function, and BatchNorm is applied for normalization. Additionally, pooling layers are added after conv2, conv3, conv4, and conv5 to reduce data dimensionality. The original input size of the feature extractor is N × 2 × 128, and it is reshaped to N × 1 × 2 × 128 using the Reshape function before being fed into the convolutional layers, where N represents the batch size. An AdaptiveAvgPool2d layer conducts two-dimensional adaptive average pooling, thus ensuring that the features extracted by each feature extractor have consistent dimensions during fusion.
The label predictor consists of two fully connected layers (fc1, fc2) for predicting the labels of the source domain data.ReLU is the activation function.The hidden layer features outputted by the feature extractor are fused and concatenated before being input to the label prediction classifier.
The domain classifier includes three fully connected layers (fc1, fc2, and fc3) for discriminating whether the hidden layer output from the feature extractor belongs to the source or target domains.ReLU is used as the activation function, and each domain classifier is preceded by a GRL.
The Adam optimizer is used for optimizing the feature extractor, label predictor, and domain classifier. A restarted cosine annealing method [41] is applied to update the learning rate at the end of each epoch. The network layouts for each module are presented in Tables 3-5.
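For reference, a single modality's feature extractor consistent with this description (and with the Table 3 excerpt in the next subsection) might be sketched as follows; channel counts beyond those listed in Table 3 are our own assumptions.

```python
# Sketch of one per-modality feature extractor: five conv layers with
# BatchNorm/ReLU, pooling after conv2-conv5, adaptive average pooling, and fc1.
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, (2, 7), stride=2, padding=(0, 3), bias=True),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, (1, 3), stride=1, padding=(0, 1), bias=True),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(64, 128, (1, 3), padding=(0, 1)), nn.BatchNorm2d(128),
            nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(128, 128, (1, 3), padding=(0, 1)), nn.BatchNorm2d(128),
            nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(128, 128, (1, 3), padding=(0, 1)), nn.BatchNorm2d(128),
            nn.ReLU(), nn.MaxPool2d((1, 2)),
        )
        self.pool = nn.AdaptiveAvgPool2d((1, 2))   # dimension-consistent output
        self.fc1 = nn.Linear(128 * 1 * 2, 128)

    def forward(self, x):                          # x: N x 2 x 128
        x = x.view(x.size(0), 1, 2, 128)           # reshape to N x 1 x 2 x 128
        x = self.pool(self.conv(x))
        return self.fc1(x.flatten(1))
```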

Simulation Environment and Evaluation Metric
The DL environment is configured with Python 3.8.0, PyTorch 1.6.0, and CUDA 10.2.89 on Windows Server 2012 R2 Standard. The CPU is a dual Intel(R) Xeon(R) Gold 6230R, with an NVIDIA Tesla V100 GPU and 128 GB of memory.

Table 3 excerpt. Feature extractor network layout:

Layers | Output Shape
Reshape | N × 1 × 2 × 128
Conv (filter 32, size = (2,7), stride = 2, padding = (0,3), bias = True) | N × 32 × 1 × 64
BatchNorm2d(32) + ReLU | N × 32 × 1 × 64
Conv (filter 64, size = (1,3), stride = 1, padding = (0,1), bias = True) | N × 64 × 1 × 64

For the MMDA-MC method experiment, the optimizer batch size is set to 5000, the number of epochs is 50, and the three weight coefficients λ_i, i = 1, 2, 3, are kept equal. As r increases from 0 to 1, the strategy used in [42] is adopted to update λ_i such that λ_i = 2/(1 + e^{−10r}) − 1, where r represents the current iteration number divided by the total iteration number. In the early stage of training, λ_i tends to 0, indicating the importance of optimizing the label predictor. In the later training stages, λ_i tends to 1, indicating equal importance in optimizing the label predictor and domain classifier. The AMC performance metric is defined as the classification accuracy

Accuracy = M_P / M,

where M_P represents the number of correctly classified samples and M represents the total number of samples. For the WMMDA-MC method experiment, the optimizer batch size is set to 400, the number of iterations is set to 40,000 (as the sample sizes of the source and target domains are different, epochs are not used for counting iterations), the test interval is set to 40,000, and the initial learning rate is set to 1 × 10⁻³.
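The λ_i schedule just described is a one-liner; the sketch below assumes r is supplied as the current iteration divided by the total number of iterations.

```python
# Sketch of the lambda_i ramp-up schedule from [42].
import math

def lambda_schedule(r):
    # r in [0, 1]: ~0 early (label predictor prioritized), ->1 later
    # (label and domain losses weighted equally).
    return 2.0 / (1.0 + math.exp(-10.0 * r)) - 1.0
```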

Dataset Generation
The datasets were created following the method described in [31], which includes five parts: symbol data generation, digital modulation, channel modeling, normalization, and storage.
Based on the unsupervised NPDA problem for AMC mentioned in Section 2.2.2, we designed a typical cross-domain dataset with different parameters and reception conditions, named Dataset A, to verify the classification performance and generalization ability of the MMDA-MC method. By adjusting the SNR, channel types, and samples per symbol (sps), we controlled the different domains of data. Dataset A is divided into 12 subsets, named D_n, n = 1, 2, ..., 12, each containing nine typical digital modulation schemes: 8PSK, BPSK, 2FSK, 4FSK, 2ASK, GFSK, PAM4, QAM16, and QPSK. The number of sampling points for each signal sample was 128. Each subset contains 1200 training samples, 400 validation samples, and 400 testing samples for each modulation scheme at different SNRs. The parameter settings are presented in Table 6, and the remaining parameters are kept consistent. Based on the 12 subsets, we designed 12 × 11 = 132 DA tasks, denoted as D_i → D_j, i = 1, 2, ..., 12; j = 1, 2, ..., 12; i ≠ j, where the left side of → represents the labeled source domain dataset and the right side represents the unlabeled target domain dataset.

We introduced two research variables, sps and the number of modulation schemes, to further validate that the modified WMMDA-MC model can effectively address not only unsupervised NPDA but also unsupervised PDA. When the sampling rate is fixed, the parameter sps controls the symbol rate of the modulated signal, which significantly affects the transmission rate, bandwidth requirements, and noise resistance in digital communication. Therefore, we specifically selected sps as the experimental variable. Additionally, the other source and target domain data parameters were kept consistent to avoid the effects of other interfering variables, except for the differences in sps and modulation scheme types. We designed Dataset B, which comprises the BPSK, QPSK, 8PSK, PAM4, QAM16, GFSK, CPFSK, and QAM64 modulation schemes, for AMC under Rician channel conditions. Table 7 presents the parameter settings for the Rician channel. The SNR ranges from 0 to 18 dB with an interval of 2 dB. Dataset B includes two subsets, each containing 1200 training samples, 400 validation samples, and 400 testing samples for each modulation scheme at different SNRs. The subset with an sps of 8 is named D_s8 and is considered the source domain because it contains all eight modulation schemes. The subset with an sps of 4 is further divided into eight datasets based on the included modulation types. For example, the dataset containing only the first modulation type (BPSK) is named D_s4_1.
These datasets collectively form the target domain, and the label space of the target domain is a proper subset of the source domain label space. Note that when D_s4_8 is used as the target domain, the PDA problem reduces to an NPDA problem. Based on these nine datasets from the source and target domains, we designed eight DA tasks, denoted as D_s8 → D_s4_i, i = 1, 2, ..., 8, where the left side of "→" represents the labeled source domain dataset and the right side represents the unlabeled target domain dataset.

Baseline

Supervised Learning
Supervised learning algorithms are used for comparison to explore the upper limit of classification accuracy achievable by DA methods. Currently, most DL-based AMC algorithms are trained on labeled datasets and tested on datasets with the same distribution as the training datasets. This method is referred to as "supervised" in this study. The network structure of the supervised method is composed of the feature extractor and label predictor introduced in Section 3, concatenated together.

Supervised Learning with Different Source and Target Domain Distributions
To compare the performance gain of DA methods with current DL-based AMC algorithms in practical scenarios, we designed a "source-only" method. The training was conducted using the supervised network structure, whereas testing was performed on a target domain different from the source domain.

Effectiveness Analysis of the Multimodal Fusion Strategy
This section verifies the effectiveness of multimodal fusion in DL-based AMC algorithms. It tests the classification performance of the "supervised" method under different input feature combinations on a target domain dataset with the same distribution as the source domain. Figure 11 and Table 8 present the average classification accuracy; the column headers represent the different feature combinations fed into the network, and the row headers represent the different datasets. Using basic features such as amplitude, phase, and frequency for learning and training can effectively achieve AMC under high-SNR Gaussian channels. Furthermore, multimodal fusion outperforms single-feature approaches in classification performance across different channels, SNRs, signal parameters, and other datasets. Moreover, the classification performance increases when more modal features are used as input. This demonstrates that multimodal information can achieve better complementary gains by helping the network learn and understand the input objects, thus improving the network's classification performance.

Validity Analysis of Cross-Domain AMC
This section verifies the effectiveness of the MMDA-MC method for solving the unsupervised NPDA problem in cross-domain AMC and analyzes the impact of the channel, SNR, and symbol rate. Multimodal information is employed, with "source-only" serving as the control group. Table 9 presents the average classification accuracy, where the column headers indicate the current dataset as the source domain and the others as the target domain. For visual comparison, Figure 12 shows a histogram comparing the two methods. Evidently, when the target and source domain data differ, the accuracy of the DL-based AMC method decreases significantly, indicating significant challenges in realistic scenarios. However, the proposed MMDA-MC demonstrates clear advantages in such scenarios. Compared with "source-only" without cross-domain training, it improves classification accuracy by between 8.22% and 31.03%. Analyzing the domain discrepancies caused by varying parameters between the source and target domains is important for evaluating the MMDA-MC method's performance. For experimental convenience and demonstration, all 132 DA tasks generated from the 12 datasets are divided into seven categories based on the types and combinations of parameter variations. Table 10 summarizes the DA task mappings: "1" indicates that the parameter remains consistent between the source and target domains, whereas "0" indicates a change. For example, DT110 represents scenarios where the source and target domains have consistent symbol rates and channels but varying SNR.

The results presented in Table 11 indicate the following.
(1) Overall, the average classification accuracy improvement ranges from 11.50% (DT101) to 20.58% (DT100), demonstrating that the MMDA-MC method can enhance classification performance even when there are one or multiple parameter differences between the source and target domains.

(2) Under single-parameter changes: when only the SNR or the channel differs between the source and target domains (DT110 and DT101, respectively), the MMDA-MC method achieves relatively high average classification accuracies of 75.94% and 74.72%, respectively. This indicates that the feature distributions are more similar when only the SNR or channel varies, facilitating feature distribution alignment. However, when only the sps differs (DT011), the average classification accuracy drops to 58.32%. Notably, when the SNR and channel change simultaneously (DT100), the average classification accuracy is 63.39%, higher than in the case of an sps difference alone. This demonstrates that a difference in sps reduces feature similarity, thereby increasing the difficulty of aligning the feature distributions between the source and target domains and significantly impacting classification performance.

(3) When two or three parameters vary, particularly for DT000, although the MMDA-MC method improves the average classification accuracy compared with the "source-only" method, the accuracy is only 38.10%. Thus, DA methods can enhance classification performance when the differences between the source and target domains are significant; however, the gain in classification performance for the target domain is also limited owing to limited source domain knowledge.

Validity Analysis of Partial Cross-Domain AMC
This section validates the effectiveness of the WMMDA-MC method for solving the unsupervised PDA problem in cross-domain AMC. The compared algorithms include the supervised, source-only, and MMDA-MC methods. Table 12 presents the average classification performance of the proposed and compared algorithms on different DA tasks, and Figure 13 provides a visual comparison. The results indicate the following.

Conclusions
The problem of insufficient generalization ability in current intelligent AMC algorithms was addressed in this study.A novel framework based on multimodal information and TL for cross-domain AMC was proposed to alleviate the phenomenon.The comprehensive amplitude, phase, and frequency information of the modulation signals were effectively utilized, and the cross-domain AMC problem was solved under varying parameter spaces such as symbol rate, SNR, channel model, and modulation categories.The learning ability of DL networks for signals and scenarios was enhanced by our method, and the method's robustness was improved, rendering it more adaptable to real-world application scenarios.The main achievements of this study are as follows: 1.The existing research results were used to construct two scenarios with 20 domains to create an AMC dataset.This provides a new multidomain dataset for intelligent AMC research with strong generalization capabilities.2. The proposed method guides the network to enhance its understanding of the modulation schemes using the multimodal information in the modulation signals.Experimental results demonstrate that the multimodal fusion input enables deep neural networks to learn richer information under supervised conditions, effectively improving classification performance to 89.80% on 12 datasets.3. TL is introduced to effectively utilize the unlabeled data in the target domain.A crossdomain AMC method is proposed based on the existing DANN.Experimental results show that the MMDA-MC improves the average classification accuracy by 18.53% compared to the "source-only" method in cross-domain classification problems.Moreover, under seven variations between the source and target domains, the average classification accuracy is improved by 11.50% (only channel changes) and 20.58% (changes in SNR and sps).4. 
(1) The WMMDA-MC achieves an average classification accuracy of 93.23%, a significant improvement of 23.72% over the 69.51% obtained without TL. Its average classification accuracy exceeds that of the non-TL method across all seven PDA tasks, demonstrating that this method effectively addresses the performance degradation faced by current intelligent AMC algorithms when the source and target domain distributions are inconsistent.

(2) The proposed method improves the average accuracy by 21.42% over the MMDA-MC, whose average classification accuracy is 71.82%. Moreover, the average classification accuracy of the proposed method exceeds that of the NPDA AMC algorithm across all seven PDA tasks. In particular, when the target domain contains only two classes, the proposed method achieves a maximum improvement of 52.09% in average classification accuracy. This demonstrates that introducing class weighting and entropy weighting reduces the negative transfer caused by outlier classes in the source domain, thereby promoting positive transfer and improving classification performance.
(3) The average classification accuracy of the NPDA AMC method is lower than that of the non-TL method when the target domain has only one or two modulation classes. This verifies that directly applying the unsupervised NPDA method to the unsupervised PDA problem leads to abnormalities in the global matching strategy owing to the inconsistent label spaces of the source and target domains, resulting in performance degradation.

(4) The D8 → D4_8 task is equivalent to the unsupervised NPDA problem of AMC, and there is minimal difference in classification accuracy between the MMDA-MC and WMMDA-MC algorithms. Because no outlier classes exist when the source and target domains share the same modulation classes, there is no significant distinction among the class weights; the weights tend to average out and consequently yield no performance improvement.
By contrast, entropy weighting assigns smaller weights to outlier classes and to minority shared classes that are difficult to transfer and predict. When no outlier classes exist, this down-weighting yields little additional performance gain while increasing the computational complexity. Therefore, the MMDA-MC is suitable for the unsupervised NPDA setting, whereas the WMMDA-MC is preferable for the unsupervised PDA problem.
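The weighting behavior discussed above can be summarized in a short sketch. The following is a minimal illustration, not the authors' released code, of how class weights and entropy weights might be computed from the target-domain SoftMax outputs; the tensor names, the max-normalization of the class weights, and the exp(-H) entropy form are our assumptions.

```python
import torch

def class_weights(target_softmax: torch.Tensor) -> torch.Tensor:
    """Per-class weights from target-domain SoftMax outputs.

    target_softmax: (n_t, C_s) predicted probabilities for the unlabeled
    target samples. Outlier classes absent from the target domain receive
    near-zero average probability, so averaging and normalizing
    suppresses them (max-normalization is our assumption).
    """
    w = target_softmax.mean(dim=0)  # average predicted probability per class
    return w / w.max()              # largest class weight becomes 1

def entropy_weights(target_softmax: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Per-sample entropy weights for target samples.

    Confident (low-entropy) predictions keep weights near 1; hard,
    ambiguous samples are down-weighted. The exp(-H) form is one common
    choice; the paper's exact form may differ.
    """
    h = -(target_softmax * (target_softmax + eps).log()).sum(dim=1)  # per-sample entropy
    return torch.exp(-h)
```

Consistent with observation (4) above, when the source and target domains share the same classes, the averaged class probabilities contain no suppressed outliers, so the class weights tend toward uniform and the weighting brings no gain.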

Conclusions
The problem of insufficient generalization ability in current intelligent AMC algorithms was addressed in this study. A novel framework based on multimodal information and TL for cross-domain AMC was proposed to alleviate this problem. The comprehensive amplitude, phase, and frequency information of the modulation signals was effectively utilized, and the cross-domain AMC problem was solved under varying parameter spaces such as symbol rate, SNR, channel model, and modulation categories. Our method enhances the learning ability of DL networks for signals and scenarios and improves robustness, rendering it more adaptable to real-world application scenarios. The main achievements of this study are as follows:

1. Existing research results were used to construct two scenarios with 20 domains to create an AMC dataset. This provides a new multidomain dataset for intelligent AMC research with strong generalization capabilities.

2. The proposed method guides the network to enhance its understanding of the modulation schemes using the multimodal information in the modulation signals. Experimental results demonstrate that the multimodal fusion input enables deep neural networks to learn richer information under supervised conditions, improving the classification performance to 89.80% across 12 datasets.
3. TL is introduced to effectively utilize the unlabeled data in the target domain, and a cross-domain AMC method is proposed based on the existing DANN. Experimental results show that the MMDA-MC improves the average classification accuracy by 18.53% compared with the "source-only" method in cross-domain classification problems. Moreover, across the seven variations between the source and target domains, the improvement in average classification accuracy ranges from 11.50% (only the channel changes) to 20.58% (the SNR and channel change simultaneously).
4. When the modulation signal categories in the target domain are a proper subset of those in the source domain (category differences), an AMC method is proposed based on category-weighted entropy and a multimodal DANN. Experimental results demonstrate that the WMMDA-MC achieves an average classification accuracy improvement of 21.42% compared with the MMDA-MC when category and sps differences exist between the source and target domains, and an improvement of 23.72% compared with the "source-only" method.
Thus, the proposed framework exhibits better performance and adaptability in addressing cross-domain adaptation problems for AMC.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 62271494.

2.2. Cross-Domain AMC

The cross-domain AMC problem based on unsupervised DA comprises a source domain $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ with $n_s$ labeled samples and a target domain $D_t = \{x_i^t\}_{i=1}^{n_t}$ with $n_t$ unlabeled samples.

Figure 2. (a) Schematic of the unsupervised NPDA method using the global matching strategy. (b) Schematic of the expected result using the unsupervised NPDA method.

Therefore, the matching of outlier classes (i.e., signal categories included only in the source domain modulation dataset) should be avoided. Second, aligning the distributions $p_{\mathcal{C}_t}$ and $q$ to promote positive transfer is the goal of this study. Thus, eliminating or reducing the impact of outlier classes in the source domain and promoting the transfer of shared classes (i.e., signal categories included in both the source and target domain modulation datasets) from the source domain to the target domain is critical. Figure 3 illustrates this problem, considering a simple case with three modulation signal categories in the source domain and only one in the target domain.
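Formally, the partial setting above can be stated as follows; the subset relation is taken from the problem description, and the set-difference notation for outlier classes is ours:

```latex
% Unsupervised PDA assumption: the target label space is a proper
% subset of the source label space; outlier classes are those present
% only in the source domain.
\mathcal{C}_t \subset \mathcal{C}_s, \qquad
\text{outlier classes} = \mathcal{C}_s \setminus \mathcal{C}_t
```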

Figure 3. (a) Schematic of the unsupervised NPDA algorithm applied to the unsupervised PDA problem. (b) Schematic of the expected result using the unsupervised PDA method.

Figure 4. IQ sequences of nine types of modulation signals. Here, the red and blue lines represent the in-phase and quadrature components of the signal, respectively.

Figure 5. Spectral amplitude of nine types of modulation signals.


Figure 6. Square spectrum of nine types of modulation signals.

Figure 7. AP sequences of nine types of modulation signals. Here, the red and blue lines represent the instantaneous amplitude and instantaneous frequency, respectively.

3. Cross-Domain AMC Method Based on Multimodal Information and TL

3.1. Multimodality DANN Modulation Classification

For the unsupervised NPDA problem, we propose the multimodality DANN modulation classification model (MMDA-MC) shown in Figure 8. The network input consists of the multimodal features computed in Section 2.3. FE1, FE2, and FE3 are the underlying feature extractors and share the same network structure; D1, D2, and D3 are domain classifiers with the same network structure; Cls is the class label predictor; and s1, s2, and s3 are the hidden-layer outputs of the three feature extractors. The method mainly consists of two key modules: a multimodal fusion AMC module and a domain adversarial alignment module. The training and testing steps of the proposed MMDA-MC are presented in Algorithm 1.
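To make the architecture concrete, the following is a minimal PyTorch sketch of the structure just described: three weight-unshared feature extractors (FE1–FE3), a gradient reversal layer feeding three domain classifiers (D1–D3), and a label predictor (Cls) operating on the concatenated hidden outputs. The layer sizes, channel counts, and GRL coefficient are our assumptions, not the exact configuration given in Tables 3–5.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (GRL) used in DANN-style training."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Identity in the forward pass; reversed, scaled gradient backward.
        return -ctx.lam * grad_out, None

def make_extractor(in_ch: int) -> nn.Module:
    # FE1/FE2/FE3 share this structure but not their weights (assumed sizes).
    return nn.Sequential(
        nn.Conv1d(in_ch, 64, kernel_size=7, padding=3), nn.ReLU(),
        nn.AdaptiveAvgPool1d(32), nn.Flatten(), nn.Linear(64 * 32, 128),
    )

class MMDAMC(nn.Module):
    """Sketch of MMDA-MC: IQ, spectrum, and AP branches with fusion."""
    def __init__(self, num_classes: int = 9):
        super().__init__()
        # Assumed input channels: IQ (2), spectrum (1), AP (2).
        self.fe = nn.ModuleList([make_extractor(2), make_extractor(1), make_extractor(2)])
        self.domain_cls = nn.ModuleList(
            [nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2)) for _ in range(3)]
        )
        self.label_pred = nn.Sequential(nn.Linear(3 * 128, 128), nn.ReLU(),
                                        nn.Linear(128, num_classes))

    def forward(self, iq, spec, ap, lam: float = 1.0):
        feats = [fe(x) for fe, x in zip(self.fe, (iq, spec, ap))]        # s1, s2, s3
        dom_logits = [d(GradReverse.apply(f, lam)) for d, f in zip(self.domain_cls, feats)]
        cls_logits = self.label_pred(torch.cat(feats, dim=1))            # fused feature -> Cls
        return cls_logits, dom_logits
```

Concatenating s1–s3 before Cls realizes the multimodal fusion module, while the three per-branch domain classifiers behind the GRL realize the domain adversarial alignment module.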
$G_y$ is the label predictor with parameter $\theta_y$; $G_d$ is the domain classifier with parameter $\theta_d$; $n_s$ is the number of samples in the source domain; $y_i^s$ is the source domain category label; and $L_y$ represents the cross-entropy loss function.


$w(y_i^s)$ represents the weight of each source domain sample, obtained by taking the $y_i^s$-th value in the class weight vector.
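Piecing these definitions together, the class-weighted source classification loss plausibly takes the following form; this is a hedged reconstruction from the fragments above, not a verbatim equation from the paper:

```latex
% Reconstructed class-weighted classification loss: w(y_i^s) is the
% y_i^s-th entry of the class weight vector; \ell_{ce} is cross-entropy.
L_y = \frac{1}{n_s} \sum_{i=1}^{n_s}
      w\!\left(y_i^s\right) \,
      \ell_{ce}\!\left( G_y\!\left(x_i^s; \theta_y\right),\, y_i^s \right)
```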

Figure 9. Schematic of the soft and hard samples in the two-category case.


3. Testing: input the concatenated deep features of the target domain into $G_y$ to calculate the classification accuracy.

Figure 11. Classification accuracy for different combinations of input on different datasets.


Figure 13. Test accuracy of different algorithms in various cross-domain adaptation tasks (%).


Table 1. Summary of abbreviations.

Table 2. Definition of notation. The label space of the source domain contains $C_s$ types of radio signal and is denoted as $\mathcal{C}_s$. Similarly, the label space of the target domain contains $C_t$ types of radio signal and is denoted as $\mathcal{C}_t$.

Algorithm 1: Training and Testing Steps of the Proposed MMDA-MC. Input: multimodal features of the source and target domains ($I$, $S$, and $A$ sequences). For each batch: (a) compute the deep features $G_{f1}(I_i^t; \theta_{f1})$, $G_{f2}(S_i^t; \theta_{f2})$, and $G_{f3}(A_i^t; \theta_{f3})$; (b) input the deep features into $G_{d1}$, $G_{d2}$, and $G_{d3}$ to calculate $L_{d1}$, $L_{d2}$, and $L_{d3}$; (c) concatenate the deep features to obtain the fused feature; (d) input the fused feature into $G_y$ to calculate $L_y$; (e) add $L_{d1}$, $L_{d2}$, $L_{d3}$, and $L_y$ to obtain $L$; (f) update $\theta_y$, $\theta_{f1}$, $\theta_{f2}$, $\theta_{f3}$, $\theta_{d1}$, $\theta_{d2}$, and $\theta_{d3}$ using gradient descent; (g) adjust the learning rate $u$; (h) if $L$ converges to an extremum or reaches a preset threshold, save the weights $\theta_y$, $\theta_{f1}$, $\theta_{f2}$, $\theta_{f3}$, $\theta_{d1}$, $\theta_{d2}$, and $\theta_{d3}$.
$G_{d1}$, $G_{d2}$, and $G_{d3}$ are three domain classifiers with parameters $\theta_{d1}$, $\theta_{d2}$, and $\theta_{d3}$, respectively; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight coefficients; $d_j^{s,t}$ is the domain label; and $L_{d1}$, $L_{d2}$, and $L_{d3}$ represent cross-entropy loss functions. Thus, the total loss function can be defined as $L = L_y + \lambda_1 L_{d1} + \lambda_2 L_{d2} + \lambda_3 L_{d3}$.
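Steps (a)–(f) of Algorithm 1 amount to one standard DANN-style update. Below is a condensed sketch of a single training iteration, built on the model sketch in Section 3.1 above; the optimizer choice, batch layout, and fixed λ values are our assumptions (in practice the GRL coefficient is often annealed over training).

```python
import torch
import torch.nn.functional as F

def train_step(model, opt, src_batch, tgt_batch, lams=(1.0, 1.0, 1.0)):
    """One MMDA-MC iteration, sketching steps (a)-(f) of Algorithm 1."""
    iq_s, sp_s, ap_s, y_s = src_batch
    iq_t, sp_t, ap_t = tgt_batch

    # (a)-(b): deep features and per-branch domain losses (source=0, target=1);
    # the GRL inside the model flips the domain gradient for the extractors.
    cls_s, dom_s = model(iq_s, sp_s, ap_s)
    _, dom_t = model(iq_t, sp_t, ap_t)
    L_d = sum(
        lam * (F.cross_entropy(ds, torch.zeros(len(ds), dtype=torch.long, device=ds.device))
               + F.cross_entropy(dt, torch.ones(len(dt), dtype=torch.long, device=dt.device)))
        for lam, ds, dt in zip(lams, dom_s, dom_t)
    )

    # (c)-(d): the fused feature is classified inside the model; label loss on source.
    L_y = F.cross_entropy(cls_s, y_s)

    # (e)-(f): total loss and parameter update.
    L = L_y + L_d
    opt.zero_grad()
    L.backward()
    opt.step()
    return L.item()
```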

Algorithm 2: Training and Testing Steps of the Proposed WMMDA-MC. Input: multimodal features of the source and target domains. The additional steps are: (f) input $G_{f1}(I_i^t; \theta_{f1})$, $G_{f2}(S_i^t; \theta_{f2})$, and $G_{f3}(A_i^t; \theta_{f3})$ into $G_y$ to obtain the SoftMax output (target_softmax); (g) calculate the entropy weight vector $w(x^t)$ based on target_softmax; (h) input the deep features into $G_{d1}$, $G_{d2}$, and $G_{d3}$ to calculate $L_{d1}$, $L_{d2}$, and $L_{d3}$.
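Relative to Algorithm 1, steps (f)–(g) derive the weights from the target-domain SoftMax outputs before the losses are computed. A brief sketch using the weighting helpers given earlier; exactly how each loss term is rescaled is our assumption.

```python
import torch
import torch.nn.functional as F

def wmmda_weights(model, iq_t, sp_t, ap_t):
    """Steps (f)-(g) of Algorithm 2, sketched: weights from target SoftMax."""
    with torch.no_grad():
        cls_t, _ = model(iq_t, sp_t, ap_t)      # fused-feature logits, step (f)
        target_softmax = cls_t.softmax(dim=1)
    cw = class_weights(target_softmax)          # per-class weights (suppress outliers)
    ew = entropy_weights(target_softmax)        # per-sample entropy weights, step (g)
    return cw, ew

# The source classification loss then becomes class-weighted, e.g.
#   L_y = (cw[y_s] * F.cross_entropy(cls_s, y_s, reduction="none")).mean()
# and the target-side domain losses can be rescaled by ew before summation.
```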

Table 3. CNN architecture layout of the feature extractor.

Table 4. CNN architecture layout of the label predictor.

Table 5. CNN architecture layout of the domain classifier.

Table 8. Test accuracy for different combinations of input on different datasets (%).

Table 9. Test accuracy of different algorithms in various datasets (%).

Table 11. Test accuracy of different algorithms in various cross-domain adaptation tasks (%).

Table 12. Test accuracy of different algorithms in various cross-domain adaptation tasks (%).