Radio–Image Transformer: Bridging Radio Modulation Classification and ImageNet Classification

Abstract: Radio modulation classification is widely used in the field of wireless communication. In this paper, in order to realize radio modulation classification with the help of the existing ImageNet classification models, we propose a radio–image transformer which extracts the instantaneous amplitude, instantaneous phase and instantaneous frequency from the received radio complex baseband signals, then converts the signals into images by the proposed signal rearrangement method or convolution mapping method. We finally use the existing ImageNet classification network models to classify the modulation type of the signal. The experimental results show that the proposed signal rearrangement method and convolution mapping method are superior to the methods using constellation diagrams and time–frequency images, which shows their performance advantages. In addition, by comparing the results of the seven ImageNet classification network models, it can be seen that, except for the relatively poor performance of the architecture MNASNet1_0, the modulation classification performance obtained by the other six network architectures is similar, indicating that the proposed methods do not have high requirements for the architecture of the selected ImageNet classification network models. Moreover, the experimental results show that our method has good classification performance for signal datasets with different sampling rates, Orthogonal Frequency Division Multiplexing (OFDM) signals and real measured signals.


Introduction
Radio signal classification is widely used in the field of communication. For example, in cognitive radio [1], the secondary user needs to monitor the primary user's signal to avoid harmful interference to the primary user. Therefore, the secondary user needs to classify the primary user's signal to accurately know the type of primary user currently using the spectrum, so as to access the spectrum in a more efficient manner [2]. In electromagnetic spectrum management [3,4], some malicious users may transmit specific signals to disturb the spectrum use. In these circumstances with the overlapping and coexistence of various radio signals, if each radio signal can be identified, it will help to determine the existence of malicious users, and then take effective measures to deal with the situation. In adaptive modulation and coding communication [5][6][7], if the receiver is able to recognize the modulation and coding scheme adopted by the transmitter, it can choose the corresponding demodulation and decoding algorithm for information recovery, so as to save signaling overhead for informing the receiver of the adopted modulation and coding scheme. Modulation type is a basic attribute of radio communication signal, and in this paper, we mainly focus on radio modulation classification.
The purpose of the radio modulation classification problem is to identify the type of modulation to which the signal belongs. This is actually a multi-category classification problem, and the number of categories is the number of modulation types in the signal dataset.
In this paper, we treat the problem of radio modulation classification by making full use of the existing mature ImageNet classification network structure. Since the raw radio signals are represented in IQ form, which is different from the image, we propose a radio-image transformer (RIT) to convert the radio signal into the image format. To be specific, we firstly extract the instantaneous amplitude, instantaneous phase and instantaneous frequency of the radio IQ signals, and then convert the signals into images by the proposed signal rearrangement method (SRM) or convolution mapping method (CMM). We finally complete the task of signal modulation classification by using the existing ImageNet classification network models. We verify the effectiveness of the proposed methods through simulations.
The main contribution of this paper is that the RIT proposed can transform the signal classification problem into the image classification problem by mapping signals to images, and thus complete the radio signal modulation recognition using sophisticated classification techniques in the image field. We will show by extensive experiments that our method achieves better classification accuracy than some existing signal-to-image classification methods.
The rest of the paper is organized as follows. Section 2 gives the related work and literature review. Section 3 presents the definition of radio modulation classification problem. Section 4 introduces the proposed radio modulation classification methods. Section 5 introduces the experiments and performance evaluation. Section 6 summarizes the paper.

Related Work and Literature Review
In this section, we mainly introduce some related work of radio modulation classification. The traditional modulation classification methods are mainly based on feature design [8][9][10][11][12][13]. The quality of the designed features directly determines the quality of recognition performance. However, these designed features are often associated with a specific modulation, and it is difficult to find features that are widely applicable to a variety of modulations. As an end-to-end learning method, deep learning [14] unifies feature extraction and recognition tasks, avoids the process of manual feature design, and hence greatly enhances its universal applicability. For instance, in the field of image recognition, a lot of convolutional neural network (CNN) structures have been designed for the ImageNet dataset [15] and these CNNs achieved pretty good classification performance. In view of the great success of deep learning in image classification and other fields, more and more work has introduced deep learning into radio modulation recognition. These methods can be divided into two categories. The first is to design a special CNN or long short term memory (LSTM) network for modulation classification according to the raw radio signals' input form (the in-phase and quadrature (IQ) form) [16][17][18][19]. The second is to convert the radio signals into images, and then classify the images by referring to the network structure used for image classification, so as to realize the classification of radio signals. At present, there are two main ways to transform IQ signals into images, namely, constellation diagrams [20] and time-frequency images [21][22][23], but they may affect the modulation classification performance due to the loss of the temporal correlation information between signal sampling points.
The classification methods proposed in this paper make full use of the technology in the field of image classification, especially the content related to ImageNet. ImageNet is an image dataset widely used by computer vision researchers and plays an important role in object detection and recognition and other fields. The image dataset was created by Professor Li's team. At that time, the academic scholars mostly focused on designing better algorithms. Prof. Li, however, believed that a good dataset is beneficial to the training of models and then began to collect images with annotated information on a large scale and finally officially launched the ImageNet dataset in 2009. This dataset contains about 15 million annotated images, which can be divided into about 22,000 categories [15]. The dataset can be used to train the models to improve their performance, which makes a great contribution to the academic research in this area.
Electronics 2020, 9, 1646
In 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), based on the ImageNet dataset, was launched globally. The competition includes image classification, object detection, object localization, scene classification and video object detection, among other tasks. A subset of the ImageNet dataset is used in this competition, which contains 1000 categories, and each category contains about 1000 images. Various classic CNN models, such as AlexNet [24], ZFNet [25], VGG [26], GoogLeNet [27] and ResNet [28], emerged in the ILSVRC competitions held over 7 years. In particular, ResNet uses stacks of learnable layers to fit the residual between input and output, thus alleviating the vanishing gradient problem and making the model easier to optimize. In the field of image classification, the CNN models trained with a large amount of high-quality images can be transferred to other image datasets for classification and recognition, and they can also be fine-tuned on the new data. The advantage of this is that it is unnecessary to train the neural network from scratch, which saves time and computing resources. Most of the classic CNN models mentioned above have a pre-trained model based on the ImageNet dataset, which is convenient for research. In addition, some lightweight CNN models for edge application scenarios also have pre-trained models based on the ImageNet dataset, such as SqueezeNet [29], Xception [30], MobileNet [31], and ShuffleNet [32]. Taking MobileNet as an example, it is a small model proposed by Google, which can be effectively applied to recognition tasks on intelligent devices. Unlike the traditional CNN models, MobileNet uses a separate convolution kernel for each input channel (depthwise convolution), and then uses 1 × 1 convolution kernels (pointwise convolution) to combine the resulting outputs, which greatly reduces the number of model parameters while keeping its performance close to that of the standard convolution.

Definition of the Problem
In this section, we give the definition of the signal modulation classification problem. After the signal is sent by the transmitter, it travels through the wireless channel and reaches the receiving end. The received signal can be expressed as:
$$r(n) = e^{j(2\pi \Delta f n + \Delta\theta)} \left[ s(n) * g(n) * h(n) \right] + w(n), \quad n = 0, 1, \ldots, N-1,$$
where $*$ denotes the convolution operation, $s(n)$ is the transmitted complex baseband signal, $h(n)$ is the channel response, $g(n)$ is the response of the shaping filter, $\Delta f$ is the carrier frequency deviation, $\Delta\theta$ is the phase deviation, $w(n)$ is additive white Gaussian noise (AWGN), and $N$ is the number of sampling points. Let the set of modulation categories to be classified be $\mathcal{M} = \{1, 2, \ldots, M\}$, where $M$ is the number of modulation categories. Then the radio modulation classification problem can be expressed as an $M$-hypothesis testing problem, that is:
$$T(\mathbf{r}_{N\times 1}) \in \mathcal{M},$$
where $\mathbf{r}_{N\times 1} = [r(0), r(1), \ldots, r(N-1)]^T$ and $T(\mathbf{r}_{N\times 1})$ represents the modulation category to which the signal belongs. This problem can be solved by a classifier with $M$ output categories.
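As an illustration only, the received-signal model described in this section can be simulated with NumPy. The sketch below assumes that the frequency and phase deviations act as a complex rotation on the filtered signal and that the convolution output is truncated to the first N samples; the function name `received_signal` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def received_signal(s, g, h, delta_f, delta_theta, noise_std, rng):
    """Simulate r(n) for n = 0..N-1 (sketch under the assumptions stated above).

    s: transmitted complex baseband samples, g: shaping filter taps,
    h: channel taps, delta_f: normalized carrier frequency deviation,
    delta_theta: phase deviation in radians, noise_std: AWGN standard deviation.
    """
    s = np.asarray(s, dtype=complex)
    n = np.arange(len(s))
    # Shaping filter and channel are applied as linear convolutions;
    # keeping only the first N samples is an assumption of this sketch.
    x = np.convolve(np.convolve(s, g), h)[: len(s)]
    # Carrier frequency and phase deviation as a complex rotation.
    rotation = np.exp(1j * (2 * np.pi * delta_f * n + delta_theta))
    # Circularly symmetric complex Gaussian noise with variance noise_std**2.
    w = noise_std * (rng.standard_normal(len(s))
                     + 1j * rng.standard_normal(len(s))) / np.sqrt(2)
    return rotation * x + w
```

With an ideal filter and channel (`g = h = [1.0]`), no noise and a phase deviation of π/2, the output is simply the input rotated by 90 degrees, which gives a quick sanity check of the model.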

The Proposed Radio Modulation Classification Method
As previously pointed out, the research on ImageNet classification is very rich, and many CNN architectures with good classification capability have been proposed. The main idea of this paper is to use the existing CNN networks for ImageNet recognition to realize radio modulation classification. Since the image size of ImageNet is $W \times W \times 3$ and the radio signal is a complex signal of size $N \times 1$, the core problem we have to solve is how to convert the complex signals into images of the right size. This will be achieved through RIT. The function $\mathcal{R}(\cdot)$ of RIT can be expressed as:
$$\mathbf{y}_{W\times W\times 3} = \mathcal{R}(\mathbf{r}_{N\times 1}),$$
where $\mathbf{y}_{W\times W\times 3}$ is the image output by RIT. Based on this, the overall architecture of radio modulation classification is shown in Figure 1. The figure shows that by constructing a bridge between radio signals and images through RIT, the mature neural networks used in the ImageNet classification domain can be used for radio modulation classification, thereby avoiding the complex process of designing specialized networks for radio signal classification and facilitating the cross-domain use of the same model. It should be noted that, in order to match the number of modulation categories, the number of neurons in the classification layer of the ImageNet network must be set to $M$. The RIT algorithms proposed in this paper are described in detail in the following.
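Figure 1. Overall architecture of radio modulation classification: the radio signal is converted by the RIT into an image, which an existing ImageNet classifier maps to the radio modulation classification result.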

Signal Rearrangement Method
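The SRM directly arranges the one-dimensional signal sample points into two-dimensional images. We note that the image has three channels, and if the real and imaginary parts of the signal $\mathbf{r}_{N\times 1}$ are extracted separately, there are still only two channels. Instead, we extract from the received signal $\mathbf{r}_{N\times 1}$ the instantaneous amplitude $\mathbf{a} = [a(0), a(1), \ldots, a(N-1)]^T$, the instantaneous phase $\mathbf{p} = [p(0), p(1), \ldots, p(N-1)]^T$ and the instantaneous frequency $\mathbf{f} = [f(0), f(1), \ldots, f(N-1)]^T$. Each element of the instantaneous amplitude is:
$$a(n) = \sqrt{\mathrm{Re}(r(n))^2 + \mathrm{Im}(r(n))^2}.$$
Each element of the instantaneous phase is:
$$p(n) = \arctan\frac{\mathrm{Im}(r(n))}{\mathrm{Re}(r(n))},$$
where $\mathrm{Im}(r(n))$ and $\mathrm{Re}(r(n))$ represent the imaginary and real parts of $r(n)$, respectively. Each element of the instantaneous frequency is:
$$f(n) = p(n+1) - p(n), \quad n = 0, 1, \ldots, N-2.$$
In order to make the dimension of the instantaneous frequency vector consistent with the instantaneous amplitude and the instantaneous phase, we set $f(N-1) = 0$.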
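The feature extraction step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that the phase is taken as the four-quadrant arctangent (`np.angle`) and that the instantaneous frequency is a plain forward difference of the phase without unwrapping or scaling; the function name `instantaneous_features` is hypothetical.

```python
import numpy as np

def instantaneous_features(r):
    """Return (a, p, f): instantaneous amplitude, phase and frequency of r."""
    r = np.asarray(r, dtype=complex)
    a = np.abs(r)            # sqrt(Re^2 + Im^2)
    p = np.angle(r)          # four-quadrant arctangent of Im/Re (assumption)
    f = np.zeros_like(p)
    f[:-1] = np.diff(p)      # forward difference of the phase (assumption)
    # f[-1] stays 0 so that f has the same length as a and p.
    return a, p, f
```

For a unit-amplitude complex exponential with a phase step of 0.1 rad per sample, this returns an amplitude of 1 everywhere and a frequency of 0.1 for all but the last sample.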
Then we use a sliding window to reorder the instantaneous amplitude $\mathbf{a}$, instantaneous phase $\mathbf{p}$ and instantaneous frequency $\mathbf{f}$, where the window length is $W$ and the interval is $L$. The schematic diagram is shown in Figure 2. For convenience, $A_n$, $P_n$ and $F_n$ in the figure represent $a(n)$, $p(n)$ and $f(n)$, respectively. After the input signal is rearranged, the output size is $W \times W \times 3$, where $W = 224$, which is consistent with the image size of ImageNet. In this way, the existing ImageNet classification networks can be used to classify the radio modulation.
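A sliding-window rearrangement of this kind might be sketched as follows. The text does not state the value of the interval L, so `L=3` below is only a hypothetical choice that lets 224 windows of length 224 fit into 1024 samples; the function name and the decision to raise an error for signals that are too short are likewise assumptions of this sketch.

```python
import numpy as np

def signal_rearrangement(a, p, f, W=224, L=3):
    """Stack W windows of length W (interval L) into a W x W x 3 image."""
    feats = np.stack([a, p, f], axis=-1)           # shape (N, 3)
    needed = (W - 1) * L + W                       # last index used, plus one
    if feats.shape[0] < needed:
        raise ValueError(f"need at least {needed} samples, got {feats.shape[0]}")
    starts = np.arange(W) * L                      # first sample of each row
    idx = starts[:, None] + np.arange(W)[None, :]  # (W, W) sample indices
    return feats[idx]                              # shape (W, W, 3)
```

Row i of the image holds samples i·L to i·L + W − 1, so neighbouring rows overlap and the temporal order of the samples is preserved within each row. The three feature vectors occupy the three colour channels.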

Convolution Mapping Method
The CMM also first calculates the instantaneous amplitude $\mathbf{a} = [a(0), a(1), \ldots, a(N-1)]^T$, instantaneous phase $\mathbf{p} = [p(0), p(1), \ldots, p(N-1)]^T$ and instantaneous frequency $\mathbf{f} = [f(0), f(1), \ldots, f(N-1)]^T$ of the received signal $\mathbf{r}_{N\times 1}$, then adds a layer of convolution and pooling operations to turn the input into an image of the right size. The schematic diagram is shown in Figure 3. Similarly, after convolution mapping, the output size is $W \times W \times 3$, where $W = 224$, which is consistent with the image size of ImageNet. In this way, the existing ImageNet classification networks can be used to classify radio modulation.
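The exact layer configuration of the CMM is given only in Figure 3, which is not reproduced here. The NumPy sketch below therefore shows just one plausible arrangement, purely as an assumption: each feature vector is filtered by W one-dimensional kernels (which would be learnable in a real network) and then average-pooled along time down to W columns. The function name, kernel length and pooling scheme are all hypothetical.

```python
import numpy as np

def conv_mapping(feats, kernels, W=224):
    """Map features of shape (3, N) to a W x W x 3 image (hypothetical layout).

    kernels: array of shape (W, K), one 1-D kernel per output row.
    """
    K = kernels.shape[1]
    out = np.empty((W, W, 3))
    for c in range(3):
        # All length-K windows of the channel: shape (N - K + 1, K).
        windows = np.lib.stride_tricks.sliding_window_view(feats[c], K)
        conv = kernels @ windows.T                 # (W, N - K + 1)
        # Average pooling along time into W roughly equal bins.
        edges = np.linspace(0, conv.shape[1], W + 1).astype(int)
        out[:, :, c] = np.stack(
            [conv[:, s:e].mean(axis=1) for s, e in zip(edges[:-1], edges[1:])],
            axis=1,
        )
    return out
```

Unlike the SRM, the mapping here depends on kernel weights, which is why the convolutional mapping layer is among the layers whose weights are adjusted during training.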

The Training Procedure
The purpose of training is to use the training dataset to optimize the network parameters, so that the network achieves better performance on the training set while generalizing as well as possible to data outside the training set. First, we need to build the training set. The training set contains the received signal samples and corresponding labels, which can be expressed as:
$$\mathcal{D} = \left\{ \left( \mathbf{r}^{(i)}_{N\times 1}, m^{(i)} \right) \right\}_{i=1}^{S},$$
where $m^{(i)}$ represents the modulation type corresponding to the $i$-th sample, and $S$ represents the number of samples in the training set. Since the RIT proposed in this paper (SRM and CMM) needs to convert the received signals into instantaneous amplitude, instantaneous phase and instantaneous frequency, the training set can also be expressed as:
$$\mathcal{D} = \left\{ \left( \mathbf{a}^{(i)}, \mathbf{p}^{(i)}, \mathbf{f}^{(i)}, m^{(i)} \right) \right\}_{i=1}^{S}.$$
On this training set, the loss function used in training is the commonly used cross entropy. For a mini-batch containing $N_B$ samples, the loss function is defined as:
$$\mathcal{L} = -\frac{1}{N_B} \sum_{i=1}^{N_B} \sum_{k=1}^{M} d_{ik} \log p_{ik},$$
where $p_{ik}$ is the confidence output on the $k$-th category when the $i$-th sample is taken as input, and $d_{ik}$ is the $k$-th dimension of the true label of the $i$-th sample. In this paper, the Adam optimization algorithm [33] is adopted for network training. It keeps an element-wise moving average of both the parameter gradients and their squared values and uses these averages to update the network parameters as
$$\mathbf{m}_t = \beta_1 \mathbf{m}_{t-1} + (1 - \beta_1) \nabla \mathcal{L}(\boldsymbol{\theta}_t),$$
$$\mathbf{v}_t = \beta_2 \mathbf{v}_{t-1} + (1 - \beta_2) \left[ \nabla \mathcal{L}(\boldsymbol{\theta}_t) \right]^2,$$
$$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \frac{\alpha\, \mathbf{m}_t}{\sqrt{\mathbf{v}_t} + \epsilon},$$
where $\boldsymbol{\theta}_t$ is the network parameter vector (including weights and biases), $\mathbf{m}_t$ and $\mathbf{v}_t$ are the moving averages of the gradient and of its element-wise square, $t$ is the iteration index, $\alpha > 0$ is the learning rate, $\beta_1 \in [0, 1)$ is the exponential decay rate of the first moment estimate, $\beta_2 \in [0, 1)$ is the exponential decay rate of the second moment estimate, and $\epsilon$ is a very small positive number used to prevent division by zero.
We consider two training strategies in this paper. The first training strategy is based on transfer learning. During training, only the weights of the last classification layer and the convolutional mapping layer are adjusted, while the weights of the other layers are fixed. The second training strategy is to train the entire network as a new network. Compared with the first training strategy, the second has higher computational complexity.
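The loss and the parameter update can be illustrated with a small NumPy sketch. It assumes that the cross entropy is averaged over the mini-batch and that the Adam update is applied without the bias-correction terms of the original algorithm; the default values of `beta1`, `beta2` and `eps` are common choices rather than values reported in this paper, and the function names are hypothetical.

```python
import numpy as np

def cross_entropy(p, d):
    """Mean cross entropy; p and d have shape (N_B, M), d is one-hot."""
    p = np.clip(p, 1e-12, 1.0)                     # avoid log(0)
    return -np.mean(np.sum(d * np.log(p), axis=1))

def adam_step(theta, grad, m, v, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update without bias correction (assumption)."""
    m = beta1 * m + (1 - beta1) * grad             # moving average of gradient
    v = beta2 * v + (1 - beta2) * grad ** 2        # moving average of its square
    theta = theta - alpha * m / (np.sqrt(v) + eps)
    return theta, m, v
```

Under the first training strategy, such an update would be applied only to the parameters of the classification layer and the convolutional mapping layer. Under the second strategy it would be applied to every parameter of the network.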

Dataset
In this paper, four datasets are generated to test the performance of the proposed methods.

• Sig1024
The dataset is generated by simulation and contains 12 modulations: BPSK, QPSK, 8PSK, OQPSK, 2FSK, 4FSK, 8FSK, 16QAM, 32QAM, 64QAM, 4PAM and 8PAM. When the IQ data is generated, the original information bits are generated randomly. After modulation, a pulse shaping filter is applied. The shaping filter used in the simulation is the raised cosine filter, and the roll-off factor is randomly chosen within the range of [0.2, 0.7]. Since the receiver and transmitter are not strictly synchronized, we add random phase and carrier frequency deviations to the signals, where the phase deviation is randomly chosen within the range of [−π, π], and the normalized carrier frequency offset (relative to the sampling frequency) is randomly chosen within the range of [−0.1, 0.1]. The noise is assumed to be AWGN, and the signal-to-noise ratio (SNR) ranges from −20 dB to 30 dB with an interval of 2 dB. Each IQ signal contains 128 symbols and each symbol is sampled at eight points, so the number of samples per signal is 1024. In the training set, the sample size of each modulation under each SNR is 1000. In the test set, the sample size of each modulation under each SNR is half of that in the training set.
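The dataset dimensions implied by these settings, and the scaling of the noise for a target SNR, can be checked with a short NumPy sketch. The helper `add_awgn` is hypothetical; in particular, measuring the signal power empirically on each signal is an assumption, since the text does not state how the SNR is normalized.

```python
import numpy as np

snrs_db = np.arange(-20, 31, 2)          # -20, -18, ..., 30 dB
n_modulations = 12
samples_per_signal = 128 * 8             # symbols x samples per symbol
n_train = n_modulations * len(snrs_db) * 1000
n_test = n_train // 2

def add_awgn(x, snr_db, rng):
    """Add complex AWGN so that signal power / noise power matches snr_db."""
    p_signal = np.mean(np.abs(x) ** 2)   # empirical signal power (assumption)
    p_noise = p_signal / 10 ** (snr_db / 10)
    w = np.sqrt(p_noise / 2) * (rng.standard_normal(x.shape)
                                + 1j * rng.standard_normal(x.shape))
    return x + w
```

With 26 SNR levels, the training set contains 12 × 26 × 1000 = 312,000 signals and the test set 156,000 signals, each of 1024 complex samples.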

• Sig1024_2
This dataset is similar to Sig1024, except that two sampling rates are considered. Signal samples with oversampling ratio of four are also contained and therefore this dataset has twice as many signal samples as Sig1024.

• RealSig
This real dataset is collected over the air. A signal source generates a specific modulated signal, and the collector receives and samples the signal. Two modulations are considered, i.e., 16QAM and 64QAM. AWGN is added to the collected signals, with the SNR ranging from 0 dB to 30 dB with an interval of 2 dB. In both the training set and the test set, the sample size of each modulation under each SNR is 3860.

Network Models
In the simulation, seven ImageNet classification network models are used, including the typical convolutional neural networks (CNNs) ResNet18, ResNet101, VGG16 and DenseNet121, and the lightweight models MobileNet, ShuffleNet and MNASNet. The number of parameters of these models and their accuracy on ImageNet classification are shown in Table 1.

Training Environment
In the experiments, the loss function was cross entropy loss, the optimizer was Adam, the learning rate was set at 0.001 and after every 20 epochs, the learning rate decreased to 80% of the previous value. The deep learning models were trained on NVIDIA GeForce RTX 2080, and the framework used was PyTorch.
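The learning-rate schedule described above amounts to a step decay, which can be written as a one-line function. The sketch assumes epochs are counted from zero, so the first decay takes effect at epoch 20; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-3, step=20, gamma=0.8):
    """Step decay: multiply the learning rate by gamma after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)
```

In PyTorch, the same behaviour is available through `torch.optim.lr_scheduler.StepLR` with `step_size=20` and `gamma=0.8`, although the text does not say which scheduler was actually used.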

Traditional Methods for Comparison
In this paper, constellation diagrams and time-frequency images are used for performance comparison. These two methods convert IQ signals into constellation diagrams and time-frequency images, respectively, and then use the existing CNNs for classification. The process of these two methods is described below.
Constellation diagram: Take radio signal's I channel as the x-coordinate and the corresponding Q channel as the y-coordinate, and then specify the number of bins for both the abscissa and ordinate dimension to be 224 to calculate the two-dimensional histogram of I channel and Q channel data. For each signal, we can get a 224 × 224 sized matrix, and the matrix is copied and expanded into an image of 224 × 224 × 3, which is used as the input of CNNs.
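A constellation image of this kind can be produced with a two-dimensional histogram, as in the following NumPy sketch. The histogram range (taken here from the data minimum and maximum) and the image orientation are assumptions, as the text does not specify them; the function name is hypothetical.

```python
import numpy as np

def constellation_image(r, bins=224):
    """2-D histogram of (I, Q) samples, copied into three identical channels."""
    r = np.asarray(r, dtype=complex)
    # hist[i, j] counts samples whose I value falls in bin i and Q value in bin j.
    hist, _, _ = np.histogram2d(r.real, r.imag, bins=bins)
    return np.repeat(hist[:, :, None], 3, axis=2)   # shape (bins, bins, 3)
```

Because the histogram only counts how often each (I, Q) cell is visited, the order of the samples in time is discarded, which is the loss of temporal correlation mentioned in the related work.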
Time-frequency image: The short-time Fourier transform (STFT) is used to obtain the time-frequency image of the radio signal. First, the original 1024 length signal is expanded to 1116 length by repeating the first 92 values of the radio signal. We then compute the STFT of the signal. The window used is the Hamming window, the length of the window is set as 224, the overlap number of the window is set as 220, and the length of the STFT is set as 224. Through the above processing, a radio signal of 1116 length in the complex form can be transformed into a matrix of 224 × 224 size in the complex form. A time-frequency image of size 224 × 224 × 3 can be obtained by taking and combining the real part, imaginary part and amplitude of the matrix. This time-frequency image is used as the input of CNNs to realize the classification task.
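The STFT settings above can be verified with a direct NumPy implementation. The sketch assumes that the 92 repeated values are appended to the end of the signal and that no frequency shift is applied to the spectrum; the function name and parameter names are hypothetical.

```python
import numpy as np

def stft_image(r, win_len=224, overlap=220, nfft=224, pad=92):
    """Time-frequency image (real part, imaginary part, magnitude) of shape 224 x 224 x 3."""
    r = np.asarray(r, dtype=complex)
    hop = win_len - overlap                          # 4 samples between frames
    x = np.concatenate([r, r[:pad]])                 # 1024 -> 1116 samples
    window = np.hamming(win_len)
    # All windows of length win_len, then keep every hop-th one: (224, 224).
    frames = np.lib.stride_tricks.sliding_window_view(x, win_len)[::hop]
    spec = np.fft.fft(frames * window, n=nfft, axis=1).T   # (frequency, time)
    return np.stack([spec.real, spec.imag, np.abs(spec)], axis=-1)
```

With a window length of 224 and an overlap of 220, the hop is 4 samples, so a 1116-sample signal yields (1116 − 224)/4 + 1 = 224 frames, which is why the signal is extended before the transform.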

Comparison of Training Strategies
At first, we compare the classification performance of the two training strategies mentioned above on dataset Sig1024. The method used is the signal rearrangement method and the model used is ResNet18. The first training strategy only adjusts the weight of the last classification layer during training, and the second training strategy trains the whole network from the start. Figure 4 shows the classification accuracy of the two training strategies under different SNRs, and Table 2 shows the overall classification accuracy of the two training strategies in the whole test dataset. Although the computational complexity of the first training strategy is low, the classification effect is unsatisfactory. In the following simulations, we adopt the strategy of training the entire network.

Table 2. Overall classification accuracy of training the entire network and training only the classification layer. The classification method is SRM, the model used is ResNet18, and the dataset is Sig1024.

Method                            Accuracy
Train the classification layer    32.30%
Train the entire network          65.99%

Comparison with Traditional Methods
We now compare the performance of our proposed methods with the traditional constellation diagram and time-frequency image methods on dataset Sig1024. ResNet18 is used to carry out the classification tasks. Figure 5 shows the classification accuracy of the four methods under different SNRs, and Table 3 shows the overall classification accuracy of the four methods. As can be seen from Figure 5, the performance of the signal rearrangement method and the convolution mapping method proposed in this paper is almost the same, and the classification accuracy can approach 100% at high SNR. The method based on time-frequency images performs better at low SNR, but has lower classification accuracy at high SNR. The method using the constellation diagrams generated from IQ signals has relatively poor results under both low and high SNR. On the whole, the two proposed methods perform better than the traditional constellation diagram method and time-frequency image method.


Performance of Different ImageNet Models
Using the convolution mapping method, we compare the classification accuracy obtained by using different ImageNet classification models on dataset Sig1024, including the typical convolutional neural networks ResNet18, ResNet101, VGG16 and DenseNet121, as well as the lightweight models MobileNet, ShuffleNet and MNASNet. Table 4 shows the overall classification accuracy of these methods and Figure 6 shows the classification accuracy of these models under different SNRs. We can see that except for the relatively low classification accuracy of the MNASNet model, the classification results of the other models have little difference. Among them, DenseNet121 has the highest overall accuracy.
Figure 6. Comparison of the performance of different ImageNet classification network models. The convolution mapping method was adopted.

Performance with Different Sampling Rates
We test the performance of our method on dataset Sig1024_2, which contains signals with two different sampling rates. We use CMM with the MobileNet_V2 model, and the classification accuracy at each SNR is shown in Figure 7. We can see that in this case, our method still has good classification performance. This shows that the method can be used to classify radio signals with different sampling rates as long as the model has been trained with signal samples at these sampling rates.

Performance with Real Measured Data
We also verify the effectiveness of our method by using the real measured dataset RealSig to test the MobileNet_V2 model trained in the third experiment. Figure 8 shows the classification accuracy at each SNR. It is clear that the model we trained on the simulation dataset Sig1024 with CMM and MobileNet_V2 also has good classification accuracy for the real measured 16QAM and 64QAM signals.

To further improve the performance, we use the training set of dataset RealSig to fine-tune the MobileNet_V2 model trained on dataset Sig1024, with a learning rate of 0.00001, and then test the model's classification performance on the test set. The classification accuracy at each SNR is shown in Figure 9. We can see that the classification accuracy can be improved to nearly 100% at high SNR.

Performance of OFDM Signal Classification
Finally, we carry out experiments on dataset OFDMSig to verify the effectiveness of our method, using CMM and MobileNet_V2. The classification result is shown in Figure 10. It can be seen that our method also has good classification performance on dataset OFDMSig, achieving a classification accuracy of 100% at high SNR.

Conclusions
In this paper, we have proposed an RIT that transforms radio signals into images, so that existing ImageNet classification network models can be used for radio modulation classification, thus building a bridge between radio modulation classification and ImageNet classification. The contribution of this paper is to transform the signal modulation classification problem into an image classification problem through the proposed RIT, enabling signal classification with ImageNet models from the image classification domain. Experiments have shown that this method outperforms the constellation diagram method and the time-frequency image method. The performance comparison of different ImageNet classification network models shows that, except for MNASNet1_0, the other six models have similar performance, which indicates that this method places relatively loose requirements on the choice of ImageNet classification network model. Furthermore, the experimental results show that our method has good classification performance for signal datasets with different sampling rates, and that it can also handle OFDM signals and real measured signal data well. The method in this paper establishes a relationship between radio modulation classification and ImageNet image classification, two seemingly unrelated tasks, and verifies the cross-domain transfer ability of ImageNet classification network models. In our future work, we will consider applying a similar RIT to other radio signal classification tasks, such as interference classification, coding recognition, and protocol identification.