Multidimensional CNN-LSTM Network for Automatic Modulation Classification

Abstract: Automatic modulation classification (AMC) is the premise for signal detection and demodulation applications, especially in non-cooperative communication scenarios. It has been a popular topic for decades and has gained significant progress with the development of deep learning methods. To further improve classification accuracy, a hierarchical multifeature fusion (HMF) based on a multidimensional convolutional neural network (CNN)-long short-term memory (LSTM) network is proposed in this paper. First, a multidimensional CNN module (MD-CNN) is proposed for feature compensation between interactive features extracted by two-dimensional convolutional filters and respective features extracted by one-dimensional filters. Second, learnt features of the MD-CNN module are fed into an LSTM layer for further exploitation of temporal features. Finally, classification results are obtained by the Softmax classifier. The effectiveness of the proposed method is verified by abundant experimental results on two public datasets, RadioML2016.10a and RadioML2016.10b. Satisfactory results are obtained as compared with state-of-the-art methods.


Introduction
With the rapid development of wireless communication technology, various modulation schemes have been adopted for efficient data transmission. In this context, automatic modulation classification (AMC) is becoming increasingly challenging as an important prerequisite and key step in signal detection and demodulation. It is crucial in several civilian and military applications, such as radio interference monitoring, spectrum allocation optimization, and electronic countermeasures [1].
There have been numerous signal processing-based research works in the area for decades, which can generally be classified into two classes: likelihood-based (LB) methods and feature-based (FB) methods [1]. The likelihood-based framework can lead to optimal solutions by adopting different criteria, such as the average likelihood ratio test (ALRT) [2], generalized likelihood ratio test (GLRT) [3], hybrid likelihood ratio test (HLRT), etc. [4]. However, these rely on explicit modeling of prior knowledge, such as channel conditions, and suffer from heavy computational complexity. In this context, feature-based approaches designed by experts are more robust. Cyclic features [5], higher order moments and cumulants [6][7][8], and wavelet transforms [9] are representatives of this category. Although they are widely applied in practice, the design of efficient features requires extensive domain knowledge, and extraction of handcrafted features can be very tedious in some cases.
In recent years, deep learning [10] has gained significant progress in various fields, such as computer vision (CV) [11][12][13], natural language processing (NLP) [14], and speech recognition [15,16]. These successes have encouraged researchers to apply it to modulation classification, where it has been verified to be effective. In 2016, O'Shea et al. pioneered the first convolutional neural network (CNN) model for modulation recognition [17]. Adopting the in-phase (I) and quadrature (Q) components as inputs, it significantly outperforms traditional methods with handcrafted features [17,18]. Since then, the application of deep learning to modulation recognition has become prevalent. Improving classification accuracy and robustness against various parameter settings and different wireless propagation conditions is crucial for successful modulation classification.
To obtain higher classification accuracy, researchers have developed many methods that can be roughly classified into two categories. The first category studies the design of deep architectures that can better exploit the distinction between different types of modulation. The convolutional long short-term memory deep neural network (CLDNN) [19], Google Inception architectures [19], residual networks (ResNet) [19,20], and graph convolutional neural networks (GCN) [21] all lead to good performance. The CLDNN model [19], which consists of CNN and LSTM layers, achieved the best performance compared with the CNN, Inception, and ResNet models studied in that work. Since then, the cascade of CNN and LSTM has been a popular framework due to its effectiveness and has thus been followed by several works [22,23]. Recently, the authors of [24] proposed a complex-valued model that learns features from the IQ signals. However, the existing methods do not derive full value from the IQ signals: previous works utilize convolutional filters of the same dimension, or cascade convolutional filters of different dimensions, and ignore the compensation between different convolutional filters. In this work, we take advantage of CNNs to explore more useful information from the IQ signal by paralleling filters of different dimensions, and of LSTM to learn temporal features.
The second category explores efficient feature representations as inputs of deep learning networks. Various features calculated from IQ components, such as amplitude and phase [25,26], higher order statistics (HOS) [23], and combinations of them [27], are utilized to provide sequence characterization. Furthermore, in order to take advantage of the efficient feature extraction performance of CNNs on images, different picture types have replaced time sequences as the input of CNNs, such as spectra [28], the Choi-Williams distribution (CWD), time-frequency analysis [29], constellation diagrams [30][31][32], and the cyclic correntropy spectrum. A combination of different features usually results in performance improvement [33]. In [22], a dual-stream structure based on CNN and LSTM is proposed that considers the feature interaction between the IQ signal and polar features (amplitude and phase). Zhang combined the raw IQ data and fourth order cumulants (FOC) together as the input of a structure consisting of CNN and LSTM [23]. In our work, we combine IQ signals and their biquadrate as the input features of a two-stream network to further improve performance.
Robustness against varying parameter settings and different wireless propagation conditions is another key problem for the application of AMC. The performance of deep learning models drops dramatically when test dataset characteristics differ from those of the training datasets. There are two means to address this based on existing deep learning methods. One is to construct large-scale well-annotated datasets covering all parameters; however, the cost of such construction is prohibitive. Up to now, most studies use training sets including the same signal to noise ratios (SNRs) as the test data to enhance robustness against SNR. This is reasonable for most SNRs except the lower ones, because feature extraction from lower SNR signals may affect higher SNR signal features. Another method is to introduce transfer learning to enhance robustness to different sample rates and SNRs and to increase the training speed. Wang [34] and Bu [35] used transfer learning and an adversarial transfer learning architecture (ATLA), respectively, to improve robustness to the sample rate. Xu used transfer learning to enhance the performance on lower SNR samples [36]. However, these methods also need well-annotated datasets with the new parameters. Robustness will be one of the lines of focus in future research.
In this paper, a hierarchical multifeature fusion scheme is proposed for enhanced feature diversity. The states of IQ signals characterize the modulation format, and their higher order statistics describe the periodicity over time [37]. We first propose to fully exploit features from IQ signals using convolutional filters of different dimensions in two streams. Multidimensional convolution filters are applied to each stream in parallel for better characterization of individual channels and the interactions between them. Afterwards, a compact LSTM network is utilized to further exploit intrinsic sequential information. Then, IQ components and their corresponding biquadrate serve as the two main input streams to enhance performance. To verify the effectiveness of the proposed method, abundant experiments are carried out with comparative analysis against state-of-the-art works.
The main contributions of our work can be summarized as follows: (1) To the best of our knowledge, this is the first time a multidimensional CNN (MD-CNN) module is proposed to exploit the relationship between individual and interactive features for feature compensation. (2) A hierarchical multifeature fusion (HMF) scheme is proposed for enhanced feature diversity, where IQ components and the biquadrate higher order statistics are utilized as two parallel branches of input, and MD-CNN outputs are fed into an LSTM layer for further exploitation of temporal characteristics. (3) The effectiveness of the proposed method is verified by an abundant series of experiments, including discussions about hyperparameter configuration, enhanced feature diversity, and performance comparison with different classifiers and variants. Better performance compared with state-of-the-art works is presented.
The remainder of the paper is organized as follows. The signal model and related works are shown in Section 2. Details of the proposed hierarchical multifeature fusion scheme are illustrated in Section 3. Experimental results and analysis are presented in Section 4, while Section 5 concludes the paper.

Signal Model
In this paper, the signal model [17] considering real-world effects is adopted. Without loss of generality, the complex baseband time series representation of the received signal r(t) can be expressed as

r(t) = e^{j n_{Lo}(t)} ∫_0^{τ_0} s(n_{Clk}(t − τ)) h(τ) dτ + n_{Add}(t),

where s(t) is the modulated signal of the transmitter, n_{Clk} is the sampling rate offset, h(t) represents a time-varying rotating nonconstant amplitude channel impulse response, τ_0 is the maximum delay spread, n_{Lo}(t) is the residual carrier frequency, n_{Add}(t) is the complex additive noise that may not be white, and j is the imaginary unit.
The received signal r(t) is sampled into its discrete version r[n], which consists of the in-phase (I) components r_I[n] and quadrature (Q) components r_Q[n]. Their relationship with the transmitter-side components is distorted by the phase offset ϕ, carrier frequency offset f_o, and additive noise. Note that r_I[n] and r_Q[n] are no longer orthogonal to each other due to these contaminations.
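As a concrete illustration of the model above, the following sketch simulates a toy received frame: a QPSK baseband signal passed through a short multipath channel, rotated by a residual carrier, corrupted by additive noise, and split into the 2 × 128 IQ representation. All parameter values and names here are illustrative choices of ours, not taken from the datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 128
symbols = rng.integers(0, 4, n)
s = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))   # unit-power QPSK s(t)

h = np.array([1.0, 0.4 + 0.2j, 0.1])                 # toy channel impulse response
faded = np.convolve(s, h)[:n]                        # channel filtering

f_lo = 0.01                                          # residual CFO (cycles/sample)
rotated = faded * np.exp(1j * 2 * np.pi * f_lo * np.arange(n))

snr_db = 10                                          # illustrative SNR
noise_power = np.mean(np.abs(rotated) ** 2) / 10 ** (snr_db / 10)
r = rotated + np.sqrt(noise_power / 2) * (
    rng.standard_normal(n) + 1j * rng.standard_normal(n))

iq = np.stack([r.real, r.imag])                      # 2 x 128 network input
```

The stacked real and imaginary rows of `iq` correspond to the r_I[n] and r_Q[n] channels used as network input throughout the paper.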

Higher Order Statistics
Higher order statistics (HOS) are popular features in digital modulation classification for their outstanding feature representation property. For example, higher order cumulants and the higher order spectrum characterize the shape of the distribution of the noisy baseband IQ signal well [8,23], while higher order cumulants can effectively suppress white Gaussian noise when the length of the signal is large enough [8].
HOS have also been involved in deep learning-based AMC applications. Higher classification accuracy is reported in [23] when the fourth order cumulants (FOC) are combined with IQ components as network inputs, compared with IQ signals utilized alone. In [38], an overcomplete dictionary is learnt with a k-sparse autoencoder from the biquadrate HOS, and satisfactory classification accuracy on MQAM and MPSK is obtained with an SVM classifier.
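As a hedged illustration of why such cumulant features are attractive, the snippet below estimates the fourth order cumulant C40 = M40 − 3·M20² (with M_{pq} the usual moments), a standard FOC feature; the function and variable names are ours. It lands close to the theoretical value of −2 for BPSK while being suppressed toward zero for complex Gaussian noise, reflecting the noise-suppression property mentioned above.

```python
import numpy as np

def c40(r):
    """Estimate the fourth order cumulant C40 = M40 - 3*M20^2
    of a (roughly zero-mean) complex baseband sequence."""
    r = r - r.mean()
    m20 = np.mean(r ** 2)
    m40 = np.mean(r ** 4)
    return m40 - 3 * m20 ** 2

rng = np.random.default_rng(1)
bpsk = rng.choice([-1.0, 1.0], 4096).astype(complex)     # C40 = -2 in theory
noise = (rng.standard_normal(4096)
         + 1j * rng.standard_normal(4096)) / np.sqrt(2)  # C40 = 0 in theory

c40_bpsk = c40(bpsk)
c40_noise = c40(noise)
```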

Deep Network in AMC
The convolutional neural network (CNN) has long been used in AMC since O'Shea first utilized a CNN model for modulation recognition in 2016 [17]. Different variants with different features have been proposed to enhance the performance. For the IQ signal, convolutional filters increase the diversity of signal representations. In [19], the authors studied the influence of the number of convolutional filters, convolutional layers, and filter taps on classification. The CLDNN [19] and the deep hierarchical network (DHN) [39] combined shallow information with deep information using filters of the same dimension and obtained good performance. In our work, we use convolutional filters with different dimensions to enrich the features and improve classification performance.
In addition to CNNs, the long short-term memory (LSTM) network is widely adopted in AMC for its effectiveness in characterizing the sequential information in the signal. It can be considered a special kind of recurrent neural network (RNN), where memory cell units are utilized to address long-term dependency problems and mitigate the gradient vanishing problem of the RNN. West and O'Shea proposed the CLDNN, which first utilized LSTM for modulation recognition and achieved better performance compared with networks including only CNNs [19]. Since then, it has been common for researchers to use LSTM for modulation recognition tasks. The studies in [22,23,40] are among the representative works in this category, where satisfactory performance is achieved.

Multidimensional CNN Module for Feature Compensation
An efficient feature representation scheme is essential for successful AMC in a big data driven scenario. Features derived from IQ components have been widely studied in the literature. For IQ signals, either the I components or the Q components can represent modulation information. Furthermore, the intrinsic relationships between the I and Q channels are also important in identifying different modulation types. However, the compensation relationship between individual IQ components and their interactive features, which is crucial in identifying different modulation types, is often ignored by existing works. In this paper, a multidimensional CNN module is proposed to better exploit the feature compensations between them, the structure of which is shown in Figure 1.
As illustrated in Figure 1, a 'Conv1' convolutional layer consisting of 50 1 × 8 filters is applied to the 2 × 128 IQ inputs. It resembles the functionality of a matched filter [17] and also enriches the input IQ components by mapping them onto 50 feature channels. Then, based on the characteristics of the IQ signal [41], we propose using two streams, namely stream A and stream B, to extract individual features and interactive features between the IQ components, respectively. The 50 one-dimensional (1D) filters in the 'Conv 2-A' layer at the start of stream A are applied to the I and Q channels independently and thus can learn individual features within each channel. As its counterpart in stream B, the two-dimensional (2D) filters in 'Conv 2-B' exploit the interactive features between the I and Q channels. A set of 1D convolution layers is then adopted for learning deeper features within each stream; the number of convolutional layers in each stream, N, can be experimentally determined. The learnt individual and interactive features are then merged by concatenation to better characterize the complementary information between the two streams, as illustrated in Figure 2. As the feature maps of stream A are twice the size of those of stream B, the merged features are reshaped. Typical feature map dimensions are depicted in Figure 2 for a better understanding of the concatenation operation, where the signal length may not equal 128 due to different boundary settings. After the concatenation operation, features are reshaped and expanded in the depth channel. They are then fed into the following LSTM layers for further feature representation.
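The shape bookkeeping of the module can be sketched as follows, assuming 'valid' convolutions with length-8 kernels (layer sizes follow the description above; the helper names are ours). It makes explicit why stream A yields twice as many feature values as stream B, requiring the reshape before concatenation.

```python
# Shape walk-through of the MD-CNN module with 'valid' convolutions.
def conv_valid_len(length, kernel):
    """Output length of a valid convolution along the time axis."""
    return length - kernel + 1

L = 128                                      # samples per IQ frame
after_conv1 = conv_valid_len(L, 8)           # 'Conv1': 50 filters of 1 x 8

# Stream A: 1 x 8 filters on the I and Q rows independently -> 2 rows kept.
streamA_rows, streamA_len = 2, conv_valid_len(after_conv1, 8)
# Stream B: 2 x 8 filters spanning both rows -> a single output row.
streamB_rows, streamB_len = 1, conv_valid_len(after_conv1, 8)

# Stream A carries twice the feature values of stream B, hence the reshape
# before the concatenation in length shown in Figure 2.
size_ratio = (streamA_rows * streamA_len) / (streamB_rows * streamB_len)
```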

The Proposed Hierarchical Multifeature Fusion Scheme
Involving more features as the model input is one intuitive method for performance enhancement. Inspired by previous works, a hierarchical multifeature fusion (HMF) scheme is proposed, where IQ components and the biquadrate higher order statistics are utilized as two parallel branches of input for enhancing feature diversity (denoted as branches I and II in Figure 3). Notations of the corresponding layers are slightly adjusted to be compatible with those in the MD-CNN module in Figure 1.
In the proposed HMF scheme, the biquadrate HOS

r^4[n] = (r[n])^4 = (r_I[n] + j r_Q[n])^4

is adopted, whose real and imaginary parts are denoted as r^4_R[n] and r^4_I[n], respectively. The locations, strength, number, and existence of quartic spectrum lines are effective features in classifying MQAM and MPSK [38], which are very challenging modulation types in AMC.
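A minimal sketch of constructing the branch II input follows: raise the complex samples to the fourth power and stack the real and imaginary parts, mirroring r^4[n] above (the toy QPSK frame and variable names are ours). The fourth power collapses the QPSK phase states onto a single point, which is the spectral-line behaviour exploited for MPSK.

```python
import numpy as np

rng = np.random.default_rng(0)
symbols = rng.integers(0, 4, 128)
r = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))   # unit-modulus QPSK frame

r4 = r ** 4                                          # biquadrate HOS r^4[n]
branch_ii = np.stack([r4.real, r4.imag])             # 2 x 128, like branch I

# For this QPSK frame, r^4[n] is the constant -1: a single spectral line at DC.
collapsed = np.allclose(r4, -1)
```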
To provide more network choices, the network is abbreviated as HMF-I when only the upper IQ component branch is considered, and as HMF-II when only the lower HOS branch is used; HMF refers to the network with both branches. After the two branches of IQ signals and biquadrate HOS go through parallel MD-CNN modules, an LSTM layer is adopted in each branch to further exploit sequential features from the concatenation outputs. All these features are then directly concatenated by depth for the final HMF representation.

Classifier
Like most conventional deep learning-based methods, feature maps of the last HMF layer are vectorized and fed into a fully connected dense layer. A Softmax classifier is adopted to obtain the final modulation classification outputs. Input data x are assigned to the class that maximizes the probability

P(y = i | x; θ) = exp(θ_i^T x) / Σ_{k=1}^{N} exp(θ_k^T x),

where θ is the classifier weight to be learnt, and N is the total number of classes. Then, the categorical cross entropy, which can obtain accurate classification results with little calculation and fast convergence, is adopted as the loss function:

L = −(1/N_B) Σ_{i=1}^{N_B} Σ_{c=1}^{N} y_{i,c} log(P(y = c | x_i; θ)),

where y_i denotes the target output of the i-th sample, and N_B is the training batch size. Once the network parameters are obtained, it is common practice to replace the Softmax layer with a traditional support vector machine (SVM) classifier or random forest (RF) classifier for further performance improvement. We shall discuss their classification performance in Section 4.
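The Softmax probability and the categorical cross entropy above can be written compactly in NumPy, as in this hedged sketch (random logits stand in for the dense-layer outputs; function names are ours):

```python
import numpy as np

def softmax(z):
    """Row-wise Softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob):
    """Mean cross entropy over a batch of one-hot targets."""
    return -np.mean(np.sum(y_true * np.log(y_prob + 1e-12), axis=-1))

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 11))        # batch of 4, 11 modulation classes
probs = softmax(logits)
targets = np.eye(11)[[0, 3, 7, 10]]          # one-hot labels

loss = categorical_cross_entropy(targets, probs)
predictions = probs.argmax(axis=-1)          # class with maximum probability
```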

Experiments and Result Analysis
To evaluate the effectiveness of the proposed method, we implement a series of experiments in this section. Firstly, we investigate the influence of hyperparameter settings to determine the optimal network structure. Secondly, we examine the proposed HMF feature representation and the classifier, respectively. Finally, comparison results with state-of-the-art methods are presented.

Datasets
To verify the effectiveness of the proposed method, the RadioML2016.10a [42] and RadioML2016.10b [42] datasets are adopted in this paper, both authoritative and widely used datasets in the AMC field. Various realistic channel imperfections, such as frequency selective fading, power delay profiles, and local oscillator offset, are well characterized to resemble practical wireless communication environments under different signal to noise ratio (SNR) scenarios.

Training and Testing Settings
To reveal the underlying relationship between different modulation types, a proper division into training, validation, and testing sets is required. Considering that robustness against different SNR settings is an important property of AMC methods, each set comprises samples at all SNR scenarios. For RadioML2016.10a, samples at each SNR are randomly split with a ratio of 5:3:2. Consequently, the training, validation, and testing set sizes are 5.5k, 3.3k, and 2.2k per SNR over all modulation schemes. Experiments on RadioML2016.10b are included for extensive testing of classification performance. To this end, we fix the training and validation set sizes to 5k and 3k per SNR over all modulation schemes, while the rest are all pooled into the testing set. The sizes of the training, validation, and testing sets are 100k, 60k, and 1040k, respectively.
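The per-SNR 5:3:2 split can be sketched as below, shuffling within each (modulation, SNR) group and cutting at 50% and 80% (dataset loading is omitted; the toy index groups and function names are ours):

```python
import numpy as np

def split_532(indices, rng):
    """Shuffle one (modulation, SNR) group and split it 5:3:2."""
    idx = rng.permutation(indices)
    a, b = int(0.5 * len(idx)), int(0.8 * len(idx))
    return idx[:a], idx[a:b], idx[b:]

rng = np.random.default_rng(0)
groups = [np.arange(1000) + 1000 * g for g in range(3)]   # toy index groups

train, val, test = [], [], []
for g in groups:
    tr, va, te = split_532(g, rng)
    train.append(tr); val.append(va); test.append(te)
train, val, test = map(np.concatenate, (train, val, test))
```

Splitting within each group, rather than globally, keeps every SNR and modulation type represented in all three sets.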
A dropout rate of dr = 0.5 is adopted to avoid overfitting. The initial learning rate starts at 0.01 and is multiplied by a factor of 0.5 if the validation loss does not decrease within 10 epochs, to improve the training efficiency. The batch size is set to N_B = 1024 to avoid local minima and speed up the training process. Adaptive moment estimation (Adam) [43] is used in this work to minimize the loss function. We stop the training process when the validation loss does not decrease for 30 epochs and use the model with the minimum validation loss to predict the modulation type. All experiments are implemented with the TensorFlow backend using the Keras deep learning library, supported by an NVIDIA GeForce GTX TITAN X GPU.
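The schedule just described (halve the learning rate after 10 stagnant epochs, stop after 30, keep the best-validation model) can be expressed framework-independently, as in this sketch; in Keras it would typically map to the ReduceLROnPlateau and EarlyStopping callbacks. The function and its behaviour are our illustrative rendering, not the authors' code.

```python
def run_schedule(val_losses, lr0=0.01, patience_lr=10, patience_stop=30):
    """Replay a validation-loss history, halving the learning rate every
    `patience_lr` stagnant epochs and stopping after `patience_stop`;
    return per-epoch learning rates and the epoch of the best loss."""
    lr, best, best_epoch, since_best = lr0, float("inf"), -1, 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, since_best = loss, epoch, 0
        else:
            since_best += 1
            if since_best % patience_lr == 0:
                lr *= 0.5
        lrs.append(lr)
        if since_best >= patience_stop:
            break
    return lrs, best_epoch

# A loss that improves once and then stagnates: three halvings, then stop.
lrs, best_epoch = run_schedule([0.5] + [0.6] * 40)
```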

Hyperparameter Configuration
We start our experiments by exploring the influence of network structure-related hyperparameters in the proposed hierarchical multifeature fusion (HMF) scheme, including the number of convolutional layers in the MD-CNN module, N, and the number of units in each LSTM layer, U. Note that only HMF-I with the upper IQ component branch is considered at this experimental stage for simplicity. The Softmax classifier is adopted by default unless otherwise stated.
To determine the optimal number of convolutional layers in the MD-CNN module, the classification accuracy over varying SNR values is depicted in Figure 4a when U is fixed to 100. As shown in Figure 4a, the classification accuracies are similar at lower SNR values, while N = 2 tops the performance when the SNR is greater than −6 dB, suggesting the importance of a proper number of CNN layers for efficient feature representation. The number of CNN layers is therefore fixed to N = 2 for all HMF networks.
Then, we investigate the classification accuracy with different numbers of cells in the LSTM layer, where three cases of U = 50, 100, and 150 are considered. Note that, as illustrated in Figure 2, the input feature dimension of the LSTM layer is 150 due to our CNN network settings. Consequently, the LSTM layer can be regarded as a dimension reduction layer. We see that U = 50 leads to the worst result, indicating that some information is overly reduced when the output dimension is set too low. However, the performance gain with increasing U saturates when U is greater than 100, with only a slight difference between the two curves. Considering the extra parameters and computational complexity, the number of LSTM cells is fixed to U = 100.

To experimentally demonstrate the feature compensation property, we compare the classification accuracy of the following three networks on both datasets: (1) HMF-IA, where only stream A of the MD-CNN in HMF-I is enabled; (2) HMF-IB, where only stream B of the MD-CNN in HMF-I is enabled; and (3) HMF-I, where both streams are enabled. Experimental results are depicted in Figure 5, where we can observe the consistent performance improvement of HMF-I over its subnetworks, a promising result confirming the performance gain obtained by combining individual and interactive features. Generally speaking, individual features perform better than interactive features when utilized separately, which is more obvious on RadioML2016.10a.

For a better understanding of what the CNN has learnt, we also visualize the 50 convolution kernels of the two branches in Figure 6. We see that the one-dimensional 1 × 8 filters in Figure 6a,c, applied independently to individual I/Q/HOS channels, perform similarly to frequency selective filters, while the two-dimensional 2 × 8 filters in Figure 6b,d apply across the IQ/HOS channels simultaneously to yield deeper feature maps. Within each branch, as the kernels of the two streams are learnt independently, it is not unusual that some of the kernels in Figure 6a,c and Figure 6b,d share similarities with each other, characterizing the intrinsic information for identifying different modulation types. However, a careful examination reveals that certain kernel patterns in Figure 6a,c never appear in Figure 6b,d.

Performance Comparison of Variants of HMF
We now study the performance differences between variants of HMF. As HOS features are seldom utilized independently, only HMF-I and HMF are involved.
The classification accuracy comparison of HMF-I and HMF is depicted in Figure 7a, while the corresponding training curves are shown in Figure 7b. Note that although only the IQ branch is involved in HMF-I, it outperforms HMF in some cases when the SNR is within the range of (−10, 0) dB. A possible explanation is that HOS features are highly influenced by noise, as they are calculated based on cyclic periodicity, which is very sensitive to noise. As shown in Table 1, the classification accuracy of HMF-I averaged over all SNR values is only 0.75% lower than that of HMF, making it a strong choice when storage and computation resources are limited. Comparison results for the number of learnt parameters and training time are also presented in Table 1. One can choose between HMF-I and HMF depending on the requirements of real applications.

Comparison With Different Classifiers
The proposed HMF performs as an efficient feature extractor. In this part, we illustrate the influence of different classifiers on the classification accuracy with HMF-I on RadioML2016.10a. After the dense layer has mapped the HMF output onto higher feature dimensions, the obtained features are fed into support vector machine (SVM) and random forest (RF) classifiers for modulation scheme classification. Experimental results are illustrated in Figure 8. Our SVM implementation is based on the Libsvm library [44], where the radial basis function (RBF) is adopted as the kernel function with the other parameters set to default values. The RF classifier is implemented with built-in functions in sklearn, with the number of trees in the forest set to typical values.
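A hedged sklearn sketch of this classifier swap is given below, with random separable features standing in for the dense-layer outputs (sklearn's SVC wraps the same Libsvm library cited above; the data and parameter choices are ours):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 16))     # stand-in deep features
labels = rng.integers(0, 4, 200)              # 4 toy "modulation" classes
features[np.arange(200), labels] += 4.0       # make the classes separable

# RBF-kernel SVM with default parameters, as in the comparison above.
svm = SVC(kernel="rbf").fit(features, labels)
# Random forest with a typical small number of trees.
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(features, labels)

svm_acc = svm.score(features, labels)
rf_acc = rf.score(features, labels)
```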
As shown in Figure 8, the SVM and certain RF-based classifiers can lead to performance improvement as compared with the HMF-I Softmax baseline. For the SVM classifier, it is consistent with our intuition that better classification is achieved when HMF features are mapped into a higher dimensional space. For the random forest classifier, accuracy increases as the number of trees rises from two to five; however, it gradually drops back when the number of trees is too large. Better accuracy is thus possible if classifiers are carefully selected.

Comparison With State-of-the-Art Results
In this section, we report comparison results with several state-of-the-art methods on both datasets. Based on CNN2 [17], CNN2Opt [19] was obtained by specifying a group of optimal hyperparameters. In [19], a CNN-LSTM cascading model, CLDNN, was proposed, where an LSTM layer with 50 cells was utilized in the AMC field for the first time. For fair comparison, we also present results when the LSTM cell number is set to 100, abbreviated as CLDNN 100. In [24], the authors proposed a linear combination that enables deep learning architectures to compute complex convolutions. Comparison results on both datasets are shown in Figure 9. As revealed by Figure 9, higher classification accuracies are reported by CLDNN [19] and the proposed HMF and HMF-I methods, as compared to CNN2 [17], CNN2Opt [19], and Complex [24]. This can be attributed to the fact that they all employ an LSTM layer in the network design, which is good at sequential feature representation. As compared to CLDNN [19] and its variant CLDNN 100, the utilization of the compensation relationship between individual and interactive features in the proposed HMF method leads to further performance improvement.
As can be easily observed in Figure 9, the proposed HMF and its IQ-channel-only variant HMF-I are superior to all other methods on both datasets when the SNR is equal to or greater than −6 dB. For RadioML2016.10a, an average accuracy of 90.99% is reported by the proposed HMF when the SNR varies from 4 to 18 dB, with the highest accuracy of 92.35% obtained at SNR = 8 dB. On RadioML2016.10b, the performance gain over the other methods is not as obvious as that on RadioML2016.10a, which may be explained by the fact that the signal shapes in RadioML2016.10b are affected by different channel parameters that require a more suitable model to address. However, the identification performance of the proposed HMF method is not stable at lower SNR settings (below −6 dB). There is still room for performance improvement in learning useful features from noisy data samples.
To gain further insight into the classification accuracy of different modulation types, the confusion matrices of CLDNN 100 and the two proposed HMF methods are depicted in Figure 10 at a moderate SNR setting. As can be observed in Figure 10, most modulation types can be correctly classified (with accuracy above 99%), while there are still two main sources of misclassification. One is between two analog modulation types, AMDSB and WBFM, which are susceptible to noise. The other occurs between the higher order QAM modulations (QAM16 and QAM64), which have similar constellation shapes. As clearly shown in Figure 10b,c, the classification accuracy of AMDSB has been greatly improved by the proposed HMF-I and HMF methods. Furthermore, confusion between the M-QAM modulation types has been significantly reduced.
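The per-class accuracies read off such confusion matrices can be computed as in this toy sketch (the labels and counts are invented for illustration and deliberately mimic the QAM16/QAM64 confusion; they are not the paper's results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["QPSK", "QAM16", "QAM64"]
y_true = ["QPSK"] * 10 + ["QAM16"] * 10 + ["QAM64"] * 10
y_pred = (["QPSK"] * 10                      # QPSK fully recognized
          + ["QAM16"] * 7 + ["QAM64"] * 3    # QAM16 partly taken for QAM64
          + ["QAM64"] * 8 + ["QAM16"] * 2)   # and vice versa

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, cols: predicted
per_class_acc = cm.diagonal() / cm.sum(axis=1)        # diagonal over row sums
```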
The classification accuracy for these modulation types is shown in Table 2 for quantitative evaluation, where the average classification accuracy over all 11 modulation types is listed in the last column. As shown in Table 2, for SNR = 12 dB, the overall performance improvements of HMF-I and HMF are 2.68% and 4.69%, compared with CLDNN 100. Moreover, we see a significant increase (about 20%~44%) in the classification accuracy of the AMDSB, QAM16, and QAM64 modulation types, indicating that efficient features for identifying these modulation types have been successfully learnt by the proposed HMF scheme. It is worth noting that there is still room for performance improvement in identifying the M-QAM modulation types. Figure 11 visualizes some typical features of the dense layer outputs in different feature dimensions. We see that QPSK features can be easily separated from the M-QAM features, which share certain similarities between them. However, QAM16 and QAM64 are mixed together in all feature dimensions, which can explain the inferior identification accuracy. Designing a better feature representation method to address this problem is an important direction for the future.

Conclusions
In this paper, a hierarchical multifeature fusion (HMF) scheme has been proposed for efficient feature representation in automatic modulation classification. Firstly, an MD-CNN module is proposed to increase feature diversity by extracting interactive features and individual features using convolutional filters with different dimensions. Secondly, the MD-CNN outputs of the two parallel branches of IQ and HOS features are fed into corresponding LSTM layers to further exploit temporal features. Finally, HMF outputs pass through dense and Softmax layers to produce classification results. Abundant experiments are conducted to verify the effectiveness of the proposed method, and satisfactory results are reported as compared with several state-of-the-art works. Improving the classification results on the QAM modulation types, especially under low SNR settings, is a promising direction for further study.

Figure 1. Illustration of the proposed MD-CNN module.

Figure 2. Feature merging by concatenation in length.

Figure 3. Illustration of the proposed hierarchical multifeature fusion scheme.

Figure 4. Classification accuracy comparison of HMF-I with varying numbers of (a) CNN layers (N) and (b) LSTM cells (U).

Enhanced Feature Presentation of HMF

Feature Compensations Between Two Streams in MD-CNN
As discussed in Section 3.2, the concatenation of individual features and interactive features by the proposed MD-CNN module can better exploit the feature compensations between them, thus leading to enhanced feature diversity.
Consequently, it is safe to conclude that enhanced feature diversity is achieved in this way.

Figure 7. Comparison of the proposed HMF-I and HMF in terms of (a) classification accuracy and (b) training curves.

Figure 8. Classification accuracy comparison of HMF-I with different classifiers on RadioML2016.10a.

Figure 11. Visualization of typical feature outputs of the dense layer. (a) The 5th feature, (b) the 11th feature, (c) the 39th feature, (d) the 58th feature, (e) the 63rd feature, and (f) the 98th feature.

Table 1. Overall comparison of HMF-I and HMF.