Machine Learning for 5G MIMO Modulation Detection

Modulation detection techniques have received much attention in recent years due to their importance in military and commercial applications, such as software-defined radio and cognitive radio. Most existing modulation detection algorithms address non-cooperative systems only. In this work, we propose the detection of modulations in multi-relay cooperative multiple-input multiple-output (MIMO) systems for 5G communications in the presence of spatially correlated channels and imperfect channel state information (CSI). At the destination node, we extract the higher-order statistics of the received signals as the discriminating features. After applying the principal component analysis technique, we carry out a comparative study between the random committee and the AdaBoost machine learning techniques (MLTs) at low signal-to-noise ratio. The efficiency metrics, including the true positive rate, false positive rate, precision, recall, F-Measure, and the time taken to build the model, are used for the performance comparison. The simulation results show that the random committee MLT, compared to the AdaBoost MLT, provides a gain in terms of both modulation detection and complexity.


Introduction
Recently, the integration of 5G new radio (NR) and multiple-input multiple-output (MIMO) has received increasing attention due to its effectiveness in improving both the capacity and robustness of the wireless systems [1]. In fact, the use of multiple antenna elements in MIMO systems is considered as one of the most promising technologies in 5G NR systems that can be employed to enable beamforming and spatial multiplexing [2]. The cooperative MIMO systems also offer a considerable rate gain and improve the diversity order [3][4][5][6][7]. The efficient relaying mechanisms are taken into account by the standard specifications of the mobile broadband communication systems, such as LTE-advanced (LTE-A) [8]. Furthermore, the estimation of communication parameters (e.g., number of antennas, coding, and modulation) has received a great deal of attention. It has found applications in several military and civilian communication systems, such as software-defined radio and cognitive radios [9,10]. It is important to consider the effect of spatial correlation and imperfect channel state information (CSI) in the cooperative MIMO systems. The distance limitation between the antennas and scatterers existing in propagation environment affects the diversity, multiplexing, and capacity gains. The errors caused by the channel estimation, quantization, reciprocity mismatch, and delay produce the imperfect CSI [11]. It is practically unfeasible to obtain the perfect CSI at all nodes [12].

Related Works
In a MIMO destination node, the decoder or the spatial demultiplexer and the demodulator are employed to recover the transmitted binary information. In fact, the destination node is the entity that converts the received waves into a binary stream. Therefore, the

Contributions
In this study, we propose a low-complexity modulation detection algorithm dedicated to multi-relay cooperative MIMO systems over imperfectly estimated correlated channels. More specifically, the contributions of this work are summarized as follows.
• We propose a modulation detection algorithm using the random committee MLT for multi-relay cooperative MIMO systems over imperfectly estimated correlated channels. To the best of the authors' knowledge, this is the first time that the random committee MLT is used for modulation detection. The purpose is to detect the modulation types and orders among different M-ary shift-keying linear modulations (M-PSK and M-QAM) used by broadband technologies, especially 5G NR.
• The modulation detection algorithms proposed in [21,24] are considered as benchmarks, and a comparative study is provided. The performance of the proposed modulation detection algorithm is evaluated with a number of efficiency metrics, such as the true positive rate, false positive rate, precision, recall, F-Measure, and the time taken to build the model. The superiority of the proposed algorithm in terms of computational complexity and modulation detection is verified through the simulation results.

Outline
The rest of the paper is organized as follows. Section 2 presents the system model, including the system description and assumptions. The proposed modulation detection algorithm is provided in Section 3. The simulation results, along with the discussion and the benefits of the proposed algorithm, are given in Section 4. Finally, the conclusions of the paper are presented in Section 5.

Notations
In this paper, we use $\mathrm{tr}(\cdot)$, $\overline{(\cdot)}$, $(\cdot)^{H}$, $(\cdot)^{T}$, and $(\cdot)^{-1}$ to denote the trace, conjugate, conjugate transpose, transpose, and inverse, respectively. $(\cdot)_{r,c}$ represents the entry in the $r$th row and the $c$th column of a matrix. $E[\cdot]$ denotes the statistical expectation. $I_N$ stands for the $N \times N$ identity matrix. The set of $M \times N$ matrices over the complex field is denoted by $\mathbb{C}^{M \times N}$. Finally, $\mathcal{CN}(m, \Sigma)$ is a circularly symmetric complex Gaussian distribution with mean $m$ and covariance matrix $\Sigma$. The abbreviations used in this paper are listed in Table 1.

System Model
We consider the multi-relay cooperative MIMO system over spatially correlated channels shown in Figure 1. Here, we denote the source node by S, the destination node by D, and the $L$ relay nodes that help convey information from S to D by $R_l$, $l = 1, 2, \ldots, L$. We suppose that S, each $R_l$, and D are equipped with $N_{AS}$, $N_{AR}$, and $N_{AD}$ antennas, respectively. We apply a two-phase transmission protocol to allow the information transmission from S to D via the direct link SD and the cooperative links $SR_l - R_lD$, $l = 1, \ldots, L$. To this end, a non-regenerative and half-duplex relay technique is employed for processing and forwarding the received signals at each $R_l$ [27]. Now, if we apply spatial multiplexing (SM) at S and if all $R_l$ and D are to simultaneously support all the $N_{AS}$ independent substreams, then the antenna numbers $N_{AR}$ and $N_{AD}$ must meet the corresponding requirements. For simplicity, we presume that all nodes have an equal number of antennas, i.e., $N_{AS} = N_{AR} = N_{AD} = N_A$. In the first transmission phase, we encode the source signal $x_s$ using SM. The signal $x_s$ therefore consists of the substreams $s_1, s_2, \ldots, s_{N_A}$, which are assumed to be independent and identically distributed (i.i.d.) and mutually independent. Denoting the transmit power at S by $P_{x_s}$, $x_s$ must satisfy the associated power constraint. To achieve a near-capacity sum rate [28], we apply a regularized zero forcing (RZF) linear precoding technique at S. The expression of the linear precoding matrix involves $\hat{H}_{SD} \in \mathbb{C}^{N_A \times N_A}$, the matrix that estimates the transmission channel from S to D with a Gaussian distributed error, and $\alpha_1$, the ratio of the total noise variance to the total transmit power [28]. After applying the RZF precoding to $x_s$, S transmits the precoded data in parallel to D and all $R_l$.
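For orientation, a commonly used form of the RZF precoder is sketched below. It is an assumption based on the standard RZF formulation rather than an expression recovered from this paper, and the symbol $W$ for the precoding matrix is introduced here only for illustration.

```latex
W = \hat{H}_{SD}^{H}\left(\hat{H}_{SD}\hat{H}_{SD}^{H} + \alpha_1 I_{N_A}\right)^{-1}
```

In this form, setting $\alpha_1 = 0$ reduces the precoder to plain zero forcing, whereas $\alpha_1 > 0$ regularizes the inverse when $\hat{H}_{SD}$ is poorly conditioned.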
At D, the expression of the received signal involves $\rho_s$, the control factor of the source power; $H_{SD} \in \mathbb{C}^{N_A \times N_A}$, the SD channel matrix with spatial correlation; and $n_{SD} \sim \mathcal{CN}(0, \sigma_{SD}^2 I_{N_A})$, an additive zero-mean spatially white circularly symmetric complex Gaussian noise with variance $\sigma_{SD}^2$. At $R_l$, $l = 1, \ldots, L$, the expression of the received signal involves $H_{SR_l} \in \mathbb{C}^{N_A \times N_A}$, the $SR_l$ channel matrix with spatial correlation, and the additive noise at $R_l$. In the second transmission phase, all $R_l$ apply a linear beamforming matrix (BM) to the signals received in the first transmission phase. We model the linear BM according to the zero forcing and regularized zero forcing (ZF-RZF) scheme [29]. The expression of this matrix, denoted by $F_l$, involves $\hat{H}_{R_lD} \in \mathbb{C}^{N_A \times N_A}$, the matrix that estimates the transmission channel from $R_l$ to D with a Gaussian distributed error at the $l$th relay node, and $\alpha_2$, the ratio of the total noise variance to the total transmit power [28]. The signals resulting from the ZF-RZF precoding are then forwarded from all $R_l$, and the expression of the received signal at D involves $\rho_{r_l}$, the control factor of the $l$th relay power; $H_{R_lD} \in \mathbb{C}^{N_A \times N_A}$, the $R_lD$ channel matrix with spatial correlation; and $n_{R_lD} \sim$

Spatial Correlation Model
The limited distance between the antennas and the scatterers in the propagation environment can affect the diversity, multiplexing, and capacity gains [30]. We model the spatial correlation of the cooperative MIMO channels based on the Kronecker model [30]. Accordingly, the expressions of the correlated channel matrices $H_{SD}$, $H_{SR_l}$, and $H_{R_lD}$ involve $H^{w}_{SD}$, $H^{w}_{SR_l}$, and $H^{w}_{R_lD}$, which are full-rank gain matrices whose entries are i.i.d. and follow a circularly symmetric complex Gaussian distribution with zero mean and unit variance; $R_{H_{SD},RX}$, $R_{H_{SR_l},RX}$, and $R_{H_{R_lD},RX}$, the receiver correlation matrices; and $R_{H_{SD},TX}$, $R_{H_{SR_l},TX}$, and $R_{H_{R_lD},TX}$, the transmitter correlation matrices. Based on the exponential correlation model defined in [31,32], we model the entries of the receiver and transmitter correlation matrices presented in (13). For a correlation matrix, denoted by $R$, the entries are determined by $\rho$, which denotes the amount of correlation.
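As an illustration of the exponential correlation model, the following minimal sketch builds a correlation matrix from a single coefficient. The entry convention used here (powers of the coefficient above the diagonal, their conjugates below) is the usual one for this model and is an assumption, since the paper's own expression is not reproduced above; the function name `exponential_correlation` is hypothetical.

```python
def exponential_correlation(n, rho):
    """Build an n x n exponential correlation matrix as nested lists.

    Assumed convention: entry (r, c) is rho**(c - r) on and above the
    diagonal, and the complex conjugate of rho**(r - c) below it.
    """
    if abs(rho) > 1:
        raise ValueError("|rho| must not exceed 1")
    rho = complex(rho)
    matrix = []
    for r in range(n):
        row = []
        for c in range(n):
            if r <= c:
                row.append(rho ** (c - r))
            else:
                row.append((rho ** (r - c)).conjugate())
        matrix.append(row)
    return matrix


# Four antennas and |rho| = 0.5, matching the simulation settings.
R = exponential_correlation(4, 0.5)
```

The resulting matrix is Hermitian with a unit diagonal, and the correlation between two antennas decays geometrically with their index separation.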

Imperfect Channel Estimation Model
In practical cooperative MIMO systems, it is infeasible to obtain perfect CSI at all nodes. Errors produced by channel estimation, quantization, reciprocity mismatch, and delay result in imperfect CSI. In this paper, we consider that the backward channels $H_{SR_l}$ can be estimated by $R_l$, $l = 1, \ldots, L$, based on pilot signaling, and thus the relay nodes have perfect knowledge of $H_{SR_l}$ [33]. However, we consider imperfect CSI on the SD and $R_lD$, $l = 1, \ldots, L$, links. We model the imperfect CSI for these links as in [11], where the entries of $\Omega_{SD}$ and $\Omega_{R_lD}$ are i.i.d. zero-mean circularly symmetric complex Gaussian variables with unit variance. In addition, $\Omega_{SD}$ and $\Omega_{R_lD}$ are independent of $H_{SD}$ and $H_{R_lD}$, respectively, and $e^2_{SD}$ and $e^2_{R_lD}$ denote the estimation error variances of the SD and $R_lD$ channels, respectively.
At the end of the second transmission phase, two copies of the source data $x_s$ are received at D: one through the direct link SD, i.e., $y_{SD}$ (Equation (6)), and one through the cooperative links $SR_l - R_lD$, i.e., $y_{RD}$ (Equation (11)). These two copies are combined in order to increase the SNR. The received signal at D is considered without any time oversampling, with optimum symbol timing, and with perfect carrier frequency and phase estimation. Discriminating features are then extracted from the received signals as an input to the random committee MLT [34].
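The imperfect CSI model can be illustrated with a short sketch. It assumes the additive form $\hat{H} = H + e\,\Omega$, which is consistent with the description of $\Omega$ and $e^2$ but is not confirmed by an equation in the text; the helper names and the stand-in channel are hypothetical.

```python
import math
import random


def complex_gaussian_matrix(n_rows, n_cols, rng):
    """Entries are i.i.d. CN(0, 1): real and imaginary parts have variance 1/2."""
    scale = math.sqrt(0.5)
    return [[complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
             for _ in range(n_cols)] for _ in range(n_rows)]


def imperfect_estimate(h, error_variance, rng):
    """Return H_hat = H + e * Omega (assumed additive form), with e = sqrt(e^2)."""
    e = math.sqrt(error_variance)
    omega = complex_gaussian_matrix(len(h), len(h[0]), rng)
    return [[h_rc + e * w_rc for h_rc, w_rc in zip(h_row, w_row)]
            for h_row, w_row in zip(h, omega)]


rng = random.Random(0)
H = complex_gaussian_matrix(4, 4, rng)    # stand-in for a true channel matrix
H_hat = imperfect_estimate(H, 0.1, rng)   # e^2 = 0.1, as in the simulations
```

With an error variance of zero the estimate coincides with the true channel, which corresponds to the perfect CSI case.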

Features Extraction
To correctly estimate the modulation from a received signal, an appropriate choice of key features is mandatory. Higher-order statistics (HOSs), which include higher-order moments (HOMs) and higher-order cumulants (HOCs), are considered promising features that enable good detection of modulation types [35,36]. For that reason, we choose the HOMs and HOCs up to order eight for modulation detection purposes [36].
The $m$th-order HOM of the received signal vector at the $a$th antenna, $M_{mk}$, can be written as in [37], and the HOMs can also be expressed through their estimates. The $m$th-order HOC of the same signal can be written in terms of HOMs of equal and lower orders [37] as a sum in which $\Psi$ runs through the list of all partitions of $1, \ldots, j$, $\phi$ runs through the list of all blocks of the partition $\Psi$, and $\beta$ is the number of elements in the partition $\Psi$. The interested reader can refer to Appendix A for further details. We raise each HOC to the power $2/m$, as the magnitude of the HOCs increases with their order [38].
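The feature computation can be illustrated with a minimal sketch that estimates moments of the form $M_{mk} = E[y^{m-k}(\overline{y})^{k}]$ by sample averages and combines them into second- and fourth-order cumulants. The cumulant expressions used are the commonly quoted ones for zero-mean signals and are an assumption here, as are the function names; orders six and eight are omitted for brevity.

```python
import cmath


def moment(samples, m, k):
    """Sample estimate of M_mk = E[y^(m-k) * conj(y)^k]."""
    total = sum((y ** (m - k)) * (y.conjugate() ** k) for y in samples)
    return total / len(samples)


def low_order_cumulants(samples):
    """Second- and fourth-order cumulants of a zero-mean complex sequence."""
    m20, m21 = moment(samples, 2, 0), moment(samples, 2, 1)
    m40, m42 = moment(samples, 4, 0), moment(samples, 4, 2)
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C42": m42 - abs(m20) ** 2 - 2 * m21 ** 2,
    }


# Noise-free, unit-energy constellations with 512 symbols each.
bpsk = [complex(1, 0), complex(-1, 0)] * 256
qpsk = [cmath.exp(1j * cmath.pi * (2 * i + 1) / 4) for i in range(4)] * 128
c_bpsk = low_order_cumulants(bpsk)
c_qpsk = low_order_cumulants(qpsk)
```

For these ideal constellations the fourth-order cumulants already separate the two modulations (for example, $C_{40}$ is $-2$ for BPSK and $-1$ for QPSK), which is the property that makes HOCs useful as discriminating features.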
To improve the modulation detection performance of the proposed algorithm and decrease its computational cost, a reduced set of features is chosen based on the principal component analysis (PCA) technique [39]. This technique builds a low-dimensional representation of the extracted features that captures as much as possible of the variance in those features. It is a linear transformation that maps the extracted features to orthogonal components and then ranks these components so that those with the largest variance are placed at the top of the list. Consequently, the selected subset of features consists of the orthogonal components with the largest variance, while the remaining components are those that present high correlations and can thus be removed with a minimal loss of information. Based on simulations, only ten orthogonal components among twenty-eight, i.e., $N_{feat} = 10$, are retained in the training and test phases.
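A generic formulation of the PCA step is sketched below. It assumes that the extracted features are stacked row-wise in a matrix $X$ with one row per training signal and twenty-eight columns; the symbols $X$, $X_c$, $\mu$, $S$, $V$, $\Lambda$, and $Z$ are notation introduced here for illustration and do not come from the paper.

```latex
X_c = X - \mathbf{1}\,\mu^{T}, \qquad
S = \frac{1}{N_{samp} - 1}\, X_c^{T} X_c = V \Lambda V^{T}, \qquad
Z = X_c\, V_{N_{feat}}
```

Here $\mu$ is the vector of column means, $N_{samp}$ is the number of rows of $X$, the columns of $V$ are the orthonormal eigenvectors of $S$ sorted by decreasing eigenvalue, and $V_{N_{feat}}$ keeps the first $N_{feat} = 10$ of them, so that $Z$ holds the reduced features.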
To detect the modulation type of an unknown signal, denoted by $y_D$, a training phase must first be carried out. It consists of building a classifier from a learning database (DB). The built classifier is then used in the test phase to detect the modulation type. Figure 2 presents the modulation detection process for a given signal $y_D$.

Random Committee Operating with Random Tree MLT
To detect the modulation type used by the source node based on the signals received at the destination node, we deploy the random committee MLT [34] using the random tree as a base MLT [40]. Let $T$ represent the number of training subsets (i.e., $T_1, \ldots, T_T$). These subsets are used to construct an ensemble of random tree classifiers $C_1, \ldots, C_T$, where each random tree classifier is built with a different random number seed using the same training data. In a random tree classifier, a set of features is randomly selected at each node to construct the classifier. The final detection decision, denoted by $D_{final}$, is an average of the predictions given by the individual random tree classifiers.
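The averaging step of the committee can be sketched as follows. The sketch covers only the combination of the members' outputs, not the construction of the random trees, and it assumes that each member returns a class-probability distribution; the function name and the example numbers are hypothetical.

```python
def committee_decision(member_distributions):
    """Average the members' class-probability dicts and pick the top class."""
    classes = member_distributions[0].keys()
    n_members = len(member_distributions)
    averaged = {c: sum(d[c] for d in member_distributions) / n_members
                for c in classes}
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs of T = 3 random trees for one received signal.
votes = [
    {"BPSK": 0.7, "QPSK": 0.2, "8PSK": 0.1},
    {"BPSK": 0.4, "QPSK": 0.5, "8PSK": 0.1},
    {"BPSK": 0.6, "QPSK": 0.3, "8PSK": 0.1},
]
d_final, averaged = committee_decision(votes)
```

In this example the second tree prefers QPSK, but the averaged distribution still favors BPSK, which illustrates how the ensemble smooths out the seed-dependent errors of individual trees.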

Adaptive Boosting (AdaBoost) Operating with Decision Tree
In this work, we compare the random committee with the AdaBoost (Adaptive Boosting) MLT [41]. AdaBoost produces a sequence of decision tree (J48) classifiers and adjusts the weights of the training samples according to the classifiers that were previously built. The goal is to force the J48 classifier to reduce the expected errors under different input distributions. The training samples that are wrongly detected by former classifiers therefore play an essential role in the training of later ones. Using the AdaBoost MLT, $T$ weighted training subsets $T_1, \ldots, T_T$ are created in sequence and $T$ classifiers $C_1, \ldots, C_T$ are constructed. Then, the final decision, denoted by $D_{final}$, is made based on the decisions of the classifiers $C_1, \ldots, C_T$ through a weighted voting rule. The weight of each classifier is set based on its performance on the training subset employed to construct it. Figure 3 presents the random committee and AdaBoost processes.
Recall that $T$ is the number of generated training subsets and $N_{feat}$ is the number of selected features, and let $N_{samp}$ be the size of the learning DB. The time complexities of the random committee using the random tree MLT and of AdaBoost using the J48 MLT are given in Table 2.
Table 2. Time complexities of random committee using random tree classifier and AdaBoost using J48 MLT.

MLTs
Time Complexities
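The weighted voting rule of AdaBoost can be sketched as follows. The sketch assumes the AdaBoost.M1 weight $\ln((1 - \varepsilon)/\varepsilon)$ for a classifier with weighted training error $\varepsilon$; the paper does not state the exact weight formula, and the function names and example values are hypothetical.

```python
import math


def classifier_weight(error_rate):
    """AdaBoost.M1 vote weight: log((1 - eps) / eps), for 0 < eps < 0.5."""
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error rate must lie strictly between 0 and 0.5")
    return math.log((1.0 - error_rate) / error_rate)


def weighted_vote(predictions, error_rates):
    """Sum the vote weights per predicted class and return the heaviest class."""
    scores = {}
    for label, eps in zip(predictions, error_rates):
        scores[label] = scores.get(label, 0.0) + classifier_weight(eps)
    return max(scores, key=scores.get), scores


# Hypothetical labels from T = 3 classifiers and their weighted training errors.
d_final, scores = weighted_vote(["QPSK", "8PSK", "8PSK"], [0.05, 0.30, 0.35])
```

In this example a single accurate classifier outweighs two weaker ones that agree with each other, which is the main difference from the unweighted averaging used by the random committee.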

Multilayer Perceptron (MLP)
We also compare the proposed algorithm using the random committee MLT with the MLP MLT used in [24]. The MLP is one of the most widely used artificial neural networks (ANNs). It is trained with the resilient backpropagation (RPROP) algorithm proposed in [42], which is known for its good performance in pattern recognition. The structure of the MLP contains one input layer, one or more hidden layers, and one output layer, and each neuron of a layer is linked to all the neurons of the next layer. The intuition behind introducing hidden layers is to enable the network to model complex nonlinear decision functions between the input and output layers. The optimal MLP structure employed in this work is determined through intensive simulations. We find that an MLP with two hidden layers, where the first hidden layer contains 10 nodes and the second contains 15 nodes, provides a good trade-off between modulation detection and training time. Consequently, we use the MLP with these settings in our simulations.
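The selected MLP structure can be illustrated by a forward pass through a network with hidden layers of 10 and 15 nodes. Several choices in this sketch are assumptions that the paper does not specify: ten inputs (one per retained feature), five outputs (one per modulation), tanh hidden activations, a softmax output, and untrained random weights. RPROP training is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Propagate a feature vector through tanh hidden layers and a softmax output."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [math.tanh(v) for v in z]
        else:
            peak = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in z]
            activations = [e / sum(exps) for e in exps]
    return activations


rng = random.Random(1)
sizes = [10, 10, 15, 5]  # input, hidden 1, hidden 2, output (assumed sizes)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probabilities = forward([0.1] * 10, layers)
```

The output is a probability vector over the five modulation classes, and the detected modulation would be the class with the largest entry once the weights have been trained.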

Metrics Used for Performance Evaluation of MLTs
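In this paper, we compare the random committee and AdaBoost MLTs using the true positive (TP) rate, false positive (FP) rate, precision, recall, and F-Measure metrics. With $TP$, $FP$, and $FN$ denoting the numbers of true positives, false positives, and false negatives, the precision, recall, and F-Measure are given, respectively, as
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]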
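These metrics can be computed per modulation from a confusion matrix, as in the following minimal sketch. The function name and the two-class counts are made up for illustration and are not results from the paper.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] counts signals of true class i detected as class j."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(row[i] for row in confusion) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        results[label] = {
            "tp_rate": recall,  # the TP rate coincides with the recall
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return results


labels = ["BPSK", "QPSK"]
confusion = [[8, 2], [1, 9]]  # made-up counts, rows are the true classes
metrics = per_class_metrics(confusion, labels)
```

Averaging each metric over the classes gives the kind of summary row that is compared between classifiers.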

Simulation Results
The performance of the proposed algorithm was verified for multi-relay cooperative MIMO systems over spatially correlated channels through numerical simulations. The simulated modulation set is $M$ = {16QAM, 64QAM, BPSK, QPSK, 8PSK}. A training set is built for each modulation type based on 50 realizations of signals with $512 \times N_A$ symbols, where the messages transmitted by the source node and the MIMO channels are randomly generated in each realization. We assume that all sub-channels, i.e., SD, $SR_l$, and $R_lD$, $l = 1, \ldots, L$, have the same correlation coefficient, i.e., $|\rho| = \rho_{H_{ch},RX} = \rho_{H_{ch},TX}$, where $ch \in \{SD, SR_l, R_lD\}$. We also assume that the sub-channels SD and $R_lD$ have the same estimation error variance, i.e., $e^2 = e^2_{SD} = e^2_{R_lD}$. In all results, we consider that all nodes are equipped with four antennas, i.e., $N_A = 4$. Additive spatially white circularly symmetric complex Gaussian noises with variances $\sigma^2_{ch}$ are considered. Without loss of generality, we suppose that the sub-channels $SR_l$ have the same SNR, i.e., $SNR_{SR} = SNR_{SR_l} = 20$ dB, $l = 1, 2, \ldots, L$. Furthermore, we consider that the sub-channels SD and RD have equal SNRs, i.e., $SNR = SNR_{SD} = SNR_{RD}$.
In this work, we carry out a comparative study between the random committee MLT and the AdaBoost MLT using 10-fold cross-validation [43] on the training set described above. The number of training subsets is set to ten for both the random committee and AdaBoost MLTs (i.e., $T = 10$). For all the results, we consider a cooperative MIMO system with $N_A = 4$, $L = 3$, $SNR_{SR} = 20$ dB, $SNR = -5$ dB, $e^2 = 0.1$, and $|\rho| = 0.5$. We choose $SNR = -5$ dB to evaluate the performance of the MLTs at low SNR, since a detection performance of 100% can already be achieved at acceptable SNR values, as shown in our proposal presented in [21].
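The 10-fold cross-validation procedure can be sketched as a partition of sample indices into ten folds, each used once for testing while the other nine are used for training. The sample count of 250 is a hypothetical value chosen for the example, the function name is hypothetical, and the split below is not stratified by modulation type, which an actual toolkit may do.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical learning DB with 250 feature vectors.
splits = list(k_fold_indices(250, 10))
```

Every sample appears in exactly one test fold, so the averaged metrics use each training signal once for evaluation.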
Tables 3 and 4 display the detailed accuracy by modulation type for the random committee and the AdaBoost MLTs, respectively, in the case of a cooperative MIMO system with $L = 3$, $N_A = 4$, $SNR_{SR} = 20$ dB, $SNR = -5$ dB, $e^2 = 0.1$, and $|\rho| = 0.5$. By comparing the averages of the presented metrics, i.e., TP rate, FP rate, precision, recall, and F-Measure, it is clear that the random committee MLT offers a gain over the AdaBoost MLT in terms of modulation detection: its TP rate, precision, recall, and F-Measure are higher than those of the AdaBoost MLT, while its FP rate is lower. Therefore, the random committee MLT can be adopted for modulation detection.
Tables 5 and 6 confirm the obtained results. The percentages of correctly detected modulations are 84.86% and 84.285% for the random committee and the AdaBoost MLTs, respectively.
Table 3. Detailed accuracy by modulation type for the random committee using random tree as a base MLT with $L = 3$, $N_A = 4$, $SNR_{SR} = 20$ dB, $SNR = -5$ dB, $e^2 = 0.1$, and $|\rho| = 0.5$.
To confirm these results, Figure 4 shows the probability of correct detection ($P_{\mathrm{Correct\ detection}}$) of the proposed algorithm using the random committee MLT as a function of the SNR, compared to the algorithm proposed in [21] using the J48 MLT alone and the AdaBoost MLT operating with J48, where $L = 3$, $N_A = 4$, $SNR_{SR} = 20$ dB, $e^2 = 0.1$, and $|\rho| = 0.5$. Here, the test set consists of 1000 Monte Carlo trials for each modulation scheme (i.e., $N_{total} = 5000$ Monte Carlo trials in total). For each trial, $N_A$ test signals are considered, where each signal consists of 512 i.i.d. symbols. It is clearly shown that the proposed algorithm provides good performance, as $P_{\mathrm{Correct\ detection}}$ reaches about 100% (i.e., $P_{\mathrm{Correct\ detection}} \approx 100\%$) at acceptable SNR. Furthermore, the modulation detection of our proposal is enhanced compared to the algorithm proposed in [21] in both cases, i.e., using the AdaBoost MLT and using the J48 MLT. For example, $P_{\mathrm{Correct\ detection}}$ reaches about 100% at an SNR of 5 dB for our proposal and at 10 dB for the algorithm proposed in [21] using the AdaBoost MLT. It is also shown that the MLP has the worst performance.
In addition to the provided modulation detection gain, the random committee MLT has a low complexity compared to the AdaBoost MLT. Indeed, as shown in Figure 5, the random committee builds the model more than seven times faster than the AdaBoost MLT. One can also see that the training time of the MLP is long.
Consequently, the proposed algorithm provides a good tradeoff between modulation detection performance and complexity.

Conclusions
In this paper, we studied the detection of modulations for multi-relay cooperative MIMO systems in the presence of spatially correlated channels. At the destination node, we extracted the higher-order statistics (HOSs) as discriminating features of the received signals. After applying the principal component analysis (PCA) technique, we carried out a comparative study between the random committee and the AdaBoost MLTs at low SNR. The efficiency metrics, including the true positive rate, false positive rate, precision, recall, F-Measure, and the time taken to build the model, were used for the performance comparison. Simulation results demonstrated that the random committee MLT, as compared to the AdaBoost MLT, offers a gain in terms of both complexity and modulation detection.

Appendix A
$C_{20}$ and $C_{21}$ are obtained from (19). In the same manner, one can express the HOCs up to the eighth order in terms of HOMs, for example $C_{40}$, $C_{60}$, and $C_{80}$.
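For reference, the commonly quoted moment-based expressions of these cumulants are sketched below. They assume a zero-mean signal whose odd-order moments vanish, with $M_{m0}$ denoting the $m$th-order moment without conjugation and $M_{21}$ the second-order moment with one conjugation; they are standard results and are not recovered from the original appendix.

```latex
\begin{aligned}
C_{20} &= M_{20}, \qquad C_{21} = M_{21}, \\
C_{40} &= M_{40} - 3M_{20}^{2}, \\
C_{60} &= M_{60} - 15M_{20}M_{40} + 30M_{20}^{3}, \\
C_{80} &= M_{80} - 28M_{20}M_{60} - 35M_{40}^{2} + 420M_{20}^{2}M_{40} - 630M_{20}^{4}.
\end{aligned}
```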