Deep Learning-Based Secure MIMO Communications with Imperfect CSI for Heterogeneous Networks

Perfect channel state information (CSI) is required by most classical physical-layer security techniques, yet ideal CSI is difficult to obtain because of the time-varying wireless fading channel. Although imperfect CSI has a great impact on the security of MIMO communications, deep learning is becoming a promising solution to handle its negative effect. In this work, we propose two types of deep learning-based secure MIMO detectors for heterogeneous networks, where the macro base station (BS) chooses the null-space eigenvectors to prevent information leakage to the femto BS. Thus, the bit error rate of the associated user is adopted as the metric to evaluate the system performance. With the help of deep convolutional neural networks (CNNs), the macro BS obtains a refined version of the imperfect CSI. Simulation results are provided to validate the proposed algorithms. The impacts of system parameters, such as the correlation factor of imperfect CSI, the normalized Doppler frequency, and the number of antennas, are investigated in different setup scenarios. The results show that considerable performance gains can be obtained from the deep learning-based detectors compared with the classical maximum likelihood algorithm.


Introduction
In recent years, the heterogeneous wireless network, which can support high-density and high-rate traffic, has attracted much research interest from both academic and industry sectors [1][2][3]. The cooperation between the macro base stations (BSs) and the femto BSs can greatly improve the quality of service (QoS) of the user equipment, as well as the spectrum efficiency and the energy efficiency [4]. Therefore, ultra-dense heterogeneous technology has been adopted as one of the potential solutions to the next-generation wireless sensor network [5].
On the other hand, the architecture of heterogeneous wireless networks brings up many challenges to the physical-layer security [6][7][8]. Several signal-level algorithms, such as beamforming [9,10], artificial noise [11,12] or stochastic geometry approach [13,14], have been proposed as the classical

• We employ a deep learning-based technique for secure MIMO communications in heterogeneous networks, which exploits the CNN learning model to produce more accurate CSI and thereby reduce the bit error rate (BER) of the receiver.
• We provide the detailed framework of the deep learning-based detectors, where the imperfect CSI together with either the original messages or the ideal CSI is included in the training set, so that the detectors can be used in different application scenarios.
• We present simulation results for the deep learning-based detectors in heterogeneous networks. With the help of the CNN technique, the proposed detectors show an obvious performance gain over the maximum likelihood detector (MLD) with acceptable computational cost.
Notations: We use $\mathcal{CN}(\mu, \sigma^2)$ to represent the circularly symmetric complex Gaussian random variable with mean $\mu$ and variance $\sigma^2$; $f_X(x)$ and $F_X(x)$ denote the probability density function (PDF) and cumulative distribution function (CDF) of a random variable $X$, respectively; $\mathrm{diag}(\mathbf{A})$ is a row vector consisting of all diagonal elements of $\mathbf{A}$; $\mathbf{A}^{*}$ is the conjugate transpose of $\mathbf{A}$; and $\mathbf{H}_{FM}$ denotes the wireless channel fading matrix from M to F.

Related Work
Considering the Gaussian wiretap channel, Fritschek et al. [32] introduced an auto-encoder to model the noisy wireless channel with a novel security loss function. The generative model of the auto-encoder was trained to encode a message such that the eavesdropper cannot decode it correctly. Results show that the proposed scheme learns a trade-off between the legitimate communication rate and the secrecy capacity. With the help of the channel's statistical characteristics in relay networks, the authors in [33] proposed a new deep learning-based algorithm to design the secure beamforming vector. Considering visible light communication, Xiao et al. [34] proposed a deep reinforcement learning (DRL)-based secure communication strategy. Since the optimization of the system secrecy rate is non-convex and NP-hard, a suboptimal solution for the beamforming vector can be obtained by introducing a zero-forcing beamforming gain to the eavesdropper.
With perfect CSI, a deep feedforward neural network (DFNN) with three layers was adopted for a time-slot wireless powered system in [35]. In the proposed scheme, the tuple of system parameters, such as the time allocation factor, the power allocation factor and the rate of the wiretap channel, was produced by the DFNN. During the training phase, the output of the DFNN was compared with the optimal system parameters obtained from exhaustive search, and the mean squared error (MSE) was adopted as the performance loss function. Numerical results of the DFNN and the optimal parameters were provided to validate the proposed scheme. Using Stackelberg equilibria, the authors in [36] proposed a secure mobile crowd-sensing (MCS) scheme, where the DRL technique was adopted to derive the optimal MCS policy.

System Model
Figure 1 depicts the model of secure MIMO communications for heterogeneous networks. In the considered system, there is a macro BS, a femto BS and a terminal user. It is assumed that all three types of nodes are equipped with multiple antennas, and the numbers of antennas are denoted as $N_M$, $N_F$ and $N_U$, respectively. In the heterogeneous network, the macro BS and the femto BS work cooperatively to provide the wireless coverage. Specifically, the users located in the hot-spot area are associated with the macro BS, and the users located at the network edge are served by the femto BS. We use $\mathbf{H}_{FM}$ and $\mathbf{H}_{UM}$ to denote the wireless channel fading matrices from the macro BS to the femto BS and to the users, respectively.
In addition, the channel fading matrix is modeled as time-varying and flat fading using the classical Jakes model [37]. Thus, the correlation coefficient between adjacent samples is given as
$$\rho = J_0(2\pi f_d),$$
where $f_d$ is the normalized Doppler frequency spread and $J_0(\cdot)$ is the zero-order Bessel function of the first kind.
Thus, the channel fading matrix can be calculated as
$$\mathbf{H}_{UM}(n) = \rho\,\mathbf{H}_{UM}(n-1) + \sqrt{1-\rho^2}\,\mathbf{N}_{UM}(n),$$
where $n$ denotes the sample time and $\mathbf{N}_{UM}(n)$ denotes the additive white noise matrix with the same size as $\mathbf{H}_{UM}(n)$. The same equation can be applied to $\mathbf{H}_{FM}$ as follows:
$$\mathbf{H}_{FM}(n) = \rho\,\mathbf{H}_{FM}(n-1) + \sqrt{1-\rho^2}\,\mathbf{N}_{FM}(n).$$
Please note that in the following sections, the sample time $n$ is omitted without loss of generality. It is well known that deep learning networks can effectively capture the correlation features of the training data set. Spatial correlation or antenna correlation, which may achieve dimensionality reduction, is an important challenge for MIMO systems [38,39]. It will be a direction of our future work.
Since the time-correlated MIMO channel model is adopted in this paper, we can use a deep CNN (DCNN) to obtain more accurate CSI from the outdated CSI. In fact, considering the phase rotation introduced by the channel matrix, the outdated CSI is necessary to assist the data recovery at the user.
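As a minimal sketch of the time-correlated channel described above, the following NumPy snippet generates successive channel samples. It assumes the common first-order autoregressive form $\mathbf{H}(n) = \rho\,\mathbf{H}(n-1) + \sqrt{1-\rho^2}\,\mathbf{N}(n)$ with $\rho = J_0(2\pi f_d)$ and unit-variance entries. The helper names (`bessel_j0`, `complex_gaussian`, `next_channel`) and the truncated power series used for $J_0$ are illustrative choices, not part of the original simulation code.

```python
import math
import numpy as np


def bessel_j0(x, terms=30):
    """Zero-order Bessel function of the first kind via its power series."""
    return sum((-1) ** k * (x / 2) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))


def complex_gaussian(shape, rng):
    """Matrix of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def next_channel(h_prev, rho, rng):
    """One assumed AR(1) step: H(n) = rho * H(n-1) + sqrt(1 - rho^2) * N(n)."""
    noise = complex_gaussian(h_prev.shape, rng)
    return rho * h_prev + np.sqrt(1.0 - rho ** 2) * noise


rng = np.random.default_rng(0)
f_d = 0.1                                  # normalized Doppler frequency
rho = bessel_j0(2 * math.pi * f_d)         # correlation between adjacent samples
trace = [complex_gaussian((4, 4), rng)]    # H_UM(0) for N_U = 4, N_M = 4
for _ in range(5):
    trace.append(next_channel(trace[-1], rho, rng))
```

Under these assumptions, $f_d = 0.1$ gives a correlation coefficient close to 0.90, so adjacent channel samples remain strongly correlated, which is the structure the DCNN is expected to exploit.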

Figure 1. System model with the macro BS, the femto BS and the user.
Due to the open nature of heterogeneous networks, the femto BS may intercept the signal transmitted from the macro BS to the users. To prevent information leakage to the femto BS, the macro BS can zero-force the equivalent channel matrix of the femto BS with the null-space technique. In this case, the macro BS first obtains the CSI $\mathbf{H}_{FM}$ between the macro BS and the femto BS. Then the null-space eigenvectors can be produced by applying eigenvalue decomposition to the autocorrelation matrix, i.e.,
$$[\mathbf{v}, \mathbf{V}] = \mathrm{Eig}\left(\mathbf{H}_{FM}^{*}\mathbf{H}_{FM}\right),$$
where $\mathrm{Eig}(\cdot)$ denotes the eigenvalue decomposition, $\mathbf{v}$ denotes the eigenvalues in ascending order and $\mathbf{V}$ contains the corresponding eigenvectors. Considering the size limitation of both the femto BS and the users, it is reasonable to assume that the macro BS has more antennas than the femto BS and the users. Thus, the number of zero eigenvalues can be given as
$$N_D = N_M - N_F.$$
Please note that $N_D$ is also the number of null-space vectors for $\mathbf{H}_{FM}$, which are collected as
$$\mathbf{B} = \mathbf{V}_{:N_D},$$
where $\mathbf{V}_{:N_D}$ denotes the first $N_D$ column vectors of $\mathbf{V}$. Thus, the beamforming matrix $\mathbf{B}$, which is used by the macro BS to transmit messages to its associated user, lies in the null space of $\mathbf{H}_{FM}$. Then the signal received at the terminal user can be expressed as
$$\mathbf{y} = \sqrt{P}\,\mathbf{H}_{UM}\mathbf{B}\mathbf{x} + \mathbf{N},$$
where $P$ is the transmission power of the macro BS, $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ is the original message transmitted from the macro BS with size $N_D$, and $\mathbf{N} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ is the noise.
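The null-space construction can be illustrated with a short NumPy sketch. It relies on `numpy.linalg.eigh`, which returns the eigenvalues of a Hermitian matrix in ascending order, so the first $N_D = N_M - N_F$ eigenvectors span the null space of $\mathbf{H}_{FM}$. The random channels, the $\sqrt{P}$ scaling of the transmitted signal and the values of `P` and `sigma` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_M, N_F, N_U = 4, 2, 4          # antenna numbers of one simulated setup
N_D = N_M - N_F                  # number of zero eigenvalues / data streams


def complex_gaussian(shape):
    """Matrix of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


H_FM = complex_gaussian((N_F, N_M))    # macro BS -> femto BS
H_UM = complex_gaussian((N_U, N_M))    # macro BS -> user

# Eigen-decomposition of the autocorrelation matrix (eigenvalues ascending).
eigvals, V = np.linalg.eigh(H_FM.conj().T @ H_FM)
B = V[:, :N_D]                         # beamforming matrix in the null space

leakage = np.linalg.norm(H_FM @ B)     # numerically zero at the femto BS

# Received signal at the user (illustrative power and noise level).
P, sigma = 1.0, 0.1
x = complex_gaussian((N_D, 1))
y = np.sqrt(P) * H_UM @ B @ x + sigma * complex_gaussian((N_U, 1))
```

Because the columns of `B` are orthonormal eigenvectors, the beamformer does not change the total transmit power, while `leakage` confirms that the femto BS observes no signal component.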
Since the equivalent channel fading matrix of the femto BS is zero-forced, we only need to observe the signal-to-noise ratio (SNR) of the associated user. Accordingly, the average normalized SNR received at the user is used as the reference quantity in the performance evaluation.
Classically, the receiver obtains the wireless CSI through pilot signals transmitted from the BS, and there exists a time difference between the channel estimation and the data packet transmission. Thus, the estimated CSI is an imperfect version of the CSI at the instant of packet transmission. To reduce the analysis complexity, it is assumed that the estimation of $\mathbf{H}_{FM}$ is perfect, while that of $\mathbf{H}_{UM}$ is imperfect. Specifically, the imperfect CSI is modeled as
$$\hat{\mathbf{H}}_{UM} = \xi\,\mathbf{H}_{UM} + \sqrt{1-\xi^2}\,\mathbf{N}_{UM},$$
where $\xi$ is the correlation factor of the imperfect version of the channel matrix. The standard maximum likelihood detector (MLD) with imperfect CSI can be employed to detect the original message as
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \Omega} \left\| \mathbf{y} - \sqrt{P}\,\hat{\mathbf{H}}_{UM}\mathbf{B}\mathbf{x} \right\|^2,$$
where $\Omega$ is the set of all possible constellation points.
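The following sketch illustrates exhaustive maximum likelihood detection with perfect and imperfect CSI. It assumes that the imperfect CSI follows $\hat{\mathbf{H}} = \xi\,\mathbf{H} + \sqrt{1-\xi^2}\,\mathbf{N}$ with white noise $\mathbf{N}$, that QPSK symbols are used, and that the beamformer is a placeholder orthonormal matrix obtained from a QR factorization. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N_M, N_U, N_D = 4, 4, 2
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
# Every candidate message in Omega as a column: shape (N_D, 4**N_D).
OMEGA = np.array(list(itertools.product(QPSK, repeat=N_D))).T


def complex_gaussian(shape):
    """Matrix of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def imperfect_csi(H, xi):
    """Assumed model: H_hat = xi * H + sqrt(1 - xi^2) * (white noise)."""
    return xi * H + np.sqrt(1.0 - xi ** 2) * complex_gaussian(H.shape)


def mld(y, H, B, P=1.0):
    """Exhaustive search for the candidate closest to y; returns its index."""
    residual = y[:, None] - np.sqrt(P) * (H @ B @ OMEGA)
    return int(np.argmin(np.sum(np.abs(residual) ** 2, axis=0)))


H_UM = complex_gaussian((N_U, N_M))
B, _ = np.linalg.qr(complex_gaussian((N_M, N_D)))   # placeholder beamformer
true_idx = 5
y = H_UM @ B @ OMEGA[:, true_idx]                   # noiseless reception, P = 1
idx_perfect = mld(y, H_UM, B)                       # recovers true_idx
idx_imperfect = mld(y, imperfect_csi(H_UM, xi=0.9), B)
```

With perfect CSI and no noise the search always recovers the transmitted index, whereas the decision based on the imperfect CSI may differ, which is the degradation the DCNN-based detectors aim to reduce. The search cost grows as $|\Omega| = 4^{N_D}$ for QPSK.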

Deep CNN-Based Detector
The imperfect CSI, which is introduced by the noise or delay of channel estimation, greatly deteriorates the performance of the standard MLD. To overcome the effects of the imperfect CSI in secure MIMO communications, two types of deep CNN (DCNN)-based detectors are proposed in this section, which can be used in different application scenarios. The DCNN models are first trained with predefined loss functions and then used to generate the refined CSI $\tilde{\mathbf{H}}_{UM}$, which can be fed to the MLD to obtain the original message, i.e.,
$$\hat{\mathbf{x}}_{\mathrm{cnn}} = \arg\min_{\mathbf{x} \in \Omega} \left\| \mathbf{y} - \sqrt{P}\,\tilde{\mathbf{H}}_{UM}\mathbf{B}\mathbf{x} \right\|^2.$$
The details of the DCNN model are given in Figure 2, where there exist $N$ one-dimensional convolutional layers excluding the input layer. In the input layer, the channel fading matrix $\hat{\mathbf{H}}_{UM}$ is reshaped as a column vector. Since only real data can be processed in the CNN model, the complex data of $\hat{\mathbf{H}}_{UM}$ are treated as two real channels [40]. Please note that each convolutional layer except the output layer is followed by a ReLU activation function. Moreover, the $n$-th convolutional layer has $F_n$ feature maps with filter length $L_n$, $n \in [1, N]$. Specifically, in the output layer, there is only one feature map.
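The input preparation and a single convolutional layer can be sketched in plain NumPy as follows. The snippet reshapes a complex channel matrix into a vector with two real channels and applies one zero-padded 1-D convolution (computed as cross-correlation, as is usual in deep learning frameworks) followed by ReLU. The filter length of 3 and the random weights are hypothetical placeholders, and the function names are not taken from the original implementation.

```python
import numpy as np


def to_two_channels(H):
    """Flatten a complex matrix into a (length, 2) real array: [real, imag]."""
    v = H.reshape(-1)
    return np.stack([v.real, v.imag], axis=-1)


def from_two_channels(x, shape):
    """Inverse of to_two_channels."""
    return (x[:, 0] + 1j * x[:, 1]).reshape(shape)


def conv1d_relu(x, kernels, bias):
    """Zero-padded ('same') 1-D convolution followed by ReLU.

    x: (length, c_in), kernels: (k, c_in, c_out), bias: (c_out,)
    """
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    out = np.stack([
        np.tensordot(xp[i:i + k], kernels, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0])
    ]) + bias
    return np.maximum(out, 0.0)


rng = np.random.default_rng(3)
H_hat = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
x0 = to_two_channels(H_hat)                    # input layer: 16 samples, 2 channels
W = 0.1 * rng.standard_normal((3, 2, 32))      # hypothetical: filter length 3, 32 maps
features = conv1d_relu(x0, W, np.zeros(32))    # first hidden layer output
```

Stacking several such layers and omitting the ReLU in the last one gives the overall structure described above; in practice a framework layer such as a 1-D convolution in TensorFlow would replace the explicit loop.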
As to the detailed architecture of the DCNN, we must find the trade-off between complexity and performance. A fully connected DNN may achieve better performance, but its computational complexity is proportional to the square of the number of nodes. Moreover, both the training data set and the training time required by a fully connected DNN are too large to be satisfied. There also exist some powerful CNN models, such as VGG [41] and ResNet [42], which improve the detection probability by increasing the depth of the models to 19 and 34 layers, respectively. Specifically, the number of parameters for VGG-19 is up to 144M. Thus, to decrease the computational complexity, we must simplify the classical CNN models as follows. The architecture of the DCNN model includes $N = 4$ layers, and is described by $F_n$ and $L_n$ as $F_n = \{32, 16, \ldots\}$.

DCNN Type-I: Training with Accurate CSI
Figure 3 shows the first training method of the DCNN-based detector with accurate CSI, which is denoted as DCNN type-I. In the training phase, a data set including both the imperfect CSI $\hat{\mathbf{H}}_{UM}$ and the accurate CSI $\mathbf{H}_{UM}$ is fed into the learning model. The loss function is defined as the mean square error between the output of the model $\tilde{\mathbf{H}}_{UM}$ and the accurate CSI $\mathbf{H}_{UM}$, i.e.,
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N_U N_M}\left\| \tilde{\mathbf{H}}_{UM} - \mathbf{H}_{UM} \right\|_F^2.$$
During the DCNN model training, the loss function is calculated batch by batch and used to optimize the weights and biases of the DCNN model [43]. Please note that the accurate CSI is necessary for DCNN type-I, because it is used to calculate the model loss function. Thus, the application of DCNN type-I is limited, because in some practical scenarios it is difficult to obtain the accurate CSI, especially in wireless MIMO communications. Therefore, another type of DCNN is proposed to overcome this limitation.
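To give a feeling for the scale of the type-I loss, the sketch below evaluates the batch MSE of the unrefined imperfect CSI, i.e., the loss obtained if the network simply returned its input. It assumes unit-variance channel entries and the imperfect CSI model $\hat{\mathbf{H}} = \xi\,\mathbf{H} + \sqrt{1-\xi^2}\,\mathbf{N}$, for which the expected MSE is $(1-\xi)^2 + (1-\xi^2) = 2(1-\xi)$. The function `mse_loss` and the batch size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)


def complex_gaussian(shape):
    """Array of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def mse_loss(H_out, H_true):
    """Mean squared error over every complex entry of a batch of matrices."""
    return float(np.mean(np.abs(H_out - H_true) ** 2))


xi = 0.9
H_true = complex_gaussian((10000, 4, 4))      # batch of accurate CSI
H_hat = xi * H_true + np.sqrt(1 - xi ** 2) * complex_gaussian(H_true.shape)

baseline = mse_loss(H_hat, H_true)   # loss of the identity mapping
expected = 2 * (1 - xi)              # analytical value under the stated assumptions
```

A trained DCNN type-I is useful only if its loss on held-out data falls below this baseline, which under the stated assumptions is about 0.2 for $\xi = 0.9$.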

DCNN Type-II: Training with Original Message
The training architecture of DCNN type-II is given in Figure 4, where accurate CSI is not needed. Instead, the output of the DCNN, $\tilde{\mathbf{H}}_{UM}$, is used in the MLD to obtain the likelihood of each candidate message as follows:
$$\hat{q}_i = -\left\| \mathbf{y} - \sqrt{P}\,\tilde{\mathbf{H}}_{UM}\mathbf{B}\mathbf{x}_i \right\|^2, \quad \mathbf{x}_i \in \Omega,\ i \in [1, |\Omega|].$$
Using the softmax function, the normalized likelihood probability of each candidate message can be given as
$$q_i = \frac{\exp(\hat{q}_i)}{\sum_{j=1}^{|\Omega|} \exp(\hat{q}_j)}.$$
We use $p_i$, $i \in [1, |\Omega|]$, to denote the correct probability of each candidate message. That is, $p_i = 1$ if the $i$-th candidate message is correct; otherwise $p_i = 0$. In information theory, the cross-entropy is used to quantify the difference between two probability vectors. Accordingly, for the probability distributions $q_i$ and $p_i$, we can calculate the cross-entropy as follows:
$$C(p, q) = -\sum_{i=1}^{|\Omega|} p_i \log q_i.$$
Then, the cross-entropy $C(p, q)$ can be used as the loss function to train the DCNN model. Compared with the MSE loss function of DCNN type-I, only the received signal $\mathbf{y}$, the beamforming matrix $\mathbf{B}$ and the original message $\mathbf{x}$ are needed, which broadens the application scenarios of DCNN type-II. On the other hand, without the help of the accurate CSI, DCNN type-II leads to deteriorated performance compared with type-I, which can be validated in the simulation results.
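The type-II loss can be sketched as a softmax over candidate scores followed by the cross-entropy against a one-hot label. In the snippet below, the score of each candidate is assumed to be the negative squared distance between the received signal and the candidate's noiseless image. The toy effective channel `G` (standing in for the refined channel multiplied by the beamformer), the single-stream QPSK alphabet and all function names are hypothetical.

```python
import numpy as np


def candidate_scores(y, G, omega, P=1.0):
    """Negative squared distance of each candidate (higher = more likely)."""
    residual = y[:, None] - np.sqrt(P) * G @ omega
    return -np.sum(np.abs(residual) ** 2, axis=0)


def softmax(scores):
    """Normalize scores into probabilities (shifted for numerical stability)."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()


def cross_entropy(p, q, eps=1e-12):
    """C(p, q) = -sum_i p_i * log(q_i)."""
    return float(-np.sum(p * np.log(q + eps)))


qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
omega = qpsk[None, :]                            # N_D = 1, so |Omega| = 4
G = np.array([[1.0 + 0.5j], [0.3 - 1.0j]])       # hypothetical refined H_UM @ B
correct = 2
y = G @ omega[:, correct]                        # noiseless reception for clarity

q = softmax(candidate_scores(y, G, omega))       # normalized likelihoods q_i
p = np.eye(4)[correct]                           # one-hot label p_i
loss = cross_entropy(p, q)
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the transmitted message, so training pushes the refined channel toward values for which the correct candidate is the nearest one.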
Please note that the DCNN model could also output the ground-truth symbol $\mathbf{x}$ directly in a supervised manner, and the outdated MIMO channel matrix could be further employed as side information by inputting it to the DCNN. We use DCNN type-III to denote this new DCNN model. Although the detailed architectures of DCNN type-II and the suggested DCNN type-III are different, they are functionally equivalent as a black box with DCNN kernels. In other words, the MLD module in DCNN type-II can be seen as part of the functions of DCNN type-III.

Simulation Results
In this section, simulation results are provided to verify the proposed DCNN models. The impacts of system parameters, such as the correlation factor of imperfect CSI $\xi$, the normalized Doppler frequency $f_d$, and the number of antennas, are evaluated in different setup scenarios. Since the equivalent channel fading matrix of the femto BS is zero-forced, the BER of the users with different detectors is used to evaluate the system performance.
Specifically, QPSK modulation is employed in all simulation setups. Since the QPSK constellation is a 2-D complex signal, the setup can be generalized to an arbitrary modulation order. We set the data packet length to 600 bits, and each batch consists of 10 data packets. During the training phase, a training data set with 10,000 batches as well as a validation data set with 1000 batches are fed to the DCNN, and a test data set with 1000 batches is used to evaluate the BER performance of the proposed schemes. The popular TensorFlow framework [44] is adopted in our simulations, and the adaptive moment estimation (Adam) optimizer is used to minimize the loss value during the training phase. In particular, the optimization parameters are as follows: the learning rate is 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. Since the testbed of our paper is under construction at this moment, we will present experimental results on real data sets in future work.
Figure 5 depicts the BER performance versus the SNR of the system with the MIMO setup $N_M = 4$, $N_U = 4$, $N_F = 2$, the normalized Doppler frequency $f_d = 0.1$, and the correlation factor of imperfect CSI $\xi = 0.90$. The BERs of three detectors, namely the standard MLD, DCNN type-I and DCNN type-II, are compared. As a benchmark, the BER curve obtained by the MLD with perfect CSI is also presented. We can see from this figure that the outdated CSI has obvious adverse effects on the BER performance. In the high SNR region, the DCNN-based detectors show a performance gain of about 4 dB in comparison with the standard MLD. The reason is that the former can refine the imperfect channel matrix and produce more accurate CSI, which in turn improves the BER of the system.
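For reference, the Adam update rule with the hyperparameters listed in the simulation setup (learning rate 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) can be written out explicitly. The sketch below is a plain NumPy version applied to a toy quadratic objective. The simulations themselves rely on the optimizer provided by TensorFlow, so `adam_step` and the toy problem are only illustrative assumptions.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


# Toy problem: minimize ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([0.5, -0.25])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - target), m, v, t)
```

With this learning rate each weight moves by at most roughly 0.001 per step at the start of training, which explains why a comparatively large number of batches is used.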
Moreover, the two types of DCNN-based detectors show almost the same performance, with only a slight gap. Similar results can be obtained from Figures 6 and 7, where the correlation factors of imperfect CSI are $\xi = 0.8$ and $\xi = 0.7$, respectively. However, accurate CSI is necessary for DCNN type-I, because it is used to calculate the model loss function. Thus, the application of DCNN type-I is limited, because in some practical scenarios it is difficult to obtain the accurate CSI, especially in wireless MIMO communications. In contrast, DCNN type-II needs only the received signal $\mathbf{y}$, the beamforming matrix $\mathbf{B}$ and the original message $\mathbf{x}$ to compute its loss function, which broadens its application scenarios. On the other hand, without the help of the accurate CSI, DCNN type-II leads to deteriorated performance compared with type-I.
The effect of the normalized Doppler frequency $f_d$ is presented in Figure 9, where the MIMO configuration remains the same as in the previous setup. The BER curves of both DCNN training models are provided with $f_d = 0.1$ and $f_d = 0.05$, respectively. We can see from this figure that a smaller $f_d$ produces better performance, with a gain of about 4 dB. The reason is that if $f_d$ is smaller, the channel fading matrix changes more slowly, and the wireless channel can be learned more efficiently by the DCNN. As a result, more accurate CSI can be produced, which enhances the BER performance.
The MIMO setup in Figure 10 is $N_M = 4$, $N_U = 4$, $N_F = 1$, and the number of data streams is $N_D = 3$. In other words, the spectrum efficiency is higher than in the previous setup. In Figure 11, the antenna configuration is $N_U = 2$ and $N_U = 3$, respectively. We can see from the two figures that the performance gain of the DCNN with a larger number of user antennas is obvious compared with that with a smaller number, especially in the higher SNR region. Specifically, with $N_F = 1$, the performance gain of the DCNN over the standard MLD is about 6 dB. The performance gain of $N_U = 3$ over $N_U = 2$ is about 8 dB. The reason is that a larger number of receiver antennas introduces more degrees of freedom for space diversity, and thus the BER is decreased greatly.

Conclusions
In this paper, we investigate two types of deep learning-based secure MIMO detectors for heterogeneous networks. In the considered system, the equivalent channel fading matrix of the femto BS is zero-forced through null-space eigenvectors. The BER of the associated user is adopted as the metric to evaluate the system performance. With the help of deep convolutional neural networks, the macro BS produces more accurate CSI. The impacts of system parameters, such as the correlation factor of imperfect CSI, the normalized Doppler frequency, and the number of antennas, are investigated in different setup scenarios. Considerable performance gains can be obtained from the deep learning-based detectors compared with the classical maximum likelihood algorithm.

Conflicts of Interest:
The authors declare no conflict of interest.