NLOS Identification in WLANs Using Deep LSTM with CNN Features

Identifying the channel state as line-of-sight (LOS) or non-line-of-sight (NLOS) helps to optimize location-based services in wireless communications. The received signal strength indicator (RSSI) and channel state information (CSI) are used to estimate channel conditions for orthogonal frequency division multiplexing systems in indoor wireless local area networks (WLANs). This paper proposes a joint convolutional neural network (CNN) and recurrent neural network (RNN) architecture to classify channel conditions. The CNN extracts features from the frequency-domain characteristics of the CSI data, and the RNN extracts features from the time-varying characteristics of the RSSI and CSI between packet transmissions. The performance of the proposed method is verified under indoor propagation environments. Experimental results show that the proposed method improves classification performance by 2% over the conventional recurrent neural network model.


Preliminaries
In this section, we consider a system model and experimental data for commodity WLANs, where a receiver obtains the RSSI at each transmission and estimates the frequency-domain CSI of the subcarriers.

System Model
Let $\mathbf{h} = [h(0), h(1), \ldots, h(L-1)]^T$ be the time-domain channel impulse response (CIR), where $L$ is the number of multipath taps. The channel frequency response for the $k$th subcarrier can be modeled as [16]

$$H(k) = \sum_{l=0}^{L-1} h(l)\, e^{-j 2\pi k l / N}, \quad k \in \mathcal{K}, \tag{1}$$

where $N$ is the fast Fourier transform (FFT) size and $\mathcal{K}$ is the set of used subcarrier indices. The $N - K - 1$ subcarriers at the edges of the spectrum are not used, and the used subcarriers are indexed by $\mathcal{K} = \{-K/2, \ldots, -1, 1, \ldots, K/2\}$, where $K$ is the number of used subcarriers. At the receiver, the channel state information (CSI) for the $k$th subcarrier is estimated as

$$\hat{H}(k) = H(k) + n(k), \tag{2}$$

where $n(k)$ is complex Gaussian noise for the $k$th subcarrier with zero mean and variance $N_0$ [16]. In IEEE 802.11 WLANs, the RSSI is provided to the upper layers. At each transmission, the RSSI is used as an indication of the received power level. RSSIs for the LOS condition are concentrated at a high value, while RSSIs for the NLOS condition are distributed over a wide range [16].
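The system model above can be illustrated with a short numerical sketch. It assumes the usual DFT relation between the CIR taps and the per-subcarrier response, a 64-point FFT for the 20-MHz channel, and additive complex Gaussian noise whose variance is split evenly between the real and imaginary parts. The tap values and the helper names (`used_subcarriers`, `channel_frequency_response`, `estimate_csi`) are hypothetical and chosen only for illustration.

```python
import cmath
import random


def used_subcarriers(num_used):
    """Indices -K/2..-1 and 1..K/2; the DC subcarrier k = 0 is skipped."""
    half = num_used // 2
    return [k for k in range(-half, half + 1) if k != 0]


def channel_frequency_response(h, n_fft, indices):
    """H(k) = sum_l h(l) * exp(-j*2*pi*k*l/N) for each requested subcarrier k."""
    return {
        k: sum(tap * cmath.exp(-2j * cmath.pi * k * l / n_fft)
               for l, tap in enumerate(h))
        for k in indices
    }


def estimate_csi(H, noise_var, rng):
    """Noisy estimate H_hat(k) = H(k) + n(k), with Var[n(k)] = noise_var."""
    sigma = (noise_var / 2) ** 0.5  # standard deviation per real/imaginary part
    return {k: value + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for k, value in H.items()}


N_FFT = 64    # assumed FFT size for a 20-MHz channel
K_USED = 56   # number of used subcarriers, as stated in the text
h = [1.0 + 0j, 0.5j, 0.25 + 0j]  # hypothetical 3-tap CIR

indices = used_subcarriers(K_USED)
H = channel_frequency_response(h, N_FFT, indices)
H_hat = estimate_csi(H, noise_var=0.01, rng=random.Random(0))
unused_edge_subcarriers = N_FFT - K_USED - 1
```

With these assumed sizes, `unused_edge_subcarriers` evaluates to 7, which together with the skipped DC subcarrier accounts for the 8 subcarriers that carry no CSI.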

Experimental Data
For performance comparison with the previous result, we used data collected at Seoul National University [16]. Figure 1 shows the layout of the measurement site, which can be considered a typical indoor office environment. For the measurement campaigns, two laptops equipped with Qualcomm Atheros network interface cards (NICs) were used to capture both RSSI and CSI. The heights of the transmitter and the receiver were fixed at 1.2 m. A person holding the receiver walked around the highlighted area shown in Figure 1 to collect data while recording the labels of the collected data: LOS if there was no obstacle between the transmitter and the receiver, or NLOS if the direct path was blocked by the person holding the receiver or by other obstacles, e.g., walls and doors.
The measurement took 4300 s to complete. During the measurement campaigns, the transmitter sent sounding packets every 10 ms and the receiver measured the RSSI and CSI for each packet transmission. For signal transmission, the IEEE 802.11n protocol with a 20-MHz bandwidth was used; therefore, a total of K = 56 CSI values (i.e., a full CSI report) and one RSSI were measured for each point-to-point link. Moreover, the transmitter and the receiver were equipped with two and three antennas, respectively, so six sets of CSI and three sets of RSSIs were measured during each packet transmission. With this setup, a total of 101,197 packet transmissions were measured under the LOS condition and 331,365 packet transmissions were measured under the NLOS condition.
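The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
SOUNDING_INTERVAL_MS = 10
DURATION_S = 4300
LOS_PACKETS = 101_197
NLOS_PACKETS = 331_365
TX_ANTENNAS, RX_ANTENNAS, SUBCARRIERS = 2, 3, 56

# Nominal packet count implied by the duration and the sounding interval.
nominal_packets = DURATION_S * 1000 // SOUNDING_INTERVAL_MS

# Reported packet count and the share of LOS-labelled packets.
reported_packets = LOS_PACKETS + NLOS_PACKETS
los_fraction = LOS_PACKETS / reported_packets

# Per-packet measurement volume: one CSI set per TX/RX antenna pair.
links = TX_ANTENNAS * RX_ANTENNAS
csi_values_per_packet = links * SUBCARRIERS
rssi_values_per_packet = RX_ANTENNAS
```

The nominal count (430,000) is close to, though not identical to, the reported total (432,562). Roughly 23% of the packets are labelled LOS, so the two classes are imbalanced, which is worth keeping in mind when reading accuracy figures alongside the per-class detection rates.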

Proposed CNNLSTM Model
To classify LOS and NLOS conditions in WLANs, we propose a novel CNNLSTM model. Figure 2 shows the overall framework of the proposed model, which comprises CNN and RNN segments. As shown in Figure 2, the CSIs form the input signals for the CNN; the output of the CNN is concatenated with the RSSIs and fed into the LSTMs for LOS and NLOS classification.

CNN Part
Figure 3 shows the proposed CNN model, which comprises one input vector, L convolutional layers, and one output vector of size $N_C \times N_F$ produced by a Flatten layer. The input vector for the $p$th packet transmission can be expressed as

$$\mathbf{x}_p = \left[\Re(\hat{H}_p[-K/2]),\ \Im(\hat{H}_p[-K/2]),\ \ldots,\ \Re(\hat{H}_p[K/2]),\ \Im(\hat{H}_p[K/2])\right]^T, \tag{3}$$

where $\hat{H}_p[k]$ is the CSI for the $p$th packet transmission, and $\Re(\cdot)$ and $\Im(\cdot)$ are the real and imaginary parts of a complex value, respectively. The $l$th convolutional layer convolves the input regions locally using $N_F$ filter kernels, where each filter uses the same kernel to extract the local features of the input region. The output of a convolution operation at the $l$th layer for one filter is determined by

$$x^{(l)}_i = a\!\left(\sum_{r=1}^{N_K^{(l)}} w^{(l)}_r\, x^{(l-1)}_{(i-1) N_S^{(l)} + r} + b^{(l)}\right), \tag{4}$$

where $x^{(l-1)}$ is the input to the $l$th layer (with $x^{(0)} = \mathbf{x}_p$), $N_K^{(l)}$ is the kernel size of the filters, and $w^{(l)}_r$ and $b^{(l)}$ are the weight located at position $r$ on the kernel and the bias, respectively, in the $l$th convolutional layer. In addition, $a(\cdot)$ represents a non-linear activation function, typically the sigmoid, softsign, hyperbolic tangent (tanh), or rectified linear unit (ReLU) [31]. Without zero-padding, the output size is calculated as

$$N_C^{(l)} = \left\lfloor \frac{N_C^{(l-1)} - N_K^{(l)}}{N_S^{(l)}} \right\rfloor + 1, \tag{5}$$

where $N_S^{(l)}$ is the stride, which corresponds to how far a filter is shifted at a time. We apply batch normalization (BN) after the non-linear activation function of each CNN layer. BN plays a role in regularization; its benefits are discussed in [32]. As shown in Figure 3, the CNN segment has L layers, and we finally flatten the data into a vector of size $N_C \times N_F$ using a Flatten layer.

Note that CSI data differ from actual images, so when applying the CNN we need to change some structures of the normal CNN model. First, we set the stride $N_S^{(1)}$ of the first convolutional layer to an even number (2, 4, etc.) to preserve the structure of the complex input data. Second, because the size of the CSI packet is small, we do not apply any pooling layers: pooling would significantly reduce the size of the data, leading to the loss of information that is important for training the RNN segment.
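A minimal sketch of the convolution step and the output-size rule is given below. It assumes that the complex CSI values are laid out as interleaved real and imaginary parts, which is one plausible reading of why an even stride is needed in the first layer; the source does not spell out the layout. The function names and the toy CSI values are hypothetical.

```python
def build_input(csi):
    """Interleave real and imaginary parts: [Re H[0], Im H[0], Re H[1], ...]."""
    x = []
    for value in csi:
        x.extend([value.real, value.imag])
    return x


def conv_output_size(n_in, kernel, stride):
    """Output length of a 1-D convolution without zero-padding."""
    return (n_in - kernel) // stride + 1


def relu(v):
    return max(v, 0.0)


def conv1d(x, weights, bias, stride, activation=relu):
    """Single-filter 1-D convolution (no padding) followed by an activation."""
    n_out = conv_output_size(len(x), len(weights), stride)
    return [
        activation(sum(w * x[i * stride + r] for r, w in enumerate(weights)) + bias)
        for i in range(n_out)
    ]


# 56 hypothetical CSI values, one per used subcarrier.
csi = [complex(k, -k) for k in range(1, 57)]
x = build_input(csi)  # length 2 * 56 = 112

# Kernel of size 2 with stride 2: every window covers exactly one
# (real, imaginary) pair, so pairs are never split across windows.
y = conv1d(x, weights=[1.0, 0.0], bias=0.0, stride=2)
```

With a stride of 2 and a kernel of size 2, each output element depends on a single subcarrier; an odd stride would make some windows straddle the imaginary part of one subcarrier and the real part of the next.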

RNN Part
The RNN model is composed of LSTM modules and an output layer for classification. The input vector for the LSTM module is defined as $u_p = [r_p, z_p]$, where $r_p$ is the RSSI value and $z_p$ is the CNN output for the $p$th packet transmission. The structure of the LSTM is shown in Figure 4. At the current time step $p$, the equations below describe the internal structure of the LSTM module:

$$\begin{aligned}
i_p &= \sigma(W_{ui} u_p + W_{ci} h_{p-1} + b_i), \\
f_p &= \sigma(W_{uf} u_p + W_{cf} h_{p-1} + b_f), \\
g_p &= \tanh(W_{ug} u_p + W_{cg} h_{p-1} + b_g), \\
o_p &= \sigma(W_{uo} u_p + W_{co} h_{p-1} + b_o),
\end{aligned} \tag{6}$$

$$c_p = f_p \odot c_{p-1} + i_p \odot g_p, \tag{7}$$

$$h_p = o_p \odot \tanh(c_p), \tag{8}$$

where $u_p$ is the input to the LSTM block; $i_p$, $f_p$, $o_p$, $c_p$, and $h_p$ are the input gate, the forget gate, the output gate, the cell state, and the output of the LSTM block, respectively; and $g_p$ is the external output gate. $W_{ui}$, $W_{uf}$, $W_{ug}$, and $W_{uo}$ are the weights between the input and the input gate, the forget gate, the external output gate, and the output gate, respectively. $W_{ci}$, $W_{cf}$, $W_{cg}$, and $W_{co}$ are the weights between the previous cell output and the input gate, the forget gate, the external output gate, and the output gate, respectively. Finally, $b_i$, $b_f$, $b_g$, and $b_o$ are the additive biases of the input gate, the forget gate, the external output gate, and the output gate, respectively. The sigmoid function $\sigma(\cdot)$ and the hyperbolic tangent function $\tanh(\cdot)$ are used as activation functions. In (7) and (8), the cell state $c_p$ and the output of the LSTM block $h_p$ are calculated using the outputs from the gates in (6), where $\odot$ denotes element-wise multiplication.

Finally, for the NLOS and LOS decision, we pass the feature vector $h_P$ extracted at the last LSTM cell through a single perceptron layer, where $P$ is the number of packet transmissions. The output $h_\theta$ of the model is calculated as follows:

$$h_\theta = \sigma(V h_P + b), \tag{9}$$

where $V$ is the weight matrix that transfers the values in the fully connected (FC) layer to the output layer and $b$ is a bias term. In (9), the sigmoid function $\sigma(\cdot)$ transforms the logit of the single neuron in the final stage into the probability used to classify LOS or NLOS. We set y = 1 for LOS conditions and y = 0 for NLOS conditions.
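The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the common LSTM formulation in which every gate sees the current input and the previous block output; whether the model in the text uses exactly this variant is an assumption. The parameter layout (`params` as a dictionary of per-gate tuples) and the helper names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def affine(W_u, u, W_h, h, b):
    """Compute W_u @ u + W_h @ h + b, one output row at a time."""
    return [
        sum(w * x for w, x in zip(row_u, u))
        + sum(w * x for w, x in zip(row_h, h))
        + bias
        for row_u, row_h, bias in zip(W_u, W_h, b)
    ]


def lstm_step(u, h_prev, c_prev, params):
    """One time step; params maps gate name ('i', 'f', 'g', 'o') to (W_u, W_h, b)."""
    pre = {gate: affine(W_u, u, W_h, h_prev, b)
           for gate, (W_u, W_h, b) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def classify(sequence, params, V, b_out):
    """Run the LSTM over P inputs and map the last output to a LOS probability."""
    h = [0.0] * len(V)
    c = [0.0] * len(V)
    for u in sequence:
        h, c = lstm_step(u, h, c, params)
    return sigmoid(sum(v * hv for v, hv in zip(V, h)) + b_out)


def zero_params(input_size, hidden_size):
    """All-zero parameters, used here only to make the example easy to follow."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    return {
        gate: (zeros(hidden_size, input_size),
               zeros(hidden_size, hidden_size),
               [0.0] * hidden_size)
        for gate in "ifgo"
    }


params = zero_params(input_size=2, hidden_size=1)
params["g"][0][0][0] = 1.0  # let g_p read the first input element
h1, c1 = lstm_step([1.0, 0.0], [0.0], [0.0], params)
probability = classify([[1.0, 0.0]], params, V=[0.0], b_out=0.0)
```

With all other weights at zero, the input, forget, and output gates all equal 0.5, so the first step gives `c1 = 0.5 * tanh(1)` and `h1 = 0.5 * tanh(c1)`, which makes the gating arithmetic easy to verify by hand.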
During the training stage, at each epoch, we select multiple batches from the set of input and output pairs ([X, r], y) to train and verify the proposed CNNLSTM model. Every parameter in the model is adjusted to minimize the following loss function:

$$J(\theta) = \frac{1}{N} \sum_{g=1}^{N} C(g), \tag{10}$$

where $N$ is the batch size and $C(g)$ is the cost of the $g$th input and output pair, which measures how accurately the model predicts the label that corresponds to the input. Among the many possible choices of loss function for optimizing our model, we adopt the binary cross-entropy function, expressed by

$$C(g) = -\left[\, y^{(g)} \log h_\theta^{(g)} + \left(1 - y^{(g)}\right) \log\!\left(1 - h_\theta^{(g)}\right) \right], \tag{11}$$

where the superscript indicates the index of the input and output pair.
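The batch loss described above, the mean of per-sample binary cross-entropy costs, can be written as the following sketch. The clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero; it is not part of the source. The labels and probabilities are made-up values.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch of (label, probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# y = 1 for LOS, y = 0 for NLOS, as in the text.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
loss = binary_cross_entropy(labels, probs)
```

Confident correct predictions (0.9 for a LOS sample) contribute little to the loss, while uncertain ones (0.6 for LOS, 0.4 for NLOS) contribute more.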
To minimize the loss function, many variants of the gradient-descent method, such as the AdaGrad, AdaDelta, and Adam optimizers, have been studied. These optimizers adaptively change the learning rate to properly minimize the loss function. In this study, we applied the Adam optimization algorithm to train our proposed CNNLSTM model because the Adam optimizer is straightforward and saves memory and computational resources.
The process starts with random initialization of all the model parameters. During the training phase, the weight update takes place after a whole sequence has been propagated forward through the network. The error signals are calculated with respect to the mean cross-entropy cost function. This loss function was chosen as the natural cost function for the sigmoid output layer, with the aim of maximizing the likelihood of classifying the input data correctly.
Once every parameter in the proposed CNNLSTM model is adjusted appropriately, the model can identify the channel condition based on the following simple hypothesis test:

$$h_\theta \underset{H_0}{\overset{H_1}{\gtrless}} \alpha, \tag{12}$$

where $H_0$ (NLOS) and $H_1$ (LOS) are the null and alternative hypotheses, respectively, and $\alpha$ denotes the decision threshold. We define the LOS detection rate as the true positive rate (TPR), i.e., the portion of correct decisions among all measurements under the LOS condition. Similarly, the NLOS detection rate is the true negative rate (TNR), i.e., the portion of correct decisions among all measurements under the NLOS condition. These statistical values depend on the decision threshold.
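The threshold test and the two detection rates can be sketched as follows. The sketch assumes that LOS is decided when the model output is at least the threshold (how ties are handled is not stated in the source) and that the threshold is chosen from a small grid of candidates; the scores and labels are made-up values.

```python
def detection_rates(labels, scores, alpha):
    """Return (TPR, TNR) when LOS (label 1) is decided for score >= alpha."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= alpha)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < alpha)
    n_los = sum(labels)
    n_nlos = len(labels) - n_los
    return tp / n_los, tn / n_nlos


def best_threshold(labels, scores, candidates):
    """Pick the threshold that maximizes the average of TPR and TNR."""
    return max(candidates,
               key=lambda a: sum(detection_rates(labels, scores, a)) / 2)


labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]
candidates = [i / 10 for i in range(1, 10)]

alpha = best_threshold(labels, scores, candidates)
tpr, tnr = detection_rates(labels, scores, alpha)
```

Raising the threshold trades LOS detections for NLOS detections, so averaging the two rates is one way to pick an operating point that is not dominated by the larger class.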

Performance Evaluation
In this section, we discuss the results of the proposed scheme using CSI and RSSI data from a total of 100,000 packet transmissions. We assess several numbers of packet transmissions per input sequence: P = 10, 20, 50, and 100. We split our dataset into three parts: training, validation, and test sets. We used 70% of the sample points to build our classification model during the training phase, and 15% of the data to compare the performance of the models in the validation phase, from which we selected the best model. Finally, we applied the chosen model to the test set, the remaining 15% of the original data set, to evaluate how our model performs on unseen data. Note that the test set was not used during training or model selection.
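The 70/15/15 split can be sketched as below. The source does not describe how packets are grouped into sequences of P transmissions or whether the samples are shuffled, so the non-overlapping windows, the rule that a window keeps a single label, and the seeded shuffle are all assumptions made for illustration, as are the function names and the synthetic labels.

```python
import random


def make_windows(features, labels, P):
    """Group consecutive packets into non-overlapping sequences of length P.

    Assumption: a window is kept only if all P packets share one label.
    """
    windows = []
    for start in range(0, len(features) - P + 1, P):
        window_labels = labels[start:start + P]
        if len(set(window_labels)) == 1:
            windows.append((features[start:start + P], window_labels[0]))
    return windows


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle and split into training, validation, and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Synthetic stand-in for 100,000 packets: packet indices as "features".
features = list(range(100_000))
labels = [1] * 30_000 + [0] * 70_000

windows = make_windows(features, labels, P=50)
train_set, val_set, test_set = split_dataset(windows)
```

With P = 50, the 100,000 packets yield 2000 sequences, which split into 1400 training, 300 validation, and 300 test sequences under these assumptions.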
The model was trained by truncated backpropagation through time [33] with Adam optimization [34] and an initial learning rate of 0.001. We used minibatches [35] with a size of 128 for high efficiency. After each batch, the gradients were averaged and the weights were updated. We employed the early stopping method to stop the training when the validation accuracy stagnates and does not increase for 10 epochs. For regularization, we applied dropout with a probability of 0.5 after the CNN layers and the LSTM layer to avoid over-fitting [36].
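The early stopping rule with a patience of 10 epochs can be sketched as follows. The validation curve below is synthetic and only mimics a run whose validation accuracy peaks and then stagnates; it is not the measured curve, and the function name is hypothetical.

```python
def early_stopping_epoch(val_accuracies, patience=10):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` epochs pass without a new best
    validation accuracy; the best epoch's weights would be kept.
    """
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_accuracies), best_epoch


# Synthetic curve: improves for 21 epochs, then stays slightly lower.
history = [0.80 + 0.005 * e for e in range(1, 22)] + [0.90] * 20
stop_epoch, best_epoch = early_stopping_epoch(history, patience=10)
```

For this synthetic curve the best epoch is 21 and training halts 10 epochs later, so the remaining epochs of the schedule are never run.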
In the CNN segment, we tested various hyperparameter settings for the CNNs, using multiple (L) convolutional layers to extract the implicit features of the data. We applied CNN models with different numbers of filters $N_F^{(l)}$, such as 16, 32, 64, and 128; with different kernel sizes, such as 2, 3, and 4; and with several popular activation functions, such as ReLU, tanh, and sigmoid. After testing all of these settings, we obtained the best model, shown in Table 1. Here, we set the number of CNN layers to L = 3; the kernel sizes and the remaining settings are listed in Table 1.

Figure 5 shows the convergence of the model over epochs for the training and validation sets. It can be seen that the training accuracy improves after each epoch, whereas the validation accuracy decreases and fluctuates after reaching its peak of 96.32%. To avoid wasting training time, we used the early stopping method, which automatically stops training if the validation accuracy does not improve for several epochs (in our model, this value was set to 10). The highest validation accuracy occurred at epoch 21 and is marked by an X symbol in the figure.
In this paper, we also implemented CNN-only models and compared them with the conventional LSTM method. The CNN model was optimized in the same way as the proposed CNN segment, except that, instead of the LSTM layer, we added the Flatten layer after the final CNN layer. The LSTM model learns only time-dependent sequence information, whereas the CNN model focuses on extracting implicit features that contain spatial information; our proposed model offers both of these advantages. Figure 6 illustrates the performance of the proposed model with various values of P for the test set. It can be seen that the best outcome was obtained at P = 50. When P is increased to 100, the performance decreases for all models. Hence, in the results below, we use P = 50 for the comparisons on the building data.

Table 2 summarizes the performance and the decision thresholds, which are selected to maximize the average LOS/NLOS detection rate, for the LSTM, CNN, and CNNLSTM models. As can be seen, the CNNLSTM model outperforms the other models in both accuracy and average detection rate for all values of P. Note that CNNLSTM* denotes a model that uses the absolute values of the CSI data. It offers slightly better performance than the LSTM and CNN models while using simpler data, so it can be considered as an option when generating data from practical instruments. The input signal for the CNN segment in this case can be written as

$$\mathbf{x}_p = \left[\,|\hat{H}_p[-K/2]|,\ \ldots,\ |\hat{H}_p[-1]|,\ |\hat{H}_p[1]|,\ \ldots,\ |\hat{H}_p[K/2]|\,\right]^T, \tag{13}$$

where $|\hat{H}_p[k]|$ is the amplitude of the CSI for the $p$th packet transmission.

Table 3 shows the total training time and the number of parameters of the LSTM, CNN, and proposed CNNLSTM models, where an NVIDIA Titan X GPU (1.4 GHz, 3584 cores) was used for the simulations. The total training time for the proposed CNNLSTM is comparable to that of the CNN and LSTM models because the proposed model takes only 21 epochs to converge to the optimal solution. The number of parameters for the proposed CNNLSTM is larger than that of the LSTM model.
However, the memory requirement is less demanding at test time because no backward propagation is performed. In addition, the proposed CNNLSTM* reduces the total training time and the number of parameters compared to the proposed CNNLSTM.

In Figure 7, we use the receiver operating characteristic (ROC) curve, which describes the relationship between the TPR and the false positive rate (FPR). The better the performance, the closer the ROC curve approaches the upper-left corner, which implies perfect discrimination. As we can see, the LSTM model has the worst performance. Conversely, the proposed CNNLSTM model offers the best result because its area under the curve (AUC) of 0.9928 is close to the ideal value of 1 [18,19,37].

To further understand what is inside the model, we analyze the internal representations of the trained network. Following the training procedure, the hidden state vector of the last LSTM module was used to visualize the trained network. Figure 8 shows t-SNE representations using 5000 different inputs from the training set for the proposed method and the conventional LSTM method [16], where t-SNE projects the high-dimensional vectors onto a two-dimensional space while retaining their pairwise similarity [38]. In the figure, we can see that the hidden state for the proposed method is much more dispersed than the hidden state for the conventional method [16]. This explains the improved accuracy obtained through feature extraction by the CNN segment in the proposed CNNLSTM.
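The ROC curve and AUC used for Figure 7 can be computed from model outputs as in the sketch below. It sweeps the decision threshold over the distinct scores and integrates with the trapezoidal rule; the scores and labels are made-up values, not the measured results, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for alpha in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= alpha)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= alpha)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]

points = roc_points(labels, scores)
area = auc(points)
```

In this toy example one NLOS sample (score 0.60) outranks one LOS sample (score 0.40), so 14 of the 15 LOS/NLOS pairs are ordered correctly and the AUC is 14/15; a classifier that orders every pair correctly reaches the ideal value of 1.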

Conclusions
In this paper, we proposed a deep learning model that identifies channel conditions by combining a CNN and an RNN. In the proposed CNNLSTM model, the CNN captures features from the frequency-domain characteristics of the CSIs, and the LSTMs then extract temporal features from the RSSI and the CNN output. In addition, a CNNLSTM model that uses the absolute values of the CSIs was proposed to reduce complexity while still performing slightly better than the conventional models. The proposed methods were verified under indoor environments for WLANs and achieved higher accuracy than the conventional LSTM model in classifying LOS and NLOS. In future work, we would like to investigate the performance of the proposed CNNLSTM models in outdoor environments to expand the range of applications of the algorithm.

Conflicts of Interest:
The authors declare no conflict of interest.