Hybrid Wireless Fingerprint Indoor Localization Method Based on a Convolutional Neural Network

In the indoor location field, the quality of received-signal-strength-indicator (RSSI) fingerprints plays a key role in the performance of indoor location services. However, changes in an indoor environment may lead to the decline of location accuracy. This paper presents a localization method employing a Hybrid Wireless fingerprint (HW-fingerprint) based on a convolutional neural network (CNN). In the proposed scheme, the Ratio fingerprint was constructed by calculating the ratio of different RSSIs from important contribution access points (APs). The HW-fingerprint combined the Ratio fingerprint and the RSSI to enhance the expression of indoor environment characteristics. Moreover, a CNN architecture was constructed to learn important features from the complex HW-fingerprint for indoor locations. In the experiment, the HW-fingerprint was tested in an actual indoor scene for 15 days. Results showed that the average daily location accuracy of the K-Nearest Neighbor (KNN), Support Vector Machines (SVMs), and CNN was improved by 3.39%, 8.03% and 9.03%, respectively, when using the HW-fingerprint. In addition, the deep-learning method was 4.19% and 16.37% higher than SVM and KNN in average daily location accuracy, respectively.


Introduction
In the last few decades, location-based services (LBS) have become an integral part of daily life. Satellite positioning systems have poor accuracy indoors because indoor environments contain more obstacles, which severely attenuate satellite signals [1]. To obtain satisfactory real-time indoor location, many methods have been proposed. An indoor location method based on Radio Frequency IDentification (RFID) was proposed in [2,3]. However, this method has two disadvantages: its anti-interference ability is poor and, more importantly, localization cannot be performed unless the mobile terminal actively scans the RFID tag. In recent years, Ultrawideband (UWB) [4] and infrared [5] have achieved high precision in indoor location applications and attracted the attention of many researchers. However, the difficulty of implementation and relatively high hardware costs restrict their practical application.
Among RSSI-based indoor localization technologies, WiFi-based location has become popular because of the widespread use of wireless local-area networks (WLANs) [6]. WiFi-based indoor location methods are divided into two types: one based on a propagation model and the other based on the RSSI fingerprint. The method based on the propagation model uses the time of arrival (TOA) [7,8] of the signal between nodes or the angle of arrival (AOA) of the signal [9] to determine the position coordinates. The Distance Vector-Hop (DV-Hop) algorithm is frequently used in Wireless Sensor Networks (WSNs). DV-Hop estimates distance through the hop count between nodes. Because the hop count is discrete, a serious consequence is that some nodes have the same estimated distance when their hop count with respect to identical nodes is [...] has also been used in some studies, but [32] needed to know the physical layout of the indoor environment, and [31] did not consider the impact of time changes on indoor positioning. Therefore, we propose a HW-fingerprint approach that does not require knowledge of the indoor-environment layout and takes the effects of time changes into account.
The main contributions of this work are as follows:
• The HW-fingerprint is proposed, in which the ratio relationship between APs is combined with the RSSI fingerprint;
• A CNN architecture was constructed to effectively learn characteristics from the HW-fingerprint;
• Different CNN architectures were tested to find the best location model;
• The proposed method was verified with 15 days of data collected in an actual environment. The tested algorithms had better location results when using the HW-fingerprint.
The rest of this article is organized as follows. Section 2 describes the proposed method in detail. Section 3 shows the experiment results and discusses the proposed work. Finally, conclusions are given in Section 4. Figure 1 shows the proposed method, in which we constructed the HW-fingerprint in both the offline and online phases. The deep-learning method was used to learn the features of the HW-fingerprint and predict the indoor location. In the offline data-processing module, we obtained the Media Access Control (MAC) address sequence of the relevant APs used to construct the offline Ratio fingerprint; the same sequence was used to match online data and construct the Ratio fingerprint of the online HW-fingerprint. At the same time, the ranges of the RSSI and of the Ratio fingerprint obtained in the offline data-processing module were also used to normalize the online HW-fingerprint. In the offline phase, we trained a CNN model for indoor location, and in the online phase we used this model to predict the location of online data.

Proposed Methods
In this section, we introduce the location method in detail. First, the reasons for building a HW-fingerprint are introduced. Second, the processes of building the HW-fingerprint in the offline and online phases are described. Third, the reasons for treating the location problem as image classification are described. Finally, details of the CNN location architecture are presented.

[Figure 1. Overview of the proposed method, with an offline phase and an online phase.]

Analysis of HW-Fingerprint Construction
In this subsection, we analyze RSSI changes in the environment and describe the idea behind the construction of the Ratio fingerprint. Currently used WiFi devices mainly operate in the 2.4 GHz band. A 2.4 GHz WiFi signal is severely attenuated by water, because water molecules absorb energy in this band (the same effect that microwave ovens use for heating). If the water density in the air changes, the WiFi signal is attenuated differently.
Figure 2a shows the changes in the RSSI values of two APs collected at the same reference point (RP). The RSSI distribution was relatively stable in the short term; therefore, good location results can often be achieved in the short term. Indoor environments have more unstable factors than outdoor environments. There are many disturbances in an indoor environment, such as moving targets or someone temporarily standing next to the AP and blocking the WiFi signal. Instantaneous RSSI measurements are therefore not sufficient as fingerprints for location, but these temporary interferences can be mitigated by collecting more samples. Figure 2b shows a box plot of the RSSI changes of the APs over the following 15 days. The RSSI distribution differs considerably from day to day. This is one reason why methods based on a propagation model need to update the model in time. Table 1 shows the weather for five days; the weather differs every day, in terms of both temperature and weather conditions. Therefore, the RSSI becomes unstable when the environment changes, and here we mainly considered the change of daily RSSI values due to the change of water molecular density in the air.
For WiFi devices using the 2.4 GHz band, a change in the water density of the air changes the attenuation of the WiFi signal. However, when all APs are in the same air-humidity environment and are attenuated at the same rate, the ratio of the RSSIs of different APs remains relatively stable from day to day. This analysis gave us the idea of constructing a Ratio fingerprint by calculating the ratio of RSSIs between different APs. The Ratio fingerprint was then added to the RSSI fingerprint to construct the proposed HW-fingerprint, which compensates for the RSSI fingerprint's limited ability to express the indoor environment.

Acquisition of Offline HW-Fingerprint
In this subsection, we divide the process of obtaining the offline HW-fingerprint into three parts, as indicated by the dashed box in Figure 3. To obtain the HW-fingerprint, we made two copies of the offline RSSI database, one for obtaining the offline RSSI fingerprint and the other for obtaining the Ratio fingerprint, and finally merged the two results to obtain the final HW-fingerprint.

Acquisition of RSSI Fingerprint
In order to obtain RSSI fingerprints, the offline RSSI database was processed by a normalization module; the detailed process is as follows.

Acquisition of Ratio Fingerprint
To obtain the Ratio fingerprint, the specific implementation was divided into the following steps:
(1) In the AP selection module, the contribution weight of every AP in the initial RSSI fingerprint needed to be calculated. Let $Number_j$ denote the number of samples in which the j-th AP was collected (where j = 1, ..., J and J is the total number of APs that could be collected in the environment), and let $Number_{total}$ denote the total number of RSSI fingerprint samples collected in the indoor environment. Then, the j-th AP's contribution weight was calculated with the following equation:
$$W_j = \frac{Number_j}{Number_{total}}$$
where $W_j$ is the contribution weight of the j-th AP. After obtaining the contribution weight of each AP, we set a minimum weight threshold. We then selected the APs whose weight was above the minimum threshold and obtained their corresponding MAC sequence and RSSIs.
(2) In the ratio-fingerprint building module, we obtained the initial Ratio fingerprint. Let the MAC address sequence of the selected APs be $MAC_{important} = \{MAC_1, MAC_2, \ldots, MAC_V\}$, where $V \in \{1, \ldots, M\}$ and $V < M$. Their corresponding RSSIs are $RSSI_{important} = \{RSSI_1, RSSI_2, \ldots, RSSI_V\}$. From the RSSI data of these APs, the Ratio fingerprint was constructed by the following equation:
$$Ratio_{K,I} = \begin{cases} \dfrac{RSSI_K}{RSSI_I}, & RSSI_K \neq 0 \text{ and } RSSI_I \neq 0 \\ 0, & \text{if } RSSI_K \text{ and } RSSI_I \text{ are both 0 or one of them is 0} \end{cases}$$
where $RSSI_K$ and $RSSI_I$ are the RSSIs of two different selected APs and the index $p$ satisfies $1 \le p \le V - 1$. We could then obtain the Ratio fingerprint.
(3) A filter was applied to remove outliers from the initial Ratio fingerprint. In the filter module, ratios equal to 0 were not counted. The quartiles of the Ratio fingerprint were obtained by the box-plot analysis method. The outliers in the Ratio fingerprint were defined as:
$$ErrorValue < Q_l - 1.5 \cdot IQL \quad \text{or} \quad ErrorValue > Q_u + 1.5 \cdot IQL \qquad (4)$$
where $Q_l$ is the lower quartile, $Q_u$ is the upper quartile, and $IQL = Q_u - Q_l$ is the interquartile range. We filtered out the outliers according to Formula (4) and filled each outlier element with a value of 0.
(4) In the normalization module, we normalized the Ratio fingerprint. We set the maximum ratio as $Ratio_{MAX} = Q_u + 1.5 \cdot IQL$ and the minimum ratio as $Ratio_{MIN} = Q_l - 1.5 \cdot IQL$; then, the normalization formula for the Ratio fingerprint was:
$$Ratio_{norm} = \frac{Ratio - Ratio_{MIN}}{Ratio_{MAX} - Ratio_{MIN}}$$
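Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, an undetected AP is encoded as 0, ratios are formed over all AP pairs in MAC order (the source does not make the pairing fully explicit), zero ratios stay 0 after normalization, and the quartiles use the default method of `statistics.quantiles`.

```python
from itertools import combinations
from statistics import quantiles


def contribution_weights(samples):
    """W_j = Number_j / Number_total for every AP seen in the samples.

    Each sample is a dict {mac: rssi}; an AP missing from the dict was not detected.
    """
    total = len(samples)
    counts = {}
    for sample in samples:
        for mac in sample:
            counts[mac] = counts.get(mac, 0) + 1
    return {mac: n / total for mac, n in counts.items()}


def select_aps(samples, threshold):
    """Step (1): keep the APs whose weight is above the threshold, in sorted MAC order."""
    weights = contribution_weights(samples)
    return sorted(mac for mac, w in weights.items() if w > threshold)


def ratio_fingerprint(sample, macs):
    """Step (2): RSSI_K / RSSI_I per AP pair; 0 if either RSSI is missing (assumed pairing)."""
    rssi = [sample.get(mac, 0) for mac in macs]
    return [k / i if k != 0 and i != 0 else 0.0
            for k, i in combinations(rssi, 2)]


def outlier_bounds(ratios):
    """Step (3): box-plot bounds (Ratio_MIN, Ratio_MAX), ignoring zero ratios.

    Needs at least two nonzero ratios, otherwise statistics.quantiles raises an error.
    """
    nonzero = [r for r in ratios if r != 0]
    q_l, _, q_u = quantiles(nonzero, n=4)
    iql = q_u - q_l
    return q_l - 1.5 * iql, q_u + 1.5 * iql


def filter_and_normalize(ratios, lo, hi):
    """Steps (3)-(4): set outliers to 0, then min-max scale the remaining ratios."""
    result = []
    for r in ratios:
        if r == 0 or r < lo or r > hi or hi == lo:
            result.append(0.0)  # missing value, outlier, or degenerate range
        else:
            result.append((r - lo) / (hi - lo))
    return result
```

In this sketch the bounds would be computed once from the offline ratios and then reused unchanged for online data, which mirrors how the text reuses the offline ranges in the online phase.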

Acquisition of Online HW-Fingerprint
Online location RSSI data are converted into the HW-fingerprint in the same way as the offline HW-fingerprint. Building on the processing done in the offline phase, several steps are reused to keep the HW-fingerprints obtained in the offline and online phases consistent. The implementation process includes the following steps:
Step 1: Online RSSI data are matched against the MAC sequence used to construct the initial RSSI fingerprint, and the online RSSI fingerprint is normalized according to the upper and lower limits set in Section 2.2.1.
Step 2: According to the $MAC_{important} = \{MAC_1, MAC_2, \ldots, MAC_V\}$ sequence from the offline phase, we matched the associated online RSSIs. Then, we constructed the online Ratio fingerprint with the method used to obtain the Ratio fingerprint in Section 2.2.2.
Step 3: According to Section 2.2.3, we combined the RSSI fingerprint and ratio-fingerprint sequences to form the online HW-fingerprint.
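The three online steps can be sketched as below. This is only an illustration under assumptions: `build_online_hw` and `scale` are hypothetical names, a missing AP is encoded as 0 and stays 0 after scaling, RSSI values outside the offline range are clipped, ratios outside the offline outlier bounds are set to 0, and ratios are formed over all AP pairs in the offline MAC order.

```python
from itertools import combinations


def scale(value, lo, hi):
    """Min-max scale to [0, 1] with clipping; a 0 input (missing value) stays 0."""
    if value == 0 or hi == lo:
        return 0.0
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def build_online_hw(scan, rssi_macs, ratio_macs, rssi_range, ratio_range):
    """Build one online HW-fingerprint from a scan {mac: rssi}.

    rssi_macs and ratio_macs are the MAC sequences fixed in the offline phase;
    rssi_range and ratio_range are the (min, max) limits fixed in the offline phase.
    """
    # Step 1: match the scan against the offline MAC sequence and normalize.
    rssi_part = [scale(scan.get(mac, 0), *rssi_range) for mac in rssi_macs]

    # Step 2: rebuild the ratios for the important APs chosen offline.
    important = [scan.get(mac, 0) for mac in ratio_macs]
    ratios = [k / i if k != 0 and i != 0 else 0.0
              for k, i in combinations(important, 2)]
    lo, hi = ratio_range
    ratio_part = [scale(r, lo, hi) if lo <= r <= hi else 0.0 for r in ratios]

    # Step 3: concatenate both parts into the online HW-fingerprint.
    return rssi_part + ratio_part
```

Because the MAC sequences and ranges come from the offline phase, an AP that has disappeared online simply produces zeros in fixed positions, so the online vector keeps the same layout as the training vectors.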

Indoor-Location Analysis and Image Classification
Since a CNN has strong feature-extraction capabilities in image classification, this motivated us to translate the problem of indoor location into image classification. In the HW-fingerprint obtained in Section 2.2, the RSSI fingerprint was compressed into the range 0-1 by $RSSI_{MAX}$ and $RSSI_{MIN}$, and the Ratio fingerprint was normalized to the range 0-1 using the outlier boundaries of ErrorValue. The brightness of a grayscale image ranges from 0 to 255. By multiplying each feature of the HW-fingerprint by 255, we obtained the brightness distribution of the HW-fingerprint in the image; finally, we reshaped the vector into a matrix. These luminance distributions give a visual representation of the HW-fingerprint. As shown in Figure 4a, the sparse white- and gray-pixel region on the left side was constructed from the RSSI fingerprint, the dense white-pixel region in the middle was constructed from the Ratio fingerprint, and the black region on the right side was the space reserved for possible additional Ratio-fingerprint elements. The dimension of the Ratio fingerprint was determined by the number of selected APs.
We set up several RPs in our laboratory at which RSSIs were collected for the following experiment. The specific settings are described in detail in the subsequent experiment sections. An RP is a collection point of the RSSI fingerprint. Generally, an indoor environment is divided into multiple subareas, and collection points are set at the centers of the subareas. As shown in Figure 4, grayscale images a, b, and c were visualized from different samples from RP1. Grayscale images d, e, and f in Figure 4 were visualized from different samples from RP2, which is another collection point, different from RP1. In Figure 4, we marked some regions with a rectangle in the six grayscale images. It can be seen that samples from the same RP had similar brightness and pixel distribution at the marked regions, while samples from different RPs had different brightness and pixel distribution at the marked regions. This visual difference motivated us to treat the problem of indoor location as a problem of image classification.
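The conversion from a normalized HW-fingerprint to a grayscale image can be sketched as follows. The function name and the image side length are hypothetical, and padding the unused positions with 0 (black) is an assumption based on the description of the reserved black region in Figure 4a.

```python
def fingerprint_to_image(hw, side):
    """Turn a normalized HW-fingerprint (values in [0, 1]) into a side x side grayscale matrix."""
    if len(hw) > side * side:
        raise ValueError("fingerprint does not fit in the image")
    # Scale each feature to a brightness value in 0-255.
    pixels = [round(value * 255) for value in hw]
    # Positions not used by the fingerprint stay black.
    pixels += [0] * (side * side - len(pixels))
    # Reshape the flat vector into rows of the image matrix.
    return [pixels[row * side:(row + 1) * side] for row in range(side)]
```

For example, a three-element fingerprint `[0.0, 1.0, 0.2]` placed in a 2 x 2 image gives one black, one white, and one dark-gray pixel, with the last position left black.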
Similarly, the HW-fingerprint contains a lot of noise. In Figure 4, it can be seen that even samples from the same RP still have differences, which reflects the complex and varied characteristics of indoor environments. In order to solve this problem, in the following content, we introduce the CNN location model we constructed. The CNN was used to learn useful features from the HW-fingerprint with much noise and to determine user position.

Deep-Learning Location Model
Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), and CNNs are commonly used in classification. The hidden-layer neurons of a DNN are connected to all neuron outputs of the previous layer, resulting in a large number of parameters to be learned, so it becomes difficult to obtain a suitable model when the data dimension is high. Both CNNs and RNNs are improved networks based on DNNs. RNNs are mostly used to deal with time-series problems and are often used in the field of natural-language processing. CNNs capture the relationship between local regions from a spatial perspective, and are often used in computer vision, where they achieve good results in image classification. Since a CNN has strong feature-extraction capabilities in image classification, this motivated us to translate the problem of indoor location into image classification. The CNN location model mainly consists of the following parts:

Convolution Layer
The convolutional layer extracts feature maps from local regions of the previous layer's feature maps with linear convolutional filters followed by nonlinear activation functions. Denote $\theta_i^L$ as the i-th feature map in layer L of the CNN, which is defined as:
$$\theta_i^L = \delta\left(\sum_{m \in S_{L-1}} w_{im}^L * \theta_m^{L-1} + b_i^L\right)$$
where δ is the Rectified Linear Unit (ReLU) function, $b_i^L$ is the bias of the i-th feature map in layer L, $S_{L-1}$ is the set of feature maps in layer L − 1 connected to the current feature map, and $w_{im}^L$ is the convolutional kernel used to generate the i-th feature map in layer L, which is the same for different m due to local weight sharing. The convolution operation can obtain the shift invariance of input data and extract robust features.
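A single-channel version of this operation can be sketched in plain Python. The sketch assumes stride 1 and zero padding so that the output keeps the input size, and, as is common in deep-learning frameworks, it computes a cross-correlation (the kernel is not flipped); the function names are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return x if x > 0 else 0.0


def conv2d_same(image, kernel, bias=0.0):
    """Apply one odd-sized square kernel to a 2-D image, then ReLU.

    Positions outside the image are treated as 0, so the output has the
    same height and width as the input.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    output = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = bias
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += kernel[i][j] * image[rr][cc]
            output[r][c] = relu(acc)
    return output
```

A full layer would sum such responses over all connected input feature maps $m \in S_{L-1}$ and repeat this for each output feature map i.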

Pooling Layer
The pooling layer is a downsampling layer that downsamples the outputs of the previous convolutional layer. It reduces computational complexity by reducing the dimension of the tensors. We chose the max-pooling function, which selects the maximum of the values covered by the current pooling window.
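As a small illustration, non-overlapping max pooling on one feature map can be written as below. The function name is hypothetical, and the stride is assumed to equal the window size.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, width - size + 1, size)
        ]
        for r in range(0, height - size + 1, size)
    ]
```

With a 2 * 2 window, each output value summarizes four inputs, so the height and width are halved.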

Fully Connected Layer and Output Layer
For the fully connected layer, we utilized a basic neural network with a hidden layer to train on the output data of all the convolutional and subsampling layers. Moreover, a softmax layer was employed as the output layer to calculate the output label. The softmax layer is defined as:
$$S_i = \frac{e^{z_i}}{\sum_{k=1}^{a} e^{z_k}}$$
where $z_i$ is the output of the fully connected layer for the i-th location, $S_i$ is the probability that the input data belong to the i-th location, and a is the total number of location tags.
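A minimal softmax sketch is given below. Subtracting the largest input before exponentiating is a standard numerical-stability step that does not change the result; it is an addition for illustration and is not described in the text.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def predict_location(logits):
    """Return the index of the most probable location tag."""
    probabilities = softmax(logits)
    return probabilities.index(max(probabilities))
```

The predicted location is the tag with the highest probability $S_i$.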

ReLUs
In order to reduce the occurrence of overfitting, the ReLU layer was adopted in the CNN as the activation function in the convolutional and fully connected layers. Compared to traditional neural-network activation functions, such as the logistic sigmoid, tanh, and other hyperbolic functions, the Rectified Linear Unit (ReLU) function has the following advantages. First, research on biological neurons and the brain shows that the information coding of biological neurons is usually scattered and sparse, and the ReLU produces similarly sparse activations. Second, gradient descent and backpropagation are more efficient, which helps avoid exploding and vanishing gradients. Finally, the calculation is simpler, since it involves no costly operations such as the exponential function used by other activation functions. At the same time, the sparsity of activations decreases the overall computational cost of the neural network. A ReLU is defined as
$$f(x) = \max(0, x)$$

Training Process
The training parameters are shown in Table 2. The number of training epochs, the batch size, and the learning rate were set to 20, 50, and 0.001, respectively. In addition, dropout was added in the fully connected layer to prevent network overfitting. The dropout parameter was set to 0.5, which means that each neuron in the fully connected layer was deactivated with a probability of 0.5 during training, so that it did not participate in the calculation or in the weight update. The proposed CNN architecture is shown in Figure 5. To learn the location features from the HW-fingerprint, the learning process of the proposed CNN was as follows. The images were convenient for the CNN to process in its convolution and pooling layers. For each input image, in the first convolutional and pooling layer, we employed 50 convolutional filters of size 3 * 3 to obtain the same number of feature maps of size 24 * 24, which could extract different characteristics. Then, the same number of feature maps of size 12 * 12 was obtained by pooling with a 2 * 2 window. By applying one more convolutional and pooling layer, as shown in Figure 5, we obtained 256 feature maps of size 6 * 6. We reshaped them into a vector and fed it into the fully connected (FC) and softmax layers to obtain the probability that the fingerprint belonged to each region.
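The feature-map sizes quoted above can be checked with a short shape trace. The sketch assumes a 24 * 24 single-channel input image, stride-1 convolutions with zero padding (so that a 3 * 3 filter keeps the spatial size), and 256 filters in the second convolutional layer; these settings are inferred from the reported sizes and are not stated explicitly in the text.

```python
def conv_same_shape(shape, filters):
    """Stride-1 convolution with zero padding keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def pool_shape(shape, size=2):
    """Non-overlapping pooling divides height and width by the window size."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def trace_shapes(input_shape=(24, 24, 1)):
    """Follow the tensor shape through conv-pool-conv-pool."""
    shapes = [input_shape]
    shapes.append(conv_same_shape(shapes[-1], 50))    # first convolution
    shapes.append(pool_shape(shapes[-1]))             # first 2 * 2 pooling
    shapes.append(conv_same_shape(shapes[-1], 256))   # second convolution
    shapes.append(pool_shape(shapes[-1]))             # second 2 * 2 pooling
    return shapes


def flattened_length(shape):
    """Length of the vector passed to the fully connected layer."""
    height, width, channels = shape
    return height * width * channels
```

Under these assumptions the trace ends at 256 feature maps of size 6 * 6, so the vector fed to the fully connected layer has 6 * 6 * 256 = 9216 elements.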

Experiment Setup
In order to collect the necessary evaluation data, the proposed method was deployed in Lab 505 of the Engineering Facility Building No.1 of Guangdong University of Technology. As shown in Figure 6, the laboratory area is 12.5 × 10 m; each RP was set at the center of a 3 × 3 m square area, marked with a red point in Figure 6a, and a total of nine RPs were set. In the test environment, 258 unknown APs were detected by our mobile devices. Thousands of RSSI samples were collected on the first day at the nine RPs. Data collection was performed for half a month.
A WiFi collector application was implemented to collect surrounding WiFi information, and the program recorded the MAC, RSSI, and timestamp of every sample. Construction and training of the CNN model were based on TensorFlow (version 1.8), Google's open-source deep-learning framework [33]. The data from the first day were used for training, and data from the subsequent days were used for testing.
In the experiment, we used the HW-fingerprint constructed from the RSSI data collected on the first day as the training data. In the subsequent days, test data were continuously collected at the same RPs where the training data had been collected. In order to compare the experiment results, we defined the accuracy of the prediction. Let $NUM^{D_i}_{correct}$ be the number of samples predicted correctly on the i-th day, and let $NUM^{D_i}_{total}$ be the total number of test samples on the i-th day. Then, the prediction accuracy of the i-th day was given by the following formula:
$$Accuracy_{D_i} = \frac{NUM^{D_i}_{correct}}{NUM^{D_i}_{total}} \times 100\%$$
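The daily accuracy defined here amounts to a simple count, sketched below with a hypothetical function name and hypothetical RP labels.

```python
def daily_accuracy(predicted, actual):
    """Percentage of test samples of one day whose predicted RP equals the true RP."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

Averaging this value over all test days gives an average daily location accuracy of the kind reported in the tables.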

Threshold Impact on Location
In this subsection, the effects of different thresholds on the constructed HW-fingerprint are analyzed. In the offline phase, the APs with an important contribution were selected from all measurable APs in the indoor environment to construct the Ratio fingerprint. Note that, when the threshold was low, most of the APs were selected for Ratio-fingerprint construction, including some unimportant ones. In order to find a suitable threshold, three sets of HW-fingerprints were constructed with thresholds of 0.7, 0.8, and 0.9, respectively. Figure 7 shows the prediction accuracy of the CNN on the three sets of HW-fingerprints over 15 days. It can be seen that, when the threshold was set to 0.9, results were better than with the other two datasets.
To further analyze the appropriate thresholds, result statistics of the three datasets are shown in Table 3. When the threshold was set to 0.9, average accuracy was significantly better than with the two other thresholds. As shown in Figure 7, on the 13th day, accuracy with the threshold set to 0.9 was lower than on the other days, but it was still better than that of the other two datasets. The main reason was that, on the 13th day, there was a long power outage in the experimental area that directly changed the indoor environment.

Influence of Different CNN Structures on Location
AlexNet uses very large convolution kernels, such as 11 * 11 and 5 * 5. The idea is that the larger the convolution kernel, the larger the receptive field and the more picture information is seen. However, a large convolution kernel leads to a surge in computational complexity, which makes it harder to increase model depth and degrades computational performance.
In this section, the influence of different convolution-kernel sizes in the CNN was studied, as shown in Figure 8. When the convolution kernel was set to 3 * 3, accuracy in subsequent location prediction was better than when the convolution kernel was set to 5 * 5 or 7 * 7. As shown in Table 4, when the kernel size was set to 3 * 3, average accuracy was higher and variance was lower than with the other sizes.
In image recognition, pooling layers are widely used in convolutional neural networks. They are used for feature-dimensionality reduction, compressing the amount of data and the number of parameters, reducing overfitting, and improving the fault tolerance of the model. However, it is not known in advance whether the max-pooling layer has an effect on indoor location.
In order to evaluate the influence of the pooling layer on positioning, we constructed a fully convolutional neural network. It was derived from the previously constructed convolutional neural network by removing the original pooling layers. To achieve better results, a convolutional layer was added. As shown in Figure 9, location accuracy was significantly higher with the max-pooling layer than without it. Figure 10 shows the AP capture rate at an RP, which is the ratio of $Dect^{CR}_{times}$ to $ALL^{CR}_{num}$, where $Dect^{CR}_{times}$ is the number of times that the AP was detected at the RP, and $ALL^{CR}_{num}$ is the total number of fingerprint samples collected at the RP. It can be seen that many APs had a capture rate of less than 50%, which means that there was a lot of noisy data in the HW-fingerprint. Together with the result in Figure 9, this supports the view that the max-pooling layer had a filtering effect on the noisy data.

Experiment Comparison of HW-Fingerprint
In order to explore the rationality of the HW-fingerprint, we tested several other common machine-learning methods. Figure 11a-c gives the predictions of KNN, SVM, and CNN, respectively, with and without the HW-fingerprint. Table 5 shows the average daily prediction accuracy of KNN, SVM, and CNN with and without the proposed HW-fingerprint. With the HW-fingerprint, average daily prediction accuracy was 67.79%, 79.97%, and 84.17% for KNN, SVM, and CNN, respectively; without the HW-fingerprint, average daily location accuracy was 64.39%, 71.94%, and 75.13%. It can be seen that overall prediction accuracy was better with our HW-fingerprint than without it. The average daily location accuracy of KNN, SVM, and CNN was improved by 3.39%, 8.03%, and 9.03%, respectively.
As shown in Figure 11a, the KNN predictions with the HW-fingerprint were better than those without it on only eight out of 15 days, or about half of the days. Here, we analyze the reasons. One is that KNN has no learning process. When predicting, it traverses all training data to select the K nearest samples and determine the most probable prediction. As mentioned above, the value of K has a direct influence on prediction results; in order to simulate realistic predictions, we set the K value to 1 instead of tuning K for better results. Another reason is that the dimension of the HW-fingerprint was higher than that of the RSSI fingerprint, and KNN relies on the distance between samples. As the dimension increased, the correlation between the predicted sample and the nearest sample selected according to the distance decreased.
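The K = 1 baseline described above can be sketched as follows. The Euclidean distance is an assumption, since the text does not name the distance metric, and the function name and labels are hypothetical.

```python
import math


def predict_1nn(train_fingerprints, train_labels, query):
    """Return the label of the training fingerprint closest to the query (K = 1)."""
    nearest = min(
        range(len(train_fingerprints)),
        key=lambda index: math.dist(train_fingerprints[index], query),
    )
    return train_labels[nearest]
```

Every feature contributes equally to this distance, so the many extra ratio dimensions of the HW-fingerprint, including zeros from undetected APs, can dominate the comparison. This is in line with the dimensionality argument made above.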
To further analyze the impact of the proposed HW-fingerprint on indoor location, we calculated the loss rate of the important APs selected to construct the Ratio fingerprint. The loss rate is the ratio of $UDect^{LR}_{times}$ to $ALL^{LR}_{num}$, where $UDect^{LR}_{times}$ is the number of times that an AP was undetected in all fingerprints collected at all RPs, and $ALL^{LR}_{num}$ is the total number of fingerprint samples collected at all RPs. As shown in Figure 11b,c, SVM and CNN showed a certain improvement in location accuracy in most cases. The loss rate is shown in Table 6. According to Figure 11b,c, even when some APs were lost and the loss rate was high, as on Days 2, 7, and 16, there was still an improvement in location accuracy.
In the experiment, the selected APs came from the measurable APs in the experiment environment, and the important APs used to construct the Ratio fingerprint were obtained by analyzing the training data. Therefore, it is inevitable that some of these important APs were switched off in subsequent time periods. When an AP used to construct the Ratio fingerprint could not be detected, many null values appeared when building the online HW-fingerprint. As a result, the HW-fingerprint built online did not match the HW-fingerprint built in the offline phase because of this added noise.
In Table 7, we calculated the distance error of the three algorithms with and without the HW-fingerprint. Distance error was the average error between the predicted and actual positions for the online HW-fingerprint. It can be seen that positioning error was reduced for all three algorithms when the HW-fingerprint was used. In Table 8, we give statistics on the run time of the three algorithms with and without the HW-fingerprint. Since the training strategies of the algorithms differ, it was difficult to compare their training time. For example, KNN is an algorithm that does not require training. We therefore only counted the test time for a single sample. It can be seen that the increase in running time when using the HW-fingerprint was negligible and did not have much impact on the servers.
Figure 12 and Table 9 compare the three algorithms when using the HW-fingerprint. It can be seen that the CNN was significantly better than KNN and SVM. The CNN was 4.19% and 16.37% higher than SVM and KNN in average daily location accuracy, respectively. Compared with SVM and KNN, the CNN's test results showed that the deep-learning model had better performance in data-feature extraction and classification.

Discussion
As shown in the experimental section above, we conducted a series of experiments to verify the feasibility of the proposed method. First, in order to verify the impact of the threshold used to select the APs with important contributions on the quality of the HW-fingerprint, we did a comparative experiment with different thresholds. The experiment results showed that, when the threshold was set to a large value, the HW-fingerprint could achieve better prediction accuracy. Second, in order to verify the influence of different CNN structures on indoor-positioning accuracy, we also carried out related comparison experiments and finally obtained a better CNN positioning model. Finally, we verified the improvement effect of the HW-fingerprint on indoor positioning with different algorithms. With KNN, the improvement in prediction accuracy when using the HW-fingerprint was not so obvious, so we also analyzed the reasons. However, in the experiments using SVM and CNN, our proposed HW-fingerprint could significantly improve the accuracy of indoor positioning. This supports the validity and feasibility of the proposed method.

Conclusions
In this paper, in order to enhance the ability of fingerprints to express the changing characteristics of indoor environments, a feature-construction method based on adding a Ratio fingerprint was proposed. We tested the HW-fingerprint in an actual environment, and the test results showed that, compared with the case without the HW-fingerprint, the average daily location accuracy of KNN, SVM, and CNN improved by 3.39%, 8.03%, and 9.03%, respectively. The CNN method was 4.19% and 16.37% higher than SVM and KNN in average daily location accuracy, respectively. On a few days the improvement was limited, because the APs used to build the Ratio fingerprint were chosen by statistical methods rather than through long-term investigation.
Our experiment environment was still not large enough; moreover, in small areas, prediction results are prone to errors due to the short distance between the RPs. In future work, we will consider a larger environment and focus on stable AP selection and on methods to increase the positioning performance of the system.