A Robust Wi-Fi Fingerprint Positioning Algorithm Using Stacked Denoising Autoencoder and Multi-Layer Perceptron

Abstract: With the increasing demand for location-based services, Wi-Fi-based indoor positioning technology has attracted much attention in recent years because of its ubiquitous deployment and low cost. Considering that Wi-Fi signals fluctuate greatly with time, extracting robust features of Wi-Fi signals is the key point to maintaining good positioning accuracy. To handle the dynamic fluctuation with time and sparsity of Wi-Fi signals, we propose an SDAE (Stacked Denoising Autoencoder)-based feature extraction method, which can obtain a robust and time-independent Wi-Fi fingerprint by learning the reconstruction distribution from a raw Wi-Fi signal and an artificial-noise-added Wi-Fi signal. We also leverage the strong representation ability of MLP (Multi-Layer Perceptron) to build a regression model, which maps the extracted features to the corresponding location. To fully evaluate the performance of our proposed algorithm, three datasets are applied, which represent three different scenarios, namely, spacious area with time interval, no time interval, and complex area with large time interval. The experimental results confirm the validity of our proposed SDAE-based feature extraction method, which can accurately reflect Wi-Fi signals in corresponding locations. Compared with other regression models, our proposed regression model can better map the extracted features to the target position. The average positioning error of our proposed algorithm is 4.24 m when there is a 52-day interval between training dataset and testing dataset. That confirms that the proposed algorithm outperforms other state-of-the-art positioning algorithms when there is a large time interval between training dataset and testing dataset.


Introduction
Indoor positioning technologies, such as Wi-Fi [1,2], magnetic [3,4], pedestrian dead reckoning [5,6], and visible light [7,8] technologies, have become increasingly important in people's daily lives, and positioning services have gradually become an indispensable mobile application. Most Wi-Fi-based positioning technologies do not require deploying additional hardware because they only utilize Wi-Fi hotspots and existing wireless LANs (Wireless Local Area Networks) to obtain position estimates. At present, due to the wide deployment and availability of Wi-Fi infrastructure, Wi-Fi fingerprint-based localization has become one of the dominant indoor positioning techniques.
There are two main types of Wi-Fi-based indoor positioning technologies: the RSSI (Received Signal Strength Indicator)-based ranging positioning algorithm [9][10][11] and the fingerprint-based positioning algorithm [12][13][14]. The RSSI-based ranging positioning algorithm [11] usually uses the received Wi-Fi signal and a wireless radio signal propagation model to estimate the distance between the target (whose location is unknown) and the access point (whose location is known), and then estimates the target position using trilateration or multilateration. The fingerprint-based positioning algorithm [14] uses a signal matching algorithm to estimate the user location. It first collects environmental Wi-Fi signals and constructs a Wi-Fi fingerprint database during the offline phase. During the online positioning phase, it compares the current Wi-Fi observation with the fingerprints recorded in the database and obtains the target position using an optimum matching criterion. Compared with the fingerprint-based positioning algorithm, the RSSI-based ranging positioning algorithm struggles to achieve high positioning accuracy because of the complex multipath and dynamic characteristics of signal propagation in indoor environments [15,16].
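As a rough illustration of the ranging approach described above, the following sketch converts an RSSI reading to a distance and then solves a three-AP trilateration by linearizing the circle equations. The log-distance path-loss model, the function names, and the default reference power (−40 dBm at 1 m) and path-loss exponent (3.0) are assumptions made for illustration; the source does not specify a propagation model or its parameters.

```python
import math


def rssi_to_distance(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=3.0):
    """Estimate the AP-to-target distance (m) with the log-distance model.

    Assumed model: rssi = rssi_at_1m - 10 * n * log10(d), solved for d.
    The default parameter values are illustrative, not taken from the paper.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


def trilaterate(anchors, distances):
    """Solve for (x, y) from three AP positions and three distances.

    Subtracting the first circle equation from the other two gives a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("access points are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With noisy indoor RSSI the estimated distances rarely intersect in a single point, which is one way to see why the ranging approach struggles in multipath environments.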
Current Wi-Fi fingerprint-based indoor localization mainly adopts either deterministic or probabilistic techniques [17][18][19][20]. Deterministic Wi-Fi positioning methods estimate the target location with deterministic machine learning algorithms based on a shortest-distance criterion (such as the Euclidean distance); examples include KNN (K-Nearest Neighbor) [21][22][23], linear discriminant analysis [24], and SVM (Support Vector Machine) [25]. Probabilistic Wi-Fi positioning methods use the posterior probability calculated by probabilistic inference methods [26,27] to estimate the target location. In a previous study [19], the conditional posterior probability was calculated using Bayesian estimation, and the target location was obtained based on the maximum posterior probability criterion.
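The deterministic matching idea can be sketched in a few lines. The example below is a minimal KNN locator that ranks stored fingerprints by Euclidean distance to the online observation and averages the positions of the K nearest ones. The function name, the data layout, and the choice to average positions are assumptions for illustration; the cited KNN methods may differ in detail.

```python
import math


def knn_locate(query, fingerprints, k=3):
    """Estimate a position from the k fingerprints closest to `query`.

    query:        list of RSSI values (one per AP) observed online.
    fingerprints: list of (rssi_vector, (x, y)) pairs collected offline.
    Returns the mean (x, y) of the k nearest fingerprints.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(query, fp[0]))
    nearest = ranked[:k]
    mean_x = sum(pos[0] for _, pos in nearest) / len(nearest)
    mean_y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return mean_x, mean_y


# Hypothetical three-point database with two APs.
database = [
    ([-50.0, -70.0], (0.0, 0.0)),
    ([-60.0, -60.0], (5.0, 0.0)),
    ([-70.0, -50.0], (10.0, 0.0)),
]
estimate = knn_locate([-52.0, -69.0], database, k=2)
```

Because the estimate is built from stored reference points, its resolution depends on how densely fingerprints were collected.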
Though much work has been done on Wi-Fi fingerprint-based indoor localization using various traditional machine learning methods, the localization accuracy still does not meet the high-accuracy and robustness requirements of indoor location-based services [28]. With the rapid development of deep learning technology, some researchers have attempted to use deep learning methods in Wi-Fi positioning. One important advantage of deep-learning-based Wi-Fi positioning is that it can automatically filter the raw Wi-Fi observation data, extract reliable features, and build internal representations from dynamic Wi-Fi signals without additional devices or human intervention [29]. To handle the sparseness and volatility of the Wi-Fi signal, Khatab [30], Xu [31], and Kim [32] used the AE (Autoencoder) method to learn a feature representation from the raw Wi-Fi signal. Kim [33] and Nowicki [34] utilized an SAE (Stacked Autoencoder) to extract features for building and floor identification, and obtained relatively good positioning performance on the UJIIndoorLoc dataset (http://indoorlocplatform.uji.es/databases/get/1/). By transforming the original Wi-Fi signal into an image form, Wang [35], Mao [36], and Shao [37] introduced the convolutional neural network to indoor positioning. Wang [38] and Hsieh [39] constructed recurrent neural networks for indoor positioning.
Because the above-mentioned positioning methods treat positioning as a classification problem, the location estimates are discrete and the positioning accuracy relies on the density of the collected fingerprints. To improve the smoothness and robustness of positioning results, some regression positioning algorithms have been applied, such as support vector regression [40,41] and Gaussian process regression [42].
Though the previously described positioning algorithms can obtain good positioning accuracy when the training data and the testing data are collected within a short time of each other, their positioning accuracy declines dramatically when the two collections are separated by a large time interval. To maintain good positioning performance, these algorithms often need to periodically collect new samples and retrain the positioning model, which is time-consuming and labor-intensive. To address the fluctuation of Wi-Fi signals over time, this paper proposes a robust Wi-Fi fingerprint positioning algorithm using an SDAE (Stacked Denoising Autoencoder) [43] and an MLP (Multi-Layer Perceptron) [44]. The SDAE is used to extract time-independent features, and the MLP is used to find a well-behaved mapping function by constructing a reasonable regression model. In our proposed algorithm, we train the SDAE and MLP using the training dataset collected in a certain period.
The main contributions of this paper are summarized as follows:
• An SDAE-based robust feature extraction method is proposed. To extract time-independent and robust features from the raw Wi-Fi data using the SDAE, we design a deep neural network structure with three hidden layers, in which a reasonable amount of noise is added to the input of each hidden layer. We use the layer-by-layer greedy training method to train the SDAE model.
• An MLP-based regression positioning method is proposed. By taking advantage of the Universal Approximation Theorem [45] and the fast training speed of MLP, we build an MLP-based regression model with nine hidden layers to obtain a good mapping function.
• Our proposed algorithm is fully evaluated using three datasets, which represent three different classical scenarios. We also compare our proposed localization algorithm with other localization algorithms. Extensive experimental results demonstrate that our proposed algorithm clearly outperforms the comparative localization algorithms when the Wi-Fi data covers longer time intervals.

System Overview
Our proposed positioning algorithm comprises four main modules: the data collection module, the preprocessing module, the feature extraction module, and the MLP-based regression positioning module. The overall structure of our proposed positioning algorithm is shown in Figure 1. The data collection module collects Wi-Fi data using mobile devices. The preprocessing module is responsible for normalizing the Wi-Fi data and then constructing the fingerprint database. The feature extraction module extracts robust and time-independent features using the SDAE method. The MLP-based regression positioning module uses the extracted features to estimate the position.

Data Preprocessing
The data preprocessing aims to produce reasonable input for the feature extraction module by handling the raw Wi-Fi data. There are three steps in this phase. First, we fill in the missing observations at locations where a Wi-Fi signal was not collected. Considering that the common range of raw RSSI observations is −110 dBm to 0 dBm, we set the missing RSSI value of the Wi-Fi signal to −110 dBm. After handling all the missing Wi-Fi observations, the range of the whole RSSI data is [−110 dBm, 0 dBm]. Then, we normalize the raw RSSI data to the range between 0 and 1 using Equation (1), and obtain the normalized data. With this normalization, the distribution of the normalized Wi-Fi data has low bias and low variance. Directly using the raw, asymmetric Wi-Fi observation data may cause the training of the network model to fail.

$$\mathrm{rssi}_i^{\mathrm{norm}} = \frac{\mathrm{rssi}_i - \mathrm{min\_rssi}}{\mathrm{max\_rssi} - \mathrm{min\_rssi}} \tag{1}$$

where rssi_i represents the RSSI value of the i-th Wi-Fi AP, min_rssi represents the smallest RSSI value, and max_rssi represents the largest RSSI value.
Finally, we construct a fingerprint database, as shown in Equation (2), using the normalized Wi-Fi data.

$$\begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^n & target_{x_1} & target_{y_1} \\ x_2^1 & x_2^2 & \cdots & x_2^n & target_{x_2} & target_{y_2} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ x_m^1 & x_m^2 & \cdots & x_m^n & target_{x_m} & target_{y_m} \end{bmatrix} \tag{2}$$

where n represents the number of features, m represents the number of samples, x_i^j represents the j-th feature of the i-th sample, (target_x, target_y) represents the coordinate of the target position, and (target_xi, target_yi) represents the coordinate of the i-th sample.
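The first two preprocessing steps can be sketched as follows. This is a minimal illustration, assuming that min_rssi and max_rssi in Equation (1) are the nominal bounds −110 dBm and 0 dBm stated above and that a scan arrives as a mapping from AP identifier to RSSI; the function and constant names are hypothetical.

```python
MISSING_RSSI_DBM = -110.0  # value assigned to APs that were not heard
MAX_RSSI_DBM = 0.0         # assumed upper bound of the RSSI range


def normalize_scan(scan, ap_ids):
    """Fill missing APs with -110 dBm, then min-max normalize to [0, 1].

    scan:   dict mapping AP identifier -> RSSI in dBm for one scan.
    ap_ids: ordered list of all AP identifiers (defines the feature order).
    """
    filled = [scan.get(ap, MISSING_RSSI_DBM) for ap in ap_ids]
    span = MAX_RSSI_DBM - MISSING_RSSI_DBM
    return [(rssi - MISSING_RSSI_DBM) / span for rssi in filled]


# "ap2" was not heard, so it is treated as -110 dBm and normalizes to 0.0.
features = normalize_scan({"ap1": -55.0, "ap3": 0.0}, ["ap1", "ap2", "ap3"])
# features == [0.5, 0.0, 1.0]
```

A normalized scan paired with its (x, y) coordinate corresponds to one row of the fingerprint database.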

Feature Extraction Based on the SDAE
The RSSI value of the Wi-Fi signal in the online stage may deviate from the initial fingerprint collected in the offline stage. Figures 2 and 3 correspond to a teaching building and an office building, respectively. As shown in Figures 2a and 3a, the distributions of the RSSI measurement sequences at a specific location differ at different times. The averaged RSSI value changes by about 9 dBm over a 10-day interval, as Figure 2b shows, and by about 3.3 dBm over a 50-day interval, as shown in Figure 3b. A previous study [18] also reported that the Wi-Fi signal fluctuates over time. This fluctuation has a negative influence on positioning performance. To enhance the accuracy and robustness of positioning, we employ the SDAE to extract robust and time-independent features from the raw Wi-Fi signal observations.
We believe that SDAE-based Wi-Fi feature extraction enhances the accuracy and robustness of positioning for two reasons: (1) the SDAE adds noise to the original Wi-Fi data, bringing it closer to the distribution of new Wi-Fi data collected a long time after the original Wi-Fi data; (2) the SDAE reconstructs the original Wi-Fi data from the corrupted Wi-Fi data and extracts features, and the extracted features represent the essential distribution of the Wi-Fi signal. An experimental result shown in Figure 4 demonstrates that the correlation between RSSI samples of the original Wi-Fi data and the newly collected Wi-Fi data (10 days later) at the same position is higher with the SDAE-based feature extraction than without it.
The neural network structure and parameters of the proposed feature extraction method are shown in Figure 5. The network consists of three stacked DAEs (Denoising Autoencoders) [46], and each DAE includes three parts, namely, the noise-added layer, the encoder layer, and the decoder layer. The noise-added layer obtains corrupted Wi-Fi data by adding masking noise to the original Wi-Fi data. In the first DAE, we set the noise parameter to 0.4, so that randomly chosen neurons are dropped out and temporarily removed from the input layer. We fix the random seed of the dropout operation, so the set of dropped neurons is deterministic. The noise parameters of the second and third DAEs are 0.5 and 0.6, respectively. The encoder layer maps the corrupted Wi-Fi data into a hidden representation, and the encoder layers of the three DAEs have 256, 128, and 64 neurons, respectively. Finally, we extract 64 robust features using the SDAE. To increase nonlinearity, ReLU [47] is employed in each encoder layer. The decoder layer maps the hidden representation back to a reconstruction of the original Wi-Fi data.
More specifically, the part marked by the red rectangle in Figure 5 is the second DAE in the SDAE. The dimension of the original input x is 256, and the Dropout layer Drop(•) obtains noise_x by adding masking noise to x. The Encoder layer Enc(•) obtains Enc_x by encoding noise_x. The Decoder layer Dec(•) obtains the reconstruction of x, whose dimension is 256 again. The DAE minimizes the reconstruction error L_DAE, as Equation (7) shows, by training the Encoder and Decoder.
The SDAE is trained by the layer-by-layer greedy training method, in which the output of a DAE in the lower layer is used as the input of the DAE in the higher layer, and training is complete once all DAEs in the SDAE have been trained. The Encoders and Decoders in the SDAE are trained to minimize the reconstruction error between the uncorrupted Wi-Fi data and the reconstructed Wi-Fi data. The SDAE can extract more essential and robust features because it reconstructs the original Wi-Fi data even in the presence of high noise levels.
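The masking-noise step and the reconstruction objective can be illustrated without a deep learning framework. The sketch below zeroes each input with a given probability using a seeded generator, so the dropped positions are reproducible, and measures a reconstruction error. Both function names are hypothetical, and the mean squared error is an assumed form of the reconstruction error, which is not reproduced here. Unlike a Keras Dropout layer, this sketch does not rescale the surviving inputs.

```python
import random


def add_masking_noise(x, drop_rate, seed=0):
    """Return a corrupted copy of x with entries zeroed at rate `drop_rate`.

    A fixed seed makes the set of dropped positions deterministic.
    """
    rng = random.Random(seed)
    return [0.0 if rng.random() < drop_rate else value for value in x]


def reconstruction_mse(x, x_reconstructed):
    """Mean squared error between the clean input and its reconstruction
    (assumed loss form, for illustration only)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_reconstructed)) / len(x)


# Noise parameters of the three stacked DAEs, as stated in the text.
noise_rates = [0.4, 0.5, 0.6]
clean = [0.2, 0.9, 0.0, 0.5, 0.7, 0.1]
corrupted = [add_masking_noise(clean, rate, seed=7) for rate in noise_rates]
```

The key point is that the loss compares the reconstruction with the clean input rather than the corrupted one, which forces the encoder to learn features that survive missing APs.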

Regression Model Using MLP
To avoid collecting dense fingerprints and maintaining training features for each location grid, we employ an MLP-based regression model to estimate the target location, which can improve the smoothness and robustness of the positioning results.
The MLP-based regression model is motivated by the Universal Approximation Theorem [45] and by the fast training speed of MLPs. The MLP defines a mapping function, as shown in Equation (8), and obtains the best function approximation by learning the value of the parameter θ.

$$y = f(x; \theta) \tag{8}$$

where y is the target position, x is the Wi-Fi sample, and θ represents the weight parameters of the MLP.
The neural network structure and parameters of the proposed MLP-based regression model are shown in Figure 6. The input layer has 64 neurons, corresponding to the features extracted by the SDAE. The network includes nine hidden layers with 128, 256, 512, 256, 128, 64, 32, 16, and 8 neurons, respectively. To increase nonlinearity, we use Tanh as the activation function of each hidden layer. We place a BatchNormalization [48] layer between the hidden layers because it gives the input of each hidden layer the same distribution, which speeds up convergence. Furthermore, it plays a regularization role and can mitigate overfitting. Since the target value of the mapping function in the MLP is the position (x, y), the output layer has 2 neurons. We use the sigmoid function as the activation function of the output layer because the target values in the dataset have been normalized and the sigmoid function maps the output of the last hidden layer to (0, 1). There are nine mixing layers (each consisting of a Dense layer and a BatchNormalization layer) in our proposed network structure, and an MLP can approximate a function with arbitrary precision when the network contains enough hidden neurons. The hidden layer weights are updated by minimizing the loss function L_regression using the back-propagation algorithm, and the proposed network structure is determined according to the loss value on the validation dataset.
where M represents the number of samples, y_i represents the true position of the i-th sample, and f(x_i) represents the predicted position of the i-th sample.
Our proposed algorithm is implemented using Keras. Table 1 summarizes the hyperparameter values in our proposed algorithm. Algorithm 1 (shown in Figure 7) describes the general process of our proposed robust Wi-Fi fingerprint positioning algorithm using SDAE and MLP.
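Because the output layer uses a sigmoid, the network predicts normalized coordinates in (0, 1) that must be mapped back to metres before an error can be measured. The sketch below shows one way to do this, assuming the targets were min-max normalized over the extent of the sampling area; the function names and the area bounds in the example are hypothetical.

```python
import math


def denormalize_position(pred, x_range, y_range):
    """Map a normalized prediction (px, py) in [0, 1] back to metres.

    x_range, y_range: (min, max) of the coordinates used when the
    targets were normalized (assumed min-max normalization).
    """
    px, py = pred
    x = x_range[0] + px * (x_range[1] - x_range[0])
    y = y_range[0] + py * (y_range[1] - y_range[0])
    return x, y


def positioning_error(pred_xy, true_xy):
    """Euclidean distance (m) between predicted and true positions."""
    return math.dist(pred_xy, true_xy)


# Hypothetical 40 m x 60 m area and a hypothetical network output.
estimate = denormalize_position((0.5, 0.25), (0.0, 40.0), (0.0, 60.0))
error_m = positioning_error(estimate, (23.0, 19.0))
# estimate == (20.0, 15.0), error_m == 5.0
```

Since the regression output is continuous, the estimate is not restricted to the sampling points, unlike a classification-based locator.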

Tree-Fusion-Based Regression Model
Recently, tree models (such as XGBoost and LightGBM) [51][52][53][54] have been widely used in various problems and have achieved good performance. Inspired by this, in this section we construct and implement a tree-fusion-based regression model and use it as a comparison localization method.
The overall framework of our proposed tree-fusion-based regression model is shown in Figure 8.

Experiments and Evaluation
In this section, we adopt three datasets to evaluate our proposed algorithm. These datasets represent three typical scenarios, namely a spacious area (a teaching building) with a time interval, a complex area (an office building) without a time interval, and a complex area (an office building) with a time interval. Several experiments are conducted to evaluate our proposed positioning algorithm. We also compare our proposed algorithm with other state-of-the-art localization algorithms.

Datasets
Besides the public UJIIndoorLoc dataset (named Dataset1 in this paper), we also collect Wi-Fi signals from 20 sampling points in our laboratory area. The distance between adjacent sampling points is 5 m or more. We collect Wi-Fi signals three times at these locations, and the three collections are recorded as the training dataset, validation dataset, and testing dataset, respectively. The sampling durations at each sampling point for the training, validation, and testing datasets are 12 s, 9 s, and 6 s, respectively. The collection interval between the training dataset and the validation dataset is about 13 min, and that between the validation dataset and the testing dataset is 6 min. There is no time interval between the training dataset and testing dataset. This dataset is recorded as Dataset2. The sampling area is approximately 40 × 30 m². The positions of the sampling points are shown in Figure 9.
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 28
We collect Wi-Fi signals from 57 sampling points in the laboratory area. The distance between sampling points is 5 m or more. First, we collect Wi-Fi data twice at these locations, and the two collections are recorded as the training dataset and validation dataset, respectively. The sampling durations at each sampling point for the training dataset and validation dataset are 14 s and 9 s, respectively. For the testing dataset, we collect three sub-datasets and record them as sub-test1, sub-test2, and sub-test3, respectively. The sampling duration of each of these sub-datasets is 6 s. The collection intervals between these sub-datasets and the training dataset are one day, eleven days, and fifty-two days, respectively, and all sub-datasets are collected at 3 pm. The sampling area is approximately 40 × 60 m². This dataset is recorded as Dataset3. The positions of the sampling points are shown in Figure 10.
The sample format of these three datasets at a certain sampling point (x, y) is shown in Equation (12).

$$s = (\mathrm{rssi}_0, \mathrm{rssi}_1, \ldots, \mathrm{rssi}_n, x, y) \tag{12}$$

where n represents the number of Wi-Fi APs, (x, y) represents the position of the sample s, and rssi_i represents the RSSI value of the i-th Wi-Fi AP.
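To make the sample layout concrete, the short sketch below splits a sample of the form in Equation (12) into its RSSI features and its position label, and applies that split to a list of samples. It assumes each sample is a flat sequence whose last two entries are x and y; the function names and the example values are hypothetical.

```python
def split_sample(sample):
    """Split s = (rssi_0, ..., rssi_n, x, y) into (rssi_list, (x, y))."""
    *rssi, x, y = sample
    return list(rssi), (x, y)


def build_feature_target_lists(samples):
    """Separate a list of samples into parallel feature and target lists."""
    features, targets = [], []
    for sample in samples:
        rssi, position = split_sample(sample)
        features.append(rssi)
        targets.append(position)
    return features, targets


# Two hypothetical samples with three APs each.
raw = [(-60.0, -110.0, -72.0, 3.0, 4.0), (-58.0, -90.0, -110.0, 8.0, 4.0)]
features, targets = build_feature_target_lists(raw)
# features[0] == [-60.0, -110.0, -72.0], targets[0] == (3.0, 4.0)
```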
The environment of the experimental area is shown in Figure 11.

Effect of SDAE Feature Extraction
Influence of different numbers of hidden layers in the SDAE network structure
In addition to the SDAE structure proposed in this paper (256-128-64), we also build two comparative SDAE network structures: one contains only one hidden layer with 256 neurons, and the other contains two hidden layers (256-128). The positioning accuracy is shown in Figure 13. We can see from Figure 13 that using more hidden layers yields higher positioning accuracy, which indicates that a more complex SDAE network structure can better represent the robust and time-independent Wi-Fi fingerprint.
(2) Performance using different activation functions
From Figure 15, we can clearly see that the Tanh activation function outperforms the Linear and ReLU activation functions. The MLP-based regression model obtains an average positioning error of 5.64 m when using the Tanh activation function, which is 28.1% less than with the Linear activation function and 17.3% less than with the ReLU activation function.
(3) Performance using different MLP optimizers
In addition to RMSprop, which is used in our proposed algorithm, we also try other optimizers, namely Adamax, Adam, Nadam, Adadelta, Adagrad, and SGD. For each optimizer, we tune the learning rate to obtain its best positioning performance. The positioning accuracy using these optimizers is shown in Figure 16. The MLP using RMSprop obtains the best positioning accuracy. As shown in Figure 16a, the localization error of the MLP-based regression model for 80% of the testing dataset falls between 8 m and 9 m for every optimizer except Nadam; with RMSprop it is approximately 8 m. As shown in Figure 16b, the MLP-based regression model using RMSprop obtains an average positioning error of 5.64 m, which is 26.6% less than with Nadam, the worst-performing optimizer, and 8.3% less than with SGD, the second-best optimizer.
(5) Performance using different learning rates in the RMSprop optimizer
In addition to the learning rate (lr = 0.0008) used in this paper, we try 0.0004, 0.0006, 0.0007, 0.0009, 0.001, and 0.002. The influence of different learning rates on positioning performance is shown in Figure 18.
In the following experiments, we use the above-mentioned optimal parameters to further evaluate our proposed algorithm.

The Performance of Our Proposed Algorithm under Different Sample Densities
In this section, we evaluate the positioning performance of our proposed algorithm under different sample densities. We collect four datasets with different sampling densities. The distances between neighboring sampling points in the four datasets are 3 m, 5 m, 7 m, and 10 m, respectively. Each dataset contains training data, validation data, and testing data. The positioning accuracy of our proposed algorithm on the four datasets is shown in Figure 19 and Table 2. From Figure 19 and Table 2, we find that the positioning accuracy of our proposed algorithm decreases as the sampling density decreases. According to Table 2, the average positioning error is 2.84 m when the distance between sampling points is 3 m, and it increases to 5.63 m when the distance between sampling points is 10 m. In general, the localization errors of our proposed algorithm increase gradually as the distance between neighboring samples increases.

The Positioning Performance on Three Datasets
The positioning performance of our proposed algorithm on Dataset1, Dataset2, and Dataset3 is shown in Figure 20 and Table 3. According to Figure 20, the proposed algorithm achieves a localization error of about 8 m for 80% of the testing dataset on Dataset1. On Dataset2, it achieves localization errors of about 2.5 m and 5.2 m for 50% and 90% of the testing dataset, respectively. On Dataset3, it produces localization errors of about 4.7 m, 5.8 m, and 6 m for 80% of sub-test1, sub-test2, and sub-test3, respectively. From Table 3, the average positioning errors of our proposed algorithm are 5.64 m on Dataset1 and 3.05 m on Dataset2. On Dataset3, the average positioning error is 4.24 m when there is a 52-day collection interval between the training dataset and the testing dataset.
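The two kinds of figures quoted above (a localization error reached by a given share of the testing dataset, and an average positioning error) can be computed from per-sample errors. The following is a minimal sketch, assuming 2-D coordinates in metres and an empirical CDF read at the smallest sample that covers the requested share; the function names and toy coordinates are illustrative and do not come from the paper.

```python
import math

def positioning_errors(predicted, actual):
    """Euclidean distance between each predicted and true position."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]

def error_at_fraction(errors, fraction):
    """Smallest error e such that at least `fraction` of samples have error <= e."""
    if not errors or not 0 < fraction <= 1:
        raise ValueError("need a non-empty error list and 0 < fraction <= 1")
    ordered = sorted(errors)
    # Index of the sample that closes the requested share of the empirical CDF.
    index = math.ceil(fraction * len(ordered)) - 1
    return ordered[index]

# Toy coordinates, invented purely for illustration (not data from the paper).
predicted = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0), (2.0, 0.0)]
actual = [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0), (0.0, 0.0), (0.0, 0.0)]

errors = positioning_errors(predicted, actual)
mean_error = sum(errors) / len(errors)
error_80 = error_at_fraction(errors, 0.8)
```

Other percentile conventions (for example, linear interpolation between neighbouring samples) would give slightly different values on small test sets.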

Performance Comparison with the Tree-Fusion-Based Regression Model
We compare the positioning accuracy of our proposed algorithm with that of the tree-fusion-based regression model described in Section 2.5 on the above-mentioned datasets. Based on our tuning experiments, the fusion weights w1, w2, and w3, which correspond to the XGBoost, LightGBM, and Stacking models, are set to 0.15, 0.15, and 0.7, respectively.
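The weighted fusion of the three model outputs can be sketched as follows. This is a minimal illustration, assuming each model returns an (x, y) position; the function name, the example predictions, and the check that the weights sum to 1 are assumptions added here, while the default weights are the values stated above.

```python
def fuse_predictions(p_xgb, p_lgb, p_stack, w1=0.15, w2=0.15, w3=0.7):
    """Weighted average of three position predictions (XGBoost, LightGBM, stacking)."""
    # Assumption: the weights form a convex combination, as 0.15 + 0.15 + 0.7 does.
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("fusion weights are expected to sum to 1")
    return tuple(
        w1 * a + w2 * b + w3 * c
        for a, b, c in zip(p_xgb, p_lgb, p_stack)
    )

# Hypothetical per-model predictions for one test sample, in metres.
fused = fuse_predictions((10.0, 20.0), (12.0, 22.0), (11.0, 19.0))
```

With these weights the stacking model dominates the fused position, and the two single models act as small corrections.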

Dataset1
We sort Dataset1 according to the data collection timestamp and divide it into a training dataset, a validation dataset, and a testing dataset. The collection time of the training dataset is 30 May 2013, and the collection time of the testing dataset is 10 June 2013. The positioning performance on Dataset1 is shown in Figure 21 and Table 4. Our proposed algorithm obtains an average positioning error of 5.64 m, as shown in Table 4, which is 10.8% less than the tree-fusion-based regression model. As shown in Figure 21, our proposed algorithm and the tree-fusion-based regression model achieve localization errors of 8 m and 9 m for 80% of the testing dataset, respectively, i.e., the tree-fusion-based regression model is 1 m worse than our proposed algorithm at this level.
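The percentage comparisons in this section can be reproduced with simple arithmetic. The sketch below assumes that "X% less" means the reduction relative to the baseline error, i.e., (baseline - ours) / baseline; the function names are illustrative. The baseline value it derives is only implied by the rounded figures quoted above, so the values reported in the paper's tables remain authoritative.

```python
def relative_reduction(ours, baseline):
    """Percentage by which `ours` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline - ours) / baseline

def implied_baseline(ours, reduction_percent):
    """Baseline error implied by our error and a reported percentage reduction."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must lie in [0, 100)")
    return ours / (1.0 - reduction_percent / 100.0)

# 5.64 m at a 10.8% reduction implies a baseline of roughly 6.3 m.
baseline = implied_baseline(5.64, 10.8)
check = relative_reduction(5.64, baseline)
```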

Dataset2
The positioning performance on Dataset2 using the proposed algorithm and the tree-fusion-based regression model is shown in Figure 22 and Table 5. Both the proposed algorithm and the tree-fusion-based regression model achieve a localization error of about 4.8 m for 80% of the testing dataset, as shown in Figure 22. Furthermore, the two algorithms produce an average positioning error of about 3 m, as shown in Table 5. Because there is no time interval between the training dataset and the testing dataset in Dataset2, the fingerprint features corresponding to a specific location are very similar in the two datasets, so both the proposed algorithm and the tree-fusion-based regression model can accurately estimate target locations.

Dataset3
In Dataset3, the positioning algorithms are evaluated on the testing dataset, which contains three sub-datasets (sub-test1, sub-test2, and sub-test3). The positioning performance is shown in Figure 23 and Table 6. As shown in Figure 23, the proposed algorithm and the tree-fusion-based regression model achieve localization errors of 6 m and 7.2 m, respectively, for 80% of sub-test3 (which has a 52-day interval). As shown in Table 6, our proposed algorithm outperforms the tree-fusion-based regression model on all three sub-datasets. For instance, our proposed algorithm obtains an average positioning error of 4.24 m on sub-test3, which is 14.5% less than the tree-fusion-based regression model. These experimental results confirm that our proposed algorithm obtains better positioning performance than the tree-fusion-based regression model when there is a long time interval.

The experimental results on these three datasets demonstrate that when there is no time interval between the training dataset and the testing dataset, the tree-fusion-based model and our proposed algorithm achieve similar positioning performance. However, when there is a large time interval between the training dataset and the testing dataset, our proposed algorithm obtains better positioning performance than the tree-fusion-based regression model. Therefore, the proposed algorithm is more robust when there is a large time interval between the training dataset and the testing dataset, which confirms that the MLP-based regression model can find a good mapping between the Wi-Fi fingerprints and locations thanks to the strong representation ability of the MLP.

Performance Comparison with Related Methods
We also compare the localization accuracy of our proposed algorithm with other state-of-the-art methods (Khatab [30], Xu [31]).Khatab introduced the AE to extract Wi-Fi features, and then used the ELM (Extreme Learning Machine) for indoor positioning.Xu also adopted the AE method for feature extraction, and then used the MLP for indoor positioning.

Performance Comparison with Khatab's Method
Considering that the dataset used in this paper is different from the dataset used in Khatab's paper, we do not use the same parameter values as those used in Khatab's paper. Instead, we rebuild the network structure described in Khatab's paper and train this reconstructed model on our datasets. We conduct comparative experiments on Dataset1, Dataset2, and Dataset3. The experimental results are shown in Figure 24 and Table 7.

From Table 7, we find that the average positioning errors of our proposed algorithm and Khatab's method on Dataset2 are similar (3.05 m and 3.14 m, respectively). This similar positioning performance is also illustrated in Figure 24b (both the proposed algorithm and Khatab's method achieve a localization error of about 6 m for 90% of the testing dataset on Dataset2). This demonstrates that when there is no time interval between the training dataset and the testing dataset, both algorithms achieve good and similar positioning performance. However, from Figure 24a,c and Table 7, we can see that our proposed algorithm performs better on Dataset1 and Dataset3. Our proposed algorithm produces an average positioning error of 5.64 m on Dataset1, which is 30.7% less than Khatab's method, and 4.24 m on sub-test3 of Dataset3, which is 24.3% less than Khatab's method. These experimental results confirm that our proposed algorithm is more robust when there is a large time interval between the training dataset and the testing dataset.

There are two reasons why our proposed algorithm obtains better positioning accuracy when there is a large time interval between the training dataset and the testing dataset. First, the output features extracted by the AE may be a simple copy of the input layer, which does not capture more essential features of the Wi-Fi signal. Second, the weights of the hidden layers in ELM are no longer updated once they have been determined by solving the equation set, whereas the MLP keeps updating the weights of the hidden layers until the loss function becomes steady. Therefore, the MLP can obtain a better mapping function.

Performance Comparison with Xu's Method
Similar to Section 3.5.1, we also employ our three datasets to evaluate the performance of Xu's method, and we again rebuild the network structure described in Xu's paper. The comparative experiments are conducted on Dataset1, Dataset2, and Dataset3, and the experimental results are shown in Figure 25 and Table 8.

From Table 8 and Figure 25b, we find that our proposed method and Xu's method obtain similar localization performance on Dataset2, which has no time interval between the training dataset and the testing dataset. Both the proposed algorithm and Xu's method achieve a localization error of about 6 m for 90% of the testing dataset on Dataset2 and produce an average positioning error of about 3 m. However, when there is a large time interval between the training dataset and the testing dataset, the positioning performance of our proposed algorithm is clearly better on Dataset1 and Dataset3, as Figure 25a,c and Table 8 show. Our proposed algorithm produces an average positioning error of 5.64 m on Dataset1, which is 19.4% less than Xu's method, and 4.24 m on sub-test3 of Dataset3, which is 25.2% less than Xu's method. The reason for this difference is that Xu's method also uses the AE for feature extraction. As in the previous experiments, the features extracted by the AE may be a simple copy of the input layer, which does not capture more essential features.
From the results of all experiments in Sections 3.5.1 and 3.5.2, we obtain the following main conclusions:

• When there is no time interval between the training dataset and the testing dataset, both Khatab's method and Xu's method can achieve good positioning results.
• When there is a time interval between the training dataset and the testing dataset, our proposed algorithm can obtain better positioning performance, which confirms that our proposed algorithm is more robust.

Calculation Complexity
To evaluate the calculation complexity of our proposed algorithm, we compare its total time (including the total training time and the prediction time for one sample on Dataset1) with that of Khatab's method and Xu's method. All algorithms are run on a PC with an Intel i5-6500 CPU and 8 GB of RAM. Table 9 lists the total time of the different algorithms and indicates that our proposed algorithm takes the longest. However, the total training time is only about 21 s and is incurred in the offline stage, so it does not influence online real-time positioning. Khatab's method adopts ELM to obtain positioning results during the positioning phase. The weights in the ELM are obtained by solving the equation set, whereas the MLP adopts the back-propagation algorithm to adjust the weights repeatedly, so the ELM algorithm obtains the shortest runtime in the offline stage. In the feature extraction phase, Khatab's method and the proposed algorithm adopt the AE and the DAE, respectively, and the DAE has a longer runtime than the AE. In the positioning phase, the ELM in Khatab's method adopts a simpler network structure than the MLP proposed in this paper. Therefore, Khatab's method obtains the shortest prediction time. Finally, although our algorithm has the longest prediction time for one sample, latency at the millisecond level is negligible.
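A timing comparison of this kind can be measured with a small wall-clock helper. The sketch below is one possible way to do it and is not the measurement code used for Table 9; `toy_train` and `toy_predict` are hypothetical stand-ins for a real training routine and a real predictor.

```python
import time

def time_call(func, *args, repeats=1):
    """Run func(*args) `repeats` times; return the last result and mean seconds per call."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed / repeats

# Hypothetical stand-ins, used only so that the sketch runs on its own.
def toy_train(samples):
    return sum(samples) / len(samples)

def toy_predict(model, sample):
    return model + sample

# Offline stage: time the whole training run once.
model, training_seconds = time_call(toy_train, [1.0, 2.0, 3.0])
# Online stage: average many single-sample predictions to smooth out timer noise.
prediction, seconds_per_sample = time_call(toy_predict, model, 0.5, repeats=1000)
```

Averaging over repeated predictions matters for the online stage, because a single millisecond-level call is close to the resolution and jitter of the timer.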

Conclusions
In this paper, we propose an indoor positioning algorithm combining SDAE and MLP, in which the SDAE performs feature extraction and the MLP performs regression positioning. To handle the dynamic fluctuation of Wi-Fi signals with time, we adopt the SDAE-based robust feature extraction method and then build an MLP-based regression model for indoor positioning. To evaluate our proposed algorithm, we conduct experiments on three datasets that represent different scenarios. The experimental results indicate that the SDAE-based feature extraction method extracts robust and time-independent features, which represent the raw Wi-Fi data well, and that the MLP-based regression model finds a good mapping function. Extensive experimental results demonstrate that the proposed algorithm and the other algorithms achieve similar and good positioning performance when there is a short time interval between the training dataset and the testing dataset. However, the proposed algorithm obtains an average positioning error of 4.24 m when there is a 52-day interval between the training dataset and the testing dataset, which is 24.3% less than Khatab's method, 25.2% less than Xu's method, and 14.7% less than the tree-fusion-based regression model. This confirms that our proposed algorithm is more robust than the other algorithms when there is a large time interval between the training dataset and the testing dataset.
In future work, we will continue to improve positioning accuracy, apply our proposed method in practical environments, and evaluate our proposed algorithm on more datasets. We will also consider extracting more essential features to eliminate the effect of time on the Wi-Fi signal. Furthermore, we will consider building other network structures to better map the extracted features to the target position.

Figure 1. Overall structure of our proposed positioning algorithm.

Figure 2. The dynamic fluctuation of Wi-Fi signal with time in a teaching building. (a) RSSI data distribution of different Wi-Fi signals at different times in a specific location. (b) RSSI changes of different Wi-Fi signals over a 10-day period in a specific location.

Figure 4. Correlation of RSSI samples of the original Wi-Fi data and the newly collected Wi-Fi data at the same position before SDAE and after SDAE.

Figure 6. Regression model network structure based on MLP.

Figure 7. The pseudocode of the robust Wi-Fi fingerprint positioning algorithm using SDAE and MLP.

Algorithm 1 (pseudocode, see Figure 7). Recoverable steps: data preprocessing (Normalize); a loop over each sdae_layer in all hidden layers of SDAE_model, with an inner loop over training epochs that uses new_training_dataset; and a second loop over training epochs.
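The preprocessing and noise-injection steps that precede denoising-autoencoder training can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes RSSI values between -104 dBm and 0 dBm with +100 marking an undetected access point, and it uses masking noise (randomly zeroing inputs) as the artificial noise, since the noise type is not specified in this section. All function names are hypothetical.

```python
import random

def normalize_rssi(rssi, missing_value=100, floor=-104.0):
    """Map RSSI readings in dBm to [0, 1]; undetected access points become 0."""
    scaled = []
    for value in rssi:
        if value == missing_value:
            # Assumption: the placeholder for "not detected" maps to the weakest level.
            scaled.append(0.0)
        else:
            clipped = min(max(value, floor), 0.0)
            scaled.append((clipped - floor) / (0.0 - floor))
    return scaled

def add_masking_noise(features, corruption_rate, rng):
    """Return a copy in which each feature is zeroed with probability corruption_rate."""
    return [0.0 if rng.random() < corruption_rate else x for x in features]

rng = random.Random(0)  # fixed seed so the corruption is repeatable
clean = normalize_rssi([-104, 0, 100, -52])
noisy = add_masking_noise(clean, corruption_rate=0.3, rng=rng)
# A denoising autoencoder would be trained to reconstruct `clean` from `noisy`.
```

Training each layer to recover the clean input from its corrupted copy is what discourages the identity mapping that a plain autoencoder can fall into.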
Firstly, we train AdaBoost, RandomForest, and KernelRidge as three meta-learners, then utilize these meta-learners to obtain a new training dataset. Secondly, we utilize the new training dataset to train the secondary-learner GBDT (Gradient Boosting Decision Tree), and then utilize the secondary-learner to obtain the stacked model predictions in Figure 8. Thirdly, we utilize the original training dataset to train XGBoost and LightGBM, and the two single models are used to obtain the testing dataset outputs, respectively, which are the XGBoost model predictions and LightGBM model predictions in Figure 8. The weights of the stacking model, XGBoost, and LightGBM are configured to w3, w1, and w2, respectively. Finally, the final prediction in Figure 8 is obtained by a weighted average, as Equation (11) shows:

$$\mathrm{final\_prediction} = P_{stack} \cdot w_3 + P_{xgb} \cdot w_1 + P_{lgb} \cdot w_2 \qquad (11)$$

where $P_{stack}$ is the stacked model predictions, $P_{xgb}$ is the XGBoost model predictions, and $P_{lgb}$ is the LightGBM model predictions.

Figure 8. The overall framework of the tree-fusion-based regression model.

3.1. Datasets
Besides the public UJIIndoorLoc dataset (named Dataset1 in this paper), we also collect Wi-Fi signals from 20 sampling points in our laboratory area. The distance between adjacent sampling points is 5 m or more. We collect Wi-Fi signals three times in these locations, which are recorded as the training dataset, validation dataset, and testing dataset, respectively. The sampling durations of each sampling point in the training dataset, validation dataset, and testing dataset are 12 s, 9 s, and 6 s, respectively. The collection interval between the training dataset and the validation dataset is about 13 min. The collection interval between the validation dataset and the testing dataset is 6 min. There is no time interval between the training dataset and testing dataset. This dataset is recorded as Dataset2. The sampling area is approximately 40 × 30. The positions of the sampling points are shown in Figure 9.

Figure 9. The sampling positions of Dataset2 in our laboratory.

Figure 10. The sampling positions of Dataset3 in our laboratory.

To evaluate the effect of the SDAE-based feature extraction method on localization performance, we use Dataset1 to train our proposed MLP-based regression model with and without the feature extraction operation. The comparison of positioning accuracy with and without SDAE-based feature extraction is shown in Figure 12. It can be seen from Figure 12 that using the SDAE-based feature extraction method yields higher localization accuracy, which confirms that the SDAE can extract robust and time-independent Wi-Fi fingerprint features from the original dynamic Wi-Fi dataset, and that using the features obtained by the SDAE method improves the positioning accuracy.

Figure 12. Comparison of CDF (Cumulative Distribution Function) positioning errors between the algorithm with feature extraction and the algorithm without feature extraction.

3.3. Positioning Performance of the Proposed Algorithm
3.3.1. Performance of Our Proposed Algorithm under Different Parameters
In this section, we evaluate the performance of our proposed algorithm on Dataset1 with different parameters.

Figure 15. The positioning accuracy using different activation functions. (a) CDF positioning errors. (b) Average positioning errors.

(4) Performance using different MLP epochs
The decline of the loss function with epoch on the validation dataset is plotted in Figure 17. As can be seen from Figure 17, when the epoch count reaches 150, val_loss approaches its minimum value, but there is still slight oscillation. When the epoch count reaches 200, val_loss becomes stable. Therefore, 200 epochs is the appropriate setting in our proposed algorithm.
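The visual judgment that val_loss has become stable can also be expressed as a simple rule. The sketch below is an illustrative heuristic, not the procedure used in the paper: it reports the first epoch at which the last `window` validation losses differ by no more than `tolerance`. Both parameters, the function name, and the toy loss values are assumptions.

```python
def first_stable_epoch(val_losses, window=10, tolerance=1e-3):
    """Return the first 1-based epoch whose trailing `window` losses span <= tolerance."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for end in range(window, len(val_losses) + 1):
        recent = val_losses[end - window:end]
        if max(recent) - min(recent) <= tolerance:
            return end
    return None  # the curve never settles within the recorded epochs

# Toy validation-loss curve, invented for illustration.
toy_losses = [1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2]
stable_at = first_stable_epoch(toy_losses, window=3, tolerance=0.01)
```

A rule like this makes the choice of epoch count repeatable across datasets, at the cost of two extra parameters that themselves need to be chosen.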

Figure 17. The loss function declining curve with epoch of the validation dataset.
In the learning-rate experiment shown in Figure 18, the MLP-based regression model obtains the best positioning accuracy when the learning rate is set to 0.0008. According to Figure 18a, the localization error for 80% of the testing dataset falls between 8 m and 10 m when using RMSprop under different learning rates, and the model with lr = 0.0008 obtains about 8 m. According to Figure 18b, the model obtains an average positioning error of 5.64 m when the learning rate is set to 0.0008, which is 12.7% less than the model with lr = 0.002 (the highest positioning error) and 1.2% less than the model with lr = 0.0007 (the second-best setting).

Figure 19. The CDF positioning error of our proposed algorithm under different sample densities.

Table 2. Average positioning error of our proposed algorithm under different sample densities.
Dataset                 Neighboring sample spacing (m)    Mean error (m)
Sample Density 3 m      3                                 2.84
Sample Density 5 m      5                                 3.56
Sample Density 7 m      7                                 4.18
Sample Density 10 m     10                                5.63

Figure 21. Comparison of CDF positioning errors between the proposed algorithm and the tree-fusion-based regression model.

Table 4. Average positioning errors of the proposed algorithm and the tree-fusion-based regression model.

Figure 22. The positioning errors (CDF) between the proposed algorithm and the tree-fusion-based regression model.

Table 5. The average positioning error comparison between the proposed algorithm and the tree-fusion-based regression model.

Figure 23. Comparison of CDF positioning errors between the proposed algorithm and the tree-fusion-based regression model.

Table 1. Hyperparameter values in our proposed SDAE and MLP.
Algorithm 1 (as Figure 7 shows) describes the general process of our proposed robust Wi-Fi fingerprint positioning algorithm using SDAE and MLP.

Table 3. Average positioning error of the algorithm proposed in this paper on three datasets.

Table 6. Comparison of average positioning errors between the proposed algorithm and the tree-fusion-based regression model.

Table 7. The average positioning errors between our proposed algorithm and Khatab's method.

Table 8. Comparison of the average positioning errors between the proposed algorithm and Xu's method.

Table 9. Calculation time of different algorithms.