An Enhanced Smartphone Indoor Positioning Scheme with Outlier Removal Using Machine Learning

In smartphone indoor positioning, owing to the strong complementarity between pedestrian dead reckoning (PDR) and WiFi, hybrid schemes that fuse the two are drawing increasing attention. However, WiFi outliers can easily degrade the performance of such schemes. Many studies have tried to remove them, either by improving WiFi individually or by enhancing the fusion scheme. Nevertheless, due to the inherent variation of the received signal strength (RSS), some outliers remain unremoved. To solve this problem, this paper first proposes an outlier detection and removal strategy based on Machine Learning (ML), called WiFi-AGNES (Agglomerative Nesting), which builds on the positioning characteristics of WiFi extracted while the pedestrian is static. The paper then proposes a second outlier detection and removal strategy, called WiFi-Chain, which builds on the positioning characteristics of WiFi and PDR, and on their complementary characteristics, extracted while the pedestrian is walking. Finally, a hybrid fusion scheme is proposed that integrates the two strategies, WiFi, and PDR with an inertial-navigation-system-based (INS-based) attitude and heading reference system (AHRS) via an Extended Kalman Filter (EKF), together with an Unscented Kalman Filter (UKF). The experimental results show that the two proposed strategies are effective and robust. With WiFi-AGNES, the maximum error (MaxE) is reduced by at least 66.5%; with WiFi-Chain, the MaxE of WiFi is less than 4.3 m; and the proposed scheme achieves the best performance, with a root mean square error (RMSE) of 1.43 m. Moreover, since the extracted characteristics are universal, the proposed scheme, which integrates the two characteristic-based strategies, is also highly robust.

PDR technology involves three critical procedures: azimuth estimation, step detection, and step length estimation [16]. Various algorithms have been designed for azimuth estimation based on the micro-electro-mechanical system (MEMS) inertial measurement unit (IMU) built into smartphones, and for step detection, such as peak detection, zero-crossing detection, and auto/cross-correlation. The step length is then estimated by models such as the Weinberg, Kim, and Scarlett models. Although a Robust Adaptive Kalman Filter (RAKF) [20] or a Robust PDR (R-PDR) algorithm [21] can enhance positioning performance and maintain accuracy in the short term, the low quality of smartphones' built-in MEMS IMUs means that PDR error inevitably accumulates without an upper bound.
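As a rough illustration of two of the procedures named above, the sketch below counts steps with simple threshold-plus-spacing peak detection on the accelerometer magnitude and estimates each step length with the Weinberg model, L = K (a_max - a_min)^(1/4). The threshold, minimum peak spacing, calibration constant K, and function names are hypothetical choices for illustration; this is not the step-detection or step-length pipeline used in this paper.

```python
import math


def detect_steps(acc_norm, threshold=10.5, min_gap=15):
    """Return sample indices of step peaks in an acceleration-magnitude series.

    A sample is a peak if it is a local maximum above `threshold` (m/s^2) and
    lies at least `min_gap` samples after the previous accepted peak.
    Both defaults are hypothetical and would need tuning on real data.
    """
    peaks = []
    for i in range(1, len(acc_norm) - 1):
        is_local_max = acc_norm[i - 1] < acc_norm[i] >= acc_norm[i + 1]
        if is_local_max and acc_norm[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def weinberg_step_length(acc_window, k=0.5):
    """Weinberg model: step length = K * (a_max - a_min) ** (1/4).

    `k` is a per-user calibration constant; 0.5 is only a placeholder.
    """
    return k * (max(acc_window) - min(acc_window)) ** 0.25


# Synthetic 2 s walk sampled at 50 Hz with a 2 Hz step frequency.
FS = 50
signal = [9.81 + 2.0 * math.sin(2 * math.pi * 2 * n / FS) for n in range(2 * FS)]
steps = detect_steps(signal)
# Estimate each step length from a window of samples around its peak.
lengths = [weinberg_step_length(signal[max(0, i - 12): i + 13]) for i in steps]
```

On this idealized signal one peak is found per half second; on real smartphone data the signal would normally be low-pass filtered first, and K calibrated per user.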
For WiFi technology, many studies have developed different positioning approaches, such as time of arrival (TOA) [57], time difference of arrival (TDOA) [58], and angle of arrival (AOA) [59]. However, these approaches require special hardware, which may not be feasible for a smartphone [25]. As an alternative, the received signal strength (RSS), which smartphones can observe directly, has been used for positioning in two approaches: trilateration [60] and fingerprinting [24]. The trilateration approach converts RSS into distances between access points (APs) and the smartphone; therefore, it requires the locations of at least three APs. Compared with trilateration, fingerprinting has gained much attention because it is infrastructure-free and can provide better performance [25]. Therefore, this paper focuses only on the fingerprinting approach. Fingerprinting consists of two phases: an offline phase and an online phase. In the offline phase, a radio map is established to describe the relationship between the RSS and the reference points (RPs) within the area of interest. In the online phase, the position is estimated by matching the real-time RSS measured by the smartphone against the established radio map. Previous studies have demonstrated that fingerprinting achieves an accuracy of approximately 5 m [26]. Moreover, due to the inherent RSS variation, WiFi positioning inevitably produces some outliers.
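To make the two fingerprinting phases concrete, the following sketch stores a toy offline radio map and performs online matching with weighted K-nearest neighbours (WKNN) in signal space, one of the NN-based matchers discussed in this paper. The radio-map values, the number of APs, k, and the function name are hypothetical; the sketch is not the WiFi algorithm used in the experiments.

```python
import math

# Offline phase: hypothetical radio map, RP coordinate (m) -> mean RSS (dBm) per AP.
RADIO_MAP = {
    (0.0, 0.0): [-45.0, -70.0, -80.0],
    (2.4, 0.0): [-55.0, -60.0, -78.0],
    (0.0, 2.4): [-60.0, -75.0, -65.0],
    (2.4, 2.4): [-68.0, -62.0, -60.0],
}


def wknn_locate(rss, radio_map, k=3, eps=1e-6):
    """Online phase: weighted K-nearest-neighbour matching in signal space."""
    # Euclidean distance between the live RSS vector and each RP fingerprint,
    # keeping only the k closest RPs.
    nearest = sorted((math.dist(rss, fp), rp) for rp, fp in radio_map.items())[:k]
    # Weight each neighbour by its inverse signal-space distance.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * rp[0] for w, (_, rp) in zip(weights, nearest)) / total
    y = sum(w * rp[1] for w, (_, rp) in zip(weights, nearest)) / total
    return x, y


# A live scan that resembles the fingerprint stored at RP (2.4, 0.0).
estimate = wknn_locate([-54.0, -61.0, -77.0], RADIO_MAP, k=3)
```

The estimate is pulled strongly toward the best-matching RP; a single noisy scan can instead match a distant RP, which is exactly the kind of outlier the strategies in this paper target.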
Given the above, a hybrid fusion scheme integrating PDR and WiFi faces a significant problem: WiFi outliers can easily degrade its performance. From the authors' point of view, there are two main types of solutions to this problem:

1.
Improving the WiFi individually: Five nearest neighbor (NN)-based algorithms are compared in [28] on the same radio map, named Database2, which contains 100 RPs with an interval of about 2.4 m between adjacent RPs. An algorithm based on the physical distance of RSS is proposed in [29] to estimate the position; it achieves a root mean square error (RMSE) of 4.49 m and a maximum error (MaxE) of about 10 m. An affinity propagation clustering (APC) algorithm proposed in [30] achieves an RMSE of 4.90 m and a MaxE of about 10 m. An optimal weight KNN (OWKNN) algorithm employing the Euclidean distance, proposed in [31], achieves an RMSE of 5.54 m and a MaxE of about 10 m. ZiLoc, proposed in [32], employs the Manhattan distance and achieves an RMSE of 5.88 m and a MaxE of about 10 m. An approximate-position-distance-based WKNN (APD-WKNN) algorithm proposed in [28] achieves an RMSE of 3.52 m and a MaxE of about 10 m. In summary, although many excellent algorithms have been proposed, WiFi outliers inevitably persist due to the inherent RSS variation, which poses a challenge for the hybrid fusion scheme.

2.
Enhancing the hybrid fusion scheme: Li [50] proposed a robustly constrained Kalman filter (KF) scheme, in which the integrated WiFi reaches a MaxE of over 13 m. To lessen the effect of outliers, a chi-square test based on a Gaussian assumption is employed. However, the Gaussian assumption is difficult to guarantee in practical applications because of the RSS variation caused by signal refraction, reflection, scattering, and multipath fading; the unremoved outliers therefore degrade the performance of the scheme. Hu [51] proposed a segment-based PDR/WiFi scheme. Although APs with an RSS below -80 dBm are discarded in the online phase, the integrated WiFi achieves a mean error (ME) of 5.3 m and a MaxE of over 20 m, which demonstrates that many outliers remain. Moreover, the scheme defines a fixed-size window and uses the averaged WiFi coordinate within the window for fusion. However, WiFi position estimates sometimes concentrate in a small area [52], so the averaged coordinate may itself be an outlier and degrade the performance of the scheme. Chen [53] proposed an INS/WiFi scheme that employs a pre-processing technique to enhance WiFi signal quality and Multi-dimensional Dynamic Time Warping (MDTW) to improve WiFi positioning. However, the improved WiFi achieves an ME of 6.33 m and a MaxE of 11.78 m for handheld motion in the first experiment, which demonstrates that many outliers remain. Although the scheme automatically adjusts the weighting coefficients of WiFi, the unremoved outliers are still integrated and inevitably degrade its performance. In summary, although many excellent hybrid fusion schemes have been proposed, WiFi outliers persist due to the inherent RSS variation and remain a challenge for the hybrid fusion scheme.

Remote Sens. 2021, 13, 1106
WiFi outliers inevitably arise from the inherent RSS variation, and unremoved outliers can easily degrade the performance of the scheme. To this end, this paper proposes two strategies for outlier detection and removal. The underlying rationale is that both strategies are derived from thoroughly mining and exploiting the positioning characteristics of PDR and WiFi and the complementarity between them. A notable difference from existing schemes is that they use this complementarity only to fuse PDR and WiFi and overlook its capacity for outlier detection and removal, which this paper fully exploits. The contributions of this paper are summarized as follows:

1.
We reasonably assume that the motion state of the pedestrian in smartphone indoor positioning is either static or walking. Based on the positioning characteristics of WiFi extracted while the pedestrian is static, we propose the first outlier detection and removal strategy, WiFi-AGNES (Agglomerative Nesting), which uses Machine Learning (ML).

2.
Based on the positioning characteristics of PDR and WiFi and their complementary characteristics, extracted while the pedestrian is walking, we propose the second outlier detection and removal strategy, WiFi-Chain.

3.
We propose a hybrid fusion scheme that integrates the two proposed strategies, fingerprinting-based WiFi, and PDR, using an inertial-navigation-system-based (INS-based) attitude and heading reference system (AHRS) via an Extended Kalman Filter (EKF) for PDR azimuth estimation and an Unscented Kalman Filter (UKF) for the final fusion.
The remainder of this paper is organized as follows. Related works are discussed in Section 2. Section 3 presents the two proposed outlier detection and removal strategies, and Section 4 introduces the proposed hybrid fusion scheme for smartphone indoor positioning. Section 5 discusses the experiments and results. Section 6 concludes the paper and outlines potential directions for future research.

Related Works
Hybrid fusion schemes integrating PDR and WiFi have been extensively researched over the past decade. Nevertheless, WiFi outliers remain a challenge for such schemes.
Here, we focus only on related work on the two types of solutions to this challenge described above. For improving WiFi individually: from the authors' point of view, previously proposed algorithms can be roughly divided into four types: NN-based algorithms, probability-based algorithms, AutoEncoder (AE)-based algorithms, and deep learning-based algorithms. In the first type, RADAR [27], the first RSS-based fingerprinting algorithm, employs the KNN algorithm and achieves a 50% error of 2.94 m and a MaxE of over 20 m; a WKNN algorithm proposed in [33] achieves an ME of 3.06 m and a MaxE of about 10 m; an improved semi-supervised affinity propagation-based WKNN (WKNN-SAP) algorithm proposed in [34] achieves an ME of 2.5 m and a MaxE of over 20 m in the entire environment; a feature scaling-based KNN (FS-KNN) proposed in [35] achieves an ME of 1.93 m and a MaxE of about 6.5 m; and a signal-weighted Euclidean distance-based WKNN (SWED-WKNN) algorithm proposed in [36] achieves an RMSE of 2.74 m and a MaxE of about 10 m. In the second type, a kernel-based Bayesian algorithm proposed in [37] achieves an RMSE of 2.71 m and a MaxE of about 12 m; a Weibull-Bayesian density model proposed in [25] achieves an RMSE of 2.59 m and a MaxE of about 9 m on the second floor; and a pairwise signal strength differences (PSSD) strategy proposed in [26] achieves an RMSE of 3.0 m and a MaxE of about 11 m on a Honor 8 smartphone. In the third type, three algorithms are compared in [28] on the same radio map, Database2.
An AE-based Multi-Layer Perceptron (MLP) algorithm proposed in [39] achieves an RMSE of 4.94 m and a MaxE of about 9 m; an AE-based Extreme Learning Machine (ELM) algorithm proposed in [40] achieves an RMSE of 4.35 m and a MaxE of about 9.5 m; and a Stacked Denoising Autoencoder (SDAE)-based MLP algorithm proposed in [41] achieves an RMSE of 3.89 m and a MaxE of about 9 m. In the last type, a deep neural network (DNN) with a data augmentation scheme proposed in [42] achieves an ME of 2.54 m and a MaxE of about 9 m. In [43], three deep learning-based algorithms are compared in an office test with data from 30 RPs: a Semi-supervised Deep Extreme Learning Machine (SDELM) algorithm [44] achieves an ME of 2.16 m and a MaxE of about 6 m; a Stacked Denoising Autoencoder-based DNN (SDA-DNN) algorithm [45] achieves an ME of 2.23 m and a MaxE of about 9.5 m; and a local feature-based deep long short-term memory (LF-DLSTM) algorithm [43] achieves an ME of 1.75 m and a MaxE of about 10 m. Clearly, some algorithms achieve improved performance; however, outliers still exist in each algorithm due to the inherent RSS variation.
Meanwhile, some algorithms achieve excellent performance in their experiments. For example, the best bin first (BBF)-based WKNN algorithm proposed in [46] achieves an ME of 1.5 m and a MaxE of over 6 m; the DNN-based algorithm CellinDeep proposed in [47] achieves an ME of 0.78 m and a MaxE of about 5.7 m; and a Bisecting K-means (BKM) algorithm proposed in [48] achieves an ME of 1.51 m and a MaxE of about 10 m. Although these algorithms perform excellently, their radio maps are too burdensome to establish to be practical. Moreover, outliers exist in these algorithms as well. Further, it is worth noting that the above performances were obtained in the static state, which implies that performance will be worse and outliers more distinct when the pedestrian is walking.
For enhancing the hybrid fusion scheme: Li [54] proposed an improved dead-reckoning (DR)/WiFi/MM scheme, which employs QC Level #1 to remove WiFi outliers and QC Level #2 to set the weight of WiFi. However, after QC Level #1, WiFi still achieves an RMSE of 5.9 m and a MaxE of over 10 m for handheld motion in test group #1, which demonstrates that many outliers remain. Further, QC Level #2 adjusts the weight by comparing the distance between the current WiFi coordinate and the previous fusion coordinate with a predefined threshold. However, this distance cannot effectively reflect the uncertainty of the current WiFi coordinate, because the accuracy of the previous fusion coordinate in the filter is itself uncertain. Therefore, an improperly weighted outlier can easily degrade the performance of the scheme. Zhou [55] proposed a PDR/WiFi scheme that employs a variable-size sliding window to improve the scheme of [51] at turning points. The integrated WiFi achieves an ME of 3.72 m, which demonstrates that some outliers remain. As in [51], the averaged coordinate may be an outlier and degrade the performance of the scheme. Deng [56] proposed a PDR/WiFi/Landmarks scheme. To improve WiFi, the scheme uses the previous fusion coordinate as a trusted area to limit the WiFi search area and proposes a strategy to remove outliers. However, the improved WiFi achieves an ME of 2.43 m and a MaxE of over 10 m, which demonstrates that many outliers remain. Moreover, a kernel density estimation-based model is used to adaptively estimate the measurement noise statistics of WiFi, which means that all WiFi position estimates, including the unremoved outliers, are integrated into the scheme, so its performance is inevitably degraded.
According to previous work, the inherent RSS variation inevitably produces outliers in WiFi algorithms, and unremoved outliers can easily degrade the performance of the scheme. Nevertheless, PDR and WiFi possess some distinct positioning characteristics that can be used to remove these outliers. Moreover, the complementarity between PDR and WiFi can be used not only to fuse them but also for outlier detection and removal; to the best of our knowledge, however, the latter has been overlooked in existing schemes. Therefore, we propose two strategies, WiFi-AGNES and WiFi-Chain, based on the positioning characteristics and the complementary characteristics at the positioning level; to the best of our knowledge, these strategies have not been addressed before.

Outlier Detection and Removal Strategy
In this section, we detail the two proposed outlier detection and removal strategies. The underlying rationale is that the strategies are derived from thoroughly mining and exploiting the positioning characteristics of PDR and WiFi and the complementarity between them. It is therefore a prerequisite to analyze these characteristics and this complementarity. Since different motion states exhibit different characteristics, we reasonably assume that the motion state of the pedestrian in smartphone indoor positioning is either static or walking, which in this paper is recognized via the step detection of PDR. The characteristics are then analyzed for each state, and the strategies are proposed accordingly.

Positioning Characteristics of WiFi
When the pedestrian is static, no steps are detected, so we analyze only the positioning characteristics of WiFi. In a one-minute experiment, the employed WiFi technology was run on its own while the pedestrian remained static; the results are shown in Figure 1. From Figure 1, we can extract four positioning characteristics of WiFi:

1.
Any two WiFi position estimates are independent of each other, which indicates that there is no cumulative error.

2.
Among the received WiFi, relatively, some are approximately accurate and therefore close to the true coordinate, while the others are jumping and therefore far from the true coordinate.

3.
Among the received WiFi, taking the radius of the blue circle as a threshold in the sense of Euclidean distance, the approximately accurate WiFi coordinates fall inside the circle and form a cluster, while the jumping WiFi are scattered and do not form a cluster.

4.
Among the received WiFi, more fall inside the circle than outside it.

WiFi-AGNES
Based on the positioning characteristics of WiFi extracted above, the idea of WiFi-AGNES is as follows. In smartphone indoor positioning, the approximately accurate WiFi described in the second characteristic is what we want. To obtain it, we rely on the third characteristic: given a predefined threshold in the sense of Euclidean distance, such as the blue circle shown in Figure 1a, the approximately accurate WiFi form a cluster while the jumping WiFi do not. Thus, if we obtain this cluster, we obtain the approximately accurate WiFi. This is a clustering problem, which naturally suggests the unsupervised hierarchical clustering algorithms of ML [61]; such an algorithm generates a dendrogram. Then, by taking the predefined threshold as the cluster distance, a set of clusters can be obtained from the dendrogram. This set contains not only the desired cluster but also clusters formed by jumping WiFi. To pick out the desired cluster, we rely on the fourth characteristic: the desired cluster contains the largest number of WiFi. Thus, by picking out the cluster with the largest number of WiFi, we obtain the desired cluster, which contains the approximately accurate WiFi. Based on this idea, the workflow of WiFi-AGNES is as follows.
First: While the pedestrian is static, a series of WiFi are received in chronological order:
$$T = \{(x_{t-n}, y_{t-n}), \ldots, (x_{t-1}, y_{t-1}), (x_t, y_t)\} \quad (1)$$
where T represents the set of received WiFi and $(x_{t-n}, y_{t-n})$ represents the WiFi coordinate received at t − n.
Second: We adopt average-linkage AGNES as the unsupervised hierarchical clustering algorithm and initialize it as follows:
$$C_{t-n} = \{(x_{t-n}, y_{t-n})\}, \qquad C_T = \{C_{t-n}, \ldots, C_{t-1}, C_t\} \quad (2)$$
where $C_{t-n}$ represents an initial cluster containing a single sample, namely the WiFi received at t − n in Equation (1), whose coordinate is regarded as the feature of the sample, and $C_T$ represents the set of all current clusters.
Third: An obvious prerequisite for starting AGNES is that $C_T$ contains a certain number of clusters. Therefore, when Equation (3) is satisfied, i.e., when the number of clusters in $C_T$ reaches the threshold THR1, we start AGNES and calculate its linkage metric as follows:
$$d_{avg}(C_m, C_n) = \frac{1}{|C_m|\,|C_n|} \sum_{i \in C_m} \sum_{j \in C_n} d_{ij}, \qquad M_{mn} = d_{avg}(C_m, C_n) \quad (4)$$
where $C_m$ and $C_n$ represent two different clusters, $d_{avg}(C_m, C_n)$ represents the average distance between them, M represents the linkage metric, whose entry in the mth row and nth column equals $d_{avg}(C_m, C_n)$, $|C_m|$ and $|C_n|$ represent the sample sizes of $C_m$ and $C_n$, i and j represent a sample in $C_m$ and $C_n$, respectively, THR1 represents a predefined threshold on the number of clusters in $C_T$ that can be determined empirically, and $d_{ij}$ represents the Euclidean distance between i and j:
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \quad (5)$$
where $(x_i, y_i)$ and $(x_j, y_j)$ represent the features of samples i and j, respectively. Based on the AGNES algorithm, the aggregation is then iterated until $C_T$ contains a single cluster, which generates the dendrogram described above.
Fourth: Taking the predefined threshold as the cluster distance, the set of clusters within the threshold is obtained from the dendrogram:
$$S = \{C_p \mid D(C_p) \le THR2\} \quad (6)$$
where S represents the set of clusters, $C_p$ represents a cluster in the dendrogram, $D(C_p)$ represents the cluster distance of $C_p$, which can be obtained from the linkage metric M, and THR2 represents the predefined threshold for cluster determination, which can be determined from the RP interval of the established radio map.
Fifth: The desired cluster is picked out from S:
$$C_{des} = \arg\max_{C \in S} Q(C) \quad (7)$$
where $C_{des}$ represents the desired cluster containing the approximately accurate WiFi, and $\arg\max Q(S)$ denotes picking out the cluster in S with the largest number of WiFi.
Finally: Based on $C_{des}$, the current coordinate is calculated as:
$$(x_{cur}, y_{cur}) = \frac{1}{|C_{des}|} \sum_{q \in C_{des}} (x_q, y_q) \quad (8)$$
where $(x_{cur}, y_{cur})$ represents the current coordinate obtained via WiFi-AGNES, $|C_{des}|$ represents the sample size of $C_{des}$, q represents the qth sample in $C_{des}$, and $(x_q, y_q)$ represents its coordinate. Thus, based on the positioning characteristics of WiFi extracted in the static state, WiFi-AGNES uses ML to detect and remove outliers, obtaining both the approximately accurate WiFi and the current coordinate $(x_{cur}, y_{cur})$. Since the characteristics are universal, the characteristic-based WiFi-AGNES is robust. We further integrate WiFi-AGNES into the proposed hybrid fusion scheme to mitigate the effect of WiFi outliers and improve the performance of the scheme when the pedestrian is static. Details of unsupervised hierarchical clustering and AGNES are given in [61,62].
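The workflow above can be sketched in plain Python as follows. This is an illustrative reading of the steps, not the authors' code: the function names are hypothetical, the start condition on THR1 is assumed to mean "at least THR1 clusters", and instead of building the full dendrogram and then cutting it at THR2, the loop stops merging once the closest pair of clusters is farther apart than THR2. Because average linkage produces non-decreasing merge distances, stopping early yields the same flat clusters as cutting the complete dendrogram at THR2.

```python
import math


def d_avg(cm, cn):
    """Average linkage: mean pairwise Euclidean distance between two clusters."""
    return sum(math.dist(i, j) for i in cm for j in cn) / (len(cm) * len(cn))


def wifi_agnes(wifi_coords, thr1, thr2):
    """Sketch of WiFi-AGNES on a window of WiFi fixes received while static.

    Returns ((x_cur, y_cur), desired_cluster), or None if fewer than thr1
    fixes are available (assumed reading of the THR1 start condition).
    """
    clusters = [[p] for p in wifi_coords]      # Second step: one sample per cluster
    if len(clusters) < thr1:                   # Third step: prerequisite on THR1
        return None
    # Third/fourth steps: merge the closest pair (average linkage) until the
    # next merge would exceed thr2, i.e. cut the dendrogram at thr2.
    while len(clusters) > 1:
        dist, m, n = min(
            (d_avg(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > thr2:
            break
        clusters[m].extend(clusters.pop(n))    # n > m, so index m stays valid
    c_des = max(clusters, key=len)             # Fifth step: most populated cluster
    x_cur = sum(p[0] for p in c_des) / len(c_des)  # Final step: centroid
    y_cur = sum(p[1] for p in c_des) / len(c_des)
    return (x_cur, y_cur), c_des


# Five fixes near (1, 1) plus two jumping fixes; THR2 set to a 2.4 m RP interval.
fixes = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2), (0.8, 0.9),
         (8.0, 8.0), (-6.0, 3.0)]
result = wifi_agnes(fixes, thr1=5, thr2=2.4)
```

The two jumping fixes stay as singleton clusters, so the most populated cluster contains only the five consistent fixes and its centroid is (1.0, 1.0). The naive pairwise search is quadratic per merge, which is acceptable for the short windows involved here.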

Positioning Characteristics of PDR and WiFi
When the pedestrian is walking, an experiment was conducted in which the PDR technology and the employed WiFi technology were run simultaneously for about forty seconds; the results are shown in Figures 2 and 3. The positioning characteristics of PDR are as follows:

1.
The maximum error difference of PDR does not exceed 0.3 m, which indicates that the relative accuracy of PDR is high, regardless of the absolute error.

2.
Assuming no absolute error at the first step, the absolute error at the last step reaches 11.83 m, which indicates that PDR accumulates error over the long term.
From Figure 3, we can extract five positioning characteristics of WiFi:

1.
Any two WiFi position estimates are independent of each other, which indicates that there is no cumulative error.

2.
Among the received WiFi, relatively, some are approximately accurate, while the others are jumping.

3.
Among the received WiFi, taking 3 m as the dividing line, the approximately accurate WiFi outnumber the jumping WiFi.

4.
Due to the inherent RSS variation, the jumping WiFi occur randomly.

5.
Among the received WiFi, the jumping WiFi is received intermittently.

Complementary Characteristics between PDR and WiFi
According to the positioning characteristics extracted above, there is clearly a strong complementarity between PDR and WiFi: WiFi can correct the cumulative error of PDR, while PDR can restrain the jumping of WiFi. Moreover, this complementarity can also be used for outlier detection and removal, which WiFi-Chain fully exploits. To this end, we first couple PDR and WiFi by means of a vector, as given in Equation (9). Then, based on the second and fourth characteristics of WiFi, the error level of the WiFi received at any moment can be divided into two relative levels: approximately accurate $L_0$ and jumping $L_1$. Consider any two adjacent moments while the pedestrian is walking, as shown in Figure 4, where the circle labelled $W_j^{L_i}$ represents the received WiFi with error level $L_i$ at moment j, the upward line segment labelled $V_j^{L_i}$ represents the vector coupled by Equation (9), and the green segment represents the time update of the scheme via PDR.
Based on the first positioning characteristic of PDR and of WiFi, an obvious complementary characteristic can be extracted from Figure 4: given a predefined threshold, if the two WiFi received at any two adjacent moments have different error levels, the two corresponding coupled vectors will differ greatly; otherwise, the two vectors will be approximately equal:
$$\left\| V_{j+1}^{L_k} - V_{j}^{L_i} \right\|_2 \begin{cases} > THR3, & L_i \neq L_k \\ \le THR3, & L_i = L_k \end{cases}$$
where $\|\cdot\|_2$ represents the Euclidean norm and THR3 represents the predefined threshold for determining the complementary characteristic, which can be determined from the 68% error of the WiFi technology. All cases of the two coupled vectors at any two adjacent moments, together with the corresponding complementary characteristics, are summarized in Table 1.
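A minimal sketch of this complementary-characteristic test is shown below. The coupled vector of Equation (9) is assumed here, purely for illustration, to be the vector from the PDR coordinate to the WiFi coordinate at the same moment; the helper names and the THR3 value are hypothetical. The test works because PDR's relative accuracy is high over one step, so the PDR error is nearly the same at adjacent moments and a large change in the coupled vector is attributable to a change in the WiFi error level.

```python
import math


def coupled_vector(wifi, pdr):
    """Assumed coupling: vector from the PDR coordinate to the WiFi coordinate."""
    return (wifi[0] - pdr[0], wifi[1] - pdr[1])


def same_error_level(v_prev, v_curr, thr3):
    """True if ||v_curr - v_prev||_2 <= thr3, i.e. the WiFi at the two adjacent
    moments are judged to share an error level; False if the levels differ."""
    return math.dist(v_prev, v_curr) <= thr3


THR3 = 3.0  # hypothetical; the text ties THR3 to the 68% error of WiFi
pdr_prev, pdr_curr = (0.0, 0.0), (0.7, 0.0)      # one 0.7 m step east
v0 = coupled_vector((0.5, 0.2), pdr_prev)        # WiFi close to the PDR track
v1 = coupled_vector((1.2, 0.1), pdr_curr)        # still close to the track
v_jump = coupled_vector((7.0, 5.0), pdr_curr)    # a jumping WiFi
consistent = same_error_level(v0, v1, THR3)
inconsistent = same_error_level(v0, v_jump, THR3)
```

Note that the test cannot tell two accurate WiFi from two similarly jumping ones; it only flags a change of level, which is why the chaining logic that follows is needed.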

WiFi-Chain
Based on the positioning characteristics of PDR and WiFi and their complementary characteristics, the idea of WiFi-Chain is as follows. In smartphone indoor positioning, the three cases occur alternately according to the fifth positioning characteristic of WiFi. However, only Case1, which contains two approximately accurate WiFi, is what we want. The significant difference between Case1 and the others is that, according to the third positioning characteristic of WiFi, Case1 occurs more often. We can therefore accumulate the number of occurrences of Case1, first separating it from Case3 based on their different complementary characteristics. Although Case1 and Case2 have the same complementary characteristic, Case2 easily transforms into Case3 due to the fourth positioning characteristic of WiFi and can then be separated from Case1 based on the complementary characteristic as well. Based on this idea, the workflow of WiFi-Chain is as follows. First: Initialization. We define a combined variable (Num, V) and initialize it when receiving the first WiFi $W_{t_0}^{L_i}$ as follows: where the minimum value of Num is 1, and V where THR4 represents the predefined threshold of the cumulative number of Num and can be empirically determined.
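The following is a deliberately simplified, hypothetical sketch of the chaining idea described above, not the authors' full WiFi-Chain workflow. It assumes the same illustrative coupled vector (WiFi coordinate minus PDR coordinate), treats adjacent coupled vectors within THR3 as Case1/Case2-like so that the chain grows, treats a larger change as Case3-like so that the chain restarts at the current WiFi, and accepts a WiFi only once the counter Num reaches THR4. The function name and all threshold values are assumptions.

```python
import math


def wifi_chain_sketch(wifi_fixes, pdr_fixes, thr3, thr4):
    """Hypothetical simplification of WiFi-Chain; returns indices of accepted WiFi.

    Num counts consecutive mutually consistent coupled vectors. A jump breaks
    the chain, and because the next accurate WiFi also differs from the jump,
    the chain restarts again there (Case2 turning into Case3).
    """
    accepted = []
    num, v_prev = None, None
    for idx, (w, p) in enumerate(zip(wifi_fixes, pdr_fixes)):
        v = (w[0] - p[0], w[1] - p[1])            # assumed coupled vector
        if v_prev is None or math.dist(v, v_prev) > thr3:
            num = 1                               # (re)initialize the chain
        else:
            num += 1                              # chain grows by one link
        v_prev = v
        if num >= thr4:
            accepted.append(idx)
    return accepted


# PDR walks east in 0.7 m steps; WiFi tracks it with a constant offset,
# except for one jumping WiFi at index 3.
pdr = [(0.7 * k, 0.0) for k in range(8)]
wifi = [(x + 0.5, 0.2) for x, _ in pdr]
wifi[3] = (pdr[3][0] + 6.0, 5.0)
accepted = wifi_chain_sketch(wifi, pdr, thr3=3.0, thr4=3)
```

In this toy run the jumping WiFi and the fix immediately after it are not accepted, and acceptance resumes once the chain has been rebuilt to THR4 links.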
In summary, in WiFi-Chain, we sufficiently utilize the complementarity of PDR and WiFi by coupling them in a way of vector, and based on the extracted positioning characteristic and the complementary characteristics, we obtain the approximately accurate WiFi even if the error level of the initial WiFi is unknown, and achieve the outlier detection and removal simultaneously. Since characteristics are universal, the characteristic-based WiFi-Chain is robust. Further, we will integrate the WiFi-Chain into the proposed hybrid fusion scheme to mitigate the effect of outliers of WiFi and improve the performance of the scheme when the pedestrian is walking. Figure 5 overviews the architecture of the proposed hybrid fusion scheme for smartphone indoor positioning. In the scheme, we integrated the two proposed outlier detection and removal strategies, and considering the effect of the pitch, roll, magnetic inclination, and magnetic declination on azimuth, we design an INS-based AHRS via EKF to estimate the azimuth of PDR, and a UKF is designed to fuse the PDR and WiFi ultimately. Moreover, although the smartphone contains four basic poses, i.e., handheld, swinging, calling, and pocket [16], this paper only considers the most common handheld pose. even if the error level of the initial WiFi is unknown, and achieve the outlier detection and removal simultaneously. Since characteristics are universal, the characteristic-based WiFi-Chain is robust. Further, we will integrate the WiFi-Chain into the proposed hybrid fusion scheme to mitigate the effect of outliers of WiFi and improve the performance of the scheme when the pedestrian is walking. Figure 5 overviews the architecture of the proposed hybrid fusion scheme for smartphone indoor positioning. 
In the scheme, we integrate the two proposed outlier detection and removal strategies. Considering the effects of pitch, roll, magnetic inclination, and magnetic declination on the azimuth, we design an INS-based AHRS via an EKF to estimate the azimuth for PDR, and a UKF to fuse PDR and WiFi. Moreover, although a smartphone is commonly carried in four basic poses, i.e., handheld, swinging, calling, and pocket [16], this paper considers only the most common handheld pose.

PDR Technology
PDR is a recursive positioning technology comprising three critical procedures: azimuth estimation, step detection, and step length estimation. The principle of PDR can be expressed as follows: where $P_t = (x_t, y_t)$ and $P_{t-1} = (x_{t-1}, y_{t-1})$ represent the pedestrian's coordinates at $t$ and $t-1$, respectively, and $D_t$ and $\psi_t$ represent the estimated step length and azimuth at $t$, respectively.
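The recursive PDR update can be sketched as below. The axis convention is an assumption, not taken from the source: x points east, y points north, and the azimuth is measured clockwise from north in radians, so that $x_t = x_{t-1} + D_t \sin\psi_t$ and $y_t = y_{t-1} + D_t \cos\psi_t$. The helper names `pdr_step` and `pdr_track` are hypothetical.

```python
import math


def pdr_step(prev, step_length, azimuth):
    """Advance the position by one detected step.

    Assumed convention: x east, y north, azimuth clockwise from north (radians).
    """
    x_prev, y_prev = prev
    return (x_prev + step_length * math.sin(azimuth),
            y_prev + step_length * math.cos(azimuth))


def pdr_track(start, steps):
    """Accumulate a trajectory from (step_length, azimuth) pairs."""
    track = [start]
    for d, psi in steps:
        track.append(pdr_step(track[-1], d, psi))
    return track


# One 0.7 m step heading north, then one heading east
track = pdr_track((0.0, 0.0), [(0.7, 0.0), (0.7, math.pi / 2)])
```

Because each position is built on the previous one, any error in $D_t$ or $\psi_t$ is carried forward into every later position.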

Attitude Updating of INS Mechanization
The basic idea of attitude updating in the INS mechanization is that the attitude is obtained by integrating the angular rates provided by the triple-axis gyroscope. Owing to the low quality of the MEMS IMU built into smartphones, we neglect certain small correction terms of the INS mechanization (i.e., those related to the Earth's rotation), since they bring only a slight improvement in navigation performance. The discrete attitude updating algorithm of the INS mechanization with a unit quaternion can be expressed as follows:
Remote Sens. 2021, 13, 1106 12 of 23
where $\omega_{ib}^{b}$ represents the angular rate measurement vector from the triple-axis gyroscope, $\varepsilon_{g}^{b}$ represents the drift vector of the triple-axis gyroscope, $\Delta t$ is the time interval between two adjacent epochs, and $Q_{b}^{n_c}$ is the quaternion rotation from the body coordinate system (i.e., b-frame) to the computed navigation coordinate system (i.e., $n_c$-frame).
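A minimal sketch of a discrete quaternion attitude update, together with the standard conversion of a unit quaternion to a rotation matrix, is shown below. Assumptions: quaternions are scalar-first Hamilton quaternions (so $q_1$ is the scalar part), the drift-compensated rotation vector $(\omega_{ib}^{b} - \varepsilon_{g}^{b})\Delta t$ is treated as constant over one interval (no coning correction), and the Earth's rotation is neglected as in the text. The function names are hypothetical.

```python
import numpy as np


def quat_multiply(p, q):
    """Hamilton product p (x) q of scalar-first quaternions [w, x, y, z]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    ])


def attitude_update(q_prev, omega_ib_b, eps_g_b, dt):
    """One discrete update of Q_b^{n_c}, with the Earth's rotation neglected."""
    # Drift-compensated rotation vector over the interval
    dtheta = (np.asarray(omega_ib_b, float) - np.asarray(eps_g_b, float)) * dt
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        dq = np.array([1.0, 0.0, 0.0, 0.0])  # negligible rotation
    else:
        dq = np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * dtheta / angle))
    q = quat_multiply(q_prev, dq)  # body-frame increment multiplies on the right
    return q / np.linalg.norm(q)   # renormalise to keep a unit quaternion


def quat_to_dcm(q):
    """Rotation matrix C_b^{n_c} from a unit scalar-first quaternion."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1*q1 + q2*q2 - q3*q3 - q4*q4, 2*(q2*q3 - q1*q4), 2*(q2*q4 + q1*q3)],
        [2*(q2*q3 + q1*q4), q1*q1 - q2*q2 + q3*q3 - q4*q4, 2*(q3*q4 - q1*q2)],
        [2*(q2*q4 - q1*q3), 2*(q3*q4 + q1*q2), q1*q1 - q2*q2 - q3*q3 + q4*q4],
    ])
```

For example, 100 updates at 1 rad/s about the body z-axis with $\Delta t = 0.01$ s yield a rotation of 1 rad about z; a gyroscope reading equal to the drift leaves the quaternion unchanged.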
The transformation from the quaternion rotation $Q_{b}^{n_c} = [q_1, q_2, q_3, q_4]$ to the rotation matrix $C_{b}^{n_c}$ can be expressed as follows:

System Model
As the low-grade MEMS IMU in a smartphone has a large gyroscope drift, which degrades the accuracy of the AHRS, an EKF is usually used to fuse multiple measurements to improve performance. The state variables are defined as: where $\boldsymbol{\phi}$ represents the attitude error vector in the n-frame and $\varepsilon_{g}^{b}$ represents the drift of the triple-axis gyroscope.
The discrete linearization of the system error model can be expressed as follows: where $\delta x_{t-1,t-1}$ represents the previous error state vector, $\delta x_{t,t-1}$ represents the predicted error state vector, $\delta z_t$ represents the measurement misclosure vector, $H_t$ represents the design matrix, $w_t$ and $v_t$ represent the system process noise and measurement noise, respectively, and $\Phi_{t-1}$ represents the $6 \times 6$ state transition matrix: where $\Delta t$ is the time interval between two adjacent epochs. The rotation matrix $\hat{C}_{b}^{n}$ from the body coordinate system (i.e., b-frame) to the true navigation coordinate system (i.e., n-frame) can be calculated via feedback as follows: The estimated azimuth $\psi_t$ for PDR can then be calculated as follows: where $C_{ij}$ is the element in the $i$th row and $j$th column of $\hat{C}_{b}^{n}$. The azimuth rate $\dot{\psi}_t$ can be calculated as follows: where the subscript 3 denotes taking the third component.

Measurement Model
Because accelerometers sense Earth's gravity, the triple-axis accelerometer readings are usually used to update the AHRS in the absence of external acceleration. Whether external acceleration exists is determined from the absolute difference between the magnitude of the triple-axis accelerometer measurement vector and that of Earth's gravity, which can be expressed as: where $f^{b}$ and $f^{n} = [0\;\,0\;\,-g]^{T}$ represent the triple-axis accelerometer measurement vector and the Earth gravity vector, respectively, and THR5 represents the predefined threshold for determining external acceleration, which can be empirically determined. The accelerometer measurement model can then be expressed as follows: where $f^{b}$ represents the accelerometer reading vector and $v_1$ represents the measurement noise.
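The generic error-state Kalman filter steps behind this system and measurement model, plus the external-acceleration gate, can be sketched as follows. This is a minimal sketch under stated assumptions: $\Phi_{t-1}$, $H_t$, $Q$ and $R$ are supplied by the caller (the paper's specific matrices are not reproduced), the gate compares magnitudes as $|\,\|f^{b}\| - g\,| < THR5$, and the gravity value `G = 9.81` m/s² is an assumed constant. The default `thr5=0.05` follows the value reported in the experimental setup; function names are hypothetical.

```python
import numpy as np

G = 9.81  # assumed gravity magnitude in m/s^2; not specified in the text


def is_quasi_static(f_b, thr5=0.05, g=G):
    """True if | ||f^b|| - g | < THR5, i.e. no external acceleration detected."""
    return abs(np.linalg.norm(f_b) - g) < thr5


def ekf_predict(dx, P, Phi, Q):
    """Propagate the error state and its covariance with Phi_{t-1}."""
    return Phi @ dx, Phi @ P @ Phi.T + Q


def ekf_update(dx_pred, P_pred, dz, H, R):
    """Correct the predicted error state with the misclosure vector dz."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # gain P H^T S^-1 (P, S symmetric)
    dx = dx_pred + K @ (dz - H @ dx_pred)
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return dx, P
```

In use, the accelerometer update would only be applied when `is_quasi_static` returns True; otherwise the filter keeps the prediction.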
Because magnetometers sense the geomagnetic field, the triple-axis magnetometer readings are usually used to update the AHRS in the absence of external magnetic interference. However, man-made infrastructure causes frequent magnetic disturbances in indoor environments; to mitigate their effect, the weight of the magnetometer measurements is set to a very small value in our model. Moreover, to avoid the effect of horizontal angle errors, the triple-axis magnetometer readings, rather than an absolute azimuth, are used in a tightly coupled manner. The magnetometer measurement model can be expressed as follows: where $m^{b}$ represents the triple-axis magnetometer measurement vector, $v_2$ represents the measurement noise, $H^{m} = [0\;\,H\;\,0]^{T}$ represents the geomagnetic field in the magnetic coordinate system (i.e., m-frame), and $C_{m}^{n}$ represents the rotation matrix from the m-frame to the n-frame, which can be expressed as follows: where $\eta_x = 47.22^{\circ}$ and $\eta_z = 4.68^{\circ}$ represent the magnetic inclination and declination angles in Wuhan, respectively, which can be calculated from the IGRF model [63,64]. Based on the system model and measurement model, the AHRS is executed as shown in Figure 6.
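The role of $C_{m}^{n} H^{m}$ is to express the local geomagnetic vector in the navigation frame, which the attitude then maps into the body frame to predict the magnetometer reading. The sketch below writes that n-frame vector directly from the two angles instead of reproducing the paper's $C_{m}^{n}$. Assumptions: an east-north-up (ENU) navigation frame, inclination positive downward, declination positive east of true north; the paper's own sign conventions are not stated in this section, and the function names are hypothetical.

```python
import numpy as np


def earth_field_n(H, incl_deg=47.22, decl_deg=4.68):
    """Geomagnetic field vector in an assumed ENU navigation frame.

    H is the total field intensity (any unit); inclination is taken as positive
    downward and declination as positive east of true north (assumed signs).
    """
    inc = np.radians(incl_deg)
    dec = np.radians(decl_deg)
    return H * np.array([
        np.cos(inc) * np.sin(dec),   # east component
        np.cos(inc) * np.cos(dec),   # north component
        -np.sin(inc),                # up component (field dips downward)
    ])


def predicted_mag(C_b_n, m_n):
    """Predicted noise-free magnetometer reading m^b = (C_b^n)^T m^n."""
    return C_b_n.T @ m_n
```

With zero inclination and declination the field points due north, i.e. it reduces to $[0\;H\;0]^T$, matching the form of $H^{m}$ in the text.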

Step Detection and Step Length Estimation
We employ the peak detection algorithm for step detection because of its low computational cost and high success rate, and the Weinberg model for step length estimation because of its practicality [15]. The Weinberg model assumes that the step length $D$ is proportional to the vertical movement of the human hip and can be expressed as follows: where $a_{z}^{\max}$ and $a_{z}^{\min}$ represent the maximum and minimum values of the vertical acceleration in one step period, respectively, and $K$ represents a predefined parameter used to obtain an appropriate step length, which can be empirically determined.
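A minimal sketch of peak-based step detection followed by Weinberg step-length estimation is given below, using the commonly cited form of the Weinberg model, $D = K\sqrt[4]{a_{z}^{\max} - a_{z}^{\min}}$. The peak threshold `peak_thr`, the minimum peak spacing `min_gap`, the value `K = 0.5`, and the function names are hypothetical; the text only states that $K$ is determined empirically.

```python
def detect_steps(acc_z, peak_thr, min_gap):
    """Indices of local maxima above peak_thr, at least min_gap samples apart."""
    peaks = []
    for i in range(1, len(acc_z) - 1):
        is_peak = acc_z[i] > acc_z[i - 1] and acc_z[i] >= acc_z[i + 1]
        if is_peak and acc_z[i] > peak_thr and (not peaks or i - peaks[-1] >= min_gap):
            peaks.append(i)
    return peaks


def weinberg_step_length(acc_z_step, K):
    """Weinberg model: D = K * (a_z^max - a_z^min) ** (1/4) over one step period."""
    return K * (max(acc_z_step) - min(acc_z_step)) ** 0.25


# Toy vertical-acceleration trace in m/s^2 containing two peaks
acc = [9.8, 11.0, 9.8, 8.6, 9.8, 11.2, 9.8, 8.4, 9.8]
peaks = detect_steps(acc, peak_thr=10.5, min_gap=2)  # [1, 5]
D = weinberg_step_length(acc[peaks[0]:peaks[1] + 1], K=0.5)
```

The minimum spacing suppresses double counting within one step; at a 50 Hz sampling rate it would typically correspond to a few tenths of a second.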

WiFi Technology
In this paper, a probabilistic fingerprinting algorithm, namely the Weibull probability density function fingerprinting algorithm based on the Weibull-Bayesian density model with dynamic bins, is employed, since it not only achieves excellent positioning accuracy but also reduces the workload of establishing the radio map [25]. Figure 7 presents an overview of the architecture of the employed WiFi technology.
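The core of a Bayesian fingerprinting estimator with a Weibull likelihood can be sketched as below. This is a generic illustration, not the dynamic-bin Weibull-Bayesian model of [25]: each reference point (RP) stores hypothetical per-AP Weibull parameters `(k, lam)`, RSS values are shifted by an assumed `RSS_OFFSET` so they become positive, the prior is uniform, and the position is the posterior-weighted mean of RP coordinates. All names are hypothetical.

```python
import math

RSS_OFFSET = 100.0  # hypothetical shift so that RSS in dBm becomes positive


def weibull_logpdf(x, k, lam):
    """Log of the Weibull pdf (k/lam) * (x/lam)**(k-1) * exp(-(x/lam)**k), x > 0."""
    if x <= 0:
        return -math.inf
    z = x / lam
    return math.log(k / lam) + (k - 1) * math.log(z) - z ** k


def locate(scan, radio_map):
    """Posterior-weighted mean of RP coordinates under a uniform prior.

    scan: {ap_id: rss_dbm}
    radio_map: list of ((x, y), {ap_id: (k, lam)}) entries, one per RP.
    """
    log_post = []
    for _, params in radio_map:
        ll = 0.0
        for ap, rss in scan.items():
            if ap in params:  # APs missing from an RP are simply skipped here
                k, lam = params[ap]
                ll += weibull_logpdf(rss + RSS_OFFSET, k, lam)
        log_post.append(ll)
    best = max(log_post)
    if best == -math.inf:
        raise ValueError("scan is incompatible with every reference point")
    weights = [math.exp(l - best) for l in log_post]  # stabilised likelihoods
    total = sum(weights)
    x = sum(w * rp[0][0] for w, rp in zip(weights, radio_map)) / total
    y = sum(w * rp[0][1] for w, rp in zip(weights, radio_map)) / total
    return x, y


radio_map = [((0.0, 0.0), {"a": (20.0, 60.0)}), ((10.0, 0.0), {"a": (20.0, 30.0)})]
```

Working in log space avoids underflow when many APs are multiplied together; skipping unseen APs is a simplification that a full implementation would handle more carefully.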

Hybrid Fusion via UKF
Although researchers have adopted various filters to fuse PDR and WiFi, such as the KF, EKF, and particle filter (PF) [65,66], this paper adopts the UKF, a nonlinear filtering algorithm based on the unscented transform, for the following reasons:

1. The proposed hybrid fusion scheme is nonlinear when the azimuth is treated as a state variable. In this situation, the KF is inapplicable because of its linear nature, and the EKF is also unsuitable because its linearization error would degrade the positioning accuracy [52].
2. Compared with the PF, the UKF is a lighter filter and is therefore more suitable for real-time positioning on resource-limited smartphones.
The state equation of the UKF is designed as follows: where $(x_t, y_t)$, $s_t$, $\psi_t$, and $\dot{\psi}_t$ represent the two-dimensional coordinates, speed, azimuth, and azimuth rate, respectively, $\Delta t$ represents the time interval between two adjacent epochs, and $w_t$ represents the system process noise with covariance matrix $Q_t = E[w_t w_t^{T}]$, which can be empirically determined.
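The UKF prediction step can be sketched with the scaled unscented transform. The sketch assumes a constant-speed, constant-azimuth-rate state equation for the state $[x_t, y_t, s_t, \psi_t, \dot{\psi}_t]$, with x east, y north, and the azimuth clockwise from north; this form is an assumption consistent with the listed state variables, not a transcription of the paper's equation. The tuning values `alpha`, `beta`, `kappa` and all function names are illustrative, and azimuth wrap-around in the sigma-point mean is ignored for brevity.

```python
import numpy as np


def motion_model(state, dt):
    """Assumed state transition for [x, y, s, psi, psi_rate] (x east, y north)."""
    x, y, s, psi, psi_rate = state
    return np.array([
        x + s * dt * np.sin(psi),   # east displacement
        y + s * dt * np.cos(psi),   # north displacement
        s,                          # constant speed
        psi + psi_rate * dt,        # azimuth driven by azimuth rate
        psi_rate,                   # constant azimuth rate
    ])


def sigma_points(mean, cov, alpha=0.1, beta=2.0, kappa=0.0):
    """Scaled unscented transform: 2n+1 sigma points and their weights."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)          # lower-triangular factor
    pts = np.vstack([mean, mean + L.T, mean - L.T])  # rows: mean, mean +/- columns of L
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    return pts, wm, wc


def ukf_predict(mean, cov, Q, dt):
    """Propagate sigma points through motion_model and recombine them."""
    pts, wm, wc = sigma_points(mean, cov)
    prop = np.array([motion_model(p, dt) for p in pts])
    m = wm @ prop
    d = prop - m
    P = (wc[:, None] * d).T @ d + Q
    return m, P
```

For components that evolve linearly (here the azimuth and azimuth rate), the unscented transform reproduces the exact mean and covariance, while the sine and cosine terms are handled without Jacobians.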
The measurement equation of the UKF is designed as follows: where $(x_t, y_t)$ represents the two-dimensional measured coordinates from WiFi, $s_t = D_t/\Delta t$ represents the measured speed from PDR, $\psi_t$ and $\dot{\psi}_t$ represent the measured azimuth and azimuth rate from the AHRS, $H_t$ represents the design matrix, which is the five-dimensional identity matrix $I_{5\times 5}$, and $v_t$ represents the measurement noise with covariance matrix $R_t = E[v_t v_t^{T}]$, which can be empirically determined. The implementation of the UKF algorithm is detailed in [52].
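Because $H_t = I_{5\times 5}$, the measurement function is linear, and when sigma points are drawn from the predicted mean and covariance the unscented update reduces to the closed-form Kalman update. The sketch below uses that closed form and wraps the azimuth innovation into $[-\pi, \pi)$; the state order, radians for angles, and the function names are assumptions.

```python
import numpy as np


def ukf_update_identity(m_pred, P_pred, z, R):
    """Measurement update for z = x + v, i.e. H_t = I_5.

    State/measurement order: [x, y, s, psi, psi_rate]; angles in radians.
    """
    S = P_pred + R                        # H P H^T + R with H = I
    K = np.linalg.solve(S, P_pred).T      # K = P S^-1 (P and S symmetric)
    innov = np.asarray(z, float) - m_pred
    innov[3] = (innov[3] + np.pi) % (2 * np.pi) - np.pi  # wrap azimuth innovation
    m = m_pred + K @ innov
    P = P_pred - K @ S @ K.T
    return m, P


def measurement_vector(x_wifi, y_wifi, step_length, dt, psi, psi_rate):
    """Assemble z_t; the speed is s_t = D_t / dt as in the text."""
    return np.array([x_wifi, y_wifi, step_length / dt, psi, psi_rate])
```

Wrapping matters near north-south boundaries: without it, a predicted azimuth of 3.1 rad and a measured one of -3.1 rad would appear 6.2 rad apart instead of about 0.08 rad.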

Experimental Environment Deployment
The experimental environment is provided by the Siriandhorn Research Center at Wuhan University, a typical office environment. Figure 8 shows the layout of the experimental site, the experimental path, and the RPs. The site is about 90 m long and contains 12 control points with known coordinates in CGCS2000 [67]. To establish the radio map, 47 RPs were set up on a grid at intervals of approximately 2.5 m, and 30 sets of RSS samples were collected at each RP with a Samsung Galaxy S8 smartphone running Android 8.0 to estimate the parameters of the Weibull signal model [25].
The smartphone also serves as the positioning device, outputting the positioning coordinates of the proposed hybrid fusion scheme at 50 Hz; the raw sampling frequencies of WiFi, the accelerometer, the gyroscope, and the magnetometer are 0.72 Hz, 50 Hz, 50 Hz, and 50 Hz, respectively. The other positioning device is a handheld SLAM device comprising a single-scan lidar operating at 40 Hz and an IMU operating at 100 Hz; by running Google's Cartographer SLAM algorithm [68], it outputs local coordinates at 50 Hz. These local coordinates can be transformed into CGCS2000 coordinates and then used to evaluate the positioning performance of the smartphone. Figure 9 shows the two devices.
In addition, based on our test results, we predefined THR1, THR4, THR5, and THR6 as 6, 3, 0.05 m/s², and 0.556, respectively; THR2 as 3 m, based on the RP interval of approximately 2.5 m in our radio map; and THR3 as 2.5 m, based on the 68% error of the employed WiFi technology [25]. Further, $Q_t$ and the initial value of $R_t$ in the UKF are predefined as follows: Finally, in the deployed environment described above, one filtering epoch takes approximately 8 ms on the employed smartphone.

Experimental Setup and Performance
We conducted three experiments, and the performance is evaluated as follows: where $P_{true}$ and $P_{loc}$ represent the two-dimensional true coordinates and the positioning coordinates to be evaluated, respectively, and $E$ represents the positioning error.
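The error statistics reported in the experiments (RMSE, 95% error, 68% error, ME, MaxE, and minimum error) can be computed from the per-epoch planar error $E = \|P_{true} - P_{loc}\|$ as sketched below. Assumptions: ME is read as the mean error, and the 95% and 68% errors are empirical percentiles using NumPy's default linear interpolation; the paper does not state its percentile method, and `error_stats` is a hypothetical name.

```python
import numpy as np


def error_stats(p_true, p_loc):
    """Per-epoch planar error E = ||P_true - P_loc|| and summary statistics."""
    e = np.linalg.norm(np.asarray(p_true, float) - np.asarray(p_loc, float), axis=1)
    return {
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "95%": float(np.percentile(e, 95)),   # empirical percentile (linear interpolation)
        "68%": float(np.percentile(e, 68)),
        "ME": float(np.mean(e)),               # interpreted as mean error
        "MaxE": float(np.max(e)),
        "Min": float(np.min(e)),
    }


stats = error_stats([(0, 0)] * 4, [(3, 4), (0, 1), (0, 2), (0, 0)])
```

RMSE weights large errors more heavily than ME, which is why outlier removal tends to improve RMSE and MaxE more than the median-like 68% error.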
The first experiment demonstrates the capability of the handheld SLAM device to evaluate the positioning performance of the smartphone. We carried the device along the experimental path shown in Figure 8 at a normal walking speed of approximately 1 m/s and then transformed the output local coordinates into CGCS2000 coordinates via a planar four-parameter coordinate transformation model [69]. Taking the known coordinates of the control points as the true coordinates, the performance of the transformed coordinates is evaluated by Equation (30). Figure 10 shows the cumulative distribution function (CDF) of the positioning error of the transformed coordinates for the handheld SLAM device. The RMSE, 95% error, 68% error, ME, MaxE, and minimum error are 0.097 m, 0.165 m, 0.116 m, 0.083 m, 0.188 m, and 0.005 m, respectively. This accuracy is far better than the meter-level positioning performance of the smartphone; therefore, the handheld SLAM device is suitable for evaluating the positioning performance of the smartphone.
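A planar four-parameter (2D similarity) transformation, with two translations, one scale and one rotation, can be estimated by least squares from common points such as the control points. The sketch assumes the form $X = \Delta x + a x - b y$, $Y = \Delta y + b x + a y$ with $a = m\cos\theta$ and $b = m\sin\theta$; sign conventions differ between references, and the specific formulation of [69] is not reproduced. The function names are hypothetical.

```python
import numpy as np


def fit_four_param(local, target):
    """Least-squares estimate of (dx, dy, a, b) from at least two common points."""
    local = np.asarray(local, float)
    target = np.asarray(target, float)
    n = len(local)
    A = np.zeros((2 * n, 4))
    # Even rows model X = dx + a*x - b*y; odd rows model Y = dy + b*x + a*y
    A[0::2] = np.column_stack([np.ones(n), np.zeros(n), local[:, 0], -local[:, 1]])
    A[1::2] = np.column_stack([np.zeros(n), np.ones(n), local[:, 1], local[:, 0]])
    obs = target.reshape(-1)  # interleaved [X0, Y0, X1, Y1, ...]
    params, *_ = np.linalg.lstsq(A, obs, rcond=None)
    return params


def apply_four_param(params, local):
    """Transform local (x, y) coordinates into the target frame."""
    dx, dy, a, b = params
    local = np.asarray(local, float)
    x, y = local[:, 0], local[:, 1]
    return np.column_stack([dx + a * x - b * y, dy + b * x + a * y])
```

The scale and rotation follow as $m = \sqrt{a^2 + b^2}$ and $\theta = \operatorname{atan2}(b, a)$. With real projected coordinates of millions of metres, centring both point sets before fitting improves numerical conditioning.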

The second experiment evaluates the performance of WiFi-AGNES. We randomly selected 9 control points in Figure 8, and the employed WiFi technology was run individually for about four minutes at each control point in the static state. When Equation (3) is satisfied, WiFi-AGNES is executed. Taking the coordinates of the control points as the true coordinates, the performance of the WiFi technology with and without WiFi-AGNES is evaluated by Equation (30). Table 2 and Figure 11 show the statistical results and the CDFs for the 9 evaluated control points, respectively, where AGNES and NO-AGNES denote WiFi positioning with and without WiFi-AGNES, respectively. In Table 2 (8) in WiFi-AGNES. These results demonstrate that the proposed WiFi-AGNES strategy is effective. Figure 11 confirms this from another perspective.

The third experiment evaluates the performance of the proposed WiFi-Chain strategy and of the proposed hybrid fusion scheme, which integrates both outlier detection and removal strategies. We walked along the experimental path shown in Figure 8 at a normal walking speed of approximately 1 m/s; a snapshot is shown in Figure 9.
To evaluate the performance of WiFi-Chain, we recorded the WiFi positions that satisfy both Equation (10) and Equation (12). To evaluate WiFi-AGNES again, while balancing evaluation requirements against time cost, we stood at each of the remaining three control points (i.e., P3, P9, and P12) for about 1 minute (about 40 WiFi coordinates received at each control point) during the experiment. Taking the transformed coordinates of the handheld SLAM device as the true coordinates for the scheme and WiFi-Chain, and the coordinates of the control points as the true coordinates for WiFi-AGNES, the performance is evaluated via Equation (30). Table 3 and Figure 12a show the statistical results and the CDF of the positioning error, respectively, where Fusion denotes the fusion scheme with neither WiFi-AGNES nor WiFi-Chain; Fusion+WiFi-AGNES and Fusion+WiFi-Chain denote the fusion scheme integrating only WiFi-AGNES and only WiFi-Chain, respectively; and Fusion+WiFi-AGNES+WiFi-Chain denotes the proposed hybrid fusion scheme integrating both WiFi-AGNES and WiFi-Chain.
In Table 3, Fusion achieves an RMSE, 95% error, 68% error, ME, MaxE, and minimum error of 2.08 m, 4.12 m, 1.93 m, 1.67 m, 7.47 m, and 0.04 m, respectively. Relative to Fusion, Fusion+WiFi-Chain improves these metrics by 18.8%, 20.6%, 9.3%, 15.0%, 42.4%, and 50.0%; Fusion+WiFi-AGNES by 9.1%, 5.1%, 16.1%, 13.8%, 0.0%, and 0.0%; and Fusion+WiFi-AGNES+WiFi-Chain by 31.3%, 35.9%, 19.2%, 28.7%, 46.6%, and 50.0%, respectively. The proposed hybrid fusion scheme thus achieves the best performance, which demonstrates that the two proposed strategies are effective in outlier detection and removal. Figure 12a confirms this from another perspective.
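The text does not define "improved percentage" explicitly. Reading it as the relative reduction $(E_{base} - E_{new})/E_{base}$ reproduces the 1.43 m RMSE reported for the proposed scheme, which supports that interpretation. The small worked check below uses only numbers from the text; the helper names are hypothetical.

```python
def improvement_pct(baseline, new):
    """Relative reduction of an error metric, in percent."""
    return (baseline - new) / baseline * 100.0


def value_after(baseline, pct):
    """Error metric implied by a baseline and a reported improvement percentage."""
    return baseline * (1.0 - pct / 100.0)


# RMSE of Fusion (2.08 m) improved by 31.3 percent -> about 1.43 m
rmse_proposed = value_after(2.08, 31.3)
# MaxE of Fusion (7.47 m) improved by 46.6 percent -> about 3.99 m
maxe_proposed = value_after(7.47, 46.6)
```

The same reading applied to All-WiFi (MaxE 9.44 m, improved by 55.5%) gives about 4.20 m, consistent with the statement that the maximum error of WiFi-Chain-WiFi is below 4.3 m.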
Table 3, Figure 12b, and Figure 13 show the statistical results, the CDF of the positioning error, and the frequency distribution histograms, respectively, where All-WiFi denotes all WiFi positions received during the third experiment, and WiFi-Chain-WiFi and No-WiFi-Chain-WiFi denote the WiFi positions that do and do not satisfy WiFi-Chain, respectively. In Table 3, All-WiFi achieves an RMSE, 95% error, 68% error, ME, MaxE, and minimum error of 2.97 m, 6.97 m, 2.25 m, 2.10 m, 9.44 m, and 0.02 m, respectively, and WiFi-Chain-WiFi improves these metrics by 44.8%, 47.8%, 19.1%, 41.4%, 55.5%, and 0.0%, respectively. As shown in Figure 13b, a total of 274 WiFi positions, which possess the characteristics extracted in Section 3.2.1, were received during the third experiment, of which 203 satisfy WiFi-Chain (Figure 13a); the maximum error of WiFi-Chain-WiFi is less than 4.3 m. Moreover, Figure 13c shows the WiFi positions that do not satisfy WiFi-Chain, which include all WiFi positions with an error greater than 4.3 m. These results demonstrate that the proposed WiFi-Chain strategy performs well in outlier detection and removal. Figure 12b confirms this from another perspective.

Conclusions
In this paper, based on the extracted positioning characteristics of WiFi when the pedestrian is static, we proposed the first outlier detection and removal strategy using ML, named WiFi-AGNES; based on the extracted positioning characteristics of PDR and WiFi and the complementarity between them when the pedestrian is walking, we proposed the second outlier detection and removal strategy, named WiFi-Chain; and we further proposed a hybrid fusion scheme integrating the two strategies. The experimental results show that the RMSE of the proposed scheme is 1.43 m, which is 31.3% lower than that of the fusion scheme with neither WiFi-AGNES nor WiFi-Chain. These results demonstrate that the two proposed strategies perform well in outlier detection and removal. Furthermore, since characteristics are universal, the characteristic-based strategies are robust, and so is the proposed scheme that integrates them.

In the future, we will integrate other smartphone indoor positioning technologies, such as MM and visual positioning, into the proposed hybrid fusion scheme to further improve its performance.