Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons

Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target’s location and the distances between the target and BLE beacons respectively. We compare the performance of distance estimation that uses separate PRM for three advertisement channels (i.e., the separate strategy) with that use an aggregate PRM generated through the combination of information from all channels (i.e., the aggregate strategy). The performance of FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy can provide higher accuracy; thus, it is preferred to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection to generate improved distance estimates for the EKF. After the EKF process, the second outlier detection algorithm based on statistical testing is further performed to remove the outliers. The proposed algorithm was evaluated by various field experiments. Results show that the proposed algorithm achieved the accuracy of <2.56 m at 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which performs 35.82% better than <3.99 m from the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than <3.04 m from the FP + EKF algorithm. With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves the accuracies of <3.88 m at 90% of the time, which performs 49.58% more accurate than <8.00 m from the PM + EKF algorithm and 21.41% better than <4.94 m from the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful to improve the localization accuracy in environments with sparse beacon deployment.

‚ BLE beacons are usually battery powered, which are more flexible and easier deployed than WiFi.
Therefore in this paper, we proposes an algorithm for smartphone-based indoor localization using RSS signals from BLE beacons.
Currently, there are mainly three RSS-based BLE localization techniques: proximity, trilateration (or, range-based) and fingerprinting (FP) [20]. The proximity algorithms pre-set an event-triggering threshold for a coverage area. If the RSS values are stronger than the threshold, the target is indicated in the area. In the trilateration algorithm, the propagation model (PM) is used to estimate the distances between the target and BLE beacons. Then, the estimation technique (e.g., least squares or Kalman filtering) is used to estimate the target's location from these distances and the locations of BLE beacons. FP first builds a radio map database where RSSs from available BLE beacons are mapped to absolute positions. Then, the target's location is determined by using the average or weighted average of the locations of several most matched fingerprints in the database.
To further improve accuracy and robustness, we propose an algorithm for indoor localization with BLE beacons by using the combination of polynomial regression model (PRM), FP, outlier detection and extended Kalman filtering (EKF). Since indoor environments are complicated by reflection, shadowing and multipath, the radio PM may be not accurate for distance estimation in the indoor environment. Therefore, the proposed algorithm uses PRM to estimate the distances between the target and BLE beacons. Meanwhile, FP is also used to determine the target's location.
Most of existing research combines RSS from all channels to obtain an aggregate signal (the aggregate mode). However, we notice that the signals and noises from various advertisement channels may be different, which is shown in Figure 1. In Figure 1, RSS values from three advertisement channels of a BLE beacon were collected by keeping an iPhone 4S static and five meters away. As reported by iOS 8, the mean and standard deviation (STD) of RSS values from various advertisements were different (i.e., Channel 37: Mean´71.9 dBm, STD 2.4 dBm; Channel 38: Mean´68.1 dBm, STD 0.8 dBm; and Channel 39: Mean´78.8 dBm, STD 2.7 dBm). This difference might be caused by the different channel gain and multipath effect. Therefore, one advantage of this research is that it considers that signals and noises in RSS measurements from various channels may be different and thus uses separate PRM and FP for three advertisement channels (i.e., the separate mode), that is, the PRM and FP are used for each advertisement channel separately. Therefore, there are a maximum of three PRM distance Sensors 2016, 16, 596 3 of 20 estimates for each observed BLE beacon and three FP location estimates. Furthermore, to enhance the robustness of the algorithm, a two-outlier-detection mechanism is designed. These estimates are fed into the first outlier detection to generate improved distance estimates for observed BLE beacons. Next, these improved distance estimates are utilized as measurements for the EKF. Finally, the second outlier detection algorithm is further performed to remove the outliers from the measurements by using the statistical testing method.
Sensors 2016, 16, x 3 of 20 mechanism is designed. These estimates are fed into the first outlier detection to generate improved distance estimates for observed BLE beacons. Next, these improved distance estimates are utilized as measurements for the EKF. Finally, the second outlier detection algorithm is further performed to remove the outliers from the measurements by using the statistical testing method. The contribution of this paper is listed as follows: (1) We propose the usage of the separate PRM to improve both the location and distance estimation for each advertisement channel of BLE beacons. Moreover, we originally generate separate radio map database for each BLE advertisement channel for the FP process. (2) We originally propose an algorithm for BLE-based indoor localization by combing separate PRM, separate FP, EKF and outlier detection. (3) We propose a two-level outlier detection algorithm to improve the robustness of the system.
The proposed algorithms are validated in detail by field experiments with both sparsely and densely distributed BLE beacons. Part of the outcomes are: (1) Compared with results that use traditional PM, the distance estimation accuracy is improved by 18.42% using the PRM. (2) In the case of dense deployment of BLE beacons, the proposed algorithm achieves average 35.82% and 15.77% improvement of the location accuracy in two trajectories, compared with classical PM + EKF and FP + EKF, respectively. The improvement changes to 49.58% and 21.41% in the sparse deployment.
This paper is organized as follows. Section 2 reviews the related work. Section 3 presents the proposed algorithm, Section 4 evaluates the algorithm with various field experiments and Section 5 draws the conclusions.

Related Work
Current localization algorithms for BLE beacons can be divided into three classes: proximity [21,22,23], range-based [21,[24][25][26] and FP [4,21,27,28]. In the first class, the work [21] evaluates the performance of BLE localization by using proximity in a multi-floor building. A triggering threshold optimization method is proposed for proximity based positioning [22]. The research [23] proposes a particle filtering for proximity based indoor positioning. In the second class, a PM is used in a multi- The contribution of this paper is listed as follows: (1) We propose the usage of the separate PRM to improve both the location and distance estimation for each advertisement channel of BLE beacons. Moreover, we originally generate separate radio map database for each BLE advertisement channel for the FP process. (2) We originally propose an algorithm for BLE-based indoor localization by combing separate PRM, separate FP, EKF and outlier detection. (3) We propose a two-level outlier detection algorithm to improve the robustness of the system.
The proposed algorithms are validated in detail by field experiments with both sparsely and densely distributed BLE beacons. Part of the outcomes are: (1) Compared with results that use traditional PM, the distance estimation accuracy is improved by 18.42% using the PRM. (2) In the case of dense deployment of BLE beacons, the proposed algorithm achieves average 35.82% and 15.77% improvement of the location accuracy in two trajectories, compared with classical PM + EKF and FP + EKF, respectively. The improvement changes to 49.58% and 21.41% in the sparse deployment.
This paper is organized as follows. Section 2 reviews the related work. Section 3 presents the proposed algorithm, Section 4 evaluates the algorithm with various field experiments and Section 5 draws the conclusions.

Related Work
Current localization algorithms for BLE beacons can be divided into three classes: proximity [21][22][23], range-based [21,[24][25][26] and FP [4,21,27,28]. In the first class, the work [21] evaluates the performance of BLE localization by using proximity in a multi-floor building. A triggering threshold optimization method is proposed for proximity based positioning [22]. The research [23] proposes a particle filtering for proximity based indoor positioning. In the second class, a PM is used in a multi-floor building for BLE localization [21]. The research [24] presents a modified PM, called iRingLA, for BLE localization, in which the ring is used instead of circle for trilateration. Another approach called stigmergic is presented for range-based indoor localization [25]. This approach can reduce the effects of multipath, fading, and shadowing for BLE positioning. The work [26] proposes several empirical PMs for BLE based indoor localization in different conditions such as indoor/outdoor and line-of-sight (LOS)/non-line-of-sight (NLOS). In the third class, the research [21] presents a FP for BLE localization in a multi-floor building. The work [27] compares three FP algorithms: k-nearest neighbors (k-NN), neural networks, and support vector machine (SVM). The result shows that k-NN is a good candidate for localization in real-life applications. The research [4] provides a detailed study of the effects of beacon density, transmit power and transmit frequency for BLE FP. A FP solution using Weibull probability distribution is proposed in [28] to improve the reliability and accuracy of the positioning. In addition, the paper [29] presents a comparative analysis of contemporary BLE indoor positioning solutions, taking into account the classification, comparison and various considerations that are required for designing new indoor positioning approaches. All the above research combines RSS from all channels to obtain an aggregate signal, instead of using separate model for each separate BLE advertisement channel.

Algorithm Description
This section first presents the overview of the whole system. Next, PRM is described for distance estimation, which is followed by FP for location estimation. Then, "Outlier Detection-Level 1" uses the information from PRM and FP to remove the outliers from the measurements and generate improved distance measurements for EKF. Finally, the details of EKF and "Outlier Detection-Level 2" are described to further remove outliers and achieve an enhanced indoor localization solution.
neighbors (k-NN), neural networks, and support vector machine (SVM). The result shows that k-NN is a good candidate for localization in real-life applications. The research [4] provides a detailed study of the effects of beacon density, transmit power and transmit frequency for BLE FP. A FP solution using Weibull probability distribution is proposed in [28] to improve the reliability and accuracy of the positioning. In addition, the paper [29] presents a comparative analysis of contemporary BLE indoor positioning solutions, taking into account the classification, comparison and various considerations that are required for designing new indoor positioning approaches. All the above research combines RSS from all channels to obtain an aggregate signal, instead of using separate model for each separate BLE advertisement channel.

Algorithm Description
This section first presents the overview of the whole system. Next, PRM is described for distance estimation, which is followed by FP for location estimation. Then, "Outlier Detection-Level 1" uses the information from PRM and FP to remove the outliers from the measurements and generate improved distance measurements for EKF. Finally, the details of EKF and "Outlier Detection-Level 2" are described to further remove outliers and achieve an enhanced indoor localization solution.

System Overview
The overview of the proposed algorithm is shown in Figure 2, which mainly consists of FP, PRM, EKF, and a two-level outlier detection mechanism. The processed data for the proposed algorithm are RSS values from three advertisement channels of observed BLE beacons. The first process for these RSS values is passing a smoother. Next, we originally use channel-separate models/databases to process RSS values in FP and PRM to generate location and distance estimates. Since BLE RSS values from three advertisement channels are processed separately, there could be three different FPderived locations and PRM-derived distances. These derived location and distance estimates are then fed into the "Outlier Detection-Level 1" to generate improved FP + PRM distance estimates using statistical methods. The EKF is then used to process these enhanced FP + PRM distance estimates to estimate the target's location. The second outlier detection based on statistical testing is utilized to further remove the outliers. After the EKF computation with outlier detection, EKF outputs the final BLE localization solution. The final BLE localization solution is also fed to the radio map database to select appropriate fingerprints to improve the FP performance. Overall, the purpose of the proposed algorithm is to achieve a robust and accurate localization solution for the target.

System Overview
The overview of the proposed algorithm is shown in Figure 2, which mainly consists of FP, PRM, EKF, and a two-level outlier detection mechanism. The processed data for the proposed algorithm are RSS values from three advertisement channels of observed BLE beacons. The first process for these RSS values is passing a smoother. Next, we originally use channel-separate models/databases to process RSS values in FP and PRM to generate location and distance estimates. Since BLE RSS values from three advertisement channels are processed separately, there could be three different FP-derived locations and PRM-derived distances. These derived location and distance estimates are then fed into the "Outlier Detection-Level 1" to generate improved FP + PRM distance estimates using statistical methods. The EKF is then used to process these enhanced FP + PRM distance estimates to estimate the target's location. The second outlier detection based on statistical testing is utilized to further remove the outliers. After the EKF computation with outlier detection, EKF outputs the final BLE localization solution. The final BLE localization solution is also fed to the radio map database to select appropriate fingerprints to improve the FP performance. Overall, the purpose of the proposed algorithm is to achieve a robust and accurate localization solution for the target.

Polynomial Regression Model
In indoor localization using BLE beacons, the radio PM is usually used to model the relationship between measured RSS readings and corresponding distances. The mostly used model is the lognormal shadowing model as follows [30,31]: where γ represents the path-loss exponent, P pd 0 q represents the RSS at the reference distance, d 0 , P pdq represents the RSS at the distance between the access point (AP) and the receiver, d, and X σ represents a Gaussian random variable, with zero mean, caused by shadow fading [31]. After the model calibration for propagation parameters, the PM model can be used to convert the RSS values to distances between wireless APs and the receiver.
PM is derived from the outdoor environment and works well for LOS scenarios; however, PM may be inaccurate for distance estimation indoors since the indoor environment is complicated by reflection, shadowing and multipath. Actually, indoor wireless signals consist of both LOS and NLOS signals. These signals may have different propagation parameters. Furthermore, it is difficult to differentiate the LOS and NLOS signals in indoor environments. Therefore, the fitted theoretical PM from all training data may not work well for both LOS and NLOS signals and result in large errors in distance estimation.
To overcome this problem, this paper proposes the PRM to model the relationship between RSS and distance for BLE beacons. Different from PM, PRM assumes that the RSS-distance relationship is a nth-degree polynomial and the polynomial coefficients are estimated from the training data in the calibration process. The PRM is given byd where c i are the coefficients of the nth-degree polynomial, RSS is the RSS value, andd PRM is the estimated distance. The sum of the model fitting error squares is expressed as where M is the number of the calibration points, d j represents the true distance between the jth calibration point and the BLE beacon, j " 1, 2, . . . , M, and RSS j is the RSS at the jth calibration point. From the criterion of least squares estimation, E pc 0 , c 1 ,¨¨¨, c n q should be minimized. To achieve the minimum of E pc 0 , c 1 ,¨¨¨, c n q, we equate its partial derivation to zero with respect to c 0 , c 1 ,¨¨¨, c n , and get Equation (4) can be rewritten as follows: Sensors 2016, 16, 596 and C " " Finally, the polynomial coefficients can be calculated by After the calibration, PRM can be used to convert RSS values to distances. In this paper, three PRMs are used to estimate distances for three advertisement channels of each BLE beacon. Therefore, there can be three distance estimates for each BLE beacon. When using PRM, the polynomial degree, n, should be carefully determined. If the polynomial degree is small, the distance estimation error will be large. In contrast, if the polynomial degree is large, the computation load is large. Therefore, the polynomial degree is the balance of the distance estimation error and computation load. The field experiments will further discuss the set of the polynomial degree for PRM.

Fingerprinting
The BLE radio map database for FP was constructed by surveying some points' locations and taking corresponding RSS values. Gaussian Process Regression (GPR) was used to fill the gaps between the surveyed points. We constructed a separate database for each advertisement channel (i.e., the separate mode). Another database was also constructed by dealing all advertisement channels together (i.e., the aggregate mode). In the field experiments, these databases will be used for the evaluation of the FP performance.
On the FP positioning phase, the search space for BLE FP can be limited to an ellipse as determined by the proposed localization solution at the previous epoch, as shown in Figure 3. This localization solution at the previous epoch is used as the center of the ellipse. The major semi-axis, minor semi-axis, and the azimuth of the ellipse are calculated by using the location covariance matrix of the previous epoch from the EKF, that is where σ 2 N and σ 2 E represent the north and east variances; σ NE and σ EN are the north/east and east/north covariances. The major semi-axis of the ellipse is The minor semi-axis of the ellipse is And the azimuth of the major semi-axis is where tan´1 4 p¨q represents a four-quadrant arctangent operator. s represents the scale factor for the size change of the ellipse, which is set at 5 in this paper. Readers can refer to [32] for the details of the confidence ellipse for measurements. And the azimuth of the major semi-axis is where   1 4 tan ( ) represents a four-quadrant arctangent operator. s represents the scale factor for the size change of the ellipse, which is set at 5 in this paper. Readers can refer to [32] for the details of the confidence ellipse for measurements. After the search space is determined, the Euclidean distances are calculated by using current measured fingerprint and the fingerprints in the search space, and the equation is given by where i ED is the Euclidean distance, m RSS is the measured RSS vector, i RSS is the RSS vector of the th i fingerprint in the search space of the radio map database and N is the length of the measured RSS vector. Then, the target's location is calculated as the weighted average of the locations of the selected fingerprints, and the weight is determined by the inverse of the Euclidean distance. The equation for this calculation is given by where FP r is the estimated location of the target, i r is the location of the th i selected fingerprint,  i is the weight corresponding to the th i selected fingerprint, i ED is the Euclidean distance between the th i selected fingerprint and the measured fingerprint and K is the total number of selected fingerprints. In this paper, FP is used for each advertisement channel, and can achieve three location estimates for each BLE beacon.

Outlier Detection-Level 1
As per the previous discussion, three PRM-derived distances for each observed BLE beacon and three FP-derived locations can be imported to "Outlier Detection-Level 1" to generate an enhanced FP + PRM distance estimate for each observed BLE beacon. In this paper, three FP-derived locations are represented by  After the search space is determined, the Euclidean distances are calculated by using current measured fingerprint and the fingerprints in the search space, and the equation is given by where ED i is the Euclidean distance, RSS m is the measured RSS vector, RSS i is the RSS vector of the i th fingerprint in the search space of the radio map database and N is the length of the measured RSS vector. Then, the target's location is calculated as the weighted average of the locations of the selected fingerprints, and the weight is determined by the inverse of the Euclidean distance. The equation for this calculation is given by where r FP is the estimated location of the target, r i is the location of the i th selected fingerprint, ω i is the weight corresponding to the i th selected fingerprint, ED i is the Euclidean distance between the i th selected fingerprint and the measured fingerprint and K is the total number of selected fingerprints. In this paper, FP is used for each advertisement channel, and can achieve three location estimates for each BLE beacon.

Outlier Detection-Level 1
As per the previous discussion, three PRM-derived distances for each observed BLE beacon and three FP-derived locations can be imported to "Outlier Detection-Level 1" to generate an enhanced FP + PRM distance estimate for each observed BLE beacon. In this paper, three FP-derived locations are represented by r FP37 , r FP38 , and r FP39 . And, three PRM-derived distances are represented by d PM37,i , d PM38,i , and d PM39,i for the i th BLE beacon. In the first step of "Outlier Detection-Level 1", FP-derived locations in Equation (15) are used to calculate the FP-derived distances for each observed BLE beacon through the following equation:d (   16) where r BLE,i is the coordinate of the i th BLE beacon. d FP37,i , d FP38,i , and d FP39,i are three FP-derived distances for the i th BLE beacon. norm is the Euclidean norm (2-norm), and is given by where Nx is the length of the vector x, and x i is the term of the vector x.
The statistical method is used to remove the outliers in these six distance estimates: three PRM-derived distance estimates and three FP-derived distances. The confidence interval for these six distance estimates is given byd whered i are the distance estimates (i.e., FP-derived and PRM-derived distances) for the i th BLE beacon.
µd i and σd i are the mean and STD ofd i . If any distance estimate is out of the confidence interval, it is moved out as an outlier. Finally, the enhanced FP+PRM distance estimate,d i,enhanced , for the i th BLE beacon is given byd ( 19) whered i,trust are FP-derived and PRM-derived distance estimates in the confidence interval.

Extended Kalman Filtering
The EKF is used in the proposed algorithm to estimate the target's current location by fusing current and historical information. In this paper, the state vector of the EKF is given by x " " r e r n v e v n ı T where r e and r n are 2D position components (i.e., east and north) in the horizontal plane. v e and v n are their corresponding 2D velocity components. In the proposed algorithm, the EKF system model is the typical kinematic model and is given by where x k`1|k is the predicted state vector, x k|k is the previous state vector at epoch k and Φ k,k`1 is a 4ˆ4 transition matrix: where ∆t is the time difference between two epochs. ω k is the process noise with the covariance matrix, Q k " E`ω k ω T k˘, and is given by where e,k and n,k are modelled as white noises for velocity in the east and north directions at epoch k.
The measurement model is given by where z k " "d 1,enhancedd2,enhanced¨¨¨dm,enhanced ı T uses distance estimates from the "Outlier Detection-Level 1" as the measurement vector. H k is the design matrix, and υ k is the measurement noise modelled as a Gaussian white noise, and its covariance matrix is R k " E`υ k υ T k˘. In this paper, we define that r e,t and r n,t are 2D position components of the target's location, which are predicted by the EKF; r e,i and r n,i , i " 1, 2, . . . , m, are 2D position components of the BLE beacons' locations. The design matrix is given by pr e,1´re,t q 2`p r n,1´rn,t q 2´r n,1´rn,t b pr e,1´re,t q 2`p r n,1´rn,t q 2 0 0 r e,2´re,t b pr e,2´re,t q 2`p r n,2´rn,t q 2´r n,2´rn,t b pr e,2´re,t q 2`p r n,2´rn,t q 2 0 0 . . .
There are two phases in the EKF process: prediction and update. In the prediction process, the state vector and covariance matrix are predicted from the system model: In the update process, the state vector and covariance matrix are updated from the measurement model: where K k is the Kalman gain.

Outlier Detection-Level 2
In this section, statistical testing on the innovations of the EKF is presented as the second level of outlier detection. When using EKF, we assume the following two conditions: (1) the measurement noise is zero-mean, white and Gaussian distributed; and (2) the process noise is zero-mean, white, and Gaussian distributed. Based on these assumptions, the innovation sequence will be zero-mean, white and Gaussian distributed. The equation for the innovation sequence can be given as where τ k is the innovation, z k is the observed measurement, andẑ k|k´1 " H k x k|k´1 is the predicted measurement. The innovations have the following covariance matrix [33]: where C τ k is the covariance matrix of the innovation, H k is the design matrix, P k|k´1 is the state covariance matrix and R k is the measurement covariance matrix. Given the assumptions stated above, the innovation sequence is distributed as ( 30) where N pµ, C σ q represents the normal distribution with mean of µ and covariance of C σ . The confidence intervals for the individual measurements are then calculated [33]. If these are violated, the measurement is considered as an outlier, and removed from the EKF.

Experimental Setup
To evaluate the performance of the proposed algorithm, we conducted experiments in an office environment (60 mˆ40 m). These experiments were conducted with two different deployments of BLE beacons as shown in Figure 4: (a) dense deployment (total: 20 beacons; average: 1 beacon per 9 m) and (b) sparse deployment (total: 8 beacons; average: 1 beacon per 18 m). All these beacons were installed on the wall at a height of approximately 1.5 m. To balance the power consumption and accuracy, each beacon was set to 10 Hz sample rate with´16 dBm transmit power [4]. The locations of BLE beacons were measured by using a commercial laser rangefinder and the floor plan of the building. Since the floor plan was verified to be very accurate (which had been used for engineering construction), the location errors of BLE beacons were less than 1 m. Two different iPhone 4S with iOS 8 were used by different testers in the experiments. The BLE beacons used in the experiments are based on Texas Instruments (TI) BLE chip CC2540. This chip broadcasts channel information and RSS value together for each advertisement channel, which can be received by an iOS device when using iOS 7 or above.
where    ,C N represents the normal distribution with mean of  and covariance of  C . The confidence intervals for the individual measurements are then calculated [33]. If these are violated, the measurement is considered as an outlier, and removed from the EKF.

Experimental Setup
To evaluate the performance of the proposed algorithm, we conducted experiments in an office environment (60 m × 40 m). These experiments were conducted with two different deployments of BLE beacons as shown in Figure 4: (a) dense deployment (total: 20 beacons; average: 1 beacon per 9 m) and (b) sparse deployment (total: 8 beacons; average: 1 beacon per 18 m). All these beacons were installed on the wall at a height of approximately 1.5 m. To balance the power consumption and accuracy, each beacon was set to 10 Hz sample rate with −16 dBm transmit power [4]. The locations of BLE beacons were measured by using a commercial laser rangefinder and the floor plan of the building. Since the floor plan was verified to be very accurate (which had been used for engineering construction), the location errors of BLE beacons were less than 1 m. Two different iPhone 4S with iOS 8 were used by different testers in the experiments. The BLE beacons used in the experiments are based on Texas Instruments (TI) BLE chip CC2540. This chip broadcasts channel information and RSS value together for each advertisement channel, which can be received by an iOS device when using iOS 7 or above.
Many trajectories were tested in the experiments and two of them were selected for performance demonstration in this paper. Several reference points with surveyed locations were used to generate the ground truth. A stopwatch was used to record the time when the testers passed these reference points and thereafter, the locations of the ground truth between these reference points were generated through interpolation. The interpolation method used the stopwatch information and assumed the person walked with a constant speed between two reference points, which as per the instructions to testers when collecting the data.   Many trajectories were tested in the experiments and two of them were selected for performance demonstration in this paper. Several reference points with surveyed locations were used to generate the ground truth. A stopwatch was used to record the time when the testers passed these reference points and thereafter, the locations of the ground truth between these reference points were generated through interpolation. The interpolation method used the stopwatch information and assumed the person walked with a constant speed between two reference points, which as per the instructions to testers when collecting the data.

Performance of Polynomial Regression Model for Distance Estimation
The performance of PRM-based distance estimation is discussed in this section. First, we need to determine the polynomial degree, n, for the PRM. As per the previous discussion, the selection of the polynomial degree is a trade-off between distance accuracy and computation load. We collected RSS values from BLE beacons at several points with known locations in the dense deployment shown in Figure 4. This data was used to estimate the PM and PRM with different polynomial degrees for each advertisement channel of the BLE beacons. In the estimation, the selected range for polynomial degree is set to n " 1, 2, . . . , 5, and the corresponding PRM models are called POLY1, POLY2, POLY3, POLY4, and POLY5. In the dense deployment, there are total 60 advertisement channels (i.e., 3 channels/beacon 20 beacons = 60 channels). Therefore, there are 60 channel models in total for each method (i.e., PM, POLY1, POLY2, POLY3, POLY4 and POLY5). Each channel model has an average distance estimation error, therefore there are a total of 60 average distance estimation errors for each method (i.e., PM, POLY1, POLY2, POLY3, POLY4, and POLY5). The cumulative distribution function (CDF) of these 60 average distance estimation errors for each method, are plotted in Figure 5. As shown in Figure 5, POLY2, POLY3, POLY4 and POLY5 have similar performance and are better than POLY1 and PM. On the other hand, POLY2 has the smallest computation load among these four methods. Therefore, POLY2 is selected for the PRM and PRM represents POLY2 for the rest of the paper. From Figure 5, we can find that the 90% distance estimation error using PRM is 3.1 m, which is reduced by 18.42% over the PM (i.e., 3.8 m). The performance of PRM-based distance estimation is discussed in this section. First, we need to determine the polynomial degree, n , for the PRM. As per the previous discussion, the selection of the polynomial degree is a trade-off between distance accuracy and computation load. We collected RSS values from BLE beacons at several points with known locations in the dense deployment shown in Figure 4. This data was used to estimate the PM and PRM with different polynomial degrees for each advertisement channel of the BLE beacons. In the estimation, the selected range for polynomial degree is set to  1,2, ,5 n , and the corresponding PRM models are called POLY1, POLY2, POLY3, POLY4, and POLY5. In the dense deployment, there are total 60 advertisement channels (i.e., 3 channels/beacon × 20 beacons = 60 channels). Therefore, there are 60 channel models in total for each method (i.e., PM, POLY1, POLY2, POLY3, POLY4 and POLY5). Each channel model has an average distance estimation error, therefore there are a total of 60 average distance estimation errors for each method (i.e., PM, POLY1, POLY2, POLY3, POLY4, and POLY5). The cumulative distribution function (CDF) of these 60 average distance estimation errors for each method, are plotted in Figure 5. As shown in Figure 5, POLY2, POLY3, POLY4 and POLY5 have similar performance and are better than POLY1 and PM. On the other hand, POLY2 has the smallest computation load among these four methods. Therefore, POLY2 is selected for the PRM and PRM represents POLY2 for the rest of the paper. From Figure 5, we can find that the 90% distance estimation error using PRM is 3.1 m, which is reduced by 18.42% over the PM (i.e., 3.8 m).   Figure 6 shows the measured data and distance estimation results using PM and PRM, in which PRM works better than PM in indoor environments. It seems that the relationship between BLE RSS and distance does not obey the log path-loss model in complex indoor environments due to the effect of fading, reflection, multipath, etc. Figure 7 depicts the CDF functions of the distance estimation errors in the selected channel using PM and PRM. The 90% error of distance estimation using PRM is 2.6 m, which is only 47.24% of that uses PM (5.5 m). For the demonstration purpose, one BLE channel is used as an example to compare the PRM and PM. The PRM and PM functions are given by d " 0.01582ˆRSS 2´0 .03178ˆRSS`0.06341 (31) and RSS "´79.5891´10ˆ1.2954ˆlog 10 pdq (32) Figure 6 shows the measured data and distance estimation results using PM and PRM, in which PRM works better than PM in indoor environments. It seems that the relationship between BLE RSS and distance does not obey the log path-loss model in complex indoor environments due to the effect of fading, reflection, multipath, etc. Figure 7 depicts the CDF functions of the distance estimation errors in the selected channel using PM and PRM. The 90% error of distance estimation using PRM is 2.6 m, which is only 47.24% of that uses PM (5.5 m).  Next, we compare the performance of the distance estimation using the separate PRM for each advertisement channel with that uses the aggregate PRM for all advertisement channels. For the illustration purpose, one BLE beacon is used as an example for this comparison. Figure 8 depicts CDF functions of the distance estimation errors for the selected BLE beacon using separate PRM and aggregate PRM. The 90% distance estimation errors of three separate PRMs are 5.1 m, 5.2 m and 5.0 m; which are decreased by 19.05%, 17.46% and 20.63% over the aggregate PRM (6.3 m) respectively. Therefore, our tests indicate that separate PRM can provide more accurate distance estimation than the aggregate PRM. Thus, separate PRM is adopted in this paper.  Next, we compare the performance of the distance estimation using the separate PRM for each advertisement channel with that uses the aggregate PRM for all advertisement channels. For the illustration purpose, one BLE beacon is used as an example for this comparison. Figure 8 depicts CDF functions of the distance estimation errors for the selected BLE beacon using separate PRM and aggregate PRM. The 90% distance estimation errors of three separate PRMs are 5.1 m, 5.2 m and 5.0 m; which are decreased by 19.05%, 17.46% and 20.63% over the aggregate PRM (6.3 m) respectively. Therefore, our tests indicate that separate PRM can provide more accurate distance estimation than the aggregate PRM. Thus, separate PRM is adopted in this paper. Next, we compare the performance of the distance estimation using the separate PRM for each advertisement channel with that uses the aggregate PRM for all advertisement channels. For the illustration purpose, one BLE beacon is used as an example for this comparison. Figure 8 depicts CDF functions of the distance estimation errors for the selected BLE beacon using separate PRM and aggregate PRM. The 90% distance estimation errors of three separate PRMs are 5.1 m, 5.2 m and 5.0 m; which are decreased by 19.05%, 17.46% and 20.63% over the aggregate PRM (6.3 m) respectively. Therefore, our tests indicate that separate PRM can provide more accurate distance estimation than the aggregate PRM. Thus, separate PRM is adopted in this paper.

Performance of of Fingerprinting for Location Estimation
This section mainly compares the FP-based location estimation using separate databases for each advertisement channel with that uses the aggregate database. In the comparison, RSS values were collected at 150 points in the experimental area. These RSS values were used for FP calculation by using two types of FP databases: the separate databases and aggregate database. Furthermore, the parameter "k" of k-NN is temporarily set to 1 here to avoid the effect of k-NN for the comparison of these two kinds of databases. To achieve a better localization performance, k is set to 3 in other parts of this paper. Figure 9 depicts CDF functions of the location estimation errors using separate FP databases and aggregate FP database. The FPs using separate databases have better performance than that uses aggregate database. The 90% errors of FP using separate databases are 6.9 m, 7.0 m, and 7.0 m; which are reduced by 4.17%, 2.78%, and 2.78% over that uses the aggregate database (7.2 m), respectively. These tests indicate that the separate strategy can provide more accurate FP result than the aggregate strategy. Thus, FP using separate databases are used in this paper.

Performance Evaluation for the Proposed Algorithm
The performance of the proposed algorithm was evaluated in indoor environments with densely and sparsely distributed BLE beacons shown in Figure 4. The proposed algorithm was also compared

Performance of of Fingerprinting for Location Estimation
This section mainly compares the FP-based location estimation using separate databases for each advertisement channel with that uses the aggregate database. In the comparison, RSS values were collected at 150 points in the experimental area. These RSS values were used for FP calculation by using two types of FP databases: the separate databases and aggregate database. Furthermore, the parameter "k" of k-NN is temporarily set to 1 here to avoid the effect of k-NN for the comparison of these two kinds of databases. To achieve a better localization performance, k is set to 3 in other parts of this paper. Figure 9 depicts CDF functions of the location estimation errors using separate FP databases and aggregate FP database. The FPs using separate databases have better performance than that uses aggregate database. The 90% errors of FP using separate databases are 6.9 m, 7.0 m, and 7.0 m; which are reduced by 4.17%, 2.78%, and 2.78% over that uses the aggregate database (7.2 m), respectively. These tests indicate that the separate strategy can provide more accurate FP result than the aggregate strategy. Thus, FP using separate databases are used in this paper.

Performance of of Fingerprinting for Location Estimation
This section mainly compares the FP-based location estimation using separate databases for each advertisement channel with that uses the aggregate database. In the comparison, RSS values were collected at 150 points in the experimental area. These RSS values were used for FP calculation by using two types of FP databases: the separate databases and aggregate database. Furthermore, the parameter "k" of k-NN is temporarily set to 1 here to avoid the effect of k-NN for the comparison of these two kinds of databases. To achieve a better localization performance, k is set to 3 in other parts of this paper. Figure 9 depicts CDF functions of the location estimation errors using separate FP databases and aggregate FP database. The FPs using separate databases have better performance than that uses aggregate database. The 90% errors of FP using separate databases are 6.9 m, 7.0 m, and 7.0 m; which are reduced by 4.17%, 2.78%, and 2.78% over that uses the aggregate database (7.2 m), respectively. These tests indicate that the separate strategy can provide more accurate FP result than the aggregate strategy. Thus, FP using separate databases are used in this paper.

Performance Evaluation for the Proposed Algorithm
The performance of the proposed algorithm was evaluated in indoor environments with densely and sparsely distributed BLE beacons shown in Figure 4. The proposed algorithm was also compared

Performance Evaluation for the Proposed Algorithm
The performance of the proposed algorithm was evaluated in indoor environments with densely and sparsely distributed BLE beacons shown in Figure 4. The proposed algorithm was also compared with classical PM + EKF and FP + EKF algorithms. Figure 10 shows the two trajectories which were used for performance evaluation. Figure 11 depicts the numbers of observed advertisement channels from BLE beacons in these two trajectories with dense and sparse deployment of BLE beacons. In Figure 11, there are 5~19 observed advertisement channels in two trajectories with dense deployment while the number changes to 0~9 with sparse deployment. The different number of measurements leads to various localization performance. with classical PM + EKF and FP + EKF algorithms. Figure 10 shows the two trajectories which were used for performance evaluation. Figure 11 depicts the numbers of observed advertisement channels from BLE beacons in these two trajectories with dense and sparse deployment of BLE beacons. In Figure 11, there are 5~19 observed advertisement channels in two trajectories with dense deployment while the number changes to 0~9 with sparse deployment. The different number of measurements leads to various localization performance.
(a) (b)  with classical PM + EKF and FP + EKF algorithms. Figure 10 shows the two trajectories which were used for performance evaluation. Figure 11 depicts the numbers of observed advertisement channels from BLE beacons in these two trajectories with dense and sparse deployment of BLE beacons. In Figure 11, there are 5~19 observed advertisement channels in two trajectories with dense deployment while the number changes to 0~9 with sparse deployment. The different number of measurements leads to various localization performance.
(a) (b)  The localization performance with dense deployment of BLE beacons is discussed first. Figure 12 shows the estimated trajectories using the proposed algorithm, PM + EKF and FP + EKF. The proposed algorithm performs slightly better than FP + EKF and PM + EKF. Figure 13 shows the localization errors of the two trajectories using the proposed method, PM + EKF and FP + EKF in the environment with densely distributed BLE beacons. As shown in Figure 13, the proposed algorithm has less localization errors than PM + EKF and FP + EKF at most of the time. Table 1 summarizes the localization errors of the two trajectories using the proposed algorithm, PM + EKF and FP + EKF. Figure 14 shows the CDF functions of localization errors of the two trajectories using these three approaches. As shown in Figure 14 and Table 1, the 90% localization error of the proposed algorithm is 2.57 m in trajectory I, which is reduced by 36.70% over PM + EKF (4.06 m) and 14.33% over FP + EKF (3.00 m). The 90% localization error of the proposed algorithm is 2.55 m in trajectory II, which reduces by 34.95% over PM + EKF (3.92 m) and 17.21% over FP + EKF (3.08 m). These results demonstrate that the proposed algorithm achieves around 2.5 m 90% localization error in the two trajectories, which performs better than both the traditional PM + EKF and FP + EKF when there is dense deployment of BLE beacons. The localization performance with dense deployment of BLE beacons is discussed first. Figure  12 shows the estimated trajectories using the proposed algorithm, PM + EKF and FP + EKF. The proposed algorithm performs slightly better than FP + EKF and PM + EKF. Figure 13 shows the localization errors of the two trajectories using the proposed method, PM + EKF and FP + EKF in the environment with densely distributed BLE beacons. As shown in Figure 13, the proposed algorithm has less localization errors than PM + EKF and FP + EKF at most of the time. Table 1 summarizes the localization errors of the two trajectories using the proposed algorithm, PM + EKF and FP + EKF. Figure 14 shows the CDF functions of localization errors of the two trajectories using these three approaches. As shown in Figure 14 and Table 1, the 90% localization error of the proposed algorithm is 2.57 m in trajectory I, which is reduced by 36.70% over PM + EKF (4.06 m) and 14.33% over FP + EKF (3.00 m). The 90% localization error of the proposed algorithm is 2.55 m in trajectory II, which reduces by 34.95% over PM + EKF (3.92 m) and 17.21% over FP + EKF (3.08 m). These results demonstrate that the proposed algorithm achieves around 2.5 m 90% localization error in the two trajectories, which performs better than both the traditional PM + EKF and FP + EKF when there is dense deployment of BLE beacons.  The localization performance with dense deployment of BLE beacons is discussed first. Figure  12 shows the estimated trajectories using the proposed algorithm, PM + EKF and FP + EKF. The proposed algorithm performs slightly better than FP + EKF and PM + EKF. Figure 13 shows the localization errors of the two trajectories using the proposed method, PM + EKF and FP + EKF in the environment with densely distributed BLE beacons. As shown in Figure 13, the proposed algorithm has less localization errors than PM + EKF and FP + EKF at most of the time. Table 1 summarizes the localization errors of the two trajectories using the proposed algorithm, PM + EKF and FP + EKF. Figure 14 shows the CDF functions of localization errors of the two trajectories using these three approaches. As shown in Figure 14 and Table 1, the 90% localization error of the proposed algorithm is 2.57 m in trajectory I, which is reduced by 36.70% over PM + EKF (4.06 m) and 14.33% over FP + EKF (3.00 m). The 90% localization error of the proposed algorithm is 2.55 m in trajectory II, which reduces by 34.95% over PM + EKF (3.92 m) and 17.21% over FP + EKF (3.08 m). These results demonstrate that the proposed algorithm achieves around 2.5 m 90% localization error in the two trajectories, which performs better than both the traditional PM + EKF and FP + EKF when there is dense deployment of BLE beacons.   Then, we evaluated the localization performance with the sparse deployment of BLE beacons. Figure 15 shows the estimated trajectories with sparsely distributed BLE beacons using the proposed algorithm, PM + EKF, and FP + EKF. The proposed algorithm performs better than FP + EKF and PM + EKF and the improvement is more than the case of dense deployment. Figure 16 shows the localization errors of the two trajectories using the proposed method, PM + EKF, and FP + EKF in the environment with the sparse distribution of BLE beacons. Table 2 summarizes the location errors of the two trajectories in this case. Figure 17 shows the CDF functions of localization errors of the two trajectories using the proposed method, PM + EKF, and FP + EKF. In Figure 17 and Table 2, the 90% localization error of the proposed algorithm is 4.16 m in trajectory I, which is reduced by 37.72% over PM + EKF (6.68 m) and 22.24% over FP + EKF (5.35 m). Figure 17 and Table 2 also show that the 90% localization error of the proposed algorithm is 3.59 m in trajectory II, which is decreased by 61.44% over PM + EKF (9.31 m) and 20.58% over FP + EKF (4.52 m). These results demonstrate that the proposed algorithm achieves around 4.0 m 90% localization error in the two trajectories with the sparse deployment of BLE beacons, which performs much better than the traditional PM + EKF and FP + EKF. The improvement of the proposed method (37.72% and 61.44% for PM + EKF, and 22.24% and 20.58% for FP + EKF) in two trajectories with the sparse deployment is more than that (36.70% and 34.95% for PM + EKF, and 14.33% and 17.21% for FP + EKF) with the dense deployment. This outcome means that the proposed algorithm is especially useful to improve the localization accuracy in the environments with sparse beacon deployment.  Then, we evaluated the localization performance with the sparse deployment of BLE beacons. Figure 15 shows the estimated trajectories with sparsely distributed BLE beacons using the proposed algorithm, PM + EKF, and FP + EKF. The proposed algorithm performs better than FP + EKF and PM + EKF and the improvement is more than the case of dense deployment. Figure 16 shows the localization errors of the two trajectories using the proposed method, PM + EKF, and FP + EKF in the environment with the sparse distribution of BLE beacons. Table 2 summarizes the location errors of the two trajectories in this case. Figure 17 shows the CDF functions of localization errors of the two trajectories using the proposed method, PM + EKF, and FP + EKF. In Figure 17 and Table 2, the 90% localization error of the proposed algorithm is 4.16 m in trajectory I, which is reduced by 37.72% over PM + EKF (6.68 m) and 22.24% over FP + EKF (5.35 m). Figure 17 and Table 2 also show that the 90% localization error of the proposed algorithm is 3.59 m in trajectory II, which is decreased by 61.44% over PM + EKF (9.31 m) and 20.58% over FP + EKF (4.52 m). These results demonstrate that the proposed algorithm achieves around 4.0 m 90% localization error in the two trajectories with the sparse deployment of BLE beacons, which performs much better than the traditional PM + EKF and FP + EKF. The improvement of the proposed method (37.72% and 61.44% for PM + EKF, and 22.24% and 20.58% for FP + EKF) in two trajectories with the sparse deployment is more than that (36.70% and 34.95% for PM + EKF, and 14.33% and 17.21% for FP + EKF) with the dense deployment. This outcome means that the proposed algorithm is especially useful to improve the localization accuracy in the environments with sparse beacon deployment.      Table 3 compares the localization errors of the proposed method in two cases: (a) dense deployment and (b) sparse deployment. It shows that the 90% localization errors with dense deployment are 2.57 m and 2.55 m in two trajectories, which are less than the 4.16 m and 3.59 m of the sparse deployment. The 90% localization errors with sparse deployment are increased by 61.87% and 40.78% over that with dense deployment in the two trajectories, respectively. This result illustrates that the proposed method in the dense deployment performs better than the sparse deployment. However, the dense deployment has more expense. Therefore, it is suggested that the deployment of BLE beacons should comprehensively consider both requirement of the localization accuracy and budget in specific indoor positioning and navigation applications.

Conclusions
This paper proposed an innovative algorithm based on the integration of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), extended Kalman filtering (EKF), and outlier detection for indoor localization using Bluetooth Low Energy (BLE) beacons. Field experiments showed that the proposed PRM for distance estimation achieved the accuracies of <3.1 m at 90% of the time, which was reduced by 18.42% over the 3.8 m of traditional propagation model (PM). Also, the proposed algorithm provided the accuracies of <2.56 m at 90% of the time (average of two trajectories) with dense deployment of BLE beacons (1 beacon per 9 m), which performed better than <3.99 m of the classical PM + EKF algorithm and <3.04 m of the classical FP + EKF algorithm. With sparse deployment of BLE beacons (1 beacon per 18 m), the proposed algorithm achieved the accuracies of <3.88 m at 90% of the time (average of two trajectories), which performed better than <8.00 m of the classical PM + EKF algorithm and <4.94 m of the classical FP + EKF algorithm. The improvement of the proposed algorithm over the classical PM + EKF and FP + EKF with sparse deployment of BLE beacons was more significant than the case of dense deployment. This proposed algorithm can be easily implemented without extra hardware costs and promote the development of robust smartphone-based indoor localization systems with BLE beacons.