Collaborative WiFi Fingerprinting Using Sensor-Based Navigation on Smartphones

This paper presents a method that trains the WiFi fingerprint database using sensor-based navigation solutions. Since micro-electromechanical systems (MEMS) sensors provide only short-term accuracy and suffer from accuracy degradation over time, we restrict the time length of usable indoor navigation trajectories and apply post-processing to improve the sensor-based navigation solution. Different middle-term navigation trajectories that move in and out of an indoor area are combined to make up the database. Furthermore, we evaluate the effect of WiFi database shifts on WiFi fingerprinting using the database generated by the proposed method. Results show that the fingerprinting errors do not increase linearly with database (DB) errors in smartphone-based WiFi fingerprinting applications.


Introduction
Mobile location-based services (LBS) are attracting the attention of many mobile device companies due to their potential applications in a wide range of personalized services [1]. As a core technology of LBS, navigation (i.e., determination of the position, velocity, and attitude of the mobile device) is evidently vital. A highly demanding issue is to provide trustworthy navigation solutions in real time, since mobile devices are carried by users almost anywhere and anytime [2].
While Global Navigation Satellite Systems (GNSS) based outdoor navigation has greatly advanced over the past few decades, positioning and navigation in indoor scenarios and deep urban areas are still an open issue [3]. The challenges include unavailable or degraded GNSS signals, complex indoor environments, the necessity of using low-grade devices, etc. To navigate in indoor or urban areas, various indoor positioning technologies based on Radio Frequency (RF) signals have been broadly researched, such as Wireless Local Area Network (WLAN, also known as Wi-Fi) [4], cellular networks [5], Bluetooth [6], Radio Frequency Identification (RFID) tags [7], ZigBee [8], Ultra Wideband (UWB) beacons [9], high-sensitivity GNSS [10], and pseudo-satellites (pseudolites, such as Locata) [11]. These RF-based technologies can provide a long-term accurate absolute position, but require the creation and maintenance of a network [12]. WiFi (WLAN based on 802.11 standards) does not need a dedicated network, since WiFi infrastructure already exists in public areas such as malls, airports, and universities. Considering that WiFi receivers are common in consumer devices such as smartphones, it is feasible to implement WiFi positioning in these public areas. The common WiFi positioning approaches include triangulation [13], fingerprinting [14][15][16][17][18], and their combination [19]. Triangulation is a method that determines the relative positions of objects using the geometry of triangles; therefore, a signal propagation model is needed to convert the received signal strength (RSS) into the distance between WiFi access points (APs) and user devices [20]. However, it is difficult to obtain an accurate signal propagation model in indoor environments, since transmitted signals can suffer obstructions and reflections [21]. To mitigate this issue, several enhanced models considering multipath effects [22] or Rayleigh fading effects [23] have been studied.
WiFi fingerprinting approaches based on WiFi RSS have gained a large amount of attention, as they can provide a relatively high indoor positioning accuracy in a multipath indoor environment [24]. Fingerprinting is commonly achieved in two steps (phases): the offline training (pre-survey) step and the online positioning step. The training phase is conducted to build or update a <RSS, location> database (DB) that consists of a set of reference points (RPs) with known coordinates and the RSS from available WiFi APs, while the positioning step finds the closest match between the features of the measured RSS and those stored in the DB. The main barrier to the broad adoption of WiFi fingerprinting is that most current DB training methods are tedious and labor-intensive [25]. Another challenge is the non-stationarity of the WiFi signal distribution, which can be attributed to radio signal propagation effects induced by environment changes [20,26,27,28]. Therefore, the training phase needs to be rerun periodically to keep the system up-to-date [24]. Different WiFi DB training approaches have been researched to meet various requirements. The studies in [29,30] proposed the idea of metropolitan-scale WiFi localization based on vehicles with equipment such as GNSS receivers and high-end inertial navigation systems. This method is highly efficient, but is mainly used in outdoor urban areas. To train indoor WiFi DBs, the first approach is to survey at every RP and record its fingerprint. This method can also improve DB reliability by averaging the RSS at each RP [31] and can even provide a coarse estimate of orientation [20,32]. However, it is time- and labor-consuming when dense RPs are selected to cover an entire area of interest, and a surveyor needs to mark the position of all RPs (labels) manually [33]. The training phase can take up to several hours even for a small building [34].
To minimize the number of labels needed for training, another approach relies on landmarks (i.e., points with known coordinates) or floor plans (i.e., the true position of corners and intersections, and the true orientation of corridors), together with a constant-speed assumption. To implement this method, a surveyor has to walk at constant speed along each link between landmarks, such as corners or intersections, over the known path. The position of landmarks can be determined by marking them manually on a digital map, while the position of other RPs on the links between landmarks can be calculated from the arrival time and the distance between two landmarks. This approach is significantly more time-effective than measuring the position of all RPs on the digital map; however, it requires the user to walk straight at a constant speed between landmarks. To remove the constant-speed assumption, scholars generate the WiFi fingerprint DB between landmarks by leveraging a dead-reckoning solution from sensors [35]. Sensors can provide a step-based dead-reckoning approach, which detects steps and predicts their length and direction based on sensor readings [24].
There is also literature on removing the training process by updating the WiFi DB automatically while navigating based on sensors. WiFi SLAM (simultaneous localization and mapping) is an advanced algorithm that trains the WiFi DB while navigating [36]. The limitation of the SLAM algorithm is the increased computational cost as the scenarios become larger [37]. There are other approaches based on crowd-sourcing. The study in [38] estimates the location of WiFi APs or other radio beacons using pedestrian dead-reckoning with high-quality foot-mounted IMUs, while [34,39,40,41] propose similar systems or approaches using handheld smartphones. Based on this idea, it is possible for mobile users to collect WiFi fingerprints automatically in daily life by conducting sensor-based navigation. The crowd-sourcing approaches have the potential to provide daily-life navigation solutions due to the increasing popularity of micro-electromechanical systems (MEMS) sensors in consumer electronics [42,43]. The shortcoming is that MEMS sensor errors change with time and are significantly dependent on environmental factors such as temperature [44]. Therefore, MEMS inertial sensors provide only short-term accuracy and suffer from accuracy degradation over time [45]. Although the errors of horizontal attitudes (i.e., the roll and pitch) can be controlled by accelerometer measurements, the heading error will grow when there is no additional information from other sensors or techniques [46]. Magnetometers can assist the heading estimation by sensing the Earth's magnetic field; however, the local magnetic field is susceptible to interference from man-made infrastructure when the user enters urban or indoor environments [47], which makes magnetometer measurements unreliable. Therefore, MEMS sensors have to be integrated with other positioning technologies to provide reliable long-term navigation solutions.
In this paper, we propose an approach that builds on ideas similar to those in [34,39,40], i.e., training the WiFi DB using navigation data from users, and applies different strategies to control sensor-based navigation errors and, in turn, the drifts of the generated WiFi DBs. First, we combine different navigation trajectories that move in and out of a building to make up the DB, and we restrict the time length of usable indoor navigation trajectories, i.e., the trajectories from the last epoch that receives a GNSS signal before entering an indoor area to the first epoch that receives a GNSS signal after walking out of the indoor area. To be specific, we only use navigation data that is shorter than a time threshold (e.g., 10 min). This strategy helps control the sensor-based navigation errors, but excludes the majority of navigation trajectories. However, since there will be massive amounts of pedestrian navigation data in real-world applications, even a small number of reliable navigation trajectories will be enough to generate the WiFi DB; it is therefore reasonable to use only the most valuable data and exclude the rest. Moreover, aiding sources such as landmarks can sometimes update the navigation solution, which divides long-term trajectories into short-term trajectories.
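The time-threshold selection described above can be illustrated with a minimal sketch. The `IndoorTrajectory` record, its field names, and the `select_trajectories` helper are hypothetical and only serve the illustration; the 600 s default mirrors the 10 min example threshold mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class IndoorTrajectory:
    """Hypothetical record of one indoor walk bounded by two GNSS fixes."""
    user_id: str
    last_gnss_before_s: float   # time of the last GNSS fix before entering (s)
    first_gnss_after_s: float   # time of the first GNSS fix after leaving (s)

    @property
    def duration_s(self) -> float:
        return self.first_gnss_after_s - self.last_gnss_before_s


def select_trajectories(trajectories, max_duration_s=600.0):
    """Keep only trajectories shorter than the time threshold."""
    return [t for t in trajectories if 0.0 < t.duration_s < max_duration_s]


walks = [
    IndoorTrajectory("a", 0.0, 420.0),     # 7 min: kept
    IndoorTrajectory("b", 100.0, 2500.0),  # 40 min: rejected
    IndoorTrajectory("c", 50.0, 590.0),    # 9 min: kept
]
kept = select_trajectories(walks)
print([t.user_id for t in kept])
```

Trajectories that never regain a GNSS fix would simply have no end time and would not enter such a list at all.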
Another strategy is implemented based on the fact that users will always walk both into and out of an indoor area. Therefore, we conduct post-processing (smoothing) to improve the sensor-based navigation solution, since there are accurate GNSS position updates on the start and end points of indoor trajectories. Post-processing is common for survey applications such as mobile mapping [48,49], but is used relatively little in real-time navigation [50]. To train the WiFi DB, post-processing is both affordable and worthwhile. Test results show that the navigation solutions can be significantly improved through post-processing, but still suffer from errors during the middle part of the trajectories, which will in turn cause DB drifts.
Furthermore, we evaluate the effect of DB shifts on WiFi fingerprinting. The results show that the difference between the WiFi fingerprinting results with the proposed DB (i.e., the DB generated by the proposed approach) and with the reference DB was not significant compared with the WiFi fingerprinting errors of smartphones. Therefore, although constructing a high-quality DB is vital since fingerprinting accuracy highly depends on DB quality [31], the fingerprinting errors will not increase linearly with DB errors, because there are other error sources in smartphone-based WiFi fingerprinting.
This paper is organized as follows: Section 2 provides the methodology of the proposed sensor-based navigation methods, including attitude determination, pedestrian dead-reckoning (PDR), and smoothing; Section 3 explains WiFi fingerprinting and focuses on training the WiFi DB; Section 4 outlines the tests and results; and Section 5 draws the conclusions.

Sensor-Based Navigation
The sensor-based navigation algorithm consists of three modules: multi-sensor based attitude determination, position tracking, and post-processing. The main components of the proposed algorithm are shown in Figure 1. The following subsections introduce these modules separately.

[Figure 1: main components of the proposed algorithm; recoverable block labels: Step Detection & Step-length Estimation, A Priori Information, Postprocessing]

Multi-Sensor Based Attitude Determination
In this part, we use the inertial navigation system (INS) mechanization to calculate continuous attitude angles, and utilize the information from multiple sensors and a priori information as updates to estimate the attitude errors through an attitude determination Kalman filter. The details of the INS mechanization have been well described in [48]. The following describes the Kalman filter models, including the system model and the measurement models.

Kalman Filter System Model
A simplified form of the error model detailed in [48] is applied as the continuous state equation in the Kalman filter [51] (the equation itself is not recoverable from the source text).

Kalman Filter Measurement Models
We use multiple kinds of constraints to build the measurement model and enhance the attitude determination. These constraints include the pseudo-observations and the measurements from magnetometers and accelerometers.
The pseudo-position and pseudo-velocity observations are proposed based on the fact that the position and linear velocity of the IMU stay within a limited range [51]. They can be used to compose the measurement vectors of the Kalman filter. The pseudo-velocity is directly set to zero; the pseudo-position can be set to an arbitrary constant value, which does not influence the estimation of attitude and gyro errors [51]. Accelerometers and magnetometers are widely used to update the attitude estimate in many applications of attitude and heading reference systems (AHRS) [52]. Details on using the accelerometer and magnetometer measurements can be found in [53]. The measurement uncertainties of the magnetometer measurements differ from those of the accelerometer signals. The latter are commonly high-frequency and alternating, whereas the perturbation on the magnetometer measurements is low-frequency, because the local magnetic field (LMF) changes due to external magnetic bodies such as man-made infrastructure. Among the various kinds of perturbations to the LMF, a common type is one in which both the direction and the strength of the LMF are changed, but the change is stable over short periods. A period during which the LMF is stable is called a quasi-static magnetic field (QSMF) period. Following [47], we only use the magnetometer measurements during QSMF periods.

Initial Alignment
The initial direction cosine matrix $\mathbf{C}_b^n(t_0)$ can be determined by using the accelerometer and magnetometer measurements as in [54], based on the accelerometer measurement $\mathbf{f}$, the magnetometer measurement $\mathbf{m}$, and their cross product $\mathbf{l} = \mathbf{f} \times \mathbf{m}$. Since the purpose of this paper is to use the sensor-based solution to build the WiFi DB, we start data collection in an outdoor environment, where GNSS positioning results are used to provide the initial position and heading.

Pedestrian Dead-Reckoning
PDR determines a new position relative to the previous known position using the current heading and step length information [55]. The coordinates $(\varphi, \lambda)$ can be computed as follows:

$$\varphi_{k+1} = \varphi_k + \frac{s_k \cos\psi_k}{R_m + h}, \qquad \lambda_{k+1} = \lambda_k + \frac{s_k \sin\psi_k}{(R_n + h)\cos\varphi_k}$$

where $\varphi$, $\lambda$, $\psi$, and $s$ are the latitude, longitude, heading, and step length, $R_m$ and $R_n$ are the radii of curvature in the meridian and in the prime vertical, and $h$ is the altitude. The subscripts $k$ and $k+1$ indicate the step count.
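As an illustration of the PDR position update, the following sketch propagates latitude and longitude by one step from a heading and a step length. It assumes the standard form of the update (north displacement divided by $R_m + h$, east displacement divided by $(R_n + h)\cos\varphi$) and the WGS-84 ellipsoid constants; the function names are hypothetical and not taken from the paper.

```python
import math

A_WGS84 = 6378137.0           # WGS-84 semi-major axis (m)
E2_WGS84 = 6.69437999014e-3   # WGS-84 first eccentricity squared


def radii_of_curvature(lat_rad):
    """Return (R_m, R_n): meridian and prime-vertical radii of curvature (m)."""
    s2 = math.sin(lat_rad) ** 2
    r_m = A_WGS84 * (1.0 - E2_WGS84) / (1.0 - E2_WGS84 * s2) ** 1.5
    r_n = A_WGS84 / math.sqrt(1.0 - E2_WGS84 * s2)
    return r_m, r_n


def pdr_step(lat_rad, lon_rad, heading_rad, step_len_m, h_m=0.0):
    """Advance the position by one step (heading measured clockwise from north)."""
    r_m, r_n = radii_of_curvature(lat_rad)
    north_m = step_len_m * math.cos(heading_rad)
    east_m = step_len_m * math.sin(heading_rad)
    lat_next = lat_rad + north_m / (r_m + h_m)
    lon_next = lon_rad + east_m / ((r_n + h_m) * math.cos(lat_rad))
    return lat_next, lon_next


# Ten 0.7 m steps due north from an arbitrary starting point.
lat, lon = math.radians(51.08), math.radians(-114.13)
for _ in range(10):
    lat, lon = pdr_step(lat, lon, heading_rad=0.0, step_len_m=0.7, h_m=1100.0)
print(math.degrees(lat), math.degrees(lon))
```

Because each step reuses the heading and step length estimates, any bias in either accumulates along the trajectory, which is the drift the later smoothing step is meant to reduce.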
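In practice, the linear error Model (6) is used to implement the PDR algorithm. This is because when other information such as GNSS or WiFi is available, it can be used to update the PDR through a position tracking Kalman filter. The Kalman filter system model is $\mathbf{x}_{k+1} = \boldsymbol{\Phi}_k \mathbf{x}_k + \mathbf{w}_k$ with $\mathbf{x} = [\delta\varphi \;\; \delta\lambda \;\; \delta\psi \;\; \delta s \;\; b]^T$, where $\mathbf{x}$ is the state vector, $\boldsymbol{\Phi}_k$ is the transition matrix, and $\mathbf{w}_k$ is the noise vector. $\delta\varphi$, $\delta\lambda$, $\delta\psi$, $\delta s$, and $b$ are the errors of latitude, longitude, heading, and step length, and the vertical gyro bias component, respectively.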

Step Length Model
Step detection and step length estimation are well described in [56]. The basic idea is that the acceleration shows a cyclic pattern corresponding to every step when a pedestrian is walking. The specific force signals are first smoothed to reduce the influence of noise, where (2L + 1) is the length of the smoothing array. The smoothed specific force is then checked periodically in a sliding window. If a sample is a peak and its magnitude is larger than a threshold, a new step is detected.
The step length varies with walking velocity, terrain slope, etc. It can be related to the step frequency with a linear model [57], whose coefficients can be trained and which includes a noise term $w_s$.
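The step detection and step length ideas can be sketched as follows. This is only an illustration under stated assumptions: the smoother is taken to be a centred moving average of length 2L + 1, and the threshold, the minimum gap between steps (`min_gap`), and the linear-model coefficients `a` and `b` are hypothetical values rather than the trained parameters of [56,57].

```python
import numpy as np


def smooth(signal, L):
    """Centred moving average with window length 2L + 1 (assumed smoother)."""
    window = np.ones(2 * L + 1) / (2 * L + 1)
    return np.convolve(signal, window, mode="same")


def detect_steps(specific_force, L=2, threshold=10.5, min_gap=6):
    """Return sample indices where a step (peak above threshold) is detected."""
    f = smooth(np.asarray(specific_force, dtype=float), L)
    steps = []
    for k in range(1, len(f) - 1):
        is_peak = f[k] > f[k - 1] and f[k] >= f[k + 1]
        far_enough = not steps or k - steps[-1] >= min_gap
        if is_peak and f[k] > threshold and far_enough:
            steps.append(k)
    return steps


def step_lengths(step_indices, rate_hz, a=0.3, b=0.2):
    """Linear model s = a * f_step + b, with f_step the step frequency (Hz)."""
    times = np.asarray(step_indices, dtype=float) / rate_hz
    step_freq = 1.0 / np.diff(times)
    return a * step_freq + b


# Synthetic specific-force magnitude: 10 s at 20 Hz, walking at 2 steps/s.
rate_hz = 20.0
t = np.arange(200) / rate_hz
force = 9.81 - 2.0 * np.cos(2.0 * np.pi * 2.0 * t)
steps = detect_steps(force)
print(len(steps), step_lengths(steps, rate_hz)[:3])
```

The minimum-gap check is a common guard against double-counting a single stride when noise produces two nearby peaks; it is an addition for robustness, not something stated in the text.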

Post-Processing
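To build the WiFi DB, it is affordable and worthwhile to employ post-processing to provide a better navigation solution. In general, smoothing can be performed by combining the forward and backward filter solutions as follows [48]:

$$\mathbf{P}_{sm} = \left(\mathbf{P}_f^{-1} + \mathbf{P}_b^{-1}\right)^{-1}, \qquad \mathbf{x}_{sm} = \mathbf{P}_{sm}\left(\mathbf{P}_f^{-1}\mathbf{x}_f + \mathbf{P}_b^{-1}\mathbf{x}_b\right)$$

where subscripts 'f', 'b', and 'sm' denote the forward, backward, and smoothed solutions, respectively, $\mathbf{x}$ is the estimated state vector, and $\mathbf{P}$ represents the covariance matrix of $\mathbf{x}$. In this research, $\mathbf{P}_f$ and $\mathbf{P}_b$ are predicted in the forward and backward processes by [58]

$$\mathbf{P}_{k+1}^- = \boldsymbol{\Phi}_k \mathbf{P}_k \boldsymbol{\Phi}_k^T + \mathbf{Q}_k$$

where $\mathbf{P}^-$ is the predicted covariance and $\mathbf{Q}$ is the covariance matrix of the system noise vector.
When other information such as GNSS or WiFi is available, the matrix $\mathbf{P}$ can be updated through

$$\mathbf{P}_k = (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)\,\mathbf{P}_k^-, \qquad \mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}_k^T \left(\mathbf{H}_k \mathbf{P}_k^- \mathbf{H}_k^T + \mathbf{R}_k\right)^{-1}$$

where $\mathbf{K}$ is the Kalman gain matrix, $\mathbf{H}$ is the design matrix for the measurements, and $\mathbf{R}$ is the covariance matrix of the measurement error vector.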
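The combination of forward and backward solutions can be sketched in a few lines. The sketch assumes the standard inverse-covariance weighting of a two-filter smoother; the function name and the toy covariance values are hypothetical and chosen only to show the behaviour.

```python
import numpy as np


def combine_forward_backward(x_f, P_f, x_b, P_b):
    """Weight the forward and backward solutions by their inverse covariances."""
    P_f_inv = np.linalg.inv(P_f)
    P_b_inv = np.linalg.inv(P_b)
    P_sm = np.linalg.inv(P_f_inv + P_b_inv)
    x_sm = P_sm @ (P_f_inv @ x_f + P_b_inv @ x_b)
    return x_sm, P_sm


# Toy trajectory of 11 epochs: the forward variance grows with elapsed time and
# the backward variance grows with time-to-go (illustrative numbers only).
n_epochs = 11
smoothed_var = []
for k in range(n_epochs):
    P_f = np.eye(2) * (1.0 + 4.0 * k)
    P_b = np.eye(2) * (1.0 + 4.0 * (n_epochs - 1 - k))
    _, P_sm = combine_forward_backward(np.zeros(2), P_f, np.zeros(2), P_b)
    smoothed_var.append(P_sm[0, 0])
print(int(np.argmax(smoothed_var)), smoothed_var[0], max(smoothed_var))
```

In this toy setup the smoothed variance is smallest at both ends and largest in the middle of the trajectory, which matches the qualitative behaviour expected when accurate position updates are available only at the start and end points.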

WiFi Fingerprinting
WiFi fingerprinting is a widely used positioning approach based on WiFi RSS. Its main advantage is that it does not rely on known WiFi AP positions or on a model of the signal propagation environment. WiFi fingerprinting consists of two phases: the offline training phase and the online positioning phase. The procedure of WiFi fingerprinting is shown in Figure 2 [59].

Training Phase
The purpose of the training phase is to build or update a <location, RSS> DB that consists of a set of RPs with known coordinates and the RSS from available WiFi APs. To generate the DB, enough RPs should be selected to cover the whole area of interest. Then, the RSS from available APs is collected at every RP. The procedure for training is illustrated in the red dashed box in Figure 2. The fingerprint information at the i-th RP, i.e., its coordinates and the RSS from each available AP, is recorded in the DB. In this paper, we use the continuous sensor-based navigation solution from the method described in Section 2 to obtain the position of the RPs. Therefore, it is possible to train the DB with daily-life navigation data from users, instead of performing a separate training phase. This is similar to [34,39,40].
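One possible way to turn a navigation trajectory into fingerprint records is sketched below: each WiFi scan is tagged with the trajectory position interpolated at the scan time. The data layout (local east/north coordinates in metres, a dictionary of RSS values keyed by AP MAC address) and the function name are assumptions made for the illustration, not the format used in the paper.

```python
import numpy as np


def build_fingerprint_db(traj_t, traj_xy, scans):
    """Tag each WiFi scan with the navigation position at its timestamp.

    traj_t  : increasing epoch times of the navigation solution (s)
    traj_xy : (N, 2) array of local east/north positions (m)
    scans   : list of (time_s, {mac: rss_dbm}) tuples
    """
    traj_t = np.asarray(traj_t, dtype=float)
    traj_xy = np.asarray(traj_xy, dtype=float)
    db = []
    for t, rss in scans:
        if t < traj_t[0] or t > traj_t[-1]:
            continue  # no navigation solution covers this scan
        east = float(np.interp(t, traj_t, traj_xy[:, 0]))
        north = float(np.interp(t, traj_t, traj_xy[:, 1]))
        db.append({"position": (east, north), "rss": dict(rss)})
    return db


trajectory_t = [0.0, 10.0, 20.0]
trajectory_xy = [[0.0, 0.0], [10.0, 0.0], [10.0, 8.0]]
wifi_scans = [
    (2.5, {"aa:01": -48.0, "aa:02": -71.0}),
    (15.0, {"aa:02": -55.0}),
    (30.0, {"aa:01": -80.0}),  # outside the trajectory: skipped
]
for record in build_fingerprint_db(trajectory_t, trajectory_xy, wifi_scans):
    print(record)
```

Linear interpolation is used because the WiFi scans arrive at a much lower rate than the sensor-based position solution, so scan times rarely coincide with navigation epochs.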

Positioning Phase
In the positioning phase, the user location is estimated by comparing the measured RSS information with that stored in the DB. The procedure for positioning is illustrated in the green dashed box in Figure 2. Several approaches have been proposed for estimating the user location using RSS measurements, such as the nearest neighbor (NN) approach and the probabilistic estimation method [60]. The NN method selects the RP with the minimal signal strength distance as the user's estimated position [31], where $d_i$ denotes the signal strength distance at the i-th RP in the DB.
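The NN selection can be sketched as follows. Since the distance formula is not given here, the sketch assumes a root-mean-square RSS difference over the union of observed APs, with a hypothetical floor value `missing_dbm` substituted for APs that are missing on one side; both choices are assumptions, not the definition from [31].

```python
import math


def signal_distance(observed, reference, missing_dbm=-100.0):
    """RMS difference of RSS values over the union of APs (assumed metric)."""
    macs = set(observed) | set(reference)
    if not macs:
        return math.inf
    squared = sum(
        (observed.get(mac, missing_dbm) - reference.get(mac, missing_dbm)) ** 2
        for mac in macs
    )
    return math.sqrt(squared / len(macs))


def nearest_neighbor(observed, db):
    """Return the position of the RP with the smallest signal distance."""
    best = min(db, key=lambda rp: signal_distance(observed, rp["rss"]))
    return best["position"]


fingerprint_db = [
    {"position": (0.0, 0.0), "rss": {"ap1": -40.0, "ap2": -70.0}},
    {"position": (10.0, 0.0), "rss": {"ap1": -70.0, "ap2": -40.0}},
]
print(nearest_neighbor({"ap1": -45.0, "ap2": -72.0}, fingerprint_db))
```

Because the estimate is always one of the stored RP positions, any shift in the DB coordinates carries over directly into the NN result, alongside the error caused by RSS noise.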

Tests and Results
Walking tests were conducted in two indoor environments: the Energy, Environment, and Experiential learning (EEEL) building, in which the average weighted AP number (as defined in Section 4.1.2) at one point was over 15; and the Engineer building (ENB), in which the average weighted AP number was nearly seven. The tests were performed with a Samsung Galaxy 4 smartphone. Custom software was used to simultaneously record the data from the sensors, WiFi, and GNSS. The sampling rates of the sensor, WiFi, and GNSS data were set to 20 Hz, 1 Hz, and 1 Hz, respectively. Even though our attitude estimation algorithm can deal with different scenarios such as handheld, ear, dangling, pocket, and backpack [61], we conducted the tests in this paper in handheld mode only, to focus on investigating WiFi positioning.

Tests at EEEL
EEEL is a relatively new building with well-equipped infrastructure. There are metallic structures inside the building, which may influence the propagation of WiFi signals. Figure 3 shows the indoor test environment. This building has a main corridor which is 3 m wide, and a lobby which is about 30 × 30 m². The test area at EEEL was approximately 120 × 40 m².

Trajectories for Building DB
In this test, we generated the WiFi fingerprint DB inside the EEEL building using four different sensor-only (stand-alone) navigation trajectories. The true trajectories are shown in Figure 4. Each trajectory lasted for 5-10 min. Both the starting and ending points of each trajectory were in the outdoor environment, where the initial position was provided by GNSS.

Figure 5 shows that both the forward and backward PDR trajectories had a shape similar to the reference and were accurate at the beginning, but suffered from long-term drifts. The drifts were caused by both heading and step length errors, which are inherent in sensor-only navigation algorithms. The smoothed trajectory came much closer to the reference during the beginning and ending periods, since the forward and backward PDR solutions were accurate and had high weights during the beginning and ending periods, respectively. To make the errors clear, the error distances (i.e., the distance between the estimated user position and the corresponding true position) of these solutions were examined. The smoothed results were accurate at the beginning and ending parts of each trajectory and drifted more in the middle, which is characteristic of the smoothing algorithm. By making full use of the data from the whole navigation process, the smoothed results were more accurate than the forward results. The max and RMS values of the errors in the forward and smoothed results are shown in Table 1. Each element in the final row is the RMS of the values of the four trajectories in the corresponding column.

Overall, the max and RMS values of the error distance were reduced from 16.1 m and 8.7 m to 9.6 m and 5.7 m after smoothing, a reduction of 34.5% in RMS. Then, the smoothed navigation solutions were used to build the WiFi fingerprint DB. The DB built through the proposed method (denoted as "the proposed DB" for short) is shown in Figure 7a. For comparison, a reference DB was built by using the conventional floor plan aided training approach (denoted as "the reference DB" for short), as shown in Figure 7b. It is clear that the proposed DB has some shifts compared with the true path, while the reference DB fits the true path. We evaluate the effect of such DB shifts on WiFi positioning errors through the WiFi positioning tests in the next subsection. In addition, the WiFi signal distribution in the reference DB is shown in Figure 8. The x- and y-axes indicate the length in the west-east and south-north directions, and the z-axis shows the weighted AP number (its defining equation is missing from the source text).

WiFi Fingerprinting Using Generated DB
The WiFi fingerprinting performance with both the proposed DB and the reference DB was tested on a separate navigation trajectory in the EEEL building, to evaluate the effect of DB shifts on WiFi positioning errors. The true trajectory is shown in Figure 9, and the results in Figure 10. Considering the good performance of GNSS in the outdoor environment, only the results of WiFi fingerprinting in the indoor environment are considered. Even though we set the sampling rate of WiFi to 1 Hz, the actual WiFi update rate was less than 0.3 Hz. This might be due to restrictions of the smartphone or the Android system. The largest WiFi positioning errors occurred in the middle area with abundant WiFi signals, rather than in the marginal areas with fewer WiFi signals, which is counterintuitive. A possible reason is that the middle area has the strongest signals, which causes an ambiguity issue around this area. Integrating PDR with WiFi may mitigate the ambiguity issue, but this is beyond the scope of this article.
Approximately 80% of the errors were smaller than 7.5 m when using the proposed DB, while around 80% were smaller than 6.0 m when using the reference DB. Therefore, at this 80% level, the WiFi positioning results using the proposed DB were 1.5 m worse than those using the conventional floor plan aided DB.
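The error statistics quoted in this section (max, RMS, and the error level below which 80% of the samples fall) can be computed from estimated and true positions as in the following sketch. The function name and the sample coordinates are hypothetical, and the 80% level is taken here as the 80th percentile with NumPy's default linear interpolation, which is an assumption about how the cumulative error figures were derived.

```python
import numpy as np


def error_statistics(est_xy, true_xy):
    """Max, RMS, and 80th-percentile horizontal error distance (m)."""
    est_xy = np.asarray(est_xy, dtype=float)
    true_xy = np.asarray(true_xy, dtype=float)
    dist = np.linalg.norm(est_xy - true_xy, axis=1)
    return {
        "max": float(dist.max()),
        "rms": float(np.sqrt(np.mean(dist ** 2))),
        "p80": float(np.percentile(dist, 80)),
    }


# Illustrative positions in a local east/north frame (not measured data).
estimated = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0], [1.0, 0.0]]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(error_statistics(estimated, truth))
```

Comparing the same statistics for two databases on one test trajectory, as done in this section, isolates the contribution of DB shifts from the other error sources in the positioning step.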

Trajectory for Building DB
ENB is mainly used for walking. Therefore, the environment at ENB is different from that at EEEL. Compared with EEEL, there are fewer WiFi APs and fewer metallic structures at ENB. Figure 13 shows the indoor test environment. The test area at ENB was approximately 140 × 60 m². The corridors inside ENB are long and narrow, which is different from those in EEEL. The test trajectory is shown in Figure 14. Due to the long, straight shape of the ENB trajectory, the navigation errors at the end of the forward and backward PDR reached 25 m and 20 m, which are larger than those in the EEEL tests. However, the max error and RMS error were reduced from 29.2 m and 15.9 m in the forward results to 8.1 m and 3.9 m in the smoothed results.
The smoothed navigation solutions were used to build the WiFi fingerprint DB, as shown in Figure 16a. The reference DB generated through the conventional floor plan aided method is shown in Figure 16b. It is clear that the proposed DB has some shifts in the middle part, even after smoothing, compared with the true floor plan. The WiFi signal distribution in the DB is shown in Figure 17. There are generally far fewer WiFi signals on this trajectory than in EEEL, because the tests were conducted on the 0-th floor, which is mainly used for walking instead of working.

WiFi Fingerprinting Using Generated DB
In this WiFi positioning test, we used the same trajectory as in the DB training tests, but walked in the opposite direction. The WiFi fingerprinting results with the proposed DB and the conventional DB are shown in Figure 18. The RMS of the WiFi positioning errors decreased from 6.0 m when using the proposed DB to 5.1 m when using the reference DB, while the max error increased from 11.4 m to 12.5 m. Therefore, even when using the proposed DB, which had shifts with an RMS of 3.9 m, the RMS of the WiFi fingerprinting errors increased by only 0.9 m, which further verifies that there are other error sources in smartphone-based WiFi positioning that are more significant than DB shifts of this level.
Approximately 80% of the errors were smaller than 7.1 m when using the proposed DB, while around 80% were smaller than 6.4 m when using the conventional DB. Therefore, at this 80% level, the WiFi positioning results using the proposed DB were 0.7 m worse than those using the conventional floor plan aided DB.

Conclusions
This paper presents a method that trains the WiFi fingerprint database using sensor-based navigation solutions. Since MEMS sensors provide only short-term accuracy and suffer from accuracy degradation over time, we use a strategy that combines different navigation trajectories that move in and out of a building into the database, and restricts the time length of usable indoor navigation trajectories to a certain time threshold (e.g., 10 min). This simple strategy helps control the sensor-based navigation errors. In addition, we conduct post-processing (smoothing) to improve the sensor-based navigation solution. Furthermore, we evaluated the effect of DB shifts on WiFi fingerprinting using the proposed DB (i.e., the database generated using the proposed method). Results show that when the error of the proposed DB was 5.7 m, the WiFi positioning error was 6.1 m, which is 1.5 m larger than that with the reference DB (i.e., the database built through the floor plan aided method). In the tests inside another building, when the error of the proposed DB was 3.9 m, the WiFi positioning error was 6.0 m, which is 0.9 m larger than that with the reference DB. Therefore, the difference between the WiFi fingerprinting results with the proposed DB and with the reference DB was not significant compared with the WiFi fingerprinting errors of smartphones. Consequently, although constructing a high-quality DB is vital because fingerprinting accuracy highly depends on DB quality, the smartphone-based fingerprinting errors will not increase linearly with DB errors, because there are other error sources.
Future work will focus on researching WiFi positioning based on crowd-sourcing, where the database accuracy can be further improved with massive amounts of sensor-based navigation data.