
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase.

One of the key issues for location based services (LBS) is the positioning technology, and the indoor positioning accuracy requirement is usually higher than that for outdoors [

Wi-Fi has been widely used for indoor positioning. Wi-Fi provides local wireless access to a fixed network that is low cost, widely deployed and whose indoor coverage is still rapidly increasing. Using existing infrastructure for positioning is a very attractive option. However, only Wi-Fi signal strength (SS) measurements by the user receiver device from the access points (AP) are available for positioning.

One obvious approach is to convert the SSs to distance measurements. If three distances between the user receiver and different APs can be obtained, trilateration can be used to estimate the receiver's position. However, creating an accurate model to convert SS to distance is difficult because the propagation of radio signals in indoor environments is very complicated. The SS received from an AP varies significantly (by up to 15 dB) over time at the same location. In addition, indoor environments vary considerably from each other, which means one model may work well for a specific environment but perform poorly in other situations. Hence it is difficult, if not impossible, to accurately obtain distance measurements from SSs on a consistent basis [

However, there is a major shortcoming of this technique—initial creation of the fingerprint database and maintaining it are not trivial tasks. An entry in the fingerprint database is represented by a location identifier paired with the Wi-Fi received SS (RSS) at that location. The “fingerprints” may be the average RSSs (deterministic approach) [

The probabilistic approach can provide better accuracy for Wi-Fi positioning. However, understanding the statistical characteristics of the RSSs is the key [

This paper proposes a new algorithm based on an improved double-peak Gaussian distribution (IDGD) for Wi-Fi fingerprinting. The rest of the paper is organised as follows: Section 2 gives some details of the probabilistic approach of fingerprinting technology and discusses the characteristics of RSS. Section 3 introduces the IDGD and describes a way to generate the fingerprint database. Section 4 reports the testing of the new algorithm and the analysis of the results, and Section 5 gives the conclusions. Results show that the proposed algorithm can improve the positioning accuracy by about 20% and has the potential to significantly reduce the labour costs for the training phase.

The fingerprinting technique is widely used where line-of-sight signal propagation is not typical. The low cost of the user hardware and the promising performance are its main advantages. Wi-Fi location fingerprinting consists of two phases: the off-line data training phase and the on-line positioning phase. The aim of the training phase is to build a fingerprint database. To generate the database in a conventional way, some reference points (RP) in the area of interest are selected. With a mobile station (MS) placed at one RP, the RSSs of all the APs are measured. From such measurements the characteristic feature of that RP is determined and recorded in the database. This process is repeated at the next RP, and so on until all RPs have been visited. In the positioning phase, the MS measures the RSS at the place where its position is required. The measurements (including RSSs and MAC addresses of the APs) are compared with the data in the database using a matching algorithm. Typically, the signal distance is computed. The smallest signal distance indicates the best match, from which the likeliest location of the MS can be determined [

In this study the authors have adopted the probabilistic approach. Each database entry pairs a location L with an RSS vector P = (p_{1}, p_{2}, …, p_{N}), which consists of N RSS values from N APs. The database contains RSS vectors for all RPs in the area of interest. For positioning, a MS obtains a sample of the RSS vector S = (s_{1}, s_{2}, …, s_{N}). For each P in the database, the probability of observing S given P is computed. The location is then estimated to be the L for which this probability is the highest. Note that the vector S is random. An error is made when the highest probability occurs for a location L that is not the one at which the sample S was collected. Errors occur because the measured RSS vector is a sample of a random vector, while only the probabilistic RSS vector is stored in the database.
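The matching step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each RP stores a Gaussian (mean, standard deviation) per AP and that the APs are statistically independent, so the joint log-likelihood is a sum over APs. The names `gaussian_log_pdf`, `most_likely_rp` and the example database values are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the Gaussian probability density at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def most_likely_rp(database, sample):
    """Return the RP label whose stored distributions best explain the sample.

    database: {rp_label: [(mean, std), ...]} with one (mean, std) pair per AP.
    sample:   [s_1, ..., s_N] RSS values in dBm, in the same AP order.
    Assumption: APs are treated as independent, so log-densities are summed.
    """
    best_label, best_score = None, -math.inf
    for label, params in database.items():
        score = sum(gaussian_log_pdf(s, m, sd) for s, (m, sd) in zip(sample, params))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-RP, three-AP database (values invented for illustration).
database = {
    "RP1": [(-50.0, 3.0), (-70.0, 4.0), (-80.0, 3.0)],
    "RP2": [(-65.0, 3.0), (-55.0, 4.0), (-75.0, 3.0)],
}
print(most_likely_rp(database, [-52.0, -68.0, -81.0]))
```

Working with log-densities rather than raw probabilities avoids numerical underflow when many APs are visible.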

The RSS probability distributions of all APs at all RPs need to be stored. The fingerprint of the i-th RP can be defined as:
where AP_{n} (n = 1…N) denotes the n-th AP and T denotes the RSS measurement. P is expressed as:
_{i}_{m}

To speed up the computations, the signal strength distribution is typically divided into p bins. The fingerprint of the i-th RP also can be expressed as:

Correspondingly the probability of the RSS measurements within the bin _{k}_{n}_{k}

In order to investigate the characteristics of RSSs, four tests have been carried out in four different environments: a residential room, an office, a class room and a shopping centre. More than 10,000 RSS samples have been collected for each test. In total 424 APs have been detected during the tests. All data have been analysed and some characteristics of the Wi-Fi signals were determined (see

For more than 30% of the APs, the RSS distribution consists of two peaks and a long tail, as shown in

The Gaussian function does not approximate the distribution of RSSs very well. The Gaussian distribution in

The two-peak distribution of RSS is not accidental. The test results are listed in

Generating the database is a prerequisite for location fingerprinting. Generally speaking, the more measurements obtained at each RP, the better the positioning performance. However, more measurements means more effort is required for the RSS survey/training phase. In reality only a few samples of RSS are collected at each RP, and hence the limited samples cannot be used to generate an accurate empirical RSS distribution.
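The limitation of the empirical (histogram) distribution with few samples can be illustrated with a short sketch. The bin range, the bin width and the sample values below are assumptions chosen for illustration only; `histogram_fingerprint` is a hypothetical helper, not part of the original work.

```python
from collections import Counter


def histogram_fingerprint(samples, lo=-100, hi=-30, bin_width=5):
    """Return the empirical probability of each RSS bin.

    samples: RSS readings in dBm collected at one RP from one AP.
    lo, hi, bin_width: assumed binning (14 bins of 5 dB from -100 to -30 dBm).
    """
    n_bins = (hi - lo) // bin_width
    counts = Counter()
    for s in samples:
        k = int((s - lo) // bin_width)
        k = min(max(k, 0), n_bins - 1)  # clamp readings outside [lo, hi)
        counts[k] += 1
    total = len(samples)
    return [counts[k] / total for k in range(n_bins)]


# Six invented readings: most bins stay empty, including one between occupied bins.
fingerprint = histogram_fingerprint([-62, -61, -63, -58, -71, -62])
empty_bins = sum(1 for p in fingerprint if p == 0.0)
print(fingerprint)
print("empty bins:", empty_bins)
```

With only a handful of readings, any on-line measurement that falls in an empty bin is assigned zero probability, which is why a parametric model becomes attractive when the survey is short.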

The Gauss function is a traditional method for fingerprint database generation [

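The double-peak Gaussian distribution (DGD) is proposed as a candidate to replace the Gaussian distribution when it is not suitable. The RSS samples of each AP are divided into two parts at the minimum value between the two peaks, and each part is treated as an independent Gaussian function. The weight of each function was assumed to be 1/2. Its probability density is expressed as:
$$f(x) = \frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_{1}}\exp\!\left(-\frac{(x-\mu_{1})^{2}}{2\sigma_{1}^{2}}\right) + \frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_{2}}\exp\!\left(-\frac{(x-\mu_{2})^{2}}{2\sigma_{2}^{2}}\right)$$
where μ_{1}, σ_{1} and μ_{2}, σ_{2} are the mean and standard deviation of the first and second parts, respectively.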
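A minimal sketch of the DGD as described in this paragraph: the samples are split at the valley between the two peaks, a Gaussian is fitted to each part, and the two densities are combined with equal weights of 1/2. The split value is taken as an input here, the function names are hypothetical, and the lower bound `min_sigma` on the standard deviation is an added assumption to avoid a zero-width Gaussian.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def fit_dgd(samples, split, min_sigma=0.5):
    """Fit one Gaussian to each side of `split` (the valley between the peaks).

    Returns ((mu1, sigma1), (mu2, sigma2)).
    min_sigma is an assumed floor, not part of the described method.
    """
    low = [s for s in samples if s < split]
    high = [s for s in samples if s >= split]
    if len(low) < 2 or len(high) < 2:
        raise ValueError("each part needs at least two samples")
    return ((mean(low), max(pstdev(low), min_sigma)),
            (mean(high), max(pstdev(high), min_sigma)))


def dgd_pdf(x, params):
    """Equal-weight (1/2 each) mixture of the two fitted Gaussians."""
    (mu1, s1), (mu2, s2) = params
    return 0.5 * gaussian_pdf(x, mu1, s1) + 0.5 * gaussian_pdf(x, mu2, s2)


# Invented readings with two clusters, split at an assumed valley of -66 dBm.
samples = [-72, -71, -70, -71, -73, -62, -61, -60, -61]
params = fit_dgd(samples, split=-66)
print(params)
print(dgd_pdf(-61, params), dgd_pdf(-66, params))
```

Because the weights are fixed at 1/2, the fitted density gives both peaks the same total mass even when one part contains many more samples than the other.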

In order to solve the problem mentioned above, an improved DGD was developed. The _{1}_{2}_{1}_{2}

It can be seen that the probability distribution based on the IDGD is better than that obtained using the DGD. As already mentioned, the distribution of RSSs is not always Gaussian, but it is also not always double-peak Gaussian. Hence, a new model (IDGD), comprising a joint model of the Gaussian distribution and the DGD, is proposed. The function is defined as:

In _{1} + Max_{2}, the RSS distribution appears one peak or two unobvious peaks. Otherwise, the distribution appears two peaks. Investigation of this question will be carried out in future research. Thus the decision rule is expressed as:
_{1}_{1}_{1}_{2}

The procedure for fingerprint-based positioning using the proposed joint model is as follows, where steps 1 to 4 are the off-line data training phase and steps 5 to 7 are the on-line positioning phase:

1. Choose the RPs, and then collect the RSSs from all APs at each RP.

2. Detect the gross errors and filter them out.

3. Use a global search procedure to find the two peaks and the minimum value between the two peaks. Twice the minimum is compared with the sum of the two peaks to decide between the Gaussian model and the alternative model. This decision rule was created based on all the data collected.

4. Create the fingerprint database.

5. Collect RSSs at the user's location and remove the outliers. Calculate the probability distribution of the received RSSs.

6. Use the fingerprint database to calculate the joint probability density for the RSSs collected in step 5.

7. Estimate the user's location using the K weighted nearest neighbour (KWNN) algorithm. KWNN is a conventional algorithm used for fingerprint-based Wi-Fi positioning. Using this algorithm, the K (K ≥ 2) nearest neighbours (those with the shortest signal distance) of a test vector are chosen. The weighted average of the co-ordinates of the K points can be used as the estimate of the user's location. The inverse of the signal distance defines the weight [
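The KWNN estimate in the last step can be sketched as follows. This is an illustrative simplification: the signal distance is taken to be the Euclidean distance between a stored mean RSS vector and the measured vector, whereas the procedure above derives it from the probability distributions. The function name `kwnn`, the small constant `eps` and the example fingerprints are assumptions.

```python
import math


def kwnn(fingerprints, sample, k=3, eps=1e-9):
    """Estimate (x, y) as the inverse-distance weighted mean of the K nearest RPs.

    fingerprints: list of ((x, y), [mean RSS per AP]) entries, one per RP.
    sample:       measured RSS vector in the same AP order.
    eps:          assumed small constant that avoids division by zero.
    """
    scored = sorted(
        (math.dist(rss, sample), x, y) for (x, y), rss in fingerprints
    )
    nearest = scored[:k]
    weights = [1.0 / (d + eps) for d, _, _ in nearest]
    total = sum(weights)
    x_est = sum(w * x for w, (_, x, _) in zip(weights, nearest)) / total
    y_est = sum(w * y for w, (_, _, y) in zip(weights, nearest)) / total
    return x_est, y_est


# Four invented RPs on a 4 m grid with two APs each.
fingerprints = [
    ((0.0, 0.0), [-50.0, -70.0]),
    ((4.0, 0.0), [-60.0, -60.0]),
    ((0.0, 4.0), [-70.0, -50.0]),
    ((4.0, 4.0), [-80.0, -80.0]),
]
print(kwnn(fingerprints, [-55.0, -65.0], k=2))
```

In this example the measured vector is equally distant from the first two RPs, so with K = 2 the estimate falls midway between them.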

To verify the proposed approach, a study was carried out in a small test area. The test area was a typical office room of forty-five square metres. Nine RPs and five test points (TPs) were selected. A Lenovo X220 Tablet equipped with an Intel Centrino Advanced-N 6250 wireless network card was used to make RSS measurements. The inSSIDer software was used to collect Wi-Fi signal strengths.

Data were collected during a working day from 8–9 a.m. Up to 100 RSS samples (for about 2 min) were collected at each RP. Then different models—Gaussian, histogram, DGD and IDGD—were used to generate the fingerprint database. About 10–20 RSS samples were collected at each TP soon after the training data were collected. The conventional KWNN (K = 3) was used to estimate the position of the TPs.

In this paper, the weight was calculated as the inverse of the signal distance, for which the Euclidean distance was adopted.

This first test indicated that the proposed model works well.

The observation of double peaks of Wi-Fi signal strength has suggested the investigation of a new model known as the Double-peak Gaussian Distribution (DGD) to approximate the signal strength's distribution. Further investigation indicated that an improvement of the DGD was needed, and the Improved DGD (IDGD) was proposed.

The IDGD takes into account the different types of distributions of the RSS samples (sometimes one peak, in other circumstances two peaks). When one peak is detected a standard Gaussian distribution is used to create the fingerprint, whereas when two peaks are detected the DGD is used instead. Tests show that applying the new model for fingerprint-based positioning can significantly improve the positioning accuracy (by up to 40%). Furthermore, this model has the potential to reduce labour costs for the data training phase,

This work was supported by the following projects: The Pre-Research project of the key technology research of “Container Intelligent Logistics Based on BeiDou Satellite” which was funded by the Science and Technology Commission of the Shanghai Municipality (12511501102); the project of “Research on Authentication Platform of Cloud Computing based on the Internet of Things” which was funded by National Natural Science Foundation of China (61272468); and the project of “High Gain Low Cost Miniaturization Multimode Substrate Integrated Satellite Navigation Antenna” which was funded by the Shanghai Municipal Commission of Economy and Information.

The authors declare no conflict of interest.

Location fingerprinting technique.

Distribution characteristics of Wi-Fi signals in indoor test environments: (

Comparison of DGD and IDGD with the empirical distribution of RSS.

The procedure of positioning using proposed model.

The test bed of the second test.

Errors at test points in test 2.

The average error using different models in test 2.

Double-peak distribution of RSSs in tests

Environment | Area | APs detected | APs with double-peak distribution | Percentage |
Residential room | 10 m^{2} | 28 | 9 | 32% |
Office | 45 m^{2} | 134 | 35 | 26% |
Class room | 200 m^{2} | 124 | 38 | 31% |
Shopping centre | 1,000 m^{2} | 138 | 52 | 38% |
Total | | 424 | 134 | 32% |

Positioning errors of test 1 (unit: metres).

Model | TP1 | TP2 | TP3 | TP4 | TP5 | Mean |
Deterministic | 2.28 | 2.10 | 0.62 | 1.27 | 2.49 | 1.75 |
Gaussian | 2.36 | 2.01 | 1.36 | 2.57 | 2.83 | 2.23 |
Histogram | 2.51 | 1.27 | 1.33 | 1.04 | 2.28 | 1.69 |
DGD | 1.26 | 1.20 | 1.32 | 2.58 | 2.29 | 1.73 |
IDGD | 1.15 | 1.23 | 1.09 | 1.14 | 2.18 | 1.36 |
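The relative improvement of the IDGD over each other model can be checked from the last value in each row of the table, which matches the average of the five preceding values. The short calculation below is only a reading of the tabulated numbers, not a computation taken from the paper: it suggests an improvement of roughly 39% over the Gaussian model and roughly 20% over the histogram model, which may be how the "about 20%" and "up to 40%" figures quoted in the text arise.

```python
# Mean positioning errors of test 1 in metres, copied from the table.
mean_error = {
    "Deterministic": 1.75,
    "Gaussian": 2.23,
    "Histogram": 1.69,
    "DGD": 1.73,
    "IDGD": 1.36,
}


def improvement(baseline, proposed):
    """Relative error reduction of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


for model, err in mean_error.items():
    if model != "IDGD":
        gain = improvement(err, mean_error["IDGD"])
        print(f"IDGD vs {model}: {gain:.1f}% lower mean error")
```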