

Sensors 2013, 13(8), 11085-11096;

An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning
College of Information Science and Technology, East China Normal University, Dongchuang Road 500, Shanghai 200241, China
College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Yingbin Road 688, Jinhua 321004, China
School of Surveying and Geospatial Engineering, University of New South Wales, Sydney 2052, Australia
Received: 9 June 2013; in revised form: 30 July 2013 / Accepted: 16 August 2013 / Published: 21 August 2013


Abstract: A major problem in Wi-Fi fingerprint-based positioning is the creation and maintenance of the signal strength fingerprint database. The significant temporal variation of the received signal strength (RSS) is the main source of positioning error. A probabilistic approach can be used, but it requires the RSS distribution. The Gaussian distribution or an empirically derived distribution (histogram) is typically used. However, the former is not always correct, and the latter requires a large amount of data at each reference point. Double peaks in the RSS distribution have been observed experimentally at some reference points. In this paper, a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide whether this new distribution or the normal Gaussian distribution should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy and reduce the workload of the off-line data training phase.
Keywords: double-peak Gaussian distribution; kurtosis testing; location fingerprinting; indoor positioning

1. Introduction

One of the key issues for location based services (LBS) is the positioning technology, and indoor positioning usually requires higher accuracy than outdoor positioning [1,2]. For outdoor environments, a Global Navigation Satellite System (GNSS) such as the Global Positioning System (GPS) is ideal. However, GPS is not suitable for indoor environments because the satellite signals cannot penetrate the walls or roofs of buildings [3,4]. Furthermore, even when assisted-GPS techniques are used, the position may have errors of tens of metres [5].

Wi-Fi has been widely used for indoor positioning. Wi-Fi provides low-cost local wireless access to a fixed network; it is widely deployed, and its indoor coverage is still increasing rapidly. Using existing infrastructure for positioning is therefore a very attractive option. However, the only measurements available for positioning are the Wi-Fi signal strengths (SS) from the access points (AP), as measured by the user's receiver device.

One obvious approach is to convert the SSs to distance measurements. If the distances between the user receiver and three different APs can be obtained, trilateration can be used to estimate the receiver's position. However, creating an accurate model to convert SS to distance is difficult, because the propagation of radio signals in indoor environments is very complicated. The SS received from an AP varies significantly (by up to 15 dB) over time at the same location. In addition, indoor environments differ considerably from one another, so a model that works well in one environment may perform poorly in others. Hence it is difficult, if not impossible, to obtain accurate distance measurements from SSs consistently [6]. By contrast, the so-called “fingerprinting” approach has demonstrated promising Wi-Fi positioning performance [7,8]. The fingerprinting approach has many advantages [9,10], including the fact that no special hardware is required on the user mobile station (MS) side [11,12].
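As a minimal sketch of the trilateration step mentioned above, the Python function below assumes that the three AP-to-receiver distances are already known exactly; converting SS to distance, which the paragraph identifies as the hard part, is not attempted. The function name `trilaterate` and the AP coordinates are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate a 2-D position from three anchor positions and three ranges.

    Subtracting the first circle equation (x - xi)^2 + (y - yi)^2 = di^2 from
    the other two cancels the quadratic terms, leaving a 2x2 linear system
    that is solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Row for anchor 2: a1*x + b1*y = c1
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    # Row for anchor 3: a2*x + b2*y = c2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors must not be collinear")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


# Hypothetical AP positions (metres) and exact ranges to the point (3, 4)
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [5.0, math.sqrt(65.0), math.sqrt(45.0)]
print(trilaterate(aps, ranges))  # approximately (3.0, 4.0)
```

With noisy ranges from a real SS-to-distance model, the three circles generally do not meet at one point, which is one reason the fingerprinting approach is preferred indoors.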

However, this technique has a major shortcoming: the initial creation of the fingerprint database and its maintenance are not trivial tasks. An entry in the fingerprint database consists of a location identifier paired with the Wi-Fi received SS (RSS) at that location. The “fingerprints” may be the average RSSs (deterministic approach) [7] or the RSS distributions (probabilistic approach) [8]. The more accurate the fingerprint database (also referred to as the “radio map”), the better the positioning accuracy that can be achieved.

The probabilistic approach can provide better accuracy for Wi-Fi positioning; however, understanding the statistical characteristics of the RSSs is the key [12]. In [13], a lognormal distribution was used to model the RSSs. A shape-filtered empirical distribution was used to estimate the RSS distribution in [14]. Kaemarungsi and Krishnamurthy investigated the use of a Gaussian distribution to approximate the real RSS distribution [11]. The Weibull function for approximating the Bluetooth RSS distribution was discussed in [15]. Although the normal distribution is often used for Wi-Fi RSS [16,17], this study has found that it is not always appropriate.

This paper proposes a new algorithm based on an improved double-peak Gaussian distribution (IDGD) for Wi-Fi fingerprinting. The rest of the paper is organised as follows: Section 2 describes the probabilistic approach to fingerprinting and discusses the characteristics of RSS. Section 3 introduces the IDGD and describes how it is used to generate the fingerprint database. Section 4 reports the tests of the new algorithm and their analysis, and Section 5 gives the conclusions. Results show that the proposed algorithm can improve the positioning accuracy by about 20% and has the potential to significantly reduce the labour costs of the training phase.

2. Fingerprinting Technique

The fingerprinting technique is widely used in environments where line-of-sight signal propagation is not typical. Its main advantages are the low cost of the user hardware and its promising performance. Wi-Fi location fingerprinting consists of two phases: the off-line data training phase and the on-line positioning phase. The aim of the training phase is to build a fingerprint database. To generate the database in the conventional way, reference points (RPs) are selected in the area of interest. With an MS placed at an RP, the RSSs of all the APs are measured. The characteristic feature of that RP is determined from these measurements and recorded in the database. The process is repeated at the next RP until all RPs have been visited. In the positioning phase, the MS measures the RSS at the location whose position is required. The measurements (including the RSSs and the MAC addresses of the APs) are compared with the data in the database using a matching algorithm. Typically, a signal distance is computed; the smallest signal distance indicates the best match, from which the most likely location of the MS is determined [18,19]. Figure 1 illustrates the whole process.
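The signal-distance matching described above can be sketched as follows for the deterministic case. The sketch assumes that each fingerprint is the mean RSS (dBm) per AP, listed in a fixed AP order, and that every AP is heard at every RP; the function names, RP labels and RSS values are hypothetical.

```python
import math


def signal_distance(a, b):
    """Euclidean distance between two RSS vectors (dBm) in the same AP order."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_rp(database, measured):
    """Return the RP label whose stored fingerprint is closest to `measured`."""
    return min(database, key=lambda rp: signal_distance(database[rp], measured))


# Hypothetical radio map: RP label -> mean RSS from AP1, AP2, AP3
database = {
    "RP1": [-45.0, -70.0, -80.0],
    "RP2": [-60.0, -55.0, -75.0],
    "RP3": [-75.0, -65.0, -50.0],
}
print(nearest_rp(database, [-58.0, -57.0, -77.0]))
```

A practical radio map also has to handle APs that are missing from some scans, for example by substituting a floor value such as the receiver's sensitivity; that choice is not addressed here.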

In this study, the authors have adopted the probabilistic approach [20,21]. The location fingerprint is a vector P of the probabilistic RSS values from multiple APs at a particular location L. A typical vector P = (p1, p2, …, pN) consists of N RSS values from N APs. The database contains the RSS vectors for all RPs in the area of interest. For positioning, the MS obtains a sample RSS vector S = (s1, s2, …, sN). For each P in the database, the probability of observing S given P is computed, and the location is estimated to be the L for which this probability is highest. Note that the vector S is random. An error occurs when the highest probability is obtained for a location L other than the one at which S was collected. Errors occur because the measured RSS vector is a sample of a random vector, while only the probabilistic RSS vector is stored in the database.
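A minimal sketch of the probabilistic matching step follows. It assumes that the RSS of each AP at an RP is modelled as a Gaussian (mean, standard deviation) and that the APs are statistically independent, so the probability of S is the product of the per-AP densities (summed here as logarithms to avoid underflow). The independence assumption, the function names and all values are illustrative rather than taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Natural logarithm of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def most_probable_rp(database, sample):
    """Return the RP whose fingerprint gives sample S the highest probability.

    database: dict RP -> list of (mean, std) per AP, in a fixed AP order.
    sample: list of measured RSS values (dBm) in the same AP order.
    APs are assumed independent, so log-probabilities are summed.
    """
    def log_likelihood(rp):
        return sum(
            gaussian_log_pdf(s, mean, std)
            for s, (mean, std) in zip(sample, database[rp])
        )

    return max(database, key=log_likelihood)


# Hypothetical fingerprints: RP -> [(mean, std) for AP1, (mean, std) for AP2]
database = {
    "RP1": [(-45.0, 3.0), (-70.0, 4.0)],
    "RP2": [(-60.0, 3.0), (-55.0, 4.0)],
}
print(most_probable_rp(database, [-59.0, -57.0]))
```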

2.1. Fingerprint Database

The RSS probability distributions of all APs at all RPs need to be stored. The fingerprint of the i-th RP can be defined as:

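$$R_i = \begin{bmatrix} P_{A_1}(T_1) & P_{A_2}(T_1) & \cdots & P_{A_N}(T_1) \\ P_{A_1}(T_2) & P_{A_2}(T_2) & \cdots & P_{A_N}(T_2) \\ \vdots & \vdots & \ddots & \vdots \\ P_{A_1}(T_M) & P_{A_2}(T_M) & \cdots & P_{A_N}(T_M) \end{bmatrix} \tag{1}$$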
where $A_n$ ($n = 1, \ldots, N$) denotes the n-th AP and $T_m$ ($m = 1, \ldots, M$) denotes the m-th RSS measurement value. P is expressed as:
$$P_{A_n}(T_m) = \frac{C_{T_m}}{N_i} \tag{2}$$
where $N_i$ is the total number of training samples collected at the i-th RP and $C_{T_m}$ is the number of times $T_m$ appears in the training data at the i-th RP. The whole fingerprint database is expressed as:
$$D = [R_1, R_2, \ldots, R_w] \tag{3}$$
where w is the total number of RPs in the area of coverage.

To speed up the computations, the signal strength range is typically divided into P bins. The fingerprint of the i-th RP can then also be expressed as:

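$$R_i = \begin{bmatrix} P_{A_1}(B_1) & P_{A_2}(B_1) & \cdots & P_{A_N}(B_1) \\ P_{A_1}(B_2) & P_{A_2}(B_2) & \cdots & P_{A_N}(B_2) \\ \vdots & \vdots & \ddots & \vdots \\ P_{A_1}(B_P) & P_{A_2}(B_P) & \cdots & P_{A_N}(B_P) \end{bmatrix} \tag{4}$$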

Correspondingly, the probability of an RSS measurement from AP $A_n$ falling within bin $B_k$ at the i-th RP can be expressed as:

$$P_{A_n}(B_k) = \frac{C}{N_i} \tag{5}$$
where C is the number of samples whose signal strength falls within $B_k$ [15].
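The binned probability P_{A_n}(B_k) = C / N_i can be computed from raw training scans as in the sketch below. The bin width of 2 dB, the lower bound of -100 dBm, the scan format (one dictionary mapping AP to RSS per scan) and the function name are assumptions for illustration. Because N_i counts all scans at the RP, an AP missing from some scans ends up with probabilities that sum to less than 1, which follows the formula literally.

```python
from collections import Counter


def histogram_fingerprint(scans, bin_width=2, rss_min=-100):
    """Return {AP: {k: P_An(B_k)}} for one RP, following P = C / N_i.

    scans: list of dicts mapping AP identifier -> RSS in dBm.
    Bin k covers [rss_min + k * bin_width, rss_min + (k + 1) * bin_width).
    """
    n_i = len(scans)  # N_i: number of training samples at this RP
    counts = {}
    for scan in scans:
        for ap, rss in scan.items():
            k = int((rss - rss_min) // bin_width)
            counts.setdefault(ap, Counter())[k] += 1  # C for this (AP, bin)
    return {
        ap: {k: c / n_i for k, c in sorted(per_bin.items())}
        for ap, per_bin in counts.items()
    }


# Four hypothetical scans at one RP
scans = [
    {"AP1": -50, "AP2": -70},
    {"AP1": -51, "AP2": -71},
    {"AP1": -55, "AP2": -70},
    {"AP1": -50},  # AP2 not heard in this scan
]
print(histogram_fingerprint(scans))
```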

2.2. The Characteristics of RSSs

To investigate the characteristics of RSSs, four tests were carried out in four different environments: a residential room, an office, a classroom and a shopping centre. More than 10,000 RSS samples were collected in each test, and a total of 424 APs were detected. All data were analysed, and the following characteristics of the Wi-Fi signals were determined (see Figure 2):


1. For more than 30% of the APs, the RSS distribution has two peaks and a long tail, as shown in Figure 2. The two peaks are quite obvious; however, this characteristic has not been mentioned in previous studies.

2. The Gaussian function does not approximate the RSS distribution very well. The Gaussian distribution in Figure 2 was fitted to the same data used to generate the occurrence plot, and the two distributions clearly differ.

The double-peak distribution of RSS is not accidental. The test results are listed in Table 1: the RSS distributions of 134 of the 424 APs show double peaks, about 32%. Further investigation found that the percentage of double-peak RSS distributions does not differ significantly between environments (ranging from 26% to 38%), which suggests that double-peak behaviour is not unusual in indoor environments.
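The percentages quoted above can be checked against the counts in Table 1 with a few lines of Python. The counts are the only inputs, and the helper name `double_peak_share` is hypothetical.

```python
# Counts taken from Table 1: (APs detected, APs with a double-peak distribution)
table1 = {
    "Residential room": (28, 9),
    "Office": (134, 35),
    "Classroom": (124, 38),
    "Shopping centre": (138, 52),
}


def double_peak_share(detected, double_peak):
    """Percentage of detected APs whose RSS distribution shows two peaks."""
    return 100.0 * double_peak / detected


for site, (detected, double_peak) in table1.items():
    print(f"{site}: {double_peak_share(detected, double_peak):.0f}%")

total_detected = sum(d for d, _ in table1.values())
total_double = sum(p for _, p in table1.values())
print(total_detected, total_double, f"{double_peak_share(total_detected, total_double):.1f}%")
```

The overall share comes to 31.6%, which the text rounds to about 32%.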

Generating the database is a prerequisite for location fingerprinting. Generally speaking, the more measurements obtained at each RP, the better the positioning performance. However, more measurements mean more effort in the RSS survey/training phase. In practice, only a few RSS samples are collected at each RP, and such limited samples cannot produce an accurate empirical RSS distribution.

3. Improved Double-Peak Gaussian Distribution (IDGD)

3.1. Fingerprint Database Based on Gaussian and Double-Peak Gaussian Distribution

The Gaussian function is a traditional model for fingerprint database generation [22]; its probability density function can be expressed as:

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-u)^2}{2\sigma^2}} \qquad (\sigma > 0) \tag{6}$$
where x is the variable of the function, u is the mean of x and σ is the standard deviation of x. Since the RSS at each point varies considerably, the probabilistic approach can in principle achieve more accurate results. To date, the probabilistic approach has usually been based on the assumption that the RSS values follow a Gaussian distribution. Unfortunately, as shown in the previous section, the RSS distribution is not always Gaussian.
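A minimal sketch of fitting the Gaussian model to the samples of one AP at one RP is given below. Using the population standard deviation, and integrating the density over an RSS bin with `math.erf` so that the continuous model can fill a binned fingerprint, are assumptions; the paper does not say how the continuous model is discretised. The function names are hypothetical.

```python
import math
import statistics


def fit_gaussian(rss):
    """Estimate u and sigma of the Gaussian model from RSS samples (dBm)."""
    u = statistics.fmean(rss)
    sigma = statistics.pstdev(rss)  # population standard deviation (an assumption)
    if sigma <= 0:
        raise ValueError("need at least two distinct RSS values")
    return u, sigma


def gaussian_pdf(x, u, sigma):
    """Gaussian probability density with mean u and standard deviation sigma."""
    return math.exp(-((x - u) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def gaussian_bin_prob(lo, hi, u, sigma):
    """Probability mass of the fitted Gaussian inside the RSS bin [lo, hi)."""
    scale = sigma * math.sqrt(2)
    return 0.5 * (math.erf((hi - u) / scale) - math.erf((lo - u) / scale))


u, sigma = fit_gaussian([-62, -61, -61, -60, -60, -60, -59, -59, -58])
print(u, round(sigma, 3), round(gaussian_bin_prob(-61, -59, u, sigma), 3))
```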

The double-peak Gaussian distribution (DGD) is proposed as a candidate to replace the Gaussian distribution when the latter is not suitable. The RSS samples of each AP are divided into two parts at the minimum between the two peaks, and each part is modelled by an independent Gaussian function. The weight of each function is assumed to be 1/2. The probability density is expressed as:

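$$F(x) = \frac{1}{2}\left( \frac{1}{\sqrt{2\pi}\,\sigma_1} \, e^{-\frac{(x-u_1)^2}{2\sigma_1^2}} + \frac{1}{\sqrt{2\pi}\,\sigma_2} \, e^{-\frac{(x-u_2)^2}{2\sigma_2^2}} \right) \qquad (\sigma_1, \sigma_2 > 0) \tag{7}$$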
where u1, σ1 and u2, σ2 are the means and standard deviations of the RSS in parts one and two, respectively. However, another problem was observed during the tests: the mean values u1 and u2 usually did not coincide with the peaks. These offsets introduce errors in the positioning phase.
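The DGD density can be sketched as below. The split point is passed in rather than searched for, samples equal to the split value are assigned to part one, and the population standard deviation is used; all of these are assumptions made for illustration, as are the function names.

```python
import math
import statistics


def gaussian_pdf(x, u, sigma):
    """Gaussian probability density with mean u and standard deviation sigma."""
    return math.exp(-((x - u) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def fit_dgd(rss, split):
    """Fit the DGD: split samples at `split` and fit a Gaussian to each part.

    `split` is the RSS value of the minimum between the two peaks, assumed
    to have been found beforehand. Samples equal to `split` go to part one.
    """
    parts = ([r for r in rss if r <= split], [r for r in rss if r > split])
    params = []
    for part in parts:
        if len(part) < 2 or statistics.pstdev(part) == 0:
            raise ValueError("each part needs at least two distinct RSS values")
        params.append((statistics.fmean(part), statistics.pstdev(part)))
    return params  # [(u1, sigma1), (u2, sigma2)]


def dgd_pdf(x, params):
    """Equal-weight (1/2 each) mixture of the two fitted Gaussians."""
    (u1, s1), (u2, s2) = params
    return 0.5 * (gaussian_pdf(x, u1, s1) + gaussian_pdf(x, u2, s2))


rss = [-70, -69, -69, -68, -60, -59, -59, -58]
params = fit_dgd(rss, split=-64)
print(params, dgd_pdf(-69, params))
```

Because each part's mean is used as its centre, any skew within a part shifts the centre away from the peak, which is the offset problem described above.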

3.2. IDGD Fingerprint Database Model

To solve this problem, an improved DGD was developed: u1 and u2 are changed from the means of the two parts to the RSS values at peak 1 and peak 2, while the standard deviations remain the σ1 and σ2 used in the DGD. Figure 3 shows the empirical distribution, the DGD and the IDGD.
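The change from the DGD to the IDGD amounts to replacing each part's mean with the RSS value at its peak. The sketch below shows this on hypothetical samples whose part means (-71.4 and -58.6 dBm) are offset from the peaks (-70 and -60 dBm). Splitting at the minimum between the peaks (as in the DGD) and taking each peak as the most frequent rounded RSS value are assumptions, and `fit_idgd` is a hypothetical name.

```python
from collections import Counter
import statistics


def fit_idgd(rss, split):
    """Return [(u1, sigma1), (u2, sigma2)] for the two-peak branch of the IDGD.

    As described in Section 3.2, each centre is moved from the part's mean to
    its peak (here: the most frequent integer RSS value), while sigma is kept
    equal to the DGD standard deviation of that part.
    """
    params = []
    for part in ([r for r in rss if r <= split], [r for r in rss if r > split]):
        if len(part) < 2:
            raise ValueError("each part needs at least two samples")
        # most_common(1) breaks ties by first occurrence in the data
        peak = Counter(round(r) for r in part).most_common(1)[0][0]
        params.append((peak, statistics.pstdev(part)))
    return params


rss = [-75, -72, -70, -70, -70, -60, -60, -60, -58, -55]
print("means:", statistics.fmean(rss[:5]), statistics.fmean(rss[5:]))
print("IDGD centres:", [u for u, _ in fit_idgd(rss, split=-65)])
```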

It can be seen that the probability distribution obtained with the IDGD fits better than that obtained with the DGD. As already mentioned, the RSS distribution is not always Gaussian, but neither is it always double-peak Gaussian. Hence, a new model (IDGD), comprising a joint model of the Gaussian distribution and the DGD, is proposed. The function is defined as:

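$$F(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-u)^2}{2\sigma^2}}, & \text{one peak} \\[2ex] \dfrac{1}{2}\left( \dfrac{1}{\sqrt{2\pi}\,\sigma_1} \, e^{-\frac{(x-u_1)^2}{2\sigma_1^2}} + \dfrac{1}{\sqrt{2\pi}\,\sigma_2} \, e^{-\frac{(x-u_2)^2}{2\sigma_2^2}} \right), & \text{two peaks} \end{cases} \qquad (\sigma, \sigma_1, \sigma_2 > 0) \tag{8}$$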

In Equation (8), the Gaussian model is adopted when the RSS distribution has one peak, and the DGD is utilised when it has two peaks. The values at peak 1 and peak 2 are denoted as Max1 and Max2, respectively. The values between Max1 and Max2 are searched (see Figure 2), and the minimum value found is denoted as Min. Testing all the collected data showed that if 2Min ≥ Max1 + Max2, the RSS distribution has one peak or two indistinct peaks; otherwise, it has two distinct peaks. The reason for this empirical threshold will be investigated in future research. The decision rule is thus expressed as:

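$$\begin{cases} \text{if } \mathrm{Min} \ge \dfrac{\mathrm{Max}_1 + \mathrm{Max}_2}{2}, & \text{use the Gaussian model} \\[2ex] \text{if } \mathrm{Min} < \dfrac{\mathrm{Max}_1 + \mathrm{Max}_2}{2}, & \text{use the IDGD model} \end{cases} \tag{9}$$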
where u and σ are the mean and standard deviation of the RSS measurements for the Gaussian model. If the IDGD model is used, the RSS measurements are first divided into two parts by the mean; u1, σ1 and u2, σ2 are the improved means and the standard deviations of the two parts of the RSS measurements.
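The decision rule comparing 2Min with Max1 + Max2 can be sketched as follows. The paper describes the peak search only as a global search, so the details here are assumptions: the occurrence histogram uses 1 dB bins, a peak is any non-empty bin at least as high as both neighbours, Max1 and Max2 are the counts of the two highest such peaks, Min is the lowest count between them (inclusive), and fewer than two peaks is treated as the one-peak case. The function names are hypothetical.

```python
from collections import Counter


def occurrence_histogram(rss):
    """Counts per integer dBm value from the weakest to the strongest RSS."""
    values = [round(r) for r in rss]
    counts = Counter(values)
    lo, hi = min(values), max(values)
    return [counts[v] for v in range(lo, hi + 1)]  # Counter gives 0 for unseen values


def choose_model(rss):
    """Apply the 2*Min versus Max1 + Max2 decision rule to one AP's RSS samples."""
    counts = occurrence_histogram(rss)
    padded = [0] + counts + [0]  # padded[i] and padded[i + 2] neighbour counts[i]
    peaks = [
        i for i, c in enumerate(counts)
        if c > 0 and c >= padded[i] and c >= padded[i + 2]
    ]
    if len(peaks) < 2:
        return "gaussian"  # only one peak found
    # Positions of the two highest peaks, in increasing RSS order
    p1, p2 = sorted(sorted(peaks, key=lambda i: counts[i], reverse=True)[:2])
    max1, max2 = counts[p1], counts[p2]
    min_between = min(counts[p1:p2 + 1])
    return "gaussian" if 2 * min_between >= max1 + max2 else "idgd"


print(choose_model([-62, -61, -61, -60, -60, -60, -59, -59, -58]))
print(choose_model([-70, -70, -70, -69, -66, -63, -62, -62, -62, -62]))
```

With small sample sizes the histogram can show spurious local maxima, so some smoothing before the peak search may be worth considering; that is not part of the rule as stated.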

3.3. The Positioning Procedure

The procedure for fingerprint-based positioning using the proposed joint model is as follows, where steps 1 to 4 are the off-line data training phase and steps 5 to 7 are the on-line positioning phase:


1. Choose the RPs, and then collect the RSSs from all APs at each RP.

2. Detect gross errors and filter them out.

3. Use a global search procedure to find the two peaks and the minimum value between them. Twice the minimum is compared with the sum of the two peak values to decide between the Gaussian model and the IDGD model. This decision rule was derived from all the collected data.

4. Create the fingerprint database.

5. The user collects RSSs and removes outliers; the probability distribution of the received RSSs is then calculated.

6. Use the fingerprint database to calculate the joint probability density for the RSSs collected in step 5.

7. Estimate the user's location using the K weighted nearest neighbour (KWNN) algorithm. KWNN is a conventional algorithm for fingerprint-based Wi-Fi positioning: the K (K ≥ 2) nearest neighbours (those with the shortest signal distance) of a test vector are chosen, and the weighted average of the coordinates of these K points is used as the estimate of the user's location, with the inverse of the signal distance as the weight [23].

Figure 4 illustrates the details of the procedure using the proposed joint Gaussian and IDGD model for positioning.

4. Test and Analysis

To verify the proposed approach, a study was carried out in a small test area: a typical office room of 45 square metres. Nine RPs and five test points (TPs) were selected. A Lenovo X220 Tablet equipped with an Intel Centrino Advanced-N 6250 wireless network card was used to make the RSS measurements, and the inSSIDer software was used to collect the Wi-Fi signal strengths.

Data were collected during a working day between 8 and 9 a.m. Up to 100 RSS samples (over about 2 min) were collected at each RP. Four models (Gaussian, histogram, DGD and IDGD) were then used to generate the fingerprint database. About 10–20 RSS samples were collected at each TP soon after the training data were collected. The conventional KWNN (K = 3) was used to estimate the positions of the TPs.

In this paper, the weight was calculated as the inverse of the signal distance, using the Euclidean distance. Table 2 lists the positioning error at each TP for the different models. The Gaussian model performs worst (average error 2.23 m), the DGD and histogram models give similar results (1.69 m and 1.73 m, respectively), and the IDGD gives the lowest positioning error. At almost every individual TP the IDGD performs best, and its overall performance is improved by about 40%, 20% and 21% compared with the Gaussian, histogram and DGD models, respectively. A deterministic approach using the average RSS was also tried on the same data; its results were no better (see the first row of Table 2).

This first test indicated that the proposed model works well. Figure 5 shows the test bed of the second test (approximately 400 square metres), consisting of a computer lab, corridors, a foyer, a kitchen and a toilet. In total there were 68 RPs (red crosses) and 35 randomly chosen TPs. A procedure similar to that of the first test was used, with about 40 RSS samples collected at each RP and 5–20 RSS measurements at each TP. All the data were collected on one working day. Figure 6 shows the results of the test: the horizontal axis is the TP number and the vertical axis is the positioning error. In general, the IDGD gave the most accurate positioning results.

Figure 7 shows the average positioning errors using the four models. The positioning accuracy using the IDGD is improved by about 42%, 33% and 24% compared to that of the histogram, Gaussian and DGD distributions, respectively. The small number of samples collected at each RP is the main reason that the histogram model performed the worst.

5. Concluding Remarks

The observation of double peaks in the Wi-Fi signal strength distribution motivated the investigation of a new model, the Double-peak Gaussian Distribution (DGD), to approximate the signal strength distribution. Further investigation showed that the DGD needed improvement, and the Improved DGD (IDGD) was proposed.

The IDGD takes into account the different types of RSS sample distributions (one peak in some circumstances, two peaks in others). When one peak is detected, a standard Gaussian distribution is used to create the fingerprint; when two peaks are detected, the DGD is used instead. Tests show that applying the new model to fingerprint-based positioning can significantly improve the positioning accuracy (by up to 40%). Furthermore, the model has the potential to reduce the labour costs of the data training phase, i.e., fewer RSS samples need to be collected during the training phase to achieve the same level of positioning accuracy.


Acknowledgments

This work was supported by the following projects: The Pre-Research project of the key technology research of “Container Intelligent Logistics Based on BeiDou Satellite” which was funded by the Science and Technology Commission of the Shanghai Municipality (12511501102); the project of “Research on Authentication Platform of Cloud Computing based on the Internet of Things” which was funded by National Natural Science Foundation of China (61272468); and the project of “High Gain Low Cost Miniaturization Multimode Substrate Integrated Satellite Navigation Antenna” which was funded by the Shanghai Municipal Commission of Economy and Information.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Ferraro, R.; Aktihanoglu, M.; Li, L., Translators; Location-Aware Applications; Posts & Telecom Press: Beijing, China, 2012.
  2. Market Survey Report: Location Based Services-Market and Technology Outlook 2013–2020; Market Info Group LLC (MIG), Inc.: Kawasaki, Japan, 2013.
  3. Li, B.; Zhang, J.; Dempster, A.G.; Rizos, C. Open Source GNSS Reference Server for Assisted-Global Navigation Satellite Systems. J. Navig. 2011, 64, 127–139.
  4. Li, B.; Salter, J.; Dempster, A.G.; Rizos, C. Indoor Positioning Techniques Based on Wireless LAN. Proceedings of First IEEE International Conference on Wireless Broadband and Ultra Wideband Communications, Sydney, Australia, 13–16 March 2006; pp. 1–6.
  5. Bing, B. Wireless Local Area Networks: The New Wireless Revolution; Wiley-Interscience: New York, NY, USA, 2002.
  6. Mok, E.; Retscher, G. Location Determination Using WiFi Fingerprinting Versus WiFi Trilateration. J. Locat. Based Serv. 2007, 1, 145–159.
  7. Bahl, P.; Padmanabhan, V.N. RADAR: An In-building RF-Based User Location and Tracking System. Proceedings of IEEE 9th Annual Joint Conference of the IEEE Computer and Communications Societies, Tel Aviv, Israel, 26–30 March 2000; pp. 775–784.
  8. Youssef, M.A.; Agrawala, A.; Shankar, A.U. WLAN Location Determination via Clustering and Probability Distributions. Proceedings of First IEEE International Conference on Pervasive Computing and Communications, Fort Worth, TX, USA, 23–26 March 2003; pp. 143–150.
  9. Kohoutek, T.K.; Mautz, R.; Wegner, J.D. Fusion of Building Information and Range Imaging for Autonomous Location Estimation in Indoor Environments. Sensors 2013, 13, 2430–2446.
  10. Guerrero, L.A.; Vasquez, F.; Ochoa, S.F. An Indoor Navigation System for the Visually Impaired. Sensors 2012, 12, 8236–8258.
  11. Kaemarungsi, K.; Krishnamurthy, P. Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting. Proceedings of IEEE First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, Boston, MA, USA, 22–26 August 2004; pp. 14–23.
  12. Kaemarungsi, K. Distribution of WLAN Received Signal Strength Indication for Indoor Location Determination. Proceedings of 2006 1st International Symposium on Wireless Pervasive Computing, Phuket, Thailand, 16–18 January 2006.
  13. Youssef, M.A. HORUS: A WLAN-Based Indoor Location Determination System. Ph.D. Thesis, Department of Computer Science, University of Maryland, College Park, MD, USA, 2004.
  14. Xiang, Z.; Song, S.; Chen, J.; Wang, H.; Huang, J.; Gao, X. A Wireless LAN-based Indoor Positioning Technology. IBM J. Res. Dev. 2004, 48, 617–626.
  15. Pei, L.; Chen, R.; Liu, J.; Kuusniemi, H.; Tenhunen, T.; Chen, Y. Using Inquiry-Based Bluetooth RSSI Probability Distributions for Indoor Positioning. J. Glob. Position. Syst. 2010, 9, 122–130.
  16. Small, J.; Smailagic, A.; Siewiorek, D.P. Determining User Location for Context Aware Computing Through The Use of A Wireless LAN Infrastructure, December 2000. Available online: (accessed on 20 September 2011).
  17. Ladd, A.M.; Bekris, K.E.; Rudys, A.; Marceau, G.; Kavraki, L.E.; Dan, S. Robotics-based Location Sensing Using Wireless Ethernet. Proceedings of the 8th Annual International Conference on Mobile Computing and Networking, New York, NY, USA, 23–28 September 2002; pp. 227–238.
  18. Wang, Y.; Jia, X.; Lee, H.K.; Li, G.Y. An Indoor Wireless Positioning System Based on WLAN Infrastructure. Proceedings of the 6th International Symposium on Satellite Navigation Technology Including Mobile Positioning & Location Services, Melbourne, Australia, 22–25 July 2003; pp. 1–13.
  19. Roos, T.; Myllymaki, P.; Tirri, H.; Misikangas, P.; Sievanen, J. A Probabilistic Approach to WLAN User Location Estimation. Int. J. Wirel. Inf. Netw. 2002, 9, 155–164.
  20. Li, B.; Dempster, A.G.; Barnes, J.; Rizos, C.; Li, D. Probabilistic Algorithm to Support The Fingerprinting Method for CDMA Location. Proceedings of International Symposium on GPS/GNSS, Hong Kong, 8–10 December 2005.
  21. Li, B.; Wang, Y.; Lee, H.K.; Dempster, A.G.; Rizos, C. Method for Yielding a Database of Location Fingerprints in WLAN. IEE Proc. Commun. 2005, 152, 580–586.
  22. Hashemi, H. The Indoor Radio Propagation Channel. Proc. IEEE 1993, 81, 943–968.
  23. Li, B. Terrestrial Mobile User Positioning Using TDOA and Fingerprinting Techniques. Ph.D. Thesis, School of Surveying & Spatial Information Systems, University of New South Wales, Sydney, Australia, 2006.
Figure 1. Location fingerprinting technique.
Figure 2. Distribution characteristics of Wi-Fi signals in indoor test environments: (a) Gaussian; (b) Double-peak.
Figure 3. Comparison of DGD and IDGD with the empirical distribution of RSS.
Figure 4. The positioning procedure using the proposed model.
Figure 5. The test bed of the second test.
Figure 6. Errors at test points in test 2.
Figure 7. The average error using different models in test 2.
Table 1. Double-peak distribution of RSSs in tests
Test site          Area        APs detected   APs showing double-peak distribution   Percentage
Residential room   10 m²       28             9                                      32%
Office             45 m²       134            35                                     26%
Classroom          200 m²      124            38                                     31%
Shopping centre    1,000 m²    138            52                                     38%
Table 2. Positioning errors of test 1 (unit: metres).
Sensors EISSN 1424-8220 Published by MDPI AG, Basel, Switzerland