An Improved WiFi Positioning Method Based on Fingerprint Clustering and Signal Weighted Euclidean Distance

WiFi fingerprint positioning has been widely used in the indoor positioning field. The weighted K-nearest neighbor (WKNN) algorithm is one of the most widely used deterministic algorithms. The traditional WKNN algorithm uses the Euclidean distance or Manhattan distance between the received signal strengths (RSS) as the distance measure to judge the physical distance between points. However, the relationship between the RSS and the physical distance is nonlinear, so using the traditional Euclidean distance or Manhattan distance to measure the physical distance leads to positioning errors. In addition, the traditional RSS-based clustering algorithm takes only the signal distance between the RSS as the clustering criterion, without considering the position distribution of reference points (RPs). Therefore, to improve the positioning accuracy, we propose an improved WiFi positioning method based on fingerprint clustering and signal weighted Euclidean distance (SWED). The proposed algorithm is tested by experiments conducted in two experimental fields. The results indicate that, compared with the traditional methods, the proposed position label-assisted (PL-assisted) clustering can reflect the position distribution of RPs, and the proposed SWED-based WKNN (SWED-WKNN) algorithm can significantly improve the positioning accuracy.


Introduction
In indoor environments, global navigation satellite systems (GNSS) are affected by unfavorable factors, such as signal blocking and multipath propagation, which prevent them from achieving satisfactory positioning performance [1]. According to a report of the US Environmental Protection Agency, people spend nearly 70-90% of their time indoors [2]. Therefore, it is important to establish an accurate, reliable, and real-time indoor positioning system to satisfy the public demand for indoor positioning. With the rapid development of intelligent terminals, smartphones carry more sensors and have become excellent tools for indoor positioning. Additionally, WiFi signal receiving modules are widely embedded in smartphones, and hotspots cover many public places, such as office buildings, airports, and shopping malls. Therefore, WiFi fingerprint positioning has become one of the most popular indoor positioning schemes and has been widely used in recent years [3]. WiFi fingerprint positioning can be divided into an offline stage and an online stage. In the offline stage, the user collects the received signal strength (RSS) from the wireless access points (APs) at each reference point (RP) whose position is known, and the coordinates and RSS are stored in a fingerprint database. In the online stage, RSS should be a logarithm function of the physical distance [5]. Therefore, in the process of fingerprint matching and the weighted averaging of RP positions, using the traditional Euclidean distance or Manhattan distance will cause positioning errors.
It is also pointed out that the fingerprint collection process is very time-consuming and laborious, which is a major disadvantage of the fingerprint method. Numerous researchers have aimed at the automatic construction of indoor fingerprints in a real scenario application, such as simultaneous localization and mapping (SLAM) [26,27], data interpolation [28,29], and crowdsourcing technology [30,31]. Nevertheless, the automatic construction of a fingerprint database may lead to the inaccuracy of fingerprint data, which is not the focus of this paper. To better demonstrate that our proposed method can achieve more appropriate fingerprint clustering and improve the positioning accuracy, our fingerprint collection work is still done manually.
To address the existing problems of WiFi fingerprint positioning, this paper proposes an improved WiFi positioning method based on fingerprint clustering and signal weighted Euclidean distance (SWED). The remainder of this paper is organized as follows: In Section 2, an overview of the proposed method is presented, and the proposed clustering and positioning algorithms are described in detail. In Section 3, the clustering and positioning experiments are performed, and the performance of the proposed method is evaluated. Finally, the conclusions are discussed in Section 4.

Overview of the Proposed WiFi Fingerprint Positioning Method
The overall architecture of the proposed method is shown in Figure 1. In the offline stage, the original RSS observations are preprocessed to reduce the influence of fading and shadowing on the WiFi signal and to make the RSS smoother. Because the traditional RSS-based clustering cannot reflect the position distribution of RPs well, we propose the position label-assisted (PL-assisted) clustering algorithm. We first implement the coordinate-based clustering using the k-means algorithm, and the clustering results are taken as the position labels of the RPs. Then, using the position labels as auxiliary information, all RPs are clustered by the Learning Vector Quantization (LVQ) algorithm. This process ensures that the clustering result is consistent with the position distribution of RPs and the position relationship between RPs.

We theoretically analyze the distribution characteristic of the WiFi signal and the positioning error of the traditional Euclidean distance-based WKNN (Euclidean-WKNN). To make the signal distance reflect the physical distance more accurately, we propose the SWED-based WKNN (SWED-WKNN) algorithm, which considers the nonlinear relationship between RSS and physical distance. Based on the overall size of each pair of RSS measurements, we assign a weight to each differential RSS in the calculation of the signal distance. In addition, to make the proposed SWED-WKNN algorithm more accurate and reasonable, we analyze the RSS distribution in an actual environment. We define the line-of-sight AP (LOS-AP) based on the region of the cluster to which the test point (TP) belongs, and only the LOS-APs are used in the SWED calculation. The experimental results indicate that our proposed method can greatly improve the positioning accuracy.

Received Signal Strength Preprocessing and Fingerprint Database
In an indoor environment, WiFi signals are significantly affected by fading and shadowing [32]. The fading is mainly caused by the multipath propagation of signals reflected by walls, rooms, and floors, which can cause strong fluctuations in the RSS value. Shadowing, such as the presence of a pedestrian between transmitter and receiver, can considerably reduce the RSS value. A strong RSS is mainly affected by fading, while a weak RSS may be affected by fading, shadowing, or more factors [33]. Therefore, to reduce these influences, the RSS preprocessing method proposed in [33] is adopted in this paper. The method collects the original RSS observations within a certain period at a point, abandons the weakest RSS observations, and takes the average value of the strongest RSS observations as the RSS measurement. The preprocessed RSS is calculated by:

$$\mathrm{RSS}_i^j = \frac{1}{\mathit{num.max}} \sum_{k=1}^{\mathit{num.max}} \mathrm{orig.RSS}_{i,k}^j \qquad (1)$$

where $\mathrm{RSS}_i^j$ represents the preprocessed RSS measurement of the jth AP at the ith RP, and $\mathrm{orig.RSS}_{i,k}^j$ represents the kth strongest original RSS observation. $\mathit{num.max}$ is the number of the selected maximum RSS values and is set to 30 in this paper. In this way, the preprocessed RSS values are smoother, which is helpful for the following fingerprint clustering and positioning.
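The preprocessing step of Equation (1) can be sketched in a few lines. This is only an illustration of averaging the strongest observations; the function name `preprocess_rss` and the sample values are hypothetical and do not come from the paper, and the handling of points with fewer than `num_max` observations (average whatever is available) is an assumption.

```python
import numpy as np

def preprocess_rss(observations, num_max=30):
    """Average the num_max strongest RSS observations (in dBm) of one AP at one point.

    Assumption: if fewer than num_max observations exist, all of them are averaged.
    """
    obs = np.asarray(observations, dtype=float)
    if obs.size == 0:
        raise ValueError("no RSS observations for this AP")
    # Values closer to 0 dBm are stronger, so sort in descending order.
    strongest = np.sort(obs)[::-1][:num_max]
    return float(strongest.mean())

# Hypothetical raw observations of one AP at one RP; the weak outliers are discarded.
samples = [-60, -62, -75, -58, -90, -61]
smoothed = preprocess_rss(samples, num_max=3)
```

With `num_max=30` as in the paper, each fingerprint entry would be the mean of the 30 strongest observations collected at that point.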
We divide the experimental area into grids and select the grid centers as the RPs. The format of the fingerprint database is shown in Table 1. The first column represents the ordinal number of the RPs; N and M are the numbers of RPs and APs, respectively. $x_i$ and $y_i$ are the coordinates of the ith RP, measured in the established two-dimensional coordinate system. As denoted in Equation (1), $\mathrm{RSS}_i^j$ represents the preprocessed offline RSS measurement of the jth AP at the ith RP.

Position Label-Assisted Clustering Algorithm
Fingerprint clustering gathers the RPs with high similarity and separates those with low similarity based on certain similarity criteria. For the existing fingerprint clustering algorithms, the fingerprint similarity criteria can be divided into position distance and signal distance, among which the Euclidean distance is widely used:

$$d_{pos}(RP_i, RP_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \qquad (2)$$

$$d_{sig}(RP_i, RP_j) = \sqrt{\sum_{k=1}^{M} \left(\mathrm{RSS}_i^k - \mathrm{RSS}_j^k\right)^2} \qquad (3)$$

where $d_{pos}(RP_i, RP_j)$ and $d_{sig}(RP_i, RP_j)$ represent the position and signal Euclidean distances between the ith RP and the jth RP, respectively.

The issues of signal ambiguity and position ambiguity exist in WiFi fingerprint positioning [34]. Signal ambiguity means that two points are close to each other while their signal distance is large. It is mainly caused by the fluctuation of RSS and can be reduced to some extent by averaging multi-epoch RSS signals [35]. Position ambiguity is more difficult to solve. It means that two points are far from each other while their signal distance is small. It may cause a cluster to cover many RPs that are far apart but have a small signal distance, and it also causes inaccurate cluster matching in the online stage. Therefore, to solve the position ambiguity issue, we consider both the position distribution and the signal distance of RPs and propose the PL-assisted clustering using the LVQ algorithm. Different from the traditional clustering methods, LVQ assumes that the samples have labels and that the labels can assist clustering [36].
To obtain the position labels, the coordinate-based clustering is first implemented based on the position distance between RPs, as indicated in Equation (2). Since the coordinate-based clustering is relatively simple, the common k-means algorithm is adopted. We then take the category that each RP belongs to in the coordinate-based clustering as its position label. The number of coordinate-based clusters is predefined as P, that is, the value of the position labels ranges from 1 to P. The position labels are denoted by:

$$L = \{L_1, L_2, \ldots, L_N\}$$

where $L_i$ is the position label of the ith RP and its value ranges from 1 to P.

Our purpose is to find a set of prototype vectors that characterizes the clustering structure, where each prototype vector defines a cluster. The number of clusters is predefined as Q, which is also the number of prototype vectors. In the LVQ algorithm, the number of label types is not greater than the number of clusters, that is, P ≤ Q. First, we select Q offline RSS fingerprints as the Q initialized prototype vectors, so each prototype vector also has a label, which is the position label of the selected RP. To make the position labels of the prototype vectors cover all the position labels, at least one RSS fingerprint is selected from each coordinate-based cluster as an initialized prototype vector. The prototype vectors and their position labels are denoted by:

$$V = \{V_1, V_2, \ldots, V_Q\}, \qquad T = \{T_1, T_2, \ldots, T_Q\}$$

where $V_i$ is the ith initialized prototype vector, and $T_i$ is the position label of $V_i$, with a value ranging from 1 to P.

In each iteration, we randomly select an RP from the fingerprint database and calculate the signal Euclidean distances between the selected RP and the Q prototype vectors to find the prototype vector with the minimum distance. Then we determine whether the position label of the nearest prototype vector is the same as that of the selected RP. If the position labels are the same, the nearest prototype vector is considered to have the potential to be the cluster center of the selected RP, and it is updated to move closer to the selected RP; otherwise, the nearest prototype vector is considered potentially unsuitable as the cluster center of the selected RP, and it is updated to move farther away from the selected RP.

For instance, if the position label $T_j$ of the nearest prototype vector $V_j$ is the same as the position label $L_i$ of the selected $\mathrm{RSS}_i$, they both have similar RSS and belong to the same coordinate-based cluster. Therefore, $V_j$ is considered to have the potential to be the cluster center of $\mathrm{RSS}_i$ and is updated to be closer to $\mathrm{RSS}_i$, denoted as:

$$V_j' = V_j + \alpha \left(\mathrm{RSS}_i - V_j\right)$$

where α is the learning rate and ranges from 0 to 1. Considering the convergence rate and the clustering effect of the algorithm, we set the learning rate to 0.1 empirically. The Euclidean distance between the updated prototype vector $V_j'$ and $\mathrm{RSS}_i$ is:

$$\left\| V_j' - \mathrm{RSS}_i \right\|_2 = \left\| V_j + \alpha \left(\mathrm{RSS}_i - V_j\right) - \mathrm{RSS}_i \right\|_2 = (1 - \alpha) \left\| V_j - \mathrm{RSS}_i \right\|_2$$

We can see that the updated prototype vector $V_j'$ is closer to $\mathrm{RSS}_i$. Conversely, if $T_j$ differs from $L_i$, then although their signal distance is the smallest, their position distance may be large. Therefore, $V_j$ may not be suitable as the cluster center of $\mathrm{RSS}_i$, and it is updated to be farther away from $\mathrm{RSS}_i$, denoted as:

$$V_j' = V_j - \alpha \left(\mathrm{RSS}_i - V_j\right)$$

It should be noted that for a large P-value, the number of position label types also increases. This scatters the labels of the prototype vectors and increases the probability that the label of a prototype vector differs from the label of the selected RP. Correspondingly, prototype vectors with the minimum signal distance and an acceptable position distance from the selected RP are moved away from it because of the difference in position labels. This is adverse for the convergence and the clustering effect of the algorithm. Therefore, to obtain a satisfactory clustering result in practical applications, we need to evaluate the clustering under different P-values and select the optimal value. After updating a prototype vector, we select the next RP and repeat the updating process until the final prototype vectors are obtained.
Finally, each RP is classified into the cluster represented by its nearest prototype vector, so that all RPs are divided into the Q clusters. The algorithm stops when the maximum number of iterations is reached or when the update of the prototype vectors becomes very small. In the online cluster matching, we calculate the signal Euclidean distances between the TP and all prototype vectors. Similar to the offline stage, the cluster represented by the nearest prototype vector is the cluster to which the TP belongs, denoted by:

$$C_{TP} = \arg\min_{1 \le j \le Q} d_{sig}(TP, V_j) \qquad (11)$$

Compared with the traditional RSS-based clustering, the PL-assisted clustering algorithm effectively utilizes the position distribution of RPs. In addition, compared with the traditional coordinate-based clustering and the hybrid distance-based clustering in [13], the criteria of our offline clustering and online cluster matching are consistent, which reduces the misjudgment of online cluster matching. The pseudocode of Algorithm 1 is listed below.

Algorithm 1
Input: offline RSS fingerprints RSS_1, RSS_2, ..., RSS_N; position labels of RPs L = L_1, L_2, ..., L_N; initialized prototype vectors V = V_1, V_2, ..., V_Q; position labels of prototype vectors T = T_1, T_2, ..., T_Q; learning rate α ∈ (0, 1);
1: repeat
2: randomly select an offline RSS fingerprint RSS_i from the database;
3: calculate the Euclidean distance between the selected RSS_i and all prototype vectors;
4: find the prototype vector V_j closest to the selected RSS_i;
5: if L_i = T_j
6: V_j' = V_j + α (RSS_i − V_j);
7: else
8: V_j' = V_j − α (RSS_i − V_j);
9: end
10: the prototype vector V_j is updated as V_j';
11: return to line 2;
12: until the maximum number of iterations is reached or the update of the prototype vectors is very small;
Output: the final prototype vectors V_1, V_2, ..., V_Q;
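The LVQ update loop and the online cluster matching described in this section can be illustrated with a short sketch. It is not the authors' implementation: the function names `lvq_cluster` and `match_cluster`, the fixed random seed, and the choice to run a fixed number of iterations (instead of also monitoring the update size) are assumptions made for illustration. The position labels are taken as given, for example from a coordinate-based k-means run.

```python
import numpy as np

def lvq_cluster(rss, labels, proto_idx, alpha=0.1, max_iter=5000, seed=0):
    """PL-assisted LVQ sketch.

    rss       : (N, M) offline RSS fingerprints
    labels    : (N,) position labels L_i from coordinate-based clustering
    proto_idx : indices of the Q RPs used to initialise the prototype vectors
    Assumption: the loop simply runs max_iter iterations.
    """
    rng = np.random.default_rng(seed)
    rss = np.asarray(rss, dtype=float)
    labels = np.asarray(labels)
    protos = rss[proto_idx].copy()            # V_1 .. V_Q
    proto_labels = labels[proto_idx].copy()   # T_1 .. T_Q
    for _ in range(max_iter):
        i = rng.integers(len(rss))                          # randomly selected RP
        dists = np.linalg.norm(protos - rss[i], axis=1)     # signal Euclidean distances
        j = int(np.argmin(dists))                           # nearest prototype vector
        step = alpha * (rss[i] - protos[j])
        if labels[i] == proto_labels[j]:
            protos[j] += step   # same position label: move closer
        else:
            protos[j] -= step   # different position label: move away
    return protos, proto_labels

def match_cluster(protos, rss_vec):
    """Index of the prototype vector nearest to rss_vec in signal space."""
    diffs = protos - np.asarray(rss_vec, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

# Hypothetical toy data: two groups of RPs seen by two APs.
toy_rss = np.array([[-40, -80], [-42, -78], [-41, -79],
                    [-80, -40], [-78, -42], [-79, -41]], dtype=float)
toy_labels = np.array([0, 0, 0, 1, 1, 1])
toy_protos, toy_proto_labels = lvq_cluster(toy_rss, toy_labels, proto_idx=[0, 3], max_iter=200)
toy_assignment = [match_cluster(toy_protos, r) for r in toy_rss]
```

The same `match_cluster` call serves both the final offline assignment of RPs and the online matching of a TP, which mirrors the consistency between offline clustering and online matching noted above.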

Signal Weighted Euclidean Distance-Based Weighted K-Nearest Neighbor Algorithm
Before describing our proposed positioning method, we first theoretically analyze the distribution characteristic of WiFi signals and the positioning error of the traditional Euclidean-WKNN algorithm. Many WiFi signal attenuation models are summarized in previous work [37,38], such as the log-distance, multi-slope, COST231, and International Telecommunication Union (ITU) models, which have been widely used to construct and update fingerprint databases automatically. For convenience, the log-distance model is adopted for the theoretical analysis in this paper. As indicated in [25], the classical WiFi signal attenuation model is expressed by:

$$P_L(d_i) = P_L(d_0) - 10 \eta \log_{10}\left(\frac{d_i}{d_0}\right) + \chi_\sigma \qquad (12)$$

where $P_L(d_i)$ represents the RSS at the point which has a distance $d_i$ to the AP, $d_0$ is the reference distance and is usually set to 1 m, η is the path loss exponent, and $\chi_\sigma$ is a Gaussian random variable with standard deviation σ. Therefore, neglecting the noise term, the physical distance $d_i$ and the differential physical distance ∆d can be easily calculated by:

$$d_i = d_0 \cdot 10^{\frac{P_L(d_0) - P_L(d_i)}{10 \eta}} \qquad (13)$$

$$\Delta d = d_i - d_j = d_0 \left( 10^{\frac{P_L(d_0) - P_L(d_i)}{10 \eta}} - 10^{\frac{P_L(d_0) - P_L(d_j)}{10 \eta}} \right) \qquad (14)$$

Based on the simulation data obtained by Equations (13) and (14), we analyze the relationship between RSS and physical distance. According to the results reported in [39], the path loss exponent η is set to 2.76, the reference distance $d_0$ is 1 m, and $P_L(d_0)$ is −31.7 dBm. As shown in Table 2, for the same differential RSS value (∆RSS is equal to 1 dBm), the differential physical distances (∆d) under different RSS values are different. A pair of small RSS values is accompanied by a large differential physical distance, so the relationship between the differential RSS and the differential distance is nonlinear.

Table 2. Analysis of the relationship between differential RSS value and differential physical distance based on simulation data.

We continue to analyze the positioning error of the traditional Euclidean-WKNN based on simulation data. As shown in Table 3, to keep the analysis simple and intuitive, we only consider two APs, two RPs, and five TPs in a one-dimensional coordinate system, all on the same floor and in a straight line. For a pair of RP and TP, the differential RSS values from the two APs are different. Additionally, the ratio of the signal Euclidean distances between the TP and different RPs is not consistent with that of the physical distances. For instance, TP3 has the same physical distance (6 m) from RP1 and RP2, but its signal Euclidean distances from the two RPs are different (11 dBm for RP1 and 6.47 dBm for RP2). This is because, for a pair of RP and TP, the traditional Euclidean-WKNN only considers the size of the differential RSS but not the overall RSS size of the pair, which leads to errors in measuring the physical distance. Therefore, although the simulation data can be considered as positioning in an ideal environment without interference, multipath, and other factors, there are still positioning errors for the traditional Euclidean-WKNN.
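The nonlinearity discussed for Table 2 can be reproduced numerically from the noise-free log-distance model. The sketch below uses the parameter values quoted in the text (η = 2.76, d_0 = 1 m, P_L(d_0) = −31.7 dBm); the function names and the chosen RSS levels are illustrative assumptions, and the printed values are not claimed to match the entries of Table 2.

```python
ETA = 2.76      # path loss exponent, as quoted from [39] in the text
D0 = 1.0        # reference distance in metres
PL_D0 = -31.7   # RSS at the reference distance in dBm

def rss_to_distance(rss_dbm, eta=ETA, d0=D0, pl_d0=PL_D0):
    """Invert the log-distance model with the noise term ignored."""
    return d0 * 10 ** ((pl_d0 - rss_dbm) / (10 * eta))

def delta_distance(rss_dbm, delta_rss=1.0):
    """Extra distance implied when the RSS drops by delta_rss from rss_dbm."""
    return rss_to_distance(rss_dbm - delta_rss) - rss_to_distance(rss_dbm)

# The same 1 dB drop corresponds to a growing distance step as the RSS weakens.
for level in (-40, -50, -60, -70):
    print(level, round(rss_to_distance(level), 2), round(delta_distance(level), 2))
```

Because the distance is exponential in the RSS, the step for a fixed ∆RSS grows with distance, which is the effect the SWED weighting is designed to compensate for.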
The above theoretical analysis shows that the nonlinear relationship between the differential RSS and physical distance should be considered in WKNN algorithm. Therefore, we propose the signal weighted Euclidean distance based WKNN (SWED-based WKNN) algorithm to improve the positioning accuracy. As mentioned before, in the initial stage of positioning, we calculate the signal Euclidean distances between each TP and all prototype vectors using Equation (11), and the cluster represented by the nearest prototype vector is the cluster to which the TP belongs. Therefore, only the RPs within that cluster are searched for each TP.
We first use the average RSS value to measure the overall size of each pair of RSS for an RP and a TP, and then we calculate the difference between the average RSS value and $P_L(d_0)$, denoted by:

$$DAR_i^j = \mathrm{abs}\left( \mathrm{avg}\left(\mathrm{RSS}_i^j, \mathrm{RSS}^j\right) - P_L^j(d_0) \right)$$

where $\mathrm{RSS}_i^j$ is the offline RSS measurement of the jth AP at the ith RP, $\mathrm{RSS}^j$ is the online RSS measurement of the jth AP at the TP, and $P_L^j(d_0)$ is the RSS value of the jth AP at the reference distance. avg() and abs() are the average value function and the absolute value function, respectively.
As the analysis of Table 2 shows, given the same differential RSS value, a pair of small RSS values is accompanied by a large differential physical distance. A pair of small RSS values means a small $\mathrm{avg}(\mathrm{RSS}_i^j, \mathrm{RSS}^j)$ and a large $DAR_i^j$. Therefore, to balance the differential RSS and the physical distance, we assign a large weight $\omega_i^j$ to a large $DAR_i^j$. Then, we assume that the path loss exponents of all APs are the same and assign these weights to the differential RSS of different APs when calculating the SWED, where $SWED(RP_i, TP)$ is the SWED between the ith RP and the TP, $\omega_i^j$ represents the weight of the differential RSS of the jth AP, and m is the number of APs detected at both the RP and the TP. Since the number of common APs varies between RPs, the signal distance is averaged by m to ensure the fairness of the distance comparison. Finally, the K nearest RPs with the minimum SWEDs are selected, and the weights of the nearest RPs' coordinates are calculated by:

$$\lambda_i = \frac{1 / SWED(RP_i, TP)}{\sum_{k=1}^{K} 1 / SWED(RP_k, TP)}$$

The position is estimated by:

$$(x, y) = \sum_{i=1}^{K} \lambda_i \, (x_i, y_i)$$

where $\lambda_i$ is the weight of the ith RP, and $(x, y)$ and $(x_i, y_i)$ are the estimated position and the position of the ith RP, respectively. An RP with a large SWED is assigned a small weight, which reduces the contribution of the RPs far from the TP.

It should be noted that in practical positioning, fading, shadowing, and the dynamic environment cause significant RSS fluctuation [24]. Although we have preprocessed the original RSS observations, these effects cannot be eliminated completely. Therefore, we analyze the RSS distribution in the actual experimental environment. According to whether there is a wall or other obstacle between the AP and the receiver, we divide the APs into line-of-sight APs (LOS-APs) and non-line-of-sight APs (NLOS-APs). For convenience, for a given point, we consider the APs in the same corridor as the LOS-APs of this point, and the others as the NLOS-APs.
As shown in Figure 2a, we employ one LOS-AP and one NLOS-AP and select 21 points at different distances from the APs. The total span of the points is 20 m, the interval between adjacent points is 2 m, and the signal collection time is three minutes at each point. Using the RSS preprocessing method in Section 2.2, we obtain the processed RSS at different distances.
From Figure 2b we can see that, because of the reflection and refraction of the signals, the WiFi signals show strong fluctuation. In particular, the RSS distribution of the LOS-AP is more complex and unpredictable, and the signal curve of the LOS-AP is also not smooth and has many abrupt changes. Generally speaking, however, compared with the NLOS-AP, the RSS distribution of the LOS-AP is basically consistent with the change of distance; that is, the signal intensity becomes weaker as the distance increases, and the signal attenuates faster at distances closer to the AP. Therefore, it can be predicted that using different APs may have a significant impact on the performance of our proposed positioning method.
To confirm our speculation, we compare the positioning accuracy of our proposed algorithm with the LOS-APs and the NLOS-APs in experimental field 1, which is introduced in the experimental setup section; 120 TPs are selected. We use error vectors to show the positioning performance more intuitively. As shown in Figure 3, the arrows point from the actual positions to the corresponding estimated positions; black arrows indicate the positioning errors of the SWED-WKNN with only LOS-APs, and red arrows indicate the positioning errors of the SWED-WKNN with all detected APs. We can see that the proposed SWED-WKNN method has larger positioning errors when all the APs are used. This is because the signal propagation path from the NLOS-APs is more complex, so its RSS distribution may not be consistent with the signal attenuation model. Additionally, the RSS values from the LOS-APs and the NLOS-APs are mixed together. In the SWED calculation for a pair of RP and TP, the weights calculated from the RSS average values of different APs then become inaccurate, resulting in errors in the obtained SWED. To make our proposed SWED-WKNN method effective, in this paper, the positions of all APs are known and only the LOS-APs are used. Our solutions are as follows:

1. In the initial stage of positioning, we determine which cluster the TP belongs to using Equation (11). If the cluster is located at the corner of the experimental field, the traditional Euclidean-WKNN algorithm is used for positioning. This is because, for the RPs and TPs in the corner clusters, their LOS-APs may be in either of the two corridors and are difficult to determine.

2. If the cluster the TP belongs to is not located at the corner of the experimental field, we consider the APs in the same corridor where this cluster is located as the LOS-APs. Then, we only use the RSS from these LOS-APs in our proposed SWED-WKNN algorithm.

Therefore, it should be pointed out that our proposed SWED-WKNN algorithm is useful for fingerprint positioning with LOS-APs or when the RSS distribution is basically consistent with the WiFi signal attenuation model, and it is not suitable for positioning with the NLOS-APs or multi-floor APs.
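To make the positioning step of this section concrete, the following is a minimal sketch of a SWED-style WKNN restricted to a set of LOS-APs. It is written under explicit assumptions and should not be read as the paper's exact formulation: the weight is taken as the DAR value normalised over the common APs, and the SWED as the square root of the weighted mean of squared RSS differences; the exact weight and SWED expressions are not given in the text above. The names `swed`, `swed_wknn`, `ap_mask`, and `ref_rss`, the use of NaN for undetected APs, and the small `eps` guard are hypothetical. For a corner cluster, a plain Euclidean-WKNN would be called instead, which is not shown here.

```python
import numpy as np

def swed(rp_rss, tp_rss, ref_rss, ap_mask=None):
    """Signal weighted distance between one RP and the TP (assumed form).

    rp_rss, tp_rss : per-AP RSS in dBm, NaN where the AP was not detected
    ref_rss        : per-AP RSS at the reference distance, P_L(d_0)
    ap_mask        : optional boolean mask selecting the LOS-APs
    """
    rp = np.asarray(rp_rss, dtype=float)
    tp = np.asarray(tp_rss, dtype=float)
    ref = np.asarray(ref_rss, dtype=float)
    common = ~np.isnan(rp) & ~np.isnan(tp)        # APs detected at both points
    if ap_mask is not None:
        common &= np.asarray(ap_mask, dtype=bool)
    if not common.any():
        return float("inf")
    diff = rp[common] - tp[common]                          # differential RSS
    dar = np.abs((rp[common] + tp[common]) / 2.0 - ref[common])
    total = dar.sum()
    # Assumption: weights proportional to DAR and normalised to sum to one.
    weights = dar / total if total > 0 else np.full(dar.shape, 1.0 / dar.size)
    return float(np.sqrt(np.sum(weights * diff ** 2)))

def swed_wknn(rp_rss, rp_xy, tp_rss, ref_rss, k=3, ap_mask=None, eps=1e-9):
    """Estimate the TP position from the k RPs with the smallest SWED."""
    rp_xy = np.asarray(rp_xy, dtype=float)
    dists = np.array([swed(r, tp_rss, ref_rss, ap_mask) for r in rp_rss])
    nearest = np.argsort(dists)[:k]
    inv = 1.0 / (dists[nearest] + eps)     # larger SWED gives a smaller weight
    lam = inv / inv.sum()
    return lam @ rp_xy[nearest]

# Hypothetical cluster with three RPs on a line and two LOS-APs.
demo_rss = [[-40.0, -60.0], [-50.0, -50.0], [-60.0, -40.0]]
demo_xy = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
demo_ref = [-31.7, -31.7]
demo_estimate = swed_wknn(demo_rss, demo_xy, [-50.0, -50.0], demo_ref, k=1)
```

Passing only the RPs of the matched cluster as `rp_rss` and `rp_xy`, together with the corridor's APs as `ap_mask`, corresponds to the two-step procedure listed above.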

Experimental Setup
To demonstrate the applicability of our proposed method in different indoor environments, the proposed method is tested by experiments conducted in two experimental fields. As shown in Figure 4, the black points represent the RPs and the distance between adjacent RPs is 1.

Result of Clustering Experiment
In this section, to evaluate the performance of different clustering methods, the Davies-Bouldin Index (DBI) [40] is introduced. DBI is a clustering validity index that quantifies the clustering effect based on the clustering data, so that a judgment can be made according to the actual meaning of the data. To obtain the DBI value, the intra-cluster distance and the inter-cluster distance, which represent the dispersion degree of the RPs, need to be calculated first. The inter-cluster distances are calculated by:

$$inter.d_{sig}(C_i, C_j) = d_{sig}(\mu_{sig,i}, \mu_{sig,j})$$

$$inter.d_{pos}(C_i, C_j) = d_{pos}(\mu_{pos,i}, \mu_{pos,j})$$

where $intra.d_{sig}(C_i)$ and $intra.d_{pos}(C_i)$ represent the intra-cluster distances of the ith cluster in the signal domain and the position domain, respectively. $inter.d_{sig}(C_i, C_j)$ and $inter.d_{pos}(C_i, C_j)$ represent the inter-cluster distances between the ith cluster and the jth cluster in the signal domain and the position domain, respectively. $|C_i|$ is the number of RPs in the ith cluster, and $\mu_{sig,i}$ and $\mu_{pos,i}$ are the signal center and the position center of the ith cluster, respectively. Based on the intra-cluster and inter-cluster distances of the clusters, the signal-domain DBI, the position-domain DBI, and the hybrid DBI are calculated by:

$$DBI_{sig} = \frac{1}{Q} \sum_{i=1}^{Q} \max_{j \ne i} \left( \frac{intra.d_{sig}(C_i) + intra.d_{sig}(C_j)}{inter.d_{sig}(C_i, C_j)} \right) \qquad (24)$$

$$DBI_{pos} = \frac{1}{Q} \sum_{i=1}^{Q} \max_{j \ne i} \left( \frac{intra.d_{pos}(C_i) + intra.d_{pos}(C_j)}{inter.d_{pos}(C_i, C_j)} \right) \qquad (25)$$

$$DBI_{hyb} = \sqrt{DBI_{sig} \cdot DBI_{pos}} \qquad (26)$$

where Q is the number of clusters, and $DBI_{sig}$, $DBI_{pos}$, and $DBI_{hyb}$ are the signal-domain DBI, the position-domain DBI, and the hybrid DBI, respectively. As indicated in Equations (24) and (25), for each cluster, the maximum ratio of the intra-cluster distance to the inter-cluster distance is selected, and the DBI is obtained by averaging these maximum ratios. Generally, the maximum ratio comes from adjacent clusters, which matter most for evaluating clustering performance. As indicated in Equation (26), since the hybrid DBI is the square root of the product of the signal-domain DBI and the position-domain DBI, it reflects both the signal relationship and the position relationship of the RPs after the clustering.
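The DBI evaluation can be sketched as follows. Two points are assumptions rather than statements of the paper's exact definitions: the intra-cluster distance is taken as the mean distance of a cluster's RPs to the cluster center, and the hybrid value is taken as the square root of the product of the two domain values, which is consistent with the reported hybrid values lying between the signal-domain and position-domain values. The function names `dbi` and `hybrid_dbi` and the toy data are hypothetical.

```python
import numpy as np

def dbi(points, assignment):
    """Davies-Bouldin index of a clustering in one domain (signal or position).

    Assumptions: intra-cluster distance = mean distance to the cluster center;
    at least two clusters with distinct centers.
    """
    points = np.asarray(points, dtype=float)
    assignment = np.asarray(assignment)
    ids = np.unique(assignment)
    centers = np.array([points[assignment == c].mean(axis=0) for c in ids])
    intra = np.array([
        np.linalg.norm(points[assignment == c] - centers[n], axis=1).mean()
        for n, c in enumerate(ids)
    ])
    q = len(ids)
    worst = []
    for i in range(q):
        # Largest intra/inter ratio of cluster i against every other cluster.
        ratios = [(intra[i] + intra[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in range(q) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

def hybrid_dbi(rss, xy, assignment):
    """Combine the signal-domain and position-domain DBI (assumed: sqrt of product)."""
    return float(np.sqrt(dbi(rss, assignment) * dbi(xy, assignment)))

# Hypothetical toy clustering: two clusters of two RPs each.
demo_points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
demo_assignment = [0, 0, 1, 1]
demo_dbi = dbi(demo_points, demo_assignment)
```

In the toy example each cluster has an intra-cluster distance of 1 and the centers are 10 apart, so the index is (1 + 1) / 10 = 0.2; smaller values indicate tighter, better separated clusters.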
Clearly, a small DBI value means a good clustering performance, that is, the intra-cluster distances are small and the inter-cluster distances are large. Table 4 lists the three DBI values of the proposed PL-assisted clustering algorithm under different P-values for a given fixed Q-value. The learning rate is set to 0.1 and the maximum number of iterations is set to 5000. As mentioned in Section 2.3, the P-value is not greater than the Q-value. For the same Q-value, we can see that as the P-value approaches the Q-value, some position-domain DBI values decrease and some signal-domain DBI values increase, while other position-domain DBI (signal-domain DBI) values decrease (increase) first and then increase (decrease). Similarly, the variation of the hybrid DBI values is not completely consistent with the variation of the P-value. As explained in Section 2.3, with a large number of position label types, the position distribution of RPs is clearer. However, this also increases the probability that the label of a prototype vector differs from the label of the selected RP. Correspondingly, prototype vectors with the minimum signal distance and an acceptable position distance from the selected RP are moved away from it because of the difference in position labels. Therefore, we select the P-value with the minimum hybrid DBI value for the comparison with other clustering methods.

Similar to the proposed algorithm, k-means is also a prototype-based clustering algorithm, and it is widely used in WiFi fingerprint clustering. Therefore, we choose the k-means algorithm for comparison, with the signal Euclidean distance as its clustering criterion.
According to the number of RPs and the size of the experimental area, Q-values from 4 to 12 are considered in experimental field 1 and from 2 to 10 in experimental field 2. Figure 5 compares the position-domain DBI, the signal-domain DBI, and the hybrid DBI of the proposed clustering algorithm and the k-means algorithm under different Q-values. As shown in Figure 5a, for the proposed algorithm in experimental field 1, the minimum values of the position-domain DBI, the signal-domain DBI, and the hybrid DBI are 1.54, 1.83, and 1.77 when the Q-values equal 8, 5, and 8, respectively. The corresponding minimum values for k-means are 1.82, 1.78, and 1.93 when the Q-values equal 5, 4, and 7, respectively. As shown in Figure 5b, for the proposed algorithm in experimental field 2, the minimum values of the position-domain DBI, the signal-domain DBI, and the hybrid DBI are 1.59, 1.81, and 1.73 when the Q-values equal 6, 4, and 4, respectively. The corresponding minimum values for k-means are 1.72, 1.93, and 1.87 when the Q-values equal 3, 4, and 5, respectively. It can be concluded that the position-domain DBI values of the proposed algorithm are smaller than those of k-means, while its signal-domain DBI values are larger than those of k-means for some Q-values. We are more concerned with the position-domain DBI than with the signal-domain DBI, because the ultimate purpose of position estimation is to find the RPs closest to the position of the TP. The position labels are used as supervised information to assist the clustering, and they reflect the position distribution of the RPs. Thus, some prototype vectors with the nearest signal distance but inconsistent labels do not serve as the cluster center, which, to some extent, increases the signal-domain DBI value. Nevertheless, the hybrid DBI values of the proposed algorithm are still smaller than those of k-means. Considering both the position domain and the signal domain, this means that the proposed clustering algorithm has a smaller ratio of intra-cluster distance to inter-cluster distance, and its clustering performance is better than that of the k-means algorithm.
Figure 6 compares the clustering results of the proposed algorithm and the k-means algorithm in experimental fields 1 and 2, using for each algorithm the number of clusters with the minimum hybrid DBI value; different colors and shapes denote different clusters. The red boxes mark RPs in the boundary region of adjacent clusters, and the black boxes mark RPs in the middle region of a cluster. For both algorithms, some RPs are confused in the boundary region of adjacent clusters. This is because WiFi signals are influenced by a complex channel environment, which makes the RSS distribution in the boundary region insufficiently distinct and thus confuses the distribution of the RPs. However, as shown in the black boxes, the RPs clustered by the proposed algorithm are divided more clearly than those clustered by k-means, and there are fewer outliers in the middle region of the cluster. These outliers are RPs with similar RSS but distant positions.
Now we evaluate the effect of the different clustering algorithms on positioning accuracy. We select 120 points and 100 points as the TPs in experimental field 1 and experimental field 2, respectively. For convenience, we use the Euclidean-WKNN algorithm for position estimation in this experiment and compare the positioning accuracy in terms of the cumulative probability distribution of the positioning error. As shown in Figure 7, because fewer RPs whose positions are far apart end up in the same cluster, the positioning error with the proposed PL-assisted clustering algorithm is smaller. It can also be concluded that better clustering results produce better positioning performance.
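The cumulative probability distribution used for this comparison can be computed directly from the per-TP positioning errors. The following is a minimal sketch with hypothetical function names; the error values in the usage note are made up for illustration and are not measurements from the experiments.

```python
def empirical_cdf(errors):
    """Return (sorted errors, cumulative probabilities P(error <= x))."""
    xs = sorted(errors)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_within(errors, threshold):
    """Fraction of test points whose positioning error is <= threshold."""
    return sum(e <= threshold for e in errors) / len(errors)
```

For example, with illustrative errors of `[0.5, 1.0, 1.5, 2.0, 4.0]` metres, `fraction_within(errors, 2.0)` gives 0.8, that is, 80% of the test points are located within 2 m. Plotting the two lists returned by `empirical_cdf` for each clustering method yields curves of the kind compared in a cumulative error plot; a curve that lies further to the left indicates smaller errors.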

Result of Positioning Experiment
To evaluate the performance of the proposed positioning algorithm, we compare the positioning accuracy of three positioning algorithms that use different distance measures: the traditional Euclidean-WKNN and Manhattan-WKNN, and the proposed SWED-WKNN. From the clusters that are not located at the corners of the two experimental fields, we select 60 TPs in experimental field 1 and 50 TPs in experimental field 2. For all algorithms, only the LOS-APs are used for positioning, and the value of K is set to 4.
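To make the comparison concrete, the two traditional baselines can be sketched as a WKNN estimator with a pluggable distance function. This is an illustrative sketch only: the inverse-distance weighting is the common WKNN choice and is assumed here, the function names and the small `eps` guard are hypothetical, and the SWED measure itself is not reproduced because its weights are defined elsewhere in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two RSS vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two RSS vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def wknn(tp_rss, fingerprints, k=4, dist=euclidean, eps=1e-9):
    """Estimate a 2D position from the K nearest reference points.

    fingerprints: list of (rss_vector, (x, y)) pairs, one per RP.
    Assumption: each selected RP is weighted by the inverse of its
    signal distance; eps avoids division by zero on an exact match.
    """
    nearest = sorted(
        ((dist(tp_rss, rss), pos) for rss, pos in fingerprints),
        key=lambda item: item[0],
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y
```

Passing `dist=manhattan` switches the baseline without changing the rest of the estimator, which is the sense in which the three algorithms differ only in their distance measure. A different signal distance could be supplied in the same way as any function that takes two RSS vectors and returns a non-negative number.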
As shown in Figure 8, when the LOS-APs are used, the proposed SWED-WKNN algorithm achieves better positioning accuracy than the other two algorithms. Table 5 lists the positioning error statistics of the three algorithms. Compared with the Euclidean-WKNN and Manhattan-WKNN in experimental field 1, the mean error improvement of SWED-WKNN is 9.6% and 25.6%, respectively, and the RMSE improvement is 12.9% and 32.3%, respectively. In experimental field 2, the mean error improvement of SWED-WKNN is 19.1% and 24.7%, respectively, and the RMSE improvement is 22.4% and 28.3%, respectively. The positioning accuracy of the proposed method can satisfy the requirements of indoor positioning.
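The error statistics in this comparison follow from simple definitions, sketched below. The improvement is assumed here to be the relative reduction with respect to the baseline algorithm; the text does not state the formula explicitly, so this definition and the function names are assumptions.

```python
import math


def mean_error(errors):
    """Mean positioning error over all test points."""
    return sum(errors) / len(errors)


def rmse(errors):
    """Root mean square of the positioning errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def improvement_pct(baseline, proposed):
    """Relative reduction of an error statistic, in percent.

    Assumption: improvement = (baseline - proposed) / baseline * 100.
    """
    return 100.0 * (baseline - proposed) / baseline
```

As a worked example with made-up numbers, a baseline mean error of 2.0 m and a proposed mean error of 1.5 m give `improvement_pct(2.0, 1.5)`, which evaluates to 25.0. Because the RMSE squares each error before averaging, it penalizes large individual errors more heavily than the mean does, so the two improvement figures generally differ.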

To show the advantages of the proposed SWED-WKNN algorithm more intuitively, Table 6 compares the SWED-WKNN and the traditional Euclidean-WKNN in terms of the selected nearest RPs, their physical distances to the TP, and the positioning errors. For each distance measure, the selected nearest RPs are ordered by their signal distances. Compared with the traditional Euclidean distance, the SWED is more consistent with the physical distances between the TP and the nearest RPs. In other words, sorting the nearest RPs by physical distance gives essentially the same order as sorting them by SWED. This can be attributed to the weights introduced in the calculation of the signal distance: the proposed SWED-WKNN reflects the physical distance between the TP and the RPs more accurately, so that the positioning error is reduced.
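One way to quantify how well a signal distance preserves the physical ordering of the nearest RPs is to count the RP pairs that both distances rank in the same order. This measure is not used in the paper; it is offered only as a hedged illustration of the consistency argument above, and the function name is hypothetical.

```python
from itertools import combinations


def concordant_fraction(signal_dists, physical_dists):
    """Fraction of RP pairs ranked identically by both distances.

    signal_dists[i] and physical_dists[i] refer to the same RP and
    at least two RPs are required. Tied pairs count as not concordant.
    """
    pairs = list(combinations(range(len(signal_dists)), 2))
    agree = sum(
        (signal_dists[i] - signal_dists[j])
        * (physical_dists[i] - physical_dists[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)
```

A value of 1.0 means that sorting the RPs by signal distance reproduces their order by physical distance exactly, while lower values indicate that some RPs which look close in the signal domain are in fact farther away. With K = 4 nearest RPs there are six pairs to compare for each TP.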

Conclusion and Future Work
This paper presents an improved WiFi positioning method based on fingerprint clustering and signal weighted Euclidean distance. The method is intended to cope with the nonlinear relationship between the RSS and the physical distance. The position distribution information of the RPs is exploited to improve the clustering performance. Meanwhile, to make the signal distance reflect the physical distance better, we assign weights to the different differential RSS values in the calculation of the signal distance. The performance of the method is evaluated by experiments in two typical office buildings. The experimental results demonstrate that the proposed PL-assisted clustering algorithm outperforms the traditional k-means algorithm, and that the proposed SWED-WKNN algorithm outperforms the traditional Euclidean-WKNN and Manhattan-WKNN algorithms for fingerprint positioning with line-of-sight APs. In future work, to enhance the environmental adaptability of fingerprint positioning and to reduce the burden of fingerprint collection, we will study the automatic construction and updating of the fingerprint database. In addition, we will address the remaining problems in clustering, such as eliminating the outliers of a cluster and resolving the confused RPs in the boundary regions of adjacent clusters.

Conflicts of Interest:
The authors declare no conflict of interest.