DBSCAN and TD Integrated Wi-Fi Positioning Algorithm

Abstract: A density-based spatial clustering of applications with noise (DBSCAN) and three distances (TD) integrated Wi-Fi positioning algorithm was proposed, aiming to enhance the positioning accuracy and stability of fingerprinting by dynamically selecting the signal-domain distance to obtain reliable nearest reference points (RPs). The algorithm included two stages. One was the offline stage, where the offline fingerprint database was constructed, and the other was the online positioning stage. Three distances (Euclidean distance, Manhattan distance, and cosine distance), DBSCAN, and a high-resolution distance selection principle were combined to obtain more reliable nearest RPs and the optimal signal-domain distance in the online stage. The fused distance, i.e., the fusion of position-domain and signal-domain distances, was applied in DBSCAN to generate the clustering results, considering both the spatial structure and the signal strength of RPs. Based on the principle that the higher the resolution of a distance, the more clusters will be obtained, the high-resolution distance was used to compute positioning results. The weighted K-nearest neighbor (WKNN) algorithm with signal-domain distance selection was used to estimate positions. Two scenarios were selected as test areas: a complex-layout room (Scenario A) for post-graduates and a typical large indoor environment (Scenario B) covering 3200 m². In both Scenarios A and B, compared with the support vector machine (SVM), Gaussian process regression (GPR), and rank algorithms, the improvement rates of positioning accuracy and stability of the proposed algorithm were up to 60.44% and 60.93%, respectively. Experimental results show that the proposed algorithm has a better positioning performance in complex and large indoor environments.


Introduction
It is very hard for the global navigation satellite system (GNSS) to achieve high-precision indoor positioning because GNSS signals arriving indoors are weak or absent [1]. Therefore, indoor positioning techniques have been proposed to obtain the positions of people and objects in indoor environments. According to their dependence on the structures of buildings and the layouts of indoor environments, indoor positioning techniques can be divided into building-dependent and building-independent methods. The former are primarily based on electromagnetic and acoustic signals, such as ultra-wideband (UWB) [2][3][4], Bluetooth [5,6], wireless fidelity (Wi-Fi) [7][8][9], radio frequency identification (RFID) [10][11][12], ultrasonic or acoustic [13,14], geo-magnetism [15,16], pseudolite [17,18], and so on. The latter are based on computer vision [19,20] and inertial navigation system (INS) or pedestrian dead reckoning (PDR) [21,22]. Multiple techniques strength of RPs and map position-domain and signal-domain distances into the same metrics, reserving intrinsic connection and avoiding zero value of signal-domain distance.
A high-resolution distance selection principle was proposed. The number of clusters indicates the resolution of this distance. The larger the number, the higher the resolution of the distance. The high-resolution distance was applied to improve the positioning performance of Wi-Fi fingerprinting.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the research motivation and several basic algorithms. Section 4 presents the DBSCAN and TD integrated WKNN algorithm in detail. Section 5 gives the experimental description and the analysis of results. Finally, Section 6 concludes the work.

Related Work
There are offline and online stages in the Wi-Fi fingerprint positioning method. The aim of the offline stage is mainly to construct the fingerprint database. Given the huge consumption of human and material resources in the offline stage, crowdsourcing [39], interpolation [40], and the path loss model [41] were put forward to rapidly construct the fingerprint database. The purpose of the online stage is to estimate the position in real time or in post-processing. Algorithms for location estimation mainly include the EWKNN algorithm [8], GPR [42,43], artificial neural network (ANN) [41,44], probabilistic algorithm [45,46], SVM [47], rank [48,49], etc.
Besides, some researchers have introduced clustering algorithms to fingerprint positioning [33,50]. GPR, SVM, and ANN need long model training before the positioning models are established. They also need a large amount of online computation when the model is very complex. Furthermore, the positioning accuracy of trained models can be affected by the size of the training scenarios, such as office rooms, meeting rooms, teaching rooms, corridors, stairs, parking lots, etc. Therefore, the above algorithms may not be suitable for large and complex indoor environments. However, algorithms based on the signal-domain distance between online and offline fingerprints, such as nearest neighbor (NN) [51], K-nearest neighbor (KNN) [31], and WKNN, can be implemented without any training. In general, WKNN has a better positioning performance than NN and KNN [26]; therefore, it is more widely used. Common fingerprinting algorithms based on signal-domain distance apply the ED-based signal-domain distance to achieve indoor positioning [52]. In fact, there are other ways to calculate the signal-domain distance, such as the MD-based, CD-based, Sorensen, Log Gaussian, Hamming, Jaccard, and Lorentzian signal-domain distances [37,38,53,54].
Researchers have demonstrated that the positioning accuracy differs among signal-domain distance computation methods. Li et al. [55] showed that the MD-based signal-domain distance behaved better than the ED-based signal-domain distance in terms of positioning accuracy when they were used to estimate location. This showed that the nearest RPs found with different signal-domain distances may be different. The study in [38] examined fifty-three signal-domain distances using a public dataset and found that fingerprinting using the Sorensen signal-domain distance had the best positioning performance. Minaev et al. [37] described forty-nine signal-domain distances, compared their positioning effects, and concluded that the performances of the Hamming, Jaccard, and Lorentzian signal-domain distances were optimal. Another study [9] used the Spearman signal-domain distance to achieve relatively high-precision fingerprint positioning. The authors of [56] realized fingerprint positioning using the ED, MD, and Tanimoto signal-domain distances, respectively, in a multi-floor environment and verified that the positioning effect of MD was better than those of the other distances. The authors of [57] compared the ED, MD, and Mahalanobis signal-domain distances and reported that the MD and Mahalanobis signal-domain distances were optimal in an office environment and a shopping center, respectively. Therefore, it is useful for fingerprinting to select an optimal signal-domain distance to estimate the location.
There were also some studies on reducing the influence of unreliable nearest RPs on positioning accuracy. For example, the correlation between the K value and the RSS was used to adapt the K value for each position, which could eliminate the adverse influence of unreliable nearest RPs [31]. The nearest RPs with larger signal-domain distances or with maximum and minimum coordinate values were regarded as unreliable and removed. Then, the remaining nearest RPs were used to calculate the positioning result [58]. Clustering was also employed to eliminate interference from unreliable RPs through high-precision clustering identification, which kept them out of the computation of the online location estimate [59]. However, the accuracy of clustering identification might be lowered by RSS fluctuations, and thus the positioning precision may become poor. Therefore, clustering might not be a good way to eliminate the influence of unreliable RPs on precision, especially in huge or complex indoor environments.

Basic Algorithm Description
This section will introduce the research motivation, construction of the offline fingerprint database, and the definitions of position-domain and signal-domain distances, respectively. Then, the definitions and calculation methods of ED, MD, and CD will be described in detail. Finally, the principles of WKNN, normalization, and DBSCAN algorithms will be presented.

Research Motivation
In the radio fingerprint positioning method based on signal-domain distance, the reliability of the nearest RPs found by the search affects the accuracy of fingerprinting. If an untrusted RP participates in the position calculation, the estimated location will have a larger positioning error.
When the signal-domain distances were used to find the nearest RPs, some unreliable nearest RPs appeared due to the errors in the fingerprints, as shown in Figure 1. There were four examples where unreliable RPs took part in the positioning calculation. The red triangle, blue diamond, and green circle represent the nearest RPs, positioning result, and true position, respectively. It can be seen that two nearest RPs are far from the true position in Figure 1a, and one nearest RP is far from the true position in Figure 1b. Three nearest RPs are far from the true position in Figure 1c,d. In these examples, the positioning results had poor accuracy.
The reason for the poor positioning accuracy was that unreliable nearest RPs participated in the positioning estimation. Therefore, it is necessary for fingerprint positioning to use reliable nearest RPs. To this end, three signal-domain distances were adopted to enhance the reliability of the obtained nearest RPs.

Offline Fingerprint Database Construction
The main task of the offline stage of Wi-Fi fingerprint positioning is to build the fingerprint database. The precision of the fingerprint database can greatly affect the performance of fingerprinting. Generally, most researchers use the mean of a sequence of RSS measurements as the signal feature of an RP, as shown in Equation (1):
$$\mu = \frac{1}{L} \sum_{i=1}^{L} RSS_i \qquad (1)$$
where $\mu$ represents the mean of a set of RSS measurements, $L$ is the number of collections, and $RSS_i$ denotes the $i$th RSS measurement. The structure of the fingerprint database is shown in Table 1, including the location coordinates of known points and the signal features.

Table 1. Structure of the fingerprint database.

| Id | Location | $MAC_1$ | $MAC_2$ | ... | $MAC_N$ |
|---|---|---|---|---|---|
| 1 | $(x_1, y_1)$ | $\mu_{11}$ | $\mu_{12}$ | ... | $\mu_{1N}$ |
| 2 | $(x_2, y_2)$ | $\mu_{21}$ | $\mu_{22}$ | ... | $\mu_{2N}$ |
| ... | ... | ... | ... | ... | ... |
| M | $(x_M, y_M)$ | $\mu_{M1}$ | $\mu_{M2}$ | ... | $\mu_{MN}$ |

Here $M$ and $N$ are the numbers of RPs and access points (APs), respectively, $MAC_j$ denotes the media access control (MAC) address of the $j$th AP, and $\mu_{ij}$ represents the mean of a sequence of RSS measurements of the $j$th AP at the $i$th location.
Each RP had a location and a corresponding fingerprint. The fingerprint was made up of the RSS means of multiple APs. The offline fingerprint database consisted of multiple RPs.
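The construction of the database from raw measurements can be sketched in a few lines of Python. This is only an illustrative sketch of Equation (1): the dictionary layout, the function name `build_fingerprint_db`, and the MAC strings are assumptions made for the example, not the data format used in the experiments.

```python
from statistics import mean

def build_fingerprint_db(samples):
    """Average the RSS readings of every AP at every RP (Equation (1)).

    samples: {rp_id: {"xy": (x, y), "rss": {mac: [reading, ...]}}}
    (hypothetical layout chosen for this sketch)
    """
    db = {}
    for rp_id, record in samples.items():
        db[rp_id] = {
            "xy": record["xy"],
            # mu_ij: mean RSS of AP j observed at RP i
            "mu": {mac: mean(vals) for mac, vals in record["rss"].items()},
        }
    return db

# Two RPs, two APs; RSS values in dBm are made up for illustration.
samples = {
    1: {"xy": (0.0, 0.0), "rss": {"aa:01": [-40, -42, -44], "aa:02": [-70, -72]}},
    2: {"xy": (3.0, 0.0), "rss": {"aa:01": [-55, -57], "aa:02": [-60, -60]}},
}
db = build_fingerprint_db(samples)
```

Each entry of `db` corresponds to one row of Table 1: an identifier, a location, and one mean RSS value per AP.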

Position-Domain and Signal-Domain Distances
In this subsection, the position-domain and signal-domain distances will be introduced. There are two attributes for each RP. One is position coordinates, and the other is the signal features, i.e., RSS data gathered at one RP.
Based on these two attributes, we defined two distances: the position-domain distance and the signal-domain distance. The position-domain distance is the range between the position coordinates, as shown in Equation (2):
$$dist_{position} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \qquad (2)$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are the coordinates of the $i$th and $j$th RPs, and $dist_{position}$ represents the position-domain distance between the $i$th and $j$th RPs. The signal-domain distance represents the similarity between offline and online fingerprints, which can be expressed with the ED, MD, CD, Sorensen distances, etc. In other words, the signal-domain distance is the difference between the RSS measurements at RPs and at an unknown location. In the next subsection, we take ED, MD, and CD as cases to describe the signal-domain distance in detail.

Three Signal-Domain Distances
The investigation of related work shows that these three distances are the most widely used and have achieved high positioning accuracy. This is the main reason for using them. In addition, when only one distance is used, all the acquired RPs might be untrusted due to the errors of RSS measurements.
ED is the straight-line distance between two points in Euclidean space and is one of the most commonly used similarity measurements in indoor positioning. WKNN, GPR, and SVM often apply ED as the similarity measurement. The ED-based signal-domain distance can be expressed as:
$$ED_i = \sqrt{\sum_{j=1}^{N} \left(rss_j - \mu_{ij}\right)^2} \qquad (3)$$
where $(rss_1, rss_2, \cdots, rss_N)$ and $(\mu_{i1}, \mu_{i2}, \cdots, \mu_{iN})$ represent the online RSS readings and the fingerprint of the $i$th RP, respectively, $N$ is the number of APs, and $rss_j$ is the RSS measurement of the $j$th AP.
MD is the sum of the absolute coordinate differences between two points in a standard coordinate system. It can also be used for indoor positioning based on the Wi-Fi fingerprint to find the nearest RPs. The MD between the online RSS measurements and a fingerprint can be expressed as:
$$MD_i = \sum_{j=1}^{N} \left| rss_j - \mu_{ij} \right| \qquad (4)$$
where $\left| rss_j - \mu_{ij} \right|$ represents the absolute value of the difference between $rss_j$ and $\mu_{ij}$.
CD is one minus the cosine similarity, which can be used to measure the similarity between two fingerprints. Generally, the smaller the similarity between the online and offline fingerprints, the larger the CD of the two fingerprints. Therefore, CD is often applied for fingerprinting. The CD-based signal-domain distance can be denoted as:
$$CD_i = 1 - \frac{\sum_{j=1}^{N} rss_j \, \mu_{ij}}{\sqrt{\sum_{j=1}^{N} rss_j^2} \, \sqrt{\sum_{j=1}^{N} \mu_{ij}^2}} \qquad (5)$$
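The three signal-domain distances can be illustrated with a short Python sketch. The function names and the sample RSS vectors are hypothetical; the code assumes that both vectors list the same APs in the same order and that neither vector is all zeros (otherwise the cosine distance is undefined).

```python
import math

def euclidean(rss, mu):
    """ED: square root of the summed squared RSS differences."""
    return math.sqrt(sum((r - m) ** 2 for r, m in zip(rss, mu)))

def manhattan(rss, mu):
    """MD: sum of the absolute RSS differences."""
    return sum(abs(r - m) for r, m in zip(rss, mu))

def cosine_distance(rss, mu):
    """CD: one minus the cosine similarity of the two RSS vectors."""
    dot = sum(r * m for r, m in zip(rss, mu))
    norm = math.sqrt(sum(r * r for r in rss)) * math.sqrt(sum(m * m for m in mu))
    return 1.0 - dot / norm

# Made-up online reading and offline fingerprint over three APs (dBm).
online = [-40.0, -70.0, -55.0]
fingerprint = [-43.0, -66.0, -55.0]

ed = euclidean(online, fingerprint)
md = manhattan(online, fingerprint)
cd = cosine_distance(online, fingerprint)
```

The three values live on very different scales (here ED and MD are in dB, while CD is a small dimensionless number), which is why a normalization step is needed before they can be compared or summed.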

WKNN Algorithm
The WKNN algorithm is a common fingerprint positioning algorithm. It uses the signal-domain distances to find the nearest RPs and thereby estimate the location. In this paper, it was used with the normalized signal-domain distance. The detailed process is as follows.
Step (1): Traverse the fingerprint database and calculate the three signal-domain distances between the online RSS readings and each fingerprint. As described above, there are many signal-domain distances, such as ED, CD, and the Sorensen signal-domain distance.
Step (2): Sort the signal-domain distances and find the K RPs with the minimum signal-domain distances. The weight of each nearest RP is calculated from its signal-domain distance by Equation (6), where $dist_{signal}$ represents the signal-domain distance between the online fingerprint and the $i$th RP, and $w_i$ is the weight of the $i$th RP.
Step (3): Compute the weighted mean coordinates of the K nearest RPs, which are regarded as the positioning result, as expressed in Equation (7), where $(x_{wknn}, y_{wknn})$ is the estimated location, and $(x_i, y_i)$ is the position of the $i$th nearest RP.
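The three steps can be sketched as follows. This is a minimal illustration, not the exact form of Equations (6) and (7): it assumes the common WKNN choice in which each weight is the reciprocal of the signal-domain distance and the weights are normalized to sum to one. The function name `wknn` and the sample values are hypothetical.

```python
def wknn(distances, coords, k):
    """Weighted K-nearest-neighbor position estimate.

    distances[i]: signal-domain distance between the online fingerprint and RP i
                  (assumed strictly positive so that 1/distance is defined)
    coords[i]:    (x, y) position of RP i
    """
    # Step (2): indices of the K RPs with the smallest distances
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Assumed weighting: reciprocal distance, normalized by the sum
    inv = [1.0 / distances[i] for i in nearest]
    total = sum(inv)
    # Step (3): weighted mean of the coordinates
    x = sum(w * coords[i][0] for w, i in zip(inv, nearest)) / total
    y = sum(w * coords[i][1] for w, i in zip(inv, nearest)) / total
    return (x, y), nearest

distances = [1.0, 2.0, 2.0, 9.0]
coords = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (10.0, 10.0)]
position, nearest = wknn(distances, coords, k=3)
```

In this example the RP at the origin has half the distance of the other two selected RPs and therefore receives half of the total weight, which pulls the estimate to (1, 1). A zero distance would make the reciprocal weight undefined.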

DBSCAN Algorithm
The DBSCAN algorithm is a classic density-based clustering algorithm proposed by Ester et al. in 1996 [60]. It finds clusters and noise based on density. There are two important definitions: the Eps-neighborhood of a point and the density threshold. The Eps-neighborhood of a point $p$ is denoted by $N_{Eps}(p)$, and the points in the neighborhood can be expressed as:
$$N_{Eps}(p) = \{\, q \in PS \mid dist(p, q) \le Eps \,\} \qquad (8)$$
where $PS$ represents the point set, $p$ and $q$ are points in the set, $dist(p, q)$ represents the distance between the two points, and $Eps$ is the radius of the neighborhood. When the distance $dist(p, q)$ is within $Eps$, the point $q$ is directly density-reachable from $p$. If a point $g$ is directly density-reachable from $q$, and $q$ is directly density-reachable from $p$, then $g$ is density-reachable from $p$. If the Eps-neighborhood of a point contains at least a minimum number of points, the point is a core point, as shown in Equation (9):
$$\left| N_{Eps}(p) \right| \ge MinPts \qquad (9)$$
where $\left| N_{Eps}(p) \right|$ denotes the number of points in the Eps-neighborhood of the point $p$, and $MinPts$ represents the minimum number of points in the neighborhood of a core point. When a point has several points in its Eps-neighborhood but their number is smaller than $MinPts$, the point is a border point; a noise point has no other point in its Eps-neighborhood.
Initially, the DBSCAN algorithm starts with an arbitrary point $p$ and retrieves all points that are density-reachable from $p$, as shown in Figure 2a. When the number of density-reachable points is greater than or equal to $MinPts$, a cluster composed of the point $p$ and its density-reachable points is obtained, as shown in Figure 2b. When the search for density-reachable points of the current point is over, the DBSCAN algorithm visits the next core point until all core points have been traversed.

Two clusters were merged into one cluster when the minimum distance between them was lower than $Eps$. The minimum distance was the separation between the two closest points in the two clusters, which can be denoted as:
$$dist(S_i, S_j) = \min \{\, dist(p, q) \mid p \in S_i, \; q \in S_j \,\} \qquad (10)$$
where $S_i$ and $S_j$ each represent a cluster, $dist(S_i, S_j)$ is the minimum distance between $S_i$ and $S_j$, and $p$ and $q$ are points of $S_i$ and $S_j$, respectively.
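The clustering procedure can be sketched in plain Python. The sketch follows the standard formulation of DBSCAN, in which the Eps-neighborhood of a point includes the point itself and every point that is not reachable from a core point is labeled as noise; this is slightly stricter than the noise definition given in the text above. The function name `dbscan`, the pluggable `dist` callback, and the sample points are assumptions made for illustration.

```python
import math

def dbscan(n, dist, eps, min_pts):
    """Cluster points 0..n-1 using a pairwise distance callback dist(i, j).

    Returns a list of labels: a cluster id >= 0, or -1 for noise.
    """
    NOISE = -1
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Eps-neighborhood of point i (includes i itself)
        return [j for j in range(n) if dist(i, j) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (50, 50)]
labels = dbscan(
    len(points),
    lambda i, j: math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]),
    eps=1.5,
    min_pts=3,
)
n_clusters = len({c for c in labels if c >= 0})
```

Passing the distance as a callback makes it possible to cluster with any pairwise measure, including a fused distance, without changing the clustering code. The number of clusters, `n_clusters`, is the quantity that the high-resolution distance selection principle compares across distances.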

Overview
The system overview of the proposed algorithm is illustrated in Figure 3, including the offline and online stages. The signal-domain distances based on ED, MD, and CD between the online fingerprint and the offline fingerprints in the fingerprint database were calculated, respectively. For each signal-domain distance, the K RPs corresponding to the K minimum distances can be found. In other words, three groups of K RPs were obtained. Depending on whether these groups were the same, the subsequent procedure was divided into two cases. When the K RPs based on ED were the same as those based on MD and CD, these K nearest RPs were treated as reliable and used for further position calculation. When there were differences among the three groups of K RPs, some RPs might be unreliable, and further selection was needed.
The position-domain distances between RPs were computed and combined with the three signal-domain distances to generate the fused distance based on ED, the fused distance based on MD, and the fused distance based on CD. Then, these three fused distances were employed by the DBSCAN algorithm to cluster the corresponding K RPs, respectively. Based on the high-resolution distance selection principle, the optimal signal-domain distance was determined by the maximum number of clusters. Since WKNN was better than KNN and NN, the location estimation was achieved with WKNN.
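The first decision in the online stage, whether the three groups of K nearest RPs coincide, can be sketched as below. The sketch assumes that "the same" means the same set of RPs regardless of order; the function names and the sample distance lists are hypothetical.

```python
def k_nearest(distances, k):
    """Indices of the K RPs with the smallest signal-domain distances."""
    return sorted(range(len(distances)), key=lambda i: distances[i])[:k]

def same_nearest_rps(ed, md, cd, k):
    """Return (True, groups) when ED, MD, and CD select the same K RPs.

    Assumption: the groups are compared as sets, ignoring the ranking order.
    """
    groups = [set(k_nearest(d, k)) for d in (ed, md, cd)]
    return groups[0] == groups[1] == groups[2], groups

ed = [1.0, 2.0, 3.0, 9.0]
md = [2.0, 1.0, 3.0, 9.0]
cd_agree = [0.1, 0.2, 0.3, 0.9]      # selects RPs {0, 1, 2} like ED and MD
cd_differ = [0.1, 0.9, 0.3, 0.2]     # selects RPs {0, 2, 3}

agree, _ = same_nearest_rps(ed, md, cd_agree, k=3)
differ, groups = same_nearest_rps(ed, md, cd_differ, k=3)
```

When the groups agree, the K RPs can be passed directly to the position calculation; otherwise, the clustering and distance selection step decides which distance to trust.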

Fused Distance
In this paper, the concept of a fused distance enhanced by a normalization algorithm with changeable intervals was proposed. It was the fusion of the position-domain and signal-domain distances. Initially, the position-domain and signal-domain distances were adjusted to the same metric with the normalization algorithm. Then, their weighted sum was the fused distance, as shown below:
$$dist_{fusion} = \alpha \cdot Nor(dist_{signal}) + (1 - \alpha) \cdot Nor(dist_{position}) \qquad (11)$$
where $dist_{fusion}$ denotes the fused distance, $dist_{signal}$ and $dist_{position}$ represent the signal-domain and position-domain distances, respectively, $\alpha$ is the fusion parameter between 0 and 1, and $Nor(\cdot)$ is the normalization algorithm. Normalization transforms a dimensional expression into a dimensionless one, so that data with different metrics can be adjusted to the same metric. Since the position-domain and signal-domain distances belong to different metrics, the normalization algorithm was introduced to process the three signal-domain distances and the position-domain distance.
Generally, normalization maps the data into values between 0 and 1. However, a zero normalized distance was inappropriate, because the reciprocal of the normalized distance was regarded as the weight of the nearest RP. Thus, we improved the traditional normalization algorithm. The interval of the revised normalization algorithm could be changed based on actual demands. The computation method of the improved normalization is shown in Equation (12), which maps the data into values between 1 and $P$:
$$dist_{nor} = k \cdot \frac{dist - dist_{min}}{dist_{max} - dist_{min}} + 1 \qquad (12)$$
where $dist_{nor}$ is the normalized distance, $dist$ represents the distance to be normalized, $k$ denotes the slope, and $dist_{max}$ and $dist_{min}$ are the maximum and minimum distances. The interval is changed by selecting the slope: to map into the interval from 1 to $P$, the slope $k$ should be $P - 1$.
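The revised normalization and the fused distance can be illustrated with a short Python sketch. The function names, the handling of the degenerate case in which all distances are equal, and the sample values are assumptions; the text does not specify over which pairs of points the two distances are evaluated, so the sketch simply fuses two equally long lists of paired distances.

```python
def normalize(values, p=10.0):
    """Map values linearly into [1, P] with slope k = P - 1 (Equation (12))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all distances are equal, map them to the lower bound.
        return [1.0 for _ in values]
    k = p - 1.0
    return [k * (v - lo) / (hi - lo) + 1.0 for v in values]

def fused_distance(signal, position, alpha=0.7, p=10.0):
    """Weighted sum of normalized signal- and position-domain distances (Equation (11))."""
    s = normalize(signal, p)
    q = normalize(position, p)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(s, q)]

signal = [2.0, 4.0, 6.0]       # made-up signal-domain distances
position = [10.0, 5.0, 0.0]    # made-up position-domain distances (m)

normalized = normalize(signal)
fused = fused_distance(signal, position)
```

Because the smallest normalized value is 1 rather than 0, the reciprocal of a normalized distance is always defined and bounded by 1, which is what makes it usable as a WKNN weight.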

Description of TD
TD was defined as the fusion of the three signal-domain distances, i.e., the sum of the three normalized signal-domain distances, which can be expressed as:
$$TD = ED_{Nor} + MD_{Nor} + CD_{Nor} \qquad (13)$$
where $TD$ denotes the sum of the three normalized signal-domain distances, ED, MD, and CD represent the signal-domain distances based on ED, MD, and CD, respectively, and $ED_{Nor}$, $MD_{Nor}$, and $CD_{Nor}$ are their normalized values.
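As a small worked example, TD can be computed per RP by normalizing each distance list into the interval from 1 to P and summing the results. The function names and the sample values below are hypothetical, and the normalization assumes that the distances in each list are not all equal.

```python
def normalize(values, p=10.0):
    """Linear map into [1, P]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(p - 1.0) * (v - lo) / (hi - lo) + 1.0 for v in values]

def three_distances(ed, md, cd, p=10.0):
    """TD for each RP: sum of the normalized ED, MD, and CD."""
    return [e + m + c
            for e, m, c in zip(normalize(ed, p), normalize(md, p), normalize(cd, p))]

# Made-up distances from one online fingerprint to three RPs.
ed = [1.0, 2.0, 3.0]
md = [2.0, 4.0, 6.0]
cd = [0.1, 0.3, 0.2]

td = three_distances(ed, md, cd)
best_rp = td.index(min(td))
```

With P = 10, every TD value lies between 3 and 30, and an RP reaches the minimum of 3 only when it is the closest one under all three distances, as the first RP is here.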

DBSCAN and TD Integrated WKNN Algorithm
To enhance the performance of fingerprinting, this paper proposed the DBSCAN and TD integrated WKNN algorithm, which applied TD to raise the probability of obtaining reliable RPs, adaptively selected an optimal signal-domain distance for the positioning computation, and used the proposed rule to judge whether RPs were credible. There were two stages in the proposed algorithm: the offline and online stages. In the offline stage, the RSS measurements at existing RPs were collected with a smartphone. Then, the means of the RSS sequences and the coordinates of the RPs were used to build the fingerprint database. The online stage included three steps: same-RPs judgment, clustering and distance selection, and positioning calculation.
Initially, the online fingerprint was gathered, as shown below:
$$RSS = \{rss_1, rss_2, \cdots, rss_N\} \qquad (14)$$
Based on the signal-domain distance computation methods, the ED, MD, and CD between the online fingerprint and the fingerprints in the database were calculated. Then, the normalization algorithm with an interval of 1 to 10 ($P$ in Equation (12) is 10, so the slope $k$ is 9) was used to adjust the ED, MD, and CD to the same metric, which can be expressed as:
$$ED_{Nor} = Nor(ED), \quad MD_{Nor} = Nor(MD), \quad CD_{Nor} = Nor(CD) \qquad (15)$$
The K nearest RPs could be obtained according to each single normalized signal-domain distance, as shown in Equation (16). In Equation (17), the nearest RPs found by each signal-domain distance were available, and the RPs corresponding to the minimum distances were regarded as credible. Finally, these credible RPs were used to obtain the positioning results, and WKNN was employed as the positioning algorithm.
However, the nearest RPs searched by ED, MD and CD are often different, indicating that there may be unreliable RPs. Thus, we should choose the optimal distance to perform positioning to reduce the probability of getting unreliable RPs. That is, one, two, or three distances can be used in each localization process to realize high-precision fingerprinting according to the actual situation.
In this paper, we used the number of clusters as the index of distinction degree of signal-domain distance. A greater number of clusters indicated more resolution. DBSCAN and fused distance were used for clustering of RPs, and there were three fused distances, which can be denoted as: where dist E_P Based on these fused distances, we could obtain three groups of clustering results, as shown below: where Cluster E_P , Cluster M_P , and Cluster M_P represent the cluster sets based on three fused distances, respectively, and cluster E_P i present the ith cluster in Cluster E_P , and l1, l2, and l3 are the number of clusters. Theoretically, the larger the number of clusters is, the higher resolution of distance used for clustering is. This rule can be utilized to select the optimal signal-domain distance and improve the positioning accuracy.
We designed an adaptive selection strategy of the optimal signal-domain distance and reliable RPs according to three cases of l1, l2, and l3, as shown below.
Case 1: l1 = l2 = l3. In this situation, the degree of differentiation of ED, MD, and CD was considered to be consistent. TD (ED, MD, and CD) were simultaneously applied to calculate the positioning results, as shown in Equation (20).
Then, we sorted Dist_optimal and selected as many nearest RPs as the density threshold. The coordinates and signal-domain distances of these nearest RPs were utilized to estimate the location. The WKNN algorithm was used as the positioning algorithm.
Case 2: li > lj = lk or li < lj = lk (i ≠ j ≠ k; i, j, k = 1, 2, 3). There were two situations under this case: either one distance attained the maximum number of clusters, or two distances shared it. We chose the signal-domain distance corresponding to the maximum number of clusters, as shown below.
Then, we sorted Dist_optimal and found the nearest RPs. The coordinates and signal-domain distances of these nearest RPs were utilized to estimate the location. The WKNN algorithm was applied for positioning, and the K value was MinPts, i.e., the density threshold.
Case 3: li > lj > lk (i ≠ j ≠ k; i, j, k = 1, 2, 3). One distance attained the maximum number of clusters. We selected the signal-domain distance corresponding to the maximum number of clusters, as shown below.
In this paper, the parameter α of the fused distance was 0.7, the value of MinPts was 3, and the neighborhood radius Eps was 3.1 m, because the position-domain distance accounted for 30% of the fused distance. These parameters were kept unchanged in the following tests to demonstrate the universality of the proposed algorithm.
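The three cases can be summarized in a short sketch. Since the selection equations are not reproduced here, several points are assumptions: every distance whose cluster count equals the maximum is selected (which covers all three cases), multiple selected distances are combined by taking the per-RP minimum of their normalized values, and WKNN weights are the reciprocals of the distances. All function names and the toy values are hypothetical.

```python
def select_distances(cluster_counts):
    """Names of the distances whose cluster count is maximal.

    Case 1 (all equal) returns all three, Case 2 returns one or two,
    and Case 3 returns exactly one.
    """
    best = max(cluster_counts.values())
    return [name for name, count in cluster_counts.items() if count == best]

def combine(normalized, selected):
    """Assumed combination: per-RP minimum over the selected normalized distances."""
    n = len(normalized[selected[0]])
    return [min(normalized[name][i] for name in selected) for i in range(n)]

def wknn(coords, distances, k):
    """Weighted K-nearest-neighbor estimate with inverse-distance weights.

    Distances are assumed to be normalized to [1, 10], so they are never zero.
    """
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    weights = [1.0 / distances[i] for i in nearest]
    total = sum(weights)
    x = sum(w * coords[i][0] for w, i in zip(weights, nearest)) / total
    y = sum(w * coords[i][1] for w, i in zip(weights, nearest)) / total
    return x, y

# Toy example: MD and CD tie for the largest number of clusters (Case 2).
counts = {"ED": 2, "MD": 4, "CD": 4}
selected = select_distances(counts)

normalized = {
    "ED": [1.0, 10.0, 4.0, 5.0],
    "MD": [2.0, 1.0, 1.0, 10.0],
    "CD": [1.0, 3.0, 4.0, 10.0],
}
dist_optimal = combine(normalized, selected)

rp_coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (10.0, 10.0)]
estimate = wknn(rp_coords, dist_optimal, k=3)  # K = MinPts = 3
print(selected, dist_optimal, estimate)
```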

Experiment
All positioning algorithms were implemented in MATLAB 2021b. SVM, GPR, and rank algorithms were selected as the comparison methods to assess the positioning performance of the proposed algorithm. Mean absolute error (MAE) and root mean square error (RMSE) were used as the indexes of accuracy and stability, respectively. MAE is the arithmetic average of the absolute errors between estimated values and true values. RMSE is the square root of the mean of the squared differences between estimated values and true values.
Offline fingerprints and their corresponding coordinates were utilized as the training data to construct positioning models for the SVM, GPR, and rank algorithms. A Bayesian optimization algorithm was employed to solve the parameters of SVM. The parameters of the GPR model were solved with the quasi-Newton algorithm. The rank algorithm used the sorting results of RSS measurements to calculate the position.
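As a small illustration of the two indexes defined above, the sketch below computes MAE and RMSE from per-point positioning errors, taken here as the planar distance between the estimated and true coordinates. The function names and the toy coordinates are hypothetical.

```python
import math

def positioning_errors(estimates, truths):
    """Planar distance between each estimated and true position (m)."""
    return [
        math.hypot(ex - tx, ey - ty)
        for (ex, ey), (tx, ty) in zip(estimates, truths)
    ]

def mae(errors):
    """Mean absolute error: arithmetic average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Toy data: two test points, one estimated 5 m off and one exact.
estimates = [(3.0, 4.0), (1.0, 1.0)]
truths = [(0.0, 0.0), (1.0, 1.0)]
errors = positioning_errors(estimates, truths)
print(mae(errors), rmse(errors))
```

Because RMSE squares the errors before averaging, it penalizes occasional large errors more than MAE does, which is why it serves here as the stability index.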

Experiment Area and Experimental Description
There were two classic scenarios for experimental tests, Scenarios A and B, as shown in Figures 4 and 5.
Figure 4. Scenario A. The wooden desk is in orange, the wooden cabinet is in red, and the plastic chair is in blue. The wireless signal station denotes an access point; reference points and test points are illustrated in different colors.
Scenario A was a large room, a graduate students' laboratory, with a complex layout of a kind commonly seen in existing indoor environments. The maximum length and width were 19.43 m and 18 m, respectively. In this scenario, there were 67 RPs marked by orange circles and 24 TPs marked by blue circles. The interval between most RPs was 1.2 m, while that between some RPs was 1.8 m. Both the existing layout and the floor tiles, each 60 cm long and wide, determined the interval, facilitating the establishment of indoor grids. TPs were arranged randomly, and the 24th TP was outside the indoor grid. Eight APs using the Wi-Fi 6th generation protocol with dual frequency bands were pre-placed at different heights in the large room. Six APs were fixed on the wall at a height of 4 m, and one AP was placed at a height of 2 m on a steel filing cabinet, which was marked by the letter F. These seven APs were on the left of the large room. The remaining one was set on a big wooden table on the right of the room. In addition, another 17 APs were outside the room and arranged at different locations, such as in the long corridor, meeting room, and office room. The collection time and frequency at each RP were 80 s and 1 Hz, respectively, and those at each TP were 40 s and 1 Hz, respectively. During RSS collection, all personnel in this room and elsewhere were moving normally. Many MAC addresses could be heard at each RP and TP. Missing RSS values were replaced by −100 dBm.
Scenario B was a corridor with an overall length of 211 m, covering almost 3200 m². The black solid squares and green solid circles represent RPs and TPs, respectively. The distance between two adjacent RPs was 1.2 m and the total number of RPs was 379. The RSS data were collected at every RP and used to generate the fingerprints. The sampling time was 60 s and the sampling frequency was 1 Hz.
After the RSS data collection at the RPs was completed, test data were collected to evaluate the performance of the proposed algorithm. The total number of TPs was 86. The acquisition time and frequency were 10 s and 1 Hz, respectively. Each 10 s of data was taken as one set of test data.

Stability of RSS Measurement
This subsection studies the stability of RSS measurements using the RMSE of ten minutes of RSS measurements at a fixed location. The true value of RSS was hard to acquire due to the complexity of radio propagation and the lack of high-precision measuring equipment, so an accuracy analysis of RSS measurements was hard to conduct. However, the mean of RSS measurements over a long time could be regarded as the true value. Based on this mean, the RMSE of the RSS measurements was obtained and regarded as an index of their stability. Figure 6 presents the RMSEs of 600 groups of RSS measurements from 12 APs. All RMSEs were greater than 2 dB, which indicated that there were wide fluctuations in the RSS measurements. To analyze the change of the RSS measurements, the RSS measurements of AP 6 were selected as experimental data, as shown in Figure 7. The difference between the maximum and minimum RSS measurements was 30 dB, which could cause a large positioning error. With such large changes, the nearest RPs acquired for each positioning request might differ even at the same position, which caused multiple localization results at the same position to vary. Thus, it is necessary to select an optimal signal-domain distance to reduce the impact of RSS fluctuations on positioning accuracy and stability.
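The stability index used in this subsection can be sketched as follows: the long-term mean stands in for the true RSS, and the RMSE about that mean, together with the max-min spread, describes the fluctuation. The function name and the sample values are hypothetical and are not measurements from the experiment.

```python
import math

def rss_stability(samples):
    """Return (mean, RMSE about the mean, max-min spread) of RSS samples."""
    mean = sum(samples) / len(samples)
    rmse = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    spread = max(samples) - min(samples)
    return mean, rmse, spread

# Toy series of RSS readings from one AP at a fixed location (dBm).
readings = [-60.0, -62.0, -58.0, -60.0]
mean, rmse, spread = rss_stability(readings)
print(mean, rmse, spread)
```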

Impact of the Number of APs and RPs on Positioning Accuracy
In this subsection, we first studied the impact of the number of APs on positioning accuracy. The number of deployed APs in Scenario A was 8, and the number of corresponding MAC addresses was 16. Because there were 17 APs deployed outside Scenario A, 50 MAC addresses could be detected in total at the RPs and TPs. Therefore, 16 and 50 MAC addresses were used for estimating position, respectively, with WKNN as the localization algorithm. Figure 8 gives the MAEs of the WKNN positioning method with 16 and 50 MAC addresses under K values from 1 to 10. The MAE with 50 MAC addresses was smaller than that with 16 MAC addresses at every K value. However, as K increased, the difference in MAE between the two at the same K value gradually became smaller, and at some large K value it may become close to zero. For K between 1 and 10, the more APs there were, the higher the positioning accuracy was.

Remote Sens. 2022, 13, x FOR PEER REVIEW 15 of 23
Then, the influence of the number of RPs on positioning accuracy was studied. The test area was Scenario B, and the numbers of RPs used in fingerprinting localization were 352, 126, 62, and 36. NN was the positioning algorithm. The experimental results are shown in Table 2. The accuracy tended to improve as the number of RPs increased.

Differences among ED, MD, and CD
The experiment was conducted to study the positioning effect under different signal-domain distances, and the test area was Scenario B. The ED-based, MD-based, and CD-based signal-domain distances were calculated between the online fingerprint and the fingerprint database, and the nearest RPs were found based on each signal-domain distance. Then, the positioning results were obtained with the NN algorithm.
The positioning errors of WKNN based on ED, MD, and CD, respectively, are shown in Figure 9. The positioning results varied with the signal-domain distance. To show the positioning effects of the different signal-domain distances, two small areas including multiple points were chosen, as illustrated in Figure 9a,b. In Figure 9a, the ED-based signal-domain distance was best at Point 22, while in Figure 9b, the CD-based signal-domain distance might be best at Point 46 and the optimal signal-domain distance at Point 47 was MD. The experimental results showed that the positioning results differed when the signal-domain distance differed. However, when the three signal-domain distances were simultaneously used for position estimation, more reliable nearest RPs could be acquired. Thus, this paper proposed the WKNN algorithm based on TD, aiming to get more reliable nearest RPs.

Figure 9. Positioning errors based on ED, MD, and CD, respectively.

Positioning Performance by Using TD
TD aims to increase the possibility of getting reliable nearest RPs and improve positioning precision. An experiment was conducted to study the positioning effect of TD, with NN as the positioning algorithm. TD was compared with ED, MD, and CD. The experimental results are shown in Figure 10. The positioning effect of TD was better than those of ED, MD, and CD. Moreover, the results indicated that the use of TD could improve the positioning effect by exploiting the advantages of the three signal-domain distances. However, TD-based fingerprinting could not select the optimal signal-domain distance for each localization, which might be the reason that the MAE of TD was similar to that of MD. Therefore, high-precision positioning requires selecting the optimal signal-domain distance.

Clustering Effect of DBSCAN
In this subsection, we introduce the clustering effect of DBSCAN. Figure 11 shows the clustering effect of three positioning methods. In this paper, the value of MinPts was three.
The fused distances based on ED, MD, and CD were each the fusion of the position-domain distance with one of the three signal-domain distances (ED-based, MD-based, and CD-based, respectively). The clustering results showed that DBSCAN using the fused distance had good clustering effects. The number of clusters differed when the fused distances differed, which showed that the distinction degree of the fused distances differed within a single localization.
In Figure 11a, the number of clusters was the same for all three fused distances, and the clustering results of the fused distances based on CD, MD, and ED were similar. This indicated that there was only a small difference among the three fused distances. In Figure 11b,c, the greater the distinction degree of the fused distance was, the larger the number of clusters was.

Performance of the Proposed DBSCAN-TD Integration WKNN Algorithm in Scenario A
In this subsection, we will evaluate the proposed method in Scenario A. Based on the collected testing data, the positioning experiment was conducted to study the stability and accuracy of the proposed method. The MAE was regarded as the index of accuracy. The stability of the positioning algorithm was measured with the RMSE. In order to rate the performance of the proposed algorithm, SVM, GPR, and rank were used as comparison algorithms.
The experimental results are shown in Figure 12. The MAE and RMSE of the proposed algorithm were 3.721 and 4.227 m, respectively, and those of SVM were 5.077 and 5.734 m, respectively. The MAEs of GPR and rank were 4.313 and 4.979 m, respectively, and their RMSEs were 4.835 and 5.607 m, respectively. Note that the K of rank was 21. More detailed errors are shown in Table 3. Compared with SVM, the accuracy and stability of the proposed algorithm were improved by 26.71% and 26.29%, respectively. Compared with GPR and rank, the positioning accuracy of the proposed algorithm was improved by 13.72% and 25.26%, respectively, and its stability by 12.58% and 24.61%, respectively. Besides, the maximum errors of SVM, GPR, rank, and the proposed algorithm were 9.562, 8.729, 9.737, and 8.256 m, respectively. The maximum error of the proposed algorithm was lower than those of SVM, GPR, and rank. Therefore, the proposed algorithm outperformed SVM, GPR, and rank and is more suitable for positioning in a room with a complex layout.
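The improvement rates quoted above are consistent with the relative reduction of an error index with respect to a baseline. The sketch below assumes that definition (the function name is hypothetical) and recomputes the MAE improvement over SVM from the rounded values in the text; rates recomputed from rounded inputs may differ from the reported ones in the last digit.

```python
def improvement_rate(baseline, proposed):
    """Relative reduction of an error index with respect to a baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0

# MAE of SVM and of the proposed algorithm in Scenario A (m), as reported.
mae_svm, mae_proposed = 5.077, 3.721
rate = improvement_rate(mae_svm, mae_proposed)
print(round(rate, 2))
```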
Actually, the positioning performances of SVM, GPR, rank, and the proposed algorithm were not ideal in Scenario A: even the minimum MAE was 3.721 m. Based on the description in Section 5.1, the number of APs in Scenario A was in theory enough for relatively high-accuracy fingerprint localization. However, the actual positioning effect was not very good. The likely reason is that the Wi-Fi signals were seriously disturbed by the complex layout of Scenario A, pedestrians, etc. The proposed algorithm still achieved a great improvement compared with SVM, GPR, and rank, indicating that it could perform better in a complex scenario.

Performance of the Proposed DBSCAN-TD Integration WKNN Algorithm in Scenario B
This subsection illustrates the performance of the proposed algorithm in Scenario B. SVM, GPR, and rank were also utilized as comparison algorithms. The experimental results are shown in Figure 13, which presents the cumulative distribution functions (CDFs) of the positioning errors of SVM, GPR, rank, and the proposed algorithm. The maximum errors were 16.696, 19.992, 22.615, and 8.461 m, respectively. This indicated that the proposed method could avoid large errors and had better stability. In addition, the minimum errors were 0.138, 0.444, 0.209, and 0.014 m, respectively, indicating that the proposed algorithm was capable of decimeter-level positioning.
Besides, the experimental area was a relatively large open working area, so many external factors, such as multipath, non-line-of-sight propagation, and pedestrians, may influence the accuracy of fingerprinting. Generally, fingerprint positioning performs poorly in a larger indoor environment. However, the probability that the positioning error of the proposed method was lower than 1 m reached 33.7%, while those of SVM, GPR, and rank were 10.47%, 12.79%, and 12.79%, respectively. Obviously, the proposed method achieved a higher probability of positioning errors below 1 m.
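The CDF-based statistics discussed above can be obtained from the per-point positioning errors as in the following sketch. It assumes an empirical CDF over the test points; the function names and the toy error values are hypothetical and are not data from the experiment.

```python
import math

def fraction_below(errors, threshold):
    """Empirical probability that the positioning error is below a threshold."""
    return sum(1 for e in errors if e < threshold) / len(errors)

def error_at_probability(errors, p):
    """Smallest error e such that at least a fraction p of the errors are <= e."""
    ordered = sorted(errors)
    # The small offset guards against floating-point overshoot in p * n.
    rank = math.ceil(p * len(ordered) - 1e-9)
    return ordered[max(rank, 1) - 1]

# Toy positioning errors for ten test points (m).
toy_errors = [0.5, 0.8, 1.5, 2.0, 3.0, 4.0, 0.2, 6.0, 2.5, 1.2]
print(fraction_below(toy_errors, 1.0))
print(error_at_probability(toy_errors, 0.5))
print(error_at_probability(toy_errors, 0.9))
```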

Figure 13. CDFs of SVM, GPR, rank, and the proposed method.
Figure 14 shows the positioning effects of SVM, GPR, rank, and the proposed method. The upper quartile, median, and lower quartile of the proposed method were better than those of SVM, GPR, and rank. Therefore, the positioning effect of the proposed method was better than those of SVM, GPR, and rank.
Figure 14. Positioning effects of SVM, GPR, rank and the proposed method.
Table 4 shows the detailed statistical results of the positioning errors of SVM, GPR, rank, and the proposed method. The cumulative error probabilities (50%, 70%, and 90%) and the corresponding errors are shown for comprehensive comparison. For example, the 50% errors of SVM, GPR, rank, and the proposed method were 3.215, 3.127, 4.515, and 1.745 m, respectively, which indicated that the proposed algorithm performed better. Besides, the MAEs and RMSEs of SVM, GPR, rank, and the proposed method are also displayed in Table 4. The MAE and RMSE of the proposed method were 2.094 and 2.638 m, respectively. The MAEs of SVM, GPR, and rank were 3.82, 3.63, and 5.293 m, respectively, and the RMSEs of SVM, GPR, and rank were 4.735, 4.583, and 6.753 m, respectively. Obviously, rank had the worst positioning effect, with poor accuracy and stability.

Compared with SVM, GPR, and rank, the MAE of the proposed method was reduced by 45.18, 42.31, and 60.44%, respectively. The RMSE of the proposed method was reduced by 44.29, 42.44, and 60.94%, respectively. This indicated that SVM, GPR, and rank were less suitable for a large indoor environment than the proposed method.
Therefore, the positioning performance of the proposed method was better than that of SVM, GPR, and rank in both complex and large indoor environments.

Conclusions
This paper proposed a novel DBSCAN and three distances (TD) integrated Wi-Fi positioning algorithm. Three distances, DBSCAN, and the high-resolution distance selection principle were combined to obtain more reliable adjacent RPs and the optimal signal-domain distance in the online stage, improving positioning performance. The fused distance was enhanced by a normalization algorithm with changeable intervals, which could not only consider both the spatial layout and signal strength of RPs, but also map position-domain and signal-domain distances into the same metric. Because the proposed method needs a period of observation time, while the speed of a walking person or moving object is larger than 1 m/s, it is applicable only to locating devices or objects in semi-stationary conditions. Scenario A is a room with a complex layout occupied by many post-graduates, and Scenario B is a large indoor environment covering 3200 m². Scenarios A and B are typical representatives of complex and large indoor environments. Compared with the SVM, GPR, and rank positioning methods, the improvement rates of positioning accuracy and stability of the proposed algorithm in Scenarios A and B were up to 60.44% and 60.93%, respectively. Therefore, the proposed algorithm has better positioning performance in large and complex indoor environments.