An Efficient Indoor Positioning Method Based on Wi-Fi RSS Fingerprint and Classification Algorithm

Wi-Fi received signal strength (RSS) fingerprint-based indoor positioning has been widely used because of its low cost and universality advantages. However, the Wi-Fi RSS is greatly affected by multipath interference in indoor environments, which can cause significant errors in RSS observations. Many methods have been proposed to overcome this issue, including the average method and the error handling method, but these existing methods do not consider the ever-changing dynamics of RSS in indoor environments. In addition, traditional RSS-based clustering algorithms have been proposed in the literature, but they make clusters without considering the nonlinear similarity between reference points (RPs) and the signal distribution in ever-changing indoor environments. Therefore, to improve the positioning accuracy, this paper presents an improved RSS measurement technique (IRSSMT) to minimize the error of RSS observation by using the number of selected RSS and its median values, and the strongest access point (SAP) information-based clustering technique, which groups the RPs using their SAP similarity. The performance of this proposed method is tested by experiments conducted in two different experimental environments. The results reveal that our proposed method can greatly outperform the existing algorithms and improve the positioning accuracy by 89.06% and 67.48%, respectively.


Introduction
The rapid development of wireless technology leads people to pay more attention to location-based services (LBS). In recent years, indoor LBS has been widely used in airports, subways, shopping malls, and many other indoor environments for location identification, indoor navigation for the users, and the positioning demands of the different groups of people. Therefore, it is important to build an accurate, reliable, and real-time indoor positioning system to effectively satisfy the public demand for indoor positioning. With the rapid development of smartphones, more sensors and intelligent terminals have become excellent tools for indoor positioning [1][2][3].
Nowadays, the global navigation satellite system (GNSS) is well matured and is commonly used in outdoor positioning technology. LBS requirements in outdoor environments were well addressed and positioned in GNSS [4]. However, the GNSS is not applicable for indoor positioning systems due to its signal becoming weaker after penetrating through indoor environments; hence, it fails to produce reliable position information [5]. Different wireless technologies such as Bluetooth (BLE beacon) [6], radio frequency identification (RFID) [7], ultra-wideband (UWB) [8], geomagnetism [9], visible light [10], and Wi-Fi [11] were used in indoor positioning systems. Among the aforementioned technologies, Wi-Fi is the most effective technology used in indoor positioning research areas due to its low power consumption, high precision, and low cost [3]. The universality and common imple-searching all the RPs, a particular cluster (which has a group of RPs) will be searched to find the user's location using the clustering technique. The researchers have proposed different clustering techniques, such as k-means [23,24], Bisecting k-means [25], smallest enclosing circle [26], and affinity propagation [27,28]. However, the clustering criteria used in the above-mentioned clustering techniques do not consider the nonlinear similarity between RPs, which means the clustering result cannot perfectly reflect the position distribution of RPs. Since the k-means method is an unsupervised machine learning algorithm, the data are unlabeled, and the algorithm learns by itself using the dataset. The disadvantage of this method is that it randomly places the centroids to make clusters, and we never know how this algorithm sorted the data. The output obtained may not be what the user was expecting due to a data interpretation mismatch. When k is too large, the relatively relevant points might be divided into different clusters, and when k is too small, the less relevant points are much more likely to be divided into the same cluster. Cluster matching in an online phase will become inaccurate due to the inconsistency of the clustering criteria. A location-based clustering is proposed in [29]. Since the signal is not distributed equally due to the obstacles in an indoor environment, location-based clustering will lead to positioning errors. The combination of position label assisted clustering and signal weighted Euclidean distance is proposed in [30], which outperforms the traditional W k NN with Euclidean distance and a k-means clustering algorithm; however, it is only effective for positioning with the line-of-sight APs, and it is not suitable for positioning with non-line of sight APs.
On the other hand, researchers have focused on automatic fingerprint database construction methods such as interpolation and crowdsourcing [31]. A network fingerprint approach is proposed in [32] to auto construct the fingerprint database through Wi-Fi APs. The crowdsource-based adaptive area localization to enable area classification for continuously generated data is proposed in [33], which dynamically subdivides the floor plan into areas based on the quality and features of available training data. The authors of [34] built the Wi-Fi-based indoor navigation application that uses an interpolation method to generate artificial Wi-Fi fingerprints, but the interpolation technique hardly outperformed the accuracy achieved using manual site surveys. The automatic site survey may cause inaccuracy in the fingerprint database, so in this research, our fingerprint collection work is done manually for better demonstration.
The ever-changing dynamics of RSS in indoor environments impact the average value of missing APs, and the nonlinear similarity between RPs of cluster construction was not considered in the above methods. This paper proposes an IRSSMT RSS extraction method and SAP information-based clustering algorithm to tackle the above-mentioned problems of Wi-Fi fingerprint-based indoor positioning. The proposed IRSSMT helps make the RSS smoother and construct an effective fingerprint database, and the proposed SAP information-based clustering algorithm groups the RPs accurately according to its SAP similarity criterion. First, the Wi-Fi signals are collected and stored in RPs using IRSSMT, then the RPs are grouped as clusters according to their SAP similarity in the offline phase. Finally, the W k NN algorithm is employed in the online phase to estimate the user's position.
The rest of this paper is organized as follows: In Section 2, an overview of the proposed method is presented, and the proposed algorithms are described in detail. Section 3 consists of the experimental results and the performance evaluation of the proposed method. In Section 4, we present a discussion of the results. Finally, the conclusion and future works are discussed in Section 5.

The Proposed Wi-Fi Fingerprint Method
In the following section, we introduce our approach to the efficient indoor positioning method. We describe the overview of the proposed Wi-Fi RSS fingerprint method. Subsequently, we introduce the proposed concept of RSS preprocessing using IRSSMT and SAP information-based clustering. Finally, we describe the online localization algorithm used to select delegate clusters and find user locations.

Overview of the Proposed Wi-Fi RSS Fingerprint Positioning Method
The overall architecture of the proposed Wi-Fi RSS fingerprint method is shown in Figure 1. Since the proposed method is based on the traditional fingerprint algorithm, it also consists of two phases: the offline phase and the online phase. The traditional preprocessing methods only use the mean value of the RSS samples, and it does not reflect the ever-changing dynamics of RSS and the impact of the average value of its missing APs stored in the fingerprint database; to deal with this problem, we propose an IRSSMT. In the offline phase, we first collect the N number of original RSS observation samples at each RP. Then, the collected RSS observations are preprocessed using the median value of the RSS samples to make the RSS smoother and reduce the influence of shadowing and fading. The preprocessed RSS values are then stored in the fingerprint database. The traditional clustering methods cannot position the distribution of RPs; to overcome this problem, we propose an SAP information-based clustering algorithm. We first find the SAP information of RPs, and then that SAP information is assigned as the label of that RPs. Then, using the RP labels as the primary information, the RPs are partitioned into several clusters. This process ensures that the SAP information-based clustering results are consistent with the position distribution of RPs. In the online phase, one particular cluster is chosen as the delegate using SAP cluster matching. The user location is then calculated using the W k NN algorithm. The experiment results indicate that the proposed method can greatly improve the positioning accuracy.

Received Signal Strength Preprocessing and Fingerprint Database Construction
As mentioned earlier, due to the obstacles in indoor environments, Wi-Fi signals can be affected by reflection, diffraction, scattering, and absorption. Therefore, preprocessing the RSS is necessary in order to reduce these influences and make the RSS smoother. The flow chart of the IRSSMT is shown in Figure 2. In this new and improved method, N RSS observations were collected at a 3 s sampling rate at each RP. Any portable APs such as personal Wi-Fi hotspot (WiBro) and temporary Wi-Fi devices placed inside the room for personal use will affect the fingerprint database; to overcome this problem, the portable APs are filtered. Then the RSS samples are sorted in descending order. A weak Wi-Fi signal received from distant APs may affect the positioning accuracy by presenting less stability and reliability, this should be considered during data collection, so the empirical RSS threshold is set to t for rejection of APs with ineffective RSS values. In this research, −90 dBm (t = −90 dBm) is set as an empirical threshold value; it is suitable for our scenario, and it may change in other scenarios with different hardware and AP positions. The existing methods, such as the average method, error handling method, and particle filters, may not be able to handle the complex Wi-Fi RSS measurement noise in dynamic and ever-changing indoor environments [19]. For example, according to the distance between RP and AP in the environment, some APs may not be heard in some RPs due to fading and shadowing. In such cases, the missing APs were filled with −99 dBm, and it is considered as the maximum RSS received at the missing APs. These missing AP's maximum RSS values make a huge difference in the mean value of the collected samples. Therefore, the mean value of the collected RSS and existing methods may not reflect the dynamic behavior of the RSS and the impact of the missing APs. In this case, the median value of the processed RSS samples is a more appropriate representative of RP (see Section 3.1 for a thorough explanation); so, the median value of the original RSS samples is calculated.
The processed RSS vector is stored in the RP and is expressed as: where RP i is the processed RSS values stored at the ith RP, and RSS ij is the processed RSS of the jth AP at ith RP, i = 1, 2, . . . , n, where n is the number of RPs and j = 1, 2, . . . , m, where m is the number of existing APs. The RPs are then stored in the fingerprint database along with their RP position (X, Y coordinates), and the format of the fingerprint database is shown in Table 1. The first column represents the number of RPs in the database, X and Y are the coordinates (RP position) of the ith RP measured in the two-dimensional coordinate system of the floor plan. As mentioned in Equation (1), RSS ij represents the processed RSS value of the jth AP at the ith RP.

Strongest Access Point Information-Based Clustering Algorithm
We analyzed the similarities between RPs in the offline fingerprint database based on real-time data and found that the set of RPs obtained the strongest RSS value from the particular AP. We continued to analyze the similarities and collected 12 RPs in different locations in a two-dimensional coordinate system and analyzed their RSS values as shown in Table 2. Table 2. Analysis of similarities between RPs. The strongest RSS value obtained in RP1, RP2, RP3, and RP4 is from AP1, and RP5 to RP8 is from AP2, and RP9 to RP12 is from AP3. According to this real-time result of preprocessed data, we can see that the set of RPs belongs to the particular AP coverage area. Likewise, we can assume that the user is also present somewhere near that same AP coverage area, and those APs are called SAPs.

RSS of AP6
Inspired by this distribution characteristic of the Wi-Fi signals, we propose an SAP information-based clustering algorithm. The proposed SAP cluster strategy takes more consideration of the signal propagation of the APs and their environment. The AP that obtained the strongest RSS value in RP i will be set as its SAP label and is denoted by: where SAP i is the SAP label of the ith RP, argmax m j=1 RSS ij is the maximum value of RSS obtained from the jth AP at the ith RP, m is the number of APs for RP i The SAP number will be updated at the SAP label column as shown in Table 2.
In the fingerprint clustering algorithm, the similarity criteria can be calculated using the signal distance (SD) between RPs using Euclidean distance [35], whereas, in this proposed method, the similarity of the SAP labels between RPs is calculated using the following expression: where S d (RP i , RP j ) represents the similarity distance between the SAP label of the ith RP and jth RP. To obtain the clusters, the SAP information-based clustering is implemented based on the SAP label distance between RPs, as indicated in Equation (3). The set of clusters are denoted as: where C l is the group of RPs stored at the lth cluster, l = 1, 2, . . . , p where p is the number of clusters. In this method, the number of clusters is fixed, and it depends on the number of deducted SAPs in the indoor environment. Then, the mean value of the RPs in C l will be calculated and assigned as the cluster centroid. The set of cluster centroids are denoted as: where CC r is the number of cluster centroids stored in the rth CC and r = 1, 2, . . . , q where q is the number of CC.

Online Localization Algorithm
In the online phase, we first collect the RSS vectors at the test point (TP), and it is expressed as: where TP is the set of RSS vectors collected at the TP (user's position), and RSS j is the collected RSS of jth AP at the TP.

Cluster Matching
The main goal of the delegate cluster selection step is to reduce the RP space, through the selection of a delegate cluster centroid that matches the online reading, according to the SAP similarity metric. Instead of comparing the TP with all the RPs in the fingerprint database, we first compare the TP with the SAP cluster centroids CC r , then select the particular cluster with the minimum distance as the position set the TP belongs to. The delegate cluster is selected using the following equation: where D(TP, CC r ) is the distance between the TP and cluster centroids, RSS rj is the mean RSS value of the jth AP at rth CC. After selecting the delegate cluster using Equation (7), the search space for the algorithm is reduced.

Weighted k-Nearest Neighbor Algorithm
Once the delegate cluster is chosen, the TP is compared with all the RPs belonging to that cluster to determine the position of the user by applying the W k NN algorithm. The W k NN algorithm is an improved classification algorithm of k NN, and the working mechanism is to find k nearest RPs, which have a minimum signal distance from the RSS observed at the user location, and weighted average the positions of the selected k nearest RPs according to their signal distance. The W k NN global method will apply weights to all the data points in the database, which also increases the computational overhead. To overcome this issue, the W k NN local method is applied in this research. The weights are applied only to the k found neighbors. It chooses the RP that has the highest weight and minimum distance to TP in the signal space [36].
The k-nearest neighbor is calculated by: where D(TP, C l ) is the distance between the TP and the RPs in the lth delegate cluster C. The distance values are sorted in ascending order, and the set of k nearest RPs can be selected according to their minimum distance. In the k NN algorithm, the k value can be chosen as: The coordinate of the k nearest RPs with a minimum distance in all D(TP, C l ) is calculated by: where (x, y) is the user position in the Cartesian coordinates, (xi, yi) is the position of the ith RP, andωi is the weighting factor of the ith k selected RP. The weightωi can be calculated by: The closest RP with the highest weight and the lowest distance among the k selected RP is considered as the user position.

Performance Evaluation
To validate the performance of our proposed method, two experiments were conducted. Experiment 1 was conducted in the information and communication engineering (ICE) department corridor, 4th floor of the engineering building in Wonkwang University, Republic of Korea. Figure 3 shows the schematic diagram of the corridor on the fourth floor. There were more than 20 APs deployed, and a total of 60 RPs were marked in the experimental field. The accuracy of the position depends on the density of RPs, so the distance between adjacent RPs was set to 1.8 m, and the length of the corridor was 60 m. The position of the RP was measured in the two-dimensional coordinate system and was marked in the X, Y coordinate of the floor plan. A Toshiba Satellite C50-A with an Intel Core i5-3230M CPU 2.60 GHz, 4 GB RAM, and a 64-bit OS was used as the server and the client machine. To deduct APs and collect Wi-Fi RSS data, we used a Wi-Fi network scanner application called "inSSIDer" version 4.4.0 from MetaGeek, LLC. Experiment 2 is explained in Section 3.4.

Results of the Improved RSS Measurement Technique
There are more than 15 APs deducted at each RP, but they were filtered as mentioned in Section 2.2. The mean and median value was taken from N (N = 5) samples of the RP and is compared as shown in Figure 4. The RSS from APs 6,7,8,11,12,14,15 was not heard in some samples due to obstacles in the indoor environment, and these are called missing APs. In this case, −99 dBm was set as its maximum RSS value for those APs as mentioned in Section 2.2, and it makes a huge difference between the mean and median values. Generally, the RSS characteristics are highly based on the mean or probability of the Wi-Fi signal. However, due to fading and shadowing in a complex and ever-changing indoor environment, the characteristics of the RSS are spatially and temporally different. The basic general situations are: (1) The Wi-Fi signal probability distribution is negativeskewed in the presence of weak multipath interference; (2) the Wi-Fi signal probability distribution is positive-skewed if the multipath is stronger than the signal. The mean value of the selected RSS is used in [19]; due to the above-mentioned reasons, the mean value of RSS may not be accurate, and it will affect the positioning accuracy, so the median value of RSS is generally considered to be the most suitable representative of the center of the RSS value.  To verify that the improved RSS measurement will reduce Wi-Fi signal fluctuations, we compared the waveforms of the original N RSS samples and the processed RSS of the Wi-Fi signal, as shown in Figure 5. There are N (N = 5) RSS samples collected at the same place from 17 APs, and each sample has a different RSS value from the same AP with a different time interval. This shows the importance of signal preprocessing.

Results of the SAP Cluster Experiment
The RPs are grouped into three clusters using their SAP information, and their cluster centroid was calculated according to the description in Section 2.3, and as shown in Figure 6. The three different colors represent three different clusters. To compare the performance of the SAP cluster with the traditional k-means, we set the k-means as a comparison group because it is one of the most widely used robust clustering algorithms for Wi-Fi RSS fingerprint-based indoor positioning [37]. We can see from Figure 6 that the similarity between RPs in cluster 0 shows that the strongest RSS is received from SAP 1, and in clusters 1 and 2, the strongest RSS is received from SAPs 2 and 3, respectively. The SAP cluster is plotted as shown in Figure 7.
Usually, in k-means cluster analysis, the Elbow method is used in determining the optimal k value [22]. In this research, we used two different k values for the k-means algorithm, and their result was compared with the proposed SAP algorithm to show the similarity correlation between RPs.
The cluster results of k-means and the comparison between k-means and SAP are shown in Figure 8. In the k-means cluster, k = 2, and k = 5 are applied and are plotted as shown in (a, c), and (b, d) shows the comparison between the k-means and the SAP cluster results. In (b, d), three different groups of RPs on the Y-axis 0, 1, 2 represent three different SAP clusters (cluster 0, 1, 2). The RPs with green and red colors in (b) represent k-means clusters where k = 2. Likewise, five different colors in (d) represent k-means clusters where k = 5. The simulation result of the k-means algorithm shows that the RPs that are less relevant are divided into the same cluster when the k value is small, and the RPs that are relatively relevant are divided into different clusters when the k value is large. Whereas, in this proposed SAP cluster, the grouped RPs are more relevant to each other, and the number of clusters is fixed.

Results of the Positioning Experiment
The TPs were taken at the corridor of the experiment field, and the location of the user is estimated using the traditional methods and proposed method. The results of the positioning experiment of the proposed method are shown in Figure 9. The black arrow marks and circles represent the distance between the TP (original user location) and the estimated user location. The error distance between the real user location (TP) and estimated user location is calculated using the below equation: where DEE is the distance estimation error between the TP and the estimated user location, x test , y test is the X Y coordinate of the TP, and x est , y est is the X Y coordinate of the estimated user location.
To evaluate the performance of our proposed algorithm, the positioning accuracy was compared with the traditional W k NN algorithm, W k NN + k-means algorithm, and decision tree algorithm, [25,30] shows that these algorithms are reasonable benchmarks. The comparison of DEE between the proposed algorithm and the other algorithms is shown in Figure 10.
As shown in Figure 10, 35% of TPs (6 out of 17 TPs) have achieved the minimum error (0 m) in the proposed method, whereas the W k NN, W k NN + k-means, and decision tree algorithms achieved 0% (0 out of 17 TPs), 5% (1 out of 17 TPs), and 11% (2 out of 17 TPs) respectively. The maximum error of the proposed method is 2.54 m, whereas the maximum error of W k NN, W k NN + k-means, and decision tree algorithm are 10.86, 6.41, and 26.1 m, respectively. The physical distance between the TP and the estimated user location is more consistent with the proposed method.

Experimental Environment 2
To validate the applicability of our proposed method in a different scenario with different hardware and AP positions, we experimented with our method in a larger indoor environment. Experiment 2 was conducted on the electronics convergence engineering (ECE) department, which is located on the 4th floor of the engineering building. The schematic diagram and the clustering results of experimental environment 2 are shown in Figure 11. A total of 179 RPs were marked in the experimental field, and the distance between adjacent RP was 1.8 m. A total of 895 RSS samples were collected and stored in 179 RPs after preprocessing according to Section 2.2. Overall, 25 APs were deducted during the site survey, and 10 APs were selected as SAPs according to their signal distribution in the environment. The RPs were grouped into 10 different clusters according to their SAP information. Different colors represent different clusters in Figure 11. The TPs were collected at different locations, and the user position is finally estimated.

Discussion
In this section, we examine the results of the proposed method and various algorithms used to acquire the user location. We collected 17 TPs in experimental environment 1 and 35 TPs in experimental environment 2. We analyzed the distribution of error distance using the cumulative distribution function (CDF), which is a method used to describe the distribution of error distance, and it is predominantly useful to observe the DEE as a cumulative error distribution function. For convenience, the CDF was adopted for the theoretical analysis in this paper. As indicated in [38], it is probable that a normal function variant is between 0 and 1, or it can be explained as the probability at which specific error occurred as in the following equation: Using the above equation, we could identify the probability of positioning error at some definite distance. The probability of DEE is calculated, and the CDF is shown in Figure 12. As shown in Figure 12, in experiment environment 1, the probability of getting less than 2 m of DEE in the proposed method is 0.941, whereas in the W k NN, W k NN + kmeans, and decision tree are 0.058, 0.411, and 0.352, respectively. In experiment environment 2, the probability of getting less than 2 m of DEE in the proposed method is 0.571, whereas in the W k NN, W k NN + k-means, and decision tree are 0.371, 0.4, and 0.371, respectively. The CDF of the distance estimation error shows that the proposed method obtained a higher probability of better accuracy in both experiment environments 1 and 2 compared to the other methods, which proves that the proposed method is not limited to specific scenarios. The main reason for the improved accuracy is that the SAP cluster divides the RPs more accurately according to its SAP similarity between RPs. In this case, the TPs collected at the boundary of the clusters are well classified into its delegate cluster during simulation, which helps to avoid the error. The better clustering results have produced better positioning accuracy.

Computational Complexity
The main purpose of using clustering algorithms is to reduce the computational complexity of the online localization phase. Furthermore, it will lead to an improvement in positioning accuracy by eliminating possible outliers [35].
The W k NN algorithm simply compares the TP with the fingerprint database and calculates the distance between TP and RPs. Then, the k nearest RPs (k = 4) is found from the fingerprint database. Finally, the nearest RP with minimum distance and highest weight among the k found neighbors is considered to be the user location. In this algorithm, to find the user location, the TP is compared with all the RPs that are stored in the fingerprint database, which causes high computational complexity.
The W k NN + k-means algorithm first groups the RPs and makes clusters. Instead of comparing the TP with all the RPs in the database, only the particular cluster with minimum distance to the TP is taken for the next process. Finally, the W k NN algorithm is performed in that particular cluster to find the user location. For example, if there are 1000 RPs in the fingerprint database, it will be divided into several clusters using the k-means algorithm (if k = 10, each cluster will have 100 RPs), instead of comparing the TP with all the 1000 RPs in the fingerprint database, it selects the single suitable cluster and calculates the user position by comparing only 100 RPs in that selected cluster. The remaining 900 RPs will be omitted, which helps to reduce the computational complexity. Meanwhile, as mentioned in Section 1, the k-means clustering algorithm randomly places its centroids to the fingerprint database to make clusters, and it does not consider the nonlinear similarity between RPs, which seriously reduced the positioning accuracy during the simulation.
Whereas, in this proposed method, the clusters are made perfectly by grouping the RPs using its SAP similarity. Therefore, the accuracy of this method is better than the other two compared algorithms. Since the proposed method has a clustering technique, so it can also reduce the computational complexity.
The Big O notation is used in this paper for complexity analysis. It is the most common metric for calculating runtime complexity (execution time). It describes the execution time of a task concerning the number of operations required to complete it. All these traditional and proposed methods are based on the W k NN algorithm and the prediction time complexity of W k NN is expressed by: where k is the number of neighbors that we considered in the algorithm (k = 4), n is the number of RPs in the training dataset, and d is the data dimensionality (RSS vectors received from different APs). Equation (14) loop through all points k times. First, it computes the distance between TP and RPs in the training dataset, then it remembers the index of the element with the smallest distance and adds the class at found RP to the counter. Finally, it returns the class with the most votes as a determined user location. When the number of searching points (RP) in the dataset is high, the prediction time complexity will be higher. The dataset and the execution time are shown in Table 3. As shown in the above table, there are no clusters in the W k NN method, and the TP is directly compared with all the RPs in the fingerprint database. In contrast, the W k NN + k-means algorithm has three clusters in both experimental environments 1 and 2 (the number of clusters in k-means comes from the Elbow method where k = 3). On the other hand, the proposed method has three clusters in experimental environment 1, and 10 clusters in experimental environment 2 (the number of clusters in the proposed method comes from the SAP cluster technique). It can be seen from the results that the execution time highly depends on the number of RPs stored in that cluster, and the user location calculation is significantly reduced in the proposed method.

Error Statistics
To show the efficiency of the proposed method more intuitively, Table 4 lists the comparison of error statistics between the proposed method and the other traditional methods. The error metrics used in Table 4 are followed from [30]. As we can see from the above table, the 50% and 75% mean error of DEE of the proposed method is less than the other compared methods in both experimental environments. The mean error of DEE is used in this paper as a metric to judge the positioning accuracy. The mean error of the proposed method in experimental environment 1 is 80.94%, 69.69%, and 89.06%, and in experimental environment 2 is 32.80%, 32.58%, and 67.48% lower than the mean error of the other methods used for comparison. We can see from Figures 10 and 12 and Tables 3 and 4 that the proposed indoor positioning algorithm outperforms traditional W k NN, W k NN + k-means, and decision tree indoor positioning algorithms. The results indicate that the proposed method can significantly improve positioning accuracy.
The positioning accuracy of this Wi-Fi RSS fingerprint-based indoor positioning highly relies on the number of acquired RPs and RSS observed in the indoor environment. Meanwhile, the ever-changing APs in the indoor environment (for example, the addition of new APs and removal of damaged existing APs) are one of the nagging issues of this method. Moreover, the changes in the indoor infrastructure (modifying the structure of the environment, for example, adding new rooms and removing existing partitions) will also reflect the fingerprint database. Due to the above-mentioned reasons, the positioning accuracy of this method is limited.

Conclusions
In this paper, we presented an efficient Wi-Fi indoor positioning method based on the IRSSMT Wi-Fi signal extraction method and the SAP information-based clustering algorithm. The proposed method is mainly intended to cope with the issue of multipath interference in indoor environments, impacts of the average value of missing APs, and randomly constructed clusters with randomly placed cluster centroids. The proposed method is based on the analysis of the characteristics of the Wi-Fi signals. The selected RSS median value is employed for better offline fingerprint database construction, and the SAP information reflects the better RP similarity and helps to create the most appropriate cluster. The performance of this proposed method is evaluated by experiments in two different indoor environments. The results show that the proposed IRSSMT and SAP information-based clustering algorithm outperform the traditional indoor positioning algorithms. Besides, the SAP information-based clustering algorithm produces significantly better clustering solutions quite consistently according to the SAP similarity measures of cluster quality, and the proposed algorithm has better-improved accuracy by 89.06% and 67.48% when compared to the other two algorithms. In addition, the run time of the proposed method is incredibly attractive, which greatly reduced the computational complexity. For future work, we will analyze and modify various filters, clustering algorithms, and localization algorithms to enhance the environmental adaptability of fingerprint positioning.