WiFi Access Points Line-of-Sight Detection for Indoor Positioning Using the Signal Round Trip Time

Abstract: The emerging WiFi Round Trip Time introduced in the IEEE 802.11mc standard promises sub-meter-level accuracy for WiFi-based indoor positioning systems, under the assumption of an ideal line-of-sight path to the user. However, most workplaces with furniture and complex interiors cause the wireless signals to reflect, attenuate, and diffract in different directions. Therefore, detecting the non-line-of-sight condition of WiFi Access Points is crucial for enhancing the performance of indoor positioning systems. To this end, we propose a novel feature selection algorithm for non-line-of-sight identification of the WiFi Access Points. Using the WiFi Received Signal Strength and Round Trip Time as inputs, our algorithm employs multi-scale selection and Machine Learning-based weighting methods to choose the optimal feature sets. We evaluate the algorithm on a complex campus WiFi dataset and demonstrate a detection accuracy of 93% for all 13 Access Points using 34 out of 130 features and only 3 s of test samples at any given time. For individual Access Point line-of-sight identification, our algorithm achieved an accuracy of up to 98%. Finally, we make the dataset publicly available for further research.


Introduction
Although GPS has been indispensable for outdoor positioning, robust indoor positioning remains a research challenge. First, modern buildings with complex interiors make it difficult for the weak GPS signals to penetrate. Second, the 5-10 m GPS accuracy cannot provide indoor users with the positioning accuracy they need for room-level tracking. To address these challenges, several technologies were proposed in the literature and applied in the real world [1]. Due to the ubiquity of WiFi-enabled devices, WiFi-based indoor positioning has drawn much attention. Indoor positioning systems using the WiFi Received Signal Strength (RSS) were widely reported to achieve 2-3 m accuracy on average [2,3]. However, the challenges for WiFi RSS-based systems were signal instability and spatial ambiguity caused by multipath interference [4].
In recent years, the introduction of WiFi Round Trip Time (RTT) from the IEEE 802.11mc standard, which measures the travelling time of the signal between the transmitter and receiver, has promised sub-meter positioning accuracy, under the assumption of a clear line-of-sight (LOS) path. With RTT, positioning systems could trilaterate the user's location, assuming that the signal measure reflects the true distance. However, workplaces with plenty of furniture often do not have a clear LOS path from the WiFi Access Points to most locations, which impacts the wireless signal's integrity. In such environments, the WiFi RSS and RTT signals can be attenuated, reflected, blocked, or interfered with, resulting in fluctuating and unpredictable measures [5]. Since the WiFi signal travels at the speed of light, the fluctuating and reflecting nature of RTT propagation in complex indoor spaces would result in large positioning errors with the trilateration technique. Moreover, the instability of WiFi signal measures in non-line-of-sight (NLOS) scenarios would create highly similar values of WiFi measurements at two distinct locations several meters apart. Such similarity of signal measures would greatly decrease the accuracy of WiFi fingerprinting in complex indoor environments. Therefore, detecting the LOS condition of the WiFi Access Points is of great importance in enhancing the system performance.
To this end, we propose a framework for WiFi LOS detection by automatically selecting the most informative feature set using Machine Learning and weighting methods. For further optimization, we develop a novel multi-scale selection (MSS) method to validate the importance of the features on multiple scales. The proposed framework uses the correlation between the input features and ground-truth labels to decide the importance of each feature. To further investigate the informativeness of the features, datasets of different sampling sizes are used for feature validation.
In the preprocessing stage of the proposed framework, statistics of the input WiFi RSS and RTT measurements were computed and fed into an importance filter. Several popular feature selection models were used in the importance filter, each deciding its own feature set based on a different algorithm. Then, the statistical features chosen by the feature selection models were assigned initial weights based on their macro F1 score and accuracy in LOS identification. To validate the selected features from both macroscopic and microscopic perspectives, multi-sampling datasets were introduced. Based on the performance of the selected feature set, weight adjustment was used to reselect the features recursively. To evaluate the performance and transferability of the proposed algorithm, a large-scale real-world campus building floor was used as the testbed. Each location in the dataset was manually labeled and verified for ground truth. Since the framework only focuses on reducing high-dimensional data based on the relevance between input signal measures and the output, it could be applied to other signal measurements in indoor positioning.
The article's contributions are summarized as follows:

• A novel feature selection framework was proposed to identify the LOS conditions of WiFi APs with high accuracy even with few data samples, while using fewer Machine Learning features than the existing state of the art.

• A large-scale real-world dataset for a campus floor was collected and made available for further research. To the best of our knowledge, this was the first publicly available dataset that contains both WiFi RSS and RTT signal measures, as well as the LOS conditions of each AP for every location.

• We analyzed our framework on this dataset to evaluate its efficiency and to provide a baseline performance for further research.
The rest of the article is organized as follows: Section 2 introduces the related work in WiFi LOS identification. Section 3 provides a detailed description of the framework architecture. The data preprocessing and the proposed feature selection method are investigated in Section 4, and the experimental setup and empirical performance are presented and analyzed in Section 5. Finally, Section 7 concludes our work and outlines future work.

Related Work
The Non-Line-of-Sight (NLOS) scenario has always been a challenge for most positioning systems. For instance, although promising positioning accuracy is provided by the Global Navigation Satellite System (GNSS) [6][7][8][9][10] in most outdoor spaces, GPS still struggles where the signals are disrupted by skyscrapers and poor weather conditions. To address the problem, the system proposed by [11] leveraged the vector tracking loop (VTL) to detect NLOS and perform corrections. Features such as noise bandwidth, time delay of multi-correlator peaks, and code discriminator outputs were used as the input data.
Similarly, Massive Multiple-Input Multiple-Output (MIMO) systems also suffered from the same NLOS challenge [12][13][14][15][16]. In [17], indoor MIMO channel measures were analyzed for kurtosis-based LOS detection. The importance of introducing kurtosis statistical features was investigated based on the channel impulse response (CIR). A stochastic model was developed in [18] for outdoor LOS/NLOS scenarios. Multipath components (MPCs) extracted from sub-array outputs were assessed and identified as spatial-stationary (SS) for modelling. To improve vehicle MIMO localisation systems, Support Vector Machines were proposed for LOS identification by processing CIR information [19]. The role of small cells was investigated in [20] for optimizing downlink heterogeneous cellular networks under LOS and NLOS transmissions. Besides using a convolutional neural network (CNN) in a 3-D massive MIMO channel model, LOS detection in [21,22] treated the problem as a binary hypothesis test. Based on time-space-frequency channel correlation, the system in [22] aimed to improve the new radio capacity and spectral efficiency of the 5G network.
For ultra-wideband (UWB) systems, in ideal LOS conditions, the positioning accuracy was widely reported to be at the centimeter level [23][24][25][26][27]. Researchers have also attempted to address NLOS conditions for UWB indoor spaces. In [28], recursive decision tree learning was used to exploit the UWB data for LOS detection. The CIR information extracted from UWB signals was taken into consideration. Machine Learning methods were leveraged to mitigate the deviation of NLOS UWB measurements [29]. In [30], a multi-layer perceptron and a CNN were used to make predictions. A 2D Time Difference of Arrival (TDoA) framework based on deep Q-learning was proposed in [31] to make efficient LOS node selection. The system in [32] introduced the Morlet wave transform (MWT) to perform LOS detection based on time-domain characteristics. Table 1 compares the performance of LOS identification in different systems.
Despite its high accuracy, the disadvantage of UWB positioning systems is that they use proprietary beacons. On the contrary, WiFi-based indoor positioning systems leverage existing WiFi APs. To achieve high LOS identification accuracy, Channel State Information (CSI) was used [2][33][34][35][36][37]. A detailed description of the channel properties could be extracted from CSI to identify the propagation situation of the WiFi signal [38]. Phase information, amplitude information, Time-of-flight (ToF), and Angle-of-arrival (AoA) [39] extracted from CSI were commonly used as inputs to identify LOS conditions. Similar to MIMO and UWB systems, the CIR converted by the Inverse Fast Fourier Transform (IFFT) also helped improve the identification accuracy [40][41][42]. In [40], the system achieved 90.5% LOS identification accuracy when using Rician-K and skewness derived from CSI. The root mean square delay spread, skewness, and kurtosis of CSI were used in [43] for LOS detection with an accuracy of 95%. Systems proposed in [44,45] investigated the potential of the power-delay profile and the power-angle spectrum, respectively. Other than exploiting the phase information of each sub-carrier [41], statistics of the CIR were also studied for their performance in LOS classification [46][47][48][49][50][51]. The system proposed by [47] detected LOS APs with an accuracy of up to 94%. The standard deviation, kurtosis and skewness of the CIR were used in [46] and delivered an accuracy of 95% in detecting the LOS situation. Furthermore, CSI could also be leveraged to detect human activities. The system proposed in [52] used existing WiFi equipment for location-oriented activity identification at home based on CSI signal measurement. In [53], CSI was used to detect human respiration based on the Fresnel model. Although CSI provides detailed channel information of the WiFi signal measures and is more informative and efficient in indoor positioning, it is hard to access. CSI information could only be acquired on a PC with a modified WiFi driver such as the Intel 5300 NIC. These limitations make it challenging to use in mobile devices such as smartphones and tablets. As this article focuses on the wider implementation of WiFi-based indoor positioning on heterogeneous devices, CSI was not considered as one of the input signal measurements in our empirical experiments, although our proposed framework could also be applied to CSI measures.
In addition to CSI, WiFi Received Signal Strength (RSS) and Round Trip Time (RTT) were often employed, due to their accessible nature in all WiFi-enabled devices. In addition, the ESP32 system also supports WiFi RSS and RTT signal measurements. ESP32 is a low-cost and low-power-consumption device integrated with a series of chips to support Wi-Fi and Bluetooth. For WiFi RTT measurement testing and collection on ESP32, a utility software called Chronos was created, as introduced in [54]. One of the most famous WiFi-based positioning techniques for WiFi RSS is fingerprinting. The more complicated the interior along the WiFi propagation path, the more unique the WiFi RSS measurements. Thus, in a real-world indoor space, each location will have its own WiFi RSS pattern. Such distinguishing WiFi RSS patterns could be leveraged by indoor positioning systems to make precise positioning estimations. As shown in Figure 1, fingerprinting consists of two phases: an offline phase and an online phase. In the offline phase, a dataset is built in the targeted testbed. WiFi measurements are recorded at each reference point and preprocessed before being stored in the dataset. Each data sample is carefully labeled with the ground-truth coordinates of the reference point where the WiFi signal measures are collected. In the online phase, when the user reports a real-time WiFi RSS measurement from an unknown location, the system will match the measurement with those in the dataset and make a positioning estimation based on their relevance. Fingerprinting could also be used in WiFi RTT-based indoor positioning systems.

Figure 1. The basic architecture of a classic WiFi indoor fingerprinting system. This system has two phases: the offline phase and the online phase. In the offline phase, the WiFi signal measurements are collected, preprocessed, labeled and stored in the dataset. In the online phase, the WiFi measurements received by the user from an unknown location are compared with the measurements in the database by the positioning algorithm to obtain the final location estimation.
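The two fingerprinting phases can be illustrated with a minimal sketch. It assumes the simplest possible matching rule, a nearest neighbour in signal space using Euclidean distance; the text only states that measurements are matched "based on their relevance", so this rule, the function name `locate`, and the toy RSS values are all hypothetical.

```python
import math

# Offline phase: each entry pairs ground-truth coordinates (metres) with the
# RSS (dBm) observed from three APs at that reference point. Values are made up.
fingerprint_db = [
    ((0.0, 0.0), [-40.0, -70.0, -80.0]),
    ((5.0, 0.0), [-70.0, -45.0, -75.0]),
    ((5.0, 5.0), [-80.0, -72.0, -42.0]),
]

def locate(database, measurement):
    """Online phase: return the coordinates of the closest stored fingerprint."""
    best = min(database, key=lambda entry: math.dist(entry[1], measurement))
    return best[0]

# A new scan reported from an unknown location.
estimate = locate(fingerprint_db, [-68.0, -47.0, -77.0])
print(estimate)
```

Practical systems usually average several neighbours or train a regressor instead of returning a single reference point.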
The most common WiFi LOS identification method in the literature was based on the statistical features of the signal measures. Several works employed the Round Trip Time (RTT) [55][56][57][58][59][60] or the Received Signal Strength (RSS) [61][62][63][64]. The researchers in [65] used RSS to identify LOS conditions. A Gaussian model was leveraged to make detections based on RSS signal measures in the system proposed by [66]. RTT measurements along with pedestrian dead reckoning were used in the LOS identification system in [67]. To make use of the statistics from WiFi signal measures, the standard deviations of both RSS and RTT were used in [68] for NLOS error detection. The kurtosis and the mean value of both RSS and RTT measures were included in the identification system of [69]. Furthermore, a wider range of statistics was covered in the LOS and NLOS channel detection system in [70]. In addition to kurtosis and mean, skewness, hyper-skewness, and peak probability extracted from RSS measures were investigated [70]. As concluded in [71], the statistical features of RTT are more indicative for LOS identification than those of RSS. It was widely reported that the positioning and LOS identification accuracy of WiFi RSS-based or RTT-based systems is not as high as that of CSI-based ones. However, the ease of access to RSS and RTT signal measures makes them more appealing for WiFi-based indoor positioning systems. In addition, to build on the large body of existing WiFi RSS and RTT work, we focus this article on these measures to improve their performance.
Most importantly, the above approaches rely on the manual selection of the features. They enumerated different combinations of features extracted from all WiFi APs. However, since not all APs are informative, redundant information could reduce the identification accuracy. This is the gap our proposed framework addresses.

System Architecture and Problem Formulation
This section introduces the architecture of our proposed framework in detail, and formulates the problem to be investigated.

System Architecture
The architecture of the proposed framework is shown in Figure 2:

• Step 1: A feature preprocessing method is proposed to extract the statistical features from raw WiFi training data. The mean, median, standard deviation, Kurtosis, and Skewness are calculated. Then, several Machine Learning feature selection models are used to analyze the importance of each feature and generate different sets of features.

• Step 2: Each feature set is assigned an initial weight based on its macro F1 score and accuracy. These weights are fed into the Feature Selector in the next step, and used for generating an initial set of features.

• Step 3: A multi-scale selection (MSS) method is used to reduce the weights of the uninformative features. The MSS method uses several scales of the datasets to select the features from different perspectives. In doing so, features that are important over both long-term and short-term periods are selected. The process is repeated until an optimal set of features is decided.

• Step 4: Using the selected set of features from the previous step, a LOS identifier (e.g., Random Forest Classifier) is employed to make LOS detections for the WiFi APs.

Problem Formulation
Without loss of generality, the testbed is evenly divided into grids where each cell represents a reference point. Please note that no reference point is shared between the training and testing data. A total of J grid cells are used as training reference points $R_j$ ($j = 1, 2, \ldots, J$). K consecutive scans of raw WiFi RSS and RTT signal measures from T WiFi APs are collected at every point $R_j$:

$$X_{RTT_j} = \{x^{(k)}_{RTT_j}\}_{k=1}^{K}, \qquad X_{RSS_j} = \{x^{(k)}_{RSS_j}\}_{k=1}^{K}$$

where each scan holds one measurement per AP $t$, with $t = 1, 2, \ldots, T$.

The raw training data are defined as $D = \{\mathcal{X}, \mathcal{Y}\}$, where $\mathcal{Y} = \{Y_j\}_{j=1}^{J}$ and $\mathcal{X} = \{X_{RTT_j}, X_{RSS_j}\}_{j=1}^{J}$. When the raw testing WiFi signal measures $X_{Test}$ at $R_{Test}$ are collected by the user, they are preprocessed so that only the features selected by the Feature Selector remain.

Then, the RFC identifies the LOS condition $y^{(t)}_{Test}$ of the $t$-th AP. Finally, the LOS detection result $\{y^{(1)}_{Test}, \ldots, y^{(T)}_{Test}\}$ is obtained.

Feature Preprocessing and Feature Selection Algorithms
This section provides detailed descriptions of our proposed framework, including the feature preprocessing, initial weights assignment, feature selector and data validation.

Feature Preprocessing
As shown in Algorithm 1, during the Feature Preprocessing step, the statistics of the WiFi RSS and RTT are computed and filtered based on their importance and correlation to the LOS ground truth. Several traditional feature selection models are employed to analyze the importance of the features.

Algorithm 1 Feature preprocessing and initial weights assignment
Require: 𝒳: input data, Y: label, RFC: Random Forest Classifier, F1: macro F1 score calculation, Acc: accuracy calculation, N_initial: number of features expected
Ensure: X: statistical features, X: selected features, W: initial weights
1: Models ← {all included feature selection models}
2: M ← |Models|
3: for X in 𝒳 do    ▷ 𝒳 includes both X_RTT and X_RSS
4:     µ ← Mean(X)
5:     Med ← Median(X)
6:     σ ← Standard Deviation(X)
7:     S ← Skewness(X)
8:     K ← Kurtosis(X)
...
20:    w_nm ← 0
...

Using the raw WiFi training data, the preprocessing step leverages a feature extraction method to generate the statistical features. Mean (µ), median (Med), standard deviation (σ), Skewness (S) and Kurtosis (K), which were reported to be the most informative features for LOS identification [2], are computed from the WiFi RSS and RTT input data. For statistics calculation, the mean and central moment are defined as follows:

$$\mu = \frac{1}{K}\sum_{k=1}^{K} x_k, \qquad \mu_n = \frac{1}{K}\sum_{k=1}^{K} (x_k - \mu)^n$$

where $x_k$ indicates the RTT or RSS data collected at a specific reference point, $K$ is the total number of data samples for statistics calculation, and $\mu_n$ is the $n$th central moment. Based on the mean $\mu$ and the $n$th central moment $\mu_n$, the standard deviation (σ), Skewness (S) and Kurtosis are computed as follows (the Kurtosis is written out in full to avoid a clash with the sample count $K$):

$$\sigma = \sqrt{\mu_2}, \qquad S = \frac{\mu_3}{\sigma^3}, \qquad \mathrm{Kurtosis} = \frac{\mu_4}{\sigma^4}$$

The raw training data 𝒳 will be replaced by a new statistical feature vector X = {µ_RTT, µ_RSS, Med_RTT, Med_RSS, σ_RTT, σ_RSS, S_RTT, S_RSS, K_RTT, K_RSS}.
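The five statistics can be computed directly from the central moments. The following is a minimal sketch, assuming population (1/K) normalisation and non-excess kurtosis (fourth central moment divided by the squared variance); the function names and the sample values are hypothetical.

```python
import math
import statistics

def central_moment(samples, n):
    """n-th central moment with 1/K normalisation (assumed)."""
    mu = sum(samples) / len(samples)
    return sum((x - mu) ** n for x in samples) / len(samples)

def statistical_features(samples):
    """Mean, median, standard deviation, skewness and kurtosis of one AP's scans."""
    mu = sum(samples) / len(samples)
    sigma = math.sqrt(central_moment(samples, 2))
    if sigma == 0.0:
        # Assumption: a constant window has no shape, so report zeros.
        skewness = kurtosis = 0.0
    else:
        skewness = central_moment(samples, 3) / sigma ** 3
        kurtosis = central_moment(samples, 4) / sigma ** 4
    return {
        "mean": mu,
        "median": statistics.median(samples),
        "std": sigma,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Eight made-up RTT distance readings (metres) from one AP at one reference point.
rtt_scans = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
features = statistical_features(rtt_scans)
print(features)
```

Applying the same function to the RSS scans of every AP yields the full statistical feature vector for a reference point.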

Importance Filter
In this step, an importance filter is employed to remove the less important features from the previous feature preprocessing step. Several feature selection models are leveraged to analyze the importance and correlations of the features in X to the ground truth label Y. Each model selects the top N_initial features based on its evaluation for the next step. The statistical features are ranked by their importance so that the least important features and those with the weakest correlation to the ground truth are removed.
We use the most popular feature selection models, namely Permutation Importance (PI), Hierarchical Clustering (HC), Fisher's Score (Fisher), Recursive Feature Elimination (RFE), Least Absolute Shrinkage and Selection Operator (Lasso), Mean Decrease in Impurity (MDI), Pearson Correlation (Pearson) and Chi-squared (Chi). A short description of the models is as follows:

Permutation Importance (PI). The Permutation Importance model uses the mean decrease in accuracy of a chosen classifier as the evaluation metric to calculate the feature's importance. To investigate the importance of each feature $x_n$ from the original feature set, PI randomly shuffles the values of that feature during each iteration $r$ ($r = 1, 2, \ldots, R$). The shuffled feature set is then fed into the classifier for identification. The impact of the shuffled feature is captured by the mean accuracy decrease MAD. Thus, the correlation between the feature and the ground truth label can be evaluated, which also makes PI suitable for non-linear feature selection [73]. The mean decrease accuracy importance MAD of $x_n$ is calculated using both the average accuracy of the shuffled data and the accuracy Acc of the original feature set, as follows:

$$MAD(x_n) = Acc - \frac{1}{R}\sum_{r=1}^{R} Acc^{(n)}_r$$

where $Acc^{(n)}_r$ is the accuracy obtained in iteration $r$ after shuffling $x_n$. As shown in Figure 3, the importance of each feature in the proposed dataset is listed in the original feature order from left to right. Please note that features with negative MAD values are of the least importance for LOS detection.

We chose this model, as it was the underlying model that achieved more than 90% accuracy in breast cancer margin identification [74], and high accuracy in short-term electricity load forecasting [75].
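The mean accuracy decrease can be sketched in a few lines. This is an illustrative version only: it assumes MAD is the original accuracy minus the average accuracy over R shuffles, and it replaces the trained classifier with a fixed threshold rule (`threshold_rule`, a hypothetical stand-in) so that the example stays self-contained.

```python
import random

def accuracy(predict, rows, labels):
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(labels)

def mean_accuracy_decrease(predict, rows, labels, n_features, repeats=20, seed=0):
    """MAD per feature: baseline accuracy minus mean accuracy after shuffling it."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    mad = []
    for n in range(n_features):
        shuffled_scores = []
        for _ in range(repeats):
            column = [row[n] for row in rows]
            rng.shuffle(column)  # break the link between feature n and the label
            permuted = [row[:n] + [v] + row[n + 1:] for row, v in zip(rows, column)]
            shuffled_scores.append(accuracy(predict, permuted, labels))
        mad.append(baseline - sum(shuffled_scores) / repeats)
    return mad

def threshold_rule(row):
    """Stand-in classifier: a small RTT standard deviation means LOS (label 1)."""
    return 1 if row[0] < 1.0 else 0

# Toy data: feature 0 is an RTT standard deviation, feature 1 is unrelated noise.
rows = [[0.2, 5.0], [0.3, 1.0], [0.1, 3.0], [0.25, 4.0],
        [1.5, 2.0], [1.8, 5.0], [2.1, 1.0], [1.6, 3.0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

mad = mean_accuracy_decrease(threshold_rule, rows, labels, n_features=2)
print(mad)
```

Shuffling the informative feature lowers the accuracy, whereas shuffling the feature the classifier ignores leaves it unchanged, so its MAD is zero.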
Hierarchical Clustering (HC). Although the above permutation importance model already investigates the correlation between each feature and the output, some features may have similar importance when they are closely related. Therefore, redundant features still remain in the feature set after the selection by the PI model. To avoid the impact of duplicated information, Hierarchical Clustering (HC) is introduced.
To identify closely related features, HC groups all the features into separate clusters so that each cluster is clearly distinct from the rest. First, HC assigns every feature to its own cluster. Using Ward's linkage on a distance matrix converted from the Spearman correlation matrix, HC investigates the similarity among the clusters. Then, the two most similar clusters are merged. By recursively repeating this process, the final set of feature clusters is decided, where each cluster only contains features that are most similar to each other.
An example of the final clusters generated by HC is shown in Figure 4. Figure 5 demonstrates the correlations between every two features $x_n$ ($\forall x_n \in X$). We employ this model, as it was reported to achieve high classification accuracy for daily electricity usage prediction with 97% less computational cost [76], and reached 99% classification accuracy on non-iid (not independent and identically distributed) data [77].
Fisher's Score. Fisher's score is one of the most popular filter methods among feature selection models. It evaluates the importance of a feature by computing the score between the feature and all the classes in the ground truth label. Then, the features are ranked to filter out the least important ones. The intuition of Fisher's score is that the most informative features of a class should be more concentrated within the class while being further away from other classes. Fisher's score is defined as:

$$F(n) = \frac{\sum_{l} num_l\,(\mu_{nl} - \mu_n)^2}{\sum_{l} num_l\,var_{nl}}$$

where $n$ represents the $n$th feature, $l$ indicates the $l$th class in the label, $num_l$ is the number of data samples in the $l$th class, $\mu_{nl}$ and $var_{nl}$ are the mean and variance of the $n$th feature in class $l$, and $\mu_n$ is the mean of the $n$th feature over all classes. We employ Fisher's score, as it was the underlying feature selection method in other application domains with reported high accuracy (e.g., intrusion detection systems with a 99% success rate [78], speech emotion recognition with 85% accuracy [79]).
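As a worked illustration, the sketch below computes the score as the class-size-weighted between-class scatter divided by the class-size-weighted within-class variance, which is the standard form matching the symbols defined above. The function name and the toy feature values are hypothetical.

```python
def fisher_score(values, labels):
    """Fisher's score of a single feature given one class label per sample."""
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        members = [v for v, y in zip(values, labels) if y == cls]
        num = len(members)
        mean = sum(members) / num
        var = sum((v - mean) ** 2 for v in members) / num
        between += num * (mean - overall_mean) ** 2
        within += num * var
    # Assumption: a feature with zero within-class variance is treated as ideal.
    return between / within if within > 0 else float("inf")

labels = [0, 0, 0, 1, 1, 1]                     # 0 = NLOS, 1 = LOS
separated = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]      # classes far apart
mixed = [1.0, 8.0, 3.0, 7.0, 2.0, 9.0]          # classes overlap

score_separated = fisher_score(separated, labels)
score_mixed = fisher_score(mixed, labels)
print(score_separated, score_mixed)
```

The well-separated feature scores far higher than the overlapping one, which is the ranking behaviour the filter relies on.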
Recursive Feature Elimination (RFE). The Recursive Feature Elimination model analyzes the importance of a feature by evaluating the change in the cost function. The intuition of RFE is that removing an informative feature has a considerable impact on the cost function. Therefore, the larger and more rapid the changes a feature causes to the cost function, the more important it is to the LOS identification. After removing a feature from the original set, the RFE model calculates the changes to the cost function $J$ immediately. Using a Support Vector Machine (SVM) to assign the weights, RFE eliminates the features with the least importance iteratively. The changes in the cost function $\Delta J(n)$ are defined as [80]:

$$\Delta J(n) = \frac{1}{2}\,\frac{\partial^2 J}{\partial w_n^2}\,(\Delta w_n)^2$$

where $\Delta w_n$ indicates the change in the weight of $x_n$. The importance of a feature is evaluated based on $\Delta J(n)$. The whole process is repeated until a feature set of the expected size is decided. For non-linear feature selection, the Gaussian kernel was shown to provide better results [81].
We employ RFE, as it was reported to achieve high accuracy in other application domains (e.g., fault diagnosis detection with an F1-measure of up to 0.95 [81]).

Least Absolute Shrinkage and Selection Operator (Lasso). The Least Absolute Shrinkage and Selection Operator is an embedded feature selection model that gives a weight of 0 to the least important features. By leveraging L1 regularization (i.e., introducing the L1-norm to the cost function), Lasso is able to select the best features in high-dimensional datasets. The cost function in Lasso is defined as:

$$J(\theta) = \frac{1}{2\,n_{samples}}\,\lVert Y - X\theta \rVert_2^2 + \lambda\,\lVert \theta \rVert_1$$

where $n_{samples}$ is the number of data samples, $Y$ is the label of the input data, $X$ is the feature set, $\theta$ is the slope term corresponding to each feature and $\lambda$ is the penalty term indicating how severe the regularization is. In a set of closely correlated features, Lasso only selects one of them rather than adopting the whole combination. We employ Lasso, as it was reported to achieve high accuracy in other application domains (e.g., crime prediction [82] and tumor classification with more than 81% accuracy [83]).
Mean Decrease in Impurity (MDI). In contrast to Permutation Importance, Mean Decrease in Impurity measures the feature importance by calculating the average decrease in Gini impurity [84]. In each node of a decision tree, every informative feature helps to reduce the Gini impurity. For a randomly selected variable, the Gini impurity indicates the probability of misidentification in this node, as follows:

$$G(t) = 1 - \sum_{l=1}^{L} p_l^2$$

$$\Delta G(t) = G(t) - \frac{|X_{t_L}|}{|X_t|}\,G(t_L) - \frac{|X_{t_R}|}{|X_t|}\,G(t_R)$$

where $L$ is the number of classes to be identified in the ground truth label, $p_l$ is the probability of the data being identified as class $l \in Y$, $t$ is a specific node in the Random Forest, $t_L$ and $t_R$ are the child nodes of $t$, $X_t$ is the input to $t$, and $X_{t_L}$ and $X_{t_R}$ are the data divided into $t_L$ and $t_R$, respectively. Therefore, the weighted average decrease in the impurity $\Delta G(t)$ of each related node $t$ represents the importance of the corresponding feature [85]. The performance of MDI was investigated in [86]. It was observed that by leveraging features selected by MDI, an error rate as low as 4% was achieved.
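The impurity decrease of a single split can be checked by hand. The sketch below assumes the usual definitions: the Gini impurity is one minus the sum of squared class probabilities, and the decrease is the parent impurity minus the size-weighted child impurities. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the labels reaching one node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

parent = ["LOS"] * 4 + ["NLOS"] * 4

# A split on an informative feature separates the classes completely.
informative = impurity_decrease(parent, ["LOS"] * 4, ["NLOS"] * 4)

# A split on an uninformative feature leaves both children as mixed as the parent.
uninformative = impurity_decrease(
    parent, ["LOS", "LOS", "NLOS", "NLOS"], ["LOS", "LOS", "NLOS", "NLOS"]
)
print(gini(parent), informative, uninformative)
```

MDI averages such decreases over all nodes and trees in which a feature is used for splitting.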
Pearson correlation coefficient. The Pearson correlation coefficient measures the linear correlation between each feature and the ground truth label to select the most important features. The covariance of the label Y and the feature set X is leveraged by the Pearson correlation coefficient. A positive value of the covariance indicates a positive correlation between the feature and the ground-truth LOS conditions. The Pearson correlation coefficient is defined as follows:

$$\rho(x_n, Y) = \frac{E\left[(x_n - \mu_{x_n})(Y - \mu_Y)\right]}{\sigma_{x_n}\,\sigma_Y}$$

where $E$ is the mathematical expectation, $x_n$ and $Y$ are the input feature and the label, $\mu_{x_n}$ and $\mu_Y$ are the mean values of $x_n$ and $Y$, and $\sigma_{x_n}$ and $\sigma_Y$ are the standard deviations of $x_n$ and $Y$.
We employ the Pearson correlation coefficient, as it was reported to achieve high accuracy in other application domains (e.g., daily activity recognition [87] with more than 86% accuracy).
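A small numeric example makes the coefficient concrete. The sketch assumes the LOS label is encoded numerically (1 for LOS, 0 for NLOS), which is an illustrative choice; the function name and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

feature = [1.0, 2.0, 3.0, 4.0]   # e.g. a mean RSS statistic, rescaled
label = [0, 0, 1, 1]             # assumed encoding: 1 = LOS, 0 = NLOS

rho = pearson(feature, label)
rho_linear = pearson(feature, [2.0, 4.0, 6.0, 8.0])
print(rho, rho_linear)
```

Because the coefficient only captures linear dependence, a feature with a strong non-linear relation to the label can still receive a low score.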
Chi-squared. The Chi-squared model evaluates the importance of the features by calculating their correlations to the ground-truth labels Y. The Chi-squared score $\chi^2_{score}$ of each feature $x_n$ ($n = 1, 2, \ldots, N$), where $N$ is the total number of statistical features in X, is calculated as:

$$\chi^2_{score} = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O$ in our case is the observed result of LOS identification based on the ground-truth label Y and the WiFi signal features in 𝒳, and $E$ represents the expected output of the identification if 𝒳 and Y had no correlation at all. Since the Chi-squared score $\chi^2_{score}$ has an approximate Chi-squared ($\chi^2$) distribution in large-scale data, the higher the $\chi^2_{score}$, the more important and relevant the feature is to the identification result.
We employ the Chi-squared model, as it was reported to achieve high accuracy in other application domains (e.g., Arabic text recognition [88] with 90.50% accuracy).
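One common way to obtain the observed and expected counts is a contingency table between a discretised feature and the label. The sketch below takes that route as an assumption; the text does not state how O and E are tabulated, and library implementations may aggregate feature values differently. The function name and the binned data are hypothetical.

```python
def chi_squared_score(feature_bins, labels):
    """Chi-squared statistic of a binned feature against the class label."""
    n = len(labels)
    rows = sorted(set(feature_bins))
    cols = sorted(set(labels))
    observed = {(r, c): 0 for r in rows for c in cols}
    for r, c in zip(feature_bins, labels):
        observed[(r, c)] += 1
    row_total = {r: sum(observed[(r, c)] for c in cols) for r in rows}
    col_total = {c: sum(observed[(r, c)] for r in rows) for c in cols}
    score = 0.0
    for r in rows:
        for c in cols:
            # Expected count if the feature and the label were independent.
            expected = row_total[r] * col_total[c] / n
            score += (observed[(r, c)] - expected) ** 2 / expected
    return score

labels = ["LOS"] * 4 + ["NLOS"] * 4
dependent = ["low"] * 4 + ["high"] * 4                    # bin follows the label
independent = ["low", "low", "high", "high"] * 2          # bin ignores the label

score_dependent = chi_squared_score(dependent, labels)
score_independent = chi_squared_score(independent, labels)
print(score_dependent, score_independent)
```

The feature whose bins follow the label receives the maximum score for this table, while the independent feature scores zero.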

Initial Weights Assignment
With the above importance filter, the feature sets $X_m$ (where $m$ indicates different selection models) were chosen by the feature selection models. However, the generated feature sets are not guaranteed to yield high identification accuracy and are therefore not equally informative to the result. Thus, in this step, the selected statistical features are evaluated and assigned initial weights. The LOS identification performance of all feature sets is investigated using a Random Forest Classifier (RFC) and cross-validation. The macro F1 score (the average F1 score of all classes) and the accuracy (the fraction of correctly identified samples) are used as the evaluation metrics. The macro F1 score is defined as follows:

$$F1_{macro} = \frac{1}{L}\sum_{l=1}^{L} \frac{2\,TP_l}{2\,TP_l + FP_l + FN_l}$$

where $L$ represents the total number of classes in the ground-truth label, $TP$ is the number of true positive predictions, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
After obtaining the LOS identification performance, an evaluation vector is generated for each feature set based on the macro F1 score and accuracy. Next, the initial weights $w_{nm}$ are generated by the Initial Weights Generator based on all the evaluation vectors $E_m$ of the feature sets $X_m$. All selected features $x_{nm}$ in the generated feature set $X_m$ are assigned the same weight $w_m$. If a feature is filtered out by the feature selection model, it is given the weight 0. The general weight of a statistical feature combines its weights $w_{nm}$ over all $M$ models, where $w_{nm}$ is 0 when the corresponding $x_{nm}$ is not selected in $X_m$.
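The two evaluation metrics can be sketched as follows, assuming the per-class F1 is computed from each class's own TP, FP and FN counts and then averaged with equal class weight. The function names and the toy predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 1, 1, 1, 0, 0]   # 1 = LOS, 0 = NLOS
y_pred = [1, 1, 1, 0, 0, 0]   # one LOS sample is misidentified as NLOS

f1 = macro_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f1, acc)
```

Macro averaging weights the LOS and NLOS classes equally, so a feature set cannot score well by favouring only the majority class.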

Feature Selector and Testing Data Validation
In this step, we propose the feature selector to validate the statistical features from the previous step.

Multi-Scale Selection (MSS)
To analyze the selected features over both long-term and short-term periods, a novel multi-scale selection method is proposed (see Figure 2). By using datasets of different sampling scales, MSS removes the features with the weakest correlation and decides on an optimal feature set with the minimum size N_min, as shown in Algorithm 2. This step consists of four separate processes, namely multi-scale sampling, performance evaluation, importance censoring and result voting. A comprehensive description of these processes is given below.
Algorithm 2 Feature selector with multi-scale selection.
Require: X^(s): input data of s sample size, Y^(s): label of s sample size, ...
...
        ...    ▷ the output of model is the feature importances
15:     end for
16: end for
17: W_new ← WeightsAdjust(V, W_old)
19: X_new ← GenerateFeatureSet(X, W_new)
20: end while
21: X ← X_new
22: return X

Multi-scale sampling. First, to investigate the feature performance on datasets, multi-scale sampling is adopted to generate datasets with different sampling sizes. For our dataset, a maximum of 120 scans of WiFi measures were recorded at each reference point. Thus, we evaluate different sampling sizes from five scans (i.e., short-term sudden change) to 120 scans (i.e., long-term measure). As illustrated in Section 5.4, adopting different sampling sizes has a clear impact on LOS identification.
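Multi-scale sampling can be pictured as cutting the same 120 scans into windows of several lengths and computing the statistics per window. The sketch below assumes non-overlapping windows and uses an intermediate size of 30 purely for illustration; only the endpoints of 5 and 120 scans come from the text, and the function name and placeholder readings are hypothetical.

```python
def multi_scale_windows(scans, sizes):
    """Split one reference point's scans into non-overlapping windows per size."""
    return {
        size: [scans[i:i + size] for i in range(0, len(scans) - size + 1, size)]
        for size in sizes
    }

# Placeholder readings standing in for 120 consecutive scans from one AP.
scans = list(range(120))

windows = multi_scale_windows(scans, sizes=(5, 30, 120))
counts = {size: len(chunks) for size, chunks in windows.items()}
print(counts)

# Each window would then be reduced to its statistical features, e.g. the mean.
window_means = {
    size: [sum(chunk) / len(chunk) for chunk in chunks]
    for size, chunks in windows.items()
}
```

Short windows expose sudden changes in the signal, while the single 120-scan window summarises its long-term behaviour.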
Performance evaluation. Next, the same evaluation methods as in Section 4.2 are employed to analyze the feature performance on the multi-scale datasets. The evaluation vectors of the feature set on the different datasets indicate the importance of each feature for LOS detection. After a new feature set is generated based on the updated weights, RFC is used to perform LOS identification.
Importance censoring. Since the features are re-sampled according to different sampling sizes, the hidden patterns that indicate the LOS condition change accordingly. Thus, we use the feature selection models from the above importance filter to evaluate the relevance of the features from different perspectives.
Result voting. After obtaining the RFC performance and the importances generated by the above selection models, the features with the strongest correlations to multi-scale LOS identification are selected. Result voting is leveraged to rank the statistical features. With both empirical performance and theoretical evaluations, the most informative and relevant statistical features for LOS detection in the current set are decided. Features with higher voting scores on the multi-scale datasets receive increments in their corresponding weights.
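One possible reading of the voting step is sketched below. The voting rule (one vote per sampling scale for each of the top-ranked features) and the parameters top_k and step are assumptions made for illustration; the text does not specify them, and the feature names are hypothetical.

```python
def vote_and_adjust(importances_by_scale, weights, top_k=2, step=0.1):
    """Rank features at each sampling scale and reward the top-ranked ones.

    importances_by_scale: {sample_size: {feature_name: importance}}
    weights: {feature_name: current weight}; every ranked feature must be a key.
    top_k and step are hypothetical tuning knobs, not values from the paper.
    """
    votes = {name: 0 for name in weights}
    for importances in importances_by_scale.values():
        ranked = sorted(importances, key=importances.get, reverse=True)
        for name in ranked[:top_k]:
            votes[name] += 1
    # Features that collect more votes across scales receive larger increments.
    new_weights = {name: weights[name] + step * votes[name] for name in weights}
    return votes, new_weights

# Hypothetical importances for three features at two sampling scales.
importances_by_scale = {
    5: {"AP6_RTT_mean": 0.5, "AP6_RSS_std": 0.3, "AP8_RTT_std": 0.2},
    120: {"AP6_RTT_mean": 0.1, "AP6_RSS_std": 0.6, "AP8_RTT_std": 0.3},
}
weights = {"AP6_RTT_mean": 1.0, "AP6_RSS_std": 1.0, "AP8_RTT_std": 1.0}
votes, new_weights = vote_and_adjust(importances_by_scale, weights)
```

In this toy input, the feature that ranks highly at both scales collects two votes and therefore the largest weight increment.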

Final Feature Set and Testing Data Validation
After the weights are updated in the previous step, the features with lower weights (those carrying little information for LOS identification) are rejected, and the weights generator and feature set generator select a new feature set based on the updated weights W. The new feature set is fed into MSS for further iterations. When the set of features X remains unchanged between iterations, it is taken as the final set and used for data validation.
In the testing phase, the WiFi signal measures X_Test collected from the new reference points are preprocessed. Statistics of the WiFi RSS and RTT measurements are extracted to form a statistical testing dataset. By keeping only the features selected in the final feature set X from MSS, a reduced testing dataset containing only the most informative features is generated.
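The final projection of the testing data onto the selected features can be sketched as below. The dictionary-based sample layout and the feature names are assumptions for illustration only.

```python
def keep_selected(rows, selected):
    """Project each test sample onto the selected feature names, in a fixed order."""
    return [[row[name] for name in selected] for row in rows]

# Hypothetical statistical test samples (one dict per sample).
test_rows = [
    {"AP6_RTT_mean": 3050.0, "AP6_RSS_mean": -55.0, "AP8_RTT_std": 120.0},
    {"AP6_RTT_mean": 4100.0, "AP6_RSS_mean": -71.0, "AP8_RTT_std": 310.0},
]
# Hypothetical output of the feature selector.
final_features = ["AP6_RTT_mean", "AP8_RTT_std"]

reduced = keep_selected(test_rows, final_features)
```

Using the same ordered list of names for training and testing keeps the column layout identical in both phases.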

Experimental Setup and Empirical Results
In this section, we first describe the proposed dataset. Then, we evaluate the proposed framework on this dataset.

Test Bed and the Proposed Dataset
Although identifying the LOS conditions of APs is of significant value for WiFi indoor positioning systems, to the best of our knowledge, there is no publicly available dataset that contains WiFi RTT and RSS signal measures together with the LOS conditions at the reference points. Furthermore, there is no public WiFi positioning dataset that contains multiple samples of both RTT and RSS per reference point, which are needed for statistical analysis of the signal measures. A public dataset that fulfills these criteria is therefore necessary for further WiFi indoor positioning research. This motivated us to collect and publish our own large-scale real-world WiFi RSS and RTT dataset for the community.
We chose the entire fifth floor of a campus building as the testbed (see Figure 6). The space was filled with furniture and had a noisy background with plenty of electromagnetic signal transmitters. The interior includes long narrow corridors, big meeting rooms, small office rooms, and a large open space. The variety of LOS and NLOS scenarios in this testbed makes it suitable for testing our proposed LOS identification algorithms. For WiFi signal measurements, an LG G8X ThinQ smartphone and 13 RTT-enabled Google APs were used. The APs were placed in exactly the same locations as the university's regular APs. Using measuring tapes and ground markers, the ground-truth coordinates of each reference point were carefully recorded and validated by human testers. In addition, the LOS conditions of all APs from each reference point were manually collected and verified. The detailed information of the proposed dataset is listed in Table 2. The dataset is made public at https://github.com/Fx386483710/WiFi-RTT-RSS-dataset (accessed on 10 April 2020).
A snapshot of the WiFi RSS and RTT measurements of our dataset is shown in Table 3. The values in columns 'X' and 'Y' are the ground-truth coordinates of the reference point. Columns 'AP1' to 'AP13' show the WiFi RSS and RTT from all APs at that reference point. The value of −200 dBm indicates that the corresponding AP is not visible from the current position. Similarly, the value of 100,000 mm indicates that no WiFi RTT signal was received from the corresponding AP. Column 'LOS APs' shows which APs the reference point has a direct LOS path to. In our dataset, 120 scans (i.e., approximately 40 s) of data samples are recorded at each reference point, which provides sufficient information for further research. Please note that the reference points in the training and testing datasets do not overlap.
A desktop PC equipped with an Intel i9-12900k @ 4.90 GHz CPU and 32 GB of DDR4 4000 MHz memory was used to analyze the results.
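The two placeholder values described above need handling before statistics are computed. The sketch below is one way to do it, assuming each scan is available as a list of per-AP RSS values (dBm) and a list of per-AP RTT values (mm); replacing placeholders with None is a design choice made here, not something prescribed by the dataset.

```python
RSS_MISSING_DBM = -200      # AP not visible (value stated in the text)
RTT_MISSING_MM = 100_000    # no RTT signal received (value stated in the text)

def clean_scan(rss_dbm, rtt_mm):
    """Replace placeholder values with None so they do not distort statistics."""
    rss = [None if v == RSS_MISSING_DBM else v for v in rss_dbm]
    rtt = [None if v == RTT_MISSING_MM else v for v in rtt_mm]
    return rss, rtt

def visible_aps(rss_dbm, rtt_mm):
    """Return 1-based AP IDs for which both RSS and RTT were measured."""
    rss, rtt = clean_scan(rss_dbm, rtt_mm)
    return [i + 1 for i, (r, t) in enumerate(zip(rss, rtt))
            if r is not None and t is not None]

# Hypothetical scan covering three APs only.
scan_rss = [-58, -200, -74]
scan_rtt = [3100, 100_000, 100_000]
```

For this hypothetical scan, `visible_aps(scan_rss, scan_rtt)` returns `[1]`, because AP2 is not visible at all and AP3 reports RSS without an RTT response.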

The Impact of NLOS Scenarios in Indoor Positioning
In NLOS scenarios, the WiFi signals suffer interference, which degrades the signal measurements. In indoor environments, the signals are easily attenuated by thick concrete walls, humans, and furniture, which makes indoor positioning challenging. As illustrated in Figure 7, the WiFi RSS measurements were not stable over time. Most importantly, we observed that even though strong signal measures of −60 dBm were received from an NLOS WiFi AP, the same AP could not be reached at other times. To gain a deeper understanding of the real-world impact of NLOS conditions, we recorded the WiFi RSS and RTT signal measurements under two scenarios: LOS, where there was a clear path between the AP and the smartphone, and NLOS, where a human body stood in between. The smartphone was placed 3 m away from the AP. We observed in Figures 8-10 that under the NLOS scenario, both signal measures were unstable. For WiFi RSS, the recorded values decreased drastically from −54 dBm to −80 dBm, and the distribution became wider. The RTT measures also became larger, but their distribution was less affected than that of RSS, with occasional outliers of up to 4 m (against the ground truth of 3 m). Under LOS conditions, both WiFi RSS and RTT signals were stable and exhibited a narrow distribution. Therefore, with correct calibration, the RTT measures could locate the user with good accuracy by trilateration. The CDF curve also showed that the deviation of the LOS RTT measures stayed within 0.5 m, while the NLOS measures deviated by up to 3.5 m. Therefore, successfully identifying the LOS condition of each AP would help improve the indoor positioning accuracy.
Figure 10. The CDF curves of the WiFi RTT signal measure under different scenarios. A smartphone was placed 3 m away from the access point. We observed that in the human NLOS experiment, where the signal was blocked by a human body, the minimum error of the RTT measurement increased and the maximum error grew as well. WiFi RTT became more unreliable under NLOS scenarios.
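A CDF comparison of this kind can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration: the RTT distance readings are made-up placeholders rather than values from the dataset, and the helper names are hypothetical.

```python
def ranging_errors(rtt_distances_m, true_distance_m):
    """Absolute ranging error of each RTT distance estimate, in meters."""
    return [abs(d - true_distance_m) for d in rtt_distances_m]

def empirical_cdf(errors):
    """Return the sorted errors and the fraction of samples at or below each."""
    ordered = sorted(errors)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]

def fraction_within(errors, threshold_m):
    """Fraction of samples whose error does not exceed the threshold."""
    return sum(e <= threshold_m for e in errors) / len(errors)

TRUE_DISTANCE_M = 3.0                    # smartphone placed 3 m from the AP
los_readings = [3.1, 2.9, 3.2, 3.0]      # hypothetical LOS RTT distances (m)
nlos_readings = [3.4, 4.0, 3.8, 3.1]     # hypothetical NLOS RTT distances (m)

los_err = ranging_errors(los_readings, TRUE_DISTANCE_M)
nlos_err = ranging_errors(nlos_readings, TRUE_DISTANCE_M)
los_x, los_cdf = empirical_cdf(los_err)
nlos_x, nlos_cdf = empirical_cdf(nlos_err)
```

Plotting `los_cdf` against `los_x` and `nlos_cdf` against `nlos_x` gives curves of the type shown in Figure 10; `fraction_within` reads off a single point on such a curve.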
To investigate the impact of a dynamic indoor environment on the signal measures, we collected both WiFi RSS and RTT measures under three scenarios: LOS, NLOS, and corridor LOS. The smartphone moved away from the AP while recording WiFi data. To create a common NLOS condition, the AP was placed on the other side of a thick concrete wall. In the corridor experiment, although the AP had a clear LOS path to the smartphone along a long narrow corridor, the WiFi signals struggled under the heavy reflections created by the walls. To analyze the correlation between the WiFi signal measurements and the true distance, the RSS and RTT values were normalized. As shown in Figures 11-13, in the ideal LOS experiment, the RSS measures (in green) had much smaller variance. However, the RTT measures (in orange) showed their robustness in the NLOS scenario by producing a similar level of variance to the RSS measures. In the corridor experiment, the RSS measures were greatly attenuated by the interior: locations up to 9 m away from the AP had similar RSS. In contrast, the RTT measures showed a clear correlation with the true distance, with some minor offsets. We concluded that the RTT measures were more robust and reliable in complex environments, whereas the RSS measures were more sensitive to changes in the interior.
As shown in Figures 14 and 15, when using only raw statistical features from all LOS APs, the positioning error was 1.18 m. After introducing NLOS APs, the error increased to 1.41 m. When only NLOS signals were included, the positioning error rose to 1.65 m. In addition, the largest RMSE produced by NLOS features was up to 7 m. We observed that using only LOS WiFi signals improved the positioning accuracy by up to 29% compared to using only NLOS signals. Please note that these results are based on raw statistical features. The above empirical results indicate that identifying the LOS conditions of the APs is of great importance.
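The normalization and correlation analysis described above can be sketched as follows, assuming min-max scaling to the range 0 to 1 (as stated in the caption of Figure 12) and the Pearson coefficient as the correlation measure, which is an assumption since the text does not name one.

```python
from math import sqrt

def min_max(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient; undefined if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical readings while walking away from an AP.
true_distance_m = [1, 2, 3, 4, 5]
rtt_mm = [1200, 2100, 3300, 4100, 5200]

rtt_norm = min_max(rtt_mm)
rtt_corr = pearson(true_distance_m, rtt_norm)
```

Min-max scaling does not change the Pearson coefficient, so the normalization mainly serves to place RSS and RTT on a common axis for plotting.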
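The relative improvement quoted above can be checked from the rounded errors. The sketch below assumes that the positioning error is an RMSE over 2-D Euclidean distances and that the improvement is measured relative to the NLOS-only error; the helper names are hypothetical.

```python
from math import hypot, sqrt

def rmse(estimates, truths):
    """Root-mean-square Euclidean error between 2-D estimates and ground truth."""
    squared = [hypot(ex - tx, ey - ty) ** 2
               for (ex, ey), (tx, ty) in zip(estimates, truths)]
    return sqrt(sum(squared) / len(squared))

def relative_reduction(baseline_error, improved_error):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline_error - improved_error) / baseline_error

# Errors reported in the text (meters).
NLOS_ONLY_ERROR_M = 1.65
LOS_ONLY_ERROR_M = 1.18

reduction = relative_reduction(NLOS_ONLY_ERROR_M, LOS_ONLY_ERROR_M)
```

With the rounded values, the reduction comes out at roughly 28.5%, which is close to the reported figure of up to 29%; the small gap is presumably due to rounding of the published errors.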

The Importance of Feature Selection
In LOS identification, more data does not guarantee higher accuracy. The WiFi signals are unstable and may introduce more errors into the positioning result. To assess the impact of using raw WiFi statistical features, we used the random forest classifier to perform LOS detection. AP6, AP8, and AP12 were chosen as examples for individual LOS detection because they had LOS paths to the most RPs. The positions of the three APs are shown in Figure 6. The performance of the unmodified statistical features is illustrated in Figure 16 and Table 4. We observed that using all the raw features from the corresponding AP did not guarantee better accuracy: the WiFi signals from AP12 contained more noisy information for the classifier. Conversely, using features from all APs failed in the LOS detection of AP6. We concluded that selecting the most meaningful features is essential for robust LOS detection.

Sampling Size
The idea behind the multi-scale selection method is to analyze the importance of the statistical features from both macroscopic and microscopic perspectives. The signal patterns indicating the LOS condition of each AP were investigated over both long-term and short-term periods, so that both stable measurements and sudden changes contributed to identifying LOS conditions. As demonstrated in Figure 17, statistical features from datasets of different sampling sizes provide distinguishing information. For features from the 120-scan data, only the mean value of all RTT measurements was kept; the outliers and fluctuating measurements were removed during the feature extraction phase. In contrast, the 5-scan dataset still recorded the abnormal RTT measures at the reference point, which implied a higher possibility of an NLOS condition for the AP.
To illustrate the significance of the MSS method, LOS identification performance was evaluated on multi-scale datasets. Datasets of different sampling sizes (i.e., 5, 10, 15, 30, 60, and 120 scans) were generated by MSS. For the dataset with the minimum of five scans, the statistical features were formed from every 1.5 s of WiFi signal measures. AP6, AP8, and AP12 were again chosen for individual AP LOS detection because they had LOS paths to the most RPs (see Figure 6 for their positions). The results of LOS identification using multi-scale datasets are shown in Figure 18 and Table 5. In the proposed framework, we focus on the LOS conditions of both individual APs and all APs at the same time. As illustrated in the results, smaller sampling sizes improved the LOS identification of certain APs. However, this influence became negative when predicting the conditions of all APs at the same time. We observed that the features selected by the proposed MSS method greatly improved the performance of both individual and all-AP LOS predictions.

The Performance of the Proposed Framework
Since using all statistical features performed poorly in LOS detection, feature selection algorithms were introduced to address the problem. By leveraging feature selection models, features with redundant information were removed, while noisy features were marked for further improvement.
To evaluate the proposed framework, WiFi RSS and RTT datasets containing the ground-truth coordinates and LOS conditions are needed. However, there is currently no publicly available dataset that meets this requirement. Therefore, to validate the LOS identification accuracy and assess the transferability and generalization of our proposed framework, a large-scale real-world dataset was collected, as introduced in Section 5.1. By evaluating the performance on both individual AP and all-AP LOS identification, we illustrate the transferability and generalization of the proposed framework.
Several state-of-the-art feature selection algorithms were used for comparison. In addition to the popular feature selection models (i.e., PI, HC, Fisher, RFE, Lasso, MDI, Pearson, and Chi) introduced in Section 4.1.2, algorithms proposed by previous works were included. The C_SEL model proposed by Dong et al. [69] used the mean RSS measurement and other statistical features (e.g., mean, quantile deviation, number of outliers) extracted from the RTT measures; in that paper, different manually selected combinations of the features were tested. S-F [71] leveraged the mean, standard deviation, skewness, and kurtosis of both WiFi RTT and RSS measures. The feature set selected by [68] (represented as Sun) contained the standard deviations of both RTT and RSS. Consisting only of raw RSS and RTT measures, the Choi set [89] identified APs sending both large RTT measures and low RSS measures as NLOS. The Si feature set chosen by [66] leveraged the mean and variance of the RSS. The system proposed in [70] (represented as Carpi) used the standard deviation, skewness, kurtosis, hyper-skewness, and peak probability as the neural network input for WiFi channel LOS identification. The 10-scan dataset was used because it was the most indicative based on the above empirical results. The macro F1 score and weighted F1 score were used as evaluation metrics. The macro F1 score values all APs equally, while the weighted F1 score balances the disparity among the classes.
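The difference between the two metrics can be illustrated with a small, self-contained computation. This is a single-label sketch with hypothetical labels and predictions; the evaluation in the paper spans multiple APs, so the code only demonstrates how the two averages differ, not the exact evaluation protocol.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label present in y_true."""
    result = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        result[label] = (f1, tp + fn)   # support = number of true samples
    return result

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total

# Hypothetical imbalanced example: 8 NLOS samples, 2 LOS samples.
y_true = ["NLOS"] * 8 + ["LOS"] * 2
y_pred = ["NLOS"] * 8 + ["NLOS", "LOS"]
```

Here a single missed LOS sample lowers the macro F1 score (about 0.80) more than the weighted F1 score (about 0.89), because the minority class carries the same weight as the majority class in the macro average.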
The identification of the LOS conditions of all APs was performed based on different feature selection models. The performance of each algorithm is illustrated in Figure 19 and Table 6. We observed that the feature set selected by the proposed framework greatly improved the LOS identification results. In weighted F1 score, the proposed framework achieved up to 126% improvement compared to previous work and up to 29% compared to popular feature selection models. In macro F1 score, the proposed framework achieved up to 81% improvement compared to previous work and up to 16% compared to popular feature selection models. Our proposed framework also used fewer features, with only 34 out of 130 features, whereas the Pearson and Fisher models used more than 110 and 90 features, respectively. The result demonstrated that further analysis of the feature importance in the multi-scale datasets provided higher accuracy. As shown in Figure 20, the misclassifications happened mostly at reference points located in cornered areas. We also observed from Figure 20 that stable and strong WiFi connections provided more reliable LOS identification.
Furthermore, the proposed framework was evaluated on individual AP LOS detection. As shown in Figure 21, APs that had a LOS path to any RP were included in the evaluation. AP6, AP8, and AP12 had LOS paths to the most RPs, while AP7 and AP10 only had LOS paths to 8 RPs and 4 RPs, respectively. The proposed framework provided promising performance in LOS identification for individual APs, even for APs with a LOS path to only a few RPs.
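The relative improvements for all-AP identification can be recomputed from the rounded scores in Table 6. The sketch below takes the scores directly from that table; the grouping into popular models and previous works follows the description of the compared methods, and the helper name is hypothetical.

```python
# Scores copied from Table 6.
WEIGHTED_F1 = {"PI": 0.90, "HC": 0.88, "Fisher": 0.91, "RFE": 0.72, "Lasso": 0.89,
               "MDI": 0.88, "Pearson": 0.91, "Chi": 0.88, "Carpi": 0.41,
               "C_SEL": 0.89, "Si": 0.57, "S-F": 0.88, "Choi": 0.89, "Sun": 0.68,
               "Proposed": 0.93}
MACRO_F1 = {"PI": 0.68, "HC": 0.71, "Fisher": 0.70, "RFE": 0.68, "Lasso": 0.67,
            "MDI": 0.68, "Pearson": 0.71, "Chi": 0.70, "Carpi": 0.67,
            "C_SEL": 0.69, "Si": 0.67, "S-F": 0.68, "Choi": 0.68, "Sun": 0.43,
            "Proposed": 0.78}

POPULAR_MODELS = ["PI", "HC", "Fisher", "RFE", "Lasso", "MDI", "Pearson", "Chi"]
PREVIOUS_WORKS = ["Carpi", "C_SEL", "Si", "S-F", "Choi", "Sun"]

def max_relative_gain(scores, baselines, proposed="Proposed"):
    """Largest relative gain of the proposed set over any listed baseline."""
    return max((scores[proposed] - scores[b]) / scores[b] for b in baselines)

weighted_vs_previous = max_relative_gain(WEIGHTED_F1, PREVIOUS_WORKS)  # about 1.27
weighted_vs_popular = max_relative_gain(WEIGHTED_F1, POPULAR_MODELS)   # about 0.29
macro_vs_previous = max_relative_gain(MACRO_F1, PREVIOUS_WORKS)        # about 0.81
macro_vs_popular = max_relative_gain(MACRO_F1, POPULAR_MODELS)         # about 0.16
```

From the rounded table values, the largest gains are about 127% and 29% for the weighted F1 score and about 81% and 16% for the macro F1 score.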

Discussion
We compared our proposed framework with several feature selection models and LOS identification algorithms from previous works in empirical settings. We observed that our framework increased the macro F1 score and weighted F1 score by up to 15.3% and 28.8%, respectively. Most importantly, our framework used fewer features than existing state-of-the-art models.
The major improvement in our feature selection process is that, instead of enumerating different combinations of features manually (as adopted by previous works), the proposed framework automatically selects the best set of features. The final selection is based only on the input features and the ground-truth labels, which guarantees the framework's transferability to high-dimensional data of other signal measurements. For instance, WiFi RTT measurements collected from an ESP32 system using the Chronos software [54] can be used directly as the input to the proposed framework with no restrictions. The framework would only investigate the correlations between the input features and the ground-truth labels and select the best set of features accordingly. The proposed MSS method investigates the feature importance at multiple sampling scales so that the final set of features is informative from both macroscopic and microscopic perspectives.
To validate the generalization of the framework, a novel dataset was collected. To the best of our knowledge, there is no existing WiFi dataset that contains detailed WiFi RSS and RTT signal measurements, ground-truth coordinates of the reference points, and LOS conditions of all APs to each reference point. The contributions of the dataset to the research field are as follows:

• The dataset was collected on a campus floor. Each AP was surrounded by complex interiors and in different LOS/NLOS conditions.

• The testbed of 92 × 15 m² was evenly divided into 0.6 × 0.6 m² grids, which served as reference points. Each grid was carefully labeled with ground-truth coordinates by two human surveyors. Reference points for training and testing do not overlap.

• At each reference point, more than 120 scans of both WiFi RTT and RSS signal measurements were collected. During collection, the influence of the human body was taken into consideration.

• Each data sample was meticulously labeled with the LOS conditions of all the APs in the testbed.

• With more than 77,000 samples, the dataset provides good coverage for the evaluation of any WiFi RSS-based, RTT-based, or hybrid indoor positioning system. The real-world indoor environment guarantees the generalization of the proposed framework.

Conclusions and Future Work
In this article, a novel feature selection framework for LOS identification of WiFi APs was introduced. Our proposed framework efficiently selects an optimal set of informative features for identifying WiFi LOS scenarios. Unlike previous state-of-the-art techniques, where features were selected manually, our framework automatically investigates the importance of each feature on multi-scale datasets.
In the preprocessing stage, statistics of the input WiFi measurements were computed and fed into the importance filter. Several popular feature selection models were used in the importance filter, each deciding its own feature set based on a different algorithm. Then, in the initial weight assignment step, the statistical features chosen by the feature selection models were assigned initial weights based on their macro F1 score and accuracy in LOS identification. Based on empirical experience, RFC was used as the LOS identifier in our framework. Next, to validate the selected features from both macroscopic and microscopic perspectives, multi-sampling datasets were introduced in the feature selector. Based on the performance of the selected feature set, importance censoring and result voting were leveraged to adjust the weights of the features iteratively. In the testing stage, the proposed framework extracted from the testing data the same features chosen by the feature selector.
To evaluate the framework, a dataset was collected in a large-scale real-world indoor environment. More than 120 scans of data samples were recorded at each reference point and carefully labeled by two human surveyors. Since each AP was surrounded by complex interiors, the generalization of the proposed LOS identification algorithm could be validated. To investigate the improvement brought by feature selection, we compared the proposed framework with raw statistical features on individual AP and all-AP identification. Using the features selected by the proposed work improved the macro F1 score by up to 50%. Using only 3 s of data, the proposed framework provided promising LOS detection accuracy of up to 93% for all APs at the same time and 98% for an individual AP.
For future work, we may improve the combination and selection of the feature selection models used in the importance filter (see Section 4.1.2), which may reduce the time cost and enhance the efficiency of the proposed framework. Furthermore, sliding windows at different sampling scales may be considered for implementation in the multi-scale selection introduced in Section 4.3.1. The sampling method leveraged in the proposed framework was not sensitive to different segmentations of a long consecutive data record. Using a sliding window may help the framework select better features for LOS identification.
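The sliding-window idea can be sketched as follows. This is only an illustration of the sampling scheme under consideration, with a hypothetical function name and stride parameter; it is not part of the evaluated framework.

```python
def sliding_windows(scans, size, stride=1):
    """Return overlapping windows of `size` scans, advancing by `stride` scans.

    With stride == size, this reduces to non-overlapping segmentation.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [scans[i:i + size] for i in range(0, len(scans) - size + 1, stride)]

# A record of 120 scans, as collected at each reference point.
record = list(range(120))
overlapping = sliding_windows(record, size=10, stride=1)
segmented = sliding_windows(record, size=10, stride=10)
```

For a 120-scan record and 10-scan windows, a stride of one scan produces 111 overlapping windows instead of 12 non-overlapping segments, so every possible alignment of the window against the record is covered.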

Figure 1. The basic architecture of a classic WiFi indoor fingerprinting system. This system has two phases: the offline phase and the online phase. In the offline phase, the WiFi signal measurements are collected, preprocessed, labeled, and stored in the dataset. In the online phase, the WiFi measurements received by the user from an unknown location are compared with the measurements in the database by the positioning algorithm to obtain the final location estimation.

Figure 2. The architecture of our proposed framework. As illustrated in the bar plot, our proposed framework greatly reduced the number of features while keeping the most informative ones.

Figure 3. A snapshot of some of the most important statistical features selected by the Permutation Importance (PI) model. The X-axis indicates different statistical features from the dataset. Negative mean accuracy values indicate that the corresponding features have the least correlation to the ground truth labels. We observe that AP #6, located in the middle of the testbed, is the most informative AP for LOS identification.

Figure 4. The result from Hierarchical Clustering (HC). From the top node to the bottom, features are divided by their correlations to the others. Further separated features are less similar to each other. Please note that the order of the features listed is based on the result of HC.

Figure 5. The correlation between features from Hierarchical Clustering (HC). The X and Y axes represent the same set of features listed in the same order. Please note that this order is based on the result of HC. Lighter color indicates stronger correlation.

Figure 6. The layout of the building floor. The icons show the locations of the WiFi APs. Measurements were taken in the shaded area. The numbers next to the icons indicate the IDs of the access points.

Figure 7. A histogram of the WiFi RSS signal measurement from the dataset proposed in Section 5.1. We observed that the RSS measures were unstable and easily attenuated over a long observation period.

Figure 8. The WiFi RSS signal measure under different scenarios. A smartphone was placed 3 m away from the access point. We observed that in the NLOS experiment, where the signal was blocked by a human body, the RSS measurement became unstable and dropped drastically due to the NLOS condition.

Figure 9. The WiFi RTT signal measure under different scenarios. A smartphone was placed 3 m away from the access point. We observed that in the human NLOS experiment, where the signal was blocked by a human body, the RTT measurement became larger, more unreliable, and further away from the ground truth distance measure.

Figure 11. The comparison of the WiFi RSS and RTT measurements as a function of the true distance in a LOS scenario. The RSS (in green) and RTT (in orange) values were normalized. We observed that under LOS conditions, RSS measures were more resilient.

Figure 12. The comparison of the WiFi RSS and RTT measurements as a function of the true distance in a NLOS scenario. The WiFi signal was blocked by a thick wall. The RSS (in green) and RTT (in orange) values were scaled between 0 and 1. We observed that under NLOS conditions, RSS and RTT measures had similar variance.

Figure 13. The comparison of WiFi RSS and RTT measurements as a function of the true distance in a long narrow corridor. The WiFi signal suffered from severe reflections and attenuations. The RSS (in green) and RTT (in orange) values were normalized. We observed that in complex indoor spaces, the RSS produced large variance even with a clear LOS path to the AP. The RSS measurements were unpredictable, with similar values up to 9 m away. In contrast, the RTT measures were more stable and had a clear positive correlation to the true distance.

Figure 14. The CDF result of WiFi fingerprinting using the signal measures from only LOS APs, only NLOS APs, and all the APs. Please note that all statistical features from the corresponding APs were leveraged. Introducing NLOS signal measures greatly reduced the positioning accuracy. The largest positioning error was up to 7 m.

Figure 15. The RMSE result of indoor positioning using WiFi signal measures from only LOS APs, only NLOS APs, and all the APs. Please note that all statistical features from the corresponding APs were leveraged. Using only raw statistical features from LOS APs improved the positioning accuracy by up to 29% compared to only using NLOS signals.

Figure 16. The performance of all-AP and individual AP LOS identification using different sets of features. Please note that the features included are all statistical features introduced in Section 4.1.1.

Figure 17. The mean value of AP1's RTT measures with different sampling sizes at an RP. The value of 100 indicates that there is no WiFi signal at this RP. Datasets of different sample sizes contain signal patterns of different time periods.

Figure 18. The performance of all-AP and individual AP LOS identification using datasets of different sampling sizes, compared to MSS. Statistical features of all APs were used except for MSS, where only the selected features were used. Features selected by MSS are more informative than features from a dataset with a single sample size.

Figure 19. The performance of different sets of features in all-AP LOS identification. Our proposed MSS is up to 28.8% better than popular feature selection models and up to 14.5% better than state-of-the-art WiFi LOS identification algorithms.

Table 6. Comparison of the LOS identification performance of previous state-of-the-art.

Feature set   Weighted F1   Macro F1
PI            0.90          0.68
HC            0.88          0.71
Fisher        0.91          0.70
RFE           0.72          0.68
Lasso         0.89          0.67
MDI           0.88          0.68
Pearson       0.91          0.71
Chi           0.88          0.70
Carpi         0.41          0.67
C_SEL         0.89          0.69
Si            0.57          0.67
S-F           0.88          0.68
Choi          0.89          0.68
Sun           0.68          0.43
Proposed      0.93          0.78

Figure 21. The performance of the individual AP identification. APs that do not have a LOS path to any RP are not included. Please note that only 8 and 4 RPs have a LOS path to AP7 and AP10, respectively. The MSS-selected features achieve an F1 score above 0.5 even for APs with insufficient data.

Table 1. Comparison of the performance of notable work in LOS identification.

Algorithm 2 (inputs, output, and opening steps). Require: outputs from Algorithm 1; RFC: Random Forest Classifier; F1: macro F1 score calculation; Acc: accuracy calculation; N_min: minimum number of features; SampleSizes: a set of different sample sizes
Ensure: X: final set of selected features
1: Models ← {included feature selection models}
2: M ← |Models|
3: X_new ← X
4: W_new ← W
5: while X_new ≠ X_old or |X_new| ≤ N_min do
   [...]
9: model ← m-th model in Models
10: for s in SampleSizes do

Table 2. Summary of our dataset.

Table 3. A snapshot of the proposed WiFi dataset.

Table 4. The macro F1 score performance of all-AP and individual AP LOS identification using different sets of features.

Table 5. The macro F1 score performance of all-AP and individual AP LOS identification using datasets of different sampling sizes, compared to MSS.