In this section, we first present the system architecture of the proposed IL model XGBLoc, then introduce the input and output specification and the objective function of the proposed model, and finally address RSSI data and its preprocessing.
3.2. Input and Output Specification
In order to devise our model architecture, we first need to acquire a general IL dataset. In any given indoor localization dataset, a fingerprint is usually defined as a pair consisting of an RSSI measurement vector and its associated location labels. Depending on the dataset, the input and output can vary (e.g., some datasets contain only a single building). Therefore, we try to generalize the proposed model as much as possible.
Throughout the paper, we mainly consider the UJIIndoorLoc dataset [18], whose fingerprint elements are listed in Table 2. In the training phase (i.e., offline phase), the RSSI vector $\mathbf{R}$ is combined with the labels $(x, y, b, f)$ to form a sample $(\mathbf{R}, x, y, b, f)$, where $(x, y)$ is the coordinate pair of a landmark (RP) situated in building ID $b$ on floor ID $f$. In the online phase (i.e., after training), the estimated output of the presented model is obtained, i.e., either $(\hat{b}, \hat{f})$ for classification tasks or $(\hat{x}, \hat{y})$ for 2-D regression tasks. For that estimation, real-time measured RSSI data $\mathbf{R}$ are used.
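As a concrete illustration, the following minimal Python sketch shows how such samples could be assembled; the column names (WAP001–WAP520, LONGITUDE, LATITUDE, BUILDINGID, FLOOR) follow the published UJIIndoorLoc description, while the file path and variable names are our own assumptions.

```python
import pandas as pd

# Load the offline (training) portion of UJIIndoorLoc.
df = pd.read_csv("trainingData.csv")

# R: the M = 520 WAP RSSI readings per sample (100 marks "not detected").
R = df.filter(regex=r"^WAP\d+$").to_numpy()

# Labels: 2-D coordinates (x, y), building ID b, and floor ID f.
x, y = df["LONGITUDE"].to_numpy(), df["LATITUDE"].to_numpy()
b, f = df["BUILDINGID"].to_numpy(), df["FLOOR"].to_numpy()
```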
During data collection in the offline phase in a hierarchical smart environment, raw WiFi RSSI measurements are taken at a particular landmark or RP located on floor $f$ of building $b$. In the offline phase, most conventional classification schemes use non-relational labels (NRL), as shown in Figure 1 [top], where the floor ID $f$ and the building ID $b$ are treated as distinct independent variables [18]. In contrast, the proposed scheme employs relational labels (RL), as shown in Figure 1 [bottom], where a relational label relating those two IDs, i.e., $b$ and $f$, is defined and used for localization over the multi-building multi-floor environment. In the RL scheme, we exploit a one-to-many relationship cardinality [32] that maps each building to its corresponding floors, where each relationship is represented by a uniquely assigned relational label (or relational identification number), as shown in Figure 1 [bottom].
For example, if building 1 has four floors, we can obtain a uniquely assigned four-digit relational label for each floor by applying this one-to-many mapping, such as 1001, 1002, 1003, and 1004, where the first digit represents the building number and the last three digits represent the floor number. By doing so, we establish a dependency of floor IDs on building IDs, which enables building and floor to be classified together using only a single classification model. This classification model then easily extracts floor IDs and their corresponding building IDs from those relational labels. The resultant output eliminates the chance of obtaining invalid combinations of floor IDs and building IDs, thereby reducing false mismatches between floor IDs and building IDs during prediction, as shown in Figure 1 [bottom]. This implies that, for a classification problem, the resultant output of the proposed scheme (using that single classification model) correctly estimates the relationship between a building number and its corresponding floor numbers. Furthermore, for a 2-D regression problem, the output of the proposed scheme, $(\hat{x}, \hat{y})$, represents the user's estimated location in the multi-building multi-floor IL environment.
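As a minimal sketch, the four-digit encoding above can be implemented as follows (the helper names are ours, not from the paper):

```python
def encode_rl(building_id: int, floor_id: int) -> int:
    """Map (building, floor) to a unique relational label,
    e.g., building 1, floor 3 -> 1003."""
    return building_id * 1000 + floor_id

def decode_rl(label: int) -> tuple[int, int]:
    """Recover (building, floor) from a relational label."""
    return divmod(label, 1000)

assert encode_rl(1, 4) == 1004
assert decode_rl(1002) == (1, 2)
```

Because the classifier predicts a single relational label, an invalid building–floor combination can never be produced by construction.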
Conventional ML/DL schemes that mostly use NRL classify building IDs well, since buildings are separated by at least a few meters, which means that the WiFi signals of one building suffer little interference from nearby buildings. In addition, in a multi-building multi-floor environment, each building has a unique building ID; for example, if there are four buildings, the building IDs could be given as 1, 2, 3, and 4, respectively. As a result, existing classifiers easily learn from NRL and associate a WAP with its corresponding building, resulting in good performance from the perspective of building classification. However, floor classification is a different story: the WiFi signal interference between adjacent floors of the same building is relatively high, and the floor ID might not even be unique across the entire multi-building complex. Hence, existing classifiers may suffer degraded floor classification performance.
Differing from NRL, RL allows the classifier to easily associate a building with its floors because each floor has a unique relational floor ID. Moreover, XGBoost, which handles such tabular data well, is a good candidate for classifying relational labels. Therefore, the proposed ML algorithm XGBLoc distinguishes well between the floors of different buildings in such complex hierarchical environments, resulting in increased classification performance.
3.3. Objective Function of Proposed XGBoost-Based ML Model
In the proposed model, XGBoost, which leverages a gradient boosting framework, is used as the foundation of our IL algorithm. It is well known that XGBoost, a decision-tree-based ML algorithm, is good at learning from structured data. Some of the key reasons we choose XGBoost over other ML and DL algorithms, such as random forest (RF), DT, GBDT, and CNN, are summarized as follows [30,33]:
XGBoost prunes the tree using a similarity score. It calculates a node's gain as the difference between the children's similarity scores and the node's own similarity score. When a node's gain is found to be nominal, it simply stops growing the tree any further (a minimal sketch of this gain computation is given after this list).
In real-world applications, classification performance, computational cost, and hyperparameter optimization are critical factors in choosing a good classifier. In this paper, we choose XGBoost as such a candidate for IL applications in which a target object is localized over a multi-building multi-floor environment. Moreover, compared to traditional ML/DL classifiers, XGBoost is capable of handling real-time data with many variations.
In particular, we use relational labels representing the hierarchical environment in a given dataset and train XGBoost on the resulting tabular data. The proposed algorithm, termed XGBLoc, performs better on such tabular data than other ML/DL algorithms, even with fewer data samples.
Furthermore, to deal with over- and under-fitting, existing ML/DL algorithms require extensive hyperparameter tuning. For instance, ML algorithms such as RF, KNN, and so forth require longer computational time, and DL algorithms require a large number of data samples to perform well. In contrast, XGBLoc requires relatively simple hyperparameter tuning.
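To make the first point concrete, here is a minimal sketch of the similarity score and split gain for a squared-error objective, following the standard XGBoost formulation [14]; the function and variable names are ours, and lam and gamma denote the L2 regularization and pruning parameters.

```python
import numpy as np

def similarity(residuals: np.ndarray, lam: float = 1.0) -> float:
    """Similarity score of a node under a squared-error objective:
    (sum of residuals)^2 / (number of residuals + lambda)."""
    return residuals.sum() ** 2 / (len(residuals) + lam)

def split_gain(left: np.ndarray, right: np.ndarray, lam: float = 1.0) -> float:
    """Gain of a split: children's similarity minus the parent's."""
    parent = np.concatenate([left, right])
    return similarity(left, lam) + similarity(right, lam) - similarity(parent, lam)

# A split is pruned when its gain minus the threshold gamma is negative.
gain = split_gain(np.array([-10.0, -8.0]), np.array([7.0, 9.0]))
print(gain > 0.0)  # large gain here, so this split would be kept
```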
XGBoost, an ensemble tree approach, employs a gradient descent architecture to boost weak learners. Compared to typical gradient boosting schemes, XGBoost further enhances the underlying gradient boosting framework with system optimization and algorithmic improvements, including hardware optimization, parallel tree building, efficient handling of missing data, tree pruning using a depth-first approach, and regularization through both LASSO (L1) and Ridge (L2) penalties for avoiding overfitting. XGBoost minimizes a regularized (L1 and L2) objective function consisting of a convex loss term and a model complexity penalty term [14]. The training proceeds iteratively until the final prediction results are obtained, while new trees that reduce the residual errors of previous trees are continually added.
Table 3 shows the symbols used for defining the objective function of the proposed ML model.
Consider a fingerprint dataset $\mathcal{D} = \{(\mathbf{R}_i, x_i, y_i)\}_{i=1}^{N}$ that consists of $N$ samples, where $\mathbf{R}_i$ is the vector of received $M$-dimensional RSSI values of the $i$th RP and $(x_i, y_i)$ is the position of the $i$th RP. For the modeling, the dataset $\mathcal{D}$ can be decomposed into two subsets, $\mathcal{D}_x = \{(\mathbf{R}_i, x_i)\}_{i=1}^{N}$ and $\mathcal{D}_y = \{(\mathbf{R}_i, y_i)\}_{i=1}^{N}$. For $\mathcal{D}_x$ and $\mathcal{D}_y$, XGBoost is used to predict $\hat{x}_i$ and $\hat{y}_i$, respectively; putting them together yields the estimated position coordinate $(\hat{x}_i, \hat{y}_i)$. Considering the subset $\mathcal{D}_x$ as an example, where $K$ trees are assumed to have been trained, the predicted output for the $i$th sample is

$$\hat{x}_i = \sum_{k=1}^{K} f_k(\mathbf{R}_i),$$

where $f_k(\mathbf{R}_i)$ is the predicted output of the $k$th tree for the sample $\mathbf{R}_i$. The objective function can be modeled as

$$\mathcal{L} = \sum_{i=1}^{N} l(x_i, \hat{x}_i) + \sum_{k=1}^{K} \Omega(f_k),$$

where $l(x_i, \hat{x}_i)$ represents the $i$th loss function and $\Omega(f_k)$ is the complexity/regularization term of the $k$th tree. For more mathematical details, please refer to [14].
XGBoost supports hyperparameter tuning in order to tackle underfitting and overfitting problems. That is, the hyperparameters of the proposed scheme XGBLoc are tuned such that system performance is improved. Table 4 lists the hyperparameters of XGBLoc and their corresponding default values. Note that the value of "loss function" needs to be set to "multi:softprob" for a multi-class classification problem or to "reg:squarederror" for a regression problem.
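As a brief illustration, these two objectives map directly onto the xgboost Python package; the hyperparameter values below are the library's usual defaults shown for orientation, not the tuned values of Table 4.

```python
import xgboost as xgb

# Classification of relational labels (symbolic location).
clf = xgb.XGBClassifier(
    objective="multi:softprob",  # per-class probability output
    n_estimators=100,            # K trees
    max_depth=6,
    learning_rate=0.3,
    reg_alpha=0.0,               # L1 (LASSO) regularization
    reg_lambda=1.0,              # L2 (Ridge) regularization
)

# One regressor per coordinate for the 2-D physical location.
reg_x = xgb.XGBRegressor(objective="reg:squarederror")
reg_y = xgb.XGBRegressor(objective="reg:squarederror")
```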
3.4. RSSI Data and Preprocessing
According to the log-normal propagation model, measured WiFi RSSI values degrade logarithmically as the distance between a transmitter and a receiver increases [34]. Moreover, in real-world scenarios, WiFi RSSI suffers not only from interference by other radio signals but also from multipath effects caused by complex and dynamic indoor environments. All of these issues increase the non-linearity and uncertainty of WiFi RSSI signals. Another important aspect is the sparsity of raw WiFi RSSI data. Many WAPs, including commercial APs and private Internet-connected smart IoT APs, are installed in multi-floor buildings. In reality, not all APs can be scanned at any particular reference point (RP) because a single AP cannot cover the entire indoor environment. For instance, in the UJIIndoorLoc dataset [18], only about 190 of the 520 APs were scanned on any given floor. That is, a user (or a device) cannot sense RSSI from distant APs, so the RSSI values of those APs are recorded as empty ('NA'), as shown in Figure 3.
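For reference, a common textbook form of this log-normal (log-distance) shadowing model is

$$P(d) = P(d_0) - 10\,n\,\log_{10}\!\left(\frac{d}{d_0}\right) + X_\sigma,$$

where $P(d_0)$ is the received power at a reference distance $d_0$, $n$ is the path-loss exponent, and $X_\sigma$ is a zero-mean Gaussian shadowing term in dB; this is the standard form rather than the exact notation of [34].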
Generally, these empty values are replaced with the minimum present RSSI value; for instance, in our evaluation of the UJIIndoorLoc dataset, we replace them with −98 dBm. However, sparsity remains a cause for concern because a large portion of the data then takes the value −98 dBm. Therefore, we need to reduce the dimensional sparsity; this dimensionality reduction also helps to reduce the computational load and mitigate noise effects. Before reducing the dimensional sparsity of the data, we also need to bring the RSSI values to a common scale, because every WiFi vendor may use a different scale to represent RSSI values. To remove this heterogeneity from the data, we use a ZeroToOneNormalized technique [35], which gives the scaled RSSI value $x_m$ as follows:

$$x_m = \begin{cases} \dfrac{R_m - R_{\min}}{-R_{\min}}, & \text{if } R_m \neq 100,\\[4pt] 0, & \text{if } R_m = 100, \end{cases}$$

where $R_m$ is the $m$th WAP's RSSI value, $R_{\min}$ is the minimum RSSI value in the offline fingerprint database, and the value 100 is used to indicate that no AP was detected. Other than removing the effect of device heterogeneity, the primary purpose of normalization is to bring the numeric column values in the dataset to a standard scale without distorting or losing the information distribution. Each iteration of the XGBoost algorithm optimizes the samples according to the residual error, so the algorithm's bias decreases, making it less sensitive to outliers and noise. The presented scheme then employs PCA [36] to extract important features from the sparse raw WiFi RSSI values, while reducing the data dimensionality and decreasing the impact of outliers.
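A minimal sketch of this preprocessing pipeline, assuming the normalization form shown above and scikit-learn's PCA, and reusing the raw RSSI matrix R from the earlier loading sketch (function and variable names are ours):

```python
import numpy as np
from sklearn.decomposition import PCA

NOT_DETECTED = 100  # UJIIndoorLoc marker for "AP not scanned"

def zero_to_one_normalize(R: np.ndarray) -> np.ndarray:
    """Scale raw RSSI readings to [0, 1]; undetected APs map to 0."""
    detected = R != NOT_DETECTED
    r_min = R[detected].min()            # e.g., about -98 dBm
    X = np.zeros_like(R, dtype=float)
    X[detected] = (R[detected] - r_min) / (-r_min)
    return X

X = zero_to_one_normalize(R)             # R: N x 520 raw RSSI matrix
pca = PCA(n_components=0.90)             # keep ~90% of the variance
X_reduced = pca.fit_transform(X)         # about 100 components (cf. Figure 4)
```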
PCA is a popular tool in modern ML because it is a simple, non-parametric method for extracting meaningful information from complex datasets [36]. In addition, PCA provides a pathway for reducing a complex high-dimensional dataset to a lower-dimensional one with minimal effort [37]. The fundamental advantage of PCA is that it quantifies how well the principal components represent the variability of a dataset. The analysis of variance along a small number of principal components (i.e., fewer than the number of measurement types) offers a meaningful representation of the entire dataset [36]. For instance, for the UJIIndoorLoc dataset [18], Figure 4 shows that the first 100 eigenvectors (i.e., principal components) account for roughly 90% of the variance in terms of the explained variance ratio (EVR). Algorithm 1 shows the pseudocode of the proposed XGBLoc scheme, reflecting the system methodology described above in this section.
Algorithm 1 Pseudocode of the proposed XGBLoc scheme

Input: Dataset $\mathcal{D}$, RSSI vectors $\mathbf{R}$, labels $(x, y, b, f)$, task type (classification or regression), and XGBLoc hyperparameters (Table 4).
Output: Predicted location.
 1: while ($i \le N$ and task = classification) do
 2:    $\mathrm{RL}_i$ ← relational label of $(b_i, f_i)$
 3:    Set RL as labels.
 4: end while
 5: if (task = regression) then
 6:    Skip Steps 1 to 4.
 7:    Set (long, lat) as labels.
 8: end if
 9: if ($R_m$ = NA) then
10:    $R_m$ ← −98 dBm
11: end if
12: $\mathbf{R}$ ← ZeroToOneNormalized($\mathbf{R}$).
13: $\mathbf{R}$ ← PCA($\mathbf{R}$).
14: Set Dataset ← ($\mathbf{R}$, labels).
15: training set, testing set, validation set ← split(Dataset).
16: if (task = classification) then
17:    Train classification XGBLoc model.
18:    Tune hyperparameters.
19:    return Predicted location $(\hat{b}, \hat{f})$: symbolic location.
20: else {task = regression}
21:    Train regression XGBLoc model.
22:    Tune hyperparameters.
23:    return Predicted location $(\hat{x}, \hat{y})$: physical location.
24: end if
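Building on the earlier sketches (the matrix R, the label arrays b and f, the function zero_to_one_normalize, and the pca object are defined there), the following end-to-end Python sketch mirrors Algorithm 1 for the classification task; all names are ours, and the tuned hyperparameters of Table 4 are omitted.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import xgboost as xgb

# Steps 1-4: relational labels for the classification task.
rl = b * 1000 + f

# Steps 9-13: fill 'NA' markers, normalize, and reduce dimensionality.
X_reduced = pca.fit_transform(zero_to_one_normalize(R))

# XGBoost's sklearn API expects class labels 0..C-1.
le = LabelEncoder()
y = le.fit_transform(rl)

# Steps 14-15: split the processed dataset (validation split omitted).
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, random_state=0)

# Steps 16-19: train, tune (omitted), and predict the symbolic location.
model = xgb.XGBClassifier(objective="multi:softprob")
model.fit(X_train, y_train)
pred_rl = le.inverse_transform(model.predict(X_test[:1]))[0]
b_hat, f_hat = divmod(int(pred_rl), 1000)
```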