Hierarchical Fusion of Machine Learning Algorithms in Indoor Positioning and Localization

Abstract: Wi-Fi-based indoor positioning offers significant opportunities for numerous applications. Examining existing Wi-Fi positioning systems, it was observed that hundreds of variables were used even when variable reduction was applied. This reveals a structure that is difficult to reproduce and is far from producing a common solution for real-life applications. This study aims to create a common and standardized dataset for indoor positioning and localization and to present a system that can perform estimations using this dataset. To that end, machine learning (ML) methods are compared, and the results of successful methods with hierarchical inclusion are then investigated. Further, new features are generated according to the measurement point obtained from the dataset. Subsequently, learning models are selected according to the performance metrics for the estimation of location and position. These learning models are then fused hierarchically using deductive reasoning. Using the proposed method, the estimation of location and position proved more successful while using fewer variables than current studies. This paper thus identifies a lack of applicability present in the research community and addresses it using the proposed method. The results suggest that the proposed method yields a significant improvement in the estimation of floor and longitude.


Introduction
Today, the determination of the indoor location and position of objects is an important subject with many applications in various fields such as robotics, asset tracking, warehouse management, and crowd analysis [1][2][3][4]. The idea of indoor positioning and localization based on wireless signal strength was first introduced in 2000 [5]. Since then, it has sparked great interest and become the subject of many research areas. The concepts of positioning and localization are often confused. The position of an object corresponds to a specific point in a coordinate system, such as longitude and latitude, whereas the location of an object provides coarser details of the position such as office, floor, or building. In this regard, localization can be defined as a simpler or categorical form of the positioning process. Localization is sufficient for applications that require less precision, such as crowd analysis and storage-asset tracking. Positioning, on the other hand, is required for more precise operations such as robotics.
Today, creating an economical, accurate, and precise system by utilizing the already installed Wi-Fi infrastructure is one of the most important goals of the research community. To this end, it is essential to use the pattern created by the propagation and address broadcasting of the Wi-Fi access points. The pattern provided by the numerous access points fixed at various points becomes unique by undergoing various reflections and attenuations shaped by the spatial structure of the building. This unique pattern is called the Wi-Fi fingerprint. The simple Wi-Fi fingerprint data consist of the received signal strength values. To achieve this goal, machine learning (ML) methods are compared, and the results of successful ML methods with hierarchical inclusion are then investigated. Further, new features are generated according to the measurement point obtained from the "UJIIndoorLoc" dataset. Subsequently, learning models are selected according to the performance metrics for the estimation of location and position. These learning models are then fused hierarchically using deductive reasoning. The second part of the paper introduces the dataset adopted in this study and the method of positioning. The third part presents the results of the study. Finally, a conclusion is drawn from the results. The main contribution of this study is to propose a more general dataset structure for hierarchical localization and positioning datasets and to provide a more successful retrofit method for hierarchical situations.

Materials and Methods
The overall system for Wi-Fi-based positioning and localization consists of fixed WAPs, mobile devices, and positioning services. The stages of this general method are shown in Figure 1. To start with, new features are generated from the dataset. The generated dataset is split into two categories, namely training and test; this split is carried out randomly, with 70% of the data used for training and 30% for testing. In the next stage, the training data are presented to the Gaussian naive Bayes (NB), decision tree (DT), K-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), and multi-layer perceptron (MLP) algorithms. These learning algorithms were obtained from the Scikit-learn library [25][26][27][28][29]. They are then compared according to the classification and regression performance metrics, which are explained in the ML section, and the most suitable algorithms are selected for positioning and localization. The hierarchical fusing ML (HF-ML) algorithm is shown in Figure 2. In this algorithm, the dataset is first used for the training of the building classification. The results of the building prediction are then combined with the input dataset. Next, the prediction results and the input dataset are used for the training of the floor prediction. After this training, the floor predictions are combined with the previous dataset, so that it contains the input dataset, the building predictions, and the floor predictions. The resulting dataset of these combined hierarchical prediction procedures is used in the training of the longitude and latitude regression. The selected learning models are fused hierarchically using deductive reasoning. According to HF-ML, there are two additional types of input data, namely the building information and the floor information. Since two new variables were introduced, HF-ML was trained with the training data and its performance was evaluated with the test data.
This obtained model can produce location and position data after applying the feature generation process to the new data. See the supplementary materials for the codes and the data.
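The staged prediction procedure described above can be sketched in plain Python. A trivial 1-nearest-neighbour model stands in for the RF/DT learners actually used in the paper, and the toy fingerprints below are invented; the point is only the mechanics of appending each stage's predictions to the feature vectors before training the next stage:

```python
# Sketch of the hierarchical fusing (HF-ML) idea: predict the building,
# append that prediction as a feature, predict the floor, append again,
# then regress the longitude. A minimal 1-NN learner is used as a stand-in.
class OneNN:
    """Minimal 1-nearest-neighbour model used as a stand-in base learner."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        def nearest(q):
            return min(range(len(self.X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], q)))
        return [self.y[nearest(q)] for q in X]

def hf_predict(X_train, buildings, floors, lons, X_test):
    # Stage 1: building classification on the raw features.
    b_model = OneNN().fit(X_train, buildings)
    b_train, b_test = b_model.predict(X_train), b_model.predict(X_test)
    # Stage 2: floor classification, with the building predictions appended.
    X2_train = [x + [b] for x, b in zip(X_train, b_train)]
    X2_test = [x + [b] for x, b in zip(X_test, b_test)]
    f_model = OneNN().fit(X2_train, floors)
    f_train, f_test = f_model.predict(X2_train), f_model.predict(X2_test)
    # Stage 3: longitude regression, with building and floor predictions appended.
    X3_train = [x + [f] for x, f in zip(X2_train, f_train)]
    X3_test = [x + [f] for x, f in zip(X2_test, f_test)]
    lon_model = OneNN().fit(X3_train, lons)
    return b_test, f_test, lon_model.predict(X3_test)

# Toy fingerprints: [RSSI1, RSSI2]; outputs: building, floor, longitude.
X = [[-40.0, -90.0], [-42.0, -88.0], [-85.0, -45.0], [-88.0, -41.0]]
b, f, lon = [0, 0, 1, 1], [0, 1, 2, 3], [10.0, 11.0, 50.0, 52.0]
print(hf_predict(X, b, f, lon, [[-41.0, -89.0]]))
```

In the full system, the stand-in learner is replaced by the RF and DT models selected in the Results section; the feature-augmentation mechanics stay the same.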

Raw Dataset and Properties
This study uses the dataset named "UJIIndoorLoc", which was obtained at Jaume I University [21]. The dataset was collected from 3 different buildings, each of them having 5 floors. A total of 20,946 samples were collected. This dataset is publicly available online [30], and the variables in it are explained in Table 1. Feature codes, the properties of these features, and their ranges are outlined in this table.


Feature Generation
For the method employed in this study, no data based on the user id, timestamp, phone id, or space id were used. The study is based on the production of a new dataset in order to provide an alternative to systems that are less applicable due to the dependence of the data on a specific location, user, phone, or WAP number. The generated features are explained in Table 2. In this table, abbreviations of the generated features are presented according to the data type, and each of these features is explained. Moreover, the absolute Pearson cross-correlation coefficients of the generated data are shown in Figure 3. According to this figure, the building, longitude, and latitude values are highly correlated. Also, the FirstMAC, SecondMAC, and ThirdMAC features provide better correlation with the output values.
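The absolute Pearson coefficients of Figure 3 are a standard computation; a minimal pure-Python version is sketched below. The feature values are invented for illustration (the real FirstMAC and longitude columns come from the generated dataset):

```python
# Absolute Pearson cross-correlation coefficient, as used for Figure 3.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

first_mac = [12, 15, 40, 44, 80]            # hypothetical FirstMAC feature values
longitude = [10.0, 14.0, 30.0, 31.0, 55.0]  # hypothetical longitude values
print(round(abs(pearson(first_mac, longitude)), 3))  # high correlation expected
```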


Machine Learning
The Gaussian naive Bayes (GNB) classifier is a probabilistic classifier based on Bayes' theorem [31]. Due to its low computational load and high classification performance, it has been widely chosen in studies aimed at classification. The aim of the naive Bayes classifier is to compute the posterior probability of class c for a given feature x_i of the i-th observation. The calculation of the posterior probability is given in Equation (1):

P(c|x_i) = P(x_i|c) P(c) / P(x_i) (1)

In this equation, P(c|x_i) is the posterior probability, P(c) is the class prior probability, P(x_i|c) is the likelihood (the probability of the i-th feature given class c), and P(x_i) is the prior probability of feature i. There are three main steps in the naive Bayes algorithm. In the first step, the algorithm calculates the frequency values of each output category according to the dataset. In the second step, the likelihood table is calculated. In the last step, the probability of the output class according to the Gaussian distribution is calculated [31,32].
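The three steps above can be written out directly: estimate class priors and per-feature Gaussian statistics, then pick the class maximizing the (log) posterior of Equation (1). This is a minimal pure-Python sketch with invented RSSI values, not the Scikit-learn implementation the paper uses:

```python
# Minimal Gaussian naive Bayes: the class maximizing
# log P(c) + sum_i log P(x_i | c) wins (Equation (1) in log space).
import math
from collections import defaultdict

def log_gaussian_pdf(x, mean, var):
    # log N(x; mean, var), computed in log space to avoid underflow
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def fit_gnb(X, y):
    by_class = defaultdict(list)
    for row, c in zip(X, y):
        by_class[c].append(row)
    stats = {}
    for c, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (len(rows) / len(X), means, vars_)  # prior, means, variances
    return stats

def predict_gnb(stats, x):
    def log_posterior(c):
        prior, means, vars_ = stats[c]
        return math.log(prior) + sum(log_gaussian_pdf(v, m, s)
                                     for v, m, s in zip(x, means, vars_))
    return max(stats, key=log_posterior)

X = [[-40.0], [-42.0], [-85.0], [-88.0]]   # hypothetical single-RSSI feature
y = ["building0", "building0", "building1", "building1"]
model = fit_gnb(X, y)
print(predict_gnb(model, [-41.0]))
```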
The K-nearest neighbors (KNN) algorithm is a learning algorithm that operates according to the values of the nearest k neighbors. KNN is a non-parametric method for classification and regression [33]. It was first applied to the classification of news articles [34]. When performing learning with the KNN algorithm, the distance from the query point to every other point in the examined dataset is calculated first. This distance calculation is done with the Euclidean, Manhattan, or Hamming distance function. Then, the mean value of the nearest K neighbors is calculated for the data. The K value is the only hyper-parameter of the KNN algorithm. If the K value is too low, the decision borders become jagged and overfitting occurs, whereas if the K value is too high, the separation borders become smoother and underfitting occurs. The disadvantage of the KNN algorithm is that the processing load of the distance calculation grows as the number of data points increases.
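The KNN procedure above can be sketched in a few lines using the Euclidean distance; the fingerprint values and floor labels below are illustrative only (majority vote for classification, mean of the k neighbours for regression):

```python
# Minimal KNN with Euclidean distance: majority vote (classification)
# or mean of the k nearest neighbours (regression).
import math
from collections import Counter

def knn_predict(X, y, query, k, mode="classify"):
    nearest = sorted(range(len(X)),
                     key=lambda i: math.dist(X[i], query))[:k]
    neighbours = [y[i] for i in nearest]
    if mode == "classify":
        return Counter(neighbours).most_common(1)[0][0]
    return sum(neighbours) / k          # regression: mean of the k neighbours

X = [[-40, -90], [-44, -86], [-80, -50], [-84, -46]]   # toy fingerprints
floors = [0, 0, 2, 2]
print(knn_predict(X, floors, [-42, -88], k=3))                          # classification
print(knn_predict(X, [1.0, 2.0, 8.0, 9.0], [-42, -88], k=2, mode="regress"))
```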
The decision tree (DT) algorithm is frequently used in statistical learning and data mining. It works with simple if-then-else decision rules and is used for both classification and regression in supervised learning [35]. It has three basic steps. In the first step, the most meaningful feature is placed as the first (root) node. In the second step, the dataset is divided into subsets according to this node; subsets should be created in such a way that each subset contains data with the same value of a feature. In the third step, steps one and two are repeated until the last (leaf) nodes of all branches are created. The decision tree algorithm thus builds classification or regression models in the form of a tree: it splits the dataset into smaller subsets while an associated decision tree is incrementally developed, and the result is a tree with decision nodes. Decision trees can handle both categorical and numerical data [36,37].
The RF method is designed as a forest consisting of many DTs [38]. Each decision tree in the forest is formed by selecting samples from the original dataset with the bootstrap technique and selecting a random subset of all variables at each decision node. The RF algorithm consists of four fundamental steps. First, n features are randomly selected from a total of m features. Second, the best split point for node d is calculated among the n features. Third, the algorithm checks whether the number of final (leaf) nodes has reached the target number; if it has, it moves on to the next step, and if not, it goes back to step one. Finally, by repeating steps one to three N times (N being the number of trees in the forest), a forest is built [38][39][40].
The adaptive boosting (AB) algorithm is an ensemble ML method. The aim of this method is to create a stronger learner by combining weak learners. AB can be used to improve the performance of any ML algorithm; however, it often uses a single-level decision tree as the weak learner, since its processing load is much lower than that of other basic learning algorithms. The AB algorithm consists of four basic steps. In the first step, N weak learners are run and the dataset is learned, with each of the N weak learners assigned a weight value of 1/N. In the second step, the error value of each learner is calculated. In the third step, the weights of the samples that were learned with a high amount of error are increased. In the final step, the learners are combined as a weighted sum; if the desired metric limit is reached, the combined learner is output, otherwise the algorithm returns to the second step [41][42][43]. As the number of learners in this algorithm increases, both the processing load and the learning performance increase.
First introduced by Vapnik, SVM is a supervised learning algorithm based on statistical learning theory [44,45]. SVM was originally developed for binary classification problems. While SVM was only used for classification at first, it eventually began to be used in linear and non-linear regression procedures as well [46]. SVM aims to find the hyper-plane that separates the classes from each other and is the most distant from both classes. In cases where a linear separation is not possible, the data are mapped to a space of higher dimension, and the classification is performed in that space. While classification aims to separate the data by means of the generated vectors, regression in a sense performs the opposite of this process and aims to create a hyper-plane by identifying support vectors in such a way that they include as many data points as possible [47]. The SVM is mainly divided into two categories according to the linear separability of the dataset. The non-linear SVM decision function is given in Equation (2):

f(x) = sign( Σ_i α_i y_i Φ(x_i)·Φ(x) + b ) (2)

In cases where the data cannot be separated linearly, the data are mapped to a higher-dimensional space, and kernel functions are used to carry out this mapping implicitly. The transformation is made by using a kernel function expressed as K(x_i, x_j) = Φ(x_i)·Φ(x_j) instead of the scalar product Φ(x_i)·Φ(x) given in Equation (2). Thanks to kernel functions, non-linear transformations can be made and the data can be separated in the high-dimensional space; therefore, kernel functions play a critical role in the performance of SVM. The most widely used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid, all of which are given in Table 3 [47][48][49]. In this table, x_i and x_j correspond to input vectors, d is the degree of the polynomial, and γ is the width parameter of the Gaussian (RBF) kernel. In this study, only RBF was used as the SVM kernel function.

Table 3. Support vector machine (SVM) kernel functions.

Kernel Function Equation
Linear: K(x_i, x_j) = x_i · x_j
Polynomial: K(x_i, x_j) = (γ x_i · x_j + r)^d
RBF: K(x_i, x_j) = exp(−γ ||x_i − x_j||²)
Sigmoid: K(x_i, x_j) = tanh(γ x_i · x_j + r)
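The four kernels named for Table 3 can be written out directly; γ, r, and d below are the usual width, offset, and degree hyper-parameters, and the input vectors are made up:

```python
# The standard SVM kernel functions (linear, polynomial, RBF, sigmoid).
import math

def linear(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))

def polynomial(xi, xj, gamma=1.0, r=0.0, d=2):
    return (gamma * linear(xi, xj) + r) ** d

def rbf(xi, xj, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))

def sigmoid(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * linear(xi, xj) + r)

xi, xj = [1.0, 2.0], [2.0, 1.0]
print(linear(xi, xj), polynomial(xi, xj), round(rbf(xi, xj), 4))
```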
K-fold cross validation is a method used in the performance evaluation of learning algorithms [50,51]. In the evaluation process, it is essential to create separate datasets for training and testing and to evaluate the model, which learns from the training data, on the test dataset. However, the performance evaluation may not be reliable because the training and test data may not share the same distribution, outliers may be distributed unevenly, and so on. Therefore, the K-fold cross validation method was developed. In this method, the training and test data are merged into a single dataset. All data are divided into K equal parts, where the K value is determined by the user. Then, learning and testing are performed for each of the K subsets; in each round, one of the subsets is used for testing and the others are used for training. As a result, performance metrics are obtained for each of the subsets, and their average is taken as the performance metric of the K-fold cross validation. In this study, the K value was chosen as 10.
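The procedure above can be sketched in plain Python: split the indices into K parts, let each part serve once as the test fold, and average the per-fold metric. The toy scoring function below is invented purely to exercise the loop:

```python
# Plain K-fold cross validation: each of the K folds is used once for testing,
# and the per-fold scores are averaged.
def kfold_indices(n, k):
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, k, train_and_score):
    folds = kfold_indices(len(X), k)
    scores = []
    for test_idx in folds:
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k

# Toy run: the "model" just scores the fraction of odd labels in the test fold.
X = list(range(10))
y = [i % 2 for i in X]
score = cross_validate(X, y, 5, lambda tr, te: sum(y[i] == 1 for i in te) / len(te))
print(score)
```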
The metrics of classification performance are obtained through the confusion matrix in Figure 4, as outlined in Table 4. The true positive (TP) value in a two-class confusion matrix is the number of predictions where the predicted value is 1 (true) when the actual value is also 1 (true). The true negative (TN) value is the number of predictions where the predicted value is 0 (false) when the actual value is also 0 (false). The false positive (FP) value is the number of predictions where the predicted value is 1 (true) when the actual value is 0 (false). The false negative (FN) value is the number of predictions where the predicted value is 0 (false) when the actual value is 1 (true). The mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) metrics used in the regression performance evaluation are outlined in Table 5. These metrics summarize the differences between the predicted values (ŷ_j) and the observed values (y_j).

Table 5. Regression metrics.

Metric Equation
MSE: (1/n) Σ_j (y_j − ŷ_j)²
MAE: (1/n) Σ_j |y_j − ŷ_j|
R²: 1 − Σ_j (y_j − ŷ_j)² / Σ_j (y_j − ȳ)²

Results and Discussion
In this section, the HF-ML localization and positioning results are presented. The generated dataset consists of 20,946 rows with 10 columns of constructed features. When the data are split (70% training, 30% test), 4190 samples are available for test and 16,756 for training.


Building Classification Results
Building classification is the first step in HF-ML localization and positioning. Only the generated data are used for this classification. The results obtained by the ML methods are presented in Table 6. The accuracy value in this table is provided for both training and testing. The value given for training is the average of the results obtained in the 10-fold cross-validation stage. The precision, recall, and F1 values obtained for testing are calculated as weighted averages, since there are three different output values and these outputs occur in different numbers in the dataset. According to the findings, the RF algorithm is the most successful algorithm in both the training and test metrics. In terms of the accuracy metric, the RF algorithm produces a value of 0.993 in training and a value of 0.995 in testing. The confusion matrix for the RF algorithm on the test data is presented in Figure 5. In this figure, it can be seen that the actual values and the predicted values match to a very high degree, with small numbers of confusions between close buildings.

Floor Classification Results
Floor classification is the second step in HF-ML localization and positioning. For this classification, the generated data and the predictions produced by the RF algorithm selected in the previous section are used. The results obtained with the ML methods are presented in Table 7. The accuracy value in this table is provided for both training and testing. The value given for training is the average of the results obtained in the 10-fold cross-validation stage. The precision, recall, and F1 values obtained for testing are calculated as weighted averages, since there are five different output values and these outputs occur in different numbers in the dataset. The HF tag is added when the generated data are combined with the predictions of the RF building classification. In general, when the performance metrics are considered, the positive effect of HF is observed. According to the findings, the HF-RF algorithm is the most successful one in terms of both the training and test values. In terms of the accuracy metric, the HF-RF algorithm produces a high score of 0.951 in training and 0.960 in testing. The confusion matrices of the RF and HF-RF algorithms are presented in Figures 6 and 7, respectively. In these figures, it can be seen that the actual and predicted values match to a very high degree, with small numbers of confusions between floors close to each other. Comparing Figures 6 and 7, the decrease in the wrongly estimated values is clearly seen.

Longitude and Latitude Regression
The algorithms for longitude and latitude estimation were analyzed using the generated data and the additional HF data. As before, RF was used for building classification and HF-RF for floor classification. The NB algorithm was excluded from the longitude and latitude estimation because it is not applicable to regression. The performance metrics of the longitude regression are presented in Tables 8 and 9. Considering the general performance metrics, the application of HF has a positive effect on longitude estimation. When the algorithms are analyzed in terms of the MSE metric, the most successful algorithm turns out to be HF-DT. While the MSE value of the regression with DT was 270.975, the MSE value of HF-DT decreased to 246.582. The results obtained with HF-DT on the training data have a very low MSE value of 3.751.


Comparison of Findings with Other Studies
A comparison of localization studies is presented in Table 12, and a comparison of positioning studies is given in Table 13. According to Table 12, the most successful method appears to be the one proposed by Bozkurt et al. [12]; however, when the average of the accuracy values obtained with the proposed method is considered, it ranks second among these studies. According to Table 13, the proposed HF-DT method appears to be the most successful one. The results of the extra-trees study are given as root mean square error; to compare them with the values present in this study, the given values were squared to convert them into the MSE metric [11]. A literature review found that the only study to perform both the localization and positioning procedures was conducted by Akram et al., and, according to the findings, the proposed method provides higher performance values [14].

Conclusions
In this study, first of all, a more applicable, standard, and simple dataset structure was designed; then a hierarchical structure for the new dataset as well as a higher-performance method was introduced. Using the proposed method, the estimation of location and position proved more successful while using fewer variables than current studies. This paper thus identifies a lack of applicability present in the research community and addresses it with the proposed method. According to the results obtained with the application of the method, the following conclusions were drawn. A simple, standardized, and more efficient dataset was proposed instead of an indoor positioning dataset whose implementation is difficult and inefficient; this new dataset achieved high performance values. Also, a hierarchical fusion method was proposed to improve the classical ML models. The proposed HF method resulted in a significant improvement in floor and longitude estimation. The method proceeds from the general to the specific, following the hierarchical structure of the localization process: by passing the predictions down the hierarchy, the correct parts of the algorithms are reinforced.
When the findings were examined, RF was found to be the most successful algorithm for building classification. In terms of accuracy metrics, the RF algorithm produces a value of 0.993 in training and a value of 0.995 in testing. Also, HF-RF is chosen as the most successful method for floor classification. In terms of accuracy metrics, the HF-RF algorithm produces a high score of 0.951 in training and 0.960 in testing. In terms of localization, this study concludes that the variable production and the HF method provide efficiency.
The HF method led to an improvement in the longitude estimation process, while it was shown to cause a deterioration in the latitude estimation process. For this reason, HF-DT was preferred for the estimation of longitude and DT was preferred for the estimation of latitude. In terms of positioning, this study concludes that the variable production and the HF method provide efficiency. Additionally, the method of variable production proved to be successful. Further, it was concluded that the HF method only improves the longitude estimation. The approach is therefore promising for position estimation in indoor areas such as rooms, corridors, etc. Although there was no attempt to estimate coordinates in any of the similar studies, a coordinate estimation was carried out in this study; however, the coordinate estimate was not as precise as GPS. This study serves as a preliminary study for future Wi-Fi-based localization or indoor positioning systems. Based on the results of this study, a strengthening technique using the HF-ML method should provide higher performance on similar hierarchical datasets and higher-performance indoor positioning services in the future. Future studies on wireless signal mapping and a campus positioning system are planned, taking advantage of the proposed dataset structure and method.
Author Contributions: A.Ç.S. was mainly responsible for the technical aspects and the modeling, as well as conceiving and designing the manuscript. A.Ç.S. and A.C. contributed equally to the analysis of the data and the writing of the manuscript.