Advanced Heterogeneous Feature Fusion Machine Learning Models and Algorithms for Improving Indoor Localization

In the era of the Internet of Things and Artificial Intelligence, the Wi-Fi fingerprinting-based indoor positioning system (IPS) has been recognized as the most promising IPS for various applications. Fingerprinting-based algorithms critically rely on a fingerprint database built with machine learning methods. However, current methods are based on a single feature, the Received Signal Strength (RSS), whose performance is extremely unstable in terms of precision and robustness. The reason is that single-feature machines cannot capture the complete channel characteristics and are susceptible to interference. The objective of this paper is to exploit the Time of Arrival (TOA) feature and propose a heterogeneous feature fusion model to enhance the precision and robustness of indoor positioning. Several challenges are addressed: (1) machine learning models based on heterogeneous features, (2) the optimization of algorithms for high precision and robustness, and (3) computational complexity. This paper provides several heterogeneous feature fusion-based localization models. Their effectiveness and efficiency are thoroughly compared with state-of-the-art methods.


Introduction
With the seamless integration of the physical world and the digital world through networks, the era of the Internet of Things (IoT) beckons. It offers a tremendous number of opportunities for novel applications that contribute to a significantly improved daily life [1]. With the surge in demand for location services, Location-Based Services (LBSs) have become one of the key applications. For outdoor localization under Line-Of-Sight (LOS) propagation conditions, the Global Positioning System (GPS) has matured and been successfully applied in various fields. For indoor localization under Non-Line-Of-Sight (NLOS) propagation conditions, extending GPS to indoor environments is extremely difficult due to irregular signal fading and multi-path interference [2]. Therefore, indoor localization requires innovative solutions.
Typically, existing indoor positioning methods can be divided into three categories: the affinity method, the geometry-based method and the fingerprint method. With the affinity method, the location of the target node is approximated by the location of the service node when the mobile target accesses a service node at a known location. The affinity method has low computational complexity but poor localization precision. Geometry-based positioning methods are either time [...] obtained by classification algorithms are discrete and struggle to meet the high precision requirements of indoor positioning. To improve the precision, some regression machine learning algorithms have been applied. Support Vector Regression (SVR) [31][32][33] is used to find the positioning function that controls the accumulative error. However, the computational complexity of the SVR algorithm is cubic in the number of training samples, because its solution process involves the inversion of an n-order positive definite matrix [34]. Reference [35] proposed a deep-learning-based fingerprinting scheme that can fully explore the features of wireless channel data and obtain the optimal weights as fingerprints. However, the prediction performance of a deep neural network is highly dependent on the size of the training data set and may not be better than that of a conventional machine learning algorithm when the number of training samples is small. Moreover, training a deep neural network requires a large number of matrix operations and may be time-consuming and costly. The ridge regression algorithm [36] is another regression machine learning algorithm that has been used in localization [5]. Compared to the least squares method, the ridge regression algorithm is more reliable [37], and compared with WKNN and SVR, it has the best performance in terms of precision and robustness. However, finding the optimal values of the regularization parameter and the kernel parameter is the most computationally expensive part [5,37].
Researchers increasingly focus on Wi-Fi fingerprint localization [38] and on Pedestrian Dead Reckoning (PDR) [39] localization systems, which are based on inertial sensors, for reasons of low cost, good compatibility, extendibility, etc. However, a single-mode localization technology cannot meet users' demands in complex indoor environments because of its own limitations. Thus, some fusion models, which are built on two or more existing models, have been proposed. The most general fusion scheme is based on Wi-Fi fingerprints and PDR [40][41][42], since they possess complementary properties. For building the fusion model, Reference [43] makes use of the Kalman filter, which is the optimal filter for a linear model. However, since the Kalman filter is a linear optimal filter, it cannot solve complex non-linear localization problems. To overcome this problem, the EKF (extended Kalman filter) [44] and UKF (unscented Kalman filter) [45,46], which are non-linear developments of the Kalman filter, are generally utilized. Compared with the Kalman filter, the particle filter [47] has better generalization ability and is thus more suitable for non-linear problems. However, a localization model built with the particle filter is much more complicated, which can harm the real-time performance of the model. Besides these, some other methods, such as the Hidden Markov Model (HMM) and the Conditional Random Field (CRF), are also utilized.

Motivation and Contribution
From the above references, we note that among the machine learning algorithms, regression methods provide better indoor localization accuracy. On the other hand, exploiting different features provides another way to improve the accuracy of localization. To address the above issue, we propose heterogeneous feature fusion (HFF) machines to effectively improve localization. Several multiple-feature fusion machine learning models are given. First, we propose the Heterogeneous Feature Fusion ridge regression (HFF-RR) model. The results show that both the precision and the robustness to noise are improved compared with Reference [5]. Second, to eliminate the bias caused by noisy data, we propose the heterogeneous feature selection (HFS) model by employing group LASSO [48]. Additionally, we have designed a fast algorithm to solve the model, which combines the Newton iteration method with gradient descent. The gradient descent step size is obtained by the backtracking line search method, which accelerates convergence. Third, to separate the impact of each feature, we provide another machine model that uses an L1-norm penalty. Fourth, to reduce the computational complexity of HFF-RR, we simplify the HFF model to a set of underdetermined equations and then transform it into a constrained optimization problem. Numerical results show that, compared with the other proposed learning methods, this model has the lowest computational complexity but relatively high localization accuracy. We compare our proposed localization algorithms with two state-of-the-art localization algorithms based on feedback and correction of Wi-Fi signals and PDR information fusion: EKF and UKF [43]. Simulation data show that our HFF model outperforms EKF and UKF when both localization accuracy and time efficiency are considered.

Organization
The rest of the paper is organized as follows. The general model for localization is first introduced in Section 2. Our proposed hybrid feature machine learning models and algorithms are presented and compared in Section 3. In Section 4, the real data collection procedure is elaborated, and the performance of the proposed methods is evaluated on both simulated and real-world signals. Finally, Section 5 concludes the work.

Machine Learning for Indoor Localization
We consider an environment of $D$ dimensions whose points are denoted by $p = [p^1, \dots, p^D]$. In addition, two types of sensors are considered: Access Points (APs) and mobile nodes. APs, acting as signal emission nodes, are evenly distributed in the $D$-dimensional space and denoted by $a_r = [a_r^1, \dots, a_r^D]$, $r \in \{1, \dots, N_a\}$, where $N_a$ is the total number of APs. Mobile nodes are used for receiving signals; their locations are known and serve as training samples, denoted by $p_l = [p_l^1, \dots, p_l^D]$, $l \in \{1, \dots, N_p\}$, where $N_p$ is the total number of mobile nodes. The fingerprinting localization scheme consists of two phases, namely the offline and online phases, as illustrated in Figure 1. In the offline phase, broadcast signals are transmitted by the APs at a constant initial power. Meanwhile, each sensor placed at a known position detects the signal features transmitted by all $N_a$ APs.
Therefore, the estimated $d$th coordinate can be obtained by $\hat{p}^d = \phi_d(F)$. As one of the most popular machine learning tools, the kernel machine is effective for learning a nonlinear function [49]. In a kernel machine, the input data are implicitly embedded into a high-dimensional space by a nonlinear mapping. Linear functions in the transformed kernel space are equivalent to a rich class of nonlinear functions in the original data space, which constitute the so-called reproducing kernel Hilbert space (RKHS) [50,51]. We take $\phi_d(\cdot)$ to be a kernel-based machine learning model built on a reproducing kernel function. In practice, classic kernel functions include the linear, polynomial and Gaussian kernels. Here, we select Gaussian kernel functions, one per feature, where $f^m$ is an $N_a$-dimensional column vector denoting the $m$th feature of the input sample. For each feature $m$, the similarity between two samples is represented by $K_m(f^m, f_l^m)$. We then model the regression function by $\hat{p}^d = \phi_d(F) = \sum_{l=1}^{N_p} \sum_{m=1}^{M} \beta_l^m K_m(f^m, f_l^m) + \beta_0$ (4), where $\beta_l^m$ is the unknown kernel regression coefficient associated with the $l$th sample and the $m$th feature, and $\beta_0$ is the bias. The new model provides a flexible way to fuse multiple features: the fusion weights are formulated as part of the kernel regression coefficients and are adaptively estimated from the data.
To solve for $\phi_d(F)$, we minimize a loss function of the form $L + \lambda R$, where $L$ denotes the empirical loss over the training set, $\lambda$ is a tuning parameter, and the regularization term $R$ is usually a monotone function of the RKHS norm of $\phi_d$. The regression loss is chosen because: (a) it achieves better localization accuracy than the other machine learning algorithms; (b) it gives continuous results, which are much more accurate than classification results; (c) the ridge regression loss function is differentiable everywhere, which is preferred for optimization. We will show later that such a loss function gives rise to an efficient learning algorithm.
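The fused regression function described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the Gaussian kernel is assumed to take the common form exp(-||f - f_l||^2 / (2 sigma_m^2)), and the names `gaussian_kernel`, `predict_coordinate`, `F_train` and `sigmas` are hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(f, f_l, sigma):
    """Gaussian similarity between two feature vectors (assumed kernel form)."""
    diff = np.asarray(f, dtype=float) - np.asarray(f_l, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def predict_coordinate(F, F_train, beta, beta0, sigmas):
    """Evaluate sum_l sum_m beta[l, m] * K_m(f^m, f_l^m) + beta0.

    F       : (M, N_a) array, one row per feature type of the query sample
    F_train : (N_p, M, N_a) array of training samples
    beta    : (N_p, M) array of kernel regression coefficients
    sigmas  : length-M sequence of per-feature kernel bandwidths
    """
    total = float(beta0)
    for l, F_l in enumerate(F_train):
        for m, sigma in enumerate(sigmas):
            total += beta[l, m] * gaussian_kernel(F[m], F_l[m], sigma)
    return total
```

Because each feature has its own bandwidth and its own coefficients, a feature that is unreliable can be down-weighted without affecting the others.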

Fusion Machines Models and Algorithms
In this section, we seek a set of functions $\phi_d(F)$, $d \in \{1, \dots, D\}$, each of which maps a feature matrix $F$ to the corresponding coordinate $p^d$. We propose several learning models and efficient algorithms to find the appropriate $\phi_d(F)$.

Heterogeneous Feature Fusion Ridge Regression (HFF-RR)
In this subsection, we define the function $\phi_d(\cdot)$ as the minimizer of a regularized risk, Equation (6), in which $\beta_l = [\beta_l^1, \dots, \beta_l^M]$ collects the coefficients of the $l$th sample. Let $\beta$ be an $(N_p M + 1)$-dimensional column vector that consists of the scalar $\beta_0$ and the $N_p$ vectors $\beta_l$. By plugging (4) into (6), the optimization problem (6) can be written in matrix form in terms of $\beta$ and a matrix $K$. The solution is obtained by taking the derivative with respect to $\beta$ and setting it to zero, which leads to $\beta = (K^T K + \lambda E)^{-1} K^T p^d$. Note that an appropriate regularization coefficient $\lambda$ can be found that makes the matrix $(K^T K + \lambda E)$ nonsingular.
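As a minimal sketch of this closed-form step, the coefficients can be obtained by solving one linear system. The matrix $E$ is not spelled out in the surviving text, so the sketch assumes it is the identity; the function name `fit_ridge` is hypothetical.

```python
import numpy as np

def fit_ridge(K, p_d, lam):
    """Solve (K^T K + lam * E) beta = K^T p_d for beta.

    Assumption: E is taken to be the identity matrix.
    """
    n_cols = K.shape[1]
    E = np.eye(n_cols)
    # Solving the linear system avoids forming an explicit matrix inverse.
    return np.linalg.solve(K.T @ K + lam * E, K.T @ p_d)
```

For a wide $K$ the product $K^T K$ is singular on its own, which is why a strictly positive `lam` is needed for the system to have a unique solution.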

Heterogeneous Feature Selection using Group LASSO Penalty (HFS-GLP)
Due to the large number of parameters, we employ group LASSO [52] regularization to prevent overfitting and to remove noisy samples. To learn a group-sparse model, the final cost function, Equation (10), is built on this penalty, where $\|\cdot\|_2$ denotes the $\ell_2$ norm. The group LASSO imposes sparsity at the group level by combining the $\ell_1$ and $\ell_2$ norms: it uses the $\ell_2$ norm within a group and the $\ell_1$ norm between groups.
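To make the "l2 within a group, l1 between groups" structure concrete, the following sketch evaluates such a penalty. It assumes that each group holds the $M$ coefficients of one training sample, which matches the stated goal of removing noisy samples; the function names are hypothetical.

```python
import numpy as np

def group_lasso_penalty(beta_groups):
    """Sum over groups of the l2 norm of each group.

    beta_groups : (N_p, M) array; row l is assumed to hold the M
                  coefficients of the l-th training sample.
    """
    return float(np.sum(np.linalg.norm(beta_groups, axis=1)))

def active_samples(beta_groups, tol=1e-8):
    """Indices of samples whose whole coefficient group is not zeroed out."""
    return np.flatnonzero(np.linalg.norm(beta_groups, axis=1) > tol)
```

A sample whose group norm is driven to zero contributes nothing to the regression function, so it is effectively discarded from the model.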

An Efficient Iterative Optimization (EIO) Algorithm
Since the group LASSO regularization term is not differentiable, an iterative algorithm must be employed to minimize the model. In this paper, we propose an efficient approach to solve the optimization problem (10). A gradient descent method whose step size is obtained by backtracking line search is a natural choice. However, because it converges extremely slowly near the point at which the target function achieves its minimum value, it is still hard to obtain the optimal solution. Meanwhile, the Newton iterative algorithm has a quadratic convergence speed, and it can reach convergence in only one iteration for the optimization problem when its Hessian matrix is positive definite. Therefore, we propose an improved learning algorithm that combines these two iterative algorithms. We call it Efficient Iterative Optimization (EIO). For simplicity, we outline the framework of EIO in Algorithm 1.
Algorithm 1: Outline of the EIO algorithm
1: Let $\beta^{(0)}$ be the initial point.
2: Let $t_M$ be the number of iterations; run the gradient descent method whose step size is obtained by backtracking line search, and output $\beta_{opt}$.
3: Let $\beta_{opt}$ be the initial point; run the Newton iterative algorithm until it converges.
Output: the precise $\beta_{opt}$.
EIO consists of two phases. In the first phase, we set an arbitrary point as the initial point and then obtain a point quite close to the optimal solution by gradient descent with backtracking line search, which repeats $\beta^{(k)} = \beta^{(k-1)} - \alpha_k \nabla C_d(\beta^{(k-1)})$, where $C_d$ is the target function and $k$ denotes the $k$th iteration. In the backtracking line search, $\alpha_k$ is updated by $\alpha_k = \nu \alpha_k$, $\nu \in (0, 1)$, until the Armijo rule is met, i.e., until the following inequality holds: $C_d(\beta^{(k-1)} - \alpha_k \nabla C_d(\beta^{(k-1)})) \le C_d(\beta^{(k-1)}) - \mu \alpha_k \|\nabla C_d(\beta^{(k-1)})\|_2^2$, where $\mu \in (0, 0.5)$ is a constant.
In the second phase, we use the Newton iterative method, taking the result of the previous phase as the initial point. The Newton iterative method exploits first-order and second-order information of the cost function to obtain the optimal $\beta$. Within the $k$th iteration, it is calculated by $\beta^{(k)} = \beta^{(k-1)} - H_{k-1}^{-1} \nabla C_d(\beta^{(k-1)})$, where $H_{k-1}$ is the Hessian matrix of the target function $C_d$ at the point $\beta^{(k-1)}$. We illustrate the effectiveness of the EIO algorithm through simulation experiments, in which the three iterative algorithms use the same initial point. The simulation results are shown in Figure 2, which compares the objective function curves of the three iterative algorithms, i.e., the EIO algorithm, the Newton method alone and the backtracking line search gradient descent method alone. The results show that the objective function reaches 0.08 after 10,001 iterations of the EIO algorithm. However, the backtracking line search gradient descent algorithm still has not converged after more than 100,000 iterations, and the Newton iteration method does not converge.
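The two-phase idea can be sketched as follows. This is not the authors' implementation: it assumes a smooth, twice-differentiable objective (the non-differentiable group LASSO term would need smoothing or a subgradient, which is not shown), and the toy objective, the function names and the default values of `nu`, `mu` and `n_gd` are all hypothetical choices for illustration.

```python
import numpy as np

def backtracking_gradient_descent(f, grad, x0, n_iter=50, alpha0=1.0, nu=0.5, mu=0.25):
    """Phase 1: gradient descent with an Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        alpha = alpha0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - mu * alpha * (g @ g) and alpha > 1e-16:
            alpha *= nu
        x = x - alpha * g
    return x

def newton_refine(grad, hess, x0, tol=1e-10, max_iter=50):
    """Phase 2: Newton iterations started from the phase-1 output."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

def eio(f, grad, hess, x0, n_gd=50):
    """Run the gradient phase, then hand its result to the Newton phase."""
    x_gd = backtracking_gradient_descent(f, grad, x0, n_iter=n_gd)
    return newton_refine(grad, hess, x_gd)

# Toy strictly convex objective (illustrative only); its minimizer is (1, 1).
A = np.diag([2.0, 1.0])
b = np.array([3.0, 2.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x + 0.25 * np.sum(x ** 4)

def grad(x):
    return A @ x - b + x ** 3

def hess(x):
    return A + 3.0 * np.diag(x ** 2)

x_opt = eio(f, grad, hess, x0=[5.0, -5.0])
```

The gradient phase is robust far from the minimizer, while the Newton phase converges quickly once the iterate is close, which is the motivation given for combining them.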

Heterogeneous Feature Selection Using L1-Norm Penalty (HFS-LNP)
The model in Equation (10) removes noisy samples, each of which contains multiple features. In this subsection, we propose a learning model that removes only the corrupted features of a sample instead of all of its features. The corresponding optimization problem is given in Equation (14). As explained in Reference [53], SALSA (Split Augmented Lagrangian Shrinkage Algorithm), which combines the augmented Lagrangian approach with the variable splitting technique, is suitable for solving linear inverse problems with sparse regularization. Applying variable splitting to Equation (14) yields the constrained optimization problem in Equation (15).

We exploit the conclusion of Reference [53] and solve the optimization problem in Equation (15) with the SALSA iterations, where $v = u - d$ and the operator $K^H$ is the Hermitian conjugate of $K$.
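For orientation, a generic variable-splitting iteration for an L1-penalized least-squares problem can be sketched as below. The exact update equations of the paper are not recoverable from the text, so this follows the standard scaled augmented-Lagrangian form, assumes the objective $\frac{1}{2}\|K\beta - p^d\|_2^2 + \lambda \|u\|_1$ subject to $u = \beta$, and treats real-valued data only. The names `soft_threshold`, `salsa_l1`, `mu` and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage operator: the proximal map of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def salsa_l1(K, p_d, lam, mu=1.0, n_iter=200):
    """Variable-splitting iterations for 0.5*||K b - p_d||^2 + lam*||u||_1, u = b."""
    n = K.shape[1]
    beta = np.zeros(n)
    d = np.zeros(n)
    # For real K the Hermitian conjugate K^H is simply the transpose.
    A = K.T @ K + mu * np.eye(n)
    Khp = K.T @ p_d
    u = np.zeros(n)
    for _ in range(n_iter):
        u = soft_threshold(beta + d, lam / mu)   # sparse copy of beta
        v = u - d
        beta = np.linalg.solve(A, Khp + mu * v)  # regularized least squares
        d = d - (u - beta)                       # scaled multiplier update
    return u
```

Returning the split variable `u` gives coefficients that are exactly zero where the shrinkage step removed them.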

Heterogeneous Feature Fusion by Solving Underdetermined Equations (HFF-UE)
For the algorithms proposed in the previous subsections, finding the optimal values of the regularization parameter and the kernel parameters is the most computationally expensive part. The computational cost increases tenfold with each added parameter. In this subsection, we remove the regularization parameter to reduce the computational complexity. We formulate the training set $\{(F_l, p_l^d)\}_{l=1}^{N_p}$ as a group of equations, Equation (17). Since the number of unknown coefficients in Equation (17) is much larger than the number of equations, the system is underdetermined. It can be rewritten as $p^d = K\beta$, where $p^d$ is an $N_p$-dimensional column vector denoted by $p^d = [p_1^d, \dots, p_{N_p}^d]^T$ and $K$ is a "wide" matrix that has more columns than rows and whose rows are linearly independent. In this case, we formulate it as the optimization problem $\min_\beta \|\beta\|_2^2$ subject to $K\beta = p^d$. By using Lagrange multipliers, we can derive the closed-form solution in Equation (20).
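A minimal sketch of such a closed-form solution is given below. It assumes the problem is the minimum-norm one (smallest $\|\beta\|_2$ among all solutions of $K\beta = p^d$), for which Lagrange multipliers give $\beta = K^T (K K^T)^{-1} p^d$; the function name `min_norm_solution` is hypothetical.

```python
import numpy as np

def min_norm_solution(K, p_d):
    """Minimum l2-norm solution of K beta = p_d.

    Assumes K is wide with linearly independent rows, so K K^T is invertible.
    """
    # Solve (K K^T) w = p_d, then map back with K^T.
    return K.T @ np.linalg.solve(K @ K.T, p_d)
```

No regularization parameter appears here, so only the kernel parameters remain to be tuned by cross-validation.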

The Relationship between the Proposed Four Learning Models
In this subsection, we address the differences and connections between our models. Table 1 lists the formulas of the four models and their performance in terms of precision, time efficiency and sparsity. Each of the four learning models consists of a fitness term L and a penalty term R. The fitness terms are based on the least-squares error criterion. The penalty terms differ according to the different goals in positioning performance. HFF-RR applies an L2-norm penalty term. Since its cost function is differentiable, the model has an exact solution, which leads to the highest accuracy and robustness in positioning. To remove noisy samples, the group LASSO, which combines the L1-norm and the L2-norm, is selected as the penalty term in HFS-GLP. HFS-LNP removes corrupted features by using an L1-norm penalty term. Since the L1-norm term is not differentiable everywhere, the optimal solution must be obtained by an iterative algorithm. Considering the computational burden, we propose an efficient approach to solve the optimization problem. Because the solution is computed iteratively, the precision and robustness in positioning are comparatively poor. Furthermore, iterative computation reduces computational efficiency. In HFF-UE, the optimization problem is derived from the underdetermined equations. It removes the regularization parameter, so the computational complexity of the cross-validation phase is significantly reduced. Therefore, it has the highest computational efficiency.
Table 1. The relationship between the four proposed learning models.

Numerical Analysis and Results
In this section, we evaluate the performance of the proposed approaches by simulation with real data. The system is reviewed first, including the data collection and zone division; the computational efficiency and accuracy of the approaches are then evaluated by comparison with popular machine-learning-based methods and other popular data fusion methods. In the second part, the proposed heterogeneous feature machine is compared with the RSS-based kernel machines in different noise scenarios.

Real Experiment Setup
To test the performance of the proposed models and algorithms, we conducted an experiment in a school building. The floor plan is shown in Figure 3. The experiment area includes a long west-east oriented aisle and four shorter north-south oriented aisles. The long aisle is around 40 m long, while each shorter aisle is nearly 8.5 m long. More than ten APs with uniform specifications are deployed in the area at unknown positions. The direction from east to west is taken as the X axis, and the direction from south to north as the Y axis. The anchor points are set symmetrically with a 1.2 m spacing, for 126 anchor points in total. In the offline stage, we use a TL-WN823N USB wireless network adapter, which is compatible with the IEEE 802.11b/g/n standards. The system operates at 2.4 GHz. To have enough data for the simulation, we scan the Wi-Fi RSS information at every anchor point 100 times, at a sampling interval of 1 s. The collected data are stored in text files. We import these data and perform simulation experiments in Matlab 2015a on a Sony laptop with Windows 7 and an Intel® Core™ i5 CPU.

Localization Accuracy and Computational Cost Evaluation
In this subsection, we mainly discuss the performance of the four proposed fusion machine models in terms of positioning accuracy and time efficiency. We also compare them with two state-of-the-art localization algorithms based on feedback and correction of Wi-Fi signals and PDR information fusion: EKF and UKF [43]. The corresponding simulation results are recorded in Figures 4 and 5 and Table 2. Figure 4 shows the cumulative distribution function (CDF) of the position estimation error of each positioning algorithm. Figure 5 shows the root mean square error (RMSE) as a function of the Signal-to-Noise Ratio (SNR). The RMSE, the running time of parameter optimization and the corresponding optimal parameters are shown in Table 2. In terms of positioning accuracy, EKF and UKF have higher localization accuracy and stability than the RSS-based Wi-Fi fingerprint localization algorithm, i.e., WKNN. Table 2 shows that the average error of EKF and UKF is about 3 m, while that of WKNN is close to 4 m. In Figure 4, the positioning error probability of EKF and UKF is significantly lower than that of WKNN. Although EKF and UKF improve positioning accuracy and stability, our proposed localization algorithms outperform them on both counts. From Table 2, the positioning accuracy of the four positioning algorithms proposed in this paper is higher than that of EKF and UKF. From Figure 4, the probability that the HFF-RR positioning error is within 3 m reaches 95%, and HFF-UE and HFS-LNP are close to 90%, while EKF and UKF are below 80%. From the results in Figure 5, we can observe that the RMSE decreases as the SNR increases. In terms of time efficiency, since finding the optimal values of the tuning parameters is the most computationally expensive part, we measured the average running time of the parameter optimization phase for each algorithm. To reduce the computational complexity of EKF and UKF, we used the WKNN clustering algorithm to cluster the Wi-Fi fingerprint database. The K parameter denotes the number of classifications in WKNN.
From Table 2, we observe that EKF and UKF are both faster than the other algorithms, and that the HFF-UE algorithm is only slightly slower than EKF and UKF. However, the HFF-UE algorithm outperforms EKF and UKF in terms of positioning accuracy. Compared to the HFF-RR algorithm, which has the highest accuracy, HFF-UE is only slightly less accurate, but its time complexity is much lower. Therefore, the HFF-UE algorithm is a good localization model when both localization accuracy and time efficiency are considered. Another point worth noting from Table 2 is that the first two learning algorithms outperform the following two in both precision and computational complexity. The large number of iterations in the latter two algorithms leads to long running times in the parameter optimization phase. Moreover, the result of an iterative algorithm is an approximation of the optimal value rather than its exact value.

In this subsection, we compare our proposed heterogeneous feature machine with the RSS-based single-feature machine proposed in Reference [5]. The entries of $f_l^1$ are the RSS values at position $p_l$ emitted by the different APs. They are generated using the well-known Okumura-Hata model [54]. The power $f_{lr}^1$ received at position $p_l$ from the AP $a_r$ can be expressed by $f_{lr}^1 = \rho_0 - 10 n_p \log_{10} \|a_r - p_l\| + \varepsilon_{lr}$, where $\rho_0$ is the initial power, set to a fixed value of 150 dBm, $n_p$ is the path-loss exponent, set to 4, $\|a_r - p_l\|$ is the Euclidean distance between the position $p_l$ and the position $a_r$, and $\varepsilon_{lr}$ is the noise in the indoor wireless channel.
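The RSS generation step can be sketched as follows, assuming the log-distance path-loss form $\rho_0 - 10 n_p \log_{10}(\text{distance}) + \text{noise}$ with the stated values $\rho_0 = 150$ dBm and $n_p = 4$. The function name `simulate_rss`, the Gaussian noise model and the `noise_std` parameter are assumptions made for illustration.

```python
import numpy as np

def simulate_rss(ap_positions, positions, rho0=150.0, n_p=4.0, noise_std=0.0, rng=None):
    """Simulated RSS matrix of shape (N_p, N_a) under a log-distance model.

    ap_positions : (N_a, D) array of AP coordinates
    positions    : (N_p, D) array of receiver coordinates
    noise_std    : standard deviation of additive Gaussian noise (assumed model)
    """
    rng = np.random.default_rng() if rng is None else rng
    ap_positions = np.asarray(ap_positions, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between every position and every AP.
    dist = np.linalg.norm(positions[:, None, :] - ap_positions[None, :, :], axis=2)
    rss = rho0 - 10.0 * n_p * np.log10(dist)
    return rss + rng.normal(0.0, noise_std, size=rss.shape)
```

With `n_p = 4`, each tenfold increase in distance lowers the simulated RSS by 40 dB, so a receiver placed exactly on an AP (zero distance) should be avoided.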
For the elements of $f_l^2$, $f_{lr}^2$ is the propagation time of the signal transmitted from the position $a_r$ to the position $p_l$. Its value follows from the velocity of light, $c = 3.0 \times 10^8$ m/s, and from $\tau_{lr}$, the time delay caused by the propagation of light. For any learning algorithm, it is necessary to choose the optimal parameters for accurate positioning. We use k-fold cross-validation, the basic form of cross-validation, which consists of separating the data into k approximately equally sized folds. At each iteration, k-1 folds are used for training and the remaining fold for validation. For each group of parameters of a given learning algorithm, the performance is measured by the mean error on the validation set over the k iterations. The optimal tuning parameters are those that yield the minimum mean validation error. The value of k is set to 10. For each learning model, our simulation experiment can be divided into the following steps:
• Optimizing the relevant parameters using the 10-fold cross-validation method.
• Learning the location model $\psi(\cdot) = [\phi_1(\cdot), \phi_2(\cdot)]$ on the training set by using the current learning algorithm.
• Validating the model learned in the previous step on the validation set.
Figure 6 compares the estimated curves of the generated trajectory in several simulations by the heterogeneous feature machine and by the single feature-based kernel machine mentioned in Reference [5]. The heterogeneous feature machine is learned via HFF-RR and HFS-LNP in the absence of noise. The single feature-based kernel machine is learned by using single-feature ridge regression. The estimation error, measured by the root mean squared distance between the exact positions and the estimated ones, as well as each optimal parameter, are shown in Table 3. We notice that the heterogeneous feature machine outperforms the single feature-based kernel machine with either HFF-RR or HFS-LNP.
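The parameter selection procedure described above can be sketched generically. The helper names (`k_fold_indices`, `cross_validate`, `select_params`) and the `fit`/`predict` callables are hypothetical; any of the learning models could be plugged in through them. The error measure here is the mean absolute validation error, which is an assumption.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(fit, predict, X, y, params, k=10, seed=0):
    """Mean validation error of one parameter setting over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train], params)
        errors.append(np.mean(np.abs(predict(model, X[val]) - y[val])))
    return float(np.mean(errors))

def select_params(fit, predict, X, y, grid, k=10):
    """Return the candidate in `grid` with the smallest mean validation error."""
    return min(grid, key=lambda params: cross_validate(fit, predict, X, y, params, k))
```

Every candidate in `grid` costs k model fits, which is why each additional tuning parameter multiplies the running time of this phase.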
To prove the robustness of the proposed model, we carried out simulations in several noisy scenarios. We corrupt the RSS with Gaussian random noise and the TOA with non-line-of-sight errors. The estimation error and optimal parameters of our proposed model HFF-RR and of the model introduced in Reference [5] are shown in Table 4. We consider three kinds of noise scenarios: noisy RSS but true TOA; noisy TOA but true RSS; and finally noisy TOA and noisy RSS. The results indicate that the mean localization error of the heterogeneous feature machine is far lower than that of the single feature-based kernel machine in noisy conditions. Tables 3 and 4 also show that the HFF model (4) adaptively selects features based on the noise conditions. From Tables 3 and 4, we observe that $\sigma_m$, corresponding to the $m$th feature, increases as the noise increases. According to Reference [3], increasing the value of $\sigma_m$ reduces the correlation between the input vector $f^m$ and the value of the kernel function. Therefore, the impact of the $m$th feature on the position coordinate estimation decreases.

Conclusions
In this paper, we proposed several heterogeneous feature machine learning models for localization, namely HFF-RR, HFS-GLP, HFS-LNP and HFF-UE. For the HFS-GLP model, we proposed a novel iterative algorithm that combines the Newton iteration method and gradient descent to solve the corresponding optimization problem. In terms of time efficiency, the HFF-UE model shows the best performance among the four models. In terms of localization accuracy, the HFF-RR model provides the highest precision and robustness to noise. In terms of removing outlier noise, the HFS-GLP model removes noise at the sample level and the HFS-LNP model removes noise at the feature level. In real experiments, the proposed methods outperform other recent data fusion methods for indoor localization in both computational cost and localization accuracy. In addition, compared with the single feature-based kernel machine, our proposed localization model based on heterogeneous features improves the accuracy significantly. Even in a relatively poor channel environment, the positioning accuracy of the proposed model can still be maintained at a high level.