Heterogeneous Transfer Learning for Wi-Fi Indoor Positioning Based Hybrid Feature Selection

This paper presents the application of heterogeneous transfer learning (HetTL) methods which consider hybrid feature selection to reduce the training calibration effort and the noise generated by fingerprint duplicates obtained from multiple Wi-Fi access points. The Cramer–Rao Lower Bound analysis (CRLB) was also applied to evaluate and estimate a lower limit for the variance of a parameter estimator used to analyze positioning performance. We developed two novel algorithms for feature selection in fingerprint-based indoor positioning problems (IPP) to enhance positioning performance in the target domain with the HetTL. The algorithms comprised two scenarios: (i) a principal component analysis-based approach (PCA-based) and (ii) a hybrid approach that takes both PCA and correlation effect analysis into account (hybrid scenario). Accordingly, a new feature vector was constructed by retaining only the most significant predictors, and the most efficient feature dimensions were also determined by using a hybrid-based approach. Experimental results showed that the hybrid-based proposed algorithm has the minimum mean absolute error. The CRLB analysis also showed that the number of Wi-Fi access points could affect the lower bound location estimation error; however, identifying the most significant predictors is an effective approach to improve positioning performance.


Introduction
Indoor positioning systems (IPSs) have received significant attention in both academia and industry due to the growth of wireless communication infrastructure and the everincreasing demand for location-based services (LBSs) [1][2][3][4]. Currently, more than 80% of the world's population owns a smartphone [5], it is estimated that most people spend around 80% of their daily lives indoors [6], and 74% of smart device owners are active users of smartphone location-based applications [7,8]. The predominant use of locationbased information is required for a variety of applications, including but not limited to: military use (originally designed for this purpose based on Global Positioning System (GPS)), shopping malls to guide customers to obtain services, hospitals to monitor patients for better health services, marketing to assist in the display of advertisements, emergency services, navigation, social networking services, multimedia geotagging, location and tourism, and so on [1,[7][8][9][10][11][12][13]. Although GPS receivers are known for their promising and dependable accuracy in an outdoor environment, their applications are limited to indoor positioning due to the complex nature of the indoor environment, which includes: no direct line of sight, poor GPS signal penetration through complex internal buildings, and severe internal channel conditions such as shadows and multipath fade [14,15]. In addition to higher accuracy, an indoor positioning system for mobile devices should have low (1) We proposed the combination of both PCA and Heterogeneous Transfer Learning methods to minimize fingerprint duplications by constructing a new feature vector that could extract the most significant features as well as enable positive knowledge transfer for the location estimation. (2) We applied correlation analysis (CA) to test the independence of the Wi-Fi received signals from multiple access points mainly to reduce the effect of multicollinearity among the various RSS Wi-Fi access points aiming to avoid irrelevant features from the analysis. We claim that the RSS fingerprints collected from a grid point might be duplicated possibly from multiple Wi-Fi APs and create interference of matched patterns with the actual user's fingerprint or irrelevant features might be generated and recorded in the database which negatively inflates or degrades the positioning performance. (3) We build a fusing coefficient approach to exploit both offline and online sources to reduce labor-intensive database fingerprint construction and enhance positioning performance in the target domain through heterogenous transfer learning-based hybrid feature selection. (4) We analyze the unbiasedness of the location estimator for the fingerprint-based indoor positioning system or estimate lower bound variance for the estimator through Cramer-Rao lower bound analysis.
The remainder of this paper is structured as follows: related works are presented in Section 2. Section 3 discusses the fingerprinting positioning framework, as well as its problem formulation, evaluation matrices, and Cramer-Rao lower bound analysis (CRLB). Section 4 presents experimental results and discussions. Conclusions are provided in Section 5.

Related Works
This section provides an overview of indoor positioning systems, as well as potential location estimation performance evaluation metrics and the challenges that limit the applications of Wi-Fi RSS signal features in indoor positioning. Indoor positioning systems have attracted a significant amount of research effort in the last two decades due to the increasing use of location-based services (LBSs) that happen to be mostly indoors. Due to the poor performance of GPS signals indoors, various wireless technologies have been proposed and studied to solve indoor positioning problems with greater accuracy [45][46][47]. Despite advances in the location service industry over the past decades, indoor positioning remains a more complex task than outdoor positioning due to the complex nature of the indoor environment, where radio signals are characterized by NLOS propagation, multipath effects, and a dynamic environment [45][46][47][48]. Thus, an indoor positioning system should not only have higher positioning accuracy, but also low computational complexity and a short positioning time. To this end, a novel approach to indoor positioning systems that takes advantage of existing wireless communication infrastructure and is economically viable is also required.
Several signal features, including channel state information (CSI), angle of arrival (AOA), time of arrival (TOA), and received signal strength (RSS), have been used as fingerprints to estimate the location of a user or mobile device to address indoor positioning requirements [15,20]. However, the first three measurement methods (TOA, CSI, and AOA) cannot be used directly with the existing wireless network infrastructures and have high implementation costs because they require additional hardware devices to store accurate timing information, network interface cards (NICs) to extract CSI measurements, and multi-directional antennas [25][26][27]. In contrast, the RSS-measurement method has been widely used as a location fingerprint for indoor positioning due to its low cost and ease of implementation [3,[28][29][30]. There are two types of RSS-based localization methods: (a) propagation-model-based and (b) fingerprint-based [49]. The propagation model method is less robust in general because it requires prior knowledge of the locations of the Wi-Fi access points, and the RSS distributions can deviate dramatically in complex indoor environments [50]. The fingerprint-based approach [28][29][30][51][52][53] is, however, relatively simple, robust, and widely used for indoor localization, and is divided into two phases: (a) training and (b) testing. During the training phase, a survey is conducted to collect RSS measurements from predetermined known locations, and a database of fingerprint patterns indexed with their corresponding grid points (GPs) is created. During the testing phase, the learned model is used to deduce the location of the mobile user based on the new measurements received (online phase). Despite RSS's popularity and low cost, multipath diffraction, shadowing, and scattering, which are common in indoor environments, can easily affect the spread of RSS values. Furthermore, to achieve the desired accuracy, RSS fingerprinting requires a large number of labeled samples, which is both labor-intensive and time-consuming.
Several research works have been proposed to deal with the complex nature of indoor environments causing poor indoor positioning performance [54][55][56][57][58][59][60][61]. For instance, some studies have been reported in the literature that use a probabilistic approach to address the dynamic signal fluctuations that are generated within complex indoor environments [56][57][58][59][60][61]. The probability models, however, are only valid or provide reliable results when certain conditions are met by the measurements; otherwise, they are no longer valid or superior.
The assumption of normal distribution of signal values in a wireless indoor environment seemed impractical for various reasons including measurement time differences, the requirement for large instances, hardware devices (even identical mobile devices could follow different patterns), user orientations, inherent heterogeneity, and others [56][57][58][59][60][61]. Various machine learning algorithms were also widely used in the literature to address indoor positioning systems [37][38][39][40][41]. These systems seek the best possible match between the user's fingerprint and a pre-defined set of grid points on the radio map. Fingerprints, however, may be duplicated from the available Wi-Fi Access Points (Wi-Fi APs) and interference, increasing the number of matched patterns with the user's fingerprint (causing irrelevant features). In summary, the following are the two main types of localization algorithms used for RSS fingerprint-based indoor positioning systems: probabilistic and deterministic techniques, and their details are given below.

Deterministic Method
In deterministic technique, the average value of the Wi-Fi received strength signals collected on training phase from each access point (AP) at a particular grid point (GP) is used to create the database fingerprint and mainly we used this single value representing the distribution of an access point to map with the actual RSS measurement to complete the location estimation. In this method, the location of a mobile user (L) is estimated as a convex combination of the K reference locations [62][63][64].L is calculated as the shortest distance between S (g) ij andŜ (g) ij in the n-dimensional space by using the following equation: The norm vector space given in Equation (1) can be rewritten as such that {L 1 , L 2 , . . . , L k } represents the target locations between the corresponding matching fingerprintŜ (g) ij and the new received measurement at testing phase S (g) ij . This deterministic method is widely used in Wi-Fi indoor positioning systems due to its ease of use in both the creation of the fingerprint database and the localization, which is based on the average value of the received signal strengths of Wi-Fi access points at a specific GP. For example, a radio-frequency (RF)-based system based on the K-nearest neighbor (KNN) algorithm was proposed and studied for positioning and tracking users inside buildings, and the experimental results revealed that Radar can estimate user location with a high degree of accuracy [65]. Furthermore, RSS fingerprint-based ML algorithms such as k-nearest neighbor [65][66][67][68], radial basis function [69], weighted k-nearest neighbor [65,66], support vector machine [70], neural networks [71], and others such as kernel direct discriminant analysis and relevance vector regression [72] are being used for indoor positioning systems. Nonetheless, the spread of RSS values measured at a specific grid point is so dynamic in nature due to the inherent heterogeneity of hardware devices and time differences in measurements that there is no guarantee that the signal fluctuations will be represented by a single value for a specific position even with the same device. This would lead us to another approach known as the probabilistic approach, which could achieve better accuracy by taking signal distributions into account [73,74].

Probabilistic Techniques
In the probabilistic method, location L i is estimated by calculating and maximizing the conditional posterior probabilities p(L i |s), i = 1, . . . , K given an observed fingerprint s and a fingerprint database. A Bayesian approach for Wi-Fi localization [56][57][58][59][60][61]63] can be given as: where p(L i |s) is the probability of a mobile user being at a location given that the new received signal strength, p(s|L i ) is the probability of a Wi-Fi received signal strength of being located at location L i and can be calculated using the fingerprint database. The occurrence of each RSS value is used to generate a probability distribution as the likelihood function. Additionally, p(L i ) is the a priori probability calculated or set based on researchers' knowledge of the target's characteristics. p(s) is a normalizing constant that does not depend on the location and is given as p(s|L i )p(L i ). Probabilistic models are valid or would produce reliable results under the specific conditions that the measurements must meet, or their results are no longer valid or better. Furthermore, the signal spreads of the RSS values used for indoor positioning systems appear to be difficult to represent with a single value due to the complex nature of the indoor environment, where distributions of signals at a specific location even with the same device appeared to obey a specific pattern, making it difficult for the probabilistic approach to be efficient for indoor positioning [61]. Furthermore, in [61], the Gaussian Mixture Model (GMM) was reported as the highest mis-locating classifier for a wireless mobile user in an indoor environment. Several attempts have been made in the literature to solve the underlying assumption associated with probabilistic models, including transfer learning (TL) and taking significant instances during the training phase to establish a fairly stable 'signal-signature to a location' relationship, despite its criticism for being a costly and labor-intensive approach. Nonetheless, due to the requirements for a large sample size, these rewards based on a probabilistic approach could be obtained at the expense of computational cost and memory usage.

Problem Formulation and Framework
The Wi-Fi received strength signal fingerprint-based positioning is identified as the most popular indoor positioning technique due to its low cost and ease of implementation, which consists of two phases: (a) training and (b) testing. During the training phase, RSS fingerprints known as radio maps are collected from each WiFi access point at multiple locations within the defined space of grid points (GPs), and a predictive model is trained on the RSS measurements of a grid point possibly from all Wi-Fi access points to characterize the 'signal-location' relationship, and its corresponding index of the location is stored in the database fingerprints (offline phase). This process is repeated for all GPs in order to characterize or store a grid point's signal signature with its corresponding location. The learned model is then applied during the testing phase to infer the mobile user's location based on the new measurement by mapping into the highest likelihood resemblance of signal-signature stored during the training phase time (online phase). To formulate the problem mathematically, let us conceive the general fingerprint-print-based positioning of the indoor environment scenario partitioned into G grid points. Each represent a mobile user's location and each GP is indexed with a label g, (g = 0, . . . , k − 1), M detectable Wi-Fi access points (j) were available, and the ith Wi-Fi signal strength of fingerprints received at the kth gird point of the jth access point is a vector and the fingerprint database can be represented as a matrix F (g) ij : The above matrix can be described explicitly as: F ij ; x g , y g ; i = 1, 2, . . . , n & j = 1, 2, . . . , m and x g , y g is the corresponding coordinate to the associated location of the signal signature. The target instances would be received during the testing phase denoted as F (g) ij ; i = 1, 2, . . . , n t & j = 1, 2, . . . , m t and the offline source domain can be represented as: F ij ; x g , y g ; i = 1, 2, . . . , n s & j = 1, 2, . . . , m s called the labeled source data. n t and n s represent the numbers of measurements for the target and source data instances, respectively. In summary, the major challenges for Wi-Fi RSS-based fingerprint applications for indoor positions can be divided into three categories: (a) The inherent heterogeneity in measurement distribution at specific grid points or the dynamic nature of RSS value distributions (the sources of spreads can be: hardware devices, measurement time differences, user and antenna orientation, multipath channel conditions such as diffraction, shadowing, fading, interference, and others). (b) RSS fingerprints for a grid point may be duplicated from multiple Wi-Fi APs, causing interference of matched patterns with the actual user's fingerprint, or irrelevant features may be generated and recorded in the database, inflating or degrading positioning performance. (c) Spending a significant amount of time during the training phase to establish a fairly stable 'signal-signature to a location' relationship, or their requirement for large sample size is so expensive. In this paper, we proposed a novel approach to solving the underlying problem associated with signal fluctuations by combining both techniques of principal component analysis (PCA) and correlation analysis with the help of heterogeneous transfer learning (HetTL), such that positive knowledge from the training phase could be leveraged, and only the most significant features would be considered to build the model, and we could establish fairly stable 'signal-signature to a location' relationship at the testing phase, respectively. Figure 1 demonstrates the proposed framework of Wi-Fi received signal strength-based fingerprint for the indoor positioning system. sample size is so expensive. In this paper, we proposed a novel approach to solving the underlying problem associated with signal fluctuations by combining both techniques of principal component analysis (PCA) and correlation analysis with the help of heterogeneous transfer learning (HetTL), such that positive knowledge from the training phase could be leveraged, and only the most significant features would be considered to build the model, and we could establish fairly stable 'signal-signature to a location' relationship at the testing phase, respectively. Figure 1 demonstrates the proposed framework of Wi-Fi received signal strength-based fingerprint for the indoor positioning system.

Principal Component Analysis (PCA)
Even though the Wi-Fi received signal strength fingerprint-based technique has been identified as the most popular technique for indoor positioning systems due to the use of existing wireless infrastructure, its applications continue to be challenged. This is primarily because the spread of RSS values measured at a specific grid point is so dynamic, and the inherent heterogeneity of hardware devices, as well as time differences in when measurements were taken, have a significant effect on the distribution of RSS values received at each grid point, such that there is no guarantee that the signal fluctuations will be represented by a single value for a specific position, even when using the same device. To address these issues, we proposed combining PCA and HetTL techniques to create new feature spaces that may extract the most significant features while also enabling positive knowledge transfer for location estimation. Second, the algorithm must be computationally efficient while remaining reasonably priced. As a result, the PCA was used to reduce high-dimensional feature spaces to low-dimensional feature spaces by taking only the most significant features, although discarding information could negatively affect the ability to represent the entire measurements [75,76]. The PCA is an unsupervised learning technique for reducing data dimensionality to find the significant features in a dataset while retaining a large amount of information and increasing data interpretability [75].
The proposed algorithm (feature selection based PCA or Algorithm 1) is a pseudocode used to generate a new fingerprint feature space based on PCA. The input values are the original database fingerprints; after PCA is applied, a new fingerprint feature space with reduced dimension is created. Specifically, the feature-selection-based PCA algorithm applied data pre-processing techniques to reduce the dimension of the RSS measurements of the offline fingerprints database based on their contribution to the positioning performance. In other words, features with a higher explainability variance ratio are the more significant features that discriminate the model positioning performance and should be retained in the model for further analysis. The learned model is then applied during the testing phase after the PCA data preprocessing is performed to infer the mobile user's location based on the received new RSS measurements by mapping into the highest likelihood resemblance of signal-signature stored in the radio map. The feature selection-based PCA algorithm primarily deals with the dimensionality curse of a predefined radio map by linearly combining the features into an uncorrelated space (i.e., eigenspace) using the predefined radio map's training covariance matrix of size F × F . The fingerprint database is then projected into uncorrelated space in the direction of the highest variance ratio and the principal components are chosen based on the highest explainability variance ratio (i.e., eigenvalue) which represents the information context of the principal components. Finally, Algorithm 1 below also includes the detailed steps for the PCA.

7.
From each measurement subtract average value: S 8.
Find the covariance matrix of the measurements: Calculate eigenvectors and eigenvalues of the SS T ;

Heterogeneous Transfer Learning
Suppose the Wi-Fi RSS-based fingerprint positioning was conducted in an indoor environment setting and the fingerprint database is given as: F ij ; x g , y g ; i = 1, 2, . . . , n s & j = 1, 2, . . . , m s also known as the labeled source data. Whereas the (x k , y k ) denotes the corresponding coordinate to the associated location of the signal signature and the target instances would be received during the testing phase as Similarly, n t and n s represent the numbers of instances for the target and source domains, respectively. We have noted that there are two major challenges that need to be considered in heterogenous transfer learning: (a) The Wi-Fi signal strength received at a grid point from multiple APs assumed to be independent such that the Wi-Fi signals transmitted from different APs are transmitted independently and do not interfere with each other. Nonetheless, the RSS values of a grid point may be duplicated possibly from multiple Wi-Fi APs and creates interference of matched patterns with the actual user's fingerprint or irrelevant features might be generated and recorded in the database which could reduce our positioning accuracy. (b) Dimensions of the source feature spaces and target domains may also be different i.e., R F s = R F t . In this paper, the noise created due to the duplicated fingerprints and interdependence of APs are handled by using both the PCA (as in Algorithm 1) and correlation coefficient techniques which could decrease the dependency of certain Wi-Fi APs or extract the most significant fingerprint features that could be used to build the homogeneous feature spaces. The RSS values of the source domain can be given as: . . , m s and the target domain is represented as: F (g) . . , m t . The Wi-Fi signals received at a grid point from multiple independent APs can be defined where cov S j , S j+1 denotes the covariance of S j and S j+1 which is given as follows: s 2 j and s 2 j+1 represent the variances of the measurements of S j and S j+1 . The pseudo code for positioning using heterogenous TL with the hybrid-based feature selection approach is provided in Algorithm 2. Algorithm 2. The proposed heterogenous transfer learning for Wi-Fi indoor positioning hybrid-based 1. Input: Source fingerprint F ij (g) = F ij (g); x g , y g = ss (g) ij ; i = 1, 2, . . . , n and j = 1, 2, . . . , m 2. Input: Received RSS testing data F (g) T ; transfer weight β i j ; predicted labels x g , y g ; 4. for g = 0 : k − 1 do 5.
Apply PCA as detailed in Algorithm 1 8.
Obtain projection matrix: Obtain the correlation coefficient matrix using Equation (4):

10.
Select uncorrelated predictors or significant predictors for modeling the variance 11.
Compute β i j by using Equation (7) with considering weights of source domains β i j

The Objective Function
Due to the inherent heterogeneity of hardware devices and the time difference between measurements for each grid point, there is no guarantee that the signals at any given position will be represented by a single value even using the same device. Two other challenges must also be addressed in heterogeneous transfer learning: it assumes that the Wi-Fi signals transmitted from different APs are independent so that the Wi-Fi signals transmitted from individual APs do not interfere with each other. Grid points' RSS fingerprint values may be duplicates from multiple APs, which may interfere with matched patterns with the actual fingerprint of a user, or irrelevant features could be generated and recorded in our database, lowering positioning accuracy. Furthermore, the dimensions of the source feature spaces and target domains may be different. The noise caused by duplicated fingerprints and APs interdependence is handled by combining PCA and correlation coefficient techniques, which allow us to extract the most significant fingerprint features; thus, enabling the construction of a homogeneous independent feature vector. As a result, the objective function was minimized over the new feature vector to leverage the most significant features, independent features, and related source knowledge to the target domain. The Wi-Fi signal strength vectors of the source fingerprints can be given . . , m s and the target fingerprints also is given as: F Minkowski is a generalized distance metric between two vectors and defined as: where D 1 and D 2 denotes Manhattan [77,78] and Euclidean distances [62,63,79], respectively. The transfer coefficients (β i j ) constraint is to better minimize the signal measurements of differences between the samples of the source and target domains. The equality constraint mentioned in the objective function would assign higher weights to the most related source samples and the lesser weights would be assigned to the less related source samples. The weights are therefore updated as follows. The new feature vectors were used to minimize the received signal strength difference, such that the transfer coefficients can be estimated as: The Lagrangian multiplier method was used to solve the constrained optimization problem of Equation (7) and we assumed that we obtained a location estimate β at the (t − 1)th iteration, and now we need to estimate the coefficients β (t) i j at the tth iteration. By applying the Lagrangian multiplier method, we obtain: where δ is the Lagrangian multiplier. By letting the partial derivative of the Lagrangian with respect to β i j and δ to be zeros, we obtain: By adding up the first n s terms in Equation (9), we can obtain: Additionally, substituting Equation (10) into Equation (9) gives the estimated transfer coefficients as:

Prediction Evaluation Metrics
In this paper, we compared the performance of the proposed algorithms against different machine learning algorithms taken as baselines and verified through extensive experimentations and simulations using a real-world dataset. The mean absolute error (MAE) is defined as the average of all absolute errors and was used to evaluate the effectiveness of the proposed algorithm and given as: where [x i ,ŷ i ] T and [x i , y i ] T are the predicted location estimate and the true location of the mobile user, respectively. Additionally, n is the total number of samples to be located in the target domain.

Analysis of Indoor Positioning Performance-Based Cramer-Rao Lower Bound (CRLB)
Recall the general RSS fingerprint-based positioning of the indoor environment scenario partitioned into G grid points (GPs) such that G = gp g ; x g , y g g = 0, 1, . . . , k − 1 defines the set of reference points associated to the coordinates representing the mobile user's locations nearest by and each GP is indexed with a label g, and there were M detectable Wi-Fi APs (j). Thus, the ith Wi-Fi signal strength of fingerprints received at the kth grid point of the jth access point is a vector and the fingerprint database from all Wi-Fi APs entirely can be represented as a matrix: F x g , y g is the corresponding coordinate to the associated location of the signal signature. Additionally, the target instances of signal strengths received during the testing phase can be represented as: The main goal of indoor positioning is to improve positioning performance mainly for better location-based services. In line with this, we applied the Cramer-Rao lower bound (CRLB) as it is widely used to evaluate the performance of the indoor positioning algorithm [79] and estimates a lower limit for the variance of any unbiased estimator of an unknown parameter, which is widely used to analyze the localization performance [80][81][82][83]. Thus, the Wi-Fi received signal strengths were used to analyze the lower bound of the location estimation error and it is significantly important to characterize the properties of this lower bound in order to evaluate the impact of different parameters on the accuracy of target positioning. Furthermore, the lower bound analysis can also provide important system design suggestions by revealing error trends with the indoor localization system deployment. Suppose L is an unknown deterministic parameter which is to be estimated from n independent observations (measurements) of s (g) ij , each from a distribution according to some probability density function f s ij and E L denotes the expected value with respect to the density f s ij ; L is twice differentiable and holds certain regularity conditions, then the Fisher information can also be defined as: The efficiency of an unbiased estimatorL measures how close this estimator's variance comes to this lower bound; and efficiency of an estimator is defined as Along with this, consider that the unknown position of the mobile user and the position of the jth access point are denoted as L = (x , y ) T and L j = x j , y j T , respectively, then the distance between the mobile user and the jth Wi-Fi access point can be calculated . Now, the Wi-Fi received signal strengths of a grid point from multiple access points is expected to have fluctuating signal values due to the complex nature of indoor environment possibly affected by the path loss, shadowing, and multipath effect propagation. Hence, the distribution of those measurements could be affected by a random heterogeneity over time such that it is paramount to know how to estimate the random effect of that variable on the overall process of the system modeling. In this case, we adopted the assumption that the random error or the noise could be introduced due to the random phenomenon following a normal distribution with mean zero and variance σ 2 and hence the RSS values can be modeled as: where Additionally, the pdf (probability density function) of the measurements SS (k) ij is given by: In RSS fingerprint-based indoor localization, we assume that the Wi-Fi received signal strengths from multiple APs at a reference point are independent and identically distributed such that the joint likelihood function for the measurements can be defined as: By exponential property, Equation (18) can be rewritten as: Take ln to get the log-likelihood function of the above Equation (19): Apply the first partial derivative w.r.t L: Take the second partial derivative of equation: Additionally, one can see that the final result does not depend on the vector of observations S (g) ij and the CRLB of the estimate is given as: We proved that the variance of the location estimator L attained the lower boundary variance (CRLB) and this indicates our estimator is also the MVU (Minimum Variance Unbiased) estimator as no unbiased estimator could do better and the Fisher information matrix (FIM) [82,83] is given by If l(x; L) is twice differentiable and holds certain regularity conditions, then the Fisher information can also be rewritten as: Therefore, the CRLB of MSE of the L can be calculated as From Equations (19), (24), and (25), we see that: On the other hand, the CRLB analysis can be restated using the concept of mean square error (MSE). Assume L is an unknown deterministic parameter which is to be estimated from n independent observations (measurements) of ss (g) ij , each from a distribution according to some probability density function f ss (g) ij ; L . The MSE of any estimatorL of L is then defined as: From properties of expectations of random variables, the last term of Equation (33) would give zero. Additionally, the term var L represents the variance of the estimator L. Finally, as in Equation (33) of the unknown mobile user's coordinates L = (x , y ), and the covariance matrix of the estimator can be given as: where E x − x (ŷ − y ) is the covariance betweenx andŷ which can be represented as cov x i ,ŷ i and since the covariance is commutative it is equal with cov ŷ i ,x i . The term J(L) refers to the Fisher information matrix (FIM) and Equation (28)  conditioned on L, then the score function is defined as the gradient of its log-likelihood function such that: Additionally, the FIM can be defined as the variance of this score function λ(L): One can also observe that From the above derivation, we noted that an interesting relationship for the FIM above and the signal model we considered in Equation (16) were revealed as a reciprocal multiplicative effect to the Gaussian random noise that might be introduced due to the heterogeneity nature of complex indoor environment settings and can be described as:

Experimental Results and Discussion
In this section, we present a real-world experiment to evaluate the proposed algorithms. Experimental settings and dataset are presented first, followed by an analysis of the overall performance of the classifiers.

Experimental Settings Dataset
An experiment with an area of 1460 m 2 and 175 reference points (RPs) equidistant from the adjacent next grid point was carried at the University of Electronic Science and Technology of China (UESTC) on the 21st floor of the innovation building as shown on Figure 2. There were nine Wi-Fi access points (Wi-Fi APs) available to collect the received signal strength from a mobile device, allowing for the creation of the fingerprint database, and they were sparsely deployed to ensure that at least three Wi-Fi APs could be detected at each grid point. The layout and experimental settings of the generated real office experiment scenario for the dataset are shown below.
from the adjacent next grid point was carried at the University of Electronic Science and Technology of China (UESTC) on the 21st floor of the innovation building as shown on Figure 2. There were nine Wi-Fi access points (Wi-Fi APs) available to collect the received signal strength from a mobile device, allowing for the creation of the fingerprint database, and they were sparsely deployed to ensure that at least three Wi-Fi APs could be detected at each grid point. The layout and experimental settings of the generated real office experiment scenario for the dataset are shown below.    Figure 3 depicts the distribution of the new feature vector along with the labels for both training and testing Wi-Fi RSS values for the dataset. Despite a slight mix-up of fingerprints across the labels due to the heterogeneous nature of signal power attenuation over distance, the RSS measurements received from multiple base stations at a grid point appear to have shown a specific distribution. As shown in Figure 3, the two main principal components referred to as the 'base model' are considered to be the simplest model and enable visualization of the effect of variations for mobile user positioning by retaining only the most significant features. A 'base model' is one with only two major principal components that represent the variational distributions of the target's prediction and account for approximately 56% of the model's total variance explainability. From a technical standpoint, this has two major advantages: computational cost and model simplicity may improve as feature space dimensions are reduced. Furthermore, larger feature spaces would necessitate the deployment of more base stations, which would be prohibitively expensive in practice, and redundant features could introduce systematic errors into the entire positioning system modeling process. Because there is a tradeoff over accuracy due to the removal of some features or a reduction in dimension of feature spaces, there are some practical situations where the major principal components explaining only 56% of the model's prediction are a better choice. It is also pertinent to note that training and testing data may have different distributions.   Figure 4a,b show the distributions of the first and second principal components, which account for approximately 56% and 55% of the total variations that could account for both instances of training and testing datasets, respectively. Even though the two distributions of the base model appear to be different, the RSS measurements received at a reference point from various Wi-Fi access points appear to be independent, and the distribution of each label can be characterized by using the base model, which has only two principal components. This could also demonstrate that keeping the most important predictors could help the indoor positioning system work more efficiently and cost-effectively. A 'base model' is one with only two major principal components that represent the variational distributions of the target's prediction and account for approximately 56% of the model's total variance explainability. From a technical standpoint, this has two major advantages: computational cost and model simplicity may improve as feature space dimensions are reduced. Furthermore, larger feature spaces would necessitate the deployment of more base stations, which would be prohibitively expensive in practice, and redundant features could introduce systematic errors into the entire positioning system modeling process. Because there is a tradeoff over accuracy due to the removal of some features or a reduction in dimension of feature spaces, there are some practical situations where the major principal components explaining only 56% of the model's prediction are a better choice. It is also pertinent to note that training and testing data may have different distributions. Figure 4 depicts the distribution of principal components accounting for 95 percent explained variance ratio (EVR) for the dataset, along with their corresponding labels. Figure 4a,b show the distributions of the first and second principal components, which account for approximately 56% and 55% of the total variations that could account for both instances of training and testing datasets, respectively. Even though the two distributions of the base model appear to be different, the RSS measurements received at a reference point from various Wi-Fi access points appear to be independent, and the distribution of each label can be characterized by using the base model, which has only two principal components. This could also demonstrate that keeping the most important predictors could help the indoor positioning system work more efficiently and cost-effectively. Nonetheless, with only two principal components, this basic model could perform mobile user positioning at the expense of some information discarded based on their contributions.

Distribution of the RSS Measurements
Consider Figure 4c,d to better understand the effect of each principal component's contribution, which depicts the distributions of the second and third principal components, causing the RSS values to become muddled. This means that the second and third principal components (the second and third) account for approximately 26% and 30% of the total variance explainability ratio for both instances of training and testing datasets, respectively, and appear to be poor fitting in estimating the location of the mobile user when compared to the base model, which accounts for 56% of both domains. This trend is visible in Figure 4e,f, which depicts the distributions of the third and fourth principal components (21 percent and 18 percent for training and testing datasets, respectively), indicating that the RSS values received from multiple base stations at a grid point failed to characterize the 'signal to location' relationship or that there is a high tendency of mix-up among the clusters of RSS measurements from a single reference point. The less explained variance ratio by a principal component, the less to characterize the 'signal-location' relationship, and the algorithms may fail to estimate the mobile user's position or degrade the system model's performance.
The effect of different feature dimension possibilities on the amount of variance ratio explainability by algorithms for mobile user positioning estimation is shown in Table 1. To estimate the location of a mobile user with a 95 percent explained variance ratio, a classifier, for example, requires seven principal components. A classifier, on the other hand, can localize a mobile user using only two main principal components, which account for approximately 56% of the model's total variations (base model). Various scenarios of algorithms or model-building processes can be selected based on various attributes such as cost, simplicity, and interpretability, as well as algorithm accuracy. The fewer principal components used in a model indicate the lower the computational cost and memory usage. It also includes a tip about how simpler models require fewer feature spaces to properly visualize the model. In real-life scenarios, a model with higher dimensional feature spaces or principal components would be more difficult to understand and interpret. Technically, the more complex the model, the more vulnerable it is to model overfitting as it attempts to represent the highest variance explained ratio. Table 2 describes the number of measurements collected from each grid point and demonstrates that the label distribution for the dataset is balanced. As a result, feature scaling techniques were not taken into account in order to avoid the dominance effect of the cluster's higher label occurrence. Otherwise, the cluster's larger features would dominate the others. To use the principal component analysis, however, features must be standardized, so all values in the dataset are set to be standardized. components (21 percent and 18 percent for training and testing datasets, respectively), indicating that the RSS values received from multiple base stations at a grid point failed to characterize the 'signal to location' relationship or that there is a high tendency of mixup among the clusters of RSS measurements from a single reference point. The less explained variance ratio by a principal component, the less to characterize the 'signal-location' relationship, and the algorithms may fail to estimate the mobile user's position or degrade the system model's performance.  The effect of different feature dimension possibilities on the amount of variance ratio explainability by algorithms for mobile user positioning estimation is shown in Table 1. To estimate the location of a mobile user with a 95 percent explained variance ratio, a classifier, for example, requires seven principal components. A classifier, on the other hand, can localize a mobile user using only two main principal components, which account for approximately 56% of the model's total variations (base model). Various scenarios of algorithms or model-building processes can be selected based on various attributes such as cost, simplicity, and interpretability, as well as algorithm accuracy. The fewer principal components used in a model indicate the lower the computational cost and memory usage. It also includes a tip about how simpler models require fewer feature spaces to properly visualize the model. In real-life scenarios, a model with higher dimensional feature spaces or principal components would be more difficult to understand and interpret. Technically, the more complex the model, the more vulnerable it is to model overfitting as it attempts to represent the highest variance explained ratio.    Figure 5 depicts a scree plot of the dataset's 95 percent and 90 percent variance explained ratios and the amount of variance ratio explainability by the principal components for mobile user positioning accuracy. The y-and x-axes represent the eigenvalues and the number of principal components, respectively. For both instances of training and testing datasets, the scree plot shows a sharp decrease from principal component 1 to principal component 2, and then a slow decremental value or slope from principal component 2 through the last principal components. Furthermore, the general rule of thumb for interpreting the scree plot is to keep the number of principal components above the scree or in areas where the plot does not drop significantly. As a result, the scree plot suggests that we keep only one principal component, which accounts for roughly 47 percent of total model variations. This confirms the significance of principal component one as a predictor of location estimation accuracy, which accounts for approximately 47 percent of the variance explained ratio. Moreover, this has a direct impact on implementation costs, as the deployment of more Wi-Fi access points with less contribution may be due to their receiver's antenna orientation or other features that are inefficient for the positioning task.     Figure 6a,b further elucidates that the distributions of the sixth and seventh principal components are such that the RSS values are highly mixed-up among the different labels of mobile users in both instances of training and testing phases, or there is no clear demarcation between the clusters of fingerprints generated at each reference point, causing the modeling system to be highly fluctuated due to possible fingerprint duplications at a grid point. This indicates that the two principal components (the sixth and seventh) account for about 9 percent and 11 percent of the total variance explainability ratio for both instances of training and testing datasets, respectively, and appear to be very poor fitting in estimating the location of the mobile user when compared to the base model, which accounts for 56 percent of both domains. The lesser the amount of explained variance ratio by a principal component, the less likely the algorithms are to establish the 'signal to location' relationship, and the algorithms may fail to estimate the mobile user's position or degrade the system model's performance. for about 9 percent and 11 percent of the total variance explainability ratio for both instances of training and testing datasets, respectively, and appear to be very poor fitting in estimating the location of the mobile user when compared to the base model, which accounts for 56 percent of both domains. The lesser the amount of explained variance ratio by a principal component, the less likely the algorithms are to establish the 'signal to location' relationship, and the algorithms may fail to estimate the mobile user's position or degrade the system model's performance. Figure 6. Distribution of principal components accounting for 20% EVRs with their corresponding labels for the dataset. Table 3 depicts the effect of various feature dimension options on the amount of variance ratio explainability by classifiers for mobile user positioning accuracy, taking into account the nature of distributions of instances from both training and testing data points separately. Seven principal components explained 95 percent of the total variations of the model in the training dataset, whereas eight principal components are required to fit the model for testing data points with the same size of variations. Similarly, the 90 percent explained variance ratio required six principal components for training phases and seven principal components for testing phases. The difference in principal component requirements for training and testing datasets appears insignificant for small-sized feature spaces. Consideration of a different number of principal components for the same amount of explained variance ratio for both training and testing datasets, on the other hand, can result in a significant difference for large real-world datasets with potentially large feature spaces, particularly in heterogeneous transfer learning scenarios. This effect of variation in the number of principal components across both domains of training and testing datasets could be demonstrated more clearly for large datasets with large feature spaces. Table 3. Effect of feature dimensions on the variance account of the predictive model for both instances of training and testing dataset.

#PCA
Explained Variance Ratio (EVR) Training Dataset Testing Dataset Figure 6. Distribution of principal components accounting for 20% EVRs with their corresponding labels for the dataset. Table 3 depicts the effect of various feature dimension options on the amount of variance ratio explainability by classifiers for mobile user positioning accuracy, taking into account the nature of distributions of instances from both training and testing data points separately. Seven principal components explained 95 percent of the total variations of the model in the training dataset, whereas eight principal components are required to fit the model for testing data points with the same size of variations. Similarly, the 90 percent explained variance ratio required six principal components for training phases and seven principal components for testing phases. The difference in principal component requirements for training and testing datasets appears insignificant for small-sized feature spaces. Consideration of a different number of principal components for the same amount of explained variance ratio for both training and testing datasets, on the other hand, can result in a significant difference for large real-world datasets with potentially large feature spaces, particularly in heterogeneous transfer learning scenarios. This effect of variation in the number of principal components across both domains of training and testing datasets could be demonstrated more clearly for large datasets with large feature spaces.  Table 4 describes the effect of each principal component on the predictive model's explained variance account for dataset A of both training and testing datasets. As a result, principal component 1 (PC1) contributed the highest explained variance ratio (EVR) for mobile user positioning estimation, followed by principal component 2 (PC2), which contributed the second highest variance account for both training and testing datasets.  According to Figure 7, the contribution of each principal component to model variation for indoor positioning of mobile users applied to both training and testing datasets of A decreased from PC1 to PC8 or PC1 constituted the most information for estimating the location of the mobile users. We discovered that principal component 1 (PC1) contributed the highest explained variance ratio (EVR) for mobile user positioning estimation, revealing 42.71 percent and 35.75 percent for training and testing datapoints, respectively. This implies that a classifier can be used to estimate location with smaller feature space dimensions if the most significant variations are retained, despite the fact that some possible variations are excluded from the model. Figure 8 shows that the contribution of the number of principal components to model variation for indoor positioning of mobile users applied to the training dataset of A appears to be sharply increasing from the base model to PC7 and sharply decreasing from PC6 to PC4. The base model, which consists of only two principal components, achieved approximately 56% of the explained variance ratio, implying that reducing feature spaces would make our model easier to visualize and understand the analysis. On the other hand, the benefits of principal component analysis, such as simplicity, interpretability, and minimizing computational cost, could not be retained at no cost. of A decreased from PC1 to PC8 or PC1 constituted the most information for estimating the location of the mobile users. We discovered that principal component 1 (PC1) contributed the highest explained variance ratio (EVR) for mobile user positioning estimation, revealing 42.71 percent and 35.75 percent for training and testing datapoints, respectively. This implies that a classifier can be used to estimate location with smaller feature space dimensions if the most significant variations are retained, despite the fact that some possible variations are excluded from the model.   Figure 8 shows that the contribution of the number of principal components to model variation for indoor positioning of mobile users applied to the training dataset of A appears to be sharply increasing from the base model to PC7 and sharply decreasing from PC6 to PC4. The base model, which consists of only two principal components, achieved approximately 56% of the explained variance ratio, implying that reducing feature spaces would make our model easier to visualize and understand the analysis. On the other hand, the benefits of principal component analysis, such as simplicity, interpretability, and minimizing computational cost, could not be retained at no cost.  Table 5 depicts the application of feature correlation analysis to investigate the presence of multicollinearity, and it can be seen that three predictors appear to have slightly higher correlation values when compared to the overall correlational trend values, indicating that those three values should be investigated further. Furthermore, the overall distributions of those features have mostly similar RSS values, indicating that no signal was received, as indicated by the value −90. As a result, we decided to remove those three features and rebuild our classifier algorithm to see if positioning performance would suffer. Furthermore, the principal component analysis clearly demonstrates that six principal components can explain 90% of the total variations in the model, which is consistent with feature correlational analysis, which demonstrates that six features are independent. This is further supported by the fact that the Wi-Fi received signal strengths at a grid point from multiple access points should be independent; otherwise, the higher correlated samples in the analysis would degrade our positioning system's accuracy and cost as we deploy more Wi-Fi access points that could no longer bring any unique features in modeling the variance of the positioning system. In other words, redundant features would have a negative impact on positioning performance, so we decided to exclude from the analysis those features that are either redundant or irrelevant, as well as highly correlated features. In our case, features (X4, X6, and X8) were excluded from the analysis due to their higher correlations with other features.   Table 5 depicts the application of feature correlation analysis to investigate the presence of multicollinearity, and it can be seen that three predictors appear to have slightly higher correlation values when compared to the overall correlational trend values, indicating that those three values should be investigated further. Furthermore, the overall distributions of those features have mostly similar RSS values, indicating that no signal was received, as indicated by the value −90. As a result, we decided to remove those three features and rebuild our classifier algorithm to see if positioning performance would suffer. Furthermore, the principal component analysis clearly demonstrates that six principal components can explain 90% of the total variations in the model, which is consistent with feature correlational analysis, which demonstrates that six features are independent. This is further supported by the fact that the Wi-Fi received signal strengths at a grid point from multiple access points should be independent; otherwise, the higher correlated samples in the analysis would degrade our positioning system's accuracy and cost as we deploy more Wi-Fi access points that could no longer bring any unique features in modeling the variance of the positioning system. In other words, redundant features would have a negative impact on positioning performance, so we decided to exclude from the analysis those features that are either redundant or irrelevant, as well as highly correlated features. In our case, features (X4, X6, and X8) were excluded from the analysis due to their higher correlations with other features.

Comparative Analysis of Methods
In this section, the performance of the proposed algorithms for positioning estimation using Wi-Fi RSS-based fingerprints of indoor positioning was compared to the most popular algorithms in the field of prediction tasks and machine learning, in general, using a realworld experimental dataset. Table 6 also describes how several machine learning algorithms were applied to the dataset to characterize the effect of different principal components (PCs) on the model's target prediction. Five different scenarios were considered to see if the amount of information used for location estimation could change depending on the contribution of each principal component. The scenarios correspond to the percentages of variations explained by the model or the explained variance ratios for the dataset of 80%, 85%, 90%, 95%, and 56%, with the corresponding number of principal components of 4, 5, 6, 7, and 2, respectively. Furthermore, correlation analysis was used to select features based on their impact on positioning performance. To train and estimate the position of a mobile user given the received signal strengths from multiple Wi-Fi access points, an extensive experimentation-based real office scenario of indoor environment settings for the dataset was performed, and the positioning performance of the different classifiers was evaluated (APs). In this paper, we proposed two feature selection algorithms based on different scenarios: (a) the proposed algorithm relies solely on principal component analysis (PCA) for feature selection, and (b) the hybrid-based approach considers both PCA and correlation effect analysis. Furthermore, for their prediction tasks, the proposed algorithms were compared to the most popular machine learning algorithms, such as decision tree (DT), K nearest-neighbor (KNN), support vector machine (SVC), logistic regression (LR), random forest (RF), and neural network (MLP). To select the most significant predictors and avoid any negative knowledge transfer, data preprocessing was performed, and irrelevant feature spaces were removed from further analysis based on the two criteria listed above in (a) and (b). The original number of feature spaces for the dataset was 9. However, all feature spaces could not be considered for positioning estimation based on the scenarios presented above for three reasons: (a) some features may be irrelevant, resulting in duplicate fingerprints or pattern fingerprint mismatches, (b) large feature spaces necessitate a massive deployment of Wi-Fi access points, which is prohibitively expensive, and (c) dealing with higher dimension features incur significant computational and memory costs. As a result, we proposed the above-mentioned two scenarios of feature selection mechanisms for use in positioning estimation or estimating a mobile user's location. Table 7, on the other hand, shows the use of several machine learning algorithms on the dataset to characterize the effect of various principal components (PCs) on the model's target prediction using transfer learning. Similarly, the above five scenarios were considered to see if the amount of information used for location estimation could vary depending on the contribution of each principal component after transfer learning was applied. As shown in Table 7, the combined use of principal component analysis and transfer learning significantly improved the positioning performance of the classifiers. Base Model-> represents a classifier having two principal components accounts for 57% of the total proportion of explained variational ratio (EVR), TL -> transfer learning. Figure 9 shows that the proposed algorithm-based PCA (before transfer learning) was discovered to be the best algorithm with the lowest mean absolute error for the dataset with different principal components, accounting for 80%, 85%, 90%, 95%, and 56% variance explainability of positioning estimation. A classifier, on the other hand, can localize a mobile user using only two principal components, which account for roughly 56% of the model's total variations (base model). Thus, various scenarios of algorithms or model building processes can be selected based on various attributes such as cost, simplicity, and interpretability, as well as algorithm accuracy. The fewer principal components used in a model, the lower the computational cost and memory usage. In real-life scenarios, a model with higher dimensional feature spaces or principal components would be more difficult to understand and interpret. Technically, the more complex the model, the more vulnerable it is to model overfitting as it attempts to constitute the highest variance explained ratio of the new measurements received.
However, Figure 10 depicts that the proposed algorithm-based PCA after applying transfer learning (HetTLIPPC based: heterogeneous transfer learning for indoor positioning based PCA) also reveals a significant positioning performance improvement and one can clearly observe that the proposed algorithm has come out to be the best algorithm with the lowest mean absolute error applied to the dataset with different principal components, accounting for 80%, 85%, 90%, 95%, and 56% variance explainability of positioning estimation. Surprisingly, the base model that constitutes approximately 56% of the variance explainability ratio of the model performs as the best in both cases. However, the risk of model overfitting, which is a very common phenomenon, is an imminent threat for indoor positioning due to the dynamic nature of the indoor environment described by NLOS propagation, multipath effects, and random signal noise. Thus, we recommend a model with a 95 percent variance of explainability ratio to represent the most discriminant features possible in a complex indoor environment. variance explainability of positioning estimation. A classifier, on the other hand, can localize a mobile user using only two principal components, which account for roughly 56% of the model's total variations (base model). Thus, various scenarios of algorithms or model building processes can be selected based on various attributes such as cost, simplicity, and interpretability, as well as algorithm accuracy. The fewer principal components used in a model, the lower the computational cost and memory usage. In reallife scenarios, a model with higher dimensional feature spaces or principal components would be more difficult to understand and interpret. Technically, the more complex the model, the more vulnerable it is to model overfitting as it attempts to constitute the highest variance explained ratio of the new measurements received.  However, Figure 10 depicts that the proposed algorithm-based PCA after applying transfer learning (HetTLIPPC based: heterogeneous transfer learning for indoor positioning based PCA) also reveals a significant positioning performance improvement and one can clearly observe that the proposed algorithm has come out to be the best algorithm with the lowest mean absolute error applied to the dataset with different principal components, accounting for 80%, 85%, 90%, 95%, and 56% variance explainability of positioning estimation. Surprisingly, the base model that constitutes approximately 56% of the variance explainability ratio of the model performs as the best in both cases. However, the risk of model overfitting, which is a very common phenomenon, is an imminent threat for indoor positioning due to the dynamic nature of the indoor environment described by NLOS propagation, multipath effects, and random signal noise. Thus, we recommend a model with a 95 percent variance of explainability ratio to represent the most discriminant features possible in a complex indoor environment. The effect of a heterogeneous transfer-learning-based hybrid feature selection approach that incorporates both PCA and correlation effect analysis is shown in Table 8. As shown in Table 5, the principal component analysis clearly shows that six principal components can explain 90% of the total variations in the model, which is consistent with the feature correlational analysis, which shows that six features are independent. The positioning performance of the classifiers clearly improved after applying transfer learning to the hybrid feature selection approach, as shown in Table 7.  Figure 10. Effect of number of PCs for positioning with various explained variance ratios for the dataset after transfer learning.
The effect of a heterogeneous transfer-learning-based hybrid feature selection approach that incorporates both PCA and correlation effect analysis is shown in Table 8. As shown in Table 5, the principal component analysis clearly shows that six principal components can explain 90% of the total variations in the model, which is consistent with the feature correlational analysis, which shows that six features are independent. The positioning performance of the classifiers clearly improved after applying transfer learning to the hybrid feature selection approach, as shown in Table 7.

Conclusions
In this paper, we proposed two novel algorithms for feature selection purposes applied to fingerprint-based IP estimation problems to improve positioning performance in the target domain through heterogeneous TL-based hybrid feature selection. The two scenarios of the algorithms are (a) principal component analysis-based feature selection and (b) the hybrid-based approach for feature selection considering both the combination of PCA and correlation effect analysis. To this end, we created a new feature vector on which the model was trained by retaining only the most significant predictors, and we also determined the most efficient feature dimensions using the hybrid-based feature selection approach. We discovered that not all available feature spaces could be considered for positioning estimation for at least four reasons: (a) some features may be irrelevant and result in fingerprint duplication or pattern fingerprint mismatch; (b) large feature spaces necessitate a massive deployment of Wi-Fi access points, which is prohibitively expensive from a cost standpoint; (c) from a technical standpoint, computational cost and memory usage are quite expensive when dealing with higher dimensions of features; and (d) model over-fitting is also a serious issue for higher dimensions of feature spaces.
The hybrid-based proposed algorithm was found to be the best algorithm with the lowest mean absolute error, confirming that our proposed algorithm improved positioning estimation accuracy through heterogeneous transfer learning. Moreover, not only was positive knowledge transferred from the source domain to the target domain, but the computational cost and memory usage were effectively reduced by retaining the most significant predictors using both principal component and correlation analysis techniques. Analysis of the lower bound for the position estimation error based CRLB, on the other hand, shows that the variance of the random noise (σ 2 ξ ), the distance between the mobile user and Wi-Fi access points (d 0,g ) or the location of Wi-Fi APs and mobile user, the number of Wi-Fi access points, and the signal propagation parameters can affect position estimation using Wi-Fi RSS-based fingerprinting of indoor positioning. Although the number of Wi-Fi APs can affect the lower bound location estimation error, we saw that identifying the most significant features could help to improve the target's positioning accuracy. This could be accomplished by combining PCA methods and correlation analysis, both of which are commonly used in prediction tasks to exploit the most important features (or measurements with a high explained variation ratio) and avoid measurement interdependence, respectively.