Gaussian Process Regression Plus Method for Localization Reliability Improvement

Location data are among the most widely used context data in context-aware and ubiquitous computing applications. Many systems with distinct deployment costs and positioning accuracies have been developed over the past decade for indoor positioning. The most useful method is focused on the received signal strength and provides a set of signal transmission access points. However, compiling a manual measuring Received Signal Strength (RSS) fingerprint database involves high costs and thus is impractical in an online prediction environment. The system used in this study relied on the Gaussian process method, which is a nonparametric model that can be characterized completely by using the mean function and the covariance matrix. In addition, the Naive Bayes method was used to verify and simplify the computation of precise predictions. The authors conducted several experiments on simulated and real environments at Tianjin University. The experiments examined distinct data size, different kernels, and accuracy. The results showed that the proposed method not only can retain positioning accuracy but also can save computation time in location predictions.


Introduction
Accurate, reliable, and real-time indoor positioning and position-based protocols and services are required in future social applications. Through a mobile device, a positioning system can help to determine its position and makes the position of the device available for position-based services such as navigation and tracking or monitoring. The location information of devices or users could considerably improve the performance of wireless network for network planning, network adaptation, and load balancing. Generally, the indoor localization problem is resolved using standard, low-cost, and already deployed infrastructure such as the strength database of received signals [1,2]. Instead of spending resources in deploying dedicated infrastructure and collecting data procedure, the purpose of indoor positioning study is to design and implement data fusion methods and algorithms using existing infrastructure.
Generally, received signal strength (RSS)-based location fingerprinting is based on the principle that each position has a unique set of signal values. When a system boots up, mobile devices receive the unique RSS value of the system, and they search the fingerprinting database and identify the entry that is most similar to the system unique RSS value as the estimated location. The main problem of a typical RSS fingerprinting system is that the real RSS value at any location is easily affected by the object (human or furniture inside the building) and multipath fading effects [3]. In other words, the RSS fingerprint obtained at different periods of time need not match the previous fingerprint stored in the database, leading to incorrect estimation results. Previous researchers have shown that the fingerprinting localization method can achieve meter-level accuracy if the RSS signature map is not outdated. However, because of the dynamic nature of the radio channel and changes in the surrounding environment, the RSS signature maps must be updated multiple times daily. That is, the time and effort required to build the RSS signature map during the offline phase are the major drawbacks of fingerprinting-based localization.
In addition, the challenges of deploying reliable and scalable fingerprinting localization systems and of coping with user limits are barriers to reaching the accuracy required for commercial applications. Many studies have sought to reduce the time spent gathering RSS signatures. In [4], Haeberlen et al. demonstrated a system that allows for remarkably accurate localization across an entire office building over 12,000 square meters in area; most notably, regarding the scale, they localized a device to one of 510 cells in the building within seconds, yielding a success rate of greater than 95%. Moreover, Bisio et al. designed an indoor localization scheme for targets by using electromagnetic waves; in this system, the fingerprints at each point of the area of interest were estimated in an offline phase by means of electromagnetic waves [5]. In [6], the authors computed a position by considering the RSS measurements as a part of a random process, exploiting the information present in the acquired signals. Although the effort needed to build an RSS signature map has been reduced, the median error remains unacceptable for most practical indoor applications [7]. Furthermore, Zheng et al. utilized both the raw RSS values and their relation to construct a new stable and robust fingerprint for indoor positioning; their results indicated that the RSS variance problem can be solved without any manual calibration [8]. In this study, we developed a probabilistic framework for handling sparse training data in fingerprinting localization. Specifically, the calibration effort and cost of building the RSS signature map were reduced by modeling signal strength with an advanced Gaussian process (GP), and the problematic RSS signature map was improved in the online phase.
Supervised learning in the form of regression and classification is an important constituent of statistics and machine learning, either for the analysis of data sets or as a subgoal of a more complex problem [9]. Traditionally, parametric models have been used for this purpose. Their possible advantage lies in ease of interpretability; however, for complex data sets, simple parametric models may lack expressive power. A GP is a nonparametric model that is characterized completely by its mean function and covariance matrix [10]. A GP depends on several hyper parameters that can be estimated using training measurements. In our study, we used the properties of the conditional probability of a GP. The GP prior distribution was used for regression and for predicting the RSS at locations with no prior measurements, and the Naive Bayes algorithm was also used to derive the obtained conditional Gaussian probability. We named the computation process Gaussian Process Plus. A major advantage of using a GP-based predictor is that, in addition to the mean estimate of the RSS, an estimate of the variance is also produced, providing an indication of the uncertainty of the estimation [11]. The aforementioned process can be performed offline and generally must be performed only once.
Yiu et al. used Gaussian process regression (GPR) for received signal strength indication (RSSI) prediction in order to solve indoor location problems [12]. Partial measurements were first taken from the area of interest. They then used the Firefly algorithm to train and categorize the prior results derived using GPR. The trained results were used to build a refined fingerprinting database for the entire area of interest. The experiment showed that the GPR model could achieve a satisfactory result in predicting the RSS of an area with no prior measures. However, according to Yiu et al., the ability to expand the GPR model is restricted by the need to rebuild the fingerprinting database; in addition, the flexibility in the computation of the GPR models was limited in their study. Thus, to retain the flexibility of GP and increase the efficiency of indoor positioning, we proposed the Gaussian Process Regression Plus (GPRP) method to solve the aforementioned limitations, and we tested the experiments in both simulated and real environments.
The remainder of this manuscript is organized as follows: Section 2 reviews related studies investigating the properties of RSSI for signal transmission, and the definitions of GP. Section 3 details the GPRP measurement system used in the current study and describes the data analysis method. Section 4 presents the simulated and real experiments and discussion, and Section 5 offers conclusions and recommendations for future research.

RSS Fingerprint
RSS properties, which facilitate location fingerprinting, have been determined by many studies examining indoor positioning systems. This research demonstrates that user orientation could cause a variation of up to 5 dBm in the RSSI level [13,14]. At any location, the different orientations of users and mobile devices with respect to the transmitter could cause the mean RSS value to change. The modeling of the RSS-based location fingerprinting is essential for location determination algorithms; examples of RSS-based location fingerprinting models are the probabilistic approach model and preliminary analytical model [15]. Gaussian or lognormal distributions were used to model the randomness of RSS. For example, [16] summarized in a large-scale measurement that most RSS histograms could be fitted appropriately with Gaussian distributions but that few histograms could be fitted with bimodal Gaussian distributions.
Current signal-based RSS location systems have two problems. First, a considerable manual calibration effort is required to construct a radio map in the offline training phase; second, the positioning accuracy changes with the environmental dynamics. Three dynamic factors observed to change frequently over time in the environment are proposed in [17]: the presence of people, relative humidity, and movement. These factors easily affect the radio signal propagating from access points (APs) to mobile devices and are responsible for changes in positioning accuracy. The results of the experiments conducted in a cafeteria showed that the RSS distribution at a fixed location with a specified beacon was not constant over time (Figure 1). The RSS values calibrated previously in the radio map may be outdated; this condition degrades positioning accuracy because of the presence of people. Traditionally, an average RSS has been considered to be lognormally distributed according to a large-scale fading model. Such RSS is generally predictable and follows several standardized path loss models.
The wall attenuation factor (WAF) model is useful for describing the slow-fading phenomenon and attenuation in signal propagation in indoor environments [18]. In this model, the attenuation factor is used to predict signal propagation behavior when walls are the main obstacle. The following equation shows how attenuation influences RSS:

$$P(d) = P(d_0) - 10\,n\,\log_{10}\left(\frac{d}{d_0}\right) - \begin{cases} nW \times WAF, & nW < C \\ C \times WAF, & nW \geq C \end{cases}$$

where $n$ indicates the rate of increase in signal attenuation with the propagation distance, $P(d_0)$ is the RSS at the reference distance $d_0$, and $d$ is the distance between the transmitter and the receiver. Furthermore, $nW$ is the number of obstacles (walls) between the transmitter and the receiver, and $WAF$ is the attenuation value resulting from the obstacles. The constant $C$ represents the number of walls at which the attenuation factor stops influencing the signal; if $nW$ reaches or exceeds $C$, we use the constant value $C$ instead of $nW$.

When we select a subset of APs satisfying a certain property in our alternative set of fingerprint definitions, we retrieve the corresponding position from the offline fingerprinting database by using the location estimation algorithm. Generally, several methods are used for determining the nearest neighbor location. For example, we can use the Euclidean distance:

$$\mathrm{Eucdist}(S, R) = \sqrt{\sum_{i=1}^{n} (s_i - r_i)^2}$$

or the Mahalanobis distance:

$$d_M(S, R) = \sqrt{(S - R)^{T}\,\Sigma^{-1}\,(S - R)}$$

where $S$ is the RSS value for the target location, $R$ is the value closest to $S$ in the fingerprinting database, and $\Sigma$ is the covariance matrix of the RSS values.
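As a minimal sketch of the two ideas above, the following Python snippet evaluates the WAF path-loss model in its commonly cited form (the wall term saturates once the wall count reaches C) and performs a nearest-neighbor lookup with the Euclidean distance. The function names, the toy fingerprint database, and all numeric values are illustrative assumptions, not measurements or code from this study.

```python
import math

def waf_rss(p_d0, n, d, d0, n_walls, waf, c):
    """Predicted RSS at distance d under the WAF model (hypothetical helper)."""
    walls = min(n_walls, c)  # the wall term stops growing once nW reaches C
    return p_d0 - 10.0 * n * math.log10(d / d0) - walls * waf

def euclidean(s, r):
    """Euclidean distance between two RSS vectors of equal length."""
    return math.sqrt(sum((si - ri) ** 2 for si, ri in zip(s, r)))

def nearest_fingerprint(s, database):
    """Return the location whose stored RSS vector is closest to s.

    database maps an (x, y) location to a list of RSS values, one per AP.
    """
    return min(database, key=lambda loc: euclidean(s, database[loc]))

# Toy fingerprint database with two APs (made-up values).
fingerprints = {
    (0.0, 0.0): [-40.0, -70.0],
    (5.0, 0.0): [-55.0, -60.0],
    (5.0, 5.0): [-70.0, -45.0],
}
observed = [-56.0, -62.0]
print(nearest_fingerprint(observed, fingerprints))  # -> (5.0, 0.0)
```

Swapping `euclidean` for a Mahalanobis distance would only change the `key` function; the lookup itself stays the same.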

Gaussian Process
The GPR is a new machine-learning method based on Bayesian theory and statistical learning theory. It provides a flexible framework for probabilistic regression and is widely used to solve high-dimension, small-sample, or nonlinear regression problems. From the view of the function space, the GP defines a distribution over functions. GPs are the extension of the multivariate Gaussian model to infinite-sized vectors of real-valued variables. The GP is fully specified by a mean function and a covariance function:

$$f(x) \sim \mathcal{GP}\left(m(x),\, k(x, x')\right)$$

where $x$ and $x' \in \mathbb{R}$ are random variables [19].
For simplification, the mean function is usually set to zero in the data preprocessing stage. Because the key assumption in GP modeling is that the data can be represented as a sample from a multivariate Gaussian distribution, we can infer that:

$$\begin{bmatrix} x \\ x_* \end{bmatrix} \sim \mathcal{N}\left(0,\, \begin{bmatrix} K & K_*^{T} \\ K_* & K_{**} \end{bmatrix}\right)$$

where $T$ indicates matrix transposition. The function probability follows a Gaussian distribution, and the conditional probability of $x_* \mid x$ can be computed as:

$$p(x_* \mid x) = \mathcal{N}\left(K_* K^{-1} x,\; K_{**} - K_* K^{-1} K_*^{T}\right) \tag{4}$$

The optimal estimate for $x_*$ is the mean of this distribution, $\bar{x}_* = K_* K^{-1} x$ [20,21], and the uncertainty in the estimate is the variance $\mathrm{var}(x_*) = K_{**} - K_* K^{-1} K_*^{T}$. The aforementioned description indicates that the GPR is suitable for predicting unknown data with limited known data. GPR is also the central idea used in this study.
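The conditional distribution in Equation (4) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the implementation used in this study: the squared exponential kernel, the length scale, the small jitter term added for numerical stability, and the function names are all hypothetical choices. In the code, `values` plays the role of the observed vector x in Equation (4), and the kernel matrices are built from one-dimensional positions.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared exponential covariance between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * length ** 2))

def gp_posterior(pos_train, values, pos_test, length=1.0, jitter=1e-8):
    """Zero-mean GP conditional: mean K* K^-1 x and covariance K** - K* K^-1 K*^T."""
    K = sq_exp_kernel(pos_train, pos_train, length) + jitter * np.eye(len(pos_train))
    K_s = sq_exp_kernel(pos_test, pos_train, length)   # K*
    K_ss = sq_exp_kernel(pos_test, pos_test, length)   # K**
    mean = K_s @ np.linalg.solve(K, values)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Made-up RSS values (dBm) observed at three positions along a corridor.
pos_train = np.array([0.0, 1.0, 2.0])
values = np.array([-40.0, -50.0, -60.0])
mean, cov = gp_posterior(pos_train, values, np.array([1.0, 1.5]))
print(mean, np.diag(cov))
```

At a position that coincides with a training point, the predicted mean reproduces the observed value and the variance is close to zero; the variance grows as the query point moves away from the training data, which is the uncertainty indication mentioned above.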
Some works have used the GP to help generate a fingerprint database in the offline phase [22,23]. Ferris et al. demonstrated how to use the GP to generate likelihoods at locations for which no calibration data were available, and showed that Gaussian regression could be applied successfully to various localization problems [22]. Atia et al. [24] considered using the GPR technique to bridge GPS outages; they ultimately obtained an 80% improvement in the position root mean square error (RMSE). Richter et al. analyzed different GPR models for WLAN fingerprinting and provided useful advice regarding GPR [22]. Cho et al. created a GPR-based radio map construction method for surveying data that could obtain high accuracy and availability, though realistic data were rare [25]. These studies have shown that GPR is a viable means of improving positioning accuracy and reducing the labor of collecting fingerprinting data.

Weight-RSS Propagation Model
To estimate and track the parameters of the RSS propagation model in indoor environments, many studies employ an adaptive Bayesian framework to cope with unpredictable, inter-calibration, and fading radio effects. However, these methods cannot deliver acceptable performance because of the complex propagation phenomena in radio transmission. In [8], Zheng et al. proposed weight-RSS, a calibration-free solution to the RSS variance caused by device heterogeneity and complex environmental factors. In Zheng's system, a location is estimated by matching all the RSS values and their relation from the test device with all the entries of the fingerprint mapping results.
Assume that $m$ APs are deployed in an indoor environment and that the physical space of the indoor environment is modeled as a finite space $L = \{l_1(x_1, y_1), l_2(x_2, y_2), \ldots, l_n(x_n, y_n)\}$ [8]. The RSS values of all APs at location $l_i$ can be represented as $R_i = \{r_i^1, r_i^2, \ldots, r_i^m\}$, and the weight-RSS of location $l_i$ is defined in terms of $s(r_i^j)$, the index of $r_i^j$ after $R_i$ is sorted in descending order. Accordingly, Zheng et al. assume that the distance from the fingerprint map is $D_{tr}$ (offline) and that the distance from a test device is $D_{te}$ (online). A weighting factor is then computed, from which the authors derive the difference between the distance from the fingerprint map, $D_{tr}$, and the distance from a test device, $D_{te}$.

System Design
In [12], Yiu et al. used the trained GPR model to estimate the signature map of an area. To optimize the hyper parameters in GPR, they proposed that the signature map be rebuilt; however, inefficient computation restricted the flexibility of the Gaussian model. Thus, in our study, we focused on improving the GPR system and used the Naive Bayes algorithm to reduce the computational complexity of the indoor positioning system while retaining the flexibility of the GP models.
Sensors 2016, 16, 1193
In Figure 2, we illustrate the proposed system framework and compare it with that of Yiu et al. First, both our GPRP method and Yiu's method create a discrete and partial RSS fingerprinting database in the testing area. Both then derive and test several GP models for system computation. The different GP models have distinct distribution functions and hyper parameters. However, Yiu et al. proposed a method based on categorization and GP algorithms that finishes by rebuilding the RSS fingerprinting database; by contrast, our GPRP method employs the Naive Bayes method to select among all the predicted locations from the GP models, so the computation is more easily derived and less time-consuming. Furthermore, to stress the efficiency of our proposed method, in Figure 2, the solid line indicates the training phase, and the blue dashed line indicates the testing phase. $FL_i$ is derived as the final estimated location for each method. Fewer computation steps are needed in our GPRP method.

The Method of Gaussian Process Regression Plus
In our method, an offline RSS fingerprinting database must be built first. We assume that the testing area has $M$ reference points at locations $L_r$ ($r \in \{1, 2, \cdots, M\}$). In the sampling phase, we collect the RSS values of the $N$ APs at the $r$th reference point from the 1st to the $j$th sampling time and filter them with the mod function $Mo(\cdot)$, which is used to derive the remainder of the division. $ap_{r,N}^{j}$ denotes the odd-moment data at the $r$th reference point of the $N$ APs from the 1st to the $j$th time; by contrast, $AP_r$ is the filtered set of signal strengths received at the $r$th reference point.
When we attempt to predict an unknown position from input RSS values, we obtain a series of RSS values $AP_*$ as follows:

$$AP_* = \{ap_{*,1}, ap_{*,2}, \cdots, ap_{*,N}\}$$

Because of the noise in our environment, we can formulate the regression problem as $L(AP_*) = f(AP_*) + \nu$ for simplification. The latent function $f(\cdot)$ describes the relationship between the RSS and spatial coordinates, and $\nu$ denotes the physical noise, which follows a Gaussian distribution. Thus, the GP is defined by its mean and covariance functions, which represent the principal characteristic and structure of the model. In addition, the RSS value in location positioning can be converted to an absolute distance; in our study, the authors derived the log-distance path-loss model as the mean function; that is:

$$m(AP_*) = \frac{\sum_{j=1}^{N} \alpha_j \times \left(C_j - \zeta_j \times \log\left(\lVert AP_{*,j} - AP_{S,j} \rVert\right) + \varepsilon_j\right)}{\sum_{j=1}^{N} \alpha_j} \tag{5}$$

where $C_j$ is a constant value, $AP_{*,j}$ is the collected online RSS value of the unknown position $*$ at the $j$th AP, and $AP_{S,j}$ is the RSS value of the $j$th AP at the source of the unknown position $*$. $\alpha_j$ represents the weight of $AP_{*,j}$, which indicates the degree of trust in the AP. $\zeta_j$ represents the path-loss exponent of the $j$th AP, which is computed as the noise degree of the $j$th AP. In our method, this value follows the Gaussian distribution $\mathcal{N}(0, \sigma_i^2)$; $\sigma_i$ is the covariance of the noise in our experiment. Accordingly, the covariance function is defined as follows:

$$k(AP_*, AP_r) = \exp\left(\frac{\sum_{j=1}^{N} \beta_j \left| AP_{*,j} - AP_j \right|}{\sum_{j=1}^{N} \beta_j}\right) \tag{6}$$

where $\beta_j$ is computed as the weight of $AP_j$. To reduce the computational complexity, we apply the mod function; that is, we use $AP_r^m$ instead of $AP_r$ in Equation (6). The objective of our system is to predict the RSS value as $f_n(AP_*)$ for all two-dimensional inputs $x$ and $y$. To achieve this, the GP models for indoor positioning are divided into parts based on the $x$ and $y$ coordinates of all training locations. Thus, we let $X_n \triangleq [x_1, x_2, \ldots, x_S]$ be a set of training samples, where $S$ represents the number of training coordinate data; then:

$$S_x = \{(x_1, y_n(x_1)), (x_2, y_n(x_2)), \ldots, (x_S, y_n(x_S))\}$$

and:

$$S_y = \{(x_n(y_1), y_1), (x_n(y_2), y_2), \ldots, (x_n(y_S), y_S)\}$$

are the $S$-dimensional column vectors containing the RSS measurements from all training locations. Accordingly, we define the GP models to predict location in Equations (7) and (8); the model for the $y$ coordinate is:

$$f_y(AP_*) = \frac{\sum_{j=1}^{N} \alpha_j \times \left(C_j - \zeta_j \times \log\left(\lVert AP_{*,j} - AP_{S,j} \rVert\right) + \varepsilon_j\right)}{\sum_{j=1}^{N} \alpha_j} \exp\left(\frac{\sum_{j=1}^{N} \beta_j \left| AP_{*,j} - AP_j \right|}{\sum_{j=1}^{N} \beta_j}\right) : y \in \{1, 2, \cdots, h\} \tag{8}$$

We use the hyper parameter vector $\theta \triangleq [C, \alpha, \varepsilon, \beta]$ to optimize our GP model; different parameters influence the estimated results. In our experiments, we trained the hyper parameters by using the training data $S_x$ and $S_y$. The GPR has many kernels, some of which are listed in Tables 1 and 2. The performance of the GPR was validated using different sets of mean and covariance kernel functions. The variances were examined in our experiments.

Table 1. Summary of several commonly-used mean functions [26].

Mean Function          Equation Expression
Zero                   $0$

Table 2. Summary of several commonly-used covariance functions [26].

Covariance Function    Equation Expression
Constant               $\sigma$
Polynomial
Squared exponential    $\exp\left(-r^2 / (2l^2)\right)$
Exponential            $\exp\left(-r / l\right)$
Rational quadratic     $\left(1 + r^2 / (2\alpha l^2)\right)^{-\alpha}$

Figure 3 presents the results from the GPR model of Equations (7) and (8).
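To make the covariance expressions concrete, the sketch below writes three stationary kernels from Table 2 as plain Python functions of the distance r, together with the weighted absolute RSS difference that appears inside the exponential of Equation (6). The function names and the sample values are illustrative assumptions; only the formulas themselves are taken from the text.

```python
import math

def squared_exponential(r, l):
    """exp(-r^2 / (2 l^2)) with length scale l."""
    return math.exp(-r ** 2 / (2.0 * l ** 2))

def exponential(r, l):
    """exp(-r / l) with length scale l."""
    return math.exp(-r / l)

def rational_quadratic(r, l, alpha):
    """(1 + r^2 / (2 alpha l^2))^(-alpha)."""
    return (1.0 + r ** 2 / (2.0 * alpha * l ** 2)) ** (-alpha)

def weighted_rss_distance(ap_star, ap_ref, beta):
    """sum_j beta_j |AP*_j - AP_j| / sum_j beta_j, the argument of exp in Equation (6)."""
    num = sum(b * abs(s - r) for b, s, r in zip(beta, ap_star, ap_ref))
    return num / sum(beta)

# Made-up online and reference RSS vectors for two APs with equal weights.
print(weighted_rss_distance([-50.0, -60.0], [-52.0, -57.0], [1.0, 1.0]))  # -> 2.5
```

All three kernels equal 1 at r = 0 and decay as r grows; they differ only in how quickly the correlation falls off, which is why the choice of kernel affects the predicted RSS surface.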

Naive Bayesian Location Model
Our proposed system derives the positions $x$ and $y$ from $S_x$ and $S_y$ individually; sometimes, $S_x$ and $S_y$ have multiple solutions. To combine these results, our system employs the Naive Bayesian model. Bayes' theorem is a simple mathematical formula used for calculating conditional probabilities. If $E$ is a random experiment and $B, A_1, A_2, \cdots, A_n$ are events in $E$, then the following conditions are met: (1) $P(A_i) > 0$, $i = 1, 2, \cdots, n$; (2) events $A_1, A_2, \cdots, A_n$ form a partition of the sample space and are mutually exclusive; (3) $P(B) > 0$. Therefore:

$$P(A_i \mid B) = \frac{P(A_i B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}, \quad i = 1, 2, \cdots, n$$

This equation can be proved using the conditional probability theorem and the total probability theorem [27].
Bayesian Decision Theory is built on Bayesian probabilities, which enable us to assert a prior belief in a data point coming from a certain class [28]. This theory is used in classifiers such as the Naive Bayesian classifier, which can classify rapidly and accurately [29,30]. According to our proposed method, the RSS values transmitted from the APs are all independent, and our proposed GPR model derives some predicted candidate sets of $S_x$ and $S_y$. To reduce the possible sets, the Naive Bayesian positioning method is used to combine them and derive the most satisfactory results. The Naive Bayesian positioning method is detailed as follows:

(1) The sample space is divided by $S_x$ and $S_y$, and events in the sample spaces are represented as $L_i = (x_i, y_i)$, which is the place derived from the proposed GPR method.

(2) With the prior probability $P(L_i)$, the likelihood of the observed RSS values is computed as:

$$P(AP_* \mid L_i) = P(AP_1, AP_2, \cdots, AP_n \mid L_i) = \prod_{j=1}^{n} P(AP_j \mid L_i) \tag{9}$$

(3) The posterior probability is computed as:

$$P(L_i \mid AP_*) = \frac{P(AP_* \mid L_i)\,P(L_i)}{P(AP_*)} = \frac{P(AP_* \mid L_i)\,P(L_i)}{\sum_{j=1}^{n} P(AP_* \mid L_j)\,P(L_j)} \tag{10}$$

Accordingly, the final estimated location is defined as:

$$FL_i = \arg\max_{L_i} P(L_i \mid AP_*)$$

and if $P(L_i \mid AP_*)$ is equal to zero, $x$ and $y$ are selected as follows:

$$P(AP_* \mid x_i) = P(AP_1, AP_2, \cdots, AP_n \mid x_i) = \prod_{j=1}^{n} P(AP_j \mid x_i) \tag{11}$$

and:

$$P(x_i \mid AP_*) = \frac{P(AP_* \mid x_i)\,P(x_i)}{P(AP_*)} = \frac{P(AP_* \mid x_i)\,P(x_i)}{\sum_{j=1}^{n} P(AP_* \mid x_j)\,P(x_j)} \tag{12}$$

then:

$$P(y_i \mid AP_*) = \frac{P(AP_* \mid y_i)\,P(y_i)}{P(AP_*)} = \frac{P(AP_* \mid y_i)\,P(y_i)}{\sum_{j=1}^{n} P(AP_* \mid y_j)\,P(y_j)} \tag{13}$$

(4) The final location and the error between the estimated positions are computed as follows:

$$P(\hat{x}_i \mid AP_*) = \max\left(P(x_i \mid AP_*)\right), \quad P(\hat{y}_i \mid AP_*) = \max\left(P(y_i \mid AP_*)\right)$$

where $\hat{x}_i$ and $\hat{y}_i$ denote the coordinates that attain the maxima. We then define the real location as $TL_i$ and the estimated location as $FL_i = (\hat{x}_i, \hat{y}_i)$, and the RMSE is derived as:

$$RMSE = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \lVert TL_i - FL_i \rVert^2}$$

where $K$ is the number of test locations.
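The steps above can be sketched as follows. The snippet assumes, purely for illustration, that each per-AP likelihood P(AP_j | L_i) is a Gaussian density with a mean and standard deviation stored for every candidate location; the source does not specify this form, and the function names, the uniform default prior, and the toy numbers are likewise hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as an assumed per-AP likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def naive_bayes_posterior(ap_star, candidates, priors=None):
    """Posterior P(L_i | AP*) over candidate locations.

    candidates maps a location (x, y) to a list of (mean, std) pairs, one per AP.
    """
    locs = list(candidates)
    if priors is None:
        priors = {loc: 1.0 / len(locs) for loc in locs}  # uniform prior (assumption)
    joint = {}
    for loc in locs:
        likelihood = 1.0
        for rss, (mean, std) in zip(ap_star, candidates[loc]):
            likelihood *= gaussian_pdf(rss, mean, std)  # product over independent APs
        joint[loc] = likelihood * priors[loc]
    total = sum(joint.values())
    if total == 0.0:
        # All posteriors vanish; the text then falls back to per-coordinate selection.
        return {loc: 0.0 for loc in locs}
    return {loc: joint[loc] / total for loc in locs}

def rmse(true_locs, est_locs):
    """Root mean square of the Euclidean errors between true and estimated locations."""
    sq = [(tx - ex) ** 2 + (ty - ey) ** 2
          for (tx, ty), (ex, ey) in zip(true_locs, est_locs)]
    return math.sqrt(sum(sq) / len(sq))

# Two candidate locations from the GP models and two APs (made-up values).
candidates = {
    (1.0, 1.0): [(-50.0, 4.0), (-60.0, 4.0)],
    (3.0, 2.0): [(-65.0, 4.0), (-45.0, 4.0)],
}
posterior = naive_bayes_posterior([-51.0, -59.0], candidates)
best = max(posterior, key=posterior.get)
print(best)  # -> (1.0, 1.0)
```

Working with sums of log-likelihoods instead of products is a common way to avoid numerical underflow when many APs are involved; the product form is kept here only to mirror Equation (9).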

Experiment Environment Initialization
In this section, we conducted experiments in two environments. One involved simulated experiments programmed using Matlab 2015. The other environment was the physical experiment, the area of which included four beacons deployed on the corners of a square area that was 30 m × 30 m. In the simulation experiment, the signal was smooth; the RSS energy declined gradually from the source, and thus the received energy could be predicted easily according to the distance from the source. However, the physical experiment was performed on the second floor of the 55th office building of Tianjin University. It was approximately a 7 m × 15 m area (Figure 3). The area was a typical building hall with some pillars in each corner and was relatively open in the center, with people able to walk around freely. Four Bluetooth transmitters were placed in the corners of the area to ensure full signal coverage of the testing area.
The testing area was divided into 54 locations for area measurements. The locations are marked by black dots in Figure 4. The four Bluetooth transmitters are marked by the red dots. For receiving online RSS data, we employed a Samsung Note 3 as the data collector in the testing area. Each Bluetooth transmitter was designed to transmit 100 signals per second, and our system collected 20 samples in each location; 15 signals were for system training, and five signals were for localization testing. In addition, we applied our GPRP method with the mod function to avoid potential overfitting problems through a single signal measurement.

The Effect of the Different Training Data Size
To reduce the burden of RSS data collection in the offline phase, in this experiment the authors attempted to find a balance between using limited training data and increasing the localization accuracy in the emulation environment. Figure 5 shows the cumulative distribution function (CDF) of the localization error for different training dataset sizes. We tested data sizes of 30%, 60%, and 90% to evaluate the performance of our proposed method. The results showed that larger training datasets achieve more satisfactory results than do smaller training datasets. The average errors were 2.98, 2.69, and 2.45 m for the 30%, 60%, and 90% datasets, respectively (Figure 5). Compared with the 2.41 m error of the whole dataset, the performance is acceptable: almost half of the training time is saved, whereas the error increases by only about 0.11 times.
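The relative increases can be checked with a few lines of arithmetic. The sketch below only reuses the average errors reported above; reading the 0.11 figure as the 60% case is our interpretation, because the text does not state which subset it refers to.

```python
# Average localization errors (m) reported for each training subset size (%).
baseline = 2.41  # error with the whole dataset
errors = {30: 2.98, 60: 2.69, 90: 2.45}

# Relative increase of the error with respect to the whole-dataset baseline.
relative_increase = {size: (err - baseline) / baseline for size, err in errors.items()}

for size, rel in relative_increase.items():
    print(f"{size}% of the data: error increases by {rel:.3f} times")
```

With these numbers, the 60% subset gives an increase of roughly 0.116, the 30% subset roughly 0.24, and the 90% subset below 0.02.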

Effect on Different GPR Kernel
To test location estimation accuracy with different GPR kernels, 30% of the 54 reference locations in the training area (black dots in Figure 2) were selected in this experiment; the rest were used for testing. The first kernel was the one adopted in our proposed GPRP method, described in Section 3.1. We compared its results with those of a different GPR kernel, the CL kernel, which is also described in Section 3.1. The CL kernel is based on the constant mean and linear covariance kernels; Tables 1 and 2 show the equations. The CL kernel is popular because of its simple computation. The results show that our proposed GPRP method outperformed the CL kernel (Figure 6). Table 3 summarizes the overall results, which reveal that the estimation accuracy of the GPRP method was acceptable, especially inside the 4-m area. The average computation times were 42.80 and 43.81 s for the CL method and our proposed method, respectively. The cumulative accuracy of our proposed model was much higher than that of the CL method, whereas the computation time increased by only approximately 2%.

Performance Evaluation of GPRP
To examine how obstacles affect signal transmission in these simulations, we tested two situations: in the first, the same environment was tested with different obstacle levels; in the second, robustness was tested by changing the obstacles between training and testing.

GPRP Robustness on Location Accuracy Testing
According to Equation (1), the WAF is defined as the obstacle factor, which represents the effects of obstacles on the RSS. Figure 7a shows the optimal situation for receiving signals; the distribution has a circular shape. In addition, Figure 7b-d show the strength mapping with WAF values of 1.1, 3.2, and 6.4, respectively. These values represent light, medium, and heavy crowds in the environment. From these figures, we can observe that the WAF affects the simulation environment.

To test the robustness, the validation was performed under a changing environment; that is, the WAF value differed between offline training and online testing. We defined four different environments in Table 4. These four environments share the same offline training and use dynamic online testing with light, medium, heavy, and super heavy crowds. Figure 8 shows that the GPRP method could adapt to the dynamic environments with acceptable accuracy. For example, the accuracy decreased by less than 10% even when training with a light crowd and testing with a heavy crowd.
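The effect of the WAF on simulated RSS can be sketched as follows. Because Equation (1) is not reproduced in this section, the code assumes a common log-distance path-loss form in which the WAF is subtracted as an attenuation term in dB; the reference power, path-loss exponent, and reference distance are hypothetical values chosen only for illustration.

```python
import math


def simulated_rss(distance_m, waf, p0_dbm=-40.0, path_loss_exponent=3.0, d0_m=1.0):
    """Assumed model: RSS = P0 - 10 * n * log10(d / d0) - WAF.

    p0_dbm, path_loss_exponent and d0_m are illustrative assumptions,
    not parameters reported in the paper.
    """
    distance_m = max(distance_m, d0_m)  # avoid log of values below d0
    path_loss = 10.0 * path_loss_exponent * math.log10(distance_m / d0_m)
    return p0_dbm - path_loss - waf


# WAF values from the text for light, medium, and heavy crowds.
for label, waf in [("light", 1.1), ("medium", 3.2), ("heavy", 6.4)]:
    print(label, round(simulated_rss(5.0, waf), 2), "dBm at 5 m")
```

Under this assumed model, a larger WAF lowers the RSS uniformly at every distance, so a model trained with one WAF value and tested with another sees a constant offset in the signal.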

Localization Accuracy on Different Environments
In these experiments, we evaluated our system performance in two different environments: simulated and field. The field experiment was conducted on the second floor of the 55th building of Tianjin University. For all the experiments, we adopted 30% of the data for training and 70% for testing. First, we compared four methods for the general fingerprinting database: our proposed GPRP, the method of Yiu [12], general fingerprinting (FP), and Weight-RSS by Zheng et al. [8]. The FP and Zheng methods are both based on the RSS location system, which requires building an RSS mapping database of the entire area. Both the computation time and the database preparation are bottlenecks in the system execution.
Accordingly, we employed our proposed method in the actual environment. The area was divided into 6 × 9 blocks. Every block was approximately 1.4 m long and 1.2 m wide. All the experiments were carried out using 30% of the data for training and 70% for testing. For the FP and Zheng methods, the full fingerprint database must be built, which took 54 × 10 s = 540 s; the FP method then took another 1 s to predict the online location, and the Zheng method took slightly longer (2 s). In contrast, the Yiu method and our proposed method used 30% of the samples to build the training database, so constructing the database took 54 × 0.3 × 10 s ≈ 160 s. However, as described in Section 2, Yiu's method required on average 25 s to train the GPR model, 75 s to train the parameters in the model, and 46 s to rebuild the fingerprinting database. Overall, Yiu's method spent about 307 s to finish the prediction. When our proposed method built the fingerprint database, only 25 s were needed to train the GPRP model, after which the offline prediction took about 125 s. Overall, our proposed method took only 205 s on average to predict the online position. Table 5 lists the data for these four methods. The results in Table 5 reveal that the computation time grows with the sampling size of the fingerprint database, because the cost of re-mapping RSS values to positions changes with it. Our proposed method not only saves computation time in building the RSS fingerprint database but also retains the accuracy of online prediction.
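The database collection times quoted above follow from simple arithmetic, sketched below. The function name is hypothetical; the 54 reference points, 10 s per point, and 30% sampling are taken from the text. Note that the exact product for the sampled database is 162 s, which the text rounds to about 160 s.

```python
def collection_time_s(n_points, seconds_per_point, sample_percent=100):
    """Time to collect RSS fingerprints for a percentage of the reference points."""
    return n_points * seconds_per_point * sample_percent / 100


full_db = collection_time_s(54, 10)          # FP and Zheng: whole area
sampled_db = collection_time_s(54, 10, 30)   # Yiu and GPRP: 30% of the points

print("Full database:", full_db, "s")
print("Sampled database:", sampled_db, "s")
print("FP total (database + 1 s prediction):", full_db + 1, "s")
print("Zheng total (database + 2 s prediction):", full_db + 2, "s")
```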

In the physical-area experiment, we compared four methods for the general fingerprinting database: our GPRP method, the method of Yiu, the method of Zheng, and fingerprinting (FP). Figure 9 shows the comparison. The average errors were 2.28, 2.34, 3.09 and 2.93 m for our proposed GPRP method, the FP method, the Yiu method, and the Zheng method, respectively. Together, the graph and the table show that our proposed method can retain accuracy and reduce computation time in the localization system.
In the simulation area, the performance of our method was almost the same as that of the FP method (Figure 10); however, our method required only half the computation time of the FP method. This advantage comes from avoiding the comparison process against the lookup table. In addition, the Yiu method had the partial advantage of flexibility, though it had the longest computation time of all the methods compared. The average errors were 1.82, 1.46, 4.88, 7.2 and 1.6 m for our proposed GPRP method, the FP method, the Yiu method, and the Zheng method, respectively. The cumulative probability of the error being within 3 m was 0.87, 0.90, 0.34 and 0.85 for the GPRP method, the FP method, the Yiu method, and the Zheng method, respectively. Overall, these results again show that our proposed method can retain accuracy and reduce computation time in the simulated localization system.

Conclusions
Generally, the main challenge in RSS-based location positioning is the high sensitivity of the technique to environmental changes. Variations in RSS measurement reduce estimation accuracy. If the radio signal strength were determined solely by the distance between the transmitter and the receiver, location determination would be a trivial problem. However, the relationship between these two parameters is dynamic rather than straightforward. Hence, many methods have been proposed for obtaining location predictions when the position of the receiver is changing. One such basic method uses a fingerprinting database, which involves straightforward computation over collected RSS signals. Both the time consumed and the searching of the lookup table database are bottlenecks in this method.
In this study, we proposed the GPRP method, which exploits the flexibility of the GP and determines the location by using the simplified computation of the Naive Bayes method. Both computation time and efficiency can be improved using this method. In our experiments, we tested our method on the 55th building of Tianjin University in both a simulated and a physical environment. The experiments considered distinct data sizes, different kernels, and accuracy testing. The results showed that our proposed method can ensure accuracy and reduce computation time in location prediction.
