Knowledge Preserving OSELM Model for Wi-Fi-Based Indoor Localization

Wi-Fi has shown enormous potential for indoor localization owing to its wide deployment and availability. Using Wi-Fi for indoor localization requires constructing a fingerprint of the site and adopting a learning algorithm, so that classifiers can be trained on the fingerprint to predict locations. Existing models for Wi-Fi-based localization are borrowed from machine learning and modified to accommodate practical aspects of indoor localization. The performance of these models varies depending on how effectively they handle specific characteristics of indoor localization behavior. One common behavior in indoor navigation is its cyclic dynamic nature: people repeatedly revisit the same places. To the best of our knowledge, no existing machine learning model for Wi-Fi indoor localization exploits this cyclic dynamic behavior to improve localization prediction. This study modifies the widely used online sequential extreme learning machine (OSELM) to exploit cyclic dynamic behavior and achieve improved localization results. We call the new model knowledge preserving OSELM (KP-OSELM). Experiments on the two popular datasets TampereU and UJIndoorLoc show that KP-OSELM outperforms the benchmark models in terms of accuracy and stability, reaching a final accuracy of 92.74% for TampereU and 72.99% for UJIndoorLoc.


Introduction
In this present era of mobile technologies, a broad range of emerging and innovative applications are adopted to enhance communications. Most of these applications connect individuals digitally and are used to transmit and receive data via access to a secure cloud environment or to an internal device [1]. Location data allow for the utilization of various services used by individuals and are important in monitoring and tracking devices. The global positioning system (GPS) is a sensing system used to conduct localization. However, GPS signals transmitted via satellites have poor penetration through building structures, which makes GPS unreliable for indoor localization.

Related Works
In the broader scope, various simple approaches have been proposed to solve Wi-Fi-based localization using a fingerprint created for the site. A good example is the k-nearest neighbor (KNN) algorithm [13], which starts by calculating the P-norm distance between the M-dimensional RSSI vectors. It then selects the k neighbor points with the minimum Euclidean distance and finally estimates the location as the average of the coordinates of the k nearest neighbors. However, such an approach assumes a linear relation between positions and their corresponding RSS signals, which does not hold in practical scenarios. As a result, researchers have suggested neural network (NN)-based approaches, in which the pattern that associates the various locations in the environment with the RSS signals of the APs measured at those locations is captured through training.
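To make the above concrete, the following Python sketch illustrates KNN fingerprint localization as just described: P-norm distances from the query to all reference records, then averaging the coordinates of the k nearest ones. The function and variable names are illustrative, not taken from [13].

```python
import numpy as np

def knn_localize(fingerprint_rssi, fingerprint_coords, query_rssi, k=3, p=2):
    """Estimate a location from a query RSSI vector by averaging the
    coordinates of the k nearest fingerprint records (illustrative sketch)."""
    # P-norm distance between the query and every M-dimensional RSSI record
    dists = np.linalg.norm(fingerprint_rssi - query_rssi, ord=p, axis=1)
    # indices of the k reference records with the smallest distance
    nearest = np.argsort(dists)[:k]
    # location estimate: mean of the coordinates of the k nearest neighbors
    return fingerprint_coords[nearest].mean(axis=0)
```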
One of the most attractive NN models is ELM, a new neural network variant proposed in [12]. In [14], the authors demonstrated that ELMs are able to accomplish minimum training error. Liang et al. [15] proposed an online learning mechanism for ELM. In online sequential learning, a base learner is updated using small quantities of incoming data: by learning from small chunks at a time, a previously trained base learner can update its knowledge with the new data. To manage the practical side of learning, transfer learning models were developed for ELM. In [16], a new cross-domain network learning framework based on ELM, called ELM-based domain adaptation (EDA), was proposed. EDA makes it possible to learn an ELM classifier and a category transformation with random projection by simultaneously minimizing the L2,1-norm of the learning errors and the network output weights. Unlabeled target data, considered useful knowledge, are also incorporated as a fidelity term to ensure stability during cross-domain learning. This lessens the matching error between base and learned classifiers, so that numerous existing classifiers can readily be incorporated as base classifiers. The network output weights are transferable and can be determined analytically. EDA also incorporates manifold regularization with a Laplacian graph to facilitate semi-supervised learning. In [17], a unified framework called domain adaptation ELM (DAELM) was proposed. To learn a robust classifier, DAELM leverages a limited amount of labelled data from the target domain for gas recognition and drift compensation in E-nose systems, without losing the learning ability and computational efficiency of traditional ELM. However, online transfer learning, which is needed in a broad range of real-world applications, is not part of this approach. In [18], a blind domain adaptation algorithm was proposed. This algorithm needs no target domain samples for training: it builds a global nonlinear ELM model from the source domain data in an unsupervised manner and then uses the global ELM model to learn and initialize class-specific ELM models based on the source domain data. During testing, the features reconstructed by the global ELM model are used to augment the features of the target domain, and the resulting enriched features are classified by the class-specific ELM models on the basis of the minimum reconstruction error. Ref. [19] proposed an online approach to quickly adapt a "black box" classifier to a new test dataset without retraining the classifier or revisiting the original optimization criterion. This approach assumes that the original classifier outputs a continuous number on which the class is given by a threshold; it classifies the points near the original decision boundary using a Gaussian process regression scheme. This general procedure can be utilized in the context of a classifier cascade, and it obtained results better than the state of the art for face detection on a standard dataset.
For indoor localization performed with ELM, various researchers have used OSELM for Wi-Fi localization. OSELM is known to have a fast learning speed that can lessen the time and manpower often associated with the offline site survey. OSELM is also equipped with an online sequential learning ability that allows the localization algorithm to quickly and automatically adapt to environmental dynamics. The authors in [20] showed that OSELM performs better than batch ELM [12] for indoor localization. Weighted ELM was also combined with a signal tendency index to perform Wi-Fi-based localization on a standardized fingerprint. In [21], two robust ELMs (RELMs), based on close-to-mean and small-residual constraints, were proposed to address noisy measurements in IPSs. The existence of an explicit feature mapping in ELM is established, and second-order cone programming is used to derive kernelised RELM formulations with random hidden nodes. The methods were applied to indoor localization through Wi-Fi and provide better accuracy than the basic ELM model. Apart from Wi-Fi-based localization, a number of researchers, such as those in [22], also used ELM in personal dead reckoning (PDR)-based localization. In the first stage, they formulated the PDR localization process as an approximation function. They then formulated a sliding window-based scheme to pre-process the gathered inertial sensor data and produce the feature dataset. Lastly, they suggested an OSELM-based PDR algorithm for managing pedestrian localization. OSELM can adapt dynamically to the localization environment and reduce localization errors owing to its universal approximation capability and extreme learning speed. In [23], the researchers proposed an incremental learning model based on transfer learning, MFA-OSELM. The authors applied the concept to Wi-Fi navigation and showed good performance improvement in the context of feature adaptability of the Wi-Fi positioning system while preserving the knowledge in its neural network. The work conducted in [24,25] describes a novel type of extreme learning machine that uses external memory and transfer learning, ITM-OSELM. The authors applied the concept to Wi-Fi localization and showed good performance improvement in the context of cyclic dynamics and feature adaptability of Wi-Fi navigation. However, because the approach uses external memory, it cannot preserve the knowledge in its neural network without that memory. In [10], the researchers used ELM in a transfer learning framework. The developed framework can handle the removal or addition of APs in the environment, which changes the fingerprint model. Transfer learning is used in the neural network to adapt to the new situation without having to gather the fingerprint again. The old information gathered within the neural network is moved to the new network with the help of two matrices: the input-weight supplement vector and the input-weight transfer matrix. These help the system make the needed adjustments for the changing dimensions of the feature matrices between domains, in conjunction with online sequential learning.
This model can be used to avoid an exhausting traditional retraining process when an expected update happens in the data distribution because of changes in the domain or the environment. However, a disadvantage of the model is that it loses all the old knowledge gathered in the network. This knowledge is vital when a highly dynamic change takes place in the environment, because it allows the system to restore old knowledge when another change occurs. One particular example in Wi-Fi localization is a user going back and forth between areas of an indoor environment.
In general, the machine learning models developed for Wi-Fi localization take advantage of the online learning power of transfer learning and ELM models to perform efficient training and operation. However, none of the current models takes into account a vital feature of indoor navigation: its cyclic dynamic behavior. Taking advantage of this behavior can significantly enhance performance. This study focuses on this topic.

Background
This section provides the needed background for the developed methodology. The ELM model is reviewed in Section 3.1. The online variant of ELM or OSELM is provided in Section 3.2. The procedure for each of the two models is also provided. Additionally, feature adaptive-OSELM (FA-OSELM) is presented in Section 3.3.

ELM Review
In [12], a new neural network version called ELM was developed. Figure 1 illustrates the structure of ELM. This learning framework is a single layer feed-forward network (SLFN) that performs random selection of the input weights and an analytical determination of the output weights. Algorithm 1 summarizes the procedure.

Algorithm 1. ELM training.
Inputs: training samples $\{(x_i, t_i)\}_{i=1}^{N}$, number of neurons in the hidden layer $N_h$
Output: output weights $\beta$
1: Random values are selected for the input weights $w_i$ and biases $b_i$, $i = 1, 2, \dots, N_h$.
2: The hidden layer output matrix $H$ is calculated and defined as $H_{ij} = g(w_j \cdot x_i + b_j)$, $i = 1, \dots, N$, $j = 1, \dots, N_h$.
3: The output matrix $\beta$ is computed as
$$\beta = H^{\dagger} T = \left(\frac{I}{C} + H^{T} H\right)^{-1} H^{T} T$$
where $H^{\dagger}$ represents the Moore-Penrose generalised inverse of $H$, which minimises the $L_2$ norm of both $\|H\beta - T\|$ and $\|\beta\|$. $C$ denotes the regularization parameter that is added to prevent the case of singularity. $T = [t_1, t_2, \dots, t_N]^{T}$ stands for the training set's label matrix, where $m$ refers to the dimension of the labels that corresponds to every training sample.
4: return $\beta$
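The following minimal Python sketch illustrates Algorithm 1 under these definitions; tanh stands in for the activation function $g$, and the function names are illustrative.

```python
import numpy as np

def train_elm(X, T, n_hidden, C=1.0, seed=0):
    """Batch ELM (Algorithm 1): random input weights, analytic output
    weights. A sketch; X is N x n, T is N x m (one-hot labels)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden layer output matrix
    # Regularised Moore-Penrose solution: beta = (I/C + H^T H)^(-1) H^T T
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```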

OSELM Review
There are no data available in advance for a wide range of applications. Instead, data are generated continuously over time, and every time a new block becomes available, the model must be trained on that block. In [26], a mathematical method for conducting online sequential learning with ELM, referred to as OSELM, was developed. OSELM includes two major phases. In the boosting phase, an SLFN is trained using the primitive ELM method on a few batches of training data in the initialization stage. Once the boosting phase is complete, the boosting training data are discarded. Then, the remaining training data are learned by OSELM one by one or chunk by chunk, and each chunk is discarded after it has been learned. Algorithm 2 depicts the process of the OSELM algorithm. The symbols of Algorithms 1 and 2 are explained in Table 1.

Algorithm 2. OSELM training.
Step 1 (Boosting phase): Calculate the initial hidden layer output matrix $H_0$, estimate the initial output weight $\beta^{(0)} = P_0 H_0^{T} T_0$ with $P_0 = (H_0^{T} H_0)^{-1}$, and set $k = 0$.
Step 2 (Sequential learning phase): For each new chunk with hidden layer output $H_{k+1}$ and labels $T_{k+1}$, calculate the latest output weight $\beta^{(k+1)}$ based on the RLS algorithm:
$$P_{k+1} = P_k - P_k H_{k+1}^{T}\left(I + H_{k+1} P_k H_{k+1}^{T}\right)^{-1} H_{k+1} P_k$$
$$\beta^{(k+1)} = \beta^{(k)} + P_{k+1} H_{k+1}^{T}\left(T_{k+1} - H_{k+1}\beta^{(k)}\right)$$
Then set $k = k + 1$.
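A minimal Python sketch of the two OSELM phases above is given below. It assumes the boosting chunk has at least as many records as hidden neurons so that $H_0^{T} H_0$ is invertible; the class and method names are illustrative.

```python
import numpy as np

class OSELM:
    """Minimal OSELM sketch: boosting phase plus RLS sequential updates,
    following the equations of Algorithm 2 (illustrative names)."""
    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_features, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def boost(self, X0, T0):
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)   # P_0 = (H_0^T H_0)^(-1)
        self.beta = self.P @ H0.T @ T0      # beta^(0)

    def partial_fit(self, Xk, Tk):
        H = self._hidden(Xk)
        # P_{k+1} = P_k - P_k H^T (I + H P_k H^T)^(-1) H P_k
        M = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ M @ H @ self.P
        # beta^(k+1) = beta^(k) + P_{k+1} H^T (T_k - H beta^(k))
        self.beta = self.beta + self.P @ H.T @ (Tk - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```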

FA-OSELM Review
FA-OSELM was developed in [9], where the authors put forward a model that aims to transfer weight values from the old classifier to the new one. Thus, FA-OSELM can be defined as a method of transferring previous knowledge from a pre-trained neural network to a new network when the number of features differs between the two.
Considering that the number of hidden nodes (L) is the same between the two networks, FA-OSELM provides an input-weight supplement vector $Q_i$ as well as an input-weight transfer matrix $P$, which map the old input weights $a_i$ to the new input weights $\tilde{a}_i$ while accounting for the change in the number of features from $m_t$ to $m_{t+1}$:
$$\tilde{a}_i = a_i P + Q_i$$
where $P \in \mathbb{R}^{m_t \times m_{t+1}}$ and $Q_i \in \mathbb{R}^{1 \times m_{t+1}}$. The matrix $P$ must adhere to the following rules:
a. For every line, there is only one '1'; the rest of the values are all '0';
b. Every column has at most one '1'; the rest of the values are all '0';
c. $P_{ij} = 1$ signifies that, following a change in the feature dimension, the ith dimension of the original feature vector becomes the jth dimension of the new feature vector.
When the feature dimension increases, $Q_i$ functions as the supplement, adding the corresponding input weights for the newly added attributes. The rules below apply to $Q_i$:
a. When the feature dimension decreases, $Q_i$ is an all-zero vector, because no additional input weights are required;
b. When the feature dimension increases, if the ith item of $\tilde{a}_i$ embodies a new feature, the ith item of $Q_i$ is generated randomly according to the distribution of $a_i$.
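The following Python sketch shows how $P$ and $Q_i$ could be assembled from the old and new feature (AP) lists under rules a-c above; representing features as identifier lists is an assumption made for illustration.

```python
import numpy as np

def fa_oselm_transfer(A_old, old_features, new_features, seed=0):
    """Sketch of the FA-OSELM input-weight transfer: a_new = a_old P + Q.
    A_old is L x m_t (one row of input weights per hidden node); the
    feature lists hold AP identifiers. Names are illustrative."""
    rng = np.random.default_rng(seed)
    m_old, m_new = len(old_features), len(new_features)
    # P: one '1' per surviving feature, mapping old dimension i to new dimension j
    P = np.zeros((m_old, m_new))
    for i, f in enumerate(old_features):
        if f in new_features:
            P[i, new_features.index(f)] = 1.0
    # Q: zero for kept features; random values matched to A_old's
    # distribution for newly added features (rule b)
    Q = np.zeros((A_old.shape[0], m_new))
    for j, f in enumerate(new_features):
        if f not in old_features:
            Q[:, j] = rng.normal(A_old.mean(), A_old.std(), size=A_old.shape[0])
    return A_old @ P + Q
```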

Methodology
This section presents our methodology for building the knowledge preserving OSELM (KP-OSELM). We begin by presenting four types of dynamic scenarios that can be regarded as primary scenarios for cyclic dynamic behavior. Then, we present the neural network structure of KP-OSELM and its evolution with respect to time. Thereafter, we introduce the general algorithmic procedure of KP-OSELM. Finally, we provide the evaluation measures for comparing the performance of the developed model with other state-of-the-art models.

Generating Dynamic Scenarios of Localization
The goal of this subsection is to generate dynamic states related to a person moving in an area from one place to another, which causes a change in the available APs. Dealing with mobility scenarios requires emphasis on knowledge preservation. For example, a person may move from area A, where only N1 features or APs are available, to area B, where N2 features are available. Then, the person returns to area A, which requires using old knowledge. FA-OSELM transfers the knowledge from A to B, but knowledge loss may occur due to the change in the dimension of the neural network when the person moves to an area with fewer APs. When the person returns to the old area, the neural network holding the newly transferred knowledge cannot recover the lost knowledge, so an extended period is required to regain it. This problem can be resolved by extending the concept of knowledge transfer into a new model that assures knowledge preservation with a minimum amount of knowledge loss. To quantify this limitation of FA-OSELM, the following scenarios are constructed.
Four scenarios are considered, as illustrated in Figure 2. In the fourth scenario (Figure 2d), the APs are distributed on both sides, with a higher density in side B than in side A. Table 2 summarizes these scenarios.

Table 2. Summary of the scenarios (columns: Scenario Name, Mobility, Number of APs Relation, APs Subset Relation).
The procedure of preparing the datasets for these scenarios starts by dividing the original data into two equal parts. The first part represents the data of area A, whereas the second part represents the data of area B. Then, random selection is performed to select the features of areas A and B. Each scenario has a different selection of B depending on the relation between sets A and B, as presented in the scenario descriptions in Table 2. Table 3 shows the pseudocode for the first scenario. A similar approach is used for the other scenarios by changing command 7, which specifies how the data of area B are prepared: either as a subset of the data of area A or as a set intersecting with A.

Table 3. Pseudocode of dataset preparation for testing scenario 1.
...
8- featuresA = getRandom(features, NA) //select NA random features from features
9- featuresB = getRandom(featuresA, NB) //select NB random features from featuresA because B is contained in A
10- trainDataA = trainData(1:numberofRecordsA)
11- trainDataB = trainData(numberofRecordsA+1:end)
12- trainDataAf = generateFeaturesData(trainDataA, featuresA)
13- testDataAf = generateFeaturesData(testDataA, featuresA)
14- trainDataBf = generateFeaturesData(trainDataB, featuresB)
15- testDataBf = generateFeaturesData(testDataB, featuresB)
End
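As an illustration, the following Python sketch mirrors the Table 3 procedure for scenario 1; the helper name prepare_scenario_1 is hypothetical, and the feature lists are assumed to be column indices into the data array.

```python
import numpy as np

def prepare_scenario_1(train_data, features, NA, NB, seed=0):
    """Sketch of the Table 3 procedure for scenario 1 (B contained in A).
    train_data is a records x features array; `features` holds column
    indices. Names are illustrative."""
    rng = np.random.default_rng(seed)
    features_a = rng.choice(features, NA, replace=False)    # NA random features
    features_b = rng.choice(features_a, NB, replace=False)  # B is contained in A
    half = len(train_data) // 2
    train_a = train_data[:half, :][:, features_a]           # data of area A
    train_b = train_data[half:, :][:, features_b]           # data of area B
    return train_a, train_b, features_a, features_b
```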

Neural Network Structure for Knowledge-Preserving Neural Network and Feature Coding
The SLFN structure shown in Figure 1 is used for our NN model. The differences between this structure and the classical SLFN used for ELM learning are as follows.

1.
The number of inputs is n, a variable that equals the number of APs sensed in the area. The number of hidden neurons is L, which is determined together with the regularization parameter C by using the characterization model (built on the basis of the training data).

2.
The activation function is tansig, defined as $\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1$. This function is selected because it passes through (0,0), which enables the model to cancel the effect of old knowledge (gained from non-active APs) when the input is set to 0. Figure 3 presents the mathematical curve of tansig. Figure 4 depicts the evolution of the neural network structure from one area to another, together with the APs and their relation with the neural network. Notably, the structure or topology of KP-OSELM does not change, unlike that of FA-OSELM, which updates the number of inputs depending on the active features and uses a separate transfer learning block to move the needed weights from the old network to the new one, as shown in Figure 4a. In contrast, all inputs (active and non-active) in KP-OSELM are kept, as can be seen in Figure 4b. However, the non-active features must all be encoded with a value at which the activation function passes through (0,0); thus, we use zero encoding for tansig. The goal is to cancel the effect of the features in the network decision when they are non-active.
For further elaboration, consider that the NNs shown correspond to a practical example of a person moving from location 1 to location 4, passing through locations 2 and 3. The AP readings are shown in Table 4. We assume that the non-active APs are encoded with zeros. Thus, when predicting using KP-OSELM, the neural network receives the input [X_1 X_2 X_3 X_4 X_5] = [30 50 60 0 0] when the person is in location 1, and [35 45 55 0 0] when the person has moved to location 2. In contrast, the NN of FA-OSELM has only 3 inputs: it receives the vector [X_1 X_2 X_3] = [30 50 60] in location 1 and [35 45 55] in location 2. Furthermore, when the person moves from location 2 to 3, the structure of KP-OSELM does not change, while the structure of FA-OSELM changes to take the different inputs [X_3 X_4 X_5] = [70 30 25] in location 3.
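The zero encoding used in this example can be sketched as follows; the encode helper and the AP identifier lists are illustrative, with the Table 4 readings for location 1 used as input.

```python
import numpy as np

def encode(rssi, ap_ids, all_aps):
    """Zero-encode a reading so the input vector always covers every AP the
    network has ever seen; non-active APs get 0, so their weighted input is
    0 and tansig's pass through (0,0) cancels their contribution (a sketch)."""
    x = np.zeros(len(all_aps))
    for value, ap in zip(rssi, ap_ids):
        x[all_aps.index(ap)] = value
    return x

# Table 4 example: location 1 senses AP1-AP3 only, out of five known APs
all_aps = ["AP1", "AP2", "AP3", "AP4", "AP5"]
print(encode([30, 50, 60], ["AP1", "AP2", "AP3"], all_aps))
# -> [30. 50. 60.  0.  0.]
```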


General Algorithmic Procedure
KP-OSELM is a novel variant of OSELM with the capability of preserving the weights of non-active features and restoring them when those features become active again. Its learning equations are the same as the equations of the classical OSELM. The differences are that tansig is used as the activation function and that the set of zero-encoded (non-active) features is constant within any one chunk of data.
The boosting equations are presented as Equations (6)-(9), whereas the iterated equations are presented as Equations (10)-(13). The equations differ from the classical OSELM equations in terms of the input data. In KP-OSELM, we do not use the input vector $X_i$ directly; we replace it with the vector $\ddot{X}_i$ calculated from Equation (11). This equation indicates that $\ddot{X}_i$ has the same elements as $X_i$ for active features and zeros for non-active features. $I$ denotes the set of active features, whereas $F$ denotes the entire set of features:
$$\ddot{x}_j = \begin{cases} x_j, & j \in I \\ 0, & j \in F \setminus I \end{cases} \qquad (11)$$
Table 5 provides the pseudocode of learning and prediction, which starts with the boosting phase to train the initial network SLFN_0 with the boosting data (D_0, y_0). The data contain the RSSI information of the active features and the corresponding location of each record. Non-active features are encoded with zero values using the encode function. The steps are iterated for as many data chunks as are available. Notably, any data chunk contains the same set of active features. We point out that the training and prediction functions adopt the same formulas that are used for OSELM, given in Equations (2) and (4).
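Putting the pieces together, the following sketch mirrors the Table 5 flow by reusing the OSELM and encode sketches given earlier. The record format (RSSI values paired with the ids of the active APs) and the one-hot label encoding are assumptions made for illustration.

```python
import numpy as np

def run_kp_oselm(chunks, labels, all_aps, n_hidden=100):
    """Sketch of the Table 5 flow. Each chunk is a list of (rssi_values,
    ap_ids) readings; labels are one-hot arrays, one row per reading.
    The full (all-APs) input dimension preserves the weights of
    non-active features between areas."""
    net = OSELM(n_features=len(all_aps), n_hidden=n_hidden)
    X0 = np.array([encode(rssi, aps, all_aps) for rssi, aps in chunks[0]])
    net.boost(X0, labels[0])                    # boosting phase on (D_0, y_0)
    accs = []
    for Dk, yk in zip(chunks[1:], labels[1:]):
        Xk = np.array([encode(rssi, aps, all_aps) for rssi, aps in Dk])
        pred = net.predict(Xk).argmax(axis=1)   # predict before labels arrive
        accs.append((pred == yk.argmax(axis=1)).mean())
        net.partial_fit(Xk, yk)                 # then learn from the chunk
    return accs                                 # per-chunk accuracy, Eq. (17)
```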

Computational Analysis
According to [27], the ELM algorithm has a computational complexity of $O(\tilde{N}^2 N)$, where $\tilde{N}$ denotes the number of hidden neurons and $N$ denotes the number of training samples. Since $\tilde{N} \ll N$ when $N$ is sufficiently large, the computational complexity of ELM can be considered to approach $O(N)$. In KP-OSELM, we preserve the structure of ELM and increase only the dimension of the input vector when the number of inputs increases. Increasing the number of inputs changes neither $\tilde{N}$ nor the number of samples $N$. As a result, the complexity of KP-OSELM stays approximately $O(N)$.

Table 5. Pseudocode of training and prediction using KP-OSELM.

Inputs:
D_k // chunks of data, with the constraint that the non-active feature set FC is zero while k is constant
y_k // vector of labels of chunk D_k; it is only provided after the neural network makes its prediction for D_k
SLFN_0 // initial neural network
Outputs:
ACC // accuracy of prediction
Start:
1- x_0 = Encode(D_0) // this function encodes the non-active features with zeros
2- ...

Experimental Results
This section presents the experimental work for validating KP-OSELM and comparing its performance with those of two approaches: the state-of-the-art approach OSELM [26] and the transfer learning-based approach FA-OSELM [10]. The section comprises the datasets description in Section 5.1. Next, we provide the characterization model in Section 5.2. After that, we present the evaluation scenarios. Two evaluation approaches were used: the first is the area-based scenarios in Section 5.3, which include the four scenarios presented earlier in Section 4.1; the second is the trajectory-based scenarios in Section 5.4.

Datasets Description
The bases of this experiment are the TampereU and UJIndoorLoc databases. The UJIndoorLoc database covers three buildings of Universitat Jaume I with an area of almost 110,000 m², each with at least four levels [28]. This database can be used for classification, such as building and floor identification, and for regression, such as estimating latitude and longitude. The database was developed in 2013 with more than 20 unique users and 25 Android devices. It is made up of 19,937 training/reference records and 1111 validation/test records. Each record has 529 attributes, comprising the Wi-Fi fingerprint, the coordinates where the information was gathered, and other related information.
On the other hand, the TampereU dataset is an indoor localization database that can be utilized to test IPSs that rely on WLAN/Wi-Fi fingerprints. Lohan and Talvitie created this dataset to test indoor localization techniques [29]. The dataset covers two buildings of the Tampere University of Technology, with three and four levels. It contains 1478 training/reference records and 489 test records for the first building, and 312 records for the second building. Each record contains the coordinates (longitude, latitude, and height) and the Wi-Fi fingerprint (309 WAPs).
An important measure to consider is the density of APs in each of the two datasets. For TampereU, the density of APs = number of APs/area of building = 309/9454 ≈ 0.03 APs/m², while for UJIndoorLoc the density = 520/110,000 ≈ 0.005 APs/m².

Characterization Model
In the parameter setting, the regularization parameter C is selected to be C = 2⁻⁶ for TampereU based on the relationship between accuracy and C, as shown in Figure 5a.


Areas Based Scenarios
The generation of the scenarios is based on the parameters presented in Tables 6 and 7. Different numbers of features are chosen depending on the scenario and the size of the dataset; higher numbers are selected for UJIndoorLoc because it has many attributes. For evaluation, we generate the measures for each of the presented scenarios, together with observations, analysis, and interpretation of the results. Accuracy is presented in Section 5.3.1; its goal is to differentiate our developed model from the benchmark models in terms of location prediction performance. For a quantitative summary of the differences, we provide a statistical comparison in Section 5.3.2. To investigate the stability of the classifiers, we present the standard deviation measure in Section 5.3.3. Predicted trajectories vs. ground truth are provided in Section 5.4.

Accuracy
The accuracy of KP-OSELM is generated and compared with those of the two state-of-the-art approaches OSELM and FA-OSELM. The comparison is performed on the TampereU and UJIndoorLoc datasets. A chunk represents a block of data received by the classifier for prediction, whose correct labels are then used for training before the next chunk arrives. Accuracy is the complement of the error ratio, where the error is the ratio of misclassifications over the total number of classifications, as given in Equation (17):
$$\text{Accuracy} = 1 - \frac{\text{number of misclassified records}}{\text{total number of records}} \qquad (17)$$
In each scenario, all classifiers deliver similar accuracy in the first area. Some tiny differences in accuracy are found, owing to the random behavior of ELM: the first step of training initializes random weights between the input and hidden layers. However, the difference in performance between the approaches appears when the person leaves an area and re-enters an area visited earlier, because the classifier then needs to restore previously gained knowledge. The ability to restore old knowledge differs between OSELM, FA-OSELM, and KP-OSELM. For example, in all scenarios, when the subject re-entered area B, KP-OSELM achieved an accuracy higher than both benchmarks FA-OSELM and OSELM. The only case of similar performance between the benchmarks and KP-OSELM was scenario 3, where FA-OSELM was able to restore old knowledge through transfer learning: because B is part of A in this scenario, the transfer learning carried the knowledge of B over to A when the subject visited it the second time.

Statistical Based Comparison
In order to verify the improvement of our developed classifier KP-OSELM over the benchmarks, we conduct a t-test to decide whether to reject the hypothesis of no difference. Tables 8 and 9 present the t-test values between KP-OSELM on one side and FA-OSELM and OSELM on the other side for TampereU and UJIndoorLoc, respectively. The t values are computed for three areas: A, B, and A visited the second time. The null hypothesis is rejected for almost all cases of visiting A the second time at the 0.1 significance level. This supports the claim of superiority of KP-OSELM over both approaches.
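For reference, such a test can be run as in the following sketch; the accuracy values are placeholders rather than the paper's results, and an independent two-sample t-test is assumed.

```python
from scipy.stats import ttest_ind

# Per-chunk accuracies of two models over the second visit to area A
# (placeholder numbers, for illustration only)
kp_oselm_acc = [0.88, 0.90, 0.91, 0.92, 0.93]
fa_oselm_acc = [0.70, 0.74, 0.75, 0.78, 0.78]

t, p = ttest_ind(kp_oselm_acc, fa_oselm_acc)
reject_null = p < 0.1  # reject "no difference" at the 0.1 significance level
```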

Standard Deviation
The standard deviation of accuracy with respect to the chunks of data indicates the amount of new knowledge of the area that the classifier has had to gain. When a person returns to a previously visited area, either the classifier restores the old knowledge, so little new knowledge needs to be gained and the standard deviation is low, or the old knowledge cannot be restored and must be gained again, and the standard deviation is high. Figures 8 and 9 show the standard deviation of KP-OSELM, FA-OSELM, and OSELM with respect to the data chunks for A2 with the TampereU and UJIndoorLoc datasets, respectively.
From the figures, the following observations are made. For all scenarios, KP-OSELM has a standard deviation that is small or at least equivalent to that of FA-OSELM. By contrast, OSELM has the highest standard deviation in all scenarios. The reason is that OSELM cannot preserve knowledge when the number of features changes, whereas FA-OSELM can transfer knowledge whenever a feature change occurs. However, FA-OSELM has limited capability in knowledge transfer because it can transfer knowledge only from the last state to the current state, whereas KP-OSELM can restore knowledge from any previous state. In scenario 3, with both datasets, the standard deviation of FA-OSELM is similar to that of KP-OSELM because area A in this scenario is entered from area B and the features of area A are part of the features of area B. As a result, FA-OSELM can transfer all the knowledge needed in area A, which gives it performance equivalent to that of KP-OSELM.

Trajectory Based Scenarios
The previous evaluation (area-based scenarios) lacks the ability to evaluate the localization algorithm on given navigation trajectories: the testing captures the accuracy of predicting the locations of the testing dataset, which is composed of separate records, but it does not include actual records generated from a traversed path. In order to test the algorithm on given trajectories, we adopt the simulator presented in [24]. This simulator accepts a whole trajectory and generates its corresponding time series of records from the dataset. Floor one was used to generate the rectangular trajectory, and floors one and two were used to generate the cubic trajectory. Each of the trajectories was generated with three cycles.
The geometry of the trajectories was compared with the ground truth in order to investigate the correctness of the prediction from a localization perspective. The ground truth is the actual path conducted by the navigating person. Figures 10 and 11 show the predicted paths for KP-OSELM, FA-OSELM, and OSELM, along with the ground truth (actual) path, shown as the rectangular trajectory, for both the TampereU and UJIndoorLoc datasets. We represent the paths as graphs where each node provides one predicted location, while an edge connects the previously predicted location to the current one. We define irregular edges as edges that connect two nodes that were not connected in the ground truth graph, and missing edges as edges absent between two nodes that were connected in the ground truth graph. The best graphs are the ones with the lowest numbers of both types of edges. On this basis, it can be observed that KP-OSELM provides graphs with fewer irregular and missing edges than the FA-OSELM and OSELM graphs. Moreover, in KP-OSELM the edge thickness increases when the edges match their counterparts in the actual conducted graph. The irregular edges in KP-OSELM can be attributed to the multi-path nature of the Wi-Fi signal in the indoor environment; however, such edges occur less frequently than with FA-OSELM and OSELM. The accuracy achieved in each of the three cycles of the rectangular trajectory is compared between the three approaches in Table 10. It is evident that our model KP-OSELM improves its performance from one cycle to the next. In the third cycle, the accuracy of KP-OSELM reached 92.74%, compared with 78.28% for FA-OSELM and 7.99% for OSELM, on the TampereU dataset. Similarly, the accuracy of KP-OSELM reached 72.99%, compared with 38.2% for FA-OSELM and 13.12% for OSELM, on the UJIndoorLoc dataset. The reason for the difference in performance between the two datasets is their respective difficulty. The difficulty is assumed to be inversely proportional to the density, or the ratio of the number of APs to the area: more APs means lower difficulty, and a bigger area means higher difficulty. From Section 5.1, TampereU has the higher density, which results in lower difficulty and higher accuracy.
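The irregular/missing-edge counts used in this comparison can be computed as in the following sketch, where a path is a list of visited location ids and edges are treated as undirected (an assumption made for illustration).

```python
def edge_errors(predicted_path, ground_truth_path):
    """Count irregular and missing edges as defined above: an edge is a
    pair of consecutive locations in a path (a sketch; paths are lists
    of location ids)."""
    pred = {frozenset(e) for e in zip(predicted_path, predicted_path[1:])}
    truth = {frozenset(e) for e in zip(ground_truth_path, ground_truth_path[1:])}
    irregular = pred - truth  # predicted edges absent from the ground truth
    missing = truth - pred    # ground-truth edges never predicted
    return len(irregular), len(missing)
```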

Summary and Conclusions
This article presents an online localization approach based on Wi-Fi technology for a person moving in an indoor environment. It focuses on the cyclic dynamic behavior that arises from the nature of a person's mobility in an indoor environment, which involves revisiting the same places visited previously. This requires the neural network to save old knowledge for future use. To preserve old knowledge, feature coding based on the activation function in use is applied. The goal of the coding is to cancel out the contribution of old knowledge that is irrelevant to the current situation; with the tansig activation function, zero encoding of non-active features is needed. The evaluation performed in this study shows the superiority of the developed KP-OSELM over OSELM and FA-OSELM. The superiority is clear in the accuracy improvement, which supports the effectiveness of KP-OSELM compared with FA-OSELM and OSELM, and the improvement was verified statistically using the t-test. A further performance measure is the standard deviation of accuracy, which is low in nearly all scenarios for KP-OSELM, showing that this model has greater stability and less loss of old knowledge than FA-OSELM. A visualization of the trajectories predicted by our approach, compared with the benchmarks and the ground truth, further supports the improvement. Our evaluation concludes the superiority of KP-OSELM over the benchmarks in terms of knowledge preservation and stability of performance. The accuracy reached by KP-OSELM in the last cycle was 92.74% for TampereU and 72.99% for UJIndoorLoc.
In future work, we aim to investigate the effect of common features between two places on knowledge preservation and to generalize this knowledge-preserving approach with cyclic dynamic awareness to other types of learners.

Conflicts of Interest:
The authors declare no conflict of interest.