A CSI-Based Indoor Positioning System Using Single UWB Ranging Correction

A fingerprint-based localization system is an economical way to solve the indoor positioning problem. However, the traditional off-line fingerprint collection stage is time-consuming and laborious, which limits the use of fingerprint-based localization systems. In this paper, based on ubiquitous Wireless Fidelity (Wi-Fi) equipment and a low-cost Ultra-Wideband (UWB) ranging system (with only one UWB anchor), a ready-to-use indoor localization system is proposed to realize long-term and high-accuracy indoor positioning. More specifically, the system operates in two stages: (1) an initial stage, and (2) a positioning stage. In the initial stage, an Inertial Measurement Unit (IMU) is used to calculate the position using a Pedestrian Dead Reckoning (PDR) algorithm within a preset number of steps, and the location-related fingerprints are collected to train a Convolutional Neural Network (CNN) regression model; simultaneously, in order to make the UWB ranging system adapt to the Non-Line-of-Sight (NLoS) environment, the increments of acceleration and angular velocity from the IMU and the increments of the single UWB ranging measurements are correlated to pre-train a Support Vector Regression (SVR) model. After the time or step-number threshold is reached, the system switches to the positioning stage, in which the CNN predicts the position and is calibrated by the corrected UWB ranging. Finally, a series of practical experiments are conducted in a real environment; the results show that, because the corrected UWB ranging measurements calibrate the CNN parameters in every positioning period, the system yields stable localization results over a comparatively long term. Additionally, it has the advantages of stability, low cost, and noise resistance.


Introduction
With the development of the Internet of Things (IoT), indoor Location Based Service (LBS) has attracted extensive research in recent years [1]. Though Global Navigation Satellite Systems (GNSS), including the Chinese BeiDou Navigation System (BDS) and the American Global Positioning System (GPS), can provide high-accuracy positioning results in open-air scenarios [2], they are limited in indoor environments such as basements, tunnels, and even high-density building areas, due to attenuation and distortion of electromagnetic signals on the surface of blocking objects [3]. Therefore, many positioning technologies have been proposed or improved to satisfy various indoor localization requirements, including Wi-Fi [4], UWB [5], Bluetooth [6], Zigbee [7], ultrasonic [8], and more [9]. Broadly, current indoor positioning technologies can be classified into two types: fingerprinting-based methods and ranging-based methods.
that it had a high classification accuracy. Yu et al. [22] proposed an equality constrained Taylor series robust least squares (ECTSRLS) technique to suppress NLoS ranging errors.
In order to implement high-accuracy indoor positioning, reduce the update frequency of the fingerprint database, and fully exploit the advantages of fingerprint-based methods and ranging-based methods, we propose a CNN-based indoor localization system whereby the mapping relationship between the characteristics of received signals and location is stored in the weight parameters of the CNN, which could save storage cost to some extent. Moreover, UWB ranging is adopted to track changes in the dynamic environment by refreshing the weights of the CNN through error backpropagation. To alleviate NLoS interference, Support Vector Regression (SVR) and IMU data are used to recover the UWB ranging. The main contributions of this paper are listed as follows:
• The proposed system is a fully automatic scheme, which means that the system can be used to localize position directly, and can work without any prior preparation, e.g., manual measurement of the UWB anchor position and fingerprint locations. It only takes a short amount of time to initialize the whole system.
• To the best of our knowledge, this system is the first to use IMU measurements to correct UWB ranging. It greatly expands the use of UWB signals in NLoS scenarios and solves the localization problem in a harsh environment using ranging measurements.
• The proposed system can adapt to a dynamic environment properly, relying on the corrected UWB ranging feedback, and can provide a comparatively stable localization result over the long term.
• Using corrected UWB ranging measurements can reduce the amount of CNN training (fingerprint database updating). Moreover, the position of the UWB anchor does not need to be known in this system (only ranging measurements are needed).
The remainder of this paper is organized as follows. In Section 2, the related work is introduced from the aspect of machine learning and UWB fusion system, and the detailed description of our proposed system is shown in Section 3. The corresponding experiment designs, as well as analysis of experiment results, are provided in Section 4. In Section 5, we discuss our system according to experimental results, and the conclusion is given in Section 6.

Machine Learning-Based Localization
The use of machine learning methods for indoor localization has attracted considerable attention in recent years. Dou et al. [23] formulated the indoor localization problem as a Markov Decision Process (MDP), using Deep Q Learning (DQL) to bisect the whole positioning space in three dimensions (3D); this method had low complexity and flexible localization resolution. Song et al. [24] used K-Nearest Neighbor (KNN) to estimate location after comparing the Time-reversal Resonating Strength (TRRS) and Euclidean distance between reference fingerprints and target fingerprints. Carrera Villacrés et al. [25] designed a particle filter based reinforcement learning localization system fusing Wi-Fi fingerprint and IMU-based PDR; this system can change its positioning model according to channel types (LoS/NLoS), and has high localization accuracy and strong robustness compared with traditional methods when the propagation model matches the real environment well. Chen et al. [26] used two CNNs to realize indoor localization, called a two-stage CNN deep learning approach; the first CNN identified the inherent features of an environment and, based on its recognition results, an appropriate positioning model was chosen, while the second CNN was applied to realize localization. Zhou et al. [27] proposed a method named AdapLoc, based on a one-dimensional Convolutional Neural Network (1D-CNN), to dynamically adapt to environmental change, and the evaluation experiment verified the effectiveness of AdapLoc. Zhao et al. [28] designed a hybrid convolutional autoencoder neural network to extract the features of location-related signals, and the experiments showed that the convolutional autoencoder neural network not only worked well on a real-world dataset but also had anti-noise ability and low latency (average 4 ms). Chen et al. [29] proposed a Dilated CNN prediction and SVR correction Wi-Fi localization method, which had good real-time performance with only one RSS collection at each position. However, it needed a large number of Access Points (64 for an 8 × 8 image) to realize positioning with high accuracy.
In machine learning, deep learning is the most popular research field; one of its excellent characteristics is the ability to inherently extract deep representative features from given data samples. Among various machine learning methods, the CNN is the most popular localization tool; a CNN has the ability to learn spatial features from data samples, thus temporal series signals can be converted into spatial series signals that can be processed by a CNN. However, these deep learning methods carry a latent assumption that the signal distribution remains stable over the long term. In order to follow changes in the environment, a single-anchor UWB ranging system is used in our proposed system.

UWB Fusion Localization
As formerly stated, a UWB localization system is a high-accuracy but expensive positioning system. Unfortunately, to alleviate the interference of NLoS in practical applications, the number of UWB anchors is much more than the theoretical number in real applications. Therefore, some researchers are devoted to a hybrid system based on a single UWB ranging with only one anchor. Tian et al. [18] utilized a Particle Filter (PF) to fuse PDR and UWB ranging; moreover, the anchor position was estimated in the initial stage to show that sensor drift was not significant. A ranging error model was then modified to implement PDR and UWB fusion using PF [30] to mitigate interference of NLoS. Cao et al. [31] designed a UWB ranging and IMU fusion algorithm which used UWB ranging and heading (provided by IMU) to calculate target speed, and an extended Kalman filter to fuse IMU and UWB ranging constricted by estimated speed. Li et al. [32] used an extended Kalman filter to fuse a UWB localization system (not ranging) and IMU, and they also discussed the fusion system under LoS and NLoS environments. Shi et al. [33] used commercial IMU and UWB ranging to calculate anchor coordinates which simplified the deployment of the UWB system, after which the UWB measurements and inertial measurements were fused by a tightly-coupled error-state Kalman filter. Xu et al. [34] proposed a fix-lag extended finite impulse response smoother (FEFIRS) to implement UWB and INS data tight fusion, and the results showed that FEFIRS had higher robustness and accuracy compared with traditional Kalman-based schemes.
These UWB fusion systems mostly rely on pre-knowledge of the anchor position and an NLoS environment transition model. The measurement of a UWB anchor position is a time consuming process, which will limit the use of UWB localization systems on a large scale. Moreover, though the NLoS model could provide stable positioning results, it cannot adapt to the dynamical environment.

Proposed Fusion System
This section gives an overview of the proposed positioning system followed by the description of some key components in detail. In Wi-Fi localization technology, compared with the variance of Received Signal Strength Indicator (RSSI), the CSI information has more stable properties and finer grained accuracy [35]. Therefore, the CSI is used to realize indoor localization in this paper.

Overview
The whole system is divided into two stages: an initial stage and a positioning stage, as shown in Figure 1. In the initial stage, a tester equipped with an IMU sensor and a UWB tag (mounted tightly) starts from a fixed point (a known coordinate such as an entrance), and the initial stage consists of two parts: CNN training and SVR training. During CNN training, if a step is detected in the Step detection block using the IMU data, the system will calculate the current location and record position-related fingerprints while the sample number and step count are less than their thresholds $N_{sf}$ and $N_{th}$. It should be emphasized that the sample number $N_{sf}$ is much larger than $N_{th}$; thus, if the step count reaches the threshold $N_{th}$ while the sample number is insufficient, the tester should return to a fixed point to restart the PDR position calculation and fingerprint collection along a different test line until the sample number is satisfied. This manipulation not only keeps the high accuracy of PDR localization but also ensures the quantity and quality of the training samples for the CNN. During SVR training, the data flow is triggered by new IMU data. The system will record the increments of IMU and UWB ranging data in pairs for SVR training when the UWB data are collected under a LoS environment; after the sample number reaches its threshold $N_{su}$, the SVR begins to train.


Figure 1. The structure of the proposed system (panels: Initial Stage, Positioning Stage). $N_{sf}$ and $N_{su}$ are the sample numbers of the CNN and SVR, respectively. $N_{th}$ is the threshold of the step number. $R_k$ is the k-th UWB ranging.
In the positioning stage, the system uses a current fingerprint to estimate the corresponding location; at the same time, the recovered UWB ranging under NLoS or raw UWB ranging under LoS is utilized to calculate CNN positioning error according to CNN estimated position, and then the parameters of CNN are adjusted using estimated errors. We will elaborate on the key technologies and components in subsequent sections.

CNN Training
In this part, the basic techniques used in training sample collection are step detection and position calculation using IMU data, both of which have been researched extensively. The specific values of the sample number $N_{sf}$ and the step count threshold $N_{th}$ will be discussed in the experiment preparation section.

Step Detection
Step detection can be treated as stance detection, because the feet touch the ground alternately, which generates a tiny zero-velocity interval per step. Many step detection algorithms have been developed in recent years, but most are improvements on [36] with more complex and stricter constraints. For example, Liu et al. [37] proposed a robust step detection algorithm that effectively reduced false detection and over-detection of steps. For simplicity, the step detection algorithm in [36] is utilized in our system, and a simple description of its three constraint conditions (C1, C2, C3) is given as follows: The thresholds $th_{a\min}$ and $th_{a\max}$ are the lower bound and upper bound, respectively; $|a_k|$ denotes the magnitude of the k-th acceleration sample; $\sigma_a$ denotes the root mean variance of the acceleration under a given window scale; $th_{\sigma_a}$ is the selected threshold; $|\omega_k|$ represents the magnitude of the k-th gyroscope sample; and $th_{\omega}$ is the corresponding threshold.
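The three conditions can be illustrated with a minimal sketch. The inequality directions for C2 and C3 (low local variability and low angular rate indicate stance), all threshold values, the window length, and the function names are assumptions made for illustration; they are not taken from [36] or from this paper.

```python
import math
from statistics import pstdev


def norm3(sample):
    """Euclidean norm of one 3-axis sample (x, y, z)."""
    return math.sqrt(sum(c * c for c in sample))


def stance_flags(acc, gyro, th_a_min=9.0, th_a_max=11.0,
                 th_sigma=0.5, th_omega=0.9, window=5):
    """Return one boolean per sample: True when C1, C2 and C3 all hold.

    acc, gyro: lists of (x, y, z) tuples in m/s^2 and rad/s.
    All thresholds are hypothetical placeholder values.
    """
    acc_norm = [norm3(a) for a in acc]
    gyro_norm = [norm3(g) for g in gyro]
    half = window // 2
    flags = []
    for k in range(len(acc_norm)):
        lo, hi = max(0, k - half), min(len(acc_norm), k + half + 1)
        # Local spread of the acceleration magnitude inside the window.
        sigma_a = pstdev(acc_norm[lo:hi])
        c1 = th_a_min < acc_norm[k] < th_a_max   # C1: magnitude within bounds
        c2 = sigma_a < th_sigma                  # C2: ASSUMED direction
        c3 = gyro_norm[k] < th_omega             # C3: ASSUMED direction
        flags.append(c1 and c2 and c3)
    return flags
```

A step could then be counted at each transition into or out of a run of `True` flags; how the transitions are debounced is left open here.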

Position Calculation
PDR is an efficient localization algorithm that iteratively calculates the target position using IMU heading data and step length: $x_k$ and $y_k$ are the coordinate values of the k-th step in a planar coordinate system. $L_s$ denotes the step length, in which $k_s$, $h$, and $f_s$ are the scaling factor, pedestrian height, and corresponding walking frequency, respectively.
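The iterative update can be sketched as follows. The heading convention (angle measured from the x-axis) and the step length model $L_s = k_s \cdot h \cdot \sqrt{f_s}$ are assumptions chosen for illustration, since the exact expressions are not reproduced in the text above; the function names are hypothetical.

```python
import math


def step_length(k_s, height, step_freq):
    """ASSUMED model: L_s = k_s * h * sqrt(f_s)."""
    return k_s * height * math.sqrt(step_freq)


def pdr_update(x, y, heading_rad, length):
    """Advance one step; heading is ASSUMED to be measured from the x-axis."""
    return (x + length * math.cos(heading_rad),
            y + length * math.sin(heading_rad))


def pdr_track(start, headings, lengths):
    """Accumulate positions step by step, starting from a known point."""
    x, y = start
    track = [(x, y)]
    for heading, length in zip(headings, lengths):
        x, y = pdr_update(x, y, heading, length)
        track.append((x, y))
    return track
```

Because each position depends on all previous steps, heading and step length errors accumulate, which is why the system limits PDR use to a small number of steps per run.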
In common scenarios, it is hard for PDR to achieve high-accuracy positioning results across different testers. However, our proposed system needs only one person to initialize the whole system in each scenario at the system deployment stage. In the positioning stage, the proposed system abandons the PDR algorithm, avoiding tedious PDR correction between different people. In summary, one fixed person initializes the whole system, after which the system is available to everyone. This is the advantage of our system compared with PDR.

CNN Regression Model
For simplicity and without loss of generality, a traditional CNN consisting of two convolution-pooling layers (shown in Figure 2) is used to realize position calculation and verify the effectiveness of the proposed system. The architecture is simplified from those in references [38,39]. This CNN consists of two 5 × 5 convolutional layers (C1 and C2), two max-pooling layers (P1 and P2) with stride 2, and three fully connected layers (F1, F2, F3). The output layer (F3) has two output nodes and each node outputs a corresponding coordinate value.
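To make the layer arithmetic of this architecture concrete, the following sketch computes the side length of each feature map. It assumes a 30 × 30 input (the CSI image size used in the CNN parameter settings), unpadded stride-1 convolutions, and 2 × 2 pooling windows; the padding, convolution stride, and pooling window size are not stated in the text and are assumptions, as are the function names.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size=30, conv_kernel=5, pool_kernel=2, pool_stride=2):
    """Side length after each of C1, P1, C2, P2 (ASSUMED: no padding)."""
    sizes = {"input": input_size}
    s = out_size(input_size, conv_kernel)          # C1: 5 x 5 convolution
    sizes["C1"] = s
    s = out_size(s, pool_kernel, pool_stride)      # P1: max pooling, stride 2
    sizes["P1"] = s
    s = out_size(s, conv_kernel)                   # C2: 5 x 5 convolution
    sizes["C2"] = s
    s = out_size(s, pool_kernel, pool_stride)      # P2: max pooling, stride 2
    sizes["P2"] = s
    return sizes
```

Under these assumptions the P2 output is 4 × 4 per channel, which is what the first fully connected layer (F1) would flatten; a padded convolution would give different sizes.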
Figure 2. The structure of CNN used in this paper.
As depicted in the flow chart of Figure 1, the feedback error signals differ between the initial and positioning stages. In the initial stage, the feedback signals are calculated from the positions of reference points in the database and the CNN-estimated positions, thus the loss function $\Phi$ is defined as follows: where $N$ denotes the total number of CNN training samples, $y_i$ is the label of the i-th training sample $x_i$, and $f(x_i; w)$ is the output of the CNN regression model under the weight vector $w$. Stochastic Gradient Descent (SGD) is chosen to train the weights $w$, and the weight update rule is: where $v_i$ is the i-th momentum variable, and $m$ and $d$ denote the constant momentum coefficient and weight decay, respectively. $\kappa_i$ is the learning rate, which decays at a nonlinear rate. $\left\langle \partial\Phi/\partial w\,|_{w_i} \right\rangle_{D_i}$ denotes the derivative of the loss function with respect to $w$ at the i-th iteration, on batch $D_i$, evaluated at $w_i$.
In the positioning stage, the CNN provides the predicted localization result and its weights are adjusted according to UWB ranging. The adjustment process can be regarded as environment characteristic tracking because the weights of the CNN represent environment characteristics. Thus, the CNN-based system can theoretically realize long-term positioning through environment characteristic tracking using UWB ranging. In this stage, the corresponding loss function is: where $N_p$ is the number of localization targets, and $f(rx_i; w)$ denotes the output of the CNN with real-time fingerprint measurements $rx_i$ and parameters $w$. $r_{UWB}$ is the corrected UWB ranging (under NLoS) or the raw UWB ranging (under LoS), with SGD likewise used in this stage. $ap$ is the coordinate of the UWB anchor, which can be estimated through UWB ranging in the initial stage [19].
The scheme of $ap$ position estimation is described briefly as follows: firstly, the pedestrian locations are calculated using PDR while recording the corresponding UWB ranging measurements in the initial stage. After that, an empirical power metric of the UWB signal [40] is utilized to sort the UWB power measurements. Following that, three non-collinear PDR positions with the best UWB ranging quality are chosen. Finally, the anchor position is solved by trilateration using the three selected pairs of PDR positions and UWB ranging measurements. The empirical UWB ranging power metric $P_{diff}$, the difference between two power levels expressed in dBm, is defined as: where $P_{RX}$, $P_{FP}$, $C$, and $F_{1-3}$ are the total received power, the First Path (FP) power, the channel impulse response power, and the first-path amplitudes at three points defined in [19], respectively. In this work, a UWB system equipped with the DW1000 chip is used to measure distance. Thus, the values of $C$ and $F_{1-3}$ can be read from the registers of the DW1000 chip. Reference [40] suggests that the channel is likely to be in the NLoS state when $P_{diff}$ is greater than 10 dB, while the channel is likely to be LoS when $P_{diff}$ is less than 6 dB.
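The positioning-stage calibration can be sketched in a few lines. Both forms below are assumptions made for illustration rather than the paper's exact equations: the loss is taken as the mean squared gap between the predicted-position-to-anchor distance and the UWB range, and the update is the widely used momentum form with weight decay. The default values of `m` and `d` and the function names are hypothetical.

```python
import math


def range_loss(pred_positions, anchor, ranges):
    """ASSUMED loss: mean of (||f(rx_i; w) - ap|| - r_UWB)^2 over N_p targets."""
    total = 0.0
    for (x, y), r in zip(pred_positions, ranges):
        dist = math.hypot(x - anchor[0], y - anchor[1])
        total += (dist - r) ** 2
    return total / len(ranges)


def sgd_momentum_step(w, v, grad, lr, m=0.9, d=5e-4):
    """ASSUMED update: v <- m*v - d*lr*w - lr*grad, then w <- w + v."""
    v_new = [m * vi - d * lr * wi - lr * gi for wi, vi, gi in zip(w, v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new
```

A loss of this shape only constrains the distance to the anchor, not the bearing, so it acts as a correction on top of the fingerprint-trained weights rather than as a standalone localizer.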

SVR Training
Although the pedestrian trace can be calculated through a strapdown IMU assisted by UWB ranging correction, it is a time-consuming and error-accumulating process; thus, it is hard to estimate a high-accuracy, long-term trace, especially using a commercial IMU. In the proposed system, the IMU is used in an indoor environment; compared with large-scale applications such as car or plane tracking, it is a tiny-scale application. Thus, the navigation frame can be treated as a fixed frame; moreover, a low-cost IMU cannot sense some physical effects, such as the Earth's curvature and rotation [41]. UWB signals also suffer from object blockage in NLoS scenarios, while the IMU is not affected by NLoS; moreover, NLoS and LoS always appear alternately. Therefore, the IMU data can be used to correct raw UWB ranging under NLoS scenarios, which can be verified through the IMU trace formulas and distance formulas. The simplified relationship of position calculation between IMU and UWB ranging is: where $\Delta R_{UWB(m)}$ is the UWB ranging increment compared with the last ranging measurement; it relates to the IMU positions $P^n_{IMU(m)}$ and $P^n_{IMU(m-1)}$, where $P^n_{IMU(m-1)}$ is the position of the IMU in the last updating cycle, which is already known. $P^n_{IMU(m)}$ is: in which $\Delta P^n_{IMU(m)}$ is: Due to the limits of the low-cost IMU, $v^n_m$ can be: where the subscript $m$ denotes the m-th update of the corresponding variables, the letters $i$, $n$ and $b$ abbreviate the inertial coordinate frame, navigation frame and body frame, respectively, and $T_s$ is the update interval. $P_{ANC}$, $P_{IMU}$, $\Delta P_{IMU}$, $R_{UWB}$, and $\Delta R_{UWB}$ are, respectively, the constant coordinate of the UWB anchor, the position of the IMU, the position increment of the IMU, the UWB ranging measurement, and the increment of the UWB ranging measurement.
$\Delta v^n_{sf}$ is the specific force increment in the n-frame and $\Delta v$ is the increment of specific force sampled from the accelerometer in a sampling period. $\theta_{(m)1-2}$ denotes the two sampling results in the m-th updating period. Ignoring intermediate variables in Equations (12)-(14), it can be seen that the increment of the UWB ranging measurement relates to the velocity increment $\Delta v$ and the angular increment $\Delta\theta$. Therefore, the m-th UWB ranging can be expressed with the initial location (last position under LoS) and the corresponding increments $\Delta v$ and $\Delta\theta$.
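The geometric part of this relationship can be sketched directly from the symbol definitions: the range is the distance between the anchor and the IMU position, and the range increment is the difference of two consecutive ranges. The trapezoidal position increment and the function names are assumptions for illustration; the attitude and velocity updates from the raw IMU increments are not reproduced here.

```python
import math


def position_increment(v_prev, v_now, ts):
    """ASSUMED trapezoidal integration of navigation-frame velocity over T_s."""
    return tuple(0.5 * (a + b) * ts for a, b in zip(v_prev, v_now))


def range_increment(anchor, prev_pos, delta_pos):
    """Return (delta_R, new_pos) for one update cycle.

    delta_R = ||P_ANC - P_IMU(m)|| - ||P_ANC - P_IMU(m-1)||,
    with P_IMU(m) = P_IMU(m-1) + delta_P_IMU(m).
    """
    new_pos = tuple(p + dp for p, dp in zip(prev_pos, delta_pos))
    delta_r = math.dist(anchor, new_pos) - math.dist(anchor, prev_pos)
    return delta_r, new_pos
```

The sketch makes the dependence explicit: for a fixed anchor, the range increment is fully determined by the last position and the IMU-derived position increment.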
From the formulas listed above, the relationship between the UWB ranging increment and the IMU reading increment is high dimensional, and an analytic solution is hard to obtain. Therefore, in order to realize UWB ranging correction and alleviate the error accumulation effect, the ability of SVR to extract inherent characteristics is applied to estimate the increment of UWB ranging under the NLoS environment. Owing to the highly non-linear characteristics of the radial basis function (RBF), the RBF is chosen as the kernel function of the SVR. Equation (11) reveals that the increment of the UWB range depends on the locations of the anchor and the sensor. However, inspired by Equations (12)-(18), the IMU reading increments (gyroscope and accelerometer) and the last position of the sensor (the UWB mobile node and the IMU are fixed tightly together) are chosen as the input data of the SVR to predict the corresponding UWB ranging increment in every sampling period, because the position of the anchor is a fixed parameter whose location information implicitly exists in the data tuples of the training database, i.e., increment of IMU reading (input), last position of sensor (input), and UWB ranging increment (target). In other words, the position of the UWB anchor and other fixed parameters are reflected in the parameters of the SVR after offline training.
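The data tuples described above, and the way a predicted increment is applied, can be sketched as follows. The feature ordering, the function names, and the `predict_increment` callable are hypothetical; the callable stands in for whatever trained regressor is used and is not an API from the paper.

```python
def make_training_tuple(delta_v, delta_theta, last_pos, delta_range):
    """Build one (input, target) pair for the regressor.

    Input  = IMU increments (accelerometer, gyroscope) + last sensor position.
    Target = UWB ranging increment recorded under LoS.
    """
    features = list(delta_v) + list(delta_theta) + list(last_pos)
    return features, delta_range


def recover_range(last_range, raw_range, is_nlos, features, predict_increment):
    """Use the raw range under LoS; under NLoS add the predicted increment
    to the last trusted range instead of using the blocked measurement."""
    if not is_nlos:
        return raw_range
    return last_range + predict_increment(features)
```

In practice `predict_increment` could wrap the prediction call of a fitted RBF-kernel support vector regressor; that wiring is left out so the sketch stays free of external dependencies.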
If the UWB range is measured under NLoS, as judged by the aforementioned empirical power metric [40], the increment of the IMU reading and the last sensor position are sent to the SVR to predict the UWB ranging increment. As shown in the lower right part of Figure 3, there is a straight route (red line) and a metal baffle (black rectangle) between the red line and Access Point 1 (AP-1). Reciprocating along the red line, the corresponding raw UWB ranging measurements are shown in Figure 4 with a red line, showing that the NLoS condition causes fluctuation of the UWB ranging. The SVR-corrected results are displayed with a green dash-dotted line, which is consistent with the actual situation.
Moreover, we conducted a series of experiments to verify the validity of correcting UWB ranging using SVR and IMU by changing the ratio of NLoS environment (adding baffles). The total moving distance was 7.2 m with 25 ranging points deployed evenly on the red trajectory, as shown in Figure 3. The results of the experiment are listed in Table 1, showing that the SVR and IMU could reduce the mean ranging error by up to 30.40% even under a harsh environment (high NLoS ratio of 80%). Though the Standard Deviation (Std) of correction ranging decreases with the increase in NLoS ratio, the correction scheme can still keep a low mean and Std of error compared with raw UWB ranging.
Figure 3 shows the layout of the experiment environment, which is a 13.18 m × 9.58 m hybrid office room with some instruments and office supplies. All experiment data were processed on a computer with an Intel Core i5-10400F CPU, 16 GB RAM and an NVIDIA GeForce RTX 2060.

PDR Parameters Setting
Four different people (three men and one woman), each carrying an IMU, walked along a fixed rectangular route (34 steps per circle) to determine the value of the step threshold $N_{th}$ using a basic PDR algorithm; the PDR tracking results shown in Figure 5 illustrate that the PDR algorithm had high localization accuracy (error below 0.1 m) within 20 steps. Therefore, the value of the step threshold $N_{th}$ is set to 20 in the proposed system. For simplicity and generality, basic PDR and low-cost IMU sensors are utilized in this system. However, a higher step threshold can be achieved using a more expensive sensor or a better PDR algorithm. The parameters of the IMU utilized in this system are listed in Table 2. In the fingerprint collection stage, people walked along given routes and remained stationary at each step for around 10 s while the fingerprint data was collected. We collected CSI at 40 locations (training set size 100 × 40), and five received packets were randomly picked to test performance in the changed environment. The values of the experiment parameters are listed in Table 3.

CNN Parameters Setting
The input size of the CNN relates to the number of APs and the length of the sampling sequences. In this paper, only one AP was used to collect CSI fingerprints in the system. In order to select an appropriate sampling number and keep correlation in the sampling sequence (sequential signals are received at a fixed point while the pedestrian is walking), the signal transmission interval and the ground contact time (interval between heel-strike and heel-off) should be considered. As depicted in Figure 6, the tester's ground contact time while walking (walking speed was about 1.8 m/s, which can satisfy most motion modes in an indoor scenario) spanned at least 50 sampling periods (500 ms) when the sampling period of the IMU was 10 ms (100 Hz).

The CSI is described by a function, i.e., $H = |H| e^{j\angle H}$, where $|H|$ and $\angle H$ are the corresponding amplitude and phase, respectively. In this paper, the average interval between adjacent received CSI packets was about 100 ms, which means that at least 5 packets could be received during each foot contact with the ground. Moreover, each data packet had 6 channels (2 transmitting antennas, 3 receiving antennas) and each channel had 30 subcarriers. Therefore, a 30 × 30 matrix of amplitudes (5 packets × 6 channels × 30 subcarriers) could be formed into a location-related CSI image as shown in Figure 7, reflecting a weak signal in the third receiving antenna, which acted as an auxiliary antenna.
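The construction of the 30 × 30 CSI amplitude image from 5 packets × 6 channels × 30 subcarriers can be sketched as follows. The row ordering (packet-major, then channel) and the function names are assumptions; the text only fixes the overall shape.

```python
def packets_per_contact(contact_ms, packet_interval_ms):
    """Whole CSI packets received during one ground contact."""
    return contact_ms // packet_interval_ms


def csi_image(packets, n_channels=6, n_subcarriers=30):
    """Stack CSI amplitudes into rows: one row per (packet, channel) pair.

    packets: list of packets; each packet is a list of n_channels channels;
    each channel is a list of n_subcarriers complex CSI values.
    ASSUMED ordering: all channels of packet 0, then packet 1, and so on.
    """
    rows = []
    for packet in packets:
        if len(packet) != n_channels:
            raise ValueError("unexpected number of channels in packet")
        for channel in packet:
            if len(channel) != n_subcarriers:
                raise ValueError("unexpected number of subcarriers in channel")
            rows.append([abs(h) for h in channel])  # amplitude |H|
    return rows
```

With a 500 ms ground contact and a 100 ms packet interval, `packets_per_contact(500, 100)` gives the 5 packets that, with 6 channels each, fill the 30 rows of the image.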

Test Evaluation
In this part, all metal doors (Doors 1-3 shown in Figure 3) were kept in an open state when fingerprint data was collected. In the initial environment (denoted as En1), a tester randomly walked 280 steps in the experiment area (covering nearly all the area) to collect training samples. In order to improve the stability of the CNN system, the collected 280 samples were expanded to 4000 samples by adding White Gaussian Noise (WGN) which could improve robustness of CNN feature extraction [42]. After that, we changed the environment by adding an NLoS condition, i.e., object blocking and pedestrian stochastic walking (En2) and further closing all doors (denoted as En3). Then, another tester walked randomly in the three environments to evaluate the localization system.
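The expansion of 280 collected samples to 4000 training samples by adding White Gaussian Noise can be sketched as below. The noise standard deviation, the choice to keep the original samples and draw noisy copies at random, and the function name are assumptions for illustration; the text does not state how the copies were drawn.

```python
import random


def augment_with_wgn(samples, labels, target_count, sigma, seed=0):
    """Grow a dataset to target_count by adding zero-mean Gaussian noise.

    samples: list of 2-D lists (e.g. CSI amplitude images).
    labels:  list of (x, y) positions, one per sample.
    sigma:   noise standard deviation (hypothetical value chosen by the user).
    """
    rng = random.Random(seed)
    out_x, out_y = list(samples), list(labels)   # keep the originals
    while len(out_x) < target_count:
        i = rng.randrange(len(samples))          # pick a source sample
        noisy = [[v + rng.gauss(0.0, sigma) for v in row] for row in samples[i]]
        out_x.append(noisy)
        out_y.append(labels[i])                  # noisy copy keeps its label
    return out_x, out_y
```

Each noisy copy inherits the position label of its source sample, so the augmentation widens the input distribution without adding new reference positions.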

Basic Performance Test
In the fingerprint collection stage, people walked along given routes and remained stationary at each step for around 10 s while the fingerprint data was collected. We collected CSI at 40 locations, giving training data of size 100 × 40, and five received packets were randomly picked to test performance in the changed environment. The values of the experiment parameters are listed in Table 3.

Table 3. Values of experiment parameters.

Parameter | Value/Setting
Number of samples in CNN, N_sf | 4000
Number of samples in SVR, N_su | 4000
Step threshold, N_th | 20
Kernel function of SVR | Radial Basis Function

Test Evaluation
In this part, all metal doors (Doors 1-3 shown in Figure 3) were kept open while the fingerprint data was collected. In the initial environment (denoted as En1), a tester randomly walked 280 steps in the experiment area (covering nearly all of it) to collect training samples. To improve the stability of the CNN, the 280 collected samples were expanded to 4000 samples by adding White Gaussian Noise (WGN), which can improve the robustness of CNN feature extraction [42]. After that, we changed the environment by adding NLoS conditions, i.e., blocking objects and randomly walking pedestrians (denoted as En2), and then by additionally closing all doors (denoted as En3). Then, another tester walked randomly in the three environments to evaluate the localization system.
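The expansion from 280 to 4000 samples can be sketched as below. The text does not give the noise level or the exact copying scheme, so the round-robin copying, the parameter `sigma`, and the placeholder images are assumptions; tiny 3 × 3 images are used only to keep the demonstration light.

```python
import random


def augment_with_wgn(samples, target_count, sigma, seed=0):
    """Expand (image, position) samples by adding white Gaussian noise.

    The originals are kept; noisy copies are generated round-robin until
    target_count is reached. Position labels are left unchanged.
    sigma (noise standard deviation) is a hypothetical tuning parameter.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    i = 0
    while len(augmented) < target_count:
        image, position = samples[i % len(samples)]
        noisy = [[px + rng.gauss(0.0, sigma) for px in row] for row in image]
        augmented.append((noisy, position))
        i += 1
    return augmented


# 280 placeholder samples with made-up positions (not experimental data).
base = [([[128.0] * 3 for _ in range(3)], (0.1 * n, 0.0)) for n in range(280)]
train = augment_with_wgn(base, target_count=4000, sigma=2.0)
print(len(train))
```

Because each noisy copy keeps the label of its source sample, the network sees several slightly different images for the same position, which is the intended regularizing effect.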

Basic Performance Test
The Euclidean distance between the predicted location and the true location was used as the localization error (m). The cumulative distribution function (CDF) of the localization errors was computed and is given in Figure 8 and Table 4, in which the labels 'With', 'Without', and 'With raw' respectively denote the CNN localization result calibrated with corrected UWB ranging, without UWB ranging, and with raw UWB ranging. The results show that corrected UWB ranging considerably improves the CNN-predicted results compared with the 'Without' and 'With raw' schemes. Moreover, the mean errors and standard deviations (Std) are given in Figure 9 and Table 5, which indicate that the 'With' scheme is more accurate and more stable than the other schemes in terms of mean error and corresponding Std. The results also reveal that the accuracy of UWB ranging is the key to the performance improvement. More specifically, raw UWB ranging without correction may degrade performance compared with the scheme without UWB assistance ('Without'), whereas high-accuracy UWB measurements improve the localization performance.

The quantitative comparisons between the 'With' and 'Without' schemes in Table 5 show that the mean error reduction of the 'With' scheme relative to the 'Without' scheme stabilizes at around 65%, and the corresponding Std reduction is more than 40% in the three different scenarios.
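The metrics used in this comparison can be sketched as follows. The positions below are made up purely to exercise the functions and are not data from the experiments; the use of the population standard deviation is also an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def localization_errors(predicted, truth):
    """Euclidean distance (m) between each predicted and true position."""
    return [math.dist(p, t) for p, t in zip(predicted, truth)]


def empirical_cdf(errors, threshold):
    """Fraction of errors that are <= threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def reduction_percent(baseline, improved):
    """Relative reduction of a metric with respect to a baseline, in %."""
    return 100.0 * (baseline - improved) / baseline


# Made-up positions for illustration only.
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
without = [(0.3, 0.4), (1.3, 1.4), (2.3, 0.4), (3.3, 1.4)]
with_uwb = [(0.0, 0.2), (1.0, 1.2), (2.3, 0.0), (3.0, 1.3)]

err_without = localization_errors(without, truth)
err_with = localization_errors(with_uwb, truth)
mean_without = statistics.mean(err_without)
mean_with = statistics.mean(err_with)
std_with = statistics.pstdev(err_with)  # population Std (assumption)

print(mean_with, std_with, empirical_cdf(err_with, 0.25))
print(reduction_percent(mean_without, mean_with))
```

The same `reduction_percent` helper applies to both the mean error and the Std when comparing two schemes.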

Daily Localization Performance Test
To evaluate the performance of the localization system over the course of a day, location-related fingerprint data for 2000 steps were collected from three testers (walking along preset test points) every 2 h from 8:00 to 20:00. During data collection, other students were also working or walking in the testing environment. The mean and Std of the localization error are shown in Figure 10.
As depicted in Figure 10, the mean localization error of the proposed scheme ('With') is lower than that of the direct CNN prediction ('Without') in every time interval, which verifies that the proposed system reduces the localization error effectively. Moreover, the positioning error of the proposed scheme is lower than 0.25 m, whereas the error of the direct CNN prediction ('Without') is larger than 0.35 m in all test intervals. Due to human body interference and small changes in the environment, the localization error varies with time: the localization system has its maximum positioning error at 16:00 (when people are most densely concentrated) and its minimum positioning error at 20:00 (when there is minimal human interference).

Long-Term Localization Performance Test
To study the long-term localization performance of the proposed system, the fingerprint information was collected over 10 consecutive days at 12:00 and 16:00 every day. The test data comprised 240 locations (240 steps each, for three people, along fixed points), and the means and Std of the localization errors are displayed in Figure 11. From Figure 11, we can conclude that the proposed UWB ranging calibrated system ('With') has high long-term positioning accuracy (mean error lower than 0.32 m) and stable performance (Std within 0.25 m). As shown in Figure 11b, because UWB ranging is occasionally predicted wrongly and the system is sensitive to wrong UWB ranging (refer to the 'With raw' scheme shown in Figure 9), the error Std of the proposed system is larger than that of the direct CNN prediction on day 2 and day 5. However, the proposed system can recalibrate the positioning results and keep them stable using subsequent high-accuracy UWB ranging measures, which satisfies the requirements of practical applications.

Noise Injection Test
To evaluate the robustness of the proposed system, we injected Gaussian noise N(0, σ²) into the fingerprint gray image, where σ is the standard deviation of the noise in gray levels, ranging from 0 to 52.
The experimental results depicted in Figure 12 show that the localization system keeps most (75%) of the localization errors under 0.5 m when σ ≤ 12 and under 1 m when σ ≤ 26. Importantly, the overall localization accuracy is considerably stable (the largest outliers are no more than 0.7 m) when σ ≤ 8. This experiment shows that the proposed system has a certain degree of robustness to noise.
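A noise sweep of this kind can be sketched as below. The trained CNN is not reproduced here: `toy_predict` is a hypothetical stand-in that maps the mean gray level to a point on a line, the clipping to the 0..255 gray range is an assumption, and the sample images are placeholders, so the printed errors have no relation to Figure 12.

```python
import math
import random
import statistics


def inject_noise(image, sigma, rng):
    """Add N(0, sigma^2) noise to each pixel, then clip to 0..255."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def noise_sweep(samples, predict, sigmas, seed=0):
    """Mean localization error (m) of `predict` for each noise level."""
    rng = random.Random(seed)
    results = {}
    for sigma in sigmas:
        errors = [math.dist(predict(inject_noise(img, sigma, rng)), pos)
                  for img, pos in samples]
        results[sigma] = statistics.mean(errors)
    return results


def toy_predict(image):
    """Hypothetical stand-in for the CNN: mean gray level -> x coordinate."""
    flat = [px for row in image for px in row]
    return (statistics.mean(flat) / 25.5, 0.0)


# Placeholder 4 x 4 images whose gray level encodes the true x position.
samples = [([[25.5 * x] * 4 for _ in range(4)], (float(x), 0.0))
           for x in range(1, 9)]
sweep = noise_sweep(samples, toy_predict, sigmas=[0, 8, 12, 26, 52])
for sigma, err in sweep.items():
    print(sigma, round(err, 3))
```

Replacing `toy_predict` with the real model's inference function would give the error-versus-σ curve for that model.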

Impact of Parameters on the Positioning Stage
In the initial stage, the CNN follows a standard training process, which is stable and predictable. In the positioning stage, by contrast, the CNN only undergoes minor adjustments and is sensitive to basic parameters. In this part, the important parameters of CNN training in the positioning stage are discussed, including the batch size and the number of test points. Localization errors were calculated under different batch sizes while keeping the other parameters fixed. The experimental results are shown in Figure 13.

Figure 13 shows that the localization error increases with batch size and that the stability of positioning decreases with batch size (Std increases with batch size). Since the CNN has been fully trained in the initial stage, its mapping parameters represent the characteristics of the initial environment. Although the environment has changed in the positioning stage, the mapping parameters of the CNN only need minor adjustment. However, the larger the batch size, the more the mapping parameters are modified, which could increase the error at other positioning points. Therefore, the batch size should be set to 1.
To evaluate the impact of the number of samples on the positioning error, we calculated the localization error using different numbers of test samples, as shown in Figure 14. The localization error was insensitive to the number of test samples, i.e., the positioning accuracy remained relatively stable (within 0.4 m) as the number of samples increased. Moreover, as shown in Table 6, the update time (i.e., the time needed to give calibrated results) increased with the number of samples. Therefore, the proposed system can realize online learning/calibration (in the positioning stage) with high positioning accuracy, and also has the potential to locate multiple people, in which case the sample size equals the number of people.
The proposed localization system is compared with the latest related research listed in Table 7. These recent results contribute considerably to indoor positioning systems. Based on RSS or CSI information, most of these works adopt a classification strategy to locate the target; multiple information fusion and trilateration are also adopted as positioning strategies. Unlike the cited works, our system uses CNN regression to provide continuous target coordinates.
Though these works achieve good localization results under their respective positioning conditions, the long-term positioning performance has not been tested or discussed except in [43]; the practical application of a positioning system mostly depends on its long-term localization ability, which is key to cost-saving. Due to more fine-grained location-related information (CSI) and UWB ranging calibration, our proposed system keeps the long-term localization error within 0.4 m, compared with meter-level accuracy in [44].
As for the number of APs, these positioning systems rely on multiple positioning base stations, in some cases hundreds, e.g., [43]. In contrast, [45] achieves a low localization error (0.2 m) using hybrid CSI and RSSI information provided by only one AP, but as it does not consider the long-term behavior of the localization system, its positioning performance will deteriorate over time (as the environment changes). Reference [44] covers the largest positioning area with fewer APs but lower accuracy, and is suitable for indoor positioning scenarios where meter-level accuracy is sufficient.
Compared with related works, our system is not superior in every respect, but it balances cost and positioning accuracy: only 2 base stations (one Wi-Fi AP and one UWB anchor) are used to provide location-related information and calibration signals, and based on this information, the CNN regression model realizes continuous positioning with high localization accuracy over the long term. Moreover, the proposed system was tested in a large-scale room, which is representative of most scenarios.

Discussion
Although the proposed system can realize stable and robust localization, there are still some problems to discuss.

1. The proposal can save the cost of labor and system deployment. However, due to the high drift and noise interference of a commercial IMU, the fingerprint collector must return to a fixed point with known coordinates (the entrance or some other given point) when the number of steps reaches the precision threshold of the PDR algorithm. As a result, the area of fingerprint collection is limited to a circle centered on the fixed point, with the straight-line PDR distance (within the step threshold) as its radius. This limit can be overcome by adopting a more expensive IMU, rather than adding an anchor, to collect location-related fingerprints in a large area, and the fingerprint collection equipment can still be reused to collect fingerprints in other places of interest.

2. In this paper, the UWB ranging measure is the key to following environmental change and calibrating localization results; thus, the whole system is also vulnerable to an NLoS environment. In our design, machine learning is utilized to recover UWB ranging measures under an NLoS environment. Although the machine learning method can give a reliable result, it has to be retrained when the positioning environment or the position of the UWB anchor changes. There are two alternative solutions to avoid NLoS interference: discarding UWB ranging under NLoS, or constructing a UWB transmission channel model. With the discarding method, the UWB calibration function cannot work in NLoS, which limits the system's practical application and reduces its stability (the system frequently switches between the calibrated and uncalibrated states). As for the latter solution, there are UWB signal transmission models for different blocking objects, and these models can recover the UWB signal well. However, in practical applications, it is hard to design a valid transmission model suitable for various or multiple blocking interferences. Additionally, the threshold for NLoS judgment needs to be determined for each environment rather than taken from an empirical value.

3. Unlike most works, the CNN regression model is used to predict location based on a gray image of the CSI amplitude fingerprint. The essence of CNN prediction is the mapping function between position and CSI (similar to a signal transmission model), which is sensitive to environmental change. For this reason, the UWB ranging measure is utilized to dynamically adjust the CNN predictions and weight parameters in this paper. Although the CNN is vulnerable to environmental change, it has its own advantage, i.e., outputting a continuous location, which is an inherent benefit for realizing high-accuracy localization compared with classification methods.

4. In addition to the parameters discussed and tested, a large number of factors affect positioning results in practical applications, e.g., the size of the fingerprint image and the length of the Wi-Fi sequences. Considering these dynamic factors, our future work will concentrate on a more comprehensive but efficient localization system.
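The collection-area limit described in item 1 can be made concrete with a small dead-reckoning sketch. The fixed step length of 0.7 m is an assumed value that does not come from the paper, and treating the step threshold N_th = 20 from Table 3 as the PDR precision threshold is likewise an assumption; a real PDR pipeline would also estimate step length and heading from the IMU rather than take them as given.

```python
import math


def pdr_track(start, headings_rad, step_length):
    """Dead-reckon positions from per-step headings and a fixed step length."""
    x, y = start
    track = [(x, y)]
    for theta in headings_rad:
        x += step_length * math.cos(theta)
        y += step_length * math.sin(theta)
        track.append((x, y))
    return track


STEP_THRESHOLD = 20   # N_th from Table 3 (assumed to be the PDR limit)
STEP_LENGTH_M = 0.7   # assumed average step length, not from the paper

# Walking straight for N_th steps reaches the edge of the collection circle.
straight = pdr_track((0.0, 0.0), [0.0] * STEP_THRESHOLD, STEP_LENGTH_M)
radius = math.dist(straight[0], straight[-1])
print(round(radius, 2))
```

Under these assumptions the collector can move at most about 14 m from the known point before having to return, which is why a lower-drift IMU (a larger usable step threshold) enlarges the collection area.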

Conclusions
In this article, a UWB ranging calibrated localization system based on a CNN regression model has been developed to realize high-accuracy indoor positioning. Specifically, the proposed system can track dynamic environment characteristics using the UWB ranging measure, which mitigates the effects of environmental changes on localization results. Moreover, the PDR algorithm is employed to save the cost of fingerprint collection and anchor deployment in the off-line stage. A series of experiments have been carried out to verify the advantages of this system: these experiments show that the system has strong robustness and adaptability; furthermore, it has excellent short-term and long-term positioning ability with high localization accuracy (mean error lower than 0.35 m) and stability (Std lower than 0.25 m). Finally, the noise injection test reveals that the gray images of fingerprints have a certain degree of robustness to noise. All the experiments confirm that the proposed system is effective for indoor positioning.
There are still several directions in which this work can be improved, including optimizing the CNN for positioning, establishing a high-efficiency fingerprint, and designing a more intelligent localization structure. Our future work will focus on these research points.