Pedestrian Dead Reckoning-Assisted Visual Inertial Odometry Integrity Monitoring

Visual inertial odometers (VIOs) have received increasing attention in the area of indoor positioning due to the universality and convenience of the camera. However, the visual observation of VIO is susceptible to the environment, and observation errors affect the final positioning accuracy. To address this issue, we analyzed the causes of visual observation errors that occur under different scenarios and their impact on positioning accuracy. We propose a new method that uses the short-time reliability of pedestrian dead reckoning (PDR) to aid visual integrity monitoring and to reduce positioning error. The proposed method selects the optimized position by automatically switching between the outputs of VIO and PDR. Experiments were carried out to test and evaluate the proposed PDR-assisted visual integrity monitoring. The experimental sensor suite consisted of a stereo camera and an inertial measurement unit (IMU). Results were analyzed in detail and indicated that the proposed system performs better for indoor positioning within an environment that contains low illumination, little background texture information, or few moving objects.


Introduction
In the modern world, people are becoming increasingly dependent on location services, and the need for accurate indoor location services is becoming more and more urgent [1][2][3][4][5]. Indoor positioning technologies based on various types of sensors, such as Wi-Fi [6], Bluetooth [7], cameras [8], and inertial sensors [9], are developing rapidly. Since camera sensors provide rich visual information about the environment and cameras are widely and easily available, vision-based indoor positioning technology [10,11] is receiving increasing attention. According to their working modes, cameras can be divided into three categories: monocular, stereo, and RGB-D. The monocular camera has only one camera, with the advantages of simple structure and low cost but the disadvantage of scale uncertainty. The stereo camera and the RGB-D camera are intended to overcome the shortcoming of the monocular mode, namely its inability to determine distance. The stereo camera consists of two cameras. The distance between the two cameras can be used to estimate the spatial position of each pixel, a process that is very similar to human vision. The RGB-D camera obtains depth information by physical measurement, which saves considerable computation compared to the stereo camera. However, the RGB-D camera has many problems, such as large noise, a small field of view, susceptibility to sunlight, and the inability to measure transmissive materials. Thus, we chose the stereo camera as the sensor for our vision-based indoor positioning system.
Vision-based indoor positioning technology can be divided into visual odometry (VO) [5] and visual-inertial odometry (VIO) [11]. VO estimates the motion of a camera based on the movement of features in the captured images. The architecture of VIO includes two main components: the front-end and the back-end.
performance index which only measures the goodness of geometrical distributions of feature points. The method does not consider the case in which feature points are sparse and moving.
In this paper, we analyzed the error sources and divided them into four error situations in which the vision-based positioning system had a large positioning error under special environments. We proposed autonomous integrity monitoring of visual observations based on pedestrian dead reckoning. Owing to the short-term reliability of PDR [34], PDR can output positioning results when the visual observation is unreliable. Results from experiments show that our positioning system is more robust in indoor environments with few textures, dynamic obstacles, or low lighting. The main contributions of this research are summarized below:
1. We analyzed the error sources and divided them into four error situations in which the vision-based positioning system had a large positioning error under special indoor environments with few textures, dynamic obstacles, or low lighting.
2. We proposed autonomous integrity monitoring of visual observations based on pedestrian dead reckoning. Owing to the short-term reliability of PDR, the proposed PDR-assisted visual integrity monitoring system switches states between VIO (or VO) and PDR automatically to provide more accurate positions in an indoor environment.

Background
Visual inertial odometers generally consist of two parts, namely, the front-end and the back-end. The front-end mainly processes the sensor observations, including feature extraction, feature tracking, feature screening, IMU pre-integration, and the integration of images with IMU data. The back-end estimates the position from the abstracted data produced by the front-end by minimizing the observation residual through a filter or an optimization scheme. The structural block diagram of the system is shown in Figure 1.
The goal of the back-end is to estimate the 3D pose of the camera frame {C} with respect to a global frame of reference {G}. Since a stereo camera consists of two cameras, the camera frames are represented by $\{C_k, k = 1 \text{ or } 2\}$. To make it clearer and simpler to analyze the impact of visual observations on pose estimation, we defined the state vector and the observation of the positioning system. The state vector $X_k$ at time-step $k$ in the visual-inertial odometer can be defined as Equation (1), including the evolving state $X_{IMU_k}$ of the IMU and the camera pose (attitude ${}^{C_k}_{G}q$ and position ${}^{G}p_{C_k}$):

$$X_k = \begin{bmatrix} X_{IMU_k}^{T} & {}^{C_k}_{G}q^{T} & {}^{G}p_{C_k}^{T} \end{bmatrix}^{T} \tag{1}$$

$$X_{IMU} = \begin{bmatrix} {}^{IMU}_{G}q^{T} & {}^{G}v^{T} & {}^{G}p^{T} & b_g^{T} & b_a^{T} \end{bmatrix}^{T}$$

where ${}^{IMU}_{G}q^{T}$ is the rotation from frame {G} to frame {IMU}, ${}^{G}v$ and ${}^{G}p$ are the IMU velocity and position with respect to {G}, and $b_g$ and $b_a$ are the biases of the gyroscope and accelerometer measurements.
The k-th measurement of the camera contains a series of feature points that are observed by the k-th camera pose. The measurement model of a feature point $z_i$ is expressed by the following equation:

$$z_i = \frac{1}{{}^{C_j}Z_i} \begin{bmatrix} {}^{C_j}X_i \\ {}^{C_j}Y_i \end{bmatrix} + n_i, \quad i \in f, \; j \in \{1, 2\}$$

where $f$ represents the collection of all feature points, $C_1$ and $C_2$ represent the left and right cameras, respectively, and $n_i$ is the 2 × 1 image noise vector. The feature position expressed in the camera frame, ${}^{C}P_i = [{}^{C}X_i, {}^{C}Y_i, {}^{C}Z_i]^{T}$, is given by:

$${}^{C}P_i = C({}^{C}_{G}q) \, ({}^{G}P_i - {}^{G}P_C)$$

where ${}^{G}P_i$ is the 3D feature position in the global frame, ${}^{G}P_C$ is the camera position in the global frame, and $C({}^{C}_{G}q)$ is the rotation matrix between the camera frame and the global frame. Once the estimate of the feature position is obtained, we can compute the measurement residual:

$$r_i = z_i - \hat{z}_i \approx H_c \, \delta X + H_i \, \delta p_i + n_i$$

where $H_c$ and $H_i$ are the Jacobians of the measurement $z_i$ with respect to the state and the position estimate of the feature. With all the sets of measurement equations formed by the feature points, we can obtain the optimal position estimate by minimizing the error.
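The chain from global feature position to measurement residual can be illustrated with a minimal sketch. It assumes a normalized pinhole projection and takes the rotation as a plain 3 × 3 matrix; the function names (`feature_in_camera`, `project`, `residual`) are hypothetical and are not taken from the paper's implementation.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def feature_in_camera(R_cg, p_feature_g, p_camera_g):
    """Express a global-frame feature in the camera frame:
    C_P = R_cg * (G_P_feature - G_P_camera)."""
    diff = [f - c for f, c in zip(p_feature_g, p_camera_g)]
    return mat_vec(R_cg, diff)


def project(p_c):
    """Normalized pinhole projection (an assumed model): (X/Z, Y/Z)."""
    x, y, z = p_c
    if z <= 0:
        raise ValueError("feature is behind the camera")
    return (x / z, y / z)


def residual(z_measured, R_cg, p_feature_g, p_camera_g):
    """Measurement residual r = z - z_hat for one feature."""
    z_hat = project(feature_in_camera(R_cg, p_feature_g, p_camera_g))
    return (z_measured[0] - z_hat[0], z_measured[1] - z_hat[1])


# Camera at the origin, aligned with the global frame; feature 4 m ahead.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r = residual((0.26, 0.0), IDENTITY, [1.0, 0.0, 4.0], [0.0, 0.0, 0.0])
print(r)
```

In a filter or optimizer, residuals of this kind from all tracked features are stacked and minimized with respect to the state.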

Visual Error Analysis
Large errors have been observed in the positioning results under special environments with few textures, low lighting, or dynamic obstacles [35]. In this research, we closely investigated the error sources occurring in four scenarios: an indoor environment with few textures resulting in insufficient features; an indoor environment with dim lighting causing the failure of feature tracking; an indoor environment with uneven textures resulting in an uneven distribution of features; and an indoor environment with dynamic obstacles producing moving features.

Insufficient Features
Commonly used feature extraction algorithms include the Scale-Invariant Feature Transform (SIFT) [11], Speeded-Up Robust Features (SURF) [12], Features from Accelerated Segment Test (FAST) [13], and Oriented FAST and Rotated BRIEF (ORB) [14] algorithms. These algorithms are often used in VIO systems. In an image, a point with a strong contrast to its surrounding pixels is defined as a feature point. The contrast of point P can be expressed as:

$$V(x, y) = \left[ I(x + \Delta x, y + \Delta y) - I(x, y) \right]^{2}, \quad V(x, y) \sim I_x^{2} \cdot \Delta x^{2} + I_y^{2} \cdot \Delta y^{2} + 2 I_x I_y \cdot \Delta x \Delta y \tag{6}$$

where $x$ and $y$ represent the pixel coordinates of P, and $I(x, y)$ and $V(x, y)$ represent the gray value and contrast of the point, respectively. The value of $V$ mainly depends on the gradients of the point P in the x and y directions ($I_x$ and $I_y$). The larger the gradient values, the more easily the point is detected by the detector. It is difficult to obtain sufficient feature points from scenes with few textures (e.g., white walls) or dim lighting, both of which are common in indoor environments. Position estimation can be performed when there are at least eight feature point pairs [36].
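As an illustration of why a textureless wall yields no features, the sketch below evaluates the first-order contrast approximation on two tiny gray-value patches. The central-difference gradient and the patch values are assumptions made for the example; real detectors such as FAST or ORB use their own tests.

```python
def gradients(img, x, y):
    """Central-difference gradients I_x, I_y at pixel (x, y); img[y][x]."""
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def contrast(img, x, y, dx=1.0, dy=1.0):
    """First-order approximation of the contrast for a shift (dx, dy):
    V ~ Ix^2 dx^2 + Iy^2 dy^2 + 2 Ix Iy dx dy."""
    ix, iy = gradients(img, x, y)
    return ix * ix * dx * dx + iy * iy * dy * dy + 2.0 * ix * iy * dx * dy


# A uniform patch (white wall) versus a patch containing a corner.
white_wall = [[200, 200, 200], [200, 200, 200], [200, 200, 200]]
corner = [[10, 10, 10], [10, 10, 200], [10, 200, 200]]

print(contrast(white_wall, 1, 1))  # zero gradients, so zero contrast
print(contrast(corner, 1, 1))      # strong gradients in both directions
```

A detector that thresholds on such a score returns nothing for the uniform patch, which is the situation described for white walls.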
When the number of feature points is sufficient, rank(H) ≥ 8, the constraint Equation (7) is sufficient to obtain the optimal solution. When the number of feature points is insufficient, the constraint condition is insufficient, and the estimated value error (δX and δp i ) becomes larger. This leads to an increase in the positioning error:

Lighting Causes the Failure of Feature Tracking
Illumination changes often occur in indoor environments, and we used the Lambertian model as the lighting model,
where $I(x, y)$ is the image gray value, $\rho(x, y)$ is the object reflectivity, $h(x, y)$ is the surface normal vector, and $S$ is the lighting intensity. We found that feature tracking is easily lost during lighting changes, which leads to inaccurate positioning. The optical flow method is based on the assumption that the gray level is unchanged. When the lighting formula is substituted into this assumption, $I_x$ and $I_y$ are the gradient values of the feature points in the x and y directions, respectively, and $\mu$ and $v$ are the velocities of the feature points in the x and y directions. As shown in Equation (11), the residual $\delta r_i$ of the features becomes larger as the light intensity changes and $\partial S / \partial t$ becomes larger.
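The effect of a lighting change on gray-level-constancy tracking can be shown with a small one-dimensional sketch. It assumes a global brightness gain between two frames, which is a simplification of the Lambertian model, and the helper names are hypothetical.

```python
def shift_and_scale(frame, shift, gain):
    """Move the pattern right by `shift` pixels and scale its brightness."""
    return [0.0] * shift + [gain * v for v in frame[:len(frame) - shift]]


def tracking_residuals(frame0, frame1, shift):
    """Residuals I1(x + shift) - I0(x) over the pixels where both exist."""
    return [frame1[x + shift] - frame0[x] for x in range(len(frame0) - shift)]


frame0 = [10.0, 20.0, 40.0, 80.0, 40.0, 20.0]
same_light = shift_and_scale(frame0, 1, 1.0)  # pure motion
dimmed = shift_and_scale(frame0, 1, 0.5)      # motion plus a lighting drop

# Evaluated at the true displacement of one pixel:
print(tracking_residuals(frame0, same_light, 1))
print(tracking_residuals(frame0, dimmed, 1))
```

Even at the correct displacement, the dimmed frame leaves a residual proportional to the gray value, so a tracker that minimizes this residual can drift or lose the feature.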

Uneven Distribution of Features
It can be seen from the observation equation of the image that noise causes positional errors of the feature points in the image. These position errors affect the state estimation of the camera when the re-projection error is calculated. To better illustrate the geometric relationship between image feature points and camera poses, we used a simple two-dimensional example. As shown in Figure 2, P1 and P2 represent two image feature points. Without noise, the camera position can be determined by the intersection of two circles centered on the two feature points, with the two projection distances as radii. However, the measurement is not ideal, and the uncertainty of the noise is ±ε. We describe the quality of the position estimate based on the Jacobian matrix $H_i$ of the feature points with respect to the camera state. Assuming that the measurement error is zero-mean, the positioning error is also zero-mean. Then we can obtain the expected value $E(\Delta X)$ and covariance $\mathrm{Cov}[\Delta X]$ of the error in the position calculation.
The amounts of change in the position error in the x, y, and z directions are taken from the diagonal elements of the corresponding matrix. Then, it can be expressed as:
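The influence of feature geometry on the error covariance can be sketched for the two-dimensional case. The example assumes standard least-squares error propagation with independent measurement noise of variance σ², giving a covariance of σ²(HᵀH)⁻¹, and range-like measurements whose Jacobian rows are unit vectors toward each feature. These are illustrative assumptions and not necessarily the exact expressions used in the paper.

```python
import math


def position_covariance(bearings_deg, sigma):
    """Covariance sigma^2 * (H^T H)^-1 of a 2D least-squares position fix.

    Each row of H is the unit vector from the camera to one feature
    (range-like measurements, as in the two-circle example).
    """
    rows = [(math.cos(math.radians(b)), math.sin(math.radians(b)))
            for b in bearings_deg]
    a = sum(hx * hx for hx, _ in rows)   # (H^T H)[0][0]
    b = sum(hx * hy for hx, hy in rows)  # (H^T H)[0][1] == [1][0]
    d = sum(hy * hy for _, hy in rows)   # (H^T H)[1][1]
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: H^T H is singular")
    s2 = sigma * sigma
    return [[s2 * d / det, -s2 * b / det],
            [-s2 * b / det, s2 * a / det]]


even = position_covariance([0.0, 90.0], sigma=1.0)
clustered = position_covariance([0.0, 20.0], sigma=1.0)
print(even[0][0] + even[1][1])            # trace for well-spread features
print(clustered[0][0] + clustered[1][1])  # larger trace for clustered ones
```

With the same measurement noise, features clustered in one direction give a much larger covariance trace than well-spread features, which is the geometric effect described for unevenly distributed features.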

Moving Features
Moving objects, such as pedestrians or vehicles, affect the positioning result. When the feature points are concentrated on a moving object, the relative movement of the feature points results in a larger calculated camera movement. This situation can be expressed in the world coordinates of the feature points as an additional motion shift, $\Delta {}^{G}P_{f_j}$, which affects the camera's observation as shown in Equation (14). Let us analyze the residuals $r_i$ generated by the offset $(\Delta x, \Delta y, \Delta z)$ of the feature.
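A deliberately simplified sketch of this bias is given below. It assumes that the camera translation along one axis is estimated as the negative mean of the apparent feature shifts under a static-scene assumption, which is a toy model rather than the paper's estimator; the numbers are invented for illustration.

```python
def estimate_camera_shift(feature_shifts):
    """Camera translation estimated as minus the mean apparent feature
    shift, which is only valid if every feature is static."""
    return -sum(feature_shifts) / len(feature_shifts)


# The camera is stationary, so static features show no apparent shift.
static_features = [0.0, 0.0, 0.0, 0.0]
# Six features sit on a pedestrian who moves 0.2 m to the right.
pedestrian_features = [0.2] * 6

print(estimate_camera_shift(static_features))
print(estimate_camera_shift(static_features + pedestrian_features))
```

When more than half of the features lie on the pedestrian, the estimate is pulled in the direction opposite to the pedestrian's motion even though the camera has not moved.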

PDR-Assisted Visual Integrity Monitoring
Although PDR has a problem of cumulative error, the error over a short time is very small. The rotation matrix of the IMU relative to the world coordinate system can be constructed from the three-axis gyroscope measurements. After the three-axis acceleration is rotated, the relative position information $(x, y, z)$ can be obtained by performing the integral operation as shown in Equation (16). Now assume that there are two sampling points $O_1$ and $O_2$, the sampling interval is $\Delta t$, the velocity at time $O_1$ is $\nu$, the displacement is $s$, and the state covariance matrix is $P_1 = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$.
Accelerometer observations can deviate because of the shocks generated during motion. We first analyzed one-dimensional motion, in which the measured acceleration at time $O_1$ is $f_{mea} = f_{true} + \delta f$. An estimated value of the state quantity at $O_2$ can be obtained from the state quantity at $O_1$. The deviation caused by $\delta f$ is: where $R$ is the covariance matrix of the observed noise.
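The short-term behavior of this deviation can be sketched with constant-acceleration kinematics over one interval. The propagation below is the textbook one-dimensional model and is offered as an assumption; it is not necessarily the exact form of the paper's equation, and the numerical values are invented.

```python
def propagate(s, v, f_mea, dt):
    """Propagate displacement and velocity over dt with acceleration f_mea."""
    s_next = s + v * dt + 0.5 * f_mea * dt * dt
    v_next = v + f_mea * dt
    return s_next, v_next


def deviation(delta_f, dt):
    """Displacement and velocity deviation caused by an acceleration error."""
    return 0.5 * delta_f * dt * dt, delta_f * dt


f_true, delta_f = 0.5, 0.2  # m/s^2, hypothetical values
s_true, v_true = propagate(0.0, 1.0, f_true, 0.1)
s_mea, v_mea = propagate(0.0, 1.0, f_true + delta_f, 0.1)

print(s_mea - s_true, v_mea - v_true)  # deviation over 0.1 s
print(deviation(delta_f, 0.1))         # same deviation in closed form
print(deviation(delta_f, 10.0))        # the same error left uncorrected for 10 s
```

Because the displacement deviation grows with the square of the interval, the same acceleration error is negligible over a fraction of a second but large over ten seconds, which is why PDR is treated as reliable only in the short term.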
In this paper, we propose an autonomous PDR-assisted visual integrity monitoring approach to improve positioning accuracy. Owing to the short-term reliability of PDR, the proposed PDR-assisted visual integrity monitoring system switches states between VIO (or VO) and PDR automatically to provide a more accurate position in an indoor environment. The specific switching situation is shown in Figure 3. When the positioning result of the VIO system exceeds the error range of the PDR, the PDR result is used instead of the VIO result. $X_i$ is the camera pose at the $i$th moment, $X_{i+1}^{PDR}$ is the camera pose at the $(i+1)$th moment obtained by PDR, and $X_{i+1}^{VIO}$ is the camera pose at the $(i+1)$th moment obtained by VIO. $\varepsilon$ is the error range of PDR.

We assume that the deviation $e$ obeys a Gaussian distribution, $e \sim N(0, \Sigma)$. Here, $e$ is a three-dimensional vector. To facilitate the calculation, the vector is transformed into a scalar through an inner product:

$$e^{T} \Sigma^{-1} e = (\Sigma^{-1/2} e)^{T} (\Sigma^{-1/2} e)$$

Here, $\Sigma^{-1/2} e$ follows a multidimensional standard normal distribution. The scalar can therefore be thought of as the sum of the squares of three independent random variables subject to the standard normal distribution, which obeys the chi-square distribution with three degrees of freedom. The cumulative distribution function is $a = F(x)$, and given an $a$, we can determine an interval $[0, F^{-1}(a)]$. $F^{-1}(a)$ is the threshold we are looking for to determine the visual integrity.
The above is the theoretical analysis of the threshold of the PDR-assisted visual integrity monitoring approach. Our indoor positioning system is based on the Multi-State Constraint Kalman Filter (MSCKF) positioning algorithm and PDR. It is very important to switch states between MSCKF and PDR automatically to provide an accurate positioning result for our indoor positioning system. The problems that visual observations cause in MSCKF can be attributed to the fact that visual observations are not updated or are updated incorrectly. Consequently, the update frequency of visual observations and the estimation of the gyroscope bias show a large abnormal fluctuation when an abnormality is detected by PDR-assisted visual integrity monitoring (as shown in Figure 9). We used the update frequency of visual observation $f_{update}$ and the estimation of the gyroscope bias $\hat{b}_{gyr}$ to switch states between MSCKF and PDR automatically as shown in Equation (19), where $P_{out}$ is the estimated pose of our positioning system, and $P_{PDR}$ and $P_{MSCKF}$ represent the estimated poses of PDR and MSCKF, respectively. $\tau_1$ and $\tau_2$ are the hyperparameters of the system used to switch states between MSCKF and PDR automatically. At each switch, MSCKF needs to re-measure the observations to prevent visual observation errors from affecting later operations. During the operation of the system, the MSCKF cannot be restored to normal immediately after the attitude angle is replaced.
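The threshold $F^{-1}(a)$ from the chi-square analysis can be computed without external libraries, because the chi-square distribution with three degrees of freedom has a closed-form cumulative distribution function. The sketch below inverts it by bisection. The diagonal covariance in `passes_integrity_check` and the default confidence level of 0.95 are assumptions made for the example.

```python
import math


def chi2_cdf_3dof(x):
    """CDF of the chi-square distribution with three degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def chi2_threshold(a, tol=1e-10):
    """Invert the CDF by bisection: returns F^-1(a) for 0 < a < 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf_3dof(hi) < a:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_3dof(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def passes_integrity_check(e, variances, a=0.95):
    """Compare e^T Sigma^-1 e with the threshold, for a diagonal Sigma."""
    d2 = sum(ei * ei / var for ei, var in zip(e, variances))
    return d2 <= chi2_threshold(a)


print(chi2_threshold(0.95))
print(passes_integrity_check([0.1, 0.1, 0.1], [0.01, 0.01, 0.01]))
print(passes_integrity_check([0.3, 0.3, 0.3], [0.01, 0.01, 0.01]))
```

A deviation whose normalized squared norm falls outside $[0, F^{-1}(a)]$ is treated as inconsistent with the assumed Gaussian error and therefore flagged.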
To prevent the wrong gyroscope bias from affecting the subsequent pose estimation, we will continue to output the estimated poses of PDR for some time. That time period will be used to allow MSCKF to re-measure the observations and complete the restart operation.
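The switching rule and the hold period can be sketched as a small state machine. The exact form of Equation (19) is not reproduced here: the sketch assumes that PDR is selected when the visual update frequency drops below $\tau_1$ or the magnitude of the estimated gyroscope bias exceeds $\tau_2$, and the class name, the `hold_steps` parameter, and all numerical values are hypothetical.

```python
class IntegritySwitch:
    """Select between MSCKF and PDR poses, holding PDR after an anomaly."""

    def __init__(self, tau1, tau2, hold_steps):
        self.tau1 = tau1              # minimum acceptable update frequency
        self.tau2 = tau2              # maximum acceptable gyro bias magnitude
        self.hold_steps = hold_steps  # steps to keep PDR while MSCKF restarts
        self._remaining = 0

    def select(self, f_update, gyro_bias_norm, p_msckf, p_pdr):
        """Return (source, pose) for the current time step."""
        abnormal = f_update < self.tau1 or gyro_bias_norm > self.tau2
        if abnormal:
            self._remaining = self.hold_steps
            return "PDR", p_pdr
        if self._remaining > 0:
            self._remaining -= 1
            return "PDR", p_pdr
        return "MSCKF", p_msckf


switch = IntegritySwitch(tau1=5.0, tau2=0.05, hold_steps=2)
samples = [(10.0, 0.01), (2.0, 0.01), (10.0, 0.01), (10.0, 0.01), (10.0, 0.01)]
sources = [switch.select(f, b, "msckf_pose", "pdr_pose")[0] for f, b in samples]
print(sources)
```

Keeping PDR selected for a few steps after the anomaly clears gives the filter time to re-measure its observations before its output is trusted again.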

Experiments and Evaluation
The sensor suite we used is shown in Figure 4. It consists of a stereo camera (ZED, 30 Hz, from Stereolabs, San Francisco, USA) and an IMU (MTi-100 from XSENS, Netherlands). The indoor positioning system was based on the MSCKF positioning algorithm and PDR. During data collection, the sensor suite was handheld, and the participant walked in an indoor environment. The first part of the experiment assessed the impact of the scenarios on the positioning results. The second part of the experiment tested and evaluated the proposed PDR-assisted visual integrity monitoring, which switches states between MSCKF and PDR automatically to provide an accurate position.

Assessing Environment Impacts
The following experiments were designed to evaluate the four causes of errors identified in the previous section.

Insufficient Features
As shown in Figure 5a, we changed the threshold for the number of feature points per frame and extracted feature points from the same set of data. Figure 5a shows that the lower the threshold, the greater the number of feature points. For both thresholds of 20 and 60, the number of feature points was 0 in the 1560th frame. This is because a white wall was encountered, and feature points could not be extracted. We drew the corresponding positioning trajectory as shown in Figure 5b. When the feature points were scarce, the camera's ability to correct the IMU was weaker, the path was not serrated enough, and the trajectory also showed significant deviations along the x-axis and the y-axis.

Lighting Causes the Failure of Feature Tracking
When the lighting differed between the left and the right cameras, the average gray values of the images acquired by the two cameras were different, and the matching rate was low. Figure 6a is a feature point distribution map obtained by FAST feature extraction on the images acquired by the left and right cameras. However, the image matching rate of the left and right cameras was not high, and the matching ratio was only 0.55. No feature points exist in the image after stereo matching. If the feature detection module does not output feature point data, the visual-inertial odometry method cannot perform the pose update, so the track accumulates an offset, and serious errors may occur as shown in Figure 6b.

Uneven Distribution of Features
We used the distribution of feature points as the variable and compared the resulting trajectory with that of the original feature distribution, as shown in Figure 7a. When the feature points were distributed only in the red area, the movement of the feature points was directed toward the right side of the image. As shown in Figure 7b, the trajectory obtained with the unevenly distributed features deviated noticeably to the left.

Moving Feature Point
Pedestrians walked in front of the camera, and the trajectories are compared in Figure 8b. It is obvious in the circle that the green track shifted to the left because of the influence of the pedestrians. We analyzed the details of this moment. As can be seen from Figure 8a, when the pedestrian moved, more than half of the extracted feature points were gathered on the pedestrian. Therefore, the movement of the pedestrian relative to the camera leads to a deviation of the positioning results. As the pedestrian moved toward the right side of the camera, the feature points on the pedestrian accumulated the corresponding movements, which caused the estimated position to deviate to the left, as shown in the black elliptical region in Figure 8b.

Evaluation of Proposed PDR-Assisted Visual Integrity Monitoring
This experiment was carried out in a large office building with a length of 100 m and a height of 20 m. The experiment lasted for 20 min and spanned three floors. Abnormal visual observations caused large abnormal fluctuations in the update frequency of visual observations and in the estimation of the gyroscope bias. According to the update frequency of visual observation and the estimation of the gyroscope bias, we divided the path into three parts to introduce the effect of PDR assistance based on visual observations, as shown in Figure 9 (Sections A, B, and C). The following subsections test and evaluate the experimental results of our positioning system with the proposed PDR-assisted visual integrity monitoring.

Section A
The walking distance of "Section A" was approximately 340 m. The route went straight along the corridor, then downstairs to the next floor, and then three times around the hall. It can be seen from Figure 10a that the path of the MSCKF had a large deviation in direction. To illustrate the change in the trajectory, the path in the yellow area is shown enlarged, and the red path is the path with PDR assistance. The scene, shown in Figure 10b, is the stair area. When feature observations are scarce and a turn is made, the visual update frequency of the MSCKF drops, which means that feature points are not discarded often enough during the filter update process. These situations cause a deviation in the direction of the MSCKF. Switching to PDR assistance allows the system to provide a reliable path output when visual observations are insufficient.

Section B
The walking distance of "Section B" was approximately 214 m; the route went upstairs to the rooftop area and included two laps of the rooftop space. As shown in Figure 11a, the path of the MSCKF was somewhat irregular in the yellow area. We can see in Figure 11b that moving feature points were always present while going upstairs. There were many feature tracking failures on the stairs, so the direction of the MSCKF trajectory was persistently offset. In this case, the relative displacement predicted by the IMU differed considerably from that given by visual observation. As a result, the system switched to PDR relatively often, and the positioning direction was kept more accurate. However, there was an error in the PDR step size calculation, which made the overall trajectory longer while going upstairs.
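
The effect of a step size error on trajectory length can be illustrated with a generic dead-reckoning update, in which each detected step advances the position by its estimated length along the current heading. The sketch below is an illustration under assumptions, not the PDR algorithm used in this work: the 0.70 m step length, the 5% overestimate, and the helper names `pdr_track` and `path_length` are hypothetical.

```python
import math


def pdr_track(step_lengths, headings, start=(0.0, 0.0)):
    """Accumulate 2D positions from per-step lengths (m) and headings (rad)."""
    x, y = start
    track = [(x, y)]
    for length, heading in zip(step_lengths, headings):
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        track.append((x, y))
    return track


def path_length(track):
    """Sum of the distances between consecutive track points."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


# Ten steps along x, then ten steps along y (a single right-angle turn).
headings = [0.0] * 10 + [math.pi / 2] * 10
true_steps = [0.70] * 20                       # assumed true step length
biased_steps = [s * 1.05 for s in true_steps]  # assumed 5% overestimate

true_track = pdr_track(true_steps, headings)
biased_track = pdr_track(biased_steps, headings)

# The heading sequence is identical, so only the scale of the path changes.
ratio = path_length(biased_track) / path_length(true_track)
```

In this example the direction of the final position is unchanged while the path is stretched by the same factor as the step length error, which is consistent with the behaviour described above: PDR assistance keeps the direction accurate but lengthens the trajectory when the step size is overestimated.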

Section C
The walking distance of "Section C" was approximately 166 m, and the route looped twice around an empty room and consisted mostly of turns. Because of the large number of white wall scenes, the visual observation of the MSCKF was relatively poor in quantity and quality. The filter estimated a wrong gyroscope bias, resulting in a deviation of the overall trajectory direction. With the short-term reliability of the PDR, the direction deviation of the positioning can be reduced. It can be seen from Figure 12a that, at the first turn, there was no PDR assistance, resulting in a deviation in the direction of the MSCKF. At the second pass through the turn, however, the system switched to PDR, which effectively reduced the direction error of the system.

To describe the positioning results more clearly, we labeled 32 landmarks and recorded the location information of those landmarks. Then we calculated the positioning errors of MSCKF, PDR, and our positioning system based on the experimental data. As shown in Figure 13, it can be seen from the line chart of the positioning error and the graph of the cumulative distribution function (CDF) that the positioning accuracy of our system was significantly improved. The positioning error of the MSCKF was mainly caused by the poor quality of visual observation, while the positioning error of PDR was due to the cumulative error that was caused by the error of step detection during pedestrian turning.

In this part, we performed a real-time positioning experiment in the style of the IPIN [37] (International Conference on Indoor Positioning and Indoor Navigation) competitions, and conducted a quantitative analysis of the positioning results based on the criteria for performance evaluation of IPIN competitions. As shown in Figure 14, we tested our system in a large and challenging multi-floor environment with a significant path length and duration to evaluate its performance. The total length of the walking route was 1400 m, and the walking area spanned four floors. Then, we performed a numerical analysis to show the accuracy of our system in detail.

To better display the experimental results, the positioning trajectory of each floor is shown in Figure 15. The average error of the positioning results in this experiment was approximately 2.5838 m; we also plotted the CDF of the positioning error as shown in Figure 16. The final score metric in IPIN is the third quartile of the positioning error, which makes the accuracy results less prone to the influence of outliers and more in line with the accuracy demanded of commercial systems. The final score of our system was therefore 2.13 m.
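
The statistics reported above (mean error, empirical CDF, and third quartile) can be computed from per-landmark errors as sketched below. This is a minimal illustration under assumptions: the helper names are hypothetical, the landmark coordinates in the usage lines are made-up values rather than experimental data, the errors are plain 2D Euclidean distances, and the linear-interpolation percentile is one common convention that may differ from the exact rule and any additional terms used in the IPIN evaluation criteria.

```python
import math


def horizontal_errors(estimated, reference):
    """Euclidean distance between each estimated and reference landmark."""
    return [math.dist(e, r) for e, r in zip(estimated, reference)]


def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile() needs at least one value")
    rank = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)


def empirical_cdf(values):
    """Return (error, cumulative fraction) pairs for plotting a CDF."""
    data = sorted(values)
    n = len(data)
    return [(v, (i + 1) / n) for i, v in enumerate(data)]


# Made-up coordinates (m), used only to exercise the functions.
estimated = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0), (5.0, 0.0)]
reference = [(0.0, 0.0)] * 5

errors = horizontal_errors(estimated, reference)
mean_error = sum(errors) / len(errors)
third_quartile = percentile(errors, 75)
cdf_points = empirical_cdf(errors)
```

Because the third quartile depends only on the ordering of the errors near the 75% rank, a few very large errors move it far less than they move the mean.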

Summary and Discussion
To address the susceptibility of vision-based positioning systems to environmental influences, we analyzed the error sources of visual observations in cases where such systems had a large positioning error in special indoor environments with little texture, dynamic obstacles, or dim lighting. We divided the error sources of visual measurement into four error situations and analyzed and explained each in detail. The first part of the experiment assessed the impact of these scenarios on the positioning results to show intuitively the effect of feature observation. To address this issue, we proposed autonomous integrity monitoring of visual observation based on a pedestrian dead reckoning system. The error analysis of PDR showed that the error of PDR over a short time was small and bounded. Exploiting this short-term reliability, the proposed PDR-assisted visual integrity monitoring switches automatically between MSCKF and PDR to provide a more accurate position in an indoor environment. The second part of the experiment tested and evaluated the proposed PDR-assisted visual integrity monitoring. In conclusion, the experiments showed that our positioning system can effectively provide more reliable and accurate positioning results. Future research should consider the potential effects of visual observation more carefully. Future work is also needed to improve the accuracy of the PDR step size and thereby the positioning accuracy of the system.
