A Tightly Coupled Visual-Inertial GNSS State Estimator Based on Point-Line Feature

Visual-inertial odometry (VIO) is known to suffer from drifting and can only provide local coordinates. In this paper, we propose a tightly coupled GNSS-VIO system based on point-line features for robust and drift-free state estimation. Feature-based methods are not robust in complex areas such as weak or repeated textures. To deal with this problem, line features with more environmental structure information can be extracted. In addition, to eliminate the accumulated drift of VIO, we tightly fused the GNSS measurement with visual and inertial information. The GNSS pseudorange measurements are real-time and unambiguous but experience large errors. The GNSS carrier phase measurements can achieve centimeter-level positioning accuracy, but the solution to the whole-cycle ambiguity is complex and time-consuming, which degrades the real-time performance of a state estimator. To combine the advantages of the two measurements, we use the carrier phase smoothed pseudorange instead of pseudorange to perform state estimation. Furthermore, the existence of the GNSS receiver and IMU also makes the extrinsic parameter calibration crucial. Our proposed system can calibrate the extrinsic translation parameter between the GNSS receiver and IMU in real-time. Finally, we show that the states represented in the ECEF frame are fully observable, and the tightly coupled GNSS-VIO state estimator is consistent. We conducted experiments on public datasets. The experimental results demonstrate that the positioning precision of our system is improved and the system is robust and real-time.


Introduction
Localization plays an important role in many applications such as robotics, unmanned driving, and unmanned aerial vehicles (UAVs). The fusion of information from multiple sensors for localization is the current mainstream method. The sensor fusion approaches have been widely studied for decades and can be divided into two streams: loosely coupled and tightly coupled approaches. Loosely coupled approaches [1,2] process the measurements of each sensor individually and then fuse all the results to obtain the final results. Tightly coupled approaches [3,4] combine measurements from all sensors and then use an estimator to process these measurements to obtain the final results. In general, the accuracy of tightly coupled approaches will be higher than that of loosely coupled approaches.
Visual odometry (VO) has received much attention due to the low price of cameras. Davison et al. [5] introduced MonoSLAM, the first monocular visual SLAM system, which extracts Shi-Tomasi corners [6]. MonoSLAM actively searches for matching point features in projected ellipses. However, because of its high computational complexity, MonoSLAM can only handle small scenes. Klein et al. [7] proposed PTAM, the first optimization-based monocular visual SLAM system, which executes feature tracking and mapping as two independent tasks in two parallel threads. The main contributions of this work are summarized as follows:

•
To obtain environmental structure information and deal with environments with repeated textures, we extracted line features based on point features.

•
To combine the merits of the pseudorange and carrier phase measurements, we used the carrier phase smoothed pseudorange instead of the pseudorange measurement, which can make the GNSS-VIO system run in real-time and improve the positioning accuracy.

•
We demonstrate that the states represented in the ECEF frame are fully observable, and the tightly coupled GNSS-VIO state estimator is consistent.
The rest of this paper is organized as follows: Section 2 introduces the implementation method of GNSS-VIO in detail, including line features, GNSS raw measurements processing, and observability analysis. Section 3 summarizes the detailed structure of our system. Section 4 conducts experiments on public datasets. Finally, conclusions are given in Section 5.

Frames and Notations
The frames involved in our system consist of:

•
•
Sensor Frame: In our system, (·) c , (·) b , and (·) r denote the camera frame, the body frame, and the GNSS receiver frame, respectively.

•
Local World Frame: The origin of the local world frame (·) w is the position where the VIO system starts running, and the z-axis is gravity-aligned, as illustrated in Figure 1.

•
ENU Frame: The ENU frame is also called the East-North-Up frame. The x-, y-, and z-axis of the ENU frame point to the east, north, and up directions, respectively. In our system, the ENU frame has the same origin as the local world frame, and the z-axes of the two frames are aligned (Figure 1). We use (·) N to represent the ENU frame.

•
Geodetic Frame: As shown in Figure 1, the geodetic coordinate (·) G of a point p is represented by geodetic longitude L, geodetic latitude B, and geodetic height H. The geodetic longitude L of p is the angle between the reference meridian and the meridian that passes through p. The geodetic latitude B of p is the angle between the ellipsoid normal vector that passes through p and the equatorial plane. The geodetic height H of p is the minimum distance between p and the reference ellipsoid.

•
ECEF Frame: The Earth-Centered, Earth-Fixed (ECEF) frame (·) E is fixed to Earth. As depicted in Figure 1, the origin of the ECEF frame is at the center of mass of Earth. The x-axis points to the intersection of the Equator and the Prime Meridian. The z-axis is perpendicular to the equatorial plane in the direction of the North Pole. The y-axis is chosen to form a right-handed coordinate system with the x- and z-axis.

In this paper, R w b and p w b represent the rotation and translation of the body frame with respect to the local world frame, and q w b is the corresponding quaternion form of R w b . v w b , b a , and b g denote the velocity of the origin of the body frame measured in the local world frame, the accelerometer bias, and the gyroscope bias, respectively. p b c and q b c stand for the extrinsic parameters between the camera and IMU. δt r and δṫ r represent the receiver clock error and the receiver clock drifting rate, respectively. φ is the yaw offset between the local world frame and the ENU frame. R N w is the rotation matrix of the local world frame with respect to the ENU frame. p E N denotes the translation of the ENU frame with respect to the ECEF frame. p b r is the extrinsic translation parameter between the GNSS receiver and IMU. [·] × represents the skew-symmetric matrix of a 3D vector.


Plücker Coordinates
In our system, we described a spatial line with the Plücker Coordinates. Given a spatial line L w ∈ π w , its Plücker Coordinates are represented by L w = (n T w , d T w ) T , where n w ∈ R 3 is the normal vector of the plane π w determined by L w and the origin, and d w ∈ R 3 is the line direction vector. We can transform L w in the local world frame to L c in the camera frame by:
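The transformation above follows the standard rigid-motion rule for Plücker coordinates, n c = R cw n w + [p cw ] × R cw d w and d c = R cw d w . A minimal sketch (function names are illustrative, not from the paper's code):

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x of a 3D vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def transform_plucker(R_cw, p_cw, L_w):
    """Transform Plücker coordinates L_w = (n_w, d_w) into the camera frame."""
    n_w, d_w = L_w[:3], L_w[3:]
    n_c = R_cw @ n_w + skew(p_cw) @ (R_cw @ d_w)  # moment picks up a p x d term
    d_c = R_cw @ d_w                              # direction only rotates
    return np.concatenate([n_c, d_c])
```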

Line Feature Triangulation
Assuming that L w is observed by camera c i and camera c j in the normalized image plane as z c i L w and z c j L w , the two line segments can be denoted by two endpoints, as shown in Figure 2.

A 3D plane π can be modeled as: where n π = [π x , π y , π z ] T is the normal vector of π. For a point p 0 = [x 0 , y 0 , z 0 ] T on the plane, we can obtain: According to Equations (2) and (3), we can obtain the planes π i = (n π i , π w i ) and π j = (n π j , π w j ). As shown in Figure 2, the normal vector of π i is n π i = s c i L w × e c i L w , and π w i can be computed by the optical center c i = [0, 0, 0] T as π w i = −n T π i · c i = 0. To obtain π j , we need to transform the two endpoints s c j L w and e c j L w to the camera frame c i , where the transformation can be obtained by visual-inertial alignment. Similarly, the normal vector of π j is n π j = s c j L w × e c j L w , and π w j can be calculated by the translation p c i c j from the camera frame c j to the camera frame c i as π w j = −n T π j · p c i c j . After π i and π j are computed, we can obtain the Plücker Coordinates of L w according to the dual Plücker matrix L * w :
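The dual Plücker matrix construction can be sketched as follows: with the two planes written as 4-vectors, L * w = π i π j T − π j π i T , and the normal and direction parts can be read off its block structure. A hedged sketch (illustrative names; the recovered line is defined only up to scale and sign):

```python
import numpy as np

def line_from_planes(pi_i, pi_j):
    """Triangulate a Plücker line from two planes (4-vectors) via the
    dual Plücker matrix L* = pi_i pi_j^T - pi_j pi_i^T."""
    Lstar = np.outer(pi_i, pi_j) - np.outer(pi_j, pi_i)
    n = Lstar[:3, 3]                                   # moment (normal) part
    d = np.array([Lstar[2, 1], Lstar[0, 2], Lstar[1, 0]])  # from the [d]x block
    return np.concatenate([n, d])
```

For example, the planes z = 0 and y = 0 intersect in the x-axis through the origin.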

Orthonormal Representation
Since spatial lines only have 4-DoF, there is a problem of overparameterization using the Plücker Coordinates to represent spatial lines. In contrast, the orthonormal representation (U, W) ∈ SO(3) × SO(2) is more suitable for nonlinear optimization. Let U = R(ψ) and W = R(φ) denote a 3D and a 2D rotation matrix, respectively; then we have: where ψ = [ψ 1 , ψ 2 , ψ 3 ] T and φ are the 3D rotation angles around the x-, y-, and z-axis of the camera frame and the 2D rotation angle. Therefore, we can define the orthonormal representation of a spatial line by a four-dimensional vector: In addition, given an orthonormal representation (U, W), the corresponding Plücker Coordinates can be obtained by: where w 1 = cos(φ), w 2 = sin(φ), and u i is the ith column of U. The recovered line differs from L w by a scale factor, but they denote the same spatial line.
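A round trip between Plücker coordinates and the orthonormal representation can be sketched as below (illustrative names; assumes n ⟂ d, which holds for any valid Plücker line):

```python
import numpy as np

def plucker_to_orthonormal(L):
    """Build (U, W) from L = (n, d): columns of U are n/|n|, d/|d|, and
    their cross product; W is a 2D rotation encoding (|n|, |d|) up to scale."""
    n, d = L[:3], L[3:]
    u1 = n / np.linalg.norm(n)
    u2 = d / np.linalg.norm(d)
    u3 = np.cross(u1, u2)
    U = np.column_stack([u1, u2, u3])
    s = np.hypot(np.linalg.norm(n), np.linalg.norm(d))
    w1, w2 = np.linalg.norm(n) / s, np.linalg.norm(d) / s
    W = np.array([[w1, -w2], [w2, w1]])
    return U, W

def orthonormal_to_plucker(U, W):
    """Recover Plücker coordinates (up to scale): (w1 u1, w2 u2)."""
    w1, w2 = W[0, 0], W[1, 0]
    return np.concatenate([w1 * U[:, 0], w2 * U[:, 1]])
```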

Line Feature Reprojection Residual
The line feature reprojection residual is defined in terms of point-to-line distance.
Given a spatial line L c = (n T c , d T c ) T , it can be projected to the image plane by [24]: where K L is the line projection matrix. Finally, assume that L j w represents the jth spatial line L j , which is observed by the ith camera frame c i . Then the spatial line reprojection residual can be modeled as: where the two terms are the point-to-line distances from the endpoints of the detected line segment to the reprojected line. For the corresponding Jacobian matrix, we followed the routine of [10].
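Assuming the reprojected line in the image is l = (l 1 , l 2 , l 3 ) and the detected endpoints s and e are in homogeneous coordinates, the two point-to-line distances can be sketched as (illustrative names):

```python
import numpy as np

def point_line_distance(p, l):
    """Signed distance from homogeneous image point p = (u, v, 1)
    to image line l = (l1, l2, l3)."""
    return p @ l / np.hypot(l[0], l[1])

def line_reproj_residual(l, s, e):
    """Line reprojection residual: point-to-line distances of the two
    detected endpoints to the reprojected line."""
    return np.array([point_line_distance(s, l),
                     point_line_distance(e, l)])
```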

GNSS Measurements
GNSS measurements include pseudorange, carrier phase, and Doppler frequency shift.

Pseudorange Measurement
The pseudorange is defined as the measured distance obtained by multiplying the travel time of the satellite signal by the speed of light. Due to the influence of the satellite clock error, receiver clock error, and ionospheric and tropospheric delays, the pseudorange is prefixed with "pseudo" to distinguish it from the true distance from the satellite to the GNSS receiver. Although the pseudorange measurement only has meter-level positioning precision (the positioning precision of the P code is about 10 m, and that of the C/A code is 20 m to 30 m), it is real-time and has no ambiguity. Therefore, the pseudorange measurement is still very important for GNSS positioning technology. For a certain satellite s j and a GNSS receiver r k at time k, the pseudorange P s j r k can be modeled as: where p E s j and p E r k represent the positions of the satellite s j and the GNSS receiver r k at time k in the ECEF frame, respectively. c is the speed of light. δt s j , T s j r k , and I s j r k are the satellite clock error and the tropospheric and ionospheric delays, which can be computed according to the satellite ephemeris. ε s j p k ∼ N (0, σ 2 p k ) denotes the multipath and random error of the pseudorange measurement, which is subject to a zero-mean Gaussian distribution.
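As a sketch of the pseudorange model (illustrative names; in practice the satellite position, clock terms, and atmospheric delays come from the ephemeris and standard correction models):

```python
import numpy as np

C = 299792458.0  # speed of light, m/s

def modeled_pseudorange(p_sat, p_rcv, dt_rcv, dt_sat, trop, iono):
    """Pseudorange model: geometric range plus clock error terms and
    tropospheric/ionospheric delays."""
    geometric_range = np.linalg.norm(p_sat - p_rcv)
    return geometric_range + C * (dt_rcv - dt_sat) + trop + iono
```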

Carrier Phase Measurement
Although pseudorange positioning is an important method for GNSS, its error is too large for some applications that require high-precision positioning. In contrast, due to the short wavelength of the carrier phase (λ L 1 = 19 cm and λ L 2 = 24 cm), if the carrier phase measurement is used for positioning, it can achieve centimeter-level positioning accuracy. However, since the carrier phase is a periodic sinusoidal signal, and the GNSS receiver can only measure the part of less than one wavelength, there is the problem of the whole-cycle ambiguity, which makes the positioning process time-consuming. The carrier phase is defined as the phase difference between the phase transmitted by the satellite and the reference phase generated by the GNSS receiver. Similar to the pseudorange measurement, the carrier phase measurement is also related to the positions of the satellite and the GNSS receiver. The carrier phase measurement can be modeled as: where λ is the carrier wavelength, and N is the whole-cycle ambiguity. The last term represents the multipath and random error of the carrier phase measurement, which is subject to a zero-mean Gaussian distribution.

Doppler Frequency Shift Measurement
The Doppler effect reveals a phenomenon in the propagation of waves: the wavelength of the radiation emitted by an object changes due to the relative motion of the wave source and the observer. As the wave source moves toward the observer, the wave is compressed, so the wavelength becomes shorter and the frequency higher; on the contrary, as the wave source moves away from the observer, the wavelength becomes longer and the frequency lower. Similarly, the Doppler effect also occurs between the satellite and the GNSS receiver. When a satellite orbits Earth in its elliptical orbit, due to the relative motion between the satellite and the GNSS receiver, the frequency of the satellite signal received by the GNSS receiver changes accordingly. This frequency change is called the Doppler frequency shift, which can be modeled as: where the line-of-sight unit vector points from the GNSS receiver r k to the satellite s j . v E s j and v E r k denote the velocities of the jth satellite and the receiver at time k in the ECEF frame, respectively. δṫ s j is the satellite clock drifting rate, which can be calculated according to the satellite ephemeris. The last term represents the multipath and random error of the Doppler frequency shift measurement, which is subject to a zero-mean Gaussian distribution.

Carrier Phase Smoothed Pseudorange
As mentioned above, the pseudorange measurement is real-time and efficient but can only achieve meter-level positioning precision. In contrast, the carrier phase measurement can achieve centimeter-level positioning precision but requires solving the whole-cycle ambiguity, which is complex and time-consuming. Therefore, to combine the merits of the two measurements, we can use the carrier phase smoothed pseudorange (hereafter, smoothed pseudorange) to improve the positioning precision. In general, the positioning precision of the smoothed pseudorange measurement is several times higher than that of the pseudorange measurement. For a single-frequency receiver, when the whole-cycle ambiguity and ionospheric delay are nearly constant within a period of time, the pseudorange can be smoothed by using the carrier phase. The Hatch filter is the most widely used carrier phase smoothing method; it assumes that the ionospheric delay is nearly constant among GNSS epochs and then averages the multiepoch whole-cycle ambiguity and ionospheric delay to improve the positioning accuracy of the pseudorange measurement. The carrier phase smoothed pseudorange based on the Hatch filter can be modeled as: where m is the smoothing interval of the Hatch filter, usually 20 to 200 epochs. ε denotes the multipath and random error of the carrier phase smoothed pseudorange measurement, which is also subject to a zero-mean Gaussian distribution. According to Equation (15), the variance of the smoothed pseudorange is related to the variances of the pseudorange and carrier phase measurements. Assuming that the errors at different epochs are independent, the variance of the smoothed pseudorange can be modeled as:
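The Hatch filter recursion described above, P̂ k = (1/m) P k + ((m−1)/m)(P̂ k−1 + Φ k − Φ k−1 ), can be sketched as follows (a minimal sketch, assuming the carrier phase is already converted to meters; names are illustrative):

```python
def hatch_filter(pseudoranges, carrier_phases, m=100):
    """Carrier phase smoothed pseudorange via the Hatch filter.

    pseudoranges, carrier_phases: per-epoch measurements in meters
    (carrier phase already scaled by the wavelength); m: smoothing window.
    """
    smoothed = [pseudoranges[0]]
    for k in range(1, len(pseudoranges)):
        w = 1.0 / min(k + 1, m)  # growing weight until the window fills
        # Propagate the previous smoothed value with the carrier phase delta,
        # then blend in the current raw pseudorange.
        pred = smoothed[-1] + (carrier_phases[k] - carrier_phases[k - 1])
        smoothed.append(w * pseudoranges[k] + (1.0 - w) * pred)
    return smoothed
```

If the carrier phase deltas agree exactly with the pseudorange deltas, the filter reproduces the raw ranges; with noisy pseudoranges, the noise is averaged down while the precise carrier phase carries the geometry between epochs.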

Factor Graph Representation
The factor graph of our system is shown in Figure 3, which depicts the factors in a sliding window: point feature factors, line feature factors, IMU factors, carrier phase smoothed pseudorange factors, and Doppler frequency shift factors. The visual observation consists of the point and line features detected by our system. In nonlinear optimization, the states are optimized according to the residuals of these factors. The states of our system include: where n is the sliding window size, i is the number of point features, and j is the number of line features. µ i is the inverse depth of the ith point feature in the sliding window.
The IMU preintegration and point feature residuals can be obtained according to [16], and the line feature residual can be obtained from Equation (10). In the following, we compute the smoothed pseudorange and Doppler frequency shift residuals in detail. The position of the receiver at time k in the ECEF frame is: where p w r k is the position of the receiver in the local world frame, and it can be modeled as: R N w is the rotation matrix of the local world frame with respect to the ENU frame. Since the two frames are gravity-aligned, the 1-DoF R N w is only related to the yaw offset φ and can be modeled as: R E N represents the rotation matrix of the ENU frame with respect to the ECEF frame, which is determined by the longitude L and latitude B of p E N .
Given a position p E N = [X, Y, Z] T in the ECEF frame, it can be denoted in the geodetic frame as: where a and b denote the semimajor axis and the semiminor axis of the reference ellipsoid, respectively. N r is the radius of curvature in the prime vertical, and e is the eccentricity, which is related to a and b.
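The ECEF-to-geodetic conversion is commonly solved by fixed-point iteration on the latitude, since N r and H depend on B. A minimal sketch using WGS-84 ellipsoid parameters (illustrative names; this simple iteration is not valid near the poles):

```python
import math

# WGS-84 reference ellipsoid parameters
A_WGS84 = 6378137.0        # semimajor axis a, m
B_WGS84 = 6356752.3142     # semiminor axis b, m
E2 = 1.0 - (B_WGS84 / A_WGS84) ** 2  # squared eccentricity e^2

def ecef_to_geodetic(x, y, z, iterations=10):
    """Iteratively convert an ECEF position to geodetic (L, B, H) in radians/meters."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - E2))  # initial guess
    for _ in range(iterations):
        n = A_WGS84 / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime-vertical radius N_r
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    return lon, lat, h
```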
After the longitude L and latitude B of p E N are computed, R E N can be obtained by: According to Equation (15), the smoothed pseudorange residual can be represented as: In nonlinear optimization, it is necessary to calculate the Jacobian matrix of the smoothed pseudorange residual with respect to the states. Therefore, from Equation (23), the Jacobian matrix of the smoothed pseudorange can be obtained as: The derivation details are provided in Appendix A.
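R E N can be assembled directly from the longitude L and latitude B: its columns are the east, north, and up unit vectors expressed in the ECEF frame. A sketch (illustrative name):

```python
import numpy as np

def R_EN(lon, lat):
    """Rotation of the ENU frame w.r.t. the ECEF frame: columns are the
    east, north, and up unit vectors expressed in ECEF (angles in radians)."""
    sl, cl = np.sin(lon), np.cos(lon)
    sb, cb = np.sin(lat), np.cos(lat)
    east = np.array([-sl, cl, 0.0])
    north = np.array([-sb * cl, -sb * sl, cb])
    up = np.array([cb * cl, cb * sl, sb])
    return np.column_stack([east, north, up])
```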
In Equation (14), the velocity of the receiver in the ECEF frame is transformed from that in the local world frame: Therefore, the Doppler frequency shift residual can be obtained by: Similar to the smoothed pseudorange, the Jacobian matrix of the Doppler frequency shift is: The corresponding derivation rule is similar to the Jacobian matrix of the smoothed pseudorange and will not be repeated here.
After the residuals of all factors are obtained, the system optimizes the states with the Ceres solver [25]. The cost function of our system is: where r p is the prior residual for marginalization. B is the set of IMU preintegration measurements in the sliding window. C and L are the sets of point and line features in the sliding window, respectively. α is the Cauchy robust function used to suppress outliers.

GNSS-IMU Calibration
p b r is the extrinsic translation parameter between the GNSS receiver and IMU. After our system performs ENU Origin Estimation and Yaw Estimation, we can calibrate p b r . We estimate p b r as the initial value for nonlinear optimization through the following optimization problem: After the GNSS-IMU calibration is successfully initialized, our system performs the nonlinear optimization.

Observability Analysis of Tightly Coupled GNSS-VIO System
A SLAM system can be described using a state equation and an output equation, where the input and output are the external variables of the system, and the state is the internal variables of the system. If the states of the system can be completely represented by the output, the system is fully observable; otherwise, the system is not fully observable. Observability plays a very important role in the state estimation problem. If some states of the system are not observable, the positioning precision of the system will be affected when running in long trajectories. The observability of the system can be represented by the observability matrix. If the dimension of the null space of the observability matrix is equal to 0, then the system is fully observable. To facilitate the observability analysis of our proposed system, some simplifications were required. First, the accelerometer and gyroscope biases were not included in the states, because the biases were observable and they did not change the results of the observability analysis [26]. Second, we considered a single point and line feature [27]. Third, the translation parameter p b r was successfully calibrated. Then the discrete-time linear error state model and residual of the system are: where δx k is the error state, and r k is the residual. Φ k and H k are the error-state transition matrix and the measurement Jacobian matrix, respectively. w k and n k represent the system noise process and the measurement noise process, respectively. The noise process is modeled as a Zero-mean white Gaussian process. According to [28], the observability matrix can be obtained as: In Equation (34), the observability matrix is defined as a function of the error-state transition matrix Φ k and the measurement Jacobian matrix H k .
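The rank condition above can be checked numerically: the system is fully observable if and only if the null space of the observability matrix has dimension zero. A minimal sketch (illustrative name; M would be built by stacking H k Φ k,1 blocks):

```python
import numpy as np

def nullspace_dim(M, tol=1e-10):
    """Dimension of the null space of an observability matrix M:
    number of state directions the measurements cannot distinguish."""
    rank = np.linalg.matrix_rank(M, tol)
    return M.shape[1] - rank
```

A full-rank square matrix has a zero-dimensional null space (fully observable); a matrix with fewer independent rows than state dimensions leaves unobservable directions.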
Therefore, given the linearized system in Equation (33), its observability can be demonstrated according to Equation (34). The proof is as follows: Theorem 1. The states represented in the ECEF frame are fully observable.
Proof of Theorem 1. The simplified states include: Generally, the raw accelerometer and gyroscope measurements from the IMU can be obtained by: where a b and ω b are the accelerometer and gyroscope measurements, which are represented in the body frame. Given two time instants, the position, velocity, and orientation states can be propagated by the IMU measurements: where ⊗ denotes the quaternion multiplication operation. Equation (37) is the continuous state propagation model. To analyze the observability of the system, it was necessary to perform the discretization of the continuous-time system model. Therefore, we used the Euler method to compute the discrete-time model of Equation (37): According to Equation (38), the error propagation equation of position, velocity, and orientation can be obtained by: The details of Equation (39) are provided in Appendix B.
Similarly, when the states are represented in the ECEF frame, Equation (39) still holds, namely: Therefore, the discrete-time error state model of Equation (35) can be obtained: where Since Φ k is an upper triangular matrix, the error-state transition matrix Φ k+t,k from time step k to k + t is also an upper triangular matrix, namely: where Φ k+t 13 , Φ k+t 23 , and Φ k+t 33 are nonzero entries. For the measurement Jacobian matrix, our system consists of visual observations and GNSS measurements. GNSS measurements include pseudorange, carrier phase, and Doppler frequency shift, and any of them yields the same result for the observability analysis. Therefore, in the following, we used the smoothed pseudorange measurement for the observability analysis.
The Jacobian matrix of the smoothed pseudorange measurement with respect to the states in Equation (35) can be obtained by: The derivation details are similar to the Jacobian matrix of the smoothed pseudorange, which has been computed in Appendix A. Since the smoothed pseudorange measurement does not include velocity, orientation, the inverse depth of point feature, and the orthonormal representation of line feature, the corresponding entry of the Jacobian matrix is equal to zero.
In addition, we can obtain the Jacobian matrix of the visual observations by: where ∂r C ∂δµ k+t and ∂r L ∂δo k+t can be obtained from [10]. Therefore, we can obtain the entry of the observability matrix: According to Equation (46), we can clearly observe that the dimension of the null space of M is equal to zero, which means the states represented in the ECEF frame are fully observable and the tightly coupled GNSS-VIO state estimator is consistent. □
The fact that the tightly coupled GNSS-VIO system is fully observable means that even if the system runs in long trajectories, the accumulated error can be eliminated. By leveraging the global measurements from GNSS, our system can achieve high-precision and drift-free positioning compared with the VIO system, which has four unobservable directions.

System Overview
The architecture of our proposed system is shown in Figure 4. The proposed system implements four threads: data input, preprocessing, initialization, and nonlinear optimization. As shown in Figure 4, the white block diagrams represent the work that has been implemented by VINS-Mono [16] and GVINS [21], and the green block diagrams represent the improvements we made. The inputs of our system are image, IMU, and GNSS measurements. In the preprocessing step, point and line features were detected and tracked, IMU measurements were preintegrated, and the pseudorange measurements were smoothed by the carrier phase. In the initialization step, we followed the routine of VINS-Mono [16] for VI-Alignment. After VI-Alignment was completed, we performed GNSS initialization, which was divided into four stages: ENU Origin Estimation (Coarse), Yaw Estimation, ENU Origin Estimation (Fine), and GNSS-IMU Calibration. The first three stages were implemented in GVINS [21]. Finally, nonlinear optimization was performed, which optimizes the states in Equation (17) by leveraging the residuals and Jacobian matrices of the different factors computed in Section 2.5.

Experimental Results
Our experiments were conducted on the public GVINS-Dataset [29], which captured scenes around the Hong Kong University of Science and Technology. The measurements were collected by a helmet equipped with a VI-Sensor and a u-blox ZED-F9P GNSS receiver. The sports field dataset captured a sports field scene where the device followed an athletic track for five laps. The sports field is an open outdoor environment where the satellites are well locked and the RTK solution remains fixed. The other dataset, complex environment, covered a complex indoor-outdoor environment with many challenging scenes. For example, point and line features cannot be detected in bright or dim scenes, and the GNSS signal was highly corrupted or unavailable in cluttered or indoor environments (about 25 m). The overall distance of the complex environment dataset was over 3 km. We compared our proposed system with open-source SLAM systems, including GVINS and VINS-Mono, where VINS-Mono results are reported with and without loop closure. For comparison with our results, we transformed the trajectories of VINS-Mono from the local world frame to the ECEF frame. Furthermore, since RTK has centimeter-level positioning precision, we used it as the ground truth for our trajectories.

Sports Field
We plotted the trajectories on the sports field on Google Earth, as shown in Figure 5. VINS-Mono without loop closure suffers from accumulated drift in all three directions, which leads to the worst positioning precision among the compared methods; the drift obviously increases with each lap around the sports field. VINS-Mono with loop closure can significantly improve the positioning precision, but there is still an obvious drift in the yaw direction, mainly because VINS-Mono has four unobservable directions. In comparison, since GVINS is fully observable in the ECEF frame, it has a smaller positioning error and is drift-free; as depicted in Figure 5, its trajectory is very close to RTK. In addition, since our method is also a tightly coupled GNSS-VIO system, which is fully observable in the ECEF frame and the ENU frame, the positioning result of our method is also very accurate.
To quantitatively analyze the results, we compared the positioning errors of GVINS and our method in detail, as visualized in Figure 6. The errors of GVINS and our system fluctuate within a small range in both the ENU frame and the ECEF frame. The positioning errors of our method are less than 1 m in all three directions of the ENU frame. Compared with GVINS, the error of our system is smaller in the east and north directions. In addition, the error of our method is smooth and stable in the ECEF frame, because we smooth the pseudorange measurements with the carrier phase. We also list the RMSE and Spherical Error Probable (SEP) of the positioning errors of GVINS and our method in the ENU frame and ECEF frame. From Table 1, we can see that the RMSE of our positioning errors is lower than that of GVINS except in the up direction. The SEP refers to the radius of a sphere in which 50% of the estimated positions occur, and it is another measure of positioning precision: the smaller the SEP, the higher the positioning precision. The SEP of our method in the ECEF frame is lower than that of GVINS, which shows that our method is more accurate in the ECEF frame.
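Both metrics can be computed directly from the per-epoch error vectors. Below is a minimal sketch, assuming the errors are given as (east, north, up) triples; the function names and data layout are ours, not the paper's:

```python
import math
from statistics import median

def rmse_per_axis(errors):
    """Per-axis RMSE of a list of [e, n, u] positioning errors."""
    n = len(errors)
    return [math.sqrt(sum(e[i] ** 2 for e in errors) / n) for i in range(3)]

def sep(errors):
    """Spherical Error Probable: radius of the sphere (centered on zero
    error) containing 50% of the estimates, i.e. the median 3D norm."""
    return median(math.sqrt(sum(c * c for c in e)) for e in errors)

errs = [[0.3, -0.2, 0.5], [0.1, 0.4, -0.3], [-0.2, 0.1, 0.2], [0.5, -0.3, 0.6]]
print(rmse_per_axis(errs))  # per-axis RMSE (E, N, U)
print(sep(errs))            # SEP radius
```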

Complex Environment
In the following, we conducted experiments on the complex environment dataset, comparing VINS-Mono, GVINS, RTK, and our proposed method. As shown in Figure 7, since VINS-Mono has four unobservable directions, its trajectory has a large drift. A section of the VINS-Mono trajectory deviates from RTK by about 34 m, which is an unacceptable result. On the contrary, since GVINS and our proposed system are tightly coupled GNSS-VIO systems, there is theoretically no accumulated drift. In addition, because a section of the trajectory was collected indoors, the GNSS receiver cannot obtain measurements there, which causes the RTK solution to fail. In contrast, even when GNSS measurements are unavailable, our system continues to work as a VIO, and our trajectory remains accurate.

To further compare the performance of GVINS and our method, we analyzed the positioning errors of the two methods on the complex environment dataset, as shown in Figure 8. It should be noted that we used the RTK solution as the ground truth for our method. However, when the receiver is occluded, the satellite signal cannot be received, and the RTK solution is unusable. Thus, we only compared the segments where the RTK solution is available. Due to the complexity of the dataset, the overall positioning precision is inferior to that on the sports field. The RMSE and SEP of GVINS and our method are shown in Table 2. The positioning errors of GVINS in the east and north directions are slightly lower than those of our method, while the positioning error of our method in the up direction is much lower than that of GVINS. The SEP metric also shows that our method outperforms GVINS on this dataset.
Furthermore, since the ENU frame is gravity-aligned, we further analyzed the 2D trajectory error in the East-North plane, as shown in Figure 9. The trajectory projected on the East-North plane is drawn in red, and the z-axis represents the positioning errors over the East-North plane, shown as a scatter plot. In some bright or dim scenes, only a few point and line features can be detected, which affects the positioning precision of the system. In addition, buildings, indoor environments, and trees block satellite signals, which also degrades the performance of our system. Therefore, the positioning error in the complex environment is larger than that on the sports field, and the maximum error exceeds 2 m. As depicted in Figure 9, a sudden increase in positioning error occurs at the curves of the trajectory due to the large change in orientation. In contrast, the positioning error along trajectory segments with small curvature is smoother than along those with large curvature, which shows that our method remains effective even in complex scenes.
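For reference, projecting ECEF position differences into the gravity-aligned ENU frame follows the standard rotation determined by the latitude and longitude of the ENU origin. The sketch below illustrates this; the function name and argument layout are our own, not taken from the paper's implementation:

```python
import math

def ecef_to_enu(d_ecef, lat_deg, lon_deg):
    """Rotate an ECEF position difference (relative to the ENU origin at
    the given latitude/longitude) into the local East-North-Up frame."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    sl, cl = math.sin(lat), math.cos(lat)
    so, co = math.sin(lon), math.cos(lon)
    dx, dy, dz = d_ecef
    e = -so * dx + co * dy
    n = -sl * co * dx - sl * so * dy + cl * dz
    u = cl * co * dx + cl * so * dy + sl * dz
    return e, n, u

# At lat=0, lon=0 the ECEF +X axis points up and +Y points east.
print(ecef_to_enu((1.0, 0.0, 0.0), 0.0, 0.0))
```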

Observability Analysis
In Section 2.7, we demonstrated that the states represented in the ECEF frame are fully observable. However, according to [20], if the states are represented in the local world frame, the GNSS-VIO system still has four unobservable directions. Therefore, we can also perform the observability analysis from the experimental results. Figure 10 shows the 3D trajectories of the complex environment represented in the ECEF frame and the local world frame. We zoomed in on the trajectories of turning and climbing stairs for convenience of analysis. From Figure 10b, we see that since the states are unobservable in the local world frame, the two trajectories of climbing the same stairs drift apart. Even though GNSS measurements are included in our system, the two trajectories are still inconsistent. In contrast, since the states represented in the ECEF frame are fully observable, the two trajectories of climbing the same stairs are consistent and very close. Similarly, the same conclusion can be drawn for the trajectories of turning shown in Figure 10: there is an obvious drift between the two turning trajectories in the local world frame, while this phenomenon does not occur in the ECEF frame. In addition, the consistency of the system shows that it can achieve high-precision positioning even in large-scale scenarios, as illustrated in Figure 10a.

Smoothed Interval
Equation (15) requires the receiver to continuously lock the carrier phase. If the receiver loses lock, the smoothed pseudorange is corrupted by any cycle slip that occurs and must be reinitialized when that happens. Therefore, it is necessary to investigate the influence of different smoothed intervals on the positioning precision of our system. The positioning error with six different settings is illustrated in Figure 11. When the smoothed interval is set to 200 epochs, the system has the largest positioning error in the three directions of the ENU frame. This happens because the receiver loses lock frequently within a smoothed interval, and the resulting cycle slips increase the positioning error. By reducing the smoothed interval, loss of lock within an interval becomes less frequent, which improves the positioning precision of the carrier phase smoothed pseudorange. Obviously, with smoothed intervals of 5, 10, 20, and 50 epochs, our system has smaller errors in the east and north directions than with smoothed intervals of 100 or 200 epochs. However, if the smoothed interval is too small, the positioning precision in the up direction cannot be further improved and may even become worse. As shown in Figure 11, when the smoothed interval is set to 5 epochs, the positioning error in the up direction is the largest among the different settings. The reason is that when the smoothed interval is too small, the carrier phase cannot smooth the pseudorange well, and the smoothed pseudorange measurement degenerates into the plain pseudorange measurement.
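Carrier phase smoothing of this kind is commonly realized as a Hatch filter, where the smoothed interval bounds the averaging window and a cycle slip forces reinitialization. The following is a minimal sketch under that assumption; the class name and units (carrier phase already converted to meters) are illustrative, not the paper's implementation:

```python
class HatchFilter:
    """Sketch of a carrier phase smoothed pseudorange (Hatch filter).
    `window` is the smoothed interval in epochs; a detected cycle slip
    reinitializes the filter with the raw pseudorange."""

    def __init__(self, window=20):
        self.window = window
        self.count = 0          # epochs since (re)initialization
        self.smoothed = 0.0
        self.prev_phase = 0.0

    def update(self, pseudorange, carrier_phase, cycle_slip=False):
        if cycle_slip or self.count == 0:
            # Lock lost or first epoch: restart from the raw pseudorange.
            self.count = 1
            self.smoothed = pseudorange
        else:
            m = min(self.count + 1, self.window)
            # Propagate the previous smoothed range with the phase delta,
            # then blend it with the new raw pseudorange.
            predicted = self.smoothed + (carrier_phase - self.prev_phase)
            self.smoothed = pseudorange / m + predicted * (m - 1) / m
            self.count += 1
        self.prev_phase = carrier_phase
        return self.smoothed

f = HatchFilter(window=20)
print(f.update(100.0, 50.0))        # first epoch: raw pseudorange
print(f.update(102.0, 51.0))        # blended with phase-propagated range
```

A very small window makes the blend weight of the raw pseudorange large, which matches the degeneration toward the plain pseudorange discussed above.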

Line Feature Tracking Threshold
In the nonlinear optimization stage, if the number of keyframes in which a line feature is continuously observed is less than a threshold, that line feature is filtered out. The choice of threshold has a great impact on the robustness and positioning precision. As shown in Figure 12, a threshold of 2 means that line segments observed by two or more consecutive keyframes are added to the factor graph. However, since such line segments are tracked for too few frames, they are not stable enough for nonlinear optimization, which reduces the positioning precision. In contrast, with a line feature tracking threshold of 3 or 4, the proposed system obtains more accurate positioning results in the ENU frame, because line segments observed by three or four consecutive keyframes are more stable and better suited for nonlinear optimization.
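As a sketch of this filtering step, the selection by tracking length could look as follows; the data layout (feature id mapped to the ids of observing keyframes) and the function name are hypothetical, not taken from the paper's code:

```python
def filter_line_features(observations, threshold=3):
    """Keep only line features observed in at least `threshold`
    consecutive keyframes. `observations` maps a line feature id to
    the list of keyframe ids in which that feature was tracked."""
    kept = {}
    for feat_id, kf_ids in observations.items():
        kf_ids = sorted(kf_ids)
        # Length of the longest run of consecutive keyframe ids.
        best = run = 1
        for prev, cur in zip(kf_ids, kf_ids[1:]):
            run = run + 1 if cur == prev + 1 else 1
            best = max(best, run)
        if best >= threshold:
            kept[feat_id] = kf_ids
    return kept

obs = {1: [0, 1, 2, 3],   # tracked over 4 consecutive keyframes -> kept
       2: [0, 2, 4],      # never consecutive -> dropped
       3: [5, 6]}         # only 2 consecutive -> dropped at threshold 3
print(sorted(filter_line_features(obs, threshold=3)))
```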

Conclusions
In this paper, we propose a tightly coupled GNSS-VIO system based on point-line features. First, since line features contain more environmental structure information, we added line features to the system. Second, the pseudorange measurement can only achieve meter-level positioning precision but is fast and unambiguous. On the contrary, the carrier phase measurement can achieve centimeter-level positioning precision but requires solving the whole-cycle ambiguity, which is time-consuming. Therefore, we propose to combine the advantages of the two measurements and replace the pseudorange with the carrier phase smoothed pseudorange, which greatly improves the performance of our system. Third, we considered the extrinsic translation parameter between the GNSS receiver and IMU, and our system can perform real-time parameter calibration. Finally, we demonstrated that if the states are represented in the ECEF frame, the tightly coupled GNSS-VIO system is fully observable. We conducted experiments on public datasets, which show that our system achieves high-precision, robust, and real-time localization. In the future, we will further focus on improved methods for tightly coupled GNSS-VIO systems.

Conflicts of Interest:
The authors declare no conflict of interest.