Monocular Camera/IMU/GNSS Integration for Ground Vehicle Navigation in Challenging GNSS Environments

Low-cost MEMS-based IMUs, video cameras and portable GNSS devices are commercially available for automotive applications and some manufacturers have already integrated such facilities into their vehicle systems. GNSS provides positioning, navigation and timing solutions to users worldwide. However, signal attenuation, reflections or blockages may give rise to positioning difficulties. As opposed to GNSS, a generic IMU, which is independent of electromagnetic wave reception, can calculate a high-bandwidth navigation solution, however the output from a self-contained IMU accumulates errors over time. In addition, video cameras also possess great potential as alternate sensors in the navigation community, particularly in challenging GNSS environments and are becoming more common as options in vehicles. Aiming at taking advantage of these existing onboard technologies for ground vehicle navigation in challenging environments, this paper develops an integrated camera/IMU/GNSS system based on the extended Kalman filter (EKF). Our proposed integration architecture is examined using a live dataset collected in an operational traffic environment. The experimental results demonstrate that the proposed integrated system provides accurate estimations and potentially outperforms the tightly coupled GNSS/IMU integration in challenging environments with sparse GNSS observations.

The remainder of the paper proceeds as follows: we first introduce the coordinate frames utilized in this paper. The inertial navigation and its error model are then summarized. Thirdly, the principle of computer vision-based motion estimation is presented. GNSS simulation and differential processing are discussed afterwards. Based on the above sensor description, the proposed EKF model and integrated architecture are established, followed by the live test validation. The paper concludes with the results as well as planned future work. Figure 1 illustrates three major coordinate frames on the Earth ellipsoid. The inertial coordinate frame (i frame) -i i i O X Y Z is the fundamental reference frame for inertial navigation and an IMU enclosure usually measures the specific force as well as the angular rate relative to the i frame. Its origin is on the Earth's center of mass O and it is stationary relative to the fixed stars. We usually express the ultimate position solution, latitude ϕ , longitude λ and height h, in the Earth frame (e frame) -e e e O X Y Z . Its origin is the Earth's center of mass O , with e Z axis forming along polar axis and e X axis lying on the intersection of the equatorial plane and prime meridian plane. The e Y axis is also on the equatorial plane and satisfies the right-hand rule. The origin and Z-axes of the i and e frames are coincident, respectively. In this paper, the e frame-based position solutions are based on the WGS-84 ellipsoid. To describe the velocity as well as the orientation of a ground vehicle, the right-handed navigation frame (n frame) -n n n P X Y Z is utilized and locally placed at the user's position P . The n Z axis is collinear with the local normal line of the reference ellipsoid pointing downwards while n X and n Y axes indicate the local north and east direction, respectively.  ω from an IMU are usually measured relative to the i frame and expressed in the body frame (b frame) .The b frame has its origin at the center of the IMU enclosure. If the IMU is mounted parallel to the vehicle frame, we allocate the b X axis pointing forward, in view of the vehicle, and b Z axis aligning with local gravity direction. The b Y axis satisfies the right-hand rule and indicates the right-side direction of the vehicle. Based on the pinhole projection model in our research, the image feature points are expressed in the camera frame (c frame) which is extended from the 2D image plane. The c frame representation is shown in Figure 2. The c frame's origin is at the camera center C , and c X as well as c Y axes point towards left and upward directions of the image plane, correspondingly. The c Z axis lies along the principal axis and is orthogonal to the image plane. f denotes the focal length of the camera.  In this paper, the vehicle dynamics model is established in terms of the b frame instead of other frames. According to Hol [15], the b frame based motion estimate has less modeling complexity and is prone to eliminating the influence of angular and position installation errors.

Inertial Navigation and IMU Error Model
A six-degree-of-freedom (6-DOF) strapdown IMU provides the user high-bandwidth position, velocity and orientation estimation with complete independence of the reception of electromagnetic waves as following differential equations [16]: In Equation (1), n b C is the direction cosine matrix representing the rotation between the b and n frames. The variables n ie ω and n en ω are the Earth's rotation rate and body transport rate, respectively. The symbol × denotes the skew symmetric matrix of a specific angular rate term. The calculated n b C is then used to estimate the acceleration in terms of the n frame by multiplication with measured specific force b ib f . As described in Equation (2), Coriolis force, body transport rate and local gravity terms are compensated in the estimation. Although the local gravity changes with latitude ϕ and height h , the insensitivity can be ignored for a ground vehicle. In Equation (3), the position update is expressed in the e frame and therefore needs M R and N R , radii of curvatures in the meridian and prime vertical, to be estimated.
Although a strapdown IMU propagates the high-frequency inertial navigation mathematically derived from Equations (1) to (3), the MEMS accelerometer and gyroscope sensors inevitably drift and give rise to quadratic and cubic accumulated mechanization errors. Researchers have developed error propagation models in various approaches for practical applications such as [17][18][19]. To describe a ground vehicle in low dynamics with well-calibrated a priori information, the Ψ angle model [20] is introduced in this paper on account of simplicity: In the above equations, e R is the Earth radius; δω are the perturbation terms of position, velocity, orientation and sensor biases, respectively. As noted earlier, this paper aims to efficiently integrate data from camera, IMU and GNSS sensors to achieve high accuracy ground vehicle navigation in challenging GNSS environments. Therefore, a model for reasonably calibrating the IMU sensor biases needs to be established considering the practical situation. A commonly cited first-order Gauss-Markov model regards the IMU errors as 3 independent components: a random constant, a Gauss-Markov variable and a white noise term and therein the accelerometer/gyroscope error propagations are written as following equations [16,18]: The subscripts of c and m indicate the random constant and first-order Gauss-Markov terms, and their derivatives yield: where the constants T f and T w are the correlation times; w f , w w , µ f and µ w are assumed uncorrelated and correlated white Gaussian noise (WGN) for inertial sensors. Equations (4) to (9) construct the fundamental linearized dynamic system of the EKF integrated system which will be further discussed in Section 6.

Motion Estimation in Computer Vision
The camera requires calibration prior to the computer vision routine for motion estimation. To effectively estimate the intrinsic camera parameters as well as distortion characteristics of the images, researchers have developed practical calibration methods such as [21][22][23] and there exist open source toolkits for calibration-related tasks. Normally, the camera calibration parameters do not change rapidly over time, moreover, the run-to-run calibration differences are even negligible within a limited period of time, provided the camera is not reconfigured. In addition, as opposed to an IMU, a temperature-independent CMOS/CCD sensor makes the camera resistant to ambient environmental change. Therefore, compared to the inertial sensors, the camera exhibits different calibration mechanism and characteristics coupled with a diverse model for long-term stability.
As mentioned previously, the perspective projection with a pinhole camera model is used. To estimate the motion in successive image frames, the computer vision module executes three main functions: (1) feature extraction and matching; (2) outlier removal and discrimination of the moving features; (3) motion estimation.

Feature Extraction and Matching
Feature extraction is the essential process for subsequent motion estimation. As an outdoor environment usually has a high-contrast texture in the image scene, image points with distinct structural information are selected as features. In computer vision community, the scale-invariant feature transform (SIFT) algorithm [24] is a well-accepted approach to describe and match the feature points which are usually invariant to distortion, scaling, orientation, affine transformation, and illumination changes. Although several faster algorithms, such as PCA [25] and SURF [26], facilitate reducing the computational load, the point location accuracy can be in turn degraded [27]. We, consequently, use the SIFT algorithm to locate the feature points and create the corresponding scale-invariant descriptor vectors by measuring the gradient magnitude and orientation in the pre-defined pixel neighborhood. To provide sufficient dimensionalities in the feature space for favorable matching result, a 128-dimensional descriptor: is used for each pre-selected feature point. In adjacent images, the similarity of two candidate feature points, A and B, is given by the Euclidean distance: By setting the empirically examined accept-reject threshold, the feature matching process is evaluated using the ratio of the minimum distance to the next minimum [24]. Ideally, if an image scene contains a non-uniform texture, there should be a large amount of SIFT features which guarantee desirable redundancy for optimized motion estimation. However, there inevitably exist outliers due to the limited descriptor dimensionality and the imperfect matching determination criteria. Therefore, an outlier removal scheme is undertaken prior to the motion determination process. In order to remove the outlier correspondences which potentially give rise to undesirable estimation, the random sample consensus (RANSAC) algorithm [28] is taken into consideration in this paper.

Outlier Removal by RANSAC
Based on the primary principle of epipolar geometry, all corresponding image points satisfy the coplanarity equation: x Fx 0 (12) where F is a 3 × 3 fundamental matrix which geometrically describes the projection between two corresponding points in a pair of images representing the same 3D object; x and x′ are the coordinates of the points in image planes, respectively. It is supposed that RANSAC algorithm estimates the fundamental matrix F which optimally fits all the inliers. However, RANSAC may miss out a few outliers and estimate a biased F matrix due to the improper number of iteration operations and a relatively large number of outliers. Consequently, a multilayer RANSAC scheme [29] is proposed and implemented as follows: (1) arbitrarily selecting a subset samples from all pre-selected matching features; (2) reconstructing epipolar geometry constraint satisfying Equation (12) and computing the fundamental matrix F based on the selected samples; (3) using the same constraint and the calculated matrix F to determine the error from all features; (4) with a proper threshold, categorizing all feature correspondences as inliers or outliers by appraising the error statistics; (5) iterating the first three steps multiple times (refer to [28] for a favorable number of iterations); (6) finding the best-fit matrix F which yields the optimal error estimates and removing the outliers discriminated by this matrix F; (7) repeating the first six steps with a gradually shrunken inlier/outlier determination threshold.
To test the multilayer RANSAC scheme, we use a sequence of images (2.5 Hz rate) which will be elaborated in Section 7. Figure 3 shows the number of originally matched features as well as that after implementing multilayer RANSAC. It is clearly seen that after the fourth layer of RANSAC processing, there are still hundreds of matched features for every moment to maintain observable redundancy for motion estimation.  Original matching After 1st RANSAC After 2nd RANSAC After 3rd RANSAC After 4th RANSAC lower subplots present the refined matching results by applying multilayer RANSAC scheme. As it has shown, all the remainder feature correspondences are recognized as inliers and no feature locates on the moving van any more.

Motion Estimation
After the refinement of feature matching by implementing the multilayer RANSAC scheme, we also compute the most reasonable estimate of the matrix F. According to the primary principles of epipolar geometry and the properties of fundamental matrix [30], we have the following relationship between the motion of the camera and fundamental matrix: where is the skew symmetric of camera translation; R is the rotation matrix; K contains the intrinsic calibration parameters and E is the essential matrix. By implementing the singular value decomposition (SVD), the camera rotation R and unit translation vector regarding two adjacent images can be finally retrieved and serve as the vision-based navigation parameters for integration purposes. It is worth noting that since the essential matrix E has only five degrees of freedom, the 3× 1 vector is resolved up to scale and is accordingly subject to an ambiguity issue in terms of the translation magnitude. If the absolute magnitude of the translation between two successive images is extracted by other approaches, which we will elaborate in Section 5, the scale ambiguity of can be finally resolved.

Determination of Translation Magnitude from GNSS Measurements
In an effort to resolve the scale ambiguity issue of a monocular camera, there are several available approaches, such as using the wheel tick and inertial sensors [6]. Before resorting to the inertial-aided approach, a decorrelation solution between sensor error and inertial mechanization is desired. An analogous situation applies to using a wheel tick sensor if its drift arises quickly. Conversely, as one of the most widely applied outdoor positioning technologies, GNSS does not yield accumulated ranging or positioning errors, and can provide the capability to resolve the scale ambiguity problem of a monocular camera by leveraging GNSS differential technique. If the acquired GNSS and image data are synchronized, the baseline of position change between two GNSS epochs coincides with the desired camera translation. The GNSS observation equation for pseudorange measurements is given by: where: i P is the pseudorange measurement; The superscript i indicates the satellite index and k represents the total tracked number of satellites. Resorting to the concept of differential positioning technology, Equation (14) between two epochs yield differential expression as follows: where the notation δ depicts the differential operation for each term. Under low dynamic conditions with short time intervals, the differential operation adequately eliminates the ionospheric and tropospheric effects. For the differential term of the satellite clock error, it can be either calculated using the decoded ephemeris or simply ignored as the satellite clock drift can be neglected in a short temporal scale. According to van Graas [31] and Soloviev [6], the true range differential i δρ correlates the baseline magnitude of position change and the relationship satisfies: where dot(, ) defines the vector dot product; R i and R r denote the position vectors of the satellite and the receiver in the e frame; is the unit vector along the signal reception path.
According to the commutative and distributive laws of the vector dot product, Equation (16) yields the baseline of position change R r δ as well as the change of the unit vector u δ . With the satellite broadcast ephemeris, the satellite position vector R ( 1) i t − and R ( ) i t can be easily estimated. In addition, the user's previous position vector R ( 1) r t − has been optimally estimated by the filter of the integrated system, and the present position vector R ( ) r t in Equation (16) can be initially determined by inertial navigation given a short time interval. The R ( ) r t at this stage is not for the ultimate position solution at the present time t but for the upcoming R r δ estimation. Note that after the dot product operation, the desired unknown becomes the scalar R r δ instead of R r δ .
Combining the Equations (15) and (16) with rearranged known/unknown quantities, the revised differential equation is expressed as: where: i P δ ′ is the pseudorange differential term compensated by the known quantities of the dot product operation; i r θ represents the included angle between the vectors of R r δ and u .
With two or more available pseudorange observations, the unknown terms of R r δ and r t δ can be derived by implementing the least-square fit. It is worth noting that the LOS measurements play an important role in accurate R r δ estimation as a contaminated measurement does not purely satisfy the above-mentioned satellite/user relationship and will give rise to a biased estimate of R r δ . In an effort to conduct multipath detection and mitigation in challenging GNSS environments, previous researchers have applied several approaches [32,33] by leveraging pseudorange, carrier phase or signal-to-noise ratio (SNR) measurements based on the test statistics. However, difficulties arise to road applications as a result of undesirable detection performance and required hardware upgrades [34]. Another easier option uses a weighting scheme by evaluating the satellite elevation angle and carrier-to-noise ratio (C/No) to restrain the multipath effect. With the synchronized GNSS and camera observations, the camera scale ambiguity can be finally resolved by the estimation of the baseline magnitude of position change R r δ between two GNSS epochs.

EKF Modeling and Implementation
To optimally implement sensor integration, several forms of non-linear filtering techniques were previously developed and validated, such as the EKF, unscented Kalman filter (UKF) as well as particle filter (PF). An EKF design linearizes the system and measurement models by considering a first-order Taylor series expansion at the predicted state estimate. This first-order approximation enables the linearized models to implement standard Kalman filter processing. In a typical UKF design, particularly for high non-linear instances, the selected sigma points through the unscented transform estimate the mean and covariance of the state vector. Based on the Bayesian estimation theory, the PF design is supposed to achieve a better solution compared to the EKF and UKF implementations if it is properly established. Nevertheless, the intense computational burden of the UKF and PF processing, one of their prominent shortcomings, brings about severe time-lag estimation and fatally hinders real-time road applications. On the other hand, for low-dynamic applications such as consumer-grade ground vehicle navigation, a time-efficient EKF-based sensor integration scheme requires limited computation and can agree well with the actual situation in terms of the accuracy in statistics [35,36]. The EKF approach is thus adopted as the proposed filter design.
As described previously, Equations (4) to (9) serve as the EKF discretized continuous-time model which is summarized as:

Fx
Gw , w ~ 0, Q In Equation (18), F and G are the linearized dynamic matrix and input coefficient matrix with constant dimensionalities. Additive white Gaussian noise of the system is denoted as w with a power spectral density Q and given by: w where the four triads denote uncorrelated and correlated noises of the sensor biases. The 15-element system state vector is given by: where the triad perturbation terms of position, velocity, orientation and sensor biases are included.
As camera-derived navigation parameters, the rotation matrix and translation unit vector require additional transformations to obtain the camera-based position and orientation under the e and n frames, correspondingly. Let 0 be the initial IMU orientation matrix and be the constant rotation matrix between the c and b frames. Assuming the origins of the c and b frames coincide, the camera-based orientation can be expressed as: where is the accumulated 3 × 3 camera rotation from the first image and satisfies: Assuming contains only WGN, the orientation error term Ψ in Equation (20) correlates as well as IMU-based orientation with the relationship of: where matrix Γ consists of the measurement noises. Rearranging Equation (23) yields: Let , , , and , be the corresponding elements from the matrix of right-hand side multiplication and construct: According to Equations (24) to (26), the linear observation equation with respect to the orientation is finally given by: On the other hand, a camera-derived translation unit vector is transformed as: which means a unit vector expressed in the n frame. Note that the orientation matrix does not refer to the present moment but the instant of previous image update and the caret ^ indicates the EKF-based estimation. By obtaining the translation magnitude R which has been resolved in Section 5 by using differential LOS GNSS pseudoranges over epochs, the camera-based position is expressed as: In the above equation, depicts the camera-based position at the present moment, and 1 indicates the previous EKF position update, both expressed in the e frame. By subtracting the IMU-based position from , the observation equation with respect to the position is established by: (30) where denotes the measurement noise. Combining Equations (27) and (30) yields the EKF measurement model as below: In Equation (31), the measurement and noise vectors are respectively given by: v is assumed as WGN vector with a measurement noise covariance R . The measurement model matrix H is given by: Following the standard EKF procedure, the state vector as well as its covariance matrix can be estimated and applied to ground vehicle navigation. After each update cycle, the state vector is reset to zero and refreshed by the next update operation. Figure 5 depicts an overall flowchart for integrating the camera, IMU and GNSS sensors through the EKF engine. The sensor units are colored cyan while white blocks indicate the utilized algorithms for intermediate parameter calculation. Between camera/GNSS data update, the vehicle motion is estimated by strapdown IMU mechanization. Once new camera/GNSS measurement data arrive, all derived navigation information is transformed in order to enable the EKF engine for accurate navigation solution and IMU sensor calibration. In such a method, the measurement model is straightforward to construct as relatively small/constant size of measurement elements are included in the EKF. The whole processing requires a relatively low computational load since covariance matrices have low dimensionality. Finally, a low-pass filter before implementing the EKF can be added to smooth the camera-derived rotation and translation parameters.

Experimental Description and Result Analysis
Our proposed integrated architecture is examined through a live test which was collected by a ground vehicle in an operational traffic environment in Málaga, Spain and is publicly available online [13]. The IMU and monocular forward-looking camera sensors were rigidly fixed on the vehicle rooftop. In order to provide a truth reference solution, three RTK GPS receivers were installed on the vehicle to generate centimeter-level positioning and orientation solutions. In total six drive segments have been accomplished, however the corresponding truth references were vulnerable to limited signal qualities. The Campus_0L segment had full range availability of truth reference and contains most common ground vehicle dynamics, such as acceleration, deceleration, stillness, forward driving, cornering, etc., and it is therefore adopted in this paper to test our integrated design. Other sensors utilized during the drive test, such as laser scanners, are disregarded. Table 1 presents some basic information of the overall drive test. In attempt to keep the recorded data from all sensors available during the whole test, we deliberately remove the beginning of the dataset when the vehicle was keeping static and the sensors were being powered on sequentially. The timespan in Table 1 is, consequently, approximately 20 s shorter than in the actual case.  Table 2 lists the individual IMU sensor specification of Xsens MTi enclosure [37]. Although this product enables 3D axes-based magnetometer to compensate for gyroscope drift, we ignore the magnetic field data and just make use of the 6-DOF raw angular rate and specific force measurements.  Table 3 summarizes the major intrinsic calibration parameters from the camera. There were actually two cameras of the same model mounted at both sides in front of the vehicle rooftop. Without loss of generality, only the left camera is employed in this paper and parameters in Table 3 are, accordingly, derived from the left one. The parameters f x as well as f y denote the location of the principal point by means of pixel element on the image plane. The focal length is, similarly, defined in terms of pixel element in both x and y axes, and is hence depicted as c x and c y . The remainder of the parameters in Table 3 are radial and tangential distortions, respectively. In the Campus_0L segment, the rectified image frames are utilized instead of the originally captured ones to avoid the image projection offset and obtain accurate motion estimation parameters for sensor integration.
Although the b and c frames do not necessarily coincide with each other as in our case, the camera and IMU sensors are usually rigidly fixed to the vehicle and their relative translation and rotation parameters are, therefore, constant. The rotation calibration between the b and c frames can be accomplished according to Lobo [38] while the lever arm vector needs more effort to be accurately estimated. The difficulties of lever arm vector estimation arise due to the sensitivity of the selected target position and the geometric configuration [38]. Normally, in low dynamic situations with moderate maneuvers, particularly for ground vehicle applications, a mild lever arm has little impact little on the navigation solution and it does not necessarily require an accurate estimation [15,39]. Therefore, although both rotation and translation calibration results between the b and c frames are provided in the dataset, we simply disregard the lever arm effect caused by the translation and take only the relative rotation b c C into consideration. As described in Section 5, the scale factor ambiguity between every two successive image frames can be resolved using GNSS differential techniques, capable of obtaining the distance traveled over ground. In an effort to acquire more available GNSS measurements in potentially challenging environments, we take advantage of both GPS and GLONASS constellations for the pseudorange simulation. In typical challenging GNSS environments like urban downtown areas, low-elevation-angle signals are prone to severe multipath degradation or obstruction caused by nearby buildings. We, therefore, filter out those low-elevation-angle satellites from subsequent integration demonstrations. Figure 6 shows the sky plot of GPS/GLONASS satellites when the dataset was initially collected in Málaga with the elevation mask angle of 40 degree. The GPS pseudo random number (PRN) is between 1 and 32, while GLONASS orbital slot is designated within the range 51 to 74. Although the elevation mask angle is significantly higher than in most other applications, both constellations contributed 8 space vehicles for the integration processing. According to the differential operation described in Equation (15), most of the common error sources can be eliminated from the satellite-touser geometry.
It is simulated in this paper that the LOS pseudorange measurements contain thermal noise with 0.25 m root-mean-square error (RMSE) during the whole Campus_0L segment. Note that the two constellations do not share the same coordinate system as well as time reference. In order to avoid newly introduced unknown quantities, the transformation between their coordinate frames has been implemented by using the approach presented by Cook [40] and, similarly, the system time offset can be extracted from the ephemerides of all modernized GLONASS-M satellites [41].
All the camera, IMU and GNSS measurements are timestamped. Instead of using the original 7.5 Hz frame rate, we deliberately downsample the visual measurements to 2.5 Hz to reduce the computational load and maintain preferable alignment with the 100 Hz IMU measurements for the EKF state update estimation. Since the IMU measurements were not necessarily synchronized with the image frames, an extrapolation process of IMU measurements is expected to establish synchronization consistency. According to You [42], a second-order polynomial extrapolation is utilized to balance the computational load and the accuracy of extrapolation. In order to demonstrate and assess how different sensory-integration schemes perform in challenging environments, we gradually reduce the number of visible GPS/GLONASS satellites with lower elevation angles over time and compare the performance between the tightly coupled GNSS/IMU integration and our proposed camera/IMU/GNSS scheme.
A Google Earth visualization of 1 Hz navigation trajectories of different integration schemes is qualitatively given in Figure 7 with common start/end times. The drive test started at the lower left and ended at the upper middle with the upside indicating the north direction. The black trajectory serves as the truth reference which was obtained by leveraging three RTK GPS receivers. The tightly coupled GNSS/IMU solution is illustrated in red and the vision/IMU/GNSS integration is shown as cyan line. Two purple arrows directly denote the place where the number of available satellites changes. A more quantitative comparison of the horizontal accuracy is shown in Figure 8 to reveal how the different integrated architectures react in terms of reduced observability of GNSS measurements in challenging environments.
Based on Figures 7 and 8, the tight coupling of GNSS and IMU measurements yields an accurate solution when an adequate number of satellites are available (above four). Meanwhile, the gradually accumulated error of proposed camera/IMU/GNSS approach reaches a maximum of 5 m before the number of visible satellites drops to three. The small systematic error mainly stems from erosions of unexpected pixel positioning error, camera calibration error, numerical instability, and remnants of the outliers after RANSAC processing and so forth. After the number of satellites drops to three, the tightly coupled GNSS/IMU integration still enables a continuous navigation solution with insufficient observability, however the estimation tends to sharply drift away from the truth reference over time and its horizontal error exceeds 20 m at the end of the test. Contrarily, compared to previous navigation results, the camera/IMU/GNSS integration yields a slightly better accuracy with only three utilizable satellites. The reasons for this phenomenon are twofold. First, based on Equation (17) which contains two unknown quantities, three observation equations still maintain a redundant constraint provided that the measurements are derived from LOS directions. More importantly, the systematic error of the image processing module gives rise to the reversal of the positioning error during the vehicle's U-turn operation. As seen in Figure 7, the cyan line slightly tends to the left side of truth reference prior to the U-turn and therefore the same directionality of the systematic error makes the horizontal accuracy be partially compensated after the U-turn.   According to the above-mentioned facts, the camera/IMU/GNSS integration brings us a potential advantage compared with the traditional tightly coupled GNSS/IMU technique in challenging environments with sparse GNSS observations. A more detailed horizontal positioning error analysis of the camera/IMU/GNSS integration is presented in Figure 9 with the same hypothesis.
The left side shows the horizontal positioning error in two different zoom levels. The right side shows the histograms, cumulative distribution curves and other related statistics. All the subfigures and corresponding statistics are based on 1 Hz solution. Refer to [1,43] which define these statistics/plots in detail.
Researchers sometimes refer to quaternions, rather than Euler angles, for orientation representation, particularly for high dynamic applications [14,15]. The quaternion expression avoids the gimbal lock phenomenon when the pitch angle approaches ±90 deg, and operates more efficiently compared with multiplications of direction cosine matrices. Whereas, since a ground vehicle is incapable of operating orthogonal to the ground plane, Euler angles do not suffer from the singularity problem and can provide an intuitive manner for the user to perceive the vehicle direction. We, therefore, choose roll, pitch and yaw angles for representing the vehicle's orientation based on Tait-Bryan convention. Figure 10 shows the yaw angle error statistics of the camera/IMU/GNSS integration. Yaw jitters occur as a result of the abrupt change in the centrifugal force when the vehicle underwent the U-turn operation. Although the angle residual is maintained within 1 deg in a majority of the time during the test segment, the relatively low level of systematic error still can be observed from the zoomed-in subfigure due to the imperfection of the computer vision module. This further proves the reversal of the positioning error during the vehicle's cornering as shown in Figure 8. It can be inferred that the accumulated position and orientation errors will gradually erode the navigation solution if the dataset is long enough without any other sources of corrections. In addition, the IMU sensor bias estimation is shown in Figure 11. Those large biases from both the accelerometer and gyroscope sensors inevitably cause erroneous IMU mechanization solutions in a matter of seconds without sensor calibration. The non-smoothly varying biases, particularly for the accelerometer components, gave rise to the unpredictability of the sensor errors. Simply averaging the sensor biases within even a few seconds may yield an inaccurate navigation solution, particularly when the vehicle dynamics significantly change with velocity and cornering stiffness. As seen in Figure 11, during the vehicle's cornering, the accelerometer bias on x-axis dramatically dropped by approximately 200 milli-g. However, the actual sensor biases may be, to a certain extent, different from the estimation shown in Figure 11 in terms of the fidelities of the utilized filtering models [16].  In an effort to balance the tradeoff between practical performance and computational cost, we deliberately reduce the image frame rate and examine the position solution of our integrated scheme by comparing with the truth reference. Figure 12 demonstrates a comparison of the horizontal positioning errors with different image frame rates based on the same GNSS observability as discussed previously.
As the interval of images is gradually enlarged, EKF filtering leans more on the inertial navigation alone which gives rise to larger location and orientation errors, and moreover, those fewer detected  feature correspondences tend to degrade the reliability of camera motion estimation. The sawtooth-shaped curves therefore arise with worse accuracy. During the U-turn operation when the images partially overlap each other, SIFT/RANSAC algorithms acquire much fewer feature correspondences with lower frame rate, which results in abrupt change in positioning accuracy as seen in Figure 12, particularly for the low frame rates of 0.25 and 0.50 Hz. The worst situation may occur when no overlap can be found in two consecutive images. Consequently, the strategy to determine an image frame rate is primarily dependent on the vehicle dynamics, IMU quality and fidelity of utilized filtering models. In our case, 0.25 Hz frame rate reaches the lower limit as during the U-turn the gaps of the image overlap arise for a frame interval of approximately 4 s. It is worth noting that the vision-based error characteristics are relevant to the processed images and the systematic error therefore varies with the frame rate as seen in Figure 12, particularly for the time period after U-turn.A higher image frame rate tends to yield better integration accuracy at the cost of more excessive computational load, whereas the acceptable processing speed inevitably depends on the available computational capability. In Campus_0L segment, we used a 64-bit desktop computer with a 3 GHz processor to process the 2.5 Hz rate images with 1,024 × 768 resolution based on a MATLAB platform. Our implementation requires a few seconds of processing for each pair of images and tens of milliseconds for all other filtering routines and, consequently, has yet to achieve real-time functionality. Moore's law, fortunately, has been appropriately indicating the rapid development of integrated circuits during the past decades and also has given us encouraging prediction on the exponentially growing processing capacity of semiconductor materials in the future. In addition, instead of using a CPU, a GPU module also facilitates speeding up the image processing task towards a real-time computation.

Conclusions
Aiming at taking advantage of the existing onboard technologies for ground vehicle navigation in challenging environments, this paper has developed an integrated camera/IMU/GNSS system based on the EKF design. The estimated motion parameters from successive image frames were utilized to compensate the drifting IMU sensor biases and maintain the accurate strapdown mechanization between the update of image data. For resolving the scale ambiguity problem of a monocular camera, GNSS differential technique provided an effective manner to extract the baseline magnitude of position change. Based on the EKF design, the proposed integrated system is able to provide 15-state high-bandwidth navigation solutions. Our implementation has been validated through a 5-minute long drive test which provided raw camera and IMU data as well as centimeter-accuracy truth reference. For a reasonable simulation of challenging GNSS environments, we gradually decreased the number of visible satellites with lower elevation angles over time. The experiment has provided results with high accuracies, a majority to the sub-degree level in the yaw angle and meter level in the horizontal plane. Furthermore, it was demonstrated that under the condition of sparse GNSS observations, the camera/IMU/GNSS integration potentially outperformed the tightly-coupled GNSS/IMU scheme which has yielded diverging navigation solutions. The camera/IMU/GNSS integration has also been tested with different image frame rates and the results revealed that the accuracy deteriorates with decreasing frame rates. Future work will further integrate GNSS measurements into the filtering architecture for better error growth control and automatically enable the loosely/tightly coupled GNSS/IMU integration in case relatively favorable GNSS conditions arise. Future work also includes a target objective of real-time functionality with revised computer vision algorithms on the GPU platform.