Accurate Collaborative Globally-Referenced Digital Mapping with Standard GNSS

Exchange of location and sensor data among connected and automated vehicles will demand accurate global referencing of the digital maps currently being developed to aid positioning for automated driving. This paper explores the limit of such maps’ globally-referenced position accuracy when the mapping agents are equipped with low-cost Global Navigation Satellite System (GNSS) receivers performing standard code-phase-based navigation, and presents a globally-referenced electro-optical simultaneous localization and mapping pipeline, called GEOSLAM, designed to achieve this limit. The key accuracy-limiting factor is shown to be the asymptotic average of the error sources that impair standard GNSS positioning. Asymptotic statistics of each GNSS error source are analyzed through both simulation and empirical data to show that sub-50-cm accurate digital mapping is feasible in the horizontal plane after multiple mapping sessions with standard GNSS, but larger biases persist in the vertical direction. GEOSLAM achieves this accuracy by (i) incorporating standard GNSS position estimates in the visual SLAM framework, (ii) merging digital maps from multiple mapping sessions, and (iii) jointly optimizing structure and motion with respect to time-separated GNSS measurements.


Introduction
Localization is one of the primary operations that connected and automated vehicles must perform, both to navigate from one location to another and to interact with each other and with their surroundings within a mapped environment. Satellite-based navigation sensors have historically been the unrivalled sensor of choice for navigating from source to destination. However, the high-reliability sub-50-cm precision demanded by automated vehicles for lane-keeping and other applications, especially in urban areas, has significantly changed this landscape [1]. In most automated vehicles being developed, the Global Positioning System (GPS)/Global Navigation Satellite System (GNSS) is relegated to a secondary sensor whose role is to loosely constrain (within a few meters) the primary localization sensors, usually camera(s) and/or LiDAR, to a global reference frame when building a digital map. The vehicles then locate themselves to decimeter accuracy within this digital map.
Automated driving does not necessarily demand sub-50-cm agreement between the coordinates of a given point in the digital map and the coordinates of the same point in a well-defined global frame of reference, such as the World Geodetic System 1984 or the International Terrestrial Reference Frame. Multi-session operation of GEOSLAM is demonstrated using camera and GNSS data collected in a moderate urban environment, and the accuracy of global localization with the multi-session GEOSLAM map is assessed with respect to carrier-phase differential GNSS (CDGNSS)-based ground truth.

Previous Work
Improving the accuracy of maps by averaging GPS/GNSS tracks has been explored previously using a variety of approaches. An early effort, detailed in [8], proposed the precise determination of lane centerlines by clustering and averaging the GNSS tracks of probe vehicles. The accuracy of the estimated centerline was assessed in terms of the spread of GNSS tracks, assuming, without analysis, that the error was zero-mean at every location. Likewise, [9] invoked the zero-mean assumption without examination. More recently, [10] proposed vehicle lane determination via precise point positioning (PPP) on a rural road under open-sky conditions. The current paper aims to perform localization at a similar accuracy level, but in urban and suburban areas and with the aid of a digital mapping sensor.
Minimizing the difference between GNSS measurements and the assigned map coordinates of locations visited multiple times by probe vehicles has been a common feature of the seminal works on map-based precise localization in urban environments for automated driving [4,5], but no analysis of the accuracy of the resulting map in the global coordinate system was provided.
The effect of multipath on measured pseudoranges was studied extensively for various signal types in [11]. However, this study was done under open-sky conditions with a static survey-grade antenna, hardly representative of a mass-market receiver in an urban environment. The effect of multipath from a single large reflector on a static receiver was studied in [12], where it was shown that the position solution of the receiver was significantly biased when tracking multipath-afflicted signals. However, no details were provided about the tracking strategy and navigation filter. Multipath characterization for a dynamic receiver in an urban area was performed in [13], but the study was carried out for a single run through the test area, and many of the important error sources (e.g., atmospheric modeling errors) were assumed to be negligible. A detailed study on the distribution of code-phase and Doppler offsets of the multipath components from individual satellites in a dynamic urban setting was carried out in [14]. However, the error was characterized as the combined distribution of code phase delays over the entire duration of the run, which marginalizes over the temporally- and physically-local biases. In contrast, this paper explores the errors in the position domain for repeated sessions through a given realization of an urban corridor.
Other GNSS error sources such as errors in modeling of ionospheric [15] and tropospheric [16] delay have been studied extensively over many decades, and their long-term error characteristics have also been reported in the literature. However, the impact of these errors on the asymptotic statistics of code-phase-based GNSS position estimates has not been previously presented.
To the authors' best knowledge, despite the apparent simplicity of the problem, no prior work has studied the long-term statistics of GNSS errors in an urban environment representative of the conditions to be encountered by consumer vehicles creating digital maps anchored by code-phase-based GNSS positioning. One of this paper's primary contributions is to address this gap in the literature.
Sensor fusion of visible-light cameras and GNSS has been extensively studied [17-28]. Some of these works [26-28] have proposed visual odometry as a replacement for, or an augmentation of, the traditional GNSS/INS architecture. Visual data from cameras are exploited to perform dead reckoning in a visual odometry pipeline, wherein an important distinction from the current paper is that the 3D map points do not persist after a window of time has elapsed; that is, no map of feature points is maintained.
Clearly, such an approach does not allow improvement of the 3D map point positions over multiple mapping sessions.
In [25], the relative change in position between two image frames is first estimated based on time-differenced GNSS carrier phase measurements. The metric-accurate GNSS-derived change in position is exploited to initialize the otherwise unobservable scale in a monocular visual SLAM system. However, GNSS measurements are not incorporated any further once the absolute scale has been initialized. Unlike the current paper, the visual SLAM map is rendered in the arbitrary SLAM coordinate frame since only the relative change in position, and not the absolute position, was estimated based on GNSS measurements.
The vision-GNSS fusion in the current paper is closely aligned with the bi-objective bundle adjustment (BA) optimization techniques previously reported in [17,18,20,22-24]. In [17,18,20,22], the traditional visual SLAM reprojection cost function is jointly minimized along with a GNSS position error term. The methods proposed in [23,24] are also similar, but guarantee that the visual reprojection cost after incorporation of the GNSS term is not significantly greater than the visual-only case. However, none of these works showed significant empirical evidence of their efficacy on real-world vehicle data sets. Furthermore, collaborative mapping or multi-session improvement of the map was not discussed.
Collaborative multi-agent mapping, without GNSS aiding, has also been extensively discussed in the literature [29-35]. Some of these proposed solutions require significant overlap in the field-of-view of the agents, or require that the relative pose transformation between the agents be known a priori [30,32].
Other solutions, such as those in [31,33-35], enable collaboration by performing data association between non-concurrent mapping sessions where the relative pose transformation between the agents is unknown. The multi-session strategy employed in this paper is similar, but with an important distinction: none of the previous works on collaborative mapping have incorporated GNSS measurements in the map-building process. Without global referencing, the problem of data association between non-concurrent sessions becomes intractable. With no estimate of the pose for the mapping platform in relation to the existing map, data association must be attempted against the entire map. It is easily observed that such data association will become infeasible when scaled to city- or country-wide maps. The current paper proposes rendering and storage of digital maps in a global coordinate frame, such that a new mapping session can readily estimate its approximate pose in relation to the prior map, and perform data association on a small segment of the prior map that is expected to be in view of the vision system.
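The map-segment retrieval idea above can be sketched with a toy spatial index in which globally-referenced map points are bucketed into fixed-size tiles, so a new session with an approximate GNSS pose fetches only the nearby segment. The class, tile size, and method names are illustrative, not part of GEOSLAM:

```python
import math
from collections import defaultdict

TILE_SIZE_M = 100.0  # hypothetical tile width; a real system would tune this


class TiledGlobalMap:
    """Toy spatial index: map points bucketed into fixed-size ENU tiles so a
    new session can fetch only the segment near its approximate GNSS pose."""

    def __init__(self):
        self._tiles = defaultdict(list)

    def _key(self, east, north):
        return (int(math.floor(east / TILE_SIZE_M)),
                int(math.floor(north / TILE_SIZE_M)))

    def insert(self, point):
        # point: (east, north, up) in a global ENU frame
        self._tiles[self._key(point[0], point[1])].append(point)

    def query(self, east, north, radius_m):
        # Gather points from all tiles overlapping the search radius,
        # then filter to the exact radius.
        r = int(math.ceil(radius_m / TILE_SIZE_M))
        ce, cn = self._key(east, north)
        out = []
        for de in range(-r, r + 1):
            for dn in range(-r, r + 1):
                out.extend(self._tiles[(ce + de, cn + dn)])
        return [p for p in out
                if math.hypot(p[0] - east, p[1] - north) <= radius_m]
```

Because the query cost depends only on the local point density and not on the total map size, this kind of lookup remains tractable for city- or country-wide maps, which is precisely what global referencing enables.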
The work presented in [36] is perhaps the most closely related to the current paper. In [36], a particular stretch of roadway is mapped 25 times with a low-cost sensor setup. However, [36] assumes, without detail, the availability of a lane-level accurate low-cost positioning module that provides the full 6-DoF pose for the mapping platform. This greatly simplifies the ensuing data association and mapping pipeline. No mention is made of the general setting of the roadway being mapped (open-sky highway, urban canyon, etc.), and while the accuracy of the mapped traffic signs is adequately reported, localization within the map is not discussed, presumably since lane-level accurate positioning is already available. Meanwhile, the current paper only assumes the availability of a meter-level accurate code-phase-based GNSS receiver that provides 3D position estimates. Global localization accuracy of a vehicle operating within the multi-session map is presented as the key performance indicator.

Low-Cost GNSS in Urban Areas
Receiver manufacturers have recently announced the development and release of low-cost multi-frequency multi-GNSS receivers [37]. Accordingly, the analysis in this section considers a vehicular platform equipped with a low-cost multi-frequency multi-GNSS receiver capable of tracking both the code and carrier phase of GNSS signals.
Development of an extensive dense reference network in support of CDGNSS consumer vehicular positioning in urban areas, as suggested in [38], could be an expensive affair. PPP is a low-cost alternative to CDGNSS that requires only a sparse network of reference stations across the globe, but is not considered a viable option for urban GNSS positioning in this paper because the constant cycle slips and outages experienced in urban areas [6] make it difficult for PPP's float carrier phase ambiguity estimates to converge [7], in which case PPP degrades to code-phase positioning accuracy.
While convergence of PPP carrier-phase ambiguities may be infeasible in urban areas, a partial PPP solution that exploits precise satellite orbits and clocks, as well as ionospheric and tropospheric corrections, can certainly improve the accuracy of code-phase-based GNSS position estimates. Since connected and automated vehicles will perforce enjoy network connectivity, this paper assumes the availability of such GNSS corrections. Thus, the kind of GNSS errors assessed in this section lie between those corresponding to the two extremes of standard standalone code-phase positioning and PPP. This type of GNSS positioning, hereafter referred to as enhanced code-phase positioning, exploits both code and carrier phase or frequency tracking, but, as opposed to PPP, does not attempt to estimate a quasi-constant float carrier phase ambiguity, making it suitable for urban applications.

Pseudorange Measurement
The pseudorange measurement at receiver R from satellite S_i is modeled as

$$\rho_i = h_i\left(x(t_R)\right) + w_{\rho_i} = \Delta r_i + c\left[\delta t_R(t_R) - \delta t_{S_i}(t)\right] + I_{\rho_i} + T_i + w_{\rho_i}, \qquad (1)$$

where $x(t_R) = \left[r_R^T(t_R),\ \delta t_R(t_R)\right]^T$ is the state of the receiver, comprising the receiver position $r_R(t_R)$ at the time of the signal receipt event $t_R$, and the receiver clock bias $\delta t_R(t_R) = t_R - t$ with respect to true time $t$. The nonlinear measurement model is denoted by $h_i$; $\rho_i$ denotes the measured pseudorange to S_i; $c$ denotes the speed of light in vacuum; $\delta t_{S_i}(t) = t_{S_i} - t$ denotes the satellite clock bias with respect to $t$; $\delta t_{\mathrm{TOF}_i}$ denotes the time-of-flight of the signal from S_i, as an increment in true time; $I_{\rho_i}$ and $T_i$ denote the ionospheric and tropospheric delays experienced by the signal from S_i, respectively; $w_{\rho_i} \sim (\mu_{w_i}, \sigma^2_{w_i})$ denotes the sum of measurement thermal noise, multipath interference, non-line-of-sight (NLOS) delay, and other unmodeled errors; and $\Delta r_i$ denotes the true range between R and S_i, given as

$$\Delta r_i = \left\| r_R(t_R) - r_{S_i} \right\|, \qquad (2)$$

where $r_{S_i}$ is the satellite position at the signal transmit event. Note from (1) that the receiver clock bias component of the state contributes identically to all pseudorange measurements.
Taking $n_z$ pseudorange measurements $\{\rho_i\}_{i=1}^{n_z}$ and predictions $\bar{I}_{\rho_i}$ and $\bar{T}_i$ for each measurement, R estimates its state by solving a nonlinear least squares problem based on (1). First, it linearizes the measurement model in (1) about an initial guess of its state $\bar{x}(t_R) = \left[\bar{r}_R^T(t_R),\ \bar{\delta t}_R(t_R)\right]^T$ and the modeled atmospheric delays:

$$\rho_i \approx h_i\left(\bar{x}(t_R)\right) + H_i \left[x(t_R) - \bar{x}(t_R)\right] + w_{\rho_i},$$

where $h_i(\bar{x}(t_R))$ is evaluated with the modeled delays $\bar{I}_{\rho_i}$ and $\bar{T}_i$. Rearranging measured and modeled quantities on the left-hand side to get the standard form for a linearized measurement model yields

$$z_i \triangleq \rho_i - h_i\left(\bar{x}(t_R)\right), \qquad z = H\,\delta x + w_\rho, \qquad \delta x \triangleq x(t_R) - \bar{x}(t_R). \qquad (3)$$

The ith row of the measurement sensitivity matrix $H$ is

$$H_i = \left[ -\frac{\left(r_{S_i} - \bar{r}_R(t_R)\right)^T}{\left\| r_{S_i} - \bar{r}_R(t_R) \right\|},\quad c \right].$$

By solving (3) for $\delta x$ via weighted least squares, updating $\bar{x}(t_R)$, and iterating until convergence, R obtains its state estimate:

$$\hat{x}(t_R) = \bar{x}(t_R) + \left(H^T R^{-1} H\right)^{-1} H^T R^{-1} z.$$

For dynamic applications such as vehicle tracking, the state $x(t_R)$ is typically augmented to include the time derivatives of $r_R(t_R)$ and $\delta t_R(t_R)$, and the measurement model typically assumes direct measurement of apparent Doppler frequency.
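The single-epoch estimator described above can be sketched as follows. This is a minimal, noise-free illustration with hypothetical satellite positions, using an unweighted Gauss-Newton iteration in place of the full weighted filter:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def solve_position(sat_pos, pseudoranges, x0=None, iters=8):
    """Iterative least-squares solution of the linearized model:
    state x = [receiver ECEF position (m); receiver clock bias (s)]."""
    x = np.zeros(4) if x0 is None else np.array(x0, float)
    for _ in range(iters):
        los = sat_pos - x[:3]              # receiver-to-satellite vectors
        rng = np.linalg.norm(los, axis=1)  # modeled ranges
        pred = rng + C * x[3]              # modeled pseudoranges
        # Sensitivity: -unit vector for position, c for the clock bias.
        H = np.hstack([-los / rng[:, None], C * np.ones((len(rng), 1))])
        dx = np.linalg.lstsq(H, pseudoranges - pred, rcond=None)[0]
        x += dx
    return x
```

With five simulated satellites and a simulated clock bias, the iteration recovers the receiver position and clock bias from noise-free pseudoranges; with noisy measurements an elevation-dependent weighting matrix would be applied as in the text.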

Error Sources
The major sources of error in the estimates $\hat{r}_R$ and $\hat{\delta t}_R$ are as follows:

Thermal Noise
Measurement thermal noise at the receiver is one of the components of w ρ i in (1). The effect of thermal noise can be accurately modeled as a white Gaussian random variable with zero mean and standard deviation σ T . For the pseudorange measurement, σ T is typically between 10-30 cm, depending on the signal carrier-to-noise ratio, signal bandwidth, and receiver tracking bandwidth [39]. Estimation of the receiver state from multiple appropriately-weighted measurements with independent thermal-noise errors, and processing such measurements over time through a filter based on the modeled dynamics of the receiver, renders negligible the position-domain effects of uncorrelated zero-mean thermal noise. As a result, thermal noise is not a major contributor to the asymptotic accuracy of a digital map.
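The averaging-down of zero-mean thermal noise across sessions can be illustrated numerically. The 1 m per-session error standard deviation below is an assumed placeholder; the actual value depends on geometry and receiver filtering:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative zero-mean per-session position error due to thermal noise
# alone (1 m standard deviation assumed for this sketch).
errors = rng.normal(0.0, 1.0, size=10_000)

# Averaging K independent sessions shrinks the error std by sqrt(K),
# so zero-mean thermal noise does not limit the asymptotic map accuracy.
K = 100
session_means = errors.reshape(-1, K).mean(axis=1)
ratio = errors.std() / session_means.std()
print(f"std reduction factor over {K} sessions: {ratio:.1f}")  # ~ sqrt(K) = 10
```

The same argument does not apply to the biased error sources discussed next, whose session-to-session averages do not tend to zero.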

Satellite Orbit and Clock Errors
Satellite orbit and clock errors manifest in the modeled satellite positionr S i and the modeled satellite clock biasδt S i . The International GNSS Service (IGS) provides orbit and clock models for GNSS satellites. The predicted ultra rapid orbits and satellite clocks have an accuracy of ∼5 cm and ∼3 ns, respectively [40]. These may add up to ∼1 m of combined pseudorange model error for a given satellite. The 17-h retroactively-available rapid orbits and satellite clock models are accurate to ∼2.5 cm and ∼75 ps RMS errors, respectively [40], adding up to less than 5 cm of RMS error in the modeled pseudorange for a given signal. Since the orbit and clock parameters are fit to measurements made at IGS analysis centers, the errors in the estimated parameters must be asymptotically zero-mean by design of the estimator. For post-processing applications such as mapping, it is reasonable to assume the availability of rapid orbit and satellite clock products, and thus the asymptotic average position errors due to errors in modeled satellite position and clock bias can be reduced to a sub-5-cm level.
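As a quick arithmetic check of the pseudorange-domain budgets quoted above:

```python
C = 299_792_458.0  # speed of light, m/s

# Predicted (ultra-rapid) IGS products: ~5 cm orbit, ~3 ns clock error [40].
ultra_rapid_m = 0.05 + C * 3e-9
# Rapid products: ~2.5 cm orbit, ~75 ps clock RMS error [40].
rapid_m = 0.025 + C * 75e-12

# The clock term dominates: 3 ns of clock error alone is ~0.9 m of range.
print(f"ultra-rapid: {ultra_rapid_m:.2f} m, rapid: {rapid_m:.3f} m")
```

This confirms the ~1 m figure for predicted products and the sub-5-cm figure for the rapid products assumed available in post-processed mapping.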

Ionospheric Modeling Errors
The code-modulated GNSS signal propagates more slowly through the ionosphere than through vacuum due to the slightly-greater-than-unity group index of refraction of this atmospheric layer. The excess group delay is given as

$$I_\rho = \frac{40.3}{f^2}\,\mathrm{TEC},$$

where TEC is the total electron content along the signal path in electrons/m² and $f$ is the frequency of the propagating signal in Hz. At the GPS L1 frequency centered at 1575.42 MHz, the excess ionospheric group delay is roughly 16.24 cm per TECU (1 TECU = $10^{16}$ electrons/m²). If not modeled, the ionospheric delay can lead to ranging errors greater than 15 m. The ionospheric delay can be estimated via an ionosphere model or, in the case of a multi-frequency receiver, eliminated via a combination of pseudorange measurements at multiple frequencies. The latter technique does not require any external aiding, but the formation of the ionosphere-free combination exacerbates pseudorange noise, including any biases due to tracking of multipath signals. Compensating for ionospheric delay with the aid of an ionosphere model is applicable to both single- and multi-frequency receivers. It relies on accurate delay modeling based on ionospheric measurements at permanent GNSS reference stations, such as those that form the IGS network. While both methods have their merits, the analysis in this section considers corrections from an ionosphere model, and thus is not relevant to applications where the ionosphere-free combination is applied. Note that those applications would likely experience worse multipath errors than the ones presented later, requiring a separate multipath analysis along the lines of Section 3.3.5.
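The first-order delay formula and the per-TECU figure quoted above can be checked directly:

```python
def iono_group_delay_m(tec_el_per_m2: float, freq_hz: float) -> float:
    """First-order ionospheric excess group delay, 40.3 * TEC / f^2, in m."""
    return 40.3 * tec_el_per_m2 / freq_hz ** 2


F_L1 = 1575.42e6   # GPS L1 center frequency, Hz
TECU = 1e16        # 1 TECU = 1e16 electrons/m^2

delay = iono_group_delay_m(TECU, F_L1)
print(f"{100 * delay:.2f} cm per TECU")  # 16.24 cm, as quoted in the text

# An unmodeled slant TEC of 100 TECU exceeds 15 m of ranging error:
print(f"{iono_group_delay_m(100 * TECU, F_L1):.1f} m")
```

The inverse-square frequency dependence is also what enables the dual-frequency ionosphere-free combination mentioned above.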
Ionospheric model accuracy was studied comprehensively in [15]. The method in [15] generates unambiguous carrier-phase measurements from a global distribution of permanent receivers to compute the true slant total electron content (STEC) for each satellite, and compares the predictions of a number of models with this ground truth. In [41], the same authors compared PPP convergence times when applying different ionospheric correction models. This section extends the analysis in [15,41] to examine whether there exist long-term position-domain biases in enhanced code-phase positioning.
The post-fit residuals for multiple regional and global ionospheric models, computed as described in [15], were graciously made available by the same authors for the year 2014. These residuals were computed for GPS signals as observed at about 150 reference stations around the globe at 5 min intervals.
To observe the position-domain effect of the ionospheric modeling errors in isolation, this section neglects all other error sources, reducing the linearized measurement model in (3) to

$$z = H\,\delta x + \tilde{I},$$

where $\tilde{I}$ is the vector of ionospheric modeling residuals.
Historical GPS satellite almanacs can be combined with the timestamps from the residuals data to obtain the measurement sensitivity matrix $H$ at each epoch for each station. With an elevation-dependent measurement covariance matrix $R$, the error in the weighted least-squares solution due to errors in ionospheric modeling is

$$\delta\tilde{x} = \left(H^T R^{-1} H\right)^{-1} H^T R^{-1} \tilde{I}.$$

Each arrow in Figure 1 points in the direction of the position bias in the east-north plane, as estimated over 12 months of data from 2014 (more than 800,000 samples per station). The magnitude of the horizontal position bias is depicted by the color of the arrow according to the scale shown on the right. Interestingly, there is a clear trend of southward bias in the position error for most stations in the northern hemisphere, and a mild trend of northward bias in the position error for stations in the southern hemisphere. A numerical summary of the position bias with the IGS global ionospheric map (GIM) is presented in Table 1, along with a similar analysis for the Wide Area Augmentation System (WAAS) ionospheric corrections available for the contiguous United States (CONUS) region. As reported in [15], the WAAS model was found to exhibit a significantly smaller RMS error in ionospheric TEC estimates when compared to the IGS GIM; however, the long-term position biases due to WAAS corrections are similar to or worse than those for the IGS model. Another global ionospheric model, the Fast PPP model [41], was also studied as above. Fast PPP natively models the ionosphere as a two-layered shell, but is also made available in the standard one-layer IONEX (ionosphere-map exchange) format [15] for dissemination. The results presented in Table 1 represent the IONEX version of Fast PPP. In comparison with the IGS corrections, it is clear that the Fast PPP IONEX GIM corrections result in substantially unbiased long-term position errors at the global test locations.
However, it must be conceded that the results in Table 1 are best-case results, as they are based on data from the same permanent reference stations used to constrain the model.
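To make the projection mechanism concrete, the following sketch maps per-satellite ionospheric residuals into the position domain via the weighted least-squares solution. The satellite geometry and residual values are hypothetical, not the paper's data:

```python
import numpy as np


def iono_position_bias(az_deg, el_deg, resid_m):
    """ENU position error produced by per-satellite ionospheric residuals,
    via the weighted least-squares solution of z = H dx + I_tilde."""
    az = np.radians(np.asarray(az_deg, float))
    el = np.radians(np.asarray(el_deg, float))
    # Unit vectors from receiver to each satellite in ENU coordinates.
    u = np.column_stack([np.cos(el) * np.sin(az),
                         np.cos(el) * np.cos(az),
                         np.sin(el)])
    # Sensitivity matrix: position columns plus a clock column (in meters).
    H = np.hstack([-u, np.ones((len(az), 1))])
    # Elevation-dependent weighting (higher weight at high elevation).
    W = np.diag(np.sin(el) ** 2)
    dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ np.asarray(resid_m, float))
    return dx[:3]  # east, north, up components of the position error (m)


az = [0.0, 90.0, 180.0, 270.0, 45.0]
el = [45.0, 45.0, 45.0, 45.0, 80.0]
# A residual common to all satellites is absorbed in the clock state:
print(iono_position_bias(az, el, [2.0] * 5))
# An elevation-dependent residual pattern maps into a vertical bias:
print(iono_position_bias(az, el, [1.0, 1.0, 1.0, 1.0, 0.2]))
```

With residuals common to all satellites the position error vanishes, whereas a residual pattern that varies with elevation produces a meter-level vertical bias, mirroring the common-mode-absorption argument made in the next paragraph.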

To understand the reason behind the systematic biases with IGS corrections, note that any ionospheric modeling bias that identically affects all satellites has no impact on the accuracy of the GNSS position solution, as this common error is absorbed in $\hat{\delta t}_R$. Rather, position-domain biases arise from the azimuthal and elevation dependence of ionosphere model errors. From analysis of the spatial distribution of post-fit residuals, it was found that appreciable azimuthal and elevation residual gradients persist in the IGS ionospheric corrections. These gradients are represented graphically in Figure 2. From Figure 2, it is clear that the elevation gradients in the ionospheric residuals are pronounced. A subtle azimuthal gradient also exists, mainly along the north-south direction. Such spatial non-uniformity, coupled with the non-uniform angular distribution of satellites, may be the reason for the observed persistent position biases. While the elevation gradients are consistent for stations at all locations, the azimuthal gradients appear to invert along the north-south direction between the northern and southern hemispheres. This is likely the reason for the opposite directions of the average horizontal position bias in the northern and southern hemispheres.
Such persistent position-domain biases due to inaccurate ionospheric modeling have not been previously reported in the literature, and are a rather remarkable result. While some single-frequency PPP (SF-PPP) techniques eliminate the ionospheric delays based on the GRAPHIC combination [42], many other techniques that rely solely on ionospheric corrections from GIMs have been shown to achieve 30 cm (95%) accuracy in the east-north plane after convergence, with sub-10-cm bias [43], seemingly contradicting the results here. The key difference is that the SF-PPP methods involve estimation of a float carrier ambiguity term for each satellite arc. A portion of the systematic biases in the GIM estimates is likely absorbed in these states of the estimator, thereby attenuating the position biases in the east-north plane. For instance, the SF-PPP technique in [43] is based on the phase-adjusted pseudorange algorithm [44], wherein the ambiguity term for each satellite, physically an unknown constant, is in fact iteratively estimated with small but non-zero process noise. In such an estimator, the ambiguity term can absorb slowly time-varying systematic biases. In other SF-PPP techniques, the ionospheric correction term is explicitly included as a state to be estimated, and the estimates from a GIM are applied as pseudo-observations [45,46]. Once again, decimeter-level biases in the GIM estimates of the ionospheric delay may not necessarily appear in the final reported position accuracy of the SF-PPP method. Of course, such absorption of biases in augmented states is not undesirable. However, for the case of urban vehicular positioning, convergence of SF-PPP is a concern due to carrier phase cycle slipping, as discussed earlier. In an enhanced code-phase-based receiver, the high variance of the code noise leads to poor observability of the decimeter-level horizontal position bias due to ionospheric modeling errors.
Thus, ionospheric biases are not often estimated in a code-phase-based GNSS estimator.
Another factor of note is that 2014 coincided with a maximum of the 11-year solar activity cycle, and thus the IGS GIM accuracy may have been worse than usual over this period of time.
In conclusion, persistent decimeter-level biases in the east-north plane and meter-level biases in the vertical direction can arise when ionospheric delay corrections are sourced from the IGS GIM, or similar models, even under ideal open-sky conditions. More advanced ionosphere models with more accurate slant TEC measurements may achieve better results. Elimination of the ionospheric delay via the ionosphere-free combination is another option, but tends to worsen multipath-induced position errors. Accordingly, if corrections from a given ionosphere model lead to unbiased position errors, it is preferable, when globally referencing digital maps by averaging GNSS measurements over many sessions, to apply such corrections rather than to form the multi-frequency ionosphere-free combination.

Tropospheric Modeling Errors
In the troposphere, or more generally the neutral atmosphere, the index of refraction departs from unity much less than in the ionosphere at GNSS frequencies, causing a delay of ∼2.4 m at zenith. The troposphere is non-dispersive at GNSS frequencies, and thus its delay cannot be estimated using multiple-frequency signals. Instead, the tropospheric delay is obtained from models of the climatological parameters (temperature, pressure, and water vapor pressure) along the propagation path.
State-of-the-art tropospheric models [16] fit a small number of location- and day-of-year-dependent coefficients to climatological data from numerical weather models (NWMs) to model the zenith tropospheric delay. The zenith delay is mapped to any elevation angle using mapping functions [47]. Similar to the ionospheric models, the tropospheric mapping functions may introduce a differential azimuth- and elevation-dependent error. For empirically-derived mapping functions such as VMF1 [47] and GMF [48], the mean error at the lowest elevation of 5° has been shown to be under 50 mm (this value is typically reported as a 10 mm station height error, which is approximately one-fifth of the delay error at the lowest elevation [47]). As a result, this paper assumes that time-averaged tropospheric model errors introduce sub-5-cm errors in the position domain, and thus do not impede asymptotically accurate collaborative mapping, in both the horizontal and vertical components, at the several-decimeter level.
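The scale of these numbers can be checked with the simple 1/sin(el) mapping; real mapping functions such as VMF1 and GMF are more refined, but this conveys the magnitudes involved:

```python
import math

# Zenith tropospheric delay (~2.4 m) mapped to a 5 degree elevation angle.
zenith_delay_m = 2.4
el = math.radians(5.0)
slant_delay_m = zenith_delay_m / math.sin(el)
print(f"slant delay at 5 deg elevation: {slant_delay_m:.1f} m")  # ~27.5 m

# Rule of thumb from [47]: delay error at the lowest elevation is about
# five times the station height error, so a 10 mm height error
# corresponds to ~50 mm of slant-delay error.
height_err_mm = 10.0
delay_err_mm = 5.0 * height_err_mm
```

A sub-50-mm worst-case slant-delay error against a ~27.5 m slant delay illustrates why tropospheric modeling is not the accuracy-limiting factor here.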

Multipath Error
In ideal circumstances, each signal received from an overhead satellite arrives only along the least-time path. In practice, however, this so-called line-of-sight (LOS) component is accompanied by other components due to signal diffraction and single or multiple reflections of the signal off surrounding surfaces and obstacles (e.g., the glass facade of a nearby building, poles, trees, etc.). The complex baseband representation of the N signal components received from a particular satellite at a particular frequency and code is

$$r(t) = \sum_{i=1}^{N} A_i(t)\, c\!\left(t - \tau_i(t)\right) e^{j\theta_i(t)},$$

where $A_i(t)$ is the amplitude of the ith signal component, $c(\cdot)$ is the spreading code, $\tau_i(t)$ is the delay of the ith signal component relative to an unobstructed LOS signal, and $\theta_i(t)$ is the beat carrier phase of the ith component. The combination of multiple components distorts the received signal and causes errors in the pseudorange and phase measurements.
Unlike the study of ionospheric modeling errors, multipath errors relevant to urban mapping cannot be characterized with data from survey stations having a clear view of the sky. This section considers a simulation approach for scalable analysis of multipath tracking errors in an urban environment. The objective of this study was to determine whether persistent biases caused by multipath from the surrounding structures appear in the navigation solution averaged over multiple sessions.

Scenario Setup
The present simulation study was based on the open-access Land Mobile Satellite Channel Model (LMSCM) [49], itself based on extensive experimentation with a wideband airborne transmitter at GNSS frequencies in urban and suburban environments. First, an urban corridor was simulated stochastically following the procedure described in [50]. The corridor was composed of buildings, trees, and poles. Some of the important parameters for the generation of the scene are summarized in Table 2, and a part of the scene realization is shown in Figure 3. Multi-GNSS satellite trajectories were generated at randomly-selected times based on GPS and Galileo satellite almanac data. An average of 25 satellites were available above an elevation mask of 5°, consistent with modern multi-GNSS receivers. The satellites were assumed to be stationary over the simulation period of 60 s. Navigation solution errors were computed over 1000 60-s sessions. The vehicle trajectory was kept consistent across all 1000 driving sessions to avoid decorrelation of multipath error due to variable receiver motion. The trajectory was parametrized by its speed and heading, as described in [50]. The vehicle started at the zero coordinate on the along-track axis and traveled in the positive direction, which was assumed to be aligned with local north. The simulated trajectory was 60 s long and simulated a vehicle in stop-and-go traffic executing one 90° right turn, as shown in Figure 4. The vehicle traveled roughly 430 m and faced eastward at the end of the trajectory. The three low-speed intervals, marked with red line segments in Figure 4, are expected to present severe multipath effects: multipath errors decorrelate slowly when the vehicle moves slowly, and thus tend to reinforce one another within the navigation filter.

Multipath Simulation
The LMSCM generates power, delay, and carrier phase for N LOS and echo signals. The interaction of the LOS signal with the simulated obstacles is governed by deterministic models for attenuation, diffraction, and delay. The LOS portion of the combined signal, denoted $r_{\mathrm{LOS}}(t)$, may be composed of multiple components due to signal diffraction. These components are modeled as

$$r_{\mathrm{LOS}}(t) = \sum_{i=1}^{N_{\mathrm{LOS}}} A_i(t)\, c\!\left(t - \tau_i(t)\right) e^{j\theta_i(t)}.$$

In the special case of an unobstructed LOS signal, $N_{\mathrm{LOS}} = 1$, $\tau_1(t) = 0$, and

$$\theta_1(t) = -\frac{2\pi}{\lambda}\,\Delta r(t) + \gamma_0,$$

where $\lambda$ denotes the wavelength and $\gamma_0$ is a constant due to phase initialization in the satellite and receiver [51]. The LMSCM generates the $N - N_{\mathrm{LOS}}$ NLOS echoes stochastically based on satellite azimuth and elevation, receiver dynamics, and general characteristics of the scene (e.g., an urban car scenario). This stochastic procedure might not be representative of multipath over multiple sessions through the same urban corridor, where certain echoes might persist over different sessions. To address this limitation, the LMSCM was augmented by the present authors to generate one- and two-bounce deterministic reflective NLOS echoes off the simulated buildings, and a one-bounce NLOS echo off the ground surface. These three additional reflective NLOS echoes, denoted $r_{\mathrm{DET}}(t)$, were added to $r(t)$ and are modeled as

$$r_{\mathrm{DET}}(t) = \sum_{i=N}^{N+2} b_i(t)\, A_i(t)\, c\!\left(t - \tau_i(t)\right) e^{j\theta_i(t)},$$

where $b_i(t) \in \{0, 1\}$ denotes whether the surrounding geometry supports the ith reflective echo. Since these reflections are expected to be stronger than other diffracted and multiple-bounce NLOS echoes, the amplitudes $A_i(t)$, $i \in \{N, \ldots, N+2\}$, for the reflective echoes were drawn from the distribution of the strongest echo generated stochastically by the LMSCM at each epoch. By experiment, this distribution was found to be log-normal with $20 \log_{10}(A_i) \sim \mathcal{N}(-22, 5)$, $i \in \{N, \ldots, N+2\}$. The delays for the reflective echoes are given as

$$\tau_i(t) = \frac{\left\| \tilde{r}_R(t) - r_S(t) \right\| - \left\| r_R(t) - r_S(t) \right\|}{c},$$

where $\tilde{r}_R(t)$ is the position of the imaginary image antenna [52], i.e., the reflection of the receiver antenna position about the reflecting plane (building or ground).
Similarly, the carrier phase of each reflective echo was computed geometrically. A random carrier-phase offset θ_i(t) ∈ [0, 2π) was added at the reflection point each time a new reflective echo was spawned, simulating the material-specific phase offset introduced by the reflection process.

Receiver
A receiver simulator was developed to account for the mediating effects that a receiver's tracking loops and navigation filter have on multipath-induced errors in its reported position solution. For each satellite signal, the simulated receiver tracks the combination of all N_LOS line-of-sight components and N + 2 − N_LOS multipath echoes. If R(τ) denotes the correlation function of the GNSS signal's spreading code, then the multipath delay error in the tracked code phase, relative to the unobstructed LOS, is given as the solution to the discriminator zero-crossing condition of [52], where θ_c is the tracked carrier phase of the combined signal. The parameter d is the early-to-late correlator spacing in the receiver. It is well known that a wide-bandwidth receiver with narrow correlator spacing mitigates the effect of multipath [52]; accordingly, the receiver considered in this simulation implements d = 0.1. Note that R(τ) was implemented as the correlation function for GPS L1 C/A identically for all simulated signals. Modernized GNSS signals have better multipath mitigation characteristics [11], but this behavior was not included in the simulation. Another important observation is that when the LOS signal is strong compared to the echo signals, the time derivative of the tracked carrier phase equals the Doppler frequency of the LOS signal, which changes smoothly in accordance with the relative motion between the satellite and the receiver. However, when the LOS signal is comparable to or weaker than other rapidly-decorrelating echoes, the combined carrier phase is uniformly random. In a GNSS receiver, the phase lock loop's phase-lock indicator reveals whether a sufficiently strong LOS signal is available to enable carrier lock [6].
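To illustrate how a narrow-correlator receiver maps an echo into a code-tracking error, the following sketch solves the coherent early-minus-late discriminator zero-crossing for one LOS signal plus a single echo, using the idealized triangular C/A correlation function. The amplitudes, delays, and root bracket are illustrative, not the paper's exact implementation:

```python
import numpy as np
from scipy.optimize import brentq

def R(tau):
    """Triangular correlation function of the GPS L1 C/A code (tau in chips)."""
    return np.maximum(0.0, 1.0 - np.abs(tau))

def dll_error(alpha, delay, phase, d=0.1):
    """Code-tracking error (chips) for a unit-amplitude LOS signal plus one
    echo of relative amplitude alpha, delay (chips), and carrier-phase
    offset (rad), using a coherent early-minus-late discriminator with
    correlator spacing d chips. A simplified sketch of the receiver model."""
    def disc(eps):
        # Early-minus-late output: LOS component at delay 0, echo at `delay`
        out = 0.0
        for a, tau, ph in ((1.0, 0.0, 0.0), (alpha, delay, phase)):
            out += a * np.cos(ph) * (R(eps - tau + d / 2) - R(eps - tau - d / 2))
        return out
    # Solve for the discriminator zero-crossing (illustrative bracket)
    return brentq(disc, -0.4, 0.6)
```

With no echo (`alpha = 0`) the tracking error is zero; an in-phase echo pulls the code phase late, while an anti-phase echo pulls it early, which is the familiar multipath error-envelope behavior.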
The simulator's phase-lock indicator is asserted only if (1) the tracked Doppler frequency does not deviate significantly from a second-order polynomial, and (2) the strongest received echo is attenuated by more than 25 dB with respect to an unattenuated signal.
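The two-part phase-lock test can be sketched as below; the 5-Hz residual tolerance is a hypothetical threshold, since the text does not quantify "significantly":

```python
import numpy as np

def phase_lock_indicator(t, doppler_hz, strongest_echo_atten_db,
                         resid_tol_hz=5.0, atten_thresh_db=25.0):
    """Simplified sketch of the simulator's phase-lock test: (1) the tracked
    Doppler must be well fit by a second-order polynomial in time, and
    (2) the strongest echo must be attenuated by more than 25 dB relative
    to an unattenuated signal. The residual tolerance is illustrative."""
    coeffs = np.polyfit(t, doppler_hz, deg=2)
    resid = doppler_hz - np.polyval(coeffs, t)
    smooth = np.max(np.abs(resid)) < resid_tol_hz   # condition (1)
    strong_los = strongest_echo_atten_db > atten_thresh_db  # condition (2)
    return bool(smooth and strong_los)
```

A smoothly varying Doppler profile with a well-attenuated echo asserts lock; either strong echoes or erratic Doppler (e.g., an oscillatory component a quadratic cannot absorb) deasserts it.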

Navigation Filter
At each epoch, n_z multipath-free, ionosphere-free, and troposphere-free simulated pseudorange measurements were combined with the corresponding simulated multipath tracking delay errors and fed to a navigation filter that estimates the receiver state. The navigation filter implemented in this paper is an extended Kalman filter (EKF) with a nearly-constant-velocity motion model following [53]. The standard details of the EKF are omitted for brevity.
The effect of multipath tracking errors on the navigation solution depends strongly on the receiver's multipath rejection scheme. Two schemes are explored here. The first is a hypothetical ideal multipath rejection scheme that excludes all signals for which the LOS signal has a smaller-than-10-dB advantage over its multipath echoes. The second scheme implements a normalized innovation squared (NIS) test to reject multipath signals based on measurement innovations [53]. At the (k + 1)th measurement update step, the difference between the predicted and observed measurement vector, called the innovation and denoted ν(k + 1), is squared and normalized by its covariance, which is the sum of the measurement covariance matrix, R(k + 1), and the propagated state covariance transformed through the measurement sensitivity matrix, H(k + 1)P(k + 1|k)H(k + 1)^T. In the absence of multipath tracking errors, the resulting NIS statistic is chi-squared distributed with n_z degrees of freedom. If the NIS statistic exceeds a chosen threshold, then the signal with the largest normalized innovation is dropped. This continues until the NIS statistic falls below the threshold or the number of remaining signals drops to a preset minimum. Figure 5b shows that NIS-test-based exclusion approached the performance of ideal exclusion in the horizontal plane, save for the first low-speed period, during which the vehicle crept between buildings on both sides. The average vertical position error was much worse, growing as large as 1.75 m in magnitude.
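The NIS-based exclusion loop described above might be sketched as follows, assuming the usual EKF quantities; the 1% tail probability and minimum of four retained signals are illustrative choices:

```python
import numpy as np
from scipy.stats import chi2

def nis_exclude(nu, H, P, R, alpha=0.01, min_signals=4):
    """Sketch of NIS-based multipath exclusion. nu: innovation vector
    (n_z,); H: (n_z, n_x) measurement sensitivity matrix; P: (n_x, n_x)
    propagated state covariance; R: (n_z, n_z) measurement covariance.
    Returns the indices of the retained measurements."""
    keep = list(range(len(nu)))
    while len(keep) > min_signals:
        # Innovation covariance S = H P H^T + R for the retained signals
        S = H[keep] @ P @ H[keep].T + R[np.ix_(keep, keep)]
        Sinv = np.linalg.inv(S)
        nis = nu[keep] @ Sinv @ nu[keep]
        if nis <= chi2.ppf(1 - alpha, df=len(keep)):
            break  # NIS consistent with a chi-squared distribution
        # Drop the signal with the largest normalized innovation
        norm_innov = nu[keep] ** 2 * np.diag(Sinv)
        keep.pop(int(np.argmax(norm_innov)))
    return keep
```

In a toy setup with one grossly multipath-corrupted pseudorange, the loop drops that measurement on the first pass and then terminates because the remaining NIS falls under the chi-squared threshold.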

Simulation Results
To determine whether the average errors shown in Figure 5 are in fact persistent biases, a study of the standard deviation of position errors was conducted. The standard deviation of the average errors in east, north, and up directions was computed for disjoint averaging ensembles of size 1, 2, 4, 8, 16, 32, 50, and 100 sessions taken from the total of 1000 simulated sessions. For instance, 125 disjoint ensembles of eight sessions were selected, and the position errors were averaged over the eight sessions in each set. The standard deviation of the eight-session-averaged errors was then computed across the 125 ensembles. In the case of an averaging ensemble with only a single session (i.e., no averaging), the computed standard deviation is simply the measured standard deviation of the position error across all 1000 simulated runs. In the case of averaging over 100 sessions, the standard deviation is computed based on 10 disjoint averaging ensembles of 100 sessions each.
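The disjoint-ensemble averaging procedure can be expressed compactly; a sketch with illustrative data:

```python
import numpy as np

def ensemble_avg_std(errors, ensemble_size):
    """errors: (n_sessions, n_epochs) array of position-error traces.
    Splits the sessions into disjoint ensembles of the given size,
    averages each ensemble, and returns the per-epoch standard deviation
    across the ensemble averages, as in the study described above."""
    n_sessions = errors.shape[0]
    n_ens = n_sessions // ensemble_size
    trimmed = errors[:n_ens * ensemble_size]
    groups = trimmed.reshape(n_ens, ensemble_size, -1)
    return np.std(groups.mean(axis=1), axis=0)
```

For zero-mean independent errors, the standard deviation of the ensemble averages shrinks as 1/sqrt(ensemble_size); persistent biases, by contrast, survive the averaging unchanged, which is exactly what the study probes.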
Note that because the simulation study was based on the same 1000 simulations for all averaging ensembles, the east, north, and up means taken across all averaging ensembles are equivalent to those shown in Figure 5. The more interesting trend is the decreasing standard deviation with increasing size of the averaging ensemble, shown in Figure 6 for the case of NIS-based multipath rejection and for the east and north error components. As expected, the standard deviation of errors was higher at locations where the vehicle moved at low speed and multipath decorrelated slowly. Additionally, the standard deviation was larger at the beginning of the trajectory, where the street was lined with tall buildings on both sides. The standard deviation of the average east and north position error over 100 sessions remained below 15-20 cm. Thus, it is highly likely that the ∼40-cm error in the north direction between 15 and 20 s in Figure 5b is in fact a persistent non-zero bias. Table 3 summarizes the results of the multipath simulation study. It shows the 95th-percentile horizontal error magnitude for increasing averaging ensemble sizes and for both ideal and NIS-based multipath exclusion. The 0-60 s average case lists the 95th-percentile error over the entire trajectory, whereas the 13-19 s average case lists the 95th-percentile error in the worst-case segment of the trajectory in terms of horizontal position bias and standard deviation. This challenging segment is illustrative of persistent problem spots that will arise in urban areas, within which multipath-induced biases will be larger than average. As expected, the 95th-percentile error in Table 3 shrank as the averaging ensemble size grew. For the urban corridor and vehicle dynamics considered in this simulation, NIS-based exclusion achieved 35 cm 95th-percentile horizontal error with averaging over 100 sessions. Even in the worst-case region of the trajectory, the 95th-percentile horizontal error remained below 50 cm.
As multipath exclusion approaches the ideal case, with aid from other sensors or a 3D model of the surroundings, for example, the 95th-percentile horizontal error could be reduced to as low as 25 cm for the simulated corridor. From Section 3.3.3's analysis of asymptotic ionospheric errors, and from this section's multipath simulation study, one can draw the following conclusion: so long as the asymptotic horizontal position errors of the ionosphere corrections are below 5 cm, as is true for the Fast-PPP model, and assuming statistical independence of ionospheric and multipath errors, it appears feasible to achieve 50-cm horizontal positioning accuracy with approximately 95% confidence by averaging over 100 mapping sessions.

Globally-Referenced Electro-Optical SLAM (GEOSLAM)
This section describes a simultaneous localization and mapping (SLAM) pipeline capable of globally-referenced collaborative multi-session digital mapping. The pipeline combines visual measurements from a stereo visible-light camera system with position measurements from GNSS signals. The objective of this pipeline is to demonstrate the development of an accurate digital map based on multiple mapping sessions with standard GNSS position estimates. The following sections detail the visual SLAM pipeline, the integration of GNSS measurements, and the techniques for multi-session mapping developed in GEOSLAM.

Visual SLAM
The visual SLAM component of GEOSLAM is similar to existing high-performance SLAM pipelines developed in the robotics community [54][55][56]. Visual SLAM algorithms may be categorized as either sparse or dense. Sparse visual SLAM algorithms [54,55] create a map of distinctive features such as corners or edges in the scene, while dense SLAM algorithms [56] map the depth for each pixel in the captured frames. The point cloud generated by sparse SLAM algorithms is sufficient for the purpose of localization. Dense reconstruction is appealing to the human eye, but does not provide any tangible benefit to localization, while consuming much more computational resources. As a result, GEOSLAM implements sparse feature-point-based SLAM.
In [57] it was shown that, for the visual SLAM problem, structure-from-motion BA (batch non-linear optimization) outperforms filtering techniques such as the extended Kalman filter, yielding higher accuracy per unit of computing time. It was also noted that a high number of feature points per image frame provides better accuracy than a large number of frames with fewer feature points per frame. Thus, in typical practice, only a select subset of the captured frames is retained for processing; frames in this subset are called keyframes. Most recent state-of-the-art visual SLAM algorithms use a keyframe-based BA approach instead of sequential filtering. Likewise, GEOSLAM performs BA-based non-linear optimization to refine both structure and motion. Figure 7 shows a block diagram representation of the system architecture proposed in this paper. The yellow-colored blocks in this figure are components of the GEOSLAM pipeline, detailed next. Figure 7. Globally-referenced electro-optical simultaneous localization and mapping (GEOSLAM) block diagram. BA: bundle adjustment.
By way of notation, let z^lr_nm ≜ (u^l_nm, v^l_nm, u^r_nm) denote the image-plane location of the stereo-matched feature corresponding to the mth map point, p_m, in the nth stereo keyframe, K_n. The horizontal and vertical coordinates are denoted u and v, respectively, while the superscripts l and r denote the left and right camera frames, respectively. Note that the feature location is specified by only three coordinates: the vertical feature coordinate in the right camera frame, v^r_nm, is omitted because an undistorted and rectified camera model implies v^l_nm = v^r_nm, making one of the coordinates redundant. If p_m is not matched to any feature in K_n, then let z^lr_nm = ∅. Furthermore, let M_n = {m : z^lr_nm ≠ ∅} denote the set of indices of all map points matched to some feature in K_n. In the visual SLAM literature, the covisibility window of keyframe K_i is defined as the set of keyframes that share at least T map points with K_i, i.e., the set of keyframes with indices {n : |M_i ∩ M_n| ≥ T, n ≠ i}, where |A| denotes the cardinality of the set A. The covisibility window determines the keyframes to be optimized in a windowed BA. The visibility of common points is regarded as a proxy for correlation between the structure-from-motion states. However, in a sensor fusion architecture, the states associated with other sensors (e.g., GNSS) may be spatially correlated beyond the covisibility window. Furthermore, other sensors may experience outages that extend beyond the covisibility window. In such a scenario, it is desirable to optimize over a batch of keyframes that spans the availability gap.
Accordingly, GEOSLAM extends the concept of covisibility to N levels by recursively including the covisibility windows of keyframes already in the window. When processing K_i, GEOSLAM's objective is to estimate the map point 3D locations x^S_pm ∈ R^3 and keyframe poses (x^S_Cn, θ^S_Cn) ∈ R^3 × R^3 in the N-level covisibility window for the ith keyframe, where S stands for the local SLAM frame, C_n is the left camera coordinate frame associated with K_n, and θ^S_Cn is the angle-axis representation of the keyframe orientation. The state vector X_i to be estimated concatenates these map point locations and keyframe poses into a single column vector. When triggered by the GNSS front end, the camera setup captures a pair, denoted I^lr_i, of concurrent images from the left and right cameras, where the subscript denotes that the current pair is a candidate to become the ith stereo keyframe K_i. The intrinsic and extrinsic parameters of the stereo camera setup are assumed to have been calibrated a priori. The stereo image pair is undistorted and rectified according to this calibration, and SIFT features are detected and their descriptors computed separately for each image [58]. SIFT feature matching is performed between the left and right images with the additional constraint that matching features must have approximately the same vertical coordinate, to within a few pixels. The set of stereo feature measurements for I^lr_i, denoted f^lr_i, and the set of feature descriptors as computed in the left image, d^l_i, are passed on to the tracking module.
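The covisibility window and its N-level extension can be sketched as simple set operations; here M is a hypothetical mapping from keyframe index to its matched map-point index set M_n:

```python
def covisibility(i, M, T):
    """One-level covisibility window: keyframes sharing at least T map
    points with keyframe i. M maps keyframe index -> set of matched
    map-point indices (the sets M_n in the text)."""
    return {n for n in M if n != i and len(M[i] & M[n]) >= T}

def covisibility_n(i, M, T, N):
    """N-level extension used by GEOSLAM: grow the window by repeatedly
    adding the covisible keyframes of keyframes already in the window."""
    window = {i}
    for _ in range(N):
        frontier = set()
        for k in window:
            frontier |= covisibility(k, M, T)
        window |= frontier
    return window
```

On a chain of keyframes, each sharing points only with its neighbors, the one-level window contains just the adjacent keyframe, while increasing N walks the window outward along the chain, which is how the window can be made to span a sensor outage.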
The tracking module has access to the 3D map point positions within X_i and to the set of SIFT descriptors, D^map_i, corresponding to the map points expected to be visible in the candidate keyframe I^lr_i. The tracker performs directed matching of features between the stereo image and the map. First, a quick feature matching is performed using the Fast Library for Approximate Nearest Neighbors (FLANN) [59]. Given sufficient matches, an initial approximation of the current camera pose is obtained using the five-point algorithm wrapped in random sample consensus (RANSAC) iterations. With this approximate pose, an exhaustive nearest-neighbor search is performed for each map point potentially in view of the camera, but only within a small window around its projected position on the image plane. Subsequently, RANSAC iterations are performed on the full set of feature matches to remove any remaining outliers, and a motion-only BA is performed wherein the current camera pose is optimized based on the feature matches to a fixed set of 3D map points.
After tracking the stereo image pair as described above, GEOSLAM decides whether the candidate I^lr_i should be promoted to a keyframe. This decision is based on the number of map points matched to the image features and on the distance traveled by the platform since the last keyframe was chosen. New keyframes are not spawned while the platform is nearly stationary. If the platform is in motion and the number of feature matches to the map drops below a threshold, then the candidate is chosen as keyframe K_i, windowed BA is performed over N levels of covisibility, and the unmatched stereo features in K_i are spawned as new map points. Additionally, if I^lr_i is selected as K_i, then the set of measurements from the features matched to the map, denoted z^lr_i ≜ {z^lr_im : m ∈ M_i}, along with their SIFT descriptors, is passed on to the map module for storage, future feature matching, and processing in the windowed BA routine.
In a visual-only SLAM system, the state vector X_i is optimized with respect to the measurement vector Z_i, which stacks the stereo feature measurements in the covisibility window. The windowed BA routine in GEOSLAM minimizes the 3D-to-2D reprojection error. The error term e_nm for the observation of map point p_m in stereo keyframe K_n is the difference between z^lr_nm and the projection Π of p_m into K_n, where Π is the projection function for an undistorted and rectified stereo camera, R(·) denotes the rotation matrix corresponding to the argument angle-axis vector, and f, (c_u, c_v), and b are the focal length, principal point, and baseline distance between the left and right cameras of the rectified stereo camera model, respectively. The cost function to be minimized for visual SLAM sums the weighted error terms ρ(e_nm^T Ω_nm^{-1} e_nm), where ρ may be the standard least-squares cost function ρ(·) = (·) or a more robust cost function such as the Huber or Tukey cost function, and where Ω_nm = σ²_nm I_{3×3} is the covariance of the feature measurements. GEOSLAM performs the BA minimization via Google's ceres-solver, whose automatic differentiation feature is used to compute the Jacobian of the measurement model.
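A minimal sketch of the stereo projection function Π and the reprojection residual e_nm, with a Huber cost as one choice of ρ. In GEOSLAM these residuals are minimized by ceres-solver; here they are only evaluated:

```python
import numpy as np

def project_stereo(p_cam, f, cu, cv, b):
    """Projection Pi for an undistorted, rectified stereo pair: returns
    (u_l, v_l, u_r) for a point p_cam = (x, y, z) expressed in the left
    camera frame. Standard pinhole-stereo model; symbols follow the text."""
    x, y, z = p_cam
    return np.array([f * x / z + cu,         # u in left image
                     f * y / z + cv,         # v (identical in both images)
                     f * (x - b) / z + cu])  # u in right image

def reprojection_error(z_lr, p_world, cam_pos, cam_R, f, cu, cv, b):
    """Residual e_nm = z_lr - Pi(map point expressed in the camera frame).
    cam_R is the camera-to-world rotation and cam_pos the camera position,
    so the world point is pulled into the camera frame by R^T (p - c)."""
    p_cam = cam_R.T @ (p_world - cam_pos)
    return z_lr - project_stereo(p_cam, f, cu, cv, b)

def huber(r, delta):
    """Huber robust cost applied to the residual norm, one choice of rho."""
    a = np.linalg.norm(r)
    return 0.5 * a**2 if a <= delta else delta * (a - 0.5 * delta)
```

The Huber cost behaves quadratically for small residuals and linearly for large ones, which is why it tames the influence of feature mismatches that survive RANSAC.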
An important feature of the GEOSLAM visual pipeline is the ability to merge maps from multiple mapping sessions. This is embodied in an algorithm similar to the loop closure technique from the visual SLAM literature [60]. Map merging is described in detail in Section 4.3.2.

GNSS Aiding
Conventional visual SLAM algorithms are known to drift from the true platform trajectory as a function of the distance traveled by the platform. Furthermore, the map of the structure is created in the arbitrary S frame. Such a map cannot be intelligibly shared with another mapping agent having a different S frame. Meanwhile, GNSS position estimates are obtained in the global G frame and do not exhibit any distance-dependent drift. Accordingly, GEOSLAM ingests standard GNSS position estimates from a software-defined GNSS receiver, called GRID/pprx [6,61], in a tightly-coupled architecture to create a globally-referenced map, enable cooperative multi-session mapping, and constrain the drift of visual SLAM. Since the stereo camera setup is triggered by the same clock that drives digitization of the GNSS samples (see Figure 7), it is possible to produce GNSS measurements synchronized with the camera image epochs. This section details the various coordinate frames in GNSS-aided visual SLAM, the updated BA cost function, and an initialization routine required to enable GNSS aiding in SLAM.

Coordinate Frames
The GNSS-aided visual SLAM system has three coordinate frames of interest: the camera frame C_i associated with keyframe K_i, the local SLAM frame S, and the global frame G. The SLAM frame S adopts the position and orientation of the first keyframe, prior to optimization, as its origin and orientation. Thus, S is fixed relative to G, but each C_i changes relative to G as the platform moves.
Note that the structure-from-motion states in Equation (4) are represented in the S frame, whereas the GNSS measurement corresponding to the nth keyframe, denoted z^G_An ∈ R^3, is natively represented in the G frame. The latter is transformed to the S frame through an unknown but fixed rotation, R^S_G ∈ SO(3), and translation, t^S_G ∈ R^3. This transformation is estimated at initialization, as explained in Section 4.2.2; after initialization, z^G_An is rendered in S by applying the estimated rotation and translation. GEOSLAM estimates the 6-DoF pose of the left camera, but the GNSS antenna phase center is not co-located with the camera center; rather, it is offset from the camera center by a fixed lever-arm vector (the same for all n), expressed in C_n and denoted t^Cn_An ∈ R^3. The error term associated with the GNSS position estimate for K_n is thus the difference between the S-frame GNSS measurement and the SLAM-predicted antenna position. Under the assumption of temporally-uncorrelated GNSS errors, the updated BA cost function augments the visual cost with these weighted GNSS error terms, where the S-frame weighting matrix is R^S_G Γ_n (R^S_G)^T and Γ_n is the covariance matrix of the GNSS position estimate associated with K_n, expressed in G.
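The GNSS error term may be sketched as below, with the G-frame measurement first rendered in S and then compared against the SLAM-predicted antenna position; variable names mirror the text's symbols:

```python
import numpy as np

def gnss_error_term(z_G, R_SG, t_SG, cam_pos_S, cam_R_S, t_CA):
    """Sketch of the GNSS error term described above. z_G: G-frame GNSS
    position estimate; (R_SG, t_SG): G-to-S transformation estimated at
    initialization; cam_pos_S, cam_R_S: keyframe camera pose in S; t_CA:
    fixed camera-to-antenna lever arm expressed in the camera frame."""
    z_S = R_SG @ z_G + t_SG                 # GNSS estimate rendered in S
    antenna_S = cam_pos_S + cam_R_S @ t_CA  # predicted antenna position in S
    return z_S - antenna_S

def gnss_covariance_S(R_SG, Gamma_G):
    """Rotate the G-frame GNSS covariance into S for use as the BA weight."""
    return R_SG @ Gamma_G @ R_SG.T
```

When the SLAM trajectory and the GNSS estimate agree, the residual vanishes; any lever-arm mismatch or frame-transformation error shows up directly in this term.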

Initialization in GNSS-Aided SLAM
When initializing, GEOSLAM performs visual-only SLAM for the first N_i keyframes in the S frame, and stores the GNSS position measurements of the antenna provided in the G frame. Subsequently, GEOSLAM finds the least-squares Euclidean transformation, i.e., the optimal rotation matrix R^S_G and translation vector t^S_G between the two coordinate systems, from the set of vector observations. Note that a full similarity transformation is not required since the known stereo baseline renders the S frame with correct scale. The estimated Euclidean transformation minimizes the squared difference between the transformed GNSS measurements in S and the visual-SLAM-predicted trajectory of the GNSS antenna, also in S. The specific estimation method is based on the singular value decomposition (SVD), as discussed in [62].
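The SVD-based least-squares Euclidean transformation can be sketched with the classical Kabsch/Umeyama construction (without scale), in the spirit of [62]; this is an illustration, not GEOSLAM's exact code:

```python
import numpy as np

def fit_euclidean_transform(P_G, P_S):
    """Least-squares Euclidean (rotation + translation, no scale) transform
    mapping points P_G (n x 3, G frame) onto P_S (n x 3, S frame) via SVD.
    Returns (R, t) such that P_S ~= (R @ P_G.T).T + t."""
    mu_G, mu_S = P_G.mean(axis=0), P_S.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (P_G - mu_G).T @ (P_S - mu_S)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = mu_S - R @ mu_G
    return R, t
```

With noise-free correspondences the true rotation and translation are recovered exactly; with noisy GNSS fixes the result is the least-squares optimum, which, per the discussion below, need only be approximately correct.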
It must be noted that this transformation need only be approximately correct such that the GNSS estimates, used as measurements, will not diverge with respect to the visually-derived trajectory. Because the jointly estimated trajectory in S gets transformed back to G using the same approximate transformation, any errors in the transformation are cancelled.

Multi-Session Mapping
Refinement of the visual feature map over multiple sessions with time-separated GNSS measurements is central to the idea of approaching the accuracy limit of mapping with standard GNSS. Consider a vehicle revisiting an area mapped previously in one or more sessions. When GEOSLAM matches more than T features in the current keyframe to the features already present in the prior map, the keyframes from the previous sessions in that section of the map are included in the covisibility window of the current keyframe. After such a merge is detected and verified, a BA may be performed on the covisible keyframes from multiple sessions to average time-separated standard GNSS errors. It is important to note that multi-session mapping can only be realized when sufficient feature matches are found between sessions. This is not a straightforward task, as evidenced by recent work on lifelong feature mapping [63]. This issue is further discussed in Section 5. In the current section, multi-session map database management and map merging are discussed.

Map Database
Storage and reuse of maps is a prerequisite for multi-session mapping. For a given session, the SLAM map is created in the S frame. However, such a map is not readily usable in successive mapping sessions since the S frame is distinct for each session. Fortunately, the integration of visual SLAM with GNSS enables transformation of the SLAM map into the G frame.
At the end of the pth mapping session, carried out in the local frame S_p, GEOSLAM stores the data to a map database after applying the inverse of the (R^{S_p}_G, t^{S_p}_G) transformation estimated during initialization for the pth session to all map point positions, all keyframe poses, and all GNSS measurements associated with each keyframe, rendering them in the G frame. At the beginning of the (p + 1)th session, GEOSLAM again estimates the (R^{S_{p+1}}_G, t^{S_{p+1}}_G) transformation during initialization. The map database from previous session(s) is then loaded after applying this transformation, such that the prior map points, keyframes, and measurements are rendered in the S_{p+1} frame. After loading the prior map, the standard GEOSLAM pipeline is executed for each stereo image pair in the (p + 1)th session. In addition, GEOSLAM attempts to detect whether the vehicle is currently passing through a previously-mapped region. If so, a map merge is declared and the current and prior keyframes are jointly optimized, as detailed in Section 4.3.2. Finally, at the end of the mapping session, the combined map is stored back in the database as described above.

Map Merging
As mentioned before, the matching of feature points across multiple sessions is central to the idea of averaging standard GNSS errors. Once sufficiently many features are matched between the current stereo keyframes and prior map points, GEOSLAM declares a map merging event. This is akin to the well-known problem of detecting loop closure in the visual SLAM literature [60]. This section details GEOSLAM's map merging and loop closing routine. Hereafter, the terms map merging and loop closure are used interchangeably since GEOSLAM treats them identically.
First, note that when detecting a map merge event, feature matching must be attempted against map points that have not been matched in the most recent keyframes. Thus, after processing the ith keyframe, a possible merge is checked against the set of map points {m : m ∉ ∪_{n∈cov(i,N)} M_n}. If this bag-of-words-style feature matching succeeds, then RANSAC iterations are performed to determine whether the matches are geometrically consistent, and to robustly estimate the camera pose (x̂^S_Ci, θ̂^S_Ci) implied by the merge event. If enough inliers are found, the map merge routine is executed.
The map merging process is depicted visually in Figures 8-10. A typical merge situation is shown in Figure 8, where the platform pose at the ith keyframe is inconsistent with the prior map at the merge location. To avoid such a discontinuity in the ensuing joint BA, as an initial guess GEOSLAM enforces that the ith keyframe pose be consistent with the pose (x̂^S_Ci, θ̂^S_Ci) implied by the visual merge matches, and that the keyframes and map points from the prior session(s) be unchanged. To this end, a pose-graph optimization [64] is performed over a large N_m-level covisibility window for K_i, where the relative translations and rotations between covisible keyframes, as estimated in the current session, are provided as delta-pose measurements, while the 6-DoF poses of the terminal nodes in the covisibility window, as well as of K_i, are held constant. In particular, let K_0 denote the set of terminal keyframes in the covisibility graph cov(i, N_m). The delta-pose pseudo-measurements are formed from GEOSLAM's pre-merge estimates of the relative keyframe poses, where θ(·) denotes the angle-axis representation of the input rotation matrix. The pose-graph optimization minimizes a weighted sum of squared delta-pose residuals, with weighted squared norm ‖q‖²_P = qᵀPq, over (x^S_Cn, θ^S_Cn) ∀ n ∈ cov(i, N_m), subject to the constraints that the terminal-node poses and the K_i pose remain fixed. As a result of this pose-graph optimization, any discontinuity at the merge location between the prior and current keyframes is smoothed out, as shown in Figure 9. Subsequently, the map points seen in the current keyframes are adjusted in accordance with the pose-graph optimization. Finally, duplicated map points near the merge location are fused together. As a result of shared feature matches between the current and prior keyframes, the updated covisibility window for K_i includes keyframes from the prior session(s).
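The delta-pose pseudo-measurements and their residuals, whose weighted squared norms the pose-graph optimization minimizes, can be sketched as follows, with rotation error expressed in angle-axis form per θ(·); a simplified illustration:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def delta_pose(pose_a, pose_b):
    """Relative pose of keyframe b expressed in keyframe a's frame, used as
    the delta-pose pseudo-measurement between covisible keyframes. Each
    pose is a (position, Rotation) pair in the S frame."""
    p_a, R_a = pose_a
    p_b, R_b = pose_b
    dt = R_a.inv().apply(p_b - p_a)   # relative translation in a's frame
    dR = R_a.inv() * R_b              # relative rotation
    return dt, dR

def delta_pose_residual(meas, pose_a, pose_b):
    """Residual between a stored pre-merge delta pose and the delta pose
    implied by the current estimates; rotation error in angle-axis form."""
    dt_meas, dR_meas = meas
    dt, dR = delta_pose(pose_a, pose_b)
    rot_err = (dR_meas.inv() * dR).as_rotvec()
    return np.concatenate([dt - dt_meas, rot_err])
```

Because the delta-pose measurements are taken from the pre-merge estimates, minimizing these residuals redistributes the merge-point discontinuity smoothly along the window instead of leaving a single large jump at K_i.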
The merged map is then optimized in a windowed BA, this time with feature point coordinates and GNSS positions as measurements. Note that this is a joint windowed BA with both current and prior keyframes and map points. As a result, both the current and prior states are appropriately adjusted based on the number and covariance of the feature point and GNSS measurements. The result of the map merging routine is shown in Figure 10.

Empirical Results
To validate the results obtained in the above analyses, GNSS and visual data were collected in a moderate urban area north of the University of Texas at Austin campus in Austin, TX. This section presents the data collection setup, error statistics of various flavors of code-phase GNSS positioning, and results from GEOSLAM's multi-session GNSS-aided-visual mapping.

Rover and Reference Platforms
The rover GNSS receiver is one among several sensors housed in an integrated perception platform called the University of Texas Sensorium [6]. Designed for connected and automated vehicle research, the Sensorium is a self-contained sensor housing that can be mounted atop any standard passenger vehicle. Two Antcom G8Ant-3A4TNB1 triple-frequency patch antennas are flush-mounted in the cross-track direction on the Sensorium's upper plate, separated by just over one meter. The antennas' signals are routed to a unified radio frequency (RF) front end whose output intermediate frequency (IF) samples are processed in real time (with less than 10 ms of latency) by the Sensorium's onboard computer.
The samples are also stored to disk for post-processing. The experimental setup also includes a surveyed GNSS reference station that aids in the generation of the ground truth trajectory.
The GNSS data were processed by a software-defined GNSS receiver tracking signals from GPS L1 C/A, GPS L2CLM, Galileo E1, and SBAS. Data from both the primary (passenger's side) and secondary (driver's side) antennas were used to reconstruct a sub-dm-accurate CDGNSS-based ground truth trajectory, as described in [6]. Enhanced code-phase positioning was performed on the data from the primary antenna, incorporating precise orbit and clock products from IGS, ionospheric corrections from WAAS satellites, and the Saastamoinen model for tropospheric corrections, in addition to NIS-based exclusion of multipath signals. Double-differenced pseudorange-based positioning was also performed with the data from the primary antenna, as discussed later in this section. The code-phase-based position estimates were (i) compared against the ground truth from the primary antenna to study the code-phase positioning error statistics, and (ii) fed to GEOSLAM for vision-GNSS sensor fusion. The primary antenna feed was also input to a u-blox M8T receiver for comparison against the enhanced code-phase software receiver.
The Sensorium features a front-facing stereo camera rig composed of two Basler acA2040-35gm cameras that capture synchronous stereo image pairs when triggered by a signal tied to the GNSS front-end's sampling clock. The images are captured in grayscale at 10 frames per second and timestamped by the Sensorium's computer. The cameras are configured to automatically adjust the exposure time based on lighting, while the focal length, focus, and aperture are held fixed, having been adjusted physically prior to capture.

Test Route
The test route was a 1-km loop north of the University of Texas at Austin campus in Austin, TX. The route included a variety of light-to-moderate urban conditions, from open sky to overhanging trees to built-up areas. The Dean Keeton corridor, toward the left in Figure 11, was the most challenging stretch along the test route for GNSS positioning. It passes below a pedestrian bridge and is flanked on both sides by buildings 30 to 65 m tall, set back 28 m from the center of the roadway. To study the code-phase-based positioning error characteristics over time-separated sessions in the same area, and to perform multi-session mapping with GEOSLAM, multiple laps of the test route were driven over six separate campaigns. The first two campaigns were conducted on 21 December 2017 and 15 January 2018, while the remaining four were conducted in two pairs on 3 June 2018 and 4 June 2018. The GNSS error charts are presented for a total of 75 laps of the test route, while multi-session mapping with GEOSLAM was performed over eight laps/sessions of data from the four latest campaigns.
Imagery collected over the four June 2018 campaigns, examples of which are shown in Figure 12a, exhibits appreciable visual diversity, offering a real-world challenge to multi-session GEOSLAM operation. Figure 13 shows the error in the enhanced code-phase GNSS position solutions with respect to the ground truth, plotted versus distance along the 1-km loop. The beginning of this loop was taken to be immediately after the overhead pedestrian bridge along the Dean Keeton corridor. The enhanced code-phase GNSS errors are clustered separately for each of the campaigns, and each cluster is offset from zero by as much as 1 m in the horizontal plane. Such error characteristics are representative of ionospheric modeling errors, which have a long decorrelation time. It is also evident that the error variance was larger as the receiver exited the challenging portion of the loop, at which point the tracking loops were recovering from signal loss under the bridge; this effect was especially pronounced in the vertical direction. Figure 14 shows the corresponding errors for the commercial u-blox M8T receiver. The error traces from the u-blox receiver show a wider spread than those of the enhanced code-phase receiver, likely due to the lack of precise orbit and clock corrections. On the basis of Figures 13 and 14, one might be tempted to conclude that errors in enhanced code-phase and stand-alone GNSS navigation solutions are substantially non-zero-mean, especially in the north and up directions, despite the overhead GNSS constellation changing substantially between sessions. It certainly appears that the permanent structures (buildings, bridge) along the test loop induced a bias in the vertical direction during the first 400 m of the loop.
However, the bias in the north direction, and to a lesser extent in the east, may be merely an artifact of the small sample size: the ionospheric modeling errors had not yet averaged down to their asymptotic values of nearly zero in the east and ∼30 cm in the north, as one would expect from the WAAS ionospheric model (see Table 1).
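The session-clustered error behavior described above can be sketched numerically. The toy simulation below is not drawn from the paper's data; the standard deviations are invented for illustration. It models a per-campaign ionospheric modeling bias that is shared by all laps within a campaign (reflecting the long decorrelation time), plus independent per-lap noise, and shows that the mean error is driven down by the number of *campaigns*, not the number of laps:

```python
import random
import statistics

random.seed(1)

# Hypothetical error magnitudes (illustrative values only, meters):
IONO_SIGMA_M = 0.5   # assumed std. dev. of the per-campaign ionospheric bias
LAP_SIGMA_M = 0.3    # assumed std. dev. of independent per-lap noise

def campaign_errors(n_campaigns, laps_per_campaign):
    """Simulate one-axis position errors across time-separated campaigns.

    Each campaign draws a single slowly-varying ionospheric bias that is
    common to all of its laps; each lap adds independent noise on top.
    """
    errors = []
    for _ in range(n_campaigns):
        iono_bias = random.gauss(0.0, IONO_SIGMA_M)  # shared within campaign
        for _ in range(laps_per_campaign):
            errors.append(iono_bias + random.gauss(0.0, LAP_SIGMA_M))
    return errors

for n in (2, 6, 20):
    errs = campaign_errors(n, laps_per_campaign=2)
    print(f"{n:2d} campaigns: mean error = {statistics.mean(errs):+.2f} m")
```

Because the ionospheric bias is common within a campaign, driving many laps in one campaign does not reduce it; the sample mean shrinks roughly as the per-campaign sigma divided by the square root of the number of campaigns, which is why only a handful of campaigns can leave a residual bias of the kind seen in the north direction.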

Empirical GNSS Error Analysis
Given that the asymptotic properties of ionospheric modeling errors are better understood than those of multipath errors, it is instructive to eliminate, insofar as possible, all ionospheric modeling errors from the along-track error histories. To this end, a differential code-phase GNSS technique was applied whereby the navigation solution was based on double-difference pseudorange measurements using data from a nearby reference station at a precisely known location. Such double differencing over a short 1-km baseline eliminates virtually all ionospheric and tropospheric errors, but does nothing to reduce vehicle-side multipath. Thus, one can empirically examine multipath effects in isolation from ionospheric effects. Figure 15 shows the results of this study based on all six data capture campaigns. Note that the biases in all components are much smaller than in Figures 13 and 14. It appears that, for the test route chosen, the non-zero-mean horizontal errors in the enhanced code-phase positions were almost entirely driven by ionospheric modeling errors, and not by persistent multipath effects due to the permanent structures along the test route. This is broadly consistent with the analyses presented earlier in this paper on position-domain biases due to ionospheric and multipath errors. However, a bias due to multipath did remain in the vertical direction over the first 400 m, even with ionospheric errors removed. Apparently, the arrangement of buildings over this segment caused non-line-of-sight effects that did not average away. Fortunately, horizontal errors, which appear to be close to zero-mean over the six campaigns, matter most for high-accuracy digital mapping, since obstacle avoidance and vehicle coordination are largely 2-D problems, and since multiple vehicles can straightforwardly agree on a particular feature's relative vertical position from an inferred road surface.
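The cancellation achieved by double differencing can be illustrated with a toy pseudorange model; all numbers below are invented and measurement noise is omitted for clarity. Over a short baseline, differencing across the two receivers cancels the satellite clock and (nearly common) atmospheric terms, and differencing again across two satellites cancels the receiver clocks, leaving only the rover-side multipath:

```python
# Geometric ranges (m) to satellites j (pivot) and k from rover and base.
rng = {("rov", "j"): 21e6, ("rov", "k"): 23e6,
       ("bas", "j"): 21e6 + 310.0, ("bas", "k"): 23e6 - 120.0}

sat_clk = {"j": 4.2, "k": -1.7}       # satellite clock offsets (m)
atmo    = {"j": 3.7, "k": 4.7}        # iono + tropo delay, common to both
                                      # receivers over a 1-km baseline (m)
rx_clk  = {"rov": 15.0, "bas": -8.0}  # receiver clock offsets (m)
mp      = {("rov", "j"): 0.8, ("rov", "k"): -0.5,  # rover-side multipath
           ("bas", "j"): 0.0, ("bas", "k"): 0.0}   # survey-grade base: none

def meas(rx, sv):
    # Simplified code-phase measurement model (all terms in meters).
    return rng[(rx, sv)] + rx_clk[rx] - sat_clk[sv] + atmo[sv] + mp[(rx, sv)]

# Single differences (rover - base) cancel satellite clocks and atmosphere;
# the double difference (k - j) then cancels the receiver clocks as well.
sd = {sv: meas("rov", sv) - meas("bas", sv) for sv in ("j", "k")}
dd = sd["k"] - sd["j"]

# Subtracting the purely geometric double difference exposes what remains.
dd_geom = ((rng[("rov", "k")] - rng[("bas", "k")])
           - (rng[("rov", "j")] - rng[("bas", "j")]))
residual = dd - dd_geom
print(f"DD residual after removing geometry: {residual:+.2f} m")
# Only the rover-side multipath difference (-0.5 - 0.8 = -1.3 m) survives.
```

This is why the double-differenced solutions in Figure 15 isolate multipath: every other error term in the toy model cancels identically, while the vehicle-side multipath terms pass through undiminished.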
Based on Figure 15, one can conclude that multi-session averaging with a sufficiently accurate ionospheric model, such as the Fast PPP model, yields sub-50-cm global referencing accuracy for digital maps in the horizontal plane with code-phase-based GNSS, even in the presence of persistent multipath.

Multi-Session Mapping Results
GEOSLAM processed two laps/sessions of data from each of the four campaigns conducted on 3 and 4 June, fusing the visual data from the captured images with the double-differenced pseudorange-based position estimates of the primary antenna. Figure 16 summarizes the results of GEOSLAM's multi-session GNSS-aided visual SLAM. The black data points denote the difference between the ground truth trajectory of the primary GNSS antenna and GEOSLAM's estimate of the same in the local east, north, and up directions for all eight sessions. The gray data points denote the difference between the ground truth trajectory of the primary GNSS antenna and the coincident double-differenced pseudorange-based estimate of the same for all eight sessions. As one might expect, the error in GEOSLAM's estimate of the antenna position was approximately the same as the average double-differenced pseudorange-based error over the eight sessions. Furthermore, owing to the approximately zero-mean nature of the double-differenced pseudorange-based estimates, the GEOSLAM estimate of the trajectory was within 50 cm of the truth trajectory in the horizontal plane. Note that the error in GEOSLAM's position estimate was highly repeatable over the eight sessions, so much so that there appears to be a single black trace in Figure 16, while in truth eight independent traces were plotted. This indicates that (i) the localization of the vehicle within the visual map was highly precise: GEOSLAM made the same errors with respect to ground truth over eight different sessions; and (ii) the visual map was merged across eight sessions from four different campaigns: if the maps from any two campaigns had not been merged through visual matching of features, then the GNSS position estimate for a keyframe from one campaign would not affect a keyframe from a different campaign, since the two would not be covisible, and thus the eight black traces in Figure 16 would not overlap.
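The averaging effect of merging maps across sessions can be sketched with a one-dimensional toy example. This is not GEOSLAM's actual optimizer, and the per-session biases are invented; the point is only that once per-session maps are merged via covisible features, a shared keyframe coordinate is constrained by every session's GNSS fix, and its least-squares estimate reduces to their mean:

```python
import statistics

# Hypothetical per-session GNSS biases (m, one axis) for a keyframe that is
# revisited, and hence covisible, across four mapping sessions.
session_bias = [+0.9, -0.4, +0.3, -0.6]
truth = 10.0  # true east coordinate of the keyframe (m)

gnss_fixes = [truth + b for b in session_bias]

# Merged case: one shared variable constrained by all fixes. With equal
# weights, the least-squares solution is the sample mean of the fixes.
merged_estimate = statistics.mean(gnss_fixes)
print(f"merged error:    {merged_estimate - truth:+.2f} m")

# Unmerged case: each session keeps its own map and its own biased estimate.
for i, fix in enumerate(gnss_fixes, 1):
    print(f"session {i} alone: {fix - truth:+.2f} m")
```

In the merged case the residual error equals the mean of the session biases, which shrinks as sessions accumulate, whereas each unmerged map retains its full session bias; this mirrors why the black traces in Figure 16 coincide only because the eight sessions were merged into a single jointly optimized map.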

Conclusions
The accuracy limits of collaborative global referencing of digital maps with standard GNSS were explored through simulation and real data. The asymptotic average of position errors due to thermal noise, satellite orbit and clock errors, and tropospheric modeling errors was assumed to be negligible. It was shown that the position error due to inaccurate ionospheric modeling may lead to persistent dm-level biases in the horizontal position if the corrections are sourced from the IGS GIM, whereas more recent models such as the Fast PPP IONEX GIM perform better in this regard. Multipath errors persist over multiple mapping sessions through the same urban corridor and may not be zero-mean. With adequate multipath exclusion, persistent multipath biases may be reduced below 50 cm on average. In conclusion, sub-50-cm-accurate digital mapping has been shown to be feasible in the horizontal plane after multiple mapping sessions with code-phase-based GNSS, though larger biases persist in the vertical direction. A globally-referenced electro-optical SLAM pipeline, termed GEOSLAM, has been detailed and demonstrated to achieve sub-50-cm horizontal localization accuracy in a moderate urban environment by incorporating code-phase-based GNSS position estimates in the visual SLAM framework and jointly optimizing maps merged across time-separated sessions.