1. Introduction
Autonomous navigation has drawn significant attention in recent years for its potential applications in unmanned systems. SLAM enables localization and mapping in GNSS-denied environments, making it a crucial and widely adopted technology for autonomous navigation across various applications [1,2]. Over the years, SLAM has undergone significant advancements, transitioning from pure vision-based approaches to more robust multi-sensor fusion methods that integrate LiDAR, inertial measurement units (IMU), and cameras to enhance localization accuracy and robustness. The most advanced LIV-SLAM systems [3,4,5,6] exploit the depth accuracy of LiDAR, the motion constraints of the IMU, and the feature richness of vision sensors to achieve high-precision and reliable localization and mapping in complex, real-world environments.
Despite the significant improvements achieved by recent works, SLAM still faces several challenges. One major issue is the accumulation of errors over long-distance operation, leading to localization drift and map distortion. Additionally, global optimization for correcting accumulated errors in SLAM systems depends heavily on loop closure detection [7,8,9,10]. However, in many real-world scenarios, loop closures may be absent or challenging to detect.
Motivated by the limits of current methods and practical demands, we propose a real-time geo-localizing navigation method for land vehicles that relies only on LIV-SLAM and a referenced image. Our method is designed to function without GPS or loop closure. Instead, it estimates the global position by aligning the map constructed in real time to a referenced image with an efficient cross-modal matching algorithm.
This paper makes three main contributions:
(1) A fast and robust cross-modal matching algorithm is proposed for real-time geo-localization of land vehicles. The matching algorithm is designed to address the distortions caused by cross-modal imaging, downward-view projection, and cross-temporal scene variations, as shown in Figure 1. Figure 1a,b demonstrate cross-modal distortion, where the grayscale appearance of the same object varies significantly across modalities. Figure 1c,d depict projection distortion, in which the object's appearance changes due to geometric transformation. Figure 1e,f show scene variations resulting from temporal discrepancies between the projection image and the referenced image. Notably, all projection images contain invalid regions, typically visualized as black areas, which may introduce ambiguity and reduce the robustness of the matching process. We exploit the Dense Structure Principal Direction (DSPD) [11] to capture gradient-orientation-based texture patterns, which remain consistent across cross-domain images. Furthermore, the similarity measurement is optimized, and the search efficiency is significantly improved with a fast correlation in the frequency domain. The ablation experiments demonstrate that the proposed cross-modal matching algorithm achieves the best results in both accuracy and efficiency. A detailed comparative evaluation of different matching algorithms is presented in Section 4.3.1, which clearly shows the superior performance of our method in terms of matching precision and computational speed.
(2) This paper proposes an adaptive Kalman filter (AKF) based on motion compensation (MC) and an Observation-Aware Gain Scaling (OAGS) mechanism. With motion compensation, the proposed AKF effectively eliminates prediction errors caused by observation delays and asynchronous updates. Ablation experiments demonstrate that MC effectively mitigates the adverse effects of observation delay, preventing degradation in fused localization accuracy, as detailed in the comparison of filters presented in Section 4.3.2. The OAGS mechanism dynamically adjusts the Kalman gain based on the matching coefficient, prediction deviation, and matching inconsistency, minimizing the adverse impact of incorrect matches on state estimation. This enhancement enables the proposed matching-based autonomous navigation approach to adapt more effectively to complex environmental variations, ensuring robust and reliable localization, as validated by the experimental results in Section 4.2 (Adaptability Evaluation) and Section 4.3.2 (Comparison of Filters).
(3) Building on the contributions above, we propose a novel and practical long-distance autonomous navigation approach for land vehicles. The proposed method is evaluated on the R3LIVE [3,4] and NCLT [12] datasets, demonstrating its effectiveness in diverse real-world scenarios. Experimental results show that the proposed method enables real-time global localization and effectively addresses the issue of position drift in LIV-SLAM systems. It achieves an average RMSE of 2.03 m across all sequences of the NCLT dataset in outdoor environments and 2.64 m on the R3LIVE dataset under the same conditions.
To clarify the scope of this work, it should be noted that the goal of our research is not to present a universally optimal global localization solution. Rather, we aim to address a practical and increasingly common scenario: autonomous navigation in GNSS-denied environments where LiDAR and visual sensors are available.
3. Materials and Methods
3.1. System Overview
The overall framework of our system is illustrated in Figure 2, consisting of four main components: LIV-SLAM, local image real-time generating (Section 3.2), fast cross-modal image matching (Section 3.3), and AKF (Section 3.4).
In our system, an existing LIV-SLAM framework is utilized for pose estimation and environment reconstruction. During the mapping process, the newly updated RGB point cloud, as shown in Figure 3a, is fed into a local image real-time generating algorithm. This method effectively addresses the cross-view challenge by incrementally projecting the local observations into a vertical downward-view image, as shown in Figure 3b. The fast cross-modal image matching module then aligns the projected image with the referenced image to obtain an absolute position in the referenced frame, as shown in Figure 3c. Finally, the AKF module fuses the horizontal velocity estimated by LIV-SLAM with the global position obtained from the fast cross-modal image matching algorithm. This integration provides a refined position expressed in the coordinate frame of the referenced image.
Notably, the LIV-SLAM module can be implemented using various state-of-the-art methods, such as Fast-LIO [33,34], R3LIVE [3], R3LIVE++ [4], Fast-LIVO [6,7], and so on. Although Fast-LIO is primarily a LiDAR-inertial odometry system, an RGB point cloud map can be rendered through the extrinsic parameters obtained by the calibration between the LiDAR and the camera. The referenced satellite imagery used in our method is preloaded and stored locally on the onboard computer. No online map services or downloads are required during runtime.
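To make the data flow described above concrete, the sketch below outlines one possible per-update loop. All class and method names (e.g., slam.updates(), projector.render()) are hypothetical placeholders, not the actual implementation.

# Hypothetical per-update loop of the proposed system (all names are placeholders).
def run_navigation(slam, projector, matcher, akf, reference_image):
    for update in slam.updates():                   # newly colored points + odometry velocity
        projector.add_points(update.rgb_points)     # incremental downward-view projection
        local_img, mask = projector.render()        # projection image and validity mask
        match = matcher.match(local_img, mask, reference_image, akf.predicted_position())
        akf.predict()                               # fixed-frequency prediction step
        akf.update_velocity(update.velocity, update.stamp)
        if match is not None:
            akf.update_position(match.position, match.confidence, match.stamp)
        yield akf.state()                           # fused pose in the reference frame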
3.2. Local Image Real-Time Generating
The local image generating algorithm integrates features from multiple LiDAR scans and camera frames, thereby significantly enhancing the feature richness compared to single-view approaches.
Figure 4 gives the details of the projection.
Given a point set:
$$\mathcal{P} = \left\{ \left(\mathbf{p}_{i}, \mathbf{c}_{i}\right) \right\}_{i=1}^{N}, \quad (1)$$
where $\mathbf{p}_{i} = (x_{i}, y_{i}, z_{i})$ represents the three-dimensional spatial coordinates, and $\mathbf{c}_{i} = (r_{i}, g_{i}, b_{i})$ represents the corresponding color information. For each pixel at $(u, v)$, a subset of $\mathcal{P}$ within a radius is selected as contributing points, denoted as $\mathcal{P}_{u,v}$. A projection map can be calculated using a Gaussian-weighted averaging method as follows:
$$I(u, v, k) = \frac{\sum_{i \in \mathcal{P}_{u,v}} \exp\!\left(-\frac{d_{i}^{2}}{2\sigma^{2}}\right) c_{i,k}}{\sum_{i \in \mathcal{P}_{u,v}} \exp\!\left(-\frac{d_{i}^{2}}{2\sigma^{2}}\right)}, \quad (2)$$
where $d_{i}$ denotes the Euclidean distance between point $\mathbf{p}_{i}$ and the calculated pixel $(u, v)$, $\sigma$ is the standard deviation, and $k$ represents the color channel, as each channel of the color $\mathbf{c}_{i}$ is averaged in the same manner.
A straightforward implementation can be adopted. First, a projection image is determined based on the size of the projection area and the specified resolution. The color of each pixel is then computed using Equation (2), as illustrated in Figure 4. Assuming the image size is $W \times H$ and there are $N$ points within the projection area, the computational complexity of the approach is $\mathcal{O}(W H N)$. It is evident that this complexity increases rapidly with larger projection areas and finer resolutions.
To reduce computational overhead, the projection process is designed in an incremental way, using only the updated point cloud from the LIV-SLAM module. Specifically, each new point is projected onto the pixels within its effective weighting area. The numerator and denominator of (2) are accumulated into a value map $V(u, v, k)$ and a weight map $\Omega(u, v)$, respectively. Then, the local projection image is generated by pixel-wise division of the value map by the weight map as follows:
$$I(u, v, k) = \frac{V(u, v, k)}{\Omega(u, v)}. \quad (3)$$
The computational complexities of the first and second steps are $\mathcal{O}(S \cdot N_{u})$ and $\mathcal{O}(W \cdot H)$, respectively, where $S$ is the size of the effective weighting area for a single pixel, and $N_{u}$ represents the number of updated points. Typically, the weighting region for a single pixel is much smaller than the entire projection area ($S \ll W \cdot H$), and the number of updated points is significantly less than the total number of points within the projection area ($N_{u} \ll N$). As a result, the incremental implementation is considerably more efficient. Moreover, the cost of accumulating new points is unaffected by increases in image size or resolution, which enhances matching reliability and accuracy by allowing larger projection images to capture more distinctive and detailed features.
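As an illustration of the incremental scheme, the following minimal numpy sketch accumulates the numerator and denominator of Equation (2) into a value map and a weight map and renders the image by pixel-wise division, as in Equation (3). The class name, weighting-window radius, and default parameters are our own assumptions, not taken from the actual implementation.

import numpy as np

class IncrementalProjector:
    """Incremental downward-view projection with Gaussian-weighted averaging (cf. Eqs. (2)-(3))."""

    def __init__(self, size_px=1500, resolution=0.1, sigma=0.1, radius_px=2):
        self.res = resolution                         # meters per pixel
        self.sigma = sigma                            # Gaussian standard deviation (meters)
        self.r = radius_px                            # half-size of the weighting window (pixels)
        self.value = np.zeros((size_px, size_px, 3))  # accumulated numerator of Eq. (2)
        self.weight = np.zeros((size_px, size_px))    # accumulated denominator of Eq. (2)

    def add_points(self, xyz, rgb, origin):
        """Accumulate newly updated colored points (xyz in meters, origin = map corner)."""
        H, W = self.weight.shape
        for (x, y, _), color in zip(xyz, rgb):
            u = (x - origin[0]) / self.res            # fractional pixel coordinates
            v = (y - origin[1]) / self.res
            iu, iv = int(round(u)), int(round(v))
            for du in range(-self.r, self.r + 1):     # only the small weighting window is touched
                for dv in range(-self.r, self.r + 1):
                    pu, pv = iu + du, iv + dv
                    if 0 <= pu < W and 0 <= pv < H:
                        d2 = ((pu - u) ** 2 + (pv - v) ** 2) * self.res ** 2
                        w = np.exp(-d2 / (2.0 * self.sigma ** 2))
                        self.value[pv, pu] += w * np.asarray(color, dtype=float)
                        self.weight[pv, pu] += w

    def render(self):
        """Pixel-wise division (Eq. (3)); pixels with zero weight are invalid regions."""
        mask = self.weight > 1e-6
        img = np.zeros_like(self.value)
        img[mask] = self.value[mask] / self.weight[mask, None]
        return img, mask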
3.3. Fast Cross-Modal Matching
To address the distortions illustrated in Figure 1, we propose a fast cross-modal matching algorithm based on the DSPD [11]. The DSPD is a feature descriptor based on the structural orientation of images, which can effectively adapt to nonlinear intensity variations across different imaging modalities. Although the DSPD feature is robust to cross-modal distortions, it does not consider the invalid regions caused by the projection, as shown in Figure 1. Moreover, the matching algorithm proposed in [11] suffers from high computational cost, as its similarity metric requires a full search in the spatial domain combined with an affine search. For the alignment between the local projection image and the referenced image, which involves only translational uncertainty, this leads to excessive search redundancy.
To enhance the computational efficiency, we optimize the similarity metric and employ a translational full search, which can be realized with fast correlations in the frequency domain. Specifically, the dissimilarity metric proposed in [11] can be rewritten as a similarity metric defined as follows:
$$s_{\mathrm{AD}}(\mathbf{x}) = \left| \, \big|\theta_{P}(\mathbf{x}) - \theta_{R}(\mathbf{x})\big| - \frac{\pi}{2} \right|, \quad (4)$$
where $\theta_{P}(\mathbf{x})$ denotes the DSPD at a pixel $\mathbf{x}$ of the projection image, and $\theta_{R}(\mathbf{x})$ denotes the DSPD at the corresponding pixel of the referenced image.
To incorporate fast correlation in the frequency domain, we modify the similarity metric in (4) by introducing a double-angle cosine similarity as below:
$$s_{\mathrm{DC}}(\mathbf{x}) = \cos\!\left( 2\theta_{P}(\mathbf{x}) - 2\theta_{R}(\mathbf{x}) \right). \quad (5)$$
Both metrics give minimal similarity when the two DSPDs are mutually orthogonal and maximal similarity when they are parallel. In fact, Equation (5) can be derived from Equation (4) according to its Fourier expansion. The detailed derivation of the equations is provided in the Supplementary Materials.
Furthermore, the translational full-search matching with the double-angle cosine similarity metric can be expressed in the form of two correlations, as shown in (6):
$$S(\boldsymbol{\tau}) = \sum_{\mathbf{x}} m(\mathbf{x}) \cos\!\left(2\theta_{P}(\mathbf{x})\right) \cos\!\left(2\theta_{R}(\mathbf{x} + \boldsymbol{\tau})\right) + \sum_{\mathbf{x}} m(\mathbf{x}) \sin\!\left(2\theta_{P}(\mathbf{x})\right) \sin\!\left(2\theta_{R}(\mathbf{x} + \boldsymbol{\tau})\right), \quad (6)$$
where $\boldsymbol{\tau}$ denotes the candidate translation and $m(\mathbf{x})$ is the mask of the projection image, which can be obtained during the projection process by binarizing the weight map. It is used to eliminate the influence of invalid regions in the projection image on the matching performance, as illustrated in Figure 1.
According to the convolution theorem, the full search over translations in the spatial domain can be efficiently implemented in the frequency domain with the Fast Fourier Transform (FFT) [35]. The translational full search with the similarity metric presented in Equation (4) has to be realized in the spatial domain. Given a projection image of size N × N and a search area of size M × M, the computational complexity of the full search in the spatial domain is $\mathcal{O}(N^{2}M^{2})$. In contrast, the translational full search with the similarity metric presented in (6) can be realized in the frequency domain, where the FFT dominates the computational cost, resulting in an overall complexity of $\mathcal{O}(M^{2}\log M)$ [35]. The comparison of matching algorithms in the ablation experiments (Section 4.3) demonstrates that the optimization significantly enhances the computational efficiency while maintaining the matching accuracy.
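To illustrate the frequency-domain search, the following numpy sketch evaluates the masked double-angle cosine similarity of Equation (6) at all translations using FFT-based cross-correlation. It assumes the DSPD maps are available as per-pixel orientation angles and that the images are square; the helper names are ours, not the authors'.

import numpy as np

def fft_xcorr(template, search):
    """Cross-correlation of `template` (N x N) with `search` (M x M) via the convolution theorem.

    Returns an (M - N + 1) x (M - N + 1) map of scores, one per candidate translation.
    Assumes square inputs for brevity.
    """
    M, N = search.shape[0], template.shape[0]
    pad = np.zeros_like(search)
    pad[:N, :N] = template
    # Correlation = IFFT( FFT(search) * conj(FFT(template)) ); keep the valid (non-wrapped) region.
    corr = np.fft.ifft2(np.fft.fft2(search) * np.conj(np.fft.fft2(pad))).real
    return corr[: M - N + 1, : M - N + 1]

def match_translation(theta_proj, mask, theta_ref):
    """Masked double-angle cosine similarity search (a sketch of Eq. (6)).

    theta_proj : DSPD orientation map of the projection image (N x N, radians)
    mask       : validity mask of the projection image (N x N, 0/1)
    theta_ref  : DSPD orientation map of the reference search area (M x M, radians)
    """
    # sum_x m(x) * cos(2*theta_P(x) - 2*theta_R(x + t)) splits into two correlations.
    score = (fft_xcorr(mask * np.cos(2 * theta_proj), np.cos(2 * theta_ref))
             + fft_xcorr(mask * np.sin(2 * theta_proj), np.sin(2 * theta_ref)))
    dv, du = np.unravel_index(np.argmax(score), score.shape)  # best row/column offset
    return (du, dv), score[dv, du]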
3.4. AKF Based on MC and OAGS
(1) Model: The system state $\mathbf{x}$ of the proposed AKF is given as below:
$$\mathbf{x} = \begin{bmatrix} \mathbf{p}^{\top} & \mathbf{v}^{\top} & \mathbf{a}^{\top} \end{bmatrix}^{\top}, \quad (7)$$
where the three sub-vectors $\mathbf{p}$, $\mathbf{v}$, and $\mathbf{a}$ denote the position, velocity, and acceleration in the horizontal direction, respectively. The process model of the proposed AKF is a typical motion model, which can be expressed as follows:
$$\hat{\mathbf{x}}_{k}^{-} = \mathbf{F}\,\hat{\mathbf{x}}_{k-1} + \mathbf{w}_{k}, \quad (8)$$
where $\hat{\mathbf{x}}_{k}^{-}$ and $\hat{\mathbf{x}}_{k-1}$ represent the prior estimate of the state at time $k$ and the posterior estimate at time $k-1$, $\mathbf{F}$ gives the transition of the system state over time, and the process noise $\mathbf{w}_{k}$ is attributed to uncertainties in acceleration.
The position observations are provided by the cross-modal matching algorithm, while the LIV-SLAM submodule provides the measurements of velocity, leading to the measurement model as follows:
$$\mathbf{z}_{k} = \mathbf{H}\,\mathbf{x}_{k} + \mathbf{n}_{k}, \qquad \mathbf{H} = \begin{bmatrix} \mathbf{I} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \mathbf{0} \end{bmatrix}, \quad (9)$$
where $\mathbf{z}_{k}$ contains the measurements of position and velocity, $\mathbf{I}$ represents an identity matrix, and the measurement noise $\mathbf{n}_{k}$ consists of position noise and velocity noise, defined as follows:
$$\mathbf{n}_{k} = \begin{bmatrix} \mathbf{n}_{p}^{\top} & \mathbf{n}_{v}^{\top} \end{bmatrix}^{\top}. \quad (10)$$
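For concreteness, one common discrete constant-acceleration choice for the transition matrix $\mathbf{F}$ (with filter period $\Delta t$) is shown below; the paper does not spell out its exact parameterization, so this should be read as a representative example rather than the authors' matrix.

% A representative constant-acceleration transition matrix (an assumption, not taken from the paper).
$$\mathbf{F} =
\begin{bmatrix}
\mathbf{I}_{2} & \Delta t\,\mathbf{I}_{2} & \tfrac{1}{2}\Delta t^{2}\,\mathbf{I}_{2} \\
\mathbf{0}     & \mathbf{I}_{2}           & \Delta t\,\mathbf{I}_{2} \\
\mathbf{0}     & \mathbf{0}               & \mathbf{I}_{2}
\end{bmatrix}$$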
(2) Challenges: Since the observations are obtained from two independent submodules, data latency and synchronization issues arise. Both the position and the velocity observations are subject to a certain delay due to the processing time before being input into the filter. Additionally, there is a discrepancy in update frequencies between position and velocity observations. Erroneous position observations are another challenge. Although our cross-modal matching algorithm can adapt to image distortions to a certain extent, localization may degrade or even fail in the presence of significant scene changes or heavy occlusions, or during transitions into indoor environments.
(3) Motion Compensation (MC): In the proposed method, historical observational data and the estimated navigation states output by the AKF are retained. The position observations are compensated using the past velocity observations, while the velocity observations are compensated based on historical acceleration estimates. This compensation mechanism enables the Kalman filter to process asynchronous data in a temporally consistent manner, ensuring fixed-frequency operation while minimizing the impact of observational delays, as shown in Figure 5. The effectiveness of MC is validated in the comparison of filters in the ablation experiments (Section 4.3).
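A minimal sketch of the position compensation step is given below, assuming a zero-order hold of the buffered velocity observations between samples; the buffer layout and function name are ours, not the paper's.

import numpy as np

def compensate_position(pos_obs, t_obs, t_now, vel_buffer):
    """Propagate a delayed position observation from t_obs to the current filter time t_now.

    pos_obs    : (2,) position matched at time t_obs
    vel_buffer : iterable of (timestamp, (2,) velocity) samples, assumed sorted by time
    Uses a zero-order hold of the buffered velocities (an assumption of this sketch).
    """
    pos = np.asarray(pos_obs, dtype=float).copy()
    samples = [(t, np.asarray(v, dtype=float)) for t, v in vel_buffer if t_obs <= t <= t_now]
    if not samples:
        return pos                                    # nothing to integrate over
    times = [t_obs] + [t for t, _ in samples] + [t_now]
    vels = [samples[0][1]] + [v for _, v in samples]  # hold the first sample back to t_obs
    for (t0, t1), v in zip(zip(times[:-1], times[1:]), vels):
        pos += v * (t1 - t0)                          # integrate each constant-velocity segment
    return pos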
(4) Observation-Aware Gain Scaling (OAGS): To evaluate the confidence of position observations, we leverage the following three indicators:
Matching coefficient: This refers to the value of the similarity metric defined in (6), which is an indicator of matching reliability. A higher matching coefficient means a higher confidence level for a position observation, and vice versa.
Prediction deviation: This measures the distance between the matched position and the position predicted by the filter. A large prediction deviation indicates that the position observation is likely unreliable.
Matching inconsistency: After the initial matching, the sub-patches of the projection image are re-searched in their corresponding sub-search areas. The matching inconsistency $\delta$ is calculated as follows:
$$\delta = \frac{1}{N_{s}}\sum_{k=1}^{N_{s}} \left\lVert \mathbf{t}_{k} - \mathbf{t}_{0} \right\rVert, \quad (11)$$
where $\mathbf{t}_{0}$ is the translation estimated with the whole projection image, and $\mathbf{t}_{k}$ is the translation estimated with the $k$-th sub-patch. A large matching inconsistency typically suggests that the associated position observation is incorrect.
The three indicators are normalized and utilized as input to a sigmoid function, which is defined as follows:
$$c = \frac{1}{1 + \exp\!\left(-\left(a\,\mathbf{w}^{\top}\mathbf{q} + b\right)\right)}, \quad (12)$$
where $\mathbf{q}$ denotes the vector of the three normalized indicators, $\mathbf{w}$ is a weight vector, and $a$ and $b$ are linear transformation parameters. Based on the observation that the matching coefficient is positively correlated with the confidence of the matching result, while the latter two indicators are negatively correlated with it, the weight vector $\mathbf{w}$ is set with a positive weight for the matching coefficient and negative weights for the other two indicators, the scaling factor $a$ is set to 10 to effectively amplify the effect of $\mathbf{w}^{\top}\mathbf{q}$, and the bias term $b$ is set to 0. This configuration is fixed through all our experiments.
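A minimal sketch of this confidence mapping is given below, assuming the three indicators are already normalized; the default weights only encode the sign pattern stated above and are illustrative, not the paper's values.

import numpy as np

def position_confidence(match_coeff, pred_deviation, match_inconsistency,
                        weights=(1.0, -1.0, -1.0), scale=10.0, bias=0.0):
    """Map the three normalized indicators to a confidence in (0, 1) via a scaled sigmoid.

    The positive weight on the matching coefficient and negative weights on the other
    two indicators follow the stated sign pattern; the magnitudes are illustrative.
    """
    q = np.array([match_coeff, pred_deviation, match_inconsistency], dtype=float)
    z = scale * np.dot(weights, q) + bias
    return 1.0 / (1.0 + np.exp(-z))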
Based on the evaluation of position observation confidence, the OAGS is formulated as below:
$$\tilde{\mathbf{K}}_{k} = \mathbf{K}_{k} \begin{bmatrix} c\,\mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}, \quad (13)$$
where $\mathbf{K}_{k}$ represents the original Kalman gain, the position-related components of which are adaptively adjusted based on the confidence of the position observation, denoted as $c$. Consequently, the updates to the position state $\hat{\mathbf{x}}_{k}$ and the covariance matrix $\mathbf{P}_{k}$ are modified accordingly, as shown in Equations (14) and (15):
$$\hat{\mathbf{x}}_{k} = \hat{\mathbf{x}}_{k}^{-} + \tilde{\mathbf{K}}_{k}\left(\mathbf{z}_{k} - \mathbf{H}\hat{\mathbf{x}}_{k}^{-}\right), \quad (14)$$
$$\mathbf{P}_{k} = \left(\mathbf{I} - \tilde{\mathbf{K}}_{k}\mathbf{H}\right)\mathbf{P}_{k}^{-}. \quad (15)$$
Equations (13) and (14) show that the filter places greater reliance on the position observation when the confidence is high, while reducing the influence of erroneous position observations when the confidence is low.
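The gain-scaling step can be sketched as follows, assuming the 6-dimensional state of (7) and the position-velocity measurement of (9); only the gain columns associated with the position observation are scaled. This is a sketch of the idea, not the authors' exact formulation.

import numpy as np

def oags_update(x_prior, P_prior, z, H, R, confidence):
    """Kalman update with Observation-Aware Gain Scaling (sketch).

    x_prior    : (6,) prior state [px, py, vx, vy, ax, ay]
    z          : (4,) measurement [position(2), velocity(2)]
    confidence : scalar in (0, 1) from the matching-confidence sigmoid
    """
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)             # original Kalman gain
    D = np.diag([confidence, confidence, 1.0, 1.0])  # scale position-observation columns only
    K_tilde = K @ D                                  # observation-aware scaled gain
    x_post = x_prior + K_tilde @ (z - H @ x_prior)   # Eq. (14)-style state update
    P_post = (np.eye(len(x_prior)) - K_tilde @ H) @ P_prior  # Eq. (15)-style covariance update
    return x_post, P_post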
4. Experiments
To evaluate the performance of the proposed method, we conduct extensive experiments on public datasets, including the NCLT dataset [12] and the R3LIVE dataset [3,4]. Both datasets contain LiDAR, IMU, and camera data. To better evaluate the position drift correction of the proposed method, we selected 14 long-distance sequences covering diverse environmental conditions. Table 1 and Table 2 give the details of the selected sequences.
Localization accuracy is evaluated using the Absolute Position Error (APE), which is calculated in terms of RMSE:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\lVert \hat{\mathbf{p}}_{i} - \mathbf{p}_{i}^{\mathrm{gt}} \right\rVert^{2}},$$
where $\hat{\mathbf{p}}_{i}$ denotes the estimated position of the proposed method and $\mathbf{p}_{i}^{\mathrm{gt}}$ is the corresponding ground-truth position.
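For reference, an equivalent numpy computation of this metric over aligned arrays of estimated and ground-truth positions is:

import numpy as np

def ape_rmse(est, gt):
    """Absolute Position Error as RMSE over aligned (N, 2) or (N, 3) position arrays."""
    err = np.linalg.norm(np.asarray(est, dtype=float) - np.asarray(gt, dtype=float), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))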
Since matching-based positioning is infeasible indoors, some accuracy evaluations exclude all indoor results. Columns labeled as “outdoor” report statistics for outdoor localization only, while others include all results.
In our implementation, the projection area is set to 150 m × 150 m with an image resolution of 0.1 m/pixel, and the matching search radius is 20 m. The R3LIVE dataset was collected in August 2021, and the corresponding referenced imagery was collected in April 2022. The NCLT dataset used in our experiments was recorded in 2012 from February to November, and the corresponding referenced imagery was captured in April 2015. All referenced imagery was acquired from Google Earth at zoom level 19, with an approximate ground resolution of 0.2986 m/pixel. During processing, the referenced imagery was resampled to 0.1 m/pixel to maintain consistency with the resolution of the projection map.
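For clarity, the pixel-level quantities implied by these settings follow directly from the stated resolutions (the snippet below simply restates the numbers in this paragraph):

# Derived pixel-level quantities for the stated configuration.
proj_res = 0.1                           # m/pixel, projection image and resampled reference
ref_res = 0.2986                         # m/pixel, Google Earth zoom level 19 (approximate)

proj_size_px = int(150 / proj_res)       # 150 m x 150 m projection -> 1500 x 1500 pixels
search_radius_px = int(20 / proj_res)    # 20 m search radius -> 200 pixels
upsample_factor = ref_res / proj_res     # reference resampling factor, roughly 3x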
4.1. Outdoor Accuracy Evaluation
To evaluate the localization accuracy of our proposed system, we conducted experiments on the NCLT dataset and compared the results with two representative SLAM algorithms: R3LIVE++ [4] and FAST-LIO2 [34]. None of the compared methods employed loop closure. Our method utilizes FAST-LIO2 as the odometry backbone.
The experimental results are shown in Table 1, demonstrating that the proposed method significantly improves localization accuracy compared to R3LIVE++ and FAST-LIO2. This improvement is primarily attributed to the use of global localization based on image matching, which effectively eliminates error accumulation over time.
It is worth noting that the original R3LIVE++ paper does not report experimental results on the 2012-03-25 sequence. Our experiments further indicate that FAST-LIO2 exhibits significant drift on this sequence, with an RMSE of 19.30 m. However, our proposed method effectively corrects the drift by integrating matching-based observations.
Figure 6 presents a comparison between the proposed method and FAST-LIO2 on a representative segment from the NCLT dataset. As shown in Figure 6, the green, red, and blue lines represent the trajectories estimated by LIV-SLAM (FAST-LIO2), the proposed method, and the ground truth, respectively, while the yellow points indicate the localization results obtained from the matching algorithm. This color convention is consistently used throughout the paper. In Figure 6a, the initial segment of a sequence from the NCLT dataset shows that FAST-LIO2 maintains low drift and achieves localization accuracy comparable to our method. However, in Figure 6b, after approximately 1800 s, FAST-LIO2 begins to exhibit significant position drift due to accumulated odometry error. In contrast, our method continuously corrects this drift through matching-based global localization, thereby achieving improved overall localization accuracy.
We also conducted comparative experiments on two sequences from the R3LIVE dataset, evaluating the proposed method against R3LIVE and Fast-LIVO2. Since the original R3LIVE, under default parameters, has been reported to achieve a localization accuracy of approximately 0.1 m [3], we use its trajectory as the ground-truth reference for evaluation. Table 2 shows the localization accuracy of the three methods on the R3LIVE dataset, also demonstrating that the proposed approach achieves the minimum RMSE among all tested methods. In Table 2, R3LIVE- is a degraded version of R3LIVE, which operates with reduced point cloud resolution and a smaller number of tracked visual features, resulting in degraded odometry performance. For a fair comparison with the outputs of R3LIVE-, the LIV-SLAM module implemented in our system is also R3LIVE-. Therefore, the observed improvements in localization accuracy are attributed solely to the proposed method.
4.2. Adaptability Evaluation
Significant scene variations and transitions into indoor environments may cause substantial discrepancies between real-time projection images and referenced images, thereby greatly reducing the reliability of image matching. To address this issue, two strategies can be adopted. First, enlarging the projection image area can help include more corresponding features, thus improving matching robustness. Figure 7 presents an example of an indoor environment, where the red cross represents the current localization; Figure 7a,b show the projection image of the local region and its corresponding reference map, respectively. Compared to the 100 m × 100 m projection image, the 150 m × 150 m projection image includes a larger portion of the surrounding outdoor area, thereby capturing more features that effectively correspond to the reference image. Additionally, increasing the search range allows the matching algorithm to relocate the correct position after passing through disturbed or occluded regions.
Although the correct position can be relocated by increasing the search range, the fused localization accuracy will still be degraded by incorrect matching results. To mitigate this, the proposed OAGS mechanism adaptively adjusts the Kalman gain based on the matching confidence, thereby significantly suppressing the adverse impact of incorrect observations. As illustrated in Figure 8a, erroneous observations from image matching can lead to significant localization errors, while Figure 8b shows the result after applying the proposed OAGS mechanism. By effectively suppressing unreliable observations in indoor environments, the fusion trajectory with OAGS remains significantly closer to the ground truth, indicating a clear improvement in both localization robustness and accuracy.
4.3. Ablation Experiments
In this section, we conduct ablation experiments to evaluate the performance of the individual modules within the proposed framework.
4.3.1. Comparison of Matching Algorithms
The proposed matching algorithm is compared against other matching methods, including the Scale-Invariant Feature Transform (SIFT), OmniGlue [36], MI [37], and CFOG [38]. Among them, SIFT is a classic local feature-based algorithm, while OmniGlue is a more recent deep learning-based feature matching method that can be applied to cross-modal matching tasks. Both MI and CFOG are designed for cross-modal matching. Similar to our method, they adopt a translation-based full-search strategy for the matching. Two variants of our method are included in the comparison: ASD, a spatial-domain search based on the absolute difference metric (4), and CFD, a frequency-domain search utilizing the double-angle cosine similarity metric (6).
A total of 100 matching pairs, each consisting of a real-time projection image and the corresponding reference search region, were randomly sampled from the 14 sequences. A matching result is considered incorrect if its distance to the ground truth exceeds 5 m. The results in Table 3 demonstrate that our method achieves the best performance in terms of both matching accuracy rate and computational efficiency. Examples of the above matching methods are provided in Figure 9.
4.3.2. Comparison of Filters
In this section, we compare the performance of three variants of the Kalman filter: the standard Kalman filter (KF), the Kalman filter with motion compensation (KF + MC), and the Kalman filter with both motion compensation and the proposed Observation-Aware Gain Scaling (KF + MC + OAGS). The standard KF and KF + MC were each evaluated under three different positioning update frequencies to assess the impact of observation delay on localization performance.
The evaluation results are summarized in Table 4. The results show that as the localization observation frequency decreases (corresponding to an increase in observation delay from 1 s to 10 s), the standard KF suffers from significant degradation in localization accuracy. In contrast, the KF with MC maintains better performance due to the integration of motion compensation.
Figure 10 illustrates the trajectory segments of KF and KF + MC, highlighting the performance difference under a 10-s observation delay. Specifically, (a) shows the result of the standard KF, while (b) presents the corresponding trajectory obtained using KF + MC. The improvement is particularly evident in turning scenarios, where motion compensation effectively mitigates the impact of delayed observations.
Table 5 presents the comparison between KF + MC and KF + MC + OAGS. The results show that enabling OAGS further reduces the overall RMSE. This improvement is primarily due to the ability of OAGS to suppress the influence of incorrect matching results, which can otherwise degrade the fused localization accuracy, particularly in indoor environments where the referenced image is unavailable.
4.4. Computational Efficiency
In this section, we evaluate the average processing time of the proposed method. The experiments were conducted on a PC equipped with an Intel i7-1360P CPU and 64 GB of RAM, without any GPU acceleration.
The main computational cost of the proposed algorithm lies in the projection and matching modules. Enabling the OAGS mechanism introduces additional overhead due to the evaluation of matching inconsistency.
Table 6 reports the average processing time of each submodule in a single-threaded implementation. On the testing platform, the system runs at approximately 14 Hz with OAGS enabled and around 30 Hz without it.
5. Conclusions
This paper presents a global localization algorithm based on real-time map projection and cross-modal image matching, which can effectively correct the drift in existing LIV-SLAM systems such as Fast-LIO, R3LIVE++, and Fast-LIVO2. In addition, we propose an AKF incorporating MC and OAGS mechanisms, which fuses the outputs of the LIV-SLAM module with global localization results to achieve a novel and practical long-distance autonomous navigation approach for land vehicles.
Compared to existing cross-view-based autonomous navigation approaches [32], which achieve an average localization accuracy of about 20 m and operate at a relatively low frequency of 0.5–1 Hz, the proposed method achieves pixel-level alignment, resulting in significantly improved localization accuracy (RMSE of about 2 m) and real-time performance (approximately 14 Hz with OAGS enabled and around 30 Hz without it).
However, the proposed method has several limitations. First, it requires accumulating a sufficient amount of point cloud data before generating a projection image, and thus cannot support one-shot localization. Second, the system relies on the output of a LIV-SLAM module; if SLAM fails completely (e.g., due to extreme motion or sensor degradation), the localization may also fail. Additionally, in environments lacking structural or textural features, such as open grasslands or deserts, the image matching process may be unreliable.
Future work will focus on extending the proposed framework to a purely vision-based solution that eliminates the dependency on LiDAR sensors. This adaptation will improve the system’s applicability to a broader range of platforms.