Article

HV-LIOM: Adaptive Hash-Voxel LiDAR–Inertial SLAM with Multi-Resolution Relocalization and Reinforcement Learning for Autonomous Exploration

1 School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
2 Zhengzhou Research Institute, Beijing Institute of Technology, Zhengzhou 450000, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(24), 7558; https://doi.org/10.3390/s25247558
Submission received: 29 October 2025 / Revised: 9 December 2025 / Accepted: 9 December 2025 / Published: 12 December 2025
(This article belongs to the Section Radar Sensors)

Abstract

This paper presents HV-LIOM (Adaptive Hash-Voxel LiDAR–Inertial Odometry and Mapping), a unified LiDAR–inertial SLAM and autonomous exploration framework for real-time 3D mapping in dynamic, GNSS-denied environments. We propose an adaptive hash-voxel mapping scheme that improves memory efficiency and real-time state estimation by subdividing voxels according to local geometric complexity and point density. To enhance robustness to poor initialization, we introduce a multi-resolution relocalization strategy that enables reliable localization against a prior map under large initial pose errors. A learning-based loop-closure module further detects revisited places and injects global constraints, while global pose-graph optimization maintains long-term map consistency. For autonomous exploration, we integrate a Soft Actor–Critic (SAC) policy that selects informative navigation targets online, improving exploration efficiency in unknown scenes. We evaluate HV-LIOM on public datasets (Hilti and NCLT) and a custom mobile robot platform. Results show that HV-LIOM improves absolute pose accuracy by up to 15.2% over FAST-LIO2 in indoor settings and by 7.6% in large-scale outdoor scenarios. The learned exploration policy achieves comparable or superior area coverage with reduced travel distance and exploration time relative to sampling-based and learning-based baselines.

1. Introduction

Autonomous 3D mapping and localization are fundamental capabilities for mobile robots operating in complex, GNSS-denied environments. With the rapid advances in artificial intelligence and embodied intelligence, mobile robots are increasingly deployed in indoor inspection, environmental exploration, disaster response, and autonomous delivery. Their effectiveness relies critically on the accuracy, robustness, and computational efficiency of simultaneous localization and mapping (SLAM), which enables perception, mapping, and navigation without external positioning infrastructure.
Recent LiDAR–inertial SLAM systems have achieved impressive real-time performance by combining efficient point-cloud registration, tightly coupled LiDAR–IMU fusion, and global optimization. Representative methods such as LOAM [1], LIO-SAM [2], FAST-LIO2 [3], and their variants demonstrate that feature-based or direct registration on k-d tree or voxel-grid maps, together with optimization- or filtering-based fusion, can provide accurate and robust localization across diverse indoor and outdoor scenarios. Despite this progress, several limitations remain. First, map representations are often built upon static k-d tree or octree structures, leading to high memory consumption and degraded query/update efficiency as the map grows. Second, multi-session capabilities—including prior-map relocalization and long-term reuse of previously built maps—are frequently implemented as separate add-ons or not explicitly addressed. Third, autonomous exploration is commonly driven by handcrafted heuristics, lacking a unified learning-based decision-making module that is tightly coupled with the SLAM backbone.
To address these limitations, we propose HV-LIOM, a unified LiDAR–inertial SLAM and autonomous exploration framework that integrates adaptive hash-voxel mapping, learning-based loop closure, multi-resolution prior-map relocalization, and a deep reinforcement learning policy for active exploration. We refer to the overall SLAM and exploration system as HV-LIOM (Adaptive Hash-Voxel LiDAR–Inertial Odometry and Mapping). The LiDAR–inertial odometry front-end is denoted HV-LIO, and the full SLAM system with loop closure and global optimization is denoted HV-LIO-SAM. Unlike conventional LiDAR–inertial pipelines that primarily emphasize odometry and static mapping, HV-LIOM explicitly targets long-term, multi-session operation and autonomous exploration in unknown environments, while preserving real-time performance and high localization accuracy.
In summary, our contributions are as follows:
  • We propose an adaptive hybrid hash-voxel mapping strategy that models both planar and non-planar structures. The representation adaptively maintains surface and non-surface features, enabling efficient feature association and high-fidelity incremental map updates with improved memory efficiency.
  • We develop a learning-based loop-closure mechanism based on the Boundary Triangle Context (BTC) descriptor, which encodes geometric similarity between keyframes via triangle-level relational encoding. Integrated with global factor-graph optimization, this module improves global consistency and mitigates long-term drift.
  • We design an active exploration strategy based on the Soft Actor–Critic (SAC) algorithm, enabling robots to select informative navigation targets by balancing information gain and motion cost. This learning-based policy improves the efficiency and completeness of exploration in unknown environments.
  • We integrate a multi-resolution Normal Distributions Transform (NDT) relocalization module that performs coarse-to-fine probabilistic alignment, providing robust initialization and accurate localization against prior maps across diverse deployment conditions.

2. Related Work

2.1. LiDAR–Inertial SLAM

Point cloud registration is a central component of LiDAR–inertial SLAM [4,5]. Since the seminal Iterative Closest Point (ICP) algorithm [6], LiDAR-based SLAM has achieved substantial progress in registration accuracy and robustness [7]. However, the sparsity and non-uniformity of spinning LiDAR returns often limit the reliability of direct frame-to-frame ICP, especially in low-structure scenes.
To improve data association under sparse measurements, LOAM [1] extracts edge and planar features and formulates point-to-line and point-to-plane constraints. By combining high-rate odometry with low-rate mapping optimization, LOAM achieves a practical trade-off between accuracy and real-time performance. Building on this paradigm, LeGO-LOAM [8] enhances ground segmentation for wheeled platforms, while LIO-SAM [2] adopts a keyframe-based nonlinear factor-graph formulation to improve global consistency. Extending to solid-state sensors, LOAM-Livox [9] introduces adaptive feature extraction and motion distortion compensation to accommodate non-repetitive scanning patterns.
From the perspective of fusion architectures, loosely coupled approaches based on extended Kalman filtering (EKF) [10,11] are computationally attractive but can degrade in degenerate or low-observability environments. Tightly coupled methods have therefore become prevalent by directly incorporating raw LiDAR and IMU measurements within a unified estimation framework. Optimization-based systems such as LIO-Mapping [12] and IN2LAAMA [13] can achieve high accuracy at the cost of increased computation, whereas filtering-based approaches such as LINS [14] and the FAST-LIO series [3,15] leverage iterative error-state Kalman filtering and efficient map structures to balance precision and efficiency.
Beyond classical geometric feature extraction, recent studies have investigated voxelized representations and learning-based feature encoding to enhance 3D perception and registration. Yuan et al. [16] introduced a multi-hierarchical mutual information learning framework for unsupervised point cloud registration to capture cross-scale geometric dependencies. Cao et al. [17] proposed complementary pseudo-multimodal feature learning for point cloud anomaly detection, demonstrating effective cross-modal fusion. Zhao et al. [18] developed a voxel-pillar multi-frame cross-attention network for robust single-object tracking in sparse point clouds, highlighting the advantages of voxelized spatial encoding for temporal association. These works collectively indicate a shift toward voxelized, learning-informed representations that improve scalability and robustness, which motivates the adaptive hybrid hash-voxel mapping strategy proposed in this paper.

2.2. Loop Closure and Global Optimization

During long-term SLAM operation, measurement noise, modeling errors, and unobservable modes can lead to gradual drift accumulation, causing the estimated trajectory to deviate from the true robot motion. As a result, even when the robot revisits a previously mapped place, the trajectory may not form a consistent loop, yielding geometric inconsistencies in the reconstructed map. Loop closure aims to detect such revisits and introduce global constraints, enabling back-end optimization to suppress accumulated errors and maintain global consistency of both the trajectory and the map.
Existing loop closure methods can be broadly categorized into vision-based and point-cloud-based approaches. Vision-based techniques benefit from rich appearance and semantic cues but are sensitive to illumination and viewpoint changes [19]. In contrast, point-cloud-based methods are invariant to illumination and thus more suitable for outdoor and long-term deployment. Early point-cloud approaches such as 3D SIFT [20] and SHOT [21] rely on local descriptors computed from 3D point clouds or projected range images, but their performance can degrade under sparse, non-uniform, or partially occluded measurements. To improve robustness, a series of global descriptors have been proposed, including M2DP [22], Scan Context [23], MinkLoc3D [24], BoW3D [25], Ring++ [26], and STD [27]. More recently, deep learning-based place recognition and loop detection networks such as LCDNet [28], Logg3D-Net [29], LocNet [30], and OverlapNet [31] have further improved accuracy and recall by learning discriminative global and local geometric representations directly from point clouds.
After loop candidates are identified, most LiDAR–inertial SLAM systems employ pose-graph or factor-graph optimization to fuse multiple constraints, including loop closure, inter-frame LiDAR registration, and IMU pre-integration [32]. These constraints are typically formulated as a nonlinear least-squares problem and solved efficiently using incremental smoothing methods such as iSAM2. The optimized global trajectory is subsequently used to update the global map to achieve long-term consistency.
Despite this progress, several challenges remain. Local feature-based methods are sensitive to variations in point density and viewpoint, global descriptors may be confused by repetitive or self-similar structures, and deep learning-based approaches often require large-scale training data and may exhibit limited cross-domain generalization. Therefore, achieving a practical balance among computational efficiency, robustness, and generalization remains a key research direction for loop closure and global optimization in modern SLAM systems.

2.3. Active Exploration in Unknown Environments

Traditional SLAM deployments often rely on manual guidance or pre-scripted routes to complete environmental mapping [33,34,35]. To improve robot autonomy, active exploration has been widely studied, where robots autonomously plan actions to expand perceptual coverage and reduce mapping uncertainty during online operation.
Existing exploration methods are commonly categorized as frontier-based and sampling-based. Frontier-based exploration, first introduced by Yamauchi [36], selects navigation goals along the boundary between known free space and unknown regions. Subsequent work has improved frontier extraction and decision efficiency through refined segmentation and adaptive goal selection mechanisms [37,38,39]. In contrast, sampling-based methods, exemplified by Next Best View Planning (NBVP) [40], generate candidate viewpoints via randomized tree sampling and choose actions that maximize information gain. Extensions such as TMRRT [41] and GBP [42] further improve real-time performance and long-range exploration capability. However, both paradigms are typically driven by myopic or greedy utility maximization, making it difficult to obtain globally efficient exploration trajectories in large-scale or highly cluttered environments.
To mitigate these limitations, hybrid approaches that combine frontier-based and sampling-based strategies have been proposed. For example, DSVP [43] and TARE [44] adopt hierarchical frameworks to balance local responsiveness and global coverage, improving exploration efficiency and completeness at the cost of increased computational complexity.
In recent years, advances in deep learning and reinforcement learning have motivated data-driven exploration. Some methods employ deep reinforcement learning to adaptively select exploration frontiers [45], whereas others leverage convolutional or graph neural networks to predict information-gain distributions [46]. More recent approaches directly output the next exploration target using end-to-end neural policies [47,48,49,50,51]. While these learning-based methods show promising potential to acquire global exploration priors from data, two challenges remain. First, temporal dependency is often under-modeled, despite active exploration being naturally formulated as a Partially Observable Markov Decision Process (POMDP). Second, reward designs are frequently simplistic or task-specific, which can limit learning efficiency and cross-scene generalization.

2.4. Relocalization and Prior Map-Based Localization

Most LiDAR–inertial SLAM systems initialize mapping with respect to the robot’s starting pose and achieve localization through incremental map construction. In practical deployments, however, environments are often pre-mapped, making repeated reconstruction unnecessary. In such cases, the robot can directly localize against a prior map, reducing computational cost and enabling consistent multi-session operation.
Existing relocalization methods can be broadly categorized into point-cloud-registration-based and deep-learning-based approaches. In the first category, classical methods such as ICP and NDT [52] remain widely adopted, while GICP [53] improves accuracy and robustness by better accounting for local surface structure. For NDT-based localization, several extensions have been developed, including 3D-NDT [54], road-marking-aided NDT [55], and Monte Carlo-based NDT [56], which enhance robustness under dynamic or partially changing conditions.
The second category focuses on deep learning-based localization. LocNet [30] learns rotation-invariant features and integrates them with particle filtering for robust matching. DeepLocalization [57] employs dual networks to directly regress the six-degree-of-freedom (6-DoF) pose, while DenserNet [58] and DeLS-3D [59] incorporate multi-scale feature extraction and temporal modeling to further improve registration accuracy. Despite the advantages of parallel inference and data-driven adaptability, learning-based methods often incur substantial computational overhead and rely on large, diverse training datasets, which can hinder generalization and large-scale deployment in real-world settings [60].
To better position HV-LIOM within the literature, we provide a structured comparison of representative LiDAR–inertial SLAM and autonomous exploration methods in Table A1 (Appendix A). The table summarizes key aspects of each approach, including map representation, loop-closure strategy, prior-map relocalization capability, and support for learning-based exploration.

3. Methodology

This paper presents a unified framework for real-time LiDAR–inertial SLAM and active exploration in unknown environments. As illustrated in Figure 1, the proposed system integrates four key modules to achieve accurate, robust, and autonomous 3D mapping: (i) an adaptive hybrid hash-voxel representation for efficient map management and structural feature modeling; (ii) a BTC-based loop-closure and global graph-optimization module to enforce long-term trajectory consistency; (iii) a SAC-based active exploration policy with spatiotemporal feature modeling for autonomous decision-making; and (iv) a coarse-to-fine prior-map relocalization mechanism that enables reliable localization reuse across multiple mapping sessions.

3.1. Adaptive Hybrid Hash-Voxel Map Construction

To overcome the limitations of conventional map representations such as k-d trees and octrees, we propose an adaptive hybrid hash-voxel structure. The core idea is to allocate memory only for observed spatial regions via hash indexing while dynamically subdividing or merging voxels according to local point density. As shown in Figure 2, empty space is naturally ignored, and only occupied voxels are stored in a hash table. Each occupied voxel is discretely indexed by its 3D location, enabling efficient correspondence between voxelized map space and hash-table entries for rapid voxel lookup and insertion.
Following common practice in voxel hashing and spatial hashing for 3D reconstruction and real-time mapping [61,62], we adopt a spatial hashing scheme to map 3D LiDAR points into a discrete hash-voxel space.
Specifically, for an observed LiDAR point $p = (x, y, z)^{T}$, its discrete voxel coordinate $P = (X, Y, Z)^{T}$ is computed from the predefined resolution $res$ as
\[ P = (X, Y, Z)^{T} = \left( \operatorname{int}\!\left(\frac{x}{res}\right),\ \operatorname{int}\!\left(\frac{y}{res}\right),\ \operatorname{int}\!\left(\frac{z}{res}\right) \right)^{T}, \]
where $\operatorname{int}(\cdot)$ denotes integer rounding. The corresponding hash-table index $ID$ is then computed as
\[ ID = \operatorname{hash}(P) = \operatorname{hash}(X, Y, Z) = \big( (X \cdot n_1) \oplus (Y \cdot n_2) \oplus (Z \cdot n_3) \big) \bmod N, \]
where $n_1$, $n_2$, and $n_3$ are large prime numbers, $N$ is the maximum hash-table capacity, $\oplus$ denotes bitwise exclusive OR, and $\bmod$ is the modulo operation with respect to $N$.
This mapping establishes a deterministic point-to-voxel correspondence for efficient feature association and incremental map updates. When LiDAR–inertial odometry performs data association or integrates a newly registered scan, the hash-voxel structure enables rapid retrieval of the relevant voxel statistics and structural parameters.
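To make the point-to-voxel mapping concrete, the following minimal sketch implements the two formulas above. The resolution, prime multipliers, and table capacity are illustrative values (the primes are a common choice in the spatial-hashing literature), not the paper's exact constants, and `int()` truncation stands in for the integer-rounding operator:

```python
# Sketch of the spatial hashing scheme: point -> voxel coordinate -> hash index.
# RES, N1..N3, and N are assumed values for illustration only.

RES = 0.5                                    # voxel resolution in meters (assumed)
N1, N2, N3 = 73856093, 19349663, 83492791    # large primes (common spatial-hashing choice)
N = 1 << 20                                  # hash-table capacity (assumed)

def voxel_coord(x, y, z, res=RES):
    """Map a 3D point to its discrete voxel coordinate P = (X, Y, Z)."""
    # int() truncates toward zero; the paper's int(.) denotes integer rounding.
    return (int(x / res), int(y / res), int(z / res))

def voxel_hash(P, capacity=N):
    """Compute ID = ((X*n1) XOR (Y*n2) XOR (Z*n3)) mod N."""
    X, Y, Z = P
    return ((X * N1) ^ (Y * N2) ^ (Z * N3)) % capacity

P = voxel_coord(1.23, -0.74, 2.05)
print(P, voxel_hash(P))
```

Because the hash is deterministic, repeated observations of the same voxel always map to the same bucket, which is what enables constant-time voxel lookup during data association.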
To further improve association efficiency, each voxel stores the covariance of the associated measurements and fits local structural primitives. Each voxel thus maintains a feature type and the corresponding geometric parameters (e.g., planar or non-planar). Compared with k-d-tree-based LiDAR–inertial SLAM systems such as LIO-SAM and FAST-LIO2 [3], the proposed representation reduces repeated nearest-neighbor queries and per-frame feature fitting. Instead, the odometry front-end directly retrieves pre-stored structural parameters from the matched voxels, thereby improving the efficiency of state estimation and feature association.
Long-term operation on memory-constrained platforms can cause unbounded map growth if not properly controlled. To address this issue, we design a dynamic maintenance strategy for the adaptive hybrid hash-voxel map (Figure 3) to keep memory usage bounded over extended trajectories. In addition to a fixed-capacity hash table, the map maintains a doubly linked list that records voxel IDs. When a new scan is registered after state estimation, the corresponding voxel IDs are inserted at the head of the list. If the number of nodes exceeds a preset capacity, the algorithm removes the tail node and evicts the associated voxels from the hash table. This mechanism effectively limits the number of maintained voxels and enables continuous operation without exceeding memory constraints.
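The fixed-capacity hash table plus recency list described above is, in effect, a least-recently-inserted eviction policy. A minimal sketch (using Python's `OrderedDict` to play both roles; the capacity and payloads are toy values):

```python
from collections import OrderedDict

# Sketch of the dynamic voxel-map maintenance strategy: newly registered
# voxel IDs go to the head of the recency order; when capacity is exceeded,
# the tail voxel is evicted from the table.

class VoxelMap:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.table = OrderedDict()   # voxel ID -> voxel statistics

    def insert(self, voxel_id, stats):
        self.table[voxel_id] = stats
        self.table.move_to_end(voxel_id, last=False)   # head = most recent
        if len(self.table) > self.capacity:
            self.table.popitem(last=True)              # evict tail voxel

m = VoxelMap(capacity=2)
m.insert(1, "a"); m.insert(2, "b"); m.insert(3, "c")
print(list(m.table))   # voxel 1 was evicted
```

Eviction cost is O(1) per scan insertion, which keeps the bound on maintained voxels cheap to enforce over long trajectories.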
In the current implementation, HV-LIOM assumes quasi-static environments during mapping and localization, consistent with most LiDAR–inertial SLAM frameworks. Transient moving objects are treated as measurement outliers and are filtered using statistical criteria.

3.2. BTC-Based Loop Closure and Global Optimization

To mitigate cumulative drift in LiDAR–inertial SLAM, we incorporate a loop-closure mechanism into the back-end optimization. We design a compact binary descriptor, termed the BTC descriptor, to encode local geometric structures into concise binary vectors. The BTC-based loop detector aims to determine whether the current LiDAR frame (frame $j$) corresponds to a previously visited keyframe (frame $i$). Once a match is identified, a loop pair $(i, j) \in \mathcal{C}$ is formed, and the relative transformation $\Delta\tilde{T}_{i,j}$ is estimated.
Global map optimization seeks a globally consistent pose sequence across $N$ time steps,
\[ \mathcal{T} = \{ T_1, T_2, \ldots, T_N \}, \]
such that all registered point clouds can be aligned in a unified world coordinate system. As illustrated in Figure 4, each node in the pose graph corresponds to a LiDAR–inertial odometry (LIO) pose, whereas edges represent local odometry constraints or global loop-closure constraints.
Following standard formulations in graph-based SLAM and factor-graph optimization [63,64], we model global pose refinement as a nonlinear least-squares problem over odometry and loop-closure constraints.
\[ \mathcal{T} = \arg\min_{\{T_1, \ldots, T_N\}} \sum_{i=1}^{N-1} \| e_i \|_{\Sigma_i^{-1}}^{2} + \sum_{(i,j) \in \mathcal{C}} \| e_{ij} \|_{\Sigma_{ij}^{-1}}^{2}, \]
where $e_i$ denotes the relative odometry constraint between adjacent frames and $e_{ij}$ represents the loop-closure constraint between nonadjacent keyframes.
The odometry constraint enforces consistency between consecutive poses.
\[ e_i = \operatorname{Log}\!\left( \Delta\tilde{T}_{i,i+1}^{-1}\, T_i^{-1} T_{i+1} \right), \qquad \Delta\tilde{T}_{i,i+1} = \tilde{T}_i^{-1} \tilde{T}_{i+1}, \]
where $\tilde{T}_i$ and $\tilde{T}_{i+1}$ are the pre-optimization LIO estimates.
Similarly, the loop-closure constraint ensures geometric consistency between loop pairs.
\[ e_{ij} = \operatorname{Log}\!\left( \Delta\tilde{T}_{i,j}^{-1}\, T_i^{-1} T_j \right), \]
where $\Delta\tilde{T}_{i,j}$ is the relative pose estimated by the BTC-based loop-closure module. The pose-graph objective in (4) and the $\mathrm{SE}(3)$ logarithmic error terms in (5)–(7) follow widely used formulations in graph-based SLAM and factor-graph optimization [63,64].
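As a concrete illustration of the loop-closure residual, the sketch below evaluates $\operatorname{Log}(\Delta\tilde{T}_{i,j}^{-1} T_i^{-1} T_j)$ for $4\times4$ homogeneous transforms. The `Log` map used here is the simple (rotation-vector, translation) splitting, a common small-residual approximation of the full $\mathrm{SE}(3)$ logarithm, not necessarily the paper's exact implementation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Approximate SE(3) residual for a loop-closure constraint.
# When the measured relative pose agrees with the absolute poses,
# the residual vanishes.

def se3(Rmat, t):
    T = np.eye(4)
    T[:3, :3] = Rmat
    T[:3, 3] = t
    return T

def log_approx(T):
    """Split Log map: (rotation vector, translation) of a 4x4 transform."""
    rotvec = Rotation.from_matrix(T[:3, :3]).as_rotvec()
    return np.concatenate([rotvec, T[:3, 3]])

def loop_residual(T_i, T_j, dT_meas):
    """e_ij = Log( dT_meas^{-1} T_i^{-1} T_j )."""
    return log_approx(np.linalg.inv(dT_meas) @ np.linalg.inv(T_i) @ T_j)

T_i = se3(np.eye(3), np.array([0.0, 0.0, 0.0]))
T_j = se3(np.eye(3), np.array([1.0, 0.0, 0.0]))
dT  = se3(np.eye(3), np.array([1.0, 0.0, 0.0]))   # measurement agrees
print(loop_residual(T_i, T_j, dT))                # ~ zero vector
```

In a full back end, these residuals and their covariance-weighted norms are what iSAM2-style solvers minimize over the whole pose graph.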
Because odometry and loop-closure constraints typically exhibit different uncertainty levels, their relative weights should be properly modeled. Following first-order error propagation for nonlinear transformations [63,64], the covariance of the odometry constraint is approximated as
\[ \Sigma_i = \frac{\partial \Delta\tilde{T}_{i,i+1}}{\partial T_i}\, \Sigma_{T_i} \left( \frac{\partial \Delta\tilde{T}_{i,i+1}}{\partial T_i} \right)^{\!T} + \frac{\partial \Delta\tilde{T}_{i,i+1}}{\partial T_{i+1}}\, \Sigma_{T_{i+1}} \left( \frac{\partial \Delta\tilde{T}_{i,i+1}}{\partial T_{i+1}} \right)^{\!T}, \]
where $\Sigma_{T_i}$ and $\Sigma_{T_{i+1}}$ are the pose covariances output by the LIO front-end at times $i$ and $i+1$, respectively.
We further model the loop-closure covariance using the registration confidence produced by the BTC-based fine-alignment module. Specifically, the information matrix is defined as
\[ \Sigma_{ij}^{-1} = \alpha \cdot score_{ij}\, I_{6 \times 6}, \]
where $score_{ij}$ is the loop-closure confidence score, $\alpha$ balances the influence of odometry and loop-closure constraints, and $I_{6 \times 6}$ is the $6 \times 6$ identity matrix.

BTC Descriptor Matching

For each BTC descriptor $C_i$ extracted from the current LiDAR frame, its global code $l_i$ is used to query the corresponding hash bucket, denoted as $\operatorname{Hash}(l_i)$, to retrieve candidate descriptors $\{C_j\}$ from the database. Following the BTC-based loop-closure framework in [65], the similarity between a query descriptor $C_i$ and a candidate descriptor $C_j$ is computed from their local binary encodings $b_i = \{b_i^1, b_i^2, b_i^3\}$ and $b_j = \{b_j^1, b_j^2, b_j^3\}$ as
\[ S_{btc}(C_i, C_j) = \frac{1}{3} \sum_{k=1}^{3} S_{bin}(b_i^k, b_j^k), \]
\[ S_{bin}(b_i^k, b_j^k) = \frac{2 \cdot HW(b_i^k \,\&\, b_j^k)}{HW(b_i^k) + HW(b_j^k)}, \]
where $HW(\cdot)$ denotes the Hamming weight and $\&$ represents the bitwise AND operation. If $S_{btc}(C_i, C_j)$ exceeds a predefined threshold, the two descriptors are considered a successful match, and the keyframe ID associated with $C_j$ receives a score increment.

3.3. SAC-Based Active Exploration

To enable efficient autonomous exploration in unknown environments, we develop an active exploration algorithm based on the Soft Actor–Critic (SAC) framework. The proposed method integrates a topological representation of the local navigation space with spatiotemporal decision features, allowing the robot to learn exploration strategies online through reinforcement learning. A Transformer-based policy network [66] is employed to extract high-dimensional state representations from the topology graph and to output the next best exploration target.
In the SAC updates, we use learning rates $(\eta_Q, \eta_\pi)$, discount factor $\gamma$, and entropy temperature $\alpha$. To avoid ambiguity, the bare symbol $\alpha$ is reserved for the SAC entropy temperature, while reward weights always appear with subscripts $(\alpha_e, \alpha_n, \alpha_b, \alpha_d)$.

3.3.1. State Space Design

The state space encodes environmental information relative to the robot. Directly using dense 3D LiDAR point clouds or bird's-eye-view (BEV) images is computationally expensive. We therefore adopt a navigation topological graph for compact state representation. From the local navigation map, a set of viewpoints $v_i = (x_i, y_i) \in V$ is uniformly sampled within free space. Each viewpoint is connected to its 20 nearest neighbors if the straight-line segment is collision-free, forming edges $e_{ij} = (v_i, v_j) \in E$. The resulting graph $G = (V, E)$ captures both local geometry and traversability.
Each vertex $v_i$ is associated with the following attributes: (1) relative position $(x_i - x_c, y_i - y_c)$; (2) a binary visit indicator $b_i$; and (3) the number of observable frontier points $u_i$, which serves as a proxy for information gain. We define the edge weight in the navigation graph as
\[ e_{ij} = \begin{cases} D(v_i, v_j), & \text{if the segment } \overline{v_i v_j} \text{ is collision-free}, \\ \infty, & \text{otherwise}, \end{cases} \]
where $D(v_i, v_j)$ denotes the Euclidean distance between viewpoints $v_i$ and $v_j$. Accordingly, the system state at time $t$ is represented by
\[ s_t = G_t = (V_t, E_t). \]
When all information gains satisfy $u_i = 0$, the exploration task is considered complete.
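The graph construction described above can be sketched as a k-nearest-neighbor graph over sampled viewpoints. Here `k=2` (rather than the paper's 20) and the always-true collision check are toy stand-ins for the real occupancy-map query:

```python
import math

# Toy navigation topology graph: connect each sampled viewpoint to its
# k nearest neighbors when the connecting segment is collision-free.
# Edge weight = Euclidean distance (infinite edges are simply omitted).

def build_graph(viewpoints, k=3, collision_free=lambda a, b: True):
    V = list(viewpoints)
    E = {}
    for i, vi in enumerate(V):
        nearest = sorted(
            (math.dist(vi, vj), j) for j, vj in enumerate(V) if j != i
        )[:k]
        for d, j in nearest:
            if collision_free(vi, V[j]):
                E[(i, j)] = d        # finite weight = D(v_i, v_j)
    return V, E

V, E = build_graph([(0, 0), (1, 0), (0, 1), (5, 5)], k=2)
print(len(V), len(E))
```

Each vertex would additionally carry its visit flag $b_i$ and frontier count $u_i$; those attributes are omitted here to keep the sketch focused on connectivity.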

3.3.2. Action Space Design

The action space consists of the neighboring vertices directly connected to the current position $v_c$:
\[ \mathcal{A}_t = \{\, v_i \mid (v_c, v_i) \in E_t \,\}. \]
The policy network $\pi(\cdot \mid \theta_\pi)$ selects an action $a_t = v_i^{*}$ corresponding to the next viewpoint to explore. This high-level decision is forwarded to a local navigation module that generates collision-free motion commands. A rolling replanning mechanism continuously re-evaluates the selected goal based on the updated state, ensuring responsive and real-time decision-making.

3.3.3. Reward Function Design

The reward function guides the learning behavior of the SAC agent. To balance exploration completeness and motion efficiency, we define five reward components.
  • Exploration coverage reward $r_e$:
\[ r_e = \begin{cases} k (\eta_t - \eta_{t-1}), & \text{if } \eta_t > \eta_{t-1}, \\ -0.002, & \text{otherwise}, \end{cases} \]
    where $\eta_t = S_k / S_u$ is the ratio of explored area to total area at time $t$.
  • Frontier information gain reward $r_n$:
\[ r_n = N_f, \]
    where $N_f$ is the number of newly observed frontier points.
  • Frontier distribution variation reward $r_b$:
\[ r_b = \| c_t - c_{t-1} \|_2, \]
    where $c_t$ is the centroid of the frontier distribution at time $t$.
  • Path cost penalty $r_d$:
\[ r_d = -(O_t - O_{t-1}), \]
    which penalizes redundant motion.
  • Exploration completion reward $r_f$:
\[ r_f = \begin{cases} 20, & \text{if the environment is fully explored}, \\ 0, & \text{otherwise}. \end{cases} \]
The overall reward is defined as
\[ r = \alpha_e r_e + \alpha_n r_n + \alpha_b r_b + \alpha_d r_d + r_f, \]
where $\alpha_e = 1$, $\alpha_n = \tfrac{1}{60}$, $\alpha_b = \tfrac{1}{80}$, and $\alpha_d = \tfrac{1}{16}$.
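A minimal sketch of the composite reward with the stated weights follows. The per-step inputs (coverage ratios, frontier count, centroid shift, odometry increment) are toy values, and the sign of the small stagnation term and the scale factor $k$ are assumptions about the penalty design rather than confirmed constants:

```python
# Sketch of the composite exploration reward. K and the -0.002 stagnation
# penalty sign are assumptions; the weights 1, 1/60, 1/80, 1/16 follow the text.

K = 1.0  # coverage scale factor k (assumed)

def coverage_reward(eta_t, eta_prev, k=K):
    """Piecewise coverage term: scaled gain if coverage grew, small penalty otherwise."""
    return k * (eta_t - eta_prev) if eta_t > eta_prev else -0.002

def total_reward(r_e, r_n, r_b, r_d, r_f):
    """r = a_e*r_e + a_n*r_n + a_b*r_b + a_d*r_d + r_f with the stated weights."""
    return 1.0 * r_e + r_n / 60 + r_b / 80 + r_d / 16 + r_f

r_e = coverage_reward(0.42, 0.40)                 # coverage grew by 2%
r = total_reward(r_e, r_n=12, r_b=0.5, r_d=-1.2, r_f=0)
print(r)
```

The small weights on the frontier terms keep the coverage signal dominant, so the agent is rewarded mainly for expanding known area rather than for oscillating near frontiers.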
The complete workflow of the proposed SAC-based active exploration algorithm is summarized in Algorithm 1. The algorithm integrates perception, high-level decision-making, motion execution, and policy optimization into a unified reinforcement learning pipeline.
Algorithm 1 SAC-Based Active Exploration for Mobile Robots
Require: Initial local map $M_0$, initial pose $(x_0, y_0)$, policy network $\pi(\cdot \mid \theta_\pi)$, Q-network $Q(\cdot \mid \theta_Q)$
Ensure: Exploration trajectory $P^{*}$, final reconstructed map $M_{final}$
1: Initialize environment state $s_0 = G_0 = (V_0, E_0)$
2: Initialize replay buffer $D$
3: repeat
4:   Sample viewpoints $V_t$ from local map $M_t$
5:   Construct topological graph $G_t = (V_t, E_t)$ and compute vertex attributes $(x_i, y_i, b_i, u_i)$
6:   Select next target viewpoint via policy: $a_t = \pi(s_t \mid \theta_\pi)$
7:   Plan local path $\mathrm{path}(v_c \rightarrow a_t)$ and execute robot motion
8:   Update map $M_{t+1}$ using new LiDAR observations
9:   Rebuild topology $G_{t+1}$ and update state $s_{t+1}$
10:  Compute reward $r_t = \alpha_e r_e + \alpha_n r_n + \alpha_b r_b + \alpha_d r_d + r_f$
11:  Store transition $(s_t, a_t, r_t, s_{t+1})$ into buffer $D$
12:  Sample a mini-batch from $D$ and update: $\theta_Q \leftarrow \theta_Q - \eta_Q \nabla L_Q(\theta_Q)$, $\theta_\pi \leftarrow \theta_\pi - \eta_\pi \nabla L_\pi(\theta_\pi)$
13:  Soft update target network: $\bar{\theta}_Q \leftarrow \tau \theta_Q + (1 - \tau) \bar{\theta}_Q$
14: until $u_i = 0$ for all $v_i \in V_t$
15: return $P^{*}$, $M_{final}$
As shown in Algorithm 1, the SAC-based active exploration algorithm enables mobile robots to perform adaptive exploration under online map updates. By incorporating a topology-aware state space, multi-objective reward shaping, and rolling replanning, the proposed approach improves coverage efficiency while reducing path redundancy.

3.3.4. Spatiotemporal Feature Fusion and Policy Evaluation Network

The proposed reinforcement learning framework employs a deep neural network to extract high-dimensional implicit representations from the explicit, low-dimensional attributes of the navigation topology graph, including vertex position, information gain, and connectivity. As illustrated in Figure 5, the network takes the vertices and edges of the navigation graph as input and outputs both the Actor policy and the Critic state–action value estimates for the current exploration state.
Specifically, the input graph is first processed by a six-layer Transformer-based encoder that aggregates features of adjacent nodes to capture local spatial correlations. A Node Pyramid Network (NPN) module is then applied to fuse multi-scale vertex features, enabling perception of structural information across different spatial resolutions. To model temporal dependencies, an LSTM-based module integrates memory vectors from previously visited nodes to produce a spatiotemporally fused feature representation of the current exploration state. Finally, a decoding network maps the fused representation to the action distribution over neighboring nodes for the Actor and to the corresponding value estimates for the Critic.
This hierarchical design enables the policy network to jointly reason about spatial topology and temporal exploration history, supporting long-horizon exploration decisions with improved efficiency and adaptability.

3.4. Coarse-to-Fine Multi-Resolution NDT Relocalization Based on Prior Map

To achieve robust and high-precision global relocalization during system initialization, we propose a coarse-to-fine multi-resolution Normal Distributions Transform (NDT) method based on a prior map. The initial pose inferred from BTC descriptor matching relies on sparse triangular-vertex correspondences; thus, limited geometric support and potential descriptor mismatches can yield large initialization errors. To refine this coarse estimate, the proposed method registers the current LiDAR scan to the prior map using a hierarchical multi-resolution probabilistic model, improving both accuracy and convergence robustness.
The core idea is to extend conventional NDT into a multi-resolution hierarchy inspired by pyramid-based image alignment. We construct NDT likelihood fields at multiple spatial resolutions and perform registration progressively from coarse to fine. At coarse levels, the likelihood field is more tolerant of initialization errors, enabling robust global alignment. As the voxel size decreases at finer levels, the pose is refined incrementally to achieve precise registration. This hierarchical strategy improves convergence robustness compared with single-resolution NDT and ICP-based approaches while maintaining high alignment accuracy.
In implementation, the target (prior) point cloud is voxelized at four resolution levels, r ∈ {10 m, 5 m, 3 m, 1 m}, to construct multi-resolution NDT maps. For each voxel containing at least five points, we follow the standard NDT formulation [67] and model the local point set {p_k} with a Gaussian distribution of mean μ and covariance Σ:
\mu = \frac{1}{n} \sum_{k=1}^{n} p_k, \qquad \Sigma = \frac{1}{n-1} \sum_{k=1}^{n} (p_k - \mu)(p_k - \mu)^{T}.
Given a source point p transformed by a pose T, its likelihood in the corresponding target voxel is
P(T, p) = \frac{1}{(2\pi)^{3/2}\,|\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (Tp - \mu)^{T} \Sigma^{-1} (Tp - \mu) \right),
which is the conventional NDT likelihood used for scan registration [52,67].
To improve robustness against outliers, we adopt a mixed Gaussian model consistent with robust NDT variants [67]:
P(T, p) = c_1 \exp\left( -\frac{1}{2} (Tp - \mu)^{T} \Sigma^{-1} (Tp - \mu) \right) + c_2 P_0,
where c_1 and c_2 are normalization coefficients determined by the probability integral constraint, and P_0 is a predefined noise ratio that mitigates the influence of outliers during optimization. A detailed derivation of the NDT-based registration objective and its optimization is provided in Appendix B.
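The voxel-wise Gaussian fit and the robust mixed likelihood above can be sketched as follows (a simplified NumPy illustration; the constants c1, c2, and p0 are placeholder values, not the normalized coefficients used in the paper):

```python
import numpy as np

def voxel_gaussian(points):
    """Fit one NDT voxel: sample mean and unbiased covariance of the points
    falling in the voxel (the paper requires at least five points)."""
    n = len(points)
    mu = points.mean(axis=0)
    diff = points - mu
    sigma = diff.T @ diff / (n - 1)
    return mu, sigma

def robust_ndt_likelihood(p_transformed, mu, sigma, c1=0.9, c2=0.1, p0=0.01):
    """Mixed Gaussian/uniform likelihood: the exponential term dominates for
    inliers, while the c2 * p0 floor bounds the penalty for outliers."""
    d = p_transformed - mu
    m = float(d @ np.linalg.solve(sigma, d))   # squared Mahalanobis distance
    return c1 * np.exp(-0.5 * m) + c2 * p0

rng = np.random.default_rng(1)
pts = rng.normal(loc=[1.0, 2.0, 0.5], scale=0.1, size=(50, 3))
mu, sigma = voxel_gaussian(pts)
lik_inlier = robust_ndt_likelihood(mu, mu, sigma)           # point at the mean
lik_outlier = robust_ndt_likelihood(mu + 5.0, mu, sigma)    # gross outlier
```

Note that the outlier's likelihood never drops below c2 * p0, which keeps its gradient contribution bounded during optimization.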
On the basis of this probabilistic model, we employ the Levenberg–Marquardt (LM) algorithm to optimize the joint likelihood of all source points and estimate the transformation T^* between the source and target point clouds. The optimization proceeds hierarchically from coarse to fine resolutions. As shown in Figure 6, the algorithm first performs a coarse alignment in the r = 10 m likelihood field using the BTC-based initialization and then successively refines the result in finer fields (r = 5 m, 3 m, 1 m), with each level initialized by the solution from the previous level. This hierarchical refinement promotes stable convergence toward an accurate global pose estimate.
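To illustrate the coarse-to-fine idea, the following toy example (translation-only, with known point correspondences, and a grid search standing in for the LM optimizer, so it is not the paper's actual registration) refines an estimate through the 10 m, 5 m, 3 m, and 1 m levels, each seeded by the previous level's solution:

```python
import numpy as np

def score(t, src, dst, sigma):
    # Toy NDT-style score with known correspondences: sum of Gaussian
    # likelihoods of the residuals, smoothed by the level's resolution.
    r = src + t - dst
    return float(np.exp(-0.5 * (r ** 2).sum(axis=1) / sigma ** 2).sum())

def coarse_to_fine_translation(src, dst, levels=(10.0, 5.0, 3.0, 1.0)):
    """Refine a translation estimate level by level: each level grid-searches
    at its own resolution inside a window left over from the coarser level,
    seeded by the previous level's solution."""
    est = np.zeros(3)
    window = 20.0
    for res in levels:
        offsets = np.arange(-window, window + res, res)
        best, best_s = est, -np.inf
        for dx in offsets:
            for dy in offsets:          # z assumed known (ground robot)
                cand = est + np.array([dx, dy, 0.0])
                s = score(cand, src, dst, sigma=res)
                if s > best_s:
                    best_s, best = s, cand
        est = best
        window = 2.0 * res              # shrink the search window
    return est

rng = np.random.default_rng(2)
src = rng.uniform(-5.0, 5.0, size=(100, 3))
true_t = np.array([7.3, -4.1, 0.0])
est = coarse_to_fine_translation(src, src + true_t)
```

Because correspondences are known here, the score peaks exactly at the true translation, so each level snaps to the nearest grid point and the final error is bounded by the finest resolution; the real system instead optimizes the NDT likelihood with LM on the SE(3) manifold.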
Overall, the proposed coarse-to-fine multi-resolution NDT relocalization combines probabilistic modeling with hierarchical optimization to achieve precise and robust global pose estimation even under large initialization errors. This approach improves relocalization reliability in complex environments and provides a consistent initialization for multi-session SLAM and prior-map-based localization.

4. Experiments and Discussion

4.1. Experimental Platform

The mobile robot platform used in this study is shown in Figure 7. It is built on a differential-drive chassis and integrates multiple onboard sensors for real-time localization, mapping, and autonomous exploration. The robot is equipped with a 3D LiDAR, an inertial measurement unit (IMU), and an RGB-D camera, together with an onboard computing module for running the proposed algorithms. Although 3D LiDAR sensors may be relatively costly for small-scale applications, HV-LIOM targets research-grade and industrial mobile robots that require centimeter-level localization accuracy. The proposed adaptive hash-voxel representation reduces redundant memory usage and supports real-time operation on compact computing platforms such as Intel NUC and NVIDIA Jetson Xavier. The key hardware components and specifications are summarized in Table 1.

4.2. Real-Time Pose Estimation and Mapping

This section evaluates real-time pose estimation and point-cloud mapping performance based on the proposed adaptive hybrid hash-voxel algorithm. Experiments are conducted on the indoor Hilti dataset [68], the outdoor NCLT dataset [69], and our custom mobile robot platform to quantitatively assess accuracy and efficiency, and to qualitatively examine the reconstructed maps. For clarity, the hybrid voxel-based LiDAR–inertial odometry front-end is denoted as HV-LIO, while the SLAM system that integrates the adaptive hybrid hash-voxel representation with loop closure and global map optimization is denoted as HV-LIO-SAM. We compare our method with two representative LiDAR–inertial baselines, the optimization-based LIO-SAM [2] and the filtering-based FAST-LIO2 [3].
Pose estimation accuracy is evaluated using the Absolute Pose Error (APE) between the estimated trajectory and the ground truth. We report the maximum (Max), mean, median, root mean square error (RMSE), and standard deviation (Std). In addition, computational efficiency and map reconstruction quality are analyzed to provide complementary qualitative evidence.
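The reported APE statistics can be computed as follows (translational errors only, assuming the estimated and ground-truth trajectories are already time-associated and aligned):

```python
import numpy as np

def ape_stats(est_xyz, gt_xyz):
    """Translational APE statistics: per-pose Euclidean error against the
    ground truth, summarized as Max / Mean / Median / RMSE / Std."""
    err = np.linalg.norm(np.asarray(est_xyz) - np.asarray(gt_xyz), axis=1)
    return {
        "max": float(err.max()),
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "std": float(err.std()),
    }

gt = np.zeros((4, 3))
est = np.array([[0.3, 0.0, 0.0],
                [0.0, 0.4, 0.0],
                [0.0, 0.0, 0.5],
                [0.0, 0.0, 0.0]])
stats = ape_stats(est, gt)   # per-pose errors: 0.3, 0.4, 0.5, 0.0
```

Note that the RMSE weights large errors more heavily than the mean, which is why it is the headline metric in Tables 2 and 3.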

4.2.1. Indoor Experiment on the Hilti Dataset

The Hilti dataset is a benchmark for multimodal SLAM in complex indoor environments. It contains sparse-feature and repetitive-structure scenes and is well suited for evaluating LiDAR–inertial SLAM under challenging indoor conditions. Table 2 reports the APE RMSE of different algorithms across representative Hilti sequences. Both FAST-LIO2 and the proposed HV-LIO achieve centimeter-level accuracy on multiple sequences. Notably, HV-LIO consistently outperforms FAST-LIO2, indicating that the adaptive hybrid hash-voxel representation can improve data association stability and estimation accuracy in cluttered indoor scenes.
Compared with FAST-LIO2, HV-LIO reduces the APE RMSE by approximately 5.7–17.3% across different sequences. A qualitative comparison between the estimated trajectories and the ground-truth trajectories is shown in Figure 8.
To provide a more intuitive comparison, Figure 9 shows the overall and zoomed-in trajectories for the most challenging sequence, exp16. As shown in Figure 9b, the trajectory estimated by HV-LIO is closer to the ground truth than that of FAST-LIO2. As summarized in Table 2, HV-LIO achieves an RMSE of 0.3683 m on exp16, corresponding to a 15.2% improvement over FAST-LIO2 (0.4344 m). These results validate the effectiveness of the proposed adaptive hybrid hash-voxel representation for accurate indoor localization.

4.2.2. Outdoor Experiment on the NCLT Dataset

The NCLT dataset is a large-scale, long-term benchmark for outdoor robotic localization and mapping. Figure 10a shows the estimated trajectories of HV-LIO, HV-LIO-SAM, LIO-SAM, and FAST-LIO2 on the 0110 sequence. The close-up view in Region B (Figure 10b) indicates that HV-LIO aligns more closely with the ground truth than FAST-LIO2. After integrating loop closure and global optimization, HV-LIO-SAM further improves trajectory consistency, demonstrating the benefit of the proposed back-end within the adaptive hash-voxel framework.
Quantitative APE statistics are summarized in Table 3. HV-LIO achieves a mean error of 0.77 m and an RMSE of 0.85 m, improving the RMSE by approximately 7.6% compared with FAST-LIO2. With loop closure and global optimization, HV-LIO-SAM further reduces the mean error to 0.64 m and the RMSE to 0.70 m, representing a 23.9% improvement over FAST-LIO2 and a 17.6% improvement over HV-LIO. These results confirm that the proposed loop-closure and pose-graph optimization effectively mitigate long-term drift in large-scale outdoor mapping.

4.2.3. Mobile Robot Pose Estimation and Mapping Experiment

We further evaluate HV-LIO and HV-LIO-SAM on the physical mobile robot platform and compare their pose accuracy and mapping consistency with existing LiDAR–inertial methods. The experiment was conducted in a corridor environment in Building 6 of the National Defense Science and Technology Park at the Beijing Institute of Technology. The environment is narrow and contains long, repetitive corridors and multiple T-shaped junctions, posing challenges for LiDAR–inertial odometry and long-term map consistency.
Figure 11 compares FAST-LIO2 and HV-LIO-SAM in terms of reconstructed maps and estimated trajectories. The blue curve in Figure 11b denotes the robot trajectory, and the blue filled circles indicate loop closures detected by our system. These results suggest that the proposed learning-based loop-closure module can reliably identify revisited locations in repetitive corridor environments.
In regions visited only once, FAST-LIO2 produces visually consistent maps. However, in repeatedly traversed areas—particularly around the T-junctions—accumulated drift leads to visible misalignment and double contours, as shown in Figure 11a. By contrast, HV-LIO-SAM yields a more globally consistent map without obvious ghosting in revisited regions (Figure 11b). This indicates that combining the adaptive hash-voxel representation with loop closure and global pose-graph optimization improves map coherence in challenging indoor environments.
To illustrate how the proposed map representation models scene structure, Figure 12 visualizes the voxel-level features built by HV-LIO-SAM. Circular patches correspond to locally fitted planar features, while square patches correspond to merged planar regions that represent extended structural planes such as walls or floors. Neighboring square patches form large continuous planes. Red ellipsoids denote non-planar features modeled as probabilistic volumetric Gaussians. Planar features are primarily associated with walls and the floor, whereas non-planar features tend to occur near structural edges such as wall intersections. These observations confirm that the adaptive hash-voxel map captures both planar and non-planar geometry and reuses these features as stable constraints during pose estimation.
We also compare trajectory accuracy among different methods in this environment. LIO-SAM, which relies on fixed edge and planar feature extraction, fails to maintain stable tracking and cannot complete the full trajectory. FAST-LIO2 produces a complete odometric trajectory via continuous-surface assumptions and incremental scan-to-map registration, but exhibits a maximum drift of 1.06 m along the vertical (z) axis. By contrast, HV-LIO and HV-LIO-SAM limit the maximum z-axis error to 0.32 m by leveraging the adaptive hybrid hash-voxel representation to better constrain local structure. This highlights the improved vertical stability of our method in indoor scenes where vertical observability is typically weak for ground robots.
We further benchmark runtime performance on an Intel Xeon E-2286M CPU @ 2.40 GHz to provide a consistent reference platform. The per-frame timing breakdown is summarized in Table 4. For HV-LIO, the dominant components are (i) data preprocessing (point-cloud downsampling, IESKF motion prediction, and deskewing), requiring 4.91 ms per frame on average; and (ii) data association and iterative state update within the IESKF, requiring 7.34 ms per frame. The overall average time per frame is 14.32 ms, comparable to FAST-LIO2 (13.65 ms). This corresponds to an effective pose-estimation rate of 69.83 Hz, well above the 10 Hz scan rate of the LiDAR. These results demonstrate that the proposed method delivers accurate and drift-suppressed pose estimates in real time and is suitable for deployment on resource-constrained mobile robot platforms.
In addition to the quantitative comparisons in Section 4.2, a structured summary of methodological differences between HV-LIOM and representative state-of-the-art approaches is provided in Table A1 (Appendix A), highlighting the complementary advances in mapping, loop closure, prior-map relocalization, and autonomous exploration.

4.3. Active Exploration in Unknown Environments

This section validates the effectiveness and generalization of the proposed SAC-based active exploration algorithm in simulated environments. We compare our method with two representative 3D LiDAR-based autonomous exploration baselines: the sampling-based hierarchical planner TARE [70] proposed by Chao Cao et al. and the reinforcement-learning-driven method ARiADNE-L [71] proposed by Yuhong Cao et al. The relationship between explored volume and travel distance for the three methods is illustrated in Figure 13.
The results show that our method and the baselines achieve comparable performance in the early and mid stages of exploration. In the later stage, however, the proposed algorithm exhibits a faster growth of explored volume and completes full exploration with a shorter total travel distance. These observations suggest that the designed reward shaping, together with spatiotemporal feature modeling, enables the SAC policy to learn more effective long-horizon exploration behaviors than TARE and ARiADNE-L.
Quantitative comparisons of the total travel distance and time required to complete full mapping are summarized in Table 5. In Scene 1, our method completes exploration in 550.78 s with a travel distance of 970.52 m, reducing the travel distance and exploration time by 4.12% and 5.29%, respectively, compared with ARiADNE-L. In Scene 2, the proposed algorithm achieves full exploration with a travel distance of 1293.37 m in 775.39 s, corresponding to 9.38% and 7.20% reductions in distance and time over ARiADNE-L. Overall, these results indicate that the proposed SAC-based strategy achieves more efficient long-term exploration than both sampling-based and learning-based baselines in indoor simulated environments.

4.4. Prior Map-Based Localization and Navigation Experiments

To quantitatively evaluate the proposed multi-resolution Normal Distributions Transform (MR-NDT) relocalization, we align a LiDAR frame collected during autonomous mapping with the global prior point-cloud map. The reference pose for relocalization is taken as the estimated pose from the mapping session. To emulate initialization errors introduced by BTC-based global pose estimation, we apply random translational and rotational perturbations to the reference poses. A relocalization attempt is considered successful if the translation error is below 10 cm and the rotation error is below 3°. Four groups of experiments with different initial error magnitudes are conducted, each containing 100 random samples. The success rates of different methods are reported in Table 6, where bold and underlined entries denote the best and second-best results, respectively. A visual comparison of the point clouds before and after relocalization is shown in Figure 14.
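The success criterion of this protocol can be expressed as a small check on 4x4 homogeneous poses (a sketch; the thresholds follow the text above, and `rot_z` is a helper defined only for this example):

```python
import numpy as np

def rotation_angle_deg(R):
    """Geodesic angle (degrees) of a 3x3 rotation matrix."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(c)))

def relocalization_success(T_est, T_ref, t_tol=0.10, r_tol=3.0):
    """Success test from the protocol: translation error below 10 cm and
    rotation error below 3 degrees, measured via the relative pose."""
    dT = np.linalg.inv(T_ref) @ T_est
    t_err = float(np.linalg.norm(dT[:3, 3]))
    r_err = rotation_angle_deg(dT[:3, :3])
    return t_err < t_tol and r_err < r_tol

def rot_z(deg):
    """Homogeneous 4x4 rotation about the z-axis (example helper)."""
    a = np.radians(deg)
    T = np.eye(4)
    T[0, 0], T[0, 1] = np.cos(a), -np.sin(a)
    T[1, 0], T[1, 1] = np.sin(a), np.cos(a)
    return T

T_ref = np.eye(4)
T_good = rot_z(1.0)
T_good[:3, 3] = [0.05, 0.0, 0.0]   # 5 cm, 1 degree: within tolerance
T_bad = rot_z(10.0)                # 10 degrees of rotation error: fails
```

Running this check over the 100 perturbed samples of each group yields the success rates reported in Table 6.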
As shown in Table 6, single-resolution NDT (e.g., NDT-3 at 3 m resolution) cannot simultaneously achieve robustness and accuracy across varying initialization errors. Coarser resolutions are more tolerant to large initial deviations but typically yield lower precision, whereas finer resolutions provide higher accuracy under small initial errors yet degrade rapidly as the initialization uncertainty increases. In contrast, the proposed MR-NDT integrates multiple resolutions in a coarse-to-fine scheme, thereby combining the complementary strengths of different scales. Even with initial translational and rotational errors of 10 m and 10°, respectively, MR-NDT achieves a 77% success rate, demonstrating strong robustness to large initialization uncertainty.
The relocalization error comparison between MR-NDT and BTC-ICP is shown in Figure 15. Across 50 independent trials, MR-NDT achieves an RMSE of 5.35 cm and a mean error of 4.97 cm, substantially outperforming BTC-ICP (RMSE 16.69 cm, mean 13.24 cm). Moreover, MR-NDT maintains a localization error below 10 cm in 48 out of 50 trials, indicating reliable accuracy across different initial positions and viewpoints.

4.4.1. Real-Time Localization and Navigation Experiments

After the global initialization relocalization described in Section 4.4, we conduct real-time localization and navigation experiments using the estimated initial pose to validate prior-map-based deployment. In such scenarios, localization accuracy is more critical than incremental mapping accuracy: as long as the current LiDAR scan can be accurately registered to the prior map, the navigation system can reliably guide the robot to the desired target in both the digital map and the physical environment. Because ground-truth trajectories are unavailable in the real indoor factory scene, we use the keyframe poses from the autonomous mapping session as reference for quantitative evaluation.
As shown in Figure 16, two representative moments of the real-time localization process are visualized. The colored point clouds represent real-time LiDAR scans, while the white point cloud corresponds to the prior map constructed during autonomous exploration. The high degree of overlap indicates accurate scan-to-map alignment. The cyan curve denotes the estimated robot trajectory, suggesting that the proposed method provides stable and continuous localization during motion.
The translation and rotation error variations during real-time localization are shown in Figure 17. The rotation error remains nearly constant, and the translation error does not exhibit noticeable accumulation over time. This behavior is consistent with the proposed hybrid localization strategy, which combines a low-frequency, high-precision global registration module with a high-frequency local LiDAR–inertial odometry module to suppress long-term drift. Across all time steps, the mean translation error is 5.35 cm and the mean rotation error is 0.15°, meeting the precision requirements for indoor mobile-robot localization.
Finally, autonomous navigation experiments are conducted using the prior point-cloud map, the derived 2D navigation grid map, and real-time localization information. As shown in Figure 18, the white point cloud represents the 3D prior map, the black region at the bottom indicates the 2D navigation grid map, and the colored points show real-time LiDAR perception. The red arrow denotes the navigation goal, and the green curve is the global path planned by the navigation module. These results demonstrate that the proposed system can integrate multi-source map representations, plan collision-free paths, and guide the robot to designated targets reliably.

4.4.2. Ablation Study

To verify the effectiveness of the coarse-to-fine multi-resolution refinement, we conduct ablation experiments by removing specific resolution levels. As shown in Table 6, removing the coarsest 10 m level reduces the success rate from 77% to 26% under the 10 m–10° initialization error condition. Omitting the finest 1 m level also degrades performance, particularly under small-to-moderate initialization errors, indicating that fine-scale likelihood fields are important for precise convergence after coarse alignment. These results confirm that progressive multi-resolution refinement improves both convergence stability and alignment accuracy. Consistently, the post-relocalization scan aligns closely with the prior map in Figure 14b.
To further assess global relocalization accuracy, we conduct 50 additional trials using keyframe point clouds from the mapping process without an external initial pose (identity initialization). We compare the absolute pose errors (APE) with those of BTC-ICP [65]. The corresponding BTC feature extraction and matching results are shown in Figure 19. Most extracted keypoints lie along walls, obstacles, and height-discontinuity edges, capturing informative local geometry. The matched BTC feature pairs (orange and red squares) and descriptors (white and yellow triangles) exhibit tight overlap, supporting the reliability of the proposed global initialization module.
In addition to voxel-resolution ablations, we evaluate the influence of key hyperparameters and network components in the SAC-based active exploration module. We focus on the learning rates (η_Q, η_π), the discount factor (γ), and the entropy temperature (α). The policy converges stably when γ is between 0.95 and 0.99 and α is between 0.05 and 0.2, consistent with typical SAC configurations. An excessively large α encourages over-exploration, whereas an overly small α can reduce policy diversity and exploration efficiency.
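The qualitative effect of the entropy temperature can be illustrated with a discrete Boltzmann policy analogous to SAC's exp(Q/α) form (the Q-values below are illustrative, not the trained network's outputs): a larger α flattens the action distribution and raises its entropy.

```python
import numpy as np

def policy_entropy(q_values, alpha):
    """Entropy of a Boltzmann policy pi(a) proportional to exp(Q(a) / alpha):
    the SAC entropy temperature alpha controls how flat the policy is."""
    z = np.asarray(q_values) / alpha
    z -= z.max()                            # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(-(probs * np.log(probs)).sum())

q = [1.0, 0.5, 0.2, -0.3]                  # illustrative Q-values, 4 actions
h_small = policy_entropy(q, alpha=0.05)    # near-greedy: entropy close to 0
h_large = policy_entropy(q, alpha=0.5)     # flatter, more exploratory policy
```

This mirrors the reported behavior: too large an α keeps the policy near uniform (over-exploration), while too small an α collapses it toward the greedy action and reduces exploration diversity.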
We also analyze the impact of network architecture. Increasing the number of Transformer encoder layers from four to six improves spatial feature aggregation and long-horizon exploration success, while further increasing the depth yields marginal gains at higher computational cost. The LSTM-based temporal fusion module is critical for integrating sequential exploration history. Removing the LSTM reduces the coverage rate by approximately 6% and slows policy convergence, confirming the benefit of explicit temporal modeling for exploration.
Overall, these ablation results suggest that the proposed architecture provides a practical trade-off among learning stability, exploration efficiency, and computational cost, supporting reliable performance across diverse unknown environments.

5. Conclusions

This paper presents HV-LIOM, an integrated LiDAR–inertial SLAM and active exploration framework for efficient and robust 3D mapping in unknown environments. The proposed system comprises four key components: an adaptive hybrid hash-voxel map representation for efficient spatial management, a BTC-based loop-closure and global optimization module for long-term drift mitigation, a SAC-based active exploration policy that enables autonomous decision-making via spatiotemporal feature learning, and a multi-resolution NDT strategy for accurate prior-map-based relocalization.
Extensive experiments on public datasets and a custom mobile robot platform demonstrate the effectiveness of the proposed framework. On the Hilti indoor dataset, HV-LIO improves absolute pose accuracy by up to 15.2% over FAST-LIO2, and HV-LIO-SAM further enhances global consistency with loop-closure optimization. On the large-scale NCLT outdoor dataset, HV-LIO reduces localization RMSE by 7.6%, while HV-LIO-SAM achieves an additional 17.6% improvement after global optimization. For autonomous exploration, the SAC-based policy completes unknown-environment mapping with reduced travel distance and exploration time, achieving up to 9.4% and 7.2% reductions compared with representative sampling-based and learning-based baselines. For prior-map-based relocalization, MR-NDT attains an average RMSE of 5.35 cm, outperforming BTC-ICP by more than 68% and maintaining a 77% success rate under 10 m–10° initialization errors.
Overall, HV-LIOM improves accuracy, efficiency, and robustness across the mapping, loop-closure, relocalization, and exploration pipeline. By unifying adaptive voxelized mapping, learning-based loop closure, active exploration, and coarse-to-fine relocalization, this work advances LiDAR–inertial SLAM toward higher autonomy and long-term reliability in large-scale, complex environments.
Despite these advantages, LiDAR-based SLAM remains sensitive to adverse environmental conditions, including heavy rain, fog, strong sunlight, and dense vegetation, which can degrade range measurements and affect mapping quality. HV-LIOM is validated under standard indoor and outdoor conditions on the Hilti and NCLT datasets and demonstrates stable performance in typical scenarios. However, as the system primarily relies on geometric constraints from LiDAR and IMU, it may still face challenges in geometrically ambiguous scenes or environments with significant dynamic objects. Future work will investigate multimodal fusion with radar and RGB cameras and incorporate learning-based geometric priors to improve robustness and scene understanding in visually complex and adverse settings, with a particular focus on neural implicit mapping.

Author Contributions

Conceptualization, S.F. and W.Z.; methodology, S.F., W.Z., and C.S.; software, S.F. and Z.Z.; validation, S.F., Z.Z., X.T., P.X., and X.H.; formal analysis, S.F., C.L., and W.Z.; investigation, S.F., X.T., P.X., and X.H.; resources, W.Z., C.L., and X.H.; data curation, S.F., Z.Z., X.T., P.X., and M.G.; writing—original draft preparation, S.F.; writing—review and editing, W.Z., C.S., and X.C.; visualization, S.F., C.L., and Z.Z.; supervision, X.C.; project administration, S.F. and W.Z.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Technology and Development Joint Research Foundation of Henan Province (Grant No. 225200810070) and the Science and Technology Innovation Program of Beijing Institute of Technology (Grant No. 2024CX06013).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We gratefully acknowledge the insightful comments and constructive suggestions provided by the editor and reviewers, which have greatly contributed to improving the quality of this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
HV-LIOM: Adaptive Hash-Voxel LiDAR–Inertial Odometry and Mapping (overall system)
HV-LIO: Hybrid Voxel-based LiDAR–Inertial Odometry (front-end odometry)
HV-LIO-SAM: Hybrid Voxel-based LiDAR–Inertial SLAM with Loop Closure and Mapping
SLAM: Simultaneous Localization and Mapping
LiDAR: Light Detection and Ranging
IMU: Inertial Measurement Unit
GNSS: Global Navigation Satellite System
LIO: LiDAR–Inertial Odometry
ICP: Iterative Closest Point
NDT: Normal Distributions Transform
SAC: Soft Actor–Critic
BTC: Binary Triangle Coding
APE: Absolute Pose Error
RMSE: Root Mean Square Error
BEV: Bird's-Eye View
RGB-D: Red–Green–Blue and Depth
NCLT: North Campus Long-Term dataset

Appendix A

Table A1. Structured comparison between HV-LIOM and representative LiDAR–inertial SLAM and localization methods.

| Method | Map Representation | Loop Closure/Global Optimization | Prior-Map Relocalization | Autonomous Exploration | Main Characteristics |
|---|---|---|---|---|---|
| LOAM [1] | Feature-based (kd-tree) | Yes (pose graph) | No | No | High-accuracy LiDAR odometry with separate tracking and mapping threads. |
| LIO-SAM [2] | Feature-based (ikd-tree) | Yes (factor graph) | Limited (same-session) | No | Tightly coupled LiDAR–IMU optimization with keyframe-based loop closure. |
| FAST-LIO2 [3] | Direct mapping (ikd-tree) | No explicit loop closure | No | No | High-frequency direct LiDAR–IMU odometry with efficient incremental kd-tree. |
| LocNet [30] | Point-cloud/voxel features | No global exploration module | Yes (learned relocalization) | No | Deep learning-based pose regression or feature matching for global localization. |
| HV-LIOM (ours) | Adaptive hash-voxel with planar/non-planar probabilistic features | Yes: learning-based BTC loop closure + factor-graph optimization | Yes: multi-resolution NDT-based prior-map relocalization | Yes: SAC-based active exploration | Unified framework for real-time SLAM, multi-session localization, and deep-RL-driven exploration on a shared adaptive voxel map. |

Appendix B

In this appendix, we briefly outline the derivation of the NDT registration objective used in our multi-resolution relocalization module. Starting from the voxel-wise Gaussian model in (21) and (22), the negative log-likelihood of a source point p given pose T is
s(T, p) = -\log P(T, p) = \frac{1}{2} (Tp - \mu)^{T} \Sigma^{-1} (Tp - \mu) + \text{const}.

Summing over all source points \{p_m\} yields the overall objective

L(T) = \sum_{m} s(T, p_m),
which is minimized using a Levenberg–Marquardt (LM) optimizer on the SE(3) manifold, as described in Section 3.4. The robust mixture model in (23) is handled analogously by replacing P ( T , p ) in the negative log-likelihood.

References

1. Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. In Proceedings of the Robotics: Science and Systems, Seattle, WA, USA, 12–16 July 2014.
2. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5135–5142.
3. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. FAST-LIO2: Fast Direct LiDAR-Inertial Odometry. IEEE Trans. Robot. 2022, 38, 2053–2073.
4. Xu, X.; Zhang, L.; Yang, J.; Cao, C.; Wang, W.; Ran, Y.; Tan, Z.; Luo, M. A Review of Multi-Sensor Fusion SLAM Systems Based on 3D LIDAR. Remote Sens. 2022, 14, 2835.
5. Zheng, S.; Wang, J.; Rizos, C.; Ding, W.; El-Mowafy, A. Simultaneous Localization and Mapping (SLAM) for Autonomous Driving: Concept and Analysis. Remote Sens. 2023, 15, 1156.
6. Besl, P.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256.
7. Yue, X.; Zhang, Y.; Chen, J.; Chen, J.; Zhou, X.; He, M. LiDAR-based SLAM for robotic mapping: State of the art and new frontiers. Ind. Robot. Int. J. Robot. Res. Appl. 2024, 51, 196–205.
8. Shan, T.; Englot, B. LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4758–4765.
9. Zhang, J.; Wen, W.; Huang, F.; Chen, X.; Hsu, L.T. Coarse-to-Fine Loosely-Coupled LiDAR-Inertial Odometry for Urban Positioning and Mapping. Remote Sens. 2021, 13, 2371.
10. Tang, J.; Chen, Y.; Niu, X.; Wang, L.; Chen, L.; Liu, J.; Shi, C.; Hyyppä, J. LiDAR Scan Matching Aided Inertial Navigation System in GNSS-Denied Environments. Sensors 2015, 15, 16710–16728.
11. Zhang, B.; Yang, C.; Xiao, G.; Li, P.; Xiao, Z.; Wei, H.; Liu, J. Loosely Coupled PPP/Inertial/LiDAR Simultaneous Localization and Mapping (SLAM) Based on Graph Optimization. Remote Sens. 2025, 17, 812.
12. Ye, H.; Chen, Y.; Liu, M. Tightly Coupled 3D Lidar Inertial Odometry and Mapping. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3144–3150.
13. Le Gentil, C.; Vidal-Calleja, T.; Huang, S. IN2LAAMA: Inertial Lidar Localization Autocalibration and Mapping. IEEE Trans. Robot. 2021, 37, 275–290.
14. Qin, C.; Ye, H.; Pranata, C.E.; Han, J.; Zhang, S.; Liu, M. LINS: A Lidar-Inertial State Estimator for Robust and Efficient Navigation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 8899–8906.
15. Xu, W.; Zhang, F. FAST-LIO: A Fast, Robust LiDAR-Inertial Odometry Package by Tightly-Coupled Iterated Kalman Filter. IEEE Robot. Autom. Lett. 2021, 6, 3317–3324.
16. Yuan, Y.; Wu, Y.; Yue, M.; Gong, M.; Fan, X.; Ma, W.; Miao, Q. Learning Discriminative Features via Multi-Hierarchical Mutual Information for Unsupervised Point Cloud Registration. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 8343–8354.
17. Cao, Y.; Xu, X.; Shen, W. Complementary pseudo multimodal feature for point cloud anomaly detection. Pattern Recognit. 2024, 156, 110761.
18. Zhao, L.; Hu, Y.; Yang, X.; Wang, Y.; Dou, Z.; Zhang, Y. Voxel Pillar Multi-frame Cross Attention Network for sparse point cloud robust single object tracking. Pattern Recognit. 2025, 167, 111771.
19. Chen, S.; Zhou, B.; Jiang, C.; Xue, W.; Li, Q. A LiDAR/Visual SLAM Backend with Loop Closure Detection and Graph Optimization. Remote Sens. 2021, 13, 2720.
20. Scovanner, P.; Ali, S.; Shah, M. A 3-dimensional sift descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia, MM ’07, New York, NY, USA, 24–29 September 2007; pp. 357–360.
21. Salti, S.; Tombari, F.; Di Stefano, L. SHOT: Unique signatures of histograms for surface and texture description. Comput. Vis. Image Underst. 2014, 125, 251–264.
22. He, L.; Wang, X.; Zhang, H. M2DP: A novel 3D point cloud descriptor and its application in loop closure detection. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 231–237.
23. Kim, G.; Kim, A. Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4802–4809.
24. Komorowski, J. MinkLoc3D: Point Cloud Based Large-Scale Place Recognition. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 1789–1798.
25. Cui, Y.; Chen, X.; Zhang, Y.; Dong, J.; Wu, Q.; Zhu, F. BoW3D: Bag of Words for Real-Time Loop Closing in 3D LiDAR SLAM. IEEE Robot. Autom. Lett. 2023, 8, 2828–2835.
26. Xu, X.; Lu, S.; Wu, J.; Lu, H.; Zhu, Q.; Liao, Y.; Xiong, R.; Wang, Y. RING++: Roto-Translation Invariant Gram for Global Localization on a Sparse Scan Map. IEEE Trans. Robot. 2023, 39, 4616–4635.
27. Yuan, C.; Lin, J.; Zou, Z.; Hong, X.; Zhang, F. STD: Stable Triangle Descriptor for 3D place recognition. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 1897–1903.
28. Cattaneo, D.; Vaghi, M.; Valada, A. LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM. IEEE Trans. Robot. 2022, 38, 2074–2093.
29. Vidanapathirana, K.; Ramezani, M.; Moghadam, P.; Sridharan, S.; Fookes, C. LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 2215–2221.
  30. Yin, H.; Tang, L.; Ding, X.; Wang, Y.; Xiong, R. LocNet: Global Localization in 3D Point Clouds for Mobile Vehicles. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 728–733. [Google Scholar] [CrossRef]
  31. Chen, X.; Läbe, T.; Milioto, A.; Röhling, T.; Vysotska, O.; Haag, A.; Behley, J.; Stachniss, C. OverlapNet: Loop Closing for LiDAR-based SLAM. In Proceedings of the Robotics: Science and Systems (RSS), Virtually, 12–16 July 2020. [Google Scholar]
  32. Grisetti, G.; Kümmerle, R.; Stachniss, C.; Burgard, W. A Tutorial on Graph-Based SLAM. IEEE Intell. Transp. Syst. Mag. 2010, 2, 31–43. [Google Scholar] [CrossRef]
  33. Placed, J.A.; Strader, J.; Carrillo, H.; Atanasov, N.; Indelman, V.; Carlone, L.; Castellanos, J.A. A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers. IEEE Trans. Robot. 2023, 39, 1686–1705. [Google Scholar] [CrossRef]
  34. Lluvia, I.; Lazkano, E.; Ansuategi, A. Active Mapping and Robot Exploration: A Survey. Sensors 2021, 21, 2445. [Google Scholar] [CrossRef]
  35. Leong, K. Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment. arXiv 2023, arXiv:2307.07296. [Google Scholar] [CrossRef]
  36. Yamauchi, B. A frontier-based approach for autonomous exploration. In Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA’97), Monterey, CA, USA, 10–11 July 1997; pp. 146–151. [Google Scholar] [CrossRef]
  37. Faigl, J.; Kulich, M. On determination of goal candidates in frontier-based multi-robot exploration. In Proceedings of the 2013 European Conference on Mobile Robots, Barcelona, Catalonia, Spain, 25–27 September 2013; pp. 210–215. [Google Scholar] [CrossRef]
  38. Batinovic, A.; Petrovic, T.; Ivanovic, A.; Petric, F.; Bogdan, S. A Multi-Resolution Frontier-Based Planner for Autonomous 3D Exploration. IEEE Robot. Autom. Lett. 2021, 6, 4528–4535. [Google Scholar] [CrossRef]
  39. Dai, A.; Papatheodorou, S.; Funk, N.; Tzoumanikas, D.; Leutenegger, S. Fast Frontier-based Information-driven Autonomous Exploration with an MAV. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 9570–9576. [Google Scholar] [CrossRef]
  40. Bircher, A.; Kamel, M.; Alexis, K.; Oleynikova, H.; Siegwart, R. Receding Horizon “Next-Best-View” Planner for 3D Exploration. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1462–1468. [Google Scholar] [CrossRef]
  41. Lau, B.P.L.; Ong, B.J.Y.; Loh, L.K.Y.; Liu, R.; Yuen, C.; Soh, G.S.; Tan, U.X. Multi-AGV’s Temporal Memory-Based RRT Exploration in Unknown Environment. IEEE Robot. Autom. Lett. 2022, 7, 9256–9263. [Google Scholar] [CrossRef]
  42. Dang, T.; Tranzatto, M.; Khattak, S.; Mascarich, F.; Alexis, K.; Hutter, M. Graph-based subterranean exploration path planning using aerial and legged robots. J. Field Robot. 2020, 37, 1363–1388. [Google Scholar] [CrossRef]
  43. Zhu, H.; Cao, C.; Xia, Y.; Scherer, S.; Zhang, J.; Wang, W. DSVP: Dual-Stage Viewpoint Planner for Rapid Exploration by Dynamic Expansion. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 7623–7630. [Google Scholar] [CrossRef]
  44. Cao, C.; Zhu, H.; Ren, Z.; Choset, H.; Zhang, J. Representation granularity enables time-efficient autonomous exploration in large, complex worlds. Sci. Robot. 2023, 8, eadf0970. [Google Scholar] [CrossRef]
  45. Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep Reinforcement Learning Robot for Search and Rescue Applications: Exploration in Unknown Cluttered Environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617. [Google Scholar] [CrossRef]
  46. Schmid, L.; Ni, C.; Zhong, Y.; Siegwart, R.; Andersson, O. Fast and Compute-Efficient Sampling-Based Local Exploration Planning via Distribution Learning. IEEE Robot. Autom. Lett. 2022, 7, 7810–7817. [Google Scholar] [CrossRef]
  47. Chen, F.; Bai, S.; Shan, T.; Englot, B. Self-Learning Exploration and Mapping for Mobile Robots via Deep Reinforcement Learning. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019. [Google Scholar] [CrossRef]
  48. Zhu, D.; Li, T.; Ho, D.; Wang, C.; Meng, M.Q.H. Deep Reinforcement Learning Supervised Autonomous Exploration in Office Environments. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7548–7555. [Google Scholar] [CrossRef]
  49. Li, H.; Zhang, Q.; Zhao, D. Deep Reinforcement Learning-Based Automatic Exploration for Navigation in Unknown Environment. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2064–2076. [Google Scholar] [CrossRef]
  50. Chen, F.; Martin, J.D.; Huang, Y.; Wang, J.; Englot, B. Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 6140–6147. [Google Scholar] [CrossRef]
  51. Cao, Y.; Hou, T.; Wang, Y.; Yi, X.; Sartoretti, G. ARiADNE: A Reinforcement learning approach using Attention-based Deep Networks for Exploration. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 10219–10225. [Google Scholar] [CrossRef]
  52. Biber, P.; Strasser, W. The normal distributions transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 3, pp. 2743–2748. [Google Scholar] [CrossRef]
  53. Segal, A.V.; Hähnel, D.; Thrun, S. Generalized-ICP. In Proceedings of Robotics: Science and Systems, Seattle, WA, USA, 28 June–1 July 2009. [Google Scholar]
  54. Takeuchi, E.; Tsubouchi, T. A 3-D Scan Matching using Improved 3-D Normal Distributions Transform for Mobile Robotic Mapping. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 3068–3073. [Google Scholar] [CrossRef]
  55. Akai, N.; Morales, L.Y.; Takeuchi, E.; Yoshihara, Y.; Ninomiya, Y. Robust localization using 3D NDT scan matching with experimentally determined uncertainty and road marker matching. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1356–1363. [Google Scholar] [CrossRef]
  56. Valencia, R.; Saarinen, J.; Andreasson, H.; Vallvé, J.; Andrade-Cetto, J.; Lilienthal, A.J. Localization in highly dynamic environments using dual-timescale NDT-MCL. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 3956–3962. [Google Scholar] [CrossRef]
  57. Engel, N.; Hoermann, S.; Horn, M.; Belagiannis, V.; Dietmayer, K. DeepLocalization: Landmark-based Self-Localization with Deep Neural Networks. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 926–933. [Google Scholar] [CrossRef]
  58. Liu, D.; Cui, Y.; Yan, L.; Mousas, C.; Yang, B.; Chen, Y. DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6101–6109. [Google Scholar] [CrossRef]
  59. Wang, P.; Yang, R.; Cao, B.; Xu, W.; Lin, Y. DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5860–5869. [Google Scholar] [CrossRef]
  60. Jiang, W.; Xue, H.; Si, S.; Xiao, L.; Zhao, D.; Zhu, Q.; Nie, Y.; Dai, B. R2SCAT-LPR: Rotation-Robust Network with Self- and Cross-Attention Transformers for LiDAR-Based Place Recognition. Remote Sens. 2025, 17, 1057. [Google Scholar] [CrossRef]
  61. Teschner, M.; Heidelberger, B.; Müller, M.; Pomeranets, D.; Gross, M.H. Optimized Spatial Hashing for Collision Detection of Deformable Objects. In Proceedings of the 8th International Fall Workshop on Vision, Modeling, and Visualization, VMV 2003, München, Germany, 19–21 November 2003; Ertl, T., Ed.; Aka GmbH: Rüsselsheim, Germany, 2003; pp. 47–54. [Google Scholar]
  62. Nießner, M.; Zollhöfer, M.; Izadi, S.; Stamminger, M. Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. 2013, 32, 1–11. [Google Scholar] [CrossRef]
  63. Kümmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. G2o: A general framework for graph optimization. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613. [Google Scholar] [CrossRef]
  64. Dellaert, F. Factor Graphs and GTSAM: A Hands-on Introduction; Georgia Institute of Technology: Atlanta, GA, USA, 2012. [Google Scholar]
  65. Yuan, C.; Lin, J.; Liu, Z.; Wei, H.; Hong, X.; Zhang, F. BTC: A Binary and Triangle Combined Descriptor for 3-D Place Recognition. IEEE Trans. Robot. 2024, 40, 1580–1599. [Google Scholar] [CrossRef]
  66. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
  67. Magnusson, M. The Three-Dimensional Normal-Distributions Transform: An Efficient Representation for Registration, Surface Analysis, and Loop Detection. Doctoral Dissertation, Örebro Universitet, Örebro, Sweden, 2009. [Google Scholar]
  68. Zhang, L.; Helmberger, M.; Fu, L.F.T.; Wisth, D.; Camurri, M.; Scaramuzza, D.; Fallon, M. Hilti-Oxford Dataset: A Millimeter-Accurate Benchmark for Simultaneous Localization and Mapping. IEEE Robot. Autom. Lett. 2023, 8, 408–415. [Google Scholar] [CrossRef]
  69. Carlevaris-Bianco, N.; Ushani, A.K.; Eustice, R.M. University of Michigan North Campus long-term vision and lidar dataset. Int. J. Rob. Res. 2016, 35, 1023–1035. [Google Scholar] [CrossRef]
  70. Cao, C.; Zhu, H.; Choset, H.; Zhang, J. TARE: A Hierarchical Framework for Efficiently Exploring Complex 3D Environments. In Proceedings of the Robotics: Science and Systems XVII, Virtual Event, 12–16 July 2021. [Google Scholar]
  71. Cao, Y.; Zhao, R.; Wang, Y.; Xiang, B.; Sartoretti, G. Deep Reinforcement Learning-Based Large-Scale Robot Exploration. IEEE Robot. Autom. Lett. 2024, 9, 4631–4638. [Google Scholar] [CrossRef]
Figure 1. Overall framework of the proposed unified LiDAR–inertial SLAM and active exploration system. The system integrates four main modules: (1) LiDAR–Inertial Odometry with an Adaptive Hybrid Hash-Voxel Map for real-time pose estimation and incremental mapping; (2) BTC-based Loop-Closure Detection and Global Map Optimization for drift correction and global consistency; (3) SAC-based Active Exploration of Unknown Environments enabling autonomous decision-making via spatiotemporal feature learning; and (4) Prior Map-based Real-Time Localization and Navigation for precise relocalization and trajectory planning. All components are deployed on the self-developed mobile robot platform equipped with LiDAR, IMU, and onboard computing units.
Figure 2. Structure of the Adaptive Hybrid Hash-Voxel Map.
Figure 3. Dynamic maintenance algorithm for the adaptive hybrid hash-voxel map.
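The adaptive hybrid hash-voxel map of Figures 2 and 3 keys voxels by integer grid coordinates in a hash table and subdivides voxels where the local point density is high. The idea can be sketched roughly as follows; the threshold, key scheme, and half-resolution subdivision rule here are illustrative assumptions, not the parameters used in the paper:

```python
from collections import defaultdict

class HashVoxelMap:
    """Minimal sketch of a spatial-hash voxel map with adaptive subdivision."""

    def __init__(self, voxel_size=1.0, split_threshold=50):
        self.voxel_size = voxel_size
        self.split_threshold = split_threshold
        self.coarse = defaultdict(list)  # (ix, iy, iz) -> points at base resolution
        self.fine = defaultdict(list)    # points re-indexed at half resolution

    def _key(self, p, size):
        # Integer grid coordinates used as the hash key.
        return (int(p[0] // size), int(p[1] // size), int(p[2] // size))

    def insert(self, p):
        k = self._key(p, self.voxel_size)
        cell = self.coarse[k]
        cell.append(p)
        # Adaptive subdivision: a voxel that grows too dense is
        # re-indexed at half the voxel size and emptied at the base level.
        if len(cell) > self.split_threshold:
            for q in cell:
                self.fine[self._key(q, self.voxel_size / 2)].append(q)
            cell.clear()
```

In the actual system the subdivision criterion also considers local geometric complexity (e.g., planarity of the voxel's points), not just the point count shown here.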
Figure 4. Pose-graph optimization constraints.
Figure 5. Architecture of the Spatiotemporal Policy Evaluation Network.
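The SAC policy evaluated by the network in Figure 5 builds on the standard maximum-entropy reinforcement learning objective; for reference (this is the generic SAC formulation, not a paper-specific variant):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\big]
```

where \(\mathcal{H}\) is the policy entropy and the temperature \(\alpha\) trades off exploration against reward, which is what lets the policy keep proposing diverse navigation targets early in an exploration run.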
Figure 6. Multi-resolution NDT-based precise relocalization. The white point cloud represents the prior map, while the colored point cloud denotes the newly registered scan during relocalization. The proposed hierarchical optimization refines alignment accuracy progressively from coarse to fine resolutions.
Figure 7. Mobile robot hardware platform and coordinate-frame definitions.
Figure 8. Trajectory comparisons on representative Hilti sequences. (a) exp04. (b) exp05. (c) exp06. (d) exp14.
Figure 9. Trajectory comparison on the exp16 sequence of the Hilti indoor dataset. (a) Overall trajectories. (b) Zoomed-in local comparison.
Figure 10. Trajectory comparison on the 0110 sequence of the NCLT dataset. (a) Overall trajectories. (b) Zoomed-in comparison in Region B.
Figure 11. Point cloud mapping on the mobile robot platform. (a) FAST-LIO2. (b) Proposed HV-LIO-SAM.
Figure 12. Voxel-level structural feature visualization of the proposed adaptive hybrid hash-voxel map.
Figure 13. Relationship between exploration volume and travel distance for different algorithms. (a) Indoor Scene 1. (b) Indoor Scene 2.
Figure 14. Point-cloud alignment before and after precise relocalization against the prior map. The colored point cloud denotes the registered scan, while the white point cloud represents the prior map. (a) Before relocalization (translation/rotation error: 10 cm/10°). (b) After relocalization.
Figure 15. Accuracy comparison of global-initialization relocalization.
Figure 16. Real-time registration-based localization results: (a) at 65 s; (b) at 152 s. The real-time scan is aligned with the prior map.
Figure 17. Translation and rotation error curves of the real-time localization experiment.
Figure 18. Autonomous navigation of the mobile robot based on a prior map.
Figure 19. BTC keypoint detection and descriptor matching results.
Table 1. Hardware configuration of the mobile robot.
| Component | Model/Parameters |
| --- | --- |
| Dimensions | 0.54 m × 0.46 m × 0.82 m |
| Drive type | Differential drive |
| CPU | AMD Ryzen 7 5800H @ 3.2 GHz |
| LiDAR | Hesai PandarXT-32 @ 10 Hz |
| IMU | LPMS-IG1 (9-axis AHRS) |
| RGB-D camera | Intel RealSense D435 |
Table 2. Comparison of absolute pose error (APE) RMSE (m) on the Hilti indoor dataset.
| Method | exp04 | exp05 | exp06 | exp14 | exp16 |
| --- | --- | --- | --- | --- | --- |
| LIO-SAM | 0.8051 | 0.0627 | 0.8108 | 2.0336 | fail |
| FAST-LIO2 | 0.0208 | 0.0209 | 0.0325 | 0.0367 | 0.4344 |
| HV-LIO | 0.0172 | 0.0189 | 0.0298 | 0.0346 | 0.3683 |
Table 3. Trajectory accuracy comparison on the NCLT dataset.
| Method | Max (m) | Mean (m) | Median (m) | RMSE (m) |
| --- | --- | --- | --- | --- |
| LIO-SAM | 40.45 | 12.82 | 11.78 | 14.29 |
| FAST-LIO2 | 2.05 | 0.87 | 0.79 | 0.92 |
| HV-LIO | 2.18 | 0.77 | 0.70 | 0.85 |
| HV-LIO-SAM | 1.66 | 0.64 | 0.60 | 0.70 |
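The error statistics reported in Tables 2 and 3 follow the usual absolute pose error (APE) definitions. A minimal sketch of how such numbers are computed from position sequences (evaluation is typically done with dedicated tooling such as evo; this toy function only illustrates the metrics, and assumes the trajectories are already time-synchronized and expressed in a common frame):

```python
import numpy as np

def ape_stats(est, gt):
    """Translation-only APE statistics.

    `est`, `gt`: (N, 3) arrays of synchronized positions in a common
    frame (trajectory alignment itself is not shown here).
    """
    err = np.linalg.norm(np.asarray(est) - np.asarray(gt), axis=1)
    return {
        "max": float(err.max()),
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }
```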
Table 4. Comparison of the average computation time per frame for real-time pose estimation algorithms (ms).
| Algorithm | Data Preprocessing | Data Association & Iterative Update | Map Incremental Update | Total Time per Frame |
| --- | --- | --- | --- | --- |
| LIO-SAM | 19.44 | 8.96 | — | 28.40 |
| FAST-LIO2 | 4.86 | 8.27 | 0.52 | 13.65 |
| HV-LIO | 4.91 | 7.34 | 2.07 | 14.32 |
Table 5. Comparison of autonomous exploration performance in indoor simulated scenes.
| Scene | Method | Exploration Distance (m) | Exploration Time (s) |
| --- | --- | --- | --- |
| Scene 1 | TARE | 1306.77 | 749.77 |
| Scene 1 | ARiADNE-L | 1012.24 | 581.57 |
| Scene 1 | HV-LIOM (SAC) | 970.52 | 550.78 |
| Scene 2 | TARE | 1563.55 | 897.79 |
| Scene 2 | ARiADNE-L | 1427.35 | 835.59 |
| Scene 2 | HV-LIOM (SAC) | 1293.37 | 775.39 |
Table 6. Comparison of success rates (%) for precise relocalization methods under different initialization errors.
| Method | 1 m–1° | 5 m–5° | 10 m–10° | 15 m–15° |
| --- | --- | --- | --- | --- |
| NDT-3 | 43% | 48% | 8% | 2% |
| MR-NDT (w/o 10 m) | 99% | 79% | 26% | 12% |
| MR-NDT (w/o 1 m) | 44% | 38% | 42% | 28% |
| MR-NDT | 99% | 97% | 77% | 45% |
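The ablation above suggests why the cascade matters: the coarse (10 m) level supplies a wide basin of attraction under large initial errors, while the fine (1 m) level delivers the final precision. The control flow of such a multi-resolution scheme can be sketched as follows; the per-level registration step here is a centroid-matching placeholder for illustration only, not the NDT solver actually used:

```python
import numpy as np

def downsample(points, voxel):
    # Keep one representative point per voxel (stand-in for NDT's grid).
    keys = np.floor(points / voxel).astype(int)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]

def register_once(src, dst):
    # Placeholder single-level registration: the translation that matches
    # centroids. A real system would run an NDT (or ICP) solve here.
    return dst.mean(axis=0) - src.mean(axis=0)

def multires_relocalize(scan, prior_map, resolutions=(10.0, 1.0)):
    # Coarse-to-fine cascade: each level refines the estimate produced by
    # the previous, coarser one (the resolutions mirror the 10 m / 1 m
    # levels ablated in Table 6).
    t = np.zeros(3)
    for r in resolutions:
        t += register_once(downsample(scan + t, r), downsample(prior_map, r))
    return t
```

With a translation-only toy model the fine level recovers the offset almost exactly; the point of the cascade is that each level only needs to succeed within the basin left by the level before it.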

Share and Cite

MDPI and ACS Style

Fan, S.; Chen, X.; Zhang, W.; Xu, P.; Zuo, Z.; Tan, X.; He, X.; Sheikder, C.; Guo, M.; Li, C. HV-LIOM: Adaptive Hash-Voxel LiDAR–Inertial SLAM with Multi-Resolution Relocalization and Reinforcement Learning for Autonomous Exploration. Sensors 2025, 25, 7558. https://doi.org/10.3390/s25247558
