Article

Optimizing RTAB-Map Viewability to Reduce Cognitive Workload in VR Teleoperation: A User-Centric Approach

Department of Mechanical Engineering, Soongsil University, Seoul 06978, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2026, 14(3), 579; https://doi.org/10.3390/math14030579
Submission received: 30 December 2025 / Revised: 3 February 2026 / Accepted: 5 February 2026 / Published: 6 February 2026
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Systems)

Abstract

In industrial environments, providing intuitive spatial information via 3D maps is essential for maximizing the efficiency of teleoperation. However, existing SLAM algorithms generating 3D maps predominantly focus on improving robot localization accuracy, often neglecting the optimization of viewability required for human operators to clearly perceive object depth and structure in virtual environments. To address this, this study proposes a methodology to optimize the viewability of RTAB-Map-based 3D maps using the Taguchi method, aiming to enhance VR teleoperation efficiency and reduce cognitive workload. We identified eight key parameters that critically affect visual quality and utilized an L18 orthogonal array to derive an optimal combination that controls point cloud density and noise levels. Experimental results from a target object picking task demonstrated that the optimized 3D map reduced task completion time by approximately 9 s compared to the RGB image condition, achieving efficiency levels approaching those of the physical-world baseline. Furthermore, evaluations using NASA-TLX confirmed that intuitive visual feedback minimized situational awareness errors and substantially alleviated cognitive workload. This study suggests a new direction for constructing high-efficiency teleoperation interfaces from a Human–Robot Interaction perspective by expanding SLAM optimization criteria from geometric precision to user-centric visual quality.

1. Introduction

As robotic technologies continue to advance [1], the deployment of robots in complex indoor environments such as warehouses and retail shops has expanded, increasing the need for Human-in-the-Loop approaches that allow human operators to intervene and control tasks in unexpected situations [2]. In real-world scenarios, various unforeseen events occur, including the sudden appearance of obstacles on navigation paths, interference between multiple robots, and grasping failures due to errors in target object pose estimation. To resolve these issues, operators must accurately perceive the on-site situation and actively intervene when necessary [3]. While the most intuitive method for situational awareness involves the operator physically visiting the site for visual inspection, repetitive on-site travel leads to reduced operational efficiency [4], increased operating costs, and safety concerns [5]. Consequently, teleoperation methods that enable operators to assess situations and resolve issues from a remote location, without the need for physical presence, have recently gained significant attention [6].
In current industrial teleoperation systems, the most common source of information relied upon by operators for situational awareness is real-time 2D RGB video feeds from robot-mounted cameras [7]. However, such 2D video-based interfaces suffer from a limited Field of View (FOV) and a lack of depth information, imposing significant limitations on accurately perceiving the spatial structure of real-world environments characterized by complex object clutter [8]. Consequently, these perceptual deficits often lead to failures in teleoperation tasks [9]. To overcome the limitations of conventional 2D video approaches, teleoperation methods that integrate 3D maps with Virtual Reality (VR) environments have recently emerged as a promising alternative [10]. VR-based interfaces allow for the inspection of the surroundings of the robot from multiple angles through free viewpoint manipulation within a 3D space, thereby providing enhanced spatial perception and improving the success rates of teleoperation tasks [11].
Conventionally, Digital Twins, distinct from clinical-domain counterparts focused on physiological modeling [12], have been implemented by constructing 3D maps based on predefined models in order to replicate and simulate physical assets in industrial environments [13]. However, in environments such as logistics warehouses and retail shops, where layout changes occur frequently and object placement or stacking configurations vary over time, predefined model-based approaches face inherent limitations in reflecting such changes in real time [14]. In contrast, visual SLAM-based 3D mapping offers distinct advantages in terms of environmental synchronization, as it enables continuous updates of spatial information based on real-world sensor data [15]. Consequently, in industrial and logistics settings subject to frequent environmental changes, constructing VR interfaces utilizing measurement-based visual maps proves to be a more efficient approach.
In the field of visual SLAM, various algorithms such as ORB-SLAM, DROID-SLAM, OpenVSLAM, and RTAB-Map have been proposed, each exhibiting distinct characteristics depending on their approaches to feature extraction, loop closure strategies, and memory management structures [16,17,18,19]. Among these, RTAB-Map is capable of generating dense 3D maps and occupancy grids in real time; as a graph-based algorithm, it utilizes memory management strategies optimized for long-term or large-scale mapping and operates robustly in obstacle-rich environments, making it highly suitable for industrial and logistics applications [20,21]. However, previous studies on RTAB-Map have primarily focused on improving geometric accuracy, such as localization precision, trajectory error, and loop closure performance, while often neglecting viewability requirements, such as how operators perceive and interpret maps in teleoperation scenarios [22,23]. In actual teleoperation environments, although map noise levels, point density, and structural clarity directly affect an operator’s situational awareness and judgment, research that systematically analyzes these user-centric visual quality factors remains limited [24,25,26].
Therefore, to ensure the visibility required for VR teleoperation, this study analyzes the internal pipeline of RTAB-Map and derives an optimal parameter combination using a Taguchi experimental design that incorporates user-based visual quality assessments. Subsequently, to validate the practical efficacy of the proposed 3D map-based interface utilizing these optimal parameters against conventional methods, a target object picking task was conducted under three distinct interface conditions: (1) a physical environment involving direct on-site control by the operator, (2) a conventional real-time 2D RGB video feed, and (3) the proposed VR-based teleoperation utilizing the optimized 3D map. Through this comparative evaluation, this study goes beyond the limitations of conventional 2D visual information by presenting a 3D map-based teleoperation framework with enhanced operator-centered viewability and experimentally demonstrating its effectiveness.
The remainder of this paper is organized as follows. Section 2 reviews related works on teleoperation systems and visual SLAM-based 3D mapping approaches. Section 3 describes the materials and methods, including the analysis of the RTAB-Map pipeline, the selection of viewability-related parameters, and the Taguchi experimental design. Section 4 presents the experimental results obtained from the Taguchi-based analysis and target object picking experiments under different teleoperation interfaces. Section 5 discusses the results with respect to user-centered map viewability in VR-based teleoperation. Finally, Section 6 concludes the paper and outlines directions for future work.

2. Related Works

2.1. 2D Video-Based Teleoperation Systems

Traditional teleoperation systems primarily provide visual information through 2D video-based interfaces. While these systems offer the advantage of ease of implementation, they possess structural limitations, most notably a restricted FOV and a lack of depth information, which degrade the operator’s situational awareness in complex work environments. Information acquisition via a 2D screen is often likened to observing the world through a keyhole or a straw, a phenomenon known as the “keyhole effect” that hinders the perception of the global context of the surrounding environment [27]. To compensate for these deficits, free-viewpoint methods have been introduced; however, these often introduce a secondary challenge where the operator must manually control the camera viewpoint while simultaneously manipulating the robot. Consequently, the additional operational workload required for viewpoint management frequently outweighs the intended benefits of improved visual information acquisition [28].
Furthermore, studies on Electroencephalography (EEG) analysis indicate that this 2D–3D cognitive transformation process sharply increases the mental fatigue of operators. The non-intuitive nature of 2D interfaces, which deviates from natural human spatial cognition, serves as a primary factor hindering system immersion and operational efficiency [29]. Moreover, the monocular visual information provided by 2D displays fails to offer the stereoscopic sense essential for depth perception, making precise distance estimation difficult [30]. Particularly in teleoperation scenarios where communication latency is inevitable, 2D methods—unlike 3D interfaces that can actively mitigate visual discrepancies—exacerbate control instability and pose a significant risk of inducing simulator sickness [31]. Therefore, to overcome the limitations of 2D-based systems—characterized by restricted fields of view, the absence of depth information, and high cognitive load—3D interface technologies capable of providing intuitive spatial information are essential.

2.2. 3D Video-Based Teleoperation Systems

To circumvent the spatial cognition constraints inherent in 2D interfaces, various studies have leveraged 3D spatial information. These approaches are broadly categorized into those utilizing pre-defined 3D models and those based on visual SLAM-based 3D mapping frameworks. Research utilizing pre-built 3D models can enhance the stability of task planning by providing a noise-free, refined, and structured environment; however, such methods possess a structural limitation in their total reliance on a priori environmental information [13]. Furthermore, if the virtual model fails to synchronize perfectly with physical changes in the real-world environment in real time, there is a risk of divergence between the simulated path and the actual surroundings [32]. Particularly in unstructured environments—where terrain fluctuates frequently or materials are in motion—static 3D models alone are insufficient to completely prevent collisions with peripheral equipment or to perform precise self-localization [33].
Conversely, research based on visual SLAM offers the advantage of immediately reflecting changes in dynamic environments by constructing 3D maps from real-time visual sensor data. Accordingly, various visual SLAM algorithms, including ORB-SLAM, DROID-SLAM, OpenVSLAM, and RTAB-Map, have been proposed. Among these, RTAB-Map is commonly adopted as a practical SLAM framework for indoor environments, as it enables stable map generation even in scenes with repetitive visual patterns—such as warehouses—by rigorously verifying graph constraints during loop-closure detection through a transformation variance mechanism [34]. Furthermore, RTAB-Map fuses visual and depth data in real time to generate 3D maps optimized for indoor settings, effectively representing object boundaries and structural characteristics of the surrounding environment [35]. Due to these attributes, RTAB-Map is highly suitable for application in remote operation and VR-based tasks conducted in complex indoor environments.

2.3. RTAB-Map-Based Teleoperation Systems

Given its robust performance in indoor and structured environments, numerous studies have sought to optimize RTAB-Map for various operational contexts. Existing literature has primarily focused on parameter tuning from a geometric and technical perspective to improve localization accuracy and overall map consistency. Specifically, several studies have adjusted key parameters such as Rehearsal Similarity and Maximum Depth to enhance mapping accuracy [36], while others have modified loop-closure detection and graph optimization processes to achieve globally consistent maps [22]. In addition, research on long-term operation in large-scale environments has addressed memory management issues by adjusting parameters such as maxRetrieved to ensure stable and efficient map generation [37].
However, improvements in geometric accuracy and map consistency do not necessarily guarantee that the resulting maps are visually intuitive or easily interpretable by human operators. Despite the structural robustness of RTAB-Map, the generated point clouds are often inherently discontinuous and sparse, containing sensor noise that hinders an operator’s ability to clearly perceive object boundaries and depth [38]. Moreover, during the real-time reconstruction of massive 3D datasets, the inevitable trade-off between visibility and computational efficiency frequently leads to the loss or blurring of critical visual details [10]. This lack of visual continuity has been identified as a significant limitation that disrupts spatial cognition and prevents precise manipulation, particularly in unknown or cluttered environments [39]. These observations indicate that optimizing RTAB-Map solely from a technical perspective is insufficient for teleoperation scenarios that heavily rely on human visual perception. Therefore, to address the limitations of existing research that overlooks operator-centric requirements, this study systematically analyzes the influence of RTAB-Map parameters on map viewability.

3. Materials and Methods

3.1. RTAB-Map Processing Pipeline Overview

This study aims to analyze the influence of RTAB-Map parameters on the visual quality of three-dimensional maps in a VR-based teleoperation environment. To ensure strict control of experimental variables, a single RGB-D camera was employed. Although LiDAR sensors and multi-sensor fusion approaches are generally effective in improving localization accuracy and system robustness, LiDAR-based maps provide limited color information, and multi-sensor fusion increases system complexity and experimental variability, making them less suitable for parameter-oriented visual quality analysis. In contrast, an RGB-D camera simultaneously provides depth and color information, offering richer visual cues for spatial perception and structural interpretation in VR environments. The overall architecture and data processing pipeline of the RTAB-Map-based map generation system using a single RGB-D camera are illustrated in Figure 1.
The system comprises three main phases: data acquisition and synchronization, hierarchical memory-based loop closure detection, and map generation via graph optimization. In the initial phase, Data Acquisition and Synchronization, the system receives RGB and depth images from an RGB-D camera together with Transform (TF) data from the UR5e robot. To address the different input frequencies of these data streams, an Approximate Time Synchronization strategy is employed. While resampling techniques involving signal interpolation are commonly used to align asynchronous data streams [40], this study utilizes a synchronization method that selects the message with the nearest timestamp within a specified tolerance. This approach preserves the integrity of the raw depth and pose data without introducing artificial interpolation. Specifically, incoming messages are stored in a fixed-size buffer, and the depth and TF data exhibiting the smallest time difference relative to the reference RGB stream—within a predefined tolerance—are selected. These temporally aligned data are then merged into a single node and forwarded to the subsequent processing stage.
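To make the nearest-timestamp selection rule concrete, the following minimal sketch shows how such an approximate-time pairing of RGB and depth streams can be set up with the ROS message_filters API; the topic names, queue size, and slop tolerance are illustrative assumptions rather than the exact values used in this system, and the TF lookup is omitted for brevity.

```python
# Minimal sketch of approximate-time synchronization between RGB and depth
# streams using ROS message_filters; topic names, queue_size, and slop are
# illustrative assumptions, not the values used in this study.
import rospy
import message_filters
from sensor_msgs.msg import Image

def synced_callback(rgb_msg, depth_msg):
    # Pairs delivered here have the nearest timestamps within the tolerance;
    # in the described pipeline they would be merged into a single map node.
    gap = abs((rgb_msg.header.stamp - depth_msg.header.stamp).to_sec())
    rospy.logdebug("RGB-depth timestamp gap: %.4f s", gap)

rospy.init_node("rgbd_sync_sketch")
rgb_sub = message_filters.Subscriber("/camera/color/image_raw", Image)
depth_sub = message_filters.Subscriber("/camera/aligned_depth_to_color/image_raw", Image)

# Fixed-size buffer (queue_size) with a timestamp tolerance (slop, seconds):
# the message pair with the smallest time difference is selected without
# interpolating the raw depth data.
sync = message_filters.ApproximateTimeSynchronizer([rgb_sub, depth_sub],
                                                   queue_size=10, slop=0.05)
sync.registerCallback(synced_callback)
rospy.spin()
```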
The second phase, Memory Management and Loop Closure, constitutes the core mechanism of RTAB-Map, operating on a hierarchical memory structure comprising Short-Term Memory (STM), Working Memory (WM), and Long-Term Memory (LTM). Newly created nodes are initially stored in STM and are excluded from the search space of the loop closure algorithm to prevent the formation of redundant adjacent loops. Upon exiting STM after a specific duration, nodes are transferred to WM, where they are subjected to visual similarity comparisons with existing nodes to execute loop closure and proximity detection. To maintain real-time performance, nodes with lower importance are moved to the LTM for long-term storage, and when a loop closure is detected, the corresponding nodes are retrieved from the LTM back into the WM.
In the third phase, Optimization-based Map Generation, the graph optimization module is activated whenever new constraint links are added to the graph following loop closure detection, thereby minimizing pose estimation errors across the entire graph. This process corrects accumulated drift between nodes and establishes a globally consistent pose graph. Finally, the optimized pose graph is integrated with the raw sensor data to generate a 3D point cloud map and a 2D occupancy grid map suitable for navigation.
The primary objective of this study is not to enhance the precise localization accuracy of the 3D maps generated via RTAB-Map, but rather to optimize their visibility within teleoperation environments. Consequently, given that the RGB-D camera is mounted on the end-effector of the UR5e manipulator, we explicitly calculated the coordinate transformation chain (base_link → end_effector → RGB-D camera) based on the kinematic structure provided by Universal Robots. Let $T_c^b$ denote the homogeneous transformation matrix representing the pose of the RGB-D camera frame expressed in the base_link frame. This transformation is computed as

$$T_c^b = T_e^b \, T_c^e \qquad (1)$$

where $T_e^b$ is obtained from the forward kinematics of the UR5e manipulator, and $T_c^e$ is the fixed extrinsic calibration matrix between the end-effector and the RGB-D camera. This calculated transformation was then utilized as the odometry input source for RTAB-Map. While standard sensor calibration can mitigate pose errors [41], subtle residual errors are inherently unavoidable. Such pose uncertainties not only introduce inaccuracies in 3D map generation but also undermine the reliability of this study, which aims to isolate the causes of map quality variations solely to parameter configurations. Therefore, this study eliminates external pose errors by accurately modeling camera pose changes based on the actual joint kinematics of the robot, ensuring that any observed changes in visibility are strictly attributable to the RTAB-Map parameters.
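As a minimal illustration of this transformation chain, the sketch below composes $T_c^b = T_e^b T_c^e$ with homogeneous matrices; the extrinsic matrix values and the forward-kinematics stub are hypothetical placeholders, not the calibrated values of the actual system.

```python
# Sketch of the camera-pose computation T_c^b = T_e^b * T_c^e; the extrinsic
# matrix and the forward-kinematics stub are hypothetical placeholders.
import numpy as np

def fk_ur5e(joint_angles):
    """Stand-in for the UR5e forward-kinematics solver: returns T_e^b,
    the end-effector pose in the base_link frame (identity here)."""
    return np.eye(4)

# Fixed extrinsic calibration T_c^e: camera pose in the end-effector frame
# (illustrative translation offsets, in meters).
T_cam_in_ee = np.array([[1.0, 0.0, 0.0,  0.00],
                        [0.0, 1.0, 0.0, -0.05],
                        [0.0, 0.0, 1.0,  0.02],
                        [0.0, 0.0, 0.0,  1.00]])

def camera_pose_in_base(joint_angles):
    T_ee_in_base = fk_ur5e(joint_angles)   # T_e^b from forward kinematics
    return T_ee_in_base @ T_cam_in_ee      # T_c^b = T_e^b * T_c^e
```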
Furthermore, to fundamentally eliminate issues arising from pose inconsistencies during map generation, data was collected from various viewpoints by actuating the UR5e manipulator while keeping the mobile base stationary. This approach minimized pose discrepancies by utilizing accurate TF information derived from the Forward Kinematics model of the UR5e, without requiring additional odometry for estimating the base_link position. Such an experimental design removes confounding variables that could introduce map generation errors, thereby providing a controlled environment to independently analyze the specific impact of RTAB-Map parameter variations on visibility.

3.2. Selection of Viewability-Related RTAB-Map Parameters

The overall architecture of the RTAB-Map-based map generation system can be subdivided into the following phases: data acquisition (Stages 1–3), node creation (Stage 4), loop closure and proximity detection (Stage 5), graph optimization (Stage 6), and final map generation (Stage 7). Excluding the initial data acquisition phases (Stages 1–3), each subsequent stage involves dozens of parameters governing distinct functions, including viewpoint handling, feature extraction, memory management, registration and optimization, and grid generation. While prior research has predominantly focused on adjusting these parameters to enhance trajectory accuracy or loop-closure performance, this study aims to improve visibility to facilitate effective map interpretation by operators during teleoperation. Therefore, it is imperative to screen the extensive set of RTAB-Map parameters and select only those that directly influence the visual appearance and quality of the 3D map.
In general SLAM environments, Stages 5 and 6 play a pivotal role in correcting accumulated odometry drift and recovering global consistency during revisit scenarios. However, in this study, sources of drift were fundamentally eliminated by generating the 3D map with a stationary mobile base and utilizing drift-free, precise odometry explicitly calculated from the TF of the UR5e as the input for RTAB-Map. Under these ideal odometry conditions, the impact of loop closure-based correction or graph optimization on the geometric structure or visibility of the 3D map becomes negligible. Nevertheless, the parameter Rtabmap/DetectionRate in Stage 5 was identified as a system-level parameter governing the frequency of loop closure attempts, which can influence computational scheduling and data processing stability within RTAB-Map. Consequently, this parameter was exceptionally included in the tuning scope, whereas all other parameters associated with Stages 5 and 6 were excluded from the candidate pool.
This study focused on parameters associated with Stages 4 and 7, which are the primary factors determining the visual quality of 3D maps [17,42]. As summarized in Table 1, these stages encompass various parameters that achieve distinct goals, including node generation criteria, input image and feature quality, and point cloud resolution and density. Given that previous works [43] have highlighted the critical impact of these configurations on mapping performance and memory management, we applied three distinct selection criteria to identify specific target parameters for tuning from this extensive pool.
The first criterion assesses whether each parameter exerts a direct influence on the visibility of the 3D map. For instance, parameters such as Grid/CellSize, Grid/DepthDecimation, Grid/NoiseFilteringMinNeighbors, and Grid/NoiseFilteringRadius directly induce changes in point cloud density, resolution, noise filtering intensity, and surface smoothness. These factors play a critical role in determining object clarity, level of detail, and boundary sharpness within the map, and were therefore classified as key parameters for viewability optimization. In contrast, while Kp/MaxFeatures and Kp/DetectorStrategy may influence the quality of feature-based matching, their impact is negligible in the environment of this study, where drift-free, ideal odometry is explicitly provided. Additionally, although RangeMax adjusts the maximum depth detection range and can aid in removing distant noise, it is not considered a primary determinant of the overall visual appearance of the 3D map.
The second criterion evaluates whether variations in a parameter are continuously and clearly reflected in the visual characteristics of the 3D map, as the parameter optimization in this study is conducted using the Taguchi method. While various optimization approaches such as full factorial design, response surface methodology, and univariate analysis were considered, this study utilizes human-centric indicators, such as the visual satisfaction of the 3D map. Because these indicators are based on qualitative assessments, they require repetitive and resource-intensive user evaluations for each experimental condition. Given these characteristics, deriving reliable results within a logistically feasible number of experimental trials was a crucial consideration. Accordingly, we adopted the Taguchi method, which utilizes orthogonal arrays to efficiently reduce the number of experiments while enabling a systematic analysis of the main effects of each parameter [44,45]. Since the Taguchi method evaluates performance differences by varying factors in stepwise levels, parameters must possess continuous characteristics to allow for the clear observation of experimental effects. Parameters such as RGBD/LinearUpdate, RGBD/AngularUpdate, Mem/ImagePreDecimation, and the Grid family satisfy this condition, as variations in their values are continuously reflected in visual attributes like depth density, smoothing intensity, and point cloud distribution. Conversely, parameters configured with Boolean or discrete options, such as RGBD/ProximityBySpace or Kp/DetectorStrategy, were deemed unsuitable for the Taguchi design because their impact on the map is either discontinuous or ambiguous.
The third criterion involves eliminating functional redundancy among parameters and selecting those that play a pivotal role in determining visibility. For instance, whereas Grid/NormalsK regulates the smoothness of normal estimation, it exhibits a smaller magnitude of impact on visibility changes and shares partial functional overlap compared to Grid/NoiseFilteringRadius, which influences general smoothing effects, or Grid/NoiseFilteringMinNeighbors, which determines noise filtering intensity. Accordingly, this study prioritized parameters with a more direct and significant influence on the visual quality of the 3D map, whereas those with redundant roles or limited efficacy in improving visibility were excluded from the tuning scope.
By comprehensively applying these three criteria, this study selected seven parameters from the extensive set available in Stages 4 and 7: RGBD/LinearUpdate, RGBD/AngularUpdate, Mem/ImagePreDecimation, Grid/CellSize, Grid/DepthDecimation, Grid/NoiseFilteringMinNeighbors, and Grid/NoiseFilteringRadius. Incorporating Rtabmap/DetectionRate, which was exceptionally selected from Stage 5, a total of eight parameters were finalized as the targets for optimization. All selected parameters are key factors that induce direct and observable changes in the visual quality of the 3D map, and their detailed descriptions are summarized in Table 2.
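For reference, the eight finalized parameters map directly onto RTAB-Map's string-valued configuration interface, as in the sketch below; the values shown are placeholders rather than the optimized levels, which are derived in Section 4.

```python
# Illustrative RTAB-Map configuration covering the eight selected parameters;
# the values are placeholders, not the optimized levels derived in Section 4.
rtabmap_params = {
    "Rtabmap/DetectionRate":           "1.0",   # update/loop-closure frequency (Hz)
    "RGBD/LinearUpdate":               "0.1",   # min translation (m) to add a node
    "RGBD/AngularUpdate":              "0.01",  # min rotation (rad) to add a node
    "Mem/ImagePreDecimation":          "1",     # RGB input downsampling factor
    "Grid/CellSize":                   "0.05",  # voxel edge length (m)
    "Grid/DepthDecimation":            "4",     # depth-image subsampling factor
    "Grid/NoiseFilteringMinNeighbors": "5",     # min neighbors required to keep a cell
    "Grid/NoiseFilteringRadius":       "0.1",   # neighbor search radius (m)
}
```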

3.3. Taguchi Experimental Design for Viewability Optimization

Prior to conducting the Taguchi experiment for optimizing the eight selected parameters, ensuring the reproducibility of the 3D map generation environment is a prerequisite. However, excessive actuation speeds of the UR5e can induce minute vibrations in the mobile base, thereby compromising reproducibility even when generating 3D maps under identical conditions. Furthermore, considering research indicating that mapping accuracy decreases as the agent’s movement speed increases, we limited the operating speed of the UR5e to a linear velocity of 0.2 m/s and an angular velocity of 0.175 rad/s. This setup satisfies the stable performance range of 0.15–0.27 m/s identified in previous studies [46] while simultaneously ensuring base stability and enhancing reproducibility. Under these velocity conditions, it was confirmed that the depth camera requires approximately 2 s to completely move out of its initial FOV, and this duration was adopted as the real-time performance criterion in this study. Based on this benchmark, the levels for the Taguchi orthogonal array were determined to ensure that the computation time for all parameter combinations satisfies the 2-s threshold.
The parameters Mem/ImagePreDecimation and Grid/DepthDecimation regulate the sampling ratios for the input RGB images and depth data, respectively. Equations (2) and (3) describe the decimation process for the RGB images and their registered depth data, governed by Mem/ImagePreDecimation. The input RGB image, $I$, is downsampled to $I'$ using the decimation factor $p$. Subsequently, the depth image is decimated using an adaptively determined coefficient, $p_D$, to ensure consistency with the RGB resolution. This procedure prevents projection errors arising from resolution discrepancies between the RGB and depth data. Equation (4) defines the additional spatial subsampling of the depth data based on Grid/DepthDecimation. This process directly controls the density of depth samples during the point cloud generation phase, thereby significantly influencing the noise level and computational complexity of the map. Minimizing the decimation factor allows for the utilization of high-resolution information, which enhances visibility; however, it leads to a quadratic increase in computational load, thereby compromising real-time performance [47]. In representative test runs, as summarized in Table 3, reducing the depth decimation rate from 4 to 2 was observed to increase the loop processing time from approximately 0.1 s to 0.5 s. This increase exceeds the theoretical four-fold growth in data volume, indicating that lower decimation values can impose a disproportionate computational burden in practice. Furthermore, excessively low decimation values were found to cause an accumulation of depth noise. Consequently, the parameters for the Taguchi method were selected by carefully balancing visibility and real-time efficiency.
$$I' = \mathcal{D}(I, p) \qquad (2)$$

$$T = \frac{H_{\text{model}}}{p}, \qquad p_D = \begin{cases} 1, & H_D \le T \\ \left\lfloor H_D / T \right\rfloor, & H_D > T \end{cases}, \qquad D' = \mathcal{D}(D, p_D) \qquad (3)$$

$$D'(i, j) = \begin{cases} D(i, j), & d \le 1 \\ D(d \, i, \; d \, j), & d > 1 \end{cases} \qquad (4)$$
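A minimal sketch of the decimation logic in Equations (2)–(4) is given below, under the assumption that decimation keeps every $p$-th (or $d$-th) pixel along both image axes; the function names are illustrative.

```python
# Sketch of the decimation steps in Equations (2)-(4), assuming decimation
# keeps every p-th (or d-th) pixel along both image axes.
import numpy as np

def decimate(img, p):
    """Eq. (2): I' = D(I, p), downsampling by the integer factor p."""
    return img if p <= 1 else img[::p, ::p]

def adaptive_depth_factor(H_D, H_model, p):
    """Eq. (3): pick p_D so the depth height matches the RGB target T = H_model / p."""
    T = H_model / p
    return 1 if H_D <= T else int(H_D // T)

def grid_depth_decimation(D, d):
    """Eq. (4): extra spatial subsampling controlled by Grid/DepthDecimation."""
    return D if d <= 1 else D[::d, ::d]
```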
The parameters Grid/NoiseFilteringRadius and Grid/NoiseFilteringMinNeighbors are designed to remove spatially isolated cells as noise within the voxel grid-based map. NoiseFilteringRadius defines the spatial extent of neighbor cells considered for noise assessment, whereas NoiseFilteringMinNeighbors determines the minimum number of neighbor cells required to retain a cell within that range. Equation (5) denotes the set of neighbor cells, $\mathcal{N}_i^{\text{cell}}(R)$, defined by the NoiseFilteringRadius, $R$. Here, $c_i$ represents the reference cell, and $x_i$ denotes its center coordinate; cells existing within the radius $R$ are included in the neighbor set. Thus, as $R$ increases, the spatial neighborhood scope utilized for noise discrimination expands. Equation (6) establishes the criteria for cell retention and removal based on NoiseFilteringMinNeighbors. Here, $K_{\min}$ signifies the minimum neighbor count required to maintain a cell. Higher values increase the likelihood of removing spatially isolated cells, resulting in more rigorous noise filtering. Based on this filtering mechanism, the levels for NoiseFilteringMinNeighbors were configured bidirectionally around the default value to facilitate a balanced assessment of how both increases and decreases affect map visibility. Conversely, the default value for NoiseFilteringRadius is 0, which corresponds to a state where no spatial neighborhood is considered, rendering negative expansion invalid. Consequently, this study adopted a unidirectional (level-ascending) configuration that incrementally increases the radius to evaluate the impact of expanding the neighborhood range on noise removal and visibility.
$$\mathcal{N}_i^{\text{cell}}(R) = \left\{ c_j \ne c_i \;\middle|\; \left\| x_j - x_i \right\|_2 \le R \right\} \qquad (5)$$

$$\begin{cases} \left| \mathcal{N}_i^{\text{cell}}(R) \right| \ge K_{\min} & \Rightarrow \text{keep} \\ \left| \mathcal{N}_i^{\text{cell}}(R) \right| < K_{\min} & \Rightarrow \text{remove} \end{cases} \qquad (6)$$
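The following sketch mirrors this filtering rule on a set of cell centers, with a KD-tree standing in for RTAB-Map's internal neighbor search (an assumption made purely for illustration).

```python
# Sketch of the radius-based noise filter in Equations (5) and (6); a KD-tree
# stands in for RTAB-Map's internal neighbor search (illustrative assumption).
import numpy as np
from scipy.spatial import cKDTree

def radius_noise_filter(centers, R, K_min):
    """Keep cell c_i iff at least K_min other cell centers lie within radius R."""
    tree = cKDTree(centers)
    # query_ball_point includes the query cell itself, hence the -1.
    counts = np.array([len(tree.query_ball_point(x, R)) - 1 for x in centers])
    return centers[counts >= K_min]
```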
The parameter Grid/CellSize is a critical factor determining the spatial resolution in voxel grid-based 3D maps. The space is partitioned into a regular grid where each voxel has a constant size $\Delta$, and the index of the grid is defined as shown in Equation (7). Here, $x_{\text{map}}$ and $y_{\text{map}}$ denote the position in the map coordinate system, and $\Delta$ corresponds to Grid/CellSize. This equation indicates how a spatial point is assigned to a specific cell on the voxel grid; as the CellSize decreases, the space is discretized into a finer grid. Furthermore, as illustrated in Equation (7), for a fixed map area, a decrease in CellSize $\Delta$ causes the total number of voxels, $N_{\text{cells}}$, to increase in proportion to $\Delta^{-2}$. While this enhances surface continuity and boundary representation, the resulting surge in voxel count leads to higher memory consumption and computational load, thereby becoming a primary factor in increased loop times. Therefore, rather than indiscriminately minimizing the CellSize, this study established parameter levels for the Taguchi experiment by carefully considering the trade-off between map visibility and real-time processing performance.
$$(i, j) = \left( \left\lfloor \frac{x_{\text{map}}}{\Delta} \right\rfloor, \; \left\lfloor \frac{y_{\text{map}}}{\Delta} \right\rfloor \right), \qquad N_{\text{cells}} \propto \frac{1}{\Delta^2} \qquad (7)$$
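Concretely, the cell assignment of Equation (7) reduces to a floor division by the cell size, as in the short sketch below.

```python
# Sketch of the voxel index assignment in Equation (7): a map-frame point is
# binned into the regular grid of edge length Delta (Grid/CellSize).
import numpy as np

def voxel_index(x_map, y_map, delta):
    """Return the (i, j) cell containing a point in map coordinates."""
    return int(np.floor(x_map / delta)), int(np.floor(y_map / delta))

# Over a fixed area the cell count grows as 1/Delta^2: halving the cell
# size roughly quadruples memory consumption and computational load.
```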
The update-related parameters, including DetectionRate, AngularUpdate, and LinearUpdate, collectively govern the frequency of map updates and the density of the pose graph. Specifically, DetectionRate determines the processing frequency for loop closure detection and memory updates, whereas AngularUpdate and LinearUpdate define the minimum rotational and translational displacements required to accept a new node, respectively. Adjusting these parameters to induce more frequent updates generally improves map continuity and visibility. However, excessive update frequency substantially increases computational load and loop time. Therefore, the parameter levels were selected to explicitly evaluate the effect of denser map updates relative to the default settings. Throughout this process, an explicit constraint was applied to ensure that the loop time does not exceed 2 s for any parameter combination. Detailed specifications are provided in Table 4.
To systematically analyze the factors presented in Table 4, a Taguchi $L_{18}(2^1 \times 3^7)$ orthogonal array was employed. Each factor was assigned to a specific column within the $L_{18}$ array, and a consistent UR5e scanning trajectory was applied across all experimental conditions to generate 3D maps using RTAB-Map. By varying only the parameter combinations under uniform kinematic conditions, a total of 18 distinct 3D maps were generated for the same environment; this process is illustrated in Figure 2.
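As a sanity check on such an assignment, the sketch below verifies the level-balance property of an $L_{18}(2^1 \times 3^7)$ array, namely that each level of every factor occurs equally often (9 runs for the two-level factor, 6 for each three-level factor); the array itself is assumed to be supplied externally.

```python
# Sketch of a level-balance check for an L18(2^1 x 3^7) orthogonal array
# (a necessary property of the design); the array is assumed to be supplied
# as an 18 x 8 matrix of level indices, with the 2-level factor in column 0.
import numpy as np
from collections import Counter

def check_level_balance(array18):
    assert array18.shape == (18, 8)
    for col in range(array18.shape[1]):
        counts = Counter(array18[:, col])
        expected = 18 // len(counts)  # 9 for the 2-level column, 6 for 3-level
        assert all(c == expected for c in counts.values()), f"column {col} unbalanced"
```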

3.4. Subjective Workload and Immersion Assessment

In this study, a usability evaluation was conducted to qualitatively analyze the cognitive load and immersion experienced by operators during teleoperation under three conditions: a Taguchi-optimized VR-based 3D map, RGB images, and the physical environment. NASA-TLX was adopted as the primary evaluation tool [48], as its validity is well-established for quantifying cognitive load in tasks requiring a continuous 'perception–judgment–control' cycle, such as teleoperation. However, given that visual information serves as the primary data source in this experimental setup, standard NASA-TLX items alone are limited in fully capturing the variations in visual immersion across different interfaces. To address this and specifically analyze the impact of visual information on the user experience, additional immersion evaluation items were incorporated.
The evaluation survey comprised the six subscales of the NASA-TLX—Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration—along with a custom-added immersion item: 'How was your immersive experience while performing the task?' All participants were required to respond to the questionnaire immediately upon completing the target object picking task under each experimental condition.

4. Results

4.1. Experiment Setup

To conduct the parameter optimization and target object picking experiments, a testbed was constructed as illustrated in Figure 3. The experimental environment was configured as an indoor space mimicking the structure of a real-world warehouse, featuring multi-tiered shelving units, boxes of varying dimensions, and complex object arrangements. This setup was designed to reproduce realistic operational conditions, such as reduced visibility and depth discontinuities, which are frequently encountered in logistics tasks. For the hardware setup, a mobile manipulator platform integrated with a Universal Robots UR5e manipulator was utilized. An Intel RealSense D455 camera was rigidly mounted on the end-effector of the UR5e to enable the acquisition of real-time RGB and depth data corresponding to the movements of the robot. Additionally, the system was equipped with an OnRobot VGC10 vacuum gripper to enable the execution of picking experiments.
During the RTAB-Map-based map generation process, the mobile base was kept stationary to minimize external disturbances, with only the UR5e manipulator actuated to generate diverse viewpoint changes. For the Taguchi-based optimization experiments, the robot arm was controlled to follow an identical scanning trajectory across all experimental conditions. This setup enabled the acquisition of data representing shelves and stacked boxes from multiple perspectives for map construction. The application of such a consistent scanning trajectory serves as a fundamental basis for objectively analyzing the impact of key RTAB-Map parameter variations on visibility within a realistic logistics environment under strictly controlled conditions.

4.2. Taguchi-Based Experimental Design for Viewability Optimization

As previously described, we generated 18 unique 3D maps by varying the parameter combinations in Table 4 according to a Taguchi $L_{18}(2^1 \times 3^7)$ orthogonal array while maintaining the specified UR5e motion constraints. While it is necessary to score the visibility of each of the 18 generated 3D maps, visibility is an inherently human-centric factor that is challenging to quantify using numerical metrics alone. Consequently, this study conducted a qualitative assessment based on a 5-point Likert scale, evaluating each 3D map according to three distinct criteria: Readability, Similarity, and Usability. Specifically, Readability assessed the visual sharpness of text labels on box surfaces, Similarity evaluated the degree of structural correspondence between the 3D map and the actual physical environment, and Usability determined whether the 3D map could effectively substitute for the real environment during the teleoperation process. For every experimental combination, the cumulative score of these three metrics was normalized against a maximum possible score of 15 and subsequently converted into a percentage to derive the final visibility score. A total of 37 participants (25 males and 12 females) took part in the evaluation, and the mean value of the scores collected from them was calculated to determine the final visibility score for each parameter combination.
Figure 4 presents the main effect plots for each parameter, obtained from the visibility evaluation results of 37 participants under the Taguchi $L_{18}$ experimental design. For each factor $f$ and its level $l$, the main effect value $M_{f,l}$ was computed as the average visibility score over all trials assigned to that level, as defined in Equation (8):

$$M_{f,l} = \frac{1}{n_{f,l}} \sum_{i \,:\, x_{i,f} = l} y_i \qquad (8)$$

where $y_i$ denotes the visibility score of the $i$-th experimental run, $x_{i,f}$ indicates the level of factor $f$ used in run $i$, and $n_{f,l}$ is the number of runs in which factor $f$ is set to level $l$. Each main effect plot connects $M_{f,1}, M_{f,2}, \dots$ across levels, thereby visualizing how the mean visibility score changes as the parameter level varies. Consequently, the slope and spread of each curve provide an intuitive interpretation of both the direction (increasing or decreasing) and the magnitude of the influence exerted by each parameter on visibility.
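A direct implementation of Equation (8) is sketched below; the design matrix and score vector are illustrative inputs.

```python
# Sketch of the main-effect computation in Equation (8): average the
# visibility scores y_i over all runs where factor f is set to level l.
import numpy as np

def main_effects(design, scores):
    """design: (18, n_factors) matrix of level indices; scores: (18,) visibility scores."""
    effects = {}
    for f in range(design.shape[1]):
        for l in np.unique(design[:, f]):
            mask = design[:, f] == l                    # runs with factor f at level l
            effects[(f, int(l))] = scores[mask].mean()  # M_{f,l}
    return effects
```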
First, regarding Grid/DepthDecimation, the highest visibility score was observed at Level 2. This implies that applying additional downsampling compared to the default value (Level 1) effectively reduces point cloud noise, thereby enhancing structural clarity. In the case of Grid/CellSize, the parameter exhibited a declining trend in visibility as the cell size increased. This suggests that configuring smaller voxel sizes results in a more precise reconstruction of surface boundaries and object shapes, thereby improving overall visual quality. Conversely, Mem/ImagePreDecimation yielded consistently higher visibility scores with the original resolution (Level 1) compared to the downsampled inputs (Levels 2–3). This indicates that a reduction in RGB image resolution leads to information loss, consequently degrading boundary representation and structural definition.
Grid/NoiseFilteringMinNeighbors and Grid/NoiseFilteringRadius exhibited peak visibility at Level 1 and Level 3, respectively. This suggests that excessive post-processing during the noise removal stage leads to an over-filtering phenomenon, wherein even valid information is inadvertently removed. Regarding Rtabmap/DetectionRate, a trend was observed where visibility improved as the parameter level increased. This improvement is attributed to the fact that extending the loop closure detection interval allowed for the more stable execution of internal data alignment and graph updates. Consequently, this resulted in enhanced global consistency and structural clarity of the 3D map.
Finally, among the update-related parameters, RGBD/AngularUpdate demonstrated peak visibility at Level 2. Due to the inherent characteristics of camera sensors, angular motion is significantly more prone to inducing motion blur than linear motion. Consequently, setting the rotational update threshold to 0.01 rad prevents the unnecessary superposition of blurred images. By projecting only a minimal set of valid keyframes, this configuration effectively prevents degradation in the quality of the 3D map. In contrast, as RGBD/LinearUpdate is relatively less susceptible to motion blur, it is advantageous to continuously accumulate sharp images onto the 3D map to enhance fine details. To further validate these findings, a Signal-to-Noise (S/N) ratio analysis was conducted, the results of which are illustrated in Figure 5. Notably, the S/N ratio analysis yielded results identical to those of the average response analysis, reinforcing the reliability of the identified optimal parameters. This consistency confirms that the selected levels not only maximize visibility but also ensure robustness against experimental noise.
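Since visibility is a quantity to be maximized, a larger-is-better S/N formulation is the natural choice; the sketch below shows that standard formulation, under the assumption that it matches the one used in this analysis.

```python
# Sketch of the larger-is-better S/N ratio commonly used in Taguchi analysis
# (assumed formulation): S/N = -10 * log10( (1/n) * sum_i 1 / y_i^2 ).
import numpy as np

def sn_larger_is_better(y):
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y**2))
```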
In addition to the S/N ratio analysis, a non-parametric statistical analysis was performed to further examine whether the observed differences in visibility across parameter levels were statistically significant. For parameters with three levels, a Friedman test was applied, while a paired Wilcoxon signed-rank test was used for the two-level parameter. As summarized in Table 5, all selected RTAB-Map parameters exhibited statistically significant differences across levels after Holm correction ($p_{\text{Holm}} < 0.05$). These results provide quantitative statistical support for the trends identified through the average response and S/N ratio analyses, reinforcing the validity of the derived optimal parameter configuration.
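The sketch below reproduces this testing procedure with SciPy on illustrative score matrices (participants by levels), including a Holm step-down adjustment of the per-parameter p-values; the random data merely stand in for the actual ratings.

```python
# Sketch of the non-parametric tests with Holm correction; the score matrices
# are illustrative stand-ins for the per-participant visibility ratings.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def holm_correction(pvals):
    """Holm step-down adjustment of a sequence of p-values."""
    pvals = np.asarray(pvals, dtype=float)
    order = np.argsort(pvals)
    m = len(pvals)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

rng = np.random.default_rng(0)
scores3 = rng.integers(1, 6, size=(37, 3))  # 37 raters x 3 levels (illustrative)
scores2 = rng.integers(1, 6, size=(37, 2))  # 37 raters x 2 levels (illustrative)

_, p3 = friedmanchisquare(scores3[:, 0], scores3[:, 1], scores3[:, 2])  # 3-level factor
_, p2 = wilcoxon(scores2[:, 0], scores2[:, 1])                          # 2-level factor
p_holm = holm_correction([p3, p2])
```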
Figure 6 presents a comparison of the 3D maps generated using the optimal parameter combination derived in this study versus those using the default settings. In the default settings, point cloud noise is relatively prevalent, and surface boundaries are indistinct, posing difficulties in interpreting the overall structure. Conversely, the results obtained with the optimized parameter combination demonstrate that the contours of boxes and shelf structures are represented with significantly greater definition. This signifies that the Taguchi-based visibility optimization performed in this study has substantially enhanced the structural clarity and visual quality of the 3D map, thereby enabling operators to perceive the environment more clearly. Based on these qualitative comparison results, the visibility level of the default 3D map was deemed insufficient for performing picking tasks that require precise control. Consequently, it was excluded from the conditions of the target object picking experiment.

4.3. Target Object Picking Using 3D-Map Experiment

Figure 7 illustrates the experimental setup designed to compare and validate variations in situation awareness and manipulation performance based on the type of information source utilized by the operator during target object picking tasks. In this study, a comparative analysis was conducted by performing the identical picking task under three distinct conditions: physical-world, RGB image, and VR-based 3D map. For the teleoperation interface in the RGB image and VR-based 3D map conditions, a previously researched IMU sensor-based method was utilized [49]. Figure 7a depicts the scenarios for direct visual observation of the physical environment and teleoperation via RGB image feeds. In the Physical-world condition, the operator is physically present in the workspace, allowing for the direct perception of box positions, spatial depth relationships, and shelf structures; this condition serves to establish the baseline performance for this study. Conversely, in the RGB image condition, the operator performs teleoperation relying solely on the visual feed restricted by the FOV of the camera, operating under visual constraints and a significant lack of depth information.
Figure 7b illustrates the VR-based teleoperation condition incorporating the optimized 3D map proposed in this study. Under this condition, operators perform teleoperation relying on the optimized 3D map and the kinematic motion of the UR5e, which is synchronized in real time with the physical environment. As shown in the magnified view on the right, the optimized 3D map provides intuitive visual information, including box contours, depth relationships, and shelving structures. This demonstrates that the proposed approach substantially mitigates the spatial perception challenges inherent in non-optimized 3D maps or standard RGB image feeds. The target object picking task was executed under these three experimental conditions with a total of 8 participants (4 males and 4 females).
Figure 8 illustrates the average task completion times for the picking operation across the eight participants, revealing distinct disparities among the three experimental conditions. A Friedman test confirmed that these variations in completion times were statistically significant ($\chi^2(2) = 14.25$, $p < 0.001$), indicating a significant effect of the visual interface on task performance. First, under the Physical-world condition, the shortest completion time was recorded, averaging approximately 13.6 s. This efficiency is attributed to the participants’ ability to directly observe the physical environment, allowing for the immediate perception of spatial depth relationships between boxes, approach paths, and surrounding structures. Consequently, they were able to execute the task without latency, eliminating the need for redundant exploratory maneuvers. In contrast, under the RGB image condition, the average task time increased significantly to approximately 31.8 s. This delay resulted from factors such as the restricted FOV, the absence of depth information, and occlusions between boxes, which necessitated repetitive search and verification actions to accurately determine the position of the target object and the optimal approach trajectory.
Finally, the VR-based 3D map condition recorded an average completion time of approximately 23.3 s, demonstrating a significant reduction in task duration compared to the RGB image condition. This improvement is attributed to the inclusion of spatial structure and depth information in the 3D map, which enabled participants to intuitively perceive the target object and its surrounding layout, even in occluded regions such as the interior of shelves, where visibility is typically limited. This provision of intuitive spatial information minimized the need for redundant verification maneuvers during the teleoperation process, thereby contributing directly to the reduction in overall task time. Collectively, these experimental results demonstrate that the 3D map serves as a significantly more efficient and practical modality for teleoperation compared to conventional RGB image-based approaches. Furthermore, this study corroborates that the RTAB-Map parameter optimization performed herein translates into tangible improvements in real-world teleoperation performance, providing empirical evidence validating the efficacy of visibility optimization.
In addition to task completion time, a usability evaluation was conducted, as previously mentioned, incorporating the six NASA-TLX subscales and an additional immersion item to assess cognitive workload and immersion across the three experimental conditions. The evaluation results, illustrated in Figure 9, indicate that the RGB image condition exhibited a decline in scores ranging from approximately 25% to 38% across all items relative to the physical-world condition. These results suggest that the inherent difficulty in comprehensively grasping depth, spatial position, and structural information using solely RGB images was associated with repetitive visual verification by participants during the target object recognition and path determination processes. Such repetitive verification behaviors are consistent with higher operator-reported cognitive workload and perceived mental demand within the tested experimental setting. Specifically, Figure 10a (Performance) indicates that the RGB image condition resulted in lower median scores and greater variability compared to the physical-world condition, potentially reflecting inconsistent task execution. The most significant disparity occurred in Figure 10b (Effort), where the RGB condition’s score was 37.8% lower (median 2.5 vs. 4.5). This finding suggests increased perceived effort during task execution under FOV-limited conditions.
Conversely, the VR-based 3D map condition exhibited consistently higher scores across all evaluation metrics compared to the RGB image condition. This improvement appears to be associated with the enhanced visibility of box contours and shelf structures provided by the optimized 3D map, which may support a more intuitive interpretation of spatial relationships. By leveraging the structural clarity of the 3D map, participants reported being better able to perceive three-dimensional relationships between the target object and the surrounding environment within the virtual interface. Notably, the Performance subscale (Figure 10a) in the VR-based 3D map condition showed median values comparable to those observed in the physical-world condition, suggesting that perceived task execution consistency may approximate real-world performance under these specific conditions. The most pronounced change was observed in Figure 10b (Effort), where the median score increased from 2.5 (RGB) to 3.5, indicating a reduction in perceived effort relative to the RGB-based interface.
To statistically validate these observed differences, a non-parametric Friedman test was conducted, the results of which are summarized in Table 6. First, for Mental (Q1), Physical (Q2), and Temporal (Q3) demands, the p-values were relatively high ($p_{\text{Holm}} > 0.3$), with score differences limited to less than 0.9. This lack of statistical significance suggests that the fundamental difficulty of the picking task itself remained consistent across all conditions, regardless of the interface. Similarly, regarding Performance (Q4), the analysis exhibited no statistically significant difference between the VR-based interface and the physical-world condition. Rather, this result implies that the proposed system enabled operators to achieve a level of task execution quality comparable to that of the physical environment, effectively overcoming the limitations typically associated with remote interfaces.
In contrast, the subscales more closely associated with interaction cost and user experience—specifically Effort (Q5), Frustration (Q6), and Immersion (Q7)—exhibited substantially lower p-values ($p_{\text{Holm}} \le 0.056$), with Immersion remaining statistically significant even after Holm–Bonferroni correction. This pattern suggests that while the inherent task demands were unchanged, the proposed VR-based 3D map interface effectively reduced the operator’s interaction burden and enhanced the immersive experience compared to the RGB image condition, which is inherently constrained by a limited field of view and the absence of explicit depth cues.
In summary, the subjective evaluation results indicate that improved map visibility through 3D mapping appears to be associated with enhanced teleoperation usability within the scope of the evaluated task. These findings suggest that RTAB-Map parameter optimization focused on visual clarity may contribute to an improved operator experience, as reflected in the reported reduction in perceived effort and interaction-related cognitive burden during remote picking tasks. This implies that the proposed optimization methodology provides tangible benefits for task execution, rather than merely enhancing the visual appearance of the map.

5. Discussion

The comparative experimental results of this study indicate that the proposed VR-based optimized 3D map interface was associated with shorter task completion times compared to conventional 2D RGB image-based teleoperation. In contrast to the RGB condition, which appeared to necessitate additional exploration and verification due to a limited FOV and a lack of depth information, the optimized 3D map provided clearer spatial structural cues. Consequently, task performance in the VR-based environment was observed to approach, although not fully match, the baseline efficiency established in the physical-world condition.
Furthermore, qualitative analysis based on NASA-TLX and immersion-related evaluations suggests that the enhanced visibility of the optimized 3D map was associated with a reduction in operator-reported cognitive demand during the evaluated teleoperation tasks. By providing clearer visual representations of spatial structure, the 3D map appeared to alleviate the perceived need for explicit mental reconstruction of three-dimensional relationships from two-dimensional imagery. These observations support the perspective that performance evaluation criteria for SLAM-based teleoperation systems may benefit from extending beyond robot-centric localization accuracy to include user-centered visual quality considerations from a Human–Robot Interaction (HRI) standpoint.

Limitations

This study is limited by the fact that 3D map generation experiments were conducted in a controlled, static environment to independently analyze the impact of parameter variations on visibility. In actual operational environments, various disturbances such as odometry drift from mobile manipulator movement, the appearance of dynamic obstacles, and lighting fluctuations can interact in complex ways. Consequently, the comprehensive influence of these factors on 3D map visibility and teleoperation performance was not fully accounted for, which remains a limitation of this research.
Furthermore, while the selected RTAB-Map parameters are continuous in nature, this study evaluated them at a limited number of discrete levels (two or three) to adhere to the Taguchi $L_{18}$ orthogonal array. Although this approach is highly efficient for identifying main effects and screening key factors, it may not capture the fine-grained nuances of the response surface between these levels. In contrast, methods such as response surface methodology could potentially identify a more finely tuned optimum by modeling the continuous interaction of parameters. Therefore, the optimal combination derived in this study should be viewed as the most effective configuration among the tested levels, and further refinement using higher-order modeling remains a potential area for optimization.
In addition, the subjective usability evaluation was conducted with a relatively limited number of participants, which may constrain the statistical power of the non-parametric tests applied in this study. While significant trends were observed for interaction-related subscales such as Effort, Frustration, and Immersion, the limited sample size may have reduced sensitivity for detecting more subtle differences in other workload dimensions. Future studies with a larger and more diverse participant pool will be necessary to further strengthen the statistical robustness of the subjective evaluation and to validate the generalizability of the observed trends.

6. Conclusions

This study proposed a methodology to systematically optimize the visibility of 3D maps within VR interfaces to enhance the efficiency and accuracy of teleoperation in complex and unstructured environments, such as logistics warehouses. By analyzing the data processing pipeline of RTAB-Map, we identified eight key control factors that determine visual quality and derived the optimal parameter combination using the Taguchi method of experimental design.
In contrast to existing research that focuses primarily on enhancing geometric precision, this study emphasized the control of point cloud density and noise levels to create a visual environment in which operators can clearly perceive object contours and depth information. Experimental validation showed that the optimized 3D map not only contributed to reducing task completion time but also yielded a measurable reduction in operator-reported cognitive workload.
In future research, we plan to extend this work toward dynamic 3D mapping and real-time rendering optimization to better accommodate environmental changes, localization errors, and odometry drift encountered during robot navigation. To further validate the user-centric benefits observed in this study, future experiments will involve a larger and more diverse participant pool to strengthen the statistical robustness of subjective usability evaluations and to assess the generalizability of the results. Furthermore, higher-order optimization techniques, such as response surface methodology, will be explored to identify more finely tuned optima within the continuous RTAB-Map parameter space, moving beyond the discrete levels considered in this study.

Author Contributions

Conceptualization, H.Y., H.C. and D.L.; methodology, H.Y., H.C. and D.L.; software, H.Y., H.C. and J.J.; validation, H.Y., H.C. and J.J.; formal analysis, H.Y., H.C. and D.L.; investigation, D.L.; resources, D.L.; data curation, H.Y. and H.C.; writing—original draft preparation, H.Y., H.C., J.J. and D.L.; writing—review and editing, H.Y., H.C. and D.L.; visualization, H.Y., H.C. and D.L.; supervision, D.L.; project administration, D.L.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (Ministry of Trade, Industry and Energy, MOTIE) (P0017123, HRD Program for Industrial Innovation); by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Science and ICT, MSIT) (RS-2025-00563228); and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean Government (MSIT) under the Innovative Human Resource Development for Local Intellectualization program (IITP-2026-RS-2022-00156360).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. RTAB-Map Pipeline for RGB-D-Based 3D Map Generation with Hierarchical Memory and Loop Closure.
Figure 2. Overview of the Taguchi-based 3D map generation procedure.
Figure 3. Experimental Setup for 3D Map Generation and Target Object Picking Using a UR5e-Based Mobile Manipulator with an RGB-D Camera and Gripper.
Figure 4. Main Effect Plots Illustrating the Influence of RTAB-Map Parameters on 3D Map Viewability.
Figure 5. Signal-to-Noise Ratio Plots Illustrating the Influence of RTAB-Map Parameters on 3D Map Viewability.
Figure 6. Comparison of 3D Maps Before and After Taguchi-Based RTAB-Map Parameter Optimization.
Figure 7. Comparison of Perceptual Information Sources for Target Object Picking: (a) Physical-World and RGB Image–Based Observation, (b) VR-Based 3D Map.
Figure 8. Comparison of Average Task Completion Time for Target Object Picking Across Different Perceptual Information Sources.
Figure 9. Subjective Evaluation Results Comparing Teleoperation Interfaces across Three Experimental Conditions.
Figure 10. Box plot comparison of subjective evaluation scores for selected NASA-TLX items: (a) Performance, (b) Effort.
Table 1. RTAB-Map Parameters Categorized by Processing Stage and Functional Group.

| Stage | Group | Parameter |
|---|---|---|
| Node generation | Node generation criteria | RGBD/LinearUpdate |
| | | RGBD/AngularUpdate |
| | | RGBD/ProximityBySpace |
| | Input feature quality | Mem/ImagePreDecimation |
| | | Kp/MaxFeatures |
| | | Kp/DetectorStrategy |
| Map generation | Resolution density | Grid/CellSize |
| | | Grid/DepthDecimation |
| | | Grid/RangeMax |
| | Noise removal | Grid/NoiseFilteringMinNeighbors |
| | | Grid/NoiseFilteringRadius |
| | | Grid/NormalsK |
Table 2. Selected RTAB-Map Parameters for Viewability Optimization and Their Descriptions.

| Stage | Parameter | Description |
|---|---|---|
| Node generation | RGBD/LinearUpdate | Node generation based on distance moved. |
| | RGBD/AngularUpdate | Node generation based on angle rotated. |
| | Mem/ImagePreDecimation | Downsampling rate for input image resolution. |
| Loop & Proximity | Rtabmap/DetectionRate | Frequency of loop closure detection. |
| Map generation | Grid/CellSize | Resolution of the occupancy grid. |
| | Grid/DepthDecimation | Ratio to downsample depth images. |
| | Grid/NoiseFilteringMinNeighbors | Minimum neighbors for a point to be valid. |
| | Grid/NoiseFilteringRadius | Radius size for neighbor searching. |
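The density and noise-control parameters in Table 2 have direct analogs in generic point-cloud processing. The sketch below uses the open-source Open3D library rather than RTAB-Map's internal implementation; the file name and numeric values are placeholders chosen for illustration:

```python
import open3d as o3d

# Illustrative analog of the Grid/* parameters in Table 2 using Open3D;
# "cloud.ply" and the numeric values are placeholders, not the paper's data.
pcd = o3d.io.read_point_cloud("cloud.ply")

# Voxel downsampling: one point kept per voxel, analogous to Grid/CellSize (m).
down = pcd.voxel_down_sample(voxel_size=0.004)

# Radius outlier removal: keep points with at least nb_points neighbors within
# `radius` (m), mirroring Grid/NoiseFilteringMinNeighbors and
# Grid/NoiseFilteringRadius.
filtered, kept_idx = down.remove_radius_outlier(nb_points=5, radius=0.1)

print(f"{len(pcd.points)} -> {len(down.points)} -> {len(filtered.points)} points")
```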
Table 3. Comparison of Real-Time Processing Performance according to Decimation Rate.

| DepthDecimation | ImagePreDecimation | DetectionRate (Hz) | Process Time (s) |
|---|---|---|---|
| 2 | 4 | 1 | 0.5 |
| 4 | 4 | 1 | 0.1 |
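Decimation in Table 3 refers to integer subsampling of the input images. As a rough illustration (a hedged analogy, not RTAB-Map's exact implementation), keeping every fourth pixel along each axis reduces the number of depth points to be projected by roughly a factor of sixteen, consistent with the shorter processing time at the higher decimation rate:

```python
import numpy as np

# Illustrative integer decimation of a synthetic depth image: a rate of 4
# keeps every 4th pixel per axis, shrinking 480x640 to 120x160 (~16x fewer
# points to project into the 3D map).
depth = np.random.default_rng(1).uniform(0.5, 5.0, size=(480, 640)).astype(np.float32)
decimated = depth[::4, ::4]
print(depth.shape, "->", decimated.shape)
```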
Table 4. Parameter Levels for the Taguchi-Based Viewability Optimization of RTAB-Map.

| Level | DepthDecimation (Rate) | CellSize (m) | ImagePreDecimation (Rate) | NoiseFilteringMinNeighbors (No.) |
|---|---|---|---|---|
| Level 1 | 4 | 0.002 | 1 | 9 |
| Level 2 | 5 | 0.004 | 2 | 5 |
| Level 3 | - | 0.008 | 4 | 1 |
| Default | 4 | 0.05 | 1 | 5 |

| Level | NoiseFilteringRadius (m) | DetectionRate (Hz) | AngularUpdate (rad) | LinearUpdate (m) |
|---|---|---|---|---|
| Level 1 | 0.2 | 0.5 | 0.1 | 0.1 |
| Level 2 | 0.1 | 1.0 | 0.01 | 0.01 |
| Level 3 | 0.0 | 2.0 | 0.0 | 0.0 |
| Default | 0.0 | 1.0 | 0.1 | 0.1 |
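In practice, a tuned combination of these levels would be handed to RTAB-Map as string-valued parameters. The sketch below is a hypothetical ROS 2 Python launch file; the package and executable names follow recent rtabmap_ros releases but vary across versions, and the specific level choices are placeholders drawn from Table 4, not the study's reported optimum:

```python
# Hypothetical ROS 2 launch sketch passing a tuned RTAB-Map parameter set
# (placeholder values taken from the tested levels in Table 4).
from launch import LaunchDescription
from launch_ros.actions import Node

RTABMAP_PARAMS = {
    "RGBD/LinearUpdate": "0.01",
    "RGBD/AngularUpdate": "0.01",
    "Mem/ImagePreDecimation": "1",
    "Rtabmap/DetectionRate": "1.0",
    "Grid/CellSize": "0.002",
    "Grid/DepthDecimation": "4",
    "Grid/NoiseFilteringMinNeighbors": "5",
    "Grid/NoiseFilteringRadius": "0.1",
}

def generate_launch_description():
    return LaunchDescription([
        Node(
            package="rtabmap_slam",      # name may differ in older releases
            executable="rtabmap",
            parameters=[RTABMAP_PARAMS],  # RTAB-Map expects string values
        ),
    ])
```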
Table 5. Statistical Comparison of User Viewability Ratings across RTAB-Map Parameter Levels using Friedman and Wilcoxon Tests with Holm–Bonferroni Correction.

| Parameter | Levels | Test | df | Statistic | p (Raw) | p (Holm) |
|---|---|---|---|---|---|---|
| DepthDecimation | 2 | Wilcoxon | 1 | W = 3.0 | 2.13 × 10⁻⁴ | 2.13 × 10⁻⁴ |
| CellSize | 3 | Friedman | 2 | χ² = 29.51 | 3.90 × 10⁻⁷ | 2.73 × 10⁻⁶ |
| ImagePreDecimation | 3 | Friedman | 2 | χ² = 18.74 | 8.53 × 10⁻⁵ | 2.00 × 10⁻⁴ |
| NoiseFilteringMinNeighbors | 3 | Friedman | 2 | χ² = 32.72 | 7.85 × 10⁻⁸ | 6.28 × 10⁻⁷ |
| NoiseFilteringRadius | 3 | Friedman | 2 | χ² = 26.16 | 2.09 × 10⁻⁶ | 1.25 × 10⁻⁵ |
| DetectionRate | 3 | Friedman | 2 | χ² = 19.23 | 6.68 × 10⁻⁵ | 2.00 × 10⁻⁴ |
| AngularUpdate | 3 | Friedman | 2 | χ² = 24.03 | 6.06 × 10⁻⁶ | 2.42 × 10⁻⁵ |
| LinearUpdate | 3 | Friedman | 2 | χ² = 25.58 | 2.79 × 10⁻⁶ | 1.39 × 10⁻⁵ |
Table 6. Statistical Comparison of Subjective Workload and Immersion Metrics across Teleoperation Interfaces.

| Question | Scale | Test | df | Statistic | p (Raw) | p (Holm) |
|---|---|---|---|---|---|---|
| Q1 (Mental) | Likert (1–5) | Friedman | 2 | χ² = 3.93 | 1.40 × 10⁻¹ | 3.87 × 10⁻¹ |
| Q2 (Physical) | Likert (1–5) | Friedman | 2 | χ² = 5.20 | 7.43 × 10⁻² | 2.97 × 10⁻¹ |
| Q3 (Temporal) | Likert (1–5) | Friedman | 2 | χ² = 4.10 | 1.29 × 10⁻¹ | 3.87 × 10⁻¹ |
| Q4 (Performance) | Likert (1–5) | Friedman | 2 | χ² = 3.91 | 1.42 × 10⁻¹ | 3.87 × 10⁻¹ |
| Q5 (Effort) | Likert (1–5) | Friedman | 2 | χ² = 8.96 | 1.13 × 10⁻² | 5.66 × 10⁻² |
| Q6 (Frustration) | Likert (1–5) | Friedman | 2 | χ² = 9.36 | 9.26 × 10⁻³ | 5.56 × 10⁻² |
| Q7 (Immersion) | Likert (1–5) | Friedman | 2 | χ² = 14.21 | 8.19 × 10⁻⁴ | 5.57 × 10⁻³ |