An Improved Map Information Collection Tool Using 360° Panoramic Images for Indoor Navigation Systems

Batubulan, Kadek Suarjuna; Funabiki, Nobuo; Kotama, I Nyoman Darma; Brata, Komang Candra; Pradhana, Anak Agung Surya

doi:10.3390/app16031499

by Kadek Suarjuna Batubulan ¹, Nobuo Funabiki ¹,*, I Nyoman Darma Kotama ¹, Komang Candra Brata ¹,² and Anak Agung Surya Pradhana ¹

¹ Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
² Department of Informatics Engineering, Universitas Brawijaya, Malang 65145, Indonesia
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(3), 1499; https://doi.org/10.3390/app16031499

Submission received: 10 January 2026 / Revised: 28 January 2026 / Accepted: 30 January 2026 / Published: 2 February 2026

(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

At present, pedestrian navigation systems using smartphones have become common in daily activities. For their ubiquitous, accurate, and reliable services, map information collection is essential for constructing comprehensive spatial databases. Previously, we developed a map information collection tool to extract building information using Google Maps, optical character recognition (OCR), geolocation, and web scraping with smartphones. However, indoor navigation often suffers from inaccurate localization due to degraded GPS signals inside buildings and Simultaneous Localization and Mapping (SLAM) estimation errors, causing position errors and confusing augmented reality (AR) guidance. In this paper, we present an improved map information collection tool to address this problem. It captures 360° panoramic images to build 3D models, applies photogrammetry-based mesh reconstruction to correct geometry, and georeferences point clouds to refine latitude–longitude coordinates. For evaluations, experiments in various indoor scenarios were conducted. The results demonstrate that the proposed method effectively mitigates positional errors with an average drift correction of 3.15 m, calculated via the Haversine formula. Geometric validation using point cloud analysis showed high registration accuracy, which translated to a 100% task completion rate and an average navigation time of 124.5 s among participants. Furthermore, usability testing using the System Usability Scale (SUS) yielded an average score of 96.5, categorizing the user interface as "Best Imaginable". These quantitative findings substantiate that the integration of 360° imaging and photogrammetric correction significantly enhances navigation reliability and user satisfaction compared with previous sensor fusion approaches.

Keywords:

map information collection; indoor navigation; 360° panoramic image; photogrammetry; georeference; SLAM; augmented reality

1. Introduction

At present, pedestrian navigation systems using smartphones have evolved into an integral component of daily life, offering convenient and rapid access to location and routing information. The demand for these services continues to grow, particularly within urban areas where environments are characterized by complex spatial layouts [1,2]. A comprehensive and well-maintained map database is essential for providing seamless, accurate, and reliable navigation services. Consequently, the effective collection of geospatial information plays a critical role in the development of the high-quality spatial datasets required for this purpose [3,4]. In turn, these foundational data directly enable navigation systems to provide more precise route guidance and enhanced localization performance to users [5,6].

To meet the specialized demands of indoor environments, we have previously developed a four-step map information collection pipeline for pedestrian navigation systems. This process begins with the manual capture of building and room images, followed by the extraction of textual information using Google ML Kit Optical Character Recognition (OCR). Subsequently, web scraping with Scrapy and crawling via Apache Nutch are employed to enrich room-level and occupant-level metadata. Finally, all collected information is stored in a centralized database for system integration [7].

This approach contrasts with our established outdoor navigation methods, which commonly utilize Visual Simultaneous Localization and Mapping (VSLAM) combined with Google Street View to provide augmented reality (AR)-based guidance [8,9,10]. Conversely, our dedicated indoor systems leverage smartphone-based SLAM and object recognition to improve positioning accuracy [11,12]. While both paradigms rely on visual data, they are designed for operation in distinct environments.

Indoor navigation systems continue to face persistent technical challenges. A primary issue is the degradation of Global Positioning System (GPS) signals within buildings, which forces a reliance on alternative methods like SLAM. This reliance, in turn, introduces the problem of accumulated SLAM drift over time. Consequently, these factors lead to poor landmark-level localization and cause errors in AR guidance, especially when the geometric representation of the navigation model does not precisely match the physical environment. Such inaccuracies can result in users receiving incorrect directions, which ultimately undermines trust in AR-based navigation systems [13,14,15].

In this paper, to address these limitations, we present an improved map information collection tool by introducing a focused and automated pipeline designed to enhance indoor map fidelity and AR guidance. It reconstructs building geometry in three dimensions and refines geographic coordinates. The primary contributions of this work are as follows: (1) an automated data collection workflow utilizing an Insta360 X5 camera to acquire high-resolution 360° panoramic images; (2) an initial 3D reconstruction process implemented with CupixVista, an AI-powered platform that leverages cloud-based photogrammetry to generate spatial digital twins; (3) photogrammetry-based mesh reconstruction and refinement using MeshLab, an open-source system for processing and editing unstructured 3D triangular meshes to correct geometric distortions and remove artifacts; (4) point cloud georeferencing and registration via CloudCompare (Version 2.14 (Alpha)), a specialized 3D point cloud processing software designed for high-precision model alignment and coordinate transformation to align the reconstructed models with real-world coordinates; and (5) integration of the corrected and georeferenced 3D models into an obstacle-aware routing and AR overlay pipeline. This final stage employs ARCore for rendering and spatial anchor management, thereby establishing a robust framework for accurate pedestrian guidance.

To evaluate the proposed pipeline, we conducted experiments across multiple indoor scenarios within Engineering Building No. 2 at Okayama University, Japan. The assessment employed a dual-focus methodology, where spatial precision was quantitatively measured using the Haversine distance for geolocation accuracy, the geometric Root Mean Square Error (RMSE) of the reconstructed 3D models, and AR drift metrics [16,17]. Concurrently, user-centered performance was evaluated based on task completion rates, completion times, and the System Usability Scale (SUS) [18]. The collective experimental results demonstrate a marked improvement in navigation accuracy and a significant increase in user satisfaction when compared to the performance of the previous map information collection tool.
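As an illustration of the geolocation metric named above, the following is a minimal sketch of the Haversine distance between two latitude–longitude pairs. The function name, the spherical Earth model, and the mean radius constant are assumptions made for this example; the evaluation code used in the study is not reproduced here.

```python
import math

# Assumption: spherical Earth with the conventional mean radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

Under these assumptions, a per-landmark drift value would be obtained by calling `haversine_m` with the uncorrected and the corrected coordinates of the same landmark, and the reported average would be the mean over all landmarks.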

The remainder of this paper is structured as follows. Section 2 provides a review of related work in indoor navigation and 3D reconstruction. Section 3 details the design of the proposed pipeline. Section 4 elaborates on its implementation. Section 5 presents experimental results for evaluations. Section 6 concludes the paper and discusses potential directions for future research.

2. Related Works

In this section, we introduce relevant works in the literature.

2.1. Map Information Collection and Indoor Mapping

In [19], Wang et al. presented a 3D semantic mapping system that integrates ORB-SLAM2 with multi-object tracking for operation in dynamic indoor environments. Although such conventional SLAM-based methods perform effectively in dynamic settings, they remain fundamentally limited by a narrow field of view. Conversely, the adoption of 360° panoramic imagery in our study enables complete scene coverage, which mitigates tracking loss in areas with limited texture or visually repetitive patterns indoors.

In [20], Tsiamitros et al. addressed dynamic crowd analytics by utilizing Wi-Fi signals to infer indoor occupancy levels. While effective for population-level monitoring, their method relies on Received Signal Strength Indicators (RSSI), which do not provide visual context or detailed geometric data. As a result, the sensing data are inadequate for generating the high-resolution 3D spatial databases necessary for accurate and detailed indoor navigation.

In [21], Wan et al. introduced a map-assisted localization framework that fuses crowd-sensed inertial data with Wi-Fi fingerprinting via deep learning. Nevertheless, its dependence on low-cost inertial sensors and two-dimensional floor plans constrains geometric fidelity. The absence of detailed three-dimensional structure renders the approach unsuitable for applications requiring drift-free Augmented Reality (AR) guidance, which depends on accurate visual reconstruction and metric consistency.

2.2. 360° Panoramic Capture and Photogrammetry

In [22], Kafataris et al. demonstrated that consumer-grade 360° cameras can achieve geometric accuracy comparable to Terrestrial Laser Scanning (TLS) for offline heritage documentation. While significant for static digital preservation, their work does not explore the integration of such high-fidelity data into real-time immersive or navigation-oriented systems. In contrast, our approach utilizes 360° panoramic imagery not only for environmental modeling, but also for active geometric correction and the mitigation of Augmented Reality (AR) drift.

In [23], Yang et al. presented the DSSAC-RANSAC algorithm to reduce dynamic object mismatches in visual SLAM, consequently lowering pose estimation errors. Although this enhances tracking robustness, SLAM-based frameworks remain inherently constrained in delivering the global geometric fidelity necessary for stable AR alignment. Conversely, our method employs panoramic image-based photogrammetry to ensure comprehensive scene coverage and to produce metrically accurate 3D models, thereby enabling reliable and persistent AR registration.

In [24], Vacca et al. reported that 360° panoramic photogrammetry offers an efficient and cost-effective alternative to TLS for surveying complex indoor heritage environments. However, their application remains confined to static geometric documentation. By contrast, our study adapts 360° photogrammetry for the collection of indoor mapping information in real-time navigation scenarios, which offers further optimizations for precise georeferencing and the mitigation of Augmented Reality (AR) drift.

2.3. 3D Reconstruction and Mesh Processing

In [25], Liu et al. presented Spatial/Spectral-Frequency Adaptive Network (SSFAN), a method designed to reconstruct hyperspectral 3D information from compressed two-dimensional inputs by leveraging frequency-domain features. This technique prioritizes the spectral fidelity of dynamic objects, with less emphasis on geometric precision. In contrast, our methodology is centered on photogrammetric mesh reconstruction from 360° imagery, which is specifically aimed at recovering dense, metrically accurate surface geometry—a critical requirement for robust indoor mapping and navigation applications.

In [26], Zhu et al. developed a deep learning framework to reconstruct eddy-resolving three-dimensional physical fields from satellite data. Although this approach is effective for modeling dynamic environmental changes, it does not address the extraction of static architectural geometry. Conversely, our system is specifically designed to derive precise spatial coordinates and structural details from panoramic imagery to construct reliable, metric-accurate models for indoor navigation.

In [27], Lv et al. introduced a voxel-based optimization method for generating high-fidelity meshes from noisy point clouds. While this approach attains a high degree of accuracy, its computational complexity limits its use to offline processing workflows. In contrast, our pipeline employs a streamlined combination of mesh decimation, noise filtering, and geometric cleanup to produce lightweight, yet accurate models that are suitable for real-time georeferencing and seamless system integration.

2.4. Point Cloud Registration and Georeferencing

In [28], Tresnawati et al. presented a museum AR navigation system that employs Bluetooth Low Energy (BLE) beacons and ESP32 microcontrollers to trigger location-based content. While effective for fixed points of interest, this hardware-dependent approach escalates deployment and maintenance overheads. Conversely, our method implements an infrastructure-free visual localization strategy, relying exclusively on smartphone cameras and thereby obviating the need for any external sensor deployment.

In [29], Bash et al. proposed a hierarchical registration method based on a modified Iterative Closest Point (ICP) algorithm to align terrestrial point clouds without physical Ground Control Points (GCPs). However, their method relies on a pre-existing reference model, an assumption often impractical in dynamic indoor settings. In contrast, our system autonomously reconstructs and georeferences the indoor environment using 360° photogrammetry, thereby eliminating dependence on external reference scans.

In [30], Kong et al. introduced a cost-effective rigid registration method for aligning multi-temporal point clouds without GCPs or Global Navigation Satellite System (GNSS) data. While suitable for relative change detection, this approach lacks the absolute spatial georeferencing required for location-based services. Our pipeline explicitly establishes a global coordinate frame, providing the geospatial accuracy essential for robust AR navigation.

In [31], Brightman et al. presented a deep learning framework to reconstruct room layouts and object placements from a single 360° image. Although it is efficient, such single-view methods heavily depend on learned priors and often lack the metric precision required for navigation. Conversely, our method leverages photogrammetry on panoramic image sequences to explicitly compute depth and geometry, thereby ensuring the high-fidelity accuracy necessary for detailed indoor navigation.

2.5. AR-Based Indoor Navigation and Localization

In [32], Elghayesh et al. proposed a self-contained mobile AR navigation system that relies on on-device localization utilizing heuristic algorithms and ORB visual features. While this architecture improves user privacy and minimizes reliance on network connectivity, its scalability is inherently limited by the processing power and storage constraints of mobile hardware. To overcome this limitation, our framework implements an edge–cloud architecture, offloading computationally demanding processes to enable scalable navigation across extensive, multi-building indoor complexes.

In [33], Kamala et al. developed AR GuideX, an indoor navigation solution that combines ARCore, Mapbox, and Bluetooth beacon technology. Nevertheless, this method necessitates considerable pre-deployment of physical hardware, increasing both cost and susceptibility to environmental signal interference. Our methodology, by contrast, adopts a purely vision-based, infrastructure-free paradigm, performing robust localization solely through visual features captured via standard smartphone cameras.

In [34], Messi et al. conducted an evaluation of six-degree-of-freedom (6-DoF) AR registration accuracy within infrastructure-free settings. Although this research delivers valuable insights into localization precision, it does not extend to a fully operational navigation system. Our contribution advances this field by fusing high-accuracy visual localization with a real-time, interactive pathfinding engine developed in Unity, thereby providing a comprehensive solution for dynamic, turn-by-turn AR navigation guidance.

To better situate the proposed system among current solutions, Table 1 compares the photogrammetry-based approach with other mainstream indoor positioning technologies. While technologies like UWB offer high precision, they require significant hardware investment and complex installation. In contrast, our method leverages 360° visual data and ARCore’s motion tracking to achieve reliable accuracy (mean drift of 3.15 m) with minimal infrastructure dependency, while providing superior visual context for the end-user.

3. Proposed System Architecture

In this section, we present the overall architecture of the proposed indoor map information collection system and the adopted software tools and technologies.

3.1. System Overview

The proposed system is hierarchically structured into four layers to facilitate robust and accurate indoor visual positioning. As depicted in Figure 1, the workflow commences within the Data Acquisition Layer, where high-resolution environmental data are captured using panoramic imaging devices integrated with inertial sensors to generate a comprehensive raw visual dataset. The data are subsequently processed in the Modeling Layer, which implements a multi-stage photogrammetric pipeline. The initial structure-from-motion (SfM) reconstruction is performed using CupixVista, followed by geometric refinement in MeshLab and precise georeferencing with CloudCompare.

The resulting metrically accurate spatial database serves as the foundational map for the subsequent Navigation Engine. This engine integrates visual feature-based localization with Google ARCore’s inertial odometry to achieve robust, real-time 6-Degree-of-Freedom (6-DoF) pose tracking. Finally, the User Interface Layer, developed using the Unity AR Foundation framework, renders intuitive navigation cues and path overlays directly onto the user’s physical viewport via AR visualization.

The data synchronization in Insta360 X5 Studio (Arashi Vision Inc., Shenzhen, China) is governed by a dedicated hardware-level timing controller that aligns the dual CMOS sensors with the internal 6-axis Inertial Measurement Unit (IMU). According to the system specifications [35], the IMU operates at a sampling rate of 200 Hz, and motion metadata is encapsulated within the video container using a proprietary synchronization protocol. This architecture achieves sub-millisecond temporal precision, which is a prerequisite for the FlowState stabilization and rolling-shutter compensation [36]. Such hardware-level alignment ensures that orientation metadata for each 8K frame is geometrically consistent, thereby reducing reprojection errors during the subsequent 3D photogrammetric reconstruction process.

To explain the data flow depicted in Figure 1, Table 2 provides a summary of the specific inputs, processing steps, and outputs associated with each hierarchical layer of the proposed system.

3.2. Data Acquisition Layer

The Data Acquisition Layer is responsible for capturing high-fidelity environmental data that constitute the foundational dataset for constructing the three-dimensional indoor map.

3.2.1. Panoramic Image Capture

Visual data acquisition is performed using an Insta360 X5 omnidirectional camera [37], which features dual 1/1.28-inch CMOS sensors. The device operates in 360° video mode at a native 8K resolution (7680 × 3840 pixels). To ensure high-quality reconstruction while maintaining a manageable data volume, static frames were extracted from the 8K video footage at a sampling rate of 0.5 Hz (one frame every two seconds). This ensures adequate texture detail for reliable feature extraction during subsequent photogrammetric processing. To mitigate motion-induced artifacts inherent in handheld scanning, the camera's integrated six-axis gyroscope and electronic image stabilization functionalities are actively employed. The Insta360 camera automatically synchronizes visual frames with its internal IMU data (gyroscope and accelerometer) at the hardware level, ensuring a temporal alignment precision within $\pm 10$ milliseconds, which is sufficient for data collection at pedestrian walking speed. This configuration provides stable and geometrically consistent visual input, which is essential for high-quality three-dimensional reconstruction, with an estimated image overlap of approximately 80% between consecutive frames.
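The 0.5 Hz frame extraction described above can be sketched as a simple index computation. The function name and the 30 fps video frame rate used in the usage note are assumptions for illustration; the source does not state the recording frame rate or the extraction tool.

```python
def sample_frame_indices(duration_s, video_fps, sample_hz=0.5):
    """Return the indices of the video frames kept when sampling at sample_hz."""
    if video_fps <= 0 or sample_hz <= 0:
        raise ValueError("video_fps and sample_hz must be positive")
    step = video_fps / sample_hz          # number of video frames between kept frames
    total = int(duration_s * video_fps)   # total number of frames in the clip
    indices = []
    k = 0
    while round(k * step) < total:
        indices.append(round(k * step))
        k += 1
    return indices
```

For a hypothetical 10 s clip recorded at 30 fps, `sample_frame_indices(10, 30)` keeps every 60th frame, that is, one frame every two seconds.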

3.2.2. Sensor Data Integration

Concurrent with visual acquisition, inertial measurements are recorded by the camera’s integrated Inertial Measurement Unit (IMU). The IMU logs tri-axial linear acceleration and angular velocity data, which are precisely synchronized with the image timestamps. These inertial readings serve a critical function in the initial reconstruction pipeline by correcting for camera orientation drift and establishing a stable global gravity reference, thereby enhancing the geometric fidelity of the 3D model.
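To make the notion of a gravity reference concrete, the following minimal sketch estimates an "up" direction from accelerometer samples and reports the sensor tilt. It assumes the device is nearly static during the averaging window, so that the accelerometer mainly measures the reaction to gravity; the function names are hypothetical, and this is not the camera's internal algorithm.

```python
import math


def estimate_up_vector(accel_samples):
    """Average tri-axial accelerometer readings and normalise to a unit 'up' vector."""
    n = len(accel_samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    mean = [sum(s[i] for s in accel_samples) / n for i in range(3)]
    norm = math.sqrt(sum(c * c for c in mean))
    if norm == 0.0:
        raise ValueError("mean acceleration is zero; cannot define a direction")
    return tuple(c / norm for c in mean)


def tilt_deg(up, axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between the estimated up vector and a sensor axis."""
    dot = sum(u * a for u, a in zip(up, axis))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```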

3.3. Data Processing and Modeling Layer

The acquired raw sensor and image data are processed through a multi-stage computational pipeline to generate a lightweight yet semantically rich 3D model optimized for mobile navigation and AR visualization.

3.3.1. Processing Workstation

All computational tasks for reconstruction, mesh processing, and georeferencing are performed on a dedicated workstation. The system is powered by an Intel® Core™ Ultra 7 265KF processor and an NVIDIA GeForce RTX 4070 graphics processing unit (GPU) with 12 GB of dedicated video memory (VRAM). The GPU is used to accelerate the computationally intensive stages, specifically dense point cloud generation and mesh optimization. The configuration is supplemented with 32 GB of system memory to ensure stable performance during large-scale photogrammetric processing and model refinement. The operating environment is Windows 11 Pro (64-bit).

3.3.2. 3D Reconstruction and Point Cloud Generation

Raw panoramic images are processed using CupixVista, a cloud-based photogrammetry platform selected for its optimized support for sequential equirectangular imagery. As illustrated in the reconstruction pipeline in Figure 2, the platform automatically extracts distinctive visual keypoints and estimates precise camera poses via Structure-from-Motion (SfM) algorithms. This is followed by a dense reconstruction phase using Multi-View Stereo (MVS) that generates a textured point cloud and a corresponding 3D mesh. This process effectively transforms two-dimensional pixel information into a coherent three-dimensional structural representation, which serves as the foundational geometric baseline for subsequent refinement stages [38].
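Because the input imagery is equirectangular, each pixel corresponds to a viewing direction on the unit sphere, which is the geometric quantity that SfM-style processing reasons about. The sketch below shows this mapping under an assumed axis convention (x to the right, y up, z forward at the image centre); the convention used internally by CupixVista is not documented in this paper.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit viewing direction (x, y, z)."""
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

With the 7680 × 3840 frames described in Section 3.2.1, the image centre (3840, 1920) maps to the forward direction, and the top row maps to the zenith.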

3.3.3. Mesh Refinement and Optimization

The initial mesh generated from the reconstruction stage often contains surface noise, artifacts, and excessive geometric detail. To address these issues, a systematic geometric refinement is performed using MeshLab following the pipeline illustrated in Figure 3. The process begins with noise reduction filters (e.g., bilateral filtering) to suppress outliers, followed by topology repair to correct non-manifold geometry. Finally, mesh decimation using the Quadric Edge Collapse algorithm is applied to reduce the polygon count by 70%, ensuring the model remains lightweight for real-time mobile rendering [39].
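Two of the checks implied by this stage can be sketched independently of MeshLab: detecting non-manifold edges (edges shared by more than two triangles) and computing the face budget for a given reduction ratio. The function names are hypothetical, and the code only illustrates the underlying definitions; it does not reproduce MeshLab's filters.

```python
from collections import Counter


def non_manifold_edges(faces):
    """Return the edges shared by more than two triangles.

    faces is a list of (i, j, k) vertex-index triples.
    """
    counts = Counter()
    for a, b, c in faces:
        for edge in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted(edge))] += 1   # undirected edge key
    return sorted(edge for edge, n in counts.items() if n > 2)


def decimation_target(face_count, reduction=0.70):
    """Number of faces remaining after removing the given fraction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return round(face_count * (1.0 - reduction))
```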

3.3.4. Georeferencing and Classification

The optimized mesh is imported into CloudCompare for final spatial alignment and semantic enrichment. This workflow, illustrated in Figure 4, transforms the model's local coordinate system into a real-world global reference frame through a georeferencing procedure. Concurrently, semantic classification is applied to differentiate navigable surfaces, such as floors, from architectural obstacles, including walls and fixed furnishings. The output is a structured, georeferenced, and semantically tagged spatial database, which serves as the core digital map for subsequent navigation and augmented reality applications [40].
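One simple way to separate floor-like from wall-like geometry is to compare each face normal with the vertical axis. The sketch below is a hedged illustration of that idea; the angular thresholds, labels, and function names are assumptions, and the classification procedure actually applied in CloudCompare is not specified at this level of detail.

```python
import math


def triangle_normal(p0, p1, p2):
    """Unit normal of a triangle from its three vertices (right-hand rule)."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    return tuple(c / length for c in n)


def classify_face(normal, up=(0.0, 0.0, 1.0), floor_max_deg=15.0, wall_min_deg=75.0):
    """Label a face as 'floor', 'wall', or 'other' from its normal direction."""
    cos_a = max(-1.0, min(1.0, sum(n * u for n, u in zip(normal, up))))
    angle = math.degrees(math.acos(cos_a))    # 0 deg = facing straight up
    if angle <= floor_max_deg:
        return "floor"                        # upward-facing, near-horizontal
    if wall_min_deg <= angle <= 180.0 - wall_min_deg:
        return "wall"                         # near-vertical surface
    return "other"                            # ceilings, ramps, clutter
```

A normal-only test of this kind would also label table tops as floor, so a practical classifier would additionally need a height criterion per storey.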

3.4. Visual Localization and Navigation Engine

The Navigation Engine functions as the core computational module responsible for estimating the user’s precise six-degree-of-freedom (6-DoF) position and orientation within the reconstructed indoor environment.

3.4.1. Visual Pose Estimation

The Visual Localization (VL) module calculates the user’s global pose by matching feature descriptors extracted from the live smartphone camera stream against the pre-constructed 3D map database. This approach inherently solves the kidnapped robot problem, allowing the system to localize the user from arbitrary initial positions within the mapped environment without dependency on external positioning infrastructure, such as GPS or Bluetooth beacons.
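The descriptor matching step can be illustrated with a brute-force matcher over binary descriptors, using the Hamming distance and a nearest-to-second-nearest ratio test. The descriptor type, the ratio of 0.8, and the function names are assumptions for this sketch; the paper does not specify which features or matching strategy the VL module uses.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_descriptors(query, database, ratio=0.8):
    """Match each query descriptor to the map database with a ratio test.

    Returns (query_index, database_index, distance) for unambiguous matches.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, d), di) for di, d in enumerate(database))
        if len(ranked) < 2:
            continue                           # ratio test needs two candidates
        (best, best_i), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:              # keep only clearly best matches
            matches.append((qi, best_i, best))
    return matches
```

In a full localization pipeline, the surviving 2D–3D correspondences would then be passed to a pose solver; that step is outside the scope of this sketch.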

3.4.2. Sensor Fusion with ARCore

Following initial localization, continuous 6-degree-of-freedom (6-DoF) pose tracking is maintained through Google ARCore’s Visual–Inertial Odometry (VIO) framework. ARCore fuses visual feature tracking from the camera with high-frequency inertial measurements from the device’s IMU to estimate the device’s position and orientation in real-time. This sensor fusion strategy ensures robust, smooth, and low-latency tracking, which remains stable during periods of rapid device movement or temporary visual occlusion.

3.5. AR Visualization and User Interface

The user interface is developed using the Unity AR Foundation framework, which provides the core infrastructure for AR visualization. This layer receives both the calculated path from the routing module and real-time pose estimates from the Navigation Engine. It subsequently renders three-dimensional navigational elements, including directional arrows, path indicators, and virtual waypoints, directly onto the live smartphone camera view. This integration ensures that the virtual guidance cues remain persistently registered to the physical environment, offering the user intuitive, real-time navigation assistance [41].

4. Implementation

In this section, we present the deployment and validation of the proposed system within Engineering Building No. 2 at Okayama University, Japan. This building is a four-story academic facility, selected as a real-world testbed to evaluate the enhanced indoor navigation performance of our methodology.

4.1. Data Acquisition

To ensure high spatial accuracy and data consistency, environmental data were captured using an Insta360 X5 omnidirectional camera integrated with an Inertial Measurement Unit (IMU). For the hardware setup, a head-mounted configuration was employed: the camera was vertically affixed to a safety helmet worn by the operator, as depicted in Figure 5.

This head-mounted configuration was selected to mitigate visual noise and motion blur, which are prevalent in handheld acquisition. The elevated and stabilized camera position significantly reduces occlusions caused by the operator’s body and ensures consistent geometric viewpoints across sequential frames. Furthermore, this setup facilitates natural, continuous movement along the intended mapping path, enabling hands-free operation and improving data acquisition efficiency.

The acquisition process generated a dataset of equirectangular panoramic images, automatically captured at regular spatial intervals along the designated navigation routes. Representative samples of the collected imagery from Floor 1 through Floor 4 are presented in Figure 6.

The captured imagery has a resolution of 5.7K, which delivers superior sharpness and dynamic range. This high level of visual fidelity is crucial for robust feature detection and matching, facilitating the reliable identification of distinctive environmental keypoints, such as notice board corners, floor textures, and door frames, even under varying indoor lighting conditions.

4.2. Data Processing and Modeling

4.2.1. 3D Reconstruction and Point Cloud Generation

Following data acquisition, the equirectangular panoramic image dataset captured with the Insta360 X5 was processed using the cloud-based CupixVista photogrammetry platform. This platform was selected for its integrated AI-powered processing engine, which automatically performs image alignment and executes Structure-from-Motion (SfM) algorithms to reconstruct three-dimensional geometry from the two-dimensional image sequences.

The reconstruction process generates a dense 3D point cloud and a corresponding textured mesh model, effectively creating a geometrically accurate digital twin of the physical environment. A key advantage of the CupixVista pipeline is its robustness to illumination variations and its capability to mitigate geometric distortions in architecturally complex transitional spaces, such as staircases and corridor intersections.

As illustrated in Figure 7, the reconstruction results validate the system’s capability to accurately model the building’s complex topology. On Floor 1 (Figure 7a), the base area, lobby, and connecting staircases are reconstructed with high point density, preserving fine architectural details. The model of Floor 2 (Figure 7b) captures a long corridor with multiple branching rooms, demonstrating seamless spatial connectivity and minimal positional drift. Similarly, Floor 3 (Figure 7c) exhibits consistent reconstruction quality, with sharply defined wall boundaries and door structures that facilitate subsequent spatial segmentation. Finally, the reconstruction of Floor 4 (Figure 7d) confirms the stability of the processing pipeline, maintaining geometric consistency even at the uppermost level of the building.

4.2.2. Mesh Refinement, Cleaning, and Decimation Using MeshLab

Following individual floor-level reconstructions, a spatial integration process merges the partial models into a single, unified representation of the building. This involves vertically aligning the structures from Floor 1 to Floor 4 within a common global coordinate framework. The resultant merged geometry, which often contains excessive vertex density and photogrammetric noise artifacts, is subsequently processed in MeshLab to enhance its quality and usability. A systematic mesh refinement pipeline is applied. Initially, statistical outlier filtering is employed with a standard deviation threshold of 1.0 to remove erroneous points and floating artifacts that do not correspond to real-world surfaces. This is followed by Laplacian smoothing (three iterations) to reduce surface noise while preserving the overall geometric volume and major structural boundaries. Finally, the polygon count is substantially reduced using the Quadric Edge Collapse Decimation algorithm with a 70% reduction ratio. This decimation step is critical for generating a lightweight mesh (reducing the initial model from approximately 500,000 to 150,000 faces per floor) that remains compatible with the computational limits of mobile devices while retaining essential architectural features, such as sharp wall edges and door frames.
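The statistical outlier filtering step can be sketched as follows: for every point, the mean distance to its k nearest neighbours is computed, and points whose value exceeds the global mean by more than a chosen multiple of the standard deviation are discarded. The neighbourhood size k = 4 and the function name are assumptions; only the threshold of 1.0 standard deviation comes from the text. The brute-force search is quadratic in the number of points and is meant for illustration only.

```python
import math


def statistical_outlier_filter(points, k=4, std_ratio=1.0):
    """Remove points whose mean k-nearest-neighbour distance is unusually large."""
    n = len(points)
    if n <= k:
        return list(points)                    # too few points to judge
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)    # mean distance to k nearest neighbours
    mu = sum(mean_knn) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_knn) / n)
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_knn) if d <= threshold]
```

On a regular patch of surface points with one point floating far away from it, the filter keeps the patch and drops the floating point, which matches the intended removal of artifacts that do not correspond to real surfaces.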

Figure 8 presents the final integrated and optimized 3D model of the four-story building. The model exhibits clean geometry and calibrated coordinate axes (X, Y, Z), confirming its suitability for subsequent georeferencing and AR application without imposing prohibitive rendering overhead.

4.2.3. Classification and Georeferencing Using CloudCompare

Following geometric optimization, the refined model is imported into CloudCompare for georeferencing. This process transforms the model from its local, relative coordinate system into a global geospatial reference frame. Since the initial reconstruction lacks proper cardinal orientation, a rigid transformation is applied using the point pairs picking method. In this study, 12 Ground Control Points (GCPs) were used, distributed across all four floors, specifically at hallway corners and structural pillars, to ensure uniform spatial coverage. The procedure involves selecting corresponding anchor points in the model and matching them to these surveyed GCPs with known latitude and longitude values. The reference coordinates were obtained using a combination of survey-grade GNSS for exterior points and official architectural CAD blueprints for internal structural markers.

To maintain metric consistency in the Cartesian processing environment, the geographic coordinates of the GCPs are converted into the Universal Transverse Mercator (UTM) coordinate system. After performing the transformation, the alignment quality was validated, yielding a mean Root Mean Square (RMS) error of 0.084 m. Accurate georeferencing is critical in the AR navigation system, as it ensures the precise horizontal and vertical alignment required for the stable registration of virtual elements, such as directional cues and landmarks, thereby preventing visual drift and unrealistic floating artifacts during AR visualization. Furthermore, surface normal vectors ($N_{x}, N_{y}, N_{z}$) are validated to confirm correct wall and floor orientations, which is essential for accurate occlusion handling in AR rendering.
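As a small illustration of the UTM step, the sketch below computes the standard 6° UTM zone, its central meridian, and the hemisphere for a geographic coordinate. It does not perform the full projection, which would normally be delegated to a projection library. The function names and the sample coordinate (a rounded, hypothetical point in the Okayama area) are assumptions, not values taken from the study, and the special zone exceptions around Norway and Svalbard are ignored.

```python
import math


def utm_zone(longitude_deg):
    """Standard 6-degree UTM zone number (1 to 60) for a longitude in degrees."""
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1


def central_meridian_deg(zone):
    """Longitude of the central meridian of a UTM zone."""
    return zone * 6 - 183


def hemisphere(latitude_deg):
    """UTM hemisphere designator derived from the sign of the latitude."""
    return "N" if latitude_deg >= 0 else "S"


# Hypothetical GCP near Okayama, Japan (illustrative, not a surveyed value).
gcp_lat, gcp_lon = 34.69, 133.92
zone = utm_zone(gcp_lon)
print(zone, hemisphere(gcp_lat), central_meridian_deg(zone))
```

Keeping all GCPs in a single zone means the eastings and northings share one metric grid, which is what the Cartesian processing environment requires.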

As depicted in Figure 9, the selected control points serve as reference correspondences for calculating the requisite rotation and translation matrices. These transformations are applied to the entire point cloud, resulting in a model that is simultaneously geometrically accurate and geographically aligned, thereby making it suitable for direct integration into the navigation engine.
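The rotation and translation estimated from point pairs can be sketched in closed form for the planar case. The example below is a simplified 2D least-squares rigid fit, not CloudCompare’s 3D alignment; the function names and the four sample correspondences are hypothetical. The residual is reported as an RMS value, which is the same kind of quantity as the 0.084 m alignment error quoted above.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    cxs = sum(x for x, _ in src) / n
    cys = sum(y for _, y in src) / n
    cxd = sum(x for x, _ in dst) / n
    cyd = sum(y for _, y in dst) / n
    s_dot = s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys      # centred source point
        bx, by = xd - cxd, yd - cyd      # centred target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, (tx, ty)


def apply_rigid_2d(points, theta, t):
    """Rotate each point by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def rms_residual(a, b):
    """Root mean square distance between paired points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical anchors in the local model frame (metres) and their targets,
# generated here with a known 30 degree rotation and a known offset.
model_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
target_pts = apply_rigid_2d(model_pts, math.radians(30.0), (100.0, 200.0))

theta, shift = fit_rigid_2d(model_pts, target_pts)
aligned = apply_rigid_2d(model_pts, theta, shift)
print(math.degrees(theta), shift, rms_residual(aligned, target_pts))
```

With noise-free correspondences the residual is effectively zero; with real GCPs, a non-zero RMS indicates how well a single rigid transformation explains all point pairs.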

4.3. AR User Interface Validation

The AR user interface evaluated in this study was developed by adapting a framework previously introduced by the authors in [7]. The core navigation logic and visual layout are based on the methodology detailed in Figure 14 of the referenced work [7] (https://www.mdpi.com/2078-2489/16/7/588/html, accessed on 5 January 2024). For clarity, a schematic of the foundational AR interface employed in the present validation is reproduced in Figure 10. As shown in Figure 10, the system uses OCR to capture building and room identities. Panel (b) is of particular interest, as it shows unstructured signage text converted into structured metadata. While our previous implementation in [11,12] relied on these identities combined with basic sensor fusion (dead reckoning), this study improves the system by anchoring this metadata to precise 3D coordinates. This ensures that the navigation guidance is not only informative but also geometrically accurate, mitigating the drift issues found in earlier versions.

Field validation was conducted to evaluate the system’s robustness across varying elevations and spatial layouts using four distinct target locations, one on each floor of the building. Figure 11 presents a comparative visualization of AR navigation performance before and after the application of the proposed method. To complement these qualitative observations, the visual AR alignment error was quantified. Following the photogrammetric correction, the perceived drift was reduced from over 4 m to a stabilized offset of $0.45 \pm 0.12$ m. This ensures that the virtual markers remain accurately pinned to the actual room entrances, as supported by the recorded positional data. The qualitative outcomes for each floor are summarized as follows:

Floor 1: As illustrated in Figure 11a, the uncorrected system produced a misaligned navigation path that erroneously directed users toward a corridor wall, with the destination landmark incorrectly positioned within an adjacent kitchen area. Following correction, the path was accurately aligned with the physical corridor, and the landmark was correctly placed directly in front of the intended office entrance.
Floor 2: As shown in Figure 11b, the initial coordinate drift caused the destination marker to appear as a floating artifact near the Student Laboratory. Post-correction, the navigation arrow correctly aligned with the corridor axis, and the marker was accurately repositioned at the laboratory entrance.
Floor 3: As shown in Figure 11c, an adjacency ambiguity was identified, where the system incorrectly associated the destination with a neighboring office door. The corrected implementation resolved this spatial ambiguity, placing the landmark precisely at the targeted professor’s office.
Floor 4: As shown in Figure 11d, the uncorrected AR interface rendered a navigation arrow facing a solid wall, accompanied by a floating destination marker. After applying the correction parameters, the AR guidance conformed accurately to the corridor geometry and terminated at the correct room entrance.

Overall, the validation results confirm that the proposed correction pipeline effectively mitigates a broad spectrum of spatial registration errors, including orientation mismatches and destination point offsets, across all tested building floors.

4.4. System Performance Metrics

To evaluate the real-world applicability of the proposed system, we analyzed key performance metrics related to both the offline processing pipeline and the real-time AR navigation experience. The real-time metrics were measured on a smartphone during active navigation in the test environment.

As shown in Table 3, the AR application maintains a smooth rendering frame rate averaging 45 FPS, with a positioning update frequency synchronized at 30 Hz to ensure stable tracking without excessive battery drain. The estimated motion-to-photon latency is kept below 35 ms, which is crucial for minimizing motion sickness in AR experiences.

The results showed a 100% Task Completion Rate, with all participants successfully reaching their destinations without requiring external assistance. The average completion time was recorded at 124.5 s (SD = 15.2 s). These metrics confirm that the improved geometric accuracy translates directly into operational efficiency for end-users.

5. Performance Evaluation

In this section, to comprehensively assess the effectiveness of the proposal, we adopt a multi-faceted evaluation strategy and conduct a quantitative analysis of geometric accuracy and statistical error distribution, complemented by a qualitative assessment of system usability from an end-user perspective.

5.1. Geometric Accuracy and Drift Analysis

To evaluate the system’s accuracy, two sets of coordinates were compared against a known reference. The Before Correction values represent the raw latitude and longitude data obtained from the smartphone’s internal GNSS sensor while inside the building, which are prone to signal interference. The After Correction values (hereafter referred to as Estimated Corrected Positions) are the coordinates generated through our photogrammetric processing pipeline, where 360° panoramic visuals are anchored to a georeferenced 3D point cloud. The Ground Truth reference for this evaluation was established using the official architectural floor plans of Okayama University, providing a fixed benchmark for calculating the deviation distance.

This subsection evaluates the system’s capability to correct positional drift originating from initial sensor readings. The deviation distance ($d$) between an initial estimated position $(\varphi_{1}, \lambda_{1})$ and its corresponding corrected position $(\varphi_{2}, \lambda_{2})$ is quantified using the Haversine formula.

The calculation, based on the standard Haversine computational logic, proceeds in three steps. First, the differences in latitude and longitude are converted to radians. Second, the square of half the chord length (a) and the angular distance (c) between the two points are computed as defined in Equation (1):

$$
\begin{aligned}
a &= \sin^{2}\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_{1}\,\cos\varphi_{2}\,\sin^{2}\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2 \cdot \operatorname{atan2}\left(\sqrt{a},\, \sqrt{1 - a}\right) \\
d &= R \cdot c
\end{aligned}
\tag{1}
$$

where $\Delta\varphi$ and $\Delta\lambda$ are the differences in latitude and longitude in radians, respectively; $\varphi_{1}$ and $\varphi_{2}$ are the latitudes of the two points; and $R$ is the Earth’s radius (approximately 6,371,000 m). Finally, the geodesic distance $d$ (in meters) is obtained by multiplying the angular distance $c$ by the Earth’s radius, as expressed in the last line of Equation (1).
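The three steps of Equation (1) map directly onto a few lines of code. The following Python sketch is one possible transcription; the function name and the sample coordinates are illustrative assumptions and are not taken from Table 4.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius used in Equation (1)


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two latitude/longitude points."""
    # Step 1: convert latitudes and coordinate differences to radians.
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = math.radians(lat2_deg - lat1_deg)
    dlam = math.radians(lon2_deg - lon1_deg)
    # Step 2: square of half the chord length (a) and angular distance (c).
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    # Step 3: scale the angular distance by the Earth's radius.
    return EARTH_RADIUS_M * c


# Hypothetical raw and corrected positions a few metres apart.
raw_fix = (34.68900, 133.92200)
corrected_fix = (34.68902, 133.92203)
print(round(haversine_m(*raw_fix, *corrected_fix), 2))
```

Using `atan2` rather than `asin` keeps the computation numerically stable for the very short distances involved here.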

The experimental results in Table 4 validate the efficacy of the proposed map information collection tool. The system successfully identified and corrected positional discrepancies at all 12 sample points, with drifts ranging from 2.53 m to 3.66 m.

The average corrected drift across all points is approximately 3.15 m. This result has practical significance for indoor navigation. The physical distance between adjacent room entrances typically exceeds 3.5 m. While a maximum drift of 3.66 m was observed in one case, the overall average of 3.15 m indicates that the system reliably corrects the user’s position to within immediate proximity of the target. This level of accuracy is sufficient to position the user within the visual line-of-sight of the correct entrance, thereby resolving the spatial ambiguity inherent in uncorrected GNSS-based positioning, which can exhibit errors exceeding 5–10 m indoors.

5.2. Statistical Error Validation (RMSE)

While the arithmetic mean of positional drift provides a general indicator of accuracy, the Root Mean Square Error (RMSE) offers a more robust statistical measure by heavily penalizing larger deviations. This metric is essential for evaluating the overall stability and reliability of the system’s positioning performance across diverse locations. The RMSE is calculated using Equation (2):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_{i}^{2}} \tag{2}
$$

where $n$ represents the total number of sample points ($n = 12$) and $d_{i}$ denotes the deviation distance (drift) at the $i$-th point.

Based on the empirical data in Table 4, the Root Mean Square Error (RMSE) for the proposed cartographic data collection instrument is calculated as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{12}\left(\begin{array}{l} 3.33^{2} + 2.86^{2} + 3.46^{2} + 3.66^{2} + 3.05^{2} + 2.78^{2} + {} \\ 2.53^{2} + 3.34^{2} + 2.79^{2} + 3.55^{2} + 3.55^{2} + 2.94^{2} \end{array}\right)} \tag{3}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{12}\left(\begin{array}{l} 11.0889 + 8.1796 + 11.9716 + 13.3956 + 9.3025 + 7.7284 + {} \\ 6.4009 + 11.1556 + 7.7841 + 12.6025 + 12.6025 + 8.6436 \end{array}\right)} \tag{4}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{120.8558}{12}} = \sqrt{10.0713} \approx 3.17\ \text{m} \tag{5}
$$

The calculated Root Mean Square Error (RMSE) of 3.17 m corresponds closely to the arithmetic mean positional drift of 3.15 m. This agreement indicates a low variance in the error distribution, which is characteristic of a stable system. The absence of a significant disparity between these two metrics suggests that the error structure is not dominated by extreme outliers, thereby supporting the conclusion of consistent system performance and a mitigated risk of severe navigational disruption.
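The link between the mean, the RMSE, and the variance can be checked numerically. The short Python sketch below uses the 12 drift values listed in Equation (3) and relies on the identity RMSE² = mean² + variance (population variance), which is why a small gap between the RMSE and the mean implies a small spread. The variable names are illustrative.

```python
import math
import statistics

# Drift values in metres, as listed in Equation (3).
drifts_m = [3.33, 2.86, 3.46, 3.66, 3.05, 2.78,
            2.53, 3.34, 2.79, 3.55, 3.55, 2.94]

mean_drift = statistics.fmean(drifts_m)
rmse = math.sqrt(sum(d * d for d in drifts_m) / len(drifts_m))
spread = statistics.pstdev(drifts_m)  # population standard deviation

print(f"mean = {mean_drift:.2f} m, RMSE = {rmse:.2f} m, std = {spread:.2f} m")

# RMSE^2 equals mean^2 plus the population variance.
identity_gap = abs(rmse ** 2 - (mean_drift ** 2 + spread ** 2))
```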

5.3. User Experience and Usability Assessment

This subsection presents an evaluation of the proposed navigation system’s usability, employing a dual-method approach consisting of a pre/post-test design and the standardized System Usability Scale (SUS).

5.3.1. Pre-Test and Post-Test Methodology

A pre-test screening question (“Have you ever used this navigation system before?”) was administered to participants prior to the experimental task to control for the potential confounding effects of prior familiarity. This served to segment the sample for subsequent analysis. After the participants used the system to complete an indoor wayfinding task, a comprehensive usability assessment was conducted using a post-test questionnaire based on the validated System Usability Scale (SUS) [42]. As detailed in Table 5, the questionnaire was designed to evaluate five critical usability dimensions: (1) interface learnability, (2) operational efficiency, (3) system error management, (4) perceived usefulness, and (5) overall user satisfaction with the 360° image-based navigation paradigm.

5.3.2. Pre-Test Result

A sample of 10 participants from Okayama University was recruited for the usability evaluation. The pre-test screening results for the question “Have you ever used this system before?”, summarized in Table 6, indicate that 100% of participants (n = 10) reported no prior usage of the navigation system. This established a homogeneous baseline of zero prior experience across the entire test group, thereby ensuring that subsequent usability assessments of initial learnability and interface characteristics were not confounded by pre-existing familiarity.

5.3.3. System Usability Scale Result

The system’s overall usability was quantitatively evaluated using the standardized System Usability Scale (SUS). Following the completion of practical indoor wayfinding tasks, all 10 participants responded to a 10-item SUS questionnaire using a 5-point Likert scale (1 = “strongly disagree” to 5 = “strongly agree”) [18]. To ensure reproducibility, the individual score calculation follows the rigorous protocol defined in [43].

For each participant $j$, the raw score $R_{i}$ of the $i$-th item (where $i = 1, \dots, 10$) is normalized to a contribution score $S_{i}$ using Equation (6):

$$
S_{i} = \begin{cases} R_{i} - 1 & \text{for odd } i\ (1, 3, 5, \dots) \\ 5 - R_{i} & \text{for even } i\ (2, 4, 6, \dots) \end{cases} \tag{6}
$$

The total SUS score for participant $j$, denoted as $\mathrm{SUS}_{j}$, is derived by summing the normalized contributions and applying the standard multiplier of 2.5, as shown in Equation (7):

$$
\mathrm{SUS}_{j} = 2.5 \times \sum_{i=1}^{10} S_{i} \tag{7}
$$

Finally, the overall system usability is computed as the mean score across all $n$ participants ($\overline{\mathrm{SUS}}$):

$$
\overline{\mathrm{SUS}} = \frac{1}{n} \sum_{j=1}^{n} \mathrm{SUS}_{j} \tag{8}
$$

Table 7 presents the individual usability scores. The results indicate that the proposed tool is effective in cartographic data collection and provides a functional user interface for pedestrian navigation.

Based on the data in Table 7, the average SUS score is calculated as follows:

$$
\text{Average SUS} = \frac{95.0 + 95.0 + 95.0 + 100 + 95.0 + 100 + 92.5 + 100 + 95.0 + 97.5}{10} = 96.5
$$

The analysis of the SUS responses (n = 10) generated an exceptionally high mean usability score of 96.5, with individual scores ranging from 92.5 to 100. According to established normative benchmarks, this aggregate score categorizes the navigation system’s usability within the “Excellent” to “Best Imaginable” range. The narrow distribution and the absence of low scores indicate a consistently positive user experience across all participants. This finding substantiates the effectiveness of the system in delivering an intuitive and satisfactory navigation interface. Consequently, the results provide strong empirical support for technical readiness and user acceptance of the proposed system for real-world deployment in complex pedestrian environments, such as university campuses.
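The scoring rules in Equations (6) to (8) can be written compactly in code. The Python sketch below is a minimal illustration; the function name is hypothetical, the per-item responses used in the checks are synthetic examples, and only the list of ten total scores is taken from the averaging step above.

```python
def sus_score(responses):
    """SUS score (0 to 100) from ten Likert responses, each in the range 1..5."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten responses in the range 1..5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Equation (6): odd items contribute R - 1, even items contribute 5 - R.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    # Equation (7): scale the summed contributions (0..40) to 0..100.
    return 2.5 * total


# Synthetic response patterns, for illustration only.
best_case = sus_score([5, 1] * 5)   # agree with positive, disagree with negative items
neutral = sus_score([3] * 10)       # all items answered with the midpoint

# Equation (8): mean of the ten participant scores reported in the text.
reported_scores = [95.0, 95.0, 95.0, 100, 95.0, 100, 92.5, 100, 95.0, 97.5]
mean_sus = sum(reported_scores) / len(reported_scores)
print(best_case, neutral, mean_sus)
```

Because even-numbered SUS items are negatively worded, the alternating rule in Equation (6) ensures that a higher contribution always means better perceived usability.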

5.4. Navigation Task Performance

To further evaluate the practical effectiveness of the system, two additional metrics were measured during the field validation: the Task Completion Rate (TCR) and the Average Completion Time. The navigation task required each participant to locate a specific room on a different floor (from Floor 1 to Floor 4) starting from the main building entrance.

Task Completion Rate (TCR): A task was recorded as successful if the participant reached the target room entrance without any external assistance. All 10 participants successfully reached their assigned targets, resulting in a TCR of 100%.
Average Completion Time: The time was measured from the moment the AR guidance was activated until the user arrived at the destination marker. The average time recorded across all participants was $124.5 \pm 15.2$ s.

These results, combined with the low positional drift reported in Table 4, demonstrate that the system is not only geometrically accurate but also highly efficient for real-world wayfinding. The 100% completion rate suggests that the visual-inertial sensor fusion and photogrammetric correction effectively prevent users from getting lost due to AR misalignments, which is a common issue in large-scale indoor environments.

5.5. Discussion

This subsection synthesizes the performance evaluation of the proposed cartographic data collection tool and its integrated indoor navigation system, focusing on geometric precision and qualitative user experience. The results of the positional drift analysis confirm the system’s efficacy in mitigating sensor-based localization errors. The average correction of 3.15 m, corroborated by a closely aligned Root Mean Square Error (RMSE) of 3.17 m, indicates stable and uniform error compensation. This level of positional accuracy is operationally significant, especially when compared to the typical 5–10 m errors associated with conventional GNSS-based positioning in indoor environments. Critically, within the experimental setting, where the physical distance between adjacent room entrances consistently exceeds 3.5 m, the system’s sub-3.5 m accuracy ensures that users are reliably positioned within the direct visual corridor of their target destination. This effectively bridges the gap between raw positional data and functional navigational guidance.

This operational stability is directly reflected in the positive user perception captured by the System Usability Scale (SUS) evaluation, which yielded a mean score of 96.5. Benchmarking against established norms classifies this result within the “Excellent” to “Best Imaginable” usability range, validating that the 360° image-based navigation interface provides a highly intuitive experience, even for novice users. Collectively, the demonstrated synergy between robust positional accuracy and exceptional usability underscores the technical maturity and practical viability of the proposed system. These complementary findings strongly support its readiness for deployment in complex, real-world indoor environments such as university campuses.

Furthermore, the practical deployment of immersive AR navigation requires robust wireless communication to handle transmissions of high-resolution 360° panoramic data and real-time pose estimation updates. As the complexity of indoor digital twins grows, the demand for low-latency and high-bandwidth communication becomes more critical. Future deployments of this system could integrate advanced communication strategies, such as the MUL-VR framework proposed in 2025 [44], to optimize the delivery of visual assets and ensure seamless navigation in 5G/6G-enabled environments. This framework provides an efficient strategy for managing the data-intensive requirements of multi-user immersive experiences, which is highly relevant for large-scale indoor public spaces.

6. Conclusions

In conclusion, this study successfully validated the efficacy of the proposed 360° panoramic image-based cartographic data collection tool as a robust solution for indoor navigation. The tool integrates photogrammetric 3D reconstruction with a dedicated post-processing pipeline to address the inherent limitations of GNSS signal degradation in enclosed spaces. The key findings of the study include its demonstrated capability to achieve an average positional correction of 3.15 m, with a closely aligned Root Mean Square Error (RMSE) of 3.17 m, confirming strong geometric stability and uniform error distribution.

The metric accuracy translates directly to reliable room-level localization, which in turn underpins an outstanding user experience, quantified by a mean System Usability Scale (SUS) score of 96.5. Collectively, these results substantiate that the proposed methodology effectively bridges the gap between high-fidelity digital twin modeling and operational geospatial requirements, establishing its technical feasibility for deployment in complex indoor environments such as university campuses.

Future work will concentrate on enhancing the system’s scalability by automating critical components of the mesh cleaning and optimization workflow, thereby facilitating its efficient large-scale implementation across diverse infrastructural settings.

Author Contributions

Conceptualization, K.S.B. and N.F.; methodology, K.S.B. and N.F.; software, K.S.B., A.A.S.P. and I.N.D.K.; visualization, K.S.B., I.N.D.K., K.C.B. and A.A.S.P.; investigation, K.S.B. and A.A.S.P.; writing—original draft, K.S.B.; writing review and editing, N.F.; supervision, N.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the reviewers for their thorough reading and helpful comments and all their colleagues at the Distributed System Laboratory, Okayama University, who were involved in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ma, S.; Wang, P.; Lee, H. An Enhanced Hidden Markov Model for Map-Matching in Pedestrian Navigation. Electronics 2024, 13, 1685.
2. Wang, Q.; Luo, H.; Wang, J.; Sun, L.; Ma, Z.; Zhang, C.; Fu, M.; Zhao, F. Recent Advances in Pedestrian Navigation Activity Recognition: A Review. IEEE Sens. J. 2022, 22, 7499–7518.
3. Asrat, K.T.; Cho, H.J. A Comprehensive Survey on High-Definition Map Generation and Maintenance. ISPRS Int. J. Geo-Inf. 2024, 13, 232.
4. Al-Bakri, M. Enhancing spatial accuracy of OpenStreetMap data: A geometric approach. Math. Model. Eng. Probl. 2023, 10, 2171–2178.
5. Kango, V.; Eraqi, H.M.; Moustafa, M. High Precision Map Conflation of Fleet Sourced Traffic Signs. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; p. 4648.
6. Chaure, A.S.; Santosh Malloli, R.; Malave, M.S.; Basagonda Vhananavar, S.; Borse, S. Enhancing Navigation Systems: A Comprehensive Survey with Proposed Innovations for Improved User Experience. In Proceedings of the 2024 8th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 23–24 August 2024; pp. 1–4.
7. Batubulan, K.S.; Funabiki, N.; Brata, K.C.; Kotama, I.N.D.; Kyaw, H.H.S.; Hidayati, S.C. A Map Information Collection Tool for a Pedestrian Navigation System Using Smartphone. Information 2025, 16, 588.
8. Brata, K.C.; Funabiki, N.; Panduman, Y.Y.F.; Fajrianti, E.D. An Enhancement of Outdoor Location-Based Augmented Reality Anchor Precision through VSLAM and Google Street View. Sensors 2024, 24, 1161.
9. Brata, K.C.; Funabiki, N.; Riyantoko, P.A.; Panduman, Y.Y.F.; Mentari, M. Performance Investigations of VSLAM and Google Street View Integration in Outdoor Location-Based Augmented Reality under Various Lighting Conditions. Electronics 2024, 13, 2930.
10. Brata, K.C.; Funabiki, N.; Panduman, Y.Y.F.; Mentari, M.; Syaifudin, Y.W.; Rahmadani, A.A. A Proposal of In Situ Authoring Tool with Visual-Inertial Sensor Fusion for Outdoor Location-Based Augmented Reality. Electronics 2025, 14, 342.
11. Fajrianti, E.D.; Funabiki, N.; Sukaridhoto, S.; Panduman, Y.Y.F.; Dezheng, K.; Shihao, F.; Surya Pradhana, A.A. INSUS: Indoor Navigation System Using Unity and Smartphone for User Ambulation Assistance. Information 2023, 14, 359.
12. Fajrianti, E.D.; Panduman, Y.Y.F.; Funabiki, N.; Haz, A.L.; Brata, K.C.; Sukaridhoto, S. A User Location Reset Method through Object Recognition in Indoor Navigation System Using Unity and a Smartphone (INSUS). Network 2024, 4, 295–312.
13. Lu, M.; Arikawa, M.; Oba, K.; Ishikawa, K.; Jin, Y.; Utsumi, T.; Sato, R. Indoor AR Navigation Framework Based on Geofencing and Image-Tracking with Accumulated Error Correction. Appl. Sci. 2024, 14, 4262.
14. Nwankwo, L.; Rueckert, E. Understanding Why SLAM Algorithms Fail in Modern Indoor Environments. In Advances in Service and Industrial Robotics; Petrič, T., Ude, A., Žlajpah, L., Eds.; Springer: Cham, Switzerland, 2023; pp. 186–194.
15. Lahemer, E.S.; Rad, A. HoloSLAM: A novel approach to virtual landmark-based SLAM for indoor environments. Complex Intell. Syst. 2024, 10, 4175–4200.
16. Schoovaerts, M.; Li, R.; Niu, K.; Poorten, E.V. Quantitative Assessment of Calibration Motion Profiles in Robotic-assisted Ultrasound System. In Proceedings of the 2022 International Symposium on Medical Robotics (ISMR), Atlanta, GA, USA, 13–15 April 2022; pp. 1–7.
17. Jalal, A.J.; Mohd Ariff, F.; Razali, F.; Wong, R.; Wook, A.; Idris, I. Assessing Precision and Dependability of Reconstructed Three-Dimensional Modeling for Vehicles at Crash Scenes using Unmanned Aircraft System. J. Adv. Geospat. Sci. Technol. 2023, 3, 129–144.
18. Kotama, I.N.D.; Funabiki, N.; Panduman, Y.Y.F.; Brata, K.C.; Pradhana, A.A.S.; Noprianto; Desnanjaya, I.G.M.N. Implementation of Sensor Input Setup Assistance Service Using Generative AI for SEMAR IoT Application Server Platform. Information 2025, 16, 108.
19. Wang, W.; Wu, R.; Dong, Y.; Jiang, H. Research on Indoor 3D Semantic Mapping Based on ORB-SLAM2 and Multi-Object Tracking. Appl. Sci. 2025, 15, 10881.
20. Tsiamitros, N.; Mahapatra, T.; Passalidis, I.; Kailashnath, K.; Pipelidis, G. Pedestrian Flow Identification and Occupancy Prediction for Indoor Areas. Sensors 2023, 23, 4301.
21. Wan, Q.; Yu, Y.; Chen, R.; Chen, L. Map-Assisted 3D Indoor Localization Using Crowd-Sensing-Based Trajectory Data and Error Ellipse-Enhanced Fusion. Remote Sens. 2022, 14, 4636.
22. Kafataris, G.; Skarlatos, D.; Vlachos, M.; Agapiou, A. Investigating the Accuracy of a 360° Camera for 3D Modeling in Confined Spaces: 360° Panorama vs 25-Rig Compared to TLS. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2025, X-M-2-2025, 139–146.
23. Yang, Z.; Zhao, K.; Yang, S.; Xiong, Y.; Zhang, C.; Deng, L.; Zhang, D. Research on a Density-Based Clustering Method for Eliminating Inter-Frame Feature Mismatches in Visual SLAM Under Dynamic Scenes. Sensors 2025, 25, 622.
24. Vacca, G.; Vecchi, E. Integrated Geomatic Approaches for the 3D Documentation and Analysis of the Church of Saint Andrew in Orani, Sardinia. Remote Sens. 2025, 17, 3376.
25. Liu, H.; Yuan, Y.; Yin, X.; Su, L. Spatial/Spectral-Frequency Adaptive Network for Hyperspectral Image Reconstruction in CASSI. Remote Sens. 2025, 17, 3382.
26. Zhu, Q.; Li, H.; Sun, H.; Xia, T.; Wang, X.; Han, Z. 3DV-Unet: Eddy-Resolving Reconstruction of Three-Dimensional Upper-Ocean Physical Fields from Satellite Observations. Remote Sens. 2025, 17, 3394.
27. Lv, C.; Lin, W.; Zhao, B. Voxel Structure-Based Mesh Reconstruction From a 3D Point Cloud. IEEE Trans. Multimed. 2022, 24, 1815–1829.
28. Tresnawati, D.; Mulyani, A.; Nugraha, C.; Fitriani, L.; Lestari, N.; Nugraha, D. An Augmented Reality Technology in Indoor Navigation for Smart Museum Using Bluetooth Low Energy Communication. In Proceedings of the 2024 International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 4–5 September 2024; pp. 1–5.
29. Bash, E.A.; Wecker, L.; Rahman, M.M.; Dow, C.F.; McDermid, G.; Samavati, F.F.; Whitehead, K.; Moorman, B.J.; Medrzycka, D.; Copland, L. A Multi-Resolution Approach to Point Cloud Registration without Control Points. Remote Sens. 2023, 15, 1161.
30. Kong, X. Identifying Geomorphological Changes of Coastal Cliffs through Point Cloud Registration from UAV Images. Remote Sens. 2021, 13, 3152.
31. Brightman, N.; Fan, L. A Brief Overview of the Current State, Challenging Issues and Future Directions of Point Cloud Registration. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, X-3/W1-2022, 17–23.
32. Elghayesh, A.; Mekawey, H. MSA University Campus Indoor Navigation and positioning System Using Augmented Reality. In Proceedings of the 2025 Intelligent Methods, Systems, and Applications (IMSA), Giza, Egypt, 12–13 July 2025; pp. 152–156.
33. B, K.; Sankar, S.; R, S.; Ranjani, J. AR GuideX: Indoor Wayfinder. In Proceedings of the 2024 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 8–9 October 2024; pp. 1–6.
34. Messi, L.; Spegni, F.; Vaccarini, M.; Corneli, A.; Binni, L. Infrastructure-Free Localization System for Augmented Reality Registration in Indoor Environments: A First Accuracy Assessment. In Proceedings of the 2024 IEEE International Workshop on Metrology for Living Environment (MetroLivEnv), Chania, Greece, 12–14 June 2024; pp. 110–115.
35. Nocerino, E.; Menna, F. In-camera IMU angular data for orthophoto projection in underwater photogrammetry. ISPRS Open J. Photogramm. Remote Sens. 2023, 7, 100027.
36. Qu, D.; Liao, B.; Zhang, H.; Ait-Aider, O.; Lao, Y. Fast Rolling Shutter Correction in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11778–11795.
37. Insta360. Insta360 X5 8K 360º Action Camera. 2025. Available online: https://www.insta360.com/product/insta360-x5 (accessed on 10 January 2026).
38. Cupix Inc. CupixVista: AI-Powered 3D Digital Twin Platform. 2025. Available online: https://www.cupixvista.com/ (accessed on 31 January 2025).
39. Visual Computing Lab-ISTI-CNR. MeshLab. Open Source Mesh Processing Tool. 2025. Available online: https://www.meshlab.net/ (accessed on 31 January 2025).
40. Girardeau-Montaut, D. CloudCompare, Version 2.14 (Alpha); GPL Software: Paris, France, 2025.
41. Google Developers. ARCore: Google’s Platform for Building Augmented Reality Experiences. 2025. Software Development Kit (SDK). Available online: https://developers.google.com/ar?hl=id (accessed on 31 January 2025).
42. Glomb, D.; Wolff, C. User Experience and Multimodal Usability for Navigation Systems. In Proceedings of the Annals of Computer Science and Information Systems; Polish Information Processing Society: Warszawa, Poland, 2022; Volume 30, pp. 207–210.
43. Harwati, T.S.; Nendya, M.B.; Dendy Senapartha, I.K.; Lukito, Y.; Tjahjono, F.N.; Jovan, K.I. Usability Evaluation of Augmented Reality Indoor Navigation: A System Usability Scale Approach. In Proceedings of the 2024 2nd International Conference on Technology Innovation and Its Applications (ICTIIA), Medan, Indonesia, 12–13 September 2024; pp. 1–5.
44. Tang, X.W.; Huang, Y.; Shi, Y.; Wu, Q. MUL-VR: Multi-UAV Collaborative Layered Visual Perception and Transmission for Virtual Reality. IEEE Trans. Wirel. Commun. 2025, 24, 2734–2749.

Figure 1. System architecture with four layers.

Figure 2. CupixVista processing pipeline: From raw 360° imagery to 3D textured mesh via SfM and MVS algorithms.

Figure 3. Mesh refinement and optimization pipeline in MeshLab.

Figure 4. Georeferencing and semantic classification process in CloudCompare.

Figure 5. Hardware configuration for 360° data acquisition.

Figure 6. Samples of 360° panoramic dataset: (a) Floor 1 staircase area; (b) Floor 2 corridor; (c) Floor 3 corridor; and (d) Floor 4 corridor.

Figure 7. 3D reconstruction and point cloud generation results by CupixVista: (a) Floor 1 structure; (b) Floor 2 corridor layout; (c) Floor 3 layout; and (d) Floor 4 layout.

Figure 8. Integrated 3D building model by MeshLab.

Figure 9. Georeferencing process in CloudCompare.

Figure 10. Initial metadata collection via OCR (reproduced from [7]): Pedestrian navigation interface. (a) Search results showing location and staff data; (b) Detailed place view including coordinates and building photo; (c) List of staff members and their room locations; (d) Route preview from current location to the destination using map view; (e) AR-based outdoor navigation with directional guidance; (f) AR indoor navigation leading to the correct room; (g) Arrival indicator displayed when close to the destination.

Figure 11. Field validation of AR navigation: (a) Floor 1; (b) Floor 2; (c) Floor 3; and (d) Floor 4.

Table 1. Comparison of proposed method with mainstream indoor positioning technologies.

Technology	Accuracy	Infrastructure	Main Advantage	Main Limitation
WiFi Fingerprinting	5–15 m	Existing Routers	No extra hardware needed	High signal interference (Multipath)
BLE Beacons	2–5 m	Battery Beacons	Low power consumption	High maintenance (battery replacement)
UWB	0.1–0.5 m	Active Anchors	Extremely high precision	Expensive and complex setup
Proposed Method	∼3 m (Visual: <0.5 m)	Passive Maps (Images)	High visual fidelity; No active hardware	Dependent on lighting conditions

Table 2. Input, processing, and output specifications for each layer.

Layer	Input	Process	Output
Data Acquisition (Section 3.2)	Physical indoor environment; Inertial signals (gravity & acceleration)	Panoramic Capture: 8K 360° recording (Insta360 X5); Sensor Logging: Frame-timestamp sync with IMU	Raw panoramic images; Synchronized sensor logs
Modeling Layer (Section 3.3)	Raw panoramic images; Sensor logs	Reconstruction: SfM keypoint extraction (CupixVista); Optimization: Noise cleaning and decimation (MeshLab); Georeferencing: Global alignment (CloudCompare)	3D textured mesh; Georeferenced point cloud; Semantic data
Navigation Engine (Section 3.4)	Live smartphone camera feed; Georeferenced 3D map	Visual Localization: Feature matching against map; Sensor Fusion (VIO): Combining visual pose with IMU (ARCore)	Real-time 6 DoF device pose (Position & Orientation)
User Interface (Section 3.5)	Device pose estimates; Routing path data	AR Rendering: Overlaying 3D assets (Unity Foundation); Anchor Management: World locking virtual objects	Augmented guidance view on smartphone screen

Table 3. Quantitative performance metrics of system.

Category	Metric	Avg. Value	Unit	Notes
Real-Time AR	AR Rendering Frame Rate	45	FPS	Measured on a mid-range smartphone (Snapdragon 8 Gen 2).
Real-Time AR	Positioning Update Frequency	30	Hz	Stable pose estimation rate optimized for mobile ARCore.
Real-Time AR	Motion-to-Photon Latency (Est.)	<35	ms	Minimized to prevent motion sickness and ensure visual alignment.
Offline Pipeline	Total Processing Time	∼2.5	h	Including 3D reconstruction, mesh cleaning, and georeferencing for a 4-story building.

Table 4. Coordinate correction data and deviation distance calculation.

Room ID	Raw GNSS Long (°)	Raw GNSS Lat (°)	Proposed Long (°)	Proposed Lat (°)	Drift (m)
D103	133.922635	34.689415	133.922629	34.689385	3.33
D104	133.922663	34.689387	133.922632	34.689383	2.86
D106	133.922596	34.689365	133.922630	34.689379	3.46
D201	133.922503	34.689455	133.922470	34.689473	3.66
D204	133.922565	34.689455	133.922540	34.689473	3.05
D206	133.922622	34.689459	133.922597	34.689473	2.78
D205	133.922590	34.689457	133.922570	34.689473	2.53
D301	133.922432	34.689412	133.922442	34.689441	3.34
D303	133.922553	34.689421	133.922569	34.689442	2.79
D308	133.922764	34.689471	133.922762	34.689439	3.55
D403	133.922525	34.689446	133.922562	34.689456	3.55
D404	133.922603	34.689511	133.922576	34.689496	2.94
Average	-		-		3.15

Note: “Raw GNSS” represents initial indoor smartphone readings; “Proposed Estimate” refers to coordinates refined by the photogrammetric model against architectural ground truth.
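The drift column of Table 4 can be approximated with a short Haversine sketch. The code below is illustrative only and is not the authors' implementation: the Earth radius of 6,371,000 m, the function name `haversine_m`, and the choice of three sample rows are assumptions. Because the radius and rounding used in the paper are not stated, the computed distances may differ from the tabulated values by a few centimetres.

```python
import math

# Assumed mean Earth radius in metres; the paper does not state the value it used.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Three rows copied from Table 4: (raw lon, raw lat, refined lon, refined lat).
rows = {
    "D103": (133.922635, 34.689415, 133.922629, 34.689385),
    "D104": (133.922663, 34.689387, 133.922632, 34.689383),
    "D308": (133.922764, 34.689471, 133.922762, 34.689439),
}

# Drift per room: distance between the raw GNSS fix and the refined coordinate.
drift = {room: haversine_m(*coords) for room, coords in rows.items()}

for room, d in drift.items():
    print(f"{room}: {d:.2f} m")
```

At this scale (a few metres), a flat-earth approximation would give nearly the same result; the Haversine form is used here only because it is the formula named in the paper.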

Table 5. Post-test questions for SUS.

No	Question	Category
1	I think I would like to use this navigation system frequently for finding rooms inside the building.	Usefulness
2	I found the navigation system unnecessarily complex when trying to understand the 3D visual guidance.	Ease of Use
3	I thought the system was easy to use when navigating through the indoor environment.	Ease of Use
4	I think I would need technical support to use this navigation tool effectively.	Learning Curve
5	I found the system’s features, such as the 360° views and spatial directions, were well integrated.	Efficiency
6	I noticed inconsistencies in the system, such as mismatched visuals or unclear location info.	Error Handling
7	I believe most people would quickly learn how to use this 360° navigation system.	Learning Curve
8	I found the system cumbersome to use when following the virtual path.	Ease of Use
9	I felt confident using this system to locate specific rooms using the panoramic display.	Usefulness
10	I had to learn a lot before I could start effectively using the system for wayfinding.	Learning Curve

Table 6. Pre-test result.

Answer	Participants	Rate
Yes	0	0%
No	10	100%

Table 7. SUS scores from 10 participants.

Participant	Responses (Q1–Q10)	SUS Score
1	4, 1, 5, 1, 5, 1, 5, 2, 5, 1	95.0
2	5, 1, 5, 1, 5, 1, 5, 2, 4, 1	95.0
3	4, 1, 5, 1, 5, 1, 5, 1, 4, 1	95.0
4	5, 1, 5, 1, 5, 1, 5, 1, 5, 1	100.0
5	5, 1, 4, 1, 5, 2, 5, 1, 5, 1	95.0
6	5, 1, 5, 1, 5, 1, 5, 1, 5, 1	100.0
7	4, 2, 5, 1, 5, 1, 5, 1, 5, 2	92.5
8	5, 1, 5, 1, 5, 1, 5, 1, 5, 1	100.0
9	5, 2, 5, 1, 5, 1, 4, 1, 5, 1	95.0
10	4, 1, 5, 1, 5, 1, 5, 1, 5, 1	97.5
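The scores in Table 7 are consistent with the standard SUS scoring rule, in which odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The following sketch assumes that rule; the function name `sus_score` is hypothetical, and the responses are copied from Table 7.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten Likert responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, value in enumerate(responses, start=1):
        if item % 2 == 1:
            total += value - 1      # odd items are positively worded
        else:
            total += 5 - value      # even items are negatively worded
    return total * 2.5


# Responses (Q1-Q10) for participants 1-10, copied from Table 7.
table7 = [
    [4, 1, 5, 1, 5, 1, 5, 2, 5, 1],
    [5, 1, 5, 1, 5, 1, 5, 2, 4, 1],
    [4, 1, 5, 1, 5, 1, 5, 1, 4, 1],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [5, 1, 4, 1, 5, 2, 5, 1, 5, 1],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 5, 1, 5, 1, 5, 1, 5, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [5, 2, 5, 1, 5, 1, 4, 1, 5, 1],
    [4, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]

scores = [sus_score(r) for r in table7]
mean_score = sum(scores) / len(scores)

print(scores)       # per-participant scores, matching the SUS Score column
print(mean_score)   # 96.5, the average reported in the abstract
```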


