Buildings
  • Article
  • Open Access

20 February 2026

Drift-Free BIM Alignment for Mixed Reality Visualization Through Image Style Transfer and Feature Matching

1 Department of Infrastructure Engineering, The University of Melbourne, Parkville, VIC 3010, Australia
2 School of Science, RMIT University, 124 La Trobe St, Melbourne, VIC 3000, Australia
* Author to whom correspondence should be addressed.

Abstract

Accurate localization is a persistent challenge for Mixed Reality (MR) applications in the construction industry, where reliable alignment between digital building models and physical environments is critical. Commercial MR devices such as the Microsoft HoloLens rely on Visual-Inertial Simultaneous Localization and Mapping (VISLAM) for pose estimation, but accumulated drift over extended trajectories and visually ambiguous indoor spaces often reduce localization accuracy. This paper presents a complementary localization refinement methodology that integrates HoloLens spatial tracking with image style transfer and geometry-based pose estimation for Building Information Modeling (BIM)-aligned MR visualization. Image style transfer is used to reduce appearance discrepancies between real-world images and synthetic BIM renderings, improving feature correspondence for geometric alignment. Pose refinement is then applied using feature matching and Perspective-n-Point (PnP) estimation to mitigate accumulated drift when sufficient visual evidence is available. The method is evaluated on 1408 image pairs captured along an indoor trajectory, demonstrating improved BIM alignment and a significant reduction in accumulated drift to a residual reprojection error of 1–2 pixels. The proposed approach supports more reliable MR visualization for construction-related tasks such as inspection, coordination, and spatial decision-making.

1. Introduction

Mixed Reality (MR) is a technology that facilitates the integration of physical environments with virtual elements, thereby creating immersive user experiences. MR enables the interaction between real and virtual components, leading to a seamless blend of the two realms [1]. These technologies are increasingly applied in industries such as education [2], tourism [3], navigation [4], military [5], and construction [6,7], where accurate visual representation and manipulation of digital data are crucial.
Building Information Modeling (BIM) plays a vital role in enhancing the effectiveness of MR in the construction industry [8,9,10]. BIM serves as a digital representation of the physical and functional characteristics of a building, allowing for enhanced visualization and facilitating better decision-making and project management [11,12]. Integrating BIM with MR allows for more intuitive and real-time interaction between the virtual and the real world [13]. This combination enables more effective visualization of hidden elements [14,15] and facilitates tasks like progress tracking, maintenance, and scenario simulation in construction [16,17].
A key requirement for MR visualization of BIM geometries is accurate estimation of the MR camera pose, which involves determining the position and orientation of the devices within an indoor space. The absence of Global Navigation Satellite System (GNSS) signals indoors complicates this process, prompting research into alternative methods that can provide reliable, real-time localization without relying on GNSS [18].
To address these challenges, infrastructure-based techniques such as WiFi, Bluetooth, ultrasound, and ultra-wideband (UWB) have been developed. These systems estimate position based on metrics like signal strength and time-of-flight, but require considerable infrastructure investments, which may not always be practical [19]. As a result, there is growing interest in infrastructure-independent methods that do not depend on additional hardware.
Infrastructure-independent methods, such as visual odometry, rely solely on visual observations to estimate the movement of a device along its trajectory [20]. This method depends heavily on image quality, and any degradation in image clarity or detail can significantly impact the accuracy of the motion estimates [21]. Another popular infrastructure-independent method is Simultaneous Localization and Mapping (SLAM), a process by which a device constructs a map of an unknown environment while simultaneously determining its position within that map, using sensor data from cameras, LiDAR, or inertial measurements [22]. However, these methods suffer from errors that accumulate along the trajectory over time and with distance from the point of device initialization [17,22,23].
Model-based localization methods have gained increasing attention for their ability to align camera poses using digital representations such as BIM. These approaches offer an infrastructure-independent solution by leveraging pre-existing 3D models of the environment to estimate camera positions without the need for physical markers or external hardware [24,25,26,27,28]. However, the practical deployment of these systems faces persistent limitations. The disparity in visual appearance between synthetic BIM renderings and real-world camera images, caused by lighting variations, the lack of texture in BIM, and differences in environmental conditions, often leads to inaccurate feature matching. Additionally, indoor environments with symmetrical architectural layouts can introduce ambiguity in pose estimation. These challenges become more pronounced in large-scale or dynamic settings, where visual drift and cumulative error along the device’s trajectory compromise localization accuracy and reliability.
In recent years, domain adaptation techniques like Cycle-Consistent Generative Adversarial Network (CycleGAN) have gained popularity in addressing visual mismatches between synthetic BIM renderings and real-world images. By translating synthetic images into photorealistic styles and vice versa, image feature correspondence is improved, which enhances camera pose estimation accuracy [29,30,31]. However, existing CycleGAN-based approaches face several limitations. Their performance often degrades in visually uniform or repetitive environments, where the lack of strong visual cues hinders accurate matching. Additionally, artifacts from GAN training, such as texture inconsistencies or noise, can introduce distortions that compromise localization precision. Moreover, these methods typically focus on the image translation task in isolation, without integrating geometric alignment procedures such as Perspective-n-Point (PnP), which limits their effectiveness in practical AR/MR applications requiring precise spatial alignment.
Recognizing these limitations, the aim of this study is to develop and evaluate a complementary localization refinement approach for BIM-aligned MR visualization that reduces accumulated drift in HoloLens-based MR systems without replacing native SLAM tracking. Rather than attempting to solve indoor localization solely through appearance-based domain adaptation or feature correspondence, the proposed method builds upon the Microsoft HoloLens VISLAM system as a continuous localization backbone and applies BIM-guided pose refinement opportunistically when sufficient geometric and visual cues are available. Image style transfer using CycleGAN is employed to reduce appearance discrepancies between real-world images and synthetic BIM renderings, thereby improving correspondence reliability, while geometric pose estimation using PnP is used to correct accumulated drift. By focusing on drift mitigation and alignment refinement, rather than absolute localization or feature-only tracking, the proposed approach explicitly acknowledges the challenges of low-texture and symmetric indoor environments and demonstrates how BIM-informed correction can significantly improve MR alignment when visual evidence supports it.
The contributions of this paper are as follows:
  • A complementary BIM-guided drift-refinement framework for MR localization, which integrates HoloLens VISLAM tracking with BIM-based geometric alignment, CycleGAN-driven domain adaptation, and PnP-based pose estimation, demonstrating that accumulated drift in MR device trajectories can be substantially reduced through BIM-guided pose refinement, resulting in near-zero residual reprojection error.
  • A unified pipeline that combines domain adaptation and geometric pose refinement, where CycleGAN is employed to mitigate appearance discrepancies between real and synthetic images, not as a standalone registration solution, but as an enabling component that improves correspondence reliability within a geometry-based refinement process.
  • An opportunistic refinement strategy that applies BIM-based correction only when sufficient visual correspondences are available, while relying on VISLAM to maintain continuous localization in symmetric or low-texture environments where correspondence-based methods alone typically fail.
  • A comprehensive experimental evaluation along a full indoor trajectory, demonstrating significant reprojection error reduction when refinement is applicable, together with explicit analysis of excluded frames and practical limitations related to feature availability, BIM accuracy, and computational constraints.
  • A clear discussion of deployment considerations and limitations, positioning the proposed approach as a refinement layer that enhances existing MR localization systems rather than replacing them, and outlining pathways toward automated initialization and real-time implementation.
The remainder of this paper is organized as follows: Section 2 presents a comprehensive review of related literature; Section 3 outlines the proposed methodology; Section 4 provides a detailed account of the experimental procedures; Section 5 presents the results and discussion; and Section 6 concludes with the findings, limitations, and directions for future research.

3. Methodology

The proposed workflow enhances indoor localization accuracy by refining HoloLens VISLAM poses through image-based alignment with a BIM-derived virtual environment. The core idea is to exploit geometric correspondences between real images and their synthetic BIM counterparts to estimate camera pose within the BIM coordinate frame. However, substantial appearance differences between real-world images and BIM renderings introduce mismatches that can degrade correspondence quality. To mitigate this discrepancy, the pipeline employs CycleGAN-based domain adaptation to translate real images into a BIM-like style, thereby improving the consistency of visual features across domains. Feature correspondences are then extracted and used within a PnP-based geometric solver to refine the initial VISLAM pose. As illustrated in Figure 1, the methodology comprises four interconnected stages that include BIM generation, image acquisition, domain adaptation and feature matching, and error analysis. Each stage contributes to progressively reducing the domain gap and improving alignment between the physical and virtual environments.
Figure 1. Methodology of the proposed approach, including four stages: BIM Generation, Image Capture, Domain Adaptation and Feature Matching, and Error Analysis.

3.1. Generating BIM

A geometrically reliable BIM is essential for generating synthetic views that can serve as a reference for camera pose refinement. The BIM used in this study was constructed in Autodesk Revit based on a dense point cloud acquired using the GeoSLAM Zeb Horizon mobile laser scanning system. This dataset enabled detailed reconstruction of architectural and structural elements with an accuracy of approximately 1 to 3 cm, ensuring that walls, doors, windows, and other key features were represented with sufficient precision for spatial analysis. To further validate geometric fidelity, critical dimensions such as inter-wall distances were manually verified with an Electronic Distance Measurement (EDM) device, providing an additional check on the consistency between the digital model and the physical environment. Although these verification steps were taken to improve geometric fidelity, residual discrepancies between the BIM and the physical environment may still exist and can influence the accuracy of subsequent pose refinement.
Following the modeling process, the BIM was imported into the Unity platform to generate synthetic images aligned with the HoloLens trajectory. Unity was selected for its ability to efficiently render complex indoor environments and provide pixel-level geometric information [51]. The model was optimized to support real-time rendering by approximating lighting conditions and removing furnishings and nonstructural objects that were not essential for pose estimation. The resulting environment corresponds to a Level of Detail (LoD) 300 BIM, which offers sufficient structural detail for evaluating spatial alignment while maintaining manageable computational complexity (Figure 2).
Figure 2. Generated BIM-based on a point cloud captured by a mobile laser scanning system.

3.2. Image Capture

The image acquisition stage involves generating paired real and synthetic datasets that form the foundation for evaluating pose refinement. Real-world images were collected along a predefined indoor trajectory using the Microsoft HoloLens, which provides RGB images, depth information, and associated camera poses estimated through its internal Visual Inertial SLAM system. This dataset structure follows the framework introduced by Ungureanu and Bogo [52], in which each captured image is accompanied by pose estimates derived from depth sensors and grayscale tracking cameras.
Corresponding synthetic images were generated in Unity by rendering the BIM from viewpoints that match the HoloLens trajectory. In addition to the rendered imagery, Unity provided pixel-level 3D coordinates that serve as geometric ground truth for subsequent pose correction. These coordinates enable direct comparison between the estimated and true projections of BIM points, making them essential for quantitative evaluation of reprojection error.
In total, 1408 real images and their BIM-generated counterparts were acquired for analysis (Figure 3). This paired dataset supports both domain adaptation through CycleGAN and the assessment of pose refinement accuracy through feature correspondence and geometric alignment.
Figure 3. (a) Sample BIM images along the trajectory (b) Corresponding real images.

3.3. Domain Adaptation and Feature Matching

To reduce the visual discrepancy between real-world HoloLens images and synthetic BIM renderings, CycleGAN was employed. CycleGAN is well-suited for unpaired image-to-image translation, enabling domain adaptation without requiring exact correspondence between input datasets. In this study, the network was trained on unpaired sets of BIM renderings and HoloLens captures to generate style-transferred images that preserve geometric structure while approximating the textures and illumination characteristics of real scenes [44]. This transformation enhances appearance consistency and improves the likelihood of obtaining stable image correspondences.
Following domain adaptation, feature correspondences were extracted between the CycleGAN-translated images and their corresponding BIM images using the KAZE feature detector. KAZE was selected after preliminary empirical trials indicated that it performs robustly under the nonlinear intensity variations introduced by the style-transfer process. Its nonlinear scale-space construction makes it more resilient to intensity distortions compared to traditional methods [53,54]. While alternative descriptors such as SIFT, ORB, and SURF could also be applied, the objective of this work was to establish a feasible and effective pipeline for drift refinement rather than to conduct a comprehensive comparison of feature extraction algorithms. We acknowledge that KAZE, like all keypoint-based approaches, remains limited in environments with uniformly textured or symmetric surfaces, and this inherent constraint contributed to the set of image pairs for which correspondences could not be reliably established.
To obtain refined camera poses, 2D-3D correspondences derived from the matched keypoints were used as input to a PnP solver, which computes the rotation and translation, aligning the HoloLens camera frame with the BIM coordinate system [55]. The resulting transformation enables accurate projection of BIM-derived 3D points into the corresponding 2D image plane and forms the basis for correcting accumulated drift in the HoloLens trajectory.
It is important to note that the proposed pipeline does not rely solely on feature matching for continuous localization. The HoloLens VISLAM system provides stable tracking even in textureless or symmetric regions where keypoint-based methods struggle, ensuring continuity when insufficient features are available. The refinement step is therefore applied only when the visual evidence supports reliable correction. This hybrid design strengthens localization performance while acknowledging the practical limitations of both deep learning-based image translation and traditional keypoint detection methods, which we discuss further in the Limitations and Discussion sections.
It should be emphasized that the proposed refinement process is applied opportunistically. Feature matching and PnP-based pose correction are performed only when sufficient and reliable correspondences are detected. In scenes with limited texture or high symmetry, where correspondence extraction is unreliable, the refinement step is skipped. During these periods, the HoloLens VISLAM system continues to provide baseline pose tracking, ensuring continuity of localization without interruption.
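The opportunistic switching between baseline VISLAM tracking and BIM-guided refinement can be illustrated with a minimal Python sketch; the match-count threshold and the function names here are illustrative assumptions, not the authors' implementation:

```python
def select_pose(vislam_pose, matches, refine_fn, min_matches=12):
    """Return the refined pose only when enough reliable 2D-3D
    correspondences are available; otherwise fall back to the
    baseline VISLAM pose so tracking continuity is preserved."""
    if len(matches) >= min_matches:
        return refine_fn(matches), True   # BIM-guided correction applied
    return vislam_pose, False             # refinement skipped this frame
```

In low-texture or symmetric scenes the fallback branch is taken, so the device keeps its VISLAM pose and only the drift-correction step is unavailable.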

3.4. Error Analysis

The final stage of the methodology evaluates the accuracy of camera pose refinement by quantifying the alignment between the real and synthetic image domains. Using the transformation estimated by the PnP algorithm [56], 3D points from the BIM were reprojected into the 2D image plane and compared with their corresponding feature locations in the CycleGAN-translated HoloLens images. The discrepancy between these points was measured using the Root Mean Square Error (RMSE) which provides a quantitative indicator of alignment accuracy and is computed as the Euclidean distance between projected and observed feature coordinates [57].
Two error metrics were used to assess the contribution of the proposed pipeline. The first, referred to as RMSE-before, represents the reprojection error associated with the initial HoloLens VISLAM pose prior to any domain adaptation or PnP refinement. The second metric, RMSE-after, represents the error following the application of CycleGAN-based image translation and PnP-based pose correction. A substantial reduction in RMSE-after relative to RMSE-before indicates that the combined domain adaptation and geometric refinement stages successfully mitigate accumulated drift and improve spatial correspondence between the virtual and physical environments.
This comparison provides direct evidence of the effectiveness of the proposed workflow in improving camera pose accuracy and supports its suitability for MR localization tasks where reliable alignment between BIM and real-world imagery is essential. It is important to note, however, that reprojection RMSE in this study is used to assess geometric consistency rather than absolute localization accuracy. The 2D correspondences employed in the RMSE computation are obtained through feature matching between CycleGAN-translated images and BIM renderings and may therefore contain inaccuracies.
To reduce the influence of unreliable correspondences, only feature matches that satisfy geometric constraints during the PnP estimation are retained, and image pairs with insufficient or unstable matches are excluded from the evaluation. Accordingly, the reported RMSE reflects the degree to which the refined camera pose is internally consistent with the BIM geometry and image observations. A reduction in RMSE after pose refinement, therefore, indicates effective suppression of accumulated drift relative to the BIM reference, rather than absolute positional accuracy with respect to an external ground-truth coordinate system.

4. Experiments

This study was conducted to evaluate the consistency and effectiveness of the proposed localization enhancement methodology within a controlled offline setting. MATLAB (version R2023b) served as the primary computational environment due to its versatile image processing, computer vision, and mathematical analysis capabilities. All datasets, including those captured from Unity, the HoloLens, and CycleGAN, were imported and organized within MATLAB to enable an integrated and iterative experimental workflow.

4.1. HoloLens Data Acquisition

Before initiating formal image acquisition, the head-mounted HoloLens was moved along a predefined trajectory within a residential indoor environment. This preliminary phase was essential for allowing the HoloLens to build a consistent spatial understanding of the environment, thereby minimizing tracking drift and improving the accuracy of pose estimation during actual image capture. The goal was to establish a stable operational context, which is critical for reliable data acquisition in real-world MR scenarios. It should be noted, however, that such an initialization phase is not always feasible in practical deployments.
Following this initialization, the HoloLens device captured 1454 real-world RGB images at a consistent frame rate of 30 frames per second (fps). The trajectory followed by the operator is illustrated in Figure 4, with the start and end locations designated as point “A.” The chosen path covered varied lighting, geometry, and material conditions within the indoor environment.
Figure 4. HoloLens Trajectory (white points) with the start and end locations as point “A” and two abrupt turns over a short distance as point “B”.
A significant challenge was encountered in a narrow corridor denoted by “B” in Figure 4. This section involved two abrupt turns over a short distance, which posed difficulties for the HoloLens in maintaining accurate pose estimation. Consequently, 46 images from this section were deemed unreliable due to incorrect or missing pose data. Despite multiple attempts to re-capture data in this specific corridor, the localization failures persisted, and the associated frames were ultimately excluded from the final dataset.
The dataset, post-processing, included RGB images, corresponding camera pose information in the HoloLens local coordinate system, and spatially contextualized point cloud segments generated from the HoloLens’ internal depth sensors. These elements together formed a foundational multimodal dataset necessary for subsequent alignment, synthetic data generation, and evaluation stages.

4.2. Registration of HoloLens Point Cloud with BIM

It was crucial to align the HoloLens and BIM coordinate systems to capture BIM images within Unity. This alignment step provided the spatial transformation required to convert real-world camera poses into the coordinate space used by the BIM, thereby ensuring consistency between synthetic and real-world datasets.
The initial step in this alignment process involved merging several segmented point clouds obtained from HoloLens image segments into a single cohesive 3D point cloud. This comprehensive point cloud was imported into CloudCompare software (version 2.12 alpha), which was used to perform a two-step registration process. The first step involved a coarse alignment using point-pair registration, allowing rough alignment based on manually selected reference features. The second step involved fine-tuning through ICP registration, which minimized the Euclidean distance between corresponding point features in the merged HoloLens point cloud and the BIM-derived point cloud (Figure 5). The final registration achieved an RMSE of 0.024. The resulting transformation matrix, denoted T_HC, maps HoloLens spatial data into the BIM coordinate system:
$$T_{HC} = \begin{bmatrix} 1.000 & 0.006 & 0.006 & 1.0395 \\ 0.035 & 0.001 & 1.000 & 0.479 \\ 0.007 & 1.000 & 0.000 & 1.1738 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Figure 5. Registered point cloud (color) to BIM (green).
This matrix served as a critical spatial bridge, enabling the transformation of all HoloLens poses into the BIM’s reference frame for direct comparison and data fusion. Although this step was performed manually in this work, a range of algorithms exists that can facilitate automated registration in real-time applications.
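Applying T_HC to a point expressed in the HoloLens frame is a single homogeneous multiplication. A minimal NumPy sketch, using the matrix values reported above:

```python
import numpy as np

# T_HC as reported above (HoloLens frame -> BIM frame)
T_HC = np.array([
    [1.000, 0.006, 0.006, 1.0395],
    [0.035, 0.001, 1.000, 0.479 ],
    [0.007, 1.000, 0.000, 1.1738],
    [0.0,   0.0,   0.0,   1.0   ],
])

def to_bim(p_holo):
    """Map a 3D point from the HoloLens frame into the BIM frame."""
    p_h = np.append(np.asarray(p_holo, float), 1.0)  # homogeneous coordinates
    return (T_HC @ p_h)[:3]
```

The same multiplication, applied to each 4 × 4 HoloLens pose matrix, brings the full trajectory into the BIM reference frame.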

4.3. Unity Data Preparation and Image Capture

After generating the BIM, it was subsequently imported into Unity for the purpose of capturing BIM imagery. To ensure that the BIM and real images have identical geometry, the same intrinsic camera settings as those of the HoloLens RGB camera used to capture real-world images were integrated into the virtual camera in Unity. The series of BIM images was captured in Unity as explained in Section 3.2.

4.4. CycleGAN Training for Domain Adaptation

In order to reduce domain discrepancies between real and synthetic images, a CycleGAN model was trained for unpaired image-to-image translation [44]. The dataset comprised BIM-rendered images and HoloLens images with no one-to-one pairing; these were split in a 9:1:1 ratio into training, validation, and test subsets. Following the original architecture, the generators used nine residual blocks and the discriminators followed the PatchGAN paradigm. Instance normalization was applied consistently across both generators and discriminators to stabilize style transfer and preserve structure. The training objective balanced three loss components: an adversarial loss to encourage realism in translated images, a cycle-consistency loss to enforce that mapping to the other domain and back returns the original image, and an identity loss to prevent unnecessary style shifts when the input already lies in the target domain. During training, various checkpoints were evaluated using the validation set to assess the trade-off between visual realism and structural fidelity. Ultimately, the model at epoch 200 produced the best style-transferred results: renderings that most closely matched real-world textures while maintaining the geometric integrity of BIM structures (Figure 6a,b). Notably, the trained model was observed to suppress non-BIM semantic content, such as furniture and pendant lights, since these elements do not appear in the BIM domain, resulting in style-transferred images in which structural geometry is preserved while nonstructural objects are visually de-emphasized.
Figure 6. (a) Real images, (b) CycleGAN style-transferred images.
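The composition of the three loss terms can be sketched as follows. The generators here are stand-in callables operating on plain arrays, the L1 form of the cycle and identity terms follows the original CycleGAN formulation, and the weights (10 for cycle consistency, 5 for identity) are the defaults of the original implementation, not values reported in this study:

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two image arrays."""
    return float(np.mean(np.abs(a - b)))

def cyclegan_objective(x, y, G, F, adv_g, adv_f, lam_cyc=10.0, lam_id=5.0):
    """Combine the three CycleGAN loss terms for one batch.
    G: real -> BIM-style generator, F: BIM-style -> real generator.
    adv_g, adv_f: precomputed adversarial losses (scalars)."""
    cyc = l1(F(G(x)), x) + l1(G(F(y)), y)   # cycle-consistency in both directions
    idt = l1(G(y), y) + l1(F(x), x)          # identity loss on in-domain inputs
    return adv_g + adv_f + lam_cyc * cyc + lam_id * idt
```

With identity generators the cycle and identity terms vanish, leaving only the adversarial contribution, which makes the weighting of the terms easy to verify in isolation.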

4.5. Image Rescaling

Although the original image resolution for both BIM and HoloLens datasets was 760 × 428 pixels, CycleGAN’s architecture resized all training inputs to 256 × 256 pixels. To restore spatial consistency, a rescaling operation was conducted using factors of 2.968 (width) and 1.672 (height), bringing the generated CycleGAN outputs back to their original resolution. Nearest-neighbor interpolation was employed for the resampling (Figure 7).
Figure 7. Image Rescaling.
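The nearest-neighbor rescaling step can be sketched in NumPy as follows; the index mapping shown is one common convention, and the exact resampling details of the MATLAB implementation may differ:

```python
import numpy as np

def rescale_nn(img, out_w, out_h):
    """Nearest-neighbor rescaling of an image array (H x W [x C])."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)  # source row per output row
    cols = (np.arange(out_w) * w / out_w).astype(int)  # source col per output col
    return img[rows][:, cols]

# 256 x 256 CycleGAN output restored to 760 x 428 (factors ~2.97 and ~1.67)
restored = rescale_nn(np.zeros((256, 256, 3)), 760, 428)
```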

4.6. Image Matching

The rescaled CycleGAN-transformed images and their corresponding BIM images underwent feature matching using the KAZE [54] algorithm, implemented in MATLAB with the “detectKAZEFeatures” function. The images were first converted to grayscale, and keypoints were extracted using KAZE, which is known for its robustness to non-linear illumination and scale changes. The descriptors were matched between image pairs, and the matched keypoints were visualized and color-coded for interpretability (Figure 8).
Figure 8. Keypoints ((Left): CycleGAN style transferred image; (Right): Corresponding BIM image).
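As an illustration of the matching stage, the following NumPy sketch performs nearest-neighbor descriptor matching with a ratio test. This is a generic stand-in for the descriptor-matching behavior of MATLAB's feature-matching tools, and the ratio value is an assumption, not a parameter reported in this study:

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Nearest-neighbor matching with a ratio test.
    d1: (N, D) descriptors from image 1; d2: (M, D) from image 2.
    Returns index pairs (i, j) for unambiguous matches."""
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)   # distance to every candidate
        j, k = np.argsort(dists)[:2]             # best and second-best match
        if dists[j] < ratio * dists[k]:          # keep only unambiguous matches
            matches.append((i, j))
    return matches
```

The ratio test discards keypoints whose best and second-best candidates are similar, a common failure mode in the repetitive indoor geometry discussed above.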

4.7. PnP Pose Estimation

To refine the estimated camera poses and correct accumulated drift, the PnP algorithm was employed to compute the transformation between the image space and the BIM’s 3D coordinate system. Specifically, the “estimateWorldCameraPose” function in MATLAB was used to solve the PnP problem by aligning 2D image coordinates extracted from CycleGAN images with their corresponding 3D points from the BIM, as explained in Section 3.2.
The 3D spatial coordinates were retrieved from a pre-generated dataset of BIM coordinates exported during Unity rendering, while the 2D image coordinates were extracted through feature matching as outlined in Section 3.3. These correspondences were passed to the PnP solver, which uses the Perspective-Three-Point (P3P) algorithm as its underlying method. The P3P approach provides an efficient closed-form solution and is especially suitable when at least four 2D-3D point correspondences are available.
To enhance robustness against mismatches and noise, the M-estimator Sample Consensus (MSAC) [18,58] was used to reject outlier correspondences with reprojection errors exceeding 2 pixels. The MSAC implementation involved a maximum of 2000 iterations and a 99% confidence level, ensuring reliable pose estimation even in the presence of challenging visual conditions or erroneous matches.
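The MSAC scoring rule differs from plain RANSAC in that inliers contribute their actual reprojection error while outliers contribute a fixed penalty equal to the threshold. A simplified NumPy sketch of the scoring step (the hypothesis-generation loop and squared-error weighting used by full implementations are omitted):

```python
import numpy as np

def msac_score(errors, tau=2.0):
    """MSAC cost for one pose hypothesis: inliers (error < tau)
    add their reprojection error in pixels; outliers add tau.
    Lower scores indicate better hypotheses."""
    e = np.asarray(errors, float)
    inliers = e < tau
    score = float(np.sum(np.where(inliers, e, tau)))
    return score, inliers
```

Because outliers pay a bounded penalty rather than being ignored, hypotheses with many marginal inliers are ranked more reliably than under the pure inlier-counting rule of RANSAC.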

4.8. Reprojection of 3D Points

To validate the improvement in localization accuracy, the reprojection of 3D BIM points onto the 2D image plane was carried out before and after applying the PnP algorithm. This comparison was used to quantify the drift errors present in the initial HoloLens poses and to demonstrate the refinement achieved through the proposed method.
The initial camera poses (Rtran, Ttran) were extracted from HoloLens tracking data and used to project known 3D BIM coordinates into the 2D image plane, resulting in the initial set of reprojected points. The corrected camera poses (Rcam, Tcam) were estimated through the CycleGAN-enhanced PnP algorithm based on matched 2D-3D keypoint correspondences. Both sets of projections were computed using the intrinsic parameters of the HoloLens RGB camera, calibrated before experimentation.
The 2D correspondences were extracted from CycleGAN-translated images using geometric feature matching techniques (as described in Section 3.3). These 2D image points were compared with the reprojected BIM points derived from both the initial and corrected poses.
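Both reprojection passes reduce to the standard pinhole projection x ~ K(RX + t). A minimal NumPy sketch, with an illustrative intrinsic matrix rather than the calibrated HoloLens values:

```python
import numpy as np

def project(points3d, K, R, t):
    """Project N x 3 world points into the image plane with a
    pinhole model: x ~ K (R X + t)."""
    cam = points3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                    # apply intrinsic parameters
    return uv[:, :2] / uv[:, 2:3]     # perspective divide -> pixel coordinates
```

Running this once with (Rtran, Ttran) and once with (Rcam, Tcam) yields the two point sets whose discrepancy is evaluated in the next subsection.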
As illustrated in Figure 9, green points denote the projections based on the refined pose, representing the expected location of features in the absence of drift. In contrast, red points represent the projections from the initial HoloLens poses, highlighting the effect of accumulated drift.
Figure 9. Reprojected points, initial reprojected points (red), refined reprojected points (green).

4.9. Error Evaluation

The accuracy of the camera pose refinement was quantitatively evaluated by computing the RMSE between the 2D image correspondences and the reprojected points generated using both the initial and corrected poses, using the following formulas. RMSE-before was calculated for the initial HoloLens poses, whereas RMSE-after was derived using the refined values. Both values are expressed in image pixels.
$$\mathrm{RMSE\text{-}before} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert P_i^{C} - P_i^{\mathrm{before}} \right\rVert^2}$$
$$\mathrm{RMSE\text{-}after} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert P_i^{C} - P_i^{\mathrm{after}} \right\rVert^2}$$
where $P_i^{\mathrm{after}}$ is the reprojected point using the estimated camera pose (Rcam, Tcam), $P_i^{\mathrm{before}}$ is the reprojected point using the HoloLens pose (Rtran, Ttran), N is the number of inlier correspondences in each image pair, and $P_i^{C}$ is the corresponding 2D image point.
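The RMSE formulas above correspond directly to the following NumPy sketch, applied once per image pair to each set of reprojected points:

```python
import numpy as np

def rmse(obs, proj):
    """Reprojection RMSE between observed 2D features and
    reprojected BIM points (both N x 2 arrays, in pixels)."""
    d = np.asarray(obs, float) - np.asarray(proj, float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))
```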
These RMSE values were used to assess the geometric accuracy of the alignment process and to validate the impact of the proposed method in correcting accumulated drift. The 2D correspondences were treated as ground truth, and reductions in RMSE indicate improved localization performance.
The evaluation confirmed that the proposed CycleGAN-enhanced pose refinement pipeline significantly reduced trajectory drift and improved spatial alignment between real and virtual environments across the 1408 tested image pairs.

5. Results and Discussion

The comprehensive evaluation of the proposed methodology was conducted systematically, repeating the entire process for all 1408 captured image pairs. To ensure statistical significance and enhance the reliability of the findings, a MATLAB-based computational workflow was executed iteratively 100 times for each image pair. This thorough approach facilitated the calculation of the RMSE for each pair, effectively capturing the average reprojection error between the initial and refined camera poses.
Figure 10 illustrates the distribution of RMSE values for each image pair, providing a comparative analysis of pose estimation accuracy across the entire dataset. The red line in the graph represents the RMSE prior to the application of the PnP, with values ranging approximately from 1 to 90 pixels, indicating a substantial degree of drift. In contrast, the blue line illustrates the RMSE after the implementation of the PnP, showcasing a remarkable reduction in error to a range of 1 to 2 pixels. The vertical axis is scaled logarithmically to improve visibility, allowing a clearer comparison of the differences between the two phases of the methodology and highlighting the effectiveness of the pose refinement process.
Figure 10. RMSE in each image pair along the trajectory of the camera.
Gaps in the RMSE curves correspond to image pairs for which refinement could not be performed, because the feature-matching stage did not yield a sufficient number of reliable correspondences. These cases predominantly occurred in low-texture or symmetric environments where correspondence extraction remains inherently difficult. However, unlike prior BIM-MR localization methods that completely lose tracking under such conditions, the HoloLens VISLAM system maintained a usable trajectory throughout these intervals. These unrefined poses appear as gray points in Figure 11, indicating that only the drift-correction step was unavailable, while baseline tracking remained intact. The only exception is Location B in Figure 4, where coarse alignment was intentionally omitted to demonstrate an extreme failure case.
Figure 11. Colorized initial and final RMSE along the trajectory before and after PnP.
To visualize the spatial distribution of these outcomes, Figure 11 presents a colorized RMSE map along the camera trajectory. The plot shows both initial and refined reprojection errors at their corresponding spatial locations, making the impact of drift correction more interpretable within the context of the scene. The gray points denote positions where refinement could not be applied due to insufficient correspondence, yet the HoloLens continued to provide stable tracking with last known drift correction applied. This behavior contrasts with the substantially larger initial drift observed at the same locations prior to applying the proposed refinement pipeline. Together, these results demonstrate that while the refinement process is limited by the availability of geometric cues, the overall system maintains localization continuity and effectively mitigates drift whenever conditions permit.
It is important to note that not all 1408 image pairs were included in the RMSE analysis. Specifically, 398 image pairs were excluded due to an insufficient number of reliable feature correspondences required for accurate PnP estimation. A minimum threshold of 10 inlier correspondences was established through empirical tuning, balancing analytical coverage against pose estimation accuracy. Lowering this threshold increased the total number of usable image pairs, but it also led to a higher occurrence of erroneous or spurious feature matches, thereby compromising the reliability of the estimated poses. This trade-off is illustrated in Figure 12, where an example of an image pair with erroneous correspondences is shown, emphasizing the necessity of enforcing a minimum inlier constraint. Importantly, these exclusions do not represent localization failure. As illustrated by the uncorrected trajectory points, the HoloLens VISLAM system remained active during these intervals, and only the refinement component was unavailable. This behavior contrasts with correspondence-only localization methods, which typically experience complete tracking failure under similar conditions. Additionally, although reprojection RMSE is reported in pixel units, it provides a meaningful indicator of pose consistency with respect to the BIM-derived geometric reference. For typical indoor environments and camera-to-surface distances of approximately 2–5 m, a reprojection error of 1–2 pixels corresponds to a translational misalignment on the order of a few millimeters to centimeters, depending on camera intrinsics and viewing geometry. Accordingly, the observed reduction from tens of pixels to approximately 1–2 pixels indicates substantial suppression of accumulated drift relative to the BIM model, even though absolute translation and rotation errors in metric units were not explicitly computed in this study.
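The pixel-to-metric relationship mentioned above can be sketched with a simple pinhole approximation: a lateral offset dX at depth Z projects to roughly e = f * dX / Z pixels, so dX ≈ e * Z / f. The focal length below is an assumed illustrative value, not the calibrated HoloLens intrinsics.

```python
def pixel_error_to_metres(e_px, depth_m, focal_px=1500.0):
    """Approximate lateral misalignment (m) implied by a reprojection error.

    Assumes a pinhole model; focal_px is a hypothetical focal length in pixels.
    """
    return e_px * depth_m / focal_px

# A 2 px error at typical indoor viewing distances of 2-5 m:
for depth in (2.0, 5.0):
    print(depth, pixel_error_to_metres(2.0, depth))   # a few millimetres
```

Under these assumptions, a 1–2 pixel residual stays in the millimeter range at the distances considered, consistent with the interpretation given in the text.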
Figure 12. (a) CycleGAN image, (b) BIM image, erroneous correspondence (circled), arrows indicate the matched points.
Further, the environment was segmented into distinct sections to facilitate a region-specific analysis. Section A, located at the beginning of the trajectory, demonstrated consistently low RMSE values, indicating high accuracy in localization during the initial phase (Figure 13).
Figure 13. Section analysis.
Minimal reprojection error was observed in Section A (Figure 14a) at the start of the trajectory. However, in sections involving turns, such as Turning Points “T” and “U”, a noticeable increase in RMSE was observed (Figure 14b,c). This rise in error is attributed to motion-induced blur and reduced image sharpness, which affected the CycleGAN-generated imagery and led to compromised localization from the HoloLens.
Figure 14. (a) Start of the trajectory (S), (b) Turning Point (T), (c) Turning Point (U), (d) Middle of section B, (e) Middle of Section C, (f) Middle of Section D, (g) Middle of Section E, (h) End of Section D (i), the beginning of Section E, (j) Middle of Section F, initial reprojected points (red), refined reprojected points (green).
Sections B and C, characterized by a wider hallway and fewer distinctive features, showed progressively increasing RMSE values. The larger scale and uniform textures of these regions posed challenges for HoloLens mapping and further contributed to the accumulation of pose estimation errors (Figure 14d,e).
In Section D, RMSE continued to increase due to compounding trajectory errors. Nevertheless, a sharp reduction in RMSE occurred at the start of Section E. This improvement resulted from the camera’s ability to view extended spatial features, allowing the relocalization process to self-correct based on the broader field of view and increased environmental cues (Figure 14f,g). Conversely, Section E’s confined geometry limited feature visibility, preventing effective relocalization (Figure 14h,i).
Section F presented some of the highest RMSE values across the entire dataset. This trend is associated with the prolonged accumulation of errors due to the drift and the limited number of distinctive features available for accurate relocalization (Figure 14j). The final segment, spanning image pairs from index 1351 to 1408, was particularly problematic. Many of these frames were excluded from RMSE calculations due to insufficient feature correspondences, often because the camera’s view was dominated by homogeneous elements such as plain doors or featureless walls (Figure 15).
Figure 15. Removed image pairs from calculating RMSE: (a) location V (b) location W (c) location Y (d) location Z.
Overall, correspondence failures predominantly occur in three scene categories:
  • Textureless or visually uniform corridors,
  • Repetitive architectural layouts, and
  • High-motion turning regions.
Importantly, correspondence failure in these areas did not equate to complete localization failure. While BIM-guided refinement could not be applied, the HoloLens VISLAM system continued to provide a continuous and operational pose estimate, with drift increasing gradually rather than abruptly. Operationally problematic localization, defined as sudden pose jumps or loss of tracking, was observed only in rare cases involving rapid motion or extreme lack of visual cues. This analysis confirms that the proposed framework functions as a selective drift-refinement mechanism, enhancing localization when visual conditions permit while safely deferring to the VISLAM baseline in feature-sparse environments.
To further quantify the distribution of pose estimation accuracy, we generated a Cumulative Distribution Function (CDF) plot of RMSE-before (red line) and RMSE-after (blue line), as illustrated in Figure 16. The x-axis is scaled logarithmically to enhance the visual representation of both lines. This plot provides an aggregated statistical perspective: before applying PnP, only 60% of RMSE values are around 20 pixels, while 90% of RMSE values exceed 30 pixels. After applying PnP, all RMSE values fall below 2 pixels. These results validate the effectiveness of the proposed localization refinement framework and underscore its robustness in significantly reducing drift errors that accumulate along the trajectory over distance and time.
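An empirical CDF of the kind plotted in Figure 16 can be computed as follows. This is an illustrative Python sketch; the RMSE samples below are placeholder values, not the measured errors from this study.

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted sample values and their empirical CDF ordinates."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)   # fraction of samples <= x[i]
    return x, y

# Placeholder RMSE samples in pixels, before and after refinement.
rmse_before = [5.0, 18.0, 25.0, 40.0, 90.0]
rmse_after  = [0.8, 1.1, 1.3, 1.6, 1.9]
x, y = empirical_cdf(rmse_after)
print(x[-1], y[-1])   # largest refined error and the CDF reaching 1.0
```

Plotting (x, y) for both sample sets on a logarithmic x-axis reproduces the style of comparison shown in Figure 16.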
Figure 16. CDF plot of alignment errors before and after PnP.
In this study, drift refers to the accumulation of pose estimation error over time in the HoloLens VISLAM trajectory, manifested as increasing misalignment between real-world images and the BIM-derived virtual scene. Rather than measuring drift directly in metric position or orientation units, drift correction is evaluated through reprojection error, which reflects the consistency between estimated camera pose and BIM geometry. A trajectory segment is considered effectively drift-corrected when reprojection RMSE is reduced to approximately 1–2 pixels following pose refinement, indicating that accumulated drift relative to the BIM reference has been suppressed. Accordingly, the term “drift-free” is used in a relative sense to denote negligible residual drift with respect to the BIM model, rather than absolute elimination of localization error in physical space.

6. Conclusions

This research presented a hybrid localization refinement framework that integrates HoloLens VISLAM tracking with BIM-based geometric alignment, CycleGAN-driven domain adaptation, and feature-based pose estimation. The primary contribution lies in demonstrating that accumulated drift in MR device trajectories, one of the most common limitations in extended MR operation, can be substantially reduced by leveraging style-transfer, feature-matching and PnP-based correction. Across 1408 image pairs, the proposed workflow consistently reduced reprojection error from tens of pixels to below two pixels whenever reliable feature correspondences were available. As a result, the approach improves spatial alignment between real and virtual environments and enhances the reliability of MR visualization for construction-related applications.
Unlike previous BIM-MR localization approaches that depend exclusively on feature correspondences and therefore fail in symmetric or textureless areas, our system maintains tracking through the HoloLens VISLAM, which remains active even when few or no geometric features are detected. The refinement stage then corrects accumulated drift only when sufficient geometric cues exist, ensuring that the system benefits from both continuous sensor-based tracking and BIM-informed correction. This structure clarifies the scope of the contribution and highlights how BIM-assisted refinement enhances, rather than competes with, current SLAM-based MR pipelines.
Beyond the technical contributions and quantitative improvements demonstrated, the outcomes of this study also have clear practical implications for multiple stakeholder groups. For MR system developers, the proposed framework illustrates how BIM-guided pose refinement can be integrated with existing VISLAM pipelines to mitigate accumulated drift without replacing native tracking mechanisms. For construction practitioners and facility managers, improved alignment stability enables more reliable MR visualization for inspection, coordination, and spatial decision-making tasks, reducing the need for repeated manual calibration during on-site operations. From a research perspective, the experimental analysis and identified limitations provide insight into the conditions under which domain adaptation and feature-based refinement are effective, highlighting opportunities to advance hybrid localization strategies that balance continuous sensor-based tracking with selective model-based correction. Collectively, these outcomes position the proposed method as a practical enhancement to current MR localization workflows rather than a standalone localization solution.
While the results show strong potential, several limitations still exist. First, the CycleGAN model was trained on a relatively small, fixed dataset, and its performance in dynamic or cluttered construction environments has not yet been tested. Second, although the KAZE feature extractor proved empirically effective under the nonlinear intensity variations introduced by style transfer, its performance was not evaluated using quantitative metrics such as repeatability, inlier ratio, or matching precision, and it remains limited in low-texture or highly repetitive environments, as is the case for all keypoint-based methods. Similarly, while quantitative image-quality metrics for CycleGAN could provide complementary insight, these analyses were not included because the focus of this research is on improving localization accuracy rather than assessing the standalone performance of the style-transfer or feature-extraction components. Incorporating extensive metric-based evaluations for CycleGAN and KAZE would broaden the scope of the study and risk diverting attention from its central objective, which is to demonstrate the effectiveness of the proposed drift-refinement pipeline. For this reason, both components were evaluated indirectly through their contribution to reducing reprojection error in the final pose estimation stage, and more comprehensive quantitative evaluations are identified as important directions for future work.
Third, coarse alignment between BIM and HoloLens point clouds relied on manual selection of reference points. Although this step served only as initialization, operator variability may influence the starting transformation; automated registration strategies, such as ICP-based or learning-based alignment, could replace this step in future implementations without altering the core drift-refinement pipeline. Fourth, the current pipeline was executed offline using MATLAB, which limits its immediate applicability for real-time MR deployment given the computational constraints of the HoloLens and similar mobile devices. Fifth, absolute translation and rotation errors were not reported because reliable ground-truth poses in metric space were not available for the entire trajectory, and the focus of this work was on relative drift correction with respect to the BIM reference rather than absolute localization accuracy. Finally, the proposed BIM-guided refinement framework assumes that the available BIM reasonably represents the physical environment. While the model used in this study was constructed from laser scanning data and manually verified, a formal sensitivity analysis quantifying the impact of BIM geometric errors on pose refinement accuracy was not conducted. As a result, the reported improvements reflect drift reduction relative to the available BIM accuracy rather than absolute localization performance.
These limitations outline several important directions for future research. A primary priority is expanding CycleGAN training to include more diverse and dynamic construction datasets, accompanied by quantitative evaluations of image translation quality using complementary metrics alongside reprojection error. Further improvements may be achieved by incorporating learning-based keypoint detection or multi-view geometric constraints to enhance correspondence extraction in texture-poor and repetitive environments. Replacing manual coarse alignment with automated registration strategies suitable for on-device execution will also improve scalability and practical deployment. In addition, benchmarking the proposed framework against emerging BIM-MR localization approaches and conducting systematic evaluations of BIM geometric error sensitivity, including controlled perturbations and as-built deviations, would provide deeper insight into performance robustness.
Although reprojection RMSE serves as a meaningful indicator of geometric alignment consistency, future evaluations could be strengthened through independent ground-truth validation, such as manually verified correspondences, fiducial targets, or controlled benchmark datasets to quantify absolute pose accuracy. Addressing these research directions will support the development of a fully integrated, real-time localization enhancement framework that advances MR visualization reliability, construction progress monitoring, and spatial decision-making in complex built environments.

Author Contributions

Conceptualization, M.Z.A.M., D.S., K.K. and D.A.; methodology, M.Z.A.M.; software, M.Z.A.M.; validation, M.Z.A.M., D.S. and D.A.; formal analysis, M.Z.A.M.; investigation, M.Z.A.M.; resources, D.S. and K.K.; data curation, M.Z.A.M.; writing—original draft preparation, M.Z.A.M.; writing—review and editing, D.S., K.K. and D.A.; visualization, M.Z.A.M.; supervision, D.S. and K.K.; project administration, D.S.; funding acquisition, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the University of Melbourne (Application Reference: 644655. 2020). The authors did not receive funding to cover article processing charges.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request. Source code is publicly available at https://github.com/Mabdulmuthal/MR-localization (accessed on 17 February 2026).

Acknowledgments

The authors express their sincere appreciation to the University of Melbourne for providing access to facilities, software, and computational resources that supported this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AR	Augmented Reality
APR	Absolute Pose Regression
AEC	Architecture, Engineering and Construction
BIM	Building Information Modeling
CDF	Cumulative Distribution Function
CNN	Convolutional Neural Network
CycleGAN	Cycle-Consistent Generative Adversarial Network
DoF	Degrees of Freedom
FPS	Frames per Second
GNSS	Global Navigation Satellite System
LoD	Level of Detail
MR	Mixed Reality
MSAC	M-Estimator Sample Consensus
OSM	OpenStreetMap
PnP	Perspective-n-Point
P3P	Perspective-Three-Point
RMSE	Root Mean Square Error
RTK	Real-Time Kinematic
R2S-PoseNet	Real-to-Synthetic PoseNet
SLAM	Simultaneous Localization and Mapping
SMOTE	Synthetic Minority Over-sampling Technique
S2R-PoseNet	Synthetic-to-Real PoseNet
Trans-CWGAN	Transfer Conditional Wasserstein Generative Adversarial Network
UWB	Ultra-Wideband
VISLAM	Visual-Inertial Simultaneous Localization and Mapping
VR	Virtual Reality

References

  1. Muthalif, M.; Shojaei, D.; Khoshelham, K. A review of augmented reality visualization methods for subsurface utilities. Adv. Eng. Inform. 2022, 51, 101498. [Google Scholar] [CrossRef]
  2. Osadchyi, V.; Valko, N.; Kuzmich, L. Using augmented reality technologies for STEM education organization. In Proceedings of the International Conference on Mathematics, Science and Technology Education, Kryvyi Rih, Ukraine, 15–17 October 2020; Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021. [Google Scholar]
  3. Gharaibeh, M.K.; Gharaibeh, N.K.; Khan, M.A.; Abu-Ain, W.A.K.; Alqudah, M.K. Intention to Use Mobile Augmented Reality in the Tourism Sector. Comput. Syst. Sci. Eng. 2021, 37, 187–202. [Google Scholar] [CrossRef]
  4. Liu, B.; Ding, L.; Wang, S.; Meng, L. Designing Mixed Reality-Based Indoor Navigation for User Studies. KN—J. Cartogr. Geogr. Inf. 2022, 72, 129–138. [Google Scholar] [CrossRef]
  5. Livingston, M.A.; Ai, Z.; Karsch, K.; Gibson, G.O. User interface design for military AR applications. Virtual Real. 2010, 15, 175–184. [Google Scholar] [CrossRef]
  6. Bouchlaghem, D.; Shang, H.; Whyte, J.; Ganah, A. Visualisation in architecture, engineering and construction (AEC). Autom. Constr. 2005, 14, 287–295. [Google Scholar] [CrossRef]
  7. Shin, D.H.; Dunston, P.S. Identification of application areas for Augmented Reality in industrial construction based on technology suitability. Autom. Constr. 2008, 17, 882–894. [Google Scholar] [CrossRef]
  8. Irizarry, J.; Karan, E.P.; Jalaei, F. Integrating BIM and GIS to improve the visual monitoring of construction supply chain management. Autom. Constr. 2013, 31, 241–254. [Google Scholar] [CrossRef]
  9. Volk, R.; Stengel, J.; Schultmann, F. Building Information Modeling (BIM) for existing buildings—Literature review and future needs. Autom. Constr. 2014, 38, 109–127. [Google Scholar] [CrossRef]
  10. Garbett, J.; Hartley, T.; Heesom, D. A multi-user collaborative BIM-AR system to support design and construction. Autom. Constr. 2021, 122, 103487. [Google Scholar] [CrossRef]
  11. Li, X.; Yi, W.; Chi, H.-L.; Wang, X.; Chan, A.P. A critical review of virtual and augmented reality (VR/AR) applications in construction safety. Autom. Constr. 2018, 86, 150–162. [Google Scholar] [CrossRef]
  12. Alizadehsalehi, S.; Hadavi, A.; Huang, J.C. From BIM to extended reality in AEC industry. Autom. Constr. 2020, 116, 103254. [Google Scholar] [CrossRef]
  13. Radanovic, M.; Khoshelham, K.; Fraser, C.S.; Acharya, D. Continuous BIM Alignment for Mixed Reality Visualisation. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, X-1/W1-2023, 279–286. [Google Scholar] [CrossRef]
  14. Abdul Muthalif, M.Z.; Shojaei, D.; Khoshelham, K. Interactive Mixed Reality Methods for Visualization of Underground Utilities. PFG—J. Photogramm. Remote Sens. Geoinf. Sci. 2024, 92, 741–760. [Google Scholar] [CrossRef]
  15. Muthalif, M.Z.A.; Shojaei, D.; Khoshelham, K. Resolving Perceptual Challenges of Visualizing Underground Utilities in Mixed Reality. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, XLVIII-4/W4-2022, 101–108. [Google Scholar] [CrossRef]
  16. Albahbah, M.; Kıvrak, S.; Arslan, G. Application areas of augmented reality and virtual reality in construction project management: A scoping review. J. Constr. Eng. Manag. Innov. 2021, 4, 151–172. [Google Scholar] [CrossRef]
  17. Hsieh, C.-C.; Chen, H.-M.; Wang, S.-K. On-site Visual Construction Management System Based on the Integration of SLAM-based AR and BIM on a Handheld Device. KSCE J. Civ. Eng. 2023, 27, 4688–4707. [Google Scholar] [CrossRef]
  18. Ramezani, M.; Acharya, D.; Gu, F.; Khoshelham, K. Indoor Positioning by Visual-Inertial Odometry. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, IV-2/W4, 371–376. [Google Scholar] [CrossRef]
  19. Williams, G.; Gheisari, M.; Chen, P.-J.; Irizarry, J. BIM2MAR: An Efficient BIM Translation to Mobile Augmented Reality Applications. J. Manag. Eng. 2015, 31, A4014009. [Google Scholar] [CrossRef]
  20. Ramezani, M.; Khoshelham, K.; Fraser, C. Pose estimation by Omnidirectional Visual-Inertial Odometry. Robot. Auton. Syst. 2018, 105, 26–37. [Google Scholar] [CrossRef]
  21. Qin, J.; Li, M.; Liao, X.; Zhong, J. Accumulative Errors Optimization for Visual Odometry of ORB-SLAM2 Based on RGB-D Cameras. ISPRS Int. J. Geo-Inf. 2019, 8, 581. [Google Scholar] [CrossRef]
  22. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
  23. Acharya, D. Visual Indoor Localisation Using a 3D Building Model. Ph.D. Thesis, University of Melbourne, Melbourne, VIC, Australia, 2020. [Google Scholar]
  24. Acharya, D.; Khoshelham, K.; Winter, S. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS J. Photogramm. Remote Sens. 2019, 150, 245–258. [Google Scholar] [CrossRef]
  25. Acharya, D.; Ramezani, M.; Khoshelham, K.; Winter, S. BIM-Tracker: A model-based visual tracking approach for indoor localisation using a 3D building model. ISPRS J. Photogramm. Remote Sens. 2019, 150, 157–171. [Google Scholar] [CrossRef]
  26. Chen, K.; Chen, W.; Li, C.T. A BIM-based location aware AR collaborative framework for facility maintenance management. J. Inf. Technol. Constr. 2019, 24, 360–380. [Google Scholar]
  27. Mahmood, B.; Han, S.; Lee, D.-E. BIM-Based Registration and Localization of 3D Point Clouds of Indoor Scenes Using Geometric Features for Augmented Reality. Remote Sens. 2020, 12, 2302. [Google Scholar] [CrossRef]
  28. Vermandere, J.; Bassier, M.; Vergauwen, M. Two-Step Alignment of Mixed Reality Devices to Existing Building Data. Remote Sens. 2022, 14, 2680. [Google Scholar] [CrossRef]
  29. Chen, J.; Li, S.; Lu, W.; Liu, D.; Hu, D.; Tang, M. Markerless Augmented Reality for Facility Management: Automated Spatial Registration based on Style Transfer Generative Network. In Proceedings of the 38th International Symposium on Automation and Robotics in Construction (ISARC), Dubai, United Arab Emirates, 2–4 November 2021; International Association for Automation and Robotics in Construction (IAARC): Oulu, Finland, 2021. [Google Scholar]
  30. Chen, J.; Li, S.; Liu, D.; Lu, W. Indoor camera pose estimation via style-transfer 3D models. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 335–353. [Google Scholar] [CrossRef]
  31. Acharya, D.; Tatli, C.J.; Khoshelham, K. Synthetic-real image domain adaptation for indoor camera pose regression using a 3D model. ISPRS J. Photogramm. Remote Sens. 2023, 202, 405–421. [Google Scholar] [CrossRef]
  32. Saito, S.; Hiyama, A.; Tanikawa, T.; Hirose, M. Indoor Marker-based Localization Using Coded Seamless Pattern for Interior Decoration. In Proceedings of the 2007 IEEE Virtual Reality Conference, Charlotte, NC, USA, 10–14 March 2007; IEEE: New York, NY, USA, 2007. [Google Scholar]
  33. Einizinab, S.; Khoshelham, K.; Winter, S.; Christopher, P. Offset-Based Marker Placement for BIM Alignment in Mixed Reality. In Proceedings of the 2023 IEEE International Conference on Image Processing Challenges and Workshops (ICIPCW), Kuala Lumpur, Malaysia, 8–11 October 2023; IEEE: New York, NY, USA, 2023. [Google Scholar]
  34. Abhishek, M.T.; Aswin, P.; Akhil, N.C.; Souban, A.; Muhammedali, S.K.; Vial, A. Virtual Lab Using Markerless Augmented Reality. In Proceedings of the 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Wollongong, NSW, Australia, 4–7 December 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
  35. Scargill, T. Context-Aware Markerless Augmented Reality for Shared Educational Spaces. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
  36. Jinyu, L.; Bangbang, Y.; Danpeng, C.; Nan, W.; Guofeng, Z.; Hujun, B. Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Virtual Real. Intell. Hardw. 2019, 1, 386–410. [Google Scholar] [CrossRef]
  37. Hansen, L.H.; Fleck, P.; Stranner, M.; Schmalstieg, D.; Arth, C. Augmented Reality for Subsurface Utility Engineering, Revisited. IEEE Trans. Vis. Comput. Graph. 2021, 27, 4119–4128. [Google Scholar] [CrossRef]
  38. Messi, L.; Spegni, F.; Vaccarini, M.; Corneli, A.; Binni, L. Seamless Augmented Reality Registration Supporting Facility Management Operations in Unprepared Environments. J. Inf. Technol. Constr. 2024, 29, 1156–1180. [Google Scholar] [CrossRef]
  39. Acharya, D.; Roy, S.S.; Khoshelham, K.; Winter, S. A Recurrent Deep Network for Estimating the Pose of Real Indoor Images from Synthetic Image Sequences. Sensors 2020, 20, 5492. [Google Scholar] [CrossRef] [PubMed]
  40. Sattler, T.; Zhou, Q.; Pollefeys, M.; Leal-Taixe, L. Understanding the limitations of CNN-based absolute camera pose regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; IEEE: New York, NY, USA, 2019. [Google Scholar]
  41. Ha, I.; Kim, H.; Park, S.; Kim, H. Image-based Indoor Localization using BIM and Features of CNN. In Proceedings of the 35th International Symposium on Automation and Robotics in Construction (ISARC), Berlin, Germany, 20–25 July 2018; IAARC Publications: Waterloo, ON, Canada, 2018; pp. 1–4. [Google Scholar]
  42. Einizinab, S.; Khoshelham, K.; Winter, S.; Christopher, P. Camera Pose Refinement for Precise BIM Alignment in Mixed Reality Visualization. J. Comput. Civ. Eng. 2025, 39, 04025072. [Google Scholar] [CrossRef]
  43. Boan, T.; Jiajun, L.; Bosché, F. Autonomous Mixed Reality Framework for Real-Time Construction Inspection. J. Inf. Technol. Constr. (ITcon) 2025, 30, 852–874. [Google Scholar]
  44. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017. [Google Scholar]
  45. Wang, S. A hybrid SMOTE and Trans-CWGAN for data imbalance in real operational AHU AFDD: A case study of an auditorium building. Energy Build. 2025, 348, 116447. [Google Scholar] [CrossRef]
  46. Wang, S. Domain adaptation using transformer models for automated detection of exterior cladding materials in street view images. Sci. Rep. 2025, 16, 2696. [Google Scholar] [CrossRef]
  47. Sufiyan, D.; Win, L.S.T.; Win, S.K.H.; Tan, U.-X.; Foong, S. Direct Aerial Visual Localization using Panoramic Synthetic Images and Domain Adaptation. In Proceedings of the 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), Boston, MA, USA, 15–19 July 2024; IEEE: New York, NY, USA, 2024. [Google Scholar]
  48. Hong, Y.; Park, S.; Kim, H. Synthetic data generation for indoor scene understanding using BIM. in ISARC. In Proceedings of the 37th International Symposium on Automation and Robotics in Construction (ISARC), Kitakyushu, Japan, 27–28 October 2020; IAARC Publications: Waterloo, ON, Canada, 2020. [Google Scholar]
  49. Chen, H.; Yang, H.; Chen, J.; Zhang, S.; Jing, X. Bim Aided Indoor Camera Pose Estimation Based on Cross-Domain Image Retrieval; SSRN 4913115; SSRN: Rochester, NY, USA, 2024. [Google Scholar]
  50. Alnajjar, O.; Atencio, E.; Turmo, J. A systematic review of lean construction, BIM and emerging technologies integration: Identifying key tools. Buildings 2025, 15, 2884. [Google Scholar] [CrossRef]
  51. Büyüksalih, G.; Kan, T.; Özkan, G.E.; Meriç, M.; Isın, L.; Kersten, T.P. Preserving the Knowledge of the Past Through Virtual Visits: From 3D Laser Scanning to Virtual Reality Visualisation at the Istanbul Çatalca İnceğiz Caves. PFG—J. Photogramm. Remote Sens. Geoinf. Sci. 2020, 88, 133–146. [Google Scholar] [CrossRef]
  52. Ungureanu, D.; Bogo, F.; Galliani, S.; Sama, P.; Duan, X.; Meekhof, C.; Stühmer, J.; Cashman, T.J.; Tekina, B.; Schönberger, J.L.; et al. Hololens 2 research mode as a tool for computer vision research. arXiv 2020, arXiv:2008.11239. [Google Scholar] [CrossRef]
  53. Tareen, S.A.K.; Saleem, Z. A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In Proceedings of the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 3–4 March 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
  54. Zhang, P.; Yan, X. Application of Improved KAZE Algorithm in Image Feature Extraction and Matching. IEEE Access 2023, 11, 122625–122637. [Google Scholar] [CrossRef]
  55. Wu, Y.; Hu, Z. PnP problem revisited. J. Math. Imaging Vis. 2006, 24, 131–141. [Google Scholar] [CrossRef]
  56. Gao, X.-S.; Hou, X.-R.; Tang, J.; Cheng, H.-F. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 930–943. [Google Scholar]
  57. Lepetit, V.; Fua, P. Monocular Model-Based 3D Tracking of Rigid Objects; Now Publishers Inc.: Delft, The Netherlands, 2005. [Google Scholar]
  58. Aijazi, A.K.; Malaterre, L.; Trassoudaine, L.; Chateau, T.; Checchin, P. Automatic Detection and Modeling of Underground Pipes Using a Portable 3D LiDAR System. Sensors 2019, 19, 5345. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
