1. Introduction
An “imaging rover” is a portable surveying device with position and vision sensors for recording and processing visual data about an environment. In the surveying context, it is typically an integrated product combining a Global Navigation Satellite System (GNSS) receiver with an inertial measurement unit (IMU) and one or more cameras. The GNSS receiver provides accurate positioning from satellites, while the IMU and cameras further enhance accuracy, especially in GNSS-challenged areas. The cameras can also capture images of the environment, which can then be used to measure hard-to-reach points in that area. Because GNSS faces problems in urban or indoor environments, the IMU complements the GNSS receiver by providing high-frequency motion data in the event of signal dropouts. In addition, the IMU provides the orientation data required to estimate a complete six-degree-of-freedom (6DoF) pose, utilizing its integrated accelerometer and gyroscope components [
1]. The combination of an IMU and GNSS through computer-aided integration algorithms is referred to as a GNSS inertial navigation system (GNSS/INS) [
2]. Research in mobile mapping and SLAM has demonstrated that integrating cameras or time-of-flight sensors with GNSS/INS units can significantly reduce drift and improve near-range accuracy in environments with partial satellite visibility [
3,
4,
5,
6]. Although most of these works focus on vehicle-mounted or robotic systems, the same principles apply to compact imaging rovers, where synchronized visual and inertial data can support short-range 3D reconstruction and improve robustness during GNSS interruptions. Hence, imaging rovers provide a robust positioning solution for diverse surveying and mapping environments, from Unmanned Aerial Vehicles (UAVs) to ground surveying equipment [
7]. For automotive and mobile mapping applications, this solution is versatile and is already being used in several studies [
7,
8,
9] and devices already available in the market, such as Trimble MX7 [
10], RIEGL VZ-400i [
11], ViDoc [
12] (a smartphone add-on with LiDAR), 3D Image vector [
13], Phantom 4 RTK [
14], and others.
Although INS for mobile mapping applications is not new, the concept has been developing significantly for ground-based surveying since 2017. It was first proposed by [
15] and then by [
16]. Following this, Leica Geosystems AG (Heerbrugg, Switzerland) introduced the GS18 I, a GNSS sensor with a built-in 1.2 MP camera that enables visual positioning (allowing points to be measured in captured images without straightening the pole) [
17]. Subsequently, several other GNSS rovers offering visual positioning, such as the vRTK [
18], INSIGHT V1 [
19] and RS10 [
20] also came to the market. Various performance studies have shown that the GS18 I performs consistently well in terms of survey-grade accuracy and reliability across a range of environments [
21,
22]. Currently, few equivalent systems combine totalstation tracking with integrated inertial and imaging components. However, when using the GS18 I for image measurements, some conditions must be observed [
22]. For example, the GS18 I must receive sufficient GNSS signals throughout the measurement; if GNSS satellite tracking is lost, the acquisition stops automatically. Visual positioning should be avoided in darkness or when the camera faces the sun, as too little detail can be detected in the captured images to correlate them. In addition, the object needs a non-repetitive texture to allow the Structure from Motion (SFM) algorithm [
23] to function properly. SFM is a photogrammetric technique that reconstructs three-dimensional structures from two-dimensional image sequences, relying on distinct visual features to estimate camera motion and object geometry. Furthermore, for best results, images should be captured from 2–10 m away: distances below 2 m may cause blurring due to the fixed focus, while distances above 10 m reduce accuracy due to the camera's low resolution. Images taken outside this range may yield less precise measurements or may prevent point placement altogether.
In addition to complementing GNSS solutions, Leica Geosystems AG introduced the AP20 AutoPole in 2022, a tiltable pole that complements totalstation workflows [
24]. Totalstations excel in providing precise angle measurements, integrating angle and distance data in one device, and operating effectively where GNSS signals are unreliable or obstructed [
25]. The AP20 automatically adjusts the inclination and height of the pole, eliminating the need to level the pole and separately record the height of the pole during surveying work. The AP20 also has integrated target identification (TargetID), which helps to ensure that the correct target is detected even if there are obstacles such as people or vehicles in the vicinity that could cause the totalstation to lose sight of the target. Currently, there are no direct equivalents from other manufacturers that offer the same combination of features specifically for totalstations. However, the survey equipment industry is rapidly evolving, and there is a need to further enhance the functioning and accuracy of these instruments.
For enhanced visual positioning and measurement accuracy, an idea could be to integrate a Time of Flight (ToF) camera in the GS18 I and AP20. ToF cameras provide precise depth measurements by calculating the time it takes for light to travel to an object and back [
26]. This depth information enables real-time 3D point cloud generation. ToF cameras also work well in low light or even complete darkness, since they provide their own illumination. The accuracy of ToF cameras exceeds that of any other depth-sensing technology except structured-light cameras, reaching 1 mm to 1 cm depending on the operating range of the camera [
27]. They are also significantly more compact and faster, and have lower power consumption, than other depth-sensing technologies [
27]. Combining real-time 3D data from a ToF camera with GNSS and tilt measurements could therefore address the challenges associated with GNSS limitations while extending the measurement range and improving visibility in low-light/dark conditions. It might also increase the operator’s safety and the measurement accuracy, especially for single point measurements without line of sight, or increase efficiency by speeding up data collection by reducing time spent in the field. This improved ease of use could also provide a competitive advantage over other products on the market.
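As a brief illustration of the ToF principle described above, the following Python sketch converts a round-trip light travel time into depth and computes the unambiguous range of a continuous-wave modulation frequency. The numbers are purely illustrative and are not taken from the Blaze 101 datasheet.

```python
# Illustrative arithmetic for the ToF principle: depth is half the
# distance travelled by the emitted light on its round trip.
C = 299_792_458.0  # speed of light [m/s]

def depth_from_round_trip(t_seconds: float) -> float:
    """Depth [m] from a measured round-trip time of flight."""
    return C * t_seconds / 2.0

def ambiguity_range(f_mod_hz: float) -> float:
    """Maximum unambiguous range [m] of a continuous-wave ToF camera
    modulated at f_mod_hz (the measured phase wraps every 2*pi)."""
    return C / (2.0 * f_mod_hz)

# A 10 m target corresponds to a round-trip time of roughly 66.7 ns:
print(depth_from_round_trip(66.7e-9))   # ~10.0 m
# A 15 MHz modulation frequency gives roughly 10 m unambiguous range:
print(ambiguity_range(15e6))            # ~9.99 m
```

This also illustrates why ToF depth precision is tied to timing precision: a nanosecond of timing error corresponds to about 15 cm of depth error.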
Therefore, the primary aim of this work is to investigate the integration of a ToF camera, using the Leica GS18 I and the AP20 as example platforms. For this purpose, the Blaze 101 ToF camera from Basler AG was selected based on its performance data and potential suitability for outdoor surveying applications [
28].
While prior research has explored GNSS-INS integration with passive cameras [
7,
8,
9,
15,
16] and ToF cameras have been used in robotics [
26], this study presents the first systematic evaluation of a ToF-camera-enabled Multi-Sensor-Pole (MSP) for terrestrial surveying in combination with either a totalstation or a GNSS receiver. In the following, the integration of the GS18 I with the Blaze camera is termed the “GS18 I-MSP”, while the integration of the AP20 with the Blaze camera is termed the “AP20-MSP”. Ref. [
29] presented a system comparable to our GS18 I-MSP and achieved around 5 cm error on distinct structures such as walls. Unlike other existing imaging rovers that rely on Structure from Motion (SFM) with passive 2D cameras (e.g., the GS18 I's built-in camera [
17]), the MSP leverages direct depth sensing, enabling operation in low-light and texture-less environments where SFM fails. Critically, we compare two fundamentally different hardware configurations: GNSS-INS-based (GS18 I-MSP) vs. totalstation-coupled (AP20-MSP), revealing a significant performance divergence (25 cm vs. 3 cm absolute errors) that informs optimal sensor selection for specific field conditions. This comparative analysis, coupled with hardware-synchronized triggering and open-source calibration protocols, constitutes the primary novelty beyond mere component integration.
The structure of this case study is as follows:
Section 2 covers a brief overview of the equipment and technologies used in this study.
Section 3 presents the methodology for the integration process and the testing procedures to evaluate the performance of the integrated system.
Section 4 sets forth the results and findings of the accuracy and performance tests, including specific metrics and comparative data in various use case examples.
Section 5 discusses the challenges and solutions associated with this work and summarizes the key outcomes of the case study and implications for future work.
4. Results
In this section, two types of accuracy measures are reported. “Relative distances” refer to the Euclidean distances between pairs of surveyed points. This metric evaluates the internal geometric consistency of the measurements and is independent of the national or global reference frame. In contrast, “absolute distances” refer to the coordinate differences between MSP-derived point positions and the reference coordinates in LV95. This second metric reflects the absolute positioning accuracy of the system with respect to the Swiss reference frame.
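The two accuracy measures can be made concrete with a short Python sketch: the relative error compares inter-point distances (independent of the reference frame), while the absolute error compares each point directly against its LV95 reference coordinates. The coordinates below are hypothetical placeholders, not values from the study.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical MSP-derived and reference (LV95) coordinates [m]
msp = {"hp1": (2600000.10, 1200000.05, 450.02),
       "hp3": (2600010.12, 1200000.00, 450.00)}
ref = {"hp1": (2600000.08, 1200000.06, 450.01),
       "hp3": (2600010.15, 1200000.02, 450.03)}

# Relative distance error: compare inter-point distances
# (insensitive to a shift of the whole cloud)
rel_err = dist(msp["hp1"], msp["hp3"]) - dist(ref["hp1"], ref["hp3"])

# Absolute error: compare each point against its reference coordinates
abs_err = {k: dist(msp[k], ref[k]) for k in msp}
```

Note that a systematic offset of the whole point cloud leaves the relative errors unchanged while inflating every absolute error, which is exactly the pattern discussed for the GS18 I-MSP below.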
Live single point measurement was tested using the AP20-MSP in a lab environment (
Figure 8), with six checkerboard targets attached to the wall. The Leica MS60 MultiStation provided reference measurements. Despite reasonable accuracy and precision at shorter distances (150–350 cm), the experiment revealed a time-synchronization issue between the pose integration and the point clouds: since the point cloud itself is not timestamped, the exact AP20 pose cannot be matched to the corresponding point cloud in the live setup. This mismatch has a greater influence at larger distances and prevented reliable live single point measurements, particularly at longer ranges. The results showed an increase in absolute error from 8 cm at 1.5 m distance to 30 cm at 9.5 m, following a near-linear trend, whereas the relative error remained within 3 cm for all measurements. This synchronization issue did not affect the outdoor tests, as timestamps were precisely matched during post-processing.
For outdoor testing with both setups, three targets were fixed on tripods at three different locations around the house: one at the front, one in the middle and one at the back courtyard of the house (
Figure 9). The whole area was scanned with the RTC360 3D laser scanner from Leica Geosystems AG, which can capture up to 2 million points per second with a 3D accuracy of 2.9 mm at 20 m distance. Therefore, the scan obtained from the RTC360, shown in
Figure 10, was used as a reference point cloud to compare the measurements later.
The corresponding synchronized point cloud taken by the AP20-MSP with a blue-white-red gradient scale is shown in
Figure 11a and the point cloud taken from the GS18 I-MSP is shown in
Figure 11b. When applied to a point cloud, the blue-white-red gradient encodes the distance of each point to another (reference) cloud. For example, in
Figure 11, blue points lie in or near the reference RTC360 cloud (i.e., around 0 cm), points approximately 10 cm from the reference cloud are coloured white, and points at or near 20 cm are coloured red. Points farther than 20 cm are shown in grey. Note that the point cloud from the GS18 I-MSP is largely grey, meaning that these points are more than 20 cm away from the reference cloud. Its patches also blend together worse than those from the AP20-MSP, probably as a consequence of the varying accuracy of the poses, whose influence grows the farther the points are from the MSP.
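The colour scale described here can be expressed as a simple mapping from cloud-to-cloud distance to RGB. This is a sketch of the visual encoding only, not the actual software used to render the figures.

```python
def gradient_colour(d_cm, mid=10.0, far=20.0):
    """Map a cloud-to-cloud distance [cm] onto the blue-white-red scale
    described in the text; distances beyond `far` are greyed out.
    Returns an (R, G, B) tuple with components in [0, 1]."""
    if d_cm > far:
        return (0.5, 0.5, 0.5)          # grey: beyond the colour scale
    if d_cm <= mid:                     # blue (0 cm) -> white (10 cm)
        t = d_cm / mid
        return (t, t, 1.0)
    t = (d_cm - mid) / (far - mid)      # white (10 cm) -> red (20 cm)
    return (1.0, 1.0 - t, 1.0 - t)

print(gradient_colour(0.0))    # -> (0.0, 0.0, 1.0)  blue
print(gradient_colour(10.0))   # -> (1.0, 1.0, 1.0)  white
print(gradient_colour(20.0))   # -> (1.0, 0.0, 0.0)  red
print(gradient_colour(35.0))   # -> (0.5, 0.5, 0.5)  grey
```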
With each MSP, multiple sets of data were recorded around the house. Three sets of measurements were taken with the AP20-MSP while nine sets were taken with the GS18 I-MSP with the following reference points:
Measured with Leica MS60: Points labelled as hp1, hp2, and hp3 were measured using the Leica MS60. The setup point of MS60 was calculated with a resection using three LFP3 points in the region and one HFP3 for height (LFP3 and HFP3 are official survey points in Switzerland that are managed by the municipality or a contractor of the municipality). The standard deviation for easting and northing was 1.2 cm.
Natural Points: seven natural features, b1 (building corner), b2 (balcony corner), b3 (railing corner), w1–w3 (window corners), and e1 (entrance), were extracted from the RTC360 reference point cloud and geo-referenced using laser scanning targets mounted on tripods. Point b3 was identified in the RTC360 cloud and measured by the GS18 I-MSP, but it was not recoverable in the AP20-MSP point cloud, likely due to occlusion behind the railing and motion-induced dropout during acquisition. A height offset of 6.5 cm exists from the round prism to the laser scanning target. The geo-referencing process had a final RMS of 1.2 cm.
Measured with GS18: Points labelled hp10, hp11, and hp12 were measured with the GS18 I on a different day due to technical difficulties, necessitating a second measurement campaign. To align this campaign with the other measurements, the laser-scanning targets were removed and the points were measured with the GS18 I. While the physical location of hp10–hp12 remained consistent across all measurements (both laser scanning and GS18 I), the necessary swap from large spherical laser-scanning targets to smaller totalstation reflectors introduced a known, measured vertical offset between the reference centres, which was later corrected. The 3D accuracy was below 5 cm for all measurements.
The accuracy expectations for the Multi-Sensor-Pole configurations vary based on the reference points and measurement methods. For the AP20-MSP, measurements taken with the MS60 are expected to have an accuracy combining the AP20's inherent accuracy with the 1.2 cm standard deviation of the MS60 setup, with measurements closely matching the totalstation positions. The Reference Cloud 1 calculations for the AP20 follow a similar pattern.
In fact, the measurements from the Blaze camera are noisier than those from the RTC360. The noise level might also depend on how fast the Multi-Sensor-Pole was moving while the point clouds were collected. However, synchronization with the poses from the GS18 I should be nearly perfect due to the matching of GS18 I images and intensity images from the Blaze camera. Likewise, the AP20 measurements arrive at almost exactly ten times the rate of the Blaze images (the IMU rate of the AP20 is 200 Hz, while positioning updates were coming at a 20 Hz measuring rate from the MS60). However, drag effects might have occurred when updating the 6DoF pose. A comparison of point clouds of a laser scanning target from the best sets, taken with the RTC360, AP20-MSP, and GS18 I-MSP, is shown in
Figure 12.
From the best datasets, some measurements were taken for comparison. The GeoCom command used in the GS18 I-MSP gave the measurement coordinates in the ECEF (Earth-Centred, Earth-Fixed) reference system, and the transformation from the global GNSS reference frame to the Swiss national system was performed in three steps. First, using the GeoCom command, the GS18 I outputs the rover position in the ECEF frame. Second, these coordinates are converted to the geodetic WGS84 system using the standard ellipsoidal parameters a = 6,378,137.0 m and 1/f = 298.257223563 via a coordinate-transformation library for Python (v 3.13.0). Third, the resulting latitude, longitude, and ellipsoidal height are projected into the Swiss projected coordinate system LV95 (EPSG:2056) using the official CH1903+ formulation. LV95 is the official Swiss coordinate reference frame for national surveying. The projection applies the FINELTRA grid shift model, ensuring consistency with Swiss federal geodetic standards; the FINELTRA transformation is reliable to within 2 mm across the entire country. All point clouds and distance measurements are therefore referenced in LV95, which enables direct comparison against surveying-grade ground-truth points measured by the MS60 totalstation. Among these measurements, the relative and absolute distances from both MSP setups have been calculated and are shown in
Section 4.1 and
Section 4.2. The relative distances are the distances calculated inside the point clouds, while the differences to the geo-referenced coordinates have been calculated as absolute distances.
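The ECEF-to-geodetic step of the transformation described above can be sketched in pure Python using the stated WGS84 ellipsoidal parameters. The iterative inversion below is a standard textbook method and not necessarily the exact routine used in the study; the final projection into LV95 (EPSG:2056) with the FINELTRA grid shift additionally requires swisstopo's official grid data (or a library such as pyproj) and is omitted here.

```python
import math

# WGS84 ellipsoid, as stated in the text
A = 6378137.0                 # semi-major axis [m]
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Geodetic latitude/longitude [deg] and ellipsoidal height [m] -> ECEF [m]."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    return ((n + h) * math.cos(lat) * math.cos(lon),
            (n + h) * math.cos(lat) * math.sin(lon),
            (n * (1.0 - E2) + h) * math.sin(lat))

def ecef_to_geodetic(x, y, z, iterations=10):
    """ECEF [m] -> geodetic latitude/longitude [deg] and ellipsoidal
    height [m], via fixed-point iteration on the latitude."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - E2))   # initial guess
    h = 0.0
    for _ in range(iterations):
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h

# Round trip for a hypothetical point in Switzerland
x, y, z = geodetic_to_ecef(47.0, 8.0, 500.0)
lat, lon, h = ecef_to_geodetic(x, y, z)
```

The iteration converges to sub-millimetre level within a few steps for terrestrial points, which is well below the centimetre-level errors discussed in this section.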
4.1. Relative Euclidean Distances
The differences in relative distances within the point clouds for the AP20-MSP are small, as shown in
Table 2. Here, “relative” refers only to distances between points, not to the coordinate origin. For example, the distance between reference points hp1 and hp3 is off by only 1.21 cm, while the largest deviation observed is −17.3 cm (hp1–b1). The standard deviation over all relative distances is 8.7 cm. This suggests that the AP20-MSP maintains good internal consistency and accuracy relative to the reference measurements. The relatively consistent and small errors across different point pairs also suggest good precision.
The GS18 I-MSP showed larger deviations in relative distances, with some differences exceeding 20 cm (see
Table 3). The largest observed deviation was 62.0 cm for w2–w3, and the standard deviation over all relative distances is 21.7 cm. This suggests that the internal consistency of the GS18 I-MSP measurements is less reliable than that of the AP20-MSP.
4.2. Absolute Distances
For the AP20-MSP, hp1 showed a 3D error of 2.69 cm, while w3 had a more substantial error of 20 cm. The absolute errors for signalized points such as hp1 and hp3 are within 3 cm, indicating that centimetre-level accuracy is achievable (
Table 4). Because the absolute errors we report in
Table 4 and
Table 5 are 17–33 cm, the contribution of the frame transformation is at least one order of magnitude smaller and can safely be neglected in the present discussion.
With the GS18 I-MSP, the absolute distance errors were notably larger: b1 had a 3D error of 32.5 cm, while hp13, which performed significantly better, still had an error of 17.5 cm (see
Table 5).
While points b1–b3, e1, and w1–w3 provide initial insight into natural-point (texture-based) measurement performance, their small number (n = 6 or 7) limits statistical generalizability. The observed degradation in GS18 I-MSP accuracy on these targets is therefore considered indicative of potential challenges in texture-poor or unstructured environments. Future work will expand validation across a broader set of natural features (e.g., walls, poles, vegetation) to quantify repeatability and bias.
To illustrate the spatial manifestation of the observed absolute errors reported in
Table 4 and
Table 5,
Figure 13 presents a close-up view of point e1 (front entrance) as reconstructed by the three systems. While e1 is selected for its large error magnitude (48.9 cm for GS18 I-MSP) and challenging geometry, similar patterns of misalignment are observed across all seven natural points tested (b1, b2, b3, w1, w2, e1, w3). In each case, the AP20-MSP point cloud aligns closely with the RTC360 reference. In contrast, the GS18 I-MSP point cloud exhibits increasing deviation, particularly along the vertical surfaces and under partial GNSS visibility. This confirms that the pose drift is a systemic issue rather than an isolated anomaly in the GS18 I-MSP.
4.3. Quantitative Results
As seen in
Figure 14, the cloud-to-cloud distance to a reference cloud was computed. Looking at individual elements, such as the stairs at the front of the building, shows how much the points scatter: for the GS18 I-MSP, the points scatter by up to 30 cm from the wall. Considering the distance classes from
Figure 14, 20% of the GS18 I-MSP points are within 5 cm, 60% within 15 cm, and 70% within 20 cm of the reference cloud. For the AP20-MSP, 45% are within 5 cm, 70% within 10 cm, and 80% within 20 cm.
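The percentage classes quoted above correspond to a simple cumulative count of per-point cloud-to-cloud distances. A minimal sketch with hypothetical distances:

```python
def within_thresholds(distances_cm, thresholds=(5, 10, 15, 20)):
    """Fraction of cloud-to-cloud distances falling within each threshold,
    analogous to the percentage classes reported for Figure 14."""
    n = len(distances_cm)
    return {t: sum(d <= t for d in distances_cm) / n for t in thresholds}

# Hypothetical per-point distances to the reference cloud [cm]
dists = [1, 3, 4, 6, 8, 9, 12, 14, 18, 25]
print(within_thresholds(dists))
# -> {5: 0.3, 10: 0.6, 15: 0.8, 20: 0.9}
```

In practice the per-point distances themselves would come from a nearest-neighbour query against the reference cloud (e.g., a KD-tree search), which the cloud-comparison software performs internally.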
Although the Multi-Sensor-Pole used in this conceptual study is not yet a ready-to-use device, the results clearly demonstrate that integrating a ToF camera with GNSS and IMU technologies in the system can enhance surveying capabilities, allowing measurements in low-light or hard-to-reach areas without direct contact or line of sight. The experiment was carried out in a mixed open–semi-obstructed residential area. While this is not as complex as steep slopes or dense forest, it still provides several conditions relevant for multi-sensor fusion: direct GNSS availability at the front of the house, tree-induced GNSS degradation in the backyard, and limited line-of-sight for totalstation tracking. These variations allowed the MSP setups to be tested under changing visibility, lighting, and satellite geometry.
The results show that accuracy is most sensitive to how the sensor moves around corners, how much the line of sight is interrupted, and how quickly the pole is rotated between measurement patches. Because both MSP configurations rely on pose estimation during motion, any environment that forces frequent heading changes or rapid movement will amplify synchronization errors. In contrast, open areas with slow motion tend to produce more stable results. However, all of these points can be improved with more sophisticated processing algorithms and filters.
The distribution of test points also plays a clear role. Points located behind obstacles or in weak GNSS regions show larger deviations, while points in open areas match the reference geometry more closely. This behaviour is consistent with how depth sensors and GNSS-INS systems respond to partial occlusion and pose uncertainty. Although additional test sites would strengthen the generality of the results, the patterns observed here reflect conditions typical of many practical surveying tasks.
The MSP could also improve stake-out operations by offering precise visual guidance, potentially with augmented reality features to overlay digital information, reducing errors and saving time—though such implementations were not tested here and would require dedicated usability studies. While not empirically validated in this study, the system’s ability to capture synchronized 3D and pose data opens avenues for applications in Building Information Modelling, infrastructure inspection, and topographic mapping. In summary, the key benefits of using a Multi-Sensor-Pole include:
Improved low-light performance for night-time or indoor surveying.
Enhanced accuracy, minimizing the need for multiple setups.
Expanded application range of existing systems.
Simultaneous capture of positional and visual data for greater efficiency.
Versatility across diverse surveying and mapping applications.
4.4. Optimization Strategies for the GS18 I-MSP
The lower accuracy observed in the GS18 I-MSP configuration indicates that several elements of the pipeline can be improved. For instance, the trigger from the GS18 I is currently activated via a GeoCom command. This command introduces a small timing delay (10–20 ms of “jitter”) because the software has to process the request before the hardware starts sending pulses. These small timing offsets between the GS18 I INS pulses and the Blaze frames lead to incorrect pose assignment during motion. A refined clock synchronization method, such as recording PPS-level timing, could reduce this drift.
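The practical impact of 10–20 ms of trigger jitter can be estimated with simple geometry: the lateral error at a given range is approximately the range multiplied by the angular offset accumulated during the jitter interval. The rotation rate below is an assumed illustrative value, not a measured one.

```python
import math

def jitter_error(range_m, angular_rate_deg_s, jitter_s):
    """Approximate lateral error [m] at a given range caused by a pose
    timestamp offset of jitter_s while the pole rotates at the given rate."""
    return range_m * math.radians(angular_rate_deg_s) * jitter_s

# With an assumed (hypothetical) rotation rate of 30 deg/s:
for jitter_ms in (10, 20):
    err = jitter_error(10.0, 30.0, jitter_ms / 1000.0)
    print(f"{jitter_ms} ms jitter at 10 m -> {err * 100:.1f} cm")
# 10 ms -> ~5.2 cm, 20 ms -> ~10.5 cm
```

Even under this modest assumed motion, the jitter alone produces errors of the same order as the observed GS18 I-MSP deviations, which supports prioritizing hardware-level synchronization.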
Furthermore, in the current setup, the calibration between the GS18 I camera, the Blaze ToF camera, and the GS18 I body frame is performed in separate steps: first, the intrinsic and extrinsic parameters of each camera are estimated, and then the transformations between the devices are solved. This step-by-step approach works when the sensors are static or moving slowly, but it introduces small alignment errors when the pole is in motion. These errors later show up as inconsistencies in the reconstructed point cloud, especially in the GS18 I-MSP configuration, where the pose relies heavily on INS propagation. A joint spatio-temporal optimization would instead estimate all calibration parameters together, which is the standard approach in multi-sensor robotics [
39]. Instead of treating timing and geometry separately, this method fits the extrinsic transformations and the time offset between the sensors in one combined optimization problem and would directly benefit the GS18 I-MSP.
Similarly, the GS18 I relies heavily on continuous GNSS updates. During brief blockages, the INS carries the pose, which amplifies drift and affects the fused depth. Fused depth refers to the depth values generated after combining camera measurements with the INS prediction and the GNSS updates; when the pose estimate drifts, these depth values inherit the same error. Correcting such motion errors requires frame-to-frame estimates. Integrating a short visual odometry step or a lightweight IMU prediction filter could help reduce this drift between GNSS fixes. This integration would use the Blaze intensity images, which are already time-synchronized through the hardware trigger. By tracking frame-to-frame feature motion (for example, using ORB (Oriented FAST and Rotated BRIEF), a fast feature detection and matching method, or optical flow), the system can estimate short-term relative motion during intervals where the GNSS signal is weak or lost. This relative motion is then fused with the INS prediction to produce a corrected pose for each Blaze frame.
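In practice, the feature detection and matching step would use a library such as OpenCV; the toy sketch below only illustrates the subsequent robust-displacement idea, taking already-matched feature coordinates and using the median so that a wrong match does not corrupt the frame-to-frame motion estimate. All names and values are illustrative.

```python
import statistics

def median_flow(prev_pts, curr_pts):
    """Robust 2D translation estimate between two frames from matched
    feature coordinates: the median of the per-feature displacements."""
    dx = statistics.median(c[0] - p[0] for p, c in zip(prev_pts, curr_pts))
    dy = statistics.median(c[1] - p[1] for p, c in zip(prev_pts, curr_pts))
    return dx, dy

# Matched features (pixel coordinates), with one wrong match among them
prev = [(100, 50), (200, 80), (150, 120), (300, 60)]
curr = [(103, 52), (203, 82), (153, 122), (340, 90)]  # last match is an outlier
print(median_flow(prev, curr))   # -> (3.0, 2.0); the outlier is rejected
```

A full visual odometry step would additionally convert this image-space motion into a metric pose increment using the camera intrinsics and the ToF depth, before fusing it with the INS prediction.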
Note that this study was designed as a pilot evaluation of hardware-level ToF integration feasibility under real-world surveying conditions—not a full statistical validation. As such, several limitations must be acknowledged:
First, the outdoor test site, while representative of mixed GNSS visibility (open front vs. obstructed rear), covers only one building. While point cloud consistency across repeated GS18 I-MSP sessions (n = 9) and AP20-MSP sessions (n = 3) improves confidence in relative trends, broader environmental diversity (e.g., urban canyon, forest, indoor) remains untested.
Second, while the analysis of natural (non-signalized) points covers seven targets (b1, b2, b3, w1, w2, e1, w3), the sample size remains limited for robust generalization across diverse surface types and geometries (e.g., glass, metal, trees, or curved walls). Absolute 3D errors range from 5.2 cm to 20.0 cm for the AP20-MSP and from 16.1 cm to 69.4 cm for the GS18 I-MSP, with means of 10.7 cm and 33.9 cm, respectively. These results indicate a consistent performance gap, particularly in occluded or GNSS-degraded zones (e.g., e1, w3), but should still be interpreted as indicative rather than definitive for population-level inference. Future work will validate these trends across a broader set of unstructured features.
Third, indoor testing (
Section 3.7) validated synchronization and short-range drift behaviour but did not include quantitative accuracy assessment on natural indoor surfaces. This gap needs to be addressed in future indoor validation campaigns.
Notwithstanding these limitations, the large and consistent performance gap between AP20-MSP and GS18 I-MSP on survey-grade signalized points (hp1, hp2, hp3, hp10–hp13; n = 6)—supported by sub-3 cm vs. >17 cm 3D errors—strongly suggests a fundamental difference in system-level integration robustness, warranting the optimization pathways discussed earlier in this Section.
5. Conclusions
This study presented a comparative evaluation of two Multi-Sensor-Pole (MSP) configurations—AP20-MSP (totalstation–IMU–ToF) and GS18 I-MSP (GNSS-INS–ToF)—under identical hardware, calibration, and processing conditions. The AP20-MSP achieves centimetre-level absolute accuracy on signalized points, with 2.7–2.9 cm 3D error on MS60-controlled points (hp1, hp3) and 5.2–20.0 cm on six natural features (b1, b2, w1, w2, e1, w3), all validated against a ≤1.2 cm RMS RTC360 reference cloud. Its relative consistency is high, with inter-point distance deviations ranging from 1.2 cm to 15.2 cm, and ≤1.2 cm for baselines >20 m (e.g., hp1–hp3;
Table 2). In contrast, the GS18 I-MSP configuration exhibits significant absolute bias, with 17.5–26.6 cm 3D error on its four GNSS-measured signalized points (hp10–hp13) and 16.1–69.4 cm on seven natural targets (b1–b3, w1–w3, e1). While relative consistency remains acceptable over short spans (e.g., ≤2.2 cm on hp11–hp13), errors exceed 21 cm over longer distances or near texture-poor surfaces (hp12–w1), and surge dramatically in semi-occluded zones (e1: 48.9 cm; w3: 69.4 cm). This suggests that pose drift is the dominant error source, likely due to spatio-temporal misalignment in the GNSS-INS–ToF fusion chain (
Section 4.4).
Critically, absolute accuracy matters more than relative consistency for most surveying workflows (e.g., stakeout, cadastral updates, BIM alignment), where points must tie into national or project grids. The AP20-MSP meets this requirement; the GS18 I-MSP does not yet.
With regard to efficiency, the AP20-MSP proves to be more effective for various surveying use cases, particularly in challenging environments. It is well suited for applications that require precise measurements, such as construction surveys, topographic mapping, and infrastructure monitoring. The GS18 I-MSP, while functional for low-accuracy tasks, requires further optimization, particularly in hardware-level synchronization and joint spatio-temporal calibration, before deployment in survey-grade workflows.