1. Introduction
Robust, high-rate, and precise navigation is a fundamental requirement for emerging technologies, particularly in robotics, mobile-augmented reality (MAR), and urban on-device navigation [
1,
2]. While achieving drift-free global positioning is essential for these applications, the widespread adoption of such technology is strictly governed by Size, Weight, Power, and Cost (SWaP-C) constraints [
3,
4]. Over the past several decades, significant computational advancements have allowed for the integration of spatially aware sensors into real-time frameworks, fusing cognitive sensors with conventional navigation aids [
5]. However, real-world applications rely heavily on Global Navigation Satellite Systems (GNSSs). While GNSSs provide essential absolute positioning, their signals are notoriously unreliable in “urban canyons” and narrow pedestrian paths, where the Line of Sight (LOS) is frequently blocked or degraded [
6]. These issues are more pronounced in mass-market mobile devices such as smartphones. Unlike survey-grade receivers or complex Real-Time Kinematic (RTK) systems, which offer sub-meter accuracy but suffer from high communication burdens and slow convergence times [7, 8], smartphone GNSS receivers are designed primarily for cost efficiency. They typically utilize single-frequency, multi-constellation chipsets that rely on low-quality antennae highly susceptible to multipath interference [
9,
10]. Furthermore, strict power-saving constraints often force these chipsets into duty-cycling operations, frequently corrupting continuity in carrier-phase measurements and resulting in positioning errors of tens of meters [
11]. Even recent state-of-the-art studies that exploit partial wide-lane ambiguity resolution on specific smartphones achieve only several-decimeter-level horizontal accuracy [
12]. Google has recently introduced the Smartphone Decimeter Challenge to encourage advancements in high-accuracy positioning using only smartphone sensors [
13]. The 2024 winner of the competition [
14] demonstrated state-of-the-art GNSS/INS fusion using optimized timestamp alignment and adaptive measurement weighting. Even under these favorable conditions, the achieved public and private scores show sub-meter but not decimeter-level accuracy. In one study [15], the authors reported sub-meter accuracy by fusing smartphone GNSS and IMU measurements within an RTK framework (which requires the smartphone to provide carrier-phase observations). They also showed that their RTK-only smartphone positioning remained at the level of several meters, highlighting the significant benefit of the fused GNSS-INS approach. These results indicate that contemporary smartphone hardware, which provides code pseudoranges at a single frequency, typically remains limited to meter-level performance in realistic environments without sophisticated fusion algorithms.
The fusion of different sensors available in smartphones (e.g., IMUs, magnetometers, and cameras) may compensate for the weaknesses of GNSSs through their complementary characteristics. Inertial Measurement Units (IMUs) are commonly used to provide seamless positioning in GNSS-challenged environments. However, IMUs suffer from drift due to systematic and stochastic errors stemming from their intrinsic design and environmental conditions, which introduce an additional burden to the error budget [
16]. On the other hand, motion information can also be derived by tracking visual landmarks across successive image frames. While monocular-camera-based methods require external scale information, stereo variants can resolve the depth estimation problem [17]. Consequently, the fusion of visual and inertial sensors addresses both the scale ambiguity of monocular cameras and the rapid drift of MEMS IMUs [
18] in vision-friendly environments. The fusion may be achieved by jointly optimizing the camera reprojection errors and the IMU preintegration residuals within a sliding-window nonlinear-optimization framework [
19,
20]. However, feature detection and tracking quality can be significantly degraded by a wide range of real-world factors, including motion blur, defocus, low texture, and challenging illumination, such as low light or overexposure. Additional degradation can arise from rolling-shutter distortions, focus instabilities, as well as scene dynamics, such as specular reflections, moving objects, or adverse weather conditions (e.g., fog, rain, or dust). These factors collectively reduce the number of reliable visual features and weaken the visual constraints used in the fusion process [
18].
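As a schematic illustration of the joint optimization mentioned above, a sliding-window objective of this kind can be written in the following generic form. The notation is an assumption in the style of VINS-type estimators and is not taken from this paper: $\mathcal{X}$ denotes the states in the window, $\mathbf{r}_{p}$ a marginalization prior, $\mathbf{r}_{\mathcal{B}}$ and $\mathbf{r}_{\mathcal{C}}$ the IMU preintegration and camera reprojection residuals, $\mathbf{P}$ the corresponding measurement covariances, and $\rho(\cdot)$ a robust loss.

```latex
\min_{\mathcal{X}} \left\{
  \left\| \mathbf{r}_{p}(\mathcal{X}) \right\|^{2}
  + \sum_{k \in \mathcal{B}}
      \left\| \mathbf{r}_{\mathcal{B}}\!\left(\hat{\mathbf{z}}_{k}, \mathcal{X}\right) \right\|^{2}_{\mathbf{P}_{k}}
  + \sum_{(l,j) \in \mathcal{C}}
      \rho\!\left(
        \left\| \mathbf{r}_{\mathcal{C}}\!\left(\hat{\mathbf{z}}_{l,j}, \mathcal{X}\right) \right\|^{2}_{\mathbf{P}_{l,j}}
      \right)
\right\}
```

Here $\mathcal{B}$ indexes consecutive keyframe pairs linked by IMU preintegration, and $\mathcal{C}$ indexes feature observations $(l, j)$ of landmark $l$ in frame $j$. Fewer reliable visual features means fewer terms in the last sum, which is how the degradations listed above weaken the visual constraints.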
Consequently, the Visual–Inertial (VI) combination enables an effective navigation system in indoor areas, where rich textures and adequate lighting are available. Nevertheless, large-scale and global position determination requires a connection between local and global frames. This connection is provided by the three translation parameters of the local frame relative to the global frame, as well as the orientation. Therefore, combining GNSS with VI sensors can both recover translation parameters between these frames and reduce accumulated errors in VI, enabling long-term use [
21]. On the other hand, the computational cost grows with the increasing number of tracked features, the size of the optimization window, the density of keyframes, the frequency of marginalization, and the dimensionality of the states and factors involved, requiring careful management to sustain real-time performance [
22].
GVINS [
23] introduces a GNSS Visual–Inertial Navigation System that tightly fuses monocular camera images, inertial measurements of a MEMS IMU, and single-frequency raw measurements of a low-cost GNSS receiver in a sliding-window factor-graph optimization framework. The VI side of GVINS is based on the SLAM framework VINS-Mono [
20]. GVINS uses the Lucas–Kanade method [
24] to track a group of sparse-feature points [
25] extracted from distortion-corrected [
26] images as visual measurements. An improvement to GVINS, namely, P³-VINS [27], extends GVINS by incorporating the Ionosphere-Free (IF) combination of GNSS-carrier-phase measurements to operate the GNSS Precise-Point-Positioning (PPP) technique. Carrier-phase measurements are prone to cycle slips due to signal loss or low Signal-to-Noise Ratios (SNRs). P³-VINS introduces a novel factor, called the phase-ambiguity factor, to mitigate the cycle slip problem. This factor uses dual-frequency Melbourne–Wübbena and Geometry-Free combinations to handle cycle slips in the optimization process [
28]. It incorporates the difference between the ambiguities of dual-frequency carrier-phase observations as a residual in the optimization problem. While this approach offers significant benefits by utilizing carrier-phase data, the ambiguities are handled as a float solution, which limits their actual contribution compared to that of an integer-ambiguity resolution. The most recent attempt to increase the accuracy of GVINS is the DGVINS method, which uses GNSS double-difference measurements [
29]. This approach employs two dual-frequency GNSS receivers. Whereas one of the receivers is stationary at an accurately known position, the mobile receiver is bundled with GVINS sensors. Double differences cancel out satellites’ and receivers’ clock errors. Remaining errors are highly correlated with the distance between mobile and reference GNSS receivers. Thus, DGVINS works well within a few kilometers from the reference station by eliminating ionosphere- and troposphere-dependent effects. This method performs single-epoch integer-ambiguity resolution of the wide-lane combination without using cycle slip detection.
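The clock cancellation in double differences can be illustrated with a minimal Python sketch. The simplified pseudorange model (geometry plus clock terms only), the coordinates, and the clock offsets below are hypothetical values chosen for illustration; this is not the DGVINS implementation.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def pseudorange(rx_pos, sat_pos, rx_clock_s, sat_clock_s):
    """Simplified pseudorange: geometric range plus receiver/satellite clock terms."""
    geometric = np.linalg.norm(np.asarray(sat_pos, float) - np.asarray(rx_pos, float))
    return geometric + C * (rx_clock_s - sat_clock_s)


def double_difference(p_rover_i, p_base_i, p_rover_j, p_base_j):
    """(rover - base) for satellite i minus (rover - base) for reference satellite j."""
    return (p_rover_i - p_base_i) - (p_rover_j - p_base_j)


# Hypothetical positions in meters (illustration only).
base = np.array([0.0, 0.0, 0.0])
rover = np.array([3000.0, 1000.0, 50.0])
sat_i = np.array([15_000_000.0, 10_000_000.0, 20_000_000.0])
sat_j = np.array([-8_000_000.0, 18_000_000.0, 17_000_000.0])

# Arbitrary clock offsets in seconds (illustration only).
rover_clk, base_clk = 3.2e-4, -1.1e-5
sat_i_clk, sat_j_clk = 4.0e-6, -7.5e-6

dd_with_clocks = double_difference(
    pseudorange(rover, sat_i, rover_clk, sat_i_clk),
    pseudorange(base, sat_i, base_clk, sat_i_clk),
    pseudorange(rover, sat_j, rover_clk, sat_j_clk),
    pseudorange(base, sat_j, base_clk, sat_j_clk),
)
dd_geometry_only = double_difference(
    pseudorange(rover, sat_i, 0.0, 0.0),
    pseudorange(base, sat_i, 0.0, 0.0),
    pseudorange(rover, sat_j, 0.0, 0.0),
    pseudorange(base, sat_j, 0.0, 0.0),
)

# The clock terms cancel, so only floating-point round-off remains.
clock_residual = abs(dd_with_clocks - dd_geometry_only)
print(f"residual clock effect in the double difference: {clock_residual:.3e} m")
```

In this toy model the double difference depends only on geometry. Real observations also contain atmospheric delays and multipath, which cancel only to the extent that they are common to both receivers, hence the dependence on baseline length noted above.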
Although GVINS provides a framework for tightly coupled GNSS-VI fusion, it was originally built for a u-blox ZED-F9P receiver, which is a higher-grade receiver with a dedicated antenna compared to that of a smartphone. It also relies on a 10 Hz measurement rate for code pseudoranges, which is not available in mass-market smartphones. The success of GVINS in on-device smartphone applications therefore depends mainly on the smartphone GNSS accuracy. Recent studies have highlighted the limitations of smartphone GNSS measurements under several conditions. A detailed investigation of pseudorange residuals and the Carrier-to-Noise density ratio (C/No) was conducted in [
30] for assessing smartphones’ GNSS receiver quality and revealing their potential. In [
31], Nexus 9 tablet measurements in both kinematic and static modes were used for evaluating the smoothing filter performance, and improved results were obtained through an enhanced Hatch filter solution. In [
32], smartphone GNSS range errors in realistic environments were numerically evaluated using a geodetic receiver as a reference, and the effects of signal blockage, multipath, device placement, and constellation differences on the distribution and behavior of pseudorange biases were analyzed. GNSS observations generated via the Android API and common logging applications were examined in [
33], where biases and inconsistencies degrading the positioning performance were identified, and the newly developed CSV2RINEX tool was shown to produce higher-quality observations with improved positioning accuracy. More reliable stochastic models for smartphone GNSS observations from two Samsung S20 devices were developed in [
34], and significant positioning improvements were demonstrated by incorporating the derived variances. In [
35], inequality constraints, such as heading, vertical velocity, and inter-device distance, were introduced, leading to notable enhancements in GNSS-only smartphone positioning. In [
36], the performances of DGNSS, SPP, PPK, and PPP were compared across several smartphones. In [
15], tightly coupled GNSS/INS integration was performed with multiple smartphones, and sub-meter positioning accuracy was achieved utilizing RTK measurements with carrier-phase data. Furthermore, in [
37], a smartphone was employed as a GVINS sensor, and accuracy improvements of up to 70% relative to that of SPP were achieved.
In this study, a custom-hardware-supported experimental setup was developed to evaluate the GVINS performance with smartphone GNSS measurements. The developed hardware and software platform collects measurements that can be input directly into the original GVINS, and it also allows the high-rate GNSS measurements from the u-blox ZED-F9P to be replaced with 1 Hz raw GNSS measurements collected from a Samsung A51 rigidly mounted on the same data collection platform. Using this platform, three real-world datasets were collected under varying GNSS satellite visibility, from open-sky to dense urban-like environments, representing typical conditions faced in both pedestrian and vehicular navigation. This allows smartphone GNSS behavior to be assessed systematically in diverse visibility and multipath scenarios. In addition, differential corrections from the nearby IGS reference station (ANK2) were applied to evaluate their impact on both DGPS and Differential-GVINS (DGVINS) performance. A custom Android-ROS interface that streams raw smartphone GNSS measurements into the GVINS pipeline was developed, which also provides a foundation for future efforts toward running GVINS directly on Android devices. For all the datasets, smartphone measurements are compared against those of a u-blox ZED-F9P receiver, and the impact of simple code-based differential corrections on both standalone GNSS solutions (SPP and DGPS) and tightly coupled fusion methods (GVINS and DGVINS) is investigated. GVINS was also updated to handle 1 Hz GNSS data. Overall, this study provides a practical assessment of how low-cost smartphone GNSS can be leveraged within an optimization-based navigation framework and outlines a pathway toward future smartphone-native GNSS-VIO integration.
The remainder of the paper is organized as follows: Section 2 provides an overview of the GVINS framework and highlights the key methodologies employed in this study. Section 3 presents the experimental setup used for data collection, while Section 4 introduces the datasets. Section 5 describes the evaluation methods. Section 6 discusses the results and provides an in-depth analysis. Finally, Section 7 concludes the paper, summarizing the findings and suggesting directions for future research.
3. Experimental Setup
A custom hardware platform was developed for multi-sensor data collection. The processing unit consists of a Raspberry Pi 5 board equipped with 8 GB of RAM and an external 512 GB M.2 NVMe SSD. Two forward-looking global-shutter cameras featuring the Sony IMX296LQR-C image sensor were mounted to capture visual data (20–30 Hz). Inertial measurements were acquired from an Xsens MTi-1 IMU (at 100 Hz). Raw GNSS measurements from a Samsung A51 smartphone were logged at 1 Hz, while additional GNSS observations were collected from a u-blox ZED-F9P receiver operating at 10 Hz and connected to a dual-frequency (L1 + L2) u-blox antenna. An overview of our custom data collection platform is shown in
Figure 3. The Xsens MTi-1 IMU (Xsens Technologies B.V., Enschede, The Netherlands), the u-blox receiver (u-blox AG, Thalwil, Switzerland), the Raspberry Pi 5 board (Raspberry Pi Ltd., Cambridge, United Kingdom), and the required power supplies are housed inside the enclosure shown in the figure.
The Samsung A51 integrates a built-in, single-frequency GNSS receiver that supports GPS, GLONASS, BeiDou, and Galileo; however, the exact GNSS RF front-end or chipset model used inside the Exynos 9611 platform is not publicly documented by its manufacturer [
43]. During our experiments, Galileo signals were not observed when using Google’s open-source GnssLogger [
44] application. This device, similar to most Android smartphones, provides raw GNSS measurements at a rate of 1 Hz. For smartphone GNSS data acquisition, the open-source Android ROS node UMA ROS [
40] was modified to enable the collection of raw GNSS measurements. Additionally, ground-truth positions were recorded using the built-in GNSS-RTK engine of the u-blox ZED-F9P receiver by feeding RTCM correction streams from the Turkish Permanent CORS Network (TUSAGA–Active) to the receiver via the u-blox ROS driver and the RTKLIB [
45] utility str2str.
The system time of the Raspberry Pi 5 computer was synchronized using a GPS time server by feeding a 1 PPS signal and NMEA messages from the u-blox ZED-F9P receiver to the Raspberry Pi IO ports. This ensures time consistency between the local sensors (camera and IMU) and the GNSS receiver. The datasets were recorded after time synchronization was completed. The sensor drivers were executed in separate Docker containers because each driver required its own operating environment (ROS 1 or ROS 2) and software dependencies. Because these runtime environments were isolated from each other, an additional node running the ROS bridge was used to enable communication between them.
Furthermore, the IMU intrinsic calibration was performed using six hours of static data with the Kalibr toolbox [
46]. Based on an Allan variance analysis, the accelerometer and gyroscope noise densities and the corresponding bias random-walk terms were identified. In addition, the cameras were manually focused using the adjustment screws prior to collecting calibration and experimental datasets, ensuring that the intrinsic calibration parameters remained stable throughout data processing and analysis. The IMU and cameras were also rigidly attached to the device body to maintain stable relative poses throughout both extrinsic calibration and data collection. These calibrations were performed using AprilGrid targets with Kalibr [
46].
The data were collected using the described equipment on 31 August 2025 and 5 October 2025. These datasets were recorded as rosbag files via ROS Noetic. Furthermore, RINEX observation files were also collected using the GnssLogger [44] application to enable an external check and to generate standalone SPP solutions independent of the ROS-based framework. To assess the performance under varying environmental conditions, camera, IMU, and GNSS measurements were gathered in diverse scenarios, including an open sports field, pedestrian walkways between campus buildings, and campus roads, using both walking and driving modes. During the walking experiments, the Samsung A51 and the u-blox antenna were positioned within 10 cm of each other, whereas in the driving experiments, they were vertically separated by about 50 cm, an offset that was later compensated for.
4. Experimental Dataset
Three experimental scenarios were carefully designed to represent different levels of GNSS difficulty commonly encountered in real-world pedestrian and vehicular navigation.
Sports Field: This environment provides an open-sky condition with minimal multipath and strong satellite visibility. It serves as a baseline scenario to assess the best achievable performance of smartphone GNSS measurements when integrated into the GVINS framework.
Campus Walking: This scenario includes narrow pedestrian pathways surrounded by tall buildings and trees, creating an urban canyon-like environment with severe multipath, intermittent satellite blockage, and highly variable signal quality. It reflects typical challenges faced in smartphone-based pedestrian localization.
Campus Driving: The vehicular route covers a larger geographic area with varying building densities, partial obstructions, and transitions between open and moderately constrained environments. It is representative of navigation conditions encountered in urban and suburban driving applications.
In these three scenarios, the experimental setup captures a wide range of GNSS observability conditions, from ideal open-sky visibility to highly degraded urban environments, allowing a thorough evaluation of GVINS and DGVINS performances using smartphone GNSS measurements.
Figure 4 illustrates the ground-truth trajectory derived from the GNSS-RTK reference positions collected at the Hacettepe University Beytepe Campus. The purple dots in
Figure 4 indicate the locations corresponding to the sample images shown in
Figure 5. For differential corrections, ANK2 was selected as a base receiver. This reference station is located approximately 4.5 km away from Hacettepe University’s Beytepe Campus, where the dataset was collected. Prior to computing the corrections, the precise coordinates of the reference station were obtained from the CSRS-PPP online service [
47], which provides a PPP solution based on precise ephemeris and satellite clock products. The resulting reference station coordinates have an accuracy of approximately 2–3 cm.
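The role of the precisely known reference coordinates can be illustrated with a minimal sketch of a code-based pseudorange correction. The function names, the coordinates, and the 12.5 m common-mode error are hypothetical; the sketch assumes a classic DGPS-style correction and is not necessarily the exact procedure used in this study.

```python
import numpy as np


def pseudorange_correction(base_pos, sat_pos, base_pseudorange):
    """Correction = geometric range from the known base position minus the measured pseudorange."""
    geometric = float(np.linalg.norm(np.asarray(sat_pos, float) - np.asarray(base_pos, float)))
    return geometric - base_pseudorange


def apply_correction(rover_pseudorange, correction):
    """Add the base-derived correction to the rover pseudorange for the same satellite."""
    return rover_pseudorange + correction


# Hypothetical geometry in meters; the baseline is a few kilometers long.
base_pos = np.array([0.0, 0.0, 0.0])
rover_pos = np.array([4000.0, 2000.0, 30.0])
sat_pos = np.array([12_000_000.0, 14_000_000.0, 19_000_000.0])

# Assumed error common to both receivers (e.g., satellite clock and atmosphere).
common_error = 12.5

range_base = float(np.linalg.norm(sat_pos - base_pos))
range_rover = float(np.linalg.norm(sat_pos - rover_pos))

measured_base = range_base + common_error
measured_rover = range_rover + common_error

prc = pseudorange_correction(base_pos, sat_pos, measured_base)
corrected_rover = apply_correction(measured_rover, prc)

print(f"correction: {prc:.2f} m, remaining rover error: {corrected_rover - range_rover:.2e} m")
```

Any error in the assumed base coordinates maps directly into the correction, which is why centimeter-level reference coordinates are obtained before the corrections are computed. In practice, the base receiver clock offset also enters the correction as a term common to all satellites and is absorbed by the rover clock estimate.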
6. Results
This section presents the quantitative evaluation of the five methods using both smartphone-based and u-blox ZED-F9P GNSS measurements across the three experimental scenarios. Although the original GVINS framework [
23] requires 10 Hz GNSS observations, it was modified in this work to accommodate 1 Hz smartphone GNSS measurements. Moreover, a ROS node running on the smartphone was implemented in Android to publish GNSS data. The differential GNSS approach was also evaluated both as a standalone GNSS-processing method and as a part of the GVINS factor-graph optimization framework. The following subsections present the results for each dataset, comparing all the processing modes and highlighting how environmental conditions, satellite visibility, and measurement quality affect the performance of the fused solutions.
Errors in the east, north, and up (ENU) directions are calculated by converting the position output of the sensor fusion solution to the ENU frame, using the RTK solution as the origin at each measurement epoch. In a manner similar to that of GVINS [23], the solutions are evaluated separately for each ENU axis, as well as for horizontal (2D) and 3D errors, using the Mean Absolute Error (MAE), the Root-Mean-Square Error (RMSE), and the standard deviation (STD) of the errors, where the STD is computed about the mean value of the errors.
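The evaluation described above can be sketched in Python as follows. The function names are hypothetical, the reference latitude and longitude are assumed to be available from the RTK solution, and the STD uses the population form (division by N), which is an assumption because the normalization is not stated here.

```python
import numpy as np


def ecef_delta_to_enu(delta_ecef, lat_rad, lon_rad):
    """Rotate an ECEF difference (estimate minus reference) into the local ENU frame."""
    sin_lat, cos_lat = np.sin(lat_rad), np.cos(lat_rad)
    sin_lon, cos_lon = np.sin(lon_rad), np.cos(lon_rad)
    rotation = np.array([
        [-sin_lon, cos_lon, 0.0],
        [-sin_lat * cos_lon, -sin_lat * sin_lon, cos_lat],
        [cos_lat * cos_lon, cos_lat * sin_lon, sin_lat],
    ])
    return rotation @ np.asarray(delta_ecef, dtype=float)


def error_metrics(errors):
    """MAE, RMSE, and STD (about the mean error) of a one-dimensional error series."""
    e = np.asarray(errors, dtype=float)
    mean_error = e.mean()
    return {
        "MAE": float(np.mean(np.abs(e))),
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "STD": float(np.sqrt(np.mean((e - mean_error) ** 2))),
    }


# Toy check at latitude 0 and longitude 0: ECEF x maps to up, y to east, z to north.
enu = ecef_delta_to_enu([1.0, 2.0, 3.0], lat_rad=0.0, lon_rad=0.0)

# Toy error series in meters.
metrics = error_metrics([1.0, 2.0, 3.0])
print(enu, metrics)
```

With these definitions, RMSE² equals the squared mean error plus STD², so a solution can have a small STD but a large RMSE when it is biased, which is a useful distinction when interpreting the result tables.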
6.1. Sports Field
For the sports field dataset, GNSS, IMU, and camera measurements were collected over multiple laps around the field, covering a total duration of 796 s on 5 October 2025.
Figure 6 presents the sky-plots of the GPS, GLONASS, and BeiDou constellations observed using the Samsung A51, where the bold numbers indicate the satellite vehicle numbers. The C/No observations from the smartphone exhibit abrupt fluctuations, causing the corresponding colors of the C/No values to change rapidly in the figures. During data collection, eleven GPS, five GLONASS, and nine BeiDou satellites were visible to the Samsung A51.
The mean C/No values of the u-blox and Samsung A51 GNSS measurements are presented in
Figure 7. The results indicate that the A51 exhibits an approximately 10 dB lower C/No than the u-blox receiver. Compared to smartphones, u-blox receivers include advanced algorithms and hardware features to mitigate multipath effects, and the primary function of such a high-precision module is to maximize position accuracy and signal quality. A smartphone chipset, conversely, must prioritize low power consumption, small size, and integration with a complex system, which often leads to worse signal quality. Furthermore, not all satellites were observed by both receivers: the u-blox receiver tracked 40 satellites, whereas the Samsung A51 observed 25 satellites, 24 of which were common to the two receivers.
The number of satellites available to each processing mode over time is shown in
Figure 8. For the M1 and M2 processing modes, only GPS satellites with available differential corrections were selected. These satellites were then filtered to retain those with a C/No of greater than 25 dB-Hz and an elevation angle of above 7°. With these criteria, the median number of satellites for the M1 and M2 modes is seven. In contrast, the M3 and M4 modes included satellites from the GPS, GLONASS, and BeiDou constellations under the conditions of C/No >25 dB-Hz and elevation >30°. The median numbers of satellites for the M3 and M4 modes are eleven and nine, respectively. Lastly, the u-blox receiver observed the highest number of satellites (including Galileo satellites), with a median of 21, under the same thresholds as M3 and M4.
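The satellite selection rules described above can be sketched as a simple filter. The `SatObs` type, the function name, and the example epoch are hypothetical; satellite identifiers follow the RINEX convention (G for GPS, R for GLONASS, C for BeiDou, E for Galileo), and the additional requirement that differential corrections be available for M1 and M2 is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SatObs:
    sat_id: str           # RINEX-style identifier, e.g. "G05"
    cn0_dbhz: float       # carrier-to-noise density ratio in dB-Hz
    elevation_deg: float  # elevation angle in degrees


def select_satellites(observations, constellations, min_cn0_dbhz, min_elevation_deg):
    """Keep observations from the given constellations that exceed both thresholds."""
    return [
        obs for obs in observations
        if obs.sat_id[0] in constellations
        and obs.cn0_dbhz > min_cn0_dbhz
        and obs.elevation_deg > min_elevation_deg
    ]


# Hypothetical single-epoch observations (illustration only).
epoch = [
    SatObs("G05", 40.0, 45.0),
    SatObs("G12", 24.0, 60.0),  # rejected: C/No too low
    SatObs("R03", 30.0, 10.0),  # rejected by the 30 degree elevation mask
    SatObs("C21", 35.0, 35.0),
    SatObs("E11", 38.0, 50.0),  # Galileo, not part of either configuration
]

# M1/M2-like configuration: GPS only, C/No > 25 dB-Hz, elevation > 7 degrees.
gps_only = select_satellites(epoch, ("G",), 25.0, 7.0)

# M3/M4-like configuration: GPS, GLONASS, and BeiDou, C/No > 25 dB-Hz, elevation > 30 degrees.
multi_gnss = select_satellites(epoch, ("G", "R", "C"), 25.0, 30.0)

print([o.sat_id for o in gps_only], [o.sat_id for o in multi_gnss])
```

Keeping the thresholds as parameters makes it straightforward to reproduce the per-mode satellite counts with different masks.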
The results presented in
Table 2 summarize the horizontal (2D), vertical (Up), and three-dimensional (3D) positioning errors together with their standard deviations. First, applying differential corrections (M2-DGPS) improved the 3D positioning performance of the M1-SPP method by approximately 29.1%, 37.8%, and 15.8% in the RMSE, MAE, and STD metrics, respectively. The 3D standard deviation is approximately 11–13 m for the M1 (SPP) and M2 (DGPS) cases, indicating a relatively high dispersion in the positioning results. In contrast, the u-blox receiver achieves sub-meter precision, demonstrating superior measurement quality. Furthermore, the application of differential corrections (M4-DGVINS) significantly improves the performance of the Samsung A51, reducing the 3D RMSE, MAE, and STD from approximately 12.9 m, 12.6 m, and 2.74 m to 4.4 m, 4.1 m, and 1.5 m, corresponding to improvements of 66%, 67.3%, and 45.3%, respectively.
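The improvement percentages quoted above follow the usual relative-reduction formula, 100 × (before − after) / before. The short sketch below applies it to the rounded values given in the text; the function name is hypothetical, and small differences from the reported percentages are to be expected if those were computed from unrounded values.

```python
def improvement_percent(before, after):
    """Relative reduction of an error metric, expressed in percent."""
    return 100.0 * (before - after) / before


# Rounded 3D error values quoted in the text, in meters: (before, after).
quoted = {
    "RMSE": (12.9, 4.4),
    "MAE": (12.6, 4.1),
    "STD": (2.74, 1.5),
}

improvements = {name: improvement_percent(b, a) for name, (b, a) in quoted.items()}
for name, value in improvements.items():
    print(f"{name}: {value:.1f}% improvement")
```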
Figure 9 shows the time series plot of each ENU axis. The results show the potential of the smartphone GNSS receivers in tightly coupled navigation systems. GVINS provides high-rate and seamless positioning, even with low-cost GNSS sensors.
The cumulative probability of position errors is presented in
Figure 10. The results indicate that 95% of the horizontal errors are below approximately 3 m for the M4 case and below approximately 6 m for the M3 case, demonstrating the benefit of applying differential corrections. The vertical error distribution further highlights the improvement achieved through differential correction in both the M1-M2 and M3-M4 comparisons. Additionally, applying differential corrections in GVINS reduces the 95% 3D position error from roughly 32 m to 6 m, an improvement of approximately 80% compared to the SPP-processing mode.
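A 95% error level of the kind read from the cumulative distribution can be computed as in the following sketch. The function names and the toy data are hypothetical, and NumPy's default linear interpolation between samples is an assumption; the plotted curves may use a different percentile convention.

```python
import numpy as np


def error_3d(enu_errors):
    """Per-epoch 3D error magnitude from an (N, 3) array of ENU errors."""
    return np.linalg.norm(np.asarray(enu_errors, dtype=float), axis=1)


def percentile_error(errors, q=95.0):
    """Error magnitude below which q percent of the epochs fall."""
    return float(np.percentile(np.abs(np.asarray(errors, dtype=float)), q))


# Toy ENU errors in meters for two epochs.
magnitudes = error_3d([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]])

# Toy error series: 1 m, 2 m, ..., 100 m.
p95 = percentile_error(np.arange(1.0, 101.0), q=95.0)
print(magnitudes, p95)
```

Applying the same function to horizontal errors (the norm of the east and north components only) gives the 2D counterpart.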
Figure 11 illustrates the horizontal scatter plots of the five positioning methods. The horizontal errors of the GVINS solutions (M3 and M5) without differential corrections exhibit noticeable biases with respect to the ground truth. In contrast, the standalone GNSS solutions (SPP and DGPS) show a much wider dispersion. The DGVINS (M4) method appears to be a promising candidate for improving positioning accuracy by reducing the bias around the mean values. The GVINS u-blox case, though slightly biased, provides the higher precision and consistency expected from a higher-grade GNSS receiver.
6.2. Campus Walking
The campus-walking dataset represents a more challenging scenario, collected during a 688 s walk within the university campus on 31 August 2025. The main difficulties arise from degraded GNSS signal conditions compared to those of the sports field dataset. The area is largely surrounded by buildings, trees, and other structures, which significantly obstruct satellite visibility and degrade signal quality.
Figure 12 shows the sky-plot of the observed satellites, where most low-elevation satellites are blocked by environmental obstacles. Moreover,
Figure 13 presents the mean C/No values, which are slightly lower than those in the sports field dataset. While the u-blox receiver observed 30 satellites, the Samsung A51 GNSS receiver tracked 21 of them. The Samsung A51 tracked nine GPS, six GLONASS, and six BeiDou satellites, but no Galileo signals were received.
Figure 14 illustrates the variation in the number of satellites available for processing over time. In the M1 and M2 modes, the Samsung A51 tracked a median of seven GPS satellites, with a C/No ratio of greater than 25 dB-Hz and an elevation angle of above 7°. For the M3 and M4 modes, the median number of satellites increased to 12 and 11, respectively, including observations from the GPS, GLONASS, and BeiDou constellations, under thresholds of a C/No ratio of >25 dB-Hz and an elevation of >30°. In the same configuration, the u-blox receiver (M5) achieved the highest satellite visibility, with a median of 20 satellites. Around TOW = 53,140–53,170, the trajectory passed through a deep canyon surrounded by buildings and trees, where the number of visible satellites decreased significantly.
M1 and M2 results rely entirely on noisy GNSS measurements, which are highly susceptible to multipath, especially in urban environments. The IMU and camera, on the other hand, provide high-rate measurements of the device’s motion and environment. These continuous, locally accurate measurements effectively filter the GNSS noise, so the estimated positions are considerably more accurate than those obtained using SPP and DGPS. As summarized in
Table 3, the GVINS solution based on smartphone GNSS measurements (M3-GVINS) markedly outperforms the conventional SPP (M1) and DGPS (M2) approaches, yielding significant reductions in RMS, MAE, and STD metrics. These improvements correspond to reductions of 73.5%, 72.4%, and 76.4% when comparing M3-GVINS with M1-SPP. Furthermore, the application of differential corrections in the M2-DGPS solution helped to reduce RMSE, MAE, and STD by 13.2%, 14.3%, and 10.4%, respectively, compared to those of the M1-SPP case. Therefore, these results may indicate the benefits of applying differential corrections.
However, due to the urban-canyon-like environment around TOW = 53,140–53,170, differential corrections may have been insufficient to mitigate error sources, resulting in a sudden jump in the up component. In the M1-SPP and M2-DGPS cases, the effects of such jumps do not persist for long, as the solutions are computed on an epoch-by-epoch basis without memory. In contrast, for GVINS-based solutions, once the global position drifts significantly, removing its effect from a memory-based estimator, such as the factor-graph optimization window, is not an instantaneous process. Additional results presented in
Table 3, corresponding to the time interval TOW = 52,950–53,120, just before entering the urban canyon, further demonstrate the benefits of applying differential corrections in GVINS. According to this subset of the results, the 3D RMSE, MAE, and STD are reduced from 7.7 m, 6.5 m, and 4.1 m to 6.3 m, 5.3 m, and 3.4 m, representing improvements of 18.7%, 19.7%, and 16.2%, respectively.
Considering the 2D position, the M4 configuration is roughly 16% better than solutions without differential corrections.
As expected, the u-blox GVINS (M5) yielded the most precise results, with a 3D RMSE of around 2.7 m, attributed to its advanced algorithms, higher signal quality, and dedicated antenna. Overall, these findings demonstrate that tightly coupled GNSS–Visual–Inertial integration and differential corrections can substantially enhance the performance of low-cost, single-frequency smartphone receivers. However, the measurements from such devices remain highly susceptible to environmental conditions, which may hinder their ability to approach the accuracy of high-grade GNSS sensors.
Figure 15 illustrates the time series obtained from this dataset. All the GVINS-based positioning modes produce seamless trajectories compared to those produced by the SPP and DGPS cases. By fusing visual and inertial measurements, the GVINS framework is capable of producing 10 Hz position estimates from 1 Hz GNSS observations, effectively bridging temporal gaps and maintaining solution continuity during GNSS signal degradation. As a result, the GVINS solutions demonstrate significantly higher temporal consistency and robustness than conventional GNSS-only methods.
Figure 16 shows the cumulative distribution of the positioning errors. The CDF plots illustrate the overall performances of the five different methods. By applying differential corrections to the SPP case, the 3D 95% position error was moderately reduced from around 44 m to approximately 41 m. With the integration of visual–inertial sensors, the error further decreased to about 10 m.
The scatter plots in
Figure 17 illustrate the spatial distribution of position errors for each method. In the smartphone-based SPP (M1) and DGPS (M2) solutions, the error points are widely dispersed, indicating unstable positioning performance and significant horizontal deviation. In contrast, the GVINS (M3) and DGVINS (M4) results exhibit a much denser cluster of points around the origin, demonstrating improved precision and consistency. The u-blox GVINS (M5) shows the tightest concentration of errors, confirming its superior accuracy due to higher-quality GNSS observations. Overall, the scatter plots clearly visualize the progressive improvement in positioning stability and error reduction achieved through visual–inertial integration.
6.3. Campus Driving
The campus-driving dataset corresponds to 779 s of data collected by car on university campus roads on 5 October 2025. Visual features were mainly derived from static structures, such as road markings and buildings, with only a limited number of dynamic objects (e.g., vehicles and pedestrians) present at the scenes.
Figure 18 and
Figure 19 present the sky-plots and mean C/No values. The sky-plot indicates that the sky is mostly unobstructed down to low elevation angles, and the mean C/No values are comparable to those of the sports field dataset. The u-blox receiver tracked 33 satellites, 25 of which were commonly observed using the Samsung A51. Satellites tracked using the Samsung A51 included ten GPS, six GLONASS, and nine BeiDou satellites, while Galileo signals were not acquired.
Figure 20 presents the temporal evolution of the number of satellites used in positioning. For the M1 and M2 configurations, the Samsung A51 tracked a median of six GPS satellites satisfying C/No > 25 dB-Hz and elevation > 7°. In comparison, the M3 and M4 configurations benefited from multi-constellation tracking, with a median of 10 satellites from GPS, GLONASS, and BeiDou satisfying C/No > 25 dB-Hz and elevation > 30°. The u-blox receiver exhibited the highest satellite availability, maintaining a median of 20 visible satellites throughout the session.
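The satellite counts above result from applying C/No and elevation masks epoch by epoch. A minimal Python sketch of that kind of selection is given below. The `SatObs` record, the function names, and the idea of identifying the constellation from the first character of the satellite ID are illustrative assumptions; they are not taken from the GVINS code or from the datasets.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SatObs:
    """One satellite observation at one epoch (hypothetical record layout)."""
    sat_id: str           # e.g. "G05" (GPS), "R11" (GLONASS), "C23" (BeiDou)
    cn0_dbhz: float       # carrier-to-noise density in dB-Hz
    elevation_deg: float  # elevation angle in degrees


def select_satellites(observations, min_cn0_dbhz, min_elevation_deg, systems=None):
    """Keep observations that exceed both masks.

    `systems` is an optional set of constellation letters (assumed to be the
    first character of `sat_id`); None keeps every constellation.
    """
    return [
        obs for obs in observations
        if obs.cn0_dbhz > min_cn0_dbhz
        and obs.elevation_deg > min_elevation_deg
        and (systems is None or obs.sat_id[0] in systems)
    ]


def median_satellite_count(epochs, **mask):
    """Median number of satellites that pass the masks over a list of epochs."""
    return median(len(select_satellites(epoch, **mask)) for epoch in epochs)
```

With such a helper, a GPS-only configuration and a multi-constellation configuration differ only in the `systems` argument and the mask thresholds, which makes it easy to see how the choice of masks drives the satellite counts reported for each mode.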
Based on the results presented in Table 4, the 3D RMSE, MAE, and STD improved by 27.2%, 31.3%, and 19.4%, respectively, when differential corrections were applied to the M1-SPP case. Visual–inertial integration reduced these errors further: the M3-GVINS configuration reached 12.6 m, 11.2 m, and 5.8 m, corresponding to improvements of 59.9%, 56.7%, and 67.7%, respectively, over the M1-SPP case. Integrating GNSS with visual–inertial measurements helps to mitigate the high noise level inherent in GNSS measurements, resulting in smoother and more consistent positioning solutions.
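For readers who wish to reproduce statistics of this kind, the following Python sketch computes RMSE, MAE, STD, and a relative improvement. It assumes that each sample is a non-negative 3D error norm and that the STD is the population standard deviation about the mean error; the function names are hypothetical. Under those assumptions RMSE² = MAE² + STD², and the M3-GVINS values quoted above satisfy this relation to rounding (√(11.2² + 5.8²) ≈ 12.6 m).

```python
import math


def rmse(errors):
    """Root-mean-square of the error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def std(errors):
    """Population standard deviation about the mean error (an assumption)."""
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))


def improvement_pct(reference, candidate):
    """Relative reduction of `candidate` with respect to `reference`, in percent.

    A negative result means the candidate is worse than the reference.
    """
    return 100.0 * (reference - candidate) / reference
```

The sign convention in `improvement_pct` means that a degradation appears as a negative percentage, so improvements and degradations can be reported with the same expression.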
In addition, the differential GVINS (M4-DGVINS) approach achieved the lowest 3D RMSE and MAE among all the configurations. The RMSE, MAE, and STD values for the A51 DGVINS solution were significantly lower than those for M3-GVINS, indicating the effective mitigation of common-mode errors through differential corrections. While the u-blox GVINS (M5) provided the most precise overall solution owing to its receiver quality, the smartphone-based DGVINS achieved approximately 5 m accuracy, demonstrating the feasibility of enhancing low-cost smartphone GNSS data with differential corrections and tightly coupled visual–inertial integration.
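The principle behind the mitigation of common-mode errors can be illustrated with a textbook code-based differential correction. The Python sketch below is a simplified illustration under stated assumptions, not the implementation used in this study: it ignores receiver clock terms, the age of the correction, and satellite selection, and all function names are hypothetical.

```python
import math


def geometric_range(receiver_ecef, satellite_ecef):
    """Euclidean distance between receiver and satellite (ECEF, meters)."""
    return math.dist(receiver_ecef, satellite_ecef)


def pseudorange_correction(base_ecef, satellite_ecef, base_pseudorange_m):
    """Correction at a base station with precisely known coordinates.

    The difference between the computed geometric range and the measured
    pseudorange captures the errors seen at the base station.
    """
    return geometric_range(base_ecef, satellite_ecef) - base_pseudorange_m


def apply_correction(rover_pseudorange_m, correction_m):
    """Add the base-station correction to the rover pseudorange."""
    return rover_pseudorange_m + correction_m
```

Errors that are common to the base station and the rover cancel in the corrected pseudorange, whereas receiver-specific effects such as multipath and measurement noise do not. This is one reason why the benefit of differential corrections can depend on the environment.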
Figure 21 illustrates the north, east, and up positioning errors obtained in all the tested modes. Among these, the M5-GVINS solution achieved the highest precision and stability, exhibiting minimal temporal fluctuations compared to the other methods. The M4-DGVINS configuration provided the most accurate vertical estimates in terms of the RMSE, although its larger standard deviation indicates less consistency and a higher noise level in the up component. Compared to the M1 and M2 modes, all the GVINS-based approaches (M3, M4, and M5) produced smoother and more reliable trajectories, confirming the advantages of tightly coupled visual–inertial–GNSS integration in improving precision.
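North, east, and up errors of this kind are commonly obtained by rotating the ECEF difference between the estimated and reference positions into a local frame at the reference point. A minimal Python sketch of that standard rotation follows; it assumes that both positions are available in ECEF and that the geodetic latitude and longitude of the reference point are known. The function name is hypothetical.

```python
import math


def ecef_error_to_enu(est_ecef, ref_ecef, ref_lat_deg, ref_lon_deg):
    """Rotate the ECEF position error (estimate minus reference) into
    east, north, and up components at the reference point."""
    lat = math.radians(ref_lat_deg)
    lon = math.radians(ref_lon_deg)
    dx, dy, dz = (e - r for e, r in zip(est_ecef, ref_ecef))

    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    return east, north, up
```

The horizontal (2D) error is then the norm of the east and north components, and the 3D error is the norm of all three.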
Figure 22 presents the cumulative distribution functions (CDFs) of the 2D, vertical, and 3D positioning errors in all the modes. The 95th percentile of the 3D positioning error exceeds 40 m for the M1 and M2 cases, reflecting the weaker performance of standalone and differential GPS. When GVINS is employed, this value is reduced to approximately 15 m, demonstrating the benefit of integrating visual–inertial information with GNSS. Applying differential corrections improves the accuracy further, achieving a performance level comparable to that of the M5 u-blox receiver case.
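An empirical CDF and its percentiles can be computed directly from the sorted error samples. The Python sketch below uses the nearest-rank percentile definition, which is an assumption; other interpolation rules give slightly different values for small samples. The function names are hypothetical.

```python
import math


def empirical_cdf(errors):
    """Return the sorted errors and the fraction of samples at or below each."""
    values = sorted(errors)
    n = len(values)
    fractions = [(i + 1) / n for i in range(n)]
    return values, fractions


def percentile_nearest_rank(errors, pct):
    """Smallest sample such that at least `pct` percent of samples are <= it."""
    values = sorted(errors)
    # Multiply before dividing so that common cases such as pct=95 stay exact.
    rank = max(1, math.ceil(pct * len(values) / 100))
    return values[rank - 1]
```

Calling `percentile_nearest_rank(errors_3d, 95)` on a list of 3D error norms yields the kind of 95th-percentile figure read from the CDF curves.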
Figure 23 illustrates the spatial distributions of the positioning errors in all the tested modes. Among them, the M5-GVINS solution exhibits the most compact scatter, indicating the highest positioning precision. However, a small bias is observed toward the negative north direction, suggesting the presence of a systematic offset. Overall, the GVINS-based approaches (M3–M5) show significantly better precision and consistency than the non-GVINS modes (M1–M2), highlighting the effectiveness of the tightly coupled visual–inertial–GNSS integration.
7. Conclusions
This study evaluated the potential of the low-cost, low-power, noisy, single-frequency GNSS receivers found in mass-marketed smartphones within the tightly coupled visual–inertial navigation framework of GVINS. A custom hardware and software platform was developed to support dataset collection and evaluation. It was built around a Raspberry Pi 5 computer integrating two global-shutter cameras and an Xsens MTi-1 IMU, with the components operating under different ROS distributions interconnected via a ROS bridge. GNSS measurements were recorded using a u-blox ZED-F9P receiver with a dedicated antenna and a Samsung A51 smartphone, which were rigidly mounted together to enable a direct comparison between the GVINS results obtained with measurements from the two devices. Android raw GNSS measurements were collected using a custom Android application. Ground-truth positions were obtained using GNSS Network RTK provided by the Turkish Permanent CORS Network, TUSAGA–Active, and a u-blox receiver–antenna pair. The Raspberry Pi system time was synchronized with a GPS time server to ensure that the timestamps of the local sensors (camera and IMU) were aligned with GPS time (GPST). Three datasets were collected to represent diverse real-world scenarios: an open-sky sports field, an urban area surrounded by buildings, and a driving experiment within a campus environment. To evaluate the DGVINS method, differential corrections were obtained from the nearby IGS station (ANK2) using a code-pseudorange-based method. Five distinct evaluation schemes were used: SPP, DGPS, GVINS, and DGVINS with Samsung A51 measurements, and GVINS with u-blox measurements. This study also evaluated the performance of the u-blox ZED-F9P receiver within the GVINS framework using 1 Hz measurements, whereas the original work [23] reported results based on 10 Hz data.
The u-blox ZED-F9P GNSS receiver provided the most reliable and precise results due to its advanced algorithms, higher signal quality, and dedicated antenna. When integrated into the GVINS framework, the smartphone-based solution still achieved reasonably accurate positioning despite its inherent hardware limitations. Our results show that the improvements achieved by M4-DGVINS compared to M1-SPP are 80.4%, 64.9%, and 83.8% for the sports field, campus-walking, and campus-driving datasets, respectively, in terms of 3D RMSE. Although these percentages are broadly consistent across the scenarios, the campus-walking dataset exhibits a noticeably lower improvement. This may be primarily due to the urban canyon conditions observed along parts of the walking trajectory, where the number of visible satellites drops sharply. In these segments, the GVINS solution may experience a GNSS-induced position deviation caused by degraded satellite geometry, and applying differential corrections during these periods may be insufficient to mitigate these errors. When such degraded GNSS factors enter the factor-graph optimization window, their influence cannot be removed immediately; even after satellite visibility improves, the optimizer requires time to dilute the effect of these erroneous constraints. Consequently, the residual global bias persists over a portion of the trajectory, reducing the net improvement achieved by DGVINS in this scenario. In contrast, the open-sky sports field provides uninterrupted satellite visibility and pseudorange measurements obtained under higher C/No conditions, while the campus-driving dataset benefits from longer segments with less severe satellite blockage than those in the campus-walking scenario. These conditions may allow differential corrections to be utilized effectively, resulting in larger relative improvements with DGVINS.
Beyond these environment-specific effects, the remaining differences among the scenarios can be explained by the general error sources present in smartphone GNSS data, including measurement noise, multipath, atmospheric delay variations, and residual satellite orbit and clock errors. Even under these limitations, the results highlight the potential of smartphone GNSS receivers within the GVINS framework. Although they receive signals from fewer constellations and fewer satellites, with lower signal quality, they can still achieve performance comparable to that of a relatively higher-end dual-frequency GNSS receiver, the u-blox ZED-F9P. In addition to these findings, the relative performance of M4-DGVINS with respect to M3-GVINS varied significantly across the test scenarios. Improvements of 66.0% and 59.7% in 3D RMSE were obtained in the sports field and campus-driving datasets, whereas a 32.4% degradation occurred in the campus-walking dataset collected under more challenging conditions. A similar dataset-dependent trend was also observed in the standalone GNSS results: DGPS improved the 3D RMSE compared to that of SPP by 29.1% and 27.2% in the sports field and campus-driving datasets but only by 13.2% in the campus-walking dataset. These results may indicate that the effectiveness of differential corrections is condition-dependent.
Smartphones are also equipped with various sensors beyond the IMU, camera, and GNSS. Once reasonable performance is achieved using their rolling-shutter cameras and low-cost IMUs, whose measurements may degrade under dynamic conditions, additional sensors, such as barometers, magnetometers, and ranging sensors based on UWB, LiDAR, Wi-Fi, 5G, or Bluetooth, can also be incorporated to further enhance the performance. Deploying GVINS directly on Android devices would require real-time operation within the restricted processing power and energy budget of smartphones, motivating the development of lightweight feature extraction and tracking modules and the exploration of shorter optimization windows that can still deliver comparable performance. The findings of this study suggest that developing Android-based GVINS software holds strong potential for both indoor and outdoor augmented reality applications. In addition, higher-end smartphones containing dual-frequency GNSS receivers may be evaluated, as their improved chipsets, enhanced antennae, and better signal-processing capabilities have the potential to provide significantly higher-quality measurements and reduce many of the limitations observed in current devices. Modern smartwatches are increasingly equipped with GNSS receivers, while emerging wearable devices, such as smart glasses, integrate cameras and IMUs. The combination of these complementary sensing modalities across multiple wearable platforms presents a promising opportunity for future on-device, tightly fused navigation, enabling more robust and continuous positioning, even in challenging environments.