The Integration of GPS/BDS Real-Time Kinematic Positioning and Visual–Inertial Odometry Based on Smartphones

Abstract: The real-time kinematic positioning technique (RTK) and visual–inertial odometry (VIO) are both promising positioning technologies. However, RTK degrades in GNSS-hostile areas, where global navigation satellite system (GNSS) signals are reflected and blocked, while VIO is affected by long-term drift. The integration of RTK and VIO can improve the accuracy and robustness of positioning. In recent years, smartphones equipped with multiple sensors have become commodities and can provide measurements for integrating RTK and VIO. This paper verifies the feasibility of integrating RTK and VIO using smartphones, and we propose an improved algorithm to integrate RTK and VIO with better performance. We began by developing an Android smartphone application for data collection and then wrote a Python program to convert the data to a robot operating system (ROS) bag. Next, we established two ROS nodes to calculate the RTK results and accomplish the integration. Finally, we conducted experiments in urban areas to assess the integration of RTK and VIO based on smartphones. The results demonstrate that the integration improves the accuracy and robustness of positioning and that our improved algorithm reduces altitude deviation. Our work can aid navigation and positioning research, which is why we have open-sourced most of the code on GitHub.


Introduction
Global navigation satellite systems (GNSSs) have advanced at a breakneck pace in recent years. The expansion of constellations, as well as the addition of new signals and the introduction of multiple positioning solutions, have contributed to the advancement of precise navigation [1,2]. The real-time kinematic positioning technique (RTK) is representative and attracts research. RTK is a differential positioning technique that utilizes at least one stationary station to determine the location of movable receivers. The stationary station, also known as the reference station, is critical for providing a reference and mitigating common errors between itself and the movable receiver, also known as the rover or the user [3]. RTK simultaneously processes the pseudo-range and carrier phase measurements with the ambiguity resolution (AR) technique to generate more accurate positioning results [4]. According to certain studies, the dual-frequency RTK can accomplish quick AR for short baselines, resulting in real-time centimeter-level precision [5]. In comparison, the single-frequency RTK performance is restricted but can be considerably improved by utilizing satellites from various constellations. These constellations include the global positioning system (GPS), the China-developed Beidou navigation satellite system (BDS), the European Galileo system, and the Russian global navigation satellite system (GLONASS) [6][7][8]. RTKLIB is a well known and representative open-source software for calculating RTK positioning results [9].
Compared to geodetic-grade multifrequency GNSS receivers, consumer-grade receivers are more prevalent in everyday life because of their low cost and low-power consumption. Smartphones, which have become almost ubiquitous in modern society, can be thought of as consumer-grade GNSS receivers. However, smartphone-grade antennas might lose several or even dozens of decibels of sensitivity when compared to professional GNSS antennas, causing smartphones to struggle to maintain a lock on GNSS signals [10]. Additionally, GNSS signals are circularly polarized to suppress the effect of Faraday rotation, while smartphone-grade antennas adopt linear polarization [11]. As a result, the smartphone-grade antenna can only provide a poor carrier-to-noise ratio and insufficient multipath suppression, limiting the performance of the smartphone's built-in GNSS receiver [12]. The advancement of the Android operating system and GNSS chipsets promotes research into positioning, using smartphones. Before 2016, researchers could only obtain the positioning results calculated by the inbuilt GNSS chipset. This situation persisted until Google released Android N, which started providing the Android Raw GNSS Measurements API. Since then, developers have gained access to the pseudo-range and carrier phase measurements and have begun processing the measurements, using their positioning algorithms [13]. Nonetheless, a technique called duty cycling prevents smartphones from tracking the carrier phase continuously. Duty-cycling periodically powers on and off the GNSS chipset to extend the battery life [14]. Android P has a developer option titled "Force full GNSS measurements" that disables duty cycling, paving the path for precise positioning with the carrier phase [15]. Meanwhile, the Xiaomi MI 8 equipped with BCM47755 was released. The Xiaomi MI 8 can provide dual-frequency GNSS measurements and has become an ideal platform for research into smartphone-based precise positioning. 
According to studies, the Xiaomi MI 8 is promising for offering accurate positioning results in urban areas with RTK and precise point positioning (PPP) [16][17][18].
Users of smartphones are often pedestrians in urban areas. Buildings can block and reflect GNSS signals, obstructing receivers' ability to maintain signal tracking and exacerbating the multipath effect. The carrier phase tracking loop is the weakest link of the GNSS receiver, and the carrier tracking loop loses lock more easily than the code tracking loop [19,20]. The carrier phase measurements may occasionally be absent in urban areas [21]. A pedestrian often keeps their smartphone near their body when they need positioning. As a result, the user's body becomes an unavoidable signal blocker. This fact significantly constrains the performance of smartphone-based RTK. Other sensors and positioning approaches are therefore needed to help RTK provide continuous positioning results in urban areas. A common complementary solution is the inertial navigation system (INS) based on an inertial measurement unit (IMU) [22]. An IMU consists of gyroscopes and accelerometers. The measurements are subject to additive noise and a changing bias, which results in a long-term drift in positioning [23]. Visual odometry (VO), a camera-based visual approach, is also a supplementary positioning solution [24]. VO performance is constrained by lighting conditions, ambient textures, and device speed. In the case of monocular VO, the system's absolute scale is ambiguous [25]. Visual–inertial odometry (VIO) is a technique that combines VO with INS to mitigate long-term drift and resolve the scale ambiguity. Additionally, the combination increases robustness [26]. VINS-mono, ORB-SLAM3, and MSCKF are representative VIO algorithms [27][28][29]. Generally, these algorithms are integrated with the robot operating system (ROS) [30]. Nonetheless, VIO systems still undergo long-term drift [31]. In addition, VIO generates a pose output that includes the estimation of position and orientation in local coordinates, indicating that it is a relative positioning technique.
Its positioning results depend on the starting point. As a result, VIO results are difficult to reuse without a fixed global coordinate frame [32]. These positioning techniques are summarized in Table 1. In Table 1, ✓ means that the technique has the corresponding feature, while × means the opposite.
VIO can produce highly accurate relative positioning data in a short period of time. In contrast, GNSS positioning techniques can deliver absolute positioning results. Combining VIO and GNSS can provide locally accurate and global drift-free positioning results. In recent years, the integration of INS, VO, and GNSS has been a popular topic. ETH Zurich designed MSF-EKF, a loosely coupled GNSS/INS/VO framework, based on an extended Kalman filter (EKF) [33]. Rui Sun et al. proposed a GNSS/INS/VO fusion scheme, using two EKFs with a non-holonomic constraint (NHC) for real-time 3D vehicle state estimation [34]. Tong Qin and colleagues published VINS-fusion, a loosely coupled GNSS/VIO system based on optimization [35]. Tuan Li and his colleagues utilized MSCKF to tightly fuse RTK/INS/VO to increase velocity and attitude accuracy [36]. Shaozu Cao's team offered GVINS, derived from VINS-mono. GVINS optimizes GNSS measurements, including pseudo-range and Doppler measurements, along with visual and inertial measurements [37]. These studies advance the field of navigation study. They focus on giving precise positioning results, using bulky and complex specialized devices. Additionally, researchers frequently need to assemble and synchronize sensors, as few commercial devices simultaneously collect GNSS measurements, inertial measurements, and images. For instance, VINS-fusion is built on a self-developed suite that contains stereo cameras and a DJI A3 controller (http://www.dji.com/a3 (accessed on 14 October 2021)). The DJI A3 controller comprises an IMU and a GNSS receiver. The team proposing GVINS combines a u-blox ZED-F9P receiver (https://www.u-blox.com/en/product/zed-f9p-module (accessed on 14 October 2021)) with a VI-Sensor [38]. The VI-Sensor synchronizes the camera and IMU well, and the ZED-F9P delivers a pulse per second (PPS) signal to trigger the VI-Sensor and align the time. 
Regrettably, the VI-Sensor is no longer commercially available (https://github.com/ethz-asl/libvisensor/issues/11 (accessed on 14 October 2021)). Rui Sun and Tuan Li both utilized three components to acquire GNSS measurements, IMU measurements, and images. They used the PPS signal to trigger the camera and record the GPS time of the exposure. These works are summarized in Table 2. In comparison to these sophisticated and professional devices, smartphones are ubiquitous portable devices in modern society. Currently, smartphones are generally equipped with a camera, an IMU, and a GNSS chipset, which provide a platform for data fusion. Admittedly, these low-cost sensors have certain inherent flaws. For example, there are temporal offsets between the camera and the IMU. Additionally, smartphones generally adopt rolling shutter cameras that can generate motion blur [39]. The abovementioned antenna's low performance is also a significant issue. However, it is critical to investigate the potential of smartphones' positioning abilities [40,41]. Certain VIO algorithms, such as VINS-mono, can estimate the temporal offset, and an appropriate integration can compensate for the sensors' shortcomings. In addition, smartphone users can easily align the GPS time with the local time of the IMU and images. Researchers can treat smartphones as an expedient to study the integration of GNSS and VIO if they do not have the resources to build up a specialized suite.
There are multiple applications for logging measurements from smartphone inbuilt sensors. The GEO++ RINEX Logger is a representative application that can generate a file of GPS, BDS, Galileo, and GLONASS measurements in the receiver independent exchange (RINEX) format [42]. RINEX is a widely used standard format in the fields of geodesy and navigation. The GEO++ RINEX Logger generates a file that can be directly processed by RTKLIB. Many studies on smartphone positioning are based on this application [43,44]. Regrettably, the GEO++ RINEX Logger can only provide GNSS measurements and does not allow access to its source code. The MARSLogger is an open-source application for recording IMU measurements and images [45]. However, the MARSLogger cannot offer GNSS measurements. Our previous work open sourced CIGRLogger to collect images, IMU measurements, and GPS measurements. The GPS measurements are output in RINEX format and can be directly processed by RTKLIB, similar to what the GEO++ RINEX Logger does [46]. However, valid satellites in a single constellation are unable to meet the positioning requirements in urban areas. In this work, we upgrade our CIGRLogger to collect BDS measurements to introduce more valid satellites. CIGRLogger can also collect magnetic measurements and barometric measurements. These Android applications are summarized in Table 3. Table 3 illustrates the sensors from which the applications can collect measurements. In Table 3, ✓ means that the application can collect the corresponding sensor's measurements, while × means the opposite.

Many academics focus on positioning using smartphone built-in sensors. Researchers from Nottingham Scientific Limited (NSL) assessed smartphone-based multifrequency RTK and PPP performance in urban environments. Their research demonstrates that smartphone-based PPP and RTK can provide a positioning precision of 1-2 m [47]. Researchers from Wuhan University (WHU) used magnetic measurements to assist the IMU in smartphones. Additionally, they utilized pseudo-observations and strategies such as the zero-velocity update (ZUPT) and the zero angular rate update (ZARU) to effectively suppress IMU drifts and, hence, offer robust, accurate positioning results [48,49]. A team from the Institute of Space Technology and Space Applications (IST) tested the performance of loosely coupled RTK/INS with a smartphone. The results indicate that the introduction of the IMU bridges RTK gaps and decreases RTK fluctuations [50]. Numerous researchers are studying smartphone-based VIO algorithms. Peiliang Li modified VINS-mono and ported his VINS estimator to the iPhone 7 [51]. Yuan Wang and his team deployed VIO algorithms on an Android smartphone [40]. In recent years, some researchers have applied machine learning and deep learning techniques to smartphone navigation. They collected training data and trained models to estimate a smartphone's trajectory using IMU measurements. Hang Yan and his colleagues employed a support vector machine (SVM) and a neural network in sequence to provide trustworthy positioning results using a smartphone's IMU [52,53]. However, few studies have concentrated on fusing RTK, IMU, and images with a smartphone to provide positioning results. These works are summarized in Table 4. The following are the contributions of this paper.
(1) We demonstrate the feasibility of integrating RTK and VIO, using smartphones. Our previous work verifies the feasibility of continuous positioning based on RTK and VIO, using smartphones. However, in our previous work, RTK and VIO do not run simultaneously, but alternate to estimate a positioning solution. The position output from RTK does not affect the output from VIO and vice versa [46]. In our previous work, we combined RTK with VIO in a straightforward and unreliable way. In this paper, we choose more reliable and robust strategies to integrate RTK and VIO. First, we upgrade our CIGRLogger to collect BDS measurements to introduce more valid satellites. Then, as many VIO algorithms are coupled with ROS, we design a Python script to convert the images, IMU measurements, and GNSS measurements into a ROS bag. A ROS bag is a file that stores messages for ROS. Each kind of message has a unique topic. ROS nodes can subscribe to different topics to obtain the data they require. Following that, RTKLIB is a sophisticated system that consists of many algorithms and functions. The algorithm is implemented in ANSI C. We follow RTKLIB's principle to implement the RTK function in C++ and package these codes into a ROS node. This RTK node subscribes to the topics in the ROS bag created by our Python script and publishes the RTK positioning results to other nodes. This RTK node can be directly integrated into other ROS algorithms. Finally, we adopt the integration strategy of VINS-fusion [35] and integrate our RTK node with VINS-mono [27]. Another node based on optimization is introduced to accomplish the integration. The details of the optimization are explained in Section 2.3. The experiments demonstrate that the integration of both approaches combines RTK and VIO's advantages and can provide accurate continuous positioning results with a smartphone in urban areas.
(2) We provide valuable research tools. As shown in Table 2, many researchers work on the GNSS/INS/VO fusion algorithm based on the bulky and complex specialized devices. We open source our codes on GitHub to facilitate the researchers who lack the resources to build a specialized suite for RTK/VIO integration research. The researchers can collect measurements using an Android smartphone with CIGRLogger. They do not need to implement RTK nodes themselves, which saves time and effort. The academics can concentrate on integration algorithms without spending too much effort on data acquisition, data transformation, and RTK algorithm implementation.
(3) We improve the integration strategy for smartphones. The positioning results provided by RTK with a smartphone can fluctuate immensely. Due to the geometric distribution of satellites, these fluctuations are more pronounced in vertical positioning. We design a sliding window strategy and calculate the vertical positioning fluctuations in the window. The result is treated as a criterion to adjust the weight for RTK in integration. This improvement decreases the deviation in altitude positioning results when compared to the VINS-fusion integration strategy.
As our work involves several existing algorithms, we use Figure 1 to depict the relationship between the existing algorithms and our work. The remainder of the paper is structured as follows: Section 2 discusses the RTK algorithm, the framework of VINS-mono, the integration strategy of VINS-fusion, the improvement of integration, and some specifics regarding our tools and devices. Next, Section 3 introduces the results of the experiments. Finally, Section 4 summarizes the conclusions and future work.

Materials and Methods
This section introduces the principles of the existing techniques utilized by us and our novel algorithm. The existing techniques include RTK in Section 2.1, VINS-mono in Section 2.2, and VINS-fusion in Section 2.3. Our proposed algorithm is introduced in Section 2.4. We introduce the details about our experiments in Section 2.5. A comparison between this work and our previous work is presented in Section 2.6.
In this section, we use t to denote the time. We use ∆t to denote the time interval between two adjacent sampling epochs. We use a notation that is followed by (t) to denote a variable that changes over time.

Implementation of RTK
This section introduces the RTKLIB principle [9,54]. We implement our RTK node in C++ following this principle. The current version of our RTK node has no algorithmic innovations when compared to RTKLIB. We use "G" as a descriptive term for GPS and use "B" as a descriptive term for BDS.

The Single Difference and the Double Difference
RTK is a differential positioning technique involving a single difference (SD) and a double difference (DD). The SD and DD techniques can mitigate the influence of biases because the measurements of the user and those of the reference station are correlated. Figure 2a depicts the geometry of two satellites and two receivers. Figure 2b explains how to calculate the SD and DD measurements [55].
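The cancellation that SD and DD provide can be illustrated with a small numerical sketch. The code below is not taken from our RTK node; the function names, the sample ranges, and the sign convention (rover minus reference for SD, reference satellite minus other satellite for DD) are assumptions chosen for illustration.

```python
import numpy as np


def single_difference(rover, reference):
    """Rover-minus-reference difference per satellite (same satellite order)."""
    return np.asarray(rover, dtype=float) - np.asarray(reference, dtype=float)


def double_difference(sd, ref_index=0):
    """SD of the reference satellite minus the SD of every other satellite."""
    sd = np.asarray(sd, dtype=float)
    others = np.delete(sd, ref_index)
    return sd[ref_index] - others


# Hypothetical geometric ranges (metres) from each receiver to three satellites.
geom_rover = np.array([20_000_000.0, 21_500_000.0, 22_300_000.0])
geom_ref = np.array([20_000_100.0, 21_499_950.0, 22_300_020.0])

# Hypothetical clock terms, already scaled to metres.
sat_clock = np.array([30.0, -12.0, 5.0])   # one value per satellite
rover_clock, ref_clock = 150.0, -40.0      # one value per receiver

# Simplified pseudo-ranges: geometry plus receiver clock minus satellite clock.
rho_rover = geom_rover + rover_clock - sat_clock
rho_ref = geom_ref + ref_clock - sat_clock

sd = single_difference(rho_rover, rho_ref)   # satellite clocks cancel
dd = double_difference(sd)                   # receiver clocks cancel as well

# The DD of the measurements equals the DD of the pure geometry.
dd_geometry = double_difference(geom_rover - geom_ref)
print(dd, dd_geometry)
```

In this toy model the SD still contains the difference of the two receiver clock terms, and only the DD reduces to pure geometry. Atmospheric delays and noise are omitted here for brevity.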
In Figure 2a, $o(t)$ with the specific superscript and subscript is the unit line-of-sight (LOS) vector from a receiver to a satellite. In Figure 2, $\rho(t)$ with the specific superscript and subscript denotes the pseudo-range between the corresponding receiver and the corresponding satellite. The carrier phase measurements are denoted by $\varphi(t)$ with different superscripts and subscripts. The pseudo-range and the carrier phase are formulated in Equations (1) and (2), where $r(t)$ denotes the geometric distance between the receiver and the satellite in meters; $\lambda$ and $N$ represent the wavelength and the integer ambiguity, respectively; $\omega(t)$ with different subscripts denotes the measurement noise and unmodeled error in the pseudo-range and the carrier phase, respectively; $\tau_{iono}(t)$ and $\tau_{tropo}(t)$ refer to the ionospheric delay and the tropospheric delay, respectively; $\delta_{rec}(t)$ and $\delta_{sat}(t)$ denote the receiver clock bias and the satellite clock bias, respectively; and $c$ stands for the speed of light. The carrier phase measurements are more precise than the pseudo-range measurements. The phase tracking loop uses a numerically controlled oscillator (NCO) to generate a local signal with the same frequency and phase as the received signal. The loop utilizes a phase discriminator and a filter to compute the feedback to the NCO. The discriminator cannot distinguish between one cycle and another and thus converges to the nearest cycle. The NCO can only match the phase of the local and received signals within one cycle. As a result, the carrier phases are ambiguous by an unknown integer number of wavelengths, which is denoted by $N$. RTK utilizes the SD and DD measurements to eliminate the influences of the clock biases and the delays and employs the AR technique to calculate the ambiguities for the carrier phases. In Figure 2b, the SD pseudo-range and the SD carrier phase are denoted by $\Delta\rho(t)$ and $\Delta\varphi(t)$.
SD measurements eliminate the influence introduced by the satellite clock bias. When used in short-baseline RTK, the SD technique can eliminate most ionospheric delays and tropospheric delays. $\nabla\Delta\rho(t)$ and $\nabla\Delta\varphi(t)$ refer to the DD pseudo-range and the DD carrier phase, respectively. DD measurements have the advantage of removing the influence of the receiver clock bias. In short-baseline RTK, the DD pseudo-range and DD carrier phase are represented in meters in Equations (3) and (4), where $\nabla\Delta(\cdot)$ denotes the DD operator. The state vector of the RTK Kalman filter includes the phase ambiguities, and the measurement vector is composed of the DD pseudo-ranges and DD carrier phases. Each GNSS system has its own time reference. A time offset exists between GPS and BDS, and the clock biases of different systems are different. Admittedly, a GNSS system can broadcast the ephemeris that includes the information of the satellite clock bias. However, the satellite clock bias calculated in this way is not accurate enough. The SD and DD techniques are introduced to eliminate the influence of the clock bias. If we chose only one reference satellite for both GPS and BDS, we would have to model and estimate the clock bias between the two systems, which can degrade the positioning accuracy. In this paper, we form DDs only between satellites of the same system to eliminate the clock biases more completely.
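For reference, the observation model described in this subsection is commonly written in the following textbook form. This is a sketch in the notation defined above, not a reproduction of Equations (1)-(4). It assumes that the carrier phase is expressed in meters and that the residual atmospheric delays are negligible over a short baseline.

```latex
\begin{aligned}
\rho(t) &= r(t) + c\,[\delta_{rec}(t) - \delta_{sat}(t)]
          + \tau_{iono}(t) + \tau_{tropo}(t) + \omega_{\rho}(t) \\
\varphi(t) &= r(t) + c\,[\delta_{rec}(t) - \delta_{sat}(t)]
          - \tau_{iono}(t) + \tau_{tropo}(t) + \lambda N + \omega_{\varphi}(t) \\
\nabla\Delta\rho(t) &\approx \nabla\Delta r(t) + \nabla\Delta\omega_{\rho}(t) \\
\nabla\Delta\varphi(t) &\approx \nabla\Delta r(t) + \lambda\,\nabla\Delta N
          + \nabla\Delta\omega_{\varphi}(t)
\end{aligned}
```

The ionospheric term enters the code and phase observations with opposite signs, while both clock terms are common to the two observables. This is why differencing across receivers and then across satellites leaves only geometry, ambiguity, and noise.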

RTK's Kalman Filter
RTKLIB is built on an EKF that consists of two steps: prediction and update. The EKF provides floating ambiguity solutions. The LAMBDA algorithm [56] is then used to search for integer ambiguity solutions. This section discusses the EKF in detail for GPS/BDS single-frequency RTK. The state vector $\chi_{RTK}(t)$, in general, contains the SD phase ambiguities and the position. We can expand the state vector and use the velocity and acceleration to smooth the position fluctuations. If we extend the state vector and consider two constellations, the state vector can be represented as in Equation (5), where $d_u(t)$ refers to the user's position; $v_u(t)$ refers to the user's velocity; $a_u(t)$ refers to the user's acceleration; $e_G(t)$ stands for the SD phase ambiguities of GPS satellites; and $e_B(t)$ stands for the SD phase ambiguities of BDS satellites.
We define the measurement vector $y(t)$ in Equation (6), where $\nabla\Delta\varphi_G(t)$ and $\nabla\Delta\varphi_B(t)$ represent the DD carrier phase vectors of GPS satellites and BDS satellites, respectively, and $\nabla\Delta\rho_G(t)$ and $\nabla\Delta\rho_B(t)$ refer to the DD pseudo-range vectors of GPS satellites and BDS satellites, respectively.
We use the variables $m_G$ and $m_B$ to represent the numbers of visible GPS satellites and visible BDS satellites, respectively. $m_G$ and $m_B$ are not necessarily equal. We define the state transition matrix $F(t)$ in Equation (7), where $I$ with different subscripts denotes identity matrices of different dimensions and $O$ with various subscripts denotes zero matrices of varying dimensions. $F(t)$ is derived from fundamental kinematics theory and the fact that the carrier phase ambiguity is constant during uninterrupted tracking of a satellite signal. We use $Q(t)$ to denote the covariance matrix of the process noise, as shown in Equations (8)-(10). In Equation (9), $R^{ECEF}_{ENU}(t)$ refers to the rotation matrix from the east-north-up (ENU) coordinate frame to the Earth-centered, Earth-fixed (ECEF) coordinate frame. In Equation (10), $\sigma_{ve}$, $\sigma_{vn}$, and $\sigma_{vu}$ are the standard deviations of the east, north, and up components of the user's velocity.
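The structure of the state transition matrix can be sketched as follows. This is a minimal illustration, assuming the state is ordered as position, velocity, acceleration, GPS ambiguities, and BDS ambiguities, and that the kinematic part follows a constant-acceleration model. The function name `state_transition` and the sample values are hypothetical.

```python
import numpy as np


def state_transition(dt, m_g, m_b):
    """Build F for the state [position, velocity, acceleration, ambiguities].

    The 9x9 kinematic block is a constant-acceleration model; the ambiguity
    block is an identity because ambiguities stay constant while tracking
    is uninterrupted.
    """
    i3 = np.eye(3)
    z3 = np.zeros((3, 3))
    kinematic = np.block([
        [i3, dt * i3, 0.5 * dt**2 * i3],
        [z3, i3, dt * i3],
        [z3, z3, i3],
    ])
    f = np.eye(9 + m_g + m_b)
    f[:9, :9] = kinematic
    return f


# Two GPS satellites and one BDS satellite, 1 s between epochs.
F = state_transition(1.0, m_g=2, m_b=1)

# State: at the origin, moving at 1 m/s and accelerating at 2 m/s^2 along x,
# with three arbitrary ambiguity values.
x = np.array([0, 0, 0, 1, 0, 0, 2, 0, 0, 5, 6, 7], dtype=float)
x_next = F @ x
print(x_next)
```

The predicted position advances by $v\Delta t + a\Delta t^2/2$, the velocity by $a\Delta t$, and the ambiguity states pass through unchanged.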
We use $E(t)$ with different superscripts to denote the SD phase ambiguity state variables of different satellites. The relationship between $e(t)$ in the state vector in Equation (5) and $E(t)$ is shown in Equation (11), where $k_1$ and $k_2$ refer to the indices of GPS satellites and BDS satellites, respectively. $S^G_{k_1}$ denotes the $k_1$th GPS satellite, while $S^B_{k_2}$ denotes the $k_2$th BDS satellite.
We use $\nabla\Delta r^{S^G_1,S^G_{k_1}}(t)$ to denote the DD geometric range between the 1st GPS satellite and the $k_1$th GPS satellite, and $\nabla\Delta r^{S^B_1,S^B_{k_2}}(t)$ to denote the DD geometric range between the 1st BDS satellite and the $k_2$th BDS satellite. Both can be calculated using the user's position in the state vector in Equation (5) and the satellites' positions, as shown in Equation (12), where $d^{S^G_{k_1}}(t)$ denotes the position of the $k_1$th GPS satellite and $d^{S^B_{k_2}}(t)$ refers to the position of the $k_2$th BDS satellite. The position of a satellite is calculated using the ephemeris broadcast by the satellite. $d_{ref}$ is the position of the reference station. We use $h(\chi_{RTK}(t))$ to refer to the measurement function for the update step. The measurement function consists of four sub-functions, as shown in Equation (13).
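The DD geometric range can be sketched from the positions alone. The code below is an illustration under stated assumptions rather than our implementation: the first satellite in the list is taken as the reference satellite, the SD is rover minus reference station, the DD is reference satellite minus other satellite, and the coordinates are small made-up values rather than real ECEF positions.

```python
import numpy as np


def sd_range(sat_pos, user_pos, ref_pos):
    """Rover-minus-reference-station geometric range to one satellite."""
    return np.linalg.norm(sat_pos - user_pos) - np.linalg.norm(sat_pos - ref_pos)


def dd_ranges(sat_positions, user_pos, ref_pos):
    """DD geometric ranges within ONE constellation.

    sat_positions[0] is the reference satellite; one value is returned
    for each remaining satellite.
    """
    sd = np.array([sd_range(s, user_pos, ref_pos) for s in sat_positions])
    return sd[0] - sd[1:]


# Made-up geometry (metres) to keep the arithmetic easy to follow.
sats = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
d_u = np.array([1.0, 0.0, 0.0])     # user position taken from the state vector
d_ref = np.array([0.0, 0.0, 0.0])   # reference station position

dd = dd_ranges(sats, d_u, d_ref)
print(dd)
```

Calling `dd_ranges` once with the GPS satellites and once with the BDS satellites keeps the differencing inside each constellation, in line with the choice described in Section 2.1.1.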
In Equation (14), $\lambda_G$ and $\lambda_B$ are the wavelengths of the GPS L1 signal and the BDS B1I signal, respectively.
We define $D_G$ and $D_B$ as the SD transition matrices for GPS and BDS, as shown in Equations (16) and (17). We define $L_G(t)$ and $L_B(t)$ as matrices composed of LOS vectors for the two constellations, as shown in Equations (18) and (19). Finally, we define the observation matrix $H(t)$ and use $C$ to denote the covariance matrix of the observation noise. In Equation (22), $\sigma_\varphi$ and $\sigma_\rho$ denote the standard deviations of the phase measurement error and the pseudo-range measurement error, respectively.
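A matrix that maps SD quantities to DD quantities can be sketched as below. The layout follows the common convention of a leading column of ones and a shifted negative identity, with the first satellite as the reference; this convention and the helper names are assumptions for illustration. Stacking the GPS and BDS matrices block-diagonally reflects the per-constellation differencing used in this paper.

```python
import numpy as np


def sd_to_dd_matrix(m):
    """(m-1) x m matrix; row k yields SD[0] - SD[k+1]."""
    d = np.zeros((m - 1, m))
    d[:, 0] = 1.0
    d[np.arange(m - 1), np.arange(1, m)] = -1.0
    return d


def block_diag2(a, b):
    """Place two matrices on the diagonal of a larger zero matrix."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out


D_G = sd_to_dd_matrix(3)        # three GPS satellites
D_B = sd_to_dd_matrix(2)        # two BDS satellites
D = block_diag2(D_G, D_B)

# Hypothetical SD ambiguities: GPS first, then BDS.
sd_amb = np.array([5.0, 3.0, 2.0, 7.0, 4.0])
dd_amb = D @ sd_amb

# A bias common to one constellation (e.g., a system-specific receiver
# clock term) cancels because every row of D sums to zero within its block.
biased = sd_amb + np.array([9.0, 9.0, 9.0, -4.0, -4.0])
print(dd_amb, D @ biased)
```

Because each row only touches satellites of one system, no inter-system clock bias has to be modeled or estimated.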

The Structure of VINS-Mono
This section summarizes the structure of VINS-mono, which integrates recent research achievements in the fields of VIO and simultaneous localization and mapping (SLAM). We now give the frame definitions that we use throughout the remainder of the paper. $(\cdot)^w$ denotes the world frame, which corresponds to the pose after initialization. $(\cdot)^b$ denotes the body frame, which is the same as the IMU frame. $(\cdot)^{cam}$ refers to the camera frame. Figure 3 illustrates the VINS-mono structure [27].
VINS-mono processes measurements from an IMU and a monocular camera. VINS-mono begins with a measurement preprocessing module that preintegrates IMU measurements and extracts and tracks visual features. The system is subsequently initialized using the preprocessed measurements. The initialization module aligns the IMU preintegration results with feature observations to provide initial values of attitude, velocity, IMU bias, and scale for the system. These initial values enter the nonlinear optimization-based VIO that follows. This tightly coupled VIO makes use of a sliding window to bound the computational cost. VINS-mono defines the state vector in Equations (23) and (24), where $p^b_{cam}$ and $q^b_{cam}$ refer to the translation and rotation from the camera frame to the body frame. We use $i$ and $j$ to denote the keyframe index and the feature index, respectively. $x_i(t)$ in Equations (23) and (24) stands for the IMU state vector when the $i$th frame is captured. $x_i(t)$ includes the position, velocity, and orientation of the IMU in the world frame and the IMU biases in the body frame, as Equation (24) shows. $s_j(t)$ in Equation (23) denotes the inverse distance of the $j$th feature from its first observation. The variables $n_1$ and $n_2$ in Equations (23) and (24) denote the numbers of keyframes and features in the sliding window, respectively. If a frame is treated as a keyframe, the relocalization module will perform loop detection using the frame's features. The relocalization module compares this keyframe to all other keyframes in the database to identify a candidate for loop closure. The information associated with this keyframe is imported into the database. Finally, the pose graph optimization module verifies the relocalization results and establishes the feature-level connections between loop closure candidates and the current keyframe. The VIO module mentioned above uses these feature correspondences to eliminate drifts.
Relocalization can enhance VIO performance if a user frequently returns to a location they previously passed through. However, an outdoor user can move around a large region without ever revisiting a previous position. We can use RTK and other GNSS techniques instead of relocalization to eliminate drift for outdoor users.

The Integration Strategy of VINS-Fusion
VINS-fusion loosely couples VIO and GNSS, as it transforms the results of VIO and of GNSS into unified factors to construct the optimization problem. The integration of GNSS and VIO is based on a global pose graph. Figure 4 illustrates this global pose graph structure. This pose graph is a nonlinear least-squares problem [32]. In this section, we use $(\cdot)^g$ to denote the global frame.
In Figure 4, a state node represents a pose estimation in the global frame. VINS-fusion involves a VIO module and the global optimization. VINS-fusion's VIO module defines the state vector as Equation (25) shows. The same symbols in Equations (23) and (25) have the same meaning. In Figure 4, a factor is a constraint that is derived from one kind of measurement. The blue edge between two neighboring nodes represents a VIO factor and reflects a VIO-provided local constraint on two consecutive states. The purple edge represents a GNSS factor and reflects a global constraint on the position state of every node. GNSS outputs low-frequency positioning results, while VIO generates high-rate local poses. VINS-fusion runs global optimization at the same rate as the GNSS updates. After each global optimization, VINS-fusion updates the transformation matrix that reflects the rotation and translation from the local frame to the global frame. The transformation matrix is presented in Equation (26). VINS-fusion utilizes the VIO factor and the transformation matrix to constrain the change between neighboring pose estimations in the global frame and utilizes the GNSS factor to constrain the global position estimation.
VINS-fusion calculates the VIO factor, also known as the local factor, as Equation (27) shows. In Equation (27), $z^{VIO}_{i-1,i}(t)$ refers to the VIO results. The VIO factor uses these results as measurements to constrain the pose in the global frame; $p^g_{b,i}(t)$ and $q^g_{b,i}(t)$ denote the pose in the global frame when the $i$th keyframe is captured. $\ominus$ denotes the quaternions' minus operation. As noted previously, VIO can provide accurate positioning results in a local region. The local factor specifies that the change between two adjacent state nodes should correspond with that provided by VIO. The poses in VIO's world frame and the global frame can be transformed as Equation (28) shows. In Equation (28), $q^g_w(t)$ is the quaternion that can be calculated with the elements of $R^g_w(t)$.
VINS-fusion calculates the GNSS factor, also known as the global factor, as Equation (29) shows. The GNSS positioning results directly constrain the position states at each node. After each global optimization, the transformation matrix is updated as Equation (30) shows. The researchers who designed VINS-fusion use the GNSS covariance to determine the weight for GNSS in the optimization phase. They claim that the covariance is determined by the number of satellites when the measurement is received [32]. The more satellites the receiver receives measurements from, the smaller the covariance is. However, the number of satellites cannot accurately reflect RTK performance because a visible satellite may not provide a valid carrier phase. Additionally, the accuracy of GNSS positioning is highly dependent on the geometric distribution of satellites. The RTK node is based on an EKF, which can estimate the state variable covariances. In addition, RTK's state vector comprises the position. Therefore, we can use the inverse of the position covariances estimated by the RTK algorithm as the weight for RTK.
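The two ideas in this section, updating the local-to-global transformation and weighting RTK by its inverse position covariance, can be sketched with homogeneous matrices. The sketch assumes the transformation is obtained by composing the optimized global pose with the inverse of the matching VIO pose; the function names and numerical values are hypothetical and do not come from the VINS-fusion source code.

```python
import numpy as np


def make_pose(rotation, translation):
    """4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    t = np.eye(4)
    t[:3, :3] = rotation
    t[:3, 3] = translation
    return t


def update_world_to_global(t_g_b, t_w_b):
    """Transform from VIO's world frame to the global frame.

    t_g_b: optimized body pose in the global frame.
    t_w_b: VIO body pose in the local world frame at the same instant.
    """
    return t_g_b @ np.linalg.inv(t_w_b)


def rtk_weight(position_cov):
    """Information matrix: inverse of the 3x3 position covariance block."""
    return np.linalg.inv(position_cov)


# The global frame is rotated 90 degrees about z relative to the VIO world frame.
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_w_b = make_pose(np.eye(3), [1.0, 0.0, 0.0])
t_g_b = make_pose(rz90, [5.0, 5.0, 0.0])
t_g_w = update_world_to_global(t_g_b, t_w_b)

# A later high-rate VIO pose is mapped into the global frame without
# waiting for the next GNSS update.
later_global = t_g_w @ make_pose(np.eye(3), [2.0, 0.0, 0.0])

# Hypothetical RTK position covariance (m^2): vertical is less certain.
weight = rtk_weight(np.diag([0.04, 0.04, 0.25]))
print(later_global[:3, 3], np.diag(weight))
```

In this example, moving 1 m along the local x axis appears as 1 m along the global y axis, and the vertical component receives a smaller weight than the horizontal ones because its variance is larger.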

The Improved Integration Strategy for Smartphones
In general, a smartphone user's altitude does not change rapidly over a short period of time. However, the deviation in GNSS vertical positioning results is more evident than that in horizontal positioning because of the geometric distribution of satellites. The RTK algorithm employs an EKF to estimate the position covariances, which indicate the accuracy of the position. If the covariances are estimated accurately enough, they fully reflect the positioning accuracy, and their inverse can be set as the weight for RTK in the optimization phase. However, the estimation of covariances can be inaccurate. In a GNSS-hostile area, the multipath effects can introduce random errors into the measurements. It is difficult to model these errors with a perfect measurement covariance matrix, which can reduce the estimation accuracy. In this case, the inverse of the position covariances is not an ideal weight for RTK. Based on these facts, we design an improved integration strategy that adjusts the weight dynamically to reduce the vertical positioning deviation. First, we collect sets of GNSS measurements with the Xiaomi MI 8 in an open-sky environment devoid of tall buildings. We use the RTK algorithm to process these measurements and obtain the Root-Mean-Square Error (RMSE) of the positioning results. We use $\mu_0$ to denote the RMSE in the open-sky environment. Then, we set a threshold based on $\mu_0$. Next, we use a sliding window containing a fixed number of RTK positioning results. Once the RTK algorithm generates a new positioning result, the sliding window moves. We calculate the RMSE of the positioning results in the sliding window and designate it as an indicator of positioning result fluctuations. We use µ(t) to denote the RMSE in the sliding window. Finally, we compare µ(t) with the above threshold. If µ(t) is smaller than the threshold, we directly use the inverse of the position covariance as the RTK weight.
If µ(t) exceeds the threshold, we dynamically adjust the RTK weight in the optimization phase. This process is shown in Equation (31), where γ(t) denotes the position covariance estimated by RTK and β(t) denotes the RTK weight in the optimization phase. A low RMSE indicates a period of stable positioning results, and the integration adopts a large weight for RTK. Once the RMSE in a sliding window surpasses the threshold, the integration reduces the weight for RTK and becomes more reliant on the VIO positioning results. Figure 5 illustrates this process. The sliding window remains stationary if there are no new RTK results. In this case, we do not calculate the RMSE of the positioning results in the window, and the integration strategy is completely reliant on the VIO positioning results. The improved integration strategy aims to dynamically adjust the weight for RTK to reduce the influence of RTK positioning errors. It does not change VINS-fusion's optimization principle: it still updates the transformation matrix and uses it to transfer the VIO results from the local frame to the global frame. The details described in Section 2.3 also apply to our improved integration strategy.
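The sliding-window logic can be sketched as follows. This is an illustration under stated assumptions, not the rule of Equation (31): the windowed quantity is taken to be the RTK altitude, the window RMSE is computed about the window mean, and the down-weighting factor (threshold/µ)² is a hypothetical choice. The class name, the window size, and the method names are likewise invented for this example.

```python
import math
from collections import deque

class SlidingWindowWeight:
    """Adaptive RTK weight driven by the RMSE of recent RTK altitudes."""

    def __init__(self, threshold, size=10):
        self.threshold = threshold        # set from the open-sky RMSE mu_0, must be > 0
        self.window = deque(maxlen=size)  # the oldest result drops out automatically

    def update(self, altitude, variance):
        """Add a new RTK result and return (mu, beta).

        variance plays the role of gamma(t), the EKF position variance.
        """
        self.window.append(altitude)
        mean = sum(self.window) / len(self.window)
        mu = math.sqrt(sum((a - mean) ** 2 for a in self.window) / len(self.window))
        beta = 1.0 / variance             # default weight: inverse covariance
        if mu > self.threshold:
            # Assumed down-weighting rule; the paper's Equation (31) may differ.
            beta *= (self.threshold / mu) ** 2
        return mu, beta
```

The window only advances when `update` is called, which matches the behavior described above: with no new RTK result, no RMSE is computed and the fusion relies on VIO alone.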

Field Testing
This section introduces some details about our experiments.

Data Collection and Processing
The measurements are collected with our CIGRLogger. We upgrade our CIGRLogger to collect the BDS measurements, magnetometer measurements, and barometer measurements. The CIGRLogger linearly interpolates magnetometer measurements at the epoch of the gyroscope readings. We designed a Python script to convert measurements collected by our CIGRLogger to a ROS bag so that ROS-integrated algorithms can read and process these measurements directly. This work does not process all visible satellite measurements.
Each satellite whose elevation is lower than 15° will be discarded, as low-elevation satellites experience large atmospheric delays and multipath errors. In addition, satellites that cannot simultaneously provide a pseudo-range and a carrier phase will be disregarded, as RTK makes use of both pseudo-ranges and carrier phases. This work relies entirely on GNSS measurements acquired by the Xiaomi MI 8 to run the RTK algorithm and estimate the position and the position covariance. We do not use any additional data offered by Android APIs.
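The two selection rules can be expressed as a short filter. The record layout below is a hypothetical one chosen for illustration; it is not the CIGRLogger or ROS bag message format.

```python
from dataclasses import dataclass
from typing import List, Optional

ELEVATION_MASK_DEG = 15.0  # elevation mask stated in the text

@dataclass
class SatObs:
    sat_id: str                            # e.g. "G05" (GPS) or "C12" (BDS)
    elevation_deg: float
    pseudorange_m: Optional[float]         # None if no valid pseudo-range
    carrier_phase_cycles: Optional[float]  # None if no valid carrier phase

def select_satellites(obs: List[SatObs],
                      mask_deg: float = ELEVATION_MASK_DEG) -> List[SatObs]:
    """Keep satellites above the mask that provide both observables."""
    return [
        o for o in obs
        if o.elevation_deg >= mask_deg
        and o.pseudorange_m is not None
        and o.carrier_phase_cycles is not None
    ]
```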
We follow the principle of RTKLIB and implement the RTK algorithm in C++. We also package this code into a ROS node. This RTK node subscribes to the topics in the ROS bag mentioned above and publishes RTK positioning results to the other nodes, so algorithms integrated with ROS can fuse directly with our RTK node. The publicly available VINS-fusion implementation focuses on stereo VIO. However, most smartphones can only provide images from a monocular camera. Therefore, we introduce our RTK node and an integrating node into the framework of VINS-mono. The integrating node that integrates RTK and VIO adheres to VINS-fusion's integration strategy. After that, we improve the integrating node with a sliding window. Finally, we compare the performances of different integration strategies. Figure 6 shows the CIGRLogger and the flow chart of the software design.

Description of Devices and Scenarios
We use two u-blox NEO-M8T receivers to collect GNSS measurements (https://www.u-blox.com/zh/product/neolea-m8t-series (accessed on 14 October 2021)). One NEO-M8T is connected to a tactical antenna on the roof of our laboratory, while the other is connected to an AT340 mini-survey antenna that moves with the researcher (http://www.comnavtech.com/AT340.html (accessed on 14 October 2021)). The RTK algorithm processes the GNSS measurements collected by the two NEO-M8Ts to generate the reference trajectory, which serves as the ground truth. We use the Xiaomi MI 8 to capture images, GNSS data, and IMU data. A researcher holds the Xiaomi MI 8 in one hand with the AT340 over his head, as Figure 7a illustrates. We premeasure the height difference between the Xiaomi MI 8 and the AT340 and subtract it when evaluating the accuracy of the RTK and VIO integration. We perform walking tests along the track of Peking University's playground. We set $\mu_0$ as 1.2 for this scenario. The south of the playground is a GNSS-hostile area surrounded by several tall buildings. Figure 7b depicts the scenario. Figure 8 shows the position of the Xiaomi MI 8's phase center [57]. Additionally, we use Kalibr to calculate the extrinsic matrix that reflects the rotation and translation from the IMU to the camera [58].

Differences between This Work and Our Previous Work
We proposed a continuous positioning algorithm based on RTK and visual-inertial SLAM (VI-SLAM) in our previous work [46]. The location part of VI-SLAM (VIO) is employed in [46] to assist RTK in urban areas. While both our previous and current studies involve RTK and VIO, the strategies for combining them are different. This section discusses how this work differs from our previous work.
The strategy employed in [46] is straightforward but unreliable. RTK and VIO work at distinct times and never simultaneously. VIO does not work when RTK can provide positioning results, and it starts to provide positioning results only when an RTK outage happens. The RTK positioning results and VIO positioning results do not affect each other. The strategy is flawed in three ways. First, as noted previously in this study, VIO can only provide positioning results in a local frame, which is determined by the starting point. It is difficult to reliably convert VIO positioning results to the global frame during the RTK outage. Second, as RTK and VIO run separately in different periods, they cannot compensate for each other's deficiencies. VIO cannot smooth the deviations in the RTK results, and RTK cannot help reduce the influence of VIO drift. Finally, a strategy that does not let RTK and VIO work simultaneously is insufficient to verify the potential of a smartphone's positioning abilities.
We develop this work to address the aforementioned concerns. We employ VINS-fusion's strategy and our improved strategy to integrate RTK and VIO. RTK and VIO operate simultaneously, and their outputs are fused to provide more robust and accurate positioning results. A transformation matrix is estimated and updated to ensure a reliable transformation of VIO positioning results into the global frame. This work can verify the potential of a smartphone's positioning abilities because we fuse GNSS/IMU/visual measurements that are collected simultaneously. The differences between the current strategy and the previous strategy are illustrated in Figure 9.

In addition, measurements were post-processed in our previous work, which was based on two open-source software programs: RTKLIB and VINS-mono. RTKLIB implements the RTK algorithm in ANSI C, whereas many VIO algorithms are implemented in C++ and are coupled with ROS. In this work, we implement our RTK ROS node and integration node following the principles of RTKLIB and VINS-fusion. The current version of our RTK node has no algorithmic innovations when compared to RTKLIB. However, it can be flexibly embedded into a variety of ROS-based algorithms. Researchers who study the integration of RTK and VIO do not need to implement RTK nodes themselves, which saves time and effort. We open source our software to facilitate the work of other researchers. Our work provides tools for academics who lack the resources to build a specialized suite for RTK/VIO integration research, so that they can concentrate on integration algorithms without spending too much effort on data acquisition, data transformation, and RTK algorithm implementation.

The Validity of GNSS Measurements Collected by the Smartphone
This section presents the visibility of GNSS measurements in a walking test. These measurements are collected with a Xiaomi MI 8. The CIGRLogger invokes the Android API getAccumulatedDeltaRangeState to obtain the state of a carrier phase (https://developer.android.com/reference/android/location/GnssMeasurement#getAccumulatedDeltaRangeState() (accessed on 14 October 2021)). The CIGRLogger discards the carrier phase measurements with invalid states. As discussed previously, the smartphone's antenna has a low-level performance, and the multipath can influence the phase tracking loop. Sometimes, a visible satellite cannot provide both a valid pseudo-range and a valid carrier phase. Figure 10a,b shows that satellites with a valid carrier phase are usually fewer than satellites with a valid pseudo-range. Figure 11a,b shows that the number of satellites with valid phases can occasionally fall below four, while more than four satellites are visible at the same moment. RTK cannot provide positioning results when fewer than four satellites provide valid carrier phases. VIO can help bridge this outage.
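The carrier phase check can be illustrated offline on logged state values. The bit values below are those of the Android `GnssMeasurement` ADR state constants. The exact criterion used by the CIGRLogger is not stated in the text, so the rule here (valid bit set, no reset and no cycle slip flagged) is an assumption, and the function names are hypothetical.

```python
# Bit flags returned by GnssMeasurement.getAccumulatedDeltaRangeState()
ADR_STATE_VALID = 0x1
ADR_STATE_RESET = 0x2
ADR_STATE_CYCLE_SLIP = 0x4

MIN_SATS_FOR_RTK = 4  # the text states RTK needs at least four valid phases

def carrier_phase_usable(adr_state: int) -> bool:
    """Assumed rule: valid bit set, and neither reset nor cycle slip flagged."""
    valid = bool(adr_state & ADR_STATE_VALID)
    disturbed = bool(adr_state & (ADR_STATE_RESET | ADR_STATE_CYCLE_SLIP))
    return valid and not disturbed

def rtk_possible(adr_states) -> bool:
    """True if enough satellites in one epoch provide a usable carrier phase."""
    usable = sum(carrier_phase_usable(s) for s in adr_states)
    return usable >= MIN_SATS_FOR_RTK
```

An epoch with five visible satellites whose states are 1, 1, 1, 5, and 0 yields only three usable phases under this rule, which is the situation in which an RTK outage occurs although more than four satellites are visible.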

The Advantage of the Introduction of BDS Satellites
As we modify the CIGRLogger to collect BDS measurements using a smartphone, this section discusses the benefits of the introduction of BDS satellites. We collect GPS/BDS measurements with the Xiaomi MI 8, and we compare GPS RTK and GPS/BDS RTK. Figure 12 illustrates the results. Figure 12 shows that both GPS RTK and GPS/BDS RTK suffer an outage in the south of the playground. The outage results from signal blockage by the tall buildings around the playground. This fact is consistent with the satellites' visibility in Figure 10 and the number of satellites depicted in Figure 11. However, the introduction of BDS shrinks the outage and improves the positioning continuity. VIO can assist in bridging the outage, and the following analyses are based on GPS/BDS RTK.

The Performance of a Standalone VIO
As previously stated, VIO is a relative positioning technique that depends on the starting point. This section presents the performance of VIO as a standalone unit. By itself, VIO provides the device's translation and rotation relative to the starting pose. It cannot provide absolute positions or attitudes in a fixed global coordinate system because it does not know the location and orientation of the starting pose in the global frame. VIO algorithms often establish a world frame as a reference and calculate the device's pose in that frame. The term "world frame" does not refer to a fixed global coordinate system. Rather, the world frame corresponds to the device's pose after initialization. As a result, VIO by itself can only provide positioning results relative to the world frame, and directly presenting these results in a fixed global coordinate system, such as the ENU coordinate system, is not meaningful. Figure 13 compares the results generated by VIO alone with those given by RTK based on two NEO-M8Ts in a test. The results are presented in the ENU coordinate system, with the base station's position as the origin. We directly present VIO's horizontal positioning results in the ENU coordinate system in Figure 13a only to provide an intuitive depiction. Figure 13a shows that the standalone VIO trajectory has a similar shape to the ground truth. However, the moving directions of the trajectories differ. This difference is caused by the choice of the world frame. To illustrate the standalone VIO's performance more intuitively, we join the starting point and the midpoint of each trajectory with a line, calculate the angle between the two lines, and rotate the VIO trajectory by that angle; Figure 13b shows the result. Figure 13b indicates that a drift exists in the positioning results given by the standalone VIO.
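The alignment used for Figure 13b can be sketched in a few lines. The sketch assumes 2D horizontal trajectories stored as N×2 arrays, takes the sample at index N/2 as the midpoint, and additionally shifts the rotated VIO trajectory onto the reference starting point, which is an assumption made here for plotting convenience. The function names are hypothetical.

```python
import numpy as np

def align_by_start_and_midpoint(vio_xy, ref_xy):
    """Rotate the VIO trajectory so its start-to-midpoint line matches the reference."""
    vio_xy = np.asarray(vio_xy, dtype=float)
    ref_xy = np.asarray(ref_xy, dtype=float)

    def heading(traj):
        d = traj[len(traj) // 2] - traj[0]   # line from start to midpoint
        return np.arctan2(d[1], d[0])

    theta = heading(ref_xy) - heading(vio_xy)  # angle between the two lines
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    # Rotate about the VIO start, then move it to the reference start (assumed step).
    aligned = (vio_xy - vio_xy[0]) @ R.T + ref_xy[0]
    return aligned, theta

def closure_gap(traj_xy):
    """Distance between the first and last positions of a closed-loop walk."""
    traj_xy = np.asarray(traj_xy, dtype=float)
    return float(np.linalg.norm(traj_xy[-1] - traj_xy[0]))
```

Because the walk returns to its starting point, `closure_gap` applied to the VIO trajectory gives a direct measure of the accumulated drift.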
The drift distorts the trajectory and leaves an obvious gap between the first and final positions of the standalone VIO trajectory. Because we return to the starting point in the tests, this gap is a positioning error. We collect 20 sets of data and calculate the average value of the gap. Table 5 summarizes the results.

Figure 14 presents the position results from a walking test. The Xiaomi MI 8's RTK positioning results fluctuate strongly while the user is in the south of the playground because of signal blockage and multipath. The signal blockage can result in a poor geometric distribution of satellites, leading to significant position deviation. The multipath effect has a greater effect on the pseudo-range quality than on the carrier phase quality. Because RTK utilizes both pseudo-ranges and carrier phases, the multipath effect dramatically affects its performance. In Figure 14, the blue line represents the reference positioning results estimated by the NEO-M8Ts. The RTK results provided by the NEO-M8Ts are accurate and continuous, as the movable NEO-M8T is connected to an anti-multipath antenna. The assistance of VIO alleviates the fluctuations, and the integration of RTK and VIO provides more accurate positioning results. As Figure 14 cannot depict the RTK outage, we use Figures 15 and 16 to illustrate that an outage exists in the RTK positioning results and that the integration can mitigate it. The positioning results in the south of the playground in Figure 15a are sparser than those in Figure 15b,c. The tall buildings around the playground and the user's body can block the GNSS signal and exacerbate the multipath effect. Meanwhile, the Xiaomi MI 8 is equipped with a poor antenna, which provides only a low carrier-to-noise ratio and insufficient multipath suppression.
As a result, the GNSS receiver in the Xiaomi MI 8 cannot track enough satellites that provide a pseudo-range and a carrier phase simultaneously, as shown in Figures 10 and 11. Consequently, the Xiaomi MI 8 is occasionally unable to deliver valid RTK positioning results in the south of the playground, which means an RTK outage happens. The reference trajectory in Figure 15c contains positioning results at every epoch because the professional receivers and anti-multipath antennas can provide enough measurements for RTK at every epoch. The trajectory in Figure 15b shows that the integration of RTK and VIO based on the Xiaomi MI 8 can also provide positioning results at every epoch. When RTK based on the Xiaomi MI 8 cannot provide positioning results, we use the transformation matrix to transfer the VIO results from the local frame to the global frame so that we can bridge the RTK outage, as described in Section 2.3. We use Figure 16 to depict the RTK outage and the compensation more clearly. In Figure 16, a circle of a given color means that the corresponding algorithm can provide a valid positioning result at that epoch. The positioning results given by RTK based on the Xiaomi MI 8 are intermittent in the red box, which means an RTK outage happens. In contrast, the positioning results given by the integration in the red box are continuous. Figure 16 shows that the Xiaomi MI 8 is occasionally unable to deliver valid RTK positioning results, but with the assistance of VIO, the gap is compensated. We collect 20 sets of data to quantitatively compare the performance of RTK and that of RTK/VIO integration. In a walking test, we count the total number of epochs between algorithm initialization and termination. Then, we count the number of epochs for which the algorithm can provide positioning results. The ratio of the latter to the former demonstrates the continuity of the algorithm.
Additionally, we calculate the difference between the position result given by the NEO-M8T and the corresponding one given by the Xiaomi MI 8 at every epoch. Then, we sum up these differences to calculate the average deviation of positioning results in a walking test. Table 6 summarizes the results.

The Performance of the Improved Integration

As illustrated in Figure 14b, the vertical positioning results of RTK+VIO still suffer a large fluctuation. Although the fluctuation is reduced when compared to that given by RTK alone, the deviation can be up to 10 m. The position covariance given by RTK is an estimation. The covariance cannot fully reflect the vertical positioning accuracy at approximately 00:43:30 (Coordinated Universal Time). As a result, the weight for RTK in the integration is too large, which degrades the integration's accuracy. We propose an improved integration strategy to improve this situation. Our strategy focuses on the vertical positioning results. We design a sliding window and calculate the fluctuations in the window to set the weight for RTK. We only use this sliding window to calibrate the vertical positioning results because, as Figure 14a shows, RTK+VIO can provide satisfactory horizontal positioning results. Figure 17a,b presents the horizontal positioning results of the improved strategy in a walking test. These figures and their local views show that the horizontal positioning results given by the improved strategy are close to those provided by VINS-fusion's strategy. The fusion strategy calculates and optimizes the residuals of position in three dimensions separately. Hence, the sliding window and the calibration on vertical positioning have a small influence on the horizontal positioning results. Figure 18 presents the vertical positioning results of our strategy in the same test. Our improved strategy smooths the fluctuations in the vertical positioning results. We collect 20 sets of data and calculate the average deviation of the positioning results.
Table 7 shows the statistics. Figure 18 and Table 7 show that our improved integration can reduce the deviation in the vertical positioning results when compared to the VINS-fusion strategy. Figure 17 illustrates the horizontal positioning results of VINS-fusion's strategy and the improved strategy. As the positioning results of different strategies can be close to each other, we present the absolute values of the differences between the reference and the integration results of different strategies in Figure 19. Figures 17 and 19 and Table 7 demonstrate that the improvement in vertical positioning accuracy does not degrade the horizontal positioning performance.

When it comes to the compensation for the RTK outage, the improved integration strategy performs as well as the pre-improved strategy. The improved integration concentrates on adjusting the weight for RTK to reduce the impact of the RTK positioning errors. It updates the transformation matrix and uses it to transfer the VIO results from the local frame to the global frame so that VIO results can compensate for the RTK outage. The details are described in Section 2.3. The improvement in positioning performance shown in Figures 17 and 18 does not degrade the integration's ability to provide continuous positioning results. Figure 20 illustrates different algorithms' trajectories, which are plotted as dots. Figure 20 shows that, when the RTK positioning results are intermittent in the south of the playground, the improved integration provides continuous positioning results as the pre-improved strategy does, which means our proposed improved strategy can compensate for the RTK outage. We use Figure 21 to depict the improved strategy's ability to provide continuous positioning results more clearly. In Figure 21, a circle of a given color means that the corresponding algorithm can provide a valid positioning result at that epoch.
Figure 21 shows that the proposed strategy can provide valid positioning results at every epoch in the test, as the pre-improved strategy does. We collect 20 sets of data to assess the continuity of the proposed strategy quantitatively. The improved strategy can provide valid positioning results at every epoch in all the tests. Table 8 lists the results.

Strategies | Average Percentage
The pre-improved strategy | 100%
The improved strategy | 100%
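The continuity percentage and the average deviation used in these comparisons can be computed as in the sketch below. It assumes that results are indexed by epoch and that the per-epoch difference is the Euclidean distance between the test position and the reference position; the text does not state which norm is used. The function names and the dictionary layout are hypothetical.

```python
import math

def continuity(result_epochs, total_epochs):
    """Fraction of epochs between initialization and termination with a valid result."""
    return len(set(result_epochs)) / total_epochs

def mean_deviation(test, reference):
    """Average distance between test and reference positions over shared epochs.

    Both arguments map an epoch to an (east, north, up) tuple in meters.
    """
    common = sorted(set(test) & set(reference))
    if not common:
        raise ValueError("no common epochs to compare")
    return sum(math.dist(test[t], reference[t]) for t in common) / len(common)
```

For instance, an algorithm that outputs results at four of five epochs has a continuity of 0.8, whereas a value of 1.0 corresponds to the 100% entries in Table 8.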

Conclusions
This study verifies the feasibility of integrating RTK and VIO with a smartphone. RTK and VIO run simultaneously. Their results are fused to improve positioning robustness and accuracy. We first integrate RTK and VIO following the strategy of VINS-fusion and then improve the strategy to reduce the deviation in the altitude positioning results. Several walking tests show that the integration of RTK and VIO can provide continuous positioning results, while RTK sometimes cannot provide positioning results. The average positioning deviation drops from 3.23 m to 2.8 m after the introduction of VIO. The tests also verify the validity of our improved strategy. The average vertical positioning deviation drops from 2.52 m to 1.21 m. We also provide some useful tools and source codes for researchers. We first modify our CIGRLogger to log BDS measurements. Then, we design a Python script to transform the measurements into a ROS bag. Finally, we follow RTKLIB's principle and provide a ROS node implementing RTK. Users can integrate this node with their ROS algorithms. These codes can be found at https://github.com/Nronaldo (accessed on 14 October 2021).
The improved strategy of integrating RTK and VIO verifies the potential of a smartphone's positioning abilities, but is still in the proof-of-concept stage. In the future, we will evaluate our strategy in other urban environments. We will improve the way of adjusting the RTK weight in our algorithm to adapt to different urban environments. We will also explore a more efficient integration strategy. For example, we plan to tightly couple RTK and VIO to improve accuracy and robustness. We will utilize VIO to help improve the fixed rate of AR in RTK and utilize RTK to improve the initialization of VIO. In addition, we will study algorithmic innovations to develop our RTK node.

Conflicts of Interest:
The authors declare no conflict of interest.

Notations
The following notations are used in this manuscript:

Matrix
[symbol missing]: The covariance matrix of the observation noise
$D^{G}$: The SD transition matrix for GPS
$D^{B}$: The SD transition matrix for BDS
$F(t)$: The state transition matrix in RTK
$H(t)$: The observation matrix in RTK
$I$: The identity matrix
$L^{G}(t)$: The matrix comprised of LOS vectors of GPS
$L^{B}(t)$: The matrix comprised of LOS vectors of BDS
$O$: The zero matrix
$Q(t)$: The covariance matrix of the process noise
$R^{ECEF}_{ENU}(t)$: The coordinates rotation matrix from the ENU coordinate to the ECEF coordinate
$R^{g}_{w}(t)$: The rotation matrix from the world frame to the global frame
$T^{g}_{w}(t)$: The transformation matrix between the global frame and the world frame
$T^{g}_{b}(t)$: The transformation matrix between the body frame and the global frame
$T^{w}_{b}(t)$: The transformation matrix between the body frame and the world frame

Vector
$a_{u}(t)$: The user's acceleration
$d_{u}(t)$: The user's position
$d^{S^{G}_{k_1}}(t)$: The position of the $k_1$th GPS satellite
$d_{ref}$: The position of the reference station
$e^{G}(t)$: The SD phase ambiguities of GPS satellites
$e^{B}(t)$: The SD phase ambiguities of BDS satellites
$h(\chi_{RTK}(t))$: The measurement function for the update step in RTK
$l^{g}_{w}(t)$: The translation from the world frame to the global frame
$o(t)$: The LOS vector
$p^{b}_{cam}$: The translation from the camera frame to the body frame
$p^{w}_{b,i}(t)$: The position of the IMU in the world frame when the ith frame is captured
$p^{g}_{b,i}(t)$: The position of the IMU in the global frame when the ith frame is captured
[symbol missing]: The position given by GNSS when the ith frame is captured
$q^{b}_{cam}$: The rotation from the camera frame to the body frame
$q^{w}_{b,i}(t)$: The orientation of the IMU in the world frame when the ith frame is captured
$q^{g}_{b,i}(t)$: The orientation of the IMU in the global frame when the ith frame is captured
$v_{u}(t)$: The user's velocity
$v^{w}_{b,i}(t)$: The velocity of the IMU in the world frame when the ith frame is captured
$v^{g}_{b,i}(t)$: The velocity of the IMU in the global frame when the ith frame is captured
[symbol missing]: The IMU state vector when the ith frame is captured
[symbol missing]: The state vector of the Kalman filter in RTK
[symbol missing]: The state vector of VINS-mono
$\chi_{VIO}(t)$: The state vector of VINS-fusion
$y(t)$: The measurements vector of the Kalman filter in RTK
$\nabla\Delta\varphi^{G}(t)$: The DD carrier phase vectors of GPS satellites
$\nabla\Delta\rho^{G}(t)$: The DD pseudo-range vectors of GPS satellites
$\nabla\Delta\varphi^{B}(t)$: The DD carrier phase vectors of BDS satellites
$\nabla\Delta\rho^{B}(t)$: The DD pseudo-range vectors of BDS satellites
[symbol missing] with subscripts acc and gyro, as functions of $t$: The IMU biases

Scalar
[symbol missing]: The pseudo-range
[symbol missing]: The carrier phase
$r(t)$: The geometric distance between the receiver and the satellite
$\nabla\Delta\rho(t)$: The DD pseudo-range
$\nabla\Delta\varphi(t)$: The DD carrier phase
$\nabla\Delta r(t)$: The DD geometric distance
$\nabla\Delta r^{S^{B}_{1},S^{B}_{k_2}}(t)$: The DD geometric range between the 1st BDS satellite and the $k_2$th BDS satellite
$\tau_{iono}(t)$: The ionospheric delay
$\tau_{tropo}(t)$: The tropospheric delay
$\lambda$: The wavelength of the GNSS signal
$\delta_{rec}(t)$: The receiver clock bias
$\delta_{sat}(t)$: The satellite clock bias
$\omega_{\rho}(t)$: The measurement noise in the pseudo-range
$\omega_{\varphi}(t)$: The measurement noise in the carrier phase
$\nabla\Delta\omega_{\rho}(t)$: The DD measurement noise in the pseudo-range
$\nabla\Delta\omega_{\varphi}(t)$: The DD measurement noise in the carrier phase
[symbol missing]: The speed of light
$E^{S^{G}_{1}}(t)$: The SD phase ambiguity state variable of the 1st GPS satellite
$i$: The keyframe index
$j$: The feature index
$k_1$: The index of GPS satellites
$k_2$: The index of BDS satellites
$m^{G}$: The number of visible GPS satellites
$m^{B}$: The number of visible BDS satellites
$n_1$: The number of keyframes in the sliding window
$n_2$: The number of features in the sliding window
$N$: The integer ambiguity
$s_{j}(t)$: The inverse distance of the jth feature
$S^{G}_{k_1}$: The $k_1$th GPS satellite
$S^{B}_{k_2}$: The $k_2$th BDS satellite
$t$: The time
$\Delta t$: The time interval