1. Introduction
Global navigation satellite systems (GNSSs) have advanced at a breakneck pace in recent years. The expansion of constellations, the addition of new signals, and the introduction of multiple positioning solutions have contributed to the advancement of precise navigation [1,2]. Real-time kinematic (RTK) positioning is a representative technique that has attracted considerable research. RTK is a differential positioning technique that uses at least one stationary station to determine the location of moving receivers. The stationary station, also known as the reference station, provides a reference and mitigates the errors common to itself and the moving receiver, also known as the rover or the user [3]. RTK processes pseudo-range and carrier phase measurements simultaneously and applies ambiguity resolution (AR) to generate more accurate positioning results [4]. Studies have shown that dual-frequency RTK can achieve fast AR over short baselines, yielding real-time centimeter-level precision [5]. In comparison, single-frequency RTK performance is limited, but it can be improved considerably by using satellites from multiple constellations, including the Global Positioning System (GPS), China's BeiDou Navigation Satellite System (BDS), the European Galileo system, and the Russian Global Navigation Satellite System (GLONASS) [6,7,8]. RTKLIB is a well-known, representative open-source package for computing RTK positioning results [9].
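The way a reference station "mitigates common errors" can be made concrete with double differencing, the standard observation model behind RTK. The minimal Python sketch below forms rover-minus-base single differences and then differences them against a pivot satellite. The satellite IDs and all numbers are hypothetical, and the sketch is only an illustration of the general technique, not RTKLIB's implementation. In real carrier phase data, the double-differenced integer ambiguities remain in the result; they are what AR attempts to fix.

```python
from typing import Dict


def double_differences(rover: Dict[str, float],
                       base: Dict[str, float],
                       ref_sat: str) -> Dict[str, float]:
    """Between-receiver, between-satellite double differences.

    rover, base: observations in meters keyed by (hypothetical) satellite ID.
    ref_sat: pivot satellite, commonly the highest-elevation one.
    """
    common = rover.keys() & base.keys()
    if ref_sat not in common:
        raise ValueError("reference satellite must be tracked by both receivers")
    # Rover-minus-base single differences cancel the satellite clock error
    # (and, on short baselines, most orbit and atmospheric errors).
    single = {sat: rover[sat] - base[sat] for sat in common}
    # Differencing against the pivot satellite cancels both receiver clock errors.
    return {sat: single[sat] - single[ref_sat]
            for sat in sorted(common) if sat != ref_sat}


# Hypothetical example: geometric ranges plus clock errors, all in meters.
sat_clock = {"G01": 12.0, "G05": -7.5, "G12": 3.25}
rover_clock, base_clock = 41.0, -19.0
rover_geom = {"G01": 20_100_000.0, "G05": 21_300_000.0, "G12": 22_050_000.0}
base_geom = {"G01": 20_099_990.0, "G05": 21_300_020.0, "G12": 22_050_005.0}

rover = {s: rover_geom[s] + rover_clock + sat_clock[s] for s in rover_geom}
base = {s: base_geom[s] + base_clock + sat_clock[s] for s in base_geom}

dd = double_differences(rover, base, ref_sat="G01")
print(dd)  # {'G05': -30.0, 'G12': -15.0}: only geometric differences remain
```

Both clock terms vanish from the output, leaving only the double-differenced geometry, which depends on the rover position relative to the reference station.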
Compared with geodetic-grade multifrequency GNSS receivers, consumer-grade receivers are more prevalent in everyday life because of their low cost and low power consumption. Smartphones, which have become almost ubiquitous in modern society, can be regarded as consumer-grade GNSS receivers. However, smartphone-grade antennas can lose several or even dozens of decibels of sensitivity compared with professional GNSS antennas, so smartphones struggle to maintain lock on GNSS signals [10]. Additionally, GNSS signals are circularly polarized to suppress the effect of Faraday rotation, whereas smartphone-grade antennas are linearly polarized [11]. As a result, smartphone-grade antennas provide only a poor carrier-to-noise ratio and insufficient multipath suppression, which limits the performance of the smartphone's built-in GNSS receiver [12]. Advances in the Android operating system and GNSS chipsets have promoted research into smartphone-based positioning. Before 2016, researchers could obtain only the positioning results computed by the built-in GNSS chipset. This situation persisted until Google released Android N, which introduced the Android Raw GNSS Measurements API. Since then, developers have had access to pseudo-range and carrier phase measurements and have begun processing them with their own positioning algorithms [13]. Nonetheless, a power-saving technique called duty cycling prevents smartphones from tracking the carrier phase continuously: it periodically switches the GNSS chipset on and off to extend battery life [14]. Android P introduced a developer option titled "Force full GNSS measurements" that disables duty cycling, paving the way for precise positioning with the carrier phase [15]. Around the same time, the Xiaomi MI 8, equipped with the Broadcom BCM47755 chipset, was released. The Xiaomi MI 8 can provide dual-frequency GNSS measurements and has become an ideal platform for research into smartphone-based precise positioning. Studies indicate that the Xiaomi MI 8 is promising for accurate positioning in urban areas with RTK and precise point positioning (PPP) [16,17,18].
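To show what "access to the pseudo-range" means in practice, the Python sketch below derives a GPS pseudo-range from the fields the Raw GNSS Measurements API exposes: `TimeNanos`, `FullBiasNanos`, and `BiasNanos` from `GnssClock`, and `TimeOffsetNanos` and `ReceivedSvTimeNanos` from `GnssMeasurement`. It follows the commonly used computation, but it assumes a GPS signal whose measurement state reports a decoded time of week; other constellations need their own time-system handling (BDS, for example, uses BDT, which is offset from GPS time by 14 s). The input values are hypothetical, and the function name is illustrative only.

```python
SPEED_OF_LIGHT = 299_792_458.0            # m/s
NS_PER_WEEK = 604_800 * 1_000_000_000     # nanoseconds in one GPS week


def gps_pseudorange_m(time_nanos: int, full_bias_nanos: int, bias_nanos: float,
                      time_offset_nanos: float, received_sv_time_nanos: int) -> float:
    """Pseudo-range (m) of one GPS measurement from Android raw-measurement fields.

    Assumes received_sv_time_nanos is the GPS time of week at transmission,
    i.e., the measurement state indicates a decoded time of week.
    """
    # GPS week number from the (negative) full bias; integer math keeps ns precision.
    week = (-full_bias_nanos) // NS_PER_WEEK
    # Receive time as nanoseconds into the GPS week (integer part only).
    t_rx_ns = time_nanos - full_bias_nanos - week * NS_PER_WEEK
    delta_ns = t_rx_ns - received_sv_time_nanos
    # Guard against a week rollover between transmission and reception.
    if delta_ns > NS_PER_WEEK // 2:
        delta_ns -= NS_PER_WEEK
    elif delta_ns < -NS_PER_WEEK // 2:
        delta_ns += NS_PER_WEEK
    # Apply the sub-nanosecond terms in floating point.
    travel_time_ns = delta_ns + time_offset_nanos - bias_nanos
    return travel_time_ns * 1e-9 * SPEED_OF_LIGHT


# Hypothetical epoch: the signal left the satellite 72 ms before reception.
pr = gps_pseudorange_m(
    time_nanos=5_000_000_000_000,
    full_bias_nanos=-1_318_154_200_000_000_000,
    bias_nanos=0.25,
    time_offset_nanos=0.0,
    received_sv_time_nanos=300_000_000_000_000 - 72_000_000,
)
print(f"{pr:.3f} m")
```

Keeping `FullBiasNanos` in integer arithmetic matters: its magnitude (about 1e18 ns) exceeds what a double can hold at nanosecond resolution.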
Smartphone users are often pedestrians in urban areas, where buildings can block and reflect GNSS signals, obstructing the receiver's ability to maintain signal tracking and exacerbating the multipath effect. The carrier phase tracking loop is the weakest link of a GNSS receiver, and it loses lock more easily than the code tracking loop [19,20]. Consequently, carrier phase measurements may occasionally be absent in urban areas [21]. A pedestrian who needs positioning usually keeps the smartphone close to the body, so the user's body becomes an unavoidable signal blocker. This significantly constrains the performance of smartphone-based RTK. Other sensors and positioning approaches should therefore be introduced to help RTK provide continuous positioning results in urban areas. A common complementary solution is the inertial navigation system (INS) based on an inertial measurement unit (IMU) [22]. An IMU consists of gyroscopes and accelerometers, whose measurements are subject to additive noise and a time-varying bias, which results in long-term positioning drift [23]. Visual odometry (VO), a camera-based approach, is another complementary positioning solution [24]. VO performance is constrained by lighting conditions, environmental texture, and device speed. In monocular VO, the absolute scale of the system is ambiguous [25]. Visual–inertial odometry (VIO) combines VO with INS to mitigate long-term drift and resolve the scale ambiguity; the combination also increases robustness [26]. VINS-mono, ORB-SLAM3, and MSCKF are representative VIO algorithms [27,28,29], and they are generally integrated with the Robot Operating System (ROS) [30]. Nonetheless, VIO systems still suffer from long-term drift [31]. In addition, VIO outputs a pose, that is, an estimate of position and orientation, in local coordinates, so it is a relative positioning technique whose results depend on the starting point. As a result, VIO results are difficult to reuse without a fixed global coordinate frame [32]. These positioning techniques are summarized in Table 1, where √ indicates that the technique has the corresponding feature and × indicates that it does not.
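The long-term drift caused by accelerometer bias follows directly from double integration: a constant bias b produces a position error of roughly b·t²/2. The Python sketch below integrates a hypothetical 0.05 m/s² bias at 100 Hz for a stationary device. It deliberately ignores noise, gyroscope errors, and gravity-compensation errors, so it shows only the growth pattern, not a realistic error budget for any particular smartphone IMU.

```python
def drift_from_accel_bias(bias_mps2: float, rate_hz: float, duration_s: float) -> float:
    """Position error (m) from integrating a constant accelerometer bias twice.

    The device is assumed stationary, so the IMU reports only its bias;
    noise and gyroscope errors are ignored.
    """
    dt = 1.0 / rate_hz
    velocity = position = 0.0
    for _ in range(round(duration_s * rate_hz)):
        velocity += bias_mps2 * dt   # first integration: velocity error
        position += velocity * dt    # second integration: position error
    return position


# Hypothetical 0.05 m/s^2 bias sampled at 100 Hz.
for seconds in (1, 10, 60):
    error_m = drift_from_accel_bias(0.05, 100.0, seconds)
    print(f"{seconds:>3} s -> {error_m:.2f} m")
```

Increasing the duration sixfold (10 s to 60 s) increases the error roughly 36-fold, which is why an unaided IMU can bridge only short gaps.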
VIO can produce highly accurate relative positioning results over short periods, whereas GNSS positioning techniques deliver absolute positioning results. Combining VIO and GNSS can therefore provide positioning results that are locally accurate and globally drift-free. In recent years, the integration of INS, VO, and GNSS has been a popular topic. ETH Zurich designed MSF-EKF, a loosely coupled GNSS/INS/VO framework based on an extended Kalman filter (EKF) [33]. Rui Sun et al. proposed a GNSS/INS/VO fusion scheme using two EKFs with a non-holonomic constraint (NHC) for real-time 3D vehicle state estimation [34]. Tong Qin et al. published VINS-fusion, an optimization-based, loosely coupled GNSS/VIO system [35]. Tuan Li et al. used MSCKF to tightly fuse RTK/INS/VO to improve velocity and attitude accuracy [36]. Shaozu Cao's team proposed GVINS, derived from VINS-mono, which jointly optimizes GNSS measurements, including pseudo-range and Doppler measurements, with visual and inertial measurements [37]. These studies have advanced the field of navigation, but they focus on precise positioning with bulky and complex specialized devices. Moreover, researchers frequently need to assemble and synchronize the sensors themselves, as few commercial devices collect GNSS measurements, inertial measurements, and images simultaneously. For instance, VINS-fusion is built on a self-developed suite that contains stereo cameras and a DJI A3 controller (http://www.dji.com/a3 (accessed on 14 October 2021)), which comprises an IMU and a GNSS receiver. The team that proposed GVINS combined a u-blox ZED-F9P receiver (https://www.u-blox.com/en/product/zed-f9p-module (accessed on 14 October 2021)) with a VI-Sensor [38]. The VI-Sensor synchronizes the camera and IMU well, and the ZED-F9P delivers a pulse-per-second (PPS) signal to trigger the VI-Sensor and align the time. Unfortunately, the VI-Sensor is no longer commercially available (https://github.com/ethz-asl/libvisensor/issues/11 (accessed on 14 October 2021)). Rui Sun and Tuan Li both used three separate components to acquire GNSS measurements, IMU measurements, and images, using the PPS signal to trigger the camera and record the GPS time of each exposure. These works are summarized in Table 2.
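Why loose coupling yields results that are both locally accurate and globally drift-free can be seen in a toy one-dimensional pose graph: VIO contributes relative constraints between consecutive epochs, GNSS contributes absolute constraints, and a weighted least-squares problem balances them. The Python sketch below solves that problem exactly; because the normal equations of a 1D chain are tridiagonal, the Thomas algorithm suffices. This is a simplified analogue of optimization-based fusion such as VINS-fusion, not its actual formulation (which optimizes full poses). The trajectory, the 2% VIO scale drift, the GNSS errors, the outage, and the weights are all hypothetical.

```python
from typing import List, Optional


def solve_tridiagonal(sub: List[float], diag: List[float],
                      sup: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    for i in range(n):
        denom = diag[i] - (sub[i] * c[i - 1] if i > 0 else 0.0)
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (sub[i] * d[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def fuse_1d(vio_steps: List[float], gnss: List[Optional[float]],
            w_vio: float, w_gnss: float) -> List[float]:
    """Minimize sum w_gnss*(x_i - g_i)^2 + sum w_vio*(x_{i+1} - x_i - d_i)^2.

    vio_steps[i] is the VIO displacement from epoch i to i+1; gnss[i] is an
    absolute fix, or None during an outage. At least one fix is required.
    """
    n = len(gnss)
    if len(vio_steps) != n - 1:
        raise ValueError("need exactly one VIO step between consecutive epochs")
    diag, rhs = [0.0] * n, [0.0] * n
    sub, sup = [0.0] * n, [0.0] * n
    for i, g in enumerate(gnss):
        if g is not None:                  # absolute, drift-free term
            diag[i] += w_gnss
            rhs[i] += w_gnss * g
    for i, step in enumerate(vio_steps):   # relative, locally accurate term
        diag[i] += w_vio
        diag[i + 1] += w_vio
        sup[i] -= w_vio
        sub[i + 1] -= w_vio
        rhs[i] -= w_vio * step
        rhs[i + 1] += w_vio * step
    return solve_tridiagonal(sub, diag, sup, rhs)


def max_error(estimate: List[float], truth: List[float]) -> float:
    return max(abs(e - t) for e, t in zip(estimate, truth))


truth = [float(i) for i in range(10)]         # walking 1 m per epoch
vio_steps = [1.02] * 9                         # 2 % scale drift (hypothetical)
noise = [0.3, -0.2, 0.1, -0.3, None, None, 0.2, -0.1, 0.25, -0.25]
gnss = [None if e is None else t + e for t, e in zip(truth, noise)]

fused = fuse_1d(vio_steps, gnss, w_vio=1 / 0.02 ** 2, w_gnss=1 / 0.5 ** 2)
vio_only = [1.02 * i for i in range(10)]       # dead reckoning from the true start
print(f"VIO only: {max_error(vio_only, truth):.3f} m, "
      f"fused: {max_error(fused, truth):.3f} m")
```

The weights play the role of inverse variances: a large `w_vio` keeps the shape of the VIO trajectory, while the GNSS terms pin it to the absolute frame and remove the accumulated offset, even across the two epochs without a fix.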
In comparison with these sophisticated professional devices, smartphones are ubiquitous and portable. Smartphones are now generally equipped with a camera, an IMU, and a GNSS chipset, which together provide a platform for data fusion. Admittedly, these low-cost sensors have inherent flaws. For example, there are temporal offsets between the camera and the IMU. Additionally, smartphones generally use rolling-shutter cameras, which can generate motion blur [39]. The poor antenna performance discussed above is also a significant issue. Nevertheless, it is important to investigate the positioning capabilities of smartphones [40,41]. Certain VIO algorithms, such as VINS-mono, can estimate the temporal offset, and an appropriate integration can compensate for the sensors' shortcomings. In addition, smartphone users can easily align GPS time with the local timestamps of the IMU measurements and images. Researchers who lack the resources to build a specialized suite can therefore use smartphones as a convenient platform for studying GNSS/VIO integration.
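One way to align GPS time with the local time of the IMU and images on Android is to compute, at a GNSS epoch, the offset between GPS time and the elapsed-realtime clock that `GnssClock.getElapsedRealtimeNanos()` reports (API level 29 and later), and then apply that offset to every sensor timestamp. The Python sketch below assumes that IMU and camera timestamps share the elapsed-realtime time base; this depends on the device (for cameras, `CameraCharacteristics.SENSOR_INFO_TIMESTAMP_SOURCE` indicates whether it holds), so it is an assumption to verify, not a guarantee. The class and function names and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

NS_PER_WEEK = 604_800 * 1_000_000_000


@dataclass
class GnssClockSample:
    """Fields read from android.location.GnssClock at one GNSS epoch."""
    time_nanos: int               # getTimeNanos()
    full_bias_nanos: int          # getFullBiasNanos()
    bias_nanos: float             # getBiasNanos(), sub-nanosecond part
    elapsed_realtime_nanos: int   # getElapsedRealtimeNanos(), API level 29+


def gps_minus_boot_offset_ns(clk: GnssClockSample) -> int:
    """Offset that maps elapsed-realtime timestamps to GPS time (ns since GPS epoch)."""
    # Integer arithmetic keeps nanosecond precision for ~1e18 ns values;
    # the sub-nanosecond bias is rounded away.
    gps_time_ns = clk.time_nanos - clk.full_bias_nanos - round(clk.bias_nanos)
    return gps_time_ns - clk.elapsed_realtime_nanos


def to_gps_week_tow(local_ns: int, offset_ns: int) -> Tuple[int, float]:
    """Convert an elapsed-realtime timestamp to (GPS week, seconds of week)."""
    week, ns_of_week = divmod(local_ns + offset_ns, NS_PER_WEEK)
    return week, ns_of_week / 1e9


clk = GnssClockSample(time_nanos=5_000_000_000_000,
                      full_bias_nanos=-1_318_154_200_000_000_000,
                      bias_nanos=0.25,
                      elapsed_realtime_nanos=5_200_000_000_000)
offset = gps_minus_boot_offset_ns(clk)
imu_timestamp_ns = 5_200_010_000_000   # hypothetical IMU sample 10 ms after the epoch
week, tow = to_gps_week_tow(imu_timestamp_ns, offset)
print(week, tow)
```

Because the offset drifts slowly with the phone's oscillator, recomputing it at every GNSS epoch, rather than once per session, keeps the alignment current.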
Several applications log measurements from smartphones' built-in sensors. GEO++ RINEX Logger is a representative application that generates a file of GPS, BDS, Galileo, and GLONASS measurements in the Receiver Independent Exchange (RINEX) format [42]. RINEX is a widely used standard format in geodesy and navigation, and the files generated by GEO++ RINEX Logger can be processed directly by RTKLIB. Many studies on smartphone positioning are based on this application [43,44]. Unfortunately, GEO++ RINEX Logger provides only GNSS measurements, and its source code is not available. MARSLogger is an open-source application for recording IMU measurements and images [45], but it cannot record GNSS measurements. In our previous work, we open sourced CIGRLogger to collect images, IMU measurements, and GPS measurements. The GPS measurements are output in RINEX format and can be processed directly by RTKLIB, as with GEO++ RINEX Logger [46]. However, the valid satellites of a single constellation are insufficient for positioning in urban areas. In this work, we upgrade CIGRLogger to collect BDS measurements, which introduces more valid satellites. CIGRLogger can also collect magnetic and barometric measurements. These Android applications are summarized in Table 3.
Table 3 lists the sensors from which each application can collect measurements; √ indicates that the application can collect the corresponding sensor's measurements, and × indicates that it cannot.
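A logger that records several sensors typically produces one stream per sensor, each sorted by its own timestamps, whereas downstream processing (for example, replay through ROS topics) needs a single chronological timeline in which every record is labeled with its source. The Python sketch below performs that interleaving lazily with `heapq.merge`. The topic names and payloads are hypothetical placeholders, and actually writing the records into a ROS bag would require the ROS libraries, which are omitted here.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, Tuple

Record = Tuple[int, Any]          # (timestamp in ns, payload)
Tagged = Tuple[int, str, Any]     # (timestamp in ns, topic, payload)


def _tag(topic: str, records: Iterable[Record]) -> Iterator[Tagged]:
    """Attach a topic name to every record of one sensor stream."""
    for stamp, payload in records:
        yield stamp, topic, payload


def interleave(streams: Dict[str, Iterable[Record]]) -> Iterator[Tagged]:
    """Merge per-sensor streams (each already sorted by time) into one timeline.

    heapq.merge is lazy, so large logs need not be held in memory at once;
    equal timestamps keep the order in which the streams were given.
    """
    return heapq.merge(*(_tag(topic, recs) for topic, recs in streams.items()),
                       key=lambda item: item[0])


ms = 1_000_000
streams = {
    "/imu0": [(i * 10 * ms, f"imu{i}") for i in range(5)],
    "/cam0/image_raw": [(0, "frame0.jpg"), (33 * ms, "frame1.jpg")],
    "/gnss/obs": [(15 * ms, "epoch0")],
}
timeline = list(interleave(streams))
for stamp, topic, payload in timeline:
    print(f"{stamp / ms:5.1f} ms  {topic:<16} {payload}")
```

Tagging each stream through a helper function, rather than a nested generator expression, binds the topic name at call time, so every record keeps its own topic.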
Many researchers focus on positioning with smartphones' built-in sensors. Researchers from Nottingham Scientific Limited (NSL) assessed the performance of smartphone-based multifrequency RTK and PPP in urban environments and demonstrated that both can provide positioning precision of 1–2 m [47]. Researchers from Wuhan University (WHU) used magnetic measurements to aid the smartphone IMU. They also used pseudo-observations and strategies such as zero-velocity update (ZUPT) and zero angular rate update (ZARU) to effectively suppress IMU drift and thus provide robust, accurate positioning results [48,49]. A team from the Institute of Space Technology and Space Applications (IST) tested the performance of loosely coupled RTK/INS on a smartphone; the results indicate that introducing the IMU bridges RTK gaps and reduces RTK fluctuations [50]. Numerous researchers are studying smartphone-based VIO algorithms. Peiliang Li modified VINS-mono and ported the resulting VINS estimator to the iPhone 7 [51]. Yuan Wang and his team deployed VIO algorithms on an Android smartphone [40]. In recent years, some researchers have applied machine learning and deep learning techniques to smartphone navigation, collecting training data and building models that estimate a smartphone's trajectory from IMU measurements. Hang Yan et al. employed a support vector machine (SVM) and a neural network in sequence to provide reliable positioning results from a smartphone's IMU [52,53]. However, few studies have focused on fusing RTK, IMU measurements, and images on a smartphone to provide positioning results. These works are summarized in Table 4.
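ZUPT relies on detecting when the device is momentarily stationary, at which point a zero-velocity pseudo-observation can be fed to the filter. A common simple detector checks, over a short sliding window, that the accelerometer magnitude stays close to gravity with low variance. The Python sketch below implements such a moving-variance detector. The window length and thresholds are hypothetical tuning values, practical detectors usually also test the gyroscope (which is the basis of ZARU), and this is not the specific method used in the cited WHU work.

```python
import math
from collections import deque
from statistics import pvariance
from typing import List, Sequence, Tuple

GRAVITY = 9.80665  # standard gravity, m/s^2


def detect_zero_velocity(acc_samples: Sequence[Tuple[float, float, float]],
                         window: int = 10,
                         var_thresh: float = 0.05,
                         mag_tol: float = 0.3) -> List[bool]:
    """Flag samples where the device appears stationary.

    A sample is flagged when the last `window` accelerometer magnitudes have a
    mean within `mag_tol` of gravity and a variance below `var_thresh`.
    """
    mags = deque(maxlen=window)
    flags = []
    for ax, ay, az in acc_samples:
        mags.append(math.sqrt(ax * ax + ay * ay + az * az))
        if len(mags) < window:          # not enough history yet
            flags.append(False)
            continue
        mean = sum(mags) / window
        flags.append(abs(mean - GRAVITY) < mag_tol and pvariance(mags) < var_thresh)
    return flags


# Hypothetical data: 20 stationary samples, then 20 samples of walking motion.
stationary = [(0.0, 0.0, 9.81)] * 20
walking = [(0.0, 0.0, 9.81 + (2.0 if k % 2 == 0 else -2.0)) for k in range(20)]
flags = detect_zero_velocity(stationary + walking)
print("".join("Z" if f else "." for f in flags))
```

During each flagged interval, a filter such as an EKF would apply a velocity measurement of zero, which bounds the velocity error and therefore slows the quadratic growth of position drift.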
The contributions of this paper are as follows.
(1) We demonstrate the feasibility of integrating RTK and VIO with smartphones. Our previous work verified the feasibility of continuous positioning based on RTK and VIO with smartphones. However, in that work, RTK and VIO did not run simultaneously but alternated to estimate the positioning solution, so the output of RTK did not affect the output of VIO, and vice versa [46]. That combination was straightforward but unreliable. In this paper, we adopt more reliable and robust strategies to integrate RTK and VIO. First, we upgrade CIGRLogger to collect BDS measurements, which introduces more valid satellites. Second, because many VIO algorithms are coupled with ROS, we design a Python script that converts the images, IMU measurements, and GNSS measurements into a ROS bag. A ROS bag is a file that stores ROS messages; each kind of message has a unique topic, and ROS nodes subscribe to different topics to obtain the data they require. Third, RTKLIB is a sophisticated system, implemented in ANSI C, that consists of many algorithms and functions. We follow RTKLIB's principles to implement the RTK function in C++ and package the code as a ROS node. This RTK node subscribes to the topics in the ROS bag created by our Python script and publishes the RTK positioning results to other nodes, so it can be integrated directly into other ROS-based algorithms. Finally, we adopt the integration strategy of VINS-fusion [35] and integrate our RTK node with VINS-mono [27], introducing an additional optimization-based node to accomplish the integration. The details of the optimization are explained in Section 2.3. The experiments demonstrate that the integration combines the advantages of RTK and VIO and can provide accurate, continuous positioning results with a smartphone in urban areas.
(2) We provide valuable research tools. As shown in Table 2, many researchers work on GNSS/INS/VO fusion algorithms based on bulky and complex specialized devices. We open source our code on GitHub to support researchers who lack the resources to build a specialized suite for RTK/VIO integration research. Researchers can collect measurements with an Android smartphone running CIGRLogger and do not need to implement an RTK node themselves, which saves time and effort. They can thus concentrate on integration algorithms without spending excessive effort on data acquisition, data transformation, and RTK implementation.
(3) We improve the integration strategy for smartphones. The positioning results of smartphone-based RTK can fluctuate considerably, and because of the geometric distribution of satellites, these fluctuations are more pronounced in the vertical direction. We design a sliding window strategy that calculates the vertical positioning fluctuation within the window and uses the result as a criterion to adjust the weight of RTK in the integration. Compared with the VINS-fusion integration strategy, this improvement reduces the deviation of the altitude positioning results.
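The sliding window idea in contribution (3) can be illustrated with a small weighting rule: keep the most recent RTK heights, measure their spread, and lower the RTK weight as the spread grows. The exact criterion and weighting used in this work are given in Section 2; the Python sketch below is only a hypothetical stand-in that inflates an assumed nominal vertical standard deviation by the window's population standard deviation and returns the inverse variance. The class name, window size, and nominal sigma are assumptions.

```python
import math
from collections import deque
from statistics import pstdev


class VerticalFluctuationWeighter:
    """Hypothetical sliding-window weight for RTK height in a fusion cost.

    The spread of the last `window_size` RTK heights inflates the assumed
    vertical standard deviation; the returned weight is its inverse variance.
    """

    def __init__(self, window_size: int = 10, base_sigma_up: float = 0.05):
        self.heights = deque(maxlen=window_size)
        self.base_sigma_up = base_sigma_up   # assumed nominal RTK vertical sigma (m)

    def update(self, up_m: float) -> float:
        self.heights.append(up_m)
        spread = pstdev(self.heights) if len(self.heights) > 1 else 0.0
        sigma_up = math.hypot(self.base_sigma_up, spread)   # sqrt(base^2 + spread^2)
        return 1.0 / sigma_up ** 2


weighter = VerticalFluctuationWeighter()
steady = [weighter.update(10.0) for _ in range(10)]
jumpy = weighter.update(12.0)   # a hypothetical 2 m jump in the RTK height
print(f"steady weight: {steady[-1]:.1f}, after jump: {jumpy:.2f}")
```

A plain standard deviation also grows when the user genuinely changes altitude (for example, on stairs), so a practical criterion might detrend the window or compare RTK heights against the VIO height increments before measuring the spread.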
As our work involves several existing algorithms, Figure 1 depicts the relationship between these algorithms and our work.
The remainder of the paper is structured as follows. Section 2 discusses the RTK algorithm, the framework of VINS-mono, the integration strategy of VINS-fusion, our improvement to the integration, and details of our tools and devices. Section 3 presents the experimental results. Finally, Section 4 summarizes the conclusions and future work.