Comparison of Three Off-the-Shelf Visual Odometry Systems

Abstract: Positioning is an essential aspect of robot navigation, and visual odometry is an important technique for continuously updating a robot's internal estimate of its position, especially indoors where GPS (Global Positioning System) is unavailable. Visual odometry uses one or more cameras to find visual clues and estimate robot movements in 3D relative to a starting point. Recent progress has been made, especially with fully integrated systems such as the RealSense T265 from Intel, which is the focus of this article. We compare three visual odometry systems (and wheel odometry, as a known baseline) against each other on a ground robot. We do so in eight scenarios, varying the speed, the number of visual features, and the presence or absence of humans walking in the field of view. We continuously measure the position error in translation and rotation thanks to a ground truth positioning system. Our results show that all odometry systems are challenged, but in different ways. The RealSense T265 and the ZED Mini have comparable performance, better than our baseline ORB-SLAM2 (mono-lens without inertial measurement unit (IMU)) but not excellent. In conclusion, a single odometry system might still not be sufficient, so using multiple instances and sensor fusion approaches is necessary while waiting for additional research and further improved products.


Introduction
Robot localization within its environment is one of the fundamental problems in the field of mobile robotics [1]. One way of tackling this problem is to use vision-based odometry (VO), which is capable of accurately localizing a robot's position with low drift over long trajectories, even in challenging conditions. Many VO algorithms have been developed; they are categorized as direct, semi-direct, or feature-based, depending on what image information is used to estimate egomotion [2]. The hardware setup varies, as camera images can be captured in monocular or stereo vision. Many augmentations of VO are available, which perform sensor fusion of the computed egomotion with other sensors that can refine the trajectory, such as inertial measurement units (IMUs) [3][4][5], depth sensors [6,7], and LIDAR (light detection and ranging) [8,9].
An important feature to improve the quality of VO is to use SLAM (simultaneous localization and mapping) [10] supplemented by "loop closure": this means building a database of images while moving, so that when the robot comes back to an already seen location (i.e., with a view similar enough to one of those in the database), it will relocalize itself, thereby cancelling any drift accumulated since the last time the robot was at that same location, which results in much more robust long-term positioning [11]. We only assess the "localization" part of SLAM in this article, not its "mapping" component.

Test Environment
The experiments were conducted indoors, in a controlled area, on a flat, non-slippery surface. As visible in Figure 1, we used some pieces of dark textile to make the scene more, or less, feature-rich, i.e., to adjust the quantity of visual clues in the robot's field of view. Indeed, visual odometry systems are especially challenged when facing uniform surfaces such as a long white wall. Another important parameter affecting the quality of visual odometry is whether those visual clues are static (not moving) or whether some of them might be moving (dynamic). In order to compare the robustness of the different odometry systems over moving visual elements, we asked three persons to walk repeatedly along the walls of the experiment area.
It is important to note that, in order to ensure we tested the different systems fairly, all visual odometry systems were running in parallel, meaning that they were exposed to exactly the same environment. We believe this is an interesting approach for comparing VO systems, at least when it comes to data acquisition, and it is similar to the approach used with mobile phones in [18]. In our case, we also ran the computation live (which is needed for the RealSense and ZED Mini), which might favor the RealSense solution (which runs on its own hardware) because the ZED Mini and ORB-SLAM2 run as software on our computer board.

Ground Truth
An OptiTrack [19] system was used as the ground truth. It is a motion capture system capable of tracking objects with a positional error of less than 0.3 mm and a rotational error of less than 0.05°, using seven Prime 13 cameras [20] (cf. Figure 2), which detect passive markers placed on the tracked object. Five markers placed on top of the robot were used to track the robot's 6-DoF position. The pivot point is the marker location for which the final position was calculated. In the experiment, the pivot point was located at the center of the camera, which was ~25 cm in front of the center of the robot.


Robot Setup
The platform on which the tests were performed is a Parallax Arlo [21] Robot Platform System (cf. Figure 1), commercialized by Parallax Inc. [22]. This platform was utilized as the physical framework for the visual odometry research. Two standard wheels with motors on the sides of the robot, plus two castor wheels at the front and back, make the platform nonholonomic. The platform has two battery packs (12 V, 7 Ah) connected to the DHB-10 Motor Controller and the Propeller Activity Board; they also supplied the Nvidia Jetson TX2 and the Raspberry Pi 3. The Activity Board was connected to the Raspberry Pi, which delivered the control signals for the motors.
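The wheel-odometry baseline on such a differential-drive platform boils down to dead reckoning from the encoder ticks. A minimal sketch of the standard update (names and the midpoint-heading integration choice are ours, for illustration; the actual ArloBot ROS packages may integrate differently):

```python
import math

def diffdrive_update(x, y, theta, d_left, d_right, wheel_base):
    """Dead-reckoning pose update for a differential-drive robot.

    d_left / d_right: wheel travel since the last update (m, from encoder ticks);
    wheel_base: distance between the two driven wheels (m).
    """
    d_center = (d_left + d_right) / 2.0        # forward motion of the robot center
    d_theta = (d_right - d_left) / wheel_base  # heading change (rad)
    # Integrate the pose using the midpoint heading (good small-step approximation)
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi  # wrap to (-pi, pi]
    return x, y, theta
```

As the Main Findings section notes, this estimate is insensitive to lighting and visual features, but any systematic error (wheel diameter, slip) accumulates without bound, since nothing here can relocalize.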
To improve the efficiency of the visual odometry computations, we divided the vision systems into two independent parts running in parallel. The first ran on the Raspberry Pi 3, to which the Arlobot's Activity Board and the Intel RealSense T265 were connected. The other system ran on the Nvidia Jetson TX2, which has significantly higher computational capabilities (8 GB of memory, 6 CPU cores, and a GPU with 256 CUDA cores). The board was powered by the 12 V output from the battery pack mounted on the Arlobot platform. The Nvidia Jetson had only one external camera connected: the ZED Mini from Stereolabs. Apart from hosting the ZED Mini computations, this board also ran the ORB-SLAM2 algorithm. The ZED Mini took advantage of the board's GPU, while ORB-SLAM2 used mostly a single CPU core.
Both cameras (Intel RealSense T265 and ZED Mini) were mounted on top of each other at the front of the Arlobot platform. The mounting position was shifted 25 cm forward from the robot's rotation axis.
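Because the cameras sit 25 cm ahead of the rotation axis, comparing trajectories requires projecting every pose onto a common reference point (cf. Section 3: Data Analysis, where the wheel odometry is transformed to the camera center). A minimal sketch of this rigid lever-arm transform (the function name and conventions are ours):

```python
import math

CAMERA_OFFSET = 0.25  # m; cameras mounted 25 cm ahead of the rotation axis

def robot_pose_to_camera(x, y, theta, offset=CAMERA_OFFSET):
    """Project a robot-center pose (x, y, theta) onto the camera mounting point.

    The camera sits `offset` metres ahead of the robot's rotation axis along
    the heading, so its position is the robot position pushed forward by that
    amount; the orientation is unchanged by the rigid offset.
    """
    cam_x = x + offset * math.cos(theta)
    cam_y = y + offset * math.sin(theta)
    return cam_x, cam_y, theta
```

For example, a robot at the origin facing along +x has its camera at (0.25, 0); during a pure rotation on the spot, the camera sweeps a 25 cm radius circle, which is why rotations are harder on camera-centered measurements than one might expect.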

Software Setup
The robot software packages operate on ROS [23] (Robot Operating System, from the Open Source Robotics Foundation), more precisely ROS Kinetic under the GNU/Linux distribution Ubuntu 16.04 LTS. We used modified "ROS packages for ArloBot" on the Raspberry Pi to communicate with the "Parallax Activity Board" [24] (microcontroller) on the robot.

Intel RealSense Tracking Camera T265
The RealSense T265 is a tracking camera released by Intel in March 2019 at a price of 199 USD. It includes two fisheye lens sensors as well as an inertial measurement unit (IMU). The visual SLAM algorithm runs directly on the built-in Intel Movidius Myriad 2 VPU. This gives very low latency between a movement and its reflection in the pose, as well as a low power consumption that stays around 1.5 W. Since all computations are performed in real time onboard, no computations are required on the host computer.

ZED Mini
The ZED Mini [25] is a visual-inertial depth camera featuring dual high-speed 2K image sensors and a 110° field of view. With an eye separation of 63 mm, the camera senses depth from 0.1 m to 12 m, with improved accuracy and fewer occlusions in the near range. Using visual-inertial odometry technology, inertial measurements (IMU) are fused at 800 Hz with visual data from the stereo camera. Sensor fusion allows for accurate tracking even when visual odometry gets lost due to an insufficient number of feature matches. The image acquisition was done at a resolution of 720p and a frequency of 20 Hz (the best trade-off we found between blur and resolution). The ZED Mini odometry software was able to process frames in stereo at ~19.5 Hz, taking advantage of the GPU (graphics processing unit) compute capability of the Nvidia Jetson computer board.

ORB-SLAM2
ORB-SLAM2 is a complete SLAM system [26] for monocular, stereo and RGB-D cameras that achieves state-of-the-art accuracy in many environments (cf. Figure 3). In this study, the monocular setup was used. We chose mono ORB-SLAM2 because stereo ORB-SLAM2 was too computationally heavy for the computer board used in the experiments, resulting in too low a framerate, especially when run in parallel with the other odometries. ORB-SLAM2 was only able to take advantage of a single CPU core, not of the other cores nor of the GPU. Its three main components run in parallel: (1) the tracking thread, which estimates the pose of the current frame and optimizes its position by minimizing the reprojection error with motion-only bundle adjustment (BA); (2) the local mapping thread, which saves new keyframes, performs local BA, and stores visual words for later Bag-of-Words (BoW) place recognition [27]; and (3) the loop closing thread, which detects large loops using the BoW approach and refines the trajectory, first performing pose-graph optimization and finally full BA to obtain an optimal structure and motion solution.
It is important to note that ORB-SLAM2 does not integrate any IMU by default, and thus has less sensor data to work with than the RealSense T265 and ZED Mini solutions. Furthermore, ORB-SLAM2 proved very computationally heavy in our setup, being able to process frames in monocular mode at only ~5.5 Hz (i.e., with many dropped frames). We believe it is still fair to test ORB-SLAM2 this way, as the computer board we used is not low-end for a small robot or drone.
Finally, as ORB-SLAM2 was operating in monocular mode, we did an offline calculation and optimization of the scale factor (cf. Section 3: Data Analysis). We also analyzed the gains when more computing power was available.

Scenarios
For each scenario, the robot starts by driving forward for three meters. It then makes three full turns plus 180° (1260° in total) on the spot. The process is repeated four times during each scenario.
We repeated the experiments for three different parameters, giving a total of eight combinations (cf. Table 1):
• Quantity of visual features: We changed the number of visual features in the field of view: either "many", with several paper posters on the walls to increase the number of visual clues, or "few", with mostly grey walls. The floor is unchanged between conditions.
• Robot speed: We made the robot drive at two different reference speeds: either "fast", with a 1.07 m/s linear speed and a ~2.52 rad/s angular speed (when the robot turns), or "slow", with a 0.36 m/s linear speed (i.e., ~3 times slower) and a ~0.35 rad/s angular speed (i.e., ~7 times slower).
• Moving visual elements: We made the visual environment more, or less, stable: either "static", with nothing moving, or "dynamic", with some persons constantly walking along the walls around the room.
We picked that specific path to fit into the area of our lab covered by the ground truth system, while assessing both translation and rotation. Furthermore, the robot drives through an already-seen path, giving the SLAM algorithms a good chance to perform relocalization when seeing known scenes.

Data Analysis
The datasets come from three different sources, namely the OptiTrack system, the Raspberry Pi, and the Jetson TX2. The first step is to transform the data into the same format. Because the robot moves only in a 2D plane, the position from each method can be reduced to a robot position (x, y) and a robot orientation theta. Afterwards, the three datasets are synchronized and merged into one. In order to analyze the performance of the different visual odometry systems relative to the OptiTrack, some columns of the dataset, such as velocity and OptiTrack, are interpolated (filled with the previous values when cells are empty), since the OptiTrack data does not arrive at the same timestamps as the others. Before calculating the errors, the ORB-SLAM2 data is scaled; the scale coefficient is found by gradient descent (i.e., we found the optimal scale factor), using the first part of each scenario (one seventh of the data points). In addition, the robot wheel odometry data also needs to be transformed to the camera center so that all measurements are in the same coordinate system. An example of how the data looks at this stage can be seen in Figure 4.
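The monocular scale recovery can be sketched as a one-parameter gradient descent over the synchronized positions. This is a hedged illustration: the variable names, learning rate, and iteration count are our assumptions, not the exact code from the supplementary materials (and a closed-form least-squares solution for the single scale also exists):

```python
def fit_scale(orb_xy, gt_xy, lr=0.05, iters=1000):
    """Fit the monocular ORB-SLAM2 scale factor s by gradient descent,
    minimizing the mean squared distance between s * orb_xy and gt_xy.

    orb_xy, gt_xy: sequences of synchronized (x, y) positions; as in the
    paper, only the first seventh of each scenario would be passed in.
    The learning rate must suit the data magnitudes (lr < 1 / mean ||p||^2).
    """
    s = 1.0
    n = len(orb_xy)
    for _ in range(iters):
        # d/ds of mean ||s*p - g||^2 is (2/n) * sum((s*p - g) . p)
        grad = 0.0
        for (ox, oy), (gx, gy) in zip(orb_xy, gt_xy):
            grad += (s * ox - gx) * ox + (s * oy - gy) * oy
        s -= lr * 2.0 / n * grad
    return s
```

Since the objective is quadratic in s, the descent converges linearly to the unique optimum, which is why fitting on the first seventh of the data is enough to anchor the whole trajectory's scale.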
The error of each visual odometry system is evaluated as a translation error and a rotation error, where the translation error is the distance offset relative to the ground truth and the rotation error is the angle offset. In addition, the increment of the errors over time is also computed.
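The two error metrics can be sketched as follows (our own illustration; note that the angle difference must be wrapped so that, e.g., +179° vs. −179° counts as a 2° error, not 358°):

```python
import math

def translation_error(est_xy, gt_xy):
    """Euclidean distance between each estimated and ground-truth position."""
    return [math.hypot(ex - gx, ey - gy)
            for (ex, ey), (gx, gy) in zip(est_xy, gt_xy)]

def rotation_error(est_theta, gt_theta):
    """Absolute heading offset per sample, wrapped into [0, pi]."""
    return [abs((e - g + math.pi) % (2.0 * math.pi) - math.pi)
            for e, g in zip(est_theta, gt_theta)]
```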
See details in the "Supplementary Materials" section for the source code and the data.

Descriptive Statistics
In order to get a better understanding of the data, a first round of descriptive statistics is performed. The two most informative visualizations are reported in Figure 5 and Figure 6, respectively for translation error (i.e., robot {x, y} position estimation error) and for rotation error (i.e., robot orientation error).
We observe that wheel odometry (based on optical encoders) always provides a poor translation (Figure 5) and rotation (Figure 6) estimation, but does so in a quite consistent manner: wheel odometry is indeed not much affected by the scenarios, not even speed, which is not surprising in non-sliding conditions. The measurements are more consistent during translations than during rotations.

The scenario with many features, slow speed, and a static scene was, unsurprisingly, the one with the best results for all odometries. Expectedly, we observe that the visual odometry systems are much affected by the challenging scenarios, with sometimes large errors for all of them, especially ORB-SLAM2, and especially during rotations.

Figure 5. For each of the scenarios ("many slow static", "many slow dynamic", "few slow static", "few slow dynamic", "many fast static", "many fast dynamic", "few fast static", "few fast dynamic"), we report the median translation error (red horizontal line), the 75% observed translation errors (blue rectangle box), the 95% observed translation errors (blue error lines), as well as outliers (red crosses above the rest). For each scenario, from left to right, are reported the odometries: wheel encoders, RealSense T265, ZED Mini, ORB-SLAM2.
For all visual odometries, a typical event leading to outliers is a loss of tracking, followed by an accumulation of errors, until a sharp relocalization by loop closure.
As clearly visible in Figure 5 in particular, aside from rotation, speed had the greatest detrimental effect on the quality of the visual odometries. The number of visual features had a clear, but lesser, impact. Finally, whether or not some visual clues were moving in the field of view did not impact the accuracy as much as the other factors (and less than we were expecting).

Other versions of the above figures are provided in Appendix A (with another visualization, some data smoothing, and more CPU power for ORB-SLAM2).

Statistical Analysis
In order to help identify relevant differences, we performed a light statistical analysis, with a series of t-Tests of the type "two-sample assuming unequal variances" from the "Analysis ToolPak" of Excel (Microsoft Office 365, version 1910). Table 2 contains the results for the average translation error, while Table 3 contains those for the average rotation error, across all scenarios. p-values marked with one asterisk "*" are below 0.05; with two asterisks "**", below 0.01.
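Excel's "two-sample assuming unequal variances" test is Welch's t-test; it can be reproduced outside Excel. A minimal stdlib sketch of the statistic and its Welch-Satterthwaite degrees of freedom (the two-sided p-value then comes from the Student's t distribution with that df, e.g., via a statistics library):

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Welch's two-sample t-test (unequal variances), equivalent to Excel's
    't-Test: Two-Sample Assuming Unequal Variances'.

    a, b: samples of per-measurement errors from two odometry systems.
    Returns (t statistic, Welch-Satterthwaite degrees of freedom).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n-1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

Unlike Student's pooled t-test, this variant does not assume the two odometries have equal error variance, which matters here since their error spreads differ markedly across scenarios.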
From Table 2, one can see that there is a significant difference in positioning quality between the odometry systems when it comes to translation error, except between wheel odometry and the RealSense, and between the RealSense and the ZED Mini. Likewise, from Table 3, one can see that there is no significant difference between the RealSense and the ZED Mini, while the other odometries exhibit significant differences from each other in terms of rotation error.
As such, the statistical analysis confirmed the main trends observed in the descriptive statistics.
Figure 6. For each of the scenarios ("many slow static", "many slow dynamic", "few slow static", "few slow dynamic", "many fast static", "many fast dynamic", "few fast static", "few fast dynamic"), we report the median rotation error (red horizontal line), the 75% observed rotation errors (blue rectangle box), the 95% observed rotation errors (blue error lines), as well as outliers (red crosses above the rest). For each scenario, from left to right, are reported the odometries: wheel encoders, RealSense T265, ZED Mini, ORB-SLAM2.


Main Findings
Wheel odometry is not much affected by the different scenarios, not even by the change of speed, leading to more consistent values, especially during translation. This is not surprising because the floor surface remained identical. However, the standard deviation of wheel odometry is typically higher than for the visual odometries, making it generally less precise, especially during the easy scenarios (i.e., one or more of: low speed, many features, static environment).
The scenarios do, however, have a significant effect on the visual odometries. In our tests, speed had the greatest effect (in the "fast" scenarios, linear speed was ~3 times higher and angular speed ~7 times higher), followed by the number of features, while the static vs. dynamic environment had the smallest effect.
Among the visual odometries, ORB-SLAM2 has the poorest results in our experiments, both in translation (p < 4 × 10⁻²) and rotation (p < 4 × 10⁻⁵), and for all scenarios. This manifests as higher imprecision, a higher standard deviation, and more outliers than the other methods. This is not surprising, as it runs in monocular mode and without an IMU. Without IMU sensor fusion, if the visual tracking gets lost, all subsequent position estimates will deviate strongly until a familiar scene is found through loop closure.
Except for a few outliers, the RealSense T265 and the ZED Mini have comparable results on average (p > 0.1), both in terms of translation error and rotation error. The RealSense T265 is a bit more negatively affected by speed than the ZED Mini, especially during translation.
Finally, we also tried to post-process the odometry data offline to smooth it and reduce outliers (cf. Figures A2 and A5 in Appendix A), but this did not change the main findings.

Camera Lens Types
The RealSense T265 has a wide field of view, making it potentially able to spot many more visual features than the ZED Mini, at the cost, in principle, of a poorer image quality. In the end, this did not seem to make a significant difference in our experiments, although we cannot tell which part of the results is due to the lens and which part is due to differences in processing. The RealSense T265 would arguably have an advantage in scenarios where the interesting visual features are in the periphery of the camera's field of view, i.e., not visible to cameras with narrower fields of view.

Processing Power
On a robot or drone, aspects such as total weight, price, and power consumption are essential factors. On those factors, the RealSense T265 globally wins over the ZED Mini and ORB-SLAM2, as it comes with built-in data processing, while the other visual odometries require an additional powerful computer board such as an NVIDIA Jetson or similar.
Noticeably, the stereo version of ORB-SLAM2 was too computationally heavy for the computer board used in the experiments and could therefore not run in scenarios requiring real-time odometry.
We believe it is fair to include aspects such as processing power requirements when picking an odometry method. It is important to note that the quality of the ORB-SLAM2 odometry would have been somewhat higher had we used a more powerful computer board: ORB-SLAM2 fully uses a single CPU core (the other processes run on the other CPU cores without using them fully) and managed to process frames at only about ~5.5 Hz on average. This slow performance compared to the ZED Mini is partially due to the fact that the ZED Mini takes advantage of the GPU compute capability, which ORB-SLAM2 could not do.
In order to compare with the maximum quality that the baseline ORB-SLAM2 could achieve with unlimited processing power, we re-ran ORB-SLAM2 offline on the collected data and found slight improvements, but without changing our findings, i.e., a resulting quality still lower than that of the RealSense T265 and ZED Mini (cf. Figures A3 and A6 in Appendix A).

Multiple Sensors & Sensor Fusion
In our experiments, we compared the different methods with a single sensor each, but it would be possible to combine several cameras for potentially better quality. This is especially doable for larger robots.
A similar approach is to combine different types of sensors with a sensor fusion approach. Outdoors, such sensor fusion could be done with, e.g., GPS (Global Positioning System), and indoors, for instance, with fiducial markers such as ArUco markers or other 2D barcodes.
Noticeably, the RealSense T265 offers a built-in sensor fusion mechanism that can be fed with wheel odometry, but this was outside the scope of those experiments.

Limitations of Black-Box Systems
While it would be interesting for the academic discussion to be able to tune or disable various internal mechanisms of the RealSense T265 or the ZED Mini (for instance, the SLAM loop closure or the IMU), those systems are relatively closed and cannot be inspected in detail by third parties, as is possible with, e.g., an open-source software algorithm. We therefore content ourselves with an overall assessment of their relative merits, independently of their internal choices. In particular, this prevents us from comparing ORB-SLAM2 (which does not have any IMU data) with only the visual odometry components included in the RealSense T265 or the ZED Mini, which would be fairer to ORB-SLAM2.

Limitations of the Experimental Design
The project supporting those experiments only had the resources to perform a controlled assessment in a relatively narrow use-case, i.e., short-trajectory ground vehicle motion. We have therefore assessed neither the relative performance of the various odometry systems in 3D (e.g., on drones) nor their performance over longer trajectories. However, our experiments are a subset of the motions expected in more challenging 3D movements and/or longer navigations, and we can therefore expect the limitations we observed to also occur in those phases of the more challenging conditions. Anecdotally, we did come to similar conclusions when running our ground robots on longer indoor scenarios, but this was outside the lab and thus without a ground truth setup, so we cannot provide good-quality data for the longer trajectories.

Conclusions
In the specific tested use-case, i.e., short-trajectory ground vehicle motion with limited processing power, the experiments show that the Intel RealSense T265 compares well with the off-the-shelf state of the art, especially when accounting for price, built-in processing power, and sensor fusion abilities. In our experiments, the ZED Mini and the RealSense T265 provide comparable results. This confirms another recent evaluation of the T265 [17], which also found that its localization is more reliable than the baseline ORB-SLAM2. However, a single RealSense T265 does not solve the visual odometry challenge fully. Therefore, even for basic indoor navigation needs, several sensors or techniques must be combined. For the time being, visual odometry remains a domain with room for additional research and improvements. In particular, we start observing great advances in visual-inertial odometry from the world of augmented reality [28] with Google ARCore [29], Apple ARKit [18], and Microsoft Hololens [30], whose comparison with the RealSense T265 is left for a future publication.
Supplementary Materials: The source code of the data analysis, as well as the raw data (in ROS Bag format) for the different odometry systems, is available online at: https://github.com/DTUR3/visual_odometry_comparison.

Appendix A

Figure A1. For each of the odometries (wheel encoders, RealSense T265, ZED Mini, ORB-SLAM2), we report the median translation error (red horizontal line), the 75% observed translation errors (blue rectangle box), the 95% observed translation errors (blue error lines), as well as outliers (red crosses above the rest). For each odometry, from left to right, are reported the scenarios: "many slow static", "many slow dynamic", "few slow static", "few slow dynamic", "many fast static", "many fast dynamic", "few fast static", and "few fast dynamic".

Figure A2. Same as Figure A1, but after smoothing the data over 500 points across all sensors, which is about ~1.76 s.

Figure A3. Same as Figure A1, but with ORB-SLAM2 computed offline (i.e., without central processing unit (CPU) processing power limitations) and excluding the first seconds of the experiments used for initialization (so the results for the other odometries are not exactly those of Figure A1 either).

Figure A4. For each of the odometries (wheel encoders, RealSense T265, ZED Mini, ORB-SLAM2), we report the median rotation error (red horizontal line), the 75% observed rotation errors (blue rectangle box), the 95% observed rotation errors (blue error lines), as well as outliers (red crosses above the rest). For each odometry, from left to right, are reported the scenarios: "many slow static", "many slow dynamic", "few slow static", "few slow dynamic", "many fast static", "many fast dynamic", "few fast static", and "few fast dynamic".
Robotics 2020, 8, x FOR PEER REVIEW 15 of 17

Figure A5. Same as Figure A4, but after smoothing the data over 500 points across all sensors, which is about ~1.76 s.

Figure A6. Same as Figure A4, but with ORB-SLAM2 computed offline (i.e., without CPU processing power limitation) and excluding the first seconds of the experiments used for initialization (so the results for the other odometries are not exactly those of Figure A4 either).
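The 500-point smoothing described for Figures A2 and A5 corresponds to a simple moving average over the error time series. A minimal sketch, assuming a plain trailing/valid-window average (the exact filter in the published analysis scripts may differ):

```python
import numpy as np

def smooth_errors(errors, window=500):
    """Moving-average smoothing of an error time series.

    With the combined sample rate of all sensors in these experiments,
    500 samples span roughly 1.76 s. `mode="valid"` drops the edges,
    so the output has len(errors) - window + 1 points.
    """
    kernel = np.ones(window) / window  # uniform averaging kernel
    return np.convolve(np.asarray(errors, dtype=float), kernel, mode="valid")
```

Such smoothing suppresses per-frame jitter so the box plots reflect sustained error levels rather than instantaneous spikes.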

References