A Mobile Robot Position Adjustment as a Fusion of Vision System and Wheels Odometry in Autonomous Track Driving †

Abstract: Autonomous mobile vehicles need advanced systems to determine their exact position in a certain coordinate system. For this purpose, GPS and vision systems are most often used. These systems have some disadvantages: for example, the GPS signal is unavailable indoors and may be inaccurate, while a vision system depends strongly on the intensity of the recorded light. This paper assumes that the primary system for determining the position of the vehicle is wheel odometry joined with an IMU (Inertial Measurement Unit) sensor, whose task is to calculate all changes in the robot's orientation, such as the yaw rate. However, using only the results coming from the wheel system produces an accumulating measurement error, which is most often the result of wheel slippage and IMU sensor drift. In the presented work, this error is reduced by using a vision system that constantly measures the vehicle's distances to markers located in its workspace. Additionally, the paper describes the fusion of the signals from the vision system and the wheel odometry. Studies of the positioning accuracy of the vehicle with the vision system both turned on and off are presented. The averaged laboratory positioning error was reduced from 0.32 m to 0.13 m, while ensuring that the vehicle wheels did not experience slippage. The paper also describes the performance of the system during a real track drive, where the assumption was not to use the GPS geolocation system. In this case, the vision system assisted in the vehicle positioning and an accuracy of 0.2 m was achieved at the control points.


Introduction
The article describes the implementation of algorithms to determine the position and orientation of a mobile robot in a certain coordinate system without the participation of the GPS positioning system. The authors divided the described task into two subsystems constantly sending messages to each other via a UART bus. The first subsystem consists of programs located in the robot control system and determines the robot's trajectory calculated from the vehicle wheels. This is called wheel odometry and it is widely used in the automotive industry. A general discussion of this solution is given in [1,2]. Currently, to increase the accuracy of the determined robot position, designers of modern solutions use a combination of many independent positioning systems. Therefore, the authors added a vision system to the wheel odometry. A vision system with an implemented SLAM (Simultaneous Localization and Mapping) algorithm is the one most often described in the literature [3]. Visual odometry is a technique used by many different robots. Ichimura uses 3D-odometry in bike robots [4]. Tanaka et al. deploy odometry in a robot which specializes in the exploration and sampling of the seafloor [5]. Current research indicates that wheel odometry is crucial for robot self-localization [6]. Such an approach may be implemented using not only the camera but also an IMU and wheel encoders [7]. Jung et al. proposed a solution based on a MEMS IMU that is efficient in urban areas [8]. Chaudhari et al. proved that Cartesian odometry and a PID algorithm may significantly increase the accuracy of path planning [9]. Zhang et al. present an odometry method to estimate the pose of the robot using ORB feature extraction [10]. To increase the accuracy of the self-localization process, Lin et al. proposed a method based on ceiling vision [11]. Current approaches use deep learning techniques to improve visual odometry [12]. Aladem et al. proposed a solution which enables the application of visual odometry in non-cooperative environments, for example at night [13]. The partial derivative of the pixel grayscale is another method for improving the performance of visual odometry in difficult conditions [14]. In this article, the vision system only provided position corrections and was geared toward finding characteristic landmarks. The landmark problem was studied in [15]. The OpenCV library algorithms are used [16]; in that work, the authors use ARTags in augmented reality. Due to the autonomous movement of the robot, the issue of safety arises during the physical interaction of a human with a robot [17].
The vehicle presented in this paper belongs to a group of vehicles called unmanned ground vehicles (UGVs), where an important task of their creators is to develop navigation algorithms improving their ability to operate in difficult terrain [18]. UGVs are robots that move and make decisions based on various algorithms, both deterministic and based on artificial intelligence. While driving, the vehicles use their sensors to avoid obstacles or locate their position in the environment. These abilities make such vehicles increasingly important in replacing humans in many missions that are dangerous to them. The vision system described by the authors aims to remove positioning errors coming from the other sensors of the robot. This error is often caused by slippage. There are studies that reduce slippage, not only for wheeled vehicles [19].
To check the correctness of the implemented subsystems, a mobile robot was designed and built [20]. In addition, many tests were performed on the test track. This track was precisely measured and a map was made based on these measurements. Control points and the positions of the graphic markers (ARTags) were placed on the map. The experiment assumed that the robot would be able to move autonomously from the starting point to the next and subsequent checkpoints with the greatest accuracy. Besides, if the robot calculated that a checkpoint was reached, the blue lamp mounted on its board started to flash.
By design, numbers from 0 to 16 are encoded in the ARTags. To accurately determine the position of the robot, the vision system needed to recognize three markers. If two markers were detected, two candidate robot positions were determined, one of which could easily be rejected. If one marker was detected, the robot could be located with lower accuracy: it was positioned somewhere on the circle whose radius was the distance between the robot and the marker. To calculate the distance to the markers, a specially designed head with two cameras was used. The appearance of the head and the ARTag are shown in Figure 1.
The software that interpreted the robot's space was initially tested on a laptop with an Intel Core i7 processor and 32 GB RAM, which at 1024 × 760 resolution allowed the ARTag recognition algorithm to run at 15 frames per second. The vision system communicated with the robot's main processor via a serial bus and transmitted the numbers of the found markers and the distances to them. Information from the wheel odometry was also transmitted to the main processor. Ultimately, the important step of the whole task was to combine the subsystems described above into one decision-making control system, i.e., calculating the signal fusion. Tests of the accuracy and speed of the vision system were described in [21]. The general structure of the robot system is shown in Figure 2. All algorithms pertaining to the wheel odometry were implemented in the C language and flashed into the STM processor. The vision system was written in C++.
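The paper does not specify the serial frame layout used between the vision system and the main processor. As an illustration only, a minimal parser is sketched below for a hypothetical text frame "L,<id>,<distance in mm>"; the frame format, field order and units are assumptions, not the authors' protocol.

```python
# Hypothetical parser for marker messages sent from the vision system to the
# robot controller over the serial bus. The "L,<id>,<dist_mm>" layout is an
# assumption for illustration; the paper does not specify the real protocol.

def parse_marker_message(line: str):
    """Parse one text frame into (marker_id, distance_m); None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 3 or parts[0] != "L":
        return None
    try:
        marker_id = int(parts[1])
        distance_m = int(parts[2]) / 1000.0  # millimetres -> metres
    except ValueError:
        return None
    if not 0 <= marker_id <= 16:  # ARTags encode numbers 0..16 by design
        return None
    return marker_id, distance_m
```

A malformed or out-of-range frame is rejected rather than propagated into the fusion step, which is the behaviour one would want on a noisy UART link.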
A photograph of the robot is shown in Figure 3.

Determining Vehicle Position
In this section, the method for acquiring information about the position and orientation of the mobile robot, both from the sensors determining the travelled distance and from the vision system, will be presented. At the end of the section, the method for fusing those signals will be described.

Position from Wheels and IMU
The current relative position of the mobile robot in two-dimensional real space was determined by a vector of two coordinates p_w = [x, y]. Those coordinates were determined according to Equation (1):

x = x_0 + ∫ v_x dt,  y = y_0 + ∫ v_y dt,  with v_x = v sin θ, v_y = v cos θ, (1)

where: θ is the current vehicle yaw angle relative to the axis Y, v_x, v_y are the vehicle speed components, and x_0, y_0 is the initial vehicle position. In order to determine the current relative position of the mobile robot, information about the current speed and the angle relative to the Y-axis (azimuth) in some coordinate system is necessary. For this purpose, each of the robot wheels is equipped with a DC motor with an incremental encoder, and for each wheel a PID controller was implemented to control the velocity. The control system, at each calculation step, maps the current linear vehicle speed from the known angular velocities of the individual wheels and the geometrical dimensions. At the same time, the azimuth is read from an inertial navigation system equipped with an accelerometer, gyroscope and magnetometer.
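The integration in Equation (1) can be sketched as simple Euler dead reckoning. The function below is illustrative (names and the fixed time step are assumptions); the real controller runs an equivalent update at every calculation step.

```python
import math

def dead_reckon(x0, y0, samples, dt):
    """Integrate wheel speed v and IMU yaw theta (measured from the Y axis)
    into a relative position, as in Equation (1). Euler integration with a
    fixed step dt; `samples` is a sequence of (v, theta) measurements."""
    x, y = x0, y0
    for v, theta in samples:
        x += v * math.sin(theta) * dt   # v_x = v*sin(theta), yaw w.r.t. Y axis
        y += v * math.cos(theta) * dt   # v_y = v*cos(theta)
    return x, y
```

For example, driving at 1 m/s for 1 s with zero yaw moves the estimate 1 m along the Y axis, while a constant yaw of 90 degrees moves it along X; any encoder or gyro error accumulates in (x, y), which is exactly why the vision correction described later is needed.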

Position from ARTags
Let us define image I as a two-dimensional array containing the grayscale intensity of the recorded pixels, I : Ω → ℝ, where Ω ⊂ ℝ² is the domain of the image. A point P = (x, y, z)^T from 3D space is denoted in the image as p = (x, y)^T ∈ Ω. The vision system has been developed and will be presented in the six steps described below.

Edge Detection-Canny Algorithm
First, all edges in the image should be detected. An edge is an ordered, significant change in adjacent pixel values. This means that if I(x_k, y_1) > I(x_k, y_2), a vertical edge is obtained. In the simplest case, the edge is discovered using the image gradient

∇I = (∂I/∂x, ∂I/∂y)^T. (2)

However, we can correctly obtain the edge only if the image is first passed through a low-pass filter, for example a Gaussian filter h(τ, σ), because otherwise the edge will hide in the noise. This means that the final form of the edge detection filter will be equal to

∇(h(τ, σ) * I) = (∇h(τ, σ)) * I, (3)

where, in the Canny edge detector, the threshold τ > 0 and the standard deviation σ > 0. The next step is as follows: if ∇I^T ∇I is larger than the predefined gradient threshold and is a local maximum along the gradient direction, the pixel is set as an edge. The final step of the algorithm is to differentiate the one-point edge using the hysteresis of two thresholds [22].
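The smoothing and gradient-threshold stages above can be sketched in a few lines of numpy. This is a simplified illustration of the first Canny stages only (Gaussian smoothing plus gradient-magnitude thresholding); it deliberately omits non-maximum suppression and hysteresis, and all function names and parameter values are assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian low-pass kernel h(., sigma)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(img, kernel):
    """Naive 2D convolution with edge padding (clear, not fast)."""
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * kernel)
    return out

def edge_map(img, grad_threshold):
    """Smooth the image, then mark pixels whose gradient magnitude
    sqrt(grad(I)^T grad(I)) exceeds the threshold."""
    smooth = convolve2d(img.astype(float), gaussian_kernel())
    gy, gx = np.gradient(smooth)          # central differences, axis 0 then 1
    mag = np.sqrt(gx**2 + gy**2)
    return mag > grad_threshold
```

In practice an implementation would call an existing routine such as OpenCV's Canny detector; the sketch only illustrates why smoothing must precede differentiation.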

Contour Detection and Polygonal Approximation
Next, the contours of the objects whose edges were obtained in the previous step are needed. For this purpose, the algorithm of the polygonal contour description is proposed. Its main goal is to specify the vertices of the polygon according to the Ramer-Douglas-Peucker algorithm. Let us define the vector D = [p_0 p_1 . . . p_n] describing the points lying on the contour of the object. Define the segment L_k with its beginning at point p_0 and its end at point p_n. Additionally, determine all lines perpendicular to this segment L_k passing through the points from vector D and determine the longest L_2 norm

d_k = max_i ||p_i − L_k||_2 . (4)

If d_k > T, set a new segment L_{k+1} with its beginning at the current L_k segment beginning and its end at the point p_k for which d_k has been determined. In Equation (4), T is an arbitrarily selected threshold. Save point p_k as a vertex of the polygon in vector D̂. If d_k < T, calculate L_{k+1} such that its beginning is at point p_k and its end at point p_n. Recalculate d_{k+1} according to Equation (4). Finish the algorithm when no further L segment can be created.
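The procedure above is the classic recursive Ramer-Douglas-Peucker scheme; a minimal sketch (function names are illustrative):

```python
def point_segment_distance(p, a, b):
    """Perpendicular L2 distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = (dx * dx + dy * dy) ** 0.5
    if seg_len == 0.0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / seg_len

def rdp(points, threshold):
    """Ramer-Douglas-Peucker polygonal approximation of a contour D."""
    if len(points) < 3:
        return list(points)
    # find the contour point farthest from the segment p_0 -> p_n (Eq. (4))
    d_max, k = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > d_max:
            d_max, k = d, i
    if d_max > threshold:
        # keep p_k as a polygon vertex and simplify both halves
        left = rdp(points[:k + 1], threshold)
        right = rdp(points[k:], threshold)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Applied to an L-shaped contour with a small bump, the algorithm keeps only the corner vertices, which is exactly what the marker detector needs before the quadrangle tests.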

Rejecting Incorrect Markers
The next step of the graphic marker recognition task is to choose appropriate candidates based on the following criteria. The first criterion is to check whether the vector D̂_i defining the vertices has 4 elements. Next, it is important to check whether the vertices form a closed contour, i.e., whether they depict a geometrical figure. The last two tests check whether the geometric figure is convex and whether the distance between consecutive segments resulting from the fusion of points in vector D̂ is big enough. A vector D̂_i which fulfils these tests is saved as a candidate containing the marker shape.
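These rejection criteria translate into a short geometric filter. The sketch below (illustrative names; the minimum side length is an assumed threshold, and closedness is taken to be guaranteed by the contour extraction) checks the vertex count, convexity via the sign of consecutive cross products, and the side lengths:

```python
def cross(o, a, b):
    """z-component of the cross product (a-o) x (b-o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def is_marker_candidate(verts, min_side=10.0):
    """Accept a polygon as a marker candidate: exactly 4 vertices,
    convex, and every side at least min_side pixels long."""
    if len(verts) != 4:
        return False
    signs = []
    for i in range(4):
        c = cross(verts[i], verts[(i + 1) % 4], verts[(i + 2) % 4])
        if c == 0:
            return False              # degenerate (collinear) corner
        signs.append(c > 0)
    if len(set(signs)) != 1:          # all turns must have the same sign
        return False                  # otherwise the quadrangle is concave
    for i in range(4):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % 4]
        if ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 < min_side:
            return False
    return True
```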

Removing Perspective and Warping Algorithms
The candidate selected in the previous step may include a marker with a number. We need to find the pose of the camera coordinate system so that the vector D̂_i containing the vertices of an arbitrary quadrangle can be converted into the vertices of a square. In this work, the warping perspective method with homography A was used. Next, knowing the transformation matrix, the equation Î(x, y) = A × I(x, y) can be used to remove the perspective distortion from the image.
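The homography A that maps the four detected vertices onto a square can be estimated from the four correspondences by solving a small linear system (direct linear transform with A_33 fixed to 1). The sketch below is illustrative; an implementation based on OpenCV, as in this work, would typically use its built-in perspective-transform routines instead.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography A mapping four src points onto four
    dst points; src and dst are lists of (x, y) tuples."""
    M, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        M.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        M.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(M, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)   # fix A_33 = 1

def apply_homography(A, p):
    """Map an image point through A in homogeneous coordinates."""
    q = A @ np.array([p[0], p[1], 1.0])
    return (q[0] / q[2], q[1] / q[2])
```

Warping then consists of sampling the input image at the back-projected coordinates of every output pixel, which yields the rectified square tag used in the next step.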

Detecting Marker Frame and Reading Its Code
Having the rectified image from the previous step, it is possible to start reading the tag code. To do this, the image is divided into a grid of 9 × 9 squares, then the white pixels in each square are counted. If their number is higher than the defined threshold, the square represents the binary number 1; otherwise the square is assigned the value 0. The resulting matrix containing binary numbers is compared with the code patterns. Each marker has a frame two squares thick.
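The grid binarization and the frame check can be sketched as follows. With a 9 × 9 grid and a frame two squares thick, the code occupies the inner 5 × 5 cells; the threshold value and function names below are illustrative assumptions.

```python
import numpy as np

def read_tag_grid(img, threshold=0.5):
    """Split a rectified grayscale marker image into a 9x9 grid and set a
    cell to 1 when the fraction of white pixels exceeds `threshold`."""
    h, w = img.shape
    ch, cw = h // 9, w // 9
    grid = np.zeros((9, 9), dtype=int)
    for r in range(9):
        for c in range(9):
            cell = img[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            grid[r, c] = int(cell.mean() > threshold * 255)
    return grid

def has_valid_frame(grid):
    """The frame two cells thick around the inner 5x5 code area must be
    entirely black (0) for the candidate to be a real marker."""
    border = grid.copy()
    border[2:7, 2:7] = 0          # ignore the inner code cells
    return border.sum() == 0      # every remaining (frame) cell is black
```

Only grids passing the frame test are compared against the stored code patterns, which rejects most false quadrangle candidates cheaply.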

Calculation of the Position and Orientation of the Marker
The projection equation for a calibrated camera can be written as follows:

λ p_i = K [R|t] P_i = Q P_i , (5)

where K is the matrix of the internal camera parameters containing the focal length and the physical centre of the image, and [R|t] is the matrix of the external camera parameters containing the rotation and translation of the camera coordinate system relative to the global coordinate system. The task is to calculate the matrix Q from this linear equation. Using the homogeneity of the image points p_i, the translation t = [x, y, z] can be determined, from which we will determine the distance between the vehicle and the marker

d_v = λ √(x² + y² + z²). (6)
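The projection and the distance formula can be checked numerically with a small sketch (the intrinsic values below are made-up assumptions, not the calibration of the authors' cameras):

```python
import numpy as np

def project(K, R, t, P):
    """Pinhole projection lambda*p = K [R|t] P for a 3D point P.
    Returns the pixel coordinates and the homogeneous scale lambda."""
    q = K @ (R @ np.asarray(P, float) + np.asarray(t, float))
    return q[:2] / q[2], q[2]

def marker_distance(t, lam=1.0):
    """Camera-to-marker distance d_v = lambda*sqrt(x^2 + y^2 + z^2)
    from the translation part t = [x, y, z] of the external parameters."""
    x, y, z = t
    return lam * (x * x + y * y + z * z) ** 0.5
```

For a marker on the optical axis 2 m ahead of the camera, the projection lands on the principal point and the recovered distance is 2 m, matching the geometric intuition behind Equation (6).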

Fusion of Wheels and Vision Signals
The method of determining the position of a robot described above is exposed to errors, for example from wheel slip. Additionally, the error of the currently determined position is the sum of all errors that have occurred since the beginning of the journey. To correct the relative position determined according to the method presented above, a comparison with the results from the vision system was used. A set of markers, whose codes and positions were known, was placed in the workspace of the mobile robot. The vision system sent information about the numbers of the currently detected markers, and the distances to them, to the robot controller. Then, based on the information received, a correction was determined according to the following procedure. There were three cases.
The first occurs when the vision system correctly recognizes one marker. This is shown schematically in Figure 4, where: m is the vector determining the position of the marker, p is the vector of the robot position determined from the odometry system, p_c is the vector determining the corrected position of the robot, v_c is the correction vector, and r is the distance from the marker obtained from the vision system. The corrected vehicle position is calculated as the point, closest to the current vehicle position determined from the odometry system, on the circle whose radius equals the distance to the marker obtained from the vision system and whose centre is the marker point.
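Geometrically, this first case projects the odometry estimate p radially onto the circle of radius r around the marker m. A minimal sketch (illustrative function name):

```python
def correct_one_marker(m, p, r):
    """Case 1: one marker detected. Move the odometry position p onto the
    circle of radius r centred at marker m, at the point closest to p."""
    (mx, my), (px, py) = m, p
    dx, dy = px - mx, py - my
    d = (dx * dx + dy * dy) ** 0.5
    if d == 0.0:
        return (mx + r, my)   # degenerate: direction undefined, pick any point
    return (mx + r * dx / d, my + r * dy / d)   # scale the m->p direction to r
```

If the odometry says the robot is 5 m from a marker but the vision system measures 10 m, the correction doubles the offset along the same bearing, preserving the odometry's direction estimate while trusting the vision range.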
The second case takes into account the situation when the vision system correctly detects two markers simultaneously (or the subsequent markers appear with a time interval smaller than 0.3 s). This is schematically illustrated in Figure 5, where: m_i is the vector defining the position of marker i, p is the vector of the robot position determined from the odometry system, p_c is the vector defining the corrected robot position, v_c is the correction vector, and r_i is the distance from marker i obtained from the vision system. In this situation, the corrected robot position is in one of two places, which are the intersections of two circles with radii equal to the vehicle distances from the markers and centres lying at the marker points. The robot algorithm chooses the solution which is closer to the robot's currently determined odometry position. The third case takes into account the situation when the vision system correctly detects three or more markers simultaneously (or the subsequent markers appear with a time interval smaller than 0.3 s). This is schematically illustrated in Figure 6, where the designations are the same as in Figure 5. In this situation, three markers are selected. The corrected robot position is determined as the intersection of all three circles, with radii equal to the distances from the markers and with centres at the marker points.
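The two-marker case reduces to intersecting two circles and picking the candidate nearest the odometry estimate; the three-marker case intersects a third circle in the same way. A sketch of the two-circle step (illustrative names; the fallback when the circles do not intersect is an assumption):

```python
def circle_intersections(m1, r1, m2, r2):
    """Intersection points of two circles with centres m1, m2, radii r1, r2."""
    (x1, y1), (x2, y2) = m1, m2
    dx, dy = x2 - x1, y2 - y1
    d = (dx * dx + dy * dy) ** 0.5
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                           # disjoint, contained or concentric
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    h2 = r1 * r1 - a * a
    h = h2 ** 0.5 if h2 > 0 else 0.0
    mx, my = x1 + a * dx / d, y1 + a * dy / d   # foot point on the centre line
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]

def correct_two_markers(m1, r1, m2, r2, p):
    """Case 2: choose the intersection closest to the odometry position p."""
    cands = circle_intersections(m1, r1, m2, r2)
    if not cands:
        return p        # assumed fallback: keep the odometry estimate
    return min(cands, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
```

With noisy range measurements three circles rarely meet in a single exact point, so a practical three-marker implementation would take the point minimizing the distance to all three circles; the sketch above shows only the ideal geometric construction.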
In the first case, that is, when only one marker is detected, the vehicle is located on a circle whose radius is the distance of the vehicle to the marker. In this case, its position can only be estimated: the vehicle is placed at the point of the circle closest to the position calculated from the wheel odometry. In the second case, after detecting two markers, the vehicle can be located in two positions, which are the intersections of the circles determined by the distances to these markers; the intersection closest to the current vehicle position is selected. In the third case, the vehicle position is determined uniquely and the vehicle is located at the intersection of the three circles. The corrected data is immediately sent to the robot control system and the tentative robot position is updated, with the provision that the position taken from the vision system always has higher priority.

System Tests
In this section, the accuracy results obtained in the laboratory and in a real test during an international competition, which took place in Poland, are presented.

Test in the Laboratory
The first stage relates to testing the accuracy of the vehicle position with respect to the desired position on the map. Figure 7 shows the vehicle equipped with the vision system and a few ARTags located on the test track in the laboratory. Tests of the correctness of estimating the closest distance to the indicated target were carried out in good lighting conditions, thanks to which the differences between the actual distance and the one determined by the algorithm were small; for distances of up to 3 m, the relative errors were in most cases smaller than 1%, as shown in Figure 8. For worse conditions, for example underexposure, the maximum detection distance was significantly reduced. Strong light falling on the marker could even prevent detection. A detailed description was presented in [21], where it was noted that the absolute measurement errors increase with the distance. The most accurate measurements were obtained for distances up to 3 m, which corresponds to the distance for which the camera was calibrated. In the test conducted in the laboratory, there were five marker destinations that the robot had to travel to.
In every test, the robot made 76 drives to a randomly selected point and stopped there. Exactly 12 ARTags were placed on the test track. Each ARTag was 180 mm × 180 mm in size. During one test, the robot covered about 668 m. The first test was an autonomous drive with the vision system turned off. This means that only the wheel odometry was taken into account, with some undesirable drift coming from the inertial sensor. The driving accuracy results are marked by the letter (a) in Figures 8-12; the figures marked by the letter (c) show the position deviations in the x and y axes with the cameras turned off. The exact same sequence was reproduced in the second experiment with both the vision system and the odometry turned on (the letter (b) in the figures); the deviations in this case are presented in the figures marked (d). The system had two cameras on board. Figure 13 presents the differences between the set and reached positions in the 76 driving experiments. Figure 13a is a graph representing the position accuracy in the x direction with the cameras turned off and Figure 13c with the cameras on. Comparing Figure 13b,d, which represent the position deviations, we can see that with the cameras on the rover increased its position accuracy. Figure 13e,g represent the position accuracy in the y direction. When the cameras were turned on, the accuracy was better; this is shown in Figure 13f,h. Figure 13i,j show, respectively, the distance deviation with the cameras turned off and on. These two show that there was a significant improvement in the accuracy of determining the rover position when the vision system distance calculation was applied to the wheel odometry. Table 1 shows the result of quantifying the positioning accuracy without the cameras and with them. The sums of the deviation modules measured in the individual axes, and the sum of the distance deviations, were used as the indicators. It should be noted that the vision system increased the vehicle position accuracy by about 50 percent.
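The indicators used in Table 1 can be computed directly from the set and reached coordinates of the 76 drives; a minimal sketch (illustrative function name):

```python
import math

def accuracy_indicators(set_points, reached_points):
    """Sums of |dx|, |dy| and of the Euclidean distance deviations over all
    drives, as used to quantify the positioning accuracy in Table 1."""
    sx = sy = sd = 0.0
    for (xs, ys), (xr, yr) in zip(set_points, reached_points):
        dx, dy = xr - xs, yr - ys
        sx += abs(dx)
        sy += abs(dy)
        sd += math.hypot(dx, dy)    # straight-line miss distance per drive
    return sx, sy, sd
```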

Test in the Real Environment
The second stage of the tests was performed at the Kielce University of Technology and at the European Rover Challenge international competitions in 2018 and 2019. Figure 14 presents a map of the test track with control points (letter W) and graphic markers (letter L), whose point numbers were coded using ARTags. The places with obstacles are marked in black (letter R). The robot's position on the track was marked on the map with a red dotted line. The map grid has a resolution of 1 m. In Figure 15, the arrows show the corrections made during the driving. The black arrow indicates the correction made after detecting markers L4 and L5, while the red arrows indicate where the control system made corrections after detecting the L6 and L7 markers. As mentioned in [21], the recognition range of the system was limited to about 6 m; therefore, corrections are calculated near the markers. During the main drive at the competition, the vehicle reached the checkpoints with the following accuracy: W1 with 2 cm, W2 with 5 cm, and W3 with 20 cm. The W4 checkpoint was very difficult to reach due to the large wheel slip of the vehicle and the fact that the vision system did not have two markers in its field of view. The vehicle control system did not make a position correction and decided that the vehicle had reached the checkpoint without disruption. The same situation happened with the WX point, which was hidden behind a hill.

Conclusions
The paper describes the automatic driving system of a mobile vehicle equipped with two independent systems from which the vehicle's position on the test track can be determined. The experiment assumes that GPS cannot be used. The first of the implemented systems calculated the robot position from the robot wheels. This is the main system for positioning the robot in space, but its main disadvantage is the fact that the larger the distance travelled by the robot, the larger the positioning error. Therefore, the authors decided to develop a system that provides corrections from an independent source giving absolute measurements to characteristic points with well-known locations on the track. The measurement to the markers was carried out using a vision system: if a marker or markers appeared in the field of view of the camera, the system ran the detection algorithm and determined the needed correction. Additionally, the vehicle algorithm signalled this fact by lighting the blue lamp on board. The places where the system made corrections are shown in the screenshots and marked with arrows. During the real track tests, the vehicle performed 15 drives. The robot was placed on the starting line and then the positioning accuracy was measured. It should be noted that the vehicle was not able to complete the full test drive when the vision system was turned off, despite being equipped with a simple wheel slip detection system. This result was expected due to the very sandy ground; a more sophisticated system would need to be used in the future. When the vision system was turned on, an approach to the first checkpoint was obtained with an accuracy of 2 cm; the worst result, 20 cm, was obtained at the last checkpoint. This was because the checkpoint was lying at the top of a very steep hill. In this case, the algorithm described in [19] may help.
It can be concluded that the laboratory tests were consistent with the real track tests. The addition of the vision system made it possible to complete the autonomous task across the entire test track with the required accuracy of 50 cm, especially when the obtained results are compared with the GNSS with RTK results of ±15 cm presented in [23].
During further research, it is worth considering replacing known graphic markers such as ARTags with arbitrary feature points from the vehicle environment. This should allow the authors to determine the position with higher accuracy. Currently, due to the increasing computational efficiency of microprocessor chips, the authors plan to implement vehicle tracking using robust SLAM and a sparse point cloud on board the vehicle.