A Novel Identification Methodology for the Coordinate Relationship between a 3D Vision System and a Legged Robot

Coordinate identification between vision systems and robots is a challenging issue in intelligent robotic applications, which involve steps such as perceiving the immediate environment, building the terrain map and planning the locomotion automatically. Current identification methods have non-negligible limitations, such as difficult feature matching, the requirement for external tools and the intervention of multiple people. In this paper, we propose a novel methodology to identify the geometric parameters of 3D vision systems mounted on robots without involving other people or additional equipment. In particular, our method focuses on legged robots, which have complex body structures and excellent locomotion ability compared to their wheeled/tracked counterparts. The parameters can be identified simply by moving the robot on relatively flat ground. Concretely, an estimation approach is provided to calculate the ground plane. In addition, the relationship between the robot and the ground is modeled. The parameters are obtained by formulating the identification problem as an optimization problem. The methodology is integrated on a legged robot called “Octopus”, which can traverse rough terrain with high stability after obtaining the identification parameters of its mounted vision system using the proposed method. Diverse experiments in different environments demonstrate that the method is accurate and robust.


Introduction
Sensors, especially vision sensors and force sensors, give robots their sensing ability and play an important role in intelligent robotics. With good knowledge of the local environment, a legged robot can select a safe path and a set of appropriate footholds in advance, and then plan its foot and body motion effectively in order to traverse rough terrain automatically with high stability, high velocity and low energy consumption. Several related examples have appeared in recent years. HyQ [1] can trot on uneven ground based on its vision-enhanced reactive locomotion control scheme, using an IMU and a camera. Messor [2] uses a Kinect to classify various terrains in order to achieve automatic walking on different terrains. The DLR Crawler [3] can navigate in unknown rough terrain using a stereo camera. Little Dog [4] uses a stereo camera and the ICP algorithm to build the terrain model. Messor [5] uses a laser range finder to build an elevation map of rough terrain and chooses appropriate foothold points based on that map. The Planetary Exploration Rover [6] builds a map model with a LIDAR sensor based on the distance to objects and plans an optimized path. AMOS II [7] uses a 2D laser range finder to detect the distance to obstacles and gaps in front of the robot, and it can also classify terrains based on the detected data. A humanoid robot [8] uses a 3D TOF camera and a webcam to build a digital map and then plans a collision-free path. Another humanoid robot [9] can walk along a collision-free path based on fuzzy logic theory with the help of a webcam. The RHEX robot [10] achieves reliable 3D sensing and locomotion planning with a stereo camera and an IMU mounted on it.
In robotics, if a vision sensor is mounted on a robot, its pose with respect to the robot frame must be known; otherwise the vision information cannot be used by the robot. However, only a few works describe how to compute it. In related fields, the extrinsic calibration of two or more vision sensors has been studied extensively. Herrera [11] proposed an algorithm that calibrates the intrinsic parameters and the relative position of a color camera and a depth camera at the same time. Li, Liu et al. [12] used straight-line features to identify the extrinsic parameters of a camera and an LRF. Guo et al. [13] solved the identification problem of an LRF and a camera by using the least squares method twice. Geiger et al. [14] presented a method that automatically identifies the extrinsic parameters of a camera and a range sensor using one shot. Pandey and McBride [15] performed an automatic targetless extrinsic calibration of an LRF and a camera by maximizing the mutual information. Zhang and Robert [16] proposed a theoretical algorithm for calibrating the extrinsic parameters of a camera and an LRF by using a chessboard, and verified the theory by experiments. Huang et al. [17] calibrated the extrinsic parameters of a multi-beam LIDAR system by using V-shaped planes and infrared images. Fernández-Moral et al. [18] presented a method for identifying the extrinsic parameters of a set of range finders by finding and matching planes in 5 s. Kwak [19] used a V-shaped plane as the target to calibrate the extrinsic parameters of a LIDAR and a camera by minimizing the distance between corresponding features. By using a spherical mirror, Agrawal [20] obtained the extrinsic calibration parameters of a camera without a direct view. For two vision sensors without overlapping detection regions, Lébraly et al. [21] obtained the extrinsic calibration parameters using a planar mirror. By using a mirror to observe the environment from different viewing angles, Hesch et al. [22] determined the extrinsic identification parameters of a camera and other fixed frames. Zhou [23] proposed a solution for the extrinsic calibration of a 2D LIDAR and a camera using three plane-line correspondences. Kelly [24] used GPS measurements to establish the scale of both the scene and the stereo baseline, which could be used to achieve simultaneous mapping.
More closely related work concerns coordinate identification between a vision system and a manipulator. Wang [25] proposed three methods to identify the coordinate systems of manipulators and a vision sensor and compared them through simulations and experiments. Strobl [26] proposed an optimized robot hand-eye calibration method. Dornaika and Horaud [27] presented two solutions for performing the robot-world and hand-eye calibration simultaneously: one was a closed-form method that used quaternion algebra and a positive quadratic error function, and the other was based on nonlinear constrained minimization. They found that the nonlinear optimization method was more stable with respect to noise and measurement errors. Wongwilai [28] used a Softkinetic Depthsense, which can acquire distance images directly, to calibrate an eye-in-hand system.
Few studies address the identification of the coordinate relationship between a vision system and a legged robot. The most similar and most recent work is that of Hoepflinger [29], who calibrated the pose of an RGB-D camera with respect to a legged robot. Their method needs to recognize the foot position in the camera coordinate system, based on the assumption that the robot's foot has a specific color and shape. The identification parameters are then obtained by comparing the foot position in the two coordinate systems, the camera frame and the robot frame. Our research target is the same as theirs, while the solution is entirely different.
Existing methods for identifying the extrinsic parameters of a vision sensor suffer from several disadvantages, such as difficult feature matching or recognition, the requirement for external equipment and the need for human intervention. Current identification approaches are often elaborate procedures. Moreover, little work has been done on the pose identification of a vision sensor mounted on a legged robot. To overcome the limitations of the existing methods and to supplement the relevant study of legged robots, in this paper we propose a novel coordinate identification methodology for a 3D vision system mounted on a legged robot, without involving other people or additional equipment. This paper makes the following contributions:
1. A novel coordinate identification methodology for a 3D vision system of a legged robot is proposed, which needs no additional equipment or human intervention.
2. We use the ground as the reference target, which makes it possible for our methodology to be widely used. At the same time, an estimation approach based on optimization and statistical methods is introduced to calculate the ground plane accurately.
3. The relationship between the legged robot and the ground is modeled, which can be used to precisely obtain the pose of the legged robot with respect to the ground.
4. We integrate the proposed methodology on "Octopus", which can traverse rough terrain after obtaining the identification parameters. Various experiments are carried out to validate the accuracy and robustness of the method.
The remainder of this paper is organized as follows: Section 2 provides a brief introduction to the robot system. Section 3 describes the problem formulation and the definition of the coordinate systems. Section 4 presents the modeling and the method in detail. Section 5 describes the experiments and discusses the results of the error and robustness analyses. Section 6 summarizes and concludes the paper.

System Description
The legged robot is called "Octopus" [30,31]. It has a hexagonal body with six identical legs arranged in a diagonally symmetrical way around the body, as shown in Figure 1. The robot is a six-DOF mobile platform that integrates walking and manipulating. A vision system is necessary for building a terrain map, and its mounting position and orientation with respect to the robot frame, which are essential for locomotion planning, need to be acquired. Figure 2 shows the control architecture of the robot. Users send commands to the upper computer via a control terminal, which can be a smart phone or a pad and communicates with the upper computer via Wi-Fi. The sensor system contains a 3D vision sensor, a gyro, a compass and an accelerometer. The 3D vision sensor detects the terrain in front of the robot and provides the 3D coordinate data; it is connected to the upper computer via USB. The compass helps the robot navigate in the right direction in outdoor environments. The gyro and the accelerometer measure the inclination, the angular velocity and the linear acceleration of the robot. The upper computer is a notebook computer, which receives and processes the data from the sensor system. The upper computer also sends instructions to the lower computer via Wi-Fi, over a network that the upper computer itself creates. The lower computer runs a real-time Linux OS. It analyzes the messages sent by the upper computer, then plans the locomotion and sends the planned data to the drivers via Ethernet at run time. The drivers supply current to the motors and servo-control them using the feedback data from the resolvers. Our current work aims to make the robot walk and operate automatically in unknown environments with the help of the 3D vision sensor. Automatic locomotion planning needs the 3D coordinates of the surroundings, which can be converted from the depth images captured by the 3D vision sensor.
Common laser range finders can only measure distances to objects located in the laser line of sight, while a 3D vision sensor can measure the distances to all objects within its detection region, which is why we choose a 3D vision sensor. The 3D vision sensor we use is a Kinect (as Figure 3 shows), which integrates several useful sensors: an RGB camera, an infrared emitter and camera, and four microphones. The RGB camera captures 2D RGB images, while the infrared emitter and camera constitute a 3D depth sensor that measures distance. Speech recognition and sound source localization can be achieved by processing the voice signals obtained by the four microphones at the same time. Equipped with the 3D vision sensor, the robot can see objects from 0.8 m to 4 m and has a 57.5° horizontal and a 43.5° vertical viewing angle. The range from 1.2 m to 3.5 m is a sweet spot, in which the measuring precision can reach the millimeter level [32,33]. Additionally, a small motor inside the 3D vision sensor allows it to tilt up and down from −27° to 27°. The 3D vision sensor is installed at the top of the robot, as Figure 4 shows. The motor is driven to tilt the 3D vision sensor down so that it can detect the terrain in front of the robot. The blue area is the region that the 3D vision sensor can detect, and the green area is the sweet spot. The height of the 3D vision sensor, denoted by h, is about 1 m. Through geometric calculations, the short border VA of the green area is about 1.2 m and the long border VB is about 3.5 m. The depth data in the green area therefore have a higher precision.
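The conversion from a depth image to 3D coordinates mentioned above can be illustrated with a minimal sketch. It assumes an ideal pinhole model whose focal lengths are derived from the quoted 57.5° × 43.5° viewing angles; the 640 × 480 depth resolution and the function names are assumptions made for illustration, not values taken from this paper, and a real sensor would use calibrated intrinsic parameters.

```python
import math

# Viewing angles quoted in the text; the 640 x 480 resolution is an assumption.
H_FOV_DEG, V_FOV_DEG = 57.5, 43.5
WIDTH, HEIGHT = 640, 480

def intrinsics_from_fov(width, height, h_fov_deg, v_fov_deg):
    """Return (fx, fy, cx, cy) of an ideal pinhole camera with the given viewing angles."""
    fx = (width / 2.0) / math.tan(math.radians(h_fov_deg) / 2.0)
    fy = (height / 2.0) / math.tan(math.radians(v_fov_deg) / 2.0)
    return fx, fy, width / 2.0, height / 2.0

def back_project(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth along the optical axis to a 3D point in the sensor frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def in_sweet_spot(depth, near=1.2, far=3.5):
    """True if the depth (in meters) lies in the 1.2 m to 3.5 m range quoted in the text."""
    return near <= depth <= far
```

Points that fail `in_sweet_spot` could, for example, be discarded before any plane fitting, since their precision is lower.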

Problem Formulation and Definition of Coordinate Systems
As mentioned above, it is very important to know the exact relationship between the 3D vision sensor coordinate system and the robot coordinate system. In other words, the mounting position and orientation of the 3D vision sensor must be identified. For brevity, we use G-CS as short notation for the ground coordinate system, R-CS for the robot coordinate system, and V-CS for the 3D vision sensor coordinate system; the three coordinate systems are shown in Figure 5. The transformation matrix ${}^{G}_{R}T$ in Figure 5 describes the position and orientation of the R-CS with respect to the G-CS. Similarly, the identification matrix ${}^{R}_{V}T$ describes the position and orientation of the V-CS with respect to the R-CS, which can be expressed by fixed angles about the axes of the R-CS. Concretely, with the R-CS held fixed, the V-CS rotates by γ about the $X_R$-axis, then by β about the $Z_R$-axis, then by α about the $Y_R$-axis, and finally translates by $q_x$, $q_y$, $q_z$ along the $X_R$-axis, $Y_R$-axis and $Z_R$-axis, respectively. This yields the current V-CS. Table 1 shows the identification parameters, and our goal is to determine these six parameters. The 3D coordinates of the terrain obtained by the vision sensor can be transformed to the R-CS by the identification matrix ${}^{R}_{V}T$. Table 1. The identification parameters.

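Fixed Axes    Identification Angles    Identification Positions
X_R           γ                        q_x
Y_R           α                        q_y
Z_R           β                        q_z

The identification matrix is given by Equation (1):

$$ {}^{R}_{V}T(\alpha, \beta, \gamma, q_x, q_y, q_z) = \begin{bmatrix} {}^{R}_{V}R & {}^{R}_{V}P \\ 0 & 1 \end{bmatrix} \tag{1} $$

where:

$$ {}^{R}_{V}R = \begin{bmatrix} \cos\alpha\cos\beta & -\cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma & \cos\alpha\sin\beta\sin\gamma + \sin\alpha\cos\gamma \\ \sin\beta & \cos\beta\cos\gamma & -\cos\beta\sin\gamma \\ -\sin\alpha\cos\beta & \sin\alpha\sin\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma \end{bmatrix}, \qquad {}^{R}_{V}P = \begin{bmatrix} q_x & q_y & q_z \end{bmatrix}^{T} $$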
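As an illustration of how the six identification parameters define the transformation from the V-CS to the R-CS, the following minimal sketch composes the three fixed-axis rotations in the order stated in Section 3 (γ about $X_R$, then β about $Z_R$, then α about $Y_R$) and appends the translation. The function names are hypothetical and serve only this example.

```python
import math

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def identification_matrix(alpha, beta, gamma, qx, qy, qz):
    """4x4 matrix: gamma about X_R, then beta about Z_R, then alpha about Y_R, then translation.

    Rotations about fixed axes are composed by pre-multiplication, so R = Ry(alpha) Rz(beta) Rx(gamma).
    """
    r = matmul3(rot_y(alpha), matmul3(rot_z(beta), rot_x(gamma)))
    q = (qx, qy, qz)
    return [r[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def transform_point(t, p):
    """Map a point given in the V-CS to the R-CS using the homogeneous matrix t."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * h[k] for k in range(4)) for i in range(3))
```

With all angles set to zero, `transform_point` simply adds the translation, which is a quick sanity check of the parameter ordering.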

Proposed Identification Methodology
This section presents the proposed identification model and method in detail.
As Figure 6 shows, $P(x, y, z)$ is an arbitrary point on the ground plane; ${}^{V}P$ denotes its coordinates with respect to the V-CS and ${}^{G}P$ its coordinates with respect to the G-CS. In homogeneous coordinates, ${}^{V}P$ and ${}^{G}P$ satisfy Equation (4):

$$ {}^{G}P = {}^{G}_{R}T \, {}^{R}_{V}T \, {}^{V}P \tag{4} $$

where ${}^{R}_{V}T$ is the identification matrix introduced in Section 3, and ${}^{G}_{R}T$ is the transformation matrix from the R-CS to the G-CS. ${}^{V}P$ can be detected by the 3D vision system and satisfies the standard plane Equation (5):

$$ {}^{V}a \, {}^{V}x + {}^{V}b \, {}^{V}y + {}^{V}c \, {}^{V}z + {}^{V}d = 0 \tag{5} $$

The upper left mark V in Equation (5) denotes that the variables are expressed with respect to the V-CS. ${}^{G}P$, which is expressed with respect to the G-CS, satisfies the standard plane Equation (6):

$$ {}^{G}a \, {}^{G}x + {}^{G}b \, {}^{G}y + {}^{G}c \, {}^{G}z + {}^{G}d = 0 \tag{6} $$

The upper left mark G in Equation (6) denotes that the variables are expressed with respect to the G-CS. In our work, the ground satisfies a specific plane equation in the G-CS, Equation (7). The term ${}^{R}_{V}T$ can be computed by solving the constraint Equation (4). ${}^{G}_{R}T$, which represents the relationship between the robot and the ground, can be obtained using the model presented in Section 4.2. In our methodology, ${}^{R}_{V}T$ is not computed by recognizing particular points P. Instead, we estimate the ground plane from the point cloud detected by the 3D vision system. The algorithm presented in Section 4.3 then formulates the identification problem as an optimization problem. This modeling reduces recognition errors and avoids measurement errors. Equation (8) gives the distance from a detected point to the ground plane, as Figure 7 shows.
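The link between the point constraint of Equation (4) and the plane equations can be made concrete with a small sketch. If points transform as ${}^{G}P = T\,{}^{V}P$ and a plane in the G-CS has the coefficient vector $n_G = (a, b, c, d)$, then the same plane in the V-CS has the coefficients $T^{T} n_G$. The example assumes, purely for illustration, that the ground is the plane $y = 0$ of the G-CS; the helper `pose` and the numerical values are hypothetical.

```python
import math

def pose(angle_x, tx, ty, tz):
    """Hypothetical helper: rotation about X followed by a translation, as a 4x4 matrix."""
    c, s = math.cos(angle_x), math.sin(angle_x)
    return [[1.0, 0.0, 0.0, tx], [0.0, c, -s, ty], [0.0, s, c, tz], [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transform_point(t, p):
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * h[k] for k in range(4)) for i in range(3))

def plane_in_source_frame(t_dst_src, plane_dst):
    """Coefficients (a, b, c, d) of a destination-frame plane, expressed in the source frame (T^T n)."""
    return tuple(sum(t_dst_src[k][j] * plane_dst[k] for k in range(4)) for j in range(4))

def plane_value(plane, p):
    """Left-hand side a*x + b*y + c*z + d of the plane equation evaluated at point p."""
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] + d

# Assumed example: robot body 0.5 m above the ground, sensor 0.5 m above the body and tilted down.
t_gr = pose(0.0, 0.0, 0.5, 0.0)
t_rv = pose(math.radians(-30.0), 0.0, 0.5, 0.2)
ground_in_v = plane_in_source_frame(matmul4(t_gr, t_rv), (0.0, 1.0, 0.0, 0.0))
```

For any point given in the V-CS, `plane_value(ground_in_v, p)` equals the height of that point above the assumed ground, so it vanishes exactly for ground points.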

Estimation of the Ground Plane
The quantity ε in Equation (9) is defined to facilitate the computation. The Lagrange multiplier method is used to find the minimum value of ε, with the Lagrange function given by Equation (10). Equation (12) can be obtained from the resulting Equation (11), and substituting Equation (12) into Equation (8) yields Equation (13). The further conditions of Equation (14) can be rewritten as the matrix Equation (15). From Equation (15), the plane coefficients a, b and c can be determined, and d then follows from Equation (12). Because detection errors and influences of the outer environment exist in the identification process, some abnormal points have large errors, and some other points do not belong to the ground plane. These two kinds of points are called bad points, and a statistical method is used to exclude them: a point is removed when its distance to the ground plane is larger than the standard value. Figure 8 describes the estimation process of the ground plane.
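The estimation process can be sketched as follows, under stated assumptions. The sketch minimizes the sum of squared point-to-plane distances subject to $a^2 + b^2 + c^2 = 1$, which is the usual result of a Lagrange multiplier treatment: the normal is the eigenvector of the scatter matrix with the smallest eigenvalue, and $d = -(a\bar{x} + b\bar{y} + c\bar{z})$. Whether this matches Equations (9) to (15) term by term cannot be confirmed from the text. The rejection threshold (a multiple `k_sigma` of the RMS distance, with a small floor) is likewise an assumption standing in for the "standard value".

```python
import math

def smallest_eigenvector(m, sweeps=20):
    """Cyclic Jacobi iteration for a symmetric 3x3 matrix; eigenvector of the smallest eigenvalue."""
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                num, den = 2.0 * a[p][q], a[q][q] - a[p][p]
                if den < 0.0:  # keep the rotation angle within [-pi/4, pi/4]
                    num, den = -num, -den
                theta = 0.5 * math.atan2(num, den)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(3):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # V <- V J (columns of V are the eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    i = min(range(3), key=lambda j: a[j][j])
    return (v[0][i], v[1][i], v[2][i])

def fit_plane(points):
    """Least-squares plane (a, b, c, d) with unit normal through a list of (x, y, z) points."""
    n = float(len(points))
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    scatter = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points)
                for j in range(3)] for i in range(3)]
    a, b, c = smallest_eigenvector(scatter)
    return (a, b, c, -(a * mean[0] + b * mean[1] + c * mean[2]))

def point_plane_distance(p, plane):
    a, b, c, d = plane
    return abs(a * p[0] + b * p[1] + c * p[2] + d)  # valid because (a, b, c) has unit length

def estimate_ground(points, k_sigma=2.0, floor=1e-9, max_rounds=10):
    """Fit, drop points farther than the (assumed) threshold, and refit until nothing is dropped."""
    pts = list(points)
    for _ in range(max_rounds):
        plane = fit_plane(pts)
        dist = [point_plane_distance(p, plane) for p in pts]
        rms = math.sqrt(sum(x * x for x in dist) / len(dist))
        limit = max(k_sigma * rms, floor)
        kept = [p for p, x in zip(pts, dist) if x <= limit]
        if len(kept) == len(pts) or len(kept) < 3:
            break
        pts = kept
    return fit_plane(pts), pts
```

The sign of the returned normal is arbitrary; a caller that compares planes would need to fix it, for instance by requiring $d > 0$.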

Relationship Model between the Legged Robot and the Ground
In this section, the relationship model between the legged robot and the ground is established to accurately compute the robot's position and orientation (denoted by ${}^{G}_{R}T$) with respect to the G-CS. The detailed expression of ${}^{G}_{R}T$ is given in Equation (17); it is constructed in the same way as ${}^{R}_{V}T$, where $\alpha'$, $\beta'$ and $\gamma'$ are the angles through which the robot rotates about the corresponding fixed axes.
Figure 9. Initial state of the robot.
The initial position and orientation of the robot are shown in Figure 9. The robot can reach the set pose if the actuated joints are driven to the calculated positions. The important point here is that the real pose may deviate from the set pose because of manufacturing and installation errors. Since reducing errors throughout the identification process is essential for the identification precision, the real pose is calculated by the following derivations.
Similarly, the positions $P_1, P_2, \ldots, P_6$ of the six feet are known. Feet 1, 3 and 5 are chosen to calculate the real pose. As Figure 9 shows, the relations of Equation (21) hold, where the upper left mark G indicates that all the geometric relations are built with respect to the G-CS. Equation (22) can be obtained from Equation (21). By solving the above equations, the real translation vector ${}^{G}_{R}P$ is calculated. Additionally, there exists Equation (25), which involves feet 1, 3 and 5. From Equation (25), the real orientation matrix ${}^{G}_{R}R$ is computed too. Thus, the real transformation matrix ${}^{G}_{R}T$ can be calculated from Equations (24) and (25).
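One generic way to recover a pose from three foot positions known in both coordinate systems is sketched below. It attaches an orthonormal frame to the three points in each coordinate system and combines the two frames; this is a common construction and is not claimed to be the derivation of Equations (21) to (25). The function names and the sample foot coordinates used to exercise it are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2])
    return (a[0] / n, a[1] / n, a[2] / n)

def frame_from_points(p1, p3, p5):
    """Orthonormal frame (three axis vectors) attached to three non-collinear points."""
    e1 = unit(sub(p3, p1))
    e3 = unit(cross(sub(p3, p1), sub(p5, p1)))
    e2 = cross(e3, e1)
    return (e1, e2, e3)

def pose_from_three_feet(feet_r, feet_g):
    """Return (R, p) such that feet_g[i] = R * feet_r[i] + p for feet 1, 3 and 5.

    feet_r holds the foot positions in the R-CS, feet_g the same feet in the G-CS.
    """
    fr = frame_from_points(*feet_r)
    fg = frame_from_points(*feet_g)
    # R = F_G * F_R^T, where the axis vectors are the columns of F_G and F_R.
    rot = [[sum(fg[k][i] * fr[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    p = tuple(feet_g[0][i] - sum(rot[i][j] * feet_r[0][j] for j in range(3)) for i in range(3))
    return rot, p
```

With noisy foot positions the three points no longer define exactly congruent triangles, and a least-squares alignment over all available feet would be a more robust alternative.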

Formulation of the Identification Function
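Writing the elements of ${}^{G}_{R}T$ as $a_{ij}$ and the elements of ${}^{R}_{V}T$ as $t_{ij}$, the following equations are obtained from Equation (4), where $({}^{G}x_1, {}^{G}x_2, {}^{G}x_3) = ({}^{G}x, {}^{G}y, {}^{G}z)$:

$$ {}^{G}x_i = (a_{i1}t_{11} + a_{i2}t_{21} + a_{i3}t_{31})\,{}^{V}x + (a_{i1}t_{12} + a_{i2}t_{22} + a_{i3}t_{32})\,{}^{V}y + (a_{i1}t_{13} + a_{i2}t_{23} + a_{i3}t_{33})\,{}^{V}z + (a_{i1}t_{14} + a_{i2}t_{24} + a_{i3}t_{34} + a_{i4}), \quad i = 1, 2, 3 $$

Figure 10. Formulation of the identification function.

As Figure 10 shows, Equation (27) describes the theoretical ground with respect to the V-CS. Theoretically, the measured ground coincides with the theoretical ground. Because of Equation (29), Equation (27) is a standard plane equation, so it must be identical to Equation (5) derived in Section 4.1. Comparing the coefficients then gives the following four equations:

$$ \begin{aligned} a_{21}t_{11} + a_{22}t_{21} + a_{23}t_{31} &= {}^{V}a \\ a_{21}t_{12} + a_{22}t_{22} + a_{23}t_{32} &= {}^{V}b \\ a_{21}t_{13} + a_{22}t_{23} + a_{23}t_{33} &= {}^{V}c \\ a_{21}t_{14} + a_{22}t_{24} + a_{23}t_{34} + a_{24} &= {}^{V}d \end{aligned} $$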
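The coefficient matching between the measured and the theoretical ground can be sketched as a residual function over several robot poses. The sketch assumes that the ground is the plane $y = 0$ of the G-CS and that robot poses are parameterized in the same way as the identification matrix; both are assumptions for illustration, and all names are hypothetical. Each pose contributes four residuals (the differences in a, b, c and d), which vanish at the true parameters.

```python
import math

def rot_fixed(alpha, beta, gamma):
    """R = Ry(alpha) Rz(beta) Rx(gamma): gamma about X, then beta about Z, then alpha about Y."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, -ca * sb * cg + sa * sg, ca * sb * sg + sa * cg],
        [sb, cb * cg, -cb * sg],
        [-sa * cb, sa * sb * cg + ca * sg, -sa * sb * sg + ca * cg],
    ]

def homogeneous(params):
    """4x4 matrix from (alpha, beta, gamma, x, y, z)."""
    alpha, beta, gamma, x, y, z = params
    r = rot_fixed(alpha, beta, gamma)
    return [r[0] + [x], r[1] + [y], r[2] + [z], [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def theoretical_plane(t_gr, t_rv, ground_g=(0.0, 1.0, 0.0, 0.0)):
    """Ground plane coefficients in the V-CS; ground_g is the assumed ground plane in the G-CS."""
    t = matmul4(t_gr, t_rv)
    return [sum(t[k][j] * ground_g[k] for k in range(4)) for j in range(4)]

def residuals(params, robot_poses, measured_planes):
    """Stacked differences between theoretical and measured plane coefficients for all poses."""
    t_rv = homogeneous(params)
    out = []
    for robot_pose, measured in zip(robot_poses, measured_planes):
        predicted = theoretical_plane(homogeneous(robot_pose), t_rv)
        out.extend(pr - me for pr, me in zip(predicted, measured))
    return out
```

A nonlinear least-squares routine, for example a Levenberg-Marquardt solver such as `scipy.optimize.least_squares` with `method='lm'`, could then minimize the squared norm of `residuals` over the six parameters. The measured plane normals would first need a consistent sign convention, since a fitted plane is only defined up to the sign of (a, b, c, d).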
Generally, the legged robot has six DOFs, which can be used to simplify the identification process and increase the identification precision. At the beginning, the robot is located in an initial state, the

Experimental Results and Discussion
In order to verify the proposed identification methodology, a series of experiments were carried out on the robot. The experimental results and related discussions are presented in this section.
Figure 11. The experimental environment.
Figure 11 shows the experimental environment, in which a small section of flat ground lies in front of the robot. The 3D vision sensor is mounted at the top of the robot and connected to the upper computer via USB. The 3D vision sensor is tilted down to guarantee that it can detect the ground. The upper computer controls the robot to reach 52 different groups of poses and also controls the 3D vision sensor to detect the ground. When the robot reaches a set pose, the 3D vision sensor captures a depth image of the ground. Table A1 in the Appendix shows the 52 different groups of pose parameters used in the experiments. Given the length of the paper, only six groups of the 52 experiments' data are listed, but all the experimental data are discussed in detail. Figure 12 shows the six different groups of robot poses. Figure 13 shows the point clouds (blue points) of the ground corresponding to these six groups of poses. The red point in Figure 13 denotes the origin of the 3D vision sensor. Cloud points with larger errors are removed using the approach proposed in Section 4.1, so the blue points far away from the 3D vision sensor are sparse. Correspondingly, Table 2 shows the six measured ground equations, which are computed using the approach in Section 4.1.

Error Analysis
Substituting ${}^{R}_{V}T$ into Equation (4), the theoretical ground equations with respect to the V-CS can be obtained. Table 3 shows six detailed expressions of the theoretical ground. The maximum value of the distance error is about 3.236 mm. The robot's minimum step height is 50 mm when it is walking, and its foot can rotate from −35° to 35° with respect to its leg. Thus the robot can easily tolerate the maximum angle error of 0.5219° and the maximum distance error of 3.236 mm. These results show that the identification precision fulfills the requirement of the robot, which validates our theory.
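A comparison between a measured and a theoretical ground plane can be sketched as follows. The definitions used here are assumptions, since the exact error definitions are not recoverable from the text: the angle error is taken as the angle between the two plane normals, and the distance error as the difference between the distances from the sensor origin to the two planes. The function name is hypothetical.

```python
import math

def plane_errors(measured, theoretical):
    """Return (angle error in degrees, distance error) between two planes (a, b, c, d).

    Assumed definitions: angle between the normals, and difference of the
    origin-to-plane distances |d| / ||(a, b, c)||. Units follow the input planes.
    """
    am, bm, cm, dm = measured
    at, bt, ct, dt = theoretical
    nm = math.sqrt(am * am + bm * bm + cm * cm)
    nt = math.sqrt(at * at + bt * bt + ct * ct)
    # abs() makes the result independent of the sign convention of the normals.
    cosine = abs(am * at + bm * bt + cm * ct) / (nm * nt)
    angle_deg = math.degrees(math.acos(min(1.0, cosine)))
    distance = abs(abs(dm) / nm - abs(dt) / nt)
    return angle_deg, distance
```

Evaluating such a function for every pose and taking the maximum over all poses would give figures of the kind quoted above.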

Robustness Tests
In this section, the robustness of the methodology is tested by carrying out identification experiments under two typical situations: different illumination conditions and different ground conditions. For the robustness tests under different illumination conditions, the experiments are carried out at different times in an urban environment. As Figure 19 shows, three illumination conditions are used, the first of which is normal illumination. The experiment is executed 20 times under each illumination condition. We provide the mean and standard deviation of the identification results in Table 4, along with box plots in Figure 20 to illustrate their spread. The mean values of α, β and γ under the three illumination conditions are nearly the same, and the mean values of the positions differ by less than 4 mm. The standard deviations obtained in the bright set are the largest: the standard deviation of α is less than 0.052° and the standard deviation of $p_x$ is less than 0.8 mm. Nevertheless, the standard deviations in the bright set are relatively small compared to the results of Hoepflinger [29]. For the robustness tests under different ground conditions, the experiments are performed on three different terrains. As Figure 21 shows, the first experiment is carried out on flat ground as a reference, the second on slightly complex ground, and the third on considerably complex ground. The experiment is executed 20 times on each terrain. The mean and standard deviation of the identification results are provided in Table 5, and the spread of the results is illustrated with box plots in Figure 22. As Figure 22 shows, the identification precision on the flat ground is the highest. The results obtained on the considerably complex ground differ by about 5 mm from those obtained on the other two terrains.
It can be observed that the standard deviations are sufficiently small: the maximum standard deviation of the angles is less than 0.05° and the maximum standard deviation of the positions is less than 0.72 mm, both obtained on the considerably complex ground. To conclude this section, the box plots above show how the illumination conditions and the complexity of the ground affect the identification precision. The identification results obtained under different illumination conditions and different ground conditions do not differ greatly and the standard deviations are quite small, which shows that our method is robust and stable and can be applied in some complex environments.

Use Case
A use case that underlines the importance and applicability of the methodology in the field of legged robots is presented next. In practice, a legged robot is often used in an unknown environment to execute daunting tasks. With the help of a vision sensor, the robot gains good knowledge of the environment. Moreover, once the extrinsic parameters relating the vision sensor and the legged robot have been computed, an accurate relationship between the robot and the terrain can be obtained. Automatic locomotion can then be implemented to execute tasks.
As Figure 23 shows, the robot is in an unknown environment with obstacles. Based on the proposed methodology, the extrinsic parameters relating the sensor and the robot can be computed, and the terrain map with respect to the robot can be built. Moreover, the accurate position and orientation of the obstacles are obtained from the terrain map. An automatic locomotion planning algorithm that incorporates the terrain information is executed to plan the foot and body trajectories. Figure 24 shows the whole process of passing over the obstacles, during which the robot body is regulated to move forward horizontally. Throughout the process, the feet are placed at the planned footholds, so the body remains stable while the robot walks on the obstacles. The results show a successful application of the methodology in the intelligent robotic field.

Conclusions
In this paper, we have presented a novel coordinate identification methodology for a 3D vision system mounted on a legged robot. The method addresses the problem of extrinsic calibration between a 3D vision sensor and a legged robot, which few studies have worked on. The proposed method provides several advantages. Instead of using any kind of external tools (calibration targets and measurement equipment), our method only needs a small section of relatively flat ground, which can reduce recognition errors and avoid measurement errors. Moreover, the method needs no human intervention, and it is practical and easy to implement.
The theoretical contributions of this paper can be summarized as follows. An approach for estimating the ground plane is introduced based on optimization and statistical methods, and the relationship model between the robot and the ground is established. The identification parameters are obtained from the identification function using the LM algorithm. Finally, a series of experiments are performed on a hexapod robot, and the identification parameters are computed using the proposed method. The calculated errors satisfy the requirements of the robot, which validates our theory. In addition, experiments in various environments show that our methodology has good stability and robustness. A use case, in which the legged robot passes through rough terrain after accurately obtaining the identification parameters, is also given to verify the practicability of the method. This work supplements the relevant study of legged robots, and the method can be applied in a wide range of similar applications.