Vision-Based Cooperative Pose Estimation for Localization in Multi-Robot Systems Equipped with RGB-D Cameras

: We present a new vision based cooperative pose estimation scheme for systems of mobile robots equipped with RGB-D cameras. We ﬁrst model a multi-robot system as an edge-weighted graph. Then, based on this model, and by using the real-time color and depth data, the robots with shared ﬁeld-of-views estimate their relative poses in pairwise. The system does not need the existence of a single common view shared by all robots, and it works in 3D scenes without any speciﬁc calibration pattern or landmark. The proposed scheme distributes working loads evenly in the system, hence it is scalable and the computing power of the participating robots is efﬁciently used. The performance and robustness were analyzed both on synthetic and experimental data in different environments over a range of system conﬁgurations with varying number of robots and poses.


Introduction
Multi-Robot Systems (MRSs), which were first proposed in early 1980s, are becoming increasingly popular mobile sensor platforms to measure and estimate quantities of interest at spatially distributed locations.Compared to a single-robot operation, MRSs have the advantages on faster task completion, more extensive coverage, increased reliability to sensor failures, and higher estimation accuracy through sensor fusion.MRSs have been widely used in a variety of tasks such as data collection [1], surveillance [2][3][4], target tracking [5][6][7], formation tracking and control [8][9][10], and visual SLAM [11][12][13].
Precise knowledge on locations and orientations (poses) of the robots is a prerequisite for the successful accomplishment of the collaborative tasks.One approach to localize the robots is to equip them with Global Positioning System (GPS) receivers and use the GPS system.However, GPS signals are not available indoors and they cannot directly provide the orientation information.An alternative approach is cooperative localization in which robots work together and use the robot-to-robot measurements to construct a map of their network.Cooperative localization operates in two broad stages: The initialization process, which involves relative pose estimation (RPE), provides initial location and orientation estimates of the robots.Then the refinement process updates the initial estimates iteratively to enhance the accuracy.
In this study, we focus on the initialization process of the cooperative localization in MRSs.The MRSs of interest here are equipped with cameras or visual sensors.In this kind of MRSs, robots' locations and orientations can be estimated through careful calibration of the rigid body transform operations between the cameras mounted on the robots.Generally there are two kinds of vision-based approaches to achieve this goal: (1) manual calibration; or (2) self-calibration.Manual calibration approaches require a special calibration pattern to be visible in all images [14] or the precise pose information of calibration patterns/objects have to be known [15].Some easily detectable single features which require human interaction, such as moving a LED in a dark room, can also be used to manually calibrate multiple cameras [16][17][18].Manual calibration methods, even though provide good results, require special equipment or time consuming manual measurements.Self-calibration algorithms simultaneously process several images captured by different cameras and find the correspondences across images.Correspondences are established through extracting 2D features from images automatically and matching them between different images.Then, based on the established correspondences, cameras' relative poses can be estimated from the essential matrix.Besides the algorithm which uses static features [19][20][21][22], Aslan et al. [23] detect people walking in the room and use the point on the top of each person's head as the calibration feature.The accuracy of the self-calibration highly depends on the reliability of the relative pose estimates.This problem was first discussed in [24] with the concept of the vision graph.Kurillo et al. [25], Cheng et al. [26], and Vergés-Llahí et al. [27] later used and refined it for this purpose as well.It is becoming a useful general tool for describing the directionality of networked visual sensors.It has been more recently addressed by Bajramovic et al. [28][29][30].They proposed a graph based calibration method which measures the uncertainty of the relative pose estimation between each camera pair.All of the self-calibration algorithms measure the epipolar structure of the system and suffer from scale ambiguity.If there is not any object or pattern with known geometry in the scene, the orientations and locations between robots are determined up to a scale.
In this paper, we consider the initialization process for localization in a multi-robot system with N ≥ 3 robots operating in GPS-denied indoor environments (see Figure 1).A RGB-D camera, which provides both color images and per-pixel depth information, is mounted at the top of each robot.A central node with high performance processor is also implemented in the system which can operate computationally expensive computer vision algorithms.We present a novel self-calibration algorithm to determine the locations and orientations of the robots in this RGB-D camera equipped multi-robot system.The propose scheme can be arranged over an indoor scenarios without imposed constraints for all robots to share a common field-of-view (FoV).Our approaches assume that at least any two given robots have overlapping FoVs and that the cameras on robots have been internally calibrated prior to deployment.Our proposed algorithms consist of the following steps: (1) each robot extracts color features locally and sends the descriptors of these features to the central node; (2) the central node performs feature matching to determine neighboring robots and generates an Initial Pose Matrix (IPM); (3) the central node constructs a robot dependency graph and selects a number of relative poses to connect robots as a calibration tree; (4) after the central node broadcasts the information of the calibration tree, robots work collaboratively to determine the relative poses according to the calibration tree; (5) the determined relative poses are then transmitted to the central node to compute the poses of all the robots in the system.We formulate the selection of relative poses as a shortest path problem, which consists of finding shortest path from a vertex to the other vertices in an edge-weighted graph.The graph represents FoVs of robots as vertices and overlapping FoVs as edges respectively.The main contributions of this paper are: • Construction of a robot dependency graph based on the overlapping ratio between neighboring robots.• Development of a procedure to determine the relative pose of multiple RGB-D camera equipped robots.• By contrast to the conventional approaches that only utilize color information, our approach takes the advantages of the combination of RGB and depth information.• The locations and orientations of robots are determined up to the real world scale directly without involving scale ambiguity problem.
• Extensive experiments using synthetic and real world data were conducted to evaluate the performance of our algorithms in various environments.
The rest of the paper is organized as follows.In Section II, the characteristics of the RGB-D camera and the multi-robot system used in this paper are introduced.In Section III, we formulate the robot localization problem and propose our solutions.In Section IV, we present the experiments and results, and our concluding remarks can be found in Section V.

eyeBug: A Robot Equipped with RGB-D Camera
At the Wireless Sensor and Robot Networks Laboratory (WSRNLab) [31] we have created a multi-robot system consisting of experimental mobile robots called eyeBugs (Figure 2) for computer vision, formation control, visual sensing research activities.A single-board computer, BeagleBoard-xM [32] is the main computational resource of an eyeBug.Each BeagleBoard-xM has an ARM37x 1 GHz processor, a USB hub, and an HDMI video output.Communication service between robots is provided through WiFi links.We run an ARM processor optimized GNU/Linux operating system (Ubuntu Version 11.10) [33].OpenKinect [34], OpenCV [35] and libCVD [36] libraries were installed to capture and process image information.Microsoft Kinect, which produces color and disparity-based depth images, is mounted vertically at the center of the top plate of each robot.The default RGB video stream provided by Kinect uses 8 bits for each color at VGA resolution (640 × 480 pixels, 24 bits/pixel).The depth video stream is also in VGA resolution.

Characteristics of RGB-D Camera
Kinect has an infrared (IR) projector-camera-pair and a RGB camera.The depth sensing of Kinect is based on a fixed structured light source positioned at a known baseline from the IR camera.The depth information is measured through a triangulation process which is based on the detection of transverse shifts of local dot patterns in the IR speckle with respect to its reference patterns at a known distance to the device [37].This process runs repeatedly on all regions in the IR speckle and generates a disparity-based depth image.It should be noted that the depth signal inevitably degrades when multiple RGB-D cameras are pointing at the same scene.It is because the camera projects a structured light dot pattern onto the scene continuously without modulation and devices interfere with one another [38].This interference can be eliminated by a "Shake 'n' Sense" approach [38].
The normalized disparity values returned by the Kinect are inversely proportional to their depth [39].Furthermore, [40,41] show that there is a non-linear relationship between the normalized disparity values and their depth value in Euclidean space.Therefore, it is more suitable to represent the data in inverse depth coordinates.Consider the vector p e = [x y z 1] T which represents a real world point in Euclidean space by using homogeneous coordinates.The relation between a real world point in inverse depth coordinates and its corresponding pixel in the depth image can be established as follows, where (i, j) denotes the pixel coordinates of this real world point projection in the depth image, and z is the corresponding depth value returned by the camera.

Overview
Given N (N ≥ 3) robots equipped with intrinsically calibrated RGB-D cameras, the goal is to automatically determine the initial pose of each robot in a common coordinate system using only the color and depth data.A central node with a high performance processor is also included in the system which runs the computationally expensive algorithms.The function of this node is explained in Section 3.3.
When two robots a and b have sufficiently overlapping FoVs, the relative pose between two robots can be represented by a transformation matrix, M ab , in SE(3) as follow, where R is a 3 × 3 rotation matrix and t is a 3 × 1 translation vector.M ab denotes relative pose of robot b with respect to robot a and is the rigid transformation from the coordinate system of robot b to that of robot a.If there is a robot c and the relative pose between robots c and b is M bc , then the relative pose between robots a and c can be derived via composition as, This transformation provides a mapping from the coordinate system of c to that of b, then from that of b to that of a. Robot b is the intermediate node in this process.This operation is transitive, therefore one robot's pose relative to another can be determined indirectly over an arbitrary number of intermediate poses if they exist.Thus, the system's topology can be built up from the pairwise relative poses between robots that have common FoVs.In order to achieve this, we first need to determine the robots with sufficiently overlapping FoVs.Secondly, robots are grouped in pairs to determine rough estimations of the relative poses, and a number of relative poses are selected based on the reliability of the pose information.In the final step, we calibrate the overall system based on the selected pairwise relative poses.A general description of the scheme we propose is shown in Figure 3.Each step is described in details in the following sections.The list of symbols used through the paper is given in Table 1.An element of a 6D motion vector.G j 6D motion generator matrices.

Assumptions
We make the following assumptions about the multi-robot system: • Intrinsic parameters of the RGB-D camera on each robot are calibrated prior to deployment, • At least two robots in the system have overlapping FoVs; • The scene is static and the robots do not move during the localization process, and • The robots can form an ad-hoc network and directly communicate with each other.

Neighbor Detection and Initial Relative Pose Estimation
We define robots with overlapping FoVs as neighbors.One robot's neighbors can be detected through searching for image pairs sharing a common FoVs.This search can be viewed as a matching of point correspondences that considers the local environment of each feature point set.There are three steps in neighbor detection process: feature detection, feature description, and feature matching.The first two steps are performed on each robot locally.Taking processing speed and accuracy into consideration, we implement FAST [42,43] for feature detection and ORG [44] for feature description on each robot.Instead of transmitting the complete images, each robot sends the feature descriptions to the central node to minimize the transmission load.The corresponding depth information of each feature is also transmitted in conjunction with the feature descriptors.
Associating the feature descriptors with their corresponding depth values, central node can generate feature points in 3D.Central node performs feature matching between every two sets of the feature descriptors.In order to increase the matching reliability and robustness against outliers, we adopt both the symmetrical matching scheme and the geometric constraint to refine the matched points.In the symmetrical matching scheme, the correspondences between two sets of feature descriptors are established bidirectionally.One group of correspondences is generated from matching the first feature set to the second feature set.The other group is produced from matching the second feature set to the first feature set.For a pair of matched features to be accepted, two features must be the best matching candidate of each other in both directions.
Then, we use RANSAC to find a coarse registration, M * ij , between every two matched feature sets.The error metric used to find the best alignment is Here, p l i and p l j contain the depth information of two matched feature points as described in Equation (1).Each term in the summation indicates the squared distance between the transformed pose of a feature point p l i in robot i's feature set and the matched feature point p l j in the robot j's feature set.Between every two matched feature sets, the central node samples a number of matched feature point pairs and determine the transformation matrix repeated.The determined transformation in each iteration is evaluated based on the number of inliers in the remaining 3D feature points.Ultimately, only the matched feature points which agree with the optimal transformation matrix are kept as the good matches.The determined coarse registration between every two matched feature sets are stored as the initial relative poses.Initial relative poses are not accurate which require further refinements.
After operating the above process on every two feature sets, an Initial Pose Matrix (IPM) can be constructed.As shown in Table 2, each element, M * ij , represents the initial relative pose between robot i and robot j.The diagonal elements represent the relative pose with itself, thus they are negligible.

Selection of Relative Pose
After determining the neighboring robots and initial relative poses, we will show the problem on estimating all robots' poses can be transformed to the all-pair shortest path problem.
The relative pose between two neighboring robots can be estimated by RPE algorithm.In order to calibrate the whole system, we require to select a number of relative poses to link all robots together.Since different overlapping areas in FoVs lead to various uncertainty values in the RPE.This process should choose the relative poses with the minimum overall uncertainty between two robots.Furthermore, it is known that the accuracy of the estimation of the relative pose between two robots may significantly degrade when an increasing number of intermediate nodes are added into the computations.This is mainly due to uncertainty accumulated in each time RPE algorithm operates between two robots.In order to ensure each robot has the reliable knowledge of the other robots' locations and orientations, we require to select the relative poses which introduce the smallest overall amount of uncertainty value to calibration the system.

Robot Dependency Graph Construction
To efficiently consider all possible combinations of robot poses, we suggest the usage of the graph structure robot dependency graph.Robot dependency graph consists of a set of vertices representing each view of the scene observed by a robot.The weight on each edge indicates to the degree of uncertainty of the pair of views being connected.Thus, estimating all robots' poses can be transformed to finding the shortest path between every two vertices in the robot dependency graph.
In order to determine the weight on each edge, we need to first derive the uncertainty degree of relative pose estimation between every two neighbors.The relative pose between two neighboring robots can be estimated through aligning the 3D point clouds extracted from the depth images captured by different robots.The motion between two depth images can be estimated by various approaches, such as ICP variants [45][46][47], feature-based registrations [48,49], and combinational methods [50,51].We choose to use our previous proposed algorithm [47] among the existing approaches, since it reports more accurate and robust results in environments with various amount of occlusions than the state-of-the-art works.The performance of the RPE algorithms depends on the overlapping area between two FoVs.In the same circumstance, a larger overlapping area leads to a more accurate estimate.The overlapping area between two neighbors can be estimated by initial relative pose determined in Section 3.3.
Let M ab denote the relative pose between robots a and b.The pixels of the depth image captured by robot a can establish a relation with their projections on the depth image captured by robot b as Applying Equation ( 6) on the depth image observed by robot a, we can generate a synthetic view which is virtually taken at robot b's viewpoint.The overlapping area between two robots' FoVs can be determined through comparing the real and synthetic depth images.We define the overlapping ratio between two neighbors as the proportion of overlapping area in the observed image.However, in this approach the central node requires the knowledge of complete depth image of robot a.If all the robots have to transmit their observed depth images to the central node, the considerable transmission load will be generated.In order to efficiently estimate the overlapping ratio, robot a can only send the values and coordinates of four pixels in its observed depth image.These four pixels are the nearest pixels with valid depth values to the four corners (top left, top right, bottom left, and bottom right) of a image.After applying Equation ( 6) on these four pixels, the quadrangle constructed by the reprojections of these four pixels indicates the region observed by robot a in robot b's view.Though the points in the scene lay on different planes and have various range values, this approach can still provide a rough estimate of the overlapping ratio.An example is shown in Figure 4. We conducted numerical simulations on the RPE algorithm [47] to explore the relation between the overlapping ratio and the uncertainty in estimation.The equation is adopted to quantize the overlapping ratio and uncertainty degree.Here, φ ij represents the overlap ratio between two robots i and j, and w ij indicates the uncertainty degree in RPE between two neighboring robots.A larger w ij indicates a larger uncertainty value in the RPE.Based on IPM and criteria in Equation ( 7), the Uncertainty Matrix (UM) can be generated.
According to the UM, we can generate the robot dependency graph, G = (V, A).There is an edge between any two neighboring robots iff the overlapping ratio is within the range in Equation (7).The weight of the edge linking robots i and j is w ij , which indicates the uncertainty degree.A lower w ij indicate a smaller uncertainty value in relative pose estimation.Then, the problem of relative pose selection is transformed to the all-pairs shortest path problem in the robot dependency graph, which minimizes the uncertainty in the RPE between every two robots.
The shortest path between every two vertices can be determined using Floyd-Warshall algorithm.The central node first generates Dist as a |V | × |V | array of minimum distance and initializes Dist according to UM.Then, Floyd-Warshall algorithm is used to determine the shortest paths between every pair of robots and update Dist.Then, the central node needs to select one robot as the primary robot and make all the other robots calibrate their poses according to the primary robot's coordinate system.In order to minimize the uncertainty, the central node selects the robot which has the smallest overall weight on the shortest paths to all the other robots as the primary robot.At last, the robots can be connected as a calibration tree with the primary robot as the root.In this method, the RPE algorithm only requires to operate |V | − 1 times to connect all the robots.The time complexity of the overall scheme is O(V ).Thus, this scheme is scalable to initialize multi-robot systems with a large number of robots.

Distributed Relative Pose Estimation Algorithm
Though the initial relative poses on the edges of the calibration tree have already be obtained in neighbor detection process, these estimations are too inaccurate to calibrate the overall system.Therefore, after the calibration tree is built, the central node will broadcast the information of the calibration tree and the related initial relative poses to all the robots.Then our earlier work [47] implemented on every robot will operate to refine the initial relative poses.In our earlier work [47], an Iterative Closest Points (ICP) variant was proposed to estimate the relative pose between two RGB-D camera equipped robots.Different from the methods for conventional RGB camera which use feature correspondences to determine the rotation and translation up to a scale, this distributed, peer-to-peer algorithm determines the relative pose in consistent real world scale through explicit registration of surface geometries extracted from two depth images.The registration problem is approached by iteratively minimizing a cost function in which error metrics are defined based on the bidirectional point-to-plane geometrical relationship.
Z a and Z b are two depth images captured by robots a and b.The correspondences between N = N a + N b pairs of points extracted from Z a and Z b are established.We can then estimate the transformation matrix M ab by minimizing the bidirectional point-to-plane error metric, C, expressed in normal least squares form as follows, where p l a and p k b are the sampled points in depth frames Z a and Z b , p l * b and p k * a are their corresponding points on the other depth image respectively.The variables w l,a and w k,b are weight parameters for correspondences established in opposite directions between pairs.The variables w l,a and w k,b are weight parameters for correspondences established in opposite directions between pairs.Also, n l * ,b and n k,b are the surface normals at the corresponding points p l * b and p k b in real world coordinates, and The cost function presented in Equation ( 8) consists of two parts: 1. the sum of squared distances in the forward direction from depth images Z a to Z b , and 2. the sum of square distances in the backward direction from Z b to Z a .
Algorithm 1 Relative pose refinement procedure Randomly sample N a points from Z a to form set P a , Randomly sample N b points from Z b to form set P b , Find the corresponding point set, P * b , of P a in Z b , The correspondences are established using the project and walk method with a neighborhood size of 3x3 based on the nearest neighbor criteria 8: Apply the weight function bidirectionally, P a → P * b , P b → P * We can then estimate M ab by re-weighting the least squares operation in an ICP framework.Details of the criteria for selecting the weight function can be found in [47,52].An overview of the entire process is presented in Algorithm 1.In the first iteration, M ab is initialized by the initial relative pose determined in neighbor detection process.Afterwards, in this coarse-to-fine algorithm, each iteration generates an update E to the robot's pose which modifies the transformation matrix M ab .E takes the same form as M ab which may be parameterized by a 6-dimensional motion vector having the elements α 1 , α 2 , . . ., α 6 via the exponential map and their corresponding group generator matrices G 1 , G 1 , . . ., G 6 as where Here G 1 , G 2 and G 3 are the generators of translations in x, y and z directions, while G 4 , G 5 and G 6 are rotations about x, y and z axes respectively.
Afterwards, the task becomes finding the α 1 , . . ., α 6 that describe the relative pose.Through determining the partial derivatives of u b , v b and q b with respect to the unknown elements of the motion vector α 1 , . . ., α 6 , the Jacobian matrix for each established corresponding point pair can be obtained from The six-dimensional motion vector, which minimizes Equation ( 8), is then determined iteratively by the least squares solution in which W is a diagonal matrix weighting the bidirectional point-to-plane correspondences.B, Y, and K are matrices Here, T is the surface normal expressed in a slightly different form than the one shown in Equations ( 9) and (10).To detect the convergence of our algorithm, we use the thresholds for the ICP framework presented in [46].Once the algorithm converges, the registration is considered as completed and the M ab is refined based on the initial relative pose.
In reality each robot only has its own captured depth image.In order to efficiently accomplish the centralized working principle of the algorithm described above, we distribute the tasks to two robots.Instead of transmitting a complete depth image from one robot to another, each robot only transmits a number of sampled points on its captured depth image to the other robot.The distributed process is illustrated in Figure 5.At each iteration, after robot b receives the sampled point set, P a , from robot a, robot b will find the corresponding point set, P * b , on its captured depth frame Z b .The first component in Equation ( 8) will be derived.The information representing the first component will be sent along with the sampled point set, P b , from robot b to robot a.At robot a, P b 's corresponding point set, P * a , will be determined.And the second component in Equation ( 8) will be derived.Thereby, robot a will acquire the information of both first and second component in Equation (8), and the motion parameters can be determined.These procedures will be performed in each iteration.The transformation matrix describing the relative pose between two robots will be obtained by robot a until the algorithm converges.More details and performance evaluations of this approach can be found in [47].
The relative poses that construct the calibration tree are transmitted to the central node after being determined by robots.Then all robots' locations and orientations can be calibrated according to primary robot's coordinate system.A simple example of the working process is shown below.
Figure 6 depicts a calibration tree for initializing a multi-robot system with 4 robots.Robots operate RPE algorithm to derive the relative poses M ab , M ac , and M cd according to the tree.By using these three pose matrices, the relative pose between every two robots in the network can be derived.For instance, robot d's location and orientation in robot b's coordinate system can be derived as . Example of a calibration tree.

Experimental Results and Discussion
In order to quantitatively evaluate the performance of the proposed method, we conducted a series of experiments both in simulation and with our custom built multi-robot system.Section 4.1 describes the localization experiments that were carried out with our experimental multi-robot system in indoor environments.Section 4.2 presents the results of a set of simulations that were designed to further evaluate the behavior of the method.

Indoor Experiments
We used the color and depth images captured by our experimental multi-robot system consisting of seven RGB-D camera equipped robots.The images were taken from various locations in and around WSRNLab facility.Color images of collected five scenes are shown in Figure 7.As the robots are deployed on the same plane in this set of experiments, the ground truth locations and orientations can be easily measured manually.The estimated robots' poses are shown in Figure 8, in which the estimated poses are depicted in red circles and the ground truth are represented in blue stars.We derived the average absolute errors accordingly and presented in Table 3.The calibration trees of 5 scenes are illustrated in Figure 9.We can see the pose estimations of Scene 1 have the smallest absolute error, while the estimates in Scene 5 have the largest absolute error.We also measure the average relative error for localization.The relative errors are computed based on the absolute errors and systems' dimensions.It is clear that the pose estimation results in Scene 4 and Scene 5 are the most and least accurate among five scenes respectively.Through analyzing the robots' sensing ranges in different scenes, we find that sensing range is a main factor that affects the performance of our proposed scheme.As reported in [39], the errors in the depth measurements of Kinect increase quadratically from a few millimeters at 0.5 m distance to about 4 cm at the maximum range of the sensor.The inaccurate depth information obtained by the Kinect on each robot influences the performance of the distributed relative pose estimation algorithm, thereby decreasing the accuracy of the overall scheme.Due to this limitation of the RGB-D camera, the robots' sensing ranges should be controlled between around 0.5 m to 3.5 m.

Simulation Experiments
A series of simulation experiments were conducted to investigate the accuracy and bandwidth consumption of the proposed scheme implemented on systems that were larger and have more complicated topologies than the ones we could construct with our available hardware.When robots are deployed on different planes, the ground truth poses can hardly be measured precisely by the manual methods.Therefore, in this set of simulations we applied 3D image warping technique [53] on the color and corresponding depth images captured in Scene 4 to generate synthetic image sets with known transformation matrices.Image sets are generated for systems consisting of 10, 15 , 20, 25, 30, 35, and 40 robots.In this process, we make sure that each robot has sufficiently overlapping FoV with at least one another robot and can be connected in calibration tree.The results presented in Figure 10 are averaged over 10 runs of the simulations with vertical bars indicating the variance.
Figure 10a,b, indicate the absolute errors in both location and orientation increase as the number of robots in the system grows.It is because when the number of robots raises, the calibration tree becomes larger and the primary robot requires more intermediate nodes to establish the connection with another robot in the system.Though this effect accumulates small errors, the average absolute errors are still quite small and the average relative error is within 1.3%.The average bandwidth consumption of each robot is presented in Figure 10c.As expected, the bandwidth usage per robot remains consistent approximately in relation to the number of robots in the system.There are two processes in our proposed scheme require communication between robots: (1) Neighbor detection; and (2) distributed relative pose estimation.As the number of times that distributed relative pose estimation runs increase linearly with number of robots, the each robot's transmission load in this process will not be influenced by the number of robots in the system.The only variable that affects the transmission load is the number of the feature points in the neighbor detection process.However, the number of feature points, which depends on the structure of the captured scene, is irrelative with the number of robots.Therefore, the evenly distributed communication load throughout the system indicates the good scalability of our proposed scheme.

Conclusions
This paper describes the first approach which uses color and depth information to initialize robots' poses in a RGB-D camera equipped multi-robot system.Our scheme first detects each robot's neighbors using robust feature matching.Then the overlapping area between FoVs of neighboring robots are determined to establish the robot dependency graph.A calibration tree is generated through finding the shortest path between the primary robot to all the other robots in the system.At last, a distributed relative pose estimation algorithm is performed to precisely compute relative pose between every two connected robots in the calibration tree.Extensive real world experiments and synthetic dataset simulations have been conducted.The results show that our scheme is robust and accurate in various environments and densities of robots.Importantly, the proposed scheme operates distributively and allows the robots to use the limited wireless bandwidth more efficiently.This calibration algorithm, which provides initial

Figure 1 .
Figure 1.An indoor mapping and exploration scenario showing the Monash University's RGB-D equipped experimental mobile robots "eyeBugs".A typical application would be mapping indoors after a disaster such as Fukushima nuclear reactor accident.As shown in the diagram, there are numerous challenges that need to be addressed.In this paper, our focus is initialization problem in cooperative localization.

Figure 2 .
Figure 2. eyeBug-the robot developed for the Monash WSRNLab's [31] experimental multi-robot platform.The RGB-D data generated by the Kinect RGB-D sensor is processed on BeagleBoard-xM running GNU/Linux operating system.

Figure 3 .
Figure 3. Operational overview of the proposed self-calibration scheme for cooperative pose estimation.

Table 1 .b
Mathematical Notation.Notation Description Z a Depth image captured by robot a. p e Vector representing a real world point in Euclidean space.(i c,a , j c,a ) Principal point coordinates of the pinhole camera model.(f x,a , f y,a ) Focal length of the camera in horizontal and vertical axes.M ab Transformation matrix describing the relative pose between robots a and b. p l a Sampled points on the depth image captured by robot a. p l * b Corresponding points of p l a on the depth image captured by robot b.N a Number of sampled points on Z a .P a Set of sample points on Z a .P * Set of corresponding points of P a on Z a .n l * ,b Surface normal at point p l * b .w l,a Weight parameter for correspondence established between p l a and p l * b .E Update transformation matrix in each iteration.α j i a , j a ) and (i b , j b ) are the pixel coordinates on different depth images, (i c,a ,j c,a ) and (i c,b ,j c,b ) are the principal points of cameras on two robots, (f x,a ,f y,a ) and (f x,b ,f y,b ) are the focal lengths, and z a and z b are the depth values of the same real world point's projections in different depth images.

Figure 4 .
Figure 4. Overlapping area estimation.(a) Depth image captured by robot a, 4 corner pixels are highlighted in red; (b) Depth image captured by robot b; (c) Depth image virtually captured at robot b's position.It is generated from (a); (d) The rough estimate of overlapping area in robot b's view.White region indicates overlapping region.

a 9 :
Compute and update transformation matrix based on current bidirectionally weighted correspondences 10: end procedure

Figure 5 .
Figure 5. Distributing the tasks to two robots.

Figure 7 .
Figure 7. Color images captured by the multi-robot system in 5 scenes.

Figure 8 .
Figure 8.Estimated and ground truth robots' poses.Estimated locations are depicted in red circles and the ground truth are represented in blue stars.The line segments on different markers indicate orientations.(a) Scene 1; (b) Scene 2; (c) Scene 3; (d) Scene 4; (e) Scene 5.

Figure 10 .
Figure 10.Simulation results.(a) Average absolute error in location estimation with increasing number of robots in the system; (b) Average absolute error in estimated orientation with increasing number of robots in the system; (c) Bandwidth consumption per robot with increasing number of robots in the system.

Table 2 .
Initial Pose Matrix (IPM) and Uncertainty Matrix (UM) of a multi-robot system with four robots.

Table 3 .
Average error between the estimated poses and the ground truth.