# Multi-Camera Sensor System for 3D Segmentation and Localization of Multiple Mobile Robots


## Abstract


## 1. Introduction

## 2. Multi-Camera Sensor System

#### 2.1. Hardware Architecture

#### 2.2. Software Architecture

#### 2.3. Reference Systems in the Intelligent Space

A point **P** = (X, Y, Z)^T can be expressed in different coordinate systems. There is a global reference system named "world coordinate system", represented by Γ_w, and a local reference system associated with each camera (Γ_ci, i = 1, …, n_c) whose origin is located at the center of projection. These coordinate systems are represented in Figure 2, where the world coordinate system (Γ_w) is shown in red and the coordinate systems associated with the cameras (Γ_ci) are shown in blue.

The pinhole model (Figure 4) relates the coordinates of a point in a camera coordinate system (Γ_c) and its projection onto the image plane of an ideal camera without lenses through the expressions in Equation (1), where f_x, f_y are the camera focal lengths along the x and y axes:

$$x={f}_{x}\frac{{X}_{c}}{{Z}_{c}},\qquad y={f}_{y}\frac{{Y}_{c}}{{Z}_{c}} \tag{1}$$

In real cameras, the displacement (s_1, s_2) from the origin to the center of the image plane is included in the projection equations, obtaining the perspective projection Equation (2):

$$x={f}_{x}\frac{{X}_{c}}{{Z}_{c}}+{s}_{1},\qquad y={f}_{y}\frac{{Y}_{c}}{{Z}_{c}}+{s}_{2} \tag{2}$$
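As a quick numerical sketch of the perspective projection in Equation (2); the intrinsic parameters below are illustrative values, not the calibration of the ISPACE-UAH cameras:

```python
import numpy as np

def project(P_c, fx, fy, s1, s2):
    """Perspective projection of a 3D point P_c = (Xc, Yc, Zc), expressed in
    the camera frame, onto the image plane (Equation (2))."""
    Xc, Yc, Zc = P_c
    x = fx * Xc / Zc + s1   # x component of Equation (2)
    y = fy * Yc / Zc + s2   # y component of Equation (2)
    return np.array([x, y])

# Example with illustrative intrinsics
p = project(np.array([0.5, -0.2, 2.0]), fx=800.0, fy=800.0, s1=320.0, s2=240.0)
```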

## 3. Algorithm for Motion Segmentation and Positioning

where **s** and **q** are two vectors that depend on the image spatiotemporal derivatives [I_x, I_y, I_t], the coordinates of each point in the image plane (x, y), and the focal lengths f_x, f_y:

#### 3.1. 3D Brightness Constraint for a Multi-camera Sensor System

Let **P**_w = (X_w, Y_w, Z_w)^T be the 3D coordinates of a point **P** on a mobile robot related to the world coordinate system Γ_w. Let ${\mathbf{v}}_{w}={\left({v}_{w}^{x}\hspace{0.17em}{v}_{w}^{y}\hspace{0.17em}{v}_{w}^{z}\right)}^{T}$ and ${\mathbf{\omega}}_{w}={\left({\omega}_{w}^{x}\hspace{0.17em}{\omega}_{w}^{y}\hspace{0.17em}{\omega}_{w}^{z}\right)}^{T}$ be, respectively, the components of the linear and angular velocity of the robot motion in Γ_w. Then, the velocity of **P**, relative to Γ_w, is given by Equation (6):

$$\dot{\mathbf{P}}_{w}={\mathbf{v}}_{w}+{\mathbf{\omega}}_{w}\times {\mathbf{P}}_{w} \tag{6}$$

Likewise, **P**_c = (X_c, Y_c, Z_c)^T are the coordinates of **P** relative to Γ_c, and ${\mathbf{v}}_{c}={\left({v}_{c}^{x}\hspace{0.17em}{v}_{c}^{y}\hspace{0.17em}{v}_{c}^{z}\right)}^{T}$ and ${\mathbf{\omega}}_{c}={\left({\omega}_{c}^{x}\hspace{0.17em}{\omega}_{c}^{y}\hspace{0.17em}{\omega}_{c}^{z}\right)}^{T}$ are the components of the linear and angular velocity of the robot motion in Γ_c. The velocity of **P** relative to Γ_c is given by Equation (7):

$$\dot{\mathbf{P}}_{c}={\mathbf{v}}_{c}+{\mathbf{\omega}}_{c}\times {\mathbf{P}}_{c} \tag{7}$$

Let **R**_wc be the (3 × 3) rotation matrix and **T**_wc the (3 × 1) translation vector which represent the coordinate transformation from the world coordinate system (Γ_w) to the camera coordinate system (Γ_c). The coordinate transformation is carried out using the expression in Equation (8):

$${\mathbf{P}}_{c}={\mathbf{R}}_{wc}\,{\mathbf{P}}_{w}+{\mathbf{T}}_{wc} \tag{8}$$

Differentiating Equation (8) with respect to time and combining it with the expressions for the velocity of **P** in Γ_w (Equation (6)) and Γ_c (Equation (7)), Equation (9) is obtained:

$${\mathbf{v}}_{c}+{\mathbf{\omega}}_{c}\times {\mathbf{P}}_{c}={\mathbf{R}}_{wc}\left({\mathbf{v}}_{w}+{\mathbf{\omega}}_{w}\times {\mathbf{P}}_{w}\right) \tag{9}$$

The vector product **ω** × **P** can be expressed as the matrix product **ω̂** **P**, where **ω̂** is the following antisymmetric matrix:

$$\hat{\mathbf{\omega}}=\left(\begin{array}{ccc}0 & -{\omega}^{z} & {\omega}^{y}\\ {\omega}^{z} & 0 & -{\omega}^{x}\\ -{\omega}^{y} & {\omega}^{x} & 0\end{array}\right)$$
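The antisymmetric matrix and the identity ω̂ **P** = **ω** × **P** can be verified directly with a small sketch:

```python
import numpy as np

def skew(w):
    """Antisymmetric (skew-symmetric) matrix ŵ such that ŵ @ P == w × P."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

w = np.array([0.1, -0.3, 0.2])
P = np.array([1.0, 2.0, 3.0])
# The matrix product reproduces the vector product:
assert np.allclose(skew(w) @ P, np.cross(w, P))
```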

In this way, the components of velocity in Γ_c (**v**_c, **ω**_c) are expressed as a function of the components of velocity in Γ_w (**v**_w, **ω**_w) and the transformation matrices (**R**_wc, **T**_wc):

$${\mathbf{\omega}}_{c}={\mathbf{R}}_{wc}\,{\mathbf{\omega}}_{w},\qquad {\mathbf{v}}_{c}={\mathbf{R}}_{wc}\,{\mathbf{v}}_{w}-\left({\mathbf{R}}_{wc}\,{\mathbf{\omega}}_{w}\right)\times {\mathbf{T}}_{wc}$$
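This mapping of velocities between frames can be checked numerically. The sketch below assumes only the transform of Equation (8) (with fixed cameras, so **R**_wc and **T**_wc are constant) and verifies that ω_c = R_wc ω_w and v_c = R_wc v_w − (R_wc ω_w) × T_wc satisfy the identity obtained by differentiating Equation (8); the rotation, translation and velocities are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

def rot_z(theta):
    """Example rotation about the z axis (stand-in for R_wc)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = rot_z(0.7)                       # example rotation, world -> camera
T = np.array([1.0, -2.0, 0.5])      # example translation
v_w = np.array([0.3, 0.0, 0.0])     # linear velocity in the world frame
w_w = np.array([0.0, 0.0, 0.4])     # angular velocity in the world frame
P_w = rng.normal(size=3)            # arbitrary point on the robot

# Velocity components in the camera frame (derived from Eq. (8))
w_c = R @ w_w
v_c = R @ v_w - np.cross(R @ w_w, T)

# Check against the identity obtained by differentiating Eq. (8):
# v_c + w_c x P_c  ==  R (v_w + w_w x P_w)
P_c = R @ P_w + T
lhs = v_c + np.cross(w_c, P_c)
rhs = R @ (v_w + np.cross(w_w, P_w))
assert np.allclose(lhs, rhs)
```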

For the motion of the projection of **P** on the image plane, taking the derivative of the perspective projection equations (Equation (2)) with respect to time, and subsequently substituting the expression of the velocity components of **P** in Γ_c, allows us to obtain the following equations for the motion components in the image plane (ẋ, ẏ):

where **R**_wc is the rotation matrix from Γ_w to Γ_c and **q**_u, **q**_v are the following vectors:

Substituting these expressions into the brightness constancy constraint (I_x ẋ + I_y ẏ + I_t = 0) allows us to obtain a 3D brightness constraint for rigid objects in terms of the linear and angular velocity components in Γ_w (**v**_w and **ω**_w). This constraint is shown in Equation (13):

The vectors **s**, **q** and **r** in Equation (13) are given, respectively, by Equations (14), (15) and (16):

This constraint holds for each of the n_c cameras. Knowing it, we define a new 3D brightness constraint for rigid objects which includes all the information provided by the n_c cameras available in the intelligent space [Equation (17)]:

In Equation (17), each term includes a subscript i (i = 1, …, n_c) which indicates the camera. It is worth pointing out that the components of the linear and angular velocity in the world coordinate system do not include the subscript i because these velocities are equal for the n_c cameras.

#### 3.2. Objective Function for a Multi-camera Sensor System

- A set of N−1 curves ${\left\{{\gamma}_{\mathit{ki}}\right\}}_{k=1,\dots ,N-1}^{i=1,\dots ,{n}_{c}}$ that divide each image into N regions. These curves define the boundaries of the segmentation in the images acquired by each camera.
- The components of the linear and angular velocities ${\left\{{\mathbf{v}}_{\mathit{wk}}\right\}}_{k=1}^{N}$, ${\left\{{\mathbf{\omega}}_{\mathit{wk}}\right\}}_{k=1}^{N}$ of the (N−1) mobile robots and the background. These velocities are related to the world reference system Γ_w and are equal for the n_c cameras.
- The depth (the distance from each 3D point **P** to each camera). The value of the depth at each point coincides with the Z_ci coordinate of the point **P** related to the coordinate system Γ_ci of camera i.

The objective function is shown in Equation (18):

$$E\left[{\left\{{\gamma}_{\mathit{ki}}\right\}}_{k=1,\dots ,N-1}^{i=1,\dots ,{n}_{c}},{\left\{{\mathbf{v}}_{\mathit{wk}}\right\}}_{k=1}^{N},{\left\{{\mathbf{\omega}}_{\mathit{wk}}\right\}}_{k=1}^{N},{\left\{{Z}_{\mathit{ci}}\right\}}_{i=1}^{{n}_{c}}\right]=\sum _{k=1}^{N}\sum _{i=1}^{{n}_{c}}\left[\underset{{\mathrm{\Omega}}_{\mathit{ki}}}{\int}{\psi}_{\mathit{ki}}^{2}(\mathbf{x})d\mathbf{x}+\mu \underset{{\mathrm{\Omega}}_{\mathit{ki}}}{\int}g\left(\Vert \nabla {Z}_{\mathit{ci}}\Vert \right)d\mathbf{x}\right]+\lambda \sum _{k=1}^{N-1}\sum _{i=1}^{{n}_{c}}\underset{{\gamma}_{\mathit{ki}}}{\oint}\mathit{ds}$$

where ψ_ki is the 3D brightness constraint (defined in Equation (17)) for the pixels inside the curve k in the image acquired by camera i; λ and μ are positive real constants that weigh the contribution of each term in the objective function (18); and ∇ = (∂_x, ∂_y) is the spatial gradient operator.

#### 3.3. Curve and Depth Initialization

The initialization is carried out in three steps. First, the initial segmentation boundaries are computed using GPCA. Then, the initial value of the depth (Z_ci) is obtained using Visual Hull 3D [10], which allows obtaining a 3D occupancy grid (composed of cubes of size Δh) in Γ_w from the initial segmentation boundaries. Finally, an extended version of the k-means algorithm is used to estimate the number of mobile robots in the scene. The three steps are described below.

The first step is the segmentation of motion in the images acquired by the n_c cameras. Background modeling is carried out from a set of background images that do not contain any mobile robot. Using GPCA we obtain two transformation matrices, **L**_ci and **R**_ci, for each camera. These matrices are calculated in each camera, and they represent the background model. Since the cameras are placed in fixed positions within the environment, the background modeling stage needs to be carried out only once, and it can be done off-line.

Each new image is projected using the transformation matrices **L** and **R** that were obtained previously (Equation (19)). After that, the image is reconstructed (Equation (20)). In these two equations, **M** represents the mean of the N_i images that were used to obtain the background model:
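A GPCA-style background model can be sketched as follows. The learning of **L** and **R** is simplified here (they are taken from SVDs of the mean-subtracted image stack rather than from the iterative scheme of [9]); the projection and reconstruction steps assume the bilinear form Y = Lᵀ(I − M)R and I_R = M + L Y Rᵀ for Equations (19) and (20):

```python
import numpy as np

def gpca_background_model(images, r1, r2):
    """Learn a background model from a stack of background images
    (each h x w): mean image M plus left/right projection matrices
    L (h x r1) and R (w x r2). Simplified stand-in for GPCA [9]."""
    M = np.mean(images, axis=0)
    A = np.concatenate([img - M for img in images], axis=1)        # h x (w*N)
    B = np.concatenate([(img - M).T for img in images], axis=1)    # w x (h*N)
    L = np.linalg.svd(A, full_matrices=False)[0][:, :r1]
    R = np.linalg.svd(B, full_matrices=False)[0][:, :r2]
    return M, L, R

def reconstruct(I, M, L, R):
    """Project a new image onto the background model (assumed Eq. (19))
    and reconstruct it (assumed Eq. (20)). Pixels that differ from the
    reconstruction are candidates for moving objects."""
    Y = L.T @ (I - M) @ R          # low-dimensional projection
    return M + L @ Y @ R.T         # reconstructed image I_R
```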

The reconstruction error measures the difference between the reconstructed (**I**_R) and the original (**I**) image, and can be calculated by subtracting the images pixel-to-pixel, but this approach is not robust against noise. Therefore, we define a set of pixels (a window of dimensions q × q) around each pixel, called **Φ**_wi in the original image and **Φ̂**_wi in the reconstructed image, and we obtain the reconstruction error for these windows using Equation (21):
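As a sketch of this windowed comparison, the error at each pixel can be computed as the mean squared difference over its q × q neighbourhood, which already provides the robustness to isolated noisy pixels described above; the exact error measure of Equation (21) may differ:

```python
import numpy as np

def window_error(I, I_R, q=3):
    """Per-pixel reconstruction error over a q x q window centred on each
    pixel (robust alternative to pixel-to-pixel subtraction, cf. Eq. (21)).
    Here: mean squared difference over the window (illustrative choice)."""
    h, w = I.shape
    pad = q // 2
    D2 = (np.asarray(I, float) - np.asarray(I_R, float)) ** 2
    Dp = np.pad(D2, pad, mode="edge")            # replicate borders
    E = np.empty_like(D2)
    for r in range(h):
        for c in range(w):
            E[r, c] = Dp[r:r + q, c:c + q].mean()
    return E
```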

Visual Hull 3D provides a 3D occupancy grid in Γ_w from the initial segmentation boundaries computed previously. The 3D coordinates of the occupied cells are projected from Γ_w to each camera coordinate system Γ_ci (i = 1, …, n_c) through the transformation matrices (**R**_wci and **T**_wci) to obtain a set of points on the mobile robots in Γ_ci. This process provides an effective method for depth initialization in each camera. Figure 6 presents a block diagram including the main steps of the depth initialization process.
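Depth initialization can be sketched by chaining Equations (8) and (2): each occupied voxel center is transformed into the camera frame, and its Z_c coordinate is kept as the initial depth at the projected pixel. The intrinsics and pose below are placeholders, not the ISPACE-UAH calibration:

```python
import numpy as np

def init_depth(voxels_w, R_wc, T_wc, fx, fy, s1, s2):
    """Project occupied voxel centers (N x 3, in the world frame) into one
    camera: world -> camera transform (Eq. (8)), perspective projection
    (Eq. (2)), and the Zc coordinate as the initial depth at each pixel."""
    P_c = voxels_w @ R_wc.T + T_wc           # Eq. (8) for row-stacked points
    Z_c = P_c[:, 2]                          # initial depth values
    x = fx * P_c[:, 0] / Z_c + s1            # Eq. (2), pixel coordinates
    y = fy * P_c[:, 1] / Z_c + s2
    return np.stack([x, y], axis=1), Z_c
```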

The occupied cells of the 3D grid are then projected onto the (X_w, Y_w) plane of Γ_w. Next, we cluster the 2D data using an extended version of k-means [12]. This clustering algorithm allows us to obtain a good estimation of the number of robots in the scene, and a division of the initial curves in each image.
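The extended k-means of [12] estimates the number of clusters; the sketch below is a much simpler stand-in that increases k until every projected 2D point lies within a given radius of its centroid. The `radius` threshold and the deterministic farthest-point initialisation are illustrative choices, not the method of [12]:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    C = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])       # next seed: farthest point
    C = np.array(C, dtype=float)
    for _ in range(iters):                   # Lloyd iterations
        lab = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)
    return C, lab

def estimate_num_robots(X, k_max=5, radius=0.5):
    """Increase k until every 2D point lies within `radius` of its centroid;
    `radius` would be tied to the robot footprint (illustrative value)."""
    for k in range(1, k_max + 1):
        C, lab = kmeans(X, k)
        if np.all(np.linalg.norm(X - C[lab], axis=1) <= radius):
            return k, C
    return k_max, C
```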

#### 3.4. Objective Function Minimization

In the first minimization step, the segmentation curves γ_ki are kept fixed, as well as the depth Z_ci. So, the energy to minimize reduces to Equation (22):

The velocity components in Γ_w can be obtained by solving the linear equation system shown in Equation (23):

where χ_ki is the characteristic function of region k in image i (Ω_ki):
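Since the 3D brightness constraint is linear in (**v**_w, **ω**_w) for fixed depth, the velocities can be recovered by stacking one constraint per pixel, from all cameras, and solving in the least-squares sense. The sketch below assumes the generic linear form s_p · v_w + q_p · ω_w + r_p = 0 at each pixel p (the vectors **s**, **q**, **r** of Equations (14)–(16), with the depth folded in); `solve_velocities` is an illustrative helper, not the paper's exact solver for Equation (23):

```python
import numpy as np

def solve_velocities(S, Q, r):
    """Least-squares estimate of (v_w, w_w) from stacked constraints
    s_p . v_w + q_p . w_w + r_p = 0.
    S, Q: (n x 3) arrays of the vectors s and q per pixel; r: (n,).
    Pixels from all cameras can enter the same system, since the
    velocities in the world frame are shared by all cameras."""
    S, Q, r = np.asarray(S, float), np.asarray(Q, float), np.asarray(r, float)
    A = np.hstack([S, Q])                        # (n x 6) system matrix
    sol, *_ = np.linalg.lstsq(A, -r, rcond=None)
    return sol[:3], sol[3:]                      # v_w, w_w
```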

For the boundary function g we use a quadratic function (g(a) = a^2). This is a simple function, but its effectiveness has been verified in several experiments, in which we have compared the results obtained using this quadratic function with other boundary functions described in [13]:

where κ_γki is the mean curvature and **n**_ki is the exterior unit normal of the curve γ_ki. The functions φ_ki are defined as ${\varphi}_{\mathit{ki}}\left({\gamma}_{\mathit{ki}}\left(s\right)\right)=\underset{j\ne k}{\text{min}}{\xi}_{\mathit{ji}}\left({\gamma}_{\mathit{ki}}\left(s\right)\right)$.

## 4. Experimental Results

Figure 10 shows the projection of the 3D trajectory in Γ_w, obtained using the proposed algorithm, onto the image plane of camera 1.

The estimated trajectory has also been projected onto the plane (X_w, Y_w) in Γ_w. The result of this projection for 250 images belonging to each sequence is shown in Figure 11, where we have represented the estimated trajectories obtained using 2 and 3 cameras.

The positioning error has been calculated as the distance between the estimated position and the position provided by the odometry sensors along the X_w and Y_w axes, using Equation (28):
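A minimal sketch of this error measure, assuming Equation (28) is the Euclidean distance between the estimated position and the odometry measurement on the (X_w, Y_w) plane:

```python
import numpy as np

def positioning_error(est_xy, odom_xy):
    """Euclidean distance on the (Xw, Yw) plane between the estimated
    position and the odometry measurement (cf. Equation (28))."""
    diff = np.asarray(est_xy, float) - np.asarray(odom_xy, float)
    return np.linalg.norm(diff, axis=-1)
```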

## 5. Conclusions

## Acknowledgments

## References and Notes

1. Vázquez-Martín, R.; Núñez, P.; Bandera, A.; Sandoval, F. Curvature-Based Environment Description for Robot Navigation Using Laser Range Sensors. *Sensors* **2009**, *9*, 5894–5918.
2. Pizarro, D.; Mazo, M.; Santiso, E.; Marron, M.; Fernandez, I. Localization and Geometric Reconstruction of Mobile Robots Using a Camera Ring. *IEEE Trans. Instrum. Meas.* **2009**, *58*, no. 8.
3. Lee, J.; Ando, N.; Yakushi, T.; Nakajima, K. Adaptive guidance for mobile robots in intelligent infrastructure. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, HI, USA, 2001; pp. 90–95.
4. Steinhaus, P.; Walther, M.; Giesler, B.; Dillmann, R. 3D global and mobile sensor data fusion for mobile platform navigation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2004), New Orleans, LA, USA, 2004; Volume 4, pp. 3325–3330.
5. Sogo, T.; Ishiguro, H.; Ishida, T. Acquisition of qualitative spatial representation by visual observation. Proceedings of IJCAI, Stockholm, Sweden, 1999; pp. 1054–1060.
6. Fernandez, I.; Mazo, M.; Lázaro, J.L.; Pizarro, D.; Santiso, E.; Martín, P.; Losada, C. Guidance of a mobile robot using an array of static cameras located in the environment. *Auton. Robots* **2007**, *23*, 305–324.
7. Sekkati, H.; Mitiche, A. Concurrent 3D Motion Segmentation and 3D Interpretation of Temporal Sequences of Monocular Images. *IEEE Trans. Image Process.* **2006**, *15*, 641–653.
8. Sekkati, H.; Mitiche, A. Joint Optical Flow Estimation, Segmentation, and Interpretation with Level Sets. *Comput. Vis. Image Underst.* **2006**, *103*, 89–100.
9. Ye, J.P.; Janardan, R.; Li, Q. GPCA: an efficient dimension reduction scheme for image compression and retrieval. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 2004; pp. 354–363.
10. Laurentini, A. The Visual Hull: a new tool for contour-based image understanding. Proceedings of the 7th Scandinavian Conference on Image Processing, Aalborg, Denmark, 2001; pp. 993–1002.
11. Losada, C.; Mazo, M.; Palazuelos, S.; Redondo, F. Adaptive threshold for robust segmentation of mobile robots from visual information of their own movement. Proceedings of the IEEE International Symposium on Intelligent Signal Processing, Budapest, Hungary, 2009; pp. 293–298.
12. Kanungo, T.; Mount, D.; Netanyahu, N.; Piatko, C.; Silverman, R.; Wu, A. An Efficient k-Means Clustering Algorithm: Analysis and Implementation. *IEEE Trans. Pattern Anal. Mach. Intell.* **2002**, *24*, 881–892.
13. Aubert, G.; Deriche, R.; Kornprobst, P. Computing optical flow via variational techniques. *SIAM J. Appl. Math.* **1999**, *60*, 156–182.

**Figure 2.** Reference systems in the intelligent space (ISPACE-UAH): the world coordinate system (Γ_w) in red; the camera coordinate systems (Γ_ci, i = 1, 2, …, n_c) in blue.

**Figure 4.** Geometric model of a pinhole camera. In this model the optical center coincides with the origin of the camera coordinate system (Γ_c), represented in blue. The image reference system (x, y) is drawn in black.

**Figure 7.** Images belonging to the test sequences, acquired by fixed cameras in the ISPACE-UAH. (a) Images belonging to sequence 1. (b) Images belonging to sequence 2.

**Figure 8.** Boundaries of the segmentation obtained after the objective function minimization for one image belonging to sequence 1 (Figure 7(a)). (a) Curves obtained using one camera. (b) Curves obtained using two cameras. (c) Curves obtained using three cameras.

**Figure 9.** Boundaries of the segmentation obtained after the objective function minimization for one image belonging to sequence 2 (Figure 7(b)). Each detected object is shown in a different colour. (a) Curves obtained using one camera. (b) Curves obtained using two cameras. (c) Curves obtained using three cameras.

**Figure 10.** 3D trajectory estimated by the algorithm and measured by the odometry sensors on board the robots, projected onto the image plane. (a) Image belonging to sequence 1. (b) Image belonging to sequence 2.

**Figure 11.** 3D trajectory estimated by the algorithm and measured by the odometry sensors on board the robots, on the (X_w, Y_w) plane. (a) Sequence 1. (b) Sequence 2.

**Figure 12.** Positioning error (in millimetres) of the mobile robots, calculated using Equation (28) for 2 and 3 cameras. (a) Robot in image sequence 1. (b) Robot 1 in image sequence 2. (c) Robot 2 in image sequence 2.

**Table 1.** Average computation time (in seconds) per image pair, over the 500 images of each test sequence, for 1, 2 and 3 cameras.

| | Sequence 1 (one robot): 1 camera | Sequence 1: 2 cameras | Sequence 1: 3 cameras | Sequence 2 (two robots): 1 camera | Sequence 2: 2 cameras | Sequence 2: 3 cameras |
|---|---|---|---|---|---|---|
| Initialization | 0.2910 | 2.8353 | 3.3234 | 0.3410 | 5.1390 | 4.0753 |
| Minimization | 0.8758 | 2.8247 | 4.1588 | 2.7273 | 9.8419 | 6.9925 |
| Total | 1.1668 | 5.6600 | 7.4822 | 3.0683 | 14.9810 | 11.0678 |

**Table 2.** Average positioning error (mm) obtained over the 500 images of each test sequence.

| | Sequence 1: Robot 1 | Sequence 2: Robot 1 | Sequence 2: Robot 2 |
|---|---|---|---|
| 1 camera | 1001.5683 | 371.8227 | 769.7783 |
| 2 cameras | 194.8882 | 136.1451 | 75.3317 |
| 3 cameras | 191.7257 | 91.5264 | 63.2390 |

© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license ( http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Losada, C.; Mazo, M.; Palazuelos, S.; Pizarro, D.; Marrón, M.
Multi-Camera Sensor System for 3D Segmentation and Localization of Multiple Mobile Robots. *Sensors* **2010**, *10*, 3261-3279.
https://doi.org/10.3390/s100403261
