# Stereo Vision Tracking of Multiple Objects in Complex Indoor Environments

## Abstract

## 1. Introduction

- With a known model of the object to be tracked: this situation is very common in tracking applications, either using static cameras [3,4] or dynamic ones [5,6]. The detection process is computationally more expensive, but the number of false alarms is lower and the robustness of the detector is higher than when looking for arbitrary kinds of objects.

The probabilistic estimator is built upon the belief p(x⃗_{t}|y⃗_{1:t}) of the state vector x⃗_{t} and upon the output vector y⃗_{t}, which informs about the position of the target, by means of the Bayes rule and through a recursive two-step estimation process (prediction-correction), in which some of the involved variables are stochastic.

## 2. Detection, Classification and Localization Processes

- The amount of information that can be extracted from an image is much greater than what can be obtained from any other kind of sensor, such as laser or sonar [21].
- As the environmental configuration changes with time, it is not possible to obtain the depth coordinate of the objects’ position vector with a single camera, and thus a stereo vision arrangement is needed.

The stereo vision process obtains the 3D position **P**_{t} = [x_{p,t} y_{p,t} z_{p,t}]^{T} of a point from its projections, **p**_{l,t} and **p**_{r,t}, in a pair of synchronized images (**p**_{l,t} = [u_{l,p,t} v_{l,p,t}]^{T}, **p**_{r,t} = [u_{r,p,t} v_{r,p,t}]^{T}), as shown in Figure 3.

Given a pixel in the left image (**p**_{l,t} = [u_{l,p,t} v_{l,p,t}]^{T}), this process consists of looking for a similar gray level among the pixels on the epipolar line in the paired image (the right one, I_{r,t}). The 3D location of paired pixels can be found if, after a careful calibration process of both cameras’ locations, the geometric extrinsic parameters of rotation, R_{lr}, and translation, T_{lr}, are known.

#### 2.1. Detection and Classification

The process works on the pair of images (I_{l,t} and I_{r,t}) synchronously acquired at sampling time t from the stereo-camera set, and is developed through the following steps.

#### 2.1.1. Detection

A Canny edge detector applied to the left image I_{l,t} = [u_{l,p,t} v_{l,p,t}]^{T} is used to extract those pixels that may be interesting for the tracking process. Image edges from human contours, tables, doors, columns, and so on are visible and distinguishable from the background (even in quite crowded scenes) and can be easily extracted from the filtered image.

#### 2.1.2. Classification: Structural and Non-Structural Features

In the Canny edge image I_{canny,l,t}, edges corresponding to environmental structures characteristically form long lines. Thus, the classification process starts by seeking structural shapes in the resulting image through these typical features. The Hough transform is used to search for these long line segments in the partial Canny image.

- rho and theta are, respectively, the distance and angle resolution parameters of the basic Hough transform, in pixels and radians.
- threshold is the minimum count the Hough accumulator must exceed in order to consider that a line exists.
- length is needed in the probabilistic version of the Hough transform, and is the minimum line length, in pixels, for the segment detector. This parameter is very important in this work, as it allows taking into account a line made of very short segments, like those generated in scenes with many occlusions.
- gap is also needed in the probabilistic version of the Hough transform. It is the maximum gap, in pixels, between segments to be treated as a single line segment. This parameter is significant here because it allows generating valid lines from widely separated segments, caused by occluding obstacles.

- ${I}_{\mathit{structure},l,t}={\left[{u}_{i,l,t}\hspace{1em}{v}_{i,l,t}\right]}_{i=1:\mathit{mstructure}}^{T}$ with the environmental structures, formed by the long lines found at the partial Canny image.
- ${I}_{\mathit{obstacles},l,t}={\left[{u}_{i,l,t}\hspace{1em}{v}_{i,l,t}\right]}_{i=1:\mathit{mobstacles}}^{T}$ with the full Canny image zeroed at the environmental structures.

#### 2.2. 3D Localization of Structural and Obstacles’ Features

#### 2.2.1. Phase 1: 3D Localization

The feature sets Y_{structure,t} and Y_{obstacles,t} are respectively obtained by calculating the ZNCC value for each non-zero pixel of the corresponding modified left images, I_{structure,l,t} and I_{obstacles,l,t}, using the full right image I_{r,t}. Those features whose ZNCC value reaches a threshold are validated and finally classified into the corresponding feature class, Y_{structure,t} or Y_{obstacles,t}.
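The ZNCC (zero-mean normalized cross-correlation) score used for this validation can be sketched in a few lines of NumPy. The window size and random test data below are illustrative assumptions; the score is 1 for patches identical up to gain and offset, which is what makes it robust for stereo matching.

```python
import numpy as np

def zncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """ZNCC in [-1, 1]; 1 means identical up to gain/offset."""
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:          # flat patch: correlation undefined
        return 0.0
    return float((a * b).sum() / denom)

rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(11, 11))
assert zncc(left, left) > 0.999            # a patch matches itself
assert zncc(left, 2 * left + 10) > 0.999   # invariant to gain and offset
```

In the stereo search, this score would be evaluated for candidate windows along the epipolar line in I_{r,t}, keeping the best match above the validation threshold.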

#### 2.2.2. Phase 2: Obstacles’ Features Filtering

An additional filtering phase removes spurious points from the obstacles’ feature set Y_{obstacles,t}.

The top row of Figure 5 shows the structural features I_{structure,l,t} over the input Canny image I_{canny,l,t}, while the one at the bottom shows them over the original images. Those elements identified as members of the structural features class Y_{structure,t} have been highlighted in both rows of images in order to show the behavior of the algorithm: in colors at the Canny image, and in yellow at the original image if their 3D localization ${\left[{x}_{i,t}\hspace{1em}{y}_{i,t}\hspace{1em}{z}_{i,t}\right]}_{i=1:\mathit{mstructure}}^{T}$ has been found.

The top row of Figure 6 shows the Canny images I_{canny,l,t} input to the classification process; the central row shows the set of original images, where the 3D points ( ${\left[{x}_{i,t}\hspace{1em}{y}_{i,t}\hspace{1em}{z}_{i,t}\right]}_{i=1:\mathit{mobstacles}}^{T}$) assigned to the obstacles’ features class Y_{obstacles,t} are projected back in colors according to their height in the Y coordinate (light blue for lower values, dark blue for middle ones and green for higher ones). Finally, the row at the bottom is a 2D projection over the ground (XZ plane) of the set of points of the obstacles’ features class Y_{obstacles,t}. The clouds of points in the 2D projection allow performing the tracking task of the four persons found in the original sequence.

The points in Y_{obstacles,t} related to the legs of the persons in the scene do not include all edge points related to them in the preliminary Canny image I_{canny,l,t}. Nevertheless, the multi-obstacle tracker works correctly in every situation, as demonstrated in the video MTracker.avi (see supplementary materials) from the experiment shown in Figure 6. In all frames there are enough edge points on all obstacles, from 115 to 150 features per person to be tracked; the total amounts are displayed at the bottom of each column in Figure 6 (parameter nPtosObs, text in red).

## 3. The Multiple Obstacles’ Tracker

The XPFCP works on the obstacles’ feature set Y_{obstacles,t}: the set of measurements, unequally distributed among all obstacles in the scene, is clustered into a set of k_{in,t} groups G_{1:k,t|in} that works as the observation density p(y⃗_{t}) ≈ p(G_{1:k,t|in}).

The output of the tracker is a set of k_{out,t} objects G_{1:k,t|out}, identified by colors, with their corresponding location, speed and trajectory followed in the XYZ space.

The particle filter (PF) approximates the belief p(x⃗_{t}|y⃗_{1:t}) with a set of n weighted samples $p({\overrightarrow{x}}_{t}|{\overrightarrow{y}}_{1:t})\cong {S}_{t}={\{{\overrightarrow{s}}_{i,t}\}}_{i=1}^{n}=\{{{\overrightarrow{x}}_{t}^{(i)},{w}_{t}^{(i)}\}}_{i=1}^{n}$ (generally called particles) to develop the estimation task. Thanks to this kind of representation, different modes can be maintained in the discrete belief generated by the PF, which, applied to the case of interest, allows characterizing different tracked objects.

A re-initialization step is inserted before the prediction stage, appending new particles to the belief p(x⃗_{t−1}|y⃗_{1:t−1}) output by this step. As shown in Figure 7, this new re-initialization step is executed using the clusters segmented from the XPFCP input data set of obstacles’ features, G_{1:k,t−1|in}, therefore including a deterministic framework in the tracking task (blocks in blue in Figure 7).

The cluster set at time t, G_{1:k,t|in}, is also used at the correction step of the XPFCP, modifying the standard step of the Bootstrap PF, as displayed in Figure 7 (dashed lines). At this point, the clustering process works as a NN association, reinforcing the preservation of multiple modes (as many as obstacles being tracked at each moment) in the output of the selection step: the final belief p(x⃗_{t}|y⃗_{1:t}).

Finally, the output set G_{1:k,t|out} is obtained by organizing into clusters the set of particles ${S}_{t}={\left\{{\overrightarrow{s}}_{i,t}\right\}}_{i=1}^{{n-n}_{m,t}}$ that characterizes the belief p(x⃗_{t}|y⃗_{1:t}) at the end of the XPFCP selection step. This new clustering process discriminates the different modes, or maximum-probability peaks, in p(x⃗_{t}|y⃗_{1:t}), representing the state x⃗_{t} of all k_{out,t} objects being tracked by the probabilistic filter at that moment. The following subsections extend the description of the XPFCP functionality.

#### 3.1. The Tracking Model

The state vector x⃗_{t|t−1} defines the position and speed of the obstacle being tracked. In addition, the state noise vector v⃗_{t} (empirically characterized as Gaussian and white) is included in the actuation model both to modify the constant speed of the obstacle and to model the uncertainty of the probabilistic estimation process.

The output vector y⃗_{t} defines the observable part of the state x⃗_{t|t−1}, which in this case matches the 3D position information ( ${Y}_{\mathit{obstacles},t}={\left[{x}_{i,t}\hspace{1em}{y}_{i,t}\hspace{1em}{z}_{i,t}\right]}_{i=1:\mathit{mobstacles}}^{T}$) extracted by the stereo vision process described in Section 2. An observation noise vector o⃗_{t} has also been included to model the noise related to that vision process, and so it is characterized in a previous off-line step. This noise model makes it possible to keep tracking objects when they are partially occluded.

An empirical study was run to fix the standard deviations in v⃗_{t} and o⃗_{t}, resulting in σ_{v,i} = 100 mm for i = {x, y, z, ẋ, ż} and σ_{o,i} = [150, 200] mm for i = {x, y, z}. Besides, the sensitivity study concluded that a 100% modification of any σ_{o,i} generates an increase in the tracking error of around 24%, while the same modification in any σ_{v,i} generates figures ten times lower. This result indicates the importance of the observation noise vector in the multi-obstacle tracking task.
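A constant-velocity (CV) actuation model consistent with the description above can be sketched as follows. We assume a state of position (x, y, z) plus the horizontal speeds ẋ and ż, since the listed noise components suggest no vertical speed is modelled; the matrix layout, dt value, and numbers are illustrative assumptions, with σ values taken from the text.

```python
import numpy as np

dt = 0.04  # e.g., 25 FPS, per the mean execution time reported in Section 4

# State x = [x, y, z, dx, dz]^T; transition F and observation H matrices.
F = np.array([
    [1, 0, 0, dt, 0],
    [0, 1, 0, 0,  0],
    [0, 0, 1, 0, dt],
    [0, 0, 0, 1,  0],
    [0, 0, 0, 0,  1],
], dtype=float)
H = np.eye(3, 5)  # y = Hx: only the 3D position is observed

sigma_v = np.full(5, 100.0)  # state noise std, mm (per the text)

def predict(state: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """x_t = F x_{t-1} + v_t with white Gaussian state noise."""
    return F @ state + rng.normal(0.0, sigma_v)

state = np.array([1000.0, 500.0, 3000.0, 250.0, -100.0])
print(F @ state)  # noiseless prediction: position advanced by speed * dt
```

The observation model is then simply H applied to the state plus the observation noise o⃗_{t} with σ_{o,i} per axis.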

#### 3.2. Steps of the XPFCP

#### 3.2.1. Clustering Measurements

The clustering process works on the obstacles’ feature set Y_{obstacles,t} extracted by the stereo vision process. The output set of groups G_{1:k,t|in} generated by this process is then used in the re-initialization and correction steps of the XPFCP.

- The clustering algorithm adapts itself to an unknown and variable number k_{in,t} of clusters, as needed in this application.
- A preliminary prediction of the centroids g⃗_{1:k,t|in} is included in the process in order to make its convergence fast and reliable (the execution time of the proposal is decreased by 75% relative to the standard K-Means). This centroid prediction is possible thanks to the first and third steps of the block diagram in Figure 8: predicting an initial value for each centroid g⃗_{0,1:k,t|in}, and computing each centroid updating vector u⃗_{1:k,t|in}.
- A window-based validation process is added to the clustering proposal in order to increase its robustness against outliers, achieving a noise rejection rate of almost 70%. Besides, this process provides an identifier τ_{1:k|out} for each cluster obtained, with a 99% success rate while the cluster keeps appearing in the input data set Y_{obstacles,t}. Thanks to this functionality, the validation process (last step, remarked in green in Figure 8) helps keep track of objects through temporary total occlusions in the scene, as demonstrated in the video sequence MTracker.avi (see supplementary materials).

The resulting cluster set G_{1:k,t|in} ≡ {g⃗_{j,t}, τ_{j} / j = 1:k_{in,t}} comprises a robust, filtered, compact and identified representation of the corresponding input data, which strengthens the PF reliability in the multimodal estimation task pursued.
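The combination of predicted centroids, nearest-centroid assignment and persistent identifiers can be sketched as below. This is a rough, simplified reading of the idea, not the authors' Extended K-Means: the distance threshold is an invented parameter, and the window-based validation and outlier rejection are omitted.

```python
import numpy as np

def cluster_step(points, prev_centroids, prev_ids, radius=400.0):
    """One frame of sequential clustering. points: list of (3,) arrays.
    Clusters matched to a previous centroid keep their identifier."""
    centroids, ids = [], []
    unassigned = list(points)
    next_id = (max(prev_ids) + 1) if prev_ids else 0
    # Seed with predicted centroids so convergence is fast and ids persist.
    for c, cid in zip(prev_centroids, prev_ids):
        members = [p for p in unassigned if np.linalg.norm(p - c) < radius]
        if members:
            centroids.append(np.mean(members, axis=0))
            ids.append(cid)
            unassigned = [p for p in unassigned
                          if np.linalg.norm(p - c) >= radius]
    # Remaining points found new clusters (new objects entering the scene).
    while unassigned:
        seed = unassigned[0]
        members = [p for p in unassigned
                   if np.linalg.norm(p - seed) < radius]
        centroids.append(np.mean(members, axis=0))
        ids.append(next_id)
        next_id += 1
        unassigned = [p for p in unassigned
                      if np.linalg.norm(p - seed) >= radius]
    return centroids, ids
```

Feeding each frame the centroids of the previous one is what gives the identifier persistence (τ) that the tracker relies on across frames.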

#### 3.2.2. Re-Initialization

The re-initialization step inserts n_{m,t−1} new particles into the discrete belief S_{t−1} ≅ p(x⃗_{t−1}|y⃗_{1:t−1}) from time t − 1. Thus, new tracking events (the inclusion or loss of any object in the scene) are quickly reflected in the estimation process.

The inserted particles are generated from the k_{in,t−1} clusters G_{1:k,t−1|in}, segmented from the input data set of obstacles’ features Y_{obstacles,t−1}. Therefore, the re-initialization step generates the discrete density S̑_{t−1} ≅ p̑(x⃗_{t−1}|y⃗_{1:t−1}), which is a modification of S_{t−1} ≅ p(x⃗_{t−1}|y⃗_{1:t−1}) described by equation (3):

Not all clusters (G_{1:k,t−1|in}) are considered equally in the re-initialization process.

To generate S̑_{t−1}, a specific number of particles n_{m|j,t−1} is defined for each cluster j = 1:k_{in,t−1} to be inserted at this step, as shown in equation (4):

where the boolean parameter subscripted (init, j, t−1) informs about the novelty of the cluster G_{j,t−1|in} in the set G_{1:k,t−1|in}; n_{init} is the number of particles to append for each new cluster; n_{m} is the minimum number of particles to be included per cluster; and n_{m,t−1} is the total amount of particles inserted at this step into S_{t−1} to obtain S̑_{t−1}.

The ratio γ_{t} relates n_{m,t−1} to the number n of particles obtained at the output of this step. Using γ_{t}, a continuous version of equation (3) can be expressed as shown in equation (4) and in Figure 7:

The dynamic computation of n_{m|j,t−1} for each j = 1:k_{in,t−1} helps overcome the impoverishment problem of the PF in its multimodal application. This process ensures the diversification of particles among all tracking hypotheses in the density estimated by the PF and increases the probability of the newest ones, which otherwise would disappear along the filter evolution. Results included in Section 4 demonstrate this assertion for a quite low value of γ_{t}, which maintains the mathematical recursive rigor of the Bayesian algorithm.

The re-initialization step also moves part of the belief p(x⃗_{t−1}|y⃗_{1:t−1}) towards high-likelihood areas of the probability space. In order to keep the number of particles in S_{t} constant along time (and thus the XPFCP execution time), the n_{m,t−1} particles that are to be inserted at the re-initialization step at time t are correspondingly erased at the selection step at time t − 1.

#### 3.2.3. Prediction

At this step, each particle in S̑_{t−1} ≅ p̑(x⃗_{t−1}|y⃗_{1:t−1}) is updated through the actuation model to obtain a discrete version of the prior S_{t|t−1} ≅ p(x⃗_{t}|y⃗_{1:t−1}).

The transition density p(x⃗_{t}|x⃗_{t−1}) is defined in Section 3.1, and so the last expression in equation (6) can be replaced by equation (1).

The state noise v⃗_{t−1} is included in the particles’ state prediction with two main objectives: to create a small dispersion of the particles in the state space (needed to avoid degeneracy problems of the set [9]); and to slightly modify the speed components of the state vector (needed to provide movement to the tracking hypotheses when using the CV model [27]).

#### 3.2.4. Correction and Association

where d_{min,i,t} is the shortest distance in the observation space (XYZ in this case), for particle s⃗_{i,t|t−1}, between the projection into this space of the predicted state vector represented by the particle, $h({\overrightarrow{x}}_{t|t-1}^{(i)})$, and all centroids g⃗_{1:k,t|in} of the cluster set G_{1:k,t|in} obtained from the objects’ observation set Y_{obstacles,t}. The use of cluster centroids guarantees that the observation model applied is filtered, robust and accurate, whatever the reliability of the observed object.

To obtain y⃗_{t}^{(i)}, the observation model defined by (2) has to be utilized, as $h\left({\overrightarrow{x}}_{t|t-1}^{(i)}\right)={\overrightarrow{y}}_{t}^{(i)}$. Besides, O is the covariance matrix that characterizes the observation noise defined in the same model. This noise models the variations in position of the centroid g⃗_{j,t|in} of cluster G_{j,t|in} when tracking objects that are partially occluded.

The minimization of d_{min,i,t} involves a NN association between the cluster G_{j,t|in}, whose centroid g⃗_{j,t|in} is used in the computation of the particle’s weight ${\tilde{w}}_{t}^{(i)}$, and the tracking hypothesis represented by the particle s⃗_{i,t|t−1} itself. In fact, this association means that g⃗_{j,t|in} is obtained from the observations generated by the tracking hypothesis represented by s⃗_{i,t|t−1}.

The number of effective particles n̑_{eff} is used as a quality factor to evaluate the efficiency of the particle set. According to this factor, n̑_{eff} should stay above 66% in order to prevent the risk of impoverishment in the particle set. This parameter is included among the results presented in the next section in order to demonstrate how the XPFCP solves the impoverishment problem.
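A minimal sketch of this correction scheme: each predicted particle is weighted with a Gaussian on its distance to the nearest cluster centroid (the NN association described above), and the effective-particle ratio is monitored as 1/Σw², expressed as a fraction of n. The isotropic σ_o is taken from Section 3.1; using a scalar instead of the full covariance O is a simplifying assumption.

```python
import numpy as np

def correct(pred_positions, centroids, sigma_o=150.0):
    """pred_positions: (n, 3) projections h(x) of predicted particles;
    centroids: (k, 3). Returns normalized weights and n_eff as a fraction."""
    diff = pred_positions[:, None, :] - centroids[None, :, :]   # (n, k, 3)
    d_min = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)        # NN distance
    w = np.exp(-0.5 * (d_min / sigma_o) ** 2)                   # Gaussian weight
    w /= w.sum()
    n_eff = 1.0 / (w ** 2).sum() / len(w)   # effective-particle ratio in (0, 1]
    return w, n_eff

# Particles sitting exactly on two centroids get equal weights -> n_eff = 1.
pred = np.array([[0.0, 0, 0], [0, 0, 0], [1000, 0, 0], [1000, 0, 0]])
cents = np.array([[0.0, 0, 0], [1000, 0, 0]])
w, n_eff = correct(pred, cents)
print(w, n_eff)  # [0.25 0.25 0.25 0.25] 1.0
```

An n_eff falling below the 66% threshold mentioned above would signal that a few particles are hoarding all the weight, i.e., impoverishment.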

#### 3.2.5. Selection

The selection step resamples the particles to obtain the final belief S_{t} ≅ p(x⃗_{t}|y⃗_{1:t}). This final set S_{t} is formed by n − n_{m,t} particles, in order to leave n_{m,t} to be inserted at the next re-initialization step.
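The step can be sketched with systematic resampling, drawing n − n_m survivors proportionally to the weights. Systematic resampling is one common choice for Bootstrap-style filters; the paper's exact scheme may differ.

```python
import numpy as np

def select(particles, weights, n_m, rng=None):
    """Resample len(particles) - n_m particles proportionally to weights."""
    rng = rng or np.random.default_rng()
    n_out = len(particles) - n_m
    # One random offset, then evenly spaced positions through the CDF.
    positions = (rng.random() + np.arange(n_out)) / n_out
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                     # guard against rounding error
    idx = np.searchsorted(cumulative, positions)
    return particles[idx]

parts = np.arange(10, dtype=float).reshape(10, 1)
w = np.full(10, 0.1)
kept = select(parts, w, n_m=3, rng=np.random.default_rng(0))
print(kept.shape)  # (7, 1): room left for 3 re-initialization particles
```

Leaving the n_m slots empty here is what keeps the total particle count, and hence the execution time, constant across iterations.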

#### 3.2.6. Clustering Particles

From the particle set S_{t} ≅ p(x⃗_{t}|y⃗_{1:t}) output by the selection step, a deterministic solution has to be generated by the XPFCP. This problem consists of finding the different modes included in the multimodal density p(x⃗_{t}|y⃗_{1:t}) represented by the particle set S_{t}; it has no easy solution if those modes are not clearly separated in that distribution.

Keeping the multimodality of p(x⃗_{t}|y⃗_{1:t}), while avoiding impoverishment problems in it, is the principal aim of all the techniques proposed in this paper. The following section shows empirical results that demonstrate this.

The clustering algorithm is therefore applied again, to the particle set representing p(x⃗_{t}|y⃗_{1:t}) at the end of the XPFCP loop. The resulting groups G_{1:k,t|out} become the deterministic representation of the multiple obstacle hypotheses Y_{obstacles,t} detected by the stereo vision algorithm described in Section 2.

The clustering process extracts G_{1:k,t|out} from S_{t}. Therefore, the deterministic representation of each tracked hypothesis j = 1:k_{out,t} is a cluster G_{j,t|out} with centroid g⃗_{j,t|out}, having the same components as the state vector defined in (1), and an identification parameter τ_{j|out}.

## 4. Results

Each group G_{1:k,t|out} has a different and unique color. These groups are delimited with a cylinder, which is shown as a rectangle in the images and as a circle in the ground projections. In both graphics, an arrow (with the same color as the corresponding group) shows the estimated speed of every obstacle being tracked at each situation, both in magnitude and in direction.

The 3D positions of the particle set S_{t} generated by the XPFCP and of the data set Y_{obstacles,t} are represented by red and green dots, respectively, in each plot. Besides, the estimated values of position and speed (if non-zero) of each obstacle are also depicted below its appearance in the top-row images.

Each column also shows the number of effective particles n̑_{eff} (neff) and the frame number in the video sequence (iter). As can be noticed in Figure 9, the observation system proposed and described in Section 2 performs its detection, classification and 3D localization tasks correctly. Every object not belonging to the environmental structure is detected, localized and classified into the obstacle data set Y_{obstacles,t}, in order to be tracked afterwards.

Each cluster G_{1:k,t|out} maintains its identity τ_{1:k|out} (shown with the same color in Figure 9) while the object stays in the scene, even if it is partially or totally occluded (for a certain time) to the vision system. This is possible thanks to the particles’ clustering algorithm, which includes a window-based validation process.

Figure 10 shows the traces of the centroids g⃗_{1:4|out} of the clusters related to the corresponding obstacles G_{1:4,t|out}; each color reflects the cluster identity τ_{1:4|out}. A dashed oriented arrow over each g⃗_{1:4|out} trace illustrates the ground truth of the path followed by the real obstacles. It can hence be concluded that the correct identification of each object τ_{1:4|out} is maintained with 100% reliability, even when partial and total occlusions occur; this is the case shown in the traces of obstacles three (in pink) and four (in light blue).

The multimodal belief S_{t} ≅ p(x⃗_{t}|y⃗_{1:t}) can be easily segmented to generate a deterministic output G_{1:k,t|out}, which is not the case with the results generated by the proposal in [18]. A fast clustering algorithm, like the K-Means-based one proposed in this work, is enough to fulfill this task robustly and with a low execution time. As can be seen in the figure, the execution time of the XPFCP (texe = 28 ms) is almost 17 times smaller than that of the other algorithm (texe = 474 ms); therefore, the Bayesian proposal presented in this paper is more appropriate for a real-time application than the proposal in [18].

Figure 12 analyzes the effect of using the measurements Y_{obstacles,t} organized into clusters G_{1:k,t|in} at the re-initialization and correction steps. The bottom row of images in Figure 12 shows the same information and parameters as the corresponding one in Figure 11. On the other hand, the upper row plots the weights array ${\overrightarrow{w}}_{t}={\left[{\tilde{w}}_{t}^{(i)}\right]}_{i=1}^{n}$ output from the correction step. Analyzing the results included in Figure 12, it is concluded that if the proposed segmentation into G_{1:k,t|in} classes is not used (left column plots), the most poorly sensed object in the scene (the paper bin beside the wall on the right) has a reduced representation in the discrete distribution ${S}_{t}^{\prime}={\left\{{\overrightarrow{x}}_{t|t-1}^{(i)},{\tilde{w}}_{t}^{(i)}\right\}}_{i=1}^{n}$ output by the correction step. However, the results generated by the XPFCP in the same situation (right column plots) are much better. A visual comparison between both discrete distribution plots (top row) shows the claimed behavior.

- The low computational load of the tracking application enables its real-time execution.
- The impoverishment problem has been correctly solved, because the number of efficient particles involved in the PF stays above the established threshold (66%).
- The XPFCP shows high identification reliability and robustness against noise.
- A detailed analysis of tracking reliability shows errors (missed, duplicated or displaced objects) in about 13% of iterations.
- Nevertheless, noticeable errors in the tracking application (those lasting more than three consecutive iterations) only reached 5.3% of iterations in the whole experiment.

## 5. Conclusions

## Supplemental Information

sensors-10-08865-s001.avi

## Acknowledgments

## References

1. Jia, Z.; Balasuriya, A.; Challa, S. Autonomous vehicles navigation with visual target tracking: Technical approaches. *Algorithms* **2008**, 1, 153–182.
2. Khan, Z.; Balch, T.; Dellaert, F. A Rao-Blackwellized particle filter for eigen tracking. Proceedings of the Third IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, June 2004; pp. 980–986.
3. Isard, M.; Blake, A. Icondensation: Unifying low-level and high-level tracking in a stochastic framework. Proceedings of the Fifth European Conference on Computer Vision, Freiburg, Germany, June 1998; 1, pp. 893–908.
4. Chen, Y.; Huang, T.S.; Rui, Y. Mode-based multi-hypothesis head tracking using parametric contours. Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, May 2002.
5. Odobez, J.M.; Gatica-Perez, D. Embedding motion model-based stochastic tracking. Proceedings of the Seventeenth International Conference on Pattern Recognition, Cambridge, UK, August 2004; 2, pp. 815–818.
6. Okuma, K.; Taleghani, A.; De Freitas, N.; Little, J.J.; Lowe, D.G. A boosted particle filter: Multi-target detection and tracking. Proceedings of the Eighth European Conference on Computer Vision, Prague, Czech Republic, May 2004; 3021, Part I, pp. 28–39.
7. Thrun, S. Probabilistic algorithms in robotics. *AI Mag.* **2000**, 21, 93–109.
8. Arulampalam, M.S.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. *IEEE Trans. Signal Process.* **2002**, 50, 174–188.
9. Gordon, N.J.; Salmond, D.J.; Smith, A.F.M. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. *IEE Proc. F* **1993**, 140, 107–113.
10. Wang, X.; Wang, S.; Ma, J.-J. An improved particle filter for target tracking in sensor systems. *Sensors* **2007**, 7, 144–156.
11. Welch, G.; Bishop, G. An Introduction to the Kalman Filter; Technical Report TR95-041; ACM SIGGRAPH: Los Angeles, CA, USA, 2001. Available online: http://www.cs.unc.edu/~tracker/ref/s2001/kalman/ (accessed on 30 June 2010).
12. Reid, D.B. An algorithm for tracking multiple targets. *IEEE Trans. Autom. Control* **1979**, 24, 843–854.
13. Tweed, D.; Calway, A. Tracking many objects using subordinated Condensation. Proceedings of the British Machine Vision Conference, Cardiff, UK, October 2002; pp. 283–292.
14. Smith, K.; Gatica-Perez, D.; Odobez, J.M. Using particles to track varying numbers of interacting people. Proceedings of the Fourth IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, June 2005; pp. 962–969.
15. MacCormick, J.; Blake, A. A probabilistic exclusion principle for tracking multiple objects. Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, September 1999; pp. 572–578.
16. Schulz, D.; Burgard, W.; Fox, D.; Cremers, A.B. Tracking multiple moving targets with a mobile robot using particle filters and statistical data association. *Int. J. Robot. Res.* **2003**, 22, 99–116.
17. Hue, C.; Le Cadre, J.P.; Pérez, P. A particle filter to track multiple objects. *IEEE Trans. Aerosp. Electron. Syst.* **2002**, 38, 791–812.
18. Koller-Meier, E.B.; Ade, F. Tracking multiple objects using a condensation algorithm. *J. Robot. Auton. Syst.* **2001**, 34, 93–105.
19. Schulz, D.; Burgard, W.; Fox, D.; Cremers, A.B. People tracking with mobile robots using sample-based joint probabilistic data association filters. *Int. J. Robot. Res.* **2003**, 22, 99–116.
20. Bar-Shalom, Y.; Fortmann, T. Tracking and Data Association; Academic Press: New York, NY, USA, 1988.
21. Burguera, A.; González, Y.; Oliver, G. Sonar sensor models and their application to mobile robot localization. *Sensors* **2009**, 9, 10217–10243.
22. Boufama, B. Reconstruction Tridimensionnelle en Vision par Ordinateur: Cas des Cameras non Etalonnees. Ph.D. Thesis, Institut National Polytechnique de Grenoble: Grenoble, France, 1994.
23. Canny, J.F. A computational approach to edge detection. *IEEE Trans. Pattern Anal. Mach. Intell.* **1986**, 8, 679–698.
24. Documentation of function cvHoughLines2. Available online: http://opencv.willowgarage.com/documentation/feature_detection.html (accessed on 27 August 2010).
25. Project OpenCV. Available online: http://sourceforge.net/projects/opencvlibrary/ (accessed on 27 August 2010).
26. Vermaak, J.; Doucet, A.; Perez, P. Maintaining multimodality through mixture tracking. Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, June 2003; pp. 1110–1116.
27. Marrón, M.; Sotelo, M.A.; García, J.C.; Broddfelt, J. Comparing improved versions of ‘K-Means’ and ‘Subtractive’ clustering in a tracking application. Proceedings of the Eleventh International Workshop on Computer Aided Systems Theory, Las Palmas de Gran Canaria, Spain, February 2007; pp. 252–255.
28. Bar-Shalom, Y.; Li, X.R. Estimation and Tracking: Principles, Techniques, and Software; Artech House: Boston, MA, USA, 1993.
29. MobileRobots. Available online: http://www.mobilerobots.com/Mobile_Robots.aspx (accessed on 27 August 2010).
30. The Player Project. Available online: http://playerstage.sourceforge.net/ (accessed on 27 August 2010).

**Figure 1.**Framework and typical scenario: mobile robot navigation through complex and crowded indoor environments.

**Figure 4.**Flowchart of the data acquisition subsystem, based on a stereo vision process. Main tasks are: detection and classification (blocks at the top); and 3D localization (blocks at the bottom). Inner structure of each main task is highlighted and detailed.

**Figure 5.**Results of the detection, classification and 3D location process in three frames of a real experiment. Detected structural features and related original images.

**Figure 6.**Results of the detection, classification and 3D location process in four frames of a real experiment. Top row, detected edges; middle row, original images; bottom row, 2D ground projection of points classified as obstacles.

**Figure 7.**Functional diagram of the multiple objects’ tracker based on a XPFCP. Deterministic tasks have a blue background while probabilistic tasks have a different color. Modified or new PF steps are remarked with dashed lines.

**Figure 8.**Functional diagram of the modified version of the Extended K-Means (second step, white background), used in the correction step of the XPFCP: the Sequential K-Means with Validation. New steps of this clustering algorithm are highlighted in yellow and green.

**Figure 9.**Results of the multi-tracking process in a real experiment. They are organized in columns, where the upper image shows the tracking results generated by the XPFCP for each object, projected in the image plane, and the lower one shows the same results projected into the XZ plane.

**Figure 10.**Trajectory followed in the ground plane (XZ) by four obstacles according to the XPFCP estimation results in a real experiment.

**Figure 11.**Results of the multi-tracking process in a real experiment: left column shows the results generated by the XPFCP; the right column shows the results of the proposal presented in [18].

**Figure 12.**Results of the multi-tracking process in a real experiment using the proposed XPFCP (left column of images), and the same results using an input data set not segmented in classes at the re-initialization and correction steps (right column of images).

**Table 1.** Distribution percentage of particles in the set S_{t} among the tracked hypotheses in the situations shown in Figure 12.

| Algorithm | Object 1 | Object 2 | Object 3 | Object 4 |
|---|---|---|---|---|
| Using G_{1:k,t−1\|in} (left column plots) | 28.5 | 28.1 | 31.5 | 10.9 |
| Not using G_{1:k,t−1\|in} (right column plots) | 31.2 | 42.2 | 24.4 | 2.2 |

**Table 2.** Summary of the results obtained with the multi-tracking proposal in a long and complex experiment. The most relevant parameters of the XPFCP are tuned to the values: n = 600, γ_{t} = 0.2, n_{init}/n = 5%, σ_{v,i} = 100 mm for i = {x, y, z, vx, vz}, σ_{o,i} = 150 mm for i = {x, y, z}.

| Parameter | Value |
|---|---|
| Mean execution time | 40 ms (25 FPS) |
| Number of efficient particles, n̑_{eff} | 69.8% |
| Mismatch identification (% frames) | 0% |
| Outliers rejection (% frames) | 99.9% |
| Missed objects (% frames) | 9.2% |
| Duplicated objects (% frames) | 3.3% |
| Displaced objects (% frames) | 0.4% |
| Reliability in long-term errors (% frames) | Δt > 0.6 s → 3.5%, Δt > 0.8 s → 1.8% |

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Marrón-Romera, M.; García, J.C.; Sotelo, M.A.; Pizarro, D.; Mazo, M.; Cañas, J.M.; Losada, C.; Marcos, Á.
Stereo Vision Tracking of Multiple Objects in Complex Indoor Environments. *Sensors* **2010**, *10*, 8865-8887.
https://doi.org/10.3390/s101008865
