# A Methodology for Multi-Camera Surface-Shape Estimation of Deformable Unknown Objects


## Abstract


## 1. Introduction

## 2. Problem Formulation

## 3. Proposed Method

#### 3.1. Camera-Pair Selection

… $d_{i}$, and angular separation, $\theta_{i}$.

… $\mathbf{K}_{max}$ [2 × $k_{max}$], with the total number of pairs, $k_{max}$, determined by the number of possible two-camera combinations of the $C$ cameras.

Each camera pair in $\mathbf{K}_{max}$ is tested to ensure that (i) both cameras are oriented in a similar direction, (ii) the optical-axis separation angle, $\theta_{i}$, is within a user-defined limit, $\theta_{max}$, and (iii) the baseline distance, $d_{i}$, is within a user-defined limit, $d_{max}$.

… $p_{o}(c_{L})$ is the optical-axis vector of the left camera, and $p_{o}(c_{R})$ is the optical-axis vector of the right camera of the ith camera pair tested from $\mathbf{K}_{max}$.

The limits $d_{max}$ and $\theta_{max}$ must be set according to the selected feature-detection algorithm. For example, dense photometric shape-recovery methods [4,9,24,25,60,61] require narrow baselines and small separation angles to maximize photometric consistency within a stereo pair. Unique key-based feature methods, such as the scale-invariant feature transform (SIFT) [62,63], affine-SIFT (ASIFT) [64], speeded-up robust features (SURF) [65], and others [66], allow for wider baselines and larger angular separations. In this paper, the limits were set to $d_{max}$ = 350 mm and $\theta_{max}$ = 45°, based on findings by Lowe [63] for unique key-based feature methods. The viable camera pairs were stored in an index list, $\mathbf{K}$ [2 × $k_{v}$], where $k_{v}$ is the total number of viable pairs. SIFT features were used for all simulations and experiments in this work. Other approaches, such as photometric stereo, could be implemented instead, although photometric stereo methods require highly controlled lighting environments.
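The pair-selection test can be sketched in Python with NumPy. The three condition equations are not recoverable from the text, so this sketch assumes that (i) "similar direction" means a positive dot product between $p_{o}(c_{L})$ and $p_{o}(c_{R})$, (ii) $\theta_{i}$ is the angle between the two optical axes, and (iii) $d_{i}$ is the distance between the camera centres. $\mathbf{K}_{max}$ enumerates every unordered pair, i.e., $k_{max} = C(C-1)/2$. The function name and input layout are hypothetical; the defaults are the 350 mm and 45° limits stated above.

```python
import itertools

import numpy as np


def select_camera_pairs(positions, axes, d_max=350.0, theta_max_deg=45.0):
    """Return (K_max, K): all candidate pairs and the viable stereo pairs.

    positions: (C, 3) camera centres in mm; axes: (C, 3) optical-axis vectors p_o(c).
    Both input formats are assumptions made for this sketch.
    """
    positions = np.asarray(positions, dtype=float)
    axes = np.asarray(axes, dtype=float)
    axes = axes / np.linalg.norm(axes, axis=1, keepdims=True)  # unit optical axes

    # K_max: every unordered two-camera combination, so k_max = C(C-1)/2 columns.
    candidates = list(itertools.combinations(range(len(axes)), 2))
    K_max = np.array(candidates, dtype=int).reshape(-1, 2).T

    viable = []
    for left, right in candidates:
        cos_theta = float(np.dot(axes[left], axes[right]))
        # (i) similar viewing direction, assumed to mean a positive dot product
        if cos_theta <= 0.0:
            continue
        # (ii) optical-axis separation angle theta_i must not exceed theta_max
        theta_i = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
        # (iii) baseline d_i (distance between camera centres) must not exceed d_max
        d_i = float(np.linalg.norm(positions[left] - positions[right]))
        if theta_i <= theta_max_deg and d_i <= d_max:
            viable.append((left, right))

    K = np.array(viable, dtype=int).reshape(-1, 2).T  # [2 x k_v] index list
    return K_max, K
```

Because $\theta_{max}$ = 45° is below 90°, condition (ii) already implies condition (i) in this sketch; the separate check only mirrors the three listed conditions.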

#### 3.2. Image Capture and Feature Detection

#### 3.3. 2D Matching and Filtering

… $\mathbf{K}$, and removing incorrect matches prior to triangulation. First, all previously detected features for the two cameras in the ith pair were loaded from the database. The matching process is dictated by the chosen feature detector. Typically, matching consists of locating pairs of 2D features whose keys are most similar through a dot-product operation [63]. The main difference among feature descriptors is the size of the key identifier of each detected feature; for example, SIFT uses a 128-element key per feature (stored as 8-bit integers), while SURF uses a 64-element key. Once the features of the ith camera pair are matched, the incorrect matches must be removed through a filtering operation.
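A minimal sketch of dot-product key matching, modeled on the nearest-neighbour ratio test described by Lowe [63]: keys are normalized to unit length, the angle between two keys follows from their dot product, and a match is kept only when the best angle is clearly smaller than the second-best one. The function name `match_descriptors` and the 0.6 ratio are assumptions, not values taken from this work.

```python
import numpy as np


def match_descriptors(desc_left, desc_right, ratio=0.6):
    """Match keys by dot product, keeping only distinctive matches.

    desc_left: (nL, D) and desc_right: (nR, D) descriptor arrays (D = 128 for SIFT).
    Returns a list of (i_left, i_right) index pairs.
    """
    left = np.asarray(desc_left, dtype=float)
    right = np.asarray(desc_right, dtype=float)
    # Normalize each key to unit length so the dot product is a cosine similarity
    # (new arrays are created, so the caller's descriptors are not modified).
    left = left / np.linalg.norm(left, axis=1, keepdims=True)
    right = right / np.linalg.norm(right, axis=1, keepdims=True)

    # Angle between every left/right key pair; a smaller angle means a more similar key.
    angles = np.arccos(np.clip(left @ right.T, -1.0, 1.0))

    matches = []
    for i, row in enumerate(angles):
        order = np.argsort(row)
        best = order[0]
        # Ratio test: accept only if the best angle is clearly below the second best.
        if len(order) == 1 or row[best] < ratio * row[order[1]]:
            matches.append((i, int(best)))
    return matches
```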

… $d_{Pmax}$, was set to 5% of the square root of the image area in pixels; this ensures that the metric scales with larger or smaller images. All matches that do not satisfy the 5% criterion are removed.
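The epipolar-distance filter can be sketched as follows, assuming a known fundamental matrix $F$ for the camera pair (obtainable because all cameras are calibrated) and the usual convention that $l' = F x$ is the epipolar line in the right image of a left-image point $x$. The threshold $d_{Pmax}$ is 5% of the square root of the image area, as stated above; the function name is hypothetical.

```python
import numpy as np


def epipolar_filter(F, pts_left, pts_right, image_width, image_height, fraction=0.05):
    """Keep matches whose right point lies close to the epipolar line of its left point.

    F: 3x3 fundamental matrix mapping left points to right epipolar lines (l' = F x).
    pts_left, pts_right: (n, 2) matched pixel coordinates.
    Returns a boolean mask of the matches to keep.
    """
    d_pmax = fraction * np.sqrt(image_width * image_height)  # 5% of sqrt(image area)
    n = len(pts_left)
    x = np.hstack([np.asarray(pts_left, float), np.ones((n, 1))])    # homogeneous left points
    xp = np.hstack([np.asarray(pts_right, float), np.ones((n, 1))])  # homogeneous right points
    lines = x @ np.asarray(F, float).T  # row i is l'_i = F x_i = (a, b, c)
    # Point-to-line distance |a x' + b y' + c| / sqrt(a^2 + b^2)
    dist = np.abs(np.sum(lines * xp, axis=1)) / np.hypot(lines[:, 0], lines[:, 1])
    return dist <= d_pmax
```

For a rectified pair translated along the x-axis, $F$ maps each point to the horizontal line with the same row, so the filter reduces to a limit on the vertical disparity.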

… $\mathbf{V}_{L}$ [2 × n] and $\mathbf{V}_{R}$ [2 × n], respectively. The lengths of all vectors are calculated and stored in vectors $d_{L}$ and $d_{R}$ [n × 1], which are then normalized by the sum of all vector lengths and stored in vectors $d_{L}^{*}$ and $d_{R}^{*}$. The vector angles are then calculated.
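The neighbour-vector quantities can be computed as in the sketch below. The exact angle equation and the rule used to compare the left and right signatures are not recoverable from the text; this sketch assumes angles measured with `atan2` relative to the image x-axis, and it only illustrates why normalizing by the sum of lengths is useful: a neighbourhood that is uniformly scaled between the two images yields identical $d^{*}$ values. Both helper functions are hypothetical.

```python
import numpy as np


def nearest_neighbors(points, index, n):
    """Indices of the n features nearest to points[index], excluding the feature itself."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts - pts[index], axis=1)
    order = np.argsort(dist)
    return order[order != index][:n]


def neighbor_signature(points, index, neighbor_ids):
    """Normalized lengths and angles of the vectors from one feature to its neighbours.

    points: (N, 2) pixel coordinates of matched features in one image. The same
    neighbour indices are used in both images, since the features are matched.
    """
    pts = np.asarray(points, dtype=float)
    V = (pts[neighbor_ids] - pts[index]).T  # [2 x n] vectors, like V_L or V_R
    d = np.linalg.norm(V, axis=0)           # [n] lengths, like d_L or d_R
    d_star = d / d.sum()                    # normalized by the sum of all lengths
    angles = np.arctan2(V[1], V[0])         # vector angles in radians (assumed form)
    return d_star, angles
```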

#### 3.4. Triangulation and 3D Filtering

… $\mathbf{K}$ are triangulated through ray projection and intersection. This yields $k_{v}$ sets of 3D coordinates, one for each stereo-camera pair. Triangulation may introduce errors in the 3D coordinates; e.g., outliers that the 2D filter failed to remove will persist after triangulation. Therefore, a modular 3D filtering process is proposed herein. The proposed methodology requires all cameras used in stereo pairs to be calibrated, so that complete triangulation is possible.
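Ray projection and intersection can be sketched with the common midpoint method: back-projected rays from two calibrated cameras rarely intersect exactly, so the 3D point is taken as the midpoint of the shortest segment joining them. This is one standard choice, assumed here; the exact triangulation routine is not specified in the text. The camera centres and ray directions would come from the calibration.

```python
import numpy as np


def triangulate_midpoint(c_left, ray_left, c_right, ray_right):
    """Midpoint of the shortest segment between two back-projected rays.

    c_*: (3,) camera centres; ray_*: (3,) ray directions through the matched pixels.
    """
    c1, c2 = np.asarray(c_left, float), np.asarray(c_right, float)
    u, v = np.asarray(ray_left, float), np.asarray(ray_right, float)
    w0 = c1 - c2
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b  # zero when the rays are parallel
    if np.isclose(denom, 0.0):
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on the left ray
    t = (a * e - b * d) / denom  # parameter of the closest point on the right ray
    p1 = c1 + s * u
    p2 = c2 + t * v
    return 0.5 * (p1 + p2)
```

The distance between `p1` and `p2` could also serve as a simple per-point residual for later 3D filtering.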

… $\mathbf{X}$, and an indexed matching list for each camera pair in $\mathbf{K}$ that corresponds to all final, matched, and triangulated 2D features. The indexed 3D features are stored in a separate database to reduce the RAM load of the methodology. Off-RAM storage is suggested because many features may be lost over time and, thus, need not be retained in RAM. The ESR filter was found to be sufficient for 3D filtering following the extensive 2D match filtering.

…$_{T}$-sets of point clouds, $\mathbf{X}_{TR}(k)$, one for each kth stereo-camera pair. Each point cloud is then used to create a triangular surface patch, $\mathbf{T}(k)$, of the target object through Delaunay triangulation.

$\mathbf{T}(k)$ is herein defined as the triangulation map, where triangulation is the subdivision of a planar point set into triangles. The map is an [$n_{polys}$ × 3] index matrix whose rows index the three points of $\mathbf{X}_{TR}(k)$ that make up a given surface-patch polygon. The surface patch is obtained by first projecting the point cloud into each camera's image plane as a set of 2D pixel coordinates. One set of 2D coordinates is then triangulated using Delaunay triangulation. The resulting triangulation map is applied to the second set of 2D coordinates and checked for inconsistencies, such as incorrect edges, which are a symptom of incorrectly triangulated 2D features. Incorrect edges are eliminated by removing the 3D points in $\mathbf{X}_{TR}(k)$ that connect to the most incorrect edges. The result is a set of fully filtered point-cloud matrices, $\mathbf{X}_{TR}(k)$, and their associated surface patches, $\mathbf{T}(k)$, as shown in Figure 2.
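A sketch of the triangulation-map consistency check, using `scipy.spatial.Delaunay` on the first view's projections. The text does not define an "incorrect edge" precisely; as an assumed proxy, this sketch flags triangles whose orientation (the sign of their signed area) flips when the same map is applied to the second view, and counts how many flagged triangles touch each point, so that the points with the highest counts could be removed first. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def signed_areas(pts2d, triangles):
    """Twice the signed area of each triangle; the sign encodes its orientation."""
    a, b, c = (pts2d[triangles[:, k]] for k in range(3))
    return (b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1]) - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0])


def check_triangulation_map(proj_first, proj_second):
    """Build T(k) in the first view and flag triangles that fold over in the second.

    proj_first, proj_second: (n, 2) projections of the same 3D points X_TR(k)
    into the two cameras of a stereo pair.
    Returns (T, flipped, counts): the [n_polys x 3] map, a mask of inconsistent
    triangles, and the number of inconsistent triangles touching each point.
    """
    p1 = np.asarray(proj_first, float)
    p2 = np.asarray(proj_second, float)
    T = Delaunay(p1).simplices  # triangulation map built from the first view
    flipped = np.sign(signed_areas(p1, T)) != np.sign(signed_areas(p2, T))
    counts = np.bincount(T[flipped].ravel(), minlength=len(p1))
    return T, flipped, counts
```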

#### 3.5. Tracking

… $\mathbf{X}(t)$ is the 3D positional measurement of the given point, $\dot{\mathbf{X}}$ is the estimated velocity, and $\ddot{\mathbf{X}}$ is the estimated acceleration. The constant-acceleration model therefore requires nine states in total: three positions, three velocities, and three accelerations.
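The constant-acceleration equation itself is lost from the text. For a state ordered as $[\mathbf{X}, \dot{\mathbf{X}}, \ddot{\mathbf{X}}]$ (an assumed ordering), the standard kinematic form gives the [9 × 9] transition matrix $\mathbf{H}$ below, with $\Delta t$ the time between demand instants.

```python
import numpy as np


def constant_acceleration_H(dt):
    """9x9 transition matrix for the state [X, X_dot, X_ddot] (3 components each)."""
    I = np.eye(3)
    Z = np.zeros((3, 3))
    return np.block([
        [I, dt * I, 0.5 * dt**2 * I],  # X(t+dt) = X + X_dot*dt + 0.5*X_ddot*dt^2
        [Z, I, dt * I],                # X_dot(t+dt) = X_dot + X_ddot*dt
        [Z, Z, I],                     # X_ddot(t+dt) = X_ddot (constant acceleration)
    ])
```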

… $x_{j}^{*}$ is the nine-dimensional state vector of the jth tracked point, and ${\sigma}_{{p}_{j}}^{2}$ is its associated variance. The measurement variance, ${\sigma}_{{N}_{j}}^{2}$, is set equal to the particle variance, so that the particle-weighting step also adapts on-line to varying deformation dynamics.

… $\mathbf{Q}_{j}^{+}$ is loaded, and each nine-dimensional particle is weighted against the current state-space estimate, $x_{j}^{*}$.

… $\mathbf{W}_{j}$ is the weight matrix of all particles for the jth point. The weights are then normalized.

… $\mathbf{W}_{j}^{*}$ is of size [9 × q]. The summed term is checked for a zero condition, which occurs if the projection is too far from the measured location. States with a zero sum bypass the resampling step, and new particles are generated for them from the most recent measurement. All remaining states with non-zero sums, i.e., states with accurate projections, are resampled through a sequential importance process [54]. The regenerated and resampled states are combined into an updated $\mathbf{Q}_{j}$ matrix.
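The weighting, zero-sum check, and resampling steps can be sketched as follows. Because $\mathbf{W}_{j}$ is [9 × q], this sketch weights each state of each particle separately with a Gaussian kernel of variance $\sigma^{2}$ (an assumed form), regenerates particles around `x_star` (standing in for the most recent measurement) for any state whose weights sum to zero, and uses systematic resampling as one common sequential-importance-resampling scheme. The function name, kernel, and resampling variant are all assumptions.

```python
import numpy as np


def weight_and_resample(Q_plus, x_star, sigma2, rng=None):
    """Per-state weighting and resampling of a [9 x q] particle matrix.

    Q_plus: (9, q) projected particles; x_star: (9,) current estimate, also used
    here as the most recent measurement; sigma2: (9,) per-state variances.
    """
    rng = np.random.default_rng() if rng is None else rng
    Q_plus = np.asarray(Q_plus, float)
    x_star = np.asarray(x_star, float)
    sigma2 = np.asarray(sigma2, float)
    # W_j: one Gaussian weight per state and particle, [9 x q]
    W = np.exp(-((Q_plus - x_star[:, None]) ** 2) / (2.0 * sigma2[:, None]))
    sums = W.sum(axis=1)
    q = Q_plus.shape[1]
    Q_new = np.empty_like(Q_plus)
    for s in range(Q_plus.shape[0]):
        if sums[s] == 0.0:
            # Zero-sum state: the projection was too far off, so skip resampling
            # and regenerate particles around the measurement instead.
            Q_new[s] = x_star[s] + rng.normal(0.0, np.sqrt(sigma2[s]), q)
        else:
            w = W[s] / sums[s]  # normalized weights: one row of W_j*
            # Systematic resampling: q evenly spaced positions with one random offset.
            positions = (rng.random() + np.arange(q)) / q
            idx = np.searchsorted(np.cumsum(w), positions)
            Q_new[s] = Q_plus[s, np.minimum(idx, q - 1)]
    return Q_new
```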

#### 3.6. Prediction

… $\mathbf{Q}_{j}$, which is then projected to determine the expected state of the jth point at the next demand instant.

… $\mathbf{Q}_{j}^{+}$ is the matrix of projected particles for the jth point, $\mathbf{H}$ is the [9 × 9] constant-acceleration state-transition matrix, and $\mathbf{U}$ is the [9 × q] uncertainty matrix based on the particle projection variance (Equation (11)). The projected particles are then averaged to produce a single state-space estimate of the predicted point.
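The definitions above suggest the projection $\mathbf{Q}_{j}^{+} = \mathbf{H}\mathbf{Q}_{j} + \mathbf{U}$. A minimal sketch, assuming $\mathbf{U}$ is drawn from a zero-mean Gaussian with per-state variance `proj_var` (a stand-in for Equation (11), which is not reproduced here), the state ordering $[\mathbf{X}, \dot{\mathbf{X}}, \ddot{\mathbf{X}}]$, and that the first three entries of the averaged state form the predicted point's entry in $\mathbf{X}^{+}$. The function name is hypothetical.

```python
import numpy as np


def predict_point(Q, dt, proj_var, rng=None):
    """Project [9 x q] particles one demand instant ahead and average them."""
    rng = np.random.default_rng() if rng is None else rng
    I, Z = np.eye(3), np.zeros((3, 3))
    # Constant-acceleration transition matrix H for the state [X, X_dot, X_ddot]
    H = np.block([[I, dt * I, 0.5 * dt**2 * I],
                  [Z, I, dt * I],
                  [Z, Z, I]])
    Q = np.asarray(Q, float)
    # U: zero-mean Gaussian uncertainty with per-state variance proj_var, [9 x q]
    U = rng.normal(0.0, 1.0, Q.shape) * np.sqrt(np.asarray(proj_var, float))[:, None]
    Q_plus = H @ Q + U
    x_plus = Q_plus.mean(axis=1)  # averaged state; x_plus[:3] is the predicted position
    return Q_plus, x_plus
```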

… $\mathbf{X}^{+}$. A surface mesh is applied to each subset of tracked points in $\mathbf{X}^{+}$ detected by the ith camera pair. Thus, the output of the methodology is a set of meshes, one per stereo-camera pair, that represents the predicted deformation of the object. Figure 2 illustrates the predicted deformation of a dinosaur from three stereo-camera pairs, where each colored surface patch corresponds to the point-cloud prediction of one stereo pair.

## 4. Simulations

The triangulation errors, $e_{t}$, were calculated as the Euclidean distance between a triangulated point and the nearest ground-truth surface. The prediction errors, $e_{p}$, were calculated as the Euclidean distance between a predicted point and the nearest ground-truth surface. The relative prediction errors, $e_{f}$, were calculated as the Euclidean distance between a predicted point's location and its actual triangulated location, for all tracked points. All three errors were normalized by the square root of the object's surface area, which makes the error metrics invariant to the size of the target object and to the relative pose of the cameras.

… $z_{t}(j)$ is the shortest distance between the jth triangulated point in $\mathbf{X}$ and the surface of the true object model, m is the total number of triangulated points at the given demand instant, and S is the surface area of the true object model.
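The error equations themselves are not recoverable from the text. Assuming each metric is the mean distance over the m points divided by $\sqrt{S}$, e.g., $e_{t} = \frac{1}{m\sqrt{S}}\sum_{j=1}^{m} z_{t}(j)$, the metrics can be sketched as below. A sphere, whose point-to-surface distance has a closed form, serves as a toy ground truth; all function names are hypothetical.

```python
import numpy as np


def normalized_error(distances, surface_area):
    """Mean point-to-surface distance divided by sqrt(S) (assumed form of e_t and e_p)."""
    distances = np.asarray(distances, float)
    return distances.mean() / np.sqrt(surface_area)


def sphere_distances(points, center, radius):
    """Exact shortest distance from each point to a sphere's surface (toy ground truth)."""
    r = np.linalg.norm(np.asarray(points, float) - np.asarray(center, float), axis=1)
    return np.abs(r - radius)


def relative_error(predicted, triangulated, surface_area):
    """e_f: mean distance between predicted and triangulated positions over sqrt(S)."""
    d = np.linalg.norm(np.asarray(predicted, float) - np.asarray(triangulated, float), axis=1)
    return d.mean() / np.sqrt(surface_area)
```

For a unit sphere, $S = 4\pi$, so points lying 0.1 outside the surface give $e_{t} = 0.1/\sqrt{4\pi}$ under this assumed form.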

… $z_{p}(j)$ is the shortest distance between the jth predicted point in $\mathbf{X}^{+}$ and the surface of the true object model, and m is the total number of predicted points.

#### 4.1. Simulation 1

#### 4.2. Simulation 2

… $e_{f}$, is attributed to the acceleration profile of the motion, where the lowest error values correspond to zero acceleration of the object.

#### 4.3. Simulation 3

## 5. Experiments

#### 5.1. Background Segmentation

#### 5.2. Results

## 6. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## Nomenclature

Symbol | Description |
---|---|
$C$ | Total number of cameras. |
$H$ | State transition matrix, [9 × 9]. |
$K$ | Matrix of stereo-camera pairs, [2 × $k_{v}$]. |
${K}_{max}$ | Matrix of camera combination pairs, [2 × $k_{max}$]. |
$Q$ | Set of particles, [9 × q]. |
${Q}^{+}$ | Set of projected particles, [9 × q]. |
$S$ | True surface area of the object model. |
$T$ | Triangulation of all individual surface patches. |
$U$ | Uncertainty matrix for tracking and prediction, [9 × q]. |
${V}_{L}$ | Vectors between a given SIFT feature and its nearest neighbors in the left image, [2 × n]. |
${V}_{R}$ | Vectors between a given SIFT feature and its nearest neighbors in the right image, [2 × n]. |
${W}_{j}$ | Weight matrix for all particles of the jth tracked point, [9 × q]. |
${W}_{j}^{*}$ | Normalized weight matrix for all particles of the jth tracked point, [9 × q]. |
$X$ | Matrix of all triangulated points' poses, [3 × n]. |
${X}_{TR}$ | Matrix of triangulated points' poses for a single stereo-camera pair, [3 × n]. |
$\dot{X}$ | Matrix of all tracked, triangulated points' estimated velocities, [3 × n]. |
$\ddot{X}$ | Matrix of all tracked, triangulated points' estimated accelerations, [3 × n]. |
${X}^{+}$ | Predicted poses of all tracked, triangulated points, [3 × n]. |
${d}_{i}$ | Baseline separation of the ith camera pair. |
${d}_{max}$ | Maximum baseline separation for a camera pair. |
${d}_{Pmax}$ | Maximum epipolar distance. |
${d}_{L}$ | Euclidean lengths of each vector in $V_{L}$, [n × 1]. |
${d}_{R}$ | Euclidean lengths of each vector in $V_{R}$, [n × 1]. |
${d}_{L}^{*}$ | Normalized $d_{L}$ vector, [n × 1]. |
${d}_{R}^{*}$ | Normalized $d_{R}$ vector, [n × 1]. |
${e}_{t}$ | Normalized Euclidean distance error between triangulated points' poses and the true object's surface. |
${e}_{p}$ | Normalized Euclidean distance error between predicted points' poses and the true object's surface. |
${e}_{f}$ | Normalized Euclidean distance error between predicted points' poses and the triangulated points' poses. |
${k}_{v}$ | Total number of stereo-camera pairs. |
${k}_{max}$ | Total number of possible camera pairs. |
${p}_{o}(c)$ | Optical-axis vector of the cth camera, [3 × 1]. |
$q$ | Total number of particles. |
$t$ | Demand instant. |
${x}_{j}^{*}$ | State-space vector of the jth tracked, triangulated point, [9 × 1]. |
${x}_{j}^{*+}$ | Projected state-space estimate of the jth tracked, triangulated point, [9 × 1]. |
${z}_{t}(j)$ | Euclidean distance between the jth tracked, triangulated point and the nearest true object surface coordinate. |
${z}_{p}(j)$ | Euclidean distance between the jth predicted, tracked, triangulated point and the nearest true object surface coordinate. |
$\Delta t$ | Change in time between demand instants. |
${\theta}_{i}$ | Angular separation of the ith camera pair. |
${\theta}_{\mathrm{max}}$ | Maximum angular separation for a camera pair. |


## Appendix A

## Appendix B

## References

1. Olague, G.; Mohr, R. Optimal camera placement for accurate reconstruction. Pattern Recognit. **2002**, 35, 927–944.
2. MacKay, M.D.; Fenton, R.G.; Benhabib, B. Multi-camera active surveillance of an articulated human form: An implementation strategy. Comput. Vis. Image Underst. **2011**, 115, 1395–1413.
3. Schacter, D.S.; Donnici, M.; Nuger, E.; MacKay, M.D.; Benhabib, B. A multi-camera active-vision system for deformable-object-motion capture. J. Intell. Robot. Syst. **2014**, 75, 413–441.
4. Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. **2010**, 32, 1362–1376.
5. Koch, A.; Dipanda, A. Direct 3D Information Determination in an Uncalibrated Stereovision System by Using Evolutionary Algorithms. Intell. Comput. Vis. Image Process. Innov. Appl. Des. **2013**, 101.
6. Forsyth, D.A.; Ponce, J. Computer Vision: A Modern Approach, 2nd ed.; Pearson: London, UK, 2012.
7. Blais, F. Review of 20 years of range sensor development. J. Electron. Imaging **2004**, 13, 231–240.
8. Slembrouck, M.; Niño-Castañeda, J.; Allebosch, G.; van Cauwelaert, D.; Veelaert, P.; Philips, W. High performance multi-camera tracking using shapes-from-silhouettes and occlusion removal. In Proceedings of the 9th International Conference on Distributed Smart Camera, Seville, Spain, 8–11 September 2015; pp. 44–49.
9. Goesele, M.; Curless, B.; Seitz, S.M. Multi-view stereo revisited. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2402–2409.
10. de Aguiar, E.; Stoll, C.; Theobalt, C.; Ahmed, N.; Seidel, H.-P.; Thrun, S. Performance Capture from Sparse Multi-View Video. ACM Trans. Graph. **2008**, 27, 98–108.
11. McNeil, J.G.; Lattimer, B.Y. Real-Time Classification of Water Spray and Leaks for Robotic Firefighting. Int. J. Comput. Vis. Image Process. **2015**, 5, 1–26.
12. Lee, K.-R.; Nguyen, T. Realistic surface geometry reconstruction using a hand-held RGB-D camera. Mach. Vis. Appl. **2016**, 27, 377–385.
13. MacKay, M.D.; Fenton, R.G.; Benhabib, B. Time-varying-geometry object surveillance using a multi-camera active-vision system. Int. J. Smart Sens. Intell. Syst. **2008**, 1, 679–704.
14. Schacter, D.S. Multi-Camera Active-Vision System Reconfiguration for Deformable Object Motion Capture; University of Toronto: Toronto, ON, Canada, 2014.
15. Gupta, J.P.; Singh, N.; Dixit, P.; Semwal, V.B.; Dubey, S.R. Human activity recognition using gait pattern. Int. J. Comput. Vis. Image Process. **2013**, 3, 31–53.
16. Kulikova, M.; Jermyn, I.; Descombes, X.; Zhizhina, E.; Zerubia, J. A marked point process model including strong prior shape information applied to multiple object extraction from images. Intell. Comput. Vis. Image Process. Innov. Appl. Des. **2013**, 71.
17. Naish, M.D.; Croft, E.A.; Benhabib, B. Simulation-based sensing-system configuration for dynamic dispatching. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Tucson, AZ, USA, 7–10 October 2001; Volume 5, pp. 2964–2969.
18. Zhang, Z.; Xu, D.; Yu, J. Research and Latest Development of Ping-Pong Robot Player. In Proceedings of the 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008; pp. 4881–4886.
19. Barteit, D.; Frank, H.; Kupzog, F. Accurate Prediction of Interception Positions for Catching Thrown Objects in Production Systems. In Proceedings of the 6th IEEE International Conference on Industrial Informatics, Daejeon, Korea, 13–16 July 2008; pp. 893–898.
20. Tomasi, C.; Kanade, T. Shape and motion from image streams: A factorization method. Proc. Natl. Acad. Sci. USA **1993**, 90, 9795–9802.
21. Pollefeys, M.; Vergauwen, M.; Cornelis, K.; Tops, J.; Verbiest, F.; van Gool, L. Structure and motion from image sequences. In Proceedings of the Conference on Optical 3D Measurement Techniques, Zurich, Switzerland, 22–25 September 2001; pp. 251–258.
22. Lhuillier, M.; Quan, L. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. **2005**, 27, 418–433.
23. Snavely, N.; Seitz, S.M.; Szeliski, R. Photo tourism: Exploring photo collections in 3D. ACM Trans. Graph. **2006**, 25, 835–846.
24. Jin, H.; Soatto, S.; Yezzi, A.J. Multi-view stereo reconstruction of dense shape and complex appearance. Int. J. Comput. Vis. **2005**, 63, 175–189.
25. Jancosek, M.; Pajdla, T. Multi-view reconstruction preserving weakly-supported surfaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3121–3128.
26. Furukawa, Y.; Ponce, J. Carved visual hulls for image-based modeling. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 564–577.
27. Li, Q.; Xu, S.; Xia, D.; Li, D. A novel 3D convex surface reconstruction method based on visual hull. Pattern Recognit. Comput. Vis. **2011**, 8004, 800412.
28. Roshnara Nasrin, P.P.; Jabbar, S. Efficient 3D visual hull reconstruction based on marching cube algorithm. In Proceedings of the International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 19–20 March 2015; pp. 1–6.
29. Laurentini, A. Visual hull concept for silhouette-based image understanding. IEEE Trans. Pattern Anal. Mach. Intell. **1994**, 16, 150–162.
30. Esteban, C.H.; Schmitt, F. Silhouette and stereo fusion for 3d object modeling. Comput. Vis. Image Underst. **2004**, 96, 367–392.
31. Terauchi, T.; Oue, Y.; Fujimura, K. A flexible 3D modeling system based on combining shape-from-silhouette with light-sectioning algorithm. In Proceedings of the International Conference on 3-D Digital Imaging and Modeling, San Diego, CA, USA, 20–25 June 2005; pp. 196–203.
32. Yemez, Y.; Wetherilt, C.J. A volumetric fusion technique for surface reconstruction from silhouettes and range data. Comput. Vis. Image Underst. **2007**, 105, 30–41.
33. Cremers, D.; Kolev, K. Multiview stereo and silhouette consistency via convex functionals over convex domains. IEEE Trans. Pattern Anal. Mach. Intell. **2011**, 33, 1161–1174.
34. Guan, L.; Franco, J.-S.; Pollefeys, M. Multi-Object Shape Estimation and Tracking from Silhouette Cues. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
35. Sedai, S.; Bennamoun, M.; Huynh, D.Q. A Gaussian Process Guided Particle Filter For Tracking 3D Human Pose In Video. IEEE Trans. Image Process. **2013**, 22, 4286–4300.
36. Lallemand, J.; Szczot, M.; Ilic, S. Human Pose Estimation in Stereo Images. In Proceedings of the 8th International Conference on Articulated Motion and Deformable Objects, Palma de Mallorca, Spain, 16–18 July 2014; pp. 10–19.
37. Charles, J.; Pfister, T.; Everingham, M.; Zisserman, A. Automatic and Efficient Human Pose Estimation for Sign Language Videos. Int. J. Comput. Vis. **2014**, 110, 70–90.
38. López-Quintero, M.I.; Marin-Jiménez, M.J.; Muñoz-Salinas, R.; Madrid-Cuevas, F.J.; Medina-Carnicer, R. Stereo Pictorial Structure for 2D articulated human pose estimation. Mach. Vis. Appl. **2016**, 27, 157–174.
39. Biasi, N.; Setti, F. Garment-based motion capture (GaMoCap): High-density capture of human shape in motion. Mach. Vis. Appl. **2015**, 26, 955–973.
40. Hasler, N.; Rosenhahn, B.; Thormählen, T.; Wand, M.; Gall, J.; Seidel, H.P. Markerless motion capture with unsynchronized moving cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 224–231.
41. Bradley, D.; Popa, T.; Sheffer, A.; Heidrich, W.; Boubekeur, T. Markerless garment capture. ACM Trans. Graph. **2008**, 27, 1–9.
42. Bradley, D.; Heidrich, W.; Popa, T.; Sheffer, A. High Resolution Passive Facial Performance Capture. ACM Trans. Graph. **2010**, 29, 41–50.
43. Corazza, S.; Mündermann, L.; Gambaretto, E.; Ferrigno, G.; Andriacchi, T.P. Markerless motion capture through visual hull, articulated ICP and subject specific model generation. Int. J. Comput. Vis. **2010**, 87, 156–169.
44. Corazza, S.; Muendermann, L.; Chaudhari, A.M.; Demattio, T.; Cobelli, C.; Andriacchi, T.P. A markerless motion capture system to study musculoskeletal biomechanics: Visual hull and simulated annealing approach. Ann. Biomed. Eng. **2006**, 34, 1019–1029.
45. Schulman, J.; Lee, A.; Ho, J.; Abbeel, P. Tracking Deformable Objects with Point Clouds. In Proceedings of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 1130–1137.
46. Petit, B.; Lesage, J.D. Multicamera real-time 3D modeling for telepresence and remote collaboration. Int. J. Digit. Multimed. Broadcast. **2010**, 2010.
47. Matsuyama, T.; Wu, X.; Takai, T.; Nobuhara, S. Real-time 3D shape reconstruction, dynamic 3D mesh deformation, and high fidelity visualization for 3D video. Comput. Vis. Image Underst. **2004**, 96, 393–434.
48. Hapák, J.; Jankó, Z.; Chetverikov, D. Real-Time 4D Reconstruction of Human Motion. In Proceedings of the 7th International Conference on Articulated Motion and Deformable Objects, Mallorca, Spain, 11–13 July 2012; pp. 250–259.
49. Tsekourakis, I.; Mordohai, P. Consistent 3D Background Model Estimation from Multi-viewpoint Videos. In Proceedings of the International Conference on 3D Vision (3DV), Lyon, France, 19–22 October 2015; pp. 144–152.
50. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. ASME Trans. J. Basic Eng. **1960**, 82, 35–45.
51. Naish, M.D.; Croft, E.A.; Benhabib, B. Coordinated dispatching of proximity sensors for the surveillance of manoeuvring targets. Robot. Comput. Integr. Manuf. **2003**, 19, 283–299.
52. Bakhtari, A.; Naish, M.D.; Eskandari, M.; Croft, E.A.; Benhabib, B. Active-vision-based multisensor surveillance-an implementation. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. **2006**, 36, 668–680.
53. Bakhtari, A.; MacKay, M.D.; Benhabib, B. Active-vision for the autonomous surveillance of dynamic, multi-object environments. J. Intell. Robot. Syst. **2009**, 54, 567–593.
54. Ristic, B.; Arulampalam, S.; Gordon, N. A tutorial on particle filters. In Beyond the Kalman Filter: Particle Filter for Tracking Applications; Artech House: Boston, MA, USA, 2004; pp. 35–62.
55. Eberhart, R.C.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; Volume 1, pp. 39–43.
56. Zhang, X. A swarm intelligence based searching strategy for articulated 3D human body tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 45–50.
57. Kwolek, B.; Krzeszowski, T.; Gagalowicz, A.; Wojciechowski, K.; Josinski, H. Real-time multi-view human motion tracking using particle swarm optimization with resampling. In Proceedings of the International Conference on Articulated Motion and Deformable Objects (AMDO), Mallorca, Spain, 11–13 July 2012; pp. 92–101.
58. Richa, R.; Bó, A.P.L.; Poignet, P. Towards Robust 3D Visual Tracking for Motion Compensation in Beating Heart Surgery. Med. Image Anal. **2011**, 15, 302–315.
59. Popham, T. Tracking 3D Surfaces Using Multiple Cameras: A Probabilistic Approach; University of Warwick: Coventry, UK, 2010.
60. Furukawa, Y.; Ponce, J. Dense 3D motion capture for human faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 10–25 June 2009; pp. 1674–1681.
61. Hernández-Rodriguez, F.; Castelán, M. A photometric sampling method for facial shape recovery. Mach. Vis. Appl. **2016**, 27, 483–497.
62. Lowe, D.G. Object Recognition from Local Scale-Invariant Features. In Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
63. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. **2004**, 60, 91–110.
64. Yu, G.; Morel, J.-M. ASIFT: An Algorithm for Fully Affine Invariant Comparison. Image Process. Line **2011**, 1, 11–38.
65. Bay, H.; Ess, A.; Tuytelaars, T.; van Gool, L. Speeded up robust features (SURF). Comput. Vis. Image Underst. **2008**, 110, 346–359.
66. Doshi, A.; Starck, J.; Hilton, A. An Empirical Study of Non-Rigid Surface Feature Matching of Human from 3D Video. J. Virtual Real. Broadcast. **2010**, 7, 1860–2037.
67. Khan, N.; McCane, B.; Mills, S. Better than SIFT? Mach. Vis. Appl. **2015**, 26, 819–836.
68. Brown, M.; Lowe, D.G. Automatic panoramic image stitching using invariant features. Int. J. Comput. Vis. **2007**, 74, 59–73.
69. Du, X.; Tan, K.K. Vision-based approach towards lane line detection and vehicle localization. Mach. Vis. Appl. **2016**, 27, 175–191.
70. Altuntas, C. Pair-wise automatic registration of three-dimensional laser scanning data from historical building by created two-dimensional images. Opt. Eng. **2014**, 53, 53108.
71. Moisan, L.; Stival, B. A probabilistic criterion to detect rigid point matches between two images and estimate the fundamental matrix. Int. J. Comput. Vis. **2004**, 57, 201–218.
72. Owczarek, M.; Baranski, P.; Strumillo, P. Pedestrian tracking in video sequences: A particle filtering approach. In Proceedings of the Federated Conference on Computer Science and Information Systems, Lodz, Poland, 13–16 September 2015; pp. 875–881.
73. Welch, G.; Bishop, G. An introduction to the Kalman filter. In Pract. **2006**, 7, 1–16.
74. Chen, S.Y. Kalman filter for robot vision: A survey. IEEE Trans. Ind. Electron. **2012**, 59, 4409–4420.
75. Marron, M.; Garcia, J.C.; Sotelo, M.A.; Cabello, M.; Pizarro, D.; Huerta, F.; Cerro, J. Comparing a Kalman Filter and a Particle Filter in a Multiple Objects Tracking Application. In Proceedings of the IEEE International Symposium on Intelligent Signal Processing, Alcala de Henares, Spain, 3–5 October 2007; pp. 1–6.
76. Chen, S.; Li, Y.; Kwok, N.M. Active vision in robotic systems: A survey of recent developments. Int. J. Rob. Res. **2011**, 30, 1343–1377.
77. Leizea, I.; Álvarez, H.; Borro, D. Real time non-rigid 3D surface tracking using particle filter. Comput. Vis. Image Underst. **2015**, 133, 51–65.
78. Hasinoff, S.W.; Durand, F.; Freeman, W.T. Noise-optimal capture for high dynamic range photography. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 553–560.
79. Richa, R.; Poignet, P. Efficient 3D Tracking for Motion Compensation in Beating Heart Surgery. Int. Conf. Med. Image Comput. Comput. Interv. **2008**, 11, 684–691.
80. Blender Online Community. Blender: A 3D Modelling and Rendering Package; Blender Institute: Amsterdam, The Netherlands, 2016.
81. Vedaldi, A.; Fulkerson, B. VLFeat: An open and portable library of computer vision algorithms. In Proceedings of the ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010.
82. Li, X.; Zhu, S.; Chen, L. Statistical background model-based target detection. Pattern Anal. Appl. **2016**, 19, 783–791.
83. Nieto, M.; Ortega, J.D.; Leškovský, P.; Senderos, O. Constant-time monocular object detection using scene geometry. Pattern Anal. Appl. **2018**, 21, 1053–1066.
84. Mignotte, M. A biologically inspired framework for contour detection. Pattern Anal. Appl. **2017**, 20, 365–381.
85. Ye, S.; Liu, C.; Li, Z. A double circle structure descriptor and Hough voting matching for real-time object detection. Pattern Anal. Appl. **2016**, 19, 1143–1157.
86. Tang, M.; Gorelick, L.; Veksler, O.; Boykov, Y. Grabcut in one cut. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1769–1776.

**Figure 5.** **Left**: triangulation, prediction, and relative prediction error metrics for each demand instant. **Right**: total number of tracked and triangulated points for each demand instant.

**Figure 7.** **Left**: triangulation, prediction, and relative prediction error metrics for each demand instant. **Right**: total number of tracked and triangulated points for each demand instant.

**Figure 9.** **Left**: triangulation, prediction, and relative prediction error metrics for each demand instant. **Right**: total number of tracked and triangulated points for each demand instant.

**Figure 11.** Stereo-camera simulation process for one demand instant: (**a**) first camera position; (**b**) second camera position. Units in [mm].

**Figure 13.** Background segmentation through GrabCut: (**a**) input image with guides; (**b**) resulting binary segmented image.

Experiment | Unoccluded | Occluded |
---|---|---|
1 | 25.58 s | 26.61 s |
2 | 24.78 s | 25.57 s |
3 | 31.10 s | 32.92 s |
4 | 19.48 s | 24.73 s |
5 | 20.51 s | 23.24 s |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nuger, E.; Benhabib, B. A Methodology for Multi-Camera Surface-Shape Estimation of Deformable Unknown Objects. *Robotics* **2018**, *7*, 69.
https://doi.org/10.3390/robotics7040069

**AMA Style**

Nuger E, Benhabib B. A Methodology for Multi-Camera Surface-Shape Estimation of Deformable Unknown Objects. *Robotics*. 2018; 7(4):69.
https://doi.org/10.3390/robotics7040069

**Chicago/Turabian Style**

Nuger, Evgeny, and Beno Benhabib. 2018. "A Methodology for Multi-Camera Surface-Shape Estimation of Deformable Unknown Objects" *Robotics* 7, no. 4: 69.
https://doi.org/10.3390/robotics7040069