# Template Matching for Wide-Baseline Panoramic Images from a Vehicle-Borne Multi-Camera Rig


## Abstract


## 1. Introduction

## 2. Geometry of a Multi-Camera Rig System

#### 2.1. Ideal Panoramic Camera Model

Equation (1) describes the collinearity of a world point P with coordinate vector **X**_p, the corresponding panoramic point u with coordinate vector **X** = [x′, y′, z′], and the panoramic center S (Figure 1a):

$$\lambda\mathbf{X} = \mathbf{R}(\mathbf{X}_p - \mathbf{T})\tag{1}$$

In Equation (1), **R** and **T** are the rotation matrix and translation vector, respectively, and λ is the scale difference between the panoramic and world coordinate systems. In Equation (2), **X** is restricted to the surface of a sphere with radius r:

$$x'^2 + y'^2 + z'^2 = r^2\tag{2}$$

**X** can be obtained from a 2D image point **x** = [x, y]. In Equation (3), ϕ_h and ϕ_v are the horizontal angle with the range [−π, π] and the elevation angle with the range [−0.5π, 0.5π], respectively, and w and h are the width and height of the panoramic image:

$$\phi_h = \left(\frac{x}{w} - 0.5\right)\cdot 2\pi,\qquad \phi_v = \left(0.5 - \frac{y}{h}\right)\cdot\pi\tag{3}$$

Equation (4) calculates the sphere coordinate **X** using a right-hand coordinate system:

$$x' = r\cos\phi_v\sin\phi_h,\qquad y' = r\cos\phi_v\cos\phi_h,\qquad z' = r\sin\phi_v\tag{4}$$
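As a minimal sketch of the pixel-to-sphere mapping of Equations (3) and (4), assuming the standard equirectangular convention; the function names are illustrative, not the authors':

```python
import math

def pixel_to_sphere(x, y, w, h, r=1.0):
    """Map a panoramic pixel (x, y) to sphere coordinates (Equations (3)-(4)).

    phi_h in [-pi, pi] is the horizontal angle; phi_v in [-pi/2, pi/2]
    is the elevation angle; w and h are the panorama width and height.
    """
    phi_h = (x / w - 0.5) * 2.0 * math.pi          # Equation (3)
    phi_v = (0.5 - y / h) * math.pi
    # Equation (4): right-handed sphere coordinates with radius r
    xs = r * math.cos(phi_v) * math.sin(phi_h)
    ys = r * math.cos(phi_v) * math.cos(phi_h)
    zs = r * math.sin(phi_v)
    return xs, ys, zs

def sphere_to_pixel(xs, ys, zs, w, h):
    """Inverse mapping: sphere point back to panoramic pixel coordinates."""
    r = math.sqrt(xs * xs + ys * ys + zs * zs)
    phi_v = math.asin(zs / r)
    phi_h = math.atan2(xs, ys)   # quadrant-aware angle recovery
    x = (phi_h / (2.0 * math.pi) + 0.5) * w
    y = (0.5 - phi_v / math.pi) * h
    return x, y
```

A round trip through both functions recovers the original pixel away from the poles, which is a quick sanity check of the convention.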

#### 2.2. Rigorous Panoramic Camera Model

As shown in Figure 2b, a world point P is actually imaged through the projection center C of a fisheye camera at the point u_c on the fish-eye image (the solid line). For convenience, the fish-eye image coordinates are usually first transformed to virtual plane camera coordinates by choosing a fisheye camera model [30] and a calibration model [31]; we denote this transformation as **K**_c. Then, according to the rotation **R**_c and translation **T**_c between a fisheye camera and the virtual panoramic camera, Equation (5) describes how a fisheye image point u_c with coordinate **x**_c projects to the corresponding panoramic point u with coordinate **X**:

$$k\mathbf{X} = \mathbf{R}_c\mathbf{K}_c\mathbf{x}_c + \mathbf{T}_c\tag{5}$$

**K**_c, **R**_c, and **T**_c are typically fixed values obtained by pre-calibration, and k is the scale factor between the ideal plane and the panoramic sphere, which can be calculated by combining Equations (2) and (5).

Combining Equations (1) and (5) associates the panoramic coordinate **X** with its world coordinate **X**_p. However, according to the solid line in Figure 2b that passes through C, u, and P′, a more rigorous model can be constructed as Equation (6), where the projection centre lies in the fisheye camera:

$$\lambda(\mathbf{X} - \mathbf{T}_c) = \mathbf{R}(\mathbf{X}_p - \mathbf{T}) - \mathbf{T}_c\tag{6}$$

The only difference from the ideal model in Equation (1) is the translation **T**_c from the fisheye camera to the panoramic camera.
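A sketch of how Equation (5) and the sphere constraint of Equation (2) combine to fix the scale factor k. This assumes **K**_c is expressed as a 3×3 matrix acting on the homogeneous fisheye point; the function name and exact placement of k are our illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fisheye_to_panorama(x_c, K_c, R_c, T_c, r=1.0):
    """Project a fisheye image point onto the panoramic sphere.

    x_c : 2D fisheye image point [u, v].
    K_c : 3x3 transform from the fisheye image to the virtual plane
          camera coordinates (pre-calibrated).
    R_c, T_c : rotation and translation from the fisheye camera to the
          virtual panoramic camera (pre-calibrated).
    """
    # Point on the virtual image plane, expressed in panoramic coordinates
    d = R_c @ K_c @ np.array([x_c[0], x_c[1], 1.0]) + T_c
    # Combine Equations (2) and (5): k scales the plane point onto the
    # sphere of radius r, so |X| = r.
    k = np.linalg.norm(d) / r
    return d / k
```

By construction the returned coordinate always satisfies the sphere constraint of Equation (2).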

#### 2.3. Ideal Epipolar Geometry of Panoramic Stereo

Consider a panoramic stereo pair with baseline **B** = [B_x, B_y, B_z] and a pair of corresponding rays **X**_1 = [X_1, Y_1, Z_1] and **X**_2 = [X_2, Y_2, Z_2] in panoramic coordinates. **X**_2′ = **RX**_2 is the right ray transformed into the coordinate frame of the left camera by the relative rotation **R**. The coplanarity of **B**, **X**_1, and **X**_2′ then gives the epipolar constraint

$$aX_2' + bY_2' + cZ_2' = 0$$

where [X_2′, Y_2′, Z_2′] are the components of **X**_2′ and

$$a = B_yZ_1 - B_zY_1,\qquad b = B_zX_1 - B_xZ_1,\qquad c = B_xY_1 - B_yX_1$$

Combined with Equation (2), the epipolar line of ideal panoramic stereo images is a great circle through the panoramic camera centre.
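The coefficients a, b, and c above are simply the components of the cross product **B** × **X**_1, which suggests a compact coplanarity check (a minimal numpy sketch; the function names are ours):

```python
import numpy as np

def epipolar_coefficients(B, X1):
    """Coefficients (a, b, c) of the panoramic epipolar plane.

    The plane normal is the cross product of the baseline B and the
    left ray X1: [By*Z1 - Bz*Y1, Bz*X1 - Bx*Z1, Bx*Y1 - By*X1].
    """
    return np.cross(B, X1)

def on_epipolar(B, X1, R, X2, tol=1e-9):
    """Check whether a candidate right ray X2 satisfies the coplanarity
    constraint a*X2' + b*Y2' + c*Z2' = 0 after rotation into the
    left-camera frame."""
    X2p = R @ np.asarray(X2)
    return abs(np.dot(epipolar_coefficients(B, X1), X2p)) < tol
```

For a true correspondence the two rays and the baseline all lie in one plane through the camera centres, so the dot product vanishes; intersecting that plane with the sphere of Equation (2) yields the great-circle epipolar curve.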

## 3. Multi-View Template Matching for Panoramic Images

#### 3.1. Pre-Processing of Feature-Based Matching and Local Bundle Adjustment

#### 3.2. Error Estimation of Panoramic Epipolar

#### 3.3. Template Matching with Different Feature Descriptors

To compensate for rotation, we first compute the tangent direction d_i at the current candidate point i (as every point on the panoramic epipolar has its own tangent direction, as shown in Figure 7b), and then obtain the direction difference between d_i and the tangent direction d_0 of the reference epipolar line. The reference patch is rotated by d_0 − d_i degrees to compensate for the angle bias (Figure 7c). This calculation of the direction difference is not strictly rigorous and could introduce a tiny direction bias; however, the bias is very slight and can be ignored in exchange for the tremendous efficiency improvement.
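The patch-rotation step can be sketched as follows. This is a nearest-neighbour resampling sketch; the sign convention for d_0 − d_i is an assumption, and a production implementation would likely use bilinear interpolation:

```python
import numpy as np

def rotate_patch(patch, angle_deg):
    """Rotate a square reference patch by angle_deg degrees about its
    centre (nearest-neighbour resampling), compensating the angle bias
    between the reference and candidate epipolar tangent directions."""
    h, w = patch.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source coordinate
    xr = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    yr = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    xi = np.clip(np.rint(xr).astype(int), 0, w - 1)
    yi = np.clip(np.rint(yr).astype(int), 0, h - 1)
    return patch[yi, xi]
```

With this convention a 90° rotation coincides with a clockwise quarter turn of the array, and a 0° rotation is the identity.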

## 4. Experiments and Results

#### 4.1. Test Design

#### 4.2. Results of the Wuhan Data (with Depth Map)

#### 4.3. Results of the Kashiwa Data (without Depth Map)

## 5. Discussion

#### 5.1. The Difference between the AccSIFT and SIFT Descriptors

#### 5.2. Comparison to Most Recent Studies on Template Matching

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Yang, B.; Fang, L.; Li, J. Semi-automated extraction and delineation of 3d roads of street scene from mobile laser scanning point clouds. ISPRS-J. Photogramm. Remote Sens.
**2013**, 79, 80–93. [Google Scholar] [CrossRef] - Paparoditis, N.; Papelard, J.-P.; Cannelle, B.; Devaux, A.; Soheilian, B.; David, N.; Houzay, E. Stereopolis II: A multi-purpose and multi-sensor 3d mobile mapping system for street visualisation and 3d metrology. Rev. Fr. Photogramm. Télédétec.
**2012**, 200, 69–79. [Google Scholar] - Corso, N.; Zakhor, A. Indoor localization algorithms for an ambulatory human operated 3d mobile mapping system. Remote Sens.
**2013**, 5, 6611–6646. [Google Scholar] [CrossRef] - El-Sheimy, N.; Schwarz, K. Navigating urban areas by VISAT—A mobile mapping system integrating GPS/INS/digital cameras for GIS applications. Navigation
**1998**, 45, 275–285. [Google Scholar] [CrossRef] - Jaakkola, A.; Hyyppä, J.; Kukko, A.; Yu, X.; Kaartinen, H.; Lehtomäki, M.; Lin, Y. A low-cost multi-sensoral mobile mapping system and its feasibility for tree measurements. ISPRS-J. Photogramm. Remote Sens.
**2010**, 65, 514–522. [Google Scholar] [CrossRef] - Kim, G.H.; Sohn, H.G.; Song, Y.S. Road infrastructure data acquisition using a vehicle-based mobile mapping system. Comput.-Aided Civ. Infrastruct. Eng.
**2006**, 21, 346–356. [Google Scholar] [CrossRef] - Briechle, K.; Hanebeck, U.D. Template matching using fast normalized cross correlation. Proc. SPIE
**2001**, 4387. [Google Scholar] [CrossRef] - Brunelli, R. Template Matching Techniques in Computer Vision: Theory and Practice, 1st ed.; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst.
**2008**, 110, 346–359. [Google Scholar] [CrossRef] - Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.
**2004**, 60, 91–110. [Google Scholar] [CrossRef] - Mei, C.; Benhimane, S.; Malis, E.; Rives, P. Efficient homography-based tracking and 3-D reconstruction for single-viewpoint sensors. IEEE Trans. Robot.
**2008**, 24, 1352–1364. [Google Scholar] [CrossRef] - Paya, L.; Fernandez, L.; Gil, A.; Reinoso, O. Map building and Monte Carlo localization using global appearance of omnidirectional images. Sensors
**2010**, 10, 11468–11497. [Google Scholar] [CrossRef] [PubMed] - Gutierrez, D.; Rituerto, A.; Montiel, J.M.M.; Guerrero, J.J. Adapting a real-time monocular visual SLAM from conventional to omnidirectional cameras. In Proceedings of the 11th OMNIVIS in IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 343–350. [Google Scholar]
- Geyer, C.; Daniilidis, K. Catadioptric projective geometry. Int. J. Comput. Vis.
**2001**, 45, 223–243. [Google Scholar] [CrossRef] - Barreto, J.P.; Araujo, H. Geometric properties of central catadioptric line images and their application in calibration. IEEE Trans. Pattern Anal. Mach. Intell.
**2005**, 27, 1327–1333. [Google Scholar] [CrossRef] [PubMed][Green Version] - Parian, J.A.; Gruen, A. Sensor modeling, self-calibration and accuracy testing of panoramic cameras and laser scanners. ISPRS-J. Photogramm. Remote Sens.
**2010**, 65, 60–76. [Google Scholar] [CrossRef] - Shi, Y.; Ji, S.; Shi, Z.; Duan, Y.; Shibasaki, R. Gps-supported visual slam with a rigorous sensor model for a panoramic camera in outdoor environments. Sensors
**2013**, 13, 119–136. [Google Scholar] [CrossRef] [PubMed] - Kaess, M.; Dellaert, F. Probabilistic structure matching for visual SLAM with a multi-camera rig. Comput. Vis. Image Underst.
**2010**, 114, 286–296. [Google Scholar] [CrossRef][Green Version] - Ji, S.; Shi, Y.; Shi, Z.; Bao, A.; Li, J.; Yuan, X.; Duan, Y.; Shibasaki, R. Comparison of two panoramic sensor models for precise 3d measurements. Photogramm. Eng. Remote Sens.
**2014**, 80, 229–238. [Google Scholar] [CrossRef] - Lewis, J.P. Fast normalized cross-correlation. In Proceedings of the Vision Interface, Quebec City, QC, Canada, 15–19 June 1995; pp. 120–123. [Google Scholar]
- Zabih, R.; Woodfill, J. Non-parametric local transforms for computing visual correspondence. In Proceedings of the 3rd European Conference on Computer Vision, Stockholm, Sweden, 2–6 May 1994; pp. 151–158. [Google Scholar]
- Talmi, I.; Mechrez, R.; Zelnik-Manor, L. Template matching with deformable diversity similarity. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1311–1319. [Google Scholar]
- Dekel, T.; Oron, S.; Rubinstein, M.; Avidan, S.; Freeman, W.T. Best-buddies similarity for robust template matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2021–2029. [Google Scholar]
- Chantara, W.; Mun, J.-H.; Shin, D.-W.; Ho, Y.-S. Object tracking using adaptive template matching. IEEE Trans. Smart Process. Comput.
**2015**, 4, 1–9. [Google Scholar] [CrossRef] - Wu, T.; Toet, A. Speed-up template matching through integral image based weak classifiers. J. Pattern Recognit. Res.
**2014**, 1, 1–12. [Google Scholar] - Yoo, J.; Hwang, S.S.; Kim, S.D.; Ki, M.S.; Cha, J. Scale-invariant template matching using histogram of dominant gradients. Pattern Recognit.
**2014**, 47, 3006–3018. [Google Scholar] [CrossRef] - Sun, J.; He, F.Z.; Chen, Y.L.; Chen, X. A multiple template approach for robust tracking of fast motion target. Appl. Math. Ser. B
**2016**, 31, 177–197. [Google Scholar] [CrossRef] - Korman, S.; Reichman, D.; Tsur, G.; Avidan, S. Fast-Match: Fast Affine Template Matching. Int. J. Comput. Vis.
**2017**, 121, 1–15. [Google Scholar] [CrossRef] - Hong, C.; Zhu, J.; Yu, J.; Cheng, J.; Chen, X. Realtime and robust object matching with a large number of templates. Multimed. Tools Appl.
**2016**, 75, 1459–1480. [Google Scholar] [CrossRef] - Schneider, D.; Schwalbe, E.; Maas, H.-G. Validation of geometric models for fisheye lenses. ISPRS-J. Photogramm. Remote Sens.
**2009**, 64, 259–266. [Google Scholar] [CrossRef] - Kannala, J.; Brandt, S.S. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell.
**2006**, 28, 1335–1340. [Google Scholar] [CrossRef] [PubMed] - Sinha, S.N.; Frahm, J.M.; Pollefeys, M.; Genc, Y. GPU-based video feature tracking and matching. In Proceedings of the Workshop on Edge Computing Using New Commodity Architectures (EDGE 2006), Chapel Hill, NC, USA, 23–24 May 2006. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM
**1981**, 24, 381–395. [Google Scholar] [CrossRef] - Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
- Introduction of a PGR’s Ladybug3 Camera. Available online: https://www.ptgrey.com/ladybug3-360-degree-firewire-spherical-camera-systems (accessed on 1 May 2018).

**Figure 1.** Some examples of image matching. (**a**) The feature-based matching method, which first extracts arbitrary features (denoted by small circles) from images and then matches them (denoted by lines); (**b**) a traditional template matching case that uses the corresponding template patch to locate four fiducial marks in an aerial image; (**c**) the template matching case in our study: given an object or patch (denoted by a green box) in one image, the corresponding objects (denoted by blue boxes) with significant distortion should be retrieved from multi-view panoramic images.

**Figure 2.** The ideal panoramic camera (**a**), where S, u, and p are collinear, and the multi-camera rig (**b**), where in fact C, u, and p′ are collinear and C and S do not overlap.

**Figure 3.**Epipolar of a panoramic image with sphere projection (yellow curve). Points in circles are selected for checking epipolar errors and bigger circles indicate larger errors (numbers beside the circles).

**Figure 4.** A virtual panoramic depth map (**right**) generated from a 3D point cloud (**left**) using the same pose as the corresponding panoramic image (**middle**).

**Figure 5.** Depth-map-supported multi-view matching. The left and right stereo pairs are matched separately, and mismatched candidates can be detected and eliminated in the multi-view intersection.

**Figure 6.** The preprocessing for scale and rotation. To reduce the scale changes between stereo image patches, we can resample (**a**) based on the ratio depth1/depth2, or resample (**b**) based on depth2/depth1. As for rotation, both patch windows can be aligned to the tangent direction of the current point, as shown in (**c**,**d**).

**Figure 7.** The computation of the accelerated SIFT descriptor. In (**a**), a SIFT point is described by 16 8-D vectors, which can be stored at the 16 orange points. Therefore, for every pixel in the search area (**b**) we can calculate and store the 8-D vectors only once; when a point is to be matched, its SIFT descriptor is easily retrieved. In (**c**), the reference descriptor is rotated according to the difference of the two epipolar directions.
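The lookup-table idea the Figure 7 caption describes (compute each pixel's 8-D orientation vector once, then assemble any point's 16 × 8-D descriptor by retrieval) can be sketched as below. This is a simplified illustration with plain accumulation, no Gaussian weighting or tri-linear interpolation as in full SIFT; all names are ours:

```python
import numpy as np

def gradient_histograms(img, cell=4, bins=8):
    """Precompute an 8-bin gradient-orientation histogram around every
    pixel of the search area, so each pixel's 8-D vector is built once."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bin_idx = np.minimum((ori / (2 * np.pi) * bins).astype(int), bins - 1)
    h, w = img.shape
    half = cell // 2
    hist = np.zeros((h, w, bins))
    # Accumulate each neighbour's gradient magnitude into the histogram
    # of every pixel whose cell-sized window contains it.
    for dy in range(-half, half):
        for dx in range(-half, half):
            ys = np.clip(np.arange(h) + dy, 0, h - 1)
            xs = np.clip(np.arange(w) + dx, 0, w - 1)
            yy, xx = np.meshgrid(ys, xs, indexing="ij")
            np.add.at(hist, (np.arange(h)[:, None], np.arange(w)[None, :],
                             bin_idx[yy, xx]), mag[yy, xx])
    return hist

def descriptor_at(hist, y, x, step=4):
    """Assemble a 16 x 8 = 128-D descriptor by retrieving the 16
    precomputed histogram points on a 4x4 grid centred at (y, x)."""
    offs = (np.arange(4) - 1.5) * step
    vec = [hist[int(y + oy), int(x + ox)] for oy in offs for ox in offs]
    d = np.concatenate(vec)
    n = np.linalg.norm(d)
    return d / n if n > 0 else d
```

Because the per-pixel histograms are shared by all candidate points, each pixel's 8-D vector is computed once instead of once per candidate, which is the source of the speed-up.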

**Figure 9.** Performance of template matching with different descriptors on the Wuhan data. From left to right: the reference image and the two adjacent search images, followed by the results of the NCC, CENSUS, HOG, SURF, and AccSIFT descriptors, respectively.

**Figure 10.** Performance of template matching with different descriptors on the Kashiwa data. From left to right: the reference image and the two adjacent search images, followed by the results of the NCC, CENSUS, HOG, SURF, and AccSIFT descriptors, respectively.

**Figure 12.** The performance of the three template matching methods on the Wuhan data. The green box in the top images indicates the reference to be matched; the left image patches are cropped from the previous panoramic frame and the right from the next frame. The results of our method are denoted by blue boxes; the results of DDIS-C and BBS-C are denoted by pink and red boxes, respectively.

| Total Points | Minimum (Pixels) | Maximum (Pixels) | Average (Pixels) | <10 Pixels | 10–20 Pixels | >20 Pixels |
|---|---|---|---|---|---|---|
| 76 | 0 | 35.0 | 8.29 | 53 (69.74%) | 18 (23.68%) | 5 (6.58%) |

| Methods | Match Rate | Time (s) |
|---|---|---|
| Intensity | 72/80 | 0.202 |
| CENSUS | 65/80 | 0.119 |
| HOG | 69/80 | 0.344 |
| SURF | 73/80 | 0.631 |
| SIFT | 78/80 | 1.539 |
| AccSIFT | 77/80 | 0.195 |

| Methods | Match Rate | Average Time (s) |
|---|---|---|
| Intensity | 101/120 | 1.259 |
| CENSUS | 91/120 | 0.572 |
| HOG | 96/120 | 1.797 |
| SURF | 103/120 | 2.849 |
| SIFT | 112/120 | 4.646 |
| AccSIFT | 107/120 | 1.042 |

| Methods | Correct Rate | Average Time (s) |
|---|---|---|
| AccSIFT | 77/80 | 0.195 |
| DDIS-C | 74/80 | 0.187 |
| BBS-C | 63/80 | 0.325 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ji, S.; Yu, D.; Hong, Y.; Lu, M. Template Matching for Wide-Baseline Panoramic Images from a Vehicle-Borne Multi-Camera Rig. *ISPRS Int. J. Geo-Inf.* **2018**, *7*, 236.
https://doi.org/10.3390/ijgi7070236
