Open Access
Information 2018, 9(12), 299; https://doi.org/10.3390/info9120299
Article
TriSIFT: A Triangulation-Based Detection and Matching Algorithm for Fish-Eye Images
^{1} Key Laboratory of Optical Electrical Image Processing, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
^{2} College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
^{3} Department of Electrical and Information Engineering, Hebei Jiaotong Vocational and Technical College, Shijiazhuang 050035, China
^{*} Author to whom correspondence should be addressed.
Received: 6 November 2018 / Accepted: 21 November 2018 / Published: 26 November 2018
Abstract
Keypoint matching is of fundamental importance in computer vision applications. Fisheye lenses are convenient in applications that require a very wide angle of view, but their use has been limited by the lack of an effective matching algorithm. The Scale Invariant Feature Transform (SIFT) algorithm is an important technique in computer vision for detecting and describing local features in images. We therefore present TriSIFT, a set of modifications to the SIFT algorithm that improves descriptor accuracy and matching performance for fisheye images while preserving SIFT's original robustness to scale and rotation. After SIFT keypoint detection is complete, the points in and around the keypoints are back-projected to a unit sphere following a fisheye camera model. To simplify the calculations on the sphere, the descriptor is based on a modification of the Gradient Location and Orientation Histogram (GLOH). In addition, to improve invariance to scale and rotation in fisheye images, gradient magnitudes are replaced by surface areas, and orientations are computed on the sphere. Extensive experiments demonstrate that our modified algorithm outperforms SIFT and other related algorithms on fisheye images.
Keywords: SIFT; triangulation; detection; matching; fisheye

1. Introduction
Visual feature extraction and matching are among the most fundamental and difficult problems in computer vision and optical engineering. Many applications are built on visual feature matching, such as robotic navigation, image stitching, 3D modeling, gesture recognition, and video tracking. In many of these applications, cameras with unconventional lenses and nonlinear projections exhibit numerous advantages over regular cameras. A camera equipped with microlenses and borescopes enables the visual inspection of cavities that are difficult to access [1], whereas a camera equipped with a fisheye lens can acquire wide field-of-view (FOV) images for thorough visual coverage of an environment. Such a camera also improves the performance of ego-motion estimation by avoiding the ambiguity between translational and rotational motion [2,3].
However, visual feature matching algorithms designed for perspective images cannot handle the strong radial distortion introduced by such optics [4,5,6,7]. A fisheye camera covers the entire hemispherical field in front of it, with a view angle in the range of 0°–180°. Fisheye lenses obey other projection models because a hemispherical field of view cannot be projected onto a finite image plane through a perspective projection. Thus, the fisheye model differs from the common camera model, and the inherent distortion of a fisheye lens cannot be captured by the pinhole model [8]. Because of the distinctive lens design and the valuable wide angle of the images, fisheye images suffer from large radial distortion and from changes in scale that depend on image location.
In this paper, we propose the TriSIFT feature matching method to overcome the radial distortion of fisheye cameras. We demonstrate how radial distortion affects the performance of the original Scale Invariant Feature Transform (SIFT) algorithm and propose a set of modifications that improve matching effectiveness. The paper provides a detailed account of the method, together with a thorough analysis and experimental validation.
Specifically, we propose a triangulation-based detection and matching algorithm combined with the camera's imaging model to eliminate the impact of distortion. This improves robustness to distortion and enhances the efficiency of feature point matching in regions of large distortion.
2. Related Work
SIFT is an algorithm in computer vision used to extract and describe local features in images [9]. It extracts stable features from resized and rotated images, and it exhibits stable performance with respect to image scale, size, and noise in the Gaussian scale space. Moreover, SIFT can adapt to perspective and lighting transformations. This superior performance has rapidly made it the most commonly used feature extraction algorithm.
Recently, several algorithms for keypoint detection and matching in fisheye images have been proposed [5,6,7,10,11]. In a series of studies [6,7], Hansen, Corke, and Boles proposed a method that uses stereographic projections to approximate diffusion on a sphere; their method modifies SIFT for images with significant distortion. In [10], Lourenço et al. proposed adaptive Gaussian filtering to correct the SIFT algorithm. This method detects keypoints by looking for extrema in a scale-space representation obtained with a kernel that adapts to the distortion at each image pixel position. It also achieves description invariance to radial distortion (RD) by applying an implicit gradient correction using the Jacobian of the distortion function. In [11], Denny et al. described a method to photogrammetrically estimate the intrinsic and extrinsic parameters of fisheye cameras, with the aim of providing a rectified image for scene viewing applications. While some works simply ignored the pernicious effects of radial distortion and directly applied the original algorithm to distorted images [12], others performed a preliminary correction of distortion through image rectification and then applied SIFT [13]. The latter approach is quite straightforward, but it has two major drawbacks: the explicit distortion correction can be computationally expensive for large frames; more importantly, the interpolation required by image rectification introduces artifacts that affect detection repeatability.
3. SIFT Algorithm Theory
In this section, we briefly introduce the SIFT algorithm, by which TriSIFT is inspired. The major steps of the SIFT algorithm are: detecting extrema in scale space, locating features, selecting the dominant orientation for feature points, and constructing the feature descriptor. For an image I(x,y), using the Gaussian function (1) as the convolution kernel, the scale space of a two-dimensional image is obtained by Gaussian kernel convolution (2). σ is the width parameter of the function, which controls the radial extent of the function.
$$G(x,y,\sigma )=\frac{1}{2\pi {\sigma}^{2}}{e}^{-({x}^{2}+{y}^{2})/2{\sigma}^{2}}$$
$$L(x,y,\sigma )=G(x,y,\sigma )\ast I(x,y)$$
The SIFT algorithm determines feature points by detecting local extrema in a two-dimensional Difference of Gaussian (DoG) scale space to ensure unique and stable feature points. The DoG operator is defined as the difference of Gaussian kernels at two nearby scales, where k is the scale factor. It approximates the scale-normalized Laplacian of Gaussian (LoG)
$$D(x,y,\sigma )=(G(x,y,k\sigma )-G(x,y,\sigma ))\ast I(x,y)=L(x,y,k\sigma )-L(x,y,\sigma )$$
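As a concrete illustration, the scale-space and DoG construction in (1)–(3) can be sketched as follows. This is a minimal sketch assuming NumPy/SciPy; the number of levels and the base σ are illustrative choices, not values prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma=1.6, k=2 ** 0.5, levels=4):
    """One octave of the Difference-of-Gaussian scale space:
    D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    blurred = [gaussian_filter(image, sigma * k ** i) for i in range(levels + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(levels)]

img = np.random.default_rng(0).random((64, 64))
dogs = dog_pyramid(img)  # DoG layers, each the same size as img
```

Extrema of these DoG layers across space and scale are the keypoint candidates.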
Each candidate pixel is compared with its 26 neighbors to ensure that it is a local extremum in both the scale space and the two-dimensional image space. SIFT then fits a three-dimensional quadratic function to determine the location and scale of feature points (to sub-pixel accuracy). In addition, the SIFT algorithm excludes low-contrast feature points and unstable edge responses to enhance matching stability and suppress noise. By assigning a dominant orientation to every feature point, the descriptor is expressed relative to this orientation to achieve rotation invariance. The gradient magnitude $m(x,y)$ and orientation $\theta (x,y)$ at each point of $L(x,y)$ are obtained from differences between neighboring pixels.
Finally, a $16\times 16$ neighborhood window around the feature point in the rotated image is taken, and the window is evenly divided into $4\times 4$ subregions. A gradient orientation histogram with eight orientation bins is computed in every subregion, and the gradient values of all orientations are accumulated. The feature point's descriptor is thus a $4\times 4\times 8=128$ dimensional vector. SIFT then renormalizes this vector to eliminate the impact of illumination changes.
4. TriSIFT Algorithm
We propose the TriSIFT algorithm, an extension of the SIFT algorithm for fisheye images. The first stage of TriSIFT searches for keypoints over all scales and image locations; it is implemented efficiently using a DoG function to identify potential interest points that are invariant to scale and orientation. For each candidate location, a detailed model is fit to determine location and scale, and keypoints are selected based on their stability. To match points extracted from different fisheye images, a Local Spherical Descriptor (LSD) is computed by the proposed algorithm at each point on the surface of a unit sphere. The descriptor is obtained from the spherical representation of the image and consists of a set of histograms of orientations in the region around the given point. The size of the region depends on the scale (σ) at which the point was detected. The magnitude of the LSD is calculated from the areas of the triangles obtained by triangulating the points in a circular area surrounding the keypoint, and its orientation is along the normal of the plane determined by the three vertices of each triangle.
In this section, we first introduce backprojection and triangulation and the Delaunay triangulation algorithm, and then describe the method of calculation of the dominant orientation and the descriptor construction.
4.1. Back-Projection
The distortion caused by the nonlinear projection of a fisheye lens causes non-uniform compression of image structures, which degrades SIFT matching performance. The conventional remedy is to rectify the fisheye image by explicitly correcting the distortion and applying classical SIFT to the rectified image [14,15]. This solution is straightforward, but distortion correction by image resampling requires reconstructing the signal from the initial discrete image. High-frequency components cannot be recovered (e.g., due to low resolution and aliasing), and the reconstruction filters are imperfect. This negatively affects the construction of the descriptor and thus decreases the accuracy of keypoint matching.
In this paper, we propose a model-based approach that transforms the fisheye image back to the state in which light rays from the physical world pass through the camera lens. As shown in Figure 1, the projection process of the spherical model for an omnidirectional camera can be divided into two steps [16,17,18]. Consider a point $P(X,Y,Z)$ in space. In the first step, the point is linearly projected along the incident ray to a point $\tilde{p}$ on the unit sphere, where $\theta $ is the angle between the incident ray $oP$ and the principal axis ${z}_{c}$, and r is the distance between the image point and the principal point. In the second step, the point $\tilde{p}$ is nonlinearly projected to a point $p$ on the image plane $XOY$. Several mathematical models describe this second step, such as the following polynomial formulation,
where $r(\theta )$ is the radial (polar) distance from the image point to the principal point, ${k}_{1}$ is approximately the focal length, and ${k}_{2}$ is the distortion coefficient. X and Y are the image coordinates, and x_{c} and y_{c} are the pixel coordinates of the principal point. $\phi $ is the angle between the $X$-axis and the radial line passing through the image point $p$. ${m}_{u}$ and ${m}_{v}$ are scale factors denoting the number of pixels per unit distance in the horizontal and vertical directions, which must be known beforehand. The mapping between the point $P$ in space and the image point $p$ is reversible, and the inverse can be computed from (5); the detailed process is reported in [16].
$$r(\theta )={k}_{1}\theta +{k}_{2}{\theta}^{3}$$
$$\left(\begin{array}{c}X\\ Y\end{array}\right)=r\left(\theta \right)\left[\begin{array}{cc}{m}_{u}& 0\\ 0& {m}_{v}\end{array}\right]\left(\begin{array}{c}\mathrm{cos}\phi \\ \mathrm{sin}\phi \end{array}\right)+\left(\begin{array}{c}{x}_{c}\\ {y}_{c}\end{array}\right).$$
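In code, the forward projection of (4) and (5) might read as follows. All calibration values here (k1, k2, mu, mv, xc, yc) are hypothetical placeholders for illustration, not values from the paper.

```python
import numpy as np

# Hypothetical calibration parameters (illustrative only).
k1, k2 = 300.0, -12.0    # focal-length term and distortion coefficient of r(theta)
mu, mv = 1.0, 1.0        # pixels per unit distance, horizontal / vertical
xc, yc = 320.0, 240.0    # principal point

def project(P):
    """Map a 3-D point P to fisheye pixel coordinates via
    r(theta) = k1*theta + k2*theta**3 (Equations (4)-(5))."""
    X, Y, Z = P
    theta = np.arccos(Z / np.linalg.norm(P))  # angle to the principal axis z_c
    phi = np.arctan2(Y, X)                    # azimuth of the radial line
    r = k1 * theta + k2 * theta ** 3
    return np.array([mu * r * np.cos(phi) + xc,
                     mv * r * np.sin(phi) + yc])
```

A point on the principal axis projects to the principal point, since θ = 0 implies r(θ) = 0.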
Using the camera model, a fisheye image can be back-projected to the unit sphere, as if the sensor were on the surface of the fisheye lens. The scene of the real world is linearly projected onto the lens, with scale corresponding sequentially to that of the real world. As a result, the light is no longer nonlinearly projected onto the sensor plane, and the corresponding distortion is eliminated from the fisheye image, which becomes a back-projected image.
4.2. Triangulation
If a region of fixed size on the back-projected image is used to compute the orientation of a keypoint or its descriptor, the number of pixels in the region varies with the region's location. In theory, the number of pixels decreases as the polar angle θ increases, which makes the feature points rotation-variant. Figure 2 shows four panoramic test images with distortion levels of 10%, 20%, 30%, and 40%, respectively. In Figure 3, the ground-truth keypoints are detected by SIFT in the undistorted images; the keypoints of the four test images are then compared with the ground truth to evaluate detection. Repetition is the percentage of keypoints detected both in the ground truth and in the test images; new detection denotes keypoints detected in the test images but not in the ground truth; wrong detection denotes points falsely detected as keypoints in the test images. Figure 4 shows the results of matching the keypoints of the four test images against the ground truth; recall and precision are calculated point by point. As shown in Figure 2, Figure 3 and Figure 4, the asymmetry introduces significant changes in the gradient histogram and consequently affects the orientation and descriptor of the keypoints, which makes keypoint matching more difficult.
In TriSIFT, we accumulate the area of the surface in a region instead of the gradient, so that the description is invariant to orientation. An image is described in three-dimensional space using the coordinates of the horizontal pixel axis, the vertical pixel axis, and the grayscale. Suppose the gradient of a slope with a fixed orientation is A. If the slope covers 5 pixels, the histogram at that location and orientation accumulates 5A; if it covers 10 pixels, the histogram accumulates 10A, a significant difference. The area of the slope, however, is fixed: irrespective of the number of pixels on the slope, the sum of the areas is constant.
To calculate the area of the surface, we triangulate the set of points P using Delaunay triangulation, which has a time complexity of $O(n\mathrm{log}n)$. The Delaunay triangulation of a set of points P in a plane is a triangulation DT(P) such that no point in P lies inside the circumcircle of any triangle in DT(P), as denoted in (6), where $V$ is the set of vertices of the triangulation and E is the set of edges between them. Delaunay triangulation maximizes the minimum angle over all triangles, avoiding skinny triangles.
$$\mathrm{DT}=\left(V,E\right)$$
We use Delaunay triangulation to calculate the orientation in Section 4.3 and the descriptor in Section 4.4.
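The triangulation step can be delegated to an off-the-shelf implementation; a sketch using `scipy.spatial.Delaunay` on a random planar point set:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((30, 2))   # a planar point set P

tri = Delaunay(points)         # DT(P): no point of P lies inside any circumcircle
triangles = tri.simplices      # each row holds three vertex indices into `points`
```

Each row of `triangles` indexes one triangle of DT(P), which is the unit over which TriSIFT later computes areas and normals.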
4.3. Orientation
In TriSIFT, we do not calculate the gradient at each point in the region to determine the dominant orientation or the descriptor; instead, a gradient is obtained for each triangle of the Delaunay triangulation. The gradient magnitude and orientation are replaced by the area and the normal of the triangle, respectively.
The sphere on which the image is located is a compact manifold of constant positive curvature. After the image has been backprojected to the sphere, each point, in spherical coordinates, is a threedimensional vector. We define the point set as ${P}_{s}$:
$${P}_{s}=\left\{p:p={(r\mathrm{sin}\theta \mathrm{cos}\phi ,r\mathrm{sin}\theta \mathrm{sin}\phi ,r\mathrm{cos}\theta )}^{T},\ \phi \in [0,2\pi ),\ \theta \in [0,\pi ),\ r=1\right\}$$
Simultaneously, we define another point set as
$${P}_{g}=\left\{p:p={(g\mathrm{sin}\theta \mathrm{cos}\phi ,g\mathrm{sin}\theta \mathrm{sin}\phi ,g\mathrm{cos}\theta )}^{T},\ \phi \in [0,2\pi ),\ \theta \in [0,\pi ),\ g\in \left[0,1\right]\right\}$$
Obviously, the elements of ${P}_{g}$ and ${P}_{s}$ have the following relation functions,
in which ${p}_{s}\in {P}_{s},{p}_{g}\in {P}_{g}$. Then, for each considered keypoint of the sphere, we calculate the orientations of the surrounding points on the circular region with a radius $3\sigma $, which is centered at the keypoint (where σ is the scale at which each keypoint is located). To define this region, the distance between two points on the unit sphere, ${p}_{s1}\equiv ({\theta}_{1},{\phi}_{1})$ and ${p}_{s2}\equiv ({\theta}_{2},{\phi}_{2})$, must be calculated. The distance can be obtained using Vincenty’s formulae. The angular distance $\Delta \sigma $ is
and the distance between the two points is
$${p}_{s}={f}_{gs}({\mathrm{p}}_{g})$$
$${\mathrm{p}}_{g}={f}_{gs}^{-1}({p}_{s})$$
$$\Delta \sigma =\mathrm{arctan}\left(\frac{\sqrt{{(\mathrm{sin}{\theta}_{2}\mathrm{sin}\Delta \phi )}^{2}+{(\mathrm{sin}{\theta}_{1}\mathrm{cos}{\theta}_{2}-\mathrm{cos}{\theta}_{1}\mathrm{sin}{\theta}_{2}\mathrm{cos}\Delta \phi )}^{2}}}{\mathrm{cos}{\theta}_{1}\mathrm{cos}{\theta}_{2}+\mathrm{sin}{\theta}_{1}\mathrm{sin}{\theta}_{2}\mathrm{cos}\Delta \phi}\right)$$
$$d=r\Delta \sigma $$
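Vincenty's formula for the distance between two sphere points translates directly into code; a sketch using `arctan2` to keep the quadrant correct:

```python
import numpy as np

def sphere_distance(theta1, phi1, theta2, phi2, r=1.0):
    """Distance between two points on a sphere of radius r, given in
    polar (theta) / azimuth (phi) coordinates, via Vincenty's formula."""
    dphi = phi2 - phi1
    num = np.hypot(np.sin(theta2) * np.sin(dphi),
                   np.sin(theta1) * np.cos(theta2)
                   - np.cos(theta1) * np.sin(theta2) * np.cos(dphi))
    den = (np.cos(theta1) * np.cos(theta2)
           + np.sin(theta1) * np.sin(theta2) * np.cos(dphi))
    return r * np.arctan2(num, den)  # d = r * delta_sigma
```

For example, the distance from the pole (θ = 0) to the equator (θ = π/2) on the unit sphere is π/2.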
The points on the plane are backprojected to the unit sphere, and the distance between the two points on the plane is not the same as that on the surface of the unit sphere. In a fisheye image with radial distortion, the distance between two adjacent pixels near the principal point is different from that near the edges. According to the fisheye camera model, the distortion at the principal point is almost 0. Therefore, at the same scale, we back project two adjacent pixels at the principal point on the plane to the sphere and compute the distance ${\mu}_{fg}$ using (11) and (12). We take this value as the unit of measurement on the sphere and we can obtain the transformation of the distance on the plane using (13)
where $di{s}_{sphere}$ is the distance between the two points on the sphere, and $di{s}_{flat}$ is the distance between the two points on the plane.
$$di{s}_{sphere}={\mu}_{fg}di{s}_{flat}$$
Assuming that ${p}_{se}$ is a keypoint in ${P}_{s}$, we select a circular window with its center at ${p}_{se}$ and radius $3{\mu}_{fg}\sigma $. The points ${P}_{sori}$ within the circular window are shown in Figure 5. Then, we obtain another point set ${P}_{gori}={f}_{gs}^{-1}({P}_{sori})$, as shown in Figure 6. The point set ${P}_{gori}$ is Delaunay triangulated into the triangle set ${S}_{tris}$, as shown in Figure 6:
Each triangle in ${S}_{tris}$ is calculated individually. Let us define two vectors $\overrightarrow{{G}_{1}{G}_{3}}$ and $\overrightarrow{{G}_{1}{G}_{2}}$ to represent the two edges of the triangle (cf. Figure 7). We have
$$\overrightarrow{{G}_{1}{G}_{3}}\times \overrightarrow{{G}_{1}{G}_{2}}=\left(\begin{array}{ccc}\overrightarrow{{\alpha}_{1}}& \overrightarrow{{\alpha}_{2}}& \overrightarrow{{\alpha}_{3}}\\ {g}_{1}& {g}_{2}& {g}_{3}\\ {h}_{1}& {h}_{2}& {h}_{3}\end{array}\right)={\left({n}_{{\alpha}_{1}},{n}_{{\alpha}_{2}},{n}_{{\alpha}_{3}}\right)}^{T}=\overrightarrow{{n}_{tri}}$$
We use the incenter of the triangle as its location. The incenter is the center of the inscribed circle and must be located in the triangle. Compared with the circumcenter and other representations, the incenter is more representative of the location of the triangle. The incenter ${O}_{tri}$ is obtained by
$$\overrightarrow{{G}_{2}{G}_{3}}\times \overrightarrow{{O}_{tri}{G}_{1}}+\overrightarrow{{G}_{1}{G}_{3}}\times \overrightarrow{{O}_{tri}{G}_{2}}+\overrightarrow{{G}_{1}{G}_{2}}\times \overrightarrow{{O}_{tri}{G}_{3}}=\overrightarrow{0}$$
After we have obtained the normal and location of each triangle, we calculate the dominant orientation based on this information. Unlike SIFT, we compute the area of the triangle instead of the gradient magnitudes by using
where A is the area of a triangle.
$$A={\scriptscriptstyle \frac{1}{2}}\Vert \overrightarrow{{n}_{tri}}\Vert $$
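The normal and area of each triangle reduce to one cross product; a short sketch:

```python
import numpy as np

def triangle_normal_area(G1, G2, G3):
    """Normal n_tri = G1G3 x G1G2 and area A = ||n_tri|| / 2, which
    TriSIFT uses in place of SIFT's gradient orientation and magnitude."""
    n = np.cross(G3 - G1, G2 - G1)
    return n, 0.5 * np.linalg.norm(n)

# A right triangle with legs 3 and 4 has area 6.
n, A = triangle_normal_area(np.zeros(3),
                            np.array([3.0, 0.0, 0.0]),
                            np.array([0.0, 4.0, 0.0]))
```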
We denote the point ${O}_{tri}(\theta ,\phi ,g)$ as the location of a triangle and obtain ${O}_{stri}={f}_{gs}({O}_{tri})$, as shown in Figure 8. In the Cartesian coordinate system $\alpha $ (the basis is $\left({\overrightarrow{\alpha}}_{1},{\overrightarrow{\alpha}}_{2},{\overrightarrow{\alpha}}_{3}\right)$), the coordinates of ${O}_{stri}$ are
$${O}_{stri}={\left({k}_{1},{k}_{2},{k}_{3}\right)}^{T}={(\mathrm{cos}\phi \mathrm{sin}\theta ,\mathrm{sin}\phi \mathrm{sin}\theta ,\mathrm{cos}\theta )}^{T}$$
To facilitate the computation of the orientation of the triangle, we convert the basis from $\left({\overrightarrow{\alpha}}_{1},{\overrightarrow{\alpha}}_{2},{\overrightarrow{\alpha}}_{3}\right)$ to $\left({\overrightarrow{\beta}}_{1},{\overrightarrow{\beta}}_{2},{\overrightarrow{\beta}}_{3}\right)$, where ${\overrightarrow{\beta}}_{3}$ is the unit vector of $\overrightarrow{OP}$, ${\overrightarrow{\beta}}_{2}$ is the unit tangent vector along the meridian through ${O}_{stri}$, and ${\overrightarrow{\beta}}_{1}$ is determined by ${\overrightarrow{\beta}}_{2}$ and ${\overrightarrow{\beta}}_{3}$ according to the right-hand rule,
where $\alpha $ is the basis of the original coordinate system, and $\beta $ is the basis of the new coordinate system. Because $\alpha $ is the identity matrix, we compute the transition matrix ${T}_{\alpha \beta}$ using
$$\alpha =\left({\overrightarrow{\alpha}}_{1},{\overrightarrow{\alpha}}_{2},{\overrightarrow{\alpha}}_{3}\right)={I}_{n}$$
$$\begin{array}{l}{\overrightarrow{\beta}}_{20}={k}_{1}{\overrightarrow{\alpha}}_{1}+{k}_{2}{\overrightarrow{\alpha}}_{2}+{k}_{3}{\overrightarrow{\alpha}}_{3},\\ {\overrightarrow{\beta}}_{30}=-{k}_{1}{\overrightarrow{\alpha}}_{1}-{k}_{2}{\overrightarrow{\alpha}}_{2}+{\scriptscriptstyle \frac{{\mathrm{sin}}^{2}\theta}{\mathrm{cos}\theta}}{\overrightarrow{\alpha}}_{3},\\ {\overrightarrow{\beta}}_{10}={\overrightarrow{\beta}}_{20}\times {\overrightarrow{\beta}}_{30}\end{array}$$
$$\beta =\left({\overrightarrow{\beta}}_{1},{\overrightarrow{\beta}}_{2},{\overrightarrow{\beta}}_{3}\right)=\left({\scriptscriptstyle \frac{{\overrightarrow{\beta}}_{10}}{\Vert {\overrightarrow{\beta}}_{10}\Vert}},{\scriptscriptstyle \frac{{\overrightarrow{\beta}}_{20}}{\Vert {\overrightarrow{\beta}}_{20}\Vert}},{\scriptscriptstyle \frac{{\overrightarrow{\beta}}_{30}}{\Vert {\overrightarrow{\beta}}_{30}\Vert}}\right)$$
$$\beta =\alpha {T}_{\alpha \beta}\Rightarrow {T}_{\alpha \beta}={\alpha}^{1}\beta =\beta $$
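The change of basis can be sketched as follows. Note that this sketch uses the standard radial/meridian tangent frame at a sphere point; the unnormalized vectors above span the same directions (one of them is a scalar multiple of the meridian tangent), so after normalization the frames agree up to sign.

```python
import numpy as np

def transition_matrix(theta, phi):
    """Orthonormal frame at a unit-sphere point O_stri(theta, phi):
    beta3 radial (along OP), beta2 tangent to the meridian,
    beta1 = beta2 x beta3 (right-hand rule).  Columns of the returned
    matrix are the beta vectors, i.e. T_alpha_beta when alpha = I."""
    beta3 = np.array([np.cos(phi) * np.sin(theta),
                      np.sin(phi) * np.sin(theta),
                      np.cos(theta)])
    beta2 = np.array([np.cos(theta) * np.cos(phi),
                      np.cos(theta) * np.sin(phi),
                      -np.sin(theta)])          # d(beta3)/d(theta)
    beta1 = np.cross(beta2, beta3)
    return np.column_stack([beta1, beta2, beta3])

T = transition_matrix(0.7, 1.2)  # orthonormal, right-handed
```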
According to ${T}_{\alpha \beta}$, we can convert $\overrightarrow{{n}_{tri}}$ to $\overrightarrow{{n}_{\beta tri}}$ into the $\beta $ coordinate system
$$\overrightarrow{{n}_{\beta tri}}={({n}_{\beta 1},{n}_{\beta 2},{n}_{\beta 3})}^{T}={T}_{\alpha \beta}\overrightarrow{{n}_{tri}}$$
We define the orientation of the triangle by
$${\varphi}_{ori}=\mathrm{arctan}\left(\frac{{n}_{\beta 2}}{{n}_{\beta 1}}\right)$$
The area of each triangle is added to the histogram after being weighted by a Gaussian centered on the keypoint, with a standard deviation of 1.5 times the scale of the keypoint.
Finally, once the histogram has been computed, the dominant orientation is determined. Any bin whose value is greater than 0.8 times that of the largest bin is also retained, which can result in multiple dominant orientations for the same point.
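The multi-peak selection can be sketched as follows; the 36-bin histogram resolution is an assumption borrowed from standard SIFT, not a value stated here.

```python
import numpy as np

def dominant_orientations(angles, weights, nbins=36):
    """Weighted orientation histogram; every bin reaching 80% of the
    peak contributes a dominant orientation, so one keypoint may get several."""
    hist, edges = np.histogram(angles, bins=nbins,
                               range=(-np.pi, np.pi), weights=weights)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[hist >= 0.8 * hist.max()]

# Ten samples near 0.1 rad dominate a single outlier at 2.0 rad.
doms = dominant_orientations(np.array([0.1] * 10 + [2.0]), np.ones(11))
```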
The pseudocode for the computation of the dominant orientation is presented as Algorithm 1.
Algorithm 1 Algorithm for the computation of the dominant orientation 

4.4. The Descriptor Construction
The descriptors of the considered keypoints are computed using their corresponding dominant orientations as reference. This descriptor is a threedimensional histogram of orientations (two spatial dimensions and one dimension for orientations) in which all the orientations are considered with respect to the dominant orientation.
We set a keypoint ${p}_{se}$ as the center, with a circular window of radius $r=3{\mu}_{fg}\sigma $ to compute the descriptors. All points within the window are represented by the point set ${P}_{sdes}$. To achieve invariance to rotation, ${P}_{sdes}$ is rotated by the angle of the dominant orientation. As shown in Figure 8, we define the dominant orientation in the $\beta $ coordinate system, and convert the coordinates of all points in the window from the $\alpha $ to the $\beta $ coordinate system. After the rotation, the coordinate of ${p}_{sdes\beta}^{\prime}$ is converted to ${p}_{sdes}^{\prime}$ in the original $\alpha $ coordinate system. The calculation process is given below
where ${p}_{sdes}\in {P}_{sdes}$,${p}_{sdes}^{\prime}\in {P}_{sdes}^{\prime}$.
$${p}_{sdes\beta}={({\mathrm{k}}_{1},{\mathrm{k}}_{2},{\mathrm{k}}_{3})}^{T}={T}_{\alpha \beta}{p}_{sdes}$$
$${p}_{sdes\beta}^{\prime}=\left(\begin{array}{ccc}\mathrm{cos}{\varphi}_{ori}& -\mathrm{sin}{\varphi}_{ori}& 0\\ \mathrm{sin}{\varphi}_{ori}& \mathrm{cos}{\varphi}_{ori}& 0\\ 0& 0& 1\end{array}\right){p}_{sdes\beta}$$
$${p}_{sdes}^{\prime}={T}_{\alpha \beta}^{1}{p}_{sdes\beta}^{\prime}$$
With the new point set ${P}_{sdes}^{\prime}$, we triangulate the point set ${f}_{gs}^{-1}\left({P}_{sdes}^{\prime}\right)$ and obtain the set of triangles. The method for computing the location and orientation of each triangle is described in Section 4.3. The form of our descriptor is an extension of GLOH, whose histogram has 17 × 8 bins (17 for the spatial dimensions and 8 for the orientations). Because the descriptor is constructed on the sphere, a square window such as the one in SIFT is difficult to select, which complicates the calculation. GLOH [19] is based on a circular window, which is easy to handle on the sphere, and thus its performance is better than SIFT's.
Our descriptor is computed on a log-polar location grid with 3 bins in the radial direction (radii set to 0.2, 0.5, and 1.0 of the original radius) and 8 bins in the angular direction, which results in 17 location bins; the central bin is not divided in the angular direction, to avoid sudden changes near the center of the window. The orientations are quantized into 8 bins. Each bin value is the weighted sum of the surface areas of the triangles, triangulated from the set of points inside the window, at the spatial location and orientation defined by the bin. The weight is defined by a Gaussian centered on the keypoint with a standard deviation of 1.5σ.
To avoid boundary effects, each area sample is distributed by linear interpolation into adjacent histogram bins. The resulting histogram is normalized, each bin is clipped at a threshold of 0.2, and the histogram is normalized again to make it robust to contrast changes. The algorithm for computing the Local Spherical Descriptor is summarized in Algorithm 2.
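The final normalize-clip-renormalize step reads, in a few lines:

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Unit-normalize, clip each bin at 0.2, then renormalize, giving
    robustness to contrast changes (as in SIFT/GLOH)."""
    v = vec / np.linalg.norm(vec)
    v = np.minimum(v, clip)
    return v / np.linalg.norm(v)

# A 17 x 8 = 136-bin histogram with one dominant bin.
h = np.full(136, 0.1)
h[0] = 10.0
d = normalize_descriptor(h)  # dominant bin is damped, norm is 1
```

Clipping prevents a single large bin (e.g., from a strong nonlinear illumination change) from dominating the descriptor distance.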
Algorithm 2 Algorithm for the computation of LSD 

5. Experiment
Without loss of generality, the experimental image data contain various camera configurations: scaling, translation, affine transformation, and varying degrees of distortion. To test the matching performance on fisheye images, TriSIFT is compared with the standard SIFT algorithm, rectSIFT (image rectification before applying SIFT), and RDSIFT (radial distortion SIFT) [10].
Figure 9 shows the experimental panoramic image pairs. In Figure 9a, the left and right images of each pair differ in scale. In Figure 9b, the left and right images differ by a translation and are captured by the same camera at the same scale. In Figure 9c, the left and right images differ by an affine transformation and are captured by the same camera. We matched the images in Figure 9 with the standard SIFT algorithm, rectSIFT, RDSIFT, and TriSIFT in turn. We removed false matches using the RANSAC (Random Sample Consensus) algorithm to obtain the correct match points, and plotted the 1-precision versus recall curves of the four algorithms, as shown in Figure 10. Comparing the curves, the TriSIFT algorithm shows generally good performance across the distortion levels and the various changes in scale, translation, and affine transformation. The greater the distortion, the worse the matching results; among the four methods, the proposed method is the least affected by the degree of distortion, while standard SIFT is the most seriously affected. The standard SIFT algorithm can obtain more points at a 10% degree of distortion. However, without any compensation for the distortion in fisheye images, its performance decreases dramatically when the degree of distortion exceeds 20%. The RDSIFT algorithm performs better at 10% and 20% distortion, but as the distortion increases further its performance degrades, although it remains better than standard SIFT and worse than rectSIFT.
The proposed TriSIFT algorithm is superior to rectSIFT at small degrees of distortion (10% and 20%), but inferior to the RDSIFT algorithm there. However, when fewer match points are available, the proposed method shows better matching performance. In TriSIFT, the per-point calculation is replaced by a per-triangle calculation; the method is thus more adaptable in regions of large distortion and can obtain more initial and accurate match points.
The influence of various poses and orientations on the algorithms is shown in Table 1 and Figure 11. Table 1 lists the resulting matches of standard SIFT, rectSIFT, RDSIFT, and TriSIFT for various changes in camera pose (near-far, translation, affine) at a distortion level of 20%. In Table 1, the initial match is the keypoint matching without the RANSAC algorithm, and the correct match is the keypoint matching after RANSAC; because there are some mismatched keypoints, the initial match contains more keypoints than the correct match. Compared with SIFT, TriSIFT improves the matching performance by 24.5%, 12.1%, and 10.6% under scaling, translation, and affine transformation, respectively. Although RDSIFT obtains the highest number of initial matches, its correct matches are fewer than TriSIFT's, and both the initial and correct match counts of our method far exceed those of standard SIFT and rectSIFT. Since the correct match count reflects the matching performance more directly, and our method obtains the most correct matches, it achieves the best matching performance. From the data shown in Figure 11, comparing standard SIFT and TriSIFT, the match points obtained by standard SIFT are distributed around the center of the image, where distortion is negligible. The distortion and scale change on the image periphery are remarkable, which makes the standard SIFT algorithm unsuitable for the peripheral area. When the images undergo translation and affine distortion simultaneously, matching becomes more complicated. However, because TriSIFT accounts for distortion, the match points it obtains can be distributed anywhere in the images, as shown in Figure 11b.
6. Conclusions
In this study, we investigated the problem of matching feature points in fisheye images. We proposed a triangulation-based detection and matching algorithm for fisheye images that incorporates the camera's imaging model to eliminate the impact of distortion. This paper has demonstrated how radial distortion degrades the performance of the original SIFT algorithm. We then proposed a method that replaces the gradient with the area of the surface in a region, so that the number of keypoints is invariant to orientation, and that uses the Delaunay triangulation to calculate the orientation and the descriptor. The experiments validate the robustness of the proposed method to distortion and demonstrate its high efficiency in matching feature points in heavily distorted areas. Compared with the SIFT algorithm, rect-SIFT, and RD-SIFT, the proposed method achieves the best matching performance. In addition, TriSIFT can be applied to robot vision tasks, 3D reconstruction from panoramic images, and other matching tasks involving large distortion.
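The back-projection to the unit sphere that underlies the method can be sketched under the common equidistant fisheye model $r = f\theta$; the calibrated model used in practice may differ, and the parameter names below are illustrative.

```python
import numpy as np

def fisheye_to_sphere(u, v, cx, cy, f):
    """Back-project pixel (u, v) to the unit sphere under the
    equidistant fisheye model r = f * theta (one common model)."""
    du, dv = u - cx, v - cy
    r = np.hypot(du, dv)        # radial distance from the principal point
    theta = r / f               # angle from the optical axis
    phi = np.arctan2(dv, du)    # azimuth around the optical axis
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
```

The principal point maps to the sphere's pole $(0, 0, 1)$, and every pixel maps to a unit vector, so subsequent triangulation and area computations are carried out on an undistorted surface.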
However, the proposed method cannot correctly match image pairs with large illumination differences. Keypoints in images captured from the ground and images captured from the air, which differ by a large affine transformation, also cannot be matched well; this kind of matching problem requires additional information to guide the keypoint correspondence. Thus, matching aerial images under large affine transformations is the key future work, with applications in driving automation, detailed 3D reconstruction, and related tasks.
Author Contributions
Data curation, D.L.; Methodology, J.J.; Project administration, E.W.; Resources, J.T.; Software, J.Y.; Visualization, J.J.; Writing – original draft, E.W.; Writing – review & editing, J.J.
Funding
This research was funded by the Natural Science Foundation of China, grant number 61401455. The authors also thank the Youth Innovation Promotion Association CAS for its support.
Conflicts of Interest
The authors declare no conflict of interest.
Figure 2. Test images containing varying degrees of distortion: 10%, 20%, 30%, and 40%, from left to right.
Figure 3. Detection estimation of the keypoints using the test images. The Y-axis shows the number of keypoints.
Figure 5. The point cloud. The red points lie in the circular window. The left figure shows the point set ${P}_{sori}$; the right figure shows the point set ${P}_{gori}$. The X-, Y-, and Z-axes represent $r\sin\theta\cos\varphi$, $r\sin\theta\sin\varphi$, and $r\cos\theta$, respectively.
Figure 6. Result of triangulating the point set ${P}_{gori}$. The X-, Y-, and Z-axes represent $g\sin\theta\cos\varphi$, $g\sin\theta\sin\varphi$, and $g\cos\theta$, respectively.
Figure 9. Image pairs used in the experiment. The numbers in the left column are degrees of distortion in percent: (a) scaled image pairs; (b) translated image pairs; (c) affine-transformed image pairs.
Figure 10. 1-precision vs. recall curves of the standard SIFT, rect-SIFT, RD-SIFT, and TriSIFT algorithms.
Table 1. Numbers of initial and correct matches for each algorithm under scale, translation, and affine changes (20% distortion).

              SIFT               rect-SIFT          RD-SIFT            TriSIFT
              Initial  Correct   Initial  Correct   Initial  Correct   Initial  Correct
Scale         170      153       238      198       251      209       243      216
Translation   97       83        102      91        116      89        108      98
Affine        114      102       128      106       151      119       136      125
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).