Tri-SIFT: A Triangulation-Based Detection and Matching Algorithm for Fish-Eye Images

Keypoint matching is of fundamental importance in computer vision applications. Fish-eye lenses are convenient in applications that require a very wide angle of view; however, their use has been limited by the lack of an effective matching algorithm. The Scale Invariant Feature Transform (SIFT) algorithm is an important technique for detecting and describing local features in images. We therefore present Tri-SIFT, a set of modifications to the SIFT algorithm that improves descriptor accuracy and matching performance for fish-eye images while preserving SIFT's robustness to scale and rotation. After SIFT keypoint detection, the points in and around each keypoint are back-projected to a unit sphere following a fish-eye camera model. To simplify computation on the sphere, the descriptor is based on a modification of the Gradient Location and Orientation Histogram (GLOH). In addition, to improve invariance to scale and rotation in fish-eye images, the gradient magnitudes are replaced by surface areas, and the orientation is calculated on the sphere. Extensive experiments demonstrate that the modified algorithm outperforms SIFT and other related algorithms on fish-eye images.


Introduction
Visual feature extraction and matching are among the most basic and challenging problems in computer vision and optical engineering applications. Many applications are built on visual feature matching, such as robotic navigation, image stitching, 3D modeling, gesture recognition, and video tracking. In many of these applications, cameras with unconventional lenses and nonlinear projection offer numerous advantages over regular cameras. A camera equipped with micro-lenses or a borescope enables visual inspection of cavities that are difficult to access [1], whereas a camera equipped with a fish-eye lens acquires wide field-of-view (FOV) images that provide thorough visual coverage of the environment. A wide FOV also improves ego-motion estimation by avoiding the ambiguity between translational and rotational motion [2,3].
However, visual feature matching algorithms designed for perspective images cannot handle the strong radial distortion introduced by fish-eye optics [4–7]. A fish-eye camera covers the entire hemispherical field in front of it, with view angles ranging from 0° to 180°. Because a hemispherical field of view cannot be projected onto a finite image plane through a perspective projection, fish-eye lenses obey other projection models. The fish-eye model therefore differs from the common pinhole camera model, and the inherent distortion of a fish-eye lens cannot be described by that model alone [8]. As a consequence of this lens design and the wide viewing angle, fish-eye images suffer from large radial distortion, and the apparent scale of image structures changes with their location in the image.
In this paper, we propose the Tri-SIFT feature matching method to overcome the radial distortion of fish-eye cameras. We demonstrate how radial distortion affects the performance of the original Scale Invariant Feature Transform (SIFT) algorithm and propose a set of modifications that improve matching effectiveness. We describe the method in detail and present a thorough analysis and experimental validation.
Specifically, we propose a triangulation-based detection and matching algorithm that incorporates the camera's imaging model to eliminate the impact of distortion. This improves robustness to distortion and increases the efficiency of feature point matching in strongly distorted regions.
Section 2 reviews related work. Section 3 briefly introduces the SIFT algorithm. Section 4 describes the proposed Tri-SIFT algorithm. Section 5 presents and discusses the experimental results. Finally, Section 6 summarizes the features of the proposed algorithm.

Related Work
SIFT is a computer vision algorithm for extracting and describing local features in images [9]. It extracts features that remain stable when images are resized or rotated. Owing to its Gaussian scale-space representation, SIFT is robust to changes in image scale and to noise, and it is also partially robust to viewpoint and lighting changes. This performance has made SIFT one of the most widely used feature extraction algorithms.
Recently, several algorithms for keypoint detection and matching in fish-eye images have been proposed [5–7,10,11]. In a series of studies [6,7], Hansen, Corke, and Boles proposed using stereographic projection to approximate diffusion on a sphere; in their methods, SIFT was modified for images with significant distortion. In [10], Lourenço et al. proposed adaptive Gaussian filtering to correct the SIFT algorithm. This method detects keypoints by looking for extrema in a scale-space representation obtained with a kernel that adapts to the distortion at each pixel position. It also achieves descriptor invariance to radial distortion (RD) through implicit gradient correction using the Jacobian of the distortion function. In [11], Denny et al. described a method to photogrammetrically estimate the intrinsic and extrinsic parameters of fish-eye cameras, with the aim of providing a rectified image for scene-viewing applications. While some works simply ignore the pernicious effects of radial distortion and apply the original algorithm directly to distorted images [12], others first correct the distortion through image rectification and then apply SIFT [13]. The latter approach is straightforward, but it has two major drawbacks: explicit distortion correction can be computationally expensive for large frames, and, more importantly, the interpolation required by image rectification introduces artifacts that reduce detection repeatability.

SIFT Algorithm Theory
In this section, we briefly introduce the SIFT algorithm, which inspired Tri-SIFT. The major steps of the SIFT algorithm are detecting extrema in scale space, localizing features, assigning a dominant orientation to each feature point, and building the feature descriptor. For an image I(x, y), the scale space of the two-dimensional image is obtained by convolving it with the Gaussian kernel

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)} \tag{1}$$

where σ is the width parameter, which controls the radial extent of the function.
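The scale-space construction from Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The 3σ kernel truncation, the edge-replicating border, the base scale σ₀ = 1.6 (Lowe's common default) and the factor k = √2 are all assumptions. The helper names `gaussian_kernel`, `convolve_same` and `scale_space` are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y, sigma) of Eq. (1) on a square grid, truncated at `radius`."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # 3-sigma truncation is an assumption
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # renormalize so the truncated kernel sums to 1


def convolve_same(image, kernel):
    """'Same'-size 2-D convolution with edge replication, written without SciPy."""
    r = kernel.shape[0] // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), r, mode="edge")
    flipped = kernel[::-1, ::-1]
    out = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def scale_space(image, sigma0=1.6, k=2 ** 0.5, levels=4):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y) for sigma = sigma0 * k**i."""
    return [convolve_same(image, gaussian_kernel(sigma0 * k**i)) for i in range(levels)]
```

Renormalizing the truncated kernel keeps a constant image constant at every level. This is a cheap sanity check that the convolution is correct.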
SIFT determines feature points by detecting local extrema in a Difference of Gaussian (DoG) scale space, which yields distinctive and stable feature points. The DoG operator is the difference between two Gaussian-smoothed images at nearby scales, D(x, y, σ) = L(x, y, kσ) − L(x, y, σ), where L(x, y, σ) is the image convolved with G(x, y, σ) and k is the scale factor. The DoG approximates the scale-normalized Laplacian of Gaussian (LoG), σ²∇²G. Each candidate point is compared with its 26 neighbors (8 in the same scale and 9 in each of the two adjacent scales) to ensure that it is a local extremum in both scale space and two-dimensional image space. SIFT then fits a three-dimensional quadratic function to refine the location and scale of each feature point to sub-pixel accuracy. In addition, SIFT discards low-contrast points and unstable edge responses to improve matching stability and suppress noise. By assigning a dominant orientation to every feature point and describing the point relative to that orientation, the descriptor achieves rotation invariance. The gradient magnitude m(x, y) and orientation θ(x, y) of the smoothed image L(x, y) are computed from differences between neighboring pixels.
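The sketch below covers three pieces: the DoG, the 26-neighbor extremum test, and the pixel-difference gradient used for orientation assignment. It assumes the Gaussian-blurred levels L(x, y, σ) are supplied as a list of equally sized NumPy arrays. The function names are hypothetical. Sub-pixel refinement and the contrast and edge tests are omitted.

```python
import numpy as np


def difference_of_gaussians(levels):
    """D(x, y, sigma_i) = L(x, y, k * sigma_i) - L(x, y, sigma_i) for consecutive levels."""
    return [upper - lower for lower, upper in zip(levels[:-1], levels[1:])]


def is_scale_space_extremum(dog, s, y, x):
    """Compare D at (s, y, x) with its 26 neighbors: 8 in the same level, 9 above, 9 below.
    Assumes (s, y, x) is an interior position of the DoG stack."""
    cube = np.stack([level[y - 1:y + 2, x - 1:x + 2] for level in dog[s - 1:s + 2]])
    center = cube[1, 1, 1]
    others = np.delete(cube.ravel(), 13)  # flat index 13 is the center of the 3x3x3 cube
    return bool(np.all(center > others) or np.all(center < others))


def gradient_magnitude_orientation(L, y, x):
    """Pixel-difference gradient of a blurred image L at an interior pixel (y, x)."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    return float(np.hypot(dx, dy)), float(np.arctan2(dy, dx))
```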
Finally, a 16 × 16 neighborhood window around the feature point is taken from the rotated image and divided evenly into 4 × 4 sub-regions. In every sub-region, an eight-bin gradient orientation histogram is computed by accumulating the gradient magnitudes into the orientation bins. The resulting descriptor is a 4 × 4 × 8 = 128-dimensional vector, which SIFT normalizes to reduce the effect of illumination changes.
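The pooling step can be illustrated with a simplified sketch. It assumes the 16 × 16 patch of gradient magnitudes and orientations has already been rotated to the dominant orientation. Lowe's Gaussian weighting and trilinear interpolation are deliberately left out. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(magnitude, orientation):
    """Pool a 16x16 patch (already rotated to the dominant orientation) into
    4x4 cells x 8 orientation bins = 128 values, then L2-normalize."""
    if magnitude.shape != (16, 16) or orientation.shape != (16, 16):
        raise ValueError("expected 16x16 magnitude and orientation patches")
    hist = np.zeros((4, 4, 8))
    # Quantize orientations in [0, 2*pi) into 8 bins of 45 degrees each.
    bins = (np.mod(orientation, 2 * np.pi) / (2 * np.pi) * 8).astype(int) % 8
    for y in range(16):
        for x in range(16):
            hist[y // 4, x // 4, bins[y, x]] += magnitude[y, x]
    vec = hist.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```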

Tri-SIFT Algorithm
We propose the Tri-SIFT algorithm, an extension of SIFT for fish-eye images. The first stage of Tri-SIFT searches for keypoints over all scales and image locations. It is implemented efficiently by using a DoG function to identify potential interest points that are invariant to scale and orientation. At each candidate location, a detailed model is fitted to determine the location and scale, and keypoints are selected based on their stability. To match points extracted from different fish-eye images, a Local Spherical Descriptor (LSD) is computed at each keypoint on the surface of a unit sphere. The descriptor uses the spherical representation of the image and consists of a set of orientation histograms over the region around the given point. The size of the region depends on the scale σ at which the point was detected. The magnitude used in the LSD is the area of each triangle obtained by triangulating the points in a circular area surrounding the keypoint. The orientation is given by the normal of the plane determined by the triangle's three vertices.
In this section, we first introduce back-projection and Delaunay triangulation, and then describe how the dominant orientation is computed and how the descriptor is constructed.

Back-Projection
The nonlinear projection of a fish-eye lens compresses image structures nonuniformly, which degrades SIFT matching performance. The conventional remedy is to rectify the fish-eye image by explicitly correcting the distortion and then apply classical SIFT to the rectified image [14,15]. This solution is straightforward, but correcting distortion by image resampling requires reconstructing the signal from the original discrete image. Some high-frequency components cannot be recovered (e.g., because of low resolution and aliasing), and reconstruction filters are imperfect. These errors degrade the descriptor and thus decrease the accuracy of keypoint matching.
In this paper, we propose a model-based approach that maps the fish-eye image back to the incident rays, that is, to the state in which light from the physical world passes through the camera lens. As shown in Figure 1, the projection process of the spherical model for an omnidirectional camera can be divided into two steps [16–18]. Consider a point P(X, Y, Z) in space. In the first step, the point is linearly projected along the incident ray to a point p′ on the unit sphere. Here θ is the angle between the incident ray oP and the principal axis z_c, and r is the distance between the image point and the principal point. In the second step, p′ is non-linearly projected to a point p on the image plane XOY. Several mathematical models describe this second step, such as the following polynomial formulation.
where k_1 plays the role of the focal length and k_2 is the distortion coefficient; r, as above, is the radial (polar) distance of the image point from the principal point. X and Y are the image coordinates, and x_c and y_c are the pixel coordinates of the principal point. ϕ is the angle between the X-axis and the radial line passing through the image point p. m_u and m_v are the two scale factors giving the number of pixels per unit distance in the horizontal and vertical directions; they must be known beforehand. The mapping between the point P in space and the image point p is reversible, and the inversion can be performed using (5). The detailed process is reported in [16].
Information 2018, 9, x FOR PEER REVIEW 4 of 16
Using the camera model, the fish-eye image can be back-projected to the unit sphere, as if the sensor lay on the surface of the fish-eye lens. The scene is then projected linearly onto this sphere, so that scale on the sphere is proportional to scale in the real world. As a result, the light is no longer projected non-linearly onto the sensor plane, the associated distortion is removed, and the fish-eye image becomes a back-projected image.
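The exact form of the second-step polynomial is not written out in this section. The sketch below therefore assumes the two-term form r = k₁θ + k₂θ³, together with the pixel mapping u = m_u r cos ϕ + x_c and v = m_v r sin ϕ + y_c. Under that assumption, back-projecting a pixel means recovering r and ϕ and then solving the cubic for θ; here this is done with Newton's method. The sketch also assumes k₁ > 0 and a k₂ small enough that r(θ) increases over the field of view. All parameter values and function names are hypothetical.

```python
import math


def project(theta, phi, k1, k2, mu, mv, xc, yc):
    """Assumed forward model: sphere angles (theta, phi) -> pixel (u, v)."""
    r = k1 * theta + k2 * theta**3
    return mu * r * math.cos(phi) + xc, mv * r * math.sin(phi) + yc


def back_project(u, v, k1, k2, mu, mv, xc, yc, iterations=20):
    """Invert the assumed model: pixel (u, v) -> (theta, phi) and a unit vector."""
    x = (u - xc) / mu
    y = (v - yc) / mv
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    theta = r / k1  # initial guess from the linear (equidistant) term
    for _ in range(iterations):
        # Newton's method on f(theta) = k1*theta + k2*theta^3 - r
        f = k1 * theta + k2 * theta**3 - r
        theta -= f / (k1 + 3 * k2 * theta**2)
    unit = (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
    return theta, phi, unit
```

The returned unit vector uses the same (cos ϕ sin θ, sin ϕ sin θ, cos θ) parameterization of the sphere that appears in the orientation computation.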

Triangulation
If a region of fixed size on the back-projected image is used to compute the keypoint orientation or the descriptor, the number of pixels in the region varies as the region moves. In theory, the number of pixels decreases as the polar angle θ increases, which makes the feature points rotation-variant. Figure 2 shows four panoramic test images with distortion degrees of 10%, 20%, 30%, and 40%, respectively. In Figure 3, ground-truth keypoints are detected by SIFT in the undistorted images. The keypoints detected in the four distorted test images are then compared with this ground truth to evaluate detection.
- Repetition is the percentage of keypoints detected in both the ground truth and the test image.
- New detection counts keypoints detected in the test image but not in the ground truth.
- Wrong detection counts points incorrectly detected as keypoints in the test image.

Figure 4 shows the keypoint matching results between the four test images and the ground truth, with recall and precision calculated point by point. As Figures 2–4 show, this asymmetry introduces significant changes in the gradient histogram. These changes in turn affect the orientation and descriptor of the keypoints and make them harder to match. In Tri-SIFT, we therefore accumulate the surface area within a region instead of the gradient, so that a structure's contribution to the histogram does not depend on how many pixels sample it. The image is described in three-dimensional space using the horizontal pixel axis, the vertical pixel axis, and the grayscale value as coordinates. Assume that a slope with a fixed orientation has gradient A.
If the slope covers 5 pixels, the histogram bin at that location and orientation accumulates 5A; if it covers 10 pixels, the bin accumulates 10A. This is a significant difference for the same structure. The area of the slope, however, is fixed: irrespective of the number of pixels on the slope, the sum of the triangle areas is the same.
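The argument can be checked numerically. The sketch samples one planar slope z = A·x over the unit square on n × n grids of two densities. The sum of per-sample gradient magnitudes grows with the number of samples, whereas the total area of the triangulated surface stays at √(1 + A²). The regular grid, split into two triangles per cell, is a simplification that stands in for the Delaunay triangulation. `slope_statistics` is a hypothetical helper.

```python
import numpy as np


def slope_statistics(gradient, n):
    """Sample the plane z = gradient * x on an n x n grid over the unit square.
    Returns (sum of per-sample gradient magnitudes, area of the triangulated surface)."""
    xs = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs)
    Z = gradient * X
    gradient_sum = abs(gradient) * X.size  # every sample sees the same gradient
    area = 0.0
    for i in range(n - 1):
        for j in range(n - 1):
            corners = [np.array([X[a, b], Y[a, b], Z[a, b]])
                       for a, b in ((i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1))]
            # Split each grid cell into two triangles and add their areas.
            for p, q, s in ((corners[0], corners[1], corners[3]),
                            (corners[0], corners[3], corners[2])):
                area += 0.5 * np.linalg.norm(np.cross(q - p, s - p))
    return gradient_sum, area
```

With A = 2, doubling the grid resolution quadruples the gradient sum. The area stays at √5 in both cases.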
To calculate the surface area, we triangulate the point set P using Delaunay triangulation, which has a time complexity of O(n log n). The Delaunay triangulation of a set of points P in a plane is a triangulation DT(P) such that no point of P lies inside the circumcircle of any triangle in DT(P), as shown in (6), where V is the set of vertices and E is the set of edges between them. Delaunay triangulations maximize the minimum angle over all triangles in the triangulation and thus tend to avoid skinny triangles.
$$DT(P) = (V, E) \tag{6}$$

We use Delaunay triangulation to calculate the orientation in Section 4.3 and the descriptor in Section 4.4.
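The empty-circumcircle definition can be made concrete with a brute-force reference. It keeps every triangle whose circumcircle contains no other input point. This O(n⁴) sketch only illustrates the definition for points in general position. A practical implementation would use an O(n log n) method, for example the Qhull-based `scipy.spatial.Delaunay`. The helper names are hypothetical.

```python
import itertools

import numpy as np


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d, eps=1e-12):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append([dx, dy, dx * dx + dy * dy])
    det = np.linalg.det(np.array(rows))
    # The incircle determinant is positive for an interior d when abc is counter-clockwise.
    return det * np.sign(orientation(a, b, c)) > eps


def brute_force_delaunay(points):
    """Return index triples whose circumcircle contains no other point.
    O(n^4); assumes general position (no four points on a common circle)."""
    pts = [np.asarray(p, dtype=float) for p in points]
    triangles = []
    for i, j, k in itertools.combinations(range(len(pts)), 3):
        a, b, c = pts[i], pts[j], pts[k]
        if abs(orientation(a, b, c)) < 1e-12:
            continue  # collinear triple, not a triangle
        others = (pts[m] for m in range(len(pts)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles
```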

Orientation
In Tri-SIFT, we do not calculate the gradient at each point in the region to determine the dominant orientation or the descriptor. Instead, the region is Delaunay-triangulated, and each triangle contributes one sample. The gradient magnitude and orientation are replaced by the area and the normal of the triangle, respectively.
The sphere on which the image lies is a compact manifold of constant positive curvature. After the image has been back-projected to the sphere, each point, in spherical coordinates, is a three-dimensional vector. We define the point set on the sphere as P_s, with elements written as p_s = (θ, ϕ, 1), and another point set P_g containing the corresponding points on the image plane. The elements of P_g and P_s are related by

$$p_g = f_{gs}^{-1}(p_s) \tag{10}$$

where p_s ∈ P_s and p_g ∈ P_g. Then, for each keypoint on the sphere, we compute the orientations of the surrounding points within a circular region of radius 3σ centered at the keypoint, where σ is the scale at which the keypoint was detected. To define this region, we need the distance between two points on the unit sphere, p_s1 ≡ (θ_1, ϕ_1) and p_s2 ≡ (θ_2, ϕ_2). This distance can be obtained using Vincenty's formula. With Δϕ = ϕ_2 − ϕ_1, the angular distance Δσ is

$$\Delta\sigma = \arctan\frac{\sqrt{(\sin\theta_2 \sin\Delta\phi)^2 + (\sin\theta_1 \cos\theta_2 - \cos\theta_1 \sin\theta_2 \cos\Delta\phi)^2}}{\cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 \cos\Delta\phi} \tag{11}$$

where the arctangent is evaluated with its two-argument form. Because the sphere has unit radius, the distance between the two points is

$$d = \Delta\sigma \tag{12}$$

When points on the plane are back-projected to the unit sphere, the distance between two points on the plane differs from the distance between their images on the sphere. In a fish-eye image with radial distortion, the spherical distance between two adjacent pixels near the principal point differs from that near the image edges. According to the fish-eye camera model, the distortion at the principal point is almost zero. Therefore, at the same scale, we back-project two adjacent pixels at the principal point to the sphere and compute their distance μ_fg using (11) and (12). Taking this value as the unit of measurement on the sphere, we convert a distance on the plane into a distance on the sphere using

$$\mathrm{dis}_{\mathrm{sphere}} = \mu_{fg}\, \mathrm{dis}_{\mathrm{flat}} \tag{13}$$

where dis_sphere is the distance between the two points on the sphere and dis_flat is the distance between the two points on the plane.
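Equations (11) to (13) can be sketched as follows. `angular_distance` evaluates Vincenty's formula in the polar-angle/azimuth convention used here, with `atan2` resolving the quadrant. Computing μ_fg requires some back-projection. The sketch assumes a hypothetical equidistant camera (r = fθ, i.e. k₂ = 0 and m_u = m_v = 1) with a made-up focal length and principal point, so μ_fg comes out as 1/f. All function names are hypothetical.

```python
import math


def angular_distance(theta1, phi1, theta2, phi2):
    """Vincenty's formula on the unit sphere (theta = polar angle, phi = azimuth)."""
    dphi = phi2 - phi1
    numerator = math.hypot(
        math.sin(theta2) * math.sin(dphi),
        math.sin(theta1) * math.cos(theta2)
        - math.cos(theta1) * math.sin(theta2) * math.cos(dphi),
    )
    denominator = (math.cos(theta1) * math.cos(theta2)
                   + math.sin(theta1) * math.sin(theta2) * math.cos(dphi))
    return math.atan2(numerator, denominator)  # unit radius: distance == angle


def equidistant_back_project(u, v, f=300.0, xc=640.0, yc=480.0):
    """Hypothetical camera: r = f * theta (k2 = 0, m_u = m_v = 1)."""
    x, y = u - xc, v - yc
    return math.hypot(x, y) / f, math.atan2(y, x)


def unit_length_mu(back_project, xc, yc):
    """mu_fg: spherical distance between two adjacent pixels at the principal point."""
    theta1, phi1 = back_project(xc, yc)
    theta2, phi2 = back_project(xc + 1, yc)
    return angular_distance(theta1, phi1, theta2, phi2)
```

With μ_fg in hand, the window radius on the sphere for a keypoint at scale σ is 3 · μ_fg · σ, following Eq. (13).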
Assuming that p_se is a keypoint in P_s, we select a circular window centered at p_se with radius 3μ_fg σ. The points P_sori within this circular window are shown in Figure 5. We then obtain another point set P_gori = f_gs^{-1}(P_sori), as shown in Figure 6, and Delaunay-triangulate P_gori into the triangle set S_tris, also shown in Figure 6.

Each triangle in S_tris is processed individually. Let the vectors $\overrightarrow{G_1G_3}$ and $\overrightarrow{G_1G_2}$ represent two edges of the triangle (cf. Figure 7). The normal of the triangle is given by their cross product:

$$\vec{n} = \overrightarrow{G_1G_3} \times \overrightarrow{G_1G_2}$$

We use the incenter of the triangle as its location. The incenter is the center of the inscribed circle and always lies inside the triangle. Compared with the circumcenter and other choices, it is therefore more representative of the triangle's location. The incenter O_tri is obtained by

$$O_{tri} = \frac{a\,G_1 + b\,G_2 + c\,G_3}{a + b + c}$$

where a, b, and c are the lengths of the sides opposite G_1, G_2, and G_3, respectively. After obtaining the normal and location of each triangle, we calculate the dominant orientation from this information. Unlike SIFT, we use the area of the triangle instead of the gradient magnitude:

$$A = \frac{1}{2}\left|\overrightarrow{G_1G_2} \times \overrightarrow{G_1G_3}\right|$$

where A is the area of the triangle. A point on the unit sphere with polar angle θ and azimuth ϕ has Cartesian coordinates (cos ϕ sin θ, sin ϕ sin θ, cos θ). To facilitate the computation of the triangle orientation, we change the basis from the original coordinate system, with basis α, to a new coordinate system attached to the keypoint, with basis β. Because α is the identity matrix, the transition matrix T_αβ is simply the matrix whose columns are the β basis vectors. We then define the orientation of the triangle from its normal expressed in the β basis. The area of each triangle is added to the orientation histogram after being weighted by a Gaussian centered on the keypoint, with a standard deviation of 1.5 times the keypoint's scale.
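The per-triangle quantities can be sketched as below, using standard geometry:
- the normal is the cross product of the two edge vectors;
- the area is half the norm of that cross product;
- the incenter is the side-length-weighted average of the vertices.

Two choices in the sketch are assumptions, since this section does not spell out the orientation equation. One is β as the local frame (e_θ, e_ϕ, e_r) at the keypoint. The other is the definition of orientation as the angle of the normal's tangent-plane component in that frame. All function names are hypothetical.

```python
import numpy as np


def triangle_features(g1, g2, g3):
    """Normal, area and incenter of the 3-D triangle (G1, G2, G3)."""
    g1, g2, g3 = (np.asarray(g, dtype=float) for g in (g1, g2, g3))
    cross = np.cross(g3 - g1, g2 - g1)  # G1G3 x G1G2
    area = 0.5 * np.linalg.norm(cross)
    normal = cross / np.linalg.norm(cross)
    a = np.linalg.norm(g3 - g2)  # side opposite G1
    b = np.linalg.norm(g3 - g1)  # side opposite G2
    c = np.linalg.norm(g2 - g1)  # side opposite G3
    incenter = (a * g1 + b * g2 + c * g3) / (a + b + c)
    return normal, area, incenter


def local_basis(theta, phi):
    """Hypothetical beta: columns e_theta, e_phi, e_r at the sphere point (theta, phi)."""
    e_r = np.array([np.cos(phi) * np.sin(theta), np.sin(phi) * np.sin(theta), np.cos(theta)])
    e_theta = np.array([np.cos(phi) * np.cos(theta), np.sin(phi) * np.cos(theta), -np.sin(theta)])
    e_phi = np.array([-np.sin(phi), np.cos(phi), 0.0])
    return np.column_stack([e_theta, e_phi, e_r])


def orientation_in_frame(normal, T):
    """Angle of the normal's tangent-plane component, measured in the beta frame."""
    n_beta = T.T @ normal  # T is orthonormal, so its inverse is its transpose
    return float(np.arctan2(n_beta[1], n_beta[0]))
```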
Once the histogram has been computed, the dominant orientation is taken from its highest bin. Any other bin whose value is at least 0.8 times the highest bin is also used, so the same point may receive multiple dominant orientations.
The pseudocode for computing the dominant orientation is presented as Algorithm 1.
Algorithm 1. Computation of the dominant orientation
1. for each considered keypoint (x_i, y_i) do
2.     Select a circular region of radius 3μ_fg σ centered at (θ_i, ϕ_i, 1) and obtain the point set P_sori
5.     Obtain the point set P_gori = {p_g : p_g = f_gs^{-1}(p_s), p_s ∈ P_sori}
6.     Triangulate the point set P_gori
7.     bin ← compute the orientation and the area of each triangle, weighted by a Gaussian
8.     max ← maximum value in bin
9.     for each bin value ≥ 0.8 max do
10.        create a feature with the corresponding orientation
11.    end for
12. end for
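Steps 7 to 11 of Algorithm 1 can be sketched as a weighted histogram with a peak-ratio rule. The sketch makes three simplifying assumptions:
- it uses 36 orientation bins, which is SIFT's choice; the bin count is not stated here;
- the `sigma` argument is the keypoint scale already converted to spherical units, so the Gaussian's standard deviation is 1.5 times that value;
- it reports bin centers, without the parabolic peak interpolation that SIFT uses.

`dominant_orientations` is a hypothetical name.

```python
import numpy as np


def dominant_orientations(orientations, areas, distances, sigma, n_bins=36, peak_ratio=0.8):
    """orientations: triangle orientations (radians); areas: triangle areas;
    distances: spherical distance from each triangle's incenter to the keypoint."""
    orientations = np.asarray(orientations, dtype=float)
    # Area weighted by a Gaussian centered on the keypoint (std = 1.5 * sigma).
    weights = np.asarray(areas, dtype=float) * np.exp(
        -np.asarray(distances, dtype=float) ** 2 / (2 * (1.5 * sigma) ** 2))
    bins = (np.mod(orientations, 2 * np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=weights, minlength=n_bins)
    peak = hist.max()
    centers = (np.arange(n_bins) + 0.5) * 2 * np.pi / n_bins
    # Every bin reaching peak_ratio * max yields its own feature (steps 9-11).
    return [float(centers[i]) for i in range(n_bins)
            if hist[i] > 0 and hist[i] >= peak_ratio * peak]
```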

The Descriptor Construction
The descriptors of the considered keypoints are computed using their corresponding dominant orientations as reference.This descriptor is a three-dimensional histogram of orientations (two spatial dimensions and one dimension for orientations) in which all the orientations are considered with respect to the dominant orientation.
We take a keypoint p_se as the center of a circular window of radius r = 3μ_fg σ for computing the descriptor. All points within the window form the point set P_sdes. To achieve rotation invariance, P_sdes is rotated by the angle of the dominant orientation. As shown in Figure 8, we define the dominant orientation in the β coordinate system and convert the coordinates of all points in the window from the α to the β coordinate system. After the rotation, each point p_sdesβ is converted back to a point p′_sdes in the original α coordinate system, where p_sdes ∈ P_sdes and p′_sdes ∈ P′_sdes. With the new point set P′_sdes, we triangulate the point set f_gs^{-1}(P′_sdes) and obtain a set of triangles. The location and orientation of each triangle are computed as described in Section 4.3. Our descriptor is an extension of GLOH, whose histogram has 17 × 8 bins (17 bins for the spatial dimension and 8 bins for the orientations). Because the descriptor is constructed on the sphere, a square window like the one in SIFT is difficult to define and complicates the calculation. GLOH [19] is based on a circular window, which is easy to compute on the sphere, and it has been reported to outperform SIFT [19].
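In the text, P_sdes is rotated by the dominant orientation in three stages: change to the β frame, rotate, and change back. Assuming β's third axis is the keypoint's radial direction, this is equivalent to a single rotation about that axis. The sketch uses Rodrigues' rotation formula for the single rotation. This is a swapped-in technique rather than the paper's stated procedure. The function name is hypothetical, as is the sign convention (rotate by −φ_ori so that the dominant orientation maps to angle 0).

```python
import numpy as np


def rotate_about_keypoint(points, keypoint_unit, angle):
    """Rotate unit-sphere points about the keypoint's radial axis by -angle."""
    k = np.asarray(keypoint_unit, dtype=float)
    k = k / np.linalg.norm(k)
    P = np.atleast_2d(np.asarray(points, dtype=float))
    c, s = np.cos(-angle), np.sin(-angle)
    # Rodrigues: v' = v*c + (k x v)*s + k*(k . v)*(1 - c), applied row-wise.
    return P * c + np.cross(k, P) * s + np.outer(P @ k, k) * (1 - c)
```

The keypoint itself is a fixed point of the rotation, and all distances on the sphere are preserved. This is why the rotation can be applied before triangulation without changing the triangle areas.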
Our descriptor is computed on a log-polar location grid with 3 bins in the radial direction (radii of 0.2, 0.5, and 1.0 times the window radius) and 8 bins in the angular direction. The central bin is not divided in the angular direction, to avoid sudden changes in the location of the window, which results in 1 + 2 × 8 = 17 location bins. The gradient orientations are quantized into 8 bins. Each bin value is the weighted sum of the surface areas of the triangles that fall at the spatial location and orientation defined by the bin; the triangles are obtained by triangulating the set of points inside the window. The weight is given by a Gaussian centered on the keypoint with a standard deviation of 1.5σ.
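The 17 location bins and 8 orientation bins can be indexed as in the sketch below. It assumes the radius is normalized by the window radius 3μ_fg σ and that angles are measured in the rotated (β) frame. Linear interpolation into neighboring bins is omitted. The bin ordering (central disc first, then the inner ring, then the outer ring) is an arbitrary choice.

```python
import math

RADII = (0.2, 0.5, 1.0)  # fractions of the window radius, from the text
N_ANGULAR = 8


def location_bin(rho, angle):
    """Map (normalized radius, angle) to one of 17 log-polar location bins.
    Returns None outside the window; bin 0 is the undivided central disc."""
    if rho > RADII[2]:
        return None
    if rho <= RADII[0]:
        return 0
    ring = 0 if rho <= RADII[1] else 1
    sector = int((angle % (2 * math.pi)) / (2 * math.pi) * N_ANGULAR) % N_ANGULAR
    return 1 + ring * N_ANGULAR + sector


def histogram_index(loc_bin, orientation, n_ori=8):
    """Flat index into the 17 x 8 = 136-dimensional descriptor."""
    ori_bin = int((orientation % (2 * math.pi)) / (2 * math.pi) * n_ori) % n_ori
    return loc_bin * n_ori + ori_bin
```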
To avoid boundary effects, each area sample is distributed into adjacent histogram bins by linear interpolation. The resulting histogram is normalized, each value is clipped at a threshold of 0.2, and the histogram is normalized again to make it robust to contrast changes. The algorithm for computing the Local Spherical Descriptor is summarized in Algorithm 2.
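The normalize, clip and renormalize step can be sketched as follows. The 0.2 threshold comes from the text. The function name and the handling of an all-zero histogram are assumptions.

```python
import numpy as np


def normalize_descriptor(hist, clip=0.2):
    """L2-normalize, clip large components at `clip`, then L2-normalize again."""
    v = np.asarray(hist, dtype=float).ravel()
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # empty histogram: nothing to normalize
    v = np.minimum(v / norm, clip)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Clipping limits the influence of a few dominant bins, for example those produced by one large, high-contrast triangle, before the final renormalization.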
Algorithm 2. Computation of the LSD
1. for each considered keypoint (x_i, y_i) do
2.     P_sdes ← the points in a circular window (center p_se, radius 3μ_fg σ)
6.     Compute the transition matrix of p_se
7.     Rotate P_sdes into P′_sdes by the angle φ_ori
       Divide the window into 17 bins
10.    for each triangle do
11.        Compute the orientation and the location of the triangle
12.        Determine the location in the histogram using the angle, the orientation, and the distance
13.        bin ← area of the triangle weighted by a Gaussian
14.    end for
15.    Descriptor vector ← transform bin
16. end for

Experiment
To ensure generality, the experimental image data cover various camera configurations: scaling, translation, affine transformation, and varying degrees of distortion. To test matching performance on fish-eye images, Tri-SIFT is compared with the standard SIFT algorithm, rect-SIFT (image rectification followed by SIFT), and RD-SIFT (radial distortion SIFT) [10].
Figure 9 shows the panoramic image pairs used in the experiments. In Figure 9a, the left and right images of each pair differ in scale. In Figure 9b, the left and right images differ by a translation and are captured by the same camera at the same scale. In Figure 9c, the left and right images are captured by the same camera and differ by an affine change of viewing angle. We matched the images in Figure 9 with the standard SIFT algorithm, rect-SIFT, RD-SIFT, and Tri-SIFT in turn. We removed false matches using the RANSAC (Random Sample Consensus) algorithm and plotted the 1-precision versus recall curves of the four algorithms, as shown in Figure 10. Comparing the curves, Tri-SIFT performs well overall across distortion degrees and changes in scale, translation, and affine transformation. Matching results worsen as the degree of distortion increases. Among the four methods, the proposed method is the least affected by the degree of distortion, and standard SIFT is the most affected. Standard SIFT obtains more points at a 10% degree of distortion. However, without any compensation for fish-eye distortion, its performance decreases dramatically when the degree of distortion exceeds 20%. RD-SIFT performs better at 10% and 20% distortion. When the distortion increases further, its performance degrades, although it remains better than standard SIFT and worse than rect-SIFT. At small degrees of distortion (10% and 20%), Tri-SIFT outperforms rect-SIFT but is inferior to RD-SIFT. However, when fewer match points are available, the proposed method shows better matching performance. In Tri-SIFT, computation over points is replaced by computation over triangles. This makes the method more adaptable in strongly distorted regions and allows it to obtain more initial and accurate match points.
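For reference, a 1-precision versus recall curve is typically obtained by sweeping a threshold on descriptor distance. The sketch assumes the usual definitions, which this section does not state explicitly:
- recall = correct matches / ground-truth correspondences;
- 1-precision = false matches / all accepted matches.

The function name is hypothetical.

```python
import numpy as np


def precision_recall_curve(match_distances, is_correct, n_correspondences, thresholds):
    """Return a (1 - precision, recall) pair for each descriptor-distance threshold."""
    d = np.asarray(match_distances, dtype=float)
    ok = np.asarray(is_correct, dtype=bool)
    curve = []
    for t in thresholds:
        accepted = d <= t
        n_correct = int(np.count_nonzero(accepted & ok))
        n_false = int(np.count_nonzero(accepted & ~ok))
        n_accepted = n_correct + n_false
        recall = n_correct / n_correspondences
        one_minus_precision = n_false / n_accepted if n_accepted else 0.0
        curve.append((one_minus_precision, recall))
    return curve
```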

Experiment
To ensure generality, the experimental image data cover various camera configurations: scaling, translation, affine transformation, and varying degrees of distortion. To test the matching performance on fish-eye images, tri-SIFT is compared with the standard SIFT algorithm, rect-SIFT (image correction before applying the SIFT algorithm), and RD-SIFT (radial distortion SIFT) [10].

Figure 9 shows the panoramic image pairs used in the experiments. In Figure 9(a), the left and right images differ in scale, and each image pair consists of two images at different scales. In Figure 9(b), the left and right images differ by a translation; both are captured by the same camera at the same scale. In Figure 9(c), the left and right images, captured by the same camera, differ by an affine viewing angle. We matched the images in Figure 9 with the standard SIFT algorithm, rect-SIFT, RD-SIFT, and tri-SIFT in turn. False matches were removed with the RANSAC (Random Sample Consensus) algorithm to obtain the final match points, and the 1-precision versus recall curves of the four algorithms were plotted, as shown in Figure 10.

The curves show that tri-SIFT performs well overall across the degrees of distortion and across changes in scale, translation, and affine transformation. For all four methods, the matching results degrade as the degree of distortion increases; the proposed method is the least affected by distortion, and standard SIFT is the most affected. Standard SIFT obtains more points at a 10% degree of distortion. However, because it does not compensate for the distortion in fish-eye images, its performance drops sharply when the degree of distortion exceeds 20%. RD-SIFT performs better at 10% and 20% degrees of distortion; as the distortion increases further, its performance is less impressive, although it remains better than that of standard SIFT and worse than that of rect-SIFT. At small degrees of distortion (10% and 20%), tri-SIFT outperforms rect-SIFT but is inferior to RD-SIFT. However, when fewer match points are available, the proposed method shows better matching performance. Because tri-SIFT replaces calculations on points with calculations on triangles, it adapts better to regions of large distortion and obtains more initial and accurate match points.

The influence of the algorithms under various poses and orientations is shown in Table 1 and Figure 11. Table 1 lists the matches obtained by standard SIFT, rect-SIFT, RD-SIFT, and tri-SIFT for various changes in the camera pose (near-far, translation, affine) at a 20% degree of distortion. In Table 1, "initial match" denotes keypoint matches obtained without the RANSAC algorithm, and "correct match" denotes keypoint matches retained after RANSAC. Because some keypoints are mismatched, the initial match count is larger than the correct match count. Compared with SIFT, tri-SIFT improves the matching performance by 24.5%, 12.1%, and 10.6% under scaling, translation, and affine transformation, respectively. Under all three conditions, RD-SIFT obtains the highest number of initial matches, but its number of correct matches is lower than that of tri-SIFT. In addition, our method obtains considerably more initial and correct matches than standard SIFT and rect-SIFT. Since the number of correct matches reflects matching performance more directly, and our method obtains the most correct matches, it achieves the best matching performance of the four methods.

Figure 11 compares the results of standard SIFT and tri-SIFT. The match points obtained by standard SIFT are concentrated around the image center, where the distortion is negligible. On the image periphery, distortion and scale changes are pronounced, which makes standard SIFT unsuitable for the peripheral area. When the images undergo translation and affine distortion simultaneously, matching becomes more complicated. However, because tri-SIFT accounts for distortion, its match points can be distributed anywhere in the images, as shown in Figure 11b.
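The 1-precision versus recall curves in Figure 10 follow a widely used protocol for evaluating descriptor matching: recall is the number of correct matches divided by the number of ground-truth correspondences, and 1-precision is the number of false matches divided by all accepted matches. The sketch below computes such a curve by sweeping the descriptor-distance threshold. It assumes each putative match already carries a correctness label (for example from ground truth or a RANSAC inlier test) and that the number of correspondences is known; how the authors labelled matches is not specified here, and the function name `pr_curve` is hypothetical.

```python
from typing import List, Tuple


def pr_curve(matches: List[Tuple[float, bool]],
             num_correspondences: int) -> List[Tuple[float, float]]:
    """Return (1 - precision, recall) pairs for a sweep of distance thresholds.

    matches: (descriptor_distance, is_correct) for every putative match;
             is_correct is an assumed, externally supplied label.
    num_correspondences: number of ground-truth correspondences in the pair.
    """
    if num_correspondences <= 0:
        raise ValueError("num_correspondences must be positive")
    curve = []
    n_correct = n_false = 0
    # Accepting matches in order of increasing descriptor distance is
    # equivalent to loosening the matching threshold step by step.
    for _, is_correct in sorted(matches, key=lambda m: m[0]):
        if is_correct:
            n_correct += 1
        else:
            n_false += 1
        recall = n_correct / num_correspondences
        one_minus_precision = n_false / (n_correct + n_false)
        curve.append((one_minus_precision, recall))
    return curve


if __name__ == "__main__":
    # Toy data only; these are not results from the paper's experiments.
    toy = [(0.10, True), (0.20, True), (0.30, False), (0.40, True)]
    for point in pr_curve(toy, num_correspondences=5):
        print(point)
```

Plotting recall against 1-precision for each algorithm and each degree of distortion gives one curve per method, in the style of Figure 10.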

Conclusions
In this study, we investigated the problem of matching feature points in fish-eye images. We proposed a triangulation-based detection and matching algorithm for fish-eye images that incorporates the camera's imaging model to eliminate the impact of distortion. We demonstrated how radial distortion affects the performance of the original SIFT algorithm. We then proposed a method that computes the surface area over a region instead of the gradient, so that the number of keypoints is invariant to orientation, and that uses Delaunay triangulation to compute the orientation and the descriptor. The experiments validate the robustness of the proposed method to distortion and demonstrate its efficiency in matching feature points in areas of large distortion. Compared with the SIFT algorithm, rect-SIFT, and RD-SIFT, the proposed method achieves the best matching performance. In addition, Tri-SIFT can be applied to robot vision tasks, 3D reconstruction from panoramic images, and other matching tasks involving large distortion.
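The conclusions summarise two ingredients of the method: a surface area computed over a region takes the place of the gradient, and Delaunay triangles are used to compute the orientation. How a single triangle's orientation is defined is not given in this section, so the sketch below shows only one plausible reading, labelled as an assumption throughout. Each triangle votes, with its 3D area as the weight, at the angle of its centroid's offset from the keypoint measured in a tangent frame (e1, e2) at the keypoint. The votes go into a 36-bin histogram, the bin count SIFT uses for orientation assignment, and the strongest bin gives a dominant orientation. All function names and the voting rule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Area of the 3D triangle abc: half the norm of (b - a) x (c - a)."""
    n = _cross(_sub(b, a), _sub(c, a))
    return 0.5 * math.sqrt(_dot(n, n))


def orientation_histogram(keypoint: Vec3, e1: Vec3, e2: Vec3,
                          triangles: Sequence[Tuple[Vec3, Vec3, Vec3]],
                          bins: int = 36) -> List[float]:
    """Area-weighted orientation histogram around a keypoint (illustrative).

    e1, e2: orthonormal tangent vectors at the keypoint. Each triangle votes
    at the angle of its centroid's offset from the keypoint, measured in the
    (e1, e2) plane, with its area as the weight (in place of SIFT's gradient
    magnitude). This voting rule is an assumption, not the paper's definition.
    """
    hist = [0.0] * bins
    for a, b, c in triangles:
        centroid = ((a[0] + b[0] + c[0]) / 3.0,
                    (a[1] + b[1] + c[1]) / 3.0,
                    (a[2] + b[2] + c[2]) / 3.0)
        offset = _sub(centroid, keypoint)
        angle = math.atan2(_dot(offset, e2), _dot(offset, e1)) % (2.0 * math.pi)
        # The final % guards against angle rounding up to exactly 2*pi.
        k = int(angle / (2.0 * math.pi) * bins) % bins
        hist[k] += triangle_area(a, b, c)
    return hist


def dominant_orientation(hist: Sequence[float]) -> float:
    """Centre angle, in radians, of the histogram's strongest bin."""
    k = max(range(len(hist)), key=lambda i: hist[i])
    return (k + 0.5) * 2.0 * math.pi / len(hist)
```

Weighting by area means that large triangles dominate the vote, which mirrors the role that large gradient magnitudes play in standard SIFT.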
However, the proposed method cannot correctly match image pairs with large illumination changes. Keypoints in images captured from the ground and in images captured from the sky, which differ by a large affine transformation, cannot be matched well; this kind of matching problem requires additional information to guide keypoint matching. Matching images captured from the sky under large affine transformations is therefore the key direction for future work, and it would benefit applications such as autonomous driving and detailed 3D reconstruction.

Figure 1. Projection of a point.
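Figure 1 illustrates the projection of a point, and the method back-projects points in and around each keypoint to a unit sphere following a fish-eye camera model. That model is not given in this section, so the sketch below assumes an ideal equidistant fish-eye (image radius r = f·θ) with principal point (cx, cy) and focal length f in pixels and no further distortion terms, together with the spherical-to-Cartesian mapping used for the axes of Figure 5. The function names are hypothetical, and a calibrated camera would generally need a richer model.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spherical_to_cartesian(rho: float, theta: float, phi: float) -> Vec3:
    """(rho sin(theta) cos(phi), rho sin(theta) sin(phi), rho cos(theta))."""
    return (rho * math.sin(theta) * math.cos(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(theta))


def pixel_to_sphere(u: float, v: float, cx: float, cy: float, f: float) -> Vec3:
    """Back-project pixel (u, v) to a unit vector, assuming r = f * theta.

    (cx, cy) is the principal point and f the focal length in pixels; theta
    is the angle between the viewing ray and the optical (Z) axis.
    """
    x, y = u - cx, v - cy
    theta = math.hypot(x, y) / f   # equidistant model: radius = f * theta
    phi = math.atan2(y, x)         # a radial model preserves the azimuth
    return spherical_to_cartesian(1.0, theta, phi)


def sphere_to_pixel(p: Vec3, cx: float, cy: float, f: float) -> Tuple[float, float]:
    """Forward projection of a unit vector under the same equidistant model."""
    x, y, z = p
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)
    return cx + f * theta * math.cos(phi), cy + f * theta * math.sin(phi)
```

Under this model the principal point maps to the optical axis (0, 0, 1), and a pixel at radius f·π/2 from it maps onto the equator of the sphere, which is how a fish-eye lens reaches a 180° field of view.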

Information 2018, 9, x FOR PEER REVIEW 5 of 16
represents the wrong points that are detected as keypoints in the test images. Figure 4 shows the matching results of keypoints in the four test images against the ground truth. Recall and precision for the four test images are calculated point by point. As shown in Figures 2-4, the asymmetry introduces significant changes in the gradient histogram and consequently affects the orientation and the descriptor of the keypoints, which makes the keypoints more difficult to match.
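Recall and precision of the detected keypoints are computed point by point against the ground truth. One common way to do this is sketched below; the pixel tolerance `tol` and the greedy one-to-one assignment are our own assumptions, not details stated in the text. A detection counts as correct when an as-yet-unclaimed ground-truth keypoint lies within `tol` pixels, so duplicate detections of the same point are counted as errors.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def detection_precision_recall(detected: List[Point], ground_truth: List[Point],
                               tol: float = 2.0) -> Tuple[float, float]:
    """Point-by-point precision and recall of detected keypoints.

    A detection is a true positive when an unclaimed ground-truth keypoint
    lies within `tol` pixels; each ground-truth point can be claimed once.
    Detections are processed in the given order (greedy, not an optimal
    assignment), and `tol` is an illustrative default.
    """
    unclaimed = list(ground_truth)
    true_positives = 0
    for dx, dy in detected:
        best_index, best_dist = -1, tol
        for i, (gx, gy) in enumerate(unclaimed):
            dist = math.hypot(dx - gx, dy - gy)
            if dist <= best_dist:          # nearest unclaimed point within tol
                best_index, best_dist = i, dist
        if best_index >= 0:
            unclaimed.pop(best_index)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

An optimal one-to-one assignment (for example, Hungarian matching) could replace the greedy loop when detections are dense, but for sparse keypoints the two usually agree.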

Figure 3. Detection estimation of the keypoints using the test images. The y-axis is the number of keypoints.

Figure 5. The point cloud. The red points are in the circular window. The left figure shows the point set $P_{sori}$; the right figure shows the point set $P_{gori}$. The X-, Y-, and Z-axes represent $r\sin\theta\cos\phi$, $r\sin\theta\sin\phi$, and $r\cos\theta$, respectively.
between the two points on the plane, and $dis_{flat}$ is the distance between the two points on the sphere. Assuming that $p_{se}$ is a keypoint in $P_s$, we select a circular window centered at $p_{se}$ with radius $3\mu_{fg}\sigma$. The keypoints $P_{sori}$ within the circular window are shown in Figure 5. We then obtain another point set, shown in Figure 6. The point set $P_{gori}$ is Delaunay triangulated into the triangle set $S_{tris}$, as shown in Figure 6.
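The window selection and triangulation described above can be sketched for points that already lie on the unit sphere. The sketch makes two assumptions: the window radius is passed in directly as a great-circle (angular) distance, since how the radius 3μσ maps onto the sphere is not reproduced here, and the triangulation uses the convex-hull characterisation of the spherical Delaunay triangulation. Under that characterisation, a triangle is kept when no other point lies beyond the plane through its vertices on the side away from the sphere centre, so its circumscribed spherical cap is empty. This brute-force O(n^4) version is meant only for the few dozen points of one window and is not necessarily the authors' implementation; all names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def great_circle_distance(a: Vec3, b: Vec3) -> float:
    """Arc length between two unit vectors (numerically stable atan2 form)."""
    c = _cross(a, b)
    return math.atan2(math.sqrt(_dot(c, c)), _dot(a, b))


def circular_window(points: Sequence[Vec3], center: Vec3, radius: float) -> List[int]:
    """Indices of the unit vectors within angular distance `radius` of `center`."""
    return [i for i, p in enumerate(points) if great_circle_distance(p, center) <= radius]


def spherical_delaunay(points: Sequence[Vec3], eps: float = 1e-12) -> List[Tuple[int, int, int]]:
    """Brute-force spherical Delaunay triangulation of unit vectors in one hemisphere.

    A triangle is kept when the plane through its vertices has the sphere
    centre on its inner side and no other point lies beyond it (an empty
    circumscribed cap); these are the outward-facing convex-hull faces.
    Assumes general position (no four points on an empty common circle).
    """
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        n = _cross(_sub(b, a), _sub(c, a))
        d = _dot(n, a)
        if abs(d) < eps:          # degenerate, or plane through the centre
            continue
        if d < 0:                 # orient the normal away from the centre
            n, d = (-n[0], -n[1], -n[2]), -d
            j, k = k, j           # keep counter-clockwise order seen from outside
        if all(_dot(n, points[m]) <= d + eps
               for m in range(len(points)) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles
```

For a keypoint at the centre of a small ring of neighbours, the result is the expected fan of triangles around the keypoint. A production version would instead use an O(n log n) convex-hull routine.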

Here, one basis vector is the unit vector along $\vec{OP}$, another is the unit vector of the tangent vector lying in the meridian through $O_{stri}$, and the remaining basis vector is determined by these two according to the right-hand rule.
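The basis above consists of a unit vector along OP, a unit tangent lying in a meridian, and a third vector fixed by the right-hand rule. Assuming this is the standard local spherical basis at a point on the unit sphere (our reading of the definition, not confirmed by the text), it can be built from the point's polar angle θ and azimuth φ as sketched below. The ordering (e_θ, e_φ, e_r) and the function names are our choices.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(p: Vec3) -> Tuple[Vec3, Vec3, Vec3]:
    """Right-handed orthonormal frame (e_theta, e_phi, e_r) at point p.

    e_r is the unit vector along OP; e_theta is the unit tangent to the
    meridian through p (direction of increasing polar angle theta); e_phi is
    fixed by the right-hand rule as e_r x e_theta, so e_theta x e_phi = e_r.
    At the poles the meridian is not unique and atan2 picks phi = 0.
    """
    x, y, z = p
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e_r = (st * cp, st * sp, ct)
    e_theta = (ct * cp, ct * sp, -st)
    e_phi = cross(e_r, e_theta)  # equals (-sin(phi), cos(phi), 0)
    return e_theta, e_phi, e_r
```

Expressing a tangent vector v in this frame as (v·e_θ, v·e_φ) gives 2D coordinates in which an angle can be measured with atan2, which is one way a change of basis can simplify orientation computations on the sphere.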

Figure 8. α and β coordinate systems.
To facilitate the computation of the orientation of the triangle, we convert the basis from

Figure 9. Image pairs used in the experiment. The numbers in the left column are the degrees of distortion (in percent); (a) scaled image pairs; (b) translated image pairs; (c) affine-transformed image pairs.

Table 1. Matching results of the four algorithms.
