Article

FSD-BRIEF: A Distorted BRIEF Descriptor for Fisheye Image Based on Spherical Perspective Model

1
Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
2
The Department of Applied Mathematics, The University of Waterloo, Waterloo, ON N2L 3G1, Canada
3
Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, UK
*
Author to whom correspondence should be addressed.
Sensors 2021, 21(5), 1839; https://doi.org/10.3390/s21051839
Submission received: 9 January 2021 / Revised: 20 February 2021 / Accepted: 25 February 2021 / Published: 6 March 2021

Abstract
Fisheye images, with their far larger Field of View (FOV), suffer from severe radial distortion, so the associated image feature matching process cannot achieve its best performance if traditional feature descriptors are used. To address this challenge, this paper reports a novel distorted Binary Robust Independent Elementary Feature (BRIEF) descriptor for fisheye images based on a spherical perspective model. Firstly, the 3D gray centroid of feature points is designed, and the position and direction of the feature points on the spherical image are described by a constructed feature point attitude matrix. Then, based on the attitude matrix of feature points, the coordinate mapping relationship between the BRIEF descriptor template and the fisheye image is established to realize the computation associated with the distorted BRIEF descriptor. Four experiments are provided to test and verify the invariance and matching performance of the proposed descriptor for fisheye images. The experimental results show that the proposed descriptor works well for distortion invariance and can significantly improve the matching performance in fisheye images.

1. Introduction

For decades, feature detection and matching have been among the core areas of image processing in various applied fields, such as Visual Simultaneous Localization and Mapping (V-SLAM), Structure from Motion (SfM), Augmented Reality (AR), general image retrieval, image mosaicking, and image registration. Common features include the Scale Invariant Feature Transform (SIFT) [1], Speeded Up Robust Features (SURF) [2], BRIEF [3], Oriented FAST and Rotated BRIEF (ORB) [4], KAZE [5], and Binary Robust Invariant Scalable Keypoints (BRISK) [6], as well as their derivatives, such as Principal Component Analysis SIFT (PCA-SIFT) [7], Simplified SIFT (SSIFT) [8], and Accelerated-KAZE (AKAZE) [9]. Neural network based features have also been developed, such as L2-Net [10], HardNet [11], and AffNet [12]. These features are designed for pinhole images with little distortion and cannot achieve good performance for fisheye images with severe radial distortion.
Compared with a pinhole camera, a fisheye camera has a wide field of view (FoV), and the captured image contains more abundant information. This has led to the extensive adoption of fisheye cameras in robot navigation, visual monitoring, virtual reality, visual measurement, and 3D reconstruction. However, due to the severe radial distortion of the fisheye image, directly adopting common feature descriptors may lead to a significant reduction in matching performance.
In order to reduce the impact of distortion on feature matching performance, we propose a novel distorted BRIEF descriptor based on the spherical perspective model, named Fisheye Spherical Distorted BRIEF (FSD-BRIEF). Firstly, we propose a method based on the 3D gray centroid to determine the direction of each feature point in the spherical image. By constructing an attitude matrix for a feature point, the position and direction of the feature point in the spherical image can be described in a nonsingular form. In order to reduce the calculation error of the 3D gray centroid caused by the uneven distribution of pixels in the spherical image, a pixel density function is designed, which represents the degree of pixel density on the spherical surface by the area of the patch that each pixel of the fisheye image maps to on the sphere. We build an attitude coordinate system for each feature point and propose a coordinate mapping method to project the BRIEF descriptor template onto the fisheye image. The distortion form of the projected BRIEF template is consistent with the image distortion near the feature point, which protects the calculated BRIEF descriptor from the effect of the radial distortion in the fisheye image. The main contributions of the paper include:
  • A new pixel density function represented by the area of the spherical surface patch that each pixel of the fisheye image occupies;
  • A new method of determining the 3D gray centroid and the direction of feature points with pixel density function based on a spherical perspective model;
  • A new feature point attitude matrix, providing a nonsingular description for both the position and the direction of the feature point in the spherical image surface;
  • A novel descriptor template distortion method based on the spherical perspective model and the feature point attitude matrix.
The remainder of the paper is organized as follows. In Section 2, the related work on fisheye image point features is presented. In Section 3, the notation of the perspective model is briefly introduced. Section 4 describes the method of determining and expressing the direction of a feature point, followed by the method of calculating the FSD-BRIEF descriptor. In Section 5, experimental results are provided and the performance of the proposed FSD-BRIEF is tested and verified. Section 6 briefly summarizes the work. In Section 7, the future work is stated.

2. Related Work

By virtue of its front lens protruding in a parabolic shape, a fisheye camera has a large FoV whose angle of view approaches or even exceeds 180°. Although this characteristic maximizes the angle of view, it brings severe radial distortion to the captured image, leading to different scale factors for pixels at different positions of the image. Thus, traditional feature descriptors designed for planar images can fail to match in raw fisheye images [13,14].
Generally, the methods to extract descriptors in fisheye images can be divided into two main streams according to whether images are corrected or not: resampling and non-resampling approaches.
Resampling approaches [15,16,17] segment the FoV into several sub-FoVs and correct them based on a plane perspective model; feature descriptors can then be extracted and matched on the corrected sub-FoVs. Lin et al. [15] adopted a visual-inertial based UAV (Unmanned Aerial Vehicle) navigation system, where two sub-regions are sampled in the horizontal direction of the fisheye FoV to obtain two undistorted pinhole image fields covering a 180° horizontal FoV, but the upper and lower parts of the vertical FoV are discarded. Müller et al. [16] presented a robust visual-inertial odometry and time-efficient omni-directional 3D mapping system, where the FoV of each fisheye camera is divided into two piecewise pinhole fields so as to overcome the distortion; however, some parts near the edge of the FoV are wasted. Wang et al. [17] proposed a new real-time feature-based simultaneous localization and mapping system, where a fisheye image is projected onto five surfaces of a cube, and then descriptors are extracted on the unfolded surfaces of the cube. However, stretching distortion and seam distortion exist between surfaces; for example, a straight line becomes a broken line. Thus, in resampling approaches, the whole FoV of the fisheye image is difficult to utilize fully, and continuity between sub-FoVs cannot be guaranteed. In addition, due to the view geometry of the plane perspective model, there is a small stretching distortion at the edge of each sub-FoV.
Unlike the resampling approaches, which directly correct fisheye images into pinhole images, non-resampling approaches design descriptors that describe features in the fisheye image itself. For example, inspired by the planar SIFT framework [18,19,20], Arican et al. [21] designed a new scale-invariant omni-directional SIFT feature based on Riemannian geometry. Lourenco et al. [22] proposed the Spherical Radial Distortion SIFT (sRD-SIFT) feature, where the extraction of the feature and the calculation of the descriptor are designed based on the spherical perspective model and the raw fisheye image without resampling. However, the improved algorithms based on SIFT are generally time-consuming. Cruz-Mota et al. [23] and Hansen et al. [24] utilized spherical harmonic functions as basis functions to study the spectral analysis of spherical panoramic images. Since Gaussian filtering on the sphere can be realized as a diffusion process through the spherical Fourier transform, spherical harmonic functions are used to construct a scale space on the sphere. In theory, spherical harmonic functions can be used to maintain the invariance of the descriptors under changes of camera pose and position. However, the spherical harmonic function usually requires a large amount of computation and has an inherent bandwidth limitation. This greatly weakens the capability of dealing with large-scale matching problems and cannot meet the real-time requirements of many applications.
To improve the computation speed, Qiang et al. [25] proposed Spherical ORB (SPHORB), a binary spherical feature based on the ORB feature, which is the first binary descriptor for panoramic images based on a hexagonal geodesic grid. In essence, SPHORB is still a special resampling approach, which divides the spherical panoramic image into 20 regular triangle fields according to the shape of a regular icosahedron and aligns the pixels of adjacent regular triangles seamlessly. However, in the hexagonal geodesic grid, the image patches near the 12 vertices of the regular icosahedron are discarded due to the distortion of the pixel distribution pattern, resulting in 12 FoV holes occupying 1.4% of the total FoV.
Note that resampling the fisheye image based on a hexagonal geodesic grid can result in holes. To avoid this, Urban et al. [26] proposed a new distorted descriptor, called Masked Distorted BRIEF (mdBRIEF). Although this work distorts the descriptors to adapt to different image regions instead of correcting the distortion of the fisheye image, the direction angle of feature points is obtained in the raw fisheye image by calculating the gray centroid within a circular template, which is still affected by the fisheye image distortion. Furthermore, the descriptors are distorted excessively near the edge of the fisheye image, since the distortion is based on the plane perspective model.
Most recently, Pourian et al. [27] proposed an end-to-end framework to enhance the precision of descriptor matching between multiple wide-angle images. In their work, the global matching and the local matching of descriptors are combined in three stages. However, a new distortion at the edge of the corrected image is introduced when an equirectangular image transformation is employed in the global matching stage, lowering the performance of the framework.
In summary, a binary descriptor that can make use of the whole FoV and keep invariance at every position of the fisheye image has not yet been proposed. In order to avoid the FoV holes caused by resampling approaches and to reduce the excessive distortion of descriptors in large FoV images, in this paper we design a novel Fisheye Spherical Distorted BRIEF (FSD-BRIEF) descriptor, which is a distorted binary feature descriptor based on the spherical perspective model for fisheye images.

3. Fisheye Camera Model

In this paper, in order to ensure that the FoV of the fisheye image can be fully utilized without losing the performance of the feature descriptor, the new descriptor FSD-BRIEF is designed based on the spherical perspective model. Different from the plane perspective model, the projection surface is a unit sphere centered at the origin of the camera coordinate system, so as to ensure that the scale factors at every position on the projection surface are consistent. The spherical perspective model and its perspective projection relationship are shown in Figure 1. We define the camera coordinate system as $O_cX_cY_cZ_c$. The origin point $O_c$ is located at the optical center of the camera, the X-axis $O_cX_c$ points to the right along the long side of the imaging target surface, the Y-axis $O_cY_c$ points downward along the short side of the imaging target surface, and the Z-axis $O_cZ_c$ points to the front of the camera along the optical axis direction. In Figure 1, the space point P projects to a point on the spherical image surface, and the space line L projects to a great arc on the spherical image surface.
For a point P in three-dimensional space, define its coordinate vector in the camera coordinate system as:
$$\mathbf{P}_c = \begin{bmatrix} x & y & z \end{bmatrix}^T$$
The projection point of P in the fisheye image is p, and its pixel coordinates are expressed as follows:
$$\mathbf{p} = \begin{bmatrix} u & v \end{bmatrix}^T$$
In this paper, the Kannala-Brandt (KB4) model [28] is used as the fisheye camera model; its mathematical form is shown below:
$$\begin{aligned}
\theta &= \arctan2\left(\sqrt{x^2 + y^2},\, z\right) \\
\varphi &= \arctan2(y, x) \\
\theta_d &= \theta\,(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8) \\
u &= f_x\,\theta_d\cos\varphi + c_x \\
v &= f_y\,\theta_d\sin\varphi + c_y
\end{aligned}$$
where $f_x$ and $f_y$ are the horizontal and vertical focal lengths of the camera, $c_x$ and $c_y$ are the coordinates of the principal point of the camera, and $k_1$, $k_2$, $k_3$, $k_4$ are the distortion coefficients. $\theta$ is the FoV latitude angle, which represents the angle between the $O_cZ_c$ axis and the vector $\overrightarrow{O_cP}$. $\varphi$ is the FoV longitude angle, which denotes the angle between the $O_cX_c$ axis and the projection of $\overrightarrow{O_cP}$ onto the $X_cO_cY_c$ plane. $\theta_d$ is the angle $\theta$ as deflected by the fisheye lens, and arctan2 is the quadrant-aware version of the arctangent function.
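For concreteness, a minimal NumPy sketch of the KB4 forward projection in Equation (3) is given below; the function name `project_kb4` and the parameter ordering are our own illustration, not part of the paper.

```python
import numpy as np

def project_kb4(P_c, fx, fy, cx, cy, k1, k2, k3, k4):
    """Project a 3D point in camera coordinates onto the fisheye image (KB4 model, Equation (3))."""
    x, y, z = P_c
    theta = np.arctan2(np.sqrt(x * x + y * y), z)   # FoV latitude angle
    phi = np.arctan2(y, x)                          # FoV longitude angle
    theta_d = theta * (1 + k1 * theta**2 + k2 * theta**4 + k3 * theta**6 + k4 * theta**8)
    u = fx * theta_d * np.cos(phi) + cx
    v = fy * theta_d * np.sin(phi) + cy
    return np.array([u, v])
```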
Based on the spherical perspective model in Equation (3), let $\Pi$ denote its mapping function. The mapping from the point $\mathbf{P}_c$ to the pixel point p in the fisheye image can be expressed as:
$$\mathbf{p} = \Pi(\mathbf{P}_c)$$
The inverse mapping function of $\Pi$ is defined as $\Pi^{-1}$, which maps the pixel p to the point P on the spherical image surface as follows:
$$\mathbf{P}_c = \Pi^{-1}(\mathbf{p})$$
where $\mathbf{P}_c$ is the coordinate vector of point P in the camera coordinate system. Notice that $\|\mathbf{P}_c\| = \sqrt{x^2 + y^2 + z^2} = 1$.
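Since $\theta_d(\theta)$ is a ninth-order polynomial, $\Pi^{-1}$ has no closed form; a common way to evaluate it is to invert $\theta_d(\theta)$ numerically. The sketch below is our own helper under that assumption, using the same KB4 intrinsics as above and a few Newton iterations:

```python
import numpy as np

def unproject_kb4(p, fx, fy, cx, cy, k1, k2, k3, k4, iters=10):
    """Map a fisheye pixel back to a unit vector on the spherical image surface (Pi^-1, Equation (5))."""
    u, v = p
    mx, my = (u - cx) / fx, (v - cy) / fy
    theta_d = np.sqrt(mx * mx + my * my)
    phi = np.arctan2(my, mx)
    # Invert theta_d = theta * (1 + k1*theta^2 + ... + k4*theta^8) with Newton's method.
    theta = theta_d
    for _ in range(iters):
        f = theta * (1 + k1 * theta**2 + k2 * theta**4 + k3 * theta**6 + k4 * theta**8) - theta_d
        df = 1 + 3 * k1 * theta**2 + 5 * k2 * theta**4 + 7 * k3 * theta**6 + 9 * k4 * theta**8
        theta -= f / df
    s = np.sin(theta)
    return np.array([s * np.cos(phi), s * np.sin(phi), np.cos(theta)])  # ||P_c|| = 1 by construction
```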

4. FSD-BRIEF Descriptor

The procedure of extracting the FSD-BRIEF descriptor includes four steps, namely, pixel density function design, 3D gray centroid calculation, feature point attitude matrix construction, and FSD-BRIEF descriptor extraction. In the spherical perspective model, pixel densities are distributed unevenly, which lowers the effectiveness of descriptors. Thus, a pixel density function is first proposed to compute a distribution compensation for each pixel, so as to reduce the effect of uneven pixel distribution. Then, with the help of the pixel density function, a more accurate 3D gray centroid is designed to determine the direction of the FSD-BRIEF descriptor and keep its rotation invariance in the spherical perspective model. Next, we further devise a nonsingular form, a feature point attitude matrix, to represent the position and the direction of a feature point. Finally, based on the feature point attitude matrix, the FSD-BRIEF descriptor is extracted through a constructed coordinate mapping relation between the BRIEF template and the raw fisheye image.

4.1. Pixel Density Function Designing

In this section, by defining the pixel density function, the distribution density of pixels on the unit sphere surface is expressed numerically.
Assuming that a pixel p in a fisheye image occupies a small patch PIX_PATCH(p) of the corresponding unit sphere, the mathematical expression of PIX_PATCH(p) is given by:
$$\mathrm{PIX\_PATCH}(\mathbf{p}) = \left\{ \Pi^{-1}(\mathbf{p} + \Delta\mathbf{p}) \,\middle|\, \Delta\mathbf{p} = \begin{bmatrix}\Delta u & \Delta v\end{bmatrix}^T,\ -\tfrac{1}{2} < \Delta u < \tfrac{1}{2},\ -\tfrac{1}{2} < \Delta v < \tfrac{1}{2} \right\}$$
where $\Delta u$ and $\Delta v$ are coordinate offsets in the pixel coordinate system of the fisheye image. Obviously, the closer the point p is to its adjacent pixels, the smaller the area of the patch PIX_PATCH(p), which means that the pixel density at point p is higher.
Therefore, the pixel density function m(p) is defined as the area of the patch PIX_PATCH(p). To simplify the computation of the curved surface area, we assume that the patch is small enough to be approximated as a parallelogram, so the pixel density compensation m(p) can be computed by:
$$m(\mathbf{p}) = \begin{cases} \dfrac{1}{4}\left\| \left(\Pi^{-1}(\mathbf{p}+\Delta\mathbf{x}) - \Pi^{-1}(\mathbf{p}-\Delta\mathbf{x})\right) \times \left(\Pi^{-1}(\mathbf{p}+\Delta\mathbf{y}) - \Pi^{-1}(\mathbf{p}-\Delta\mathbf{y})\right) \right\|_2, & \mathbf{p} \in I \\ 0, & \mathbf{p} \notin I \end{cases}$$
where $\|\cdot\|_2$ denotes the L2 norm, and $\Delta\mathbf{x}$, $\Delta\mathbf{y}$ are the coordinate offsets:
$$\Delta\mathbf{x} = \begin{bmatrix}1 & 0\end{bmatrix}^T, \quad \Delta\mathbf{y} = \begin{bmatrix}0 & 1\end{bmatrix}^T$$
From Equation (7), the pixel density function m ( p ) of the whole FoV only depends on the mapping function Π of the spherical perspective model in Equation (4).
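A minimal sketch of Equation (7), assuming `unproject` is $\Pi^{-1}$ with the camera intrinsics already bound (for instance, a lambda wrapping the `unproject_kb4` helper sketched in Section 3):

```python
import numpy as np

def pixel_density(p, unproject):
    """Pixel density m(p): area of the spherical parallelogram spanned by central differences (Equation (7))."""
    dx, dy = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    a = unproject(p + dx) - unproject(p - dx)
    b = unproject(p + dy) - unproject(p - dy)
    return 0.25 * np.linalg.norm(np.cross(a, b))
```

For example, `pixel_density(np.array([320.0, 240.0]), lambda q: unproject_kb4(q, fx, fy, cx, cy, k1, k2, k3, k4))` evaluates the compensation at one pixel.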

4.2. 3D Gray Centroid Calculation

To determine the direction of the FSD-BRIEF descriptor, we propose a 3D gray centroid. Compared with the 2D gray centroid [13,14,26], the proposed 3D gray centroid is more accurate since it takes full advantage of the consistent scale factor of the spherical perspective model. The 3D gray centroid is calculated over a circular area on the unit spherical surface. Figure 2 illustrates the correspondence of this circular area between the unit spherical surface and the fisheye image plane. As shown in Figure 2, for a FAST (Features from Accelerated Segment Test) [29] feature point p, its projection point on the unit spherical surface is P, and its 3D gray centroid calculation area is the circular area PATCH_3D(P) centered at P. PATCH(p) is the projection of PATCH_3D(P) onto the fisheye image plane $O_PX_PY_P$. $\alpha$ is half of the apex angle of the cone formed by PATCH_3D(P) and the origin point $O_c$.
Note that the horizontal and vertical angular resolutions of the fisheye camera are approximately $f_x$ and $f_y$ (pixels per radian) in the KB4 model, and the values of $f_x$ and $f_y$ are often very close. In order to make the radius of the circular range cover a width of about 15 pixels while ensuring that $f_x$ and $f_y$ have the same mathematical status, the value of $\alpha$ in radians is selected as 15 divided by the arithmetic mean of $f_x$ and $f_y$, that is,
$$\alpha = \frac{15}{\frac{f_x + f_y}{2}} = \frac{30}{f_x + f_y}$$
Define the projection area PATCH(p) as:
$$\mathrm{PATCH}(\mathbf{p}) = \left\{ \mathbf{p} + \Delta\mathbf{p} \,\middle|\, \Pi^{-1}(\mathbf{p} + \Delta\mathbf{p}) \cdot \Pi^{-1}(\mathbf{p}) > \cos\alpha \right\}$$
where $\Delta\mathbf{p}$ is the offset from the pixel p to a pixel in the area PATCH(p) in the fisheye image plane. $\Pi^{-1}(\mathbf{p})$ is the position vector of P, the projection point of the pixel p on the unit sphere, and $\Pi^{-1}(\mathbf{p} + \Delta\mathbf{p})$ is the position vector of the projection point of the pixel $\mathbf{p} + \Delta\mathbf{p}$ on the unit sphere. The condition $\Pi^{-1}(\mathbf{p} + \Delta\mathbf{p}) \cdot \Pi^{-1}(\mathbf{p}) > \cos\alpha$ means that the angle between these two vectors is less than $\alpha$. The region PATCH(p) is thus the projection area of the region PATCH_3D(P) on the fisheye image.
The 3D gray centroid of the feature point p is defined as C, and $\mathbf{C}_c$ denotes the coordinate vector of C in the camera coordinate system. $\mathbf{C}_c$ is calculated as:
$$\mathbf{C}_c = \frac{\sum_{\mathbf{p}_k \in \mathrm{PATCH}(\mathbf{p})} \Pi^{-1}(\mathbf{p}_k)\, m(\mathbf{p}_k)\, I(\mathbf{p}_k)}{\sum_{\mathbf{p}_k \in \mathrm{PATCH}(\mathbf{p})} m(\mathbf{p}_k)\, I(\mathbf{p}_k)}$$
where $\mathbf{p}_k$ is a pixel in PATCH(p), $I(\mathbf{p}_k)$ is its gray value, $m(\mathbf{p}_k)$ is its pixel density function value, and $\Pi^{-1}(\mathbf{p}_k)$ is the 3D coordinate of its projection point on the unit sphere surface.
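The sketch below gathers Equations (10) and (11); it assumes a grayscale image array, reuses the `pixel_density` and `unproject` helpers sketched above, and uses an illustrative pixel search window `win` (the patch itself is defined by the angular test, not by this window):

```python
import numpy as np

def gray_centroid_3d(p, image, unproject, alpha, win=25):
    """3D gray centroid C_c of feature point p (Equation (11)), summed over PATCH(p) (Equation (10))."""
    P = unproject(p)
    cos_a = np.cos(alpha)
    num, den = np.zeros(3), 0.0
    h, w = image.shape[:2]
    u0, v0 = int(round(p[0])), int(round(p[1]))
    for v in range(max(0, v0 - win), min(h, v0 + win + 1)):
        for u in range(max(0, u0 - win), min(w, u0 + win + 1)):
            pk = np.array([float(u), float(v)])
            Pk = unproject(pk)
            if Pk @ P > cos_a:                      # pixel lies inside PATCH(p)
                wgt = pixel_density(pk, unproject) * float(image[v, u])
                num += Pk * wgt
                den += wgt
    return num / den
```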

4.3. Feature Point Attitude Matrix Construction

In order to avoid the singularity of the direction expression of feature points at the poles of the unit spherical surface [25], we propose a feature point attitude matrix, a nonsingular expression, to represent the position and the direction of a feature point. The feature point attitude coordinate system $O_bX_bY_bZ_b$ is shown in Figure 3. The origin point $O_b$ coincides with the origin point $O_c$ of the camera coordinate system. The Z-axis $O_bZ_b$ coincides with the vector $\mathbf{P}_c$. The Y-axis $O_bY_b$ is aligned with $\mathbf{P}_c \times \mathbf{C}_c$. The X-axis $O_bX_b$ direction is determined by the right-hand rule, and the X-axis is coplanar with the 3D gray centroid vector $\mathbf{C}_c$ and the position vector $\mathbf{P}_c$.
The coordinate transformation matrix $\mathbf{R}_{cb}$ from the feature point attitude coordinate system to the camera coordinate system can be obtained as follows:
$$\mathbf{R}_{cb} = \begin{bmatrix} \dfrac{\mathbf{C}_c - \dfrac{\mathbf{C}_c \cdot \mathbf{P}_c}{\|\mathbf{P}_c\|^2}\mathbf{P}_c}{\left\|\mathbf{C}_c - \dfrac{\mathbf{C}_c \cdot \mathbf{P}_c}{\|\mathbf{P}_c\|^2}\mathbf{P}_c\right\|} & \dfrac{\mathbf{P}_c \times \mathbf{C}_c}{\|\mathbf{P}_c \times \mathbf{C}_c\|} & \mathbf{P}_c \end{bmatrix}$$
The matrix $\mathbf{R}_{cb}$ is defined as the feature point attitude matrix.
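A compact sketch of Equation (12), assuming `P_c` and `C_c` are NumPy 3-vectors (with `P_c` a unit vector, as guaranteed in Section 3):

```python
import numpy as np

def attitude_matrix(P_c, C_c):
    """Feature point attitude matrix R_cb = [X_b | Y_b | Z_b] (Equation (12)).

    Z_b lies along P_c, Y_b along P_c x C_c, and X_b (the component of C_c
    orthogonal to P_c, normalized) completes the right-handed frame."""
    z = P_c / np.linalg.norm(P_c)
    x = C_c - (C_c @ z) * z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                 # equals (P_c x C_c) / ||P_c x C_c||
    return np.column_stack((x, y, z))
```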

4.4. FSD-BRIEF Descriptor Extraction

In this section, to enhance the distortion invariance of the descriptor in the fisheye image, FSD-BRIEF will be extracted by distorting the BRIEF template based on the constructed feature point attitude matrix so that its template can fit the distortion form of the adjacent area of the feature point.
At first, for a feature point, we define its square neighborhood region as a BRIEF template with a coordinate system $O_BX_BY_B$, whose origin $O_B$ is located at the feature point and whose coordinates range from −15 to 15, as shown in Figure 4. The green lines are the selected 256 groups of pixel pairs on the template.
Then, the defined BRIEF template is scaled to a certain extent and placed at the feature point as shown in Figure 5. To do so, the following three conditions must be satisfied:
  • The center point $O_B$ of the descriptor template coincides with the projection point P of the feature point p on the sphere. In other words, the coordinate of point $O_B$ in the feature point attitude coordinate system is $[0\ 0\ 1]^T$.
  • The directions of the $O_BX_B$ and $O_BY_B$ axes of the BRIEF template coordinate system are consistent with the directions of the $O_bX_b$ and $O_bY_b$ axes of the feature point attitude coordinate system.
  • There is a scale factor $\frac{\alpha}{15}$ between the coordinates in the BRIEF template coordinate system and the coordinates in the feature point attitude coordinate system.
Figure 6 shows a zoom-in of a local area along the direction of $O_bZ_b$ in Figure 5 at the feature point P. As shown in Figure 6, for a point P′ on the BRIEF template, its homogeneous coordinate vector in the $O_BX_BY_B$ coordinate system is $\mathbf{s}$, and its coordinate vector in the feature point attitude coordinate system is $\mathbf{P}'_b$. Then $\mathbf{P}'_b$ can be obtained by:
$$\mathbf{P}'_b = \mathbf{D}\mathbf{s}$$
where
$$\mathbf{D} = \mathrm{diag}\left(\frac{\alpha}{15}, \frac{\alpha}{15}, 1\right), \quad \mathbf{s} = \begin{bmatrix} s_x & s_y & 1 \end{bmatrix}^T$$
According to the law of 3D coordinate transformation, the coordinate vector $\mathbf{P}'_c$ of point P′ in the camera coordinate system can be calculated from $\mathbf{P}'_b$ by:
$$\mathbf{P}'_c = \mathbf{R}_{cb}\mathbf{P}'_b$$
where $\mathbf{R}_{cb}$ is the feature point attitude matrix.
The projection point p′ of P′ in the fisheye image can then be obtained by:
$$\mathbf{p}' = \Pi(\mathbf{P}'_c)$$
To sum up, for a feature point whose attitude matrix is $\mathbf{R}_{cb}$, the coordinate mapping relationship between a point $\mathbf{s}$ in the BRIEF template and its projection point p′ in the fisheye image is:
$$\mathbf{p}' = \Pi(\mathbf{R}_{cb}\mathbf{D}\mathbf{s})$$
According to Equation (17), the FSD-BRIEF descriptor of a feature point can be extracted by comparing the gray values at the projected positions of the template's pixel pairs in the fisheye image. Figure 7 shows a general view of the FSD-BRIEF descriptor. It is clear that the FSD-BRIEF template in the fisheye image changes with the position of the feature point, which ensures that the descriptor adapts to the different distortions in the fisheye image and achieves good distortion invariance.
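Putting the pieces together, the sketch below evaluates Equation (17) for each test point of the template and forms the 256-bit string. The pair layout (sx1, sy1, sx2, sy2), the intensity-test direction, and the omission of bounds checking and smoothing are our simplifications of standard BRIEF practice, not the authors' exact implementation:

```python
import numpy as np

def fsd_brief(image, pairs, R_cb, alpha, project):
    """Compute a 256-bit FSD-BRIEF descriptor via p' = Pi(R_cb * D * s) (Equation (17)).

    pairs:   (256, 4) array of BRIEF test pairs (sx1, sy1, sx2, sy2) with entries in [-15, 15]
    project: Pi with the camera intrinsics bound (e.g., a lambda wrapping project_kb4)"""
    D = np.diag([alpha / 15.0, alpha / 15.0, 1.0])
    desc = np.zeros(256, dtype=np.uint8)
    for k, (sx1, sy1, sx2, sy2) in enumerate(pairs):
        vals = []
        for sx, sy in ((sx1, sy1), (sx2, sy2)):
            s = np.array([sx, sy, 1.0])
            u, v = project(R_cb @ D @ s)            # distorted template point in the fisheye image
            vals.append(float(image[int(round(v)), int(round(u))]))
        desc[k] = 1 if vals[0] < vals[1] else 0     # standard BRIEF binary intensity test
    return desc
```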

5. Experimental Evaluation

In this section, we present four experiments that were used for evaluating the performance of the proposed method. Experiment 1 was an ablation experiment carried out on a virtual dataset, which was used to verify the contribution of pixel density function towards improving the solution accuracy of FSD-BRIEF orientation. Experiment 2 was also conducted on the virtual dataset, aiming to prove the invariance of FSD-BRIEF compared with three BRIEF-based descriptors. Experiment 3 and Experiment 4 were performed to evaluate the matching performance of FSD-BRIEF under (1) different camera motions on a real dataset, and (2) distortion conditions on sRD-SIFT dataset [22], respectively. The results of these two experiments were compared with those produced by five state-of-the-art features.

5.1. Experiment 1: The Contribution Evaluation of the Pixel Density Function to the Accuracy of Feature Point Orientation

Dataset: In this experiment, we investigated the contribution of the pixel density function to the accuracy of feature point orientation. In order to have an accurate ground truth for the direction of feature points, we produced a virtual dataset by simulating the projection of the first image of the Graffiti dataset [30], used as a test image, onto two virtual fisheye cameras with different intrinsic parameters. At first, $N_p$ feature points $\mathbf{p}_t^i\ (i = 1, 2, \ldots, N_p)$ were extracted in the test image. During the generation of the virtual dataset, the test image and a selected virtual fisheye camera were placed in the same virtual space. By placing the test image in different poses, we projected each feature point onto several selected positions in the fisheye image with different longitude angles $\varphi$ and latitude angles $\theta$. The relationship between the angles $\varphi$, $\theta$ and the pose of the test image is given in Appendix A. $\varphi$ takes $N_\varphi$ values and $\theta$ takes $N_\theta$ values. For each virtual fisheye camera, $N_p \times N_\varphi \times N_\theta$ test samples were generated. Each test sample consisted of a generated fisheye image $I(\varphi, \theta, \mathbf{p}_t^i)$, a corresponding feature point position $\mathbf{p}_c^i(\varphi, \theta, \mathbf{p}_t^i)$ in the fisheye image, and a ground truth feature point attitude matrix $\mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i)$. More details of the dataset are given in Appendix B.
Baseline: To verify the effectiveness of the pixel density function compensation proposed in this paper, we compared two algorithms, namely, the feature point attitude matrix computation part of FSD-BRIEF without the compensation (version 1) and with it (version 2). In version 1, the 3D gray centroid was calculated without the pixel density compensation term $m(\mathbf{p})$; that is, the gray centroid computation formula of version 1 is Equation (18). In version 2, we used Equation (11) to calculate the 3D gray centroids of feature points.
$$\mathbf{C}_c = \frac{\sum_{\mathbf{p}_k \in \mathrm{PATCH}(\mathbf{p})} \Pi^{-1}(\mathbf{p}_k)\, I(\mathbf{p}_k)}{\sum_{\mathbf{p}_k \in \mathrm{PATCH}(\mathbf{p})} I(\mathbf{p}_k)}$$
Fisheye cameras: In order to verify the contribution of the pixel density function under different FoV cameras, two virtual cameras with different FoVs were selected for this experiment. Table 1 shows the intrinsic parameters of the two cameras.
Figure 8 shows the pixel density function of the 170° FoV and 210° FoV cameras as a function of $\theta$. The curve for the 170° FoV camera decreases in the angle range 0–60° and increases in the range 60–80°, while the curve for the 210° FoV camera increases over the whole angle range of 0–90°.
Evaluation metrics: In the experimental verification process, the direction angle error of the feature point is used for quantitative evaluation. The direction angle error, denoted by $e(\varphi, \theta, \mathbf{p}_t^i)$, is illustrated in Figure 9, where $P^i(\varphi, \theta, \mathbf{p}_t^i)$ is the projection point of $\mathbf{p}_c^i(\varphi, \theta, \mathbf{p}_t^i)$ on the unit sphere surface. The coordinate system $O_b^*X_b^*Y_b^*Z_b^*$ is the feature point attitude coordinate system corresponding to the ground truth attitude matrix $\mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i)$, whilst $O_bX_bY_bZ_b$ is the one corresponding to the calculated attitude matrix $\mathbf{R}_{cb}^{i}(\varphi, \theta, \mathbf{p}_t^i)$. Note that $O_b^*X_b^*$ and $O_bX_b$ are the ground truth direction and the calculated direction of the feature point, respectively (see Section 4.3). The unit of $e(\varphi, \theta, \mathbf{p}_t^i)$ is degrees (°). Let $(O_b^*X_b^*)_c$ and $(O_bX_b)_c$ be the coordinates of the unit direction vectors corresponding to $O_b^*X_b^*$ and $O_bX_b$ in the camera coordinate system; then:
$$\begin{aligned} (O_b^*X_b^*)_c &= \mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i)\,\begin{bmatrix}1 & 0 & 0\end{bmatrix}^T \\ (O_bX_b)_c &= \mathbf{R}_{cb}^{i}(\varphi, \theta, \mathbf{p}_t^i)\,\begin{bmatrix}1 & 0 & 0\end{bmatrix}^T \\ \cos\left(\tfrac{\pi}{180}\, e(\varphi, \theta, \mathbf{p}_t^i)\right) &= \overrightarrow{O_bX_b} \cdot \overrightarrow{O_b^*X_b^*} = (O_bX_b)_c^T\,(O_b^*X_b^*)_c \end{aligned}$$
From Equation (19), we can obtain the expression of $e(\varphi, \theta, \mathbf{p}_t^i)$ as:
$$e(\varphi, \theta, \mathbf{p}_t^i) = \frac{180}{\pi}\arccos\left( \begin{bmatrix}1 & 0 & 0\end{bmatrix} \left(\mathbf{R}_{cb}^{i}(\varphi, \theta, \mathbf{p}_t^i)\right)^T \mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i) \begin{bmatrix}1 & 0 & 0\end{bmatrix}^T \right)$$
Note that values of $e(\varphi, \theta, \mathbf{p}_t^i)$ can be calculated from the experimental results indexed by $\varphi$ (FoV longitude angle), $\theta$ (FoV latitude angle), and $i$ (feature point index in the test image). For an ideal method, $e(\varphi, \theta, \mathbf{p}_t^i)$ is always zero, and the calculated direction of the feature point is consistent with the real direction. In fact, due to the influence of noise, the angle error $e(\varphi, \theta, \mathbf{p}_t^i)$ is not zero. In this experiment, the smaller the value of $e(\varphi, \theta, \mathbf{p}_t^i)$, the more accurate the calculated feature point direction.
In this study, the mean error $e_{mean}(\theta)$ and the standard deviation $e_{SD}(\theta)$ were used to evaluate the results of $e(\varphi, \theta, \mathbf{p}_t^i)$. $e_{mean}(\theta)$ measures the average error of the feature point direction over all points under the latitude angle $\theta$, and $e_{SD}(\theta)$ measures the dispersion of the $e(\varphi, \theta, \mathbf{p}_t^i)$ distribution under $\theta$. They are calculated as follows:
$$e_{mean}(\theta) = \frac{\sum_{\varphi}\sum_{i} e(\varphi, \theta, \mathbf{p}_t^i)}{N_\varphi N_p}, \quad e_{SD}(\theta) = \sqrt{\frac{\sum_{\varphi}\sum_{i} \left[e(\varphi, \theta, \mathbf{p}_t^i) - e_{mean}(\theta)\right]^2}{N_\varphi N_p}}$$
where $N_\varphi$ and $N_p$ are the numbers of $\varphi$ values and feature points, respectively. The smaller the $e_{mean}(\theta)$, the more accurate the feature point direction; the smaller the $e_{SD}(\theta)$, the more stable the result of the feature point direction.
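A minimal sketch of Equations (20) and (21), assuming the estimated and ground-truth attitude matrices are available as 3×3 NumPy arrays:

```python
import numpy as np

def direction_angle_error(R_est, R_gt):
    """Angle in degrees between the X-axes of the estimated and ground-truth attitude frames (Equation (20))."""
    e1 = np.array([1.0, 0.0, 0.0])
    c = np.clip(e1 @ R_est.T @ R_gt @ e1, -1.0, 1.0)   # dot product of the two first columns
    return np.degrees(np.arccos(c))

def mean_and_sd(errors):
    """Mean error and (population) standard deviation over all samples at one latitude angle (Equation (21))."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    sd = np.sqrt(((errors - mean) ** 2).mean())
    return mean, sd
```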
Evaluations: For the 170° FoV camera, the range of $\theta$ is 10–80°; for the 210° FoV camera, the range of $\theta$ is 10–90°. The two statistics $e_{mean}(\theta)$ and $e_{SD}(\theta)$ were computed for both cases, and the comparison results are shown in Table 2 and Table 3. The error reduction of version 2 compared to version 1 is calculated as follows:
$$\eta = \frac{e_{v2} - e_{v1}}{e_{v1}} \times 100\%$$
where $\eta$ is the error reduction, and $e_{v1}$ and $e_{v2}$ are the direction angle errors of version 1 and version 2, respectively. Taking the horizontal axis as the $\theta$ value and the vertical axis as $e_{mean}(\theta)$ and $e_{SD}(\theta)$, the $e$-$\theta$ curves are drawn in Figure 10.
For the 170° FoV camera, both versions led to similarly stable results in the angle range of 10–60°. However, when the angle $\theta$ became large (especially in the range of 60–80°), the performance of version 2 was obviously much better than that of version 1. Both the average angle error and the dispersion of the proposed method (version 2) were about 1° over the whole fisheye FoV of the dataset.
For the 210° FoV camera, the overall performance of Version 2 was continuously better than that of Version 1 throughout the range of 30–90°.
The experimental results showed that near the edge of FoV, especially in the FoV region where the pixel density function increased monotonously with the angle θ , the pixel density compensation improved the accuracy and stability of feature point direction calculation significantly.

5.2. Experiment 2: Descriptor Invariance Evaluation of Fisheye Images in Different FoV Positions

Baselines: In this experiment, three typical BRIEF descriptors, including ORB, dBRIEF (Distorted BRIEF), and mdBRIEF, were selected as baselines. The descriptor of the feature point in each test sample in the virtual dataset generated in Experiment 1 was extracted by the tested features (FSD-BRIEF, ORB, dBRIEF, and mdBRIEF). In order to ensure a fair comparison of experimental results, all the binary descriptors were chosen to be 256 bits. dBRIEF is the version of mdBRIEF without on-line mask learning. For dBRIEF and mdBRIEF, we used the open source version provided in GitHub. For ORB, we used the functions provided in OpenCV and its default parameter settings.
Evaluation metrics: In this experiment, we define $D(\varphi, \theta, \mathbf{p}_t^i)$ as the descriptor of the feature point $\mathbf{p}_c^i(\varphi, \theta, \mathbf{p}_t^i)$. The associated Hamming distance error $\Delta D(\varphi, \theta, \mathbf{p}_t^i)$ of the descriptor was used to evaluate the invariance performance of the algorithms. $\Delta D(\varphi, \theta, \mathbf{p}_t^i)$ is calculated for each feature point test sample and each tested feature as:
$$\Delta D(\varphi, \theta, \mathbf{p}_t^i) = h\left(D(\varphi, \theta, \mathbf{p}_t^i),\, D(\varphi_0, \theta_0, \mathbf{p}_t^i)\right)$$
where $h(\cdot,\cdot)$ denotes the Hamming distance and $D(\varphi_0, \theta_0, \mathbf{p}_t^i)$, with $\varphi_0 = 45°$ and $\theta_0 = 10°$, is selected as the reference standard descriptor. For an ideal feature algorithm, for the same $\mathbf{p}_t^i$, $\Delta D(\varphi, \theta, \mathbf{p}_t^i) = 0$ no matter what values $\varphi$ and $\theta$ take. However, in practice, due to the resampling error of the fisheye camera, $\Delta D(\varphi, \theta, \mathbf{p}_t^i)$ is not zero. Therefore, the smaller the calculated value of $\Delta D(\varphi, \theta, \mathbf{p}_t^i)$, the stronger the invariance of the feature algorithm to the radial distortion of the fisheye image.
Similar to Experiment 1, $\Delta D_{mean}(\theta)$ and $\Delta D_{SD}(\theta)$ were used as evaluation metrics. $\Delta D_{mean}(\theta)$ is the average descriptor distance over all points under the latitude angle $\theta$, and $\Delta D_{SD}(\theta)$ is the dispersion of the $\Delta D(\varphi, \theta, \mathbf{p}_t^i)$ distribution under $\theta$. The smaller the $\Delta D_{mean}(\theta)$, the stronger the invariance of the feature algorithm to the radial distortion of fisheye images; the smaller the $\Delta D_{SD}(\theta)$, the more stable the performance of the feature algorithm. They are computed as follows:
$$\Delta D_{mean}(\theta) = \frac{\sum_{\varphi}\sum_{i} \Delta D(\varphi, \theta, \mathbf{p}_t^i)}{N_\varphi N_p}, \quad \Delta D_{SD}(\theta) = \sqrt{\frac{\sum_{\varphi}\sum_{i} \left[\Delta D(\varphi, \theta, \mathbf{p}_t^i) - \Delta D_{mean}(\theta)\right]^2}{N_\varphi N_p}}$$
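The Hamming distance $h(\cdot,\cdot)$ for 256-bit binary descriptors can be computed directly; the sketch below assumes descriptors stored either as arrays of 0/1 bits (as in the `fsd_brief` sketch) or packed into 32 bytes (the usual OpenCV layout):

```python
import numpy as np

def hamming_bits(d1, d2):
    """Hamming distance between two descriptors stored as arrays of 0/1 bits."""
    return int(np.count_nonzero(d1 != d2))

def hamming_packed(d1, d2):
    """Hamming distance between two 256-bit descriptors packed as 32-byte uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
```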
Evaluations: Since $\theta_0 = 10°$ was set for the reference standard descriptor $D(\varphi_0, \theta_0, \mathbf{p}_t^i)$, the ranges of $\theta$ were selected as 20–80° for the 170° FoV camera and 20–90° for the 210° FoV camera. The values of $\Delta D_{mean}(\theta)$ and $\Delta D_{SD}(\theta)$ for FSD-BRIEF, ORB, dBRIEF, and mdBRIEF were computed. The numerical results are shown in Table 4 and Table 5. The corresponding curves of $\Delta D_{mean}(\theta)$ are shown in Figure 11, and the curves of $\Delta D_{SD}(\theta)$ are shown in Figure 12.
The experimental results of the two cameras showed that, in the angle range of 20–40°, FSD-BRIEF led to similarly stable descriptor errors as ORB and dBRIEF. However, in the angle range of 40–80°, the descriptor errors of ORB and dBRIEF tended to increase significantly, while the descriptor errors of FSD-BRIEF increased much less than that of ORB and dBRIEF. In the angle range of 75–80°, the descriptor error of FSD-BRIEF was smaller than that of mdBRIEF. However, the descriptor error of FSD-BRIEF was larger than that of mdBRIEF in the angle range of 20–60°; this is because an on-line mask learning scheme was performed in mdBRIEF, where the unstable binary bits were masked.
The standard deviations (SD) of FSD-BRIEF, ORB, and dBRIEF were similar in the angle range of 20–40°. In the angle range above 50°, the SD of FSD-BRIEF was significantly smaller than that of ORB and dBRIEF. In the angle range of 20–60°, the SD value of FSD-BRIEF was not as small as that of mdBRIEF, but it was smaller than that of mdBRIEF in the angle range of 70–80°.
Because dBRIEF and mdBRIEF distort the descriptor template based on the plane perspective model, they could not extract the feature descriptor when $\theta$ was 90°, so there is no effective descriptor error value at 90°.
It can be observed from the results that, compared with other BRIEF based features, FSD-BRIEF could effectively adapt to the radial distortion of fisheye images and ensure the invariance of descriptors.

5.3. Experiment 3: Matching Performance Evaluation under Different Kinds of Image Variation

Dataset: In order to verify the FoV edge distortion invariance, translation invariance, and scale invariance of the proposed FSD-BRIEF in the image matching process, a dataset captured by a 210° FoV fisheye camera was created. The intrinsic parameters of the 210° FoV fisheye camera are shown in Table 1. There were three groups of images in this dataset, and each group contained 13 images. In the first group, through rotation of the camera, the test image fell as close to the edge of the camera's FoV as possible, so that it was distorted by the radial distortion of the fisheye camera to the greatest extent. In the second group, by moving and rotating the camera parallel to the test image plane, the test image fell at different positions of the camera FoV. In the third group, the camera moved forward and backward considerably relative to the test image, which made the projection of the test image in the fisheye image undergo a large scale change.
Baselines: In this experiment, five state-of-the-art descriptors, AKAZE, BRISK, ORB, dBRIEF and mdBRIEF, were selected as baselines. For FSD-BRIEF, we used the FAST feature to extract feature points. For BRISK, ORB, and AKAZE, we used the functions provided in OpenCV with default parameter settings. For dBRIEF and mdBRIEF, we used the open source version provided in GitHub.
Evaluation metrics: In order to evaluate the matching performance of the proposed FSD-BRIEF, following [30], we conducted comparison experiments with state-of-the-art descriptors by calculating the PR (recall versus 1-precision) curves of the matching results. Let $S_i$ and $S_j$ be the sets of feature points detected in images $I_i$ and $I_j$, respectively; then the set of ground truth matching points $G_{ij}$ is given by:
$$G_{ij} = \left\{ (\mathbf{p}_i, \mathbf{p}_j) \,\middle|\, \left\| \mathbf{p}_i - \Pi\left(\mathbf{H}_{ij}\,\Pi^{-1}(\mathbf{p}_j)\right) \right\| < \varepsilon,\ \mathbf{p}_i \in S_i,\ \mathbf{p}_j \in S_j \right\}$$
where $\|\cdot\|$ refers to the Euclidean distance between $\mathbf{p}_i$ and the projected point of $\mathbf{p}_j$ in image $I_i$, and $\mathbf{H}_{ij}$ is the ground truth homography matrix from image $I_i$ to $I_j$, which was calculated from manually labeled corresponding points in the image sequence. The distance threshold $\varepsilon$ was taken as 3 pixels. To evaluate the matching performance of the test features, let $M_{ij}$ be the set of matching feature point pairs obtained by the algorithm from images $I_i$ and $I_j$, consisting of correct matches $M_{true}^{ij}$ and incorrect matches $M_{false}^{ij}$. Hence, as shown in Equation (26), $recall(t)$ represents the ability of the matching algorithm to find correct matches, and $1-precision(t)$ indicates the algorithm's capability of discarding false matches (here we write $t$ for the descriptor distance threshold to distinguish it from the pixel threshold $\varepsilon$ above):
$$recall(t) = \frac{\sum_{1 \le i < j \le n} N\left(M_{true}^{ij}(t)\right)}{\sum_{1 \le i < j \le n} N\left(G_{ij}\right)}, \quad 1 - precision(t) = \frac{\sum_{1 \le i < j \le n} N\left(M_{false}^{ij}(t)\right)}{\sum_{1 \le i < j \le n} N\left(M_{ij}(t)\right)}$$
where $n$ is the number of images in the image sequence, $N(\cdot)$ denotes the number of point pairs in a set, and $t$ is the descriptor distance threshold used to accept matches whose descriptor distance is below $t$. Each of the two measures yields a PR curve as the threshold $t$ is gradually increased from zero. A PR curve passing close to the ideal point (0, 1) corresponds to a nearly perfect feature, for which both precision and recall approach 1. In practice, a good matching performance is indicated by a PR curve with the minimum distance to the point (0, 1), the highest recall, and the minimum 1-precision.
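A minimal sketch of Equation (26), assuming the putative matches of all image pairs have been pooled into arrays of descriptor distances and correctness flags (checked against the ground-truth sets $G_{ij}$):

```python
import numpy as np

def pr_curve(distances, is_correct, n_ground_truth, thresholds):
    """Recall and 1-precision as functions of the descriptor distance threshold (Equation (26))."""
    distances = np.asarray(distances, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    recalls, one_minus_precisions = [], []
    for t in thresholds:
        accepted = distances < t                    # matches kept at this threshold
        n_true = int(np.count_nonzero(accepted & is_correct))
        n_false = int(np.count_nonzero(accepted & ~is_correct))
        n_total = n_true + n_false
        recalls.append(n_true / n_ground_truth)
        one_minus_precisions.append(n_false / n_total if n_total > 0 else 0.0)
    return np.array(recalls), np.array(one_minus_precisions)
```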
Evaluation: To test the matching performance on this dataset, we used the test features to extract and match features and drew PR curves. For each algorithm, the 300 strongest feature points were extracted in each image. The PR curve results are shown in Figure 13.
From Figure 13a,b, the recall value at the end of the PR curve of FSD-BRIEF proposed in this paper was in the range of 0.75–0.8. For other features involved in the comparison, the recall value at the end of the PR curve was in the range of 0.3–0.6. The result showed that, compared with other features, FSD-BRIEF had significant FoV edge distortion invariance in the feature matching process of severely distorted images.
Figure 13c,d shows that the recall value at the end of the PR curve of FSD-BRIEF proposed in this paper was near 0.5. For other features involved in comparison, the recall value at the end of the PR curve was in the range of 0.25–0.5 and below FSD-BRIEF. The result showed that, compared with other features, FSD-BRIEF had better translation invariance in the feature matching process of fisheye images.
In Figure 13e,f, it can be observed that the recall value at the end of the PR curve of FSD-BRIEF proposed in this paper was in the range of 0.4–0.45. For AKAZE, BRISK, ORB, and dBRIEF, the recall value at the end of the PR curve was in the range of 0.25–0.4. The recall value of FSD-BRIEF was higher than mdBRIEF when 1 p r e c i s i o n was in the range of 0.05–0.3. The results showed that FSD-BRIEF had better scale invariance in the feature matching process of fisheye images compared with most of the state-of-the-art features.
Using AKAZE, BRISK, ORB, dBRIEF, and mdBRIEF as references, the experimental results showed that FSD-BRIEF achieved comparable or better FoV edge distortion invariance, translation invariance, scale invariance, and matching performance in fisheye images.

5.4. Experiment 4: Matching Performance Evaluation on Images with Different Distortions

Dataset: In order to verify the matching performance of FSD-BRIEF under different radial distortions, the sRD-SIFT dataset was used in this experiment. The sRD-SIFT dataset [22] was published with the work of sRD-SIFT. It consists of three sets of images (FireWire, Dragonfly, and Fisheye), each containing 13 images and captured by a camera with a different amount of radial distortion. The dataset contains significant scaling and rotation changes. Four images selected randomly from each set are shown in the right panels of Figure 14.
Fisheye cameras: The three sets of images are accompanied by images of a checkerboard calibration board for camera calibration. We therefore calibrated each camera based on the KB4 fisheye camera model using the provided chessboard images. The calibration results are shown in Table 6.
Evaluation: Similar to Experiment 3, to test the matching performance on the three groups of the sRD-SIFT dataset, we also employed the baseline descriptors (ORB, AKAZE, BRISK, dBRIEF, and mdBRIEF) to extract and match the 300 strongest keypoints for each image and then drew PR curves. The results are shown in Figure 14, where Figure 14a,b shows the results for the image group with the least distortion, Figure 14c,d shows the results for the image group with moderate distortion, and Figure 14e,f shows the results for the most distorted image group, captured by a fisheye camera.
Figure 14a,b shows that the PR curve of FSD-BRIEF almost coincided with that of ORB and AKAZE, and the performance of AKAZE was slightly better. The recall rate at the end of the curve of FSD-BRIEF, ORB, and AKAZE was in the range of 0.65–0.7, which was higher than that of BRISK and dBRIEF. From the result, we can see that the performance of FSD-BRIEF was equivalent to that of ORB in small distorted images.
Figure 14c,d shows that the PR curve of FSD-BRIEF almost coincided with that of ORB, and the recall at the end of the curve was around 0.6, which was higher than that of AKAZE, BRISK, and dBRIEF. From the result, we can see that the performance of FSD-BRIEF was equivalent to that of ORB in moderate distorted images and better than AKAZE, BRISK, and dBRIEF.
In Figure 14e,f, it can be observed that the recall value at the end of the PR curve of FSD-BRIEF was around 0.6, which was higher than that of ORB, AKAZE, BRISK, and dBRIEF, and almost the same as that of mdBRIEF. From the result, we can see that the performance of FSD-BRIEF was almost equivalent to that of mdBRIEF and better than ORB, AKAZE, BRISK, and dBRIEF in the most distorted images.
These experimental results show that the performance of FSD-BRIEF on images with large distortion was better than most of the state-of-the-art features involved in the comparison. On images with small and moderate distortion, the performance of FSD-BRIEF was similar to that of the ORB feature. This is because the test image is close to the center of the FoV in this dataset, so the radial distortion introduced by the fisheye lens is limited compared with Experiment 3. Therefore, the performance of FSD-BRIEF on the sRD-SIFT dataset was not as prominent as on the 210° FoV camera dataset in Experiment 3.

6. Conclusions

In this paper, to tackle the problem of feature matching performance deterioration due to the impact of fisheye radial distortion, we proposed a novel distorted BRIEF descriptor, named FSD-BRIEF, for fisheye images based on the spherical perspective model. First, to reduce the impact of the distortion on gray centroid calculation and on the accuracy of the feature point direction, we designed a pixel density function and evaluated its performance by comparing the feature point direction errors of the algorithm with and without the function. The obtained results showed that the pixel density function improves the precision of the feature point direction calculation. Second, the distortion invariance of the proposed FSD-BRIEF was verified and compared with other BRIEF based descriptors, and the associated results demonstrated that FSD-BRIEF works well for distortion invariance at different positions of fisheye images. In the matching experiments on the 210° FoV camera dataset, FSD-BRIEF showed better performance in terms of FoV edge distortion invariance, translation invariance, and scale invariance on heavily distorted fisheye images. On the sRD-SIFT dataset, the FSD-BRIEF descriptor significantly improved the matching performance on images with large distortion, while still producing excellent results on images with small distortion.

7. Future Work

Panoramic images are widely used today. The proposed descriptor can be adapted and potentially applied to panoramic images with slight modifications of the camera model and of the computation method of the pixel density function. Moreover, in future work, we will design a distorted FAST detector based on the spherical perspective model for panoramic images to extract feature points at any position, including the two polar regions.

Author Contributions

Conceptualization Y.Z.; investigation J.S., Y.D. and Y.Y.; methodology Y.Z. and Y.D.; project administration J.S. and Y.D.; software Y.Z.; supervision J.S. and Y.D.; validation Y.Z. and Y.Y.; visualization Y.Z. and Y.Y.; writing—original draft Y.Z., Y.D., Y.Y. and H.-L.W.; writing—review and editing Y.D., Y.Y. and H.-L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets in Experiments 1–3 are available at https://github.com/Ironeagleufo123/FSD-BRIEF-Dataset (accessed on 6 March 2021). The dataset in Experiment 4 was published with the work of sRD-SIFT.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Coordinate Transformation for Virtual Dataset Generation

For the test picture shown in Figure A1a, define the test image coordinate system $O_tX_tY_tZ_t$, as shown in Figure A1b. The coordinate origin $O_t$ is located at the first pixel in the upper left corner of the test image.
Figure A1. Graffiti Test Image (a) and the Test Image Coordinate System (b).
The X-axis represents the row of the image pixel, the Y-axis represents the column of the image pixel, and the Z-axis is determined according to the right-hand rule.
In this paper, the coordinate system transformation from the camera coordinate system $O_cX_cY_cZ_c$ to the test image coordinate system $O_tX_tY_tZ_t$ is shown in Figure A2, which mainly includes three steps: (1) deflection transformation, (2) roll transformation, and (3) translation transformation.
Figure A2. Three step transformation from the camera coordinate system to test image coordinate system.

Appendix A.1. Deflection Transformation

As shown on the left of Figure A2, note that the vector $\mathbf{l}$ is the unit vector consistent with the Z-axis direction of the camera coordinate system $O_cX_cY_cZ_c$, and the vector $\mathbf{r}$ is a 3D unit vector indicating the position to which the Z-axis of the camera coordinate system will turn. The camera coordinate system is rotated around the vector $\mathbf{l} \times \mathbf{r}$ according to the right-hand rule, so that the Z-axis is consistent with the direction of $\mathbf{r}$ after rotation, and the transition coordinate system $O_{c'}X_{c'}Y_{c'}Z_{c'}$ is obtained. The rotation angle is equal to the angle between the vectors $\mathbf{l}$ and $\mathbf{r}$, which is defined as $\theta$.
The unit vector corresponding to the rotation axis $\mathbf{l} \times \mathbf{r}$ is defined as $\mathbf{n}$, and $\varphi$ is defined as the angle between the projection of the vector $\mathbf{r}$ onto the $X_cO_cY_c$ plane and the $O_cX_c$ axis. The following constraints exist:
$$\cos\theta = \mathbf{l} \cdot \mathbf{r}, \quad \sin\theta = \|\mathbf{l} \times \mathbf{r}\|, \quad \mathbf{n} = \frac{\mathbf{l} \times \mathbf{r}}{\|\mathbf{l} \times \mathbf{r}\|} = \begin{bmatrix} -\sin\varphi & \cos\varphi & 0 \end{bmatrix}^T$$
According to the Rodrigues formula, the coordinate transformation matrix $\mathbf{R}_{cc'}$ from the transition coordinate system $O_{c'}X_{c'}Y_{c'}Z_{c'}$ to the camera coordinate system $O_cX_cY_cZ_c$ is as follows:
$$\mathbf{R}_{cc'} = \cos\theta\,\mathbf{I} + (1-\cos\theta)\,\mathbf{n}\mathbf{n}^T + \sin\theta\,\hat{\mathbf{n}}$$
where the symbol $\,\hat{}\,$ represents the transformation from a three-dimensional column vector to its skew-symmetric matrix:
$$\widehat{\begin{bmatrix} x \\ y \\ z \end{bmatrix}} = \begin{bmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{bmatrix}$$
According to Equations (A1) and (A2), it can be deduced that:
$$\mathbf{R}_{cc'} = (\mathbf{l}\cdot\mathbf{r})\,\mathbf{I} + \frac{(\mathbf{l}\times\mathbf{r})(\mathbf{l}\times\mathbf{r})^T}{1 + \mathbf{l}\cdot\mathbf{r}} + \widehat{(\mathbf{l}\times\mathbf{r})} = \begin{bmatrix} 1 - \dfrac{\cos^2\varphi\sin^2\theta}{1+\cos\theta} & -\dfrac{\sin^2\theta\sin 2\varphi}{2(1+\cos\theta)} & \cos\varphi\sin\theta \\ -\dfrac{\sin^2\theta\sin 2\varphi}{2(1+\cos\theta)} & 1 - \dfrac{\sin^2\varphi\sin^2\theta}{1+\cos\theta} & \sin\varphi\sin\theta \\ -\cos\varphi\sin\theta & -\sin\varphi\sin\theta & \cos\theta \end{bmatrix}$$
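A sketch of this deflection rotation under the definitions above (Equations (A1)–(A4)); it assumes $\mathbf{r}$ is not antiparallel to $\mathbf{l}$, where the compact form becomes degenerate:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector (Equation (A3))."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def deflection_rotation(r):
    """Rotation taking the camera Z-axis l = (0, 0, 1) onto the unit vector r (Equations (A1)-(A4))."""
    l = np.array([0.0, 0.0, 1.0])
    a = np.cross(l, r)                 # rotation axis scaled by sin(theta)
    c = float(l @ r)                   # cos(theta); must not equal -1 (r antiparallel to l)
    return c * np.eye(3) + np.outer(a, a) / (1.0 + c) + skew(a)
```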

Appendix A.2. Roll Transformation

The transition coordinate system $O_{c'}X_{c'}Y_{c'}Z_{c'}$ is rotated by the angle $\psi$ around its Z-axis according to the right-hand rule to obtain the transition coordinate system $O_{c''}X_{c''}Y_{c''}Z_{c''}$. The transformation matrix between $O_{c''}X_{c''}Y_{c''}Z_{c''}$ and $O_{c'}X_{c'}Y_{c'}Z_{c'}$ is:
$$\mathbf{R}_{c'c''} = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Appendix A.3. Translation Transformation

The transition coordinate system $O_{c''}X_{c''}Y_{c''}Z_{c''}$ is transformed into the test image coordinate system $O_tX_tY_tZ_t$ through a translation by the vector $\mathbf{q}$. The coordinate of the vector $\mathbf{q}$ in the test image coordinate system is defined as $\mathbf{q}_t$. Note that the 4 × 4 relative pose matrix between $O_{c''}X_{c''}Y_{c''}Z_{c''}$ and $O_tX_tY_tZ_t$ is $\mathbf{T}_{c''t}$, which is expressed as:
$$\mathbf{T}_{c''t} = \begin{bmatrix} \mathbf{I}_{3\times 3} & \mathbf{q}_t \\ \mathbf{0}^T & 1 \end{bmatrix}$$
In conclusion, the relative pose matrix $\mathbf{T}_{ct}$ between the camera coordinate system $O_cX_cY_cZ_c$ and the test image coordinate system $O_tX_tY_tZ_t$ is as follows:
$$\mathbf{T}_{ct} = \mathbf{T}_{cc'}\,\mathbf{T}_{c'c''}\,\mathbf{T}_{c''t} = \begin{bmatrix} \mathbf{R}_{cc'} & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} \mathbf{R}_{c'c''} & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} \mathbf{I}_{3\times 3} & \mathbf{q}_t \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R}_{cc'}\mathbf{R}_{c'c''} & \mathbf{R}_{cc'}\mathbf{R}_{c'c''}\mathbf{q}_t \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R}_{ct} & \mathbf{R}_{ct}\mathbf{q}_t \\ \mathbf{0}^T & 1 \end{bmatrix}$$
where
$$\mathbf{R}_{ct} = \begin{bmatrix} 1 - \dfrac{\cos^2\varphi\sin^2\theta}{1+\cos\theta} & -\dfrac{\sin^2\theta\sin 2\varphi}{2(1+\cos\theta)} & \cos\varphi\sin\theta \\ -\dfrac{\sin^2\theta\sin 2\varphi}{2(1+\cos\theta)} & 1 - \dfrac{\sin^2\varphi\sin^2\theta}{1+\cos\theta} & \sin\varphi\sin\theta \\ -\cos\varphi\sin\theta & -\sin\varphi\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
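The whole three-step transformation can be composed compactly. The sketch below follows the composed form of $\mathbf{T}_{ct}$ above and our reading that $\mathbf{r} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)^T$, reusing `deflection_rotation` from the previous sketch:

```python
import numpy as np

def roll_rotation(psi):
    """Rotation by psi about the Z-axis."""
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([[cp, -sp, 0.0],
                     [sp,  cp, 0.0],
                     [0.0, 0.0, 1.0]])

def test_image_pose(phi, theta, psi, q_t):
    """Compose the 4x4 relative pose T_ct from the test image frame to the camera frame."""
    r = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])                   # target direction of the camera Z-axis
    R_ct = deflection_rotation(r) @ roll_rotation(psi)
    T = np.eye(4)
    T[:3, :3] = R_ct
    T[:3, 3] = R_ct @ q_t
    return T
```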

Appendix B. Virtual Dataset Generation

Several feature points, expressed as $\{\mathbf{p}_t^i\,|\,i = 1, 2, \ldots, N_p\}$, are extracted by the FAST detector in the original Graffiti test image, where $\mathbf{p}_t^i = [p_{tx}^i\ p_{ty}^i]^T$ and $N_p$ is the number of feature points; in this experiment, $N_p = 30$. The three-dimensional point corresponding to the feature point $\mathbf{p}_t^i$ in the test image is defined as $P^i$, whose 3D coordinate in the test image coordinate system is $\mathbf{P}_t^i = [p_{tx}^i\ p_{ty}^i\ 0]^T$. Based on the original ORB centroid calculation method, the gray centroid $\mathbf{c}_t^i$ of each feature point $\mathbf{p}_t^i$ is calculated. The corresponding 3D point of $\mathbf{c}_t^i$ is defined as $C^i$, whose 3D coordinate in the test image coordinate system is $\mathbf{C}_t^i$.
In order to ensure that the generated dataset can accurately test the accuracy of the algorithm to calculate the direction of feature points, the dataset generation meets the following conditions:
  • As shown in Figure A3, ensure that the line $\overline{P^iO_c}$ is perpendicular to the test image plane $X_tO_tY_t$, that is, $\overline{P^iO_c} \perp X_tO_tY_t$, so that the circular neighborhood used to calculate the gray centroid of the feature point $\mathbf{p}_t^i$ in the test image forms a regular cone with the optical center $O_c$ of the camera;
  • Ensure that the length of $\overline{P^iO_c}$ is equal to the average $\frac{f_x + f_y}{2}$ of the horizontal and vertical focal lengths of the virtual camera, so that the circular area used for calculating the gray centroid of the feature point in the original test image is approximately the same as that used in the fisheye image.
If these conditions are not satisfied, the calculation area of the gray centroid in the original test image will be inconsistent with that in the fisheye image, the calculated gray centroids will have different mathematical meanings, and the experimental verification will lose its value.
Figure A3. Position relationship between the camera coordinate system and the test image coordinate system.
Figure A3. Position relationship between the camera coordinate system and the test image coordinate system.
Sensors 21 01839 g0a3
The necessary and sufficient condition for satisfying the above rules is that the vector q t satisfies:
$$\mathbf{q}_t(\mathbf{p}_t^i) = \begin{bmatrix} p_{tx}^i & p_{ty}^i & \dfrac{f_x + f_y}{2} \end{bmatrix}^T$$
where $p_{tx}^i$ and $p_{ty}^i$ are the pixel coordinates of the feature point $\mathbf{p}_t^i$ in the test image, and $f_x$ and $f_y$ are the horizontal and vertical focal lengths of the virtual camera. $\varphi$ and $\theta$ determine the projection position of the point $\mathbf{p}_t^i$ in the fisheye image, and $\psi$ determines the projection position of the gray centroid $C^i$ in the fisheye image.
Then, the dataset is generated according to the following method:
  • Within the camera’s FoV, starting from θ = 10 ° , taking 10° as the interval, the θ angle is uniformly selected, and N θ values of θ are generated.
  • From 45° to 315°, φ is uniformly taken at 90° intervals, and the number of φ values generated is N φ = 4 .
  • When φ is 45° or 225°, the value of ψ is taken as 4 θ . When φ is 135° or 315°, the value of ψ is taken as 0.
For each combination of $\varphi$, $\theta$, and $\mathbf{p}_t^i$, a corresponding fisheye distortion image $I(\varphi, \theta, \mathbf{p}_t^i)$ is generated, which constitutes a virtual dataset containing $N_\theta \times N_\varphi \times N_p$ fisheye images as follows:
$$\left\{ I(\varphi, \theta, \mathbf{p}_t^i) \,\middle|\, \varphi = 45°, 135°, 225°, 315°;\ \theta = 10°, 20°, \ldots, 80°;\ i = 1, 2, \ldots, N_p;\ N_p = 30 \right\}$$
The coordinates $\mathbf{P}_c^i$ of point $P^i$ and $\mathbf{C}_c^i$ of point $C^i$ in the camera coordinate system are calculated by using the coordinate transformation relationship, as shown in the following equations:
$$\mathbf{P}_c^i = \mathbf{T}_{ct}(\varphi, \theta, \mathbf{p}_t^i)\,\mathbf{P}_t^i, \quad \mathbf{C}_c^i = \mathbf{T}_{ct}(\varphi, \theta, \mathbf{p}_t^i)\,\mathbf{C}_t^i$$
The projection position p c i ( φ , θ , p t i ) of the feature point p t i in the fisheye image is calculated by the following formula:
$$\mathbf{p}_c^i(\varphi, \theta, \mathbf{p}_t^i) = \Pi(\mathbf{P}_c^i)$$
The ground truth attitude matrix $\mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i)$ corresponding to the feature point $\mathbf{p}_c^i(\varphi, \theta, \mathbf{p}_t^i)$ in the fisheye image is then calculated by:
$$\mathbf{R}_{cb}^{i*}(\varphi, \theta, \mathbf{p}_t^i) = \begin{bmatrix} \dfrac{\mathbf{C}_c^i - \dfrac{\mathbf{C}_c^i \cdot \mathbf{P}_c^i}{\|\mathbf{P}_c^i\|^2}\mathbf{P}_c^i}{\left\|\mathbf{C}_c^i - \dfrac{\mathbf{C}_c^i \cdot \mathbf{P}_c^i}{\|\mathbf{P}_c^i\|^2}\mathbf{P}_c^i\right\|} & \dfrac{\mathbf{P}_c^i \times \mathbf{C}_c^i}{\|\mathbf{P}_c^i \times \mathbf{C}_c^i\|} & \dfrac{\mathbf{P}_c^i}{\|\mathbf{P}_c^i\|} \end{bmatrix}$$
In this dataset, each virtual fisheye image I ( φ , θ , p t i ) uniquely corresponds to a feature point pixel coordinate p c i ( φ , θ , p t i ) and a ground truth attitude matrix R c b i * ( φ , θ , p t i ) , which constitutes a feature point test sample. Some examples of test samples are shown in Figure A4.
Figure A4. Some examples of virtual dataset test samples. The green circle in each sample image indicates the feature point.

Figure 1. Spherical perspective model ($\theta$: the FoV latitude angle; $\varphi$: the FoV longitude angle).
Figure 2. The circle area for 3D gray centroid calculation on the unit spherical surface and its projection area in the fisheye image plane.
Figure 3. Feature point attitude coordinate system.
Figure 4. BRIEF template and its coordinate system.
Figure 5. Position relationship between BRIEF template and spherical projection surface.
Figure 6. Coordinate mapping between BRIEF template coordinate system and feature point attitude coordinate system.
Figure 7. General view of FSD-BRIEF descriptor.
Figure 8. The pixel density function of 170° field of view (FoV) and 210° FoV camera with $\theta$.
Figure 9. The definition of feature point direction angle error between calculated direction and ground truth direction.
Figure 10. $e_\theta$ curves of two versions of feature point direction calculation methods in 170° FoV camera and 210° FoV camera.
Figure 11. $\Delta D_\theta$ curve results in 170° FoV camera.
Figure 12. $\Delta D_\theta$ curve results in 210° FoV camera.
Figure 13. 210° FoV camera dataset and corresponding PR curve result.
Figure 14. sRD-SIFT dataset and corresponding PR curve result.
Table 1. The Intrinsic Parameters of 170° FoV and 210° FoV Camera.

Intrinsic Parameter    170° FoV Camera    210° FoV Camera
f_x                    284.977            257.28
f_y                    284.977            257.28
c_x                    423.039            582.006
c_y                    398.179            419.655
k_1                    −0.00454           −0.0765
k_2                    0.0396             0.00908
k_3                    −0.0363            −0.0117
k_4                    0.00584            0.00373
Table 2. The numerical results of direction angle error in 170° FoV camera.

θ (°)    Version 1 (Without Compensation)    Version 2 (With Compensation)    Error Reduction (%)
         Mean        SD                      Mean        SD                   Mean         SD
10       1.133       0.920                   1.084       0.865                −4.306       −5.978
20       1.213       0.827                   1.162       0.800                −4.140       −3.360
30       1.034       0.786                   0.922       0.703                −10.895      −10.452
40       1.143       0.914                   0.948       0.782                −17.111      −14.451
50       1.106       0.905                   1.116       0.811                0.895        −10.367
60       1.030       0.796                   0.947       0.668                −8.033       −16.065
70       1.756       1.251                   0.849       0.656                −51.629      −47.526
80       5.185       3.326                   1.342       1.011                −74.123      −69.592
Table 3. The numerical results of direction angle error in 210° FoV camera.

θ (°)    Version 1 (Without Compensation)    Version 2 (With Compensation)    Error Reduction (%)
         Mean        SD                      Mean        SD                   Mean         SD
10       0.697       0.521                   0.684       0.504                −1.850       −3.322
20       0.800       0.620                   0.781       0.574                −2.425       −7.339
30       1.134       0.840                   0.980       0.720                −13.540      −14.277
40       2.052       1.402                   1.518       1.080                −25.995      −22.989
50       1.974       1.357                   1.218       0.932                −38.280      −31.344
60       1.474       1.163                   0.837       0.594                −43.226      −48.942
70       2.085       1.415                   0.920       0.717                −55.880      −49.322
80       2.310       1.591                   0.899       0.703                −61.068      −55.838
90       2.373       1.605                   0.929       0.725                −60.838      −54.859
Table 4. The numerical results of Hamming distance error in 170° FoV camera.

θ (°)    FSD-BRIEF            ORB                  dBRIEF               mdBRIEF
         Mean       SD        Mean       SD        Mean       SD        Mean       SD
20       25.100     7.033     25.692     8.005     20.458     6.520     3.458      1.779
30       20.658     6.284     22.767     7.700     18.833     5.974     3.192      1.583
40       21.825     6.994     25.867     8.073     23.850     7.237     4.667      2.413
50       21.300     7.209     30.017     8.798     31.917     9.516     7.083      3.635
60       23.325     7.407     37.050     10.598    44.217     11.882    11.633     5.680
70       26.533     6.904     52.175     12.154    60.742     13.563    19.883     8.170
80       33.850     10.045    87.233     17.572    92.792     15.495    43.083     13.704
Table 5. The numerical results of Hamming distance error in 210° FoV camera.

θ (°)    FSD-BRIEF            ORB                  dBRIEF               mdBRIEF
         Mean       SD        Mean       SD        Mean       SD        Mean       SD
20       20.892     5.639     21.158     6.309     17.600     4.467     2.775      1.345
30       22.608     6.125     25.467     7.242     20.000     5.804     3.208      1.460
40       25.767     7.475     30.925     8.592     23.708     7.439     3.792      1.788
50       25.875     7.996     41.442     11.538    31.083     9.718     5.058      1.881
60       28.867     7.978     57.508     15.822    42.558     13.937    8.058      4.101
70       30.317     8.176     70.775     16.628    58.758     14.927    17.892     8.579
80       36.250     10.375    84.217     16.844    95.292     16.522    44.975     14.635
90       45.000     14.170    97.450     21.129    -          -         -          -
Table 6. The intrinsic parameters of the cameras in sRD-SIFT datasets.

Intrinsic Parameter    set1 (FireWire)    set2 (Dragonfly)    set3 (Fisheye)
f_x                    539.389            528.626             306.780
f_y                    539.389            528.626             306.780
c_x                    312.103            365.029             634.729
c_y                    233.050            228.558             478.546
k_1                    0.0537             −0.0994             −0.000788
k_2                    0.0871             −0.0205             0.0181
k_3                    0                  0.00661             −0.0117
k_4                    0                  0.0150              0.00190