FSD-BRIEF: A Distorted BRIEF Descriptor for Fisheye Image Based on Spherical Perspective Model

Fisheye images, which have a far larger Field of View (FOV), suffer from severe radial distortion, so the associated image feature matching process cannot achieve the best performance when traditional feature descriptors are used. To address this challenge, this paper reports a novel distorted Binary Robust Independent Elementary Feature (BRIEF) descriptor for fisheye images based on a spherical perspective model. Firstly, a 3D gray centroid of feature points is designed, and the position and direction of feature points on the spherical image are described by a constructed feature point attitude matrix. Then, based on the attitude matrix of a feature point, the coordinate mapping relationship between the BRIEF descriptor template and the fisheye image is established to realize the computation of the distorted BRIEF descriptor. Four experiments are provided to test and verify the invariance and matching performance of the proposed descriptor on fisheye images. The experimental results show that the proposed descriptor has good distortion invariance and can significantly improve matching performance on fisheye images.

Compared with a pinhole camera, a fisheye camera has a wide field of view (FoV), and the captured image contains more abundant information. This has led to the extensive adoption of fisheye cameras in robot navigation, visual monitoring, virtual reality, visual measurement, and 3D reconstruction. However, due to the severe radial distortion of fisheye images, directly adopting common feature descriptors may lead to a significant reduction in matching performance.
In order to reduce the impact of distortion on feature matching performance, we propose a novel distorted BRIEF descriptor based on the spherical perspective model, named Fisheye Spherical Distorted BRIEF (FSD-BRIEF). Firstly, we propose a method based on a 3D gray centroid to determine the direction of each feature point on the spherical image. By constructing an attitude matrix for each feature point, the position and direction of the feature point on the spherical image can be described in a nonsingular form. In order to reduce the calculation error of the 3D gray centroid caused by the uneven distribution of pixels on the spherical image, a pixel density function is designed that represents the degree of pixel density on the spherical surface by the area of the patch mapped by each pixel of the fisheye image. We build an attitude coordinate system for each feature point and propose a coordinate mapping method to project the BRIEF descriptor template onto the fisheye image. The distortion form of the projected BRIEF template is consistent with the image distortion near the feature point, which shields the calculated BRIEF descriptor from the effects of radial distortion in the fisheye image. The main contributions of the paper include:

1. A new pixel density function, represented by the area of the spherical surface patch that each pixel of the fisheye image occupies;
2. A new method of determining the 3D gray centroid and the direction of feature points with the pixel density function, based on a spherical perspective model;
3. A new feature point attitude matrix, providing a nonsingular description of both the position and the direction of a feature point on the spherical image surface;
4. A novel descriptor template distortion method based on the spherical perspective model and the feature point attitude matrix.
The remainder of the paper is organized as follows. Section 2 reviews related work on fisheye image point features. Section 3 briefly introduces the notation of the spherical perspective model. Section 4 describes the method of determining and expressing the direction of feature points, and then the method of calculating the FSD-BRIEF descriptor. Section 5 provides experimental results that test and verify the performance of the proposed FSD-BRIEF. Section 6 briefly summarizes the work, and Section 7 states future work.

Related Work
By virtue of its front lens protruding in a parabola-like shape, a fisheye camera has a large FoV whose angle of view is close to or even more than 180°. Although this characteristic maximizes the angle of view, it brings severe radial distortion to the captured image, leading to different scale factors for pixels at different positions of the image. Thus, traditional feature descriptors designed for planar images can fail to match in raw fisheye images [13,14].
Generally, the methods to extract descriptors in fisheye images can be divided into two main streams according to whether images are corrected or not: resampling and non-resampling approaches.
Resampling approaches [15][16][17] segment the FoV image into several sub-FoVs and correct them based on a plane perspective model; feature descriptors can then be extracted and matched on the corrected sub-FoVs. Lin et al. [15] adopted a visual-inertial UAV (Unmanned Aerial Vehicle) navigation system in which two sub-regions are sampled in the horizontal direction of the fisheye FoV to obtain two undistorted pinhole image fields covering a 180° horizontal FoV, but the upper and lower parts of the vertical FoV are discarded. Müller et al. [16] presented a robust visual-inertial odometry and time-efficient omnidirectional 3D mapping system in which the FoV of each fisheye camera is divided into two piecewise pinhole fields so as to overcome the distortion; however, some parts near the edge of the FoV are wasted. Wang et al. [17] proposed a new real-time feature-based simultaneous localization and mapping system in which a fisheye image is projected onto the five surfaces of a cube, and descriptors are then extracted on the unfolded surfaces of the cube. However, stretching distortion and seam distortion exist between surfaces; for example, a straight line becomes a broken line. Thus, in the resampling approaches, the whole FoV of the fisheye image is hard to fully utilize, and the continuity between sub-FoVs cannot be guaranteed. In addition, due to the view geometry of the plane perspective model, there is a small stretching distortion at the edge of each sub-FoV.
Unlike the resampling approaches, which correct fisheye images into pinhole images, non-resampling approaches design descriptors that describe features directly in fisheye images. For example, inspired by the planar SIFT framework [18][19][20], Arican et al. [21] designed a new scale-invariant omnidirectional SIFT feature based on Riemannian geometry. Lourenço et al. [22] proposed the Spherical Radial Distortion SIFT (sRD-SIFT) feature, where the extraction of the feature and the calculation of the descriptor are designed on the spherical perspective model and the raw fisheye image without resampling. However, the improved algorithms based on SIFT are generally time-consuming. Cruz-Mota et al. [23] and Hansen et al. [24] utilized spherical harmonic functions as basis functions to study the spectral analysis of spherical panoramic images. Since Gaussian filtering on the sphere can be realized as a diffusion process through the spherical Fourier transform, spherical harmonic functions are used to construct a scale space on the sphere. In theory, spherical harmonic functions can maintain the invariance of descriptors under changes of camera pose and position. However, they usually require a large amount of computation and have inherent bandwidth limitations, which greatly weakens their capability for large-scale matching problems and cannot meet the real-time requirements of many applications.
For improving the calculation speed, Qiang et al. [25] proposed Spherical ORB (SPHORB), a binary spherical feature based on the ORB feature, which is the first binary descriptor for a panoramic image based on hexagon geodesic grid. In essence, SPHORB is still a special resampling approach, which divides the spherical panoramic image into 20 regular triangle fields according to the shape of a regular icosahedron, and aligns the pixel of adjacent regular triangles seamlessly. However, in the hexagon geodesic grid, the image patches near the 12 vertices of the regular icosahedron are discarded due to the distortion of the pixel distribution pattern, resulting in 12 FoV holes occupying 1.4% of the total FoV.
Note that resampling the fisheye image based on a hexagonal geodesic grid can result in holes. To avoid this, Urban et al. [26] proposed a new distorted descriptor, called Masked Distorted BRIEF (mdBRIEF). Although this work distorts the descriptors to adapt to different image regions instead of correcting the distortion of the fisheye image, the direction angle of feature points is obtained in the raw fisheye image by calculating the gray centroid in a circular template, which is still affected by the fisheye image distortion. Furthermore, the descriptors are distorted excessively near the edge of the fisheye image since the distortion is based on the plane perspective model.
Most recently, Pourian et al. [27] proposed an end-to-end framework to enhance the precision of descriptor matching between multiple wide-angle images. In their work, the global matching and the local matching of descriptors are combined in three stages. However, a new distortion at the edge of the corrected image is introduced when an equirectangular image transformation is employed in the global matching stage, lowering the performance of the framework.
In summary, the binary descriptor that can make use of the whole FoV and keep invariance in each position of the fisheye image has not been proposed. In order to avoid the FoV holes caused by the resampling approaches, and reduce the excessive distortion of descriptors in large FoV images, in this paper, we design a novel Fisheye Spherical Distorted BRIEF (FSD-BRIEF) descriptor, which is a distorted binary feature descriptor based on the spherical perspective model for fisheye images.

Fisheye Camera Model
In this paper, in order to ensure that the FoV of the fisheye image can be fully utilized without losing the performance of the feature descriptor, the new descriptor FSD-BRIEF is designed based on the spherical perspective model. Different from the plane perspective model, the projection surface is a unit sphere centered at the origin of the camera coordinate system, which ensures that the scale factors at all positions on the projection surface are consistent. The spherical perspective model and its perspective projection relationship are shown in Figure 1. We define the camera coordinate system as O_cX_cY_cZ_c. The origin O_c is located at the optical center of the camera, the X-axis O_cX_c points right along the long side of the imaging target surface, the Y-axis O_cY_c points downward along the short side of the imaging target surface, and the Z-axis O_cZ_c points forward along the optical axis. P′ is the projection point of the space point P on the spherical image surface, and L′ is the projection great arc of the space line L on the spherical image surface. For a point P in three-dimensional space, define its coordinate in the camera coordinate system as P_c = [x, y, z]ᵀ. The projection point of P in the fisheye image is p, with pixel coordinates p = [u, v]ᵀ. In this paper, the Kannala-Brandt4 (KB4) [28] model is used as the fisheye camera model; its mathematical form is

u = f_x θ_d cos ϕ + c_x, v = f_y θ_d sin ϕ + c_y,
θ_d = θ + k_1θ³ + k_2θ⁵ + k_3θ⁷ + k_4θ⁹,
θ = arccos(z / ‖P_c‖), ϕ = arctan2(y, x),

where f_x and f_y are the horizontal and vertical focal lengths of the camera, c_x and c_y are the coordinates of the principal point, and k_1, k_2, k_3, k_4 are the distortion coefficients. θ is the FoV latitude angle, which represents the angle between the O_cZ_c axis and the vector O_cP. ϕ is the FoV longitude angle, which denotes the angle between the O_cX_c axis and the projection of O_cP onto the X_cO_cY_c plane. θ_d is the angle θ as deflected by the fisheye lens.
Here arctan2 denotes the quadrant-aware version of the arctangent function.
Based on the spherical perspective model in Equation (3), let Π denote the mapping function. The mapping from the point P_c to the pixel point p in the fisheye image can be expressed as p = Π(P_c). The inverse mapping function of Π is defined as Π⁻¹, which maps the pixel p to the point P′ on the spherical image surface: P_c = Π⁻¹(p), where P_c is the coordinate vector of P′ in the camera coordinate system. Notice that ‖P_c‖ = √(x² + y² + z²) = 1.
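The projection chain Π and its inverse Π⁻¹ can be sketched as below. This is a minimal illustration, not the paper's implementation; the intrinsics FX, FY, CX, CY and the distortion coefficients in K are hypothetical placeholder values, and the inverse of the θ_d(θ) polynomial is solved by simple fixed-point iteration.

```python
import numpy as np

# Hypothetical KB4 intrinsics and distortion coefficients (placeholders).
FX, FY, CX, CY = 320.0, 320.0, 640.0, 480.0
K = (0.05, -0.01, 0.002, -0.0005)  # k1..k4

def project(P_c):
    """Pi: map a 3D point in camera coordinates to fisheye pixel coordinates."""
    x, y, z = P_c
    theta = np.arccos(z / np.linalg.norm(P_c))   # FoV latitude angle
    phi = np.arctan2(y, x)                        # FoV longitude angle
    k1, k2, k3, k4 = K
    theta_d = theta * (1 + k1*theta**2 + k2*theta**4 + k3*theta**6 + k4*theta**8)
    return np.array([FX * theta_d * np.cos(phi) + CX,
                     FY * theta_d * np.sin(phi) + CY])

def unproject(p, iters=10):
    """Pi^-1: map a pixel back to a unit vector on the spherical image surface."""
    mx, my = (p[0] - CX) / FX, (p[1] - CY) / FY
    theta_d = np.hypot(mx, my)
    phi = np.arctan2(my, mx)
    k1, k2, k3, k4 = K
    theta = theta_d  # invert theta_d(theta) by fixed-point iteration
    for _ in range(iters):
        theta = theta_d / (1 + k1*theta**2 + k2*theta**4 + k3*theta**6 + k4*theta**8)
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
```

For moderate distortion coefficients the fixed-point loop converges quickly, so `unproject(project(P))` recovers a unit vector P on the sphere.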

FSD-BRIEF Descriptor
The procedure of extracting the FSD-BRIEF descriptor includes four steps, namely, pixel density function designing, 3D gray centroid calculation, feature point attitude matrix construction, and FSD-BRIEF descriptor extraction. In the spherical perspective model, the densities of pixels are distributed unevenly, lowering the effectiveness of descriptors. Thus, a pixel density function is proposed firstly to calculate the distribution compensation of each pixel so as to reduce the effect of uneven pixel distribution. Then, with the help of the pixel density function, a more accurate 3D gray centroid is designed to determine the direction of FSD-BRIEF descriptor and keep its rotation invariance in the spherical perspective model. Next, we further devise a nonsingular form, a feature point attitude matrix, to represent the position and the direction of a feature point. Finally, based on the feature point attitude matrix, an FSD-BRIEF descriptor is extracted by a constructed coordinate mapping relation between the BRIEF template and the raw fisheye image.

Pixel Density Function Designing
In this section, by defining the pixel density function, the distribution density of pixels on the unit sphere surface is expressed numerically.
Assuming that a pixel p in a fisheye image occupies a small patch PIX_PATCH(p) of the corresponding unit sphere, PIX_PATCH(p) is the set of points Π⁻¹(p + [∆u, ∆v]ᵀ) with |∆u| ≤ 1/2 and |∆v| ≤ 1/2, where ∆u and ∆v are coordinate offsets in the pixel coordinate system of the fisheye image. It is obvious that the area of the patch PIX_PATCH(p) will be smaller if the distance between point p and its adjacent pixels is closer, which means that the pixel density at p is denser. Therefore, the pixel density function m(p) is defined as the area of the patch PIX_PATCH(p). To simplify the computation of the curved surface area, we assume that the patch is small enough to approximate as a parallelogram, so the pixel density function m(p) can be computed by

m(p) = ‖∆x × ∆y‖₂,

where ‖·‖₂ denotes the L2 norm and ∆x, ∆y are the coordinate offsets

∆x = Π⁻¹(p + [1, 0]ᵀ) − Π⁻¹(p), ∆y = Π⁻¹(p + [0, 1]ᵀ) − Π⁻¹(p).

From Equation (7), the pixel density function m(p) over the whole FoV depends only on the mapping function Π of the spherical perspective model in Equation (4).
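The parallelogram approximation of the patch area can be sketched numerically as follows. This is a toy illustration: the unprojection below uses a simple equidistant model (θ_d = θ) as a stand-in for KB4, and the intrinsics are hypothetical.

```python
import numpy as np

# Hypothetical intrinsics for an equidistant toy model (stand-in for KB4).
FX, FY, CX, CY = 320.0, 320.0, 640.0, 480.0

def unproject(p):
    """Pi^-1 for the equidistant toy model: pixel -> unit sphere vector."""
    mx, my = (p[0] - CX) / FX, (p[1] - CY) / FY
    theta, phi = np.hypot(mx, my), np.arctan2(my, mx)
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def pixel_density(p):
    """Approximate the sphere-patch area of pixel p as the area of the
    parallelogram spanned by one-pixel forward differences along u and v."""
    P0 = unproject(p)
    dx = unproject((p[0] + 1.0, p[1])) - P0   # offset of one pixel along u
    dy = unproject((p[0], p[1] + 1.0)) - P0   # offset of one pixel along v
    return np.linalg.norm(np.cross(dx, dy))  # ||dx x dy||_2
```

At the principal point the patch area approaches 1/(f_x f_y), consistent with the angular resolution interpretation of f_x and f_y.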

3D Gray Centroid Calculation
To determine the direction of the FSD-BRIEF descriptor, we propose a 3D gray centroid. Compared with the 2D gray centroid [13,14,26], the proposed 3D gray centroid is more accurate since it takes full advantage of the consistent scale factor of the spherical perspective model. The 3D gray centroid is calculated in a circular area on the unit spherical surface. Figure 2 illustrates the correspondence of this circular area between the unit spherical surface and the fisheye image plane. As shown in Figure 2, for a FAST (Features from Accelerated Segment Test) [29] feature point p, its projection point on the unit spherical surface is P′, and its 3D gray centroid calculation area is the circular area PATCH_3D(P′) centered at P′. PATCH(p) is the projection of PATCH_3D(P′) onto the fisheye image plane O_PX_PY_P. α is half the apex angle of the cone formed by PATCH_3D(P′) and the origin O_c.
Note that the horizontal and vertical angular resolutions of a fisheye camera are approximately f_x and f_y (pixels per radian) in the KB4 model, and the values of f_x and f_y are often very close. In order to make the radius of the circular range cover about 15 pixel widths while ensuring the same mathematical status of f_x and f_y, the value of α in radians is selected as 15 divided by the arithmetic mean of f_x and f_y, that is, α = 15 / ((f_x + f_y)/2) = 30 / (f_x + f_y).
Figure 2. The circle area for 3D gray centroid calculation on the unit spherical surface and its projection area in the fisheye image plane.
Define the projection area PATCH(p) as the set of pixels p + ∆p satisfying Π⁻¹(p + ∆p) · Π⁻¹(p) > cos α, where ∆p is the offset from the pixel p to a pixel in the fisheye image plane. Π⁻¹(p) is the position vector of P′, which is also the projection point of the pixel p on the unit sphere, and Π⁻¹(p + ∆p) is the position vector of the projection point of the pixel p + ∆p on the unit sphere. The condition Π⁻¹(p + ∆p) · Π⁻¹(p) > cos α means that the angle between the two vectors is less than α. The region PATCH(p) is therefore the projection of the region PATCH_3D(P′) onto the fisheye image. The 3D gray centroid of the feature point p is defined as C, and C_c denotes the coordinate vector of C in the camera coordinate system:

C_c = Σ_{p̃∈PATCH(p)} I(p̃) m(p̃) Π⁻¹(p̃) / Σ_{p̃∈PATCH(p)} I(p̃) m(p̃),

where p̃ is a pixel in PATCH(p), I(p̃) represents the gray value of p̃, m(p̃) is the pixel density function value of p̃, and Π⁻¹(p̃) indicates the 3D coordinate of the projection point of p̃ on the unit sphere surface.
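A density-weighted centroid of this kind can be sketched as below. The equidistant unprojection, the uniform-density input, and the 40-pixel bounding window are illustrative assumptions, not the paper's exact choices; the actual patch membership is the angular test against cos α.

```python
import numpy as np

# Hypothetical intrinsics for an equidistant toy model.
FX, FY, CX, CY = 320.0, 320.0, 640.0, 480.0

def unproject(u, v):
    """Pixel -> unit sphere vector under the equidistant toy model."""
    mx, my = (u - CX) / FX, (v - CY) / FY
    theta, phi = np.hypot(mx, my), np.arctan2(my, mx)
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def gray_centroid_3d(img, density, p, alpha):
    """Weighted mean of sphere vectors over PATCH(p).
    img: gray image, density: per-pixel m(p) values, p: feature pixel (u, v)."""
    h, w = img.shape
    P0 = unproject(*p)
    num, den = np.zeros(3), 0.0
    r = 40  # bounding search window (assumed); membership is the angular test
    for v in range(max(0, int(p[1]) - r), min(h, int(p[1]) + r + 1)):
        for u in range(max(0, int(p[0]) - r), min(w, int(p[0]) + r + 1)):
            Pq = unproject(float(u), float(v))
            if Pq @ P0 > np.cos(alpha):          # pixel lies inside PATCH_3D(P')
                wgt = img[v, u] * density[v, u]  # I(p~) * m(p~) weight
                num, den = num + wgt * Pq, den + wgt
    return num / den
```

For a uniform image the weighted mean is symmetric around the feature point, so the centroid direction coincides with the feature point's position vector.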

Feature Point Attitude Matrix Construction
In order to avoid the singularity of the direction expression of feature points at the poles of the unit spherical surface [25], we propose a feature point attitude matrix, a nonsingular expression that represents both the position and the direction of a feature point. The Z-axis of the feature point attitude coordinate system points along the position vector Π⁻¹(p) of the feature point, and the X-axis is coplanar with the 3D gray centroid vector C_c and the position vector, pointing toward the centroid side; the Y-axis completes the right-handed frame. The coordinate transformation matrix R_cb from the feature point attitude coordinate system to the camera coordinate system is then formed by taking the three unit axis vectors, expressed in the camera coordinate system, as its columns. The matrix R_cb is defined as the feature point attitude matrix.
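One natural construction of such a frame is sketched below, a minimal illustration rather than the paper's exact formula: the Z-axis is the unit position vector, the X-axis is the component of C_c orthogonal to it (hence coplanar with C_c and the position vector), and the column order of R_cb is an assumption of this sketch.

```python
import numpy as np

def attitude_matrix(P_unit, C_c):
    """Build R_cb with Z along the feature point's position vector and X
    toward the 3D gray centroid, via a Gram-Schmidt step."""
    z = P_unit / np.linalg.norm(P_unit)
    x = C_c - (C_c @ z) * z            # remove the component along z
    x = x / np.linalg.norm(x)          # unit X-axis, coplanar with C_c and z
    y = np.cross(z, x)                 # completes a right-handed frame
    return np.column_stack([x, y, z])  # axes expressed in camera coordinates
```

By construction R_cb is a rotation matrix, and the centroid vector C_c lies in the X-Z plane of the attitude frame, so the expression stays nonsingular everywhere except the degenerate case where C_c is parallel to the position vector.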

FSD-BRIEF Descriptor Extraction
In this section, to enhance the distortion invariance of the descriptor in the fisheye image, FSD-BRIEF will be extracted by distorting the BRIEF template based on the constructed feature point attitude matrix so that its template can fit the distortion form of the adjacent area of the feature point.
At first, for a feature point, we define its square neighborhood region as a BRIEF template with a coordinate system O_BX_BY_B whose origin O_B is located at the feature point and whose coordinates range from −15 to 15, as shown in Figure 4. The green lines are the selected 256 groups of pixel pairs on the template.
Then, the defined BRIEF template is scaled and placed at the feature point, as shown in Figure 5. In doing so, the following three conditions must be satisfied: the origin O_B of the template coincides with the projection point P′ of the feature point on the unit sphere; the template plane is tangent to the unit sphere at P′, with its axes aligned with the X- and Y-axes of the feature point attitude coordinate system; and there is a scale factor α/15 between the coordinates in the BRIEF template coordinate system and the coordinates in the feature point attitude coordinate system.
Figure 5. Position relationship between BRIEF template and spherical projection surface.
Figure 6 shows a zoom-in of a local area along the direction of O_bZ_b in Figure 5 at the feature point P′. As shown in Figure 6, for a point P on the BRIEF template, its homogeneous coordinate vector in the O_BX_BY_B coordinate system is s = [s_x, s_y, 1]ᵀ. The coordinate vector of the point P in the feature point attitude coordinate system is P_b, which can be solved by P_b = [(α/15)s_x, (α/15)s_y, 1]ᵀ.
Figure 6. Coordinate mapping between BRIEF template coordinate system and feature point attitude coordinate system.
According to the law of 3D coordinate transformation, the coordinate P_c of the point P in the camera coordinate system can be calculated from P_b as P_c = R_cb P_b, where R_cb is the feature point attitude matrix. The projection point p of P_c in the fisheye image is then obtained as p = Π(P_c). To sum up, for a feature point whose attitude matrix is R_cb, the coordinate mapping relationship between the point s in the BRIEF template and the projection point p in the fisheye image is p = Π(R_cb P_b(s)) (Equation (17)). According to Equation (17), the FSD-BRIEF of a feature point can be extracted at the calculated projection points of the FSD-BRIEF template in the fisheye image. Figure 7 shows the general view of the FSD-BRIEF descriptor. The FSD-BRIEF template in the fisheye image changes with the position of the feature point, ensuring that the descriptor adapts to the different distortions in the fisheye image and achieves good distortion invariance.
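The full template-to-image mapping can be sketched as below. This is an illustration under stated assumptions: the template plane is taken tangent to the unit sphere at the feature point (z = 1 in the attitude frame), and a simple equidistant projection with hypothetical intrinsics stands in for KB4.

```python
import numpy as np

# Hypothetical intrinsics for an equidistant toy projection (stand-in for KB4).
FX, FY, CX, CY = 320.0, 320.0, 640.0, 480.0

def project(P_c):
    """Pi: camera-frame point -> fisheye pixel, equidistant toy model."""
    theta = np.arccos(P_c[2] / np.linalg.norm(P_c))
    phi = np.arctan2(P_c[1], P_c[0])
    return np.array([FX * theta * np.cos(phi) + CX,
                     FY * theta * np.sin(phi) + CY])

def template_to_pixel(s, R_cb, alpha):
    """Map a BRIEF template coordinate s in [-15, 15]^2 to a fisheye pixel:
    scale by alpha/15, lift to the tangent plane of the attitude frame,
    rotate by R_cb, then project."""
    scale = alpha / 15.0
    P_b = np.array([scale * s[0], scale * s[1], 1.0])  # point in attitude frame
    return project(R_cb @ P_b)                         # p = Pi(R_cb P_b)
```

With the identity attitude matrix the template center lands on the principal point, and template coordinates along +X_B move the projected pixel along +u, as expected.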

Experimental Evaluation
In this section, we present four experiments that were used for evaluating the performance of the proposed method. Experiment 1 was an ablation experiment carried out on a virtual dataset, which was used to verify the contribution of pixel density function towards improving the solution accuracy of FSD-BRIEF orientation. Experiment 2 was also conducted on the virtual dataset, aiming to prove the invariance of FSD-BRIEF compared with three BRIEF-based descriptors. Experiment 3 and Experiment 4 were performed to evaluate the matching performance of FSD-BRIEF under (1) different camera motions on a real dataset, and (2) distortion conditions on sRD-SIFT dataset [22], respectively. The results of these two experiments were compared with those produced by five state-of-the-art features.

Experiment 1: The Contribution Evaluation of the Pixel Density Function to the Accuracy of Feature Point Orientation
Dataset: In this experiment, we investigated the contribution of the pixel density function to the accuracy of feature point orientation. In order to have an accurate ground truth for the direction of feature points, we produced a virtual dataset by simulating the projection of the first image of the Graffiti dataset [30], used as the test image, into two virtual fisheye cameras with different intrinsic parameters. At first, N_p feature points p_t^i (i = 1, 2, ..., N_p) were extracted in the test image. During the generation of the virtual dataset, the test image and a selected virtual fisheye camera were placed in the same virtual space. By placing the test image in different poses, we projected each feature point into the fisheye image at several selected positions with different longitude angles ϕ and latitude angles θ. The relationship between the angles ϕ, θ and the pose of the test image is shown in Appendix A. ϕ takes N_ϕ values and θ takes N_θ values. For each virtual fisheye camera, N_p × N_ϕ × N_θ test samples were generated. Each test sample consisted of a generated fisheye image I(ϕ, θ, p_t^i), a corresponding feature point position p_c^i(ϕ, θ, p_t^i) in the fisheye image, and a ground truth feature point attitude matrix R_cb^i*(ϕ, θ, p_t^i). More details of the dataset are given in Appendix B.
Baseline: To verify the effectiveness of the pixel density function compensation proposed in this paper, we compared two algorithms: the feature point attitude matrix computation part of FSD-BRIEF without the compensation (version 1) and with it (version 2). In version 1, the 3D gray centroid was calculated without the pixel density compensation term m(p̃); that is, the gray centroid of version 1 is computed by Equation (18). In version 2, we used Equation (11) to calculate the 3D gray centroids of feature points.
Fisheye cameras: In order to verify the contribution of the pixel density function under cameras with different FoVs, two virtual cameras were selected for this experiment. Table 1 shows the intrinsic parameters of the two cameras. Figure 8 shows the curves of the pixel density function of the 170° FoV and 210° FoV cameras as functions of θ. The curve for the 170° FoV camera decreases in the angle range 0-60° and increases in the angle range 60-80°, while the curve for the 210° FoV camera increases over the whole angle range 0-90°.
Evaluation metrics: In the experimental verification process, the direction angle error of the feature point is used for quantitative evaluation. The direction angle error, denoted by e(ϕ, θ, p i t ), is shown in Figure 9, where P i (ϕ, θ, p i t ) is the projection point of p i c (ϕ, θ, p i t ) on the unit sphere surface.
Let O_b*X_b*Y_b*Z_b* be the feature point attitude coordinate system corresponding to the ground truth attitude matrix R_cb*(ϕ, θ, p_t^i) in the camera coordinate system. Since the calculated and ground truth attitude frames share the same Z-axis (the position vector of the feature point on the unit sphere), the direction angle error e(ϕ, θ, p_t^i) is the angle between their X-axes. Note that values of e(ϕ, θ, p_t^i) can be calculated from experimental results indexed by ϕ (FoV longitude angle), θ (FoV latitude angle), and i (feature point index in the test image). For an ideal method, e(ϕ, θ, p_t^i) is always zero, and the calculated direction of the feature point is consistent with the real direction. In practice, due to the influence of noise, the angle error e(ϕ, θ, p_t^i) is not zero; in this experiment, the smaller the value of e(ϕ, θ, p_t^i), the more accurate the calculated feature point direction.
In this study, the mean error e_mean(θ) and the standard deviation e_SD(θ) were used to evaluate the results of e(ϕ, θ, p_t^i). e_mean(θ) measures the average error of the feature point direction calculated using all the points under the latitude angle θ, and e_SD(θ) measures the dispersion of the e(ϕ, θ, p_t^i) distribution under θ. They are calculated as follows:

e_mean(θ) = (1 / (N_ϕ N_p)) Σ_ϕ Σ_i e(ϕ, θ, p_t^i),
e_SD(θ) = sqrt( (1 / (N_ϕ N_p)) Σ_ϕ Σ_i ( e(ϕ, θ, p_t^i) − e_mean(θ) )² ),

where N_ϕ and N_p are the numbers of ϕ and i values. The smaller the e_mean(θ), the more accurate the feature point direction; the smaller the e_SD(θ), the more stable the result. Evaluations: In the 170° FoV camera, the range of θ is 10-80°; in the 210° FoV camera, the range of θ is 10-90°. The two statistics e_mean(θ) and e_SD(θ) were computed for both cases, and the comparison results are shown in Tables 2 and 3. The error reduction of version 2 compared with version 1 is calculated as

η = (e_v1 − e_v2) / e_v1 × 100%,

where η is the error reduction, and e_v1 and e_v2 are the direction angle errors of version 1 and version 2, respectively. Taking the horizontal axis as θ and the vertical axis as e_mean(θ) or e_SD(θ), the e-θ curves are drawn in Figure 10. For the 170° FoV camera, both compensation schemes led to similarly stable results in the angle range of 10-60°. However, when θ became large (especially in the range of 60-80°), version 2 performed clearly better than version 1. Both the average angle error and the error dispersion of the proposed method (version 2) were about 1° over the whole fisheye FoV of the dataset.
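The two statistics and the error reduction can be computed with a few lines of numpy; this is a generic sketch of the formulas, with toy inputs rather than the experiment's data.

```python
import numpy as np

def error_stats(errors):
    """Return (e_mean, e_SD) over all e(phi, theta, p_t^i) values collected
    at one latitude theta; uses the population standard deviation."""
    e = np.asarray(errors, dtype=float).ravel()
    e_mean = e.mean()
    e_sd = np.sqrt(np.mean((e - e_mean) ** 2))
    return e_mean, e_sd

def error_reduction(e_v1, e_v2):
    """Relative error reduction eta of version 2 over version 1, in percent."""
    return (e_v1 - e_v2) / e_v1 * 100.0
```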
For the 210°FoV camera, the overall performance of Version 2 was continuously better than that of Version 1 throughout the range of 30-90°.
The experimental results showed that near the edge of FoV, especially in the FoV region where the pixel density function increased monotonously with the angle θ, the pixel density compensation improved the accuracy and stability of feature point direction calculation significantly.

Experiment 2: Descriptor Invariance Evaluation of Fisheye Images in Different FoV Positions
Baselines: In this experiment, three typical BRIEF descriptors, including ORB, dBRIEF (Distorted BRIEF), and mdBRIEF, were selected as baselines. The descriptor of the feature point in each test sample in the virtual dataset generated in Experiment 1 was extracted by the tested features (FSD-BRIEF, ORB, dBRIEF, and mdBRIEF). In order to ensure a fair comparison of experimental results, all the binary descriptors were chosen to be 256 bits. dBRIEF is the version of mdBRIEF without on-line mask learning. For dBRIEF and mdBRIEF, we used the open source version provided in GitHub. For ORB, we used the functions provided in OpenCV and its default parameter settings.
Evaluation metrics: In this experiment, we define D(ϕ, θ, p_t^i) as the descriptor of the feature point p_c^i(ϕ, θ, p_t^i). The associated Hamming distance error ∆D(ϕ, θ, p_t^i) of the descriptor was used to evaluate the invariance performance of the algorithms. ∆D(ϕ, θ, p_t^i) is calculated for each feature point test sample and each tested feature as the Hamming distance between D(ϕ, θ, p_t^i) and the reference standard descriptor D(ϕ_0, θ_0, p_t^i), where ϕ_0 = 45° and θ_0 = 10°. For an ideal feature algorithm, for the same p_t^i, no matter what values ϕ and θ take, ∆D(ϕ, θ, p_t^i) = 0. However, in practice, due to the resampling error of the fisheye camera, ∆D(ϕ, θ, p_t^i) is not zero. Therefore, the smaller the calculated value of ∆D(ϕ, θ, p_t^i), the stronger the invariance of the feature algorithm to the radial distortion of the fisheye image.
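The Hamming distance between two 256-bit binary descriptors (stored, as is conventional, as 32-byte arrays) can be sketched as follows; the byte layout is an assumption of this illustration.

```python
import numpy as np

def hamming_distance(d1, d2):
    """Number of differing bits between two binary descriptors given as
    equal-length uint8 byte arrays (32 bytes for 256-bit descriptors)."""
    x = np.bitwise_xor(np.asarray(d1, dtype=np.uint8),
                       np.asarray(d2, dtype=np.uint8))
    return int(np.unpackbits(x).sum())  # popcount of the XOR
```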
Similar to Experiment 1, ∆D_mean(θ) and ∆D_SD(θ) were used as evaluation metrics. ∆D_mean(θ) is the average descriptor distance calculated using all the points under the latitude angle θ, and ∆D_SD(θ) is the dispersion of the ∆D(ϕ, θ, p_t^i) distribution under θ. The smaller the ∆D_mean(θ), the stronger the invariance of the feature algorithm to the radial distortion of fisheye images; the smaller the ∆D_SD(θ), the more stable the performance of the feature algorithm. ∆D_mean(θ) and ∆D_SD(θ) are computed as follows:

∆D_mean(θ) = (1 / (N_ϕ N_p)) Σ_ϕ Σ_i ∆D(ϕ, θ, p_t^i),
∆D_SD(θ) = sqrt( (1 / (N_ϕ N_p)) Σ_ϕ Σ_i ( ∆D(ϕ, θ, p_t^i) − ∆D_mean(θ) )² ).

Evaluations: Since θ_0 = 10° was set for the reference standard descriptor D(ϕ_0, θ_0, p_t^i), the range of θ was selected as 20-80° for the 170° FoV camera and 20-90° for the 210° FoV camera. The values of ∆D_mean(θ) and ∆D_SD(θ) of FSD-BRIEF, ORB, dBRIEF, and mdBRIEF were computed. The numerical results are shown in Tables 4 and 5; the corresponding curves of ∆D_mean(θ) are shown in Figure 11, and those of ∆D_SD(θ) in Figure 12. The experimental results of the two cameras showed that, in the angle range of 20°-40°, FSD-BRIEF led to descriptor errors similarly stable to those of ORB and dBRIEF. However, in the angle range of 40°-80°, the descriptor errors of ORB and dBRIEF increased significantly, while the descriptor error of FSD-BRIEF increased much less. In the angle range of 75°-80°, the descriptor error of FSD-BRIEF was smaller than that of mdBRIEF. However, the descriptor error of FSD-BRIEF was larger than that of mdBRIEF in the angle range of 20°-60°; this is because an on-line mask learning scheme is performed in mdBRIEF, where unstable binary bits are masked.
The standard deviations (SD) of FSD-BRIEF, ORB, and dBRIEF were similar in the angle range of 20°-40°. In the angle range beyond 50°, the SD of FSD-BRIEF was significantly smaller than that of ORB and dBRIEF. In the angle range of 20°-60°, the SD value of FSD-BRIEF was not as small as that of mdBRIEF, but it was smaller than that of mdBRIEF in the angle range of 70°-80°.
Because dBRIEF and mdBRIEF distort the descriptor template based on the plane perspective model, they could not extract feature descriptors when θ was 90°, so there is no effective descriptor error value at 90°.
It can be observed from the results that, compared with other BRIEF based features, FSD-BRIEF could effectively adapt to the radial distortion of fisheye images and ensure the invariance of descriptors.

Experiment 3: Matching Performance Evaluation under Different Kinds of Image Variation
Dataset: In order to verify the FoV edge distortion invariance, translation invariance, and scale invariance of the proposed FSD-BRIEF in the image matching process, a dataset was captured by a 210° FoV fisheye camera, whose intrinsic parameters are shown in Table 1. The dataset contains three groups of images, each with 13 images. In the first group, the camera was rotated so that the test image fell as close as possible to the edge of the camera's FoV, where it was distorted by the radial distortion of the fisheye camera to the greatest extent. In the second group, by moving and rotating the camera parallel to the test image plane, the test image fell at different positions of the camera's FoV. In the third group, the camera moved forward and backward considerably relative to the test image, so that the projection of the test image in the fisheye image underwent a large scale change.
Baselines: In this experiment, five state-of-the-art descriptors, AKAZE, BRISK, ORB, dBRIEF, and mdBRIEF, were selected as baselines. For FSD-BRIEF, we used the FAST detector to extract feature points. For BRISK, ORB, and AKAZE, we used the functions provided in OpenCV with default parameter settings. For dBRIEF and mdBRIEF, we used the open-source versions provided on GitHub.
Evaluation metrics: In order to evaluate the matching performance of the proposed FSD-BRIEF, following [30], we conducted comparison experiments with state-of-the-art descriptors by calculating the PR (recall versus 1-precision) curve of the matching results. Let S_i and S_j be the sets of feature points detected in images I_i and I_j respectively; the set of ground truth matching points G_ij is then given by:

G_ij = { (p_i, p_j) | p_i ∈ S_i, p_j ∈ S_j, ‖p_i − H_ij p_j‖ < ε }, (25)

where ‖·‖ is the Euclidean distance between p_i and the projection of p_j into image I_i, and H_ij is the ground truth homography matrix mapping points of image I_j into image I_i, calculated from manually labeled corresponding points in the image sequence. The distance threshold ε was set to 3 pixels. To evaluate the matching performance of the test features, let M_ij be the set of matching feature point pairs obtained by the algorithm from images I_i and I_j; M_ij consists of the correct matches M_ij^true and the incorrect matches M_ij^false. Hence, as shown in Equation (26), recall(ε') represents the ability of the matching algorithm to find correct matches, and 1 − precision(ε') indicates the algorithm's capability of discarding unmatched points:

recall(ε') = Σ_{i,j} N(M_ij^true) / Σ_{i,j} N(G_ij),  1 − precision(ε') = Σ_{i,j} N(M_ij^false) / Σ_{i,j} N(M_ij), (26)
where n is the number of images in the image sequence, N(·) denotes the number of point pairs in a set, and ε' is a descriptor distance threshold used to retain the matches whose descriptor distance is below ε'. Each of the two measures yields a point on a so-called PR curve as the threshold ε' is gradually increased from zero. A PR curve passing close to the ideal point (0, 1) means the corresponding test feature is nearly perfect, making both precision and recall approach 1. In practice, the best matching performance is achieved by the algorithm whose PR curve has the minimum distance to the point (0, 1), i.e., the highest recall and the lowest 1-precision.

Evaluation: To test the matching performance on this dataset, we used each test feature to extract and match features and drew the PR curves. For each algorithm, the 300 strongest feature points were extracted in each image. The PR curve results are shown in Figure 13.
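The PR-curve procedure above can be sketched for a single image pair. This is a simplified sketch, not the paper's code: ground truth correctness is checked against the homography as in Equation (25), and the denominator of recall is approximated by the number of correct candidate pairs rather than the full N(G_ij):

```python
import numpy as np

def project(H, p):
    # Apply a 3x3 homography H to a 2D point p, returning Euclidean coordinates.
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def pr_curve(matches, dists, H_ij, pixel_eps=3.0):
    """Sweep the descriptor-distance threshold ε' from zero and accumulate
    (1 - precision, recall) points, mirroring Equation (26).

    matches : list of ((x, y) in I_i, (x, y) in I_j) matched pairs
    dists   : descriptor distance of each pair
    H_ij    : ground truth homography mapping I_j points into I_i
    """
    # A match is correct when p_i lies within ε pixels of H_ij * p_j.
    correct = np.array([np.linalg.norm(np.asarray(pi) - project(H_ij, pj)) < pixel_eps
                        for pi, pj in matches])
    n_gt = max(correct.sum(), 1)  # approximation of N(G_ij); the paper uses the full set
    curve, tp, fp = [], 0, 0
    for k in np.argsort(dists):   # increasing ε'
        if correct[k]:
            tp += 1
        else:
            fp += 1
        curve.append((fp / (tp + fp), tp / n_gt))  # (1 - precision, recall)
    return curve
```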
From Figure 13a,b, the recall value at the end of the PR curve of the proposed FSD-BRIEF was in the range of 0.75-0.8, while for the other features involved in the comparison it was in the range of 0.3-0.6. This shows that, compared with the other features, FSD-BRIEF had significantly better FoV edge distortion invariance in the feature matching process of severely distorted images. Figure 13c,d shows that the recall value at the end of the PR curve of FSD-BRIEF was near 0.5, while for the other features it was in the range of 0.25-0.5 and below that of FSD-BRIEF. This shows that, compared with the other features, FSD-BRIEF had better translation invariance in the feature matching process of fisheye images.
In Figure 13e,f, it can be observed that the recall value at the end of the PR curve of FSD-BRIEF proposed in this paper was in the range of 0.4-0.45. For AKAZE, BRISK, ORB, and dBRIEF, the recall value at the end of the PR curve was in the range of 0.25-0.4. The recall value of FSD-BRIEF was higher than mdBRIEF when 1 − precision was in the range of 0.05-0.3. The results showed that FSD-BRIEF had better scale invariance in the feature matching process of fisheye images compared with most of the state-of-the-art features.
Using AKAZE, BRISK, ORB, dBRIEF, and mdBRIEF as references, the experimental results showed that FSD-BRIEF achieved comparable or better FoV edge distortion invariance, translation invariance, scale invariance, and overall matching performance in fisheye images.

Experiment 4: Matching Performance Evaluation on Images with Different Distortion
Dataset: In order to verify the matching performance of FSD-BRIEF under different radial distortions, the sRD-SIFT dataset [22], published with the work on sRD-SIFT, was used in this experiment. It consists of three sets of images (FireWire, Dragonfly, and Fisheye), each containing 13 images captured by a camera with a different degree of radial distortion. The dataset contains significant scaling and rotation changes. Four randomly selected images from each set are shown in the right panels of Figure 14.
Fisheye cameras: The three image sets are accompanied by images of a checkerboard calibration board for camera calibration. We therefore calibrated each camera under the KB4 fisheye camera model using the provided checkerboard images. The calibration results are shown in Table 6.

Evaluation: Similar to Experiment 3, to test the matching performance on the three groups of the sRD-SIFT dataset, we employed the baseline descriptors (ORB, AKAZE, BRISK, dBRIEF, and mdBRIEF) to extract and match the 300 strongest keypoints in each image and then drew the PR curves. The results are shown in Figure 14, where Figure 14a,b shows the results of the image group with the least distortion, Figure 14c,d the results of the image group with moderate distortion, and Figure 14e,f the results of the most distorted image group, captured by fisheye cameras.

Figure 14a,b shows that the PR curve of FSD-BRIEF almost coincided with those of ORB and AKAZE, with AKAZE performing slightly better. The recall at the end of the curves of FSD-BRIEF, ORB, and AKAZE was in the range of 0.65-0.7, higher than that of BRISK and dBRIEF. From this result, we can see that the performance of FSD-BRIEF was equivalent to that of ORB on slightly distorted images. Figure 14c,d shows that the PR curve of FSD-BRIEF almost coincided with that of ORB, and the recall at the end of the curve was around 0.6, higher than that of AKAZE, BRISK, and dBRIEF. From this result, we can see that the performance of FSD-BRIEF was equivalent to that of ORB on moderately distorted images and better than that of AKAZE, BRISK, and dBRIEF.
In Figure 14e,f, it can be observed that the recall value at the end of the PR curve of FSD-BRIEF was around 0.6, which was higher than that of ORB, AKAZE, BRISK, and dBRIEF, and almost the same as that of mdBRIEF. From the result, we can see that the performance of FSD-BRIEF was almost equivalent to that of mdBRIEF and better than ORB, AKAZE, BRISK, and dBRIEF in the most distorted images.
These experimental results show that the performance of FSD-BRIEF on large-distortion images was better than that of most of the state-of-the-art features involved in the comparison. On slightly and moderately distorted images, the performance of FSD-BRIEF was similar to that of ORB. This is because the test images in this dataset lie close to the center of the FoV, so the radial distortion introduced by the fisheye lens was limited compared with Experiment 3. The performance of FSD-BRIEF on the sRD-SIFT dataset was therefore not as prominent as on the 210° FoV camera dataset of Experiment 3.
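For reference, the KB4 (Kannala-Brandt) model used above to calibrate the sRD-SIFT cameras replaces the pinhole radius tan(θ) with an odd polynomial in θ, which stays finite even for rays at θ ≥ 90°. A minimal sketch of its forward projection (parameter names follow common convention, not any code from the paper):

```python
import math

def kb4_project(X, Y, Z, fx, fy, cx, cy, k1, k2, k3, k4):
    """Project a 3D camera-frame point with the KB4 (Kannala-Brandt) model.

    θ is the angle between the viewing ray and the optical axis;
    d(θ) = θ + k1·θ^3 + k2·θ^5 + k3·θ^7 + k4·θ^9 is the distorted radius.
    """
    r = math.hypot(X, Y)
    theta = math.atan2(r, Z)
    d = theta * (1 + k1*theta**2 + k2*theta**4 + k3*theta**6 + k4*theta**8)
    if r == 0.0:                      # point on the optical axis
        return cx, cy
    return fx * d * X / r + cx, fy * d * Y / r + cy
```

With all k_i = 0 this reduces to the equidistant fisheye model, which is why it handles FoVs beyond 180° where a pinhole model diverges.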

Conclusions
In this paper, to tackle the deterioration of feature matching performance caused by fisheye radial distortion, we proposed a novel distorted BRIEF descriptor, named FSD-BRIEF, for fisheye images based on the spherical perspective model. First, to reduce the impact of distortion on the gray centroid calculation and on the accuracy of the feature point direction, we designed a pixel density function and evaluated it by comparing the feature point direction errors of the algorithm with and without the function. The obtained results showed that the pixel density function improves the precision of the feature point direction calculation. Second, the distortion invariance of the proposed FSD-BRIEF was verified against other BRIEF-based descriptors, and the results demonstrated that FSD-BRIEF works well for distortion invariance at different positions in fisheye images. In the matching experiments on the 210° FoV camera dataset, FSD-BRIEF showed better FoV edge distortion invariance, translation invariance, and scale invariance on large-distortion fisheye images. On the sRD-SIFT dataset, the FSD-BRIEF descriptor significantly improved the matching performance on large-distortion images, while still producing excellent results on small-distortion images.

Future Work
Panoramic images are widely used today. The proposed descriptor can be adapted and potentially applied to panoramic images with slight modifications of the camera model and of the computation method of the pixel density function. Moreover, in future work we will design a distorted FAST detector based on the spherical perspective model for panoramic images, so as to extract feature points at any position, including the two polar regions.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Coordinate Transformation for Virtual Dataset Generation
For the test picture shown in Figure A1a, define the test image coordinate system O_t X_t Y_t Z_t as shown in Figure A1b. The coordinate origin O_t is located at the first pixel in the upper left corner of the test image. The X-axis represents the row of the image pixels, the Y-axis represents the column, and the Z-axis is determined according to the right-hand rule.
In this paper, the coordinate system transformation from the camera coordinate system O_c X_c Y_c Z_c to the test image coordinate system O_t X_t Y_t Z_t is shown in Figure A2 and consists of three steps: (1) deflection transformation, (2) roll transformation, and (3) translation transformation.
The coordinates P_i^c of point P_i and C_i^c of point C_i in the camera coordinate system are calculated using this coordinate transformation relationship, as shown in the following equations. The projection position p_i^c(ϕ, θ, p_i^t) of the feature point p_i^t in the fisheye image is then calculated by the subsequent formula, and the ground truth attitude matrix R_cb^i*(ϕ, θ, p_i^t) corresponding to the feature point p_i^c(ϕ, θ, p_i^t) in the fisheye image is obtained from the relationship that follows. In this dataset, each virtual fisheye image I(ϕ, θ, p_i^t) uniquely corresponds to one feature point pixel coordinate p_i^c(ϕ, θ, p_i^t) and one ground truth attitude matrix R_cb^i*(ϕ, θ, p_i^t), which together constitute a feature point test sample. Some examples of test samples are shown in Figure A4.
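The three-step chain above (deflection, then roll, then translation) amounts to composing two rotations and an offset. A minimal sketch; the exact rotation axes and angle conventions are fixed by Figure A2, so the axis order chosen here is an illustrative assumption:

```python
import numpy as np

def rot_x(a):
    # Elementary rotation about the X-axis by angle a (radians).
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    # Elementary rotation about the Z-axis by angle a (radians).
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def camera_from_test(phi, theta, roll, t):
    """Compose deflection to (longitude ϕ, latitude θ), a roll about the
    new axis, and a translation t, mapping test-image coordinates to
    camera coordinates. The Z-X-Z axis order is an assumption."""
    R = rot_z(phi) @ rot_x(theta) @ rot_z(roll)
    def transform(P_t):
        return R @ np.asarray(P_t, dtype=float) + np.asarray(t, dtype=float)
    return R, transform
```

Because each step is a rigid motion, the composed R remains orthonormal, and the ground truth attitude matrix of a feature point can be read off from the same chain.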