5.1. Experiment 1: Evaluating the Contribution of the Pixel Density Function to the Accuracy of Feature Point Orientation
Dataset: In this experiment, we investigated the contribution of the pixel density function to the accuracy of feature point orientation. In order to obtain an accurate ground truth for the direction of feature points, we produced a virtual dataset by simulating the projection of the first image of the Graffiti dataset [30], used as a test image, onto two virtual fisheye cameras with different intrinsic parameters. First, feature points were extracted in the test image. During the generation of the virtual dataset, the test image and a selected virtual fisheye camera were placed in the same virtual space. By placing the test image in different poses, we projected each feature point onto several selected positions in the fisheye image with different longitude angles $\varphi$ and latitude angles $\theta$. The relationship between the angles $\varphi$, $\theta$ and the pose of the test image is shown in Appendix A. $\varphi$ takes $N_\varphi$ values and $\theta$ takes $N_\theta$ values, so $N_\varphi \times N_\theta$ test samples were generated for each virtual fisheye camera. Each test sample consisted of a generated fisheye image $I$, a corresponding feature point position $p$ in the fisheye image, and a ground truth feature point attitude matrix $R_g$. More details of the dataset are given in Appendix B.
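To make the simulation step concrete, the sketch below projects a 3D point given in the virtual camera frame into a fisheye image under a Kannala–Brandt (KB4) projection model, the model we also use for calibration in Section 5.4; the intrinsic values here are placeholders rather than the Table 1 parameters.

```python
import numpy as np

# Placeholder KB4 intrinsics; the actual values are listed in Table 1.
fx, fy, cx, cy = 400.0, 400.0, 640.0, 480.0
k1, k2, k3, k4 = -0.01, 0.003, -0.001, 0.0002

def project_kb4(X):
    """Project a 3D point X = (x, y, z) in camera coordinates into the image."""
    x, y, z = X
    rho = np.hypot(x, y)
    theta = np.arctan2(rho, z)  # latitude angle measured from the optical axis
    d = theta * (1 + k1 * theta**2 + k2 * theta**4
                   + k3 * theta**6 + k4 * theta**8)
    if rho < 1e-12:             # point on the optical axis
        return np.array([cx, cy])
    return np.array([fx * d * x / rho + cx, fy * d * y / rho + cy])
```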
Baseline: To verify the effectiveness of the pixel density function compensation proposed in this paper, we compared two versions of the feature point attitude matrix computation part of FSD-BRIEF: without the compensation (version 1) and with it (version 2). In version 1, the 3D gray centroid was calculated without the pixel density compensation term; that is, the gray centroid computation formula of version 1 is Equation (18). In version 2, we used Equation (11) to calculate the 3D gray centroids of feature points.
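The structural difference between the two versions can be sketched as a weighted centroid of the patch samples on the unit sphere, where version 2 additionally divides each sample's weight by its pixel density. This is only a schematic under our own naming; the exact formulas are Equations (11) and (18).

```python
import numpy as np

def gray_centroid_3d(gray, sphere_pts, density=None):
    """Schematic 3D gray centroid of a patch sampled on the unit sphere.

    gray       : (N,) gray levels of the sampled pixels
    sphere_pts : (N, 3) unit-sphere points of the samples
    density    : (N,) pixel density values; dividing by them is the
                 compensation of version 2 (None reproduces version 1)
    """
    w = gray.astype(float)
    if density is not None:
        w = w / density                       # pixel density compensation
    m = (w[:, None] * sphere_pts).sum(axis=0)
    return m / np.linalg.norm(m)              # unit centroid direction
```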
Fisheye cameras: In order to verify the contribution of the pixel density function under different FoVs, two virtual cameras were selected for this experiment.
Table 1 shows the intrinsic parameters of the two cameras.
Figure 8 shows the curves of the pixel density function of the 170° FoV and 210° FoV cameras with respect to the latitude angle $\theta$. From the curves, we can see that the pixel density function of the 170° FoV camera decreased in the angle range 0–60° and increased in the angle range 60–80°, whereas the pixel density function of the 210° FoV camera increased over the whole angle range 0–90°.
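Curves of this kind can be reproduced under the assumption that the radial pixel density behaves like the derivative $dr/d\theta$ of the KB4 radial projection function $r(\theta)$; the coefficient sets below are hypothetical, not the Table 1 values.

```python
import numpy as np

def radial_pixel_density(theta, f, k):
    """dr/dtheta for r(theta) = f*(theta + k1*t^3 + k2*t^5 + k3*t^7 + k4*t^9)."""
    k1, k2, k3, k4 = k
    return f * (1 + 3 * k1 * theta**2 + 5 * k2 * theta**4
                  + 7 * k3 * theta**6 + 9 * k4 * theta**8)

theta = np.deg2rad(np.arange(0, 91))
# Hypothetical coefficient sets for two cameras with different FoVs.
d_170 = radial_pixel_density(theta, 330.0, (-0.05, 0.01, -0.002, 0.0))
d_210 = radial_pixel_density(theta, 250.0, (0.02, 0.005, 0.0, 0.0))
```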
Evaluation metrics: In the experimental verification process, the direction angle error of the feature point is used for quantitative evaluation. The direction angle error, denoted by $e$, is shown in Figure 9, where $p_s$ is the projection point of the feature point $p$ on the unit sphere surface. The coordinate system $O\text{-}x_g y_g z_g$ is the feature point attitude coordinate system corresponding to the ground truth feature point attitude matrix $R_g$, whilst $O\text{-}x_c y_c z_c$ is the feature point attitude coordinate system corresponding to the calculated feature point attitude matrix $R_c$. Note that the direction axes of these two coordinate systems are defined as the ground truth direction and the calculated direction of the feature point (see Section 4.3). The unit of $e$ is the degree (°). Let $\mathbf{v}_g$ and $\mathbf{v}_c$ be the coordinates of the unit direction vectors corresponding to these two directions in the camera coordinate system; then:

$$\mathbf{v}_g^\top \mathbf{v}_c = \cos e \quad (19)$$

From Equation (19), we can obtain the expression of $e$ as:

$$e = \arccos\left(\mathbf{v}_g^\top \mathbf{v}_c\right)$$
Note that the values of $e$ can be calculated from the experimental results indexed by $\varphi$ (FoV longitude angle), $\theta$ (FoV latitude angle), and $i$ (feature point index in the test image), written as $e(\varphi, \theta, i)$. For an ideal method, $e$ is always zero, and the calculated direction of the feature point is consistent with the real direction. In practice, due to the influence of noise, the angle error is not zero. In this experiment, the smaller the value of $e(\varphi, \theta, i)$, the more accurate the calculated feature point direction.
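A minimal sketch of this error computation, assuming the feature direction is stored as a fixed column of the 3×3 attitude matrix (the actual axis convention is the one defined in Section 4.3):

```python
import numpy as np

def direction_angle_error(R_g, R_c, axis=0):
    """Angle e (deg) between ground truth and calculated feature directions."""
    v_g, v_c = R_g[:, axis], R_c[:, axis]      # unit direction vectors
    c = np.clip(v_g @ v_c, -1.0, 1.0)          # guard against rounding noise
    return np.degrees(np.arccos(c))
```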
In this study, the mean error $\mu_e(\theta)$ and the standard deviation $\sigma_e(\theta)$ were used to evaluate the results of $e$. $\mu_e(\theta)$ measures the average error of the feature point direction calculated using all the points under the latitude angle $\theta$, and $\sigma_e(\theta)$ measures the dispersion of the $e$ distribution under $\theta$. $\mu_e(\theta)$ and $\sigma_e(\theta)$ are calculated as follows:

$$\mu_e(\theta) = \frac{1}{N_\varphi N_i} \sum_{\varphi} \sum_{i} e(\varphi, \theta, i)$$

$$\sigma_e(\theta) = \sqrt{\frac{1}{N_\varphi N_i} \sum_{\varphi} \sum_{i} \left( e(\varphi, \theta, i) - \mu_e(\theta) \right)^2}$$

where $N_\varphi$ and $N_i$ are the numbers of $\varphi$ and $i$ values. The smaller the $\mu_e(\theta)$, the more accurate the feature point direction; the smaller the $\sigma_e(\theta)$, the more stable the result of the feature point direction.
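With the per-sample errors stored in an array indexed by $(\varphi, \theta, i)$, both statistics reduce to reductions over the $\varphi$ and $i$ axes; a minimal NumPy sketch:

```python
import numpy as np

def direction_error_stats(e):
    """mu_e(theta) and sigma_e(theta) from e of shape (N_phi, N_theta, N_i)."""
    mu = e.mean(axis=(0, 2))     # average over all phi and i for each theta
    sigma = e.std(axis=(0, 2))   # dispersion of e under each theta
    return mu, sigma
```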
Evaluations: For the 170° FoV camera, the range of $\theta$ is 10–80°; for the 210° FoV camera, the range of $\theta$ is 10–90°. The two statistics $\mu_e(\theta)$ and $\sigma_e(\theta)$ were computed for both cases, and the comparison results are shown in Table 2 and Table 3. The error reduction of version 2 compared to version 1 is calculated as follows:

$$\eta = \frac{e_1 - e_2}{e_1} \times 100\%$$

where $\eta$ is the value of the error reduction, and $e_1$ and $e_2$ are the values of the direction angle error of version 1 and version 2, respectively (for example, if $e_1 = 2.0°$ and $e_2 = 1.0°$, the error reduction is 50%). Taking the horizontal axis as the $\theta$ value and the vertical axis as $\mu_e(\theta)$ and $\sigma_e(\theta)$, the corresponding curves are drawn in Figure 10.
For the 170° FoV camera, the two compensation schemes led to similarly stable results in the angle range of 10–60°. However, when the angle became large (especially in the range of 60–80°), the performance of version 2 was clearly better than that of version 1. Both the average angle error and the dispersion of the proposed method (version 2) were about 1° across the whole fisheye FoV of the dataset.
For the 210° FoV camera, the overall performance of version 2 was consistently better than that of version 1 throughout the range of 30–90°.
The experimental results showed that near the edge of the FoV, especially in the FoV region where the pixel density function increased monotonically with the angle $\theta$, the pixel density compensation significantly improved the accuracy and stability of the feature point direction calculation.
5.2. Experiment 2: Descriptor Invariance Evaluation of Fisheye Images in Different FoV Positions
Baselines: In this experiment, three typical BRIEF-based descriptors, namely ORB, dBRIEF (distorted BRIEF), and mdBRIEF, were selected as baselines. The descriptor of the feature point in each test sample of the virtual dataset generated in Experiment 1 was extracted by each tested feature (FSD-BRIEF, ORB, dBRIEF, and mdBRIEF). To ensure a fair comparison, all the binary descriptors were chosen to be 256 bits. dBRIEF is the version of mdBRIEF without on-line mask learning. For dBRIEF and mdBRIEF, we used the open source versions provided on GitHub; for ORB, we used the functions provided in OpenCV with their default parameter settings.
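For reference, the ORB baseline can be reproduced with OpenCV's defaults as sketched below; OpenCV packs each 256-bit descriptor into 32 bytes. The image path is a placeholder.

```python
import cv2

img = cv2.imread("test_image.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create()                          # default parameter settings
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors has shape (num_keypoints, 32): 32 bytes = 256 binary bits
```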
Evaluation metrics: In this experiment, we define $D(\varphi, \theta, i)$ as the descriptor of the feature point $i$ extracted at longitude angle $\varphi$ and latitude angle $\theta$. The associated Hamming distance error $e_H$ of the descriptor of the feature point was used to evaluate the invariance performance of the algorithms. $e_H$ is calculated for each feature point test sample and each tested feature as:

$$e_H(\varphi, \theta, i) = d_H\!\left( D(\varphi, \theta, i),\; D(\varphi, \theta_0, i) \right)$$

where $d_H(\cdot,\cdot)$ denotes the Hamming distance and $D(\varphi, \theta_0, i)$ is the reference standard descriptor, with $\theta_0 = 10°$. For an ideal feature algorithm, for the same $i$, no matter what values $\varphi$ and $\theta$ take, $e_H = 0$. However, in practice, due to the resampling error of the fisheye camera, $e_H$ was not zero. Therefore, the smaller the calculated value of $e_H$, the stronger the invariance of the feature algorithm to the radial distortion of the fisheye image.
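A minimal sketch of the Hamming distance computation for byte-packed binary descriptors, as produced by OpenCV; the function names are ours.

```python
import numpy as np
import cv2

def hamming_error(desc, desc_ref):
    """e_H between two 256-bit descriptors stored as 32 uint8 bytes each."""
    return int(cv2.norm(desc, desc_ref, cv2.NORM_HAMMING))

def hamming_error_np(desc, desc_ref):
    """Equivalent pure-NumPy version: count differing bits after XOR."""
    return int(np.unpackbits(np.bitwise_xor(desc, desc_ref)).sum())
```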
Similar to Experiment 1, $\mu_{e_H}(\theta)$ and $\sigma_{e_H}(\theta)$ were used as evaluation metrics. $\mu_{e_H}(\theta)$ is the average value of the descriptor distance error calculated using all the points under the latitude angle $\theta$, and $\sigma_{e_H}(\theta)$ is the dispersion of the $e_H$ distribution under $\theta$. The smaller the $\mu_{e_H}(\theta)$, the stronger the invariance of the feature algorithm to the radial distortion of fisheye images; the smaller the $\sigma_{e_H}(\theta)$, the more stable the performance of the feature algorithm. The computation formulas of $\mu_{e_H}(\theta)$ and $\sigma_{e_H}(\theta)$ are as follows:

$$\mu_{e_H}(\theta) = \frac{1}{N_\varphi N_i} \sum_{\varphi} \sum_{i} e_H(\varphi, \theta, i)$$

$$\sigma_{e_H}(\theta) = \sqrt{\frac{1}{N_\varphi N_i} \sum_{\varphi} \sum_{i} \left( e_H(\varphi, \theta, i) - \mu_{e_H}(\theta) \right)^2}$$
Evaluations: Since $\theta_0 = 10°$ was set for the reference standard descriptor, the range of $\theta$ was selected as 20–80° for the 170° FoV camera and 20–90° for the 210° FoV camera. The values of $\mu_{e_H}(\theta)$ and $\sigma_{e_H}(\theta)$ of FSD-BRIEF, ORB, dBRIEF, and mdBRIEF were computed. The numerical results are shown in Table 4 and Table 5. The corresponding curves of $\mu_{e_H}(\theta)$ are shown in Figure 11, and the curves of $\sigma_{e_H}(\theta)$ are shown in Figure 12.
The experimental results of the two cameras showed that, in the angle range of 20–40°, FSD-BRIEF produced descriptor errors as stable as those of ORB and dBRIEF. However, in the angle range of 40–80°, the descriptor errors of ORB and dBRIEF tended to increase significantly, while those of FSD-BRIEF increased much less. In the angle range of 75–80°, the descriptor error of FSD-BRIEF was smaller than that of mdBRIEF; however, it was larger than that of mdBRIEF in the angle range of 20–60°, because mdBRIEF performs an on-line mask learning scheme in which the unstable binary bits are masked.
The standard deviations (SD) of FSD-BRIEF, ORB, and dBRIEF were similar in the angle range of 20–40°. From 50° upward, the SD of FSD-BRIEF was significantly smaller than that of ORB and dBRIEF. In the angle range of 20–60°, the SD of FSD-BRIEF was not as small as that of mdBRIEF, but it was smaller than that of mdBRIEF in the angle range of 70–80°.
Because dBRIEF and mdBRIEF distort the descriptor template based on the plane perspective model, they could not extract the feature descriptor when $\theta$ was 90°, so there is no effective value of the descriptor errors at 90°.
It can be observed from the results that, compared with other BRIEF-based features, FSD-BRIEF could effectively adapt to the radial distortion of fisheye images and ensure the invariance of its descriptors.
5.3. Experiment 3: Matching Performance Evaluation under Different Kinds of Image Variation
Dataset: In order to verify the FoV edge distortion invariance, translation invariance, and scale invariance of the proposed FSD-BRIEF in the image matching process, a dataset captured by a 210° FoV fisheye camera was made. The intrinsic parameters of this camera are shown in Table 1. There were three groups of images in this dataset, and each group contained 13 images. In the first group, the camera was rotated so that the test image fell as close as possible to the edge of the camera's FoV, where it was distorted by the radial distortion of the fisheye camera to the greatest extent. In the second group, by moving and rotating the camera parallel to the test image plane, the test image fell at different positions of the camera FoV. In the third group, the camera moved forward and backward considerably relative to the test image, so that the projection of the test image in the fisheye image underwent a large scale change.
Baselines: In this experiment, five state-of-the-art descriptors (AKAZE, BRISK, ORB, dBRIEF, and mdBRIEF) were selected as baselines. For FSD-BRIEF, we used the FAST detector to extract feature points. For BRISK, ORB, and AKAZE, we used the functions provided in OpenCV with default parameter settings. For dBRIEF and mdBRIEF, we used the open source versions provided on GitHub.
Evaluation metrics: In order to evaluate the matching performance of the FSD-BRIEF proposed in this paper, following [30], we conducted comparison experiments with state-of-the-art descriptors by calculating the PR (recall versus 1-precision) curve of the matching results. Designate $P_1$ and $P_2$ to be the sets of feature points detected in images $I_1$ and $I_2$, respectively; then the set of ground truth matching points $M_{gt}$ is given by:

$$M_{gt} = \left\{ (p_1, p_2) \;\middle|\; p_1 \in P_1,\; p_2 \in P_2,\; d\!\left(H p_1, p_2\right) < \varepsilon \right\}$$

where $d(H p_1, p_2)$ refers to the Euclidean distance between $p_2$ and the projected point of $p_1$ in image $I_2$, and $H$ is the ground truth homography matrix from image $I_1$ to image $I_2$, calculated from manually labeled corresponding points in the image sequence. The distance threshold $\varepsilon$ was taken as 3 pixels. To evaluate the matching performance of the tested features, let $M$ be the set of matching feature point pairs obtained by the algorithm from images $I_1$ and $I_2$; $M$ consists of correct matches $M_c$ and incorrect matches $M_f$. Hence, as shown in Equation (26), recall presents the ability of the matching algorithm to find correct matches, and 1-precision indicates the algorithm's capability of discarding unmatched points:

$$\text{recall} = \frac{\sum_{k=2}^{n} \left| M_c^{(k)} \right|}{\sum_{k=2}^{n} \left| M_{gt}^{(k)} \right|}, \qquad 1 - \text{precision} = \frac{\sum_{k=2}^{n} \left| M_f^{(k)} \right|}{\sum_{k=2}^{n} \left| M^{(k)} \right|} \quad (26)$$

where $n$ is the number of images in the image sequence (the superscript $k$ indexes the image pairs in the sequence), $|\cdot|$ denotes the number of point pairs in a set, and $t$ is a descriptor distance threshold used to obtain the matches whose descriptor distance is below $t$. Each of the two measures yields a PR curve as the threshold $t$ is increased gradually from zero. A PR curve passing close to the ideal point (0, 1) indicates a nearly perfect feature, for which both precision and recall approach 1. In practice, good matching performance is achieved when the matching algorithm's PR curve has the minimum distance to the point (0, 1), i.e., the highest recall and the lowest 1-precision.
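A sketch of both steps for a single image pair, assuming points are given as NumPy arrays and each candidate match comes with its descriptor distance; the paper aggregates the corresponding counts over the whole image sequence.

```python
import numpy as np
import cv2

def ground_truth_matches(p1, p2, H, eps=3.0):
    """M_gt: all pairs whose reprojection error under H is below eps pixels.

    p1: (N1, 2) points in I1;  p2: (N2, 2) points in I2;  H: 3x3 homography.
    """
    proj = cv2.perspectiveTransform(
        p1.reshape(-1, 1, 2).astype(np.float64), H).reshape(-1, 2)
    gt = set()
    for i, q in enumerate(proj):
        d = np.linalg.norm(p2 - q, axis=1)
        for j in np.flatnonzero(d < eps):
            gt.add((i, int(j)))
    return gt

def pr_curve(matches, dists, gt, thresholds):
    """recall and 1-precision as the descriptor threshold t grows (Eq. 26)."""
    recall, one_minus_precision = [], []
    for t in thresholds:
        kept = [m for m, d in zip(matches, dists) if d < t]
        n_correct = sum(m in gt for m in kept)
        recall.append(n_correct / max(len(gt), 1))
        one_minus_precision.append((len(kept) - n_correct) / max(len(kept), 1))
    return recall, one_minus_precision
```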
Evaluation: To test the matching performance on this dataset, we used the tested features to extract and match features and drew PR curves. For each algorithm, the 300 strongest feature points were extracted in each image. The PR curve results are shown in Figure 13.
As shown in Figure 13a,b, the recall value at the end of the PR curve of the proposed FSD-BRIEF was in the range of 0.75–0.8, while for the other features in the comparison it was in the range of 0.3–0.6. This result shows that, compared with the other features, FSD-BRIEF had significantly better FoV edge distortion invariance when matching severely distorted images.
Figure 13c,d shows that the recall value at the end of the PR curve of the proposed FSD-BRIEF was near 0.5, while for the other features it was in the range of 0.25–0.5, below that of FSD-BRIEF. This result shows that, compared with the other features, FSD-BRIEF had better translation invariance in the feature matching process of fisheye images.
In Figure 13e,f, it can be observed that the recall value at the end of the PR curve of the proposed FSD-BRIEF was in the range of 0.4–0.45. For AKAZE, BRISK, ORB, and dBRIEF, the recall value at the end of the PR curve was in the range of 0.25–0.4. The recall value of FSD-BRIEF was also higher than that of mdBRIEF when 1-precision was in the range of 0.05–0.3. These results show that FSD-BRIEF had better scale invariance in the feature matching process of fisheye images than most of the state-of-the-art features.
Overall, using AKAZE, BRISK, ORB, dBRIEF, and mdBRIEF as references, the experimental results showed that FSD-BRIEF achieved comparable or better FoV edge distortion invariance, translation invariance, scale invariance, and matching performance on fisheye images.
5.4. Experiment 4: Matching Performance Evaluation on Images with Different Distortions
Dataset: In order to verify the matching performance of FSD-BRIEF under different radial distortions, the sRD-SIFT dataset was used in this experiment. The sRD-SIFT dataset [22] was published with the work of sRD-SIFT. It consists of three sets of images (FireWire, Dragonfly, and Fisheye), each containing 13 images and captured by a camera with a different degree of radial distortion. The dataset contains significant scaling and rotation changes. Four randomly selected images from each set are shown in the right panels of Figure 14.
Fisheye cameras: Each of the three sets of images is accompanied by images of a checkerboard calibration board for camera calibration. Therefore, we calibrated each camera based on the KB4 fisheye camera model using the provided chessboard images. The calibration results are shown in Table 6.
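A calibration sketch using OpenCV's fisheye module, which implements the four-coefficient Kannala–Brandt (KB4) model; the board size and file paths are placeholders.

```python
import glob
import cv2
import numpy as np

CHECKERBOARD = (9, 6)                       # inner corners; placeholder size
objp = np.zeros((1, CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objp[0, :, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)

objpoints, imgpoints, img_size = [], [], None
for path in glob.glob("chessboard/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, CHECKERBOARD)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

K, D = np.zeros((3, 3)), np.zeros((4, 1))   # D holds k1..k4 of the KB4 model
rms, K, D, _, _ = cv2.fisheye.calibrate(
    objpoints, imgpoints, img_size, K, D,
    flags=cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC + cv2.fisheye.CALIB_FIX_SKEW)
print("RMS reprojection error:", rms)
```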
Evaluation: Similar to Experiment 3, to test the matching performance on the three groups of the sRD-SIFT dataset, we employed the baseline descriptors (ORB, AKAZE, BRISK, dBRIEF, and mdBRIEF) and FSD-BRIEF to extract and match the 300 strongest keypoints for each image and then drew PR curves. The results are shown in Figure 14, where Figure 14a,b shows the results of the image group with the least distortion, Figure 14c,d the results of the group with moderate distortion, and Figure 14e,f the results of the group with the most distortion, captured by fisheye cameras.
Figure 14a,b shows that the PR curve of FSD-BRIEF almost coincided with those of ORB and AKAZE, with AKAZE performing slightly better. The recall rate at the end of the curves of FSD-BRIEF, ORB, and AKAZE was in the range of 0.65–0.7, higher than that of BRISK and dBRIEF. From this result, we can see that the performance of FSD-BRIEF was equivalent to that of ORB on slightly distorted images.
Figure 14c,d shows that the PR curve of FSD-BRIEF almost coincided with that of ORB, with the recall at the end of the curve around 0.6, higher than that of AKAZE, BRISK, and dBRIEF. From this result, we can see that on moderately distorted images the performance of FSD-BRIEF was equivalent to that of ORB and better than that of AKAZE, BRISK, and dBRIEF.
In Figure 14e,f, it can be observed that the recall value at the end of the PR curve of FSD-BRIEF was around 0.6, higher than that of ORB, AKAZE, BRISK, and dBRIEF, and almost the same as that of mdBRIEF. From this result, we can see that on the most distorted images the performance of FSD-BRIEF was almost equivalent to that of mdBRIEF and better than that of ORB, AKAZE, BRISK, and dBRIEF.
These experimental results show that the performance of FSD-BRIEF on heavily distorted images was better than that of most of the state-of-the-art features involved in the comparison, while on slightly and moderately distorted images it was similar to that of ORB. This is because the test image was close to the center of the FoV in this dataset, so the radial distortion imposed on the test image by the fisheye lens was limited compared with Experiment 3. Therefore, the performance of FSD-BRIEF on the sRD-SIFT dataset was not as prominent as on the 210° FoV camera dataset in Experiment 3.