Article

Illumination-Invariant Feature Point Detection Based on Neighborhood Information

1 Key Laboratory of Metallurgical Equipment and Control Technology, Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China
2 Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
3 Institute of Robotics and Intelligent Systems, Wuhan University of Science and Technology, Wuhan 430081, China
4 School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
5 School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide 5005, Australia
* Author to whom correspondence should be addressed.
Sensors 2020, 20(22), 6630; https://doi.org/10.3390/s20226630
Submission received: 22 October 2020 / Revised: 10 November 2020 / Accepted: 16 November 2020 / Published: 19 November 2020
(This article belongs to the Section Intelligent Sensors)

Abstract

Feature point detection is the basis of computer vision, and detection methods with both geometric invariance and illumination invariance remain a key and difficult problem in the field of feature detection. This paper proposes an illumination-invariant feature point detection method based on neighborhood information. The method can be summarized in two steps. First, feature points are divided into eight types according to the number of connected neighbors. Second, each type of feature point is classified again according to the position distribution of its neighboring pixels. Theoretical analysis proves that the proposed method has lower computational complexity than other methods. The experimental results indicate that, when the photometric variation between two images is very large, feature-based detection methods are usually inferior, while learning-based detection methods perform better. However, our method outperforms the learning-based detection methods in terms of the number of feature points, the number of matching points, and the stability of the repeatability rate. The experimental results demonstrate that the proposed method has the best illumination robustness among state-of-the-art feature detection methods.

1. Introduction

Digital images consist of a finite number of discrete pixels obtained using digital image sensors (such as CCD or CMOS). These discrete pixels reflect energy intensity through numerical values, and the energy intensity is related to the characteristics of the captured object. Because of this relationship, the features of the captured object can be expressed by the pixels in the image. Feature detection is an abstraction of image information and a local decision made at each pixel as to whether a given type of feature is present. It is a fundamental problem in computer vision and has many practical applications, such as object detection [1], stereo matching [2], color matching [3], and motion estimation [4]. In order to respond to diverse applications, many detection methods have been proposed [5,6]. Following the traditional classification, feature detection can be divided into point, edge, and region detection. Feature points are the most widely used because of their stability and uniqueness.
Feature point detection with geometric invariance and illumination invariance has always been a challenging problem. Geometric invariance includes translation, rotation, scale, and affine invariance. Illumination invariance is also called illumination robustness; the illumination robustness of a feature detector reflects its ability to extract features from low-illumination or overexposed images. In the past, this property was often treated as a supplement to geometric invariance, and there were few dedicated studies, as if it were unimportant. However, with the widespread application of computer vision, feature point detection in complex scenes (such as non-uniform illumination) has become a necessity, and illumination invariance has become as important as geometric invariance. This paper focuses on illumination-robust feature point detection and proposes a novel illumination-robust detection method.
To the best of our knowledge, the early illumination-robust detection approaches are all feature-based methods. One of the most common strategies is to improve the illumination quality of the input image. For example, Faille [7] decomposes the input image into illumination components and reflection components, and then uses a high-pass filter to remove the low-frequency illumination components. Gevrekci et al. [8] apply a contrast stretching function to two differently illuminated images. When the contrast center is varied, the two differently illuminated images produce similar response images at different contrast centers, and at this point most feature detectors can obtain a better detection result. Xue and Gao [9] constructed an illumination-invariant color space based on adaptive histogram equalization and dark channel prior theory, and then used the AKAZE detector to extract feature points. Adaptive histogram equalization was used to enhance texture details and balance the illumination in the image, and the dark channel prior was used to further reduce the impact of illumination on feature extraction.
Another, often better, option is to consider illumination robustness during the design of the feature detector itself. Moravec [10] proposed the earliest corner detection method. Harris and Stephens [11] used gradients to compute a response function and then used the response function to determine corners; the introduction of gradients reduced the impact of illumination on the detector. Lowe [12] proposed the SIFT feature detector, suggested using the Hessian matrix instead of Harris for keypoint selection, and redefined the keypoint response function. The introduction of the Hessian matrix makes the detector robust to illumination. As an accelerated version of SIFT, SURF [13] also uses the Hessian matrix for feature selection, and its response function is improved on the basis of the Harris detector. Lee and Chen [14] proposed a method to detect feature points using histogram information; this method constructs a Hessian-like matrix that does not contain second-order partial derivatives but instead contains the histogram information of the pixel neighborhood. Miao and Jiang [15] proposed a feature detector based on a nonlinear filter named ROLG (Rank Order Laplacian of Gaussian). The ROLG is a rank order filter whose weights are proportional to the coefficients of the LoG (Laplacian of Gaussian) filter. Wu et al. [16] proposed a detection method that utilizes optimal multi-binary images to eliminate noise and illumination effects. Considering the problem that low-contrast image structures are easily submerged by high-contrast image structures, Miao et al. [17] proposed a zero-norm LoG filter. Since the response of the zero-norm LoG filter is proportional to the weighted sum of pixels in the local area, the filter is invariant to image contrast. Furthermore, based on the zero-norm LoG filter, they developed a new feature point detector. Hong-Phuoc and Guan [18] pointed out that most hand-crafted feature detectors rely on pre-designed structures, and such pre-designed structures are affected by uneven illumination. They proposed a feature detector that locates feature points by calculating the complexity of the blocks surrounding each pixel.
Among the feature-based detection methods, Harris is considered the basis for the illumination robustness of corner detectors, and the Hessian matrix is the root of the illumination robustness of spot detection methods. However, the Harris detector is built on the autocorrelation matrix and therefore picks up textured patterns and noise while detecting corners. The Hessian matrix contains second-order partial derivatives, so a feature detector constructed with the Hessian matrix as the response function inevitably introduces unstable and erroneous points around structures [18]. Although there are other illumination-robust feature detection methods, they are not widely used due to their own limitations. For example, Wu's method [16] must be provided with a reference image when extracting feature points, and the method of Hong-Phuoc and Guan [18] does not work well for severely underexposed or overexposed images.
As feature-based detection methods encountered bottlenecks, deep learning became widely used in many fields as a brand-new problem-solving paradigm. Naturally, learning-based methods were also introduced into feature point detection as a new attempt.
TILDE [19] introduced a learning-based method for feature point detection and trained a regressor through supervised learning to work normally even if the illumination changes drastically. Unlike TILDE, which only performs feature detection, LIFT [20] is a novel architecture that performs detection, orientation estimation, and description at the same time. Its training process introduces an inverse training strategy, which minimizes the influence of illumination on feature point detection. Although LIFT can extract illumination-robust feature points well, it is still a supervised learning method. Quad-Networks [21] is an unsupervised feature point detection method. It trains a neural network in an illumination-invariant manner and uses the network to rank pixels; pixels that retain a high ranking under different illumination are selected as candidate feature points. The network obtained by this training is an illumination-robust feature detection network that can extract illumination-robust feature points. The unsupervised learning of SuperPoint [22] is different from Quad-Networks: it pre-trains the feature detector on a procedurally generated synthetic data set of polygonal geometric shapes, then uses the pre-trained network to extract feature points on a real data set as label data, and finally uses these data to train the network. In addition, LF-Net [23] exploits depth and relative camera pose to create a virtual target response for the network; through this response relationship, training can be performed without a hand-crafted detector, thereby enabling sparse matching. D2-Net [24] addresses the poor performance of traditional sparse local features under drastic illumination changes by postponing the detection stage. Key.Net [25] combines hand-crafted detectors and CNN filters in a shallow multi-scale framework, which reduces the number of network parameters and ensures the detection repeatability rate. ASLFeat [26] further improves the keypoint localization accuracy of D2-Net.
With the widespread application of learning-based methods in feature detection, some inherent disadvantages have gradually been exposed, such as poor versatility, high training costs (time and equipment), and the need for large amounts of training data. In addition, the uninterpretability of the learned results is a problem that must be faced. Until these problems are solved, learning-based detection methods are unsuitable for many application scenarios. In view of this, feature-based detection methods remain a key research area now and for the foreseeable future. However, feature-based detectors are basically extensions of Harris, Hessian, and FAST, and these base detectors themselves do not have excellent illumination robustness. Our method is a brand-new detection method that completely bypasses the conventional design ideas of such detectors and instead uses the location information of the eight-neighborhood for detection. Since the eight-neighborhood of a pixel is immediately adjacent to that pixel, detailed information is well preserved and the illumination robustness of the detection is improved. At the same time, our method differs from Wu's method [16]: building on it, we deepen and expand the feature point taxonomy from 8 types to 250 types, and this expansion improves both matching accuracy and matching speed. In addition, we design a complete illumination-robust feature detection method and analyze its matching performance. We also add experiments with different illumination intensities and illumination directions in Section 5 (Experimental Results). The contributions of this paper are as follows:
  • This paper proposes a novel feature point detection method based on the position of the neighborhood connections. The computational complexity of the method is also analyzed.
  • By introducing a multiple-optimal image binarization method before feature point detection, the proposed detection method is ensured to have better illumination invariance.
  • Experimental results prove that our method has significant advantages over current state-of-the-art methods in terms of the number of matching feature points and the stability of the repeatability rate.
This paper is organized as follows. The second section introduces a multiple-optimal image binarization method. In the third section, we propose a novel feature point classification and detection method. The fourth section proposes a classification matching method based on the third section and theoretically analyzes the time consumption of different detection methods. The experimental results are given in the fifth section, and the conclusion is presented in the last section.

2. Illumination-Invariant Transformation

For images with large photometric variation, this paper proposes a multiple-optimal image binarization method based on the relative information of the two images. The multiple-optimal image binarization method further improves the feature point detection performance of the proposed method by improving the detection environment. The method assumes that the processed images are differently illuminated images obtained by the same camera for the same scene. Under this premise, combined with the monotonic increase of the camera response function (CRF) [27] and the Median Threshold Bitmap (MTB) [28] ordinal measurement method, the thresholds required for binarization can be obtained. Through the multiple-optimal image binarization method, the feature point information in the image is retained to the maximum extent, which provides a guarantee for the subsequent feature point detection.

2.1. Monotonic Increase of the Camera Response Function

The CRF is a monotonically increasing function that converts scene brightness into image intensity under given exposure conditions; hence, a change in illumination alters the intensity values of an image but preserves their relative order. Suppose $Z_1, Z_2 \in \mathbb{R}^{M \times N}$ are two images of the same scene under different illumination. Rearranging the pixel values of each image in ascending order of brightness yields $Z_1^1, Z_1^2, \dots, Z_1^k, \dots, Z_1^{M \times N}$ and $Z_2^1, Z_2^2, \dots, Z_2^k, \dots, Z_2^{M \times N}$. By the monotonicity of the camera response function, we have the correspondence
$$Z_1^k \leftrightarrow Z_2^k, \quad k = 1, 2, \dots, M \times N. \qquad (1)$$
Therefore, for two photometric-variation images, identical binary images can be obtained by binarizing at any given percentile of the ordered pixels.
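The ordering argument above can be illustrated with a short numerical sketch (Python/NumPy). The gamma curve below is only a stand-in for an unknown monotonically increasing CRF and the input image is synthetic; the point is simply that thresholding both renderings at the same percentile selects exactly the same set of pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
Z1 = rng.integers(0, 256, size=(64, 64)).astype(np.float64)   # one rendering of the scene
Z2 = 255.0 * (Z1 / 255.0) ** 0.4                               # same scene through a monotonic "CRF"

def binarize_at_percentile(img, pct):
    """Threshold an image at a given percentile of its own intensities."""
    return img > np.percentile(img, pct)

# A monotonic mapping preserves the pixel ranking, so the same percentile
# selects the same pixels in both renderings.
print(np.array_equal(binarize_at_percentile(Z1, 60.0),
                     binarize_at_percentile(Z2, 60.0)))        # True
```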

2.2. The Ordinal Measures

The MTB, Local Binary Pattern (LBP) [29], and Local Ternary Pattern (LTP) [30] are often used to represent the illumination invariance of an image. Wu et al. [16] adopted the MTB because it can obtain the best features for differently illuminated images. Its mathematical expression is
$$F_{MTB}(u) = \begin{cases} 1, & \text{if } Z(u) > z_{med} \\ 0, & \text{otherwise,} \end{cases} \qquad (2)$$
where $u$ is a point in the image $Z$, $Z(u)$ is the intensity value at $u$, and $z_{med}$ is the median intensity of $Z$.
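For reference, a minimal sketch of Eq. (2), assuming an 8-bit grayscale image stored as a NumPy array:

```python
import numpy as np

def mtb(img):
    """Median Threshold Bitmap (Eq. (2)): 1 where the pixel exceeds the median intensity, else 0."""
    z_med = np.median(img)
    return (img > z_med).astype(np.uint8)
```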
However, Wu et al. [31] pointed out that the MTB also has some problems: (1) many pixels share the same gray value in the discrete domain, so an exactly equal split by the median is impossible; (2) the conversion is very sensitive to noise, especially for pixels whose values are close to the median; and (3) the conversion is less accurate when intensities take extreme values in very dark or very bright images (close to 0 or 255). To solve these problems, the multiple-optimal image binarization method is introduced.

2.3. Multiple-Optimal Image Binarization Method

Let $Z_1$ and $Z_2$ be the two images of the same scene, and let $\Pi_1$ and $\Pi_2$ be the corresponding cumulative distributions. The optimal percentile pair $(\xi_1, \xi_2)$ used to binarize images $Z_1$ and $Z_2$ based on ordinal information is obtained by
$$(\xi_1, \xi_2) = \arg\min_{p, q} \left| \Pi_1(p) - \Pi_2(q) \right|, \qquad (3)$$
where $p$ and $q$ are gray values with $p, q \in [0, 255]$, and the minimum value 0 is trivially attained when both $p$ and $q$ equal 255. To avoid this trivial solution, and to eliminate the noise appearing in shadow regions, the search range is limited to [50, 250]. In order to further improve the robustness of the method, multiple binarizations are introduced to obtain a series of new images:
$$B_1^k(u) = \begin{cases} 1, & \text{if } Z_1(u) > \xi_1^k \\ 0, & \text{otherwise,} \end{cases} \qquad B_2^k(u) = \begin{cases} 1, & \text{if } Z_2(u) > \xi_2^k \\ 0, & \text{otherwise,} \end{cases} \qquad (4)$$
where $B_1^k$ and $B_2^k$ are the $k$-th binary images and $K$ is the total number of binarization levels; that is, the differently illuminated images are binarized by the suboptimal percentile pairs $(\xi_1^k, \xi_2^k)$, $k = 1, 2, \dots, K$.
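A possible implementation sketch of Eqs. (3) and (4) is given below, assuming 8-bit grayscale NumPy images. How the K suboptimal percentile pairs are chosen is not fully specified above, so this sketch simply takes the k smallest values of the objective, which may return near-duplicate pairs; it is meant as an illustration rather than the exact procedure.

```python
import numpy as np

def cumulative_distribution(img):
    """Normalized cumulative histogram Pi(g) over gray levels 0..255."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    return np.cumsum(hist) / hist.sum()

def optimal_percentile_pairs(Z1, Z2, k=3, lo=50, hi=250):
    """Return k gray-value pairs (p, q) in [lo, hi] minimizing |Pi1(p) - Pi2(q)| (Eq. (3))."""
    pi1, pi2 = cumulative_distribution(Z1), cumulative_distribution(Z2)
    p, q = np.meshgrid(np.arange(lo, hi + 1), np.arange(lo, hi + 1), indexing="ij")
    diff = np.abs(pi1[p] - pi2[q])
    best = np.argsort(diff, axis=None)[:k]           # indices of the k smallest objective values
    return [(int(p.ravel()[i]), int(q.ravel()[i])) for i in best]

def binarize_pair(Z1, Z2, xi1, xi2):
    """Eq. (4): threshold each image with its own (sub)optimal percentile value."""
    return (Z1 > xi1).astype(np.uint8), (Z2 > xi2).astype(np.uint8)
```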

2.4. Eliminating the Effect of Photometric Variation

Here, $\hat{Z}_1$ and $\hat{Z}_2$ are two smoothed images of the same scene under different illumination, which can be linked by
$$\hat{Z}_2(u) = f_{12}\big(\hat{Z}_1(u)\big), \qquad \hat{Z}_1(u) = f_{21}\big(\hat{Z}_2(u)\big), \qquad (5)$$
where $f_{12}$ and $f_{21}$ are known as the Intensity Mapping Functions (IMFs) [32]; $f_{12}$ (resp. $f_{21}$) maps the intensities of image $\hat{Z}_1$ ($\hat{Z}_2$) to those of image $\hat{Z}_2$ ($\hat{Z}_1$). The IMFs can be calculated by histogram matching:
$$f_{12}(z_1) = \Pi_2^{-1}\big(\Pi_1(z_1)\big), \qquad f_{21}(z_2) = \Pi_1^{-1}\big(\Pi_2(z_2)\big), \qquad (6)$$
where $z_1$ and $z_2$ are intensity values of the corresponding images $\hat{Z}_1$ and $\hat{Z}_2$. In order to determine whether to use $f_{12}$ or $f_{21}$, a weighting function $\omega(z)$ is introduced for the value at each pixel:
$$\omega(z) = \begin{cases} z, & \text{if } z < 128 \\ 255 - z, & \text{otherwise,} \end{cases} \qquad (7)$$
where $z$ is the intensity value of a single pixel. However, since the intensity mapping must be applied to the entire image, the weight of a single pixel is not enough; we therefore calculate the cumulative weight over all pixels of $\hat{Z}_1$ and $\hat{Z}_2$:
$$W(\hat{Z}_1) = \sum_{u} \omega\big(\hat{Z}_1(u)\big), \qquad W(\hat{Z}_2) = \sum_{u} \omega\big(\hat{Z}_2(u)\big), \qquad (8)$$
where $W(\hat{Z}_1)$ and $W(\hat{Z}_2)$ are the cumulative weights of images $\hat{Z}_1$ and $\hat{Z}_2$. We then decide which image to transform by comparing the two cumulative weights and normalize the input images as follows:
$$\bar{Z}_1 = \begin{cases} f_{21}(\hat{Z}_2), & \text{if } W(\hat{Z}_1) < W(\hat{Z}_2) \\ \hat{Z}_1, & \text{otherwise,} \end{cases} \qquad \bar{Z}_2 = \begin{cases} f_{12}(\hat{Z}_1), & \text{if } W(\hat{Z}_2) < W(\hat{Z}_1) \\ \hat{Z}_2, & \text{otherwise.} \end{cases} \qquad (9)$$
The key idea of this section is to use the more reliable (less saturated) image for intensity mapping, which significantly reduces the effect of image saturation, eliminates the effect of large photometric variation, improves the detection environment, and reduces the difficulty of feature point detection.
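The following sketch strings Eqs. (5)-(9) together for 8-bit grayscale NumPy images. The pseudo-inverse of the cumulative distribution in Eq. (6) is realized here by plain histogram matching with searchsorted, which is one reasonable reading of the equation rather than the authors' exact implementation.

```python
import numpy as np

def intensity_mapping_function(src, dst):
    """IMF f_{src->dst}: a 256-entry lookup table implementing Pi_dst^{-1}(Pi_src(z)) (Eq. (6))."""
    hist_s = np.bincount(src.ravel(), minlength=256).astype(np.float64)
    hist_d = np.bincount(dst.ravel(), minlength=256).astype(np.float64)
    cdf_s = np.cumsum(hist_s) / hist_s.sum()
    cdf_d = np.cumsum(hist_d) / hist_d.sum()
    return np.searchsorted(cdf_d, cdf_s).clip(0, 255).astype(np.uint8)

def cumulative_weight(img):
    """Eqs. (7)-(8): pixels near mid-gray get high weight, pixels near 0 or 255 get low weight."""
    z = img.astype(np.float64)
    return np.where(z < 128, z, 255.0 - z).sum()

def normalize_pair(Z1, Z2):
    """Eq. (9): remap the better-exposed (higher-weight) image into the range of the other one."""
    if cumulative_weight(Z1) < cumulative_weight(Z2):
        f21 = intensity_mapping_function(Z2, Z1)      # map Z2 intensities into Z1's range
        return f21[Z2], Z2
    f12 = intensity_mapping_function(Z1, Z2)          # otherwise map Z1 intensities into Z2's range
    return Z1, f12[Z1]
```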

3. Feature Point Detection Based on Neighborhood Information

Detection methods based on feature point neighborhood information can be divided into two types: methods based on the number of neighborhood connections and methods based on the location of neighborhood connections. The former was introduced in Reference [16]; this paper focuses on the latter.

3.1. Classification Based on Neighborhood Connectivity Location

Different from the classification method based on the number of neighborhood connections, the classification method based on the location of neighborhood connections encodes not only the number of connected neighbors but also their locations.
Figure 1c is a local candidate feature points map of Figure 1a, and the diagram of feature point neighborhood connectivity information is shown in Figure 2.
Each combination of a letter and a number in Figure 2 represents a candidate feature point. Different letters indicate different numbers of neighborhood connections; the same letter with different numbers indicates different connected locations of the neighboring pixels. Furthermore, the feature point neighborhood contains up to eight pixels, that is, there can be up to eight directions. Therefore, based on the number of connected neighbors, we can divide feature points into eight types: Endpoint, Corner, Junction, Intersection, Five-line intersection, Six-line intersection, Seven-line intersection, and Eight-line intersection. Here, we count the number and proportion of different types of feature points in the image. The experimental material is derived from the TID2008 dataset, and the statistical results are shown in Figure 3.
The statistical results indicate that: (1) Corners account for the highest proportion, close to 50%, followed by Endpoints and Junctions; (2) the first four types of feature points account for more than 99%; and (3) the last four types account for a negligible proportion and can be ignored. Therefore, feature detection only needs to detect the first four types of feature points.
In order to further reduce the time spent on matching and improve the matching accuracy, we introduced the location information of the neighborhood, and proposed a feature point classification method based on the connection location of the neighborhood, as shown in Figure 4. It should be particularly noted that the proposed method divides the feature points into 250 types, and it is neither realistic nor necessary to list them all in the paper. Therefore, Figure 4 only shows a part of them for visual analysis.
  • Endpoint
    Different connection positions of the neighboring pixel constitute different types of Endpoint. An Endpoint is formed when exactly one pixel in the 8-neighborhood of the feature point is connected; therefore, Endpoints can be divided into 8 types. The Endpoint types are shown in Figure 4a.
  • Corner
    A Corner is formed when the feature point is connected with two different pixels in its 8-neighborhood. Take the type-I Endpoint in Figure 4a as an example: its connected neighbor occupies one pixel position, and another pixel is selected from the remaining seven positions to form a Corner. According to the position of the second pixel, the feature point forms a new type. Note that, when the two neighboring pixels form a straight line with the feature point, as shown in type IV of Figure 4b, the point is no longer a Corner and needs to be excluded. Corners can therefore be divided into 24 types.
  • Junction
    Based on the type-I Corner in Figure 4b, a third connected pixel is added at one of the remaining neighborhood positions to form the third type of feature point, named Junction. Figure 4c shows Junctions derived from the type-I Corner. Junctions can be divided into 56 types.
  • Intersection
    The Intersection is generated based on the Junction. Figure 4d shows several types of Intersection derived from the type-I Junction. Intersections can be divided into 70 types. Figure 3 shows that, when the number of connected neighbors of a feature point is greater than 4, the probability of occurrence is too small to affect the matching result, so these types are not considered. A quick combinatorial check of the per-type counts listed above is given below.
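The per-type counts quoted above follow directly from binomial coefficients over the eight neighborhood positions; a quick check (assuming, as stated above, that the four collinear pairs are the only excluded Corner configurations):

```python
from math import comb

endpoints     = comb(8, 1)        # 8:  one connected neighbor, 8 possible positions
corners       = comb(8, 2) - 4    # 24: 28 position pairs minus the 4 collinear pairs
junctions     = comb(8, 3)        # 56
intersections = comb(8, 4)        # 70
print(endpoints, corners, junctions, intersections)   # 8 24 56 70
```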

3.2. Feature Point Detection

For photometric-variation images, the multiple-optimal image binarization method is used to obtain multiple binary images. For each binary image, assume that $B_1$ and $B_2$ are the optimal binary images obtained with the optimal percentile pair $(\xi_1, \xi_2)$. The target boundary of the image is obtained as follows:
$$P_j = B_j - \left( B_j \ominus \Omega \right), \qquad (10)$$
where $j \in \{1, 2\}$, $\Omega$ is a square structuring element with a width of 3 pixels, and $\ominus$ denotes the erosion operation.
For the boundary image $P_j$ containing the candidate feature points, the feature point response $F_j(u)$ is derived from the number of pixels connected to $u$ in $P_j$:
$$F_j(u) = \sum_{k \in \Theta(u)} P_j(k), \qquad (11)$$
where $\Theta(u)$ is the 8-connected neighborhood of feature point $u$, and $F_j(u) \in \{1, 2, \dots, 8\}$ is the number of connected neighbors of $u$ in the $j$-th image. When $F_j(u) = 1$, the feature point is an Endpoint; when $F_j(u) = 2$, it is a Corner; and so on. Equation (11) is the mathematical expression of the classification method based on the number of connected neighbors.
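A minimal sketch of Eqs. (10) and (11) on a 0/1 binary NumPy image is shown below; the 3x3 erosion is written with plain slicing so no image-processing library is required.

```python
import numpy as np

def boundary(B):
    """Eq. (10): P = B - (B eroded by a 3x3 structuring element)."""
    B = (B > 0).astype(np.uint8)
    padded = np.pad(B, 1, mode="constant")
    eroded = np.ones_like(B)
    for di in (-1, 0, 1):                 # a pixel survives erosion only if its whole
        for dj in (-1, 0, 1):             # 3x3 window is foreground
            eroded &= padded[1 + di:1 + di + B.shape[0], 1 + dj:1 + dj + B.shape[1]]
    return B - eroded                     # the erosion is a subset of B, so this stays in {0, 1}

def neighbor_count(P):
    """Eq. (11): for every boundary pixel, count its 8-connected boundary neighbors."""
    padded = np.pad(P, 1, mode="constant")
    count = np.zeros(P.shape, dtype=np.int32)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            count += padded[1 + di:1 + di + P.shape[0], 1 + dj:1 + dj + P.shape[1]]
    return np.where(P == 1, count, 0)     # F_j(u) is defined only on boundary pixels
```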
The detection method based on the connected positions of the feature point neighborhood not only obtains the number of connected pixels around the feature point but also records their positions. The mathematical expression of the proposed method is as follows:
$$F_j^i(u_k) = \sum_{k \in \Theta(u)} u_k \, P_j(k), \qquad (12)$$
where $u_k$ represents the specific position of pixel $k$ relative to the feature point $u$, $i$ represents the number of connected neighbors of the feature point, $j$ indexes the corresponding image, and $\Theta(u)$ is the 8-connected neighborhood of pixel $u$.
In the proposed method, there is the following equivalence between a feature point and its mathematical expression, for example,
$$F_j^1(u_1) \;\Leftrightarrow\; \begin{pmatrix} 0 & 0 & 0 \\ 0 & u & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (13)$$
where $F_j^1(u_1)$ indicates that the point is an Endpoint, specifically the type-I Endpoint.
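One way to realize the position-based classification of Eqs. (12) and (13) in code is to encode the occupied neighborhood positions as an 8-bit pattern. The offset ordering below, and hence the mapping from bit patterns to the type numbers (I, II, ...) of Figure 4, is an assumption of this sketch and not the paper's numbering.

```python
# Fixed (assumed) ordering of the eight neighborhood offsets around a feature point.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def position_signature(P, u):
    """Return (number of connected neighbors, 8-bit pattern of their positions) for pixel u of
    the boundary image P, i.e. both the count of Eq. (11) and the positions of Eq. (12)."""
    i, j = u
    bits = 0
    for b, (di, dj) in enumerate(OFFSETS):
        ni, nj = i + di, j + dj
        if 0 <= ni < P.shape[0] and 0 <= nj < P.shape[1] and P[ni, nj]:
            bits |= 1 << b
    return bin(bits).count("1"), bits     # e.g. a type-I Endpoint maps to (1, a one-hot pattern)
```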

4. Matching Performance Analysis

Feature point matching is the process of detecting and extracting feature points from the image, and then finding the closest corresponding point according to a preset measurement criterion. Figure 5 shows two different matching ideas. Figure 5a shows the general feature point matching, and Figure 5b shows the classification matching of feature point.
The key to classification matching is to perform the matching process within the subset of the corresponding class. The classification matching based on the number of connected neighbors can be expressed as
$$F_1(u) \leftrightarrow F_2(u). \qquad (14)$$
The classification matching based on the connected positions of the feature point neighborhood can be expressed as
$$F_1^i(u_k) \leftrightarrow F_2^i(u_k), \qquad (15)$$
where $u_k$ is a subset of $u$; the number of feature points in $u_k$ is smaller than that in $u$.
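A generic sketch of classification matching as expressed by Eqs. (14) and (15): feature points are bucketed by their type signature, and matching is only attempted inside buckets that exist in both images. The signature format and the inner match_fn are deliberately left abstract, since the descriptor and matcher are not fixed in this section.

```python
from collections import defaultdict

def classified_match(feats1, feats2, match_fn):
    """feats1, feats2: iterables of (signature, descriptor) pairs;
    match_fn: matches two lists of descriptors and returns a list of matches."""
    buckets1, buckets2 = defaultdict(list), defaultdict(list)
    for sig, desc in feats1:
        buckets1[sig].append(desc)
    for sig, desc in feats2:
        buckets2[sig].append(desc)
    matches = []
    for sig in buckets1.keys() & buckets2.keys():     # only identically-typed points are compared
        matches.extend(match_fn(buckets1[sig], buckets2[sig]))
    return matches
```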

4.1. Matching Time Estimation

Element $z_{ij}$ of the matrix $Z$ represents a measure of similarity between feature points $x_i$ and $y_j$. A kernel function $K: X \times Y \to \mathbb{R}$ is used to define these elements as inner products in an inner product space:
$$z_{ij} = K(x_i, y_j) = \left\langle \phi(x_i), \phi(y_j) \right\rangle, \qquad (16)$$
and the kernel matrix over all point pairs is
$$Z = \begin{pmatrix} z_{11} & \cdots & z_{1n} \\ \vdots & \ddots & \vdots \\ z_{m1} & \cdots & z_{mn} \end{pmatrix} = \begin{pmatrix} K(x_1, y_1) & \cdots & K(x_1, y_n) \\ \vdots & \ddots & \vdots \\ K(x_m, y_1) & \cdots & K(x_m, y_n) \end{pmatrix}. \qquad (17)$$
In general, given an appropriate kernel function $K$, Mercer's theorem [33] ensures that such a mapping $\phi(\cdot)$ exists. In this paper, the time cost of matching point pairs is estimated from Equation (17).

4.2. Matching Time Comparison

For a traditional feature point matching method such as SIFT, the matching time is the total time spent computing the inner product of every feature point pair $(x_i, y_j)$:
$$\mathrm{Time}(Z) = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{time}\big( K(x_i, y_j) \big). \qquad (18)$$
With the detection method based on the number of connected neighbors, the feature point set $X$ of image $Z_1$ is segmented into eight subsets $X_1, X_2, \dots, X_7, X_8$, and the matching time overhead becomes
$$\mathrm{Time}'(Z) = \sum_{i=1}^{8} \mathrm{time}\big( K(X_i, Y_i) \big). \qquad (19)$$
Further, in the detection method based on the connected positions of feature points, each subset is refined into Endpoint subsets $X_1^i$, Corner subsets $X_2^j$, Junction subsets $X_3^k$, Intersection subsets $X_4^h$, etc. The following relationship holds after the division:
$$X_1^i \subseteq X_1, \quad X_2^j \subseteq X_2, \quad X_3^k \subseteq X_3, \quad X_4^h \subseteq X_4. \qquad (20)$$
The matching time for the detection method based on the neighborhood connected positions is expressed by
$$\mathrm{Time}''(Z) = \sum_{p} \sum_{q} \mathrm{time}\big( K(x_{pq}, y_{pq}) \big), \quad \text{s.t.} \;\; p \in \{1, 2, 3, 4\}, \;\; q = 1, 2, \dots, C_8^p, \qquad (21)$$
where, for example, $K(x_1^i, y_1^i)$ is the kernel function over the sets of type-$i$ Endpoints, and $\mathrm{time}\big(K(x_1^i, y_1^i)\big)$ is the time required to match the type-$i$ Endpoints.
According to Equations (18), (19) and (21), the time overhead of feature point matching satisfies
$$\mathrm{Time}(Z) > \mathrm{Time}'(Z) > \mathrm{Time}''(Z). \qquad (22)$$
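A back-of-the-envelope comparison of the three costs, using purely illustrative class proportions loosely inspired by Figure 3 (not measured data) and assuming points spread evenly over the subtypes of each class, reproduces the ordering of Eq. (22):

```python
m = n = 1000                                  # assumed number of feature points per image
brute_force = m * n                           # Eq. (18): every pair is compared

# Illustrative proportions of Endpoints / Corners / Junctions / Intersections,
# and the number of position-based subtypes in each class (8, 24, 56, 70).
props    = [0.25, 0.50, 0.15, 0.09]
subtypes = [8, 24, 56, 70]

count_based    = sum((m * p) * (n * p) for p in props)                        # Eq. (19)
position_based = sum((m * p) * (n * p) / s for p, s in zip(props, subtypes))  # Eq. (21)

print(brute_force, count_based, round(position_based))   # 1000000  343100.0  18747
```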

5. Experimental Results

In this section, we selected images with different illumination as experimental materials for feature point detection and matching. Some of these images were obtained by changing the exposure settings, and the others were captured at different times, such as morning and afternoon, or daytime and night. The experimental materials include indoor scenes, outdoor scenes, close-up scenes, and long-distance scenes.
The comparison methods used in this paper are of two types: feature-based and learning-based detection methods. The feature-based methods include Harris [11], MinEigen [34], SIFT [12], SURF [13], IRFET_Harris [8], FAST [35], ORB [36], A-KAZE [37], and Wu [16]. In this section, unless otherwise specified, Wu's method [16] is denoted as Wu. The learning-based methods are LIFT [20], SuperPoint [22], and LF-Net [23]. The relevant parameters of the feature-based methods all follow the parameters in the published papers, and the learning-based detection methods use the pre-trained models published by the authors on GitHub. The default maximum number of keypoints in the LIFT and LF-Net pre-trained models is small (1000 for LIFT and 500 for LF-Net), which would seriously affect the fairness of the comparison. To avoid this unfairness, we uniformly set the maximum number of keypoints in the pre-trained models to a very large value so that as many feature points as possible can be detected.
We use several common feature detector evaluation indicators, including the number of feature points, the number of matching points, and the repeatability rate, to evaluate the performance of the proposed method. The repeatability rate is a key evaluation indicator with various definitions, among which the definition of [38] is widely used:
$$r(d) = \frac{\left| \left\{ (\tilde{x}_i, \tilde{x}_j) \;\middle|\; \mathrm{dist}\big(H_{ij}\tilde{x}_i, \tilde{x}_j\big) < d \right\} \right|}{\min(n_i, n_j)}, \qquad (23)$$
where $(\tilde{x}_i, \tilde{x}_j)$ is a pair of matching feature points, $\mathrm{dist}(H_{ij}\tilde{x}_i, \tilde{x}_j)$ is the distance between the pair after projection, $H_{ij}$ is the homography matrix used to transform point $\tilde{x}_i$ from one image into the other, and $n_i$ and $n_j$ are the numbers of feature points detected in the two images.
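A sketch of Eq. (23) for keypoints given as (x, y) NumPy arrays is shown below; the distance threshold d and the nearest-neighbour counting follow the usual reading of [38], and d is left as a parameter.

```python
import numpy as np

def repeatability(pts1, pts2, H, d=3.0):
    """Fraction of points whose H-projection lands within d pixels of a detection in the other image."""
    ones = np.ones((pts1.shape[0], 1))
    proj = (H @ np.hstack([pts1, ones]).T).T          # project pts1 into image 2 (homogeneous coords)
    proj = proj[:, :2] / proj[:, 2:3]
    dists = np.linalg.norm(proj[:, None, :] - pts2[None, :, :], axis=2)
    matched = int((dists.min(axis=1) < d).sum())      # projected points with a near neighbour
    return matched / min(len(pts1), len(pts2))
```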

5.1. Different Exposure Value

Figure 6 contains six groups of experimental materials with different exposure value. Each group of materials consists of two images of the same scene. The left image is overexposed, and the right image is underexposed.
The number of feature points is shown in Figure 7. The left experimental results correspond to the overexposed images in Figure 6, and the right experimental results correspond to the underexposed images. The number of matching points is shown in Table 1, and the repeatability rate based on Equation (23) is shown in Figure 8.
The number of feature points is one of the most important performance evaluation indicators for feature detectors. Figure 7 indicates that our method can extract a large number of significant feature points from two images with large photometric variation. In most cases, our method obtains the most feature points, and in the remaining experiments, although the proposed method does not extract the most feature points, it still ranks in the upper-middle range. In addition, ORB and LF-Net also show excellent performance in terms of the number of extracted feature points, sometimes even exceeding the proposed method.
The number of matching points is another important evaluation indicator. In this article, we use both the theoretical and the actual number of matching feature points for evaluation. The theoretical matching feature points are calculated as follows. (1) First, extract feature points from the underexposed and overexposed images. (2) Second, transform the feature points in the overexposed image into the underexposed image through the homography matrix (since the scene is the same, the homography matrix here can be simplified to an identity matrix). (3) Finally, check whether there is a feature point at the corresponding position of the underexposed image; if so, the pair is counted as a theoretical match. Table 1 shows the number of theoretical matching feature points, obtained on the basis of Equation (23).
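In the special case described above (same scene, fixed pose, homography reduced to the identity), counting theoretical matches amounts to intersecting the two sets of detected pixel positions, as in this small sketch (assuming integer pixel coordinates):

```python
def theoretical_matches(pts_over, pts_under):
    """Number of positions detected in both the overexposed and the underexposed image."""
    return len({tuple(p) for p in pts_over} & {tuple(p) for p in pts_under})
```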
In the first four groups of experiments, the proposed method has obvious advantages: its number of matching points is several to several tens of times that of the other detection methods. In the last two groups of experiments, the proposed method is comparable to LF-Net.
LF-Net shows very good performance in the matching experiments under large photometric variation and is only slightly inferior to the proposed method. LIFT and A-KAZE are inferior to LF-Net but still perform well in terms of the number of matching points and matching stability. ORB and Wu can obtain a large number of matching feature points under certain scenes and illumination conditions, but their performance is not stable enough. In addition, Harris, FAST, and SURF perform extremely poorly under large photometric variation, sometimes failing to obtain even a single pair of matching points.
In addition to the numbers of feature points and matching points, the repeatability rate is also a commonly used evaluation indicator. It intuitively reflects the proportion of matching feature points among the extracted feature points and is used to characterize the availability and repeatability of the feature points extracted by a detector. The repeatability rate is shown in Figure 8.
Figure 8 shows that the repeatability rate of the proposed method is not the highest in most cases, but it is the most stable, staying around 30% with a small fluctuation range of 20% to 40%. In contrast, the repeatability rates of the other methods fluctuate greatly. For example, Wu's method has a repeatability rate of 60% at its highest and close to 0 at its lowest, and the repeatability rate of SuperPoint exceeds 40% at its highest and is about 10% at its lowest. Combining Figure 7, Figure 8, and Table 1, we find that the proposed method extracts the most feature points, obtains the most matching feature points, and has the most stable repeatability rate. Therefore, we believe that the proposed method has the best illumination robustness.
However, this is not enough, because we also need to verify whether the matching points can indeed be used for feature point matching in a real environment. The actual matching feature points are calculated as follows: first, extract the feature points from the two images; then, calculate a descriptor for each extracted feature point; finally, select an appropriate matching algorithm for feature point matching and count the actual number of matching feature points. Table 2 shows the actual number of matching points (the same descriptor and matching method were used for all detectors in the preceding steps).
There is a certain deviation between the data in Table 1 and Table 2. However, the proposed method still obtains the most matching feature points in most cases, and although it does not obtain the most actual matching feature points in the other two groups, it still performs well within those groups. In addition, although LF-Net does not match the proposed method in terms of the theoretically calculated number of matching points, its results on “CadikDesk” and “Memorial” exceed the proposed method in the actual matching experiments, and its results on “BigTree” and “WindowSeries” are very close to the proposed method, which indicates that LF-Net also has excellent illumination robustness. Apart from LF-Net, SuperPoint and LIFT also surpass most feature-based detection methods (except the proposed method) in the actual feature point matching experiments.
In order to further verify the previous experimental results, we give the alignment and overlay images of different experimental groups, as shown in Table 3.
The experimental results in Table 3 indicate that the alignment based on Harris and FAST is the worst, whereas LIFT, SuperPoint, LF-Net, and the proposed method perform best in the image alignment experiments and all achieve correct image alignment. “Belgium” and “Memorial” have the largest illumination differences, so most feature detectors fail in these two experiments. “SnowMan”, “CadikDesk”, and “BigTree” are less difficult, so most detectors can extract enough matching feature points and achieve correct alignment. The alignment results in Table 3 corroborate the previous experimental results.

5.2. Different Capture Time

When the camera settings and pose are fixed and only the capture time is different, a series of images with different illumination directions or intensities can be obtained, as shown in Figure 9. The first and second rows correspond to the same scene, the capture time of the first row is in the morning, and the capture time of the second row is in the afternoon. Therefore, we collectively refer to the first two rows as Morning-Afternoon dataset. The third and fourth rows correspond to the same scene, the third row of images were captured during the daytime, and the fourth row was captured at night. We call the last two rows Daytime-Night dataset. From left to right, the first column is named Scene_1, the second column is named Scene_2, and so on.
We extracted the feature points of each pair of images in the Morning-Afternoon dataset and show the results in Table 4 and Table 5.
The images in the Morning-Afternoon dataset show different states in different areas due to the different directions of sunlight. The originally bright area may become darker, and the originally darker area may become brighter. This makes it more difficult to match the feature points of the image.
Compared with other methods, the ORB, Wu’s method, LIFT, LF-Net, and the proposed method can extract more feature points when the illumination in different areas of the same scene changes significantly. Further, we count the number of theoretical matching points, and the statistical results are shown in Table 6.
The experimental results in Table 6 indicate that the proposed method can still obtain the most matching feature points when the illumination direction of the image changes. However, the situation reflected by Scene_3 cannot be ignored: when the light and dark areas are completely reversed, the proposed method may not work well. In addition, ORB, Wu's method, and LF-Net can also theoretically extract many matching feature points.
The number of theoretical matching feature points is obtained by Equation (23), which does not involve feature descriptors or matching methods, so interference caused by algorithm compatibility is eliminated. However, the number of theoretical matching points depends heavily on how accurately the camera pose is controlled during image capture. Therefore, in addition to counting the theoretical matching points, we also examine the actual number of matching feature points and consider the two together to ensure the credibility of the result. The actual number of matching feature points is shown in Table 7 (the same descriptor and matching method were used for all detectors in the preceding steps).
In the eight groups of experiments, the proposed method obtained the most matching feature points in five groups, ranked second in two groups, and performed poorly in one group (Scene_3). LF-Net followed closely behind.
When the illumination direction changes, Wu’s method, LIFT, LF-Net, and the proposed method can perform well in terms of the number of feature points and the number of matching points. In addition to considering the change of illumination direction, we also further consider the change of illumination intensity, as shown in the Daytime-Night dataset in Figure 9.
The illumination intensity of the two images in the Daytime-Night dataset is very different, so it is more difficult to use feature detection methods to extract feature points from low-illuminance images and match them with other images. The number of feature points extracted by different feature detection methods from the Daytime-Night dataset is shown in Table 8 and Table 9.
The images in Table 8 were captured during the daytime, so all detection methods can extract enough feature points. The images in Table 9 are different: because they were captured after sunset, the illumination is very poor and the difficulty of extracting feature points is greatly increased. Nevertheless, some detection methods can still extract feature points from low-illumination images, such as Wu's method, LIFT, SuperPoint, LF-Net, and the proposed method. Further, we counted the number of theoretical matching points and obtained the experimental results shown in Table 10.
Similarly, only Wu’s method, LIFT, SuperPoint, LF-Net, and the proposed method can obtain better matching results. The matching points of the proposed method is much higher than other methods, which indicates that the proposed method has the potential to obtain the most matching points under low illumination. In addition, the statistical results of actual matching points are shown in Table 11.
The actual matching point statistics indicate that the proposed method can still obtain the most matching feature points, but the advantages are reduced compared to Table 10. For example, the actual matching feature points of the proposed methods in Scene_5 and Scene_7 are very close to LF-Net. In addition, the feature points extracted by LF-Net have obvious performance advantages in the actual matching process, far exceeding other algorithms, followed by Wu’s method, SuperPoint and LIFT detection methods.
Through analysis of the experimental results on the Daytime-Night dataset, it is found that, except for IRFET_Harris and Wu's method, the feature-based detection methods have difficulty extracting enough feature points for matching. In contrast, the learning-based methods perform well in terms of the number of feature points and the number of matching points, especially LF-Net, which has excellent illumination robustness. However, our proposed method surpasses LF-Net in all performance evaluation indicators. Furthermore, the comparison between the theoretical and actual numbers of matching points shows that, due to the limitations of the feature description and matching methods, many feature points cannot be matched correctly.

6. Discussion

This paper focuses on the illumination robustness of feature detection methods. In order to make the results more convincing, we used three types of data sets with different exposure values, different illumination directions, and different illumination intensities. For each data set, the proposed method and twelve other feature detection methods were used for feature detection, extraction, and matching. Finally, the number of feature points and the number of matching points were used as evaluation indicators.
The experimental results on the three data sets are generally consistent, but due to the characteristics of the data sets themselves, they also differ in some details. On the data set with different exposure values, apart from Wu, LIFT, SuperPoint, LF-Net, and the proposed method, the other methods do not perform well. The reason is that the two images in each pair of experimental materials are an underexposed image and an overexposed image, respectively. Wu and the proposed method use multiple-optimal image binarization to resist this large photometric variation, and the three learning-based methods may have accounted for large photometric variation during training.
In the experiment where the illumination direction changes, most detection methods can extract enough feature points, which indicates that the change of the illumination direction has little effect on the detection method.
The last data set contains two images with different light intensities. Images captured during the daytime can extract enough feature points, while images captured at night have two extremes when extracting feature points. Some methods, including the proposed method and three learning-based methods, can still extract feature points equivalent to those during the daytime, but other methods cannot detect feature points at all. By comparing and analyzing the experimental results of the three data sets, we can conclude that the proposed method has the best illumination robustness.

7. Conclusions

In this paper, we proposed a novel feature point detector based on neighborhood connection information, which classifies and detects feature points based on the number and location of connected pixels in the eight-neighborhood of the pixel to be detected. The proposed detector is shown to have better detection ability than other detectors in cases of under-exposure and over-exposure, which indicates that our method has the best illumination robustness. At the same time, it is also superior to other methods in terms of matching accuracy and matching time consumption. The experimental results verify these conclusions.
The proposed method also has some disadvantages. In exchange for matching accuracy, our method abandons geometric invariance; in other words, it is not suitable for feature point detection under rotation or affine transformation. In the future, if the homography matrix of the geometric transformation can be estimated, the proposed method could be extended to geometrically invariant feature detection.

Author Contributions

Conceptualization, R.W. and S.W.; methodology, R.W. and S.W.; software, R.W.; validation, S.W., W.C., and K.W.; formal analysis, K.W.; resources, S.W. and L.Z.; data curation, S.W. and W.C.; writing—original draft preparation, R.W.; writing—review and editing, R.W., S.W., and W.C.; supervision, L.Z.; project administration, L.Z. and S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China grant number 61775172.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rashid, M.; Khan, M.A.; Sharif, M.; Raza, M.; Sarfraz, M.M.; Afza, F. Object detection and classification: A joint selection and fusion strategy of deep convolutional neural network and SIFT point features. Multimed. Tools Appl. 2019, 78, 15751–15777. [Google Scholar] [CrossRef]
  2. Ma, S.; Bai, X.; Wang, Y.; Fang, R. Robust Stereo Visual-Inertial Odometry Using Nonlinear Optimization. Sensors 2019, 19, 3747. [Google Scholar] [CrossRef] [Green Version]
  3. Yao, W.; Li, Z. Instant Color Matching for Mobile Panorama Imaging. IEEE Signal Process. Lett. 2015, 22, 6–10. [Google Scholar] [CrossRef]
  4. Henawy, J.; Li, Z.; Yau, W.Y.; Seet, G. Accurate IMU Factor Using Switched Linear Systems For VIO. IEEE Trans. Ind. Electron. 2020, 62. [Google Scholar] [CrossRef]
  5. Li, Y.; Wang, S.; Tian, Q.; Ding, X. A survey of recent advances in visual feature detection. Neurocomputing 2015, 149, 736–751. [Google Scholar] [CrossRef]
  6. Tuytelaars, T.; Mikolajczyk, K. Local Invariant Feature Detectors: A Survey; Now Foundations and Trends: Delft, The Netherlands, 2007; Volume 3, pp. 177–280. [Google Scholar] [CrossRef] [Green Version]
  7. Faille, F. A fast method to improve the stability of interest point detection under illumination changes. In Proceedings of the 2004 International Conference on Image Processing, Singapore, 24–27 October 2004. [Google Scholar] [CrossRef]
  8. Gevrekci, M.; Gunturk, B.K. Illumination robust interest point detection. Comput. Vis. Image Underst. 2009, 113, 565–571. [Google Scholar] [CrossRef]
  9. Xue, Y.; Gao, T. Feature Point Extraction and Matching Method Based on Akaze in Illumination Invariant Color Space. In Proceedings of the 2020 IEEE 5th International Conference on Image, Vision and Computing, Beijing, China, 10–12 July 2020. [Google Scholar] [CrossRef]
  10. Moravec, H.P. Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover; Stanford University: Stanford, CA, USA, 1980. [Google Scholar]
  11. Harris, C.G.; Stephens, M. A combined corner and edge detector. In Proceedings of the Fourth Alvey Vision Conference 1988, Manchester, UK, 31 August–2 September 1988; pp. 147–151. [Google Scholar]
  12. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999. [Google Scholar] [CrossRef]
  13. Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  14. Lee, W.T.; Chen, H.T. Histogram-based interest point detectors. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar] [CrossRef] [Green Version]
  15. Miao, Z.; Jiang, X. Interest point detection using rank order LoG filter. Pattern Recognit. 2013, 46, 2890–2901. [Google Scholar] [CrossRef]
  16. Wu, S.; Xu, W.; Jiang, J.; Qiu, Y.; Zeng, L. A robust method for aligning large-photometric-variation and noisy images. In Proceedings of the 2015 IEEE 17th International Workshop on Multimedia Signal Processing, Xiamen, China, 19–21 October 2015. [Google Scholar] [CrossRef]
  17. Miao, Z.; Jiang, X.; Yap, K. Contrast Invariant Interest Point Detection by Zero-Norm LoG Filter. IEEE Trans. Image Process. 2016, 25, 331–342. [Google Scholar] [CrossRef] [PubMed]
  18. Hong-Phuoc, T.; Guan, L. A Novel Key-Point Detector Based on Sparse Coding. IEEE Trans. Image Process. 2020, 29, 747–756. [Google Scholar] [CrossRef] [PubMed]
  19. Verdie, Y.; Yi, K.M.; Fua, P.; Lepetit, V. Tilde: A temporally invariant learned detector. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  20. Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. Lift: Learned invariant feature transform. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
  21. Savinov, N.; Seki, A.; Ladicky, L.; Sattler, T. Quad-networks: Unsupervised learning to rank for interest point detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef] [Green Version]
  22. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef] [Green Version]
  23. Ono, Y.; Trulls, E.; Fua, P.; Yi, K.M. LF-Net: Learning local features from images. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
  24. Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  25. Barroso-Laguna, A.; Riba, E.; Ponsa, D.; Mikolajczyk, K. Key.Net: Keypoint detection by handcrafted and learned CNN filters. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef] [Green Version]
  26. Luo, Z.; Zhou, L.; Bai, X.; Chen, H.K.; Zhang, J.H.; Yao, Y.; Li, S.W.; Fang, T.; Quan, L. Aslfeat: Learning local features of accurate shape and localization. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
  27. Cerman, L.; Hlavac, V. Exposure Time Estimation for High Dynamic Range Imaging with Hand Held Camera; Czech Pattern Recognition Society: Prague, Czech Republic, 2006. [Google Scholar]
  28. Ward, G. Fast, Robust Image Registration for Compositing High Dynamic Range Photographs from Hand-Held Exposures. J. Graph. Tools 2003, 8, 17–30. [Google Scholar] [CrossRef] [Green Version]
  29. Ojala, T.; Pietikäinen, M.; Harwood, D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 1996, 29, 51–59. [Google Scholar] [CrossRef]
  30. Tan, X.; Triggs, B. Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions. IEEE Trans. Image Process. 2010, 19, 1635–1650. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Wu, S.; Li, Z.; Zheng, J.; Zhu, Z. Exposure-Robust Alignment of Differently Exposed Images. IEEE Signal Process. Lett. 2014, 21, 885–889. [Google Scholar] [CrossRef]
  32. Grossberg, M.D.; Nayar, S.K. Determining the camera response from images: What is knowable? IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1455–1467. [Google Scholar] [CrossRef]
  33. Breneman, J. Kernel Methods for Pattern Analysis. Technometrics 2009, 47, 237. [Google Scholar] [CrossRef]
  34. Shi, J.; Tomasi, C. Good Features to Track. In Proceedings of the 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994. [Google Scholar] [CrossRef]
  35. Rosten, E.; Porter, R.; Drummond, T. Faster and Better: A Machine Learning Approach to Corner Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 105–119. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  36. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011. [Google Scholar] [CrossRef]
  37. Alcantarilla, P.F.; Nuevo, J.; Bartoli, A. Fast explicit diffusion for accelerated features in nonlinear scale spaces. In Proceedings of the Electronic Proceedings of the British Machine Vision Conference, Bristol, UK, 9–13 September 2013. [Google Scholar] [CrossRef] [Green Version]
  38. Schmid, C.; Mohr, R.; Bauckhage, C. Evaluation of Interest Point Detectors. Int. J. Comput. Vis. 2000, 37, 151–172. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Candidate feature point extraction process. (a) Original image. (b) Edge feature. (c) Local candidate feature points map.
Figure 2. Feature point neighborhood connectivity information.
Figure 3. Feature point type statistics. A to H are Endpoint, Corner, Junction, Intersection, Five-line intersection, Six-line intersection, Seven-line intersection, and Eight-line intersection in turn. I01 to I25 are the numbers of 25 images in TID2008.
Figure 4. Classification method based on the connection position of feature point neighborhood. (a) Endpoint. (b) Corner. (c) Junction. (d) Intersection. The I to VIII are classifications based on connected positions.
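Figure 4 refines each type according to where the connected neighbors lie in the 3×3 window. One convenient way to express the connection position is an 8-bit code over the neighborhood, in the spirit of the LBP encoding of [29]; the sketch below is only an illustration of that idea and is not claimed to reproduce the exact sub-classes I–VIII defined by the authors.

```python
import numpy as np

# Clockwise offsets of the eight neighbors, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def position_code(edge_map, y, x):
    """Encode which of the eight neighbors of (y, x) are edge pixels as an 8-bit pattern."""
    h, w = edge_map.shape
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w and edge_map[ny, nx]:
            code |= 1 << bit
    return code  # same neighbor count but different positions yield different codes
```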
Figure 5. Two matching methods. (a) General matching method. (b) Classification matching method.
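Figure 5 contrasts general matching, in which every feature point of one image is compared with every point of the other, with classification matching, in which comparisons are restricted to points of the same type, reducing the number of candidate pairs. The sketch below shows a bucketed matching strategy of this kind; the descriptor distance and threshold are illustrative placeholders, not the matching criterion used in the paper.

```python
from collections import defaultdict
import numpy as np

def classified_match(points_a, points_b, max_dist=0.7):
    """Match only feature points that share a type.

    Each point is a (type_label, descriptor) pair, with the descriptor a 1-D
    NumPy array. The Euclidean distance and `max_dist` threshold are
    illustrative choices, not the criterion of the paper.
    """
    buckets = defaultdict(list)               # group image-B descriptors by type
    for label, desc in points_b:
        buckets[label].append(desc)

    matches = []
    for label, desc in points_a:
        candidates = buckets.get(label, [])
        if not candidates:
            continue                          # no point of the same type to compare with
        dists = [np.linalg.norm(desc - c) for c in candidates]
        best = int(np.argmin(dists))
        if dists[best] < max_dist:
            matches.append((desc, candidates[best]))
    return matches
```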
Figure 6. Experimental materials with different exposure values. (a) Belgium. (b) SnowMan. (c) CadikDesk. (d) BigTree. (e) Memorial. (f) WindowSeries. The left image of each group is overexposed, and the right image is underexposed.
Figure 7. Number of feature points. The left plot shows the feature points extracted from the overexposed images, and the right plot shows those extracted from the underexposed images. The X-axis serial numbers correspond to the experimental materials in Figure 6.
Figure 8. Repeatability rate change curve of the feature detectors.
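Figure 8 reports repeatability, following the definition of Schmid et al. [38]: the number of detections in one image whose projections into the other image fall within a small tolerance of a detection there, normalized by the smaller of the two detection counts. A simplified sketch of this computation is given below; it omits the restriction to the commonly visible region, and the homography H and the tolerance eps are inputs rather than values from the paper.

```python
import numpy as np

def repeatability(pts_a, pts_b, H, eps=1.5):
    """Simplified repeatability rate in the sense of Schmid et al. [38].

    pts_a, pts_b : (N, 2) arrays of (x, y) detections in images A and B.
    H            : 3x3 homography mapping image-A coordinates into image B.
    eps          : distance tolerance in pixels (illustrative default).
    """
    ones = np.ones((len(pts_a), 1))
    proj = (H @ np.hstack([pts_a, ones]).T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # A's detections mapped into B

    # A mapped detection counts as repeated if some B detection lies within eps of it.
    d = np.linalg.norm(proj[:, None, :] - pts_b[None, :, :], axis=2)
    repeated = int(np.sum(d.min(axis=1) < eps))
    return repeated / min(len(pts_a), len(pts_b))
```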
Figure 9. Images under different illumination. The first and second rows show the same scenes: the first row was captured in the morning and the second row in the afternoon. The third and fourth rows show the same scenes: the third row was captured during the daytime and the fourth row at night.
Table 1. Number of matching points obtained through theoretical calculation. Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Belgium | SnowMan | CadikDesk | BigTree | Memorial | WindowSeries
Harris074748111
MinEigen2482211129717150
SIFT12722672261250
FAST01451129343
SURF91101579014
IRFET_Harris31105914060443146
ORB761978287147380180
A-KAZE1620924217427
Wu32973698622813746
LIFT324618881401313289
SuperPoint55425334121147165
LF-Net110211162059776672737
Proposed18,04310,93537294666679754
Table 2. Actual number of matching points. The same descriptor and matching method were used for all detectors. Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Belgium | SnowMan | CadikDesk | BigTree | Memorial | WindowSeries
Harris53066210
MinEigen757970596101
SIFT52291817125
FAST05693207
SURF7551615022
IRFET_Harris476779114679
ORB26248961239
A-KAZE3592426120
Wu10118511928325
LIFT3955577431687267
SuperPoint164142736340141
LF-Net619271312616137412
Proposed1308 17,43445767449503
Table 3. Image registration results. A blank cell means that the feature detector could not find enough feature points on the corresponding experimental material for image registration. A red frame indicates that the alignment and overlay result is incorrect due to registration errors.
Method ∖ Material | Belgium | SnowMan | CadikDesk | BigTree | Memorial | WindowSeries
Harris Sensors 20 06630 i001 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i004 Sensors 20 06630 i005
MinEigen Sensors 20 06630 i006 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i008 Sensors 20 06630 i005
SIFT Sensors 20 06630 i009 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i005
FAST Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i010
SURF Sensors 20 06630 i011 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i005
IRFET_Harris Sensors 20 06630 i012 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i013 Sensors 20 06630 i005
ORB Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i005
A-KAZE Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i005
Wu Sensors 20 06630 i014 Sensors 20 06630 i002 Sensors 20 06630 i015 Sensors 20 06630 i007 Sensors 20 06630 i005
LIFT Sensors 20 06630 i014 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i008 Sensors 20 06630 i005
SuperPoint Sensors 20 06630 i014 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i008 Sensors 20 06630 i005
LF-Net Sensors 20 06630 i014 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i008 Sensors 20 06630 i005
Proposed Sensors 20 06630 i014 Sensors 20 06630 i002 Sensors 20 06630 i003 Sensors 20 06630 i007 Sensors 20 06630 i008 Sensors 20 06630 i005
Table 4. Feature points obtained from the images captured in the morning. Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris11233701388734323479107299
MinEigen122314,2485034341358618644701170
SIFT94872603221154079712536321021
FAST9656691406473537362136401
SURF26325671201690659374410492
IRFET_Harris125217,6325002277962916154811056
ORB338955,66015,19967713615546525924476
A-KAZE547353518009841146872930873
Wu217913,027412013,5119081640949747864
LIFT36435939556153653400440734294091
SuperPoint856132041181809804592709912
LF-Net974711,06010,26310,3416982780865747479
Proposed714422,576493723,07624,32115,51713,65719,741
Table 5. Feature points obtained from the images captured in the afternoon. Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris47021737442775381243241260
MinEigen15149049386377466987578201550
SIFT1282543225865237586703614885
FAST323528910422697236273169185
SURF40417279382091257138366387
IRFET_Harris158111,797433884916388526501001
ORB527137,31112,85228,7142419281028813111
A-KAZE780257013503302425338733645
Wu229011,516533020,06211,444667668078781
LIFT36415629507455393595436534754324
SuperPoint895137236652363544538746905
LF-Net904110,99910,98399668578899974828171
Proposed752922,88011,29330,35525,27016,64316,64219,129
Table 6. Number of theoretical matching feature points (Morning-Afternoon dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris201331016745254114
MinEigen24310701508405737859660
SIFT114375459140304820227
FAST196291235161231179
SURF396081502099107
IRFET_Harris255183617483609215854620
ORB771611625408795056133311079
A-KAZE7911414695404840204
Wu35322822872948109710026541324
LIFT1832501149366111389179782
SuperPoint108191676212363173422
LF-Net9831143213810315649995561599
Proposed344610,71985310,5359195703256449147
Table 7. Actual number of matching points (Morning-Afternoon dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris711231237187241467
MinEigen82738410011227611037342
SIFT301112183567582043
FAST632941320151261544
SURF1451071134109302473
IRFET_Harris85759410610929311442330
ORB71037554543398846321
A-KAZE2721531045172735095
Wu30620630233373290173458
LIFT84622319090282314172489
SuperPoint525921777719210781238
LF-Net1889381273125495499326606
Proposed120342419260178510795731797
Table 8. Feature points obtained from the images captured in the daytime (Daytime-Night dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris8132111548318942099709517061830
MinEigen2349617011,166422810,60016,89585458172
SIFT2257430477033341349212,59761705744
FAST792243572872058251613,35012652349
SURF8881418311113641142451113732405
IRFET_Harris2446708313,7775021939622,40389859602
ORB827524,80152,45716,12622,12982,34218,07630,391
A-KAZE15132147443522431878678021283293
Wu712419,98719,84615,590437014,49323575550
LIFT53015288515246445701644264335070
SuperPoint15101737210115241299239317283365
LF-Net842410,38211,403961711,10013,20710,00611,437
Proposed14,79241,06339,88835,13810,74239,636798015,409
Table 9. Feature points obtained from the images captured at night (Daytime-Night dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris1155732081004922532699139
MinEigen1294059365131532435533851637
SIFT0366156366401220124043
FAST100453193101
SURF02024753238413
IRFET_Harris422695209222121743961091245
ORB7838445311817268944467262
A-KAZE01832086331426
Wu833119,71318,35718,729544513,28022427735
LIFT49625147534743555074603160744784
SuperPoint84511401047806105713356411033
LF-Net10,38612,54511,62810,37510,90710,671923610,135
Proposed16,70239,88135,76833,98510,40633,886749318,187
Table 10. Number of theoretical matching feature points (Daytime-Night dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris13125154282130018562
MinEigen181913772063782410836778
SIFT05538015617784015
FAST100202085401
SURF00951516241
IRFET_Harris375011831203212936346136
ORB122912296051047157883
A-KAZE021201825884
Wu141655526509341210774370646824
LIFT64062785492551019458161127
SuperPoint312368250397160478147456
LF-Net13321828247816371490369611852099
Proposed727727,19628,53619,147585328,50961906976
Table 11. Actual number of matching points (Daytime-Night dataset). Bold font indicates the best result in the same group of experiments.
Method ∖ Material | Scene_1 | Scene_2 | Scene_3 | Scene_4 | Scene_5 | Scene_6 | Scene_7 | Scene_8
Harris0314667110152114919
MinEigen04881091159370951619220
SIFT022108125156132
FAST00021032200
SURF0880178470
IRFET_Harris1575141926363114728543
ORB0765731109883194
A-KAZE001003111340
Wu5157601303701365847253130
LIFT4953756164283789261449504
SuperPoint296217263250184284172147
LF-Net750453112857057615611581692
Proposed195326274606193075052011707617
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
