Information 2019, 10(12), 376; https://doi.org/10.3390/info10120376
Article
A Global Extraction Method of High Repeatability on Discretized Scale-Space Representations
^{1} School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
^{2} School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China
^{*} Authors to whom correspondence should be addressed.
Received: 2 October 2019 / Accepted: 25 November 2019 / Published: 28 November 2019
Abstract
This paper presents a novel method to extract local features, which, instead of calculating local extrema, computes global maxima in a discretized scale-space representation. To avoid interpolating scales on few data points and to achieve perfect rotation invariance, two essential techniques, increasing the width of kernels in pixels and utilizing disk-shaped convolution templates, are adopted in this method. Since the size of a convolution template is finite and finite templates can introduce computational error into convolution, we discuss this problem thoroughly and work out an upper bound of the computational error. The upper bound is utilized in the method to ensure that all features obtained are computed under a given tolerance. In addition, a relative threshold for determining features is adopted to reinforce robustness in scenes of changing illumination. Simulations show that this new method attains high repeatability in various situations, including scale change, rotation, blur, JPEG compression, illumination change, and even viewpoint change.
Keywords:
local feature extraction; scale-space representation; Laplacian of Gaussian; convolution template

1. Introduction
Local feature extraction is a fundamental technique for solving problems of computer vision, such as matching, tracking, and recognition. A local feature is a structure around a point in an image, and its size, which relates to the scale, is usually unknown before it is extracted. The traditional Harris corner detector [1] does not consider the variation of scale, a drawback that prevents it from matching features with different scales. For detecting corners with different resolutions, Dufournaud [2] discussed a scale-invariant approach based on the Harris detector, which adopts the Gaussian kernel with width $\sigma $ and uses a variable s as the scale factor; $s\sigma $ then represents an arbitrary scale, at which corner features of different scales are detected by the traditional Harris corner detector. Scale-invariant properties were systematically studied by Lindeberg. Introducing a normalized derivative operator [3] into scale-space theory [4], Lindeberg presented a framework for automatic scale selection, pointing out that a local maximum over scale of some combination of normalized derivatives reflects a characteristic length of a corresponding structure and behaves well under rescaling of the intensity pattern [3]; this has become a guiding principle for solving problems of feature extraction. In SIFT [5,6], Lowe presented the Difference-of-Gaussian (DoG) method on image pyramids, an early type of multi-scale representation, to approximate the Laplacian of Gaussian (LoG). Mikolajczyk presented the Harris–Laplace method [7], which uses Harris functions of images in a scale-space representation to extract interest points and then invokes the Laplacian to select feature points as well as their precise scale parameters; this method was later extended to an affine-adapted approach [8]. Aiming at reducing computation time, Bay introduced integral images and box filters and worked out SURF [9,10].
Thanks to the techniques of integral images and box filters, the Hessian feature detector in SURF is revised into the Fast-Hessian detector, which can be computed more quickly. Recently, Lomeli-R and Nixon presented a feature detector, the Locally Contrasting Keypoints detector (LOCKY) [11,12], which extracts blob keypoints directly from the Brightness Clustering Transform (BCT) of an image. The BCT also exploits integral images and performs a fast coarse-to-fine search through different scale spaces.
In the extractors mentioned above, features are extracted through comparison among their immediate neighbors, which lie in the same image and in two adjacent images. We name the methodology of these extractors Local-Prior Extraction (LPE). Due to the exponential growth of the scale parameters usually adopted by LPE (which are too coarse to locate features precisely along the scale axis), LPE needs interpolation or other refining procedures to obtain precise scales. A great advantage of LPE is its relatively low computational cost, which enables it to be broadly applied in numerous extractors. However, the repeatability of features obtained by LPE leaves room for improvement. We therefore study a novel method which, instead of following LPE, extracts features in a discretized scale-space representation constructed in advance, and we name this new method Global-Prior Extraction (GPE). To achieve this goal, the pivotal techniques contributed in this paper are: (1) the algorithm of global-prior extraction, which improves the repeatability of extracted features; (2) disk-shaped convolution templates whose size increases in pixels, which are applied to realize rotation invariance and to obtain precise scales of local features; and (3) a threshold relative to the maximal feature response, which is employed to achieve illumination invariance. The rest of this paper is organized as follows. In Section 2, we give a brief introduction to GPE. In Section 3, we present an approach to compute feature responses in a discretized scale-space representation and to represent these responses by a three-dimensional array. In Section 4, we present a method for finding local features in this array to achieve the extraction of features. In Section 5, we test the algorithm of GPE and compare the results with some classical extractors. We conclude our work in Section 6.
2. Sketch of GPE
Suppose $f(u,v)$ $(u,v\in \mathbb{R})$ to be a two-dimensional signal and $K(\cdot;t)$ $(t\in {\mathbb{R}}_{+})$ to be a kernel with width $\sqrt{t}$. Then, the scale-space representation of $f(u,v)$ is (cf. [4])
$$\left\{\begin{array}{l}L(u,v;0)=f(u,v),\\ L(\cdot,\cdot;t)=K(\cdot,\cdot;t)\ast f.\end{array}\right.$$
Using the scale-normalized derivative $\mathcal{D}$, the scale space in Equation (1) can be transformed to (cf. [3])
$$\begin{array}{c}\mathcal{D}L(\cdot,\cdot;t)=\mathcal{D}K(\cdot,\cdot;t)\ast f.\end{array}$$
Assume that $\mathcal{K}(\cdot;\cdot)$ is an appropriate kernel that responds sensitively to a certain class of features and is applied as a detector for them. Then, in any bounded open set $\Omega \subset {\mathbb{R}}^{2}\times {\mathbb{R}}_{+}$, the expression
$$\begin{array}{c}(x,y;\tau )\in \underset{(u,v;t)\in \Omega}{\mathrm{argmax}}\,\mathcal{D}{L}^{2}(u,v;t),\end{array}$$
represents a position of maximal response in both the spatial and the scale space and is therefore an extremum of $L(u,v;t)$. This extremal point is a feature point in the signal $f(x,y)$ and has the scale-invariant property.
An iterative procedure can be proposed to work out scale-invariant features in $f(x,y)$. Let $F=\varnothing $ be the initial set of feature points, and $({x}_{i},{y}_{i};{\tau}_{i})$ be the ith feature point calculated through Equation (3). Denote by ${U}_{i}$ a neighborhood of the point $({x}_{i},{y}_{i};{\tau}_{i})$. Put ${\Omega}_{i}=\Omega \backslash {\cup}_{k=1}^{i-1}{U}_{k}$ (obviously, ${\Omega}_{i}={\Omega}_{i-1}\backslash {U}_{i-1}$). The $(i+1)$th feature point can be computed through the following steps.
- Compute the point of maximal response in ${\Omega}_{i}$ through Equation (3).
- Update the set ${\Omega}_{i+1}={\Omega}_{i}\backslash {U}_{i}$.
Repeatedly executing these two steps, we obtain a set ${\left\{({x}_{i},{y}_{i};{\tau}_{i})\right\}}_{i=1}^{N}$ of feature candidates, where N is the number of iterations and $({x}_{1},{y}_{1};{\tau}_{1})$ is obtained from ${\Omega}_{1}:=\Omega $.
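The two-step iteration above can be sketched in a few lines. The array `responses` (a sampled $\mathcal{D}L^{2}$), the fixed feature count, and the cubic suppression neighborhood are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def global_prior_extract(responses, n_features, radius=2):
    """Iteratively take the global maximum of a sampled DL^2 array,
    excluding a neighborhood U_i around each accepted point."""
    a = responses.astype(float).copy()
    features = []
    for _ in range(n_features):
        # Point of maximal response in the remaining region Omega_i.
        x, y, s = np.unravel_index(np.argmax(a), a.shape)
        features.append((x, y, s))
        # Omega_{i+1} = Omega_i \ U_i: suppress a cubic neighborhood.
        a[max(x - radius, 0):x + radius + 1,
          max(y - radius, 0):y + radius + 1,
          max(s - radius, 0):s + radius + 1] = -np.inf
    return features
```

The suppression step guarantees that the next global maximum lies outside every previously accepted neighborhood, exactly as in the set updates above.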
In contrast to LPE, GPE does not detect features during the procedure of generating scaled images, but instead detects features in a discretized scale-space representation constructed beforehand. Therefore, there are two essential stages in GPE: (1) constructing a discretized scale-space representation and transforming it properly; and (2) obtaining maxima iteratively in this transformed representation.
3. Discretization and Transformation of Scale-Space Representations
The natural structure imposed on a scale-space representation is a semigroup, and the kernels should satisfy $K(\cdot;{t}_{1})\ast K(\cdot;{t}_{2})=K(\cdot;{t}_{1}+{t}_{2})$ [4]. To retain the semigroup structure within some range of scales when discretizing a scale-space representation, one can sample scales equidistantly from the scale space. However, a computer image $f(x,y)$ $(1\le x\le c,1\le y\le d;\ x,y,c,d\in {\mathbb{Z}}_{+})$ can be regarded as a sample drawn equidistantly from a given two-dimensional signal $f(u,v)$ $(u,v\in \mathbb{R})$; the domain of $f(x,y)$ therefore consists of finitely many pixels. Considering the computation of discrete convolution and its cost, we instead employ the pixel as the unit for the width of kernels, and then determine sampling intervals on the scale space by these widths. We call the kernel width in pixels the pixel scale. Increasing the width of kernels by a single pixel each time draws a sequence of samples with pixel scale $1,4,9,\cdots ,{N}^{2}$ (where N is the maximal width of kernels used in computation) from a scale-space representation. In contrast to multiplying the original scale, increasing the scale by adding pixels rids GPE of the scale interpolation that many LPE extractors require.
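As a small illustration of this sampling strategy (the geometric ratio $2^{1/3}$ shown for contrast is an assumption, a value commonly used in pyramid-based LPE extractors, not taken from this paper):

```python
# Pixel scales used by GPE: kernel widths 1..N pixels give t = 1, 4, 9, ..., N^2.
N = 6
gpe_scales = [sigma ** 2 for sigma in range(1, N + 1)]
print(gpe_scales)  # [1, 4, 9, 16, 25, 36]

# For contrast, an LPE-style geometric progression of kernel widths
# (ratio assumed to be 2**(1/3), i.e. three levels per octave):
lpe_sigmas = [2 ** (k / 3) for k in range(6)]
```

The additive sequence samples the scale axis densely at exact pixel widths, so no interpolation between scale levels is needed.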
3.1. Choice of an Appropriate Kernel
To choose a suitable kernel for our method, we consider the normalized LoG
$$\begin{array}{c}{\nabla}_{norm}^{2}G={G}_{xx}^{norm}+{G}_{yy}^{norm},\end{array}$$
where
$$\begin{array}{ll}{G}_{xx}^{norm}(x,y):&={\sigma}^{2}{G}_{xx}(x,y)\\ &=\frac{1}{\sqrt{2\pi}\sigma}\left(\frac{{x}^{2}}{{\sigma}^{2}}-1\right){e}^{-\frac{{x}^{2}+{y}^{2}}{2{\sigma}^{2}}},\\ {G}_{yy}^{norm}(x,y):&={\sigma}^{2}{G}_{yy}(x,y)\\ &=\frac{1}{\sqrt{2\pi}\sigma}\left(\frac{{y}^{2}}{{\sigma}^{2}}-1\right){e}^{-\frac{{x}^{2}+{y}^{2}}{2{\sigma}^{2}}},\end{array}$$
and $G(\cdot;\cdot)$ is the Gaussian kernel.
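A minimal sketch of sampling this normalized LoG on a disk-shaped template follows; the grid construction and the function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def disk_log_template(sigma, radius):
    """Sample the normalized LoG, (1/(sqrt(2*pi)*sigma)) *
    ((x^2 + y^2)/sigma^2 - 2) * exp(-(x^2 + y^2)/(2*sigma^2)),
    on a square grid and zero it outside a disk of the given radius
    (the disk-shaped support is what yields rotation invariance)."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    rho2 = x ** 2 + y ** 2
    log = (1.0 / (np.sqrt(2 * np.pi) * sigma)) \
        * (rho2 / sigma ** 2 - 2.0) * np.exp(-rho2 / (2 * sigma ** 2))
    log[rho2 > radius ** 2] = 0.0  # restrict support to the disk
    return log
```

The sum ${G}_{xx}^{norm}+{G}_{yy}^{norm}$ collapses to a function of $x^{2}+y^{2}$ only, which is why the template can be written in terms of `rho2` alone.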
The LoG is preferable due to its excellent performance in scale-space feature detection. Mikolajczyk pointed out that the LoG is the most efficient operator for drawing interest points over a scale space, in contrast to operators such as the DoG, the Gradient, and the Harris [7]. Moreover, the LoG operator (Equation (4)) is a strictly rotation-invariant integral kernel when the integral region is a finite disk. The rotation invariance is justified by the following reasoning.
Suppose that $\mathit{A}$ is a 2-by-2 orthogonal matrix and $\xi $ is a vector in ${\mathbb{R}}^{2}$. It is obvious that
$$\begin{array}{c}{\nabla}_{norm}^{2}G\left(\xi \right)={\nabla}_{norm}^{2}G\left(A\xi \right).\end{array}$$
Consider two signals f and ${f}^{\prime}$ related by $f\left(\xi \right)={f}^{\prime}\left(A\xi \right)$. Then, on a disk $\mathcal{D}$ with center $\mathit{c}$, we have
$$\begin{array}{l}{\int}_{\mathcal{D}}{\nabla}_{norm}^{2}G(\xi -\mathit{c})f\left(\xi \right)\sqrt{\mathrm{d}{\xi}^{T}\mathrm{d}\xi}\\ ={\int}_{\mathcal{D}}{\nabla}_{norm}^{2}G\left(\mathit{A}(\xi -\mathit{c})\right){f}^{\prime}\left(\mathit{A}\xi \right)\sqrt{\mathrm{d}{\left(\mathit{A}\xi \right)}^{T}\mathrm{d}\left(\mathit{A}\xi \right)}\\ ={\int}_{{\mathcal{D}}^{\prime}}{\nabla}_{norm}^{2}G(\eta -\mathit{A}\mathit{c}){f}^{\prime}\left(\eta \right)\sqrt{\mathrm{d}{\eta}^{T}\mathrm{d}\eta},\end{array}$$
where ${\mathcal{D}}^{\prime}$ is a disk centered at $\mathit{A}\mathit{c}$ with the same radius as $\mathcal{D}$.
Under the scale-invariant framework, many feature detectors can be modified into scale-invariant detectors. However, some of them, such as the determinant of the Hessian, the Gradient, and the Harris, are not rotation-invariant on such a disk region because ${G}_{xy}\left(\xi \right)\ne {G}_{xy}\left(\mathit{A}\xi \right)$, ${G}_{x}\left(\xi \right)\ne {G}_{x}\left(\mathit{A}\xi \right)$, and ${G}_{y}\left(\xi \right)\ne {G}_{y}\left(\mathit{A}\xi \right)$.
3.2. Size of Convolution Templates
To compute the convolution of a kernel with a computer image, the kernel must be discretized into a bounded template. By the foregoing results, the templates for the LoG in GPE should be disks with certain radii. Denote by ${r}_{T}$ the radius of a LoG template utilized in GPE. We now discuss how to determine ${r}_{T}$.
Suppose the current scale is ${\sigma}^{2}$. For a given signal $f(u,v)$, it follows that
$$\begin{array}{ll}L(u,v;{\sigma}^{2})&=\frac{1}{\sqrt{2\pi}\sigma}{\iint}_{{\mathbb{R}}^{2}}\left(\frac{{x}^{2}+{y}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{x}^{2}+{y}^{2}}{{\sigma}^{2}}}f(x-u,y-v)\,\mathrm{d}x\,\mathrm{d}y\\ &=\frac{1}{\sqrt{2\pi}\sigma}{\int}_{0}^{\infty}{\int}_{0}^{2\pi}\left(\frac{{r}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}f(r\mathrm{cos}\theta -u,r\mathrm{sin}\theta -v)\,r\,\mathrm{d}\theta\,\mathrm{d}r\\ &=\frac{1}{\sqrt{2\pi}\sigma}{\int}_{0}^{\infty}r\left(\frac{{r}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}{\int}_{0}^{2\pi}f(r\mathrm{cos}\theta -u,r\mathrm{sin}\theta -v)\,\mathrm{d}\theta\,\mathrm{d}r.\end{array}$$
When $r>4\sigma $, the function $g\left(r\right)=r\left(\frac{{r}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}$ is monotonically decreasing. It is easy to verify that
$$\begin{array}{c}0<g\left(r\right)\cdot {I}_{\{r>4\sigma \}}<56\sigma {e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}.\end{array}$$
Let
$$\begin{array}{c}e\left(4\sigma \right)=\frac{1}{\sqrt{2\pi}\sigma}{\int}_{4\sigma}^{\infty}r\left(\frac{{r}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}{\int}_{0}^{2\pi}f(r\mathrm{cos}\theta -u,r\mathrm{sin}\theta -v)\,\mathrm{d}\theta\,\mathrm{d}r.\end{array}$$
Then, we have
$$\begin{array}{c}e\left(4\sigma \right)=\frac{1}{\sqrt{2\pi}\sigma}{\int}_{4\sigma}^{\infty}h\left(r\right)g\left(r\right)\,\mathrm{d}r<\frac{1}{\sqrt{2\pi}\sigma}{\int}_{4\sigma}^{\infty}56\sigma {e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}h\left(r\right)\,\mathrm{d}r,\end{array}$$
where $h\left(r\right)={\int}_{0}^{2\pi}f(r\mathrm{cos}\theta -u,r\mathrm{sin}\theta -v)\,\mathrm{d}\theta $. Supposing that the maximal gray level is $\gamma $, we further have
$$\begin{array}{c}e\left(4\sigma \right)<56\gamma \sqrt{2\pi}{\int}_{4\sigma}^{\infty}{e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}\,\mathrm{d}r<14\gamma \sigma \pi \sqrt{2\pi}\,{e}^{-16}.\end{array}$$
Then, we set
$$\begin{array}{c}v\left(4\sigma \right)=\frac{1}{\sqrt{2\pi}\sigma}{\int}_{0}^{4\sigma}r\left(\frac{{r}^{2}}{{\sigma}^{2}}-2\right){e}^{-\frac{{r}^{2}}{{\sigma}^{2}}}{\int}_{0}^{2\pi}f(r\mathrm{cos}\theta -u,r\mathrm{sin}\theta -v)\,\mathrm{d}\theta\,\mathrm{d}r,\end{array}$$
and estimate $\mathcal{D}L(u,v;\sigma )$ by $v\left(4\sigma \right)$. The inequality in Equation (6) gives an upper bound of the error introduced by the use of convolution templates with finite size (here, $4\sigma $). Utilizing this upper bound, we can exclude from the feature candidates any points that do not satisfy the tolerance of computational error. Therefore, we introduce a relative error threshold $\alpha $ and construct a threshold for the feature response:
$$\begin{array}{c}\beta =\frac{14\gamma \tilde{\sigma}\pi \sqrt{2\pi}\,{e}^{-16}}{\alpha},\end{array}$$
where $\tilde{\sigma}$ is the maximal width of kernels used in the computation of drawing features. Hence, the function to determine features is
$$\begin{array}{c}\rho (u,v;\sigma )=1\cdot {I}_{\{\mathcal{D}{L}^{2}(u,v;\sigma )\geqslant {\beta}^{2}\}}+0\cdot {I}_{\{\mathcal{D}{L}^{2}(u,v;\sigma )<{\beta}^{2}\}},\end{array}$$
and the maximal point $(u,v;\sigma )$ is a feature point if and only if $\rho (u,v;\sigma )=1$.
In summary, we set the radius of the convolution template in GPE as ${r}_{T}=4\sigma $, and introduce a relative error threshold $\alpha $ to ensure that all features obtained are computed under a given tolerance.
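The threshold of Equation (7) and the acceptance rule can be written out directly; the function names below are illustrative:

```python
import math

def beta_threshold(gamma, sigma_max, alpha):
    """Equation (7): beta = 14 * gamma * sigma_max * pi * sqrt(2*pi)
    * e^{-16} / alpha, with gamma the maximal gray level, sigma_max
    the maximal kernel width, and alpha the relative error threshold."""
    return 14.0 * gamma * sigma_max * math.pi * math.sqrt(2 * math.pi) \
        * math.exp(-16) / alpha

def is_feature(response_sq, beta):
    """rho(u, v; sigma) = 1 iff DL^2(u, v; sigma) >= beta^2."""
    return response_sq >= beta * beta
```

Candidates whose squared response falls below $\beta^{2}$ are rejected because the truncation error of the finite template could dominate their response.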
3.3. Algorithm for Discretizing and Transforming Scale-Space Representations
The general idea of discretizing and transforming a scale-space representation is to produce a sequence of smoothed images, obtained through convolution between the original image and a series of LoG templates of increasing width. The criterion to stop the process is the maximal width of kernels, which must be set in advance. Pseudocode for this algorithm is shown in Algorithm 1.
Algorithm 1: Sampling a scale-space representation
Input: (i) an image to be processed, $f(x,y)$; (ii) the maximal pixel scale, N.
Calculate the maximal gray level $\gamma $ in the image.
for $\sigma =1:N$
  Construct a disk-shaped LoG template of width $\sigma $ and radius ${r}_{T}=4\sigma $;
  Compute $\mathcal{D}{L}_{\sigma}(x,y)$ by convolving $f(x,y)$ with this template;
end for
Build a 3-dimensional array $\mathcal{A}(:,:,:)$ by $\mathcal{A}(x,y,\sigma )=\mathcal{D}{L}_{\sigma}^{2}(x,y)$ $(\sigma =1,2,\cdots ,N)$;
Output: the array $\mathcal{A}(:,:,:)$, and the maximal gray level $\gamma $.
In this algorithm, the responses of the LoG on the discretized scale-space representation are described by a three-dimensional array, where the first and second dimensions represent the x-axis and y-axis of the image, respectively, and the third dimension is the scale axis.
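Algorithm 1 might be sketched as follows; the naive convolution helper, the edge padding, and the sampling of the LoG on the grid are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def _convolve_same(img, ker):
    """Plain 'same'-size 2-D convolution with edge padding (illustrative)."""
    r = ker.shape[0] // 2
    p = np.pad(img, r, mode='edge')
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 2 * r + 1, j:j + 2 * r + 1] * ker)
    return out

def sample_scale_space(f, n_scales):
    """Algorithm 1: convolve f with disk-shaped LoG templates of pixel
    width sigma = 1..N (radius 4*sigma) and stack squared responses
    into a 3-D array A(x, y, sigma)."""
    f = np.asarray(f, dtype=float)
    gamma = f.max()                          # maximal gray level
    a = np.empty(f.shape + (n_scales,))
    for sigma in range(1, n_scales + 1):
        r = 4 * sigma                        # template radius r_T = 4*sigma
        y, x = np.mgrid[-r:r + 1, -r:r + 1]
        rho2 = x ** 2 + y ** 2
        log = (1.0 / (np.sqrt(2 * np.pi) * sigma)) \
            * (rho2 / sigma ** 2 - 2.0) * np.exp(-rho2 / (2 * sigma ** 2))
        log[rho2 > r * r] = 0.0              # disk-shaped support
        a[:, :, sigma - 1] = _convolve_same(f, log) ** 2
    return a, gamma
```

Since the LoG template is symmetric, convolution and correlation coincide, so the simple window product above suffices.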
4. Extracting Features from Discretized Scale-Space Representations
It is easy to find the maximal entry in the three-dimensional array $\mathcal{A}$ from Algorithm 1; therefore, by recording extrema and then excluding their neighborhoods iteratively, a series of candidate local features can be extracted. A crucial problem is how many candidates should be accepted as true local features. Since the scheme of global comparison is adopted in GPE, besides the threshold $\beta $ of the previous section, a parameter $\lambda $ can be introduced to compute a relative threshold that determines a position in the series of candidates, before which all candidates are considered local features. The parameter $\lambda $ works together with the maximal entry of the array $\mathcal{A}$ (denoted by M here): when the response of a candidate times $\lambda $ is less than M, the candidate is not a local feature; otherwise, it is. Because of its scheme of local comparison, LPE employs an absolute threshold to determine whether a candidate is a local feature; in contrast to the relative threshold in GPE, this lowers adaptiveness in scenes of illumination change. From Equation (2), it is easy to see that, for two images with the same content but different illumination, LPE (and a GPE that applies only $\beta $ as the threshold) can compute different sets of extremal points, whereas, using the relative threshold, GPE computes the same set under variant illumination and therefore achieves adaptiveness to illumination change. We call this adaptiveness illumination invariance. Since subpixel accuracy is beneficial for precisely measuring the spatial locations of features, we adopt interpolation to adjust the primitive positions obtained through kernels with pixel-measured width and center.
Here, the spline method is applied on squares of $7\times 7$ pixels surrounding primitive feature points to calculate offsets with respect to the original spatial positions. We introduce a parameter $\delta $ $(0<\delta \leqslant 1)$ to represent the resolution of interpolation, where $\delta =1$ means the positions of features are pixel-measured without interpolation. Algorithm 2 shows the algorithm for extracting features in GPE.
Algorithm 2: Extracting extrema in discretized scale-space representations
Input: (i) the sample $\mathcal{A}(:,:,:)$ (an ${n}_{1}\times {n}_{2}\times {n}_{3}$ array) and the maximal gray level $\gamma $ obtained by Algorithm 1; (ii) the relative error threshold $\alpha $; (iii) a positive real number $\lambda $; (iv) the resolution of interpolation $\delta $.
Calculate the threshold $\beta $ for the error tolerance by Equation (7);
for $i=1:{n}_{1}{n}_{2}{n}_{3}$
  (a) Find the maximal entry of $\mathcal{A}$ and its position $({x}_{i},{y}_{i},{\sigma}_{i})$;
  (b) if this entry is less than ${\beta}^{2}$, or this entry times $\lambda $ is less than the maximal entry of the whole array, stop the iteration;
  (c) refine $({x}_{i},{y}_{i})$ by spline interpolation with resolution $\delta $;
  (d) record ${({x}_{i},{y}_{i},{\sigma}_{i})}^{T}$ and exclude a neighborhood of $({x}_{i},{y}_{i},{\sigma}_{i})$ from $\mathcal{A}$;
end for
Build a matrix $\mathit{M}(:,:)$ from all records (column vectors) of step (d);
Output: the matrix $\mathit{M}(:,:)$ consisting of extracted local features.
In this algorithm, local features in the image $f(x,y)$ are extracted to construct a matrix $\mathit{M}$ whose columns are vectors ${({x}_{i},{y}_{i},{\sigma}_{i})}^{T}$, $i=1,\cdots ,N$ (for some positive integer N); that is, there are N local features located at $({x}_{i},{y}_{i})$ in the image with pixel scales ${\sigma}_{i}$.
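A condensed, hypothetical sketch of Algorithm 2 (the spline interpolation step is omitted, and the cubic suppression neighborhood is an assumption made for illustration):

```python
import numpy as np

def extract_features(a, beta, lam, radius=2):
    """Repeatedly take the global maximum of the response array A,
    stop once it falls below beta^2 (error tolerance) or once its
    value times lam is below the global maximum M (relative
    threshold), and suppress a neighborhood of each accepted point."""
    a = a.astype(float).copy()
    m = a.max()                            # maximal response M
    features = []
    while True:
        x, y, s = np.unravel_index(np.argmax(a), a.shape)
        resp = a[x, y, s]
        # "response times lambda less than M" means: not a feature.
        if resp < beta ** 2 or resp * lam < m:
            break
        features.append((x, y, s + 1))     # pixel scale sigma = s + 1
        a[max(x - radius, 0):x + radius + 1,
          max(y - radius, 0):y + radius + 1,
          max(s - radius, 0):s + radius + 1] = 0.0
    return features
```

Because the stopping rule compares each candidate against the global maximum M rather than an absolute level, uniformly rescaling the image intensities leaves the accepted set unchanged, which is the illumination invariance argued above.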
5. Simulations
Joining Algorithms 1 and 2, we arrive at a complete algorithm for GPE, and we use repeatability to test its extraction performance. The repeatability score is the ratio between the number of true matches and the total number of matches. In general, an extractor attaining a higher repeatability score and a larger number of true matches is a better extractor [13]. Test data, criteria, and code for the repeatability test can be found in Mikolajczyk [13] (the image sequences and the test software are from the website http://www.robots.ox.ac.uk/~vgg/research/affine/). In all following tests, the parameter N in Algorithm 1 was set to 16. Either a small $\alpha $ or a small $\lambda $ lowers the number of extracted features, whereas large values of these parameters increase the number of unstable features. Trading off this dilemma through preliminary experiments, we set the parameters $\alpha $ and $\lambda $ in Algorithm 2 to ${10}^{-3}$ and $2\times {10}^{3}$, respectively. We tested the repeatability of GPE, Harris–Hessian–Laplace, SIFT, and SURF on Mikolajczyk's test data. The executable file of Harris–Hessian–Laplace is from VGG (this executable file for Windows is from the website http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html). The executable file of the SIFT detector is from David Lowe (this executable file for Windows is from the website http://www.cs.ubc.ca/~lowe/keypoints/). The code of the SURF detector is OpenSURF, developed by Chris Evans (available at http://github.com/gussmith23/opensurf). The test results are shown in Figures 1–8, where GPE1 and GPE0.1 denote GPE without interpolation and GPE with interpolation at resolution $0.1$, respectively.
In terms of repeatability, GPE shows promising results. In comparison with SIFT and SURF, except for a score close to SIFT's under JPEG compression (cf. Figure 7a), GPE has a prominent advantage in all other cases. In contrast to the Harris–Hessian–Laplace detector, except for some special situations, i.e., viewpoint change of more than 40 degrees for the structured scene in Figure 1a, viewpoint change greater than 60 degrees for the textured scene in Figure 2, one slight scale change for the structured scene in Figure 3a, and the first four cases of JPEG compression in Figure 7a, GPE obtains higher scores. In particular, in Figure 4a, the interpolation technique in GPE (GPE0.1) clearly improves repeatability at the largest scale change for the textured scene. In terms of true recalls, GPE also shows better performance under viewpoint change for the textured scene (cf. Figure 2b), scale change for the textured scene (cf. Figure 4b), blur for the structured scene (cf. Figure 5b), JPEG compression (cf. Figure 7b), and illumination change (cf. Figure 8b). In addition to the above comparisons, we use results taken directly from related works to compare with GPE, and discuss them in the following subsections.
5.1. Comparison with Affine Detectors
In the work [13], there are eight sets of test results for six affine region detectors, namely Harris-Affine [8,14,15], Hessian-Affine [8,14], MSER [16], IBR [15,17], EBR [17,18], and Salient [19].
Since GPE is not intended for the situation of viewpoint change, when the viewpoint angle is greater than 30 degrees, its repeatability score for the structured scene is less than that of all those affine detectors. However, from 20 to 30 degrees, GPE clearly outperforms every other detector (cf. Figure 1a here and Figure 13a in [13]). For the images containing repeated texture motifs, Figure 2a (in comparison with Figure 14a in [13]) shows that, except at the viewpoint change of 60 degrees, GPE reaches a higher repeatability score than all affine detectors, which means that, as long as the viewpoint angle is at most 50 degrees, GPE has a strong capacity for extracting affine features. In the tests for scale change and rotation, GPE shows obvious advantages in both the structured and textured scenes, except at scale 4 in the textured scene, where Hessian-Affine attains a repeatability of $70\%$ (cf. Figures 3a and 4a here and Figures 15a and 16a in [13]). In the results shown in Figure 5 (in contrast to Figure 17a in [13]) and Figure 6 (in contrast to Figure 18a in [13]), GPE shows an excellent capacity to cope with blur in both the structured and textured scenes; in those comparisons, none of the other detectors achieves a higher repeatability score than GPE at any test point. In the test of JPEG compression, GPE performs similarly to Harris-Affine and Hessian-Affine but clearly outperforms the other LPE detectors (cf. Figure 7a here and Figure 19a in [13]); the Hessian-Affine detector shows a slight advantage under JPEG compression. Figure 8a here and Figure 20a in [13] show that GPE has good robustness to illumination change and an overall higher repeatability score than the other extractors.
5.2. Comparison with Detectors of Fast-Hessian, DoG, Harris-Laplace and Hessian-Laplace
In the work [10], five detectors, FH-15, FH-9, DoG [6], Harris-Laplace, and Hessian-Laplace [14], were tested for repeatability under viewpoint change for the structured scene, viewpoint change for the textured scene, scale change for the structured scene, and blur for the structured scene. Four results can therefore be adapted directly, which are shown in Figures 16 and 17 of [10]. Comparing these results with panel (a) of Figures 1–5, it can be seen that, except at one point (the scale change of about $1.3$ for FH-15 and DoG), GPE clearly outperforms FH-15, FH-9, DoG, Harris-Laplace, and Hessian-Laplace in all these tests.
5.3. Comparison with Locally Contrasting Keypoints Detector
Figure 5 in [11] and Figure 7 in [12] show test results for LOCKY, where subfigures (a)–(h) correspond to Figures 1a, 2a, 3a, 4a, 5a, 6a, 7a, and 8a, respectively, in our work. Although LOCKY mainly aims at faster computation than most currently used feature detectors, except in the cases where the viewpoint angle is greater than 40 degrees in the Graffiti sequence, GPE shows a clearly higher repeatability score than LOCKY.
6. Conclusions
We present a new method (GPE) for local feature extraction with high repeatability, which transforms a discretized scale-space representation through the LoG and extracts local features by the scheme of global comparison. Because disk-shaped convolution templates are used, GPE is rotation-invariant. The discussion of the radii of convolution templates and of the error caused by finite radii is an important merit of our work. We first decompose the LoG transformation of a discretized scale-space representation into two parts, the approximation and the error. Then, an upper bound of the error under a given radius is worked out, and we utilize this upper bound to determine a threshold below which candidates are no longer regarded as features, since the computational error can influence the precision of the approximation (cf. Equations (6) and (7)). Because of the global comparison, a relative threshold can be employed to choose local features from the candidates, and hence the chosen features are illumination-invariant. Since the kernel width increases by only one pixel at a time, GPE obtains more precise scales for extracted local features without interpolation than LPE does; the interpolation step for precisely locating the scale of a feature point in LPE is therefore omitted in GPE. Simulations show that GPE reaches high performance for repeatability and true recalls in various situations, including scale change, rotation, blur, JPEG compression, illumination change, and even viewpoint change of a textured scene.
Author Contributions
Methodology, original draft and writing, Q.Z.; Supervision, B.S.
Funding
This research was supported by Guangdong Project of Science and Technology Development (2014B09091042) and Guangzhou Sci & Tech Innovation Committee (201707010068).
Acknowledgments
The authors appreciate Krystian Mikolajczyk for his test dataset, criteria, and executable files. The authors are also thankful to David Lowe for his executable file of SIFT, and to Chris Evans for his code of OpenSURF.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Harris, C.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 1988; pp. 147–151.
2. Dufournaud, Y.; Schmid, C.; Horaud, R. Matching images with different resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, USA, 15 June 2000; Volume 1, pp. 612–618.
3. Lindeberg, T. Feature Detection with Automatic Scale Selection. Int. J. Comput. Vis. 1998, 30, 79–116.
4. Lindeberg, T. Scale-space theory: A basic tool for analysing structures at different scales. J. Appl. Stat. 1994, 21, 225–270.
5. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision (ICCV 1999), Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
6. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
7. Mikolajczyk, K.; Schmid, C. Indexing based on scale invariant interest points. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vancouver, BC, Canada, 7–14 July 2001; Volume 1, pp. 525–531.
8. Mikolajczyk, K.; Schmid, C. An Affine Invariant Interest Point Detector. In Computer Vision—ECCV 2002; Heyden, A., Sparr, G., Nielsen, M., Johansen, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 128–142.
9. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Computer Vision—ECCV 2006; Leonardis, A., Bischof, H., Pinz, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
10. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
11. Lomeli-R, J.; Nixon, M.S. The Brightness Clustering Transform and Locally Contrasting Keypoints. In Computer Analysis of Images and Patterns; Azzopardi, G., Petkov, N., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 362–373.
12. Lomeli-R, J.; Nixon, M.S. An extension to the brightness clustering transform and locally contrasting keypoints. Mach. Vis. Appl. 2016, 27, 1187–1196.
13. Mikolajczyk, K.; Tuytelaars, T.; Schmid, C.; Zisserman, A.; Matas, J.; Schaffalitzky, F.; Kadir, T.; Van Gool, L. A Comparison of Affine Region Detectors. Int. J. Comput. Vis. 2005, 65, 43–72.
14. Mikolajczyk, K.; Schmid, C. Scale & Affine Invariant Interest Point Detectors. Int. J. Comput. Vis. 2004, 60, 63–86.
15. Schaffalitzky, F.; Zisserman, A. Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?". In Computer Vision—ECCV 2002; Heyden, A., Sparr, G., Nielsen, M., Johansen, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 414–431.
16. Matas, J.; Chum, O.; Urban, M.; Pajdla, T. Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. In Proceedings of the British Machine Vision Conference, 2002; pp. 384–393.
17. Tuytelaars, T.; Van Gool, L. Wide baseline stereo matching based on local, affinely invariant regions. In Proceedings of the British Machine Vision Conference 2000 (BMVC 2000), Bristol, UK, 11–14 September 2000; pp. 412–425.
18. Tuytelaars, T.; Van Gool, L. Matching Widely Separated Views Based on Affine Invariant Regions. Int. J. Comput. Vis. 2004, 59, 61–85.
19. Kadir, T.; Zisserman, A.; Brady, M. An Affine Invariant Salient Region Detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2004; pp. 228–241.
Figure 1.
Repeatability score (a) and number of correspondences (b) under viewpoint change for the structured scene by the Graffiti sequence.
Figure 2.
Repeatability score (a) and number of correspondences (b) under viewpoint change for the textured scene by the Wall sequence.
Figure 3.
Repeatability score (a) and number of correspondences (b) under scale change for the structured scene by the Boat sequence.
Figure 4.
Repeatability score (a) and number of correspondences (b) under scale change for the textured scene by the Bark sequence.
Figure 5.
Repeatability score (a) and number of correspondences (b) under blur for the structured scene by the Bikes sequence.
Figure 6.
Repeatability score (a) and number of correspondences (b) under blur for the textured scene by the Trees sequence.
Figure 7.
Repeatability score (a) and number of correspondences (b) under JPEG compression by the UBC sequence.
Figure 8.
Repeatability score (a) and number of correspondences (b) under illumination change by the Leuven sequence.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).