Assigning Main Orientation to an EOH Descriptor on Multispectral Images

This paper proposes an approach to compute an EOH (edge-oriented histogram) descriptor with main orientation. EOH has a better matching ability than SIFT (scale-invariant feature transform) on multispectral images, but does not assign a main orientation to keypoints. Equivalently, it assigns the same main orientation to every keypoint, e.g., zero degrees. This limits EOH to matching keypoints between images that differ by translation misalignment only. To address this limitation, we propose assigning to keypoints the main orientation computed with PIIFD (partial intensity invariant feature descriptor). In the proposed method, SIFT keypoints are detected as the extrema of the difference of Gaussians, and every keypoint is assigned the main orientation computed with PIIFD. Then, EOH is computed for every keypoint with respect to its main orientation. In addition, an implementation variant is proposed for fast computation of the EOH descriptor. Experimental results show that the proposed approach performs more robustly than the original EOH on image pairs that have a rotation misalignment.


Introduction
Keypoint and descriptor techniques have been widely applied in computer vision and pattern recognition. Applications include stereo vision, 3D scene reconstruction, human activity recognition, etc. Keypoints are often matched by computing the distance between their associated descriptors. The matching ability of descriptors is measured by their repeatability and distinctiveness, and in practice, a trade-off is often made between the two. On single-spectral images, SIFT [1] and its variants, combined with post-processing techniques (e.g., RANSAC), have seen many successful applications. On multi-sensor (multispectral) images, however, SIFT descriptors generate few correct matches. Recently, the edge-oriented histogram (EOH) [2] was proposed, which utilizes only edge points and five bins for computing descriptors. EOH has a better matching performance on multispectral images than SIFT, but does not assign a main orientation to keypoints, which limits its application to images containing translation misalignment only.

Related Work
Salient points have been widely used in a variety of fields, including object tracking, image fusion, intelligent navigation, etc. [3][4][5][6][7][8][9]. Many keypoint and descriptor detection techniques have been proposed for single spectral images. Lowe [1] proposed SIFT detecting keypoints invariant to scale and rotation. The keypoints are defined to be the extrema of the difference of Gaussians (DOG). The local gradient pattern around a keypoint with respect to an assigned main orientation is computed as its descriptor. Bay et al. [10] proposed SURF (speeded-up robust features). SURF has the same repeatability and distinctiveness as SIFT, but is computed faster than SIFT. Alahi et al. [11] proposed fast retina keypoint (FREAK). FREAK is a cascade of binary strings computed by comparing image intensities over a retinal sampling pattern. Ambai and Yoshida [12] proposed compact and real-time descriptors (CARD). Compared with SIFT and SURF, CARD can be computed rapidly utilizing lookup tables to extract histograms of oriented gradients. Other descriptors include ORB [13] and PCA-SIFT [14].
The above descriptors are devised for single-sensor images and yield a good matching performance on such images. Recently, multispectral systems have become an attractive research topic, since they provide a rich representation of a scene with images taken by different sensors [15]. Barrera et al. proposed an imaging system for computing depth maps from color and infrared images [16]. Stereo vision can be accomplished by keypoint matches. However, descriptors such as SIFT, SURF and ORB are computed from the gradient pattern, which may be reversed on multispectral images [17,18], and hence, their performance deteriorates [19]. Since computing the gradient is a linear operation on the original image intensities, the matching ability of these descriptors relies on a linear relationship between image intensities.
Three factors contribute to the decrease of matching ability: the repeatability of keypoints, the accuracy of main orientation and the repeatability/distinctiveness of descriptors. From the perspective of descriptors, many techniques have been proposed to adapt descriptors of SIFT/SURF to multispectral images. Chen et al. [18] proposed the partial intensity invariant feature descriptor (PIIFD), which uses gradient orientation instead of direction. The gradient orientation is limited within [0, π), and PIIFD can register poor-quality multi-modal retinal image pairs. Saleem and Sablatnig [20] proposed NG-SIFT, which computes descriptors using a normalized gradient. NG-SIFT outperforms SIFT on image pairs of a visible image and a near-infrared image. Dellinger et al. [21] proposed SAR-SIFT for SAR images. SAR-SIFT uses a new gradient computation method, gradient by ratio (GR), which is robust to speckle noise, so that it can perform better on SAR images than SIFT. Hossain et al. [17] proposed the symmetric-SIFT algorithm for multi-modal image registration. It overcomes the problem that gradient direction could be inverted in different sensors.
Aguilera et al. [2] proposed the edge-oriented histogram (EOH). Unlike SIFT, EOH exploits only edge points in local windows rather than all pixels, since edges are in general more likely to be repeatable and hence more reliable between multi-sensor images. Edge points are detected by the Canny detector [23]. For an edge pixel, five responses are computed with filters designed in [22]. Note that the first four filters are directional derivatives, and the fifth filter is "no direction". The problem with EOH is that it does not assign a main orientation to keypoints, which amounts to assuming that the main orientation of all keypoints takes on the same value, e.g., 0°. When the misalignment does not contain a rotation component, EOH works well [2], but this limits the application of EOH to translation only. In real applications, image pairs taken from different views often contain a rotation component in the misalignment, and rotation-invariant descriptors are hence desired or even necessary.

Proposed Method
To adapt EOH to dealing with rotation, we propose assigning a main orientation to keypoints for EOH computation. The main orientation makes EOH invariant to both translation and rotation and, hence, invariant to similarity transformation and partially invariant to affine transformations [1]. Note that the rotation contained in the misalignment is unknown and by no means can one obtain it before building keypoints.
Gauglitz et al. [24] proposed two orientation assignment methods. One method is to utilize the center-of-mass (COM), which is suitable for corners, and the other one is to utilize the histogram of intensities (HOI). Both COM and HOI are more suitable for single-spectral images, since they implicitly use the linear relationship of image intensities. This work utilizes the main orientation provided by PIIFD [18]. When a main orientation is assigned to a keypoint, the computation of its associated EOH descriptor needs interpolation at fractional pixels. Bilinear interpolation is applied to compute the response of the five filters used by EOH.
The rest of the paper is arranged as follows. Section 2 discusses assigning main orientation to keypoints and computing descriptors for keypoints relative to the main orientation, Section 3 gives the matching scheme for keypoints, Section 4 presents experimental results comparing the matching performance, and Section 5 concludes this paper.

Assigning Main Orientation to Keypoints and Computing Descriptors
This section discusses assigning a main orientation to keypoints and then computing the descriptor of every keypoint with respect to its assigned main orientation.

Why a Main Orientation is Needed for Keypoints
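Let $I_r(x, y)$ and $I_t(x, y)$ denote the reference and the test image to be registered. SIFT [1] calculates the histogram of gradient orientation in a local window and takes its peak as the main orientation. Chen et al. [18] considered the problem of gradient and/or region reversal and squared the gradient $(G_x, G_y)$ by:
$$G_{sx} = G_x^2 - G_y^2, \qquad G_{sy} = 2 G_x G_y, \tag{1}$$
and then smoothed the squared gradient by convolving it with an average filter $h_\sigma$,
$$\bar{G}_{sx} = G_{sx} * h_\sigma, \qquad \bar{G}_{sy} = G_{sy} * h_\sigma. \tag{2}$$
The main orientation is calculated as follows,
$$\phi = \frac{1}{2}\begin{cases} \tan^{-1}\left(\bar{G}_{sy}/\bar{G}_{sx}\right) + \pi, & \bar{G}_{sx} \ge 0, \\ \tan^{-1}\left(\bar{G}_{sy}/\bar{G}_{sx}\right) + 2\pi, & \bar{G}_{sx} < 0,\ \bar{G}_{sy} \ge 0, \\ \tan^{-1}\left(\bar{G}_{sy}/\bar{G}_{sx}\right), & \bar{G}_{sx} < 0,\ \bar{G}_{sy} < 0. \end{cases} \tag{3}$$
A careful derivation shows that the main orientation $\phi$ and the traditional gradient $(G_x, G_y)$ roughly have the following relationship. Let $\theta = \mathrm{atan2}(G_y, G_x)$ be the gradient direction falling in $(-\pi, \pi]$. atan2 is the four-quadrant inverse tangent, giving the actual gradient direction for $(G_x, G_y)$. It differs from $\tan^{-1}$ in that the range of $\tan^{-1}$ is $(-\frac{\pi}{2}, \frac{\pi}{2})$, while the range of atan2 is $(-\pi, \pi]$. $\theta$ is mapped to $[0, \pi]$ by setting $\theta = \mathrm{mod}(\theta, \pi)$, i.e.,
$$\theta \leftarrow \begin{cases} \theta, & \theta \ge 0, \\ \theta + \pi, & \theta < 0. \end{cases} \tag{4}$$
Then:
$$\phi \approx \begin{cases} \theta + \frac{\pi}{2}, & \theta \in [0, \frac{\pi}{2}], \\ \theta - \frac{\pi}{2}, & \theta \in (\frac{\pi}{2}, \pi]. \end{cases} \tag{5}$$
Equations (4) and (5) indicate that a gradient direction $\theta$ and its reversal direction $\theta \pm \pi$ will contribute to the same main orientation bin.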
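As an illustration of the orientation estimate described above, the following Python sketch squares each gradient sample, averages the squared gradients over a window and halves the resulting angle. The function name `main_orientation`, the plain mean standing in for $h_\sigma$, and the π/2 offset that places the result in (0, π] are assumptions made for this sketch rather than details of the PIIFD implementation.

```python
import math

def main_orientation(gradients):
    """Estimate a main orientation from a window of (Gx, Gy) samples.

    Assumption: a plain mean replaces the smoothing filter h_sigma.
    """
    n = len(gradients)
    # Squaring the gradient as a complex number doubles its angle, so a
    # gradient and its reversal give the same squared gradient.
    gsx = sum(gx * gx - gy * gy for gx, gy in gradients) / n
    gsy = sum(2.0 * gx * gy for gx, gy in gradients) / n
    # Halve the angle of the averaged squared gradient; the pi/2 offset
    # (assumed convention) maps the result into (0, pi].
    return 0.5 * math.atan2(gsy, gsx) + 0.5 * math.pi

window = [(1.0, 0.5), (0.8, 0.7), (1.2, 0.4)]
reversed_window = [(-gx, -gy) for gx, gy in window]
print(main_orientation(window), main_orientation(reversed_window))
```

Because only squared gradients enter the estimate, reversing any subset of the gradients in the window leaves the returned orientation unchanged, which is the property that makes this estimate attractive for multispectral image pairs.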
EOH is applied to $I_r(x, y)$ and $I_t(x, y)$ to detect keypoints and compute descriptors. Let $K_t^i$, $i = 1, \ldots, N_t$, denote the $i$-th keypoint on the test image $I_t(x, y)$, and $K_r^j$, $j = 1, \ldots, N_r$, denote the $j$-th keypoint on the reference image $I_r(x, y)$. Let $f_t^i$, $i = 1, \ldots, N_t$, denote the descriptor of $K_t^i$, and $f_r^j$, $j = 1, \ldots, N_r$, denote the descriptor of $K_r^j$. Note that both EOH [2] and PIIFD [18] employ the extrema of DOG as keypoints, as proposed in SIFT. When detecting keypoints, $\sigma_n = 0.5$ is the standard deviation of the Gaussian function used for nominal smoothing of an input image. The threshold on the ratio of principal curvatures is set to 10, the default value in SIFT [1].
This work assigns the main orientation computed with PIIFD to keypoints $K_t^i$, $i = 1, \ldots, N_t$, and $K_r^j$, $j = 1, \ldots, N_r$, and then computes EOH descriptors. Like SIFT, the process of building keypoint matches includes four steps: (1) detect keypoints as the extrema of DOG, as proposed by SIFT; (2) assign to every keypoint the main orientation computed with PIIFD; (3) compute the EOH descriptor for each keypoint with respect to its main orientation; and (4) match keypoints using the computed descriptors. In step (3), edge points are detected by the Canny operator, and all parameters of the Canny detector are set to the default values used by the MATLAB implementation, except that the standard deviation σ of the Gaussian filter is set to three, as in the original EOH [2]. The high threshold is defined to be the gradient magnitude ranked at the top 30%, and the low threshold is defined to be 40% of the high threshold. Interpolation is needed in this step to obtain pixel values at fractional pixel locations.
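The threshold rule for the Canny detector can be sketched in a few lines of Python. This is one plausible reading of "ranked as the top 30%", namely the magnitude below which 70% of the pixels fall; the function name `canny_thresholds` and its parameters are hypothetical and do not reproduce the MATLAB code.

```python
def canny_thresholds(magnitudes, top_percent=30, low_ratio=0.4):
    """Return (low, high) hysteresis thresholds from gradient magnitudes.

    Assumption: the high threshold is the smallest magnitude that belongs
    to the strongest `top_percent` percent of the pixels.
    """
    ranked = sorted(magnitudes)
    # Integer arithmetic avoids floating-point surprises in the rank index.
    index = min(len(ranked) - 1, len(ranked) * (100 - top_percent) // 100)
    high = ranked[index]
    return low_ratio * high, high

low, high = canny_thresholds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(low, high)
```

With ten magnitudes from 1 to 10, the three strongest values are 8, 9 and 10, so the high threshold is 8 and the low threshold is 40% of that.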

Compute the Descriptor for Keypoint with the Main Orientation
The EOH computes the gradient orientation at each edge pixel with the following five filters. These filters correspond to 0°, 45°, 90°, 135° and non-direction, as shown in Figure 1. The filters shown in Figure 1a-d are called direction filters, while the one shown in Figure 1e is called the non-direction filter. For a pixel, the filter giving the maximum response defines the direction at the pixel.
Formally, let $f_k(x, y)$, $k = 0, 1, 2, 3, 4$, denote the five filters shown in Figure 1; then an edge pixel at $(x, y)$ contributes one to the bin defined by:
$$\mathrm{bin}_{EOH}(x, y) = \arg\max_{k} \left| (f_k \bullet I)(x, y) \right|, \tag{6}$$
where '•' is the correlation between image and filter. The four direction filters shown in Figure 1 are in fact the orientation partition used in PIIFD. SIFT employs eight orientations (bins) for computing descriptors, i.e., for a pixel, its gradient orientation is quantized to eight bins with the centers of the bins being 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. PIIFD considers the gradient reversal and utilizes the mod-180° orientation. Specifically, let $\alpha_{x,y}$ be the gradient orientation at $(x, y)$. For SIFT, $\alpha_{x,y}$ contributes to the bin:
$$\mathrm{bin}_{SIFT}(\alpha_{x,y}) = \mathrm{mod}\left(\mathrm{round}\left(\alpha_{x,y} / 45^\circ\right), 8\right). \tag{7}$$
For PIIFD, it contributes to the bin:
$$\mathrm{bin}_{PIIFD}(\alpha_{x,y}) = \mathrm{mod}\left(\mathrm{round}\left(\alpha_{x,y} / 22.5^\circ\right), 8\right). \tag{8}$$
Similar to SIFT, PIIFD uses eight bins in Equation (8). However, Equation (8) maps an orientation $\alpha_{x,y}$ and $\alpha_{x,y} + 180^\circ$ to the same bin, i.e., $\mathrm{bin}_{PIIFD}(\alpha_{x,y}) = \mathrm{bin}_{PIIFD}(\alpha_{x,y} + 180^\circ)$. PIIFD partitions $[0, 2\pi]$ into 16 bins at first, with each bin covering 22.5°, and then "merges" two bins if their center angles differ by 180°. Equation (7) also says that the centers of the first four orientation bins for SIFT are 0°, 45°, 90° and 135°, exactly the same as the EOH bins. Thus, the directional EOH bin of an edge pixel can be read from its gradient orientation taken modulo 180°:
$$\mathrm{bin}_{EOH}(\alpha_{x,y}) = \mathrm{mod}\left(\mathrm{round}\left(\mathrm{mod}(\alpha_{x,y}, 180^\circ) / 45^\circ\right), 4\right). \tag{9}$$
When a main orientation is assigned to a keypoint, we need to compute the maximum response of the five filters. The five filters are rotated by the amount of the main orientation, and the rotated pixels used for computing the filter response lie on a fractional grid, as shown in Figure 2. To obtain pixel values on the fractional grid, bilinear interpolation is employed.
Figure 2. When the main orientation is assigned to a keypoint, the filters shown in Figure 1 ought to be rotated with respect to the main orientation. Black dots represent the integer pixel grid, and red dots are the fractional pixel locations whose values are used by the rotated filters.
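The rotation and interpolation step can be sketched as follows in Python. The filter taps are rotated by the main orientation about the pixel, the image is sampled at the resulting fractional locations by bilinear interpolation, and the mask with the largest absolute response is selected. The 2×2 tap layout `TAPS` and the weights in `MASKS` are hypothetical stand-ins for the filters of Figure 1, chosen only to make the sketch runnable, and all function names are illustrative.

```python
import math

def bilinear(img, x, y):
    """Interpolate img (a list of rows) at fractional column x and row y."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    ax, ay = x - x0, y - y0
    top = (1.0 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1.0 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1.0 - ay) * top + ay * bottom

# Tap offsets (dx, dy) of a 2x2 mask centred on the pixel (assumed layout).
TAPS = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]

R2 = math.sqrt(2.0)
# Hypothetical mask weights, listed in the same order as TAPS.
MASKS = [
    [1.0, -1.0, 1.0, -1.0],  # difference across columns
    [R2, 0.0, 0.0, -R2],     # difference along one diagonal
    [1.0, 1.0, -1.0, -1.0],  # difference across rows
    [0.0, R2, -R2, 0.0],     # difference along the other diagonal
    [2.0, -2.0, -2.0, 2.0],  # non-direction
]

def dominant_mask(img, cx, cy, phi):
    """Index of the mask with the largest absolute response at (cx, cy)
    after rotating the taps by the main orientation phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    samples = [bilinear(img, cx + c * dx - s * dy, cy + s * dx + c * dy)
               for dx, dy in TAPS]
    responses = [abs(sum(w * v for w, v in zip(mask, samples)))
                 for mask in MASKS]
    return responses.index(max(responses))

# Intensity ramp that changes ten times faster along rows than along columns.
ramp = [[x + 10 * y for x in range(5)] for y in range(5)]
print(dominant_mask(ramp, 2.0, 2.0, 0.0), dominant_mask(ramp, 2.0, 2.0, math.pi / 2))
```

On the ramp image, the selected mask changes from the row-difference mask to the column-difference mask when the taps are rotated by 90°, which illustrates how the main orientation changes the bin an edge pixel contributes to. The sketch assumes all rotated taps stay inside the image.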
After the interpolation step, pixel values for computing the filter response are obtained. The filter giving the maximum response is defined to be the direction at this pixel and contributes to EOH descriptors. As in the original EOH [2], a local window of radius 50 is used for computing an EOH descriptor. Only edge pixels in the window contribute to the descriptor. Alternatively, we can skip the interpolation step and just utilize the gradient orientation at edge pixels, which is discussed in Section 2.3 as a variant of implementing EOH.

Variant Implementation of EOH
Computing the responses of the five filters shown in Figure 1 can be sped up by the fast Fourier transform (FFT). We use $I_t(x, y)$ as an example and compute the filter responses with FFT; a similar process can be applied to $I_r(x, y)$. Let $I_t^0(x, y)$ denote the response of the zeroth filter (0°) applied to $I_t(x, y)$. Formally,
$$I_t^0(x, y) = f_0(x, y) \bullet I_t(x, y) = I_t(x, y) * \tilde{f}_0(x, y), \tag{10}$$
wherein '•' denotes the correlation, which can be rewritten as the convolution '$*$' of $\tilde{f}_0(x, y)$ and $I_t(x, y)$. $\tilde{f}_0(x, y)$ is the version of $f_0(x, y)$ flipped left-right and up-down, i.e., $\tilde{f}_0(x, y) = f_0(-x, -y)$. The convolution $I_t(x, y) * \tilde{f}_0(x, y)$ can be quickly implemented with FFT.
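A minimal NumPy sketch of this idea is given below: the kernel is flipped in both directions, both arrays are zero-padded to the full output size, and the product of their spectra is transformed back. The function name `correlate_fft` and the choice of a "full" output size are assumptions of the sketch; it is not the implementation used in the experiments.

```python
import numpy as np

def correlate_fft(image, kernel):
    """Full cross-correlation of image with kernel, computed via FFT."""
    # Correlation equals convolution with the kernel flipped up-down
    # and left-right.
    flipped = kernel[::-1, ::-1]
    # Zero-pad to the full linear-convolution size to avoid wrap-around.
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(flipped, s=shape)
    return np.fft.irfft2(spectrum, s=shape)

rng = np.random.default_rng(0)
img = rng.random((6, 7))
ker = rng.random((2, 2))
full = correlate_fft(img, ker)
print(full.shape)
```

For small masks such as 2×2 or 3×3 filters, direct correlation may be just as fast; the FFT route pays off mainly when all five responses are computed over the whole image at once.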
The equivalence of the four directional filter responses in EOH to the first four orientation bins in SIFT, together with the fast computation by FFT discussed above, motivates implementing EOH as follows. Define the directional gradients along the horizontal and vertical axes to be:
$$G_x = I_t * \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = I_t * \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}. \tag{11}$$
Note that the gradient computation in Equation (11) is nothing but the Sobel operator [25]. Once the directional gradient is obtained, its direction can be simply calculated by:
$$\alpha_{x,y} = \mathrm{atan2}(G_y, G_x). \tag{12}$$
Equations (11) and (12), together with Equation (9), give the orientation bin to which every edge pixel contributes. By this means, we implement a variant of EOH, which can be understood from the adaptation of different descriptors.
• The variant ignores the non-direction bin, which in our experiments proved to have little effect on matching performance. See the analysis in Tables 1 and 2 in Section 4. Furthermore, the orientation bin computed with Equations (12), (11) and (9) may not be identical to that computed with Equation (6). See Section 4 for their matching performance comparison.
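The variant can be sketched in Python as follows. The gradient is computed with the Sobel masks, its direction is measured relative to the main orientation, folded into [0, π) and rounded to the nearest of four bin centres spaced 45° apart. The sign convention of the masks and the nearest-centre rounding are assumptions of this sketch, and the names `sobel_gradient` and `direction_bin` are illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient(img, x, y):
    """Sobel gradient (Gx, Gy) at interior pixel (x, y) of img, a list of rows."""
    gx = gy = 0.0
    for j in range(3):
        for i in range(3):
            value = img[y + j - 1][x + i - 1]
            gx += SOBEL_X[j][i] * value
            gy += SOBEL_Y[j][i] * value
    return gx, gy

def direction_bin(gx, gy, phi=0.0, n_bins=4):
    """Directional bin of a gradient relative to main orientation phi.

    The orientation is taken modulo pi, so a gradient and its reversal
    fall into the same bin; there is no non-direction bin.
    """
    alpha = (math.atan2(gy, gx) - phi) % math.pi
    width = math.pi / n_bins
    return int(math.floor(alpha / width + 0.5)) % n_bins

ramp_x = [[x for x in range(5)] for _ in range(5)]
gx, gy = sobel_gradient(ramp_x, 2, 2)
print(gx, gy, direction_bin(gx, gy))
```

Since no filter needs to be rotated, assigning a main orientation only requires subtracting `phi` from the gradient direction, which is why this variant avoids the interpolation step.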

Matching Keypoints with Descriptors
This section discusses matching keypoints with descriptors. The matching ability of descriptors is evaluated by the number of correct keypoint matches. To make a fair comparison between different descriptors, the simple matching approach suggested by SIFT is employed here. A reference keypoint $K_r^{j_0}$ is defined to be matched to a test keypoint $K_t^{i_0}$ if:
$$D\left(f_t^{i_0}, f_r^{j_0}\right) < 0.8 \, D\left(f_t^{i_0}, f_r^{j_1}\right), \tag{13}$$
where $D(\cdot, \cdot)$ is the Euclidean distance, $f_r^{j_0}$ is the closest neighbor to $f_t^{i_0}$ and $f_r^{j_1}$ is the second-closest neighbor to $f_t^{i_0}$. The '0.8' in Equation (13) can be changed to 0.6, which means a tighter matching criterion giving fewer matched keypoints.
Equation (13) is the matching method suggested in the original SIFT [1]. Through Equation (13), a set of keypoint mappings can be established, which will be used to analyze the descriptor performance. See Section 4 for details. Note that post-processing techniques can be applied to keypoints and descriptors for removing outlier keypoint matches. Commonly-used techniques include RANSAC [26,27], its variant fast sample consensus (FSC) [28], etc. However, post-processing is to some extent independent of descriptor matching ability, and the resulting improvement ought to be excluded for comparing descriptors.
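The ratio criterion of Equation (13) can be sketched in Python as below. For each test descriptor, the reference descriptors are ranked by Euclidean distance, and the nearest one is accepted only if it is sufficiently closer than the second nearest. The function names and the brute-force search are choices made for this sketch; no outlier removal is applied, in line with the comparison protocol above.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def ratio_match(test_desc, ref_desc, ratio=0.8):
    """Return (test index, reference index) pairs passing the ratio test."""
    matches = []
    for i, f_t in enumerate(test_desc):
        ranked = sorted((euclidean(f_t, f_r), j) for j, f_r in enumerate(ref_desc))
        # Accept only when the closest distance is clearly smaller than
        # the second-closest distance.
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches

test_desc = [[0.0, 0.0], [5.0, 5.0]]
ref_desc = [[0.1, 0.0], [3.0, 0.0], [5.0, 4.0], [5.0, 6.0]]
print(ratio_match(test_desc, ref_desc))
```

In this toy example the second test descriptor is equally close to two reference descriptors, so it is rejected as ambiguous, while the first one is matched.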

Experimental Results
This section presents the experimental results. Visual matching results are provided first, followed by a quantitative analysis of the matching results. The proposed method is compared with the original EOH. Two datasets, EOIR and VS-long-wave infrared (VS-LWIR), are used for investigating the matching performance. EOIR includes 87 image pairs captured by ourselves and 12 Landsat image pairs from NASA. The 87 image pairs include outdoor and indoor scenes, with one image taken with visible light and the other taken with middle-wave infrared (MWIR) light. The 12 Landsat image pairs were downloaded from [29], with one image taken with the visible band, e.g., Landsat 8 Band 3 Visible (0.53-0.59 µm), and the other taken with middle-wave light or the Thermal Infrared Sensor (TIRS), e.g., Landsat 8 Band 10 TIRS 1 (10.6-11.19 µm). Dataset VS-LWIR is from [2] and contains 100 image pairs, with one image taken with the visible bandwidth (0.4-0.7 µm) and the other taken with the long-wave infrared bandwidth (LWIR, 8-14 µm).

Visual Results
This section gives visual matching results. Figure 3 gives the keypoint matches built with the original EOH without the main orientation and with the proposed method. The visible image serves as the reference image, and the infrared image is used as the test image. The test image is rotated by 10°, 20° and 30°. Figure 3a,c,e show the matching results of EOH between the reference image and the test image rotated by 10°, 20° and 30°, respectively. Due to the lack of main orientation, the keypoint matches built with EOH contain very few or no correct matches. In comparison, the proposed method provides sufficiently many correct matches in Figure 3b,d,f. Figure 4 shows the keypoint matches on an image pair from dataset EOIR built with EOH, EOH equipped with SIFT main orientation, with COM (center-of-mass) main orientation [24], with HOI (histogram of intensity) main orientation [24] and the proposed method. The infrared image is rotated by 20°. EOH provides five keypoint matches in Figure 4a, of which three are visually correct. SIFT main orientation gives seven keypoint matches in Figure 4b, of which four are visually correct. The COM and HOI main orientations do not give many correct matches, as shown in Figure 4c,d, while the proposed method gives 11 keypoint matches in Figure 4e, of which nine are visually correct. Visually, the SIFT main orientation and the proposed method give almost the same correct rate of matches, except that the proposed method gives more matches. The reason might be that although this pair of images was taken with a visible camera and an infrared camera, the images are very close to single-spectrum images, i.e., brighter (darker) areas in the visible image are also brighter (darker) in the infrared image. However, the relationship between image intensities is not linear, which makes COM and HOI perform poorly.
Figure 5 illustrates the keypoint matches on an image pair from dataset VS-LWIR built with EOH, EOH equipped with SIFT main orientation, COM main orientation, HOI main orientation and the proposed method. The performance of EOH and EOH equipped with SIFT main orientation in Figure 5a,b is inferior to that in Figure 4a,b. The performance of COM and HOI is not good either, as shown in Figure 5c,d. This image pair was taken with a visible camera and an LWIR camera. The multimodality between the two images causes the inaccuracy of the SIFT, COM and HOI main orientations and, hence, the mismatches in Figure 5b-d. The proposed method, owing to the main orientation it assigns to keypoints, still performs well on this image pair.

Quantitative Results
This section presents quantitative results. The above visual results provide only a limited comparison on a few image pairs, and the comparison is affected by the individual judgment of what counts as a "correct" match. To quantitatively assess the performance of different methods, we collect statistics on the number of correct matches. We define a keypoint match to be correct if the distance $d$ between the two keypoints comprising the match is smaller than a threshold $d_0$. In the literature, different values have been used for $d_0$, e.g., $d_0 = 2$, $d_0 = 4$, $d_0 = 5$, etc. [4]. To reduce the effect of the choice of $d_0$ on the performance comparison, $d_0 = 1, 2, 3, 4, 5, 10, 20, 50, 100$ are used in this work. The number of keypoint matches of distance $d < d_0$ is counted and listed in Table 1 for dataset EOIR and Table 2 for dataset VS-LWIR.
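The counting procedure can be sketched in Python as follows. The sketch assumes that the image pair is registered before the test image is rotated by a known angle about a known centre, so that the error of a match is the distance between the reference keypoint and the test keypoint after undoing that rotation. This ground-truth model, the rotation convention and the function names are assumptions for illustration only.

```python
import math

D0_VALUES = (1, 2, 3, 4, 5, 10, 20, 50, 100)

def match_error(ref_pt, test_pt, angle_deg, center):
    """Distance between ref_pt and test_pt after undoing a known rotation
    of the test image by angle_deg about center (assumed ground truth)."""
    a = math.radians(angle_deg)
    dx, dy = test_pt[0] - center[0], test_pt[1] - center[1]
    # Apply the inverse rotation R(-a) to the centred test keypoint.
    bx = center[0] + math.cos(a) * dx + math.sin(a) * dy
    by = center[1] - math.sin(a) * dx + math.cos(a) * dy
    return math.hypot(bx - ref_pt[0], by - ref_pt[1])

def fraction_within(errors, thresholds=D0_VALUES):
    """Fraction of matches whose error d satisfies d < d0, for each d0."""
    return {d0: sum(e < d0 for e in errors) / len(errors) for d0 in thresholds}

errors = [0.5, 3.0, 12.0, 200.0]
print(fraction_within(errors))
```

Reporting the whole set of thresholds rather than a single value of $d_0$ shows how quickly each method accumulates correct matches as the tolerance is relaxed.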
In Tables 1 and 2, the test (infrared) image is rotated by 10°, 20°, 30° and 45°. The proposed method outperforms the original EOH for all rotation angles. For example, when the test image is rotated by 10°, 45.60% of the keypoint matches on dataset EOIR built with the proposed method have a distance less than five, i.e., falling in the range [0, 5], while EOH has 33.17% falling in [0, 5]. On dataset VS-LWIR, the proposed method has 29.91% of matches falling in [0, 5], while EOH has 19.53%. This also indicates that dataset VS-LWIR is more challenging than EOIR: both the proposed method and EOH provide better results on EOIR than on VS-LWIR. COM and HOI perform only slightly better than the original EOH without the main orientation on EOIR and VS-LWIR. This to some extent indicates that the main orientations computed with COM and HOI on visible and infrared images are not as accurate as the ones computed with PIIFD and fail to account for the rotation difference between the two images.
The performance of all methods decreases as the rotation angle increases. For example, on dataset VS-LWIR, when the test image is rotated by 10°, the proposed method has 48.13% of matches falling in [0, 10], but this number decreases to 43.08%, 33.13% and 24.02% when the test image is rotated by 20°, 30° and 45°. For EOH, the percentage of keypoint matches falling in [0, 10] decreases more sharply than for the proposed method, from 41.83% to 5.93%, 1.20% and 0.44%. The performance decrease for EOH is due to the lack of main orientation, while the decrease for the proposed method originates from the inaccuracy of the computed main orientation. Figure 6 shows the performance of different methods under rotation. Keypoint matches of distance d ≤ 10 are defined to be correct. From Figure 6, it can be seen that the percentage of correct matches for all methods decreases as the rotation angle increases. On both EOIR and VS-LWIR, the proposed method and the variant implementation of EOH decrease more slowly than the original EOH without the main orientation and than EOH with SIFT, COM or HOI main orientation.
The variant implementation of EOH with the main orientation performs comparably to the proposed method that utilizes the five filters in Figure 1. On dataset VS-LWIR, the variant gives 18.55% of matches falling in [0, 5] when the rotation is 20°, and the proposed method gives 16.50%. When the rotation reaches 45°, the variant gives 11.79%, and the proposed method gives 12.35%. It can also be observed from Figure 6 that the variant EOH proposed in Section 2.3 performs as well as the proposed method. From Tables 1 and 2, we can conclude that the proposed variant implementation of EOH yields keypoint matches as reliable as those of EOH with an assigned main orientation. This verifies that the non-direction bin does not have a great effect on the matching ability and helps explain why it is not used in descriptors such as SIFT, SURF and PIIFD.
[...] the matching performance of EOH on images whose misalignment contains rotation. Additionally, a variant of EOH is proposed that employs the gradient orientation in place of the filter responses. The variant EOH can be computed with respect to the main orientation more easily and achieves a matching performance comparable to the original EOH at a lower computational cost.