MIFT: A Moment-Based Local Feature Extraction Algorithm

We propose a local feature descriptor based on moments. Although conventional scale invariant feature transform (SIFT)-based algorithms generally use the difference of Gaussians (DoG) for feature extraction, they remain sensitive to complicated deformations. To solve this problem, we propose MIFT, an invariant feature transform algorithm based on the modified discrete Gaussian-Hermite moment (MDGHM). Taking advantage of the MDGHM's strong ability to represent image information, MIFT uses an MDGHM-based pyramid for feature extraction, which can extract more distinctive extrema than the DoG, and MDGHM-based magnitude and orientation for feature description. We compared the proposed MIFT method with current best practice methods for six image deformation types, and confirmed that MIFT matching accuracy was superior to that of the other SIFT-based methods.


Introduction
The scale invariant feature transform (SIFT) was proposed by Lowe [1] to extract image features invariant to changes in image scale, rotation, illumination, viewpoint, and partial occlusion. SIFT has been widely used in various areas, including image stitching [2], image registration [3], and object recognition [4]. Several SIFT variants and extensions have been developed to facilitate robust feature extraction. Ke [5] used principal component analysis (PCA) rather than histograms to reduce computational time and compared SIFT and PCA-SIFT. Bay [6] proposed speeded-up robust features (SURF) to reduce computational time. Mikolajczyk [7] presented a comparative study of several local descriptors. Kang [8] proposed a modified local discrete Gaussian-Hermite moment based SIFT (MDGHM-SIFT) that significantly improved matching accuracy by replacing the gradient magnitude and orientation with accumulated MDGHM-based magnitude and orientation. Junaid [9] proposed binarization of gradient orientation histograms to reduce storage and computational resources. Although these SIFT-based algorithms can improve feature extraction performance, they are all variants of Gaussian-based methods and, hence, remain sensitive to complicated deformations, e.g., large illumination changes [10].
The Gaussian-Hermite moment (GHM) has recently been shown to have merit for image feature representation [11]. GHM base functions of different orders have different numbers of zero crossings, so the GHM can distinguish image features efficiently, and the Gaussian factor in the base function makes the GHM less sensitive to noise. The discrete GHM (DGHM) [12,13] is a global feature representation method, and the modified DGHM (MDGHM) [8] is an efficient local feature representation method obtained by computing the DGHM over local image regions.

Conventional SIFT consists of four stages:

1. Scale space extrema detection. Search for keypoint candidates as extrema over all scale images, by constructing a Gaussian pyramid and finding the local extrema in DoG images.
2. Keypoint localization. Locate keypoints by removing unstable keypoint candidates having low contrast or poor localization along an edge.
3. Orientation assignment. Identify the orientation of each keypoint based on local image gradient information.
4. Keypoint description. Build a descriptor for each keypoint.
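As an illustration of stage 1, the scale-space construction and 26-neighbor extremum test can be sketched in NumPy. This is a minimal sketch with an illustrative sigma schedule and toy image, not Lowe's full implementation (which also uses multiple octaves and subpixel refinement); `gaussian_blur`, `dog_pyramid`, and `local_extrema` are illustrative helper names.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur via two 1D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()  # normalize so flat regions are preserved
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def dog_pyramid(img, sigmas):
    """Difference-of-Gaussian images between consecutive scales."""
    scales = [gaussian_blur(img, s) for s in sigmas]
    return [scales[i + 1] - scales[i] for i in range(len(scales) - 1)]

def local_extrema(dog, thresh=1e-3):
    """Keypoint candidates: pixels that are (possibly tied) maxima or minima
    over their 26 neighbors in the 3x3x3 block of three consecutive scales."""
    pts = []
    for s in range(1, len(dog) - 1):
        stack = np.stack(dog[s - 1:s + 2])
        for i in range(1, stack.shape[1] - 1):
            for j in range(1, stack.shape[2] - 1):
                block = stack[:, i - 1:i + 2, j - 1:j + 2]
                v = stack[1, i, j]
                if abs(v) > thresh and (v == block.max() or v == block.min()):
                    pts.append((i, j, s))
    return pts

# toy example: a bright square blob
img = np.zeros((32, 32))
img[12:20, 12:20] = 1.0
dog = dog_pyramid(img, sigmas=[1.0, 1.4, 2.0, 2.8])
candidates = local_extrema(dog)
```

Stages 2-4 then prune, orient, and describe the surviving candidates as listed above.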

Discrete Gaussian-Hermite Moment
The MDGHM is orthogonal and calculates the moment over a local image area, acting like a filter mask that describes local features using neighboring information. Suppose I(i,j) is an image and t(u,v) is a mask of size M × N (0 ≤ u ≤ M − 1, 0 ≤ v ≤ N − 1). The mask coordinates are transformed to −1 ≤ x, y ≤ 1 by x = (2u − M + 1)/(M − 1) and y = (2v − N + 1)/(N − 1). Hence, with the Hermite polynomial H_p(x) = (−1)^p e^(x²) (d^p/dx^p) e^(−x²), the Gaussian-Hermite base function is η̂_p(x; σ) = (2^p p! √π σ)^(−1/2) exp(−x²/(2σ²)) H_p(x/σ), and the MDGHM mask can be expressed as t_{p,q}(u,v) = η̂_p(x; σ) η̂_q(y; σ). Thus, the MDGHM of an image at point (i,j) can be expressed as the correlation of this mask with the M × N window centered at (i,j). Figure 1 shows MDGHM examples with three derivative orders. The MDGHM acts like a Gaussian filter with multi-order derivatives. Since the base function of the nth-order MDGHM changes sign n times, the MDGHM can efficiently represent spatial characteristics and strongly separate image features using the multi-order derivatives. Therefore, the MDGHM can be used as a filter to describe local features.
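Under the standard Gaussian-Hermite definitions above, the mask construction can be sketched as follows. The mask size, order, and σ are illustrative choices, and `gh_base`/`mdghm_mask` are hypothetical helper names, not the paper's code.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite import hermval  # physicists' Hermite H_n

def gh_base(order, coords, sigma):
    """Gaussian-Hermite base function of the given derivative order,
    sampled at normalized coordinates in [-1, 1]."""
    c = np.zeros(order + 1)
    c[order] = 1.0  # coefficient vector selecting H_order in hermval
    norm = 1.0 / sqrt(2.0**order * factorial(order) * sqrt(pi) * sigma)
    return norm * np.exp(-coords**2 / (2 * sigma**2)) * hermval(coords / sigma, c)

def mdghm_mask(p, q, M, N, sigma):
    """M x N moment mask t_pq(u, v): outer product of two 1D
    Gaussian-Hermite base functions on a grid normalized to [-1, 1]."""
    x = np.linspace(-1.0, 1.0, M)
    y = np.linspace(-1.0, 1.0, N)
    return np.outer(gh_base(p, x, sigma), gh_base(q, y, sigma))

mask = mdghm_mask(p=2, q=0, M=9, N=9, sigma=0.5)
```

Correlating such a mask with the window around each pixel yields the MDGHM response; an even order p gives a symmetric profile, odd orders give the sign changes noted above.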

Proposed MIFT Algorithm
The scale invariant feature transform remains sensitive to deformations [1,8], because DoG and gradient methods do not provide distinctive information to accurately determine keypoint location in a deformed image. To obtain better matching accuracy, we propose MIFT, an MDGHM based invariant feature transform. Figure 2 shows an overview for SIFT and MIFT methods.

Stage 1
We use an MDGHM-based scale space to detect extrema, rather than the Gaussian pyramid and DoG used in conventional SIFT, extracting more distinctive features. The input image is downsampled by a factor of 2 to create each octave, and the MDGHM is applied to create scale images according to the selected derivative order, sigma, and mask size parameters.

Let (i_α, j_α, s) be an arbitrary pixel at scale s; then the local moment at (i_α, j_α, s) can be expressed using the MDGHM, and the vertical, η̂_{p,0}(i_α, j_α, s), and horizontal, η̂_{0,q}(i_α, j_α, s), moments can be obtained for each keypoint. Therefore, we can calculate the scale space moment by summing the vertical and horizontal components, η̂(i_α, j_α, s) = η̂_{p,0}(i_α, j_α, s) + η̂_{0,q}(i_α, j_α, s), and more distinctive keypoint candidates can be detected using MDGHM pyramid extrema than with the DoG pyramid.
Figure 3 compares the DoG and MDGHM pyramids and the feature detection methods of conventional SIFT and the proposed MIFT, and Figure 4 provides example scale spaces using DoG and MDGHM. Since the MDGHM can include more distinctive feature information, the MDGHM pyramid can represent feature information in more detail than conventional SIFT, and building the MDGHM scale space is also somewhat simpler. The vertical and horizontal axes in Figure 4 represent octave and scale in scale space, respectively; the MDGHM scale space images extract more distinctive feature information. Points whose MDGHM is a maximum or minimum compared with their 26 neighbors across three consecutive scales are regarded as keypoint candidates.
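The vertical-plus-horizontal summation described above can be sketched with a brute-force NumPy loop, assuming the standard Gaussian-Hermite base functions; the orders, mask size, σ, and the step-edge test image are illustrative, and this is not the paper's optimized implementation.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite import hermval

def gh_base(order, coords, sigma):
    """1D Gaussian-Hermite base function on normalized coordinates."""
    c = np.zeros(order + 1); c[order] = 1.0
    norm = 1.0 / sqrt(2.0**order * factorial(order) * sqrt(pi) * sigma)
    return norm * np.exp(-coords**2 / (2 * sigma**2)) * hermval(coords / sigma, c)

def local_moment(img, mask, i, j):
    """MDGHM at pixel (i, j): correlate the mask with the window centered there."""
    M, N = mask.shape
    win = img[i - M // 2:i + M // 2 + 1, j - N // 2:j + N // 2 + 1]
    return float((win * mask).sum())

def scale_space_moment(img, p, q, M, N, sigma):
    """Sum of vertical (p, 0) and horizontal (0, q) moment responses
    at every interior pixel."""
    x = np.linspace(-1, 1, M); y = np.linspace(-1, 1, N)
    t_p0 = np.outer(gh_base(p, x, sigma), gh_base(0, y, sigma))
    t_0q = np.outer(gh_base(0, x, sigma), gh_base(q, y, sigma))
    out = np.zeros(img.shape, dtype=float)
    for i in range(M // 2, img.shape[0] - M // 2):
        for j in range(N // 2, img.shape[1] - N // 2):
            out[i, j] = local_moment(img, t_p0, i, j) + local_moment(img, t_0q, i, j)
    return out

img = np.zeros((21, 21)); img[:, 10:] = 1.0  # vertical step edge
resp = scale_space_moment(img, p=1, q=1, M=7, N=7, sigma=0.5)
```

On this toy image the odd-order response is large near the edge and essentially zero in flat regions, which is why the MDGHM pyramid yields distinctive extrema.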

Stage 2
The proposed MIFT keypoint localization is similar to conventional SIFT, except that the local DoG maxima and minima are replaced by their MDGHM counterparts. We then fit the candidates to filter out those with low contrast or poor localization along an edge.

Stage 3
Figure 5 shows the SIFT and MIFT processes for assigning keypoint orientation. Conventional SIFT uses the gradient magnitude and orientation, whereas MIFT uses the MDGHM magnitude, m(i_α, j_α, s) = √(η̂_{p,0}(i_α, j_α, s)² + η̂_{0,q}(i_α, j_α, s)²), and orientation, θ(i_α, j_α, s) = tan⁻¹(η̂_{0,q}(i_α, j_α, s)/η̂_{p,0}(i_α, j_α, s)).

Each sample point around a keypoint has an MDGHM magnitude and an orientation, and the orientation histogram is calculated by summing these orientations weighted by their magnitudes. The highest histogram peak and any other local peaks within 80% of the highest peak are selected as the keypoint orientations.
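The magnitude-weighted histogram and the 80% peak rule can be sketched as follows; the 36-bin resolution and the toy magnitude/orientation samples are illustrative assumptions, and `assign_orientations` is a hypothetical helper name.

```python
import numpy as np

def assign_orientations(mags, oris, nbins=36):
    """Accumulate magnitudes into an orientation histogram and keep the
    highest peak plus any local peak within 80% of it (in degrees)."""
    hist, edges = np.histogram(oris, bins=nbins, range=(0.0, 360.0), weights=mags)
    peaks = []
    for b in range(nbins):
        left, right = hist[(b - 1) % nbins], hist[(b + 1) % nbins]  # circular
        if hist[b] >= left and hist[b] >= right and hist[b] >= 0.8 * hist.max():
            peaks.append(0.5 * (edges[b] + edges[b + 1]))  # bin center
    return peaks

# two dominant directions, the second at 85% of the first
mags = np.array([1.0, 1.0, 0.85, 0.85, 0.1])
oris = np.array([95.0, 96.0, 200.0, 201.0, 300.0])
peaks = assign_orientations(mags, oris)
```

A keypoint with two strong peaks is duplicated with one orientation per peak, as in conventional SIFT.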

Stage 4
We calculate a descriptor that is invariant to deformations following a procedure similar to conventional SIFT, using MDGHM-based rather than gradient-based magnitude and orientation, as shown in Figure 6. We calculate the MDGHM magnitude and orientation for each sample point in a region around a given keypoint (Figure 6, left), and accumulate these orientations into an eight-direction orientation histogram weighted by their magnitudes (Figure 6, right). The descriptor illustrated in the figure consists of a 2 × 2 array of histograms, giving 2 × 2 × 8 = 32 dimensions; in this study, we used a 4 × 4 array, i.e., 4 × 4 × 8 = 128 dimensions, to describe a keypoint.
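The cell-wise histogram assembly can be sketched as follows, assuming a 16 × 16 sample region as in conventional SIFT; the random sample values and the unit normalization are illustrative assumptions, and `build_descriptor` is a hypothetical helper name.

```python
import numpy as np

def build_descriptor(mags, oris, cells=4, nbins=8):
    """Split a 16 x 16 sample region into cells x cells sub-blocks and
    build one nbins-direction magnitude-weighted histogram per block."""
    region = 16                       # 16 x 16 samples around the keypoint
    step = region // cells
    desc = []
    for ci in range(cells):
        for cj in range(cells):
            m = mags[ci*step:(ci+1)*step, cj*step:(cj+1)*step]
            o = oris[ci*step:(ci+1)*step, cj*step:(cj+1)*step]
            hist, _ = np.histogram(o, bins=nbins, range=(0, 360), weights=m)
            desc.extend(hist)
    desc = np.asarray(desc, dtype=float)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # unit-normalize the vector

rng = np.random.default_rng(0)
mags = rng.random((16, 16))
oris = rng.random((16, 16)) * 360.0
d = build_descriptor(mags, oris)        # 4 x 4 x 8 = 128 dimensions
```

Setting `cells=2` reproduces the 2 × 2 × 8 = 32-dimensional layout illustrated in Figure 6.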

Experimental Results
We performed experiments to evaluate the proposed MIFT performance in terms of keypoint matching accuracy, compared with five SIFT-related algorithms, and considered an application to ego-motion compensation for a humanoid robot. Table 1 compares the proposed MIFT method with the five SIFT-related algorithms. SIFT-related algorithms generally have four stages and, hence, they can be characterized by their differences at each stage.

Image Deformation Dataset
We conducted an experiment to evaluate matching accuracy on a dataset containing six image deformations [15]: scale, rotation, viewpoint, blur, JPEG compression, and illumination. Figure 7 shows some example test images, where the left and right images represent reference and corresponding deformed images, respectively.

Matching Method and Evaluation Metrics
We utilized nearest neighbor distance ratio (NNDR) matching [7] for performance evaluation, since NNDR selects only the best match. Two descriptors were considered to match if they were nearest neighbors and their distance ratio was less than a threshold. For convenience, we use DR as shorthand for the NNDR distance ratio threshold.
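The NNDR rule can be sketched as a brute-force matcher over descriptor vectors; the toy 2D descriptors are illustrative, and `nndr_match` is a hypothetical helper name.

```python
import numpy as np

def nndr_match(desc_a, desc_b, dr=0.8):
    """Accept a match only when the nearest-neighbor distance is less than
    dr times the second-nearest distance (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if second > 0 and nearest / second < dr:
            matches.append((i, int(order[0])))
    return matches

a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]])
matches = nndr_match(a, b, dr=0.8)
```

The ratio test discards ambiguous matches whose nearest and second-nearest neighbors are nearly equidistant, which is why NNDR selects only the best match.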
We used the recall, 1-precision, and F-score evaluation metrics [7,14]:

recall = (number of correct positives) / (total number of positives), 1 − precision = (number of false positives) / (total number of matches), (10)

and the F-score, the harmonic mean of precision and recall, where a correct positive is a match between two keypoints corresponding to the same physical location, a false positive is a match between two keypoints corresponding to different physical locations, and F-score ∈ [0, 1].

Figure 8 shows the resulting performance of the considered methods, where each data point is the average over 3-6 continuously varying test images, and DR was increased from 0 to 1 in 0.05 intervals. MIFT achieved significantly superior performance to the other SIFT-based algorithms for all deformations, with the general performance order being MIFT, MDGHM-SIFT, SIFT, MDGHM-SURF, SURF, and PCA-SIFT for most deformation cases. Thus, the MDGHM had a positive effect on the ability to represent image feature information.

Table 2 shows the performance metric outcomes. The MIFT F-score increased by approximately 32.5%, 21.4%, 41.9%, 14.9%, 20.1%, and 57.8% for the scale, rotation, viewpoint, blur, JPEG compression, and illumination distortions, respectively, compared with conventional SIFT. The proposed MIFT method exhibited significantly superior F-scores compared with the other SIFT-based algorithms. Thus, MIFT was the most effective method tested, which we attribute to employing the MDGHM-based pyramid and MDGHM-based feature description. The MDGHM pyramid generated more distinctive keypoints during scale space extrema detection, producing stronger histogram peaks during MIFT orientation assignment. Therefore, MIFT extracted more distinguishable final keypoints and, hence, achieved superior matching accuracy.

We examined the effects of the MDGHM parameters (standard deviation and derivative order) by fixing the MDGHM mask size and measuring the F-score, as shown in Figure 9.
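The metrics of Equation (10) can be computed directly from the match counts; the F-score here is the standard harmonic mean of precision and recall (the paper's exact F-score equation was not recoverable), and the counts in the example are illustrative.

```python
def evaluate(correct_positives, false_positives, total_positives):
    """Recall, 1-precision, and F-score from keypoint match counts."""
    matches = correct_positives + false_positives
    recall = correct_positives / total_positives
    one_minus_precision = false_positives / matches
    precision = 1.0 - one_minus_precision
    # harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall)
    return recall, one_minus_precision, f_score

# e.g., 8 correct and 2 false matches out of 10 true correspondences
metrics = evaluate(correct_positives=8, false_positives=2, total_positives=10)
```

Sweeping DR trades recall against 1-precision, which is what the curves in Figure 8 visualize.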
The lower plane in each subfigure represents the SIFT F-score, and the curve represents the MIFT F-score. Although the performance results exhibit some oscillation, MIFT outperformed SIFT for most parameter settings. In particular, MIFT achieved high F-scores for the rotation, viewpoint, and illumination deformations for all parameter settings, with a margin > 0.1 between MIFT and SIFT. For the scale, blur, and JPEG compression deformations, the MIFT F-score was higher than that of SIFT in most parameter cases.

Motion Compensation Application
We applied MIFT to ego-motion compensation for a humanoid robot. Vision information obtained from a humanoid robot exhibits deformations while walking, hence compensation is mandatory to recognize the walking environment.
We first simulated an ego-motion image sequence from ideal data by including x- and y-axis displacement and rotation, and then calculated the apparent displacement using MIFT and SIFT, as shown in Figure 10, where the left images show the estimation and the right images show the error. We calculated the error by comparing the SIFT and MIFT outcomes with the ideal data. MIFT achieved significantly superior performance to SIFT, as shown in Figure 10 and summarized in Table 3.
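The paper does not specify how the displacement and rotation are recovered from the matched keypoints; one common choice, shown here purely as an assumption, is a least-squares rigid fit (the Kabsch method) to the matched coordinates.

```python
import numpy as np

def estimate_rigid_motion(src, dst):
    """Least-squares 2D rotation + translation from matched point pairs
    (rows of src map to rows of dst)."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dc - R @ sc
    return R, t

# synthetic ego-motion: 5 degree rotation plus (2, -1) translation
theta = np.deg2rad(5.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 3.0]])
R_est, t_est = estimate_rigid_motion(pts, pts @ R_true.T + t_true)
```

With noisy real matches, a robust wrapper (e.g., RANSAC over such fits) would typically be used before compensating the image sequence.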
Finally, we mounted an SR4000 camera on a humanoid robot and evaluated the algorithm errors using real image sequences, as summarized in Table 4. Figure 11 shows a sample image sequence from the humanoid robot, and Figure 12 shows the corresponding image sequence after ego-motion compensation calculated using the proposed MIFT method. Thus, the proposed MIFT method always provided superior performance to SIFT for appropriate parameter choices. MIFT exhibited the best matching accuracy when the derivative order was approximately 3-7 and sigma approximately 0.3-0.5.


Conclusions
We proposed MIFT, an MDGHM-based invariant feature transform descriptor. SIFT-based descriptors remain sensitive to complicated deformations because of the properties of the DoG used to construct the scale space. We proposed an MDGHM-based pyramid, which is less sensitive to noise and provides more distinctive feature information than the DoG, and calculated MDGHM-based magnitude, orientation, and keypoint descriptors to improve the robustness of local features. We then performed experiments comparing the proposed MIFT method with various conventional SIFT approaches and parameter settings for six deformation types. The results confirmed that the proposed MIFT method provided significantly improved matching accuracy compared with conventional SIFT algorithms. We also evaluated the performance effects of MDGHM parameter selection and showed that the proposed MIFT method outperformed conventional SIFT algorithms for most parameter settings. Finally, we applied MIFT to ego-motion compensation for a humanoid robot.
However, adaptive parameter tuning of the derivative order, mask size, and MDGHM variance, as well as application of the proposed MIFT method to particular areas such as image stitching and robot environment recognition, remain topics for future study.