A Line Matching Method Based on Multiple Intensity Ordering with Uniformly Spaced Sampling

This paper presents a line matching method based on multiple intensity ordering with uniformly spaced sampling. Line segments are extracted from the image pyramid, with the aim of adapting scale changes and addressing fragmentation problem. The neighborhood of line segments was divided into sub-regions adaptively according to intensity order to overcome the difficulty brought by various line lengths. An intensity-based local feature descriptor was introduced by constructing multiple concentric ring-shaped structures. The dimension of the descriptor was reduced significantly by uniformly spaced sampling and dividing sample points into several point sets while improving the discriminability. The performance of the proposed method was tested on public datasets which cover various scenarios and compared with another two well-known line matching algorithms. The experimental results show that our method achieves superior performance dealing with various image deformations, especially scale changes and large illumination changes, and provides much more reliable correspondences.


Introduction
Feature matching has remained an essential engineering task in image processing and has been widely applied in computer vision, including image registration [1], image-based 3D modelling [2], object recognition [3] and pose estimation [4].
Typical feature matching algorithms usually consist of three steps: feature extraction, feature description and feature correspondence. In the first step, salient and stable features are extracted efficiently. Then the descriptors are constructed to encode the appearance of the neighborhood. In the matching stage, the similarity between the descriptors are measured to evaluate the correspondence. Among the various features used in computer vision, point features have been widely studied [5][6][7][8]. However, point features are likely to fail in low-texture scenes, such as industrial environments and indoor scenes. Because the establishment of the description for point features mainly relies on texture information, the number of the extracted point features may also fall drastically. On the contrary, line features are usually abundant in these man-made scenes, which also provide more geometric and structural information.
Compared with point feature matching algorithms, line matching methods have their own difficulties. The line segments extracted by line detection methods are far from being practical and physical. In some cases, a line segment in the real world may be divided into several fragments by the existing line segment detector [9,10]. Although some merging strategies are adopted, it is still quite hard to obtain unbroken line segments perfectly. Moreover, the location of line endpoints might not be precisely determined. These characteristics lead to multiple-to-multiple correspondences fix the dimension of the line descriptor. The descriptors are constructed by examining the gradient information of the pixels, and the pair-wise geometric attributes of neighboring line segments are considered to generate the final results. This paper extends our previous work, aiming to further improve the robustness of illumination changes and scale changes with the following aspects: 1) the local intensity order is examined to build the local feature description rather than gradient information; 2) uniformly spaced sampling is conducted in a multiple concentric ring shaped structure to reduce the dimension of the descriptor and improve the discriminative ability; 3) the approximate scale between the reference and the test image is estimated; 4) outliers are removed by estimating the fundamental matrix of the image pair from the intersections of the candidate line, matching pairs with the random sample consensus (RANSAC) strategy.

Methodology
An overview of our line matching method is presented in Figure 1, which consists of the following four main parts: multi-scale line segment detection, intensity-based local descriptor construction, descriptor matching in an image pyramid and fundamental matrix estimation by intersections. More details for each part are discussed in the following sections.
Sensors 2020, 20, x FOR PEER REVIEW 3 of 22 segments are considered to generate the final results. This paper extends our previous work, aiming to further improve the robustness of illumination changes and scale changes with the following aspects: 1) the local intensity order is examined to build the local feature description rather than gradient information; 2) uniformly spaced sampling is conducted in a multiple concentric ring shaped structure to reduce the dimension of the descriptor and improve the discriminative ability; 3) the approximate scale between the reference and the test image is estimated; 4) outliers are removed by estimating the fundamental matrix of the image pair from the intersections of the candidate line, matching pairs with the random sample consensus (RANSAC) strategy.

Methodology
An overview of our line matching method is presented in Figure 1, which consists of the following four main parts: multi-scale line segment detection, intensity-based local descriptor construction, descriptor matching in an image pyramid and fundamental matrix estimation by intersections. More details for each part are discussed in the following sections.

Line Segment Detection from Image Pyramid
In order to acquire the scale robustness in the matching process and overcome the fragmentation problem, the line segments are extracted by line detectors in a multi-scale way, which is based on Gaussian scale-space theory.

Line Segment Detection from Image Pyramid
In order to acquire the scale robustness in the matching process and overcome the fragmentation problem, the line segments are extracted by line detectors in a multi-scale way, which is based on Gaussian scale-space theory.
The original reference and test images are down-sampled and Gaussian blurred. For convenience, only one layer is built between two contiguous octaves. Then, a line segment detection algorithm is applied to each octave of the image pyramid to generate line segments. The corresponding lines located in each octave image are matched by inspecting several geometric characteristics. First, we check the parallelism of the line segments located in the same region in the octave images. If the angle between two lines is less than 10 degrees, they are considered as parallel. Then the distances between the endpoints are estimated. The scale factor is applied to the endpoints of the lines in each octave images, in order to evaluate the distances under original scale. Lines are considered corresponding only if the endpoints are close enough. After exploring all the line segments in the image pyramid, the corresponding line segments are assigned to an identical group, as shown in Figure 2. The original reference and test images are down-sampled and Gaussian blurred. For convenience, only one layer is built between two contiguous octaves. Then, a line segment detection algorithm is applied to each octave of the image pyramid to generate line segments. The corresponding lines located in each octave image are matched by inspecting several geometric characteristics. First, we check the parallelism of the line segments located in the same region in the octave images. If the angle between two lines is less than 10 degrees, they are considered as parallel. Then the distances between the endpoints are estimated. The scale factor is applied to the endpoints of the lines in each octave images, in order to evaluate the distances under original scale. Lines are considered corresponding only if the endpoints are close enough. After exploring all the line segments in the image pyramid, the corresponding line segments are assigned to an identical group, as shown in Figure 2. In the following line matching process, corresponding line segments in the same group are regarded as the identical line segments matching with other line groups. It is worth mentioning that not every line segment in the original image has its corresponding line in other octave images. A group with only one line segment from the original image could potentially exist.
For the line segment detection algorithm, a real-time line segment detector with edge drawing (EDLines) [9] is used in this paper. It is widely used in computer vision applications in recent years [26][27][28]. EDLines generates line segments very fast with no need for parameter tuning. The Helmholtz principle [29] is used in the detector to make validation with few false positives. As mentioned above, EDLines is applied to the octave images to detect line segments.

Support Region with Local Coordinate System
For a line segment detected from the octave image, the local appearance is evaluated by picking and partitioning its local neighborhood, which is the first consideration when creating a feature descriptor. All the subsequent computation is conducted in this local region. A line support region is established to undertake the computation and a local orientation system is built to obtain rotation invariance.
The support region is designed to be rectangular and centered on the line segment. As Figure 3 shows, the width of the region is set to be equal to the length L of the center line l, and the height is denoted by h. The local coordinate system is built by investigating the average gradient direction of pixels on the line segment, which is defined as ⊥ d , and the anticlockwise orthogonal direction of ⊥ d is defined as L d . Attributed to the definition of the local orientation system based on gradient direction, the rotation invariance is achieved when constructing a multiple concentric ring-shaped In the following line matching process, corresponding line segments in the same group are regarded as the identical line segments matching with other line groups. It is worth mentioning that not every line segment in the original image has its corresponding line in other octave images. A group with only one line segment from the original image could potentially exist.
For the line segment detection algorithm, a real-time line segment detector with edge drawing (EDLines) [9] is used in this paper. It is widely used in computer vision applications in recent years [26][27][28]. EDLines generates line segments very fast with no need for parameter tuning. The Helmholtz principle [29] is used in the detector to make validation with few false positives. As mentioned above, EDLines is applied to the octave images to detect line segments.

Support Region with Local Coordinate System
For a line segment detected from the octave image, the local appearance is evaluated by picking and partitioning its local neighborhood, which is the first consideration when creating a feature descriptor. All the subsequent computation is conducted in this local region. A line support region is established to undertake the computation and a local orientation system is built to obtain rotation invariance.
The support region is designed to be rectangular and centered on the line segment. As Figure 3 shows, the width of the region is set to be equal to the length L of the center line l, and the height is denoted by h. The local coordinate system is built by investigating the average gradient direction of Sensors 2020, 20, 1639 5 of 22 pixels on the line segment, which is defined as d ⊥ , and the anticlockwise orthogonal direction of d ⊥ is defined as d L . Attributed to the definition of the local orientation system based on gradient direction, the rotation invariance is achieved when constructing a multiple concentric ring-shaped structure. The details are explained in the following section.

Sub-Region Division Based on Intensity Order
In this section, the line support region is further divided into several sub-regions. The division process is quite essential, because line segments with different lengths should be unified to the description vector with a fixed dimension, in other words, the descriptor should be independent of the line length, which relies on a particular design. It can be observed that some of the previous line descriptors choose to carry out the division based on spatial information, such as banded structure [12,15], which is parallel to the line segment, in order to overcome this problem. After this, the statistic approaches are applied to each bands to compute the descriptors. However, such geometrical division strategies may not take full advantage of spatial or intensity information. Inspired by the local intensity order pattern [30], we divide the support region into several irregular sub-regions based on intensity order to improve the distinctiveness of the descriptor.
To start, we introduce some basic symbol definition.
where ( into m groups uniformly of the same sizes. In particular, the threshold of each group of sorted pixels is given by Thus a pixel ( ) p x in the support region can be assigned to a particular sub-region ( ) ; S x T based on the threshold. It is described as, As Figure 4b shows, each color represents a specific group of pixels. A support region is adaptively divided into several parts based on intensity order, rather than an artificially designed banded or circular structure. According to the construction principle, it is robust to monotonic illumination changes.

Sub-Region Division Based on Intensity Order
In this section, the line support region is further divided into several sub-regions. The division process is quite essential, because line segments with different lengths should be unified to the description vector with a fixed dimension, in other words, the descriptor should be independent of the line length, which relies on a particular design. It can be observed that some of the previous line descriptors choose to carry out the division based on spatial information, such as banded structure [12,15], which is parallel to the line segment, in order to overcome this problem. After this, the statistic approaches are applied to each bands to compute the descriptors. However, such geometrical division strategies may not take full advantage of spatial or intensity information. Inspired by the local intensity order pattern [30], we divide the support region into several irregular sub-regions based on intensity order to improve the distinctiveness of the descriptor.
To start, we introduce some basic symbol definition. P = p 1 , p 2 , . . . , p n is the set of n pixels in a support region, and their intensity is denoted by I = I(p 1 ), I(p 2 ), . . . , I(p n ) . Firstly, those n pixels are sorted in ascending order by their intensity value, and then a new array is formed, which is given bŷ The new intensity order set is described as, where f (1), f (2), . . . f (n) is one of the permutations of 1, 2, . . . , n. Then those sorted pixels are divided into m groups uniformly of the same sizes. In particular, the threshold of each group of sorted pixels is given by Thus a pixel p(x) in the support region can be assigned to a particular sub-region S(x; T) based on the threshold. It is described as, As Figure 4b shows, each color represents a specific group of pixels. A support region is adaptively divided into several parts based on intensity order, rather than an artificially designed banded or circular structure. According to the construction principle, it is robust to monotonic illumination changes.

Multiple Local Intensity Order Representation
In this section, multiple local intensity ordering is applied to describe the local neighborhood. Sample points distributed on a circle centered on a pixel are used to derive the description. In this work, a multiple concentric ring-shaped structure is established to improve the discrimination of the local feature description rather than a single circle. In addition, sample points are picked at regular intervals and assigned to different point sets to reduce the dimension of the descriptor to further improve the computational efficiency and improve the discriminative ability at the same time.
Firstly, a pixel x in a partitioned sub-region in the local coordinate system is considered. A number of neighborhood points are evenly distributed over a series of concentric circles of radius Figure 5 shows, there are N sample points on each circle with the same uniform distribution. The ith neighborhood point on the circle of radius j R is denoted as x on the circle of radius j R is located along the L d direction of the local coordinates from the center point x. Then the remaining neighborhood points are sampled and distributed on the circle in an anticlockwise direction, as shown in Figure 5, and the rotation invariance is obtained. The intensity value of those neighborhood points is obtained by bilinear interpolation, which is denoted by After this, the local feature description on each circle is calculated by sorting the intensity value of the N sample points in an ascending order to obtain a permutation, which is formed by their superscripts. Mathematically, an To be more specific, the new local intensity order set is described as where (1)

Multiple Local Intensity Order Representation
In this section, multiple local intensity ordering is applied to describe the local neighborhood. Sample points distributed on a circle centered on a pixel are used to derive the description. In this work, a multiple concentric ring-shaped structure is established to improve the discrimination of the local feature description rather than a single circle. In addition, sample points are picked at regular intervals and assigned to different point sets to reduce the dimension of the descriptor to further improve the computational efficiency and improve the discriminative ability at the same time.
Firstly, a pixel x in a partitioned sub-region in the local coordinate system is considered. A number of neighborhood points are evenly distributed over a series of concentric circles of radius R = R j : j ∈ [1, r] centered at x. As Figure 5 shows, there are N sample points on each circle with the same uniform distribution. The ith neighborhood point on the circle of radius R j is denoted as j on the circle of radius R j is located along the d L direction of the local coordinates from the center point x. Then the remaining neighborhood points are sampled and distributed on the circle in an anticlockwise direction, as shown in Figure 5, and the rotation invariance is obtained. The intensity value of those neighborhood points is obtained by bilinear interpolation, After this, the local feature description on each circle is calculated by sorting the intensity value of the N sample points in an ascending order to obtain a permutation, which is formed by their superscripts. Mathematically, an N-dimensional vector is mapped to a permutation of i : i ∈ [1, N] based on the order of I x i : i ∈ [1, N] . To be more specific, the new local intensity order set is described as where It is quite clear that the total number of the permutations is N! for a single circle. In order to facilitate the constructing descriptors, a one-to-one lookup table was established to match the local intensity ordering  I with the corresponding permutation. Accordingly, the local intensity ordering  I can be mapped directly to the corresponding integer index value η . The lookup table covers all the possible permutations of length N!. Then, a histogram H is created to compute the local feature descriptor, which is described as Each bin η B in the histogram corresponds to an integer index value of the lookup table.
Obviously, the size of the lookup table depends on the number of neighborhood points N, and the dimension of the local feature descriptor, which is computed based on the lookup table, also increases dramatically as N increases. Therefore, we picked N sample points at regular intervals in different point sets to avoid dimension disaster. As mentioned above, N neighborhood points are uniformly distributed on each circle. Those points are then uniformly space sampled and divided into v sets with u sample points in each point set, as shown in Figure 6a, which means = ⋅ N u v . As Figure 6b shows separately, multiple point sets are established on the circles of radius In this way, the dimension of the local feature descriptor depends on ⋅ ! v u , which is much lower than N!. Subsequent experiment sections show that the dimension of the local feature descriptor is reduced while the discrimination and the robustness is maintained. It is quite clear that the total number of the permutations is N! for a single circle. In order to facilitate the constructing descriptors, a one-to-one lookup table was established to match the local intensity ordering I with the corresponding permutation. Accordingly, the local intensity ordering I can be mapped directly to the corresponding integer index value η. The lookup table covers all the possible permutations of length N!.
Then, a histogram H is created to compute the local feature descriptor, which is described as Each bin B η in the histogram corresponds to an integer index value of the lookup table. Obviously, the size of the lookup table depends on the number of neighborhood points N, and the dimension of the local feature descriptor, which is computed based on the lookup table, also increases dramatically as N increases. Therefore, we picked N sample points at regular intervals in different point sets to avoid dimension disaster. As mentioned above, N neighborhood points are uniformly distributed on each circle. Those points are then uniformly space sampled and divided into v sets with u sample points in each point set, as shown in Figure 6a, which means N = u · v. As Figure 6b shows separately, multiple point sets are established on the circles of radius R = R j : j ∈ [1, r] . They are represented as In this way, the dimension of the local feature descriptor depends on v · u!, which is much lower than N!. Subsequent experiment sections show that the dimension of the local feature descriptor is reduced while the discrimination and the robustness is maintained.

The Construction of the Local Feature Descriptor
As stated above, the histogram technique is applied to build the descriptor. In order to further increase the robustness of the descriptor and emphasize the importance of the pixels close to the line segment, a Gaussian weight d g is applied rather than simple voting. It is defined as where d is the distance of the current pixel x to the line segment l and σ = h . In particular, for u sample points in point set v on the rth circle centered at pixel x in the sub-region S, d g is added to the bin η B based on their intensity value sorting results and the histogram is denoted by ( ) Then the whole local feature descriptor D is constructed simply by concatenating the histograms together and normalization. It is represented as

The Construction of the Local Feature Descriptor
As stated above, the histogram technique is applied to build the descriptor. In order to further increase the robustness of the descriptor and emphasize the importance of the pixels close to the line segment, a Gaussian weight g d is applied rather than simple voting. It is defined as where d is the distance of the current pixel x to the line segment l and σ = h. In particular, for u sample points in point set v on the rth circle centered at pixel x in the sub-region S, g d is added to the bin B η based on their intensity value sorting results and the histogram is denoted by H (r,ν) S . Then the whole local feature descriptor D is constructed simply by concatenating the histograms together and normalization. It is represented as Sensors 2020, 20, 1639 9 of 22

Descriptor Matching in Image Pyramid
In this section, the similarity between feature descriptors is evaluated and the approximate scale between the image pairs is estimated. To address the scaling and the fragmentation problem, we extract line segments in a multi-scale scheme in the previous section and the local feature descriptors are constructed. In Zhang's work [15], the rotation angle between the image pairs are estimated by drawing a direction histogram of the line segments. Inspired by this, we also compute a minimum distance histogram to estimate the scale between the image pairs.
The feature descriptors in the image pyramid with the closest scale return the most similar results. From the above consideration, the similarities between all the local feature descriptors in the image pyramid are calculated. A common way to do this is through the calculation of the Euclidean distances of the feature description vectors. To be specific, for a group of identical line segments in the reference image pyramid and a group of identical line segments in the test image pyramid, the descriptors are Then the minimum values of the distances are computed and accumulated to each bin corresponding to each pair of octave images to construct the distance histogram. Not all the minimum values are taken into account, due to the consideration of the influence of some random factors. We only choose to add up the top λ percent of the total number of the line segments from the octave image with the lowest number of line segments. Accordingly, the scale factor corresponding to the bins with the minimum value is considered to be the approximate scale between the original image pairs. An example is demonstrated in the following experimental section.
After estimating the scale factor, the nearest neighbor distance ratio (NNDR) matching criterion is applied to further select candidate matching pairs from the corresponding octave image pairs. It is defined as the distance from the nearest match divided by the distance from the second nearest match. If the distance ratio is within a certain threshold, it is considered to be a candidate match.

Fundamental Matrix Estimation by Intersections
Inevitably, false matches exist after evaluating local feature descriptor with the NNDR criterion. To be specific, candidate matching pairs obtained by a higher NNDR ratio may provide more correct matches but also more outliers. Therefore, we use the RANSAC strategy to remove outliers by estimating a fundamental matrix between the reference image and test image from candidate matching pairs. An eight-point algorithm is used to compute the fundamental matrix [31]. Hence, intersections of every two line segments from candidate matching pairs are calculated. Among them, the intersections Then q ij re f , q ij test is considered to be a pair of candidate intersections spontaneously. If both q ij re f and q ij test are located in the image range, they are sent to estimate the fundamental matrix. After carrying out RANSAC, inlier intersection pairs are obtained, and the corresponding line segments that form those intersections are regarded as the final result.

Experiments and Discussion
In this section, image matching experiments are conducted to evaluate the performance of the proposed line matching method. Two well-known line matching algorithms are also tested in the same scheme to compare the discriminative ability and the robustness of the methods. They are MSLD, the line segment appearance description method with SIFT-like strategy; and LPI, the algorithm based on line-point invariants. All the line segments in the experiments are extracted by the EDLines detector. The point matches required in LPI are obtained by SIFT from the OpenCV library.
The test image pairs used in this paper are shown in Figures 7 and 8, which cover different kinds of scenarios from a natural scene. All the image pairs are from the public datasets downloaded from the internet and used in literature [14,15,30,32]. Common image transformations are collected as follows: rotation (Figure 7a,b), illumination changes (Figure 7c,d and 8), view point changes (Figure 7e) and scale changes (Figure 7g,h). Occlusion (Figure 7f) and low-texture scenes (Figure 7i,j) are also included. The test image pairs used in this paper are shown in Figures 7 and 8, which cover different kinds of scenarios from a natural scene. All the image pairs are from the public datasets downloaded from the internet and used in literature [14,15,30,32]. Common image transformations are collected as follows: rotation (Figure 7a,b), illumination changes (Figure 7c,d and 8), view point changes (Figure 7e) and scale changes (Figure 7g,h). Occlusion (Figure 7f) and low-texture scenes (Figure 7i,j) are also included.

Parameters Evaluation
In order for the line matching method to exhibit better performance, several relevant parameters should be carefully selected.  Table 1. When discussing a parameter, all other parameters are set to the same fixed values. The matching precision and recall are used as the evaluation criterion. However, in order to consider matching precision and recall comprehensively, the F1-Measure is also taken into account. They are defined as follows = the number of correct matches precision total matches , = the number of correct matches recall the groundtruth matches , The average performance of the method under different parameter settings are shown in Figure 9 by the recall vs 1-precision curves and the curves of the total number of correct matches. The selected parameter values are also displayed in Table 1.
For the height of the line support regions, Figure 9a demonstrates that the result is not monotonically increasing with the parameter. The number of correct matches increases rapidly in the beginning, then the performance trends toward a steady and slow decline. Figure 9b shows that as the number of sub-regions increases, the number of correct matches rises to a peak and then falls. As shown in the recall vs 1-precision curves, m = 6 is slightly superior to m = 4, although it achieves the maximum number of correct matches when m = 4.

Parameters Evaluation
In order for the line matching method to exhibit better performance, several relevant parameters should be carefully selected. They are the number of sub-regions m, the number of concentric circles r, the radius of the circles R = R j : j ∈ [1, r] , the number of sample point sets v, the number of sample points in each point set u and the height h of the support region. To evaluate the performance of the proposed method under different parameter settings, various image matching experiments are conducted with all the tested parameters listed in Table 1. When discussing a parameter, all other parameters are set to the same fixed values. The matching precision and recall are used as the evaluation criterion. However, in order to consider matching precision and recall comprehensively, the F1-Measure is also taken into account. They are defined as follows precision = the number o f correct matches total matches , recall = the number o f correct matches the groundtruth matches , The average performance of the method under different parameter settings are shown in Figure 9 by the recall vs 1-precision curves and the curves of the total number of correct matches. The selected parameter values are also displayed in Table 1.
For the height of the line support regions, Figure 9a demonstrates that the result is not monotonically increasing with the parameter. The number of correct matches increases rapidly in the beginning, then the performance trends toward a steady and slow decline.  Figure 9b shows that as the number of sub-regions increases, the number of correct matches rises to a peak and then falls. As shown in the recall vs 1-precision curves, m = 6 is slightly superior to m = 4, although it achieves the maximum number of correct matches when m = 4.
As can be seen in Figure 9c, more correct matches are obtained when three or four points are sampled in each point set. It is observed that v = 3, u = 3 outperforms the other cases from the recall vs 1-precision curves. Figure 9d indicates that multiple concentric circles have a big advantage over the single ring-shaped structure. To strike a balance between the discriminative ability and the dimension of the local feature descriptor, two circles with radius R 1 = 12, R 2 = 15 are selected.
Sensors 2020, 20, x FOR PEER REVIEW 12 of 22 As can be seen in Figure 9c, more correct matches are obtained when three or four points are sampled in each point set. It is observed that v = 3, u = 3 outperforms the other cases from the recall vs 1-precision curves. Figure 9d indicates that multiple concentric circles have a big advantage over the single ring-shaped structure. To strike a balance between the discriminative ability and the dimension of the local feature descriptor, two circles with radius R1 = 12, R2 = 15 are selected.

The Descriptor Dimension
The dimension of the local feature descriptor and the effective measure to reduce the dimension is discussed in this section.
Suppose that the line support region is divided into m sub-regions, and r concentric circles are constructed with v point set on each circle, each containing u sample points. The dimension of the proposed local feature descriptor D can then be calculated as of the local feature descriptor ′ D with only one point set on each concentric circle with the same number of sample points is ′ = ⋅ ⋅ ⋅ dim( ) ( )! D m r v u by contrast. As can be seen in Figure 10, the dimension of the local feature descriptor is reduced dramatically by dividing the same number of sample points into multiple point sets rather than using those sample points directly. Nevertheless, it is also important to note that the performance might decrease when the sample points are divided, to a certain extent. As shown in Figure 9d, six sample points are divided into two and three point sets, respectively. Obviously, when each point set contains only two sample points, it becomes a binary pattern, which might reduce the discriminative ability of the method.

The Descriptor Dimension
The dimension of the local feature descriptor and the effective measure to reduce the dimension is discussed in this section.
Suppose that the line support region is divided into m sub-regions, and r concentric circles are constructed with v point set on each circle, each containing u sample points. The dimension of the proposed local feature descriptor D can then be calculated as dim(D) = m · r · (v · u!). The dimension of the local feature descriptor D with only one point set on each concentric circle with the same number of sample points is dim(D ) = m · r · (v · u)! by contrast. As can be seen in Figure 10, the dimension of the local feature descriptor is reduced dramatically by dividing the same number of sample points into multiple point sets rather than using those sample points directly. Nevertheless, it is also important to note that the performance might decrease when the sample points are divided, to a certain extent. As shown in Figure 9d, six sample points are divided into two and three point sets, respectively. Obviously, when each point set contains only two sample points, it becomes a binary pattern, which might reduce the discriminative ability of the method. Sensors 2020, 20, x FOR PEER REVIEW 14 of 22

The Scale Estimation
As described earlier, the scale factor between the original image pairs is estimated by constructing a minimum distance histogram from the image pyramid. Figures 11 and 12 gives an example of a minimum distance histogram of the image pair shown in Figure 7g. The ground truth scale between the image pair is 3.3 and the estimated approximate scale is ≈ 2 2 2.83 . Figure 11a illustrates the image pyramid of the reference image and the scale factor between each adjacent octave image is 2 . In this case, the octave image in scale four has the least number of line segments, with 49 line segments. So the sum of the top λ percent of 49 minimum Euclidean distances are computed and drawn in Figure 12. As can be observed, the summation in scale 2 2 is the smallest over all the presented values of λ . Hence, it is selected as the approximate scale between the image pairs. As for parameter λ , we select λ= 80 in the following experiments.

The Scale Estimation
As described earlier, the scale factor between the original image pairs is estimated by constructing a minimum distance histogram from the image pyramid. Figures 11 and 12 gives an example of a minimum distance histogram of the image pair shown in Figure 7g. The ground truth scale between the image pair is 3.3 and the estimated approximate scale is 2

The Scale Estimation
As described earlier, the scale factor between the original image pairs is estimated by constructing a minimum distance histogram from the image pyramid. Figures 11 and 12 gives an example of a minimum distance histogram of the image pair shown in Figure 7g. The ground truth scale between the image pair is 3.3 and the estimated approximate scale is ≈ 2 2 2.83 . Figure 11a illustrates the image pyramid of the reference image and the scale factor between each adjacent octave image is 2 . In this case, the octave image in scale four has the least number of line segments, with 49 line segments. So the sum of the top λ percent of 49 minimum Euclidean distances are computed and drawn in Figure 12. As can be observed, the summation in scale 2 2 is the smallest over all the presented values of λ . Hence, it is selected as the approximate scale between the image pairs. As for parameter λ , we select λ= 80 in the following experiments.

Experimental Results and Discussion
In this section, we present the details of the line matching experimental results to evaluate the proposed method compared with MSLD and LPI. The same criterion described above is followed and the precision, recall and F1-measure values are used to evaluate the methods.  2. In this case, the octave image in scale four has the least number of line segments, with 49 line segments. So the sum of the top λ percent of 49 minimum Euclidean distances are computed and drawn in Figure 12. As can be observed, the summation in scale 2 √ 2 is the smallest over all the presented values of λ. Hence, it is selected as the approximate scale between the image pairs. As for parameter λ, we select λ = 80 in the following experiments.

Experimental Results and Discussion
In this section, we present the details of the line matching experimental results to evaluate the proposed method compared with MSLD and LPI. The same criterion described above is followed and the precision, recall and F1-measure values are used to evaluate the methods.

Natural Scene Image Pairs
The experiments are conducted on 10 pairs of natural scene images shown in Figure 7. The bar graphs in Figure 13 shows the performance of the methods and Figure 14 illustrates the matching results of our method.
As can be seen from Figure 13, the proposed method performs better than other two algorithms for image pairs with rotation changes (Figure 14a,b). This is because all the calculations are based on the local coordinate system built by examining the average gradient on either side of the line segment, so that the local feature descriptors are less affected by the rotation changes. In addition, the sampling strategy when constructing descriptors achieves more robustness against rotation changes than dominant orientation estimation system in point features, compared with LPI based on SIFT descriptor.

Experimental Results and Discussion
In this section, we present the details of the line matching experimental results to evaluate the proposed method compared with MSLD and LPI. The same criterion described above is followed and the precision, recall and F1-measure values are used to evaluate the methods.

Natural Scene Image Pairs
The experiments are conducted on 10 pairs of natural scene images shown in Figure 7. The bar graphs in Figure 13 shows the performance of the methods and Figure 14 illustrates the matching results of our method.
As can be seen from Figure 13, the proposed method performs better than other two algorithms for image pairs with rotation changes (Figure 14a,b). This is because all the calculations are based on the local coordinate system built by examining the average gradient on either side of the line segment, so that the local feature descriptors are less affected by the rotation changes. In addition, the sampling strategy when constructing descriptors achieves more robustness against rotation changes than dominant orientation estimation system in point features, compared with LPI based on SIFT descriptor.  For image pairs with drastic illumination changes in Figure 14c,d, the proposed method demonstrates the best performance, mainly due to the design of intensity value sorting. The results indicate that the intensity order-based method is more robust against large illumination changes than dominant orientation estimation technique.
Image pairs with viewpoint change and occlusion are displayed in Figure 14e,f, and the matching results show that the proposed method obtains high scores in both accuracy and recall.
In the case of scale changes, illustrated in Figure 14g,h, our method outperforms the other two algorithms significantly. It is observed that the multi-scale line segment detection and matching strategy help to overcome the scale change problem effectively. The SIFT descriptor used in LPI also employs the scale-space theory to handle scale changes. In contrast, as shown in Figure 13, MSLD fails in this scenario because of its lack of special design for scale variation. Figure 14i,j shows low-texture scene, which are quite common in indoor scenes. Our method ranks the top of three methods in terms of both precision and recall and obtains fairly high scores. In contrast, LPI with SIFT fails in this situation, because the local appearance described by SIFT For image pairs with drastic illumination changes in Figure 14c,d, the proposed method demonstrates the best performance, mainly due to the design of intensity value sorting. The results indicate that the intensity order-based method is more robust against large illumination changes than dominant orientation estimation technique.
Image pairs with viewpoint change and occlusion are displayed in Figure 14e,f, and the matching results show that the proposed method obtains high scores in both accuracy and recall.
In the case of scale changes, illustrated in Figure 14g,h, our method outperforms the other two algorithms significantly. It is observed that the multi-scale line segment detection and matching strategy help to overcome the scale change problem effectively. The SIFT descriptor used in LPI also employs the scale-space theory to handle scale changes. In contrast, as shown in Figure 13, MSLD fails in this scenario because of its lack of special design for scale variation.
descriptor is largely depended on texture information. It is quite difficult to manage such a poorly textured scene with a lack of gradient information.

The Illumination Dataset
The illumination dataset contains three sets of image sequences with increasing illumination changes, as shown in Figure 8. The performance of the methods is shown in Table 2.
As can be seen from Table 2, the proposed method produces a much greater number of correct matches with the highest precision in all the image sequences. The number of correct matches obtained by our method is twice the amount of MSLD. The proposed method still provides 121 correct matches with high precision under the large illumination changes in Figure 8a(1-5), while the performance of both MSLD and LPI decreases rapidly.
The running time is also evaluated in this experiment. The experiments are conducted on a 1.8 GHz Intel Core i7-8565U processor with 16 GB of RAM. The average running time for each line segment is given in Table 2. The processing time of our method depends on the number of the extracted line segments, the dimension of the local feature descriptor and the area of the line support region.  14i,j shows low-texture scene, which are quite common in indoor scenes. Our method ranks the top of three methods in terms of both precision and recall and obtains fairly high scores. In contrast, LPI with SIFT fails in this situation, because the local appearance described by SIFT descriptor is largely depended on texture information. It is quite difficult to manage such a poorly textured scene with a lack of gradient information.

The Illumination Dataset
The illumination dataset contains three sets of image sequences with increasing illumination changes, as shown in Figure 8. The performance of the methods is shown in Table 2.
As can be seen from Table 2, the proposed method produces a much greater number of correct matches with the highest precision in all the image sequences. The number of correct matches obtained by our method is twice the amount of MSLD. The proposed method still provides 121 correct matches with high precision under the large illumination changes in Figure 8a(1-5), while the performance of both MSLD and LPI decreases rapidly.
The running time is also evaluated in this experiment. The experiments are conducted on a 1.8 GHz Intel Core i7-8565U processor with 16 GB of RAM. The average running time for each line segment is given in Table 2. The processing time of our method depends on the number of the extracted line segments, the dimension of the local feature descriptor and the area of the line support region. Table 2. Line matching results of the proposed method, MSLD and LPI on illumination datasets.

Img Pairs
The