Hierarchical Point Matching Method Based on Triangulation Constraint and Propagation

: Reliable image matching is the basis of image-based three-dimensional (3D) reconstruction. This study presents a quasi-dense matching method based on triangulation constraint and propagation as applied to different types of close-range image matching, such as illumination change, large viewpoint, and scale change. The method begins from a set of sparse matched points that are used to construct an initial Delaunay triangulation. Edge-to-edge matching propagation is then conducted for the point matching. Two types of matching primitives from the edges of triangles with areas larger than a given threshold in the reference image, that is, the midpoints of edges and the intersections between the edges and extracted line segments, are used for the matching. A hierarchical matching strategy is adopted for the above-mentioned primitive matching. The points that cannot be matched in the first stage, specifically those that failed in a gradient orientation descriptor similarity constraint, are further matched in the second stage. The second stage combines the descriptor and the Mahalanobis distance constraints, and the optimal matching subpixel is determined according to an overall similarity score defined for the multiple constraints with different weights. Subsequently, the triangulation is updated using the newly matched points, and the aforementioned matching is repeated iteratively until no new matching points are generated. Twelve sets of close-range images are considered for the experiment. Results reveal that the proposed method has high robustness for different images and can obtain reliable matching results. different weights to determine the best matching result. This stage helps generate more corresponding points and improves matching results.


Introduction
Image matching is a process of identifying corresponding features on different images and is an essential step in image processing, such as three-dimensional (3D) reconstruction, image fusion, and target tracking [1][2][3].Numerous research papers were published in the field of image matching [4][5][6][7][8][9][10].Existing image matching approaches can be classified into three categories according to the number of matching results: (i) feature-based sparse matching, which is focused on the construction of a descriptor and is widely used in initial matching to calculate projection parameters; (ii) quasi-dense stereo matching algorithm based on region growing, which is focused on matching strategies, such as matching constraints and matching propagation; (iii) dense pixel matching, which is mostly used in binocular stereo vision.
This study mainly aims to obtain matching results as dense as possible for different types of images on the basis of initial sparse matching and matching propagation, and this study belongs to the quasi-dense matching domain.The use of matching constraints is key in this method, and the widely used epipolar constraint can be used to reduce the matching search space from two dimensions (2D) to one dimension (1D).However, a large search space may exist, and this space is generally combined with other various geometric constraints, such as parallax constraints [11], Voronoi diagram [12][13][14], and Delaunay triangulation [15,16].The search range of the epipolar line can be limited to the internal corresponding triangle when combined with the triangulation constraint [17].Li et al. [18] proposed a Voronoi diagram-based propagation stereo matching algorithm that firstly subdivides the whole image into N cells by using the Voronoi diagram of the seed feature points; then, the corresponding relations are propagated from the seed in each cell until all pixels within this cell are processed.Similarly, Delaunay triangulation is widely used in matching [19][20][21].Delaunay triangulation is a form of a vector-based digital surface model constructed by a set of unorganized points, whose main contributions in matching are constraint matching and matching propagation through region segmentation of triangle meshes [22].Zhao et al. [23] used Delaunay triangulation to remove outliers in iterative matching by comparing Delaunay graph structures and then used the dual graph of Delaunay to check the above-mentioned results, recover inliers, and remove outliers.Ahuja et al. [24] proposed that a point within a triangle in a single image be matched to a point within the corresponding triangle in another image, and this approach is called local position constraint.Guo et al. [25] utilized the relationship between the matching point and the vertices of a triangle for constraint matching candidates.Zhu et al. [26,27] added a parallax gradient to the constraint matching candidates on the basis of the corresponding triangles established by the existing corresponding points on the epipolar stereo pair.The method requires a vertex of a triangle with the highest matching reliability to be selected as the reference point, and the parallax gradient is assumed to be related to the ratio of the distance between the reference and potential matching points to their difference in the parallax.The aforementioned methods can produce reliable matching results, but they only consider the point matching.Wu et al. [28] proposed a matching method that integrates point and edge matching based on self-adaptive edge-constrained triangulations to incorporate edge features in the matching and improve the image matching performance of poor textural images.The method takes advantage of edge information and updates triangulations dynamically.It is used for space-borne, airborne, and terrestrial image matching and can produce reliable matching results for poor textural images.However, it only uses image feature information.Jia et al. [29] proved the mathematical property of equal proportion of triangulation affine transformation as a means of obtaining much denser matching results and proposed a dense matching method for wide baseline images based on the theory.In their method, the equal proportion points on the corresponding edges of triangles in two views that satisfy the similarity constraints are regarded as corresponding points.The method can achieve dense matching of images with continuous scenes but lacks robustness for different types of images, especially large viewpoint change images with discontinuous surfaces.
This study proposes a new quasi-dense matching method based on triangulation constraint and propagation.The proposed method can achieve dense matching and consider the edge features of the image.Compared with existing approaches, our method has several advantages.Firstly, we introduce the intersections between edge line segments and triangulation as matching primitives.In this manner, more matches can be generated, and the image edge feature is considered at the same time.Secondly, we propose a new descriptor for point dense matching that adopts overlapped subregions to increase the correlation between description vectors and contributes to the generation of dense matching results.Thirdly, we adopt the hierarchical matching strategy by integrating additional matching into the second stage to reach subpixel accuracy.An overall similarity score is defined to integrate the descriptor similarity measure, the distance of the point from a triangle's edge, Mahalanobis distance, and the distance of the point from the epipolar line as the four similarity measures with different weights to determine the best matching result.This stage helps generate more corresponding points and improves matching results.
The remainder of the paper is organized as follows: Section 2 introduces our hierarchical point matching method; Section 3 presents the experiments conducted on twelve image pairs and demonstrates the performance of our method by integrating additional matching into the hierarchical matching strategy; Section 4 offers the concluding remarks and suggests future work.

Proposed Method
The flowchart of the proposed dense matching algorithm is illustrated in Figure 1.The input data of our method consist of two images.An image is selected as the reference image while the other image acts as the search image.The scale-invariant feature transform (SIFT) algorithm [30] is used to obtain the initial corresponding points.Incorrect matches are removed by the random sample consensus algorithm (RANSAC) [31], and correct point-to-point correspondences are obtained.Next, the corresponding Delaunay triangulation is constructed in both images.The line segments extracted from the reference image are used to generate intersection points with triangles in the reference image, which are used as matching primitives for the succeeding matching.For each triangle with an area greater than the given threshold T s , the midpoints of the three sides and the intersection points mentioned above are matched in turn.Whether new corresponding points are generated after all the triangles are processed should be assessed.If new corresponding points are generated, then Delaunay triangulation is updated with these new points, and the aforementioned steps are repeated iteratively until no new corresponding points are generated.Otherwise, the matching is stopped.Thereafter, the final matching results are outputted.
ISPRS Int.J. Geo-Inf.2020, 9, x FOR PEER REVIEW 3 of 19 The remainder of the paper is organized as follows: Section 2 introduces our hierarchical point matching method; Section 3 presents the experiments conducted on twelve image pairs and demonstrates the performance of our method by integrating additional matching into the hierarchical matching strategy; Section 4 offers the concluding remarks and suggests future work.

Proposed Method
The flowchart of the proposed dense matching algorithm is illustrated in Figure 1.The input data of our method consist of two images.An image is selected as the reference image while the other image acts as the search image.The scale-invariant feature transform (SIFT) algorithm [30] is used to obtain the initial corresponding points.Incorrect matches are removed by the random sample consensus algorithm (RANSAC) [31], and correct point-to-point correspondences are obtained.Next, the corresponding Delaunay triangulation is constructed in both images.The line segments extracted from the reference image are used to generate intersection points with triangles in the reference image, which are used as matching primitives for the succeeding matching.For each triangle with an area greater than the given threshold  , the midpoints of the three sides and the intersection points mentioned above are matched in turn.Whether new corresponding points are generated after all the triangles are processed should be assessed.If new corresponding points are generated, then Delaunay triangulation is updated with these new points, and the aforementioned steps are repeated iteratively until no new corresponding points are generated.Otherwise, the matching is stopped.Thereafter, the final matching results are outputted.Two types of matching primitives are applied in our study.One includes the midpoints of three edges of the triangle, and the other is the intersection point of the extracted line segments and edges of the triangles.In the following sections, both primitives are referred to as the midpoint and the intersection point, respectively.The proposed method is an iterative matching process that includes Two types of matching primitives are applied in our study.One includes the midpoints of three edges of the triangle, and the other is the intersection point of the extracted line segments and edges of the triangles.In the following sections, both primitives are referred to as the midpoint and the intersection point, respectively.The proposed method is an iterative matching process that includes two stages in each iteration, with each stage corresponding to a matching strategy.In the first matching stage, we match points only by using the proposed SIFT-like local gradient descriptors and Euclidean distances.In the case of unmatched points, we further match them in the second stage by utilizing multiple constraints, which is described in Section 2.2.

Point Matching Based on Descriptors with Overlapping Subregions
For any triangle with an area greater than T s in the reference image, the midpoints of the three sides and the intersections between the triangle and line segments are calculated.We suppose that the triangles ∆abc and ∆a b c are a pair of corresponding triangles (Figure 2 two stages in each iteration, with each stage corresponding to a matching strategy.In the first matching stage, we match points only by using the proposed SIFT-like local gradient descriptors and Euclidean distances.In the case of unmatched points, we further match them in the second stage by utilizing multiple constraints, which is described in Section 2.2.

Point Matching Based on Descriptors with Overlapping Subregions
For any triangle with an area greater than  in the reference image, the midpoints of the three sides and the intersections between the triangle and line segments are calculated.We suppose that the triangles abc  and a b c ′ ′ ′  are a pair of corresponding triangles (Figure 2), and the corresponding vertices (, ′), (, ′), and (, ′) are three pairs of corresponding points.For any midpoint , its candidate point ′ is the midpoint of the corresponding side of the corresponding triangle in the search image.For any intersection point , its candidate point ′ is the intersection point between the epipolar line of point  in the search image and the corresponding side of the corresponding triangle in the search image.For each correspondence hypothesis (, ′) or (, ′), we propose a new descriptor for similarity measure calculation and determine whether the hypothesis offers the correct match.The feature descriptor in the feature matching process is an important algorithm because it extracts pertinent information and encodes it into feature vectors, which are then used to differentiate a feature from another or find the same features.With the popularization of the SIFT descriptor [32][33][34], many descriptors based on histograms of gradient orientations, such as ASIFT [35] and Daisy [36], were proposed.These descriptors have high robustness for different types of images during matching, but they also have high computational complexity.As such, we propose a new SIFT-like descriptor for the local neighborhood of a point that not only offers the advantages of the SIFT descriptor but also reduces the computation cost.The descriptor construction is processed as described below.
Firstly, Gaussian filtering is used to reduce the image noise.Then, the neighborhood window with a center at the target point is built for the descriptor construction.The size of the window is (2 + 1) × (2 + 1), ( = 3,4 … , ).In this study, we set  = 4.We assume that the coordinates of the target point are ( ) 0 0 , y x and the coordinates of the four corners of the neighborhood window are ( ) ) ) , and ( ) . The neighborhood window is then divided into four subregions, in which the square areas with the target point and the four corners of the neighborhood window are used to set the diagonals.The size of each subregion is ( + 1) × ( + 1).The schematic is shown in Figure 3.The four subregions with overlapping regions, which all include the row and column where the target point is located, are obtained.This strategy  (d, d ) is an intersection point correspondence hypothesis.

Descriptor Construction
The feature descriptor in the feature matching process is an important algorithm because it extracts pertinent information and encodes it into feature vectors, which are then used to differentiate a feature from another or find the same features.With the popularization of the SIFT descriptor [32][33][34], many descriptors based on histograms of gradient orientations, such as ASIFT [35] and Daisy [36], were proposed.These descriptors have high robustness for different types of images during matching, but they also have high computational complexity.As such, we propose a new SIFT-like descriptor for the local neighborhood of a point that not only offers the advantages of the SIFT descriptor but also reduces the computation cost.The descriptor construction is processed as described below.
Firstly, Gaussian filtering is used to reduce the image noise.Then, the neighborhood window with a center at the target point is built for the descriptor construction.The size of the window is (2w + 1) × (2w + 1), (w = 3, 4 . . ., n).In this study, we set w = 4.We assume that the coordinates of the target point are (x 0 , y 0 ) and the coordinates of the four corners of the neighborhood window are (x 0 − w, y 0 − w), (x 0 + w, y 0 − w), (x 0 − w, y 0 + w), and (x 0 + w, y 0 + w).The neighborhood window is then divided into four subregions, in which the square areas with the target point and the four corners of the neighborhood window are used to set the diagonals.The size of each subregion is (w + 1) × (w + 1).The schematic is shown in Figure 3.The four subregions with overlapping regions, which all include the row and column where the target point is located, are obtained.This strategy increases the correlation between regions and avoids the influence of region inconsistency caused by large viewpoint changes.Secondly, each subregion is again divided into four square regions with the same sizes, and 16 small patches are obtained around the target point.A gradient magnitude histogram with four orientations is then calculated in each small patch.As the impact of the nearby points on the descriptor increases, the distance  from each point in the patch to the target point is calculated, and each gradient magnitude is weighted by the corresponding distance  with the function  =  ×  in the statistical process.Then, a statistical description vector about the gradient magnitudes in the four directions for each small patch is obtained, the description vectors of four small patches in the subregion are stacked, and a 4 × 4 description matrix for this subregion is obtained.Thereafter, inspired by the mean-standard deviation line descriptor (MSLD) descriptor [37], the mean and standard deviation vectors of matrix are calculated and normalized, respectively, to unit norm to make the descriptor invariant to linear changes of illumination.Finally, the descriptor vector with a length of 32 is obtained by concatenating the normalized mean vector of four subregions with the normalized standard deviation vector of four subregions into a single vector.
Our design of the descriptor adopts the subregions overlapping one another in the feature description, which increases the correlation between description vectors and contributes to the generation of dense matching results.

Dissimilarity Measure
For two points in any set of matching hypotheses in the first matching stage, the two descriptors  and  are obtained for the reference and search image points, and Euclidean distance is used to calculate the dissimilarities between the descriptors.If the Euclidean distance  is smaller than the given threshold  , then this set of matching hypotheses is considered correct, that is, the two points in the matching hypotheses are a pair of corresponding points.

Point Matching with Multiple Constraints
Points that could not be matched in the first stage are further matched using a strategy with multiple constraints as described below.
Firstly, for reference point  with an initial candidate point denoted as ′ , we define the searching region as a square with a center at point ′ and a length of (2 + 1), where  is a userdefined parameter.In this study, we set  = 1.All pixels in the searching window are regarded as matching candidates.For each candidate, we build its descriptor and calculate the Euclidean distance  between two descriptors of reference and candidate point.If at least one candidate point Secondly, each subregion is again divided into four square regions with the same sizes, and 16 small patches are obtained around the target point.A gradient magnitude histogram with four orientations is then calculated in each small patch.As the impact of the nearby points on the descriptor increases, the distance d i from each point in the patch to the target point is calculated, and each gradient magnitude is weighted by the corresponding distance d i with the function gradient true = gradient × e −d i in the statistical process.Then, a statistical description vector about the gradient magnitudes in the four directions for each small patch is obtained, the description vectors of four small patches in the subregion are stacked, and a 4 × 4 description matrix for this subregion is obtained.Thereafter, inspired by the mean-standard deviation line descriptor (MSLD) descriptor [37], the mean and standard deviation vectors of matrix are calculated and normalized, respectively, to unit norm to make the descriptor invariant to linear changes of illumination.Finally, the descriptor vector with a length of 32 is obtained by concatenating the normalized mean vector of four subregions with the normalized standard deviation vector of four subregions into a single vector.
Our design of the descriptor adopts the subregions overlapping one another in the feature description, which increases the correlation between description vectors and contributes to the generation of dense matching results.

Dissimilarity Measure
For two points in any set of matching hypotheses in the first matching stage, the two descriptors descriptor 1 and descriptor 2 are obtained for the reference and search image points, and Euclidean distance is used to calculate the dissimilarities between the descriptors.If the Euclidean distance distance 1−2 is smaller than the given threshold T 1 , then this set of matching hypotheses is considered correct, that is, the two points in the matching hypotheses are a pair of corresponding points.

Point Matching with Multiple Constraints
Points that could not be matched in the first stage are further matched using a strategy with multiple constraints as described below.
Firstly, for reference point a with an initial candidate point denoted as a , we define the searching region as a square with a center at point a and a length of (2m + 1), where m is a user-defined parameter.In this study, we set m = 1.All pixels in the searching window are regarded as matching candidates.For each candidate, we build its descriptor and calculate the Euclidean distance distance 1−2 between two descriptors of reference and candidate point.If at least one candidate point satisfies the condition in which the distance 1−2 value is smaller than the threshold T 2 , then the optimal matching with subpixel accuracy should be determined further by utilizing multiple constraints.
Secondly, we divide each candidate point satisfied by the Euclidean distance constraint mentioned above into grids and obtain a set of candidate subpixels.For each candidate subpixel, three dissimilarity measures are defined as described below.
The first dissimilarity measure distance p−l is the distance of the candidate subpixel from the epipolar line corresponding to the reference point (Figure 4).In this study, the epipolar line is calculated using the fundamental matrix obtained by corresponding points [38].Its formula is l e = Fx ; l e = Fx, where x and x are the corresponding points on both images, F is the fundamental matrix, and l e and l e are the corresponding epipolar lines on both images.In theory, the corresponding points are located on the corresponding epipolar lines, but small deviations may exist due to the inaccuracy of the fundamental matrix.
satisfies the condition in which the  value is smaller than the threshold  , then the optimal matching with subpixel accuracy should be determined further by utilizing multiple constraints.
Secondly, we divide each candidate point satisfied by the Euclidean distance constraint mentioned above into grids and obtain a set of candidate subpixels.For each candidate subpixel, three dissimilarity measures are defined as described below.
The first dissimilarity measure  is the distance of the candidate subpixel from the epipolar line corresponding to the reference point (Figure 4).In this study, the epipolar line is calculated using the fundamental matrix obtained by corresponding points [38].Its formula is  =  ;  = , where  and  are the corresponding points on both images,  is the fundamental matrix, and  and  are the corresponding epipolar lines on both images.In theory, the corresponding points are located on the corresponding epipolar lines, but small deviations may exist due to the inaccuracy of the fundamental matrix.The second dissimilarity measure  is the distance of the candidate subpixel from the corresponding side in the corresponding triangle, as shown in Figure 4.
The third dissimilarity measure is related to the Mahalanobis distance, which was introduced by P.C. Mahalanobis in 1936.This distance is a measure of the distance between a point and a distribution.The formula is as follows: ℎ  = ( − ) ∑ ( − ) where  is a sample point,  is the mean value of a set of sample points, and ∑ is the covariance matrix.In this study, the Mahalanobis distance can also be defined as a dissimilarity measure between two random vectors x and y of the same distribution with the covariance matrix ∑, ℎ  = ( − ) ∑ ( − ) where  and  are two matched points in a single image.We suppose the corresponding points on both images are denoted by the two sets  and  .For any new point-to-point correspondence ( , ) a a′′ on both images, the Mahalanobis distance between a and the corresponding points in the set  is similar to that between a′′ and the corresponding points in the set  .The dissimilarity measure  is defined on the basis of this principle.To reduce the computational complexity, we only calculate the Mahalanobis distance between the point and the three vertices of the triangle where the point is located.We assume that  and  are the Mahalanobis distance vector of 3 × 1 corresponding to the reference and candidate points, respectively.We define the third dissimilarity measure as  = (| −  |), where (  ) denotes the ratio of the sum of all elements in the vector and the total number of elements in the vector.
On the basis of the third dissimilarity measure mentioned above, we define two thresholds  , where  ∈ {3,4}, and eliminate the candidate subpixels with any element in | −  | greater than  or  >  .Thus, only the subpixels fulfilling the two constraints above are retained.Then, an overall matching similarity score  related to four dissimilarity measures is determined to achieve the optimal matching subpixel for the reference point.The second dissimilarity measure distance p−s is the distance of the candidate subpixel from the corresponding side in the corresponding triangle, as shown in Figure 4.
The third dissimilarity measure is related to the Mahalanobis distance, which was introduced by P.C. Mahalanobis in 1936.This distance is a measure of the distance between a point and a distribution.The formula is as follows: where x is a sample point, µ is the mean value of a set of sample points, and is the covariance matrix.
In this study, the Mahalanobis distance can also be defined as a dissimilarity measure between two random vectors x and y of the same distribution with the covariance matrix , where x and y are two matched points in a single image.We suppose the corresponding points on both images are denoted by the two sets s and s .For any new point-to-point correspondence (a, a ) on both images, the Mahalanobis distance between a and the corresponding points in the set s is similar to that between a and the corresponding points in the set s .The dissimilarity measure distance m is defined on the basis of this principle.To reduce the computational complexity, we only calculate the Mahalanobis distance between the point and the three vertices of the triangle where the point is located.We assume that d m1 and d m2 are the Mahalanobis distance vector of 3 × 1 corresponding to the reference and candidate points, respectively.We define the third dissimilarity measure as distance m = mean( d m1 − d m2 ) , where mean() denotes the ratio of the sum of all elements in the vector and the total number of elements in the vector.
On the basis of the third dissimilarity measure mentioned above, we define two thresholds T k , where k ∈ {3, 4}, and eliminate the candidate subpixels with any element in |d m1 − d m2 | greater than T 3 or distance m > T 4 .Thus, only the subpixels fulfilling the two constraints above are retained.Then, an overall matching similarity score Θ s related to four dissimilarity measures is determined to achieve the optimal matching subpixel for the reference point.
where w k represents the different weights corresponding to four dissimilarity measures, in which k ∈ {1, 2, 3, 4}.In this study, we set the weights to 0.45, 0.25, 0.15, and 0.15, respectively.As for all the remaining potential matching subpixels, we calculate Θ s and find the highest overall similarity score if the highest value is larger than the given threshold T 5 , which corresponds to the subpixel selected as the final matching point.

Matching Propagation
In the existing graph-based constraint matching, the images are divided into different regions by Delaunay triangulation or Voronoi, and many corresponding regions are usually obtained from two images.Matching propagation is performed within these regions (i.e., graph constraint matching propagation).On the basis of the principle in which the corresponding points are located within the corresponding regions, the matching search is conducted in the corresponding regions in different views.Most methods use the feature points extracted in the region as the matching primitives [16,26,28].This propagation method belongs to the patch-to-patch propagation strategy or the triangle-to-triangle propagation strategy.Different from the region propagation strategies, a new edge-to-edge matching propagation strategy is adopted in this study.This new strategy uses the edge of the triangle as the carrier of matching propagation and the two types of points (midpoints and intersections) on the edge as the matching primitive.As the main principle of this propagation method, triangulation is updated by inserting newly matched points, new edges are generated as a means of obtaining new matching points, and matching is performed on the corresponding edges of the corresponding triangles in different views.Compared with the region-to-region propagation strategy, our proposed strategy offers the advantage of having an edge that is more stable than the region in theory.The principle of matching propagation is described in Algorithm 1. i f dis(des_a j,k , des_a j,k ) ≤ Th 2 , grid pixel Calculate score for each subpixel in pixel( j, k), end end i f max score ≥ T 5 , Pt = [Pt; [a, subpixel max score ]], end end During matching propagation, we only deal with newly generating edges of triangles in the updating process of triangulation to avoid repeated matching in each iteration.The blue triangles shown in Figure 5a represent the initial triangulation, while the black and green points are the newly matched points (midpoints and intersections, respectively) generated by the first iteration.We insert the newly matched points into the initial triangulation and obtain the updated triangulation.The red edges of triangles in Figure 5b are the newly generated or split edges after the first matching, which are used in the subsequent iteration matching.During matching propagation, we only deal with newly generating edges of triangles in the updating process of triangulation to avoid repeated matching in each iteration.The blue triangles shown in Figure 5a represent the initial triangulation, while the black and green points are the newly matched points (midpoints and intersections, respectively) generated by the first iteration.We insert the newly matched points into the initial triangulation and obtain the updated triangulation.The red edges of triangles in Figure 5b are the newly generated or split edges after the first matching, which are used in the subsequent iteration matching.Figure 6 shows the results of the different stages of matching for the tested stereo pairs, with  = 30.Figure 6b presents the matching result of the second iterations, while Figure 6c shows the final matching result, that is, the result of the ninth iteration.Two types of matching primitives (the intersection points and midpoints) are matched by propagation, as shown by the green and red marks, respectively.The intersection points in Figure 7 are all located on the feature edge of the building, while the midpoints have an approximately uniform distribution.The numbers of the two types of points increase in each iteration (Figure 8).As the number of iterations increases, the number of corresponding points and triangles also increases, especially in the first four iterations in which the number of corresponding points increases greatly.The results of the intersection point and the midpoint are relatively stable after five and eight iterations, respectively.These phenomena are consistent with our original assumption that the number of midpoints increases accordingly with the increase in triangles in the iterations, whereas the number of extracted line segments in the image is fixed.Figure 6 shows the results of the different stages of matching for the tested stereo pairs, with T s = 30.Figure 6b presents the matching result of the second iterations, while Figure 6c shows the final matching result, that is, the result of the ninth iteration.Two types of matching primitives (the intersection points and midpoints) are matched by propagation, as shown by the green and red marks, respectively.The intersection points in Figure 7 are all located on the feature edge of the building, while the midpoints have an approximately uniform distribution.The numbers of the two types of points increase in each iteration (Figure 8).As the number of iterations increases, the number of corresponding points and triangles also increases, especially in the first four iterations in which the number of corresponding points increases greatly.The results of the intersection point and the midpoint are relatively stable after five and eight iterations, respectively.These phenomena are consistent with our original assumption that the number of midpoints increases accordingly with the increase in triangles in the iterations, whereas the number of extracted line segments in the image is fixed.

Further Experiments and Analysis
The performance of our method is evaluated by selecting twelve representative image pairs for the dense matching (Figure 9).These image pairs include various image transformations, such as viewpoint changes, scale changes, illumination, and rotation.A computer with a 3.6-GHz central processing unit (CPU) and 16 GB of random-access memory (RAM) is used for the experiment, and the code is implemented in Matlab 2016a.Extensive experiments are conducted to select the initial proper values for certain parameters of the proposed method.Then, hierarchical matching is performed in different scene images with determined parameters.

Further Experiments and Analysis
The performance of our method is evaluated by selecting twelve representative image pairs for the dense matching (Figure 9).These image pairs include various image transformations, such as viewpoint changes, scale changes, illumination, and rotation.A computer with a 3.6-GHz central processing unit (CPU) and 16 GB of random-access memory (RAM) is used for the experiment, and the code is implemented in Matlab 2016a.Extensive experiments are conducted to select the initial proper values for certain parameters of the proposed method.Then, hierarchical matching is performed in different scene images with determined parameters.

Further Experiments and Analysis
The performance of our method is evaluated by selecting twelve representative image pairs for the dense matching (Figure 9).These image pairs include various image transformations, such as viewpoint changes, scale changes, illumination, and rotation.A computer with a 3.6-GHz central processing unit (CPU) and 16 GB of random-access memory (RAM) is used for the experiment, and the code is implemented in Matlab 2016a.Extensive experiments are conducted to select the initial proper values for certain parameters of the proposed method.Then, hierarchical matching is performed in different scene images with determined parameters.

Parameter Selection
A method for accuracy assessment based on the homography matrix is adopted to assess the matching accuracy.Firstly, the homography matrix is calculated according to the corresponding points obtained by matching.Next, the corresponding points in the reference image are mapped onto the search image by using the homography matrix.The distance histogram is obtained according to the distance between the mapping and corresponding points.Finally, the inner and outer points are determined according to the distance histogram, and the matching accuracy is calculated.
The main parameters to be tuned in our method are  , where  ∈ {1,2,3,4,5}, and  . and  are the descriptors for the dissimilarity measure threshold. is used to determine whether the

Parameter Selection
A method for accuracy assessment based on the homography matrix is adopted to assess the matching accuracy.Firstly, the homography matrix is calculated according to the corresponding points obtained by matching.Next, the corresponding points in the reference image are mapped onto the search image by using the homography matrix.The distance histogram is obtained according to the distance between the mapping and corresponding points.Finally, the inner and outer points are determined according to the distance histogram, and the matching accuracy is calculated.
The main parameters to be tuned in our method are T k , where k ∈ {1, 2, 3, 4, 5}, and T s .T 1 and T 2 are the descriptors for the dissimilarity measure threshold.T 1 is used to determine whether the correspondence hypothesis is correct in the first stage, and T 2 controls the matching hypotheses to be processed in the second matching stage.T 3 and T 4 are the parameters for the Mahalanobis distance constraint, and T 5 is the overall similarity score threshold used to determine the final matching points in the second stage.We firstly determine the approximate ranges of the parameters and steps according to the histogram distribution of the operation results of all image pairs.Then, initial parameter values are selected (Table 1) after the experimental analysis of representative image pairs.Thereafter, all the image pairs are used to for parametric analysis.Owing to the relatively complex interaction between the parameters in the iterative process and the parameter analysis of the unsupervised results, we adopt the following strategy to determine the parameters.We firstly set the initial parameter values and then update each of the parameters in turn according to the matching results.After parametric analysis of the first eight image pairs (Figure 9a-h) according to the above-mentioned strategy, we find that all test image pairs satisfy a set of parameter values optimally or have only small deviations from the initial parameter values; thus, this set of parameters is chosen as the final parameters for matching (Table 2).The image pair (a) in Figure 9, which is one of the image pairs corresponding to the optimal parameter, is taken as an example.We update parameters T k , where k ∈ {1, 2, 3, 4, 5}, and T s successively.The resulting curves with different parameter values are shown in Figures 10 and 11.The graphs on the left show the number and accuracy of the matching, while the graphs on the right show the number of iterations and running time.The larger the two values on the left figures are and the smaller the two values in the right figures are, the better the matching result will be.As such, the optimal settings can be selected for the parameters or thresholds by combining the four factors mentioned above.
We firstly determine T 1 .The matching results with different values are shown in Figure 10a,b.The number and accuracy can reach their maximum values at the same time at T 1 = 0.8, and the numbers of iterations are all 10.Clearly, the running time decreases with the increase in T 1 .This outcome is attributed to the number of matching hypotheses decreasing in the second matching stage during each iteration with the increase in T 1 , while the running time of the second stage is more time-consuming than the first stage.Thus, the running time is reduced.
We fix the value to T 1 = 0.8 and conduct the experiment with different T 2 values.The result is shown in Figure 10c,d.The accuracy yields the local peak at T 2 = 1.8, and the number of iterations and run time reach the local minimum peak at the same time.The findings indicate that the initial T 2 value can be retained at T 2 = 1.8.
Figure 10e,f show the results with different T 3 values.The number of matching result shows a rising trend with the increase in T 3 , and the accuracy yields a peak at T 3 = 0.011.The time and the number of iterations are relatively stable even though the number of iterations is nine when T 3 is 0.007 or 0.008, and the corresponding accuracy is low.Therefore, T 3 must be selected in a manner wherein the above-mentioned aspects are balanced and T 3 is updated to 0.011.Similar to  , parameter  constrains the matching candidates in the second stage.The larger both values are, the more candidate matches will be generated in each iteration.As shown in Figure 11a,b, the accuracy of the result can reach the local peak at  = 0.005, and the number, run time, and iteration times begin to gradually flatten from  = 0.005.This finding suggests that the initial  value can be retained.
For matching candidates obtained by the above-mentioned  and  constraints, we calculate the overall similarity score for each candidate and determine the highest score.If the highest score is larger than  , then the candidate corresponding to the highest score will be selected as the final corresponding point.As shown in Figure 11c,d, the number of matching presents a downtrend with the increase in  , and the accuracy yields a peak at  = 0.55.This result indicates that the initial  value can be retained.
is the area of the triangle threshold that controls the size of the propagation triangle.The smaller  is, the denser the matching results will be in general.This scenario can also be observed in Figure 11e,f, in which the number of corresponding points increases with the decrease in  , and the running time is significantly increased at the same time.Although the number of matched points can reach the local maximum at  = 20,  = 30, which has the highest accuracy, is eventually selected in this study when running time is considered.Similar to T 3 , parameter T 4 constrains the matching candidates in the second stage.The larger both values are, the more candidate matches will be generated in each iteration.As shown in Figure 11a,b, the accuracy of the result can reach the local peak at T 4 = 0.005, and the number, run time, and iteration times begin to gradually flatten from T 4 = 0.005.This finding suggests that the initial T 4 value can be retained.
For matching candidates obtained by the above-mentioned T 3 and T 4 constraints, we calculate the overall similarity score for each candidate and determine the highest score.If the highest score is larger than T 5 , then the candidate corresponding to the highest score will be selected as the final corresponding point.As shown in Figure 11c,d, the number of matching presents a downtrend with the increase in T 5 , and the accuracy yields a peak at T 5 = 0.55.This result indicates that the initial T 5 value can be retained.
T s is the area of the triangle threshold that controls the size of the propagation triangle.The smaller T s is, the denser the matching results will be in general.This scenario can also be observed in Figure 11e,f, in which the number of corresponding points increases with the decrease in T s , and the running time is significantly increased at the same time.Although the number of matched points can reach the local maximum at T s = 20, T s = 30, which has the highest accuracy, is eventually selected in this study when running time is considered.
Notably, our method has low sensitivity toward different parameters within a certain range, in which the numbers and accuracies of matching points only have slight differences, and the matching results are relatively stable.Given that the proposed method is an iteration matching method, the last matching result will directly affect the matching result of the next iteration, and the effect of the different parameters on the result is relatively complex.Therefore, some signs can still be followed despite the difficulty in achieving consistent regularity for the whole scheme.
ISPRS Int.J. Geo-Inf.2020, 9, x FOR PEER REVIEW 13 of 19 Notably, our method has low sensitivity toward different parameters within a certain range, in which the numbers and accuracies of matching points only have slight differences, and the matching results are relatively stable.Given that the proposed method is an iteration matching method, the last matching result will directly affect the matching result of the next iteration, and the effect of the different parameters on the result is relatively complex.Therefore, some signs can still be followed despite the difficulty in achieving consistent regularity for the whole scheme.

Different Matching Stages
Table 2 presents the required parameters that are finally determined according to the parametric analysis in Section 3.1.The proposed method is used for the dense matching of the 12 sets of images previously shown in Figure 9.The experimental results are listed in Table 3 and depicted in Figure 12.The columns in Table 3 represent the number of initial corresponding points, the stage of finished matching, the number of matching iterations, the number of matched midpoints, the number of matched intersection points, the number of all corresponding points, the number of incorrect matching, the matching accuracy, and the running time, respectively.

Different Matching Stages
Table 2 presents the required parameters that are finally determined according to the parametric analysis in Section 3.1.The proposed method is used for the dense matching of the 12 sets of images previously shown in Figure 9.The experimental results are listed in Table 3 and depicted in Figure 12.The columns in Table 3 represent the number of initial corresponding points, the stage of finished matching, the number of matching iterations, the number of matched midpoints, the number of matched intersection points, the number of all corresponding points, the number of incorrect matching, the matching accuracy, and the running time, respectively.The results in Table 3 show several findings.Firstly, the proposed method utilizes the two-stage hierarchical matching strategy.The number of corresponding points can be increased by adding an implementation of the second matching stage, which will improve the accuracy of some results.However, with the increase in the number of corresponding points and the computational complexity in the second matching stage, the method will be a time-consuming process.Secondly, the number of matched midpoints is higher than the number of matched intersections for the two matching primitives.This phenomenon can be explained by the number of midpoints, which increases dynamically in the iterative matching process.By contrast, the number of extracted line segments is fixed, while that of the matched intersections tends to be stable after a few iterations.Thirdly, as shown by the corresponding points and their distribution in different colors in Figure 11, the corresponding points manifest a consistent distribution.Lastly, as for the experimental images selected in this study, the matching accuracies of the method all exceed 98%, which indicates the method's high accuracy.This result can be attributed to the strict constraints of the algorithm and the reliability of the descriptors.The robustness of the method is also verified by the different types of image matching experiments.

Comparison with Jia's Method
We compare the results of the proposed method with the results of the quasi-dense matching method presented in Jia (2019).In order to perform a comparative analysis of the two methods under consistent conditions, we utilize the same initial corresponding points to construct initial triangulations during the implementation of Jia's method.Similar to our method, only triangles with an area greater than 30 are used for matching propagation in the matching process.The comparison results of the number of corresponding points, accuracy, the number of iterations, and running times of the two methods for 12 sets of image pairs are provided in Figure 13.As observed, the number of corresponding points of the proposed method is greater than that of Jia's method for all test images, where Jia's method fails in image pairs (e) and (f).In terms of accuracy, the accuracy of both methods is above 97%.However, the accuracy of the proposed method is higher than that of Jia's method for most of the image pairs.Notably, Jia's method has an accuracy of 100% at image pair (f), but this accuracy belongs to the accuracy of the initial seed point because Jia's method fails in image pair (f) and does not obtain the corresponding point.However, no regularity is observed in the number of iterations between the two methods due to the complexity of the iterative process.We also compare the computation times required by the two methods.The results show that our method runs 0.1-4.1 times slower than Jia's method, except for the image pairs (e) and (f).These results are due to three reasons.Firstly, the proposed method is designed using two matching stages; points that do not meet the descriptor similarity constraint in the first stage are further matched by extending the matching candidates and using multiple constraints, which increases the number of matches while ensuring match reliability.Secondly, compared with similarity constraints of single-pixel color information in Jia's method, our constructed gradient descriptors are more robust for different images, especially for images with illumination change.The proposed method produces higher dense matching results, whereas Jia's method fails to match.Thirdly, although the above-mentioned aspects contribute to the dense matching results of the proposed method, they require heavy workload.As a result, Jia's method achieves better computation times than our method, but the proposed method outperforms Jia's method in terms of robustness and density.
ISPRS Int.J. Geo-Inf.2020, 9, x FOR PEER REVIEW 16 of 19 most of the image pairs.Notably, Jia's method has an accuracy of 100% at image pair (f), but this accuracy belongs to the accuracy of the initial seed point because Jia's method fails in image pair (f) and does not obtain the corresponding point.However, no regularity is observed in the number of iterations between the two methods due to the complexity of the iterative process.We also compare the computation times required by the two methods.The results show that our method runs 0.1-4.1 times slower than Jia's method, except for the image pairs (e) and (f).These results are due to three reasons.Firstly, the proposed method is designed using two matching stages; points that do not meet the descriptor similarity constraint in the first stage are further matched by extending the matching candidates and using multiple constraints, which increases the number of matches while ensuring match reliability.Secondly, compared with similarity constraints of single-pixel color information in Jia's method, our constructed gradient descriptors are more robust for different images, especially for images with illumination change.The proposed method produces higher dense matching results, whereas Jia's method fails to match.Thirdly, although the above-mentioned aspects contribute to the dense matching results of the proposed method, they require heavy workload.As a result, Jia's method achieves better computation times than our method, but the proposed method outperforms Jia's method in terms of robustness and density.

Conclusion
A hierarchical method based on triangulation constraint and propagation for the dense matching of points from a close-range image is proposed.The matching begins from the initial triangulation constructed by the initial corresponding points generated using the SIFT method.The correspondences between the image pairs are established with the midpoints and intersection points in the edges of the triangles as matching primitives.The characteristics of the proposed method are manifold.Firstly, we utilize points in the edges of triangles as the matching primitives and then conduct edge-to-edge matching propagation in the process of matching iteration.Our approach is more stable in theory than the region-to-region propagation.Secondly, we use two types of matching strategies for the different matching stages.We further match points that cannot be matched by the

Conclusions
A hierarchical method based on triangulation constraint and propagation for the dense matching of points from a close-range image is proposed.The matching begins from the initial triangulation constructed by the initial corresponding points generated using the SIFT method.The correspondences between the image pairs are established with the midpoints and intersection points in the edges of the triangles as matching primitives.The characteristics of the proposed method are manifold.Firstly, we utilize points in the edges of triangles as the matching primitives and then conduct edge-to-edge matching propagation in the process of matching iteration.Our approach is more stable in theory than the region-to-region propagation.Secondly, we use two types of matching strategies for the different matching stages.We further match points that cannot be matched by the descriptor similarity in the first stage by utilizing multiple constraints.An overall matching similarity score is defined for the multiple constraints with different weights.In this manner, the number of matching results can be increased, and the reliability of the matching can be enhanced.Thirdly, two aspects need to be considered in improving the efficiency of the matching.The first aspect focuses on the multithreading technique for improving processing performance.The second aspect is only for the newly generated triangle edges, which are processed in each iteration matching to avoid repeated matching.Fourthly, the descriptor construction with overlapping subregions can improve the reliability of the descriptors and effectively restrain the propagation of mismatches.Lastly, intersections between feature line segments and triangle edges are introduced as the matching primitive.This way enriches the edge information of the result.
The results of the 12 typical close-range image pairs show that matching accuracy can exceed 98%.Thus, the proposed method has high robustness for different images and can obtain reliable matching results.In the future, we plan to change the range of matching results in images, from the coverage of corresponding points to the maximum overlap of image pairs, by performing an outward growth of a region with seed points.We plan to conduct tests to validate if the method can also obtain reliable results for aerial images.We also intend to transform the matching results into discrete point clouds for further 3D reconstruction.

Figure 1 .
Figure 1.Flowchart of the proposed method.

Figure 1 .
Figure 1.Flowchart of the proposed method.
), and the corresponding vertices (A, A ), (B, B ), and (C, C ) are three pairs of corresponding points.For any midpoint a, its candidate point a is the midpoint of the corresponding side of the corresponding triangle in the search image.For any intersection point d, its candidate point d is the intersection point between the epipolar line of point d in the search image and the corresponding side of the corresponding triangle in the search image.For each correspondence hypothesis (a, a ) or (d, d ), we propose a new descriptor for similarity measure calculation and determine whether the hypothesis offers the correct match.ISPRS Int.J. Geo-Inf.2020, 9, x FOR PEER REVIEW 4 of 19

Figure 2 .
Figure 2. Two types of matching primitives and their corresponding matching hypotheses.Three sets of midpoint correspondence hypotheses are (a, a ), (b, b ), and (c, c ); (d, d ) is an intersection point correspondence hypothesis.
ISPRS Int.J. Geo-Inf.2020, 9, x FOR PEER REVIEW 5 of 19 increases the correlation between regions and avoids the influence of region inconsistency caused by large viewpoint changes.

Figure 3 .
Figure 3. Schematic of the point descriptor construction.

Figure 3 .
Figure 3. Schematic of the point descriptor construction.

Figure 5 .
Figure 5. Propagating process of matching: (a) initial triangulation and two types of newly matched points; (b) updated triangulation using newly matched points.

Figure 5 .
Figure 5. Propagating process of matching: (a) initial triangulation and two types of newly matched points; (b) updated triangulation using newly matched points.

Figure 6 .
Figure 6.Propagating process of matching: (a) initial matching result; (b) further matching result; (c) final matching result.

Figure 6 .
Figure 6.Propagating process of matching: (a) initial matching result; (b) further matching result; (c) final matching result.

Figure 6 .
Figure 6.Propagating process of matching: (a) initial matching result; (b) further matching result; (c) final matching result.

Figure 7 .
Figure 7. Zoomed view of the final triangulation from the two types of matching primitives.

Figure 8 .
Figure 8. Numbers of different types of corresponding points and triangles in varying stages of matching iterations.

Figure 7 .
Figure 7. Zoomed view of the final triangulation from the two types of matching primitives.

Figure 6 .
Figure 6.Propagating process of matching: (a) initial matching result; (b) further matching result; (c) final matching result.

Figure 7 .
Figure 7. Zoomed view of the final triangulation from the two types of matching primitives.

Figure 8 .
Figure 8. Numbers of different types of corresponding points and triangles in varying stages of matching iterations.

Figure 8 .
Figure 8. Numbers of different types of corresponding points and triangles in varying stages of matching iterations.

Figure 10 .
Figure 10.Matching results of different parameters: (a,c,e) represent the number of corresponding points and accuracy of matching; (b,d,f) represent the number of matching iterations and running time of the proposed method.

Figure 10 .
Figure 10.Matching results of different parameters: (a,c,e) represent the number of corresponding points and accuracy of matching; (b,d,f) represent the number of matching iterations and running time of the proposed method.

Figure 11 .
Figure 11.Matching results of different parameters: (a,c,e) represent the number of corresponding points and accuracy of matching; (b,d,f) represent the number of matching iterations and running time of the proposed method.

Figure 11 .
Figure 11.Matching results of different parameters: (a,c,e) represent the number of corresponding points and accuracy of matching; (b,d,f) represent the number of matching iterations and running time of the proposed method.

Figure 12 .
Figure 12.Matching result of the proposed method for image pairs, with corresponding points shown in green and blue color gradients.(a,c,e,g,i,k,m,o,q,s,u,w) are the reference image in each image pair, respectively; (b,d,f,h,j,l,n,p,r,t,v,x) are the search image in each image pair, respectively.

Figure 12 .
Figure 12.Matching result of the proposed method for image pairs, with corresponding points shown in green and blue color gradients.(a,c,e,g,i,k,m,o,q,s,u,w) are the reference image in each image pair, respectively; (b,d,f,h,j,l,n,p,r,t,v,x) are the search image in each image pair, respectively.

Figure 13 .
Figure 13.Comparison of the proposed method and Jia's method for 12 sets of image pairs: (a) number of matched points obtained by both methods; (b) accuracy of the two methods; (c) number of iterations of the two methods; (d) running times of the two methods.

Figure 13 .
Figure 13.Comparison of the proposed method and Jia's method for 12 sets of image pairs: (a) number of matched points obtained by both methods; (b) accuracy of the two methods; (c) number of iterations of the two methods; (d) running times of the two methods.

Table 1 .
Initial threshold values for matching.

Table 2 .
Final selected thresholds and parameters for matching.

Table 3 .
Comparison of matching results of the proposed method in different stages for twelve image pairs.