Local Edge Matching for Seamless Adjacent Spatial Datasets with Sequence Alignment

Abstract: This study proposes a local edge matching method with a sequence alignment technique for adjacent spatial datasets. By assuming that the common boundary edges of the datasets are point strings, the proposed method obtains the sequence of point edit operations that aligns the edges by using a string matching algorithm with the following operations: (1) snapping two points, one from each string, to their average position, (2) removing a point from one string and (3) removing a point from the other string. The costs for these operations are derived from the deformation of the involved line segments in terms of the angle and length changes. The corresponding point pairs are then taken to be the point pairs for which the snapping operation is chosen in the sequence. Based on these pairs, a border area of adjacent spatial datasets can be partitioned into sub-border areas where distinctive matching and alignment processes can be performed.


Introduction
Edge matching between adjacent spatial datasets finds and removes the geometric differences of common boundary edges between the datasets [1][2][3][4]. There are two components in this task: feature matching to identify the corresponding point or edge pairs and map alignment to remove the differences in the identified pairs. Because map alignment is performed with the corresponding point pairs, point feature matching is generally used in the field of geospatial information. For this reason, diverse point matching methods have been proposed. Beard and Chrisman proposed the Zipper algorithm, which finds the closest point pairs within a distance threshold and determines that these pairs are the corresponding point pairs [1]. This algorithm is simple and efficient; thus, it has been adopted in many studies [2][3][4][5]. However, one problem with this algorithm lies in estimating an appropriate threshold because even though the threshold is related to the positional accuracies of the spatial datasets involved, it is not clear how the accuracies are translated into a threshold value [6]. To address this problem, a statistical analysis of the sampled corresponding point pairs in the training data was applied. Lolonis [4] proposed a chi-square analysis on the positional differences of candidate corresponding point pairs and then discriminated mismatched pairs according to the level of statistical confidence. Masuyama [5] determined the threshold with a Monte-Carlo simulation-based analysis of the sampled pairs. Huh et al. [7] used a box-plot analysis on the sampled pairs because the analysis can discriminate proper and outlier values in the sampled pairs without any assumption of an underlying statistical distribution.
Meanwhile, map alignment methods without the above feature matching were recently proposed [6,8]. They represent the geometric differences of adjacent spatial datasets as gaps and overlaps. The gaps and overlaps stand for erroneous areas where no spatial data coverage exists and where coverage from different datasets co-exists, respectively. These areas are decomposed into triangles by using computational geometry techniques such as a constrained Delaunay triangulation method. Then, each of the triangles [6] or divided parts of each of the triangles [8] is assigned to one of the spatial datasets and seamless common boundary edges without gaps and overlaps are obtained.
Although the above studies have presented successful results, additional improvements are necessary in feature matching to address locally auto-correlated positional differences. In the field of map conflation, which finds corresponding point pairs or corresponding objects between super-imposed spatial datasets, a transformation model such as an affine or rigid transformation model is applied to explain the auto-correlated positional differences. These types of differences also arise in edge matching, as shown by the sub-border areas in Figure 1. Considering the horizontal direction differences of the corresponding point pairs in sub-border area 1, (a(1), b(1)), (a(2), b(2)), (a(3), b(3)) and (a(4), b(4)) are the proper corresponding point pairs. However, (a(3), b(2)) and (a(5), b(4)) are the closest pairs and thus would be chosen as corresponding point pairs when a general matching method is applied. Meanwhile, there are vertical direction differences for the corresponding point pairs in sub-border area 3. When there are many sub-border areas whose positional difference directions are distinctive, these differences cannot be explained by a single transformation model.
Moreover, this problem can be more severe when there are many road or river objects that are divided by boundaries of adjacent spatial datasets and are partially represented in each dataset. When the two datasets are constructed by the same mapping agency with the same mapping rule, divided spatial objects have the same attributes, including the name or identification code. In this case, these objects are found first and then the corresponding point pairs of each corresponding object pair can be easily found. When the two datasets are not constructed by the same mapping agency with the same mapping rule, only geometric properties such as distance or shape are available to find corresponding point pairs. However, feature matching with these geometric properties is vulnerable to the aforementioned problem in Figure 1.
Furthermore, this problem cannot be addressed by a transformation model because the border area's length is often more than several kilometres and a simple model cannot explain locally distinctive differences.

Figure 1. Positional differences between adjacent spatial datasets.
To address this problem, this study proposes a local edge matching method with sequence alignment by using a string matching algorithm. By assuming that the border boundary edges of two adjacent maps are point strings, the proposed method obtains the sequence of point edit operations that aligns the edges by using the following operations: (1) snapping two points, one from each string, to their average position, (2) removing a point from one string and (3) removing a point from the other string. The costs for these operations are derived from the deformation of the involved line segments in terms of the angle and length changes. The optimal operation sequence with the minimum total cost is obtained by means of a dynamic programming optimization technique [9,10]. The corresponding point pairs are then taken to be the point pairs for which the snapping operation is chosen in the sequence. Based on these pairs, a border area of adjacent spatial datasets can be partitioned into sub-border areas where distinctive matching and alignment processes can be performed.
The remainder of the paper is structured as follows. In the next section, the details of the proposed method are presented. In Section 3, the results of the proposed method are evaluated and discussed. The proposed method is applied to two cadastral maps of neighbouring local authorities. The result of the proposed method is compared with manually chosen corresponding point pairs and with a conventional distance threshold method. Finally, the conclusion is given in Section 4.

Proposed Method
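The proposed method consists of four steps, as shown in Figure 2: (1) extraction of point strings from adjacent spatial datasets, (2) string matching for the extracted point strings, (3) local map transformation and matching for disconnected line segments and (4) local map alignment to remove gaps and overlaps.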

Extraction of Point Strings from Adjacent Spatial Datasets
This step is a preprocessing step to extract boundary point strings from spatial datasets. Technically, different data manipulations are necessary according to the data type of the spatial datasets involved. In this study, there are two assumptions regarding spatial datasets. The first assumption is that these datasets are built and maintained by mapping agencies only for their own catchment areas, so that the boundaries of the datasets present a seamless spatial dataset after edge matching. The other assumption is that the datasets are composed of polygon objects that are mutually exclusive and collectively exhaustive, such as a cadastral map.
First, it is necessary to define a border area from which the point strings to be matched and aligned are extracted. This area is obtained as the union of the buffered boundaries of the two spatial datasets (Figure 3c). With a buffer distance Th, each spatial dataset's boundaries are extended (Figure 3a,b) and all line segments and points of these datasets within the union area are extracted. Under the aforementioned assumptions, line segments that do not belong to a dataset's boundary edges appear as duplicate pairs (Figure 3d). These duplicate segments are removed and the remaining line segments are concatenated into polylines, which are represented as point strings (Figure 3e). The removed line segments that are connected to these point strings are used later for matching the disconnected line segments in Step 3. It is also necessary to determine the order of the point strings. In general, the point order of a simple polygon is counter-clockwise. Similarly, the end points of each point string are connected temporarily, and then the order is set to be counter-clockwise.
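The ordering rule at the end of this step can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes that a point string is a list of (x, y) tuples and that closing the string with a temporary edge between its end points yields a simple polygon. The function names `signed_area` and `orient_counter_clockwise` are hypothetical.

```python
def signed_area(points):
    """Shoelace signed area of the polygon obtained by closing the point string.

    The wrap-around index plays the role of the temporary edge that connects
    the last point back to the first one. A positive value means the points
    are in counter-clockwise order.
    """
    area = 0.0
    n = len(points)
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def orient_counter_clockwise(points):
    """Return the point string in counter-clockwise order, reversing it if needed."""
    if signed_area(points) >= 0:
        return list(points)
    return list(reversed(points))
```

A string that is already counter-clockwise is returned unchanged, so the function can be applied to both point strings without first checking their orientation.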

These operations change the shapes of the boundary edges by moving or removing involved points.
There have been studies that align boundaries for corresponding object pairs with point edit operations [7,11,12,13]. These studies proposed various cost functions that measure the change in shape quantitatively, so that the optimal point edit operation sequence with the minimum total cost can be found. Among the cost functions, we choose the deformation energy model [7]. It measures the change in terms of the stretching and bending energy, as expressed by Equation (1).
Here, Δl and Δθ are the length and angle differences caused by an operation, respectively, and λ (0 ≤ λ ≤ 1) is a coefficient for the weighted sum of the stretching and bending energy.
In Figure 4a, the snapping operation changes the lengths and angles of four boundary line segments. Thus, according to Equation (1), the cost of the snapping operation is determined by Equation (2).
In this equation, the length terms are the lengths (in metres) of the boundary line segments between two points, and the angle terms are the angle changes (in degrees) at a point after an operation.
In Figure 4b, the removing operation changes the lengths and angles of two line segments of one boundary. Thus, the cost of the removing operation is formulated as Equation (3).
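The effect of the removing operation on length and angle can be illustrated with a small sketch. It does not reproduce Equations (1) to (3). It assumes, purely for illustration, that the length change is weighted by λ and the angle change by 1 − λ, that the length change is the shortening of the path when a point is bypassed, and that the angle change is the turning angle at the removed point. The names `turning_angle_deg` and `removal_cost` are hypothetical.

```python
import math


def turning_angle_deg(prev_pt, pt, next_pt):
    """Deviation from a straight line at pt, in degrees (0 means collinear)."""
    heading_in = math.atan2(pt[1] - prev_pt[1], pt[0] - prev_pt[0])
    heading_out = math.atan2(next_pt[1] - pt[1], next_pt[0] - pt[0])
    diff = math.degrees(heading_out - heading_in)
    diff = (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return abs(diff)


def removal_cost(prev_pt, pt, next_pt, lam):
    """Illustrative cost of removing pt between its two neighbours.

    Assumption: lam weights the length change (metres) and 1 - lam weights
    the angle change (degrees); the study's actual formulation may differ.
    """
    d_len = (math.dist(prev_pt, pt) + math.dist(pt, next_pt)
             - math.dist(prev_pt, next_pt))
    d_ang = turning_angle_deg(prev_pt, pt, next_pt)
    return lam * d_len + (1.0 - lam) * d_ang
```

Under these assumptions, removing a point on a straight run costs nothing, whereas removing a right-angle corner is expensive. This matches the qualitative behaviour the paper describes for points on nearly straight segments and for salient corner points.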
Here, each element of the sequence is one type of edit operation with its associated cost, and the length of the sequence is the total number of edit operations.
Given the two point strings and the cost functions, the optimal sequence is obtained by the dynamic programming technique shown in Figure 5, based on the property of Equation (5) [14].
Here, A<1,i> = [a(1), a(2), ..., a(i)] and B<1,j> = [b(1), b(2), ..., b(j)] denote two partial point strings from a(1) to a(i), and from b(1) to b(j), respectively. c(· | ·) indicates the cost of a point edit operation given the point pair of the latest snapping operation. As explained earlier, the above cost functions need the point pair of the latest snapping operation. To supply this information when calculating C(A<1,i>, B<1,j>), an auxiliary matrix is constructed in Figure 5. A distance constraint is added to the snapping operation to improve the matching accuracy: the distance between the corresponding points must be less than Th. This parameter is the same as the buffer distance in the previous step.
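The structure of this dynamic program can be sketched as a standard three-operation edit-distance recursion. The sketch is a simplification and not the pseudo code of Figure 5: the cost callbacks here depend only on the points involved, whereas the costs in this study also depend on the point pair of the latest snapping operation. The names `match_strings`, `snap_cost` and `remove_cost` are hypothetical.

```python
import math


def match_strings(A, B, snap_cost, remove_cost, th):
    """Align point strings A and B; return (total cost, snapped index pairs).

    C[i][j] is the minimum cost of aligning the first i points of A with the
    first j points of B. Snapping is allowed only when the two points are
    closer than th.
    """
    n, m = len(A), len(B)
    inf = float("inf")
    C = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # remove a point of A
                cand = C[i - 1][j] + remove_cost(A[i - 1])
                if cand < C[i][j]:
                    C[i][j], back[i][j] = cand, "remove_a"
            if j > 0:  # remove a point of B
                cand = C[i][j - 1] + remove_cost(B[j - 1])
                if cand < C[i][j]:
                    C[i][j], back[i][j] = cand, "remove_b"
            if i > 0 and j > 0 and math.dist(A[i - 1], B[j - 1]) < th:
                cand = C[i - 1][j - 1] + snap_cost(A[i - 1], B[j - 1])
                if cand < C[i][j]:
                    C[i][j], back[i][j] = cand, "snap"
    # Trace back from the full strings to collect the snapped pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        if op == "snap":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif op == "remove_a":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return C[n][m], pairs
```

The returned index pairs are the corresponding point pairs in the sense used above, namely the pairs for which snapping was chosen in the minimum-cost sequence.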

Local Map Transformation and Matching for Disconnected Line Segments
The point strings extracted in Step 1 are divided into sub-point strings and are locally aligned by a rigid transformation according to the average position of corresponding points, as shown in Figure 6. Although the string matching in this study finds most of the corresponding point pairs between common boundaries of adjacent spatial datasets, it does not work well for points of disconnected line segments, as shown in Figure 6a. This is because the string matching searches for the edit sequence that aligns two point strings with the minimum cost. Generally, points on the disconnected line segments are incident to almost straight segments, and removing them often causes less deformation than snapping them. Therefore, the string matching often misses the corresponding point pairs of the disconnected line segments. To solve this problem, the mutually closest point pairs within a tolerance distance are also chosen in post-processing, as shown in Figure 6a. Meanwhile, there are no point pairs within the tolerance distance in Figure 6b,c.
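The post-processing rule for disconnected line segments can be sketched as below. This is an illustrative brute-force version that assumes points are (x, y) tuples and treats "within" as less than or equal to the tolerance distance. The function name `mutually_closest_pairs` is hypothetical.

```python
import math


def mutually_closest_pairs(P, Q, tol):
    """Return index pairs (i, j) where P[i] and Q[j] are each other's nearest
    point and lie no farther apart than tol."""
    if not P or not Q:
        return []
    # Nearest point of Q for every point of P, and vice versa.
    nearest_q = [min(range(len(Q)), key=lambda j: math.dist(p, Q[j])) for p in P]
    nearest_p = [min(range(len(P)), key=lambda i: math.dist(q, P[i])) for q in Q]
    return [(i, j) for i, j in enumerate(nearest_q)
            if nearest_p[j] == i and math.dist(P[i], Q[j]) <= tol]
```

Requiring the relation to hold in both directions prevents two points of one map from being paired with the same point of the other map.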

Map Alignment to Remove Gap and Overlap
After the above map transformation, there are three types of geometric differences: linear difference (Figure 7a), triangle difference (Figure 7b) and polygon difference (Figure 7c). The linear difference is the case in which every point of the sub-point strings has its own corresponding point, so that snapping these point pairs to their average positions removes their geometric differences. The triangle difference is the case in which one sub-point string is a line segment and the other is a polyline with two line segments. In this case, an aligned boundary edge is obtained by connecting the snapped corresponding points via the centroid of the triangle. The polygon difference is a more complicated problem than the above two cases. In shape analysis, this difference is studied as a skeleton problem, which finds a thin version of a shape that is equidistant to its boundaries. Among skeleton methods, a constrained Delaunay triangulation (CDT) method is applied to obtain an aligned boundary edge. CDT is a generalization of the Delaunay triangulation that forces certain line segments to be edges of the triangulation. In this study, this constraint is imposed on the line segments of the transformed boundary edge. Then, an aligned boundary edge is obtained by connecting the snapped corresponding points via the mid-points of the two internal edges of each triangle. The details of this CDT are explained in [15].
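The two simpler cases can be sketched as follows. The sketch assumes that points are (x, y) tuples, that the sub-point strings of the linear case are already paired index by index, and that the triangle of the second case is formed by the two snapped end points and the middle point of the two-segment polyline. The last assumption is made here for illustration and is not stated in the text. The function names are hypothetical, and the CDT-based polygon case is not covered.

```python
def midpoint(p, q):
    """Average position of two points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def align_linear(A, B):
    """Linear difference: snap every corresponding pair to its average position."""
    return [midpoint(p, q) for p, q in zip(A, B)]


def align_triangle(segment, polyline):
    """Triangle difference: connect the snapped end points via the centroid.

    segment is a two-point string and polyline a three-point string whose
    end points correspond to the end points of the segment.
    """
    start = midpoint(segment[0], polyline[0])
    end = midpoint(segment[1], polyline[2])
    apex = polyline[1]
    centroid = ((start[0] + apex[0] + end[0]) / 3.0,
                (start[1] + apex[1] + end[1]) / 3.0)
    return [start, centroid, end]
```

For example, a segment from (0, 0) to (6, 0) and a polyline through (0, 0), (3, 3) and (6, 0) give the aligned edge (0, 0), (3, 1), (6, 0), which lies between the two original boundaries.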
Sometimes, transformed corresponding sub-point strings intersect, as shown in Figure 8. In this case, these strings are divided at the intersection points and the above map alignment method is applied to each section by assuming that these intersection points are snapped corresponding points.

Dataset and Parameter Determination
We applied the proposed method to two adjacent cadastral maps, as shown in Figure 9. Map A is a cadastral map of the Geum-Cheon district in Seoul city, and map B is a cadastral map of the Gwang-Myeong city in the Gyeong-gi province. The length of the border area is approximately 10 km. Because these maps are created by joining the respective legacy parcel maps and they are maintained independently by local authorities, irregular positional discrepancies arise between the boundary edges of the maps.
The proposed method has three parameters: Th, λ and the tolerance distance. Among them, Th is determined by a statistical analysis of 351 corresponding point pairs in the training area of Figure 9. We apply the boxplot method [16] to determine Th because the threshold should be a feasible upper limit of the lengths of the corresponding point pairs. The method begins by finding the median of the training data and then doing the same for each of the halves. These upper and lower quartiles define the centre box and are often referred to as the upper hinge and lower hinge, respectively. The upper inner fence is defined as the upper end of the box extended by 1.5 times the length of the box towards the maximum, and the upper whisker is defined as the farthest observation inside the upper inner fence, as expressed by Equations (6) and (7), respectively. The upper whisker is used for Th, and it has a value of 8.89 m as calculated from the pairs in the training data, as shown in Figure 10. Meanwhile, the remaining parameters, λ and the tolerance distance, cannot be directly trained by this analysis. Thus, various values of the parameters are evaluated and then the optimal parameters with the highest level of matching accuracy are obtained. We used three types of measures for the accuracy: precision, recall and the F-measure. Precision refers to the ratio of correctly found pairs to the total number of found pairs, and recall denotes the ratio of correctly found pairs to the total number of correct pairs. The F-measure is defined by Equation (8). In this equation, the two variables represent precision and recall, respectively.
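The boxplot rule for the threshold can be sketched as follows. The sketch assumes that the input is a non-empty list of distances between corresponding points, and that the hinges are Tukey-style medians of the lower and upper halves, with the overall median counted in both halves when the sample size is odd. Other quartile conventions give slightly different fences. The function name `upper_whisker` is hypothetical.

```python
from statistics import median


def upper_whisker(values):
    """Farthest observation that does not exceed the upper inner fence."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    # The fence extends the box upward by 1.5 times the box length.
    upper_inner_fence = upper_hinge + 1.5 * (upper_hinge - lower_hinge)
    return max(x for x in data if x <= upper_inner_fence)
```

With the sample 1, 2, ..., 8, 100 the hinges are 3 and 7, the fence is 13 and the whisker is 8, so the outlying value 100 does not inflate the threshold.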
We applied 21 candidate values of λ from 0 to 1 at 0.05 intervals and five candidate tolerance distances from 1 m to 9 m at 2 m intervals. The highest precision (0.924) was obtained when λ and the tolerance distance were 0.95 and 5 m, respectively, as shown in Figure 11a. In addition, the highest recall (0.887) was obtained when λ and the tolerance distance were 0.25 and 7 m, respectively, as shown in Figure 11b. With respect to λ, precision and recall presented a trade-off relationship. When λ increased and became closer to one, precision increased. Meanwhile, when λ decreased and became closer to zero, recall increased. Thus, λ serves as a matching threshold for the proposed method. In general, a tighter threshold yields a smaller number of matching pairs with higher precision and lower recall, whereas a looser threshold yields a larger number of matching pairs with lower precision and higher recall. In this study, the proposed method uses the snapping and removing edit operations, and the operation that yields the minimum total cost is chosen for a given point. Accordingly, a matching result with high precision and low recall at a large λ indicates an increase in the cost of the snapping operation. Meanwhile, a matching result with low precision and high recall at a small λ indicates a decrease in the cost of the snapping operation. Compared with λ, the tolerance distance does not have a meaningful effect on the matching performance. As a result, the highest F-measure (0.902) was obtained when λ and the tolerance distance were 0.25 and 5 m, respectively, as shown in Figure 11c.
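The parameter search described here amounts to an exhaustive evaluation over a 21 by 5 grid, which can be sketched as below. The callback `evaluate` is a hypothetical stand-in for running the matching on the training area and returning an accuracy score such as the F-measure. The function names are likewise assumptions.

```python
def candidate_grid():
    """Candidate values: 21 weights in [0, 1] and five tolerance distances in metres."""
    lambdas = [round(0.05 * k, 2) for k in range(21)]  # 0.00, 0.05, ..., 1.00
    tolerances = [1.0, 3.0, 5.0, 7.0, 9.0]
    return lambdas, tolerances


def grid_search(evaluate):
    """Return the (lambda, tolerance) pair with the highest evaluate(lambda, tolerance)."""
    lambdas, tolerances = candidate_grid()
    candidates = [(lam, tol) for lam in lambdas for tol in tolerances]
    return max(candidates, key=lambda pair: evaluate(pair[0], pair[1]))
```

Passing a different callback, for example one that returns precision or recall, would locate the corresponding optimum with the same code.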

Result and Discussion
To compare the performance in finding corresponding point pairs for edge matching, a statistical evaluation of the proposed method and a distance threshold method is performed in the test area. As shown in Table 1, the precision, recall and F-measure values of the distance threshold method with a Th of 8.89 m were 0.761, 0.927 and 0.835, respectively. Meanwhile, the proposed method with the parameters determined in the previous section showed higher precision and F-measure values but a lower recall value. These findings indicate that the proposed method has a tighter matching criterion and that its overall matching result was better than that of the conventional distance threshold method in terms of the F-measure.
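The three accuracy measures can be sketched as follows, assuming that found and reference pairs are given as collections of hashable index pairs and that the F-measure is the usual harmonic mean of precision and recall. The text does not reproduce Equation (8), so this form is an assumption. The function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def precision_recall_f(found_pairs, true_pairs):
    """Return (precision, recall, F-measure) of found pairs against reference pairs."""
    found, truth = set(found_pairs), set(true_pairs)
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall, f_measure(precision, recall)
```

Applied to the rounded precision and recall reported for the distance threshold method (0.761 and 0.927), `f_measure` gives approximately 0.836, which agrees with the reported 0.835 up to rounding of the inputs.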
Table 1. The statistical evaluation of the proposed method and a distance threshold method in the test area in Figure 9.
A higher precision for the proposed method was obtained especially from points on disconnected line segments, as shown in Figure 12b,c. When the distances between these points are sufficiently large compared with the positional discrepancies between adjacent maps, as shown in Figure 12a, the two methods presented nearly the same results. However, when the distances for the erroneous point pairs are coincidentally shorter than those of the true point pairs, as shown in Figure 12b, the distance threshold method is vulnerable and presents an inaccurate result with low precision. This problem became even more severe for the points on parallel disconnected line segments, which describe roads, as shown in Figure 12c. Occasionally, the widths of these roads are not sufficiently large compared with the positional discrepancies between the maps. Therefore, many erroneous point pairs can be obtained by the distance threshold method because the closest points are simply chosen for the pairs regardless of the neighbouring auto-correlated positional discrepancies. Meanwhile, the proposed method performs a matching process by considering these discrepancies; hence, it presents an improved matching performance in terms of precision. Figure 13 is the result of map alignment with these corresponding point pairs.
In contrast to its higher precision, the proposed method presented lower recall, as shown in Table 1. This occurred because it tends to choose corresponding point pairs between salient corner points rather than those on nearly straight line segments, as shown in Figure 14a. The proposed method determines corresponding point pairs as the point pairs for which the snapping operation is chosen in the optimal sequence with which two point strings are aligned at the minimum total cost. In general, the removing operation of a salient corner point causes a considerable amount of shape deformation on the line segments involved, which results in a larger edit operation cost than that of the snapping operation. This relationship is the opposite for the points connected by nearly straight line segments. Thus, as shown in the previous figures, the proposed method accurately found the corresponding point pairs between salient corner areas with a higher degree of recall accuracy. However, except for those on disconnected line segments, the corresponding point pairs between points connected by nearly straight line segments were not sufficiently found because the removing operations present a lower cost for them. This property prevents irregular stretching or shrinking of the neighbouring border areas, as shown in Figure 14b. By finding the corresponding point pairs between salient corner areas, the proposed method allows gradual map alignments along the border area. However, when all of the mutually closest points are used, the map alignment result could be abrupt, which decreases the overall performance. Based on the above comparisons with the manually chosen corresponding point pairs and the matching property, the proposed method is superior to previous distance threshold methods.
Additionally, it is necessary to discuss the effect of point string order. In Section 2.1, the orders were determined to be counter-clockwise. However, depending on whether the order is counter-clockwise or clockwise, the matching results for the corresponding point pairs can be different. Although both results are nearly the same, this cannot be explained at this stage and further research is necessary.

Figure 14. Comparison of corresponding point pairs found by the proposed method and the distance threshold method for straight line segments; (a) the proposed method does not find corresponding point pairs between points on nearly straight line segments, whereas the distance threshold method finds correct pairs, (b) the distance threshold method finds corresponding point pairs that result in irregular stretching or shrinking after map alignment, whereas the proposed method does not cause such a problem.

Conclusions
This paper proposed a new method to find corresponding point pairs for common boundary edge matching between adjacent spatial datasets by using a string matching technique with three types of point edit operations. These three operations snap two points of the strings to their average position, remove a point of one string, and remove a point of the other string. Unlike previous distance threshold methods, the proposed method can consider local auto-correlated positional discrepancies because the choice of edit operation changes the geometries of the boundary line segments involved, after which the changed geometries affect the remaining points' edit costs. Thus, the proposed method presented an improved matching performance compared with a previous method, especially for the precision measure. This demonstrates that the proposed method finds more robust corresponding point pairs. However, it found fewer of the true corresponding point pairs than the previous method because, according to the proposed cost functions, the costs of removing points connected by nearly straight line segments are generally lower than the costs of snapping the points. For this reason, the proposed method tends to find corresponding pairs between salient corner points and not between points connected by nearly straight line segments. However, this property results in gradual map alignments along the border area by preventing the irregular stretching or shrinking of neighbouring borders. As a result, the proposed local edge matching method showed an improved ability to find corresponding point pairs for seamless adjacent spatial datasets.
Step 1. Extraction of point strings from adjacent spatial datasets, Step 2. String matching for extracted point strings, Step 3. Local map transformation and matching for disconnected line segments and Step 4. Local map alignment to remove gaps and overlaps as shown in Figure 2. The details for this method are as follows.

Figure 2. Workflow of the proposed method.

Figure 3. Extraction of point strings from adjacent spatial datasets; (a) buffer zone of boundary of map A, (b) buffer zone of boundary of map B, (c) union area of buffer zones of boundaries of map A and B, (d) line segments extracted by union area, (e) point strings to be aligned.

2.2.1. Cost Function
The proposed method uses three types of point edit operations. The snapping operation moves two points, one from each string, to their average position, as shown in Figure 4a. The removing operation removes a point of one string, as shown in Figure 4b. Because the removing operations remove points of the boundaries, only point pairs for snapping operations remain and are used for subsequent edit operations. Thus, the reference point for a subsequent operation is the average position of the two points of the latest snapping operation.

Figure 4. Cost functions of string matching in this study; (a) snapping operation, (b) removing operation.

Figure 5. Pseudo code for string matching in this study.

Figure 6. Local map transformation of corresponding sub-point string pair; (a) two corresponding point pairs are found after transformation, (b,c) no additional corresponding point pairs are found and there remain spatial inconsistencies after transformation.

Figure 7. Map alignment method for geometric difference case in this study; (a) linear difference, (b) triangle difference and (c) polygon difference.

Figure 8. Polygon difference with intersection points of a transformed sub-point string pair.

Figure 9. Two adjacent cadastral maps (map A and map B) for the experiments.

Figure 10. A histogram and boxplot analysis of manually chosen corresponding point pairs.

Figure 11. Accuracy evaluations of the proposed method according to the two parameters λ and the tolerance distance: (a) precision, (b) recall and (c) the F-measure.

Figure 12. Comparison of corresponding point pairs found by the proposed method and the distance threshold method; (a) both methods find correct pairs, (b,c) the proposed method finds correct pairs of disconnected line segments between two maps, whereas the distance threshold method finds erroneous pairs.

Figure 13. Map alignment to remove gap and overlap of Figure 12; (a) alignment result of Figure 12a, (b) alignment result of Figure 12b, (c) alignment result of Figure 12c.

The Proposed Method with Th: 8.89 m, λ: 0.25 and