A Multi-Scale Residential Areas Matching Method Using Relevance Vector Machine and Active Learning



Introduction
Environmental protection, land resource management, emergency relief, and the construction of smart cities require reliable, applicable, and timely geographical spatial data for support. Therefore, an important issue that needs to be resolved in geographical information systems (GIS) is the immediate updating and integration of geographical spatial data [1,2]. The purpose of multi-scale data updating is to ensure that the different scales of spatial data reflect the latest situation. However, comprehensive updating that uses a large-scale data generalization method to produce small-scale data requires a great amount of work, and ensuring the consistency of multi-scale data is difficult. Multi-scale feature-cascade updating is a popular method for quickly updating spatial data in academic research [3,4,5]. This method uses incremental information from large-scale data to update small-scale data, and updates only the changed features. Therefore, it involves less work than other methods and is able to maintain consistency. The feature-cascade relationship forms the basis for multi-scale cascade updating. The establishment of this relationship relies on entity matching technology to recognize entities with the same name at different scales. Therefore, entity matching is the key technology for updating spatial data. Geospatial data fusion aims to automatically generate data that have a higher degree of accuracy and richer attribute information than the individual data sources [6]. To extract and merge information from different data sources, the corresponding relationships of different entities in the database must be identified, which relies on the technology of matching spatial entities [7]. Because multi-source spatial data invariably have different scales, multi-scale features should be taken into account in the matching method.
Given the current rapid urban and rural development, residential areas are among the fastest changing geographic objects and, thus, are an important data type that requires updating. The objective of the present study is multi-scale residential areas matching. A large amount of prior research has been conducted on the matching of residential areas. The current methods can be classified as similarity-based, probability-based, or error-based matching methods.
The similarity-based matching method for planar spatial entities realizes matching by analyzing the degree of overlap of buffer areas [8], distance [9,10], shape [11,12,13], topology, direction, semantics [14], and other shared characteristics. Similarity matching integrates multiple similarity characteristics [15] in an optimal combination while considering many-to-many matching relationships [16]. Ai [11] proposed using Fourier shape descriptors to measure the shape similarity of residential areas to realize shape analysis and to match multi-scale residential areas. An and Sun [12] proposed describing the geometric shape step by step from an overall outline to specific details, and applying the multi-stage chord function and the center distance function to establish a common geometric similarity measurement model for multi-scale spatial data. These two shape similarity measurement methods are applicable to the matching of a single planar entity. Huh [17] proposed detecting the corresponding nodes of planar entities in multi-source data and conducting matching in terms of object outlines. Birgit [18] proposed extracting the skeleton of planar entities and matching a multi-scale river network by calculating the similarity degree of skeleton characteristics. Kim [19] proposed an object-matching method based on the geographic environment. This method measures the similarity of the geographic environment between objects and selected geographic landmarks to realize spatial data matching across different coordinate systems. Thus, this method is dependent on the selection of landmarks. Zhang and Ai [20] suggested the use of relaxation labeling in combination with the overall information architecture to establish a compatibility matrix. By constantly updating the compatibility of candidate matching objects, the compatibility matrix converges and multi-scale planar entity matching is achieved.
Walter and Fritsch [21] utilized a matching method based on probability statistics, which first selects the candidate matching set and then utilizes regional statistics to determine the threshold value. Finally, it applies the merit function to finalize the matching results. Tong [22] studied the multi-characteristic matching method based on probability, and discovered that calculating the matching probability of the entity to determine the matching entities avoids the selection of the exact threshold for the matching index.
The error-based matching method is applicable to the matching of multi-scale geographic data under strict cartographic specifications. Safra [23] proposed a spatial data matching method based on location. This method compares the distance between spatial objects with the tolerance error of the map to determine the matching relationship. To solve the difficulty of determining the threshold and many-to-many relationships of multi-scale spatial data matching, Liu [24] suggested a planar entity matching algorithm based on mean square error and adjacent entity relationships.
In recent years, to improve the automation of spatial entity matching, researchers have proposed matching methods based on pattern recognition. Zhang [25] utilized a multi-scale residential areas matching method based on pattern classification, and Wang [26] proposed a multi-represented feature matching method based on a back-propagation neural network.
Similarity-based matching methods coincide with human spatial cognition and are the most widely used method types in multi-scale residential areas matching studies. These methods present two difficulties: (1) owing to different scales, production units, and production times, the residential areas in a multi-scale spatial database are large in number, great in disparity, and complicated in matching relationships, making it difficult to conduct similarity measurement; and (2) it is difficult to determine the similarity weights and threshold values, and manual intervention is largely required. For the first problem, we aim to improve the characteristic similarity measurement method for multi-scale residential areas. The study provides a solution for similarity measurements of matching objects that are not one-to-one. For the second problem, machine learning is a more effective way to determine thresholds and weights; however, the existing machine learning-based matching methods require a large number of labeled training samples, and they have difficulty identifying multiple matching relationships [25,26]. The study presents a matching method based on the RVM algorithm and active learning, which avoids manually setting characteristic weights and matching thresholds. Moreover, the work associated with sample labeling can be reduced. In Section 2, we describe our methodology, while experimental results are analyzed and discussed in Section 3. The final section includes conclusions and future prospects.

Multi-Scale Residential Areas Matching Relationship
Matching of multi-scale geospatial data is difficult because of the combined impact of cartographic generalization, errors introduced during data production, and changes in the geographic entities themselves. The following is an introduction to the matching relations of spatial entities at different scales after cartographic generalization. To begin with, we assume that the multi-scale spatial entities have the same spatial coordinates, and that the small-scale maps are generalizations of the large-scale maps. Thus, there are differences in the spatial expression of entities at different scales. For planar residential areas, the matching relations between large and small scales can be classified as follows:
• 1:1 (Figure 1a): in both large and small scales, the entities with the same name have a 1:1 matching relationship.
• 1:0 (Figure 1b): some entities occur in large-scale maps but are invisible in small-scale maps, because some small entities are omitted during map generalization.
• m:1 (Figure 1c): a many-to-one relationship for entities between large-scale maps and small-scale maps, where in the process of map generalization large-scale objects are combined to form small-scale objects.
• m:n (m > n) (Figure 1d): a many-to-many relationship for entities between large-scale maps and small-scale maps, where during the map generalization process a stylization operation is conducted to reflect the shapes and spatial distribution features of residential areas.
In addition to the differences reflected in number, operations such as shape simplification and displacement are also conducted during map generalization. As a result, entities with the same name will differ in shape and position across maps of different scales.
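The four relation types above can be told apart from the sizes of a candidate group alone. The following is a minimal sketch under that assumption; the function name `matching_relation` and the convention that `m` counts large-scale objects and `n` counts small-scale objects are illustrative choices, not taken from the paper.

```python
def matching_relation(m: int, n: int) -> str:
    """Label a candidate group of m large-scale and n small-scale objects.

    The labels follow the four cases listed in the text. Many-to-many
    groups that do not satisfy m > n still receive the "m:n" label here,
    which is an assumption of this sketch.
    """
    if m < 1 or n < 0:
        raise ValueError("expected m >= 1 and n >= 0")
    if n == 0:
        return "1:0"   # large-scale object(s) omitted during generalization
    if m == 1 and n == 1:
        return "1:1"
    if n == 1:
        return "m:1"   # several large-scale objects merged into one
    return "m:n"       # many-to-many (the text lists the case m > n)


for sizes in [(1, 1), (1, 0), (3, 1), (5, 2)]:
    print(sizes, matching_relation(*sizes))
```

Only the 1:1 case can be compared directly; the m:1 and m:n cases are the ones that need to be converted before similarity computation.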
ISPRS Int. J. Geo-Inf. 2017, 6, 70


Overall Design
The present study aims to construct a multi-scale residential areas matching method compatible with data characteristics by introducing the concept of classification from pattern recognition. In addition, the study aims to train classification models on selected samples by machine learning, so that the models can be applied to object matching within identical scenarios. The overall framework is shown in Figure 2, and is as follows:
(i) Selecting training samples of both matched and unmatched objects via human-machine cooperation.
(ii) For candidate matching objects, converting matching relations that are not one-to-one into one-to-one relations by data processing, for the convenience of similarity computation.
(iii) Computing the characteristic similarities of the sample data.
(iv) Applying a relevance vector machine (RVM) algorithm to the characteristic similarities and matching results to generate classifiers.
(v) Inputting the processed residential data at various scales into the classifiers from step (iv) to yield classification results.
(vi) Processing the multiple matches among the data classified as matched to obtain the final matching results.


Object Merging
First, we apply a buffer to search for candidate matching objects. To capture many-to-many matching relationships, large-scale and small-scale objects must undergo a forward and reverse bidirectional iterative search [16,25]. Object merging is the most efficient way to convert one-to-many and many-to-many relations into one-to-one matching relations. Considering the complexity of simplification and the fact that one-to-one matching relations do not need to be converted, the present study merges objects without simplifying them.
Because the variety of cartographic generalization methods makes several merging methods feasible for large-scale objects, our aim is to maintain the outer contour of the residential areas during merging. Residential areas that touch each other are merged by removing the shared edges, while residential areas that are disjoint are merged by generating and processing a Delaunay triangulation [27], as shown in Figure 3. The exact approach is as follows:
(i) Densify the nodes by inserting nodes into the contours of the residential area elements, and construct a Delaunay triangulation, categorizing triangles outside and inside the residential area elements as external and internal triangles, respectively.
(ii) Generate the convex hull of the pre-merging residential areas and remove the following three types of external triangles that share edges with the convex hull: (1) all three vertices are located in the same residential area; (2) the vertices are located in two residential areas, with one interior angle measuring more than θ (θ is obtuse) and one edge coinciding with an edge of either of the two residential areas; and (3) the vertices are located in two residential areas, with one edge overlapping the contour of a residential area and the altitude on this edge larger than the threshold. Rule (3) serves as the measurement of the distance between residential areas.
(iii) Apply a recursive algorithm to search for other external triangles that share edges with the removed triangles, and apply rules (2) and (3) from the previous step to remove suitable triangles.
(iv) Merge the remaining triangles by removing the shared edges.
After merging, the many-to-one and many-to-many relations between residential areas of various scales are converted into one-to-one relations to be matched.
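The forward and reverse bidirectional search at the start of this section can be read as growing a group of mutually overlapping objects until no new object is added. The sketch below assumes that the buffer overlap test has already produced a list of `(large_id, small_id)` pairs. The function name `group_candidates` and the identifiers are hypothetical, and the geometric buffer test and the Delaunay-based merging are not implemented here.

```python
from collections import defaultdict


def group_candidates(pairs):
    """Group overlapping objects by alternating forward and reverse searches.

    pairs: iterable of (large_id, small_id) whose buffers overlap.
    Returns a list of (large_ids, small_ids) groups, each of which would
    then be merged into a single one-to-one candidate.
    """
    large_to_small = defaultdict(set)
    small_to_large = defaultdict(set)
    for large_id, small_id in pairs:
        large_to_small[large_id].add(small_id)
        small_to_large[small_id].add(large_id)

    visited = set()
    groups = []
    for start in sorted(large_to_small):
        if start in visited:
            continue
        larges, smalls = {start}, set()
        visited.add(start)
        frontier = [start]
        while frontier:
            # Forward search: small-scale objects hit by the new large ones.
            new_smalls = set()
            for large_id in frontier:
                new_smalls |= large_to_small[large_id] - smalls
            smalls |= new_smalls
            # Reverse search: large-scale objects hit by the new small ones.
            frontier = []
            for small_id in new_smalls:
                for large_id in small_to_large[small_id]:
                    if large_id not in larges:
                        larges.add(large_id)
                        visited.add(large_id)
                        frontier.append(large_id)
        groups.append((sorted(larges), sorted(smalls)))
    return groups


pairs = [("L1", "S1"), ("L2", "S1"), ("L2", "S2"), ("L3", "S3")]
print(group_candidates(pairs))
```

Large-scale objects with no overlapping small-scale object (the 1:0 case) never appear in `pairs` and so form no group in this sketch.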


Similarity Computation
Numerous one-to-many and many-to-many relations exist among residential areas of various scales, and these require data processing to convert them into one-to-one relations suitable for characteristic similarity computation. In the present study, the method described in Section 2.3 is used to convert these relationships into one-to-one relations to facilitate the calculation of feature similarity. Based on how human spatial cognition perceives residential areas, five characteristics (position, area, shape, orientation, and surroundings) are utilized to evaluate the similarity for matching relations. The value of each characteristic similarity index is between 0 and 1. Because attribute information is often incomplete and data criteria are inconsistent, this study adopts a method independent of semantic information by selecting spatial characteristics that are less affected by multi-scale representation and map generalization than characteristics such as perimeter and overlapping area.


Position Similarity Index
The closeness of spatial entities indicates a high similarity of position. For a geographical area entity, the centroid might best reflect the characteristics of its location. The present study applies Equation (1) to measure position similarity by calculating the ratio of the Euclidean distance between the centroids of the two residential areas to the maximum distance D. In Equation (1), (x1, y1) and (x2, y2) are the centroid coordinates of the two entities to be matched. The maximum distance D of the matched entities is determined by statistical analysis of the centroid distances of positive and negative matching samples. In this study, the value of D is double the centroid distance of matching samples.
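Equation (1) itself is not reproduced in this text. A form consistent with the description is 1 − d/D, where d is the Euclidean distance between the two centroids. The sketch below assumes that form and additionally clips the result at 0 so that the index stays within [0, 1]. Both the exact formula and the clipping are assumptions, and `position_similarity` is a hypothetical name.

```python
import math


def position_similarity(c1, c2, d_max):
    """Position similarity from two centroids (x, y) and the maximum distance D.

    Assumed form: 1 - d / D, clipped at 0 when d exceeds D.
    """
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    dist = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
    return max(0.0, 1.0 - dist / d_max)


print(position_similarity((0.0, 0.0), (3.0, 4.0), 10.0))  # distance 5, D = 10
```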

Area Similarity Index
Area is an important characteristic that reveals the size of a geographical entity. Although the areas of entities differ across scales because of map generalization and other factors, maintaining the size characteristics of a geographic entity is one principle of map generalization. Residential areas that have matching relations should therefore be similar in area. This study applies Equation (2) to measure the area similarity of residential areas by calculating the ratio of the areas of the residential areas to be matched, where A and B refer to the residential areas to be matched.
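Equation (2) is described only as a ratio of the two areas. To keep the index within [0, 1] regardless of which area is larger, the sketch below assumes the smaller area is divided by the larger one. That ordering is an assumption, and `area_similarity` is a hypothetical name.

```python
def area_similarity(area_a, area_b):
    """Area similarity as the smaller area divided by the larger one (assumed form)."""
    if area_a <= 0 or area_b <= 0:
        raise ValueError("areas must be positive")
    return min(area_a, area_b) / max(area_a, area_b)


print(area_similarity(50.0, 100.0))
```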

Shape Similarity Index
The quantitative description of shape remains a difficult problem in GIS and computer science [28]. Taking into account the shapes of residential entities, the present study measures shape with the shape index (compactness) proposed by Peter [29]. Compactness is affected by the size and boundary of the object [30] and is calculated as shown in Equation (3), where p represents the area entity. Shape similarity is calculated as shown in Equation (4), where A and B represent the objects to be matched at large and small scales, respectively.
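Equations (3) and (4) are not reproduced in this text, so the following is only an illustrative sketch. It assumes one common definition of compactness, 4πA/P² (equal to 1 for a circle), and assumes that shape similarity is the ratio of the smaller compactness to the larger one. Both choices, and the helper names, are assumptions rather than the paper's definitions.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    area2 = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def compactness(vertices):
    """Assumed compactness: 4 * pi * area / perimeter**2."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


def shape_similarity(poly_a, poly_b):
    """Assumed similarity: smaller compactness divided by the larger one."""
    c_a, c_b = compactness(poly_a), compactness(poly_b)
    return min(c_a, c_b) / max(c_a, c_b)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(compactness(square), shape_similarity(square, rectangle))
```

Because this compactness is a ratio of area to squared perimeter, it does not change under uniform scaling, which suits comparison across map scales.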

Orientation Similarity Index
The orientation of a residential area refers to its overall extension direction. Commonly used methods for measuring it include the long side method, the wall-based statistical method, and the smallest minimum bounding rectangle (SMBR) method [30]. The SMBR method uses the long axis direction of the SMBR of the entities to be matched as the orientation of the residential areas. The angle difference between the long axis directions is taken as the angle difference between the two area entities. This method cannot distinguish two area entities whose orientations differ by a rotation of 180 degrees. The present study proposes an improvement to the SMBR method. If the residential areas to be matched are A and B, the orientation similarity of the residential areas is calculated according to Equation (5), where θA and θB are the angles between the y-axis and the SMBR long axes of A and B, respectively; their value interval is [0, π/2]; and F is the Boolean function that determines whether the residential areas are rotated by 180 degrees.
If the shape similarity index of the two objects is low, then F in Equation (5) is 0. If the shape similarity index is high, further processing is needed, as follows. The minimum angles aA and aB by which the to-be-matched objects A and B, together with their SMBRs, must be rotated counterclockwise for the long axes of the SMBRs to become parallel to the y-axis are calculated. When |aA − aB| < π/2, objects A and B are rotated counterclockwise by aA and aB, respectively, and the new objects A' and B' are obtained. When |aA − aB| > π/2, object A is rotated clockwise by aA to obtain A', and object B is rotated counterclockwise by π − aB to obtain B'. As Figure 4 shows, rotating Figure 4a yields Figure 4b, and rotating Figure 4c yields Figure 4d. The bottom right corners of the SMBRs of objects A' and B' are used as the points of origin. A coordinate system is established with the short axis as direction x and the long axis as direction y. The SMBRs of A' and B' are equally divided into m rectangles along the long side and n rectangles along the short side. For each rectangle, the ratio between the area of its intersection with the object and the area of the rectangle is calculated; the value interval of this ratio is [0, 1]. Two histograms are generated from these ratios, one in the positive x-axis direction and one in the positive y-axis direction. Figure 5a,b shows the area histograms of Figure 4b, and Figure 5c,d shows the area histograms of Figure 4d. The horizontal axis of each histogram is the serial number of the rectangles, and the vertical axis is the ratio between the intersection area of the rectangle and the object and the area of the corresponding rectangle.
The procedures for recognizing direction via a histogram are as follows:
(1) Smooth the histograms by using Equation (6) with an interpolation method, where x in Equation (6) is the horizontal coordinate value of the histogram, f(x) is the vertical coordinate value, step is the step length, and Z is the value of the histogram after smoothing. After smoothing, the area histogram in the x-axis direction has h units, and the area histogram in the y-axis direction has j units.
(2) Compare each unit value of a histogram with the average value of that histogram, and exclude the histogram whose unit values differ least from the average. Such a histogram represents a rectangle that does not need to be compared, and the F value is 1.
(3) Compare the ith unit of one histogram with the (h − i)th unit of the other histogram. When their difference is smaller than the given threshold value, the two values are regarded as the same. If, after comparing h groups, the number of identical units u satisfies u/h > 0.9, the two histograms are opposite.
(4) When an opposite corresponding histogram exists, the F value in Equation (5) is 0; otherwise it is 1.
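Step (3) of the procedure can be sketched as a mirrored comparison of two equally long histograms. The code below assumes zero-based indexing, so unit i is compared with unit h − 1 − i. The tolerance value `tol=0.05` is a hypothetical placeholder for the "given threshold value", which the text does not state, and the smoothing of Equation (6) is not included.

```python
def histograms_are_opposite(hist_a, hist_b, tol=0.05, min_ratio=0.9):
    """Return True when hist_b looks like hist_a reversed.

    Unit i of hist_a is compared with the mirrored unit of hist_b. Two
    units count as the same when they differ by less than tol. The
    histograms are called opposite when the share of same units exceeds
    min_ratio (0.9 in the text).
    """
    if len(hist_a) != len(hist_b) or not hist_a:
        raise ValueError("histograms must be non-empty and equally long")
    h = len(hist_a)
    same = sum(
        1 for i in range(h) if abs(hist_a[i] - hist_b[h - 1 - i]) < tol
    )
    return same / h > min_ratio


ramp = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(histograms_are_opposite(ramp, ramp[::-1]))  # mirrored copy
print(histograms_are_opposite(ramp, ramp))        # same direction
```

Per step (4), a True result corresponds to F = 0 in Equation (5) and a False result to F = 1.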

Surroundings Similarity Index
On a large-scale map, buildings in the same area commonly have the same shape, so mistakes are made if only position and geometric characteristics are used for matching. In line with spatial cognition habits, manual matching frequently draws on information from the surrounding area to identify entities. We therefore measure surroundings similarity from the characteristics of the entities surrounding a residential area. As shown in Figure 6, the centroid of the entity to be matched is used as the center point of a 2 × 2 square grid that is parallel to the coordinate axes. The side length of the grid is set at twice the length of the long side of the SMBR of the small-scale residential area to be matched. G1, G2, G3, and G4 represent the upper left, upper right, bottom left, and bottom right cells of the grid, respectively, as shown in Figure 6. The surroundings similarity of each grid cell is calculated according to Equation (7), where Area(SMi) and Area(LAi) are the areas of the surrounding residential areas located in grid cell Gi in the small-scale and large-scale data, respectively. When neither Area(SMi) nor Area(LAi) is 0, the value of Sgrid(Gi) is the area ratio. When both Area(SMi) and Area(LAi) are 0, the value of Sgrid(Gi) is 1. Because the representations of geographic entities differ between scales, some very small residential areas that are presented in large-scale data might not appear in small-scale data. Therefore, when Area(SMi) is 0 and Area(LAi) is a small number (less than the threshold ε), the value of Sgrid(Gi) is 1. When Area(SMi) is not 0 and Area(LAi) is 0, the value of Sgrid(Gi) is 0. The total surroundings similarity is calculated according to Equation (8).
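The case analysis for a single grid cell can be sketched directly from the text. Three points are assumptions of this sketch and not stated in the source: the "area ratio" is taken as the smaller area over the larger one, the unlisted case (Area(SMi) is 0 and Area(LAi) is at least ε) returns 0, and the total of Equation (8) is taken as the arithmetic mean over the four cells. The function names are hypothetical.

```python
def grid_similarity(area_small, area_large, eps):
    """Surroundings similarity of one grid cell, following the cases in the text."""
    if area_small == 0 and area_large == 0:
        return 1.0                      # nothing around in either data set
    if area_small == 0:
        # Tiny large-scale areas may have been omitted at the small scale.
        # Returning 0 for areas >= eps is an assumption of this sketch.
        return 1.0 if area_large < eps else 0.0
    if area_large == 0:
        return 0.0
    # Assumed form of the "area ratio": smaller over larger.
    return min(area_small, area_large) / max(area_small, area_large)


def surroundings_similarity(small_areas, large_areas, eps):
    """Total similarity over cells G1..G4, assumed to be the arithmetic mean."""
    if len(small_areas) != 4 or len(large_areas) != 4:
        raise ValueError("expected four grid cells")
    cells = [
        grid_similarity(s, l, eps) for s, l in zip(small_areas, large_areas)
    ]
    return sum(cells) / len(cells)


print(surroundings_similarity([0, 2, 2, 0], [0, 4, 0, 0.5], eps=1.0))
```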


Relevance Vector Machine
After the similarity of each feature has been calculated for the matching candidates, the standard matching approach obtains a comprehensive similarity by weighting the feature similarities and selects matching results using thresholds [13,15]. The weighting and threshold setting within this approach require manual intervention, which makes it cumbersome to adopt for different datasets. In the present study, a machine learning approach is designed to realize the matching of spatial entities.
The RVM [31] is a machine learning approach that has been developed in recent years. Like the support vector machine (SVM), it is especially suitable for binary classification of small samples.
In the present study, the input vector of the RVM is five-dimensional and consists of the similarities of five characteristics (i.e., position, area, shape, orientation, and surroundings). The classification is defined as "match" or "mismatch". The output of the RVM can be utilized as an assessment of the reliability of the classification results. The output function of the RVM is shown in Equation (9) [32], where the output y lies between 0 and 1. The value of z in Equation (9) is calculated as shown in Equation (10), where Q(x, x_n) is the kernel function and w_n is the weight of the model.
The likelihood of the dataset is shown in Equation (11), where t is the vector of target labels. The weights w in Equation (11) can be obtained with the maximum likelihood estimation method. However, to avoid over-learning, the RVM defines a Gaussian prior probability distribution for each weight to constrain the parameters (Equation (12)), where α in Equation (12) is an (N + 1)-dimensional hyperparameter vector. Although the posterior probability of the weights cannot be calculated analytically, it can be approximated with the Laplace method. The most probable weights w_MP are calculated for the currently fixed α value. Because p(w|t, α) ∝ p(t|w)p(w|α), this is equivalent to maximizing Equation (13).
The term $\frac{1}{2} w^{T} A w$ is a constant once the most probable weights w_MP are obtained. When two objects match, the value of y tends to 1, so that the result of Equation (13) is maximum. When two objects do not match, the value of y tends to 0, so that the result of Equation (13) is maximum. Therefore, the reliability of a match between objects is higher when the output value is closer to 1, and the reliability of a mismatch is higher when the output value is closer to 0.
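Equations (9) to (13) are not reproduced in this text. For orientation, the standard RVM classification formulation takes the form below. It is assumed, not confirmed, that the referenced equations correspond to these expressions. Here $y_n$ denotes the output for training sample $x_n$, $w_0$ is a bias weight, $A = \mathrm{diag}(\alpha_0, \ldots, \alpha_N)$, and terms that do not depend on $w$ are omitted from the last line.

```latex
\begin{align}
y &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
z &= \sum_{n=1}^{N} w_n \, Q(x, x_n) + w_0 \\
p(t \mid w) &= \prod_{n=1}^{N} y_n^{\,t_n} \, (1 - y_n)^{1 - t_n} \\
p(w \mid \alpha) &= \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right) \\
\log\{ p(t \mid w)\, p(w \mid \alpha) \} &= \sum_{n=1}^{N} \left[ t_n \log y_n + (1 - t_n) \log (1 - y_n) \right] - \frac{1}{2} w^{T} A w
\end{align}
```

In this form, the first term of the last expression grows as $y_n$ approaches the label $t_n$, which matches the statement that y tends to 1 for matches and to 0 for mismatches.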


Matching Strategy
The following are some of the important steps in the matching strategy.
• Data preprocessing: The first step of data preprocessing is to identify the candidate elements for matching. A buffer area is generated from each small-scale residential area, and the large-scale features that intersect this buffer area are identified as candidate features. Second, the candidate elements are refined by recognizing pairs with multiple match relations using a bidirectional search. Third, because there are many 1:n or m:n match relations in multi-scale residential areas, permutation and combination are used to generate combinations of candidate matching objects so that the match relation can be recognized. Because the combinations of candidate matching objects are enumerated according to the number of elements, we set as mismatches the combinations that cannot be matched given the matching types of multi-scale residential areas (e.g., those whose quantitative relationship between the large-scale object and the small-scale object is 1:n or m:n with m < n) and delete them from the candidate matching objects. For ease of similarity measurement, we use the entity merging method described in Section 2.3 to combine multiple entities into a single entity.

• Sample selecting: For the training samples, we adopt a human-computer cooperation method: a buffer area is generated from the source objects to search for candidate matching objects, and the candidate elements and source elements are then manually identified and labeled as matches or mismatches. Unlabeled samples are obtained by searching for candidate matching objects in the buffer area generated from the source elements.
• Multiple matching relation processing: Multiple matches may arise from the output of the classifier, as Figure 8 shows: entities A and B, and entities A and BC, are both classified as matches. The final match relation is determined from the reliability output of the RVM. Specifically, among the pairs that the classifier places in the match category, we identify those that contain the same elements, and, utilizing Equation (14), select the set with maximum reliability as the final match result.
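The candidate enumeration and the maximum-reliability selection described in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: Equation (14) is not reproduced in this text, conflicts are grouped by source identifier only, a classifier output of at least 0.5 is assumed to mean "match", and all names (`Match`, `candidate_combinations`, `resolve_multiple_matches`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

# (source id, set of target ids, RVM output used as reliability)
Match = Tuple[str, FrozenSet[str], float]


def candidate_combinations(target_ids: Iterable[str]) -> List[FrozenSet[str]]:
    """Enumerate every non-empty combination of the candidate targets."""
    ids = sorted(target_ids)
    return [frozenset(combo)
            for size in range(1, len(ids) + 1)
            for combo in combinations(ids, size)]


def resolve_multiple_matches(matches: List[Match],
                             threshold: float = 0.5) -> Dict[str, Match]:
    """Keep, for each source, the match-class candidate with the highest output."""
    best: Dict[str, Match] = {}
    for match in matches:
        source_id, _, reliability = match
        if reliability < threshold:
            continue  # classified as a mismatch, so it cannot be selected
        if source_id not in best or reliability > best[source_id][2]:
            best[source_id] = match
    return best
```

With the example of Figure 8, if A-B and A-BC are both classified as matches, the pair with the larger RVM output is retained for A. The number of combinations grows exponentially with the number of candidates, which is why infeasible combinations are pruned beforehand.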
ISPRS Int. J. Geo-Inf. 2017, 6, 70



Experimental Design
To verify the effectiveness of the proposed method, we selected residential areas at scales of 1:5000 and 1:25,000 in the Tianhe District, Guangzhou, for the matching experiment (Figure 9). The 1:5000 and 1:25,000 datasets contain 4375 and 1023 entities, respectively. In this experiment, a program built on ArcGIS Engine 10.0 through secondary development in Visual Studio 2010 was used to obtain the feature similarities of the spatial entities, and the RVM_Matlab toolbox was used for the classification.
In the experiment, a buffer area is generated with each small-scale residential area as the source element, and a dataset consisting of 503 records is constructed by automatic selection, of which 70% form the training set and 30% form the test set. From this dataset we manually selected 76 labeled classification samples for constructing an initial classifier. Among the labeled samples, 21 pairs are 1:1 matching relations, 28 pairs are 1:m matching relations, seven pairs are m:n matching relations, and 20 pairs are mismatches. The active learning approach is adopted to continuously optimize the classifier, and the number of active learning iterations is set at 10. Table 1 shows the initial training samples, where SOURCEID and TARGETID are the element identifier codes of the small-scale and large-scale residential areas, respectively. The column headings LOCAL, ORIEN, AREA, SHAPE, and SUR represent the five feature similarities, i.e., position, orientation, area, shape, and surroundings, respectively. The column heading RESULT shows the classification result of manual recognition, in which 1 represents a match and 0 represents a mismatch. The uncertainty interval of classification confidence is set at [0.1, 0.9]. Classification results whose output falls within this interval are regarded as doubtful, because the rate of wrong classification is higher for such values. When the classifier outputs for unlabeled samples fall within the uncertainty interval, the 10 samples with the lowest reliability are selected and manually labeled as either "match" or "mismatch". The manually labeled samples are added to the training data for retraining, and a new classifier is formed. A test is then carried out with the test set. This procedure is repeated until the classification result converges, and the final classifier is obtained. Following this, the feature similarities of the candidate matching pairs selected in the buffer are input into the classifier to obtain the binary classification result. Finally, the final matching object is determined according to the classification reliability.
The experimental results are evaluated by comparing the results of manual matching by professional cartographers with the automatically matched results. The objects that are labeled as matched by both manual matching and automatic matching are TP (true positives); those that are matched by manual matching but not recognized by automatic matching are NP (missed matches, i.e., false negatives); and those that are identified as matched by automatic matching but not by manual matching are FP (false positives). The evaluation indicators are precision and the recall rate. Equation (15) is utilized to calculate precision, and Equation (16) is used to obtain the recall rate. The F1 value is introduced as the harmonic mean of precision and recall and is calculated with Equation (17).
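The query step of the active learning loop described above can be sketched in Python. The sketch assumes that "lowest reliability" means the classifier output closest to 0.5, which is an interpretation rather than something the text states; the function name `select_queries` is hypothetical, and retraining the RVM itself is outside the sketch.

```python
from typing import List, Sequence


def select_queries(outputs: Sequence[float], low: float = 0.1,
                   high: float = 0.9, k: int = 10) -> List[int]:
    """Return indices of up to k unlabeled samples to label manually.

    outputs: RVM outputs in [0, 1] for the unlabeled samples.
    Only samples inside the uncertainty interval [low, high] qualify.
    """
    uncertain = [i for i, y in enumerate(outputs) if low <= y <= high]
    # Assumption: the least reliable outputs are those nearest to 0.5.
    uncertain.sort(key=lambda i: abs(outputs[i] - 0.5))
    return uncertain[:k]
```

In each iteration, the returned samples would be labeled by an operator, appended to the labeled set, and the classifier retrained, until the accuracy on the test set stops changing.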


Merging of Residential Areas
Merging tests on selected matching samples are carried out with the merging approach proposed in the present study and with the convex hull-based merging method [26]. The results are shown in Table 2 and Figures 10-13. The results reveal that (1) the proposed method obtains a higher average similarity than the convex hull-based merging method; and (2) the position, area, and shape similarity values are more stable (flatter) with the proposed method. Thus, compared with the traditional convex hull-based merging method, the proposed method, which converts one-to-many and many-to-many relations into one-to-one relations, is better suited to similarity measurement, and the selected features and measurements are more applicable to the matching of multi-scale candidates.
Figure 14 displays examples of the merging tests. Figure 14a,b show the residential areas to be matched at different scales. Visually, they match. However, when automatic identification is performed and the convex hull-based method is used to merge Figure 14a into Figure 14c, the results are quite different geometrically. In contrast, when the proposed method is used to merge Figure 14a and obtain Figure 14d, the geometric similarity between Figure 14d and Figure 14a is higher than that between Figure 14c and Figure 14a.

Feature Similarity Measure
In terms of the similarity measurement of residential areas (Figure 15), the former SMBR measuring method [26], which calculates the long-axis direction of the SMBR of each entity, recognizes the direction of the merged entity of No. 2625 and No. 2626 as being the same as the direction of No. 2629, and both as similar to the direction of residential area No. 729. However, the proposed measurement method yields a direction similarity of 0 between No. 2629 and No. 729, which means that they are in opposite directions, whereas the similarity between the merged entity of No. 2625 and No. 2626 and No. 729 is 1, which is consistent with human perception.



Result Comparison of the Matching Methods
In the process of constructing the classifier using the RVM and active learning, we counted the number of correctly classified samples among the 151 test samples. As shown in Figure 16, the number of correctly classified test samples gradually increases with the number of iterations, and the classification results are stable after eight iterations. With the same number of labeled samples, passive learning (in which samples are randomly selected from the sample set) yields lower classification accuracy on the test samples than active learning. Figure 16 thus shows that active learning can achieve better classification results with fewer labeled samples.
The results of the experiments are shown in Table 3. With the proposed RVM applied to the selected features, the matching precision is 92.1% and the recall rate is 91.8%. Compared with other methods, the RVM shows a distinct advantage in successful matching. The buffer-based overlay method [8] is commonly used for matching data of the same scale; in multi-scale data, the shifting, merging, simplifying, and other operations of cartographic generalization result in a low overlapping rate between buffer areas of different scales. In addition, the overlapping threshold is difficult to determine. Therefore, the accuracy of this method is relatively low. With the features selected in the present study, the weighted feature similarity [15], SVM [25], and RVM methods all achieve higher matching accuracy than with the features selected in Zhang [25]. For the weighted matching method [15] based on feature similarities, the feature weights and matching thresholds have a great impact on accuracy and are difficult to determine manually; thus, the rate of successful matching is not high. The SVM [25] algorithm avoids the manual setting of feature weights and matching thresholds and is suitable for two-class problems; however, because multiple matches commonly exist among multi-scale objects, it is unable to further identify the best match pair and may judge some mismatched objects as being matched. In the present study, active learning reduces the workload of labeling samples manually while maintaining high classification accuracy. Moreover, the output of the RVM can be used to further resolve multiple matching relations. Therefore, the proposed method displays a higher rate of successful matching and a smaller manual intervention workload than the other methods.
The results of matching are shown in Figure 17. Figure 17c demonstrates the matching of simple objects, while Figure 17b illustrates the matching of complicated objects. In this figure, entities outlined with a gray solid line are the large-scale data, entities outlined with a blue-gray line are the corresponding small-scale data, and the red solid lines indicate the matching relations. […] 4, the proposed method in the present study compares the classifier output values and selects the combination with the highest reliability, Nos. 1782 and 1783, as the final matching objects, which is consistent with the manual recognition result.
The experimental results of the proposed method are calculated for three matching types: 1:1, 1:m, and m:n. As shown in Figure 19, the matching precision of the 1:1 type is the highest, reaching 95.5%, with a recall rate of 96% and an F1 value (the harmonic mean of precision and recall) of 95.7%. Matching of the 1:m type requires several merging operations, and merging can produce complicated shapes, such as shapes with cavities, which makes the similarity more difficult to measure. The accuracy of the 1:m type is therefore lower than that of the 1:1 type (precision = 91.9%, recall rate = 91.4%, and F1 = 91.6%). The m:n type is small in quantity and more complicated. The large number of candidate matching entities might lead to some mistakes in the selection of candidate matching objects, and, at the same time, the measurement may be influenced by complicated shapes. Its matching precision is 82.2%, the recall rate is 83.3%, and the F1 value is 82.7%.
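The reported F1 values can be cross-checked from the reported precision and recall. The sketch below uses the standard definitions of precision, recall, and F1; Equations (15) to (17) are not reproduced in this text, so their exact form is assumed, and the count called NP in the text (matches missed by the automatic method) is passed as `missed`.

```python
def precision(tp: int, fp: int) -> float:
    """Share of automatically matched pairs that are correct."""
    return tp / (tp + fp)


def recall(tp: int, missed: int) -> float:
    """Share of manually matched pairs that were found automatically."""
    return tp / (tp + missed)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r (same units as inputs)."""
    return 2 * p * r / (p + r)


# Precision and recall (in percent) reported for the three matching types.
reported = {"1:1": (95.5, 96.0), "1:m": (91.9, 91.4), "m:n": (82.2, 83.3)}
for match_type, (p, r) in reported.items():
    print(match_type, round(f1_score(p, r), 2))
```

Under these definitions, the computed harmonic means agree with the reported F1 values of 95.7%, 91.6%, and 82.7% to within rounding.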


Conclusions
Multi-scale object matching is the key technology used in cascading updates and fusion of multi-scale spatial data. This study presents a multi-scale residential area matching method based on the RVM algorithm and active learning. It proposes a Delaunay triangulation-based method that merges entities without simplifying them, which converts the one-to-many or many-to-many relationships in the matching of residential areas into one-to-one relationships, thereby facilitating the measurement of geometric similarity. According to the characteristics of multi-scale area objects, the five characteristics of position, area, shape, orientation, and surroundings are selected for the similarity measurements. The orientation similarity measurement is improved by using the histogram of area projection, and a grid-based method for measuring surroundings similarity is designed. The RVM classifier avoids manual work for determining weights and threshold values. The active learning strategy achieves reasonable classification results with a small number of labeled samples, which reduces the work of labeling samples manually. This work enhances the automation and intelligence of multi-scale spatial entity matching.
The matching experiment with 1:5000 scale and 1:25,000 scale residential areas shows that the proposed method has clear advantages in entity merging, similarity measuring, and matching compared with other methods. The overall matching precision exceeds 90%, with the 1:1 type being the highest and the other two types (1:m and m:n) also having high matching precision. However, the method still needs further improvement: (1) the measurement of shape similarity needs further development to be suitable for area entities that have extremely complex shapes (e.g., an area entity with many cavities); and (2) when the values of m and n are relatively large in 1:m and m:n matching relations, more candidates to be matched are generated, which requires a longer processing time. Therefore, the selection process of matching groups needs further enhancement to increase efficiency.

Figure 4. Rotation of object: (a) is rotated counterclockwise to obtain (b), and (c) is rotated counterclockwise to obtain (d).

Figure 5. Projective area histograms: (a) Figure 4b area projective histogram as per x-axis; (b) Figure 4b area projective histogram as per y-axis; (c) Figure 4d area projective histogram as per x-axis, and (d) Figure 4d area projective histogram as per y-axis.

Figure 6. Grid-based surroundings similarity: (a) grid constructed by small-scale data; and (b) grid constructed by large-scale data.

Figure 7. RVM and active learning to construct the classifier.

Figure 8. Example of multiple matching.

ISPRS Int. J. Geo-Inf. 2017, 6, 70

in the buffer is input into the classifier to obtain the binary classification result. Eventually, the final matching object is determined according to the classification reliability.

Figure 10. Position similarity values between the matching pair using the two merging approaches.

Figure 11. Orientation similarity values between the matching pair using the two merging approaches.

Figure 12. Area similarity values between the matching pair using the two merging approaches.

Figure 13. Shape similarity values between the matching pair using the two merging approaches.
Figure 14 displays examples of merging tests of features. Figure 14a,b are the residential areas to be matched at different scales. They match on visual inspection, but when automatic identification is performed by program and the convex hull-based method is used to merge Figure 14a into Figure 14c, we observe that they are quite different geometrically. However, when the proposed method is used to merge Figure 14a and obtain Figure 14d, we find that the geometric similarity between Figure 14d and Figure 14a is higher than that between Figure 14c and Figure 14a.
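One reason a convex hull-based merge can differ geometrically from the objects it merges is that the hull encloses empty space between them. The sketch below illustrates this for two hypothetical rectangular buildings arranged in an L shape, using Andrew's monotone chain for the hull and the shoelace formula for area. It illustrates only the convex hull baseline, not the Delaunay triangulation-based merging proposed in this study, and the coordinates and the area ratio used as a similarity indicator are assumptions for illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Two hypothetical adjacent buildings forming an L shape (no overlap).
rect_a = [(0, 0), (4, 0), (4, 1), (0, 1)]
rect_b = [(0, 1), (1, 1), (1, 4), (0, 4)]

group_area = polygon_area(rect_a) + polygon_area(rect_b)
hull = convex_hull(rect_a + rect_b)
hull_area = polygon_area(hull)

# Assumed indicator: ratio of true group area to merged-outline area.
area_ratio = group_area / hull_area
print(group_area, hull_area, round(area_ratio, 3))
```

In this example the hull area is well above the combined building area, so any area- or shape-based similarity computed on the hull is distorted by the enclosed empty space.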

Figure 14. Examples of feature merging: (a) large-scale residential areas; (b) small-scale residential areas; (c) merging effect of the convex hull-based method; and (d) merging effect of the proposed method.

Figure 15. An example of the metric of direction similarity index.

Figure 16. The statistics of the correct number of categories in test samples.

Figure 17. Display of matching effect: (a) global display; (b) matching effect of complicated objects; (c) matching effect of simple objects.

Figure 18 is an example of multiple matching. With the use of a buffer, small-scale residential area No. 509 (the blue dashed border) finds three large-scale candidate elements, Nos. 1782, 1783, and 1784 (the gray solid borders). If three combinations are judged as matches by the classifier, then they are all matched. The SVM method usually misjudges No. 1784 as one of the matching objects of No. 509. As shown in Table 4, the proposed method compares the classifier output values and selects the combination with the highest reliability, Nos. 1782 and 1783, as the final matching objects, which is consistent with the manual recognition result.
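The selection of the most reliable combination can be sketched as follows. The sketch assumes that every non-empty combination of the buffered candidates is scored by the classifier and that the highest-scoring combination is kept. The function name `best_combination` is hypothetical, and the reliability values are invented for illustration; they are not the values reported in Table 4.

```python
from itertools import combinations


def best_combination(candidates, reliability):
    """Score every non-empty combination of candidates and return the
    combination with the highest reliability together with its score."""
    best, best_score = None, float("-inf")
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            score = reliability(combo)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score


# Hypothetical classifier reliabilities (NOT the values from Table 4).
scores = {
    (1782,): 0.41, (1783,): 0.38, (1784,): 0.12,
    (1782, 1783): 0.93, (1782, 1784): 0.35, (1783, 1784): 0.30,
    (1782, 1783, 1784): 0.71,
}

# Candidates found in the buffer of small-scale object No. 509.
match, score = best_combination([1782, 1783, 1784], lambda c: scores[c])
print(match, score)
```

Because the number of combinations grows exponentially with the number of candidates, exhaustive scoring of this kind is practical only for small candidate sets, which is consistent with the processing-time concern noted for large m and n.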

Figure 18. An example of multiple matching.

Figure 19. Calculation of accuracy of various matching types with the proposed method.

Table 1. Example of initial training samples.

Table 2. Average similarity calculated by different merging methods.

Table 3. The statistical evaluation of the proposed method and other methods.

Table 4. The values of the matching reliability.
