## 1. Introduction

Map generalization is a procedure that utilizes transformation operations such as elimination, amalgamation, displacement, and simplification to solve spatial conflicts and derive smaller-scale maps from larger-scale maps [1]. Map generalization is an important means of modelling and understanding geographical phenomena [2]. When updating multi-representation databases, we often implement map generalization in order to propagate updates from the source scale to high-level scales [3,4]. The process of map generalization can be decomposed into model generalization and cartographic generalization [5,6]. Model generalization aims to derive higher-level abstractions from a primary geographic database without considering the artistry for visualization, which can be viewed as a preprocessing step prior to visualization via cartographic generalization. Amalgamation that fuses buildings within a cluster into a single object for the next higher-level representation is an essential operation of model generalization for map production [7], and attracts scientific interest from cartographic researchers [8,9,10].

The process of amalgamation is challenging for urban environments where buildings with a complex spatial distribution need to be merged. Specifically, proper amalgamation needs to address both the identification of building clusters (i.e., the grouping of individual buildings into distinct clusters by analyzing the spatial relations between buildings) [8,11] and cartographic constraint requirements (e.g., maintaining positional accuracy, retaining the balance of the whole area, squaring shapes, and avoiding short gap distances) [10]. Nevertheless, a variety of methods have been developed for specific sub-problems of amalgamation. For building pattern recognition, significant achievements have already been made [12,13,14,15,16,17].

Previous research on the amalgamation of building clusters can be categorized into two types according to the data structure being processed. The first is developed for raster data (if the source data is vector data, it is converted to raster data), including the method using morphologic operators (i.e., expansion and erosion) [18,19,20], and the method that scans raster data in two perpendicular directions to fill gaps between buildings in order to implement amalgamation [10,21]. These methods are difficult to fully control, as the solutions are designed for all scenarios without local tuning [14], and are inappropriate for other generalization operations (e.g., simplification and rectangularity). In addition, they result in a loss of positional accuracy during the conversion between raster and vector data. The second type of amalgamation method is developed for vector data and has been extensively studied. Strategies employed by these methods can be categorized into four types: aggregation by displacing, aggregation by flooding, aggregation by sampling, and aggregation by connecting objects [11]. In this paper we explore the last of these strategies, which uses the triangles between objects as connectors to merge them and is by far the most commonly used methodology [22]. Triangles from the Delaunay triangulation provide explicit spatial relationships between features, and can be used to guide the amalgamation process [3,7,8,9,23]. When applying the Delaunay triangulation to aggregate buildings within clusters, it is critical to determine which triangles are to be removed or maintained. The measured parameters include the position, angle, and height of a triangle and the mean length of its three edges [7,24]. As a result, these methods often involve a considerable number of empirical thresholds for comparison. Moreover, such methods tend to employ a global and fixed amalgamating distance to aggregate buildings within a cluster, which results in some details of amalgams being lost, although these details meet cartographic constraints and are important to users.

In this paper, we present a progressive strategy for the amalgamation of building clusters based on the assumption that a building cluster recognized at the target scale may contain different levels of homogeneous subgroups according to certain variable conditions (e.g., mean distance) and that a hierarchy of that cluster can be derived. Thus, by decomposing a building cluster into scaled homogeneous subgroups (i.e., parent clusters represent a coarse-scale grouping result, while their subgroups correspond to a fine-scale grouping result) to construct a hierarchy, and by progressively amalgamating subgroups from the bottom level to the highest level of the hierarchy, it is possible to obtain an amalgam with better preserved details that satisfy cartographic constraints and may be of importance for users. Moreover, this avoids the need for a considerable number of empirical parameters. To obtain such amalgams, the following requirements need to be considered: (1) as a prerequisite of amalgamation, an appropriate grouping method to detect building clusters must be found; (2) it must be determined which criteria define the homogeneity of subgroups for decomposing the building clusters; and (3) building subgroups must be amalgamated progressively without significant modification of geometry.
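The idea of a hierarchy of scaled subgroups can be illustrated with a minimal sketch. It assumes, purely for illustration, that subgroups are formed by linking buildings whose gap distance does not exceed a threshold (single linkage); this is a stand-in and not necessarily the homogeneity criterion used by the proposed method. The function names and the gap values are hypothetical.

```python
def group_at(n, gaps, threshold):
    """Group buildings 0..n-1 by linking pairs whose gap is <= threshold."""
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), gap in gaps.items():
        if gap <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def build_hierarchy(n, gaps):
    """Return [(threshold, groups)] ordered from the finest to the coarsest level."""
    levels = []
    for t in sorted(set(gaps.values())):
        groups = group_at(n, gaps, t)
        # Keep a threshold only if it changes the grouping.
        if not levels or groups != levels[-1][1]:
            levels.append((t, groups))
    return levels


# Hypothetical gap distances (in metres) between four buildings.
gaps = {(0, 1): 0.5, (1, 2): 1.2, (0, 2): 1.5,
        (2, 3): 2.8, (1, 3): 3.5, (0, 3): 4.0}
hierarchy = build_hierarchy(4, gaps)
for threshold, groups in hierarchy:
    print(threshold, groups)
```

A progressive amalgamation would then walk this list from the first (finest) entry to the last (coarsest) one, merging the buildings of each subgroup with that level's threshold as the local tolerance rather than one global distance.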

## 4. Results and Discussion

All experiments were performed on a personal computer with an Intel(R) Core(TM) i5-4460 CPU (central processing unit) and 8 GB of memory. All algorithms proposed in Section 2 were implemented in C# on Microsoft Windows 7 (×64). Component libraries and tool libraries of ArcGIS Engine 10.1 were used to develop the related algorithms.

Figure 5 presents four 1:10,000 generalized results derived by different methods. Compared to the manual generalization results (Figure 5a), some differences can be identified in the generalized results derived by automatic methods (Figure 5b–d). For example, discernible details of the generalized results derived by the proposed method were maintained as much as possible, whereas those of the results generalized by group maximum distance were eliminated (built-up area marked with black rectangles in Figure 5c). These details make the contours of generalized objects more similar to the outlines of the corresponding building clusters. In addition, some open spaces connecting to roads are preserved in the generalized results derived by the proposed method, while the method using group maximum distance fills these open spaces during the amalgamation (built-up area marked with black circles in Figure 5c). Because it performs no building grouping, the ArcMap tool merges the buildings within multiple distinct clusters into one big object (built-up area colored red in Figure 5d), demonstrating that building grouping is essential to the amalgamation of building clusters. Accordingly, the subsequent analysis mainly centers on the generalization results derived by the other two automatic methods.

Table 1 provides a brief summary of the generalized objects derived by the different methods. The number of generalized objects is the same for all methods except the ArcMap tool, which often merges distinct building clusters into a large one (e.g., the objects marked in red in block 1 of Figure 5d). Moreover, the minimum distance between neighboring objects derived by the ArcMap tool is too small (0.36 m) for them to be distinguished at the target scale. Almost all of the smallest objects derived by the different methods satisfy the smallest area constraint of not less than 200 m² on the ground. The results derived by the proposed method are much better than those of the Max distance method in terms of root mean squared errors (RMSEs).

Figure 6 displays the number of levels within the hierarchy of each building cluster. To facilitate this discussion, a cluster is identified with its hierarchy. Visually, the higher the level number, the fewer the corresponding building clusters. Moreover, most building clusters recognized at the target scale only have one level (i.e., they require no further partitioning and have no subgroup). At first glance, the outlines of building clusters with more than one level are more complex than those of building clusters with only one level. However, upon careful examination we find that the generalized results (Figure 5) derived from the building clusters with one level are almost the same, even though they were amalgamated by different methods. Accordingly, any difference among the generalized results derived by different methods is likely to come from the building clusters whose hierarchies include more than one level. For example, the two generalized objects (Figure 5b,c) generated from the same building cluster with four levels in block 0 (Figure 6) differ in detail. This is analyzed further in the paragraphs that follow.

Figure 7 plots the mean difference (i.e., the difference between the total building area and the total area of generalized objects, normalized by the total building area) against the number of levels, which further explains the differences between the generalized results generated by the different methods. Overall, the mean difference grows with the number of levels for both methods. For building clusters with one level, the mean differences of generalized objects for the two methods are the same and are at their lowest, demonstrating that the gaps between the two results lie in those building clusters that have more than one level. When building clusters are complex (i.e., have multiple levels), the gap between the two methods widens, indicating that the areas of generalized results derived by the Max distance method are more prone to being out of balance in comparison to the original buildings. However, during the generalization of building clusters, it is necessary to retain the balance of building representation areas between scales [10]. Accordingly, we should pay more attention to the complex clusters with multiple levels, and may amalgamate them progressively when generalizing maps.
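As a rough illustration of the mean difference measure, the sketch below computes the normalized area difference per cluster and averages it by number of levels. Taking the absolute value of the difference is an assumption (the text does not state the sign convention), and the dictionary keys and sample areas are hypothetical.

```python
from collections import defaultdict


def mean_difference_by_level(clusters):
    """Average the normalized area difference over clusters with the same level count."""
    by_level = defaultdict(list)
    for c in clusters:
        total = sum(c["building_areas"])
        # Assumed convention: absolute difference, normalized by total building area.
        by_level[c["levels"]].append(abs(c["generalized_area"] - total) / total)
    return {lvl: sum(v) / len(v) for lvl, v in sorted(by_level.items())}


# Hypothetical clusters; areas are in square metres.
clusters = [
    {"levels": 1, "building_areas": [400.0, 600.0], "generalized_area": 1100.0},
    {"levels": 1, "building_areas": [500.0, 500.0], "generalized_area": 1200.0},
    {"levels": 3, "building_areas": [300.0, 700.0, 1000.0], "generalized_area": 2600.0},
]
mean_diff = mean_difference_by_level(clusters)
print(mean_diff)
```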

Figure 8 plots the areas of generalized objects against the areas of the corresponding buildings within clusters. Note that only the building clusters with more than one level are taken into account. Overall, the areas of generalized objects and the areas of corresponding clusters agree well, with coefficients of determination (R²) of 0.983 and 0.982 for the proposed and Max distance methods, respectively. Their root mean squared errors (RMSEs) are 358 and 367 square meters, respectively, which are less than the minimum whole area of 600 m², demonstrating that the proposed method is better than the Max distance method, and that the strategy of progressive amalgamation is appropriate for map generalization.
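The two statistics reported for Figure 8 can be reproduced in outline as follows. This sketch assumes an ordinary least-squares line fitted to generalized area as a function of building area, with the RMSE taken over the residuals of that line; the text does not say exactly how the RMSE was computed, so that choice and the function names are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(x, y):
    """Return (R squared, RMSE of the residuals) for the fitted line."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical areas (square metres): buildings in a cluster vs. generalized object.
building_area = [1000.0, 2000.0, 3000.0]
generalized_area = [1000.0, 2000.0, 4000.0]
r2, rmse = r2_and_rmse(building_area, generalized_area)
print(r2, rmse)
```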

Parts of map objects should have a minimal size to be clearly legible, which is an important legibility constraint in map generalization [36]. Figure 9 plots the percentage of polygon line segments of each length. A small portion of line segments are shorter than three meters, indicating that both methods need improvement. These defects of generalized objects may result from the rectangularity operator that is performed on post-simplification objects. However, it is apparent that the presented method still performed better than the Max distance method. In addition, from Figure 8 we can see that polygons derived by the proposed method have a lower deviation of the area of generalized objects from that of the original buildings. Normally, if the area of a fitted polygon must be close to that of a cluster, the more short line segments the polygon has, the better the generated polygon will fit the cluster. Thus, from the two figures we can reasonably infer that the Max distance method may produce more short edges that deviate from the outlines of building clusters. This may be because some large V-shaped concave corners of building clusters are almost filled (buildings marked with rectangles in Figure 5c) by the Max distance method. These line segments can be regarded as the “false” edges of fitting objects.
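The legibility check behind Figure 9 can be sketched as the share of polygon edges shorter than a minimum length. The three-meter value comes from the text; representing each polygon as a list of `(x, y)` vertices with an implicit closing edge, and the function names, are assumptions made for this example.

```python
import math


def segment_lengths(ring):
    """Edge lengths of a polygon given as vertices; the last vertex connects to the first."""
    return [math.dist(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def share_shorter_than(polygons, min_length=3.0):
    """Fraction of all polygon edges that are shorter than min_length (metres)."""
    lengths = [length for ring in polygons for length in segment_lengths(ring)]
    return sum(1 for length in lengths if length < min_length) / len(lengths)


# Hypothetical generalized object with a small 2 m x 2 m notch in one corner.
notched = [(0, 0), (10, 0), (10, 8), (8, 8), (8, 6), (0, 6)]
short_share = share_shorter_than([notched])
print(short_share)
```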

During the amalgamation of building clusters, the outlines of fitted polygons should be similar to the initial states of clusters. The more points of a fitted polygon that are close to the outline of a cluster, the better the fitted polygon is. Figure 10 displays the percentage of points whose buffers intersect the buildings within the corresponding clusters. Both methods achieved good results, as the percentage of points under the given buffer distance (no more than 2.3 m) is far higher than 90%. Moreover, from this figure we can infer that polygons derived by the proposed method present a higher compactness. This inference is consistent with the above analysis that polygons derived by the proposed method have a lower RMSE in the area comparison (Figure 8).
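One way to read the measure in Figure 10 is that a vertex of the fitted polygon counts as close when its distance to the nearest building is no more than the buffer distance. The sketch below follows that reading with plain point-to-polygon distances; the reading itself, the function names, and the sample geometry are assumptions, while the 2.3 m buffer comes from the text.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside(p, ring):
    """Ray-casting point-in-polygon test (boundary points may fall on either side)."""
    x, y = p
    result = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result


def distance_to_polygon(p, ring):
    """Zero if p lies inside the polygon, else the distance to its nearest edge."""
    if inside(p, ring):
        return 0.0
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % len(ring)])
               for i in range(len(ring)))


def share_within_buffer(vertices, buildings, buffer_distance):
    """Fraction of vertices whose buffer of the given radius reaches a building."""
    hits = sum(1 for v in vertices
               if min(distance_to_polygon(v, b) for b in buildings) <= buffer_distance)
    return hits / len(vertices)


# Hypothetical cluster of two square buildings and a fitted polygon around them.
buildings = [[(0, 0), (4, 0), (4, 4), (0, 4)], [(6, 0), (10, 0), (10, 4), (6, 4)]]
fitted = [(0, 0), (10, 0), (10, 4), (5, 7), (0, 4)]
close_share = share_within_buffer(fitted, buildings, 2.3)
print(close_share)
```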

As a preprocessing step prior to amalgamation, building grouping is essential to the effectiveness of the proposed method. Figure 11 shows the grouping results detected at the scale of 1:10,000 for the tested data. Buildings marked in sky blue with red outlines were grouped to the same clusters, whereas erroneous building clusters are marked in green with blue outlines. Visually, most buildings were correctly grouped in terms of patterns (e.g., linear pattern, L-shaped pattern, and high-density pattern), providing good preparation for the amalgamation of building clusters and demonstrating that the grouping method is effective in recognizing different building group patterns. Overall, the grouping method is able to detect 96.88% of the building group patterns correctly.

Since building clusters were decomposed into various scaling subgroups and progressive generalization was carried out throughout the whole continuous spectrum of subgroups, a side effect of this process is that intermediate generalized results are available. These results can form the continuous scale representations of buildings. They are called continuous scale representations because they were generated by continuous generalization [37], which leads to the representations of two adjacent scales without other intermediate generalized results. For continuous representation, it is critical to quantify the scale parameter. Here the maximal distance of each subgroup is used to quantify the scale parameter. When a value of this parameter is given, we can obtain a level of detail of buildings (a generalized result) from the clusters whose maximal distances are no more than this value. Thus, the links among building clusters, generalized objects, and scale parameter values can be stored intrinsically; these links are often missing in multi-scale representations [38].

Figure 12 and Figure 13 present an example of how these links are derived. Note that they only give an impression of the contents at different map scales and do not correspond to the impression a user would actually get when zooming in on the map. In order to display the relationship between subgroups clearly, 0.5 m is set as the Max distance of those subgroups whose buildings touch each other. There are only six scaling representations for this set of buildings. The finest scaling grouping results consist of eight building subgroups with a scale parameter value (Max distance) of less than 0.5 m, whereas the value for the coarsest scaling grouping results, composed of only the largest group, is 2.8 m. In addition, the scale of each level of detail is not fixed, but changes within a certain range, namely the scale interval (Figure 13). In other words, the level of detail will not change within a scale interval.
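The link between scale parameter values and levels of detail can be sketched as a lookup over the levels sorted by Max distance. The three-level hierarchy below is hypothetical and much smaller than the six-level example in Figure 12 and Figure 13; the function names and the half-open intervals are assumptions for illustration.

```python
from bisect import bisect_right


def level_for(levels, value):
    """Return the coarsest grouping whose Max distance is no more than value."""
    thresholds = [t for t, _ in levels]
    i = bisect_right(thresholds, value)
    # Below the smallest threshold no subgroup applies yet.
    return levels[i - 1][1] if i else None


def scale_intervals(levels):
    """Pair each threshold with the next one; the last interval is unbounded."""
    thresholds = [t for t, _ in levels]
    return list(zip(thresholds, thresholds[1:] + [float("inf")]))


# Hypothetical hierarchy: (Max distance in metres, subgroups), finest level first.
levels = [
    (0.5, [["a", "b"], ["c"], ["d"]]),
    (1.2, [["a", "b", "c"], ["d"]]),
    (2.8, [["a", "b", "c", "d"]]),
]
print(level_for(levels, 1.0))
print(scale_intervals(levels))
```

Any parameter value inside one interval returns the same grouping, which mirrors the statement that the level of detail does not change within a scale interval.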

## 5. Conclusions

This study set out to amalgamate building clusters gradually without a significant modification of geometry while preserving the details of generalized objects as much as possible under cartographic constraints. To accomplish this goal, this study proposed a progressive strategy of amalgamation based on scaling subgroups, which consists of a building grouping method, a method for deriving hierarchies of building clusters, and a progressive amalgamation algorithm.

We validated our approach on a vector dataset together with some quantitative measurements. Comparative studies first revealed that building grouping is essential to the amalgamation of building clusters. This is because the methods without building grouping would often cause the buildings within multiple distinct clusters to be merged into one large object. The research also showed that the generalized objects derived from simple building clusters that only have one level are almost the same, even though they were generalized by different methods. However, for the complex clusters that have multiple levels, the outlines of fitted polygons derived by the proposed method are more similar to those of clusters. The reason is that the proposed method progressively aggregates buildings within subgroups from the bottom level to the highest level and uses the maximum distance of each subgroup as the amalgamating tolerance in every iterative fusing process. Thus, significant modification of geometry is avoided while the details of generalized objects are preserved as much as possible during the amalgamation process. Taken together, these results suggest that we should pay more attention to the complex clusters with multiple levels and may amalgamate them in a progressive fashion when generating maps. In addition, the proposed method will prove useful in multi-scale representations, as it can generate continuous representations and provide links among building clusters, generalized objects, and scale parameter values.

Further tests are needed to improve the proposed method (e.g., testing it with more spatial datasets from different scales), and more generalization operators (e.g., displacement and typification) should be integrated in the process. More work is also needed to automatically calibrate parameters (e.g., homogeneous criteria and scale index) used in the presented strategy.