Geological map generalization driven by size constraints

: Geological maps are an important information source used in the support of activities relating to mining, earth resources, hazards, and environmental studies. Owing to the complexity of this particular map type, the process of geological map generalization has not been comprehensively addressed, and thus a complete automated system for geological map generalization is not yet available. In particular, while in other areas of map generalization constraint-based techniques have become the prevailing approach in the past two decades, generalization methods for geological maps have rarely adopted this approach. This paper seeks to fill this gap by presenting a methodology for the automation of geological map generalization that builds on size constraints (i.e., constraints that deal with the minimum area and distance relations in individual or pairs of map features). The methodology starts by modeling relevant size constraints and then uses a workflow consisting of generalization operators that respond to violations of size constraints (elimination/selection, enlargement, aggregation, and displacement) as well as algorithms to implement these operators. We show that the automation of geological map generalization is possible using constraint-based modeling, leading to improved process control compared to current approaches. However, we also show the limitations of an approach that is solely based on size constraints and identify extensions for a more complete workflow. Abstract: Geological maps are an important information source used in the support of activities relating to mining, earth resources, hazards, and environmental studies. Owing to the complexity of this particular map type, the process of geological map generalization has not been comprehensively addressed, and thus a complete automated system for geological map generalization is not yet available. In particular, while in other areas of map generalization constraint-based techniques have become the prevailing approach in the past two decades, generalization methods for geological maps have rarely adopted this approach. This paper seeks to ﬁll this gap by presenting a methodology for the automation of geological map generalization that builds on size constraints (i.e., constraints that deal with the minimum area and distance relations in individual or pairs of map features). The methodology starts by modeling relevant size constraints and then uses a workﬂow consisting of generalization operators that respond to violations of size constraints (elimination / selection, enlargement, aggregation, and displacement) as well as algorithms to implement these operators. We show that the automation of geological map generalization is possible using constraint-based modeling, leading to improved process control compared to current approaches. However, we also show the limitations of an approach that is solely based on size constraints and identify extensions for a more complete workﬂow.


Introduction
The majority of the research conducted on the automation of map generalization has thus far been dedicated to topographic maps, with a focus on the generalization of roads, boundaries, river networks, and buildings; that is, with a focus on linear objects or small area objects. The generalization of thematic map types, including categorical maps (e.g., geological, soil, or land use maps), has received less attention, perhaps since categorical maps contain polygons of potentially arbitrary shapes and sizes, rendering them more complex than typical shapes found, for instance, for buildings on a topographic map. Nevertheless, categorical maps, with geological maps as a prominent representative, are a frequent map type and require specific methods with which to automate generalization. For instance, in topographic maps, buildings are usually of a regular shape and are often arranged in a regular fashion (e.g., in linear alignments), while in categorical maps, polygon features can be of an arbitrary shape, occurring in arbitrary spatial arrangements. Merely reusing the approach and processes of topographic map generalization for categorical mapping will not provide a proper solution [1,2], as requirements for categorical and thus geological map generalization are different from those used for topographic mapping, as detailed in Section 2.
Bedrocks assist geologists in portraying the natural history of a study area and identifying associated rock formations. They also carry essential mineral resources such as coal, oil, and uranium, which are in the focus of the mining industry. Thus, a map for mining purposes highlights ancient bedrocks that may carry particular minerals, while neglecting sedimentary rocks as they are more recent [11]. Geophysicists, in turn, place more emphasis on the intrinsic characteristics of features such as porosity and permeability in rock and sediments [12].
The goal of a geological map is "to interpretively portray the spatial and temporal interactions between rocks, earth materials, and landforms at the Earth's surface" [13]. Geological materials are the igneous, metamorphic, and sedimentary rock and surficial sediments that form the landscape around us. Most geological maps use colors and labels to show areas of different geological materials, called geological units. Geological structures are the breaks and beds in the geological structures resulting from the slow but strong forces that form the world [14]. "Geological maps show the location of these structures with different types of lines. Because the Earth is complex, no two maps show the same materials and structures, and so the meaning of the colors, labels, and lines is explained on each map" [15].
Geological maps consist of diverse patterns formed by a fabric of polygons, plus additional linear objects (e.g., fault lines) and point objects (e.g., wells) which however are not of concern here. The polygons can be described by spatial, structural, and semantic properties to evaluate similarities or differences and thus infer spatial patterns relating to the perceptual structure or arrangement of polygon features on the map [16]. Figure 1 shows some sample excerpts of geological maps, ordered by increasing geometric and graphical complexity. In the simple map extract of Figure 1a, few geological units are involved, with relatively simple shapes, which could be generalized by merely using simplification operators. The next level of complexity comprises many small polygons of the same or similar geological units (Figure 1b), which may be generalized by removing or alternatively aggregating units or sub-units into a single group. Another level consists of a series of elongated polygons of the same unit embedded in, and possibly crossing, other units (Figure 1c), where a cartographer may recommend merging neighboring units while trying to maintain their overall arrangement (i.e., using the so-called typification operator). Another complicated form found in geological maps are tree-like, dendritic forms, which were created at a later stage of the quaternary period by rivers and streams carrying sediments and other minerals. This type also defines the position of a river system (Figure 1d). In this case, a possible solution is to replace several branches with a smaller number of simplified tree branches. Figure 1e shows that various kinds of units consisting of small and large/long and narrow polygons and tree-like structures may also co-exist, rendering the generalization process even more challenging. The generalization of such complex fabrics requires making multiple, interrelated (and possibly conflicting) generalization decisions.
The few examples in Figure 1 illustrate that the polygonal layer of geological maps consists of a far greater variability in category, size, shape, boundary sinuosity, and topological (in particular containment) relations of the concerned polygons than can typically be found on topographic maps. The spatial arrangement of polygons in geological maps can take many different forms, as is noticeable in Figure 1, although even in the complex map of Figure 1e we can perceive alignments and clusters of polygons. The key feature classes in topographic maps are predominantly anthropogenic and hence tend to have more regular shapes and a lesser degree of variability, and they are often arranged in regular alignments (e.g., grid street networks or straight alignments of buildings). "Natural" feature classes, such as land cover, in topographic maps usually are restricted to few categories (e.g., woodlands, waterbodies, built-up areas) and are of secondary priority. Hence, we would argue that while the same operators (elimination, simplification, aggregation, typification displacement, etc.) are valid in both domains, different generalization algorithms, or at least different combinations of algorithms, have to be used to adapt to the peculiarities of geological maps or more generally categorical maps.
The next section offers a review of various generalization methods that have been explicitly proposed to deal with the peculiarities of the geological maps listed above.

Related Work
The earliest significant attempt at specifically generalizing geological maps was made by the authors of [17], who tried to automate the generalization of a 1:250,000 scale geological map from a 1:50,000 scale source map (a product of the British Geological Survey (BGS)), using the conceptual model previously suggested in [18]. While the results obtained were encouraging, the BGS concluded that the strategy still required intervention by the human operator, rendering it less flexible and more subjective. Importantly, however, this early work highlighted the significance of basing the generalization process upon an understanding of the essential structures and patterns inherent to the source map, a task that is of course not unique to the case of geological maps [18,19].
The authors of [20] presented a conceptual workflow model dedicated to the semi-automated generalization of thematic maps with three main phases: structural analysis, generalization, and visualization. Structural analysis (or "structure recognition" according to [18]) was deemed especially crucial, as once all relevant structures present in a map are made known, this information can support the decision of "when and how to generalize" [21]. The second step of their conceptual model consisted of constraint modeling in a multi-agent system, aiming to build an objective and flexible workflow.
Inspired by the aforementioned conceptual model [20], the author of [22] developed a generalization workflow based on ArcGIS tools. A sample geological map at a 1:24,000 scale was generalized to three target scales, namely 1:50,000, 1:100,000, and 1:250,000. The study results were compared with the corresponding U.S. Geological Survey (USGS) geological maps and helped to summarize the strengths and limitations of the tools available for generalization at the time within ArcGIS.
Another experiment on the effectiveness of ArcGIS was conducted by Smirnoff et al. [23], where a cellular automata (CA) approach was developed specifically for geological map generalization using ArcGIS tools as a basis. When comparing their results with those of a process using the generalization tools directly available in ArcGIS, they concluded that the cell-based, or cellular, automata model had essential advantages for the automated generalization of geological maps. In more recent research [24], the ArcGIS toolbox called "GeoScaler" was tested on surficial and bedrock maps. The results were evaluated and found adequate. Also, repeatable results were obtained by maintaining some amount of human intervention in the process. Very importantly, since the GeoScaler toolbox was made freely available, this methodology can be used in the practice of geological mapping and has thus defined the state of the art of generalization tools for geological map generalization, which persists today. However, the approach does not consider the individual, local properties of geological features such as the size, shape, and orientation of the polygons, or the distance between them, which is crucial for carrying over the critical patterns of the source map to the derived map. Moreover, as most geological map data exist in vector format, conversion from vector to raster format (and possibly back to vector format again) causes additional, uncontrolled loss of data accuracy.
Hence, a vector-based approach that uses a combination of generalization operators that can be adaptively applied depending on the local situation seems more appropriate. In this vein, the authors of [25] proposed algorithms for several operators of polygon generalization, including elimination, enlargement, aggregation, and displacement, based on a rolling ball principle, conforming Delaunay triangulation, and skeleton approaches. The authors of [8,26] proposed a list of constraints and an agent-based process for the generalization of polygonal maps, extending earlier works by the authors of [4,5,9].
Probably the most comprehensive work regarding constraint modeling for geological maps was presented by Edwardes and Mackaness [27], who developed and illustrated a rich set of constraints and proposed ways to represent these in formal language. While conceptually intriguing, the proposed set of constraints was unfortunately not linked to particular generalization algorithms and was not implemented. The study also demonstrated the vast complexity that is involved when trying to model the constraints governing geological maps comprehensively and cast these into a computer-based process.
Müller and Wang [28] proposed an automated methodology for area patch generalization that encompassed elimination, enlargement/contraction, merging, and displacement operators. They addressed a problem similar to that dealt with in this paper, and they also used a similar set of operators. However, their approach considered all polygons as semantically identical. In contrast, geological maps easily consist of over 20 rock types, demanding simultaneous consideration of geometrical as well as attribute properties of the polygons. Their approach also leaves rather little flexibility for modifying the control parameters, as it lacks the capability of testing different solutions for a given conflict.
As argued in Section 1, the best way to detect cartographic conflicts and formalize and control generalization algorithms for resolving such situations is by using constraints, as has been shown in topographic map generalization. In this paper, we seek to demonstrate the application of the constraint-based paradigm to the case of geological maps. As shown in Section 2 and another study [27], there is an almost infinite complexity involved when trying to solve geological map generalization comprehensively. Thus, we focus on a particular generalization problem: small, free-standing polygons (area patches in the terminology of [28]), which represent a frequent case in geological maps (Figure 1b,d,e) and other types of categorical maps, such as soil maps. Since such small polygons are prone to legibility problems in scale reduction but also often represent geological units of superior economic value (Section 4.4.2), we focus on size constraints, which are simple yet allow many of the problems tied to small polygons to be addressed. A limited set of rather simple constraints alleviates tracing the effects of using a constraint-based approach in polygonal map generalization.

Overall Workflow
An overview of the proposed methodology is shown in Figure 2. The overall workflow starts from a large-scale, detailed map stored in a geological database (at scales of 1:10,000 to 1:25,000) to obtain generalized maps at medium scales (1:50,000 to 1:100,000), subsequently storing them again in the database. The actual generalization workflow consists of three main stages: identification of constraints, modeling of constraints, and generalization execution. While this part of the workflow shares some similarity with the model presented in [29], the focus here is simply on representing the sequence of the main stages. In the initial stage (Section 4.2), the relevant cartographic constraints governing geological maps are identified, in our case size constraints related to the generalization of small area patches. This is followed by the stage of constraint modeling (Section 4.3), which aims to formalize the said constraints for geological map generalization. The third and main stage is the actual generalization process (Section 4.4), which takes the previously modeled constraints and uses them to orchestrate the application of different generalization operators-elimination, enlargement, aggregation, displacement-that can be used to resolve potential constraint violations. Lastly, the obtained map is evaluated based on the goal values of each representative constraint. If the map is regarded as not satisfactory by the set goal values, control is returned to the constraint modeling stage to explore alternative options, optimizing the importance and priority values of the constraints. Once the resulting map is considered satisfactory, it is stored in the database.

Identification of Constraints
As we focus on the generalization of small area patches in categorical maps, the relevant set of constraints is restricted to size constraints. Size constraints form the very starting point of the generalization process, as they deal with minimum area and distance relations in individual map features or pairs of map features. Therefore, while they may be considered to be more straightforward than, for instance, constraints dealing with groups of multiple polygons (expressing spatial pattern configurations), they are nonetheless of crucial importance as they form the foundation of all other constraints. The primary trigger of these constraints is the illegibility of map objects due to scale reduction when the limits of human visual perception are reached and polygons become too small to be visible, or are too close, causing visual coalescence. Thus, two constraints form the core of size constraints: "minimum area" (MA), and "object separation" (OS). Based on the study of relevant literature [8,30,31], we identified the following five size constraints for our generalization of geological maps:

Constraint Modeling
Once the constraints have been identified, they need to be further defined. Following the approach proposed in [8], which follows the agent-based paradigm, a complete constraint definition consists of the constraint itself, one or more goal values (minimum dimension to be attained on the target map), one or more measures that can quantify whether the constraint is met, possible plans (i.e., generalization operators triggered in case of constraint violation), and the importance or priority of generalization operators. Table 1 summarizes the five size constraints, the cause that may lead to their violation, goal values, measures, possible generalization operators, and the impact that will be caused by generalization. Figure 3 illustrates the size constraints visually.

Generalization Workflow
The minimum area constraint is fundamental to the generalization of polygonal maps, as every type of categorical map is related to the size of polygonal features. Hence, it serves as the initial trigger and decision point for the generalization process, which is modeled after a typical process of map generalization in traditional cartography [32]. The workflow in Figure 4 is represented in two main branches, but it is essentially sequential. First, the right-hand branch deals with polygons that are too small. If they do not belong to an essential geological unit, they are eliminated. Else, they are enlarged, using a different enlargement algorithm depending on their shape. After having dealt with the elimination and enlargement operators, the workflow returns to check if now every polygon is large enough. If the answer is yes, the left-hand branch is triggered, which deals with polygons larger than the minimum area limit. Here, the next size constraint is tested. When the minimum object separation distance to its neighbors is not met, if the polygons concerned are of the same category they are aggregated or else displaced to meet the object separation constraint. Else, the polygon can be output to the target map or database. This final task also includes further line simplification and smoothing operations to take care of the "consecutive vertices" and "outline granularity" constraints.
In the remainder of this section, we will introduce the different generalization operators and algorithms one by one. Note that we assume that any reclassification operations merging sub-categories into superordinate categories take place before our workflow.

Elimination
The elimination operator dissolves small polygons into background polygons. It is triggered after a polygon fails two tests: it does not meet the MA constraint, and it is considered not sufficiently important to be nevertheless retained. However, the two tests are not applied in an unbounded way on the polygons. Still, a global target value is first determined, giving the number of polygons to be eliminated (n elim ) from the source map (or conversely selected for the target map). Three different selection methods are devised, all factoring in both the MA constraint and the importance of polygons given by their category (i.e., the geological unit in our case).
The first selection method, the "Radical Law" [33], is used as a baseline, as it is often used as a reference in map generalization. Consequently, the number of polygons to be eliminated (n elim ) is determined using the Radical Law. A detailed example of how n elim is determined is given in Section 5.2. Once n elim has been established, the polygons to be removed must be identified. The polygons falling below the MA threshold are sorted in ascending order according to the sum of their normalized area and their normalized geological importance value. The importance of a polygon is a function of its geological unit, which reflects geological age. Older units, such as "amphibolite" and "pegmatite", will have more mineral deposits and hence more economic value than newly generated rocks [11]; younger rock types, such as "quaternary", are thus considered less critical than ancient "Jurassic" layers. After the sorting order has been established, the n elim smallest and least important polygons are then eliminated.
In the second selection method, called "area loss-gain selection", the polygons are first sorted by category in ascending order of their area, and polygon removal then also proceeds by category (i.e., by geological unit). In each category, as many small polygons are removed as are needed to balance the area gain of the remaining same-class, small polygons treated by the enlargement operator (next subsection). Thus, the total area of the removed polygons per category is always balanced with the area gained by the enlargement operator. For that purpose, this selection operator for each polygon pre-computes the area gain that would be added if the polygon was kept and enlarged.
The third approach, termed as "category-wise selection", applies the Radical Law separately for each category, as indicated by its name. For instance, if a map has 5 categories and according to the Radical Law 30% of polygons should be removed, then in each category 30% of the smallest polygons are eliminated. Note that this method does not take into account the area loss and gain of the polygons, yet it better balances the removal of polygons from each category than the pure Radical Law does, protecting categories consisting only of small polygons from vanishing.

Enlargement
After removing small and unimportant polygons in the elimination step, some polygons will remain that are smaller than the MA limit, but owing to their importance are flagged for retention. Therefore, the next step is to enlarge them until they reach the MA limit, and adequate readability by the human eye is ensured. Two algorithms are devised to accomplish enlargement, buffering, and scaling. Figure 5 illustrates the effect of these two algorithms: enlargement by buffering generates round polygon shapes while scaling enlarges the polygon area and maintains the initial polygon shape.
Polygon buffering is a commonly used operation in GIS and is straightforward to accomplish. However, its tendency to generate round shapes may lead to the loss of the polygon's original shape, mainly if very small polygons are involved and/or the buffer distance is significant. Hence, we recommend using buffer enlargement when only a smaller increase is necessary, and when polygons have a more elongated shape, where shape distortion is less noticeable.
The scaling algorithm can handle round polygons and extend small polygons without distortion of the shape. The algorithm loops over every vertex and calculates its distance to the point of reference, which is the polygon centroid by default. The scaled polygon is then produced by extending a line from the reference point across each initial vertex at the scaled length, with the endpoint becoming the scaled vertex. One disadvantage of the basic scaling algorithm is that when the polygon centroid is used as a reference point, topological errors may be created for non-convex polygons (Figure 5c), necessitating displacement to restore the topology. The enlarged polygon in Figure 5c is shifted from its original position to the right-hand side and below due to the scaling process. The choice between the two enlargement algorithms for each polygon is guided by an analysis of their shape using the "ipsometric quotient" (IPQ) shape index [34], used as a measure of compactness in [35]: where A denotes the area and P the perimeter of the polygon. The IPQ takes values in the interval (0,1), where values approaching 0 indicate elongated shapes, while a value of 1 indicates a circle. We used a threshold of 0.5 in our experiments to distinguish between elongated shapes (for buffering) and round shapes (for scaling), as this threshold marks the midpoint between the two extremes. The buffer width K b that needs to be added to the polygon is calculated as a function of the target area A' to be reached at the target scale (e.g., 625 m 2 at 1:50,000), the current area A, and current perimeter P The scaling factor K s that allows the algorithm to enlarge a polygon to the target area A' (e.g., 625 m 2 at 1:50,000) equals the square root from the target area A' over the current area A.

Aggregation
After execution of the enlargement operator, all remaining polygons should be sufficiently large for the target scale and thus adequately readable. However, enlargement also may give rise to another conflict: the violation of the object separation constraint, which is dealt with the aggregation and displacement operator, respectively, in this work. Both are triggered by a violation of object separation, but aggregation resolves this problem by merging polygons that are too close but also similar, while displacement moves dissimilar polygons until the minimum separability distance is again met. A conflict between features is identified by calculating the distance between any two polygons, where the polygons are closest to each other.
The similarity of polygons may be determined using several properties, such as the shape, size, orientation, and category of involved polygons [10], but in this paper, we use only the semantic similarity, that is, same category, to denote two polygons as being similar. The semantic similarity is of overriding importance: it makes only sense to aggregate polygons of same categories (or possibly of shared subcategories, if they exist).
Various algorithms have been proposed for the aggregation and typification of buildings [36,37], but for our purposes, we were more interested in an algorithm that would fill in voids between polygons that are found to be too close, akin to the algorithm in [25]. The aggregation is performed by generating a concave hull from the outline vertices of the polygons to be aggregated using the algorithm by [38]. The inset maps of Figure 12 highlight sample results of this procedure. The algorithm starts by selecting the first vertex as the one with the smallest Y value. Next, k points nearest to the current vertex are selected as candidates to be the next vertex for the output polygon. The next vertex is decided based on the biggest right-hand turn from the horizontal axis. These two steps are repeated until the selected candidate is the first vertex. Finally, the consecutive vertices found are connected by line segments to form the resulting concave polygon. Adjusting the parameter k allows for control of the degree of concavity of the resulting aggregated polygon.

Displacement
There are two reasons why the displacement operator may be activated. First, as a direct consequence of scale reduction, visually differentiating between map objects becomes harder and at some stage, impossible (an effect often termed congestion; [18]). Second, as a consequence of polygon enlargement, the object separation constraint may be violated, as enlarged polygons require more space and may start to coalesce or even overlap. In both cases, map objects should be either aggregated or displaced to the minimum separability range. In instances where polygons are densely clustered, alternative operators such as typification (which is a combination of elimination, aggregation, and displacement) may be preferred. Figure 6 illustrates the effect of the displacement operator, triggered by a violation of the object separation constraint following polygon enlargement. After the violation has been detected, polygons are moved to the minimum separability distance from each other. We devised two algorithms to implement the displacement operator: a simple pairwise displacement algorithm used when only two polygons are in conflict, and a Voronoi-based algorithm for cases involving multiple polygons. Note, however, that these could be replaced by other, more sophisticated algorithms featuring different displacement characteristics if so desired [39,40].
The pairwise displacement algorithm was inspired by [41] and considers a pair of polygons that are too close to each other to be distinguishable and must, therefore, be pushed back to the object separation distance. The algorithm finds the nearest points between the two polygons, calculates the movement vectors for each polygon, and displaces the polygons from each other (Figure 7). The size of the polygons controls the degree of displacement of each polygon: the smaller the polygons, the more they move, while larger polygons move less. In the example in Figure 7, the polygon with an area of 8666 m 2 moves only 5 meters, while the polygon with an area of 2471 m 2 is displaced 18 meters from its initial position.
If multiple polygons are in conflict, the pairwise displacement algorithm cannot guarantee that the minimum separability constraint is met everywhere, as it is unable to safeguard against possible knock-on effects when, e.g., a polygon is pushed into another, neighboring polygon that would then also need to be displaced. Therefore, conflict zones involving three or more polygons are handled and are achieved by a Voronoi-based displacement approach inspired by algorithms using Voronoi or Voronoi-like cells to control the displacement [42,43].
The Voronoi-based method is divided into three steps. First, polygons are identified that violate the object separation limit (Figure 8a). In the second step, polygons are collapsed to their centroid points, which serve as seed points for the Voronoi diagram (Figure 8b). From the perpendicular bisectors of the Voronoi polygons (i.e., the edges of the Delaunay triangulation), the displacement vectors are identified for each polygon pair. Finally, the polygons are pushed to the minimum separation distance while forcing the polygons to remain within their Voronoi polygon (Figure 8c).    The completion of the displacement operator also marks the end of the process laid out in Figure 4. This process was initially triggered by the minimum area constraint and eventually ensured that this constraint as well as the object separation constraint were fulfilled. All that is left now are the finishing touches by simplification and smoothing of the polygon outlines to ensure the "consecutive vertices" and "outline granularity" constraints are also met ("Output polygons" step in Figure 4). This step, however, uses standard algorithms with conservative parameter settings and thus does not need to be described in detail here.

Implementation
The proposed workflow uses GeoPandas 0.6.0, an open-source project that handles geospatial data in the Python programming language. Geopandas methods such as dissolving, buffering, centroid identification, convex hull, rotation, scaling, and translation are used to remove, enlarge, aggregate, and displace polygons. Moreover, the ArcPy package from Esri, Inc., is used, implementing ArcGIS toolbox capabilities such as calculating area or finding near and nearby objects. Map visualization is undertaken using Esri ArcGIS 10.6 and Matplotlib, a plotting library for the Python programming language.

Data
The data used for our experiments were taken from the "Euriowie Block (including part of Campbells Creek)" 1:25,000 scale geological map published in the year 2000 by the Geological Survey of New South Wales (NSW), Department of Mineral Resources, Australia (https://www. resourcesandgeoscience.nsw.gov.au). A portion of this map is shown in Figure 10, left. The "Euriowie Block" is part of the bigger geological block called the "Broken Hill Block", located near the mining city of Broken Hill, NSW. It contains significant mineral resources, particularly abundant and rich lead-silver-zinc (Pb-Ag-Zn) deposits. The Broken Hill Block's geology is highly complex, and historically there has always been extensive exploration activity in the area [44,45], which gave us grounds for choosing this particular map. Moreover, the map is openly available for public and research purposes and thus promotes the reproducibility of the results. The map's objective is to present a broad distribution of rock types, especially those with mineralization or stratigraphic significance. The data are of moderate to high quality, reliability, and internal consistency. It has an accuracy that is appropriate for a map of a scale of 1:25,000.
The geological map consists of 215 sub-categories of rock types or units, which in turn are part of 24 higher categories. Small-sized minerals such as the "hornblende" and "feldspar" groups make up approximately 60% of terrestrial rocks by weight. These mineral groups are mostly related to plutonic (also called intrusive) igneous rocks such as pegmatites (shown in yellow in Figure 10), which form deep under the Earth's surface and are created by the solidification of magma [46]. They are also present in many types of metamorphic rocks, such as amphibolite (shown in green in Figure 10). Following erosion, they become exposed at the Earth's surface, often in small patches, represented as small polygons on geological maps. Since the amphibolite and pegmatite units have more mineral deposits and are thus of higher economic value, they will be retained longer in the scale reduction process. Since they often form small polygons, these rock types are good candidates for dealing with minimum size constraints, which are the focus of this paper.

Elimination
In the sequence of experiments reported in Sections 5.2-5.5, we use goal values for the MA constraint of 0.5 × 0.5 mm and for the object separation constraint the value of of 0.4 mm given in [30]. The whole Euriowie map database consists of n s = 1877 polygons at the source scale of 1:25,000. With a target scale of 1:50,000, the number of polygons n MA that fall below the MA limit is 1135, given a minimum area of 0.5 × 0.5 mm on the map or 625 m 2 on the ground; (Table 1). This means that almost 60% of all polygons are smaller than what may be defined as the legibility limit. Using the "Radical Law selection" strategy for the elimination operator (Section 4.4.2), the number of polygons that should be retained, n t , can be computed as follows (where N s is the denominator of the source scale and N t the denominator of the target scale): n t = n s N s /N t = 1877 * 25, 000/50, 000 = 1327 (4) That is, according to the Radical Law [33], n t = 1327 polygons should be retained, or conversely, n elim = n sn t = 550 polygons should be eliminated. Finally, the number of polygons falling below the MA constraint that should be kept, n keep , can be obtained as n keep = n MAn elim = 585.
Thus, with the Radical Law selection strategy, n elim = 550 polygons with an area below the MA threshold should be eliminated from the map, while n keep = 585 polygons are also smaller than the MA threshold, but are deemed sufficiently important to be still kept on the map, and thus forwarded to the subsequent generalization operators such as enlargement.
To determine which polygons are eliminated, their importance value is established based on their area as well as their position in the "geological hierarchy" ( Table 2). The area values occurring among the n MA polygons smaller than the MA limit range from 7 m 2 to 625 m 2 . The geological hierarchy follows the order of geological units according to their age: the older the rocks, the more important they are considered, as they likely have more mineral content. Overall, in the Euriowie Block, 15 geological units can be found, which are assigned a value of 15 for the oldest unit and 1 for the youngest unit, respectively ( Table 2). For each polygon, its area and its importance value, respectively, are then normalized and summed to obtain the integrated importance value per polygon. After sorting in ascending order, the n elim (550) least important polygons are then terminally eliminated using the geopandas.dissolve() algorithm of GeoPandas 0.6.1, which dissolves selected polygons into those neighboring polygons, with whom they share the largest border. Figure 10, right, shows an excerpt of the Euriowie map after the elimination operation using the Radical Law selection approach compared to the original map, on the left. Examples using the other two selection strategies of the elimination operator will be presented in Section 5.6.1.

Enlargement
Those polygons that are smaller than the MA threshold but sufficiently important to be kept on the target map (n keep = 585) need to be enlarged until they reach the minimum area (Section 4.4.3). Depending on the IPQ shape index of polygons, two separate algorithms, buffering and scaling, are used for enlargement.
Overall, 234 polygons (roughly 40%) out of 585 polygons had IPQ values lower than 0.5, indicating rather elongated shapes, and were thus enlarged by buffering. Accordingly, 351 polygons (around 60%) had IPQ values above 0.5, indicating rather round shapes, and were thus enlarged by scaling. The result of the enlargement step is shown in Figure 11.

Aggregation
As the next operation, aggregation combines polygons of the same category (i.e., same geological unit) that fall within the object separation (OS) distance by applying the concave hull algorithm (Section 4.4.4). We used OS = 0.4 mm ( Table 1) corresponding to 20 meters on the ground at scale 1:50,000 and k = 3 in the concave hull algorithm to produce the result depicted in Figure 12. Overall, of the n keep = 585 polygons, 78 were aggregated, leaving 507.

Displacement
As a final step, the displacement operator is executed to relocate polygons that are too close to each other (i.e., not meeting the OS limit) from their original position to a minimum separability distance. An excerpt from the result of this operation, and thus the final generalization product, is shown in Figure 13 in comparison to the source map. Overall, 35 polygons were displaced, 24 using the pairwise displacement algorithm and 11 using the Voronoi-based approach.

Sensitivity to Parameter Settings
The operators and algorithms of the proposed methodology are controlled by a variety of parameters, all of which can take different values depending on the purpose and application domain of the map in question. Even for the various size constraints, although we defined goal values in Table 1, no canonical or "right" values exist. Hence, in three test cases, we illustrate the sensitivity of the generalization results in variations of the control parameters. Note that in all test cases, no simplification and smoothing was applied. These operators would normally be applied using conservative parameter settings and thus have little effect on the final result (Section 4.4.1).

Test Case 1
In the first test case, the influence of using the three different selection methods for the elimination operator was tested on 1877 polygons contained in the source map at 1:25,000 scale as shown in Figure 14, in the transition to a target scale of 1:50,000. This experiment used the same goal values for the MA and OS constraint as in Sections 5.2-5.5. In the "Radical Law selection", 29.29% or 563 of the source polygons were removed (Table 3). In "area loss-gain selection", only 139 were eliminated. Finally, "category-wise selection" resulted in the removal of 417 polygons. Figure 14    In the second test, we compared three different sets of thresholds for the size constraints, called "Fine-grained" (FG), "Compromise' " (CM), and "Coarse-grained" (CG) ( Table 4). The first threshold set was inspired by [30]. While their fine-grained goal values were defined for regularly shaped polygons on topographic maps, the authors noted that larger values should be used for tinted and irregularly shaped polygons, such as those typical of categorical maps. The Coarse-grained constraint set was positioned at the other end of the granularity scale and was obtained from [31], a source devoted to the symbolization of geological maps, defining the minimum size of tinted polygons to be around 2 mm 2 . The Compromise constraint set tries to balance between the Fine-grained and the Coarse-grained sets. The visual comparison of the generalization results produced using the three constraint sets is given in Figure 15. The figure also presents a close-up comparison of the three approaches and the original map.  As a final test, we used the Compromise goal values set and the Category-wise selection method, which had both been found to perform best in the above tests, to produce a series of maps at the scales of 1:50,000, 1:100,000, and 1:200,000, generalized from the original 1:25,000 scale map. The interest of this test was in exploring the scale range over which the proposed methodology could be usefully applied. The visual comparison of the resulting generalization series with the original map is presented in Figure 16. Table 5 shows the evolution of the number of polygons and their combined area of four selected geological units present in Figure 16: amphibolite, pegmatite, late Proterozoic, and the soil-sand-gravel-clay unit. Amphibolite (green polygons) and pegmatite (yellow polygons) were chosen because they represent two classes that occur both mainly as small, frequent polygons, and are of high importance due to their age. Late Proterozoic (orange-brown) and soil-sand-gravel-clay (light yellow) units, on the other hand, are young and of the youngest age, respectively, and populate the study area as few but large polygons. Table 5. Effect of scale reduction using the Compromise set of goal values for the MA and OS constraints ( Table 4). The numbers are given for the map window of Figure 16.

Scales Geological Units Total Area in m 2 (%) Polygons
Original (

Comparison with Cellular Automata Approach
In this section, we compare the results that can be obtained by the proposed methodology with those of an approach based on cellular automata (CA) [23,24], as implemented in the GeoScaler toolbox [47]. As mentioned in Section 2, this approach can probably be considered as the state of the art of geological map generalization available in practice, which is why we used it as a benchmark for comparison. The result of our constraint-based approach was generated using the CM goal values and the category-wise selection method (Figure 17, left). This result is equivalent to the map shown in Figure 15 (bottom left). Since GeoScaler is based on a completely different approach (raster-based, using a CA) as compared to our methodology (which works on individual polygons), it is hard to exactly match the parameter settings for the two methods. GeoScaler also offers a variety of postprocessing tools to further enhance the initial result of the CA algorithm. In order to be able to compare the results of the two methods on a level playing field, we decided not to make use of the GeoScaler postprocessing options and limit the parameter settings to those that find a correspondence between the two methods (i.e., parameters that relate to size constraints). GeoScaler [47] first asks the user for the source and target scale (1:25,000 and 1:50,000 in this case), which then automatically sets the scale reduction factor and size-related parameters. Furthermore, we rasterized the polygon map using a cell-size of 1 meter, the highest possible, to keep the map quality as high as possible. The CA was then run over the rasterized map with a Moore's neighborhood of radius 3 and 2 iterations of final CA generation. Following the CA processing, the raster map was converted back to a vector map, yielding the result shown in Figure 17 (right).

Discussion
This paper presented a methodology for geological map generalization using a constraint-based approach. The main objective of this paper was to demonstrate the usefulness of the constraint-based approach in map generalization when applied to geological maps, not to solve geological map generalization comprehensively (more feature classes are addressed in [23,24]). This was done against the background that the constraint-based approach has so far not been explicitly applied to the automated generalization of geological maps. Hence, the proposed methodology focused on a sub-problem of geological map generalization, though nevertheless an important one. In particular, it focused on size constraints as the primary driving force of maintaining map legibility through map generalization. Furthermore, we focused on the case of small, free-standing polygons (or area patches; [28]), as they represent a frequent case not only in geological maps but also other types of categorical maps such as soil maps or land cover maps.
Based on the experiments reported in Section 5, we can make several general observations: 1.
The proposed methodology resolves the main legibility problems associated with small polygons in a step-by-step manner. The resulting map is more readable, and map features remain distinguishable after generalization ( Figure 13).

2.
Although the goal values of the size constraints are defined globally, each polygon is treated individually to its own, specific properties rather than by a global process such as cellular automata. Hence, by consideration of the semantics of individual polygons, important polygons can be protected by enlargement; by consideration of shape properties, the shape characteristics of the individual polygons are largely maintained.

3.
The approach, by being based on testing whether the goal values of constraints are met, allows self-evaluation, and can guarantee that the legibility limits are met-but not more than that. Again, this differs from global approaches such as cellular automata [23,24].

4.
Few parameters are required to control the generalization methodology. The process is initially triggered by the MA constraint and further assisted by few additional size constraints (most importantly, the OS constraint). Once the goal values of the size constraints and additional algorithm-specific parameters have been set, the methodology operates automatically, without further human intervention. 5.
The constraints' goal values are, first of all, a function of the map legibility at the target scale, and hence allow adapting to the desired scale transition. Furthermore, the goal values also allow for controlling the overall granularity of the output map (Table 4 and Figure 15), depending on the map purpose. 6.
Despite the rather low number of constraints and control parameters, the methodology is modular and features several generalization operators, thus achieving considerable flexibility.
Because the proposed methodology builds on the universally valid concept of constraints relating to map legibility, the potential exists to apply the same approach to the generalization of other categorical maps, such as vegetation or soil maps. Furthermore, since the workflow is modular, any of the algorithms used in our methodology could be replaced by another algorithm that is perhaps better suited for the peculiarities of the given generalization problem. Thus, for instance, the aggregation or displacement algorithms used in this paper could be replaced by more elaborate algorithms if so desired.
Using a workflow approach, we showed how the proposed methodology could be implemented using the libraries of a general-purpose GIS, achieving a behavior similar that of an agent-based system, but with less implementation effort and less effort for parameter tuning.
Comparing Figures 10-13, which present the results of the individual generalization operators, it becomes obvious that elimination and aggregation have the most far-reaching effects. Elimination may retain, or alternatively destroy, both the patterns of spatial arrangements of polygons, as well as the balance of area proportions of the different categories appearing on the map. Below, we discuss the available elimination algorithms separately through Test Case 1. Aggregation is the other operator that can lead to significant alteration of the map image, in this case however by area gain and shape modification of the polygons concerned. As Figure 12 suggests, the aggregation algorithm that was implemented in our workflow can sometimes lead to undesirable shape and area changes, and would be better replaced by a typification algorithm [36,37], which however would require preceding detection of group patterns amenable to typification [10].
Elimination is the first but also the most radical step of the methodology, as some polygons are permanently removed from the target map. Selective elimination of small island polygons supported by importance values gives way to preserve polygons that are important relative to others, thus reflecting geological importance or economic value. However, in the elimination process, the connectivity between neighboring polygons is not considered, which may lead to the removal or uncontrolled dissolution of certain group patterns (e.g., clusters, alignments) of polygons. Examples of this effect can be observed comparing Figure 10, left (the original map), and Figure 10, right (the map after elimination).
To address this issue, we ran Test Case 1 on three selection methods (Section 5.6.1): Radical Law selection, area loss-gain selection, and category-wise selection. As Figure 14 and Table 3 show for the scale transition to 1:50,000, area loss-gain selection preserved the number and structure of polygons best. In contrast, Radical Law selection had the most destructive effect on polygon groups. However, when the scale transition is larger, e.g., with a target scale of 1:100,000 or 200,000, more polygons will have to be removed to balance the area gain induced by enlargement. Thus, category-wise selection seems to be the better approach for the elimination operation in greater scale reductions, as it distributes the removal of polygons evenly across each category. In general, Radical Law selection is not recommended for use in categorical map generalization unless more strongly generalized ("overgeneralized") maps are desired. One of the essential requirements of categorical map generalization is to preserve areas as much as possible across scale ranges, which can be achieved using the area loss-gain balance selection method. Based on Test Case 1, however, the category-wise selection method ultimately seems more appropriate as it evens out the elimination of polygons across categories, and also represents a good compromise. Nevertheless, regardless of which of the three selection methods is used, none can safeguard against the inadvertent destruction of group patterns, especially not at greater scale reductions. The actual generalization stage should thus be preceded by a stage devoted to the recognition of essential collective patterns present in the source map [16,18], which led us to develop a process for the recognition of polygon group patterns [10].
In Test Case 2, documented in Figure 15 and Table 4, we compared three different sets of goal values to explore their effect on the granularity of the resulting map. The most fine-grained of the three sets (FG) was originally developed for topographic mapping [30]. As Figure 15 shows, it is however reaching its limits for the generalization of geological maps that are usually populated by tinted, irregularly shaped polygons that often have reduced visual contrast (as is for instance particularly the case with the yellow pegmatite polygons). For such maps, the goal values of the FG set seem too detailed. Thus, in [31] the use of considerably higher goal values for geological maps was proposed, which produces a rather coarse-grained result with more polygons being enlarged and aggregated. As Figure 15 suggests, this Coarse-grained (CG) set of goal values may lead to overgeneralization and excessive loss of detail, particularly when the focus is on small polygons, as is the case in this paper. This is confirmed by the average polygon area and the number of polygons (Table 4), which are markedly different from those of the other two sets of goal values. The Compromise (CM) set of goal values allows for a more differentiated picture to be maintained, while ensuring that legibility is maintained. Note that in this set, the values were chosen to lie closer to those of the FG set than those of the CG set, with the intention of retaining as much detail as possible, thus also allowing a greater range of scale reduction. This is also reflected by the results reported in Table 4 for the average polygon area and the number of polygons, which are very similar to those obtained with the FG set.
In Test Case 3, we used the Compromise goal values and the Category-wise selection strategy to produce a series of generalized maps, shown in Figure 16 at the corresponding target scales. While the resulting map at 1:50,000 overall looks convincing, the map at 1:100,000 starts showing signs of visual imbalance. At the scale of 1:200,000, the map breaks down completely. The visual impression is supported by Table 5, which shows that the relative area per class becomes heavily imbalanced after 1:100,000. In the original map, the Amphibolite and Pegmatite categories are the most frequent ones regarding the number of polygons, which is due to the fact that they mostly occur as small polygons.
Hence, they are candidates for removal due to their generally small size, and their number decreases significantly in the transition from 1:100,000 and 1:200,000 (Table 5). However, as they are so numerous, these two categories are also most affected by enlargement and particularly by aggregation; hence, the area gained in the generalization process is disproportional. Again, this effect suggests that beyond a scale of 1:100,000 the results are no longer pleasing. Once again, this points to the necessity of an approach that is based on the recognition of local group structures which can then inform improved contextual generalization operators [10]. Figure 17 compares the proposed constraint-based methodology and the CA approach of GeoScaler [23,24], and both methods indeed yielded comparable results in that the legibility was improved. However, the CA approach, as it is based on a moving-window operator that uses the same, essentially majority-based principle across the entire map, has a tendency towards dilating larger polygons while eroding small polygons. Hence, more small polygons disappeared than in the result of the proposed methodology, and some polygons became hardly visible. GeoScaler includes post-processing operations, such as enlargement of too small polygons, which can be applied to ensure the legibility of all polygons. However, even that postprocessing operation will not bring back those small polygons that have been removed. Thus, overall the proposed constraint-based methodology seems to outperform the CA-based approach regarding the adequate maintenance of small polygons. Additionally, due to the raster-based majority filtering of the CA approach, characteristic polygon shapes also become more uniform than in our proposed methodology.
We have already mentioned the lack of explicit polygon group recognition in the elimination operator as the first shortcoming of the proposed methodology. Here, we see the second weakness: because the methodology focuses on small polygons and favors small polygons of important geological units by enlarging them, those small polygons grow disproportionately when the scale reduction factor is large. Once again, this points to the necessity of treating groups of polygons rather than individual polygons. While the methodology of this paper includes contextual generalization operators (aggregation, displacement), "context" is merely understood as the immediate, first-order neighborhood defined by the OS constraint. If polygons are separated by a distance slightly exceeding the OS limit, no link is detected. Likewise, there is no facility for detecting higher-order neighbors and hence forming larger, contiguous groups of polygons. Hence, for a full treatment of the small polygons forming the focus of this paper, group pattern detection procedures are needed, as well as contextual generalization operators that can make use of such group patterns. In related work [10], we therefore propose a methodology that considers proximity as well as geometrical similarities (shape, size, and orientation of map features) and attribute similarities to find, refine, and form groups that can be used to subsequently inform contextual generalization operators, such as aggregation and typification, to overcome persistent limitations in generalization approaches such as those shown in this paper.

Conclusions
Taking the example of size constraints (minimum area, object separation, etc.), this paper demonstrated how constraint-based map generalization can be realized for the case of small island polygons, which are mainly dominated by size constraints. Size constraints are rather simple but very fundamental; everything else depends on them. Thus, starting with the most fundamental constraint-the minimum area constraint-we were able to demonstrate how a constraint-based approach can be built, triggering different generalization operators depending on the local analysis of the individual situation found. From a practical perspective, we were able to show how such an approach can be implemented using the libraries of a general-purpose GIS, achieving a behavior similar to that of an agent-based system but with less implementation effort and less effort for parameter tuning.
As our experiments also showed, an approach focusing solely on size constraints and local analysis and operations is not sufficient if greater scale reductions are to be achieved. Thus, future research will have to explore the detection of groups of polygons defined by proximity and similarity (in attribute, size, shape, orientation, etc.), as well as contextual generalization operators for aggregation and typification that can make use of those group patterns.