Constraint-Based Spatial Data Management for Cartographic Representation at Different Scales

Abstract: This article elaborates on map-quality evaluation and assessment as a result of the generalization of geospatial data through the development of a methodology, which incorporates a quality data model including constraints. These constraints are used to guide the generalization process and they operate as requirements in quality controls applied for the quality evaluation and assessment of the resulting cartographic data. The quality model stores the required map specifications compiled as constraints, and provides quality measures along with new techniques for the evaluation and assessment of cartographic data quality. This secures the map composition process in each and every step and for all features involved, at any map scale. The methodology developed results in the creation of a scale-dependent cartographic database that contains exclusively the features to be portrayed on the map, generalized properly according to the map scale. It will reduce cartographers’ need to review each transformation throughout the map-composition process with considerable savings in time and money and, on the other hand, it will secure the quality of the final map. The formulation of the proposed methodology amalgamates generalization theory with the authors’ research in computer-assisted cartography, taking into account the work conducted on the topic by other researchers. In this study, the quality requirements, the measures and the associated techniques together with the results of the application of the proposed methodology for area and line features are described in detail to allow others to replicate and build on the presented results.


Background
Attempts to formulate a modern definition of the map concept define a map as a 'system of relationships' [1]. In this context, map generalization (a process necessary for depicting geographical entities in different degrees of detail at different scales) is considered a process of managing relationships consisting of two components that are widely recognized: a. the modeling component, which refers to the clarification of relationships and b. the cartographic component, which relates to the portrayal of features relationships through symbols. In this framework, [1] states that 'a well-designed' map is related to its ability to render the relationships that constitute the semantic properties (metric, topological and Gestalt/aesthetic) of the depicted area. Measurements of the cartographic data properties produced by the two distinct generalization processes (semantic and cartographic) were introduced initially by [2][3][4]. Other researchers [5][6][7][8] attempted to develop process frameworks integrated with techniques for the quality evaluation of generalization results. In addition, a review of the activities of national mapping agencies reveals their continuous effort to optimize the processes in map production through automation and, especially, the automation of generalization [9][10][11][12], incorporating evaluation procedures in the production of cartographic data. Several national mapping agencies (Ordnance Survey of Great Britain (OSGB), Institut Geographique National (IGN France), The Netherlands Kadaster (Kadaster), Institut Cartografic de Catalunya (ICC), AdV German, Swisstopo, KMS Denmark, USGS USA) have conducted research to standardize the generalization process and have achieved various degrees of automation.
According to [13], any generalization solution should incorporate quality evaluation and assessment techniques. Other researchers [8] consider the quality evaluation of the results an integral part of a 'holistic process' in map production, and [13] considers constraint-based generalization as a process where a situation on the map is acceptable when a variety of constraints are satisfied. The constraint-based generalization process originally proposed by [14] is implemented in current agent-based generalization models (AGENT, CartACom, GAEL, RevK, CollaGEN) [12]. Multi-agent systems perform with satisfactory results. Their main disadvantages are the complexity of the parametrization and the need for high computing performance for large datasets. As a result, simpler techniques sometimes perform better in resolving geometric conflicts [12]. Therefore, a simplified methodology containing tools that run autonomously and are encoded in a widely accepted programming language (e.g., Python), easily incorporated in commercial GIS environments, could operate more efficiently.

Research Goals and Objectives
This paper describes a methodology for the quality evaluation and assessment of the cartographic data produced in the two discrete phases of generalization (semantic and cartographic generalization). It aims to contribute to the development of 'knowledge' on the quality evaluation and assessment of the cartographic data, a topic in cartography where further research is required, as [8] states emphatically. More specifically, the proposed methodology includes the design of two quality models for each distinct generalization phase, focusing on the standardization of suitable transformations using constraints, the standardization of the minimum required specifications, and the standardization of the necessary quality controls. Considering the current agent-based generalization approaches where emphasis is given to the cartographic generalization phase [12], the proposed methodology integrates a quality model able to support semantic generalization where semantic and schema generalization operations are standardized in a framework with constraints and quality controls. This way, data reduction is accomplished that facilitates the subsequent cartographic generalization process. The proposed methodology, even if it is simplified compared to agent-based generalization models, achieves quite satisfactory results regarding the properties of cartographic data. Provided that it is based on a simplified database schema, only important attributes are preserved, which guide the assignment of importance to features, a factor crucial for the resolution of conflicts between them. It is pointed out that the implementation of the described methodology is completed in distinct phases, making it suitable for autonomous operation and integration in any automated generalization system.
In the next section, the map-quality concept is defined and the conceptual and logical framework of the proposed methodology is presented. The conceptual framework refers to the description of the application environment where the quality models are implemented and the definition of their basic structural elements and of the minimum conditions that ensure their smooth operation. The logical framework includes the development of the quality models configured especially for each generalization phase as a sequence of specific procedures for the quality evaluation and assessment of the cartographic data with qualitative criteria. Furthermore, the quality model for semantic generalization incorporated in the proposed methodology is described in detail, along with an example of its implementation. The example refers to the automated production of two cartographic databases at scales 1:500,000 and 1:1,000,000 based on a database at scale 1:250,000. Section 3 discusses the results of the methodology proposed and Section 4 elaborates on its utilization and on topics of future research.

Map-Quality Concept
The quality evaluation of map features is composed of the following three components:
a. The geometric quality, which pertains to the results of the transformations applied to the geometry of entities and directly impacts the position, the shape and the geometric elements of the entities (vertices, line segments, angles) and, indirectly, the topological relations between them.
b. The thematic quality, which concerns the results of the transformations applied to the database schema and impacts the information completeness, the correct classification of the entities based on their definitions, the compliance of the attribute values to the attributes domains and the values correctness.
c. The aesthetic/graphic quality (Gestalt), which regards the evaluation of the graphic map quality and the map ability to transfer the thematic information, considering the map as a communication medium [1].
The proposed methodology has been developed with the aim to evaluate and assess the geometric and thematic quality of the produced entities, as these are finally formed as the result of semantic and cartographic generalization. The study focuses on the above quality components, as their quantitative assessment is feasible.

Conceptual Framework
The conceptual framework of the proposed methodology includes the delineation of the application environment where the quality model operates, the definition of the minimum required operating conditions and the definition of the quality-model structural elements. Considering that map compilation is completed at the end of the implementation of a sequence of transformations, we consider the digital environment of a geographic information system (GIS) as the suitable environment for the implementation of the proposed methodology. In GIS, all processes are implemented in the framework of a quality management system according to the ISO 9001:2015 Standard [15], which ensures that the quality of the incoming data at the commencement of each new process is acceptable. In cases where the methodology is applied autonomously, it is necessary that the initial spatial data and the data incoming in each distinct composition phase be of acceptable quality. This specific condition is considered fundamental for the smooth operation of the applied quality model, as the implementation of procedures with low-quality data and the execution of final controls reduce the prospect of error recovery and limit the potential of identifying a satisfactory solution.
The structural elements of the proposed quality model were formulated on the basis of the framework of procedures for quality evaluation of the data produced in cartographic generalization in the context of the development of the AGENT system (IGN, France) [7] and the conclusions resulting from the research carried out by the EuroSDR [8,16]. More specifically, the analysis of the evaluation process is presented in three stages in the existing works, according to [7]: evaluation for tuning, evaluation for controlling, and evaluation for evaluation. Alternatively, it can be formulated in three axes: standardization of the specifications as constraints, definition of data-quality quantification measures, and definition of data-matching techniques between generalized data and reference data [8].
Combining the above frameworks for quality evaluation in cartographic generalization, the following three structural elements of the quality model are defined:
Structural element 1: Includes the quality specifications formulated: a. as constraints used to delimit the generalization results and b. as quality requirements with the corresponding compliance thresholds for the assessment of the degree of constraints preservation.
Structural element 2: Includes the quality measures selected and applied in quality controls to evaluate data consistency with the constraints.
Structural element 3: Includes the constraint-based process of generalization integrating quality controls.
The general principle for the developed quality model refers to the quality evaluation of generalized data through the results of measurements used to evaluate compliance with the set constraints and quality assessment through the evaluation of the degree of constraints satisfaction.

Logical Framework
The logical framework of the methodology includes the description of the characteristics of the three structural elements that define the quality model in combination with the properties of the entities (geometric and thematic) and their relationships. The characteristics of the structural elements of the quality model are adjusted properly in each generalization phase (semantic and cartographic) by using different parameters in the specifications and constraints, different measures, different transformations and quality controls.
Structural element 1 (specifications/constraints/quality requirements): The quality specifications are formulated as constraints, based on the constraint typology proposed by [17] in the framework of the EuroSDR research program. In that study, the constraints were classified into two categories: the improvement of legibility and the preservation of appearance. They were also related to the geometric type of the map entities (points, lines, polygons) and their relationships (see Figure 1). The legibility constraints (minimum dimensions and density of features/minimum separation distances) are applied in both generalization phases while the preservation of other constraints (e.g., topology, position/direction, shape, pattern, and distribution) is applied in cartographic generalization only. The above constraints, formulated by [17], were developed in the context of cartographic generalization, so they suitably direct the process and assess the geometric quality of cartographic data. However, in the case of semantic generalization, they are not sufficient as they are related only to the geometric quality and not to the thematic (pertaining to database schema). Consequently, new constraints are defined based on the transformations of the semantic generalization [18] and the ISO 19157 Standard [19]. These are adopted to describe the thematic quality, considering that cartographic data are derived from spatial data and they retain the same inherent characteristics. Conformance levels in constraints preservation are set as acceptable or unacceptable for both phases of generalization except for the shape preservation constraint, which is formulated more flexibly.
Structural element 2 (quality measures): The quality measures for spatial data according to [19] are selected in cases where the corresponding quality elements are used to describe the quality in semantic generalization and, in the case of topology constraints, in cartographic generalization. In addition, new measures are adopted or developed such as those related to the legibility improvement constraint (semantic and cartographic generalization), position, direction, and shape constraints (cartographic generalization). The method of selecting an appropriate measure is based on the definition included in the technical specifications of the basic quality measures of the AGENT system (1997-2000, IGN, France) [20]. A 'good' measure is a measure that achieves a distinct differentiation of the observed characteristics of the entity, remains unchanged in relation to its other characteristics and is characterized by the following properties: robustness, separability, independence in relation to the user, independence in relation to the representation of the object, invariance under geometric transformations, ease of calculation, ease of use, ease of its parameterization and ease of recognition.
Structural element 3: The third structural element includes a standard set of transformations at the end of which quality controls are applied. Transformations and quality controls are dedicated for each generalization phase. In the next paragraph, the quality model for the semantic generalization is presented in detail.

Concept and Operations
The semantic generalization process transforms the categorization of features [18] and modifies their descriptive characteristics (classification and attribution of the data). It is considered a synthesis of the four information abstraction processes: classification, association, generalization in the sense of simplifying a category, and aggregation [21]. Implementing the four abstraction processes requires the application of transformations (operations) at two levels: the schema level and the instance level [18]. At the schema level, the operations are configured as follows: class abstraction, class elimination, class composition, attribute elimination, attribute aggregation and modification of the conditions existing in a class, which determine whether or not a feature belongs to the class (modification of the class intension). At the instance level, the operations concern feature elimination, feature reclassification, feature aggregation, feature merging and attribute modification. Considering that the operations introduced by [18] fully cover the transformations in semantic generalization and that the process is carried out by transferring the data between two databases of different schema (initial database/new database), three cases of data transfer are standardized:
i. Transfer of all data when there is correspondence between the feature classes of both databases (initial database/new database), one-to-one relationship between feature classes: 'migration'.
ii. Transfer of all data when there is correspondence between many feature classes of the initial database and one feature class in the new database, many-to-one relationship between the feature classes: 'class abstraction'.
iii. Removal of a class: 'class elimination'.
Considering the above standardization of the data transfer process between the two databases of different schemata, appropriate constraints are formed to guide the process and to be used as quality criteria for quality evaluation and assessment of the cartographic data in the new database.
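The three standardized data-transfer cases can be illustrated with a minimal Python sketch. It assumes the correspondence between the two schemata is available as a simple dictionary; the class names and the function `transfer_cases` are hypothetical and are not taken from the authors' implementation.

```python
from collections import defaultdict

# Hypothetical correspondence table: initial class -> new class.
# None means the class has no counterpart in the new schema.
CLASS_MAP = {
    "roads": "roads",
    "lakes": "inland_water",
    "reservoirs": "inland_water",
    "footpaths": None,
}


def transfer_cases(class_map):
    """Label each initial feature class with its data-transfer case."""
    sources_per_target = defaultdict(list)
    for source, target in class_map.items():
        if target is not None:
            sources_per_target[target].append(source)

    cases = {}
    for source, target in class_map.items():
        if target is None:
            cases[source] = "class elimination"   # one-to-none
        elif len(sources_per_target[target]) == 1:
            cases[source] = "migration"           # one-to-one
        else:
            cases[source] = "class abstraction"   # many-to-one
    return cases


CASES = transfer_cases(CLASS_MAP)
```

Deriving the case from the correspondence table, rather than storing it by hand, keeps the label consistent with the table whenever a class mapping changes.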

Constraints and Measures
Map specifications as constraints (structural element 1 of the quality model) with the corresponding quality measures (structural element 2 of the quality model) are presented below.
i. Preservation of consistency in the projection coordinate system of the new database ([19] quality element: logical consistency/format consistency). Conformance level is set to acceptable or unacceptable. The satisfaction of this constraint is assessed by conducting a query to the database (initial/new) about its descriptive features and checking the coincidence of the coordinate systems characteristics of the initial database against the new one.
ii. Preservation of consistency in geometric types ([19] quality element: logical consistency/format consistency). Conformance level is set to acceptable or unacceptable. Entities with geometric type not compliant with the feature-class geometric type are transformed.
iii. Preservation of information completeness regarding entities ([19] quality element: completeness/omission). Each feature class in the new database is considered complete when the amount of its data is equal to the amount of data associated with the feature class in the initial database. Conformance level is set to acceptable or unacceptable. This constraint is applicable for 'migration' and 'class abstraction' processes.
iv. Preservation of classification correctness. This constraint is applicable for 'migration' and 'class abstraction' processes and concerns the correct classification of the entities per subtype. Conformance level is set to acceptable or unacceptable.
v. Preservation of conceptual consistency in the relationships between entities when feature classes or features are eliminated (class elimination/feature elimination) ([19] quality element: logical consistency/conceptual consistency). Holes in polygons, gaps between connections and 'orphan' connections are not allowed. The measure of 'holes' in a polygon is defined by the number of 'ring' features found in the feature class. The measure of the gap between connections is defined by the number of missing links. The identification of a gap is performed through counting the dangle nodes in the feature class before and after link elimination, considering that the missing link should have the same endpoints as two dangle nodes of two different lines in the feature class. Maintaining the readability of each individual entity using improvement of legibility constraints (minimal dimensions) ([19] quality element: completeness/commission) and maintaining legibility between different entities of the same feature class ([19] quality element: logical consistency/conceptual consistency). Maintaining legibility between different entities is achieved by using minimum distances based on resolution (0.25 mm) at the generalization scale. The legibility measure between entities is calculated as the length of the remaining items or as the number of remaining items inside a buffer with width equal to the resolution. Conformance level is set to acceptable or unacceptable.
x. Preservation of information completeness regarding entities in each feature class ([19] quality element: completeness/commission) by maintaining the object compliance to the class intension. Conditions of minimum length or area are usually defined. Conformance level is set to acceptable or unacceptable.
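As a worked illustration of the resolution-based threshold, the 0.25 mm map resolution can be converted to a ground distance at each target scale. The sketch below is a minimal example; the helper names are hypothetical, and mapping zero violations to 'acceptable' is an assumption about how the binary conformance level would be derived.

```python
RESOLUTION_MM = 0.25  # map resolution stated in the text


def ground_distance_m(map_mm, scale_denominator):
    """Convert a map distance in mm to ground metres at 1:scale_denominator."""
    return map_mm * scale_denominator / 1000.0


def conformance(violations):
    """Assumed rule: a constraint is acceptable only with zero violations."""
    return "acceptable" if violations == 0 else "unacceptable"


# Minimum separation distances on the ground for the two target scales.
SEPARATION_500K = ground_distance_m(RESOLUTION_MM, 500_000)    # 125.0 m
SEPARATION_1M = ground_distance_m(RESOLUTION_MM, 1_000_000)    # 250.0 m
```

At 1:500,000 the buffer width therefore corresponds to 125 m on the ground, and at 1:1,000,000 to 250 m.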

Generalization Process and Quality Controls
The third component of the quality model includes a series of standard operations/transformations where quality controls are applied. The set of operations is described below in combination with an application paradigm. The paradigm concerns data transformation of a geodatabase at scale 1:250,000 to two cartographic databases at scales 1:500,000 and 1:1,000,000, respectively, based on the EuroGlobal Map data specifications [22]. Semantic generalization transformations are applied to the EuroRegional Map database at scale 1:250,000 (German region) [23]. The ESRI file geodatabase format is used for data storage and ArcMap is used for the depiction of map data in Figures 2-4. The semantic generalization operations and quality controls are implemented in the Python programming language, incorporating functions from the 'arcpy' module and the 'Shapely' module. Topological queries are implemented by using the 'Shapely' module in Python and the GIS buffer function. Comparison between attribute tables is implemented after their conversion into Python list objects.
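The comparison of attribute tables as Python lists can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each table has already been read into a list of `(id, subtype)` tuples (for example with a database cursor), and `compare_tables` and the sample values are hypothetical.

```python
def compare_tables(initial_rows, new_rows, subtype_map=None):
    """Return ids missing from the new table and ids with a wrong subtype.

    initial_rows, new_rows: lists of (feature_id, subtype) tuples.
    subtype_map: optional dict translating initial subtypes to the
    subtypes expected in the new schema (identity when omitted).
    """
    subtype_map = subtype_map or {}
    new_by_id = dict(new_rows)
    missing, misclassified = [], []
    for feature_id, subtype in initial_rows:
        expected = subtype_map.get(subtype, subtype)
        if feature_id not in new_by_id:
            missing.append(feature_id)          # completeness/omission
        elif new_by_id[feature_id] != expected:
            misclassified.append(feature_id)    # classification error
    return missing, misclassified


# Hypothetical sample tables.
INITIAL_ROWS = [(1, "motorway"), (2, "primary"), (3, "primary")]
NEW_ROWS = [(1, "motorway"), (3, "secondary")]
MISSING, MISCLASSIFIED = compare_tables(INITIAL_ROWS, NEW_ROWS)
```

The lengths of the two returned lists correspond to the counts of missing entities and erroneous thematic classifications, and the ids allow the affected records to be retrieved from the initial database.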
The following requirements are fundamental to achieve successful implementation of the quality model.

a. The schema of the new database should be known and the structure of the new database must be compatible with the specifications. The proposed quality model includes the necessary controls to assess the consistency of the new database schema with the specifications but these controls are omitted in this paper for brevity reasons.
b. The correspondence between the classes and the attributes of the initial database and the new database should be predefined and based on their definitions. The implementation of the example requires the configuration of two tables incorporating the relations between the classes and the attributes of the initial and the new database, and the execution of the required transformations.
c. In case of aggregation or merging entities, the policy of attributes values composition should be predefined.
d. Each entity of a feature class should be identified by a unique code (id) to facilitate its retrieval from the original database when required, usually for correcting attributes values.
As mentioned in Section 2.4.1, the semantic generalization process is executed on two levels as [18] suggested: a. on the schema level and b. on the instance level.
Schema level transformations are processed as a sequence of actions with incorporated quality controls to execute the transformation of the initial database into the new schema:
i. Data transfer from the initial database to the new database, according to the correlations prescribed in Tables 1 and 2 of categories and attributes.
• In case of one-to-one ('migration') and many-to-one ('class abstraction') relationships between features classes in the initial database and in the new database, the following evaluation procedures are performed in hierarchical order to ensure satisfaction of the set constraints: evaluation of the compatibility of the projection reference system and evaluation of the compatibility of the entities' geometric type by applying query techniques on the database. In case of incompatibility, an automatic correction is applied and the transformed data are appended to the corresponding class of the new database in accordance with the sub-categories of the feature class. The quality controls are performed in hierarchical order and concern the completeness of the registrations and the correct classification of the entities in the feature class sub-categories. The results are recorded as the number of missing entities and erroneous thematic classification cases. Errors are automatically corrected by retrieving data from the initial database.
• In case of feature class elimination, one-to-none relationship, evaluation procedures are performed based on the geometric type of the associated entities to ensure the conceptual consistency in the database regarding the existence of holes within polygons, gaps between links and 'orphan' lines. In case of polygonal entities, the existence of enclosure relations regarding the eliminated features is examined in conjunction with the retained polygonal entities that have a hole. A hole is not permitted to remain in the position of the eliminated entity. A class composition transformation is therefore applied. Similarly, in case of linear connections, gaps are not allowed so they must be identified. For this reason, a technique is applied that includes the calculation of the dangle nodes before and after the removal of a feature class. The gap in the connection is identified when the start/end points of a line in the initial database are identical with the start/end points of two different lines in the new database, which are characterized as dangle nodes. The integration of the eliminated line with the retained line is automatically applied with the retrieval of the eliminated line from the initial database. The attributes values of the aggregated or merged entities are removed. In case of orphan lines, the lines that have dangle nodes at the endpoints and do not intersect with other entities (polygons, points) are characterized as 'orphans' and are deleted. The results are recorded as number of conceptual inconsistencies at three levels: 'holes', 'gaps', and 'orphan lines'.
ii. Evaluation of the attribute values correctness in the attributes fields (thematic accuracy/non-quantitative attribute value correctness) is performed by identifying fields with "empty" values. Results are recorded as the number of incorrect attributes field values. Errors are automatically corrected by retrieving data from the initial database.
iii. Evaluation of the attribute values compliance to field's domain. Results are recorded as the number of incorrect attributes field values. Errors are automatically corrected by retrieving data from the initial database. Items with attribute values that violate the field's domain are eliminated in the next level of semantic generalization process.
Instance level transformations refer to those applied on single features within features classes. Their application follows the hierarchical order of polygons-lines-points. They are guided by the constraints of the preservation of the legibility improvement (minimum dimensions), the preservation of feature compatibility to its class intension and the preservation of the readability between entities.
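The dangle-node technique described for feature class elimination can be sketched in plain Python. This is a simplified illustration under stated assumptions: lines are lists of coordinate tuples, connectivity is judged at endpoints only, and the check that orphan lines do not intersect polygons or points is omitted, so the last function returns candidates only. All function names and sample data are hypothetical.

```python
from collections import Counter


def endpoints(line):
    """Start and end vertex of a polyline."""
    return line[0], line[-1]


def dangle_nodes(lines):
    """Endpoints that are used by exactly one line in the collection."""
    counts = Counter(pt for line in lines.values() for pt in endpoints(line))
    return {pt for pt, n in counts.items() if n == 1}


def find_gaps(initial_lines, retained_ids):
    """Ids of eliminated lines whose endpoints coincide with dangle
    nodes of two different retained lines (missing links)."""
    retained = {fid: initial_lines[fid] for fid in retained_ids}
    dangles = dangle_nodes(retained)
    owner = {pt: fid for fid, line in retained.items()
             for pt in endpoints(line) if pt in dangles}
    gaps = []
    for fid, line in initial_lines.items():
        if fid in retained:
            continue
        start, end = endpoints(line)
        if start in owner and end in owner and owner[start] != owner[end]:
            gaps.append(fid)
    return gaps


def orphan_candidates(lines):
    """Lines whose two endpoints are both dangle nodes."""
    dangles = dangle_nodes(lines)
    return [fid for fid, line in lines.items()
            if all(pt in dangles for pt in endpoints(line))]


# Hypothetical network: a-b-c form a chain, d is isolated.
INITIAL = {
    "a": [(0, 0), (1, 0)],
    "b": [(1, 0), (2, 0)],
    "c": [(2, 0), (3, 0)],
    "d": [(5, 5), (6, 5)],
}
GAPS = find_gaps(INITIAL, ["a", "c", "d"])        # "b" was eliminated
REPAIRED = {fid: INITIAL[fid] for fid in ["a", "c", "d"] + GAPS}
ORPHANS = orphan_candidates(REPAIRED)
```

Running the orphan check after the missing links have been restored avoids flagging lines that are disconnected only because of a gap.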
In the case of polygonal features, transformations are related to: a. the elimination of "holes" with respect to a minimum value for the area parameter, b. the elimination of features that are not compliant with class intension rules (minimum dimensions, domain inconsistency) with respect to conceptual consistency regarding holes occurring after feature elimination and c. feature reclassification. In case of aggregated features, attributes are modified accordingly. The remaining attribute fields are populated with the values of the entity with the maximum area. Figures 2 and 3 depict the polygonal cartographic features resulting from the developed semantic generalization methodology.
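The hole elimination and the attribute policy for aggregated polygons can be illustrated with a short sketch. It is written in plain Python with the shoelace formula rather than with the arcpy or Shapely calls used in the study; the dictionary layout (`shell`, `holes`, `attrs`), the function names and the area threshold are all hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a coordinate ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(polygon):
    """Shell area minus the area of all holes."""
    return ring_area(polygon["shell"]) - sum(ring_area(h) for h in polygon["holes"])


def drop_small_holes(polygon, min_area):
    """Remove holes whose area is below the minimum area threshold."""
    kept = [h for h in polygon["holes"] if ring_area(h) >= min_area]
    return {**polygon, "holes": kept}


def aggregated_attributes(parts):
    """Attributes of the part with the maximum area, as the text prescribes."""
    return dict(max(parts, key=polygon_area)["attrs"])


# Hypothetical polygon: a 10 x 10 square with a 4 x 4 and a 1 x 1 hole.
FOREST = {
    "shell": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "holes": [[(2, 2), (6, 2), (6, 6), (2, 6)],
              [(7, 7), (8, 7), (8, 8), (7, 8)]],
    "attrs": {"type": "forest"},
}
CLEANED = drop_small_holes(FOREST, min_area=4.0)  # threshold is an assumption
```

In practice the threshold would be derived from the minimum-dimension constraint at the target scale rather than hard-coded.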
Respectively, line feature transformations (Figure 4) include: a. elimination of features that do not comply with class intension rules or with the feature class domain, or that are removed for density reasons, with respect to conceptual consistency regarding missing links and orphan lines and b. feature reclassification. The reduction in feature density is implemented through a simplified technique considering the values of one or two attributes and the geometric characteristics of features, as the specifications make no provision for a feature hierarchy within a feature class. Therefore, the attribute values of the features that should be retained are set as 'hard values'. Then, for each feature carrying the specific values, a buffer is created with a width consistent with the separation distance limit (0.25 mm at generalization scale). Line features inside the buffer are eliminated if their attribute values differ from the set 'hard values'. In case of same attribute values, the selection is based on feature length. The feature with the maximum length should be retained. In order to resolve the problem of gaps that may occur in the previous step, a simplified method was developed to recover line segments corresponding to gaps. This method is based on the difference between the lines before and after the transformation in combination with the dangle nodes occurring after the transformation.
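The buffer-based density reduction can be sketched as follows. This is a rough plain-Python approximation and not the authors' implementation: a line is treated as lying inside the buffer when every one of its vertices is within the buffer width of the reference line, which stands in for a true GIS buffer and containment query. The feature layout, attribute values and function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_buffer(candidate, reference, width):
    """Approximation: all candidate vertices lie within width of reference."""
    return all(
        min(point_segment_distance(p, a, b) for a, b in zip(reference, reference[1:])) <= width
        for p in candidate
    )


def line_length(line):
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def reduce_density(features, hard_values, width):
    """Return the ids of retained line features."""
    eliminated = set()
    for fid, ref in features.items():
        if ref["value"] not in hard_values or fid in eliminated:
            continue
        for other, feat in features.items():
            if other == fid or other in eliminated:
                continue
            if not inside_buffer(feat["geom"], ref["geom"], width):
                continue
            if feat["value"] not in hard_values:
                eliminated.add(other)   # differs from the hard values
            elif line_length(feat["geom"]) < line_length(ref["geom"]):
                eliminated.add(other)   # same value: keep the longer line
    return [fid for fid in features if fid not in eliminated]


# Hypothetical data; 125 m is 0.25 mm at 1:500,000.
ROADS = {
    "main": {"geom": [(0, 0), (1000, 0)], "value": "motorway"},
    "side": {"geom": [(100, 50), (900, 50)], "value": "track"},
    "far": {"geom": [(0, 500), (1000, 500)], "value": "track"},
    "short_main": {"geom": [(200, -100), (600, -100)], "value": "motorway"},
}
RETAINED = reduce_density(ROADS, hard_values={"motorway"}, width=125.0)
```

With these inputs the nearby track and the shorter parallel motorway segment are removed, while the distant track is kept because it lies outside the buffer. Lines of equal length and equal value are both kept in this sketch, a case the text does not address.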
Finally, in the case of point features, point elimination is applied when there is violation concerning feature class domain and violation of the separation distance. In this case, an attribute should be added to control the hierarchical order of features elimination in the same feature class. Figure 5 is the flowchart of the transformations carried out in the framework of the quality model for semantic generalization.

Results and Discussion
In this article, a detailed methodology on the quality evaluation and assessment of cartographic data generated in map generalization is described in combination with the implementation of the quality model for semantic generalization. The outcomes, both analytical and graphical, of the application of the proposed methodology to a real geodata set at scale 1:250,000 for the production of maps at scales 1:500,000 and 1:1,000,000 show that the result achieved is the one expected. More specifically, in the case of polygonal features, the constraints proposed are sufficient for conducting semantic generalization and assessing the quality of the produced data with respect to conceptual consistency, map legibility (density, features distinction, features identification) and thematic accuracy. The polygonal features in Figures 2 and 3 depict the data resulting through the transformations described in the previous section. They demonstrate that the sequence of operations with the corresponding quality controls incorporated in the quality model leads to the correct generalization of polygonal data. Therefore, following the guidelines of the quality model, the produced polygonal data are suitable to be inserted in the phase of cartographic generalization.
Likewise, Figure 4 shows the line features resulting from the generalization of linear data. With respect to the specific features category, further research is required concerning density reduction when hierarchy has not been set in the feature class.
It is emphasized that the charts shown constitute the graphical representation of the database created automatically in the framework of the system developed. Conflicts between features belonging to different feature classes will be resolved through cartographic generalization, where more measures and techniques are available.
The main contribution of the methodology developed is to monitor quality in map composition and especially in its most critical phase, that of map generalization. This will reduce significantly cartographers' need to review each transformation throughout the map composition process with considerable savings in time and money. On the other hand, it will secure the quality of the final map. The proposed methodology is based on international standards for both the constraints used in the generalization process as well as for the evaluation and assessment of the result. The system developed in the framework of this study results in the creation of a scale-dependent cartographic database that contains exclusively the features to be portrayed on the map, properly generalized according to the map scale. This is a unique characteristic that the commercial cartographic systems do not provide for. Another advantage of the proposed methodology is that it operates autonomously in the environment of any commercial geographic information system and provides easy techniques that can be encoded in the Python programming language, which nowadays is widely used. It also offers the basis for the creation of a quality-verified cartographic database through the generalization of reference geospatial data.

Conclusions
The methodology described and the quality models developed integrate the results of the research on the topic carried out by the authors with certain components resulting from previous research. It can be utilized by national mapping agencies to automate the process of generalization in map production together with a functional model that will ensure the quality of the product. Its conceptual framework is compliant with the logic adopted by commercial map production systems. The detailed presentation of the procedure and the analysis of the structure/content of the quality model along with the tests that should be carried out will allow others to replicate and build on its results for utilization in a map-production environment. Furthermore, it will result in the development of a 'knowledge base' for use in such an environment. The methodology has been tested through the development of code in Python on a specific data set and proved to lead to very good results.
Future research can be carried out on the optimization of the proposed techniques, especially on the technique regarding lines density reduction based on geometric criteria.
In addition, the evaluation and quality assessment of the resulting cartographic data should be complemented through the implementation of the corresponding quality model for cartographic generalization, leading to the production of high-quality cartographic databases and maps.