Map Metadata: the Basis of the Retrieval System of Digital Collections

Abstract: The article presents research on the evaluation of hidden map metadata. A hidden map is a map that forms part of a book and illustrates certain facts described in it (e.g., military campaigns, political processes, migrations). The evaluation concerns metadata completeness, that is, the degree to which objects are described using all metadata elements. The analysis took into account the metadata of archival maps accessed via the GeoPortOst geoportal. Over 3000 hidden maps from the period 1572–2018 were analyzed, and the map set was divided into eight collections. The main purpose of cartographers and librarians is to facilitate understanding of the relationship between individual information (librarians) and spatial data (cartographers). To this end, the research focused on the kind of information about old maps that should be stored in metadata to describe them in terms of space, time, content and context so as to increase their interoperability. The following metadata were taken into account in the assessment: title of content, type of content, date, date range, rights, language, subject, distribution format, geographic location, scale of map, reference system, mapping methods, map format, and source materials used to develop the map. The completeness of individual metadata as well as the completeness of metadata for individual collections was assessed. Finally, good practices of individual collections and metadata that could increase the interoperability of the entire collection were identified. The evaluation enables the owners to show the strengths and weaknesses of a given collection in a quick and easy way.


Introduction
Cartographers and librarians have a lot in common. Their products share the same goal of providing orientation regarding (spatial or informational) relations that are difficult to survey. They both have in mind the needs of their users, who are looking for access to spatial entities or information. They both use a set of instruments that highlight patterns and reduce complexity. The librarians' traditional product, the catalogue, may be described in the words that [1] used for maps: 'Nothing [...] is reality; everything is representation.' However, the mechanisms that cartographers and librarians have developed to represent realities are different. Cartography encompasses the concept of 'space' in graphic and mathematical terms, while in libraries, information is made verbally accessible through documentation languages, in which geographic names are the main focus of attention. This may be a problem when describing maps in libraries: space is a physical constant, but geographic names can change over time [2,3].
The progressing digitization of maps in libraries and the use of specific geodata portals allow this problem in representing geographic media to be largely solved. Geographical media can be searched for in more intuitive and effective ways (Figure 1), for example by using an interactive web map service. Networked spatial information not only allows for location-based resource searches, but also makes the exploration of topographic relationships possible [4]. The implementation of a geographic interface makes it much easier to find maps of a specific area [5]. However, it is not sufficient for comprehensive retrieval. Old maps and thematic maps in particular come with a specific context of provenance and have a specific content. Many projects analyze the standardized description of maps in digital libraries and suggest ways to describe old maps using library standards [6][7][8] or spatial data standards [9,10]. Additionally, certain projects combine archival documents from various digital libraries [11,12], and others collect archival cartographic documents [13][14][15]. Subirana [10] emphasized that it is worth describing old maps as geographic, spatial data by creating an SDI (spatial data infrastructure). An SDI is understood as an appropriate set of institutional policies and agreements, standards, and technologies, as well as the human resources that are necessary for users to use geospatial information for various purposes, not just those for which it was created [16]. Users need increasingly better spatial data that can be used according to their needs [17]. Unfortunately, in huge collections, finding a specific map of interest is often very difficult: a search may return either a large number of results or none at all. Despite many initiatives aimed at improving the interoperability of collections, quality should be kept in mind, as it now requires more attention.
It is also worth paying attention to whether the proposed standards are used to develop map collections in digital libraries and if the metadata are collected in an appropriate way, according to the rules [18]. Existing research gaps [19] were identified, proving that there is a strong need for new research contributions in the evaluation of map metadata. Therefore, the first research question is: what kind of information about old maps should be stored in metadata to describe them in terms of space, time, content and context to increase their interoperability?
This question is explored using the portal GeoPortOst: Thematic and Hidden Maps of Eastern and Southeastern Europe [20] as an example. As it integrates cartographic resources of different provenance, GeoPortOst is a good example of a new type of collection. Traditional library collections are 'owned' and local [21]. In the digital world, this limitation no longer exists. The 'owned' resources can interact with external ones in a network logic. As a result, in place of the physical stock, patterns, themes, or a research agenda become relevant for the construction of a collection [22]. GeoPortOst provides an infrastructure for aggregating heterogeneous documents in different formats and at various levels of indexing. The decisive factor for the portal is no longer keeping and preservation, but rather processing, arranging and sharing [23]. The digital collection thus loses its static nature and can be understood as a process of assemblage around the users' needs [24]. What is crucial now is how entities are described in the metadata. We assume that '[...] metadata will govern the outcome of the generation of transactional sets' [25]. Therefore, the second research question is: which collection in the GeoPortOst Project provides resource metadata in such a way as to give the users the best chance of using the cartographic materials necessary for their research and for generating datasets?
ISPRS Int. J. Geo-Inf. 2020, 9, 444

Materials and Methods
The subject of the research is the metadata of archival cartographic documents. Currently, metadata are the basis for searching for objects in retrieval systems in order to find digital data. Based on the metadata assessment, we can determine the extent to which archival materials are available to users and assess the quality of map metadata. Quality is in this case understood as a set of features that determine how well the product satisfies certain needs [26].
Based on the evaluation of map metadata presented in the research of Kuźma and Mościcka [27], we conducted an analysis of hidden maps. A hidden map is a map that forms part of a book and illustrates certain facts described in it (e.g., military campaigns, political processes, migrations) [9]. The methodology consists of adopting a scope pattern, assigning metadata elements from a particular map collection to it, verifying how consistent the metadata of a specific digital collection are with the scope pattern, and computing statistics on the evaluation of map metadata (Figure 2). The scope pattern defines how the metadata of archival maps should be described. This pattern [27] has been modified to reflect the specifics of hidden maps. The whole scope pattern was divided into two parts: the first covers metadata that are common to all objects in a digital library, and the second covers cartographic metadata.
Compliance with the scope pattern is assessed based on the features (how to obtain data from the metadata profile of a given digital library to the scope pattern, i.e., directly, by simple analysis, or by specialist analysis). Individual features have been assigned weights that allow determining to what extent (how easily or with what difficulty) data can be obtained. The level of difficulty of obtaining metadata may be calculated by using the formula [27]:

$$E = \sum_{k=1}^{n} w_k \qquad (1)$$

where: $n$ is the total number of criteria in the scope pattern; $k$ is the criterion number; $w_k$ is the weight of obtaining data (1.0 for directly, 0.8 for simple analysis, 0.5 for specialist analysis, 0.0 for lack of data) for the $k$-th criterion.
The higher the value of E, the easier it is to obtain data. The evaluation relates to the completeness. Metadata completeness is the degree to which objects are described using all metadata elements [28].
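The calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes E is the plain sum of the per-criterion weights, which is consistent with the value of 12.8 out of 14.0 reported in the Results; the mode names and the four-criterion example are hypothetical.

```python
# Weights as described in the text: how a value can be obtained for a criterion.
WEIGHTS = {
    "direct": 1.0,
    "simple_analysis": 0.8,
    "specialist_analysis": 0.5,
    "missing": 0.0,
}


def difficulty_level(access_modes):
    """Sum the weight w_k over all criteria k = 1..n (assumed form of Formula 1)."""
    return sum(WEIGHTS[mode] for mode in access_modes)


# Hypothetical four-criterion scope pattern, for illustration only.
modes = ["direct", "direct", "simple_analysis", "missing"]
E = difficulty_level(modes)
print(f"E = {E:.1f} out of {len(modes):.1f}")
```

A value close to the number of criteria indicates that almost all data can be read directly from the library's metadata profile.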

Metadata Scope Pattern
The archival maps were analyzed in the study, on the one hand, as part of an information container, such as a book or magazine, and on the other hand, as an independent representation of a geographical space. With the above in mind, the metadata scope pattern was developed based on Dublin Core [29,30] and MARC21 (machine-readable cataloging) [31]. Some of the features, such as the title of the content, type of content, date, date range, rights, language, subject, and distribution format, are directly connected with digital objects in each digital library. They are very common, and all objects have this kind of metadata. They are usually easy to gather and collect in databases.
Library staff should, however, possess knowledge about the specific characteristics of maps. This knowledge may be used for describing maps by using the geographic location, the scale of the map, the reference system, mapping methods, the map format, and the source materials that were used to develop the map. Neither Dublin Core nor MARC21 provides standardized metadata elements for these specific cartographic features. Even though there are some initiatives which demonstrate how MARC21 [6] or Dublin Core [7] can be used, each librarian may record the same information in a different way, without any rules, or each library may establish its own rules for collecting data. This means that the collections in different libraries are not interoperable. Therefore, the metadata scope pattern was developed based on the research of Kuźma and Mościcka [27]. We have adopted the type of content, date, date range, rights, language, subject, and distribution format as typical metadata, and the geographic location, scale of map, reference system, mapping methods, map format, and source materials used to develop the map as cartographic metadata. Access rights 1 and access rights 2 were included in rights. We added the title of content because often only the title contained detailed information about a given map (such as the area or map topic) (Table 1).

Table 1 (item k, evaluation criterion, type of metadata):
1. Title of content: typical
2. Type of content: typical
3. Date: typical
4. Date range: typical
5. Rights: typical
6. Language: typical
7. Subject: typical
8. Distribution format: typical
9. Geographic location: cartographic
10. Scale of map: cartographic
11. Reference system: cartographic
12. Mapping methods: cartographic
13. Map format: cartographic
14. Source materials used to develop the map: cartographic

Two characteristics were used to evaluate the metadata:
• Completeness for each of the evaluation criteria, $E_k$ [27]:

$$E_k = \sum_{c=1}^{8} \frac{m_{ck}}{m_c} \qquad (2)$$

where $c$ is the collection number from Table 2; $m_c$ is the number of all maps in a particular collection; $m_{ck}$ is the number of maps in a particular collection that have metadata for the $k$-th evaluation criterion.
• Completeness for each collection, depending on the number of resources in the digital collection, for typical metadata ($E_{tc}$) and for cartographic metadata ($E_{mc}$):

$$E_{tc} = \sum_{k=1}^{8} \frac{m_{ck}}{m_c}, \qquad E_{mc} = \sum_{k=9}^{14} \frac{m_{ck}}{m_c} \qquad (3)$$

The biggest and the oldest collection is GeoPortOst, which gathers 1169 historic maps; its oldest map dates from 1572. The newest resources come from Digital collections and Lambda.
The calculation connected with the evaluation of map metadata for collections in a particular digital library is presented in the Results subsection.
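The two completeness measures can be illustrated with a short Python sketch. It assumes that both measures are sums of the shares m_ck / m_c (over collections for a single criterion, and over criteria for a single collection), which agrees with the "out of 8.0" and "out of 6.0" scores reported in the Results. The collection names, criterion names and counts below are hypothetical and are not taken from the paper's tables.

```python
def completeness_per_criterion(counts, sizes, criterion):
    """E_k: sum over collections c of m_ck / m_c for one criterion k."""
    return sum(counts[c].get(criterion, 0) / sizes[c] for c in sizes)


def completeness_per_collection(counts, sizes, collection, criteria):
    """E_tc or E_mc: sum over the given criteria of m_ck / m_c for one collection."""
    m_c = sizes[collection]
    return sum(counts[collection].get(k, 0) / m_c for k in criteria)


# Hypothetical data: m_c per collection and m_ck per collection and criterion.
sizes = {"A": 100, "B": 50}
counts = {
    "A": {"title": 100, "subject": 40, "scale": 25},
    "B": {"title": 50, "subject": 10},
}
TYPICAL = ["title", "subject"]
CARTOGRAPHIC = ["scale"]

print(completeness_per_criterion(counts, sizes, "subject"))
print(completeness_per_collection(counts, sizes, "A", TYPICAL))
print(completeness_per_collection(counts, sizes, "A", CARTOGRAPHIC))
```

Because each share lies between 0 and 1, the maximum of a measure equals the number of terms summed: the number of collections for E_k, and the number of criteria in the group for E_tc and E_mc.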

Data
Our research was based on maps in the GeoPortOst Project [20]. GeoPortOst was developed at the Leibniz Institute for East and Southeast European Research in Regensburg (IOS) from 2014 to 2019 with funding from the German Research Foundation [32]. GeoPortOst provides access to maps of Eastern and Southeastern Europe. The collection includes notably hidden, thematic maps related to history and ethnography as well as the economic and social relations of this area. We define hidden maps as maps that have been printed in publications and have been catalogued in a special catalogue at the IOS. The special source value of hidden maps, especially for area studies on Eastern and Southeastern Europe, lies in the fact that they stand directly in the context of scientific studies or political texts, and often function as arguments in a narrative. Thus, they are not only orientation aids, but also the means of scientific proof for spatial constructions. Maps of this kind reproduce 'selective representations of reality' [33], visually highlighting qualitative dimensions of space or omitting them. The portal currently contains 3027 digitized maps from several institutions. The maps were georeferenced in a crowdsourcing campaign (using the Klokan Technologies Georeferencer application) and ingested in a GeoBlacklight database [34].
Eight collections are accessible in the GeoPortOst project; the main details about each of them are presented in Table 2.

Results
Based on the presented methodology, GeoPortOst metadata were evaluated. Metadata in GeoPortOst were entered according to the rules established by a team of librarians, historians, and geographers at the Leibniz Institute for East and Southeast European Research. The rules were created based on our experience and some of the recommendations concerning the use of MARC21 [6] and Dublin Core [7], as well as the experiences of different initiatives for sharing digital maps on the Internet [35,36]. The metadata for the maps were first exported from the library's Aleph sequential format (ASEQ) into a simple Excel spreadsheet, and then supplemented with additional information that is not commonly found in library catalogues (e.g., coordinates obtained by georeferencing, or references to context documents). Each document is described by 60 attributes in 60 columns. In addition, all subjects of the maps from the authority files of the German National Library (GND) [37] were reconciled with Wikidata [38] using OpenRefine [39]. Finally, the table fields were mapped to Dublin Core, BIBO and GeoSPARQL and fed into a Solr index for final implementation with GeoBlacklight [34]. Furthermore, the data are available in Resource Description Framework (RDF) and can be queried via a SPARQL Protocol and RDF Query Language (SPARQL) endpoint [40].
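The mapping step from spreadsheet columns to index fields might look roughly like the following Python sketch. It is not the project's actual code: the spreadsheet column names are hypothetical, and the target field names are only assumed to follow the naming style of the GeoBlacklight 1.0 metadata schema. The bounding box is written in the `ENVELOPE(minX, maxX, maxY, minY)` form used by Solr spatial fields.

```python
def to_index_document(row):
    """Map one spreadsheet row (a dict) to a Dublin Core-style index record.

    Column names on the right-hand side are hypothetical; field names on the
    left are assumed to follow GeoBlacklight 1.0 naming conventions.
    """
    west, east = float(row["west"]), float(row["east"])
    south, north = float(row["south"]), float(row["north"])
    return {
        "dc_title_s": row["title"].strip(),
        "dc_language_s": row["language"].strip(),
        "dc_rights_s": row["rights"].strip(),
        # Multi-valued subject field, split on semicolons.
        "dc_subject_sm": [s.strip() for s in row["subjects"].split(";") if s.strip()],
        # Solr envelope order: min X, max X, max Y, min Y.
        "solr_geom": f"ENVELOPE({west}, {east}, {north}, {south})",
    }


# Hypothetical row, for illustration only.
example_row = {
    "title": " Ethnographic map of the Balkans ",
    "language": "ger",
    "rights": "Public",
    "subjects": "Ethnography; Balkans",
    "west": "20", "east": "30", "south": "43", "north": "48",
}
print(to_index_document(example_row))
```

Keeping the mapping in a single function makes the correspondence between spreadsheet columns and standard elements explicit, which is the same information the project stores in its administrative spreadsheet.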
An inseparable part of the database is a spreadsheet in which administrative metadata (metadata about metadata) and relationships between individual standards (Dublin Core [29], MARC21 [31], ISO [41], Europeana Data Model [11]) are collected. It also contains metadata transformations so that they can be used in systems based on these standards.
The most time-consuming part of the evaluation was matching the information from the metadata in the analyzed digital library to the scope pattern. The assignment of metadata elements of the analyzed collections to the adopted evaluation criteria is presented in Table 3, and the level of difficulty of obtaining metadata was calculated according to Formula (1).

Table 3. Assignment of metadata elements of the analyzed collections to the adopted evaluation criteria (columns: Item (k), Evaluation Criterion, Metadata in GeoPortOst).

Table 3 demonstrates clearly that the metadata contain elements that correspond to 13 out of 14 evaluation criteria. Almost every item of metadata was gathered directly from a particular metadata field in the GeoPortOst database. This means that almost every evaluation criterion has a direct equivalent in the GeoPortOst database, and the weights are equal to 1.0 for almost all criteria. Mapping methods were obtained by simple analysis, and their weight is 0.8. Unfortunately, information about the reference system is not provided in the GeoPortOst database (so its weight equals 0.0).
What is noteworthy is the separation of width and height when determining the map format, so there is no problem using these numerical values to determine the details of the map (if we know the extent of the geographical coordinates of the map).
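One way such a calculation could work is sketched below in Python. This is an assumption for illustration, not the method used in the study: it treats the Earth as a sphere, takes the east-west ground distance at the mid-latitude of the bounding box, and divides it by the sheet width. It ignores the map projection, margins around the map image, and extents that cross the antimeridian; the function and parameter names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def approx_scale_denominator(west, east, south, north, width_cm):
    """Rough scale denominator from the longitude extent and the sheet width.

    Coordinates are in decimal degrees; width_cm is the map width in centimetres.
    """
    mid_lat = math.radians((south + north) / 2)
    ground_width_m = math.radians(east - west) * EARTH_RADIUS_M * math.cos(mid_lat)
    return ground_width_m / (width_cm / 100)


# Hypothetical map: 10 degrees of longitude around 45.5 N on a 40 cm wide sheet.
denominator = approx_scale_denominator(20, 30, 43, 48, 40)
print(f"approximately 1:{round(denominator, -4):,.0f}")
```

An estimate of this kind is only indicative, but it could help to fill in a scale range for maps whose scale was not recorded.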
Owing to the specific nature of hidden maps, a very detailed description of the source documents is provided. As a result, it is possible to determine the map's reliability, time of creation, descriptive information, and the type of data that were the basis for creating the map.
With the above in mind, the level of difficulty equals 12.8 out of 14.0, which means that it is very easy to obtain the important metadata for the scope pattern.
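Assuming that the level of difficulty is the sum of the per-criterion weights, the value follows from the weights reported above: twelve criteria obtained directly, mapping methods obtained by simple analysis, and the missing reference system:

$$E = \underbrace{12 \times 1.0}_{\text{direct}} + \underbrace{1 \times 0.8}_{\text{mapping methods}} + \underbrace{1 \times 0.0}_{\text{reference system}} = 12.8$$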
Typical features, such as the title, the type, the date, the date range, rights, language, the subject, and the distribution format, were considered for evaluation. First, the completeness for each evaluation criterion ($E_k$) was calculated according to Formula (2). Then, the completeness for each collection ($E_{tc}$) depending only on typical metadata was calculated according to Formula (3). The results of these calculations are presented in Table 4. The collections Ethnodoc, Lambda, and the Handbook of the History of Southeastern Europe had the most complete description in terms of typical criteria. On the other hand, GEI digital and IEG-Maps lacked information about the distribution format and provided little information about the subject. Finally, in GeoPortOst, it is worth improving the following metadata: date range and language. The subject was the metadata element with the lowest score. All collections were characterized by well-collected information about the title, type, rights, and date, and the completeness for those criteria was about 8.0 out of 8.0.
Cartographic metadata, including the geographic location, scale, reference system, mapping methods, map format, and information about the source, were considered for evaluation. Completeness for each evaluation criterion was calculated according to Formula (2). Completeness for each collection ($E_{mc}$) depending only on map metadata was calculated according to Formula (3). The results of these calculations are presented in Table 5. Table 5 shows that geographic coordinates were provided for almost every digital object in the library and that the map metadata with the second-best result were the information regarding the source. The GeoPortOst collection offered the most comprehensive cartographic characteristics: its completeness equaled 3.0 out of 6.0. The completeness of Digital collections and GEI digital was 2.5 and 2.7, respectively.

Discussion
The research has shown that a scope pattern intended to describe maps by space, time, content, and context, and thereby to increase their interoperability, should contain metadata that are typical for all objects in the digital library as well as metadata that are specific to maps. It is known that typical metadata are easy for catalogers to obtain in the library [42]. The title, type, rights, and date turned out to be the most complete, whereas the subject was the worst-described metadata element in all collections. This is because the appropriate qualification of the subject is difficult, especially for maps that originate from a wide time range, as in the case of Digital collections (1575-1918), GEI digital (1833-1918), and GeoPortOst. It is much easier to define a subject for modern maps or those originating from the same period, such as in the Ethnodoc collection.
Another important metadata item is the type. It was based on the controlled vocabulary of map types in the authority files (GND) of the German National Library [43]. It is an open vocabulary that contains 52 subjects. It is worth harmonizing this vocabulary because it contains very similar types of maps, e.g., Geschichtskarte and Historische Karte. Additionally, since it has been maintained only by the library community, it contains certain spelling inconsistencies, such as Topografische Karte/Topographische Karte.
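A simple harmonization step could be sketched in Python as follows. The mapping table is hypothetical and would need to be curated against the GND; in particular, which label is treated as the preferred form is an arbitrary choice made here for illustration.

```python
import re

# Hypothetical variant-to-preferred mapping; a real one would be curated
# against the GND authority file. Keys are lower-cased, whitespace-normalized.
PREFERRED = {
    "topografische karte": "Topographische Karte",
    "topographische karte": "Topographische Karte",
    "geschichtskarte": "Historische Karte",
    "historische karte": "Historische Karte",
}


def harmonize_type(label):
    """Return the preferred map type label, or the trimmed input if unknown."""
    key = re.sub(r"\s+", " ", label).strip().lower()
    return PREFERRED.get(key, label.strip())


for raw in ["Topografische  Karte", "Geschichtskarte", "Wirtschaftskarte "]:
    print(repr(raw), "->", repr(harmonize_type(raw)))
```

Labels that are not in the mapping are passed through unchanged, so the step can be applied repeatedly while the mapping table is extended.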
Cartographic metadata, by contrast, are not so easy to obtain. Geographic coordinates are the best-collected element because the idea of the creators of the digital library was to provide maps that have a spatial reference, so this information was a priority. GeoPortOst is the most comprehensive of the described collections. Its description is focused on information about the source, as well as on the map dimensions. This information turned out to be relatively easy for librarians (non-cartographers and non-geographers) to collect. Scale, recorded as the denominator of the map scale, was collected for 896 maps (30% of all objects) in 4 out of 8 collections. With well-defined dimensions and geographical coordinates, it is possible to determine the level of detail of maps. Furthermore, as the maps in this collection originate from the period 1572-1934, it is quite difficult to obtain information about the scale of the oldest maps. Unfortunately, information about the reference system, which provides details about map distortion and makes it possible to transform the map for use in various systems, is not gathered by any collection. As far as typical information regarding digital objects is concerned, the most complete collection is Ethnodoc, which may serve as a model. The collection is consistent, and it includes maps by two authors that were published in 2004-2018. Cartographic metadata were collected in the most comprehensive way for GeoPortOst. This was also the only collection for which map format data had been gathered.
Hidden maps collected in the GeoPortOst Project come from various books. In the library, cataloguers record information about the source, that is, the author, the title, the publication date, the catalog number, and the source link. Table 6 presents the number of maps which have metadata regarding their source. The source description is very valuable information in the context of hidden maps. Information about the publication date and the title turned out to be the most complete: these metadata are recorded for 88-89% of all objects.
In addition, it should be noted that the set has been associated with numerous content aggregators. Thanks to this description, the maps are available via the Online Public Access Catalog (OPAC), Wikimedia Commons (Figure 3) [44], Karten Speicher (a network connecting resources from various German libraries [15]), the DFG Viewer (the viewer of the Deutsche Forschungsgemeinschaft, the German Research Foundation) [45], Wikidata [38], Georeferencer [13] or web apps that can simply use the maps through web map services, and Recogito [46]. An import of GeoPortOst into Old Maps Online [14] is planned. Further research on the provision of cartographic resources will concern the ontology of time and space in the context of the use of old maps by historians, geographers, cartographers, and librarians [47].

Conclusions
When describing maps, metadata should be divided into three groups. The first is administrative data, i.e., metadata concerning metadata. These metadata are collected in special databases that show the kind of standards that were used to prepare particular metadata, and the relationship between different standards. The second are typical metadata for each object in a digital library, such as the author, publication date, etc., and the third are cartographic metadata, which describe the character of maps. The answer to the first research question is as follows: the set of metadata to describe maps should contain typical metadata (the title, type, date, date range, rights, language, subject, and distribution format) and cartographic metadata (the geographic location, scale, reference system, mapping methods, map format, and, finally, information about the source).
It turns out that some collections are well described by typical metadata such as the title, type, rights, and date. The Ethnodoc collection has the most complete typical metadata, while the GeoPortOst collection has the most complete cartographic metadata, including the geographic location and information about the source.
Data that are completed in compliance with uniform rules are easier to make interoperable. This is what made it possible to link data from the GeoPortOst Project to other databases.
The evaluation allows for identifying good practices in collecting metadata, such as recording detailed information about source material. It enables the owners to show the strengths and weaknesses of a given collection in a quick and easy way. Additionally, it makes it possible to detect errors and introduce quick and easy improvements. It may also indicate elements that can be used further; for example, the width, height and geographic coordinates may be used to calculate the level of detail of a given map.