Describing Geospatial Assets in the Web of Data: A Metadata Management Scenario

Metadata management is an essential enabling factor for geospatial assets because discovery, retrieval, and actual usage of the latter are tightly bound to the quality of these descriptions. Unfortunately, the multi-faceted landscape of metadata formats, requirements, and conventions makes it difficult to identify editing tools that can be easily tailored to the specificities of a given project, workgroup, and Community of Practice. Our solution is a template-driven metadata editing tool that can be customised to any XML-based schema. Its output is constituted by standards-compliant metadata records that also have a semantics-aware counterpart eliciting novel exploitation techniques. Moreover, external data sources can easily be plugged in to provide autocompletion functionalities on the basis of the data structures made available on the Web of Data. Besides presenting the essentials on customisation of the editor by means of two use cases, we extend the methodology to the whole life cycle of geospatial metadata. We demonstrate the novel capabilities enabled by RDF-based metadata representation with respect to traditional metadata management in the geospatial domain.


Introduction
Retrieval of assets on the Web primarily relies on the metadata describing them. Data portals can then articulate search mechanisms on the basis of these descriptions, with degrees of expressiveness that depend on the granularity of the metadata schema governing their structure. Among assets, those related to geospatial information rely almost entirely on metadata for retrieval (often dubbed discovery) because the assets are typically in non-textual formats, and thus the indexing practices of generalist search engines are inefficient at best. Moreover, this category of data is characterised by properties (e.g., the geographic extent, a.k.a. the bounding box) that require standard metadata in order to be encoded and processed by the specific-purpose tools in this domain [1][2][3]. This is among the drivers of the standardisation effort represented by the INSPIRE Directive [4][5][6], a collection of recipes for interoperability with a sound foundation in standards. INSPIRE allows for harmonisation of geospatial assets among European countries because, despite the different transpositions of the Directive in the distinct countries, the baseline set by core INSPIRE metadata allows the geospatial community to discover and access heterogeneous assets in a seamless fashion.
However, the landscape of data management has inevitably changed since the formulation of the Directive and its early, prototypical implementations. With regard to metadata, the once state-of-the-art groundwork set by the reference standards for geographic information [7][8][9] has become inadequate, primarily because of the shift in perspective from data representation (that is, the quest for appropriate encoding mechanisms for data and metadata in a specific domain) to data access and processing. As an example, terms like Open Data, Linked Data, RDF, and SPARQL are increasingly associated with data interoperability and openness. Also, novel requirements not envisaged in the original framework of INSPIRE have emerged in the last few years, some of which are being tackled by INSPIRE Thematic Working Groups (such as in [10,11]). As an example, data catalogues, in the broadest sense, have been enriched by the new category of data sources constituted by real-time/near-real-time data pulled from sensors. Consequently, these new data sources required ad hoc data and metadata representations for management and enactment. Finally, as formats and practices develop in the geospatial context, the Web at large participates in the emergence of the vast amount of machine-processable information generally referred to as the Web of Data [12,13]. In general, Semantic Web technologies have great potential for fine-grained discovery of assets, as testified by the applications of semantics in many distinct domains, such as in [14][15][16].
In the engineering of the data sharing infrastructure for RITMARE (http://www.ritmare.it/), a Flagship Project by the Italian Ministero dell'Istruzione, dell'Università e della Ricerca, we decided to adopt best practices that, together with compliance with the INSPIRE Directive, could also ease migration to the aforementioned novel paradigms for data representation and access. In fact, on the one hand we need to harmonise metadata according to the baseline set by INSPIRE (specifically, by the Italian transposition of the Directive, RNDT [17]) and also according to SensorML, the metadata format we employ for sensor information [18,19]. On the other, we want to ground metadata creation in controlled vocabularies and, in general, context information drawn from the Web of Data. Referring to the categorisation of heterogeneities in [20], our research tackles the semantic level, the other two (syntactic and structural) being addressed by the authoritative standardisation bodies (e.g., the ISO, INSPIRE, and OGC communities). Also, it should be noted that, among the different efforts addressing semantic mismatch of metadata descriptions, we concentrate on semantic lift (that is, the association of text-based property values with unique, URI-based identifiers, as detailed in Section 3). Other research threads, such as the complementary activities devoted to harmonising the independent contributions to the geospatial Web of Data into a consistent mesh of interconnected data structures, shall be considered off topic.
In other words, we present our solution for bridging the gap between legislative compliance and the aforementioned novel Web 3.0 practices [21]. In particular, we focus on the integration of external data sources in order to turn single-tenanted, monolithic metadata descriptions into living documents. In fact, our RDF-based metadata descriptions can more easily adapt to changes (researchers changing workplace, companies moving to a different address, terminologies evolving, etc.) with respect to traditional, text-based metadata, and they elicit innovative discovery mechanisms, as exemplified in Section 4. It should be noted that, whereas the practices described in this work stem from the requirements posed by the RITMARE project, usage of the methodology does not require any prior knowledge of the project. In fact, our metadata management framework has been adopted by a number of projects, namely the FP7 projects ERMES (http://www.ermes-fp7space.eu/it/homepage/) and EuroFleets2 (http://www.eurofleets.eu/np4/home.html), the H2020 project eLTER (http://www.lter-europe.net/lter-europe/projects/eLTER), and the Italian Flagship Project NextData (http://www.nextdataproject.it/). Moreover, the metadata profiles that are available out of the box are of general interest to the INSPIRE and SWE communities [22], and thus our methodology is fit for adoption by data providers.
The methodology for metadata management we propose is hinged on templates, that is, definitions of metadata schemas created according to a metalanguage (here, we use "metalanguage" instead of "template language" to avoid ambiguity with the definitions in Section 3.2). Templates express the structure of a target XML metadata schema, such as the already mentioned INSPIRE and RNDT schemas, but also SensorML (versions 1.0.1 and 2.0.0) [18,19]. They also contain the necessary information to apply semantic lift to the metadata that is generated. Specifically, the template drives creation of a metadata editing interface, the application frontend, that is bound to the intended schema and data sources. Then, the metadata authoring activity provides additional information that is not meant for the specific XML output but rather for constituting the semantic counterpart of the metadata description, expressed as RDF [23]. Whenever the original metadata are requested, for example after discovery of the asset by the end user, the template is looked up again, the RDF description corresponding to the asset is extracted from the catalogue, and the XML description is reconstructed. The advantages of this practice with respect to metadata consistency are apparent: in fact, every time the XML metadata description is generated, it is built on the basis of the updated information drawn from the Web of Data according to what the template and the RDF description specify.
This paper is organised as follows. Section 2 draws the context of this work, comprising essentials on Semantic Web technologies that are going to be useful in the following, presents an overview of project RITMARE, and introduces related works. Among the solutions proposed, we focus on EDI, the metadata management application harnessing XML and RDF for metadata description. Section 3 presents two use cases and implements them as template structures; this section also covers integration of external data structures and generation of the distinct output formats. Section 4 discusses the advantages of our approach. Finally, Section 5 draws conclusions, hints at the "big picture" of decentralised metadata management, and outlines future work on the subject.

Semantic Web Essentials
Among the innovative aspects of our approach is recourse to information drawn from the Web of Data for metadata provision. We are going to barely scratch the surface of the variety of technologies, specifications, and practices that characterise this domain but, after reading this section, the reader shall be aware of the data representation and query formalisms that are used throughout the paper. Further information on this topic can be found in [24]. Basic knowledge of the XML data model (https://www.w3.org/standards/xml/) is also useful to fully grasp the structure of the metalanguage we developed; still, introducing this baseline technology is outside the scope of this work.
The reference data model for the Web of Data is RDF [23], a formalism expressing information as directed labelled graphs whose atomic component is the triple (or assertion), such as the following:

<http://some/subject> <http://some/predicate> <http://some/object> .
Triples are composed of a subject, a predicate, and an object, typically identified through URIs [25] (think of URIs as good old URLs that do not necessarily lead to some web content). Objects can also be plain literals, although this category of "leaf" entities does not contribute to shaping the overall data graph:

<http://some/subject> <http://some/predicate> "some object"@en .
The namespaces [26] hampering readability of URIs are often substituted with prefixes, as in the Turtle formalism we chose among the different serialisation formats for this data model [27]. Turtle also makes it possible to avoid repeating invariant information (i.e., the same property and/or the same subject) when providing multiple triples, by using punctuation:

@prefix ex: <http://some/> .
ex:subject ex:predicate_1 ex:object_1 ;
           ex:predicate_2 ex:object_2 .
Whatever the specification style, RDF triples induce a (decentralised) graph whose structure and information content require ad hoc query and, in general, manipulation formalisms. In an ideal Linked Data scenario (that is, when URIs in RDF descriptions are resolvable to the actual data structures), agents can cross the graph from one end to the other by following RDF properties; in realistic terms, browsing the Web of Data requires federating a number of distinct endpoints that are interrogated through SPARQL, an SQL-like query language. SPARQL [28] has reached maturity and provides far more flexibility and expressiveness than traditional SQL [29] (albeit the performance that relational databases allow for when managing huge amounts of data is still unparalleled). We are only going to use SPARQL queries featuring, in accordance with SQL practices, SELECT and INSERT statements, even if the language allows for other query forms (CONSTRUCT, DESCRIBE, ASK). Retrieval of data with SELECT statements amounts to matching the data (i.e., the triples, the assertions) in the database with the graph defined by the triples in the query. As an example, consider a query selecting the keywords associated with resource dataset_1 and retrieving from a remote endpoint their human-readable representations in English. In particular, in the following of this paper, we are going to exploit query federation in order to reconstruct the normative XML representation of INSPIRE metadata from the corresponding RDF description.
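As a minimal sketch, such a federated query can be assembled as follows. The dataset URI, the remote endpoint address, and the dct:subject/skos:prefLabel predicates are illustrative assumptions, not vocabulary mandated by our templates:

```python
# Sketch of a federated SELECT query: keyword URIs are taken from the local
# store, while labels are fetched from a remote vocabulary endpoint via the
# SPARQL 1.1 SERVICE keyword.
def keyword_query(dataset_uri: str, vocab_endpoint: str, lang: str = "en") -> str:
    # dct:subject and skos:prefLabel are assumed predicates for illustration.
    return f"""
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?keyword ?label WHERE {{
  <{dataset_uri}> dct:subject ?keyword .
  SERVICE <{vocab_endpoint}> {{
    ?keyword skos:prefLabel ?label .
    FILTER ( lang(?label) = "{lang}" )
  }}
}}
""".strip()

print(keyword_query("http://example.org/dataset_1",
                    "http://vocab.example.org/sparql"))
```

Posting such a query to the local endpoint delegates label resolution to the remote one, which is exactly the mechanism exploited later for reconstructing XML records.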

The RITMARE Flagship Project
As already discussed in the introduction, adoption of our metadata management techniques does not require any prior knowledge of project RITMARE. However, this section outlines the data sharing infrastructure of this specific project as an example application of the methodology in a comprehensive architecture for the management of spatial data. RITMARE requires integration of all state-of-the-art contributions to Italian marine research into a coherent SDI [30], a framework for the collection and provisioning of geospatial data, metadata, networked services, and technologies. A coarse-grained categorisation of SDIs distinguishes between centralised and decentralised infrastructures, according to whether data and metadata are stored in a single repository or distributed among the distinct data providers. The RITMARE infrastructure belongs to the second kind, comprising:
• A set of peripheral nodes that expose standards-compliant metadata and services.
• A centralised catalogue service that provides access to the resources made available by the project as a whole.
The project is characterised by a heterogeneous set of data providers (public research bodies and inter-university consortia) as well as a variety of stakeholders (public administrations, private companies, and citizens). As a consequence, these entities envisage a varied corpus of heterogeneous data, metadata, workflows, and requirements. Besides this, data providers featured different degrees of maturity with regard to the provisioning of assets according to the mandated standards: this means that, in the development of the infrastructure, much effort was devoted to capacity building on the data providers' side. Moreover, being a national project, the RITMARE SDI is bound to the rules set by INSPIRE as well as by RNDT: thus metadata management has a key role in the required architecture [31,32]. This has been achieved by providing a virtual appliance, the Geoinformation Enabling ToolkIT software suite (http://www.get-it.it/), GET-IT for short, a FOSS product that is capable of kickstarting an autonomous node in the SDI for the collection, annotation, and deployment of geospatial data. Among the achievements of the GET-IT suite is the integration of traditional geographic information (e.g., layers) with sensor data; to our knowledge, GET-IT is the first application achieving this. Specifically, the GeoNode distribution (http://geonode.org/) has been complemented with ad hoc components, among these the facilities for managing sensor descriptions and related observations in the Sensor Observation Service (SOS) (http://www.opengeospatial.org/standards/sos) implementation by 52°North (http://52north.org/). Geographic layers can then be mixed and matched with real-time data from sensors [33,34].

Related Works
Among the many products available in the state of the art for the provision of geospatial metadata, the one provided by GeoNetwork version 2.8 and earlier (http://geonetwork-opensource.org/) is, to our knowledge, the only tool allowing for easily "pluggable" metadata schemas. Thus it is difficult to identify editors that can be compared to our solution: we can only consider the tools that, separately, address the metadata schemas that are more widely implemented in RITMARE, that is, ISO 191** profiles and SensorML descriptions. The use case presented in this work focuses on the first category of metadata schemas, which is also the one most widely supported by editing tools; a good review of these is provided by the page maintained by the American FGDC (ISO metadata editor review: https://www.fgdc.gov/metadata/iso-metadata-editor-review). This source makes it apparent that, although more mature editors may provide features that are currently not supported by our tool, which is still in its infancy, existing ISO metadata editors only have partial support for customisation of the governing metadata schema and no support at all for third-party data sources. The same applies to the second category of metadata editing tools, that is, those devoted to SensorML. A similar review can be found on the OGC website (Sensor Web Enablement Software: http://www.ogcnetwork.net/SWE_Software). The two editing tools referred to in this review are the Pines SensorML Editor (http://lxspine.googlepages.com/pine'ssensormleditor) and the SensorML Process Editor (http://code.google.com/p/sensorml-data-processing/). Both implement an outdated version of SensorML and neither implements the functionalities implied by our requirements. Also considering newer editing tools not included in this list, such as the OpenSensorHub SensorML editor (https://github.com/opensensorhub/sensorml-editor) and the SensorNanny DrawMyObservatory editor (https://github.com/ifremer/snanny-drawmyobservatory), none of them provides such flexibility with respect to the metadata schema that is implemented. More importantly, plugging in data structures provided by third parties is another functionality not implemented in existing editors.

EDI, a Template-Driven Metadata Editor
In developing GET-IT, we had to support creation of metadata compliant with RNDT and SensorML; we also wanted to support the core INSPIRE profile of ISO 19115/19119 as well as its transposition by different countries. Moreover, the heterogeneity of the user community in RITMARE demanded a high degree of customisation, particularly with respect to project-specific context information. Hence, in order to manage such diversity, we decided to abstract from the specific output format and create a general-purpose tool that, appropriately parameterised by a custom metadata schema definition, could render a web-based authoring tool in the browser and assist the user in providing the metadata. EDI (http://edidemo.get-it.it) [35], the editing tool described in this paper, is constituted by a client-side JavaScript application that can autonomously create the interface and connect to the data sources that are specified in the given template. A server-side component, written in Java, executes the actual translation of the user input into both the XML and RDF representations that are supported by our architecture. To our knowledge, no existing tool provides the same functionalities with respect to the heterogeneity of metadata profiles that can be supported and the capability of plugging in external data sources. The semantics-aware annotation of metadata fields is another characteristic that cannot be found in state-of-the-art metadata editing tools.
The EDI metalanguage allows for tailoring the behaviour of the application to a specific project or domain. On the one hand, recourse to a metalanguage for expressing the target metadata schema allows for full tailoring of the latter to the application context at hand. On the other, integration of external data structures elicits customisation of the metadata property values that populate this schema. In RITMARE, such customisation starts with the creation of templates expressing the required metadata schemas (in our case, RNDT and SensorML). Secondly, context information associated with the project (e.g., the project's structure as a collection of institutes and individuals, as well as selected controlled vocabularies) has been formalised and made available as RDF data structures in order to be referred to in templates. Similarly, the applicable external data sources have been identified and plugged into the editing tool. Integration of EDI with the GET-IT suite also allowed for narrowing the number of required metadata fields by applying naming conventions on resource identifiers and by extracting information at data upload time.

Metadata Management Scenario
Although we indulge in some technicalities in the Appendices, this paper is not meant to provide a comprehensive overview of the functionalities that are implemented by EDI (please refer to the documentation provided on GitHub for a developer view of the tool). However, customisation of the target metadata schema does not call for modification of a single line of code in the application. In fact, EDI is a template-driven metadata editor that, in principle, supports any XML-based (or text-based) metadata schema: system administrators can either create a template from scratch or customise one of those provided with the application, which implement the basic INSPIRE profile, RNDT, and SensorML versions 1.0.1 and 2.0.0 profiled according to the OGC SOS Lightweight Profile [36]. In the following of this section, we are going to focus on the features that, in line with the research problems addressed in the Introduction, allow for opening metadata management to the Web of Data.
Figure 1 presents the information flow regulated by our template structure, consisting of the following phases:
1. The system administrator executes the one-off creation or customisation of the template in order to set the metadata schema for the specific use case. In this phase, external data sources from the Web of Data can be plugged in: this allows metadata properties to refer to resources (persons, toponyms, code values, etc.) that are managed by third parties. The template serves as input to EDI both in the peripheral and central nodes of RITMARE.
2. The metadata maintainer of a RITMARE peripheral node uses the editing interface that is created by EDI on the basis of template definitions and creates/edits metadata (the output of EDI). The data sources that have been plugged in during the previous phase enable autocompletion functionalities that reduce as much as possible the effort required for metadata provision.
3. The metadata records in XML format are generated by EDI for insertion in catalogues and applications that understand the specific formalism, such as in the peripheral nodes of the RITMARE infrastructure. Typically, the entities referred to in the previous phase are rendered now as free-text property values.
4. The semantics-aware counterpart of the metadata description is also produced by EDI and stored as RDF data in the project's triple store (i.e., a database for RDF data). The record can also be published on the Web of Data and be accessed according to the same formats and protocols that allowed for plugging in external data sources in the first phase.
5. The end user can search the RITMARE central geoportal through the discovery client. When metadata records are requested according to the XML metadata schema, they are produced again by EDI on the basis of template definitions. Specifically, property values drawn from the data sources that are referred to in the template are accessed again at user request time. This allows for generating an XML description containing up-to-date property values, as detailed in the following of this section.
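As a rough illustration of the request-time reconstruction in phase 5, the following sketch rebuilds an XML record from a template-like list of items plus freshly looked-up property values. The element names, paths, and the hard-coded lookup are invented stand-ins for the real INSPIRE/RNDT schema and the federated SPARQL query:

```python
import xml.etree.ElementTree as ET

# Simplified "template": each item carries either a fixed value or the name of
# a variable whose current value is fetched at request time.
template_items = [
    {"path": "contact/organisationName", "fixed": None, "field": "org"},
    {"path": "contact/email",            "fixed": None, "field": "email"},
    {"path": "contact/role",             "fixed": "resourceProvider", "field": None},
]

def lookup_current_values(resource_uri: str) -> dict:
    """Stand-in for the federated SPARQL lookup: returns up-to-date values."""
    return {"org": "CNR-IREA", "email": "info@example.org"}

def rebuild_record(resource_uri: str) -> ET.Element:
    values = lookup_current_values(resource_uri)
    root = ET.Element("MD_Metadata")
    for item in template_items:
        node = root
        # Walk/create the XPath-like location for this item.
        for step in item["path"].split("/"):
            child = node.find(step)
            if child is None:
                child = ET.SubElement(node, step)
            node = child
        node.text = item["fixed"] if item["fixed"] is not None else values[item["field"]]
    return root

print(ET.tostring(rebuild_record("http://example.org/dataset_1"), encoding="unicode"))
```

Because the values are looked up anew on every request, the serialised XML always reflects the current state of the plugged-in data sources.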

Use Cases and Requirements
In this section, we introduce two use cases that may look straightforward to implement when tackled in a simplistic way but, as soon as new (perfectly reasonable) requirements pop up, may become far more challenging. In both use cases, the key point is the selection of the data sources that are made accessible via the metadata editor. It should be noted that, although these use cases stem from the requirements posed by project RITMARE, they are significant in the broader scope of geospatial metadata management.

Specifying Points of Contact
Figure 2A shows the web form fragment that allows the metadata maintainer to specify a point of contact for a dataset according to the INSPIRE profile of ISO 19115/19119. As soon as the user starts typing the e-mail address, the interface suggests some options for completion of the metadata field (b). Upon selection of one of these, the following field describing the point of contact (that is, the field containing the name of the organisation the person works in) is automatically filled in (c). Also, the form provides a drop-down list for selecting the specific role the point of contact fulfils (in the example, "Resource provider"). One may argue that, with a supporting relational database behind the scenes, it is straightforward to implement exactly the same functionality. But what if the data structures that are leveraged for autocompletion are managed by a third party? Although not apparent in this use case, this requirement (that is, integrating third-party data structures in the metadata editing application) is of topmost relevance for the following use case. Also, consider what happens when some of the metadata fields involved in this use case (say, the e-mail address associated with the individual) change. Shall we rely upon inconsistent metadata featuring the old property values or try to keep them updated?
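The consistency issue raised by this last question can be illustrated with a toy example: when the record stores a URI instead of the literal e-mail address, the regenerated description automatically reflects changes at the source. All names, URIs, and addresses below are invented:

```python
# Toy "authoritative registry" standing in for a third-party RDF data source.
registry = {
    "http://example.org/person/42": {"name": "A. Rossi",
                                     "email": "a.rossi@old.example.org"},
}

# URI-based record: the e-mail is never stored in the record itself.
record = {"title": "Dataset 1", "contact": "http://example.org/person/42"}

def render_contact(rec: dict) -> str:
    """Resolve the contact URI against the registry at rendering time."""
    return registry[rec["contact"]]["email"]

print(render_contact(record))  # old address
registry["http://example.org/person/42"]["email"] = "a.rossi@new.example.org"
print(render_contact(record))  # updated automatically, record untouched
```

A text-based record would keep serving the stale address until someone edited it by hand; the URI-based record self-heals at every rendering.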
Figure 2. Use cases: "specifying points of contact" (A); "keywords from controlled vocabularies" (B).

Keywords from Controlled Vocabularies
The web form fragment in Figure 2B allows the metadata maintainer to provide the keywords describing a dataset which, as often happens in geospatial metadata, are bound to a specific controlled vocabulary (or a set thereof). As an example, in INSPIRE-based metadata, the codelists provided by the reference ISO standards have been supplemented by those defined by the Directive (e.g., the codelist containing the INSPIRE Themes); also, distinct application domains may rely on further controlled vocabularies that are specific to a given CoP. In the use case, the codelist Climate and Forecast Standard Names has been selected from the comprehensive collection of thesauri defined by project SeaDataNet (http://www.seadatanet.org/) and provided through the NERC Vocabulary Server (http://vocab.nerc.ac.uk/) [37,38], and it is used to find the suggestions that are proposed to the metadata maintainer (b). Depending on her choice, the accessory fields describing the keyword are automatically filled in (c).
Also here, a supporting relational database could provide the terminology grounding autocompletion of keywords; but what about the inevitable evolution of such terminology? It is inconvenient for system administrators to duplicate this data source and strive to keep it updated with respect to the authoritative version. Rather, it would be better to directly link the application behaviour to the aforementioned data source. What if, at some point, a "find similar assets" or "translate this metadata record" functionality is requested? In traditional discovery based on ISO 19139 metadata, implementing query expansion and multilingual functionalities constitutes a daunting task. Instead, it is straightforward to accomplish this with the URI-based metadata generated by our application.
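To make the "find similar assets" point concrete, the following sketch shows query expansion as a simple traversal of SKOS narrower links over URI-based keywords. The tiny in-memory graph and its URIs are invented stand-ins for a real thesaurus such as those served by the NERC Vocabulary Server:

```python
# skos:narrower links of a miniature thesaurus (URIs abbreviated and invented).
NARROWER = {
    "ex:temperature": ["ex:sea_surface_temperature", "ex:air_temperature"],
    "ex:sea_surface_temperature": [],
    "ex:air_temperature": [],
}

def expand(concept: str) -> set:
    """Return the concept plus everything reachable via skos:narrower."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(NARROWER.get(c, []))
    return seen

# A search for "temperature" also matches assets tagged with narrower concepts.
print(sorted(expand("ex:temperature")))
# ['ex:air_temperature', 'ex:sea_surface_temperature', 'ex:temperature']
```

With free-text keywords the same expansion would require string matching against every vocabulary; with URIs it is a graph lookup, and multilingual labels come for free from the concept's skos:prefLabel values.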
In the next section, we review the template components that allow for realising these use cases and also introduce the components that regulate translation of user input into either of the data formats we employ, respectively, for metadata representation and for XML encoding of the latter.

Template Structure
All activities related to metadata management, such as editing of fields and creation of the RDF and XML representations of metadata descriptions, are driven by the EDI template, in XML format as well, which depends on the specific schema that is required by a given project (e.g., RNDT, SensorML, etc.). The template specifies properties of metadata that shall translate into widgets in the editing interface: some of these directly stem from regulations (e.g., whether a given field is mandatory or not, the associated multiplicity, etc.), while other properties depend on the context information associated with the specific project (such as the controlled vocabularies being used, the queries that allow the interface to propose default values for fields, etc.). Of course, the template also specifies the information required for translation into a valid XML document. It should be noted that the XML Schemas underlying the specific metadata format already contain some of this information. However, we found it unwieldy to leverage these specifications for articulating interface composition, for a number of reasons. Firstly, many constraints do not directly stem from the base schema but rather from the specific profile of the latter: as an example, both INSPIRE and RNDT rely on ISO 19139 for encoding, but each has its own mandatory fields, prescribed values, etc. Secondly, the most interesting aspects in this integration exercise, such as pervasive recourse to semantics-aware data structures and generation of multiple output formats, are not encoded in (and not the purpose of) the originating schemas.
As a consequence, a metalanguage for expressing these properties has been created and encoded as XML Schema. Figure 3 shows the key components that are used in templates for defining a metadata field. Specifically, the element tag (note that we prefer the term "tag" over "element" to avoid ambiguity with element components of templates) in Figure 3a defines individual metadata fields: the attributes defined provide a unique xml:id for it, specify whether it is mandatory or optional, and declare the associated multiplicity. Also, the field can be declared as alternativeTo another. The content of element tags includes the multilingual visual cues that can be seen in the interface (the label and help tags) and the XPath location where multiple instances of the metadata field shall be rooted (the hasRoot tag). The produces tag contains the set of items that represent individual XML nodes that shall be created for the specific metadata field. Finally, the rdfOut tag drives creation of the actual metadata representation to be stored in the RDF triple store; conversely, the rdfIn tag drives extraction of metadata field values when re-creating the metadata description according to the schema the template is meant to implement. This tag contains a SPARQL query retrieving the necessary property values from the local triple store. Since multiple data sources may contribute to creating the final metadata, this query definition is only partial because it lacks the triple patterns (i.e., the query constraints) that shall be matched against the remote data sources that are defined. This aspect will be clarified in the following. In fact, as anticipated above, element definitions can trigger the creation of multiple XML nodes (XML elements and attributes) in the target metadata file, which is the purpose of the item tag we are about to describe. This distinction is strictly necessary for the target format of our use cases because in ISO metadata (the reference schema the template structure was based upon) it is not infrequent to have the actual values selected by the metadata maintainer inserted in multiple places, together with fixed values in other positions in the specific XML sub-tree. Among others, this behaviour is apparent for keywords taken from controlled vocabularies, where a single term choice by the metadata maintainer produces six different XML nodes in the final document (see Appendix D for an example of this). The semantics of the item tag is the following: attributes hasIndex and outIndex specify the ordering of fields in the interface and in the output document, respectively. Attribute isFixed determines whether a widget shall be created in the editing interface (when set to "false") or whether the metadata field can be kept transparent to the end user because its value is known in advance. Other key attributes are hasDatatype, specifying the range of valid values for the item, and datasource, providing the aforementioned flexibility in the definition of external data sources. The tags included in individual items define, besides the visual cues we already found in the definition of the element tag, the specific value (when attribute isFixed is set to "true"), a default one, or the corresponding variable (the field tag) exposed by the associated datasource. The hasPath tag, specifying the XPath of the XML node that shall be created, is an essential component of item definitions. Finally, an optional rdfIn tag complements the SPARQL query in the rdfIn tag defined for the element as a whole. In fact, each element definition in the template defines the SPARQL query for retrieving the metadata property values that are required by the distinct items. Note that, since each of these may rely on a different endpoint, the SPARQL queries compiled from EDI templates are, by definition, federated. Appendix A contains the template definitions implementing the two use cases introduced in this section. Now we concentrate on the definition of data sources, the core feature for
eliciting decentralised management of metadata. In fact, both use cases defined in Section 3.1 involve data sources: The first is constituted by the triple store we set up for managing context information in project RITMARE, while the second is provided by project SeaDataNet. Although managed by different parties, these data sources can be accessed in a seamless fashion, through an HTTP endpoint, because they share the same access protocol (SPARQL) and the same data format (RDF). Of course, the content of each of these may relate to schemata created with different formalisms (such as RDFS or OWL [39,40]) and with different purposes, but the commonality of the base format allows for univocal query mechanisms. In fact, both data sources are handled in the same way by EDI and either can be the target of datasource definitions in the template. Figure 4 shows the three different categories of data sources supported by templates; all of them share an xml:id attribute for unique identification and an endpointType attribute allowing administrators to plug in different triple stores by associating the correct request parameters. They also share the url tag allowing for per-datasource definition of endpoints (that is, the web addresses queries shall be posted to). The three categories are also described below:

Codelist: This category of datasources assumes that the nested uri tag refers to a SKOS thesaurus (a controlled vocabulary encoded according to this specific ontology) and executes a standard query for matching code values. This is the data source type allowing for creation of the drop-down list for selecting the role of a point of contact in Figure 2a.

Sparql: This category allows for executing generic SPARQL queries. This comes in handy in the example use cases because the SeaDataNet endpoint provides SKOS-compliant thesauri whose structure slightly differs from that expected by the previous datasource type. Both use cases involve a datasource of this type.

Singleton: Executes queries that are required to return a single record (typically, as a consequence of a previous call to a datasource of the preceding type, to provide autocompletion functionalities). Both use cases involve a datasource of this type.
Specifically, the use cases above rely on the RITMARE endpoint (http://sparql.get-it.it/) and the NERC Vocabulary Server endpoint (http://vocab.nerc.ac.uk/sparql/), for the data structures representing the user community and the controlled vocabularies, respectively. When accessed, these addresses provide a web form to issue queries to the endpoint; by providing the query as a GET or POST request parameter, it is possible to issue queries in an automated way. As an example, the query that is used in use case "specifying points of contact" (matching the e-mail address of prospective points of contact) is shown in Listing 1.
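The automated access pattern described above can be sketched as follows. This is a minimal illustration, not EDI's actual client code: the query text is a hypothetical stand-in for Listing 1 (which is not reproduced here), and only the placeholder substitution and POST wrapping are shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical stand-in for the query in Listing 1: it matches
# prospective points of contact by e-mail address.
SEARCH_QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?mbox WHERE {
  ?person foaf:mbox ?mbox .
  FILTER (REGEX(STR(?mbox), "$search_param", "i"))
}"""

def build_sparql_request(endpoint: str, query_template: str, search_param: str) -> Request:
    """Substitute the $search_param placeholder and wrap the query
    as the 'query' parameter of a POST request to the endpoint."""
    query = query_template.replace("$search_param", search_param)
    body = urlencode({"query": query})
    return Request(endpoint, data=body.encode("utf-8"),
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# Build (but do not send) a request against the RITMARE endpoint.
req = build_sparql_request("http://sparql.get-it.it/", SEARCH_QUERY, "baggins")
```

The same mechanism works for GET requests by appending the encoded body to the endpoint URL as a query string.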
Listing 1: SPARQL query for use case "specifying points of contact"

The data structures that are looked up feature properties from the FOAF (http://xmlns.com/foaf/spec/) and vCard (https://www.w3.org/TR/vcard-rdf/) schemata: The first is used for modelling the researchers and institutes participating in RITMARE, the second provides fine-grained properties for detailing the former. In order to provide users of the EDI demo site with non-sensitive data to play with, we created example FOAF data structures for the characters in The Lord of the Rings and The Hobbit. By inserting the query in the web form provided by the first endpoint, it is possible to test the query before deploying the template: To do this, the placeholder "$search_param" shall be substituted with the actual search pattern, just like in Figure 2b. Then, the query returns the URIs and e-mail addresses of individuals matching the pattern.
A second query, shown in Listing 2, allows the application to autocomplete field "Institute" in the interface fragment shown in Figure 2a:

Listing 2: SPARQL query for use case "specifying points of contact"

By substituting any of the URIs returned by the first query for the placeholder "$search_param", the name of the corresponding organisation is returned.
Similarly, the second use case, "keywords from controlled vocabularies", relies on the queries shown in the following Listings. Listing 3 is used to match user input against the Climate and Forecast Standard Names vocabulary P07 provided by SeaDataNet; this restriction can be seen in line 6. Apart from this, the query is very similar to that in Listing 1.
Listing 3: SPARQL query for use case "keywords from controlled vocabularies"

The query in Listing 4 extracts the name of the thesaurus under consideration and the associated publication date. Since the format of the latter data value is a full timestamp, the BIND clause in line 8 extracts the fragment that matches the prescribed format yyyy-mm-dd and associates this value with variable ?date:

Listing 4: SPARQL query for use case "keywords from controlled vocabularies"

The template definitions for all data sources can be found in Appendices B and C.
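The timestamp-to-date extraction performed by the BIND clause can be mirrored outside SPARQL as well. The sketch below is illustrative only (the example timestamp is made up); it relies on the fact that the yyyy-mm-dd fragment always occupies the leading characters of an ISO 8601 timestamp.

```python
import re

# The prescribed output format is yyyy-mm-dd, i.e. the leading
# ten characters of an ISO 8601 timestamp.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}")

def to_prescribed_date(timestamp: str) -> str:
    """Keep only the yyyy-mm-dd fragment of a full timestamp,
    analogously to the BIND clause in the SPARQL query."""
    match = DATE_PATTERN.match(timestamp)
    if match is None:
        raise ValueError(f"not an ISO timestamp: {timestamp!r}")
    return match.group(0)
```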

Generating and Storing Metadata
Once metadata is posted to the server-side component of EDI, the prescribed XML output is generated by creating XML nodes following the XPath expressions defined by hasRoot and hasPath tags in the template. Additionally, the tag rdfOut defined by each element allows for specifying which RDF triples shall be produced and inserted in the triple store. In fact, during editing of metadata, EDI exposes (for fields that are not defined as fixed) the free-text values that are needed for generating the specific XML nodes but, under the hood, it keeps track of the URIs that have been selected by the user in order to create an RDF data fragment conveying the same semantics. As an example, the rdfOut tag in the first element defined in Appendix A (implementing the first use case, "specifying points of contact") is depicted in Listing 5.
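The node-creation step can be sketched as follows for a restricted XPath subset (plain slash-separated element names). This is a simplified illustration, not EDI's implementation: real ISO metadata paths involve namespaces and predicates, and the element names below are shortened, hypothetical ones.

```python
import xml.etree.ElementTree as ET

def ensure_path(root: ET.Element, path: str) -> ET.Element:
    """Walk a slash-separated path (a restricted XPath subset),
    creating each missing element along the way; return the leaf."""
    node = root
    for step in path.strip("/").split("/"):
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    return node

# Hypothetical, namespace-free version of an ISO 19115 path.
root = ET.Element("MD_Metadata")
leaf = ensure_path(root, "contact/CI_ResponsibleParty/electronicMailAddress")
leaf.text = "frodo@example.org"
```

Because existing nodes are reused, repeated calls with overlapping paths grow a single sub-tree rather than duplicating ancestors.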
Listing 5: Content of the rdfOut tag associated with use case "specifying points of contact"

It relates the identifier (the URI) of the dataset being described (indicated as "$id_1_uri") to a vcard:Individual data structure via property contactPoint from the DCAT namespace: This data structure is composed of the identifier of the contact point and that of the specific role she covers (both expressed, again, as URIs). All queries in this Section contain placeholders following the production rule <element_id>_<item_index>[_uri]. They are substituted at run time with the actual values entered by the metadata maintainer. As an example, placeholder "$resp_1_uri" is replaced with the URI associated with the first item in the element whose xml:id is "resp" (that is, the URI of the individual selected).
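The placeholder substitution can be sketched as follows. The regular expression encodes the production rule above; the triple pattern and URIs in the usage example are hypothetical, standing in for the content of Listing 5.

```python
import re

# $<element_id>_<item_index>[_uri], per the production rule in the text.
PLACEHOLDER = re.compile(r"\$[A-Za-z]\w*_\d+(?:_uri)?")

def fill_placeholders(rdf_template: str, values: dict) -> str:
    """Replace each placeholder with the value entered (or the URI
    selected) by the metadata maintainer for the matching item."""
    def lookup(match: re.Match) -> str:
        return values[match.group(0).lstrip("$")]
    return PLACEHOLDER.sub(lookup, rdf_template)

# Hypothetical rdfOut fragment and run-time values.
triples = fill_placeholders(
    "<$id_1_uri> dcat:contactPoint <$resp_1_uri> .",
    {"id_1_uri": "http://example.org/dataset/42",
     "resp_1_uri": "http://example.org/people/frodo"},
)
```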
It should be noted that these triples are not in one-to-one correspondence with the XML nodes that are created in ISO metadata, a characteristic that is even more apparent in the rdfOut tag of the second element (implementing use case "keywords from controlled vocabularies"), shown in Listing 6.
Listing 6: Content of the rdfOut tag associated with use case "keywords from controlled vocabularies"

In this case, a single triple is sufficient to provide the semantics of the six items that are defined in the corresponding element in the template. At a merely syntactic level, this is the major difference between our approach and any of the possible one-to-one translations of INSPIRE metadata into RDF, such as those in [41,42]. Section 4 elaborates on the profound implications of our approach for metadata management. For the time being, it suffices to note that this practice is the enabling factor for the novel functionalities we sketch in the next Section.
In fact, whenever the XML representation of a metadata record is requested, the template is parsed again in order to look up property values at request time. This is the role of rdfIn tags in the template: As an example, use case "specifying points of contact" defines three such tags (see Appendix A) because retrieving the necessary data values may involve up to three distinct data sources, that is, the local endpoint we use for storing metadata, the endpoint hosting the information on contact points (and organisations), and finally the endpoint storing the codelists that define the available roles for the point of contact (only two endpoints in our worked-out example). Listing 7 shows the query that is compiled from this information:

Listing 7: SPARQL query originating from rdfIn tag definitions in use case "specifying points of contact"

Variable names in the projection clause are the numeric values identifying the distinct items of the specific element. The query retrieves all vcard:Individuals related to the specific asset in locally stored metadata. The URIs identifying the person and her role as a contact point are then used in the federated queries indicated by SERVICE clauses. Each tuple in the result set generates the corresponding XML data chunks as specified in the template.
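The compilation of a federated query from per-datasource fragments can be sketched as follows. This is a simplified string-assembly illustration, not EDI's compiler: the endpoints and triple patterns are hypothetical, and a real implementation would also collect prefix declarations and the projection clause from the template.

```python
def federated_query(local_pattern: str, remote_blocks: list) -> str:
    """Compose a federated SPARQL query: locally stored triples are
    matched directly, while each remote endpoint contributes its own
    SERVICE clause (as prescribed by SPARQL 1.1 Federated Query)."""
    services = "\n".join(
        f"  SERVICE <{endpoint}> {{ {pattern} }}"
        for endpoint, pattern in remote_blocks
    )
    return f"SELECT * WHERE {{\n  {local_pattern}\n{services}\n}}"

# Hypothetical fragments: local metadata plus two remote endpoints.
query = federated_query(
    "<http://example.org/dataset/42> dcat:contactPoint ?contact .",
    [("http://example.org/people/sparql", "?contact vcard:hasEmail ?mbox ."),
     ("http://vocab.nerc.ac.uk/sparql/", "?role skos:prefLabel ?label .")],
)
```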
Similarly, use case "keywords from controlled vocabularies" entails federation of the NERC vocabulary server in order to retrieve the necessary keyword details:

Listing 8: Content of the rdfIn tag associated with use case "keywords from controlled vocabularies"

The queries we compile by combining the content of rdfIn tags are partly redundant with those provided for the definition of datasources but, for the time being, we have not found a viable means to express these queries in a unified way.

Discussion
In this Section, we check whether the metadata management practices proposed in the previous Sections correctly address the issues and desiderata presented in Section 3.1. Moreover, we provide further ideas on how to extend the capabilities of state-of-the-art geoportals.

Assuring Metadata Consistency
The first use case put forward possible inconsistency issues that may arise as soon as one of the field values included in metadata changes. There is no acknowledged practice for (automatically) reflecting such changes in the metadata descriptions that annotate assets: Typically, human intervention is required to bring metadata descriptions back to consistency. Instead, the inherently deferred XML production described in Section 3.3 returns up-to-date property values (of course, to the extent that the specific data source is kept updated). Note that this functionality cannot be achieved with normative ISO-based INSPIRE metadata, nor with any one-to-one translation of these into RDF. To our knowledge, no existing metadata management technique can achieve automatic update of metadata fields, especially those that involve data provided by third parties.

Recommending Assets
The accessory information that can be drawn from heterogeneous data sources can be valuable for extending the scope of one's application. As an example, the URI of a point of contact in a metadata description may lead to people in her social network (indicated, for example, by foaf:knows property values) and these may point to relevant assets that did not match the discovery criteria (e.g., because of semantic heterogeneity issues). Moreover, when the user accessing the geoportal is authenticated, discovery results that have been deemed as relevant by people in her own social network (because the corresponding assets have been viewed, downloaded, bookmarked, etc.) can be suggested on the basis of search log analysis. Both these social networks (that of the point of contact and that of the user executing the search) could be further shortlisted by matching the preferred research topics of the individuals involved (expressed as foaf:topic_interest property values).
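A minimal sketch of this recommendation idea, under the assumption that foaf:knows links and per-person asset relevance have already been materialised as in-memory mappings (all names and assets below are made up):

```python
def recommend(assets_by_person: dict, knows: dict, person: str, already_seen: set) -> list:
    """Suggest assets deemed relevant by people in someone's
    foaf:knows network, skipping assets the user already retrieved."""
    suggestions = []
    for friend in knows.get(person, []):
        for asset in assets_by_person.get(friend, []):
            if asset not in already_seen and asset not in suggestions:
                suggestions.append(asset)
    return suggestions

# Toy data: frodo's network and the assets his contacts interacted with.
knows = {"frodo": ["sam", "gandalf"]}
assets = {"sam": ["soil-map"], "gandalf": ["fireworks-dataset", "soil-map"]}
recs = recommend(assets, knows, "frodo", {"soil-map"})
```

In a deployed geoportal, the two mappings would instead be populated from SPARQL lookups and search-log analysis, and candidates could be further filtered by foaf:topic_interest overlap.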

Expanding Queries
The advantages of relying on information drawn from the Web of Data are even more apparent when considering resources that are more semantically rich and connected with other resources of the same kind, such as terms in controlled vocabularies. Use case "keywords from controlled vocabularies" hinted at this when speculating on the "find similar assets" or "translate this metadata record" functionalities. The first can be implemented on the basis of the (poly)hierarchical structure of SKOS thesauri because the URI of a term can be used to derive more general, more specific, equivalent, or simply related terms. These can in turn increase the result set of a discovery by matching the URIs of these terms (or their text representations in traditional discovery) against catalogue metadata. When metadata is encoded as XML or text, the second functionality can only harness automatic translation tools; instead, the URIs of terms in controlled vocabularies are in principle language-neutral, but the terms themselves can accommodate translations into multiple languages. Hence, a keyword from a controlled vocabulary can be straightforwardly translated into any of the supported languages.
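The SKOS-based expansion step can be sketched as follows, over a toy in-memory graph (term URIs and labels are invented; a real implementation would query the thesaurus endpoint for skos:broader, skos:narrower, and skos:related):

```python
def expand_query(term_uri: str, skos_graph: dict) -> set:
    """Expand a discovery keyword with the more general, more specific,
    and related terms reachable from its URI in a SKOS thesaurus."""
    node = skos_graph.get(term_uri, {})
    expanded = {term_uri}
    for prop in ("broader", "narrower", "related"):
        expanded.update(node.get(prop, []))
    return expanded

# Toy thesaurus fragment with hypothetical term URIs.
graph = {
    "vocab:sea_water_salinity": {
        "broader": ["vocab:salinity"],
        "related": ["vocab:sea_water_temperature"],
    },
}
terms = expand_query("vocab:sea_water_salinity", graph)
```

Each URI in the expanded set (or its per-language label, for text-based catalogues) is then matched against catalogue metadata to enlarge the result set.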

Exploiting Gazetteer Information
Gazetteer services constitute the primary means to bridge the gap between geospatial and traditional (i.e., text-based) web searches. Even though the use cases presented in this paper refer to metadata that have no geographic connotation, our metadata descriptions cover the full set of INSPIRE metadata, comprising geographic and temporal extent (expressed as data structures complying with the schemata in [41]). Elaborating on the former of these properties, RDF-based gazetteer services, such as GeoNames (http://www.geonames.org/), can be used to "translate" text-based queries such as "salinity of water column 100 km West of Naples" into the corresponding geospatial query. Instead, geoportals typically require the user to provide coordinates or draw a rectangle (the "bounding box") on a map. Moreover, toponyms are typically organised as a hierarchy according to containment properties, thus allowing for more query expansion criteria.
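A toy illustration of such a "translation", assuming a gazetteer lookup has already resolved the toponym to coordinates (the coordinates, the degrees-per-kilometre factor, and the box size below are rough, hard-coded approximations for illustration only):

```python
# Toy gazetteer: toponym -> (lat, lon); a real system would query
# GeoNames. At Naples' latitude, one degree of longitude spans
# roughly 85 km -- a crude flat-earth approximation.
GAZETTEER = {"Naples": (40.85, 14.27)}
KM_PER_DEG_LON = 85.0

def bbox_around(toponym: str, west_km: float = 0.0, half_size_deg: float = 0.5) -> tuple:
    """Turn 'N km West of <toponym>' into a (west, south, east, north)
    bounding box suitable for a geospatial discovery query."""
    lat, lon = GAZETTEER[toponym]
    lon -= west_km / KM_PER_DEG_LON  # shift the centre westwards
    return (lon - half_size_deg, lat - half_size_deg,
            lon + half_size_deg, lat + half_size_deg)

# "100 km West of Naples" as a bounding box.
bbox = bbox_around("Naples", west_km=100)
```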

Further Suggestions for Semantic Enrichment
The use cases in Section 3.1 entail data structures representing individuals, encoded according to the FOAF and vCard schemata, and descriptions of terms in controlled vocabularies, expressed as SKOS data structures. However, even a sneak peek at the Linked Open Data cloud diagram (http://lod-cloud.net/) can suggest a number of viable data sources for enriching metadata with semantic information. We acknowledge that relying on the dependability of data sources that are out there in the Web of Data, hence not under the direct control of the system administrator customising EDI for a specific project, is a big leap of faith. However, in our opinion it is more advisable, in the long term, to refer to well-established data sources rather than to rely on proprietary data structures that are prone to be neglected as soon as a given project or initiative is discontinued. Please refer to [43] for a general view on the RDF data structures we integrated in project RITMARE for the purpose of grounding metadata management.

Conclusions and Future Work
In this paper, we presented a scenario illustrating the methodology for metadata management that has been set up for project RITMARE. It hinges on a client-server application rendering the editing interface for a given metadata schema and taking care of the generation of XML and RDF descriptions on the basis of the data inserted by the metadata maintainer. This tool, EDI, is a FOSS package (the RITMARE SP7 team is part of the GEOforALL network: http://www.geoforall.org/) that can be easily customised to the requirements of other projects. If compliance with one of the supported profiles is sufficient for a given application context, the software can be used out-of-the-box. A second step in customisation of the tool is constituted by personalisation of the template that regulates its behaviour: This is assisted by the XML Schema associated with the templating language, whose essentials have been presented in this work. In the future, further templates may be added to the GitHub distribution (https://github.com/SP7-Ritmare) as the output of a community effort. Finally, external data sources can be easily plugged in for full customisation of the metadata editing experience: We exemplified both recourse to proprietary data sources describing a project (e.g., describing users, institutes, roles, etc.) and the integration of third-party data structures provided as SPARQL endpoints (e.g., to leverage controlled vocabularies in use by the specific CoP).
This feature is aimed at fostering a decentralised paradigm for metadata management: In our framework, attribute values are retrieved from the (possibly remote) data structures that are associated with the specific principal. As an example, the e-mail address of a point of contact is directly taken from her FOAF description, thus avoiding redundancies and inconsistencies. Our methodology has the key advantage of enabling automatic update of metadata attributes that have URIs (directly or indirectly) associated with them: In our opinion, this is sufficient to deem our metadata management paradigm more robust than traditional centralised strategies. In fact, centralised strategies cannot but repeat the same property values (again, consider the e-mail address of a point of contact) over and over in metadata records, hampering efficient propagation of updates and impeding more fine-grained management of metadata items.
There is also a number of novel query expansion techniques that can be based on the RDF data structures that express metadata: These can identify people, keywords, toponyms, and all categories of descriptions one can find on the Web of Data. The extent to which this can impact precision and recall in discovery of geospatial assets (an evaluation out of the scope of this work) is yet to be determined and compared with other strategies for semantics-aware discovery, such as those in [44,45]. However, an apparent advantage of our methodology with respect to the first approach is that no preemptive processing of metadata descriptions is required, because semantic lift is implicit in metadata editing. Instead, in [44] traditional ISO-based metadata is first processed in order to (heuristically) associate property values with URIs. Semantic lift still makes sense for query parameters entered by the end user at discovery time, as in [45], and this is one of the aspects we are going to implement in the interface of the RITMARE geoportal.
As a bottom line, we point out that the decisions that drove development of the methodology described in this paper find their roots in our vision of geospatial metadata as multi-tenanted descriptions. Unfortunately, metadata are typically overlooked, not exhaustive, or inconsistent. It then makes sense to encompass third-party data structures if this means prolonging the time lapse in which metadata descriptions can be relied upon. Sourcing these external data structures as Open Data does not pose any condition on the usage or openness of the assets referred to in metadata descriptions.
Still, sharing data and metadata in an open fashion is a recommended practice we generally subscribe to: In particular, in the context of RITMARE, we promoted an open data policy [46] that can easily match those characterising Open Science initiatives [47][48][49].
Future activities in this research thread are going to focus on generalising the approach. Specifically, we are going to generate RDF-based descriptions from ISO metadata that have been produced by tools other than EDI, i.e., devoid of any semantic reference. In the beginning, the template language introduced in this paper is going to retain its importance because semantic lift requires the definition of data sources and SPARQL queries, such as those in the implementation of the use cases presented in this work. Then, loosening this requirement is going to be the final step toward a generally applicable framework for decentralised, semantics-aware metadata management.

Figure 3. The element and item XML Schema components.