Semantically-Aware Retrieval of Oceanographic Phenomena Annotated on Satellite Images

Scientists in the marine domain process satellite images in order to extract information that can be used for monitoring, understanding, and forecasting of marine phenomena, such as turbidity, algal blooms and oil spills. The growing need for effective retrieval of related information has motivated the adoption of semantically aware strategies on satellite images with different spatio-temporal and spectral characteristics. A major issue with these approaches is the mismatch between the information that can be extracted from the visual data and the interpretation that the same data have for a user in a given situation. In this work, we bridge this semantic gap by connecting the quantitative elements of the Earth Observation satellite images with the qualitative information, modelling this knowledge in a marine phenomena ontology and developing a question answering mechanism based on natural language that enables the retrieval of the most appropriate data for each user's needs. The main objective of the presented methodology is to realize the content-based search of Earth Observation images related to the marine application domain on an application-specific basis that can answer queries such as "Find oil spills that occurred this year in the Adriatic Sea".


Introduction
Coastal zones and oceans are the subjects of a vast and increasing number of studies whose purpose is to prevent or manage disasters, sustainably manage coastal areas and oceans, and ensure marine safety. Several studies develop Remote Sensing (RS) methods and techniques, such as processing of Earth Observation (EO) satellite images (indices, classifications, object-based image analysis, etc.), mathematical simulation models, and deep learning, for better monitoring, understanding, and forecasting of natural or human-induced marine phenomena. Furthermore, these techniques are integrated with Geographic Information Systems (GIS), which allows the implementation of static, live, or forecasting spatio-temporal analyses and the production of useful products such as sea wind/waves, sea temperature, sea color, spatial distribution of sea species, seasonal cycles of microorganisms (based on temperature, sunlight, currents, and the presence of polluting species), oil spill detection, etc. The growing interest in smart approaches for retrieving such information has motivated the development of a strategy for approaching the retrieval of satellite images with different spatio-temporal and spectral characteristics semantically. The exploitation of semantic information derived from satellite imagery will provide ground for new smart products and applications and further promote satellite imagery.
In this work, we focus on an integrated process that: (a) extracts semantic knowledge from EO images, (b) models this knowledge using a geo-ontology for marine phenomena, and (c) applies question answering techniques on a semantically enabled knowledge base that allow users to express their needs and issue queries in natural language. Specifically, we develop automatic RS algorithms (mathematical modeling, classifications, indices, spectral matching, image segmentation, etc.) that extract information from Sentinel-1, -2, and -3 satellite images acquired daily. The algorithms annotate images with a set of predefined core marine phenomena (Chl-a, turbidity, oil-spills), but the system could easily be extended to support the annotation of more marine phenomena and to use images from other satellites as well. We define a marine phenomena ontology that semantically enriches the knowledge extracted from satellite images and serves as the basis for the knowledge base. Lastly, we adopt a methodology for realizing the content-based search and retrieval of images and phenomena by developing a Question Answering (QA) module that handles natural language queries, including a geocoding component for acquiring the coordinates of spatial entities. These components are integrated into a semantic web retrieval system for EO data, called in short SeMaRe (Semantic Marine Retrieval). The methodology presented in this paper was initiated in the framework of the SEO-DWARF project (https://cordis.europa.eu/project/id/691071, accessed on 4 August 2021), funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie actions, and the present paper is a continuation of that work. Its main objective is to realize the content-based search of images related to the marine domain on an application-specific basis. Queries such as "Find satellite images that contain turbidity phenomena around Lesvos this year" would be answered, helping users retrieve the appropriate information for their specific needs.
The main innovative scope of this work is to bridge the gap between the raw information of satellite images and the knowledge gained from the marine domain, enabling users to retrieve data relevant to their needs expressed through natural language queries. For this reason, we adopt a multidisciplinary approach where researchers in the marine domain provide knowledge and experience about their field, ontology engineers capture the semantics in a marine phenomena ontology, and Natural Language Processing (NLP) experts integrate it into the SeMaRe search engine, which is capable of handling natural language queries. The main contributions of this work are: (a) the development of algorithms for annotating EO images, (b) the formalization of the marine phenomena ontology, (c) the determination and implementation of the semantic queries for the application domain that realize content-based image retrieval (CBIR), and (d) the coordination of these components to design the architecture of the SeMaRe search engine that performs semantic storage, management, and retrieval of the extracted knowledge. This paper is organized as follows. In Section 2, we present the related work on the three axes of this work, i.e., remote sensing algorithms for annotating satellite images with marine-related phenomena, modeling of the marine domain related knowledge using ontologies, and semantically enabled retrieval systems for satellite images. In Section 3, we present the RS algorithms for the annotation of marine-related phenomena, the SeMaRe ontology that supports the semantic representation of the domain, and the adopted question answering techniques for handling natural language queries. In Section 4, we present the implementation of the proposed methodology on the integrated semantic web retrieval system for EO data. In Section 5, we evaluate the retrieval accuracy and the query performance of the method. In Section 6, we conclude this paper with a discussion on the evaluation results and pointers for future work.

Background & Related Work
This section consists of three parts. Section 2.1 introduces the examined categories of marine phenomena, namely, turbidity, algal blooms and oil spills, and presents for each the main RS approaches for phenomena annotation on EO images. Section 2.2 presents existing ontologies for modelling the marine domain. Sections 2.1 and 2.2 provide the necessary background material for the RS methods and the ontology implemented in this work. Section 2.3 presents state-of-the-art systems related to semantically aware image retrieval.

Marine Phenomena and Remote Sensing
In the SEO-DWARF project, several marine phenomena were examined and analyzed (i.e., hot-spots, upwelling, algal blooms, fronts, oil-spills, trophic status index, turbidity, winds, waves, shallow water bathymetry, coastal habitat mapping). In this work, we focus on: (a) turbidity, (b) algal blooms (estimated by Chl-a concentration), and (c) oil-spill detection.

Turbidity
Turbidity describes the level of transparency of a liquid based on the presence of undissolved material. It is expressed as the optical property of a medium to scatter or absorb light instead of transmitting it straight through the sample. Suspended sediment in water is the main cause of water quality deterioration, and contaminated water may cause significant health issues. Turbidity measurements are therefore used in many fields to estimate the concentration of suspended material in a sample [15]. The concentration and character of suspended sediments, phytoplankton, and dissolved organic matter affect the optical properties of water. Satellite sensors measure the water reflectance at different wavelengths, and their imagery can be used to estimate water optical properties. This is an advantageous method of measuring water quality, compared to ground sampling, for the following reasons [16]: (a) its spatial coverage allows the estimation of water quality over large areas, (b) its global coverage allows the estimation of water quality in remote areas, and (c) the long record of archived imagery enables the estimation of water quality for time periods when no ground measurements are available. State-of-the-art algorithms regarding the turbidity phenomenon are: (a) the method of Garaba et al. [17], a derivation of a turbidity algorithm using the 645 nm band; (b) the method of Dogliotti et al. [18], which combines three main approaches: a Red-NIR combination as a ratio, a Red-based linear algorithm at 645 nm, and a NIR-based linear algorithm; and (c) the method of Nechad et al. [19], a calibrated algorithm using the 665 nm band.

Algal Blooms (Estimated by Chl-a Concentration)
Phytoplankton, also known as microalgae, are photosynthesizing microscopic organisms that inhabit the upper sunlit layer of almost all oceans and bodies of freshwater. In a complex light-dependent process, photosynthesis transfers absorbed photon energy to organic compounds [20]. A microalgae bloom is a rapid increase or accumulation in the population of algae (typically microscopic) in a water system. Remote sensing techniques, and in particular ocean color data, are extensively used to derive and monitor phytoplankton blooms [21][22][23][24]. There is a wide variety of operational ocean color satellite sensors and algorithms to assist in the detection and monitoring of phytoplankton blooms [21].
The seawater optical properties are mainly determined by phytoplankton (the concentration of which is approximated by Chl-a), colored dissolved organic matter (CDOM), and suspended sediments [25]. These are the three main components that affect the ocean color and are used as a basis for classifying oceanic waters. One thing that should always be considered is the difference between Case 1 and Case 2 waters. As defined by Morel and Prieur [26] and Morel [27], Case 1 waters have their main optical properties determined by phytoplankton and are only slightly influenced by particulate organic carbon and CDOM. On the other hand, the optical properties of Case 2 waters are dominated by substances (mineral particles, CDOM) that change independently of phytoplankton. Oceanographers have commonly used the Case 1 and Case 2 classifications to differentiate essentially open ocean and coastal waters, respectively, as per Dickey, Lewis, and Chang [28]. At this stage, our study is focused on the estimation of Chl-a from Sentinel-2 MSI data for the more complex Case 2 waters. Reflectance band-ratio algorithms are intensively used to retrieve chlorophyll concentrations and for standard product computations. Empirical blue-green (440-550 nm) spectral band ratios are the most common ocean color algorithms used for Chl-a retrievals because most of the phytoplankton absorption occurs within this portion of the visible spectrum. However, the use of visible wavelengths can be unreliable in coastal waters. In optically complex Case 2 waters, blue-green reflectance band-ratios become less sensitive to changes in Chl-a concentrations because of increasing concentrations of CDOM and total suspended matter (TSM) (e.g., [29]). To overcome this limitation, other studies suggested the use of red-NIR band-ratios, empirical models, neural networks, and machine learning for Chl-a retrieval in coastal waters [30][31][32][33][34].
Most of the studies present encouraging but not very accurate results, and our experiments using Copernicus in-situ Chl-a data to assess the accuracy of some of the most common algorithms (C2RCC, ACOLITE OC2, etc.) showed that there is still work needed in this field.

Oil-Spill Detection
Effective and efficient monitoring of oil spills that originate from ships, offshore platforms, and accidents is of high importance for public safety and environmental protection [35]. For radar systems, the primary backscattering mechanism in marine regions is surface roughness due to capillary waves [10]. Oil on the sea surface dampens the capillary waves, so in the radar image these regions appear as 'dark spots'. Several other phenomena have a similar attenuating effect on the capillary waves and thus also appear as slicks ('look-alikes') in radar images, such as natural films/slicks, grease ice, threshold wind speed areas (wind speed < 3 m/s), wind sheltering by land, rain cells, shear zones, internal waves, etc. [36,37]. Differentiating actual spills from look-alikes is one of the main challenges in oil spill detection in Synthetic Aperture Radar (SAR) imagery. A common methodology for oil spill detection in SAR images is Object Based Image Analysis (OBIA) [37,38]. This approach consists of four main steps: image preprocessing, segmentation, feature extraction, and post classification. In the framework of our former research program (SEO-DWARF), a new open-source method using OBIA has been developed for oil-spill detection using Sentinel-1 data [10].

Marine Domain Ontologies
Although much research has been conducted on the development of EO data ontologies to represent knowledge related to the Earth sciences and the marine domain [39][40][41][42][43], readily available state-of-the-art ontologies are rare. One of the most notable projects that involve Earth and environmental aspects is SWEET (The Semantic Web for Earth and Environmental Terminology) [44]. It is promoted by NASA to improve the use of Earth science data in semantic applications. For this project, 200 separate ontologies were created, and more than 6000 concepts were subdivided into nine categories that cover aspects of space, time, Earth realms, physical quantities, etc., and integrative science knowledge concepts (such as phenomena, events, etc.). The starting point of this ontology development was the collection of keywords in the NASA Global Change Master Directory, which contains about 1000 controlled terms structured as a taxonomy. Moreover, another 20,000 terms, often synonymous with the previous ones, were extracted from free text. The level of granularity used is high, and this group of ontologies can be seen as a group of top-level ontologies. For example, the term "air temperature" was not defined as a specific concept but only as a composition of the "air" and "temperature" terms. SWEET ontologies are written in OWL 2 [45] and can easily be edited in Protégé after downloading from the official project site [44], thus enabling their reuse in other ontologies.
The European Environment Agency (EEA) is the driving force of a consortium of organizations that provides the CORINE Land Cover methodology, technology, and data [46]. Land cover and land use in Europe are derived from satellite imagery, then classified and provided to the public for download (as shapefiles). The classification is used to characterize areas, e.g., green urban areas, code 141. On top of the EEA-maintained classification, an ontology is modeled [46]. This ontology is developed to cover the CORINE nomenclature. It is defined in three levels, which describe natural and artificial elements that can be visualized in a geographical image. Analyzing the ontology macroscopically, we can identify five classes: artificial areas, agricultural areas, forest and semi-natural areas, wetlands, and water bodies. Marine waters are also described, such as oceanic and continental shelf waters, bays, and narrow channels, including sea lochs or loughs, fiords or fjords, rias, straits, and estuaries. The ontology is written in OWL 2 [45], and it is available online for free download, use, and extension.
Koubarakis [47] proposed another ontology for EO images, called the DLR Ontology, which was developed to annotate TerraSAR-X images for the European project TELEIOS [48]. This ontology differs from the previous ones because it describes EO images and presents concepts about the image acquisition metadata. In particular, the following macro sections are described:
• Image metadata: this section includes predicates that describe image properties. A small number of metadata are included, such as time and area of acquisition, sensor, image mode, and incidence angle.
• Elements of annotation: this section includes classes about patches, images, and vectors used to describe an EO image after the knowledge discovery step.
• Concepts about the land cover: this section includes objects visible in an EO image, such as agricultural areas, bare grounds, forests, transport areas, urban areas, and water bodies.
The ontology is not very specific but covers macro-concepts that can be further specialized and extended for specific domain applications.
Ontologies proposed in [41][42][43] focus on the image interpretation process for coastal and ocean areas, while SWEET [44] and the CORINE ontology [46] focus on a high level conceptualization of the Earth science domain. Our work is most similar to DLR [47], in the sense that it is also an application-specific ontology that focuses on the retrieval process. In Section 3.2, we present a lightweight application-specific ontology for aiding marine phenomena image retrieval.

Semantic Image Retrieval
The necessity for content-based image retrieval (CBIR) techniques in remote sensing calls for new methodologies to match the information contained within images with the semantics of users' queries. Related work focuses on techniques for hyperspectral remote sensing images [49], while in the EU-funded project TELEIOS, features extracted from TerraSAR-X images are combined with image metadata and GIS data to unfold their semantics [50]. The methodology in the work of Priti and Namita [51] applies different image processing and querying techniques to multispectral images. Object/segment oriented techniques for relating low-level features of images and ontological concepts can be seen in Ruan et al. [52], Li and Bretschneider [53], Liu et al. [54], and Wang et al. [55]. In Datcu et al. [56], Li and Narayanan [57], and Aksoy et al. [58], the labeling process is applied to pixels. Tiede et al. [59] propose a system that allows performing queries to retrieve specific EO images. However, users have to express their information need using a particular user interface that allows specifying a list of filters. These works do not consider natural language as an interface for querying, limiting the use of the system to advanced users. With our work, we want to further enhance the users' expressiveness, allowing them to perform queries using natural language, i.e., to formulate a question as if they were talking to another human being. To the best of our knowledge, this is the first work to implement a Question Answering (QA) system for marine related image retrieval.
Question Answering systems can be used to allow non-technical users to retrieve the information they are looking for even in restricted domain applications, and they currently represent one of the most advanced topics in the field of Natural Language Processing. In fact, QA is an advanced form of information retrieval where the aim is to satisfy a user's information need expressed through natural language, e.g., English. More specifically, QA as a discipline was born in the late sixties with the development of natural language interfaces for databases [60]. The birth of the semantic web [61] at the beginning of this century led to the development of several large open and closed domain ontologies, which are constantly being expanded, like DBpedia [62] and Wikidata [63]. Exploiting QA systems to retrieve the information encapsulated within ontologies is nowadays one of the main challenges in the NLP research field: the QALD challenge [64] is the most well-known series of evaluation campaigns on open domain question answering over DBpedia. Even though the research in this field is now mainly focused on open domain QA systems, closed domain QA systems are still proposed in the literature. QUARK [65], GeoVAQA [66], and QUASAR [67] represent examples of QA systems in the geographical domain; however, they extract the information needed to formulate the answer from text documents. More recently, systems like [68][69][70][71] instead exploit the geographic information contained in well known knowledge bases like DBpedia, OpenStreetMap (https://www.openstreetmap.org/, accessed on 4 August 2021) and the GADM [72] dataset. Although working on geographic information, none of these systems allows querying ontologies containing such specific information about phenomena collected from EO satellite images.

Methodology
In this section, we present the underlying methodology of SeMaRe, which consists of the following parts: (a) the annotation of marine phenomena on EO images (Section 3.1), (b) the design of the marine domain ontology (Section 3.2), and (c) the question answering methodology for retrieving semantically enriched data (Section 3.3).

Annotation of Marine Phenomena
Marine phenomena are annotated on EO satellite images provided by Sentinel-1, -2, and -3. Each image is a raster file covering a geographic area at a specific timestamp and contains low-level information, which is exploited by RS algorithms to identify features or phenomena within the image. Different approaches and RS algorithms have been tested and evaluated for their accuracy and computational cost. Below, we describe the processing chain of the most efficient algorithm (in terms of achieving the best possible classification accuracy with the least computational cost) for each of the marine phenomena under consideration.

Turbidity
Based on experimental results using in-situ data, we implemented the turbidity algorithm proposed by Dogliotti et al. [18], as it provides enhanced turbidity values. The ACOLITE processor (https://odnature.naturalsciences.be/remsem/software-and-data/acolite, accessed on 4 August 2021) has been used for this purpose and was integrated into the processing system. After consulting domain experts on the marine phenomena considered in this work, it was decided that the quantitative classes of Table 1 would be used for classifying turbidity phenomena into high level categories. The unit used for the turbidity classes is the Formazin Nephelometric Unit (FNU). The pixel level of detail of the image is preserved in the exported geometries, with only minor generalization for the correction of the output polygon topology. The polygon creation and generalization process consists of three steps, which are performed on the raster output of the ACOLITE processor: (1) using the GDAL (https://gdal.org/, accessed on 4 August 2021) sieve command with a size of 50, isolated pixels and very small areas are eliminated, (2) using the GDAL polygonize command, the output geometries are created, and (3) a Python script performs an erosion filtering process within a small buffer (20 pixels) to correct the output for overlapping polygons. The same process is also applied to the raster output of Chl-a concentration.
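To make the mapping from low level quantitative turbidity values to high level qualitative categories concrete, the following minimal Python sketch classifies a turbidity value in FNU. The thresholds below are illustrative placeholders only; the actual class boundaries are those of Table 1, as defined with the domain experts.

```python
# Illustrative mapping of low level turbidity values (FNU) to high level
# qualitative categories. The thresholds are placeholders, NOT the values
# of Table 1, which were defined by the domain experts.
HYPOTHETICAL_TURBIDITY_CLASSES = [
    (5.0, "low"),        # fnu < 5
    (20.0, "medium"),    # 5 <= fnu < 20
    (float("inf"), "high"),
]

def classify_turbidity(fnu: float) -> str:
    """Return the qualitative category for a turbidity value in FNU."""
    if fnu < 0:
        raise ValueError("turbidity cannot be negative")
    for upper_bound, label in HYPOTHETICAL_TURBIDITY_CLASSES:
        if fnu < upper_bound:
            return label
    return "high"
```

The same thresholding scheme applies to the Chl-a classes of Table 2, with concentration values in place of FNU.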

Algal Blooms
A processing algorithm has been implemented for Chl-a concentration using Sentinel-2 images. After several experiments with various Chl-a estimation algorithms (MCI, OC2, OC3, C2RCC) and in-situ data, we decided that the system will use the blue/green ratio (OC2) algorithm of the ACOLITE processor. The default settings and the aerosol correction method of "Dark Spectrum Fitting" were chosen [73]. The diffuse attenuation coefficient at the wavelength of 490 nm was calculated using the Quasi-Analytical Algorithm (QAA) of Lee et al. [74]. It has been decided that the quantitative classes of Table 2 will be used for classifying Chl-a concentration into high level categories. Again, this has been done after consulting oceanic domain experts relevant to the marine phenomena considered in this work. Table 2. Chl-a classes and association between low and high level categories.

Low Level Quantitative Categories | High Level Qualitative Categories

Oil-Spill Detection
In the framework of the SEO-DWARF project, we developed a new open-source OBIA method for oil-spill detection using Sentinel-1 data. Initially, the Sentinel-1 images were pre-processed (land masking, noise reduction, etc.). Then, the images were segmented in order to extract the dark spots and feed them into the classifier. After the image segmentation, the Orfeo Toolbox SVM vector classifier was trained on the extracted features and separated the objects into possible oil spills and look-alikes. The method is fully described in [10]. The high level categories of Table 3 were used for the oil spills, in accordance with the OBIA algorithm output. Although a look-alike category was included, it was not an active class in the system and was only used for experiments. This means that no metadata for look-alikes were stored, in order not to overload the system with false oil-spill detections. Nevertheless, the look-alike category already exists and can be easily activated and processed in a future version of the system.
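The segmentation step of the OBIA pipeline can be illustrated with a minimal sketch: pixels darker than a backscatter threshold are grouped into 4-connected segments, each a candidate dark spot to be passed to the classifier. The toy grid and threshold value are illustrative only; the actual system performs this step on Sentinel-1 rasters with OBIA tooling.

```python
from collections import deque

def extract_dark_spots(backscatter, threshold):
    """Group pixels darker than `threshold` into 4-connected segments.
    `backscatter` is a 2D list of intensities; returns a list of segments,
    each a list of (row, col) pixel coordinates."""
    rows, cols = len(backscatter), len(backscatter[0])
    seen = [[False] * cols for _ in range(rows)]
    segments = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or backscatter[r][c] >= threshold:
                continue
            # Flood fill from this seed pixel to collect one dark spot.
            segment, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                segment.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols \
                            and not seen[nr][nc] and backscatter[nr][nc] < threshold:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            segments.append(segment)
    return segments
```

In the full pipeline, each extracted segment would then be described by shape and backscatter features and classified as an oil spill or a look-alike.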

Ontology
In Section 2.2 we presented ontologies related to the marine domain. In this section, we first examine the possibility of adopting them in order to develop an application-level marine phenomena ontology. The SWEET [44] ontology covers densely interconnected marine and landscape related concepts, which complicates the reuse of a particular branch of the entire ontology to formalize only the closed domain of the marine application. Additionally, it does not refer to a specific application domain, and consequently, it needs to be specialized and adapted to the specific application. Nevertheless, the specialization cost is mitigated because most of the required information is fully covered. The complexity of future extensions is low, and the ontology is frequently updated and supported by a large community. CORINE [46] is less detailed than SWEET, and the absence of formalization for many EO concepts can make the extension process for future applications laborious. DLR [47] has specific concepts for the application domain. It covers water and land concepts, and the three-level structure makes the extension or specialization of concepts possible. The main limitation is the difficulty of accessing that ontology, because it is private, and no future updates are confirmed. As a consequence of these considerations, we decided to develop a new lightweight marine phenomena ontology (called SeMaRe) that facilitates the retrieval process and reuses top-level concepts from NASA SWEET where possible.
To develop the marine phenomena ontology, we followed the Linked Open Terms (LOT) methodology, initially proposed by Poveda-Villalón [75] and further developed by García-Castro et al. [76]. It guided us through all the steps of requirements definition and ontology development. The specialization of the marine phenomena concepts was approached through a top-down decomposition strategy with a conceptualization approach. We started from a general concept and specialized it where needed. Newly identified concepts were added to the SeMaRe ontology and linked with the appropriate relations (e.g., parent-child) to the other SeMaRe concepts. The conceptualization step refers to the identification of the concepts that could be mapped to the SWEET ontology and other external sources, e.g., DBpedia. Concepts and properties defined in the SeMaRe ontology were distinguished with the namespace seo, pointing at seodwarf.eu/ontology/v1.0/. It is important to note that our specialization of the SWEET ontology was strictly related to the specific domain application and the algorithm used in the Question Answering (QA) module that would use it for semantic entailment. In particular, the QA module uses the names of classes and relations to explore the semantic structure of the ontology and then find a path in it that can be considered the right candidate for answering user queries. This particular intended use of the SeMaRe ontology forced us to diverge from the original hierarchy of the NASA SWEET ontology at some points, ignoring relations that were not useful for answering user queries. Moreover, in our ontology design, phenomena already existing in the SWEET ontology were in some cases redefined to accomplish the specific project goals.
We began the design of the SeMaRe ontology from the general concept of image (class seo:Image), which, for this application, was specialized into the concept of satellite image (class seo:SatelliteImage). The class seo:Image was linked with the image concept of the SWEET ontology, located under the swe:representation class. The general concept of phenomenon (class seo:Phenomenon) was specialized into the concept of ocean phenomenon (class seo:OceanPhenomenon), and each specific phenomenon was defined as an rdfs:subClassOf the seo:OceanPhenomenon class, acquiring all its shared properties. The class seo:Phenomenon was linked with an owl:equivalentClass relation to the phenomena concept (class swe:phenomena) of the SWEET ontology. We considered the following marine phenomena as valid conceptualizations: turbidity, algal blooms, and oil spills. It is worth noting that new phenomena could easily be added as subclasses of the seo:OceanPhenomenon class in a similar way. The seo:Phenomenon class, and consequently its subclasses, had the following properties:
• Category (property seo:hasCategory): the category of a phenomenon (see Section 3.1), a value characterizing the phenomenon.
• Coverage (property seo:hasCoverage): the geometry of a phenomenon in WKT (Well Known Text) format, using the WGS84 reference system.
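The hierarchy and properties described above can be sketched in Turtle as follows. Class and property names follow the text; the full IRIs (the http scheme for the seo namespace and the placeholder SWEET namespace IRI) and the typing of the two properties as datatype properties are assumptions made for illustration.

```turtle
@prefix seo:  <http://seodwarf.eu/ontology/v1.0/> .
@prefix swe:  <http://example.org/sweet#> .   # placeholder IRI for the NASA SWEET namespace
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Image hierarchy
seo:SatelliteImage rdfs:subClassOf seo:Image .

# Phenomenon hierarchy, aligned with the SWEET phenomena concept
seo:Phenomenon      owl:equivalentClass swe:phenomena .
seo:OceanPhenomenon rdfs:subClassOf     seo:Phenomenon .
seo:AlgalBloom      rdfs:subClassOf     seo:OceanPhenomenon .

# Properties shared by all phenomena
seo:hasCategory a owl:DatatypeProperty .   # qualitative category (see Section 3.1)
seo:hasCoverage a owl:DatatypeProperty .   # WKT geometry in WGS84
```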
The overall design of the SeMaRe ontology is depicted in Figure 1. We sketch the following use case to illustrate the use of the ontology: an EO image about an area in the Mediterranean Sea is declared as an instance of the class seo:SatelliteImage and is described by the appropriate properties (e.g., seo:hasTitle, seo:hasTimestamp, seo:hasBoundingBox, etc.). Each identified phenomenon (e.g., an algal bloom) within the image is declared as an instance of the class seo:AlgalBloom, a subclass of the seo:OceanPhenomenon class, and is described by the properties seo:hasCoverage and seo:hasCategory. The image and each algal bloom phenomenon identified within the image are then linked with the object type property seo:hasPhenomenon. If no phenomena are identified for an image, then the specific image instance does not use the seo:hasPhenomenon property.

Geocoding
Initially, the user query, which was written in natural language, was processed and tokenized through a standard NLP pipeline based on the Stanford CoreNLP framework [77]. The geographical entity inside the query was extracted and geocoded (assigned its latitude and longitude coordinates) through a Named Entity Recognition module based on the CoreNLP framework. It was important to identify a gazetteer with comprehensive coverage of marine environments and locations, since the system was used to identify coastal cities, beaches, gulfs, seas, oceans, etc. Thus, we employed GeoNames (https://www.geonames.org/, accessed on 4 August 2021), which is a geographic database available for free download. If there was an adverb of place within the query, it was detected and parsed by our system [78] to derive the lexical dependencies between the different words. This step helped us understand whether there was a lexical relationship between the adverb of place and the geographical location. Indeed, our goal was to model the output polygon based on the type of adverb of place.
When one of these dependencies existed, we extracted the adverb and checked whether it belonged to our pre-defined list of modifiers. The full list of modifiers and their corresponding number of kilometers used for transforming the polygon is reported in Table 4 and defined in a configuration file.
We set these kilometer values arbitrarily to fit our project use cases; they can be changed as needed, e.g., for vast areas whose center coordinates lie too far inland, or to fit the needs of the final user of the framework. At this point, we used the geographic entity returned by GeoNames and the modifier (i.e., move, shrink, enlarge) to generate the output polygon describing the area of investigation. The polygon had a square shape and was obtained by applying mathematical functions (depending on the adverb, if present) to the point returned by GeoNames. The default size of the polygon was 10 × 10 km. For example, suppose we detected the geographical entity "Athens" and an adverb such as "below". In that case, we would obtain from GeoNames the center coordinates of Athens (N 37°59′1″, E 23°43′40″) and, starting from this point, we would first generate a polygon of 10 × 10 km, which would then be moved 10 km to the south.
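The polygon generation step can be sketched as follows. This is a minimal, self-contained approximation: the adverb-to-operation mapping stands in for Table 4, the kilometre-to-degree conversion is a rough equirectangular one, and the function name and parameters are our own illustrative choices, not the project's actual code:

```python
import math

KM_PER_DEG_LAT = 111.0  # rough equirectangular approximation

def square_polygon(lat, lon, size_km=10.0, adverb=None, shift_km=10.0):
    """Build a square WKT polygon around a geocoded point, adjusted by an
    adverb-of-place modifier (sketching the behaviour of Table 4)."""
    if adverb == "around":          # enlarge the default area
        size_km *= 2.0
    elif adverb == "near":          # shrink toward the point
        size_km /= 2.0
    if adverb == "below":           # move the square to the south
        lat -= shift_km / KM_PER_DEG_LAT
    elif adverb == "above":         # move the square to the north
        lat += shift_km / KM_PER_DEG_LAT
    dlat = (size_km / 2.0) / KM_PER_DEG_LAT
    dlon = (size_km / 2.0) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    ring = [(lon - dlon, lat - dlat), (lon + dlon, lat - dlat),
            (lon + dlon, lat + dlat), (lon - dlon, lat + dlat),
            (lon - dlon, lat - dlat)]
    return "POLYGON((" + ", ".join(f"{x:.5f} {y:.5f}" for x, y in ring) + "))"

# "Athens ... below": a 10 x 10 km square shifted 10 km south of the centre.
athens_below = square_polygon(37.984, 23.728, adverb="below")
```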

Question Processing
The question answering approach for retrieving appropriate EO data was based on controlled natural languages, as proposed in [79]. Given a language, we obtained a controlled language by considering only a subset of its vocabulary and grammatical rules. This was based on the assumption that, especially in closed-domain scenarios such as the SeMaRe ontology, the words and the syntactic structure of the questions that users ask the system follow specific patterns. By creating a controlled natural language, the process of answering users' questions could be seen as a deterministic process of searching for the right resource in the ontology. In this way, it was possible to build a finite state automaton (FSA) [80] capable of recognizing any sentence written in the chosen controlled natural language.
Following the approach shown in [79], we first built the dictionary for our controlled natural language by taking into account all the labels of each entity, class, and property that could be found in the SeMaRe ontology.
This dictionary was used to map words or phrases contained in the question to the proper resources in the ontology. For example, the word "turbidity" was mapped to the ontology class seo:Turbidity. However, if we considered only the resources' labels to build our vocabulary, the system would be unable to cover cases where there was no direct match between the words in the question and the ontology entities, as happens with typos and synonyms.
To cope with this problem, we extended the vocabulary using an approach based on distributional semantics models. More specifically, we exploited Word2Vec [81] to create a vector space model using Wikipedia abstracts.
During the data matching step, the system checked whether there was a match between a phrase in the question and one of the labels of the knowledge base by exploiting the dictionary. If a match was not found, the system additionally computed a ranked list of alternative phrases semantically similar to the one in the question. The system then iteratively substituted the phrase in the question with the retrieved alternatives until a match with the knowledge base was found. To prevent deadlocks, we also introduced a backtracking algorithm that, combined with the semantic matching mechanism described above, allowed the FSA to reconsider previous choices, thus leading to the correct resource.
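The fallback step can be sketched with a toy example. The two-dimensional vectors below are hand-made stand-ins for the Word2Vec model trained on Wikipedia abstracts, and the dictionary entries are illustrative; the real system also iterates and backtracks, which is omitted here:

```python
import math

# Label -> ontology resource dictionary built from the SeMaRe labels.
DICTIONARY = {"turbidity": "seo:Turbidity", "algal bloom": "seo:AlgalBloom"}

# Tiny stand-in for the Word2Vec vector space (vectors are made up).
EMB = {
    "turbidity":   [1.0, 0.0],
    "algal bloom": [0.0, 1.0],
    "murkiness":   [0.9, 0.2],   # a synonym the dictionary does not know
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def match(phrase):
    """Direct dictionary lookup, falling back to the semantically closest
    in-dictionary phrase when there is no exact match (typos, synonyms)."""
    if phrase in DICTIONARY:
        return DICTIONARY[phrase]
    if phrase not in EMB:
        return None
    ranked = sorted(DICTIONARY, key=lambda w: cosine(EMB[phrase], EMB[w]),
                    reverse=True)
    return DICTIONARY[ranked[0]]
```

With these vectors, the out-of-dictionary phrase "murkiness" resolves to seo:Turbidity, its nearest in-dictionary neighbour.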
The FSA, which was used to analyze the users' questions, was designed based on the syntactic structure of questions in the English language and is shown in Figure 2. Each state (i.e., S0, S1, ..., SF) was associated with portions of SPARQL templates, and the type of token analyzed by the system caused the shift from one state to the next. The token type could be one of the following: Question start, Entity, Property, Class, Literal, Operator, Location, and Date. In particular, the types Entity, Property, Class, and Literal referred to the respective resource types of the SeMaRe ontology, while the types Operator, Location, and Date were used to answer more complex questions involving specific modifiers. For example, in the question "Get the images that contain turbidity phenomena after 22 May 2019", "after" is an Operator, while "22 May 2019" is a Date. In addition, there were some words/characters, i.e., "having", "with", "?", ".", which caused the FSA to shift to particular states: for example, the character "?" or "." caused the shift to the final state SF, which concluded the computation. Given a natural language question, the system analyzed it progressively by gradually removing the rightmost word. In this way, it was possible to identify the longest token that allowed the FSA to shift from the current state to the next. An example of this behavior is shown in Figure 3. Given the example question "Get the images that contain turbidity around Athens", the algorithm first attempted a match between the entire string and the resources of the ontology, as defined by the rules of the FSA. Since this first attempt produced no match, the algorithm reduced the string by removing the last word from the right until a match was found.
In this example, "Get the" matched the initial state of the FSA. After a match was found, the matched string was removed from the initial question, and the algorithm analyzed the remaining section in the same way. In the next step (step 2), the process was iterated, and a match was found between "images" and one of the resources of the SeMaRe ontology. Again, the matched string was removed, and the FSA shifted to the next state. The process went on until the FSA reached its final state. At the end of this process, the list of states visited by the FSA, as well as the strings that caused each shift, were used to construct the final SPARQL query. Each state of the FSA was paired with a specific SPARQL pattern; by combining the patterns of all the states visited during the execution, it was possible to build the final SPARQL query (Figure 4).
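The longest-match scan and template assembly can be sketched as follows. The token table and SPARQL fragments are heavily simplified assumptions (the real FSA enforces state transitions and also handles operators, dates, embeddings, and backtracking):

```python
# Phrase -> (token type, SPARQL fragment); contents are illustrative.
TOKENS = {
    "get the":      ("QSTART", "SELECT ?img WHERE {"),
    "images":       ("CLASS",  "?img a seo:SatelliteImage ."),
    "that contain": ("PROP",   "?img seo:hasPhenomenon ?p ."),
    "turbidity":    ("CLASS",  "?p a seo:Turbidity ."),
    "around":       ("OP",     ""),  # spatial modifier handled by the geocoder
    "athens":       ("LOC",    "FILTER(geof:sfIntersects(?geom, ?area))"),
}

def parse(question):
    """Greedy longest-match scan: try the longest remaining span first,
    emit the paired SPARQL fragment, and continue with the rest."""
    words = question.lower().rstrip("?. ").split()
    fragments = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):       # longest span first
            phrase = " ".join(words[i:j])
            if phrase in TOKENS:
                fragments.append(TOKENS[phrase][1])
                i = j
                break
        else:
            i += 1  # unknown word: skip here (the real system backtracks)
    return " ".join(f for f in fragments if f) + " }"
```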

Implementation
The SeMaRe methodology was implemented on an application-specific basis, following the overall architecture presented in Figure 5. The Semantic Annotation module regularly retrieved satellite images, i.e., the raw data and their XML metadata, and applied remote sensing algorithms to identify any oceanographic phenomena in them. The output of the module's processing step was: (a) metadata about images, obtained directly from the satellite and updated during RS processing, and (b) geospatial objects representing phenomena, extracted by RS algorithms applied to the raw data of the images. At post-processing, the module converted the output into a semantically aware format, that is, RDF instances according to the SeMaRe ontology, and populated the Knowledge Base (KB), ensuring that the latter stayed up-to-date with the output of the Semantic Annotation module. The Knowledge Base acted as the middleware between the user's needs and the raw data, providing the information that the Question Answering (QA) module needed to parse the user query and generating the response to the user. The user interacted with the QA module by providing natural language queries as free text through a REST API. The text was parsed, elaborated, and mapped to all the fields required for the exact interpretation of the query. The text was then transformed into a SPARQL query and sent to the Knowledge Base to retrieve the appropriate data. Finally, the QA module parsed the response and returned the results to the user. The details of each module are presented in the following sections.

Semantic Annotation Module
The Semantic Annotation module contained Python scripts that downloaded newly acquired images from Sentinel 1, 2, and 3 daily. The system was limited to a specified geographic region between Italy and Greece due to the computational costs of the processing step; still, it can be expanded to global coverage if appropriate computing capacity becomes available. Each image was provided with its INSPIRE-based image-level metadata (name, abstract, spatial extent, time of acquisition, etc.) and was appropriately pre-processed and processed to identify phenomena within the image. The general processing followed the scheme below, starting at 00:00 a.m. every day:
• Image and XML metadata download;
• Image pre-processing (radiometric/atmospheric corrections, cloud masking, etc.);
• Phenomenon-specific image processing (see Section 3.1);
• Creation of a phenomenon-specific raster map;
• Conversion of the raster map to vector (GeoJSON);
• Update of the INSPIRE-compliant enriched metadata, combining image metadata and phenomenon-specific processing results.
All the above steps of the routine were carried out by a combination of Python scripts, SNAP GPT (Sentinel Application Platform, https://step.esa.int/main/toolboxes/snap/, accessed on 4 August 2021) and GDAL commands, and bash scripts on a Linux server. For each satellite image, the module produced two files containing the output of the processing step:
1. An XML file containing the original and the updated metadata of the image. The original metadata file maintained generic metadata about the retrieved image and used the INSPIRE datasets and services in ISO/TS 19139-based XML format (https://inspire.ec.europa.eu/id/document/tg/metadata-iso19139, accessed on 4 August 2021). The updated metadata file extended the original version during image processing with additional application-specific elements.
2. A GeoJSON file that maintained spatial and descriptive metadata about the identified phenomena within the image. Each phenomenon instance was characterized by (a) the spatial area it covered, that is, its geometry in Well-Known Text (WKT) format using the WGS84 reference system, and (b) the set of its descriptive properties as described in Section 3.1.
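A minimal sketch of the per-phenomenon record written to the GeoJSON file follows. The property names mirror the SeMaRe ontology but are our own illustrative choices, and the coordinates are invented; the real output also carries the full set of descriptive properties of Section 3.1:

```python
import json

def phenomenon_feature(ring, phen_type, category):
    """Assemble a GeoJSON Feature for one identified phenomenon.
    Property names are illustrative stand-ins for the ontology terms."""
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"phenomenonType": phen_type, "hasCategory": category},
    }

# A hypothetical turbidity polygon (category 3) in WGS84 lon/lat order.
ring = [[15.0, 40.0], [15.1, 40.0], [15.1, 40.1], [15.0, 40.1], [15.0, 40.0]]
feature = phenomenon_feature(ring, "Turbidity", 3)
collection = {"type": "FeatureCollection", "features": [feature]}
```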
At post-processing, the module automatically converted the image's INSPIRE-extended metadata (XML file) and phenomena (GeoJSON file) into an RDF file. For the conversion step, the TripleGeo utility (https://github.com/GeoKnow/TripleGeo, accessed on 4 August 2021) was used and adapted according to SeMaRe's needs. The conversion was based on manually defined rules in XSLT files that map XML and GeoJSON fields to RDF classes and predicates according to the SeMaRe ontology. The generated RDF file was loaded into the Knowledge Base automatically using the Knowledge Base's API.

Knowledge Base
The semantically-enabled core of the SeMaRe system was the knowledge base, which was structured in two levels:
• Schema Level: modeled the marine domain application concepts about phenomena that are present and interpretable in EO images, formalized as an ontology containing the semantic definition of the data and defining the properties of each image and phenomenon, as described in Section 3.2.
• Instance Level: contained the actual data describing semantically annotated images and phenomena according to the schema.

Schema
The task of conceptualization, described in Section 3.2, produced the general design of the ontology. This design needed to be translated, using a descriptive language for storing semantically annotated EO images, into a computable representation. We used the W3C RDF (https://www.w3.org/TR/rdf11-concepts/, accessed on 4 August 2021), a common language and instrument for ontology development in the Semantic Web. To translate the design of the ontology into the RDF language, we used Protégé (https://protege.stanford.edu/, accessed on 4 August 2021), an open-source tool provided by Stanford University that supports the development of ontologies and intelligent systems. The RDF/XML serialization of the SeMaRe ontology is available on GitHub (https://github.com/SeMaReSEODWARF/Ontology, accessed on 4 August 2021) and is used as the schema of the Knowledge Base. The following are some general implementation decisions:
• The ontology IRI was set to http://seodwarf.eu/ontology/v1.0;
• The Pascal case capitalization style was used for naming classes (e.g., SatelliteImage);
• The Camel case capitalization style was used for naming properties (e.g., hasCoverage).

Instances
While the schema level refers to the implementation of the domain conceptualization, instances are the actual data of the Knowledge Base, that is, instantiated information about semantically annotated EO images and phenomena. Instances were generated as RDF triples in accordance with the ontology and were inserted into the Knowledge Base during the semantic annotation process. Both the instances and the ontology were represented as triples and were preserved and exposed through the Knowledge Base's endpoint.

Endpoint
The Knowledge Base is semantically enabled, and thus its implementation supports semantic web technologies. Its content uses the RDF data model and can be consulted using a query language such as SPARQL, allowing semantically enabled query answering over its content. In addition, it supports the GeoSPARQL query language for executing queries involving relations between spatial entities. Well-known frameworks that support the handling of RDF-formatted data include Apache Jena (https://jena.apache.org/, accessed on 4 August 2021), OpenLink Virtuoso (https://virtuoso.openlinksw.com/, accessed on 4 August 2021), and Parliament [82], and several studies compare state-of-the-art RDF stores [83,84]. In this work, the Knowledge Base was implemented on the Parliament store because it supports GeoSPARQL queries on polygon geometries. The Knowledge Base's content (available at http://90.147.102.176/parliament, accessed on 4 August 2021) was exposed through a public SPARQL endpoint and organized in two graphs: schema-level information can be queried from the http://seodwarf.eu/ontology/v1.0 graph and instance-level information from the http://seodwarf.eu/triples graph.
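A query against this endpoint can be composed programmatically. The helper below is a sketch, not part of the SeMaRe codebase: the seo prefix matches the ontology IRI stated above, while the GeoSPARQL filter clause illustrates how a spatial constraint would be expressed:

```python
def images_with_phenomenon(phenomenon_class, area_wkt=None):
    """Compose a (Geo)SPARQL query for images containing a phenomenon,
    optionally restricted to an area given as a WKT polygon."""
    lines = [
        "PREFIX seo: <http://seodwarf.eu/ontology/v1.0#>",
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
        "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>",
        "SELECT DISTINCT ?img WHERE {",
        "  ?img seo:hasPhenomenon ?p .",
        f"  ?p a seo:{phenomenon_class} .",
    ]
    if area_wkt is not None:
        lines += [
            "  ?p seo:hasCoverage ?geom .",
            f'  FILTER(geof:sfIntersects(?geom, "{area_wkt}"^^geo:wktLiteral))',
        ]
    lines.append("}")
    return "\n".join(lines)

query = images_with_phenomenon(
    "Turbidity", "POLYGON((23 37, 24 37, 24 38, 23 38, 23 37))")
```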

Question Answering Module
The Question Answering (QA) module consisted of the dictionary and the query matching module. The dictionary contained the labels of the resources belonging to the ontology and was used to map phrases in the user question to the aforementioned resources. Its content was exposed through Apache Lucene (https://lucene.apache.org/, accessed on 4 August 2021), a high-performance text search engine. The query matching module included the FSA logic and the geocoding module, and its responsibility was to transform the natural language question into a SPARQL query. The question was split into sections, each of which triggered a transition of the FSA from one state to another. Each state was associated with a specific SPARQL construct. At the end of the NLP analysis, the set of states through which the FSA transitioned was used to build the final SPARQL query, which was then executed against the KB endpoint. The QA module was implemented in Java and supports a RESTful architecture for the communication between system modules and users. The REST API exposed the following two basic methods:
• getExpandedQuery, which was used internally to translate a natural language question into its equivalent SPARQL query; and
• getKBResults, which allowed users to communicate with SeMaRe by submitting their natural language queries and receiving the appropriate answers.

Preliminary Evaluation
In this section, we describe the preliminary evaluation of the SeMaRe system. For this kind of retrieval system, based on a natural language interface, an "in vivo" evaluation would be desirable to assess:
• the ease of use of the system, i.e., whether the adoption of natural language actually helped users express their needs;
• the accuracy of the system, i.e., its ability to correctly retrieve instances when querying a knowledge base in which semantically annotated EO images and phenomena are described as RDF triples;
• the efficiency of the system in terms of response time.
In the following, we describe the preliminary evaluation performed to assess the ease of use (Section 5.1) and the accuracy and efficiency (Section 5.2) of the system.

Ease of Use Evaluation
We carried out an "in vivo" evaluation by involving a company that expressed its willingness to participate in the experiment. The experiment involved a set of 25 subjects, selected according to their degree of knowledge of SPARQL so that the ratio between expert and non-expert users would be balanced. The experiment was composed of the following four phases:
1. Gathering information about the participant's skills in IT and SPARQL;
3. Participants were asked to interact with the system by freely querying the interface;
4. A survey about the system, collecting feedback from the participants.
From the second phase of the experiment, it emerged that 52% of the participants declared a low to medium level of IT skills, and 48% of them declared no or little knowledge of SPARQL. During phase 4, we asked the participants to express their overall rating of the system on a 10-point Likert scale, ranging from a minimum of 1 (the lowest score) to a maximum of 10. From the analysis of the user feedback, it emerged that the Natural Language Interface was appreciated by 80% of the participants, who assigned an overall rating greater than 6. After interacting with the system, we also asked the users to state a preference between SPARQL and natural language for querying the database. Although this result is influenced by the presence among the participants of users who had never used SPARQL, we can see that even people with technical skills felt more confident using natural language rather than a data query language.

Accuracy & Efficiency Evaluation
For evaluation purposes, we transformed and loaded into the knowledge base 165 images and their associated phenomena. This sample consisted of semantically annotated images from 10 November 2019 to 3 December 2019 and contained turbidity, oil spill, and algal bloom phenomena. Their geospatial distribution covered a large area of the Adriatic, Ionian, and Tyrrhenian Seas (Figure 6). We note that all phenomena were described by polygon geometries. Table 5 summarizes the content of the knowledge base. The total number of generated triples for the 165 images was 103,673, i.e., approximately 628 triples per image. The total number of identified phenomena was 29,099, i.e., approximately 176 phenomena per image. Out of the 165 images, six did not contain phenomena because the RS algorithms did not identify any, and consequently no phenomenon-related metadata could be extracted (empty GeoJSON file). For these images, the knowledge base maintained only their generic (INSPIRE-based) metadata. Table 5 also presents statistics per phenomenon type. For instance, the knowledge base contained 47 images annotated with turbidity phenomena; the total number of identified turbidity phenomena was 3791, and thus the average number of turbidity phenomena per image was approximately 80.

The SeMaRe system, and especially the ontology and the QA module, was designed based on some generic user information needs that are sufficiently representative of the range of requests that can be made by end-users. These user needs were specified by marine domain experts in the framework of the SEO-DWARF project and expressed as potential query forms.
According to these generic query forms, an end-user could ask for information about:
• images that contained a phenomenon, optionally for a given location and a given period of time;
• phenomena, optionally for a given location and a given period of time;
• areas where a user-specified threshold of a parameter/index, i.e., phenomenon category, was reached.
For this preliminary evaluation, these generic query forms were instantiated for the turbidity phenomenon into the seven natural language queries shown in Table 6, and we assessed: (a) their accuracy, i.e., whether the natural language query could be converted into a suitable SPARQL form and return the correct result set, and (b) their efficiency in terms of the time needed to execute each query. The Results column shows the number of results for each query, and the Time column shows the response time for each query in seconds. For example, the natural language query Q: "Get all the images that contain turbidity phenomena" (SN 3) was transformed by the QA module into the respective SPARQL query S and returned 47 results, i.e., images, in 5 s. As indicated by the Images with turbidity statistic of Table 5, the query managed to identify all 47 images contained in the knowledge base. In fact, the execution of the SPARQL query was a deterministic process that ensured that all correct results from the knowledge base were retrieved, as long as the SPARQL query was syntactically correct and matched the underlying schema. Therefore, the focus lay on the correct transformation of the natural language into the SPARQL query, which, as Table 6 shows, was successful for all queries, including those containing spatial and temporal references (queries 4, 5 and 6). Nevertheless, in the real world, end-users are free to express their own queries, and it is natural to hypothesize that real free-text queries will vary greatly in syntax and structure. A further evaluation of the system's accuracy would require online experiments to capture user-expressed needs and test whether users are satisfied with their results. This kind of evaluation is left as future work due to the need to build an appropriate prototype and dataset, for which it is necessary to:
1. collect a real-world set of natural language queries asked to the system;
2. define the subset of relevant images for each query in order to compute the accuracy in terms of the classic precision and recall measures adopted in Information Retrieval.
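For reference, the measures mentioned in step 2 reduce to a few lines over sets of image identifiers (a small self-contained helper, not part of the SeMaRe codebase):

```python
def precision_recall(retrieved, relevant):
    """Classic IR measures: precision is the fraction of retrieved images
    that are relevant; recall is the fraction of relevant images that
    were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```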
Regarding system efficiency, we observed that the slowest queries contained spatial references (queries 4 and 6). Even though this is expected behaviour, since spatial operations are in general costly, we stress in the discussion section that the RS algorithms produced fine-grained, pixel-level-scale polygon geometries for the phenomena, which further increased the associated query execution costs. A solution to the efficiency problem could be the use of geometric approximations of the phenomena polygons (e.g., minimum bounding boxes) for the execution of spatial operations during querying.
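Such a bounding-box approximation can be computed cheaply from the WKT geometries already stored. The sketch below handles only simple single-ring polygons and is purely illustrative of the idea:

```python
def mbr_of_wkt(wkt):
    """Return the minimum bounding rectangle of a simple WKT POLYGON
    (one ring, no holes) as (min_x, min_y, max_x, max_y) plus the
    rectangle re-serialized as WKT, usable as a coarse query geometry."""
    ring = wkt.strip()[len("POLYGON(("):-2]          # drop "POLYGON((" and "))"
    pts = [tuple(map(float, p.split())) for p in ring.split(",")]
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    lo_x, lo_y, hi_x, hi_y = min(xs), min(ys), max(xs), max(ys)
    box = (f"POLYGON(({lo_x:g} {lo_y:g}, {hi_x:g} {lo_y:g}, "
           f"{hi_x:g} {hi_y:g}, {lo_x:g} {hi_y:g}, {lo_x:g} {lo_y:g}))")
    return (lo_x, lo_y, hi_x, hi_y), box
```

Filtering with the rectangle first (and applying the exact geometry only to the survivors) is the standard two-step trick spatial indexes use to avoid costly exact intersection tests.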

Discussion and Conclusions
In this paper, we have presented SeMaRe, a semantic marine retrieval framework that aims to allow users to retrieve information regarding marine phenomena annotated on EO satellite images. Specific marine phenomena were selected as the test case of SeMaRe, i.e., turbidity, Chl-a concentration (algal bloom), and oil spills. Information contained in the images provided by three satellites (namely Sentinel 1, 2, and 3) is routinely extracted for the selected marine phenomena using RS algorithms, which are either widely tested and accepted by the scientific community (the ACOLITE processor) or were developed by our team for this purpose (the OBIA method for oil spills). The evaluation of the algorithms requires extensive in-situ data, but this activity is beyond the scope of this work. Another issue, which arose during the test phase of the SeMaRe system, concerns the geometries of the RS algorithm output for the marine phenomena. Working at the pixel-level scale produces a very high volume of geometries for the phenomena, and the limited generalization currently applied does not efficiently reduce this volume. As a result, the associated storage, processing, and retrieval costs are also high, as the response times (Table 6) of some queries show. Thus, it is important that a more efficient generalization algorithm for the produced geometries be integrated into the SeMaRe system.
The marine domain knowledge is formalized as an ontology that contains information about EO satellite images and their associated phenomena. The presented ontology models turbidity, algal bloom, and oil spill phenomena, but it can easily be extended to other marine phenomena such as hot spots, upwelling, fronts, trophic status index, winds, and waves. SeMaRe concepts are linked with the SWEET ontology, but links with other sources, such as DBpedia and GeoNames, can be established at the schema or instance level in order to enrich the Knowledge Base. For example, a SeMaRe phenomenon could be linked, using a spatial inclusion relation, with a GeoNames geographical entity, which would allow the Question Answering module to perform more complex or semantically abstract queries (e.g., "find oil spills around big coastal cities") and to retrieve more accurate information. In the current implementation, the SeMaRe ontology uses custom properties for representing image (seo:hasBoundingBox) and phenomenon (seo:hasCoverage) geometries in order to explicitly define the relation between concepts and geometries in a simple way. However, in terms of reusability, we suggest substituting these properties with a common spatial vocabulary, e.g., the GeoSPARQL vocabulary.
Our framework is also based on a Question Answering module that allows users to easily express their information needs in natural language. To translate users' questions into SPARQL queries, we adopted a method based on controlled natural language, which has also been empowered with distributional semantics models. Since questions can refer to specific geographical areas, a geocoding module has been integrated to recognize geographical entities within the question and translate them into the corresponding coordinates (latitude and longitude). As for the retrieval capabilities of the framework, we observe that the system is able both to retrieve images that contain certain phenomena and to find phenomena in a certain image. However, queries that involve spatial operations on phenomena geometries (e.g., intersects) present increased response times, mainly due to the fine-grained representation of phenomena geometries (exported by the Semantic Annotation module). Possible improvements regarding the response times of the Knowledge Base include capturing a less fine-grained geometry for each phenomenon (e.g., its bounding box) or substituting Parliament with another RDF store that supports spatial querying. Regarding the use of natural language to query the system, a very preliminary evaluation session with real users showed that the majority of the participants appreciated this kind of interface. It was interesting to observe that even people with technical skills preferred natural language interaction over using a data query language. As future work, since the specificity of the topic does not allow finding suitable resources in the available literature, we plan to conduct an "in vivo" evaluation of the whole framework to assess its performance in terms of accuracy and the degree of usability perceived by real users. To do so, we plan to develop a user interface that allows users to insert their questions by typing them in a text box and selecting the area of interest on a map.