Towards an Interoperable Field Spectroscopy Metadata Standard with Extended Support for Marine Specific Applications

Remote Sens. 2015, 7 (Open Access)

This paper presents an approach to developing robust metadata standards for specific applications that serves to ensure a high level of reliability and interoperability for a spectroscopy dataset. The challenges of designing a metadata standard that meets the unique requirements of specific user communities are examined, including in situ measurement of reflectance underwater, using coral as a case in point. Metadata schema mappings from seven existing metadata standards demonstrate that they consistently fail to meet the needs of field spectroscopy scientists for general and specific applications (μ = 22%, σ = 32% conformance with the core metadata requirements and μ = 19%, σ = 18% for the special case of a benthic (e.g., coral) reflectance metadataset). Issues such as field measurement methods, instrument calibration, and data representativeness for marine field spectroscopy campaigns are investigated within the context of submerged benthic measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. A hybrid standard that serves as a “best of breed”, incorporating useful modules and parameters within the standards, is proposed. This paper is Part 3 in a series of papers in this journal, examining the issues central to a metadata standard for field spectroscopy datasets. The results presented in this paper are an important step towards field spectroscopy metadata standards that address the specific needs of field spectroscopy data stakeholders while facilitating dataset documentation, quality assurance, discoverability and data exchange within large-scale information sharing platforms.


The Importance of Interoperable Field Spectroscopy Metadatasets
Interoperability of field spectroscopy metadata is central to facilitating common platforms for sharing field spectroscopy datasets within the remote sensing community [1][2][3][4][5][6]. A field spectroscopy metadata standard is defined as those data elements that explicitly document the spectroscopy dataset and field protocols, sampling strategies, instrument properties and environmental and logistical variables [5]. A metadata standard in general supports dataset-related functions (e.g., identification, discovery, administration, version control) built on a framework of specific categories of defined metadata elements [7]. Interoperability between metadata standards can be defined as the preservation of information within a metadataset as it is exchanged across data platforms [8][9][10][11]. The metadata schema, taxonomies, and granularity of the metadata elements are determining factors in the interoperability between one metadata standard and another. The increasing volume of field spectroscopy datasets, from a broad variety of instruments, across research domains [1,12,13,14] necessitates a framework for field spectroscopy metadata interoperability.
In addition to a core metadataset that is critical to all campaigns [5], an extended metadataset is required to support specific applications of the data. An application profile for a metadata standard constrains and obligates a set of metadata elements for specific user requirements [15]. Scientists with domain expertise are best placed to advise on what belongs in an extended metadataset for their applications of interest: a marine scientist, for example, has the requisite knowledge and experience to provide a credible opinion on extended metadata for substratum features such as seagrasses and corals. This subject matter expertise can be used as a basis for adapting and expanding the core metadataset to make it useful for specific user communities. Currently there is no metadata standard for field spectroscopy and, consequently, none for specific application domains such as marine environments. As key stakeholders of the data, field spectroscopy scientists have a vested interest in the development and adoption of a standard most suitable to their needs as both creators and users of the metadata.

Unique Requirements for Marine Field Spectroscopy Metadata
Field spectroscopy within the marine environment has unique requirements due to environmental factors and the additional logistics and challenges of collecting and documenting measurements both above and below the water surface. For example, tide conditions, wave attenuation, turbidity, and a modified and attenuated light field are just some of the environmental conditions influencing the spectral measurements that are not generally a consideration for terrestrial campaigns. For benthic or substratum targets such as coral, spectral measurements may be taken above or below the surface, and opinions differ on how inclusive a metadataset must be in order to document environmental and target properties [16,17]. Two methods may be used for taking spectral measurements: (1) the target is measured in situ, either from above the water by lowering a sensor over the side of the boat or below the water by a diver operating the instrument; or (2) the bottom feature is taken out of the water for measurement.
Operating a spectroradiometer underwater requires specialized protocols and resources for protecting and stabilizing the instrument, ensuring operator safety, and documenting operations. Waterproof housings are necessary to permit submersion of the instrument to various water depths, and in some instances the instrument must be specially adapted to the underwater light field. Underwater spectrometry can be surface operated by lowering the sensor into the water column, which gives the operator only a limited overview of, or influence on, the bottom type being measured. Spectra collected by SCUBA divers provide a more reliable approach for recording the required spectra of substratum targets. This approach, however, is limited by air supply, dive time, and the depth permitted by safe diving practice. At the University of Queensland, various customized underwater spectrometer systems have been developed and tailored specifically to coral reef ecology, and the ecology and physiology of animal colour vision.
The choice to document metadata concurrently or retrospectively can be a result of prioritizing metadata collection due to constraints of time and the conditions under which the measurements are being taken [18][19][20][21][22][23], with the marine environment being particularly challenging due to its ephemeral nature. The accompanying protocols for recording metadata in situ are interdependent with the challenges of radiometric data collection underwater, as they are designed to simultaneously ensure the requisite operator safety [19]. For example, information relating to viewing geometry (which includes the height and angle of the sensor above the target, the height of the sensor above the surface or substratum, the field of view, and the foreoptics used) is best documented in the same window of time as the EMR signatures being recorded, since these data are difficult to obtain post-event and prone to error if recorded from memory alone. However, resourcefulness and creativity are required in marine campaigns for concurrent documentation (Figure 1), and taking detailed photographs of the target and its environment is recommended.

Reviewing Existing Field Spectroscopy Protocols for a Framework for Interoperability
In the absence of a field spectroscopy metadata standard, field practice is the most informative resource for requisite metadata for specific field spectroscopy applications. Worldwide practice for recording metadata relating to the instrument properties, illumination and viewing angles, reference standards and general project information varies considerably, but generally follows a group's own definition of what constitutes a suitable metadataset [14,17,22,24] for a given application. There are laboratories, research agencies and organizations that provide documentation for good practice in the field. Table 1 [20][21][22][23][24][25][26][27] presents a representative sample of the scope of the good practice guides. Their prescriptiveness and underlying assumptions about the instrument operator and principal investigator vary. Some guides are comprehensive, especially for specific applications, while others assume that the principal investigator has an advanced understanding of the principles of sampling (viewing geometry strategies, bi-directional distribution functions) and, as a result, provide little background information about field spectroscopy science. The amount of advice given and its explicitness vary across the good practice guides and illustrate the spectrum of opinions about what constitutes good sampling strategy. The comparison shows that the application-specific guides (Australian Government Department of Sustainability, Environment, Water, Population and Communities: Standards for reflectance spectral measurement of temporal vegetation plots and the University of Queensland Field Spectrometer and Radiometer Guide) discuss the broadest range of topics for field spectroscopy and are more explicit in their instructions for field protocol and how to document it (with the exception of the Spectranomics Protocol: Leaf Spectroscopy (350-2500 nm) guide). The other guides leave it to the researchers to decide what viewing geometries and sampling strategies are ideal, and omit references to field data documentation.
The Natural Environment Research Council Field Spectroscopy Facility (NERC FSF) states in its online instrument (ASD Field Spec Pro, GER1500, GER5700) guides that it is unable to recommend sampling strategies due to varying requirements across projects, and that this responsibility ultimately lies with the principal investigator [20][21][22], but it does advise on sampling strategies in its training courses [28]. It does provide general guidance about warming up the spectroradiometer prior to measuring samples, the importance of calculating the field of view, secure mounting of the instrument, and taking white reference measurements for the ASD Field Spec Pro [18,20,21]. PANalytical Boulder (formerly ASD Inc.), a leading manufacturer of field spectroradiometers, maintains an online document repository on the physics of field spectroscopy, as well as general guidance for instrument optimization and viewing geometry in its instrument guides [26,27].
Other good practice guides provide more explicit guidance on field protocol. The Australian Government Department of Sustainability, Environment, Water, Population and Communities provides a detailed protocol for spectral measurement of temporal vegetation plots [24]. It includes a background on electromagnetic radiation theory for field spectroscopy, and recommends the number of signals to average per sample, optimal viewing geometries, stabilizing equipment setup, methods for cleaning the white reference panel, and a protocol for measuring an instrument's conformance to manufacturer specifications (including warm-up time for illumination lamps, averaged spectra and white reference measurements taken). The Carnegie Spectranomics lab provides a detailed protocol for leaf collection and spectroscopy, but omits any discussion of electromagnetic radiation theory [25].
The University of Queensland provides a detailed protocol for marine campaigns that includes advice about the instruments (ASD, Ocean Optics, TriOS Ramses) best suited to the type of signal being recorded (in situ marine spectral reflectance of submerged features such as coral, or down- or up-welling irradiance for depth profiles) [23]. It also presents optimal sampling strategies and ways of minimizing environmental influences on the signal, including: specific references to CSIRO-recommended viewing geometries, proper communication with divers operating the instrument, ways to avoid splashing water on the instrument, minimizing reflecting effects of wet samples and surrounding environments, and measuring the water surface and column before each white reference measurement to counteract their influence [23]. The range of opinions and explicitness on what constitutes good practice among these guides has implications for the remote sensing community. Rasaiah et al. [5] established that a lack of standardized protocols, and no community consensus on how to document them (i.e., what metadata to provide), may ultimately hinder intercomparison of field spectroscopy datasets and quality assurance.

Existing Metadata Standards in Support of Specific Field Spectroscopy Applications
Existing metadata standards within domains related to field spectroscopy may also provide guidance for documenting metadata for specific field spectroscopy applications. Metadata standards such as Dublin Core, Darwin Core, Ecological Metadata Language (EML), and the Content Standard for Digital Geospatial Metadata have been adopted by agencies involved in research in geospatial science or geospatial data standards, including the Federal Geographic Data Committee (FGDC), NASA's Earth Science Division, the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and Infrastructure for Spatial Information in the European Community (INSPIRE), among others [29][30][31][32][33][34]. Efforts towards an XML-based metadata schema based on the ISO 19115 standard and incorporating the core metadataset identified in Rasaiah et al. [5] are presented in Jimenez et al. [35]. A comprehensive review of all geospatial metadata standards is beyond the scope of this paper; a subset of the most common geospatial metadata standards within the context of field spectroscopy applications is discussed in more detail in Section 2.
Metadata standards adopted by the geospatial community can be categorized into generic standards, applicable to all datasets for the purposes of archiving and discoverability (Darwin Core, Dublin Core, D-Space Metadata), and more specialized standards of use to a given user community, which include the Access to Biological Collections Data (ABCD) Schema 2.06 for ecology and the ANZLIC Metadata Profile 1.1 (Geographic dataset core) for geospatial datasets. Each standard is designed with different objectives for the use of the metadata, for different user groups, with unique vocabularies, taxonomies (discipline-specific classifications based on ontologies among metadata elements), and granularity (the specificity or level of detail at which each metadata field is expressed). The variety of standards illustrates that there is no "one size fits all", and the utility of a standard is directly linked to the preferences and needs of data users, and the purposes for which the metadata will be utilized.

The Impact of Metadata Schema on Interoperability and Dataset Discoverability
Metadata standards can be structured according to a specific schema, with unique taxonomies, syntax, and granularity. The term "standard" has often been used interchangeably with the term "schema", but there are differences between the two. Metadata schemas are the specifications for representing metadata elements in digital format [7]. The schema can include document format (HTML, XML, SGML), syntax (controlled vocabularies), taxonomies, and granularity. Metadata standards and their schemas play the greatest role in interoperability between metadatasets. Examining the complexity of schemas helps illustrate this. Schemas can be categorized into three levels of complexity: (1) simple (highest degree of interoperability with other metadata schemas, generally multidisciplinary and non-granular, with 15-25 metadata fields); (2) simple/moderate (interoperability is inversely correlated with the specific needs of an application or discipline, granular with more metadata fields); (3) complex (interoperability requires expertise, hierarchical, granular, and extensive, with more than 100 metadata fields) [36]. For example, Dublin Core 1.1 has fifteen elements at a single level of granularity, whereas ABCD 2.06 has 1004 elements defined within hierarchies. Mapping and intercomparison of metadata elements between these two standards is not a straightforward exercise, which implies that much consideration must be given to adopting the most suitable metadata standard for a given dataset. Therefore, the complexity of a schema must accommodate the user's needs and the purposes for which the metadata will be used (discoverability, archiving, data mining, etc.).
The capability of a data user to find a dataset in a digital repository and assess its usefulness for a given application is dependent in part on its underlying schema. Consider the simple scenario of a database user conducting a search for geospatial datasets by entering keywords for the criteria; these criteria may include geographic extent, description of the datasets, nature of the scientific study for which the dataset was generated, and the instrumentation and sampling protocols used. It is possible that two similar datasets in the database can meet the criteria for a user's needs, but one may be undiscoverable if its metadataset adheres to a schema that is not catalogued by the data service accessing the database or is inadequately structured to convey its usefulness to the data user.
There are ongoing efforts to translate metadata from one standard or schema to another to avoid such problems; this is also known as "crosswalk mapping" [37]. Schemas have also been extended or adapted for specific applications. OGC (Open Geospatial Consortium) adopted GML (Geography Markup Language) and KML (Keyhole Markup Language) as XML-based schemas for geographic datasets and 3D map software, respectively. The inherent properties of metadata standards discussed here demonstrate the need for a more thorough investigation that would identify the requisite essentials for building a robust field spectroscopy metadata standard that aligns with specialist needs and enables interoperability with other metadata. The aim of this paper is to present an approach to developing a field spectroscopy metadata standard with extended functionality for benthic applications. The essentials examined include defining the key metadata for benthic reflectance, and an assessment of how well this metadata is supported by existing metadata standards.
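The crosswalk mapping mentioned above can be illustrated with a minimal sketch. It assumes a crosswalk is expressed as a simple one-to-one lookup from source element names to target element names; the function `apply_crosswalk`, the target field names, and the sample record are hypothetical and are not taken from any of the standards reviewed in this paper. Only the source keys `title`, `creator`, `date` and `rights` are borrowed from the Dublin Core element set.

```python
def apply_crosswalk(record, crosswalk):
    """Translate a metadata record from one schema to another.

    record:    dict of {source_element: value}
    crosswalk: dict of {source_element: target_element}
    Returns (translated, unmapped). Elements with no counterpart in the
    target schema are returned separately, because they represent
    information that would be lost in the exchange.
    Note: if two source elements map to the same target element, the
    later one overwrites the earlier one in this simple sketch.
    """
    translated = {}
    unmapped = {}
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = value
        else:
            translated[target] = value
    return translated, unmapped


# Hypothetical crosswalk: Dublin Core element names on the left,
# illustrative field spectroscopy field names on the right.
crosswalk = {"title": "Dataset name", "creator": "Investigator", "date": "Date"}

# Hypothetical source record.
record = {"title": "Reef transect spectra", "date": "2012-06-18", "rights": "CC BY"}

translated, unmapped = apply_crosswalk(record, crosswalk)
```

Reporting the unmapped elements alongside the translated record makes the loss of information explicit, which is the property that the definition of interoperability used here (preservation of information across platforms) is concerned with.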

Identifying Key Metadata for Marine Measurements
Defining the key metadata for benthic reflectance requires firstly identifying the user community, and secondly, consulting them directly on what they judge to be critical metadata within the applications they would use this metadata for. These activities align with the core principles that must be adhered to when designing a "good" metadata standard, which include: identification of the needs of users who will access and use the data; identification of an application profile; direct involvement of interested stakeholders; extension or refinement of existing standards that may not entirely meet the requirements of users; enabling modularity for logical and consistent organization of the data; facilitation of data discovery, retrieval, and re-use; and elimination of redundancy in data documentation so that data is collected only once [30,38,39,40,41]. A methodology for identifying the key metadata for marine measurements for informing a field spectroscopy metadata standard was designed using these core principles, and applied by identifying the needs of the users as the initial step.
An expert panel of field spectroscopy data stakeholders from the Australian and international community was convened at the TERN ACEAS "Bio-optical data: Best practice and legacy datasets" workshop in Brisbane, Australia, held on 18-22 June 2012. The purpose of the workshop was to "drive best practice in field measurement and to lay the foundations of an international standard for the exchange of spectral datasets" (p. 1, [42]). The workshop participants included scientists with expertise in vegetation, marine, estuarine, mineralogical, and soil reflectance measurements. Based on the collective expertise in the group, panel discussions were structured to identify key metadata for specific application domains. A group of remote sensing scientists with expertise in marine and estuarine field spectroscopy comprised the marine metadata group. Each team was presented with a baseline metadataset comprising the core metadataset identified in Rasaiah et al. [5], field data collection protocols unique to each application, and proposed metadata obtained through personal interviews with field spectroscopy scientists prior to the workshop. The objective of the activity was to derive the elements of a metadata standard for each application that would incorporate the core metadataset, application-specific metadata, and optional metadata as proposed by each team for enhancing exchange and usability (Figure 2). Once presented with the baseline metadataset, the participants were asked two questions: (1) "If you were to create the highest quality metadataset possible, for use in either calibration or validation activities, which fields would be critical, and which would be optional?"; (2) "Do you recommend any new fields?" For the first question, "highest quality" was defined to be a dataset that was: (1) comprehensive: accurately documents the protocol executed to obtain the data; (2) complete: inclusive of all metadata critical to that metadataset; (3) interoperable (digitally and semantically): comprises metadata elements expressed in a manner conforming to commonly accepted terminologies and ontologies to accommodate fusion with other datasets and exchange across data platforms; (4) explicit: captures the requisite metadata to a granularity that minimizes potential for recording ambiguous metadata (granularity in this context is the smallest unit of metadata defined for capturing a given unit of information). Prior to panel discussions on application-specific metadata, the above parameters had been defined and discussed with the participants during a presentation given on methods and criteria for a "best fit" metadata standard for field spectroscopy datasets. Calibration and field validation activities were used as a point of reference, as they are widely acknowledged within the field spectroscopy community to require the most stringent adherence to best practices in data collection [43,44]. Field protocol, or the sampling and methodology used to generate the field spectroscopy datasets, was selected for inclusion in the metadatasets because it is an integral component in the collection of in situ spectroscopy data [1,5,13,45,46]. For each metadata field presented within the baseline set, the scientists were asked to provide a reason for inclusion or comments, categorize the field as critical or optional, provide an example, and specify the data type for the field (Boolean/text/numeric/other). Providing an example and specifying a data type allowed the scientists to customize the metadataset in accordance with the taxonomies and vocabularies of their discipline.
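The information requested for each baseline field (reason for inclusion, critical or optional designation, example, and data type) can be pictured as a small record structure. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the two sample responses are invented for illustration and are not drawn from the panel results reported in Tables 4 and 5.

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    CRITICAL = "critical"
    OPTIONAL = "optional"


class DataType(Enum):
    # The four data types offered to the panel.
    BOOLEAN = "Boolean"
    TEXT = "text"
    NUMERIC = "numeric"
    OTHER = "other"


@dataclass(frozen=True)
class FieldReview:
    """One panel response for one field of the baseline metadataset."""
    name: str
    obligation: Obligation
    data_type: DataType
    example: str
    reason: str = ""  # reason for inclusion, or free comments


def split_by_obligation(reviews):
    """Return (critical, optional) lists of field names, preserving order."""
    critical = [r.name for r in reviews if r.obligation is Obligation.CRITICAL]
    optional = [r.name for r in reviews if r.obligation is Obligation.OPTIONAL]
    return critical, optional


# Hypothetical responses, for illustration only.
reviews = [
    FieldReview("Water depth", Obligation.CRITICAL, DataType.NUMERIC, "3.5 m"),
    FieldReview("Target photo taken", Obligation.OPTIONAL, DataType.BOOLEAN, "yes"),
]
critical, optional = split_by_obligation(reviews)
```

Keeping each field flat, with no nested subfields, mirrors the single level of atomization at which the metadata elements were presented to the panel.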
The human perspective on metadata was a central consideration in the design of the panel discussions. Rather than being an exercise simply in documenting information related to a given application, it was felt important that the scientists provide direct input into the semantic structure of the metadata. Best practice for creating an application profile requires identifying specific requirements of the community that is going to use the application profile. The approach was based on using scenarios and case studies, and defining the obligation of data elements, with the emphasis on human-generated metadata developed by skilled classifiers ensuring more precise and high-quality metadata [47,48]. The granularity of metadata examined in the exercise was also considered. It has been previously demonstrated that in the interpretation and application of a metadata standard, people can easily confuse a concept with the designation used to represent it [49]. For this reason, the metadata elements were expressed at a single level of atomization, with no subclasses or formally defined ontological interdependencies among metadata elements. The structure of the discussions served to minimize any potential confusion about what information is being documented and to enable the scientists to define the metadata elements in a way that is least ambiguous and most meaningful to them. Additionally, each team was invited to volunteer any new fields that may be suitable.

Assessing Geospatial Metadata Standards to Support the Core Metadataset and the Benthic Reflectance Metadataset
A representative set of geospatial metadata standards was selected for analysis (Table 2 [29][30][31][32][33][34]50]). The inclusion of a given metadata standard in this set was dependent on it meeting the following criteria: (1) it is a community-endorsed standard and/or (2) it is thematically aligned with in situ benthic reflectance spectroscopy. Generic geographic metadata standards such as ISO 19115 were already incorporated in part or in whole in several of the standards selected (EML 2.1.1, CSDGM, ANZLIC Metadata Profile) and therefore were not directly examined to avoid redundant analysis.
Assessing a given metadata standard in its support of the core metadataset and the benthic reflectance metadataset was done by answering a single question: how many metadata fields (metadata elements) in each existing standard could be used to capture the information in the core metadataset and the benthic reflectance metadataset?
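One way to turn this question into a number is sketched below. It assumes that conformance is the percentage of fields in the target metadataset (core or benthic reflectance) that are covered by at least one accepted mapping from the existing standard, and that the summary statistics are the mean and standard deviation of that percentage across standards. The exact formula, the choice of population rather than sample standard deviation, the function name `conformance`, and all field names and values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def conformance(target_fields, accepted_mappings):
    """Percentage of target fields covered by at least one accepted mapping.

    target_fields:     iterable of field names in the field spectroscopy
                       metadataset being assessed.
    accepted_mappings: iterable of (standard_element, target_field) pairs
                       that passed the acceptance criteria.
    """
    target = set(target_fields)
    if not target:
        raise ValueError("target metadataset has no fields")
    covered = {field for _, field in accepted_mappings if field in target}
    return 100.0 * len(covered) / len(target)


# Hypothetical target metadataset and mappings (illustrative values only).
target_fields = ["Date", "Wind speed", "Wind direction", "Instrument make"]
mappings_by_standard = {
    "Standard A": [("Wind speed", "Wind speed"),
                   ("Wind direction", "Wind direction")],
    "Standard B": [],
    "Standard C": [("instrumentation", "Instrument make")],
}

scores = [conformance(target_fields, m) for m in mappings_by_standard.values()]
mu = mean(scores)
sigma = pstdev(scores)  # population form; the sample form is equally plausible
```

Counting covered target fields, rather than the number of mappings, prevents a single flexible element that maps to many fields from being confused with many explicit elements; the distinction between the two cases is treated separately in the analysis of unique and non-unique mappings.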
The purpose of the analysis was to determine how well an existing metadata standard can be mapped, unidirectionally, on a metadata element-by-metadata element basis, to the field spectroscopy metadatasets. Unidirectional mapping was necessary for an accurate assessment of conformance between the metadata standards and the spectroscopy datasets. A poorly executed analysis would include a reverse mapping, from the field spectroscopy datasets to the existing metadata standards. However, this would only serve to examine whether the field spectroscopy metadata elements could be operationalized as a metadataset conforming to an existing standard and would fail to yield any meaningful results. An example of such a superfluous mapping, excluded from analysis, is to associate field spectroscopy metadata elements with the numerous generic free-text parameters within the existing standards (such as the value-eml-text field in the EML 2.1.1 standard). Figure 3 shows a successful mapping for metadata elements in two existing standards to metadata elements in the proposed field spectroscopy metadataset. Criteria were applied to define a successful mapping. These are explained in detail in Table 3. Metadata elements specified at the smallest level of granularity or atomization in the standard were chosen. This was to allow a uniform comparison among the proposed and existing standards. For example, the "Date" field in the field spectroscopy metadataset is expressed as a single unit of metadata, whereas the ABCD standard for Date data (in the /DataSets/DataSet/Units/Unit/Identifications/Identification/Date container class) has nine subfields (DateText, TimeZone, ISODateTimeBegin, DayNumberBegin, TimeOfDayBegin, ISODateTimeEnd, DayNumberEnd, TimeOfDayEnd, PeriodExplicit) used to capture this information.
The finest granularity was used in all cases where the documentation for the metadata standard defined parameters to this level of granularity. This was the baseline against which all standards were measured. All other standards needed to be reduced to the same level of granularity for analysis, taking into account both explicit and implicit references to a given metadata element. The definition of each element was used as the determining factor for mapping. For example, EML 2.1.1 specifies that the "instrumentation" metadata element in the "Methods" module can include information about the quality control and quality assurance for the instrument; therefore, it could be mapped to the instrument calibration metadata category in the core metadataset.
Unique and non-unique mappings were counted. A unique mapping occurs when a metadata element (e1) in an existing standard has been mapped to one and only one metadata element (p1) in the field spectroscopy dataset (core/underwater benthic). An example of a unique mapping is the "Wind direction" field for above-surface marine conditions in the FGDC Marine Shoreline Data Extension being mapped to the "Wind direction" field in the underwater benthic metadataset, with no other mappings to other fields in the underwater benthic metadataset. A non-unique mapping occurs when metadata element e1 can be mapped to multiple metadata elements in a proposed dataset. An example of a non-unique mapping is two metadata fields in ABCD 2.06 that can be mapped to both [target] "Species or name" and "Phytoplankton species/classes" in the underwater benthic set. Counting unique and non-unique mappings is useful for determining the requisite explicitness of an existing standard to successfully capture information in a field spectroscopy metadataset.

Table 3. Criteria for accepting or rejecting a metadata element in an existing standard for mapping.

Explicit reference
Example: The "Wind speed" metadata element in the FGDC Marine Extension standard was successfully mapped to "Wind Speed" in the coral target metadataset.

Implicit reference
Example: Instrument category metadata elements ("Make", "Model", "Serial Number") could be recorded in the EML 2.1.1 "Instrumentation" metadata field in both the "Protocol" and "Methods" modules.

Undefined or ambiguous metadata element
Example: Where the parameter description was absent or too vague to determine its purpose, it was not counted as a suitable metadata element. For example, in the ABCD standard user guidelines, the "Method" field within the "/DataSets/DataSet/Units/Unit/Sequences/Sequence/" class has no definition.

Incorrect parent or container class
Example: The "Viewing Geometry" category in the proposed core metadataset comprises critical elements relating to sensor viewing angles. A mapping was not successful if counterparts in an existing standard were in the wrong parent or container classes. Sensor azimuth and zenith angle parameters exist within the FGDC Remote Sensing Extension but are defined within the "Satellite" container class and therefore could not be mapped to sensor geometry metadata in the core dataset.

Manually-defined classes or fields
Example: Instances of the EML 2.1.1 "attribute" parameter that could be defined by the user to record any campaign metadata.

Generic metadata element
Example: Any metadata elements within an existing standard that referred to data that could be extracted from a generic data table, such as those referenced by the EML 2.1.1 "dataset" module; the "measurementValue", "Attribute", "dynamicProperties" metadata fields in Darwin Core 1.1 that could be applied to any numeric or text metadata parameter.

Flexibility Analysis
An additional measure was included in the analysis to determine whether an existing standard's flexibility had an effect on how much information it could capture in the field spectroscopy metadatasets (core/underwater benthic). In this context, flexibility is defined as the potential for a metadata element in an existing standard to be re-used (or re-mapped) to multiple metadata fields in a field spectroscopy metadataset. For example, according to the user guidelines for EML 2.1.1 [34], in the "Sampling" module, the metadata element "instrumentation" can be mapped to all parameters for instrument metadata defined in the core metadataset. This is considered a non-unique mapping. On the other hand, the "Wind speed" metadata element in the FGDC Shoreline Metadata Profile standard can be successfully mapped to one and only one metadata element ("Wind Speed") in the benthic reflectance metadataset. This is considered a unique mapping. The more explicit a metadata element in the existing standard is, the greater the likelihood of a unique mapping for that field.
The degree of flexibility of a metadata standard, and its corollary, prescriptiveness, is worth considering as a measure of its value to a given community. Prescriptiveness has the potential to guide good practice for metadata documentation in the field. Conversely, requiring a user to record protocol steps or target properties in multiple metadata fields at too fine a granularity may be prohibitive and result in an inflexible and onerous standard. This can arise first from the drain on time in the field when the user is forced to comply with the proposed standard. A rigid metadata standard may also prevent an expert user from making their own informed choices about what is good practice. An average of unique (UM/me) and non-unique (NUM/me) mappings per total number of mapped elements for each dataset was calculated. These averages were then correlated to a standard's success in capturing information in the field spectroscopy metadatasets (core and underwater benthic reflectance).
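The text does not give an explicit formula for UM/me and NUM/me, so the following sketch shows only one plausible reading: a source element with exactly one target contributes one unique mapping, a source element with several targets contributes one non-unique mapping per target, and both counts are divided by the number of mapped source elements. The function name and the target field names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def flexibility_metrics(mappings: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (UM/me, NUM/me) for (source_element, target_field) pairs.

    Assumed definitions (not taken from the paper):
      UM/me  = unique mappings / mapped source elements
      NUM/me = non-unique mappings / mapped source elements
    """
    targets = defaultdict(set)
    for source, target in mappings:
        targets[source].add(target)

    mapped = len(targets)
    if mapped == 0:
        return 0.0, 0.0

    unique = sum(1 for t in targets.values() if len(t) == 1)
    non_unique = sum(len(t) for t in targets.values() if len(t) > 1)
    return unique / mapped, non_unique / mapped


example = [
    ("Wind speed", "Wind Speed"),                 # one target: unique
    ("instrumentation", "Instrument make"),       # several targets: non-unique
    ("instrumentation", "Instrument model"),
    ("instrumentation", "Instrument serial number"),
]
um_me, num_me = flexibility_metrics(example)
print(um_me, num_me)
```

Under this reading, a standard dominated by broad elements such as "instrumentation" scores high on NUM/me, while a standard made of explicit elements such as "Wind speed" scores high on UM/me.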

Results and Discussion
The results of identifying the key metadata for benthic reflectance measurements and an analysis of existing standards in supporting the field spectroscopy metadataset (the core metadataset and the extended benthic reflectance metadataset) are presented in the succeeding sections.

Benthic Reflectance Metadata
Based on the expert input, metadata for benthic reflectance are presented in Table 4 (critical metadata) and Table 5 (optional metadata). The underwater benthic reflectance metadata list includes metadata elements relating to location and environment conditions in addition to application-specific parameters. This is a result of the unique and complex conditions under which underwater spectroscopy operates and the environmental factors influencing the spectral measurements that are absent from terrestrial campaigns (these include tide conditions, above- and sub-surface conditions, and water column profile data). There is an almost even distribution of critical and optional designations, and two parameters (wind speed and direction) have been ranked as critical in the special case of severe conditions. There are fifteen fields relating to benthic properties, nearly half of which have been designated as critical. Two fields, "Homogeneity/heterogeneity" and "Presence of epiphytes", refer to a photo for additional data. This is illustrative, in part, of the difficulty of recording metadata in situ for marine campaigns and the use of alternate methods (such as analysis of a photo taken onsite) to add metadata retrospectively.
Metadata relating to illumination, a component of the core metadataset, has also been expanded for the underwater benthic reflectance metadataset; the additions include non-critical metadata such as "natural canopy shading" and "artificial canopy effect". There are four parameters relating to viewing geometry that are normally not required for terrestrial campaigns: "distance from bottom/substrate", "distance of operator from sensor", "height of sensor from surface", and "depth of sensor from surface". The latter three are critical only in cases of shading by the operator's body or where data is required for profiling the water column.
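Conditional criticality of this kind can be expressed as a small validation rule. The sketch below is illustrative only: the function and flag names are hypothetical, and treating "distance from bottom/substrate" as always critical is an assumption, since Table 4 remains the authoritative list of critical fields.

```python
from typing import Dict, List

# Assumption: the first viewing geometry field is always critical.
ALWAYS_CRITICAL = {"distance from bottom/substrate"}

# Critical only under operator shading or water column profiling (per the text).
CONDITIONALLY_CRITICAL = {
    "distance of operator from sensor",
    "height of sensor from surface",
    "depth of sensor from surface",
}


def missing_critical_fields(record: Dict[str, object],
                            operator_shading: bool = False,
                            water_column_profiling: bool = False) -> List[str]:
    """List the critical viewing geometry fields that the record leaves empty."""
    required = set(ALWAYS_CRITICAL)
    if operator_shading or water_column_profiling:
        required |= CONDITIONALLY_CRITICAL
    return sorted(f for f in required if record.get(f) in (None, ""))


record = {"distance from bottom/substrate": 0.10}  # hypothetical value in metres
print(missing_critical_fields(record))
print(missing_critical_fields(record, operator_shading=True))
```

The same pattern would apply to wind speed and direction, which the expert panel ranked as critical only under severe conditions.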

Metadata Schema Mappings to the Core Metadataset
Metadata schema mappings from the seven metadata standards to the core metadataset and the benthic reflectance metadataset are presented in the Appendix.

Metadata Schema Mapping Analysis
Rates of successful mappings from the seven metadata standards to the field spectroscopy metadatasets are presented in Figure 4. Detailed analysis is provided in subsequent sections.

Dublin Core 1.1
Fifteen metadata fields within Dublin Core were examined, with low overall mappings. The consistently high failure rates could be accounted for by Dublin Core's primary purpose, which is to identify a dataset at the collection level with parameters whose scope is limited to content (i.e., subject, description), intellectual property (i.e., publisher, rights), and instantiation (i.e., format, identifier). The mapping had some success (5%) with the core metadataset, specifically within the subset relating to project information, to which four metadata elements could be mapped (assuming that the owner of the dataset chooses to use the project/experiment details as identifiers for the dataset as well).

Access to Biological Collections Data Schema 2.06
One thousand and four metadata elements were examined in ABCD 2.06. It mapped to 29% of the core metadataset and 43% of the elements in the benthic metadataset. Dublin Core has been wholly incorporated into ABCD 2.06, so a minimum of successful mappings to the core metadataset is guaranteed. The mandate for ABCD is to facilitate "access and exchange" of "primary biodiversity data" [31], of which the underwater benthic reflectance metadataset has the higher proportion, compared to the core metadataset, in terms of biological sample parameters (including species, specimen id).

Ecological Metadata Language 2.1.1
Four hundred and eighty-four elements in EML 2.1.1 were examined. It had the highest overall success with both metadatasets: 91% for core and 33% for underwater benthic. As with ABCD 2.06, it is biased towards biological data collection. Mappings to underwater benthic can increase if the "table dataset value" element, referring to an associated table with target characteristics, is selected to store parameters such as chlorophyll concentration (benthic). However, the "table dataset value" element was ignored for successful mappings as it was classed as too generic, according to the criteria in Table 3.
Its success with the core metadataset can be accounted for in part by the fact that it has a larger number of dataset-level metadata elements that can be mapped to the "project information" subset, and instrumentation metadata that can be populated in the "methods" module "instrumentation" metadata element, which accommodates description of any instruments used in the data collection. The sampling protocol metadata elements in the underwater benthic metadataset (e.g., "height of sensor from surface") can also be captured either in the "methods" or "protocols" modules. According to the EML documentation, either parameter is suitable, based on how the protocols are described: "'methods' is descriptive (often written in the declarative style: "I took five subsamples...") whereas 'protocol' is prescriptive (often written in the imperative mood: "Take five subsamples...")" [34].

Darwin Core
Forty-five elements in Darwin Core were examined. These had higher success with the benthic metadataset (33%) than with the core (15%). Those parameters referring to sample properties have been semantically structured for biodiversity data, hence its relative success with benthic data. There were no explicit or implicit references to instrument properties (within the core metadataset), and the "method" parameter was considered insufficient in scope by the authors to be suitable for sampling protocol or viewing geometry.

FGDC Content Standard for Digital Geospatial Metadata (Remote Sensing Extension)
Three hundred and sixty elements in the FGDC Content Standard for Digital Geospatial Metadata (Remote Sensing Extension) were examined. The Remote Sensing Extension could be mapped only to 2% of the core metadataset, with no mappings to the benthic metadataset. Mappings to the core were for dataset-level metadata, given that the experiment information (name, date) could be used to identify the metadataset at this level. However, this hypothetical dataset would be empty as no target properties could be documented within the standard. The Remote Sensing Extension is designed for digital geospatial data (obtained from satellite and airborne sensors primarily), and has no suitable parameters to capture sampling techniques, viewing geometry, or instrument information for in situ sensors.

FGDC Content Standard for Digital Geospatial Metadata: Shoreline Metadata Profile
Thirty-three elements in the Shoreline Metadata Profile were examined. These had higher success with the critical elements in the underwater benthic reflectance metadataset (19%) than with the core (2%). Even though this standard applies to digital geospatial metadata, when examined on its own, it is useful for recording location and environment parameters (wind speed, tide, above surface conditions) for the underwater benthic campaign. It is noteworthy that this standard has no "depth" parameter. The metadata elements mapped to the core metadataset related to a subset of location and environment parameters.

ANZLIC Metadata Profile 1.1 (Geographic Dataset Core)
Forty-five elements in the ANZLIC Metadata Profile 1.1 (Geographic dataset core) were examined. Successful mappings covered 8% of the core metadataset and 5% of the elements in the benthic reflectance metadataset. ANZLIC standards are primarily for cataloguing services, and in the context of the geographic dataset core standard, document information about the "identification, spatial and temporal extent, quality, application schema, spatial reference system, and distribution of digital geographic data" [30]. The few core metadataset parameters that were mapped relate to project and experiment profile information, or to the special case of GPS coordinates categorized as spatial reference information for underwater benthic reflectance.
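As a consistency check, the summary statistics for the core metadataset (μ = 22%, σ = 32%) can be reproduced from the seven per-standard rates reported in the preceding subsections. The sketch below assumes the sample standard deviation (n − 1 denominator), which is not stated in the text but is the form that matches the reported value.

```python
from statistics import mean, stdev

# Core metadataset mapping rates (%) as reported for each standard above.
core_rates = {
    "Dublin Core 1.1": 5,
    "ABCD 2.06": 29,
    "EML 2.1.1": 91,
    "Darwin Core": 15,
    "FGDC Remote Sensing Extension": 2,
    "FGDC Shoreline Metadata Profile": 2,
    "ANZLIC Metadata Profile 1.1": 8,
}

rates = list(core_rates.values())
mu = mean(rates)
sigma = stdev(rates)  # sample standard deviation (assumption)
print(f"mean = {mu:.0f}%, sample SD = {sigma:.0f}%")  # mean = 22%, sample SD = 32%
```

The large spread relative to the mean is driven almost entirely by EML 2.1.1; the remaining six standards all map to less than a third of the core metadataset.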

Metadata Standard Flexibility Analysis
The flexibility analysis comparing the average of unique (UM/me) and non-unique (NUM/me) mappings per total number of mapped elements for both metadatasets is presented in Figure 5.
The data show that the correlation between the amount of data captured by an existing metadata standard (% elements mapped in the dataset) and average mappings per element is stronger for non-unique mappings (r = 0.353, n = 14, p = 0.001) than for unique mappings (r = 0.007, n = 14, p = 0.001). This suggests that in the context of the standards studied, generally, the less prescriptive or explicit an existing standard is, the more likely it is to capture a larger amount of information in the field spectroscopy metadataset. These results are significant to the formal adoption and implementation of a field spectroscopy metadata standard. First of all, a balance must exist between the generality of metadata parameters (for capturing the maximum amount of information necessary for a dataset) and the granularity of metadata parameters (so that datasets can be described in sufficient detail). Secondly, the interoperability between a field spectroscopy metadata standard and other metadata standards is dependent in part on the prescriptiveness of the field spectroscopy metadata standard. These two considerations must be addressed to enable data users to share and intercompare datasets.
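The type of correlation coefficient is not named in the text. Assuming it is Pearson's r, computed over paired observations of mappings per element and percentage of elements mapped (n = 14 presumably corresponding to seven standards across two metadatasets), the calculation can be sketched as follows. The function name and the numbers in the example call are illustrative and are not the values plotted in Figure 5.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustration: average non-unique mappings per element vs. % mapped.
num_per_element = [0.0, 0.4, 1.5, 0.2, 0.0, 0.1, 0.3]
pct_mapped = [5, 29, 91, 15, 2, 2, 8]
print(round(pearson_r(num_per_element, pct_mapped), 3))
```

With a sample this small, a single high-leverage standard can dominate the coefficient, so the per-standard scatter in Figure 5 is worth inspecting alongside r.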

A Way Forward: Towards a Robust Metadata Standard
Examination of existing geospatial metadata standards demonstrates that although they are deficient in meeting the needs of field spectroscopy scientists, they are comprised of modules and parameters that are useful for enabling and enhancing the robustness, discoverability, quality assurance, and interoperability of the field spectroscopy datasets. These include metadata relating to dataset-level information (title, abstract, keywords, contacts, maintenance history, purpose), data quality (logical consistency, completeness, lineage), access rights (copyrights, levels of access for user groups), revision history, literature citations, and physical format data, among others.
Digital provenance information is especially significant for long-term preservation of datasets, and research scientists have demonstrated a preference for long-term storage capabilities (i.e., over five years) over short-term storage (i.e., less than twelve months) and commonly share datasets from 1-3 months to 2-5 years after findings have been published [51,52]. Documenting this metadata has benefit within and outside the field spectroscopy community. It enables logging of the use of the dataset, promotes greater understanding of research inquiries, and provides those responsible for its governance with information for forecasting the use of the dataset, so that they can in turn endorse services to support data access [52].
Figure 6 shows a proposed hybrid metadata standard that fuses the core metadataset, the benthic reflectance metadataset identified as requisite by marine scientists, and additional metadatasets imported from the standards examined in Section 3 that can serve as a "best of breed" standard.
Figure 6. A proposed hybrid metadata standard fusing the core and application-specific field spectroscopy metadatasets with elements from the standards examined.
The new modules that have been imported and customized from existing metadata standards are:
dataset module: broad-scope information that describes the entire dataset and includes title of the dataset, metadata standard name and version, revision history, keywords, purpose, and other general descriptors, for the main purpose of cataloguing and discoverability. Imported from the ANZLIC Metadata Profile (Geographic dataset core) metadata element, ABCD 2.06 metadata module, EML 2.1.1 dataset module.
resource module: information about the creators/owners/distributors of the data, lineage information, and contact information for the data resources. Imported from the ANZLIC Metadata Profile (Geographic dataset core) metadata element, ABCD 2.06 metadata module, Dublin Core 1.1 publisher metadata element, EML 2.1.1 dataset module.
access module: specifies access rights to groups or particular users. Includes information about copyrights, trademarks, licenses, sequestered/classified datasets. Imported from the Dublin Core 1.1 rights metadata element, EML 2.1.1 access module.
project module: information about the research context and purpose, experiment design, funding and sponsorship. Imported from the EML 2.1.1 project module.
applications module: databases/data warehouses/online repositories where the data can be accessed, and software recommended for viewing or analyzing the associated dataset. These can be references to EOSDIS Reverb|ECHO, Carnegie Spectranomics, TERN Data Discovery Portal, DLR Spectral Archive (for data access), ViewSpec Pro, SPECCHIO, MATLAB (for data analysis). Imported from the EML 2.1.1 software module.
data quality module: reports, indices, and assurances on the completeness, quality, and logical consistency of the data. Imported from the FGDC Content Standard for Digital Geospatial Metadata (Remote Sensing Extension). Rasaiah et al. [6] present metadata quality parameters specifically aligned to field spectroscopy.
citations module: relevant literature, publications, reports, journal articles, etc. cited in the metadataset, or specifications about how the dataset itself should be cited externally. Imported from the EML 2.1.1 literature module.
protocol module: documentation of (or references to) the sampling and field protocols used in the collection of the field data, such as those for hyperspectral ground calibration, leaf sampling, underwater benthic sampling. Can also include taxonomies, nomenclatures, and classification systems used in the protocol, such as the AASHTO/FAO/USDA/Canadian/Australian soil classification systems for soil applications. Imported from the EML 2.1.1 literature module.
The protocol module is especially relevant to field spectroscopy. Section 1 demonstrated that in many cases, sampling techniques for a single target are dependent on the purposes for which the data is being collected, and Section 3.3 established the value of flexibility in a standard in capturing the requisite metadata for a given campaign. Including a protocol module in a field spectroscopy metadata standard allows the user to choose the protocol (with associated metadata elements) they want to apply to their metadataset, and in cases where they are creating one ad hoc, the baseline metadataset for the application is available and can be customized to the campaign.
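The modular composition of the proposed hybrid standard can be pictured as a simple data structure. The sketch below is an illustration under stated assumptions rather than a schema specification: the module names and their source standards follow the list above, while `IMPORTED_MODULES`, `modules_from`, and `new_record` are hypothetical names, and the nested dictionary layout is only one possible encoding.

```python
from typing import Dict, List

# Imported modules and the existing standards each one draws on (from the text).
IMPORTED_MODULES: Dict[str, List[str]] = {
    "dataset": ["ANZLIC Metadata Profile (Geographic dataset core)",
                "ABCD 2.06", "EML 2.1.1"],
    "resource": ["ANZLIC Metadata Profile (Geographic dataset core)",
                 "ABCD 2.06", "Dublin Core 1.1", "EML 2.1.1"],
    "access": ["Dublin Core 1.1", "EML 2.1.1"],
    "project": ["EML 2.1.1"],
    "applications": ["EML 2.1.1"],
    "data quality": ["FGDC CSDGM (Remote Sensing Extension)"],
    "citations": ["EML 2.1.1"],
    "protocol": ["EML 2.1.1"],
}


def modules_from(standard: str) -> List[str]:
    """Names of the imported modules that draw on a given existing standard."""
    return sorted(name for name, sources in IMPORTED_MODULES.items()
                  if standard in sources)


def new_record(application: str = "underwater benthic reflectance") -> dict:
    """Empty hybrid record: core + application-specific + imported modules."""
    record = {"core": {}, "application": {"name": application, "fields": {}}}
    record.update({name: {} for name in IMPORTED_MODULES})
    return record


print(modules_from("Dublin Core 1.1"))
print(sorted(new_record()))
```

Keeping the application-specific metadataset as its own block, separate from the core and the imported modules, reflects the intent that a different campaign type could swap in its own baseline metadataset without altering the rest of the record.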

Conclusions
A methodology for informing a robust, interoperable field spectroscopy metadata standard with extended capabilities for underwater benthic reflectance measurement was applied using core principles that include identifying the needs of users who will access and use the data, and examining existing standards in their support of the requirements of users. A metadataset for underwater benthic reflectance was identified based on domain expert input. It includes descriptions and rationale for each metadata element, optionality rankings, and preferred data formats. It was established that some parameters are difficult to obtain in situ, due to conditions and environments that are unique to marine campaigns, and can only be populated retrospectively. Seven metadata standards, selected as being representative of standards within geospatial science and its applications, were examined for their ability to support the core metadataset and the benthic reflectance metadataset. These were: Dublin Core 1.1, Access to Biological Collections Data Schema 2.06, Ecological Metadata Language 2.1.1, Darwin Core, Content Standard for Digital Geospatial Metadata (Remote Sensing Extension), Content Standard for Digital Geospatial Metadata (Shoreline Metadata Profile) and ANZLIC Metadata Profile 1.1 (Geographic dataset core). The results show they consistently fail to accommodate the needs of both field spectroscopy scientists in general and marine scientists in particular. Mappings from each standard to the field spectroscopy metadatasets were, on average, 22% of the core metadataset (σ = 32%), and 19% of the benthic metadataset (σ = 18%). In no instances were the critical metadata elements for the benthic reflectance metadataset captured in their entirety.
Field spectroscopy metadata has a large proportion of protocol and sampling information that is commonly documented in biological data metadata standards (hence the relative success with EML 2.1.1), but these are absent from dataset-level standards such as Dublin Core 1.1 and the ANZLIC Metadata Profile 1.1 (Geographic dataset core). There was a consistent lack of explicit references to critical field metadata such as instrument properties, viewing geometry, and reference standards. The metadata model in the FGDC Content Standard for Digital Geospatial Metadata (Remote Sensing Extension) for satellite and airborne sensors was the most closely aligned with requirements for field spectroradiometers.
Flexibility analysis revealed that the less prescriptive or explicit an existing metadata standard is, the more likely it is to capture a larger amount of information in the field spectroscopy metadatasets. The correlation tests for unique and non-unique mappings show that flexibility has a positive effect on a standard's success in capturing more information. These results have the greatest implications for metadata that documents field or sampling protocols, as these are most likely to be non-standard and dependent upon the purpose for which the data is being collected.
Despite the deficiencies in the existing metadata standards, many have dataset-level modules and parameters (literature citations, quality assessment reports) that may be useful in enhancing a field spectroscopy metadataset's potential for discoverability and re-use. By building upon the knowledge of scientists in ecology, marine science, the physical sciences and data governance experts who helped to develop existing geospatial standards, a hybrid "best of breed" field spectroscopy metadata standard can be created. Elements describing and documenting the dataset, resources, access, applications, data quality, citations, and protocols can enrich a field spectroscopy standard and make it adaptable to multiple data infrastructures.
Subsequent research can address the technical implementation of the results of this paper, including the metadata schema specifications and encoding formats in support of data exchange.The work presented here is an important step forward for a field spectroscopy metadata standard that addresses the specific needs of field spectroscopy data stakeholders with sufficient robustness to facilitate documentation, quality assurance, discoverability and data exchange within large-scale data sharing platforms.

Figure 1. Underwater field spectroscopy. Documenting metadata underwater (upper left), photo of target feature with finger pointing to the exact target area (lower left), JAZ ocean optic spectrometer in off-the-shelf housing (lower right), ASD field spec 2 built into customized underwater housing (upper right). Source: [19].

Figure 2. Profile of an application-specific field spectroscopy metadataset.

Figure 3. A conceptual example of a successful mapping from two existing geospatial metadata standards to the proposed field spectroscopy metadata standard.

Figure 4. Successful mappings from existing metadata standards to the field spectroscopy metadatasets as a percentage of the total number of elements mapped in the core metadataset and the benthic reflectance metadataset.

Figure 5. Correlation of mappings per element (both unique and non-unique) to the percentage of total elements mapped in the dataset.

Table 1. Comparison of a sample of field spectroscopy good practice guides.

Table 2. Metadata standards selected for analysis.

Table 4. Critical metadata for in situ benthic reflectance measurements.

Table 5. Optional metadata for in situ benthic reflectance measurements.

Table A3. Mappings from Darwin Core to the Core Metadataset.

Table A4. Mappings from Dublin Core 1.1 to the Core Metadataset.

Table A5. Mappings from Ecological Metadata Language 2.1.1 to the Core Metadataset.

Table A8. Mappings from Access to Biological Collections Data Schema 2.06 to the underwater benthic reflectance metadataset.

Table A9. Mappings from ANZLIC Metadata Profile 1.1 (Geographic dataset core) to the underwater benthic reflectance metadataset.

Table A10. Mappings from Darwin Core to the underwater benthic reflectance metadataset.

Table A11. Mappings from Ecological Metadata Language 2.1.1 to the underwater benthic reflectance metadataset.

Table A12. Mappings from FGDC Content Standard for Digital Geospatial Metadata (Shoreline Metadata Profile) to the underwater benthic reflectance metadataset.