


Introduction
Ocean modelers typically require many different types of input data for forcing, assimilation and boundary conditions, and routinely produce gigabytes or more of output data. Depending on which model is used, the horizontal coordinate of the output data may be on a regular, curvilinear, or unstructured (e.g., triangular) grid, while the vertical coordinate may be on a uniform or stretched grid with a number of different possibilities (e.g., sigma, sigma-over-z, s-coordinate, isopycnal). Ocean modelers therefore often spend large amounts of time on mundane data manipulation tasks such as searching for and reformatting data from external sources, writing custom readers for specific models so that results from different models can be compared and assessed, and responding to custom data requests from consumers of their model products. Better tools reduce the time spent on these tasks, thereby increasing the time available for modeling and analysis. The U.S. Integrated Ocean Observing System (U.S. IOOS®) has been working on better tools to support not only its member organizations, but the entire ocean science community. U.S. IOOS (hereafter referred to simply as IOOS) is a collaboration between Federal, State, Local, Academic and Commercial partners to manage ocean observing and modeling systems to meet the unique needs of each region around the US [1][2][3]. Federal partners provide the "National Backbone", and 11 IOOS Regional Associations (RAs) build upon the backbone with local assets to create observational and modeling systems designed to be more than the sum of the parts, capable of responding to the societal needs of each individual region (e.g., harmful algal blooms, eutrophication, search and rescue, oil spills, navigation, mariculture) (Figure 1).
In 2008, IOOS held a community modeling workshop attended by 57 members spanning federal, research and private sectors, including modelers and stakeholders, and the workshop produced a report with nine specific recommendations to advance the state of ocean modeling in the US [4]. One of the recommendations was to "develop an implementation plan for a distributed, one-stop shopping national data portal and archive system for ocean prediction input and output data". The US Geological Survey (USGS) had been working on model data interoperability for their collaborative projects on sediment transport modeling [5][6][7] and in 2009 agreed to send one of their modelers to the U.S. IOOS Program Office, within the National Oceanic and Atmospheric Administration (NOAA), for a one-year detail to lead the effort.
The one-year project to develop model data interoperability for IOOS was remarkably successful. Leveraging technologies developed for the atmospheric community, a model data delivery and access system was implemented in all 11 IOOS RAs and at many of the National Backbone modeling centers [8]. The approach used mostly technologies that had grown from the community and emerged as community practices [9,10]. The system design allowed modelers to serve their data in a standardized manner via IOOS-approved web services without modifying their original data files or their models. Users were then able to access these standardized data streams using a variety of tools, from simple map-based browsing, to more sophisticated 3D visualization, to full scientific exploration on their desktop computers. With this success, future work to build on this infrastructure was recommended, including improved techniques for searching datasets, better support for unstructured grids and observational data, server-side subsetting for unstructured grids, more tools for common analysis tasks, and tools for scientific analysis and visualization environments in addition to Matlab.
In 2010, IOOS funded a Coastal and Ocean Modeling Testbed (COMT), with the goal of accelerating improvement in ocean forecasting through targeted model assessment and comparison projects. The initial COMT focused on Estuarine Hypoxia, Shelf Hypoxia and Inundation [11,12], and prioritized assessment of model data output from both curvilinear orthogonal grid models (SLOSH, ROMS, NCOM and HYCOM) and unstructured triangular grid models (ADCIRC, FVCOM and SELFE) (Figure 2). The COMT Cyberinfrastructure team was charged with developing and implementing technologies to meet these needs.
Here we report on significant improvements of the IOOS infrastructure relevant to ocean modelers or users of ocean model products since the system described in [8]. Many of these were developed in the COMT and other IOOS activities, while other components were developed external to IOOS in the international geoscience community. These include new standards for unstructured grid model output and for observational data (e.g., time series, profiles, trajectories), new services and access tools for consuming these standardized data, more analysis tools for Matlab users, and new tools for Python users. These tools and techniques are not specific to IOOS, and should be of interest to anyone interested in more efficient distribution of or access to ocean modeling and observational data. Once the data has been standardized to Common Data Model feature types (by the use of CF-1.6 and UGRID-0.9 conventions), it can be distributed uniformly by appropriate services and consumed by standards-based clients, providing data interoperability for the user.

IOOS COMT Model Data Interoperability Design
The IOOS COMT model data interoperability design used the same basic core strategy described in [8]: convert collections of non-standard data files to a common data model using a lightweight Extensible Markup Language (XML) layer, which then allows datasets to be distributed uniformly via standard services and consumed by standards-based clients (applications) (Figure 3). At the heart of the system is the Unidata Thematic Realtime Environmental Distributed Data Services (THREDDS) Data Server, which is built on the Unidata NetCDF-Java library. The NetCDF-Java library is capable of reading NetCDF, HDF5, GRIB and GRIB2 data files into a common data model, which allows a uniform representation of the data regardless of input format. In addition, it can read NetCDF Markup Language (NcML) files, simple XML files that allow the provider to define aggregations of binary files as well as provide or modify metadata. Thus collections of binary files that follow non-standard conventions can be turned into aggregated, standardized datasets without modification of the original files. This is a powerful feature that places minimal burden on the providers. They can continue to use their existing data files with their existing software while exploring the benefits of standardized tools. The new features of this system since [8] are described individually below.

The ncSOS Service for Observational Data
The CF Conventions and the Unidata Common Data Model were originally developed only for 2D, 3D, and 4D gridded data (featureType: Grid) with two spatial dimensions (e.g., longitude, latitude), and time and/or depth dimensions. The success of this approach motivated its extension to observational data. In version 1.6 of the CF Conventions, metadata were defined to support observational data such as tide gauges, CTDs, ADCPs and ocean gliders (featureTypes: TimeSeries, Profile, TimeSeriesProfile, Trajectory). The NetCDF-Java library was updated to support these featureTypes, allowing for customized methods appropriate for these data types.
The OGC Sensor Observation Service (SOS) is an IOOS-approved web service for delivering observational data, supporting GetCapabilities, DescribeSensor, and other service requests that allow for a rich exchange between the server and client. Typically SOS services connect to databases that store the observational data (e.g., NOAA-COOPS, NDBC, 52 North), but with the new CF-1.6 specifications allowing standardized collections of observed data in NetCDF files to be ingested into the Common Data Model, it was realized that an SOS service could also be developed relatively easily for the THREDDS Data Server. Under funding from IOOS COMT, RPS ASA developed ncSOS and released it as open source, and its development has continued with funding from USGS and IOOS [13]. Written in Java, it is a simple plug-in for the THREDDS Data Server, installing in minutes with no configuration necessary. This allows access to observational data by any broker or client that can formulate SOS requests (as XML or as simple Representational State Transfer (REST) text) and process the responses (currently XML, JSON or CSV).
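As an illustration of the simple text-style requests mentioned above, the following minimal sketch builds key-value-pair SOS request URLs in Python. The endpoint URL, offering name and observed property are hypothetical placeholders, and the parameter set assumes SOS version 1.0.0; an actual ncSOS deployment should be checked through its GetCapabilities response before relying on any of these values.

```python
from urllib.parse import urlencode


def sos_url(endpoint, request, **params):
    """Build an SOS key-value-pair request URL."""
    query = {"service": "SOS", "request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Hypothetical ncSOS endpoint on a THREDDS server (not a real URL).
ENDPOINT = "https://example.org/thredds/sos/obs/station1.nc"

caps_url = sos_url(ENDPOINT, "GetCapabilities", version="1.0.0")

# Offering, property and format values below are illustrative assumptions.
obs_url = sos_url(
    ENDPOINT,
    "GetObservation",
    version="1.0.0",
    offering="station1",
    observedProperty="sea_water_temperature",
    responseFormat="text/csv",
)

print(caps_url)
print(obs_url)
```

A client would fetch `caps_url` first, read the advertised offerings and observed properties from the response, and only then construct a GetObservation request.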

Unstructured Grid (UGRID) Standards and Tools
To represent the data output from unstructured grid models in a common way, metadata conventions need to be adopted. The CF Conventions have proven very popular and effective for structured (e.g., rectilinear, curvilinear) grid model output, but had no way of specifying the grid topology (connectivity) necessary for unstructured grids, or concepts such as the location of data on the grid elements (e.g., located on faces, edges or nodes). Shortly after CF version 1.0 was released in 2008, a UGRID Interoperability Google Group was formed with representatives from organizations such as Deltares, NOAA, USGS and DOE, and from the FVCOM, ADCIRC and SELFE modeling communities [14]. After several years of discussion, development and testing, unstructured grid metadata conventions were finally released in 2013 as UGRID 0.9 [15]. The conventions were developed to allow specification of data variables on fixed horizontal unstructured grids. Higher-order element representation of data variables and handling of data from moving or changing meshes were left as future enhancements, with the realization that these enhancements might necessitate a different underlying data model while leaving the functionality for users intact.
With the new UGRID 0.9 conventions for unstructured grid data, it was possible to create a new class for NetCDF-Java to support the UGRID featureType. This Java code was also developed by RPS ASA and released as an open source plugin for NetCDF-Java and/or THREDDS Data Server [16]. This allows for unambiguous retrieval of properties such as connectivity arrays or data location on the elements (e.g., face, node), which allows interoperable clients to be developed to support any UGRID-compliant data.
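To make the idea of convention-driven retrieval concrete, the sketch below mimics a UGRID-style dataset with plain Python dictionaries. It is not the NetCDF-Java plugin; the variable names (`mesh`, `nv`, `zeta`) and the two-triangle mesh are invented for illustration, and the attribute names reflect our reading of the UGRID conventions. The point is that a client finds the connectivity array and the data location by following attributes rather than by knowing which model wrote the file.

```python
# A toy dataset: two triangles sharing an edge, with node-located water level.
dataset = {
    "mesh": {"attrs": {
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "lon lat",
        "face_node_connectivity": "nv",
    }},
    "lon": {"attrs": {}, "data": [0.0, 1.0, 1.0, 0.0]},
    "lat": {"attrs": {}, "data": [0.0, 0.0, 1.0, 1.0]},
    "nv": {"attrs": {"start_index": 0}, "data": [[0, 1, 2], [0, 2, 3]]},
    "zeta": {"attrs": {"mesh": "mesh", "location": "node"},
             "data": [0.1, 0.2, 0.3, 0.4]},
}


def topology_for(ds, var_name):
    """Follow the variable's attributes to its mesh, connectivity and coordinates."""
    attrs = ds[var_name]["attrs"]
    mesh_attrs = ds[attrs["mesh"]]["attrs"]
    conn = ds[mesh_attrs["face_node_connectivity"]]
    start = conn["attrs"].get("start_index", 0)
    # Normalize to zero-based node indices regardless of the file's start_index.
    faces = [[n - start for n in face] for face in conn["data"]]
    x_name, y_name = mesh_attrs["node_coordinates"].split()
    return attrs["location"], faces, ds[x_name]["data"], ds[y_name]["data"]


def face_means(ds, var_name):
    """Average node-located values onto faces using only convention metadata."""
    location, faces, _, _ = topology_for(ds, var_name)
    if location != "node":
        raise ValueError("this sketch only handles node-located data")
    values = ds[var_name]["data"]
    return [sum(values[n] for n in face) / len(face) for face in faces]


means = face_means(dataset, "zeta")
print(means)
```

The same two functions would work unchanged on output from any model whose files carry these attributes, which is the interoperability benefit the conventions are meant to provide.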
The NCTOOLBOX was developed to leverage the NetCDF-Java library Common Data Model for Matlab users [17]. An evolution of the njTBX Toolbox for Matlab described in [8], it supports a wide range of operations on CF-compliant gridded data (rectilinear or curvilinear). With the new UGRID 0.9 conventions for unstructured grid data, and support in NetCDF-Java, it was possible for RPS ASA to create new tools for NCTOOLBOX to support UGRID-compliant datasets as well. As an example, water levels from three different models used in COMT (ADCIRC, SELFE and FVCOM) can be accessed and displayed without using model-specific code (Figure 4). The Matlab code to recreate this figure is the script demos/contrib/test_ugrid3.m from the UGRID version of NCTOOLBOX, available at [18].
Blanton et al. [19] leveraged the capabilities of UGRID standards and NCTOOLBOX to build a powerful GUI-based tool (ADCIRCVIZ) for accessing and visualizing storm forecasts run on unstructured grid models from multiple remote locations. While geared toward forecasts computed with ADCIRC, any model that conforms to the UGRID standard can be visualized in this application.
The THREDDS Data Server [20] currently includes a built-in Web Map Service (WMS) provided by ncWMS, developed by the University of Reading [21]. Although this service works exceptionally well for rectilinear data, the performance is poor for curvilinear grids and there is no support for unstructured grids. To rectify this situation, RPS ASA built a new Python-based WMS service called SciWMS [22] that uses standard Python plotting via the Matplotlib Basemap library to generate maps. This turns out to be several times faster than the approach ncWMS uses, at least for the current generation of models, and works for unstructured grids. Because it is written in Python, it cannot be bundled with THREDDS like ncWMS. It must be installed and configured separately, but the procedure is well documented, along with instructions on how to customize a THREDDS server configuration to point to the SciWMS mapping services instead of the usual THREDDS-supplied WMS services. With this configuration in place, the SciWMS services become associated with the ISO metadata, which makes the SciWMS services, rather than the default ncWMS services, discoverable via the catalog services. Thus tools can be developed that allow searching for relevant datasets via the ISO metadata, followed by quick display of model results via the SciWMS services.

Expanded Analysis Functions and Demos in NCTOOLBOX for Matlab Users
In addition to providing support for unstructured grids, more tools and demos have been added to the NCTOOLBOX for Matlab, significantly increasing the functionality over the preceding njTBX toolbox described in [8]. As an example, the nc_genslice.m function takes a CF-compliant model dataset URL and an [x,y,z,t] trajectory as input, and returns an interpolated track from the selected model along that path. Instead of downloading model data for the entire bounding box and temporal extent of the glider path, the data is extracted in small chunks following the glider path, and the end result is typically only a few hundred KB of data. This provides an easy way to compare different models to ocean gliders, and was recently used with several IOOS forecast models and data collected during GliderPalooza, a collaborative glider campaign run on the US East Coast during Sep-Nov 2013 (Figure 5). Because users of NCTOOLBOX have the data, not just graphics, quantitative model assessment can be performed in addition to visual comparison. Wilkin and Hunter [23] leveraged the power of these new routines to objectively assess seven different forecast models in the Mid-Atlantic Bight, using IOOS community glider data collected over an 18-month period.
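The saving from chunked extraction can be illustrated with a small Python sketch. This is not the nc_genslice.m implementation; it only counts how many horizontal grid cells would be requested when a track is processed in chunks versus as a single bounding box. The grid, the track and the chunk size are synthetic assumptions, and depth and time are omitted for brevity.

```python
from bisect import bisect_left, bisect_right


def index_range(axis, lo, hi):
    """Half-open index range of a sorted axis whose values bracket [lo, hi]."""
    start = max(bisect_right(axis, lo) - 1, 0)
    stop = min(bisect_left(axis, hi) + 1, len(axis))
    return start, stop


def cells_requested(lon_axis, lat_axis, track, chunk_size):
    """Count grid cells fetched when the track is processed chunk by chunk."""
    total = 0
    for k in range(0, len(track), chunk_size):
        chunk = track[k:k + chunk_size]
        lons = [p[0] for p in chunk]
        lats = [p[1] for p in chunk]
        i0, i1 = index_range(lon_axis, min(lons), max(lons))
        j0, j1 = index_range(lat_axis, min(lats), max(lats))
        total += (i1 - i0) * (j1 - j0)
    return total


# Synthetic 100 x 100 grid and a diagonal 80-point track (illustrative only).
lon_axis = [float(i) for i in range(100)]
lat_axis = [float(j) for j in range(100)]
track = [(10.5 + i, 10.5 + i) for i in range(80)]

whole_box = cells_requested(lon_axis, lat_axis, track, chunk_size=len(track))
in_chunks = cells_requested(lon_axis, lat_axis, track, chunk_size=10)
print(whole_box, in_chunks)
```

For a track that cuts diagonally across the model domain, the chunked request touches a small fraction of the cells in the full bounding box, and the saving grows further once the time dimension is included.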

An Improved Procedure for Modelers to Create Standardized Datasets
In the COMT, many of the groups wanted to upload their data to a central server, requiring a procedure to catalog the datasets being uploaded. As is typical in the larger ocean community, the modeling groups generated output files with differing metadata and conventions. All were NetCDF, but while some were nearly CF- or UGRID-compliant, others contained only minimal metadata. For a single simulation, some modeling groups produced a single NetCDF file for all variables and time steps, while others produced collections of NetCDF files, with individual files for each variable and a fixed number of time steps. To handle this situation, an approach was developed that used template NcML files and a Google Drive spreadsheet to automatically generate the THREDDS catalog.
Despite the non-uniformity of output files, NcML made it possible to virtually aggregate and standardize the datasets. For each modeling group, a template NcML file was provided that would turn their output files into a single, CF-compliant or UGRID-compliant dataset. For example, the template provided to the SELFE group aggregated each variable along the time dimension, and then aggregated all the variables together, while also aggregating a grid file that contained the lon/lat locations of the mesh, allowing the 49 different files constituting a single simulation to be accessed through a single UGRID-compliant URL. The modeling groups could use these templates without needing to understand the details of the CF or UGRID conventions, and the templates needed little or no modification to be used for each simulation performed by a particular modeling group.
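The nested aggregation described for the SELFE template can be sketched by generating an NcML document with the Python standard library. This is an assumed layout, not the template actually distributed in the COMT: the grid file name, the variable names and the `_<variable>.nc` file suffix are hypothetical, and a real template would also add or correct CF and UGRID attributes.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml(grid_file, variables, data_dir="."):
    """Union of a grid file and one time-joined aggregation per variable."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    union = ET.SubElement(root, "aggregation", {"type": "union"})
    # The grid file supplies the mesh node locations (lon/lat).
    ET.SubElement(union, "netcdf", {"location": grid_file})
    for name in variables:
        member = ET.SubElement(union, "netcdf")
        join = ET.SubElement(
            member, "aggregation", {"dimName": "time", "type": "joinExisting"}
        )
        # Hypothetical naming scheme: every file for a variable ends in _<name>.nc
        ET.SubElement(join, "scan", {"location": data_dir, "suffix": f"_{name}.nc"})
    return root


ncml = build_ncml("grid.nc", ["elev", "salt", "temp"])
ncml_text = ET.tostring(ncml, encoding="unicode")
print(ncml_text)
```

Each inner `joinExisting` aggregation concatenates one variable's files along time, and the outer `union` presents the grid file and all variables as one virtual dataset, so the original files are never modified.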
A spreadsheet on Google Drive was used by modelers to specify the location of their NcML template as well as additional custom descriptors for the model run. After completing a new simulation, they would create a directory on the testbed server and upload their output files and template NcML (preferably using GridFTP via Globus [24]). They then added a row to the shared Google Drive spreadsheet that specified a title for the run, the location of the NcML template, and a short summary statement describing the model run. Every hour a Python script running on the testbed server read the spreadsheet using the Google API and combined the metadata from the numerous NcML files and additional metadata from the Google spreadsheet into a single THREDDS catalog of CF- and UGRID-compliant datasets.
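A heavily simplified sketch of such a catalog-building step is shown below. It is not the script used on the testbed server: the Google API call is replaced by an inline CSV string, the column names, run titles and paths are hypothetical, and the NcML content is only referenced by path rather than merged into each dataset entry.

```python
import csv
import io
import xml.etree.ElementTree as ET

CATALOG_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Stand-in for the shared spreadsheet (hypothetical columns and rows).
SHEET_CSV = """title,ncml_path,summary
SELFE run 1,selfe/run1/template.ncml,Hypothetical inundation hindcast
FVCOM run 7,fvcom/run7/template.ncml,Hypothetical estuarine hypoxia run
"""


def build_catalog(rows):
    """Create one THREDDS dataset entry per spreadsheet row."""
    catalog = ET.Element("catalog", {"xmlns": CATALOG_NS, "name": "Testbed catalog"})
    ET.SubElement(
        catalog,
        "service",
        {"name": "dap", "serviceType": "OPENDAP", "base": "/thredds/dodsC/"},
    )
    for n, row in enumerate(rows):
        ds = ET.SubElement(
            catalog,
            "dataset",
            {"name": row["title"], "ID": f"run{n}", "urlPath": row["ncml_path"]},
        )
        ET.SubElement(ds, "serviceName").text = "dap"
        doc = ET.SubElement(ds, "documentation", {"type": "summary"})
        doc.text = row["summary"]
    return catalog


rows = list(csv.DictReader(io.StringIO(SHEET_CSV)))
catalog_text = ET.tostring(build_catalog(rows), encoding="unicode")
print(catalog_text)
```

Keeping the per-run descriptors in a spreadsheet means modelers never edit catalog XML by hand; regenerating the catalog on a schedule picks up new rows automatically.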

Enabling Discovery via Standardized Metadata and Catalog Services
Enabling standardized datasets is a great step forward for interoperability, but it can still be difficult for users to find these standardized datasets. In [8] and in other projects (e.g., the NOAA Unified Access Framework project [25]) the approach was to build a single catalog that points to other catalogs, basically creating a large tree of datasets organized in a particular way (e.g., by IOOS region or NOAA Line Office). Thus a user had to navigate this tree to search for datasets that might be of interest. Instead, most users would rather search on space, time, and variable to dynamically find datasets that are of interest. Thanks to advances in metadata standardization and cataloging services, this is now relatively easy to enable.
With IOOS funding, NOAA NGDC developed a Java plug-in for THREDDS called NcISO that provides an ISO metadata service, converting the attributes and other metadata into ISO 19115-2 XML. Written in Java, it can also be used as a stand-alone application which scans a remote THREDDS catalog and generates ISO metadata for each dataset. This metadata, in turn, can be harvested by catalog services such as Geonetwork, GI-CAT, Geoportal Server, CKAN and PYCSW. The COMT datasets were harvested from the testbed server THREDDS catalog by the NGDC Geoportal Server that drives the IOOS Catalog. The COMT datasets are therefore discoverable by users internal and external to the COMT project using a standardized approach (Figure 6).

Figure 6. Results from a query for 3D FVCOM or ADCIRC datasets found within a specified bounding box (the extent of the map window). The user has selected one of the datasets returned, which displays the boundary of the dataset on the map (yellow rectangle), a summary (yellow-highlighted text), and dataset links, including "Open" to access the dataset using OPeNDAP, and "Metadata" to provide the full metadata document.
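The kind of space, time and keyword query that a catalog service answers over harvested metadata can be sketched in a few lines of Python. This is a conceptual stand-in, not the Geoportal Server or any catalog service API: the records, titles, extents and dates are invented, and bounding boxes that cross the dateline are ignored.

```python
from datetime import datetime


def bbox_intersects(a, b):
    """Boxes are (west, south, east, north) in degrees."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox, start, end, keywords):
    """Return titles of records matching the spatial, temporal and keyword filters."""
    hits = []
    for rec in records:
        in_space = bbox_intersects(rec["bbox"], bbox)
        in_time = rec["start"] <= end and rec["end"] >= start
        has_word = any(k.lower() in rec["title"].lower() for k in keywords)
        if in_space and in_time and has_word:
            hits.append(rec["title"])
    return hits


# Hypothetical harvested metadata records.
records = [
    {"title": "FVCOM 3D hindcast (hypothetical)",
     "bbox": (-98.0, 18.0, -80.0, 31.0),
     "start": datetime(2008, 1, 1), "end": datetime(2008, 12, 31)},
    {"title": "ADCIRC inundation run (hypothetical)",
     "bbox": (-77.5, 36.5, -75.5, 39.7),
     "start": datetime(2003, 9, 1), "end": datetime(2003, 9, 30)},
    {"title": "ROMS shelf run (hypothetical)",
     "bbox": (-98.0, 18.0, -80.0, 31.0),
     "start": datetime(2008, 1, 1), "end": datetime(2008, 12, 31)},
]

hits = search(
    records,
    bbox=(-95.0, 25.0, -85.0, 30.0),
    start=datetime(2008, 6, 1),
    end=datetime(2008, 6, 30),
    keywords=("FVCOM", "ADCIRC"),
)
print(hits)
```

Because the ISO records carry extents and time ranges in standard fields, a real catalog service can apply the same three filters to every harvested dataset without dataset-specific logic.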

CF Compliant Tools for Python
Matlab is one of the most popular analysis and visualization environments in the oceanographic community, so it made sense to focus initial effort on standards-based Matlab tools. To improve efficiency for as many users as possible, however, standards-based tools need to be developed for all commonly used environments so that users can continue to use their favorite environment yet benefit from standards-enabled data.
One leading environment with similar capabilities to Matlab is Python. Python has the advantage of being open-source and free, so that tools and scripts developed for Python may be freely shared with scientists and other users without the requirement that they first buy a license. With hundreds of toolboxes giving capabilities like advanced time series analysis, image processing, mapping and publication-quality graphics, Python is becoming increasingly popular in the meteorological and oceanographic research community.
Unlike Matlab, however, Python cannot directly utilize the Unidata NetCDF-Java library to take advantage of standards-based functionality. Although Python can easily take advantage of C and Fortran modules, and Unidata began working several years ago on a C library to support CF conventions ("LibCF"), progress has been slow, and LibCF does not yet have the capability to perform fundamental tasks such as returning the geospatial coordinates from CF-compliant ocean models.

Figure 7. Access and display of CF-compliant WaveWatch III data using the Python Iris package from the British Met Office. This demonstration was done using the IPython Notebook, which allows code, output and rich text to be combined in a web document that can be easily shared with others.
To fill this void, the British Met Office has created Iris, a CF-compliant package for Python [26]. The primary goal is to serve their own users, but because it is open and standards-based, Iris can support a much wider community. With Iris, as in NCTOOLBOX, users can access and work with output from different models without any model-specific code: any CF-compliant structured grid model can be easily opened, accessed and displayed in Iris (Figure 7).
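A minimal sketch of this style of access is given below. It is not the notebook behind Figure 7: the dataset URL in the usage note is a hypothetical placeholder, the code assumes that Iris and Matplotlib are installed, and it further assumes that time is the leading dimension of the loaded variable.

```python
def plot_latest_field(url, standard_name):
    """Load one CF variable by name and plot its last time step.

    Iris is imported inside the function so that this module can be
    imported even where Iris is not installed.
    """
    import iris
    import iris.quickplot as qplt
    import matplotlib.pyplot as plt

    # Select the variable by its CF standard name, not a model-specific name.
    cube = iris.load_cube(url, standard_name)
    latest = cube[-1]  # assumes time is the first dimension
    qplt.pcolormesh(latest)
    plt.show()
    return latest
```

A call such as `plot_latest_field("https://example.org/thredds/dodsC/ww3/latest.nc", "sea_surface_wave_significant_height")` would then work for any CF-compliant structured grid dataset that provides that standard name, with the URL here being purely illustrative.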
With several full-time developers, government backing, a clear roadmap, and an agile and open development approach, Iris is a strong contender to be the dominant meteorological and oceanographic package for standards-based data access. Although Iris currently only supports structured grids, support for UGRID-compliant unstructured grid data and CF-1.6-compliant observational data is on the development roadmap.

Conclusions
Significant progress has been made in the international geoscience community to develop standards, services and tools that make data search, access, analysis and visualization easier and more efficient. In the ocean modeling community, techniques originally developed for atmospheric forecast and climate models have been adapted and extended to serve the ocean community. Leveraging Unidata technologies such as NetCDF, NcML and the THREDDS Data Server, coupled with international standards development work on the CF Conventions, UGRID Conventions and the OGC Services, a system has been developed that places relatively little burden on data providers or data users.
There is still work to be done hardening and expanding the system. More providers need to be aware of existing tools that will allow them to easily serve standardized, aggregated data. WMS services for unstructured grids are functional, but need to be optimized for performance. Standards-based tools for Python need to be brought up to the same functionality as the tools for Matlab. Packages for other commonly used scientific analysis and visualization environments such as R still need to be developed.
While additional work needs to be done, the advances described here bring us closer to a future where users discover data by keyword and geospatial queries on distributed holdings, access data via standard data services, and analyze and visualize data with common, standards-based software. The basic infrastructure depends on a common data model for each data type, a system that was first demonstrated on structured gridded data, and has been expanded to work with both unstructured grid data and specific observational data types. Although this approach has been developed for atmospheric, climate and oceanographic use, it could be used for hydrology, geology or other geoscience communities that use these data types. While applied here to IOOS, it is also being applied to support other applications [19,27]. With demonstrated success for IOOS, and with support from the international geoscience community, the future looks promising for this distributed, standards-based approach.

Figure 3. Schematic of the IOOS Coastal and Ocean Modeling Testbed (COMT) model data interoperability design. Non-standard model output and data files are converted into standardized and aggregated virtual datasets using the NetCDF Markup Language (NcML), a lightweight XML layer. A custom NcML template is developed for each type of model output (e.g., collections of SELFE files). Once the data has been standardized to Common Data Model feature types (by the use of CF-1.6 and UGRID-0.9 conventions), it can be distributed uniformly by appropriate services and consumed by standards-based clients, providing data interoperability for the user.

Figure 4. Water levels from three different unstructured grid models (ADCIRC, SELFE and FVCOM) displayed by the NCTOOLBOX script demos/contrib/test_ugrid3.m. The script takes advantage of the UGRID conventions to access and display data from different unstructured grid models without any model-specific code. Any UGRID-compliant model could be displayed.

Figure 5. Comparison of ocean glider data (top panel) with forecast data from three different forecast models: the SECOORA SABGOM ROMS model from NCSTATE, the NAVY USEAST NCOM model, and the NOAA Global RTOFS HYCOM model. Tools from NCTOOLBOX were used that can extract vertical sections along time and space paths from any Climate and Forecast (CF)-compliant structured grid ocean model. The scripts that produce these plots may be found in the toolbox demos/contrib directory.