Open Access Article

Advanced Cyberinfrastructure to Enable Search of Big Climate Datasets in THREDDS

Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA 22030, USA
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2019, 8(11), 494; https://doi.org/10.3390/ijgi8110494
Revised: 25 October 2019 / Accepted: 31 October 2019 / Published: 2 November 2019
(This article belongs to the Special Issue Big Data Computing for Geospatial Applications)
Understanding the past, present, and changing behavior of the climate requires close collaboration among a large number of researchers from many scientific domains. At present, this interdisciplinary collaboration is severely limited by the difficulty of discovering, sharing, and integrating climatic data as data volumes grow rapidly. This paper discusses methods and techniques for solving the interrelated problems encountered when transmitting, processing, and serving metadata for heterogeneous Earth System Observation and Modeling (ESOM) data. A cyberinfrastructure-based solution is proposed to enable effective cataloging and two-step search of big climatic datasets by leveraging state-of-the-art web service technologies and crawling existing data centers. To validate its feasibility, the dataset served by the UCAR THREDDS Data Server (TDS), which provides petabyte-level ESOM data and updates hundreds of terabytes of data every day, is used as the case study. A complete workflow is designed to analyze the metadata structure in TDS and create an index for data parameters. A simplified registration model is constructed that defines constant information, delimits secondary information, and exploits spatial and temporal coherence in metadata. From this model, a sampling strategy is derived for a high-performance concurrent web crawler bot, which mirrors the essential metadata of the big data archive without overwhelming network and computing resources. Together, the metadata model, crawler, and standards-compliant catalog service form an incremental search cyberinfrastructure that allows scientists to search big climatic datasets in near real time. The proposed approach has been tested on UCAR TDS, and the results show that it achieves its design goal by boosting the crawling speed by at least 10 times and reducing the redundant metadata from 1.85 gigabytes to 2.2 megabytes, a significant breakthrough toward making most currently non-searchable climate data servers searchable.
Keywords: climate science; metadata; web cataloging service; big geospatial data; geospatial cyberinfrastructure
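
The idea of exploiting temporal coherence to shrink redundant metadata can be illustrated with a small sketch. It is not the registration model described in the article. It only assumes that many granules in an archive share one name template and differ in their timestamp, so that a single record (template, time range, count, and one sample granule for a crawler to inspect) can stand in for all of them. The file naming pattern, the function name `collapse_granules`, and the example file names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed (hypothetical) naming convention: <product>_<YYYYMMDD>_<HHMM>.<ext>
TIMESTAMP = re.compile(r"(\d{8})_(\d{4})")


def collapse_granules(names):
    """Group granule names that differ only in their timestamp.

    Returns one record per group holding the shared name template, the
    covered time range, the granule count, and one sample granule that a
    crawler could fetch to read the detailed metadata only once.
    """
    groups = defaultdict(list)
    for name in names:
        match = TIMESTAMP.search(name)
        if match is None:
            # No recognizable timestamp: the name forms its own group.
            groups[name].append((None, name))
            continue
        stamp = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")
        template = name[:match.start()] + "{time}" + name[match.end():]
        groups[template].append((stamp, name))

    records = []
    for template, members in groups.items():
        stamps = sorted(s for s, _ in members if s is not None)
        records.append({
            "template": template,
            "start": stamps[0] if stamps else None,
            "end": stamps[-1] if stamps else None,
            "count": len(members),
            "sample": members[0][1],
        })
    return records


# Hypothetical file names, used only to exercise the function.
names = [
    "GFS_Global_20191024_0000.grib2",
    "GFS_Global_20191024_0600.grib2",
    "GFS_Global_20191024_1200.grib2",
    "NAM_CONUS_20191024_0000.grib2",
]
for record in collapse_granules(names):
    print(record["template"], record["count"])
```

In this sketch, four granule names collapse into two records, and a crawler would need to read detailed metadata for only one sample per record instead of every granule.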
MDPI and ACS Style

Gaigalas, J.; Di, L.; Sun, Z. Advanced Cyberinfrastructure to Enable Search of Big Climate Datasets in THREDDS. ISPRS Int. J. Geo-Inf. 2019, 8, 494.
