A High-Resolution Global Gridded Historical Dataset of Climate Extreme Indices

Climate extreme indices (CEIs) are important metrics that not only assist in the analysis of regional and global extremes in meteorological events, but also aid climate modellers and policymakers in the assessment of sectoral impacts. Global high-spatial-resolution CEI datasets derived from quality-controlled historical observations or reanalysis data products are scarce. This study introduces a new high-resolution global gridded dataset of CEIs based on sub-daily temperature and precipitation data from the Global Land Data Assimilation System (GLDAS). The dataset, called “CEI_0p25_1970_2016”, includes 71 annual (and in some cases monthly) CEIs at 0.25° × 0.25° gridded resolution, covering 47 years over the period 1970–2016. The data of individual indices are publicly available for download in the commonly used Network Common Data Form 4 (NetCDF4) format. Potential applications of CEI_0p25_1970_2016 presented here include the assessment of sectoral impacts (e.g., Agriculture, Health, Energy, and Hydrology), as well as the identification of hot spots (clusters) showing similar historical spatial patterns of high/low temperature and precipitation extremes. CEI_0p25_1970_2016 fills gaps in existing CEI datasets not only by encompassing more indices, but also by being the only comprehensive global gridded CEI data available at high spatial resolution.
Dataset: https://doi.org/10.1594/PANGAEA.898014
Dataset License: CC-BY: Creative Commons Attribution 4.0 International


Introduction
Extremes in climate such as floods, droughts, cold waves, and heat waves can have significant societal, ecological, and economic impacts globally [1]. Since the publication of the third assessment report of the Intergovernmental Panel on Climate Change (IPCC) in 2000, characterizing extremes under past and projected future climate has attracted rapidly growing interest [2]. The climate modelling community, for instance, has devoted increasing effort to capturing high-frequency extreme events in its simulations of historical and future projected climate. The underlying aim of both regional and global climate modelling exercises (e.g., CORDEX and PRIMAVERA) 1 has been to develop a better understanding of the evolution of extreme weather events under long-term climate change and variability. The impetus to better understand extreme weather events is further driven by the impact modellers who assess sectoral damages at varying spatial scales. The two vital characteristics of climate that are at the core of impact models are (i) mean climate and (ii) the occurrence and frequency of extreme events [3]. A view increasingly shared within the climate research community is that even a relatively small change in the frequency or severity of extreme weather events (i.e., in the tails of the probability distribution function) would have profound impacts on life and assets [4], making it all the more imperative to analyze extremes at higher temporal and spatial resolutions. For the scientific community focusing on impacts of climate change and variability, historical observations of extreme indicators can facilitate a better understanding of the role of extreme events and their sectoral implications [5].
Largely driven by the requirement for a robust definition of climate extreme indicators, the Expert Team on Climate Change Detection and Indices (ETCCDI) 2 in 1999 led the first efforts in defining a set of climate extreme indices (CEIs) that provide a comprehensive overview of temperature and precipitation statistics [4,6–8]. The ETCCDI has developed an internationally coordinated set of core climate indices consisting of 27 descriptive indices for moderate weather extremes 3 [9–11]. The preliminary set of these 27 core indices was drawn up keeping the detection and attribution needs of the research community in mind [10,11]. Noting the limitations of the ETCCDI indices with regard to their restricted scope/usage in assessing sectoral impacts, additional sector-relevant indices were recommended and developed by the Expert Team on Sector-specific Climate Indices (ET-SCI) [9].
This study introduces a new open-access high-resolution global gridded (0.25° × 0.25°) 4 dataset of 71 CEIs (including the original 27 ETCCDI indices), covering the period 1970–2016. The dataset (hereafter referred to as "CEI_0p25_1970_2016") aims to contribute to the existing CEI databases by making available the first comprehensive CEI dataset at a high resolution with worldwide coverage, which has so far been unavailable to the climate community. Moreover, a consistent global CEI dataset covering a long historical time period can lay a framework for not only analyzing observed changes in extremes, but also potentially improving information services on extremes at regional scales [10].
CEI_0p25_1970_2016 comprises a set of core (Table S1 in Supplementary Materials) and non-core (Table S2 in Supplementary Materials) indices 5 as defined and developed by the ETCCDI/ET-SCI, and adopted by the World Meteorological Organization (WMO). The set of "core indices" refers to indices that were developed by ETCCDI targeting the research community focusing on "detection and attribution" in climate science (details in Section 4).
The rest of the paper is organized as follows. Section 2 describes the CEI_0p25_1970_2016 in detail. Section 3 discusses the underlying meteorological dataset and the tools/methodology used in the preparation of the CEI_0p25_1970_2016. Section 4 outlines the novelty, potential scope, application, and limitations of the CEI_0p25_1970_2016. Dataset availability, ongoing work, and some recommendations for future research are summarized in Section 5.
2 Formed by the World Meteorological Organization (WMO) Commission for Climatology (CCl).
3 Extreme events that by definition typically occur a few times annually rather than severe-impact, decadal weather events. The indices for moderate weather extremes use absolute or percentile thresholds generally set at moderate values (e.g., 25 °C, 90th percentile).
4 ~27 km × 27 km at the equator.

Spatial and Temporal coverage of CEI_0p25_1970_2016
The CEIs included in this study encompass all but two indices 6 that are part of the complete list of 73 ETCCDI/ET-SCI core and non-core indices [9]. The CEI_0p25_1970_2016 is derived using meteorological variables from the reanalysis data product Global Land Data Assimilation System (GLDAS) [13]. GLDAS is a new generation of reanalysis developed jointly by the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC) and National Centers for Environmental Prediction (NCEP) [14]. Because the spatial extent of GLDAS covers all land north of 60° S, the indices in CEI_0p25_1970_2016 are also computed over the corresponding 1440 (longitude) × 600 (latitude) grid cells. Further description of GLDAS as well as the reasons for using it as a data source in this study are discussed in Section 3.
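The grid dimensions quoted above follow directly from the resolution and the spatial extent. The short sketch below reproduces that arithmetic; the longitude bounds (180° W to 180° E), the northern bound (90° N), the cell-centre convention, and the mean Earth radius are assumptions made for illustration and are not taken from the GLDAS documentation.

```python
import math

# Assumed domain: all longitudes, latitudes from 60° S to 90° N (illustrative).
RES_DEG = 0.25
LON_MIN, LON_MAX = -180.0, 180.0
LAT_MIN, LAT_MAX = -60.0, 90.0

# Number of grid cells along each axis.
n_lon = round((LON_MAX - LON_MIN) / RES_DEG)
n_lat = round((LAT_MAX - LAT_MIN) / RES_DEG)

# Cell-centre coordinates (assumed convention: centres offset by half a cell).
lons = [LON_MIN + RES_DEG * (i + 0.5) for i in range(n_lon)]
lats = [LAT_MIN + RES_DEG * (j + 0.5) for j in range(n_lat)]

# Approximate east-west width of one cell at the equator, assuming a
# spherical Earth with a mean radius of 6371 km.
EARTH_RADIUS_KM = 6371.0
cell_width_km = RES_DEG * 2.0 * math.pi * EARTH_RADIUS_KM / 360.0

print(n_lon, n_lat)             # 1440 600
print(round(cell_width_km, 1))  # 27.8
```

Under these assumptions the cell width at the equator is slightly under 28 km, in line with the approximate value given in footnote 4.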

Other Existing Datasets Incorporating CEIs
While other similar historical gridded CEI datasets do exist, they are either (i) regional in coverage, (ii) at coarser resolution, or (iii) limited in the number of indices available for research purposes. Examples include (i) the 30 CEIs made available by E-OBS at 0.10° gridded resolution for Europe (http://surfobs.climate.copernicus.eu/dataaccess/access_eobs_indices.php), (ii) the global 0.50° gridded resolution S-14 indices dataset of 27 core ETCCDI indices available at http://h08.nies.go.jp/s14/ [15], and (iii) the global 3.75° × 2.5° resolution HadEX2 and GHCNDex datasets of 27 core ETCCDI indices available at https://www.climdex.org/learn/datasets/ [6,7]. To the best of the author's knowledge, the present database CEI_0p25_1970_2016 is currently the only comprehensive high-resolution global-gridded historical dataset of ETCCDI/ET-SCI core and non-core indices.

Data Acquisition and Processing
The CEIs used in this study were computed utilizing the WMO ET-SCI recommended and developed R-software package "ClimPACT2" 7 [9]. R [16] is an open-source language and software environment, developed primarily (but not solely) for statistical computing, and is applied widely in climate research. Moreover, ClimPACT2 also makes use of several R subroutines, such as SPEI [17], and is designed for operating on parallel computing infrastructure.
For the computation of CEI, ClimPACT2 requires the following meteorological variables: (i) maximum near-surface air temperature (TX), (ii) minimum near-surface air temperature (TN), and (iii) near-surface total precipitation (PR), all at daily timesteps. These variables in the native Network Common Data Form 4 (NetCDF4) 8 format were obtained from the GLDAS-version 2 9 [13,18,19], available at 3-hourly timesteps and a fine spatial resolution of 0.25° × 0.25°. GLDAS is a global high-resolution reanalysis dataset that incorporates satellite and ground-based observations, producing optimal fields of land surface states and fluxes in near-real-time [13].
For the purpose of computing the CEI, the 3-hourly gridded variables (TX, TN, and PR) were first temporally aggregated to construct daily mean TX and TN, and daily total PR, using a suite of command line operators from NetCDF Command Operators (NCO ver 4.3.4) 10 and Climate Data Operators (CDO ver 1.9.0) 11. Indices based on percentile thresholds (e.g., WSDI and CSDI in Table S1) were computed using years 1970–2000 as the baseline period. For details on the classification of CEIs (namely "percentiles", "absolute", "threshold", "duration", and "others"), readers are referred to [6–9].
6 The two indices Cooling and Heating Degree Days (CDD and HDD) are computed separately as part of another dataset of additional indices relevant for health and energy sectors, currently under preparation [12]. Further details are provided in Section 5.2.
7 R version 3.5.0 ("Joy in Playing") x86_64 on Linux Centos 6.6 software architecture. ClimPACT2 was accessed on 23 September 2018 from https://github.com/ARCCSS-extremes/climpact2.
8 NetCDF is a set of scientific software libraries with a self-describing and machine-independent data format. https://www.unidata.ucar.edu/software/netcdf/docs/.
9 Data accessed from https://disc.gsfc.nasa.gov/ on 12 July 2018.
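The temporal aggregation step described above can be illustrated for a single grid cell. The following is only a minimal sketch in plain Python, not the NCO/CDO workflow used in this study; the function name `to_daily`, the sample values, and the assumption that each 3-hourly precipitation value is already an accumulated amount are all hypothetical.

```python
from statistics import mean

STEPS_PER_DAY = 8  # 24 h divided into 3-hourly timesteps


def to_daily(values, reducer):
    """Collapse a 3-hourly series into one value per day using `reducer`."""
    if len(values) % STEPS_PER_DAY != 0:
        raise ValueError("series length must be a whole number of days")
    return [
        reducer(values[i:i + STEPS_PER_DAY])
        for i in range(0, len(values), STEPS_PER_DAY)
    ]


# Two days of made-up 3-hourly values for one grid cell (illustrative only).
tx_3h = [20.0, 19.0, 18.0, 21.0, 25.0, 27.0, 24.0, 22.0,
         21.0, 20.0, 19.0, 22.0, 26.0, 28.0, 25.0, 23.0]
pr_3h = [0.0, 0.0, 1.5, 0.5, 0.0, 0.0, 0.0, 0.0,
         0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

tx_daily_mean = to_daily(tx_3h, mean)   # daily mean of the temperature series
pr_daily_total = to_daily(pr_3h, sum)   # daily total of the precipitation series

print(tx_daily_mean)   # [22.0, 23.0]
print(pr_daily_total)  # [2.0, 3.0]
```

Passing the reducer as an argument keeps the same windowing logic for means and totals, which mirrors how the same daily grouping is applied to temperature and precipitation.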

Choice of GLDAS as a Reanalysis Dataset for the Computation of CEIs
Vis-à-vis other global gridded reanalysis datasets, GLDAS offers several advantages. First, GLDAS provides a consistent quality-controlled long global gridded time-series of the required variables (i.e., TX, TN, and PR) at a high spatial resolution. Other reanalysis data products available were found to have either a coarser spatial resolution (e.g., ECMWF-ERA40 and JRA-55, both available from the mid-1950s but at 1.125°), or a shorter time series (e.g., newly released ECMWF-ERA5 at 0.281° from 1979 to the present day, and NCEP-CFSv2 at 0.205° from 2011 to the present day). Second, GLDAS runs in near-real-time, offering the potential to regularly update the database presented here.
The choice of GLDAS for computing the current set of indices was further motivated by its large number of additional meteorological (e.g., specific humidity, surface pressure), land surface state (e.g., soil moisture, surface temperature), and flux (e.g., evaporation, sensible heat flux) variables, not commonly available in other reanalysis data products for a long time-series and at a high spatial resolution 12. While none of these additional variables are required for computing the current set of indices, another dataset [12] of sectoral indices that are not presently implemented in the ETCCDI/ET-SCI indices requires a subset of these variables (details in Section 5.2). The two datasets of indices (the current one and [12], under preparation) will together comprise a large number (~85) of indices, both based on the same underlying GLDAS data, thus enabling the climate impacts community to access "ready-to-use" multi-sectoral indices.
GLDAS has been comprehensively evaluated using different regional/global reference datasets in earlier studies (e.g., see [14] who compare the GLDAS daily surface air temperature at 0. Equally well-documented are certain known limitations of the temperature and precipitation estimates in GLDAS. Whereas spatial details in high mountainous areas are not sufficiently estimated by the GLDAS data, the surface air temperature estimates are generally accurate, with some caution recommended for mountainous areas [14]. Previous studies that have incorporated GLDAS data include (i) [22] for impact assessment studies in energy sector, and (ii) [23,24] for the analysis of regional environmental conditions and changes. For a comprehensive list of GLDAS-related references, readers are referred to https://ldas.gsfc.nasa.gov/gldas/GLDASpublications.php.

Novelty of CEI_0p25_1970_2016
The CEI_0p25_1970_2016 is currently the only dataset providing researchers and policymakers with an exhaustive list of ETCCDI/ET-SCI recommended indices, dating back to the preceding four decades, covering nearly all global land grid-cells, and assembled using a quality-controlled reanalysis data product at a high spatial resolution. Considering the computational time and resources required for assembling a comprehensive dataset of CEIs at a global scale, the biggest asset of CEI_0p25_1970_2016 from the users' perspective is the open access to a pre-compiled ready-to-use set of indices in its native data format, along with a web interface allowing robust statistical analysis and mapping of the results in a few easy steps (details in Section 5.1).

Scope of Application
The CEIs included in this study are not only suited as assessment tools in multiple sectors such as Agriculture, Health, Energy, Water Resources, etc., but also as metrics capable of being aggregated as composite indicators for risk assessment and vulnerability studies (e.g., as demonstrated and applied recently by [25] over Italy in the form of a "Climate Risk Index"). A number of earlier studies have demonstrated the efficacy of the CEIs, both in detection and attribution studies, as well as in the impact assessment of climate change and variability in key sectors. Examples include (i) [26] who use "Rx1day" (Table S2) to examine the changes in model-simulated extreme precipitation by decomposing the daily regional-scale extreme precipitation as contributions from atmospheric thermodynamics and dynamics; and (ii) [27] who consider a broad range of CEIs (from Tables S1 and S2) for assessing future climate change impacts on agriculture, human health, ecosystems, and utility (energy demand) in Canada.
Moreover, it is widely known and established in sectoral impact studies employing empirical methods that a large proportion of variation in the outcome variable is better explained by the climatic variables accounting for moderate or severe extremes (e.g., the relationship (i) between crop productivity and a variant of the index "GDDgrown" in Table S1, known as killing degree days (KDDs) [28], (ii) between electricity consumption and degree-day indices namely "CDD" and "HDD" [29]). CEI_0p25_1970_2016 for instance provides an instant resource platform for empirical modellers to download and investigate a number of potential predictor variables that are robust moderate/severe extreme indicators.
The robust characteristics and climatological attributes captured by ETCCDI/ET-SCI indices can facilitate consistent comparison of results across different climatic zones and different time periods, as well as the identification of regions (clusters) with similar characteristics in extremes (e.g., grid cells with similar trends in annual days when daily maximum temperature is at least 30 °C ("TXge30", Table S1)). The identification of common hot spots can be of potential interest to policymakers, insurance companies, and country planners for the assessment of the risk and vulnerability of regions to extreme weather disasters (e.g., flooding, drought, heat waves).
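As an illustration of an absolute-threshold index such as "TXge30", the sketch below counts, for each year, the days on which daily maximum temperature is at least 30 °C. It is a simplified stand-in written for this example rather than the ClimPACT2 implementation; the function name `txge30` and the sample data are hypothetical.

```python
def txge30(daily_tx_by_year, threshold_c=30.0):
    """Annual count of days with daily maximum temperature >= threshold (°C)."""
    return {
        year: sum(1 for tx in series if tx >= threshold_c)
        for year, series in daily_tx_by_year.items()
    }


# Made-up daily maximum temperatures (°C) for one grid cell, a few days per year.
sample = {
    1970: [28.5, 30.0, 31.2, 29.9, 33.4],
    1971: [30.1, 30.5, 27.0, 35.0, 36.2],
}

counts = txge30(sample)
print(counts)  # {1970: 3, 1971: 4}
```

Applying such a count to every grid cell yields one annual map per year, and trends in those maps are what a clustering of grid cells with similar behaviour would compare.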
While the mean climatology of a location is invariably well-captured by the state-of-the-art reanalysis data products and Earth System Models (ESMs), extremes (particularly in precipitation) at fine spatial scales have been difficult to replicate [30]. CEIs provide the modelling community with a detailed set of indicators enabling the comparison of different input data sources in their ability to model extremes [8,9].
Finally, with the planned inclusion of additional indices to the current inventory of ETCCDI/ET-SCI indices in the near future [9], the development of larger CEI datasets for historical and future time periods could make valuable instruments available to researchers, policymakers, and adaptation planners focusing on occurrences and return periods of rarer extreme meteorological events (e.g., using extreme value theory).

Limitations of Indices Included in CEI_0p25_1970_2016
While the CEIs included in this study (Tables S1 and S2) were developed by the WMO expert teams to largely address the growing demands of sectoral impact modellers, certain limitations of the existing ETCCDI/ET-SCI indices have been recognized, and efforts are ongoing to develop other robust indices meeting multi-sectoral requirements [9]. For instance, under the current framework of ET-SCI definitions, the Heat Wave Magnitude (HWM) indices (Table S2) are based on the methodology developed by either [31] or [32]. The more recently developed HWM Index daily (HWMId) defined by [33] and implemented in various sectoral studies (e.g., [34] for river discharge and [35] for assessing impacts on wheat yields 13) is yet to be included in the inventory of ETCCDI/ET-SCI indices.
Moreover, the ETCCDI/ET-SCI indices are defined largely at annual timescales, and some are defined at monthly timescales as well. For certain sectoral applications (e.g., in Agriculture and Energy), the current set of monthly/annual indices may prove less useful, as climate anomalies need to be computed over different timescales. For instance, the "GSL" index (Table S1) in its current form defined at annual timescales does not account for heterogeneity in the length of crop-specific growing season (further details in [35]). In such cases, using indices computed at annual timescales can lead to misleading results. Some further shortcomings of the existing ETCCDI/ET-SCI indices are discussed and recommended for future work (details in Section 5.2).
Lastly, it must be emphasized that because CEI_0p25_1970_2016 utilizes temperature and precipitation data from GLDAS, when using the current set of indices users should keep in mind the known uncertainties and limitations of the GLDAS data (as discussed in Section 3.2).
The files follow the naming convention CEI_timescale_GLDAS_0p25_deg_hist_1970_2016.nc (Figure 1), wherein "CEI" is the abbreviation of the index (as described in Tables S1 and S2) and "timescale" is either "ANN", "MON", or "DAY", relating to annual, monthly, or daily timescales 15 over which the corresponding CEI is computed.
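The naming convention above lends itself to programmatic construction and parsing of file names. The following is a minimal sketch; the helper names `build_name` and `parse_name` are hypothetical, and the exact spelling and letter case of the index abbreviation in the published file names is an assumption (only "hw_ANN_GLDAS_0p25_deg_hist_1970_2016.nc" is quoted in the text).

```python
import re

TIMESCALES = ("ANN", "MON", "DAY")
TEMPLATE = "{index}_{timescale}_GLDAS_0p25_deg_hist_1970_2016.nc"
PATTERN = re.compile(
    r"^(?P<index>.+)_(?P<timescale>ANN|MON|DAY)"
    r"_GLDAS_0p25_deg_hist_1970_2016\.nc$"
)


def build_name(index, timescale):
    """Return the file name for an index abbreviation and a timescale."""
    if timescale not in TIMESCALES:
        raise ValueError(f"timescale must be one of {TIMESCALES}")
    return TEMPLATE.format(index=index, timescale=timescale)


def parse_name(filename):
    """Split a file name into (index abbreviation, timescale)."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group("index"), match.group("timescale")


print(parse_name("hw_ANN_GLDAS_0p25_deg_hist_1970_2016.nc"))  # ('hw', 'ANN')
print(build_name("hw", "ANN"))  # hw_ANN_GLDAS_0p25_deg_hist_1970_2016.nc
```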
The size of the individual NetCDF files varies between 156 megabytes (MB) and 1.9 gigabytes (GB), depending on the CEI and the timescale at which it is computed. One exception is the file "hw_ANN_GLDAS_0p25_deg_hist_1970_2016.nc", which is 3.1 GB as it includes twenty individual indices in a single netCDF4 file. GLDAS does not include data over (or near) water bodies. Grid cells where the required GLDAS TX, TN, and PR data are not available for computing the CEIs are identified by the missing value "1.e+20f". Further details of the variables/dimensions in the individual netCDF4 files can be examined using command-line tools such as "ncdump -h netcdf_file_name" (part of the netCDF utilities) or "cdo sinfo netcdf_file_name" (part of CDO). For creating quick plots and exploratory data analysis of individual netCDF files, open-access data tools such as Panoply (https://www.giss.nasa.gov/tools/panoply/) or NCview (http://meteora.ucsd.edu/~pierce/ncview_home_page.html) are recommended. Sample plots using Panoply for the four indices ("TXx", "HWM_Tx90", "CSDI", and "PRCPTOT") are shown in Appendix A (Figures A1–A4).
13 The authors use a slightly modified version of HWMId in their study, which they refer to as Heat Magnitude Day (HMD) in agriculture.
14 The dataset will also be mirrored on KNMI Climate Explorer (http://climexp.knmi.nl/about.cgi?id=someone@somewhere), a web application interface that can facilitate not only rapid aggregation and robust statistical analysis of the CEI, but also downloading of spatio-temporal subsets and quick plotting.
15 The dataset includes a total of 89 netCDF4 files (49 on annual, 39 on monthly and 1 on daily timescales). Some indices have data both on monthly and annual timescales.
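When index values are processed outside a netCDF-aware tool, the missing value has to be excluded before any statistics are computed. The sketch below shows one cautious way to do this in plain Python. The helper names and the cut-off of 1.0e19 are assumptions made for this example; the cut-off is used instead of an equality test because a single-precision "1.e+20f" does not compare equal to a double-precision 1.0e20.

```python
import struct

FILL_VALUE = 1.0e20  # missing value reported in the files as "1.e+20f"


def as_float32(x):
    """Round a Python float to single precision, as stored in a float32 variable."""
    return struct.unpack("f", struct.pack("f", x))[0]


def is_missing(value, cutoff=1.0e19):
    """Treat any value at or above the (assumed) cutoff as missing."""
    return value >= cutoff


def land_mean(values):
    """Mean over valid (non-missing) cells; None if no valid cell exists."""
    valid = [v for v in values if not is_missing(v)]
    return sum(valid) / len(valid) if valid else None


stored_fill = as_float32(FILL_VALUE)
row = [12.0, stored_fill, 18.0, stored_fill]  # made-up values along one latitude

print(stored_fill == FILL_VALUE)  # False: float32 rounding changes the value
print(land_mean(row))             # 15.0
```

Libraries that read netCDF files often apply this kind of masking automatically, so a manual check like this is mainly relevant when raw arrays are exported to other formats.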

Ongoing Work and Recommendations for Work in Future
The indices in CEI_0p25_1970_2016 are intended to be extended beyond 2016, subject to the availability of the required GLDAS raw meteorological variables in the coming years. The updated, longer time-series of CEIs should prove beneficial to the research community focusing on recent extreme events (e.g., the droughts of 2017 and 2018 in south-east Australia, the heat waves of 2018 in California, United States of America, and the January–February 2019 extreme cold wave in North America). Additionally, upon the formal inclusion of any new indices (such as the "HWMId" and the "Crop-specific GSL" as discussed in Section 4.3) by the WMO expert teams in their list of ET-SCI indices, these will also be added to the existing dataset presented in this study.
While the ETCCDI/ET-SCI core and non-core indices employed in this study encompass a very large spectrum of sectoral and non-sectoral indices, the list is by no means exhaustive. Motivated by the suggestions of the R ClimPACT2 [9] package creators, another dataset of indices largely relevant for health and energy sectors (called "HEI_0p25_1970_2016") is currently under preparation [12]. One feature of HEI_0p25_1970_2016 will, for instance, be the inclusion of the two ETCCDI indices (i.e., CDD and HDD [36]) that are not included in this study 16. Moreover, HEI_0p25_1970_2016 will also account for additional meteorological variables (e.g., near-surface relative humidity and wind speed) for computing non-ETCCDI/ET-SCI indices, such as the Humidex [37,38], the Heat Index (HI) [39,40], and the Discomfort Index (DI) [41,42]. Together, CEI_0p25_1970_2016 and HEI_0p25_1970_2016 aim to address the growing needs of the climate impact community by overcoming the current scarcity of high-resolution global gridded CEIs in earth science.
Supplementary Materials: The following are available online at http://www.mdpi.com/2306-5729/4/1/41/s1, Table S1: 32 Core ET-SCI indices. Bold indicates index is also an ETCCDI index. (TX: daily maximum near-surface air temperature, TN: daily minimum near-surface air temperature, PR: daily near-surface total precipitation, H: Health, AFS: Agriculture and Food Security, WRH: Water Resources and Hydrology); Table S2: 39 non-core ET-SCI indices. Bold indicates index is also an ETCCDI index. Sectoral abbreviations same as in Table S1.

Acknowledgments: The author is grateful to Nicholas Herold for assisting with the R software package ClimPACT2; Lisa Alexander and Enrico Scoccimarro for constructive discussion on sectoral extreme indices; Enrica De Cian for feedback on the draft version of the paper; the high-performance computing resources of the Boston University Shared Computing Cluster (SCC) on which the CEIs were computed; and NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) for making GLDAS data publicly available. Developers of R SPEI package, CDO, and NCO are also acknowledged for providing open-access tools that were used for data preparation in this study. The constructive feedback received from three anonymous reviewers helped to improve the manuscript further.

Conflicts of Interest:
The author declares no conflict of interest.
Appendix A. Sample Plots of Selective Indices from Tables S1 and S2 Using Panoply