A High Spatiotemporal Resolution Global Gridded Dataset of Historical Human Discomfort Indices

Meteorological human discomfort indices or bioclimatic indices are important metrics to gauge potential risks to human health under varying environmental thermal exposures. Derived using sub-daily meteorological variables from a quality-controlled reanalysis data product (Global Land Data Assimilation System—GLDAS), a new high-resolution global dataset referred to as “HDI_0p25_1970_2018” is presented in this study. The dataset includes the following daily indices at 0.25° × 0.25° gridded resolution: (i) Apparent Temperature indoors (ATind); (ii) two variants of Apparent Temperature outdoors in shade (ATot); (iii) Heat Index (HI); (iv) Humidex (HDEX); (v) Wet Bulb Temperature (WBT); (vi) two variants of Wet Bulb Globe Temperature (WBGT); (vii) Thom Discomfort Index (DI); and (viii) Windchill Temperature (WCT). Spanning 49 years over the period 1970–2018, HDI_0p25_1970_2018 fills gaps in existing climate indices datasets by being the only high-resolution historical global-gridded daily time-series of multiple human discomfort indices based on different meteorological parameters, thus offering applications in wide-ranging climate zones and thermal-comfort environments.


Introduction
Recent global temperature extremes, such as the summer heatwaves in Europe (2003,2006,2018 [1][2][3][4][5][6]. Exposure of human body to different meteorological elements, especially a combination of extremes in temperature, humidity, and/or wind that can have a disruptive impact on working conditions and labor productivity, are also frequently mentioned in health impacts literature (e.g., see [7,8] and references therein). Amongst both indoor and outdoor thermal exposures, human discomfort measures accounting for cold, heat, wind, humidity, and direct sunlight, are often considered as proxies for determining the likelihood of health risks (e.g., heat stroke and the associated mortality [9][10][11][12][13] to environment exposure), as well as the potential spike in demands of public utilities (e.g., energy demand for space cooling in residential/commercial buildings and work-spaces [14][15][16], ambulance callouts [17,18], etc.).
In addition, a rapidly growing body of literature also highlights the importance of occupational heat stress (OHS) [9,11,13,19,20]. Usually referred to as local workplace heat stress [19], OHS predominantly account for four environmental factors, namely air temperature, humidity, wind speed, and heat radiation [19,21,22]. Likewise, the level of physical activity undertaken by individuals and the clothing worn are the two non-environmental factors (referred to as personal factors) that play a dominant role in OHS ( Figure 1).

Figure 1.
Four environmental (blue) and two non-environmental or personal (pink) factors affecting human thermal comfort (Image Source [23]). The indices discussed in this study account for the three environmental factors: "Humidity", "Air Velocity", and "Air Temperature". For a detailed discussion on the six factors and their interactions, readers are referred to Epstein and Moran [21]. Table 1. Human discomfort indices presented in this article (Column "Discussion Paper" refers to the reference article on which the equations used in this article are based).

Source of Meteorological Variables
The HDIs included in this study (Table 1) are derived using meteorological variables from Global Land Data Assimilation System [55][56][57] (GLDAS)-version 2. GLDAS is a new generation global high-resolution reanalysis data product developed jointly by the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC), and National Centers for Environmental Prediction (NCEP) [58]. It provides a consistent quality-controlled long global gridded time-series of several key meteorological variables at fine scale spatiotemporal (0.25° gridded, 3-hourly) resolution.
GLDAS incorporates satellite and ground-based observations, producing optimal fields of land surface states and fluxes in near real-time, thus facilitating regular updates of the HDI_0p25_1970_2018 dataset presented in this study. Furthermore, GLDAS makes available meteorological and land-surface variables that are not commonly available in other reanalysis data products either as consistent long time-series, or at a high spatial resolution. Currently, GLDAS version-2 spans 1948-present year (with a latency of about two months) and includes a total 36 landsurface fields. Further details on GLDAS can be found at [59].
Other reanalysis data products available have either (i) a coarser spatial resolution (e.g., ECMWF-ERA40 and JRA-55, both available from the mid-1950s but at 1.125°), or (ii) a shorter time series (e.g., newly released ECMWF-ERA5 at 0.281° from 1979-present day, and NCEP-CFSv2 at 0.205° from 2011-present day). Further details on studies evaluating and applying the GLDAS dataset are provided in Section 3.3 "Technical Validation".
Out of these 10 variables, VP, SVP, RH, and VDP are not directly available in GLDAS but instead derived using other variables (see Table 2 for details). Moreover, VPD is not used in compiling any HDI but made available in the database as a supplementary variable. All GLDAS variables (outlined in Table 2) collectively referred to as "secondary variables" are also made available to the user community along with the HDIs presented in this article (details in Section 3 "Data Records").
Pre-processing and quality checks of all GLDAS data variables (  [62]. Although GLDAS is a quality-controlled data product, daily fields aggregated using 3-hourly records were subject to further careful quality control, to identify spurious data, such as values outside expected range of 0-100% for RH, negative values for P, Q, VP, SVP, and W.  [27,63], and is defined for temperatures above 0 °C. Equations (18)- (20) are discussed in detail in [64].

Compilation of HDIs, Historical Origins, and Operational Limits of Usage
The HDIs presented in this article are defined and well documented in the health impact and epidemiology literature, and form a subset of those that are most commonly used by the research community focusing on climate-health impacts. The following section introduces each of the HDI presented in this article in detail. In particular, the historical roots of the HDIs, and their scope of application are summarized below. Where applicable, limitations of the current set of HDIs in relation to the range of environmental thresholds are also discussed here. Other limitations emanating due to approximations in formulations are discussed in Section 4.1.2. In addition to the examples of applications of HDIs in existing climate-health impacts literature (Table 1), selective usage of HDIs in literature are also discussed below.
Unless otherwise reported below, the units of the input variables used in computation of the HDIs presented here follow those discussed in Table 2. 2.3.1. Apparent Temperature (AT), Units: °C AT is related to the commonly used heat index (HI) in the U.S. and Australia. It can be considered as a measure of relative discomfort from combined heat and humidity [32]. In its native form, Steadman [38] defines the AT as the dry-bulb temperature (or simply Ta) for thermal equilibrium of an adult walking, assuming moderate humidity in the absence of both wind and solar radiation (SR), and with the same thermal resistance between the skin and the atmosphere as in the given circumstances (see [38][39][40]65] for detailed description on the origins and derivation of AT under multiple arbitrary sets of human physiological and environmental conditions).
The AT originally mooted by Steadman in 1971 and developed in 1979 (ATind, Equation (1)) [38], has its origins partly rooted in a much older and stringent measure of human discomfort, referred to as Standard Effective Temperature (SET) introduced by Gagge [66]. While the SET and its earlier forms such as the Effective Temperature and the Corrected Effective Temperature (see Table 3 in [21] for further details) consider all environmental and bodily conditions that affect human thermoregulation, Steadman conceived a scale of a HDI based on a human body exposed to different conditions. The outcome was different versions of AT expanded in 1984 (ATot_NOAA, Equation (2)) [39] and in 1995 (ATot_ABM, Equation (3)) [40], based on Ta, VP, and W. Equations (2) and (3) represent the empirical expressions of AT in outdoor shade conditions. It must be emphasized though that the different versions of AT (Equations (1)-(3)) are not the only formulations found in literature. For instance, AT based on Ta and dew point temperature [67,68], and other variations based on the normal AT (referred to as Weather Stress Index-WSI, or simply a relative AT) [69] are a few notable examples.
To a careful eye, comparing Equations (1)-(3) and Equation (20) (Section 3.1) makes it obvious that AT increases with higher RH. Another observation that can be inferred from Equations (2) and (3) is the suitability of application of the two variants of AT utilizing W in the definition. ATot_NOAA and ATot_ABM are the only two HDIs in this article that can be considered as "all season" indices (i.e., indices whose usage are not restricted by upper (summer like) or lower (winter like) conditions or thresholds [69]). Whereas WCT discussed later is a winter-oriented index, the remaining HDIs discussed below are "summer indices" as they are not valid below lower ranges of Ta and/or RH thresholds.
Examples of earlier studies applying AT include [41] that evaluates the relationship between AT and air pollution (PM2.5) vs. mortality in elderly population of Metro Vancouver, [67] that evaluates the effects of AT on daily mortality in Lisbon and Oporto (Portugal), and [68] that investigates AT and acute myocardial infarction hospital admissions in Copenhagen, Denmark.

2.3.2.
Heat Index (HI) as defined by the U.S. National Oceanic and Atmospheric Administration (NOAA)-National Weather Service (NWS), Units: °C Sometimes referred to as "Humiture" (as first introduced by Meteorologist George Winterling), HI derived using a multiple regression equation of Ta and RH on the AT [33] is closely related to the AT. The NOAA-NWS implemented HI as formulated by Rothfusz [24] utilizes a least squares fit on data using a polynomial function in Ta (in °F) and RH.
HI is not defined for Ta ≤ 80 °F (27 °C) and RH ≤ 40%, which therefore makes it unsuitable as an HDI for a global analysis. Put another way, HI can be thought as optimal for Ta > 80 °F and RH > 40%. For measured paraments of Ta and RH meeting the optimal criteria, NOAA-NWS implements HI (Equation (4)) along with the following conditional adjustments (Equations (5)- (7)): where Ta is in °F and RH in % (although the HI assembled in the current dataset is converted to °C).
For RH < 13% and 80 °F ≤ Ta < 112 °F, Equation (4) becomes: For RH > 85% and 80 °F < Ta < 87 °F, the following adjustment is added to HI. Equation (4) therefore becomes: If the resulting HI for any combination of Ta and RH is below 80 °F when using Equations (4)-(6) above, the Rothfusz regression is replaced by Steadman's formula [38] expressed as: HI is often interchangeably referred to as Humidex (different to the Canadian version of Humidex discussed next), or as AT. This is probably due to the NOAA-NWS practice of substituting the HI with ATind, whenever the prevailing ambient conditions are not met. More recently in 2005, Schoen [43] proposed another simplified variant of HI having only three parameters as opposed to the large number of parameters and adjustments above.
HI is less frequently found in literature as an HDI for assessment of health impacts. It is instead used more as a meteorological indicator (e.g., by the NWS in the U.S.) for issuing operational heatstress warmings.

Humidex (HDEX) or Humidity Index, as defined by Environment and Climate Change
Canada, Units: °C Devised in 1979 by Canadian meteorologists Masterson and Richardson [29], HDEX was first introduced in Canada in the mid-1960s with a purpose to create an easily understood method of describing how very hot and humid weather feels to a human. Just like HI which combines Ta and RH to describe how hot the weather feels to an average person when combining the effect of heat and humidity [32], HDEX combines Ta and dew-point temperature (Td) (or an equivalent formulation utilizing VP as done here) into a single number thereby reflecting perceived temperature.
HDEX is used more widely than HI in Canada. As noted by [46], compared to the HI, HDEX typically yields higher values at equal Ta and RH. The main difference vis-à-vis HI discussed above is a simpler approach on which HDEX is designed (i.e., not being based on a thermo-regulatory model) [32]. Yet both HI and HDEX use the same underlying principle of a sensory index, neglecting the effect of other meteorological parameters such as strong winds-which help wick away perspiration-and whether the person is walking in the sunshine, which significantly increases how hot it feels.
Similar to HI, HDEX is less frequently found in literature as an HDI for assessment of health impacts. Usage of HDEX in Canada as a meteorological indicator for issuing heat-stress warmings are examples of applications in an operational environment.

Wet-Bulb Temperature (WBT), Units: °C
WBT is the minimum temperature to which air can be cooled by evaporative cooling, and as such, contains information about air temperature as well as moisture content. WBT is a combined measure of T and RH, or more simply as put by Pal and Eltahir [48], a measure of "mugginess". In operational weather stations, WBT is measured using a wetted thermometer (Twb) exposed to wind, but usually shielded from direct sunlight.
Following Stull [26,64], WBT is computed utilizing Ta and RH as follows: where the arctangent (atan) function returns values in radians.
WBT albeit less commonly used as a HDI to investigate impacts of thermal exposure on human body, provides yet another physically based relationship to the human body's core temperature. In fact, among the HDIs discussed here, WBT with the earliest origins (dating back to 1905 [47]), is used more often as a parameter in the definition of other commonly used HDIs (see WBGT, Equations (10)-(13); DI and modified DI, Equations (14)- (16); and the Thermal Humidity Comfort-THIC and Physiology Indexes-THIP; discussed in [30]).
Examples of earlier studies applying WBT include [48] for investigating exceedance of a threshold for human adaptability under future climate change scenarios.

Simplified Wet Bulb Globe Temperature (WBGT), Units: °C
Perhaps no HDI other than the WBGT has been well tested, documented, and used under a variety of climatic conditions (e.g., see [9,20,21,49,50,70] for historical evolution and application of WBGT as a health stress index or a heat exposure index). Equally well documented have been the quantitative links between WBGT and the work-rest cycles needed to prevent heat stress effects at both indoor and outdoor workplaces [9,50].
The origins of the WBGT index trace back to the detailed studies undertaken by the US military ergonomists in the 1950s. Introduced by Yaglou and Minard in 1957 [70], WBGT is recommended by many organizations such as the International Organization for Standardization (ISO 7243:2017), framing criteria for exposing workers to hot environments [9,20,21]. It also finds applications in military use, as well as in sports and recreation activities [32].
Numerous forms of WBGT exist in literature. A detailed discussion on the different instruments and the origins of the variants of the equations used for measuring/computing WBGT is outside the scope of this study (see [21,32,50,71,72] for details). In a nutshell, the most complete forms of WBGT require measurements incorporating a shielded dry-bulb thermometer measuring Ta, a 150 mm diameter black globe measuring black globe temperature (Tg), and a natural Twb; thus taking into account convention, conduction, evaporation, and radiation [32].
The two versions of WBGT used in this article calculated following Gagge and Nishi [49] (Equation (10)) and the ABM (Equation (11)) are as follows: WBGT_ABM = 0.567 × Ta + 0.393 × VP + 3.94 (11) It must be emphasized here that neither of the two versions of WBGT (Equations (10) and (11)) account for exposure to SR (or the Tg) as commonly found in literature. Such variants of WBGT (Equations (12) and (13)) [50] applicable for outdoor conditions under direct short-wave radiation, are defined as: WBGTout = 0.7 × Twb + 0.2 × Tg + 0.1 × Ta (12) Indoors or outdoors in shade: The two versions of WBGT (Equations (10) and (11)) included in HDI_0p25_1970_2018 are therefore prefixed as "Simple" to distinguish them from the original WBGT pioneered by Yaglou and Minard that emphasized the inadequacy of Ta and RH as being truly representative of thermal discomfort conditions outdoors [73].
Examples of earlier studies applying WBGT are quite exhaustive. Using a variant of WBGT, the authors of [74] make a comparative evaluation of human heat stress indices on selected hospital admissions in Sydney, Australia. In [75] the authors model productivity loss in workers due to heat stress. For a summary of other studies applying WBGT in health impacts assessment, readers are guided to [8,19].
2.3.6. Thom Discomfort Index (DI), also referred to as Thermal-Heat Index, Thermohygrometric Index or Temperature-Humidity Index, Units: °C The origins of DI are deeply rooted in a similar simple index known as "Oxford Index" [21], based on a weighted combination of WBT and Ta. Together with the WBT discussed above, the DI originally proposed by a U.S. weather bureau climatologist E. C. Thom in 1959 [28] are the only two physiological thermal stress indicators that have been in regular use dating back nearly six decades. The original formulation of DI proposed by Thom expressed in discomfort units (Equation (14) not used in this article) takes the following form: Simplifying the original equation further to a weighted summation of WBT and Ta, Thom [76] developed the widely used form of DI (Equation (15), as used in this article). DI = 0.5 × WBT + 0.5 × Ta (15) As noted by [77], the strength and appeal of DI lies in its accurate representation of the atmospheric evaporative cooling power (heat load), and its ability to convey the relevant climatic conditions in summer in physiologically significant terms.
The DI finds wide application in Israel, such as its usage by the Israeli Weather Bureau and the Israeli Army [77]. The DI is used to assess heat stress for defining thresholds and adequate conditions for military and is integrated as a measure for training restrictions and limitations. As with the other HDIs discussed above, the DI can also be found in various modified forms in literature. For instance, more recently in 1999, Moran and Pandolf suggested the modified discomfort index (MDI) [71] calculated as: The MDI is constructed using more sophisticated statistical analysis and as reported in a wide number of studies, is highly correlated with the WBGT.
It is worth noting that both WBGT and the different formulations of DI fall under the category of Industrial and Military Indices. Whereas the origins of the AT, HI, HDEX, WBT (discussed earlier), and WCT (discussed below) are rooted in the meteorological or epidemiological framework, WBGT and DI were primarily designed and developed for protection of both labor workforce and military personnel under varying (largely acute) heat exposure environments [32].
Examples of earlier studies applying DI include [19] for evaluating its importance as a HDI visà-vis other HDIs (such as WBGT and UTCI) in occupational heat stress, and [78] looking at the influence of the winter North Atlantic Oscillation index and selective HDIs on hospital admissions through diseases of the circulatory system in Lisbon, Portugal.
2.3.7. Windchill Temperature (WCT), or Windchill Equivalent Temperature, Units: °C Among the HDIs discussed in this article, the WCT has both interesting historical origins, as well as falls under a different sub-category of thermal indices. The basic principle of WCT is as follows. Wind blowing on a human body surface has a cooling effect due to the additional loss of body heat. The sensation or the feeling is referred to as "windchill" [53]. The WCT is a way to describe lower (apparent) temperatures in the presence of wind. It is based on the rate of heat loss from exposed skin caused by wind and cold.
The WCT coined by Siple and Passel while carrying out research in Antarctica, dates back to the early 1940s [52], and was considered the best-known research contribution of polar research until the discovery of the hole in the ozone layer over Antarctic in 1985 [79]. It is also the only HDI in this article that can be considered as "winter oriented". Contrary to the HI, HDEX, WBGT, and DI that are by design applicable in a combination of moderate to high range of Ta and RH, the WCT defined here using Siple and Passel approach [52], is applicable under milder Ta (≤33 °C) and W ≥ 1.39 m/s, and is mainly used for freezing temperatures.
Experts though have repeatedly criticized the above definition of WCT, questioning its (lack of) theoretical basis, and shaky experimental foundations. Yet, the Siple and Passel approach has remained in use and been widely adopted by weather forecasting centers globally, partly because of its simplicity and its known efficacy. For a detailed comparison and limitations of Siple and Passel based WCT with other subsequent definitions of WCT, avid readers are guided to [54,79,80]. Finally, it must be noted that the WCT here is not to be confused with Wind Chill Index [52,79], which is based on the original work of Siple and Passel utilizing windchill factors and is measured in kCal/m 2 h.
Examples of earlier studies applying WCT can be found largely in the medical research community, and not surprisingly covering the colder regions. For instance, authors in [81] use WCT for evaluating the effects of temperature and wind on facial temperature, heart rate, and sensation, and in [82], the authors examine the thermal responses in the body during snowmobile driving.

Data Repository and File Format
All indices in HDI_0p25_1970_2018 (Table 1)   One of the advantages of .nc4 file format is data compression. All .nc4 files are compressed using CDO command: cdo -z zip_5 copy input.nc4 compressed.nc4, thus reducing file size by a factor of ≈3. The size of the compressed individual annual files varies between 280 and 300 megabytes (MB) depending on the HDI, with the combined time-series (49 years) file ≈14 gigabyte (GB) in size. The total size of all 10 HDIs are ≈140 GB. The files follow the naming convention "gldas_HDI_daily_year.nc4", wherein "HDI" is the abbreviation of the index (Table 1), and "year" the year over which the HDIs are computed on daily timesteps. Some of the indices use more than one definition for computation in literature (e.g., ABM and NOAA, for AT in Table 1). These abbreviations are also included in the filenames.

Data Grid, Spatial Coverage, Resolution, and Projection
The spatial extent of GLDAS covers all land north of 60°S latitude and does not include data at or near water bodies. Such grid cells where the required GLDAS data are not available for computing the HDIs are identified by missing values "1.e+20f". The spatial grid ranges from -59.875 to 89.875 latitude and -179.875 to 179.875 longitude at 0.25° resolution (EPSG projection 4326-"+proj=longlat +ellps=WGS84+datum=WGS84+no_defs"). Consequently, the HDIs (as well as the secondary variables) in HDI_0p25_1970_2018 are also computed over the corresponding 1440 (longitude) × 600 (latitude) grid cells spanning 90° N-60° S, at the same 0.25° gridded resolution, with the land grid-cells in the proximity of water bodies reporting missing values.

Technical Validation
GLDAS has been comprehensively evaluated using different regional/global reference datasets in earlier studies, such as [84] who compare the GLDAS daily surface air temperature at 0.25° gridded resolution with two reference datasets: (a) Daymet data (2002 and 2010) for the conterminous U.S. at 1-km gridded resolution, and (b) global meteorological observations (2000) from the Global Historical Climatology Network (GHCN) [85]. A few notable examples of earlier studies that have incorporated GLDAS data include (i) high-resolution global-gridded datasets of climate extreme indices [86] and cooling/heating degree-days [87], and (ii) climate impacts assessment in energy sector [14,15]. Further details on studies implementing GLDAS are available on [88].
Given the lack of any other publicly available regional or global historical dataset incorporating the HDIs discussed in this article, a direct comparison of the indices in HDI_0p25_1970_2018 either at point-specific locations or at aggregated gridded scale is currently not feasible. The HDIs are instead compared with the two HDIs UTCI and MRT recently made available by the Copernicus CDS [36,37]. Maps in Figure 3 below show the heat-stress over western and central Europe, well documented for the 2003 heatwave. For the purpose of comparison, HDIs for July 27, 2003 are shown using a common color legend scale. Focusing on south-western Europe, majority of HDIs capture the unprecedented heat wave over southern-France and most of Spain in particular, with the two HDIs from ERA5-HEAT dataset showing a similar pattern. . Readers are referred to the heat-stress thresholds of the individual HDIs for comparison of thermal perception. Maps are created using R "raster" [89] and "sp" [90,91] packages.
It is worth emphasizing that the category of heat stress varies by HDI, and so a visual comparison of the values across HDIs (Figure 3) is not the most appropriate way of comparing the heat stress represented across the HDIs. To facilitate better comparison and evaluate a broader agreement in the spatial patterns across the HDIs, Figure 4 below shows the "Pearson" correlation computed using the spatial maps in Figure 3. It is interesting to note the high correlations (0.88-0.99) across the 10 HDIs in this study. It is also worth noting that although MRT is a component used in the computation of UTCI [36], the correlation between the two is only marginally higher (0.88) compared to the moderately high correlations between each of the 10 HDIs and UTCI/MRT (0.60-0.66). Readers are cautioned though that the spatial correlations shown in Figure 4 Figure 3, using R "raster" [89], "sp" [90,91] packages, with the plot created using R "corrplot" [92] package.

Discussion
Driven largely by the demands of climate health impact modelers, public health experts, and health-care infrastructure planners, a new historical high spatiotemporal resolution global gridded dataset of multiple HDIs constructed using various meteorological parameters is presented in this study. The dataset is aimed to address a common recurring pitfall faced by the research community; that of freely available, ready to use comprehensive set of quality controlled HDIs for rapid implementation in studies focusing on human health under varying degree of environmental thermal exposure. Nevertheless, certain limitations of the dataset, along with the key features, scope of application, and examples of usage are important to be highlighted.

Data Limitations in HDI_0p25_1970_2018 Emanating from GLDAS
The use of GLDAS variables for assembling HDIs in this study was motivated for reasons discussed in Section 2.1 "Source of Meteorological Variables". Whenever possible, applications of HDIs (especially in impacts assessment) should incorporate input variables from different underlying data products to account for parameter and input data source uncertainty. For instance, as highlighted in [86,87], certain known limitations of GLDAS data, such as larger uncertainty in the surface air temperature estimates over high mountainous areas are well documented in literature [84]. Users of the GLDAS-derived data products are therefore recommended to pay attention to the data caveats.
Moreover, as highlighted earlier, the grid cells in the proximity of water bodies do not report HDIs because of missing data in GLDAS. For studies focusing on point locations or regions smaller than ≈27 × 27 km 2 in the vicinity of water bodies (such as lakes and rivers, especially in densely populated areas near coastal region), users can be marginally restricted in the usage of the dataset. Such instances in HDI_0p25_1970_2018 are likely to be minimal because the criteria to assign the grid cell as land or water in GLDAS ver-2 data are based on a very high-resolution land-water mask (see further details in Appendix Figure A1 and the associated discussion). One work around is to fill such gaps using an appropriate interpolation technique available in R, CDO, etc. (e.g., bilinear, near neighbor, inverse-distance mapping).
Another shortcoming of the existing dataset is the lack of uncertainty estimates in the HDIs. Unlike ECMWF ERA5 reanalysis data, GLDAS does not provide ensemble members and so parameter uncertainties of the input meteorological fields are not accounted for in the computation of HDIs. The users of HDI_0p25_1970_2018 are thus reminded that the HDIs should be considered as representative of the ensemble mean of GLDAS realization, and not the spread. Nonetheless, the computation of HDIs on each ensemble member would be a daunting task requiring higher computational and storage facilities.

Limitations in the Current Set of HDIs Presented in This Article
A large number of HDIs derived using numerical and/or experimental approaches, applicable in both indoor and outdoor environments have been documented since the early 1900s (see Table 3 in Epstein and Moran [21]; and Table 1 in Havenith and Fiala [32] for a comprehensive historical summary of the HDIs). As emphasized by Havenith [93] and Pappenberger et al. [94], other meteorological factors such as short-and long-wave radiant fluxes also need to be considered when correctly determining thermal stress.
The current set of HDIs discussed in this article do not directly account for other atmospheric variables such as SR and cloud cover. More complex indices such as the Moran's Environment Stress Index (ESI, °C) [71], Perceived Temperature (PT, °C) [25] and UTCI (°C) [95,96], are left for work in future.
While the HDIs presented in this article are largely defined using the standard methods in the literature, a few of them use alternative definitions or approximations. For instance, the two versions of WBGT (Equations (10) and (11)) included in the current set of HDIs are simplified versions of the WBGT not accounting for the black globe temperature (Tg) (see [21,50,70] for extensive discussions on different formulations of WBGT employing Tg). Similarly, whereas the AT included in the dataset does not account for SR (see [32] for a variation of ATot_ABM introduced in Equation (2)), the WCT based on Sipel and Passel (Equation (17)) excludes both SR and humidity [53].
None of the HDIs account for the heat balance mechanisms of the human body. Such HDIs (e.g., UTCI) require a number of parameters based on physiological models [50]. The present dataset is therefore restrictive to "direct indices" [21], i.e., those accounting for only direct measurements of environmental variables proxying heat strain in lieu of a physiologically thermal balance approach.
Finally, all HDIs discussed in this article fall under the category of an "absolute HDI". A detailed description of a "relative HDI" is outside the scope of this article and readers are referred to [69] for a detailed comparison between an absolute and a relative HDI, and development of one such index known as "WSI". The difference between the two in a nutshell emanates from the region-specific scope of application. Put simply, HI with a value of 34 °C both in Texas (U.S.) and Berlin (Germany) could have different implications on human activities. Depending on the scope of study, users are reminded to examine alternative formulations and pay attention to limitations of the HDIs in their current form.

Key Features of the Dataset
Minimum, maximum, or average ambient temperature at sub-daily and daily timescales are often the only variables used in the definition of a majority of thermal indices (see [86,97,98] for further details on the heat wave indices defined using various temperature thresholds). However, such thermal indices when not accounting for other important factors that play a pivotal role (e.g., RH and W), may not provide a realistic assessment of comfort [42]. This is especially more pertinent in coastal regions where large diurnal variations in RH are usually observed. Despite some limitations highlighted earlier, the "direct" HDIs assembled here using a combination of Ta, W, and variants of RH are useful in their ability to encompass the perceived thermal effects into a single number, and therefore interpretable as the more commonly used Ta. To this date, HDI_0p25_1970_2018 spanning 49 years is the only high-resolution global gridded historical dataset of a broad suite of HDIs available to the research community. Along with the subsidiary variables, the dataset offers users with immense flexibility in spatiotemporal aggregation of individual indices (see example shown in Section 4.4).

Scope of Application
The indices in HDI_0p25_1970_2018 are aimed towards the research community and policy makers focusing largely on but not limited to (a) the impacts of climate change on health and energy sectors; (b) energy as a form of thermal adaptation (e.g., air conditioning) in discomfort environments that are not necessarily work environments; (c) historical trends and patterns in OHS. Biometeorological conditions can however show significant intra-day fluctuations, especially in the temperate climate zones. The specific values of the HDIs presented here are representative of the average daily thermal discomfort levels. The daily timescales of the HDIs in this study may therefore make the data less suitable for attribution of health outcomes on intra-day extremes of thermal discomfort levels. For such applications, users are recommended to explore other datasets at subdaily timescales (e.g., ERA5-HEAT [36]).
Since the individual HDIs in HDI_0p25_1970_2018 are essentially a large panel data (time series of a multiple geo-referenced locations), the dataset can find wider application in both detection/attribution studies (e.g., teleconnections patterns [78,99], amplification or trends under historical warming patterns [99,100]), as well in statistical/econometric analysis incorporating daily mortality and/or morbidity data (e.g., labor productivity under climate change-see [101] and references therein).
Assembled at a high spatiotemporal resolution, HI_0p25_1970_2018 allows users to aggregate the indices both at spatial (such as administrative levels, national boundaries), as well as temporal scales (such as months, seasons). It is worth emphasizing once again that the HDIs (Table 1), are assembled at the same spatiotemporal resolution (daily, 0.25°) as the near-surface daily averaged secondary variables outlined in Table 2. Each HDI therefore represents the estimated mean daily historical human discomfort conditions averaged over the 0.25° grid boxes (≈27 × 27 km at the equator).
A primary aim to assemble the HDIs uniformly over time and space, and across all global land grid-cells at high spatial resolution, motivated the choice of meteorological variables from a qualitycontrolled reanalysis data product in this study. Consequently, utilizing station point data (e.g., ECA&D [102]) or other gridded data products from either national (e.g., Brazil [103], India [104]), regional (e.g., E-OBS [105], ClimateNA [106]), or global meteorological data sources (e.g., GHCN [85]) at coarser resolutions were not considered in the preparation of HDIs presented in this article. Nonetheless, depending on the broader spatiotemporal application of the HDIs, researchers are encouraged to use other data sources in conjunction with the GLDAS based HDI_0p25_1970_2018 dataset presented here. The code utilized in this article for the preparation of HDIs is available upon request for rapid portability to other data sources. Table 3 summarizes the descriptive categories of thermal comfort levels across selective HDIs used in this article. It must be highlighted that while the HDIs at a particular location may have been recorded within their operational thresholds, certain HDIs by definition are more suited for particular climate zones than the others (e.g., DI which considers temperatures ranging from 24 to 27 °C as uncomfortable is intuitively more suitable for midlatitude countries with cooler climate compared to tropics). Table 3. Comparison of thermal perceptions for selective HDIs included in this article (adapted from [13,73] In the context of a historical dataset presented here, the global coverage of the HDI_0p25_1970_2018 offers users a unique opportunity to examine spatiotemporal variability using sophisticated techniques, such as the space-time perspective approach using rotated Empirical Orthogonal Functions (EOFs) (see Chen and Tung [107] for methodology adopted using rotated EOFs to examine global-mean surface temperature variability and associated large-scale teleconnections with El Nino-Southern Oscillation Index among others). HDI_0p25_1970_2018 can also facilitate examining the impacts of climate change on heat stress.
Moreover, the impact of extreme human discomfort levels could be more detrimental in densely congested regions, in particular to the socio-economically vulnerable people having little or no availability to counter-acting adaptation measures (e.g., usage of air-conditioning, see [14]). Accounting for population exposure across the extreme ranges of the HDIs over regional boundaries for instance (i.e., population weighted HDIs), can facilitate demographic stratification of sectoral impacts.
It is important to emphasize that while the HDIs are primarily designed for climate-health experts to gauge exposure risk to humans under different thermal environments, the spill-over effects are not necessarily limited to the health sector. For instance, a recent report by the International Labor Organization (ILO) [108] highlights heat related morbidity and mortality can have wider implications for labor productivity, economy, and social conditions. Similarly, the energy sector could also have implications with subsequent escalations in energy demands for maintaining thermal comfort levels, and shift in investments towards environmentally friendly modes of cooling.

Tools and Recommended Ways to Use the Dataset
NetCDF-4 facilitates data compression, portability, scalability, and is the de facto file format in the climate science community. It is compatible with a number of open-access software environments and desktop Geographic Information System (GIS) tools for generating quick plots, and for exploratory and geo-spatial data analysis. A non-exhaustive list of such freely available tools includes: (i) Panoply [109], and (ii) NCview [110] for generating quick plots and exploring .nc4 file data structure; while (iii) R [111], and (iv) Python [112] are recommended for data manipulation, exploratory and geospatial analysis, statistical/econometric modelling; (v) QGIS [113] and (vi) GRASS GIS [114] are recommended for users comfortable with graphical user interface (GUI) based environments.
Handling high resolution global gridded netCDF files can be computationally and memory intensive on standard machines with ≈4 GB of RAM. Whenever work scope requires focusing on smaller spatial subset of data (e.g., summer months-June to September-over a spatial domain covering China), users are first recommended to subset the .nc4 files prior reading them in certain software environments (e.g., R) where memory usage can be high. For such rapid data manipulation (spatiotemporal extraction etc.), as well as for exploring the variables, dimensions, units, attributes, etc., within the .nc4 file structure, users are recommended to employ command line routines made available in NCO and/or CDO. For e.g., NCO command "ncdump-h filename.nc4" or its equivalent CDO command "cdo sinfo filename.nc4" would allow a user to print the entire three-dimensional file structure. Readers are guided to the comprehensive user guides of NCO [61] and CDO [62] for reference commands.
Made available at daily timesteps, the HDIs are not recommended to be aggregated to other temporal scales (as highlighted in Section 4.3), such as weekly, monthly, seasonal, and annual timescales. While aggregation can be rapidly undertaken using NCO/CDO commands (as well as R, Python, etc.), multi-year daily statistics, and spatial aggregation to defined polygon boundaries (e.g., regions, provinces etc., using appropriate shape files) are instead more recommended ways when handling the HDIs. For instance, the native 0.25° gridded resolution can be re-gridded (interpolated) to either finer or coarser gridded resolution (e.g., 0.15° or 0.5°) using remapping techniques available in various geo-spatial tools, such as the CDO [62] remapping functions (e.g., "remapbil") or R raster [89] package. Users are cautioned though of a possible aggregation/disaggregation bias that may result because of remapping to different spatial scales.
For illustration, a sample R code ("HDI_sample_script.R") is provided in Supplementary Material to facilitate spatiotemporal sub-setting and aggregation of the DI over regional boundaries of Italy. The script does the following:

Work Planned for Future
The indices in HDI_0p25_1970_2018 are intended to be updated post-2018 years, subject to the availability of the required GLDAS raw meteorological variables in the coming years. The updated netCDF files will be made available to users on the same data repository, and users are encouraged to consult the author for any going updates prior accessing the existing data from the repository. In addition, spatiotemporal aggregation of the current set of HDIs for rapid implementation in impacts analyses, and development of a broader set of HDIs not included in this article are also envisaged in the near future.
While netCDF is more common and popular with the climate modeling community, and also facilities easier data storage and transfer, risk and impact assessment modelers often prefer data in easier and ready to use formats (e.g., ".csv", ".tiff") for implementation in their geo-spatial and statistical/econometric frameworks. Such data in other commonly used formats (e.g., ASCII, GeoTIFF) require additional data transformation and storage from the existing .nc4 files provided in the repository. In addition, users may also prefer an option to download selective years and/or regions instead of the heavier netCDF files.
To facilitate such a choice of downloadable data format, as well as user-defined spatiotemporal sub-setting of the HDIs to smaller domains, a graphical user interface (GUI) tool is planned for the benefit of the wider user community. Once implemented, the webpage URL of the tool will be indicated on the current PANGAEA repository.

Code Availability
All codes for assembling the HDIs (Table 1) and the associated secondary variables (Table 2), are available from the author upon request. The code for data generation of HDIs and secondary variables includes a combination of CDO [62] (ver 1.9.0) and NCO [61] (ver 4.3.4) commands written in Unix bash (shell) scripts, customized to run in a parallel super-computing infrastructure. Sample R code ("HDI_sample_script.R") discussed in Section 4.4, is provided for illustration in supplementary material. The code written in open source R [111] software (version 3.5.0-"Joy in Playing"), downloads, reads, analyses, transforms, plots, and saves the output of a sample HDI. All codes used in this study are developed on x86_64 Linux CentOS 6.6 software architecture but can be adapted to Windows as well as MacOSX.

Acknowledgments:
The author is grateful to Enrica De Cian (PI of ENERGYA), Hiroko Kato Beaudoing (NASA-GSFC) for clarifications on GLDAS data, Enrico Scoccimarro (CMCC), Jeremy Pal (CMCC), and Ian Sue Wing (Boston University) for constructive discussion on selective indices, Arthur Essenfelder (CMCC) for assistance with R demo code, and Paola Vesco (Uppsala University) for suggestions on the general structure of the manuscript. The preparation of the dataset was possible thanks to the high-performance computing resources and support staff of the (i) Boston University Shared Computing Cluster (SCC), and (ii) CMCC Athena, and the GLDAS data made publicly available by NASA Goddard Earth Sciences Data and Information Services Center (GES DISC). Developers of CDO, NCO, and R packages (used in this study) are also acknowledged for providing open-access tools used for data preparation, analysis, and facilitating sample codes used in this study. Any remaining errors in the dataset or the manuscript are those of the author.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A
Land-Water Mask in GLDAS (ver-2) data: As highlighted in the main text (Section 4.1.1), marginal gaps in data emanating from missing GLDAS input fields at or near water bodies is a minor limitation of the dataset presented in the study. In line with the discussion in the supplementary material of [87], the land-water mask for GLDAS is based on the MODIS 44W product at 0.01°; where the number of 0.01° pixels within 0.25° grid if greater than or equal to 50% are considered as "land" ( Figure A1). Thus, the potential loss of data in such grid-cells in the vicinity of water bodies, as discussed in main text is minimal. Figure A1. Land-water pixels in GLDAS 0.25° (ver-2) data (data source for map: [115]). Figure A1, grid-cells (shaded in blue) report missing data in GLDAS, as the fraction of pixels that fall within the 0.25° grid are ≤50%, and thus do not report meteorological parameters.