Use of the NASA Giovanni Data System for Geospatial Public Health Research: Example of Weather-Influenza Connection

The NASA Giovanni data analysis system has been recognized as a useful tool to access and analyze many different types of remote sensing data. The variety of environmental data types has allowed the use of Giovanni for different application areas, such as agriculture, hydrology, and air quality research. The use of Giovanni for researching connections between public health issues and Earth’s environment and climate, potentially exacerbated by anthropogenic influence, has been increasingly demonstrated. This communication describes the pertinence of several different data parameters to public health. It also provides a case study of the use of remote sensing data from Giovanni in assessing the associations between seasonal influenza and meteorological parameters. In this study, logistic regression was employed with precipitation, temperature and specific humidity as predictors. Specific humidity was found to be associated (p < 0.05) with influenza activity in both temperate and tropical climates. In the two temperate locations studied, specific humidity was negatively correlated with influenza; conversely, in the three tropical locations, specific humidity was positively correlated with influenza. Influenza prediction using the regression models showed good agreement with the observed data (correlation coefficient of 0.5–0.83).

ISPRS Int. J. Geo-Inf. 2014, 3 (Open Access)


Introduction
Investigation of connections between Earth's environment and public health issues can be considerably enhanced by the incorporation of remotely-sensed data. These investigations may comprise the examination of relationships between public health and primarily natural influences, such as meteorological and oceanic processes; an example would be the connection between water-borne diseases and heavy rainfall events, the latter potentially related to sea surface temperatures. Also included are processes affected by human activities, such as potentially harmful emissions into the atmosphere or water supply. An example of this relationship is the emission of sulfur dioxide (SO2) and nitrogen dioxide (NO2) by fossil fuel combustion for energy production. Geographical setting, climatological baselines, and global teleconnections may also be included in research with remotely-sensed data that has public health implications.
The National Aeronautics and Space Administration (NASA) has acquired a rapidly growing archive of Earth remote sensing data, originating with the Landsat and Nimbus satellite missions in the 1970s and continuing with increasingly ambitious and technologically advanced missions to the present, marked by the recent launches of the Global Precipitation Measurement (GPM) and Orbiting Carbon Observatory-2 (OCO-2) satellites.
Since its inception in 2003, the NASA Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni) system has provided access to a wide variety of NASA remote sensing data and other Earth science data sets, allowing researchers to apply selected data to a broad range of research topics. Currently hosted by the Goddard Earth Sciences Data and Information Services Center (GES DISC), Giovanni includes data from many different NASA missions and projects. An in-progress Advancing Collaborative Connections for Earth System Science (ACCESS) project titled "Federated Giovanni" will expand the data available in the system by including data from other NASA data centers.
This variety of data gives Giovanni marked potential for the investigation of different public health issues. One of Giovanni's primary attributes is ease of use; researchers who are generally unfamiliar with remote sensing data can use the system to find and employ data applicable to their topic area. Only a relatively short investment of time and effort is required to become proficient with the system. Correspondence with users in the public health research sector has indicated a high level of satisfaction with access to the data it provides, and with the capability of determining whether or not remote sensing data can be used in their particular research area.
Giovanni provides remote sensing data alongside several basic analytical capabilities, which include spatial maps of data variable values, difference maps, area-averaged time-series, animations, and vertical profiles of atmospheric variables. The mapping capability includes rapid averaging, so that mean values for months, seasons, or years can be visualized readily. All maps and plots generated by Giovanni can be immediately downloaded. Though it is not designed specifically as a data subsetting engine, for many data types Giovanni provides a relatively simple way to acquire spatially and temporally subsetted data, and it has been used for this purpose in numerous investigations. Giovanni is, ideally, a data exploration tool: data acquisition and preparation operations that used to require days or weeks can be performed in minutes, enabling more detailed analyses with considerably reduced time and effort. The Giovanni system is being transitioned from the current system, colloquially referred to as "Giovanni-3" [1], to a more flexible architecture, "Giovanni-4", that accelerates processing speed, adds new analysis capabilities, and consolidates all of the data variables into a single search interface, rather than separate portals. The in-development Giovanni-4 system has not yet been described in a publication, but is available for use at the GES DISC Web site.
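As an illustration of what an area-averaged time-series involves, the following minimal Python sketch computes a latitude-weighted mean for each time step from gridded values. It is not Giovanni's implementation: the function name `area_averaged_series`, the record layout, and the cosine-of-latitude weighting (one common convention for regular latitude-longitude grids) are all assumptions made for this example, and the data values are invented.

```python
import math
from collections import defaultdict


def area_averaged_series(records):
    """Return {time_label: weighted mean} for gridded values.

    records: iterable of (time_label, lat_deg, lon_deg, value) tuples.
    Each grid cell is weighted by cos(latitude), an assumed convention
    that approximates cell area on a regular lat-lon grid.
    Missing values (None) are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for time_label, lat, _lon, value in records:
        if value is None:
            continue
        weight = math.cos(math.radians(lat))
        weighted_sum[time_label] += weight * value
        weight_total[time_label] += weight
    return {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}


# Invented grid values for two time steps (hypothetical data).
records = [
    ("2010-07", 0.0, 100.0, 10.0),
    ("2010-07", 60.0, 100.0, 20.0),
    ("2010-08", 0.0, 100.0, 4.0),
    ("2010-08", 0.0, 101.0, None),  # missing cell, ignored
]
series = area_averaged_series(records)
for time_label in sorted(series):
    print(time_label, round(series[time_label], 3))
```

Weighting matters most for regions spanning a wide latitude range; for a small study box the weighted and unweighted means are nearly identical.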
Missions, instruments, or projects providing data products available in Giovanni which are useful for public health research include: the Atmospheric Infrared Sounder (AIRS); the Tropical Rainfall Measuring Mission (TRMM); the Ozone Monitoring Instrument (OMI); the Moderate Resolution Imaging Spectroradiometer (MODIS); the Modern Era Retrospective-analysis for Research and Applications (MERRA) project; the NASA Ocean Biogeochemical Model (NOBM); and both the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS).
Although Giovanni is easy to use, many researchers and applied science professionals need guidance on how to find the appropriate datasets and how to interpret them. The NASA Applied Remote Sensing Training (ARSET) program provides online and in-person training for professional audiences, including health specialists, on how to use NASA resources and data, including data sets hosted at the GES DISC through Giovanni. Training modules with step-by-step instructions can be found online [2]. ARSET also has online resources that can supplement the use of Giovanni by non-specialists in remote sensing.

Data Parameters in Giovanni Relevant to Public Health
Data in Giovanni can be categorized with respect to their applicability to public health issues. In the following, three tiers of applicability are presented: Tier 1, data that have a strong relationship to public health, and which are thus directly applicable in public health research; Tier 2, data that have indirect yet established relationships with an area of public health concern; and Tier 3, data that are related to weather or climate with an effect on public health and well-being.

Tier 1 Data Parameters
Tier 1 data types include the following.

Precipitation Data
Precipitation data find wide application in public health research. Precipitation occurrence has frequently been associated with waterborne diseases, insect population outbreaks, and disease transmission modes (e.g., shared water resources). Recent studies used Tropical Rainfall Measuring Mission (TRMM) daily data products to investigate the connection between rainfall and the location of cholera outbreaks in Haiti following a devastating earthquake [3]. Research on malaria transmission using remote sensing data frequently involves rainfall data. Malaria is a mosquito-borne disease, and since mosquitoes have an aquatic stage in their life cycle, mosquito populations are influenced by rainfall patterns. Kiang et al. [4] described research on malaria transmission patterns in Thailand, examining correlations with surface temperature, vegetation cover, and rainfall. Adimi et al. [5] described the potential for malaria risk prediction in Afghanistan. Both of these investigations accessed rainfall data products in Giovanni. Midekisa et al. [6] also used rainfall data to create early-warning models for malaria in Ethiopia.
Precipitation extremes also have public health effects: directly, due to the danger posed by flood waters; subsequently, due to damage to water utilities and freshwater sources affecting the water supply; and finally, due to increased potential for disease outbreaks caused by contaminated water. With regard to floods, Cools et al. [7] described the creation of a flash flood early warning system for Egypt that used precipitation data from Giovanni. Singh, Pandey, and Nathawat [8] used Giovanni to investigate the cause of the 2008 Kosi flood in India. GLDAS and NLDAS feature many different hydrological variables, including soil moisture and runoff in addition to precipitation. These variables can be used to study severe storms, snowmelt flooding, and drought intensity. In addition, cloud cover data can be correlated with changing precipitation patterns, and can be used for tracking severe storms and weather fronts.

Temperature Data
Temperature data, along with relative humidity, can also provide significant insight into public health concerns. Soebiyanto, Adimi, and Kiang [9] determined that temperature was a primary variable associated with seasonal influenza transmission. Surface temperature is a fundamental variable related to water resources, drought conditions, vegetation survival, insect overwintering survival, heat stress, and disease-vector species ranges. Giovanni has remotely-sensed land surface temperature data from the Moderate Resolution Imaging Spectroradiometer (MODIS), atmospheric temperature data from the Atmospheric Infrared Sounder (AIRS), model and assimilated model (GLDAS and NLDAS) temperature data, and high-resolution temperature data for specific regions. An example of such research was presented in Shen et al. [10], which described the pioneering data portal built for the Northern Eurasian Earth Science Partnership Initiative (NEESPI). Changes in this region, such as higher temperatures and increased fire outbreaks, were described.

Air Quality Data
Another area of public health concern is air quality. There are several regularly accessed variables related to air quality in Giovanni. Likely the most used are Aerosol Optical Depth (AOD) data products, which are acquired by MODIS and the Ozone Monitoring Instrument (OMI). AOD indicates the optical clarity of the atmospheric air column, with higher values indicating more scattering and absorption by particles and chemicals in the atmosphere. Because of the direct relationship between AOD and some kinds of air pollution, particularly the frequently monitored PM2.5 and PM10 particle size fractions, AOD data variables have been primary resources in many different studies. Two examples are Li, Shao, and Buseck [11] on the effects of biomass burning aerosols on haze in Beijing, China, and Lu et al. [12] on sulfur dioxide emissions and trends in eastern Asia. AOD also has been used to track the regional impact of smoke from wildfires, which can be transported hundreds of miles from its source (Figure 1). Prados et al. [13] provide a comprehensive review of the use of air quality-related data sets in Giovanni.
OMI is also an important source of other atmospheric chemistry data. The potential health significance of stratospheric ozone depletion is well-known, and OMI ozone data are integral to that research. OMI also provides a useful nitrogen dioxide (NO2) data product, which can be used to track wildfire locations and movement, as well as air pollution sources resulting from the combustion of fossil fuels. Sergei Sitnov has been a prolific user of Giovanni, using the system in several published papers on NO2 and air quality in Russia. One such study looked at the weekly pattern of air quality and its relationship to meteorology in the environs of Moscow [14]. Another air quality indicator chemical species is carbon monoxide (CO), for which data are acquired by AIRS.
It may not be immediately apparent why oceanic phytoplankton chlorophyll concentrations are useful for health-related research. However, this data type actually has one of the longest associations with public health of any that has been provided by the GES DISC. This is because Vibrio cholerae, the bacterial species responsible for cholera, has a stage in its life cycle when it infests copepods, zooplankton that feed on phytoplankton. Thus, flood-related blooms of phytoplankton can provide a fertile ground for the proliferation of copepods and V. cholerae. Coastal Zone Color Scanner (CZCS) data were used in the 1980s to examine a cholera outbreak related to a phytoplankton bloom in the Bay of Bengal. These data in Giovanni can be used for cholera research, and to examine vectors of seafood contamination ("red tides" and other Harmful Algal Blooms, HABs), fish mortality, and severe storm effects. The use of ocean remote sensing data to study cholera outbreaks has been described previously [15-17].

Tier 2 Data Parameters
Phytoplankton patterns also are related to fishery success or failure. Because fish constitute the major protein source for many coastal populations, these data too can have public health ramifications. Blooms also can indicate where fishing should be avoided; Van Holt showed that shellfish in areas with consistently elevated chlorophyll concentrations have more undesirable organisms clinging to their shells than those in lower-chlorophyll zones [18]. Euphotic depth, a measure of water clarity, has been used for water quality studies and reports, and can indicate offshore flood effects. Sea surface temperature (SST) is directly related to water quality and phytoplankton growth, and indirectly related to coastal precipitation, storms, flooding, and the health of coral reefs. The Caribbean SERVIR (Sistema Regional de Visualización y Monitoreo) program used MODIS SST from Giovanni extensively in its research report "Sea Surface Temperature Trends in the Caribbean Sea and eastern Pacific Ocean" [19], published in 2011 to provide a baseline study for the impacts of these events on the populations of countries in Central America, northern South America, and the Caribbean Sea.

Ozone Data
As noted earlier, OMI data are an obvious choice for examining stratospheric ozone depletion and the depth and extent of the Antarctic "ozone hole". But the Erythemal Daily Dose data product, which describes the impact of ultraviolet radiation exposure on humans, has been used in some unique ways. Serrano, Cañada, and Moreno [20] used this data product to quantify the dangers to youth skiers of significant exposure to ultraviolet radiation.

Vegetation Indices
NDVI and EVI, both indices of vegetation greenness and ground cover, are also potentially useful data types for health research, as is soil moisture. These indices indicate the extent and intensity of drought, and thus are related to water resources and agricultural success. Kiang et al. [4] employed the vegetation indices in modeling malaria occurrence in Thailand, as they are relevant to land use and mosquito breeding environments. High resolution (5.6 kilometer) NDVI and EVI data are currently available in the Monsoon Asia Integrated Regional Study (MAIRS) high resolution monthly data portal.

Tier 3 Data Parameters
Tier 3 data types may be related to weather and climate, with effects on public health and well-being. Many of these data types measure quantities that are important to water resources, such as snow depth, snow mass, and runoff. The current drought besetting western states of the United States, which some meteorologists describe as commencing in the year 2000, is having observable effects on snow in the mountain ranges, particularly those of California. As this will have ramifications for the management of water resources, and will also affect wetland areas, the use of Giovanni to monitor such changes may be warranted. Furthermore, heavy snows can lead to floods, which may be predictable from snow depth data and observable with runoff data. Trends in snow parameters also may be indicators of climate change impacts and shifts in freeze and melt timing. A Giovanni time-series prepared for the NASA Data Investigations for Climate Change Education (DICCE) project [21] demonstrated how Giovanni could be used by teachers and students in New Mexico. Figure 2 shows a 1979-2010 monthly snow mass time-series for the mountainous area of northern New Mexico, a major source area for the Rio Grande River. Reduced snow mass from 1995-2005 is clearly visible.
The data tiers described above are necessarily broad classifications. Data types can have varying relevance to particular diseases, and research conducted on the connections between diseases and environmental factors must consider the spectrum of potential relationships. For example, snow depth or mass is rarely important for malaria incidence, but it is an important variable for malaria in Afghanistan. NDVI is not related to influenza, but is important for many vector-borne diseases, while AOD can be significant for respiratory diseases (such as influenza), but not for vector-borne diseases. Regional (particularly coastal) SST has been shown to be very relevant to cholera, while basin-scale SST is related to rainfall, and thus may have a relationship to diseases with a precipitation or hydrological connection.

Influenza Example
The following example demonstrates how Giovanni was integral to the use of remote sensing data in studying the relationship between influenza and meteorological parameters, and further shows the capability of these parameters in predicting influenza activity. The burden of influenza, and how it is related to meteorological conditions, is described first.
Influenza is an acute respiratory infection that can rapidly spread worldwide in seasonal epidemics. It infects approximately 5%-15% of the world population and causes up to 500,000 deaths each year [22]. In the United States, the economic cost of influenza epidemics is estimated to be around US$71-167 billion per year [22]. The epidemic timing varies across latitude, suggesting a role for meteorological and environmental factors in influenza transmission. In temperate regions, influenza epidemics occur during the winter [23,24]. However, the seasonality and pattern of influenza epidemics in the tropics are less defined, ranging from year-round high influenza activity, through peaks that coincide with rainy seasons, to multiple peaks in a year [23,25-27]. Animal and laboratory studies have indicated that low temperature and humidity, consistent with wintertime conditions, provide suitable conditions for efficient transmission and longer virus survivorship [28,29]. In the tropics, rainfall is often associated with higher influenza activity, although the direct causal relationship remains unclear. It is postulated that rainfall promotes indoor crowding that, in turn, increases the probability of aerosol and contact transmission [30].
In this study, influenza occurrence in five countries with either temperate or tropical climates was analyzed. The countries we studied were the Netherlands and New Zealand in temperate climate zones, and the Philippines, Vietnam and Sri Lanka in tropical climate zones. Influenza data were obtained from the World Health Organization FluNet [31] for each country, covering at least 3 years (Figure 3). Precipitation data were obtained from NASA's TRMM via Giovanni (TRMM 3B42 product). Briefly, the TRMM 3B42 product combines precipitation estimates from TRMM and other satellites as well as gauge analysis to produce daily precipitation at finer scale [32]. Near-surface (2 m) specific humidity was obtained from the Global Land Data Assimilation System (GLDAS), also archived in Giovanni. The Giovanni system was used to automatically download and subset the data based on the rectangular boundary of the study area. Output from Giovanni was an ASCII file of the aforementioned geophysical parameters, tagged with latitude, longitude and time. Spatial and temporal averaging was then performed on the retrieved data, as described below. The ASCII output format made it easier to do post-processing with statistical software (R), where we developed our model. Hence, the Giovanni system allowed us to effectively retrieve the dataset without the need to download or store large HDF files. Ground stations were the source of minimum temperature data [33]. Due to the limitations of TRMM spatial coverage, precipitation data for the Netherlands were obtained from ground stations.
The weekly proportion of respiratory samples that tested positive for influenza served as the influenza activity indicator. Logistic regressions were then developed for each study location, with minimum temperature, precipitation and specific humidity (each averaged from the current week to the previous 3 weeks) as covariates. The previous two weeks of influenza activity, and a third-order polynomial function of the week number, were also included as covariates. Backward variable selection was then applied to obtain a parsimonious model (a model with as few covariates as possible). A more detailed description of the model can be found in Soebiyanto et al. [25].
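The covariate construction described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' R code: the function names (`trailing_mean`, `build_covariates`), the dictionary keys, and the sample values are hypothetical, and the sketch assumes that "averaged from current to previous 3 weeks" means an unweighted 4-week trailing mean and that the polynomial uses the raw week number.

```python
def trailing_mean(values, window=4):
    """Mean of the current value and the previous (window - 1) values.

    Returns None for positions without a full window of history.
    """
    means = []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)
        else:
            means.append(sum(values[i + 1 - window:i + 1]) / window)
    return means


def build_covariates(week, flu, tmin, prcp, sh):
    """Assemble one row per week with the response and all covariates."""
    tmin_avg = trailing_mean(tmin)
    prcp_avg = trailing_mean(prcp)
    sh_avg = trailing_mean(sh)
    rows = []
    # Start at index 3: the first week with a full 4-week window,
    # which also guarantees two earlier weeks for the lag terms.
    for i in range(3, len(flu)):
        rows.append({
            "y": flu[i],              # influenza positive proportion
            "tmin_avg": tmin_avg[i],  # 4-week mean minimum temperature
            "prcp_avg": prcp_avg[i],  # 4-week mean precipitation
            "sh_avg": sh_avg[i],      # 4-week mean specific humidity
            "flu_lag1": flu[i - 1],   # influenza activity one week earlier
            "flu_lag2": flu[i - 2],   # influenza activity two weeks earlier
            "week": week[i],
            "week_sq": week[i] ** 2,
            "week_cu": week[i] ** 3,
        })
    return rows


# Invented weekly values for a single location (hypothetical data).
week = [1, 2, 3, 4, 5, 6]
flu = [0.05, 0.10, 0.20, 0.30, 0.25, 0.15]
tmin = [20, 22, 24, 26, 28, 30]
prcp = [0, 4, 8, 4, 0, 8]
sh = [10, 12, 14, 16, 18, 20]

rows = build_covariates(week, flu, tmin, prcp, sh)
print(len(rows), rows[0])
```

Rows produced this way could then be passed to any logistic regression routine; dropping the first three weeks keeps every row free of missing covariates.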
Figure 3 plots the weekly influenza activity and meteorological parameters, averaged across the study period. Of the three tropical study locations, Vietnam had the largest variability in both minimum temperature and specific humidity. Of the temperate locations, the Netherlands had larger variability in these two parameters than New Zealand. Precipitation in the tropical locations showed varying seasonality, while it was evenly distributed throughout the year in the temperate locations. The plots of the influenza positive proportion (Figure 3) showed that influenza peaks during wintertime in the temperate locations: around February-March in the Netherlands (Northern Hemisphere) and July-August in New Zealand (Southern Hemisphere). At these times, both temperature and specific humidity were also at their minimum values (Figure 3). In the tropics, the seasonality was not as well-defined. On average in the tropics, higher influenza activity appears to be associated with higher temperature, specific humidity, and precipitation values.
Results from the logistic regression models (Table 1) indicated that minimum temperature was inversely associated (p < 0.05) with influenza activity in Sri Lanka (Odds Ratio (OR) = 0.59, 95% Confidence Interval (CI) = 0.39-0.90). Specific humidity was significantly associated (p < 0.05) with influenza activity in all locations, with varying relationships. Proportional associations were found in all three tropical locations (OR range of 1.13-1.47), while inverse associations were found in the two temperate locations (OR = 0.79 (0.67-0.95) in the Netherlands, and OR = 0.41 (0.29-0.58) in New Zealand). Here, proportional association indicates that an increase in the specified meteorological parameter was associated with an increase in influenza activity. Inverse association indicates that an increase in a meteorological parameter was associated with a decrease in influenza activity. Precipitation, meanwhile, was not significantly associated (p > 0.05) with influenza activity in any of the locations.

Table 1. Multivariate regression between influenza positive proportion and meteorological parameters. Bold font indicates significance at the α = 0.05 level, RMSE indicates root mean squared error, and Corr. Coeff. is the correlation coefficient between the observed and predicted influenza positive proportion. The models were adjusted for previous weeks' influenza activity, seasonality and other possible nonlinear relationships (modeled as a polynomial function, up to degree 3, of the week number). When OR is not shown, the variable was not selected by the backward selection (not included in the final model).

The inverse relationships between influenza activity and minimum temperature in Sri Lanka, and between influenza activity and specific humidity in the temperate locations, were consistent with experimental studies indicating that such conditions (low temperature and humidity) were suitable for longer influenza virus survivorship and more efficient transmission [28,29]. Findings on specific humidity in the tropical locations were in contrast to those in the temperate locations, but were consistent with other studies in the tropics [23,26,27]. The proportional association with specific humidity may indicate an indirect relationship with influenza activity, similar to precipitation. Indoor public places may provide opportunities for crowding when it rains or when humidity is high, and thus may enhance contact, aerosol, and droplet transmission.
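For readers less familiar with logistic regression, the model form and the meaning of the odds ratios in Table 1 can be written out as follows. This is a sketch based on the description in the text, not the authors' exact specification: all symbols are introduced here for illustration, and the Wald-type interval with the factor 1.96 is an assumption about how the 95% CIs were obtained.

```latex
% p_t: influenza positive proportion in week t
% \bar{T}_t, \bar{P}_t, \bar{H}_t: minimum temperature, precipitation, and
%   specific humidity, each averaged over weeks t-3 through t
% y_{t-1}, y_{t-2}: influenza activity in the previous two weeks
% w_t: week number
\begin{align}
\operatorname{logit}(p_t) = \ln\frac{p_t}{1 - p_t}
  &= \beta_0 + \beta_T \bar{T}_t + \beta_P \bar{P}_t + \beta_H \bar{H}_t
     + \phi_1 y_{t-1} + \phi_2 y_{t-2}
     + \sum_{k=1}^{3} \gamma_k w_t^{k} \\
\mathrm{OR}_j &= e^{\beta_j}, \qquad
\text{95\% CI} \approx \exp\!\left(\beta_j \pm 1.96\,\mathrm{SE}(\beta_j)\right)
\end{align}
```

Under this form, an OR below 1 (negative coefficient) corresponds to the inverse association described above, and an OR above 1 (positive coefficient) to the proportional association; an interval that excludes 1 corresponds to significance at the 0.05 level.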
We found an association with minimum temperature only in Sri Lanka, and not in the other tropical or temperate study locations. In the temperate region, minimum temperature is often highly correlated with specific humidity. Hence, minimum temperature could be related to influenza in a similar fashion to specific humidity, but our model did not select this parameter, as it may not give the best model performance. This was consistent with another study showing that the relationship between influenza and temperature in the temperate United States was not as statistically strong as that between influenza and absolute humidity [24]. Meanwhile, temperature in the tropics typically remains relatively similar throughout the year, without a strong seasonal pattern. It was therefore not associated with influenza, as was also observed in another study for tropical regions [23].
The resulting models (one model for each study location) were then used to predict influenza activity during the final year of the data (Figure 4). These data were not used in training the models. The estimated influenza activity reasonably followed the observed curves. Root mean squared errors (RMSE, Table 1) were less than 0.15 (influenza activity in this study was expressed as a proportion). The correlation coefficients (Table 1) between the estimated and observed influenza activity showed good agreement between the two (mean = 0.7, range = 0.5-0.83).
In conclusion, this analysis showed specific humidity to be an important determinant of influenza activity across the climate zones. In temperate climate zones, influenza activity increased with decreasing specific humidity; the reverse was observed in the tropical locations. The former is consistent with low temperatures and low humidity occurring in winter, when influenza activity is elevated in temperate climate zones. It was also demonstrated, in Figure 4, that regression models which include meteorological, seasonal, and autoregressive inputs can be used to predict influenza activity relatively well. Hence, it is possible to use projected influenza activity as a guide for planning future prevention efforts. Short-term weather forecasts can be used to estimate influenza in the following week, whereas climate models can be used to assess how influenza activity and timing may change with climate over the next decades.
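The two agreement measures used above, RMSE and the correlation coefficient, can be computed as in the following minimal Python sketch. The function names and the observed and predicted values are invented for illustration, and the sketch assumes the correlation coefficient is Pearson's r; it does not reproduce the study's numbers.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented weekly influenza positive proportions (hypothetical data).
observed = [0.10, 0.20, 0.40, 0.30]
predicted = [0.15, 0.25, 0.35, 0.25]

error = rmse(observed, predicted)
corr = pearson_r(observed, predicted)
print(f"RMSE = {error:.3f}, r = {corr:.3f}")
```

Because influenza activity is a proportion between 0 and 1, the RMSE is on the same scale, which is why a threshold such as 0.15 can be read directly as an error in the positive proportion.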

Conclusions
The multitude of remotely-sensed data parameters available in Giovanni provides many potential research opportunities for examining relationships between environmental factors and public health issues. Our example, an investigation of possible relationships between influenza activity and three variables (temperature, specific humidity, and precipitation), demonstrates how having such data types readily available for rapid subsetting and download in Giovanni can facilitate research by those who are not necessarily familiar with the details of satellite remote sensing. This example does not fully demonstrate the capabilities of Giovanni's basic analytical functions, such as data variable maps, time-series, or Hovmöller diagrams, which can also be used in research. With due attention to the possibility of non-causal statistical correlations between environmental data and public health data, Giovanni provides an easy and rapid way to access and use NASA's Earth science resources in the health science sector. Furthermore, as climate change results in differing patterns of disease transmission and occurrence, Giovanni's analytical capabilities can be exploited to observe related changes in environmental factors.

Figure 1. MODIS Aerosol Optical Depth (AOD) image showing the large area of elevated aerosol concentrations northeast of Moscow (yellow), stemming from massive wildfires that erupted in the hot summer of 2010. The daily AOD data were acquired for the period 27-31 July 2010, and averaged over this time period with Giovanni.

Figure 2. Monthly time-series of Modern Era Retrospective-analysis for Research and Applications (MERRA) snow mass data, plotted with Giovanni, for the central mountainous region of northern New Mexico, USA.

Figure 3. Weekly influenza positive (in %) and meteorological parameters averaged across study period. Bar plot shows the percentage of influenza positive. TMIN is minimum temperature (°C), SH is Specific Humidity (g/kg) and PRCP is precipitation (1 cm).

Figure 4. Regression model predictions of influenza positive proportion during the indicated period. The black line is the observed data (validation dataset, not used in training the models), and the red line is the model prediction, with grey shading indicating the 95% Confidence Interval (CI).