1. Introduction
Satellite observations of aerosol optical properties, such as the aerosol optical depth (AOD), are increasingly being used to infer spatial and temporal patterns of fine-mode particulate matter, PM2.5, for health studies [1]. However, significant challenges associated with the use of these observations remain. A large proportion of satellite observations are missing (estimated at ~70% in the 10 km AOD products), chiefly as a result of cloud-cover, snow-cover, and surface brightness [2,3]. Previous work to address this gap-filling problem has largely assumed that the observed aerosols are comparable to aerosols that could not be observed [4,5]. Contradicting this assumption, global and US-centric studies have estimated that missing satellite observations result in an underestimation of true PM2.5 concentrations, by an average of 20% in the US [6,7]. Additional work has demonstrated that missing satellite data results in over-prediction of ground-level PM2.5 concentrations in the summer months and under-prediction in the winter months at higher latitudes [8,9]. More recent work has gone beyond this to examine the contribution of certain drivers, namely the impact of cloud-cover, on PM2.5 concentrations at ground level and associated changes in the composition of particulates [10]. The authors found that increased quantities of cloud-cover and increased cloud optical depth were associated with both compositional changes in PM2.5 and an overall decrease in concentrations in the southeastern US. These findings suggest that cloud-cover is associated with changes in ground-level PM2.5 concentrations and composition. Non-random missingness in satellite retrievals, if not accounted for during exposure estimation of PM2.5, can bias health effect estimates in subsequent analyses [11].
Through complex physical and chemical processes, clouds influence the composition, vertical distribution, diurnal patterns, size distribution, and mass concentration of the aerosols beneath them [12]. At the macro scale, clouds are associated with meteorological conditions that govern the micro- and macro-physical properties of both clouds and aerosols, as well as temperature, humidity, wind speed, vertical convection, and planetary boundary layer height [13,14]. All of these can influence particulate concentrations at the ground level by altering rates of deposition, vertical distributions, emissions, and rates of secondary aerosol formation [15]. Relative humidity and temperature additionally interact to influence rates of both cloud and aerosol formation, the properties and phase of the clouds, and gas-particle partitioning of aerosol components [13,15,16,17]. On a more localized scale, clouds, particularly thunderstorms, alter vertical and horizontal convection, block light, and occasionally produce rain. Changes in convection directly influence vertical distributions of aerosols beneath the cloud, as well as rates of dry deposition [18,19]. Light blockage alters rates of the photochemical reactions responsible for secondary aerosol formation from gaseous precursors in the atmosphere, indirectly altering aerosol composition and concentrations nearer the ground [20,21]. A small fraction of clouds precipitate, in the process depositing airborne aerosols within and beneath the cloud to the ground [22,23]. Near and within the actual cloud, aerosols participate in the process of cloud formation via nucleation scavenging, and can reduce the effective radius of the cloud particles and alter precipitation efficiency [24,25]. Taken together, the result is a complex tangle of interrelationships between clouds, aerosols, and meteorology, which leads to different aerosol concentrations and composition beneath cloud-cover relative to those observed when the sky is clear.
The combined impact of these processes on ground-level PM2.5 has not been directly studied or linked to measurable properties of the clouds themselves. The current study aims to advance our understanding of whether, and to what extent, satellite-retrieved cloud properties are associated with changes in ground-level PM2.5 concentration and composition. We examine the empirical relationships between ground-level concentrations of PM2.5 from area ground monitors and both cloud properties and the meteorological conditions associated with cloud presence. The analysis covers two urban sites in the US, Atlanta and San Francisco, chosen as representative of different aerosol and meteorological regimes. We additionally apply these relationships to account for cloud-cover related missing PM2.5 estimates when using AOD to predict ground-level PM2.5. We compare results from a model that assumes the reason an AOD observation is missing is random to one that accounts specifically for cloud-cover missingness as a distinct phenomenon.
2. Materials and Methods
Environmental Protection Agency (EPA) ground observations of 24-h total and speciated PM2.5 concentrations between 1 April 2007 and 31 March 2015 were obtained from the EPA’s AirData website [26]. Daily ground observations were used to represent the daily gravimetric mass concentrations at individual stations. Mass reconstruction was used to calculate concentrations of organic carbon (OC), sulfate, nitrate, elemental carbon (EC), sea salt, and soil to account for unmeasured molecules in the speciation information and ensure that changes in the speciated masses, and model estimates, would be comparable to changes in the matched gravimetric measurements [27]. This aids interpretation by allowing direct comparison of changes in component masses to changes in gravimetric masses. Results are only presented in the paper for the reconstructed OC, sulfate, and nitrate mass concentrations. The Chemical Speciation Network (CSN) EC and OC carbon fractions were additionally corrected for differences between Thermal Optical Transmittance (TOT) and Thermal Optical Reflectance (TOR) monitors, following previous work [28].
Monitors located within the study areas surrounding San Francisco and Atlanta, displayed in Figure 1, were collocated with additional data products. The 1 × 1 km twice-daily MAIAC AOD product, with a retrieval accuracy that is comparable to the ±(0.05 + 0.15 × AOD) error envelope of the 10 km MODIS AOD products in validation studies, was used to obtain information on AOD and cloud presence/absence, as calculated using the slightly different screening criteria used for aerosol products relative to cloud products [29,30]. The twice-daily MODIS Collection 6 daytime cloud product (M*D06) was used to obtain information on cloud emissivity, cloud optical depth (OD), cloud effective radius, and cloud phase [31]. Of these, cloud emissivity, comparable to cloud fraction, and cloud phase are available at 5 km resolution at nadir, while cloud radius and cloud optical depth are available at 1 km resolution at nadir. The 13 × 13 km hourly Rapid Update Cycle (RUC) model and its successor, the Rapid Refresh (RAP) model [32,33], were used to obtain meteorological data on convective available potential energy (CAPE), wind speed, relative humidity (RH), planetary boundary layer (PBL) height, temperature, and precipitation rates in the pixel nearest to each EPA monitoring station during the hours in which twice-daily MODIS pass times from Terra and Aqua occurred. The RUC/RAP meteorological model represents a continuous time-series of moderate resolution assimilated meteorological data, and is known to accurately reproduce vertical profiles of temperature, humidity, and wind speed, all of particular importance to this application [33].
Collocations of satellite and modeled products with EPA observations were processed in a stepwise fashion, starting with MAIAC, so that AOD missingness could be defined separately from its associated climatic conditions and to account for differences in the spatial resolution of each product. First, each 24-h gravimetric EPA observation was matched to the nearest MAIAC pixel within 1 km of the station and defined as AOD missing or present. Using the Quality Assurance (QA) code, we further defined each missing AOD value as missing as a result of cloud or of another reason, such as snow-cover or a fire hot spot. Observations with AOD missing as a result of cloud-cover were then matched to MODIS cloud parameters averaged within a 10 km radius of each EPA observation, and to the nearest RUC/RAP observation. Observations where discrepancies existed between the MODIS cloud parameters and the RUC/RAP results on precipitation rate were classified as possibly cloudy, with the remaining cloudy pixels classified according to the cloud phase information from MODIS. This collocation process was repeated separately for both Aqua and Terra MODIS overpasses. Observations were sorted into five categories: definitively uncloudy, possibly cloudy, definitively cloudy but with no phase determination for the cloud, ice clouds, and water clouds. The possibly cloudy and cloudy but of an uncertain phase categories were collapsed in the later analysis into the possibly cloudy category, and definitively uncloudy observations were not analyzed.
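The categorization step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields are hypothetical, and the text does not say exactly what counts as a discrepancy between MODIS and RUC/RAP, so the sketch assumes that nonzero RUC/RAP precipitation without a MODIS cloud detection is the discrepant case. It also assumes that observations with AOD missing for non-cloud reasons are left out of the analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collocation:
    """One monitor-day matched to MAIAC, MODIS cloud, and RUC/RAP (hypothetical fields)."""
    aod_present: bool            # MAIAC AOD retrieved within 1 km of the monitor
    missing_due_to_cloud: bool   # MAIAC QA code attributes the missing AOD to cloud
    modis_cloudy: bool           # MODIS cloud product reports cloud within 10 km
    modis_phase: Optional[str]   # "ice", "water", or None when undetermined
    rap_precip_rate: float       # RUC/RAP precipitation rate at the overpass hour


def classify(obs: Collocation) -> str:
    """Assign one observation to a category before any collapsing."""
    if obs.aod_present:
        return "aod_present"
    if not obs.missing_due_to_cloud:
        return "missing_other"   # e.g. snow-cover or fire hot spot
    if not obs.modis_cloudy:
        # ASSUMPTION: rain reported by RUC/RAP without a MODIS cloud is the
        # "discrepancy" case; the source does not define it precisely.
        return "possibly_cloudy" if obs.rap_precip_rate > 0 else "uncloudy"
    if obs.modis_phase in ("ice", "water"):
        return obs.modis_phase
    return "cloudy_no_phase"


def analysis_category(obs: Collocation) -> Optional[str]:
    """Collapse categories as in the later analysis; None means not analyzed."""
    label = classify(obs)
    if label == "cloudy_no_phase":
        return "possibly_cloudy"
    if label in ("uncloudy", "missing_other"):
        # ASSUMPTION: non-cloud missingness is dropped along with uncloudy cases.
        return None
    return label
```

Running the same function once per overpass (Terra and Aqua) would mirror the separate collocation passes described in the text.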
In preliminary analyses, a linear mixture modeling approach was used to examine the nature of the relationship between ground-level PM2.5 and cloud properties [34,35]. A number of categorical variables were tested as conditioning variables for grouping PM2.5 values into sub-populations. The conditioning variables included cloud top height, cloud phase, multi-layered cloud flag, the interaction of cloud phase and cloud height, and the interaction of multi-layered cloud flag and cloud top height. Of these, the lowest AIC (Akaike information criterion) value was obtained when using cloud phase as the conditioning variable. Since a mixture model with hard separation of components using a categorical variable is statistically very similar to a set of independent models, the final results presented here correspond to simpler, linear mixed effects models run independently for each modeling category.
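As a brief illustration of the selection step, the sketch below computes AIC = 2k − 2 ln L for several candidate conditioning variables and picks the lowest. The log-likelihoods and parameter counts are made-up placeholders chosen only so the example runs; they are not results from this study, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits: Dict[str, Tuple[float, int]]) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the lowest AIC and all AIC scores."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# PLACEHOLDER values (log-likelihood, number of parameters), not study results.
candidate_fits = {
    "cloud_phase": (-1200.0, 12),
    "cloud_top_height": (-1215.0, 12),
    "multilayer_flag": (-1230.0, 10),
}

best, scores = best_by_aic(candidate_fits)
print(best, scores[best])
```

AIC penalizes each extra parameter by 2, so a candidate with more parameters is preferred only if it raises the log-likelihood by more than one unit per added parameter.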
Specifically, four separate models were fit to the natural log of the 24-h PM2.5 mass concentration: one for each of the two cloud phases (ice and water), one for all observations where AOD was not missing, and one for all other observations where AOD was missing as a result of possible cloud-cover. These were fit at each study location and for each overpass time, making a total of 16 independent models. PM2.5 concentrations were log-transformed to normalize the data distribution for these linear models. Results for the possibly cloudy models are presented only in the supplementary materials. The models for cloudy observations included as predictors RH, wind speed, temperature, PBL height, CAPE, precipitation rate, cloud radius, cloud OD, and cloud emissivity. The model for observations where AOD was not missing included only the meteorological parameters RH, wind speed, temperature, PBL height, and CAPE. All models additionally included random intercepts for each day of the study period to control for seasonal effects. The equation for this model, used throughout the paper, is given in Equation (1). Here, the natural log of the PM2.5 observation at each location (j) and time (i) is modeled using a random intercept for each day (βi) and a fixed effect slope (γk) for each of k predictors (X), plus a random Normal error component (ε).
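As a sketch of the model just described, one form consistent with the stated terms is given below. The subscripts on X, the error variance σ², and the absence of a separate fixed intercept are assumptions, since the text defines only βi, γk, X, and ε.

```latex
\ln\left(\mathrm{PM}_{2.5,ij}\right) = \beta_i + \sum_{k} \gamma_k X_{kij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N\left(0, \sigma^2\right)
```

Because the response is log-transformed, a fitted slope γk corresponds to a multiplicative change of exp(γk) in PM2.5 per unit increase in the k-th predictor, or a change of 100 × (exp(γk) − 1) percent.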
The same linear mixed effects models (Equation (1)) used to model the impact of cloud cover and meteorological conditions on PM2.5 mass were used to model the various PM2.5 components, with the goal of identifying each component’s relative contribution to the change in total mass. Models were fitted to the natural log of the reconstructed mass of the three largest components: sulfate, nitrate, and organic carbon.
We then conducted a case study using a MAIAC AOD-PM model to estimate daily PM2.5 where AOD was available. Where AOD was not available, PM2.5 values were left missing in the ungap-filled model, filled in using Equation (3) in the Harvard gap-filled model, and filled in using Equation (1) in the Cloud gap-filled model. We examined differences between the ungap-filled, Harvard gap-filled, and Cloud gap-filled models in the spatial distribution of aerosols for an example month, January 2012, at the San Francisco site, using models fit to EPA data over the period from 2012 to 2014 to predict PM2.5. We compared daily predictions made using an ungap-filled model to one that assumes missingness is random (Harvard gap-filled) and to one that assumes cloud-driven missingness (Cloud gap-filled). To accomplish this, the MODIS cloud product and RUC/RAP observations were gridded to the 1 × 1 km MAIAC grid used as the predictive surface for PM2.5. The MODIS cloud product was gridded using a method that reconstructs the MODIS polygons using a Voronoi tessellation algorithm from the midpoint locations for each pixel in a granule [3]. These reconstructed polygons were then matched to the MAIAC grid by area to account for the fisheye effect still present in the MODIS cloud product, where pixels towards the edges of the granule are larger than those in the center. The 1 × 1 km MAIAC grid cells were then matched to the nearest ~13 × 13 km RUC/RAP observation. For pixels where AOD was present, a standard prediction model published in previous works (Equation (2)) was used to predict PM2.5 from AOD [36].
Where MAIAC AOD was absent, Equation (1) was used to impute the missing PM2.5 values. We additionally compared results to those obtained over cloudy pixels from an adaptation of the gap-filling model developed by researchers at Harvard, which assumes that all types of missing AOD observations are comparable (Equation (3)) [5,37]. All three models fit a first-stage model to obtain ground-level PM2.5 estimates over all times and locations where AOD exists (Equation (2)). In Equation (2), daily PM2.5 is modeled using a mixed effects model with fixed (α) and daily random (ut) intercepts, and fixed (βʹ1) and daily random (vt) slopes for AOD. We additionally included fixed slopes (βʹ2k) for each of k meteorological variables (MetVars), which included RH, PBL height, temperature, and wind speed, as well as fixed slopes (βʹ3–6) for spatial variables including road length, forest cover percentage, point emissions, and elevation. Equation (2) additionally accounts for error in space and time (εʹst), with the daily random effects (ut, vt, wt) assumed to follow a multivariate normal distribution centered at zero, N[(0,0,0), ψ]. The Harvard gap-filled model predicts missing PM2.5 via the use of Equation (3), while the gap-filling model utilized in this work accounts for cloud cover by predicting missing PM2.5 using Equation (1). Equation (3) models the square root of the PM2.5 concentration at each location (s) and time (t), which constrains back-transformed estimates to be positive. The model includes an intercept (αʹ), a slope (βʺ1) for the square root of the daily mean PM2.5 concentration over the study area, and a spatial smoother (s(Xs, Ys)) fit separately for each month of the year, with a random error term (εʺst). The R statistical computing language was used to fit all models, relying on the packages mgcv and lme4 [38].
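The way the three case-study variants differ can be sketched as a per-pixel dispatch. The Python below is a hedged illustration only: the function and argument names are hypothetical, the fitted models are stand-in callables rather than the mixed models actually fit in R, and it assumes Equation (2) predicts PM2.5 on its natural scale while Equations (1) and (3) predict ln(PM2.5) and sqrt(PM2.5), respectively.

```python
import math
from typing import Callable, Optional

Predictor = Callable[[dict], float]


def predict_pm25(pixel: dict, mode: str, aod_model: Predictor,
                 cloud_model: Predictor, harvard_model: Predictor) -> Optional[float]:
    """Predict PM2.5 for one grid cell under one of three gap-filling variants.

    mode is "ungap", "harvard", or "cloud" (hypothetical labels).
    """
    if mode not in ("ungap", "harvard", "cloud"):
        raise ValueError(f"unknown mode: {mode}")
    if pixel.get("aod") is not None:
        # First-stage model (Equation (2)), shared by all three variants.
        return aod_model(pixel)
    if mode == "cloud":
        # Equation (1) is on the log scale, so exponentiate the prediction.
        return math.exp(cloud_model(pixel))
    if mode == "harvard":
        # Equation (3) is on the square-root scale, so square the prediction.
        return harvard_model(pixel) ** 2
    return None  # ungap-filled: leave the cell missing


# Stand-in predictors with TOY coefficients, for illustration only.
toy_aod = lambda p: 5.0 + 20.0 * p["aod"]
toy_cloud = lambda p: math.log(8.0)
toy_harvard = lambda p: 3.0

clear = {"aod": 0.2}
cloudy = {"aod": None}
print(predict_pm25(clear, "cloud", toy_aod, toy_cloud, toy_harvard))
print(predict_pm25(cloudy, "ungap", toy_aod, toy_cloud, toy_harvard))
```

Both back-transformations guarantee non-negative estimates, which is the stated reason for the square-root response in Equation (3).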
4. Discussion
We examined the relationship between cloud presence and ground-level PM2.5 mass and speciation, linking changes in concentration to cloud properties and meteorological conditions. We found that, overall, cloud presence can lead to fairly substantial over- or under-prediction of PM2.5 concentrations and differences in the spatial patterns of pollutant concentrations when using satellite-observed AOD to estimate ground-level concentrations.
The impact of relative humidity on PM2.5 was both negative and consistent between sites, overpass times, and cloud types. However, results differed by species, with estimates for RH that were negative and largest in magnitude for organic carbon. This implies that most of the changes in total PM2.5 mass that were associated with relative humidity result specifically from a decrease in the organic carbon fraction. A likely explanation for this is an increase in the photo-oxidation rates for aromatic hydrocarbons with decreasing humidity [16]. The fact that this association was stronger for organics at the San Francisco site, where NOx concentrations are higher and relative humidity tends to be lower on average, but stronger for gravimetric PM2.5 at the Atlanta site, which is known for its high isoprene emissions, also supports this explanation.
The impacts of PBL height and horizontal wind speed on ground-level concentrations of PM2.5 were consistently negative, excepting estimates for the association between PBL height and nitrates in Atlanta, implying that increased wind speeds and PBL heights were associated with decreases in PM2.5 concentrations. CAPE, an indicator of vertical stability, was more often associated with increases in ground-level concentrations of PM2.5, implying increases with decreasing convective energy, although this association was not uniform across models. This, in addition to the nitrate results, suggests that future work on this topic should include consideration of vertical convection and distribution of aerosols, as these may also change under cloudy conditions.
Increasing cloud OD, a marker of light blockage from cloud cover, and cloud emissivity, an indicator of the quantity of cloud present, were significantly associated with changes in nitrate, sulfate, and organic carbon concentrations. At both sites, we observed decreases in sulfate and total mass with increasing cloud OD when ice clouds were present. This is consistent with previous results [10] and with an impact specifically from blockage of light to the surface during sunny/fair weather conditions that would otherwise be conducive to the photochemical production of sulfate from gaseous sulfur dioxide [20,39]. Results for water clouds and for nitrate and OC were not consistent between sites, however, and interpretation of these results is less straightforward. This interpretation is further complicated by the fact that cloud-aerosol interactions go both ways: aerosols have the potential to reduce cloud droplet radii, and thus alter emissivity and OD [21,24]. We had expected to observe an increase in nitrate concentrations with increasing cloud OD or cloud amount, but instead only observed a decrease in nitrate concentrations under afternoon ice clouds in San Francisco. One possible explanation is noise from precipitation events associated with darker cloud-cover that were missing from our precipitation variable. Similarly, we observed an increase in the OC mass with morning water cloud OD at the Atlanta site and with emissivity at the San Francisco site. These results point to changes in rates of secondary organic aerosol formation associated with light blockage. Recent research points to nitrate-driven nighttime oxidation of isoprene and other volatile organic compounds that is more rapid than the photo-oxidation routes available during daytime; this could explain the increase in concentration with increasing light blockage during the morning hours, when nitrate could still be present [20].
Precipitation, via the process of wet deposition, is associated with an overall decrease in PM2.5 mass that is larger in magnitude for soluble than for non-soluble PM species [40]. This was observed in our data consistently for ice clouds, which tended to precipitate more, and to some extent for water clouds. The impact of precipitation at the time of the overpass in San Francisco was also larger than that observed in Atlanta. Reasons for this could include the fact that we used a precipitation indicator instead of the precipitation rate, and that it rains more frequently in Atlanta than San Francisco, making the capture of rain during a MODIS overpass time less important relative to 24-h pollutant concentrations.
Finally, we observed a few important differences between sites. Overall, cloud-cover properties and observations at the time of the MODIS overpasses had greater explanatory power in San Francisco than in Atlanta. This was evidenced both by the significance of the cloud OD, cloud emissivity, cloud radius, and precipitation predictors in the models, as well as by the R² values presented in Table 3. The case study included in our results additionally demonstrates that accounting for cloud-cover in a gap-filling model produces differences in monthly results that can be substantial. The observed differences may also stem from the frequency of cloud cover.
We had expected that a large proportion of MAIAC retrievals for AOD would be missing; however, a smaller proportion than expected had consistent information on cloud properties between products. Hence, this study was only able to investigate associations for around 50% of the missing AOD observations, limiting the generalizability of conclusions. To mitigate this issue, we have made an effort in the discussion to only highlight results that were consistently observed across the models. However, this also underscores the importance of possible cloud contamination as a source of uncertainty in estimation of ground-level PM2.5 from satellite retrievals and is a potentially important area for future research.
5. Conclusions
This study demonstrated that clouds are associated with changes in ground-level PM2.5 concentration, and these changes are driven by physical and chemical processes associated with cloud cover. We additionally demonstrated that the impact of cloud-driven satellite missingness on our ability to make accurate PM2.5 estimates over a surface using these data differs by location. Not accounting for cloud cover and associated meteorological conditions, particularly rainfall, can lead to both over- and under-estimation of PM2.5 concentrations. However, additional work is still needed to confirm and clarify the relationships investigated here, particularly regarding the nature of and reasons for the geographic differences observed in these relationships.
Associations between meteorological variables and PM2.5 total mass and constituents showed variability across pollutants, cloud types, and locations, but a few important findings stood out. We found that relative humidity is associated with a decrease in the organic component of PM2.5 resulting from the humidity dependence of rates of secondary organic aerosol formation. Also, precipitation and changes in rates of secondary aerosol production, indicated by increased cloud OD or cloud emissivity, affect the concentration and speciation of aerosols underneath the clouds.
Our analyses also suggested that not all clouds and locations can be considered equal, and that cloud presence observed at a specific time of the day generally matters more in San Francisco than in Atlanta. In San Francisco, we conducted a case study demonstrating changes in spatial patterns of air pollution at the monthly level that were associated with cloud-cover.