Evaluation of Using Satellite-Derived Aerosol Optical Depth in Land Use Regression Models for Fine Particulate Matter and Its Elemental Composition

: This study introduced satellite-derived aerosol optical depth (AOD) in land use regression (LUR) modeling to predict ambient concentrations of ﬁne particulate matter (PM 2.5 ) and its elemental composition. Twenty-four daily samples were collected from 17 air quality monitoring sites (N = 408) in Taiwan in 2014. A total of 12 annual LUR models were developed for PM 2.5 and 11 elements, including aluminum, calcium, chromium, iron, potassium, manganese, sulfur, silicon, titanium, vanadium, and zinc. After applied AOD and a derived-predictor, AOD percentage, in modeling, the number of models with leave-one-out cross-validation R 2 > 0.40 signiﬁcantly increased from 5 to 9, indicating the substantial beneﬁts for the construction of spatial prediction models. Sensitivity analyses of using data stratiﬁed by PM 2.5 concentrations revealed that the model performances were further improved in the high pollution season.


Introduction
Many epidemiological studies have revealed that exposure to particulate matter with an aerodynamic diameter less than 2.5 µm (PM 2.5 ) is associated with adverse health effects [1][2][3][4]. In recent years, an increasing number of studies focused on source-specific PM 2.5 and associated composition because they may affect human health more specifically [5][6][7][8][9]. For example, wood combustion, which is mainly characterized by potassium (K), was found having a strong association with mortality [8].
To predict the exposure to air pollutants, land use regression (LUR) models were developed based on air pollutants acquired at numerous locations and surrounding land use information extracted using geographic information system (GIS) as predictor variables [10,11]. On the basis of variables selected in the final models, critical sources of air pollutants can be identified [12][13][14][15].
The LUR model has been employed to predict the spatial distribution of PM 2.5 -bound elemental composition in previous studies [16][17][18][19][20][21][22][23]. However, most of these studies were conducted in the United States, Europe, and Australia. Because culture-specific sources (e.g., burning of joss sticks and papers in Chinese temples) may contribute to PM 2.5 composition, developing the LUR model to study PM 2.5 composition sources in Asia is crucial. Hsu et al. [17] developed LUR models using the satellite-derived data of vegetation index for PM 2.5 composition based on repeated measurements at six monitoring sites in Taiwan.
The composition with good model performance (R 2 ≥ 0.50) include ammonium (NH 4 + ), sulfate (SO 4 2-), nitrate (NO 3 -), organic carbon (OC), elemental carbon (EC), barium (Ba), manganese (Mn), copper (Cu), zinc (Zn), and antimony (Sb). Nevertheless, the limited number of monitoring sites might restrict its applicability to predict the spatial variation of PM 2.5 composition in a wide area. Furthermore, the utility of other satellite data (e.g., aerosol optical depth, AOD) was not explored in Hsu et al.'s study. In general, land use information indicates the surface emission solely while the satellite AOD includes the transport part of pollutants.

PM 2.5 Sample Collection and Chemical Analysis
Filter-based PM 2.5 samples used in this study were collected in Taiwan, which has an approximate population of 23 million and an area of 36,000 km 2 . Here, 24 daily samples (two samples per month per site in 2014) from 17 air quality monitoring sites, total 408 samples, were obtained from Taiwan Environmental Protection Administration (TEPA) (www.epa.gov.tw) and applied for chemical analysis. Figure 1 shows the locations of sites. Hsu et al. [17] developed LUR models using the satellite-derived data of vegetation index for PM2.5 composition based on repeated measurements at six monitoring sites in Taiwan.
The composition with good model performance (R 2 ≥ 0.50) include ammonium (NH4 + ), sulfate (SO4 2-), nitrate (NO3 -), organic carbon (OC), elemental carbon (EC), barium (Ba), manganese (Mn), copper (Cu), zinc (Zn), and antimony (Sb). Nevertheless, the limited number of monitoring sites might restrict its applicability to predict the spatial variation of PM2.5 composition in a wide area. Furthermore, the utility of other satellite data (e.g., aerosol optical depth, AOD) was not explored in Hsu et al.'s study. In general, land use information indicates the surface emission solely while the satellite AOD includes the transport part of pollutants. In this technical note, LUR models were constructed for PM2.5, aluminum (Al), calcium (Ca), chromium (Cr), iron (Fe), potassium (K), Mn, sulfur (S), silicon (Si), titanium (Ti), vanadium (V), and Zn composition based on 17 monitoring sites in Taiwan. The efficacy of utilizing satellite-derived AOD was also evaluated.

PM2.5 Sample Collection and Chemical Analysis
Filter-based PM2.5 samples used in this study were collected in Taiwan, which has an approximate population of 23 million and an area of 36,000 km 2 . Here, 24 daily samples (two samples per month per site in 2014) from 17 air quality monitoring sites, total 408 samples, were obtained from Taiwan Environmental Protection Administration (TEPA) (www.epa.gov.tw) and applied for chemical analysis. Figure 1 shows the locations of sites. A total of 11 elemental composition, including Al, Ca, Cr, Fe, K, Mn, S, Si, Ti, V, and Zn, were quantitatively obtained using energy-dispersive X-ray fluorescence spectrometry with a high detection rate (≥70%) [15]. Calibration curves were built using thin-film standards (Micromatter, Vancouver, Canada). National Institute of Standards and Technology standard reference material (SRM 2783) was used to verify the measurements (Table S1). Furthermore, the method detection limit was computed as triple the standard deviation of the detected signals of each element in 10 blank Teflon filters. A multi-elemental A total of 11 elemental composition, including Al, Ca, Cr, Fe, K, Mn, S, Si, Ti, V, and Zn, were quantitatively obtained using energy-dispersive X-ray fluorescence spectrometry with a high detection rate (≥70%) [15]. Calibration curves were built using thin-film standards (Micromatter, Vancouver, Canada). National Institute of Standards and Technology standard reference material (SRM 2783) was used to verify the measurements (Table S1). Furthermore, the method detection limit was computed as triple the standard deviation of the detected signals of each element in 10 blank Teflon filters. A multi-elemental quality control standard (Micromatter, Vancouver, BC, Canada) was analyzed for each batch of samples to ensure instrument performance.

Collection of LUR Predictors
For developing models, this study collected diverse potential predictors, including land use, road information, elevation, demographic data, location of stationary emission sources (e.g., factories and boilers), temples, and satellite data. A total of 21 predictor variables were created: area of road, residence, industry, port, semi-natural, and forest and urban green; length of road and distance to road; number of population, household, emission sources, and temples; extracted AOD values. The predictor variables are similar to the ones in our previous study and can be found elsewhere [24]. For detailed information of predictors, please see Table S2 of Supplementary Materials. The predictor variables applied in this study were processed and extracted using Quantum GIS 2.8.9 (QGIS) [25].
In this study, satellite-derived predictor of AOD was introduced in LUR model constructions to assess the efficacy for estimating concentrations of PM 2.5 and the elemental composition. Larger AOD values indicate a hazier condition and higher aerosol concentration in atmosphere. The daily AOD acquired in this study were measured by moderate resolution imaging spectroradiometers (MODIS) onboard Terra and Aqua satellites via the principle of optical properties of aerosol (e.g., extinction or backscatter). AOD in spatial resolution of 3 km × 3 km were retrieved from MODIS aerosol products of MOD04_3K and MYD04_3K. Average AOD were computed from the 3 × 3 group (i.e., a 9 km × 9 km area centered at each air quality monitoring site) to reduce the effects of having missing AOD value at the center pixel. To further utilize the AOD data, an additional predictor, the AOD percentage (AOD_PER), was calculated based on the following formula: Daily AOD_PER = number of pixels with available AOD 9 (total number of pixels) Daily AOD is calculated as an average of Terra and Aqua AOD values. Annual averages of AOD and AOD_PER were computed based on 365 daily values for modeling.

Model Constructions
LUR is built with a supervised stepwise selection procedure [12] using SAS statistical software (SAS 9.4; SAS Institute Inc., Cary, NC, USA). The construction procedure is described concisely as follows. Firstly, the start model was chosen as the univariate linear regression model with the highest adjusted explained variance (adjusted R 2 ). Next, remaining predictors were sequentially regressed against the start model and the predictor was retained if the criteria were all achieved: (a) has the highest improved adjusted R 2 (also > 0.01), (b) direction of coefficient matched with the anticipated effect, and (c) directions of remaining predictors were unchanged. Then, the predictors with p-value higher than 0.10 were removed after no more predictors met the above criteria. For predictors with variance inflation factor (VIF) > 3, the one with the highest VIF value was excluded to minimize the collinearity effects. Cook's D and Moran's I were utilized for evaluating influential sites and spatial autocorrelations. LUR model performance was assessed by leave-one-out cross-validation (LOOCV) and root mean square error (RMSE).
In this study, three methods were assessed for annual LUR model constructions. For Method 1, non-satellite predictors, including land use, road information, elevation, demographic data, stationary emission sources, and temples were applied for model building. For Method 2, AOD was added to the aforementioned predictors in Method 1. For Method 3, AOD_PER was introduced to the former predictors in Method 2. In addition, two scenarios defined as "high PM 2.5 season" (HPS) and "low PM 2.5 season" (LPS), classified by the period with PM 2.5 concentration higher or lower than the average value, were developed using Method 3 for sensitivity analysis.

LUR Modeling Results
Among the three methods for constructing annual LUR models, Method 3 (considering AOD and AOD_PER as predictors) showed the best overall performance (median LOOCV R 2 = 0.70) than Methods 1 and 2 (median LOOCV R 2 = 0.37 and 0.44, respectively) ( Table 1). Compared with the temporal-invariant land use predictors, satellite AOD and AOD_PER vary with time. Although their average values corresponding to the specific periods (i.e., annual, HPS, or LPS) were used in the models, they still are useful for representing the underlying variation of PM measures over time and space. Given the length restrictions of a technical note, we only discuss the main results of Method 3. For the summary of LUR models developed based on Methods 1 and 2, please refer to Tables S4 and S5 of Supplementary Material.

LUR Modeling Results
Among the three methods for constructing annual LUR models, Method 3 (considering AOD and AOD_PER as predictors) showed the best overall performance (median LOOCV R 2 = 0.70) than Methods 1 and 2 (median LOOCV R 2 = 0.37 and 0.44, respectively) ( Table 1). Compared with the temporal-invariant land use predictors, satellite AOD and AOD_PER vary with time. Although their average values corresponding to the specific periods (i.e., annual, HPS, or LPS) were used in the models, they still are useful for representing the underlying variation of PM measures over time and space. Given the length restrictions of a technical note, we only discuss the main results of Method 3. For the summary of LUR models developed based on Methods 1 and 2, please refer to Tables S4 and S5 of Supplementary Materials. Table 2 shows the annual model performances of PM 2.5 and elemental composition from Method 3. LOOCV R 2 among 12 PM measures ranged from 0.07 (V and Cr) to 0.92 (Si). Most PM measures, including PM 2.5 , Ca, Fe, K, Mn, S, Si, Ti, and Zn, showed LOOCV R 2 higher than 0.40, indicating reasonable performance of the constructed LUR models. The length and surface area of road network and distance to road were applied in LUR development to represent traffic factors. Ca, Fe, Si, Ti, and Zn retained road-related predictors in the final models. Ca, Fe, Si, and Ti are regarded as crustal elements and roadside dust, which may be re-suspended by wind flow and heavy traffic [26][27][28]. Fe and Zn may be contributed by the abrasion of tire wear or brake linings from automobiles [29][30][31].   INDUSTRY (industrial area in buffer size (i.e., radius) of 1000 or 5000 m) and number of stationary emission sources (POINT_N_5000) were selected in the models of PM 2.5 , Cr, Fe, K, Mn, Si, Ti, and Zn. Cr, Fe, K, Mn, and Zn may be emitted from the ferrous or non-ferrous metal processing of industrial sources [32][33][34]. Si and Ti belong to crustal elements and are commonly utilized as markers of soil or construction dusts [35], and may also present in the industrial process of cement production or metal manufacturing [33,36,37]. Besides, the large buffer size (5000 m) implied that the elemental composition were influenced by distant industrial sources. Comparing with the studies conducted in Taipei and Kaohsiung in Taiwan, the industrial area variables were also retained in the final LUR models for PM 2.5 , Fe, Mn, Si, Ti, and Zn [15,38].
In this study, the number of temples was considered as a culture-specific predictor in model constructions. Five composition, K, S, Si, Ti, and V, retained the variable of temples in the final models. This variable was not significant in Hsu et al. [17], possibly because the limited number of monitoring sites in that study did not reflect the variabilities of environmental information. More geographically heterogeneous sites are expected to be included in LUR modeling for sufficiently representing the features of study areas.
Satellite-derived predictor of AOD was selected in four models among 12 PM measures (Table 1), including PM 2.5 , Fe, S, and Zn. AOD_PER was retained in nine models of PM 2.5 , Cr, Fe, K, Mn, S, Si, Ti, and Zn. LOOCV R 2 of most PM measures increased significantly after AOD and AOD_PER were introduced in modeling, as compared to results from Method 1. For PM 2.5 , Fe, K, Mn, S, Si, Ti, and Zn, LOOCV R 2 increased more than 0.30. AOD and AOD_PER exhibited a strong Pearson correlation with PM 2.5 (0.75 with AOD; 0.69 with AOD_PER) and specific elemental composition (ranged from 0.69 to 0.72 with AOD; 0.35 to 0.74 with AOD_PER), and thus beneficial for the construction of spatial prediction model. AOD is obtained through measuring the extinction of the solar beam by particles, which relates to the amount of suspended aerosol in atmosphere. However, LOOCV R 2 of Al and Cr model decreased. Predictor selection procedures were examined, which indicated that the presence of AOD or AOD_PER affected the selection procedures and changed the results. Thus, the model did not always give superior performance as more predictors were considered in model constructions.
AOD_PER showed positive coefficients in all nine LUR models. The value of AOD_PER depends on the availability of AOD, which is mainly influenced by cloud cover effect. Higher AOD_PER indicates fewer missing values and therefore less cloud cover effects, leading to the lower possibilities of precipitation and higher potentials of photochemical reaction from solar radiation. Precipitation reduces PM 2.5 concentration in the atmosphere and corresponds with the decrease of elemental composition in PM 2.5 mixtures while the photochemical reaction partially contributes to the formation of secondary aerosol. Furthermore, the fine particulate matters with higher hygroscopicity potentially facilitate the cloud formation which is related to AOD_PER value, indicating that the AOD_PER parameter includes the information of aerosol type.
In this study, the samples were divided into HPS and LPS based on PM 2.5 levels higher or lower than the average value (=21.0 µg/m 3 ) and LUR models were built accordingly using Method 3 (Tables S6 and S7). January to April and December were classified as the HPS while May to November as the LPS. The average temperature during the HPS and LPS was 18.7 vs. 28.1 • C, respectively ( Figure S1). During the cold season, the lower mixing height would be favorable to form high air pollutions [39]. Overall, the model performed better during the HPS than during the LPS (Table 1, median LOOCV R 2 = 0.76 vs. 0.46). Additionally, AOD_PER showed higher occurrences in HPS models than in LPS models. AOD_PER represents an indirect indicator of cloud coverage, implying that meteorological factors might be decisive predictors and more critical to estimate the levels of PM 2.5 and associated elemental composition during the HPS than the LPS. Moreover, Al, Ca, and Mn showed better LOOCV R 2 in both HPS and LPS models than in the annual models. This suggests that predictors might have different effects on certain PM measures by seasons, which cannot be reflected in annual models. For example, AOD_PER was retained in the HPS model but not annual model for Al. This may be because the meteorological variability was smoothed out in annual averages, thus cannot reflect the seasonal variance of Al level by AOD_PER.
For applications in epidemiological studies, most previous studies considered LOOCV R 2 greater than 0.40 as an inclusion criterion [40][41][42][43]. Table 1 shows that the number of applicable models (LOOCV R 2 > 0.40) was nine (Method 3) for the annual data, and increased to 12 for HPS. It was also noted that the LOOCV R 2 of Al, Cr, and V improved significantly from < 0.3 in the annual models to 0.70, 0.62, and 0.53, respectively, in HPS models. This suggests potential influence of diverse meteorological conditions between clean and hazy days. Further investigation is needed to explore the meteorological effect to the LUR modeling. One limitation of this study is that the models are not applicable to mountainous areas since the locations of samples were mainly distributed at altitude < 250 m.

Conclusions
In this technical note, the efficacy of developing LUR models for PM 2.5 and elemental composition with satellite-derived AOD was evaluated. The LOOCV R 2 for all PM measures were higher than 0.40, except for Al, Cr, and V, after including AOD and AOD_PER, indicating the critical improvements for model constructions. In HPS models, LOOCV R 2 of Al, Cr, and V significantly increased and were higher than 0.40, demonstrating the improved utility of the models during hazy periods.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/atmos12081018/s1, Table S1: Certified and measured values for PM measures using National Institute of Standards and Technology Standard Reference Material 2783 (n = 3), Table S2: The definitions of predictor variables in constructions of LUR models, Table S3: Descriptive statistics of PM 2.5 and elemental composition in annual value, HPS and LPS, Table S4: Summary of land use regression models of PM 2.5 and elemental composition using annual averages (Method 1), Table S5: Summary of land use regression models of PM 2.5 and elemental composition using annual averages (Method 2), Table S6: Summary of land use regression models of PM 2.5 and elemental composition in high PM 2.5 season (HPS) (Method 3), Table S7: Summary of land use regression models of PM 2.5 and elemental composition in low PM 2.5 season (LPS) (Method 3), Figure S1: Comparison of monthly PM 2.5 concentration (black column) and ambient temperature (dotted line).