1. Introduction
Studies show that air pollution harms human health [1], and this issue has received considerable attention in recent years [2,3]. The air pollutants that are most dangerous to humans include sulfur dioxide, nitrogen dioxide, ozone, and particulate matter with an aerodynamic diameter of less than 2.5 μm (PM2.5) [4]. Because of its small size, particulate matter can deposit deep in the respiratory tract and affect blood circulation by penetrating lung cells [5,6]. Several studies have shown that particulate matter increases the risk of developing airway obstructive disease, chronic bronchitis [7], asthma (in children) [8,9,10], lung cancer [11], and various cardiovascular diseases [12,13,14]. Thus, when assessing PM2.5 pollution and its cumulative health effects, an accurate method of estimating PM2.5 exposure at fine spatial and temporal resolutions is essential, even when actual measurements are unavailable. However, the global monitoring of PM2.5 remains in a nascent phase and is characterized by limited spatial coverage, as observed in China [15,16].
Because early PM2.5 monitoring data are missing and the spatiotemporal coverage of existing PM2.5 monitoring data is insufficient, approaches to estimating PM2.5 exposure were not fully developed until recently. Early practitioners [17,18] used simple methods to predict PM2.5 at time slices when measurements of co-located pollutants (PM10 or total suspended particulates (TSPs)) were available. These approaches relied on long-term ratios of PM2.5 to the co-located pollutants and were considerably limited by the lack of available data for these pollutants.
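The ratio-based estimation described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited studies: it assumes the long-term ratio is the mean of daily PM2.5/PM10 ratios (a ratio of long-term means would be an equally plausible reading), and the function names and all values are hypothetical.

```python
from statistics import mean


def long_term_ratio(pm25_obs, pm10_obs):
    """Average PM2.5/PM10 ratio over days when both pollutants were measured."""
    ratios = [
        p25 / p10
        for p25, p10 in zip(pm25_obs, pm10_obs)
        if p25 is not None and p10 is not None and p10 > 0
    ]
    if not ratios:
        raise ValueError("no days with both PM2.5 and PM10 measurements")
    return mean(ratios)


def estimate_pm25(pm10_obs, ratio):
    """Scale PM10 by the long-term ratio; days without PM10 stay missing."""
    return [None if p10 is None else ratio * p10 for p10 in pm10_obs]


# Hypothetical daily concentrations in micrograms per cubic meter.
pm25_history = [30.0, 45.0, None, 60.0]
pm10_history = [60.0, 90.0, 80.0, 120.0]

ratio = long_term_ratio(pm25_history, pm10_history)
estimates = estimate_pm25([100.0, None, 50.0], ratio)
```

The second entry of `estimates` stays missing, which illustrates the limitation noted above: the method cannot produce a value on days without a co-located measurement.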
Advanced approaches have recently been applied to PM2.5 exposure estimation, including land-use regression (LUR) [19] and kriging methods [20]. Further, the increased availability of satellite data has led to strategies that employ normalized difference vegetation index (NDVI), surface temperature, and aerosol optical thickness (AOT) data in combination with non-satellite variables, such as meteorological parameters, traffic indices, and elevation [21]. Of these recent approaches, linear regressions have been applied extensively, although non-linear approaches have also been used [22]. Kriging is often used separately from LUR because incorporating other covariates into a kriging model adds complexity. In addition, several spatiotemporal models have been developed for the estimation of PM2.5 concentrations at high spatiotemporal resolutions. Kloog et al. [23] proposed a number of spatiotemporal models using satellite-derived AOT and obtained an out-of-sample average R² of 0.81. Xie et al. [24] and Zheng et al. [25] employed satellite-derived AOT data for China to predict daily PM2.5 levels and achieved an R² of roughly 0.80. However, the two AOT-based approaches estimate pollution at a coarse spatial resolution (three kilometers) and cannot reliably estimate the within-community variability of PM2.5 at higher spatial resolutions. For the kriging methods, measurement data are usually used to train variogram models that predict PM2.5 concentrations without considering other covariates [20], which might limit the model’s predictive power.
Given the limited temporal and spatial coverage of PM2.5 monitoring data and the limitations of previous estimation methods, we propose an ensemble spatiotemporal modeling approach to improve the estimation of PM2.5 concentrations. Compared with previous methods, our approach integrates non-linear associations, ensemble learning, and residual kriging to predict the within-community variability of PM2.5 with improved accuracy. Our approach employs generalized additive models (GAMs) with spatial and spatiotemporal predictors to capture non-linear associations between the predictors and PM2.5 [26]. Ensemble learning generates stable predictions with fewer extreme values from multiple GAMs while also outputting an uncertainty indicator (standard deviation). A kriging interpolation of the daily residuals derived from the ensemble predictions captures residual spatial patterns and considerably improves the estimates even without PM10 or other co-pollutant predictors. We also examined the proposed approach under several scenarios with different combinations of predictors and demonstrated how it could achieve optimal accuracy in each scenario.
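The ensemble step can be illustrated with a small bagging sketch. It is not the implementation used in this paper: a least-squares line stands in for each GAM so that the example needs only the standard library, and the data, the number of models, and the function names are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for one GAM in this sketch."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: fall back to the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_prediction(xs, ys, x_new, n_models=200, seed=0):
    """Train n_models base learners on bootstrap samples.

    Returns the ensemble mean and the standard deviation across models,
    the latter serving as an uncertainty indicator.
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    return mean(preds), stdev(preds)


# Hypothetical predictor and response with additive noise.
noise = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 5.0 + noise.gauss(0.0, 1.0) for x in xs]

ens_mean, ens_sd = bagged_prediction(xs, ys, x_new=10.0)
```

Averaging over bootstrap models damps the influence of any single unusual sample, and the spread `ens_sd` indicates how much the individual models disagree at the prediction point.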
Table S1 in the Supplemental Materials lists the technical terms used in this paper.
4. Discussion
In this paper, we proposed an ensemble spatiotemporal modeling approach for robustly predicting PM2.5 concentrations. For the individual models, we used a generalized additive model (GAM) to determine the non-linear associations between PM2.5 and multiple predictors (meteorological patterns, traffic indices, seasonal variations, the number of emission sources, and land-use patterns). In particular, we extracted temporal basis functions from the monitoring stations to represent the seasonal PM2.5 variability for the study region, which accounted for a large portion of the explained variance. We then used the bagging method to resample the dataset, trained 1000 individual models, and derived stable ensemble predictions with standard deviations as an uncertainty measure. Next, kriging was used to model the residuals from the ensemble predictions and to estimate the daily PM2.5 residuals throughout a year (365 days) for the study region. For the model that did not include PM10 as a predictor, the daily residual kriging achieved a predictive performance similar to that of the model using PM10 as a predictor (R²: 0.86 vs. 0.89). Strong spatial autocorrelations of the residuals accounted for a considerable portion (33%) of the explained variance when co-located PM10 values were not included. These results demonstrate the usefulness of residual spatial autocorrelations for predicting PM2.5 when PM10 measurements are missing, as was the case for our study location of Shandong Province, China. To our knowledge, previous studies that have performed similar predictions [22,23,65] have reported R² values of only 0.54–0.81, and few studies have achieved a similar estimation accuracy for PM2.5 without using PM10 as a predictor.
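The residual kriging step can be sketched as ordinary kriging of one day's residuals, with the interpolated residual added back to the ensemble prediction. This is an illustration under stated assumptions rather than the code used in the study: the exponential variogram, its parameters, the site coordinates (in kilometers), and the residual values are all hypothetical.

```python
import math


def exp_variogram(h, nugget, partial_sill, range_):
    """Exponential variogram (practical-range form); zero at zero distance."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krige_residual(sites, residuals, target,
                   nugget=0.0, partial_sill=1.0, range_=100.0):
    """Ordinary kriging estimate of the residual at target, plus the weights."""
    def gamma(p, q):
        return exp_variogram(math.dist(p, q), nugget, partial_sill, range_)

    n = len(sites)
    # Kriging system: variogram matrix bordered by the unbiasedness constraint.
    a = [[gamma(sites[i], sites[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(s, target) for s in sites] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals)), weights


# Hypothetical monitoring sites and one day's ensemble residuals.
sites = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]
residuals = [4.0, -2.0, 1.0, 3.0]

kriged, weights = krige_residual(sites, residuals, target=(25.0, 25.0))
ensemble_prediction = 80.0  # hypothetical ensemble output at the target
final_prediction = ensemble_prediction + kriged
```

Because the weights are constrained to sum to one and depend only on the variogram and the site geometry, the correction at an unmonitored location borrows from nearby residuals in proportion to their spatial autocorrelation.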
This study also illustrates the important contributions of non-linear associations in the models [19,20,22,23]. The final results show a considerable predictive improvement through the use of non-linear additive models. The results reveal notable positive non-linear associations between PM2.5 and PM10, AOT (aerosol optical thickness), the number of emission plants, and air temperature, with fluctuations that vary across predictor intervals (Figure 5a–c,f), whereas a negative association was observed between PM2.5 and precipitation (Figure 5e). For the wind vector terms, PM2.5 concentrations tended to decline as the wind speed increased in the south-north and east-west directions because of complex interactions (Figure 5d). Such associations are generally consistent with previous conclusions [29,49].
As a regional pollutant, PM2.5 is affected by various factors, including meteorological parameters, emission sources, and traffic indices [18,19,22,66]. PM10 consists of PM2.5 and other components and is strongly correlated with PM2.5; therefore, co-located PM10 is a primary predictor of PM2.5 levels and explained most of the variance in the multivariate models. After PM10, the first temporal basis function was the second most important predictor. The temporal basis functions captured the seasonal variability of PM2.5 levels for the study region. For the model with PM10 used as a predictor, the first temporal basis function accounted for roughly 5% of the total variance. In the multivariate model without PM10 (Model 3), meteorological parameters (including precipitation, wind speed, temperature, and humidity) accounted for only 4.55% of the variance. In addition, traffic indices accounted for 5.01% of the variance; emission plants accounted for roughly 3.34% of the variance; and AOT and NDVI (normalized difference vegetation index) accounted for 4.77% and 0.24% of the variance, respectively. Previous studies of certain regions show weak correlations between AOT and surface PM2.5 because the surface reflectance ratio between visible and shortwave infrared channels of the AOT product was underestimated [67]. Our study also illustrates the limited contributions of AOT and NDVI as predictors.
For the models using PM10 as a predictor (Models 5 and 6), co-located PM10 accounted for most (67.97%) of the variance, whereas the other predictors together accounted for only 13.33%. This finding illustrates that PM10 captured a major part of the spatiotemporal variability in PM2.5. Without the PM10 predictor, the other variables accounted for roughly 53.2% of the variance. Of this share, the first temporal basis function accounted for 26.71%, and the remaining predictors together accounted for only 26.49%.
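The variance shares quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the percentages are additive shares of the total variance and that the residual kriging contributions stated elsewhere in this Discussion (33% without PM10, 7% with PM10) can be added to them; the variable names are ours.

```python
# Explained-variance shares (percent) as quoted in the text.
with_pm10 = {"co-located PM10": 67.97, "other predictors": 13.33}
without_pm10 = {"first temporal basis function": 26.71, "other predictors": 26.49}

total_with = sum(with_pm10.values())        # covariates only, PM10 included
total_without = sum(without_pm10.values())  # covariates only, PM10 excluded

# Assumption: the residual kriging shares add to the covariate shares.
kriging_share_without_pm10 = 33.0
kriging_share_with_pm10 = 7.0

approx_r2_model4 = (total_without + kriging_share_without_pm10) / 100.0
approx_r2_model6 = (total_with + kriging_share_with_pm10) / 100.0
```

Under these assumptions the totals come out close to the reported cross-validation R² values of 0.86 (Model 4) and 0.89 (Model 6), so the quoted shares are mutually consistent.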
Unfortunately, historical PM2.5 and PM10 measurements are not available for China and many other countries across the globe [16,68]. Even today, the spatial coverage of the PM2.5 and PM10 monitoring networks is limited in China. To comprehensively monitor PM2.5 pollution levels and assess their cumulative health effects on humans, reliable estimates of PM2.5 concentrations at fine spatiotemporal resolutions should be derived from the limited available PM2.5 monitoring data for time periods and extensive spatial areas without measurements. Unfortunately, accurately predicting PM2.5 levels at high spatiotemporal resolutions is difficult without relevant data on co-located pollutants. In this paper, we explored the use of kriging interpolations of daily residuals, and the results show considerable improvements in predictive performance. Although the GAM already incorporated both spatial and temporal information through its covariates, these spatial and spatiotemporal covariates could not fully capture the spatiotemporal variability of PM2.5, so performance was weaker without the spatiotemporal residuals or PM10. The kriging of daily residuals captured an important portion of the PM2.5 spatiotemporal variability not captured by the GAMs. Although a previous study also employed residual kriging interpolations [69], the variogram modeling of daily residuals remains poorly researched. Therefore, we analyzed the variogram patterns of residuals for a one-year period and performed a cross-validation to illustrate the generalizability of the proposed method. As a regional pollutant, PM2.5 exhibited stronger effects and spatial autocorrelations than nitrogen oxide. Spatial autocorrelations of the residuals were better able to explain the spatial variability of PM2.5 than that of nitrogen oxide pollutants [49]. Because of the considerable effects of PM2.5, particularly in winter in Shandong Province, the residual kriging method was applicable despite the limited number of PM2.5 monitoring stations examined and the large spatial distances between the monitored samples. As shown in the results (Figure 6), PM2.5 in winter showed effects over a longer time period and a greater spatial area than those observed in summer. We expected PM2.5 to have an effect over a longer time period (a longer range) and to present higher concentrations (also resulting in higher partial sill and nugget values) in winter than in summer. Our variogram modeling of the daily residuals captured these spatial autocorrelation trends to compensate for the gap in the variance explained due to the missing PM10 predictor, and thus improved the accuracy of our final predictions. In practice, co-located PM10 and other pollutant measurements are not usually available. Thus, Model 4 (using residual kriging interpolations but not PM10 as a predictor) has the potential for use in a greater number of applications than the other models that used PM10 as a predictor. Our cross-validation results show that Model 4 achieved high accuracy, slightly better than that of Model 5 (using PM10 as a predictor but not the residual kriging interpolation) (R²: 0.86 vs. 0.82) and slightly lower than that of Model 6 (using PM10 and residual kriging) (R²: 0.86 vs. 0.89). The R² value of Model 6 was only 0.07 (7 percentage points) higher than that of Model 5, probably because the spatiotemporal PM10 values already accounted for a major portion of the variance explained in Model 6; the remaining 7% that was not captured by the predictors was explained by the spatial autocorrelations of the residuals.
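The roles of the nugget, partial sill, and range discussed above can be made concrete with a variogram function. The spherical form is an assumption (the text does not state which variogram model was fitted), and the winter and summer parameters are hypothetical values chosen only to mirror the expected ordering, not fitted results.

```python
def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical variogram: semivariance as a function of separation distance h.

    The curve rises from the nugget and levels off at nugget + partial_sill
    once h reaches the range.
    """
    if h <= 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters (distance in km, semivariance in squared
# concentration units): winter has a longer range and higher
# nugget and partial sill than summer.
winter = {"nugget": 60.0, "partial_sill": 400.0, "range_": 300.0}
summer = {"nugget": 20.0, "partial_sill": 120.0, "range_": 150.0}

winter_mid = spherical_variogram(150.0, **winter)  # still rising at 150 km
summer_mid = spherical_variogram(150.0, **summer)  # already at its sill
```

With a longer range, stations farther apart still carry information about each other's residuals, which is why kriging can remain useful in winter even when monitoring sites are widely spaced.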
Building on individual GAMs, our approach used the bagging technique to obtain stable predictions with the standard deviation as an uncertainty indicator. As a result, our predictions should be robust and less prone to overfitting, although our performance is similar to that of the individual models from other approaches [13,24,25,65,70]. Moreover, because of the strong spatial autocorrelation of PM2.5 concentrations, our approach could considerably improve performance even without the PM10 predictor. Thus, for practical prediction, our approach does not require the extraction of complicated covariates, such as the output of a chemical transport model (GEOS-CHEM), and is less costly while offering similar performance at high spatiotemporal resolution. The production of PM2.5 is complicated and varies across regions. Although some approaches may perform well in other regions [13,65,70], they may not be as applicable to China as our approach because some predictive covariates are unavailable or because geographical and meteorological factors and emission sources differ.
This study has several limitations. First, we extracted the first and second temporal basis functions from regular monitoring data to represent the study region’s seasonal variability in PM2.5. This procedure could have resulted in overfitting. To determine the influence on the final results, we conducted a 10 × 10 cross-validation and an independent test, and the results (CV R²: 0.86–0.89; 0.73 for the independent test) show high levels of predictive accuracy, thereby illustrating the limited influence of the extracted temporal basis functions on the predictions. Second, our variogram modeling of the daily residuals was not systematic, which may have affected the applicability of the proposed spatiotemporal approach. Our validation and independent test results show that such effects are limited for extensive applications. Third, our spatiotemporal modeling approach was based on monitoring data for Shandong Province, China; thus, its applicability elsewhere may be limited. PM2.5 pollution has a larger sphere of influence in northern regions of China, including Shandong Province, and the spatial autocorrelation of the residuals of PM2.5 concentrations was markedly strong. Thus, even with a less dense monitoring network, the residual kriging method could still capture the spatiotemporal variability of PM2.5 and account for a large portion of the variance. For regions with lower PM2.5 pollution and sparser monitoring networks, the residual kriging method may not be applicable, and spatial effects modeling may be preferable. We have explored and reported on such a method in another study [71].
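The 10 × 10 cross-validation mentioned among the limitations above can be sketched as a split generator. The sketch assumes that "10 × 10" means ten repetitions of 10-fold cross-validation with a fresh shuffle per repetition; the function name, the sample count, and the seed are illustrative.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repetition
        folds = [order[k::n_splits] for k in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train_idx, test_idx


# Hypothetical dataset of 95 samples: 10 repeats x 10 folds = 100 splits.
splits = list(repeated_kfold(n_samples=95))
```

Each sample is held out exactly once per repetition, so averaging the validation R² over all splits reduces the dependence of the estimate on any single random partition.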