Inferring Near-Surface PM2.5 Concentrations from the VIIRS Deep Blue Aerosol Product in China: A Spatiotemporally Weighted Random Forest Model

Xue, Wenhao; Wei, Jing; Zhang, Jing; Sun, Lin; Che, Yunfei; Yuan, Mengfei; Hu, Xiaomin

doi:10.3390/rs13030505


Wenhao Xue ^1,†, Jing Wei ^2,†, Jing Zhang ^1,*, Lin Sun ^3, Yunfei Che ^1,4, Mengfei Yuan ^1 and Xiaomin Hu ^1

^1 College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China
^2 Department of Atmospheric and Oceanic Science, Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20740, USA
^3 College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China
^4 State Key Laboratory of Severe Weather & Key Laboratory for Cloud Physics, Chinese Academy of Meteorological Sciences, Beijing 100081, China
^* Author to whom correspondence should be addressed.
^† Both authors contributed equally to this work and should be considered co-first authors.

Remote Sens. 2021, 13(3), 505; https://doi.org/10.3390/rs13030505

Submission received: 26 December 2020 / Revised: 21 January 2021 / Accepted: 28 January 2021 / Published: 31 January 2021

(This article belongs to the Special Issue Remote Sensing of Atmospheric Aerosols over Asia: Methods and Applications)


Abstract

Much of China's population is exposed to PM_2.5 (fine particulate matter) pollution, so a high-precision gridded PM_2.5 dataset is very valuable for air pollution and related studies. However, limited by traditional models and input data sources, existing PM_2.5 estimates have low accuracy and narrow spatial coverage. We therefore develop a new spatiotemporally weighted random forest (SWRF) model to improve the estimation accuracy and expand the spatial coverage of PM_2.5 concentrations, using the latest release of the Visible Infrared Imaging Radiometer Suite (VIIRS) Deep Blue (DB) aerosol product together with meteorological variables and socioeconomic data. Compared with traditional methods and the results of previous similar studies, our satellite-derived PM_2.5 shows better consistency with surface measurements: the out-of-sample (out-of-station) cross-validation (CV) coefficient of determination (CV-R²), root mean squared error (RMSE), and mean absolute error (MAE) are 0.87 (0.85), 11.23 (11.53) μg m⁻³, and 8.25 (8.78) μg m⁻³, respectively. Monthly, seasonal, and annual mean PM_2.5 concentrations were also successfully captured (CV-R² = 0.91–0.92, RMSE = 4.35–6.72 μg m⁻³). We then investigated the spatial characteristics of PM_2.5 pollution in 2018. Although air pollution has diminished in recent years, China as a whole still faces a high PM_2.5 pollution risk, especially in winter (average = 50.43 ± 16.81 μg m⁻³). In addition, 19 provinces or administrative regions have annual mean PM_2.5 concentrations >35 μg m⁻³, particularly the Xinjiang Uygur Autonomous Region (~55.25 μg m⁻³), Tianjin (~49.65 μg m⁻³), and Henan Province (~48.60 μg m⁻³). These accurate surface PM_2.5 estimates could benefit further research on air pollution in China.

Keywords:

PM_2.5; VIIRS; DB; AOD; SWRF; China


1. Introduction

In the past decade, air pollution in China has become a major public concern. Among all air pollutants, PM_2.5 (particulate matter with an aerodynamic diameter ≤2.5 μm) has been identified as the major component of composite air pollution in China. With rapid urbanization, particulate-laden (i.e., smoggy) air is emitted by industrial production processes and domestic sources, particularly in urban agglomerations throughout China [1,2,3]. Furthermore, high PM_2.5 loading has caused frequent heavy pollution episodes in these key regions in recent years [4]. More importantly, public health studies have confirmed that particulate matter is a crucial causative factor in respiratory, cardiovascular, and autoimmune diseases [5,6,7,8], and Rohde and Muller (2015) estimated that air pollution kills approximately 700,000 to 2,200,000 people per year in China. Fine particulate matter can also affect the material and energy cycles of the Earth-atmosphere system [9]. Research on PM_2.5 is therefore important for resolving air pollution and environmental health problems.

In this context, the Chinese government has monitored PM_2.5 concentrations in real time since 2013, with in situ observation sites established primarily in urban areas across mainland China [10]. However, the measurement stations are sparse and discontinuous in both space and time, which introduces major uncertainties into related research. Mapping the PM_2.5 distribution across China is therefore vital. Advances in satellite remote sensing make it feasible to construct high-precision, high-resolution PM_2.5 records. Aerosol optical depth (AOD) is the vertical integral of the aerosol extinction coefficient and describes the attenuation of light by aerosols [11,12]. Because of its stable, positive relationship with PM_2.5, AOD has been the most important predictor for deriving ground-level PM_2.5 concentrations [13]. The Moderate Resolution Imaging Spectroradiometer (MODIS) Dark Target (DT) and Deep Blue (DB) aerosol products, with a spatial resolution of 10 km, have been extensively applied [14,15]. However, Terra, the first Earth Observing System (EOS) satellite (EOS AM-1), was launched on 18 December 1999 and has been in service for over 20 years, far beyond its designed service life. As an extension and improvement of MODIS and the Advanced Very High Resolution Radiometer (AVHRR), the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard the Suomi National Polar-orbiting Partnership (NPP) satellite was successfully launched in 2011. The VIIRS aerosol product (VAOOO) is generated by an algorithm similar to the DT algorithm at a spatial resolution of 6 km [16]. However, the VAOOO AOD product has been found to have large uncertainties and many missing values, especially over bright surfaces [17,18].

A new aerosol product, the VIIRS Version 1 DB aerosol (AERDB) product, was released by the National Aeronautics and Space Administration (NASA) in February 2018; it is generated using the DB algorithm over land and the Satellite Ocean Aerosol Retrieval algorithm over ocean. Owing to improvements in the radiative transfer model, surface reflectance, and aerosol type assumptions, this product is much more accurate and has wider spatial coverage than the VAOOO AOD product, especially over bright surfaces [19].

Three main approaches have been widely used to derive PM_2.5 concentrations: physical mechanisms [20], statistical modeling [21], and machine learning [22,23]. Many of these methods have been applied to establish the VIIRS AOD–PM_2.5 relationship at different spatial scales. Wu et al. (2016) used a spatiotemporal statistical model to map PM_2.5 in the Beijing–Tianjin–Hebei (BTH) region from the VIIRS AOD product, meteorological data, and other auxiliary data, obtaining an R² of 0.72 in model validation. Subsequently, Yao et al. (2018) showed that a VIIRS AOD-based model could explain 76% of the PM_2.5 variation in the BTH region, outperforming a MODIS AOD-based model (~71%), which suggests that the VIIRS AOD product is useful for this purpose. Building on this, a two-stage model incorporating both spatial and temporal information was developed to map PM_2.5 across China from the VIIRS AOD product, with a cross-validation (CV) R² of 0.60 [24]. Other studies have also estimated PM_2.5 from the VIIRS AOD product in many regions of China [10,24]. However, the agreement between model-predicted and ground-measured PM_2.5 remains poor, owing to uncertainties in the VIIRS VAOOO AOD product and the instability of traditional models. Because of its strong data mining capability and its ability to fit complex regression relationships, machine learning has therefore been widely employed to derive accurate near-surface PM_2.5.

Although existing studies have produced gridded PM_2.5 datasets for China from different satellite AOD products, many limitations remain. First, the overall accuracy of previous PM_2.5 estimates is generally low, limited by models that have poor data mining ability or ignore the spatiotemporal differences in air pollution. Second, most previous studies rely on the widely used MODIS sensors, which have been in service for more than 20 years and may be decommissioned in the near future. VIIRS will therefore play an important role in extending the EOS long-term observations [25,26]. However, recent VIIRS-based PM_2.5 studies rely mainly on the VAOOO AOD product, which has many missing values over bright surfaces due to the limitations of the DT algorithm [17,18]. To address these issues, we developed a new spatiotemporally weighted random forest (SWRF) model, based on ensemble learning and accounting for the spatiotemporal characteristics of air pollution, to improve the estimation accuracy and spatial coverage of PM_2.5 concentrations in China from the newly released VIIRS DB AOD product. Section 2 introduces the data and establishes the SWRF model. Section 3 evaluates the performance of the SWRF model, compares it with other relevant studies and traditional models, and investigates the spatial distribution of PM_2.5 in China. Section 4 provides the summary and conclusions.

2. Data and Methods

2.1. Datasets

2.1.1. Particulate Matter (PM_2.5) Monitoring Data

For this research, hourly PM_2.5 records from 1583 ground-based monitoring stations operating in China in 2018 were obtained to build the AOD–PM_2.5 relationship. To be consistent with the satellite overpass time, hourly PM_2.5 measurements from 13:00 to 14:00 local time were averaged and used as the daily PM_2.5 concentration for model fitting and validation. Figure 1 shows the locations of the air quality monitoring stations across China. In general, stations in East China are far more numerous and more uniformly distributed than those in Western China. Station density is particularly high in the urban and suburban areas of the major urban agglomerations, e.g., the BTH, Yangtze River Delta (YRD), and Pearl River Delta (PRD) regions.
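
The overpass matching described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes hourly records arrive as (station, local date, local hour, PM_2.5) tuples and that the readings stamped 13:00 and 14:00 are the two values averaged. The function name `overpass_daily_mean` and the station IDs are hypothetical.

```python
from collections import defaultdict
from datetime import date


def overpass_daily_mean(records, hours=(13, 14)):
    """Average the hourly PM2.5 readings that fall in the overpass window.

    records: iterable of (station_id, local_date, local_hour, pm25) tuples
    (a hypothetical layout). Readings of None and hours outside `hours` are
    skipped, so a day with one valid reading in the window keeps that value.
    Returns {(station_id, local_date): mean PM2.5 in ug m-3}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, day, hour, pm25 in records:
        if hour in hours and pm25 is not None:
            sums[(station, day)] += pm25
            counts[(station, day)] += 1
    return {key: sums[key] / counts[key] for key in sums}


records = [
    ("1001A", date(2018, 1, 5), 12, 80.0),  # outside the window: ignored
    ("1001A", date(2018, 1, 5), 13, 60.0),
    ("1001A", date(2018, 1, 5), 14, 70.0),
    ("1002A", date(2018, 1, 5), 13, 40.0),  # only one valid hour
]
print(overpass_daily_mean(records))
# {('1001A', datetime.date(2018, 1, 5)): 65.0, ('1002A', datetime.date(2018, 1, 5)): 40.0}
```

Keeping a day with only one valid hour is a choice made for this sketch; a stricter rule would require both hours to be present.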

2.1.2. Aerosol Optical Depth (AOD) Data

In this study, daily Level-2 VIIRS AERDB data with a spatial resolution of 6 km covering the whole of China in 2018 were collected to establish the AOD–PM_2.5 relationship. Only cloud-free DB AOD retrievals (550 nm) of the highest quality (i.e., quality assurance = best) were employed [25,26]. In addition, AOD measurements from 10 Aerosol Robotic Network (AERONET) stations across China in 2018 were collected to evaluate the accuracy of the VIIRS DB AOD product; the cloud-screened and quality-assured AERONET Version 3 Level 2.0 data were selected [27]. Figure 2 shows the agreement between the satellite-based and ground-measured AODs in China: the VIIRS AERDB retrievals are highly consistent with AERONET AODs, with an R² of 0.91, a root mean squared error (RMSE) of 0.11, and a mean absolute error (MAE) of 0.07. The high quality of these data allows a stable AOD–PM_2.5 relationship to be established.
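
Because R², RMSE, MAE, and the regression line recur in every evaluation in this paper, a small helper makes the definitions concrete. The sketch assumes that the regression line is fitted with the reference values (e.g., AERONET AOD or measured PM_2.5) on the x-axis and the estimates on the y-axis, and that R² is the squared Pearson correlation; the paper does not spell out either convention, and `validation_metrics` is a hypothetical name.

```python
import math


def validation_metrics(observed, estimated):
    """Slope, intercept, R2, RMSE and MAE for paired reference/estimate values.

    Assumptions: the regression line is fitted by ordinary least squares with
    the reference values on the x-axis, and R2 is the squared Pearson
    correlation (equal to the R2 of that fitted line).
    """
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    if sxx == 0 or syy == 0:
        raise ValueError("values must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    errors = [y - x for x, y in zip(observed, estimated)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": sxy ** 2 / (sxx * syy),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Estimates that follow y = 2x + 1 exactly:
print(validation_metrics([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
# slope 2.0, intercept 1.0, r2 1.0, mae 3.0, rmse = sqrt(29/3)
```

The example shows why R² alone is not enough: a perfectly correlated but biased estimate still has R² = 1, which is why the slope, intercept, RMSE, and MAE are reported alongside it.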

2.1.3. Meteorological Dataset

Meteorological factors can affect PM_2.5 concentrations [26,27]. Surface meteorological observations are available in China, but only at 720 public stations, which are far sparser and fewer than the PM_2.5 monitoring stations and therefore cannot support the gridded estimation required in this study. By contrast, ERA-Interim reanalysis data provide spatially continuous surface meteorological fields with considerable accuracy across China [28,29]. Here, eight relevant meteorological variables, namely, 2-m temperature (TEM), surface pressure (SP), wind direction (WD), wind speed (WS), relative humidity (RH), evaporation (ET), precipitation (PRE), and boundary layer height (BLH), were collected [30] to help build the AOD–PM_2.5 relationship. The ERA-Interim meteorological variables have temporal and spatial resolutions of 3 h and 0.125° × 0.125°, respectively.

2.1.4. Other Multiple Datasets

In this study, the VIIRS normalized difference vegetation index (NDVI), with a monthly temporal resolution and a spatial resolution of 750 m, was collected to explain the spatial heterogeneity of PM_2.5 concentrations caused by vegetation coverage in China. Land use and altitude also have an important impact on PM_2.5; therefore, VIIRS annual land cover data (LUC, 750 m) and a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM, 90 m) were obtained. Furthermore, because socioeconomic factors also play a crucial role in air pollution, monthly nighttime light (NTL) data with a spatial resolution of 500 m were obtained from the VIIRS Day/Night Band (DNB) product. All auxiliary data were resampled by bilinear interpolation to a common spatial resolution of 0.06° × 0.06°, consistent with the VIIRS AOD.
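
Bilinear resampling onto the common 0.06° grid can be illustrated with a plain-Python sketch. It assumes a regular source grid with ascending latitudes and longitudes and square cells (for example, a 0.125° field), and it refuses to extrapolate outside the grid. `bilinear` and `resample` are hypothetical helper names, and a production workflow might rely on a geospatial library instead.

```python
import math


def bilinear(grid, lat0, lon0, step, lat, lon):
    """Bilinearly interpolate a regular lat/lon grid at (lat, lon).

    grid[i][j] holds the value at latitude lat0 + i*step and longitude
    lon0 + j*step (ascending order, square cells, at least 2 x 2 values).
    Points outside the grid raise ValueError instead of being extrapolated.
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    # Lower-left corner of the enclosing cell; clamp so that points on the
    # last row/column still use a valid cell.
    i0 = min(math.floor(fi), len(grid) - 2)
    j0 = min(math.floor(fj), len(grid[0]) - 2)
    if i0 < 0 or j0 < 0 or fi > len(grid) - 1 or fj > len(grid[0]) - 1:
        raise ValueError("point outside the source grid")
    ti, tj = fi - i0, fj - j0  # fractional position inside the cell
    return ((1 - ti) * (1 - tj) * grid[i0][j0]
            + (1 - ti) * tj * grid[i0][j0 + 1]
            + ti * (1 - tj) * grid[i0 + 1][j0]
            + ti * tj * grid[i0 + 1][j0 + 1])


def resample(grid, lat0, lon0, step, target_lats, target_lons):
    """Evaluate `bilinear` at every node of a target grid (e.g. 0.06 deg)."""
    return [[bilinear(grid, lat0, lon0, step, la, lo) for lo in target_lons]
            for la in target_lats]


# A 2 x 2 patch of a 0.125-degree field sampled at the cell centre:
patch = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear(patch, 30.0, 110.0, 0.125, 30.0625, 110.0625))  # 1.5
```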

2.2. Methodology

2.2.1. Spatiotemporally Weighted Random Forest Model

To address the weak data mining ability of existing models and their neglect of the spatiotemporal heterogeneity of PM_2.5 pollution, we developed a new spatiotemporally weighted random forest (SWRF) model that incorporates spatial and temporal information into ensemble learning to improve PM_2.5 estimates in China. The SWRF model has two stages. The first stage is a temporally weighted random forest (TRF) model [31], which builds the preliminary relationship between PM_2.5 and the explanatory variables while accounting for the different air pollution conditions on different days of the year. The second stage employs a geographically weighted regression (GWR) model to model and remove the first-stage residuals by accounting for spatial autocorrelation and spatial differences between locations. Figure 3 shows the flowchart and structure of the SWRF model.

In the first stage, to describe the temporal information more accurately, a time-weighted matrix was established to reflect the temporal differences in PM_2.5 concentrations among days of the year. The matrix includes the day of the year (DOY, ranging from 1 to 366) and weighted temporal distances. The DOY indexes time in equal steps of 1 day and identifies each data record at a given location on different days of the year, so the matrix can reflect both the distinct pollution conditions on each day and the continuity between adjacent days. In addition, PM_2.5 pollution shows obvious seasonal variations [32], with summer (winter) usually the cleanest (most polluted) season, so seasonal differences were also considered. To describe the variability caused by seasonal fluctuations, four seasonal time nodes, i.e., 15 January, 16 April, 16 July, and 16 October, were selected to represent the middle of winter, spring, summer, and autumn, respectively. The minimum time interval between each day of the year and each of the four nodes was then calculated, and its inverse was used as a weight following the inverse distance weighting (IDW) method [33]. The resulting time-weighted matrix depicts PM_2.5 variations on daily and seasonal scales and can be expressed as:

TWM_{gt} = \left[ DOY_{gt},\ \frac{1}{SprD_{gt}},\ \frac{1}{SumD_{gt}},\ \frac{1}{AutD_{gt}},\ \frac{1}{WinD_{gt}} \right]

(1)

where TWM_{gt} indicates the time-weighted matrix in grid g on day t, DOY_{gt} represents the DOY index in grid g on day t, and SprD_{gt}, SumD_{gt}, AutD_{gt}, and WinD_{gt} are the temporal distances to the spring, summer, autumn, and winter nodes, respectively.
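
A sketch of how one row of the time-weighted matrix in Eq. (1) could be computed for a given date, using the four node dates given in the text. Two details are assumptions of this sketch rather than statements from the paper: the "minimum time interval" is read as the shorter way round the calendar (so 31 December lies 15 days from the 15 January node), and distances are floored at 1 day so that the node dates themselves do not cause a division by zero.

```python
from datetime import date

# Seasonal nodes from the text, ordered like the columns of Eq. (1):
# mid-spring, mid-summer, mid-autumn, mid-winter.
SEASON_NODES = [(4, 16), (7, 16), (10, 16), (1, 15)]


def time_weighted_matrix(day, min_distance=1):
    """Return [DOY, 1/SprD, 1/SumD, 1/AutD, 1/WinD] for one calendar date.

    Assumptions of this sketch: each distance is the shorter way round the
    year, and distances are floored at `min_distance` days so that the node
    dates themselves do not divide by zero.
    """
    doy = day.timetuple().tm_yday
    year_len = date(day.year, 12, 31).timetuple().tm_yday  # 365 or 366
    row = [doy]
    for month, dom in SEASON_NODES:
        node = date(day.year, month, dom).timetuple().tm_yday
        gap = abs(doy - node)
        distance = max(min(gap, year_len - gap), min_distance)
        row.append(1.0 / distance)  # inverse-distance weight
    return row


# First entry is 365; last entry is 1/15 (15 days to the 15 January node).
print(time_weighted_matrix(date(2018, 12, 31)))
```

Reading the distance cyclically keeps late December close to the winter node, which matches the text's aim of describing continuity between adjacent days.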

In addition to TWM_{gt}, the variables entered into the RF model include AOD, BLH, ET, NDVI, NTL, RH, PRE, SP, TEM, WD, and WS. We first performed a correlation analysis to confirm that each independent variable is significantly correlated with PM_2.5 (Table 1). AOD, evaporation, land cover, nighttime light, relative humidity, surface pressure, and wind direction are positively correlated with PM_2.5, whereas boundary layer height, elevation (DEM), NDVI, precipitation, temperature, and wind speed are negatively correlated with it. These significant relationships indicate that all of the above variables could contribute to estimating the PM_2.5 distribution. Because many independent variables are employed, multicollinearity may also arise. We therefore used the variance inflation factor (VIF) to test for collinearity among the predictors (Table 1); VIF values >10 indicate obvious collinearity [34]. Except for SP (VIF = 10.09) and DEM (VIF = 10.10), which are strongly collinear, all variables have VIF values below 10, indicating no serious collinearity. Because SP and DEM enter different stages of the SWRF model (SP in the first stage and DEM in the second), their collinearity can be ignored, and all of the aforementioned variables were retained. The 11 independent variables and the temporal term are therefore input to the RF model as:

PM_{2.5\_Pre,\,gt} = f_{RF}\left[ AOD_{gt},\ BLH_{gt},\ \ldots,\ WD_{gt},\ WS_{gt},\ TWM_{gt} \right]

(2)

where PM_{2.5\_Pre,\,gt} indicates the preliminary estimate of the surface PM_2.5 concentration in grid g on day t.
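
The VIF screening applied to the predictors can be sketched with NumPy: each predictor is regressed on all the others by ordinary least squares, and VIF_j = 1/(1 - R²_j). This is an illustrative implementation with a hypothetical function name, not the authors' code; values above 10 would be flagged under the threshold cited above.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R2_j) for each column j of the predictor matrix X.

    R2_j comes from an ordinary least-squares regression (with intercept) of
    column j on all remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic example: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
x3 = x1 + 0.05 * rng.normal(size=500)
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif > 10)  # x1 and x3 exceed the VIF > 10 rule; x2 does not
```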

PM_2.5 pollution also shows significant spatial heterogeneity because of regional differences in natural conditions and human activities. Therefore, in the second stage, AOD, LUC, and DEM were selected as predictors with geographically weighted coefficients, which express the spatial autocorrelation and differences between locations and correct the spatial uncertainty in the first-stage PM_2.5 estimates. The residuals of the preliminary near-surface PM_2.5 concentrations (i.e., the differences between the measured and predicted values) were taken as the dependent variable, and their spatial heterogeneity was modeled with GWR as follows:

PM_{2.5\_resi,\,gt} = a_{0,g} + a_{1}(\mu_{gt}, \upsilon_{gt}) \times AOD_{gt} + a_{2}(\mu_{gt}, \upsilon_{gt}) \times DEM_{gt} + a_{3}(\mu_{gt}, \upsilon_{gt}) \times LUC_{gt} + \varepsilon_{0,gt}

(3)

where PM_{2.5\_resi,\,gt} indicates the residual of the first-stage (RF) PM_2.5 estimate in grid g on day t; (\mu_{gt}, \upsilon_{gt}) denotes the location (coordinates) of grid g; a_{0,g} is the intercept in grid g; a_{1}(\mu_{gt}, \upsilon_{gt}), a_{2}(\mu_{gt}, \upsilon_{gt}), and a_{3}(\mu_{gt}, \upsilon_{gt}) are the location-specific slopes of AOD, DEM, and LUC; and \varepsilon_{0,gt} is the error term.
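
The second-stage correction in Eq. (3) amounts to a local weighted least-squares fit at every grid cell. The sketch below assumes a Gaussian distance-decay kernel with a fixed bandwidth and treats coordinates as planar; the paper does not state its kernel, bandwidth, or distance metric, and the function names are hypothetical. Under our reading of the method, the final estimate would be the first-stage prediction plus the GWR-predicted residual, since the residual is defined as measured minus predicted.

```python
import numpy as np


def gwr_local_coefficients(coords, X, y, target, bandwidth):
    """Local intercept and slopes of y on X around one target location.

    coords: (n, 2) sample locations; X: (n, k) predictors such as AOD, DEM
    and LUC; y: (n,) first-stage residuals; bandwidth: width of the Gaussian
    kernel in coordinate units. Coordinates are treated as planar.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian distance decay
    design = np.column_stack([np.ones(len(y)), X])    # intercept + predictors
    root_w = np.sqrt(weights)
    # Weighted least squares: minimise sum_i w_i * (y_i - design_i @ beta)^2.
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta


def gwr_predict(coords, X, y, targets, X_targets, bandwidth):
    """Predict the residual at each target from its own local coefficients."""
    preds = []
    for location, x_row in zip(targets, X_targets):
        beta = gwr_local_coefficients(coords, X, y, location, bandwidth)
        preds.append(beta[0] + np.dot(beta[1:], x_row))
    return np.array(preds)


# Synthetic check: a residual surface that follows 2 + 3 * x everywhere.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=(200, 1))
resid = 2.0 + 3.0 * x[:, 0]
print(gwr_local_coefficients(pts, x, resid, [5.0, 5.0], bandwidth=2.0))
```

In practice the bandwidth is usually chosen by a data-driven criterion such as cross-validation; the fixed value here only keeps the sketch short.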

Traditional statistical methods have also been widely used to estimate surface PM_2.5 concentrations. To compare the estimation ability of our SWRF model with previous methods, five widely used traditional PM_2.5 estimation models, namely, the generalized additive model (GAM), multiple linear regression (MLR), the linear mixed effect (LME) model, GWR, and the traditional two-stage approach, were applied to the same dataset as the SWRF model [21,35,36,37,38].

2.2.2. Validation Approaches

In this paper, we employed two independent 10-fold cross-validation (10-CV) methods [39], one based on data samples (i.e., out-of-sample or sample-based) and one based on PM_2.5 monitoring stations (i.e., out-of-station or station-based). The data samples or monitoring stations were randomly divided into 10 subsets; nine were used as training data and one as validation data, and this procedure was repeated 10 times until every subset had been used for validation. In the sample-based CV, the training and validation samples are independent; in the station-based CV, they are also spatially independent because no station contributes to both. These two methods have been widely used to evaluate the overall accuracy and the spatial prediction ability of a model, respectively [40,41,42]. Four statistical indicators were employed: the regression line (slope and intercept), R², RMSE, and MAE.
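
The two CV schemes differ only in what is shuffled into folds: individual samples, or whole stations. A minimal standard-library sketch, with hypothetical function names and a fixed random seed for reproducibility (scikit-learn's `KFold` and `GroupKFold` provide comparable splitters):

```python
import random


def kfold_assignments(station_ids, k=10, by_station=False, seed=0):
    """Return a fold index (0..k-1) for every sample.

    by_station=False: samples are shuffled and dealt into k folds
    (out-of-sample CV). by_station=True: stations are shuffled and dealt into
    k folds, and each sample inherits its station's fold, so no station
    appears in both training and validation data (out-of-station CV).
    """
    rng = random.Random(seed)
    if by_station:
        stations = sorted(set(station_ids))
        rng.shuffle(stations)
        fold_of = {s: i % k for i, s in enumerate(stations)}
        return [fold_of[s] for s in station_ids]
    order = list(range(len(station_ids)))
    rng.shuffle(order)
    folds = [0] * len(station_ids)
    for rank, idx in enumerate(order):
        folds[idx] = rank % k
    return folds


def cv_splits(folds, k=10):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    for f in range(k):
        val = [i for i, g in enumerate(folds) if g == f]
        train = [i for i, g in enumerate(folds) if g != f]
        yield train, val


# 200 samples from 25 hypothetical stations.
ids = ["s%d" % (i % 25) for i in range(200)]
for train, val in cv_splits(kfold_assignments(ids, by_station=True)):
    pass  # fit on `train`, predict `val`, pool the predictions for CV metrics
```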

3. Results and Discussion

3.1. Evaluation of the Modeling Results

3.1.1. Overall Accuracy

Figure 4 shows the sample- and station-based CV results of the PM_2.5 estimates from our SWRF model in China. The performances of the original RF model and the TRF model were also evaluated. The RF model shows the worst accuracy (predictive ability), with the lowest sample-based (station-based) cross-validation coefficient of determination (CV-R²) of 0.77 (0.76), the largest RMSE of 12.91 (13.05) μg m⁻³, and the largest MAE of 10.96 (11.22) μg m⁻³. When the temporal information of PM_2.5 is considered (TRF), the model performance improves markedly: CV-R² increases by 0.12 (0.12), and RMSE and MAE decrease by 1.37 (1.39) μg m⁻³ and 2.17 (2.27) μg m⁻³, respectively. The proposed SWRF model shows the best accuracy, with the highest sample-based CV-R² (0.87) and the lowest estimation uncertainties (RMSE = 11.23 μg m⁻³, MAE = 8.25 μg m⁻³), as well as the strongest predictive ability across China (station-based CV-R² = 0.85, RMSE = 11.53 μg m⁻³, MAE = 8.78 μg m⁻³). Furthermore, most data samples lie close to the 1:1 line (regression slopes > 0.83 and intercepts < 7.5 μg m⁻³), especially in the densest part of the distribution, where PM_2.5 concentrations are below 150 μg m⁻³. Overall, the out-of-station results are only slightly worse than the out-of-sample results, illustrating the robustness of our model, which we attribute mainly to the advantages of ensemble learning.

3.1.2. Spatiotemporal-Scale Validation

The spatial and temporal consistency between our PM_2.5 estimates and the surface measurements was also verified. Figure 5 presents the regional sample-based 10-CV results for China's main urban agglomerations (i.e., BTH, YRD, and PRD). In the BTH region, the satellite-derived PM_2.5 shows the highest CV-R² (0.89) and the strongest regression line (slope = 0.86) but also the largest RMSE (12.65 μg m⁻³) and MAE (9.35 μg m⁻³), because this agglomeration has the most severe air pollution. The YRD region follows, with out-of-sample CV-R², RMSE, and MAE values of 0.87, 9.74 μg m⁻³, and 7.41 μg m⁻³, respectively. The PRD shows the opposite pattern: the model yields the lowest sample-based CV-R² (0.83) but the smallest RMSE and MAE (8.35 and 6.47 μg m⁻³, respectively). Frequent cloud cover over South China reduces the number of valid data samples, and fewer training samples can weaken the learning ability of the model, which is one potential reason for the lower accuracy [43]. However, this effect appears limited at the regional scale: the YRD and PRD regions have 17,407 and 1984 data samples, respectively, a difference of about ninefold, yet their CV-R² values differ by only 0.04. In addition, the PRD region has much lower PM_2.5 pollution, with most data samples falling within 0–100 μg m⁻³.

Figure 6 illustrates the model accuracy and uncertainty (i.e., CV-R² and RMSE) at each PM_2.5 monitoring station across China. Model performance shows obvious spatial differences: higher CV-R² values occur mainly in East China, especially in Henan Province (CV-R² = 0.98), whereas lower CV-R² values occur primarily in Western China, especially in the Xinjiang Uygur Autonomous Region (CV-R² < 0.6). Except for some individual sites in Xinjiang (RMSE > 20 μg m⁻³), the model yields small uncertainties, with RMSEs < 15 μg m⁻³ at most stations across China. The poorer performance in Western China is mainly attributed to the lack of monitoring stations and the frequent sandstorms there, which increase the uncertainty of PM_2.5 retrievals. Nevertheless, the average site-scale CV-R² and RMSE are 0.81 and 11.39 μg m⁻³, respectively, and approximately 82.5% and 82.2% of all surface sites have a high CV-R² > 0.7 and a low RMSE < 15 μg m⁻³, respectively. These results show that our SWRF model can successfully estimate PM_2.5 at the site scale in China.
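
Site-scale (or day-scale) scores such as those in Figure 6 come from computing CV-R² separately within each group of paired values. In this sketch the group key may be a station ID or a DOY; the minimum of 10 samples per group and the use of the squared Pearson correlation are assumptions, and the names are hypothetical.

```python
from collections import defaultdict


def squared_pearson(obs, est):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(obs)
    mx, my = sum(obs) / n, sum(est) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(obs, est))
    sxx = sum((x - mx) ** 2 for x in obs)
    syy = sum((y - my) ** 2 for y in est)
    return sxy * sxy / (sxx * syy)


def per_group_r2(groups, observed, estimated, min_samples=10):
    """CV-R2 computed separately for each group key (station ID or DOY).

    Groups with fewer than `min_samples` pairs, or with constant values,
    are skipped; the threshold is an assumption of this sketch.
    """
    pairs = defaultdict(lambda: ([], []))
    for g, o, e in zip(groups, observed, estimated):
        pairs[g][0].append(o)
        pairs[g][1].append(e)
    scores = {}
    for g, (obs, est) in pairs.items():
        if len(obs) < min_samples or len(set(obs)) < 2 or len(set(est)) < 2:
            continue
        scores[g] = squared_pearson(obs, est)
    return scores


def share_above(scores, threshold):
    """Fraction of groups whose score exceeds `threshold` (e.g. CV-R2 > 0.7)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores.values() if s > threshold) / len(scores)
```

The same two functions apply unchanged to the DOY-based analysis: pass the day of year instead of the station ID as the group key.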

The temporal performance of the SWRF model as a function of DOY was also investigated. In 2018, the daily CV-R² across China ranges from 0.27 to 0.98, with an average of 0.73, and only 8 days have a low CV-R² < 0.4. On summer days, the PM_2.5 predictions are generally less correlated with the ground measurements (CV-R² = 0.67) but have lower uncertainties (RMSE = 8.32 μg m⁻³) than on days in other seasons; by contrast, higher CV-R² values (> 0.72) but larger RMSEs (> 15 μg m⁻³) are generally observed on spring and winter days. This disparity likely arises because in summer, dominant natural (e.g., meteorological) conditions complicate the AOD–PM_2.5 relationship, whereas spring and winter have more severely polluted days due to intense human activity (e.g., heating) and more frequent sand-dust storms, especially in North China [44,45]. Overall, 65.5% and 77.2% of the days in 2018 have high CV-R² values (> 0.7) and low RMSEs (< 15 μg m⁻³), respectively, indicating that the SWRF model can effectively capture the temporal variations of PM_2.5 pollution in China.

Figure 7 shows the validation results of our national-scale PM_2.5 estimates at different timescales in 2018. The monthly, seasonal, and annual mean PM_2.5 concentrations agree well with the surface observations, with high R² values (slopes) of 0.91 (0.89), 0.92 (0.90), and 0.92 (0.91), respectively. The estimation uncertainties are also low overall, with RMSEs of 6.72, 5.79, and 4.35 μg m⁻³ and MAEs of 5.25, 4.59, and 3.51 μg m⁻³, respectively. These results suggest that our new model can accurately describe the spatiotemporal variations of PM_2.5 pollution in China.
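
Validating monthly, seasonal, or annual means requires first averaging the paired daily values within each station and period. The sketch below does that aggregation; the resulting (observed, estimated) pairs would then go through the same R², RMSE, and MAE calculation. The meteorological season definition (December to February as winter) is an assumption, since season boundaries are not stated here, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_means(records, key_fn):
    """Average paired daily values per (station, period) key.

    records: iterable of (station_id, day, observed, estimated).
    key_fn maps a date to its period, e.g. month, season or year.
    Returns {(station_id, period): (mean observed, mean estimated)}.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for station, day, obs, est in records:
        bucket = acc[(station, key_fn(day))]
        bucket[0] += obs
        bucket[1] += est
        bucket[2] += 1
    return {k: (s_o / n, s_e / n) for k, (s_o, s_e, n) in acc.items()}


def month_of(day):
    return day.month


def season_of(day):
    # Assumed meteorological seasons: DJF winter, MAM spring, JJA summer,
    # SON autumn.
    return {12: "win", 1: "win", 2: "win", 3: "spr", 4: "spr", 5: "spr",
            6: "sum", 7: "sum", 8: "sum"}.get(day.month, "aut")


recs = [("A", date(2018, 1, 1), 10.0, 12.0),
        ("A", date(2018, 1, 2), 20.0, 18.0),
        ("A", date(2018, 2, 1), 30.0, 33.0)]
print(aggregate_means(recs, month_of))  # {('A', 1): (15.0, 15.0), ('A', 2): (30.0, 33.0)}
```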

3.2. Comparison with Other Models and Studies

Figure 8 compares the performance of our SWRF model with that of other traditional models developed and used in previous research, all applied to the same input dataset. Among all models, the MLR model shows the worst performance, with the lowest CV-R² (~0.38) and the highest RMSE (~19.18 μg m⁻³) and MAE (~16.99 μg m⁻³), because it considers only simple linear relationships between PM_2.5 and the numerous variables. The GAM performs better, with improved CV-R² values (~0.50–0.51) and reduced estimation uncertainties (RMSE = 17.44–17.83 μg m⁻³, MAE = 16.63–16.91 μg m⁻³), because it is based on non-linear relationships; however, the GAM does not consider the spatiotemporal heterogeneity of air pollution. The GWR and LME models were therefore also compared; however, their accuracies remain unsatisfactory, with sample-based (station-based) CV-R² values of 0.55 (0.53) and 0.67 (0.66), respectively, and RMSEs of 17.32 (17.65) μg m⁻³ and 16.48 (16.77) μg m⁻³, because each considers only spatial or only temporal information, respectively. A two-stage model combining the LME and GWR models improves the accuracy and predictive ability (CV-R² = 0.70–0.72, RMSE = 15.59–15.72 μg m⁻³, MAE = 12.41–12.87 μg m⁻³). Nevertheless, the two-stage model is much less accurate than our SWRF model because of its considerably weaker data mining ability and poorer integration of spatiotemporal information.

Moreover, we compared our results with those of previous PM_2.5 studies using the VIIRS AOD product at national and regional scales in China (Table 2). All the listed studies performed independent validation using the same out-of-sample 10-CV method, so the results are comparable. Our model is more accurate than the time fixed effects regression (TFER) model (CV-R² = 0.72, RMSE = 22.07 μg m⁻³) and the combined TFER and GWR model (CV-R² = 0.72, RMSE = 19.72 μg m⁻³) in the BTH region [14,46]. Our model also outperforms the LME (CV-R² = 0.64, RMSE = 18.02 μg m⁻³), LME + GAM (CV-R² = 0.69, RMSE = 15.82 μg m⁻³), and LME + GWR (CV-R² = 0.70, RMSE = 15.73 μg m⁻³) models in Central China [23]. Furthermore, for the whole of China, our model is superior to the spatially structured adaptive two-stage model (TFER + GWR) (CV-R² = 0.60, RMSE = 21.67 μg m⁻³). We attribute this mainly to the much stronger data mining ability of our model. In addition, our study further optimizes the incorporation of spatiotemporal information and uses the AERDB AOD product, which is more accurate and has wider coverage than the VAOOO DT AOD product.

3.3. Spatial Distribution of PM_2.5 in China

Ground-based observations provide high-frequency, high-precision PM_2.5 data at individual stations; however, PM_2.5 monitoring stations are unevenly distributed and vary greatly in number at the regional scale (e.g., only 51 stations in Hebei Province), and most are located in urban areas. Consequently, they cannot accurately represent air pollution over wide areas, and the pronounced urban-rural differences inevitably lead to overestimation. Satellite remote sensing can compensate for this deficiency by generating spatially continuous data, which give a more complete picture of the distribution and variation of PM_2.5 pollution, especially in areas without stations, such as suburban and rural areas. Therefore, based on the SWRF model, we generated spatially continuous PM_2.5 distributions across China for 2018 at a spatial resolution of 6 km.

Figure 9 shows the satellite-derived and surface-based annual mean PM_2.5 across China. Our satellite retrievals show spatial patterns that are highly consistent with the ground measurements at most sites, especially in North and Central China, which suffer from severe air pollution. In 2018, the annual mean PM_2.5 across China was 36.47 ± 12.45 μg m⁻³. Although air pollution has been alleviated recently, more than 45% of China's area still exceeds the national air quality standard (35 μg m⁻³). The most polluted regions are the North China Plain and the Sichuan Basin, which have developed economies, large populations, and rapid industrial development. The Taklimakan Desert area of Xinjiang also shows extremely high PM_2.5 levels because it is a major dust source area with frequent dust storms in spring. In contrast, lower PM_2.5 concentrations (<15 μg m⁻³) are observed mainly in Southwest and Northeast China, which have high vegetation coverage.
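
An area share such as "more than 45% of the area" depends on how grid cells are weighted. On a regular latitude-longitude grid like the 0.06° product, cell area shrinks with the cosine of latitude, so the sketch below weights each valid cell by cos(latitude). Whether the paper applied this weighting is not stated, and the function name is hypothetical.

```python
import math


def area_fraction_exceeding(grid, lats, threshold=35.0):
    """Area-weighted share of valid grid cells above `threshold`.

    grid[i][j] is the annual mean PM2.5 (ug m-3) at latitude lats[i]; None
    marks cells outside China or without data. On a regular lat/lon grid the
    cell area scales with cos(latitude), which is used as the weight
    (an assumption of this sketch).
    """
    total = above = 0.0
    for lat, row in zip(lats, grid):
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            total += weight
            if value > threshold:
                above += weight
    return above / total if total else 0.0


# Two rows at 0 and 60 degrees N; the 60-degree cells count half as much.
print(area_fraction_exceeding([[40.0, 20.0], [None, 50.0]], [0.0, 60.0]))
```

Without the cosine weight, each cell would count equally and high-latitude regions would be over-represented in the area share.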

We also investigated local PM_2.5 concentrations in three city clusters in China: the BTH, YRD, and PRD regions. The BTH region has the worst air quality, with an annual mean PM_2.5 of 45.60 ± 10.44 μg m⁻³. Higher PM_2.5 concentrations occur in the southern BTH region, where manufacturing and industry are widespread [48]. In contrast, pollution in the northern BTH region is much lighter than in the southern part owing to high vegetation coverage and less human activity. In the YRD, the annual mean PM_2.5 is 43.68 ± 8.31 μg m⁻³, and the southern part suffers from particularly high PM_2.5 pollution because highly developed urbanization and transportation increase emissions of primary pollutants and promote the formation of secondary pollutants. In the PRD, PM_2.5 pollution is relatively low (37.32 ± 5.30 μg m⁻³), but this level still exceeds the national air quality standard. Relatively strong wet deposition and an advanced industrial structure both help to reduce PM_2.5 in the PRD.

Figure 10 shows the ranked annual mean PM_2.5 concentrations and the proportions of exposure time (%, defined as the proportion of days with PM_2.5 > 35 μg m⁻³ in a year) at the provincial level in mainland China. The annual mean PM_2.5 exceeded 15 μg m⁻³ in all provinces in 2018, and 19 provincial-level administrative divisions exceeded the national air quality standard, especially Xinjiang (~55.25 μg m⁻³), Tianjin (~49.65 μg m⁻³), Henan (~48.60 μg m⁻³), Shandong (~47.95 μg m⁻³), and Jiangsu (~46.47 μg m⁻³). These divisions also show large proportions of exposure time (>10.67%), indicating relatively persistent severe PM_2.5 pollution throughout the year. By contrast, 12 provinces meet the national air quality standard, including Tibet (~25.41 μg m⁻³), Heilongjiang (~27.83 μg m⁻³), and Yunnan (~29.36 μg m⁻³), and 17 provinces show small proportions of exposure time (<2%), indicating good air quality.
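The exposure-time metric defined above is a simple count ratio. The Python sketch below assumes a hypothetical input of one province-average daily PM_2.5 value per province and day, with missing days omitted. Whether the paper computed the ratio from provincial averages or from individual grid cells is not stated, so that choice is an assumption.

```python
# Sketch: provincial annual mean and "exposure time" (share of days with daily
# PM2.5 > 35 μg m⁻³). Assumes a hypothetical input of one province-average
# daily value per (province, day); missing days are simply absent.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def provincial_summary(daily: Iterable[Tuple[str, float]],
                       threshold: float = 35.0) -> Dict[str, Tuple[float, float]]:
    """Return {province: (annual mean PM2.5, exposure time in %)}."""
    values: Dict[str, List[float]] = defaultdict(list)
    for province, pm25 in daily:
        values[province].append(pm25)
    summary = {}
    for province, vals in values.items():
        mean = sum(vals) / len(vals)
        exposure_pct = 100.0 * sum(v > threshold for v in vals) / len(vals)
        summary[province] = (mean, exposure_pct)
    return summary


def rank_provinces(summary: Dict[str, Tuple[float, float]]) -> List[str]:
    """Provinces sorted from highest to lowest annual mean PM2.5."""
    return sorted(summary, key=lambda p: summary[p][0], reverse=True)
```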

Figure 11 shows the spatial distributions of seasonal mean PM_2.5 across China in 2018. Winter has the most severe air pollution, with the highest mean concentration of 50.43 ± 16.81 μg m⁻³, and approximately 80.76% of China's area exceeded the air quality standard in the winter of 2018. The Xinjiang Uygur Autonomous Region, the Sichuan Basin, and the North China Plain are the major heavily polluted areas. This can be explained by fossil fuel combustion during the heating season and by weather conditions that are unfavorable for pollutant dispersion. By contrast, summer has the least PM_2.5 pollution: except for the desert areas of Northwest China, over 88% of the country has PM_2.5 concentrations below 35 μg m⁻³. The pollution level in autumn is similar to that in summer, with an average of 30.77 ± 9.55 μg m⁻³. In spring, high PM_2.5 concentrations (>80 μg m⁻³) are observed, mainly in the Taklimakan Desert, owing to frequent sandstorms.
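Seasonal statistics depend on how months are grouped. The following Python sketch assumes meteorological seasons (March–May spring, June–August summer, September–November autumn, December–February winter). Because only 2018 is analyzed, it also assumes that January, February, and December 2018 are pooled into "winter". The paper's exact season definition is not given here, so both choices are assumptions.

```python
# Sketch: seasonal mean PM2.5 from a daily series, using meteorological seasons
# (MAM spring, JJA summer, SON autumn, DJF winter). Assumption: for a single
# calendar year, "winter" pools January, February, and December of that year.
import datetime as dt
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(series: Iterable[Tuple[dt.date, float]]) -> Dict[str, float]:
    """Average daily PM2.5 values within each season that has data."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for day, pm25 in series:
        season = SEASON_OF_MONTH[day.month]
        totals[season] += pm25
        counts[season] += 1
    return {s: totals[s] / counts[s] for s in totals}
```

Applying the same grouping to each grid cell, and then the area statistics above, yields seasonal maps and exceedance shares.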

4. Conclusions

The number of satellite-based studies on near-surface PM_2.5 concentrations in China is increasing. However, most of these studies are based on MODIS aerosol products with a coarse spatial resolution, and the MODIS satellites have been in service for more than 20 years, well beyond their design life. Traditional models have weak data mining ability and ignore the spatiotemporal heterogeneity of air pollution. In contrast, we developed a new spatiotemporally weighted random forest (SWRF) model based on ensemble learning to improve PM_2.5 estimates in China. Temporal information is represented by a time matrix calculated using the inverse distance weighting method, which identifies differences in air pollution conditions among days and seasons in a year. Spatial information is determined following geographically weighted regression to describe the spatial autocorrelation among points. Validation shows that the model has high accuracy and strong predictive ability, with out-of-sample (out-of-station) CV-R² values of 0.87 (0.85) and RMSEs of 11.23 (11.53) μg m⁻³. The model also performs well at various spatial and temporal scales. More importantly, the comparisons show that model performance improves significantly once spatiotemporal information is considered, and that our model outperforms both traditional models and those developed in previous related studies. The PM_2.5 maps generated for China in 2018 with the SWRF model indicate that the country continues to face considerable exposure risk, with an annual mean PM_2.5 of 36.47 ± 12.45 μg m⁻³. In total, 19 provincial-level administrative regions exceed the national Grade II air quality standard (35 μg m⁻³), and severe air pollution is observed in the BTH region and the Xinjiang Uygur Autonomous Region.
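The Python sketch below makes the idea of spatiotemporal weighting concrete by building two inverse-distance-weighted predictors for a station-day. The spatial term uses other stations on the same day (weights 1/d², with d the great-circle distance). The temporal term uses the same station on other days (weights 1/Δt², with Δt in days). This is a hypothetical illustration in the spirit of the time matrix and spatial weighting described above, not the SWRF formulation itself. The weighting power, the optional `max_km` and `window` cut-offs, and all names are assumptions. Each term excludes the target value itself, so the feature does not leak the label during training.

```python
# Hypothetical sketch of inverse-distance-weighted (IDW) space and time terms
# that could be fed to a random forest as extra predictors. It is NOT the
# paper's exact time matrix or GWR-based spatial term; names are illustrative.
import math
from typing import Dict, Optional, Tuple

Obs = Dict[Tuple[str, int], float]        # (station_id, day_of_year) -> PM2.5
Coords = Dict[str, Tuple[float, float]]   # station_id -> (lat, lon) in degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def spatial_term(station: str, day: int, obs: Obs, coords: Coords,
                 power: float = 2.0, max_km: Optional[float] = None) -> float:
    """IDW mean of other stations' PM2.5 on the same day (target excluded)."""
    num = den = 0.0
    for (other, d), pm25 in obs.items():
        if d != day or other == station:
            continue
        dist = max(haversine_km(coords[station], coords[other]), 1e-6)  # avoid 1/0
        if max_km is not None and dist > max_km:
            continue
        w = dist ** -power
        num += w * pm25
        den += w
    return num / den if den > 0 else math.nan


def temporal_term(station: str, day: int, obs: Obs,
                  power: float = 2.0, window: Optional[int] = None) -> float:
    """IDW mean of the same station's PM2.5 on other days (target day excluded)."""
    num = den = 0.0
    for (other, d), pm25 in obs.items():
        if other != station or d == day:
            continue
        gap = abs(d - day)  # plain day difference; no wrap-around across years
        if window is not None and gap > window:
            continue
        w = gap ** -power
        num += w * pm25
        den += w
    return num / den if den > 0 else math.nan
```

Each query scans all observations, which keeps the sketch simple; a real pipeline would index observations by day and by station.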

Author Contributions

Conceptualization, W.X. and J.W.; methodology, W.X. and J.W.; software, W.X.; validation, W.X.; formal analysis, W.X.; writing—original draft preparation, W.X.; writing—review and editing, J.W., J.Z. and L.S.; supervision, J.W. and J.Z.; funding acquisition, J.Z.; data curation, W.X., J.W., Y.C., M.Y., and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (41575144) and the National Key R&D Program of China (2017YFA0603603).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the co-first authors.

Conflicts of Interest

All authors declare no conflict of interest.

References

1. Pui, D.Y.; Chen, S.-C.; Zuo, Z. PM2.5 in China: Measurements, sources, visibility and health effects, and mitigation. Particuology 2014, 13, 1–26.
2. Xue, W.; Zhang, J.; Zhong, C.; Li, X.; Wei, J. Spatiotemporal PM2.5 variations and its response to the industrial structure from 2000 to 2018 in the Beijing-Tianjin-Hebei region. J. Clean. Prod. 2021, 279, 123742.
3. Wang, G.; Cheng, S.; Li, J.; Lang, J.; Wen, W.; Yang, X.; Tian, L. Source apportionment and seasonal variation of PM2.5 carbonaceous aerosol in the Beijing-Tianjin-Hebei region of China. Environ. Monit. Assess. 2015, 187, 143.
4. Ming, L.; Jin, L.; Li, J.; Fu, P.; Yang, W.; Liu, D.; Zhang, G.; Wang, Z.; Li, X. PM2.5 in the Yangtze River Delta, China: Chemical compositions, seasonal variations, and regional pollution events. Environ. Pollut. 2017, 223, 200–212.
5. Bartell, S.M.; Longhurst, J.; Tjoa, T.; Sioutas, C.; Delfino, R.J. Particulate Air Pollution, Ambulatory Heart Rate Variability, and Cardiac Arrhythmia in Retirement Community Residents with Coronary Artery Disease. Environ. Health Perspect. 2013, 121, 1135–1141.
6. Zheng, S.; Pozzer, A.; Cao, C.X.; Lelieveld, J. Long-term (2001–2012) concentrations of fine particulate matter (PM2.5) and the impact on human health in Beijing, China. Atmos. Chem. Phys. 2015, 15, 5715–5725.
7. Ge, E.; Lai, K.; Xiao, X.; Luo, M.; Fang, Z.; Zeng, Y.; Ju, H.; Zhong, N. Differential effects of size-specific particulate matter on emergency department visits for respiratory and cardiovascular diseases in Guangzhou, China. Environ. Pollut. 2018, 243, 336–345.
8. Xia, X.; Wang, G. Treg/Th17 Cells in Chronic Lung Inflammation Models Exposed to PM2.5 in Beijing China. Chest 2016, 149, A407.
9. Kim, Y.; Sievering, H.; Boatman, J. Airborne measurement of atmospheric aerosol particles in the lower troposphere over the central United States. J. Geophys. Res. Atmos. 1988, 93, 12631–12644.
10. Zhang, N.-N.; Ma, F.; Qin, C.-B.; Zhang, Z.-F. Spatiotemporal trends in PM2.5 levels from 2013 to 2017 and regional demarcations for joint prevention and control of atmospheric pollution in China. Chemosphere 2018, 210, 1176–1184.
11. Wei, J.; Li, Z.; Peng, Y.; Sun, L.; Yan, X. A Regionally Robust High-Spatial-Resolution Aerosol Retrieval Algorithm for MODIS Images Over Eastern China. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4748–4757.
12. Wei, J.; Sun, L.; Peng, Y.; Wang, L.; Zhang, Z.; Bilal, M.; Ma, Y. An improved high-spatial-resolution aerosol retrieval algorithm for MODIS images over land. J. Geophys. Res. Atmos. 2018, 123, 12291–12307.
13. Yang, L.; Xu, H.; Jin, Z. Estimating ground-level PM2.5 over a coastal region of China using satellite AOD and a combined model. J. Clean. Prod. 2019, 227, 472–482.
14. Zhang, T.; Liu, G.; Zhu, Z.; Gong, W.; Ji, Y.; Huang, Y. Real-Time Estimation of Satellite-Derived PM2.5 Based on a Semi-Physical Geographically Weighted Regression Model. Int. J. Environ. Res. Public Health 2016, 13, 974.
15. Song, Z.; Fu, D.; Zhang, X.; Han, X.; Song, J.; Zhang, J.; Wang, J.; Xia, X. MODIS AOD sampling rate and its effect on PM2.5 estimation in North China. Atmos. Environ. 2019, 209, 14–22.
16. Wu, J.; Yao, F.; Li, W.; Si, M. VIIRS-based remote sensing estimation of ground-level PM2.5 concentrations in Beijing–Tianjin–Hebei: A spatiotemporal statistical model. Remote Sens. Environ. 2016, 184, 316–328.
17. Wei, J.; Li, Z.; Sun, L.; Yang, Y.; Zhao, C.; Cai, Z. Enhanced Aerosol Estimations from Suomi-NPP VIIRS Images Over Heterogeneous Surfaces. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9534–9543.
18. Wei, J.; Li, Z.; Peng, Y.; Sun, L. MODIS Collection 6.1 aerosol optical depth products over land and ocean: Validation and comparison. Atmos. Environ. 2019, 201, 428–440.
19. Hsu, N.C.; Gautam, R.; Sayer, A.M.; Bettenhausen, C.; Li, C.; Jeong, M.J.; Tsay, S.-C.; Holben, B.N. Global and regional trends of aerosol optical depth over land and ocean using SeaWiFS measurements from 1997 to 2010. Atmos. Chem. Phys. Discuss. 2012, 12, 8037–8053.
20. Lin, C.; Liu, G.; Lau, A.K.; Li, Y.; Li, C.; Fung, J.; Lao, X. High-resolution satellite remote sensing of provincial PM2.5 trends in China from 2001 to 2015. Atmos. Environ. 2018, 180, 110–116.
21. Li, S.; Zhai, L.; Zou, B.; Sang, H.; Xiong, L. A Generalized Additive Model Combining Principal Component Analysis for PM2.5 Concentration Estimation. ISPRS Int. J. Geo-Inf. 2017, 6, 248.
22. Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-km-resolution PM2.5 concentrations across China using the space-time random forest approach. Remote Sens. Environ. 2019, 231, 111221.
23. Chen, Z.; Zhang, T.; Zhang, R.; Zhu, Z.; Yang, J.; Chen, P.; Ou, C.; Guo, Y. Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China. Atmos. Environ. 2019, 202, 180–189.
24. Yao, F.; Wu, J.; Li, W.; Peng, J. A spatially structured adaptive two-stage model for retrieving ground-level PM2.5 concentrations from VIIRS AOD in China. ISPRS J. Photogramm. Remote Sens. 2019, 151, 263–276.
25. Hsu, N.C.; Lee, J.; Sayer, A.M.; Kim, W.V.; Bettenhausen, C.; Tsay, S. VIIRS Deep Blue Aerosol Products Over Land: Extending the EOS Long-Term Aerosol Data Records. J. Geophys. Res. Atmos. 2019, 124, 4026–4053.
26. Wei, J.; Li, Z.; Sun, L.; Xue, W.; Ma, Z.; Liu, L.; Fan, T.; Cribb, M. Extending the EOS Long-Term PM2.5 Data Records Since 2013 in China: Application to the VIIRS Deep Blue Aerosol Products. IEEE Trans. Geosci. Remote Sens. 2021, 1–12.
27. Giles, D.M.; Sinyuk, A.; Sorokin, M.G.; Schafer, J.S.; Smirnov, A.; Slutsker, I.; Welton, E.J. Advancements in the Aerosol Robotic Network (AERONET) Version 3 database–automated near-real-time quality control algorithm with improved cloud screening for Sun photometer aerosol optical depth (AOD) measurements. Atmos. Meas. Tech. 2019, 12, 169–209.
28. Wei, J.; Li, Z.; Lyapustin, A.; Sun, L.; Peng, Y.; Xue, W.; Su, T.; Cribb, M. Reconstructing 1-km-resolution high-quality PM2.5 data records from 2000 to 2018 in China: Spatiotemporal variations and policy implications. Remote Sens. Environ. 2021, 252, 112136.
29. Zhou, C.; Wang, K.; Ma, Q. Evaluation of Eight Current Reanalyses in Simulating Land Surface Temperature from 1979 to 2003 in China. J. Clim. 2017, 30, 7379–7398.
30. Dee, D.P.; Uppala, S.M.; Simmons, A.J.; Berrisford, P.; Poli, P.; Kobayashi, S.; Andrae, U.; Balmaseda, M.A.; Balsamo, G.; Bauer, P.; et al. The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 2011, 137, 553–597.
31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
32. Ma, Z.; Hu, X.; Sayer, A.M.; Levy, R.; Zhang, Q.; Xue, Y.; Tong, S.; Bi, J.; Huang, L.; Liu, Y. Satellite-Based Spatiotemporal Trends in PM2.5 Concentrations: China, 2004–2013. Environ. Health Perspect. 2016, 124, 184–192.
33. Babak, O.; Deutsch, C.V. Statistical approach to inverse distance interpolation. Stoch. Environ. Res. Risk Assess. 2008, 23, 543–553.
34. Ziegel, E.R.; Neter, J.; Kutner, M.; Nachtsheim, C.; Wasserman, W. Applied Linear Statistical Models. Technometrics 1997, 39, 342.
35. Gupta, P.; Christopher, S.A. Particulate matter air quality assessment using integrated surface, satellite, and meteorological products: Multiple regression approach. J. Geophys. Res. Space Phys. 2009, 114, 114.
36. Ma, Z.; Liu, Y.; Zhao, Q.; Liu, M.; Zhou, Y.; Bi, J. Satellite-derived high resolution PM2.5 concentrations in Yangtze River Delta Region of China using improved linear mixed effects model. Atmos. Environ. 2016, 133, 156–164.
37. Hu, X.; Waller, L.A.; Al-Hamdan, M.Z.; Crosson, W.L.; Estes, M.G., Jr.; Estes, S.M.; Quattrochi, D.A.; Sarnat, J.A.; Liu, Y. Estimating ground-level PM2.5 concentrations in the southeastern U.S. using geographically weighted regression. Environ. Res. 2013, 121, 1–10.
38. Xue, W.; Zhang, J.; Zhong, C.; Ji, D.; Huang, W. Satellite-derived spatiotemporal PM2.5 concentrations and variations from 2006 to 2017 in China. Sci. Total Environ. 2020, 712, 134577.
39. Rodriguez, J.D.; Perez, A.; Lozano, J.A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 569–575.
40. Wei, J.; Li, Z.; Cribb, M.; Huang, W.; Xue, W.; Sun, L.; Guo, J.; Peng, Y.; Li, J.; Lyapustin, A.; et al. Improved 1 km resolution PM2.5 estimates across China using enhanced space–time extremely randomized trees. Atmos. Chem. Phys. Discuss. 2020, 20, 3273–3289.
41. Wei, J.; Li, Z.; Xue, W.; Sun, L.; Fan, T.; Liu, L.; Su, T.; Cribb, M. The ChinaHighPM10 dataset: Generation, validation, and spatiotemporal variations from 2015 to 2019 across China. Environ. Int. 2021, 146, 106290.
42. Li, T.; Shen, H.; Yuan, Q.; Zhang, X.; Zhang, L. Estimating Ground-Level PM2.5 by Fusing Satellite and Station Observations: A Geo-Intelligent Deep Learning Approach. Geophys. Res. Lett. 2017, 44, 11–985.
43. Wei, J.; Li, Z.; Guo, J.; Sun, L.; Huang, W.; Xue, W.; Fan, T.; Cribb, M. Satellite-Derived 1-km-Resolution PM1 Concentrations from 2014 to 2018 across China. Environ. Sci. Technol. 2019, 53, 13265–13274.
44. Kong, S.; Li, X.; Li, L.; Yin, Y.; Chen, K.; Yuan, L.; Zhang, Y.; Shan, Y.; Ji, Y. Variation of polycyclic aromatic hydrocarbons in atmospheric PM2.5 during winter haze period around 2014 Chinese Spring Festival at Nanjing: Insights of source changes, air mass direction and firework particle injection. Sci. Total Environ. 2015, 520, 59–72.
45. Ge, J.M.; Huang, J.P.; Xu, C.P.; Qi, Y.L.; Liu, H.Y. Characteristics of Taklimakan dust emission and distribution: A satellite and reanalysis field perspective. J. Geophys. Res. Atmos. 2014, 119, 11–772.
46. Yao, F.; Si, M.; Li, W.; Wu, J. A multidimensional comparison between MODIS and VIIRS AOD in estimating ground-level PM2.5 concentrations over a heavily polluted region in China. Sci. Total Environ. 2018, 618, 819–828.
47. Zhang, K.; De Leeuw, G.; Yang, Z.; Chen, X.; Su, X.; Jiao, J. Estimating Spatio-Temporal Variations of PM2.5 Concentrations Using VIIRS-Derived AOD in the Guanzhong Basin, China. Remote Sens. 2019, 11, 2679.
48. Wang, L.; Liu, Z.; Sun, Y.; Ji, D.; Wang, Y. Long-range transport and regional sources of PM2.5 in Beijing based on long-term observations from 2005 to 2010. Atmos. Res. 2015, 157, 37–48.

Figure 1. Spatial distributions of the 1583 PM_2.5 (blue) and aerosol optical depth (AOD, red) surface measurement stations operating in 2018. The background is an elevation map of China, and the three main urban agglomerations are also shown.

Figure 2. Validation between the VIIRS Deep Blue (DB) AOD retrievals and Aerosol Robotic Network (AERONET) AOD measurements in China in 2018 (N = 1173). The red line is the fitting line, and the black line is the 1:1 line.

Figure 3. Flowchart of the SWRF model developed in this study.

Figure 4. Density scatter plots of the (a–c) sample-based and (d–f) station-based 10-fold cross-validation (10-CV) results for the random forest (RF), temporally weighted random forest (TRF), and SWRF models in 2018 in China.

Figure 5. Density scatter plots of the sample-based 10-CV results in the (a) Beijing–Tianjin–Hebei (BTH) region (N = 12,891), (b) Yangtze River Delta (YRD) (N = 17,407) and (c) Pearl River Delta (PRD) (N = 1984) in 2018.

Figure 6. (a,b) Spatial distributions and (c,d) temporal time series of the consistency between the SWRF model-derived PM_2.5 concentrations and surface measurements in 2018 across China.

Figure 7. Density scatter plots of validation of the 2018 monthly (a), seasonal (b), and annual (c) mean PM_2.5 estimates in China.

Figure 8. Comparison of the out-of-sample and out-of-station validation results in terms of (a,b) accuracy (CV-R²) and (c,d) uncertainty (RMSE and MAE) between our SWRF model and other traditional statistical models in 2018 in China.

Figure 9. Satellite-derived (6 km) and surface-based annual mean PM_2.5 concentrations in (a) the whole of China, (b) the BTH, (c) YRD, and (d) PRD regions in 2018.

Figure 10. Annual mean PM_2.5 concentration (a) and proportion of exposure time (b) in each province across mainland China in 2018. The black and blue vertical lines represent annual mean PM_2.5 values of 15 and 35 μg m⁻³, respectively.

Figure 11. Spatial distributions of the seasonal PM_2.5 concentration (6 km) in 2018 across China.

Table 1. Correlation and collinearity diagnosis results between selected independent variables used in the spatiotemporally weighted random forest (SWRF) model.

Variable	R	VIF
AOD	0.50	1.15
BLH (m)	−0.25	2.10
ET (mm)	0.27	3.60
NDVI	−0.30	2.45
NTL	0.11	1.16
PRE (mm)	−0.08	1.15
RH (%)	0.07	1.90
SP (kPa)	0.11	10.09
TEM (K)	−0.23	3.85
WD (°)	0.01	1.15
WS (m s⁻¹)	−0.14	1.14
LUC	0.13	1.18
DEM (m)	−0.11	10.10

AOD: aerosol optical depth; BLH: boundary layer height; ET: evaporation; NDVI: normalized difference vegetation index; NTL: nighttime light; PRE: precipitation; RH: relative humidity; SP: surface pressure; TEM: temperature; WD: wind direction; WS: wind speed; LUC: land use cover; DEM: digital elevation model; VIF: variance inflation factor.
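VIF values like those in Table 1 can in principle be computed directly from the predictor columns. The Python sketch below uses the standard identity that the j-th diagonal element of the inverse correlation matrix equals VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the others. It is a self-contained illustration with no external libraries, not the authors' procedure. A frequently cited rule of thumb treats VIF > 10 as a sign of severe collinearity.

```python
# Sketch: variance inflation factors (VIF) for a set of predictors, using the
# identity VIF_j = [inverse of the predictors' correlation matrix]_jj, which
# equals 1 / (1 - R_j^2) from regressing predictor j on all the others.
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(columns: Sequence[Sequence[float]]) -> List[float]:
    """One VIF per predictor column."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]
```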

Table 2. Comparison results between our model and the models of other similar studies.

Model	Aerosol Product	Spatial Resolution	Study Area	CV-R²	RMSE (μg m⁻³)	Reference
TFER + GWR	VAOOO	6 km	BTH	0.72	19.29	[16]
TFER	VAOOO	6 km	BTH	0.72	22.07	[46]
LME	VAOOO	6 km	Central China	0.64	18.02	[47]
LME + GAM	VAOOO	6 km	Central China	0.69	15.82	[47]
LME + GWR	VAOOO	6 km	Central China	0.70	15.73	[47]
TFER + GWR	VAOOO	6 km	China	0.60	21.76	[24]
SWRF	AERDB	6 km	China	0.87	11.53	Our study
SWRF	AERDB	6 km	BTH	0.89	12.65	Our study
SWRF	AERDB	6 km	YRD	0.87	9.74	Our study
SWRF	AERDB	6 km	PRD	0.83	8.35	Our study

GAM: generalized additive model; GWR: geographically weighted regression; LME: linear mixed effect; SWRF: spatiotemporally weighted random forest; TFER: time fixed effects regression.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
