Satellite Retrieval of Air Pollution Changes in Central and Eastern China during COVID-19 Lockdown Based on a Machine Learning Model

With the implementation of the 2018–2020 Clean Air Action Plan (CAAP) and the impact of COVID-19 lockdowns in 2020, air pollutant emissions in central and eastern China have decreased markedly. Here, by combining satellite remote sensing, re-analysis, and ground-based observational data, we established a machine learning (ML) model to analyze annual and seasonal changes in primary air pollutants over central and eastern China in 2020 compared with 2018 and 2019. The root mean squared errors (RMSE) for the PM2.5, PM10, O3, and CO validation datasets were 9.027 µg/m³, 20.312 µg/m³, 10.436 µg/m³, and 0.097 mg/m³, respectively. The geographical random forest (RF) model performed well for all four main air pollutants. Notably, PM2.5, PM10, and CO decreased by 44.1%, 43.2%, and 35.9%, respectively, in February 2020; these decreases were likely influenced by the COVID-19 lockdown and largely persisted until May 2020. Furthermore, PM2.5, PM10, O3, and CO decreased by 16.4%, 24.2%, 2.7%, and 19.8%, respectively, in 2020 relative to the average values in 2018 and 2019. However, the reduction in O3 concentrations was not universal: a significant increase (~20–40%) was observed in less-polluted areas.


Introduction
With China's rapid economic growth, industrialization, and urbanization, particulate matter (PM), including PM10 and PM2.5 (particles with aerodynamic diameters <10 and <2.5 µm, respectively), and gaseous pollutants, such as sulfur dioxide (SO2), ozone (O3), carbon monoxide (CO), and nitrogen oxides (NOx), have become a serious concern. In 2012, to improve ambient air quality, the Chinese government issued stringent national ambient air quality standards (NAAQS) [1] and enhanced the monitoring of six major air pollutants (PM10, PM2.5, SO2, NO2, CO, and O3) in 338 Chinese cities [2], with mixed results. For example, the Atmospheric Pollution Prevention and Control Action Plan (2013–2017) contributed greatly to the reduction in PM2.5 [3,4]. However, although various studies have reported an improvement in ambient air quality, O3 concentrations have shown an increasing trend over the last five years [4][5][6].
The implementation of the second-phase Clean Air Action Plan (CAAP) was affected by the outbreak of coronavirus disease 2019 (COVID-19) during the Chinese New Year holiday (24 January to 8 February 2020) [7,8]. The Chinese government implemented strict lockdown measures during (and beyond) the festival period. Efforts to contain the rapid spread of COVID-19 drastically reduced human activities worldwide, including industrial production, energy consumption, and transportation. Consequently, air pollutant emissions also decreased rapidly during the lockdown period [9][10][11][12]. Liu et al. reported a 77% decrease in NOx, which led to a significant increase in O3 in Hangzhou during the COVID-19 lockdown [13]. Bao et al. found that PM2.5, PM10, and CO decreased by 5.93%, 13.66%, and 4.58%, respectively, in 44 cities of northern China [14]. Thus far, however, most studies have assessed changes in primary air pollutants only before April 2020, without considering the lagged (hysteresis) response of the environment [15][16][17][18][19].
Interestingly, several studies have reported that despite the decrease in air pollutant emissions in January and February 2020, severe haze episodes in northern China did not show a corresponding improvement [12,15,20]. These anomalous changes caused public controversy about the actual effects of the CAAP [8]. However, short-term responses of air quality to variations in anthropogenic emissions may not be evident at the regional meteorological scale [21]. For example, Li et al. reported that environmental regulations have a stronger hysteresis impact on air pollutants in eastern China than in western China [17]. In addition, the transboundary transport of air pollutants can introduce uncertainty when assessing, over short periods and in small regions, how much exhaust emission controls reduce air pollutants [21,22]. Therefore, investigations limited to the first few months of 2020, rather than examining annual variations, are insufficient.
For long-term prediction in the geosciences, ML algorithms have been shown to handle non-linear processes well [23]. Several studies have successfully applied ML algorithms to estimate PM2.5 and PM10 [24]. These studies have produced high-quality air pollutant datasets with broad spatial coverage, but predictions of CO and other air pollutants are still lacking [24][25][26]. Moreover, comprehensive assessments of the spatiotemporal distributions of air pollutants and their changes during the COVID-19 lockdown remain scarce [14]. Therefore, in the current study, we used an ML approach to estimate the concentrations of PM2.5, PM10, O3, and CO in central and eastern China. We further analyzed the spatiotemporal distributions of and changes in these primary air pollutants from 2018 to 2020 and explored the impact of the COVID-19 lockdown on ambient pollution.

Study Area
The study area, covering 22 provinces and municipalities of central and eastern mainland China, and its air quality monitoring stations are shown in Figure 1a. The area includes economically developed megacities in the Beijing-Tianjin-Hebei region (BTH), Yangtze River Delta (YRD), and Guangdong-Hong Kong-Macao Greater Bay Area (GBA). These stations provide broad air monitoring coverage and ground-based observational data for these regions. Furthermore, according to the China Statistical Yearbook (2019) [27], the 22 provinces account for 82.4% of China's total population and 86.2% of its Gross Domestic Product (GDP). Since 2013, the Chinese government has implemented various stringent pollution control policies, resulting in a significant improvement in ambient air quality. The COVID-19 lockdowns in early 2020 further reduced emissions. The joint effects of the CAAP and COVID-19 lockdowns reduced human activities and improved the air environment. Therefore, understanding the concentrations of and changes in air pollutants from 2018 to 2020 is of great significance for future environmental protection and public health in central and eastern China.

Data Description
The ground-based observational data of the four air pollutants (PM2.5, PM10 …
The Suomi National Polar-Orbiting Partnership (Suomi NPP) spacecraft, launched on 28 October 2011, carries the Visible Infrared Imaging Radiometer Suite (VIIRS), which has 22 radiometric bands ranging from 0.41 to 12.5 microns and a 3060 km swath. We used the VIIRS Level 3 daily Deep Blue aerosol products (AERDB_D3_VIIRS_SNPP, see https://ladsweb.modaps.eosdis.nasa.gov/search/, accessed on 10 May 2021) from 2018 to 2020, which are derived from the VIIRS Level 2 6-min swath-based aerosol products and gridded at a 1° × 1° horizontal resolution. Using these aggregated products improves data validity and reduces the influence of outliers.
We used the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) re-analysis dataset (ERA5, see https://cds.climate.copernicus.eu/#!/home, accessed on 10 May 2021) at 0.25° grid resolution from 2018 to 2020. Based on previous studies and a correlation analysis (Figure 2), we selected the following hourly meteorological variables from ERA5: relative humidity (RH, %), surface pressure (SP, Pa), air temperature (T, K), boundary layer height (BLH, m), and the u- and v-components of wind at 10 m (U10 and V10, m/s). The hourly data were then converted to a daily scale.
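The conversion of hourly ERA5 fields to a daily scale can be sketched as below. This is a minimal illustration, assuming that the conversion is a simple arithmetic mean over each calendar day of the timestamps as given (ERA5 timestamps are in UTC); the function name hourly_to_daily_mean and the sample boundary layer heights are hypothetical, not the authors' code or data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def hourly_to_daily_mean(records):
    """Average hourly (timestamp, value) pairs into one mean per calendar day.

    Assumption: a plain daily arithmetic mean over the timestamps as given
    (ERA5 timestamps are UTC); no weighting or gap filling is applied.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[timestamp.date()].append(value)
    # Sort by date so the output is in chronological order.
    return {day: fmean(values) for day, values in sorted(buckets.items())}


# Hypothetical hourly boundary layer height (BLH, m) for two days.
start = datetime(2020, 2, 1)
hourly_blh = [(start + timedelta(hours=h), 500.0 + 10.0 * (h % 24)) for h in range(48)]
daily_blh = hourly_to_daily_mean(hourly_blh)
print(daily_blh)  # {datetime.date(2020, 2, 1): 615.0, datetime.date(2020, 2, 2): 615.0}
```

The same function would be applied independently to each variable (RH, SP, T, BLH, U10, V10) at each grid cell.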

ML Algorithm
In recent years, ML algorithms have been widely applied to air pollutant prediction, mainly owing to their superior ability to capture and exploit the features of independent variables and to solve complex non-linear problems [25,26,28,29]. Air pollutant concentrations are non-linearly affected by multiple ambient factors, such as T, RH, SP, and wind speed. As shown in Figure 2, meteorological factors significantly influenced the concentrations of the air pollutants (PM2.5, PM10, O3, and CO), which showed relatively strong relationships with aerosol optical depth (AOD) and T. Conventional statistical models have difficulty capturing the complex non-linear relationships between meteorological factors and air pollutant concentrations; ML models can therefore be used to address this non-linear problem [26].
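The correlation analysis referred to above (Figure 2) can be illustrated with a plain Pearson correlation coefficient. The sketch below is an assumption about how such a screening might be computed, not the authors' procedure; the predictor values are invented for illustration, and rank_predictors is a hypothetical helper that orders predictors by the absolute value of R.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def rank_predictors(predictors, target):
    """Return (name, R) pairs sorted by |R|, strongest relationship first."""
    scores = {name: pearson_r(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical daily values at one station.
o3 = [2.0, 4.0, 6.0, 8.0, 10.0]
predictors = {"T": [1.0, 2.0, 3.0, 4.0, 5.0], "RH": [2.0, 1.0, 2.0, 1.0, 2.0]}
print(rank_predictors(predictors, o3))
```

Pearson R only captures linear association, which is precisely why the text turns to a non-linear ML model for the final mapping.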
The random forest (RF) model is an ensemble ML algorithm composed of many decision trees. Owing to this ensemble structure, the RF regression model has a relatively low generalization error, and it produces its output by aggregating the predictions of the individual regression trees [30]. During RF regression training, the samples in each regression tree are split using the optimal split chosen according to the data characteristics, as shown by the red chain in Figure 1b. The detailed tree structures of the RF models for PM2.5, PM10, O3, and CO are presented in Figures S1–S4.
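The node-splitting step described above can be sketched for a single predictor. This assumes the usual CART regression criterion, choosing the threshold that minimizes the summed squared error of the two child nodes; this criterion is common in RF implementations but is not stated in the source. The helper best_split is hypothetical, and its min_samples_split=10 default mirrors the minimum number of splitting samples reported for this study. A full RF would repeat this search over random feature subsets within each bootstrap sample, grow every tree recursively, and average the tree predictions.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_samples_split=10):
    """Find the threshold on one predictor x that best splits the target y.

    Returns (threshold, child_sse), or None if the node has too few samples
    to be split or all x values are identical.
    """
    if len(x) < min_samples_split:
        return None
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if best is None or cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, cost)
    return best


# Hypothetical AOD values and PM2.5 concentrations (µg/m³) reaching one node.
aod = [0.10, 0.15, 0.20, 0.25, 0.30, 0.60, 0.65, 0.70, 0.75, 0.80]
pm25 = [20.0, 22.0, 21.0, 19.0, 23.0, 80.0, 78.0, 82.0, 79.0, 81.0]
print(best_split(aod, pm25))
```

Here the best threshold falls between AOD 0.30 and 0.60 (about 0.45), separating the low- and high-PM2.5 samples.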
Because the AOD and meteorological data differ in resolution, we linearly interpolated both to a common 0.5° grid for the model input. To assess the relationships between the concentrations of the four air pollutants and the input variables, we co-matched each ground-based observation to the nearest input pixel. The specific structure and schematics of the geographical RF model are illustrated in Figure 1b. Following previous research, we set the number of trees to 2000, the maximum tree depth to 15, and the minimum number of samples required to split a node to 10 [25,31]. The relative feature importance (RFI) of each input predictor was derived from the trained model. Additionally, sample-based 10-fold cross-validation (CV) was applied to optimize model performance. Compared with neural network (NN) models, the RF model is better suited to substantive interpretation in practical applications because it provides variable importance [30].
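The regridding and co-matching steps can be sketched as follows. The source states only that the inputs were linearly interpolated to 0.5° and that each station was matched to the nearest pixel. This sketch assumes bilinear interpolation on a regular latitude-longitude grid and nearest-node matching by rounding grid indices (treating degrees as planar). All function names and sample values are hypothetical.

```python
import math


def bilinear(grid, lat0, lon0, step, lat, lon):
    """Bilinearly interpolate grid[i][j], defined at (lat0 + i*step, lon0 + j*step).

    The grid must have at least 2 rows and 2 columns.
    """
    n_lat, n_lon = len(grid), len(grid[0])
    fi = (lat - lat0) / step  # fractional row index
    fj = (lon - lon0) / step  # fractional column index
    tol = 1e-9
    if not (-tol <= fi <= n_lat - 1 + tol and -tol <= fj <= n_lon - 1 + tol):
        raise ValueError("point lies outside the source grid")
    fi = min(max(fi, 0.0), n_lat - 1)
    fj = min(max(fj, 0.0), n_lon - 1)
    i0 = min(math.floor(fi), n_lat - 2)
    j0 = min(math.floor(fj), n_lon - 2)
    di, dj = fi - i0, fj - j0
    return (grid[i0][j0] * (1 - di) * (1 - dj)
            + grid[i0][j0 + 1] * (1 - di) * dj
            + grid[i0 + 1][j0] * di * (1 - dj)
            + grid[i0 + 1][j0 + 1] * di * dj)


def regrid(grid, lat0, lon0, step, new_step=0.5):
    """Resample a regular grid onto nodes spaced new_step degrees apart."""
    n_lat = math.floor((len(grid) - 1) * step / new_step + 1e-9) + 1
    n_lon = math.floor((len(grid[0]) - 1) * step / new_step + 1e-9) + 1
    return [[bilinear(grid, lat0, lon0, step, lat0 + i * new_step, lon0 + j * new_step)
             for j in range(n_lon)] for i in range(n_lat)]


def nearest_pixel(lat0, lon0, step, n_lat, n_lon, lat, lon):
    """Return the (row, col) of the grid node nearest to a station, or None if outside."""
    i = math.floor((lat - lat0) / step + 0.5)
    j = math.floor((lon - lon0) / step + 0.5)
    if 0 <= i < n_lat and 0 <= j < n_lon:
        return i, j
    return None


# Toy 1° field (rows: 30°N, 31°N; columns: 110°E, 111°E) regridded to 0.5°.
field_1deg = [[0.0, 1.0], [2.0, 3.0]]
field_half = regrid(field_1deg, 30.0, 110.0, 1.0, 0.5)
print(field_half)  # [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
print(nearest_pixel(30.0, 110.0, 0.5, 3, 3, 30.6, 110.2))  # (1, 0)
```

When coarsening the 0.25° ERA5 fields, this simply samples the coincident nodes; a block average would be an alternative design choice.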

Evaluation of Model Performance
As shown in Figure 3, the coefficients of determination (R²) between the RF-predicted and ground-measured air pollutant concentrations were above 0.925 for training (2018–2019) and above 0.883 for independent validation (2020). The RMSE values for the PM2.5, PM10, O3, and CO validation datasets were 9.027 µg/m³, 20.312 µg/m³, 10.436 µg/m³, and 0.097 mg/m³, respectively. The R² and RMSE values of the validation dataset were weaker than those of the training dataset but remained within an acceptable range. However, the model tended to underestimate air pollutant levels under high-concentration conditions, a tendency common to most ML algorithms [32]. The O3 model was more robust than the other three pollutant models owing to its higher correlations with T, BLH, and RH (correlation coefficients (R) of 0.42, 0.36, and −0.3, respectively) (Figure 3). The RFI values of the predictors are shown in Figure 4. The two most important predictors of PM2.5 and PM10 are AOD and T, whereas the two most important predictors of O3 and CO are T and BLH. A larger RFI indicates that the predictor contributes more to the output variable. Thus, in the geographical RF models, AOD mainly regulates the changes in PM2.5 and PM10, while T mainly controls the changes in O3.
The performance of the geographical RF models was also diagnosed using learning curves (Figure 5). A learning curve shows how the training and cross-validation performance scores change with sample size. Here, the performance scores of the geographical RF models were determined using the sample-based 10-fold CV method. As Figure 5a–d show, the CV performance scores improve as the training dataset grows, suggesting that a larger training dataset could further improve model performance.
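The two validation metrics and the sample-based 10-fold CV can be written out explicitly. This sketch assumes that R² is computed as 1 − SS_res/SS_tot of predictions against observations (the source does not say whether this form or the squared Pearson correlation is used) and that folds are formed by randomly shuffling individual samples, which is what "sample-based" CV usually denotes. The helper names and observations are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) for sample-based k-fold CV with random shuffling."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]  # k roughly equal folds
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


obs = [10.0, 20.0, 30.0, 40.0]   # hypothetical PM2.5 observations (µg/m³)
pred = [12.0, 18.0, 33.0, 37.0]  # hypothetical RF predictions (µg/m³)
print(rmse(obs, pred), r_squared(obs, pred))
```

Each sample appears in exactly one test fold, so averaging a score over the ten folds uses every observation once for validation; a learning curve repeats this for increasing training subset sizes.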
Additionally, the training dataset size affects the performance scores of PM2.5 and PM10 more strongly than those of O3 and CO. Overall, our results indicate that the concentrations of the four air pollutants can be accurately predicted using the geographical RF model.
… PM10, and CO, indicating that short-term control of production may improve air quality over a relatively long period.
Figure 8 shows the monthly spatial distributions of model-predicted PM2.5, PM10, O3, and CO concentrations in 2020 over central and eastern China. Interestingly, although the O3 concentration increased during the COVID-19 lockdown, its rate of change in 2020 was −2.7%, which may be due to the relatively sharp decline (−22.5%) in summer and autumn, as indicated by the model-predicted concentrations in Figures 7c and 8c. The increase in O3 concentration may be due to the higher temperature in February 2020 (~1.8 °C above the average of 2018 and 2019) [33]. In addition, the sharp decrease in PM2.5 and PM10 weakened aerosol scattering and absorption, and the resulting higher solar radiation intensity may have led to the increase in O3 [5]. Notably, the spatial distributions of the changes in the other three air pollutants are mutually consistent, in line with their similar R and RFI results (Figures 2 and 4). Spatially, in the NCP, GBA, and Liaotung Peninsula regions, the O3 concentration was high but showed a decreasing tendency, whereas in central China, especially Chongqing and Guizhou, the concentration was relatively low but showed an increasing tendency (~20–40%). In contrast, PM2.5 decreased by 32.56% on average in these regions. According to previous studies, the O3–PM2.5 relationship (PM2.5 scavenges hydroperoxyl (HO2) radicals and nitrogen oxides (NOx), so a reduction in PM2.5 favors O3 production) may be one of the main reasons for the marked increase in O3 pollution in these regions [6].
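The annual rates of change quoted in this section (for example, −2.7% for O3 in 2020) are expressed relative to the mean of 2018 and 2019. A minimal sketch of that calculation for a single value and for a gridded map follows; the function names and concentrations are hypothetical, and returning None for a zero baseline is an illustrative choice.

```python
def percent_change(value_2020, value_2018, value_2019):
    """Percentage change of 2020 relative to the 2018-2019 mean (None if the baseline is 0)."""
    baseline = (value_2018 + value_2019) / 2
    if baseline == 0:
        return None
    return 100.0 * (value_2020 - baseline) / baseline


def percent_change_map(map_2020, map_2018, map_2019):
    """Apply percent_change cell by cell to three equally shaped 2-D grids."""
    return [
        [percent_change(c20, c18, c19) for c20, c18, c19 in zip(row20, row18, row19)]
        for row20, row18, row19 in zip(map_2020, map_2018, map_2019)
    ]


# Hypothetical annual mean PM2.5 (µg/m³) in one grid cell.
print(percent_change(40.0, 52.0, 48.0))  # -20.0
```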
Comparing Figures 3b and 7a shows that PM2.5 increased more markedly in low-concentration areas, and PM10 and CO behaved similarly. Specifically, the increases in air pollutants (except O3) were evident (~15–55% on average in 2020) in the eastern Shandong Peninsula and southeastern China. However, O3 changed in the opposite direction in these areas (i.e., it decreased by ~5–20%), which may be explained by the inverse O3–PM2.5 relationship [6]. Although the spatial distribution of the average rate of change for CO was similar to those of PM2.5 and PM10, the concentration distributions were not consistent, especially in spring and autumn (Figures 7d and 8d). This may be because CO concentrations are influenced less by AOD (R = 0.23, RFI = 0.14) than by regional vehicle exhaust and industrial waste gas [34]. Therefore, reducing emissions and controlling anthropogenic activities are effective at alleviating air pollution in central and eastern China.

Seasonal Changes in Air Pollutants in 2020 Compared to 2018 and 2019
The predicted means and standard deviations of the PM2.5, PM10, O3, and CO concentrations are presented in Figure 9. In each year from 2018 to 2020, PM2.5, PM10, and CO concentrations reach their minimum values in summer, whereas O3 reaches its maximum in summer. The rapid changes in PM10 occurred in April for 2018–2019 but in February for 2019–2020, which may reflect differences in policy implementation. Additionally, the January change in PM10 from 2019 to 2020 (−25.8%) was stronger than that in PM2.5 (−13.3%), suggesting that PM10 may be more heavily influenced by policy changes than PM2.5.
… (Figures 4 and 10). Similar results have been reported in previous research [13,35]. The changes in PM2.5 and PM10 in the summer of 2020 were not significant, which may be due to the resumption of production [15].
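The seasonal means and standard deviations summarized in Figure 9 can be computed as sketched below. The source defines neither the seasons nor the standard deviation estimator, so this assumes meteorological seasons (DJF, MAM, JJA, SON) and the sample standard deviation; the helper name seasonal_stats and the values are hypothetical. The sketch groups months within a single year, so December joins January and February of the same calendar year rather than the preceding winter.

```python
from collections import defaultdict
from statistics import fmean, stdev

# Assumed mapping of calendar months to meteorological seasons.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_stats(monthly_values):
    """Return {season: (mean, sample standard deviation)} from (month, value) pairs.

    A season with a single value gets a standard deviation of 0.0 (illustrative choice).
    """
    groups = defaultdict(list)
    for month, value in monthly_values:
        groups[SEASON_OF_MONTH[month]].append(value)
    return {
        season: (fmean(values), stdev(values) if len(values) > 1 else 0.0)
        for season, values in groups.items()
    }


# Hypothetical monthly mean PM2.5 (µg/m³) for one year.
pm25_monthly = [(1, 80.0), (2, 60.0), (12, 70.0), (6, 30.0), (7, 40.0), (8, 50.0)]
print(seasonal_stats(pm25_monthly))
```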
However, the trends of the four air pollutants all reversed in September and October 2020 (decreasing) relative to those in 2018–2019 (increasing) (Figures 4 and 10), which may be linked to the joint attenuation effects of the CAAP and the COVID-19 lockdown. The models predicted similar seasonal changes in PM2.5, PM10, O3, and CO in 2020, as shown in Figures 7 and 8. Although the declines in O3 concentration were weaker than those of the other three air pollutants, O3 still showed a reversed trend (i.e., rates of change of 0.0349 and −0.0337 µg/m³/day in 2018–2019 and 2019–2020, respectively). Moreover, the post-lockdown attenuation of air pollutants accounted for a large proportion of the annual reduction in 2020.
… Table 1 show that the ground-based observations of changes in air pollutants were relatively consistent with the model predictions, although the rates of change differed slightly owing to the different spatial coverage. However, the model predictions may slightly underestimate the concentrations under extreme weather conditions [32]. Although the reduction ratio of PM10 was larger than that of PM2.5 in 2019–2020, PM2.5 was more strongly influenced by the COVID-19 lockdown than PM10. As shown in Figure 4, PM2.5 was the only air pollutant to show a sharply reversing trend in February 2020. Moreover, as shown in Figure 10, the change in O3 concentration was inversely related to that of CO, which may be due to their opposite correlations with temperature (0.42 for O3 …
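Rates of change such as 0.0349 and −0.0337 µg/m³/day are slopes of concentration against time. The source does not state how they were estimated; the sketch below assumes an ordinary least-squares slope fitted to a daily series, and the helper name ols_slope and the series are hypothetical. Under this reading, a "reversed trend" corresponds to slopes of opposite sign in the two comparison periods.

```python
def ols_slope(days, values):
    """Ordinary least-squares slope of values against days (units per day)."""
    n = len(days)
    if n != len(values) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_d = sum(days) / n
    mean_v = sum(values) / n
    s_dd = sum((d - mean_d) ** 2 for d in days)
    if s_dd == 0:
        raise ValueError("days must not all be equal")
    s_dv = sum((d - mean_d) * (v - mean_v) for d, v in zip(days, values))
    return s_dv / s_dd


# Hypothetical daily O3 (µg/m³) over ten days, falling 0.5 µg/m³ per day.
days = list(range(10))
o3_daily = [100.0 - 0.5 * d for d in days]
print(ols_slope(days, o3_daily))  # -0.5
```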

Conclusions
We combined ground-based observations, re-analysis, and satellite remote sensing data in an ML model to analyze the annual and seasonal changes in primary air pollutants over central and eastern China and to explore the joint impact of the COVID-19 lockdowns and the CAAP on ambient pollution. The RMSE values for the PM2.5, PM10, O3, and CO validation datasets were 9.027 µg/m³, 20.312 µg/m³, 10.436 µg/m³, and 0.097 mg/m³, respectively. The differences between the predicted and observed concentrations of PM2.5, PM10, O3, and CO were statistically acceptable, and the geographical RF model performed well for all four pollutants (validation R² above 0.883, RMSE within an acceptable range), indicating that the selected variables were able to explain the changes in the pollutants.
Relative to their concentrations in 2018–2019, the concentrations of PM2.5, PM10, and CO in February 2020 decreased by 44.1%, 43.2%, and 35.9%, respectively. These reductions were influenced by the COVID-19 lockdown and lasted until May 2020. The overall increasing trend in O3 concentration was arrested and reversed in 2020. Interestingly, the 2019–2020 trends of all four pollutants reversed in September and October 2020 relative to those observed in 2018–2019, which may be due to the joint effects of the CAAP and the COVID-19 lockdown.
Spatially, the concentrations of PM2.5, PM10, and CO showed decreasing trends in almost all regions of central and eastern China, decreasing by 16.4%, 24.2%, and 19.8%, respectively, in 2020 relative to the average levels in 2018 and 2019. The average rate of change in O3 was −2.7% in 2020. Although areas with high O3 concentrations (NCP, GBA, and Liaotung Peninsula) showed declining trends, the government still needs to pay attention to less-polluted areas, where O3 showed an overall increasing trend (~20–40%). The COVID-19 lockdowns affected PM2.5 more strongly than PM10, but the annual average rate of change influenced by policy was larger for PM10. Overall, under the combined influence of the CAAP and COVID-19 lockdowns, ambient air quality improved, but preparations are still required to prevent future haze events.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/rs13132525/s1: Figure S1. The detailed structure of the RF algorithm for PM2.5; Figure S2. The detailed structure of the RF algorithm for PM10; Figure S3. The detailed structure of the RF algorithm for O3; Figure S4. The detailed structure of the RF algorithm for CO. In each figure, because the maximum depth of the RF trees is 15, only the first four levels of the decision trees are displayed.