Spatiotemporal Associations between PM2.5 and SO2 as well as NO2 in China from 2015 to 2018

Given the critical roles of nitrates and sulfates in fine particulate matter (PM2.5) formation, we examined spatiotemporal associations between PM2.5 and sulfur dioxide (SO2) as well as nitrogen dioxide (NO2) in China by taking advantage of the in situ observations of these three pollutants measured from the China national air quality monitoring network for the period from 2015 to 2018. Maximum covariance analysis (MCA) was applied to explore their possible coupled modes in space and time. The relative contribution of SO2 and NO2 to PM2.5 was then quantified via a statistical modeling scheme. The linear trends derived from the stratified data show that both PM2.5 and SO2 decreased significantly in northern China in terms of large values, indicating a fast reduction of high PM2.5 and SO2 loadings therein. The statistically significant coupled MCA mode between PM2.5 and SO2 indicated a possible spatiotemporal linkage between them in northern China, especially over the Beijing–Tianjin–Hebei region. Further statistical modeling practices revealed that the observed PM2.5 variations in northern China could be explained largely by SO2 rather than NO2 therein, given the estimated relatively high importance of SO2. In general, the evidence-based results in this study indicate a strong linkage between PM2.5 and SO2 in northern China in the past few years, which may help to better investigate the mechanisms behind severe haze pollution events in northern China.


Introduction
Concentrations of fine particulate matter (denoted as PM 2.5 ) in China have decreased prominently during the time period of 2013-2018 due to the implementation of the air pollution prevention and control action plan [1][2][3]. Further investigations indicate that the observed reduction of PM 2.5 loadings in China could be largely attributed to the implementation of many effective air pollution control policies, such as fuel quality promotion and usage of cleaner energy, and especially the reduction of industrial emissions of air pollutants like sulfates and nitrogen oxides [4][5][6].
Many previous studies have evaluated the spatial and temporal variations of PM 2.5 concentrations in China in terms of either trend estimation or spatial clustering analysis. For instance, using a global and local Moran's I index, Ye et al. (2018) [7] examined the spatiotemporal patterns of PM 2.5 in 338 major Chinese cities at multiple time-scales, and found that PM 2.5 in China has evident spatial autocorrelation and clustering characteristics with "hot spots" identified in the Beijing-Tianjin-Hebei (BTH) region and southwest Xinjiang. By taking advantage of satellite-based PM 2.5 estimations, Ma et al. (2015) [8] analyzed the spatiotemporal trends of PM 2.5 nationwide and found that PM 2.5 in China increased during 2004 to 2007, and has decreased since then. Similar effects were also revealed in Reference [1], but with more spatial details, as the PM 2.5 inflection time was identified at each grid pixel. Moreover, their results indicate that performing trend analysis on stratified data (e.g., large values) is more informative and meaningful for evaluating the effectiveness of emission control actions on PM 2.5 pollution levels [1].
Aside from trend analysis, correlations between PM 2.5 and other air pollutants such as nitrogen dioxide (NO 2 ), sulfur dioxide (SO 2 ), carbon monoxide, and ozone have also been examined in many studies [4,9], aiming to explore their possible interactions in space and time [10]. Due to complex mechanisms of PM 2.5 formation and variation, attribution of the observed PM 2.5 variations to precursor gases via correlation analysis is by no means an easy task. Numerical simulations are always needed to achieve such a goal via sensitivity experiments, by considering various influential factors such as anthropogenic emissions and meteorological conditions [3,[11][12][13][14].
Previous studies have revealed that anthropogenic emissions of SO 2 and NO 2 play important roles in the process of secondary fine particulate matter formation [6,15,16]. Although the newly established China national ambient air quality monitoring work has provided ample first-hand observations of six essential air pollutants on an hourly basis since 2013, few studies have attempted to examine the possible spatiotemporal associations between PM 2.5 and SO 2 as well as NO 2 at the national scale using these valuable long-term in situ records.
In this study, ground measured concentrations of PM 2.5 and SO 2 as well as NO 2 in China from 2015 to 2018 were used to explore possible associations between them in space and time. Differing from previous correlation analysis, maximum covariance analysis (MCA) was applied here to identify possible coupled spatiotemporal modes between PM 2.5 and the other two pollutants. Statistical modeling practices were subsequently performed to explore the statistical relationship between them, aiming to resolve the relative contribution of SO 2 and NO 2 to the observed PM 2.5 variations. Science questions to be answered by this study may include: (1) Is there any statistically significant association between PM 2.5 and SO 2 and/or NO 2 in space and time across China? (2) To what extent can the observed PM 2.5 variations be explained by SO 2 and NO 2 ? (3) Which factor is more important in modulating the observed PM 2.5 variations?

In Situ Air Quality Records
Hourly PM 2.5 , SO 2 , and NO 2 concentration records during the period of 1 January 2015 to 31 December 2018, acquired from the China national environmental monitoring center, were employed. China began establishing a national ambient air quality monitoring network in 2013. To date, more than 1600 state-level monitoring stations have been deployed across 338 major Chinese cities. The spatial distribution of these stations can be found in Figure 1. Most of these stations are located in the east and south of China, where dense populations are found. Six essential air quality indicators (PM 2.5 , PM 10 , O 3 , CO, NO 2 , SO 2 ) are automatically sampled at each station, primarily using a tapered element oscillating microbalance analyzer and/or beta-attenuation monitor, and are then released publicly online by the China Environment Monitoring Center on an hourly basis.

Quality Control and Data Integration
Given observations from dense stations would not only result in significant information redundancy, but might increase the computational complexity due to significant spatial heterogeneity of air quality observations, data from 1604 stations were integrated into 530 data sets via spatial hierarchy clustering by following the process as detailed in Reference [1]. As shown in Figure 1, the clustered stations consisted of 369 merged stations and 161 independent stations.
Prior to data integration, essential quality control measures were applied to raw observations at each station to detect and remove outliers. First, a reference time series was created among stations within the same cluster using the median values of data from original stations, and a data value was referred to as an outlier once it exceeded triple (six times for SO 2 because of large variation) the 24 h standard deviations from the reference time series. Second, gradient difference was calculated for the previously screened time series to further identify outliers. Specifically, the quadratic derivatives were calculated and data values were deemed outliers once the quadratic gradient values were not within the range of −100 to 100. Such a screening enabled identification of outliers overlooked in the previous step. Around 3.9% of PM 2.5 observations (2.9% for SO 2 and 5.1% for NO 2 ) were identified as outliers and then removed. Quality assured hourly data values in each cluster were then averaged on a semi-monthly basis to create new time series for each air pollutant. Such a data integration scheme not only improved the data completeness in the temporal domain, but reduced the overall dimensionality of data sources. Finally, 530 clustered stations were attained, and records from only 499 stations were used for the subsequent analysis, given the presence of salient data gaps in data records from other 31 stations.

Quality Control and Data Integration
Given observations from dense stations would not only result in significant information redundancy, but might increase the computational complexity due to significant spatial heterogeneity of air quality observations, data from 1604 stations were integrated into 530 data sets via spatial hierarchy clustering by following the process as detailed in Reference [1]. As shown in Figure 1, the clustered stations consisted of 369 merged stations and 161 independent stations.
Prior to data integration, essential quality control measures were applied to raw observations at each station to detect and remove outliers. First, a reference time series was created among stations within the same cluster using the median values of data from original stations, and a data value was referred to as an outlier once it exceeded triple (six times for SO2 because of large variation) the 24 h standard deviations from the reference time series. Second, gradient difference was calculated for the previously screened time series to further identify outliers. Specifically, the quadratic derivatives were calculated and data values were deemed outliers once the quadratic gradient values were not within the range of −100 to 100. Such a screening enabled identification of outliers overlooked in the previous step. Around 3.9% of PM2.5 observations (2.9% for SO2 and 5.1% for NO2) were identified as outliers and then removed. Quality assured hourly data values in each cluster were then averaged on a semi-monthly basis to create new time series for each air pollutant. Such a data integration scheme not only improved the data completeness in the temporal domain, but reduced the overall dimensionality of data sources. Finally, 530 clustered stations were attained, and records from only 499 stations were used for the subsequent analysis, given the presence of salient data gaps in data records from other 31 stations.

MCA Method
MCA was developed to identify coupled modes between two geophysical fields by maximizing their covariance rather than correlation in the time-space domain [17,18]. Because of this unique capability, MCA has been widely applied to link two geophysical fields in time and space, especially for climate change detection and attribution, since MCA enables resolution of the coupled fields and their presentation in both spatial and temporal domains [19,20]. A detailed description of the MCA method can be found elsewhere [17].
In terms of the identified coupled modes, whether these modes are statistically meaningful is always a big concern in real practice. In this study, Monte Carlo simulations were conducted to test the statistical significance of each MCA mode by randomly permuting the time series of one field [18,21,22]. The derived MCA modes were deemed statistically significant once the actual statistics were greater than those of their random counterparts. Prior to MCA analysis, the seasonality (i.e., annual cycle) of each variable was first removed, and the goal was to avoid possible influence from local atmospheric dynamic related variations. Specifically, PM 2.5 values at each semi-month of the year for the whole study period were averaged and then removed from the original time series. Similar procedures were also applied for SO 2 and NO 2 data sets.
In principle, mapping the left and right MCA derived singular vectors geographically would reveal synergetic variations of both fields in the spatial domain. Given singular vectors always have different magnitudes, most studies have converted them to either homogeneous or heterogeneous correlation maps to better interpret the MCA results [17,[23][24][25]. Homogeneous correlation maps represent the correlation values between the expansion coefficient of one field and the time series of the same field at each grid point, with another field for heterogeneous maps [26]. In this study, the spatial pattern of coupled modes were presented in the form of covariance maps, since they reveal the spatial localization of the co-varying part on each field [17,26,27].

Quantifying the Relative Contribution of NO 2 and SO 2 to PM 2.5 Variations
Although MCA enables resolution of the coupled patterns between PM 2.5 and another two fields, it is challenging to determine which field plays a more important role in contributing to the observed PM 2.5 variations. Statistical modeling practices were therefore performed in the present study in the form of multiple linear regression (MLR) to further evaluate the relative contribution of NO 2 and SO 2 to the detected regional PM 2.5 variations. Specifically, PM 2.5 variations and the corresponding NO 2 and SO 2 information were first extracted from the hot spot regions as identified in the MCA modes. Subsequently, the extracted PM 2.5 time series were used as the dependent variable and the extracted principal components of NO 2 and SO 2 over the selected hot spot regions were used as explanatory variables.
The variance inflation factor (VIF) was calculated between these regressors to examine the possible collinearity problem. A threshold of 10 (an empirical value that is widely used in the literature) was also used in this study to determine the possible collinearity [28]. Finally, the LMG method [29], an advanced technique to determine the relative importance of explanatory variables in a linear model, was also applied to assess the relative contribution of NO 2 and SO 2 to the observed PM 2.5 variations. A detailed description of the calculation process of the LMG measure can be found elsewhere [29].

Results and Discussion
3.1. Linear Trends for PM 2.5 and SO 2 as well as NO 2 To better examine possible coupled spatiotemporal variations between PM 2.5 and SO 2 and NO 2 , linear trends for each factor were firstly estimated using the least-square fitting method. Figure 2 shows the estimated linear trends for three factors between January 2015 and December 2018, and the left panel (Figure 2a-c) presents the trends estimated from an averaged value based on all data values, whereas the right counterparts (Figure 2d-f) were derived from their large values (i.e., averaged over values greater than the fourth quartile). It shows that PM 2.5 and SO 2 decreased markedly during the study time period, especially in northern China, where significant decreasing trends were observed. Moreover, in contrast to the trends estimated from all data value averages, trends estimated from large values were even more salient in northern China, indicating the effectiveness of clean air actions in reducing high loadings of PM 2.5 and SO 2 therein. However, no such significant declination effect was found for NO 2 concentrations across China. In contrast, marked increasing trends (dots shaded in orange) were even observed in some regions, for example over Shanxi and Anhui provinces. Table 1 shows the estimated linear least-square trends for these three pollutants over seven major geographic divisions (refer to Figure 1 for the geolocation of these divisions) from 2015 to 2018. It is noticeable that concentrations of PM 2.5 and SO 2 decreased nationwide, but had different declining rates. In regard to PM 2.5 , regions exhibiting the most significant decreasing trends included the northeast, northern China, and central China, by a rate of about −4 µg/(m 3 ·a) for all value averaged concentrations and −8 µg/(m 3 ·a) for large values. It shows that trends for large values almost doubled those of all value averaged concentrations, indicating the decline of PM 2.5 loading is mainly dominated by the decreasing of severe pollution episodes. A similar effect was also observed for SO 2 , while northern China, central China, and east China were the three regions with the most significant declining trends.
With respect to NO 2 , both increasing and decreasing trends were observed, whereas most trends were statistically insignificant, except for the northeast, where NO 2 concentrations were observed with a statistically significant decreasing trend. These results collectively indicate that new actions should be implemented to control NO 2 emissions, toward the improvement of air quality nationwide.

Spatiotemporal Coupled Patterns between PM2.5 and SO2 as well as NO2
MCA was applied here to explore the spatiotemporal coupled patterns between PM2.5 and SO2 as well as NO2. Prior to the analysis of coupled patterns, significance of each MCA mode was firstly assessed using Monte Carlo simulations. Figure 3 shows the estimated squared covariance fraction (SCF) of each MCA mode and the estimated upper limit at a 95% confidence interval (UPCI). In  MCA was applied here to explore the spatiotemporal coupled patterns between PM 2.5 and SO 2 as well as NO 2 . Prior to the analysis of coupled patterns, significance of each MCA mode was firstly assessed using Monte Carlo simulations. Figure 3 shows the estimated squared covariance fraction (SCF) of each MCA mode and the estimated upper limit at a 95% confidence interval (UPCI). In principle, only MCA modes with SCFs greater than the 95% UPCI are considered to be statistically significant. Apparently, only the first MCA mode of each pair passed the significance test. In other words, statistically significant linkages could be found from the first MCA modes between PM 2.5 and SO 2 and/or NO 2 .
Given the significant test results, we only used the first MCA mode to explore possible coupled patterns between these three pollutants, to ensure the spatial patterns we extracted were statistically meaningful. It shows that the first MCA mode between PM 2.5 and SO 2 explained more than 80% of the observed covariance, indicating a tight spatiotemporal co-varying pattern between PM 2.5 and SO 2 across China. This effect could be also linked to the observed coincident declining trends for both pollutants. In contrast, the first MCA mode between PM 2.5 and NO 2 only explained about 50% of the covariance, indicating no uniform spatiotemporal variation pattern between these two factors. Given that PM 2.5 at most stations exhibited decreasing trends across China, the small SCF values associated with coupled modes between PM 2.5 and NO 2 thus suggest a significant heterogeneity of NO 2 variations nationwide. Such a deduction is also in line with the results previously shown in Figure 2, where heterogeneous NO 2 trends were observed across China. words, statistically significant linkages could be found from the first MCA modes between PM2.5 and SO2 and/or NO2. Given the significant test results, we only used the first MCA mode to explore possible coupled patterns between these three pollutants, to ensure the spatial patterns we extracted were statistically meaningful. It shows that the first MCA mode between PM2.5 and SO2 explained more than 80% of the observed covariance, indicating a tight spatiotemporal co-varying pattern between PM2.5 and SO2 across China. This effect could be also linked to the observed coincident declining trends for both pollutants. In contrast, the first MCA mode between PM2.5 and NO2 only explained about 50% of the covariance, indicating no uniform spatiotemporal variation pattern between these two factors. Given that PM2.5 at most stations exhibited decreasing trends across China, the small SCF values associated with coupled modes between PM2.5 and NO2 thus suggest a significant heterogeneity of NO2 variations nationwide. Such a deduction is also in line with the results previously shown in Figure 2, where heterogeneous NO2 trends were observed across China.  (Figure 4) as well as NO2 (Figure 5), respectively. As indicated by both homogeneous and heterogeneous covariance maps (Figure 4a-d) as well as temporal variations (Figure 4e), there is an evident co-varying pattern between PM2.5 and SO2 in northern China. Specifically, the hot spot region for PM2.5 is mainly located over regions in BTH and the west regions of Shandong province (the outlined region in Figure 4a).   (Figure 4) as well as NO 2 ( Figure 5), respectively. As indicated by both homogeneous and heterogeneous covariance maps (Figure 4a-d) as well as temporal variations (Figure 4e), there is an evident co-varying pattern between PM 2.5 and SO 2 in northern China. Specifically, the hot spot region for PM 2.5 is mainly located over regions in BTH and the west regions of Shandong province (the outlined region in Figure 4a). With respect to SO 2 , regions exhibiting large covariance with PM 2.5 are also observed in northern China, but cover larger areas, as Shanxi province is also included. Such a highly clustered spatial pattern in both fields indicates that PM 2.5 and SO 2 in these regions share a similar variation pattern in temporal domain. Moreover, it indicates a far-reaching linkage between PM 2.5 and SO 2 in the spatial domain, as PM 2.5 in northern China may exhibit a similar variation pattern to that of SO 2 , even in Shanxi province. Given PM 2.5 formations are highly associated with regional SO 2 emissions, we may suspect that the observed PM 2.5 variations in northern China might be modulated by not only local SO 2 emissions, but even emissions from neighboring regions like Shanxi province. Numerical simulations could be performed in the future to corroborate this inferred linkage, more specifically, to examine whether there is an eastward transfer of SO 2 emissions from Shanxi to northern China.   In contrast, no evident coupled field was found in the first MCA mode between PM 2.5 and NO 2 ( Figure 5). Although a spatially localized PM 2.5 hot spot region was identified in northern China, no such effect was found in the NO 2 fields. Rather, stations with large covariance were observed to scatter randomly across China, indicating a lack of potent spatiotemporal coupled patterns between these two factors, regardless of the observed close correlation between their expansion coefficients (Figure 5e). In other words, the observed PM 2.5 variation in northern China had a weak linkage with local NO 2 emissions.

Relative Contributions of SO2 and NO2 to PM2.5 Variations in Northern China
An MLR model was established to assess the relative contribution of SO2 and NO2 to PM2.5 variations observed over northern China. First, PM2.5 anomaly time series in northern China were extracted by averaging PM2.5 anomaly time series at all stations located in BTH and nearby regions (i.e., hotspot regions identified from PM2.5 homogeneity and heterogeneity covariance maps). The first two principal components (PC1 and PC2) of SO2 and NO2 (denoted as SO2_PC1, SO2_PC2, NO2_PC1 and NO2_PC2) were then extracted over the identified SO2 hotspot region (outlined region in Figure 4b) and used as explanatory variables to proximate the observed PM2.5 variations in northern China. All regressors were standardized by subtracting their mean values and then divided by their standard deviations to make them unitless and comparable. The MLR model can be expressed as: Figure 6 simply compares the observed PM2.5 variations in northern China and the predicted one, using the first two principal components of SO2 and NO2. It shows that about 58% (R 2 = 0.58) of

Relative Contributions of SO 2 and NO 2 to PM 2.5 Variations in Northern China
An MLR model was established to assess the relative contribution of SO 2 and NO 2 to PM 2.5 variations observed over northern China. First, PM 2.5 anomaly time series in northern China were extracted by averaging PM 2.5 anomaly time series at all stations located in BTH and nearby regions (i.e., hotspot regions identified from PM 2.5 homogeneity and heterogeneity covariance maps). The first two principal components (PC1 and PC2) of SO 2 and NO 2 (denoted as SO 2 _PC1, SO 2 _PC2, NO 2 _PC1 and NO 2 _PC2) were then extracted over the identified SO 2 hotspot region (outlined region in Figure 4b) and used as explanatory variables to proximate the observed PM 2.5 variations in northern China. All regressors were standardized by subtracting their mean values and then divided by their standard deviations to make them unitless and comparable. The MLR model can be expressed as: (1) Figure 6 simply compares the observed PM 2.5 variations in northern China and the predicted one, using the first two principal components of SO 2 and NO 2 . It shows that about 58% (R 2 = 0.58) of the observed PM 2.5 variances can be explained by local SO 2 and NO 2 variations therein. Nonetheless, the predicted time series were prone to larger oscillations than the observed one. Table 2 summarizes the regression coefficients and the estimated LMG measures for each regressor. It shows that only two regressors (SO 2 _PC1 and NO 2 _PC1) were statistically significant at the 95% confidence interval. the regression coefficients and the estimated LMG measures for each regressor. It shows that only two regressors (SO2_PC1 and NO2_PC1) were statistically significant at the 95% confidence interval.  In terms of LMG measures, SO2_PC1 had the largest LMG fraction, indicating the most significant role of SO2 in contributing to the observed PM2.5 variation. In contrast, NO2_PC1 played a relative weak role. These effects were also evidenced by their semi-partial coefficient of determination (SPR), as the inclusion of SO2_PC1 would result in salient improvement of R 2 regardless of the sequence. Since the largest value of estimated VIF between these four regressors was 1.93, which is far less than the threshold of 10, we may therefore claim that the collinearity between these four regressors can be neglected. In general, the modeling results clearly revealed the relative important role of SO2 over NO2 in contributing to the observed PM2.5 variations in northern China from 2015 to 2018.
Nevertheless, this does not mean PM2.5 variations in northern China have no inherent linkage with NO2. Rather, there is no statistically significant linkage found between them in our cases. On the other hand, we should be aware of the fact that the temporal phase (i.e., warm versus cold or day versus night) of the relationship between PM2.5 and NO2 and SO2 was not accounted for in the present study, though it plays critical roles in the formation of nitrates and sulfates [30][31][32]. The effect could be explored in the future by performing MCA analysis between PM2.5 and day and/or night averaged NO2 and SO2 to better examine their possible linkage in space and time.

Conclusions
By taking advantage of in situ air quality records acquired from the China national air quality monitoring network, linear trends for PM2.5, SO2, and NO2 were evaluated for the time period of 2015  In terms of LMG measures, SO 2 _PC1 had the largest LMG fraction, indicating the most significant role of SO 2 in contributing to the observed PM 2.5 variation. In contrast, NO 2 _PC1 played a relative weak role. These effects were also evidenced by their semi-partial coefficient of determination (SPR), as the inclusion of SO 2 _PC1 would result in salient improvement of R 2 regardless of the sequence. Since the largest value of estimated VIF between these four regressors was 1.93, which is far less than the threshold of 10, we may therefore claim that the collinearity between these four regressors can be neglected. In general, the modeling results clearly revealed the relative important role of SO 2 over NO 2 in contributing to the observed PM 2.5 variations in northern China from 2015 to 2018.
Nevertheless, this does not mean PM 2.5 variations in northern China have no inherent linkage with NO 2 . Rather, there is no statistically significant linkage found between them in our cases. On the other hand, we should be aware of the fact that the temporal phase (i.e., warm versus cold or day versus night) of the relationship between PM 2.5 and NO 2 and SO 2 was not accounted for in the present study, though it plays critical roles in the formation of nitrates and sulfates [30][31][32]. The effect could be explored in the future by performing MCA analysis between PM 2.5 and day and/or night averaged NO 2 and SO 2 to better examine their possible linkage in space and time.

Conclusions
By taking advantage of in situ air quality records acquired from the China national air quality monitoring network, linear trends for PM 2.5 , SO 2 , and NO 2 were evaluated for the time period of 2015 to 2018. Prior to trend analysis, data values from neighboring stations were integrated on a semi-monthly basis to improve data completeness and to reduce dimensionality. The estimated linear trends indicate salient decreasing trends for PM 2.5 and SO 2 , especially over northern China. Moreover, trends estimated from large values almost doubled those of the all-value averaged data, indicating the need to take data magnitudes into consideration in trend analysis in order to better assess the effectiveness of clean air actions. However, no evident decreasing trend was detected for NO 2 concentrations, even using their large data values.
The MCA results revealed an evident spatiotemporal coupled pattern between PM 2.5 and SO 2 in northern China from 2015 to 2018, but with different hot spot regions. Such an effect indicates a possible linkage that PM 2.5 variations in BTH might be attributed to SO 2 emissions, even in Shanxi province, to some degree. In contrast, no significant coupled pattern was found between PM 2.5 and NO 2 . Further modeling practices revealed that about 58% of the observed PM 2.5 variations in northern China could be explained by local NO 2 and SO 2 variations collectively. Moreover, the observed PM 2.5 variations should be attributed largely to SO 2 variations in northern China, given the relatively larger importance found for SO 2 rather than NO 2 . These results collectively suggest the need to implement more stringent emission control on NO 2 in China in the future. Given the constraint of temporal limitation of the in situ data record, a long-term study could be performed in the future by making use of satellite-derived PM 2.5 estimations and remotely sensed NO 2 and SO 2 columns, measured by the Ozone Monitoring Instrument (OMI), to better explore their associations in space and time.