Spatiotemporal Variability and Influencing Factors of Aerosol Optical Depth over the Pan Yangtze River Delta during the 2014–2017 Period

Large amounts of aerosol particles suspended in the atmosphere pose a serious challenge to the climate and human health. In this study, we produced a dataset through merging the Moderate Resolution Imaging Spectrometers (MODIS) Collection 6.1 3-km resolution Dark Target aerosol optical depth (DT AOD) with the 10-km resolution Deep Blue aerosol optical depth (DB AOD) data by linear regression and made use of it to unravel the spatiotemporal characteristics of aerosols over the Pan Yangtze River Delta (PYRD) region from 2014 to 2017. Then, the geographical detector method and multiple linear regression analysis were employed to investigate the contributions of influencing factors. Results indicate that: (1) compared to the original Terra DT and Aqua DT AOD data, the average daily spatial coverage of the merged AOD data increased by 94% and 132%, respectively; (2) the values of four-year average AOD were high in the north-east and low in the south-west of the PYRD; (3) the annual average AOD showed a decreasing trend from 2014 to 2017 while the seasonal average AOD reached its maximum in spring; and that (4) Digital Elevation Model (DEM) and slope contributed most to the spatial distribution of AOD, followed by precipitation and population density. Our study highlights the spatiotemporal variability of aerosol optical depth and the contributions of different factors over this large geographical area in the four-year period, and can, therefore, provide useful insights into the air pollution control for decision makers.


Introduction
Aerosols, the liquid or solid particulate matter suspended in the atmosphere [1], have both natural and anthropogenic sources, such as volcanic eruptions, sand, dust, fossil fuel combustion, and industrial and traffic emissions [2][3][4]. By absorbing and scattering solar radiation and perturbing the hydrological cycle, aerosols have a crucial effect on regional and global climate change [5][6][7]. In addition, numerous aerosol particles contribute to increased levels of haze and lead to low visibility [8][9][10][11]. Furthermore,

Study Area
Located in the mid-east China (27 • 12' N~35 • 20' N, 114 • 54' E~123 • 10' E), the Pan Yangtze River Delta region (PYRD) consists of the provinces of Anhui, Jiangsu, Zhejiang and the provincial-level municipality of Shanghai, with an area of approximately 357,282 km 2 ( Figure 1). With low elevations in the northeast and high in the southwest, the PYRD is characterized by diverse geomorphological features including plains, tablelands, hills and mountains [68]. The plains are mainly distributed in Jiangsu, Shanghai, and north Anhui, while most of the hills and mountains are scattered in Zhejiang and southeast, and southwest of Anhui. Divided by the Huai River, the PYRD has a subtropical monsoon climate in the south with hot, rainy summers and mild winters but a temperate monsoon climate in the north with hot, rainy summers and cold, dry winters.
As one of the most densely populated and economically developed regions in China, the PYRD has a population of some 223 million people and generated a GDP (gross domestic product) of 19.53 trillion CNY (Chinese yuan) in 2017, accounting for approximately 16.08% and 23.05% of China's total population and GDP, respectively. However, the population density and GDP per capita varied across the PYRD, being highest in Shanghai (3813 people/km 2 and 126,687 CNY) and lowest in Anhui (448 people/km 2 and 43,194 CNY). Despite its important role in China's economic growth, the PYRD has experienced severe haze pollution since 2013.  Table 2. Data from these sites were used for calibration and validation in Section 3.  [70]. To mitigate these serious levels of air pollution, China's State Council issued the National Action Plan for Air Pollution Prevention and Control in September 2013, followed by the regional rule for the implementation of National Action Plan in the PYRD jointly released by governments of Anhui, Jiangsu, Zhejiang, and Shanghai in January 2014 [67]. There is therefore an urgent need to examine the effect of such regulations and to further explore the influencing mechanism of the factors contributing to the AOD.

MODIS AOD Data
MODIS sensors onboard Terra and Aqua satellites, both launched by the U.S. National Aeronautics and Space Administration (NASA), provide daily AOD measurements [71]. Terra and Aqua satellites cross the equator separately at approximately 10:30 a.m. and 1:30 p.m. local solar time [31]. MODIS Collection 6.1 (C6.1) Level 2 aerosol products from 1st January 2005 to 30th December 2017 covering the PYRD were obtained from the website of Level 1 and Atmosphere Archive and Distribution System (LAADS) [72]. The downloaded MODIS AOD data products consist of 3-km DT AOD and 10-km DB AOD from both Terra and Aqua satellites. The expected error (EE), which represents a one-standard deviation confidence interval around the retrieved AOD (i.e., about 68% of points should fall within ±EE from the AERONET AOD), is ±(0.05 + 20%) for the 3-km DT retrievals over land [27,73]. For the 10-km DB retrievals, the EE is defined relative to DB-retrieved AOD rather than to AERONET AOD, is approximately ±(0.03 + 20%) on average [29,73]. In this study, only those AOD retrievals at 550 nm with the recommended quality assurance (QA) for the DT (QA = 3) and DB (QA ≥ 2) were selected [26]. Therefore, the DT and DB high-quality retrievals were obtained from the Scientific Data Set (SDS) "Optical_Depth_Land_and_Ocean" in the 3-km DT products and "Deep_Blue_Aerosol_Optical_ Depth_550_Land_Best Estimate" in the 10-km DB products, Figure 1. The geographical locations of the PYRD and AERONET sites. Details of these sites are given in Table 2. Data from these sites were used for calibration and validation in Sections 3.2.1 and 3.2.2.
According to the China Statistical Yearbook on Environment 2015 [69], the annual average concentrations of PM 2.5 and PM 10 [70]. To mitigate these serious levels of air pollution, China's State Council issued the National Action Plan for Air Pollution Prevention and Control in September 2013, followed by the regional rule for the implementation of National Action Plan in the PYRD jointly released by governments of Anhui, Jiangsu, Zhejiang, and Shanghai in January 2014 [67]. There is therefore an urgent need to examine the effect of such regulations and to further explore the influencing mechanism of the factors contributing to the AOD.

MODIS AOD Data
MODIS sensors onboard Terra and Aqua satellites, both launched by the U.S. National Aeronautics and Space Administration (NASA), provide daily AOD measurements [71]. Terra and Aqua satellites cross the equator separately at approximately 10:30 a.m. and 1:30 p.m. local solar time [31]. MODIS Collection 6.1 (C6.1) Level 2 aerosol products from 1st January 2005 to 30th December 2017 covering the PYRD were obtained from the website of Level 1 and Atmosphere Archive and Distribution System (LAADS) [72]. The downloaded MODIS AOD data products consist of 3-km DT AOD and 10-km DB AOD from both Terra and Aqua satellites. The expected error (EE), which represents a one-standard deviation confidence interval around the retrieved AOD (i.e., about 68% of points should fall within ±EE from the AERONET AOD), is ±(0.05 + 20%) for the 3-km DT retrievals over land [27,73]. For the 10-km DB retrievals, the EE is defined relative to DB-retrieved AOD rather than to AERONET AOD, is approximately ±(0.03 + 20%) on average [29,73]. In this study, only those AOD retrievals at 550 nm with the recommended quality assurance (QA) for the DT (QA = 3) and DB (QA ≥ 2) were selected [26]. Therefore, the DT and DB high-quality retrievals were obtained from the Scientific Data Set (SDS) "Optical_Depth_Land_and_Ocean" in the 3-km DT products and "Deep_Blue_Aerosol_Optical_ Depth_550_Land_Best Estimate" in the 10-km DB products, respectively. Due to their relatively high spatial resolution, the 3-km DT AOD datasets were selected as the main source to illustrate the spatiotemporal characteristics of AOD over the PYRD [27]. However, the DT algorithm does not perform well over bright surfaces. To fill the gaps left by the 3-km DT AOD, the 10-km DB AOD datasets were used as supplementary source and merged to the 3-km DT AOD datasets because of their better performance over bright targets [22]. MODIS AOD data derived from 2005 to 2013 were utilized for AOD calibration (Section 3.2.1). Data from 2014 to 2017 were calibrated and then employed for spatiotemporal characteristics and influencing factors analysis of AOD. Table 1 provides a summary of MODIS AOD data products used in this study.

AERONET AOD Data
In order to validate MODIS AOD values, the high-accuracy ground-based aerosol measurements from 2005 to 2017 were obtained from the Aerosol Robotic Network (AERONET) [74], a global aerosol observation network recording AOD observations by CE-318 Solar Photometer every 15 min with an uncertainty of~0.01-0.02 under cloud-free conditions [75][76][77][78]. The AERONET offers three levels of AOD data, Level 1.0 without strict quality checks, Level 1.5 with cloud screening checks, and Level 2.0 with rigorous quality checks [34]. Due to the accuracy and volume of data, the Level 1.5 AERONET AOD data at 15 sites (Table 2) in the region were chosen for validation [79,80]. Table 2. Locations of the Aerosol Robotic Network (AERONET) sites within the Pan Yangtze River Delta (PYRD) (see Figure 1) and the periods of their available data.

Number
Site Name Longitude

Auxiliary Data
Ten potential factors affecting the spatial distribution of AOD were selected from four categories, namely topography, meteorology, vegetation, and socioeconomics. To derive these factors, multi-sources were collected. The 90-m resolution Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) data were freely obtained from the website of Consultative Group for International Agriculture Research Consortium for Spatial Information (CGIAR-CSI) [81] and used to provide DEM and slope of the study area. Monthly meteorological dataset observed at 87 observation stations within and around the study area from 2014 to 2017 were downloaded from the China Meteorological Data Service Center (CMDC) [82]. The dataset provides meteorological information including precipitation (PREC), average wind speed (AWS), average temperature (ATEM) and average relative humidity (ARH). As planetary boundary layer height (PBLH) data were not available at this website, the monthly PBLH data from 2014 to 2017 were collected from the European Center for Medium-Range Weather Forecasts [83], with a horizontal resolution of 0.125 • × 0.125 • . The normalized difference vegetation index (NDVI) data from 2014-2017, representing the vegetation coverage of the study area, were acquired from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) [84]. RESDC provides seasonal NDVI values and annual NDVI values at 1-km resolution [41]. From this website, we also obtained the 1-km resolution annual gross domestic product (GDP) and population density (POP) data of the study area in 2015 to reflect the anthropogenic emissions of pollutants from 2014 to 2017.

Methodology
The general process of this study is shown in Figure 2. A fixed 3 × 3 km grid (40,801 cells in total) was first created in the extent of the PYRD, as the MODIS AOD pixel centroids varied from day to day [71]. To be consistent with DT AOD, the 10-km DB AOD data were resampled to 3-km resolution using the nearest neighbor method in ENVI 5.3 (Exelis Visual Information Solutions, Boulder, CO, USA), as shown in previous studies [25,85], based on the assumption that the variability of DB AOD data is small within the 10 × 10 km grid. To check whether the other resampling methods can improve the accuracy of the resampled AOD data, bilinear interpolation and cubic convolution were also used for AOD interpolation and the validation results show the nearest neighbor method outperformed the other two methods (Table A1 in Appendix A). Next, daily AOD pixel values from the four datasets (Terra 3-km DT AOD, Aqua 3-km DT AOD, Terra 3-km DB AOD, and Aqua 3-km DB AOD) were matched to the 3-km grid cells whose centroids were within a given grid cell [32], using the extraction tool in ArcGIS 10.2 (Esri, Redlands, CA, USA). To fill AOD data gaps, DB AOD and DT AOD data were merged and then the merged effect was evaluated. After that, the merged AOD data were utilized for spatiotemporal analysis and identification of influencing factors.

MODIS AOD Merging
To fill AOD data gaps, a four-step merging approach was utilized to merge DT AOD and DB AOD data, following the method proposed by He and Huang [22]: Step 1: Calibrating the MODIS AOD data. To reduce the systematic bias in satellite-retrieved AOD values, simple linear regression relationships between AERONET AOD and MODIS AOD from 2005 to 2013 were developed to calibrate the MODIS AOD data during 2014-2017 period [21,30]. Since the relationship between AERONET AOD and MODIS AOD varied by season and AOD dataset, linear relationship analysis was conducted for each of the four MODIS AOD dataset (i.e., Terra/Aqua 3-km DT AOD and Terra/Aqua 10-km DB AOD) and each season, separately (Table A2 in Appendix A) [22,27,39]. The seasons were defined in this study as spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February). Step 2: Filling missing Terra AOD data with Aqua AOD values, and vice versa. Owing to the contrasting crossing times, the AOD data retrieved from the two satellites (i.e., Terra and Aqua) differ in spatial coverage [21,22,34,42]. Therefore, for AOD datasets retrieved by the same algorithm (i.e., DT or DB AOD), a simple linear regression model between Terra and Aqua values were developed for each day to fill the missing Terra/Aqua AOD data with the present one (e.g., predicting the missing Aqua DT AOD with the present Terra DT AOD, and vice versa) [86]. It is notable that extra biases may be generated in this step, due to the changing PBLH and aerosol concentration between two satellite overpass times. We acknowledge the limitation of this approach. However, it is a common and effective practice to predict missing AOD values for Terra or Aqua AOD [22,31,86]. Additionally, Pearson correlation coefficients also indicate that there were high correlations between AERONET AOD values at the two satellite passing times (Table A3 in Appendix A).
Step 3: Filling missing DT AOD data with DB AOD values. To fully exploit the retrievals of both DT and DB algorithms, linear regression relationships between daily DT and DB AOD values were established and used to predict values in the no-data pixels in DT AOD when only DB AOD is present Figure 2. The framework of the study procedure.
Step 2: Filling missing Terra AOD data with Aqua AOD values, and vice versa. Owing to the contrasting crossing times, the AOD data retrieved from the two satellites (i.e., Terra and Aqua) differ in spatial coverage [21,22,34,42]. Therefore, for AOD datasets retrieved by the same algorithm (i.e., DT or DB AOD), a simple linear regression model between Terra and Aqua values were developed for each day to fill the missing Terra/Aqua AOD data with the present one (e.g., predicting the missing Aqua DT AOD with the present Terra DT AOD, and vice versa) [86]. It is notable that extra biases may be generated in this step, due to the changing PBLH and aerosol concentration between two satellite overpass times. We acknowledge the limitation of this approach. However, it is a common and effective practice to predict missing AOD values for Terra or Aqua AOD [22,31,86]. Additionally, Pearson correlation coefficients also indicate that there were high correlations between AERONET AOD values at the two satellite passing times (Table A3 in Appendix A).
Step 3: Filling missing DT AOD data with DB AOD values. To fully exploit the retrievals of both DT and DB algorithms, linear regression relationships between daily DT and DB AOD values were established and used to predict values in the no-data pixels in DT AOD when only DB AOD is present [22,42]. After this step, two gap-filled AOD datasets were generated, called as processed Terra AOD and processed Aqua AOD respectively.
Step 4: Averaging the daily Terra AOD and Aqua AOD values. The average of the daily processed Terra AOD and Aqua AOD values (both overserved and predicted values) were calculated and considered as the final daily AOD data (merged AOD hereafter) [22,31,39,42].

Merged AOD Evaluation
Ten-fold cross-validation (CV) method was used to evaluate the performance of linear regression models. The original data were randomly divided into 10 groups. From the 10 groups, nine groups of data were used as training data for developing the model, and the remaining group was used to test its predictions. This step was then repeated 10 times until every fold was tested. The commonly used statistical metrics, including root mean squared error (RMSE), relative prediction error (RPE), and the coefficient of determination (R 2 ) were used to measure the predictive performance of the models [32,87]. The standard deviation (σ) of the predicted AOD values in each step (i.e., Step 2 and Step 3 in Section 3.2.1) were also calculated.
To examine the performance of the merging operation, accuracy comparison was conducted between the original Terra/Aqua DT AOD and processed Terra/Aqua AOD data, because the merged AOD data cannot be collocated with the AERONET AOD measurements in time [22]. In addition to the accuracy comparison, we also examined if the coverage of the merged AOD data higher than that of original Terra DT AOD and Aqua DT AOD data. Here, the coverage includes daily spatial coverage (denotes the ratio of AOD available pixels of all the pixels for each day) and pixel-level temporal coverage (denotes the ratio of the AOD available days of the whole study period for each pixel) [21].
Linear fitting of MODIS AOD with corresponding AERONET AOD data was used to validate the accuracy of MODIS AOD. Since AERONET AOD are point measurements at 15-minute intervals while MODIS AOD are instantons data when the satellites overpass, MODIS AOD retrievals and AERONET measurements cannot be compared directly and need to be matched in space and time. Thus, following the method of previous studies [77,80], the MODIS AOD retrievals within 5 × 5 pixels (i.e., 15 × 15 km) centered over the AERONET sites were averaged and then collocated with the mean values of the AERONET AOD measurements within 30 min of the time when MODIS passes over. Note that the MODIS AOD were retrieved at 550 nm while AERONET does not provide AOD data at 550 nm, AERONET AOD at 550 nm was derived by interpolating the AERONET AOD values at 440 nm and 675 nm with Equation (1) and Equation (2) [34,80]: where τ λ 1 , τ λ 2 are AOD at the two closest bands λ 1 (440 nm) and λ 2 (675 nm), respectively, τ λ 3 are AOD at 550 nm. Several statistical indicators were selected for comparison of values between MODIS AOD and AERONET AOD, such as the number of matched MODIS AOD and AERONET AOD pairs (N), correlation efficient (R), root mean squared error (RMSE), and the percentage retrievals within the expected error (EE, ±(0.05 + 20%τ A ), where τ A is the AERONET AOD) envelope [23,88].

Data Integration
For the analysis of spatiotemporal variability and influencing factors, the daily merged AOD (Section 3.2.1) and auxiliary data (Section 3.1.3) were further processed. The seasonally and annually averaged AOD were derived by averaging the daily merged AOD. The monthly meteorological data (i.e., PREC, AWS, ATEM, ARH and PBLH) and NDVI data (both seasonal and annual data) [41] were converted to four-year seasonal and annual average data. After that step, the meteorological data (i.e., PREC, AWS, ATEM, ARH) at each site were interpolated to 3-km continuous raster data by the inverse distance weight interpolation method [38]. PBLH data were resampled to 3 × 3 km grid cell by bilinear interpolation [89]. For the other variables (i.e., DEM, SLP, NDVI, GDP and POP), the corresponding values of the pixels fell in each grid cell were averaged separately to match the fixed 3 × 3 km grid [90].

Influencing Factors Identification
To identify the intensity and directions of the impacts of factors on AOD, geographical detector method and multiple linear regression analysis were used. Based on the concept of spatial stratification heterogeneity-which refers to a geographical phenomenon that the observations are homogeneous within each stratum rather than between strata, the geographical detector method can quantify the contributions of influencing factors [91]. The philosophy of this method is that if an independent variable X (the factor) takes on a similar spatial distribution to that of the dependent variable Y (AOD), there is a direct or indirect relationship between the variable X and dependent variable Y [57]. The geographical detector method examines if an independent variable X takes on a similar spatial distribution with the dependent variable Y and measures the association between Y and X by the power of determinant (q) [91]. Here, the power of determinant (q) indicates how much X contributes to the spatial stratification heterogeneity of Y, or how much Y is interpreted by X [92].
Specifically, a study area is composed of N units, and the AOD in each unit is denoted as The factor (X) layer is stratified into h = 1, . . . , L stratum according to the spatial heterogeneity first, and then the AOD (Y) layer is divided into L stratum also by overlaying the Y layer and X layer. Stratum h has N h units and N = L h=1 N h . In stratum h, the AOD in each unit is denoted as Y hi (1 ≤ hi ≤ N h ). For the whole study area, the mean value and variance of AOD are , respectively. For stratum h, the mean value and variance of AOD are respectively. The power of determinant (q) of X to Y can be expressed as [91]: where SSW is the within sum of the squares; SST is the total sum of the squares. If SSW is less than SST, spatially stratified heterogeneity exists. Usually, q ∈ [0, 1]. If q = 1, it means that X can explain 100% of Y; If q = 0, there is no association between X and Y. A larger q value indicates a greater influence of X on Y. Following the threshold set by Tang et al. [93], we considered a factor had an important contribution to AOD when the q value of this factor approaches 0.2.
In addition, as the geographical detector method can only measure the explanatory power of factors, but cannot reveal the nature of the effect (i.e., negative or positive) [56,59], the multiple linear regression was performed as a supplement to identify such information [56]. To avoid the collinearity issue, Pearson correlation coefficients among influencing factors were used to select the variables for model building [21,45,94]. The positive or negative regression coefficients in the multiple linear regression model indicate that the impact of some factor on AOD is positive or negative. The variation inflation factor (VIF) is used to measure the multicollinearity among multiple regression variables [95], if VIF is less than 3, it indicates that there is no collinearity in the regression model. The geographical detector method and multiple linear regression analysis were conducted using GeoDetector [91] and IBM SPSS Statistics 20.0 (SPSS Inc., Chicago, IL, USA), respectively.

Validation of the Merged AOD
The cross-validation results (Table A4 in Appendix A) demonstrate the good predictive performance of the linear regression models, with RMSE ranging from 0.09 to 0.14, RPE ranging from 20.8% to 29.6% and R 2 ranging from 0.82 to 0.88 for the years and the four-year period. The σ of the predicted AOD values range from 0.0011 to 0.0110 for the four-year period of 2014-2017. Figure 3 shows the comparison between the AERONET AOD data from 2014 to 2017 against the original Terra/Aqua DT AOD data and the processed Terra/Aqua AOD data. Overall, the processed Terra/Aqua AOD data approximated the AERONET better than the original Terra/Aqua DT AOD data. After merging, there are 486 and 476 pairs of matched data for processed Terra and Aqua AOD with AERONET AOD, respectively. For the processed Terra AOD, the RMSE decreased from 0.25 to 0.19, and the percentage of retrievals within the EE increased from 40.77% to 68.93%. For the processed Aqua AOD, the RMSE decreased from 0.20 to 0.17, and the percentage of retrievals within the EE increased from 48.86% to 72.24%. However, the R-value of the processed AOD showed a slight decrease, from 0.9248 to 0.9108 for Terra and from 0.9320 to 0.9160 for Aqua.
The cross-validation results (Table A4 in Appendix A) demonstrate the good predictive performance of the linear regression models, with RMSE ranging from 0.09 to 0.14, RPE ranging from 20.8% to 29.6% and R 2 ranging from 0.82 to 0.88 for the years and the four-year period. The σ of the predicted AOD values range from 0.0011 to 0.0110 for the four-year period of 2014-2017. Figure 3 shows the comparison between the AERONET AOD data from 2014 to 2017 against the original Terra/Aqua DT AOD data and the processed Terra/Aqua AOD data. Overall, the processed Terra/Aqua AOD data approximated the AERONET better than the original Terra/Aqua DT AOD data. After merging, there are 486 and 476 pairs of matched data for processed Terra and Aqua AOD with AERONET AOD, respectively. For the processed Terra AOD, the RMSE decreased from 0.25 to 0.19, and the percentage of retrievals within the EE increased from 40.77% to 68.93%. For the processed Aqua AOD, the RMSE decreased from 0.20 to 0.17, and the percentage of retrievals within the EE increased from 48.86% to 72.24%. However, the R-value of the processed AOD showed a slight decrease, from 0.9248 to 0.9108 for Terra and from 0.9320 to 0.9160 for Aqua.

Assessment of the Spatiotemporal Coverage of the Merged AOD
The objective of this present study is to improve the coverage of the DT AOD data with the available DB AOD retrievals. To evaluate the merging effect, the daily spatial coverage of the original Terra DT AOD, Aqua DT AOD and the merged AOD data were compared ( Figure A1 in Appendix A). After merging, the average daily spatial coverage was greatly increased, by 94% and 132% compared to the original Terra DT AOD and Aqua DT AOD. For the original Terra DT AOD and Aqua DT AOD data, there were 71 and 64 days with spatial coverage of more than 50%, while for the merged AOD data, there were 323 days with spatial coverage of more than 50%. Figure 4 illustrates the spatial distribution of the original Terra DT AOD, Aqua DT AOD and the merged AOD data on

Assessment of the Spatiotemporal Coverage of the Merged AOD
The objective of this present study is to improve the coverage of the DT AOD data with the available DB AOD retrievals. To evaluate the merging effect, the daily spatial coverage of the original Terra DT AOD, Aqua DT AOD and the merged AOD data were compared ( Figure A1 in Appendix A). After merging, the average daily spatial coverage was greatly increased, by 94% and 132% compared to the original Terra DT AOD and Aqua DT AOD. For the original Terra DT AOD and Aqua DT AOD data, there were 71 and 64 days with spatial coverage of more than 50%, while for the merged AOD data, there were 323 days with spatial coverage of more than 50%. Figure 4 illustrates the spatial distribution of the original Terra DT AOD, Aqua DT AOD and the merged AOD data on 26 November 2017. For the data of this date, the spatial coverage of the merged AOD was 51.42%, much higher than that of the original Terra DT AOD (25.61%) and Aqua DT AOD (28.54%). 26 November 2017. For the data of this date, the spatial coverage of the merged AOD was 51.42%, much higher than that of the original Terra DT AOD (25.61%) and Aqua DT AOD (28.54%). The temporal coverage was also compared as shown in Figure 5. It is clear that the temporal coverage of the merged AOD data was higher than the original. From 2014 to 2017, there were 1451 days for which both Terra and Aqua AOD data were available (there were 10 days for which only Terra AOD data were available and we discarded them). The temporal coverage (pixel-level) of the original Terra DT AOD and Aqua DT AOD data ranged from 0 to 29.08% and from 0 to 25.36% ( Figure  5a-b), respectively. After merging, the temporal coverage of AOD data for most of the PYRD ranged from 20% to 40% (Figure 5c). The temporal coverage was also compared as shown in Figure 5. It is clear that the temporal coverage of the merged AOD data was higher than the original. From 2014 to 2017, there were 1451 days for which both Terra and Aqua AOD data were available (there were 10 days for which only Terra AOD data were available and we discarded them). The temporal coverage (pixel-level) of the original Terra DT AOD and Aqua DT AOD data ranged from 0 to 29.08% and from 0 to 25.36% (Figure 5a,b), respectively. After merging, the temporal coverage of AOD data for most of the PYRD ranged from 20% to 40% (Figure 5c).  26 November 2017. For the data of this date, the spatial coverage of the merged AOD was 51.42%, much higher than that of the original Terra DT AOD (25.61%) and Aqua DT AOD (28.54%). The temporal coverage was also compared as shown in Figure 5. It is clear that the temporal coverage of the merged AOD data was higher than the original. From 2014 to 2017, there were 1451 days for which both Terra and Aqua AOD data were available (there were 10 days for which only Terra AOD data were available and we discarded them). The temporal coverage (pixel-level) of the original Terra DT AOD and Aqua DT AOD data ranged from 0 to 29.08% and from 0 to 25.36% ( Figure  5a-b), respectively. After merging, the temporal coverage of AOD data for most of the PYRD ranged from 20% to 40% (Figure 5c).  The temporal coverage of the merged AOD data varied from area to area (Figure 5c). While more days were available in north Jiangsu and north Anhui with the percentage of availability mostly from 25% to 40%, fewer days were available in Shanghai and the most of Zhejiang with the percentage of availability mostly ranged from 20% to 25%.  The temporal coverage of the merged AOD data varied from area to area (Figure 5c). While more days were available in north Jiangsu and north Anhui with the percentage of availability mostly from 25% to 40%, fewer days were available in Shanghai and the most of Zhejiang with the percentage of availability mostly ranged from 20% to 25%. Figure 6a presents the spatial distribution of four-year average AOD over the PYRD from 2014 to 2017. The overall four-year average AOD over the PYRD was 0.514, with high values in the north-eastern and low in south-western. Shanghai and Jiangsu generally exhibited high AOD values, with a four-year average AOD of 0.626 and 0.622 respectively. Apart from the southeast and southwest, the four-year average AOD in most parts of Anhui was also large, ranging from 0.50 to 0.80. In contrast, Zhejiang was low in AOD values (the four-year average AOD was 0.395). Figure 6b-e indicate that spatial distribution in annual average AOD showed a similar pattern as the four-year average AOD. However, the average AOD over the PYRD showed a gradual decline. The high-AOD (>0.6) area decreased from 57.10% (of the total PYRD area) to 3.98% while the low-AOD (<0.3) area increased from 1.44% to 18.93% during the four years.

Spatial Variations of AOD
As illustrated in Figure 6f-i, AOD values for the four seasons were high in the north-eastern and low in the south-western. The average AOD in spring, summer, autumn, and winter over the PYRD were 0.544, 0.537, 0.467, and 0.500, respectively. Despite obvious seasonal variability, the high-AOD (>0.60) area was larger in spring (50.64% of the total PYRD area) and summer (46.7%). In autumn, the high-AOD (>0.60) area decreased obviously (8.43%) while the low-AOD (<0.3) area was largest in this season, accounting for 15.18% of the total PYRD area. In winter, most of the AOD values (60.3%) ranged from 0.5 to 0.7.

Contribution of Each Factor to AOD Distribution
The power of determinant values (q) were calculated by the geographical detector method and the impact directions were determined by multiple linear regression analysis. As shown in Figure 9, during the four-year period, the highest q value was found for DEM (0.863), with SLP a close second    In different seasons, the q values of DEM and SLP ranged from 0.733 to 0.866 and from 0.663 to 0.834, respectively, indicating they could explain AOD more than the other factors. The q values of PREC were 0.499, 0.282, 0.286, 0.588 in spring, summer, autumn, winter, respectively, which suggests that PREC was the main meteorological factor influencing AOD. AWS and ATEM also exhibited strong effects on AOD in winter, with the q value of 0.370 and 0.467, respectively. ARH and PBLH were found to show somewhat strong influences on AOD in spring (q value = 0.283 and 0.204 respectively). Notably, despite low in summer and autumn, the q value of NDVI reached a maximum of 0.583 in winter. In addition, GDP and POP exerted great influences on AOD distribution in all the seasons, with q values greater than 0.2.
The directions of regression coefficients for variables in multiple linear regression models (Table A5 in Appendix A) show the positive or negative correlation between AOD and the factors. DEM, PREC, NDVI and PBLH were found negatively linked to the AOD, while AWS, ARH, and POP were observed positively associated with the AOD.

AOD Gap-Filling
Previous studies mapped the spatiotemporal characteristics of AOD at coarse spatial resolutions, which makes it difficult to unravel regional-scale aerosol heterogeneity [16,43,66]. The latest released MODIS 3-km AOD can provide more fine-scale aerosol details over urban areas [27]. Due to the limitation of the Dark Target (DT) algorithm, there are however a large number of missing values in the daily AOD images of MODIS 3-km AOD [28]. In this study, we applied the method proposed by He and Huang [22] to merge AOD data at a regional-scale, and the accuracy and spatiotemporal coverage of merged AOD were compared. To our knowledge, this is the first attempt to merge 3-km DT and 10-km DB AOD with daily linear regression models at regional-scale. The linear regression models achieved high prediction accuracy (cross-validation R 2 ranging from 0.82 to 0.88, Table A4). The performance of linear regression was steady across the years and among different AOD datasets (Table A4). Validation of merged AOD against AERONET AOD also indicate that when comparing with the original DT AOD, the merged AOD (both for Terra and Aqua) outperformed in RMSE and the percentage of AOD retrievals within EE. The correlation coefficients (R), though a slight decrease after merging, were higher in our study (>0.91 for both Terra and Aqua) than those reported by Qin  Meanwhile, the coverage of the merged AOD was improved in this study. The average spatial coverage increased from 13.7% (the original Terra DT AOD) and 11.45% (the original Aqua DT AOD) to 26.52% ( Figure A1). In a previous study, Xu et al. [37] merged MISR, SeaWiFS, MODIS DT, MODIS DB and MODIS SRAP AOD datasets using the maximum likelihood estimate method over Asia for the year 2007, with the average spatial coverage of AOD data increasing to 50% while the figures of the operational AOD datasets only ranged from 5% to 20%. Compared with the study of Xu et al., the coverage improvement of our study is slightly lower, probably due to the low availability of the original AOD data [85]. The linear regression merging approach could only estimate the AOD pixels at where at least one of the four AOD datasets has valid retrievals. However, because of the cloud contamination and retrieval errors [32], the original AOD datasets have large numbers of data gaps in the PYRD. The mean daily spatial coverage of the original Terra DT, Aqua DT, Terra DB and Aqua DB data were 13.7%, 11.45%, 19.98% and 16.91% respectively ( Figure A1). Insufficient retrievals lead to limited improvement in AOD coverage. Moreover, the more datasets utilized in the study of Xu et al. broadened the coverage of AOD data to a greater degree. Additionally, the temporal coverage for most of the PYRD increased to 20%-40% in our study ( Figure 5), which were similar to the result of He and Huang [22]. Though limited, the linear regression merging approach improved the spatial and temporal coverage to a certain degree, which could provide more information about AOD for the subsequent analyses (i.e., spatiotemporal variations and influencing factors analysis of AOD). The results of our study also proved that this method is not only suitable for merging AOD at national-scale but regional-scale.

The Impacts of Factors on the Spatial Variations of AOD
Topography was confirmed to be closely and negatively related to AOD, with quite high q values for DEM and SLP ( Figure 9, Table A5). Over the PYRD, high AOD values were observed in plain and tableland areas (Figure 6a) such as Jiangsu, Shanghai, and North, and Central Anhui, while low AOD values were primarily concentrated in hilly areas like Zhejiang, Southwest and Southeast Anhui. A previous study has found that both the MODIS C6.1 DT and DB AOD retrievals show small biases in low-elevation areas (height < 800 m) while the DT AOD retrievals show increasing positive biases as the elevation increases in high-elevation areas (height > 800 m) [26]. Since most parts of the PYRD (approximately 97.47%) are at elevations below 800 meters, it was assumed that biases caused by elevation are small. The close and negative association between DEM, SLP and AOD may be explained from three aspects: firstly, low-elevation and flat areas are more influenced by human activity such as industry and construction, and thus emitted more air pollutants [34,50]; secondly, high mountains in high-elevation areas can prevent the horizontal dispersion of air pollutants [19,96]; and lastly, for the mid-latitude areas, precipitation usually increases with elevation [97], while precipitation is capable of bringing down aerosols [46]. Previous studies also showed the aerosol distribution is strongly affected by topography conditions [19,[41][42][43].
Compared to other factors, socioeconomic factors (i.e., population density and GDP) were identified as greater contributors to AOD (Figure 9, Table A5), explaining 41% and 36.9% of the spatial variability in AOD, respectively. This finding agrees with previous studies of AOD-GDP association in Guangdong [50], Huaihai economic region [45] and mainland China [40] and AOD-population density association in mainland China [40]. It is due to the fact that a dense population causes high anthropogenic aerosol particles emissions and that high GDP requires heavy energy consumption [40,65], thereby leading to an increase in AOD. For example, Jiangsu and Shanghai were areas with large populations and high GDP in China, where an enormous amount of fuel combustion, industrial emission, and transportation and construction sources have always caused large AOD values [49,65]. But despite high population density and GDP, Zhejiang had lower AOD values than the other areas. We assume that the possible reason is that the impact of terrain was stronger than socioeconomic factors.
The influence of local meteorological factors on the spatial pattern of AOD varied in different periods. From the result of geographical detector method and multiple linear regression analysis ( Figure 9, Table A5), precipitation had a prominent negative impact on the AOD during the four-year period and in each season, which is consistent with multiple previous studies [19,[44][45][46]. It is because precipitation can lower aerosol concentration by washing away aerosols [45]. Additionally, precipitation tends to increase soil moisture, making the dust more difficult to rise into the atmosphere [19]. The influence of wind speed on AOD is complex because it may either disperse aerosols or bring in fresh aerosols [47,48]. Wind speed was observed to make an important positive contribution to AOD, particularly in winter. The prevailing north-west wind in winter can bring in highly polluted airborne particles from North China to the PYRD [66]. The planetary boundary layer height exhibited an obvious negative impact on AOD in spring. This is due to the fact that relative high planetary boundary layer in this season can lead to strong dilution and diffusion of aerosol particles [53]. In addition, relative humidity and temperature also have strong impacts on AOD in spring and in winter, respectively. It was well documented that the higher relative humidity could result in a larger volume of fine particles because of the hygroscopic growth of aerosol particles [19,51]. Regarding temperature, some studies have confirmed that high temperature can promote the photochemical reaction, thus increasing aerosol concentrations in the atmosphere [49][50][51]. However, on the other hand, the occurrence of inversion phenomenon in winter may hinder the diffusion and dispersion of aerosol particles, causing accumulation of aerosols over the region [48,98]. In the present study, since the temperature was not included in the models, thus its impact directions were not detected. Previous studies have reported that AOD is strongly and negatively related to the NDVI in Guangdong and Yangtze River Basin [44,50], because denser vegetation can lower AOD values by absorbing and depositing aerosol particles, especially in the dusty environment [19]. In some cases, however, vegetation can also increase AOD, for example, through burning straw in rural areas [19,44,52,99]. In our study, NDVI contributed to the AOD negatively in winter more than in the other seasons ( Figure 9, Table A5). A possible explanation is that owing to the sparse vegetation in winter, large amounts of dust aerosols were emitted into the air by wind erosion and this remarkably increased the aerosols in the atmosphere. In contrast, thick vegetation in spring, summer, and autumn mitigated the determinate power of NDVI for the spatial variability of AOD in these seasons.

The Effect of Environmental Policy on the Temporal Variability of AOD
The annual average AOD of the PYRD showed a decreasing trend from 2014 to 2017, in agreement with the trends observed in the Huaihai Economic Region [45] and East China [49]. It has been widely acknowledged that precipitation, temperature, and wind speed can impact the concentration of aerosols [16,19,51]. However, no prominent annual variation on precipitation, wind speed, or temperature over the PYRD was observed during the 2014-2017 period. Hence, the decline of annual average AOD might not be attributed to temporal change in meteorological factors. On the other hand, ground-level particles have presented a downward trend in recent years, which might be a key reason for the reduction in annual average AOD [42]. Since 2013, a variety of environmental measures have been implemented to lower PM emissions in the PYRD by the central and local governments, for instance, improving combustion technologies and vehicle emission standards, adjusting the energy structure, and utilizing clear energy [42,66,98].

Limitations
There are some limitations in this study. Firstly, though the linear regression-based merging approach can improve the spatiotemporal coverage of MODIS AOD, large data gaps remain in some daily images. Thus, the other gap-filling methods, such as spatiotemporal kriging [34] and multiple imputation [32] should be adopted to further fill AOD based on MODIS AOD merging. Secondly, we only focused on the impacts of factors on the spatial pattern of AOD, without considering the causes for seasonal variations of AOD. Lastly, although anthropogenic emissions are prominent sources of atmospheric aerosols, we only considered two socioeconomic factors (GDP and population density) due to the lack of data. More factors should be selected to represent the impact of human activity.

Conclusions
In this study, we merged four MODIS AOD datasets from 2014 to 2017 with an assessment of the accuracy and spatiotemporal coverage of the merged AOD and investigated its spatial pattern and temporal variations over the Pan Yangtze River Delta (PYRD). In addition, the contributions of topography, meteorology, vegetation, and socioeconomic factors to AOD distribution were identified through the geographical detector method and multiple linear regression analysis. The key findings and main conclusions are as follows: • The merged AOD are better than the original Terra/Aqua DT AOD, with the average spatial coverage increased by 94% and 132% respectively.

•
The AOD over the PYRD were high in the northeast and low in the southwest and decreased from 2014 to 2017. Seasonal average AOD were relatively higher in spring and summer than in autumn and winter.

•
Topographical factors contributed most to AOD, followed by precipitation and population density, while NDVI showed a relatively week impact on AOD.
Our study highlights how AOD varies over time and in space and therefore, has the potential to contribute to the formulation of environmental policy to protect atmospheric quality over large economically prosperous regions like the PYRD.

Validation of the Resampled 3-km DB AOD Data
The 3-km DB AOD data from 2014 to 2017, resampled by nearest neighbor, bilinear interpolation and cubic convolution methods respectively, were validated against AERONET AOD values. The Calibration of AOD The linear regression relationships between the four MODIS AOD datasets and AERONET AOD data were established using the following equation: where AOD Aeronet refers to AERONET AOD data, AOD MODIS refers to Terra DT, Aqua DT, Terra DB or Aqua DB AOD data; a, b, R 2 refer to intercept, slope, and R square of the linear regression respectively.

Multiple Linear Regression Models
To avoid the collinearity issue, Pearson correlation analyses among influencing factors were conducted for each season and the four-year period. If two factors were closely correlated (Pearson's r > 0.6) with each other, only one of the two was selected to develop the final model while the other was removed. For example, the results of Pearson correlation analyses show that some factors (i.e., DEM vs. SLP, PREC vs. ARH, AWS vs. ATEM, and POP vs. GDP) were closely correlated in spring. Meanwhile, the results of multiple linear regression models show that DEM, PREC, AWS and POP have better performance than the others. Thus, we selected DEM, PREC, AWS, POP, PBLH, and

Multiple Linear Regression Models
To avoid the collinearity issue, Pearson correlation analyses among influencing factors were conducted for each season and the four-year period. If two factors were closely correlated (Pearson's r > 0.6) with each other, only one of the two was selected to develop the final model while the other was removed. For example, the results of Pearson correlation analyses show that some factors (i.e., DEM vs. SLP, PREC vs. ARH, AWS vs. ATEM, and POP vs. GDP) were closely correlated in spring. Meanwhile, the results of multiple linear regression models show that DEM, PREC, AWS and POP have better performance than the others. Thus, we selected DEM, PREC, AWS, POP, PBLH, and NDVI in the spring model (Table A5). The same method was used to build models for the other seasons and the whole period (Table A5).