High-Resolution Mapping of Aerosol Optical Depth and Ground Aerosol Coefﬁcients for Mainland China

: Aerosols play an important role in climate change, and ground aerosols (e.g., ﬁne particulate matter, abbreviated as PM 2.5 ) are associated with a variety of health problems. Due to clouds and high reﬂectance conditions, satellite-derived aerosol optical depth (AOD) products usually have large percentages of missing values (e.g., on average greater than 60% for mainland China), which limits their applicability. In this study, we generated grid maps of high-resolution, daily complete AOD and ground aerosol coefﬁcients for the large study area of mainland China from 2015 to 2018. Based on the AOD retrieved using the recent Multi-Angle Implementation of Atmospheric Correction advanced algorithm, we added a geographic zoning factor to account for variability in meteorology, and developed an adaptive method based on the improved full residual deep network (with attention layers) to impute extensively missing AOD in the whole study area consistently and reliably. Furthermore, we generated high-resolution grid maps of complete AOD and ground aerosol coefﬁcients. Overall, compared with the original residual model, in the independent test of 20% samples, our daily models achieved an average test R 2 of 0.90 (an improvement of approximately 5%) with a range of 0.75–0.97 (average test root mean square error: 0.075). This high test performance shows the validity of AOD imputation. In the evaluation using the ground AOD data from six Aerosol Robotic Network monitoring stations, our method obtained an R 2 of 0.78, which further illustrated the reliability of the dataset. In addition, ground aerosol coefﬁcients were generated to provide an improved correlation with PM 2.5 . With the complete AOD data and ground coefﬁcients, we presented and interpreted their spatiotemporal variations in mainland China. This study has important implications for using satellite-derived AOD to estimate aerosol air pollutants.


Introduction
Aerosols are tiny particles in the air that come from many sources, including anthropogenic sources, such as coal, petroleum, burning wood and biofuels; and natural sources, such as the desert, wildfires and dust [1]. They can control the energy from the Sun to the Earth, and scatter the energy back into space, thereby considerably influencing the climate [2,3]. Ground aerosols, such as coarse particulate matter (PM 10 ) or fine particular matter (PM 2.5 ), also have adverse health effects, ranging from asthma [4], stroke [5] and heart disease [6], to pregnancy complications [7] and mortality [8]. Recent studies [9][10][11] also show that aerosol droplets may be an important route of human-to-human transmission of COVID-19. The spatiotemporal distributions and variations of aerosols are critical for studying climate change, and for estimations of exposure for health evaluations. Although a worldwide or countrywide ground network for monitoring aerosols, such as sampling stations of PM 10 , PM 2.5 or Aerosol Robotic Network (AERONET) aerosol optical depth (AOD), has been established, due to limited spatial coverage, such a network is insufficient for monitoring and measuring spatiotemporal variation of aerosols. For mainland China. Compared with the traditional regression spatial statistics or machine learning methods, modern deep learning methods present powerful performance and generalization. The high missing proportion of AOD makes convolutional neural networks (CNN) inappropriate. Recently, a full residual deep network [35] was developed as a novel deep learning method to impute missing AOD with high generalization and natural spatial smoothing in surface modelling. However, for a large area such as mainland China, high variability in meteorology, geography and altitude needs to be considered in imputation of missing AOD.
In this study, we generated spatial maps of high-resolution (1 × 1 km 2 ), daily complete AOD and ground aerosol extinction coefficients of mainland China from 2015 to 2018. Using MAIAC AOD as the base data, we developed an improved full residual attention deep learning method to reliably impute extensive missing AOD and produce the high-resolution complete AOD dataset. Taking into account the important impact of meteorology on AOD and its missing values, but due to the lack of publicly available high-resolution parameters, we combined ground monitoring data with meteorological reanalysis data, topography, coordinates and time index to generate reliable high spatiotemporal resolution surfaces of meteorological factors.
In order to account for high variability in meteorology and land-use for a large area such as mainland China, the geographic zones were also included as a one-hot indicator through seven binary variables in the model; the attention layers were added to the full residual network to enhance the influence of important predictive covariates and reduce the bias from uncorrelated confounding covariates [36] for imputation in a large area. In deep learning, the weight parameters in the attention layer are used to allow the model to focus and place more "attention" (high weights) on the important parts rather than the minor or noisy parts (low weights) in the input sequence or nodes as needed [37], and it has been widely applied to solve complex problems of natural language processing [38] or computer vision [39].
In order to improve the effectiveness of model training, we developed the adaptive method of heuristic searching to find optimal hyperparameters (minibatch size, learning rate, number of hidden layers and number of nodes) to obtain a better model: when the test performance was lower than the threshold (for example, R 2 < 0.75), the system can search for optimal hyperparameters to retrain the (changed) model, thereby improving test performance and generalization ability. In deep learning, these hyperparameters have important influence upon learning and testing [40]. As shown in this study, the adaptive method with the incorporation of the regional factor and attention layers obtained high generalization in practice and modelling flexibility, and could cope well with the high variability and complexity under the meteorological and geographic conditions of a large area such as mainland China. Further, based on the imputed AOD, planetary boundary layer height (PBLH) and relative humidity, the grid maps of national ground aerosol coefficients were generated and shared.
The rest of this paper is organized as follows: Section 2 describes the study area, data and methods, including preprocessing, adaptive imputation modelling, evaluation and conversion to ground aerosol coefficient; Section 3 presents the results, the discussion of the results and data availability; and Section 4 draws the conclusions.

Study Area
The study area is mainland China, approximately between 18 and 54 • north latitude and 73 and 135 • east longitude. It covers an area of approximately 9.6 million square kilometers and had a population of approximately 1.4 billion in 2019 [41]. From the deserts in the arid north to the humid subtropical forests in the south, China has a vast and diverse landscape. The climate of the study area is mainly dominated by dry seasons and humid monsoons. Due to the highly complex topography, China's climate varies from region to region. Mainland China has multiple aerosol sources including coal combustion, motor Remote Sens. 2021, 13, 2324 4 of 26 vehicle emissions, and industrial sources, etc. In this study area, sulfate, ammonium, and nitrate from coal burning and vehicle emissions are the main causes of regional haze [42,43]; the continuous expansion of the deserts (e.g., Gobi Desert) has led to sandstorms, which is one of the main sources of aerosols [44]. Supplementary Figure S1 shows the study area of mainland China including its altitude and seven geographic zones, as well as six AERONET monitoring sites. The geographic zones are used to represent the differences in climate and land-use among different regions and are encoded using one-hot binary variables in the models to account for regional differences in mainland China. For imputation modeling, however, we trained a national model for each day, not separate regional models.

Original Satellite-Derived AOD Data and AERONET Data
The advanced MAIAC algorithm retrieves the daily AOD at a 1 × 1 km 2 spatial resolution from the MODIS Aqua (crossover approximately at 1:30 p.m. local time) and Terra (crossover approximately at 10:30 a.m. local time) sensors [17,25]. The DT algorithm associates the reflectance in visual bands with the 2.1 µm region to retrieve the surface reflectance and the DB algorithm uses an angular distribution model of surface reflectance derived from the time series of cloudless observations. As an improvement over the DT and DB algorithms, MAIAC uses time series and image processing to perform simultaneous retrievals of aerosol properties and surface bidirectional reflectance. Thus, it works on both dark vegetations and bright deserts at a higher spatial resolution [45].
We obtained 18 MODIS Sinusoidal tiles [46] of MAIAC AOD from the NASA data website (https://lpdaac.usgs.gov/products/mcd19a2v006/ (accessed on 6 January 2019)). These tiles together cover the entire mainland China with the sinusoidal projection (https://en.wikipedia.org/wiki/Sinusoidal_projection (accessed on 6 January 2019)). We mosaicked and cropped these tiles as a large grid to cover mainland China, and reprojected it onto the Beijing 1954 coordinate system (https://epsg.io/2412 (accessed on 6 January 2019)). The original MAIAC AOD data also provides the quality assurance flags that indicate cloud, land, water or snow pollution, which can be used to filter out invalid AOD.
In order to evaluate our imputation, the ground measurements of the AERONET data AOD from 6 AERONET monitoring sites (Supplementary Figure S1 for their spatial distribution) were gathered, and these data samples were measured when the sky was cloudless. We first obtained the averages of the 5-min AERONET data in the 60 min intervals to match the overpass times of the Aqua and Terra sensors; then the spectral linear interpolation was used to estimate AOD at 550 nm in the log-log space based on the AOD at the two nearest wavelengths [47]. The interpolated AERONET AOD at 550 nm was averaged and used as the "ground truth" to evaluate our imputed AOD.

Coarse-Resolution Reanalysis Data
The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) is a global reanalysis data product, which enables assimilation of modern hyperspectral radiance and microwave observations, climate parameters, the GEOS model and the GSI along with GPS-Radio Occultation datasets. Spatial resolution of MERRA2 is about 0.625 • × 0.5 • . MERRA2 AOD is estimated by assimilation of GEOS-5, the Goddard Chemistry, Aerosol, Radiation and Transport (GOCART) aerosol module, MODIS AOD data and AERONET ground-based observations. MERRA2 coarse-level ADO can provide background information of AOD in imputation modelling. Planetary boundary layer (PBL) is a critical factor in the turbulent mixing and vertical distribution of air pollutants, and its height determines the volume available for pollution dispersion and transport, and significantly affects vertical structure and turbulent mixing aerosols. We obtained reanalysis PBL height from the Goddard Earth Sciences Data and Information Services Centre "GIOVANNI" platform (https://giovanni.gsfc.nasa.gov/giovanni (accessed on 26 January 2019)). We also extracted the daily cloud fraction from the YD08_D3 MODIS Aqua and Terra Daily L3 Global 1Deg CMA datasets [48,49].

High-Resolution Meteorological Data by Spatiotemporal Interpolation
Since there are no publicly available meteorological surface grids to match our target resolution (1 × 1 km 2 ), we used the full residual deep network [35] as the interpolation method to generate the daily 1 × 1 km 2 grids of meteorological variables. Compared with tree-based or other existing machine learning methods, the full residual deep network shows good performance and generalization in spatiotemporal interpolation of continuous surfaces [35,50]. Due to the large difference in spatial resolution (25 × 25 km 2 to 62.5 × 50 km 2 vs. 1 × 1 km 2 ), the cubic or bilinear interpolation, if used directly to convert the meteorological reanalysis data to target resolution, may cause the resulting grid to lose the locality of spatial variations [50]. For our interpolation of meteorology, the input covariates included latitude, longitude, day of year, elevation and meteorological reanalysis data (Supplementary Table S1). We collected the hourly meteorological measurement data from approximately 900 monitoring stations of the China Meteorological Data Service (http://data.cma.cn (accessed on 6 January 2019)). These data were converted into daily data and used as the dependent variable to train the interpolation models. Finally, we generated the high-resolution grids of the following meteorological variables, including daily air pressure (hPa), air temperature ( • C), relative humidity (%) and wind speed (m/s). For grid interpolation of these variables, the reanalysis data provided background covariates including air pressure, air temperature, wind speed and ozone. For details, please refer to Supplementary Section S1 and our published papers [51,52].

Geographic Zone
The geographic zone data can be used to distinguish the differences in climate, elevation and land-use among different locations. We obtained the geographic zone data of China from the Resource Environmental Science and Data Centre, Chinese Academy of Sciences (http://www.resdc.cn (accessed on 1 June 2019)). In total, we obtained seven zones (Supplementary Figure S1): Northeast China, Northwest China, North China, Southwest China, East China, Central China and South China. For training a daily-level national model, the one-hot coding method [53] was used to encode the geographic zoning factor through seven binary (0 or 1) variables to include it in the model to account for the zonal variability in AOD.

Other Spatial and/or Temporal Variables
The other spatial and temporal variables included either the coordinates (x and y) and their derivatives (squares of x and y, and product of x and y), elevation, time index, or combination. The elevation, and the coordinates and their derivatives were used to capture spatial variability of AOD.

The Preprocessing of Satellite-Derived AOD and Covariates
The preprocessing (Supplementary Figure S2) included data cleaning, projection conversion, filtering, temporal alignment, AOD fusing and merging and resampling. Data cleaning was to remove the invalid values in the original AOD and covariate data according to the normal value range. The outer fences were defined as (mean − 5 × IQR, mean + 5 × IQR) (IQR: interquartile range) to overlay with the normal range to filter out the invalid values. With the Beijing 1954 coordinate system as the target projection, projection conversion was performed for the AOD and covariate data. For the MAIAC AOD data, the quality assurance flags in the original dataset were used to filter out invalid AOD measurements due to the influences of cloud and high reflectance. Temporal alignment was performed, including averaging the fine timescale data (e.g., hourly meteorological measurements), on the target timescale (daily), and bilinear interpolation of coarse or misplaced timescale data to the target timescale. For fusing and merging of AOD data, see Supplementary Section S2. In addition, the meteorological reanalysis AOD and PBLH play an important role in the formation and spread of aerosols, but they have a coarse resolution (approximately 25 × 25 km 2 or 62.5 × 50 km 2 ). We suggest bilinear interpolation [54] as the resampling method to convert the coarse-resolution reanalysis data to the fineresolution data.

Adaptive Imputation Modelling
Compared with other methods, the encoder-decoder full residual network has better generalization ability and better spatial variation simulation [35], so for imputation of missing AOD, it is used as the base model with the addition of attention layers (Supplementary Figure S3). The model was constructed based on the encoder-decoder architecture, which has a symmetrical structure from the encoding layers to the decoding layers. For the encoding layers, the number of nodes of a hidden layer has a downward trend; the middle hidden layer has the least number of nodes, as the coding layer of latent representation; for the decoding layers, the number of nodes of the hidden layers has an upward trend, corresponding to each encoding layer. The architecture can efficiently learn latent representation from raw features. Full residual connections are introduced to the encoder-decoder architecture and they can considerably improve learning efficiency and generalization ability.
Based on the full residual architecture, attention layers were added to enhance the contribution of important predictive covariates and reduce overfitting in the adaptive base model [55] (Supplementary Figure S3).
In the architecture, each attention layer was a dense layer whose input was from the input layer or the output of the last attention layer.
Each attention layer generated a weight vector, where large weights emphasized the impacts of important covariates, and small weights reduced the impacts of secondary covariates: where a (l−1) i is the weight of the ith node of the l-1 attention layer, h (l−1) i is the ith input node of the l-1 attention layer and i is the node index; and the residual connection was used to alleviate vanishing gradient in the attention layer. The softmax activation function was used to generate the weights of each attention layer (Supplementary Section S3).
For AOD imputation of a large area, the attention layers were used to highlight critical contributions of the important predictive covariates, thus reducing over-fitting in the full residual deep network.
For daily imputation of missing AOD in mainland China, the following loss function was defined: where x is the matrix of covariates used for imputation, θ W,b denotes the set of parameters of weights (W) and bias (b), y is the ground truth AOD,ŷ θ W,b (x) is the estimates of AOD and f θ W,b (x) is the estimates of the input, x by the network. x is the mean square error (MSE) loss function for the input x, as a regularizer for the imputed y. Ω(θ w,b ) is the elastic regularizer (w i = 0.5, i = 1, 2) [56]. In order to improve the learning efficiency, we developed an adaptive imputation method that could heuristically search optimal hyperparameters including a learning rate, a minibatch size, the number of attention layers, the number of hidden layers and the number of nodes in each hidden layer. Figure 1 shows the heuristic search process, which is described in detail in Supplementary Section S4. we used Python to implement the adaptive imputation modelling to improve the efficiency of model learning.

Training and Evaluation
Among the daily samples with AOD available from the MAIAC data without cloud coverage, 20% were randomly selected for validation and 20% were randomly used for the independent test, and the remaining 60% were used to train the daily models. R 2 and root mean square error (RMSE) were used as the performance metrics. In the evaluation, training and testing metrics were summarized and compared. Furthermore, the spectrally interpolated AOD from AERONET was used as the third-party ground-truth to evaluate our imputation results, and R 2 and RMSE are also reported in the validation.
If the missing proportion of one day was too high (>75%), the available samples of the previous day and the next day were combined with the data of the day for training and testing. Correspondingly, the day index (−1: the previous day; 0: the day; 1: the next day) was included as a time covariate in the predictors to account for the day variation. If the size of the combined samples was still small, we used a similar method to include the data of more days (two days each time). For training and testing, similar proportions were used to split the merged samples (60%: training; 20%: validation; 20%: testing).

Conversion of AOD into Ground Aerosol Coefficient
With the complete AOD (original values plus imputed values), downscaled PBLH and relative humidity (RH), the following optimal formula [29] was used to convert the complete AOD into the ground aerosol coefficient: where AOD is the original and imputed AOD, GAC denotes the ground aerosol coefficient and g is the parameter to be solved. Automatic differentiation [57] can be used to find an optimal g. For different geographic regions, we also fitted different g to show the influence of different zones on the GAC conversion in mainland China.

Descriptive Statistics of AOD and Fusion of High-Resolution Meteorology
For mainland China, the mean AOD of 2015-2018 was 0.22 with a standard deviation of 0.16. Figure 2 shows the spatial distribution of mean AOD (a) and seasonal means (b The Python package of full residual deep network with attention layers has been published at https://pypi.org/project/fullresattn (accessed on 30 March 2021). In addition, we used Python to implement the adaptive imputation modelling to improve the efficiency of model learning.

Training and Evaluation
Among the daily samples with AOD available from the MAIAC data without cloud coverage, 20% were randomly selected for validation and 20% were randomly used for the independent test, and the remaining 60% were used to train the daily models. R 2 and root mean square error (RMSE) were used as the performance metrics. In the evaluation, training and testing metrics were summarized and compared. Furthermore, the spectrally interpolated AOD from AERONET was used as the third-party ground-truth to evaluate our imputation results, and R 2 and RMSE are also reported in the validation.
If the missing proportion of one day was too high (>75%), the available samples of the previous day and the next day were combined with the data of the day for training and testing. Correspondingly, the day index (−1: the previous day; 0: the day; 1: the next day) was included as a time covariate in the predictors to account for the day variation. If the size of the combined samples was still small, we used a similar method to include the data of more days (two days each time). For training and testing, similar proportions were used to split the merged samples (60%: training; 20%: validation; 20%: testing).

Conversion of AOD into Ground Aerosol Coefficient
With the complete AOD (original values plus imputed values), downscaled PBLH and relative humidity (RH), the following optimal formula [29] was used to convert the complete AOD into the ground aerosol coefficient: where AOD is the original and imputed AOD, GAC denotes the ground aerosol coefficient and g is the parameter to be solved. Automatic differentiation [57] can be used to find an optimal g. For different geographic regions, we also fitted different g to show the influence of different zones on the GAC conversion in mainland China.

Descriptive Statistics of AOD and Fusion of High-Resolution Meteorology
For mainland China, the mean AOD of 2015-2018 was 0.22 with a standard deviation of 0.16. Figure 2 shows the spatial distribution of mean AOD (a) and seasonal means (b for spring, c for summer, d for autumn and d for winter) of AOD for 2015-2018. The average AOD in summer and spring was higher than that in autumn and winter (0.25-0.26 vs. 0.19-0.22; Table 1). In addition, the average AOD in central and eastern China is higher than in other regions (0.45-0.47 vs. 0.13-0.39); the average AOD in southern China is higher than that in northern China (0.36 vs. 0.18); the northeast, northwest and southwest regions have lower average AOD values than other regions (0.13-0.19 vs. 0.30-0.47; Figure 2 and Table 2). As economic development drove high emission sources of aerosols, the AOD values of the central and eastern developed regions were higher than other regions.
Overall, the average proportion of missing values in the study area from 2015 to 2018 was 63%, and the standard deviation was 18% (Table 1). Figure 3 shows the spatial distribution of the mean (a) and seasonal means (b for spring, c for summer, d for autumn and d for winter) of the missing proportion of AOD in mainland China. In summer and autumn, the missing proportion of AOD is higher than that in spring and winter (67% vs. 53-64%). For mainland China, due to the influence of precipitation and cloud cover [27], the average missing proportion of AOD in the southern region is higher than that in the northern region (81% vs. 57%); the average missing proportions of AOD in eastern, central, southern and southwest China are higher than that in the other regions (70-87% vs. 50-66%; Table 2).     For the reanalysis of PBLH and AOD, compared with the original coarse-resolution data, the high-resolution (1 × 1 km 2 ) grid resampled using bilinear interpolation showed a spatially smoothing surface (Supplementary Figure S4 for the PBLH example). Sensitivity analysis showed that bilinear interpolation and the downscaling method [52,58] had similar results for the imputation of missing AOD. Table 3 shows the test performance of the fusion of high-resolution meteorological variables for mainland China. The results show high test performance and generalization for the trained models of the meteorological factors (air pressure, air temperature, relative humidity and wind speed). Due to the volatility of wind, estimating wind speed is challenging. Thus, we performed bootstrap aggregating (bagging) of the full residual deep base networks for the estimation of wind speed and obtained an R 2 of 0.79.

Reliability of Imputed AOD and Simulated GAC
For imputation of AOD, we trained 1461 daily-level models with attention layers from 1 January 2015 to 31 December 2018. For each daily-level model, the learning procedure was stable, and the model parameters also converged to an optimal solution (Supplementary Figure S5 for the learning curves of training and testing R 2 for four seasonal days). Figure 4 shows the time series of test R 2 and test RMSE for the 1461 models, and Table 4 shows the summary of test performance metrics (Supplementary Figure S6 for the scatter plots between observed AOD and predicted AOD in independent tests on four seasonal days). In total, the R 2 had a mean of 0.90, ranging from 0.75 to 0.97, with a standard deviation of 0.04 and an IQR of 0.05; the RMSE had a mean of 0.08, ranging from 0.03 to 0.32, with a standard deviation of 0.03 and an IQR of 0.03. The model's daily-level test performance seemed to remain stable with small changes, as shown in Figure 4 (Supplementary Figure S7 for the histograms of test R 2 and RMSE). The analysis showed that, without adding data from other days, a higher AOD missing proportion was one of the main reasons for the lower test R 2 . Due to the high missing proportion, the sample size in training and testing was small, which might have a negative impact on the testing performance of the trained model [59]. In addition, the high cloud proportion was the main reason for the high missing proportion; accordingly, this was also one of the main reasons for the low test R 2 .

Reliability of Imputed AOD and Simulated GAC
For imputation of AOD, we trained 1461 daily-level models with attention layers from 1 January 2015 to 31 December 2018. For each daily-level model, the learning procedure was stable, and the model parameters also converged to an optimal solution (Supplementary Figure S5 for the learning curves of training and testing R 2 for four seasonal days). Figure 4 shows the time series of test R 2 and test RMSE for the 1461 models, and Table 4 shows the summary of test performance metrics (Supplementary Figure S6 for the scatter plots between observed AOD and predicted AOD in independent tests on four seasonal days). In total, the R 2 had a mean of 0.90, ranging from 0.75 to 0.97, with a standard deviation of 0.04 and an IQR of 0.05; the RMSE had a mean of 0.08, ranging from 0.03 to 0.32, with a standard deviation of 0.03 and an IQR of 0.03. The model's daily-level test performance seemed to remain stable with small changes, as shown in Figure 4 (Supplementary Figure S7 for the histograms of test R 2 and RMSE). The analysis showed that, without adding data from other days, a higher AOD missing proportion was one of the main reasons for the lower test R 2 . Due to the high missing proportion, the sample size in training and testing was small, which might have a negative impact on the testing performance of the trained model [59]. In addition, the high cloud proportion was the main reason for the high missing proportion; accordingly, this was also one of the main reasons for the low test R 2 .    As shown in the summary and the boxplots ( Figure 5), there are almost no differences between training performance metrics and testing performance metrics, indicating little over-fitting and reliability for our imputation models. Due to the low initial test performance (<0.75), our adaptive method automatically retrained the imputation models of 30 days after searching for their optimal hyperparameters. The default configuration of hyperparameters (minibatch size: 1024; number of attention layers: 2; learning rate: 0.1; number of layers: 12; numbers of nodes: (15, 15,   Sensitivity analysis was conducted to evaluate the contributions of geographic zon ing and attention layers. Compared with the full residual deep network, the analysi shows that the climate zone and attention layers made average improvements in test R of 2% and 3.5%, respectively. Both contributed an average improvement of 5% in test R The analysis shows that for such a large area as mainland China, it is important to includ geographic zones to account for variability in meteorology and land-use, and adding th attention levels to highlight important predictors in imputing missing AOD. For the larg study area with high complexity in regional heterogeneity, the attention layers wer learned to improve the model sensitivity to the influence of the covariates on the predic tions through different attention weights in different regions. Therefore, by focusing mor on the important parts rather than the minor or noisy parts in the input covariates in dif ferent regions, the testing performance was improved. The importance scores adaptivel varied with the input for the model to obtain the high test performance. Statistics on th results showed that the covariates with a higher importance score included MERRA AOD and PBLH, relative humidity, air pressure, altitude, latitude and longitude. In total, 3608 samples collected from six AERONET sites were used to validate th imputed values. In short, we obtained an R 2 of 0.78 (correlation: 0.88) and an RMSE of 0.2 in the validation using AERONET data (Figure 6a). The residuals from the AERONET AOD presented a normal distribution with a mean (0.02) close to 0 (Figure 6b), indicatin that the majority of the spatiotemporal variance was reliably captured by our imputations One thousand, three hundred and thirty-eight samples of the AERONET AOD were col lected under a cloudy sky where MAIAC AOD was missing. The imputed AOD under th cloudy sky had an R 2 of 0.68 (correlation: 0.83; RMSE: 0.31). In a cloudy sky, aerosols ma be activated and turned into cloud droplets, which may reduce AOD and increase cloud Sensitivity analysis was conducted to evaluate the contributions of geographic zoning and attention layers. Compared with the full residual deep network, the analysis shows that the climate zone and attention layers made average improvements in test R 2 of 2% and 3.5%, respectively. Both contributed an average improvement of 5% in test R 2 . The analysis shows that for such a large area as mainland China, it is important to include geographic zones to account for variability in meteorology and land-use, and adding the attention levels to highlight important predictors in imputing missing AOD. For the large study area with high complexity in regional heterogeneity, the attention layers were learned to improve the model sensitivity to the influence of the covariates on the predictions through different attention weights in different regions. Therefore, by focusing more on the important parts rather than the minor or noisy parts in the input covariates in different regions, the testing performance was improved. The importance scores adaptively varied with the input for the model to obtain the high test performance. Statistics on the results showed that the covariates with a higher importance score included MERRA2 AOD and PBLH, relative humidity, air pressure, altitude, latitude and longitude.
In total, 3608 samples collected from six AERONET sites were used to validate the imputed values. In short, we obtained an R 2 of 0.78 (correlation: 0.88) and an RMSE of 0.27 in the validation using AERONET data (Figure 6a). The residuals from the AERONET AOD presented a normal distribution with a mean (0.02) close to 0 (Figure 6b), indicating that the majority of the spatiotemporal variance was reliably captured by our imputations. One thousand, three hundred and thirty-eight samples of the AERONET AOD were collected under a cloudy sky where MAIAC AOD was missing. The imputed AOD under the cloudy sky had an R 2 of 0.68 (correlation: 0.83; RMSE: 0.31). In a cloudy sky, aerosols may be activated and turned into cloud droplets, which may reduce AOD and increase cloud optical depth [60,61]. It is possible for the imputed AOD to be overestimated under a cloudy sky. In the proposed method, the cloud fraction was used to account for this difference in AOD between the clear sky and cloudy sky. As shown in the result, the proposed method achieved a high correlation (0.83) of imputed AOD with AERONET AOD, indicating its reliability under the cloudy sky when extensive AOD was missing. AOD is defined as the integral of aerosol extinction coefficient at all altitudes and presents a probably exponential distribution along the vertical profile. The temporally varying mixing vertical height has a great influence on the relationship between AOD and ground aerosol concentrations [62]. Typically, due to the lower PBL height (Supplementary Figure S8) in winter, column-integrated AOD is higher in summer than in winter, which is quite different from the seasonal trend of ground aerosols such as PM10 and PM2.5 (high in winter and low in summer) [62]. Thus, PBLH and RH played a critical role in the conversion from AOD to GAC. As shown in Figures 9 and 12, the early spring day seemed to have higher AOD and the winter seemed to have higher GAC. The increase in AOD may cause the ground surface to cool, which may impede the development of PBL so that more aerosols such as PM2.5 may be concentrated on the ground [63]. The complex spatiotemporal interactions between meteorological factors (including relative humidity, wind speed and PBLH, etc.) and aerosols may lead to uncertainty in the estimation of these variables [64,65]. Studies have shown that satellite-derived [66] and MERRA2 [67,68] PBLH have high uncertainty. Since the satellite did not capture the shallow surface aerosol layers in winter, the uncertainty of PBLH estimates might have a negative impact on ground aerosol estimates.
Despite the uncertainty in the MERRA2 PBLH and RH estimates, GAC converted from AOD can better represent the temporal and seasonal trend of ground aerosol concentrations (Figures 7 and 8) and can be used as a better predictive covariate for estimating PM2.5. The results show that the correlation (0.51) between GAC and PM2.5 was improved by 0.21, compared with that (0.30) of the AOD. Therefore, compared with AOD, GAC is a better proxy variable used to represent temporal trend and spatial distribution of ground aerosol pollutants. Using the complete AOD, and downscaled PBLH and RH, the grid map dataset of ground aerosol coefficient was simulated. The national optimal g in Equation (3) is 0.21. The sensitivity analysis showed that the optimal g varied for each region. However, for the estimation of ground PM 2.5 , the difference between the national optimal g and the regional optimal g was very small, as shown by their small difference in the RMSE of Supplementary Table S2. AOD is defined as the integral of aerosol extinction coefficient at all altitudes and presents a probably exponential distribution along the vertical profile. The temporally varying mixing vertical height has a great influence on the relationship between AOD and ground aerosol concentrations [62]. Typically, due to the lower PBL height (Supplementary Figure S8) in winter, column-integrated AOD is higher in summer than in winter, which is quite different from the seasonal trend of ground aerosols such as PM 10 and PM 2.5 (high in winter and low in summer) [62]. Thus, PBLH and RH played a critical role in the conversion from AOD to GAC. As shown in Figures 9 and 12, the early spring day seemed to have higher AOD and the winter seemed to have higher GAC. The increase in AOD may cause the ground surface to cool, which may impede the development of PBL so that more aerosols such as PM 2.5 may be concentrated on the ground [63]. The complex spatiotemporal interactions between meteorological factors (including relative humidity, wind speed and PBLH, etc.) and aerosols may lead to uncertainty in the estimation of these variables [64,65]. Studies have shown that satellite-derived [66] and MERRA2 [67,68] PBLH have high uncertainty. Since the satellite did not capture the shallow surface aerosol layers in winter, the uncertainty of PBLH estimates might have a negative impact on ground aerosol estimates.
Despite the uncertainty in the MERRA2 PBLH and RH estimates, GAC converted from AOD can better represent the temporal and seasonal trend of ground aerosol concentrations (Figures 7 and 8) and can be used as a better predictive covariate for estimating PM 2.5 . The results show that the correlation (0.51) between GAC and PM 2.5 was improved by 0.21, compared with that (0.30) of the AOD. Therefore, compared with AOD, GAC is a better proxy variable used to represent temporal trend and spatial distribution of ground aerosol pollutants.  In order to evaluate the reliability of the imputation for mainland China in practice, we compared the original and complete AODs, and the MAIAC land surface reflectance of true color (bands 1, 4 and 3) and GAC. Figures 9-12 show the original (a) and complete (b) AOD, the land surface reflectance (c) and the ground aerosol coefficient (d) for four typical seasonal days (   In order to evaluate the reliability of the imputation for mainland China in practice, we compared the original and complete AODs, and the MAIAC land surface reflectance of true color (bands 1, 4 and 3) and GAC. Figures 9-12 show the original (a) and complete (b) AOD, the land surface reflectance (c) and the ground aerosol coefficient (d) for four typical seasonal days (  In order to evaluate the reliability of the imputation for mainland China in practice, we compared the original and complete AODs, and the MAIAC land surface reflectance of true color (bands 1, 4 and 3) and GAC. Figures 9-12 show the original (a) and complete (b) AOD, the land surface reflectance (c) and the ground aerosol coefficient (d) for four typical seasonal days (

Spatiotemporal Distributions of AOD and GAC for Mainland China
As shown in Figure 7a, the national average AOD (including imputed AOD) presented a seasonal pattern (high in spring and low in winter). Similarly, due to the influence of average AOD, the standard deviation of AOD had a similar pattern (Figure 7b). In most areas of mainland China, the low-altitude AOD in winter is the highest because more aerosols are concentrated near the ground surface, thereby resulting in high ground aerosol loading, due to a lower atmospheric boundary layer in winter than in summer [69]. In addition, the northwest desert dust, as one of the main sources of aerosols, covers most of Xinjiang and had an important impact on the high AOD in this area; anthropogenic aerosols were one of the main reasons for the high AOD in North China and Central China.

Spatiotemporal Distributions of AOD and GAC for Mainland China
As shown in Figure 7a, the national average AOD (including imputed AOD) presented a seasonal pattern (high in spring and low in winter). Similarly, due to the influence of average AOD, the standard deviation of AOD had a similar pattern (Figure 7b). In most areas of mainland China, the low-altitude AOD in winter is the highest because more aerosols are concentrated near the ground surface, thereby resulting in high ground aerosol loading, due to a lower atmospheric boundary layer in winter than in summer [69]. In addition, the northwest desert dust, as one of the main sources of aerosols, covers most of Xinjiang and had an important impact on the high AOD in this area; anthropogenic aerosols were one of the main reasons for the high AOD in North China and Central China.  (Figures 14 and 15 and Supplementary  Figures S13 and S14). Although Beijing, Shanghai and Guangzhou had similar seasonal patterns (high in spring or winter and low in summer), Chengdu had a different pattern (generally high in four seasons). Compared with the other three cities, Chengdu was more polluted by PM 2.5 [70]. In addition, with the exception of Chengdu and Guangzhou, the average GAC of Beijing and Shanghai also showed a moderate downward trend from 2015 to 2018 ( Figure 13).
(b) from 2015 to 2018 in mainland China. In addition, from 2015 to 2018, the national annual average AOD (Figure 13a) and GAC (Figure 13b) showed a slight downward trend. The cities also showed similar GAC seasonal patterns, but with different magnitudes (Supplementary Figures S9-S12 for four megacities, i.e., Beijing in the metropolitan Jing-Jin-Tang region, Shanghai in the Yangtze River Delta, Guangzhou in the Pearl River Delta and Chengdu in the Chengdu Plain). The spatial distributions of daily GAC in four seasonal days are shown for the four megacities (Figures 14 and 15 and Supplementary Figures S13 and S14). Although Beijing, Shanghai and Guangzhou had similar seasonal patterns (high in spring or winter and low in summer), Chengdu had a different pattern (generally high in four seasons). Compared with the other three cities, Chengdu was more polluted by PM2.5 [70]. In addition, with the exception of Chengdu and Guangzhou, the average GAC of Beijing and Shanghai also showed a moderate downward trend from 2015 to 2018 ( Figure 13).
As an indicator of aerosol pollution (e.g., PM2.5 and PM10, and total suspended particles), the declining trend of satellite GAC might show the effectiveness of the Chinese government in controlling air pollution. For example, the China Clean Air Policy has been implemented since 2013, resulting in a gradual decline in PM2.5 [71,72]. On the other hand, the seasonal pattern (Figure 8) of average GAC of 2015-2018 in mainland China was consistent with the trends of PM10 and PM2.5, which are usually low in summer and high in winter.  Due to the influence of the spatiotemporally varying PBLH, AOD and GAC presented different patterns for mainland China. A typical example is the spatial distributions of AOD and GAC on 30 December of 2018 (b vs. d of Figure 12). The AOD map showed low and moderate values for most of mainland China, but the GAC map showed moderate to high pollution in parts of southern, central, eastern, northwest and northeastern  As an indicator of aerosol pollution (e.g., PM 2.5 and PM 10 , and total suspended particles), the declining trend of satellite GAC might show the effectiveness of the Chinese government in controlling air pollution. For example, the China Clean Air Policy has been implemented since 2013, resulting in a gradual decline in PM 2.5 [71,72]. On the other hand, the seasonal pattern (

Conclusions
This article reported an adaptive method of a full residual deep network with attention layers to improve the imputation of extensive missing AOD and the mapping of ground aerosol coefficients for mainland China. To account for the regional variability of emissions and meteorology in a large study area, the geographical zoning factor was encoded as one-hot quantitative variables in the daily imputation model, and attention layers were added to highlight the contribution of important predictive covariates to improving generalization. In the adaptive method, we designed a heuristic search method to improve the optimization of hyperparameters, thereby improving the efficiency of imputation training on large satellite datasets. The daily imputation models achieved an average test R 2 of 0.90, ranging from 0.75 to 0.97 (average RMSE: 0.075), with an improvement of approximately 5% over the original full residual network. There were smooth transitions (no unrealistic abrupt changes) between the imputed values and the original AOD. The validation with the AERONET AOD (R 2 : 0.78) further illustrated the reliability of the proposed method. We shared the datasets of high-resolution complete AOD and ground aerosol coefficient for mainland China from 2015 to 2018. This study has important implications for the imputation of AOD and the mapping of ground aerosol coefficients for a large area, and extensive applications of high-resolution MAIAC AOD data.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Section S1: Spatiotemporal interpolation of meteorological parameters, Section S2: Fusing and merging of AOD Due to the influence of the spatiotemporally varying PBLH, AOD and GAC presented different patterns for mainland China. A typical example is the spatial distributions of AOD and GAC on 30 December of 2018 (b vs. d of Figure 12). The AOD map showed low and moderate values for most of mainland China, but the GAC map showed moderate to high pollution in parts of southern, central, eastern, northwest and northeastern China, which was consistent with actual situations [73]. Table 5 lists the simulated GAC summary of mainland China and its sub-regions. It shows that the central region had the highest average GAC from 2015 to 2018 (0.00113), indicating heavy air pollution for this region. On the contrary, the average GAC in the southwest region was the lowest (0.00032), indicating less pollution for this region. In mainland China and its subregions, summer air pollution is the lowest, while winter air pollution is the highest.
The complete high-resolution (1 × 1 km 2 ) AOD and GAC across mainland China during 2015-2018 are available at https://doi.org/10.7910/DVN/RNSWRH (accessed on 30 December 2020) [74] and https://doi.org/10.7910/DVN/YDJT3L (accessed on 20 February 2021) [75], respectively, which can be downloaded in GeoTiff format. We will continue to update the datasets to cover 2019, 2020 and earlier years. This study has several limitations. First, although we reported the average high test R 2 in the sample test (0.90) and AERONET data (0.78), the data from the six monitoring AERONET sites was limited for national validation of imputed MAIAC AOD. In the future when more AERONET sites become available, more test data can be used in imputation validation. Second, when satellite-derived AOD cannot capture shallow surface aerosol pollutants as in winter, the uncertainty of PBLH estimates will have a negative impact on GAC estimates, inevitably having ill-posed features. In the future, PBLH measured using ground-based lidar, if available, may be used to adjust satellite-derived PBLH to reduce bias in the conversion from AOD to GAC. Third, we used national instead of regional g in the conversion from AOD to GAC, although the difference in the estimation of ground aerosols such as PM 2.5 by different GAC is very small.

Conclusions
This article reported an adaptive method of a full residual deep network with attention layers to improve the imputation of extensive missing AOD and the mapping of ground aerosol coefficients for mainland China. To account for the regional variability of emissions and meteorology in a large study area, the geographical zoning factor was encoded as onehot quantitative variables in the daily imputation model, and attention layers were added to highlight the contribution of important predictive covariates to improving generalization. In the adaptive method, we designed a heuristic search method to improve the optimization of hyperparameters, thereby improving the efficiency of imputation training on large satellite datasets. The daily imputation models achieved an average test R 2 of 0.90, ranging from 0.75 to 0.97 (average RMSE: 0.075), with an improvement of approximately 5% over the original full residual network. There were smooth transitions (no unrealistic abrupt changes) between the imputed values and the original AOD. The validation with the AERONET AOD (R 2 : 0.78) further illustrated the reliability of the proposed method. We shared the datasets of high-resolution complete AOD and ground aerosol coefficient for mainland China from 2015 to 2018. This study has important implications for the imputation of AOD and the mapping of ground aerosol coefficients for a large area, and extensive applications of high-resolution MAIAC AOD data. of daily PBLH for MAIAC AOD of mainland China, Figure S9: Time series of AOD mean (a) and standard deviation (b) in Beijing, Figure S10: Time series of AOD mean (a) and standard deviation (b) in Shanghai, Figure S11: Time series of AOD mean (a) and standard deviation (b) in Guangzhou, Figure S12: Time series of AOD mean (a) and standard deviation (b) in Chengdu, Figure S13: Spatial distributions of GAC for four seasonal days in Chengdu, Figure S14: Spatial distributions of GAC for four seasonal days in Guangzhou.
Author Contributions: L.L. designed the research and developed the methodology; L.L. wrote the manuscript and revised the manuscript.