Full Coverage Hourly PM2.5 Concentrations’ Estimation Using Himawari-8 and MERRA-2 AODs in China

(1) Background: Recognising the full spatial and temporal distribution of PM2.5 is important in order to understand the formation, evolution and impact of pollutants. The high temporal resolution satellite, Himawari-8, providing an hourly AOD dataset, has been used to predict real-time hourly PM2.5 concentrations in China in previous studies. However, the low observation frequency of the AOD due to long-term cloud/snow cover or high surface reflectance may produce high uncertainty in characterizing diurnal variation in PM2.5. (2) Methods: We fill the missing Himawari-8 AOD with MERRA-2 AOD, and drive the random forest model with the gap-filled AOD (AODH+M) and Himawari-8 AOD (AODH) to estimate hourly PM2.5 concentrations, respectively. Then we compare AODH+M-derived PM2.5 with AODH-derived PM2.5 in detail. (3) Results: Overall, the non-random missing information of the Himawari-8 AOD will bring large biases to the diurnal variations in regions with both a high polluted level and a low polluted level. (4) Conclusions: Filling the gap with the MERRA-2 AOD can provide reliable, full spatial and temporal PM2.5 predictions, and greatly reduce errors in PM2.5 estimation. This is very useful for dynamic monitoring of the evolution of PM2.5 in China.


Introduction
PM2.5 (particulate matter with an aerodynamic equivalent diameter less than or equal to 2.5 µm) emitted from anthropogenic and natural sources has a great adverse effect on human health, the climate and the environment [1-3]. China has suffered seriously from PM2.5 pollution in recent decades of rapid urbanization and industrialization [4-6]. Under the air pollution control strategy, a ground-based observation network has been established to monitor air pollution in real time. Although it provides high-quality PM2.5 measurements every hour, its spatial coverage is severely limited by the sparse and uneven distribution of monitoring stations. Therefore, satellite-based aerosol optical depth (AOD) has been widely used to estimate PM2.5 because of its strong relationship with ground-level PM2.5 [7-9]. As observations from monitoring stations have become easy to obtain in recent years, statistical approaches that combine satellite-retrieved AOD data with PM2.5 observations have become the main method of producing spatially continuous PM2.5 concentrations. These include the linear mixed effect (LME) model [10], generalized additive models (GAM) [11], geographically weighted regression (GWR)-related models [12,13] and hybrid models [10]. In essence, these statistical methods are still dominated by linear methods. With the development of deep learning, many novel machine learning methods such as random forest (RF), the deep neural network (DNN), Extreme Gradient Boosting (XGBoost) and the Light Gradient Boosting Machine (LightGBM) have been introduced to produce high-accuracy PM2.5 concentrations over China [14-16].
In addition to the prediction model, the high rate of non-random missing information in AOD retrieval is an important factor that may lead to inevitable biases in monthly or yearly ground-level PM2.5 calculations [7,17,18]. Therefore, some studies have tried to fill the gaps in AOD retrieval to obtain full-coverage ground-level PM2.5 concentrations. Xiao et al. combined Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD with chemical transport model simulations through a multiple imputation method to fill the missing AOD [19]. Chen et al. developed a two-step interpolation method to replace the missing values in AOD [20]. Tuygun et al. merged MODIS, AERONET and MERRA-2 data to estimate PM10 concentrations.
These studies mainly focused on filling the gaps in AOD products obtained from polar-orbiting satellites to estimate daily PM2.5 concentrations. A geostationary satellite such as Himawari-8, equipped with the Advanced Himawari Imager (AHI), can provide a high temporal resolution AOD product that is useful for investigating the diurnal variation of air pollution. Recently, some studies have begun to estimate real-time hourly ground-level PM2.5 from the Himawari-8 AOD product [15,20-22]. However, most of these studies have focused on PM2.5 estimation models rather than on the essential relation between PM2.5 and AOD [23]. Xu et al. conducted a comprehensive investigation of the relationship between PM2.5 and the Himawari-8 AOD for the period of 2016-2018 in China, and found that the correlation between PM2.5 and the AOD fluctuates between regions because of different meteorological conditions, dominant aerosol types and AOD availability. Furthermore, cloud cover, land surface and the degree of pollution differ among the major populated areas in China, such as Beijing-Tianjin-Hebei (BTH), the Pearl River Delta (PRD), the Yangtze River Delta (YRD) and Chengdu-Chongqing (CY). The error and missing rate in satellite-retrieved AODs due to these factors may introduce large biases in hourly PM2.5 estimation and lead to misunderstanding of the diurnal variation in PM2.5.
To address these issues, we fill the gaps in the Himawari-8 AOD with MERRA-2 AODs and employ an RF model to estimate hourly ground-level PM2.5 in China from March 2017 to February 2018. We then present a comprehensive comparison of the PM2.5 estimates based on the gap-filled AOD with those based on the non-gap-filled AOD, and investigate the influence of the missing AOD on diurnal variation in densely populated regions including BTH, YRD, PRD and CY. This study provides full-coverage hourly PM2.5 predictions across China, which helps to obtain accurate diurnal variations in PM2.5 and to reduce errors in assessments of exposure to air pollution.

Materials and Methods
The AHI sensor onboard Himawari-8 has provided hourly AOD (AODH) at a 5 km resolution since 2014, and its coverage is 80°E-160°W and 60°N-60°S, which includes most regions of China. Validation of the Himawari-8 AOD retrieval shows that it is highly correlated with AERONET and the Sun-Sky radiometer observation network over China [24]. We downloaded the L3 hourly AOD data from the FTP address (ftp.ptree.jaxa.jp) provided by the Japan Aerospace Exploration Agency (JAXA), and selected reliable AOD values whose quality assurance flag was marked as 'very good' or 'good'.
NASA's Global Modeling and Assimilation Office (GMAO) produced the new atmospheric reanalysis product, namely, the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) in 2017 [25]. MERRA-2 provides a multi-decadal reanalysis of aerosol products that assimilates millions of aerosol observations from satellites and surface monitoring stations. Zhang et al. systematically evaluated the performance of Himawari-8 AODs and two reanalysis AOD datasets offered by MERRA-2 and the Copernicus Atmosphere Monitoring Service (CAMS) over China [26]. They found that Himawari-8 and MERRA-2 AODs showed similar accuracies overall, and both presented significant diurnal variations. However, the accuracy of AOD products could be affected by pollution level, pollution distribution patterns and meteorological conditions. Recently, the accuracy of MERRA-2 AOD in China has been evaluated in many studies [27,28], and the results show that MERRA-2 AOD is in high agreement with both the Aerosol Robotic Network (AERONET) AOD and satellite-retrieved AOD. Studies have confirmed that MERRA-2 AOD combined with machine learning models can estimate PM2.5 concentrations with reasonable accuracy [29]. Therefore, we selected MERRA-2 AODs to fill the gaps in Himawari-8 AODs and obtain hourly full-coverage AODs. The hourly MERRA-2 aerosol diagnostic dataset (AODM) at a spatial resolution of 0.625° × 0.5° was used to fill the gaps in the Himawari-8 AOD in this study.
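The gap-filling idea can be illustrated with a minimal sketch. It assumes that the coarse AODM grid is brought onto the fine AODH grid by nearest-neighbour replication with an integer factor, and that missing AODH cells are stored as NaN; the text does not state the resampling method, so both choices, the function names `resample_nearest` and `fill_gaps`, and the toy values are illustrative assumptions only.

```python
import math

def resample_nearest(coarse, factor):
    """Replicate each coarse cell factor x factor times (nearest neighbour)."""
    fine = []
    for row in coarse:
        expanded = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            fine.append(list(expanded))
    return fine

def fill_gaps(aod_h, aod_m_on_h_grid):
    """Keep AOD_H where it is valid; use the resampled AOD_M where it is NaN."""
    return [
        [m if math.isnan(h) else h for h, m in zip(row_h, row_m)]
        for row_h, row_m in zip(aod_h, aod_m_on_h_grid)
    ]

nan = float("nan")
# Hypothetical 4 x 4 fine grid with cloud gaps, and a 2 x 2 coarse grid.
aod_h = [[0.30, nan, 0.25, nan],
         [nan, nan, 0.40, 0.35],
         [0.20, 0.22, nan, nan],
         [nan, 0.18, nan, nan]]
aod_m = [[0.5, 0.6],
         [0.1, 0.2]]

aod_hm = fill_gaps(aod_h, resample_nearest(aod_m, 2))
for row in aod_hm:
    print(row)
```

In this sketch the satellite retrieval always takes priority and the reanalysis value is used only as a fallback, which matches the description of AODH+M as AODH with its gaps filled. The real grids (0.625° × 0.5° and 5 km) are not related by an integer factor, so an actual implementation would need a proper regridding step.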

Auxiliary Data
Variables commonly used in previous studies, including meteorological factors, the normalized difference vegetation index (NDVI), population density, road network data, NO2 concentrations and a digital elevation model (DEM), were selected as covariates in this study [30,31]. Meteorological data such as hourly air temperature, wind speed, specific humidity, surface pressure, total precipitation and boundary layer height were obtained from the Goddard Earth Observing System Assimilation System GEOS-5 Forward Processing (https://fluid.nccs.nasa.gov/weather/) at a spatial resolution of 0.25° × 0.3125°. NDVI data at 1 km spatial resolution were provided by the MODIS 16-day global NDVI dataset (MOD13A2). Road length and density in each 5 km grid cell, which represent vehicle emissions, were calculated from OpenStreetMap road network data (https://openstreetmap.org/). Population distribution was taken from the WorldPop population density dataset at 1 km spatial resolution (https://www.worldpop.org/). Integrated column concentrations of NO2 at a spatial resolution of 0.25° × 0.25°, obtained from the Ozone Monitoring Instrument (OMI), were used as a predictor variable because NO2 is an important precursor of PM2.5. The Shuttle Radar Topography Mission (SRTM) 90 m DEM product was used to characterize the influence of topography. All these datasets (Table 1) were resampled to the 5 km × 5 km Himawari-8 AOD grid covering the study region.

Model Development and Validation
The workflow of this study is shown in Figure 1. First, we resampled AODM to the grid of AODH, and used the resampled AODM to fill the missing AODH. Second, the gap-filled AOD (AODH+M) and other datasets including meteorology, NO2, NDVI, road network, population density and coordinates were integrated into the unified grid through spatial-temporal collocation. Finally, a random forest model driven by these predictor variables was used to predict ground PM2.5 concentrations.
The random forest model is an effective and relatively new machine learning method based on decision trees [32]. It is an ensemble of decision trees, and each sub-decision tree is constructed from a sub-dataset drawn from the training set with replacement. The prediction of the random forest is obtained by averaging the results of the sub-decision trees. The model makes it easy to evaluate the importance of each feature and reduces the risk of overfitting. Compared with deep learning algorithms, the random forest model is much simpler because only a few parameters need to be tuned to achieve optimal performance. Furthermore, the results are more interpretable because the model provides variable importance measures [33].
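The two ingredients described above, drawing sub-data with replacement and averaging the sub-tree predictions, can be shown in a short sketch. This is not the model used in the study: each tree is reduced to a one-split stump on a single feature, the random selection of candidate variables at each node is omitted, and the names `fit_stump`, `fit_bagged`, `bagged_predict` and the toy AOD/PM2.5 pairs are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    mean_all = sum(ys) / len(ys)
    # Fallback when no split is possible: predict the mean everywhere.
    best = (float("inf"), float("inf"), mean_all, mean_all)
    for t in sorted(set(xs))[:-1]:  # the largest x cannot be a split point
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def bagged_predict(stumps, x):
    """Ensemble prediction: the average of the individual tree predictions."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical toy data: AOD as the single predictor, PM2.5 as the target.
aod = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
pm25 = [20.0, 22.0, 24.0, 25.0, 60.0, 65.0, 68.0, 70.0]
model = fit_bagged(aod, pm25, n_trees=50, seed=0)
print(bagged_predict(model, 0.15), bagged_predict(model, 0.85))
```

Averaging over many bootstrap-trained trees smooths the hard step of any single stump, which is the variance-reduction effect that lets random forests limit overfitting.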

In this study, the random forest model was implemented with the R package 'ranger', a fast implementation of random forests. The two most important parameters in 'ranger' are the number of trees (ntree) and the number of variables that can possibly be split at each node (mtry). After comparing the results generated by different settings, we set ntree to 500 and mtry to 6 to balance computing time and prediction accuracy. A ten-fold cross-validation (CV) method based on all the data samples was used to ensure the robustness of the RF model. Statistical metrics including the determination coefficient (R2), mean absolute error (MAE), relative prediction error (RPE), root mean square error (RMSE) and index of agreement (IA) are often used to evaluate the accuracy of models [34,35]. We selected three commonly used statistical indicators (R2, RMSE, MAE) to assess the accuracy of PM2.5 estimation. Here, we compared two RF models derived from AODH and the gap-filled AODH+M dataset to investigate the model performance after filling the gaps in the Himawari-8 AOD with MERRA-2.
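As a rough illustration of the validation procedure, the sketch below computes R2, RMSE and MAE and builds out-of-fold predictions from a ten-fold split. It assumes R2 is defined as 1 - SS_res/SS_tot (the text does not give the formula, and some studies report the squared correlation instead); the helpers `kfold_indices` and `cross_validate` and the mean-only placeholder model are hypothetical and stand in for the RF model.

```python
import math
import random

def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(ys, fit_predict, k=10, seed=0):
    """Return out-of-fold predictions aligned with ys.

    fit_predict(train_idx, test_idx) trains on train_idx and returns one
    prediction per element of test_idx.
    """
    preds = [None] * len(ys)
    for test_idx in kfold_indices(len(ys), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        for i, p in zip(test_idx, fit_predict(train_idx, test_idx)):
            preds[i] = p
    return preds

# Small hand-checkable example of the three metrics.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
print(r2(obs, pred), rmse(obs, pred), mae(obs, pred))

# Placeholder model: predict the training mean for every held-out sample.
ys = [float(v) for v in range(1, 26)]
def mean_model(train_idx, test_idx):
    m = sum(ys[i] for i in train_idx) / len(train_idx)
    return [m] * len(test_idx)

cv_pred = cross_validate(ys, mean_model, k=10)
print(rmse(ys, cv_pred))
```

Because every sample is predicted exactly once by a model that never saw it, the pooled out-of-fold predictions can be scored with the same three metrics at any aggregation level (per site, per hour or overall).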
Model Fitting and Validation
To further investigate the performance of the models, the valid observation frequency (N) and the CV MAE, R2 and RMSE of individual sites across China were calculated and are shown in Figure 3. Figure 3a-d show the statistical metrics of the AODH-derived model, and Figure 3e-h show those of the AODH+M-derived model. The observation frequency of hourly AODH varies greatly between regions due to the retrieval algorithm and cloud cover: it is highest (~35%) in the North China Plain, lower (~16%) in central China and lowest (~10%) in southwest China. After filling the gaps in AODH with MERRA-2, the observation frequencies of most of the monitoring sites reached 100%, except for a few sites where the measurements were missing.
Although the overall accuracy of the model is reliable, the performance varies greatly among the typical populated regions. The values of CV R2 and RMSE are highest in the North China Plain, which has a dense network of monitoring sites and high PM2.5 concentrations. Before the AOD was gap filled with MERRA-2, the site-specific cross-validation accuracy was poor, with an R2 value of 0.2~0.4 and an RMSE value of 15~20 µg/m3, in southern China where it is constantly cloudy. However, the accuracy of estimation improved significantly, with a CV R2 of 0.4~0.6 and a CV RMSE of 10~15 µg/m3, after the AODH gap was filled. The site-specific cross-validation result further suggests that using AODH+M in an RF model can improve estimation accuracy effectively, especially in long-term cloudy regions.

Performance of the Estimation Model on Temporal Scale
The spatial variation in model performance is large due to the network density of monitoring sites, cloud cover, degree of pollution, meteorological conditions, etc. How estimation errors vary over different timescales in different regions still requires further study. Figure 4 shows the temporal dependence of CV R2 and RMSE in hourly AODH- and AODH+M-derived PM2.5 (R2_H, RMSE_H, R2_H+M, RMSE_H+M) over China and four typical regions. The AODH-derived model has reliable accuracy over China from 8:00~17:00 (CV R2_H: 0.6~0.7, RMSE_H: 15~25 µg/m3), and its performance clearly varies over time. As the observation frequency of AODH increases from 10:00~15:00, the overall CV R2_H improves and the CV RMSE_H increases.

The hourly error metrics (R2 and RMSE) of the AODH-derived and AODH+M-derived models varied greatly in different regions. Generally, the hourly CV R2 increased significantly and RMSE decreased over China after filling the gaps in the Himawari-8 AOD with MERRA-2, but the degree of decline varied considerably at different times and in different regions. As the most polluted region in China, BTH has high PM2.5 concentrations and a dense network of monitoring sites; the values of hourly CV R2 (R2_H is ~0.7, R2_H+M is ~0.8) and RMSE (RMSE_H and RMSE_H+M are both 15~25 µg/m3) are higher than in other regions.
The hourly RMSE values declined significantly (5-15 µg/m3) in PRD, CY and YRD where there is a high AOD missing rate (8~20%), but only declined a little in BTH (<5 µg/m3) from 10:00~15:00 after the AOD gap was filled. A previous study demonstrated that even though the PM2.5 model has a high accuracy of PM2.5 estimation overall, it performs relatively poorly in PRD, possibly due to the significant reduction in AOD observations caused by long-term cloud cover [21]. Our result suggests that using the MERRA-2 AOD to fill the gaps in AODH can significantly reduce the uncertainty of PM2.5 estimation caused by the missing AOD.
To evaluate the model performance at different temporal scales, the cross-validated hourly PM2.5 estimated by AODH and AODH+M was used to predict daily, monthly, seasonal and annual average concentrations (Figure 5). The AODH+M model has very different R2 (0.95-0.
J. Environ. Res. Public Health 2023, 20, x FOR PEER REVIEW

Discussion
Full spatial-temporal coverage of PM2.5 concentrations can provide valuable information to interpret the formation, transport and removal processes of pollutants. Usually, CV R2, RMSE and MAE values are used to assess the performance of a PM2.5 estimation model. Although most studies report similar R2 and RMSE values, they may differ significantly in the spatial and temporal distribution of PM2.5, which is also an important indicator of model performance. Figure 6 shows the spatial distributions of the AODH+M-derived and ground-measured PM2.5 concentrations. The AODH+M-derived PM2.5 concentrations agree well with ground-level observations, and their spatial patterns are similar to the results reported in previous studies, especially in typical regions such as BTH, YRD, PRD and CY [9,12].
Some studies have suggested that non-random missing AOD data may result in serious underestimation of the annual PM2.5 in BTH [7,19]. However, the influence of the observation frequency in regions with relatively lower PM2.5 concentrations, such as PRD, CY and YRD, has received little attention. To further investigate the influence of the missing rate of the Himawari-8 AOD on PM2.5 diurnal variation, we first calculated the hourly observations from 8:00-17:00 of all monitoring sites (PM2.5_ALL), then calculated the hourly PM2.5 observations matched with AODH (PM2.5_H) and AODH+M (PM2.5_H+M). We compared the average hourly PM2.5 derived from AODH (PM2.5_H_CV) and AODH+M (PM2.5_H+M_CV) with PM2.5_ALL, PM2.5_H and PM2.5_H+M over the whole of China and the four typical regions shown in Figure 7. PM2.5_H is higher than PM2.5_ALL by about 5~10 µg/m3 from 9:00 to 16:00 across China, which indicates that the model may overestimate the annual PM2.5 concentrations over China due to the missing AOD. These biases vary considerably between regions due to their different pollution backgrounds. The average hourly PM2.5 concentration is underestimated in BTH because AOD is missing in some heavily polluted scenarios, while in less polluted regions such as PRD, CY and YRD, it is overestimated because many clean scenarios are excluded when there is cloud cover. These biases may lead to serious misunderstandings of the diurnal variations in PM2.5 in China. After filling the gaps in the Himawari-8 AOD with MERRA-2 AOD, the hourly PM2.5_H+M_CV values were closer to the true diurnal variations than the PM2.5_H_CV values.
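The comparison between PM2.5_ALL and PM2.5_H amounts to averaging the same observations with and without the constraint that a valid AODH retrieval exists. The sketch below shows that calculation on a few invented records for a hypothetical clean, cloudy region; the numbers are purely illustrative and are not results from this study, and the function name `hourly_mean` and the record layout `(hour, pm25, has_aod_h)` are assumptions.

```python
from collections import defaultdict

def hourly_mean(records, only_matched=False):
    """Mean PM2.5 per hour; optionally keep only records with a valid AOD_H."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, pm25, has_aod_h in records:
        if only_matched and not has_aod_h:
            continue
        sums[hour] += pm25
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}

# Invented records: cloudy (no AOD_H) hours tend to be the cleaner ones here.
records = [
    (10, 20.0, False), (10, 25.0, False), (10, 45.0, True),
    (11, 22.0, False), (11, 50.0, True), (11, 48.0, True),
]

pm25_all = hourly_mean(records)                   # analogue of PM2.5_ALL
pm25_h = hourly_mean(records, only_matched=True)  # analogue of PM2.5_H
bias = {hour: pm25_h[hour] - pm25_all[hour] for hour in pm25_h}
print(bias)
```

When the samples dropped by cloud cover are systematically cleaner, the matched mean sits above the all-sample mean; reversing the pattern (missing AOD during heavy pollution) gives the negative bias described for BTH.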

Conclusions
PM2.5 has a great influence on the atmospheric environment and health in China. Knowing its diurnal variation is important for understanding the formation and evolution mechanisms of PM2.5. Many studies have used hourly AOD products of geostationary satellites, including Himawari-8/AHI, to estimate hourly PM2.5 concentrations. However, due to cloud cover and surface characteristics, the non-random missing Himawari-8 AODs may lead to misunderstandings and errors. In this study, the MERRA-2 AOD dataset is used to fill the gaps in the Himawari-8 AOD. Then, based on the gap-filled AOD and other auxiliary data, the RF model is used to predict hourly ground-level PM2.5 concentrations. PM2.5 concentrations derived from AODH+M and AODH are compared at different spatial and temporal scales. The impact of missing AOD on the diurnal variation of ground PM2.5 differs greatly between regions. Annual hourly PM2.5 concentrations derived from AODH in the daytime (8:00-19:00) are lower than observations in heavily polluted BTH and much higher than observations in less polluted regions such as PRD, CY and YRD. However, by filling the gaps in the Himawari-8 AOD with the MERRA-2 data, the accuracy of the estimation model and its ability to estimate diurnal variations of ground PM2.5 are greatly improved. This is very useful for dynamic monitoring of the evolution of PM2.5 in China.

Author Contributions: Z.L.: investigation, editing, funding acquisition, writing-original draft; Q.X.: data curation, validation, formal analysis; R.L.: conceptualization, data curation, supervision, resources, writing-review and editing. All authors have read and agreed to the published version of the manuscript.