Superior PM2.5 Estimation by Integrating Aerosol Fine Mode Data from the Himawari-8 Satellite in Deep and Classical Machine Learning Models
Remote Sens. 2021, 13, 2779

Abstract: Artificial intelligence is widely applied to estimate ground-level fine particulate matter (PM2.5) from satellite data by constructing the relationship between the aerosol optical thickness (AOT) and the surface PM2.5 concentration. However, aerosol size properties, such as the fine mode fraction (FMF), are rarely considered in satellite-based PM2.5 modeling, especially in machine learning models. This study investigated the linear and non-linear relationships between fine mode AOT (fAOT) and PM2.5 over five AERONET stations in China (Beijing, Baotou, Taihu, Xianghe, and Xuzhou) using AERONET fAOT and 5-year (2015–2019) ground-level PM2.5 data. Results showed that the fAOT separated by the FMF (fAOT = AOT × FMF) had significant linear and non-linear relationships with surface PM2.5. Then, the Himawari-8 V3.0 and V2.1 FMF and AOT (FMF&AOT-PM2.5) data were tested as input to a deep learning model and four classical machine learning models. The results showed that FMF&AOT-PM2.5 performed better than AOT alone (AOT-PM2.5) in estimating PM2.5. The FMF was then applied in satellite-based PM2.5 retrieval over China during 2020, and FMF&AOT-PM2.5 was found to agree better with ground-level PM2.5 than AOT-PM2.5 on dust and haze days. The better linear correlation between PM2.5 and fAOT on both haze and dust days (dust days: R = 0.82; haze days: R = 0.56) compared to AOT (dust days: R = 0.72; haze days: R = 0.52) partly contributed to the superior accuracy of FMF&AOT-PM2.5. This study demonstrates the importance of including the FMF to improve PM2.5 estimations and emphasizes the need for a more accurate FMF product that enables superior PM2.5 retrieval.
Over southeastern Asia, the values of V3.0 AE were much higher than those of V2.1 AE, with a difference greater than 0.2.
The high values of both V3.0 and V2.1 FMF (>0.6) corresponded well with regions where there were fine mode aerosols, such as southeastern Asia, northern India, northeastern China, and central Australia. The difference between V3.0 and V2.1 FMF also showed a large positive value (>0.2) over southeastern Asia and a negative value over central Australia (> −0.2).


Introduction
Accurate monitoring of particulate matter with a diameter of less than 2.5 µm (PM2.5) is vital to the atmospheric environment and human health [1,2]. Ground-based PM2.5 monitoring sites have limited connectivity and thus lack extensive spatial coverage; this in turn restricts investigations of the spatial dynamics of PM2.5 and its impact on human health. To overcome this issue, satellite-based remote sensing has been widely used to obtain spatially continuous ground-level PM2.5 values [3]. Given their significant linear correlation, traditional satellite-based PM2.5 estimations mostly rely on building models between the aerosol optical thickness (AOT) and PM2.5 (Figure 1) [4]. Such models include empirical statistical models [5,6], chemical transport models [7], and physical models [8]. AOT products are commonly available at various spatial and temporal resolutions from many satellite sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) [9], the Ozone Monitoring Instrument (OMI) [10], the Visible Infrared Imaging Radiometer Suite (VIIRS) [11], and the Advanced Himawari Imager (AHI) onboard Himawari-8 [12].
AOT is a column-integrated optical parameter associated with the columnar quantity of suspended particles of all sizes, as well as their composition, mixing state, density, vertical distribution, and so on, while PM2.5 is the near-surface concentration of small-sized particulate matter with an aerodynamic diameter of less than 2.5 µm. Therefore, it is difficult to find a general and uniform expression that directly converts AOT to PM2.5, and this issue has been widely ignored in previous estimations of PM2.5 [13]. The impact of the fine mode fraction (FMF) on the correlation between PM2.5 and AOT is non-negligible [14], and studies have used FMF to separate the contribution of fine particles from AOT [15]. In this respect, Yan et al. [16] used one month of ground-based experimental data and found that the linear correlation between fAOT and PM2.5 (R = 0.74) was higher than the correlation between AOT and PM2.5 (R = 0.49); this result provides evidence that fAOT is more suitable for use in estimating PM2.5.
In recent years, machine learning models have been extensively applied in PM2.5 estimations [17,18] because of their ability to capture the nonlinear relationship between PM2.5 and AOT [19,20]. However, to our knowledge, of the more than 100 machine learning-based PM2.5 retrieval studies published during 2014-2021 (Table S1), most employed AOT for PM2.5 estimations. Only a limited number of studies have considered utilizing aerosol size information, such as the FMF, as an input parameter for the model. Therefore, the role of FMF in machine learning-based PM2.5 retrieval models requires further intensive investigation.
Unfortunately, the limited reliability and accuracy of FMF data hinder their application in modeling PM2.5. Only a limited number of satellite sensors, such as MODIS, Himawari-8, the Geostationary Ocean Color Imager (GOCI), and Polarization and Directionality of Earth's Reflectance (POLDER), provide FMF or fAOT products, albeit with poor accuracy. Levy et al. [21] reported that the MODIS-retrieved FMF over land remained highly uncertain, and it was subsequently removed from the MODIS Collection 6 global aerosol dataset (MOD08 and MYD08). For Himawari-8, Yang et al. [22] found that its V2.1 FMF data had a low accuracy (with a high RMSE of 0.30). Furthermore, Choi et al. [23] evaluated the GOCI FMF against AERONET FMF for AERONET AOD > 0.3, and the results showed a mediocre RMSE of 0.264. Therefore, the low accuracy of satellite-based FMF data makes it difficult to explore the role of aerosol size information in estimating PM2.5. The POLDER/Generalized Retrieval of Aerosol and Surface Properties (GRASP) FMF shows substantially improved agreement with ground measurements compared with MODIS FMF [24,25], with an RMSE of 0.170 in China. However, POLDER provided global coverage only every two days, which is a coarse temporal resolution compared with MODIS [26].
Nevertheless, Himawari-8, which was launched on 7 October 2014, is a promising geostationary satellite in this respect. It carries a state-of-the-art imager, the Advanced Himawari Imager (AHI), with 16 spectral bands from visible to infrared at high spatial (0.5-1.0 km at nadir for the visible and near-infrared bands, and 2 km at nadir for the infrared bands) and temporal (10 min for full disk coverage) resolutions [12]. This unparalleled spatiotemporal resolution enables detailed observations of aerosol properties, and AOT and aerosol size information (the Ångström exponent (AE) and FMF) can be obtained over Asia and Oceania [27]. Himawari-8 updated its aerosol products from V2.1 to V3.0 on 30 October 2020, providing a valuable opportunity for investigating the impact of FMF on the accuracy of PM2.5 estimations in machine learning modeling.
Therefore, this study aimed to determine the relationships between fAOT and PM2.5 using the FMF, and then to test the effect of the FMF on the PM2.5 retrieval ability of deep learning and classical machine learning models. Applying FMF in these models has rarely been considered in previous studies. By comparing the V2.1 and V3.0 Himawari-8 aerosol products (AE and FMF), we ascertained whether the improved FMF results enabled more accurate PM2.5 estimations. This study not only enhances our understanding of the use of aerosol size information and the FMF in satellite-based PM2.5 retrieval by machine learning models, but also emphasizes the importance of a more accurate FMF product in PM2.5 retrieval.

Study Area and Himawari-8 V2.1 & V3.0 Aerosol Products
Himawari-8 is positioned over the equator at 140°E and has an observation range of 80°E-160°W and 60°S-60°N. Figure S1 shows the geographical domain of Himawari-8, which covers East Asia, Southeast Asia, Oceania, and the western Pacific Ocean. China is one of the largest developing countries within its domain, and it has experienced severe air pollution in recent years [28].
Himawari-8/AHI aerosol products were derived based on the methods of Fukuda et al. [29] and Yoshida et al. [30]. V2.1 has now been upgraded to V3.0, and the improvements include the use of canonical correlation analysis and a model forecast to provide an a priori estimate for the retrieval. Himawari-8 V3.0 provides products that include AE at 440-675 nm and FMF over land and ocean, with spatial and temporal resolutions of 0.05° and 10 min, respectively; therefore, Himawari-8 V3.0 products can provide sufficient samples and detailed information about daily variations in aerosols [31]. The Himawari-8 V3.0 dataset also includes Quality Assurance flags with four values ("very good," "good," "marginal," and "no confidence"); only "very good" products were considered in this study.
In this study, AERONET data were used to explore the role of FMF in estimating PM2.5. AERONET is a globally distributed network of sun/sky radiometers that provide data with a low uncertainty and high temporal resolution (15 min) for all sites [32]. AERONET data are provided at different quality levels: level 1.0 (L1.0, unscreened), level 1.5 (L1.5, cloud-screened and quality controlled), and level 2.0 (L2, cloud-screened and quality assured). In this study, the newly updated version 3 (V3.0) L2 spectral deconvolution algorithm (SDA) data, including AOT [33], AE, and FMF [34], were selected when available. Although L1.5 data have comparatively higher uncertainty than L2 data, L1.5 data were used when L2 data were inadequate or unavailable. The distribution of the selected 89 AERONET stations is shown in Figure S1, and detailed information is given in Table S2.
It should be noted that the AERONET SDA FMF is defined as the weighting between one fine mode and one coarse mode, which are separated spectrally [34]. In Himawari-8, by contrast, the fine and coarse modes are each regarded as a monomodal lognormal volume size distribution with a defined radius cutoff for the fine (0.143 µm) and coarse (2.834 µm) mode [30]. For MODIS and VIIRS, FMF is defined as the contribution of fine-dominated aerosol to AOT, where the fine-dominated aerosol actually contains a coarse mode [35,36]. Therefore, these satellite FMF products and the SDA FMF are not physically identical. However, the AERONET SDA FMF is still considered comparable to satellite FMF products [35,37] and is widely used in FMF validation [22,38]; thus, we use the AERONET SDA FMF as the ground truth in this study. Moreover, in satellite retrievals, the equation fAOT = AOT × FMF is based on the single scattering approximation, which is suitable over the ocean; over land, however, it could be problematic when AOT is high [39].

Ground-Based PM2.5, Meteorological, and Radiosonde Data
To investigate the impact of aerosol size data on PM2.5 estimations over China, ground-based hourly PM2.5 data were collected from the National Urban Air Quality Real-Time Publishing Platform. Data from 1701 PM2.5 monitoring stations were obtained, and their distribution is shown in Figure S2a. The PM2.5 data from 2015-2019 were used to explore the relationship between fAOT and PM2.5, and the PM2.5 data from 2020 were used to test the satellite-based PM2.5 retrieval accuracy with and without aerosol size information. Meteorological factors are important for estimating PM2.5 [40], and in situ meteorological data were obtained from the National Centers for Environmental Information (NCEI). Temperature, dew point temperature, and wind speed data were obtained from 405 meteorological stations over China (Figure S2c), and the relative humidity (RH) was derived from temperature and dew point temperature. In addition, data at 8:00 am (local time) from 95 radiosonde stations in the Integrated Global Radiosonde Archive (IGRA) (Figure S2b) were collected, and the daytime planetary boundary layer height (PBLH) was determined by the 'parcel method' [41,42], which defines it as the height at which the virtual potential temperature equals that at the surface.
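As a rough illustration of the two derived quantities described above, the following Python sketch computes RH from temperature and dew point and locates the parcel-method PBLH in a sounding. The Magnus coefficients, the function names, the linear interpolation between sounding levels, and the toy profile are all assumptions made for this example; the text does not state which saturation vapor pressure formula or interpolation scheme was actually used.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation over water (coefficients are an assumption)."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity(temp_c, dew_point_c):
    """RH (%) as the ratio of vapor pressure at the dew point to that at T."""
    return 100.0 * (saturation_vapor_pressure_hpa(dew_point_c)
                    / saturation_vapor_pressure_hpa(temp_c))


def parcel_method_pblh(heights_m, theta_v_k):
    """Lowest height aloft where theta_v returns to its surface value.

    The profile must be ordered from the surface upward; the first entry is
    treated as the surface. Returns None if the profile never crosses the
    surface value from below (for example, a stable surface layer).
    """
    surface = theta_v_k[0]
    for i in range(1, len(heights_m)):
        lower, upper = theta_v_k[i - 1], theta_v_k[i]
        if lower < surface <= upper:
            frac = (surface - lower) / (upper - lower)
            return heights_m[i - 1] + frac * (heights_m[i] - heights_m[i - 1])
    return None


# Hypothetical sounding: heights above ground (m) and theta_v (K).
heights = [0.0, 100.0, 500.0, 1000.0, 1500.0]
theta_v = [302.0, 301.0, 301.2, 301.8, 303.0]

rh = relative_humidity(20.0, 10.0)
pblh = parcel_method_pblh(heights, theta_v)
```

In this toy profile the surface value of 302.0 K is reached again between the 1000 m and 1500 m levels, so the estimate falls between those two heights.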

Classical Machine Learning Models
In this study, four traditional machine learning models were used to estimate PM2.5 with different input variables, and schematic diagrams of these models are shown in Figure 2. The models are described as follows, and the detailed parameter settings for the four machine learning models are shown in Table S3:
(1) Extratree is a supervised ensemble learning model that consists of ensembles of unpruned classification or regression trees [43]. It uses a random value for the split of each node, which leads to more diversified trees and fewer splitters. Previous studies have used Extratree in both prediction [44] and classification [45].
(2) Random Forest (RF) is a supervised ensemble learning model introduced by Ho [46], and its construction is based on ensembles of unpruned classification or regression trees. It operates by selecting random features in the tree induction and bootstrap samples of the training data, and it splits each node in accordance with the largest information gain. RF has been widely applied in PM2.5 estimations in previous studies [17].
(3) Extreme Gradient Boosting (XGBoost) is a machine learning algorithm based on the gradient boosting decision tree (GBDT) proposed by Chen and Guestrin [47]. It can conduct parallel computation efficiently, and it uses fewer computing resources than other methods. In XGBoost, each decision tree is split by a level-wise algorithm that is based on different independent variables. Pan [48] used XGBoost to forecast hourly PM2.5 in Tianjin based on data from air-monitoring stations.
(4) LightGBM also has a GBDT framework. It grows trees using a leaf-wise algorithm, and it only grows the leaf with the maximum delta loss. Compared with a level-wise algorithm, the leaf-wise algorithm achieves a greater loss reduction for the same number of leaves. Zhong et al. [49] utilized LightGBM to predict historical PM2.5 based on meteorological observations.
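The difference between the split rules of Extratree and RF described above can be sketched in a few lines of pure Python. This is a minimal illustration on a single feature, assuming variance reduction as the regression counterpart of information gain; the function names and the toy AOT and PM2.5 values are hypothetical and are not taken from the study, and the random feature selection and bootstrap sampling of RF are omitted.

```python
import random


def variance(values):
    """Population variance; an empty node contributes zero."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(x, y, threshold):
    """Impurity decrease obtained by splitting the samples at `threshold`."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
    return variance(y) - weighted


def best_split(x, y):
    """RF/CART style: search all midpoints for the largest gain."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: variance_reduction(x, y, t))


def random_split(x, rng):
    """Extra-Trees style: draw the threshold uniformly within the feature range."""
    return rng.uniform(min(x), max(x))


# Toy data (hypothetical): one AOT-like feature and PM2.5-like targets.
aot = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 1.1]
pm25 = [10.0, 12.0, 11.0, 13.0, 60.0, 62.0, 61.0, 63.0]

rng = random.Random(42)
t_best = best_split(aot, pm25)
t_rand = random_split(aot, rng)
gain_best = variance_reduction(aot, pm25, t_best)
gain_rand = variance_reduction(aot, pm25, t_rand)
```

The exhaustive search never yields a smaller gain than the random threshold on the same node; Extratree accepts that loss per node in exchange for cheaper and more diverse trees.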

Deep Learning Model EntityDenseNet
EntityDenseNet is a deep learning model that can reduce the overfitting problem, and it has a great capacity for capturing nonlinear relationships between variables [50]. As shown in Figure 2, it consists of one input layer, two hidden layers, and one output layer. Traditional neural network approaches (such as BPNN) cannot directly use categorical data, but EntityDenseNet uses an embedding layer to process categorical variables. This not only accelerates the training, but it also helps the neural network to learn about the intrinsic relationships between categorical variables [51]. Each hidden layer consists of one fully connected layer, one rectified linear unit (ReLU) layer, one Batch Normalization (BN) layer, and one dropout layer [52]. The ReLU layer is used as the activation function because it overcomes the saturation and vanishing gradient problems [53], and it is faster than traditional activation functions, such as the sigmoid activation function [54]. The BN layer accelerates the training by normalizing the distribution of the data in each layer [54], and the dropout layer reduces overfitting of the neural network [55]. Further details about EntityDenseNet are found in the work of Yan et al. [50]. In this study, the optimal parameters for EntityDenseNet (epochs = 40, hidden nodes = 256, dropout rate = 0.3, learning rate = 0.001, batch size = 128, weight decay = 0.0001) were determined by parameter tuning.
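To make the layer ordering concrete, the sketch below runs one hidden block (fully connected, ReLU, BN, dropout) on a tiny batch whose categorical input is looked up in an embedding table. It is a minimal pure-Python forward pass only, not the authors' implementation: the layer sizes (4 hidden nodes rather than 256, a 2-dimensional month embedding), the function names, the choice of continuous inputs, the Glorot-style uniform weights, and the sample values are all assumptions for illustration, and the BN step omits the learned scale and shift.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot-style uniform weights, returned as fan_out rows of fan_in values."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def dense(x, weights, bias):
    """Fully connected layer applied to one sample."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift)."""
    n, width = len(batch), len(batch[0])
    out = [[0.0] * width for _ in range(n)]
    for j in range(width):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = (column[i] - mean) / math.sqrt(var + eps)
    return out


def dropout(x, rate, rng, training):
    """Inverted dropout: active only during training."""
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def hidden_block(batch, weights, bias, rate, rng, training):
    """Fully connected -> ReLU -> BN -> dropout, in the order listed in the text."""
    activated = [relu(dense(x, weights, bias)) for x in batch]
    normalized = batch_norm(activated)
    return [dropout(x, rate, rng, training) for x in normalized]


rng = random.Random(0)
month_embedding = xavier_uniform(2, 12, rng)  # 12 months -> 2-d vectors (assumed size)
weights = xavier_uniform(5, 4, rng)           # 5 inputs -> 4 hidden nodes (assumed size)
bias = [0.0] * 4

# Each sample: (month index, [AOT, FMF, RH]); the values are made up.
samples = [(0, [0.8, 0.6, 0.55]), (5, [0.3, 0.4, 0.70]), (11, [1.2, 0.7, 0.40])]
inputs = [month_embedding[m] + cont for m, cont in samples]
hidden = hidden_block(inputs, weights, bias, rate=0.3, rng=rng, training=False)
```

Concatenating the embedding vector with the continuous inputs is what lets a categorical variable such as month enter the dense layers as a learned numeric representation.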

Model Training and Validation
In this study, Himawari-8 and meteorological data for 2020 were used as modeling inputs. Ground-based PM2.5 data were provided on an hourly scale; therefore, the 10-min resolution satellite data were averaged to an hourly scale. In EntityDenseNet, we selected month, season, administrative divisions, and global climate zones (Figure S2d and Table S4) [56] as categorical variables, while AOT, FMF, digital elevation model (DEM), temperature, wind speed, RH, PBLH, longitude, and latitude were employed as continuous variables. In the four classical machine learning methods, all variables were directly used as input data.
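The hourly averaging of the 10-min satellite data can be sketched as follows. This is only an illustration under stated assumptions: scans are assigned to the hour in which they start, missing retrievals are represented by None and skipped, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean(samples):
    """Average (timestamp, value) pairs within each clock hour.

    Missing retrievals (value is None) are ignored; hours with no valid
    retrieval do not appear in the result.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is not None:
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical 10-min AOT retrievals for one pixel.
scans = [
    (datetime(2020, 2, 12, 2, 0), 0.50),
    (datetime(2020, 2, 12, 2, 10), 0.60),
    (datetime(2020, 2, 12, 2, 20), None),   # e.g., cloud-contaminated scan
    (datetime(2020, 2, 12, 2, 30), 0.70),
    (datetime(2020, 2, 12, 3, 0), 0.40),
]
hourly_aot = hourly_mean(scans)
```

The resulting dictionary is keyed by the start of each hour, which makes it straightforward to join with hourly ground-based PM2.5 records.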
To integrate the input data with site-based PM2.5 data, the meteorological data and DEM were interpolated to the same spatial resolution as the Himawari-8 AOT and FMF (0.05°). Then, the pixels of input data closest to each PM2.5 site were extracted and matched with the PM2.5 value at the same hour. The matched data were therefore used to retrieve hourly PM2.5 at a spatial resolution of 0.05°. Figure 3 shows the training process employed by all the models used in this study. The input data of 1650 stations were first separated into training, validation, and test data using the station-based method [57]. Specifically, we randomly selected input data from over 70% of the PM2.5 stations for training, from 20% of the PM2.5 stations for validation, and from 10% of the PM2.5 stations for testing. The training and validation data were then used to determine the hyperparameters of the models. EntityDenseNet was first initialized using the Xavier initialization scheme [58], which ensures that moderate weight values are used during the propagation of the network, and the hyperparameters were then determined using the validation data. The hyperparameters of the other machine learning models were directly determined using the validation data. Using these hyperparameters, the models were subsequently trained using the training data. Finally, the test dataset (41,836 samples) was used to evaluate the performance of the trained models. This station-based validation reflects the spatial performance of each model and has been used in previous studies, such as that of Wang et al. [56].
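A station-based split of the kind described above can be sketched as follows: stations, rather than individual samples, are shuffled and divided, so that every sample from a given station ends up in exactly one subset. The function names, the fixed seed, the rounding of subset sizes, and the nominal 70/20/10 fractions are assumptions for this illustration and may differ from the procedure actually used.

```python
import random


def station_based_split(station_ids, train_frac=0.7, val_frac=0.2, seed=0):
    """Randomly assign whole stations to training, validation, and test sets."""
    ids = sorted(set(station_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test


def assign_samples(samples, train, val, test):
    """Route each (station_id, record) pair to the subset owning its station."""
    subsets = {"train": [], "val": [], "test": []}
    for station_id, record in samples:
        if station_id in train:
            subsets["train"].append(record)
        elif station_id in val:
            subsets["val"].append(record)
        elif station_id in test:
            subsets["test"].append(record)
    return subsets


# Hypothetical station identifiers; 1650 matches the station count in the text.
stations = [f"S{i:04d}" for i in range(1650)]
train_ids, val_ids, test_ids = station_based_split(stations)

# Two made-up hourly records per station.
samples = [(s, {"station": s, "hour": h}) for s in stations for h in (0, 1)]
subsets = assign_samples(samples, train_ids, val_ids, test_ids)
```

Because no station contributes to more than one subset, the test score measures how well a model generalizes to locations it has never seen, which is the point of station-based validation.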

Linear and Non-Linear Relationships between fAOT and PM2.5
To explore the relationship between fAOT and PM2.5, data were collected from five AERONET stations (in Beijing, Baotou, Taihu, Xianghe, and Xuzhou) (Figure 4g), together with PM2.5 data from the monitoring stations located nearest to these AERONET stations in mainland China during 2015-2019. Using FMF, the fAOT was separated from the AOT (fAOT = AOT × FMF); this represents the AOT of fine mode particles with radii ranging from 0.439-0.992 µm. Both linear and non-linear relationships between fAOT and PM2.5 were investigated, and the results are shown in Figure 4. The linear correlations between fAOT and PM2.5 were statistically significant at the 95% confidence level for the five stations, and the R of fAOT and PM2.5 reached 0.48 on average. Of the five stations, Beijing showed the highest correlation between fAOT and PM2.5 (R = 0.60), while the other stations had correlation values ranging from 0.18 to 0.53. A significant linear relationship between fAOT and PM2.5 was also reported by Zhang and Li [59] (with an R of 0.88) on haze days, and by Yan et al. [16] (with an R of 0.74) in Xingtai during May 2016.
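The separation fAOT = AOT × FMF and the linear correlation analysis described above can be written compactly as below. This is a minimal sketch with hypothetical function names and made-up sample values; it computes the Pearson correlation coefficient and does not reproduce the significance testing or the station data used in the study.

```python
import math


def fine_mode_aot(aot, fmf):
    """fAOT = AOT x FMF: the part of AOT attributed to fine mode particles."""
    return aot * fmf


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up matched observations: AOT, FMF, and PM2.5 (ug/m3).
aot = [0.30, 0.55, 0.80, 1.20]
fmf = [0.40, 0.60, 0.70, 0.85]
pm25 = [18.0, 35.0, 62.0, 110.0]

faot = [fine_mode_aot(a, f) for a, f in zip(aot, fmf)]
r_aot = pearson_r(aot, pm25)
r_faot = pearson_r(faot, pm25)
```

Comparing `r_faot` with `r_aot` on real matched data is the kind of check that underlies the R values quoted in the text.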
Figure 3. Flowchart of the entire training process for both EntityDenseNet and the four traditional machine learning models. The distributions of meteorological data were interpolated by the Kriging method using averaged values in 2020 over meteorology stations (for temperature, relative humidity (RH), and wind speed) and radiosonde stations (for the boundary layer height (PBLH)) in China.

The generalized additive model (GAM) was used to test the non-linear relationships between fAOT and PM2.5, and the results showed that the non-linear relationships were significant at the 99% confidence level (Figure 4h-l). In Beijing and Baotou, fAOT positively influenced PM2.5 when fAOT was less than 1 and 0.8, respectively, whereas PM2.5 responded negatively to further increases in fAOT. This means that fine mode aerosols were not the only component in severe haze events in Beijing and Baotou, and the proportion of coarse mode aerosols was non-negligible [60]. The PM2.5 in Taihu and Xuzhou responded positively to fAOT when fAOT was less than approximately 1.7 and 2, respectively, but it was negatively correlated with a further increase in fAOT. This is partly related to the fact that although fine mode aerosols dominate local PM2.5 on severe pollution days, fine particles can absorb moisture and increase in size [61]. The dominant coarse-mode aerosols on severe pollution days caused this negative response between fAOT and PM2.5.
In Xianghe, there was a strong positive response between fAOT and PM2.5, even when fAOT was greater than 3, which indicated that PM2.5 in Xianghe consisted mainly of fine mode particles on both clean and heavy pollution days [62]. Figure 4 shows the relationship between fAOT and PM2.5, which indicates the importance of using aerosol size information when estimating PM2.5.
The performance of AE QA was superior to that of AE. In addition, V3.0 AE QA (RMSE = 0.35) outperformed V2.1 AE QA (R = 0.19, RMSE = 0.48). For FMF, the V3.0 FMF (R = 0.33, RMSE = 0.26) also showed an overall better retrieval than V2.1 FMF (R = 0.28, RMSE = 0.37). In addition, the significant underestimation and overestimation phenomena of V2.1 FMF were reduced in V3.0 FMF, which provided a more reliable FMF product. The site-based validation results are shown in Figures S3-S5, where it is evident that V3.0 AE performed better than V2.1 AE before and after quality control, and that FMF was generally improved over the sites. The validations suggest that using FMF with the Himawari-8 L2 V3.0 AE provides an overall improvement compared with using FMF with V2.1 AE.
Figure 7 shows the estimation of PM2.5 obtained from the deep learning model (EntityDenseNet) and the four classical machine learning methods (Extratree, Random Forest, LightGBM, and XGBoost) using Himawari-8 V2.1 and V3.0. To ensure consistency in PM2.5, we used the same training and test data for all the models. The results clearly show that for all models, PM2.5 based on both FMF and AOT (FMF&AOT-PM2.5) performed better than PM2.5 based only on AOT (AOT-PM2.5), which indicates the importance of using FMF to improve the estimation of PM2.5. EntityDenseNet exhibited the best improvement of all models: R² increased by 0.11 (0.03) and RMSE decreased by 1.82 (0.88) µg/m³ for V2.1 (V3.0) data.
MODIS daily Level-2 data (MOD04_L2) [36] and the Suomi National Polar-orbiting Partnership (SNPP) VIIRS L2 dark target products (AERDT_L2_VIIRS_SNPP) [36,67] also provide the FMF over land at spatial resolutions of 10 km and 6 km, respectively. In this study, we further used MODIS and VIIRS FMF for comparison. Table S5 evaluates these FMF products. Because MODIS and VIIRS are polar-orbiting satellites, they have fewer match-ups (N < 500) with AERONET data than Himawari-8 (N > 7000). Figure S6 shows the application of the four FMF products to estimate PM2.5 over China. We found that the PM2.5 estimated using MODIS and VIIRS FMF performed worse than that estimated using the Himawari-8 V3.0 FMF.

Previous studies have also shown improvements in PM2.5 estimation accuracy with the addition of FMF. For example, Choi et al. [68] used a global chemical transport model (GEOS-Chem) and reported a considerable improvement in the results of PM2.5, with regression slopes between estimated and observed PM2.5 closer to 1. The results in our study also showed that for FMF&AOT-PM2.5, the accuracy of V3.0 was better than that of V2.1. The most significant improvement was found in the EntityDenseNet modeling results, where the R² increased by 0.04 and the RMSE decreased by 1.18 µg/m³. The accuracy of the V3.0 FMF was superior to that of the V2.1 FMF (Figure 5); this implies that when the FMF is more accurate, superior PM2.5 estimation can be obtained. Zhao et al. [13] used MODIS FMF (mean error = 0.38) and the fused FMF, which showed a superior accuracy (mean error = 0.13), to estimate PM2.5, and the PM2.5 estimated from the fused FMF (mean error = 41.8 µg/m³) was superior to that of the MODIS FMF (mean error = 45.3 µg/m³). Our results are consistent with those of earlier studies.

Application of FMF for Conducting PM2.5 Estimations in China
As the EntityDenseNet model and Himawari-8 L2 V3.0 data provided the best results, we used them to retrieve PM 2.5 over China in 2020 to verify their potential for providing superior PM 2.5 estimations. The resulting estimations, AOT&FMF-PM 2.5 and AOT-PM 2.5, were then compared with ground-level PM 2.5 values. The results showed that AOT&FMF-PM 2.5 provided significantly better accuracy than AOT-PM 2.5 on dust and haze days. Figure 8 shows the regional performances of AOT&FMF-PM 2.5 and AOT-PM 2.5 over China on typical dust and haze days. On 12 February 2020, according to the true color image (Figure 8c), haze covered the Beijing-Tianjin-Hebei (BTH) region, and the PM 2.5 concentration there was significantly higher (>85 µg/m 3) than in other regions of China (Figure 8a,b). AOT&FMF-PM 2.5 showed close agreement with ground-based PM 2.5 (Figure 8d) and thus captured the high PM 2.5 concentration within the central BTH region. In contrast, AOT-PM 2.5 (Figure 8e) underestimated the PM 2.5 concentration in the main urban area of Beijing, where the ground-based PM 2.5 exceeded 205 µg/m 3. In addition, AOT-PM 2.5 overestimated PM 2.5 in southeastern Tianjin, where the ground-based measurement was less than 85 µg/m 3. The difference between the modeling results (Figure 8f) shows that, compared with AOT&FMF-PM 2.5, AOT-PM 2.5 tended to overestimate PM 2.5 in areas with lower pollution levels (PM 2.5 < 85 µg/m 3), particularly in the southeastern part of the BTH region, where the overestimation exceeded 20 µg/m 3, but it also underestimated PM 2.5 in seriously polluted areas (PM 2.5 > 125 µg/m 3) by over 10 µg/m 3.
Figure 8i shows that the BTH region experienced a dust day on 3 June 2020, when PM 2.5 concentrations were moderate nationwide (<35 µg/m 3) (Figure 8g,h). Compared with ground-level PM 2.5, AOT&FMF-PM 2.5 (Figure 8j) corresponded well with the low PM 2.5 levels (<25 µg/m 3) in the northern BTH region and the high PM 2.5 (>45 µg/m 3) in southeastern Tianjin, while AOT-PM 2.5 (Figure 8k) clearly underestimated PM 2.5, with a PM 2.5 value of less than 45 µg/m 3. Moreover, the difference in the modeling results (Figure 8l) shows that, compared with AOT&FMF-PM 2.5, AOT-PM 2.5 significantly overestimated PM 2.5 in the northern BTH region, the main urban area of Beijing, and southeastern Tianjin by over 5 µg/m 3. In general, Figure 8 reveals that, although FMF had little influence on the overall distribution of PM 2.5 estimations in China, it significantly improved the estimation of PM 2.5, and the differences between AOT&FMF-PM 2.5 and AOT-PM 2.5 are evident on a regional scale.
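The difference panels discussed above (Figure 8f,l) can be summarized numerically by grouping the estimate difference by ground-level pollution level. The following is a minimal sketch of that idea, not the processing used in this study: the function name, the three-level grouping, and the sample values are hypothetical, and only the 85 and 125 µg/m 3 thresholds are taken from the text.

```python
def mean_difference_by_level(ground, est_aot_fmf, est_aot, low=85.0, high=125.0):
    """Mean (AOT-only minus AOT&FMF) PM2.5 difference per pollution level.

    ground, est_aot_fmf, est_aot: equal-length sequences in ug/m3.
    A positive mean indicates that the AOT-only estimate is higher than
    the AOT&FMF estimate at that pollution level.
    """
    bins = {"low": [], "mid": [], "high": []}
    for g, with_fmf, aot_only in zip(ground, est_aot_fmf, est_aot):
        if g < low:
            key = "low"
        elif g > high:
            key = "high"
        else:
            key = "mid"
        bins[key].append(aot_only - with_fmf)
    # Levels without samples are reported as None rather than 0.
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}


# Invented values for illustration only (not data from this study).
ground = [40, 60, 100, 150, 210]
est_aot_fmf = [42, 58, 98, 148, 200]
est_aot = [60, 80, 100, 135, 185]
summary = mean_difference_by_level(ground, est_aot_fmf, est_aot)
print(summary)
```

Grouping by the ground-level value rather than by either estimate keeps the levels independent of the two models being compared.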
According to Figure 9a,b, AOT&FMF-PM 2.5 provided R 2 and RMSE values of 0.60 and 11.67 µg/m 3, respectively, on dust days, which were better than the results of AOT-PM 2.5 (R 2 = 0.57, RMSE = 13.05 µg/m 3). In addition, the overall accuracy of AOT&FMF-PM 2.5 (R 2 = 0.62, RMSE = 37.61 µg/m 3) on haze days also exceeded that of AOT-PM 2.5 (R 2 = 0.59, RMSE = 39.01 µg/m 3) (Figure 9c,d). This improvement can be attributed to the stronger correlation between fAOT and surface PM 2.5. As shown by the linear relationships in Figure 10, fAOT calculated from Himawari-8 FMF and AOT had a higher correlation with PM 2.5 (dust days: R = 0.77; haze days: R = 0.52) than Himawari-8 AOT (dust days: R = 0.61; haze days: R = 0.48) under both conditions. Moreover, in Figure 11, AERONET fAOT (dust days: R = 0.82; haze days: R = 0.56) also showed a better correlation with PM 2.5 than AERONET AOT (dust days: R = 0.72; haze days: R = 0.52) on both dust and haze days from 2015 to 2019. According to the number of haze and dust days occurring every year from 2015 to 2019 (Figure 11c), Beijing experienced frequent haze and dust days, especially in 2015, when 90 haze days were recorded. As haze days are dominated by fine mode aerosols and very high PM 2.5 concentrations, fAOT becomes more important than AOT for estimating PM 2.5. These results therefore indicate that fAOT is more suitable than AOT for estimating PM 2.5, which agrees with the results of previous studies, such as those of Di Nicolantonio et al. [69] and Yan et al. [16]. In addition, on haze days, Zhang and Li [59] found that the R 2 of PM 2.5 with fAOT was higher than that with AOT (0.69), which indicates that dominant fine mode aerosols on haze days result in a close association between fAOT and AOT, which makes the estimation of PM 2.5 more accurate when using FMF.
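The comparison above rests on three quantities: the fine mode AOT (fAOT = AOT × FMF), the Pearson correlation coefficient R, and the RMSE. A minimal sketch of how they can be computed is given below; the function names and sample values are hypothetical and chosen only for illustration, and the code is not the evaluation pipeline used in this study.

```python
import math


def fine_mode_aot(aot, fmf):
    """Fine mode AOT per sample: fAOT = AOT * FMF."""
    return [a * f for a, f in zip(aot, fmf)]


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, estimated):
    """Root mean square error between observations and estimates."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Invented values for illustration only (not data from this study).
aot = [0.2, 0.5, 0.8, 1.0]
fmf = [0.9, 0.4, 0.8, 0.3]
pm25 = [20.0, 22.0, 70.0, 35.0]  # ug/m3

faot = fine_mode_aot(aot, fmf)
r_aot = pearson_r(aot, pm25)
r_faot = pearson_r(faot, pm25)
print(f"R(AOT, PM2.5) = {r_aot:.2f}, R(fAOT, PM2.5) = {r_faot:.2f}")
```

In this toy sample the two high-AOT, low-FMF cases are coarse mode dominated, so weighting AOT by FMF brings them back in line with the lower PM 2.5 values, which is the mechanism the text attributes the improvement to.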

Discussion
Satellite data have been used with various models to estimate ground-level PM 2.5 by building a relationship between PM 2.5 and AOT ( Figure 1). However, as shown in Figure 10, the correlations between PM 2.5 and AOT are mediocre because they lack identical size information, and this difference in aerosol size information affects the correlations between PM 2.5 and AOT. For example, when coarse mode aerosols are dominant, the relationships between PM 2.5 and AOT are weakened compared to when fine mode aerosols dominate [28]. Yang et al. [70] reported that, although all the cities in the BTH region are located in urban agglomerations, the aerosols in the region mainly consist of fine particles, such as sulfates and nitrates. This means that the correlations between PM 2.5 and AOT are higher than in cities within the Pearl River Delta (PRD) region, where aerosols comprise coarse particles, such as sea salt. In the BTH region, the proportions of fine particles vary among seasons, with higher numbers of fine mode aerosols during winter and higher numbers of coarse mode aerosols during spring; therefore, the relationship between PM 2.5 and AOT is better in winter [71]. As PM 2.5 is mainly composed of fine particles, separating the fine mode information from the total AOT helps to build a better and closer relationship with PM 2.5 .
In this study, we found that fAOT, separated from total AOT using FMF, had significant linear and non-linear correlations with PM 2.5 (Figure 4). This finding is consistent with a previous study, which showed that fAOT (R = 0.74) had a better correlation with PM 2.5 than AOT (R = 0.49) [16]. In addition, Wei et al. [24] found that the distribution of fAOT over China was closer to that of PM 2.5 than the distribution of AOT was, which indicates the superiority of the relationship between fAOT and PM 2.5.
FMF has already been introduced in statistical models for estimating PM 2.5; for example, She et al. [72] employed a linear mixed effect model that incorporates FMF to estimate PM 2.5 in the Yangtze River Delta, China. However, FMF has rarely been applied in machine learning models. In this study, we used the deep learning model EntityDenseNet and four traditional machine learning methods to apply FMF in estimating PM 2.5 over China. The results showed that, compared with using only AOT, estimations of PM 2.5 with FMF provided higher R 2 and lower RMSE values for all the models (Figure 7). This indicates that aerosol size information helps to provide more accurate PM 2.5 estimations than those obtained previously. Zhang and Li [59] also reported that using fAOT provides better PM 2.5 estimates than AOT during haze days, with a reduction in the RMSE from 61 to 53 µg/m 3. Spatially, on haze days (Figure 8), AOT-PM 2.5 showed an obvious overestimation in the southeastern BTH region (>20 µg/m 3) and an underestimation in the southern BTH region (>10 µg/m 3). On dust days, AOT-PM 2.5 provided a large overestimation in the main urban area of Beijing (of over 5 µg/m 3). The superior accuracy of AOT&FMF-PM 2.5 is partly due to the better correlation of PM 2.5 with fAOT than with AOT (Figures 10 and 11). Therefore, using FMF in models yields different distributions of PM 2.5 estimations compared to using only AOT. Owing to the high spatial and temporal resolutions of satellites such as Himawari-8, improvements in PM 2.5 estimation will yield more accurate fine-scale spatial distributions and variations of PM 2.5. This can enhance our understanding of the changes and transportation of fine mode pollutants.
In addition, Figure 7 shows that the FMF&AOT-PM 2.5 by Himawari-8 V3.0 outperforms the FMF&AOT-PM 2.5 by Himawari-8 V2.1, which indicates that a more accurate FMF product is required to provide superior estimations of PM 2.5. Although Figure 5 provides evidence of the improvements in V3.0 FMF compared to V2.1, V3.0 FMF still has a low correlation (R = 0.33) and a large RMSE of 0.26, which hinders the application of FMF to PM 2.5 estimations. Some studies have proposed modified satellite-based retrievals to improve the accuracy of FMF. For example, Yan et al. [38] applied an improved LUT-SDA for FMF retrieval, which resulted in an RMSE of 0.168, and Chen et al. [73] used deep learning to improve the accuracy of FMF and achieved an RMSE of 0.157. However, FMF products with a high spatial resolution on a global scale are currently rare. Although Yan et al. [74] generated a 10-year FMF product on a global scale, it has a spatial resolution of 1° × 1°, which cannot fulfill the needs of estimating PM 2.5 at a high spatial resolution. Therefore, the generation of a highly accurate FMF product with a high spatial resolution is urgently required.

Conclusions
This study investigated the linear and non-linear relationships between fAOT and PM 2.5 over five AERONET stations in China (Beijing, Baotou, Taihu, Xianghe, and Xuzhou) using AERONET fAOT and ground-level PM 2.5 data. The linear relationships were significant over the stations (with an overall R of 0.45), and the non-linear relationships between fAOT and PM 2.5 were significant for all stations in China. Himawari-8 aerosol product data and meteorological data from 2020 were then used to estimate PM 2.5 with a deep learning method (EntityDenseNet) and four traditional machine learning methods (Extratree, RF, XGBoost, and LightGBM). The PM 2.5 estimations were found to be more accurate when adding FMF as input data (FMF&AOT-PM 2.5) than when using only AOT (AOT-PM 2.5). Compared with the FMF&AOT-PM 2.5 by V2.1 FMF, the FMF&AOT-PM 2.5 by V3.0 FMF showed further improvement. These results indicate that when FMF is more accurate, superior PM 2.5 estimations can be obtained. When both FMF&AOT-PM 2.5 and AOT-PM 2.5 were applied over China for 2020, the agreement between FMF&AOT-PM 2.5 and ground-level PM 2.5 was clearly closer than that of AOT-PM 2.5 on both dust and haze days. These results are related in part to the better correlation of PM 2.5 with fAOT (dust days: R = 0.82; haze days: R = 0.56) than with AOT (dust days: R = 0.72; haze days: R = 0.52).
This study demonstrates that incorporating FMF can effectively improve the ability of machine learning models to accurately estimate PM 2.5 from satellite data. However, an FMF product with higher accuracy and higher spatial and temporal resolution is required to further improve PM 2.5 estimation.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13142779/s1. Figure S1 Figure S6. Density scatter plots of modeling results of ground-based PM 2.5 retrieved by four machine learning models (Extratree, Random Forest, LightGBM and XGBoost) based on four different FMF products (Himawari-8 V2.1 and V3.0 FMF, MODIS and VIIRS FMF) with the same lengths for training and test datasets. The black and red lines represent 1:1 and fitting lines, respectively. Because the amount of training data is small (N = 1321), which is unsuitable for applying deep learning models, EntityDenseNet was not used for PM 2.5 estimations here. Table S1. Previous studies of machine learning retrieved PM 2.5 since 2014. Table S2. AERONET stations and their data level used in this study. Table S3. The parameters for the four machine learning models used in this study. Table S4. The class types and their abbreviations of global climate zone. Table S5
Data Availability Statement: The Himawari-8 data were collected from the Himawari Monitor and the P-Tree system (ftp.ptree.jaxa.jp, accessed on 4 June 2021). The AERONET data are available at https://aeronet.gsfc.nasa.gov/ (accessed on 4 June 2021). The radiosonde data are from NOAA's Integrated Global Radiosonde Archive (https://www.ncdc.noaa.gov/data-access/weatherballoon/integrated-global-radiosonde-archive, accessed on 4 June 2021). The in-situ meteorological data are from NOAA's National Centers for Environmental Information (https://www.ncei.noaa.gov/products/integrated-surface-database, accessed on 4 June 2021). The in-situ PM 2.5 data are available from the China National Environmental Monitoring Center (http://www.cnemc.cn, accessed on 4 June 2021).