Hourly PM2.5 Estimation over Central and Eastern China Based on Himawari-8 Data

Abstract: In this study, an improved geographically and temporally weighted regression (IGTWR) model for the estimation of hourly PM2.5 concentrations was applied over central and eastern China in 2017, based on Himawari-8 Advanced Himawari Imager (AHI) data. A generalized distance based on the longitude, latitude, day, hour, and land use type was constructed. AHI aerosol optical depth (AOD), surface relative humidity (RH), and boundary layer height (BLH) data were used as independent variables to retrieve the hourly PM2.5 concentrations at 1:00, 2:00, 3:00, 4:00, 5:00, 6:00, 7:00, and 8:00 UTC (Coordinated Universal Time). The model fitting and cross-validation performance were satisfactory. For the model fitting set, the coefficient of determination (R²) between the measured and predicted PM2.5 concentrations was 0.886, and the root-mean-square error (RMSE) of 437,642 samples was only 12.18 μg/m³. The tenfold cross-validation results of the regression model were also acceptable; the R² of the measured and predicted results was 0.784, and the RMSE was 20.104 μg/m³, which is only about 8 μg/m³ higher than that of the model fitting set. The spatial and temporal characteristics of the hourly PM2.5 concentration in 2017 were revealed. The model also achieved stable performance under haze and dust conditions.


Introduction
Real-time monitoring of ground-level fine particulate matter (PM2.5) concentrations is essential for the early warning of extreme weather events such as dust and haze, and the prediction of health exposure risks [1]. Satellite remote sensing with a large spatial coverage and continuous observations can address the limitations of ground-based monitoring [2-5] and provide comprehensive and reliable PM2.5 monitoring. Daily monitoring by polar-orbiting satellites cannot meet the requirements for real-time continuous monitoring. By contrast, the geostationary satellite Himawari-8 can provide high-frequency observations with broad coverage. Several aerosol retrieval methods based on Himawari-8 data have been developed [6-8]. Using an optimal estimation method, She et al. [9]

Figure 1. Land use types in China in 2015 [27]. The land use type numbers refer to the Land Use and Cover Change (LUCC) classification system, as shown in Table 1. The red boundary line represents the study area.

PM2.5 Data
The in situ ground-level PM2.5 concentrations are provided by the National Real-time Air Quality Publishing Platform (http://106.37.208.233:20035/). We collected hourly PM2.5 data for 1385 sites in the study area in 2017. The stations are relatively dense in urban areas, especially in the North China Plain, Northeast Plain, Yangtze River Delta, Guanzhong Plain, Pearl River Delta, and Sichuan Basin. The ground-based PM2.5 observation network provides sufficient data to support the construction of the PM2.5 remote sensing estimation model.

Land Use Type
Notably, the land use types in adjacent areas may be different. For example, there are two different types of cultivated land, paddy fields and dry land, in eastern China. The underlying surface inevitably affects the relationship between the AOD and PM2.5. This study mainly focuses on the effects of geographic distance, the type of underlying surface, the observation time, and other factors. The data on land use types in China in 2015 were provided by the Resource and Environment Data Cloud Platform (http://www.resdc.cn/) at a resolution of 10 km. These data were obtained via remote sensing interpretation methods, with human-computer interactions and visual interpretations based on Landsat 8 data [27]. Figure 1 shows the distribution of land use types in China in 2015. Table 1 describes the Land Use and Cover Change (LUCC) classification system. The land use types show the distribution of arable land, woodland, grassland, and desert from the east to the inland northwestern area.

Surface Relative Humidity
The surface relative humidity in central and eastern China varies greatly between 9:00 and 16:00 Beijing time. Atmospheric moisture plays an important role in the formation of secondary pollutants. Therefore, the humidity must be considered in PM2.5 estimations. The surface relative humidity data used in this study are derived from Global Surface Summary of the Day observation data (https://catalog.data.gov/dataset/global-surface-summary-of-the-day-gsod). The temporal resolution is 30 min or 3 h depending on the site. First, we performed the temporal interpolation for each site. For each of the eight hours from 9:00 to 16:00 Beijing time at a site, the closest relative humidity observations before and after that hour were extracted, and the relative humidity at that hour was obtained by linear interpolation between these two observations. Then, we performed the spatial interpolation for each hour: the surface relative humidity field was obtained via inverse distance-weighted interpolation between all the observation sites.
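The spatial step described above can be sketched as follows. This is a minimal illustration of inverse distance-weighted interpolation, not the authors' implementation: the function name `idw_interpolate`, the power of 2, and the use of planar distances in degrees are all assumptions, since the text does not state them.

```python
import numpy as np

def idw_interpolate(site_xy, site_values, target_xy, power=2.0, eps=1e-12):
    """Inverse distance-weighted interpolation of site values onto target points.

    site_xy:     (n_sites, 2) longitude/latitude of the observation sites
    site_values: (n_sites,)   relative humidity at one hour
    target_xy:   (n_targets, 2) points where a value is wanted
    power:       distance exponent (assumed to be 2; not given in the text)
    """
    site_xy = np.asarray(site_xy, dtype=float)
    site_values = np.asarray(site_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)

    # Pairwise planar distances, shape (n_targets, n_sites).
    diff = target_xy[:, None, :] - site_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Clamp tiny distances so a target that coincides with a site
    # gets an overwhelmingly large weight instead of a division by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * site_values).sum(axis=1) / weights.sum(axis=1)
```

A target point that lies halfway between two sites receives the mean of their values, and a target that coincides with a site reproduces that site's value.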

Boundary Layer Height
The boundary layer height has a notable effect on the atmospheric particulate matter concentrations [28]. The planetary boundary layer height used in this study was obtained from the ERA-2000 dataset released by the European Centre for Medium-Range Weather Forecasts (ECMWF), with a time resolution of 3 h. The BLH at each site was taken from the ECMWF pixel in which the site is located. The intraday hourly BLH at each site was obtained by linear interpolation between the closest data before and after each hour. The intraday hourly BLH over central and eastern China was then obtained via inverse distance-weighted interpolation between all the observation sites.
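The temporal step can be illustrated with a short sketch. The 3-hourly BLH values below are invented for illustration only, and the helper name `to_hourly` is hypothetical; the only technique used is piecewise-linear interpolation between the observations that bracket each target hour.

```python
import numpy as np

def to_hourly(obs_hours_utc, obs_values, target_hours_utc):
    """Linearly interpolate between the observations bracketing each target hour."""
    return np.interp(target_hours_utc, obs_hours_utc, obs_values)

# Illustrative 3-hourly BLH values in metres (not real data).
blh_hours = np.array([0.0, 3.0, 6.0, 9.0])
blh_3h = np.array([300.0, 900.0, 1500.0, 1200.0])

# The eight retrieval hours used in this study, 1:00 to 8:00 UTC.
target_hours = np.arange(1, 9)
blh_hourly = to_hourly(blh_hours, blh_3h, target_hours)
```

The same helper applies to the relative humidity series, whose observation times are irregular (30 min or 3 h depending on the site), because `np.interp` only requires the observation times to be increasing.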

Improved Geographically and Temporally Weighted Regression Model
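The geographic data are affected by both spatial and temporal factors. The geographically and temporally weighted regression model (GTWR) has been developed to solve the temporal and spatial problems (Equation (1)) [29]:

$$Y_i = \beta_0(u_i, v_i, t_i) + \sum_{k=1}^{d} \beta_k(u_i, v_i, t_i)\, X_{ik} + \varepsilon_i \tag{1}$$

In this function, i denotes the observation point (i = 1, 2, ..., n, where n is the total number of observation points), k is a modelling parameter (k = 1, 2, ..., d, where d is the total number of parameters), $Y_i$ is the estimated parameter, $X_{ik}$ is the independent variable representing the value of k at point i, and $\beta_k$ is the coefficient; $u_i$, $v_i$, and $t_i$ denote the longitude, latitude, and date, respectively, and $\varepsilon_i$ represents the random error. However, the estimation of surface level PM2.5 is not only affected by spatial-temporal factors, but also influenced by the underlying surface conditions. In this study, an improved geographically and temporally weighted regression (IGTWR) model was thereby established (Equation (2)). The parameters used in this model are AOD, RH, and BLH. The spatial (longitude and latitude), temporal (day and hour), and underlying surface data (land use data) are used to characterize the relationship between different sample sites.

$$Y_i = \beta_0(u_i, v_i, t_i, h_i, l_i) + \sum_{k=1}^{d} \beta_k(u_i, v_i, t_i, h_i, l_i)\, X_{ik} + \varepsilon_i \tag{2}$$

$$\mathrm{PM}_{2.5,i} = \beta_0(u_i, v_i, t_i, h_i, l_i) + \beta_1(u_i, v_i, t_i, h_i, l_i)\,\mathrm{AOD}_i + \beta_2(u_i, v_i, t_i, h_i, l_i)\,\mathrm{RH}_i + \beta_3(u_i, v_i, t_i, h_i, l_i)\,\mathrm{BLH}_i + \varepsilon_i \tag{3}$$

In Equations (2) and (3), $u_i$, $v_i$, $t_i$, $h_i$, and $l_i$ represent longitude, latitude, day, hour, and land use type, respectively. d refers to the number of independent variables (Himawari-8 AOD, relative humidity, and boundary layer height) in this model, and d = 3; i denotes the observation point, i = 1, 2, ..., n, where n is the total number of observation points.

In this study, in order to calculate the weights of different sites in the regression process, the improved geographical and temporal weights $w_i(u, v, t, h, l)$ were introduced into the model. The weights were calculated using the longitude, latitude, day, hour, and land use type data for the 1385 observation stations in the study area. The Himawari-8 AOD, relative humidity, and boundary layer height data were used as independent variables to fit the concentration of PM2.5 over the study area. The objective function f is given by

$$f = \sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \sum_{k=1}^{d} \beta_k X_{ik} \right)^2 \tag{4}$$

where the weights $w_i$ and the coefficients $\beta_k$ are all evaluated at $(u, v, t, h, l)$. The regression objective is to find a set of coefficients that can minimize the objective function f. Assuming that a set of coefficients $\hat{\beta}_k(u, v, t, h, l)$ can minimize the objective function according to the least-squares principle:

$$f(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d) = \min_{\beta_0, \ldots, \beta_d} f(\beta_0, \beta_1, \ldots, \beta_d) \tag{5}$$

The partial derivative of the objective function f (Equation (4)) is computed, and the partial derivative with respect to $\beta_r(u, v, t, h, l)$, r = 0, 1, …, d, is then set equal to 0, as follows (with $X_{i0} = 1$):

$$\frac{\partial f}{\partial \beta_r} = -2 \sum_{i=1}^{n} w_i \left( Y_i - \sum_{k=0}^{d} \hat{\beta}_k X_{ik} \right) X_{ir} = 0 \tag{6}$$

Equation (6) can be written as the following equation (Equation (7)):

$$\sum_{i=1}^{n} w_i X_{ir} \sum_{k=0}^{d} \hat{\beta}_k X_{ik} = \sum_{i=1}^{n} w_i X_{ir} Y_i, \quad r = 0, 1, \ldots, d \tag{7}$$

The next step is to solve the above equations (Equation (7)). Let W, Y, and X represent the following matrices:

$$W = \mathrm{diag}(w_1, w_2, \ldots, w_n) \tag{8}$$

$$Y = (Y_1, Y_2, \ldots, Y_n)^{T} \tag{9}$$

$$X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1d} \\ 1 & X_{21} & \cdots & X_{2d} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{nd} \end{pmatrix} \tag{10}$$

Then, Equation (7) can be expressed as follows:

$$X^{T} W X \hat{\beta} = X^{T} W Y \tag{11}$$

The estimated value, $\hat{\beta}(u, v, t, h, l)$ (Equation (13)), of $\beta(u, v, t, h, l)$ at $(u, v, t, h, l)$ can be obtained by solving Equation (11) as follows (Equation (12)):

$$\hat{\beta}(u, v, t, h, l) = (X^{T} W X)^{-1} X^{T} W Y \tag{12}$$

$$\hat{\beta}(u, v, t, h, l) = \left( \hat{\beta}_0(u, v, t, h, l), \hat{\beta}_1(u, v, t, h, l), \ldots, \hat{\beta}_d(u, v, t, h, l) \right)^{T} \tag{13}$$

The estimated PM2.5 concentration Y at $(u, v, t, h, l)$ is as follows:

$$\hat{Y}(u, v, t, h, l) = \hat{\beta}_0(u, v, t, h, l) + \sum_{k=1}^{d} \hat{\beta}_k(u, v, t, h, l)\, X_k \tag{14}$$

Therefore, the PM2.5 concentration at $(u, v, t, h, l)$ is as follows:

$$\mathrm{PM}_{2.5}(u, v, t, h, l) = x\, \hat{\beta}(u, v, t, h, l) \tag{15}$$

and

$$x = (1, \mathrm{AOD}, \mathrm{RH}, \mathrm{BLH}) \tag{16}$$

The selection of the weight coefficient $w_i(u, v, t, h, l)$ is particularly important for the regression model. The methods of calculating and selecting the weight coefficient $w_i(u, v, t, h, l)$ are described below. The distance between the estimated point $(u_i, v_i, t_i, h_i, l_i)$ and any other observed sample $(u_j, v_j, t_j, h_j, l_j)$ is defined as

$$D_{ij}^2 = (u_i - u_j)^2 + (v_i - v_j)^2 + \lambda (t_i - t_j)^2 + \mu (h_i - h_j)^2 + \nu (l_i - l_j)^2 \tag{17}$$

Here, $\lambda$, $\mu$, and $\nu$ are the weight coefficients. The weight function $w_{ij}$ is expressed as a Gaussian function:

$$w_{ij} = \exp\left( -\frac{D_{ij}^2}{h_S^2} \right) = \exp\left\{ -\left[ \frac{(u_i - u_j)^2 + (v_i - v_j)^2}{h_S^2} + \frac{(t_i - t_j)^2}{h_T^2} + \frac{(h_i - h_j)^2}{h_H^2} + \frac{(l_i - l_j)^2}{h_L^2} \right] \right\} = w_S\, w_T\, w_H\, w_L \tag{18}$$

Here, $h_S$, $h_T$, $h_H$, and $h_L$ are the bandwidths corresponding to the spatial (longitude and latitude), temporal (day and hour), and land use bandwidths, with $h_T^2 = h_S^2/\lambda$, $h_H^2 = h_S^2/\mu$, and $h_L^2 = h_S^2/\nu$, and $w_S$, $w_T$, $w_H$, and $w_L$ are the corresponding weights. In this study, the cross-validation method [30] is used to select the optimal bandwidth, that is, the bandwidth that provides the minimum residual:

$$CV(h) = \sum_{i=1}^{n} \left[ Y_i - \hat{Y}_{\neq i}(h) \right]^2 \tag{19}$$

where $\hat{Y}_{\neq i}(h)$ is the value predicted at point i when sample i is left out of the fit.

In this study, we used spatial, temporal, and underlying surface parameters to calculate the weight of different samples in the regression model. The hourly AHI AOD, BLH, and RH were used as independent variables to estimate the hourly surface level PM2.5 over central and eastern China. The tenfold cross-validation was used as the PM2.5 validation method. The model performance in the haze and dust event has also been evaluated and analyzed (Figure 2).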
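The local fit described in this section amounts to a weighted least-squares regression that is re-solved at every estimation point. The sketch below shows one way to write it under stated assumptions: the weight is a product of Gaussian kernels in space, day, hour, and land use, the land use type is treated as a numeric code, and the function names and bandwidth values are hypothetical rather than taken from the study.

```python
import numpy as np

def gaussian_weights(target, samples, bandwidths):
    """Gaussian weight of every sample relative to one estimation point.

    target:     (5,)   longitude, latitude, day, hour, land use code
    samples:    (n, 5) the same five attributes for each training sample
    bandwidths: (h_s, h_t, h_h, h_l) for space, day, hour, and land use
                (illustrative values; the study selects them by cross-validation)
    """
    h_s, h_t, h_h, h_l = bandwidths
    d = np.asarray(samples, dtype=float) - np.asarray(target, dtype=float)
    exponent = (
        (d[:, 0] ** 2 + d[:, 1] ** 2) / h_s ** 2   # spatial term
        + d[:, 2] ** 2 / h_t ** 2                  # day term
        + d[:, 3] ** 2 / h_h ** 2                  # hour term
        + d[:, 4] ** 2 / h_l ** 2                  # land use term (numeric code, an assumption)
    )
    return np.exp(-exponent)

def local_fit(target, samples, X, y, bandwidths):
    """Solve (X^T W X) beta = X^T W y for the coefficients at one estimation point."""
    w = gaussian_weights(target, samples, bandwidths)
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept column
    Aw = A * w[:, None]                         # W A without forming the n x n matrix W
    return np.linalg.solve(A.T @ Aw, Aw.T @ y)

def local_predict(beta, x):
    """PM2.5 estimate from predictors x = (AOD, RH, BLH) and local coefficients beta."""
    return beta[0] + np.asarray(x, dtype=float) @ beta[1:]
```

Multiplying the rows of the design matrix by the weights avoids building the full diagonal matrix W, which would be impractical for several hundred thousand samples. In practice one would also restrict each fit to samples with non-negligible weight.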

Performance Evaluation of the IGTWR Model
In this study, the temporal, geographical, and underlying surface-weighted regression model was used to retrieve the ground-level PM2.5 for 8 h per day during the daytime over central and eastern China in 2017. The hourly Himawari-8 AOD (1:00, 2:00, 3:00, 4:00, 5:00, 6:00, 7:00, and 8:00 UTC, which correspond to 9:00, 10:00, 11:00, 12:00, 13:00, 14:00, 15:00, and 16:00 Beijing time), surface relative humidity, and boundary layer height were selected as input parameters. The longitude, latitude, day, hour, and land use type were used to estimate the regression weight, and 437,642 samples were successfully matched and used as the modelling set to construct the IGTWR model. The model fitting performance is shown in Figure 3 (left). The tenfold cross-validation method was used to evaluate the model (Rodriguez et al. 2010). We split the 437,642 matched data points evenly into ten parts. One part was used for validation, and the remaining nine parts were used for training. This process was repeated so that each part served once as the validation set. The cross-validation results are shown in Figure 3 (right). For the 437,642 samples, the R² value between the predicted and observed PM2.5 is 0.886, and the root-mean-square error (RMSE) of the model fit is only 12.180 μg/m³. In the scatterplot of the measured values versus the predicted values, the points are distributed around the 1:1 line. The R² of the cross-validation is 0.784, and the RMSE is 20.104 μg/m³, which is only about 8 μg/m³ higher than the RMSE of the model fit. The slope of the fitting line of the measured-predicted PM2.5 scatter plot is 0.747, which indicates that high PM2.5 concentrations were underestimated. However, for most sample points below 200 μg/m³, this underestimation is not obvious. The accuracy, reliability, and robustness of the PM2.5 estimation under typical conditions are relatively high.
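The tenfold procedure and the two reported metrics can be sketched as follows. This is an illustration under assumptions: an ordinary least-squares model stands in for the IGTWR fit to keep the example short, R² is computed as one minus the ratio of residual to total sum of squares (the text does not say which definition was used), and all function names are hypothetical.

```python
import numpy as np

def r2_rmse(observed, predicted):
    """Coefficient of determination and root-mean-square error."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = ((observed - predicted) ** 2).sum()
    ss_tot = ((observed - observed.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot, np.sqrt(((observed - predicted) ** 2).mean())

def kfold_predict(X, y, fit, predict, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    predictions = np.empty(len(y))
    for fold in np.array_split(idx, k):       # k nearly equal parts
        train = np.setdiff1d(idx, fold)       # the other k-1 parts
        model = fit(X[train], y[train])
        predictions[fold] = predict(model, X[fold])
    return predictions

# Stand-in model (ordinary least squares), used here instead of IGTWR.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]
```

Calling `r2_rmse(y, kfold_predict(X, y, ols_fit, ols_predict))` gives the cross-validation R² and RMSE, while `r2_rmse(y, ols_predict(ols_fit(X, y), X))` gives the model-fitting values; the gap between the two indicates how much the model overfits.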

Hourly PM2.5 Concentration in Central and Eastern China
The hourly spatial distributions of PM2.5 in the central and eastern parts of China are shown in Figure 4. The distribution of the 1385 sites is also shown in Figure 4 (second column from left), together with the annual PM2.5 concentrations based on observational data. The annual average PM2.5 concentration is below 100 μg/m³ for most of the study area. The spatial and temporal patterns of the estimated PM2.5 concentrations are in agreement with the observed patterns. Spatially, severe atmospheric fine particulate matter pollution occurred mainly in the North China Plain, the Guanzhong Plain, the Fenwei Plain, the Yangtze River Delta, South China, and the Sichuan Basin. The fine particulate matter pollution in the North China Plain was extreme. An area of relatively stable and severe PM2.5 pollution formed in southern Hebei Province and northern Henan Province at the eastern foot of the Taihang Mountains owing to the effect of the mountains. Notably, the PM2.5 concentration in Beijing is even lower than that in the Yangtze River Delta and Pearl River Delta, which may be related to the air pollution control measures in Beijing. The Atmospheric Environment Meteorological Bulletin also showed that the air quality in Beijing, Tianjin, and Hebei improved continuously in 2017, and the annual number of haze pollution days dropped by 18.1. In southern China, for example, in Guangxi, the annual average concentration of fine particulate matter is more than 40-60 μg/m³. This result may be related to the unfavorable air diffusion conditions, such as temperature inversions, and the effects of biomass combustion in northern China and southern Asia [31].
Temporally, the PM2.5 concentrations in central and eastern China are higher in the morning and lower in the afternoon, and they tend to decrease gradually with time. Figure 4 shows that the highest concentration of PM2.5 occurred at 2:00 UTC, when the annual mean concentration of PM2.5 in the main polluted areas is above 60 μg/m³; the concentration then decreases gradually with time. At 8:00 UTC, the average PM2.5 concentration in the main polluted areas falls below 60 μg/m³. This variation in hourly PM2.5 may be closely related to the meteorological conditions. Before sunrise, the height of the atmospheric boundary layer is relatively low, and aerosols are concentrated in the lower level of the atmosphere, which results in a relatively high PM2.5 concentration near the surface. Then, with increasing solar altitude, the atmosphere is heated by solar radiation, which causes expansion and thermodynamic uplift, and the boundary layer rises accordingly [32,33]. The atmospheric particulate matter also moves to the upper atmosphere, resulting in a gradual decrease in the ground-level PM2.5 concentration. The geostationary satellite captures this process well. The Sichuan Basin is located in the southwest of China. The PM2.5 in the Sichuan Basin and other western areas decreases markedly at 5:00 UTC, whereas the PM2.5 in the Huabei (North China) Plain and other eastern areas decreases after 3:00 UTC. The Sichuan Basin and the Huabei Plain are marked on Figure 1. Notably, the decrease in PM2.5 in western China occurs later than in eastern China. Owing to the Earth's rotation, the western region of China receives solar radiation later than the eastern region, which delays the periodic change in PM2.5. This further indicates that the ground-level PM2.5 concentration is greatly affected by the solar radiation and atmospheric conditions. The intraday concentration of atmospheric fine particulate matter is not constant. Therefore, hourly monitoring of the PM2.5 concentration is necessary.
We selected 26 sites in central and eastern China as typical cities and calculated the annual PM2.5 concentration in 2017 (Figure 5). Differences between the estimated and observed PM2.5 concentrations in the selected 26 cities were analyzed. The PM2.5 concentrations were relatively high in Jinan, Hangzhou, Hefei, Zhengzhou, Xi'an, Taiyuan, and Chengdu. The measured and predicted PM2.5 concentrations in typical cities were in agreement except for those in Xi'an and Taiyuan. The underestimation of PM2.5 in Xi'an and Taiyuan is related to the great disparity in PM2.5 between rural and urban areas. As shown in the true color map of China in Figure 5, Xi'an and Taiyuan are located in a narrow plain or basin, which keeps pollution low-lying and stagnant. In addition, the intense human activities in urban areas aggravate the air pollution. Xi'an and Taiyuan are therefore isolated PM2.5 hotspots, which degrades the performance of the model there. From 1:00 to 8:00 UTC, the PM2.5 concentrations first increase and then decrease in nearly all the cities. The peak of the PM2.5 concentration appears earlier in eastern China than in western China, which is consistent with our previous analysis and is related to the influence of solar radiation. Due to the rotation of the Earth, the solar angle and the intensity of radiation differ between eastern and central China at any given time. The boundary layer rises with the increase in solar radiation and carries the atmospheric particulate matter upward [32,33], and the ground-level PM2.5 concentrations decrease accordingly.

Model Performance in Typical Cases
The analysis in this section shows that the model responds to rapid changes in the ground-level hourly PM2.5 concentrations. The model performance was tested for typical cases such as haze events and dust events.

Model Performance during Haze Events
A serious haze event occurred in South China on December 24, 2017. Based on the temporal, geographical, and underlying surface-weighted regression model, the PM2.5 concentrations for 8 h on December 24, 2017 were estimated. The results are shown in Figure 6. The PM2.5 estimation results were highly consistent with the monitoring data. The results show that haze occurred in Jiangxi, Hunan, Hubei, Anhui, Zhejiang, and Shanghai and that the PM2.5 concentration exceeded 100 μg/m³. The PM2.5 concentration was relatively high in the morning and then decreased gradually. The geostationary satellite retrieval results were also used to monitor the haze dissipation process. The PM2.5 concentration began to decrease at 2:00 UTC in Anhui Province. Next, the haze in Zhejiang Province also began to dissipate. Although haze was still present in Jiangxi, Hunan, and Hubei provinces, the PM2.5 concentration decreased. By 7:00 UTC, the PM2.5 concentration over the middle and lower reaches of the Yangtze River had fallen below 70 μg/m³. In the middle and lower reaches of the Yangtze River on the 24th, the PM2.5 concentration varied by more than 100 μg/m³ in 8 h. The geostationary satellite satisfied the requirements for continuous, real-time, and large-scale monitoring of PM2.5 under these extreme weather conditions.

Model Performance during Dust Events
Dust storms are another weather event in which the PM2.5 concentrations rise rapidly. An early warning and monitoring system for dust storms is particularly important. During severe dust storms, ground monitoring of PM2.5 is often inaccurate owing to excessive wind speed and shows an abrupt increase in particulate matter concentration. This problem makes accurate monitoring of PM2.5 concentrations difficult. However, geostationary satellites provide stable and continuous monitoring platforms that are not affected by extreme weather near the ground. The model's performance during dust events was also tested. Figure 7 shows a dust plume with a high PM2.5 concentration that spread from northwestern to northeastern China in May 2017. Dust was widespread in the north-central part of China. The PM2.5 concentrations in Inner Mongolia, southwestern Gansu, Ningxia, northern Shaanxi, Shanxi, and Hebei were more than 200 μg/m³. The geostationary satellite successfully captured the movement of the dust plume (Figure 7), which gradually moved from Inner Mongolia to Northeast China. The PM2.5 concentration in western Inner Mongolia began to decrease gradually. After 6:00 UTC, the dust in western Inner Mongolia, southeastern Gansu, and Ningxia began to weaken gradually. By 8:00 UTC, the PM2.5 concentration in the dust plume had decreased to approximately 70 μg/m³. The effective ground-based PM2.5 observation data were limited on May 4 and were concentrated mainly in the North China Plain. There were no ground monitoring data for Inner Mongolia, which was seriously affected by dust. Where observations are available, the satellite-based PM2.5 concentration estimation results are still highly consistent with the observation data.

Discussion
The AHI AOD was calculated by a novel algorithm that uses a non-Lambertian forward model coupled with a surface bidirectional reflectance distribution function model. The accuracy of the retrieved AHI AOD is comparable to that of the MODIS AOD (C6). The coverage rate of the AHI AOD is better than that of the Japan Meteorological Agency (JMA)/AOD products [9].
Unlike the models used in previous studies, the IGTWR model generalized the geographical distance. Hu et al. [24] used the spatial distance as the weight parameter; later, Huang et al. [29] generalized this distance to a geographical and temporal distance by introducing the day number into the model. In this study, we use the day, hour, and land use type in addition to the spatial coordinates to construct the distance. The R² value of the observed and predicted hourly PM2.5 in our IGTWR fitting set is 0.886, and the RMSE is 12.18 μg/m³ for the 437,642 samples. By comparison, the R² value of the observed and predicted daily PM2.5 concentrations obtained by GTWR is 0.87, and the RMSE is 14.56 μg/m³ for 60,810 samples in China. The IGTWR model still outperforms the GTWR model with a sample set that was approximately seven times larger.
In this study, ground-level hourly PM2.5 concentrations in central and eastern China with high estimation accuracy were obtained. This achievement is meaningful for high-temporal-resolution air quality monitoring in China. Although related research has been reported, it mainly focuses on relatively small regions. Wang et al. [20] estimated the hourly PM2.5 concentrations based on an improved linear mixed effects model for the Beijing-Tianjin-Hebei region with Himawari-8 data. The RMSE of the tenfold cross-validation result for the Beijing-Tianjin-Hebei region in that study was 24.3 μg/m³ for 83,989 samples, whereas, in this study, we obtained the PM2.5 validation result for central and eastern China with a cross-validation RMSE of 20.104 μg/m³ (model-fitting RMSE of 12.18 μg/m³) for 437,642 samples.
The general spatial pattern of the hourly PM2.5 pollution in China obtained by IGTWR is similar to that in previous studies [10,25], except that IGTWR shows that the Fenwei Plain becomes a PM2.5 hotspot, and the North China Plain has lower PM2.5 concentrations. The temporal characteristics of the satellite-based hourly PM2.5 data in central and eastern China are also revealed. PM2.5 showed an intraday decreasing trend, and the different behavior in the eastern and western parts of our study area may be related to the timing of solar radiation [31].
The performance of the IGTWR model in extreme weather conditions, especially during dust events, is similar to the dust detection results reported in a previous study. She et al. [6] introduced an enhanced dust intensity index based on Himawari-8 brightness temperature data. The distribution of the hourly PM2.5 concentration estimated by IGTWR on May 4, 2017 is consistent with the dust intensity index.

Conclusion
A temporally, geographically, and underlying surface weighted-regression model of the hourly PM2.5 concentration was applied to central and eastern China in 2017. In this study, the weight of the generalized geographically weighted model was constructed using the longitude, latitude, day, hour, and land use type, and the AHI AOD, surface relative humidity, and boundary layer height data were used as independent variables. The hourly PM2.5 concentrations in eastern and central China were estimated at eight times (1:00, 2:00, 3:00, 4:00, 5:00, 6:00, 7:00, and 8:00 UTC) in 2017.
The model fitting and cross-validation performance are satisfactory. For the modelling set, the R² value of the measured and predicted PM2.5 is 0.886, and the RMSE is only 12.180 μg/m³. The tenfold cross-validation results of the regression model are also acceptable; the R² of the measured and predicted PM2.5 is 0.784, and the RMSE is 20.104 μg/m³, which is only about 8 μg/m³ higher than that of the modelling set. The validation results showed that the PM2.5 estimates are reliable.
There are significant regional pollution characteristics over central and eastern China in 2017. Fine particulate matter pollution events are concentrated mainly in the North China Plain, Guanzhong Plain, Fenwei Plain, middle and lower reaches of the Yangtze River, South China, and Sichuan Basin. The ground-level PM2.5 concentration is negatively correlated with the solar radiation. The predicted and measured values of the PM2.5 concentration are highly consistent throughout the study area. The model results represent the actual distribution characteristics of PM2.5 in central and eastern China.
The model designed in this study can monitor the occurrence, development, and termination of extreme weather events. The predicted PM2.5 concentration is very similar to the observed PM2.5 concentration. Satellite-based hourly PM2.5 monitoring will play an important role in predicting extreme weather events and providing warnings in advance.