PM2.5 Estimation and Spatial-Temporal Pattern Analysis Based on the Modified Support Vector Regression Model and the 1 km Resolution MAIAC AOD in Hubei, China

The satellite-retrieved Aerosol Optical Depth (AOD) is widely used to estimate the concentrations and analyze the spatiotemporal pattern of Particulate Matter that is less than or equal to 2.5 microns (PM2.5), also providing a way for the related research of air pollution. Many studies generated PM2.5 concentration networks with resolutions of 3 km or 10 km. However, the relatively coarse resolution of the satellite AOD products make it difficult to determine the fine-scale characteristics of PM2.5 distributions that are important for urban air quality analysis. In addition, the composition and chemical properties of PM2.5 are relatively complex and might be affected by many factors, such as meteorological and land cover type factors. In this paper, an AOD product with a 1 km spatial resolution derived from the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm, the PM2.5 measurements from ground sites and the meteorological data as the auxiliary variable, are integrated into the Modified Support Vector Regression (MSVR) model that proposed in this paper to estimate the PM2.5 concentrations and analyze the spatiotemporal pattern of PM2.5. Considering the relatively small dataset and the somewhat complex relationship between the variables, we propose a Modified Support Vector Regression (MSVR) model that based on SVR to fit and estimate the PM2.5 concentrations in Hubei province of China. In this paper, we obtained Cross Correlation Coefficient (R2) of 0.74 for the regression of independent and dependent variables, and the conventional SVR model obtained R2 of 0.60 as comparison. We think our MSVR model obtained relatively good performance in spite of many complex factors that might impact the accuracy. We then utilized the optimal MSVR model to perform the PM2.5 estimating, analyze their spatiotemporal patterns, and try to explain the possible reasons for these patterns. The results showed that the PM2.5 estimations retrieved from 1 km MAIAC AOD could reflect more detailed spatial distribution characteristics of PM2.5 and have higher accuracy than that from 3 km MODIS AOD. Therefore, the proposed MSVR model can be a better method for PM2.5 estimating, especially when the dataset is relatively small.


Introduction
Currently, air pollution and its related health problems have become research hot spots [1]. Numerous studies have indicated that particles smaller than 2.5 µm in aerodynamic diameter (PM 2.5 ) have adverse effects on human health and can cause pulmonary and cardiovascular diseases [2,3].
People exposed to polluted environments are prone to illness or even death. Thus, PM 2.5 exposure monitoring and pattern analysis are critical to air quality assessment and environmental epidemiologic studies [4,5]. PM 2.5 concentrations are traditionally obtained by ground monitoring sites distributed throughout a country. However, the existing ground monitoring sites are too sparse to provide continuous PM 2.5 monitoring due to the high construction cost. In contrast, satellite remote sensing has wide and continuous spatial coverage and has been widely applied in the estimation of PM 2.5 concentrations [6,7], although the cloud could affect the availability of data.
Satellite-derived Aerosol Optical Depth (AOD) represents the quantity of light removed from a beam by the role of aerosol scattering or absorption during its path [8,9]. Furthermore, previous studies have indicated that there exists a direct relationship between the atmospheric particles (such as PM 2.5 ) and AOD [10]. Thus, remote sensing satellites' AOD products provide a potentially cost-effective way to estimate ground-level PM 2.5 mass concentrations [11,12]. A series of AOD products have been applied to such surveys [13], e.g., the Moderate Resolution Imaging Spectroradiometer (MODIS) [14,15], the Multi-angle Imaging Spectroradiometer [16], the Himawari-8 (H8) [17,18], and the Visible Infrared Imaging Radiometer Suite [19]. However, the relatively coarse spatial resolutions (usually 3 km or 10 km) of the above-mentioned satellite sensors limit the precise estimates of PM 2.5 in urban areas. Recently, the Multi-Angle Implementation of Atmospheric Correction (MA-IAC) algorithm, which utilizes time-series analysis and image-based processing techniques, was developed to conduct aerosol retrievals and atmospheric corrections.
The Multiangle Implementation of Atmospheric Correction (MAIAC) is a new generic algorithm applied to collection 6 (C6) MODIS measurements to retrieve Aerosol Optical Depth (AOD) over land at high spatial resolution (1 km) [20]. The related AOD product MCD19A2 (MODIS Collection 6 (C6) daily AOD dataset), which is based on the MAIAC algorithm, was released in 2018 [21]. Although the goal of the MAIAC AOD product is aerosol monitoring, this product of 1 km resolution gives us chance to estimate PM 2.5 concentrations in a higher spatial and temporal resolution.
Previous studies have shown that the relationship between PM 2.5 and AOD is relatively complex and may be affected by a series of parameters, such as the aerosol type and the vertical structure of aerosol distribution [22], the relative humidity (RH) [23], planetary boundary layer height (PBLH) [24], wind speed and direction [25], the depth and temperature difference of the inversion layer [24], land cover [26], etc. Furthermore, recently, more sophisticated methods used to estimate PM 2.5 have been developed by taking into account these parameters.
Studies have tried to explore the relationship between these variables by statistical approaches. There have been many different approaches proposed by studies that explored the relationship between PM 2.5 and AOD. For example, including but not limited to, the linear regression model, the geographically weighted regression model [27,28], the twostage model [29] and the newly developed neural network methods [30,31]. As geospatial data, PM 2.5 concentration data have spatial heterogeneity and spatial dependence. The statistical characteristics of PM 2.5 concentrations may vary over space and time. This space-time anisotropy may violate the independent and identically distributed random variables in most of the machine learning methods [32].
The Support Vector Machine (SVM) based on the principle of structural risk minimization initially developed for solving classification problems using small sample learning is found to be promising for solving regression problems. The SVM for regression termed as Support Vector Regression (SVR) has revealed superior performance due to its inherent capability to circumvent overfitting problem in regression and improved response approximation ability [33].
Considering the characteristics of the experiment: 1. Nonlinear and complex relationship; as an atmospheric research, the relationship between PM 2.5 , AOD and auxiliary variables is rather complicated and it would be better to describe it with nonlinear model. The kernel function can simplify the inner product operation in the mapping space, avoiding calculating in the high-dimensional space directly. 2. The relatively small dataset. Regression algorithms generally obey the law of big data; this means that the final result is relatively more accurate with more samples. The SVM makes it possible to achieve relatively good results on small samples. SVR based on the support vector machine solves regression problems using small sample learning. Furthermore, the small sample is a considerable concept; we think the samples in our experiment are enough for digging out the nonlinear relationship.
3. The capability to handle high-dimensional data sets well. SVR can grasp the nonlinear relationship between data and features on relatively small datasets, especially compared to most other machine learning methods.
Thus, this paper proposes the modified SVR (MSVR) method to improve the estimation accuracy of PM 2.5 concentrations. MSVR considers the impacts of spatial distance on estimation accuracy and adds factors in the model input that are generally included and contribute relatively more significant influence, with MAIAC AOD as the primary predictor and the meteorological and land cover information as ancillary information.

Study Area
The study area of this paper is Hubei, China. Hubei Province is located in central China, and its capital city, Wuhan, is one of the largest cities and developing centers of the country (Figure 1). With the rapid development of urbanization and industrialization in China, climate disasters, such as smog, frequently occur in large cities in China as a result of worsening atmospheric and environmental conditions [34]. Research on PM 2.5 estimation and analysis in Wuhan can provide constructive guidance for urban air quality research in other large cities in China. Meanwhile, due to the complex climate and land cover of Hubei Province, research in this area can help analyze the effects of various factors such as climate and land cover on PM 2.5 distribution. operation in the mapping space, avoiding calculating in the high-dimensional space di rectly.
2. The relatively small dataset. Regression algorithms generally obey the law of bi data; this means that the final result is relatively more accurate with more samples. Th SVM makes it possible to achieve relatively good results on small samples. SVR based o the support vector machine solves regression problems using small sample learning. Fur thermore, the ′small sample′ is a considerable concept; we think the samples in our exper iment are enough for digging out the nonlinear relationship.
3. The capability to handle high-dimensional data sets well. SVR can grasp the non linear relationship between data and features on relatively small datasets, especially com pared to most other machine learning methods.
Thus, this paper proposes the modified SVR (MSVR) method to improve the estima tion accuracy of PM2.5 concentrations. MSVR considers the impacts of spatial distance o estimation accuracy and adds factors in the model input that are generally included and contribute relatively more significant influence, with MAIAC AOD as the primary predic tor and the meteorological and land cover information as ancillary information.

Study Area
The study area of this paper is Hubei, China. Hubei Province is located in centra China, and its capital city, Wuhan, is one of the largest cities and developing centers o the country (Figure 1). With the rapid development of urbanization and industrializatio in China, climate disasters, such as smog, frequently occur in large cities in China as result of worsening atmospheric and environmental conditions [34]. Research on PM2 estimation and analysis in Wuhan can provide constructive guidance for urban air qualit research in other large cities in China. Meanwhile, due to the complex climate and land cover of Hubei Province, research in this area can help analyze the effects of various fac tors such as climate and land cover on PM2.5 distribution.

MAIAC and MODIS AOD Data
The MAIAC algorithm is an advanced algorithm for aerosol retrievals and atmos pheric correction for MODIS data over both dark vegetated and bright desert surfaces This algorithm explores the advantages of time-series processing at a synergistic level fo cloud masking and aerosol-surface retrievals. The MAIAC algorithm uses up to 16 day of gridded MODIS measurements to make simultaneous retrievals of AOD and surfac

MAIAC and MODIS AOD Data
The MAIAC algorithm is an advanced algorithm for aerosol retrievals and atmospheric correction for MODIS data over both dark vegetated and bright desert surfaces. This algorithm explores the advantages of time-series processing at a synergistic level for cloud masking and aerosol-surface retrievals. The MAIAC algorithm uses up to 16 days of gridded MODIS measurements to make simultaneous retrievals of AOD and surface bidirectional reflectance distribution factor (BRF)/albedo and then produces the AOD at a 1 km resolution. The algorithm uses both individual grid cells (also called pixels) and fixed-size (25 × 25 km 2 ) areas (also called blocks), as required by the cloud mask algorithm and retrieval [21].
Studies have indicated that MAIAC AOD agreed well with AERONET AOD (Aerosol Robotic Network AOD), which is commonly considered as the ground true values of AOD [35]. Additionally, comparison studies between the satellite AOD and the AERONET AOD showed that the MAIAC algorithm improved the accuracy than MODIS DB (Deep Blue)/DT (Dark Target) [36]. Mhawish [20] obtained RMSE 0.148 for MAIAC, 0.198 for MODIS DB, and 0.183 for MODIS DT.
AERONET AOD currently has 61 stations distributed in China, but there are no stations in our research area, Hubei Province. Therefore, we may not be able to conduct a comparison of AERONET and MODIS data in the study area of the study period. Furthermore, there has been research that gained the insights regarding the performance of MAIAC algorithm by comparing MAIAC aerosol products with the ground-based observations at nine typical sites spread out in China. The research indicated that MAIAC and ground-observed AOD values showed high correlation coefficient (R of most sites reached higher than 0.85 or even 0.9) in spite of the complicated surface types, diverse aerosol sources, and heavy loading of aerosols in the vast atmosphere of China. MODIS MAIAC 1 km AOD at 550 nm, compared with AERONET measurements in Xuzhou and Taihu sites that are relatively close to our research area, obtained R of 0.91 and 0.878. More and more studies have applied MAIAC AOD to PM 2.5 inversion in China. We believe that this AOD product has quality assurance over China.

Auxiliary Variables
Because atmospheric flow is affected by meteorological parameters (e.g., temperature and humidity) [25] and because air pollutants may be absorbed by some land cover types (e.g., vegetation) [37], meteorological parameters and land cover parameters were adopted as auxiliary variables in our algorithm to improve the estimation accuracy of PM 2.5 .

•
Meteorological data Boundary layer height (BLH, m) refers to the thickness of the planet's boundary layer and is one of the important physical parameters for atmospheric numerical model and atmospheric environment evaluation. Usually, BLH has a negative relationship with PM 2.5 because a higher BLH can expand the near-surface atmosphere and facilitate vertical convection. Humidity can influence the concentration of PM 2.5 by changing the weight of particulates and further affecting the diffusion speed of pollutants [38]. Temperature has a negative relationship with PM 2.5 because the inversion layer caused by low temperature is not conducive to the convection of the atmosphere [39]. Therefore, in this paper, the three types of meteorological parameters, BLH, relative humidity (RH, %), and 2 m near-groundtemperature (T, • C), were adopted in our algorithm to improve the estimation accuracy.
The BLH, RH, and T data employed in this paper were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) (https://www.ecmwf.int/). The ECMWF provides quality controlled spatially and temporally consistent, real-time, current forecasts, climate reanalysis and specific datasets. The data assimilation system used to produce ERA-Interim was based on a 2006 release of the IFS (Cy31r2). The system includes a 4-dimensional variation analysis (4D-Var) with a 12-h analysis window. The spatial resolution of the data set is approximately 80 km (T255 spectral) on 60 vertical levels, from the surface up to 0.1 hPa, and at a horizontal resolution of 0.125 • × 0.125º, and its temporal resolution is 3 h.

•
Land cover data Land cover variables, especially vegetation coverage and construction parameters, can influence the performance of the AOD-PM 2.5 regression models [40]. Studies have shown that vegetation coverage has an obvious effect on reducing and redistributing atmospheric particulate matters by means of sedimentation, retardation and adsorptions. It is worth noting that these effects will change with the change of seasons, because some vegetation will have different states in different seasons. Generally, winter vegetation has the smallest reduction effect on PM 2.5 [41]. Conversely, construction land produces dust and other pollutants that increase the concentration of atmospheric pollutants.
Land cover data with a 1 km spatial resolution were utilized in this paper. The data were obtained from the "Remote sensing monitoring data of land cover in China" section on the website of the Chinese Academy of Sciences Data Centre for Resources and Environmental Sciences (http://www.resdc.cn). It provides a database of multi-phase land cover status for the national land area under the years of accumulation generated by manual visual interpretation and based on the Landsat TM/ETM remote sensing image data. The land cover types include six primary types: cultivated land, forestland, grassland, waters, residential areas and unused land.

Ground PM 2.5 Measurements
The ground PM 2.5 measurements were used as training data and testing data in our algorithm to obtain the relationship between PM 2.5 and AOD and were then utilized to generate continuous PM 2.5 . As shown in Figure 1, there are 47 national ground monitoring sites in the study area (Hubei Province). The 47 sites are mainly located in densely populated cities, such as Wuhan (10 sites), and there are rural areas without ground sites. The PM 2.5 measurements in four seasons of 2017 from these sites were used in the experiment of this paper. The PM 2.5 hourly data were collected from the PM25.IN website (http://www.pm25.in/) provided by the China National Environmental Monitoring Center (CNEMC); as for worldwide research, the PM 2.5 data can be obtained from the following website (http://www.aqicn.info/here/).
Terra and Aqua satellites passed over the equator at approximately 10:30 a.m. and 13:30 p.m. local time [42]. To ensure the time consistency of AOD and PM 2.5 , we took the average of the hourly PM 2.5 measurements from 10:00 AM to 14:00 PM.

Data Pre-Processing and Integration
All the data and the descriptions were listed in Table 1. Four types of data were used in MSVR, including AOD, meteorological factors, land cover, and ground PM 2.5 , and data pre-processing and integration were required due to the differences in the spatial coverage and observation frequency of the data. To ensure the spatial consistency of the data from multiple sources, all data were first re-projected to a unified coordinate system: The World Geodetic System 1984 geographic coordinate system [43,44].
On the one hand, the atmosphere is moving and circulating, thus the concentration of atmospheric particles at a certain location changes over time. On the other hand, variables in this research are not instantaneous data, but the average over a certain period of time. Therefore, we took the weighted average of 3 × 3 pixels centered on the grid.
Meanwhile, the forest and construction land-use data that significantly affected the formation and concentration of PM 2.5 were extracted with a buffer of 1 km from the ground monitoring sites. In terms of temporal consistency, the meteorological data were integrated to the average of the period from 10:00 AM to 14:00 PM local time to correspond to the period when Terra and Aqua passed over the equator. The extraction and integration of multi-source data were mainly implemented by coding in Python.

Model Constructing and Training
For better performance of MSVR, the datasets including all variables were first normalized to [0, 1]. As for the parameter setting of the MSVR model, we selected the RBF kernel for the kernel function considering the complicated relationships between all the variables. We listed a series of value combinations of the parameters "gamma" and "C" (penalty parameter) and then chose the optimal parameter values by grid searching.
Five-fold cross validation was employed in this experiment. The training dataset was first randomly split into five subsets, with approximately 20% of the total data record in each subset. In each round of cross-validation, one subset was selected to be used as validation samples, and the remaining subsets were utilized to train and fit the model.
In this experiment, three statistical indicators, including the R 2 , the mean percentage error (MPE) and the square root of the mean squared prediction errors (RMSE), were calculated to assess the model performance. The accuracy evaluation and the comparison of the different models are discussed later in Section 4.1.

Principle of SVR
A Support Vector Machine (SVM) is a kind of machine learning method based on statistical learning theory, dimension theory and structural risk minimization principle. It shows many unique advantages in solving small sample, nonlinear and high-dimensional pattern recognition problems, and to a large extent overcomes the problems of "dimension disaster" and "over-learning" [45]. SVM is widely used to solve problems such as classification and nonlinear mapping and performs better in solving the problem of small sample datasets. SVM can, to a certain extent, solve the problems of model selection and over-learning that may be difficult for some traditional networks to solve. Support Vector Regression (SVR) is the application of SVM in regression analysis [46], and it solves the regression task by transforming an inseparable problem of low-dimensional space into a linearly separable problem of high-dimensional space using the space mapping function [47]. We must find a hyperplane to fit the distribution of the sample data, which means minimizing the sum of the distance between the sample points and the hyperplane [48,49].
In this experiment, for a given set of training data, the variables x, y are both vectors. {(x 1 , y 1 ), (x 2 , y 2 ), · · · , (x i , y i )}, x ⊂ R n and y ⊂ R, the problem of regression is to find the flattest function f that map a point in the space R n onto the space R with the lowest expected risk.
The key of nonlinear SVR is to map the input vector X ∈ R n into a high dimensional feature space H ∈ R m (with m larger than n), through mapping and then to solve a linear regression in space H.
We call ϕ(X) the mapping function, and usually the mapping function is difficult to calculate. Therefore, the kernel function K(x, y) is introduced into SVR, where ∅(x), ∅(y) stands for the scalar product mapped to the feature space.
K(x, y) = ∅(x), ∅(y) In the high dimensional space, we can find a hyperplane to divide the sample points and the hyperplane is defined as below. ω is the normal vector, which determines the direction of the hyperplane; b is the displacement, which determines the distance between the hyperplane and the origin.
The loss function of SVR was defined as where y is the true value corresponding to the dependent variable in the sample, f (x) is the predicted value returned by the trained model, and ε describes the variation between the above y and f (x).
The slack variables ξ i , ξ * i are introduced, and then the initial nonlinear problem is converted to find the optimal values for ω and b, the parameters of the linear regression function. Thus, the solving process could be represented as the mathematical model below.

Basi Idea of MSVR
A number of studies have indicated that the spatial distribution of PM 2.5 concentrations is distinct. According to the first law of geography, the farther the pixel is from the center pixel, the smaller the impact is [50]. However, the traditional SVR model utilized the general average method, and the different impacts of the pixel distances from the central value were ignored in the estimations of PM 2.5 . PM 2.5 is most essentially characterized by its spatial and temporal heterogeneity, and many models such as the GWR and the GTWR models were proposed to cover these characteristics [33,51].
In this research, the SVR is modified and improved not only in making use of rich information of the input variables, but also in terms of extracting geospatial information weighted by the distance of adjacent pixels from a center pixel and the time difference from ground-based PM 2.5 measurements. Thus, the Modified SVR can also be called Space-Time Support Vector Regression. The basic structure of MSVR was clarified in Figure 2.
where ds and dt represent the spatial and temporal distances. W and L represent the w pixels near the site and the l prior days for the same pixel. When considering the spatial and temporal information, the model can be described as PM2.5 = f (AOD, RH, BLH, T, LC, Ps, Pt). In order to evaluate the accuracy of the model, we introduced commonly used regression evaluation indicators: R 2 , MPE and RMSE. The specific meaning and calculation methods of these indicators are described as follows: Cross correlation coefficient (also R 2 ) statistically quantifies the predictive accuracy of a statistical model. It shows the proportion of variance in the outcome variable that is explained by the predictions. It is also known as the cross correlation coefficient.  For a given pixel, its spatial (Ps) and temporal (Pt) characteristics can be expressed as below, where ds and dt represent the spatial and temporal distances. W and L represent the w pixels near the site and the l prior days for the same pixel. When considering the spatial and temporal information, the model can be described as PM 2.5 = f (AOD, RH, BLH, T, LC, Ps, Pt).
In order to evaluate the accuracy of the model, we introduced commonly used regression evaluation indicators: R 2 , MPE and RMSE. The specific meaning and calculation methods of these indicators are described as follows: Cross correlation coefficient (also R 2 ) statistically quantifies the predictive accuracy of a statistical model. It shows the proportion of variance in the outcome variable that is explained by the predictions. It is also known as the cross correlation coefficient.
Mean percentage error (MPE) that are reported in some statistical procedures are signed measures of error which indicate whether the forecasts are biased.
The root mean squared error (RMSE) is the square root of the mean squared error.
where y i represents the true value,ŷ i represents the predicting value, y represents the average of all the true values.

Statistical Features of the Variables
To obtain an overview of the general situation and trend of the variables, the descriptive statistics of the variables used in the MSVR model are illustrated in Tables 2 and 3  Before estimating the PM 2.5 based on the constructed MSVR method, we explored the relationships between PM 2.5 and AOD. The results demonstrated that the AOD had a positive relationship with the PM 2.5 concentrations, and when the site location or observation time changed, the AOD values and PM 2.5 concentrations had similar fluctuating trends. By utilizing the optimal MSVR model produced in the training process, we obtained continuous PM 2.5 estimations at a spatial resolution of 1 km. For visualization, we created a rendering map reflecting the concentration and distribution of PM 2.5 in the study area, which is shown in Figure 3. As shown in the graph, the PM 2.5 estimations had different values and distributions in different regions, and we discuss the details of these features in the following section. The MSVR model proposed in this paper was used to estimate the continuous PM2.5 concentrations in the study area at the grid resolutions of both 1 × 1 km from the MAIAC AOD and 3 × 3 km from the MODIS AOD. Although derived from different data

Spatial Pattern of the Estimated PM 2.5
The MSVR model proposed in this paper was used to estimate the continuous PM 2.5 concentrations in the study area at the grid resolutions of both 1 × 1 km from the MAIAC AOD and 3 × 3 km from the MODIS AOD. Although derived from different data sources, the distribution patterns of MAIAC-and MODIS-derived PM 2.5 were very similar, e.g., similar high-value areas and similar spatial variation patterns.
Whether from MAIAC or MODIS, the high PM 2.5 concentrations generally occurred in developed urban cities where the population and pollution sources were concentrated, while the low PM 2.5 concentrations usually occurred in rural areas or mountainous areas as it was demonstrated in the literature. In the study area of Hubei Province, the high values were mainly distributed in Wuhan, Xiangyang, Jingzhou and other quickly developing cities.
As shown in Figure 4, although the PM 2.5 of MODIS and MAIAC have extremely similar spatial distributions in 2017, the 1 km PM 2.5 from MAIAC has indisputable advantages in terms of resolution. When the study area was zoomed out to a smaller scale, such as an urban area, the 3 km PM 2.5 could describe only the general spatial variation, and the pixels became relatively coarse and difficult to read. However, the 1 km PM 2.5 derived from MAIAC AOD were able to reflect subtler spatial changes and could distinguish the pixels with high levels of PM 2.5 from the others; this difference enabled us to locate the pollution sources.   Additionally, MAIAC AOD with 1 km resolution could generate many more AOD-PM 2.5 pairs [52,53] than MODIS AOD with 3 km resolution. As a result, when the dataset becomes larger, the MAIAC AOD is able to reflect more information related to PM 2.5 , and can achieve better results.
To evaluate the estimating accuracy, we extract the measured PM 2.5 values of ground monitoring site 1844A as the situ true values. The estimated PM 2.5 values are generated from MAIAC 1 km AOD and MODIS 3 km AOD. We analyzed the linear correlation of the measured and estimated PM 2.5 values, 0.6355 for MAIAC AOD and 0.5918 for MODIS AOD (shown as Figure 5). As the result shows, the PM 2.5 estimated from MAIAC AOD not only have higher spatial resolution, but also have higher estimating accuracy, which reveals the feasibility and superiority of MAIAC AOD in terms of PM 2.5 estimating. To evaluate the estimating accuracy, we extract the measured PM2.5 values of ground monitoring site 1844A as the situ true values. The estimated PM2.5 values are generated from MAIAC 1 km AOD and MODIS 3 km AOD. We analyzed the linear correlation of the measured and estimated PM2.5 values, 0.6355 for MAIAC AOD and 0.5918 for MODIS AOD (shown as Figure 5). As the result shows, the PM2.5 estimated from MAIAC AOD not only have higher spatial resolution, but also have higher estimating accuracy, which reveals the feasibility and superiority of MAIAC AOD in terms of PM2.5 estimating.  To illustrate the temporal and seasonal pattern of PM 2.5 in the study area, the data for different seasons were extracted to construct the seasonal regression models and then estimate the PM 2.5 concentrations of different seasons. Figure 6 shows the seasonal mean value distributions of PM 2.5 . Figure 6 shows that the PM 2.5 estimations derived from the optimal MSVR model using the 1 km resolution MAIAC AOD values as the model inputs had significant seasonal differences.
The PM 2.5 concentration values were generally low in summer, reaching the lowest values among the four seasons. Conversely, the PM 2.5 concentrations reached the highest level in winter, which might be the reason for more frequent air pollution weather in winter. In addition, the PM 2.5 concentrations in spring and autumn fell between the other two seasons, generally higher than summer and lower than winter, which could be explained by the seasonal climate differences.
The PM 2.5 concentration values were generally low in summer, reaching the lowest values among the four seasons. On the contrary, the PM 2.5 concentrations generally reached the high level in winter, which fitted the fact that the smoggy weather occurred more frequently in winter. Furthermore, from this side, our experiment results were reasonable. In addition, the PM 2.5 concentrations in spring and autumn fell between the other two seasons, generally higher than summer and lower than winter, which could be explained by the seasonal climate differences. values among the four seasons. On the contrary, the PM2.5 concentrations generally reached the high level in winter, which fitted the fact that the smoggy weather occurred more frequently in winter. Furthermore, from this side, our experiment results were reasonable. In addition, the PM2.5 concentrations in spring and autumn fell between the other two seasons, generally higher than summer and lower than winter, which could be explained by the seasonal climate differences.

Performance of the MSVR Model
Conventional Support Vector Regression (SVR) models were conducted on the same dataset to compare and evaluate the performance of MSVR. Three evaluation indexes, including the R², MPE and RMSE, were applied in the model performance evaluation. As recorded in Tables 4 and 5, the regression accuracy was influenced by season and model type and generally increased when models were established in different seasons (e.g., the

Performance of the MSVR Model
Conventional Support Vector Regression (SVR) models were conducted on the same dataset to compare and evaluate the performance of MSVR. Three evaluation indexes, including the R 2 , MPE and RMSE, were applied in the model performance evaluation. As recorded in Tables 4 and 5, the regression accuracy was influenced by season and model type and generally increased when models were established in different seasons (e.g., the summer model showed poor performance due to the absence of AOD data) compared to the whole year; additionally, the MSVR model generally had better performance than the SVR. The R 2 of the whole-year MSVR model increased by 0.14 compared to the SVR model, while the MPE and RMSE of the MSVR model decreased by 1.42 µg/m 3 and 1.73 µg/m 3 , respectively. Considering the complexity of atmospheric retrieval and study area, surface reflectance impact, cloud contamination, etc., we think MSVR in this paper obtained relatively good performance.
The R 2 , MPE and RMSE of the MSVR models generally increased in the four seasons, respectively. The measurements and estimation values at different scales for different seasons showed that the PM 2.5 concentration values changed over time. Generally, as the results illustrate, the PM 2.5 concentrations reached a peak value in winter and the lowest value in summer, which was in good agreement with existing empirical knowledge.
As shown in Figure 7, the estimated PM 2.5 and measured PM 2.5 are concentratedly distributed near the centerline, which means that the estimated PM 2.5 generally have the similar pattern with the measured PM 2.5 no matter for MSVR model or SVR model. According to the figure, most of PM 2.5 values fall between 20 µg/m 3 to 80 µg/m 3 . And from the figure, the gap between the estimated and measured PM 2.5 values of SVR model gets bigger when PM 2.5 values get higher. This means the MSVR model is more capable for accurate estimating in the high level of PM 2.5 values.
As the results showed, the MSVR model performed better than the conventional SVR model in PM 2.5 estimating. In addition, compared with other nonlinear models, such as the neural network, MSVR is more suitable for relatively small datasets, such as the dataset used in this paper. SVM has a solid mathematical theoretical basis and a strong generalization ability, which can effectively solve the problem of high-dimensional data model construction under limited sample size, and correspondingly, MSVR is relatively easy to construct and train compared to other nonlinear models.

Advantages of MAIAC AOD
In this paper, the MAIAC AOD with a spatial resolution of 1 km and the MODIS AOD with a spatial resolution of 3 km were utilized to estimate the ambient PM 2.5 concentrations with the same optimal model. According to the results, the PM 2.5 estimations from MAIAC AOD and MODIS AOD had similar spatial and temporal distributions. However, as shown in Figures 4 and 5, MAIAC AOD with 1 km resolution could generate many more AOD-PM 2.5 pairs than MODIS AOD with 3 km resolution. As a result, when the dataset becomes larger, the MAIAC AOD is able to reflect more information related to PM 2.5 , and can achieve better results in the inversion. Researches have shown that the correlation between PM 2.5 and AOD decreased significantly as AOD resolution was degraded in spite of the intrinsic mismatch between PM 2.5 ground level measurements and AOD vertically integrated measurements [54]. This conclusion could also be seen from our analysis in Figure 5. PM 2.5 retrievals of higher spatial resolution make it possible to figure out the spatial distribution characteristics. This is of great importance to accurately locate the source of air pollution where the PM 2.5 concentration usually abnormally elevates. Additionally, we cannot ignore the other superiorities of MAIAC algorithm over the DB/DT algorithm. For example, high-resolution MAIAC is capable to distinguish aerosol sources and fine feature, MAIAC retrieval accuracy is higher than DT/DB with more accuracy over dark surface, and MAIAC has less sensitivity to variation in aerosol types across the seasons. As the results showed, the MSVR model performed better than the conventional SVR model in PM2.5 estimating. In addition, compared with other nonlinear models, such as the neural network, MSVR is more suitable for relatively small datasets, such as the dataset used in this paper. SVM has a solid mathematical theoretical basis and a strong generalization ability, which can effectively solve the problem of high-dimensional data model construction under limited sample size, and correspondingly, MSVR is relatively easy to construct and train compared to other nonlinear models.

Advantages of MAIAC AOD
In this paper, the MAIAC AOD with a spatial resolution of 1 km and the MODIS AOD with a spatial resolution of 3 km were utilized to estimate the ambient PM2.5 concentrations with the same optimal model. According to the results, the PM2.5 estimations from MAIAC AOD and MODIS AOD had similar spatial and temporal distributions. However, as shown in Figures 4 and 5, MAIAC AOD with 1 km resolution could generate many more AOD-PM2.5 pairs than MODIS AOD with 3 km resolution. As a result, when the dataset becomes larger, the MAIAC AOD is able to reflect more information related to PM2.5, and can achieve better results in the inversion. Researches have shown that the cor-

Analyses of the Spatiotemporal Pattern of PM 2.5
Firstly, as shown in Figure 3, it should be noted that most cities in the middle and southern parts, such as Jingzhou, Qianjiang, Jingmen, and Wuhan, have the higher PM 2.5 concentration level. This might be interpreted as the lager population, heavy vehicle emissions and more reliance on carbon-intensive industries for a significant proportion of their economic activities [55]. Additionally, researches show that secondary particles resulting from chemical reactions in primary gaseous pollutants may also contribute to levels in this region [51]. On the other hand, the lowest PM 2.5 level occurs in Shiyan, northwest of Hubei, with more mountainous woodland and vegetation cover and far lower anthropogenic emission levels, which is helpful for atmospheric dispersion and dilution.
Secondly, the PM 2.5 concentrations differ greatly across the seasons. The highest PM 2.5 levels in 2017 occurred during the winter and the lowest during the summer ( Figure 6). Seasonal maximum PM 2.5 concentrations in winter may be the reason of coal combustion and unfavorable meteorological conditions for pollution dispersion in winter (such as the stagnant weather and temperature inversion) [56]. It is worth noting that the secondary particles generated from chemical reactions in primary gaseous pollutants can contribute to fine PM formation. Summer was the season with the lowest average PM 2.5 concentration. This may be interpreted as the reduced anthropogenic emissions related to coal burning for domestic heating; summer monsoons can also help to dissipate aerosols [57].

Limitations
The MSVR method proposed in this paper has its own limitations. First, due to the missing AOD data on some days, which are usually caused by cloud cover and high surface reflectance, the number of AOD-PM 2.5 matchups per day is limited [58], resulting in model over-fitting or decreased estimation accuracy. Moreover, the focus of this experiment is the regression method. PM 2.5 feature analyses in large-scale and long period or in the urban areas are not detailed enough. In future work, more available satellite data with higher resolution and quality should be introduced into AOD-PM 2.5 research.
It is worth noting that MODIS AOD is a column average and not surface. There are satellite products that offer vertical profiles, and these are assimilated in MERRA-2 and ECMWF air chemistry sub-models, and could provide surface estimates [59]. However, in view of the current wide application of MODIS AOD and the current research on AOD of vertical profiles not being enough, we have not explored the impact of column average on the experiment. This is the focus of our future work.
In this experiment, we mainly added meteorological and land cover data but, in fact, the relationship between AOD and PM 2.5 is affected by many factors, such as the temperature inversion and more parameters including meteorological, topographic and society parameters [60]. The influence of these factors is relatively complex, and the current research is not yet mature. In future work, we will try to introduce more factors into the regression and analysis.
In addition, the composition and structure of atmospheric pollutants are quite complex and diverse [61], such as SO 2 , NO 2 , CO, O 3 , etc. We will be committed in the research and retrieval of AOD and other more pollutants.

Conclusions
The satellite AOD data used in this experiment has superiority over the conventional DB/DT AOD in terms of resolution and accuracy. A higher resolution (1 km) satellite AOD data is used to ensure that the obtained PM 2.5 can reflect more accurate and detailed temporal and spatial characteristics. Additionally, the accuracy of MAIAC algorithm has been proved to be higher than DT/DB algorithm over dark surface. The experiment verified the feasibility of 1 km MAIAC AOD for PM 2.5 retrieval and the superiority over the 3 km MODIS AOD in terms of spatial resolution and retrieval accuracy.
MSVR proposed in this paper, is modified based on the traditional SVR for the regression of AOD and PM 2.5 and obtain the improvement of experiment accuracy. The results showed that the MSVR model could improve the accuracy of the regression from R 2 0.60 to 0.74 in 2017 and 0.66 to 0.78 in 2018 compared to the traditional SVR.
We introduced the commonly used meteorological parameters to reduce the influence of complex factors on PM 2.5 retrieval from satellite AOD to a certain extent. The integrated meteorological parameters and land cover data demonstrated that the appropriate auxiliary variables could improve the performance of PM 2.5 retrieval.
The experimental results also showed that PM 2.5 has obvious spatial and temporal differences. We analyzed the spatial and temporal distribution and characteristics of PM 2.5 in Hubei Province, and conducted the above analysis by season. We also analyzed the possible reason of such spatiotemporal differences.
In our future work, we will make efforts from three aspects. First, we will try to find satellites with higher resolution and aerosol retrieval algorithm with better performance. It is worth noting that, recently, there are studies that propose combining satellite remotesensing techniques and a newly established low-cost sensor network to estimate long-term PM 2.5 concentrations to increase the measurement density. Secondly, we will try to figure out the specific influence of meteorological, topographic and social factors on the distribution characteristics of PM 2.5 and the retrieval of PM 2.5 from satellite AOD. Thirdly, we will attempt to conduct longer time series and wider range of analyses of PM 2.5 distribution characteristics.