Spatio-Temporal Correlation Analysis of Air Quality in China: Evidence from Provincial Capitals Data

Abstract: In China, public health awareness is growing as people become more concerned about air quality. Based on the air quality index (AQI) of 31 provincial capital cities in China (2015–2018), we studied the spatio-temporal correlations of air quality between cities. With spatial, temporal and spatio-temporal analysis, we systematically obtained many interesting results where the traditional analyses may be lacking. Firstly, the air quality of cities has spatial spillover and agglomeration effects, and the spatial correlation becomes stronger over time. Secondly, there exists temporal correlation between the current AQI and its past values on multiple time scales, which shows a certain periodicity. Thirdly, due to the changing characteristics of time, social activities and other factors affect the air quality positively. However, with the panel data model, the coefficients of spatio-temporal correlation vary for different cities.


Introduction
Due to the rapid growth of industrialization and urbanization, the consumption of energy such as coal, natural gas and oil has increased rapidly in China. The air pollution is getting worse and worse [1]. The pollution problem has seriously affected people's quality of life and the public's health and has even caused certain obstacles to the sustainable development of the entire country. How to face and effectively deal with air pollution has become the focus of the whole society and its proper solution will significantly improve the living environment of the people and promote the harmonious development of social ecology [2].
The traditional perspective of research is to study the impact of specific pollutants on the urban air quality or to analyze the distribution characteristics of air quality only from the perspective of time or space [3]. However, spatial and temporal correlation analysis should be used in the study of air quality in Chinese cities. Spatial and temporal correlation analysis of air quality refers to the statistical analysis of the evolution characteristics of data sequences formed by the time evolution of points in all directions of space. In other words, it reflects the correlation of spatial and temporal data in time and space by studying the change law of spatial objects with time. From the perspective of the spatio-temporal correlation of air quality, this paper comprehensively analyzes the changing law of air quality among Chinese cities, which has certain significance for China to formulate relevant air pollution control policies.
This paper aims to reveal the spatial and temporal evolution of air quality of Chinese cities. It proceeds as follows: In Section 2, we review the literature and discuss the spatio-temporal correlations of air quality in cities. The methodology section describes the measurements, sample and data. Then, the following section presents the statistical results. Finally, we discuss the results and make some policy suggestions.

The Characteristics of AQI Time Distribution
Temporal variation of the AQI is usually examined on three scales: annual, quarterly and intraday. In the annual analysis, Xu, Liu and Wang [4] explored the change of AQI in Chinese cities from 2014 to 2016 and found that AQI showed a downward trend and the number of air-polluted cities decreased. Xiao, Tian and Xu et al. [5] discussed the spatio-temporal distribution characteristics of air quality in China over the past decade and found that the air quality is improving year by year. Guo et al. [6] found that the annual average mass concentration of PM2.5, PM10, SO2, CO and other pollutants decreased from 2015 to 2017 and that human factors were the main causes of the air quality changes.
In the quarterly analysis, most studies find that the temporal variation of air quality in Chinese cities is seasonal [7,8]. China has the northwest monsoon in winter and the southeast monsoon in summer, but the "leeward slope" on the east side of the Tai-hang Mountains and the semi-closed topography on the south side of Yanshan Mountain weaken the effect of the northwest monsoon in winter, so the conditions for pollutant diffusion are poor. At the same time, anthropogenic emissions cause particulate matter levels to increase dramatically in winter. Therefore, the proportion of particulate matter in polluted air is highest in winter, when fine particulate matter pollution is most serious. Xu, Liu and Wang [4] found that the seasonal ordering of the AQI in Chinese cities is: winter > spring > autumn > summer. Xia et al. [7] found that the average concentration of air pollutants on north wind days is higher than on south wind days. Zhang et al. [8] found that air pollution in China follows a U-shaped pattern over the year, with the major pollutants varying seasonally. Tian et al. [9] explored the air quality index and found that it is highest in winter and lowest in summer, mainly due to man-made emissions caused by heating demand and industrial production.
In the intraday analysis, many studies found that the daily value of Chinese AQI fluctuated during the day and was relatively stable at night [10]. He, Gong, Yu et al. [10] found that meteorological conditions are the main factors that determine the daily variation of pollutant concentrations.

The Characteristics of AQI Spatial Distribution
Many scholars found that the spatial autocorrelation model is an important method for studying the characteristics of spatial changes in air quality [11]. Tao and Wu [11] believed that the application of spatial autocorrelation model to air quality research can intuitively obtain the agglomeration region and distribution characteristics and reveal the spatial agglomeration pattern and the law of air pollutants. He, Han and Cui [12] distinguished global spatial autocorrelation from local spatial autocorrelation. When there exists no global autocorrelation, local autocorrelation can be used to find the covered local autocorrelation region. Global autocorrelation statistics compare the similarity of observations at adjacent spatial locations. Local spatial autocorrelation statistics indicate whether observations at each spatial location are correlated with the observations at adjacent locations. Fang et al. [13] and He et al. [14] used global and local Moran's I to measure the spatial autocorrelation of air quality index (AQI) values and found that AQI values had significant spatial dependence and heterogeneity. Dadhich et al. [15] used Moran's I and LISA for spatial analysis to determine the spatial correlation between air quality index and weather conditions in different wards in Jaipur. The results of Moran's I showed that there was a strong positive correlation among AQI and relative humidity, temperature and wind speed.
Many scholars believe that there is a spatial correlation between the air quality in Chinese cities [16]. Liu and Du [16] found that the air pollution in China is not randomly distributed, showing a significant spatial autocorrelation structure. Liu et al. [17] found that the statistical results of Moran's I showed that the AQI measurement values of Chinese cities had a positive spatial autocorrelation, that is, the spatial distribution of high AQI values tended to aggregate rather than disperse. Zhu, Zhang and Fu [18] showed that air quality as a whole presented positive spatial autocorrelation, which increased year by year. Xu, Tian, Liu et al. [19] used the ESDA-GWR model and found that the positive spatial correlation of AQI increased during the sample period. Xiao, Tian, Xu et al. [5] found that the national air quality presented significant spatial agglomeration and differentiation, which was manifested in the spatial pattern of "the north is heavier than the south, while the east is heavier than the west". Huang, Wang and Wei [20] used factor analysis, the linear regression model and the spatial auto-regression (SAR) model to obtain the global Moran's I test, which showed that the AQI in China had a significant positive correlation in space. The local Moran's I test showed that there were significant high-high AQI clusters ("high-high clusters" and "low-low clusters" represent the high (low) air quality in cities and the high (low) air quality in surrounding cities) around the Beijing-Tianjin-Hebei region, while low-low AQI clusters are all located in southern China, including Yunnan, Guangxi and Fujian.
Sustainability 2020, 12, 2486
Most studies analyzed the spatial change characteristics of air quality [21]. Li, Zhang, Wang et al. [21] asserted that China's terrain is high in the south and low in the north, and high in the west and low in the east, which has a wide impact on the distribution of meteorological conditions, affecting regional changes in air quality. Pu et al. [22] and Zhao et al. [23] found by spatial autocorrelation analysis that the AQI values of Chinese cities were spatially dependent. High AQI values were mainly distributed in the north and northwest, while low AQI values were mainly distributed in the south and the Qinghai-Tibet region. Xiao, Tian, Xu et al. [5] studied national air quality over the past 10 years. The conclusion was that the center of mass was mainly moving toward the northeast, showing that provinces in the eastern and northern regions had more severe air pollution than other provinces in the country. Using a geographic information system (GIS) and heat maps, Xu et al. [24] found substantial spatial variation in AQI, with the highest AQI values in the middle-east area of the north China plain. Liu, Wu and Yu [25] explored a major air pollution incident in northern China, finding that the Beijing-Tianjin-Hebei region was the epicenter of the problem. Burning straw in agricultural areas in the region appears to be one of the main drivers, along with coal burning and traffic emissions.

The Characteristics of AQI Spatio-Temporal Distribution
The distribution characteristics of the air quality from the perspective of either space or time have been extensively explored, but there were few studies on the dynamic changes of air quality from both time and space analyses. Luo [26] pointed out that the model that only used the information of two-dimensional data in time series or cross-sectional data failed to meet the needs to analyze the actual economic problems in many cases. Boyce et al. [27] believed that almost all ecological and behavioral data were self-related in space and time, and the structure of these data can be understood from the autocorrelation model. Panel data analysis is an important way to combine the two. Liu, Wang and Yang [28] concluded that there were significant spatial spillover and dynamic effects in the spatial and temporal dimensions of urban air pollution, respectively.

Data Sources
This paper selects AQI data of 31 provincial capital cities in China from 2015 to 2018 to analyze the spatio-temporal correlation of the air quality. AQI is a dimensionless index that quantitatively describes the condition of air quality and can reflect the air quality level within a certain period in certain regions. The value ranges from 0 to 500. The higher the AQI value, the worse the air quality. All the data are obtained from the web: https://www.AQIstudy.cn. We use the 24-h average of a natural day as the daily data, the average in one month as monthly data, as well as the quarterly data and the annual data similarly.
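The aggregation described above (24-h averages as daily data, then monthly, quarterly and annual averages) can be sketched as follows. This is a minimal illustration, assuming the raw data are an hourly pandas Series indexed by timestamp; the function name `aggregate_aqi` and the synthetic values are hypothetical and are not taken from the data source.

```python
import pandas as pd


def aggregate_aqi(hourly):
    """Aggregate an hourly AQI series to daily, monthly, quarterly and annual means.

    `hourly` is assumed to be a pandas Series with a DatetimeIndex.
    """
    # Daily value: the 24-h average of each calendar day.
    daily = hourly.groupby(hourly.index.normalize()).mean()
    # Coarser scales are averages of the daily values.
    monthly = daily.groupby([daily.index.year, daily.index.month]).mean()
    quarterly = daily.groupby([daily.index.year, daily.index.quarter]).mean()
    annual = daily.groupby(daily.index.year).mean()
    return daily, monthly, quarterly, annual


# Synthetic example (hypothetical values): each hour carries its day-of-year
# number, so the expected averages are easy to check by hand.
index = pd.date_range("2018-01-01", periods=24 * 59, freq=pd.Timedelta(hours=1))
hourly = pd.Series(index.dayofyear.astype(float), index=index)
daily, monthly, quarterly, annual = aggregate_aqi(hourly)
```

Averaging the daily values, rather than the raw hourly values, keeps each day equally weighted even when some hours are missing; whether the original study did the same is an assumption.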

Time Autocorrelation Model
In this paper, the autocorrelation function is naturally used to analyze the temporal correlation between different data points. The autocorrelation function is given as follows.
It ranges from -1 to 1. We also use the autocorrelation function to explore the time change of the air quality, as the time evolution characteristics of the AQI can be studied by time autocorrelation model [29][30][31][32]. Further, the temporal autocorrelation is estimated on multiple time scales, including the intraday, quarterly and annual analyses.
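For reference, the sample autocorrelation at lag k is commonly written in the following form. This is the standard textbook expression, given here under the assumption that it matches the function used in this paper; the symbols (n for the series length, x_t for the AQI at time t, and x-bar for the sample mean) are introduced only for this sketch.

```latex
\rho_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}
              {\sum_{t=1}^{n} (x_t - \bar{x})^2},
\qquad k = 0, 1, 2, \ldots
```

With this form, the value at lag 0 equals 1, and by the Cauchy–Schwarz inequality every value lies between -1 and 1.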

Spatial Autocorrelation Model
Spatial autocorrelation measures whether there exists a correlation between the observed value at a certain point in space and the values at its neighboring points. The corresponding statistics are used to represent the degree of correlation and the aggregation mode of geographic variables [33]. Global Moran's I and local Moran's $I_i$ are the most commonly used spatial autocorrelation statistics for testing regression residuals. They belong to the general category of quadratic ratios [34]. The sign of the statistic indicates whether the spatial autocorrelation is positive or negative. The larger the absolute value of the statistic, the stronger the spatial autocorrelation. The global Moran's I index measures global spatial correlation, which is shown as follows.
Here $n$ is the total number of cities; $x_i$ and $x_j$ are the observations of the urban environmental monitoring points in cities $i$ and $j$, respectively; $\bar{x}$ is the average of $x$; and $W_{ij}$ are the elements of the spatial weight matrix, with $W_{ij} = 1$ when the distance between region $i$ and region $j$ is less than a threshold $d$ and $W_{ij} = 0$ otherwise. The global Moran's I takes values in [−1, 1]. When I is positive, the urban air quality is agglomerated. When I is negative, the urban air quality presents discrete characteristics; when I is equal to 0, the urban air quality is spatially uncorrelated and distributed randomly and irregularly.
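The definitions above can be sketched in code as follows. This is a minimal illustration that assumes the commonly used form I = (n / S0) * sum_ij W_ij (x_i - x̄)(x_j - x̄) / sum_i (x_i - x̄)^2, where S0 is the sum of all weights; it also assumes planar coordinates with Euclidean distance and a zero diagonal in the weight matrix. The function names and the toy coordinates are hypothetical.

```python
import numpy as np


def distance_band_weights(coords, d):
    """Binary weights: W_ij = 1 if the distance between i and j is below d, else 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    W = (dist < d).astype(float)
    np.fill_diagonal(W, 0.0)  # assumption: a city is not its own neighbor
    return W


def global_morans_i(x, W):
    """Global Moran's I for observations x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()  # deviations from the mean
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)


# Toy example: four locations on a line, neighbors are adjacent locations.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
W = distance_band_weights(coords, d=1.5)
clustered = global_morans_i([1, 1, -1, -1], W)    # similar values adjacent
alternating = global_morans_i([1, -1, 1, -1], W)  # dissimilar values adjacent
```

In this toy case the clustered pattern gives a positive value and the alternating pattern a negative one, matching the interpretation of the sign given above. For real city locations given as latitude and longitude, a great-circle distance would be a more suitable choice than the Euclidean distance assumed here.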
To observe how the spatial relationship differs across spatial locations, the local spatial autocorrelation measured by the local Moran's $I_i$ is used in this study. It can effectively measure the degree of spatial correlation between the observation location and the surrounding cities. The local Moran's $I_i$ is calculated as follows.
Similarly, $n$ is the total number of cities; $x_i$ and $x_j$ are the observations of the urban environmental monitoring points in units $i$ and $j$, respectively; $\bar{x}$ is the average of $x$; and $W_{ij}$ is an element of the spatial weight matrix. $I_i$ describes the degree of spatial agglomeration between area $i$ and the surrounding areas, which can be divided into four association modes: high-high (HH), high-low (HL), low-high (LH) and low-low (LL). Among them, the HH and LL groups indicate that the air quality of a city is consistent with that of its neighboring cities; the HL and LH groups indicate that a city with high (low) air quality is surrounded by cities with low (high) air quality.
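The local statistic and the four association modes can be sketched as follows. This is a minimal illustration that assumes the common form I_i = (x_i - x̄) / S^2 * sum_j W_ij (x_j - x̄), with S^2 the mean squared deviation, and that classifies each city by the sign of its own deviation and of its neighbors' summed deviation. The function names and the toy weight matrix are hypothetical.

```python
import numpy as np


def local_morans_i(x, W):
    """Return local Moran's I_i, the deviations z_i and the spatial lags sum_j W_ij z_j."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    s2 = (z @ z) / len(x)  # assumption: variance with divisor n
    lag = W @ z            # summed deviations of the neighbors
    return z * lag / s2, z, lag


def association_mode(z_i, lag_i):
    """Label a city HH, HL, LH or LL from its own value and its neighbors' values."""
    own = "H" if z_i > 0 else "L"
    neighbors = "H" if lag_i > 0 else "L"
    return own + neighbors


# Toy example: four locations on a line, each adjacent pair are neighbors.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
I_local, z, lag = local_morans_i([1, 1, -1, -1], W)
modes = [association_mode(zi, li) for zi, li in zip(z, lag)]
```

Here "H" and "L" refer to values above or below the mean of the observed variable. Cities whose neighbors' deviations cancel out exactly (a lag of zero) fall on the boundary between modes; this sketch assigns them to "L" for the neighbor part, which is an arbitrary choice.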

Panel Data Model
Panel data has both time and section dimensions, which means that multiple sections are taken on the time series, and the sample data consists of sample observations selected on these sections at the same time. To illustrate these models and their effect estimates, this study builds a model based on the panel data from 31 provincial capitals in China between 2015 and 2018 [35,36].
The general regression equation of the panel data is as follows.
Here $i = 1, 2, \ldots, N$ indexes the $N$ cities and $t = 1, 2, \ldots, T$ indexes the $T$ periods; $Y_{it}$ is the explained variable, which represents the observed value of the $i$th individual at time $t$; $X_{\beta it}$ is the explanatory variable $\beta$ for the AQI observations of city $i$ at time $t$; $\alpha_{\beta it}$ is the parameter to be estimated; $\alpha_0$ is a constant term; and $\varepsilon_{it}$ is the random disturbance term.
Panel data is the combination of time series data and cross-sectional data. For time series data, it is necessary to test the stationarity of the data, which is a precondition of the estimation. Otherwise, spurious regression ("pseudo-regression") may occur, in which the fitted relationship does not reflect a true equilibrium relationship between the dependent and explanatory variables but is just a numerical coincidence. The most common test for the stationarity of a sequence is the unit root test. If there is a unit root, the sequence is a non-stationary time series. Non-stationary time series are handled by differencing, which eliminates the unit root and turns the sequence into a stationary one. In this way, the methods for stationary time series can be applied to the corresponding research.
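The idea of testing for a unit root and removing it by differencing can be sketched as follows. This is a minimal illustration using the basic (non-augmented) Dickey–Fuller regression with a constant, Δy_t = a + γ y_{t-1} + e_t, and is not necessarily the unit root test used in this paper. The function name, the synthetic random walk and the approximate 5% critical value of -2.86 (the asymptotic value for the case with a constant and no trend) are assumptions of this sketch.

```python
import numpy as np


def dickey_fuller_t(y):
    """t statistic of gamma in the regression diff(y)_t = a + gamma * y_(t-1) + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)  # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)   # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])


CRITICAL_5PCT = -2.86  # approximate asymptotic value, constant and no trend

# Synthetic example: a random walk has a unit root; its first difference does not.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=1000))
t_level = dickey_fuller_t(walk)
t_diff = dickey_fuller_t(np.diff(walk))
diff_is_stationary = t_diff < CRITICAL_5PCT
```

A t statistic below the critical value rejects the unit root. The differenced series is rejected decisively here, which is the behavior the differencing step relies on. For real AQI panels, an augmented test with lag selection, or a panel unit root test, would be a more careful choice.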
In this study, the temporal and spatial dimensions of the air quality changes are considered for the spatio-temporal analysis, and the panel data model is selected [37][38][39]. Because the panel data models exist in various forms, the final applicable model is determined by the F test and the Hausman test [40].
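After the mixed estimation model is established, the F test is used to determine whether an individual fixed effect model needs to be established. The two hypotheses of the F test are as follows.
H0: The intercepts are the same for all individuals, that is, the constraints hold (a mixed regression model should be established).
H1: The intercepts differ across individuals (an individual fixed effect model should be established).
Next, the F statistic is constructed as follows.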
where $SSE_r$ represents the sum of squared residuals of the estimated model after applying the constraints; $SSE_u$ represents the sum of squared residuals of the estimated model without constraints; $m$ represents the number of constraints; $T$ represents the sample size; and $k$ represents the number of estimated parameters in the unconstrained model. Under the null hypothesis that the constraints are true, the F statistic asymptotically follows the F distribution with $(m, T - k)$ degrees of freedom. When the calculated F value is greater than $F_\alpha(m, T - k)$, the null hypothesis is rejected and an individual fixed effect model should be established; otherwise, a mixed regression model should be established.
The Hausman test is often used to test whether the individual effect or the time effect in the model is directly related to the explanatory variables [41]. The hypotheses of the Hausman test are given below.
H0: Individual effects are not related to the regression variables (an individual random effect regression model should be established).
H1: Individual effects are related to the regression variables (an individual fixed effect regression model should be established).
Next, the W statistic is constructed as follows.
Here $b$ is the least squares dummy variable (LSDV) estimation vector of the regression coefficients; $\hat{\beta}$ is the generalized least squares (GLS) estimation vector of the regression coefficients (GLS is a common method to eliminate heteroscedasticity; its main idea is to add weights to the explanatory variables so that the variance of the regression equation after weighting is constant); and $\Omega$ is the variance of the difference between $b$ and $\hat{\beta}$. Hausman proved that under the null hypothesis, the statistic W follows the $\chi^2$ distribution with $k$ degrees of freedom.
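The two test statistics described above can be sketched as follows. This is a minimal illustration that assumes the standard forms F = ((SSE_r - SSE_u) / m) / (SSE_u / (T - k)) and W = (b - β̂)' Ω^{-1} (b - β̂). The synthetic panel, the function names and the argument name `n_obs` (standing for the sample size T) are hypothetical.

```python
import numpy as np


def fit_sse(y, X):
    """Ordinary least squares: return the sum of squared residuals and coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def f_statistic(sse_r, sse_u, m, n_obs, k):
    """F statistic comparing a constrained (r) with an unconstrained (u) model."""
    return ((sse_r - sse_u) / m) / (sse_u / (n_obs - k))


def hausman_w(b, beta_gls, omega):
    """Hausman statistic W = (b - beta)' Omega^{-1} (b - beta)."""
    diff = np.asarray(b, dtype=float) - np.asarray(beta_gls, dtype=float)
    return float(diff @ np.linalg.solve(np.asarray(omega, dtype=float), diff))


# Synthetic panel (hypothetical): 5 cities, 40 periods, strong city-specific intercepts.
rng = np.random.default_rng(0)
n_cities, n_periods = 5, 40
city = np.repeat(np.arange(n_cities), n_periods)
x = rng.normal(size=n_cities * n_periods)
alpha = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
y = alpha[city] + 2.0 * x + rng.normal(size=x.size)

# Constrained model: one common intercept (mixed regression).
X_pooled = np.column_stack([np.ones_like(x), x])
# Unconstrained model: one dummy per city (LSDV, individual fixed effects).
dummies = (city[:, None] == np.arange(n_cities)).astype(float)
X_lsdv = np.column_stack([dummies, x])

sse_r, _ = fit_sse(y, X_pooled)
sse_u, b_lsdv = fit_sse(y, X_lsdv)
F = f_statistic(sse_r, sse_u, m=n_cities - 1, n_obs=y.size, k=X_lsdv.shape[1])
```

Because the synthetic intercepts differ strongly across cities, F is very large and the mixed regression model would be rejected in favor of the individual fixed effect model. The sketch stops at the statistics; comparing them with the F and chi-square critical values requires a distribution table or a statistics library.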

The Multiscale Temporal Characteristics of AQI's Change
As representative examples, we select the AQI data of Beijing, Wuhan, Urumqi and Kunming for 2015–2018 and calculate the AQI averages. The annual results are shown in Figure 1a-d, where the dashed lines indicate the four-year AQI mean for each city. As a whole, the AQI values of the four cities showed a decreasing trend from 2015 to 2018, reflecting the significant improvement of China's urban air quality in recent years. The declines are largest in Beijing and Wuhan, at 29% and 23%, respectively. The annual results show that the air quality in various regions of China has continuously improved in recent years, indicating that the regulations issued by the government on air governance and other efforts have begun to bear fruit, especially in Beijing.
The main reasons for the seasonal change of the AQI may be as follows. In winter, temperatures are low and the amount of coal burned increases significantly because of heating needs; the pollutants released by fuel combustion, coupled with the dry climate and reduced precipitation, are not readily washed out of the atmosphere, so the AQI reaches its peak for the year. Coal-fired energy is converted to building space heating, which can produce a lot of haze pollution in winter. What is more, the annual increase in central heating capacity means that more and more heating-related pollutants are discharged into the atmosphere, leading to worse environmental quality [42]. Besides, northeast China is the main grain-producing area in China. Every spring and autumn, a large number of crop stalks are burned, which has a great impact on the air quality [43], so the air pollution in northeast China, represented by Shenyang, is worse in winter. In summer, increased precipitation, frequent rainfall and windy weather are conducive to the diffusion and dilution of pollutants, so the AQI in summer is the lowest of the year. In spring, wind-blown sand and sandstorms bring a lot of dust, so the air quality in many areas is poor. Compared with summer, autumn is drier, and the burning of plant straw releases a lot of pollutants, which lowers the air quality in autumn, although it remains better than in winter and spring.
Further, we selected Shanghai, Taiyuan, Harbin and Changchun, representing four regions, and used their intraday AQI data for 2018, taking one data point every three hours to calculate the intraday changes of the AQI in the four regions. The AQI of the four regions from 21:00 to 00:00 the next day is at a high level relative to the rest of the day (shown in Figure 3). This is because, at night, the ground cools and the air near the surface becomes colder than the air above it, forming an "inversion layer" that is not conducive to the spread of pollutants. A stable atmosphere at night significantly impedes the diffusion of air pollutants. After 6 am, the AQI gradually increases, with a small peak around 9 am. The AQI falls slightly in the afternoon, but rises again after 6 pm. This is because traffic flow drives the daily peaks: more developed cities such as provincial capitals experience morning and evening commutes, and a large amount of vehicle exhaust increases the air pollution [44].
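The three-hour intraday profile described above can be sketched as follows. This is a minimal illustration, assuming the input is a complete hourly series starting at midnight; the function name and the synthetic values (a base level with hypothetical late-evening and morning increases) are not taken from the data.

```python
import numpy as np


def three_hour_profile(hourly, hours_per_point=3):
    """Average an hourly series into 3-hour points, then average each point over days."""
    hourly = np.asarray(hourly, dtype=float)
    days = hourly.size // 24
    points_per_day = 24 // hours_per_point
    blocks = hourly[: days * 24].reshape(days, points_per_day, hours_per_point)
    per_day = blocks.mean(axis=2)  # shape (days, 8): one value per 3-hour window
    return per_day.mean(axis=0)    # mean intraday profile over all days


# Synthetic week of hourly values (hypothetical): higher late in the evening,
# with a smaller rise in the morning.
hours = np.tile(np.arange(24), 7)
hourly_aqi = 60 + 15 * (hours >= 21) + 8 * ((hours >= 6) & (hours < 9))
profile = three_hour_profile(hourly_aqi)
start_hours = np.arange(0, 24, 3)
peak_start = int(start_hours[np.argmax(profile)])
```

The resulting profile has eight values per day, one for each three-hour window, and the window with the largest mean identifies the time of day at which the AQI is highest on average.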

Time Autocorrelation Analysis of AQI
In this section, we use the 24-hour AQI data of the 31 provincial capital cities in 2018. To eliminate duplicate values caused by the slow change of the AQI and to limit the effect of missing data, we took one data point every 3 hours. We obtained the AQI time series of each region in 2018 and estimated the correlation coefficient matrix with the time autocorrelation model. Figure 4 shows the correlation coefficients of four representative regions: Beijing, Hohhot, Hangzhou and Nanning.
The curves of the correlation coefficient between the AQI and its lagged values, for lags of up to 25 periods in the four regions, show that the correlation coefficient decreases overall as the lag increases. Here an inflection point refers to a lag at which the correlation coefficient rises slightly against this overall decline, and periodicity can be judged from the inflection points. The correlation coefficient curves have inflection points at certain lags, which indicates that urban air quality in China has a time correlation effect and shows a certain periodicity. In particular, when the lag is 8, 16 and 24, the correlation coefficient shows a small increase or inflection point (marked with arrows in the figure), that is, a periodicity of 8 lag periods. As each data point covers 3 h, a lag of 8 corresponds to 24 h.
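The way a daily cycle shows up as small rises at lags 8, 16 and 24 can be sketched as follows. This is a minimal illustration on a synthetic, noise-free series with one data point every 3 h; the slow 30-day component, the amplitudes and the function names are hypothetical and do not reproduce the actual AQI data.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([d[: len(d) - k] @ d[k:] / denom for k in range(max_lag + 1)])


def local_peaks(r):
    """Lags at which the coefficient is higher than at both neighboring lags."""
    return [k for k in range(1, len(r) - 1) if r[k] > r[k - 1] and r[k] > r[k + 1]]


# Synthetic series: 360 days at 8 points per day. A slow 30-day swing makes the
# correlation decay gradually; a 24-h cycle (8 points) adds small periodic rises.
t = np.arange(360 * 8)
series = 100 + 20 * np.cos(2 * np.pi * t / 240) + 10 * np.cos(2 * np.pi * t / 8)
r = autocorr(series, max_lag=25)
peaks = local_peaks(r)
```

In this synthetic case the coefficients decline overall while rising slightly at every eighth lag, so the spacing of the local peaks recovers the 24-h period. On real data with noise, the rises are smaller and may appear only as a slowdown of the decline rather than a strict local peak.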
Through the empirical study of time autocorrelation, this paper can make a preliminary judgment. There exists temporal correlation between the current AQI and its past values on multiple time scales. The correlation decreases as the time lag increases and shows a periodicity with a period of one day.
The reason why the degree of correlation decreases as the lag increases is not difficult to understand. With the passing of time, the various factors affecting AQI, such as PM2.5 and PM10, constantly change, so the correlation between them and the air quality at a fixed time point gradually decreases. The slight increase in the correlation coefficient at the same time point every day may arise because factors that affect air quality, such as climatic conditions and human social activities, are similar at the same time of day. For example, during the morning and evening rush hours, the increase in automobile exhaust and the emission of polluting gases from factories during working hours aggravate air pollution and raise the AQI. On the other hand, in the early morning of every day, far fewer vehicles are on the road and most factories are not operating, so the AQI decreases.
For other cities, we obtained very similar results, as reported in Table 1. The correlation coefficients between AQI values and their lags in 12 cities decrease as the lag increases. In most cities, the correlation coefficient shows a slight increase or inflection point at the 8th, 16th and 24th lags. In Beijing (BJ) and Hangzhou (HZ), the rate of decrease at the 8th and 16th lags slows significantly or flattens compared with the preceding lag, which appears as an inflection point in the graph and indicates obvious periodicity.

The Characteristics of Spatial Distribution of AQI
Due to the differences in natural conditions such as altitude, topography, sea and land location and climate, the air quality in China's provincial capitals is better in the south than in the north, and better in the coastal areas than in the interior.
As shown in Figure 5, the air quality of coastal cities represented by Haikou and Fuzhou is better than others in 2018 because the southern part of China has a tropical or subtropical monsoon climate with relatively flat terrain, and the humid monsoon from the ocean in summer is conducive to the diffusion of urban pollutants. The air quality of cities in the high-altitude areas represented by Kunming and Lhasa also presents a better level. This is because there is little human activity in the high-altitude southwest, so the environment in this area has not been destroyed by human beings.
The cities with poor air quality are concentrated in the northeast, north and northwest of China. Northeast China, represented by Changchun, is China's heavy industrial base and consumes large amounts of coal and oil, so it has serious air pollution problems. North China, represented by Shijiazhuang, has low vegetation coverage, which offers little protection against dust from northwest China. There is a potential relationship between the air quality and topography of the North China Plain and the East Asian monsoon: when westerly and southerly winds blow through the north and east of the region, regional air pollutants accumulate [45]. Moreover, the low-lying terrain of north China gives poor conditions for pollutant diffusion. North China also has a high population density and has undergone rapid economic development, industrialization and urbanization, so the damage to the environment is relatively serious. Cities in northwest China, such as Urumqi, lie far from the sea, so water vapor rarely reaches them. Precipitation is therefore low, vegetation is scarce and the natural environment is harsh. All these factors lead to poor air quality in northwest China.

Spatial Correlation Analysis of AQI
We use the daily air quality index data of the 31 provincial capitals from 2015 to 2018 to obtain the annual average value for each city, then calculate the global Moran's I and the local Moran's I_i of each city for every year. We then draw the Moran scatter plots and the LISA cluster maps.
As shown in Table 2, the Moran's I from 2015 to 2018 is always positive, between 0.3 and 0.5. The value peaks at 0.458 in 2016 and falls to 0.384 in 2018. This indicates an obvious positive spatial correlation in China's air quality: the air quality of a city is closely related to that of its neighboring cities. To further study the local spatial autocorrelation of each region and explain the spatial aggregation pattern of different provinces and cities, we analyze the scatter plots of the local Moran's I_i.
The local spatial autocorrelation analysis is carried out through the Moran scatter plots shown in Figure 6. The variable Z is the standardized AQI, and the spatial lag Wz is the weighted average of Z over the neighbors of each observation. The AQI samples in China are mostly concentrated in the first quadrant (H-H) and the third quadrant (L-L), which indicates that most provinces in China are spatially dependent, while the rest are spatially heterogeneous.
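As a minimal sketch of these two quantities, the following code computes the global Moran's I in its standard form, I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², and assigns each city to a Moran scatter plot quadrant from the signs of Z and Wz. The binary neighbor matrix, the four-city example and the function names are assumptions for illustration; the paper's actual spatial weight matrix is not specified in this section.

```python
from math import sqrt


def global_morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def moran_quadrants(values, weights):
    """Label each unit H-H, L-L, H-L or L-H from standardized Z and lag Wz.

    Assumes every unit has at least one neighbor (non-zero row sum).
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    labels = []
    for i in range(n):
        # Spatial lag: weighted average of the neighbors' standardized values.
        lag = sum(weights[i][j] * z[j] for j in range(n)) / sum(weights[i])
        labels.append(("H" if z[i] > 0 else "L") + "-" + ("H" if lag > 0 else "L"))
    return labels


# Hypothetical example: four cities on a line, each adjacent to the next.
example_aqi = [120.0, 110.0, 50.0, 40.0]
example_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(round(global_morans_i(example_aqi, example_weights), 3))  # 0.4
    print(moran_quadrants(example_aqi, example_weights))
```

In this toy example the two high-AQI cities fall in the H-H quadrant and the two low-AQI cities in the L-L quadrant, giving a positive global Moran's I.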
H-H and L-L aggregation areas contain cities with a high (low) AQI whose surrounding cities also have a high (low) AQI. H-L and L-H aggregation areas contain cities with a high (low) AQI surrounded by cities with a low (high) AQI; for example, the AQI in an H-L area is high while the AQI in the surrounding cities is low. As Figure 7 shows, the H-H aggregation areas are mainly distributed in the central part of north China, in cities such as Beijing, Shijiazhuang, Zhengzhou, Jinan and Taiyuan. The AQI of these cities is high, and that of the surrounding cities is also high. Cities in the L-L aggregation areas are mainly distributed in parts of the northeast and the southern coastal areas, such as Changchun and Guangzhou. In addition, cities in the southwest plateau region, such as Kunming and Guiyang, also fall in the L-L area. This distribution arises because these cities either lie in coastal areas, where pollutants diffuse easily, or on the plateau, where human activity is low. Few cities fall in the H-L and L-H aggregation areas; they are mainly concentrated in northwest China and the Yangtze River Delta, for example Shanghai and Hangzhou.
From the above analysis, urban air quality in China shows an obvious tendency toward aggregation. Air pollution control should therefore focus on the H-H aggregation areas, to prevent the diffusion of pollutants from further degrading the air quality of surrounding cities, while maintaining the good air quality of the L-L aggregation areas.

The Spatio-Temporal Characteristics of AQI
As Figure 8 shows, the country's air quality improved overall from 2015 to 2018, and the number of cities with poor air quality decreased significantly. The improvement is most obvious in northeast China, for example in Heilongjiang, Changchun and Shenyang, where the AQI dropped by more than 30% from 2015 to 2018. The reason is that the state attaches great importance to revitalizing the old industrial base in northeast China, promoting industrial upgrading in the area and developing more environmentally friendly, lower-consumption high-tech industries, which reduces the consumption of coal and oil and gradually improves the air quality. The air pollution in Shijiazhuang has remained severe, with the AQI above 100 from 2015 to 2018. This is because the Beijing-Tianjin-Hebei integration policy has relocated polluting enterprises from Beijing to Shijiazhuang, and the transformation and upgrading of polluting industries in Shijiazhuang cannot be completed in a short time. Besides, the pillar industries of Shijiazhuang are high-pollution industries represented by steel manufacturing, which has prevented an effective improvement of its air quality in recent years [46].

Spatio-Temporal Correlation Analysis of AQI
Table 3. Results of unit root test (columns: Method, Statistic, Prob., Cross-sections, Obs; null hypothesis: unit root, assuming a common unit root process).

In Table 3, the p-values of the unit root tests are all less than 0.05; thus the null hypothesis of a unit root is rejected. At the 5% significance level the data do not have a unit root, so the sequence is stationary. According to the autocorrelation and partial autocorrelation of the panel AQI values in Table 4, the autocorrelation (AC) and partial autocorrelation (PAC) coefficients of the first eight lags are all positive, while the PAC value of the ninth lag is negative, which indicates that the ninth lag is not significant. Therefore, lag 8 is initially selected as the lag order of the model, and this choice is checked with the AIC and SC criteria. According to Table 5, the AIC and SC values attain their minimum at lag 8, so lag 8 is the best lag order.
After verifying the stationarity of the data, we need to select the type of panel data model. Panel data models usually take one of three forms. The first is the pooled regression model: if there is no significant difference between individuals over time and no significant difference between cross sections, the panel data can be pooled directly and the parameters estimated by ordinary least squares. The second is the fixed effects regression model: if the model intercepts differ across cross sections or across time periods, the regression parameters can be estimated by adding dummy variables to the model. The third is the random effects regression model: if the intercept term of the fixed effects model includes the average effect together with the random error terms of the cross section and of time, and both random error terms are normally distributed, then the fixed effects model becomes the random effects model. To choose among these forms, we first use the F test to decide between the mixed (pooled) model and the fixed effects model, and then use the Hausman test to decide between the random effects model and the fixed effects model.
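The F test between the pooled model and the individual fixed effects model can be sketched as follows. This is a simplified illustration with a single regressor (for example the first lag of AQI) rather than the eight lags used in the paper; the function names and the toy panel are hypothetical. The statistic is F = [(RSS_pooled − RSS_fe) / (N − 1)] / [RSS_fe / (n − N − K)], where N is the number of cities, n the number of observations and K the number of regressors (here K = 1).

```python
def _rss_simple_ols(x, y):
    """Residual sum of squares of y = a + b*x fitted by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return sum(((yi - mean_y) - slope * (xi - mean_x)) ** 2 for xi, yi in zip(x, y))


def fixed_effect_f_statistic(panel):
    """Return (F, df1, df2) for pooled OLS versus individual fixed effects.

    `panel` maps a city label to a pair (x, y) of equal-length lists.
    """
    all_x = [v for x, _ in panel.values() for v in x]
    all_y = [v for _, y in panel.values() for v in y]
    rss_pooled = _rss_simple_ols(all_x, all_y)

    # Within transformation: subtract each city's own means, which is
    # equivalent to giving every city its own intercept (dummy variable).
    dx, dy = [], []
    for x, y in panel.values():
        mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
        dx.extend(xi - mean_x for xi in x)
        dy.extend(yi - mean_y for yi in y)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    rss_fe = sum((b - slope * a) ** 2 for a, b in zip(dx, dy))

    df1 = len(panel) - 1                 # number of intercept restrictions
    df2 = len(all_x) - len(panel) - 1    # n - N - K with K = 1
    f_stat = ((rss_pooled - rss_fe) / df1) / (rss_fe / df2)
    return f_stat, df1, df2


# Hypothetical toy panel: same slope, clearly different intercepts.
toy_panel = {
    "A": ([1.0, 2.0, 3.0, 4.0], [10.1, 10.9, 12.1, 12.9]),
    "B": ([1.0, 2.0, 3.0, 4.0], [20.1, 20.9, 22.1, 22.9]),
}

if __name__ == "__main__":
    print(fixed_effect_f_statistic(toy_panel))
```

A large F value relative to the F(df1, df2) distribution favors the fixed effects model; the p-value could be obtained, for instance, from `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.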

Mixed Effect Model
The results of the mixed effect model test are shown in Table 6. Of the eight explanatory variables, seven have p-values less than 0.05, which indicates that most of these variables are significant at the 95% confidence level. The regression results of the mixed effect model can be given by:

Individual Fixed Effect Model
The results of the individual fixed effect model test are shown in Table 7. Of the first eight explanatory variables, seven have p-values less than 0.05, which again indicates that most of these variables are significant at the 95% confidence level. The regression results of the individual fixed effect model are given by: BJ, HK, NJ and XN are regional dummy variables, each of which takes the value 1 when the corresponding region is analyzed and 0 otherwise. The coefficients of the dummy variables of each city are shown in Table 7.

Fixed Effect Test
The fixed effect test compares two hypotheses. Under the null hypothesis H0, the intercepts are the same for all cities and the mixed model applies; under the alternative hypothesis H1, the intercepts differ across cities and the individual fixed effect model applies. According to Table 8, the p-value of this test is less than 0.05, so it is more reasonable to reject the null hypothesis H0 and establish an individual fixed effect regression model. According to Table 9, the p-value of the Hausman test is also less than 0.05, so it is more appropriate to reject its null hypothesis H0 and establish an individual fixed effect model rather than a random effect model. Finally, the fixed effect panel data model is established, as given by Equation (8). From Equation (8), the regression coefficients on the lagged terms from the first to the eighth lag are 0.61, 0.08, 0.03, 0.01, 0.003, 0.02 and 0.08, respectively. This indicates that the first lag has by far the strongest correlation with the base period, followed by the second and eighth lags. The first and second lags correlate most strongly because they are closest in time to the base period: the factors affecting the AQI do not change significantly within three or six hours, so the AQI remains highly correlated with the base period. The coefficient on the eighth lag is markedly higher than those on the lags immediately before it. This is because the eighth lag is 24 h away from the base period, so the factors influencing the AQI, such as natural conditions and human activities, are more similar to those of the base period.
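For orientation, the specification discussed here is an autoregression of the current AQI on its first eight lags with one dummy variable per city. A generic sketch of that form is given below. The symbols \beta_k, \alpha_j, D_{j,i} and \varepsilon_{i,t} are notation assumed for this sketch, and the fitted values are those the text reports from Equation (8) and Table 7; they are not reproduced here.

```latex
\mathrm{AQI}_{i,t}
  = \sum_{k=1}^{8} \beta_k \, \mathrm{AQI}_{i,t-k}
  + \sum_{j=1}^{N} \alpha_j \, D_{j,i}
  + \varepsilon_{i,t},
\qquad
D_{j,i} =
\begin{cases}
  1, & j = i, \\
  0, & \text{otherwise.}
\end{cases}
```

Here i indexes the city, t the 3-hour observation period, \beta_k is the coefficient on the k-th lag and \alpha_j is the city-specific intercept carried by the dummy variable of city j.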
The absolute values of the coefficients of the regional dummy variables in Equation (8) reflect how strongly a city's geographical location and its surrounding cities influence its air quality. Shijiazhuang (SJZ) has the largest absolute dummy variable coefficient, 5.8, which shows that its air quality is the most influenced by its geographical location and by the air quality of its surrounding cities. The reason is that SJZ is low-lying, which hinders the diffusion of air pollutants, and the air quality in the surrounding areas is generally poor, which aggravates the air pollution in SJZ. Chengdu (CD) has the smallest absolute dummy variable coefficient, with a value of −0.13. This is due to CD's location in the Sichuan Basin, which isolates it from the surrounding area and makes it less affected by the air quality of other cities.
To sum up, this part yields the following conclusions. The correlation coefficients between the air quality and its lags are ordered as: lag of 3 h > lag of 6 h > lag of 24 h. Moreover, because of differences in geographical location and in the air quality of surrounding cities, the dummy variable coefficients differ significantly across cities.

Conclusions and Discussion
This paper studies the air quality of 31 provincial capitals in China from 2015 to 2018. From temporal, spatial and spatio-temporal analysis, the following conclusions are drawn:

The Characteristics of Air Quality of Chinese Cities Can Be Analyzed on Multiple Time Scales
At the annual scale, China's overall air quality improved from 2015 to 2018. This is closely related to the vigorous pollution control work of the national and local governments. At the seasonal scale, air quality is best in summer, followed by autumn and then spring, and worst in winter. This result is consistent with previous studies. Winter has less precipitation and drier air; it is also the heating period, when pollutant emissions increase and aggravate air pollution. In summer, by contrast, higher precipitation, humidity and temperature favor the dilution and diffusion of pollutants, which effectively improves air quality. However, the differences between the AQI in spring, summer and autumn gradually decrease: with the continuous deepening of governance, the gap between seasons has narrowed. At the daily scale, the AQI reaches a low level at about 6 a.m., because human activity and pollutant emissions are reduced at night. The current AQI of Chinese cities is correlated with its lagged values, and the correlation decreases as the lag increases. This is because the influence of the various factors affecting the AQI gradually weakens, so the correlation with the air quality at a fixed time point gradually decreases. At the same time, there is a periodicity with a period of one day, which means that the correlation between the same time points on different days is strong. As shown in Table 1 above, the correlation coefficients between the AQI values and their lags in the twelve cities decrease as the lag increases. In most cities, such as Guangzhou, Kunming and Changchun, the correlation coefficients show a slight increase or an inflection point at the 8th, 16th and 24th lags, indicating obvious periodicity.

The Air Quality of Provincial Capitals in China is Better in the South than in the North, and Better in the Coastal Areas than the Inland Areas
This is mainly related to factors such as climate type, topography and economic development level. In north China the terrain is low-lying, so pollutants are difficult to disperse; economic development and industrialization are fast, and environmental damage is more serious. Southern coastal cities, represented by Fuzhou, mostly have tropical or subtropical monsoon climates and relatively flat terrain, and the humid summer monsoon from the ocean is conducive to the diffusion of pollutants. There is an obvious positive spatial autocorrelation of air quality in China, and urban air quality shows an obvious tendency toward aggregation. Cities with a high degree of air pollution have a negative impact on the air quality of surrounding cities.
Given this result, the energy structure should be optimized and the proportion of green energy increased. China's energy consumption is still dominated by coal, and burning coal for heating or power generation raises the concentration of air pollutants. China therefore needs to change its traditional energy consumption structure, reduce coal consumption, raise the proportion of natural gas and vigorously develop renewable energy sources such as wind, solar and geothermal energy. The government should take action to alleviate air pollution in some typical cities, as well as in winter [47]. It should promote the policy of "replacing coal with electricity" in northern China to reduce the use of coal for winter heating. Besides, promoting new energy vehicles is an efficient way to control urban vehicle emissions and improve urban air quality. A survey of public attitudes towards air pollution control found that respondents in China are very concerned about air pollution and believe that improving air quality is the duty of all citizens. When necessary, they are willing to replace traditional energy vehicles with new energy vehicles [48].
The number of provinces with poor air quality decreased significantly, while the number of provinces with good air quality increased significantly compared with 2015. This overall conclusion is consistent with the findings of other authors. Northeast China, for example Heilongjiang, Changchun and Shenyang, has seen the most significant improvement in air quality, while Shijiazhuang still has a serious air pollution problem. The reason is that the country attaches importance to revitalizing the old industrial bases in the northeast, promotes industrial upgrading in the three northeastern provinces, develops more environmentally friendly and low-consumption high-tech industries and reduces the consumption of coal and oil, so the air quality has gradually improved. On the other hand, due to the Beijing-Tianjin-Hebei integration policy, Beijing's polluting enterprises have been relocated to Shijiazhuang, and the transformation and upgrading of high-pollution industries in Shijiazhuang cannot be completed in a short time.

The Correlation Coefficients of Chinese Urban Air Quality Depend on the Lag Period
The correlation coefficients are ordered as follows: lag of 3 h > lag of 6 h > lag of 24 h. Moreover, because of the relative location of different regions and the influence of climate and other factors, the absolute values of the dummy variable coefficients differ greatly across regions. The result suggests a strong correlation between the air quality of different provinces in China, especially in north China including Beijing. Cities in north China need to work together to tackle air pollution, which is a regional problem rather than a local one [49]. We believe that effective air pollution control policies should be regional, and that addressing air pollution more effectively requires cross-border cooperation between regional and local governments [50]. Air pollution control should therefore involve regional joint action: breaking the limits of administrative regions and establishing a regional air pollution control agency. It is short-sighted to deal with air pollution in one region at the cost of the environmental benefits of the surrounding regions, so it is our duty to achieve coordinated regional air pollution control.
While this study has explored the correlation between the air quality in Chinese cities and its lagged values from the perspective of space and time, it still has deficiencies, which can be addressed in future research. Because the study lacks concentration data on the factors leading to air pollution, it cannot highlight the close relationship between air quality and human factors or natural factors such as temperature, vegetation and terrain, so the conclusions should be interpreted with caution. Wang et al. [51] studied the spatial and temporal patterns and dynamics of urban aerosols using aerosol optical depth (AOD) data in Phoenix and Los Angeles. Their correlation analysis shows that, compared with urban land use and vegetation, topography is the main factor affecting the spatial pattern of AOD. The impact of human factors such as urbanization on air pollution varies with the existing landscape, which significantly alleviates aerosol concentrations. In addition, plants release fine aerosols, which can reduce air pollution. Generally speaking, the temporal variation and regional differences in air pollutant concentrations result from the combined effects of meteorological conditions, landform, vegetation coverage, human activities and the level of socio-economic development. The influence of natural and anthropogenic factors on the distribution of urban air quality needs further study.