1. Introduction
In the past few years, the air pollution problem in China, especially in Beijing, has been so severe that it has received widespread attention from all over the world. Cities are areas of dense economic activity and, therefore, dense population, and Beijing is the political and economic center of China. After a stage of radical pursuit of economic growth, improving air quality and the overall living environment is the current focus of China’s realization of green development. Therefore, it is important to study Beijing’s air quality issues to find ways to tackle air pollution problems and provide a reference for other cities.
The correlation between human activities and the atmospheric system in urban ecosystems has been increasing year by year [1]. Domestic research on air quality conditions began in the 1990s, later than research abroad [2]. In terms of air quality characteristics, some researchers have studied the temporal and spatial distribution of China’s AQI, finding that the national air quality situation shows a spatial clustering effect. High-pollution and low-pollution regions show a pattern of north–south differentiation, and the overall air quality of the country is characterized by slightly lighter pollution in the south and lighter pollution in the east [3,4]. The AQI showed a downward trend from 2016 to 2019, showing a “U” shape in the middle of the month [5]. The state has put forward clear pollution control requirements. Many local government departments regard pollution prevention and control as their primary task, and air quality monitoring has become an urgent need. Therefore, an accurate air quality forecast system can reflect air conditions promptly and provide preparation information for the Ministry of Environmental Protection [6].
Statistical forecasting analyzes data through mathematical modeling, using correlation analysis [7], multiple regression [8,9], principal component analysis [10], the gray model [11], the fuzzy comprehensive evaluation method [12], harmonic regression [13], and other methods to predict air quality. However, because of the long forecast period, it is difficult for these methods to provide air quality data in a timely manner. With the development of data collection and processing technology and the integration of various disciplines, data mining and machine learning methods have been used to analyze environmental information, obtain timely and accurate air information, and provide guiding suggestions [14,15,16,17,18].
First, this paper compares Beijing’s AQI data with the concentration data of six major pollutants from 2019 to 2021, comprehensively evaluates the city’s air quality, and explores the factors affecting it. Second, it uses time series models and data mining methods to establish predictive models. The ARIMA model is constructed based on the time series data of AQI, and the three-layer neural network model is constructed based on the daily average data and the concentration data of the six major pollutants. The study shows that the two models are effective for short-term AQI forecasting and analysis. Furthermore, this paper makes a long-term forecast of Beijing’s air quality index based on the seasonal ARIMA model and compares it with the short-term forecast to draw a comprehensive conclusion, which could provide a reference for relevant departments in urban air and environmental governance.
Compared with previous research on air quality, this paper not only combines visual analysis with a time series model but also considers the delayed effect of air pollution. A long-term forecast of the air quality index is added to the short-term forecast, which makes the results more convincing and representative. Additionally, this paper includes a cluster analysis of the air quality index of Beijing in different periods and a multi-layer perceptron (MLP) neural network model, based on the built-in algorithm of data mining technology, to classify and evaluate the air quality level of the city. Finally, the classification rules of the six pollutants are used to explore a classification model with high accuracy, and a comprehensive comparison is made with the previous descriptive analysis, which helps to avoid the chance results and errors caused by the use of a single method. The research results are timely and have strong practical significance.
4. Long-Term AQI Forecast Based on Seasonal Model
The previous visual analysis of Beijing’s air quality shows that the AQI of Beijing has obvious seasonal characteristics. The preceding sections made a short-term forecast from the daily AQI data; this section makes a long-term forecast of the Beijing AQI based on the seasonal ARIMA model. Therefore, 83 monthly air quality data points from January 2015 to November 2021 were selected as the experimental data, the data from December 2021 to February 2022 were used as the data set to verify the model fitting effect, and the original monthly data points were pre-processed as described in the following section.
4.1. Data Preprocessing
The time series plot of the monthly AQI data is drawn using R software, as shown in Figure 20. The plot shows that from January 2015 to November 2021, the overall AQI of Beijing has a decreasing trend and an obvious seasonal effect. Subsequently, a graphical test of the autocorrelation and partial autocorrelation coefficients was conducted, as shown in Figure 21 and Figure 22. The autocorrelation coefficients tail off slowly and show a periodic pattern, so the monthly Beijing AQI series is initially inferred to be non-stationary. To evaluate stationarity more objectively, a unit root test was performed using R software, and it confirmed that the series is a non-stationary time series.
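The graphical test above rests on the sample autocorrelation coefficients. As a minimal sketch of how they can be computed (the analysis in this paper was done in R; the Python function `sample_acf` and the synthetic series `demo` are illustrative assumptions, not the Beijing data), consider:

```python
import math


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the sample variance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Synthetic stand-in for 83 monthly values: a downward trend plus a
# 12-month cycle. These numbers are made up for illustration only.
demo = [120 - 0.4 * t + 20 * math.cos(2 * math.pi * t / 12) for t in range(83)]
acf = sample_acf(demo, 24)

# The coefficient at lag 12 is far larger than the one at lag 6, the kind of
# slowly decaying, periodic pattern that points to non-stationarity.
print(round(acf[5], 3), round(acf[11], 3))
```

On such a series the coefficients do not die out quickly but rise again at multiples of 12, which is the behavior described for Figure 21.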
As with the ARIMA model for the short-term forecast of daily AQI data, a pure randomness test is performed on the original series before modeling in order to investigate whether the series values are correlated and whether the series is worth further study. The pure randomness test was performed using the Box.test function in R, and the results are shown in Table 6. The p-values at delays of 6 and 12 periods are well below 0.05; therefore, the null hypothesis is rejected, and the monthly Beijing AQI series is not a white noise series and can be used for subsequent modeling analysis.
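The pure randomness test can be illustrated with the Ljung-Box (LB) statistic. The following is a minimal Python sketch, not the R call used in this paper: the function names `ljung_box` and `chi2_sf_even` and the synthetic series `demo` are assumptions made for illustration, and the p-value uses the closed-form chi-square tail that holds only for an even number of delays (such as 6 and 12).

```python
import math


def ljung_box(series, lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lag} r_k^2 / (n - k)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, dof):
    """Upper-tail probability of a chi-square variable with even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Synthetic monthly series with a 12-month cycle (made-up values).
demo = [100 + 30 * math.cos(2 * math.pi * t / 12) for t in range(83)]
for lag in (6, 12):
    q = ljung_box(demo, lag)
    print(lag, round(q, 2), chi2_sf_even(q, lag))
```

A p-value below 0.05 rejects the null hypothesis of white noise, which is the decision rule applied to Table 6.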
4.2. Construction of Seasonal Model
From the above time series plot, we can see that the original series has an annual cycle, and the selected air quality data are monthly, so the cycle length is s = 12. To make the original time series stationary, we need to eliminate its linear trend and seasonal periodicity. Therefore, the monthly AQI data of Beijing are first differenced at lag 1 to eliminate the linear trend and then differenced at lag 12 to eliminate the seasonal periodicity. The series after this first-order, twelve-step differencing is denoted as AQI-diff12, and its time series plot is shown in Figure 23. The differenced series has no obvious upward or downward trend and no obvious periodicity, fluctuating around zero, so it can be initially judged to be stationary. The autocorrelation and partial autocorrelation coefficients of the differenced series are examined graphically, as shown in Figure 24 and Figure 25. The autocorrelation coefficient quickly decays to zero, and the p-value of the unit root test of the differenced series is 0.01, which is smaller than the significance level of 0.05. The null hypothesis is rejected, indicating that the series is stationary after eliminating the linear trend and seasonal trend. The p-value of the pure randomness test of the differenced series is 0.019, which is less than the significance level of 0.05. Therefore, the differenced series is still a non-white noise series, and the subsequent modeling analysis is conducted on this series.
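The first-order, twelve-step differencing described above corresponds to applying the operator (1 - B)(1 - B^12) to the series. A minimal Python sketch follows; the helper names and the synthetic series `demo` are illustrative assumptions, not the paper's R code or data.

```python
def difference(series, lag=1):
    """Return the lag-`lag` difference x_t - x_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def trend_and_seasonal_difference(series, period=12):
    """Apply (1 - B)(1 - B^period): a lag-1 then a lag-`period` difference."""
    return difference(difference(series, lag=1), lag=period)


# Made-up monthly series: linear decline plus a fixed 12-month pattern.
pattern = [30, 25, 10, 0, -5, -10, -15, -12, -8, 0, 15, 28]
demo = [200 - 2 * t + pattern[t % 12] for t in range(83)]
aqi_diff12 = trend_and_seasonal_difference(demo)

print(len(aqi_diff12))  # 83 - 1 - 12 = 70 values remain
print(set(aqi_diff12))  # only 0 is left: trend and cycle are both removed
```

For a series that is exactly a linear trend plus a fixed annual pattern, nothing remains after differencing; for real data, what remains is the irregular component that the ARMA part of the model must describe. Note that 83 monthly observations leave 70 values after differencing.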
- (1) Model identification and order determination
Based on the above autocorrelation and partial autocorrelation plots after differencing, the first step is to consider the autocorrelation and partial autocorrelation coefficients within lag 12 of the series after trend differencing and seasonal differencing in order to determine the short-term correlation model. In these plots, neither the autocorrelation coefficients nor the partial autocorrelation coefficients up to lag 12 cut off, so an ARMA(1,1) model is tried to extract the short-term autocorrelation information of the differenced series.
The second step considers the seasonal autocorrelation characteristics in order to choose between an additive and a multiplicative seasonal model. The approach is to examine the autocorrelation and partial autocorrelation coefficients at delays that are multiples of the period length, i.e., lag 12, lag 24, and so on. According to the autocorrelation and partial autocorrelation plots, the coefficients at lags 12 and 24 fall within two standard deviations of zero, and the values at lag 24 are smaller, which shows that there is no significant seasonal effect in the differenced series, so we initially consider a simple seasonal model, i.e., an additive seasonal model. At this point, the seasonal differencing order is D = 1, with P = 0 and Q = 0.
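The check at the seasonal lags can be sketched as follows. This is an illustration under stated assumptions: the band is taken as the common large-sample approximation of two standard errors, 2/sqrt(n), the function name `lags_outside_band` is hypothetical, and the coefficient values are made up rather than read from Figure 24 or Figure 25.

```python
import math


def lags_outside_band(coefficients, n_obs, lags):
    """Return the lags whose coefficient lies outside +/- 2 / sqrt(n_obs)."""
    bound = 2.0 / math.sqrt(n_obs)
    return [k for k in lags if abs(coefficients[k]) > bound]


# Hypothetical autocorrelations at the seasonal lags of a differenced
# series with 70 observations (illustrative numbers only).
seasonal_acf = {12: -0.10, 24: 0.05}
print(lags_outside_band(seasonal_acf, n_obs=70, lags=[12, 24]))
```

An empty result means that no seasonal lag is significant, which is the situation in which the simpler additive form is tried first.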
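Combined with the previous first-order, twelve-step differencing, the additive seasonal model ARIMA(1,(1,12),1) was finally selected, and its model structure is as follows:
$$\nabla\nabla_{12}\,x_t = \frac{1-\theta_1 B}{1-\phi_1 B}\,\varepsilon_t,$$
where $x_t$ is the monthly AQI, $\nabla = 1 - B$ is the first-order difference, and $\nabla_{12} = 1 - B^{12}$ is the twelve-step difference.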
- (2) Parameter estimation of the model
The fitted model has been determined in the previous step, and the next step is to determine the specification of this model from the observed values of the series, which means that the unknown parameters in the fitted model need to be estimated. Using R software, the parameters of the fitted additive seasonal model were estimated by the maximum likelihood method, and the results are shown in Table 7.
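Based on the above results, the fitted additive seasonal model can be written as
$$(1-B)(1-B^{12})\,x_t = \frac{1-\hat{\theta}_1 B}{1-\hat{\phi}_1 B}\,\varepsilon_t,$$
where $x_t$ is the monthly AQI, $\hat{\phi}_1$ and $\hat{\theta}_1$ are the estimates reported in Table 7, $B$ is the delay operator, and $\{\varepsilon_t\}$ is the white noise sequence, i.e., $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$.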
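To make the link between the fitted model and its residual series concrete, the sketch below computes the residuals of an ARMA(1,1) model on a differenced series and their conditional sum of squares. It is an illustration only: the paper estimates the parameters by maximum likelihood in R, whereas this code merely evaluates a conditional least-squares objective; the sign convention w_t = phi * w_{t-1} + e_t - theta * e_{t-1}, the zero starting residual, the function names, and all numbers are assumptions.

```python
def arma11_residuals(w, phi, theta):
    """Residuals of w_t = phi*w_{t-1} + e_t - theta*e_{t-1}, with e_0 set to 0."""
    resid = [0.0]
    for t in range(1, len(w)):
        resid.append(w[t] - phi * w[t - 1] + theta * resid[t - 1])
    return resid


def conditional_sum_of_squares(w, phi, theta):
    """Objective that a conditional least-squares fit would minimise."""
    return sum(e * e for e in arma11_residuals(w, phi, theta))


# Made-up shocks and coefficients, used to generate a short series.
true_phi, true_theta = 0.5, 0.3
shocks = [0.0, 1.0, -2.0, 0.5, 1.5, -1.0, 0.0, 2.0]
w = [0.0]
for t in range(1, len(shocks)):
    w.append(true_phi * w[t - 1] + shocks[t] - true_theta * shocks[t - 1])

# With the generating coefficients, the recursion returns the shocks.
recovered = arma11_residuals(w, true_phi, true_theta)
print([round(e, 6) for e in recovered])
```

The residual series produced in this way is the input to the white noise test of the fitted model.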
A white noise test of the residuals was performed on the established additive seasonal model in order to determine the significance of the model. The Box.test function in R was used to test whether the residual series is a white noise series, and the test results are shown in Table 8. The p-value corresponding to the LB statistic at each order of delay is well above the significance level of 0.05; therefore, the residual series of the fitted additive seasonal model can be considered a white noise series, which means that the established model is significantly valid.
4.3. Forecast Analysis of the Additive Seasonal Model
Based on the established additive seasonal model, the Beijing air quality index from December 2021 to February 2022 was selected as the test set to verify whether the model fits accurately. Using the same short-term correlation criteria as above, the results are shown in Table 9. As can be seen from the table, the differences between the predicted and true values are small and the error values are within acceptable limits, indicating that the additive seasonal model is appropriate and valid for extrapolating the future long-term Beijing AQI, with high forecast accuracy and reasonable and credible results.
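The comparison of predicted and true values on the test set can be summarized with standard error measures. The sketch below is an assumption about how such a comparison might be quantified; the paper does not state which measures it used, and the numbers are placeholders rather than the values in Table 9.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE assumes every actual value is non-zero, which holds for AQI.
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    return mae, rmse, mape


# Placeholder values for a three-month test set (not the paper's data).
actual = [100.0, 80.0, 50.0]
predicted = [110.0, 76.0, 55.0]
mae, rmse, mape = forecast_errors(actual, predicted)
print(round(mae, 2), round(rmse, 2), round(mape, 2))
```

Reporting such measures alongside the table makes the phrase "within acceptable limits" checkable against a stated threshold.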
The predicted Beijing AQI for the next 24 periods is shown in Figure 26. The AQI still shows seasonal cycles and a slightly decreasing trend over the next two years.
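Because the model is fitted to the differenced series, forecasts of the differenced values must be converted back to the AQI scale. The sketch below shows this inversion of the first-order, twelve-step differencing; the function name `integrate_forecasts` and the integer test series are illustrative assumptions, and in practice the forecasting routine of the statistical software performs this step internally.

```python
def integrate_forecasts(history, diff_forecasts, period=12):
    """Turn forecasts of w_t = (1 - B)(1 - B^period) x_t into forecasts of x_t."""
    if len(history) < period + 1:
        raise ValueError("need at least period + 1 past observations")
    extended = list(history)
    for w in diff_forecasts:
        t = len(extended)
        # x_t = x_{t-1} + x_{t-period} - x_{t-period-1} + w_t
        extended.append(
            extended[t - 1] + extended[t - period] - extended[t - period - 1] + w
        )
    return extended[len(history):]


# Round-trip check on a made-up integer series: difference the last ten
# values, then rebuild them from the first thirty.
series = [100 + (7 * t * t) % 53 for t in range(40)]
history, future = series[:30], series[30:]
w_future = [
    (series[t] - series[t - 1]) - (series[t - 12] - series[t - 13])
    for t in range(30, 40)
]
print(integrate_forecasts(history, w_future) == future)
```

Each forecast on the original scale reuses the value one month earlier and the values 12 and 13 months earlier, which is why the forecast path in a model of this form inherits the seasonal cycle of the most recent year.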
4.4. Section Summary
In this section, a long-term forecast of the Beijing AQI is made based on the seasonal ARIMA model. The monthly data show an overall decreasing trend in the Beijing AQI from January 2015 to November 2021, with an obvious seasonal effect. The parameters of the fitted additive seasonal model are estimated by the maximum likelihood method, and the model is checked against the Beijing AQI from December 2021 to February 2022. The forecast shows that the AQI still has a seasonal cycle and a slightly decreasing trend over the next two years.
5. Summary and Outlook
Based on Beijing’s AQI from January 2019 to November 2021 and the daily average and monthly data of six major air pollutants, this article uses descriptive statistical analysis, correlation analysis, and cluster analysis to visualize air quality development trends, and uses time series analysis and data mining algorithms to build models and make short-term forecasts of Beijing’s air quality. The following conclusions are obtained:
Statistical methods were used to analyze the air quality level, the AQI, and the changes in the concentrations of the six pollutants. The daily analysis results show that, with the continuous deepening of air pollution prevention and control work, the air quality in Beijing continues to improve, the AQI has improved significantly, and the proportion of days with an excellent air quality level has increased year by year. The monthly analysis results show that in the past three years, air pollution was most serious in June, which was mainly related to the serious excess of ozone. The changes in air quality in Beijing show obvious seasonal characteristics. The five main pollutants PM2.5, PM10, SO2, CO, and NO2 have low concentrations in summer and high concentrations in winter; only O3 shows the opposite pattern, with high concentrations in summer and low concentrations in winter. The persistently high ozone content is still a thorny issue today, and the air quality varies greatly between the heating period and the non-heating period.
The short-term forecast of the Beijing air quality index using the time series model and the neural network model overcomes the lag of the current air quality monitoring system, and the level of the AQI is determined jointly by the six air pollutants. The results show that both the ARIMA model and the neural network model are significant for forecasting the air quality index, and the established models are reasonable and effective. By comparison, the fitting effect of the neural network is better than that of the ARIMA model, but both models have their own characteristics. It was also found that PM2.5, PM10, and O3 have a greater influence on the air quality class and are the main factors determining the specific value of the AQI and the air quality class. When the additive seasonal model was used for the long-term forecast of monthly data, it was found that the Beijing AQI still shows seasonal cyclicality and a slightly decreasing trend over the next two years. In summary, based on the conclusions of the article, we can propose measures to improve air quality from the three perspectives of the government, society, and individuals.
The government must strengthen the implementation of environmental protection policies and increase investment in environmental protection technology. Environmental protection departments should strengthen environmental management, earnestly implement national and local laws and regulations, comprehensively use technical means and administrative measures, and manage air quality through legislation, monitoring, and protection.
The analysis shows that PM2.5, PM10, and O3 have a greater impact on air quality levels. Therefore, environmental protection management agencies have been established at all levels, from central to local, to use monitoring technology tools to publish monitoring data promptly, inspect and dispose of pollution sources, and control polluting behaviors such as building dust, burning coal for heating, and burning straw. Investment in environmental protection technology should be increased to develop reasonable treatment equipment, reduce waste of resources, and improve sewage treatment technology. The industrial structure should be optimized, pollution standards tightened, and penalties for pollution increased. Resource control policies such as pollutant discharge fees have a significant impact on pollution control costs. A clean energy industry with high energy utilization and low pollution should be developed, and good use should be made of renewable resources such as solar and wind energy. Air pollution is mobile and regional, and its changes are synchronized across regions, so pollution in one region affects others. Pollution prevention and control is not just the problem of a single administrative region. It is necessary to establish a regional cooperation system with joint regional prevention and control to solve cross-regional air pollution problems; one example is the simultaneous implementation in Beijing-Tianjin-Hebei of the “Regulations on the Prevention and Control of Emission Pollution from Motor Vehicles and Non-road Mobile Machinery”. Urban green coverage should be improved, using the ability of plants to absorb dust and purify the air; zoning control strategies should be provided for the in-depth fight against pollution; and precise, scientific, and law-based pollution control should continue to be promoted.
Society must vigorously promote environmental protection knowledge, raise awareness of protecting the atmospheric environment, and advocate a low-carbon life. Prevention and control work increasingly requires scientific and refined management. The city should adhere to project-based and management-based emission reductions according to monthly and seasonal changes in air quality, and promote the formation of a spatial pattern, industrial structure, and lifestyle that conserve resources and protect the environment. The aims are to deepen the “one microgram” action, focus on the coordinated governance of PM2.5, PM10, and O3, and achieve a green transformation of the industrial structure, a green and low-carbon energy structure, a green optimization of the vehicle structure, and a green and clean urban appearance.
Another aim is to establish an action pattern led by the government with public participation. With the expansion of cities and the growth of economic activity, the number of motor vehicles has increased, and cars emit large amounts of NO2 and inhalable particulate matter, which seriously damage the environment and affect people’s health. Therefore, it is necessary to consciously eliminate old motor vehicles, improve awareness of the purification and treatment of vehicle exhaust, and support the development and use of new energy vehicles. The general public should actively participate in environmental protection activities and supervision, consciously practice a simple, moderate, green, and low-carbon lifestyle, and offer advice and suggestions for a more beautiful Beijing.