Monitoring the Influence of Industrialization and Urbanization on Spatiotemporal Variations of AQI and PM2.5 in Three Provinces, China

Abstract: With the rapid development of industrialization and urbanization, atmospheric pollution research is vital for regional sustainable development and for related government policy-making. Previous studies have mainly applied a single evaluation method to analyze the air quality index (AQI) or a single air pollutant. This research integrated Spearman coefficient (SC) correlation analysis, a random search (RS) algorithm and the extreme gradient boosting (XGBoost) algorithm to evaluate the air pollution influence of industrialization and urbanization (APIIU). Industrialization, urbanization and meteorological indicators were used to measure the degree of influence of APIIU on AQI and particulate matter 2.5 (PM2.5), respectively. The main findings were: (1) the APIIU-AQI and APIIU-PM2.5 of Henan Province, Hubei Province and Hunan Province changed significantly from 2017 to 2019; (2) the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the SC-RS-XGBoost predictions in the three provinces were 0.945, 0.103 and 4.25% for APIIU-AQI and 0.897, 0.205 and 4.84% for APIIU-PM2.5, respectively; (3) the predicted results were more accurate than those of SC-XGBoost, RS-XGBoost, traditional XGBoost, support vector regression (SVR) and extreme learning machine (ELM).


Introduction
Air quality has become a more significant problem with the development of industrialization and urbanization [1][2][3][4]. Remarkable economic growth has resulted in serious environmental issues [5,6]. The emissions of air pollutants from industry and motor vehicles are currently the most important environmental risk to human health [7]. According to an assessment report from the World Health Organization, urban air pollution results in more than two million deaths every year [8,9]. In the past 20 years, the European Union has made great policy progress on atmospheric emissions and air quality. However, air pollution continues to have a serious influence on the health of people who live in Europe's urban areas [10,11]. As one of the rapidly developing countries, China has experienced rapid economic growth in the past several decades. As a result, industrial activities, urban expansion and human engineering activities have produced a high intensity of air pollutant emissions in China [12][13][14][15][16]. Hence, reducing air pollution is a critical measure for environmental protection and sustainable development [17].
Since 2010, China has actively implemented the Clean Air Action to deal with air pollution, and the emissions of many pollutants have decreased since then [18]. To effectively assess air quality and provide guidance for outdoor activities, the Chinese Ministry of Environmental Protection (MEP) adopted and developed an air quality index (AQI) system in 2012 [19][20][21]. The existing related research works documented that the six major air pollutants were particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO) and ozone (O3). The City and exhibited a better stability and prediction performance. Zhou et al. [49] adopted the Gaussian process mixture (GPM) model based on an iterative learning algorithm to predict the air pollutants' concentrations. Yu et al. [50] developed a dynamic model based on ELM to forecast the concentrations of SO2 and NO2. Optimizing the connection weights and thresholds of the ELM with the quantum genetic algorithm (QGA) improved the prediction performance. Middya et al. [51] found that several models, such as the bi-directional LSTM (Bi-LSTM) and convolutional LSTM (Conv-LSTM), achieved a high forecasting accuracy for the majority of air pollutants. Sun et al. [52] established a new model based on the stacking-driven ensemble (SDE) and two kinds of input selection methods to forecast the hourly concentration of PM2.5 and found that the proposed model had a higher prediction accuracy, much better performance and a more robust forecasting ability. Ribeiro [53] applied Bayesian, econometric and machine-learning models to predict the future concentration of SO2 emissions and found that the machine-learning models had better generalization power than the traditional methods.
Although previous research works had superiority in predicting the concentrations of air pollutants, there were few studies of air quality based on analyzing the influence of relevant indicators of air quality in the process of industrialization and urbanization. Rapid economic growth and industrialization have decreased air quality in developing countries. The urbanization scale and high population density have increased air pollutant emissions through various human daily travel and life activities [54][55][56][57][58].
The above literature demonstrates the research on AQI and air pollutant concentrations with different methods. Currently, there is no universally agreed method of constructing an indicators system for the assessment of air pollution influence of industrialization and urbanization (APIIU). The definition of an APIIU indicators system is variable, and it can hold multiple dimensions and present different specificities depending on the province, time span or assessment target variable for air pollution. This uncertainty makes it difficult to identify consistent research methods for analysis and prediction. In this paper, considering the availability of data and the cumulative effect of development of industrialization and urbanization, our approach was to choose annual data. Since the study area and time span were selected from 49 cities in three provinces from 2017 to 2019, the above data belonged to a small sample. Extreme gradient boosting (XGBoost), support vector regression (SVR) and ELM have been proven to possess good prediction performance for small samples in past research [59][60][61][62][63][64][65]. Additionally, since the construction of the indicators system was mainly based on previous research, the Spearman coefficient (SC) was used for analyzing the constituent indicators of APIIU of AQI and PM 2.5 to select indicators with a large correlation coefficient. When using XGBoost and SVR to predict data, internal parameters affect the prediction accuracy. Hence, the parameters need to be optimized to improve the performance results. XGBoost includes the general, booster and learning task parameters. Many types and quantities of parameters needed to be optimized in this research, and we adopted a random search (RS) algorithm to optimize the parameters of XGBoost and SVR [66][67][68]. On the one hand, the algorithm was selected to consider the efficiency of optimizing XGBoost. 
On the other hand, it was used for comparing the prediction performance of RS-XGBoost and RS-SVR.
This study first considered the industrialization, urbanization, population, regional gross domestic product (GDP) and meteorological indicators as the APIIU indicators. Then, we constructed an APIIU indicators system from the aspects of economy, society and natural environment. This paper is divided into four sections. The first section introduces the research review. The Materials and Methods section introduces the research region, data sources, indicators system and the principle of the SC-RS-XGBoost. The Analytical Results section introduces the determination of each indicator's weight and the calculation of APIIU, and the final section offers a discussion, our conclusions and the limitations of this research.

Research Region
The research domain covers 49 cities: 18 cities in Henan Province, 17 cities in Hubei Province and 14 cities in Hunan Province, as shown in Figure 1. Henan Province is located in the central part of eastern China, in the middle and lower reaches of the Yellow River. It is bounded by latitude 31°23′-36°22′ N and longitude 110°21′-116°39′ E, with a total area of 167,000 square kilometers. Hubei Province is located in the central part of China, in the middle reaches of the Yangtze River, with a total area of 185,900 square kilometers, between latitude 29°0′53″-33°03′ N and longitude 108°21′42″-116°07′50″ E. Hunan Province is located in the central part of China with a total area of 211,800 square kilometers, between latitude 24°38′-30°08′ N and longitude 108°47′-114°15′ E. At the end of 2021, the resident population of Henan Province was 98.8 million, and the regional GDP was over CNY 5,888.7 billion. The resident population of Hubei Province was 58.3 million, and the regional GDP was over CNY 5,001.2 billion. The resident population of Hunan Province was 66.2 million, and the regional GDP was over CNY 4,606.3 billion. In these three provinces, the proportion of secondary industry is 41.3%, 37.9% and 39.4%, respectively. Related research has shown that the economic scale and a high proportion of secondary industry have a significant impact on air quality and environmental pollution. In the 2021 census data, the populations of Henan Province, Hubei Province and Hunan Province ranked third, seventh and tenth, respectively, among the thirty-four provinces in China. Urban construction has resulted in a decrease in urban green vegetation coverage, which directly contributed to air pollution. The higher proportion of secondary industry and the larger population have resulted in more air pollutant emissions and in AQI values exceeding the standard. Hence, it is necessary to study air quality in the three provinces.
The geographic locations of the research regions are shown in Figure 1.

Data Sources
The data on the regional development of industrialization and urbanization used in this study include regional GDP, the GDP of secondary industry, coal consumption, exhaust emissions, population of city jurisdiction, total city population, city jurisdiction areas, administrative land area, density of population and so on. Because yearbooks generally publish the annual data of the previous year, the above data range was from 31 December 2018 to 31 December 2020 and was sourced from the National Statistics Administration and the China City Statistical Yearbook (https://data.stats.gov.cn/index.html/, accessed on 2 May 2022). The meteorological data included annual relative humidity, average temperature, rainfall and sunshine duration (http://data.cma.cn/dataService/cdcindex/datacodel/, accessed on 3 May 2022). The average annual AQI in 18 cities from 2017 to 2019 was calculated from the daily data of the air quality online monitoring platform (https://www.aqistudy.cn/historydata/, accessed on 5 May 2022). The annual average PM2.5 was calculated from the daily data of the US Embassy's Air Quality Report (https://www.airnow.gov/, accessed on 5 May 2022). The analysis was carried out with Python (PyCharm 2021, Anaconda 3), SPSS Statistics 26.0 and ArcGIS 10.2; the trend of the average annual AQI is shown in Figure 2a-c. In addition, the relevant statistical analysis, including the maximum value and standard deviation of the annual concentrations of PM2.5 in every city in the three provinces from 2017 to 2019, is shown in Tables S1-S3.

Zhangjiajie had dropped by more than 18% compared with 2017.
Although China has issued related policies to protect the environment and manage air pollutant emissions in recent years, with the development of industrialization and urbanization and the increase in the population scale, the annual average concentrations of PM2.5 in most cities are still more than 30 μg/m³, and some cities still consistently exceeded 60 μg/m³ in 2019. Hence, it is necessary to carry out the relevant environmental research on AQI and particulate matter to provide the government with a feasibility analysis report.

Indicators System
The regional assessment indicators system of APIIU constructed from the perspective of four aspects is shown in Table 1.

Primary Indicators | Secondary Indicators | Reference Source
Industrialization indicators | X1: Regional GDP (CNY 100 million) | [1,2,4,54,57]
Industrialization indicators | X2: Regional GDP of secondary industry (CNY 100 million) |

Based on the existing related research results, and considering the current situation regarding air pollution and the availability of data, this study constructed an indicators system for APIIU assessment using the industrialization, urbanization and meteorological indicators. The industrialization indicators were directly responsible for air pollutant emissions. Previously published studies have shown that the major air pollutant emissions came from the daily operation of industrial machines. Therefore, the regional GDP, GDP of the secondary industry, coal consumption and exhaust emissions were used as direct or indirect influencing indicators of APIIU. Urbanization indicators included the city's development level, scale and population. The occurrence of air pollution was closely related to the city's development level and scale. With population growth and urban development, expanding building construction areas reduced the city's green vegetation areas, which increased the concentrations of dust and particulate matter in the air and reduced the adsorption capacity of green plants for air pollutants. A higher density of the regional population living in urban areas caused more environmental pollution problems and air pollutant emissions. The annual average air humidity and the annual average rainfall could decrease air pollution to some extent. Higher emissions of industrial and residential pollutants caused a higher annual average temperature. PM2.5 refers to particulate matter with an aerodynamic equivalent diameter of less than or equal to 2.5 microns. Owing to its long residence time and conveying distance, it has a greater impact on human health and the quality of the atmospheric environment. PM10 is inhalable particulate matter with an aerodynamic equivalent diameter of less than or equal to 10 microns, a general term for solid and liquid particles floating in the air. PM10 can enter the upper respiratory tract directly and endanger human lung health. PM10 in many cities was positively correlated with PM2.5, but O3 was negatively correlated with PM2.5. Related studies conducted in other regions also showed that the concentrations of SO2, NO2 and CO have decreased year by year with the formulation of relevant policies. For example, the most obvious feature was that the occurrence of acid rain has declined over the past years [69,70]. Hence, this research selected the AQI and PM2.5 as the objectives of APIIU; industrialization indicators, urbanization indicators and meteorological indicators were selected to evaluate the degree of influence of APIIU on AQI and PM2.5.

The Principle of SC
In statistics, the SC is a nonparametric measure of the dependence between two variables. It evaluates how well the relationship between two statistical variables can be described by a monotonic function. If there are no repeated values in the data, the SC is 1 or −1 when the two variables are in a perfect monotonic relationship. The SC is also referred to as rank correlation, because the observed data are replaced by their ranks. The SC indicates the direction of correlation between the independent variable x and the dependent variable y. If y tends to increase as x increases, the SC is positive. If y tends to decrease as x increases, the SC is negative. When the SC is zero, y exhibits no trend as x increases. The absolute value of the SC increases as x and y come closer to a perfect monotonic relationship [66][67][68]. The SC reflects both the direction and the strength of the trend between the two random variables and is computed from their ranks. Its most notable feature is that it does not depend on the sample size or on the overall distribution of the variables; the SC of two random variables can be expressed as Equation (1):

$$\rho = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\,\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \quad (1)$$

where ρ is the SC between x and y, n is the sample size of the dataset, x_i and y_i are the ranks of the i-th observations, and $\bar{x}$ and $\bar{y}$ are the average values of x and y.
In practice, the specific functional form of the relationship between the variables is irrelevant. The principle is to sort the data of the two variables and to carry out a linear correlation analysis on the rank differences obtained after sorting, as shown in Equation (2):

$$\rho = 1-\frac{6\sum_{i=1}^{n}d_i^{2}}{n\left(n^{2}-1\right)} \quad (2)$$

where n is the sample size of the dataset and d_i is the difference between the ranks of the two variables for the i-th observation.
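The rank-difference form referred to as Equation (2) can be illustrated with a minimal Python sketch. It assumes the standard tie-free formula 1 − 6Σd²/(n(n² − 1)); the function names and the toy data are hypothetical and are not taken from the study.

```python
def ranks(values):
    """Return 1-based ranks; this sketch assumes no repeated values (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman coefficient from rank differences (tie-free formula)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical toy data: y_up rises monotonically (not linearly) with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [1.0, 4.0, 9.0, 16.0, 25.0]
y_down = [5.0, 4.0, 3.0, 2.0, 1.0]
print(spearman(x, y_up))    # perfect increasing monotonic relationship
print(spearman(x, y_down))  # perfect decreasing monotonic relationship
```

Because only ranks enter the calculation, the nonlinear but monotonic `y_up` still reaches the maximum coefficient, which is the property the text relies on when screening indicators.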

The Principle of RS
The grid search (GS) is an exhaustive optimization algorithm in which the searched parameters are defined on a uniform grid. All the nodes in the grid are evaluated to identify the global minimum over the parameter grid. In a subsequent step, a finer grid can be defined around the best node so that the search gradually approaches the optimal point in the parameter space. However, this optimization process is time consuming and inefficient. The RS algorithm instead evaluates a fixed number of random combinations, drawing one random value for each hyperparameter per combination, which reduces the computation of the hyperparameter search, shortens the optimization time and improves the model performance. Each hyperparameter is treated as a distribution over its search space, from which the RS algorithm samples at random [59][60][61]. In this research, RS was applied to optimize XGBoost.
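A minimal Python sketch of the random search idea follows. The objective function, the two hyperparameter names and their ranges are hypothetical stand-ins chosen for illustration; they are not the objective or the ranges used in this study.

```python
import random


def objective(params):
    """Hypothetical validation error; a stand-in for training and scoring a model."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 / 100


def random_search(space, n_iter, seed=0):
    """Draw n_iter random combinations and keep the one with the lowest error."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_iter):
        # One random value per hyperparameter forms one candidate combination.
        candidate = {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "max_depth": rng.randint(*space["max_depth"]),
        }
        error = objective(candidate)
        if error < best_error:
            best_params, best_error = candidate, error
    return best_params, best_error


# Hypothetical search ranges (assumed for this sketch only).
space = {"learning_rate": (0.01, 0.3), "max_depth": (3, 10)}
best_params, best_error = random_search(space, n_iter=50)
print(best_params, best_error)
```

Unlike a grid search, the cost here is fixed by `n_iter` rather than by the product of the grid sizes, so adding another hyperparameter does not multiply the number of evaluations.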

The Principle of XGBoost
Ensemble learning refers to combining multiple learning models to obtain better results and a stronger generalization ability. XGBoost is a scalable tree boosting system, an ensemble method that aggregates weak learning models iteratively to form a stronger and more robust estimator. During each iteration, the residual of the previous estimator is used to learn and optimize the loss function. A binary decision tree called a classification and regression tree (CART) is selected as the basic learner, and a regularization term is added to the loss function to improve performance and to avoid overfitting [62][63][64][65]. XGBoost is a highly flexible and versatile tool that can solve most regression problems and supports objective functions created by users. The SC-RS-XGBoost algorithm used in this paper is shown in Algorithm 1.
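The idea of fitting each new learner to the residuals of the previous ones can be sketched in plain Python. This is ordinary gradient boosting with one-split regression stumps under squared loss, not XGBoost itself: the regularization term and second-order approximation that XGBoost adds are omitted, and the data, function names and settings are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimizes the squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Additive model: each stump is fitted to the residuals left by earlier rounds."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, p in zip(x, predictions)
        ]
    return base, stumps, predictions


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical one-feature toy data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.5, 2.1, 2.9, 4.2, 5.0, 6.1, 7.2]
base, stumps, predictions = boost(x, y)
print(mse(y, [base] * len(y)), mse(y, predictions))
```

The training error of the boosted model falls below that of the constant baseline because every round removes part of the remaining residual; a regularized learner such as XGBoost additionally penalizes tree complexity to limit overfitting.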

Algorithm 1: SC-RS-XGBoost
Input: D_{m×n} = {(X_i)} (X_i ∈ R^m, i = 1, 2, ..., n), original data with n samples and m feature variables
Output: D_{l×n} = {(X_i)} (X_i ∈ R^l, i = 1, 2, ..., n), using SC to screen correlation coefficients greater than 0.3 based on D_{m×n} (l < n)
Input: Objective function of RS, f(X,Y) = g(X) h(Y)
Set default values and value ranges of parameters to be optimized
Set the threshold of mean squared error (MSE)
Output: Every parameter value when f(X,Y) reaches the maximum value
Input: D_{l×n} = {(X_i)} (X_i ∈ R^l, i = 1, 2, ..., n)
I, instance set of current node
d, feature dimension

The XGBoost algorithm includes three types of parameters: general, booster and learning task. Because there are many parameters, the selection of the parameter combination directly affects the accuracy of the boosted-tree regression prediction. In machine learning, a reasonable parameter selection can improve the prediction accuracy of the model to a large extent. In this study, the RS optimization algorithm was used to optimize some parameters of XGBoost, and the other parameters were given appropriate default values according to the training process. The range of parameter values and the final parameter values determined by the RS algorithm are shown in Table 2.

In this study, the objective function of RS was the mean squared error (MSE); the maximum number of iterations was set to 50; and the maximum value of MSE was set to 3.00. Among the general parameters of XGBoost, the parameter "booster" was set to "gbtree"; the parameter "nthread" was set so as not to exceed the maximum possible number of threads of the processor; and the other parameters were set to their default values. After the parameters to be optimized were given their initial default values, the processed data set was input into the optimization function of the RS, and the partial results of the optimization are shown in Table 3.
By calculating the MSE, a more suitable parameter combination was found at the 16th iteration. As the number of iterations continued to increase, however, the performance of the parameter combinations gradually deteriorated. Therefore, the parameter combination from the 16th iteration was selected for XGBoost to forecast the APIIU of AQI and PM2.5 in this research.

Analytical Results of APIIU
Before using SC analysis, based on the influence path of industrialization and urbanization on AQI and PM2.5, this research constructed a structural equation model (SEM) using AMOS to evaluate the reasonableness of the model and to preliminarily screen the relevant indicators from the indicators system [71,72]. The least squares estimation method was selected to estimate the parameters of the SEM. The comparative fit index (CFI), Akaike information criterion (AIC) and root mean square error of approximation (RMSEA) were selected as evaluation indicators. The model was considered to perform better when CFI was higher than 0.8, RMSEA was lower than 0.1, and AIC was relatively small. Based on the structural equation and evaluation results, this research finally deleted the indicators with a moderating effect, including the square of coal consumption per land area, the square of the proportion of population and the square of the density of population. Hence, these three indicators were no longer considered when using SC analysis. The SC correlation analysis results are shown in Figures 4 and 5.

Before using SC to analyze the data, this research applied min-max standardization in data processing, so that the value of each indicator was scaled to between 0 and 1. To obtain the weights of the relevant indicators affecting the APIIU of AQI (APIIU-AQI) and the APIIU of PM2.5 (APIIU-PM2.5), AQI and PM2.5 were taken as the target variables, respectively; the correlation coefficients are shown in Figures 4 and 5. The indicators were from the indicators system shown in Table 1. In this research, the retained indicators had correlation coefficients with AQI and PM2.5 greater than 0.3, respectively. The correlation coefficients in the SC analysis results reflected the degree of importance of each indicator in the APIIU-AQI and APIIU-PM2.5. Hence, the correlation coefficient of each indicator could be used to express the indicator weight. Following the weighting approach used to calculate an air pollution index from principal indicators, the APIIU can be expressed as Equation (3):

$$I_i = \sum_{j} S_j X_{ij} \quad (3)$$

where I_i is the APIIU of the i-th unit with different principal indicators, X_ij is the standardized value of the j-th indicator of the i-th unit, and S_j is the SC value of the j-th indicator.
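The standardization and weighting steps can be sketched in Python as follows. The sketch assumes that Equation (3) is a weighted sum of the min-max standardized indicators with the SC values as weights, which is how the symbol definitions read; the city values and weights are hypothetical.

```python
def min_max(column):
    """Scale one indicator column to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # a constant indicator carries no information
    return [(value - low) / (high - low) for value in column]


def apiiu(rows, weights):
    """rows[i][j]: raw value of indicator j for unit i; weights[j]: SC value of indicator j."""
    scaled_columns = [min_max(list(column)) for column in zip(*rows)]
    scaled_rows = zip(*scaled_columns)
    return [sum(s * x for s, x in zip(weights, row)) for row in scaled_rows]


# Hypothetical example: three cities, two indicators, two SC weights.
rows = [[100.0, 10.0], [200.0, 30.0], [300.0, 20.0]]
weights = [0.5, 0.4]
scores = apiiu(rows, weights)
print(scores)
```

Standardizing each indicator before weighting keeps indicators with large units, such as GDP, from dominating the index simply because of their scale.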
The APIIU-AQI and APIIU-PM 2.5 in every city calculated from the SC results showed that there were significant temporal and regional differences among the three provinces from 2017 to 2019. The changing trends in the APIIU-AQI and APIIU-PM 2.5 are shown in Figures 6a-c and 7a-c over the past 3 years, respectively. Considering the development of industrialization and urbanization, the APIIU-AQI and APIIU-PM 2.5 showed a significant changing trend from 2017 to 2019.
Based on the changing trend of APIIU-AQI and APIIU-PM 2.5 of every city within three provinces over the past 3 years, the following observations can be made. Hubei Province was an area with higher APIIU-AQI among the three provinces, indicating that the development of industrialization and urbanization had a more serious impact on AQI. From 2017 to 2019, the APIIU-AQI in 2019 in every city increased by 20-30% compared to 2017. Hunan Province was an area with higher APIIU-PM 2.5 among the three provinces, indicating that the development of industrialization and urbanization had a more serious impact on PM 2.5 . From 2017 to 2019, the APIIU-PM 2.5 of every city within the three provinces over the past 3 years were higher compared to APIIU-AQI. In the past three years, the APIIU of some cities in Hunan Province has declined; however, some cities in Henan Province and Hubei Province have shown a trend of initial decline and then rise. The data showed a slowly decreasing trend and indicated that Hunan Province was paying attention to the protection and governance of the environment when developing its economy. Additionally, the industry has transformed into a new energy and environmentally friendly direction; the population density has decreased; and the air pollutant emissions emitted by human activities have decreased in Hunan Province. With an increasing trend in Henan Province and Hubei Province, this phenomenon showed that the proportion of secondary industry in the two provinces still occupied a major position. In addition, the increasing population and the government's control with a slow rate of environmental governance made the industrial and human activities aggravate environmental pollution. 
Through the assessment of APIIU-AQI and APIIU-PM 2.5 of regional air pollution and the descriptive analysis, the changing results of AQI and PM 2.5 were generally consistent with the findings published by the China Environmental Statistical Yearbook in Henan Province, Hubei Province and Hunan Province, respectively.
Air pollution has caused a serious impact on the natural environment, human society and residents' health in recent years. Hence, it is vital to evaluate the influence of the development of regional industrialization and urbanization on AQI and PM 2.5 , which could help the government plan reasonable policies to a certain degree. For Henan Province, with a large population and agricultural industry, importance should be attached to increasing the development of environmental protection enterprises while transforming to heavy industrialization. It is necessary to avoid excessive population concentration during urbanization. For Hubei Province, with a high GDP, the pace of development of traditional industries can be slowed down while developing industrialization. The transformation to new energy enterprises can be intensified, which can not only protect the environment but prevent more population from outflowing. For Hunan Province, with the promotion of cleaner production technology and increasing financial investment in environmental protection, the environmental quality has been greatly improved. However, it should avoid repeating the same predicament. Environmental governance should be processed simultaneously while developing the economy and urbanization.  Based on the changing trend of APIIU-AQI and APIIU-PM2.5 of every city within three provinces over the past 3 years, the following observations can be made. Hubei Province was an area with higher APIIU-AQI among the three provinces, indicating that the development of industrialization and urbanization had a more serious impact on AQI. From 2017 to 2019, the APIIU-AQI in 2019 in every city increased by 20-30% compared to 2017. Hunan Province was an area with higher APIIU-PM2.5 among the three provinces, indicating that the development of industrialization and urbanization had a more serious impact on PM2.5. 
From 2017 to 2019, the APIIU-PM 2.5 of every city within the three provinces was higher than its APIIU-AQI. Over these three years, the APIIU of some cities in Hunan Province declined, whereas some cities in Henan Province and Hubei Province showed an initial decline followed by a rise. The data showed a slowly decreasing trend and indicated that Hunan Province was paying

Evaluation Indicator
To evaluate the prediction model, the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute percentage error (MAPE) were selected to test the prediction results. RMSE describes the dispersion of the prediction errors and reflects the size of the average prediction error. MAPE can be used to compare different models on the same data source and reflects the mean relative error. The evaluation indicators are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i^*)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i^* - y_i)^2} \tag{5}$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i^* - y_i}{y_i} \right| \tag{6}$$
where $n$ represents the total number of test samples, and $y_i^*$, $y_i$ and $\bar{y}$ represent the predicted value, the real value and the average real value in Equations (4)-(6), respectively.
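As an illustration of how these three indicators behave, the following minimal sketch computes them with their standard textbook definitions. The function names and the toy values are assumptions made for this example and are not taken from the study's data or code.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (real values must be non-zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n

# Toy values (hypothetical, for illustration only).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

print(r_squared(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides each error by the real value, the same absolute error of 0.5 weighs more for the small real value 2.0 than for 8.0, which is why it is reported alongside RMSE rather than instead of it.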

Result Analysis Based on SC-RS-XGBoost
To verify the performance of the SC-RS-XGBoost model in predicting APIIU-AQI and APIIU-PM 2.5, this model was compared with the SC-XGBoost model, the RS-XGBoost model and the XGBoost model. For the SC-RS-XGBoost and SC-XGBoost models, this research used SEM to screen the indicators and adopted the maximum-minimum method to standardize the indicators as the training data. Additionally, the original data were multiplied by the weights obtained from the SC analysis to form the final training data. For the RS-XGBoost and XGBoost models, this research used SEM to screen the indicators as the training data. Data were collected from 49 cities from 2017 to 2019, giving 147 samples. After the data were preprocessed, the feature data and target data were input into the different models, and the ratio of training data to testing data was set to 3:1. APIIU-AQI and APIIU-PM 2.5 were each selected as the target variable in turn. Industrialization, urbanization and meteorological indicators were selected as the feature variables.
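The preprocessing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the three weights are hypothetical placeholders for SC-derived weights, the weights are applied to the min-max scaled values, and the helper names are invented for this example.

```python
import random

def min_max_scale(column):
    """Rescale one indicator column to [0, 1] (maximum-minimum method)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def preprocess(rows, weights):
    """Min-max scale each indicator, then multiply it by its weight."""
    columns = list(zip(*rows))  # one tuple per indicator
    weighted = [[w * v for v in min_max_scale(col)]
                for w, col in zip(weights, columns)]
    return [list(sample) for sample in zip(*weighted)]  # one row per sample

def split_3_to_1(features, targets, seed=0):
    """Shuffle the samples and hold out one quarter for testing (3:1)."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) / 4)
    test, train = order[:n_test], order[n_test:]
    pick = lambda data, idx: [data[i] for i in idx]
    return (pick(features, train), pick(features, test),
            pick(targets, train), pick(targets, test))

# Synthetic stand-in for 49 cities x 3 years = 147 samples, 3 indicators.
rng = random.Random(42)
rows = [[rng.uniform(0, 100), rng.uniform(10, 20), rng.uniform(-5, 5)]
        for _ in range(147)]
targets = [rng.uniform(0, 1) for _ in range(147)]
weights = [0.5, 0.3, 0.2]  # hypothetical SC-derived weights

X = preprocess(rows, weights)
X_train, X_test, y_train, y_test = split_3_to_1(X, targets)
print(len(X_train), len(X_test))  # 110 37
```

With 147 samples, a 3:1 split cannot be exact, so this sketch rounds the test set to 37 samples and keeps 110 for training.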
The predicted values of the different models are shown in Figures A1 and A2 in Appendix A. The evaluation indicators calculated from the prediction results are shown in Table 4. Using SC to reduce the number of input variables and the convergence value of the objective function effectively addressed the correlation among the input variables and the drawback of excessive input data. In this paper, RS was used to optimize the parameters of XGBoost to reduce local overfitting, which led to better prediction performance. The results show that SC-RS-XGBoost performed better than SC-XGBoost. Table 4 also shows that the traditional XGBoost model had the largest prediction error. As seen from the comparison of R², RMSE and MAPE of APIIU-AQI, the prediction accuracy of the SC-RS-XGBoost model improved by 6 Figures S1-S3. The calculated evaluation indicators show that the other models did not perform as well as the SC-RS-XGBoost model.
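The idea of random search for parameter optimization can be illustrated with a short, generic sketch. It does not reproduce the study's setup: the search ranges are hypothetical, and `toy_validation_error` is a made-up stand-in for the validation error that would normally come from training XGBoost with the sampled parameters.

```python
import math
import random

# Hypothetical search ranges for three common XGBoost parameters.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform
    "max_depth": lambda rng: rng.randint(2, 10),
    "subsample": lambda rng: rng.uniform(0.5, 1.0),
}

def toy_validation_error(params):
    """Stand-in objective with a known minimum; NOT a real model score."""
    return ((math.log10(params["learning_rate"]) + 1) ** 2
            + (params["max_depth"] - 6) ** 2 / 16
            + (params["subsample"] - 0.8) ** 2)

def random_search(objective, space, n_trials=50, seed=0):
    """Draw n_trials random configurations and keep the lowest-error one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: draw(rng) for name, draw in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(toy_validation_error, SEARCH_SPACE)
print(best_params, best_score)
```

Unlike a grid search, each trial samples every parameter independently, so the number of trials can be chosen freely regardless of how many parameters are tuned.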
Additionally, the calculated evaluation indicators showed that the prediction accuracy for APIIU-AQI was higher than that for APIIU-PM 2.5. The differences in RMSE and MAPE between APIIU-AQI and APIIU-PM 2.5 were small, and the values of R² were all higher than 0.89, showing that the SC-RS-XGBoost model fitted both APIIU-AQI and APIIU-PM 2.5 well. Having been verified as effective and feasible, the SC-RS-XGBoost model accurately predicted APIIU-AQI and APIIU-PM 2.5 and provided helpful insights for the government's environmental protection and air pollution governance. This could encourage regional governments to pay more attention to air pollution when developing industrialization and urbanization.

Conclusions
Most previous research has analyzed the spatiotemporal distribution of AQI and PM 2.5 using statistical methods and regression analyses to evaluate air quality. The goal of the current research was to analyze and determine the influence of the development of industrialization and urbanization on AQI and PM 2.5. In this era of artificial intelligence, an SC-RS-XGBoost model for air pollution assessment could effectively improve assessment accuracy and provide a new reference for future air pollution management. Building on previous relevant research, this research constructed a regional air pollution assessment indicator system to evaluate APIIU-AQI and APIIU-PM 2.5 using data from Henan Province, Hubei Province and Hunan Province. The indicator system covered three aspects: industrialization indicators, urbanization indicators and meteorological indicators. The following conclusions were drawn from our findings:
(1) The principal indicators were screened using SC, and the weights of each indicator were determined according to the correlation coefficients. The correlation analysis showed that different indicators differed in their impact on AQI and PM 2.5. The indicators with a high influence on AQI included regional GDP, proportion of secondary industry, coal consumption, density of exhaust emissions, total city population, proportion of population, density of population and several relevant meteorological indicators. By contrast, the proportion of secondary industry, coal consumption, exhaust emissions, density of exhaust emissions, population of city jurisdiction, total city population, administrative land areas and other related indicators had a high influence on PM 2.5.
(2) Using the weights equation to calculate the APIIU-AQI and APIIU-PM 2.5 of each region from 2017 to 2019, we found that the 17 cities of Hubei Province, which has the largest regional GDP and the largest proportion of the population, had the largest APIIU-AQI in the past three years. Moreover, APIIU-AQI in the three provinces increased sharply compared with 2017. The fourteen cities of Hunan Province, which has the largest proportion of secondary industry and the largest administrative land area, had the largest APIIU-PM 2.5 in the past three years. APIIU-PM 2.5 in the three provinces increased gradually compared with 2017.
(3) This research verified that SC-RS-XGBoost could be used as a method of air pollution assessment. Analyzing the influence of industrialization and urbanization on AQI and PM 2.5 is of great significance for future regional sustainable development. It is necessary to carry out more advanced research and to construct more accurate indicator systems and prediction models for different regions in order to deliver useful information to government officials and policymakers.
The present research contributed to the limited knowledge regarding the influence of industrialization and urbanization on spatiotemporal variations of AQI and PM 2.5 in three Chinese provinces, and it is important to conduct additional research on this topic in other regions. AQI and the concentration of PM 2.5 in different regions are affected to some extent by wind speed and direction at different altitudes. It is worth noting that this research focused only on static and mild wind conditions, and the annual average wind speed was observed only at an altitude of 70-80 m. The results are therefore limited, since the data selected for discussion were monitored under specific climate conditions. Hence, long-term continuous monitoring under various weather conditions is needed in future research. Because emissions from industrial and human activities differ between seasons, future research should also consider the influence of industrialization and urbanization indicators on AQI and PM 2.5 in different seasons.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos13091377/s1, Table S1: The statistical analysis of PM 2.5 in 2017 in three provinces; Table S2: The statistical analysis of PM 2.5 in 2018 in three provinces; Table S3: The statistical analysis of PM 2.5 in 2019 in three provinces; Figure

Conflicts of Interest:
The authors declare no conflict of interest.