Measuring the Critical Influence Factors for Predicting Carbon Dioxide Emissions of Expanding Megacities by XGBoost

Abstract: CO2 is the main greenhouse gas. Urban spatial development, land use, and so on may be affected by CO2 and climate change. The main questions studied in this paper are as follows: What are the drivers of CO2 emissions of expanding megacities? How can they be analyzed from different perspectives? Do the results differ for megacities at different stages of development? Based on the XGBoost model, this paper explored the complex factors affecting CO2 emissions by using data of four Chinese megacities, Beijing, Tianjin, Shanghai, and Chongqing, from 2003 to 2017. The main findings are as follows: The XGBoost model has better applicability and accuracy in predicting carbon emissions of expanding megacities, with root mean square error (RMSE) as low as 0.036. Under the synergistic effect of multiple factors, population, land size, and gross domestic product are still the primary driving forces of CO2 emissions. Population density and population become more important in the single-factor analysis. The key drivers of CO2 emissions in megacities at respective developmental stages are different. This paper provides methods and tools for accurately predicting CO2 emissions and measuring the critical drivers. Furthermore, it could provide decision support for megacities to make targeted carbon-emission-reduction strategies based on their own developmental stages.


Introduction
Carbon dioxide (CO2) is recognized as the main greenhouse gas (GHG), and CO2 emissions (CE) have become the main cause of global warming [1]. To address climate change, Japan, Britain, Canada, South Korea, and other countries have successively made political commitments to achieve carbon neutrality by 2050. Maintaining or reducing CE has become a challenging task for most countries [2]. According to the International Energy Agency (IEA), global annual CE have increased by 12.8 gigatonnes (Gt) over the last 19 years [3]. Because cities are the centers of human activities, 71% of global CE come from urban areas [4,5]. K.C. Seto et al. predicted that, by 2030, more than 5.87 million km2 of land will have a high probability of being converted to urban areas, with 20% of this having high potential for urban expansion [6]. The expansion of urban areas leads to urban sprawl, which brings CE problems in industry, transportation, construction, infrastructure, and so on. More and more governments are emphasizing the hazards arising from CE caused by urban expansion. The transformation of the global environment will be affected by the initial regional changes that will gradually extend to larger regions [7].
In cities, many factors have significant impacts on CE, such as urban land use patterns [8], socioeconomic driving factors [9], and urban forms [10]. However, such research neglects the process-based character of urban expansion [11].
It is known that population growth and the growing needs of the population living in a given area are the key driving factors for urban expansion [12]. Demands for residential,

• The authors collect data on four megacities from 2003 to 2017 and analyze the critical factors of urban CE in detail based on these data. The results of this analysis can be helpful to governments.
• The authors discuss the impact of different factors on urban CE from both a single-factor and a synergistic view. The authors employ XGBoost to predict urban CE and compare it with ordinary linear regression (OLR) and multi-layer perceptron (MLP) algorithms.
• The authors further discuss the impact of different factors on CE of urban expansion under different urban development stages. The findings of this research could provide decision support for cities to make targeted carbon emission reduction strategies according to their own developmental stages.
This article's basic structure is as follows. Section 2, following the introduction, incorporates relevant research from predecessors. Section 3 discusses the research topic, variable collection, and processing methods. The experimental results are discussed in Section 4. Section 5 delves deeper into the relationship between typical variables and CE from urban expansion. Section 6 concludes with a summary of this article.

Literature Review
The key to predicting total CE is to scientifically extract the driving forces behind CE [25]. Numerous factors have been used to predict total CE, including total population, affluence, technology, energy, land use, and urban form. Many studies have developed models for predicting CE and have used various factors to simulate CE in various scenarios. These include the IPAT model, the STIRPAT model, the MARKAL model, the LEAP model, the coupled social-ecological systems (CSES) model, the social-ecological process model, the socioeconomic factors and spatial structure (CSS) model, and the urbanization energy consumption-CO2 emission system dynamics (UEC-SD) model, most of which are based on OLR or other models.
Population, energy, technology, and affluence have an obvious positive impact on CE [19]. Since the 1970s, the IPAT identity has been a widely accepted method for analyzing population, the economy, and technology to identify the forces driving CE [26,27]. Empirical results show that CE at the national level are significantly affected by the economy, technology, and energy [19], and that rapid economic growth is the primary determinant of increasing CE from a regional perspective [28]. However, the variable T in the IPAT model is usually incorrectly treated as a residual term encompassing everything that affects CE other than population or affluence. The model also allows for testing the effect of different variables, such as physical space, which can trigger CE [29].
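For reference, the IPAT identity and its stochastic extension STIRPAT, both named in the model list above, are commonly written as follows. This is the standard textbook notation rather than a formula taken from this paper: $I$ denotes environmental impact (here CE), $P$ population, $A$ affluence, and $T$ technology, while the coefficients $a$, $b$, $c$, $d$ and the error term $e_i$ are symbols assumed for illustration.

```latex
\begin{align}
  I &= P \times A \times T \\
  I_i &= a \, P_i^{\,b} \, A_i^{\,c} \, T_i^{\,d} \, e_i \\
  \ln I_i &= \ln a + b \ln P_i + c \ln A_i + d \ln T_i + \ln e_i
\end{align}
```

The logarithmic form is what makes the relationship estimable by linear regression, which is consistent with the remark above that most of these models are based on OLR.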
Due to rapid urbanization, cities consume a large amount of energy, which also leads to higher CE. As energy demand outpaces supply, it is essential for cities to improve energy efficiency. To reduce urban energy consumption, many scholars have studied the low-carbon transition of key CE sectors in urban regions, especially building energy consumption. For example, some scholars have used dynamic emission scenarios to simulate the energy and emission peaks of China's residential construction industry [30], and some researchers have shown that heating, ventilation, and air conditioning (HVAC) systems play a key role in the decarbonization of existing buildings [31]. Since the mid-1970s, energy system analysts have been trying to reduce CE by using models. The MARKAL model is a least-cost energy and environmental system planning model that can be used to explore medium- and long-term responses to different CE [32]. According to V. Bhatt et al., this model can be used to select the best strategies for climate change mitigation, long-term energy security design, and environmental protection in fast-growing urban areas, but it cannot analyze the impact of other factors on CE [33]. The sources of CE in the energy and non-energy sectors can be explained by the Long-Range Energy Alternatives Planning System (LEAP) model. The system aims to identify and quantify the impact of future energy consumption patterns on CE, which helps energy planners and decision makers [34]. In addition, the LEAP model has a flexible data structure and low initial data requirements. Its framework centers on energy consumption, transformation, and CE, but there are limitations in defining some variables [35].
Changes in land use have directly or indirectly influenced CE in past years [7]. Analytical results reveal that CE usually increased as urban-planned zones expanded [22]. Assessing the influence of land use will be one of the primary directions of CE prediction in the future [7]. For example, some scholars coupled the semi-distributed hydrological model (HYPE) and the Land-Use Evolution and Impact Assessment Model (LEAM) to quantify essential urbanization links, effects, and feedback of socioeconomically driven land-use changes and hydrological changes, and presented a CSES model that can provide such support [36]. Some researchers presented a social-ecological process model (SES) and systems modelling framework for examining future GHG emissions (including CE) under various scenarios [37]. The advantage of the SES model is that it strengthens the analysis of the effect of land use on CE from urban expansion. However, the application of different expansion modes in other cities needs further exploration.
Apart from socioeconomic attributes, some studies show that urban form and CE are significantly correlated [16]. The urban expansion process is very complicated. Urban planning should form the basis of the decisions or measures that create our environment and influence CE [38]. Some scholars have built the UEC-SD model using a system dynamics method to compare the changes and patterns of CE under different urbanization rates [21]. Other researchers have combined system dynamics, cellular automata, and support vector regression to evaluate the impacts of different socioeconomic developments and urban spatial structures on CE. The resulting CSS model indicated that, to build a low-carbon city, the government should construct an ideal urban structure of compact and multiple-nuclei development through spatial optimization [9].
In conclusion, scholars mostly focus on national, provincial, and general cities in their research on the relationship between urban expansion and CE, ignoring megacities in the late stages of urbanization. The megacity development stage is relatively mature, and the factors influencing CE are more complex, necessitating more accurate research models as support. Studies on CE driving factors mostly analyze the driving factors of CE in a region from the perspective of multifactor synergy, ignoring the influence of a single factor, and rarely studying the differences of CE driving factors in cities at different development stages. To address the aforementioned research gap, this paper will examine the key influencing factors of megacities at various stages of development from the perspectives of multiple factors and single factors to provide support for similar megacities around the world to formulate carbon emission reduction policies.
With advances in computing, the Particle Swarm Optimization (PSO) algorithm [20], supervised machine learning regression techniques [39], and other methods have been applied to forecasting CE. Unfortunately, these methods still have some problems, such as complex parameter adjustment and low convergence accuracy [40], and they are not suitable for the mechanism research and prediction of carbon emissions in megacities. Compared with the above methods, XGBoost has a faster training speed and higher accuracy, and can better deal with linear and nonlinear data. XGBoost is an ensemble learning model, which generates a model with high accuracy through continuous iterative fitting of residuals [41]. It has been widely used in traffic flow forecasting [42], user behavior prediction [43], disease risk prediction [44], and so on. However, few researchers have used the XGBoost model to predict carbon emissions. Therefore, this paper attempts to use the XGBoost model to explore the driving factors of CE of expanding megacities.

Study Area
Beijing, Shanghai, Tianjin, and Chongqing play a key role in promoting the economic development of Chinese cities, and they are the four municipalities directly under the Chinese central government [45]. According to statistics in 2018, these four metropolitan areas accounted for 1.22% of China's land area. In addition, they have experienced not only economic growth and social development but also dramatic urban expansion [46]. Some research indicates that the annual GHG emissions of these four cities exceed 500 million tons [47]. Against this background, these four megacities have been selected as case studies. The impact of urban expansion on CE will be explored, and the results of the paper may have typical significance for megaregions in other countries.

Methods and Models
The main steps of this research are as follows: (1) the four megacities' energy supply data are collected to calculate CE; (2) the potential factors of urban expansion that affect CE are selected and quantified; (3) regression analysis is used to analyze the influencing factors; (4) the relative importance of the influencing factors is evaluated, and the interaction between the factors is calculated at the same time; (5) the results are analyzed and interpreted. The workflow is shown in Figure 1. The XGBoost, OLR, and MLP models are used to evaluate the significance between CE and the influence factors in step (4).

XGBoost
XGBoost is a scalable machine learning system for tree boosting. Its impact has been widely recognized, for example in Kaggle and KDDCup competitions. Compared with existing popular solutions on a single machine, it not only runs more than ten times faster but also scales to billions of examples in distributed or memory-limited settings [48]. The trained XGBoost model can automatically calculate the feature importance of each factor in the prediction model, and this feature importance is used to measure the relationship between each factor and CE. The forecast assessment process is as follows [49]:

First, given a sample set $\{(x_i, y_i)\}$ $(i = 1, 2, \ldots, n)$, regression trees are constructed and their outputs are summed to give the prediction. The formula is as follows:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

In the formula, $x_i$ is sample $i$, $\hat{y}_i$ is the predicted value of sample $i$, $f_k(x_i)$ is the regression equation corresponding to regression tree $k$ for sample $i$, $K$ is the number of regression trees, and $F$ is the set of all regression trees.

Second, an objective function consisting of a loss function and a regularization term is constructed. The loss function is used to fit the training data and the regularization term is used to simplify the model:

$$\mathcal{L}(\phi) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

In the formula, $\mathcal{L}(\phi)$ is the objective function, $l(y_i, \hat{y}_i)$ is the loss function, which at iteration $t$ is evaluated as $l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right)$, $\Omega(f_k)$ is the regularization term, and $y_i$ is the real value of the sample.

Then, the loss function is expanded by a second-order Taylor series to approximate the true value, and the formula is as follows:

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

In the formula, $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the loss value of the first $t - 1$ trees for sample $i$, and $g_i$ and $h_i$ are the first- and second-order derivatives of the loss function with respect to $\hat{y}_i^{(t-1)}$. Finally, after the constant term is dropped, the objective function is:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

Atmosphere 2022, 13, 599
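The additive structure described above, in which each new tree is fitted to what the previous trees left unexplained, can be illustrated with a minimal sketch. This is not the XGBoost library and not the authors' code: it assumes squared-error loss, uses depth-1 trees ("stumps") on a single feature, omits the regularization term, and runs on made-up toy data. The names `fit_stump`, `boost`, and `rmse` are hypothetical helpers introduced only for this illustration.

```python
import math


def fit_stump(x, residuals):
    """Fit a depth-1 regression tree: one threshold, one mean per side.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=50, learning_rate=0.3):
    """Return a predictor: base value plus a shrunken sum of fitted stumps."""
    base = sum(y) / len(y)
    trees = []
    preds = [base] * len(y)
    for _ in range(n_trees):
        # For squared-error loss, the residual is the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        preds = [pi + learning_rate * tree(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(t(xi) for t in trees)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data (not from the paper).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 5.0, 5.3]

model = boost(x, y)
baseline_rmse = rmse(y, [sum(y) / len(y)] * len(y))
boosted_rmse = rmse(y, [model(xi) for xi in x])
```

On the training data, each boosting round can only reduce the squared error, so `boosted_rmse` ends up below `baseline_rmse`. The real system additionally uses the second-order term $h_i$ and the regularization term $\Omega$ when choosing splits and leaf values.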

Multi-Layer Perceptron
Artificial neural networks can solve a variety of complex prediction problems [50]. Given current and previous conditions, the model can predict the future trend in a data series [51]. MLP neural networks have been observed to predict CE better than regression models [52].

Dependent Variable
This paper's dependent variables were the total CE and the CE per capita of the four megacities from 2003 to 2017. Combining population with CE is of great significance for determining the share of CE reduction in the region [53]. The data acquisition method and process are as follows.

First, following the 2006 IPCC guidelines [54], this paper used the four megacities' energy supply data to calculate the CE from the combustion of fossil fuels, mainly Raw Coal, Crude Oil, Coke, Diesel Oil, Gasoline, Fuel Oil, Kerosene, and Natural Gas. The energy consumption data were obtained from the China Energy Statistical Yearbook of 2004 to 2018. The computational formula is given in Equation (5):

$$CE = \sum_{i} CE_{energy\text{-}i} = \sum_{i} AD_i \times NCV_i \times CC_i \times COF_i \times \frac{44}{12} \quad (5)$$

where CE (tons) refers to annual total CE from the built-up environment; $CE_{energy\text{-}i}$ represents the quantity of CE from energy $i$; $AD_i$ is the quantity of energy $i$ consumed; and $NCV_i$, $CC_i$, $COF_i$, and 44/12 are the net calorific value, carbon content, and oxygenation efficiency of energy $i$ and the molecular weight ratio of CO2 to C, respectively. The energy supply data are in Table 1. The CE per capita is calculated as in Equation (6):

$$CEPC = \frac{CE}{P} \quad (6)$$

where CEPC means CE per capita, CE stands for the amount of CE, and P (persons) represents the permanent population. The permanent population data were from the four megacities' Statistical Yearbooks from 2004 to 2018.

In this paper, eight influencing factors are selected as variables from four dimensions: city size, economic development, technology, and regional spatial differences in urban areas.
City size involves population size and urban built-up area [1]. Population urbanization and land urbanization usually have a positive impact on CE. Therefore, this paper chooses the permanent population (P) and the land for construction in urban areas (L, in square kilometers (km2)) to measure city size. Economic growth can directly promote energy production and consumption, causing more CE [55]. Some research found a strong positive correlation between CE and real GDP [56]. The present study used GDP and GDP per capita (GDPPC) to measure urban economic development. Technology has a positive impact on energy efficiency [42]. Usually, the more advanced the technology, the lower the CE intensity, namely, energy consumption per GDP (CI). To clarify the impact of spatial urbanization on CE of urban expansion, this paper used population density (PD) and residential density (RD) as indicators, and used the ratio of built-up area to urban area (BU) to reflect the urban development model [57-59]. The descriptive statistics for the variables are in Tables 2-5.
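The dependent-variable calculation described in this section (total CE summed over fuels, then divided by the permanent population) can be sketched as follows. This is an illustrative sketch only: the `Fuel` class, the function names, and every numeric value are hypothetical placeholders, not the coefficients or data used in the paper, and the units of the inputs are assumed to be mutually consistent.

```python
from dataclasses import dataclass

CO2_TO_C = 44 / 12  # molecular weight ratio of CO2 to C


@dataclass
class Fuel:
    name: str
    ad: float   # AD_i: quantity of energy i consumed
    ncv: float  # NCV_i: net calorific value of energy i
    cc: float   # CC_i: carbon content of energy i
    cof: float  # COF_i: oxygenation efficiency of energy i (0 to 1)


def fuel_emissions(fuel: Fuel) -> float:
    """CE from one energy type: AD * NCV * CC * COF * 44/12."""
    return fuel.ad * fuel.ncv * fuel.cc * fuel.cof * CO2_TO_C


def total_emissions(fuels: list[Fuel]) -> float:
    """Total CE: the sum of the per-fuel emissions."""
    return sum(fuel_emissions(f) for f in fuels)


def emissions_per_capita(ce: float, population: float) -> float:
    """CEPC: total CE divided by the permanent population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return ce / population


# Hypothetical placeholder inputs, for illustration only.
fuels = [
    Fuel("raw_coal", ad=100.0, ncv=2.0, cc=0.03, cof=0.9),
    Fuel("natural_gas", ad=50.0, ncv=4.0, cc=0.015, cof=1.0),
]
ce = total_emissions(fuels)
cepc = emissions_per_capita(ce, population=1000.0)
```

Keeping the per-fuel term in its own function makes it easy to check each energy type against the yearbook figures before summing.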

Results
This paper fitted the data with the XGBoost algorithm and compared it with the OLR and MLP algorithms. The authors also selected several standard evaluation indexes for regression algorithms, namely mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and symmetric mean absolute percentage error (SMAPE), to compare the regression performance of the different models. The authors studied the relationship between each influencing factor and CE, and the output and fitting results of the models are shown in Tables 6-9. Based on these results, the XGBoost algorithm has the best goodness of fit on each city's data, which indicates that XGBoost is the best choice for explaining the impact of various factors on urban CE. Therefore, the authors conducted the following experimental analysis using the XGBoost model. The results for the relationship between the influencing factors and CE and CE per capita are shown in Figures 2 and 3.

Figure 4 shows the results of the XGBoost model at the CE scale. Over the fifteen years studied, the relative importance of the influence factors, in descending order, is as follows:
• Beijing: P (27.52%), GDP (26.84%), GDPPC (23.49%), CI (9.40%), PD (6.04%), L (6.04%), and BU (0.67%).
• Tianjin: P (32.83%), L (24.63%), GDP (19.40%), GDPPC (12.69%), PD (6.72%), and CI (3.73%).
• Shanghai: P (36.74%), GDP (25.17%), GDPPC (19.05%), CI (11.56%), PD (3.40%), L (3.40%), and BU (0.68%).
• Chongqing: L (28.13%), P (25.78%), GDP (18.75%), GDPPC (14.06%), PD (6.25%), RD (4.69%), CI (1.56%), and BU (0.78%).

Figure 5 shows the results of the XGBoost model at the CEPC scale. Over the fifteen years studied, the relative importance of the influence factors, in descending order, is as follows:
• Beijing: P (26.98%), L (25.40%), GDP (19.05%), GDPPC (9.52%), CI (9.52%), BU (5.56%), and RD (3.97%).
• Tianjin: P (40%), GDP (25.72%), L (18.57%), GDPPC (8.57%), and CI (7.14%).
• Shanghai: P (35.09%), GDP (24.56%), L (19.30%), GDPPC (10.52%), CI (7.02%), and PD (3.51%).
• Chongqing: P (28.68%), L (23.26%), GDP (20.93%), GDPPC (14.73%), RD (6.20%), PD (4.65%), BU (0.78%), and CI (0.77%).
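The five evaluation indexes named in this section can be computed as in the sketch below. The definitions are the commonly used ones and are an assumption here, since the paper's own formulas are not shown; in particular, SMAPE has several variants, and the one below divides twice the absolute error by the sum of the absolute true and predicted values. The sample numbers are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; true values must be non-zero."""
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE, in percent (one common variant)."""
    return 100.0 * sum(
        2.0 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical observed and predicted values, for illustration only.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = {
    "MSE": mse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "SMAPE": smape(observed, predicted),
}
```

Lower values indicate a better fit for all five indexes, which is the sense in which the models are compared in Tables 6-9.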

The Importance of the Influencing Factors in Single Factor
To further examine the influence of each factor on CE and CEPC, the authors additionally ran the regression analysis using a single feature at a time and with one feature deleted at a time. The experimental results are shown in Figure 6.
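The single-feature part of this procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a simple least-squares line is swapped in for the regression model so the example stays self-contained, the fit is scored by training RMSE, and the feature values are invented. The helper names `fit_line` and `single_feature_rmse` are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # assumes the feature is not constant
    return slope, mean_y - slope * mean_x


def single_feature_rmse(features, y):
    """Fit one model per feature and rank features by RMSE (best first)."""
    scores = {}
    for name, x in features.items():
        slope, intercept = fit_line(x, y)
        errors = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        scores[name] = math.sqrt(sum(e * e for e in errors) / len(errors))
    return dict(sorted(scores.items(), key=lambda item: item[1]))


# Hypothetical toy data: "P" tracks the target closely, "BU" does not.
features = {
    "P": [1.0, 2.0, 3.0, 4.0, 5.0],
    "BU": [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [2.1, 3.9, 6.2, 7.8, 10.1]

ranking = single_feature_rmse(features, target)
```

The delete-one-feature variant follows the same loop, except that each iteration refits a multi-feature model on all features but one and records how much the error grows.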

CE
At the CE scale, the single-factor results differ by city. In Beijing, PD and P are the most critical factors affecting CE, followed by GDP, GDPPC, CI, and RD; the effects of L and BU were not significant. In Tianjin, CI, P, and PD are the most critical factors affecting CE, followed by GDPPC, GDP, and L; the effects of BU and RD were not significant. In Shanghai, GDP and GDPPC are the most critical factors affecting CE, followed by CI, P, PD, and RD; the effects of L and BU were not significant. In Chongqing, CI, PD, and P are the most critical factors affecting CE, followed by GDP, GDPPC, and L; the effects of BU and RD were not significant.

CEPC
At the CEPC scale, the single-factor results also differ by city. In Beijing, P and PD are the most critical factors affecting CEPC, followed by GDP, GDPPC, and CI; the effects of L, BU, and RD were not significant. In Tianjin, P and PD are the most critical factors affecting CEPC, followed by L, GDPPC, GDP, and BU; the effects of CI and RD were not significant. In Shanghai, P, PD, and RD are the most critical factors affecting CEPC, followed by GDP and GDPPC; the effects of CI, L, and BU were not significant. In Chongqing, BU and L are the most critical factors affecting CEPC, followed by GDPPC, GDP, PD, and P; the effects of RD and CI were not significant.

Discussion
Based on XGBoost, the authors derive several meaningful conclusions about the relationship between CE of urban expansion and the influence factors' growth rate. In the aspect of multi factors coordination and single factors prediction, different elements have different effects. Furthermore, the scale of total CE and per capita CE are different. The empirical results are shown in Figure 7.
The results show that under the influence of multiple factors, the CE scale of Beijing and Shanghai is greatly affected by P, GDP, and GDPPC, while that of Tianjin and Chongqing is greatly affected by P, L, and GDP. The reason for this phenomenon is that Beijing and Shanghai, as the two major economic centers of China, are at a higher level of development and, with population growth and economic development, attract a large number of migrant workers for employment and migrant enterprises for investment every year. As a result, traffic congestion, industrial pollution, and energy consumption are becoming increasingly serious, leading to more CE. For cities at this stage of development, managers should take measures to increase the proportion of green travel in the city, improve the utilization of green energy, such as solar energy, and reduce CE from large-scale industrial production by changing the industrial structure.
Tianjin and Chongqing, by contrast, are in a period of rapid industrialization and urbanization, and their level of urban development is relatively low. In addition to population and economic growth, the continued increase in construction land has brought about a large amount of infrastructure construction and growth in energy demand, resulting in an increase in carbon emissions. For cities at this stage of development, managers should focus on improving the intensive use of land and the mix of land functions. Thus, for more developed cities, population and economic growth drive the total amount of CE more effectively than the increase in urban land area in the process of urban expansion. For generally developed cities, the increases in population, economy, and land area are all important driving factors of the total CE in the process of urban expansion. The CEPC scale of the above four cities is greatly influenced by P, L, and GDP. This indicates that population growth, land area expansion, and economic growth are important driving factors for the increase of per capita CE in megacities.
Under the influence of a single factor, the total carbon emissions of Tianjin and Chongqing have significant relevance with CI, P, and PD, and Beijing is relevant to P and PD, while Shanghai is significantly affected by GDP and GDPPC. The reason for this phenomenon is that Shanghai's GDP growth has been at a high level in the process of urban expansion, and the external performance of economic development and the increase in total CE has the same change trend. The other three cities showed the same trend of population increase and total carbon emissions increase in the process of urban expansion. Meanwhile, the energy consumption per GDP shows significant correlation with total carbon emissions in generally developed cities. The per capita carbon emissions of Beijing, Tianjin, and Shanghai are significantly affected by P and PD, while those of Chongqing were significantly affected by BU and L. The reason is that the area increment of built-up areas is always at a high level in the process of urban expansion in Chongqing, and land area expansion and per capita carbon emissions increase, which have the same change trend in the external performance. The other three cities showed the same trend of population increase and per capita CE increase in the process of urban expansion.
The findings show that megacities' CE are closely related to economic development, population growth, and the expansion of construction land area. The key drivers of carbon emissions differ in megacities at different development stages. When developing carbon-control strategies for megacities, policymakers should consider the economy, population, and land use, and should determine the focus of emission reduction based on each city's development stage. The driving effect of land expansion on CE is more significant in megacities such as Chongqing, where the urbanization rate still has room for improvement; there, attention should be paid to controlling the scale of construction land, dividing land functions reasonably, constructing a low-carbon land use system, increasing the use of green buildings, and rationally planning the road network structure to reduce the emissions brought by the expansion of urban construction land. The growth of the economy, population, and land area in Tianjin, a more developed regional central city, has a greater impact on CE, which should be reflected in the formulation of carbon-control strategies. The driving effect of economic growth on CE is more significant for a highly developed national economic center such as Shanghai, and the synergy between economic growth and CE reduction should be emphasized when formulating carbon-control strategies. Under the condition of stable economic development, resource efficiency should be improved and energy consumption reduced to lessen the impact of economic development on carbon emissions. For Beijing, the country's most developed political center, the driving effect of population growth on CE is more significant; carbon-control strategies should focus on improving green and low-carbon awareness among the population and advocating low-carbon concepts of green consumption, green travel, and green living to reduce the impact of population growth on CE.
Previous studies have examined the contribution of urban expansion to regional carbon emissions [60], the potential impacts of urban expansion on regional carbon storage [61], and the drivers of CE in international trade [62]. In contrast to these studies, the contribution of this paper is to measure the driving factors of CE in the process of urban expansion of megacities at different developmental stages. XGBoost models and related machine learning methods have also been applied to prediction problems; for example, open-access data and machine learning have been used to model and predict urban CE [63], and XGBoost-based ensemble learning has been used to predict urban air quality [64]. Compared with these studies, the contribution of this paper is to study the influence factors of total CE and per capita CE in the process of urban expansion from two different angles: multi-factor synergy and single-factor correlation. With the development of IoT and big data, more accurate data at different scales, including the neighborhood and community levels, will be applied to the prediction of CE. Real-time data in particular will provide a dynamic simulation platform for CE prediction and real-time control strategies.

Conclusions
Given the completeness and availability of data, this paper collected data from four megacities in China (Beijing, Shanghai, Tianjin, and Chongqing) between 2003 and 2017. Using these data, the authors investigated the impacts of P, L, GDP, GDPPC, CI, PD, RD, and BU on CE during urban expansion with the XGBoost model. The main results are as follows.
First, compared with the traditional multiple linear regression and MLP algorithms, the XGBoost model has better applicability and accuracy in predicting total CE and CE per capita during urban expansion. The results show that the model is not only feasible but also achieves high prediction accuracy, so it can be used to predict CE.
Second, under the synergistic effect of multiple factors, population, land size, and gross domestic product are still the core driving forces of total CE and CE per capita during urban expansion. This is consistent with existing research showing that population, land, and economic growth are the main drivers of CE in the expansion of megacities [65].
Third, in the single-factor analysis, population density, population, and carbon emission intensity play a more prominent role. This paper reveals that the spatial development pattern of megacities also has a dramatic impact on CE. Global CE can be significantly mitigated through technological innovation [66].
Finally, the authors also found that the factors affecting urban CE differ among megacities at different developmental stages. For cities that are still in a stage of rapid urban land expansion, the increasing trend of per capita carbon emissions is more consistent with the increasing trend of construction land area. For megacities with higher levels of economic development, population and economic growth are more significant driving factors of total CE, and the increase in total CE is more consistent with the trend of economic development. CO 2 is the most important greenhouse gas, and it is of great significance to accurately predict CE in the process of urban expansion. This paper can inform the formulation of carbon emission reduction strategies and targets for megacities according to local conditions. It should be noted that although this paper calculates the degree to which different driving factors contribute to urban carbon emissions, it does not yet calculate how much carbon emissions would increase or decrease in response to a change in a driving factor, for example, how much carbon emissions could be reduced by reducing construction land area by a certain percentage. Further exploration using other methods and data is needed in the future.