Article

Landscape Patterns and Carbon Emissions in the Yangtze River Basin: Insights from Ensemble Models and Nighttime Light Data

1 School of Environmental and Energy Engineering, Anhui Jianzhu University, Hefei 230601, China
2 Anhui Provincial Key Laboratory of Environmental Pollution Control and Resource Reuse, Hefei 230000, China
3 School of Architecture and Urban Planning, Anhui Jianzhu University, Hefei 230009, China
4 College of Geoexploration Science and Technology, Jilin University, Changchun 130026, China
* Author to whom correspondence should be addressed.
Atmosphere 2025, 16(10), 1173; https://doi.org/10.3390/atmos16101173
Submission received: 29 August 2025 / Revised: 1 October 2025 / Accepted: 8 October 2025 / Published: 9 October 2025
(This article belongs to the Special Issue Urban Carbon Emissions: Measurement and Modeling)

Abstract

Land use patterns are a critical driver of changes in carbon emissions, making it essential to elucidate the relationship between regional carbon emissions and land use types. As a nationally designated strategic economic zone, the Yangtze River Basin encompasses megacities, rapidly developing medium-sized cities, and relatively underdeveloped regions. However, the mechanisms through which landscape patterns interact with carbon emissions across such development gradients remain poorly understood. This study uses nighttime light, land use, and carbon emission datasets, together with XGBoost, CatBoost, LightGBM, and a stacking ensemble model, to analyze the impacts and driving factors of land use change on carbon emissions in the Yangtze River Basin from 2002 to 2022. The results show the following: (1) The stacking ensemble model achieved the best predictive performance, with a coefficient of determination (R²) of 0.80, a residual prediction deviation (RPD) of 2.22, and a root mean square error (RMSE) of 4.46. Compared with the next-best model (CatBoost), this represents improvements of 19.40% in R² and 28.32% in RPD and a 22.16% reduction in RMSE. (2) Based on SHAP feature importance and Pearson correlation analysis, the primary drivers of CO2 net emissions in the Yangtze River Basin are GDP per capita (GDPpc), population density (POD), tertiary industry share (TI), the land use degree comprehensive index (LUI), the dynamic degree of water-body land use (Kwater), the largest patch index (LPI), and the number of patches (NP). These findings indicate that changes in regional landscape patterns significantly affect carbon emissions in strategic economic regions, and that stacking ensemble models can simulate and interpret this relationship with high predictive accuracy, thereby providing decision support for regional low-carbon development planning.

1. Introduction

Land serves as a crucial carrier of carbon emissions [1,2]. According to the Global Carbon Emissions Statistics, land use change is the second largest anthropogenic source of global carbon emissions [3]. Since 1850, cumulative emissions from land use change have amounted to approximately 205 billion tons of carbon, almost one-third of total anthropogenic emissions [4]. Human activities such as fossil fuel combustion, industrial production, urban expansion, and economic development all release large amounts of carbon dioxide [5,6], and these activities are ultimately reflected in different land use practices. Exploring the spatiotemporal distribution patterns and driving factors of carbon emissions associated with regional land use change is therefore essential for establishing a scientific basis for low-carbon land use strategies and for supporting sustainable ecological and socio-economic development.
Current research on how land use affects carbon emissions focuses mainly on energy intensity [7], economic activity [8], population density [9], industrial structure [10], and the expansion of construction land [11]. Wang et al. [12] identified sustained economic growth as the key driver of the growth in carbon dioxide emissions from energy consumption in China and proposed optimizing the energy consumption mix and industrial layout as an effective strategy for reducing emissions. Using Changzhou City as a case study, Zhang et al. [13] explored the bidirectional interaction between land use and carbon emissions and found significant urban–rural spatial heterogeneity in land use patterns and transportation carbon emissions. Land use change thus shapes the spatiotemporal pattern and intensity of regional carbon emissions through synergistic effects across multiple spatial scales, and these studies link the drivers of carbon emissions to economic growth, energy consumption, and land use.
Methods for identifying and quantifying these factors mainly include the STIRPAT model [14], the Logarithmic Mean Divisia Index (LMDI) [15], the slacks-based measure (SBM) model [16], and panel data regression models [17]. Using the LMDI model, Meng et al. [18] analyzed the factors influencing land use carbon emissions in the nine provinces of the Yellow River basin and showed that population density and land use structure have positive effects on these emissions. Using an extended STIRPAT model, Yang et al. [19] found that economic, population, energy, and land factors directly determine regional land use carbon emissions, with land serving as a key driver. Da Costa et al. [20] used regression models to examine the spatiotemporal variation of CO2 in southeastern Brazil and its main influencing factors, reporting that the average column CO2 concentration is inversely related to vegetation and climate parameters, whereas land use change is directly correlated with CO2 emissions. However, these methods can only represent unidirectional causal relationships between independent and dependent variables and cannot capture complex nonlinear relationships or interactions among variables [21]. They also struggle to capture the dynamic spatiotemporal evolution of carbon emission drivers, which hampers the identification and analysis of long-term trends [22].
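To make the LMDI approach concrete, the following Python sketch applies the additive LMDI-I decomposition to a simple three-factor identity, C = population × GDP per capita × carbon intensity. The factor set, the function names, and the input numbers are illustrative assumptions, not the specification used by Meng et al. [18]. The sketch shows how each factor's contribution is weighted by the logarithmic mean of the start and end emissions.

```python
import math

def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b):
        return a
    return (a - b) / (math.log(a) - math.log(b))

def lmdi_additive(factors_0: dict, factors_t: dict) -> dict:
    """Additive LMDI-I decomposition of C = product of factors between year 0 and year t.

    Each factor's effect is L(C_t, C_0) * ln(x_t / x_0); the effects sum exactly to C_t - C_0.
    """
    c0 = math.prod(factors_0.values())
    ct = math.prod(factors_t.values())
    weight = log_mean(ct, c0)
    return {name: weight * math.log(factors_t[name] / factors_0[name]) for name in factors_0}

# Hypothetical inputs: population (10^6 people), GDP per capita (10^4 yuan per person),
# carbon intensity (t CO2 per 10^4 yuan), so C is in Mt CO2 (30 Mt -> 44 Mt).
base = {"population": 10.0, "gdp_per_capita": 2.0, "carbon_intensity": 1.5}
final = {"population": 11.0, "gdp_per_capita": 4.0, "carbon_intensity": 1.0}
effects = lmdi_additive(base, final)
```

Because the logarithmic-mean weights make the factor effects sum exactly to the total change (here 44 − 30 = 14 Mt), the decomposition leaves no unexplained residual. This property is one reason LMDI is widely used for such attributions.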
In recent years, machine learning has become increasingly popular in carbon emission simulation research. As a data-driven approach that builds models autonomously, it is relatively robust to multicollinearity among explanatory variables during training [23]. Acheampong and Boateng [24] selected nine variables, including economic, energy, and R&D investment indicators, to build artificial neural network (ANN) models predicting carbon emissions for Australia, China, the United States, Brazil, and India, and identified R&D investment, population, urbanization rate, and energy as the main factors affecting carbon emissions. Zhang et al. [25] used ANN-CA and IWOA-LSTM models to simulate land use change and land surface temperature in Wuhan and explored the correlation between carbon emissions and surface temperature; in both 2010 and 2020, the R² values of the linear fits for winter and summer exceeded 0.85. Although such algorithms adapt well to nonlinear data, optimization in high-dimensional parameter spaces is complex and can easily lead to local optima and overfitting [26].
As a representative ensemble learning algorithm, stacking captures complex features of nonlinear time series by combining diverse base learners and exploiting the bias–variance trade-off [27], offering a new approach to selecting the main predictive variables. Hoxha et al. [28] built a stacking ensemble model incorporating population, vehicle mileage, and passenger mileage to predict transport-related energy consumption in Turkey. The model achieved an R² of 0.99 on the test set, and vehicle mileage and population received high importance scores. Zhang et al. [29] combined quadratic decomposition with stacking ensemble learning to predict carbon emissions in Germany, France, and Italy. Prediction accuracy for Italy and Germany improved by 64.56% and 81.66%, respectively, while MAPE and RMSE for France improved by 56.66% and 59.11%, respectively, and the level of industrialization and energy consumption were identified as the main factors. By combining the strengths of different base models, a stacking ensemble compensates for the limitations of any single model in capturing latent correlations and avoiding overfitting. The resulting meta-model is more accurate and robust, improving carbon emission predictions while still allowing variable importance to be assessed.
Taking the Yangtze River basin as a case study, this study collects nighttime light, land use, and socioeconomic data for 2002–2022 and proposes a CO2 prediction model based on stacking ensemble learning. The model integrates datasets at different spatial scales and uses LightGBM, XGBoost, and SVR as base learners. Its superiority is demonstrated through comparative analysis. CO2 prediction maps are then generated to reveal the spatiotemporal characteristics of land use and carbon emissions in the Yangtze River basin and to assess the importance of contributing features. The main objectives of this paper are to (1) calculate indirect CO2 emissions from nighttime remote sensing data, analyze their regional spatiotemporal evolution and trends, identify the driving factors of CO2 net emissions, and explore the relationship between CO2 net emissions and land use; and (2) quantify landscape pattern indicators, predict urban CO2 net emissions with stacking ensemble learning, and analyze feature importance, thereby providing practical references for formulating carbon emission reduction measures in the Yangtze River basin.

2. Materials and Methods

2.1. Research Area

The Yangtze River basin (N 24°27′–35°54′, E 90°33′–122°19′) is the largest river basin in China, and the Yangtze River itself is approximately 6300 km long. The basin extends across the eastern, central, and western economic zones of the country, encompassing 17 provinces (autonomous regions) and 2 municipalities, with a total land area of approximately 1.8 million km². It borders the Yellow River and Huai River basins to the north and the Pearl River basin and the Fujian–Zhejiang river systems to the south and southeast. Economic development and natural conditions differ markedly between the upper, middle, and lower reaches. The terrain slopes from west to east, with complex and diverse landforms and a wide variety of land use types. Studying land use in this basin is therefore important for optimizing land use patterns and promoting sustainable regional development (Figure 1).

2.2. Data Sources and Preprocessing

As shown in Table 1, the data used in this study mainly comprise land use data, carbon emission data (based on fossil fuel emissions), nighttime light data, and socioeconomic data.
The nighttime light data used in this study combine radiance-calibrated DMSP/OLS imagery (2002–2013) and NPP/VIIRS imagery (2013–2022). Following Zhang et al. [30], a series of corrections were applied to the DMSP/OLS imagery. The raw images were first reprojected and resampled to a uniform spatial resolution of 1 km × 1 km, after which saturation correction and continuity correction were performed in sequence. For the NPP/VIIRS data, the monthly composites were first averaged to produce annual images. These were then reprojected and resampled, followed by outlier removal and continuity correction, yielding a continuous nighttime light series for 2013–2022. Finally, a fit between the 2013 DMSP/OLS and NPP/VIIRS radiance data was used to adjust the NPP/VIIRS data, producing a continuous nighttime radiance series for 2002–2022.
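The annual averaging and cross-sensor fitting steps can be sketched in Python with NumPy as follows. This is a simplified illustration under stated assumptions. The function names are hypothetical, and the relationship between the 2013 DMSP/OLS and NPP/VIIRS images is assumed to be linear, because the functional form used is not stated. The saturation, outlier, and continuity corrections are omitted.

```python
import numpy as np

def annual_composite(monthly):
    """Average a (12, rows, cols) stack of monthly VIIRS radiance into one annual image, ignoring NaN gaps."""
    return np.nanmean(monthly, axis=0)

def fit_intercalibration(dmsp_2013, viirs_2013):
    """Least-squares fit dmsp ~ slope * viirs + intercept on pixels lit in both 2013 images.

    The linear form is an assumption made for illustration only.
    """
    mask = np.isfinite(dmsp_2013) & np.isfinite(viirs_2013) & (dmsp_2013 > 0) & (viirs_2013 > 0)
    slope, intercept = np.polyfit(viirs_2013[mask], dmsp_2013[mask], deg=1)
    return float(slope), float(intercept)

def to_dmsp_scale(viirs, slope, intercept):
    """Map a VIIRS image onto the DMSP-consistent scale, keeping unlit pixels at zero."""
    return np.where(viirs > 0, slope * viirs + intercept, 0.0)

# Synthetic 2013 overlap images with an exactly linear relationship, for illustration.
rng = np.random.default_rng(1)
viirs_2013 = rng.uniform(0.5, 50.0, size=(20, 20))
dmsp_2013 = 1.2 * viirs_2013 + 3.0
slope, intercept = fit_intercalibration(dmsp_2013, viirs_2013)

monthly = np.ones((12, 2, 2))
monthly[0, 0, 0] = np.nan  # one missing monthly observation
annual = annual_composite(monthly)
```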

2.3. Variable Selection

Considering the characteristics of the study area and the availability of data, this paper selected 17 primary indicators across three aspects (land use structure, landscape index, and socio-economy) as shown in Table 2. Among these, the single land use dynamics indicator is further subdivided into five sub-indicators: cropland (Kcrop), woodland (Kwood), grassland (Kgrass), water bodies (Kwater), and built-up land (Kcons). In total, 21 factors are employed to delineate the influences of land use practices on carbon emissions.
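Two of the land use structure indicators can be illustrated with commonly used definitions. The single land use dynamic degree is K = (U_b − U_a) / U_a × 1/T. The land use degree comprehensive index is LUI = 100 × Σ A_i × C_i, where A_i is the area share of class i and C_i is its grading index. The sketch below assumes these standard forms and the grading values shown in the code. The exact definitions used in this study are those given in Table 2.

```python
def single_dynamic_degree(area_start: float, area_end: float, years: float) -> float:
    """Single land use dynamic degree K = (U_b - U_a) / U_a / T (multiply by 100 for % per year)."""
    return (area_end - area_start) / area_start / years

# Commonly used grading indices for the land use degree comprehensive index (assumed here).
GRADE = {"unused": 1, "woodland": 2, "grassland": 2, "water": 2, "cropland": 3, "built_up": 4}

def land_use_degree_index(areas: dict) -> float:
    """LUI = 100 * sum_i(A_i * C_i), where A_i is the area share of class i; the range is 100 to 400."""
    total = sum(areas.values())
    return 100.0 * sum(GRADE[name] * area / total for name, area in areas.items())

# Hypothetical areas: cropland shrinks by 10 % over 10 years.
k_crop = single_dynamic_degree(100.0, 90.0, 10)
lui = land_use_degree_index({"cropland": 50.0, "woodland": 30.0, "built_up": 20.0})
```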

2.4. Methods

2.4.1. Land Use Carbon Emission Estimation Methods

Following the calculation methods in the IPCC guidelines, this study divides the carbon emission effects of land use into two types: direct emissions and indirect emissions.
(1) Direct carbon emissions calculation method
Following existing research, carbon emissions from cropland, forest land, grassland, water areas, and unused land were estimated using the direct emission method. The calculation formula is as follows:
$$E_x = \sum_i S_i \times T_i$$
where $E_x$ is the carbon emission (absorption) from cropland, forest land, grassland, water areas, and unused land; $S_i$ is the area of land type $i$ (ha); and $T_i$ is the carbon emission (absorption) coefficient of land type $i$. $E_x$ is positive for net carbon emission and negative for net carbon absorption. The coefficients were established by referencing previous studies and incorporating the actual conditions of the Yangtze River Basin, as shown in Table 3.
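As a minimal sketch of the direct emission calculation, the function below multiplies each land type's area by its coefficient and sums the results. The coefficient values are placeholders chosen only for illustration. They are not the values in Table 3.

```python
# Placeholder coefficients (units follow Table 3); NOT the study's values.
PLACEHOLDER_COEFFS = {"cropland": 0.4, "woodland": -0.6, "grassland": -0.02, "water": -0.25, "unused": -0.005}

def direct_emissions(areas_ha: dict, coefficients: dict) -> float:
    """E_x = sum_i S_i * T_i; positive terms are emissions, negative terms are carbon sinks."""
    return sum(areas_ha[land_type] * coefficients[land_type] for land_type in areas_ha)

areas = {"cropland": 1000.0, "woodland": 500.0, "grassland": 200.0, "water": 100.0, "unused": 50.0}
e_x = direct_emissions(areas, PLACEHOLDER_COEFFS)  # 400 - 300 - 4 - 25 - 0.25
```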
(2) Indirect carbon emissions calculation method
Nighttime light data effectively reflect the intensity of human social activity. Many studies have demonstrated a linear correlation between nighttime light intensity and carbon emissions [36], so nighttime light can be used to estimate regional carbon emissions with high precision. This study uses the monthly gridded fossil fuel CO2 emission data released by ODIAC (Open-source Data Inventory for Anthropogenic CO2) for 2002–2022 and aggregates them to annual totals, reconstructing the spatiotemporal distribution of annual indirect carbon emissions over the study period.
$$E_i = \sum M_i$$
where $E_i$ is the annual indirect carbon emission and $M_i$ is the monthly carbon emission, summed over the 12 months of each year. $M_i$ is taken from a gridded dataset of CO2 emissions from fossil fuel combustion and industrial processes such as cement production.
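The aggregation from monthly grids to annual city-level totals can be sketched with NumPy as shown below. The array shapes, the integer city-code raster, and the function names are assumptions made for illustration. They are not taken from the ODIAC distribution format.

```python
import numpy as np

def annual_indirect(monthly_grid):
    """Sum a (12, rows, cols) stack of monthly CO2 emission grids into one annual grid (E_i = sum of M_i)."""
    if monthly_grid.shape[0] != 12:
        raise ValueError("expected 12 monthly layers")
    return monthly_grid.sum(axis=0)

def city_totals(annual_grid, city_ids):
    """Zonal sum per city; city_ids is a same-shaped raster of integer city codes (0 = outside the basin)."""
    return {int(c): float(annual_grid[city_ids == c].sum()) for c in np.unique(city_ids) if c != 0}

monthly = np.full((12, 2, 3), 0.5)        # hypothetical 2 x 3 grid, 0.5 units per cell per month
cities = np.array([[1, 1, 2], [1, 2, 0]])  # hypothetical city-code raster
totals = city_totals(annual_indirect(monthly), cities)
```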
(3) CO2 net carbon emissions
CO2 net carbon emissions are the sum of direct and indirect carbon emissions, calculated as follows:
$$E = E_x + E_i$$
where $E_x$ is the direct carbon emission from cropland, forest land, grassland, water bodies, and unused land, and $E_i$ is the indirect carbon emission generated on construction land.

2.4.2. Land Use and Carbon Emissions Relationship Model

(1) XGBoost model
XGBoost, proposed by Chen and Guestrin, is a highly efficient and scalable tree boosting system built on the gradient boosting framework. It constructs the model by iteratively adding weak learners. Each learner is fitted to the gradient statistics of the loss so as to correct the residual errors of the current ensemble and improve accuracy [37]. Its core strength is an excellent balance of prediction accuracy, robustness, computational efficiency, and interpretability when processing structured data [38]. XGBoost limits overfitting by adding a regularization term that penalizes model complexity. It also applies system-level optimizations, such as parallelized split finding, to improve execution speed and efficiency [39].
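For reference, the regularized objective that XGBoost minimizes when adding the t-th tree f_t, as formulated by Chen and Guestrin, is reproduced below. Here l is a differentiable loss, T is the number of leaves, w_j are the leaf weights, I_j is the set of samples falling in leaf j, γ and λ are regularization parameters, and g_i and h_i are the first- and second-order derivatives of the loss with respect to the previous prediction. These symbols are introduced only for this block.

```latex
\begin{aligned}
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \\
w_j^{*} &= -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda},
\qquad g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right),
\quad h_i = \partial^{2}_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)
\end{aligned}
```

The λ term shrinks the optimal leaf weights toward zero, and γ charges a fixed cost for every additional leaf. Together they form the complexity penalty that the text refers to.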
(2) CatBoost model
The CatBoost model, proposed by Dorogush et al. [40], is a machine learning algorithm based on the gradient boosting decision tree (GBDT) framework. It builds an ensemble iteratively from symmetric (oblivious) binary trees, with each new model fitting the prediction residuals of the preceding models to progressively reduce prediction error. Compared with traditional GBDT approaches, CatBoost's main advantage is its ability to handle categorical features efficiently and robustly while mitigating gradient bias and prediction bias. It therefore reduces the risk of overfitting and improves predictive accuracy and generalization [41].
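CatBoost's handling of categorical features relies on ordered target statistics. Each row's category is encoded using only the targets of rows that precede it in a random permutation, so a row's own target never leaks into its encoding. The simplified Python sketch below illustrates this idea with a single fixed ordering and a prior weight a. It is not CatBoost's implementation, which uses multiple permutations and further refinements.

```python
def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each row using only the rows that precede it (rows assumed already randomly permuted).

    value = (sum of earlier targets with the same category + a * prior)
            / (count of earlier rows with the same category + a)
    """
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, targets):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + a * prior) / (n + a))  # the current target y is not used here
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded

encoded = ordered_target_statistics(["A", "B", "A", "A"], [1.0, 0.0, 3.0, 5.0], prior=2.0)
```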
(3) LightGBM model
LightGBM is an efficient gradient boosting method that uses histogram-based decision trees and is suitable for complex tasks such as regression and classification [42]. Its core mechanisms, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), substantially reduce data size and computational complexity while keeping information loss under control. The histogram algorithm discretizes continuous features into a finite number of bins, which speeds up split finding and reduces memory consumption. In addition, LightGBM grows trees leaf-wise [43]. At each step, it splits the leaf with the largest gain to achieve higher model accuracy (Figure 2).
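The histogram idea can be illustrated with a short NumPy sketch. One feature is discretized into quantile bins, and gradient and Hessian sums are accumulated per bin. Only the bin boundaries are then scanned as candidate split points. The function name, the quantile binning, and the simplified gain formula (shared with gradient boosting in general) are assumptions for illustration. LightGBM's own binning and split logic are more elaborate.

```python
import numpy as np

def histogram_best_split(x, grad, hess, n_bins=16, lam=1.0):
    """Bin one feature, build gradient/Hessian histograms, and scan bin boundaries for the best split.

    gain = G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)   (constant factors omitted)
    Returns (best_gain, threshold); samples with x <= threshold go to the left child.
    """
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    bins = np.searchsorted(edges, x, side="left")  # bin b holds edges[b-1] < x <= edges[b]
    g_hist = np.bincount(bins, weights=grad, minlength=len(edges) + 1)
    h_hist = np.bincount(bins, weights=hess, minlength=len(edges) + 1)
    G, H = g_hist.sum(), h_hist.sum()
    g_left, h_left = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]
    gain = g_left**2 / (h_left + lam) + (G - g_left) ** 2 / (H - h_left + lam) - G**2 / (H + lam)
    best = int(np.argmax(gain))
    return float(gain[best]), float(edges[best])

# Step-shaped target; squared loss at an initial prediction of 0 gives grad = -y and hess = 1.
x = np.linspace(0.0, 1.0, 200)
y = (x > 0.5).astype(float)
best_gain, threshold = histogram_best_split(x, -y, np.ones_like(x))
```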
(4) Stacking ensemble learning models
The stacking ensemble model improves overall performance by integrating the predictive capabilities of multiple learners and consists of two layers: base learners and a meta-learner. The base learners first generate predictions. These are then fed as new features into the meta-learner, which is trained to fuse them into the final prediction [44]. In this study, the stacking regression framework uses LightGBM, XGBoost, and Support Vector Regression (SVR) as base learners and Ridge regression as the meta-learner. This configuration combines tree-based models with a kernel method, leveraging the strengths of different learning mechanisms to improve predictive accuracy. In addition, a passthrough strategy is used within the stacking framework: the original features are passed to the meta-learner alongside the base learners' predictions. The meta-learner can therefore weigh both the original information and the multi-model outputs during fusion, improving the robustness and interpretability of the fused results.
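The stacking-with-passthrough scheme can be sketched in NumPy as follows. To keep the example self-contained, two simple stand-in base learners (a linear ridge model and k-nearest neighbours) replace LightGBM, XGBoost, and SVR. The meta-learner is ridge regression, matching the study's choice. The base learners' out-of-fold predictions serve as meta-features, so the meta-learner is never trained on predictions for samples the base learners have already seen. With passthrough enabled, the original features are appended as well. The fold layout and all function names are assumptions. In scikit-learn, the same structure is available through `StackingRegressor` with `passthrough=True`.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept (first coefficient) is not penalised."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def ridge_predict(coef, X):
    return coef[0] + X @ coef[1:]

def knn_fit(X, y, k=5):
    return X, y, k  # a lazy learner simply stores the training data

def knn_predict(model, X_query):
    X_train, y_train, k = model
    dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Stand-in base learners (hypothetical); the study uses LightGBM, XGBoost and SVR instead.
BASE_LEARNERS = [(ridge_fit, ridge_predict), (knn_fit, knn_predict)]

def stacking_predict(X_train, y_train, X_test, n_folds=5, passthrough=True):
    n = len(X_train)
    folds = np.array_split(np.arange(n), n_folds)  # contiguous, unshuffled folds
    oof = np.zeros((n, len(BASE_LEARNERS)))
    for j, (fit, predict) in enumerate(BASE_LEARNERS):
        for val_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), val_idx)
            oof[val_idx, j] = predict(fit(X_train[train_idx], y_train[train_idx]), X_train[val_idx])
    # Base learners refitted on the full training set produce meta-features for new data.
    test_base = np.column_stack([predict(fit(X_train, y_train), X_test) for fit, predict in BASE_LEARNERS])
    meta_train = np.hstack([oof, X_train]) if passthrough else oof
    meta_test = np.hstack([test_base, X_test]) if passthrough else test_base
    meta_coef = ridge_fit(meta_train, y_train, alpha=1.0)  # ridge meta-learner
    return ridge_predict(meta_coef, meta_test)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=200)
pred = stacking_predict(X[:150], y[:150], X[150:])
rmse = float(np.sqrt(np.mean((y[150:] - pred) ** 2)))
```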
Furthermore, to ensure optimal performance of each base learner under varying levels of complexity, this paper adopts an automated hyperparameter optimization method for systematic adjustment of core parameters. Through global parameter search and cross-validation mechanisms, an effective balance was achieved between model complexity and generalization capability, significantly enhancing the predictive accuracy and robustness of the final ensemble model [45]. Finally, the SHAP method was applied to interpret the model outputs, providing a quantitative assessment of the contributions of individual features to the predictions. The overall framework of the model is illustrated in Figure 3.
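SHAP values are Shapley values from cooperative game theory: each feature's attribution is its weighted average marginal contribution over all subsets of the other features. The brute-force Python sketch below computes them exactly for a small model by replacing "absent" features with a single baseline value. This is an illustrative simplification. Practical SHAP tools average over a background dataset and use faster algorithms such as TreeSHAP for tree ensembles, whereas the cost here grows as 2^p.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with features outside a subset set to their baseline value.

    phi_i = sum over subsets S of the other features of |S|!(p-|S|-1)!/p! * [v(S + {i}) - v(S)]
    """
    p = len(x)

    def value(subset):
        z = [x[k] if k in subset else baseline[k] for k in range(p)]
        return f(z)

    phi = []
    for i in range(p):
        others = [k for k in range(p) if k != i]
        total = 0.0
        for size in range(p):  # subset sizes 0 .. p-1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical model with an interaction between features 1 and 2.
phi = shapley_values(lambda z: 3 * z[0] + 2 * z[1] * z[2], [1, 2, 3], [0, 0, 0])
```

The attributions sum to f(x) − f(baseline) = 15, and the interaction term is split equally between features 1 and 2. This additivity is what makes SHAP values usable as feature contributions.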

2.4.3. Model Evaluation

R², RMSE, and RPD are used to evaluate and compare the stability and accuracy of the models and to assess the agreement between predicted and observed CO2 emissions. An R² closer to 1 indicates that the model explains a larger share of the variance in the observations. A smaller RMSE indicates a smaller error between predicted and actual values and thus better predictive performance. RPD is the ratio of the standard deviation of the observed values to the RMSE, and a higher RPD indicates superior predictive capability.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$RPD = \frac{SD}{RMSE}$$
where $y_i$ is the actual CO2 emission value, $\hat{y}_i$ is the predicted CO2 emission value, $\bar{y}$ is the mean of the actual CO2 values, $SD$ is the standard deviation of the actual values, and $n$ is the number of samples.
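A minimal NumPy implementation of the three metrics defined above is shown below. The definitions do not say whether SD is the sample or population standard deviation, so the sketch assumes the sample standard deviation (ddof=1).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (R2, RMSE, RPD); SD is the sample standard deviation of the observations (assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / len(y_true))
    rpd = np.std(y_true, ddof=1) / rmse
    return float(r2), float(rmse), float(rpd)

r2, rmse, rpd = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
```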

2.4.4. Model Design

During model construction, this paper combines time-series grouping with time-blocking to ensure temporal independence between model training and validation. Time-blocking is a validation strategy designed for time series data that partitions observations according to their temporal order. The training set therefore contains only earlier data, and the test set contains only later observations [46]. This mitigates the risk of future information leaking into model training, which conventional random partitioning cannot guarantee [47]. Accordingly, data from 2002, 2007, and 2012 form the training set (399 samples), while data from the remaining years form the test set (266 samples), a training-to-test ratio of approximately 6:4 (Table 4).
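The year-based split can be sketched as follows. The panel layout is an assumption: 399 training samples over three years imply 133 cities per year, and the sketch assumes the remaining two test years are 2017 and 2022 at the same five-year spacing. Table 4 gives the actual split.

```python
import numpy as np

# Hypothetical panel layout: 133 cities observed in five years at 5-year steps (assumed).
YEARS = [2002, 2007, 2012, 2017, 2022]
N_CITIES = 133
TRAIN_YEARS = [2002, 2007, 2012]

year_of_sample = np.repeat(YEARS, N_CITIES)  # one row per city-year
train_mask = np.isin(year_of_sample, TRAIN_YEARS)
train_idx = np.flatnonzero(train_mask)   # earlier years only
test_idx = np.flatnonzero(~train_mask)   # strictly later years
```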

3. Results

3.1. Spatio-Temporal Characteristics of the Change in Land Use

Changes differed markedly among land use types in the Yangtze River basin during the study period (Figure 4). Overall, cropland area declined significantly, while the areas of construction land and forest land increased. Specifically, the cropland area decreased from 5.37 × 10⁷ hm² in 2002 to 5.09 × 10⁷ hm² in 2022. The converted cropland area was 1.3 × 10⁶ hm² between 2002 and 2012 and 1.5 × 10⁶ hm² between 2012 and 2022. Forest was the main type of land converted from cropland, covering 3.77 × 10⁶ hm² and accounting for 70.17% of the total converted area. Construction land was the second largest type, accounting for 20.7%. The area of construction land increased from 2.26 × 10⁶ hm² in 2002 to 4.61 × 10⁶ hm² in 2022. The area of other land types converted to construction land was 1.19 × 10⁶ hm² from 2002 to 2012 and 1.16 × 10⁶ hm² from 2012 to 2022. Of this, cropland accounted for 1.11 × 10⁶ hm² and 1.07 × 10⁶ hm², respectively. The forest area increased from 8.28 × 10⁷ hm² in 2002 to 8.50 × 10⁷ hm² in 2022. Overall, the expansion of construction land and forest land came primarily at the expense of cropland (Table 5 and Figure 4).
Land use change in the Yangtze River basin also varied markedly by region during the study period. In the upper Yangtze River basin (UYRB), terrain constraints limited land use change, and the main transformation was the conversion of grassland to unused land. In Yunnan and Sichuan, however, large areas of cropland were converted to grassland. The middle Yangtze River basin (MYRB), a traditional core agricultural area, initially converted mainly forest land and water areas into cropland. In recent years, ecological protection policies aimed at restoring the basin's ecological functions have been strengthened, and a reverse conversion from cropland to forest land and water areas has occurred. The lower Yangtze River basin (LYRB) underwent rapid urbanization and industrialization during the study period, characterized primarily by the conversion of cropland to construction land (Figure 5).

3.2. Spatio-Temporal Characteristics of Carbon Emissions

To estimate net carbon emissions for the Yangtze River basin, this study used the corrected DMSP/OLS and NPP/VIIRS nighttime total brightness values (TDN) to establish a model relating TDN to fossil fuel emissions from the ODIAC dataset. A logarithmic regression was fitted between annual provincial TDN and the corresponding ODIAC emissions for 2002–2022 (Table 6). The results indicate a positive, statistically significant association (p < 0.0001). Using this regression, indirect carbon emissions for each city were estimated from city-level TDN values. These estimates were then added to the direct carbon emissions to obtain city-level net carbon emissions, expressed in Mt.
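The calibration-then-application step can be sketched as follows, assuming the logarithmic regression has the form E = a·ln(TDN) + b. The actual functional form and coefficients are those reported in Table 6. The calibration data here are synthetic.

```python
import numpy as np

def fit_log_regression(tdn, emissions):
    """Fit emissions ~ a * ln(TDN) + b by ordinary least squares (functional form assumed)."""
    a, b = np.polyfit(np.log(np.asarray(tdn, dtype=float)), np.asarray(emissions, dtype=float), deg=1)
    return float(a), float(b)

def predict_emissions(tdn, a, b):
    """Apply the fitted provincial relationship to city-level TDN values."""
    return a * np.log(np.asarray(tdn, dtype=float)) + b

# Synthetic provincial calibration pairs generated from a known relationship.
tdn_prov = np.array([1e4, 5e4, 1e5, 5e5, 1e6])
odiac_prov = 20.0 * np.log(tdn_prov) - 150.0
a, b = fit_log_regression(tdn_prov, odiac_prov)
city_indirect = predict_emissions([2e4, 3e5], a, b)  # hypothetical city TDN values
```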
From 2002 to 2022, CO2 net emissions in the Yangtze River basin went through two phases with different temporal characteristics: rapid growth followed by slow growth (Figure 6). Between 2002 and 2012, net emissions rose rapidly from 207.26 Mt to 573.51 Mt, an average annual growth rate of 10.71%. After 2012, emissions continued to grow but much more slowly, reaching 676.01 Mt in 2022. This was 17.87% higher than in 2012, with an average annual growth rate of 1.66%. Thus, 2002–2012 was a stage of rapid increase, whereas 2012–2022 was a stage of slow growth with fluctuations. Although CO2 net emissions in individual cities increased slightly during the latter period, the overall trend points toward stabilization.
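The reported average annual growth rates are consistent with a compound growth formula, ((E_end / E_start)^(1/n) − 1) × 100%. The short Python check below assumes that formula. It reproduces 10.71% for 2002–2012, 1.66% for 2012–2022, and the 17.87% total increase over 2012–2022.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound average annual growth rate, in percent."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0

growth_2002_2012 = cagr(207.26, 573.51, 10)           # about 10.71 % per year
growth_2012_2022 = cagr(573.51, 676.01, 10)           # about 1.66 % per year
total_change_2012_2022 = (676.01 / 573.51 - 1.0) * 100.0  # about 17.87 %
```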
In terms of spatial distribution, the average city-level CO2 net emissions in the upper, middle, and lower reaches of the Yangtze River basin in 2022 were 2.54 Mt, 3.32 Mt, and 14.36 Mt, respectively (Table 7 and Figure 7). In the UYRB, Chongqing and Chengdu emitted 30.02 Mt and 20.10 Mt, respectively, well above the regional average. Together they accounted for 32.4% of total UYRB CO2 net emissions. In the middle reaches, Wuhan, Changsha, and Nanchang also exceeded the regional average, at 18.17 Mt, 11.81 Mt, and 7.51 Mt, respectively, and collectively contributed 24.05% of total MYRB emissions. In the LYRB, Shanghai, Suzhou, and Nanjing recorded CO2 net emissions of 81.77 Mt, 68.53 Mt, and 27.98 Mt, respectively. These values exceed the regional average and account for 49.67% of total LYRB emissions.

3.3. Spatial-Temporal Variation and Correlation Analysis of Influencing Factors

The effects of different factors on carbon emissions vary, and their impacts also differ across stages. This article divides the indicator factors into two groups: natural factors and human factors. The human factors with significant effects are GDPpc, POD, UR, and TI (Figure 8). The minimum GDPpc increased from 101 yuan to 16,793 yuan, a more than 160-fold increase, and high values spread from isolated points to contiguous areas. Spatially, the LYRB remains at a high level, while the UYRB and MYRB have also grown significantly, although regional differences persist. The maximum POD increased from 2582.49 to 4729.46 people/km², an increase of 83%. Population has become increasingly concentrated in large cities such as Shanghai, Suzhou, and Nanjing, and regional differences in population density remain significant. UR increased significantly during the study period, by more than 20 percentage points. The extent of high and medium values also expanded, showing a dynamic, synergistic pattern of “growth and balance”. TI also increased significantly, with the highest value range rising from 43.90–56.16% to 56.75–74.13%. Industrial restructuring continues to advance, and spatial heterogeneity is decreasing in the LYRB.
The natural factors that significantly influence carbon emissions are LUI, NP, Kwater, LPI, Kgrass, and LSI (Figure 9). During the study period, the spatial pattern of LUI gradually spread from the highly concentrated areas in the southeast toward the northwest, while intensity differences within the region narrowed. Areas with high NP values (399,274–659,276) contracted relatively, while areas with medium to low values (118,440–230,204) expanded, forming a continuous spatial development axis. The high-value Kwater zone (0.08–0.20) was mainly distributed in a contiguous area in the northwest, which had shrunk by 2022. Overall, Kwater values converged toward the medium and low ranges. LPI showed an overall upward trend, with the lowest value increasing from 11.56 to 13.23 and the highest from 96.51 to 97.06, indicating increasing dominance of the largest patches in the region. The lower limit of Kgrass increased from −0.18 to −0.16, while the upper limit decreased from −0.08 to −0.11, indicating that both the intensity and the extent of grassland reduction have weakened. Spatially, the high-value areas of Kgrass gradually shifted from a scattered distribution to concentration in the upper and middle reaches, showing a clear trend of spatial contraction. The high-value zones of LSI shifted clearly from the north toward the southwest, while the overall complexity of ecological land use patterns increased as the landscape evolved.
To quantify the impact of the various factors on CO2 net emissions, Pearson correlation analysis was used to examine the relationships between CO2 net emissions and the influencing factors (Table 8). CO2 net emissions are positively correlated with LUI, LUM, LA, Kwood, Kcons, GDPpc, UR, POD, TI, SI, NP, PD, LSI, IJI, and MSIDI, and negatively correlated with Kcrop, Kgrass, Kwater, LPI, CONTAG, and PAFRAC. POD and GDPpc are highly correlated with CO2 net emissions, with coefficients of 0.836 and 0.526, respectively. This indicates that densely populated areas and higher economic levels are typically accompanied by higher energy consumption and industrial activity, and hence by higher carbon emissions. LUI and LUM are also significantly positively correlated with CO2 net emissions, with coefficients of 0.470 and 0.264, respectively. In contrast, CONTAG and LPI are negatively correlated with CO2 net emissions, with coefficients of −0.238 and −0.266, respectively. High CONTAG and LPI values indicate that landscape patches are more connected and concentrated, which supports the functioning of land ecosystems. For example, continuous patches of forest and water bodies with strong carbon sink capacity can absorb and store carbon dioxide more effectively, reducing net carbon emissions.
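For reference, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations, so it captures only linear association. The NumPy sketch below computes it directly and can be checked against `numpy.corrcoef`. The example data are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: centred cross-product divided by the product of the centred norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

pod = [100, 400, 900, 1600, 2500]   # hypothetical population densities (people/km2)
co2 = [1.0, 2.1, 2.9, 4.2, 5.0]     # hypothetical CO2 net emissions (Mt)
r = pearson_r(pod, co2)
```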

3.4. Analysis of the Model Results

The prediction results of the four models, XGBoost, CatBoost, LightGBM, and stacking, are shown in Table 9. The RPD values of all four models exceed 1.5, indicating that all of them have some ability to estimate CO2 net emissions. As shown in Figure 10, XGBoost and CatBoost achieve relatively high R² and RPD values and a lower RMSE. The LightGBM model performed poorly, with an R² of 0.57, an RMSE of 6.48, and an RPD of only 1.53, so its predictive accuracy was markedly lower than that of XGBoost and CatBoost. The stacking model performed best, achieving an R² of 0.80, reducing the RMSE to 4.46, and raising the RPD to 2.22. Prediction accuracy for high and low CO2 net emissions differed among the four models. XGBoost, CatBoost, and stacking performed relatively well with lower dispersion, and the stacking model predicted both low and high emission levels particularly well. Overall, the stacking model clearly outperformed the other three models in predicting CO2 net emissions. The test set predictions are plotted as a measured-versus-predicted scatter plot with regression lines (Figure 10).

4. Discussion

4.1. Model Performance Discussion

This study used XGBoost, LightGBM, CatBoost, and the stacking ensemble model to predict CO2 net emissions. The stacking ensemble model outperforms the individual XGBoost, CatBoost, and LightGBM models in estimating CO2 net emissions, compensating for the limitations of a single algorithm in handling noisy data and capturing spatiotemporal patterns in complex coupled systems. Specifically, the R2 and RPD values of the stacking ensemble model were 21.21% and 30.59% higher than those of XGBoost, 19.40% and 28.32% higher than those of CatBoost, and 40.35% and 45.10% higher than those of LightGBM, respectively. Its RMSE was 23.37% lower than that of XGBoost, 22.16% lower than that of CatBoost, and 31.17% lower than that of LightGBM.
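These percentages are relative differences against each single model's value, (stacking − baseline) / baseline. As a quick arithmetic check, the LightGBM figures reported in Section 3.4 (R2 = 0.57, RMSE = 6.48, RPD = 1.53) reproduce the stated 40.35%, 31.17%, and 45.10%; the XGBoost and CatBoost raw values are in Table 9 and are not repeated here. The helper name `relative_change_pct` is hypothetical.

```python
def relative_change_pct(new, old):
    """Percentage change of `new` relative to `old` (negative means a reduction)."""
    return 100.0 * (new - old) / old

# Metric values reported in the text for the stacking and LightGBM models.
stacking = {"R2": 0.80, "RMSE": 4.46, "RPD": 2.22}
lightgbm = {"R2": 0.57, "RMSE": 6.48, "RPD": 1.53}

for metric in ("R2", "RMSE", "RPD"):
    change = relative_change_pct(stacking[metric], lightgbm[metric])
    print(f"{metric}: {change:+.2f}%")
# prints R2: +40.35%, RMSE: -31.17%, RPD: +45.10% (one per line)
```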
The stacking ensemble model integrates the temporal characteristics and multiscale variables of the carbon emission data through the feature extraction capabilities of its diverse base learners, which improves its ability to capture complex data patterns. This is consistent with the findings of Zhao and Cheng [48]. Compared with single models or simple weighted-average ensembles, stacking models achieve markedly higher accuracy in carbon emission intensity prediction because the meta-learner adaptively corrects the biases of the base models, which helps suppress overfitting and reduces the variance of prediction errors. Both key evaluation indicators, R2 and RMSE, improved substantially, consistent with the conclusions of Hu et al. [49]. In terms of computational efficiency, the base learners can be trained independently, making full use of distributed computing resources to reduce overall training time [50]. Moreover, once training is complete, the meta-learner can generate predictions rapidly. The stacking ensemble model therefore balances accuracy and computational efficiency in carbon emission prediction through its diverse combination of base learners and efficient meta-learning mechanism, demonstrating strong application potential.
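The two-level structure described above can be sketched as follows. Level-0 (base) learners produce out-of-fold predictions through K-fold cross-validation, and a level-1 meta-learner is then fitted on those predictions so that it learns how much to trust each base model. To keep the sketch self-contained (NumPy only), ordinary least squares and k-nearest-neighbours regressors stand in for XGBoost, CatBoost, and LightGBM, and a linear meta-learner is assumed; the section does not state which meta-learner the study used, and all function names here are hypothetical. Inside each fold the base learners are independent of one another, which is what allows them to be trained in parallel.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns a predict function."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def fit_knn(X, y, k=5):
    """k-nearest-neighbours regressor (Euclidean distance, uniform weights)."""
    X_train, y_train = X.copy(), y.copy()
    def predict(Z):
        dist = ((Z[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y_train[nearest].mean(axis=1)
    return predict

# Stand-ins for XGBoost / CatBoost / LightGBM (a hypothetical choice for this sketch).
BASE_LEARNERS = (fit_ols, fit_knn)

def fit_stacking(X, y, n_folds=5, seed=0):
    """Two-level stack: out-of-fold base predictions -> linear meta-learner."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for m, fit in enumerate(BASE_LEARNERS):
            # Each base learner only predicts rows it was not trained on.
            oof[held_out, m] = fit(X[train], y[train])(X[held_out])
    meta = fit_ols(oof, y)                                # level-1 model on OOF predictions
    base_models = [fit(X, y) for fit in BASE_LEARNERS]   # refit level-0 models on all rows
    return lambda Z: meta(np.column_stack([model(Z) for model in base_models]))
```

Using out-of-fold rather than in-sample predictions is what keeps the meta-learner from simply rewarding the base model that overfits most. The random folds here ignore temporal and spatial structure; for panel data such as city-year emissions, blocked folds may be more appropriate [47].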

4.2. Evaluation of the Results of the Carbon Emission Calculation

Previous studies have widely employed the bookkeeping model and the emission factor method [51]. The bookkeeping model relies on empirical parameters, while the emission factor method is relatively simple to calculate, benefits from easily available data, and is widely applicable in various contexts [52]. Therefore, this paper uses the emission factor method to calculate direct carbon emissions and integrates remote sensing data to calculate indirect carbon emissions, thus improving the accuracy and operability of the calculations to a certain extent.
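The two accounting components described above can be sketched as follows, under explicit assumptions. Direct emissions follow the emission factor form E = Σ A_i · δ_i (area of land type i multiplied by its emission coefficient, with sinks negative), and carbon is converted to CO2 by the molar-mass ratio 44/12. The coefficient values below are illustrative placeholders, not those used in this study. For indirect emissions, the sketch assumes one common remote-sensing approach, allocating a regional energy-related CO2 total to pixels in proportion to nighttime light brightness; the section does not specify the exact procedure, and both function names are hypothetical.

```python
# Illustrative placeholder coefficients in t C per hm2 per year (sinks negative).
# These are NOT the coefficients used in this study.
PLACEHOLDER_COEFFICIENTS = {
    "cropland": 0.40,
    "forest": -0.60,
    "grassland": -0.02,
    "water": -0.25,
    "unused": -0.005,
}
C_TO_CO2 = 44.0 / 12.0  # mass ratio of CO2 to C

def direct_co2(areas_hm2, coefficients):
    """Emission factor method: sum_i A_i * delta_i, converted from C to CO2 (t)."""
    carbon = sum(area * coefficients[land] for land, area in areas_hm2.items())
    return C_TO_CO2 * carbon

def allocate_by_light(total_co2, pixel_dn):
    """Split a regional CO2 total across pixels in proportion to nighttime light DN."""
    total_dn = sum(pixel_dn)
    if total_dn <= 0:
        raise ValueError("no lit pixels to allocate emissions to")
    return [total_co2 * dn / total_dn for dn in pixel_dn]

print(direct_co2({"cropland": 100.0, "forest": 50.0}, PLACEHOLDER_COEFFICIENTS))
print(allocate_by_light(100.0, [1, 3, 0, 6]))
```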
Accurate assessment of the impact of land use on carbon emissions requires a comprehensive analysis of landscape patterns in the study area. This paper calculates CO2 net emissions from 2002 to 2022 and finds that, alongside the increase in carbon emissions, the various land use types changed to varying degrees over the past 20 years. The area of construction land increased by approximately 1.61 × 10⁶ hm², resulting in 471.52 Mt of carbon emissions. This indicates that construction land is the main source of CO2 net emissions and strongly affects CO2 net emission levels in the study area, consistent with the results of Zhu et al. [53]. The spatial concentration of construction land has intensified the dominant effect of fossil fuel combustion and industrial processes on CO2 net emissions. Rapid industrial development has led to substantial changes in land use types, which in turn have affected land use structure and population density and, ultimately, total CO2 net emissions in the study area; this is consistent with the findings of Zhang et al. [54]. Meanwhile, the results show that CO2 net emissions associated with forests, grasslands, unused land, and water bodies are negative (Figure 11), indicating that the expansion of these land types enhances the carbon storage capacity of terrestrial ecosystems [55]. Rational planning and optimization of land use structure, together with strengthened protection and restoration of ecological land, are therefore essential to reduce regional carbon emissions and promote low-carbon, sustainable development.

4.3. Feature Importance Analysis

4.3.1. Single-Factor Analysis

This study selected 21 potential influencing factors and used SHAP feature attribution to rank them by their contribution to the model's carbon emission predictions. The SHAP results for the top 18 factors are presented in Figure 12a. GDPpc, POD, TI, LUI, and Kwater rank highest, showing that socioeconomic development and land use intensity are the primary drivers of carbon emissions. Landscape pattern indicators such as LPI, LSI, and NP also contribute substantially, indicating that landscape patterns and changes in spatial structure play an important role in regulating carbon emissions. Variables such as LA and UR make relatively minor overall contributions but still show some sensitivity under specific conditions.
The SHAP beeswarm plot (Figure 12b) further reveals how marginal effects differ across feature values. High values of GDPpc and POD are associated with positive SHAP values, whereas low values have limited impact. LUI and Kcrop have only weak or negligible effects on carbon emissions in their low-value ranges; once a critical threshold is exceeded, however, they shift to sustained positive contributions, revealing the nonlinear mechanisms by which land use transformation drives emissions [56]. Landscape pattern factors such as LPI, LSI, and NP show more complex trends: their SHAP values rise from small positive values and then level off or decline at high feature values, indicating that landscape fragmentation and aggregation may have opposing effects at different stages. The SHAP values for Kwood fall predominantly in the negative range, emphasizing that forest dynamics generally act as carbon sinks [57].
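To make the quantities in Figure 12 concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every coalition of features, with absent features set to a single baseline point. It then averages |SHAP| over samples, which is the global importance shown in a SHAP bar plot. The toy model, the single-baseline value function, and the function names are assumptions made for illustration. Exact enumeration scales as 2^n in the number of features, which is why practical tools rely on approximations or model-specific algorithms; this is not a description of the explainer used in the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def mean_abs_shap(f, samples, baseline):
    """Global importance: mean |phi_j| over samples (the SHAP bar-plot quantity)."""
    all_phi = [shapley_values(f, x, baseline) for x in samples]
    return [sum(abs(p[j]) for p in all_phi) / len(all_phi) for j in range(len(baseline))]

# Hypothetical toy model: an additive term plus an interaction of features 1 and 2.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]

print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

In the toy model the additive feature receives its full effect, while the interaction term is split equally between features 1 and 2, and the values sum to f(x) − f(baseline).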

4.3.2. Interaction Effect Analysis

In this paper, SHAP dependence plots are used to show the marginal contribution of a single feature to the model prediction at different value levels, together with its potential interactions with other features. The interacting feature for each plot is selected by an automated search and displayed as a color gradient. Compared with examining the marginal effects of individual features alone, this approach captures complex interactions between variables within nonlinear models [58]. Accordingly, socioeconomic factors (GDPpc, TI), land use structure indicators (LUI, LA), and landscape structure indicators (LSI, NP) were selected from the SHAP bar plot to examine the trends in their SHAP values and their interactions with other factors (Figure 13).
Figure 13 reveals the nonlinear characteristics and interaction effects of the carbon emission drivers. Figure 13a shows that as GDPpc increases, its SHAP values rise steadily while remaining positive, reflecting that higher levels of economic development strengthen the driving effect on carbon emissions [59]; this positive effect is more pronounced at higher LSI. Figure 13b shows that TI has a particularly strong positive effect in the low-to-medium value range, which weakens progressively at higher values. When landscape concentration is relatively low, expansion of the tertiary sector stimulates carbon emissions more strongly. Figure 13c,d reveal that both LUI and LA exhibit threshold effects on carbon emissions. LUI increases carbon emissions markedly as land use intensity rises, and this effect is clearly modulated by NP; LA increases emissions rapidly beyond a threshold and shows a mutually reinforcing interaction with TI. Figure 13e,f show that the SHAP values for LSI and NP increase with their own values and exhibit a positive synergistic effect with POD, highlighting the combined impact of spatial patterns and population concentration.
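The automated choice of the coloring feature in dependence plots like Figure 13 can be approximated as sketched below. Samples are sorted by the primary feature and split into bins, and the secondary feature whose values correlate most strongly with the primary feature's SHAP values within bins is selected. This resembles in spirit the `interaction_index="auto"` option of the shap library's dependence plot, but it is a simplified heuristic written for illustration, not that library's exact algorithm or the study's code; `pick_interaction_feature` is a hypothetical name.

```python
import numpy as np

def pick_interaction_feature(shap_i, X, i, n_bins=10):
    """Pick the feature j that best explains the spread of feature i's SHAP values
    within bins of x_i (a simplified heuristic, not shap's exact algorithm)."""
    order = np.argsort(X[:, i])
    best_j, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j == i:
            continue
        score = 0.0
        for idx in np.array_split(order, n_bins):
            xj, si = X[idx, j], shap_i[idx]
            # Skip tiny or constant bins where a correlation is undefined.
            if len(idx) > 2 and xj.std() > 0 and si.std() > 0:
                score += abs(np.corrcoef(xj, si)[0, 1])
        if score > best_score:
            best_j, best_score = j, score
    return best_j

# Synthetic example: feature 0's contribution is amplified by feature 2.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(300, 4))
shap_0 = X_demo[:, 0] * (1.0 + 2.0 * X_demo[:, 2])
print(pick_interaction_feature(shap_0, X_demo, 0))
```

Binning on the primary feature removes most of its own effect, so whatever vertical spread remains in each bin is attributed to the candidate partner feature.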
Overall, socioeconomic factors generally exhibit a threshold effect: their influence is limited at low values but intensifies rapidly and remains positive once a critical point is crossed. Land use structure indicators (e.g., LUI) and landscape pattern factors (e.g., LSI, NP) predominantly shift from negative to positive SHAP values, reflecting the regulatory role of optimized landscape structure in carbon emission processes [60]. The formation of carbon emissions is therefore the combined result of economic development, landscape patterns, and land dynamics, all of which act nonlinearly.
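The critical point described above can be located quantitatively on a SHAP dependence curve. One option, assumed here rather than taken from the study, is to fit a piecewise-linear ("hinge") model y ≈ a + b·x + d·max(x − c, 0) and grid-search the breakpoint c that minimizes the squared error; a large positive d indicates that the contribution steepens once x exceeds c. The function name and grid settings are hypothetical.

```python
import numpy as np

def fit_hinge(x, y, n_candidates=50):
    """Grid-search breakpoint c for y ~ a + b*x + d*max(x - c, 0); return (c, [a, b, d])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Candidate breakpoints between the 5th and 95th percentiles of x.
    candidates = np.quantile(x, np.linspace(0.05, 0.95, n_candidates))
    best_sse, best_c, best_coef = np.inf, None, None
    for c in candidates:
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if sse < best_sse:
            best_sse, best_c, best_coef = sse, float(c), coef
    return best_c, best_coef

# Synthetic dependence curve: flat until x = 4, then rising with slope 1.5.
x_demo = np.linspace(0.0, 10.0, 201)
y_demo = np.where(x_demo < 4.0, 0.0, 1.5 * (x_demo - 4.0))
print(fit_hinge(x_demo, y_demo))
```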

5. Conclusions

This paper takes the Yangtze River basin as a case study, combining CO2 net emission data and land use data, and uses a stacking ensemble model to assess the relationship between landscape patterns and CO2 net emissions. It compares the predictive capabilities of XGBoost, LightGBM, CatBoost, and the stacking ensemble model for CO2 net emissions, and it evaluates the importance of variables. The main conclusions are as follows:
(1)
CO2 net emissions showed a significant upward trend from 2002 to 2022, with a net increase of 468.74 Mt. The main reason was the rapid expansion of construction land, which grew by approximately 1.61 × 10⁶ hm², so that its share of carbon emissions continued to rise. Meanwhile, carbon sources increased while carbon sinks in grasslands and water bodies decreased by 5.24% and 9.60%, respectively, accelerating the net increase in carbon emissions. Accelerating urbanization and industrialization have driven land development and infrastructure construction, accompanied by the degradation of natural ecosystems and a marked weakening of carbon sink functions, contributing to the continuous increase in regional carbon emissions.
(2)
Compared to the other three single models, the stacking ensemble model demonstrates superior predictive capability, showing strong feature extraction capabilities and significant advantages in predictive accuracy and stability. Its R2, RMSE and RPD values are 0.80, 4.46, and 2.22, respectively, which are better than the XGBoost model by 21.21%, 23.27% and 30.59%, respectively; better than the CatBoost model by 19.40%, 22.16% and 28.32%; and better than the LightGBM model by 40.35%, 31.17% and 45.10%. The stacking model effectively integrates the strengths of different learning models by combining the prediction results of multiple base learners, making full use of multisource feature information, and significantly improving the fitting and generalization capabilities of complex nonlinear relationships.
(3)
The feature importance ranking of the stacking ensemble model shows that CO2 net emissions are influenced by both human factors (GDPpc, POD, and TI) and natural factors (LUI, Kwater, LPI, LSI, NP, and Kcrop). According to the Pearson correlation analysis, CO2 net emissions were positively correlated with LUI, LUM, LA, Kwood, Kcons, GDPpc, UR, POD, TI, SI, NP, PD, LSI, IJI, and MSIDI and negatively correlated with Kcrop, Kgrass, Kwater, LPI, CONTAG, and PAFRAC. The absolute correlation coefficients are broadly consistent with the feature importance ranking. The most influential factors, in descending order, are GDPpc, POD, TI, LUI, Kwater, and LPI. This demonstrates that socioeconomic activities and landscape patterns influence carbon emissions through multiple pathways, and that factors including population density, economic level, and land use intensity substantially regulate the spatial distribution and dynamics of regional carbon emissions by altering energy consumption structures and ecosystem functions.
From 2002 to 2022, carbon emissions in the Yangtze River basin showed marked spatial variation and a fluctuating upward trend. Among the dominant factors, GDPpc, POD, TI, LUI, Kwater, and LPI appeared frequently and had a substantial impact on the model. Future urbanization should therefore prioritize optimizing landscape patterns, planning the scale of construction land rationally, and increasing the area of land types with strong carbon sink capacity. It is essential to improve the efficiency of construction land use while protecting forests, grasslands, and water bodies, thereby promoting a regionally coordinated development model. This study also has limitations. Although it analyzes the contributions of land use structure, landscape indices, economic level, and population to CO2 net emissions, it does not explore the specific mechanisms through which each factor operates; these should be investigated in future research.

Author Contributions

Conceptualization, methodology, writing—review and editing, visualization, Q.W.; software, J.L.; validation, W.L., Q.G.; formal analysis, J.D.; investigation, J.L.; resources, Z.D., Y.S.; data curation, Q.W., Z.D.; writing—original draft preparation, Q.W.; supervision, B.P.; project administration, B.P.; funding acquisition, B.P. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (42277075), Key Science and Technology Projects under the Science and Technology Innovation Platform (202305a12020039), Anhui Natural Science Research Foundation (2208085US14), Anhui Provincial Science and Technology Plan Project for Housing and Urban-Rural Construction (2024-YF055), and the Natural Science Foundation of colleges and universities in Anhui Province (2023AH050187).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Godlewska, M.; Balk, H.; Izydorczyk, K.; Kaczkowski, Z.; Mankiewicz-Boczek, J.; Ye, S. Rapid in situ assessment of high-resolution spatial and temporal distribution of cyanobacterial blooms using fishery echosounder. Sci. Total Environ. 2023, 857, 159492.
2. Zhao, L.; Chen, M.-N.; Yang, C.-H.; Zhang, R.-Z.; Zhang, Q.-P.; Wang, Q. Characteristics of spatial and temporal carbon emissions from different land uses in Shanxi section of the Yellow River, China. Environ. Dev. Sustain. 2023, 26, 20869–20884.
3. Hong, C.; Burney, J.A.; Pongratz, J.; Nabel, J.E.M.S.; Mueller, N.D.; Jackson, R.B.; Davis, S.J. Global and regional drivers of land-use emissions in 1961–2017. Nature 2021, 589, 554–561.
4. Figueres, C.; Le Quéré, C.; Mahindra, A.; Bäte, O.; Whiteman, G.; Peters, G.; Guan, D. Emissions are still rising: Ramp up the cuts. Nature 2018, 564, 27–30.
5. Rong, T.; Zhang, P.; Zhu, H.; Jiang, L.; Li, Y.; Liu, Z. Spatial correlation evolution and prediction scenario of land use carbon emissions in China. Ecol. Inform. 2022, 71, 101802.
6. Wang, Q.; Su, M. A preliminary assessment of the impact of COVID-19 on environment—A case study of China. Sci. Total Environ. 2020, 728, 138915.
7. Kang, T.; Wang, H.; He, Z.; Liu, Z.; Ren, Y.; Zhao, P. The effects of urban land use on energy-related CO2 emissions in China. Sci. Total Environ. 2023, 870, 161873.
8. Li, Z.; Wu, H.; Wu, F. Impacts of urban forms and socioeconomic factors on CO2 emissions: A spatial econometric analysis. J. Clean. Prod. 2022, 372, 133722.
9. Cai, C.; Fan, M.; Yao, J.; Zhou, L.; Wang, Y.; Liang, X.; Liu, Z.; Chen, S. Spatial-temporal characteristics of carbon emissions corrected by socio-economic driving factors under land use changes in Sichuan Province, southwestern China. Ecol. Inform. 2023, 77, 102164.
10. Pu, X.; Cheng, Q.; Chen, H. Spatial–temporal dynamics of land use carbon emissions and drivers in 20 urban agglomerations in China from 1990 to 2019. Environ. Sci. Pollut. Res. 2023, 30, 107854–107877.
11. Chen, J.; Shi, Q.; Zhang, W. Structural path and sensitivity analysis of the CO2 emissions in the construction industry. Environ. Impact Assess. Rev. 2022, 92, 106679.
12. Wang, G.; Chen, X.; Zhang, Z.; Niu, C. Influencing Factors of Energy-Related CO2 Emissions in China: A Decomposition Analysis. Sustainability 2015, 7, 14408–14426.
13. Zhang, R.; Matsushima, K.; Kobayashi, K. Can land use planning help mitigate transport-related carbon emissions? A case of Changzhou. Land Use Policy 2018, 74, 32–40.
14. Wang, C.; Wang, F.; Zhang, X.; Yang, Y.; Su, Y.; Ye, Y.; Zhang, H. Examining the driving factors of energy related carbon emissions using the extended STIRPAT model based on IPAT identity in Xinjiang. Renew. Sustain. Energy Rev. 2017, 67, 51–61.
15. Quan, C.; Cheng, X.; Yu, S.; Ye, X. Analysis on the influencing factors of carbon emission in China’s logistics industry based on LMDI method. Sci. Total Environ. 2020, 734, 138473.
16. Mo, L. Evaluation model of carbon emission efficiency of land intensive use based on SBM model. Int. J. Environ. Technol. Manag. 2024, 27, 200–215.
17. Ou, J.; Liu, X.; Li, X.; Chen, Y. Quantifying the relationship between urban forms and carbon emissions using panel data analysis. Landsc. Ecol. 2013, 28, 1889–1907.
18. Meng, Q.; Zheng, Y.; Liu, Q.; Li, B.; Wei, H. Analysis of Spatiotemporal Variation and Influencing Factors of Land-Use Carbon Emissions in Nine Provinces of the Yellow River Basin Based on the LMDI Model. Land 2023, 12, 437.
19. Yang, X.; Liu, X. Path analysis and mediating effects of influencing factors of land use carbon emissions in Chang-Zhu-Tan urban agglomeration. Technol. Forecast. Soc. Chang. 2023, 188, 122268.
20. da Costa, L.M.; de Mendonça, G.C.; de Araújo Santos, G.A.; Pacheco, F.A.L.; de Souza Rolim, G.; Panosso, A.R.; La Scala, N., Jr. Drivers of atmospheric CO2 concentration in southeast Brazil: Insights from land use change, vegetation, and climate factors. Remote Sens. Appl. Soc. Environ. 2025, 38, 101614.
21. Qiao, R.; Wu, Z.; Jiang, Q.; Liu, X.; Gao, S.; Xia, L.; Yang, T. The nonlinear influence of land conveyance on urban carbon emissions: An interpretable ensemble learning-based approach. Land Use Policy 2024, 140, 107117.
22. Li, G.; Chen, X.; You, X.-Y. System dynamics prediction and development path optimization of regional carbon emissions: A case study of Tianjin. Renew. Sustain. Energy Rev. 2023, 184, 113579.
23. Pes, B. Evaluating feature selection robustness on high-dimensional data. In International Conference on Hybrid Artificial Intelligence Systems; Springer International Publishing: Cham, Switzerland, 2018.
24. Acheampong, A.O.; Boateng, E.B. Modelling carbon emission intensity: Application of artificial neural network. J. Clean. Prod. 2019, 225, 833–856.
25. Zhang, M.; Al Kafy, A.; Xiao, P.; Han, S.; Zou, S.; Saha, M.; Zhang, C.; Tan, S. Impact of urban expansion on land surface temperature and carbon emissions using machine learning algorithms in Wuhan, China. Urban Clim. 2023, 47, 101347.
26. Zhan, Y.; Liu, W.; Maruyama, Y. Damaged Building Extraction Using Modified Mask R-CNN Model Using Post-Event Aerial Images of the 2016 Kumamoto Earthquake. Remote Sens. 2022, 14, 1002.
27. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258.
28. Hoxha, J.; Çodur, M.Y.; Mustafaraj, E.; Kanj, H.; El Masri, A. Prediction of transportation energy demand in Türkiye using stacking ensemble models: Methodology and comparative analysis. Appl. Energy 2023, 350, 121765.
29. Zhang, B.; Ling, L.; Zeng, L.; Hu, H.; Zhang, D. Multi-step prediction of carbon emissions based on a secondary decomposition framework coupled with stacking ensemble strategy. Environ. Sci. Pollut. Res. 2023, 30, 71063–71087.
30. Zhao, C.; Cao, X.; Chen, X.; Cui, X. A consistent and corrected nighttime light dataset (CCNL 1992–2013) from DMSP-OLS data. Sci. Data 2022, 9, 424.
31. Cai, Z.; Kang, G.; Tsuruta, H.; Mosier, A. Estimate of CH4 Emissions from Year-Round Flooded Rice Fields During Rice Growing Season in China. Pedosphere 2005, 15, 66–71.
32. Fang, J.; Guo, Z.; Piao, S.; Chen, A. Terrestrial vegetation carbon sinks in China, 1981–2000. Sci. China Ser. D Earth Sci. 2007, 50, 1341–1350.
33. Sun, H.; Liang, H.; Chang, X.; Cui, Q.; Tao, Y. Land Use Patterns on Carbon Emission and Spatial Association in China. Econ. Geogr. 2015, 35, 154–162.
34. Shaofeng, Y.; Yiyu, T. Spatial Differentiation of Land Use Carbon Emission in the Yangtze River Economic Belt Based on Low Carbon Perspective. Econ. Geogr. 2019, 39, 190–198.
35. Shi, H.; Mu, X.; Zhang, Y.; Lu, M.Q. Effects of Different Land Use Patterns on Carbon Emission in Guangyuan City of Sichuan Province. Bull. Soil Water Conserv. 2012, 32, 101–106.
36. Mengjie, W.; Yanjun, W.; Shaochun, L. Spatio-temporal difference analysis of carbon emissions in Chang-Zhu-Tan urban agglomeration based on multi-source remote sensing data. Bull. Surv. Mapp. 2023, 1, 65.
37. Qiu, Y.; Zhou, J.; Khandelwal, M.; Yang, H.; Yang, P.; Li, C. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput. 2021, 38, 4145–4162.
38. Zhang, L.; Jánošík, D. Enhanced short-term load forecasting with hybrid machine learning models: CatBoost and XGBoost approaches. Expert Syst. Appl. 2024, 241, 122686.
39. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2020, 54, 1937–1967.
40. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
41. Wei, X.; Rao, C.; Xiao, X.; Chen, L.; Goh, M. Risk assessment of cardiovascular disease based on SOLSSA-CatBoost model. Expert Syst. Appl. 2023, 219, 119648.
42. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 52.
43. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407.
44. Ghasemieh, A.; Lloyed, A.; Bahrami, P.; Vajar, P.; Kashef, R. A novel machine learning model with Stacking Ensemble Learner for predicting emergency readmission of heart-disease patients. Decis. Anal. J. 2023, 7, 100242.
45. Wang, Q.; Lu, H. A novel stacking ensemble learner for predicting residual strength of corroded pipelines. Npj Mater. Degrad. 2024, 8, 87.
46. Cerqueira, V.; Torgo, L.; Mozetič, I. Evaluating time series forecasting models: An empirical study on performance estimation methods. Mach. Learn. 2020, 109, 1997–2028.
47. Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W.; et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929.
48. Zhao, A.B.; Cheng, T. Stock return prediction: Stacking a variety of models. J. Empir. Financ. 2022, 67, 288–317.
49. Hu, S.; Li, S.; Gong, L.; Liu, D.; Wang, Z.; Xu, G. Carbon emissions prediction based on ensemble models: An empirical analysis from China. Environ. Model. Softw. 2025, 188, 106437.
50. Liu, Y.; Tang, C.; Zhou, A.; Yang, K. A novel ensemble approach for road traffic carbon emission prediction: A case in Canada. Environ. Dev. Sustain. 2024, 27, 15977–16013.
51. Tian, S.; Wang, S.; Bai, X.; Luo, G.; Li, Q.; Yang, Y.; Hu, Z.; Li, C.; Deng, Y. Global patterns and changes of carbon emissions from land use during 1992–2015. Environ. Sci. Ecotechnol. 2021, 7, 100108.
52. Baumann, M.; Gasparri, I.; Piquer-Rodríguez, M.; Pizarro, G.G.; Griffiths, P.; Hostert, P.; Kuemmerle, T. Carbon emissions from agricultural expansion and intensification in the Chaco. Glob. Chang. Biol. 2016, 23, 1902–1916.
53. Zhu, C.; Chang, Y.; Li, X.; Shan, M. Factors influencing embodied carbon emissions of China’s building sector: An analysis based on extended STIRPAT modeling. Energy Build. 2022, 255, 111607.
54. Zhang, C.-Y.; Zhao, L.; Zhang, H.; Chen, M.-N.; Fang, R.-Y.; Yao, Y.; Zhang, Q.-P.; Wang, Q. Spatial-temporal characteristics of carbon emissions from land use change in Yellow River Delta region, China. Ecol. Indic. 2022, 136, 108623.
55. Zhou, Y.; Chen, M.; Tang, Z.; Mei, Z. Urbanization, land use change, and carbon emissions: Quantitative assessments for city-level carbon emissions in Beijing-Tianjin-Hebei region. Sustain. Cities Soc. 2021, 66, 102701.
56. Tang, Z.; Wang, Y.; Fu, M.; Xue, J. The role of land use landscape patterns in the carbon emission reduction: Empirical evidence from China. Ecol. Indic. 2023, 156, 111176.
57. Harris, N.L.; Gibbs, D.A.; Baccini, A.; Birdsey, R.A.; de Bruin, S.; Farina, M.; Fatoyinbo, L.; Hansen, M.C.; Herold, M.; Houghton, R.A.; et al. Global maps of twenty-first century forest carbon fluxes. Nat. Clim. Chang. 2021, 11, 234–240.
58. Tsai, C.; Yeh, C. Faith-Shap: The Faithful Shapley Interaction Index. J. Mach. Learn. Res. 2023, 24, 1–42.
59. Wang, Z.; Hao, P.; Yao, P. Non-Linear Relationship between Economic Growth and CO2 Emissions in China: An Empirical Study Based on Panel Smooth Transition Regression Models. Int. J. Environ. Res. Public Health 2017, 14, 1568.
60. Han, X.; Fu, M.; Huang, X. Spatiotemporal Heterogeneity of Land-Use Landscape Pattern Effects on CO2 Emissions at the City-Level Scale in China. Land 2025, 14, 1715.
Figure 1. Various maps of the Yangtze River basin (a) location map of the Yangtze River basin in China, (b) elevation map of the Yangtze River basin, (c) land use map of the Yangtze River basin.
Figure 2. Flow diagram of the histogram decision tree algorithm based on LightGBM.
Figure 3. Diagram of the stacking ensemble model architecture.
Figure 4. Changes in land use in the Yangtze River basin from 2002 to 2012.
Figure 5. Spatial-Temporal Evolution of Land Use in the Yangtze River Basin, 2002–2022. (a) Transfer of land use from 2002 to 2012; (b) transfer of land use from 2012 to 2022. A, B, C, and D represent typical areas with significant expansion of built-up land, indicating regions with relatively concentrated land-use transitions in the Yangtze River Basin.
Figure 6. Trends in CO2 net emissions in the Yangtze River basin from 2002 to 2022.
Figure 7. The evolution of spatio-temporal patterns of CO2 net emissions in the Yangtze River Basin from 2002 to 2022.
Figure 8. Spatial distribution characteristics of human factors in the Yangtze River basin, 2002–2022.
Figure 9. Spatial distribution characteristics of various natural factors in the Yangtze River basin, 2002–2022.
Figure 10. Scatter plot of estimation results for four models: (a) XGBoost, (b) CatBoost, (c) LightGBM, (d) Stacking.
Figure 11. Analysis of the trends of carbon emissions (carbon absorption) for various types of land use from 2002 to 2022. Green dots represent carbon emissions (absorption) values for each land use type from 2002 to 2022. The solid red line indicates the fitted trend, while the shaded area denotes the 95% confidence interval.
Figure 12. SHAP-based feature importance and summary of the top 18 driving factors of carbon emissions: (a) SHAP bar plot, in which mean absolute SHAP values indicate global importance and proportional contributions; (b) SHAP beeswarm plot showing the distribution of SHAP values across all samples, with point positions reflecting contribution magnitude and color gradients denoting feature levels from low (purple) to high (yellow).
Figure 13. SHAP dependence plots of six features for the stacking model. The horizontal axis shows the value of a given feature for each sample, and the vertical axis shows its SHAP value, indicating the magnitude and direction of that feature's contribution to the prediction. The color of each point encodes the value of a second, interacting feature, reflecting how the secondary feature moderates the effect of the primary feature.
Table 1. Data sources.
| Data Name | Time | Data Source | Data Information |
|---|---|---|---|
| Land use data (Version 1.0.3) | 2002–2022 | The 30 m annual land cover datasets and their dynamics in China from 1985 to 2023, https://zenodo.org/records/12779975 (accessed on 1 October 2025) | 30 m × 30 m; cropland, forest, grassland, water, built-up land, unused land (six land use types) |
| Carbon emissions data (based on fossil fuel data) | 2002–2022 | ODIAC fossil fuel emission dataset from the Center for Global Environmental Research | 1 km × 1 km |
| Nighttime lighting data | 2002–2022 | https://eogdata.mines.edu/products/dmsp/ (accessed on 8 November 2024); https://eogdata.mines.edu/products/vnl/ (accessed on 8 November 2024) | DMSP/OLS (2002–2012); NPP/VIIRS (2013–2022) |
| Socioeconomic data | 2002–2022 | China Urban Statistical Yearbook; statistical yearbooks of provinces, municipalities, and prefectures | Population density, GDP, urbanization rate, and proportion of industry structure |
Table 2. Indicators of factors affecting land use and carbon emissions.
| Indicator Layer | Indicator | Formula | Description |
|---|---|---|---|
| Land use patterns | Land use mix index (LUM) | $LUM = -\dfrac{\sum_{i=1}^{n} P_i \ln P_i}{\ln n}$ | $P_i$ is the area proportion of land use type $i$ (%); $n$ is the number of land use types. |
| | Land use degree comprehensive index (LUI) | $LUI_j = 100 \times \sum_{i=1}^{n} A_i \times \dfrac{C_i}{C}$ | $A_i$ is the grading value of land use level $i$; $C_i$ and $C$ are the land use area at level $i$ and the total land area of the region. |
| Land use dynamics | Single land use dynamic degree (K) | $K_T = \dfrac{A_b - A_a}{A_a}$ | $A_a$ and $A_b$ are the areas of the land use type at the beginning and end of the study period. |
| | Comprehensive land use dynamic degree (LA) | $LA_T = \dfrac{\sum_{i=1}^{n} LA_{ij}}{\sum_{i=1}^{n} LA_i} \times 100\%$ | $LA_{ij}$ is the area of land use type $i$ converted to land use type $j$ during the study period; $LA_i$ is the area of land use type $i$ at the beginning of the study period. |
| Landscape index | Number of patches (NP) | $NP = N$ | NP describes the heterogeneity of the whole landscape. |
| | Patch density (PD) | $PD = \dfrac{1}{A}\sum_{i=1}^{M} N_i$ | The larger the PD, the more dispersed the urban land use. |
| | Largest patch index (LPI) | $LPI = \dfrac{\max(a_1, a_2, a_3, \ldots, a_n)}{A} \times 100\%$ | LPI is the proportion of the total landscape area occupied by the largest patch. |
| | Interspersion and juxtaposition index (IJI) | $IJI = \dfrac{-\sum_{k=1}^{m}\left[\dfrac{e_{ik}}{\sum_{k=1}^{m} e_{ik}} \ln\left(\dfrac{e_{ik}}{\sum_{k=1}^{m} e_{ik}}\right)\right]}{\ln(m-1)} \times 100$ | IJI measures the overall interspersion and juxtaposition of patches. |
| | Landscape shape index (LSI) | $LSI = \dfrac{0.25 \sum_{k=1}^{m} e_{ik}^{*}}{\sqrt{TA}}$ | LSI measures the total edge length (or edge density) of the landscape, standardized by its area. |
| | Modified Simpson's evenness index (MSIEI) | $MSIEI = \dfrac{-\ln\left(\sum_{i=1}^{m} P_i^2\right)}{\ln m}$ | MSIEI measures the evenness of the area distribution among patch types. |
| | Contagion index (CONTAG) | $CONTAG = \left[1 + \dfrac{\sum_{i=1}^{m}\sum_{k=1}^{m}\left[P_i \dfrac{g_{ik}}{\sum_{k=1}^{m} g_{ik}}\right]\left[\ln\left(P_i \dfrac{g_{ik}}{\sum_{k=1}^{m} g_{ik}}\right)\right]}{2\ln m}\right] \times 100$ | CONTAG measures the connectivity between different patch types; higher values indicate higher connectivity. |
| | Perimeter-area fractal dimension (PAFRAC) | $PAFRAC = \dfrac{2}{\dfrac{n_i \sum_{j=1}^{n}\left(\ln p_{ij} \cdot \ln a_{ij}\right) - \left(\sum_{j=1}^{n} \ln p_{ij}\right)\left(\sum_{j=1}^{n} \ln a_{ij}\right)}{n_i \sum_{j=1}^{n}\left(\ln p_{ij}\right)^2 - \left(\sum_{j=1}^{n} \ln p_{ij}\right)^2}}$ | PAFRAC reflects the shape complexity of patches across a range of spatial scales (patch sizes). |
| Social and economic factors | Urbanization rate (UR) | Urban population / total population of the region | The proportion of the urban population in the total population of the region. |
| | Population density (POD) | Total population / administrative area | Population density of prefecture-level cities. |
| | GDP per capita (GDPpc) | Total GDP / average annual total population | Per capita economic scale and level of development of the region. |
| | Secondary industry share (SI) | Secondary industry output / total output | The share of the secondary industry in the total output value of the region. |
| | Tertiary industry share (TI) | Tertiary industry output / total output | The share of the tertiary industry in the total output value of the region. |
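Several of the land use indicators in Table 2 reduce to a few lines of arithmetic on area proportions. The sketch below implements LUM, LUI, the single land use dynamic degree, MSIEI, and LPI as written in the table. It assumes proportions are passed as fractions that sum to 1, and the LUI grading values used in the example (1 to 4) are hypothetical, since the grading scheme is not given in this section.

```python
import math


def land_use_mix(proportions):
    """LUM = -sum(P_i ln P_i) / ln n; requires n >= 2 land use types."""
    n = len(proportions)
    entropy = -sum(p * math.log(p) for p in proportions if p > 0)
    return entropy / math.log(n)


def land_use_degree(grades, proportions):
    """LUI = 100 * sum(A_i * C_i / C), with C_i / C supplied as fractions."""
    return 100.0 * sum(a * c for a, c in zip(grades, proportions))


def single_dynamic(area_start, area_end):
    """K = (A_b - A_a) / A_a, the relative change of one land use type."""
    return (area_end - area_start) / area_start


def msiei(proportions):
    """MSIEI = -ln(sum P_i^2) / ln m; equals 1 for a perfectly even landscape."""
    m = len(proportions)
    return -math.log(sum(p * p for p in proportions)) / math.log(m)


def largest_patch_index(patch_areas, total_area):
    """LPI = max patch area / total landscape area * 100."""
    return max(patch_areas) / total_area * 100.0


if __name__ == "__main__":
    shares = [0.40, 0.30, 0.20, 0.10]  # hypothetical area proportions
    grades = [3, 2, 2, 1]              # hypothetical grading values A_i
    print("LUM   =", round(land_use_mix(shares), 4))
    print("LUI   =", round(land_use_degree(grades, shares), 2))
    print("MSIEI =", round(msiei(shares), 4))
```

Both LUM and MSIEI are normalized by the logarithm of the number of types, so they reach 1 only when all types occupy equal shares; this is why the two indices move together for landscapes dominated by a single type.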
Table 3. Carbon emission coefficients of land use types (t·hm⁻²·a⁻¹).
| Land Type | Carbon Emission Factor | Reference Source |
|---|---|---|
| Cropland | 0.4970 | Cai et al., 2005 [31] |
| Forest | −0.6440 | Fang et al., 2007 [32] |
| Grassland | −0.0205 | Sun et al., 2015 [33] |
| Water | −0.0230 | Yuan et al., 2019 [34] |
| Unused land | −0.0050 | Shi et al., 2012 [35] |
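With the coefficients in Table 3, the direct carbon flux of each land use type is its area multiplied by its coefficient, and the net value is the sum over types. The sketch below applies this to hypothetical areas; built-up land has no coefficient in Table 3, so only the five listed types are handled, and any other key raises a `KeyError`.

```python
# Coefficients from Table 3, in t·hm^-2·a^-1 (negative values denote carbon uptake).
COEFFICIENTS = {
    "cropland": 0.4970,
    "forest": -0.6440,
    "grassland": -0.0205,
    "water": -0.0230,
    "unused": -0.0050,
}


def direct_carbon(areas_hm2):
    """Return per-type and net annual carbon (t·a^-1) as sum(area_i * coefficient_i)."""
    per_type = {land: area * COEFFICIENTS[land] for land, area in areas_hm2.items()}
    return per_type, sum(per_type.values())


if __name__ == "__main__":
    # Hypothetical areas in hm^2, for illustration only.
    per_type, net = direct_carbon({"cropland": 100.0, "forest": 100.0})
    print(per_type, round(net, 2))
```

In this toy case, 100 hm² of cropland (about 49.7 t) is more than offset by 100 hm² of forest (about −64.4 t), giving a net uptake of roughly 14.7 t per year.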
Table 4. Model data partition.
| Set | Number of Samples | Min (Mt) | Max (Mt) | Standard Deviation (Mt) |
|---|---|---|---|---|
| Training set | 399 | −2.73 | 71.94 | 6.72 |
| Test set | 266 | −2.35 | 81.77 | 9.93 |
Table 5. Changes in land use area from 2002 to 2022.
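| Land Use Type | Area in 2002 (hm²) | Area in 2012 (hm²) | Change Rate (2002–2012) | Area in 2022 (hm²) | Change Rate (2012–2022) | Change Rate (2002–2022) |
|---|---|---|---|---|---|---|
| Cropland | 5.37 × 10⁷ | 5.24 × 10⁷ | −2.46% | 5.09 × 10⁷ | −2.86% | −5.24% |
| Forest | 8.28 × 10⁷ | 8.37 × 10⁷ | 1.07% | 8.50 × 10⁷ | 1.55% | 2.63% |
| Grassland | 3.41 × 10⁷ | 3.31 × 10⁷ | −2.95% | 3.23 × 10⁷ | −2.36% | −5.24% |
| Water | 3.83 × 10⁶ | 3.86 × 10⁶ | 0.88% | 3.46 × 10⁶ | −10.39% | −9.60% |
| Built-up land | 2.26 × 10⁶ | 3.45 × 10⁶ | 52.20% | 4.61 × 10⁶ | 33.68% | 103.46% |
| Unused land | 1.56 × 10⁶ | 1.78 × 10⁶ | 14.56% | 2.00 × 10⁶ | 12.40% | 28.77% |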
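The 2002–2022 column of Table 5 is consistent with compounding the two sub-period rates rather than adding them: for built-up land, (1 + 0.5220) × (1 + 0.3368) − 1 ≈ 103.46%, not 52.20% + 33.68%. The sketch below performs that check; chaining the sub-period rates reproduces every row of the last column to within about 0.01 percentage points, a small gap that presumably reflects rounding of the published rates.

```python
def change_rate(start, end):
    """Percentage change between two areas."""
    return (end - start) / start * 100.0


def chain_rates(*period_rates_pct):
    """Combine successive period change rates (in %) into the overall rate (in %)."""
    factor = 1.0
    for rate in period_rates_pct:
        factor *= 1.0 + rate / 100.0
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    # Sub-period rates taken from Table 5.
    print(round(chain_rates(52.20, 33.68), 2))   # built-up land
    print(round(chain_rates(0.88, -10.39), 2))   # water
```

Recomputing rates directly from the rounded areas shown in the table (three significant figures) can differ slightly more, so chaining the reported rates is the more reliable consistency check.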
Cropland5.37 × 1075.24 × 107−2.46%5.09 × 107−2.86%−5.24%
Forest8.28 × 1078.37 × 1071.07%8.50 × 1071.55%2.63%
Grassland3.41 × 1073.31 × 107−2.95%3.23 × 107−2.36%−5.24%
Water3.83 × 1063.86 × 1060.88%3.46 × 106−10.39%−9.60%
Built-up land2.26 × 1063.45 × 10652.20%4.61 × 10633.68%103.46%
Unused land1.56 × 1061.78 × 10614.56%2.00 × 10612.40%28.77%
Table 6. Carbon emission prediction models established based on TDN values and indirect carbon emissions in the Yangtze River basin provinces from 2002 to 2022.
| Province | Fitting Equation | R² | F Value |
|---|---|---|---|
| Henan | y = 4636 ln(x) − 4399.7 | 0.93 | 191.50 |
| Hubei | y = 1763.3 ln(x) − 248.69 | 0.91 | 160.03 |
| Zhejiang | y = 4531.1 ln(x) − 7112.1 | 0.94 | 238.67 |
| Chongqing | y = 710.02 ln(x) + 668.3 | 0.92 | 183.16 |
| Qinghai | y = 336.32 ln(x) + 248.42 | 0.95 | 337.49 |
| Jiangsu | y = 9486.3 ln(x) − 22,398 | 0.93 | 230.47 |
| Gansu | y = 2040.8 ln(x) − 1357.4 | 0.92 | 186.76 |
| Shanghai | y = 5684.6 ln(x) − 13,858 | 0.93 | 182.26 |
| Xizang | y = 77.545 ln(x) + 172.02 | 0.92 | 182.58 |
| Jiangxi | y = 1049.9 ln(x) + 618.69 | 0.92 | 130.34 |
| Guangxi | y = 1378 ln(x) + 811.11 | 0.94 | 247.83 |
| Guizhou | y = 884.49 ln(x) + 1627.9 | 0.93 | 177.91 |
| Fujian | y = 2293.2 ln(x) − 1667.3 | 0.95 | 311.33 |
| Hunan | y = 1400.1 ln(x) + 651.89 | 0.94 | 237.96 |
| Shaanxi | y = 3591.4 ln(x) − 3872.8 | 0.97 | 548.33 |
| Yunnan | y = 2049 ln(x) − 1059.2 | 0.92 | 192.82 |
| Guangdong | y = 10,223 ln(x) − 27,931 | 0.91 | 149.75 |
| Sichuan | y = 1529 ln(x) + 31.222 | 0.92 | 171.56 |
| Anhui | y = 2595.4 ln(x) − 40.555 | 0.93 | 230.72 |
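Each equation in Table 6 has the form y = a ln(x) + b, with x the TDN value and y the indirect carbon emissions. The sketch below evaluates such an equation and shows one way to estimate a and b, by ordinary least squares of y on ln(x). That fitting procedure, and the synthetic data in the example, are assumptions for illustration; this section does not state how the published coefficients were estimated or in which units x and y are expressed.

```python
import math


def predict(a, b, tdn):
    """Evaluate y = a * ln(x) + b for a TDN value x > 0."""
    return a * math.log(tdn) + b


def fit_log(xs, ys):
    """Ordinary least-squares fit of y = a ln(x) + b; returns (a, b, R^2)."""
    lx = [math.log(x) for x in xs]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ys) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((v - (a * u + b)) ** 2 for u, v in zip(lx, ys))
    ss_tot = sum((v - mean_y) ** 2 for v in ys)
    return a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic, noise-free points generated from the Anhui equation in Table 6.
    xs = [10.0, 100.0, 1000.0, 5000.0]
    ys = [predict(2595.4, -40.555, x) for x in xs]
    print(fit_log(xs, ys))  # recovers a and b, with R^2 equal to 1
```

Because the model is linear in ln(x), fitting reduces to simple linear regression on the log-transformed predictor; the logarithm compresses high TDN values, so emissions grow ever more slowly as nighttime brightness increases.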
Table 7. Interannual variation in the mean and standard deviation (SD) of carbon emissions in the UYRB, MYRB, and LYRB (Mt).
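| Region | Statistic | 2002 | 2007 | 2012 | 2017 | 2022 |
|---|---|---|---|---|---|---|
| UYRB | Mean | 0.70 | 1.44 | 2.14 | 2.18 | 2.54 |
| | SD | 1.88 | 3.16 | 4.34 | 4.43 | 5.06 |
| MYRB | Mean | 1.11 | 1.97 | 2.80 | 2.87 | 3.32 |
| | SD | 1.23 | 2.03 | 2.88 | 2.94 | 3.37 |
| LYRB | Mean | 4.50 | 8.22 | 12.46 | 12.73 | 14.62 |
| | SD | 6.13 | 11.46 | 17.21 | 17.58 | 19.74 |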
Table 8. Correlation coefficients between CO₂ emissions and the influencing indicators.
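| LUM | LUI | LA | Kcrop | Kwood | Kgrass | Kwater | Kcons | GDPpc | POD | UR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.264 | 0.470 | 0.170 | −0.042 | 0.016 | −0.200 | −0.101 | 0.008 | 0.526 | 0.836 | 0.270 |

| SI | TI | NP | LPI | LSI | MSIDI | PD | IJI | CONTAG | PAFRAC |
|---|---|---|---|---|---|---|---|---|---|
| 0.105 | 0.296 | 0.057 | −0.266 | 0.134 | 0.308 | 0.062 | 0.025 | −0.238 | −0.048 |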
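The abstract describes a Pearson correlation analysis, so the coefficients in Table 8 are presumably Pearson's r between each indicator and CO₂ emissions across samples. Assuming so, each entry is the covariance of the two series divided by the product of their standard deviations, as in the minimal sketch below; the input lists are toy values, not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy series: a perfectly linear pair and a partially shuffled one.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
    print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

Pearson's r captures only linear association, which is one reason a weakly correlated indicator in Table 8 can still carry substantial SHAP importance in a nonlinear model.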
Table 9. Performance metrics of the machine learning models.
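| Model | Training R² | Training RMSE (Mt) | Training RPD | Test R² | Test RMSE (Mt) | Test RPD |
|---|---|---|---|---|---|---|
| XGBoost | 0.71 | 3.51 | 1.87 | 0.66 | 5.82 | 1.70 |
| CatBoost | 0.72 | 3.52 | 1.91 | 0.67 | 5.73 | 1.73 |
| LightGBM | 0.69 | 3.75 | 1.79 | 0.57 | 6.48 | 1.53 |
| Stacking | 0.87 | 2.43 | 2.76 | 0.80 | 4.46 | 2.22 |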
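The sketch below shows common definitions of the three metrics in Table 9 and checks the abstract's improvement figures against the test-set columns, taking CatBoost as the next-best model. RPD is computed here as the sample standard deviation of the observed values divided by RMSE; that convention is an assumption, since definitions vary (some use the population standard deviation).

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired observations and predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    rpd = statistics.stdev(y_true) / rmse  # assumed: sample SD of observations
    return 1.0 - ss_res / ss_tot, rmse, rpd


def relative_change(new, baseline):
    """Percentage change of a metric relative to a baseline model."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Test-set values from Table 9: stacking versus CatBoost.
    print(round(relative_change(0.80, 0.67), 2))  # R^2
    print(round(relative_change(2.22, 1.73), 2))  # RPD
    print(round(relative_change(4.46, 5.73), 2))  # RMSE (negative = reduction)
```

The three printed values match the 19.40% R² gain, 28.32% RPD gain, and 22.16% RMSE reduction reported in the abstract, which confirms that those figures compare test-set performance against CatBoost.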
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pan, B.; Wang, Q.; Diao, Z.; Li, J.; Liu, W.; Gao, Q.; Shu, Y.; Du, J. Landscape Patterns and Carbon Emissions in the Yangtze River Basin: Insights from Ensemble Models and Nighttime Light Data. Atmosphere 2025, 16, 1173. https://doi.org/10.3390/atmos16101173

Chicago/Turabian Style

Pan, Banglong, Qi Wang, Zhuo Diao, Jiayi Li, Wuyiming Liu, Qianfeng Gao, Ying Shu, and Juan Du. 2025. "Landscape Patterns and Carbon Emissions in the Yangtze River Basin: Insights from Ensemble Models and Nighttime Light Data" Atmosphere 16, no. 10: 1173. https://doi.org/10.3390/atmos16101173
