Article

Optimizing Spatial Scales for Evaluating High-Resolution CO2 Fossil Fuel Emissions: Multi-Source Data and Machine Learning Approach

1 Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
2 Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan 430062, China
3 School of Architecture and Engineering, Wuhan City Polytechnic, Wuhan 430064, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(20), 9009; https://doi.org/10.3390/su17209009
Submission received: 2 September 2025 / Revised: 4 October 2025 / Accepted: 6 October 2025 / Published: 11 October 2025

Abstract

High-resolution CO2 fossil fuel emission data are critical for developing targeted mitigation policies. As a key approach for estimating spatial distributions of CO2 emissions, top–down methods typically rely upon spatial proxies to disaggregate administrative-level emissions to finer spatial scales. However, conventional linear regression models may fail to capture complex non-linear relationships between proxies and emissions. Furthermore, methods relying on nighttime light data often represent emissions in industrial and rural zones inadequately. To address these limitations, this study developed a multiple proxy framework integrating nighttime light, points of interest (POIs), population, road networks, and impervious surface area data. Seven machine learning algorithms (Extra-Trees, Random Forest, XGBoost, CatBoost, Gradient Boosting Decision Trees, LightGBM, and Support Vector Regression) were compared for estimating high-resolution CO2 fossil fuel emissions. The evaluation revealed that the multiple proxy Extra-Trees model significantly outperformed the single-proxy nighttime light linear regression model at the county scale, achieving R2 = 0.96 (RMSE = 0.52 MtCO2) in cross-validation and R2 = 0.92 (RMSE = 0.54 MtCO2) on the independent test set. Feature importance analysis identified brightness of nighttime light (40.70%) and heavy industrial density (21.11%) as the most critical spatial proxies. The proposed approach also showed strong spatial consistency with the Multi-resolution Emission Inventory for China, exhibiting correlation coefficients of 0.82–0.84. This study demonstrates that integrating local multiple proxy data with machine learning corrects spatial biases inherent in traditional top–down approaches, establishing a transferable framework for high-resolution emissions mapping.

1. Introduction

Climate change, driven primarily by excessive CO2 emissions from fossil fuel combustion, poses a major challenge to global sustainable development [1]. In response, China has committed to achieving carbon neutrality by 2060 [2]. Given this context, accurately estimating the spatial distribution of CO2 emissions can provide critical data for achieving these targets.
Spatial compilation of CO2 fossil fuel emissions was initiated in the 1980s. Marland et al. [3] pioneered the approach by allocating global emissions to 1° grids using population distribution, establishing the foundation for spatial emission inventories. Subsequent studies refined these methodologies across global, national, and regional scales [4,5,6,7,8]. Major emission inventories include the Carbon Dioxide Information Analysis Center (CDIAC) [9], Emissions Database for Global Atmospheric Research (EDGAR) [10], Fossil Fuel Data Assimilation System (FFDAS) [8,11], Open-source Data Inventory for Anthropogenic CO2 (ODIAC) [7], High-Resolution Fossil Fuel CO2 Emissions for the United States (Vulcan) [12] and Multi-resolution Emission Inventory for China (MEIC) [13]. Two primary methodological paradigms currently exist: top–down and bottom–up approaches [14]. Top–down methods, as distinct from atmospheric inversion modeling, refer here to spatial proxy-based allocation or downscaling techniques [15]. These approaches disaggregate administrative-level emissions to finer grids using spatial proxies such as population density [4,16] and nighttime light data [17,18]. Conversely, bottom–up approaches integrate activity data from combustion sources (e.g., traffic volumes, power plant fuel consumption) with source-specific emission factors to construct emission distributions [6,12,19,20,21], bypassing proxy reliance. Although bottom–up methods achieve higher accuracy, their implementation is constrained in data-scarce regions by the limited availability of activity data [6,9]. Consequently, developing top–down spatial allocation models that integrate multiple proxies is critical for high-resolution emission estimation where activity data are inaccessible [18,22].
With the advancement of remote sensing, nighttime light imagery has proven effective in quantifying anthropogenic activities [23,24]. Within top–down frameworks, nighttime light data has been extensively employed as a spatial proxy to disaggregate CO2 fossil fuel emissions from statistical inventories at the administrative division level in China [22]. However, this approach systematically underestimates emissions in industrial zones and peri-urban areas with low nighttime light intensity [12]. Points of interest (POIs) data, which are geospatial labels including geographic coordinates and functional attributes (e.g., commercial, industrial), provide semantic information unobtainable from satellite imagery [25]. As an emerging big data source, POIs demonstrate significant potential for refining CO2 fossil fuel emission distributions [26,27], effectively compensating for nighttime light limitations in industrial and rural areas [28]. Artificial impervious surfaces, key indicators of built environments, reflect spatial patterns of human settlement and economic activity [29] and serve as primary domains for CO2 fossil fuel emissions [30]. Early spatial disaggregation models relied on statistical regression assuming linear relationships between spatial proxies and CO2 fossil fuel emissions [18,24]. Recent studies have begun exploring machine learning to capture non-linear associations [31,32,33], but these efforts face limitations: (1) predominant use of single algorithms without comparative performance assessment, and (2) inadequate quantification of relative variable contributions. Furthermore, most models calibrate proxy–emission relationships at coarse scales (global/national/provincial) and directly apply them to grid-level estimation, neglecting regional heterogeneity. Counties, as fundamental socioeconomic units, provide an optimal scale for modeling such interactions [32].
To address these limitations, this study selected Hubei Province—a pivotal region within China’s Yangtze River Economic Belt and a national low-carbon pilot zone [34,35]—as a case study. We developed a provincial-scale, high-resolution emission estimation framework that integrates multi-source data with machine learning algorithms based on county-level modeling. Nine spatial proxies were integrated: brightness of nighttime light (NTL), population size (POP), residential kernel density (RKD), commercial kernel density (CKD), heavy industrial kernel density (HKD), light industrial kernel density (LKD), agricultural kernel density (AKD), road length (RDL), and impervious surface block count (ISA). The fitted relationships were applied to generate a 1 km resolution CO2 fossil fuel emissions map for Hubei Province from 2014 to 2017. Spatiotemporal distribution characteristics were analyzed after validation against the MEIC.
The main contributions of this study include the following:
(1) A spatial estimation model of CO2 fossil fuel emissions was constructed based on multi-source data and machine learning models for mapping high-resolution CO2 fossil fuel emissions.
(2) The model takes multiple spatial proxies as inputs, making the estimated spatial distribution of CO2 fossil fuel emissions more realistic than single-proxy allocation.
(3) By comparing multiple machine learning models and analyzing feature importance, this study provides a reference for model construction and feature selection in subsequent related studies.
(4) This study validates that localized multi-proxy integration is essential for high-resolution CO2 fossil fuel emission mapping.
This paper is organized as follows. Section 2 details the study area, data, and methodological framework. Section 3 presents the results, discusses their implications, and identifies limitations. Section 4 provides conclusions and future research directions.

2. Materials and Methods

2.1. Study Area

Situated in central China (29°01′53″ N to 33°06′47″ N, 108°21′42″ E to 116°07′50″ E; Figure 1), Hubei Province encompasses 185,900 km2 and comprises 17 prefecture-level cities and 103 counties. As a pivotal component of two national development strategies, the Yangtze River Economic Belt and the Rise of Central China, the province serves as a major transportation hub with a rapidly growing economy. Hubei exhibits diverse topography, with mountains in the west, hills in the east, and flat plains in the central-south. Characterized by a subtropical monsoon climate, the province has abundant water resources, biodiversity, and mineral deposits.

2.2. Data and Preprocessing

This study integrates socioeconomic and CO2 fossil fuel emissions datasets. Eight distinct datasets were utilized (Table 1). Limited by data availability, the analysis period spans 2014–2017. To ensure spatial consistency, all raster and vector data underwent preprocessing: reprojection to an Albers Equal Area Conic projection and clipping to Hubei Province’s administrative boundaries.

2.2.1. Administrative Boundary and CO2 Emissions Data

  • Administrative boundary and county-level CO2 emissions
County-level administrative boundaries for Hubei Province (comprising 103 counties) were sourced from the National Geographic Information Resources Catalogue Service System (NGIRCS), with the dataset version updated to 2019. County-level CO2 fossil fuel emissions data for 2014–2017 were obtained from the Carbon Emission Accounts and Datasets [38], covering 102 counties. Because the two datasets were released in different years, their administrative divisions were inconsistent and required boundary harmonization. We reconciled the county-level administrative map to establish standardized spatial units for subsequent analysis.
  • Multi-resolution emission inventory for China
MEIC, developed and maintained by Tsinghua University, integrates multi-source activity data to construct high-resolution anthropogenic emission inventories for China [13]. It provides 0.25° × 0.25° gridded emissions, utilizing dynamic downscaling technology to spatially allocate point (e.g., industrial facilities), line (transport networks), and area (e.g., residential fuel use) sources, thereby reducing spatial allocation uncertainties inherent in traditional inventories. This widely recognized dataset has been extensively adopted in global climate governance research and policy assessments [39].

2.2.2. Remote Sensing Datasets

  • Nighttime light data
Although DMSP-OLS and NPP-VIIRS represent widely used nighttime light datasets, their application faces limitations in data quality and temporal coverage. This study utilized the NPP-VIIRS-like NTL data for Hubei Province [36]. This dataset integrates DMSP-OLS (2000–2012) and NPP-VIIRS monthly composites (2013–2018) through cross-sensor calibration, effectively mitigating the blooming effect while enhancing spatial resolution to 500 m and revealing fine-scale nocturnal light variations. To align with analytical consistency and preserve pixel integrity, we resampled the data to 1 km spatial resolution using the nearest-neighbor interpolation method, establishing the brightness of nighttime light (NTL) variable. This approach was chosen to maintain the original pixel values and ensure consistency in spatial heterogeneity with the source datasets.
  • Population data
Population data at 1 km × 1 km grid resolution for Hubei Province (2014–2017) were obtained from the WorldPop project, serving as the population size (POP) variable. This dataset employs a Random Forest model incorporating multiple ancillary variables, including topography, climate, land cover, and transportation networks, to generate high-resolution gridded population density estimates, thereby accurately characterizing spatial population distribution patterns across Hubei Province [40].
  • Artificial impervious surface data
Impervious surfaces, anthropogenic features like buildings, roads, and infrastructure that prevent water infiltration, serve as key indicators of human settlements. This study utilizes the long-term global impervious surface product (GAIA) developed by Gong et al. [37], which accurately captures spatiotemporal dynamics of land cover impermeability. To comprehensively represent CO2 emission sources, 30 m resolution impervious surface blocks were aggregated into 1 km × 1 km grids, establishing the impervious surface block count (ISA) variable.

2.2.3. Road Network Data

In 2021, China’s transport sector emitted approximately 960 million tons of CO2, accounting for 9.1% of national emissions, of which road transport contributed approximately 85% [41]. This study obtained road network data from OpenStreetMap, integrating motorways, primary roads, and secondary roads within Hubei Province. We derived the road length (RDL) spatial proxy to characterize CO2 fossil fuel emissions from transportation, constructing a 2014–2017 road network database for the province.

2.2.4. Point of Interest Data

Facility categories and locations derived from POI data can effectively reflect the spatial distribution of industrial emissions. POI data were acquired via the Baidu Map API (https://map.baidu.com/, accessed on 20 March 2024). After cleaning and reclassification following the method of Zhang et al. [28], five POI categories were established for Hubei Province: residential, commercial, heavy industry, light industry, and agricultural. Kernel density refers to the density of an element in its surrounding domain. Kernel density estimation (KDE) was employed to analyze spatial concentration patterns of these POI types at 1 km × 1 km grid resolution [25]. This yielded five spatial proxy variables characterizing density distributions: residential kernel density (RKD), commercial kernel density (CKD), heavy industry kernel density (HKD), light industry kernel density (LKD), and agricultural kernel density (AKD).
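The kernel density step described above can be illustrated with a minimal sketch. The source does not state the kernel shape or bandwidth, so the Gaussian kernel, the 2000 m `bandwidth`, the function name `kernel_density_grid`, and the sample coordinates are all assumptions made for illustration.

```python
import math

def kernel_density_grid(points, x_min, y_min, n_cols, n_rows,
                        cell=1000.0, bandwidth=2000.0):
    """Gaussian kernel density of POI points evaluated at grid-cell centres.

    points: iterable of (x, y) in metres (projected coordinates).
    Returns a list of rows; values are densities in points per square metre.
    The kernel and bandwidth are assumptions, not taken from the paper.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = []
    for r in range(n_rows):
        cy = y_min + (r + 0.5) * cell          # cell-centre y
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell      # cell-centre x
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid

# Hypothetical heavy-industry POIs clustered in one corner of a 5 km x 5 km area.
pois = [(500.0, 500.0), (700.0, 900.0), (1200.0, 600.0)]
hkd = kernel_density_grid(pois, 0.0, 0.0, n_cols=5, n_rows=5)
```

Cells near the cluster receive the highest density, and the value decays smoothly with distance, which is what lets a POI-based proxy register facilities that emit little light.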

2.2.5. Multicollinearity Assessment of Spatial Proxies

To assess potential multicollinearity among the nine spatial proxies, Pearson correlation coefficients and variance inflation factors (VIFs) were calculated. The correlation matrix revealed coefficients ranging from 0.22 to 0.74, with no correlation exceeding the |0.75| threshold (Figure A1). All VIF values ranged from 1.61 to 4.01, well below the critical threshold of 10 (Table A1). These results confirm the absence of severe multicollinearity among the features.
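As a rough illustration of this check, the sketch below computes a Pearson correlation matrix and VIFs with NumPy. It relies on the standard identity that the VIF of feature j equals the j-th diagonal element of the inverse correlation matrix. The data are synthetic stand-ins, and the function name is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse
    Pearson correlation matrix of the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic stand-in for county-level proxy sums (not the study data).
rng = np.random.default_rng(0)
n = 103
a = rng.normal(size=n)
b = 0.6 * a + 0.8 * rng.normal(size=n)   # moderately correlated with a
c = rng.normal(size=n)                   # independent of a and b
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)      # Pearson correlation matrix
vif = variance_inflation_factors(X)
flagged = [j for j, v in enumerate(vif) if v > 10]   # threshold used in the text
```

An empty `flagged` list corresponds to the conclusion drawn above, that no proxy exceeds the VIF threshold of 10.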

2.3. Methodology

2.3.1. Overall Work Framework

During the training phase, nine 1 km × 1 km raster layers corresponding to the spatial proxies were generated. Pixel values were aggregated to county-level sums, serving as model inputs. County-level CO2 fossil fuel emissions were designated as the response variable. Six tree-based ensemble models (Extra-Trees, XGBoost, CatBoost, Gradient Boosting Decision Trees, Random Forest, and LightGBM) and Support Vector Regression (SVR) were trained using these nine proxies. For comparison, single proxy models employing only the brightness of nighttime light (NTL) variable were implemented; Extra-Trees and Linear Regression were fitted to predict county-level emissions.
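The aggregation of pixel values to county-level sums can be sketched as a zonal sum. The example assumes a rasterized county mask aligned with the proxy grid (county indices starting at 0, with -1 outside the province). That mask layout and the function name `county_sums` are assumptions, since the source does not describe its implementation.

```python
import numpy as np

def county_sums(proxy, county_id, n_counties):
    """Sum a gridded proxy within each county.

    proxy: 2-D array of proxy values on the 1 km grid.
    county_id: same-shape integer array, 0..n_counties-1, or -1 outside the province.
    """
    inside = county_id >= 0
    return np.bincount(county_id[inside], weights=proxy[inside],
                       minlength=n_counties)

# Tiny hypothetical example: two counties on a 2 x 3 grid.
ntl = np.array([[1.0, 2.0, 0.0],
                [3.0, 4.0, 5.0]])
ids = np.array([[0, 0, -1],
                [1, 1, 1]])
sums = county_sums(ntl, ids, n_counties=2)   # county 0 -> 3.0, county 1 -> 12.0
```

Repeating this for each of the nine proxy layers would give one nine-element feature vector per county.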
To ensure model performance, several preprocessing and optimization steps were undertaken. Because disparities in feature scale within the sample dataset may impair machine learning performance, we normalized all nine independent variables using min-max scaling. The dataset was partitioned into training (80%) and testing (20%) sets. Hyperparameters are predetermined parameters that govern model behavior; for each algorithm, those with critical influence on performance were selected for tuning. The search ranges for hyperparameters were determined based on officially recommended settings and preliminary experimental results, to ensure coverage of a reasonable and effective parameter space. Optimization was performed using five-fold Grid Search Cross-Validation (GSCV) [42]. This technique trains models on training data while evaluating performance through cross-validation, ultimately selecting optimal configurations [43]. Final model generalization was assessed on the test set.
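A minimal scikit-learn sketch of this workflow is given below. The data are synthetic, and the two-parameter grid is purely illustrative; the actual search ranges are those listed in Table A2. Placing the scaler inside a pipeline is a design choice of this sketch (it fits the scaling on training folds only) and may differ from the order of operations used in the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 103 counties x 9 proxies (not the study data).
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(103, 9))
y = 0.02 * X[:, 0] + 0.01 * X[:, 4] ** 1.5 + rng.normal(0, 0.1, 103)

# 80/20 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", MinMaxScaler()),                       # min-max normalization
    ("model", ExtraTreesRegressor(random_state=42)),
])
param_grid = {                                       # illustrative grid only
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5],
}

# Five-fold grid search on the training set, scored by R2.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)               # generalization check
```

After fitting, `search.best_params_` holds the selected configuration and `search.best_score_` the mean cross-validated R2, which correspond to the two kinds of figures reported for each model.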
During the prediction phase, the nine 1 km × 1 km raster layers served as inputs to the optimal-performance model, forecasting CO2 distribution weights. These weights were used to disaggregate county-level CO2 fossil fuel emissions into high-resolution grids, generating 1 km × 1 km emission maps for Hubei Province (2014–2017). The grid-scale validation assessed the plausibility of the spatial allocation of CO2 emissions. Subsequently, a spatiotemporal trend analysis was performed on the emission patterns. The methodological workflow is illustrated in Figure 2.

2.3.2. Machine Learning Algorithms

Machine learning offers diverse regression algorithms, spanning linear models to complex neural networks. Based on prior studies [31,32,44], we selected several widely used algorithms classified into three categories: Tree-based ensembles, Support Vector Machines (represented by Support Vector Regression), and Linear Models (represented by Linear Regression).
  • Tree-based ensemble algorithms
A decision tree employs a tree-structured model that recursively partitions sample data using feature thresholds, assigning output values to leaf nodes based on dataset characteristics [45]. Tree-based ensembles aggregate predictions from multiple decision trees to enhance overall performance and generalization. This ensemble approach improves prediction accuracy, robustness, and scalability, as well as accommodates diverse data relationships. Common implementations include bagging and boosting techniques.
Bagging (Bootstrap Aggregating) exemplifies parallel ensemble learning. This method constructs multiple base models through bootstrap sampling (random selection with replacement) from the original dataset, combining their predictions via averaging or majority voting [46]. Its primary objectives are variance reduction and enhanced prediction stability. This study employs Extra-Trees and Random Forest (RF) as representative bagging implementations. The Extra-Trees algorithm increases model diversity and mitigates overfitting through two key mechanisms: (1) selecting random feature subsets at each node and (2) implementing random splitting thresholds. Random Forest builds numerous decision trees using feature subsets, aggregating predictions by voting or averaging to improve accuracy and robustness.
Boosting is a sequential ensemble learning method that iteratively enhances model performance by successively training weak learners and adjusting sample weights based on prior errors [47]. This approach focuses on poorly predicted samples to reduce overall model bias. This study employs four representative boosting algorithms. (1) XGBoost: It builds trees iteratively with parallel processing and regularization, optimizing performance for large-scale and high-dimensional data. (2) CatBoost: The algorithm handles categorical features and missing values automatically and relies on ordered boosting and oblivious trees. (3) Gradient Boosting Decision Trees (GBDT): This method fits each successive tree to the residuals of the current ensemble through gradient descent on the loss function, capturing complex non-linear relationships. (4) LightGBM: It accelerates training via leaf-wise growth and histogram binning while maintaining accuracy.
  • Support Vector Regression
SVR extends Support Vector Machines to regression tasks. It employs kernel functions to model non-linear relationships in high-dimensional feature spaces [48]. The core objective identifies an optimal regression hyperplane minimizing structural risk, where this hyperplane maintains minimal deviation from training observations while satisfying a predefined ε-insensitive boundary. This mechanism thus effectively mitigates overfitting.
  • Linear regression
As the simplest supervised learning approach in machine learning, linear regression assumes a linear relationship between the independent and dependent variables, represented by a fitted straight line or hyperplane. In this study, we constructed a linear regression model using the full dataset [49], expressed as:
Y = 0.0024 x
where Y denotes the county-level CO2 fossil fuel emissions and x represents the total brightness of nighttime light for the same county. Model performance was evaluated via five-fold cross-validation [42].
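The fitted equation has no intercept term. The source does not say how the coefficient was estimated, so the sketch below assumes an ordinary least-squares fit constrained through the origin, for which the slope is the sum of x·y divided by the sum of x². The input values are made up and chosen to reproduce the reported coefficient.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical county NTL totals and emissions lying exactly on Y = 0.0024 x.
ntl_totals = [1000.0, 2500.0, 4000.0]
emissions = [0.0024 * v for v in ntl_totals]
slope = fit_through_origin(ntl_totals, emissions)
```

A model without an intercept assigns zero emissions to any unlit area, which is one way the single-proxy approach can miss dim industrial or rural sources.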

2.3.3. Performance Metrics and Computational Cost

Accuracy assessment quantifies model predictive performance to guide model selection and optimization, making this an essential step in model development. This study employs two metrics widely adopted in CO2 emission evaluation: the coefficient of determination (R2) and root-mean-square error (RMSE). R2 measures regression fit quality, ranging from 0 to 1, where values approaching 1 indicate superior fit. RMSE quantifies prediction error magnitude, with lower values indicating higher accuracy. Given the study’s context, RMSE units are million tons of CO2 (MtCO2). These metrics are defined as:
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $y_i$ represents CO2 fossil fuel emissions for county $i$ from the county-level emission inventory; $\hat{y}_i$ is the estimated CO2 fossil fuel emissions for county $i$; $\bar{y}$ denotes the mean CO2 fossil fuel emissions across all $n$ counties; and $n$ is the total number of counties.
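The two metrics can be written directly from these definitions. Note that R2 is implemented here as the ratio of explained to total sum of squares, as in the text. Many libraries (for example scikit-learn's `r2_score`) instead use one minus the ratio of residual to total sum of squares, and the two forms coincide only in special cases such as ordinary least squares with an intercept. The sample values are made up.

```python
import math

def r2_explained(y_true, y_pred):
    """R2 as explained sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ess = sum((p - mean) ** 2 for p in y_pred)
    tss = sum((t - mean) ** 2 for t in y_true)
    return ess / tss

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (MtCO2 in this study)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical inventory values and estimates for four counties (MtCO2).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
r2 = r2_explained(y_true, y_pred)     # 2.5 / 5.0 = 0.5
err = rmse(y_true, y_pred)            # sqrt(0.125)
```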
Beyond performance metrics, computational cost constitutes another essential consideration in model evaluation. Training time is defined as the duration required to complete a single training cycle on the full training set using the optimal hyperparameters. Inference time refers to the time taken to generate predictions over the entire test set. Together, these two metrics provide a comprehensive assessment of the model’s operational efficiency in practical applications.
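Timing of this kind can be sketched with the standard library. Table 2 reports times as mean ± deviation, but the number of repetitions is not stated, so `repeats=5` and the helper name `time_call` are assumptions; the timed workload is a placeholder for a model's fit or predict call.

```python
import statistics
import time

def time_call(fn, repeats=5):
    """Run fn several times; return (mean, sample standard deviation) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)

# Placeholder workload standing in for model.fit(X_train, y_train).
mean_s, std_s = time_call(lambda: sum(i * i for i in range(10_000)))
```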

2.3.4. Model Explainability Analysis

Machine learning models are often regarded as “black boxes” due to their complex and non-transparent internal relationships. To interpret model outcomes, this study employs two explainability techniques: permutation feature importance and partial dependence plots. The permutation feature importance method, which is a model-agnostic and non-parametric approach, quantifies feature importance without data distribution assumptions, offering wide applicability and interpretability [50]. Partial dependence plots visualize the marginal effects of individual spatial proxies on CO2 fossil fuel emission estimates [51].
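The permutation idea can be sketched in a few lines: shuffle one feature column, re-score the model, and record how much the error grows. The toy model, the mean-squared-error score, and the normalization to shares are assumptions for illustration. scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                      # break the feature-target link
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    # Hypothetical fitted model: strong on feature 0, weak on 1, ignores 2.
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(r) for r in X]

imp = permutation_importance(toy_model, X, y)
total = sum(imp)
shares = [v / total for v in imp]                    # importances as fractions of the total
```

Because the method only needs predictions, the same routine applies unchanged to tree ensembles and to SVR, which is what makes the cross-model comparison possible.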

2.3.5. Spatial Correction of Gridded CO2 Emissions

A systematic mismatch exists between county-level estimates of CO2 fossil fuel emissions and corresponding emission inventory data. This discrepancy propagates to gridded emissions during spatial allocation, causing aggregated grid-cell values within a county to deviate from inventory records. To reconcile this, a spatial correction procedure was implemented based on proportional scaling [49]. The correction factor $U_i$ for county $i$ is defined as:
$$U_i = y_i / \bar{y}_i$$
where $y_i$ is the CO2 emission for county $i$ from the county-level emission inventory, and $\bar{y}_i$ is the estimated CO2 emission for the same county.
The corrected CO2 emission $y_{i,k}$ for the $k$-th grid cell within county $i$ is then calculated by:
$$y_{i,k} = \bar{y}_{i,k} \times U_i$$
where $\bar{y}_{i,k}$ is the original estimated CO2 fossil fuel emissions in the $k$-th grid of county $i$. This scaling ensures that the sum of corrected gridded emissions within each county exactly matches the county-level emission inventory ($\sum_k y_{i,k} = y_i$) while preserving the original spatial distribution patterns across grids.
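The proportional scaling can be sketched as follows. The sketch assumes that the estimated county emission equals the sum of that county's uncorrected grid estimates, which is the condition under which the corrected cells sum exactly to the inventory value. The function name and the numbers are hypothetical.

```python
def correct_county(grid_estimates, inventory_total):
    """Scale a county's grid-cell estimates so they sum to the inventory total.

    Returns (corrected cell values, correction factor U_i).
    """
    estimated_total = sum(grid_estimates)
    if estimated_total == 0:
        raise ValueError("county has no estimated emissions to scale")
    u = inventory_total / estimated_total
    return [v * u for v in grid_estimates], u

# Hypothetical county with four grid cells; the inventory reports 3.0 units.
cells = [0.2, 0.3, 0.5, 1.0]                     # uncorrected estimates, sum = 2.0
corrected, u = correct_county(cells, inventory_total=3.0)
```

Every cell is multiplied by the same factor, so ratios between cells, and therefore the within-county spatial pattern, are unchanged.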

2.3.6. Spatial and Temporal Characteristics of CO2 Fossil Fuel Emissions

To quantify interannual trends in CO2 fossil fuel emissions, this study calculated temporal change slopes for 2014–2017 using linear regression [52]. The slope coefficient ($S_{SLOPE}$) is computed as:
$$S_{SLOPE} = \frac{n \sum_{i=1}^{n} x_i C_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} C_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
where $n = 4$ is the number of years in the 2014–2017 period; $x_i$ is the ordinal number of year $i$; and $C_i$ is the CO2 fossil fuel emissions for year $i$. A positive $S_{SLOPE}$ indicates increasing emissions, whereas a negative $S_{SLOPE}$ denotes decreasing emissions.
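A direct transcription of the slope formula for a single grid cell is shown below. The emission series are made up, and the years are numbered 1 to n, as the ordinal definition of the year index suggests.

```python
def trend_slope(values):
    """Least-squares slope of values against year ordinals 1..n."""
    n = len(values)
    x = range(1, n + 1)
    sx = sum(x)
    sc = sum(values)
    sxc = sum(xi * ci for xi, ci in zip(x, values))
    sxx = sum(xi * xi for xi in x)
    return (n * sxc - sx * sc) / (n * sxx - sx * sx)

# Hypothetical emissions of one grid cell for 2014-2017.
rising = trend_slope([10.0, 12.0, 14.0, 16.0])    # +2.0 per year
falling = trend_slope([5.0, 4.0, 3.0, 2.0])       # -1.0 per year
```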

3. Results and Discussion

3.1. Model Performance and Implications

Based on the number of input variables, the models employed in this study were categorized into two groups. The first group comprises models utilizing multiple input variables, including Extra-Trees, CatBoost, XGBoost, GBDT, RF, LightGBM, and SVR. For clarity, these models are denoted by the prefix “MUL-” attached to their original names (e.g., MUL-Extra-Trees, MUL-CatBoost). The second group consists of models that utilize only brightness of nighttime light as the input variable, encompassing Extra-Trees and Linear Regression. Similarly, these models are prefixed with “NTL-” (e.g., NTL-Extra-Trees, NTL-Linear Regression).
Utilizing the normalized training dataset, model hyperparameters were optimized through grid search methodology, with each configuration undergoing multiple training iterations. Model accuracy was subsequently assessed via five-fold cross-validation, incorporating actual county-level CO2 emissions data. This process culminated in the identification of nine optimal model configurations exhibiting the highest goodness-of-fit. The detailed hyperparameter search ranges and the optimal hyperparameter combinations are cataloged in Appendix A Table A2. The performance metrics and computational cost are summarized in Table 2. The results demonstrate that the NTL-Extra-Trees model significantly outperformed the NTL-Linear Regression model (Table 2). This indicates a non-linear relationship between brightness of nighttime light and CO2 fossil fuel emissions (consistent with N. Zhao et al. [53]). Furthermore, machine learning algorithms are better equipped to capture the underlying complex non-linear associations in this relationship. The MUL-Extra-Trees model consistently outperformed both the NTL-Extra-Trees and NTL-Linear Regression models (Table 2). This demonstrates that utilizing brightness of nighttime light as a sole spatial proxy for CO2 fossil fuel emissions is insufficient, whereas integrating multiple spatial proxies effectively compensates for this limitation.
All seven machine learning models—MUL-Extra-Trees, MUL-CatBoost, MUL-XGBoost, MUL-GBDT, MUL-RF, MUL-LightGBM, and MUL-SVR—delivered consistently high-performance metrics (Table 2). This further validates that machine learning algorithms effectively harness complex non-linear associations between various spatial proxies and CO2 fossil fuel emissions. Among the seven machine learning models, the six tree-based ensemble methods generally outperformed the support vector regression model (MUL-SVR; Table 2). This indicates that for estimating CO2 fossil fuel emissions, tree-based ensembles may possess stronger capabilities for identifying complex non-linear relationships between multiple spatial proxies and emission patterns. Collectively, the MUL-Extra-Trees model demonstrated superior performance across both five-fold cross-validation and independent test phases compared to other models. Specifically, it achieved an R2 of 0.96 with an RMSE of 0.52 MtCO2 during cross-validation, and maintained high accuracy on the test set (R2 = 0.92, RMSE = 0.54 MtCO2).
In terms of computational efficiency (Table 2), the models exhibited varying training and inference times. The MUL-SVR model required the shortest training time (0.0237 ± 0.0071 s), while MUL-GBDT incurred the highest training time (0.6890 ± 0.0150 s). All models exhibited very short inference times (≤0.0178 s). Given the requirement for high-precision estimation, the MUL-Extra-Trees model achieved a favorable balance between comprehensive performance and computational efficiency. These results provide practical guidance for future large-scale emission modeling and algorithm selection.

3.2. Dominant Spatial Proxies and Mechanisms

To evaluate feature contributions to model outcomes, permutation feature importance analysis was used. Nine spatial proxy variables were examined: brightness of nighttime light (NTL), population size (POP), residential kernel density (RKD), commercial kernel density (CKD), heavy industry kernel density (HKD), light industry kernel density (LKD), agricultural kernel density (AKD), road length (RDL), and impervious surface block count (ISA).
Permutation feature importance assessments for the seven CO2 emission estimation models reveal consistent patterns (Table 3). Across all tree-based ensembles and support vector regression, brightness of nighttime light (NTL) and heavy industry kernel density (HKD) consistently emerged as the dominant spatial proxies (NTL ranked first in 5/7 models, HKD ranked first in 2/7 models; Table 3). The cumulative importance of the top two proxies (NTL + HKD) exceeded 55% in 6 of 7 models, indicating their dominant role in capturing fundamental emission patterns and spatial heterogeneity across diverse non-linear frameworks.
To further investigate the individual contributions of spatial proxies, the highest-performing MUL-Extra-Trees model was selected for partial dependence plots (Figure 3). These plots visualize complex non-linear relationships between each proxy and CO2 fossil fuel emissions. CO2 fossil fuel emissions predominantly originate from fossil fuel combustion and human activities. Consequently, they are concentrated in industrial zones and populated areas, with emission levels strongly correlated with anthropogenic intensity [31,32,54]. Figure 3a–i shows how the estimated CO2 fossil fuel emissions change as the value of each spatial proxy variable increases in the MUL-Extra-Trees model.

3.3. Grid-Scale Spatial Accuracy and Error Sources

3.3.1. Inter-Model Comparison and Validation

Utilizing both the conventional single spatial proxy linear regression model (NTL-Linear Regression) and the optimal multiple spatial proxies Extra-Trees model (MUL-Extra-Trees), county-level CO2 fossil fuel emissions data for Hubei Province (2017) were spatially downscaled to a 1 km × 1 km grid. This process incorporated spatial correction to generate a high-resolution (1 km) CO2 fossil fuel emissions distribution map for Hubei Province in 2017.
To evaluate the similarity in spatial distribution patterns between the NTL-Linear Regression and MUL-Extra-Trees outputs, the Spatial Correlation Coefficient (SCC) was employed [8]. The SCC of 0.59 indicated statistically significant agreement in their spatial distributions (p < 0.01; Figure 4a,b). However, the two models differ markedly in their capacity to identify potential emission sources corresponding to the spatial locations of residential and heavy industry POIs. The MUL-Extra-Trees model incorporated the kernel densities of residential and heavy industry POIs among its input features. In contrast, the NTL-Linear Regression model relied solely on nighttime light data. Consequently, the MUL-Extra-Trees model identifies these two POI types far more completely. Throughout the study period, its omission rate for heavy industry point sources remained at 0.00% annually. For residential point sources, omission rates were consistently low at 0.52% (2014), 0.49% (2015), 0.51% (2016), and 0.51% (2017), yielding an average omission rate of 0.51%. The NTL-Linear Regression model exhibited substantially higher omission rates for these POI types. For heavy industry point sources, annual omission rates were 34.26% (2014), 37.53% (2015), 35.85% (2016), and 29.92% (2017), yielding a four-year average of 34.39%; residential point sources showed omission rates of 27.17% (2014), 27.30% (2015), 27.33% (2016), and 23.21% (2017), with a corresponding average of 26.25% (Table 4).
Notably, the estimated range of CO2 fossil fuel emissions generated by the NTL-Linear Regression model (0–396.0 kilotons of CO2 [ktCO2]) substantially exceeds that of the MUL-Extra-Trees model (0–111.500 ktCO2), with a ratio of maximum values being 3.55:1. This phenomenon likely originates from the NTL-Linear Regression model’s reliance on nighttime light data. In suburban industrial zones and small rural residential settlements, illumination levels frequently fall below sensor detection thresholds. Consequently, CO2 emissions from these undetected sources are erroneously reallocated to high-intensity urban light areas. This spatial misassignment generates an artificial aggregation pattern, thereby inflating the maximum grid-level emission value to 396.0 ktCO2.
The spatial distribution of CO2 fossil fuel emissions in Hubei Province (2017) from MEIC is presented in Figure 4c. To quantify the divergence between MUL-Extra-Trees and NTL-Linear Regression model outputs, CO2 fossil fuel emissions at 1 km resolution were aggregated to 0.25° grids. These estimates were compared against MEIC using two metrics: the Sum of Absolute Differences (SAD) and the Spatial Correlation Coefficient (SCC) [8]. The SAD was employed to quantify deviations between model outputs and reference data (MEIC), operating on the principle of summing absolute differences between corresponding grid cells. Complementarily, magnitude-independent spatial correlation analysis was applied to assess spatial similarity in regional emission patterns. As shown in Table 5 (Hubei Province, 2014–2017), SAD values between models differ by less than 0.5% (ΔSAD), indicating comparable total emissions. However, the MUL-Extra-Trees model consistently exhibits higher SCC values (0.82–0.84) than the NTL-Linear Regression model (0.78–0.80), demonstrating its greater similarity in spatial distribution patterns to MEIC.
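The two comparison metrics can be sketched as follows. SAD follows the description above (a sum of absolute cell-wise differences). The SCC is assumed here to be a Pearson correlation over co-located grid cells, which is one common choice and may differ from the exact formulation in [8]. Function names and values are illustrative only.

```python
import math


def sum_abs_diff(model, reference):
    """Sum of Absolute Differences between co-located grid cells."""
    if len(model) != len(reference):
        raise ValueError("grids must have the same number of cells")
    return sum(abs(m - r) for m, r in zip(model, reference))


def spatial_corr(model, reference):
    """Pearson correlation across co-located grid cells (assumed SCC form)."""
    if len(model) != len(reference):
        raise ValueError("grids must have the same number of cells")
    n = len(model)
    mean_m = sum(model) / n
    mean_r = sum(reference) / n
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(model, reference))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_m * var_r)


# Hypothetical flattened 0.25-degree grids.
reference = [1.0, 2.0, 3.0, 4.0]
model_a = [2.0, 4.0, 6.0, 8.0]  # same pattern, doubled magnitude
model_b = [4.0, 3.0, 2.0, 1.0]  # same total, reversed pattern

sad_a = sum_abs_diff(model_a, reference)
scc_a = spatial_corr(model_a, reference)
sad_b = sum_abs_diff(model_b, reference)
scc_b = spatial_corr(model_b, reference)
```

The toy grids show why the two metrics are complementary: `model_b` has the smaller SAD yet a perfectly inverted spatial pattern, whereas `model_a` has the larger SAD but a correlation of 1.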
Furthermore, a difference map between the two models was generated at the 1 km grid scale by subtracting the NTL-Linear Regression results from the MUL-Extra-Trees outputs, showing their spatial discrepancies (Figure 4d). Overall, regions where MUL-Extra-Trees estimates exceeded those of NTL-Linear Regression were predominantly located in suburban areas of counties with low-intensity nighttime light. Conversely, areas with lower MUL-Extra-Trees estimates were concentrated in urban cores exhibiting high-intensity nighttime light. At the 1 km grid scale, differences between the two model estimates predominantly ranged from 0.2 to 1.0 ktCO2. This urban-suburban divergence is consistent with the findings of Gurney et al. [12], who demonstrated that satellite-driven emission inventories (e.g., ODIAC, which utilizes nighttime light data) systematically overestimate emissions in urban cores while underestimating the contributions from suburban and rural areas. Consequently, models relying solely on nighttime light as a proxy (e.g., NTL-Linear Regression) tend to overestimate emissions in high-brightness urban areas while underestimating emissions in dimly lit or unlit regions, such as suburban areas.
In Figure 4e, three areas near Pingjiang Avenue in Xinzhou District, Wuhan (marked with yellow circles) exhibit significantly lower CO2 fossil fuel emission estimates from the MUL-Extra-Trees model than from the NTL-Linear Regression model. Differences in the range of −377.9 to −300.0 ktCO2 are primarily concentrated at two locations: Yangluo passenger station and residential area (①), and Yangluo Port (②). The grid cell containing Wuhan International Container Transport Co., Ltd., Wuhan, China (③) predominantly displayed differences of −300.0 to −250.0 ktCO2. These transportation hubs, residential areas, and container transshipment centers exhibit exceptionally high nighttime light brightness. Consequently, the NTL-Linear Regression model overestimates CO2 fossil fuel emissions at these sites based on nighttime light intensity. In contrast, the MUL-Extra-Trees model integrates multiple spatial proxies, resulting in reduced emission allocations for these areas.

3.3.2. Case Study Analysis of Spatial Error Mechanisms

To further investigate discrepancies in the spatial distribution of CO2 fossil fuel emissions between the MUL-Extra-Trees and NTL-Linear Regression models, three representative areas were selected based on urbanization gradients: (1) Xicheng Development Zone, Zhangwan District, Shiyan City (Figure 5a), representing industrialized urban development; (2) Yanjiahe Town, Macheng City, Huanggang City (Figure 5b), exemplifying remote rural landscapes; and (3) Hongshan Town, Suixian County, Suizhou City (Figure 5c), characterizing urban-rural transition interfaces.
Cement kilns require continuous 24 h operation [55]; however, their enclosed production processes and remote siting result in minimal nighttime light emissions. In Figure 5a, location ① (Gaoqiang Cement Plant, Shiyan, China) exhibits higher CO2 fossil fuel emissions allocated by the MUL-Extra-Trees model compared to the NTL-Linear Regression model within its grid cell. Locations ② (Jiacheng Cement Products Factory, Macheng, China) in Figure 5b and ③ (Suizhou Culvert U-shaped Trough Mingwang Cement Products General Factory, Suizhou, China) in Figure 5c show zero nighttime light values. Consequently, the NTL-Linear Regression model allocated no emissions to these grids, whereas the MUL-Extra-Trees model, incorporating multiple spatial proxies, assigned higher CO2 emissions to both cement production facilities.
Furthermore, in Figure 5a–c, black areas denote impervious surfaces. Particularly in Figure 5b, which depicts a remote rural area, these impervious surfaces delineate the distribution of rural residential settlements. Where impervious surfaces exist with zero nighttime light values and an absence of heavy industrial facilities, the NTL-Linear Regression model allocated no CO2 fossil fuel emissions to these grids. This approach overlooks the potential emissions associated with non-industrial, low-light human activities prevalent in rural residential areas. In contrast, the MUL-Extra-Trees model integrates multiple spatial proxies. Crucially, it mainly leverages the positive correlation between CO2 emissions and impervious surface block counts within each grid cell (Figure 3c). Therefore, in areas such as the rural settlements lacking nighttime light signals and heavy industry, the MUL-Extra-Trees model can still assign emissions mainly based on the presence of these impervious surface block counts.

3.4. Spatial and Temporal Characteristics of CO2 Fossil Fuel Emissions

As illustrated in Figure 6a, CO2 fossil fuel emissions in Hubei Province exhibit distinct spatial agglomeration patterns, forming a multi-nodal distribution centered on Wuhan. The primary emission hotspot is concentrated in the Wuhan core, whereas secondary emission clusters are dispersed across other county-level urban centers (Figure 6b).
Temporally, we examined changes in municipal-level CO2 fossil fuel emissions in Hubei Province from 2014 to 2017 across three dimensions: total emissions, emission density (tons per km2), and emission intensity (per capita emissions and emissions per 100 million Yuan of gross domestic product [GDP]) (Figure 7). Total CO2 fossil fuel emissions increased slightly by 1%, and emission density (tons per km2) rose by 2%, likely linked to urban expansion. Concurrently, per capita emissions decreased by 4%, and emissions per unit GDP exhibited a significant reduction of 26% (Figure 7). Detailed municipal data are presented in Table 6. During the 2014–2017 period, Hubei Province experienced modest population growth (1.5%) alongside substantial GDP expansion (34%) (Table 6), suggesting a potential partial transition toward lower-carbon economic restructuring. Furthermore, from 2014 to 2017, CO2 fossil fuel emissions decreased across 54% of Hubei Province’s areas, increased in 36% of regions, and remained stable in the remaining 10% (Figure 8a,b).
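The three indicator families discussed above are simple ratios. The sketch below shows how they and their percentage changes could be computed for a single municipality. All input values are hypothetical and chosen only for easy arithmetic; they are not taken from Table 6, and the function names are illustrative.

```python
def emission_indicators(total_t, area_km2, population, gdp_100m_yuan):
    """Return density and intensity indicators for one municipality and year."""
    return {
        "density_t_per_km2": total_t / area_km2,
        "per_capita_t": total_t / population,
        "per_gdp_t_per_100m_yuan": total_t / gdp_100m_yuan,
    }


def pct_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


# Hypothetical municipality: emissions grow, GDP grows faster.
year_start = emission_indicators(total_t=100.0, area_km2=10.0,
                                 population=20.0, gdp_100m_yuan=50.0)
year_end = emission_indicators(total_t=120.0, area_km2=10.0,
                               population=24.0, gdp_100m_yuan=80.0)

changes = {key: pct_change(year_start[key], year_end[key]) for key in year_start}
```

With these made-up inputs, density rises by 20%, per capita emissions are unchanged, and emissions per unit GDP fall by 25%, illustrating how total emissions and emission intensity can move in opposite directions.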
The fine-scale spatiotemporal patterns revealed above provide valuable insights for supporting climate policy formulation, particularly in optimizing carbon quota allocation and decarbonization pathways for Hubei Province. The significant spatial clustering of emissions (Figure 6a,b)—evident in the presence of primary and secondary hotspot clusters—suggests that carbon quota distribution could be optimized based on emission density and efficiency within jurisdictions. This approach moves beyond traditional city-level aggregate data toward a more equitable and spatially refined allocation mechanism. Furthermore, identifying regions with pronounced emission growth trends (Figure 8a) enables the implementation of forward-looking, targeted mitigation measures, such as promoting clean energy in specific industrial zones. Such precise interventions can optimize the province’s carbon neutrality pathway. This high-resolution perspective is essential for translating macro-level climate goals into locally actionable strategies.

3.5. Limitations and Future Work

Although this study demonstrates the effectiveness of the MUL-Extra-Trees model in estimating the spatial distribution of CO2 fossil fuel emissions, several limitations warrant consideration. First, this study selected nine spatial proxies for CO2 fossil fuel emission estimation. However, additional variables, such as the Normalized Difference Vegetation Index (NDVI), vehicle classification, building footprints, and GDP, could serve as viable spatial proxies [12,32,58]. Nevertheless, data availability constraints and feature engineering considerations precluded their incorporation.
Second, although the MUL-Extra-Trees model demonstrates good performance in spatial allocation accuracy of CO2 fossil fuel emissions, inherent limitations in POI data constrain its practical applications. Heavy industrial POI data effectively identifies plant locations, including small-scale facilities; however, it fails to differentiate production scales. Future research should integrate remote sensing data to extract plant-scale information. POI datasets, typically generated since 2010, exhibit limited availability in numerous countries. This scarcity constrains the model’s temporal applicability and geographical generalizability.
Third, the model trained in this study at the county level was directly applied to 1 km grids. Although this downscaling process helps to obtain high-resolution emission estimates, it can introduce scale effects, particularly the modifiable areal unit problem (MAUP). The MAUP implies that statistical relationships established at the county scale may break down at the grid scale due to aggregation bias and the neglect of spatial heterogeneity within counties. The present approach assumes scale invariance in the relationship between spatial proxies and CO2 fossil fuel emissions—a necessary yet oversimplified assumption. Although the model demonstrates robust performance at the county level, the scale transfer may lead to biases in grid-level estimates, the magnitude of which remains unquantified. Future research should focus on calibrating and validating the downscaling process through multi-scale modeling or the incorporation of finer-grained data (e.g., point-source emissions) to systematically mitigate scale effects.
Finally, although the machine learning models employed in this study build on previous related research, any individual model has a limited capacity to capture the characteristics of the input variables. A pragmatic approach for future work is to retain the predictive outputs of all high-performing individual models and combine them within an ensemble framework.
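As a rough illustration of the ensemble idea mentioned above, the sketch below blends predictions from several already-fitted models by (optionally weighted) averaging. This is only one simple ensemble scheme, chosen here as an assumption; stacking or other strategies would be equally valid. The model names and prediction values are hypothetical.

```python
def ensemble_average(predictions, weights=None):
    """Blend per-sample predictions from several models by weighted averaging.

    predictions : dict mapping model name -> list of predictions,
                  all lists in the same sample order
    weights     : optional dict mapping model name -> non-negative weight;
                  equal weights are used when omitted
    """
    names = list(predictions)
    if not names:
        raise ValueError("no model predictions supplied")
    n_samples = len(predictions[names[0]])
    if any(len(predictions[name]) != n_samples for name in names):
        raise ValueError("all models must predict the same samples")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total_weight
        for i in range(n_samples)
    ]


# Hypothetical county-level predictions (MtCO2) from three fitted models.
preds = {
    "extra_trees": [1.0, 2.0, 3.0],
    "random_forest": [1.2, 1.8, 3.3],
    "xgboost": [0.8, 2.2, 3.0],
}
blended = ensemble_average(preds)
```

Weights could, for example, be derived from each model's validation score, although that choice is not prescribed by the text.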

4. Conclusions

This study aimed to overcome key limitations in the top–down estimation of high-resolution CO2 fossil fuel emissions. Specifically, it addresses two critical issues: (1) the inability of conventional linear regression models to capture complex non-linear relationships between emissions and spatial proxies; (2) the frequent inadequacy of nighttime light data in representing emissions in both industrial zones and rural areas. To overcome these challenges, the study integrated nighttime light, POI, population, road network, and impervious surface data to develop nine spatial proxies. Seven machine learning algorithms were subsequently implemented: six tree-based ensembles (namely Extra-Trees, CatBoost, XGBoost, Gradient Boosting Decision Trees, Random Forest, and LightGBM) and SVR. This process established estimation models for CO2 fossil fuel emissions across Hubei Province, incorporating the multiple spatial proxies. Validation confirmed that the Extra-Trees model with multiple spatial proxies (MUL-Extra-Trees) delivered optimal performance. Interpretability analysis and grid-scale validation were subsequently conducted. Finally, we analyzed spatiotemporal patterns of CO2 fossil fuel emissions across Hubei Province during the 2014–2017 period.
The multiple-proxy machine learning approach substantially outperformed the single-proxy NTL-Linear Regression model in estimation accuracy. The MUL-Extra-Trees model achieved R2 = 0.96 (RMSE = 0.52 MtCO2) under five-fold cross-validation and R2 = 0.92 (RMSE = 0.54 MtCO2) on the independent test set. Interpretability analysis revealed that the brightness of nighttime light and heavy industry kernel density were the most important features in constructing the CO2 fossil fuel emissions estimation model. Furthermore, the MUL-Extra-Trees model demonstrated enhanced spatial alignment with human activity patterns (as represented by POI data), particularly through refined spatial allocation of CO2 fossil fuel emissions in industrial areas and rural settlements with low or zero nighttime lighting. The spatial correlation with MEIC reached 0.82–0.84, indicating high consistency with the emission patterns of this reference inventory. CO2 fossil fuel emissions in Hubei Province exhibited a multi-nodal spatial pattern centered on Wuhan. Moreover, during the 2014–2017 period, municipal CO2 emission intensity per unit GDP decreased by 26% while GDP grew by 34%, with 64% of regions exhibiting stable or declining emissions, suggesting potential progress in low-carbon economic transition.
This study confirms that the MUL-Extra-Trees model effectively captures non-linear relationships between multiple spatial proxies and CO2 fossil fuel emissions while demonstrating high accuracy in spatial distribution estimation. This validation study, conducted in Hubei Province as a pilot region, demonstrates that the multiple spatial proxies machine learning framework represents a viable approach for spatial estimation of CO2 fossil fuel emissions. Moreover, the high-resolution emission maps generated by the proposed framework provide critical scientific support for advancing regional sustainable development. By enabling the accurate monitoring and attribution of CO2 fossil fuel emissions, the findings can directly guide evidence-based policy-making aimed at balancing economic growth with environmental protection. The capability to precisely identify emission hotspots and trends is essential for optimizing carbon quota allocation, guiding low-carbon transitions in key sectors, and ultimately steering regional development toward a more sustainable and resilient pathway. Future research should extend this methodology to broader regions to further verify its generalization capacity.

Author Contributions

Funding acquisition, J.C.; methodology, J.C.; supervision, R.L.; validation, R.L.; visualization, Y.F.; writing—original draft, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant NO. 42171354) and the Wuhan Municipal Science and Technology Bureau’s 2023 Knowledge Innovation Dawn Plan Project (Grant NO. 2023020201020480).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Variance inflation factors (VIFs) of the spatial proxy.
Spatial Proxy | VIF
Population size | 2.84
Brightness of nighttime light | 4.01
Road length | 2.92
Residential kernel density | 1.61
Light industry kernel density | 2.94
Heavy industry kernel density | 3.39
Commercial kernel density | 1.82
Agricultural kernel density | 3.23
Impervious surface block count | 2.65
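For readers less familiar with VIFs, the values in Table A1 can be translated into the share of each proxy's variance explained by the other proxies, using the standard identity VIF = 1 / (1 − R²). The sketch below applies that identity to the tabulated values. The variable names are illustrative, and the threshold of 5 used in the check is a commonly cited rule of thumb, not a criterion stated in this section.

```python
def implied_r_squared(vif):
    """R-squared of regressing a proxy on the others, from VIF = 1 / (1 - R^2)."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# VIF values as listed in Table A1.
vifs = {
    "Population size": 2.84,
    "Brightness of nighttime light": 4.01,
    "Road length": 2.92,
    "Residential kernel density": 1.61,
    "Light industry kernel density": 2.94,
    "Heavy industry kernel density": 3.39,
    "Commercial kernel density": 1.82,
    "Agricultural kernel density": 3.23,
    "Impervious surface block count": 2.65,
}

r_squared = {name: implied_r_squared(value) for name, value in vifs.items()}
most_collinear = max(vifs, key=vifs.get)

# Rule-of-thumb check (assumed threshold, not taken from the article).
all_below_five = all(value < 5.0 for value in vifs.values())
```

The largest VIF (brightness of nighttime light, 4.01) corresponds to an implied R² of roughly 0.75, and every proxy stays below the assumed threshold of 5.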
Table A2. The hyperparameter search ranges and the optimal hyperparameter combinations of the models.
Input Parameter | Model Name | Hyperparameter | Search Range | Optimal Hyperparameters
Multiple spatial proxies | Extra Trees | max_depth: The maximum depth of the tree. | {1, 5, 10, 15, 20, 25, 30} | 25
Multiple spatial proxies | Extra Trees | max_features: The number of features to consider when looking for the best split. | {3, 4, 5, 6, 7, 8, 9} | 9
Multiple spatial proxies | Extra Trees | min_samples_leaf: The minimum number of samples required to be at a leaf node. | {1, 2, 3, 4} | 1
Multiple spatial proxies | Extra Trees | min_samples_split: The minimum number of samples required to split an internal node. | {1, 2, 3, 4} | 2
Multiple spatial proxies | Extra Trees | n_estimators: The number of trees in the forest. | {50, 100, 150, 200, 250} | 100
Multiple spatial proxies | CatBoost | depth: Controlling the complexity of individual decision trees. | {1, 3, 5, 7, 9} | 7
Multiple spatial proxies | CatBoost | learning_rate: Used for reducing the gradient step. | {0.01, 0.05, 0.1, 0.15, 0.2} | 0.1
Multiple spatial proxies | CatBoost | n_estimators: The number of trees in the model. | {50, 100, 150, 200, 250, 300} | 300
Multiple spatial proxies | XGBoost | learning_rate: Step size shrinkage used in update to prevent overfitting. | {0.01, 0.05, 0.1, 0.15, 0.2, 0.25} | 0.15
Multiple spatial proxies | XGBoost | max_depth: Maximum depth of a tree. | {1, 2, 3, 4, 5} | 2
Multiple spatial proxies | XGBoost | n_estimators: The number of trees in the model. | {50, 100, 150, 200, 250, 300, 350, 400, 450, 500} | 450
Multiple spatial proxies | GBDT | learning_rate: Learning rate shrinks the contribution of each tree. | {0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3} | 0.15
Multiple spatial proxies | GBDT | max_depth: Maximum depth of the individual regression estimators. | {1, 2, 3, 4, 5} | 4
Multiple spatial proxies | GBDT | n_estimators: The number of boosting stages to perform. | {50, 100, 150, 200, 250, 300, 350, 400} | 350
Multiple spatial proxies | RF | max_depth: The maximum depth of the tree. | {1, 5, 10, 15, 20, 25, 30} | 15
Multiple spatial proxies | RF | max_features: The number of features to consider when looking for the best split. | {3, 4, 5, 6, 7, 8, 9} | 4
Multiple spatial proxies | RF | min_samples_leaf: The minimum number of samples required to be at a leaf node. | {1, 2, 3} | 1
Multiple spatial proxies | RF | min_samples_split: The minimum number of samples required to split an internal node. | {1, 2, 3, 4} | 2
Multiple spatial proxies | RF | n_estimators: The number of trees in the forest. | {50, 100, 150, 200, 250, 300} | 30
Multiple spatial proxies | SVR | C: Regularization parameter. | {0.1, 1.0, 10, 100, 1000} | 100
Multiple spatial proxies | SVR | gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. | {0.01, 0.1, 1.0, 10, 100} | 1
Multiple spatial proxies | SVR | kernel: Specifies the kernel type to be used in the algorithm. | {‘linear’, ‘poly’, ‘rbf’} | rbf
Multiple spatial proxies | LightGBM | learning_rate: Boosting learning rate. | {0.01, 0.05, 0.07, 0.1, 0.13, 0.15} | 0.07
Multiple spatial proxies | LightGBM | max_depth: Maximum tree depth for base learners. | {1, 3, 5, 7, 9} | 5
Multiple spatial proxies | LightGBM | n_estimators: Number of boosted trees to fit. | {50, 100, 500, 700, 900, 1100, 1300} | 1100
Multiple spatial proxies | LightGBM | num_leaves: Maximum tree leaves for base learners. | {1, 5, 10, 15, 20} | 15
NTL | Extra Trees | max_depth: The maximum depth of the tree. | {1, 5, 10, 15, 20, 25, 30} | 5
NTL | Extra Trees | min_samples_leaf: The minimum number of samples required to be at a leaf node. | {1, 2, 3, 4} | 1
NTL | Extra Trees | min_samples_split: The minimum number of samples required to split an internal node. | {1, 2, 3, 4} | 4
NTL | Extra Trees | n_estimators: The number of trees in the forest. | {50, 100, 150, 200, 250} | 50
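To give a sense of the search cost implied by Table A2, the sketch below enumerates the multiple-proxy Extra Trees ranges as a Cartesian grid. It assumes an exhaustive grid search scored with five-fold cross-validation, which this appendix does not state explicitly. The helper names are hypothetical, and no model is fitted here.

```python
from itertools import product
from math import prod

# Search ranges for the multiple-proxy Extra Trees model, copied from Table A2.
extra_trees_grid = {
    "max_depth": [1, 5, 10, 15, 20, 25, 30],
    "max_features": [3, 4, 5, 6, 7, 8, 9],
    "min_samples_leaf": [1, 2, 3, 4],
    "min_samples_split": [1, 2, 3, 4],
    "n_estimators": [50, 100, 150, 200, 250],
}

# Optimal combination reported in Table A2.
reported_optimum = {
    "max_depth": 25,
    "max_features": 9,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "n_estimators": 100,
}


def grid_size(grid):
    """Number of hyperparameter combinations in an exhaustive grid."""
    return prod(len(values) for values in grid.values())


def iter_combinations(grid):
    """Yield every combination in the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


n_candidates = grid_size(extra_trees_grid)
n_fits_five_fold = n_candidates * 5  # assumes five folds per candidate
optimum_in_grid = all(
    reported_optimum[key] in extra_trees_grid[key] for key in extra_trees_grid
)
```

Under these assumptions the Extra Trees grid alone contains 3920 candidates, or 19,600 model fits with five folds, which is why coarse step sizes in the search ranges are a practical necessity.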
Figure A1. Correlation coefficient matrix between spatial proxies. Notes: POP: population size; NTL: brightness of nighttime light; RDL: road length; AKD: agricultural kernel density; CKD: commercial kernel density; HKD: heavy industry kernel density; LKD: light industry kernel density; RKD: residential kernel density; ISA: impervious surface block count.

References

  1. Hoegh-Guldberg, O.; Jacob, D.; Taylor, M.; Guillén Bolaños, T.; Bindi, M.; Brown, S.; Camilloni, I.A.; Diedhiou, A.; Djalante, R.; Ebi, K.; et al. The Human Imperative of Stabilizing Global Climate Change at 1.5 °C. Science 2019, 365, eaaw6974.
  2. Xu, G.; Schwarz, P.; Yang, H. Adjusting Energy Consumption Structure to Achieve China’s CO2 Emissions Peak. Renew. Sustain. Energy Rev. 2020, 122, 109737.
  3. Marland, G.; Rotty, R.M.; Treat, N.L. CO2 from Fossil Fuel Burning: Global Distribution of Emissions. Tellus B 1985, 37, 243–258.
  4. Andres, R.J.; Marland, G.; Fung, I.; Matthews, E. A 1° × 1° Distribution of Carbon Dioxide Emissions from Fossil Fuel Consumption and Cement Manufacture, 1950–1990. Glob. Biogeochem. Cycles 1996, 10, 419–429.
  5. Cai, B.; Zhang, L. Urban CO2 Emissions in China: Spatial Boundary and Performance Comparison. Energy Policy 2014, 66, 557–567.
  6. Gurney, K.R.; Mendoza, D.L.; Zhou, Y.; Fischer, M.L.; Miller, C.C.; Geethakumar, S.; De La Rue Du Can, S. High Resolution Fossil Fuel Combustion CO2 Emission Fluxes for the United States. Environ. Sci. Technol. 2009, 43, 5535–5541.
  7. Oda, T.; Maksyutov, S.; Andres, R.J. The Open-Source Data Inventory for Anthropogenic CO2, Version 2016 (ODIAC2016): A Global Monthly Fossil Fuel CO2 Gridded Emissions Data Product for Tracer Transport Simulations and Surface Flux Inversions. Earth Syst. Sci. Data 2018, 10, 87–107.
  8. Rayner, P.J.; Raupach, M.R.; Paget, M.; Peylin, P.; Koffi, E. A New Global Gridded Data Set of CO2 Emissions from Fossil Fuel Combustion: Methodology and Evaluation. J. Geophys. Res. Atmos. 2010, 115, D19306.
  9. Andres, R.J.; Boden, T.A.; Bréon, F.-M.; Ciais, P.; Davis, S.; Erickson, D.; Gregg, J.S.; Jacobson, A.; Marland, G.; Miller, J.; et al. A Synthesis of Carbon Dioxide Emissions from Fossil-Fuel Combustion. Biogeosciences 2012, 9, 1845–1871.
  10. European Commission; Joint Research Centre; Institute for Environment and Sustainability; PBL Netherlands Environmental Assessment Agency. Trends in Global CO2 Emissions: 2012 Report; Publications Office: Luxembourg, 2012.
  11. Asefi-Najafabady, S.; Rayner, P.J.; Gurney, K.R.; McRobert, A.; Song, Y.; Coltin, K.; Huang, J.; Elvidge, C.; Baugh, K. A Multiyear, Global Gridded Fossil Fuel CO2 Emission Data Product: Evaluation and Analysis of Results. J. Geophys. Res. Atmos. 2014, 119, 10213–10231.
  12. Gurney, K.R.; Liang, J.; Patarasuk, R.; Song, Y.; Huang, J.; Roest, G. The Vulcan Version 3.0 High-Resolution Fossil Fuel CO2 Emissions for the United States. J. Geophys. Res. Atmos. 2020, 125, e2020JD032974.
  13. Li, M.; Liu, H.; Geng, G.; Hong, C.; Liu, F.; Song, Y.; Tong, D.; Zheng, B.; Cui, H.; Man, H.; et al. Anthropogenic Emission Inventories in China: A Review. Natl. Sci. Rev. 2017, 4, 834–866.
  14. Committee on Development of a Framework for Evaluating Global Greenhouse Gas Emissions Information for Decision Making; Board on Atmospheric Sciences and Climate; Division on Earth and Life Studies; National Academies of Sciences, Engineering, and Medicine. Greenhouse Gas Emissions Information for Decision Making: A Framework Going Forward; National Academies Press: Washington, DC, USA, 2022; ISBN 978-0-309-69114-7.
  15. Van Vuuren, D.P.; Smith, S.J.; Riahi, K. Downscaling Socioeconomic and Emissions Scenarios for Global Environmental Change Research: A Review. WIREs Clim. Change 2010, 1, 393–404.
  16. Olivier, J.G.J.; Van Aardenne, J.A.; Dentener, F.J.; Pagliari, V.; Ganzeveld, L.N.; Peters, J.A.H.W. Recent Trends in Global Greenhouse Gas Emissions: Regional Trends 1970–2000 and Spatial Distribution of Key Sources in 2000. Environ. Sci. 2005, 2, 81–99.
  17. Doll, C.H.; Muller, J.-P.; Elvidge, C.D. Night-Time Imagery as a Tool for Global Mapping of Socioeconomic Parameters and Greenhouse Gas Emissions. AMBIO J. Hum. Environ. 2000, 29, 157–162.
  18. Meng, L.; Graus, W.; Worrell, E.; Huang, B. Estimating CO2 (Carbon Dioxide) Emissions at Urban Scales by DMSP/OLS (Defense Meteorological Satellite Program’s Operational Linescan System) Nighttime Light Imagery: Methodological Challenges and a Case Study for China. Energy 2014, 71, 468–478.
  19. Gately, C.K.; Hutyra, L.R. Large Uncertainties in Urban-Scale Carbon Emissions. J. Geophys. Res. 2017, 122, 11242–11260.
  20. Gurney, K.R.; Razlivanov, I.; Song, Y.; Zhou, Y.; Benes, B.; Abdul-Massih, M. Quantification of Fossil Fuel CO2 Emissions on the Building/Street Scale for a Large U.S. City. Environ. Sci. Technol. 2012, 46, 12194–12202.
  21. Gurney, K.R.; Patarasuk, R.; Liang, J.; Song, Y.; O’Keeffe, D.; Rao, P.; Whetstone, J.R.; Duren, R.M.; Eldering, A.; Miller, C. The Hestia Fossil Fuel CO2 Emissions Data Product for the Los Angeles Megacity (Hestia-LA). Earth Syst. Sci. Data 2019, 11, 1309–1335.
  22. Cai, M.; Shi, Y.; Ren, C.; Yoshida, T.; Yamagata, Y.; Ding, C.; Zhou, N. The Need for Urban Form Data in Spatial Modeling of Urban Carbon Emissions in China: A Critical Review. J. Clean. Prod. 2021, 319, 128792.
  23. Elvidge, C.D.; Imhoff, M.L.; Baugh, K.E.; Hobson, V.R.; Nelson, I.; Safran, J.; Dietz, J.B.; Tuttle, B.T. Night-Time Lights of the World: 1994–1995. ISPRS J. Photogramm. Remote Sens. 2001, 56, 81–99.
  24. Oda, T.; Maksyutov, S. A Very High-Resolution (1 km × 1 km) Global Fossil Fuel CO2 Emission Inventory Derived Using a Point Source Database and Satellite Observations of Nighttime Lights. Atmos. Chem. Phys. 2011, 11, 543–556.
  25. Bao, W.; Gong, A.; Zhao, Y.; Chen, S.; Ba, W.; He, Y. High-Precision Population Spatialization in Metropolises Based on Ensemble Learning: A Case Study of Beijing, China. Remote Sens. 2022, 14, 3654.
  26. Wang, J.; Wei, J.; Zhang, W.; Liu, Z.; Du, X.; Liu, W.; Pan, K. High-Resolution Temporal and Spatial Evolution of Carbon Emissions from Building Operations in Beijing. J. Clean. Prod. 2022, 376, 134272.
  27. Zheng, Y.; Du, S.; Zhang, X.; Bai, L.; Wang, H. Estimating Carbon Emissions in Urban Functional Zones Using Multi-Source Data: A Case Study in Beijing. Build. Environ. 2022, 212, 108804.
  28. Zhang, X.; Xie, Y.; Jiao, J.; Zhu, W.; Guo, Z.; Cao, X.; Liu, J.; Xi, G.; Wei, W. How to Accurately Assess the Spatial Distribution of Energy CO2 Emissions? Based on POI and NPP-VIIRS Comparison. J. Clean. Prod. 2023, 402, 136656.
  29. Sutton, P.C.; Anderson, S.J.; Elvidge, C.D.; Tuttle, B.T.; Ghosh, T. Paving the Planet: Impervious Surface as Proxy Measure of the Human Ecological Footprint. Prog. Phys. Geogr. Earth Environ. 2009, 33, 510–527.
  30. Wang, M.; Wang, Y.; Li, S.; Lin, Y.; Teng, F.; Cai, H. Spatio-temporal difference analysis of carbon emissions in Chang-Zhu-Tan urban agglomeration based on multi-source remote sensing data. Bull. Surv. Mapp. 2023, 1, 65.
  31. Cao, H.; Han, L.; Liu, M.; Li, L. Spatial Differentiation of Carbon Emissions from Energy Consumption Based on Machine Learning Algorithm: A Case Study during 2015–2020 in Shaanxi, China. J. Environ. Sci. 2023, 149, S1001074223003558.
  32. Liu, Z.; Han, L.; Liu, M. Spatiotemporal Characteristics of Carbon Emissions in Shaanxi, China, during 2012–2019: A Machine Learning Method with Multiple Variables. Environ. Sci. Pollut. Res. 2023, 30, 87535–87548.
  33. Zhao, C.; Zhang, M.; Bai, J.; Wu, J.; Chang, I.-S. A Review of the Application of Machine Learning in Carbon Emission Assessment Studies: Prediction Optimization and Driving Factor Selection. Sci. Total Environ. 2025, 987, 179678.
  34. National Development and Reform Commission. National Development and Reform Commission on Low-Carbon Provinces and Areas and Low-Carbon Cities on Pilot Work. 2011. Available online: https://www.ndrc.gov.cn/xxgk/zcfb/tz/201008/t20100810_964674.html (accessed on 14 June 2025).
  35. Wen, Y.; Hu, P.; Li, J.; Liu, Q.; Shi, L.; Ewing, J.; Ma, Z. Does China’s Carbon Emissions Trading Scheme Really Work? A Case Study of the Hubei Pilot. J. Clean. Prod. 2020, 277, 124151.
  36. Chen, Z.; Yu, B.; Yang, C.; Zhou, Y.; Yao, S.; Qian, X.; Wang, C.; Wu, B.; Wu, J. An Extended Time Series (2000–2018) of Global NPP-VIIRS-like Nighttime Light Data from a Cross-Sensor Calibration. Earth Syst. Sci. Data 2021, 13, 889–906.
  37. Gong, P.; Li, X.; Wang, J.; Bai, Y.; Chen, B.; Hu, T.; Liu, X.; Xu, B.; Yang, J.; Zhang, W.; et al. Annual Maps of Global Artificial Impervious Area (GAIA) between 1985 and 2018. Remote Sens. Environ. 2020, 236, 111510.
  38. Chen, J.; Gao, M.; Cheng, S.; Hou, W.; Song, M.; Liu, X.; Liu, Y.; Shan, Y. County-Level CO2 Emissions and Sequestration in China during 1997–2017. Sci. Data 2020, 7, 391.
  39. Geng, G.; Liu, Y.; Liu, Y.; Liu, S.; Cheng, J.; Yan, L.; Wu, N.; Hu, H.; Tong, D.; Zheng, B.; et al. Efficacy of China’s Clean Air Actions to Tackle PM2.5 Pollution between 2013 and 2020. Nat. Geosci. 2024, 17, 987–994.
  40. Lloyd, C.T.; Chamberlain, H.; Kerr, D.; Yetman, G.; Pistolesi, L.; Stevens, F.R.; Gaughan, A.E.; Nieves, J.J.; Hornby, G.; MacManus, K.; et al. Global Spatio-Temporally Harmonised Datasets for Producing High-Resolution Gridded Population Distribution Datasets. Big Earth Data 2019, 3, 108–139.
  41. International Energy Agency. China–Emissions. 2021. Available online: https://www.iea.org/countries/china/emissions (accessed on 10 June 2025).
  42. Zhang, Y.; Liang, S.; Zhu, Z.; Ma, H.; He, T. Soil Moisture Content Retrieval from Landsat 8 Data Using Ensemble Learning. ISPRS J. Photogramm. Remote Sens. 2022, 185, 32–47.
  43. Adnan, M.; Alarood, A.A.S.; Uddin, M.I.; Ur Rehman, I. Utilizing Grid Search Cross-Validation with Adaptive Boosting for Augmenting Performance of Machine Learning Models. PeerJ Comput. Sci. 2022, 8, e803.
  44. Lin, X.; Ma, J.; Chen, H.; Shen, F.; Ahmad, S.; Li, Z. Carbon Emissions Estimation and Spatiotemporal Analysis of China at City Level Based on Multi-Dimensional Data and Machine Learning. Remote Sens. 2022, 14, 3014.
  45. Charbuty, B.; Abdulazeez, A. Classification Based on Decision Tree Algorithm for Machine Learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
  46. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
  47. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of Gradient Boosting Algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
  48. Tanveer, M.; Rajani, T.; Rastogi, R.; Shao, Y.H.; Ganaie, M.A. Comprehensive Review on Twin Support Vector Machines. Ann. Oper. Res. 2022, 339, 1223–1268.
  49. Zhang, X.; Cai, Z.; Song, W.; Yang, D. Mapping the Spatial-Temporal Changes in Energy Consumption-Related Carbon Emissions in the Beijing-Tianjin-Hebei Region via Nighttime Light Data. Sustain. Cities Soc. 2023, 94, 104476.
  50. Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation Importance: A Corrected Feature Importance Measure. Bioinformatics 2010, 26, 1340–1347.
  51. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  52. He, C.; Ma, Q.; Li, T.; Yang, Y.; Liu, Z. Spatiotemporal Dynamics of Electric Power Consumption in Chinese Mainland from 1995 to 2008 Modeled Using DMSP/OLS Stable Nighttime Lights Data. J. Geogr. Sci. 2012, 22, 125–136.
  53. Zhao, N.; Samson, E.L.; Currit, N.A. Nighttime-Lights-Derived Fossil Fuel Carbon Dioxide Emission Maps and Their Limitations. Photogramm. Eng. Remote Sens. 2015, 81, 935–943.
  54. Bun, R.; Hamal, K.; Gusti, M.; Bun, A. Spatial GHG Inventory at the Regional Level: Accounting for Uncertainty. Clim. Change 2010, 103, 227–244.
  55. GB 4915-2013; Emission Standard of Air Pollutants for Cement Industry. Ministry of Ecology and Environment of the People’s Republic of China: Beijing, China, 2013. Available online: https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/dqhjbh/dqgdwrywrwpfbz/201312/t20131227_265765.shtml (accessed on 7 July 2025).
  56. Hubei Provincial Bureau of Statistics. Hubei Statistical Yearbook 2015; Hubei Provincial Bureau of Statistics: Wuhan, China, 2015. Available online: http://tjj.hubei.gov.cn/tjsj/sjkscx/tjnj/qstjnj/index.shtml (accessed on 15 July 2025).
  57. Hubei Provincial Bureau of Statistics. Hubei Statistical Yearbook 2018; Hubei Provincial Bureau of Statistics: Beijing, China, 2018. Available online: http://tjj.hubei.gov.cn/tjsj/sjkscx/tjnj/qstjnj/index.shtml (accessed on 15 July 2025).
  58. Moran, D.; Pichler, P.-P.; Zheng, H.; Muri, H.; Klenner, J.; Kramel, D.; Többen, J.; Weisz, H.; Wiedmann, T.; Wyckmans, A.; et al. Estimating CO2 Emissions for 108 000 European Cities. Earth Syst. Sci. Data 2022, 14, 845–864.
Figure 1. Location of the study area (Hubei Province, China).
Figure 2. Flowchart of the framework.
Figure 3. Partial dependence plots for each spatial proxy variable in the MUL-Extra-Trees model: (a) Brightness of nighttime light (NTL); (b) Heavy industry kernel density (HKD); (c) Impervious surface block count (ISA); (d) Population size (POP); (e) Road length (RDL); (f) Light industry kernel density (LKD); (g) Agricultural kernel density (AKD); (h) Residential kernel density (RKD); (i) Commercial kernel density (CKD). Notes: Vertical axes: carbon emissions in million tons of CO2 (MtCO2). Solid blue lines: fitted curves; black ticks: distribution density of input variables.
Figure 4. Spatial distribution of CO2 fossil fuel emissions at 1 km × 1 km resolution in Hubei Province (2017): (a) NTL-Linear Regression model; (b) MUL-Extra-Trees model; (c) MEIC; (d) Difference between the MUL-Extra-Trees and NTL-Linear Regression models; (e) Local detail near Pingjiang Avenue, Xinzhou District, Wuhan City. Notes: Annotations in (e): ① Yangluo passenger station and residential area; ② Yangluo Port; ③ Wuhan International Container Transport Co., Ltd., Wuhan, China.
Figure 5. Spatial distribution of CO2 fossil fuel emissions at the grid scale for NTL-Linear Regression and MUL-Extra-Trees models. Representative areas: (a) Xicheng Development Zone, Zhangwan District, Shiyan City; (b) Yanjiahe Town, Macheng City, Huanggang City; (c) Hongshan Town, Suixian County, Suizhou City. Notes: Annotations: ① Gaoqiang Cement plant, Shiyan, China; ② Jiacheng Cement Products Factory, Macheng, China; ③ Suizhou Culvert U-shaped Trough Mingwang Cement Products General Factory, Suizhou, China.
Figure 6. Spatial patterns of CO2 fossil fuel emissions in Hubei Province: (a) Multi-year average emissions (2014–2017); (b) Kernel density estimation of emissions (2017).
Figure 7. CO2 fossil fuel emissions from 2014 to 2017: (a) Per unit of GDP; (b) Per capita.
Figure 8. CO2 fossil fuel emissions in Hubei Province, 2014–2017: (a) Temporal trends; (b) Pie chart of emission change proportions.
Table 1. List of datasets and sources used in the study.
| Category | Datasets | Format | Time | Sources |
|---|---|---|---|---|
| Socioeconomic data | Point of interest | Vector (Point) | 2014–2017 | Baidu Map Services |
| | Road network | Vector (Polyline) | 2014–2017 | OpenStreetMap (https://download.geofabrik.de/asia/china.html, accessed on 16 March 2024) |
| | WorldPop | Raster (1 km) | 2014–2017 | WorldPop (https://hub.worldpop.org/project/categories?id=3, accessed on 15 March 2024) |
| | Nighttime light image | Raster (500 m) | 2014–2017 | Chen et al. [36] |
| | Impervious surface | Raster (30 m) | 2014–2017 | PENG CHENG LABORATORY, Gong et al. [37] |
| CO2 emissions data | County-level CO2 emissions | Table | 2014–2017 | Carbon Emission Accounts and Datasets (https://www.ceads.net/, accessed on 15 March 2024) |
| | MEIC-China-CO2 1.4 | Raster (0.25°) | 2014–2017 | Multi-resolution Emission Inventory model for Climate and air pollution research (http://meicmodel.org.cn/#firstPage, accessed on 20 May 2024) |
| Basic geographic data | Administrative boundaries | Vector (Polygon) | 2021 | National Catalogue Service for Geographic Information (https://www.webmap.cn/commres.do?method=dataDownload, accessed on 16 March 2024) |
Table 2. Performance metrics for five-fold cross-validation and the test set, with computational cost.
| Input Parameter | Model Name | Cross-Validation R2 | Cross-Validation RMSE (MtCO2) | Test Set R2 | Test Set RMSE (MtCO2) | Training Time (s) | Inference Time (s) |
|---|---|---|---|---|---|---|---|
| Multiple spatial proxies | MUL-Extra-Trees a | 0.96 | 0.52 | 0.92 | 0.54 | 0.2780 ± 0.0252 | 0.0178 ± 0.0023 |
| | MUL-CatBoost | 0.96 | 0.57 | 0.91 | 0.58 | 0.4114 ± 0.1092 | 0.0026 ± 0.0017 |
| | MUL-XGBoost | 0.96 | 0.58 | 0.88 | 0.65 | 0.3634 ± 0.3054 | 0.0024 ± 0.0008 |
| | MUL-GBDT | 0.94 | 0.66 | 0.85 | 0.74 | 0.6890 ± 0.0150 | 0.0016 ± 0.0008 |
| | MUL-RF | 0.94 | 0.67 | 0.88 | 0.66 | 0.3616 ± 0.0194 | 0.0159 ± 0.0014 |
| | MUL-SVR | 0.94 | 0.70 | 0.85 | 0.73 | 0.0237 ± 0.0071 | 0.0030 ± 0.0011 |
| | MUL-LightGBM | 0.93 | 0.71 | 0.89 | 0.63 | 0.1786 ± 0.2038 | 0.0022 ± 0.0004 |
| NTL | NTL-Extra-Trees | 0.73 | 1.43 | 0.60 | 1.20 | 0.0084 ± 0.0148 | 0.0002 ± 0.0004 |
| | NTL-Linear Regression b | 0.41 | 2.06 | N/A | N/A | N/A | N/A |
a The MUL-Extra-Trees model achieved the best performance. b The NTL-Linear Regression model was fitted on the entire dataset within five-fold cross-validation, with no hold-out test set, because linear regression requires as much data as possible for stable coefficients; its metrics are therefore comparable with those of the other models only for cross-validation. Abbreviations: N/A, not available; MtCO2, million tons of carbon dioxide. Notes: Training and inference times are reported as mean ± standard deviation over 5 repeated runs. All experiments were run on the same hardware platform (Intel Core i5-8265U CPU @ 1.60 GHz, 8 GB RAM) to ensure comparability.
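For readers who want to reproduce the two accuracy metrics in Table 2, the following is a minimal sketch of RMSE and R2 under their standard definitions. The function names and the sample values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. MtCO2)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical county-level emissions in MtCO2 (not values from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("RMSE:", round(rmse(observed, predicted), 4))
print("R2:", round(r_squared(observed, predicted), 4))
```

In a five-fold setting, these functions would be applied to the held-out fold of each split and the results averaged; how the study aggregated the folds is not stated in the table.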
Table 3. Permutation feature importance (%) of spatial proxy variables across seven machine learning models.
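| Variable | MUL-Extra-Trees | MUL-CatBoost | MUL-XGBoost | MUL-GBDT | MUL-RF | MUL-SVR | MUL-LightGBM |
|---|---|---|---|---|---|---|---|
| NTL | 40.70 | 30.38 | 34.25 | 27.96 | 31.13 | 38.85 | 25.63 |
| HKD | 21.11 | 18.99 | 30.00 | 31.81 | 27.70 | 17.89 | 42.60 |
| ISA | 16.92 | 11.59 | 7.33 | 9.10 | 18.44 | 3.98 | 9.57 |
| POP | 8.46 | 9.79 | 13.11 | 12.04 | 5.64 | 11.41 | 8.97 |
| RDL | 8.14 | 11.76 | 7.87 | 15.59 | 10.89 | 13.40 | 6.86 |
| LKD | 1.81 | 3.93 | 1.34 | 0.79 | 1.62 | 3.07 | 2.11 |
| AKD | 1.26 | 4.02 | 3.77 | 1.40 | 1.13 | 6.81 | 1.66 |
| RKD | 1.14 | 6.38 | 1.22 | 0.86 | 2.73 | 0.93 | 1.86 |
| CKD | 0.46 | 3.17 | 1.11 | 0.46 | 0.73 | 3.66 | 0.74 |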
NTL: brightness of nighttime light; HKD: heavy industry kernel density; ISA: impervious surface block count; POP: population size; RDL: road length; LKD: light industry kernel density; AKD: agricultural kernel density; RKD: residential kernel density; CKD: commercial kernel density. Dominant proxies: NTL ranked first in 5/7 models, and HKD ranked first in MUL-GBDT and MUL-LightGBM. Cumulative importance of top two proxies (NTL + HKD) exceeded 55% in 6 of 7 models. The exception was MUL-CatBoost (49.37%).
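The summary statements in the note above can be checked directly from the Table 3 values. The sketch below recomputes the top-ranked proxy per model and the cumulative NTL + HKD share; the variable and function names are illustrative, and only the numbers come from Table 3.

```python
# Permutation importance (%) copied from Table 3; one list entry per model.
models = ["MUL-Extra-Trees", "MUL-CatBoost", "MUL-XGBoost", "MUL-GBDT",
          "MUL-RF", "MUL-SVR", "MUL-LightGBM"]
table3 = {
    "NTL": [40.70, 30.38, 34.25, 27.96, 31.13, 38.85, 25.63],
    "HKD": [21.11, 18.99, 30.00, 31.81, 27.70, 17.89, 42.60],
    "ISA": [16.92, 11.59, 7.33, 9.10, 18.44, 3.98, 9.57],
    "POP": [8.46, 9.79, 13.11, 12.04, 5.64, 11.41, 8.97],
    "RDL": [8.14, 11.76, 7.87, 15.59, 10.89, 13.40, 6.86],
    "LKD": [1.81, 3.93, 1.34, 0.79, 1.62, 3.07, 2.11],
    "AKD": [1.26, 4.02, 3.77, 1.40, 1.13, 6.81, 1.66],
    "RKD": [1.14, 6.38, 1.22, 0.86, 2.73, 0.93, 1.86],
    "CKD": [0.46, 3.17, 1.11, 0.46, 0.73, 3.66, 0.74],
}


def top_proxy(model_index):
    """Return the variable with the highest importance for one model."""
    return max(table3, key=lambda var: table3[var][model_index])


# Top-ranked proxy and cumulative NTL + HKD importance for each model.
top = {m: top_proxy(i) for i, m in enumerate(models)}
ntl_plus_hkd = {m: round(table3["NTL"][i] + table3["HKD"][i], 2)
                for i, m in enumerate(models)}

for m in models:
    print(f"{m}: top proxy = {top[m]}, NTL + HKD = {ntl_plus_hkd[m]:.2f}%")
```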
Table 4. Annual omission rates (%) for heavy industry and residential POI categories by model type in Hubei Province, 2014–2017.
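| Model Type | POI Category | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|
| NTL-Linear Regression | Heavy industry | 34.26 | 37.53 | 35.85 | 29.92 |
| | Residential | 27.17 | 27.30 | 27.33 | 23.21 |
| MUL-Extra-Trees | Heavy industry | 0.00 | 0.00 | 0.00 | 0.00 |
| | Residential | 0.52 | 0.49 | 0.51 | 0.51 |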
POI omission rate = (Number of omitted POIs/Total POIs for category) × 100%, where POIs are Points of Interest. Heavy industry and residential POI categories were calculated separately.
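The omission-rate formula above can be sketched as follows. The note does not say how a POI is judged to be omitted, so this example assumes that a POI is omitted when the grid cell containing it has no estimated emissions; that criterion, the cell identifiers, and the function name are all hypothetical.

```python
def omission_rate(poi_cells, cell_emissions):
    """Percentage of POIs whose grid cell has no estimated emissions.

    poi_cells: one grid-cell id per POI of a single category.
    cell_emissions: mapping from grid-cell id to estimated emissions.
    Assumption (not from the source): a missing or non-positive value
    means the cell, and every POI in it, was omitted by the model.
    """
    if not poi_cells:
        raise ValueError("at least one POI is required")
    omitted = sum(1 for cell in poi_cells if cell_emissions.get(cell, 0.0) <= 0.0)
    return omitted / len(poi_cells) * 100.0


# Hypothetical example: four heavy-industry POIs across three grid cells.
heavy_industry_cells = ["c1", "c1", "c2", "c3"]
estimated = {"c1": 0.8, "c2": 0.0}  # c3 has no estimate at all

print(omission_rate(heavy_industry_cells, estimated))
```

Each POI category would be passed in separately, matching the note that heavy industry and residential rates were calculated independently.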
Table 5. Comparison of NTL-Linear Regression and MUL-Extra-Trees models against MEIC in Hubei Province (2014–2017).
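| Model Type | Metric | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|
| NTL-Linear Regression | SAD (MtCO2) | 2.10 | 2.05 | 2.04 | 2.04 |
| | SCC | 0.79 | 0.79 | 0.78 | 0.80 |
| MUL-Extra-Trees | SAD (MtCO2) | 2.10 | 2.04 | 1.99 | 2.06 |
| | SCC | 0.82 | 0.82 | 0.83 | 0.84 |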
All spatial correlation coefficients were statistically significant (p < 0.01). Abbreviations: SAD, sum of absolute differences; SCC, spatial correlation coefficient; MtCO2, million tons of carbon dioxide.
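As a rough illustration of the two comparison metrics, the sketch below computes a sum of absolute differences and a correlation coefficient between two co-registered grids. It assumes that both grids have already been resampled to a common resolution and flattened to lists, and that the spatial correlation coefficient is a Pearson correlation; neither assumption is confirmed by the note above, and the sample values are invented.

```python
import math


def sum_abs_diff(grid_a, grid_b):
    """Sum of absolute cell-by-cell differences between two grids."""
    return sum(abs(a - b) for a, b in zip(grid_a, grid_b))


def pearson(grid_a, grid_b):
    """Pearson correlation between two equally sized grids."""
    n = len(grid_a)
    mean_a = sum(grid_a) / n
    mean_b = sum(grid_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(grid_a, grid_b))
    var_a = sum((a - mean_a) ** 2 for a in grid_a)
    var_b = sum((b - mean_b) ** 2 for b in grid_b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical cell values in MtCO2 (not data from the study or from MEIC).
model_grid = [0.2, 0.5, 1.0, 0.1]
reference_grid = [0.3, 0.4, 1.2, 0.1]

print("SAD:", round(sum_abs_diff(model_grid, reference_grid), 4))
print("SCC:", round(pearson(model_grid, reference_grid), 4))
```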
Table 6. Municipal-level changes in socioeconomic and emission indicators (2014–2017).
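| Name | ΔPopulation (%) | ΔGDP (%) | ΔCO2 Fossil Fuel Emissions (%) | ΔPer Capita Emissions (%) | ΔPer Unit GDP Emissions (%) |
|---|---|---|---|---|---|
| Wuhan | 5.37 | 33.18 | 1.35 | −3.81 | −23.90 |
| Huangshi | 0.87 | 21.41 | 0.65 | −0.22 | −17.10 |
| Shiyan | 1.34 | 35.93 | −8.69 | −9.90 | −32.83 |
| Yichang | 0.76 | 23.15 | −5.05 | −5.77 | −22.90 |
| Xiangyang | 0.96 | 29.90 | −6.97 | −7.85 | −28.38 |
| Ezhou | 1.71 | 31.94 | 1.42 | −0.28 | −23.13 |
| Jingmen | 0.43 | 26.98 | −5.49 | −5.89 | −25.57 |
| Xiaogan | 1.10 | 28.60 | −2.55 | −3.62 | −24.23 |
| Jingzhou | −1.78 | 29.83 | −3.53 | −1.78 | −25.70 |
| Huanggang | 1.25 | 30.10 | −3.58 | −4.77 | −25.89 |
| Xianning | 1.84 | 28.06 | 1.48 | −0.36 | −20.76 |
| Suizhou | 1.22 | 29.34 | −8.14 | −9.25 | −28.98 |
| Enshi | 1.31 | 30.92 | −9.27 | −10.44 | −30.70 |
| Xiantao | −2.14 | 30.13 | −6.42 | −4.37 | −28.09 |
| Qianjiang | 1.11 | 24.37 | 0.96 | −0.15 | −18.82 |
| Tianmen | −0.63 | 31.45 | −7.32 | −6.74 | −29.50 |
| Shennongjia | 0.13 | 26.04 | −11.89 | −12.00 | −30.09 |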
Provincial average growth rates (2014–2017): population = 1.5%, GDP = 34%. Data sources: Hubei Statistical Yearbook 2015 [56] and Hubei Statistical Yearbook 2018 [57].
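The intensity columns of Table 6 appear to follow from the other three columns by simple ratio arithmetic. The sketch below assumes that per capita emissions are emissions divided by population and that per unit GDP emissions are emissions divided by GDP, so the percentage change of each ratio is (1 + Δnumerator)/(1 + Δdenominator) − 1. This relationship is an assumption used for a consistency check, not a method stated by the authors, and the function name is illustrative.

```python
def ratio_change(delta_numerator_pct, delta_denominator_pct):
    """Percentage change of a ratio, given percentage changes of its parts."""
    growth_numerator = 1.0 + delta_numerator_pct / 100.0
    growth_denominator = 1.0 + delta_denominator_pct / 100.0
    return (growth_numerator / growth_denominator - 1.0) * 100.0


# Wuhan row of Table 6: population +5.37%, GDP +33.18%, emissions +1.35%.
wuhan_per_capita = ratio_change(1.35, 5.37)
wuhan_per_gdp = ratio_change(1.35, 33.18)

print(f"Per capita change: {wuhan_per_capita:.2f}%")
print(f"Per unit GDP change: {wuhan_per_gdp:.2f}%")
```

Small differences from the tabulated values in the second decimal place are expected, because the inputs used here are themselves rounded percentages.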
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fang, Y.; Li, R.; Cao, J. Optimizing Spatial Scales for Evaluating High-Resolution CO2 Fossil Fuel Emissions: Multi-Source Data and Machine Learning Approach. Sustainability 2025, 17, 9009. https://doi.org/10.3390/su17209009
