Open Access Article
Optimizing Spatial Scales for Evaluating High-Resolution CO2 Fossil Fuel Emissions: Multi-Source Data and Machine Learning Approach
by Yujun Fang 1, Rong Li 1,2,* and Jun Cao 3
1 Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
2 Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan 430062, China
3 School of Architecture and Engineering, Wuhan City Polytechnic, Wuhan 430064, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(20), 9009; https://doi.org/10.3390/su17209009 (registering DOI)
Submission received: 2 September 2025 / Revised: 4 October 2025 / Accepted: 6 October 2025 / Published: 11 October 2025
Abstract
High-resolution CO2 fossil fuel emission data are critical for developing targeted mitigation policies. Top–down methods, a key approach for estimating the spatial distribution of CO2 emissions, typically rely on spatial proxies to disaggregate administrative-level emissions to finer spatial scales. However, conventional linear regression models may fail to capture complex non-linear relationships between proxies and emissions. Furthermore, methods relying on nighttime light data are mostly inadequate for representing emissions in industrial and rural zones. To address these limitations, this study developed a multiple-proxy framework integrating nighttime light, points of interest (POIs), population, road network, and impervious surface area data. Seven machine learning algorithms (Extra-Trees, Random Forest, XGBoost, CatBoost, Gradient Boosting Decision Trees, LightGBM, and Support Vector Regression) were applied to estimate high-resolution CO2 fossil fuel emissions. The evaluation showed that the multiple-proxy Extra-Trees model significantly outperformed the single-proxy nighttime light linear regression model at the county scale, achieving R² = 0.96 (RMSE = 0.52 MtCO2) in cross-validation and R² = 0.92 (RMSE = 0.54 MtCO2) on the independent test set. Feature importance analysis identified nighttime light brightness (40.70%) and heavy industrial density (21.11%) as the most important spatial proxies. The proposed approach also showed strong spatial consistency with the Multi-resolution Emission Inventory for China, with correlation coefficients of 0.82–0.84. This study demonstrates that integrating local multiple-proxy data with machine learning corrects spatial biases inherent in traditional top–down approaches, establishing a transferable framework for high-resolution emissions mapping.
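As a rough illustration of the top–down idea described in the abstract, the sketch below allocates one administrative-level total to grid cells in proportion to a proxy score, first with nighttime light only and then with a composite of two proxies. It is a minimal sketch under stated assumptions, not the study's method: the cell values, the county total, the fixed weights, and the function names `disaggregate` and `composite_proxy` are all hypothetical, and the study fits machine learning models rather than using hand-set weights.

```python
def disaggregate(total_emission, proxy_values):
    """Allocate an administrative total to cells in proportion to a proxy."""
    proxy_sum = sum(proxy_values)
    if proxy_sum <= 0:
        # No proxy signal at all: fall back to a uniform split (an assumption).
        n = len(proxy_values)
        return [total_emission / n] * n
    return [total_emission * value / proxy_sum for value in proxy_values]


def composite_proxy(cells, weights):
    """Combine several proxies into one score per cell using fixed weights."""
    return [sum(weights[name] * cell[name] for name in weights) for cell in cells]


# Hypothetical grid cells inside one county (values are illustrative only).
cells = [
    {"ntl": 10.0, "heavy_industry": 0.0},  # dim residential cell
    {"ntl": 30.0, "heavy_industry": 0.0},  # bright commercial cell
    {"ntl": 10.0, "heavy_industry": 5.0},  # dim cell containing heavy industry
]
county_total = 2.0  # MtCO2, illustrative

# Single-proxy allocation: nighttime light only.
ntl_only = disaggregate(county_total, [cell["ntl"] for cell in cells])

# Multiple-proxy allocation with hand-picked weights (not from the paper).
weights = {"ntl": 1.0, "heavy_industry": 4.0}
multi = disaggregate(county_total, composite_proxy(cells, weights))

print("NTL only:    ", [round(x, 3) for x in ntl_only])
print("Multi-proxy: ", [round(x, 3) for x in multi])
```

In this toy setup both allocations conserve the county total, but the industrial cell receives a larger share once the second proxy is included, which is the kind of bias in nighttime-light-only allocation that the abstract refers to.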