1. Introduction
The rise in greenhouse gas (GHG) emissions underscores the urgent need for global action to combat climate change. As awareness of the greenhouse effect and its indirect consequences grows, the “carbon peak-carbon neutrality” target for the middle of this century has become the central focus of global climate governance [1,2,3]. Establishing accurate CO2 emission prediction models to forecast and understand CO2 emission trends is essential for mitigating climate change and its impacts [4,5,6,7]. Interpretable carbon emission prediction simulations, which guide practical actions, form the foundational basis and methodological cornerstone for transforming macro-level “dual carbon” objectives into actionable engineering and policy measures [8].
Scholars have developed three main types of CO2 emission accounting models. Statistical and econometric models use activity data and emission factors to calculate target sequences and model them over time. These models are primarily employed for medium- and long-term trend forecasting and are suitable for industries where emission relationships are approximately linear. However, under strong non-linearity and cross-scale coupling they tend to accumulate errors caused by data fluctuations, which limits their accuracy [9,10]. Spatio-temporal models jointly model the time and space dimensions, considering time lags and spatial heterogeneity. These models are often used for regional-level spatio-temporal forecasting and CO2 emission backcasting, enhancing both accuracy and interpretability in systems with spatial dependencies and neighboring interactions. However, their stability and extrapolation capabilities are significantly limited when data are sparse [11]. Probabilistic models use statistical methods to describe the multi-scale characteristics of CO2 emissions and output interval forecasts, and are typically applied in short-to-medium-term, strongly seasonal, and volatile scenarios [12,13]. In summary, these methods rely on limited data and mostly focus on trend prediction and conclusive judgment. Their coupling with decision optimization remains weak, making it difficult to structurally transmit uncertainty to control and scheduling layers. This limits their ability to support real-time and precise regulation of production and operation within a region.
Compared to traditional models, machine learning offers significant advantages in handling nonlinear, multimodal, and time-varying features, and it can be coupled with interpretability methods to enhance reliability. Existing studies have directly integrated machine learning methods into CO2 emission prediction. For building- or district-level models, machine learning models such as Extreme Gradient Boosting (XGBoost) are commonly used, and Model Predictive Control (MPC) incorporates variables such as electricity prices, carbon intensity, and comfort into constraints for global optimization [14,15]. Various improved frameworks, such as chance constraints and Distributionally Robust Optimization (DRO), are used to predict and make decisions on CO2 emissions under uncertainty, ensuring the reliability of strategies [16,17,18,19,20]. Additionally, some studies have coupled carbon-aware demand response by integrating real-time or day-ahead carbon intensity signals into scheduling, often paired with deep learning methods to jointly predict and optimize carbon benefit structures [21,22]. For CO2 emission optimization, multi-objective optimization is often employed to balance CO2 emissions, economic costs, and comfort, using tree models to search the Pareto front for compromise solutions [23,24,25]. Machine learning thus covers the full range of CO2 emission prediction tasks and can provide valuable guidance. With large-scale data, these models can achieve high precision, resilience to fluctuations, and a description of uncertainty. Furthermore, they play a crucial role in regional scheduling optimization, providing real-time strategies and guidance for regional CO2 emissions.
However, a gap remains in coupling interpretable predictions with actionable optimization across diverse scenarios. Our work bridges this gap by integrating explainable machine learning with scenario-based emission control. In this study, we propose a CO2 emission modeling framework built on fine-grained categories and emission factors, explicitly accounting for seasonal differences and temporal effects. The model takes the reduction in building energy consumption as the decision variable and is bounded by operational and population constraints. Using a random forest (RF) algorithm combined with SHapley Additive exPlanations (SHAP) and correlation matrices, it enables interpretable emission prediction and strategy formulation. An empirical study on a university campus shows that the framework is well-suited to “micro-city” settings: it accurately accounts for CO2 emissions across different scenarios, clearly disentangles the impacts of seasonal, demographic, and environmental factors, and, on this basis, delivers scenario-specific optimal solutions. These findings are of significant relevance to sustainable development and achieving carbon neutrality targets.
3. Results
The university campus located in Shandong Province spans a total area of 1153 acres. Its spatial layout encompasses diverse functional units, including residential zones, teaching and research buildings, public transportation hubs, and experimental R&D parks, collectively forming a relatively enclosed micro-urban ecosystem. This makes it an ideal research sample for analyzing carbon emission characteristics.
We chose to conduct a simulation analysis using data from the years 2023 to 2025, focusing on all elements within the region. For energy utilization, data is collected on an hourly basis, while for population changes, diet, transportation, and other factors, data is collected daily. In this study, we quantify CO2 emissions on a monthly basis and divide each year into four seasons: spring (March, April, May), summer (June, July, August), autumn (September, October, November), and winter (December, January, February).
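The season definition and the monthly aggregation described above can be sketched as follows. This is a minimal illustration only: the function names and the `(year, month, value)` record layout are assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Season definition taken from the text: spring = Mar-May, summer = Jun-Aug,
# autumn = Sep-Nov, winter = Dec-Feb.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def season_of(month: int) -> str:
    """Return the season name for a calendar month (1-12)."""
    return SEASON_BY_MONTH[month]


def monthly_totals(records):
    """Sum (year, month, value) records into {(year, month): total}.

    The record layout is hypothetical; hourly energy readings and daily
    activity data would both be reduced to this form before summing.
    """
    totals = defaultdict(float)
    for year, month, value in records:
        totals[(year, month)] += value
    return dict(totals)
```

Hourly and daily series can share this aggregation step once each reading is tagged with its year and month.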
3.1. Regional CO2 Emission Accounting
3.1.1. Analysis of CO2 Emissions from Energy
As shown in Figure 1, we have categorized the buildings and architectural accessories into 10 groups: venue buildings, teaching buildings, student dormitories, dining halls/restaurants, research buildings, library buildings, administrative offices, faculty dormitories, hospital, and public green areas. The seasonal energy consumption fluctuations of these ten building categories directly reflect variations in carbon emissions, providing a basis for formulating energy-saving strategies. The energy consumption patterns remain largely consistent during spring and autumn. In summer, the proportion of electricity consumption in student dormitories and teaching buildings increases significantly due to air conditioning usage. In winter, although some students leave campus, the overall energy consumption shows little change due to centralized heating, with venue buildings experiencing a substantial increase in energy demand for heating purposes.
The load fluctuation data serves as a critical parameter for optimizing regional carbon emissions, enabling the formulation of targeted strategies based on temporal characteristics. As illustrated in Figure 2, the overall load profiles demonstrate minimal variations among spring, autumn, and winter seasons for the aforementioned reasons, though winter exhibits slightly higher nighttime loads due to continuous heating demand. In contrast, summer experiences an upward shift in load curves with sustained nighttime consumption, attributable to air conditioning requirements. All seasonal patterns maintain typical peak-valley characteristics in energy supply. Annual energy consumption fluctuations exhibit distinct seasonal patterns, with lower demand in spring/autumn and elevated peaks in winter/summer. Notably, data from the operational suspension during the mid-February Spring Festival period is excluded from modeling. The mid-July to September summer vacation period, characterized by reduced student occupancy and consequent lower energy consumption, provides essential datasets for investigating the impacts of population density and spatial configuration on carbon emissions.
We considered a total of 10 types of power-consuming facilities in the region. As the studied area exclusively utilizes electricity, a uniform carbon emission factor of 0.6 kg/(kW·h) (Table 1) was applied to all building types [32]. In 2024, the total energy consumption reached 18.41 million kW·h, resulting in corresponding CO2 emissions of 11,004.24 tons. The carbon emissions of each building type in each time interval are proportional to its absolute energy consumption or its share of total energy consumption, and these data will be utilized for subsequent strategy optimization analyses.
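As a worked illustration of the factor-based accounting, the sketch below multiplies electricity use by the uniform factor of 0.6 kg/(kW·h). The function names and the per-building split are hypothetical. With the rounded total of 18.41 million kW·h the product is about 11,046 t, whereas the reported 11,004.24 t corresponds to roughly 18.34 million kW·h at this factor.

```python
# Grid emission factor stated in the text (Table 1): 0.6 kg CO2 per kWh.
GRID_FACTOR_KG_PER_KWH = 0.6


def electricity_emissions_t(energy_kwh: float,
                            factor: float = GRID_FACTOR_KG_PER_KWH) -> float:
    """Convert electricity use in kWh to CO2 emissions in metric tons."""
    return energy_kwh * factor / 1000.0


def emissions_by_building(energy_by_building_kwh: dict) -> dict:
    """Apply the uniform factor to each building category."""
    return {name: electricity_emissions_t(kwh)
            for name, kwh in energy_by_building_kwh.items()}


if __name__ == "__main__":
    # Rounded annual total from the text: 18.41 million kWh.
    print(electricity_emissions_t(18.41e6))
    # Hypothetical split between two categories, for illustration only.
    print(emissions_by_building({"teaching": 2.0e6, "library": 0.5e6}))
```

Because the factor is uniform, each category's emission share equals its energy share, which is the proportionality the paragraph describes.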
3.1.2. Analysis of CO2 Emissions from Transportation
Table 2 presents the calculation parameters for transportation carbon emissions, focusing exclusively on three predominant modes of transport: bicycles, electric bicycles, and cars. The analysis assumes a constant inventory of each vehicle type within the study area. In terms of energy consumption per unit distance, bicycles are assigned a zero-value baseline, while the other two categories exhibit distinct seasonal variations. Both show comparable consumption in spring and autumn. In summer, electric bicycles show reduced energy demand because battery preheating is not required at elevated temperatures, whereas cars display increased consumption from air conditioning usage. In winter, both vehicle types experience significantly heightened energy demands due to low-temperature effects. The mileage and usage rate data are obtained through regional statistics.
The average daily travel distance varies significantly across transport modes, reflecting users’ climate-dependent mobility preferences. For carbon emission factors: bicycles are considered zero-emission; electric bicycles are purely electric, with an emission factor identical to that of building electricity use (0.6 kg/(kW·h)); cars use a weighted average of 0.9 kg/(kW·h) based on a 50–50 gasoline–EV mix. Actual vehicle utilization is dynamically scaled according to the real-time population-to-capacity ratio in the area. Because the area is small, the utilization rate of cars is low, only 0.2, while that of electric bicycles is 0.5.
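The transportation calculation described above can be sketched as below. The emission factors (0.6 and 0.9 kg/(kW·h)) and utilization rates (0.5 and 0.2) come from the text. The fleet sizes, distances, and per-kilometre energy values are hypothetical placeholders for the Table 2 parameters, and the multiplicative form is an assumption about how the parameters combine.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    fleet_size: int      # vehicles in the area (hypothetical value)
    utilization: float   # share of the fleet in use at full population
    km_per_day: float    # average daily distance per vehicle (hypothetical)
    kwh_per_km: float    # seasonal energy use per km (hypothetical)
    factor: float        # kg CO2 per kWh


def daily_emissions_kg(mode: Mode, population: float, capacity: float) -> float:
    """Daily CO2 for one mode, scaled by the population-to-capacity ratio."""
    occupancy = min(population / capacity, 1.0)
    active_vehicles = mode.fleet_size * mode.utilization * occupancy
    return active_vehicles * mode.km_per_day * mode.kwh_per_km * mode.factor


# Factors and utilization rates from the text; other numbers are illustrative.
E_BIKE = Mode(fleet_size=1000, utilization=0.5, km_per_day=4.0,
              kwh_per_km=0.02, factor=0.6)
CAR = Mode(fleet_size=200, utilization=0.2, km_per_day=10.0,
           kwh_per_km=0.2, factor=0.9)
BICYCLE = Mode(fleet_size=3000, utilization=0.5, km_per_day=3.0,
               kwh_per_km=0.0, factor=0.0)
```

Seasonal effects would enter through `kwh_per_km` and `km_per_day`, and vacation periods through the `population` argument.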
As shown in Figure 3, there are two distinct vacation periods with significantly reduced population within a year, while the population remains relatively stable during the rest of the time. Transportation-related CO2 emissions exhibit seasonal variations. Emissions in spring and autumn hover around the average level with minor fluctuations. Summer sees slightly higher per capita emissions, primarily due to increased car usage driven by hot weather; the lower energy consumption of electric bicycles does not compensate for this because fewer people use them. In winter, car usage increases further, and although electric bicycles remain little used, their higher energy consumption contributes to additional emissions. Among the four seasons, winter records the highest per capita CO2 emissions from transportation. The annual transportation-related CO2 emissions in this area total 70.35 tons.
3.1.3. Analysis of CO2 Emissions from Population Activities
CO2 emissions from human activities primarily originate from dietary consumption. In this model, the average daily food intake per capita is assumed constant, with meat and vegetable consumption fixed at 0.205 kg/(person·day) and 0.930 kg/(person·day), respectively (Table 3). The carbon emission factors were calculated using dietary habit-weighted averages, yielding mean values of 7.5 kg CO2/kg for meat and 0.6 kg CO2/kg for vegetables. The model incorporates a food waste factor, with 85% of food assumed to be actually consumed. As shown in Figure 4a, the monthly variation pattern of these emissions closely follows population fluctuations, with annual CO2 emissions reaching 5704.57 tons.
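A minimal sketch of the dietary calculation follows. The intake values and emission factors are taken from the text. How the 85% consumption share enters the calculation is not specified, so the code assumes the intake figures are amounts eaten and divides by the consumed share to recover the amount of food supplied; the function names are likewise hypothetical.

```python
# Values from the text (Table 3).
MEAT_KG_PER_DAY = 0.205   # kg per person per day
VEG_KG_PER_DAY = 0.930    # kg per person per day
MEAT_FACTOR = 7.5         # kg CO2 per kg of meat
VEG_FACTOR = 0.6          # kg CO2 per kg of vegetables


def per_capita_daily_kg(consumed_share: float = 0.85) -> float:
    """Per-person daily dietary CO2 in kg.

    Assumption: intake is the amount eaten, so supplied food is
    intake / consumed_share. Pass 1.0 to ignore food waste.
    """
    eaten = MEAT_KG_PER_DAY * MEAT_FACTOR + VEG_KG_PER_DAY * VEG_FACTOR
    return eaten / consumed_share


def monthly_diet_emissions_t(daily_population, consumed_share: float = 0.85) -> float:
    """Sum person-days over a month and convert to metric tons of CO2."""
    person_days = sum(daily_population)
    return person_days * per_capita_daily_kg(consumed_share) / 1000.0
```

Without the waste adjustment the per-person figure is 0.205 × 7.5 + 0.930 × 0.6 = 2.0955 kg CO2 per day, so monthly emissions track the population series directly.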
3.1.4. Analysis of CO2 Emissions from Vegetation
The carbon sequestration by vegetation primarily comes from green spaces and forests, covering 167,233 m² and 80,560 m², respectively. Given the study area’s location in North China, the calculation employs region-specific carbon absorption coefficients for typical grasslands and forests (Table 4). Figure 4b shows that vegetation carbon emissions are significantly affected by season, and the annual cumulative carbon emissions are −806.59 tons.
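The Table 4 coefficients are not reproduced here, but the reported totals imply an area-weighted mean uptake, which the sketch below backs out. The helper names are hypothetical, and the per-type coefficients passed to `sequestration_t` would need to come from Table 4.

```python
# Areas and annual total reported in the text.
GREEN_SPACE_M2 = 167_233
FOREST_M2 = 80_560
ANNUAL_VEGETATION_T = -806.59  # negative sign denotes a carbon sink


def sequestration_t(area_m2: float, uptake_kg_per_m2_year: float) -> float:
    """Annual emissions (negative) for one vegetation type, in metric tons."""
    return -area_m2 * uptake_kg_per_m2_year / 1000.0


def implied_mean_uptake_kg_per_m2(total_t: float = ANNUAL_VEGETATION_T,
                                  area_m2: float = GREEN_SPACE_M2 + FOREST_M2) -> float:
    """Area-weighted mean uptake implied by the reported annual total."""
    return -total_t * 1000.0 / area_m2
```

On the reported figures the implied mean uptake is about 3.26 kg CO2 per m² per year across the combined 247,793 m².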
3.1.5. Analysis of CO2 Emissions from Waste Management
Given the absence of industrial facilities in the study area, industrial emissions have been excluded from the analysis. This research specifically focuses on carbon emissions associated with solid waste and wastewater treatment. The data reveal that solid waste generation fluctuates between 440 and 640 tons per month, with variations driven by factors such as population size and seasonal changes (Table 5). To estimate emissions, a weighted average of disposal methods (including landfilling and incineration) was applied, resulting in a carbon emission factor for waste treatment of 0.5 kg per ton. Figure 5a shows the monthly CO2 emissions from solid waste treatment. Notably, the high density of public buildings leads to increased per capita water consumption, which reaches 60.78 tons per person per year. As illustrated in Figure 5b,c, CO2 emissions from wastewater treatment are closely correlated with regional population size and represent a relatively small proportion of the total CO2 emissions.
3.1.6. Analysis of CO2 Emissions from Industrial Production
In the absence of industrial production activities within the study area, industrial-related carbon emissions were limited to facility maintenance processes. This study categorized maintenance emissions into three types: lighting devices, pipe fittings, and electrical accessories, each with its own CO2 emission accounting. As shown in Figure 5d, the CO2 emissions from facility maintenance were negligible, amounting to only about 3.55 tons per year.
3.1.7. Annual CO2 Emissions Analysis
Figure 6a illustrates the annual dynamics of CO2 emissions in the study area. The emission profile exhibits an “energy-dominant, multi-source synergistic” pattern, where seasonal fluctuations in energy consumption serve as the primary driver of overall emission trends, while contributions from other sources are modulated by demographic mobility and climatic factors. Analytical results reveal that apart from emission troughs during population valleys (February and July–August), monthly emission intensities are predominantly climate-driven, with marked peaks in summer (air conditioning) and winter (space heating). Notably, per capita emissions demonstrate pronounced seasonal variability during high-population periods, whereas during low-population periods, elevated per capita emissions are observed under identical seasonal conditions. This phenomenon is attributable to variations in energy efficiency, the mechanistic details of which are elaborated in subsequent sections.
Figure 6b demonstrates that energy consumption maintains its dominant position throughout all months, with monthly emissions (518.6–1295.6 tons) nearly equal to the combined total of the other five categories, underscoring the pivotal role of energy management in regional decarbonization strategies. As the second largest emission source, food consumption highlights the importance of reducing food waste rates for emission mitigation. Waste treatment ranks as the third largest source, showing relatively stable emission patterns due to consistent solid waste generation. Transportation emissions, vegetation carbon sequestration, and industrial maintenance collectively contribute minimally to the regional emission profile. Annual emissions also exhibited a similar trend in Figure 6c, with energy consumption being the largest source at approximately 11,004.24 t.
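The annual figures reported in Section 3.1 can be combined as in the sketch below. The annual total for waste treatment is not stated numerically in the text, so it is left out here and the result is a partial sum only; the dictionary keys and function names are illustrative.

```python
# Annual totals reported in the text, in metric tons of CO2.
# Waste treatment is omitted because no annual figure is given.
REPORTED_ANNUAL_T = {
    "energy": 11004.24,
    "food": 5704.57,
    "transportation": 70.35,
    "facility_maintenance": 3.55,
    "vegetation": -806.59,  # carbon sink, hence negative
}


def net_emissions_t(sources: dict) -> float:
    """Net annual emissions: positive sources plus (negative) sinks."""
    return sum(sources.values())


def shares_of_gross(sources: dict) -> dict:
    """Share of each positive source in gross (sink-free) emissions."""
    gross = sum(v for v in sources.values() if v > 0)
    return {name: v / gross for name, v in sources.items() if v > 0}
```

On these partial figures, energy accounts for roughly two thirds of gross emissions, consistent with the energy-dominant pattern described above.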
3.2. Machine Learning and Strategy Optimization
Finally, we constructed a dataset that includes time, season, population size, energy consumption from different buildings, CO2 emissions from various factors, and total CO2 emissions. By using total CO2 emissions as the prediction target, we explored the relationships between emissions and factors such as population and season.
We checked the plausibility of the dataset using the energy consumption distribution of administrative offices as a representative example. Figure 7a shows the energy consumption distribution histogram for administrative offices in the dataset. The data exhibit a skewed distribution, with most of the energy consumption concentrated in the lower range, while higher energy consumption only occurs during specific periods. This supports the reasonableness of the dataset’s size and distribution. Based on its superior ability to capture complex data patterns with minimal error and its robust generalization performance that effectively mitigates overfitting, RF was selected as the final model for this study. RF training uses grid search to find the optimal model. Figure 7b displays the results obtained from training the best RF model. The R2 value (for the test set) is 0.92, indicating that the model has a good fit. Additionally, the RMSE value is 3135.46, suggesting that the model’s overall performance is quite satisfactory.
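For reference, the two test-set metrics quoted above follow the standard definitions sketched below in plain Python. This is not the authors' evaluation code; libraries such as scikit-learn provide equivalent functions.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE depends on the scale of the target, so a value such as 3135.46 is best read relative to the typical magnitude of the predicted emissions, while R2 is scale-free.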
4. Discussion
Regarding building-related CO2 emissions (Figure 8a), the annual CO2 emissions varied substantially across building types. Venue buildings, teaching buildings, and student dormitories were the major emission sources, each contributing several thousand tons per year, while hospitals, administrative offices, and faculty dormitories had relatively low emissions, and public green areas showed negligible direct emissions. This suggests that emission reduction efforts should prioritize high-consumption building clusters, focusing on retrofitting heating, cooling, lighting, and operational systems for improved efficiency.
Some representative data were selected for correlation coefficient and SHAP analysis. All categories in Figure 8b,c are abbreviated by their first letters. The correlation analysis between building categories (Figure 8b) revealed that most pairs exhibited strong positive correlation coefficients (>0.7), such as administrative offices and library buildings (r = 0.95). This indicates a strong synchronization in energy consumption trends, likely driven by similar climatic conditions and the regional number of occupants. Such synchronization offers opportunities for centralized management and CO2 emission reduction. The correlation matrix of emission sources (Figure 8c) further revealed that energy consumption had the strongest correlation with total emissions (r = 0.91). Transportation, food, and waste exhibited moderate inter-correlations, while vegetation showed low or negative correlations with other sources, reflecting its carbon sink function and seasonal variations.
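The correlation matrices discussed above are assumed here to use the Pearson coefficient, since the text reports r values without naming the estimator. A minimal plain-Python sketch, with hypothetical series names in the usage, is:

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(series: dict) -> dict:
    """Pairwise Pearson r for named series, as {(name_a, name_b): r}."""
    names = list(series)
    return {(a, b): pearson_r(series[a], series[b]) for a in names for b in names}
```

For example, `correlation_matrix({"library": [...], "offices": [...]})` on monthly energy series would return the kind of pairwise values shown in Figure 8b.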
The SHAP analysis based on the random forest model (Figure 8d) quantitatively assessed the contributions of each source to total emissions. Food consumption exhibited far greater explanatory power than all other factors, primarily due to population fluctuations and food waste. Among the remaining categories, energy consumption had the largest explanatory power, followed by transport, while waste and vegetation had relatively smaller impacts, and industrial production contributed negligibly. Excluding industrial production, feature values moved in the same direction as total emissions, and the higher a feature ranked, the greater the impact of its changes on overall volatility. These findings suggest that emission reduction strategies should prioritize energy efficiency, supported by low-carbon supply chain management in the food sector.
Figure 9a illustrates the monthly variation in per-capita CO2 emissions, showing a strong correlation with population fluctuations. When the population size is low, per-capita emissions increase markedly. Figure 9b presents the total CO2 emissions predicted by the RF model across the full population range in July, with a LOWESS-fitted trend curve indicating a monotonically increasing relationship with population density. When the population decreases to a certain threshold, CO2 emissions gradually level off and stabilize; a similar stabilization is observed when the population approaches its upper limit. These results suggest a correlation between population size and infrastructure energy consumption. At low population levels, the use of public facilities leads to higher per-capita energy consumption; in dormitories, dispersed occupancy and the lack of centralized management reduce the efficiency of air conditioning and other electrical systems. At moderate population levels, the increase in CO2 emissions per additional person primarily arises from the further opening of public facilities and the increased utilization of dormitories. When the population nears saturation, the energy consumption of public spaces rises only marginally, and concentrated dormitory occupancy enables the sharing of air conditioning and other amenities, thereby slowing the growth rate of CO2 emissions. Therefore, regional policy changes should be implemented at the inflection point of the curve.
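One simple way to locate the inflection point of such an S-shaped curve is to find where the slope between neighbouring samples is largest. The sketch below does this for a hypothetical logistic curve standing in for the LOWESS trend; the curve parameters are illustrative and not fitted to the study's data.

```python
import math


def logistic(pop: float, low: float, high: float,
             midpoint: float, scale: float) -> float:
    """Hypothetical S-shaped emission curve that levels off at both ends."""
    return low + (high - low) / (1.0 + math.exp(-(pop - midpoint) / scale))


def steepest_point(xs, ys):
    """Return (x, slope) at the centre of the interval with the largest slope."""
    best_x, best_slope = None, float("-inf")
    for i in range(len(xs) - 1):
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        if slope > best_slope:
            best_x, best_slope = (xs[i] + xs[i + 1]) / 2, slope
    return best_x, best_slope


# Illustrative population grid and curve parameters (all assumed).
populations = list(range(0, 10001, 500))
emissions = [logistic(p, 500.0, 1300.0, 5250.0, 1000.0) for p in populations]
inflection_population, max_slope = steepest_point(populations, emissions)
```

Applied to the smoothed July curve, the returned population would mark where each additional person adds the most emissions, which is where the text suggests policy changes take effect.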
We set up two scenarios for analysis. Scenario 1 involves high comfort with flexible regulation, while Scenario 2 involves low comfort with strict regulation. To compare the differences between the two scenarios, we assume that the seasonal influence factor remains the same in all other aspects and only affects high-population areas such as student dormitories, teaching buildings, and libraries. The seasonal influence is divided into winter–summer and spring–autumn categories. Since accommodation can be centrally managed during off-peak times, its adjustment range is larger, whereas office buildings, libraries, and similar buildings have higher comfort requirements, so their maximum adjustment value is smaller. Hospitals have the highest comfort requirements, so their calculated maximum adjustment value is the smallest. The final calculated adjustment range values are shown in Table 6 and Table 7. We trained the model on an annual dataset and then ran the optimization for the middle month of each quarter, aiming to represent seasonal (quarterly) patterns more faithfully.
Figure 10 compares baseline and optimized CO2 emissions for major high-consumption building categories after applying the seasonal constraint optimization strategy. The model results indicate that, during summer and winter, reduced dormitory occupancy necessitates centralized management, whereas in spring and autumn, when the population is relatively high, dormitories are not the primary optimization target. In winter and summer, the optimization strategy significantly reduces emissions from student dormitories, teaching buildings, and venue buildings, consistent with their dominant roles in heating and cooling loads. The decrease is smaller in winter than in summer because the entire campus is centrally heated in winter, so centralized management can only reduce electricity and other non-heating consumption. To further decrease CO2 emissions, it would be necessary to first implement centralized management and then provide targeted heating.
In spring and autumn, the focus of emission reductions shifts toward research buildings, teaching buildings, and venue buildings. However, due to the larger number of people, the emission reductions from these buildings are not as significant overall as during the summer and winter when some areas are closed. Combining building correlation weights with RF-based emission predictions provides an effective and feasible approach for guiding reduction allocation.
In Scenario 2 (Figure 11), the optimization targets minimizing CO2 emissions, so the model requires all building types to participate in reducing energy use. Taking teaching buildings as an example, although the maximum adjustable rate is 0.1 across seasons, the model still differentiates seasonal effects: in winter, uniform heating makes reductions harder, so the associated strategies are ranked lower and have smaller adjustment magnitudes.
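The role of the maximum adjustable rate can be illustrated with a minimal sketch in which each building type's requested reduction is clipped to its cap before being applied to the baseline. Only the 0.1 cap for teaching buildings comes from the text; the other caps, the baselines, and the function names are hypothetical, and the paper's optimizer is not reproduced here.

```python
def optimized_emissions(baseline_t: dict, max_rate: dict, requested_rate: dict) -> dict:
    """Apply a reduction rate clipped to [0, max_rate] per building type."""
    result = {}
    for building, base in baseline_t.items():
        requested = requested_rate.get(building, 0.0)
        rate = min(max(requested, 0.0), max_rate[building])
        result[building] = base * (1.0 - rate)
    return result


def total_reduction_t(baseline_t: dict, optimized_t: dict) -> float:
    """Total emissions avoided relative to the baseline."""
    return sum(baseline_t.values()) - sum(optimized_t.values())


# Teaching cap of 0.1 is from the text; all other numbers are illustrative.
BASELINE_T = {"teaching": 100.0, "dormitory": 120.0, "hospital": 50.0}
MAX_RATE = {"teaching": 0.1, "dormitory": 0.2, "hospital": 0.02}
```

Under a pure emission-minimization objective with only these bounds, every building type would sit at its cap, which matches the observation that all types participate in Scenario 2; seasonal differences would enter through season-specific caps.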
For real-time control, the model can identify existing key peak periods and, together with real-time data on population changes and energy consumption patterns, pre-allocate emission reduction efforts. This enables the formulation of targeted strategies for any given moment and demonstrates the model’s reliability and its potential for CO2 emission optimization.
5. Conclusions
We present an interpretable, integrated CO2 emission prediction–optimization framework grounded in fine-grained categories and emission factors, explicitly accounting for seasonal, demographic, and temporal effects. Using an RF model combined with SHAP and correlation matrices, we enable interpretable prediction and CO2 strategy design. Using the campus as a representative “large-scale community”, we analyze CO2 emissions and evaluate optimization strategies under two control scenarios; the model remains robust across conditions. The framework provides a quantitative basis for structurally propagating predictive uncertainty into real-time optimization. The main findings are as follows:
(1) We develop an interpretable prediction–optimization framework for CO2 emissions and introduce ML-based optimization schemes that support differentiated control and optimization across building types and seasonal contexts. By integrating RF with SHAP analysis, our framework achieves an R2 of 0.92 and identifies key drivers, enabling tailored emission strategies that balance reduction targets with operational comfort constraints.
(2) We integrate hourly/daily/monthly/quarterly data within a unified model, enabling CO2 prediction and optimization across multiple temporal scales. This multi-scale data integration captures both short-term fluctuations and long-term seasonal trends, ensuring that predictive accuracy and optimization strategies are robust and adaptable to dynamic changes in population activity and climate conditions.
(3) We complete carbon accounting for the campus, reveal the characteristic CO2 emission patterns of a typical micro-city, quantify factor contributions, and provide actionable optimization directions for similar regions. In this campus case study, our accounting reveals a distinct “energy-dominant, multi-source synergistic” pattern, with energy use contributing over 11,000 tons annually. The SHAP analysis prioritizes energy, food, and transport as primary levers for effective mitigation in comparable community-scale systems.
This study bridges a critical gap between predictive modeling and operational mitigation by developing an interpretable machine learning framework that integrates fine-grained emission prediction with scenario-based optimization. Unlike approaches that focus solely on forecasting, our method provides transparent, actionable strategies tailored to seasonal and demographic contexts, offering a practical tool for achieving carbon neutrality in community-scale settings. This integrated approach provides a viable pathway toward a sustainable, low-carbon future.