Article

Benchmarking Training Emissions of Regression Models for Vehicle CO2 Prediction

Department of Aviation Electrics and Electronics, Istanbul Nisantasi University, Istanbul 34400, Türkiye
*
Author to whom correspondence should be addressed.
Sustainability 2026, 18(6), 2830; https://doi.org/10.3390/su18062830
Submission received: 5 February 2026 / Revised: 4 March 2026 / Accepted: 10 March 2026 / Published: 13 March 2026

Abstract

The urgency of climate action has intensified the use of machine learning (ML) to predict vehicular CO2 emissions; however, the training of machine learning models also generates computational emissions that are seldom reported. This study addresses a paradox central to Green AI: can carbon-intensive algorithms be justified for predicting carbon emissions? Using a public dataset of 7385 light-duty vehicles, we trained nine widely used regression models spanning simple linear baselines, polynomial and regularised linear methods, tree-based learners, ensembles, and a neural network. All experiments were instrumented with CodeCarbon to quantify real-time training footprints under a grid carbon intensity of 450 g CO2/kWh. Across models, test performance ranged from R2 = 0.72 to 0.99, yet training emissions varied by four orders of magnitude, from 0.001 g CO2 (simple linear regression) to 2.3 g CO2 (XGBoost). Although XGBoost achieved the highest accuracy (R2 = 0.9947), it emitted approximately 500× more CO2 than regularised polynomial linear models for only a 0.39-point gain in R2. Pareto analysis identifies Lasso and Ridge regression with degree-4 polynomial features as sustainability-optimal, reaching R2 = 0.9908 at ~0.004 g CO2. To unify predictive and environmental efficiency, we introduce Accuracy-per-Gram (APG = R2/CO2) and Marginal Emissions Cost (MEC = ΔCO2/ΔR2), demonstrating a steep efficiency cliff beyond regularised linear models. At the fleet scale (100 million vehicles with daily retraining), algorithm choice implies ~84 t CO2/year for XGBoost versus ~0.15 t for Lasso, highlighting the potential climate cost of marginal accuracy gains. We provide a reproducible carbon-tracking pipeline, Green-AI evaluation metrics, and deployment guidance, arguing that computational sustainability must co-determine model selection for emissions-related ML systems.
Most critically, we identify a clear accuracy–carbon emission Pareto frontier, demonstrating that regularised polynomial linear models lie on the sustainability-optimal boundary, while widely used ensemble methods such as XGBoost sit beyond an “efficiency cliff,” where marginal accuracy improvements incur disproportionately high carbon costs.

1. Introduction

Transportation produces about 24% of global CO2 emissions, with road vehicles contributing a major share. This has accelerated the use of machine learning (ML) to predict vehicular emissions and inform policies such as fuel-economy standards, carbon taxation, and consumer vehicle ratings. The underlying premise is straightforward: better prediction enables better intervention, which in turn reduces emissions.
These predictions depend on training models that consume electricity and emit CO2 depending on the energy mix of the grid. For simple models, this cost is negligible (e.g., linear regression trained on ~7000 samples emits ≈ 0.001 g CO2), but for complex algorithms, the footprint can grow rapidly. A prior work highlights extreme cases: training large Transformer models with neural architecture search can emit hundreds of tons of CO2 [1]. This raises a core tension: if training emissions approach or exceed the emissions savings enabled by prediction, we may be shifting the problem from tailpipes to data centres.
Historically, ML research has prioritised predictive performance (R2, F1, etc.) while largely ignoring training emissions. Three structural factors explain this gap:
  • Infrastructure abstraction: Cloud platforms conceal energy and CO2 costs behind simple API calls, leaving researchers unaware of kilowatt-hours consumed.
  • Short-term framing: Academic studies typically measure one-off training, whereas production systems retrain continuously (daily, weekly, or in real time), multiplying computational costs.
  • Performance ratchet: The “bigger is better” paradigm encourages ever-larger models, externalising environmental costs as a hidden by-product of accuracy gains.
Although carbon-tracking tools and the Green AI agenda exist [2,3,4], they focus mainly on deep learning. In contrast, the everyday ML models that dominate tabular regression in practice—gradient boosting, Random Forests, and regularised regression—remain largely unbenchmarked in carbon terms. This study addresses this long-tail gap. To our knowledge, it is the first to systematically benchmark carbon emissions and predictive performance across a broad suite of classical regression models in a policy-relevant environmental domain; prior work has focused primarily on deep learning or AutoML pipelines, and controlled carbon benchmarking of classical tabular regression models remains scarce.
Although it may seem intuitively plausible that more complex algorithms consume more energy, intuition alone does not provide decision-relevant thresholds. The critical question is not whether boosting emits more CO2 than linear regression, but how much additional carbon is required per marginal unit of predictive improvement. Without controlled benchmarks and efficiency metrics, model selection remains guided by accuracy alone. This study therefore shifts the debate from qualitative intuition to quantitative trade-off measurement.
We investigate three interconnected questions:
  • RQ1: How much CO2 do different ML models emit when predicting vehicle emissions?
  • RQ2: How does the accuracy–sustainability trade-off vary across model classes, using the Accuracy-per-Gram (APG) metric?
  • RQ3: Under fleet-scale, continuous-retraining scenarios, how do algorithm choices affect cumulative annual emissions?
This work makes four contributions:
  • C1. Multi-model carbon benchmark: The first systematic CO2-footprint comparison across traditional regression ML models on a real vehicle-emissions task.
  • C2. Pareto-frontier evidence: Empirical demonstration that regularised linear models (Lasso/Ridge) achieve near-optimal accuracy (~99% R2) at ~500× lower emissions than gradient boosting.
  • C3. Policy-ready metrics: Introduction of APG (R2/CO2) and Marginal Emissions Cost (ΔCO2/ΔR2) to support carbon-aware model selection in deployment.
  • C4. Reproducible Green-AI pipeline: A transferable methodology including hardware profiling, grid-intensity mapping, CodeCarbon integration, and sensitivity analyses across energy mixes.
We choose vehicle CO2 prediction because it is policy-critical, represents one of the most common tabular regression settings, and carries a deliberate irony: if emission-prediction AI is not itself sustainable, climate-mitigation ML risks undermining its own objective.
The remainder of this paper is organised as follows. Section 2 reviews Green AI and vehicle-emissions modelling; Section 3 details the dataset, models, and carbon-tracking setup; Section 4 reports predictive performance and emissions with Pareto analysis; Section 5 discusses scalability and policy implications; and Section 6 concludes the paper. Our central claim is that accuracy alone is no longer sufficient—computational sustainability must be co-optimised in algorithmic choice.

2. Related Works

The environmental cost of machine learning became a central research concern after training large NLP models was shown to emit from hundreds of kilograms to hundreds of tons of CO2, especially when paired with neural architecture search [1,5]. This catalysed the “Green AI” agenda, a shift from maximising accuracy regardless of cost towards targeting sufficient accuracy with minimised computational cost, along with calls for routine reporting of energy use and emissions alongside performance [6]. Subsequent work provided both conceptual and practical tools for estimating training emissions and for reducing per-training emissions even as overall AI energy demand rises [7].
Despite rapid progress in Green AI, most carbon-tracking benchmarks focus on deep learning in NLP and computer vision. Classical machine learning methods—linear models, decision trees, Random Forests, and gradient boosting—that dominate tabular prediction in industry and regulatory analytics remain comparatively under-measured [8,9,10]. Carbon accounting has thus concentrated on Transformers, CNNs, and compute-heavy RL systems, leaving a methodological gap around the “workhorse” models used in real-world structured data pipelines [11,12,13]. Limited evidence suggests boosting models may emit several times more CO2 than forests, but studies often omit simpler baselines and rarely include realistic tuning costs, preventing a clear view of carbon–accuracy diminishing returns [1,2,3,9,10,11,14].
Predicting vehicular CO2 emissions is a mature ML application supporting regulation and vehicle design. Interpretable linear and Lasso-based models typically reach strong explained variance (R2 ≈ 0.88–0.93) with minimal computation, while Random Forest and XGBoost often improve predictive accuracy (R2 ≈ 0.95–0.98) by modelling nonlinearities at the cost of opacity and higher training time [10,13,15,16,17,18,19,20,21,22,23]. Deep neural models (e.g., LSTMs on telemetry) can be effective when high-frequency sensor data are available, but their computational and environmental footprints are substantial and seldom disclosed [24,25,26,27]. Importantly, prior vehicle-emissions ML studies rarely report training CO2, so there is no head-to-head sustainability comparison across model classes in this domain [6,9,11,17,18,23].
Training-phase CO2 emissions depend on energy consumption (kWh), regional grid intensity (g CO2/kWh), and hardware efficiency [5,7,9]. Real-time tools such as CodeCarbon log energy use and carbon intensity during experiments, enabling reproducible reporting, though they may slightly underestimate emissions due to idle power or network overhead [21,28,29]. When real-time measurement is not feasible, post hoc estimators like ML CO2 Impact provide retrospective approximations [5,9]. To ensure comparability, carbon measurement should be paired with explicit reporting of hardware specifications, grid intensity assumptions, and training durations [6,7,29].
This study sits at the intersection of three streams: (i) Green AI, by extending carbon tracking to classical ML; (ii) vehicle-emissions prediction, by integrating carbon cost into accuracy benchmarking; and (iii) efficiency economics, by quantifying diminishing returns through carbon-aware metrics [9,10,11,13]. By benchmarking linear, regularised, tree-based, ensemble, and neural regressors within a unified experimental setup, we provide the first direct Pareto-frontier evidence for carbon-aware model selection in tabular vehicle-emissions prediction [6,10,30]. Recent studies have also started to examine carbon-aware optimisation beyond deep learning, including energy benchmarking of AutoML pipelines, lifecycle accounting of ML experimentation, and multi-objective efficiency modelling in tabular contexts [9,11,14]. However, controlled carbon comparisons across classical regression algorithms remain scarce, particularly within policy-relevant environmental prediction domains [6,15,21,22].

3. Materials and Methods

3.1. Study Design and Workflow

We adopted a dual-objective evaluation paradigm that assesses (i) predictive performance using standard regression metrics (R2, RMSE, MAE) and (ii) environmental cost quantified as training-phase CO2 emissions recorded by CodeCarbon. This parallel evaluation enables the construction of an accuracy–sustainability Pareto frontier, identifying models that achieve high predictive quality with minimal computational carbon.
The study workflow comprised five stages (Figure 1):
Stage 1: Data acquisition and preprocessing: A public vehicle-emissions dataset (N = 7385) was obtained from the Canadian fuel-consumption database. Data were checked for missingness, screened for outliers using IQR rules, and prepared for feature engineering.
Stage 2: Exploratory data analysis (EDA): We examined distributional properties (skewness, kurtosis), correlation structure (Pearson’s r), multicollinearity (variance inflation factor, VIF), and categorical effects via one-way ANOVA. These results informed feature selection and scaling.
Stage 3: Model training with carbon tracking: Nine regression models were trained under identical train–test splits (80/20; random_state = 42). Each run was instrumented with CodeCarbon to capture energy use and resulting CO2.
Stage 4: Performance evaluation: Models were evaluated on the held-out test set using R2 (primary), RMSE, and MAE. Cross-validation was used during hyperparameter tuning (5–10 folds depending on model cost).
Stage 5: Emissions analysis and Pareto ranking: Training footprints were compared and synthesised through two carbon-aware efficiency metrics: Accuracy-per-Gram (APG = R2/CO2) and Marginal Emissions Cost (MEC = ΔCO2/ΔR2).

3.2. Dataset and Variables

We used the 2023 Canadian Fuel Consumption Ratings dataset released by Natural Resources Canada under the Open Government Licence. The dataset contains 7385 light-duty vehicles (model years 2022–2023), spanning 42 manufacturers, 240 models, and 16 vehicle classes (Table 1).
Missing data: No missing values were observed, likely reflecting mandatory reporting in vehicle certification.
Outliers: IQR-based screening identified 23 outliers (0.3%), with an upper fence of 525 g/km (high-performance vehicles) and a lower fence of 100 g/km (hybrid/electric vehicles). These were retained because they represent valid market categories, and removing them would reduce generalizability.
Duplicates: No exact duplicates were found; however, 42 trim-level pairs shared identical specifications but distinct model names. All entries were retained to preserve real-market variability.
Encoding and scaling: Categorical variables were label-encoded for tree-based models (decision tree, Random Forest, XGBoost) and one-hot encoded for linear/regularised models to avoid ordinal assumptions. Numeric variables were z-score standardised for Ridge/Lasso to stabilise optimisation and make coefficients comparable.

3.3. Exploratory Data Analysis (EDA)

Fuel consumption variables (city, highway, combined) and the target CO2 were approximately normal with mild right skew, supporting the use of linear and regularised regression, whose residual-normality assumptions are reasonable in this context.
Kernel density curves for fuel_cons_comb stratified by vehicle class showed expected class-dependent modes: compact/subcompact vehicles concentrated around 7–8 L/100 km, pickup trucks around 12–14 L/100 km, and sports cars displayed a bimodal pattern separating high-emission performance types from lower-emission hybrid variants.
Multicollinearity and feature choice: VIF analysis indicated problematic collinearity for fuel_cons_city and fuel_cons_hwy (VIF > 10). Because fuel_cons_comb is a weighted aggregate of both, we retained fuel_cons_comb and removed the city/highway variables, reducing features from 12 to 10 without losing information and improving linear model stability.
Categorical effects: One-way ANOVA confirmed significant categorical contributions to emissions (Table 2). Vehicle class had the largest effect (η2 = 0.41), fuel type was second (η2 = 0.25), and make/transmission had smaller but significant effects (η2 < 0.15).
The increase from five base predictors to 126 polynomial features results from all degree-4 polynomial combinations including interaction terms, calculated using the formula for combinations with repetition.
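The count follows from the combinations-with-repetition formula C(p + d, d): for p = 5 predictors and degree d = 4, C(9, 4) = 126 monomials of total degree at most 4, including the constant term. A one-line check:

```python
from math import comb

# Number of monomials of total degree <= d in p variables
# (combinations with repetition), including the constant term.
def n_poly_features(p: int, d: int) -> int:
    return comb(p + d, d)

# Five base predictors expanded to degree 4 -> 126 features,
# matching the count reported in the text.
print(n_poly_features(5, 4))  # 126
```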

3.4. Feature Engineering

To capture nonlinearities without defaulting to complex ensembles, we applied degree-4 polynomial feature expansion over a base set of p = 5 predictors (engine_size, cylinders, fuel_cons_comb, fuel_cons_comb_mpg, transmission). This yields 126 polynomial features, including all interaction terms up to order 4, enabling linear models to approximate nonlinear response surfaces.
Degree-4 was selected based on prior evidence that degree-3/4 polynomials capture nearly all meaningful nonlinear structure in vehicular emissions, while higher degrees increase overfitting risk. Regularisation is necessary because many polynomial terms are strongly correlated. Ridge (L2) shrinks correlated coefficients, while Lasso (L1) induces sparsity by setting a subset to zero.
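The need for regularisation can be illustrated with the closed-form ridge solution on nearly collinear polynomial-style columns; the data and penalty below are illustrative assumptions, not the study's configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny design matrix with nearly collinear polynomial-style columns:
# x, x^2, and a near-duplicate of x^2 (illustrative data only).
x = rng.uniform(1, 3, size=50)
X = np.column_stack([x, x**2, x**2 + 1e-6 * rng.normal(size=50)])
y = 2 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=50)

def ridge_fit(X, y, alpha):
    # Closed-form ridge: w = (X'X + alpha*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# OLS (alpha = 0) on a near-singular Gram matrix inflates the correlated
# coefficients; a small L2 penalty shrinks them to a sensible scale.
w_ols = ridge_fit(X, y, alpha=0.0)
w_ridge = ridge_fit(X, y, alpha=1.0)
print(np.abs(w_ols).max(), np.abs(w_ridge).max())
```

The same mechanism operates at full scale: many of the 126 polynomial terms are strongly correlated, so the L2 (Ridge) or L1 (Lasso) penalty keeps coefficients stable.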

3.5. Model Architectures

We trained nine models spanning linear, regularised, tree-based, ensemble, and neural families. All models used identical feature sets and splits to ensure comparability.
All models, including XGBoost and MLP, were trained for convergence using the final selected hyperparameters. Training details and hardware specifications are provided in Section 3.7.
  • Simple Linear Regression (SLR): Baseline using engine_size only.
  • Multiple Linear Regression (MLR): Linear model with five predictors.
  • Polynomial OLS (degree-4): Closed-form OLS on 126 polynomial features.
  • Ridge Regression (L2): Polynomial features with L2 penalty; α tuned by grid search.
  • Lasso Regression (L1): Polynomial features with L1 penalty; α tuned by grid search; yields sparse coefficients.
  • Decision Tree Regressor: Single unpruned regression tree (default splitting). The single unpruned regression tree serves as a simple baseline to illustrate model complexity and carbon cost trade-offs. It is not intended as a final predictive model.
  • Random Forest Regressor: Bagged ensemble of 100 trees.
  • XGBoost Regressor: Gradient-boosted ensemble of 500 trees with subsampling.
  • MLP Regressor: Feedforward neural network (128-64-32 hidden units, ReLU), early stopping, and GPU-accelerated training.
Hyperparameter grids and selections are reported in Section 4 alongside carbon costs to avoid post hoc tuning bias.

3.6. Train–Test Split and Cross-Validation

A fixed 80/20 hold-out split was used for all experiments: training N = 5908 and test N = 1477. A single random seed (42) ensured reproducibility. Stratification was not applied because the target is continuous.
During tuning, we used k-fold cross-validation with folds chosen to reflect model cost: 10-fold for Ridge/Lasso, 5-fold for XGBoost, and a single validation split with early stopping for MLP.
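The fold logic can be sketched with the standard library; in practice scikit-learn's KFold would be used, so this only illustrates the index-level protocol:

```python
import random

# Minimal k-fold index generator mirroring the tuning protocol above
# (10 folds for Ridge/Lasso, 5 for XGBoost).
def kfold_indices(n: int, k: int, seed: int = 42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

splits = list(kfold_indices(5908, 10))
assert len(splits) == 10
# Every sample appears exactly once across the validation folds.
assert sorted(i for _, val in splits for i in val) == list(range(5908))
```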
To maintain a fair comparison, all models were provided with the same underlying feature information. Linear and regularised models utilised one-hot encoding and degree-4 polynomial expansion to capture nonlinearities, whereas tree-based models utilised their native ability to handle categorical variables and nonlinear splits. Supplemental tests indicated that providing the expanded polynomial feature set to XGBoost increased its training emissions by ~15% without a statistically significant gain in R2, reinforcing the dominance of regularised linear models on the Pareto frontier.

3.7. Performance Metrics

The primary metric was R2, interpreted as the fraction of CO2 variance explained by the model. R2 is scale-invariant and widely interpretable for policy contexts.
Secondary metrics were MAE and RMSE. MAE gives average absolute error, while RMSE penalises larger deviations more strongly. In regulatory applications, explained variance is typically prioritised over marginal improvements in absolute error.
All models except MLP and XGBoost were trained on CPUs. MLP and XGBoost training utilised GPUs to accelerate computation.

3.8. Carbon Footprint Tracking Methodology

Carbon tracking was integrated into every training run using CodeCarbon v2.3.4, which estimates CO2 emissions from real-time power monitoring and regional electricity carbon intensity.
Power monitoring: CodeCarbon records incremental energy draw from CPU (Intel RAPL), GPU (NVIDIA NVML via nvidia-smi), and RAM. Sampling occurred at ~0.5 s intervals.
Grid carbon intensity: Regional intensity (g CO2/kWh) was queried from ElectricityMap through CodeCarbon. Experiments were conducted in Turkey (TR), with a mean grid intensity of 450 g CO2/kWh during the training window (Table 3).
Emission calculation:
CO2 (g) = Energy (kWh) × Carbon intensity (g CO2/kWh)
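CodeCarbon performs this accounting automatically; as a minimal illustration of the identity above (the energy value is assumed for illustration, not a measured figure):

```python
def training_co2_grams(energy_kwh: float, grid_g_per_kwh: float) -> float:
    """CO2 (g) = energy (kWh) x grid carbon intensity (g CO2/kWh)."""
    return energy_kwh * grid_g_per_kwh

# The same ~0.005 kWh training run under this study's Turkish grid
# (450 g/kWh) versus a low-carbon grid (~50 g/kWh, e.g., hydro-dominated).
print(training_co2_grams(0.005, 450))  # 2.25 g
print(training_co2_grams(0.005, 50))   # 0.25 g
```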
Sensitivity to alternative grid intensities is reported to contextualise geographic dependence.
Accounting limitations: Reported CO2 excludes baseline idle draw (≈5% underestimation), embodied hardware carbon, facility cooling overhead (PUE), and network energy because training was single-machine and local. These factors are unlikely to alter comparative rankings within our model set but matter for scaling comparisons to large distributed systems.
Reported carbon emissions correspond to a single training run using the final selected hyperparameter configuration for each model. In practical research and industrial workflows, hyperparameter tuning (e.g., grid search, random search, or Bayesian optimisation) typically requires multiple training runs, so the total lifecycle carbon emissions of model development may exceed the values reported here by one to two orders of magnitude. Quantifying the full carbon cost of tuning pipelines is an important limitation of the present benchmark and a key direction for future work.

4. Results

The results reveal two simultaneous stories: predictive performance and environmental responsibility. Table 4 summarises all models under identical experimental conditions, showing that test accuracy clusters tightly for most algorithms (R2: 0.72–0.99), while training emissions span four orders of magnitude (0.001–2.30 g CO2). This decoupling of accuracy and carbon cost highlights an “accuracy ceiling,” where additional complexity yields diminishing predictive returns yet sharply increasing emissions.
To ensure the robustness of the Pareto ordering, we conducted a sensitivity analysis using 10-fold cross-validation. The results confirm that the accuracy–emissions ranking is stable across different data splits. While absolute R2 values fluctuate by ±0.002, the dominance of regularised polynomial models on the sustainability-optimal frontier remains consistent, as their carbon advantage (500× lower than XGBoost) far outweighs this marginal performance variation.
Accuracy ceiling: Six models (Polynomial OLS, Ridge, Lasso, Random Forest, MLP, XGBoost) reach R2 ≥ 0.9908, differing by <0.4 percentage points. This indicates that, for tabular vehicular data, predictive performance saturates once a polynomial structure and mild regularisation are introduced.
XGBoost (R2 = 0.9947): XGBoost achieves the best test performance, reflecting the strength of gradient boosting in iteratively correcting residuals:
f̂^(t)(x) = f̂^(t−1)(x) + η h_t(x)
RMSE (13.42 g/km) represents the lowest absolute error among all methods.
MLP Regressor (R2 = 0.9923): The three-hidden-layer MLP performs slightly below XGBoost. Its gain over polynomial linear models is small (+0.15 R2-points), suggesting that most nonlinearities are already captured by degree-4 polynomial features, leaving limited residual structure for a neural approximator. Degree-4 polynomial expansion was chosen following prior studies demonstrating its effectiveness in capturing nonlinearities in tabular environmental datasets [10,13].
Random Forest (R2 = 0.9908): Random Forest does not exceed regularised polynomial models on the test set. Three mechanisms explain this:
  • Bagging vs. boosting: averaging trees reduces variance but does not reduce bias as effectively as boosting;
  • Feature redundancy: polynomial expansion already encodes nonlinear interactions, narrowing the advantage of tree discovery;
  • Train–test gap: train R2 (0.9955) exceeds test R2 (0.9908), indicating mild overfitting despite bagging.
Polynomial OLS, Ridge, and Lasso (R2 = 0.9908): These three models show identical test performance, differing primarily in training dynamics. OLS exhibits slight overfitting (train R2 marginally above test), Ridge stabilises correlated coefficients through L2 shrinkage, and Lasso yields a sparse, policy-auditable model by removing ~40% of polynomial terms. Importantly, Pareto logic favours the regularised variants because they preserve accuracy while preventing unnecessary coefficient inflation (Table 5).
Multiple Linear Regression (R2 = 0.9000): MLR captures a major linear structure but underfits nonlinear effects documented by polynomial/regularised models.
Simple Linear Regression (R2 = 0.7238): SLR, using engine_size alone, establishes a transparent lower bound, confirming that fuel-economy variables and interactions are essential for high-fidelity prediction.
Decision Tree (R2 = 0.8450): The unpruned tree overfits severely (train R2 ≈ 1.00), leading to poor generalisation. This underscores the instability of single trees in regression without pruning or depth constraints.
Even with polynomial expansion, the linear models' emissions remain near zero because their matrix operations are dense, short-lived, and efficiently vectorised. By contrast, despite a sub-second runtime, the decision tree emits 0.18 g CO2: recursive splitting requires repeated sorting and irregular memory access, elevating power draw per unit time relative to closed-form linear solvers.
Lasso removed exactly 50 of the 126 polynomial features (~40%), yielding a sparse and interpretable model.
Why does XGBoost emit the most? (i) Boosting is sequential and cannot be parallelised across rounds; (ii) GPU histogram building combined with CPU tree logic introduces transfer overhead; (iii) the full 500-round training window adds sustained compute time.
To unify predictive power and environmental cost, we define:
APG = R2/(CO2 (g))
APG quantifies “how much accuracy is purchased per gram of CO2”. Higher APG indicates a more sustainable algorithm.
The APG ranking in Table 6 shows a dramatic advantage for polynomial and regularised linear models. Polynomial OLS achieves APG = 450.4, over 1000× more efficient than XGBoost (APG = 0.43), while matching test R2 within 0.4 points. This indicates that, under standard regulatory thresholds (R2 ≥ 0.95), regularised polynomial linear models dominate the sustainability–accuracy frontier.
We quantify diminishing returns through (Table 7):
MEC = (ΔCO2)/(ΔR2)
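Both metrics follow directly from the headline figures reported for Lasso and XGBoost; a minimal sketch:

```python
# Carbon-aware efficiency metrics computed from this paper's reported
# R2 and training-CO2 values for Lasso and XGBoost.
def apg(r2: float, co2_g: float) -> float:
    return r2 / co2_g                       # Accuracy-per-Gram

def mec(r2_a, co2_a, r2_b, co2_b) -> float:
    return (co2_b - co2_a) / (r2_b - r2_a)  # g CO2 per unit R2 gained

lasso = (0.9908, 0.0045)
xgboost = (0.9947, 2.30)

print(f"APG(Lasso)   = {apg(*lasso):.1f}")
print(f"APG(XGBoost) = {apg(*xgboost):.2f}")  # ~0.43, as in Table 6
print(f"MEC(Lasso -> XGBoost) = {mec(*lasso, *xgboost):.0f} g per unit R2")
```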
The transition from Lasso to XGBoost delivers only a 0.39 percentage-point accuracy improvement, yet increases emissions by ~2.30 g, producing the steepest MEC (“efficiency cliff”) in the experiment set.
Interpretability is essential for policy adoption. Lasso coefficients show that emissions are dominated by fuel economy, with nonlinear amplification through polynomial terms and interactions:
fuel_cons_comb and higher powers explain the largest share of variance, confirming that CO2 is primarily a function of fuel burned per kilometre.
engine_size2 × fuel_cons_comb and cylinders × fuel_cons_comb highlight compounding effects in larger engines.
Categorical terms (transmission, fuel_type, vehicle_class) contribute smaller but directionally consistent effects.
Predictions closely follow the actual CO2 values across the first 20 test samples (Figure 2), indicating a strong fit. Small deviations appear mainly at sharp peaks and dips, suggesting the slight smoothing of extremes but no systematic bias.
XGBoost gain-based importance converges with Lasso, again ranking combined fuel consumption as the dominant predictor. This agreement between a sparse linear model and a nonlinear ensemble strengthens confidence in the underlying explanatory pattern: fuel consumption is the primary causal lever for emissions reduction, while engine size and cylinders largely act through fuel economy.

5. Discussion

To illustrate the potential system-level implications of algorithm choice, we present a hypothetical scaling stress-test involving 100 million vehicles. The scenario does not assume individual-vehicle telemetry; rather, it models a large-scale regulatory or fleet-analytics setting in which aggregate datasets are periodically updated and models are retrained.
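A back-of-the-envelope sketch of the scenario arithmetic. The retraining volume used below (10^5 runs/day, e.g., one model per ~1,000-vehicle segment) is our own assumption, chosen to be consistent with the fleet-scale totals reported in this paper (~84 t/year for XGBoost, ~0.15 t for Lasso); the source does not state the run count explicitly.

```python
# Hypothetical fleet-scale scenario: annual training emissions under
# daily retraining. runs_per_day is an assumed quantity (see lead-in).
def annual_tonnes(grams_per_run: float, runs_per_day: float) -> float:
    return grams_per_run * runs_per_day * 365 / 1e6  # g -> tonnes

xgb = annual_tonnes(2.30, 100_000)      # per-run emissions from Table 4
lasso = annual_tonnes(0.0045, 100_000)
print(f"XGBoost: {xgb:.1f} t/year, Lasso: {lasso:.2f} t/year")
```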
Machine learning research has historically ranked models almost exclusively by predictive accuracy. Benchmarks and competitions routinely celebrate improvements in R2 or AUC at the fourth or fifth decimal place. However, our results demonstrate that, for tabular emissions prediction, predictive performance saturates near an accuracy ceiling once a polynomial structure and modest regularisation are introduced. Beyond this point, additional accuracy comes at the cost of sharply increasing emissions.
Therefore, the decision question shifts from “Which model is most accurate?” to “Which model balances accuracy, interpretability, and sustainability for the intended use case?” In climate-mitigation applications, this reframing is not optional but necessary to avoid unintentionally displacing emissions from tailpipes to data centres.
Our Pareto analysis identifies regularised linear regression (Ridge/Lasso) with polynomial features as optimal under three common deployment conditions.
Condition 1: Enough Accuracy for Policy Use
If stakeholders accept predictive adequacy within typical regulatory thresholds (e.g., R2 ≥ 0.95), Lasso delivers 99.08% explained variance at 0.0045 g CO2. This level of accuracy already exceeds what most standards require for guiding policy incentives or fleet-level compliance monitoring. In other words, sixth-decimal improvements do not meaningfully change policy decisions.
Condition 2: Interpretability Requirements
Regulatory regimes such as GDPR Article 22, the EU AI Act, and the NIST AI Risk Management Framework increasingly require explainability and auditability. Lasso offers transparent coefficients that directly show the causal structure captured by the model:
  • Fuel consumption terms dominate;
  • Engine size × fuel consumption interactions amplify emissions;
  • Cylinder count has a secondary contribution.
In contrast, XGBoost relies on post hoc explainability tools such as SHAP or LIME, which not only add methodological complexity but also incur extra compute and carbon cost. For policy-sensitive domains, the interpretability advantage of sparse linear models is a decisive co-benefit.
Condition 3: Frequent Retraining or Resource Constraints
In edge or embedded systems (e.g., onboard vehicle computers, IoT sensors), computational budgets are limited. Lasso models train in ~14 s, occupy minimal memory (<1 MB), and yield sub-millisecond inference. XGBoost models, by contrast, require substantially longer training (~220 s), larger storage footprints, and higher inference latency. Thus, for real-time or low-power deployments, regularised linear models are not merely greener—they are also operationally necessary.
Quantitatively, shifting from daily to weekly retraining reduces the annual emissions of the XGBoost fleet scenario from 84 t to approximately 12 t CO2/year. While this mitigation is substantial, the relative efficiency ratio between Lasso and XGBoost remains constant at ~500:1. Furthermore, in edge deployment scenarios (e.g., onboard vehicle units), the sub-millisecond inference and minimal memory footprint (<1 MB) of Lasso models provide a decisive operational advantage over ensemble methods.
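The frequency arithmetic is simply a division by seven, which leaves the Lasso-to-XGBoost efficiency ratio unchanged:

```python
# Retraining-frequency sensitivity: daily -> weekly retraining divides
# annual training emissions by 7; relative rankings are unaffected.
def scale_by_frequency(daily_tonnes: float, retrains_per_week: int) -> float:
    return daily_tonnes * retrains_per_week / 7

daily_xgb = 84.0  # t/year under daily retraining, as reported above
weekly_xgb = scale_by_frequency(daily_xgb, 1)
print(f"{weekly_xgb:.1f} t/year")  # 12.0
```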
While regularised polynomial regressions dominate most practical settings, complex models are not categorically unjustified. Our findings instead enable explicit cost–benefit reasoning.
Scenario A: Safety-Critical or High-Penalty Settings
In environments where even small error reductions prevent large harms (autonomous vehicle control, stringent compliance audits, high-stakes taxation systems), the RMSE improvement delivered by XGBoost over Lasso (13.4 vs. 17.7 g/km), a consequence of its ability to model complex nonlinearities, may warrant its substantially higher carbon cost. If a 2% reduction in prediction error prevents multi-million-dollar risks, the emissions externality (~120 g CO2/year per model retrained weekly) is economically negligible relative to the avoided harm. In such cases, XGBoost is defensible.
Scenario B: One-Time Training with Massive Amortised Inference
If a model trains once and serves tens of millions of predictions, training emissions are amortised to near zero per prediction. At that scale, lifecycle carbon is usually dominated by inference infrastructure and data pipelines rather than training. Model choice then has minimal effect on total emissions, and accuracy may be the dominant factor.
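A minimal sketch of this amortisation; the 50-million-prediction service lifetime below is an illustrative assumption, not a measured figure:

```python
def per_prediction_g(training_g, n_predictions):
    """Training emissions amortised across a model's lifetime predictions."""
    return training_g / n_predictions

# XGBoost trained once (2.3 g) and serving an assumed 50 million predictions:
# the training share shrinks to tens of nanograms of CO2 per prediction.
print(per_prediction_g(2.3, 50_000_000))
```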
Scenario C: Renewable-Powered Training
Geographic and temporal grid intensity strongly modulates training footprints. Training XGBoost on low-carbon grids (e.g., Iceland or France) reduces its emissions by 80–90%, narrowing the efficiency gap. Organisations with renewable commitments can therefore reduce the environmental penalty of complex models through carbon-aware scheduling or regional relocation.
Carbon-aware scheduling, i.e., training during temporal low-intensity windows, offers comparable reductions. However, even under the cleanest grid scenarios (e.g., 50 g/kWh), the absolute footprint of ensemble models remains far higher than that of regularised linear models trained during peak-intensity periods (0.26 g for XGBoost in Iceland versus 0.007 g for Lasso in India), suggesting that algorithmic efficiency is a more potent sustainability lever than temporal scheduling alone.
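Because a measured footprint is the product of energy drawn and grid carbon intensity, it rescales linearly across grids; a minimal sketch (values agree with Table 3 to within 0.01 g of rounding):

```python
BASELINE_INTENSITY = 450  # g CO2/kWh, the grid intensity assumed in this study

def scale_emissions(baseline_g, grid_intensity):
    """Rescale a footprint measured at 450 g/kWh to another grid intensity."""
    return baseline_g * grid_intensity / BASELINE_INTENSITY

# XGBoost's measured 2.30 g under alternative grids
for region, intensity in [("Iceland", 50), ("France", 80), ("India", 700)]:
    print(region, round(scale_emissions(2.30, intensity), 2))
```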
A key methodological implication is that hyperparameter tuning, not final training, can dominate research-phase emissions. While this benchmark reports emissions for final trained models, realistic model development involves extensive hyperparameter exploration. A modest grid search over 20 parameter combinations with 5-fold cross-validation requires 100 training runs; for XGBoost, this corresponds to approximately 230 g CO2 (2.3 g × 100), excluding early experimentation and failed configurations, and larger AutoML workflows may easily exceed 300–500 g CO2 per model family. Studies that report carbon only for final training runs therefore systematically understate computational impact by 10–100×. This observation amplifies the policy relevance of carbon-aware experimentation and underscores the importance of reporting full lifecycle carbon footprints.
Mitigation strategies emerge directly from this result:
  • Early stopping to terminate weak configurations quickly.
  • Warm starting from prior best configurations to avoid random restarts.
  • Bayesian optimisation (e.g., Optuna, Hyperopt) to reduce trial counts.
  • Carbon-aware scheduling to run tuning during low-intensity grid windows.
These practices preserve scientific rigour while aligning experimentation with carbon-budget constraints.
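The tuning-cost arithmetic above can be made explicit. A minimal sketch, assuming every cross-validation fit costs the same as the measured final training run:

```python
def tuning_emissions_g(per_run_g, n_combinations, cv_folds):
    """Emissions of an exhaustive grid search with k-fold cross-validation."""
    return per_run_g * n_combinations * cv_folds

# 20 parameter combinations x 5 folds = 100 XGBoost training runs (~230 g CO2),
# i.e., 100x the footprint of the single final model.
print(tuning_emissions_g(2.30, 20, 5))
```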
Overall, regularised linear models are superior in both carbon efficiency and interpretability, offering near-maximum accuracy at emissions several orders of magnitude lower than ensemble methods.
Although our task focuses on vehicle emissions, the observed trade-off structure generalises to many tabular regression domains (finance risk scoring, healthcare outcomes, energy forecasting). In such settings, polynomial feature engineering, together with regularisation, can capture most of the nonlinear structure at a fraction of the carbon cost of ensembles. The central lesson is therefore not “avoid boosting,” but “justify boosting.” Carbon becomes an explicit decision variable alongside accuracy and interpretability.
In sum, our results challenge the default assumption that marginal accuracy gains are always worth their computational cost. Regularised polynomial linear models provide near-optimal accuracy, clear interpretability, and several orders of magnitude lower training emissions. Complex models remain acceptable only when their marginal accuracy produces tangible real-world benefits that outweigh carbon externalities. This framework operationalises Green AI as a pragmatic, not ideological, shift in model selection—one that is urgently needed for climate-critical ML systems.

6. Conclusions

This study provides a quantitative response to a neglected paradox in climate-oriented machine learning: models trained to predict CO2 emissions also emit CO2, and this hidden cost can be large relative to the marginal accuracy gained. By benchmarking nine regression models (simple linear through XGBoost) on a 7385-vehicle dataset, we revealed a striking mismatch between predictive performance and environmental burden. While test R2 scores clustered tightly near the upper bound of achievable accuracy for tabular vehicle data (R2: 0.72–0.99), training emissions spanned nearly four orders of magnitude (0.001–2.30 g CO2), producing efficiency differences that are invisible under conventional accuracy-only evaluation.
The most consequential finding of this study is the empirical identification of an accuracy–carbon emission Pareto frontier in tabular regression. Regularised polynomial linear models (Lasso and Ridge) occupy this frontier, achieving near-maximum predictive performance at negligible carbon cost. In contrast, gradient-boosted ensembles such as XGBoost lie beyond an “efficiency cliff,” where marginal gains in R2 require several orders of magnitude higher emissions. This frontier reframes model selection as a multi-objective optimisation problem rather than a single-metric ranking exercise.
Three core contributions emerge. First, we deliver the first multi-model carbon benchmark for classical ML regressors in vehicle CO2 prediction, addressing a major gap in Green-AI literature that has focused predominantly on deep learning while overlooking the “workhorse” algorithms that dominate real-world tabular deployments. Second, we show empirically that regularised polynomial linear models (Lasso/Ridge) sit on the accuracy–emissions Pareto frontier: they achieve 99.08% variance explained at only 0.004 g CO2, outperforming gradient boosting in sustainability by roughly 500× while sacrificing less than 0.4 percentage points in R2. This finding challenges the default assumption that ensemble methods are universally preferable for regression. Third, we translate carbon-aware evaluation into practice by proposing policy-ready metrics—Accuracy-per-Gram (APG) and Marginal Emissions Cost (MEC)—and demonstrating their utility for deployment decisions under fleet-scale, continuous-retraining scenarios.
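Both metrics follow directly from Table 4; a minimal sketch reproducing the headline figures (MEC is expressed per unit of ΔR2, consistent with the tabulated values):

```python
def apg(r2, co2_g):
    """Accuracy-per-Gram: explained variance per gram of training CO2."""
    return r2 / co2_g

def mec(r2_a, co2_a, r2_b, co2_b):
    """Marginal Emissions Cost of moving from model a to model b."""
    return (co2_b - co2_a) / (r2_b - r2_a)

# Table 4 values: Lasso (R2 = 0.9908, 0.0045 g) vs. XGBoost (R2 = 0.9947, 2.30 g)
print(round(apg(0.9908, 0.0045), 1))             # 220.2
print(round(apg(0.9947, 2.30), 2))               # 0.43
print(round(mec(0.9908, 0.0045, 0.9947, 2.30)))  # 589
```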
Beyond the specific task of predicting vehicle emissions, our results imply a broader shift in ML evaluation culture. Accuracy can no longer be treated as the sole currency of model quality. In an era of carbon budgets and net-zero commitments, computational sustainability must co-determine algorithmic choice, especially for systems whose operational logic is explicitly climate-motivated. Importantly, our findings do not argue against complex models in general; rather, they provide a principled framework for when complex models are justified and when they are not. In most tabular regression settings where policy tolerances accept explained variance ≥ 95%, regularised polynomial linear approaches offer a uniquely strong combination of accuracy, explainability, and carbon efficiency.
The machine learning community thus stands at a crossroads. For two decades, we have rewarded incremental accuracy gains while externalising computational emissions. This study demonstrates that such a trajectory is ecologically untenable at scale. Yet it also shows that sustainability and performance need not be opposing goals: near-state-of-the-art accuracy at near-zero carbon cost is already achievable with carefully engineered, well-regularised linear models. The open question is not whether Green AI is possible, but whether we will adopt it as a norm.

Beyond methodological refinement, cultural change in machine learning research is essential. We encourage academic conferences and journals, particularly those already requiring code and data availability for reproducibility, to encourage, and where feasible formally require, authors to report estimated carbon emissions for typical experimental pipelines using standardised tools such as CodeCarbon. Treating carbon disclosure as a routine component of reproducibility would align research incentives with global sustainability goals and accelerate the normalisation of Green-AI practices.

While this work establishes a clear accuracy–carbon trade-off for tabular regression, several open questions remain. Our benchmark focuses on structured, low-dimensional regression. Domains such as image classification, NLP, and telemetry-based time-series may exhibit different Pareto structures because representational complexity is intrinsic to the data rather than feature-engineered. Replication studies across modalities are needed to test whether the dominance of regularised linear models persists or whether nonlinear models offer proportionally larger benefits.
Our reported emissions correspond to the final trained models. In practice, automated tuning (grid search, Bayesian optimisation, neural architecture search) can multiply the number of training runs by two orders of magnitude. Future work should quantify lifecycle emissions in realistic AutoML pipelines and evaluate whether the accuracy gains justify the carbon overhead.
Compression strategies (distillation, pruning, quantisation) may offer a path toward XGBoost-level accuracy with Lasso-level emissions, but systematic evidence is limited in regression contexts. Evaluating compressed ensembles and lightweight neural models under carbon tracking is a key direction.
We focused on training-phase carbon footprints. In high-throughput services that serve millions of predictions per day, inference—alongside data pipelines and networking—may account for a large share of total emissions. Future studies should measure end-to-end lifecycle carbon to determine when training efficiency materially affects overall impact.
Building on these limitations, several extensions are especially promising: integrating real-time grid intensity into training schedulers (carbon-aware retraining); developing multi-objective tuning that optimises (accuracy, latency, emissions) jointly; quantifying federated vs. centralised training footprints; and partnering with regulators to pilot carbon reporting requirements for ML systems used in environmental policy.
The significance of this research lies not in demonstrating that complex models consume more energy—a fact that may appear intuitive—but in quantifying the magnitude and policy relevance of this trade-off. When machine learning systems are deployed explicitly for climate mitigation, their computational externalities become ethically and operationally relevant. This study, therefore, reframes model selection as a sustainability-aware optimisation problem rather than a purely predictive competition.

Author Contributions

Methodology, M.E. (Murat Emeç); Software, M.E. (Murat Emeç); Formal analysis, M.T.; Investigation, M.E. (Muzaffer Ertürk); Resources, M.E. (Murat Emeç) and M.E. (Muzaffer Ertürk); Data curation, M.T.; Writing—review & editing, M.E. (Muzaffer Ertürk); Supervision, M.E. (Murat Emeç). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Workflow overview.
Figure 2. Actual vs. predicted CO2 emissions for the first 20 test observations.
Table 1. Dataset summary statistics.

| Statistic | Engine_Size | Cylinders | Fuel_Cons_Comb | CO2 |
|---|---|---|---|---|
| Count | 7385 | 7385 | 7385 | 7385 |
| Mean | 3.2 | 5.8 | 10.1 | 237 |
| Std Dev | 1.4 | 2.1 | 2.8 | 65 |
| Min | 1.0 | 3 | 3.8 | 96 |
| 25% (Q1) | 2.0 | 4 | 8.2 | 185 |
| 50% (Median) | 3.0 | 6 | 9.8 | 230 |
| 75% (Q3) | 4.0 | 8 | 11.5 | 270 |
| Max | 8.0 | 16 | 26.0 | 525 |
| Skewness | 0.82 | 0.65 | 0.91 | 0.88 |
Table 2. ANOVA results for categorical predictors.

| Categorical Variable | F-Statistic | p-Value | Effect Size (η2) | Significant? |
|---|---|---|---|---|
| vehicle_class | 324.5 | <0.001 | 0.41 | Yes |
| make | 18.7 | <0.001 | 0.09 | Yes |
| fuel_type | 156.3 | <0.001 | 0.25 | Yes |
| transmission | 42.1 | <0.001 | 0.12 | Yes |
Table 3. Emissions under alternative grid scenarios (Iceland → India).

| Region/Scenario | Carbon Intensity (g/kWh) | XGBoost CO2 (g) | Lasso CO2 (g) |
|---|---|---|---|
| Iceland (hydro/geo) | 50 | 0.26 | 0.0005 |
| France (nuclear) | 80 | 0.41 | 0.0008 |
| Germany (coal/gas) | 350 | 1.79 | 0.0035 |
| Turkey (baseline) | 450 | 2.30 | 0.0045 |
| China (coal-heavy) | 550 | 2.82 | 0.0055 |
| India (coal) | 700 | 3.59 | 0.0070 |
Table 4. Comprehensive model performance and carbon footprint.

| Rank | Model | Train R2 | Test R2 | Test RMSE (g/km) | Test MAE (g/km) | Training Time (s) | CO2 Emissions (g) | APG (R2/g CO2) |
|---|---|---|---|---|---|---|---|---|
| 1 | XGBoost | 0.9982 | 0.9947 | 13.42 | 9.85 | 220 | 2.30 | 0.43 |
| 2 | MLP Regressor | 0.9941 | 0.9923 | 16.20 | 11.60 | 145 | 1.80 | 0.55 |
| 3 | Lasso Regression | 0.9912 | 0.9908 | 17.69 | 12.80 | 14 | 0.0045 | 220.2 |
| 4 | Ridge Regression | 0.9910 | 0.9908 | 17.70 | 12.82 | 12 | 0.0036 | 275.2 |
| 5 | Polynomial OLS | 0.9915 | 0.9908 | 17.68 | 12.78 | 2.5 | 0.0022 | 450.4 |
| 6 | Random Forest | 0.9955 | 0.9908 | 17.70 | 13.01 | 52 | 1.58 | 0.63 |
| 7 | Multiple Linear | 0.9005 | 0.9000 | 18.37 | 13.15 | 1.0 | 0.0015 | 600.0 |
| 8 | Decision Tree | 0.9999 | 0.8450 | 72.50 | 58.20 | 0.8 | 0.18 | 4.69 |
| 9 | Simple Linear | 0.7240 | 0.7238 | 30.82 | 24.10 | 0.5 | 0.0010 | 723.8 |
Table 5. Ultra-low carbon training footprints of linear and regularised polynomial models.

| Model | Time (s) | Energy (Wh) | CO2 (g) |
|---|---|---|---|
| Simple Linear | 0.5 | 0.015 | 0.0010 |
| Multiple Linear | 1.0 | 0.031 | 0.0015 |
| Polynomial OLS | 2.5 | 0.080 | 0.0022 |
| Ridge | 12 | 0.383 | 0.0036 |
| Lasso | 14 | 0.467 | 0.0045 |
Table 6. High-emission tier: training costs of ensemble and neural models.

| Model | Structure | Time (s) | CO2 (g) |
|---|---|---|---|
| Random Forest | 100 trees | 52 | 1.58 |
| MLP | 3 hidden layers | 145 | 1.80 |
| XGBoost | 500 boosted trees | 220 | 2.30 |
Table 7. Comparison of adjacent models along the accuracy ladder.

| Comparison | ΔR2 | ΔCO2 (g) | MEC (g CO2 per unit ΔR2) | Interpretation |
|---|---|---|---|---|
| MLR → Polynomial OLS | +0.0908 | +0.0007 | 0.008 | Large accuracy gain at negligible carbon cost |
| Polynomial OLS → Lasso | +0.0000 | +0.0023 | Infinite | No gain, higher emissions |
| Lasso → Random Forest | +0.0000 | +1.5755 | Infinite | Emissions jump without an accuracy gain |
| Lasso → MLP | +0.0015 | +1.7955 | 1197 | 0.15% accuracy costs ~1.8 g CO2 |
| MLP → XGBoost | +0.0024 | +0.5000 | 208 | 0.24% accuracy costs 0.5 g CO2 |
| Lasso → XGBoost | +0.0039 | +2.2955 | 589 | 0.39% accuracy costs 2.3 g CO2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Turhan, M.; Emeç, M.; Ertürk, M. Benchmarking Training Emissions of Regression Models for Vehicle CO2 Prediction. Sustainability 2026, 18, 2830. https://doi.org/10.3390/su18062830
