Article

Predicting Fuel Consumption and Emissions Using GPS-Based Machine Learning Models for Gasoline and Diesel Vehicles

1
Department of Safety, Kuwait Municipality, Kuwait City 12027, Kuwait
2
Department of Construction Project, Ministry of Public Work, South Surra 12011, Kuwait
3
Civil Engineering Department, College of Technological Studies, Public Authority for Applied Education and Training, Shuwaikh 70654, Kuwait
4
Emissions Analytics, C R Bates Industrial Estate, 2, Stokenchurch, High Wycombe HP14 3PD, UK
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(6), 2395; https://doi.org/10.3390/su17062395
Submission received: 11 January 2025 / Revised: 5 March 2025 / Accepted: 7 March 2025 / Published: 9 March 2025

Abstract

The transportation sector plays a vital role in enabling the movement of people, goods, and services, but it is also a major contributor to energy consumption and greenhouse gas emissions. Accurate modeling of fuel consumption and pollutant emissions is critical for effective transportation management and environmental sustainability. This study investigates the use of real-world driving data from gasoline and diesel vehicles to model fuel consumption and exhaust emissions (CO2 and NOx). The models were developed using ensemble bagged and decision tree algorithms with inputs derived from both vehicle speed and GPS speed data. The results demonstrate high predictive accuracy, with the ensemble bagged model consistently outperforming the decision tree model across all datasets. Notably, GPS speed-based models showed comparable performance to vehicle speed-based models, indicating the feasibility of using GPS data for real-time predictions. Furthermore, the combined gasoline and diesel engine dataset improved the accuracy of CO2 emission predictions, while the gasoline-only dataset yielded the highest accuracy for fuel consumption. These findings underscore the potential of integrating GPS-based machine learning models into Intelligent Transportation Systems (ITS) to enhance real-time monitoring and policymaking. Future research should explore the inclusion of heavy-duty vehicles, additional pollutants, and advanced modeling techniques to further improve predictive capabilities.

1. Introduction

The transportation system is a cornerstone of economic growth and societal development, enabling the movement of people, goods, and services across regions. However, it also significantly contributes to global energy consumption and greenhouse gas (GHG) emissions, posing serious environmental and public health challenges. In the European Union, road transport accounted for 72% of transportation-related GHG emissions in 2017, with passenger cars alone contributing 44% [1]. Similarly, in Beijing, passenger cars emitted 11.3 million tons of carbon dioxide (CO2) in 2012, constituting 75.5% of the city’s total CO2 emissions [2]. These figures underscore the urgent need for accurate predictive modeling of fuel consumption and vehicle emissions to inform policy and mitigate environmental impacts.
Vehicle emissions are influenced by a variety of factors, including vehicle type, engine characteristics, operational parameters (e.g., speed, acceleration, and driving patterns), and environmental conditions. Previous studies have demonstrated that vehicle speed, particularly instantaneous speed, plays a pivotal role in emissions modeling, as it captures dynamic changes in driving behavior [3]. Fuel type and properties, such as grade and chemical composition, also significantly affect emissions. While advancements in engine technology have led to more efficient engines, including diesel engines with reduced well-to-wheels emissions [4], the transportation sector remains a major contributor to poor air quality in urban areas.
Traditionally, emissions and fuel consumption have been modeled using empirical, statistical, and analytical methods. However, these approaches often face limitations in handling complex, multidimensional datasets. Machine learning (ML) techniques have emerged as a powerful alternative, offering the ability to model nonlinear relationships and analyze large datasets with high accuracy. ML approaches, such as artificial neural networks (ANN), support vector machines (SVM), and ensemble methods, have been successfully applied to emissions modeling. Recent studies have demonstrated their capability to predict various emissions, including CO2 and NOx, with strong correlations between predicted and measured values [5,6].
Despite these advancements, several challenges remain. First, while most studies focus on individual engine types (either gasoline or diesel), the potential of mixed datasets to improve model generalizability is underexplored. Second, the reliance on vehicle speed as a primary input for predictive modeling requires extensive data collection infrastructure, which can be logistically complex and costly. GPS speed data offer a promising alternative, as it is widely accessible and requires minimal hardware modifications. However, its reliability and accuracy in predictive modeling, particularly for mixed gasoline and diesel datasets, have not been fully evaluated.
This research aims to address these gaps by developing machine learning models that predict fuel consumption and emissions using vehicle speed and GPS speed data. Ensemble bagged and decision tree algorithms are employed to evaluate model performance, leveraging data collected from gasoline and diesel engine vehicles under real-world driving conditions. By comparing mixed and individual datasets, this study investigates the impact of dataset composition on predictive accuracy. The results provide actionable insights into the feasibility of integrating GPS-based models into intelligent transportation systems (ITS) for real-time monitoring and management of environmental impacts.
The findings of this research contribute to the ongoing effort to develop sustainable transportation solutions by enhancing the accuracy and applicability of emissions and fuel consumption models. Furthermore, the study underscores the importance of leveraging emerging technologies, such as machine learning and GPS data, to address critical challenges in urban transportation and environmental sustainability.

2. Literature Review

The accurate modeling of vehicle fuel consumption and emissions has been a major focus of research due to the transportation sector’s substantial contribution to greenhouse gas emissions and air pollution. Numerous studies have explored various modeling techniques and influencing factors to enhance prediction accuracy and support policymaking.
Previous research addresses the foundational contributions that have shaped current research in fuel consumption and emissions prediction. Vehicle emissions are influenced by several factors, including vehicle physical properties, operational characteristics, environmental conditions, and fuel properties. Sullivan et al. (2004) demonstrated that diesel engines generally produce lower emissions than gasoline engines on a well-to-wheels basis [4]. Meanwhile, studies, such as those by Clark et al. (2011) and Zhang et al. (2013), have categorized these influencing factors, highlighting vehicle speed and acceleration as critical parameters for emissions modeling [7,8]. Specifically, speed has been extensively researched, with both instantaneous and averaged measurements proving vital for accurate predictions [3]. Early studies were crucial in establishing the basic principles, methodologies, and models used in emission estimation, particularly in relation to vehicle speed and driving behavior. These studies focused on understanding the relationship between vehicle operating conditions (like speed) and emissions, providing foundational insights for later research.
The evolution of emission modeling methodologies has seen the development of various approaches. Faris et al. (2011) and Wang and McGlinchy (2009) grouped emissions and fuel consumption models into five categories: scale input models (microscopic, macroscopic, mesoscopic), formulation models (analytical, empirical, statistical, graphical), dimension models, explanatory variable models, and state variable value models [9,10]. Among these, microscopic models have been favored for their ability to capture real-time variations in vehicle behavior, but they often require substantial computational resources. These studies established key models and frameworks that continue to influence current methodologies.
Recent machine learning (ML) implementations, together with the use of GPS data, have improved the predictive accuracy of models for vehicle emissions (especially NOx and CO2) and fuel usage. These advances build on earlier research findings, which remain relevant to current emissions modeling problems. Machine learning approaches have gained prominence in recent years for their ability to process large and complex datasets, offering significant advantages over traditional methods [11,12,13]. Techniques such as artificial neural networks (ANN), support vector machines (SVM), and ensemble methods have shown high predictive accuracy for vehicle emissions and fuel consumption [14]. Deep learning techniques have also been introduced to the field, though their application remains limited. Works by Altug and Kucuk (2019) and Shin et al. (2021) have explored deep learning for predicting tailpipe NOx emissions and transient engine emissions, respectively [15,16]. These approaches demonstrate potential for capturing complex relationships in emissions data but require extensive computational resources and large datasets.
Researchers in Ecuador developed a new approach to evaluating the pollutant emissions generated by light vehicles [17]. They built an artificial neural network that used GPS speed and slope data together with vehicle-specific mass and engine capacity data to forecast CO2, CO, HC, and NOx emissions [17]. The models achieved R2 values of 0.735 for CO2 and 0.798 for NOx, which supports combining GPS data with ML techniques for emissions estimation [17]. Liu et al. (2023) integrated the Motor Vehicle Emission Simulator (MOVES) model with ML methods to predict CO2 and NO emissions, achieving strong correlations between predicted and measured values [6]. Ahmed et al. (2021) applied ANN to model the emissions of spark ignition engines using methanol-gasoline blends, reporting an accuracy of 99% [5]. These studies underscore the adaptability and reliability of ML approaches across various contexts.
More recent advancements have focused on real-world driving data and the integration of portable emission measurement systems (PEMS) for accurate data collection [18,19]. One study developed high-accuracy models of light-duty vehicle emissions and fuel consumption using deep neural networks (DNNs) [20]. It used PEMS for real-time data collection, recording speed parameters, engine control data, and GPS data [20]. The DNNs captured complex nonlinear relationships between driving patterns and emission outputs, which improved accuracy [20]. Lee et al. (2021) used ANN to predict diesel vehicle emissions under real-world conditions, incorporating diverse input parameters such as vehicle-specific power, ambient temperature, and exhaust gas recirculation rates [21]. Cha et al. (2021) utilized regression models to predict CO2 emissions from diesel vehicles and found that instantaneous speed and acceleration were among the most influential predictors [22]. Another study estimated real-driving CO2 and NOx emissions and fuel consumption using machine learning methods [23]. The predictive models, developed from measured real-world driving data, gave reliable results for emissions and fuel usage [23]. This work demonstrates that ML methods can handle complicated data and improve the accuracy of emission estimates under actual driving conditions [23].
Using gradient boosting regression (GBR) models, Wen et al. (2021) demonstrated the capability of ML to predict NOx and CO2 emissions and to model fuel consumption for diesel vehicles operating under different conditions [24]. The models were trained on PEMS data that included mass air flow rate and exhaust flow rate features [24]. The GBR models reached R2 values of up to 0.99 in various driving situations [24]. Alfaseeh et al. (2020) investigated how deep sequence learning models, specifically long short-term memory (LSTM) networks, can forecast greenhouse gas emissions on road networks [25]. With time-series inputs that include speed and traffic density, these models forecast CO2-equivalent emissions at high temporal resolution [25]. This method can support eco-routing strategies and real-time emissions monitoring systems. Although ANN and SVM are commonly used, ensemble methods such as bagged and boosted trees have emerged as powerful tools for modeling emissions. Kocev et al. (2007) demonstrated the effectiveness of ensemble learning techniques in handling complex, high-dimensional datasets [26].
ML algorithms enable rapid vehicle-specific emissions evaluation when used with on-road remote sensing data. Xia et al. (2021) used an ensemble model that combines neural networks, extreme gradient boosting, and random forests to assess CO, HC, and NOx emissions from light-duty gasoline vehicles [27]. This methodology makes on-road supervision of vehicle emissions possible and offers a practical tool for policymakers and environmental agencies [27].
The use of GPS data for real-time speed tracking and the integration of vehicle-specific data have added new dimensions to prediction models. Furthermore, the increase in computing power and the development of machine learning techniques have enabled researchers to handle large datasets more effectively and generate more accurate, real-time emission predictions. More recent studies have expanded on this by incorporating real-time GPS data and advanced computational techniques, such as machine learning, to enhance the accuracy and applicability of emissions prediction. GPS speed data have recently gained attention as an alternative to vehicle speed data in emissions modeling. Akkamis et al. (2021) highlighted the accuracy of GPS speed sensors, particularly in high-acceleration scenarios [28]. However, the reliability of GPS data for predictive modeling in mixed vehicle datasets is not well established, creating an opportunity for further exploration.
Recent investigations show that the foundational research of the last twenty years remains essential. Early studies established speed and acceleration as significant variables in emissions models, and these variables are now combined with advanced machine learning methods and GPS systems. Sophisticated ML techniques that evolved from traditional empirical models therefore remain relevant to the present effort to predict emissions accurately. While significant progress has been made, gaps remain in the application of ML techniques to mixed gasoline and diesel datasets and the use of alternative input parameters such as GPS speed data. This study aims to address these gaps by leveraging ensemble bagged models and decision tree algorithms to evaluate the predictive accuracy of GPS-based models for fuel consumption and emissions. By combining datasets from gasoline and diesel engines, this research seeks to develop generalized models that enhance real-world applicability and provide actionable insights for sustainable transportation management.

3. Materials and Methods

3.1. Methodology

In this study, machine learning techniques, specifically the decision tree and Bagged Trees algorithms, were employed to develop predictive models for analyzing fuel consumption and exhaust emissions using real-world driving data from gasoline and diesel vehicles. The models were trained on driving datasets to identify key factors influencing fuel efficiency and pollutant emissions, including carbon dioxide (CO2) and nitrogen oxides (NOx). To ensure model accuracy and reliability, various evaluation metrics, including mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R2), were utilized. Additionally, 5-fold cross-validation was applied to prevent overfitting and improve the model’s generalization capability. The following sections detail the methodology, including data preprocessing, model training, validation strategies, and evaluation criteria.

3.1.1. Decision Tree

Decision trees (DT) are a widely used machine learning technique for classification and regression tasks due to their interpretability and effectiveness in handling both numerical and categorical data. A decision tree consists of a hierarchical structure composed of nodes, branches, and leaves, where each internal node represents a decision rule based on a feature, branches indicate possible outcomes, and leaf nodes correspond to the final prediction or class label [29]. The construction of a decision tree follows a recursive partitioning approach, where the dataset is iteratively split based on a selected attribute that optimally separates the data. The splitting criterion varies depending on the type of problem; for classification tasks, measures such as Gini impurity or information gain (based on entropy) are commonly used, while regression trees typically rely on mean squared error (MSE) minimization.
Mathematically, the impurity reduction at each split can be expressed as follows. For classification trees using entropy, the impurity of a dataset is
$$I(S) = -\sum_{i=1}^{k} p_i \log_2 p_i,$$
where $I(S)$ represents the entropy of a dataset $S$, $k$ is the number of classes, and $p_i$ denotes the proportion of samples belonging to class $i$. The optimal split is determined by maximizing the information gain, calculated as the difference in entropy before and after the split.
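The entropy and information gain described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation; the function names `entropy` and `information_gain` and the toy labels are assumptions introduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels: two classes, evenly mixed before the split.
labels = ["low", "low", "high", "high"]
gain = information_gain(labels, [["low", "low"], ["high", "high"]])
print(entropy(labels), gain)
```

A split that separates the two classes completely removes all of the parent's entropy, so the gain equals the parent entropy in that case.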
The construction of a decision tree involves three key steps. First, the algorithm selects the most informative feature to split the dataset based on a chosen criterion. Second, the dataset is recursively partitioned into subsets until a stopping condition is met, such as a maximum tree depth or a minimum number of samples per leaf node. Finally, the resulting tree structure is pruned, if necessary, to prevent overfitting by removing branches that contribute little to predictive performance.
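The first two construction steps can be illustrated with a small regression tree on a single input feature. This is a sketch under stated assumptions: the split criterion is the sum of squared errors (equivalent to minimizing MSE), the stopping conditions are a maximum depth and a minimum leaf size, pruning is omitted, and all names and the toy speed/fuel values are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (n times the MSE)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys, min_leaf):
    """Return (threshold, total_sse) of the best split, or None if no split is allowed."""
    best = None
    # Every distinct value except the largest is a candidate threshold (x <= t goes left).
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best

def build_tree(xs, ys, depth=0, max_depth=3, min_leaf=1):
    """Recursively partition the data until a stopping condition is met."""
    split = best_split(xs, ys, min_leaf) if depth < max_depth else None
    if split is None or split[1] >= sse(ys):
        return {"leaf": sum(ys) / len(ys)}  # leaf predicts the mean target
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_leaf),
    }

def predict(tree, x):
    """Follow the decision rules from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical data: speed (mph) against a normalized fuel rate.
speeds = [10, 20, 30, 40, 50, 60]
fuel = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
tree = build_tree(speeds, fuel)
print(tree["threshold"], predict(tree, 25), predict(tree, 45))
```

A production tree would search over all input features at each node and apply a pruning step afterwards; both are left out here to keep the recursion visible.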

3.1.2. Bagged Trees

The Bagging (bootstrap aggregating) Trees algorithm, introduced by Breiman (1996), is a widely used ensemble learning technique designed to enhance predictive accuracy by reducing model variance [30]. This method constructs multiple decision tree (DT) models using bootstrap sampling, where training datasets are created by randomly selecting data points with replacement. Each sampled dataset is used to train an independent DT, ensuring diversity among the models [31]. During training, the overall number of training samples remains the same across iterations, but their distribution varies due to the resampling process, leading to a more generalized model [32]. The final prediction is derived by aggregating the outputs of all decision trees—either through majority voting in classification tasks or by averaging in regression tasks. This process can be mathematically expressed as
$$\hat{z} = \frac{1}{M} \sum_{i=1}^{M} \varphi_i(x),$$
where $\hat{z}$ represents the final model output, $M$ is the total number of DT models, and $\varphi_i(x)$ corresponds to the prediction made by the $i$th decision tree.
The development of Bagged Trees involves three primary steps. First, multiple DT models are generated as base learners. Second, bootstrap sampling is applied to create diverse training sets, where approximately two-thirds of the data are used for training, while the remaining one-third is reserved as out-of-bag (OOB) data to assess model performance. Finally, the individual tree predictions are aggregated to produce the final output, improving stability and mitigating overfitting [33,34]. Bagged Trees have been widely used in transportation studies, particularly in accident severity prediction, traffic flow analysis, and driver behavior modeling, due to their ability to capture complex relationships within transportation data while maintaining high predictive reliability.
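The three steps above can be sketched as follows. This is an illustrative example only: the base learner is a depth-1 regression tree (a stump) on one feature to keep the code short, and the names `fit_stump`, `fit_bagged`, and `predict_bagged`, the tree count, and the toy data are assumptions rather than the configuration used in the study.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature: (threshold, left_mean, right_mean)."""
    mean = sum(ys) / len(ys)
    best = (None, mean, mean, float("inf"))
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[3]:
            best = (t, lm, rm, err)
    return best[:3]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if t is None or x <= t else rm

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap sample and record each sample's out-of-bag set."""
    rng = random.Random(seed)
    n = len(xs)
    stumps, oob_sets = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))   # observations this tree never saw
    return stumps, oob_sets

def predict_bagged(stumps, x):
    """Average the individual tree predictions (the regression form of aggregation)."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical data with a step at x = 50.
xs = list(range(100))
ys = [1.0 if x < 50 else 3.0 for x in xs]
stumps, oob_sets = fit_bagged(xs, ys)
oob_fraction = sum(len(s) for s in oob_sets) / (len(oob_sets) * len(xs))
print(round(oob_fraction, 3), predict_bagged(stumps, 10), predict_bagged(stumps, 90))
```

For large samples the expected out-of-bag share is about 1/e, or roughly 37%, which is the "approximately one-third" mentioned in the text.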

3.1.3. Evaluation Metrics

Evaluating the performance of a predictive model is essential to ensure its accuracy, reliability, and generalization ability. In regression analysis, various statistical metrics are used to assess how well the model’s predictions align with actual observed values. The most commonly used evaluation metrics include mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R2).
Model validation was conducted by dividing the dataset into three subsets. The first and second subsets were used for model training, while the developed model was exported and applied to the third subset to simulate its performance. To ensure robustness and minimize the risk of overestimation, we employed 5-fold cross-validation, a widely used resampling technique that partitions the dataset into five subsets, iteratively training the model on four folds and validating it on the remaining one. This process was repeated five times, and the average of the evaluation metrics was calculated to assess model performance. The selected evaluation metrics provide insights into model accuracy and error distribution.
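The fold partitioning behind 5-fold cross-validation can be sketched as below. This is a minimal illustration assuming a random shuffle before splitting; the function name `k_fold_indices` and the sample count are hypothetical, and the article does not state how its folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# Each observation is used for validation exactly once across the five rounds.
splits = list(k_fold_indices(23, k=5))
for train, val in splits:
    print(len(train), len(val))
```

In each round a model would be trained on `train`, scored on `val`, and the five scores averaged to give the reported metric.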

Mean Absolute Error (MAE)

Mean absolute error (MAE) measures the average magnitude of the errors between predicted and actual values without considering their direction. It is calculated as
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$
where $n$ is the total number of observations, $y_i$ represents the actual value, and $\hat{y}_i$ is the predicted value. MAE provides an intuitive measure of error in the same unit as the target variable, making it easy to interpret. However, it does not penalize large errors as severely as squared error metrics.

Mean Squared Error (MSE)

Mean squared error (MSE) calculates the average of the squared differences between actual and predicted values. It is expressed as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2.$$
By squaring the errors, MSE places a higher penalty on larger deviations, making it more sensitive to outliers. This property helps in assessing models that need to minimize significant prediction errors.

Root Mean Squared Error (RMSE)

Root mean squared error (RMSE) is derived from MSE by taking the square root, which ensures that the metric remains in the same unit as the original data. It is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}.$$
RMSE is widely used because it penalizes larger errors more heavily while maintaining interpretability. A lower RMSE indicates a model with better predictive accuracy.

Coefficient of Determination (R2)

The coefficient of determination (R2) measures the proportion of variance in the dependent variable that is explained by the independent variables in the model. It is defined as
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},$$
where $\bar{y}$ is the mean of the observed values. The $R^2$ value typically ranges from 0 to 1, with values closer to 1 indicating a better fit; under this definition it becomes negative when the model predicts worse than the mean of the observed values. A high $R^2$ suggests that the model captures a significant portion of the variability in the target variable, whereas a low $R^2$ indicates a poor fit.
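The four metrics defined in this section can be computed together, as in the following sketch. The function name `regression_metrics` and the sample values are assumptions made for illustration; the second call shows a case where the predictions are worse than simply predicting the mean, which gives a negative R2 under the definition above.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical values: one prediction is off by one unit.
good = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# Predictions in reverse order fit worse than the mean of the observations.
bad = regression_metrics([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
print(good, bad["R2"])
```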

3.2. Data Acquisition and Preprocessing

The data used in this research were acquired with a portable emission measurement system (PEMS) using SEMTECH-LDV sensors. The PEMS setup is shown in Figure 1. The real-world test was conducted by the Emissions Analytics Research Center. The test involved four vehicles: two with gasoline engines and two with diesel engines. All test vehicles were light-duty vehicles with automatic transmissions, certified to the EPA Tier 3 standard. The gasoline engine data were collected from two 2020 model sedans: a Nissan Altima (2-FWD, Tier 3 Bin 30) and a Volvo S60 (4-AWD, Tier 3 Bin 30). The diesel engine data were from a 2020 Jeep Wrangler (4-AWD, Tier 3 Bin 160) sport utility vehicle and a 2021 Jeep Gladiator (4-AWD, Tier 3 Bin 160) pickup. The PEMS is a compact instrument that includes a GPS, weather station, and portable analyzer. These components make it possible for the PEMS to measure exhaust mass flow, fuel rate, pollutants, vehicle properties, GPS data, road properties, and weather probe data in real time. Data from the PEMS were exported to a CSV file, and the data points corresponding to device standby mode were removed.
The selected input parameters were environmental factors, exhaust parameters, engine parameters, and vehicle speed (Table 1). The gasoline engine dataset consists of 21,098 samples, and the diesel engine dataset consists of 20,500 samples.
Two case studies were conducted. In the first, the diesel and gasoline databases were combined to train the models. The combined diesel/gasoline data were further divided into a vehicle speed dataset and a GPS speed dataset, and models were trained on each with the ensemble bagged and decision tree algorithms. In the second, only the ensemble bagged algorithm was used, and models were trained on the separate diesel and gasoline databases for the vehicle speed and GPS speed input categories.
The dataset was normalized to the range 0 to 1 to avoid early saturation of network weights and thereby speed up convergence. The models were trained with the ensemble bagged and decision tree algorithms using the MATLAB 9.10 regression toolbox.
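Scaling to the range 0 to 1 is commonly done with min-max normalization, sketched below. The article does not specify its exact procedure, so the two-step form used here (take the minimum and maximum from the training values, then apply them to any data) and the function names are assumptions.

```python
def fit_min_max(train_values):
    """Return the (minimum, maximum) pair learned from the training values."""
    return min(train_values), max(train_values)

def apply_min_max(values, lo, hi):
    """Scale values with a previously fitted minimum and maximum."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / span for v in values]

# Hypothetical speeds in mph.
lo, hi = fit_min_max([10.0, 20.0, 30.0])
print(apply_min_max([10.0, 20.0, 30.0], lo, hi))
print(apply_min_max([40.0], lo, hi))  # unseen values can fall outside [0, 1]
```

Reusing the training minimum and maximum on held-out data keeps information from the validation subset out of the preprocessing step.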

3.3. Fuel Consumption Calculations

Vehicle emissions are closely related to fuel consumption [35,36,37]. Reference [38] outlines methods for calculating the fuel consumption of light-duty and heavy-duty vehicles from PEMS test results using the carbon balance method. The fuel consumption, in grams, can be estimated from the measured CO2, CO, and total hydrocarbon (THC) emissions [38].
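The idea behind a carbon balance can be sketched as a mass balance on carbon: the carbon leaving the tailpipe as CO2, CO, and THC must have entered as fuel. The example below is a generic illustration and not the procedure of reference [38], whose exact coefficients may differ. The carbon mass fractions of the fuel and of THC (0.866 each, roughly a gasoline-like hydrocarbon) are assumptions, as are the function name and the input values.

```python
M_C, M_O = 12.011, 15.999            # molar masses of carbon and oxygen, g/mol

W_C_CO2 = M_C / (M_C + 2 * M_O)      # carbon mass fraction of CO2, about 0.273
W_C_CO = M_C / (M_C + M_O)           # carbon mass fraction of CO, about 0.429

def fuel_mass_g(co2_g, co_g, thc_g, w_c_fuel=0.866, w_c_thc=0.866):
    """Estimate fuel mass (g) from exhaust masses (g) via a carbon mass balance.

    w_c_fuel and w_c_thc are ASSUMED carbon mass fractions for a gasoline-like
    fuel; they are not taken from the article and should be replaced with
    values appropriate to the fuel being tested.
    """
    carbon_g = W_C_CO2 * co2_g + W_C_CO * co_g + w_c_thc * thc_g
    return carbon_g / w_c_fuel

# Hypothetical exhaust masses over some interval, in grams.
print(fuel_mass_g(co2_g=100.0, co_g=0.5, thc_g=0.05))
```

Because CO2 dominates the exhaust carbon, the estimate is driven mainly by the CO2 term, which is consistent with the close link between CO2 emissions and fuel consumption noted above.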

4. Results

GPS speeds are commonly used in transportation research. GPS speed sensors are typically used to replace radar speed sensors since they deliver speed data in pulse signals [28]. According to Akkamis et al. (2021), GPS speed sensors with a higher update frequency give better accuracy, especially at higher accelerations [28]. One of the main contributions of this research is to investigate the efficiency of using GPS speed data in an ML model to predict fuel consumption and vehicle emissions. The percentage error of the GPS speed measurement was 1.4% when compared to the vehicle speed data. We investigated how accurately ML can use these low-error GPS speed data to predict fuel consumption and emission factors. The combined engine type dataset was used, and actual vehicle speed was compared with GPS speed. Moreover, this section discusses the difference between gasoline and diesel engines to explain the performance of the ML model when applied to these two types of engines. Finally, the combined diesel and gasoline dataset was compared with the separate datasets to quantify the difference between real-world conditions (a mix of vehicle engine types) and ideal conditions (one engine type).
In the research database (diesel and gasoline engines), the mean vehicle speed was 46.5 mph and the mean GPS speed was 47.2 mph. Across all monitored runs, the mean fuel rate was 0.0003 gal/s, and the mean emission rates were 2.93 g/s for CO2 and 0.00029 g/s for NOx.
Ensemble and decision tree models were used to predict the fuel rate, which measures instantaneous fuel consumption, and two exhaust pollutants: CO2 and NOx. The model results are presented in Table 2, and the regression scatter plot of predicted against measured values is shown in Figure 2. The results are presented in two categories. The first is an evaluation of modeling fuel consumption using the mixed diesel and gasoline dataset for the vehicle and GPS speed input classes. The second category evaluates the performance of the ensemble bagged algorithm on the single diesel and gasoline datasets for the vehicle and GPS speed input classes.

4.1. Modeling of Fuel Consumption

The modeling of fuel consumption has become an important consideration, not just for implementing a sustainable intelligent transport system (ITS), but also for the economic justification of the energy consumed. The results of the model performance are presented in Table 2.
Based on the value of the coefficient of determination (R2), the ensemble bagged model performance exceeded that of the decision tree model in both vehicle and GPS speed input categories.
The scatter plot of the predicted and measured fuel consumption presented in Figure 2 depicts the prediction accuracy of the models. The results show that the ensemble bagged model is more accurate than the tree algorithm: its data points lie closer to the line of equality than those of the tree model.
Furthermore, the RMSE, MAE, and MSE values presented in Table 2 also indicate that the ensemble bagged model is more accurate. Although the normalized MSE of both models was 0.001, the lower RMSE and MAE values of 0.025 and 0.014 for the bagged model are consistent with its higher R2 values of 0.967 and 0.968 for fuel consumption prediction using vehicle speed and GPS parameters, respectively.
A superimposed scatter plot of the vehicle and GPS speed models presented in Figures 3 and 4 shows that the GPS input-based models are much closer to the line of equality. Also, the similar positions of the outliers in the two sets of plots strongly suggest the potential of using GPS data in the prediction of fuel consumption. The results further show that modeling of fuel consumption with the gasoline engine dataset has high performance accuracy for both vehicle speed and GPS parameter-based models. The determination coefficient (R2) obtained is about 99% for the gasoline dataset. Furthermore, the poor performance of the diesel engine dataset models could be attributed to either low data quality arising from machine error or the insufficiency of the selected inputs for predicting consumption in diesel engines.
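As a quick consistency check on these figures, and assuming the tabulated values are rounded to three decimal places, the reported RMSE of the bagged model squares to an MSE that rounds to the reported value:

$$\mathrm{MSE} = \mathrm{RMSE}^2 = 0.025^2 = 0.000625 \approx 0.001.$$

This also shows why an MSE rounded to 0.001 cannot separate the two models, while RMSE and MAE at the same precision still can.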

4.2. Modeling of CO2 Exhaust

The transportation industry accounts for large CO2 emissions, especially in urban areas, which not only add to global warming but also contribute to poor air quality. This research evaluated the performance of ensemble bagged and decision tree models using vehicle and GPS parameters. Based on the data from Figure 3 and Table 3, GPS parameter-based models have higher performance accuracy than vehicle speed-based models in both the mixed diesel/gasoline database and the single-engine databases. The coefficients of determination obtained between the normalized predicted and measured values were 0.966 and 0.975 for the ensemble and tree models, respectively, in the GPS input-based model. The results further suggest that real-time monitoring of CO2 is feasible and can be a useful tool to help policymakers optimize the transportation system.

4.3. Modeling of NOx Exhaust

The performance of the NOx exhaust emission models shows a slightly different trend from that of the CO2 models. The NOx models perform less well with the GPS input parameters: the accuracy (R2) obtained with the ensemble bagged model was 95% and decreased to 76% with the tree model. However, the high accuracy of the ensemble model indicates that GPS parameters can still be used to predict NOx exhaust emissions.
Table 4 further indicates that a combination of evaluation metrics (RMSE, MAE, MSE and R2) is necessary for validating the performance of a model.
Although the accuracy of the single diesel and gasoline database models did not exceed 89% in the prediction of NOx, as observed in Figure 4, the ensemble bagged method applied to the mixed diesel/gasoline database reached a higher accuracy of up to 96%.
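The need to read the metrics together can be illustrated with a small example. In Tables 3 and 4, several NOx models report smaller RMSE values than the corresponding CO2 models alongside lower R2. One plausible reading, which the article does not state, is that R2 depends on the spread of the measured values as well as on the error. The sketch below uses made-up numbers, not data from the study, to show a lower RMSE coinciding with a lower R2.

```python
from math import sqrt


def rmse(measured, predicted):
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def mae(measured, predicted):
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Hypothetical target with a wide spread and moderate prediction errors.
wide_measured = [0.1, 0.3, 0.5, 0.7, 0.9]
wide_predicted = [0.13, 0.27, 0.53, 0.67, 0.93]

# Hypothetical target with a narrow spread and smaller prediction errors.
narrow_measured = [0.00, 0.01, 0.02, 0.03, 0.04]
narrow_predicted = [0.01, 0.00, 0.03, 0.02, 0.05]

for name, y, p in [("wide", wide_measured, wide_predicted),
                   ("narrow", narrow_measured, narrow_predicted)]:
    print(f"{name}: RMSE={rmse(y, p):.3f}  MAE={mae(y, p):.3f}  R2={r2(y, p):.3f}")
```

The narrow-spread case has the smaller absolute error but the lower R2, so neither metric alone ranks the two cases reliably.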

5. Discussion

Rising greenhouse gas emissions and their environmental impacts have become a global concern, necessitating innovative approaches to quantify and mitigate emissions. Real-time estimation of emissions and fuel consumption is critical for developing efficient Intelligent Transportation Systems (ITS) and implementing environmentally and economically sustainable transportation solutions. This study addresses these needs by evaluating machine learning models based on GPS speed and vehicle speed inputs.
Recent work has focused on integrating real-time vehicle behavior data, including speed variations and acceleration patterns, to develop dynamic, location-specific emission forecasts. Despite these advancements, there are still significant challenges that warrant further research. For instance, the variability in GPS signal accuracy, the influence of different vehicle types, and the adaptation of predictive models to new emission standards and real-world driving conditions remain areas for improvement. Additionally, integrating machine learning and artificial intelligence with real-time GPS data presents opportunities to refine emission predictions and address emerging concerns, such as electric vehicles and new fuel technologies.
The findings indicate that GPS speed, as an alternative to vehicle speed, is a reliable and efficient parameter for predicting fuel consumption and exhaust emissions. The low percentage error (1.4%) between GPS speed and vehicle speed measurements demonstrates the feasibility of using GPS data for real-time monitoring. The ensemble bagged model exhibited superior performance compared to the decision tree model across all datasets and parameters, reinforcing its applicability for complex, high-dimensional data.
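The article does not spell out how the 1.4% figure was computed. One common choice, assumed here purely for illustration, is the mean absolute percentage error of GPS speed relative to vehicle speed, skipping samples where the vehicle is stationary. The function name and the sample values below are hypothetical.

```python
def mean_abs_pct_error(reference, estimate):
    """Mean absolute percentage error of `estimate` relative to `reference`.

    Samples with a zero reference value (for example, a stationary vehicle)
    are skipped because the relative error is undefined there.
    """
    pairs = [(r, e) for r, e in zip(reference, estimate) if r != 0]
    if not pairs:
        raise ValueError("no non-zero reference samples")
    return 100 * sum(abs(e - r) / abs(r) for r, e in pairs) / len(pairs)


# Hypothetical speed samples in mph; these are not measurements from the study.
vehicle_speed = [0.0, 10.0, 20.0, 40.0, 50.0]
gps_speed = [0.1, 10.2, 19.8, 40.4, 49.5]

error_pct = mean_abs_pct_error(vehicle_speed, gps_speed)
print(f"Mean absolute percentage error: {error_pct:.2f}%")
```

Other definitions, such as the relative difference of trip-average speeds, would give different numbers for the same data, so the definition should be stated whenever such a figure is reported.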
The study also explored the combined use of gasoline and diesel engine datasets. Interestingly, while the gasoline-only dataset achieved the highest accuracy in fuel consumption predictions, the combined dataset enhanced the predictive accuracy for CO2 emissions. These findings suggest that combining datasets can capture broader variations and improve model robustness, particularly for emission predictions.
The comparison of models trained on individual gasoline and diesel datasets versus the combined dataset underscores the importance of data diversity in machine learning applications. The high accuracy achieved with gasoline datasets could be attributed to the consistency of input parameters, whereas the slightly lower performance for diesel engines points to potential variability in data quality or inherent differences in engine behavior. Figure 5, Figure 6 and Figure 7 present a detailed summary of the models’ comparison using a radar chart. In these Figures, the performance metrics were normalized to between 0 and 0.5 to scale their visual representation while ensuring that R2 remains distinguishable.
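The exact normalization used for the radar charts is not given in the text. A minimal sketch of one way to map a set of metric values into the interval from 0 to 0.5 is min-max scaling, shown below. The function name is illustrative, and the input values are the ensemble bagged RMSE values for the vehicle speed input from Table 2.

```python
def scale_to_range(values, low=0.0, high=0.5):
    """Min-max scale `values` linearly into the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values are identical, so place them at the midpoint.
        return [(low + high) / 2 for _ in values]
    return [low + (v - v_min) / (v_max - v_min) * (high - low) for v in values]


# RMSE for the mixed, diesel, and gasoline datasets (Table 2, vehicle speed input).
rmse_values = [0.025, 0.035, 0.011]
scaled = scale_to_range(rmse_values)
print(scaled)
```

Capping the error metrics at 0.5 keeps them visually separate from R2 values close to 1 on the same radar axes, which matches the stated aim of keeping R2 distinguishable.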
A significant contribution of this study is the demonstration of GPS data as a viable alternative to vehicle speed data, particularly in scenarios where installing speed sensors may be impractical. The results pave the way for ITS solutions that can quantify real-time fuel consumption and emissions without the logistical and legal complexities associated with vehicle-integrated systems.
However, the scope of this study is limited to light-duty vehicles and specific exhaust pollutants (CO2 and NOx). Further research is required to expand these findings to heavy-duty vehicles, additional pollutants, and alternative machine learning techniques such as deep learning. Future investigations should also examine the impact of varying road conditions, vehicle types, and environmental factors to develop more generalized and scalable models.
This research represents a step forward in leveraging machine learning for sustainable transportation, offering practical insights for policymakers and transportation planners aiming to optimize energy use and minimize environmental impacts.

6. Conclusions

This study investigated the application of ensemble bagged and decision tree algorithms for predicting fuel consumption and exhaust emissions (CO2 and NOx) in gasoline and diesel engine vehicles using vehicle speed and GPS-based speed input parameters. The research also explored the impact of combining gasoline and diesel engine datasets to enhance model accuracy.
The results demonstrated that both ensemble bagged and decision tree models achieved high predictive accuracy, with performance of 95% or higher in fuel consumption predictions for both vehicle speed and GPS speed inputs. The ensemble bagged model consistently outperformed the decision tree model, showcasing its superior capability in handling complex datasets. Notably, GPS-based input models performed comparably to vehicle speed-based models, emphasizing the feasibility of GPS data for real-time modeling applications in intelligent transportation systems (ITS).
Moreover, the analysis revealed that the combination of gasoline and diesel datasets improved the predictive accuracy for CO2 emissions, with the mixed dataset performing better than individual engine datasets. However, the gasoline engine dataset exhibited the highest predictive accuracy, achieving 99% in both input variable categories.
This research highlights the potential of leveraging GPS speed data in machine learning models for real-time monitoring of fuel consumption and exhaust emissions. Such advancements could significantly aid policymakers in optimizing transportation systems and mitigating environmental impacts. Future work should focus on expanding the scope of this research to include additional exhaust pollutants, heavy-duty vehicles, and the integration of deep learning techniques to further enhance predictive capabilities and perform comparative analysis with other machine learning methods.

Author Contributions

Study conception and design: F.A.; data collection: F.A., A.A., M.A. and N.M.; analysis and interpretation of results: F.A., A.A. and M.A.; draft manuscript preparation: F.A., A.A. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank Emissions Analytics for providing the data used in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Greenhouse Gas Emissions from Transport in Europe. Available online: https://www.eea.europa.eu/data-and-maps/indicators/transport-emissions-of-greenhouse-gases/transport-emissions-of-greenhouse-gases-12 (accessed on 14 January 2025).
  2. Wang, Z.; Chen, F.; Fujiyama, T. Carbon Emission from Urban Passenger Transportation in Beijing. Transp. Res. Part D Transp. Environ. 2015, 41, 217–227. [Google Scholar] [CrossRef]
  3. Int Panis, L.; Broekx, S.; Liu, R. Modelling Instantaneous Traffic Emission and the Influence of Traffic Speed Limits. Sci. Total Environ. 2006, 371, 270–285. [Google Scholar] [CrossRef]
  4. Sullivan, J.L.; Baker, R.E.; Boyer, B.A.; Hammerle, R.H.; Kenney, T.E.; Muniz, L.; Wallington, T.J. CO2 Emission Benefit of Diesel (versus Gasoline) Powered Vehicles. Environ. Sci. Technol. 2004, 38, 3217–3223. [Google Scholar] [CrossRef]
  5. Ahmed, E.; Usman, M.; Anwar, S.; Ahmad, H.M.; Nasir, M.W.; Malik, M.A.I. Application of ANN to Predict Performance and Emissions of SI Engine Using Gasoline-Methanol Blends. Sci. Prog. 2021, 104, 00368504211002345. [Google Scholar] [CrossRef]
  6. Liu, R.; He, H.; Zhang, Z.; Wu, C.; Yang, J.; Zhu, X.; Peng, Z. Integrated MOVES Model and Machine Learning Method for Prediction of CO2 and NO from Light-Duty Gasoline Vehicle. J. Clean. Prod. 2023, 422, 138612. [Google Scholar] [CrossRef]
  7. Clark, N.N.; Kern, J.M.; Atkinson, C.M.; Nine, R.D. Factors Affecting Heavy-Duty Diesel Vehicle Emissions. J. Air Waste Manag. Assoc. 2002, 52, 84–94. [Google Scholar] [CrossRef]
  8. Zhang, K.; Yao, L.; Li, G. Factors Affecting Vehicular Emissions and Emission Models. In ICTE 2013: Safety, Speediness, Intelligence, Low-Carbon, Innovation; American Society of Civil Engineers: Reston, VA, USA, 2013; pp. 2848–2853. [Google Scholar] [CrossRef]
  9. Faris, W.F.; Rakha, H.A.; Kafafy, R.I.; Idres, M.; Elmoselhy, S. Vehicle Fuel Consumption and Emission Modelling: An in-Depth Literature Review. Int. J. Veh. Syst. Model. Test. 2011, 6, 318. [Google Scholar] [CrossRef]
  10. Review of Vehicle Emission Modelling and the Issues for New Zealand. Available online: https://www.researchgate.net/publication/289082445_Review_of_vehicle_emission_modelling_and_the_issues_for_New_Zealand (accessed on 14 January 2025).
  11. Alrumaidhi, M.; Farag, M.M.G.; Rakha, H.A. Comparative Analysis of Parametric and Non-Parametric Data-Driven Models to Predict Road Crash Severity among Elderly Drivers Using Synthetic Resampling Techniques. Sustainability 2023, 15, 9878. [Google Scholar] [CrossRef]
  12. Alrumaidhi, M.; Rakha, H.A. An Econometric Analysis to Explore the Temporal Variability of the Factors Affecting Crash Severity Due to COVID-19. Sustainability 2024, 16, 1233. [Google Scholar] [CrossRef]
  13. Alrumaidhi, M.; Rakha, H.A. Factors Affecting Crash Severity among Elderly Drivers: A Multilevel Ordinal Logistic Regression Approach. Sustainability 2022, 14, 11543. [Google Scholar] [CrossRef]
  14. Alazmi, A.; Rakha, H. Assessing and Validating the Ability of Machine Learning to Handle Unrefined Particle Air Pollution Mobile Monitoring Data Randomly, Spatially, and Spatiotemporally. Int. J. Environ. Res. Public Health 2022, 19, 10098. [Google Scholar] [CrossRef] [PubMed]
  15. Altuğ, K.B.; Küçük, S.E. Predicting Tailpipe NOx Emission Using Supervised Learning Algorithms. In Proceedings of the 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 11–13 October 2019; pp. 1–5. [Google Scholar]
  16. Shin, S.; Lee, Y.; Park, J.; Kim, M.; Lee, S.; Min, K. Predicting Transient Diesel Engine NOx Emissions Using Time-Series Data Preprocessing with Deep-Learning Models. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 3170–3184. [Google Scholar] [CrossRef]
  17. Rivera-Campoverde, N.D.; Arenas-Ramírez, B.; Muñoz Sanz, J.L.; Jiménez, E. GPS Data and Machine Learning Tools, a Practical and Cost-Effective Combination for Estimating Light Vehicle Emissions. Sensors 2024, 24, 2304. [Google Scholar] [CrossRef]
  18. Mądziel, M. Liquified Petroleum Gas-Fuelled Vehicle CO2 Emission Modelling Based on Portable Emission Measurement System, On-Board Diagnostics Data, and Gradient-Boosting Machine Learning. Energies 2023, 16, 2754. Available online: https://www.mdpi.com/1996-1073/16/6/2754 (accessed on 14 January 2025). [CrossRef]
  19. Yao, Y.; Li, J.; He, C.; Chen, Y.; Yu, H.; Wang, J.; Yang, N.; Zhao, L. Research on Particle Emissions of Light-Duty Hybrid Electric Vehicles in Real Driving. Atmos. Pollut. Res. 2025, 16, 102332. [Google Scholar] [CrossRef]
  20. Motallebiaraghi, F.; Rabinowitz, A.; Jathar, S.; Fong, A.; Asher, Z.; Bradley, T.; Motallebiaraghi, F.; Rabinowitz, A.; Jathar, S.; Fong, A.; et al. High-Fidelity Modeling of Light-Duty Vehicle Emission and Fuel Economy Using Deep Neural Networks; SAE International: Warrendale, PA, USA, 2021. [Google Scholar]
  21. Lee, J.; Kwon, S.; Kim, H.; Keel, J.; Yoon, T.; Lee, J. Machine Learning Applied to the NOx Prediction of Diesel Vehicle under Real Driving Cycle. Appl. Sci. 2021, 11, 3758. [Google Scholar] [CrossRef]
  22. Cha, J.; Park, J.; Lee, H.; Chon, M.S. A Study of Prediction Based on Regression Analysis for Real-World CO2 Emissions with Light-Duty Diesel Vehicles. Int. J. Automot. Technol. 2021, 22, 569–577. [Google Scholar] [CrossRef]
  23. Shahariar, G.M.H.; Bodisco, T.A.; Surawski, N.; Komol, M.M.R.; Sajjad, M.; Chu-Van, T.; Ristovski, Z.; Brown, R.J. Real-Driving CO2, NOx and Fuel Consumption Estimation Using Machine Learning Approaches. Next Energy 2023, 1, 100060. [Google Scholar] [CrossRef]
  24. Wen, H.-T.; Lu, J.-H.; Jhang, D.-S. Features Importance Analysis of Diesel Vehicles’ NOx and CO2 Emission Predictions in Real Road Driving Based on Gradient Boosting Regression Model. Int. J. Environ. Res. Public Health 2021, 18, 13044. [Google Scholar] [CrossRef]
  25. Alfaseeh, L.; Tu, R.; Farooq, B.; Hatzopoulou, M. Greenhouse Gas Emission Prediction on Road Network Using Deep Sequence Learning. Transp. Res. Part D Transp. Environ. 2020, 88, 102593. [Google Scholar] [CrossRef]
  26. Kocev, D.; Vens, C.; Struyf, J.; Džeroski, S. Ensembles of Multi-Objective Decision Trees. In Machine Learning: ECML 2007; Kok, J.N., Koronacki, J., de Mantaras, R.L., Matwin, S., Mladenič, D., Skowron, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 624–631. [Google Scholar]
  27. Xia, Y.; Jiang, L.; Wang, L.; Chen, X.; Ye, J.; Hou, T.; Wang, L.; Zhang, Y.; Li, M.; Li, Z.; et al. Rapid Assessments of Light-Duty Gasoline Vehicle Emissions Using On-Road Remote Sensing and Machine Learning. Sci. Total. Environ. 2022, 815, 152771. [Google Scholar] [CrossRef] [PubMed]
  28. Akkamis, M.; Keskin, M.; Sekerli, Y.E. Comparative Appraisal of Three Low-Cost GPS Speed Sensors with Different Data Update Frequencies. AgriEngineering 2021, 3, 423–437. [Google Scholar] [CrossRef]
  29. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  30. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  31. Peng, L.; Zheng, S.; Zhong, Q.; Chai, X.; Lin, J. A Novel Bagged Tree Ensemble Regression Method with Multiple Correlation Coefficients to Predict the Train Body Vibrations Using Rail Inspection Data. Mech. Syst. Signal Process. 2023, 182, 109543. [Google Scholar] [CrossRef]
  32. Sun, J.; Lang, J.; Fujita, H.; Li, H. Imbalanced Enterprise Credit Evaluation with DTE-SBD: Decision Tree Ensemble Based on SMOTE and Bagging with Differentiated Sampling Rates. Inf. Sci. 2018, 425, 76–91. [Google Scholar] [CrossRef]
  33. Chen, K.; Peng, Y.; Lu, S.; Lin, B.; Li, X. Bagging Based Ensemble Learning Approaches for Modeling the Emission of PCDD/Fs from Municipal Solid Waste Incinerators. Chemosphere 2021, 274, 129802. [Google Scholar] [CrossRef]
  34. Elbeltagi, A.; Heddam, S.; Katipoğlu, O.M.; Alsumaiei, A.A.; Al-Mukhtar, M. Advanced Long-Term Actual Evapotranspiration Estimation in Humid Climates for 1958–2021 Based on Machine Learning Models Enhanced by the RReliefF Algorithm. J. Hydrol. Reg. Stud. 2024, 56, 102043. [Google Scholar] [CrossRef]
  35. Gis, W.; Bielaczyc, P.; Gis, W.; Bielaczyc, P. Emission of CO2 and Fuel Consumption for Automotive Vehicles; SAE International: Warrendale, PA, USA, 1999. [Google Scholar]
  36. Harrington, W. Fuel Economy and Motor Vehicle Emissions. J. Environ. Econ. Manag. 1997, 33, 240–252. [Google Scholar] [CrossRef]
  37. Pinto, G.; Oliver-Hoyo, M.T. Using the Relationship between Vehicle Fuel Consumption and CO2 Emissions To Illustrate Chemical Principles. J. Chem. Educ. 2008, 85, 218. [Google Scholar] [CrossRef]
  38. Pavlovic, J.; Fontaras, G.; Broekaert, S.; Ciuffo, B.; Ktistakis, M.A.; Grigoratos, T. How Accurately Can We Measure Vehicle Fuel Consumption in Real World Operation? Transp. Res. Part D Transp. Environ. 2021, 90, 102666. [Google Scholar] [CrossRef]
Figure 1. PEMS setup and research framework.
Figure 2. Scatter plot for predicted and measured fuel consumption models.
Figure 3. Scatter plot for predicted and measured CO2 models.
Figure 4. Scatter plot for predicted and measured NOx models.
Figure 5. Radar chart summarizing the performance of models for fuel consumption. Here, VS represents vehicle speed, EBM stands for ensemble bagged method, and TM denotes tree method.
Figure 6. Radar chart summarizing the performance of models for CO2. Here, VS represents vehicle speed, EBM stands for ensemble bagged method, and TM denotes tree method.
Figure 7. Radar chart summarizing the performance of models for NOx. Here, VS represents vehicle speed, EBM stands for ensemble bagged method, and TM denotes tree method.
Table 1. Details on input and output parameters.

| Vehicle speed input category | GPS speed input category | Unit |
|---|---|---|
| | GPS Altitude | m |
| Vehicle Speed | GPS Speed | mph |
| | GPS Ground Speed | km/h |
| Weather Probe Humidity | Weather Probe Humidity | %RH |
| Ambient Pressure | Ambient Pressure | mbar |
| Weather Probe Temperature | Weather Probe Temperature | °C |
| Exhaust Mass Flow Rate | Exhaust Mass Flow Rate | kg/h |
| Exhaust Temperature | Exhaust Temperature | °C |
| Manifold Pressure | Manifold Pressure | kPa |
| Engine RPM | Engine RPM | RPM |
| Abs Throttle Postn | Abs Throttle Postn | % |
| Rel. Throttle Postn | Rel. Throttle Postn | % |

Outputs: Fuel Rate (gal/s); Instantaneous Mass CO2 (g/s); Corrected Instantaneous Mass NOx (g/s).
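Table 2. Model performance for the prediction of fuel rate.

| Database | Input | Model | RMSE (mph) | MAE (mph) | MSE | R2 |
|---|---|---|---|---|---|---|
| Mixed | Vehicle speed | Ensemble Bagged Method | 0.025 | 0.014 | 0.001 | 0.967 |
| Mixed | Vehicle speed | Tree Method | 0.031 | 0.016 | 0.001 | 0.95 |
| Mixed | GPS speed | Ensemble Bagged Method | 0.025 | 0.014 | 0.001 | 0.968 |
| Mixed | GPS speed | Tree Method | 0.031 | 0.016 | 0.001 | 0.95 |
| Single, diesel dataset | Vehicle speed | Ensemble Bagged Method | 0.035 | 0.023 | 0.001 | 0.947 |
| Single, diesel dataset | GPS parameter | Ensemble Bagged Method | 0.034 | 0.022 | 0.001 | 0.949 |
| Single, gasoline dataset | Vehicle speed | Ensemble Bagged Method | 0.011 | 0.005 | 0.000 | 0.993 |
| Single, gasoline dataset | GPS parameter | Ensemble Bagged Method | 0.012 | 0.006 | 0.000 | 0.991 |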
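Table 3. Model performance for the prediction of CO2.

| Database | Input | Model | RMSE (mph) | MAE (mph) | MSE | R2 |
|---|---|---|---|---|---|---|
| Mixed | Vehicle speed | Ensemble Bagged Method | 0.028 | 0.017 | 0.001 | 0.960 |
| Mixed | Vehicle speed | Tree Method | 0.032 | 0.019 | 0.001 | 0.945 |
| Mixed | GPS speed | Ensemble Bagged Method | 0.026 | 0.014 | 0.001 | 0.966 |
| Mixed | GPS speed | Tree Method | 0.022 | 0.011 | 0.000 | 0.975 |
| Single, diesel dataset | Vehicle speed | Ensemble Bagged Method | 0.036 | 0.024 | 0.001 | 0.947 |
| Single, diesel dataset | GPS parameter | Ensemble Bagged Method | 0.027 | 0.014 | 0.001 | 0.949 |
| Single, gasoline dataset | Vehicle speed | Ensemble Bagged Method | 0.011 | 0.005 | 0.000 | 0.993 |
| Single, gasoline dataset | GPS parameter | Ensemble Bagged Method | 0.012 | 0.006 | 0.000 | 0.991 |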
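Table 4. Model performance for the prediction of NOx.

| Database | Input | Model | RMSE (mph) | MAE (mph) | MSE | R2 |
|---|---|---|---|---|---|---|
| Mixed | Vehicle speed | Ensemble Bagged Method | 0.028 | 0.017 | 0.001 | 0.960 |
| Mixed | Vehicle speed | Tree Method | 0.025 | 0.015 | 0.001 | 0.967 |
| Mixed | GPS speed | Ensemble Bagged Method | 0.008 | 0.002 | 0.000 | 0.950 |
| Mixed | GPS speed | Tree Method | 0.016 | 0.004 | 0.000 | 0.760 |
| Single, diesel dataset | Vehicle speed | Ensemble Bagged Method | 0.019 | 0.006 | 0.000 | 0.853 |
| Single, diesel dataset | GPS parameter | Ensemble Bagged Method | 0.016 | 0.005 | 0.000 | 0.869 |
| Single, gasoline dataset | Vehicle speed | Ensemble Bagged Method | 0.019 | 0.006 | 0.000 | 0.889 |
| Single, gasoline dataset | GPS parameter | Ensemble Bagged Method | 0.019 | 0.006 | 0.000 | 0.890 |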
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alazemi, F.; Alazmi, A.; Alrumaidhi, M.; Molden, N. Predicting Fuel Consumption and Emissions Using GPS-Based Machine Learning Models for Gasoline and Diesel Vehicles. Sustainability 2025, 17, 2395. https://doi.org/10.3390/su17062395
