Estimation of Fuel Consumption through PID Signals Using the Real Emissions Cycle in the City of Quito, Ecuador

Abstract: In Ecuador, according to data from the Ministry of Energy, the internal combustion engine is the largest consumer of fossil fuels. For this reason, it is important to identify proposals in the literature, and to develop new ones, that enable the prediction of vehicle fuel consumption both in the laboratory and on the road. To accomplish this, measurements under real driving emissions (RDE) conditions are contrasted with an algorithm that characterizes the forces opposing vehicle motion. From experimental tests, fuel consumption information was collected through a flow meter connected to the fuel line, and the engine's characteristic curves were obtained on a chassis dynamometer. Then, from the parameter identification data (PID), the most important predictors were established through an ANOVA analysis. For the acquired variables, a neural network was implemented that could predict 99% of the estimates and present a relative error lower than 5% compared to common methods. Additionally, an algorithm was developed to calculate fuel consumption as a function of the gear, inertial forces, rolling resistance, slope, and aerodynamic force.


Introduction
According to data from the Coordinating Ministry of Strategic Sectors, the transportation sector in Latin America and the Caribbean generated 36% of greenhouse gases in 2016. In Ecuador, the transportation sector was the main consumer of fossil fuels [1]. To address this situation, standards such as the Euro 6 regulation, which limits emissions and fuel consumption, have been implemented [2]. Therefore, being able to estimate the fuel consumption of light gasoline-powered vehicles is essential for reducing both energy consumption costs and emissions [3]. In a previous work, Zhou et al. [4] described fuel consumption prediction as a complex, non-linear process involving many parameters: distance traveled, weather conditions (temperature, humidity, wind speed), specific vehicle characteristics, traffic conditions and driving style. There are two prediction techniques. The first relies on models whose equations describe the physical and chemical processes of the engine during the intake, compression, power and exhaust phases. The second relies on black box models, which treat the vehicle as a whole for which no governing equation is written and estimate the output from the system inputs [4]. Currently, estimating vehicle fuel consumption can be regarded as an open research problem because of non-controllable variables such as driving style and atmospheric conditions (e.g., pressure, altitude, and relative humidity). Experimental laboratory measurements, road tests and automatic algorithms that predict specific fuel consumption can be used to estimate this parameter. Information about fuel consumption can be acquired in laboratories through vehicle instrumentation and testing on a chassis dynamometer under controlled conditions, but this information differs greatly from actual fuel consumption on the road.
To solve this problem, measurements have been performed with a portable emission measurement system (PEMS) that records the concentration of emissions, with fuel consumption inferred from the pollutants emitted at the exhaust pipe [5]. It should be noted that a device that measures emissions in real time is costly. For developing countries, the cost is simply prohibitive, so many research centers opt for data logger devices that allow non-intrusive data acquisition, without adding mass to the vehicle, through the on-board diagnostics (OBD II) port or by processing the parameter identification data (PID) signals to calculate fuel consumption [6]. Cabrera [7] used optimized dynamic programming and real road data to achieve fuel consumption savings of 5.2% by taking into account the road profile and travel time. Wang et al. [8] paid special attention to the reduction of emissions and fuel consumption in high-altitude cities, using an optimization based on support vector machines to modify engine characteristics so as to obtain similar power parameters and significant reductions in emissions. Although total emissions may decrease, Chen et al. [9] considered that real driving emissions (RDE) tests underestimate the importance of emissions generated by vehicles under cold-start conditions, so they proposed new measurement methodologies. It should be noted that fuel consumption is directly related to the amount of emissions produced. Qu et al. [10] measured hydrocarbon (HC), carbon monoxide (CO), and carbon dioxide (CO2) emissions and calculated fuel consumption through a carbon mass balance over the combustion and exhaust gas process. Andrade et al. [11], through the use of PID signals, proposed a machine learning algorithm that estimates CO2 emissions from the calculated fuel consumption data using variables acquired from the OBD II port. In a work developed in China, Zheng et al. [12] calculated fuel consumption through the carbon balance method, finding several discrepancies between the values reported during RDE tests and those of the national approval cycle; they developed an algorithm that calculated the fuel flow, showing a correlation between 0.906 and 0.977 with the approval cycle. Doulgeris et al. [13] recognized the difficulty of predicting fuel consumption, so they used experimental data to adjust the accuracy of their model, achieving CO2 estimation errors of less than 5% using the RDE methodology.
The objective of this research was to obtain a mathematical model that predicts the fuel consumption of a vehicle in the city of Quito, Ecuador, considering the geographic conditions of the city and the quality of the fuel. Additionally, this information serves as a reference for selecting a vehicle and understanding its actual fuel consumption. This paper is organized as follows: Section 2 presents how the data are acquired through OBD II and how the most significant variables are selected to design an algorithm to estimate fuel consumption. Section 3 introduces the model that calculates fuel consumption through resistive forces and compares it with other methodologies found in the literature. Section 4 contrasts the results with other works. Conclusions and future work are provided in Section 5.

Materials and Methods
The research proposed for this project is of the explanatory scientific type, which attempts to formulate laws that determine the behavior of a physical phenomenon from the explanation of the causes that generate it [14]. The project describes the fuel consumption process in a spark-ignition engine through the application of various methodological strategies. It involves studying the acquired PID through the OBD II port of the engine, including data such as manifold absolute pressure (MAP), engine speed (RPM), and intake air temperature (IAT). Subsequently, it identified the variables that had the highest contribution to the process and adjusted the model with complementary variables such as road slope and altitude. To identify the most important variables involved in fuel consumption, a vehicle was selected. Signals were acquired from the OBD II port through a data logger device, and then these signals were filtered and synchronized because of a delay between the different measurement equipment. Then, with these variables, the physical phenomenon was mathematically modeled to compare with the results of similar works. Finally, fuel consumption was simulated following the RDE methodology. A structure of the methodological process can be seen in Figure 1.

Experimental Setup
To begin the experimental phase, a sedan with a 1.4 L engine was placed on a chassis dynamometer. The dynamometer was used to replicate speed and load conditions, simulating real-world road operation. As the vehicle passed through various operating ranges, a data logger device connected to the engine's OBD II port captured the vehicle PID. Meanwhile, the chassis dynamometer, a MAHA LP model adhering to the ISO 17359:2018 standard, generated the engine characteristic curves for torque and power [15]. A summary of the experimental setup used in this project is shown in Figure 2.

Data Acquisition
After performing the experimental setup, the acquired information is post-processed to obtain the engine characteristic curves as shown in Figure 3.
The specific consumption curve elucidated the quantity of the fuel mass needed to procure a designated amount of energy. This pivotal parameter, in turn, enabled the determination of fuel consumption in terms of liters per hour. As is discernible from Figure 3b, the range of 2000 to 3000 rpm exhibited the most economical specific fuel consumption. Additionally, it is noteworthy that the utmost peak in fuel injection volume occurred at approximately 4000 rpm. Engine characteristic curves were acquired through meticulous experimentation on the chassis dynamometer. Concurrently, the PID signals were recorded utilizing the Freematics ONE+ data logger [16]. Table 1 shows the signals and the physical units that each variable represents.

Exploratory Study
The objective of the current study was to construct a model that explains the interplay between the sensor-acquired variables, namely vehicle speed (VSS), throttle position (TPS), engine coolant temperature (ECT), oxygen sensor (O2), short-term fuel trim (STFT), long-term fuel trim (LTFT), RPM, manifold absolute pressure (MAP), and intake air temperature (IAT). Figure 4 shows a diagram for each of the pertinent variables, with the variable on the horizontal axis and the probability density on the vertical axis, so that the distribution of each variable can be discerned. STFT, LTFT and O2 were excluded because of the engine's insignificant wear and the absence of notable fluctuations in their values. Additionally, fuel consumption data were recorded simultaneously with the power and torque characteristic curves of the vehicle.
By applying a linear regression model that incorporated all the independent variables except the fuel trims, the model explained 99.8% of the vehicle fuel consumption, as evidenced by the adjusted coefficient of determination (R²). Moreover, as shown in Table 2, RPM, TPS and MAP emerged as the variables with the strongest linear relationship with fuel consumption, as substantiated by their significance values. The quantile-quantile plot (Q-Q plot) in Figure 5a shows that the data points align closely with the reference line, indicating that the distribution can be reasonably approximated as normal. Figure 5b, in turn, illustrates the relationship between predicted values and residuals. No characteristic pattern is evident in it, nor is there a fan or funnel shape. Consequently, the linearity of the data can be visualized and adherence to the homoscedasticity criterion can be validated.
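As a worked illustration of the adjusted coefficient of determination quoted above, the following minimal sketch computes R² and its adjusted form from observed and predicted values. The function names and any sample numbers used with them are hypothetical and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

The adjustment matters when models with different numbers of predictors are compared, as is done here between the full model and the reduced one.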
Because of the large number of variables considered, it was advisable to streamline the model by reducing the number of factors, so that the underlying physical phenomenon is represented with a more adequate level of complexity. Pearson's correlation coefficient showed that variables such as VSS, RPM and MAP had a significant influence in this context. Furthermore, Figure 6 shows the importance of the independent variables with respect to the dependent variable, fuel consumption. Nevertheless, predictors such as VSS and RPM exhibited a notable degree of correlation with each other, which was corroborated by a variance inflation analysis (Table 3), confirming the presence of collinearity. Hence, the removal of one of these variables from the model had to be considered to alleviate the issue. Although IAT and ECT showed a noticeable Pearson correlation coefficient, their significance within the model was negligible. Therefore, it was pertinent to reduce the number of variables to the three of greatest importance for the model. This reduction yielded a model in which all the remaining variables were significant and which explained 98.8% of the gathered information, as shown in Table 4.
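The collinearity check described above can be sketched as follows. This is a simplified illustration, not the procedure used to produce Table 3: it assumes only two predictors, in which case the variance inflation factor reduces to 1 / (1 - r²), with r the Pearson correlation between them. With more predictors, each one would have to be regressed on all the others. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, which is the situation reported for VSS and RPM.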

Experimental Route Planning
A route was charted through the urban area of the city of Quito, Ecuador. The city was chosen on the basis of data from the Global Traffic Scorecard, which designates Quito as the fifth most congested urban center in South America [17]. To comprehensively evaluate the chosen route, the RDE cycle was employed, encompassing city, rural and highway driving, as shown in Figure 7. For the route to be valid, it must adhere to the constraints established by the Euro 6 standard, which cover aspects such as the speed of circulation in each section, stopping times and distances to be covered. Table 5 summarizes these constraints [18,19]. Figure 8 shows the speed for each section plotted against the accumulated distance. Parameters such as highway speed were difficult to achieve because legal regulations prohibit exceeding 90 km/h in Ecuadorian territory. The environmental conditions during the tests were a consistent temperature of 18 °C and an atmospheric pressure of 70 kPa, without rain or intense winds. The tests started at an altitude of 2900 m above sea level, conditions similar to those reported in [20]. Throughout the testing, the vehicle's air conditioning system was deactivated and all vehicle windows were kept closed.
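To illustrate how a recorded trip can be split into the three RDE driving sections, the sketch below bins speed samples and returns the share of distance driven in each bin. The thresholds of 60 km/h and 90 km/h are the ones commonly cited for the European RDE procedure and are an assumption here, since Table 5 is not reproduced in this text; the function name and the sampling interval are likewise hypothetical.

```python
def rde_distance_shares(speeds_kmh, dt_s=1.0, urban_max=60.0, rural_max=90.0):
    """Share of total distance driven in each speed bin (values sum to 1)."""
    distance_km = {"urban": 0.0, "rural": 0.0, "motorway": 0.0}
    for v in speeds_kmh:
        step_km = v * dt_s / 3600.0  # distance covered during one sample
        if v <= urban_max:
            distance_km["urban"] += step_km
        elif v <= rural_max:
            distance_km["rural"] += step_km
        else:
            distance_km["motorway"] += step_km
    total = sum(distance_km.values())
    if total == 0.0:
        return distance_km
    return {name: d / total for name, d in distance_km.items()}
```

With a legal limit of 90 km/h, the motorway bin as defined by these assumed thresholds would remain nearly empty, which is consistent with the difficulty noted above.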

Model Problem
The fuel consumption can be determined through an analysis of the vehicle's longitudinal dynamics, employing Newton's second law. This behavior is described by Equation (1), in which m_v represents the vehicle mass, a the acceleration and γ the rotating mass coefficient [21,22].
Equation (2) lists the forces influencing the vehicle's motion. Specifically, F_v represents the force required for the vehicle's forward movement, F_d the aerodynamic force on the vehicle, F_r the rolling resistance force and F_p the force required to ascend slopes [23].
Equation (3) requires some vehicle parameters, such as the vehicle mass, which was found by measurement as shown in Figure 9a. The air density ρ is calculated from the relationship between standard air and the air found in the city of Quito, using readings from the MAP sensor located in the intake manifold. The parameter S represents the frontal area of the vehicle, c_x the drag coefficient, f_r the rolling resistance coefficient, V the velocity, g the gravitational acceleration and θ the slope angle. Equation (4) gives the rolling resistance coefficient, which depends on the type of surface on which the tire travels, the inflation pressure and the vehicle speed. These values can be obtained empirically; in this case f = 0.015 and f_0 = 0.01 are used, which are common values in the literature [1], with the speed expressed in km/h.
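As a rough check on the air density at Quito's altitude, the sketch below applies the ideal gas law for dry air. This is an assumption made for illustration and is not necessarily the relationship the authors used; the function name is hypothetical, and the inputs in the usage note are the ambient conditions reported for the tests (70 kPa, 18 °C).

```python
R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)


def air_density(pressure_kpa, temperature_c):
    """Dry-air density in kg/m^3 from absolute pressure and temperature."""
    pressure_pa = pressure_kpa * 1000.0
    temperature_k = temperature_c + 273.15
    return pressure_pa / (R_DRY_AIR * temperature_k)
```

Under these assumptions, air_density(70.0, 18.0) gives roughly 0.84 kg/m³, about two thirds of the sea-level standard value of 1.225 kg/m³, so aerodynamic drag at a given speed is correspondingly lower.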
Parameters such as the frontal area of the vehicle are calculated by drawing the cross-section in a computer-aided design program, as shown in Figure 9b; the dimensions are taken from the vehicle manufacturer's website [24].
Inertial forces depend on the acceleration, which is calculated as the rate of change of velocity with respect to time, a = dv/dt. To find the road angle (θ), the data logger device incorporates a Global Positioning System (GPS) receiver that records latitude, longitude, and altitude every 0.1 s. Figure 10 shows the estimated angle of the roadway followed by the vehicle along the route, which is inferred with Equation (5) [25]. Once all the forces necessary for the vehicle to move forward have been found, their sum is multiplied by the speed to find the wheel power required at each instant, as shown in Equation (6); the engine power, however, is related to the selected gear and its efficiency [26].
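The force balance and wheel power described in this section can be sketched as follows. The expressions are the textbook forms for longitudinal vehicle dynamics and are assumed here, since the paper's Equations (1) to (6) are not reproduced in this text; in particular, the slope angle from altitude gain over horizontal distance and the cos θ factor on rolling resistance are assumptions. All function and parameter names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def road_angle(delta_altitude_m, delta_distance_m):
    """Slope angle in radians from altitude gain over horizontal distance."""
    return math.atan2(delta_altitude_m, delta_distance_m)


def resistive_forces(mass_kg, gamma, accel_ms2, rho, area_m2, c_x, f_r,
                     speed_ms, theta_rad):
    """Individual force terms in newtons, keyed by name."""
    return {
        "inertia": gamma * mass_kg * accel_ms2,
        "drag": 0.5 * rho * area_m2 * c_x * speed_ms ** 2,
        "rolling": f_r * mass_kg * G * math.cos(theta_rad),
        "slope": mass_kg * G * math.sin(theta_rad),
    }


def wheel_power_w(forces, speed_ms):
    """Instantaneous power at the wheel: total force times speed."""
    return sum(forces.values()) * speed_ms
```

Keeping the terms separate makes it straightforward to rank their contributions along the route before summing them into the wheel power.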
Transmission efficiency can be obtained from the geometric ratios with which gearboxes are designed. For a 185/65R15 88H tire, the geometric radius of the tire can be found with Equation (7), where A_n represents the tire width, P_a is the height of the tire sidewall and d_c is the wheel diameter in inches.
With i being the longitudinal slip ratio of the tire, the effective radius of the tire can be found as shown in Equation (8).
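As a worked example for the 185/65R15 tire, the sketch below computes the geometric radius from the tire designation and then applies a slip correction. It assumes that the sidewall height follows from the width and the aspect ratio (65%), and that the effective radius takes the simple form r_e = r_g (1 - i); both are assumptions, since Equations (7) and (8) are not reproduced in this text. The function names are hypothetical.

```python
def geometric_radius_m(width_mm, aspect_ratio_pct, rim_diameter_in):
    """Unloaded tire radius in meters from the tire size designation."""
    sidewall_mm = width_mm * aspect_ratio_pct / 100.0
    rim_radius_mm = rim_diameter_in * 25.4 / 2.0  # inches to millimeters
    return (rim_radius_mm + sidewall_mm) / 1000.0


def effective_radius_m(geometric_radius, slip_ratio):
    """Effective rolling radius under the assumed form r_g * (1 - i)."""
    return geometric_radius * (1.0 - slip_ratio)
```

For this tire the sidewall is 185 × 0.65 = 120.25 mm and the rim radius is 190.5 mm, giving a geometric radius of about 0.311 m.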
Using the relationship r = VSS/RPM, the number of groups is determined by a k-means algorithm that identifies each gear step (see Figure 11) [27]. The transmission is manual and has no sensor that identifies the selected gear, so a decision tree learning technique is used to classify the gear. To generate the database for this classifier, the vehicle was driven for 200 km, using all gears at different engine speeds. As can be seen in Figure 12, the match rate is high because the model is deterministic and classifies very well for this specific vehicle, but it would not generalize to others. For the model structure, 70% of the data were used for training, 15% for testing, and the remaining 15% for validation, resulting in a model with a 99.9% success rate. From Equation (9), the ratio of the differential group can be found. With R_5 = VSS/RPM being the direct gear ratio, the efficiency of each gear can be identified as shown in Equation (10), where R_j represents the gear ratio of each gear, γ_j the rotating mass ratio of each gear, and j_max the highest gear [1].
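The idea of grouping samples by the ratio VSS/RPM can be illustrated with a one-dimensional k-means. This is a minimal sketch rather than the authors' pipeline: it uses a deterministic initialization and, in place of the decision tree described above, assigns the gear by the nearest cluster center. The function names and the sample ratios in the usage note are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k groups and return the sorted centroids."""
    ordered = sorted(values)
    # Spread the initial centroids evenly over the sorted data (deterministic).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the previous centroid if a group happens to be empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def classify_gear(vss_kmh, rpm, centroids):
    """Gear number (starting at 1) whose centroid is closest to VSS/RPM."""
    ratio = vss_kmh / rpm
    nearest = min(range(len(centroids)), key=lambda j: abs(ratio - centroids[j]))
    return nearest + 1
```

Because each gear fixes the ratio between road speed and engine speed, the clusters are well separated whenever the clutch is fully engaged; samples taken during gear changes or while idling would need to be filtered out first.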

Linear Model
In this section, fuel consumption is modeled using the most significant predictors: RPM, TPS and MAP, the latter measured by the sensor located in the intake manifold. A linear regression with an adjusted coefficient of determination of R² = 0.98 yields Equation (11). Figure 13 shows the behavior of fuel consumption with respect to the distance traveled in the test.

Resistive Forces Model
After post-processing the information for each of the variables, it can be observed that, of all the calculated forces, the one that generates the greatest opposition to the vehicle's progress is the slope resistance, owing to the topography of the city of Quito, Ecuador, followed by the inertial forces produced by the mass of the vehicle during acceleration and deceleration. The aerodynamic drag, as seen in Figure 14, is completely dependent on the speed, while the rolling resistance depends on the same factor but to a lesser degree [28]. Once all the forces opposing the vehicle's progress have been obtained, the instantaneous power is calculated by multiplying the resulting force by the speed. The power obtained is the power at the wheel, but fuel consumption is linked to engine power, so Equation (12) is used, where η_j represents the efficiency of the gear (see Table 6). The linear relationship between fuel consumption and engine power can be seen in Equation (12) and Figure 15 [29]. Once the consumption data for each of the gears are obtained, they are associated with their respective engine power. For ease of visualization, the distance is represented on the horizontal axis, while the consumption in liters per hour at each instant is displayed on the vertical axis (see Figure 16).
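The step from wheel power to engine power, and from an instantaneous fuel rate to the total consumed over the route, can be sketched as follows. The relation P_engine = P_wheel / η_j is assumed here as the usual form of a transmission efficiency correction; the paper's Equation (12) and the efficiencies in Table 6 are not reproduced in this text, so any numbers used with these hypothetical functions are illustrative only.

```python
def engine_power_kw(wheel_power_kw, gear_efficiency):
    """Engine power needed to deliver the wheel power through one gear."""
    return wheel_power_kw / gear_efficiency


def total_fuel_l(rates_l_per_h, dt_s):
    """Integrate a fuel-rate series in L/h sampled every dt_s seconds."""
    return sum(rates_l_per_h) * dt_s / 3600.0
```

Integrating the liters-per-hour series in this way yields a single trip total in liters, which is the kind of figure compared between methods later in the paper.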

Neural Network
Based on the most significant predictors of the model (MAP, TPS, and RPM), which were determined through the experimental test on the chassis dynamometer, a neural network is trained to predict fuel consumption. Such models predict the responses very efficiently, but unlike analytical functions they do not reveal the intermediate processes [30]. In Figure 17a, it can be seen that the validation error is below the training and test errors, so the model is able to predict different operating cycles. As can be seen in Figure 17b, the fit of the neural network model predicts correctly 99.5% of the time.

Discussion
In this work, different methodologies are developed and an algorithm is proposed that calculates fuel consumption through the dynamic forces of the automobile, which act from the wheel-road contact along the kinematic chain up to the engine. As shown in the methodological section, several authors calculate fuel consumption from the absolute pressure in the intake manifold because the vehicle manufacturer's OBD II implementation does not provide a fuel PID [6,11]. Bishop et al. [31] reported differences from −1.3% to 1.7% between calculated and observed fuel consumption when the vehicle was tested under a US06 driving cycle.
As depicted in Figure 18, the results obtained using the linear regression algorithm differ significantly from those of the other approaches. This difference can be attributed to parameters such as the rolling resistance coefficient, which differs between the rollers of the chassis dynamometer and the road surface. Furthermore, the dynamometer test conditions involved zero aerodynamic resistance and no slopes [32]. To contrast the data and verify that the calculations developed from the MAP sensor are not erroneous, a commercial smartphone application was used that connects to the OBD II port of the vehicle and stores the information in plain text format [33]. After post-processing, the curves from the smartphone application are similar to those obtained from the MAP sensor in the intake manifold, but far from those of the linear regression. Neural networks predict the studied phenomenon very well, as reported by Abukhalil et al. [34], where the RMSE between the proposed method and measured values is 2.436. In this work, the trained neural network predicts fuel consumption with a relative error of less than 5%, but it is not possible to know the intermediate stages of the process. In Figure 19, a clear distinction is observed between the curves obtained from the algorithm and the fuel estimation derived from the PID readings. This research proposes a novel approach to calculate fuel consumption that takes into account the various resistive forces acting on the vehicle. Parameters such as the rolling resistance coefficient, aerodynamic drag, slope, and inertial masses are considered in this calculation. Moreover, the vehicle shape parameters are inferred from its cross-section and drag coefficient.
To enhance accuracy, atmospheric conditions such as air temperature and pressure are extracted from the IAT and MAP sensors, respectively. The model employed in this study is deterministic, which facilitates its adaptation to different vehicle types: parameters such as vehicle shape can be modified through adjustments to the cross-section and drag coefficient to suit various vehicle configurations. Figure 20 shows that the algorithm presents a moderate correlation, with a fit of R² = 0.7, in urban, rural and highway driving. In studies such as [12], fuel consumption is calculated from a carbon balance: information on each of the pollutants (CO, HC, among others) is collected at the exhaust outlet by means of a PEMS and these emissions are then converted into fuel consumption. That method is very close to the data obtained by OBD II, with differences of 0.55 ± 0.12 L/s and 3.79 ± 0.69 L/s in certain measurement bins. Comparing the estimates in Table 7, it can be seen that the methods proposed in this study are within 5% of the expected value and that the consumption calculated via OBD II using the MAP sensor is a good reference indicator. The relative error is used to find the differences between the methods; as shown in Figure 21, it is within 2% along most of the route, except in certain sections where it reaches 5%. In [35], a VT-Micro model is proposed whose results, compared with experimental data, also show values around the 5% threshold [36].
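The relative error used for this comparison can be computed point by point as sketched below, taking one method as the reference. The function names are hypothetical, and the series passed to them would be the instantaneous consumption values of two methods sampled at the same instants.

```python
def relative_errors_pct(estimated, reference):
    """Pointwise relative error in percent with respect to the reference."""
    return [abs(e - r) / r * 100.0 for e, r in zip(estimated, reference)]


def summarize_errors(errors_pct):
    """Mean and maximum of a list of relative errors."""
    return sum(errors_pct) / len(errors_pct), max(errors_pct)
```

Reporting both the mean and the maximum distinguishes a method that is accurate along most of the route from one that is accurate everywhere, which is the distinction drawn above between the 2% and 5% figures.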

Conclusions
In this work, several methodologies were developed to evaluate the most influential predictors of fuel consumption. A linear regression model was used to identify the most significant variables at a 95% confidence level. The results showed that MAP, TPS, and RPM are important factors for estimating fuel consumption. Parameters such as VSS are also important, but they exhibit collinearity with RPM because the two are dependent.
The model generated from the most significant predictors can predict 99.8% of the values according to ANOVA. However, when the model is tested under an RDE cycle, it shows variations of about 44.1%. This discrepancy can be attributed to the data being obtained from experimental measurements on a chassis dynamometer under laboratory conditions, without considering differences in rolling resistance and drag resistance on an actual roadway.
Neural networks predict fuel consumption efficiently and with only three predictors. The model in this study reports relative errors lower than 5% in an RDE cycle. Unfortunately, this type of technique does not provide intermediate explanations of the process, so analytical techniques that describe each of the processes and dependent variables were also used in this study.
The algorithm developed for analyzing vehicle resistive forces makes it possible to identify the most significant factors affecting fuel consumption. In this scenario, slope resistance emerges as the most influential force, followed by inertial forces. When the vehicle was tested under an RDE cycle, the algorithm reported a fuel consumption of 5.57 L, while the PID estimations yielded 5.28 L. The method demonstrates an average error of 1.41%, with a maximum value of 5.09%.
In future work, it is recommended to continue the study by incorporating a piezoelectric flow meter, which would enhance the accuracy of the models described above. Additionally, parameters such as the rolling resistance coefficient and aerodynamic resistance can be refined through coast-down tests, enabling the development of scalable models applicable to a wider range of vehicles.
Funding: This research received no external funding.

Institutional Review Board Statement: The study did not require ethical approval.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.