Comparisons of Real-World Vehicle Energy Efficiency with Dynamometer-Based Ratings and Simulation Models

Software tools for fuel economy simulations play an important role during the design stages of advanced powertrains. However, calibration of vehicle models against real-world driving data faces challenges owing to inherent variations in vehicle energy efficiency across different driving conditions and different vehicle owners. This work utilizes datasets of vehicles equipped with OBD/GPS loggers to validate and calibrate FASTSim (software originally developed by NREL) vehicle models. The results show that window-sticker ratings (derived from dynamometer tests) can be reasonably accurate when averaged across many trips by different vehicle owners, but successfully calibrated FASTSim models can have better fidelity. The results in this paper are shown for nine vehicle models: three battery-electric vehicles (BEVs), four plug-in hybrid electric vehicles (PHEVs), one hybrid electric vehicle (HEV), and one conventional internal combustion engine (CICE) vehicle. The calibrated vehicle models successfully predict the average trip energy intensity within ±3% for an aggregate of trips across multiple vehicle owners, as opposed to within ±10% via window-sticker ratings or baseline FASTSim.


Introduction
Many software tools exist for modeling the fuel economy of vehicles, as surveyed in [1]. Depending on the underlying modeling approach, one may classify the tools into the following: (i) direct simulation models that are physics-based (termed "White-box" in [1]), (ii) empirical models that are primarily data-inference-based (termed "Black-box" in [1]), and (iii) hybrid models that combine some traits of both empirical and physics-based models (termed "Grey-box" in [1]). Examples of physics-based models include Autonomie [2] and FASTSim [3], which are endorsed by the U.S. Department of Energy [4]. Examples of empirical models include MOVES [5], which is developed and maintained by the U.S. Environmental Protection Agency (EPA), and EMFAC [6], which is utilized by the California Air Resources Board. Furthermore, categorically speaking, window-sticker ratings published by the EPA [7] are another form of a (simple) black-box model.
Black-box models have the advantage of being grounded in real-world data, but their shortcomings include (i) the possibility of reduced accuracy when applied to smaller sub-populations of vehicles, (ii) the need for large amounts of data in order to properly calibrate, and (iii) the difficulty of predicting the performance of vehicle models that do not yet exist in the real world. White-box (physics-based) models, on the other hand, can be easily modified to explore new design parameter settings, but the validity of the model (i.e., whether a new prototype will perform as predicted by the software) remains an important issue. As new vehicle designs are of primary interest, the current work focuses on physics-based models, and more specifically FASTSim [3], owing to it being (relatively) computationally light, open-source, and freely accessible to the general public.
While detailed physics-based models such as Autonomie are becoming popular for simulating future transportation systems, such as autonomous vehicles [8], data-based approaches also remain popular in recent literature [9,10]. Previous work by the authors in [11] aimed to combine the attractive features of a computationally light physics-based model (FASTSim) with the calibration capability of data-based approaches. That work proposed three tunable parameters that constitute correction terms for passenger and cargo weight, as well as auxiliary and traction power. The results of the approach in [11] were shown for three battery-electric vehicle (BEV) models (Leaf, Bolt, Model S) and three plug-in hybrid electric vehicle (PHEV) models (C-Max Energi, Prius Prime, Volt).
The current paper extends the previous work to an additional minivan PHEV model (Pacifica Hybrid), a non-plug-in hybrid electric vehicle (HEV) model (Prius HEV), and an SUV conventional internal combustion engine (CICE) vehicle model (CR-V). The current paper also delves deeper (for all nine considered vehicle models) into examining how the driving conditions of different owners of the same vehicle model can pose challenges to the construction of "useful" calibrated models capable of generalizing across multiple owners.
The rest of the paper is organized as follows. Section 2 provides an overview of the source of real-world vehicle trip data and conducts an analysis that highlights variations in trip energy intensity (energy per distance travelled). Section 3 provides a summary of the vehicle models' calibration approach along with the optimized values for the tuning parameters. Section 4 showcases the performance of the calibrated vehicle simulation models. The paper then ends with a brief summary of conclusions and future work.

Real-World Vehicles Trip Data
The datasets of real-world driving analyzed in this work are a subset of a large-scale data collection effort by the UC-Davis Institute of Transportation Studies, the eVMT Project [12]. Part of the eVMT project involves OBD/GPS monitoring of all vehicles in survey participant households that own a BEV or PHEV. To maintain privacy for survey participants, only the following second-by-second trip information was utilized within the current work for simulation of energy/fuel consumption: (i) vehicle speed, (ii) road slope, and (iii) cabin heating or air-conditioning power. In order to compare/validate simulation models, OBD records of energy/fuel utilization parameters are also examined, but only as totals for whole trips. Furthermore, for additional privacy considerations, even after removal of sensitive parts of the data, computational analysis involving individual trips is carried out in a secured environment, with only bulk results presented.
Box-plots for trip energy intensity, defined as vehicle energy usage per distance travelled (with units of kWh/mile or gal/mile), are shown in Figure 1. The analyzed dataset includes trips from up to 10 different real-world vehicles per vehicle model (fewer real-world vehicles were available for some models). For PHEVs, whose trips can include both electricity and gasoline usage, all gasoline amounts were transformed into equivalent electric energy (kWh) by multiplying by the fuel heating value and the average vehicle efficiency in charge sustaining mode, thus allowing the trip energy intensity of PHEVs to be visualized via single box-plots (as opposed to splitting electricity and gasoline, which could have created biases due to differences in driving patterns for trips that involve gasoline usage). Box-plots provide a convenient and compact way of visualizing data that include statistical variations. The bottom and top lines of the box correspond to the 25th and 75th percentiles, respectively, of the cumulative distribution, while the middle line corresponds to the median value. The bottom and top extension lines mark the 5th and 95th percentiles, respectively, while the average value of the distribution is marked by a diamond shape. Aside from individual vehicle-household box plots, a combined "zone-wide" (darker tone) box-plot is shown in Figure 1 for all trips by all vehicles of a given model. Additional data provided in Figure 1 include the number of trips in the data sample for each vehicle, which is shown between brackets below the vehicle ID on the horizontal axis, as well as the equivalent EPA window-sticker kWh/mile or gal/mile value for the combined cycle (as obtained from [7]), which is marked via a horizontal dashed line.
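The gasoline-to-electricity equivalence step described above can be sketched as follows; the heating value and charge-sustaining efficiency used here are illustrative assumptions, not values reported in the paper:

```python
# Illustrative sketch of folding PHEV gasoline use into a single
# kWh/mile trip energy intensity. Constants below are assumptions.

GASOLINE_KWH_PER_GAL = 33.7  # assumed lower heating value of gasoline, kWh/gal
CS_EFFICIENCY = 0.25         # assumed average efficiency in charge-sustaining mode


def trip_energy_intensity_kwh_per_mile(kwh_used, gal_used, miles):
    """Convert gasoline to equivalent electric energy and return the
    trip energy intensity (kWh/mile) for a PHEV trip."""
    if miles <= 0:
        raise ValueError("trip distance must be positive")
    equivalent_kwh = kwh_used + gal_used * GASOLINE_KWH_PER_GAL * CS_EFFICIENCY
    return equivalent_kwh / miles
```

With this convention, an all-electric trip of 10 kWh over 40 miles yields 0.25 kWh/mile, and any gasoline usage adds its heat-engine-discounted electric equivalent.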
The secondary vertical axis in Figure 1 (with red text) showcases the relative difference in trip energy intensity when referenced to the OBD average value for all trips by all vehicle samples of a vehicle model.
Keeping in mind that OBD readings can have their own error margin, and that accounting for such errors is beyond the scope of the current work, some notable observations about Figure 1 include the following:

• For the vast majority of the considered individual vehicle owners, some of their trips have higher energy intensity than the window-sticker value, while some other trips have lower energy intensity (other than a few cases such as the Bolt owner #2 and Model S owner #5, the dashed line in Figure 1 lies between the 5th and 95th percentiles marked by the extension lines of the box-plots);
• Window-sticker values can be reasonably good (within approximately ±10%) at predicting the average energy intensity across multiple vehicle owners (comparing the average value for darker tone box plots with the dashed line in Figure 1), but the average energy intensity for some individual vehicle owners can be off from the window-sticker value by more than 20%, such as the Model S owner #4 in Figure 1;
• Individual trips by vehicle owners may occasionally differ from the average across owners (reading the 5th and/or 95th extension lines on the secondary vertical axis in Figure 1) by more than 50%. By extension, the window-sticker values can also be off by more than 50% for some individual trips.
The main insights from such observations could be summarized as highlighting that variations in trip energy intensity do exist across different vehicle owners and within different trips by the same vehicle owner. Some of those variations (attributed to variations in trip speed, acceleration aggressiveness, or road slope) may be possible to reproduce in software simulations, while other sources of variation (such as unknown passenger/cargo weight or head wind speed) can be very difficult to predict and account for. As such, tuning of software models against real-world energy recordings is done with a mindset of error reduction, but with the understanding that errors cannot be completely eliminated.

Model Calibration
A calibration approach using three tuning parameters for physics-based simulation models (applied to FASTSim) was previously introduced by the authors. For details (including mathematical derivation), readers are referred to [11]. An overview of the three parameters is as follows: αT is a scaling factor that adjusts the total traction power at every time instant (calculated by FASTSim to account for acceleration, wind drag, rotational inertia, road slope, and tire rolling resistance); αM is a correction mass (in kg) to account for differences across vehicle owners in passenger and cargo weight from the default (136 kg) value in FASTSim; αA is a constant additional power (in kW) that takes the form of a correction term for auxiliary power, but is intended to also account for other unknown effects.
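As a rough illustration of where the three correction terms enter a FASTSim-style power balance, consider the following sketch; the function and variable names are hypothetical and do not correspond to the actual FASTSim API:

```python
# Hypothetical sketch (not FASTSim code) of how the three tuning
# parameters could be applied at each simulation time step.

DEFAULT_PAYLOAD_KG = 136.0  # FASTSim default passenger/cargo mass


def corrected_vehicle_mass_kg(curb_mass_kg, alpha_m):
    """Apply the correction mass alpha_M (kg) on top of the default
    payload before traction power is computed."""
    return curb_mass_kg + DEFAULT_PAYLOAD_KG + alpha_m


def corrected_power_kw(p_traction_kw, p_aux_kw, alpha_t, alpha_a):
    """Scale traction power by alpha_T and add the constant auxiliary
    correction alpha_A (kW) at a single time step."""
    return alpha_t * p_traction_kw + p_aux_kw + alpha_a
```

In a well-constructed physics-based model, alpha_t stays near 1 while alpha_m and alpha_a absorb owner-specific payload and auxiliary-load differences.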
An estimation of appropriate values for the tuning parameters is conducted via an optimization procedure that seeks to minimize a weighted error function for the differences between OBD energy records and simulation predictions, over several trips across multiple vehicle owners. The results of the optimized tuning parameters are summarized in Table 1. As with typical meta-modeling approaches [13], tuning of the models is conducted on a subset of data (referred to as the set of trips "T"), then verification is conducted on a different set of trips that were not included in the optimization (referred to as the set of trips "V"). For that reason, only a subset of vehicles from Figure 1 (ones that have a sufficient number of trips that can be partitioned into sets T and V) are included in Table 1. In the design philosophy of the calibration approach, αT is intended to account for minor discrepancies in the modelled powertrain components such as the engine, motor, and transmission. As such, in a properly constructed physics-based model, when conducting optimization in order to estimate values for (αT, αA, αM), the optimized value for αT should be close to 1 (and its value is considered a sanity check). Furthermore, only one value for αT per vehicle model is considered in the optimization, while (αA, αM) are permitted by the optimization to have different values for each individual vehicle owner. After conducting the optimization, a final step includes averaging of the parameter values across different owners in order to provide a more generic "group-wide" estimate, as shown in Table 1.
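A minimal sketch of such an optimization is shown below, with a toy linear surrogate standing in for a full FASTSim run so the example is self-contained; the trip values, weights, and the `simulate_trip` stand-in are illustrative, not data from the paper:

```python
# Sketch of minimizing a weighted error between OBD-logged trip energy
# and simulated trip energy over the tuning parameters.
import numpy as np
from scipy.optimize import minimize


def simulate_trip(trip, alpha_t, alpha_a, alpha_m):
    # Toy surrogate for a FASTSim run: scaled traction energy plus
    # auxiliary and mass-sensitivity terms (illustrative only).
    return (alpha_t * trip["traction_kwh"]
            + alpha_a * trip["hours"]
            + alpha_m * trip["mass_sensitivity"])


def weighted_error(params, trips, weights):
    alpha_t, alpha_a, alpha_m = params
    errors = [
        ((simulate_trip(t, alpha_t, alpha_a, alpha_m) - t["obd_kwh"])
         / t["obd_kwh"]) ** 2
        for t in trips
    ]
    return float(np.dot(weights, errors))


# Illustrative placeholder trips and weights.
trips = [
    {"traction_kwh": 8.0, "hours": 0.5, "mass_sensitivity": 0.002, "obd_kwh": 8.6},
    {"traction_kwh": 4.0, "hours": 0.3, "mass_sensitivity": 0.001, "obd_kwh": 4.4},
]
weights = np.array([0.5, 0.5])

result = minimize(weighted_error, x0=[1.0, 0.0, 0.0],
                  args=(trips, weights), method="Nelder-Mead")
alpha_t, alpha_a, alpha_m = result.x
```

A gradient-free method such as Nelder-Mead is a reasonable default here because the underlying simulator is not differentiable in closed form.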

Results and Discussion
One of the key philosophies in the development of FASTSim by its original authors at NREL was to accept reasonable accuracy (within ±10% per [14]) in exchange for fast computations that allow the simulation of large sets of real-world driving. In that sense, the key advantage of "baseline" FASTSim may not so much be its ability to provide better predictions than window-sticker ratings (which are also in the same ball-park of ±10% when examining averages across multiple vehicle owners in Figure 1), but its ability to provide predictions about vehicle designs that do not exist in the market yet. For example, FASTSim could simulate a 150-mile electric range version of the C-Max or change the powertrain of a conventional ICE pickup truck model into a 300-mile range BEV. For that reason, the validity assessment of fuel economy models in this section will examine the statistical distribution (via box plots) of the relative difference in trip energy intensity (per Equation (1)) for three predictors:
• Window-sticker-based prediction, in which the prediction for every trip is simply the combined cycle rating from the fuel economy guide [7].
• Baseline FASTSim models, as published by NREL in the 2018 public version of FASTSim [3]. It should be noted, however, that the public version of FASTSim does not include all vehicle models considered in the current study (Table 1). Thus, a "baseline" FASTSim model could not be examined for the Pacifica Hybrid, Prius HEV, or CR-V.
• FASTSim models with tuning, per the adopted calibration approach.
The relative difference in trip energy intensity (λ), whose statistics are used as an error metric for comparison between fuel economy prediction models, is defined as follows:

λ_j = (E_j^m − E_j^o) / E_j^o × 100%,  j ∈ T ∪ V,    (1)

where E_j is the equivalent energy intensity (in kWh/mile or gal/mile) for some trip j, with the superscript o indicating the OBD-logged value for the trip, while the superscript m indicates some other model for estimation, with T and V indicating the tuning and validation sets of trips, respectively. A comparison of trip energy intensity estimation models for each of the individual vehicle samples (as listed in Table 1) is shown in Figure 2. As the trips of each vehicle sample are split into tuning and verification subsets, the box-plots in Figure 2 are shown in pairs of the same color tone, with the color tone indicating the model that was used for the estimation of trip energy intensity. In each pair of box-plots, the first (to the left) is for the tuning set of trips, while the second (to the right) is for the verification set of trips. It should be noted that window-sticker-based estimates and baseline FASTSim involve no optimization-based tuning, thus their performance should not differ much between the tuning and verification sets of trips. However, as the trip sets are different, they are still reported in separate box-plots in Figure 2.
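The λ metric and the box-plot summary statistics used throughout this section can be computed as in the following sketch (the percentile conventions follow those stated for Figure 1; the sample values in the test are illustrative):

```python
# Relative difference in trip energy intensity (Equation (1)) and the
# box-plot summary statistics (5th/25th/median/75th/95th percentiles
# plus the mean, shown as a diamond marker in the figures).
import numpy as np


def relative_difference_pct(e_model, e_obd):
    """lambda_j = (E_j^m - E_j^o) / E_j^o, expressed in percent."""
    return 100.0 * (e_model - e_obd) / e_obd


def boxplot_stats(lambdas):
    """Summarize a distribution of per-trip lambda values."""
    lam = np.asarray(lambdas, dtype=float)
    return {
        "p5": np.percentile(lam, 5),
        "p25": np.percentile(lam, 25),
        "median": np.percentile(lam, 50),
        "p75": np.percentile(lam, 75),
        "p95": np.percentile(lam, 95),
        "mean": lam.mean(),
    }
```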
Notable observations from Figure 2 include the following:
• The average value (diamond marker) for tuned FASTSim models for the tuning set of trips (left box-plot in the green color tone pairs) remains within ±1% for all vehicle samples of all considered vehicle models. This is an indication of success for the optimization process being able to find a good solution for the tuning parameter values, but not necessarily an indication that the tuned FASTSim models can generalize well.
• The average value for tuned FASTSim models for the verification set of trips (right box-plot in the green color tone pairs) is mostly within ±5%, with the exception of Bolt vehicle #1, CR-V vehicle #2, and Prius Prime vehicle #1, which have errors of −6.6%, +6.0%, and +5.2%, respectively.
• The average value for window-sticker-based estimates (diamond marker in either of the yellow color tone box-plot pairs) is outside the bounds of ±10% for some vehicle samples across different vehicle models, with a few cases outside ±15% and one extreme case (C-Max Energi vehicle #7) exceeding 40%.
Observations from Figure 2 further highlight the limitation of window-sticker-based predictions when it comes to individual vehicle owners, which is consistent with the general message in the fuel economy guide that individual mileage can vary [7]. Baseline FASTSim models are mostly at the same level of accuracy as window-sticker-based predictions, and may have a bit of an advantage in being able to avoid large errors in extreme case samples (such as C-Max Energi vehicle #7 in Figure 2). Tuned FASTSim models on the other hand are observed to exhibit superior prediction performance of trip energy intensity (even on trips that were not part of the model tuning process), and this predictive capability was observable across all vehicle samples in all considered vehicle models. Individually calibrated vehicle models, however, imply that each vehicle (for each owner) can have a different value for correction mass and auxiliary power, which poses a challenge when the goal is to construct vehicle models that generalize across multiple owners.
To construct vehicle models that are capable of generalizing across different vehicle owners, we test the "group-wide" estimate for the tuning parameters (αT, αA, αM) in Table 1. The group-wide estimate for (αT, αA, αM) is essentially a weighted average (by total miles driven) over the vehicles that have been individually tuned (per Figure 2). The test set of trips for the group-wide tuned models includes trips in the tuning set, the verification set, as well as trips by vehicle samples in Figure 1 that were not included in the tuning process at all. The results of the group-wide validity assessment are shown in Figure 3, in which notable observations include the following:

• The average value (diamond shape) for both window-sticker-based estimates (yellow color tone box plots in Figure 3) and baseline FASTSim (gray color tone box plots) remains mostly within ±10% error bounds, with the exceptions of the C-Max Energi and CR-V for window-sticker-based estimates, and the Volt for baseline FASTSim.

• The average value for tuned FASTSim models (diamond shape of green color tone box plots in Figure 3) remains within ±3% error for all the studied vehicle models. The worst observed cases were for the Prius Prime and CR-V, at +2.8% and +2.2%, respectively.

It ought to be noted that a limitation of the overall calibration approach (where three tuning parameters adjust the FASTSim model for a group of vehicle owners) is that individual usage conditions by some owners, or certain trips by some owners, may vary significantly from the "typical" expectation of the calibrated models. For example, one trip may involve a heavier than normal load of passengers and cargo (thus not adhering to the expected value for the correction mass parameter) and/or take place during significantly worse than normal weather conditions. Such unusual trips do show up in Figure 3 when examining the 5th and 95th percentile trips (extension lines of the box plots), which, even for the calibrated models, can extend up to nearly ±30% error (as observed for models of the Bolt and Prius Prime in Figure 3).
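The mileage-weighted averaging behind the group-wide estimate can be sketched as follows (the parameter values and mileages are illustrative placeholders, not data from Table 1):

```python
# Sketch of the "group-wide" parameter estimate: a weighted average of
# each owner's individually tuned (alpha_A, alpha_M), weighted by that
# owner's total miles driven.


def group_wide_estimate(per_owner_params, miles_driven):
    """per_owner_params: list of (alpha_a, alpha_m) tuples, one per owner.
    miles_driven: matching list of each owner's total miles."""
    total_miles = sum(miles_driven)
    alpha_a = sum(p[0] * m for p, m in zip(per_owner_params, miles_driven)) / total_miles
    alpha_m = sum(p[1] * m for p, m in zip(per_owner_params, miles_driven)) / total_miles
    return alpha_a, alpha_m
```

Weighting by miles driven lets owners who contribute more real-world driving exert proportionally more influence on the shared parameter values.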

Conclusions
This paper adopted a calibration approach for FASTSim and tested its validity for nine vehicle models covering a variety of powertrains (three BEVs, four PHEVs, one HEV, and one CICE). The accuracy in predicting the average energy intensity was within ±7% in the verification set of trips by individual vehicle owners (compared with more than ±15% for baseline FASTSim and window-sticker-based estimation), and within ±3% when combining trips across multiple vehicle owners (compared with ±10% for baseline FASTSim and window-sticker-based estimation). It was thus demonstrated that the proposed calibration approach (with only three tuning parameters), while requiring some additional optimization work compared with baseline FASTSim, can improve the fidelity of the vehicle models for representative groups of vehicle owners. Achieving high accuracy in the simulation of individual trips, however, remains a challenge owing to various unforeseeable uncertainties that some trips may have.
It also ought to be noted that the calibration approach is not necessarily specific to FASTSim; thus, future work may include implementation and testing with other physics-based models for vehicle fuel economy simulations. Future work may also consider comparison with other black-box type estimators (besides window-sticker-based estimation), such as MOVES and EMFAC.