Predicting Electric Vehicle Consumption: A Hybrid Physical-Empirical Model

Abstract: Electric vehicles are becoming more important in our society. Using them in a fleet to minimize energy cost is, therefore, a compelling opportunity for taxi companies. To do so, it is crucial to develop accurate models that estimate the energy consumed when traveling from one point to another. Consumption can be estimated using a physical model, but such a model fails to fit real-world data, especially in taxi-driving conditions. We compare different approaches that learn from historical data in order to correct and improve the physical model. Similar techniques can be used to estimate consumption for a new vehicle model, which is useful for companies that want to add a vehicle model for which they have no historical data.


Introduction
TEO Taxi (Taxelco Inc., Montreal, QC, Canada) is a company that runs a fleet of 100% electric taxis (approximately 170 cars). This reduces both costs and greenhouse gas emissions [1]. However, the electric vehicles used by the company (Nissan LEAF, Kia Soul EV, Tesla Model S and Tesla Model X) do not offer as much range as conventional internal combustion engine cars. Since official ratings lack precision, especially in winter conditions [2][3][4], an accurate energy consumption prediction model is a real concern for the company. Such a model is required to optimize the usage of each vehicle it owns as well as to plan future acquisitions. TEO Taxi also wants to use the developed model to predict consumption more accurately for new/unknown vehicle models early on.
Such models do exist (e.g., [5,6]), but they are not adapted to managing electric taxis, especially in winter-driving conditions. Some of them rely on a physical model but do not take into account the specific constraints of taxi driving, such as frequent stops, doors being opened and closed frequently, and intense urban driving. Another very important factor is the temperature in which the fleet operates, which ranges from −30 °C to 30 °C. The range of any electric vehicle declines in cold temperatures [3,7,8], and the effect is worse in taxi driving, where doors open frequently. This leads to heat loss and to an increase in the energy consumed to maintain a comfortable temperature in the cabin.
Inspired by [5], our main goal is to adapt a physical model to take into consideration the various external factors that affect electric vehicle consumption in a taxi-driving context. Two objectives are considered: (1) to enhance the accuracy of a physical model using historical data from a known vehicle model by using linear regression, and (2) to allow a better prediction from a physical model for a new/unknown vehicle model for which no historical data are available.

Literature Review
Energy consumption is important for managing fleets of conventional combustion engine cars, and it is a key component for fleets of electric vehicles. The limited range of electric vehicles and its variability amplify its importance [8,10]. Existing work on fleets of electric vehicles mostly focuses on charging strategies (e.g., [11]) and dispatch strategies (e.g., [12,13]). Simulating such strategies calls for an accurate energy consumption model, but current models do not take into consideration the particularities of taxi driving.
Electric vehicle consumption prediction encompasses two related problems: to predict the range of an electric vehicle [14] or the consumption for a given trip [15]. Consumption is affected by various external factors such as temperature, route condition, and driving style [3,7,16]. It has been shown that these factors have a significant impact on energy consumption for a fleet [2].
De Cauwer et al. [5] proposed to correct physical models by learning from empirical data. They started with a physical model, a wheel power equation. Multiplying it by a distance $d$ gives the energy consumed over a very short distance traveled during a short time (Equation (1)). A short duration is necessary to fully retain the dynamics that affect an electric vehicle. The first two terms are the rolling resistance ($f \cos\theta$) and the potential energy ($\sin\theta$). The third one, $\rho C_x A (v/3.6)^2$, is the aerodynamic loss, and $ma$ is the loss or regeneration caused by acceleration. They then improved the model to include vehicle dynamics by applying a regression to historical data (Equation (2)), using multiple linear regression (MLR) to determine the values of the $B_i$. They also added a term for accessories consumption.
In Equation (2), $\mathrm{Aux}_T$ is the temperature scaling and $\mathrm{Aux}_t$ is the fraction of time the auxiliaries are switched on. They also studied how factors such as traffic lights, days of the week, traffic, and driving dynamics affect the consumption of micro driving segments. Since this relation is non-linear, they used a neural network for it.
The present article focuses on predicting the global consumption of each taxi trip, since we do not have the dynamic driving information needed to predict the consumption associated with individual micro driving segments. We instead take into consideration additional factors (e.g., winter tires and vehicle age) as well as interactions between factors. This can be done while keeping the model linear.

The Dataset: Working with Real Empirical Data
We have access to historical data for all TEO Taxi's vehicle models (Nissan LEAF, Kia Soul, Tesla Model S and Tesla Model X) for a year of operation. Coupled with public datasets from Environment Canada and NASA, we were able to set up the dataset described in Table 1. It contains information about the distance driven, speed, elevation, trip duration, temperature, wind speed, wind direction, starting and ending state of charge (SOC), vehicle id, vehicle model, date, driver, odometer, battery capacity, and vehicle positions. Each ride is divided into steps of approximately 3 s containing information obtained by a Fleet Carma data logger. The length of the steps does not allow fully capturing the exact driving dynamics, but it should allow a good enough approximation to develop models that adapt to taxi-driving conditions.

Cleaning the Data
Carefully inspecting the speed of consecutive steps of a given ride revealed some inconsistencies. The timestamps of the GPS points were not evenly spread in time as they should have been. The timestamps were therefore corrected, and a moving average was applied to smooth the remaining unexplained abnormalities. Taxi rides have an average duration of 20 min [17], and it was established with the company that rides of more than one hour most likely correspond to the car being brought to a garage or a similar situation. Likewise, rides of only a few seconds correspond to the driver starting, stopping and restarting the system, probably involuntarily. A known error in the logs occurs when a vehicle passes through a tunnel: the log shows it crossing the tunnel in 2-8 s, which yields speeds of around 900 to 2000 km/h depending on the case. We also considered that a driver in normal taxi-driving circumstances would never go faster than 130 km/h. In total, 28,283 out of 193,347 rides were removed.
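The cleaning rules above can be sketched as a simple ride filter. This is only an illustration: the function names, the 30 s lower bound standing in for "only a few seconds", the window size of the moving average, and the choice to discard a whole ride when any step exceeds 130 km/h are assumptions, not details given in the text.

```python
MAX_DURATION_S = 3600    # rides longer than one hour are treated as non-taxi trips
MIN_DURATION_S = 30      # hypothetical cut-off for rides of "only a few seconds"
MAX_SPEED_KMH = 130.0    # speed a taxi is assumed never to exceed


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends of the ride."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def is_valid_ride(duration_s, step_speeds_kmh):
    """Keep a ride only if its duration and all step speeds are plausible."""
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        return False
    # Tunnel artefacts show up as speeds of roughly 900 to 2000 km/h.
    return all(v <= MAX_SPEED_KMH for v in step_speeds_kmh)
```

A 20 min ride with speeds below 130 km/h passes the filter, while a ride containing a 1500 km/h tunnel artefact is rejected.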
The company wants to evaluate the accuracy of the models for different ride lengths. Table 2 presents the number of rides in the dataset per distance cluster of 5 km. Table 3 presents the average consumption per ride (kWh) for each vehicle model.

Models
This section presents the various models we evaluate in Section 5.1 and discusses their particularities.

Basic Physical Model
The first model is the Basic Physical Model from [5] (Equation (1)). It lacks aspects, such as powertrain efficiency, that are needed to predict energy consumption accurately [3]. It fails to fit real-world data, but it is used as a reference for the other models.
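As a rough illustration of the terms named for the physical model of [5] (rolling resistance, potential energy, aerodynamic loss and acceleration), the energy of one short step could be computed as below. This is a sketch under stated assumptions and not a transcription of Equation (1): the factor mg on the first two terms, the conventional 1/2 in the drag term, the conversion to kWh, and all parameter names are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def step_energy_kwh(mass_kg, f_roll, slope_rad, rho, c_x, area_m2,
                    speed_kmh, accel_ms2, distance_m):
    """Energy at the wheels (kWh) for one short step of a ride."""
    v = speed_kmh / 3.6  # km/h -> m/s
    rolling = mass_kg * G * f_roll * math.cos(slope_rad)
    grade = mass_kg * G * math.sin(slope_rad)
    # Assumption: conventional 1/2 factor in the aerodynamic drag force.
    aero = 0.5 * rho * c_x * area_m2 * v ** 2
    # Negative when the vehicle decelerates (possible regeneration).
    inertia = mass_kg * accel_ms2
    force_n = rolling + grade + aero + inertia
    return force_n * distance_m / 3.6e6  # joules -> kWh
```

Summing this quantity over the roughly 3 s steps of a ride would give a trip-level estimate from the basic physical model.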

Extended Physical Model
This model is the same as the previous one, but with an additional term related to air conditioning and accessories (the term $\mathrm{Aux}_T \mathrm{Aux}_t$ from Equation (2)). $\mathrm{Aux}_t$ was set to 1.0 (always on). The temperature term ($\mathrm{Aux}_T$) expresses energy as a function of the outside temperature (in Celsius). We defined $\mathrm{Aux}_T$ as a non-linear function (Figure 1) derived from historical data. The function is divided into five segments, and the coordinates of each breakpoint were determined by fitting the data with a solver that performs a least-squares minimization. The logic behind this function is the following: the colder it is, the more energy is needed to heat the cabin. There is a point at which the heating component reaches its maximal energy consumption. As the temperature rises, there is a point from which air conditioning is used, which takes less energy than heating [18]. Finally, there is a point where neither air conditioning nor heating is used, but other accessories such as the radio and the lights still are.
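A five-segment piecewise-linear function of this kind could look like the sketch below. The breakpoint coordinates are purely hypothetical placeholders (the fitted values are those of Figure 1 and are not reproduced here), and the shape chosen, a heating plateau, a decreasing ramp, an accessories-only floor, a cooling ramp and a cooling plateau, is one reading of the description above.

```python
# Hypothetical breakpoints: (outside temperature in deg C, auxiliary power in kW).
# These values are placeholders, not the coordinates fitted by the authors.
BREAKPOINTS = [(-20.0, 3.0), (10.0, 0.5), (20.0, 0.5), (30.0, 1.5)]


def aux_power_kw(temp_c, points=BREAKPOINTS):
    """Piecewise-linear auxiliary power with flat plateaus at both ends."""
    if temp_c <= points[0][0]:
        return points[0][1]    # heating saturated
    if temp_c >= points[-1][0]:
        return points[-1][1]   # air conditioning saturated
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= temp_c <= t1:
            # Linear interpolation inside the segment.
            return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)
```

With these placeholder values, the function returns the heating plateau at −30 °C and the accessories-only floor at 15 °C.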

Fitted Extended Physical Model
This model corresponds to Equation (2) with $\mathrm{Aux}_T \mathrm{Aux}_t$ as defined in Section 4.2. Using the historical database containing thousands of taxi rides (Table 1), we determine the best values for the weight parameters ($B_i$) using multiple linear regression (MLR). With this technique, we correct the extended physical model to take into consideration external factors that were not initially considered by the model (that is, taxi-specific conditions). This improves the accuracy of the model for future rides. Moreover, some factors, such as battery capacity [19] and rolling resistance [20], are not constant and vary with external elements such as temperature. These variations can, in part, be corrected while training the models.
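To make the fitting step concrete, here is a minimal ordinary least-squares sketch that estimates one weight per physical term from past rides by solving the normal equations. It uses only the standard library; the function names are hypothetical, and fitting without an intercept is an assumption rather than something the text specifies.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(term_rows, measured_kwh):
    """Least-squares weights B_i such that E is close to sum_i B_i * term_i."""
    k = len(term_rows[0])
    xtx = [[sum(r[i] * r[j] for r in term_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(term_rows, measured_kwh))
           for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Each row of `term_rows` would hold the values of the physical terms for one ride, and `measured_kwh` the consumption logged for that ride. In practice a numerical library would be preferable for stability with many correlated terms.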

Learned Empirical Model
The information contained in the database (Table 1) is more diverse than what is necessary for the Extended physical model. We created a Learned Empirical Model based on the available information. This model has access to more information than the Extended Physical Model; this could offer greater flexibility and better accuracy. It can be trained using the same method of learning as for the fitted extended physical model (MLR).

Hybrid Model
Another alternative is to combine the extended physical model with the learned empirical model. This model has the potential to further improve the results of the two models by exploiting the strength of each one.

Adding Interaction Terms into the Hybrid and Learned Empirical Models
The Hybrid and Learned Empirical Models have multiple parameters that could interact with one another. Interaction terms are products of two or more terms of the linear regression. For example, given a term X and a term Y, it is possible that X and Y are related, in which case adding the term X · Y could greatly improve the accuracy of the linear regression [21]. The weight assigned by the linear regression to this term reflects the importance of the interaction for the model. Since there are approximately 35 terms in the hybrid model and in the learned empirical model, the total number of possible interactions between two terms is 595, and most of them have no logical justification. This number rises to 6545 for interactions among three terms. Therefore, we only selected terms whose interactions could logically have a significant effect.
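The two counts quoted above are the numbers of unordered pairs and triples that can be drawn from 35 terms, which can be checked directly:

```python
from math import comb

n_terms = 35
pairs = comb(n_terms, 2)    # 35 * 34 / 2
triples = comb(n_terms, 3)  # 35 * 34 * 33 / 6
print(pairs, triples)       # prints: 595 6545
```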
As for the basic and extended physical model, there is no logical interaction between the terms.

Interaction Terms for the Learned Empirical Model
For the learned empirical model, we added the square of each individual term. This allows the linear regression to put greater weight on a parameter when it grows larger. In addition, we added five more interaction terms, all related to the average speed: average speed multiplied by (1) the lost altitude, (2) the odometer, (3) the distance (4) winter and (5) summer. The reason behind (1) is that the possible recovery of energy when going downhill might be related to the speed at which we are driving. We also wanted to have a term which could be related to the state of health of the battery. Term (2) is used for this reason and is logical because the more the vehicle drove, the more likely it is that its battery has a weaker state of health. The term (3) serves as a logical combination of the distance driven with the average speed during that distance. Finally, terms (4) and (5) are related to the fact that winter and summer might affect consumption differently. We do not have any other non-linear terms with which we could interact and it is difficult to find other interaction terms that are logical. Therefore, we decided to keep those five terms.
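The feature expansion described above (squares of every term plus the five average-speed interactions) might be built as follows. The dictionary keys are hypothetical names chosen for this sketch; the actual column names of the dataset are not given in the text.

```python
SPEED_PARTNERS = ("lost_altitude", "odometer", "distance", "winter", "summer")


def add_interactions(ride):
    """Return the base terms plus their squares and five speed interactions."""
    features = dict(ride)
    # Square of each individual term.
    for name, value in ride.items():
        features[name + "^2"] = value * value
    # Average speed multiplied by each of the five selected terms.
    for other in SPEED_PARTNERS:
        features["avg_speed*" + other] = ride["avg_speed"] * ride[other]
    return features
```

Because the expanded features enter the regression as ordinary columns, the model stays linear in its weights even though it is non-linear in the original terms.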

Interaction Terms for the Hybrid Model
For the Hybrid Model, since it already encompasses the non-linear terms of the Extended Physical Model, we only added interaction terms involving these. We did not add the square of each term as we did in Section 4.6.1. Instead, we added three groups of interactions: the odometer multiplied by all the terms of the extended physical model, to account for the state of health as mentioned in Section 4.6.1; the lost altitude multiplied by the terms of the basic physical model, for the same reasons as in Section 4.6.1; and winter and summer multiplied by all the terms of the extended physical model, to allow these terms to be weighted differently during these seasons. We multiplied the lost altitude only with the terms of the basic physical model because there is no logic in multiplying it with the accessories consumption term or the climate control term.

Experiments
In this section, we first evaluate the models presented in Section 4 according to how well they predict the energy consumption for a ride. For each model, we report the mean absolute error (MAE), that is, the average over the test rides of the absolute difference between the predicted and the real energy consumption of a ride, in kWh. MAE is the best indicator for the company as it reflects how much they can trust the model and what safety margin they should use while dispatching. A smaller MAE allows for a more efficient use of each vehicle in the fleet and helps to maximize their use.
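Written out, the metric is the standard mean absolute error. The symbols are introduced here for illustration only: $n$ is the number of test rides, $\hat{E}_i$ the predicted consumption of ride $i$ and $E_i$ its measured consumption, both in kWh.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{E}_i - E_i \right|
```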
The models are trained separately for each vehicle model to allow the best fit. For each vehicle model, a subset of the database (80% of the taxi rides) is randomly chosen in each distance cluster in order to define the training set.
Each resulting trained model is tested using the 20% remaining taxi rides. We repeat this process 10 times to create a 95% confidence interval.
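The evaluation protocol above (an 80/20 split drawn inside each 5 km distance cluster, repeated 10 times) can be sketched as follows. The `distance_km` key and function names are hypothetical, and the use of a Student t interval is an assumption, since the text does not say how the 95% confidence interval was computed.

```python
import random
import statistics

T_975_DF9 = 2.262  # Student t quantile for a 95% interval with 10 repetitions


def stratified_split(rides, train_fraction=0.8, rng=random):
    """Draw train_fraction of the rides at random inside each 5 km cluster."""
    clusters = {}
    for ride in rides:
        clusters.setdefault(int(ride["distance_km"] // 5), []).append(ride)
    train, test = [], []
    for members in clusters.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def confidence_interval_95(maes):
    """Mean and half-width of a 95% interval over exactly 10 repeated MAEs."""
    assert len(maes) == 10, "the t quantile above is for 10 repetitions"
    mean = statistics.mean(maes)
    half_width = T_975_DF9 * statistics.stdev(maes) / len(maes) ** 0.5
    return mean, half_width
```

Splitting inside each cluster keeps the proportion of short and long rides the same in the training and test sets, which matters because long rides are much rarer than short ones.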

Results
As a reference, Table 4 presents the actual average trip consumption in the test set. Tables 5-11 present the MAE by vehicle model for each of the models of Section 4. Figure 2 presents aggregated results. It shows the weighted MAE (all vehicles aggregated) divided by the actual trip consumption, according to the trip length. Globally, the longer the trip, the smaller the relative error (this is true for every model). The relative error is about 30% for the 0-5 km distance cluster. This may be explained by the fact that the models lack information about events that have a greater impact on the shortest trips (e.g., door openings), which makes these trips more error-prone.
The company showed more interest in the results for long trips (the relative error is about 12% for the 20-25 km distance cluster). Long trips caused them more problems in the past, as the dispatchers had to introduce very large buffers because of uncertainty. They were very pleased that we were able to keep the error low for long trips while taking temperature and other environmental factors into account. This allows the company to use a smaller buffer (1 kWh) and thus to use their electric vehicles more efficiently.
As expected, all the models are better than the basic physical model (which we only use as a reference). The models using additional empirical data are better than the simpler fitted models. Adding interaction terms helps both models. Hybrid models are better than all the others. Figure 3 presents the average MAE of each model for each vehicle model. On average, we observed a reduction of the error by 51.75% for the hybrid model with interaction terms in comparison to the basic physical model.
Since we use the same training and test data for each model, we can verify whether the MAEs of the models are significantly different using a technique called the difference score [23]. Results show that the Hybrid Model using interaction terms dominates or co-dominates all models. For the Tesla Model S, the use of interaction terms does not significantly improve the results. Finally, the hybrid model dominates all models for the Tesla Model X except for the learned empirical model that uses interaction terms; these two co-dominate all other models. The hybrid model with interaction terms is always better than or equivalent to all other models. This model also has the lowest MAE for the longest distance cluster, and it tends to have smaller confidence intervals. This means that it captures the impact of long-range taxi driving more accurately than the other models.

Removing Unnecessary Terms
We showed that the hybrid model using interaction terms dominates most of the other models. When two models provide equivalent results, the simplest is generally preferred [24]. Since the hybrid model contains many terms, we explored possible ways of removing unnecessary ones. The results for the hybrid model presented in Section 5.1 are those of the hybrid model from which we removed the terms identified in Sections 5.2.1 and 5.2.2.
We used recursive feature elimination with cross-validated selection of the best number of features (RFECV) [25] to detect terms that are not useful to our linear regression.
Using this technique, we are able to determine that the parameter Time since last charge has no significant impact on the results. It is the only term that was identified using RFECV. This means all other terms have an impact on the accuracy of the Hybrid Model.

Removing Terms Using Recursive Feature Elimination (RFE)
Some parameters might have an impact on performance, but this impact might be very low. To detect these parameters, we can use recursive feature elimination (RFE) [25].
Using this method, we are able to remove the terms precipitation and gained altitude from the Hybrid Model without significantly affecting its results. This was tested with the same technique as described in Section 5.1.

Predicting Consumption for a New Vehicle
Predicting the consumption for a new/unknown vehicle is crucial for companies that manage a fleet of electric vehicles. Physical models are interesting because they can easily adapt to new vehicle particularities. We could use a similar technique as previously but we, unfortunately, lack historical data to learn from. Therefore, we propose that, for each vehicle model, we evaluate if training using data from other known vehicles allows accurate predictions.
We use linear regression to learn from all vehicles except the one we consider as the new/unknown vehicle. As an example, for the Nissan LEAF, our training set contained all data from the Kia Soul, Tesla Model S and Tesla Model X. We then test the resulting model on all Nissan LEAF trips using the Nissan LEAF physical parameters. We do not have access to data for driven distances larger than 25 km for the Nissan LEAF, and we exclude data from the Nissan LEAF when training for other cars. Table 12 presents the results for all vehicle models as the MAE, in kWh, between the predicted and the real energy consumption. Table 13 presents the relative error for each model. Since each test set contains all data about a specific vehicle model, the values are exact and have no confidence intervals. Tables 12 and 13 show that the fitted extended physical model is the best model except for the Nissan LEAF; on average, it reduces the error by 30.75% in comparison to the basic physical model. This is a smaller improvement than the one presented in Section 5.1, which is expected because we do not learn from data on the vehicle model itself. The results are still better than those of the basic physical model. Even though there are some significant differences in the behavior of each vehicle model, we can still model general rules that apply to an electric vehicle in a taxi-driving context using data from other vehicles.
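The leave-one-vehicle-model-out procedure described above can be sketched as a generator of train/test splits. The `vehicle_model` key and the `excluded_from_training` parameter are hypothetical names for this illustration; the parameter is one way to express the exclusion of Nissan LEAF data when training for the other cars.

```python
def leave_one_model_out(rides, excluded_from_training=()):
    """Yield (held_out_model, train_rides, test_rides) for each vehicle model."""
    models = sorted({ride["vehicle_model"] for ride in rides})
    for held_out in models:
        # Train on every other vehicle model, minus any excluded ones.
        train = [r for r in rides
                 if r["vehicle_model"] != held_out
                 and r["vehicle_model"] not in excluded_from_training]
        # Test on all rides of the held-out ("new/unknown") vehicle model.
        test = [r for r in rides if r["vehicle_model"] == held_out]
        yield held_out, train, test
```

For each split, the weights would be fitted on `train` and the physical terms of the test rides computed with the physical parameters of the held-out vehicle.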
Except for the Nissan LEAF, we can observe that adding terms to the model does not improve the results. It even makes it worse than using the fitted extended physical model. Models that worked better in the experiment presented in Section 5.1 now have difficulties correctly predicting the energy consumption. It can be explained by the phenomenon of overfitting [26]. It can happen when a model highly focuses on the learning data at the cost of generality. This situation happens mostly for regression models that consider interaction between factors, as for example the hybrid model. Also, the bigger the training set is in comparison with the test set, the more likely it is that overfitting happens.

Conclusions
Using historical data to train the extended physical model with linear regression improves its accuracy by as much as 44% in comparison with the basic physical model. This is crucial from the point of view of the company: it allows them to use smaller buffers when dispatching, which results in a more optimized system and a more efficient use of the electric vehicles. We then propose a new hybrid model that uses interaction terms and historical data to further enhance accuracy. It reaches at best an improvement of about 61% when compared with the basic physical model. However, the historical data contain a lot of variation because they come from sensors that are subject to various errors and to many uncontrolled parameters [4]. This variation suggests that the results are not as precise as they could be. Some of it has been captured by the Hybrid Model using interaction terms, which explains its better accuracy. It is, however, possible that other sources of variation were not captured by our models. Finally, our dataset did not contain enough rides in the distance clusters larger than 25 km, so the models could not be evaluated for longer driven distances. As for future work, since our best model included some non-linear terms, it is reasonable to suppose that other methods offering non-linear capabilities might be effective and should be explored.