Understanding the Energy Consumption of Battery Electric Buses in Urban Public Transport Systems

Abstract: The ever-increasing concerns over urban air quality and noise pollution, together with considerable savings in total cost of ownership, have encouraged more and more cities to introduce battery electric buses (e-buses). Based on the sensor records of 99 e-buses, covering over 250,000 h of operation across 4.7 million kilometers, this paper unveils the relationship between driving behaviors and e-bus battery energy consumption under various environments. Battery efficiency was evaluated by the distance traveled per unit battery energy (1% SoC, State of Charge). Mixed effect regression was applied to quantify the magnitude and correlation between multiple factors, and 13 machine learning methods were adopted for enhanced prediction and optimization. Although regenerative braking could make a positive contribution to e-bus battery energy recovery, unstable driving styles with greater speed variation or acceleration would consume more energy and hence reduce the battery efficiency. The timing window is another significant factor, and the results showed higher efficiency at night, over weekends, or during cooler seasons. Assuming normal driving behavior, this paper investigated the most economical driving speed in order to maximize battery efficiency. An average improvement of 19% could be achieved, and the optimal driving speed is time-dependent, ranging from 11 to 18 km/h.

speed strategies for passenger EVs; however, the present study hypothesized that such speeds might not be suitable for e-bus operations in urban public transport systems. After training and identifying the most predictive machine learning model, an optimized bus traveling speed strategy will be presented in the following sections.

Materials and Methods
More than one year (from May 2017 to July 2018) of operational data were collected from 99 e-buses running in seven different cities in China. All of the buses were of the same type (Table 1), and the in-vehicle sensors, GPS (Global Positioning System) and BMS (Battery Management System), recorded bus status every 10 s. Figure 1 illustrates the distribution of the average historical temperature (daily high, low, and average) for the seven cities in this study.

This paper aims to analyze battery performance while a bus is running along the road; therefore, data were aggregated by battery discharge period. A discharge period started when a bus left the charging station or was restarted, covered the driving time in the city transit network, and ended when the bus was switched off. Figure 2 visualizes example driving profiles on two afternoons between 2 p.m. and 4 p.m.; on each afternoon, this particular bus experienced two battery discharge periods with one charging period in between. For each discharge period, several features were calculated, including average speed, standard deviations of acceleration and deceleration, mileage, and SoC (State of Charge). Using Figure 2 as an example, Table 2 summarizes the statistics for each battery discharge period; the calculated distance per SoC is treated as the target variable in the following sections.
After data cleaning and aggregation, a total of 62,057 battery discharge periods were extracted, covering 4,727,206 km traveled over 256,979 h. Table 3 provides the summary statistics of all continuous modeling variables and of three time-related categorical covariates that were extracted from GPS time stamps. The first is the meteorological temperate season: Spring covers March to May, with a historical average daily temperature from 12 to 22 °C; Summer (25-26 °C) ranges from June to August; Fall (19-26 °C) is between September and November; and Winter (8-12 °C) lasts from December to February. The second is the time of day, where the daytime starts at 6 a.m. and the nighttime begins at 6 p.m. The third distinguishes weekdays (Monday to Friday) from weekends (Saturday or Sunday).
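The aggregation of 10 s sensor records into per-period features can be sketched as follows. This is a minimal illustration, not the processing pipeline used in the study: the record layout, the field names, the definition of acceleration as the difference between consecutive speed readings, and the choice of the period's starting time stamp for the categorical covariates are all assumptions made here. The sample values are made up for easy arithmetic and are not physically realistic.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

SAMPLE_SECONDS = 10  # sensor interval stated in the text

def time_covariates(ts):
    """Map a time stamp to (season, time of day, day type) as defined in the text."""
    seasons = {12: "Winter", 1: "Winter", 2: "Winter",
               3: "Spring", 4: "Spring", 5: "Spring",
               6: "Summer", 7: "Summer", 8: "Summer"}
    season = seasons.get(ts.month, "Fall")  # September to November
    time_of_day = "Day" if 6 <= ts.hour < 18 else "Night"
    day_type = "Weekday" if ts.weekday() < 5 else "Weekend"
    return season, time_of_day, day_type

def summarize_period(samples):
    """Summarize one discharge period.

    samples: list of dicts with hypothetical keys
    'time', 'speed_kmh', 'odometer_km', 'soc' (percent).
    """
    speeds = [s["speed_kmh"] for s in samples]
    # Assumed definition: acceleration in m/s^2 from consecutive speed readings.
    accel = [(b - a) / 3.6 / SAMPLE_SECONDS for a, b in zip(speeds, speeds[1:])]
    pos = [x for x in accel if x > 0]
    neg = [x for x in accel if x < 0]
    distance = samples[-1]["odometer_km"] - samples[0]["odometer_km"]
    soc_used = samples[0]["soc"] - samples[-1]["soc"]
    return {
        "speed_mean": mean(speeds),
        "speed_sd": pstdev(speeds),
        "acce_sd": pstdev(pos) if pos else 0.0,
        "dece_sd": pstdev(neg) if neg else 0.0,
        "distance_km": distance,
        "soc_start": samples[0]["soc"],
        "soc_end": samples[-1]["soc"],
        # Target variable: distance traveled per 1% SoC.
        "dpsoc": distance / soc_used if soc_used > 0 else None,
        "time": time_covariates(samples[0]["time"]),
    }

start = datetime(2017, 7, 3, 14, 0, 0)  # a Monday afternoon in Summer
raw = [(0, 100.0, 97), (18, 100.5, 97), (36, 101.5, 96), (18, 102.5, 96), (0, 103.0, 95)]
samples = [
    {"time": start + timedelta(seconds=SAMPLE_SECONDS * i),
     "speed_kmh": v, "odometer_km": d, "soc": s}
    for i, (v, d, s) in enumerate(raw)
]
features = summarize_period(samples)
```

Periods with no net SoC drop return `None` for the target so that they can be filtered out during data cleaning; how such periods were actually handled is not stated in the text.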
The general form of the linear mixed effect model in matrix notation is given by the following equation [31]:

$$y = X\beta + Zu + \varepsilon \quad (1)$$

where y is the target variable with size N × 1; X is an N × p matrix of p predictors; β is the vector of fixed-effect coefficients with size p × 1; Z is the random complement to the fixed X with size N × q; u is a q × 1 column vector of the random effects; and ε is the residual vector that accounts for the randomness that cannot be explained by Xβ + Zu. In this study, the bus-specific correlations need to be considered to accommodate unobserved driving heterogeneity. The final modeling dataset has 62,057 records (N = 62,057) from 99 e-buses (q = 99). All the continuous input variables were z-scored to allow effect comparison on a uniform scale, and the R package lme4 (Linear Mixed Effects Models using 'Eigen' and S4) [32] was used to fit the model.
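For illustration, the mixed effect model described above can be written per record if the random effect is taken to be a single intercept per bus. This is an assumption that is consistent with q = 99 but is not stated explicitly in the text. The indices i and j, the variance symbols, and the normality assumptions (the usual ones for linear mixed models) are notation introduced only for this sketch.

```latex
y_{ij} = x_{ij}^{\top}\beta + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
\qquad j = 1, \dots, 99
```

Here i indexes the discharge periods of bus j, and Z reduces to an N × 99 indicator matrix that marks which bus produced each record. The shared term u_j is what makes records from the same bus correlated.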

Results
In this section, the linear mixed effect regression model is first estimated to quantify the magnitude and correlation between multiple factors and battery efficiency. Then, after hyper-parameter fine-tuning, training, random sampling, and comparison of 13 different machine learning algorithms, the Random Forest is identified as the most predictive method. Finally, the Random Forest regressor is used to search for the optimal speed via stochastic differential evolution optimization.
Table 4 presents the mixed effect modeling results for the average distance traveled per SoC. The average speed is statistically significant, and its positive coefficient indicates that the faster the speed, the longer the distance that can be traveled per unit battery energy. The standard deviations of bus operation speed and acceleration reflect the operator's driving behavior: the higher the speed variation, the greater the driving fluctuations, and the higher the acceleration, the more frequent the sudden pressing of the brake or accelerator pedal. Both have significant negative effects, suggesting that unstable driving behavior can greatly increase battery energy consumption and hence reduce the distance traveled. The seasonal indicator was treated as a categorical variable with Spring as the baseline. All seasonal levels are significant, showing that environmental temperature plays an important role in battery energy consumption. It is plausible that the higher the temperature (in Summer), the heavier the usage of bus air conditioners, which increases energy consumption and shortens the total distance that can be reached. More people use buses during the daytime or on weekdays, so battery energy could be exhausted faster under the heavier load. The positive coefficient of nighttime is consistent with this hypothesis: carrying fewer passengers at night or on weekends, the bus can travel farther.

Predicting Energy Consumption via Machine Learning
While the mixed effect regression model is well suited to factor interpretation, its prediction accuracy might be compromised by its linear nature. In this section, several machine learning algorithms (including both linear and non-linear methods) are assessed to explore prediction accuracy. Figure 3 illustrates the framework used in the current paper for model hyper-parameter tuning, model selection, and model validation. Thirteen machine learning estimators were tested, including linear regressors (Linear Regression, Lasso, Ridge, Elastic Net, and Robust Huber Regression), a kernel-based method (Support Vector Machine), a method with dimension reduction (PLS), neural networks (multi-layer perceptron (MLP) and deep neural network with dropout), a tree-based method (Decision Tree), and ensemble methods (Random Forest, Boosting Trees, and XGBoost). The goal is to identify the most predictive method for unseen data; therefore, the whole dataset was randomly divided into training, validation, and testing samples in the ratio of 60%, 20%, and 20%. Each estimator has a set of hyper-parameters (parameters that control the learning process), which were tuned via an exhaustive grid search based on the training and validation datasets. Additionally, to account for randomness, this process was repeated 50 times via random sampling. Within each iteration, 80% of the data (training plus validation) were used to fine-tune the model and the remaining unseen 20% were reserved as the testing data to verify the model's performance.
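The repeated 60/20/20 random sampling described above can be sketched with the standard library alone. The function names and the use of the repetition index as the random seed are assumptions made for this illustration; the text does not say how the splits were generated.

```python
import random

def split_indices(n, seed, fractions=(0.6, 0.2, 0.2)):
    """Shuffle range(n) and cut it into train / validation / test index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def repeated_splits(n, repeats=50):
    """Yield one independent 60/20/20 split per repetition."""
    for r in range(repeats):
        yield split_indices(n, seed=r)

splits = list(repeated_splits(n=1000))
train, val, test = splits[0]
```

In each repetition, the hyper-parameter grid would be scored on `val` after fitting on `train`, and only the selected configuration would then be evaluated on `test`.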
Mean Square Error (MSE), the average squared difference between the predicted values and the actual values, was chosen as the performance metric in Figure 4. The Random Forest regressor outperforms the other machine learning methods, as evidenced by the lowest MSE scores on both the training and testing datasets. Figure 4 also embeds two parity plots for Random Forest from one iteration of the random sampling process, where the x-axis represents the true values and the y-axis the predictions. The bottom plot shows training performance and the upper plot is based on the testing dataset. Overall, the model performance is good, as the points cluster along the 45° diagonal line. However, some systematic bias was observed: the regressor overestimates (the prediction is higher than the true value) at the lower end and underestimates (the prediction is lower than the true value) at the higher end.
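As a small illustration of the metric and of the kind of bias described above, the sketch below computes the MSE and the mean signed error in the lower and upper halves of the true values. The helper `mean_error_by_half` and the numbers are invented for this example; they are not results from the study.

```python
def mse(y_true, y_pred):
    """Average squared difference between predictions and true values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_error_by_half(y_true, y_pred):
    """Mean signed error (prediction minus truth) below and above the median truth."""
    pairs = sorted(zip(y_true, y_pred))
    mid = len(pairs) // 2

    def mean_err(part):
        return sum(p - t for t, p in part) / len(part)

    return mean_err(pairs[:mid]), mean_err(pairs[mid:])

# Made-up values that overestimate at the low end and underestimate at the high end.
y_true = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y_pred = [1.3, 1.7, 2.1, 2.4, 2.8, 3.2]

error = mse(y_true, y_pred)
low_bias, high_bias = mean_error_by_half(y_true, y_pred)
```

A positive `low_bias` together with a negative `high_bias` corresponds to points lying above the diagonal at the left of a parity plot and below it at the right.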

Optimization to Identify the Most Economical Driving Speed
Once the Random Forest estimator was fine-tuned and selected, the next question is how to optimize the target variable, that is, to maximize the total distance traveled per SoC (DPSoC) and thereby achieve a higher battery efficiency. Although the Random Forest regressor has multiple input features, this study aims to find the most economical driving speed based on typical driving behavior and driving environment during weekdays. The median values were adopted to represent typical driving behavior: an operator started a bus with a 97% charge and drove it normally (with 18.64 km/h as the speed standard deviation, 0.39 as the overall acceleration variation, and 0.33 as the deceleration variation) along the city transit network; the bus consumed 43% of its battery energy and returned to the charging station with an ending SoC of 54%. As discovered by the statistical model, timing is critical to e-bus performance; therefore, the optimization was conducted for eight different situations:

$$\begin{aligned} \max_{\mathrm{Speed}_{mean}} \ & \mathrm{DPSoC} = \mathrm{RandomForest}(\mathrm{Speed}_{mean}, \mathrm{Speed}_{sd}, \mathrm{Acce}_{sd}, \mathrm{Dece}_{sd}, \mathrm{SoC}_{start}, \mathrm{SoC}_{end}, \mathrm{Time}) \\ \text{subject to: } & \mathrm{Speed}_{sd} = 18.64,\ \mathrm{Acce}_{sd} = 0.39,\ \mathrm{Dece}_{sd} = 0.33,\ \mathrm{SoC}_{start} = 97\%,\ \mathrm{SoC}_{end} = 54\%,\ \mathrm{Time}\ \text{fixed to one of the eight situations} \end{aligned} \quad (2)$$

Equation (2) summarizes the objective function, which is subject to the normal driving assumptions above. Random Forest is an ensemble black-box model that provides limited gradient information. Differential evolution optimization was used because it is a stochastic method and does not depend on the gradient to find the optimum [33]. Table 5 provides the optimization results for the different time windows. It was impossible to find records in the historical dataset that exactly match the constraints (speed variation, acceleration/deceleration, and SoC as listed in Equation (2)); therefore, the existing unoptimized distance per SoC was approximated by averaging all the data points around the constraints within a ±2.5% buffer (the buffer was increased in steps of 0.25% if no data point was identified).
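The buffer-based approximation of the unoptimized baseline can be sketched as below. The text does not specify how the buffer is measured, so this sketch assumes it is a relative tolerance applied to each constraint value; the function name, the record keys, the upper limit on the buffer, and the sample records are likewise hypothetical.

```python
def baseline_dpsoc(records, targets, start=0.025, step=0.0025, max_steps=200):
    """Average DPSoC of the records whose features all lie within a relative
    buffer of the target values; widen the buffer until a record is found."""
    for k in range(max_steps + 1):
        buffer = start + k * step
        matches = [
            r["dpsoc"] for r in records
            if all(abs(r[name] - value) <= buffer * abs(value)
                   for name, value in targets.items())
        ]
        if matches:
            return sum(matches) / len(matches), buffer
    raise ValueError("no record found within the maximum buffer")

# Constraint values quoted in the text; the records themselves are made up.
targets = {"speed_sd": 18.64, "acce_sd": 0.39, "dece_sd": 0.33}
records = [
    {"speed_sd": 18.70, "acce_sd": 0.400, "dece_sd": 0.330, "dpsoc": 1.8},
    {"speed_sd": 18.50, "acce_sd": 0.385, "dece_sd": 0.335, "dpsoc": 2.0},
    {"speed_sd": 25.00, "acce_sd": 0.390, "dece_sd": 0.330, "dpsoc": 3.0},
]

baseline, used_buffer = baseline_dpsoc(records, targets)
```

With these records, only the second one falls inside the initial ±2.5% window. If it is removed, the first record is picked up after one widening step, at a buffer of 2.75%.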
An average of 19.5% improvement was observed after applying the most economical speed strategy.
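To make the gradient-free search concrete, the sketch below runs a basic differential evolution (rand/1 mutation with greedy selection) over a single variable, the mean speed. The trained Random Forest is not available here, so `surrogate_dpsoc` is a hypothetical stand-in with a known peak at 15 km/h; the bounds, population size, and mutation weight are also assumptions. Library implementations exist (for example `scipy.optimize.differential_evolution`), but the text does not state which one was used.

```python
import random

def surrogate_dpsoc(speed_mean):
    """Hypothetical stand-in for the trained regressor: peaks at 15 km/h."""
    return 2.2 - 0.01 * (speed_mean - 15.0) ** 2

def differential_evolution_1d(objective, low, high, pop_size=20,
                              generations=200, weight=0.7, seed=0):
    """Maximize a black-box function of one variable with differential evolution."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # With a single variable, crossover always takes the mutant value.
            trial = pop[a] + weight * (pop[b] - pop[c])
            trial = min(max(trial, low), high)  # keep the trial inside the bounds
            trial_score = objective(trial)
            if trial_score >= scores[i]:  # greedy selection, no gradient needed
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]

best_speed, best_dpsoc = differential_evolution_1d(surrogate_dpsoc, 5.0, 40.0)
```

In the setting of the paper, the objective would instead call the fitted regressor with the other inputs held at their fixed values, once for each time window.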

Discussion
There were approximately 385,000 e-buses globally in 2017, and 99% of them were operated in China [1]. Using data from 99 e-buses in seven cities in China, the present paper treated the total distance traveled per unit SoC as a surrogate for battery efficiency and examined its correlations with driving behaviors under various environments.
A statistical model with mixed effects can account for unobserved driving heterogeneity and quantify the correlation and magnitude of each factor. Specifically, the average driving speed is essential to battery efficiency: the higher the speed, the longer the distance that could be traveled. Both speed variation and acceleration reflect driving behavior, which could be attributable to the operator's intrinsic driving style or to traffic conditions. Their modeling coefficients were negative, indicating that the more unstable maneuvers were performed, the shorter the distance that can be covered per unit battery energy. Dedicated bus lanes (DBL) can free buses from traffic interference [34], and a previous study found that they could save bus running time and reduce operational variability [35]. To provide a better driving environment and further enhance battery efficiency, DBL are recommended because operators can drive faster and more smoothly. As for regenerative braking, it should be partially reflected by the standard deviation of deceleration during each battery discharge period. Its coefficient is positive, suggesting that regenerative braking can make a positive contribution to e-bus battery energy recovery.
In addition to different driving behaviors, this paper also identified strong correlations between battery energy consumption and timing windows. Temporal factors such as time of day and day of week matter: e-buses can travel 6.3% longer per unit battery energy at night and 0.3% longer over weekends. This improvement is plausible because daytime traffic is generally more congested than nighttime traffic and, with fewer passengers on the bus, the battery can last longer. The seasonal effect is another dominant factor. With Spring as the baseline, battery efficiency was reduced during hotter months (by 12.7% in Summer and 2.5% in Fall), whereas it increased by 3.8% during Winter. It is well acknowledged that a lithium-ion battery cell often performs worse at lower temperatures due to increased resistance. However, most of the buses in the current study were operated in warm southern cities in China, where the average temperature during Winter is around 10 °C. Although detailed data are not available, it is highly likely that the major reduction in battery efficiency during Summer is caused by air conditioning usage.
While the mixed effect model helps to understand correlations in existing historical data, its ability to predict the unknown is limited. After a comprehensive evaluation of 13 machine learning algorithms, the Random Forest was chosen because of its lowest modeling errors on both the training (understanding existing patterns) and testing (predicting the unknown) datasets. It was integrated into the next analysis stage: identifying the optimum speed that can help bus drivers travel farther per unit SoC. The results show that the average maximum achievable distance is about 2.24 km per unit battery energy, with a reduction in Summer (1.96) and increases in Spring (2.36), Fall (2.31), and Winter (2.36).
As hypothesized, the optimal speed for e-buses is much lower than that reported in previous research on passenger EVs, and an average improvement of 19.5% could be achieved by adopting the most economical speed under a normal driving style. Unlike passenger EVs, e-buses serve as public transportation, and our results can provide scientific guidance to help bus drivers operate e-buses with greater energy efficiency.

Conclusions
This paper processed a comprehensive dataset (55 GB) collected from 99 e-buses in seven cities in China over one year. Various driving behaviors (average speed, speed variation, and acceleration) were linked to battery efficiency, and a stable driving style was found to favor higher efficiency. Certain timing windows were also shown to yield higher battery efficiency, probably due to fewer passengers at night and less air conditioning usage during cooler seasons.
By analyzing historical datasets, one can learn from prior experience; more importantly, one wants to forecast the unknown future. This paper introduced a framework that starts with understanding existing data via mixed effect regression, followed by a comprehensive model selection and evaluation in order to select the method with the greatest predictive power. This case study identified the Random Forest as the best candidate and applied it in the optimization stage to obtain a more productive battery usage plan. The resulting most economical speeds are recommended and could serve as an actionable bus operating guideline for different times.
As the current paper analyzed only a portion of the dataset, more work is planned for the future, for example, including the battery details (current and voltage information) and engine status (rotation speed and torque). Neural network methods (MLP and deep neural net) were tested in the current study; however, their performance was not good enough. With a much more complex layer structure (activation function, dropout rates, learning rates, etc.) to explore, it is possible that the current research did not identify suitable deep learning models. More deep learning methods will be studied by the team in the future. GPS data were collected, but GIS (Geographic Information System) roadway profile data were not available for the current study, which made it impossible to compare battery performance under various roadway conditions (ramp, straight/curved road, or slope). Although the mixed effect model can account for these unobserved driving heterogeneities to some extent, further study based on roadway profile data is recommended. Admittedly, temperature has a great influence on battery energy consumption; however, the current research considered only the seasonal aspect and therefore lost detailed insights at the daily or hourly level. Although it is a lengthy process, the research team is currently collecting detailed weather information (wind speed, temperature, precipitation, etc.) from each city at a much more granular level to extend the research in the future.

Conflicts of Interest:
The authors declare no conflict of interest.