Application Study on the Dynamic Programming Algorithm for Energy Management of Plug-in Hybrid Electric Vehicles

To explore the problems associated with applying dynamic programming (DP) in the energy management strategies of plug-in hybrid electric vehicles (PHEVs), a plug-in hybrid bus powertrain is introduced and its dynamic control model is constructed. The numerical issues, including the discretization resolution of the relevant variables and the boundary issue of their feasible regions, were considered when implementing DP to solve the optimal control problem of PHEVs. The tradeoff between the optimization accuracy when using the DP algorithm and the computational burden was systematically investigated. As a result of overcoming the numerical issues, the DP-based approach has the potential to improve the fuel-savings potential of PHEVs. The results from comparing the DP-based strategy and the traditional control strategy indicate that there is an approximately 20% improvement in fuel economy.


Introduction
In contrast to hybrid electric vehicles (HEVs), plug-in hybrid electric vehicles (PHEVs) have a larger battery, which can replace a certain amount of conventional fossil fuels with grid electricity [1][2][3][4].The manner in which the onboard electrical energy is used significantly influences the energy utilization efficiency and subsequently impacts the fuel economy [5][6][7][8].
As an approach that solves multi-step optimization problems based on Bellman's principle of optimality [9,10], dynamic programming (DP) guarantees global optimality through an exhaustive search of all control and state grids.Applying DP in PHEVs consists of finding optimal control sequences to obtain the optimal battery state of charge (SoC) trajectory and to minimize fuel consumption over a given driving schedule.The DP-based energy management strategy belongs to the category of off-line energy management techniques, which are not suitable for online control [11].However, this approach provides a benchmark for assessing the optimality of other energy management strategies and helps to improve the online strategy [12][13][14][15][16].
Because DP is a numerical algorithm, the continuous-time control problem must be discretized.In fact, the DP processes are implemented backward from the final state to the initial state by searching for the optimal trajectory among the discretized grid points.Moreover, the grid points are the intersections of discretizing lines of state space and time space, as shown in Figure 1.However, the state output of the model function is continuous in the state space, which does not generally coincide with the nodes of the state grid but rather between them.Consequently, it is necessary to appropriately evaluate the DP process, and interpolation is used to find the cost-to-go value, which inherently introduces numerical errors [17,18].Therefore, the accuracy of the DP solutions depends on the number of grid points [19].Higher discretization resolution of the state space and time space could increase the number of grid points, which would consequently increase the optimality of the DP results.Unfortunately, higher discretization resolution also leads to an increase in the computation load required to calculate the global optimum [17,19,20].Another issue to consider is the valuation of the cost function for the infeasible states.An infinite cost is always used as the cost function for the state points in the infeasible area, which could result in the cost-to-go values, obtained by linear interpolation between an infinite cost-to-go and a finite one, for the feasible state points near the boundary line becoming infinite.Consequently, the actual infeasible domain is enlarged during the calculation process.Moreover, the control variables also need to be properly discretized.For the control variables, the proportion of the feasible region that can be utilized depends on its discretization resolution, as shown in Figure 2. Clearly, the number of available points increases as the discretization resolution increases.Additionally, the optimality of the DP-based control sequences improves, whereas the computation load is substantially increased.In general, if the aforementioned problems are not fully taken into account and appropriately treated, then the relevant numerical errors would have a large impact on the final result.These issues for the case of PHEVs will be investigated in this paper.
The remainder of this paper is organized as follows: Section 2 provides an introduction and discusses systematic modeling for the targeted single-axis series-parallel plug-in hybrid electric bus (PHEB).The DP of the optimal control problem for the PHEB is formulated in Section 3. The numerical issues when solving the DP are investigated in Section 4. The results from the PHEB with two types of strategies are discussed in Section 5. Finally, conclusions are drawn in Section 6.

Plug-in Hybrid Powertrain Model
As an approach to solve global optimization problems over a finite horizon, DP is always used for solving the optimal energy management problem of HEVs [21][22][23].In this paper, for the purpose of minimizing the fuel consumption of PHEVs over a given driving cycle, DP is responsible for finding the optimal power combination of the power components to meet the power demand of the vehicle, which is based on the vehicle dynamic model.

A Plug-in Hybrid Electric Bus Powertrain
A single-axis series-parallel PHEB is taken as the research object, and its powertrain configuration is shown in Figure 3.It consists of a conventional internal combustion engine (ICE), an integrated starter and generator (ISG), a traction motor (TM), an automatically controllable friction clutch, a battery pack, an on-board battery charger and electronic control systems, which include a vehicle control unit (VCU), a battery management system (BMS), an integrated motor controller for the ISG and the TM, an ICE control unit, and so on.The technical parameters of the PHEB are summarized in Table 1.For the PHEB powertrain, the ICE output is connected directly to the ISG rotor shaft and then connected to the clutch input plate.The TM rotor is connected directly to the clutch output plate.The power from the ICE, the ISG and the TM can be delivered directly to the rear drive wheels through the final drive and the differentials.The automatically controllable friction clutch is used to connect or disconnect the ICE/ISG torque with the TM torque.If the clutch input plate and output plate are connected, then the ICE/ISG torque can be delivered directly to the driving wheels, and the PHEB works in a parallel hybrid mode.When the clutch input plate and output plate are disconnected, the ICE/ISG can only output electricity, and the PHEB works in a series hybrid mode.Note that the ISG can instantaneously start the ICE once the ICE needs to work.

The Vehicle Model
The movement behavior of a vehicle along its moving direction is completely determined by all of the forces that act on it in the same direction.In the longitudinal direction, the major external forces acting on a two-axle vehicle include the rolling resistance of the front and rear tires, f F ; the aerodynamic drag, w F ; the climbing resistance, i F ; the acceleration resistance, j F ; and the tractive effort of the drive wheels, d F .The dynamic equation for vehicle motion along the longitudinal direction is expressed by: where m is the vehicle gross weight, g is the acceleration due to gravity, f is the rolling resistance coefficient,  is the road gradient, D C is the aerodynamic drag coefficient, A is the vehicle frontal area,  is the air density, v is the vehicle velocity,  is the mass factor that equivalently converts the rotational inertias of rotating components into translational mass, and d d v t is the vehicle acceleration.
The driving resistances depend on the current state of the vehicle and on the driver's expectation at the next moment.During the simulation, the desired velocity at the next moment is determined from the driving cycle profile.Because the vehicle simulation system is a discrete-time system, the current acceleration, k a , can be described as: where k v is the current velocity, 1 ' k v  is the desired velocity at the next simulation step, and step t is the simulation time step.By combining Equations ( 1) and ( 2), the vehicle torque requirements for the powertrain at the current step in the discrete-time space, req _ k T , can be formulated as: where d r is the dynamic radius of the wheel, 0 i is the ratio of the final gear, T  is the transmission efficiency, and k  is the road gradient at the current step.

ICE Model
The experimental modeling method is used to develop the ICE model in the quasi-static vehicle model without considering its dynamic characteristics.The fuel consumption map of the ICE is expressed as the relationship between the crankshaft speed and the torque by a non-linear 3-D MAP from experimental ICE data.Figure 4 shows the fuel consumption map of a 6.5 L diesel engine.Note that BSFC is the abbreviation for brake specific fuel consumption.
Therefore, the ICE fuel consumption rate e e e ( , ) g n T at the operating point e e ( , ) n T , where the ICE outputs torque e T at speed e n , is obtained from the following interpolation function: e e e e e ( , ) ( , ) In the discrete-time system, the ICE fuel consumption at the th k simulation step, fuel_k V , is obtained by:  Then, the ICE fuel consumption fuel V during the simulation process is obtained by: sim fuel fuel_ 1 where sim L is the number of simulation steps, obtained as sim sim step , and sim T is the simulation period.

The ISG and TM Models
The experimental modeling method is also used to develop the ISG model and the TM model.Their efficiency characteristics are expressed as the relationship between the speed and the torque by a non-linear 3-D MAP from experimental data.Figure 5 shows the ISG efficiency map, and Figure 6 shows the TM efficiency map.The torque output model of the motor is similar to the ICE.The motor efficiency m m m ( , ) where m n is the speed of the motor and m T is the motor output torque, which is defined as positive during propelling and negative during regenerative braking. Speed/(r/min)

The Battery Model
A lithium-ion battery is used, which can be modeled with a static equivalent circuit [24].In this paper, the Rint model, which is based on experimental data of battery charging-discharging, is used due to its simplicity and effectiveness for lithium-ion batteries.This model is illustrated in Figure 7.
Based on the equivalent circuit shown in Figure 7, the following equations can be obtained: where bat P is the electric power provided by the battery, which is positive during discharging and negative during charging.
In addition, bat P is determined by the ISG and the TM as shown in the following equation: where ISG T , ISG n and ISG  are the output torque, speed and working efficiency of the ISG, respectively, and TM T , TM n and TM  are the output torque, speed and working efficiency of the TM, respectively.
Equation ( 8) is transformed to the following form: The first-order derivative of the battery SoC with respect to time can be expressed as follows: where C is the nominal capacity of the battery.Equation ( 11) is transformed to a discrete form, as follows: where k I is the battery charging-discharging current at the kth step.

Formulating Dynamic Programming
Prior to formulating the DP, the control variables and state variables need to be determined.In this paper, the state variables, such as the vehicle speed v and SoC, reflect the state of the PHEB.Because DP is implemented over the known driving cycle, the vehicle speed at every stage is known.Therefore, SoC is chosen as the only state variable in this work.T and TM T , are independent, and these two variables are chosen as the independent control variables in this work.
In the discrete-time format, the PHEB system can be expressed as follows: 1 ( , ) where k x and k u are the state variable and the control variable, respectively.
Compared with HEVs, whose battery acts as a power equalizer to improve the ICE operating efficiency with the expectation of the same SoC at the start and end of a trip, PHEVs have a larger battery and can replace a certain amount of fossil energy with grid electricity.In general, the exhaust emissions from the hybrid electric bus are not taken into consideration.In studying the object PHEB in this paper, we only focus on minimizing the fuel consumption.Then, the optimal control problem is to find the control sequences to minimize the following cost function: where N is the stage number of the driving cycle and L is the instantaneous cost.
To ensure safe/smooth operation of the components, such as the ICE, ISG, TM and battery, during the solving process, it is necessary to satisfy constraints related to their ratings and the power flow between them.The physical constraints on the states and the inputs are denoted by the following inequalities and equalities: where ISG_k n and ISG_k T are the speed and output torque of the ISG at the kth step, respectively; TM _ k n and TM_k T are the speed and output torque of the TM at the kth step, respectively; b_k T is the hydraulic brake torque at the kth step; and the subscripts min and max in the variables denote the maximum and minimum of those variables, respectively.

Implementing Dynamic Programming
During the backward simulation procedure, the DP problem can be described by the recursive Equations ( 17) and (18).The sub-problem for the (N -1)th step is: For the th k ( 0 1 k N    ) step, the sub-problem is: where * ( ) J x is the optimal cost-to-go function at state k x from the kth simulation step to the terminal of the driving cycle and 1 k x  is the state in the ( 1)th k  stage when the control variable k u is applied to state k x at the th k stage according to Equation (14).
Before recursive Equations ( 17) and ( 18) are solved in reverse, it is necessary for the continuous variables to be discretized.The continuous state SoC is discretized into finite points, and the number of discretized state, S, is: where SoC  is the increment of the discretized SoC and max

SoC and min
SoC are the upper and lower constrains of SoC, respectively.In addition, the independent control variables, e T and m T , are all continuous and also need to be discretized into finite points.Due to the coupling relationship between the torque variables, e T and m T have the same discretization resolution, denoted by T  , which is defined here as the torque increment.During the backward simulation procedure, the model output 1 k SoC  of the system state SoC based on Equation ( 12) is continuous in the state space, and it does not generally coincide with the nodes of the state grid but is rather between them, as depicted in Figure 8.When evaluating the function * ( ) at every grid point, such as SoC  , and so on, * 1 1 ( ) is evaluated by interpolation if the model output 1 k SoC  does not fall exactly on grid points.During the backward simulation, the optimal controls at every grid point are obtained.When the initial SoC is specified, the optimal control sequences can be found through a forward simulation.During the forward simulation procedure, the interpolation is also needed to find the optimal control sequences, as shown in Figure 9.When the optimal control at the kth stage is k u , the optimal control stage is obtained through interpolation between

Numerical Issues of the DP
The errors that occur during implementation of the DP procedure result from approximating the valuation when the actual state does not coincide with the nodes of the state grid, as in Figures 8 and 9. Additionally, these errors are closely related to the discretization resolution of relevant continuous variables.In this section, the interaction mechanism between the accuracy of the calculated results and the numerical issues, such as the resolution issue of the discretized variables and the boundary issue, is investigated with consideration of the computational load.

Resolution Study
For the vehicles, because of the correlation between the sampling period of the given driving cycle and the total vehicle energy demand under this driving cycle, the stage step, the discretization step of the time space, is chosen to equal to the sampling period such that the stage number N is equal to the number of sampling points of the driving cycle.Thus, the discretization resolution of the state space and the control variables is investigated in the following section.

Resolution of the State Variable
The increment of the discretized state variable SoC, SoC  , represents its resolution.The dynamics of the battery SoC is inherently determined by the sampling period, the battery capacity and the charging/discharging current, which is dependent on the output power of the motor.However, the only adjustable variable relative to the SoC dynamics is the torque increment T  in this work.Moreover, it is clear that the smaller T  is, the more accurate are the results calculated by DP.The relationship between the battery charging/discharging current and the different T  is investigated, and the results are presented in Figure 10.As shown in Figure 10, within certain limits, the increase in T  does not clearly influence the battery working current and the computational load is noticeably reduced.Based on the mathematical principle of DP, the discretization resolution of the state space and the discretization resolution of the control variables are independent of the calculated results.Therefore, during the procedure for investigating the SoC discretization resolution, the difference in the T  does not influence the rate of convergence of the calculated results.Here, T  is chosen as 30 Nm, and the relationship between SoC  and the calculated results for batteries with different capacities is shown in Figure 11.The computational load with different SoC  is also checked, and the results are shown in Figure 12.Note that the abscissa values of the three graphs in Figure 11 are consistent with those in Figure 12.From Figures 11 and 12, it is clear that the fuel consumption decreases as SoC  decreases and tends to be stable after SoC  is sufficiently small; the SoC termination follows the same trend.Unfortunately, the computational load increases many fold when SoC  decreases.In addition, the convergence rate of the calculated results is closely related to the battery capacity, and the slower the convergence speed is, the larger the battery capacity will be.In other words, the maximum SoC  through which the accuracy could be ensured increases as the battery capacity decreases.
Thus, the appropriate SoC  , which requires a minimum computational load to ensure accuracy, can be obtained according to the battery capacity.

Resolution of the Control Variables
As mentioned above, the torque increment T  represents the resolution of the control variables.
When a lower discretization resolution is chosen, the feasible region of the power components is not effectively utilized and the optimality of the calculated results is inherently degraded.However, a higher discretization resolution corresponds to a higher computational load.Due to the decoupling characteristics between the discretization resolution of the state space and the discretization resolution of the control variables mentioned above, SoC  is set constant at 0.001 when investigating the influence of T  on the calculated results.The calculated results are shown in Figure 13, and the corresponding computational load is shown in Figure 14.As shown in Figure 13, the SoC termination is almost not influenced by T  , and the fuel consumption gradually tends to the optimal value with decreasing T  .For HEVs and PHEVs, the fuel consumption is generally the optimization goal, and ICE is the only component that consumes fuel.The specific fuel consumption of ICE is related to its capacity, calibrating conditions and so on; thus, the fuel consumption map of ICE needs to be taken into consideration when selecting T  .
Compared with SoC  , the influence of T  on the computational load is very mild, as shown in Figure 14.When the maximum T  is 400 times greater than the minimum, the difference between the corresponding computational loads is less than 4-fold.This result is the reason why the discretization of control variables is not necessary to be implemented in every computation when solving DP.Therefore, the extent to which T  impacts the computational load is dependent on the frequency that discretization is implemented during the calculation.

Boundary Issue
When implementing DP, the boundary issue can be divided into two parts: the boundary issue of the state space and the boundary issue of the feasible region of control variables.In contrast to the optimal control problem of HEVs, whose state variable SoC remains constant within the preset range, the SoC trajectory of PHEVs monotonically decreases overall, as shown in Figure 15.Correspondingly, the boundary issue of the state space influencing the DP results exists throughout the solving procedure of DP for HEVs' optimal control problem, whereas it only exists near the start and end of the trip for PHEVs.Therefore, it could be negligible to the accuracy of the results calculated by DP when using DP to solve the optimal control problem for PHEVs.The boundary issue of the feasible region of control variables is that the available control points on the boundary lines face the risk of being missed, which is essentially derived from the discretization of the control variables.It has been actually embodied in the influence of the discretization resolution on the results calculated by DP.
In summary, from the results presented in Figures 11 and 13, it could be observed that the convergence characteristics of the SoC termination are related to SoC  but not influenced by T  .
Thus, for the optimal control problem of PHEVs, the SoC trajectory is determined by the discretization resolution of the state variable SoC, whereas the optimality of the fuel consumption is primarily dependent on the discretization resolution of the state variable SoC and the discretization resolution of the control variables.

Application Example
In previous works [24], we defined the basic PHEV operating modes as pure electric driving (PED), hybrid driving charge depleting (HDCD) and hybrid driving charge sustaining (HDCS) based on the battery SoC profile and developed the PED + HDCD + HDCS strategy, which is an optimal online strategy that is practical for PHEVs.This strategy is optimally composed of the PED mode, the HDCD mode and the HDCS mode.In this section, DP is utilized to solve the optimal control problem of the PHEB presented in Section 2. The results obtained by DP are compared with the simulation results of the PHEB with the PED + HDCD + HDCS strategy.
The Chinese Standard Urban Driving Cycle (CSUDC), which is shown in Figure 16, is selected to be used during the simulation experiment.The trip distance is 180 km, which is attained by successively repeating the same driving cycle 31 times.Additionally, the vehicle is loaded with 65% of a full load.
Velocity/(km/h) Prior to conducting the simulation experiments, it is assumed that the battery of the PHEB has been fully charged from the power grid.Note that the PHEB is not allowed to implement regenerative braking when the battery SoC is higher than 80% for the purpose of protecting the battery.When implementing the DP, the SoC  and the T  are set to 0.5% and 5 Nm, respectively.The SoC trajectories of the PHEB with two types of control strategies are shown in Figure 17.When the battery SoC is higher than 80%, the trends of the two SoC trajectories are similar, and the slight difference between them results from the ICE often being turned on to propel the vehicle under the DP-based strategy whereas the ICE is turned off for the PED + HDCD + HDCS strategy.When the battery SoC is between 80% and 30%, the battery SoC acquired by DP linearly decreases overall.Moreover, the available energy from the battery is exhausted only at the end of the trip, which provides the best benchmark for improving the online strategy.Figure 18 shows the relationship between the trip distance and the fuel consumption per 100 km of the PHEB with the two types of strategies.The fuel economy of the PHEB with the PED + HDCD + HDCS strategy is better than the results calculated by DP in the first fraction of the trip, as shown in Figure 18.However, the latter gradually becomes superior to the former as the trip distance increases.The reason for this result is that the DP-based strategy can coordinate different components of the PHEB powertrain to efficiently work from a global perspective.The shutdown proportion of the ICE and the ISG, which is performed by DP, is higher than that under the PED + HDCD + HDCS strategy, as shown in Figures 19 and 20, respectively.Moreover, the ICE under the DP-based strategy works in the higher efficiency area, as shown in Figures 21 and 22  From the results presented in Figure 21, it is clear that the proportion of the ICE working points calculated by DP in the higher efficiency area is clearly greater than that under the PED + HDCD + HDCS strategy.Consequently, the distribution probability of the BSFC of the ICE working points in the results calculated by DP is considerably better than that under the PED + HDCD + HDCS strategy, as shown in Figure 22.From the comparative analysis mentioned above, the drawback of the online strategy is clear.Then, the goal and the effective plan to improve the online strategy could be easily drawn.

Conclusions
We constructed a DP-based optimal control algorithm for a PHEB and investigated the interaction mechanism between the accuracy of the calculated results and the numerical issues, such as the resolution issue of the discretized variables and the boundary issue with consideration of the computational load.The main conclusions are as follows: (1) Numerical issues emerged when the optimal control problem of PHEVs was solved using DP.
A single-axis series-parallel PHEB was modeled, and its systematic model was constructed for studying these issues.Then, the discretization resolution of the relevant variables and the boundary issue of their feasible region were investigated.
(2) The battery SoC trajectory is determined by the discretization resolution of the state variable SoC, whereas the optimality of the fuel consumption is primarily dependent on the discretization resolution of the state variable SoC and the discretization resolution of the control variables.Additionally, the lowest discretization resolution of the state variable SoC with maintained accuracy is closely related to the battery capacity, whereas the fuel consumption map of ICE needs to be taken into consideration when selecting the discretization resolution of the control variables.(3) The computational load increases many fold as the discretization resolution of the state variable SoC increases.However, the influence of the discretization resolution of the control variables on the computational load is very mild.Moreover, the extent to which it impacts the computational load depends on the frequency with which the discretization is implemented during the calculation.(4) A simulation experiment was performed, and the results from two types of strategies were compared.The DP-optimized result provided the maximum potentiality of an energy management strategy for PHEVs, which serves an optimized target for other online control strategies by proper calibration methods.

Figure 1 .
Figure 1.Discretization of the state space and time space.

Figure 2 .
Figure 2. Discretization of the control variables.


is the fuel density and e _ k n and e _ k T are the speed and output torque of the ICE at the current step, respectively.

Figure 4 .
Figure 4.The ICE fuel consumption map.

Figure 6 .
Figure 6.The traction motor efficiency map.

Figure 7 .
Figure 7.The Rint battery model.Here oc U is the battery open-circuit voltage, which is related to the SoC and the battery temperature

where 1 k
SoC  is the battery SoC at the (k+1)th step andk SoC , oc _ k U , bat _ k P and int_ k R are the SoC,open-circuit voltage, electric power and internal resistance of the battery at the kth step, respectively.The electricity consumption Q during the simulation process is obtained by:

Figure 8 .
Figure 8.The interpolation diagram during the backward simulation. k

Figure 9 .
Figure 9.The interpolation diagram during the forward simulation.

Figure 10 .
Figure 10.The distribution probability of the absolute value of the battery charging/discharging current for different T  .

Figure 11 .
Figure 11.The results calculated by DP with different SoC  .

Figure 12 .
Figure 12.The relative computational load for different SoC  .

Figure 13 .
Figure 13.The results calculated by DP with different T  .

Figure 14 .
Figure 14.The relative computational load for different SoC  .

Figure 15 .
Figure 15.The optimal SoC trajectory of HEVs and PHEVs.

Figure 16 .
Figure 16.Velocity profile of the CSUDC cycle.

Figure 17 .
Figure 17.The SoC trajectories under two types of strategies.

Figure 18 .
Figure 18.The relationship curve between fuel economy and trip distance under the two types of strategies. .

Figure 19 .Figure 20 .
Figure 19.The ICE shutdown percentage under the two types of strategies.

Figure 21 .
Figure 21.The statistical results of the BSFC of the working points of the ICE under the two types of strategies.

Figure 22 .
Figure 22.The distribution probability of the BSFC of the working points of the ICE under the two types of strategies.
There are many control variables in the PHEB, such as the ICE output torque e T , ICE output speed e n , TM output torque TM T , TM output speed TM n , ISG output torque ISG T , ISG output speed ISG n , and hydraulic brake torque b T .However, only two control variables, e