Article

Practical Application-Oriented Energy Management for a Plug-In Hybrid Electric Bus Using a Dynamic SOC Design Zone Plan Method

School of Mechanical & Automotive Engineering, Liaocheng University, Liaocheng 252059, China
* Author to whom correspondence should be addressed.
Processes 2022, 10(6), 1080; https://doi.org/10.3390/pr10061080
Submission received: 27 April 2022 / Revised: 24 May 2022 / Accepted: 24 May 2022 / Published: 27 May 2022
(This article belongs to the Topic Energy Efficiency, Environment and Health)

Abstract

The main challenge for current energy management strategies is their practical applicability. To address this problem, this paper proposes a reinforcement learning (RL)-based energy management strategy for a plug-in hybrid electric bus (PHEB) that combines tabular Q-learning and Pontryagin’s Minimum Principle (PMP). The main innovation that distinguishes it from existing energy management strategies is a dynamic SOC design zone plan method, which is characterized by two aspects: ① a series of fixed locations is defined along the city bus route, and a linear SOC reference trajectory is re-planned at each fixed location; ② a triangular zone is re-planned around the linear SOC reference trajectory. Additionally, a one-dimensional state space is designed to ensure real-time control. Off-line training demonstrates that the agent of the RL-based energy management can be well trained and has good generalization performance. Hardware-in-the-loop (HIL) simulation results demonstrate that the trained energy management strategy has good real-time performance and reduces fuel consumption by 12.92% compared to a rule-based control strategy.

Graphical Abstract

1. Introduction

The environmental pollution caused by the rapid development of the transportation industry cannot be ignored, and electric vehicles are expected to alleviate this dilemma [1]. Plug-in hybrid electric vehicles (PHEVs), characterized by combining an electric motor and an internal combustion engine, can reduce exhaust emissions [2]. In practice, at least two power sources are deployed in a PHEV; that is, at least two degrees of freedom are introduced into the energy management problem [3]. Therefore, energy management is the most important issue for PHEVs [4,5].
Many energy management strategies, such as rule-based, optimization-based, prediction-based, and reinforcement learning (RL)-based methods, have been proposed. Whatever method is adopted, the common objective is to realize real-time and economical control in the real world. Rule-based energy management can easily realize real-time control in the real world; however, its control performance improves only when the key parameters are elaborately designed. For example, Ding N. et al. proposed a hybrid energy management system based on a rule-based control strategy and a genetic algorithm to improve fuel economy and overcome battery limitations [6]. Li P. et al. proposed an intelligent logic rule-based energy management method that optimizes the working area of the engine [7]. Optimization-based energy management can realize more economical control in the real world, although real-time performance may be sacrificed. For example, Hassanzadeh M. et al. proposed an energy management strategy based on PMP to improve fuel economy and battery life under uncertain traffic conditions [8]. Wang W. et al. proposed an economic method based on dynamic programming (DP) and a feedback energy management system, which can maintain the battery state of charge (SOC) within the desired range [9]. Geng S. et al. compared various prediction methods and equivalent consumption minimization strategy (ECMS) implementations to evaluate fuel consumption [10]. Prediction-based energy management can realize real-time control in the real world, provided the prediction precision is well controlled. For example, Lian J. et al. proposed a predictive control algorithm combining a long short-term memory network (LSTM) and model predictive control (MPC) to reduce fuel consumption under the constraints of the SOC trajectory [11]. Liu Y.J. et al. proposed a robust design method based on the Taguchi method, where a nonlinear model predictive control (NMPC) is deployed to realize real-time energy management [12]. In contrast, RL-based energy management may be a promising way to realize real-time and economical control in the real world, because its self-learned control can enhance the control performance. For example, Lin X. et al. proposed an intelligent energy management strategy based on an improved RL algorithm with an exploration factor to enhance adaptability and improve fuel economy [13]. Zhang H. et al. proposed a novel RL-based energy management method named Coach-Actor-Double Critic with a bi-level onboard controller to improve the self-learning ability and adaptability [14]. However, RL-based energy management also has some disadvantages that may hinder its practical application.
RL-based energy management is mainly focused on the Q-learning (QL) algorithm. Its easy implementation and good real-time control performance make the tabular QL-based strategy the most popular method [15,16]. However, the “curse of dimensionality” is difficult to avoid once the state space becomes sufficiently large, and tabular QL can only handle discrete states and actions. The key problem of RL-based energy management is to accurately identify the optimal action based on the current state. As the SOC should decline to the expected value at the destination, the best action for a PHEV is strongly related to the required power, velocity, and travelled distance. Moreover, only when the state space is discretized finely enough can the selected action approach the best action [17]. Therefore, more than three state variables may be needed to ensure good generalization performance. In this case, if every state variable is discretized into 100 segments, 1,000,000 states may be generated in the Q-table, which may lead to the “curse of dimensionality” once the strategy is implemented on a currently used vehicle controller. Deep QL (DQL)-based strategies can handle continuous state spaces and avoid the “curse of dimensionality” by substituting a neural network (NN) for the Q-table [18,19]. However, the control performance may deteriorate once the fitting precision of the NN is low. In contrast, Ref. [20] proposed an Actor-Critic (AC) control framework to solve the problem of continuous state and action spaces. However, multiple NNs must be designed in that strategy, so it is difficult to implement on currently used controllers for real-time control, because the computational burden is greatly increased. Similarly, Ref. [21] further proposed a deep deterministic policy gradient (DDPG) framework that considers traffic information, but it has similar problems to the AC framework.
Thanks to the instantaneous optimization of Pontryagin’s Minimum Principle (PMP) and its easy implementation on current vehicle controllers, PMP is a feasible method for real-world use, provided the co-state can be dynamically recognized [22,23]. Therefore, if QL is combined with PMP as the co-state recognition algorithm in a dynamic and uncertain traffic environment, the control performance may be enhanced. However, the “curse of dimensionality” must be avoided if real-time energy management control is to be achieved. To solve this problem, Ref. [24] proposed a feasible method that defines the state as the difference between the feedback SOC and the reference SOC, where the SOC reference trajectory is designed from the optimal SOC trajectories calculated for a series of historical driving conditions. Inspired by this method, this paper proposes an RL-based energy management strategy together with a similar state-variable design. In particular, the main difference between Ref. [24] and our work is that a dynamic linear SOC reference trajectory is planned at fixed locations based on the feedback SOC, and a triangle zone is re-planned around the reference SOC trajectory. The main advantage of this method is that the dynamic triangle zone provides a margin for fuel economy improvement and guides the feedback SOC toward the objective SOC.
The remainder of this paper is structured as follows. The modeling of the PHEB is introduced in Section 2. The RL-based energy management is detailed in Section 3. The results and discussion are presented in Section 4, and the conclusions are drawn in Section 5.

2. The Description of the PHEB

Figure 1 shows the layout of the PHEB. It consists of an engine, a clutch, an electric motor (EM), and a 6-speed automated mechanical transmission (AMT). The gears used in this paper range from 2 to 6, since climbing driving conditions are not considered. Many working modes, such as engine driving, hybrid driving, motor driving, and regenerative braking, can be realized based on the driving demand and the energy management.
An engine model that can satisfy the energy management requirement is indispensable. Based on the fuel consumption rate MAP of the engine (Figure 2), the instantaneous fuel consumption of the engine is formulated by
$$ m_e = \frac{T_e n_e}{9550} \cdot \frac{b_e(T_e, n_e)}{3600}\, \Delta t \tag{1} $$
where $m_e$ denotes the instantaneous fuel consumption of the engine; $T_e$ denotes the torque of the engine; $n_e$ denotes the speed of the engine; and $b_e(T_e, n_e)$ denotes the fuel consumption rate of the engine.
Similarly, the motor is formulated by
$$ P_m = \frac{n_m T_m\, \eta_m^{\operatorname{sgn}(T_m)}}{9550}, \qquad \operatorname{sgn}(T_m) = \begin{cases} 1, & T_m \ge 0 \quad \text{(motoring mode)} \\ -1, & T_m < 0 \quad \text{(generating mode)} \end{cases} \tag{2} $$
where $P_m$ denotes the power of the motor; $\eta_m$ denotes the efficiency of the motor, which is interpolated from a look-up table built from the efficiency map of the motor (Figure 3); and $n_m$ and $T_m$ denote the speed and the torque of the motor, respectively.
As shown in Figure 4, the battery is formulated as an Rint model, which is described as
$$ \dot{SOC} = -\frac{I_b}{Q_b}, \qquad I_b = \frac{V_b - \sqrt{V_b^2 - 4 R_b P_b}}{2 R_b} \tag{3} $$
where $V_b$ denotes the battery voltage; $P_b$ denotes the battery power; $R_b$ denotes the internal resistance; and $Q_b$ denotes the battery capacity.
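To make the component models above concrete, the following minimal sketch shows how they might be evaluated numerically. It is only an illustration under stated assumptions: the map grids, efficiency value, and battery parameters are placeholders rather than the identified parameters of the studied PHEB.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Placeholder engine map grids and values; the real maps in Figures 2 and 3 are not published as data.
eng_trq_grid = np.linspace(0.0, 800.0, 9)      # Nm
eng_spd_grid = np.linspace(800.0, 2400.0, 9)   # rpm
be_map = np.full((9, 9), 210.0)                # g/kWh, dummy brake-specific fuel consumption map
be_interp = RegularGridInterpolator((eng_trq_grid, eng_spd_grid), be_map)

def engine_fuel(T_e, n_e, dt=1.0):
    """Instantaneous engine fuel mass (g) over dt, following Equation (1)."""
    P_e = T_e * n_e / 9550.0                   # engine power, kW
    b_e = float(be_interp([[T_e, n_e]])[0])    # fuel consumption rate from the map, g/kWh
    return P_e * b_e / 3600.0 * dt

def motor_power(T_m, n_m, eta_m=0.92):
    """Electrical motor power (kW), following Equation (2); eta_m would come from the motor map."""
    sgn = 1 if T_m >= 0 else -1                # motoring vs. generating mode
    return n_m * T_m * eta_m**sgn / 9550.0

def soc_rate(P_b_kw, V_b=580.0, R_b=0.2, Q_b_ah=120.0):
    """SOC rate of change (1/s) from the Rint model, Equation (3); parameters are assumed values."""
    P_b = P_b_kw * 1000.0                      # W
    I_b = (V_b - np.sqrt(V_b**2 - 4.0 * R_b * P_b)) / (2.0 * R_b)
    return -I_b / (Q_b_ah * 3600.0)            # convert capacity from Ah to As
```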

3. The Formulation of the RL-Based Energy Management

3.1. The Formulation of PMP

In this paper, an economic gear shift strategy is executed during the shift process. With the aim of minimizing the fuel consumption, the objective function is described as
$$ \min J = \int_{t_0}^{t_f} m_e\big(u(t)\big)\, dt \tag{4} $$
where $J$ denotes the performance index function and $u(t)$ denotes the control vector, which is the throttle of the engine ($u(t) = [th(t)]$). Here, the throttle of the engine is denoted by $th(t)$, which ranges from 0 to 1.
Inspired by Ref. [25], the energy management can be converted into an instantaneous optimization problem by minimizing the Hamiltonian function, which can be described as
$$ \begin{aligned} H\big(x(t),u(t),\lambda(t),t\big) &= m_e\big(u(t)\big) + \lambda(t)\, \dot{SOC}(t) \\ u^*(t) &= \arg\min_{u(t)} H\big(x(t),u(t),\lambda(t),t\big) \\ \dot{SOC}(t) &= -\frac{V_b\big(x(t)\big) - \sqrt{V_b^2\big(x(t)\big) - 4 R_b\big(x(t)\big) P_b\big(u(t)\big)}}{2 R_b\big(x(t)\big) Q_b} \end{aligned} \tag{5} $$
where $u^*(t)$ denotes the optimal control solution; $H(x(t),u(t),\lambda(t),t)$ denotes the Hamiltonian function, in which the first term is the instantaneous fuel consumption and the second term is the SOC derivative multiplied by the co-state; and $\lambda(t)$ denotes the co-state.
Theoretically, the co-state is the key parameter that influences the optimization performance, and it is time-varying. It can also be approximated as a constant over the whole trip [22,23], or adapted in real time based on the driving conditions [26].
Furthermore, some constraints with respect to the physical components of the PHEB are also indispensable, which are described as
$$ \text{s.t.} \quad \begin{cases} \omega_{e\_\min} \le \omega_e(t) \le \omega_{e\_\max} \\ \omega_{m\_\min} \le \omega_m(t) \le \omega_{m\_\max} \\ P_{e\_\min}\big(\omega_e(t)\big) \le P_e(t) \le P_{e\_\max}\big(\omega_e(t)\big) \\ P_{m\_\min}\big(\omega_m(t)\big) \le P_m(t) \le P_{m\_\max}\big(\omega_m(t)\big) \end{cases} \tag{6} $$
where $\omega_e(t)$ and $\omega_m(t)$ denote the rotational speeds of the engine and the motor, respectively; $\omega_{e\_\min}$, $\omega_{e\_\max}$ and $\omega_{m\_\min}$, $\omega_{m\_\max}$ denote the corresponding speed boundaries; $P_e(t)$ and $P_m(t)$ denote the powers of the engine and the motor, respectively; and $P_{e\_\min}$, $P_{e\_\max}$ and $P_{m\_\min}$, $P_{m\_\max}$ denote the corresponding power boundaries.
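As an illustration of how Equation (5) can be evaluated at a single time step, the sketch below minimizes the Hamiltonian over a discretized throttle grid. It is a simplified sketch, not the paper's controller: the engine full-load curve, fuel-rate model, feasibility check, and SOC-rate model are passed in as assumed callables, and the power split is reduced to letting the motor fill the remaining demand.

```python
import numpy as np

def pmp_step(costate, P_req_kw, n_e, n_m,
             fuel_rate, engine_max_power, motor_feasible, soc_rate,
             throttle_grid=np.linspace(0.0, 1.0, 51)):
    """One-step PMP: choose the throttle th in [0, 1] minimizing
    H = m_e(th) + costate * SOC_dot(th) while meeting the power demand.

    fuel_rate(th, n_e)        -> engine fuel rate (g/s)          (assumed callable)
    engine_max_power(n_e)     -> full-load engine power (kW)     (assumed callable)
    motor_feasible(P_m, n_m)  -> True if the Eq. (6) limits hold (assumed callable)
    soc_rate(P_m, n_m)        -> SOC derivative (1/s)            (assumed callable)
    """
    best_th, best_H = None, np.inf
    for th in throttle_grid:
        P_e = th * engine_max_power(n_e)       # engine share of the demand
        P_m = P_req_kw - P_e                   # motor covers the remainder
        if not motor_feasible(P_m, n_m):       # skip infeasible candidates
            continue
        H = fuel_rate(th, n_e) + costate * soc_rate(P_m, n_m)
        if H < best_H:
            best_H, best_th = H, th
    return best_th
```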

3.2. The Design of the Dynamic SOC Design Zone

In this paper, only one state, the difference between the feedback SOC and the reference SOC trajectory, is designed for the RL-based energy management. Therefore, the SOC reference trajectory becomes the key issue. As shown in Figure 5, a novel dynamic SOC design zone plan method is proposed. The basic principle is that a linear SOC reference trajectory is first planned at each fixed location, and simultaneously a dynamic reference SOC zone is defined around this linear reference trajectory. This method has two advantages.
(1). Only one state, the difference between the dynamic reference SOC and the feedback SOC, is defined. In this case, only a Q-matrix with 100 rows and 27 columns is needed, which ensures the real-time capability of the strategy.
(2). In practice, the real driving conditions cannot be completely predicted, so the optimal SOC trajectory is usually not exactly known; taking the optimal SOC trajectory as the reference trajectory is therefore infeasible. However, if a feasible zone (with a floating value of 0.02) is defined around the linear SOC reference trajectory, the feedback SOC can be controlled within this feasible zone, which provides a large margin for fuel economy improvement. This is also the most important innovation of this paper.
The main differences between the proposed method and the existing methods are as follows.
(1). In Ref. [24], a single SOC reference trajectory is designed based on a series of optimal SOC trajectories. In our method, the SOC reference trajectory is based only on the fixed locations, and a triangle zone is also defined to improve the fuel economy of the PHEB.
(2). In Ref. [25], only an efficient zone is defined based on a series of optimized SOC trajectories, and no dynamic reference SOC trajectory is planned. Moreover, three states have to be designed, and the efficient zone is designed off-line.
In addition, the dynamic SOC design zone plan method can be described as
$$ \begin{aligned} SOC_{\mathrm{ref}} &= \frac{SOC_T - SOC_{\mathrm{fnl}}}{D_T - D_{\mathrm{fnl}}}\left(D_{\mathrm{ref}} - D_{\mathrm{fnl}}\right) + SOC_{\mathrm{fnl}} \\ SOC_{\mathrm{upper}} &= \frac{\left(SOC_T + 0.02\right) - SOC_{\mathrm{fnl}}}{D_T - D_{\mathrm{fnl}}}\left(D_{\mathrm{ref}} - D_{\mathrm{fnl}}\right) + SOC_{\mathrm{fnl}} \\ SOC_{\mathrm{lower}} &= \frac{\left(SOC_T - 0.02\right) - SOC_{\mathrm{fnl}}}{D_T - D_{\mathrm{fnl}}}\left(D_{\mathrm{ref}} - D_{\mathrm{fnl}}\right) + SOC_{\mathrm{fnl}} \end{aligned} \tag{7} $$
where $SOC_{\mathrm{ref}}$ and $D_{\mathrm{ref}}$ denote the reference SOC and the travelled distance at the current time step, respectively; $SOC_T$ and $D_T$ denote the dynamic value of the target SOC and the travelled distance, respectively, which are updated at each fixed distance step; and $SOC_{\mathrm{fnl}}$ and $D_{\mathrm{fnl}}$ denote the target SOC and the travelled distance at the destination, respectively.
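A minimal sketch of Equation (7) is given below. The destination values (a target SOC of 0.3 at a route length of about 50 km) and the example anchor point are assumptions used only for illustration; the 0.02 floating value is the one stated above.

```python
def soc_zone(D_ref, SOC_T, D_T, SOC_fnl=0.3, D_fnl=50_000.0, delta=0.02):
    """Dynamic SOC design zone of Equation (7).
    (SOC_T, D_T): re-planned anchor at the latest fixed location (SOC, travelled distance in m).
    (SOC_fnl, D_fnl): target SOC and travelled distance at the destination (assumed values)."""
    scale = (D_ref - D_fnl) / (D_T - D_fnl)
    soc_ref = (SOC_T - SOC_fnl) * scale + SOC_fnl
    soc_upper = ((SOC_T + delta) - SOC_fnl) * scale + SOC_fnl
    soc_lower = ((SOC_T - delta) - SOC_fnl) * scale + SOC_fnl
    return soc_ref, soc_upper, soc_lower

# Example: 10 km travelled, zone re-planned from an anchor SOC of 0.62 recorded at 8 km.
print(soc_zone(D_ref=10_000.0, SOC_T=0.62, D_T=8_000.0))
```

Because both boundaries pass through the destination point, the zone narrows as the bus approaches the terminus, which is what makes it a triangle zone.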

3.3. The Formulation of the RL-Based Energy Management

QL is one of the most important RL methods, based on the Temporal-Difference (TD) method. The update process of the Q value can be described as
$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{8} $$
where $\alpha$ denotes the learning rate, which is set to 0.95 in this paper; $\gamma$ denotes the discount factor, which is set to 0.8 in this paper; $r_{t+1}$ denotes the immediate reward received after taking action $a_t$ at time $t$; and $\max_{a} Q(s_{t+1}, a)$ denotes the maximum Q value in the next state.
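A minimal sketch of this tabular update, using the learning rate, discount factor, and table dimensions stated in this paper, is shown below; the state and action indices are assumed to have been discretized as described in the following subsections.

```python
import numpy as np

ALPHA, GAMMA = 0.95, 0.8           # learning rate and discount factor used in this paper
N_STATES, N_ACTIONS = 100, 27      # 100 discretized states x 27 co-state actions

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next):
    """Temporal-difference update of Equation (8) for one (state, action) pair."""
    td_target = r + GAMMA * np.max(Q[s_next, :])
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```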
As shown in Figure 6, at every time step the agent obtains a state $s_t$ from the environment and evaluates an action $a_t$. The environment then transitions to a new state $s_{t+1}$, and a corresponding reward $r_{t+1}$ is transmitted to the agent.
As an instantaneous optimization algorithm, PMP-based energy management has good real-time control performance; the only challenge is to recognize the co-state for unrepeatable, stochastic driving conditions. In addition, QL is widely regarded as an intelligent algorithm that can adapt well to uncertain circumstances. Motivated by this, an RL-based energy management strategy combining PMP and QL is proposed. As shown in Figure 7, at every time step the agent evaluates an action based on the state, the state of the PHEB is updated according to the action, and a reward is then generated and transmitted to the agent.
(1)
The state
As stated above, the difference between the feedback SOC and the reference SOC is defined as the sole state, which is described as
$$ s_t = SOC_f - SOC_{\mathrm{ref}} \tag{9} $$
where $SOC_f$ denotes the feedback SOC. The state $s_t$ ranges from −0.04 to 0.04 and is discretized into 100 points. That is, the number of rows of the Q-table is only 100, which provides the basis for reducing the dimensionality of the Q and R tables.
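A possible discretization of this single state is sketched below, assuming 100 uniform bins over [−0.04, 0.04]; the uniform bin layout is an assumption, since the paper specifies only the range and the number of points.

```python
import numpy as np

def state_index(soc_f, soc_ref, n_bins=100, s_min=-0.04, s_max=0.04):
    """Map the state s_t = SOC_f - SOC_ref (Equation (9)) to a Q-table row index in [0, n_bins-1]."""
    s = float(np.clip(soc_f - soc_ref, s_min, s_max))
    return int(round((s - s_min) / (s_max - s_min) * (n_bins - 1)))
```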
(2)
The action
The co-state is defined as the only action, which ranges from −2800 to −4000. Specifically, the action space is
$$ a_t \in \{-2800,\, -2900,\, -3000,\, -3100,\, -3200,\, -3250,\, -3275,\, -3300,\, -3325,\, -3350,\, -3375,\, -3390,\, -3400,\, -3410,\, -3425,\, -3450,\, -3475,\, -3500,\, -3525,\, -3550,\, -3575,\, -3600,\, -3650,\, -3700,\, -3800,\, -3900,\, -4000\} \tag{10} $$
(3)
The reward
As stated above, the reward is defined by Equation (11). Specifically, if the feedback SOC at time step $t+1$ is larger than the upper boundary, a punishment is applied to the Q-value function, and the larger the deviation, the greater the punishment; if the feedback SOC at time step $t+1$ is lower than the lower boundary, a punishment is likewise applied. If the feedback SOC at time step $t+1$ is located in the feasible zone (between the lower and upper boundaries), a reward is provided to the Q-value function, and the closer the feedback SOC is to the reference, the greater the reward.
$$ r_t = \begin{cases} -\left| SOC_f(t+1) - SOC_{\mathrm{upper}}(t+1) \right|, & SOC_f(t+1) > SOC_{\mathrm{upper}}(t+1) \\ -\left| SOC_f(t+1) - SOC_{\mathrm{lower}}(t+1) \right|, & SOC_f(t+1) < SOC_{\mathrm{lower}}(t+1) \\ \dfrac{10}{1 + 100 \left| SOC_f(t+1) - SOC_{\mathrm{ref}}(t+1) \right|}, & SOC_{\mathrm{lower}}(t+1) \le SOC_f(t+1) \le SOC_{\mathrm{upper}}(t+1) \end{cases} \tag{11} $$
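Equation (11) can be sketched directly as follows; the negative signs on the out-of-zone branches follow the verbal description above of a punishment proportional to the deviation.

```python
def reward(soc_f, soc_ref, soc_upper, soc_lower):
    """Reward of Equation (11): punish boundary violations in proportion to the deviation,
    and reward staying inside the zone, more strongly the closer SOC_f is to the reference."""
    if soc_f > soc_upper:
        return -abs(soc_f - soc_upper)
    if soc_f < soc_lower:
        return -abs(soc_f - soc_lower)
    return 10.0 / (1.0 + 100.0 * abs(soc_f - soc_ref))
```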
(4)
The ε-greedy algorithm
To realize the self-learning control, the ε-greedy algorithm is deployed, which is formulated by
$$ \pi(a \mid s) = \begin{cases} a = \mathrm{random}(A), & \text{if } r_n < \varepsilon \\ a = \arg\max_{a} Q(s, a), & \text{if } r_n \ge \varepsilon \end{cases} \tag{12} $$
where $r_n$ is a random number ranging from 0 to 1.
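A minimal sketch of this selection rule over the 27-column Q-table is given below; the choice of random number generator is an implementation detail assumed here.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng=np.random.default_rng()):
    """Equation (12): explore with probability epsilon, otherwise exploit the Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # random action index from the action set A
    return int(np.argmax(Q[s, :]))             # greedy action index
```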
(5)
The RL-based energy management algorithm
1: initialize the Q and R tables as null matrices
2: for episode = 1, M do
3:  for t = 1, T do
4:    observe the current state $s_t$ ($s_t = SOC_f - SOC_{\mathrm{ref}}$)
5:    select the action $a_t$ with the ε-greedy algorithm
6:    execute the action $a_t$ and observe the next state $s_{t+1}$
7:    calculate the immediate reward based on Equation (11)
8:    update the Q-table by: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$
9:    if the feedback SOC is higher than 0.85 or lower than 0.25, or $|s_t|$ is larger than 0.04, then
10:     terminate the current episode and continue with the next one
11:   end
12:  end
13: end
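Putting the above elements together, a condensed sketch of the training loop is shown below. The PHEB plant and PMP layer are abstracted into a single assumed `simulate_step` callable, and `epsilon_fn` is an assumed exploration schedule (the one used in this paper is described in Section 4.1); neither is part of the original listing.

```python
import numpy as np

def train(Q, cycles, actions, simulate_step, epsilon_fn, n_episodes=100,
          alpha=0.95, gamma=0.8, rng=np.random.default_rng(0)):
    """Tabular QL training over a list of combined driving cycles.
    simulate_step(cycle, t, costate) -> (state_index, reward, soc_f, done) is an assumed
    callable that advances the PHEB model one step under PMP with the selected co-state."""
    for cycle in cycles:
        for episode in range(1, n_episodes + 1):
            eps = epsilon_fn(episode)
            s, _, soc_f, done = simulate_step(cycle, 0, actions[0])
            t = 0
            while not done:
                a = (int(rng.integers(len(actions))) if rng.random() < eps
                     else int(np.argmax(Q[s, :])))
                s_next, r, soc_f, done = simulate_step(cycle, t + 1, actions[a])
                Q[s, a] += alpha * (r + gamma * np.max(Q[s_next, :]) - Q[s, a])
                s, t = s_next, t + 1
                if soc_f > 0.85 or soc_f < 0.25:   # early termination, as in lines 9-11 above
                    break
    return Q
```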
As shown in Figure 8, the design process is divided into three steps: off-line training, off-line verification, and hardware-in-the-loop (HIL) testing with the controller.

4. Results and Discussion

A series of historical driving cycles of the PHEB is deployed for training and testing. The total length of the route is about 50 km with 39 bus stops, and the number of passengers at each stop is assumed to be random. Moreover, as shown in Figure 9, a series of combined driving cycles, each consisting of a driving cycle and a passenger mass profile, is designed in this paper.

4.1. The Training Process

To ensure that the RL-based energy management has good control performance, a well-trained Q-table is indispensable. Therefore, six combined driving cycles are first designed to train the Q-table. The Q-table is trained successively on these combined driving cycles, with 100 training episodes for each cycle under different ε values. Specifically, the training is divided into three stages: in the first stage, ε is set to 0.5 up to episode 45, which implies that the action is selected randomly with a probability of 50%; in the second stage, ε is set to 0.15 between episode 46 and episode 75, which implies that the action is selected randomly with a probability of 15%; in the third stage, ε is set to 0 between episode 76 and episode 100, which implies that the action is always selected from the trained Q-table.
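This three-stage schedule can be written as a small helper, used as the `epsilon_fn` argument of the training sketch in Section 3.3; the handling of the boundary at episode 45 is an assumption.

```python
def epsilon_schedule(episode):
    """Three-stage exploration schedule applied to each combined driving cycle."""
    if episode <= 45:
        return 0.5    # stage 1: explore with 50% probability
    if episode <= 75:
        return 0.15   # stage 2: mostly exploit the partially trained Q-table
    return 0.0        # stage 3: act greedily from the trained Q-table
```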
As shown in Figure 10, combined driving cycle 1 is first used to train the Q-table (named Q-table 1). In the first stage, the agent strives to probe possible actions; in this case, the final SOCs are higher than 0.6, which implies that Q-table 1 is not yet well trained. In the second stage, the agent partly selects actions from the trained Q-table 1 while still probing possible actions through the ε-greedy algorithm; in this case, the final SOCs fluctuate around 0.3, which implies that Q-table 1 has been trained better. In the third stage, the final SOC easily reaches the objective value and the feedback SOC trajectory closely follows the reference SOC trajectory, which implies that Q-table 1 has been well trained for combined driving cycle 1.
As shown in Figure 11, combined driving cycle 2 is used to continue training the Q-table (Q-table 2) on the basis of Q-table 1. Training episodes are still terminated early in the first stage; moreover, the final SOCs are higher than 0.5, which is lower than the final SOCs in Figure 10. In the second stage, the final SOCs satisfy the control objective and the control performance is good. In the third stage, the Q-table is well trained, the final SOC satisfies the control objective, and the SOC trajectory follows the reference SOC trajectory well. This implies that Q-table 2 is better trained than the Q-table in Figure 10.
As shown in Figure 12, combined driving cycle 3 is used to continue training the Q-table (Q-table 3) on the basis of the well-trained Q-table 2. In the first stage, the control performance is greatly improved compared with the first stages of Q-tables 1 and 2; however, the final SOCs still do not satisfy the control objective. In contrast, the control performance in stage 2 deteriorates compared with Q-table 2, which implies that the driving conditions may differ from combined driving cycle 2 and that the generalization performance of the Q-table should be further improved. Nevertheless, the control performance satisfies the control objective in the third stage, and Q-table 3 is regarded as well trained.
As shown in Figure 13, combined driving cycle 4 is used to continue training the Q-table (Q-table 4) on the basis of the well-trained Q-table 3. Similar to the first stage in Figure 12, Q-table 4 is not yet well trained at this point, because the final SOCs do not reach 0.3. However, the control performance is greatly improved in the second stage and satisfies the control objective in stage 3, which implies that Q-table 4 has been well trained.
It can be seen from Figure 14 and Figure 15 that Q-table 4 has been well trained, because the control objective is satisfied in all three stages regardless of the ε value. This implies that the generalization performance of Q-table 4 has been greatly improved and can meet the control requirements. Moreover, the generalization performance is further improved by the training on these two combined driving cycles, and Q-table 6, obtained after continued training on combined driving cycles 5 and 6, is taken as the final well-trained Q-table.

4.2. The Off-Line Verification

To verify the generalization and reliability of Q-table 6, combined driving cycle 7 (denoted No. 7) and combined driving cycle 8 (denoted No. 8) are deployed. From Figure 16, it can be seen that the co-state is well adjusted based on the driving cycles and Q-table 6. Moreover, the final SOCs satisfy the control objective, and the SOC trajectories stay within the designed boundaries and follow the SOC reference trajectories closely. This implies that the RL-based strategy has great potential for practical application. In addition, a rule-based energy management strategy is also deployed to evaluate the fuel economy of the RL-based energy management method. From Table 1, it can be seen that the fuel consumption for cycles No. 7 and No. 8 is reduced by 10.95% and 11.78%, respectively.

4.3. The Hardware-in-the-Loop Simulation Verification

As shown in Figure 17, a HIL test system, mainly consisting of an HCU, a switch, an upper computer, a CAN communication interface, a DC 12 V power supply, and a Kvaser interface, is built to verify the real-time performance and reliability of the RL-based energy management. Here, the upper computer transmits CAN signals to the HCU through the Kvaser interface.
As shown in Figure 18, a HIL simulation model is built based on the D2P rapid prototyping control system and the well-trained strategy. It mainly includes three modules: Input, HCU, and Output, where the HCU module embeds the RL-based energy management.
Additionally, combined driving cycle 9 (denoted No. 9) is also deployed. From Figure 19, it can be seen that the co-state is adjusted in real time based on the well-trained strategy, and the control requirements are fully satisfied. Moreover, the fuel consumption is decreased by 12.92% compared with the rule-based strategy, as shown in Table 2.

5. Conclusions

This paper proposes an RL-based energy management method based on a novel dynamic SOC design zone plan. The main conclusions are summarized as follows.
Firstly, the proposed dynamic SOC design zone plan method is feasible and applicable. The fuel consumption can be greatly decreased compared to the rule-based energy management.
Secondly, the agent of RL-based energy management can be well trained, and has good generalization performance. Moreover, the trained strategy can be easily embedded into the controller, and the real-time control performance can be satisfied well. It has great potential to be used in practice.
Future work will focus on further verification of the RL-based energy management in real vehicles.

Author Contributions

Investigation, X.C.; methodology, W.H. and X.C.; software, S.S. and Z.Z.; supervision, L.Z.; validation, S.S. and Z.Z.; writing-original draft, W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ajanovic, A.; Haas, R.; Schrödl, M. On the historical development and future prospects of various types of electric mobility. Energies 2021, 14, 1070.
2. Plötz, P.; Moll, C.; Bieker, G.; Mock, P. From lab-to-road: Real-world fuel consumption and CO2 emissions of plug-in hybrid electric vehicles. Environ. Res. Lett. 2021, 16, 054078.
3. Zhang, F.; Hu, X.; Langari, R.; Cao, D. Energy management strategies of connected HEVs and PHEVs: Recent progress and outlook. Prog. Energy Combust. Sci. 2019, 73, 235–256.
4. Huang, Y.; Wang, H.; Khajepour, A.; Li, B.; Ji, J.; Zhao, K.; Hu, C. A review of power management strategies and component sizing methods for hybrid vehicles. Renew. Sustain. Energy Rev. 2018, 96, 132–144.
5. Biswas, A.; Emadi, A. Energy Management Systems for Electrified Powertrains: State-of-The-Art Review and Future Trends. IEEE Trans. Veh. Technol. 2019, 68, 6453–6467.
6. Ding, N.; Prasad, K.; Lie, T.T. Design of a hybrid energy management system using designed rule-based control strategy and genetic algorithm for the series-parallel plug-in hybrid electric vehicle. Int. J. Energy Res. 2021, 45, 1627–1644.
7. Li, P.; Li, Y.; Wang, Y.; Jiao, X. An intelligent logic rule-based energy management strategy for power-split plug-in hybrid electric vehicle. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 7668–7672.
8. Hassanzadeh, M.; Rahmani, Z. Real-time optimization of plug-in hybrid electric vehicles based on Pontryagin’s minimum principle. Clean Technol. Environ. Policy 2021, 23, 2543–2560.
9. Wang, W.; Cai, Z.; Liu, S. Study on Real-Time Control Based on Dynamic Programming for Plug-In Hybrid Electric Vehicles. SAE Int. J. Electrified Veh. 2021, 10, 167.
10. Geng, S.; Schulte, T.; Maas, J. Model-Based Analysis of Different Equivalent Consumption Minimization Strategies for a Plug-In Hybrid Electric Vehicle. Appl. Sci. 2022, 12, 2905.
11. Lian, J.; Wang, X.R.; Li, L.H.; Zhou, Y.F.; Yu, S.Z.; Liu, X.J. Plug-in HEV energy management strategy based on SOC trajectory. Int. J. Veh. Des. 2020, 82, 1–17.
12. Liu, Y.J.; Sun, Q.; Han, Q.; Xu, H.G.; Han, W.X.; Guo, H.Q. A Robust Design Method for Optimal Engine Operating Zone Design of Plug-in Hybrid Electric Bus. IEEE Access 2022, 10, 6978–6988.
13. Lin, X.; Zhou, K.; Mo, L.; Li, H. Intelligent Energy Management Strategy Based on an Improved Reinforcement Learning Algorithm With Exploration Factor for a Plug-in PHEV. IEEE Trans. Intell. Transp. Syst. 2021, 1–11.
14. Zhang, H.; Peng, J.; Tan, H.; Dong, H.; Ding, F. A Deep Reinforcement Learning-Based Energy Management Framework With Lagrangian Relaxation for Plug-In Hybrid Electric Vehicle. IEEE Trans. Transp. Electrif. 2020, 7, 1146–1160.
15. Liu, T.; Hu, X.; Hu, W.; Zou, Y. A Heuristic Planning Reinforcement Learning-Based Energy Management for Power-Split Plug-in Hybrid Electric Vehicles. IEEE Trans. Ind. Inform. 2019, 15, 6436–6445.
16. Chen, Z.; Hu, H.; Wu, Y.; Zhang, Y.; Li, G.; Liu, Y. Stochastic model predictive control for energy management of power-split plug-in hybrid electric vehicles based on reinforcement learning. Energy 2020, 211, 118931.
17. Guo, H.Q.; Wei, G.; Wang, F.; Wang, C.; Du, S. Self-Learning Enhanced Energy Management for Plug-in Hybrid Electric Bus With a Target Preview Based SOC Plan Method. IEEE Access 2019, 7, 103153–103166.
18. Qi, C.; Zhu, Y.; Song, C.; Cao, J.; Xiao, F.; Zhang, X.; Xu, Z.; Song, S. Self-supervised reinforcement learning-based energy management for a hybrid electric vehicle. J. Power Sources 2021, 514, 230584.
19. Wu, Y.; Tan, H.; Peng, J.; Zhang, H.; He, H. Deep reinforcement learning of energy management with continuous control strategy and traffic information for a series-parallel plug-in hybrid electric bus. Appl. Energy 2019, 247, 454–466.
20. Tan, H.; Zhang, H.; Peng, J.; Jiang, Z.; Wu, Y. Energy management of hybrid electric bus based on deep reinforcement learning in continuous state and action space. Energy Convers. Manag. 2019, 195, 548–560.
21. He, W.; Huang, Y. Real-time Energy Optimization of Hybrid Electric Vehicle in Connected Environment Based on Deep Reinforcement Learning. IFAC-PapersOnLine 2021, 54, 176–181.
22. Kim, N.; Jeong, J.; Zheng, C. Adaptive energy management strategy for plug-in hybrid electric vehicles with Pontryagin’s minimum principle based on daily driving patterns. Int. J. Precis. Eng. Manuf.-Green Technol. 2019, 6, 539–548.
23. Xie, S.; Hu, X.; Xin, Z.; Brighton, J. Pontryagin’s minimum principle based model predictive control of energy management for a plug-in hybrid electric bus. Appl. Energy 2019, 236, 893–905.
24. Guo, H.; Du, S.; Zhao, F.; Cui, Q.; Ren, W. Intelligent Energy Management for Plug-in Hybrid Electric Bus with Limited State Space. Processes 2019, 7, 672.
25. Guo, H.; Zhao, F.; Guo, H.; Cui, Q.; Du, E.; Zhang, K. Self-learning energy management for plug-in hybrid electric bus considering expert experience and generalization performance. Int. J. Energy Res. 2020, 44, 5659–5674.
26. Onori, S.; Tribioli, L. Adaptive Pontryagin’s Minimum Principle supervisory controller design for the plug-in hybrid GM Chevrolet Volt. Appl. Energy 2015, 147, 224–234.
Figure 1. The structure of the PHEB.
Figure 2. The fuel consumption rate MAP of the engine.
Figure 3. The efficiency MAP of the motor.
Figure 4. The Rint model of the battery.
Figure 5. The principle of the dynamic SOC design zone method.
Figure 6. The principle of the reinforcement learning.
Figure 7. The principle of the RL-based energy management.
Figure 8. The design process of the RL-based energy management.
Figure 9. The combined driving cycles.
Figure 10. The training process of combined driving cycle 1.
Figure 11. The training process of combined driving cycle 2.
Figure 12. The training process of combined driving cycle 3.
Figure 13. The training process of combined driving cycle 4.
Figure 14. The training process of combined driving cycle 5.
Figure 15. The training process of combined driving cycle 6.
Figure 16. Offline verification results of combined driving cycles 7 and 8.
Figure 17. The HIL test system.
Figure 18. The composition of HIL.
Figure 19. The HIL test results of combined driving cycle 9.
Table 1. The fuel consumptions of the off-line verification.

Combined Driving Cycle | RL-Based (L/100 km) | Rule-Based (L/100 km) | Fuel Consumption Comparison
No. 7 | 16.8738 | 18.9497 | −10.95%
No. 8 | 16.6673 | 18.8939 | −11.78%
Table 2. The fuel consumption of the HIL.

Combined Driving Cycle | RL-Based (L/100 km) | Rule-Based (L/100 km) | Fuel Consumption Comparison
No. 9 | 15.2956 | 17.5642 | −12.92%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
