Article

Reinforcement Learning–Based Energy Management Strategy for a Hybrid Electric Tracked Vehicle

Collaborative Innovation Center of Electric Vehicles in Beijing, School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Energies 2015, 8(7), 7243-7260; https://doi.org/10.3390/en8077243
Submission received: 14 January 2015 / Revised: 16 June 2015 / Accepted: 29 June 2015 / Published: 16 July 2015
(This article belongs to the Special Issue Advances in Plug-in Hybrid Vehicles and Hybrid Vehicles)

Abstract

This paper presents a reinforcement learning (RL)–based energy management strategy for a hybrid electric tracked vehicle. A control-oriented model of the powertrain and vehicle dynamics is first established. According to the sampled information of the experimental driving schedule, statistical characteristics at various velocities are determined by extracting the transition probability matrix of the power request. Two RL-based algorithms, namely the Q-learning and Dyna algorithms, are applied to generate optimal control solutions. The two algorithms are simulated on the same driving schedule, and the simulation results are compared to clarify the merits and demerits of each. Although the Q-learning algorithm is faster (3 h) than the Dyna algorithm (7 h), its fuel consumption is 1.7% higher than that of the Dyna algorithm. Furthermore, the Dyna algorithm registers approximately the same fuel consumption as the dynamic programming–based global optimal solution, at a computational cost substantially lower than that of stochastic dynamic programming.

1. Introduction

In recent years, hybrid electric vehicles (HEVs) have been widely adopted to reduce fuel consumption and emissions. In these vehicles, an energy management strategy controls the power distribution among multiple energy storage systems [1,2]. This strategy realizes several control objectives, such as meeting the driver’s power demand, optimal gear shifting, and battery state-of-charge (SOC) regulation. Many optimal control methods have been proposed for designing energy management strategies in HEVs. For instance, when a vehicle follows a known driving cycle, the deterministic dynamic programming (DDP) approach can be used to obtain globally optimal results [3,4,5]. In addition, previous studies have applied the stochastic dynamic programming (SDP) approach to utilize the probabilistic statistics of the power request [6,7]. Pontryagin’s minimum principle was introduced in [8,9], and an equivalent consumption minimization strategy was suggested in [10,11,12] to obtain optimal control solutions. Furthermore, model predictive control was introduced in [13] and convex optimization was presented in [14]. Recently, game theory [15] and reinforcement learning (RL) [16] have attracted research attention for HEV energy management. RL is a heuristic learning method applied in numerous areas, such as robotic control, traffic management, and energy management. For example, previous studies have applied RL approaches for robotic control and for enabling robots to learn and adapt to situations online [17,18]. Furthermore, [19] proposed an RL approach for enabling a set of unmanned aerial vehicles to automatically determine patrolling patterns in a dynamic environment.
The aforementioned RL studies did not evaluate energy management strategies for HEVs. A power management strategy for an electric hybrid bicycle was presented in [20]; however, its powertrain is simpler than that of an HEV, and the power is not distributed among multiple power sources. In the current study, RL was applied to solve the energy management problem of a hybrid electric tracked vehicle (HETV). Statistical characteristics of an experimental driving schedule were extracted as a transition probability matrix of the power request. The energy management problem was formulated as a stochastic nonlinear optimal control problem with two state variables, namely the battery SOC and the rotational speed of the generator, and one control variable, namely the engine throttle signal. Subsequently, the Q-learning and Dyna algorithms were applied to determine an energy management strategy for improving the fuel economy and achieving battery charge sustenance. Furthermore, the RL-based energy management strategy was compared with the dynamic programming (DP)–based strategy. The simulation results indicated that the Q-learning algorithm entailed a lower computational cost (3 h) than the Dyna algorithm (7 h); nevertheless, its fuel consumption was 1.7% higher than that of the Dyna algorithm. The Dyna algorithm registered almost the same fuel consumption as the DP-based global optimal solution and is computationally more efficient than SDP. However, because of their computational burdens, the Q-learning and Dyna algorithms cannot yet be used online, and further research on real-time application is required.
The remainder of this paper is organized as follows: in Section 2, the hybrid powertrain is modeled and the optimal control problem is formulated. In Section 3, a statistical information model based on the experimental driving schedule is developed, and the Q-learning and Dyna algorithms are presented. The RL-based energy management strategy is compared with the DP- and SDP-based energy management strategies in Section 4. Section 5 concludes this paper.

2. Hybrid Powertrain Modeling

Figure 1 shows a heavy-duty HETV with a dual-motor drive structure. The powertrain comprises two main power sources: an engine-generator set (EGS) and a battery pack. The dashed arrow lines in the figure indicate the directions of power flows. To guarantee a quick and adequately precise simulation, a quasi-static modeling methodology [21] was used to model the power request of the hybrid powertrain. Table 1 lists the vehicle parameters used in the model.
Figure 1. Powertrain configuration of the HETV.
Table 1. HETV parameters.
Parameter | Symbol | Value
Sprocket radius | r | 0.313 m
Inertial yaw moment | Iz | 55,000 kg·m²
Motor shafts efficiency | η | 0.965
Gear ratio (motor to sprocket) | i0 | 13.2
Vehicle tread | B | 2.55 m
Curb weight | mv | 15,200 kg
Gravitational constant | g | 9.81 m/s²
Rolling resistance coefficient | f | 0.0494
Track contact length | L | 3.57 m
Motor efficiency | ηem | 0.9
Electromotive force parameter | Ke | 1.65 V·s·rad⁻²
Electromotive force parameter | Kx | 0.00037 N·m·A⁻²
Generator inertia | Jg | 2.0 kg·m²
Engine inertia | Je | 3.2 kg·m²
Gear ratio (engine to generator) | ieg | 1.6
Battery capacity | Qb | 50 Ah
Minimum engine speed | neng,min | 650 rpm
Maximum engine speed | neng,max | 2100 rpm
Minimum SOC | SOCmin | 0.2
Maximum SOC | SOCmax | 0.8

2.1. Power Request Model

Assuming that only longitudinal motions are considered [4], the torques of the two motors are calculated as follows:
T_1 = \left( \frac{F_1 r}{i_0 \eta} - \frac{M r}{B i_0 \eta} \right) + \left[ \frac{m_v r^2}{i_0^2 \eta} \cdot \frac{R}{R - B/2} - \frac{I_z r^2}{i_0^2 \eta B (R - B/2)} \right] \dot{\omega}_1
T_2 = \left( \frac{F_2 r}{i_0 \eta} + \frac{M r}{B i_0 \eta} \right) + \left[ \frac{m_v r^2}{i_0^2 \eta} \cdot \frac{R}{R + B/2} + \frac{I_z r^2}{i_0^2 \eta B (R - B/2)} \right] \dot{\omega}_2
where T1 and T2 are the torques of the inside and outside motors, respectively, and ω1 and ω2 are the rotational speeds of the inside and outside sprockets, respectively; r is the radius of the sprocket, Iz is the yaw moment of inertia, η is the efficiency from the motor shafts to the tracks, i0 is the fixed gear ratio between the motors and sprockets, B is the vehicle tread, R is the turning radius of the vehicle, mv is the curb weight, and F1 and F2 are the rolling resistance forces of the two tracks. The yaw moment from the ground M is evaluated as follows:
M = \frac{1}{4} u_t m_v g L
where g is the acceleration of gravity and L is the track contact length. The lateral resistance coefficient ut is computed empirically [22]:
u_t = u_{\max} \left( 0.925 + 0.15\, R/B \right)^{-1}
where umax is the maximum value of the lateral resistance coefficient. The turning radius R is expressed as:
R = \frac{B}{2} \cdot \frac{\omega_2 + \omega_1}{\omega_2 - \omega_1}
The rotational speed of the inside and outside sprockets (ω1 and ω2, respectively) is calculated as follows:
\omega_{2,1} = \frac{30}{\pi} \cdot \frac{v_{2,1}\, i_0}{r}
where v1 and v2 are the speeds of the two tracks. The rolling resistance forces acting on the two tracks are obtained using the following expression:
F_1 = F_2 = \frac{1}{2} f m_v g
where f is the rolling resistance coefficient. The power request Preq must be balanced by the two motors at all times:
P_{req} = T_1 \omega_1 \eta_{em}^{\pm 1} + T_2 \omega_2 \eta_{em}^{\pm 1}
where ηem is the efficiency of the motor. When the power request is positive, electric power is delivered to propel the vehicle and the positive exponent is used; when the powertrain absorbs electric power (e.g., during regenerative braking [23]), the negative exponent is used.
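To make the power request model concrete, a minimal Python sketch of Equations (1)–(8) is given below. It is illustrative only: the function name and the value of umax (which is not listed in Table 1) are assumptions, and the sign chosen for the efficiency exponent in Equation (8) is an interpretation; the remaining parameters follow Table 1.

```python
# Vehicle parameters from Table 1 (SI units)
r, i0, eta = 0.313, 13.2, 0.965      # sprocket radius, motor-to-sprocket ratio, driveline efficiency
mv, Iz, B, L = 15200.0, 55000.0, 2.55, 3.57
g, f, eta_em = 9.81, 0.0494, 0.9
u_max = 0.6                          # assumed maximum lateral resistance coefficient (not in Table 1)

def power_request(v1, v2, dw1, dw2):
    """Power request (W) from inside/outside track speeds v1, v2 (m/s) and sprocket
    accelerations dw1, dw2 (rad/s^2); Equations (1)-(8).
    Assumes v2 >= v1, i.e. motor 2 drives the outside track of the turn."""
    w1, w2 = v1 * i0 / r, v2 * i0 / r              # sprocket speeds in rad/s; Eq. (6) states them in rpm via 30/pi
    F1 = F2 = 0.5 * f * mv * g                     # Eq. (7): rolling resistance per track
    R = 0.5 * B * (w2 + w1) / max(w2 - w1, 1e-6)   # Eq. (5); straight-line driving -> very large R
    ut = u_max / (0.925 + 0.15 * R / B)            # Eq. (4): lateral resistance coefficient
    M = 0.25 * ut * mv * g * L                     # Eq. (3): yaw moment from the ground
    T1 = (F1 * r / (i0 * eta) - M * r / (B * i0 * eta)) \
         + (mv * r**2 / (i0**2 * eta) * R / (R - B / 2)
            - Iz * r**2 / (i0**2 * eta * B * (R - B / 2))) * dw1   # Eq. (1)
    T2 = (F2 * r / (i0 * eta) + M * r / (B * i0 * eta)) \
         + (mv * r**2 / (i0**2 * eta) * R / (R + B / 2)
            + Iz * r**2 / (i0**2 * eta * B * (R - B / 2))) * dw2   # Eq. (2)
    P_mech = T1 * w1 + T2 * w2
    k = -1 if P_mech >= 0 else 1    # assumed interpretation of Eq. (8): divide by the motor
    return P_mech * eta_em**k       # efficiency when propelling, multiply when regenerating
```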

2.2. EGS Model

Figure 2 illustrates the equivalent electric circuit of the engine, permanent-magnet generator, and rectifier, where ωg is the rotational speed of the generator, Tg is the electromagnetic torque, Ke is the coefficient of the electromotive force, and Kxωg is the equivalent internal resistance; Kx is calculated as follows:
K_x = \frac{3}{\pi} K L_g
where K is the number of poles and Lg is the synchronous inductance of the armature. The output voltage and current of the generator, Ug and Ig, respectively, are computed as follows [4]:
\frac{T_{eng}}{i_{eg}} - T_g = 0.1047\, i_{eg} \left( \frac{J_e}{i_{eg}^2} + J_g \right) \frac{d n_{eng}}{dt}
K_e I_g - K_x I_g^2 = T_g
U_g = K_e \omega_g - K_x \omega_g I_g
n_{eng} = \frac{30\, \omega_g}{\pi\, i_{eg}}
where neng and Teng are the rotational speed and torque of the engine, respectively; Je and Jg are the moments of inertia of the engine and generator, respectively; and ieg is the fixed gear ratio connecting the engine and generator. The power request is balanced at any time by the EGS and battery as follows:
P_{req} = \left( U_g I_g + U_b I_b \right) \eta_{em}^{\pm 1}
where Ub and Ib are the voltage and current, respectively, of the battery. Figure 3 depicts the EGS test and simulation results used to validate the equivalent electric circuit model; Ug and neng are predicted with acceptable accuracy under a pulsed transient current load.
Figure 2. Equivalent circuit of the engine-generator set.
Figure 3. Test and simulation results of the equivalent circuit.
The engine operating region is limited to ensure safety and reliability:
n_{eng,\min} \leq n_{eng} \leq n_{eng,\max}
0 \leq T_{eng} \leq T_{eng,\max}
The fuel mass flow rate ṁf (g/s) was determined from the engine torque Teng and speed neng by using a brake specific fuel consumption map, which is typically obtained through a bench test. The control variable, the engine throttle signal u_th(t), was normalized to the range [0,1], and the engine torque was optimally regulated to control the power split between the EGS and the battery to minimize fuel consumption.
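A minimal sketch of the quasi-static EGS relations above is given below. The function names are illustrative assumptions; the current is obtained by inverting the torque equation, taking the smaller root of the quadratic in Ig as the physical operating point, and the engine speed derivative follows the reconstructed speed equation with neng in rpm.

```python
import math

Ke, Kx = 1.65, 0.00037        # electromotive force parameters (Table 1)
Je, Jg, ieg = 3.2, 2.0, 1.6   # engine/generator inertias and gear ratio (Table 1)

def generator_electrical(Tg, wg):
    """Quasi-static generator map: current Ig (A) and output voltage Ug (V) for a
    given electromagnetic torque Tg (Nm) and generator speed wg (rad/s)."""
    # Ke*Ig - Kx*Ig^2 = Tg  ->  Kx*Ig^2 - Ke*Ig + Tg = 0; valid while Tg <= Ke^2 / (4*Kx)
    Ig = (Ke - math.sqrt(Ke**2 - 4.0 * Kx * Tg)) / (2.0 * Kx)
    Ug = Ke * wg - Kx * wg * Ig
    return Ug, Ig

def engine_speed_derivative(Teng, Tg):
    """EGS rotational dynamics: dn_eng/dt in rpm/s for engine torque Teng and
    generator electromagnetic torque Tg (Nm)."""
    return (Teng / ieg - Tg) / (0.1047 * ieg * (Je / ieg**2 + Jg))
```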

2.3. Battery Model

The battery SOC is the second state variable and is calculated as follows:
\frac{d\, SOC(t)}{dt} = -\frac{I_b(t)}{Q_b}
I_b(t) = \frac{V_{oc} - \sqrt{V_{oc}^2 - 4 R_{int} P_b(t)}}{2 R_{int}}
where Qb is the battery capacity, Ib is the battery current, Voc is the open circuit voltage, Rint is the internal resistance, and Pb is the output power of the battery. To ensure reliability and safety, the current and SOC are constrained as follows:
I_{b,\min} \leq I_b(t) \leq I_{b,\max}
SOC_{\min} \leq SOC(t) \leq SOC_{\max}
Figure 4 shows the Voc and Rint parameters [4].
Figure 4. Parameters of Voc and Rint.
The cost function to be minimized trades off fuel consumption against battery charge sustainability and is expressed as follows:
J = \int_{t_0}^{t_f} \left[ \dot{m}_f(t) + \beta \left( SOC(t_f) - SOC(t_0) \right)^2 \right] dt
where β is a positive weighting factor, which is normally identified through multiple simulation iterations, and [t0, tf] is the entire time span.
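The following sketch implements the battery SOC dynamics and an evaluation of the cost; the helper names are illustrative, Voc and Rint would be interpolated from the curves in Figure 4, and the charge-sustaining penalty is applied once over the whole trip, which is the usual reading of the cost function above.

```python
import math

Qb_As = 50.0 * 3600.0    # battery capacity, Ah -> As

def soc_derivative(Pb, Voc, Rint):
    """Battery current from the requested output power Pb (W) through the
    internal-resistance circuit, then dSOC/dt (1/s)."""
    Ib = (Voc - math.sqrt(Voc**2 - 4.0 * Rint * Pb)) / (2.0 * Rint)
    return -Ib / Qb_As, Ib

def trip_cost(fuel_rate_g_per_s, soc_traj, dt, beta):
    """Cumulative fuel mass plus a quadratic charge-sustaining penalty on the
    deviation of the final SOC from the initial SOC."""
    fuel = sum(mf * dt for mf in fuel_rate_g_per_s)
    return fuel + beta * (soc_traj[-1] - soc_traj[0])**2
```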

3. RL-Based Energy Management Strategy

RL is a machine learning approach in which an agent senses the state of its environment and acts on the environment according to a control policy. In the proposed model, the control policy is improved iteratively by two RL algorithms, Q-learning and Dyna. The environment provides numerical feedback, called a reward, and a transition probability matrix for the agent. According to the driving-schedule statistical model, the transition probability matrix is extracted from the sampled information. The RL algorithms are then adopted to optimize fuel consumption on another driving schedule by using this transition probability matrix.

3.1. Statistical Information of the Driving Schedule

A long natural driving schedule, including significant accelerations, braking, and steering (Figure 5), was obtained through a field experiment. The power request corresponding to the driving schedule is calculated according to Equations (1)–(8) (Figure 6).
Figure 5. Long driving schedule of the tracked vehicle.
Figure 6. Power request of the long driving schedule.
Maximum likelihood estimation and the nearest-neighbor method were employed to compute the transition probabilities of the power request [24]:
p_{ik,j} = \frac{N_{ik,j}}{N_{ik}}, \quad N_{ik} \neq 0
where Nik,j is the number of observed transitions from Pireq to Pjreq at an average vehicle velocity of v̄k, and Nik is the total number of occurrences of Pireq at an average velocity of v̄k. A smoothing technique was applied to the estimated parameters [25]. Figure 7 illustrates the transition probability map at a velocity of 25 km/h.
Figure 7. Power request transition probability map at 25 km/h.
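As an illustration, the following Python sketch estimates the velocity-binned transition probability matrix by maximum likelihood. The nearest-neighbor assignment is approximated here by simple binning, the smoothing step of [25] is omitted, and the grids and function name are assumptions.

```python
import numpy as np

def transition_probabilities(p_req, v_avg, p_grid, v_grid):
    """Maximum likelihood estimate of the power-request transition probabilities,
    binned by average vehicle velocity: p[k, i, j] = N_ik,j / N_ik."""
    # Discretize the sampled power request and velocity onto the chosen grids
    p_idx = np.clip(np.digitize(p_req, p_grid) - 1, 0, len(p_grid) - 1)
    v_idx = np.clip(np.digitize(v_avg, v_grid) - 1, 0, len(v_grid) - 1)
    counts = np.zeros((len(v_grid), len(p_grid), len(p_grid)))
    for t in range(len(p_req) - 1):
        counts[v_idx[t], p_idx[t], p_idx[t + 1]] += 1.0   # transition P^i -> P^j in velocity bin k
    totals = counts.sum(axis=2, keepdims=True)            # N_ik
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```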
In this study, following the Markov decision processes (MDPs) introduced in [26], the driving schedule was modeled as a finite MDP. The MDP comprises a set of states S = {(SOC(t), neng(t)) | 0.2 ≤ SOC(t) ≤ 0.8, neng,min ≤ neng(t) ≤ neng,max}, a set of actions a = {u_th(t)}, a reward function r = ṁf(s,a), and a transition function psa,s', where psa,s' represents the probability of making a transition from state s to state s' under action a.
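A possible discretization of this MDP is sketched below; the grid resolutions are illustrative choices, not the ones used in the paper.

```python
import numpy as np

# Illustrative state and action grids for the energy management MDP
soc_grid = np.linspace(0.2, 0.8, 31)          # battery SOC within [SOC_min, SOC_max]
neng_grid = np.linspace(650.0, 2100.0, 30)    # engine speed within [n_eng,min, n_eng,max] (rpm)
throttle_grid = np.linspace(0.0, 1.0, 11)     # normalized engine throttle signal u_th

# A state is a (SOC, n_eng) pair, an action is a throttle value, and the reward of a
# transition is the fuel mass flow rate read from the BSFC map.
n_states = len(soc_grid) * len(neng_grid)
n_actions = len(throttle_grid)
```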

3.2. Q-Learning and Dyna Algorithms

When π is a complete decision policy, the optimal value of a state s is defined as the minimum expected discounted sum of rewards over the finite horizon [27], which is represented as follows:
V^*(s) = \min_{\pi} E \left( \sum_{t=t_0}^{t_f} \gamma^t r_t \right)
where γ ∈ [0,1] is the discount factor. The optimal value function is unique and can be reformulated as follows:
V^*(s) = \min_{a} \left( r(s,a) + \gamma \sum_{s' \in S} p_{sa,s'} V^*(s') \right), \quad \forall s \in S
Given the optimal value function, the optimal policy is specified as follows:
\pi^*(s) = \arg\min_{a} \left( r(s,a) + \gamma \sum_{s' \in S} p_{sa,s'} V^*(s') \right)
Subsequently, the Q value and optimal Q value corresponding to the state s and action a are defined recursively as follows:
Q(s,a) = r(s,a) + \gamma \sum_{s' \in S} p_{sa,s'} Q(s',a')
Q^*(s,a) = r(s,a) + \gamma \sum_{s' \in S} p_{sa,s'} \min_{a'} Q^*(s',a')
The variable V*(s) is the value of state s when an optimal action is taken initially; therefore, V*(s) = mina Q*(s, a) and π*(s) = arg mina Q*(s, a). The Q-learning update rule is expressed as follows:
Q(s,a) := Q(s,a) + \alpha \left( r + \gamma \min_{a'} Q(s',a') - Q(s,a) \right)
where α ∈ [0,1] is the learning rate of Q-learning, which decays during learning. Unlike the Q-learning algorithm, the Dyna algorithm learns a model of the environment while iteratively interacting with it. For the tracked vehicle, the Dyna algorithm records sample information as the vehicle operates on a new driving schedule, and this incremental statistical information is used to update the reward and transition functions. The Dyna update rule is as follows:
Q(s,a) = \bar{r}(s,a) + \gamma \sum_{s' \in S} \bar{p}_{sa,s'} Q(s',a')
Q(s,a) := Q(s,a) + \alpha \left( r + \gamma \min_{a'} Q(s',a') - Q(s,a) \right)
where r̄ and p̄sa,s' are time-variant and change as the driving schedule is updated. The Dyna algorithm clearly entails a heavier computational burden than the Q-learning algorithm; Section 4 compares the optimality of the two algorithms. Figure 8 depicts the computational flowchart of the two algorithms.
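The two update rules can be contrasted in a compact tabular sketch, assuming the discretized state and action grids of Section 3.1. This is a generic implementation consistent with the equations above, not the authors' MDP-toolbox workflow of Figure 8, and the hyperparameters (alpha, gamma, n_planning) are placeholders.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """One Q-learning backup with cost-minimizing targets (the reward is a fuel cost)."""
    target = r + gamma * Q[s_next].min()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def dyna_update(Q, counts, r_bar, s, a, r, s_next, alpha, gamma, n_planning=20, rng=None):
    """One Dyna step: refresh the learned model (transition counts and mean rewards),
    apply the direct Q-learning backup, then run n_planning backups on the model."""
    rng = rng or np.random.default_rng()
    counts[s, a, s_next] += 1.0                              # incremental transition statistics
    r_bar[s, a] += (r - r_bar[s, a]) / counts[s, a].sum()    # running mean reward r_bar(s, a)
    Q = q_learning_update(Q, s, a, r, s_next, alpha, gamma)  # backup from direct experience
    visited = np.argwhere(counts.sum(axis=2) > 0)            # previously observed (s, a) pairs
    for _ in range(n_planning):                              # model-based (planning) backups
        ps, pa = visited[rng.integers(len(visited))]
        p_bar = counts[ps, pa] / counts[ps, pa].sum()        # learned transition probabilities
        Q[ps, pa] = r_bar[ps, pa] + gamma * p_bar @ Q.min(axis=1)
    return Q
```

In this sketch, states index the discretized (SOC, neng) grid and actions index the throttle grid; Q, counts, and r_bar would be initialized to zeros with shapes (n_states, n_actions), (n_states, n_actions, n_states), and (n_states, n_actions), respectively.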

4. Results and Discussion

4.1. Comparison between the Q-Learning and Dyna Algorithms

Figure 9 shows the experimental driving schedule used in the simulation. Figure 10 illustrates the mean discrepancy of the two algorithms at v = 25 km/h, where the mean discrepancy is the deviation between two Q values per 100 iterations. The mean discrepancy declined with iterative computation, indicating that both the Q-learning and Dyna algorithms converge. Figure 10 also shows that the Dyna algorithm converges faster than the Q-learning algorithm, presumably because the time-variant reward and transition functions in the Dyna algorithm accelerate convergence [28].
Figure 8. Computational flowchart of the Q-learning and Dyna algorithms. * The MDP toolbox is introduced in [26].
Figure 9. Experimental driving schedule used in the simulation.
Figure 10. Mean discrepancy of the value function in the Q-learning and Dyna algorithms.
Figure 11 depicts the simulation results of the Q-learning and Dyna algorithms. Because of the charge-sustenance term in the cost function, the final SOC values were close to the initial SOC value. Figure 11b shows the fuel consumption and working points of the engine. An SOC-correction method [29] was applied to compensate for the fuel consumption differences caused by the different final SOC values. Figure 12 illustrates the performance of the two algorithms. Table 2 lists the fuel consumption; the fuel consumption of the Dyna algorithm is lower than that of the Q-learning algorithm, which is attributable to the time-variant reward and transition functions used by the Dyna algorithm.
Figure 11. SOC trajectories and engine operation area in the Q-learning and Dyna algorithms.
Figure 12. Battery and engine power in the Q-learning and Dyna algorithms.
Table 2. Fuel consumption in the Q-learning and Dyna algorithms.
Algorithm | Fuel Consumption (g) | Relative Increase (%)
Dyna | 2847 | –
Q-learning | 2896 | 1.72
Table 3 shows the computation times of the two algorithms; the Dyna algorithm requires a longer computation time than the Q-learning algorithm. This is caused by the Dyna update rule, in which the reward function and the transition probabilities are updated at a certain step size [28]. Thus, the updated transition probabilities and reward function of the Dyna algorithm result in lower fuel consumption but a longer computation time.
Table 3. Computation times of the Q-learning and Dyna algorithms.
Algorithm | Q-learning | Dyna
Time a (h) | 3 | 7
a A 2.4 GHz microprocessor with 12 GB RAM was used.

4.2. Comparative Analysis of the Results of the Dyna Algorithm, SDP, and DP

To validate the optimality of the RL technique, the Dyna algorithm, SDP [24], and DP [30] were applied to the experimental driving schedule shown in Figure 9; Figure 13 presents the simulation results. The terminal SOC values were close to the initial values because of the final-state constraint in the cost function. Figure 13b illustrates the engine work area, indicating that the engine frequently operates in a low fuel consumption region to ensure optimal fuel economy. Table 4 lists the fuel consumption after SOC correction. The Dyna-based fuel consumption was lower than the SDP-based fuel consumption and very close to the DP-based fuel consumption. Table 5 shows the computation times of the three algorithms. Because of the policy iteration process in SDP, the SDP-based computation time was considerably longer than the Dyna- and DP-based computation times.
Figure 13. SOC trajectories and engine operation area in the Dyna algorithm, SDP, and DP.
Table 4. Fuel consumption in the Dyna algorithm, SDP, and DP.
Algorithm | Fuel Consumption (g) | Relative Increase (%)
DP | 2847 | –
Dyna | 2853 | 0.21
SDP | 2925 | 2.74
Table 5. Computation times of the Dyna algorithm, SDP, and DP.
Algorithm | DP | Dyna | SDP
Time a (h) | 2 | 7 | 12
a A 2.4 GHz microprocessor with 12 GB RAM was used.
Because the Dyna-based control policy is very close to the DP-based optimal control policy, the Dyna algorithm has the potential to be used in a real-time control strategy in the future. When the power request is treated as a continuous process, the next power request of the vehicle can be predicted accurately using the methods introduced in [31,32]. When this prediction is combined with the Dyna algorithm, the reward function and transition probability matrix can be updated online. Furthermore, the computation time can be reduced when the transition probability matrix is updated as described in [31]. Finally, the power split at the next time step can be determined and real-time control can be implemented.
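A hypothetical sketch of this one-step scheme is given below; the function and variable names are assumptions, probs is the velocity-binned transition matrix of Section 3.1, and Q is the learned Q table.

```python
import numpy as np

def predict_and_act(Q, probs, v_bin, p_bin, p_grid, state_idx):
    """One-step sketch of the real-time idea: predict the next power request from the
    learned transition matrix, then pick the cost-minimizing throttle for the current state."""
    p_next = probs[v_bin, p_bin]                 # distribution over the next power-request bin
    expected_p_req = float(p_next @ p_grid)      # expected power request at the next step
    action_idx = int(np.argmin(Q[state_idx]))    # throttle index with the lowest expected cost
    return expected_p_req, action_idx
```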

5. Conclusions

In this study, the RL method was employed to derive an optimal energy management policy for an HETV. The update rules of the Q-learning and Dyna algorithms were elucidated, and the two algorithms were applied to the same experimental driving schedule to compare their optimality and computation times. The simulation results indicated that the Dyna algorithm achieves better fuel economy than the Q-learning algorithm, although its computation time is considerably longer. The optimality of the Dyna algorithm was validated by comparison with the DP and SDP methods: the Dyna-based control policy is more effective than the SDP-based control policy and close to the DP-based globally optimal control policy. In future studies, the Dyna algorithm will be used to realize real-time control by predicting the next power request with a stationary Markov chain–based transition probability model.

Acknowledgments

The authors thank the reviewers for their valuable comments and suggestions. This research was supported by the National Natural Science Foundation of China (Grant 51375044), National Defense Basic Research, China (Grant B2220132010), and the University Talent Introduction Program of China (Grant B12022).

Author Contributions

Teng Liu wrote and revised the manuscript. Yuan Zou decided how to modify the reinforcement learning algorithm and corresponded with the editor. Dexing Liu computed the transition probability matrices of the different driving schedules. Fengchun Sun revised all figures according to the reviewers' suggestions and edited the English.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Serrao, L.; Onori, S.; Rizzoni, G. A Comparative Analysis of Energy Management Strategies for Hybrid Electric Vehicles. J. Dyn. Syst. Meas. Control 2011, 133, 031012:1–031012:9.
2. Lin, C.C.; Kang, J.M.; Grizzle, J.W.; Peng, H. Energy Management Strategy for a Parallel Hybrid Electric Truck. In Proceedings of the American Control Conference 2001, Arlington, VA, USA, 25–27 June 2001; Volume 4, pp. 2878–2883.
3. Zou, Y.; Liu, T.; Sun, F.C.; Peng, H. Comparative study of dynamic programming and Pontryagin's minimum principle on energy management for a parallel hybrid electric vehicle. Energies 2013, 6, 2305–2318.
4. Zou, Y.; Sun, F.C.; Hu, X.S.; Guzzella, L.; Peng, H. Combined optimal sizing and control for a hybrid tracked vehicle. Energies 2012, 5, 4697–4710.
5. Sundstrom, O.; Ambuhl, D.; Guzzella, L. On implementation of dynamic programming for optimal control problems with final state constraints. Oil Gas Sci. Technol. 2009, 65, 91–102.
6. Johannesson, L.; Åsbogård, M.; Egardt, B. Assessing the potential of predictive control for hybrid vehicle powertrains using stochastic dynamic programming. IEEE Trans. Intell. Transp. Syst. 2007, 8, 71–83.
7. Tate, E.; Grizzle, J.; Peng, H. Shortest path stochastic control for hybrid electric vehicles. Int. J. Robust Nonlinear Control 2008, 18, 1409–1429.
8. Kim, N.; Cha, S.; Peng, H. Optimal control of hybrid electric vehicles based on Pontryagin's minimum principle. IEEE Trans. Control Syst. Technol. 2011, 19, 1279–1287.
9. Delprat, S.; Lauber, J.; Marie, T.; Rimaux, J. Control of a paralleled hybrid powertrain: Optimal control. IEEE Trans. Veh. Technol. 2004, 53, 872–881.
10. Nüesch, T.; Cerofolini, A.; Mancini, G.; Guzzella, L. Equivalent consumption minimization strategy for the control of real driving NOx emissions of a diesel hybrid electric vehicle. Energies 2014, 7, 3148–3178.
11. Musardo, C.; Rizzoni, G.; Guezennec, Y.; Staccia, B. A-ECMS: An adaptive algorithm for hybrid electric vehicle energy management. Eur. J. Control 2005, 11, 509–524.
12. Sciarretta, A.; Back, M.; Guzzella, L. Optimal control of paralleled hybrid electric vehicles. IEEE Trans. Control Syst. Technol. 2004, 12, 352–363.
13. Vu, T.V.; Chen, C.K.; Hung, C.W. A model predictive control approach for fuel economy improvement of a series hydraulic hybrid vehicle. Energies 2014, 7, 7017–7040.
14. Nüesch, T.; Elbert, P.; Guzzella, L. Convex optimization for the energy management of hybrid electric vehicles considering engine start and gearshift costs. Energies 2014, 7, 834–856.
15. Gao, B.T.; Zhang, W.H.; Tang, Y.; Hu, M.J.; Zhu, M.C.; Zhan, H.Y. Game-theoretic energy management for residential users with dischargeable plug-in electric vehicles. Energies 2014, 7, 7499–7518.
16. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA; London, UK, 2005; pp. 140–300.
17. Hester, T.; Quinlan, M.; Stone, P. RTMBA: A real-time model-based reinforcement learning architecture for robot control. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 85–90.
18. Degris, T.; Pilarski, P.M.; Sutton, R.S. Model-free reinforcement learning with continuous action in practice. In Proceedings of the 2012 American Control Conference, Montreal, QC, Canada, 27–29 June 2012; pp. 2177–2182.
19. Perron, J.; Moulin, B.; Berger, J. A hybrid approach based on multi-agent geosimulation and reinforcement learning to solve a UAV patrolling problem. In Proceedings of the Winter Simulation Conference, Austin, TX, USA, 7–10 December 2008; pp. 1259–1267.
20. Hsu, R.C.; Liu, C.T.; Chan, D.Y. A reinforcement-learning-based assisted power management with QoR provisioning for human–electric hybrid bicycle. IEEE Trans. Ind. Electron. 2012, 59, 3350–3359.
21. Abdelsalam, A.A.; Cui, S.M. A fuzzy logic global power management strategy for hybrid electric vehicles based on a permanent magnet electric variable transmission. Energies 2012, 5, 1175–1198.
22. Langari, R.; Won, J.S. Intelligent energy management agent for a parallel hybrid vehicle—Part I: System architecture and design of the driving situation identification process. IEEE Trans. Veh. Technol. 2005, 54, 925–934.
23. Guo, J.G.; Jian, X.P.; Lin, G.Y. Performance evaluation of an anti-lock braking system for electric vehicles with a fuzzy sliding mode controller. Energies 2014, 5, 6459–6476.
24. Lin, C.C.; Peng, H.; Grizzle, J.W. A stochastic control strategy for hybrid electric vehicles. In Proceedings of the American Control Conference, Boston, MA, USA, 30 June–2 July 2004; pp. 4710–4715.
25. Dai, J. Isolated word recognition using Markov chain models. IEEE Trans. Speech Audio Process. 1995, 3, 458–463.
26. Brazdil, T.; Chatterjee, K.; Chmelik, M.; Forejt, V.; Kretinsky, J.; Kwiatkowska, M.; Parker, D.; Ujma, M. Verification of Markov decision processes using learning algorithms. Logic Comput. Sci. 2014, 2, 4–18.
27. Chades, I.; Chapron, G.; Cros, M.J. Markov Decision Processes Toolbox, Version 4.0.2. Available online: http://cran.r-project.org/web/packages/MDPtoolbox/ (accessed on 22 July 2014).
28. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285.
29. Zou, Z.Y.; Xu, J.; Mi, C.; Cao, B.G. Evaluation of model based state of charge estimation methods for lithium-ion batteries. Energies 2014, 7, 5065–5082.
30. Jimenez, F.; Cabrera-Montiel, W. System for Road Vehicle Energy Optimization Using Real Time Road and Traffic Information. Energies 2014, 7, 3576–3598.
31. Filev, D.P.; Kolmanovsky, I. Generalized Markov models for real-time modeling of continuous systems. IEEE Trans. Fuzzy Syst. 2014, 22, 983–998.
32. Di Cairano, S.; Bernardini, D.; Bemporad, A.; Kolmanovsky, I.V. Stochastic MPC with learning for driver-predictive vehicle control and its application to HEV energy management. IEEE Trans. Control Syst. Technol. 2014, 22, 1018–1030.
