Novel Approaches for Energy Management Strategies of Hybrid Electric Vehicles and Comparison with Conventional Solutions

Abstract: Well-designed energy management strategies are essential for the good operation of Hybrid Electric Vehicles (HEVs) in terms of fuel economy and pollutant emissions reduction, regardless of the specific powertrain architecture. The goal of this paper is to propose two innovative supervisory control strategies for HEVs derived from different optimization algorithms and to assess HEVs' fuel consumption reduction (compared to conventional vehicles). These approaches are derived from the literature and modified by the authors to present novel algorithms for the optimization problem. One is based on Dynamic Programming (DP), here referred to as the Forward Approach to Dynamic Programming (FADP), and introduces a different implementation of the DP to achieve computational and accuracy benefits. The other is based on the Equivalent Consumption Minimization Strategy (ECMS) approach, and it adapts to the latest driving conditions using information gathered in a finite-length backward-looking horizon. These techniques are used to achieve the optimal power share between the thermal engine and the battery of a parallel HEV. Their performances are compared and analysed in terms of achieved fuel economy and computational time with respect to conventional DP and Pontryagin's Minimum Principle (PMP) approaches.


Introduction
In the current world energy scenario, the transport sector accounts for the highest share (one third) of final energy consumption. Indeed, it has been reported that, in 2014, passenger cars alone used more energy than the entire residential sector; moreover, together with freight road vehicles, they accounted for almost a third of global energy-related CO2 emissions [1]. Therefore, the transition to road vehicles that are not powered by fossil fuels, such as battery electric vehicles (BEVs) and fuel cell electric vehicles (FCEVs), will have a huge positive impact on global CO2 emissions, as long as electricity and hydrogen, respectively, are produced from renewable energy sources. Nevertheless, several decades will be required to complete this transition because cost reduction, improved reliability and infrastructure development are still needed.
In this context, it is the authors' opinion that hybrid electric vehicles (HEVs) could represent one of the most feasible and fastest ways in the short term to reduce the impact of the transportation sector on global energy-related CO2 emissions, combining the advantages of conventional vehicles (i.e., based on internal combustion engines) with those of BEVs. Depending on the HEV architecture, it is possible to achieve some of the following improvements [2]:
• downsize the engine to achieve higher average efficiency;
• recover energy during braking instead of dissipating it as friction losses;
• optimize the power split between prime movers;
• eliminate idle fuel consumption by turning off the engine (stop-and-start);
• reduce clutching losses by decoupling the engine from the road.
HEVs were first classified into two basic types: series or parallel. In series HEVs, the electric motor alone drives the vehicle with the electricity supplied by the battery or by the engine-driven generator. In parallel HEVs, on the contrary, both prime movers can power the vehicle individually or simultaneously.
When the battery is recharged from the grid, as in BEVs, the HEV is referred to as a plug-in HEV (PHEV). Unlike non-rechargeable hybrids, this vehicle is not limited to charge-sustaining operation. Therefore, the battery can be discharged to its lower limit over the whole driving mission length. However, the range that PHEVs can cover in pure electric mode is limited.
Choosing the best hybrid powertrain architecture and size of the components for a given vehicle is not a trivial task. Indeed, every architecture has strengths and weaknesses that might be relevant depending on the vehicle size, usage, and final cost. Once the architecture is established, the control structure should be defined to obtain the greatest benefits in terms of fuel economy increase as well as emissions and life-cycle cost reduction.
In the design of a hybrid powertrain, optimization has a key role in the entire development process. Three main optimization levels can be identified:
• structural optimization to find the best possible powertrain layout architecture;
• parametric optimization to find the best possible parameters for the components of a fixed structure;
• control system optimization to find the best possible supervisory control algorithms.
The structural optimization for HEVs essentially consists in the choice of the best powertrain architecture (i.e., series, parallel, series-parallel or combined hybrid). The parametric optimization, instead, consists in the optimal sizing of the engine, of the battery pack, of the electric motor (or motors) and of the gearbox gear ratios. Regardless of the configuration, the potential fuel economy improvement and emission reduction rely entirely on the power distribution within the hybrid powertrain. Hence, there is the need for a supervisory controller to determine how the power demand is distributed among the powertrain components, providing the set points for all the low-level controllers.
Choosing the proper control strategy, which in this context is referred to as the Energy Management Strategy (EMS), is complex and requires knowledge of both the vehicle architecture and the driving mission profile, with a heavy computational burden for the online implementation of such strategies [3].
The contribution of this work is to propose and evaluate the performance of two novel supervisory control algorithms: a Forward Approach to Dynamic Programming (FADP) and an adaptation of a causal Equivalent Consumption Minimization Strategy (ECMS-based). The main advantages and drawbacks of such algorithms are compared to conventional optimization techniques: Dynamic Programming (DP) and Pontryagin's Minimum Principle (PMP) are used as reference since they are well established and are commonly used offline to find the optimal control law of HEVs over an assigned driving mission.
The paper is structured as follows: Section 2 briefly addresses the description of the model used to simulate the HEV operation, based on information already available in the literature. Section 3 presents an initial overview on conventional HEVs supervisory control strategies (i.e., DP and PMP) followed by a more detailed description of the novel strategies proposed in this work (i.e., FADP and ECMS-based). In Section 4, a critical analysis of the performances of each optimization technique is performed accounting for required computational time and achieved fuel economy over several driving missions.

Model of the Hybrid Electric Vehicle
The hybrid vehicle considered in this study presents a parallel P3 hybrid powertrain configuration, as already presented in [4]. The base vehicle mass is 1400 kg, and the frontal area is 2 m², whereas the other key parameters required for the simulation are taken from [5]. The electrical machine is located on the gearbox output shaft so that the mechanical power provided by the electric motor is not affected by engine efficiency. With this architecture, pure electric drive is possible. Such configuration has been chosen instead of a P1 configuration (with the electric motor located near the thermal engine) since it shows a lower fuel consumption during urban driving due to higher mechanical efficiency and allows regenerative braking in any driving condition (also when the clutch is not engaged) [6]. Nevertheless, it can be stated that the algorithms proposed in this work can be applied to any architecture without loss of generality.
The model proposed here is a backward-facing one, meaning that the driving force required to move the vehicle is directly computed from the desired velocity trajectory [7]. The driving force propagates backward through the driveline towards the internal combustion engine and the electrical machine. The battery pack consists of 30 branches in parallel, each made of 24 cells connected in series. The battery charging and discharging internal resistances are taken from [8]. Each battery cell is a LiFePO4 cell manufactured by A123 Systems [9], with a nominal voltage of 3.3 V, a nominal capacity of 1.1 Ah and a weight of 39 g. The transmission is a manual 5-speed TREMEC model TR-2450 [10]. The mechanical and electrical transmission components are modelled with a constant behaviour, whereas the efficiencies of the internal combustion engine (SI, 1.2 L, 4 in-line cylinders, 105 Nm at 3500 rpm) and the electrical machine (130 Nm max torque, 9000 rpm max speed, 3200 rpm transition speed) are modelled through maps depending on shaft speed and torque [4]. The main drivetrain transmission parameters are taken from [5,10]. Further details on the hybrid vehicle model used in this work can be found in [4].
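As a quick check of the pack layout described above, the nominal pack-level quantities can be derived from the cell data. The short sketch below uses only the figures quoted in the text; the function name and the derived quantities (which ignore internal resistance, usable SoC window and packaging mass) are illustrative and not taken from the paper.

```python
# Pack-level figures implied by the cell data quoted above (nominal values only).
N_PARALLEL = 30          # parallel branches
N_SERIES = 24            # cells in series per branch
CELL_VOLTAGE_V = 3.3     # nominal cell voltage
CELL_CAPACITY_AH = 1.1   # nominal cell capacity
CELL_MASS_G = 39.0       # mass of one cell


def pack_summary():
    """Return nominal pack quantities derived from the cell data."""
    n_cells = N_PARALLEL * N_SERIES
    voltage_v = N_SERIES * CELL_VOLTAGE_V          # series cells add voltage
    capacity_ah = N_PARALLEL * CELL_CAPACITY_AH    # parallel branches add capacity
    return {
        "cells": n_cells,
        "voltage_v": voltage_v,
        "capacity_ah": capacity_ah,
        "energy_kwh": voltage_v * capacity_ah / 1000.0,
        "cell_mass_kg": n_cells * CELL_MASS_G / 1000.0,
    }


summary = pack_summary()
```

Under these nominal assumptions the pack has 720 cells, about 79.2 V, 33 Ah, roughly 2.6 kWh of nominal energy and about 28 kg of cell mass.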
The main disadvantage of the considered backward approach is that the speed reference trajectory is always assumed perfectly tracked by the vehicle, regardless of the powertrain capabilities (i.e., maximum available torque). Moreover, approximating a transient condition as a sequence of steady states (quasi-static approach) reduces the model accuracy. Nevertheless, it is worth remarking that the simplicity and computational lightness of the proposed model along with the absence of the necessity of a powertrain control structure (including the driver) represent a strong advantage in terms of model usage for control strategies' design and application. Therefore, considering the great computational benefits, these approximations are considered acceptable for the purpose of this work, which aims at performing a preliminary estimation of HEVs' fuel consumption [11]. Moreover, powertrain dynamics can be neglected since it is much faster than the battery state of charge dynamics [12].
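To illustrate the backward-facing, quasi-static idea, the sketch below computes the wheel power directly from a prescribed speed trace. The vehicle mass and frontal area are those quoted in the text; the drag coefficient, rolling resistance coefficient, air density and flat-road assumption are placeholders chosen for illustration only, and the function names are hypothetical.

```python
G = 9.81                 # gravitational acceleration, m/s^2
MASS_KG = 1400.0         # base vehicle mass (from the text)
FRONTAL_AREA_M2 = 2.0    # frontal area (from the text)
AIR_DENSITY = 1.2        # kg/m^3, assumed
DRAG_COEFF = 0.30        # assumed value
ROLLING_COEFF = 0.010    # assumed value


def wheel_power(v_start, v_end, dt):
    """Quasi-static wheel power (W) to go from v_start to v_end (m/s) in dt seconds, flat road."""
    v_mean = 0.5 * (v_start + v_end)
    accel = (v_end - v_start) / dt
    inertia = MASS_KG * accel
    aero = 0.5 * AIR_DENSITY * DRAG_COEFF * FRONTAL_AREA_M2 * v_mean ** 2
    rolling = MASS_KG * G * ROLLING_COEFF if v_mean > 0.0 else 0.0
    # The speed trace is assumed perfectly tracked: force follows from kinematics.
    return (inertia + aero + rolling) * v_mean


def power_profile(speeds, dt=1.0):
    """Wheel power for each interval of a sampled speed trace."""
    return [wheel_power(a, b, dt) for a, b in zip(speeds, speeds[1:])]
```

Negative values correspond to braking phases, where part of the energy can in principle be recovered by the electrical machine.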

Supervisory Control Algorithms
A conventional powertrain based on an internal combustion engine has zero degrees of freedom (DOF) on the power demand distribution: all the requests are satisfied by the engine. On the other hand, a hybrid powertrain usually has one DOF, so that the power demand can be optimally distributed (optimal power split) among its components to achieve the maximum improvement in terms of fuel economy and emissions reduction.
The control task of optimal power split is referred to as supervisory control or EMS [11]. The strategies that can be adopted by the supervisory controller can be divided into two main classes: optimal and heuristic. The former uses optimal control theory to formulate a problem that can be solved afterwards, whereas the latter follows rule-based indications calibrated beforehand mainly through optimal control techniques.
Another classification can be performed based on the information used: non-causal and causal. Non-causal control strategies require perfect knowledge of the inputs (e.g., the road load) and thus assume that they are completely known. On the other hand, causal strategies use information up to the current instant only. All heuristic strategies are inherently causal, whereas optimal strategies can be either causal or non-causal.
Energies 2022, 15, 1972
Generally, causal controllers are neither able to find the optimal solution nor to satisfy the constraint on the final state-of-charge (SoC) of the battery. This latter aspect is particularly relevant when it comes to the comparison between different strategies: if the SoCs at the end of the mission are different, the comparison is meaningless.
The main novelty brought by this paper with respect to what is currently available in the literature consists in two novel supervisory control algorithms: a Forward Approach to Dynamic Programming (FADP) and an adaptation of a causal Equivalent Consumption Minimization Strategy (ECMS-based). The key features and pros and cons of the proposed novel algorithms are described in detail and compared to two conventional optimization techniques, namely Dynamic Programming (DP) and Pontryagin's Minimum Principle (PMP).
All considered techniques are non-causal, except the ECMS-based one, which uses current and past information only. Both DP [13-17] and PMP [18,19] techniques are well established and have been extensively presented in the literature for hybrid powertrain optimization, and are therefore used here to evaluate the performance of the newly proposed FADP and ECMS-based approaches. In the following, a brief overview of the optimal control problem formulation is given, followed by the introduction of the bases of all the considered methods (both conventional and novel) and the analysis of their performance for hybrid powertrain optimization.

Optimal Control Problem Statement
The optimal solution to the energy management problem is the one that minimizes (or maximizes) a certain performance index J, suitably defined for each specific problem. The simplest performance index formulation is the one that considers only the fuel mass consumed to realize the driving mission:
$$J = \int_{0}^{t_f} \dot{m}_f\big(x(t), u(t), t\big)\, dt \tag{1}$$
where x and u are the system state and control variables, respectively. Although further factors might be considered in the performance index (e.g., pollutant emissions, battery aging and degradation), this work aims only at minimizing the vehicle fuel consumption. Several state variables are required to describe the dynamics of HEV systems. Nevertheless, for the purpose of energy management, only the battery SoC is considered as a relevant state variable (i.e., $x = \xi$), whose evolution starts from the initial condition $\xi(0) = \xi_0$.
For the HEV under examination, the control variable adopted throughout this paper is the battery power (i.e., $u = P_b$), whose minimum and maximum values depend on both the cell maximum charge and discharge currents and the battery pack configuration, along with the battery SoC:
$$P_{b,\min}(\xi) \le P_b(t) \le P_{b,\max}(\xi) \tag{2}$$
The gear engaged is always chosen as the one giving the lowest fuel consumption among those of the feasible set, keeping engine speed within its minimum and maximum values.
The SoC should stay within admissible boundaries during the whole mission, $\xi_{\min} \le \xi(t) \le \xi_{\max}$, and meet the constraint on its target value $\xi_t$ at the end of the driving mission (i.e., $\xi(t_f) = \xi_t$).
To consider the constraint on the final SoC, a terminal cost (or penalty) $\phi$ is usually added to the performance index J:
$$J = \phi\big(\xi(t_f)\big) + \int_{0}^{t_f} \dot{m}_f\big(\xi(t), P_b(t), t\big)\, dt \tag{3}$$

Dynamic Programming
Dynamic programming (DP) was first introduced in 1957 as a solution for multistage decision problems in which time plays a significant role and the order of operations may be crucial [20]. DP has been widely used in the literature to solve optimal control problems for its capability to handle complex multiple constraints and to guarantee global optimality of the provided solution.
The main drawback of DP is its heavy computational burden, in terms of both memory usage and CPU operations, which increases exponentially with the number of state and control variables of the system and linearly with the mission time span.
The practical implementation of DP algorithms requires the discretization of time and of the feasible sets of state and control variables. Consequently, the solution found with the algorithm will be optimal only up to the error introduced by this discretization.
Once all the variables have been discretized, the overall cost of the mission, given by Equation (3), can be written as the sum of the costs of the N time steps and the final cost:
$$J = \sum_{k=0}^{N-1} \dot{m}_f\big(\xi^k, P_b^k, t_k\big)\, \Delta t + \phi\big(\xi^N\big) \tag{4}$$
The constraint on the final SoC is handled as a hard one, yielding:
$$\phi\big(\xi^N\big) = \begin{cases} 0 & \text{if } \xi^N = \xi_t \\ \infty & \text{otherwise} \end{cases} \tag{5}$$
Starting from the end of the mission, the DP algorithm proceeds backward in time to find and store the optimal control value $P^k_{b,opt}(\xi_i)$ on any state grid point $\xi_i$ at any time $t_k$, as in Equation (6), and the cost-to-go $J^k(\xi_i)$, that is, the minimum cost to go from state $\xi_i$ at time $t_k$ to state $\xi_t$ at time $t_N$:
$$P^k_{b,opt}(\xi_i) = \arg\min_{P_b^k \in V} \Big\{ \dot{m}_f\big(\xi_i, P_b^k, t_k\big)\, \Delta t + J^{k+1}\big(f_i^{k+1}\big) \Big\} \tag{6}$$
$$J^k(\xi_i) = \min_{P_b^k \in V} \Big\{ \dot{m}_f\big(\xi_i, P_b^k, t_k\big)\, \Delta t + J^{k+1}\big(f_i^{k+1}\big) \Big\}$$
where V is the set of all control variable grid values and $f_i^{k+1}$ represents the value that the SoC would assume at the next time step, according to the control variable $P_b^k$, starting from $\xi_i$:
$$f_i^{k+1} = \xi_i + \dot{\xi}\big(\xi_i, P_b^k\big)\, \Delta t \tag{7}$$
At time $t_N$, where the algorithm starts, the costs-to-go are known and equal to zero when $\xi_i = \xi_t$ and infinite otherwise, according to Equation (5).
The cost-to-go $J^{k+1}(f_i^{k+1})$ must be interpolated across grid points or approximated with its nearest neighbour because the arrival states $f_i^{k+1}$, defined by Equation (7), generally do not correspond to the state grid points $\xi_i$. Once $J^{k+1}(f_i^{k+1})$ is available, it is possible to solve Equation (6) and find the optimal control value $P^k_{b,opt}(\xi_i)$ among the values in the feasible set V.
Once the optimal control is found for all i and k, starting from $\xi^0_{opt} = \xi_0$, the optimal state trajectory is reconstructed proceeding forward in time:
$$\xi^{k+1}_{opt} = \xi^k_{opt} + \dot{\xi}\big(\xi^k_{opt}, P^k_{b,opt}(\xi^k_{opt})\big)\, \Delta t \tag{8}$$
Since the optimal states $\xi^k_{opt}$ generally do not correspond to the state grid points $\xi_i$, the optimal control value $P^k_{b,opt}(\xi^k_{opt})$ must be evaluated by interpolation or by approximation to its nearest neighbour. Further detail on Dynamic Programming can be found in [21,22], whereas its application to the hybrid vehicle problem can be found, e.g., in [2,13,15-17,23-26].
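The backward recursion and the forward reconstruction described above can be sketched on a deliberately small toy problem. The example below is not the authors' implementation: the SoC dynamics (`toy_step`), the quadratic fuel cost (`toy_stage_cost`), the grids and the demand sequence are all hypothetical, chosen so that arrival states fall on grid points and the nearest-neighbour lookup is exact.

```python
import math


def dp_backward(demand, xi_grid, u_grid, step, stage_cost, xi_0, xi_t):
    """Backward DP with a hard terminal constraint and nearest-neighbour lookup."""
    m = len(xi_grid)

    def nearest(xi):
        return min(range(m), key=lambda i: abs(xi_grid[i] - xi))

    # Cost-to-go at the final time: zero on the target state, infinite elsewhere.
    cost = [0.0 if i == nearest(xi_t) else math.inf for i in range(m)]
    policy = []
    for k in reversed(range(len(demand))):
        new_cost, u_opt = [math.inf] * m, [None] * m
        for i, xi in enumerate(xi_grid):
            for u in u_grid:
                xi_next = step(xi, u)
                if not (xi_grid[0] - 1e-9 <= xi_next <= xi_grid[-1] + 1e-9):
                    continue  # arrival state outside the admissible SoC range
                total = stage_cost(demand[k], u) + cost[nearest(xi_next)]
                if total < new_cost[i]:
                    new_cost[i], u_opt[i] = total, u
        cost = new_cost
        policy.insert(0, u_opt)
    if math.isinf(cost[nearest(xi_0)]):
        raise ValueError("target SoC not reachable from the initial SoC")
    # Forward reconstruction of the optimal trajectory from the stored policy.
    xi, soc, controls = xi_0, [xi_0], []
    for k in range(len(demand)):
        u = policy[k][nearest(xi)]
        controls.append(u)
        xi = step(xi, u)
        soc.append(xi)
    return cost[nearest(xi_0)], controls, soc


def toy_step(xi, u):
    return xi - 0.01 * u  # hypothetical dynamics: u > 0 discharges the battery


def toy_stage_cost(p_dem, u):
    p_eng = p_dem - u     # the engine covers what the battery does not
    return p_eng ** 2 if p_eng >= 0 else math.inf  # hypothetical convex fuel cost


grid = [0.58, 0.59, 0.60, 0.61, 0.62]
best_cost, controls, soc = dp_backward(
    [1, 3, 1, 3], grid, [-1, 0, 1], toy_step, toy_stage_cost, 0.60, 0.60
)
```

On this toy mission the policy charges the battery when the demand is low and discharges it when the demand is high, so that the engine runs at a constant load and the final SoC returns to its target.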

Pontryagin's Minimum Principle
Pontryagin's Minimum Principle (PMP) is a result of optimal control theory that helps identify necessary optimality conditions for control (and, thus, state) variables of a given dynamic system. Since the optimality conditions are not sufficient, the control candidate identified might not lead to the global optimum. Nevertheless, the results obtained through the application of the PMP can be relatively close to the global optimum computed through DP [12]. Further theoretical details on the principle can be found in [21], while examples of its application to HEV powertrain control can be found, e.g., in [5,19,27-30].
Given the optimal control problem discussed in the previous section with respect to DP, it is possible to define the Hamiltonian function as:
$$H\big(\xi, P_b, \lambda, t\big) = \dot{m}_f\big(\xi, P_b, t\big) + \lambda(t)\, \dot{\xi}\big(\xi, P_b\big) \tag{9}$$
where $\lambda(t)$ is the co-state. The PMP states that a necessary condition for optimality is that the Hamiltonian is minimum in the control variable over the whole time window:
$$P_{b,opt}(t) = \arg\min_{P_b} H\big(\xi, P_b, \lambda, t\big) \tag{10}$$
and the optimal state and co-state evolve according to the following equations:
$$\dot{\xi}(t) = \frac{\partial H}{\partial \lambda} \tag{11}$$
$$\dot{\lambda}(t) = -\frac{\partial H}{\partial \xi} \tag{12}$$
The compliance of the constraint on the final SoC depends on the initial co-state value, i.e., the initial condition of Equation (12). This should be found iteratively (e.g., through the shooting method) until the final state of charge converges to its target value. Subsequently, the Hamiltonian minimization and the integration of Equations (11) and (12) are performed several times until convergence is reached.
To speed up the computation by avoiding the resolution of a minimization problem every time step, the Hamiltonian is evaluated on a finite set of values of the control variable P b of dimension Q, as presented in [31].
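A minimal sketch of this procedure is given below: the Hamiltonian is evaluated on a grid of Q control values at each time step, and the co-state is adjusted by bisection (a simple shooting scheme) until the final SoC meets its target. The toy dynamics, the quadratic fuel cost, the bracket on the co-state and all function names are assumptions made for illustration. Since the toy dynamics do not depend on the SoC, the co-state is constant here, and state constraints are not enforced.

```python
def pmp_rollout(lam, demand, u_grid, xi_0, c=0.01):
    """Simulate the mission for a given constant co-state lam."""
    xi, fuel, controls = xi_0, 0.0, []
    for p_dem in demand:
        feasible = [u for u in u_grid if p_dem - u >= 0.0]
        # Hamiltonian: fuel rate + lam * d(xi)/dt, with d(xi)/dt = -c * u (toy model).
        u = min(feasible, key=lambda u: (p_dem - u) ** 2 + lam * (-c * u))
        fuel += (p_dem - u) ** 2
        xi += -c * u
        controls.append(u)
    return xi, fuel, controls


def shoot(demand, u_grid, xi_0, xi_t, lam_lo=-1000.0, lam_hi=0.0,
          tol=1e-6, max_iter=60):
    """Bisection on the co-state until the final SoC matches the target."""
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        xi_f, fuel, controls = pmp_rollout(lam, demand, u_grid, xi_0)
        if abs(xi_f - xi_t) < tol:
            break
        if xi_f > xi_t:
            lam_lo = lam   # too much charging: move the co-state towards zero
        else:
            lam_hi = lam   # too much discharging: make the co-state more negative
    return lam, xi_f, fuel, controls


u_grid = [i / 10 - 2 for i in range(41)]   # Q = 41 values between -2 and 2
lam, xi_f, fuel, controls = shoot([1, 3, 1, 3], u_grid, 0.60, 0.60)
```

With a discretized control set the final SoC is a piecewise-constant function of the co-state, so the shooting loop stops on a tolerance rather than on exact equality.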

Forward Approach to Dynamic Programming
From a theoretical point of view, DP algorithms can run forward or backward in time without any difference: the optimal solution will be the same regardless of the approach. The practical implementation of DP to solve optimal control problems requires the discretization of the control variable, leading to the backward procedure described in Section 3.2. DP algorithms involve the use of either the rounding or the interpolation method, with further reduction in the accuracy of the solution and increase in the computational time. Moreover, there is no guarantee that, during the forward-calculation phase of DP, a feasible control trajectory is generated or that such control will lead to the target state at the end of the mission. This issue is particularly problematic in problems where the optimal state trajectory lies along or close to the boundary of its feasible set.
In this paper, a novel procedure is proposed to avoid the discretization of the control variable. This procedure is only possible when a bijective correspondence between state transitions and control variables is present. The proposed algorithm can run forward or backward in time without distinction.
The transition from state $\xi_i$ to $\xi_j$ is expressed as:
$$\Delta\xi_{ij} = \xi_j - \xi_i \tag{13}$$
where the indexes i and j both refer to the same state grid points. A certain cost is required to realize the transition $\Delta\xi_{ij}$ and, at the same time, satisfy the power request at the wheels. This cost is referred to as the transition cost and is indicated as $L_{ij}$. The bijective correspondence between state transitions and control variables is necessary to guarantee that each $\Delta\xi_{ij}$ corresponds to only one possible $L_{ij}$. The procedure to evaluate the transition costs requires particular care and will be detailed in the following. Nevertheless, once these costs are known, the optimal control problem can be approached following the same reasoning detailed for the simplified problem discussed in the previous sections.
The algorithm proposed here starts from the initial time step, setting the initial cost to infinity for values of $\xi$ different from the initial condition:
$$J^0_i = \begin{cases} 0 & \text{if } \xi_i = \xi_0 \\ \infty & \text{otherwise} \end{cases} \tag{14}$$
For the sake of compactness, a slight change in notation is adopted: $J^k_i$ represents the cost-to-arrive at $\xi_i$ at time $t_k$ starting from the initial state at the beginning of the mission. With reference to Figure 1, the computation proceeds forward in time solving the minimization:
$$J^{k+1}_j = \min_{i} \Big\{ J^k_i + L^k_{ij} \Big\} \tag{15}$$
This algorithm basically finds the optimal state value $\xi_i$ from which it is convenient to start in order to arrive at $\xi_j$, satisfying the power request at the wheels. The arguments of the minimization in Equation (15) are stored in a feedback matrix X, whose elements are defined as:
$$X^{k+1}_j = \arg\min_{i} \Big\{ J^k_i + L^k_{ij} \Big\} \tag{16}$$
Each entry $X^{k+1}_j$ is the index i of the state grid point from which it is optimal to start in order to arrive at $\xi_j$ at $t_{k+1}$ with the minimum cost-to-arrive. These values are used to reconstruct the optimal state trajectory $\xi_{opt}(t)$ with a backward procedure, starting from the grid index j of the target state (i.e., $\xi^N_{opt} = \xi_t$) and proceeding backward with k going from N − 1 to 1 according to the following sequence:
$$i = X^{k+1}_j, \qquad \xi^k_{opt} = \xi_i, \qquad j \leftarrow i \tag{17}$$
The general implementation of the FADP, once each transition cost is known and stored in memory, is given in Appendix A (see Algorithm A1). The transition cost calculation depends on the specific problem, and it will be detailed in the next section for an HEV optimal control application.
It is worth noting that the output of this optimization is the optimal state trajectory over time. Nevertheless, since there is a bijective correspondence between each state transition and the control value needed to realize it and satisfy the power request, the optimal control law can be easily calculated.
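The forward recursion on the cost-to-arrive and the backward reconstruction through the feedback matrix can be sketched as follows. This is an illustrative toy, not the paper's Algorithm A1: the function `fadp`, the transition cost `toy_transition_cost`, the grid indexing and the demand sequence are hypothetical. Each one-index move on the state grid corresponds to exactly one control value, which is the bijective correspondence the method relies on.

```python
import math


def fadp(transition_cost, n_steps, m, i_0, i_t):
    """Forward DP on state transitions; returns the optimal cost and grid-index path."""
    # Cost-to-arrive at the initial time: zero on the initial state, infinite elsewhere.
    cost = [0.0 if i == i_0 else math.inf for i in range(m)]
    feedback = []  # feedback[k][j]: best predecessor index for arriving at j at t_{k+1}
    for k in range(n_steps):
        new_cost, best_prev = [math.inf] * m, [None] * m
        for j in range(m):
            for i in range(m):
                if math.isinf(cost[i]):
                    continue  # state i cannot be reached at time t_k
                total = cost[i] + transition_cost(k, i, j)
                if total < new_cost[j]:
                    new_cost[j], best_prev[j] = total, i
        cost = new_cost
        feedback.append(best_prev)
    if math.isinf(cost[i_t]):
        raise ValueError("target state not reachable")
    # Backward reconstruction of the optimal state trajectory.
    path = [i_t]
    for k in reversed(range(n_steps)):
        path.append(feedback[k][path[-1]])
    path.reverse()
    return cost[i_t], path


DEMAND = [1, 3, 1, 3]  # hypothetical power demand per time step


def toy_transition_cost(k, i, j):
    u = i - j  # one grid step down = one unit of battery discharge (toy model)
    p_eng = DEMAND[k] - u
    if abs(u) > 1 or p_eng < 0:
        return math.inf  # transition not realisable in one time step
    return p_eng ** 2


# Five grid points (e.g., SoC 0.58 to 0.62); index 2 is both initial and target state.
best_cost, path = fadp(toy_transition_cost, len(DEMAND), 5, 2, 2)
```

The number of transition-cost evaluations grows with the number of time steps times the square of the number of state grid points, since every pair (i, j) is visited at every step.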

Transition Cost Calculation
The algorithm described in the previous section is general and could be used for any optimal control problem as an alternative to DP. In this section, the transition cost calculation for the specific case of the HEV energy management problem previously described is presented.
Once the state transitions $\Delta\xi_{ij}$ and the time step $\Delta t$ have been defined, the rate of charge/discharge is evaluated as:
$$\dot{\xi} = \frac{\Delta\xi_{ij}}{\Delta t} \tag{18}$$
and the battery power $P_b$ can then be calculated. At this stage, compliance with constraints like those in Equation (2) can be checked. Once the battery power has been computed, the electric machine mechanical power can be evaluated, and the power required from the thermal engine can finally be calculated as the difference between the required wheel power and the electric machine mechanical power.
From engine speed and torque, the fuel consumption $\dot{m}_f$ can be evaluated along with the transition cost $L^k_{ij}$. In Appendix A (Algorithm A2), the whole process of the transition cost calculation is reported.
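The chain of steps above (SoC transition, battery power, constraint check, electric machine power, engine power, fuel) can be sketched with a heavily simplified model. This is not the paper's Algorithm A2: the constant efficiencies, the power limits, the lower heating value and the neglect of internal resistance and of efficiency maps are all assumptions made for illustration, and only the nominal pack energy follows from the cell data quoted earlier.

```python
import math

PACK_ENERGY_J = 79.2 * 33.0 * 3600.0  # nominal pack energy (24 x 3.3 V, 30 x 1.1 Ah)
P_B_MAX_W = 30e3        # assumed symmetric battery power limit
ETA_EM = 0.90           # assumed constant electric-path efficiency
ETA_ENG = 0.30          # assumed constant engine efficiency
LHV_J_PER_KG = 43e6     # assumed fuel lower heating value
P_ENG_MAX_W = 60e3      # assumed engine power limit


def transition_cost(delta_xi, dt, p_wheel):
    """Fuel mass (kg) burnt during dt to realise the SoC change delta_xi; inf if infeasible."""
    soc_rate = delta_xi / dt
    p_batt = -soc_rate * PACK_ENERGY_J   # positive when the battery discharges
    if abs(p_batt) > P_B_MAX_W:
        return math.inf                  # battery power constraint violated
    # Discharging: losses reduce the mechanical output.
    # Charging: losses increase the mechanical power absorbed.
    p_em = p_batt * ETA_EM if p_batt >= 0.0 else p_batt / ETA_EM
    p_eng = p_wheel - p_em               # engine covers the remaining wheel power
    if p_eng < 0.0 or p_eng > P_ENG_MAX_W:
        return math.inf                  # friction braking is not modelled in this sketch
    return p_eng / (ETA_ENG * LHV_J_PER_KG) * dt


hold = transition_cost(0.0, 1.0, 10e3)         # SoC unchanged
discharge = transition_cost(-0.001, 1.0, 10e3)
charge = transition_cost(0.001, 1.0, 10e3)
too_fast = transition_cost(-0.01, 1.0, 10e3)
```

Returning an infinite cost for infeasible transitions lets the forward recursion discard them without any special handling.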

Iterative Approach in FADP
To reduce the computational burden of FADP while preserving the accuracy of the solution, an iterative process is here proposed (IFADP). The main idea is to progressively narrow the state grid in the neighbourhood of the optimal state trajectory obtained in a previous iteration performed on a coarser grid.
Similarly to what is shown in Section 3.4, the state grid is discretized into M grid points: while the upper ξ up and lower ξ down boundaries of the state grid are defined as: where ∆ξ * is the chosen amplitude of the neighbourhood region around the optimal state trajectory found previously. Hence, the state grid step ∆ξ can be calculated as: At each iteration, the amplitude of the neighbourhood ∆ξ * is progressively reduced and, as a consequence, a better fuel economy is achieved. On the other hand, the number of state grid elements M is kept constant so as not to increase the computational burden.
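The grid-narrowing idea can be illustrated for a single time instant as follows. The sketch assumes that the neighbourhood is symmetric around the previously found optimal SoC, that the amplitude is treated as a half-width, and that it is halved at each iteration; these choices, like the function names, are hypothetical and may differ from the definitions used by the authors.

```python
def refined_grid(xi_center, half_width, m, xi_min, xi_max):
    """M equally spaced SoC points around xi_center, clipped to [xi_min, xi_max]."""
    lower = max(xi_min, xi_center - half_width)
    upper = min(xi_max, xi_center + half_width)
    step = (upper - lower) / (m - 1)
    return [lower + i * step for i in range(m)]


def grid_schedule(xi_center, first_half_width, shrink, n_iter, m, xi_min, xi_max):
    """Grids for successive iterations: the neighbourhood shrinks, M stays fixed."""
    grids, width = [], first_half_width
    for _ in range(n_iter):
        grids.append(refined_grid(xi_center, width, m, xi_min, xi_max))
        width *= shrink
    return grids


grids = grid_schedule(0.60, 0.05, 0.5, 3, 11, 0.40, 0.80)
steps = [g[1] - g[0] for g in grids]   # grid resolution at each iteration
```

In a full iterative scheme the centre would be the optimal SoC found at each time step in the previous iteration, so the grid follows the trajectory while its resolution improves at constant M.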

Causal ECMS-Based Strategy
In many practical applications dealing with optimal supervisory control of HEVs, the co-state $\lambda$ introduced with the PMP method can be treated as a constant. The assumption is motivated by the weak dependency of the Hamiltonian on the SoC $\xi$. In fact, while the fuel flow rate $\dot{m}_f$ does not depend on $\xi$ at all, the battery current depends on $\xi$ through the battery internal resistance and open circuit voltage [4]. Nonetheless, especially for charge sustaining control policies, since $\xi$ fluctuates around an intermediate value, the battery parameters are approximately constant if the range is narrow enough. Consequently, the dependency of the Hamiltonian H on $\xi$ is weak and can thus be neglected.
According to [19], the Hamiltonian could be redefined as the sum of the fuel chemical power plus the battery electrical power, weighted by an equivalence factor $s_0$, represented by the constant co-state: Similarly, the Hamiltonian could be redefined as the sum of the fuel mass consumed by the engine and an equivalent fuel mass consumed by the electric motor: It is worth noting that if $\dot{m}_{EQ} = P_b / H_i$, Equation (23) corresponds to Equation (22). Nevertheless, in this work, the equivalent fuel consumption is considered equal to the state dynamics: thus, the Hamiltonian is closer to the one defined in Equation (9). The equivalence factor drives the solution alternatively towards battery charging (i.e., high $s_0$ values) or battery depleting (i.e., low $s_0$ values). The ECMS-based technique usually minimizes the Hamiltonian while adjusting the equivalence factor dynamically over time using, for instance, a feedback signal on the battery SoC: This allows keeping $\xi$ within a given range of values and obtaining a reduction of fuel consumption comparable with that obtained through a non-causal method, such as in [31-33]. The constant baseline value $s_0$ is normally obtained beforehand through an offline optimization using a non-causal method.
In this paper, $s_0$ is estimated using past driving information and a feedback signal from the battery SoC: The first term $s_c$ estimates the equivalence factor according to the driving conditions (e.g., urban, sub-urban, highway, etc.) through the Urban Degree UD: This latter term expresses whether a specific driving cycle shows more "urban" conditions or not. It is calculated on the backward horizon $(t_h, t)$ as: and its value is greater for urban driving cycles and smaller for highway ones. The second term $s_p$ takes into account the difference between the current SoC and its target value $\xi_t$. The parameters presented for this control algorithm, namely $s_{c1}$, $s_{c2}$, $s_p$, $UD_0$ and $t_h$, have to be identified for each powertrain configuration through heuristic approaches or by iteratively solving the optimal control problem with non-causal techniques.
Analogously to what has been done for PMP in Section 3.3, the control variable P b is discretized into a finite set of dimension Q on which the Hamiltonian is evaluated to find its minimum value.
The parameters of the causal ECMS-based control strategy were determined heuristically and are listed in Table 1, according to the hybrid powertrain configuration defined in the previous sections.
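The causal character of an ECMS-type controller can be illustrated with a toy loop in which the equivalence factor is corrected by a proportional feedback on the SoC error and the Hamiltonian is minimized on a discretized control set at each step. This sketch does not reproduce the estimation based on the Urban Degree proposed in the paper: the feedback law, the gains `s_0` and `k_p`, the toy dynamics and the quadratic fuel cost are all assumptions made for illustration.

```python
def ecms_run(demand, u_grid, xi_0, xi_t, s_0, k_p, c=0.01):
    """Causal ECMS-like loop: only current and past information is used."""
    xi, fuel, soc = xi_0, 0.0, [xi_0]
    for p_dem in demand:
        # Equivalence factor: grows when the SoC is below target, favouring charging.
        s = s_0 + k_p * (xi_t - xi)
        feasible = [u for u in u_grid if p_dem - u >= 0.0]
        # Instantaneous cost: fuel term + equivalent cost of battery power (toy model).
        u = min(feasible, key=lambda u: (p_dem - u) ** 2 + s * u)
        fuel += (p_dem - u) ** 2
        xi -= c * u          # u > 0 discharges the battery
        soc.append(xi)
    return fuel, soc


u_grid = [i / 10 - 2 for i in range(41)]   # Q = 41 control values
demand = [1, 3] * 10                       # hypothetical alternating demand
fuel, soc = ecms_run(demand, u_grid, 0.60, 0.60, s_0=4.0, k_p=100.0)
```

Unlike the non-causal methods, this loop cannot guarantee that the final SoC equals its target; the feedback only keeps the SoC within a band around it.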

Simulation Results
The performances of each control strategy will now be assessed over several driving missions in terms of achieved fuel economy and required computational time. All the results shown in the following section refer to charge sustaining strategies: the battery SoC target at the end of the mission is set equal to the initial one ($\xi_t = \xi_0 = 0.6$). The driving cycles considered in this work are listed in Table 2, with the corresponding UD value evaluated on the whole cycle through Equation (28) and ordered by descending UD. More information on the considered driving cycles can be found in [34]. The Relative Fuel Economy (RFE) is here assumed as the reference parameter for the assessment and comparison of the strategies. This parameter expresses the percentage reduction of the total mass of fuel required by the hybrid vehicle to realize the driving mission with respect to that required by the conventional vehicle (i.e., considering the same powertrain without the weight of the electrical components):
$$RFE = \frac{m_{f,conv} - m_{f,HEV}}{m_{f,conv}} \cdot 100 \tag{29}$$
where $m_{f,HEV}$ and $m_{f,conv}$ are the total fuel masses required by the hybrid and by the conventional vehicle, respectively. A further analysis is performed on the computational effort required by the proposed strategies. In Table 3, the complexity of each algorithm is reported as the number of evaluations of the cost function. Specifically, the terms N, M and Q are the number of time, state, and control discrete elements, respectively, and i is the number of iterations (where applicable).
Table 3. Complexity of each considered optimal supervisory control strategy.
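The RFE definition given in words above translates into a one-line computation. The helper below is a small sketch; the function name and the fuel masses in the usage line are hypothetical and are not results from the paper.

```python
def relative_fuel_economy(fuel_hev_kg, fuel_conventional_kg):
    """Percentage fuel mass reduction of the hybrid with respect to the conventional vehicle."""
    if fuel_conventional_kg <= 0.0:
        raise ValueError("conventional fuel mass must be positive")
    return 100.0 * (fuel_conventional_kg - fuel_hev_kg) / fuel_conventional_kg


# Hypothetical fuel masses, for illustration only.
rfe_example = relative_fuel_economy(0.8, 1.0)
```

A meaningful comparison requires that both fuel masses refer to the same driving mission and, for the hybrid, to the same initial and final SoC.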

Analysis of DP Performance
Solving the optimal control problems through DP requires the calculation of the costto-go from every battery SoC at every time instant. The solution obtained with this method reaches the absolute optimum, compatibly with the state and control discretization. The main drawback consists in the high amount of resources required in terms of both memory and computational time. If the number of state elements M is set equal to the number of control elements Q, the computational time increases quadratically with the number of elements (as shown for the FADP in Table 3).
The optimal SoC trajectory obtained through DP on a WLTP driving cycle is shown in Figure 2. In this figure, on the left y-axis it is reported the SoC value is related to the red SoC trajectory line, whereas the right y-axis relates to the cost-to-go values. It can be observed that the control algorithm tends to recharge the battery during the Low, Medium and High speed of the cycle (i.e., from 0 to almost 1600 s, in urban and extra-urban conditions) and then depletes the battery in the last part of the cycle, when an extra high speed is encountered (i.e., on the highway). For what concerns the cost-to-go values (proposed as a contour plot right beneath the SoC trajectory), they decrease over time and with higher SoC values. This latter result is easily explained by considering that a greater SoC with respect to the one to be achieved as target value implies a reduction in the overall fuel consumption.
Table 4 lists the RFE achieved through the DP optimization for the driving cycles reported in Table 2. As expected, higher RFE values are obtained on urban driving cycles (e.g., NYCC and CADC urban), which have higher values of UD (calculated over the whole driving mission). This is explained by the more frequent braking phases (where energy is recovered) and the lower average efficiency of the thermal engine in low-load conditions. On the other hand, on highway missions, the RFE is very limited due to the almost complete absence of braking phases and the high efficiency achieved by the thermal engine alone on the conventional vehicle while operating at medium-high loads. These limited efficiency benefits, along with the additional weight of the battery pack, nearly nullify the advantages of the hybrid powertrain configuration in highway scenarios.

Analysis of PMP Performance
As already described in Section 3.3, using PMP entails the replacement of the original boundary value problem (with fixed initial and final conditions on the SoC) with an initial value problem, where the initial condition on the co-state λ is iteratively updated through a shooting method until the final condition on the SoC ξ is met.
Clearly, higher absolute values of the co-state lead to higher values of the final SoC, favouring the recharge mode and penalizing the discharge mode. To find the instantaneous minimum of the Hamiltonian, the control variable P_b is discretized into a finite set of Q equally spaced points within its feasible range (i.e., from its current minimum to its current maximum). The Hamiltonian is then evaluated at each candidate operating point, and the optimal control is the one that meets the constraints and yields the minimum value of the Hamiltonian.
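The grid search on the Hamiltonian can be illustrated with a short sketch. The fuel map, the power limits and the form of the Hamiltonian used here (fuel rate plus a non-negative weight `s` multiplying the battery power, with discharge counted as positive) are assumptions chosen for illustration; `s` plays the role of the absolute value of the co-state and the paper's actual model and sign convention may differ.

```python
import numpy as np


def fuel_rate(p_engine):
    """Hypothetical convex fuel-rate map [g/s] as a function of engine power [kW]."""
    return 0.05 * p_engine + 0.002 * p_engine ** 2


def minimize_hamiltonian(p_dem, s, p_b_min=-20.0, p_b_max=20.0, q=41, p_engine_max=60.0):
    """Evaluate H on q equally spaced battery powers and return the best feasible one."""
    p_b = np.linspace(p_b_min, p_b_max, q)        # candidate controls (discharge > 0)
    p_engine = p_dem - p_b                        # engine covers the rest of the demand
    hamiltonian = fuel_rate(p_engine) + s * p_b   # assumed form of the Hamiltonian
    feasible = (p_engine >= 0.0) & (p_engine <= p_engine_max)
    hamiltonian = np.where(feasible, hamiltonian, np.inf)   # discard infeasible points
    best = int(np.argmin(hamiltonian))
    return float(p_b[best]), float(hamiltonian[best])


# A low weight favours discharging, a high weight favours recharging.
p_b_low, _ = minimize_hamiltonian(p_dem=20.0, s=0.05)
p_b_high, _ = minimize_hamiltonian(p_dem=20.0, s=0.15)
print(p_b_low, p_b_high)  # 20.0 -5.0
```

With the lower weight the battery supplies the whole demand, while with the higher weight the engine also recharges the battery, which is consistent with the effect of the co-state on the final SoC described above.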
The computational time required to solve the optimal control problem is proportional to the number of iterations needed to reach the desired battery SoC at the end of the mission. It increases linearly with the number of elements and iterations, in accordance with the estimation given in Table 3. However, if the co-state starts from a proper initial guess and is suitably updated, just a few iterations are required to meet the constraint on the final SoC value within an acceptable tolerance.
These observations are illustrated in Figure 3 (for the FTP-72 cycle), where the linear increase in the computational time with the number of discrete control elements is clearly visible. However, it can also be observed that the relative fuel economy improvement becomes negligible above a certain number of control elements (2000 in the example), showing that a further increase in computational effort does not improve the optimal solution.
Figure 4 presents the optimal SoC trajectory obtained with PMP on the WLTP driving cycle, to allow a visual comparison with the results obtained with DP. The optimal SoC trajectory is almost the same as that presented in Figure 2, with only barely detectable differences. In this case too, the control algorithm tends to recharge the battery over almost the entire cycle and depletes it during highway driving.
Table 4 reports the RFE results obtained through the PMP optimization for several driving missions. Since PMP obtains the same RFE results as DP (Table 4), it is possible to state that this technique finds the absolute optimal control for the problem under analysis. Therefore, PMP will be adopted as the reference for the comparison with the novel approaches presented in the following.
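The iterative update of the co-state until the final SoC constraint is met can be sketched as a bisection-based shooting loop. This is a simplified illustration under stated assumptions: the co-state weight `s` is kept constant over the mission (as happens when the battery model does not depend on the SoC), the update rule is plain bisection rather than the authors' rule, and the fuel map, battery energy and demand profile are hypothetical.

```python
import numpy as np


def fuel_rate(p_engine):
    """Hypothetical convex fuel-rate map [g/s] as a function of engine power [kW]."""
    return 0.05 * p_engine + 0.002 * p_engine ** 2


def simulate(p_dem, s, soc0=0.6, energy=1000.0, dt=1.0, q=41):
    """Forward simulation with a constant co-state weight s; returns the final SoC."""
    p_b_grid = np.linspace(-20.0, 20.0, q)        # kW, discharge > 0
    soc = soc0
    for p in p_dem:
        p_engine = p - p_b_grid
        h = fuel_rate(p_engine) + s * p_b_grid    # assumed Hamiltonian
        h = np.where((p_engine >= 0.0) & (p_engine <= 50.0), h, np.inf)
        soc -= p_b_grid[np.argmin(h)] * dt / energy   # energy in kJ
    return float(soc)


def shoot(p_dem, soc_target=0.6, s_lo=0.0, s_hi=0.5, tol=1e-3, max_iter=30):
    """Bisection on s until the final SoC matches the target within tol."""
    for iteration in range(1, max_iter + 1):
        s = 0.5 * (s_lo + s_hi)
        error = simulate(p_dem, s) - soc_target
        if abs(error) <= tol:
            return s, error, iteration
        if error > 0.0:
            s_hi = s      # battery over-charged: lower the weight
        else:
            s_lo = s      # battery depleted: raise the weight
    raise RuntimeError("shooting did not converge")


p_dem = [10.0, 30.0, 20.0, 0.0, 40.0, 26.0, 4.0, 16.0]   # kW, toy mission
s_opt, soc_error, n_iter = shoot(p_dem)
print(s_opt, soc_error, n_iter)
```

Each iteration costs one full forward simulation, that is N·Q Hamiltonian evaluations, so the total effort scales with N·Q·i. A good initial bracket for `s` reduces the number of iterations, in line with the remark on the initial guess of the co-state.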

Analysis of FADP Performance
Approaching the solution of the HEV optimal control problem through the FADP algorithm makes it possible to evaluate the cost-to-arrive at each battery SoC at every time instant, as reported in Figure 5. The evaluation of the cost-to-arrive over time can be particularly useful in the benchmarking process of causal EMSs.
Figure 5 presents the optimal trajectories of the battery SoC obtained with the FADP approach. Compared to those of PMP, only minor deviations in the achieved optimal state trajectories are present, mainly due to the discretization of the state and/or control variables. Figure 6 shows the influence of the number of discrete state elements M on the required computational time and on the fuel economy achieved through the resulting optimal control law (results refer to the NYCC driving cycle). The computational time increases quadratically with the number of elements, in accordance with the estimation given in Table 3, but also in this case the fuel economy reaches a limit value that does not increase further for more than 1000 state elements. The NYCC driving cycle is chosen here to broaden the analysis, showing that the achievement of the optimal solution does not require a high computational effort regardless of the chosen driving cycle.
Regarding the iterative approach, the optimal SoC trajectory obtained with IFADP is shown in Figure 7 for three successive iterations. The procedure restricts the search space of the current iteration to a progressively narrower band (of amplitude Δξ*) around the optimal trajectory obtained in the previous iteration (as described in Section 3.5). The values of Δξ* adopted to obtain the results in Figure 7 are 0.1, 0.01 and 0.005 for the three consecutive iterations.
The result achieved with the third iteration coincides with the trajectory already presented in Figure 5. Therefore, after a few iterations, the same fuel economy as an FADP with a greater number of state elements is attained, with a reduced computational time. This improvement can be observed in Figure 8, where the influence of the number of discrete state elements M on the required computational time and on the fuel economy for the IFADP approach is reported for the NYCC driving cycle. Comparing Figure 8 with Figure 6, it can be seen that, for a given number of state elements, e.g., M = 2000, the IFADP approach achieves the same fuel economy as the FADP in half the required time.
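The forward recursion on the cost-to-arrive and the idea of restricting the search to a band around a previous trajectory can be sketched as follows. This is a simplified illustration, not the authors' IFADP: the band is expressed as a half-width in grid indices and the grid is not refined between passes, whereas the paper narrows Δξ* over successive iterations. The fuel map, the engine limit and the demand profile are hypothetical.

```python
import math

P_ENGINE_MAX = 60.0  # kW, hypothetical engine power limit


def fuel_rate(p_engine):
    """Hypothetical convex fuel-rate map [g/s] as a function of engine power [kW]."""
    return 0.05 * p_engine + 0.002 * p_engine ** 2


def forward_dp(p_dem, n_states, i_start, i_target, max_shift, step_power, band=None, dt=1.0):
    """Forward DP on the cost-to-arrive; returns (cost, state path, evaluations).

    band, if given, is a list of (low, high) index limits for every time step.
    """
    n_steps = len(p_dem)
    cost = [[math.inf] * n_states for _ in range(n_steps + 1)]
    parent = [[-1] * n_states for _ in range(n_steps + 1)]
    cost[0][i_start] = 0.0                        # cost-to-arrive at the initial SoC
    evaluations = 0
    for k in range(n_steps):                      # forward in time
        low, high = (0, n_states - 1) if band is None else band[k + 1]
        low, high = max(low, 0), min(high, n_states - 1)
        for i in range(n_states):
            if cost[k][i] == math.inf:
                continue                          # state not reachable
            for shift in range(-max_shift, max_shift + 1):
                j = i + shift
                if j < low or j > high:
                    continue                      # outside the grid or the band
                evaluations += 1
                p_engine = p_dem[k] + shift * step_power
                if not 0.0 <= p_engine <= P_ENGINE_MAX:
                    continue
                candidate = cost[k][i] + fuel_rate(p_engine) * dt
                if candidate < cost[k + 1][j]:
                    cost[k + 1][j] = candidate
                    parent[k + 1][j] = i
    path = [i_target]                             # backtrack from the target SoC
    for k in range(n_steps, 0, -1):
        path.append(parent[k][path[-1]])
    path.reverse()
    return cost[n_steps][i_target], path, evaluations


p_dem = [10.0, 30.0, 20.0, 0.0, 40.0, 25.0, 5.0, 15.0]   # kW, toy mission
full_cost, full_path, full_eval = forward_dp(p_dem, 21, 10, 10, 3, 5.0)
band = [(i - 1, i + 1) for i in full_path]               # narrow band around the path
band_cost, band_path, band_eval = forward_dp(p_dem, 21, 10, 10, 3, 5.0, band=band)
print(full_cost, full_eval, band_cost, band_eval)
```

Because the band contains the previously found trajectory, the restricted pass reaches the same cost with fewer evaluations. In an iterative scheme the saved effort would be spent on a finer SoC grid inside the band.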
Table 4 reports the RFE results obtained for all investigated driving cycles. It is worth noting that the difference between the optimal state trajectories in Figure 5 does not lead to different results in terms of fuel saving (see Table 4).

Analysis of Causal ECMS-Based Strategy Performance
Similarly to PMP, to solve the optimization problem through the ECMS-based approach, the control variable P_b is again discretized into Q points, at which the Hamiltonian is evaluated to find its minimum value.
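A single-pass causal loop of this kind can be sketched as follows. Note that the adaptation rule used here, a proportional correction of the equivalence factor based on the current SoC deviation, is a common stand-in chosen for brevity and is not the authors' rule, which uses information from a finite backward-looking horizon. The fuel map, gains, battery energy and demand profile are likewise hypothetical.

```python
import numpy as np


def fuel_rate(p_engine):
    """Hypothetical convex fuel-rate map [g/s] as a function of engine power [kW]."""
    return 0.05 * p_engine + 0.002 * p_engine ** 2


def causal_ecms(p_dem, s0=0.12, gain=1.0, soc0=0.6, soc_target=0.6,
                energy=1000.0, dt=1.0, q=41):
    """Run one causal pass; returns the SoC log, the control log and the evaluations."""
    p_b_grid = np.linspace(-20.0, 20.0, q)        # kW, discharge > 0
    soc, soc_log, p_b_log, evaluations = soc0, [soc0], [], 0
    for p in p_dem:
        # Stand-in adaptation: only the current SoC is used, never future demand.
        s = s0 + gain * (soc_target - soc)
        p_engine = p - p_b_grid
        h = fuel_rate(p_engine) + s * p_b_grid    # assumed Hamiltonian
        h = np.where((p_engine >= 0.0) & (p_engine <= 50.0), h, np.inf)
        evaluations += q
        p_b = float(p_b_grid[np.argmin(h)])
        soc -= p_b * dt / energy                  # energy in kJ
        p_b_log.append(p_b)
        soc_log.append(soc)
    return soc_log, p_b_log, evaluations


p_dem = [10.0, 30.0, 20.0, 0.0, 40.0, 26.0, 4.0, 16.0]   # kW, toy mission
soc_log, p_b_log, n_eval = causal_ecms(p_dem)
print(soc_log[-1], n_eval)
```

The pass requires N·Q Hamiltonian evaluations, the same as one PMP iteration, and the decision at each step depends only on quantities available up to that step, so changing a later demand sample leaves the earlier decisions unchanged.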
The computational time required to find the sub-optimal SoC trajectory, and thus the battery power control law over the whole driving mission, is comparable with that required by one iteration of the PMP (see Table 3). The main advantage is that the proposed causal ECMS-based technique can be implemented online, since only information up to the current time instant is required.
Figure 9 shows the SoC trajectory obtained with the ECMS-based strategy (green solid line) compared to those of the DP (blue solid line), PMP (red dashed line), FADP (dotted line) and IFADP (dot-dashed line) approaches. It is important to notice that the causal and non-causal methods achieve the same final SoC value, ξ(t_f) = ξ_t (although, as already explained, this is not always guaranteed with the ECMS-based approach), while following completely different SoC trajectories. Despite this substantial difference, the achieved relative fuel economies (reported in Table 5) are very close.
To make a fair comparison and to evaluate the distance from the equivalent optimal solution achieved with the PMP approach, the optimal control problem has been solved through PMP by setting the final SoC target equal to the final value obtained through the ECMS-based technique for each driving cycle (ξ(t_f) reported in Table 5). The ratio between the RFE achieved with ECMS and that of the PMP has been calculated and reported in the last column of Table 5. The proximity to the optimal fuel consumption is confirmed by the fact that these ratios are all greater than or equal to 97%.
Nevertheless, the RFE values reported in Table 5 highlight some substantial differences with respect to the other techniques presented, due to the differences in the final state of charge. For example, the negative relative fuel economy, with respect to the conventional vehicle, on the HWFET test cycle can be explained by considering that the battery SoC increases by nearly 1% during the driving mission. This result suggests that a considerable amount of energy from the fuel is used to recharge the battery, similarly to what happens when applying the PMP (as confirmed by the RFE ratio of 100%). Moreover, as shown in Figure 10, the change in Relative Fuel Economy with respect to the number of control elements is negligible for values of Q greater than 1000. With fewer elements, the algorithm is less stable and accurate and the constraint on the final SoC can no longer be respected, which amplifies the RFE change due to battery energy consumption.
Figure 10. Influence of the number of discrete control elements on the computational time and the fuel economy achieved through the ECMS-based technique on the WLTP driving cycle.
The differences between causal and non-causal techniques are particularly evident on short driving cycles. On the other hand, the equivalence, and hence the effectiveness, of the ECMS-based optimization in the long run can be shown in practice by running multiple repetitions of the same test cycle and evaluating the overall fuel consumption. By doing so, the SoC difference, and thus its fuel equivalent, is outweighed by the larger amount of fuel used. These results are shown in Table 6, where the RFE values line up with those achieved through the other, non-causal optimization methods.

Conclusions
The paper presented a simulation analysis of four supervisory control strategies for Hybrid Electric Vehicles (HEVs) based on different optimization algorithms: two well known in the literature, namely Dynamic Programming (DP) and Pontryagin's Minimum Principle (PMP), and two new algorithms proposed by the authors as modifications of approaches from the literature, namely the Forward Approach to Dynamic Programming (FADP) and a strategy based on the Equivalent Consumption Minimization Strategy (ECMS-based).
The performance of the techniques has been compared focusing on the optimal power distribution between the components of a parallel HEV, and the results have been analysed in terms of fuel economy and computational time. The results obtained with the FADP, DP and PMP optimizations agree with each other, and all these methods have been shown to reach the absolute optimum, although it is worth recalling that absolute optimality cannot be guaranteed by the application of PMP alone. Nevertheless, owing to its capability to find the absolute optimum and to its computational speed, PMP stands out as the ideal candidate for the integration of energy management strategies.
For all the test cycles, the relative fuel economy achieved through the causal ECMS-based technique is comparable to that of the non-causal strategies, which confirms the validity of the procedure despite the sub-optimality of the solution and the differences in the state of charge trajectory.
The presented FADP technique shows comparable computational time with respect to the well-established DP and the capability to attain the absolute optimum. Moreover, FADP results are expressed in terms of cost-to-arrive rather than cost-to-go (as in DP), which makes this technique particularly useful in the benchmarking process of causal energy management strategies. Clearly, since FADP and DP are slower than real time, they cannot be used for on-board control applications (along with the fact that they are non-causal). Conversely, PMP and causal ECMS-based optimizations are faster than real time and, hence, are suitable for such applications.
In conclusion, in the current automotive scenario, the ideal candidate for on-board control applications is the causal ECMS-based optimization, owing to its good performance without the need for information on future driving conditions. On the other hand, the technological advances in driving assistance systems and connected vehicles, which are leading towards autonomous driving, are the ultimate game changer: they can make the driving horizon more deterministic and hence render non-causal techniques more interesting from the industrial point of view in the near future.
As a future development of this research, detailed powertrain component modelling and realistic driving cycles (e.g., Real Driving Emissions cycles) will be considered to provide more accurate results and make the comparison of the optimization approaches more robust.