Article

Electric Vehicle Energy Management Under Unknown Disturbances from Undefined Power Demand: Online Co-State Estimation via Reinforcement Learning

by C. Treesatayapun 1,*, A. D. Munoz-Vazquez 2, S. K. Korkua 3, B. Srikarun 3 and C. Pochaiya 3
1 Robotics and Advanced Manufacturing, Center for Research and Advanced Studies (CINVESTAV), 1062 Industria Metalurgica Av., Ramos Arizpe 25903, Mexico
2 Higher Education Center at McAllen, Texas A&M University (TAMU), College Station, TX 78504, USA
3 School of Engineering and Technology, Walailak University, Nakhonsrithammarat 80161, Thailand
* Author to whom correspondence should be addressed.
Energies 2025, 18(15), 4062; https://doi.org/10.3390/en18154062
Submission received: 4 July 2025 / Revised: 25 July 2025 / Accepted: 29 July 2025 / Published: 31 July 2025
(This article belongs to the Special Issue Forecasting and Optimization in Transport Energy Management Systems)

Abstract

This paper presents a data-driven energy management scheme for fuel cell and battery electric vehicles, formulated as a constrained optimal control problem. The proposed method employs a co-state network trained using real-time measurements to estimate the control law without requiring prior knowledge of the system model or a complete dataset across the full operating domain. In contrast to conventional reinforcement learning approaches, this method avoids the issue of high dimensionality and does not depend on extensive offline training. Robustness is demonstrated by treating uncertain and time-varying elements, including power consumption from air conditioning systems, variations in road slope, and passenger-related demands, as unknown disturbances. The desired state of charge is defined as a reference trajectory, and the control input is computed while ensuring compliance with all operational constraints. Validation results based on a combined driving profile confirm the effectiveness of the proposed controller in maintaining the battery charge, reducing fluctuations in fuel cell power output, and ensuring reliable performance under practical conditions. Comparative evaluations are conducted against two benchmark controllers: one designed to maintain a constant state of charge and another based on a soft actor–critic learning algorithm.

1. Introduction

The growing demand for sustainable transportation has intensified research in electric vehicles powered by hybrid energy sources [1]. Among these, fuel cell and battery electric vehicles represent a promising solution by combining the high energy density of hydrogen fuel cells with the fast dynamic response of lithium-ion batteries [2,3]. Effective coordination of power flow between these energy sources is critical to achieving performance, efficiency, and durability optimization objectives [4,5]. A key challenge is the development of an energy management system (EMS) that can optimize fuel cell operation while maintaining the battery state of charge within acceptable bounds [6,7]. Conventional energy management strategies often rely on rule-based logic or predictive control methods that depend on accurate models of the vehicle powertrain [8,9]. However, precise modeling of such systems is inherently difficult due to nonlinear dynamics, time-varying operating conditions, and the presence of unknown or uncertain elements, such as auxiliary loads and environmental factors. Moreover, many model-based approaches require prior knowledge of future driving profiles or disturbance patterns, which may not be readily available in real-time applications [10,11].
Model-based EMS for electric vehicles typically rely on detailed mathematical representations of powertrain components, including fuel cells, batteries, electric motors, and auxiliary loads [12,13]. These models facilitate the application of advanced optimization and control techniques, such as model predictive control and dynamic programming, to compute power allocation strategies that improve energy efficiency and prolong component lifespan [14]. By accurately capturing system dynamics and constraints, model-based EMS can predict future states and proactively regulate power flow [15]. However, the performance of these approaches is highly dependent on the accuracy and completeness of the underlying models, which are often difficult to construct due to nonlinear behavior, parameter uncertainties, and variable operating conditions [16]. In addition, the computational demands associated with solving model-based optimization problems in real time may hinder practical implementation, especially under uncertain or rapidly changing environments [17].
The application of reinforcement learning (RL) to EMS in electric vehicles has gained substantial momentum, driven by the increasing need to manage system complexity and uncertainty without reliance on precise mathematical models [18,19]. In particular, actor–critic frameworks have emerged as a promising class of RL algorithms, as they integrate the processes of value estimation and policy optimization, thereby supporting real-time adaptability to dynamic driving environments and power demand fluctuations [20,21]. Compared to traditional control strategies such as rule-based logic or model predictive control (MPC), actor–critic methods have demonstrated superior performance in capturing nonlinear behaviors and learning optimal control policies through direct interaction with the system [22]. Despite these advantages, conventional RL implementations often rely on pretraining with large and diverse datasets, which can constrain their practicality for real-time deployment [23,24]. Moreover, the inherent dimensional complexity of electric vehicle powertrain systems—characterized by high-dimensional state-action spaces—poses challenges related to computational overhead and algorithmic convergence [25,26]. These limitations have motivated the development of more computationally efficient and data-adaptive approaches that can learn effectively in online settings without extensive offline training.
In response to the limitations of model-based methods and conventional actor–critic architectures, data-driven and model-free control approaches have attracted increasing interest due to their ability to learn system behavior directly from real-time measurements [27]. These methods offer robust adaptability under uncertain and varying conditions without relying on explicit system models. This work proposes an adaptive control framework based on a fuzzy rule emulated network, termed MiFREN [28], for real-time power management in fuel cell and battery electric vehicles. The scheme incorporates two learning networks: MiFRENm, which estimates unknown system dynamics, and MiFRENs, which approximates the cost gradient necessary for control optimization. Together, these networks enable online adaptation and optimization without prior knowledge of system parameters.
The main contributions of this work are summarized as follows:
  • Unlike conventional reinforcement learning schemes [21,22,23], which typically rely on iterative learning and extensive offline training, the proposed approach employs a co-state network that is trained solely using online data in real time. This design enables the formulation of an optimal energy management controller without requiring a comprehensive dataset or prior knowledge of the entire operational domain. Additionally, the framework inherently avoids the curse of dimensionality, making it well-suited for practical deployment in embedded systems.
  • By treating power consumption associated with air conditioning systems, time-varying slopes and road conditions, passenger support systems, and other onboard demands as unknown disturbances, the robustness of the proposed scheme is demonstrated both from a practical perspective and through theoretical analysis.
  • From the perspective of energy management as a control system, the desired state of charge (SOC) is formulated as the reference trajectory, while the optimal control input is computed using the proposed control law under full operational constraints.
The remainder of this paper is organized as follows. Section 2 presents the problem formulation, where the electric vehicle with energy management is modeled as a class of discrete-time systems, and the optimal solution is addressed from the perspective of a model-free control approach. Section 3 details the design of the proposed controller along with the corresponding analysis. Section 4 provides validation results and comparative evaluations. Finally, the conclusion is presented in Section 5.

2. Problem Formulation with EV-EMS Framework

2.1. A Class of Control Systems Based on Model-Free EV-EMS

For the purposes of this study, the fuel cell/battery electric vehicle is represented, without loss of generality and in a discrete-time setting, by the power-flow block diagram shown in Figure 1. The drive block denotes the required power P_d(k) [kW] at sampling index k, as determined by the vehicle velocity v(k), physical resistances, and longitudinal dynamics.
In this work, the standard longitudinal model is employed to express the demanded power as a function of the real-time sampling index k and the sampling interval T_s [s] as

P_d(k) = \frac{1}{2} C_d A_f \rho_a v^3(k) + R_i m_0 v(k)\,\frac{v(k) - v(k-1)}{T_s} + m_0 g\, v(k)\sin(\theta_r(k)) + R_r m_0 g\, v(k)\cos(\theta_r(k)), \quad (1)

where θ_r(k) denotes the road slope and the remaining parameters are defined in Table 1.
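As a quick numerical sketch of the demand-power model above: the function below evaluates the drag, inertia, grade, and rolling-resistance terms for speeds in m/s. The parameter values are illustrative placeholders, not the entries of Table 1, and the rolling-resistance term is taken with cos(θ_r), the standard longitudinal-dynamics form.

```python
import math

def demand_power_kw(v, v_prev, theta_r, Ts=1.0, Cd=0.3, Af=2.5,
                    rho_a=1.2, Ri=1.05, m0=1800.0, g=9.81, Rr=0.012):
    """Demanded power P_d(k) [kW]: aerodynamic drag + inertial +
    grade + rolling-resistance terms (all parameters illustrative)."""
    drag = 0.5 * Cd * Af * rho_a * v ** 3                 # [W]
    inertia = Ri * m0 * v * (v - v_prev) / Ts             # [W]
    grade = m0 * g * v * math.sin(theta_r)                # [W]
    rolling = Rr * m0 * g * v * math.cos(theta_r)         # [W]
    return (drag + inertia + grade + rolling) / 1000.0    # [kW]
```

At a steady 20 m/s on a flat road, only the drag and rolling terms contribute; accelerating or climbing adds the inertia and grade terms.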
With respect to a quasi-static fuel cell model, P_fc(k) [kW] denotes the power output at sampling index k, which reflects the instantaneous hydrogen consumption, expressed as \dot{m}_{H_2}(k) = f_{fc}(P_{fc}(k)), where f_{fc}(\cdot) is a generally nonlinear function considered unknown in this work. In light of this relationship, and consistent with insights from related studies, the EMS design in this work is directly based on the principle that a reduction in the required power P_fc(k+1) leads to a corresponding decrease in hydrogen consumption.
The lithium-ion battery pack is modeled using its standard equivalent circuit, where the output current I_b(k) is governed by the battery power P_b(k), such that

I_b(k) = \frac{V_{oc} - \sqrt{V_{oc}^2 - 4 R_i P_b(k)}}{2 R_i}, \quad (2)
where V_oc is the open-circuit voltage and R_i is the internal resistance. It is evident that under discharge conditions, when P_b(k) < V_{oc}^2/(4 R_i), a proportional relationship between the battery power and output current can be established, provided that the state variation is consistent with the direction of the discharge current. By this means, the change in SOC can be discretized as

\dot{SOC}\big|_{t\in[kT_s,\,(k+1)T_s)} \approx \frac{SOC(k+1) - SOC(k)}{T_s} = -\eta_b^{\mathrm{sign}(P_b(k))}\,\frac{P_b(k)}{3600\,E_0},

or

SOC(k+1) = SOC(k) - \eta_b^{\mathrm{sign}(P_b(k))}\,\frac{P_b(k)}{3600\,E_0}\,T_s, \quad (3)

where η_b denotes the Coulombic efficiency of the battery pack and E_0 represents its capacity, as specified in Table 1.
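A minimal sketch of this SOC update, assuming the discharge-positive convention for P_b(k) used above and illustrative values for η_b and E_0 (not the Table 1 entries):

```python
def soc_step(soc, p_b_kw, Ts=1.0, eta_b=0.98, E0_kwh=50.0):
    """One step of the SOC dynamics: discharge (P_b > 0) drains the pack,
    charging (P_b < 0) restores it; the Coulombic efficiency enters with
    the sign of the battery power, and 3600 converts kWh to kW*s."""
    sign = 1.0 if p_b_kw > 0 else (-1.0 if p_b_kw < 0 else 0.0)
    return soc - (eta_b ** sign) * p_b_kw * Ts / (3600.0 * E0_kwh)
```

Note how the exponent sign(P_b) applies the efficiency loss in both directions: discharging removes slightly more charge than delivered, and charging stores slightly less than supplied.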
The auxiliary power P_aux(k) accounts for the total power consumption associated with the air conditioning system, passenger support utilities, operational functions, and other onboard loads. In this work, P_aux(k) is treated as an unknown disturbance, and, in contrast to previous works, no estimation models or observers are employed.
The electric motor, integrated into the EV powertrain, has an efficiency η_em generally characterized as a function of the output torque τ_m and rotational speed ω_m, such that η_em = f_{η_em}(τ_m, ω_m), with f_{η_em}(·) defined experimentally. For simplicity and without loss of generality, η_em is specified as listed in Table 1. The motor is driven via a DC/AC inverter and a mechanical transmission, with their respective efficiencies denoted by η_DC/AC and η_m, also provided in Table 1. Furthermore, the electrical energy generated by the fuel cell is delivered through a DC/DC converter, with its efficiency η_DC/DC likewise specified. Based on these considerations, the power balance for the EV powertrain can be expressed as

(\eta_m\,\eta_{em}\,\eta_{DC/AC})^{\mathrm{sign}(F_{EV}(k))}\,P_d(k) + P_{aux}(k) = \eta_{DC/DC}\,P_{fc}(k) + P_b(k), \quad (4)
where

F_{EV}(k) = \frac{1}{2} C_d A_f \rho_a v^2(k) + R_i m_0\,\frac{v(k) - v(k-1)}{T_s} + m_0 g \sin(\theta_r(k)) + R_r m_0 g \cos(\theta_r(k)). \quad (5)
It is worth noting that the dynamics presented in Equations (1)–(4) are simplified representations, provided without loss of generality. The complete formulations can be found in the following references: the EV model and gear system in [21,22], the battery model and state of charge (SOC) dynamics in [20], and the fuel cell and motor models in [16]. Nonetheless, these models and associated dynamics are introduced solely for conceptual illustration and validation purposes, as the proposed scheme, which will be discussed in subsequent sections, is entirely model-free and relies exclusively on real-time data from measurable states. For practical implementation, the relevant physical constraints are summarized as follows:
P_{fc}^{min} \le P_{fc}(k) \le P_{fc}^{max}, \quad -\Delta P_{fc}^{M} \le \Delta P_{fc}(k) \le \Delta P_{fc}^{M}, \quad SOC^{min} \le SOC(k) \le SOC^{max}, \quad P_{aux}^{min} \le P_{aux}(k) \le P_{aux}^{max}, \quad -P_{b}^{M} \le P_b(k) \le P_{b}^{M}, \quad -I_{b}^{M} \le I_b(k) \le I_{b}^{M}, \quad -P_{d}^{Gen} \le P_d(k) \le P_{d}^{Drv}, \quad (6)
where all corresponding values are detailed in Table 2. In this work, P_d^{Gen} and P_d^{Drv} are defined to represent the limitations imposed by motor characteristics, such as the maximum torque during both generator and drive modes, as well as constraints associated with the maximum allowable vehicle velocity and its specifications.
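At runtime, one simple way to enforce the box and slew limits above is to project each fuel-cell command onto its feasible interval before applying it. A sketch with purely illustrative limit values, not those of Table 2:

```python
def project_fc_command(p_fc, p_fc_prev, p_fc_min=0.0, p_fc_max=60.0,
                       dp_fc_max=2.0):
    """Clip a requested fuel-cell power [kW] to the intersection of the
    box constraint [p_fc_min, p_fc_max] and the slew constraint
    |P_fc(k) - P_fc(k-1)| <= dp_fc_max (illustrative values)."""
    lo = max(p_fc_min, p_fc_prev - dp_fc_max)
    hi = min(p_fc_max, p_fc_prev + dp_fc_max)
    return max(lo, min(hi, p_fc))
```

Intersecting the two constraints first, then clipping, guarantees the command satisfies both simultaneously.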

2.2. Characterization of the Optimal Solution

Without loss of generality, the dynamics described in Equations (1)–(4) can be regarded as a class of discrete-time control systems, where SOC(k) denotes the output y(k); P_b(k), P_fc(k), P_d(k), and v(k) are the measurable states x(k) ∈ R^4; and P_aux(k) is treated as a disturbance d(k) that requires neither prediction nor direct measurement. The control effort u(k) is employed as ΔP_fc(k) such that

u(k) \triangleq \Delta P_{fc}(k) = P_{fc}(k) - P_{fc}(k-1). \quad (7)
Recalling (4) with (7) yields

P_b(k) = (\eta_m\,\eta_{em}\,\eta_{DC/AC})^{\mathrm{sign}(F_{EV}(k))}\,P_d(k) + P_{aux}(k) - \eta_{DC/DC}\,P_{fc}(k) = (\eta_m\,\eta_{em}\,\eta_{DC/AC})^{\mathrm{sign}(F_{EV}(k))}\,P_d(k) - \eta_{DC/DC}\big[u(k) + P_{fc}(k-1)\big] + d(k). \quad (8)
By substituting (8) into (3), the resulting expression becomes

SOC(k+1) = SOC(k) - \eta_b^{\mathrm{sign}(P_b(k))}\,\frac{T_s}{3600\,E_0}\Big[(\eta_m\,\eta_{em}\,\eta_{DC/AC})^{\mathrm{sign}(F_{EV}(k))}\,P_d(k) - \eta_{DC/DC}\big[u(k) + P_{fc}(k-1)\big] + d(k)\Big]. \quad (9)
With SOC(k+1) considered as the output, the dynamics in Equation (9) can be generalized as a class of non-affine discrete-time systems, described as

y(k+1) = f_N\big(y(k), x_1(k), x_2(k), x_3(k), x_4(k), u(k), d(k)\big), \quad (10)
where f_N(·) denotes an unknown function, and only the state vector x(k) and the output y(k) are available by design. In this work, the disturbance d(k) is treated as an unknown but physically bounded time-varying parameter, representing practical variations such as auxiliary power demands or environmental influences, which remain within realistic operational limits.
To address the challenge of handling unknown system dynamics under practical disturbances, this work introduces time-varying functions constructed based on the equivalent modeling framework proposed in [28]. This approach enables the transformation of the original nonlinear dynamics (10) into an affine equivalent representation, thereby facilitating real-time implementation and control synthesis. The resulting system can be expressed as
y(k+1) = f_o(k) + g(k)\,u(k), \quad (11)
where f_o(k) is the internal dynamics and g(k) is the input gain. It is worth mentioning that f_o(k) and g(k) are unknown nonlinear functions.
At this stage, the control input u(k) must be designed to drive the system output y(k+1) in Equation (11) to follow the desired trajectory y_d(k+1). In this work, y_d(k+1) corresponds to the reference state of charge, denoted SOC_r(k+1), such that

y_d(k+1) \triangleq SOC_r(k+1) = \delta_{SOC}\,SOC_r(k) + \big[1 - \delta_{SOC}\big]\,SOC(k), \quad (12)
where δ_SOC < 1 is a design parameter.
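The reference trajectory in (12) is a first-order blend that pulls SOC_r toward the measured SOC at a pace set by δ_SOC; a one-line sketch:

```python
def soc_reference(soc_r, soc, delta_soc=0.95):
    """Reference update SOC_r(k+1) = delta*SOC_r(k) + (1-delta)*SOC(k);
    delta_soc < 1 sets how slowly the reference tracks the measurement."""
    return delta_soc * soc_r + (1.0 - delta_soc) * soc
```

With δ_SOC close to 1 (0.95 is used later in the validation), the reference moves only a few percent of the gap per sample, smoothing the trajectory the controller must follow.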
Therefore, the control problem is formulated in terms of the tracking error e(k), defined as

e(k) = y(k) - y_d(k). \quad (13)
Utilizing (12) and (11) with (13) yields

e(k+1) = y(k+1) - y_d(k+1) = f_o(k) + g(k)\,u(k) - y_d(k+1) = f(k) + g(k)\,u(k), \quad (14)
where

f(k) = f_o(k) - \delta_{SOC}\,SOC_r(k) - \big[1 - \delta_{SOC}\big]\,SOC(k). \quad (15)
It is important to emphasize that the affine discrete-time system in (14) has been derived directly to represent the error dynamics, without any transformation from the original plant. Furthermore, the functions f(k) and g(k) are assumed to be unknown nonlinear functions.
In accordance with the optimal control design, the long-term cost function J(k) is defined in this work as

J(k) = \sum_{i=k}^{\infty} r(i), \quad (16)
where r(i) is the utility function given as

r(i) = q_r\,e^2(i) + p_r\,u^2(i), \quad (17)

and q_r and p_r are positive constants. It is clear that r(i) = 0 only when e(i) = 0 and u(i) = 0.
By using J(k) in (16), we obtain

J(k) = r(k) + \sum_{i=k+1}^{\infty} r(i) = r(k) + J(k+1). \quad (18)
To determine the optimal control law, the stationarity condition \partial J(k)/\partial u(k) = 0 is required. That leads to

0 = \frac{\partial r(k)}{\partial u(k)} + \frac{\partial J(k+1)}{\partial u(k)} = 2 p_r\,u(k) + \frac{\partial J(k+1)}{\partial e(k+1)}\,\frac{\partial e(k+1)}{\partial u(k)}. \quad (19)
Let us recall the error dynamics in (14), which give \partial e(k+1)/\partial u(k) = g(k); thus, the ideal optimal control law u^*(k) is given as

u^*(k) = -\frac{1}{2 p_r}\, g(k)\, \frac{\partial J(k+1)}{\partial e(k+1)}. \quad (20)
The controller in Equation (20) is impractical for implementation, as it requires knowledge of the unknown function g(k) and the future value of the cost-function gradient ∂J(k+1)/∂e(k+1). To address this limitation, two key components are proposed: (i) a data-driven scheme to estimate the unknown function g(k), denoted ĝ(k), and (ii) an approximation of the co-state, λ̂(k), to predict ∂J(k+1)/∂e(k+1). Accordingly, the practical controller developed in this work is formulated as
u(k) = -\frac{1}{2 p_r}\, \hat{g}(k)\, \hat{\lambda}(k). \quad (21)
It is worth noting that the control law in Equation (21) involves two time-varying parameters, which will be developed in the following sections, subject to the practical constraints defined in Equation (6).
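Putting the pieces together: the practical control law in Equation (21) is a product of two online estimates scaled by the weighting p_r, followed by saturation to the slew bound of Equation (6). A sketch, where the bound value is illustrative rather than the Table 2 entry:

```python
def control_input(g_hat, lam_hat, p_r=1.0, dp_fc_max=2.0):
    """u(k) = -g_hat(k) * lam_hat(k) / (2 * p_r), saturated so that
    |Delta P_fc(k)| <= dp_fc_max (illustrative bound)."""
    u = -g_hat * lam_hat / (2.0 * p_r)
    return max(-dp_fc_max, min(dp_fc_max, u))
```

A negative co-state estimate (cost decreasing as the error grows) produces a positive fuel-cell power increment, and vice versa; the saturation keeps the command inside the fuel cell's allowable ramp rate.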

3. Controller as EMS with MiFREN-Estimators

By considering the control law in Equation (21), it is evident that two main components are required for its implementation: ĝ(k) and λ̂(k). In this work, the first adaptive network, denoted MiFRENm, is constructed to estimate the unknown input gain, yielding ĝ(k), based on a data-driven approach. Subsequently, the second network, MiFRENs, is designed to generate the co-state approximation λ̂(k).

3.1. Dynamic Equivalent Model

An adaptive network, referred to as MiFRENm, is employed to estimate the error dynamics described in Equation (14), based on the network architecture illustrated in Figure 2. Accordingly, the equivalent model ŷ(k+1) is formulated as

\hat{y}(k+1) = \hat{f}(k) + \hat{g}(k)\,u(k) = \beta_f^T(k)\,\varphi(k) + \beta_g^T(k)\,\varphi(k)\,u(k), \quad (22)

where β_f(k) ∈ R^N and β_g(k) ∈ R^N are weight vectors of MiFREN, N is the number of IF–THEN rules, and φ(k) ∈ R^N is the regression vector for the inputs y(k) and y(k−1). With three membership functions per input, N (negative), Z (zero), and P (positive), as shown in Figure 2, the rule count is N = 3 × 3 = 9. Furthermore, it is worth emphasizing that the estimate ĝ(k) required by the control law (21) is determined as

\hat{g}(k) = \beta_g^T(k)\,\varphi(k). \quad (23)
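A sketch of such a two-input fuzzy-rule-emulated network: each input is graded by three membership functions (N, Z, P), and the 3 × 3 rule firing strengths form the regression vector φ(k) ∈ R^9. The triangular membership shapes and centers below are assumptions for illustration; Figure 2 defines the actual ones.

```python
import numpy as np

def membership(x, centers=(-1.0, 0.0, 1.0), width=1.0):
    """Triangular grades for the N, Z, and P membership functions."""
    return np.maximum(0.0, 1.0 - np.abs(x - np.asarray(centers)) / width)

def fren_phi(y_k, y_km1):
    """Regression vector phi(k) in R^9: one entry per IF-THEN rule,
    the product of the two inputs' membership grades."""
    return np.outer(membership(y_k), membership(y_km1)).ravel()

def fren_output(beta, phi):
    """Network output beta^T phi, the linear-in-weights form used for
    both the f-hat and g-hat estimates."""
    return float(beta @ phi)
```

Because the output is linear in the weights β, the gradient learning laws derived next reduce to simple additive updates along φ(k).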
Thereafter, the learning law is derived to tune the parameters β_f(k) and β_g(k) with the cost function E_e(k+1) defined as

E_e(k+1) = \tfrac{1}{2}\,\tilde{e}^2(k+1), \quad (24)

where

\tilde{e}(k+1) = y(k+1) - \hat{y}(k+1). \quad (25)
By utilizing the gradient search, the learning laws for β_f(k) and β_g(k) are expressed as

\beta_f(k+1) = \beta_f(k) - \eta_e\,\frac{\partial E_e(k+1)}{\partial \beta_f(k)}, \quad (26)

and

\beta_g(k+1) = \beta_g(k) - \eta_e\,\frac{\partial E_e(k+1)}{\partial \beta_g(k)}, \quad (27)
respectively, where η_e is the learning rate. Applying the chain rule to the equivalent model (22), we obtain

\frac{\partial E_e(k+1)}{\partial \beta_f(k)} = \frac{\partial E_e(k+1)}{\partial \tilde{e}(k+1)}\,\frac{\partial \tilde{e}(k+1)}{\partial \hat{y}(k+1)}\,\frac{\partial \hat{y}(k+1)}{\partial \beta_f(k)} = -\tilde{e}(k+1)\,\varphi(k), \quad (28)

and

\frac{\partial E_e(k+1)}{\partial \beta_g(k)} = \frac{\partial E_e(k+1)}{\partial \tilde{e}(k+1)}\,\frac{\partial \tilde{e}(k+1)}{\partial \hat{y}(k+1)}\,\frac{\partial \hat{y}(k+1)}{\partial \beta_g(k)} = -\tilde{e}(k+1)\,u(k)\,\varphi(k). \quad (29)
By substituting (28) and (29) into (26) and (27), respectively, the learning laws are derived as

\beta_f(k+1) = \beta_f(k) + \eta_e\,\tilde{e}(k+1)\,\varphi(k), \quad (30)

and

\beta_g(k+1) = \beta_g(k) + \eta_e\,\tilde{e}(k+1)\,u(k)\,\varphi(k). \quad (31)
Next, the convergence of the proposed learning laws (30) and (31) is analysed through the learning rate η_e in the following lemma.
Lemma 1. 
By utilizing the tracking-error equivalent model (22) with the learning laws (30) and (31), and the boundedness of the control effort and weight parameters such that |u(k)| ≤ u_M, ‖β_f(k)‖ ≤ β_f^M, and ‖β_g(k)‖ ≤ β_g^M, the convergence of ẽ(k+1) is guaranteed when the learning rate η_e is employed as the time-varying learning rate η_e(k) given as

\eta_e(k) = \frac{\gamma_e}{\big[1 + |u(k)\,u(k-1)|\big]\,\varphi^T(k)\,\varphi(k-1)}, \quad (32)

where

0 < \gamma_e < 1. \quad (33)
Proof. 
Let us recall the universal function approximation property of MiFREN; thus, the dynamics of the tracking error (14) can be rewritten with ideal weight parameters β_f^* and β_g^* as

y(k+1) = \varphi^T(k)\,\beta_f^* + \varphi^T(k)\,\beta_g^*\,u(k) + \varepsilon_e(k), \quad (34)

where ε_e(k) is a bounded residual error, |ε_e(k)| ≤ ε_e^M. By using (25) with (22) and (34), it yields

\tilde{e}(k+1) = y(k+1) - \hat{y}(k+1) = \varphi^T(k)\big[\beta_f^* - \beta_f(k)\big] + \varphi^T(k)\big[\beta_g^* - \beta_g(k)\big]\,u(k) + \varepsilon_e(k) = \varphi^T(k)\,\tilde{\beta}_f(k) + \varphi^T(k)\,\tilde{\beta}_g(k)\,u(k) + \varepsilon_e(k), \quad (35)

where β̃_f(k) = β_f^* − β_f(k) and β̃_g(k) = β_g^* − β_g(k).
Let us recall the learning laws (30) and (31); they lead to

\tilde{\beta}_f(k+1) = \tilde{\beta}_f(k) - \eta_e\,\tilde{e}(k+1)\,\varphi(k), \quad (36)

and

\tilde{\beta}_g(k+1) = \tilde{\beta}_g(k) - \eta_e\,\tilde{e}(k+1)\,u(k)\,\varphi(k), \quad (37)
respectively. Stepping (36) and (37) back one sample and substituting into (35), we obtain

\tilde{e}(k+1) = \varphi^T(k)\big[\tilde{\beta}_f(k-1) - \eta_e\,\tilde{e}(k)\,\varphi(k-1)\big] + \varphi^T(k)\big[\tilde{\beta}_g(k-1) - \eta_e\,\tilde{e}(k)\,u(k-1)\,\varphi(k-1)\big]\,u(k) + \varepsilon_e(k) = -\eta_e\big[1 + u(k)\,u(k-1)\big]\,\varphi^T(k)\,\varphi(k-1)\,\tilde{e}(k) + \varphi^T(k)\,\tilde{\beta}_f(k-1) + \varphi^T(k)\,\tilde{\beta}_g(k-1)\,u(k) + \varepsilon_e(k) = A_{\tilde{e}}(k)\,\tilde{e}(k) + B_{\tilde{e}}(k), \quad (38)
where

A_{\tilde{e}}(k) = -\eta_e\big[1 + u(k)\,u(k-1)\big]\,\varphi^T(k)\,\varphi(k-1), \quad (39)

and

B_{\tilde{e}}(k) = \varphi^T(k)\,\tilde{\beta}_f(k-1) + \varphi^T(k)\,\tilde{\beta}_g(k-1)\,u(k) + \varepsilon_e(k). \quad (40)
For the setting of membership functions μ(·) ∈ [0, 1], it follows that ‖φ(k)‖ ≤ √N. Thus, it is clear that B_ẽ(k) is bounded as

\|B_{\tilde{e}}(k)\| \le 2\sqrt{N}\,\big[\beta_f^M + \beta_g^M\,u_M\big] + \varepsilon_e^M. \quad (41)
Thereafter, by substituting the time-varying learning rate (32) for η_e in (39), we have

A_{\tilde{e}}(k) = -\gamma_e\,\frac{\big[1 + u(k)\,u(k-1)\big]\,\varphi^T(k)\,\varphi(k-1)}{\big[1 + |u(k)\,u(k-1)|\big]\,\varphi^T(k)\,\varphi(k-1)}. \quad (42)
Considering the setting of γ_e in (33), it is clear that |A_ẽ(k)| < 1. Thus, ẽ(k+1) is a convergent sequence. The proof is completed. □
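The resulting update step, combining the learning laws (30) and (31) with the time-varying rate (32), can be sketched as follows; the small eps guard on the denominator is an implementation detail added here, not part of the analysis above:

```python
import numpy as np

def update_model_weights(beta_f, beta_g, phi, phi_prev, u, u_prev,
                         e_tilde_next, gamma_e=0.5, eps=1e-8):
    """Compute eta_e(k) = gamma_e / ((1 + |u(k)u(k-1)|) phi(k)^T phi(k-1)),
    then apply the additive gradient updates of beta_f and beta_g."""
    denom = (1.0 + abs(u * u_prev)) * float(phi @ phi_prev)
    eta_e = gamma_e / max(denom, eps)            # eps guards denom ~ 0
    beta_f = beta_f + eta_e * e_tilde_next * phi
    beta_g = beta_g + eta_e * e_tilde_next * u * phi
    return beta_f, beta_g, eta_e
```

Normalizing the rate by φ^T(k)φ(k−1) and the control product keeps the contraction factor |A_ẽ(k)| below one regardless of how large the regression vectors or control increments become.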
At this point, the estimate ĝ(k) required by the proposed control law (21) has been derived. Therefore, the co-state λ̂(k) will be established next.

3.2. Co-State Estimation

In this section, the co-state network is constructed by another MiFREN with the network architecture depicted in Figure 3. The tracking error e(k) and the control effort u(k) are the inputs, and the output is the estimated co-state, formulated as

\hat{\lambda}(k) = \beta_\lambda^T(k)\,\varphi_\lambda(k), \quad (43)

where β_λ(k) ∈ R^N is the weight vector and φ_λ(k) ∈ R^N is the input-regression vector.
Let us recall the definition of the co-state λ(k) and its discrete-time approximation such that

\lambda(k) \triangleq \frac{\partial J(k+1)}{\partial e(k+1)} \approx \frac{J(k+1) - J(k)}{e(k+1) - e(k)} = \frac{\Delta J(k)}{\Delta e(k)}, \quad (44)
where Δe(k) ≠ 0. By utilizing (18), it yields

\Delta J(k) = -r(k). \quad (45)
Thus, the target co-state λ_d(k) employed for tuning the parameter β_λ(k) is formulated as

\lambda_d(k) = -\frac{r(k)}{\Delta e(k)}. \quad (46)
Therefore, the error e_λ(k) is defined as

e_\lambda(k) = \hat{\lambda}(k) - \lambda_d(k). \quad (47)
Thus, the learning law of β_λ(k) is given as

\beta_\lambda(k+1) = \beta_\lambda(k) - \eta_\lambda\,\varphi_\lambda(k)\,e_\lambda(k), \quad (48)

where η_λ is the learning rate.
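The co-state training step can be sketched as follows: the target λ_d(k) = −r(k)/Δe(k) is formed from the utility r(k) = q_r e²(k) + p_r u²(k), and the weights follow the learning law (48). The eps safeguards against a vanishing error increment Δe(k) or regression norm are implementation details, not part of the formulation:

```python
import numpy as np

def costate_target(e, u, de, q_r=1.0, p_r=1.0, eps=1e-6):
    """Target co-state lambda_d(k) = -r(k) / Delta_e(k)."""
    r = q_r * e ** 2 + p_r * u ** 2
    if abs(de) < eps:                       # guard Delta_e(k) ~ 0
        de = eps if de >= 0 else -eps
    return -r / de

def update_costate_weights(beta_lam, phi_lam, lam_hat, lam_d,
                           gamma_lam=0.5, eps=1e-8):
    """One step of beta_lam(k+1) = beta_lam(k) - eta * phi * e_lam with
    eta = gamma_lam / (phi^T phi); phi^T phi is the nonzero eigenvalue
    of the rank-one matrix phi phi^T."""
    eta = gamma_lam / max(float(phi_lam @ phi_lam), eps)
    return beta_lam - eta * phi_lam * (lam_hat - lam_d)
```

Note the sign logic: when the error is shrinking (Δe(k) < 0) while cost is still being paid (r(k) > 0), the target co-state is positive, which through Equation (21) drives the control effort toward reducing the remaining cost.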
Lemma 2. 
By utilizing the learning law (48), the convergence of the weight parameter β_λ(k) is guaranteed when the learning rate η_λ is employed as the time-varying rate η_λ(k) given as

\eta_\lambda(k) = \frac{\gamma_\lambda}{\Lambda_\lambda(k)}, \quad (49)

where

0 < \gamma_\lambda < 2, \quad (50)

and Λ_λ(k) is an eigenvalue of the outer-product matrix Ψ_λ(k) defined by

\Psi_\lambda(k) = \varphi_\lambda(k)\,\varphi_\lambda^T(k), \qquad \big[\Psi_\lambda(k)\big]_{ij} = \varphi_{\lambda,i}(k)\,\varphi_{\lambda,j}(k), \quad i, j = 1, \ldots, N. \quad (51)
Proof. 
According to the universal approximation property of MiFREN, there exists an ideal weight vector β_λ^* such that

\lambda_d(k) = \beta_\lambda^{*T}\,\varphi_\lambda(k) + \varepsilon_\lambda(k), \quad (52)
where ε_λ(k) is a bounded residual error such that |ε_λ(k)| < ε_λ^M. Let us substitute (43) and (52) into (47); thus, we have

e_\lambda(k) = \big[\beta_\lambda(k) - \beta_\lambda^*\big]^T\,\varphi_\lambda(k) - \varepsilon_\lambda(k) = -\tilde{\beta}_\lambda^T(k)\,\varphi_\lambda(k) - \varepsilon_\lambda(k), \quad (53)
where β̃_λ(k) = β_λ^* − β_λ(k). By substituting (53) into (48), it yields

\beta_\lambda(k+1) = \beta_\lambda(k) + \eta_\lambda\,\varphi_\lambda(k)\,\tilde{\beta}_\lambda^T(k)\,\varphi_\lambda(k) + \eta_\lambda\,\varepsilon_\lambda(k)\,\varphi_\lambda(k), \quad (54)

or

\tilde{\beta}_\lambda(k+1) = \tilde{\beta}_\lambda(k) - \eta_\lambda\,\varphi_\lambda(k)\,\tilde{\beta}_\lambda^T(k)\,\varphi_\lambda(k) - \eta_\lambda\,\varepsilon_\lambda(k)\,\varphi_\lambda(k). \quad (55)
Utilizing matrix algebra with Ψ_λ(k) = φ_λ(k)φ_λ^T(k), we obtain

\tilde{\beta}_\lambda(k+1) = \tilde{\beta}_\lambda(k) - \eta_\lambda\,\Psi_\lambda(k)\,\tilde{\beta}_\lambda(k) - \eta_\lambda\,\varepsilon_\lambda(k)\,\varphi_\lambda(k) = \big[1 - \eta_\lambda\,\Lambda_\lambda(k)\big]\,\tilde{\beta}_\lambda(k) - \eta_\lambda\,\varepsilon_\lambda(k)\,\varphi_\lambda(k) = A_\lambda(k)\,\tilde{\beta}_\lambda(k) + B_\lambda(k), \quad (56)

where

A_\lambda(k) = 1 - \eta_\lambda\,\Lambda_\lambda(k), \quad (57)

and

B_\lambda(k) = -\eta_\lambda\,\varepsilon_\lambda(k)\,\varphi_\lambda(k). \quad (58)
By the setting of the membership functions and the boundedness of the residual error ε_λ(k), it is clear that B_λ(k) in (58) is also bounded. Furthermore, by recalling the learning rate η_λ in (49), it is obvious that −1 < A_λ(k) < 1. Thus, β̃_λ(k+1) in (56) is a convergent sequence. The proof is completed. □
For clarity, the block diagram of the proposed scheme is shown in Figure 4, illustrating the flow of key signals within the control structure. Upon receiving all measurable states and output variables, the MiFRENm network estimates the input gain ĝ(k), while MiFRENs computes the co-state λ̂(k). The control law is then executed based on the reference state of charge SOC_r(k) while accounting for unknown disturbances from the driving demand P_d(k) and the auxiliary load P_aux(k), which reflect road and environmental conditions. Here, P_fc(k), P_d(k), the vehicle speed v(k), and the battery power P_b(k) are treated as measurable states, with SOC(k) as the output. In contrast, P_aux(k) and the road gradient θ_r(k) are unmeasurable and modeled as external disturbances. The learning laws for both networks operate directly along the time index k, without iterative updates, ensuring real-time applicability.

4. Validation and Comparative Results

4.1. Validation Results

To implement the proposed scheme, the membership functions for all inputs of both MiFRENm and MiFRENs, as illustrated in Figure 5, are designed in accordance with the constraints specified in Equation (6) and Table 2. It is worth noting that, for comparative purposes, the battery power P_b(k) [kW] is reformulated as the stored energy within the battery, E_b(k) [kWh], defined as

E_b(k) = E_b(k-1) - \frac{P_b(k)\,T_s}{3600}. \quad (59)
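A sketch of this bookkeeping, assuming the same discharge-positive convention for P_b(k) as in the SOC dynamics (3), with power in kW over one sample of T_s seconds converted to kWh:

```python
def battery_energy_kwh(e_prev_kwh, p_b_kw, Ts=1.0):
    """E_b(k) from E_b(k-1): subtract the energy drawn at discharge
    power p_b_kw over one sample; charging (p_b_kw < 0) adds energy."""
    return e_prev_kwh - p_b_kw * Ts / 3600.0
```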
The velocity trajectory shown in Figure 6 is constructed by sequentially combining standard driving cycles, including UDDS, HWFET, ArtMw150, FTP, ArtRoad, and WLTP2 [2,12,20,25]. These datasets are recorded with a sampling interval of 1 [s]; thus, the sampling time is defined as T_s = 1 [s]. Based on the selected velocity profiles and corresponding road conditions, the resulting power demand trajectory P_d(k) is computed and illustrated in Figure 7.
To employ the proposed scheme, the initial battery energy is set to E_b(1) = 15.5 [kWh], which determines the initial value of SOC as

SOC(1) = \frac{E_b(1)}{E_0} = \frac{15.5}{50} = 0.31. \quad (60)
Subsequently, SOC_r(k+1) is generated in real time according to the relation in (12), with δ_SOC = 0.95. The learning laws of MiFRENm, given in Equations (30) and (31), and of MiFRENs, given in Equation (48), are employed with the learning-rate gains γ_e = 0.5 and γ_λ = 0.5, respectively.
Under the proposed controller, the time-varying behavior of the fuel cell power P_fc(k) is shown in Figure 8, with the corresponding control effort u(k), represented as ΔP_fc(k), illustrated in Figure 9. The results indicate that the controller maintains the fuel cell power output at a nearly constant level for the majority of the operation period, while strictly adhering to the lower and upper power constraints defined in Table 2. The battery energy evolution E_b(k), shown in Figure 10, reveals that despite starting from a low initial energy level, the controller sustains an adequate battery charge throughout the drive cycle. Furthermore, Figure 11 compares the actual state of charge SOC(k) with the reference trajectory SOC_r(k), demonstrating accurate tracking performance under varying operational conditions. To highlight the adaptive behavior of the proposed networks, Figure 12 illustrates the evolution of the time-varying learning rates η_e(k) and η_λ(k), which govern the online adaptation of MiFRENm and MiFRENs, respectively. It is worth remarking that the fluctuations observed in the reference state of charge SOC_r(k) are closely associated with variations in the co-state learning rate η_λ(k) shown in Figure 12. This correlation highlights the effectiveness of the proposed scheme, wherein the adaptive learning mechanism enables the controller to manage the behavior of the actual SOC(k) accurately, particularly under high-charge conditions. These learning rates respond dynamically to system variations, contributing to the robustness and real-time adaptability of the control strategy.

4.2. Comparative Results

4.2.1. Comparative Controller A

The comparative controller A is developed based on the concept of maintaining S O C ( k ) approximately constant, following the algorithm proposed in [25]. All design parameters are selected in accordance with Table 6 of [25], except that the minibatch size is increased to 256, as suggested by the numerical formulation in [12], to enhance performance. This adjustment is justified by the extended validation period in this work, which spans 6 h compared to only 0.67 h in [25].
In this case, the fuel cell power output P_fc(k) under Controller A is depicted in Figure 13. In comparison with the results obtained using the proposed controller (Figure 8), it is apparent that Controller A induces more pronounced high-frequency fluctuations in P_fc(k). This behavior may impose additional stress on the fuel cell system and could potentially reduce its operational lifespan. Figure 14 presents the battery energy trajectory E_b(k), indicating that the controller maintains the battery around its nominal energy level throughout the driving cycle. Additionally, Figure 15 shows the evolution of the state of charge SOC(k), demonstrating that Controller A achieves its design objective of regulating SOC(k) near a constant value. Nonetheless, it is important to note that this approach may sacrifice smooth fuel cell operation in favor of maintaining a steady battery charge.

4.2.2. Comparative Controller B

To address the issue of high-frequency variations in P f c ( k ) , the soft actor–critic scheme developed in [2] is adopted as Controller B. All design parameters are chosen in accordance with Table 4 of [2], except for the discount factor, which is set to γ = 0.95 to optimize performance for the current validation case. It is important to note that this approach requires additional information, such as a thermal load model and a detailed description of the air conditioning system, which together allow a well-defined formulation of P a u x ( k ) . In this validation test, however, P a u x ( k ) is treated as an unknown disturbance. Consequently, the learning algorithm proposed in [20] is employed to train both the actor and critic networks, with both learning rates set to 0.001.
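For reference, the critic target in a standard soft actor–critic scheme with the discount factor γ = 0.95 used for Controller B takes the form below. The entropy temperature α = 0.2 is an illustrative value, not a parameter taken from [2].

```python
def sac_target(reward, done, q1_next, q2_next, logp_next,
               gamma=0.95, alpha=0.2):
    """Standard soft actor-critic critic target:
    r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi').

    gamma matches the discount factor selected for Controller B;
    alpha is an illustrative entropy temperature.
    """
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * soft_value
```

The clipped double-Q minimum counters overestimation, and the entropy term `-alpha * logp_next` encourages exploratory policies during the adaptation period noted in the results.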
As a result, the power output P f c ( k ) generated by Controller B is illustrated in Figure 16. In comparison with Controller A, it is evident that the high-frequency components have been significantly suppressed, indicating smoother fuel cell operation. However, some transient variations are observed during the initial hour of operation, likely due to the adaptation period of the learning-based controller. Furthermore, Figure 17 presents the evolution of S O C ( k ) under Controller B. While the state of charge is successfully maintained within the specified operational limits, the S O C ( k ) trajectory exhibits slightly larger oscillations compared to those obtained with the proposed controller and Controller A.

5. Conclusions

This work has presented a data-driven energy management strategy for fuel cell and battery electric vehicles, formulated as a constrained optimal control problem. The proposed approach has integrated a co-state network with online learning to estimate the optimal control input in real time, eliminating the need for prior system modeling or complete operational datasets and avoiding dimensionality challenges common in many learning-based methods. Robustness has been achieved by treating unknown and time-varying disturbances—such as auxiliary power consumption, road slope variations, and passenger-related loads—as bounded system uncertainties. The controller has aimed to track a desired SOC trajectory while respecting all physical constraints and adapting effectively to changing operating conditions without requiring offline training or predictive models. Validation results obtained over a six-hour composite driving profile—comprising UDDS, HWFET, ArtMw150, FTP, ArtRoad, and WLTP2 cycles—have demonstrated the following:
  • Stable battery operation with SOC maintained within a practical range;
  • A significant reduction in high-frequency fluctuations of fuel cell power output compared to benchmark controllers;
  • Improved overall energy efficiency relative to constant SOC and soft actor–critic methods.
Comparative analysis against a constant SOC controller and a soft actor–critic algorithm has further confirmed the proposed scheme’s advantages in terms of stability, robustness to unknown disturbances, and real-time computational feasibility. Building upon the benefits of the online learning framework developed in this work, integration with traffic and route information—by leveraging vehicle-to-everything (V2X) and route-based forecasting—has been identified as a promising direction for future research to further enhance predictive capabilities and energy efficiency under real-world driving conditions.

Author Contributions

Conceptualization, C.T.; Methodology, S.K.K. and C.P.; Validation, C.T., A.D.M.-V. and B.S.; Formal analysis, S.K.K. and B.S.; Investigation, C.T., A.D.M.-V., S.K.K. and C.P.; Data curation, B.S.; Writing—original draft, C.T. and A.D.M.-V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sidharthan, V.P.; Kashyap, Y.; Kosmopoulos, P. Adaptive-Energy-Sharing-Based Energy Management Strategy of Hybrid Sources in Electric Vehicles. Energies 2023, 16, 1214. [Google Scholar] [CrossRef]
  2. Deng, L.; Li, S.; Tang, X.; Yang, K.; Lin, X. Battery thermal- and cabin comfort-aware collaborative energy management for plug-in fuel cell electric vehicles based on the soft actor–critic algorithm. Energy Convers. Manag. 2023, 283, 116889. [Google Scholar] [CrossRef]
  3. Chan, C.C. The State of the Art of Electric, Hybrid, and Fuel Cell Vehicles. Proc. IEEE 2007, 95, 704–718. [Google Scholar] [CrossRef]
  4. Gioffrè, D.; Manzolini, G.; Leva, S.; Jaboeuf, R.; Tosco, P.; Martelli, E. Quantifying the Economic Advantages of Energy Management Systems for Domestic Prosumers with Electric Vehicles. Energies 2025, 18, 1774. [Google Scholar] [CrossRef]
  5. Wang, C.; Liu, Y.; Zhang, Y.; Xi, L.; Yang, N.; Zhao, Z.; Lai, C.S.; Lai, L.L. Strategy for optimizing the bidirectional time-of-use electricity price in multi-microgrids coupled with multilevel games. Energy 2025, 323, 135731. [Google Scholar] [CrossRef]
  6. Maroufi, S.M.; Karrari, S.; Rajashekaraiah, K.; De Carne, G. Power Management of Hybrid Flywheel-Battery Energy Storage Systems Considering the State of Charge and Power Ramp Rate. IEEE Trans. Power Electron. 2025, 40, 9944–9956. [Google Scholar] [CrossRef]
  7. Nawaz, M.; Ahmed, J.; Abbas, G. Energy-efficient battery management system for healthcare devices. J. Energy Storage 2022, 51, 104358. [Google Scholar] [CrossRef]
  8. Uralde, J.; Barambones, O.; del Rio, A.; Calvo, I.; Artetxe, E. Rule-Based Operation Mode Control Strategy for the Energy Management of a Fuel Cell Electric Vehicle. Batteries 2024, 10, 214. [Google Scholar] [CrossRef]
  9. Li, Y.; Pu, Z.; Liu, P.; Qian, T.; Hu, Q.; Zhang, J.; Wang, Y. Efficient predictive control strategy for mitigating the overlap of EV charging demand and residential load based on distributed renewable energy. Renew. Energy 2025, 240, 122154. [Google Scholar] [CrossRef]
  10. Kim, D.J.; Kim, B.; Yoon, C.; Nguyen, N.D.; Lee, Y.I. Disturbance Observer-Based Model Predictive Voltage Control for Electric-Vehicle Charging Station in Distribution Networks. IEEE Trans. Smart Grid 2023, 14, 545–558. [Google Scholar] [CrossRef]
  11. Khan, B.; Ullah, Z.; Gruosso, G. Enhancing Grid Stability Through Physics-Informed Machine Learning Integrated-Model Predictive Control for Electric Vehicle Disturbance Management. World Electr. Veh. J. 2025, 16, 292. [Google Scholar] [CrossRef]
  12. Khan, K.; Samuilik, I.; Ali, A. A Mathematical Model for Dynamic Electric Vehicles: Analysis and Optimization. Mathematics 2024, 12, 224. [Google Scholar] [CrossRef]
  13. Previti, U.; Brusca, S.; Galvagno, A.; Famoso, F. Influence of Energy Management System Control Strategies on the Battery State of Health in Hybrid Electric Vehicles. Sustainability 2022, 14, 12411. [Google Scholar] [CrossRef]
  14. Meteab, W.K.; Alsultani, S.A.H.; Jurado, F. Energy Management of Microgrids with a Smart Charging Strategy for Electric Vehicles Using an Improved RUN Optimizer. Energies 2023, 16, 6038. [Google Scholar] [CrossRef]
  15. Shen, Y.; Li, Y.; Liu, D.; Wang, Y.; Sun, J.; Sun, S. Energy Management Strategy for Hybrid Energy Storage System based on Model Predictive Control. J. Electr. Eng. Technol. 2023, 18, 3265–3275. [Google Scholar] [CrossRef]
  16. Oksuztepe, E.; Yildirim, M. PEM fuel cell and supercapacitor hybrid power system for four in-wheel switched reluctance motors drive EV using geographic information system. Int. J. Hydrogen Energy 2024, 75, 74–87. [Google Scholar] [CrossRef]
  17. Gao, H.; Yin, B.; Pei, Y.; Gu, H.; Xu, S.; Dong, F. An energy management strategy for fuel cell hybrid electric vehicle based on a real-time model predictive control and Pontryagin's maximum principle. Int. J. Green Energy 2024, 21, 2640–2652. [Google Scholar] [CrossRef]
  18. Liu, W.; Yao, P.; Wu, Y.; Duan, L.; Li, H.; Peng, J. Imitation reinforcement learning energy management for electric vehicles with hybrid energy storage system. Appl. Energy 2025, 378, 124832. [Google Scholar] [CrossRef]
  19. Han, R.; He, H.; Wang, Y.; Wang, Y. Reinforcement Learning Based Energy Management Strategy for Fuel Cell Hybrid Electric Vehicles. Chin. J. Mech. Eng. 2025, 38, 66. [Google Scholar] [CrossRef]
  20. Guo, D.; Lei, G.; Zhao, H.; Yang, F.; Zhang, Q. The A3C Algorithm with Eligibility Traces of Energy Management for Plug-In Hybrid Electric Vehicles. IEEE Access 2025, 13, 92507–92518. [Google Scholar] [CrossRef]
  21. Liu, H.; You, C.; Han, L.; Yang, N.; Liu, B. Off-road hybrid electric vehicle energy management strategy using multi-agent soft actor–critic with collaborative-independent algorithm. Energy 2025, 328, 136463. [Google Scholar] [CrossRef]
  22. Wang, J.; Du, C.; Yan, F.; Duan, X.; Hua, M.; Xu, H.; Zhou, Q. Energy Management of a Plug-In Hybrid Electric Vehicle Using Bayesian Optimization and Soft Actor–Critic Algorithm. IEEE Trans. Transp. Electrif. 2025, 11, 912–921. [Google Scholar] [CrossRef]
  23. Sun, Z.; Guo, R.; Luo, M. Integrated energy-thermal management strategy for range extended electric vehicles based on soft actor–critic under low environment temperature. Energy 2025, 330, 136868. [Google Scholar] [CrossRef]
  24. Wang, C.; Zhang, J.; Wang, A.; Wang, Z.; Yang, N.; Zhao, Z.; Lai, C.S.; Lai, L.L. Prioritized sum-tree experience replay TD3 DRL-based online energy management of a residential microgrid. Appl. Energy 2024, 368, 123471. [Google Scholar] [CrossRef]
  25. Jia, C.; He, H.; Zhou, J.; Li, J.; Wei, Z.; Li, K. Learning-based model predictive energy management for fuel cell hybrid electric bus with health-aware control. Appl. Energy 2024, 355, 122228. [Google Scholar] [CrossRef]
  26. Cavus, M.; Dissanayake, D.; Bell, M. Next Generation of Electric Vehicles: AI-Driven Approaches for Predictive Maintenance and Battery Management. Energies 2025, 18, 1041. [Google Scholar] [CrossRef]
  27. Omakor, J.; Alzayed, M.; Chaoui, H. Particle Swarm-Optimized Fuzzy Logic Energy Management of Hybrid Energy Storage in Electric Vehicles. Energies 2024, 17, 2163. [Google Scholar] [CrossRef]
  28. Treesatayapun, C. Prescribed performance of discrete-time controller based on the dynamic equivalent data model. Appl. Math. Model. 2020, 78, 366–382. [Google Scholar] [CrossRef]
Figure 1. Power flow block diagram.
Figure 2. MiFRENm architecture: Model network.
Figure 3. MiFRENs architecture: Co-state network.
Figure 4. Control system block diagram.
Figure 5. Membership functions.
Figure 6. Velocity profile.
Figure 7. Demanding power: P d ( k ) .
Figure 8. Proposed controller: P f c ( k ) .
Figure 9. Proposed controller: u ( k ) or Δ P f c ( k ) .
Figure 10. Proposed controller: E b ( k ) .
Figure 11. Proposed controller: S O C ( k ) .
Figure 12. Proposed controller: η e ( k ) and η λ ( k ) .
Figure 13. Controller A: P f c ( k ) .
Figure 14. Controller A: E b ( k ) .
Figure 15. Controller A: S O C ( k ) .
Figure 16. Controller B: P f c ( k ) .
Figure 17. Controller B: S O C ( k ) .
Table 1. System parameters.

Parameter | Description | Value | Unit
C d | Aerodynamic drag coefficient | 0.3 | –
A f | Frontal area | 2.2508 | [m^2]
ρ a | Air density | 1.293 | [kg/m^3]
m 0 | Curb weight | 2024 | [kg]
R i | Rotational inertia coefficient | 1 | –
R r | Rolling resistance coefficient | 0.013 | –
g | Gravity acceleration | 9.81 | [m/s^2]
η e m | Motor efficiency | 0.9 | –
η m | Mechanical drive efficiency | 0.9 | –
η D C / A C | Inverter efficiency | 0.95 | –
η D C / D C | Converter efficiency | 0.95 | –
η b | Coulombic efficiency | 0.98 | –
E 0 | Battery capacity | 50 | [kWh]
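The parameters in Table 1 are those of a standard longitudinal road-load model, so the traction power demand can be sketched as below. The slope term is exactly the kind of quantity the paper treats as an unknown disturbance; the paper's actual demand formulation may include further terms (e.g., rotational inertia and auxiliary loads), so this is a hedged sketch, not the authors' exact model.

```python
import math

# Road-load power demand built from the Table 1 parameters
# (standard longitudinal model; an illustrative sketch only).
CD, AF, RHO = 0.3, 2.2508, 1.293   # drag coeff., frontal area [m^2], air density [kg/m^3]
M0, RR, G = 2024.0, 0.013, 9.81    # curb weight [kg], rolling coeff., gravity [m/s^2]
ETA_EM, ETA_M = 0.9, 0.9           # motor and mechanical-drive efficiencies

def traction_power_kw(v, accel=0.0, slope_rad=0.0):
    """Electrical power demand [kW] at vehicle speed v [m/s]."""
    f_aero = 0.5 * RHO * CD * AF * v**2
    f_roll = M0 * G * RR * math.cos(slope_rad)
    f_grade = M0 * G * math.sin(slope_rad)  # road slope: unknown disturbance
    f_inertia = M0 * accel
    p_mech = (f_aero + f_roll + f_grade + f_inertia) * v
    return p_mech / (ETA_EM * ETA_M) / 1000.0
```

At a steady 20 m/s on flat road, this sketch yields a demand on the order of 10 kW, comfortably inside the 100 kW driving limit of Table 2.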
Table 2. Constraint parameters.

Limit | Value | Unit
P f c m i n | 0.25 | [kW]
P f c m a x | 80 | [kW]
Δ P f c M | 9 | [kW]
P b M | 50 | [kW]
S O C m i n | 0.2 | per unit
S O C m a x | 0.9 | per unit
P d G e n | 80 | [kW]
P d D r v | 100 | [kW]
P a u x m i n | 0.5 | [kW]
P a u x m a x | 20 | [kW]
I b M | 50 | [A]
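The fuel cell limits in Table 2 can be enforced on a raw control command by a simple projection: a rate limit on Δ P f c followed by a level saturation. This is a generic clipping sketch, not necessarily the constraint-handling mechanism inside the paper's optimization.

```python
# Constraint projection for the fuel cell power command using the
# Table 2 limits; a plain saturation sketch for illustration.
P_FC_MIN, P_FC_MAX = 0.25, 80.0  # fuel cell power bounds [kW]
DP_FC_MAX = 9.0                  # max |delta P_fc| per step [kW]

def project_pfc(p_fc_prev, u_raw):
    """Clip the requested power change, then the absolute power level."""
    du = max(-DP_FC_MAX, min(DP_FC_MAX, u_raw))          # rate limit
    return max(P_FC_MIN, min(P_FC_MAX, p_fc_prev + du))  # level limit
```

For example, a requested step of +15 kW from 20 kW is first rate-limited to +9 kW, giving 29 kW; near either bound, the level limit takes over.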
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Treesatayapun, C.; Munoz-Vazquez, A.D.; Korkua, S.K.; Srikarun, B.; Pochaiya, C. Electric Vehicle Energy Management Under Unknown Disturbances from Undefined Power Demand: Online Co-State Estimation via Reinforcement Learning. Energies 2025, 18, 4062. https://doi.org/10.3390/en18154062
