Article

Data-Driven Control of a DC-DC Pseudo-Partial Power Converter Using Deep Reinforcement Learning for EV Fast Charging

by Daniel Pesantez 1, Oswaldo Menéndez-Granizo 2,*, Moslem Dehghani 1,* and José Rodríguez 1

1 Centro de Transición Energética (CTE), Facultad de Ingeniería, Universidad San Sebastián, Bellavista 7, Santiago 8420524, Chile
2 Departamento de Ingeniería de Sistemas y Computación, Universidad Católica del Norte, Antofagasta 1249004, Chile
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(7), 1356; https://doi.org/10.3390/electronics15071356
Submission received: 12 February 2026 / Revised: 17 March 2026 / Accepted: 22 March 2026 / Published: 25 March 2026
(This article belongs to the Section Power Electronics)

Abstract

In recent years, DC-DC partial power converters (PPCs) have become increasingly important in fast-charging architectures for electric vehicles (EVs). Their key feature is that only a fraction of the energy delivered to the battery is processed by the PPC, while the rest is transferred directly, bypassing the conversion stage. This reduces DC-DC conversion losses and improves overall charging efficiency. However, the nonlinear dynamics of these converters can limit performance, especially with model-based controllers such as proportional–integral (PI) controllers. This paper proposes a data-driven control framework for EV fast-charging stations using a DC-DC PPC that is controlled by deep reinforcement learning (DRL). A value-based deep Q-network (DQN) directly selects switching actions and jointly regulates the partial voltage and output current. The control problem is formulated as a discrete-time Markov decision process, and a two-stage transfer learning scheme ensures safe and efficient deployment: the DQN agent is first trained in a high-fidelity simulation and then fine-tuned with a small set of experimental data to capture parasitic effects and modeling errors. The controller is integrated into a constant-current–constant-voltage (CC-CV) charging algorithm and validated over a full charging cycle of a 60 kWh EV battery. The proposed control scheme exhibits a settling time of approximately 2 ms in response to current reference variations while maintaining steady-state errors below 2% in current regulation and below 1% in partial voltage regulation. Simulation results show that the proposed DRL controller has a small steady-state tracking error and improved robustness to reference changes compared with conventional PI and sliding mode controllers. The low computational cost of the trained DQN policy also enables real-time execution on embedded platforms for EV charging.

1. Introduction

One of the main global challenges is mitigating greenhouse gas emissions, with particular emphasis on carbon dioxide (CO2). The transportation sector contributes approximately 16% of total CO2 emissions, primarily as a result of the combustion of fossil fuels in internal combustion engines (ICEs) [1,2]. In recent years, the electric vehicle (EV) sector has emerged as a highly promising field, aiming to replace ICEs with fully electric powertrains [3,4]. However, one of the significant challenges in this industry's evolution is battery capacity and operating range [5]. Advances in battery technology over the past ten years have led to increased battery capacities and decreased costs [6]. Alongside these advances, the need for more efficient chargers has become essential, prompting extensive research into AC-DC and DC-DC converter topologies [6,7,8,9,10]. The power section of charging stations typically consists of two main stages: an AC-DC conversion stage, followed by a DC-DC stage responsible for current and voltage control at the station's output terminals. There are several ways to configure these two stages, allowing a variety of charging system architectures. These architectures are classified primarily into two categories: off-board and on-board configurations. In off-board architectures, the charger's power conversion stages are located external to the vehicle. These systems are typically able to deliver higher charging power, primarily because they are not constrained by the spatial and mass limitations imposed by vehicle integration. In contrast, on-board chargers integrate both AC-DC and DC-DC power-conversion stages into the vehicle. Due to constraints on volume, mass, thermal management, and cost, these systems typically operate at lower power ratings, resulting in longer charging times compared with off-board systems. However, their main advantage lies in their high degree of deployment flexibility, as they can be connected at virtually any location equipped with an appropriate electrical supply [11].
In addition, several algorithms have been proposed and implemented to optimally manage and control EV battery charging. These algorithms are primarily designed to reduce overall charging time while simultaneously prolonging the battery life cycle. To achieve this, some of these algorithms take into account factors such as temperature, state of health, current injection rate, and various physical and chemical phenomena that impact the battery over time and during repeated charge–discharge cycles [12,13]. For example, the constant-current–constant-voltage (CC-CV) charging algorithm comprises two distinct operating phases. In the initial constant-current phase, the battery is charged at a fixed current, the maximum magnitude of which is primarily determined by the battery manufacturer’s specifications and the charger’s rated capacity. Subsequently, in the constant-voltage phase, overcharging is prevented by regulating the current delivered to the battery while maintaining a fixed terminal voltage, thus allowing the battery to reach a full 100% state of charge (SoC) in a controlled and regulated manner [14]. Another battery charging algorithm uses a pulsed charging profile, in which the battery is subjected to periodic current pulses characterized by variable amplitude and frequency [15]. A third charging strategy reported in the literature employs multiple stages of constant current. This approach seeks to minimize the overall charging duration by determining, at each stage, the optimal current magnitude as a function of the battery SoC, thereby mitigating the excessive temperature rise during the charging process [16,17].
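To make the CC-CV logic described above concrete, the following minimal Python sketch selects the charging-current reference from the SoC. The current limit, voltage setpoint, gain, and 80% threshold are illustrative assumptions, not values taken from the paper, and the CV stage is shown with a proportional-only loop for brevity.

```python
def cc_cv_reference(soc, v_batt, i_nom=100.0, v_max=420.0, kp=0.5, soc_cv=0.80):
    """Return the charging-current reference for a CC-CV profile.

    i_nom, v_max, kp, and soc_cv are hypothetical placeholders.
    """
    if soc < soc_cv:
        # Constant-current stage: fixed reference imposed by the BMS.
        return i_nom
    # Constant-voltage stage: an outer voltage loop shapes the current
    # reference so the terminal voltage settles at v_max.
    i_ref = kp * (v_max - v_batt)
    return min(max(i_ref, 0.0), i_nom)  # clamp to safe limits
```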
To regulate physical quantities in power electronic converters—such as current, voltage, and power—the conventional approach is to employ linear control strategies. These typically comprise a hierarchy of cascaded control loops followed by a modulation stage [18,19]. Achieving high performance with linear controllers generally requires an accurate mathematical model of the plant. However, in practice, many power electronic systems are simplified and modeled as first-order dynamical systems to facilitate controller design, thereby restricting the converter’s dynamic behavior to a relatively narrow operating region. As a result, advanced techniques based on artificial intelligence have become increasingly important for addressing the control challenges associated with inherently nonlinear and time-varying power conversion systems.
However, alternative control methodologies can also be identified. For example, in [20], a nonlinear input–output feedback linearization control scheme is presented for a boost converter operating under mixed-load conditions. In parallel, data-driven control systems constitute an advanced class of control strategies in which the dynamics of the power converter are inferred from simulated and experimental data [21,22]. In recent years, deep reinforcement learning (DRL) agents have attracted increasing interest due to their high adaptability and robustness against disturbances and parameter uncertainties, thereby enabling the design of model-free adaptive control systems. Various DRL agents have been proposed as control strategies for different power converter topologies [23,24,25,26]. In particular, the continuous nature of the control actions in DC-DC power converters facilitates the deployment of actor–critic DRL agents, such as the proximal policy gradient (PPG) [25], the deep deterministic policy gradient (DDPG) [27], and the twin-delayed deep deterministic policy gradient (TD3) [28]. A control scheme for a three-phase voltage source inverter (VSI) based on a DDPG agent was initially introduced in [29], in which a control strategy for a permanent magnet synchronous machine (PMSM) was formulated using the dq components of the phase currents. Furthermore, the principal results reported in [29] were exploited to design a robust control scheme by integrating a DDPG agent with an $H_{\infty}$ controller. Pursuing a similar objective, a two-stage decoupled DDPG architecture was proposed in [30] to regulate both the speed and current of a PMSM. More recently, actor–critic algorithms have been used to enhance other control paradigms, including finite control set model predictive control (FCS-MPC) [31]. Comparative studies between DDPG-based controllers and conventional proportional–integral (PI) controllers indicate that the DDPG approach yields superior steady-state and transient performance [32]. In addition, recent developments have demonstrated the effectiveness of a TD3 agent in achieving fast frequency regulation of microgrids interfaced with power inverters [33,34]. In [35], a value-based DRL algorithm, the deep Q-network (DQN), is employed in an AC-link DC-DC PPC; in that work, an intelligent control system based on the TD3 algorithm is introduced to control the EV battery charging process, replacing the inner current control loop of a traditional cascaded PI controller with an actor–critic DRL agent while retaining PI control in the outer loop. Furthermore, power quality, assessed through harmonic analysis, is undoubtedly an important aspect of EV battery charging applications, as shown in [36].
This work presents a data-driven control framework using deep reinforcement learning agents for smart fast EV charging based on a DC-DC partial power converter (PPC). The main contribution is the development of a value-based DQN controller that directly selects the converter switching actions to jointly regulate the partial voltage $V_{pc}$ and the output current $i_{out}$, thereby eliminating the need for a conventional cascaded PI inner-loop structure and reducing the dependence on an accurate converter model. To guarantee safe learning and practical deployability, a two-stage transfer learning strategy is introduced: the DQN agent is first trained in a high-fidelity simulation environment under diverse operating conditions, and subsequently fine-tuned using limited experimental data to compensate for parasitics, unmodeled nonlinearities, and parameter variations. Empirical findings show that the proposed control achieves fast dynamic tracking, improved robustness under load and reference changes, and low computational complexity, making it suitable for real-time embedded implementation in EV charging applications. However, it should be emphasized that, rather than being conceived as potential substitutes for classical control strategies, DRL-based control schemes are more appropriately interpreted as complementary approaches. In particular, they can address nonlinear dynamics and operating-condition variations in scenarios where linear control strategies would otherwise require retuning or reconfiguration.
The remainder of this work is organized as follows. Section 2 describes the fundamental principles underlying the PPC framework employed in this study. Section 3 presents the mathematical formulation of the DRL-based control strategy. Section 4 reports the main simulation results within the context of an EV battery charging application. Finally, Section 5 summarizes the conclusions and outlines directions for future research.

2. Partial Power Converter

EV battery chargers generally consist of two main stages: an AC-DC conversion stage, which allows the charger to be connected to the electrical grid, followed by a DC-DC conversion stage, which mainly regulates the current that is injected into the battery. As shown in Figure 1, the DC-DC stage of each charging architecture may employ converter topologies with or without galvanic isolation. For topologies that incorporate galvanic isolation in the DC-DC converter, direct connection to the AC grid is feasible without the use of a low-frequency transformer. In contrast, when the DC-DC converter lacks galvanic isolation, a low-frequency transformer must be included to enable safe connection to the AC grid. Regarding the aforementioned DC-DC conversion stage, numerous converters have been designed featuring different topologies [6,37,38,39,40,41,42], among which PPCs have emerged as a viable option for the DC-DC conversion stage in charging systems [18,19,40,43,44,45,46,47].
In PPC topologies, only a fraction of the total power delivered to the load is processed by the converter, while the remainder is transferred directly. This contrasts with full-power converter topologies, in which the entire power delivered to the load is processed by the converter. The underlying operating principle is therefore referred to as partial power processing. The PPC feature increases overall conversion efficiency and reduces the electrical and thermal stresses imposed on the converter’s semiconductor devices. Due to these advantages, DC-DC PPCs have emerged as a relevant alternative for EV battery charging systems. The literature reports a wide range of such topologies, with and without galvanic isolation, which differ mainly in the manner in which the converter is integrated into the overall system architecture, including input-parallel output-series (IPOS) and input-series output-parallel (ISOP) configurations [48].
The IPOS step-up configuration, referred to by some authors as a Type I PPC, and the ISOP step-down configuration, referred to as a Type II PPC, are widely employed in EV charging applications. These configurations are depicted in Figure 2. In this figure, $P_{pc}$ represents the power processed by the converter, whereas $P_{dir}$ denotes the power delivered directly to the battery. The voltage at the converter terminal is indicated by $V_{pc}$; the voltages at the input and output of the DC-DC conversion stage are denoted by $V_{in}$ and $V_o$, respectively. $P_o$ denotes the total power supplied to the battery, which can be written as $P_o = P_{pc} + P_{dir}$.
To quantify the degree of partiality in PPCs, a partial power ratio $K_{pr}$ is introduced. This dimensionless parameter represents the fraction of the total power processed by the PPC relative to the input power $P_{in}$ of the entire DC-DC conversion stage, and is defined by Equation (1) as follows:
$K_{pr} = \dfrac{P_{pc}}{P_{in}}.$
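For a quick sense of scale, a worked example with assumed values (not taken from the paper): a DC-DC stage drawing $P_{in} = 50$ kW while the converter processes $P_{pc} = 18$ kW gives

$K_{pr} = \dfrac{P_{pc}}{P_{in}} = \dfrac{18\ \text{kW}}{50\ \text{kW}} = 0.36,$

i.e., only 36% of the power is processed by the converter, while the remaining 64% is transferred directly to the battery.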
The literature reports a wide range of PPC topologies used in EV battery charger applications [18,43,44,45]. In this paper, the topology presented in [49] is used to evaluate the proposed controller.

2.1. Topology Description

The topology used corresponds to a pseudo-PPC type-II step-down configuration comprising an H-bridge switch, a capacitor cell, and a bypass diode. This configuration is designated as a pseudo-PPC topology because, in one of its switching states, the converter no longer exhibits the characteristics of a PPC and instead operates as a full-power converter; therefore, the term "pseudo" indicates that partial power processing is not continuously maintained over the complete switching cycle. This configuration emulates the operating behavior of a conventional buck converter; however, the traditional power semiconductor switch is replaced by the previously described switched-capacitor cell, as illustrated in Figure 3. This modified arrangement enables the series connection of the capacitor $C_{pc}$ with the DC bus, while the battery is interfaced at the output of the converter. The voltage across the $C_{pc}$ capacitor defines the degree of partiality of the converter. A reduction in this voltage increases the partiality of the converter, thereby decreasing the voltage stress experienced by the switches and, consequently, improving the overall efficiency.

2.2. Converter Model

To model the considered converter, three switching modes are defined [49]. These switching states enable the formulation of the converter state-space equations that characterize its behavior over the entire operating range.
In switching state I, represented by the equivalent circuit in Figure 4, the partial voltage $V_{pc}$ is connected in series with the input DC bus, represented by the capacitor voltage $V_{dc}$, the inductor $L$, and the output voltage of the converter $V_{out}$. In switching state I, the state variables of the converter are governed by the following state-space equations:
$L \dfrac{di_L}{dt} = V_{dc} + V_{pc} - V_{out}, \quad C_{pc} \dfrac{dV_{pc}}{dt} = -i_L, \quad C_{out} \dfrac{dV_{out}}{dt} = i_L - i_{out}$
Figure 5 shows the equivalent circuit corresponding to switching state II. In this configuration, the polarity of the capacitor $C_{pc}$ is inverted with respect to switching state I, effectively subtracting the voltage $V_{pc}$ from the input voltage $V_{dc}$. The corresponding state-space equations in switching state II are given by Equation (3):
$L \dfrac{di_L}{dt} = V_{dc} - V_{pc} - V_{out}, \quad C_{pc} \dfrac{dV_{pc}}{dt} = i_L, \quad C_{out} \dfrac{dV_{out}}{dt} = i_L - i_{out}$
Finally, during switching state III, illustrated in Figure 6, all semiconductor devices are in the non-conducting (off) state, such that the load current freewheels through the output inductor and diode $D$. This operating condition corresponds to the full-power mode of the converter. The state-space equations in this switching state are given by Equation (4):
$L \dfrac{di_L}{dt} = -V_{out}, \quad C_{pc} \dfrac{dV_{pc}}{dt} = 0, \quad C_{out} \dfrac{dV_{out}}{dt} = i_L - i_{out}$
From Equations (2)–(4), and assuming steady-state operation of the converter, i.e., $\dfrac{di_L}{dt} = 0$ on average over a switching period, the following expression is obtained:
$V_{out} = V_{in}(1-\delta) + V_{pc}(1-2\alpha-\delta),$
where $\alpha$ denotes the fraction of the switching period during which the partial voltage $V_{pc}$ is inserted in series with the input voltage (states I and II), while $\delta$ designates the fraction in which the converter operates in full-power mode.
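To make the switching-state model concrete, the following Python sketch integrates Equations (2)–(4) with forward Euler. The component values are illustrative assumptions, not the paper's design parameters, and the signs of the $C_{pc}$ equations follow the reconstruction above.

```python
import numpy as np

L, C_pc, C_out = 250e-6, 470e-6, 1000e-6   # H, F, F (assumed values)
V_dc, i_out = 540.0, 100.0                 # input bus voltage, load current

def derivatives(x, state):
    """x = [i_L, V_pc, V_out]; 'state' in {1, 2, 3} selects Eq. (2), (3) or (4)."""
    i_L, V_pc, V_out = x
    if state == 1:      # Eq. (2): V_pc adds to the bus, C_pc discharges
        di_L  = (V_dc + V_pc - V_out) / L
        dV_pc = -i_L / C_pc
    elif state == 2:    # Eq. (3): inverted cell polarity, C_pc charges
        di_L  = (V_dc - V_pc - V_out) / L
        dV_pc = i_L / C_pc
    else:               # Eq. (4): freewheeling through diode D, full-power mode
        di_L  = -V_out / L
        dV_pc = 0.0
    dV_out = (i_L - i_out) / C_out
    return np.array([di_L, dV_pc, dV_out])

def step(x, state, dt=1e-7):
    """One forward-Euler integration step of the selected switching state."""
    return x + dt * derivatives(x, state)
```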

2.3. Classic PI Control

The conventional control strategy for this converter employs a cascaded PI control architecture comprising two primary feedback loops. The outer loop regulates the converter output voltage, typically set to the battery nominal voltage. The inner control loop consists of two PI controllers. The first serves as a partial-voltage regulator; its reference is arbitrarily set to one-half of the input voltage, thereby defining the desired partial-voltage level within the converter. The second, innermost controller is a current PI regulator whose input is the error between the actual output current delivered by the converter to the battery and its reference value. The outputs of these two inner-loop PI controllers are subsequently applied to a pulse-width modulation (PWM) block, which generates the semiconductor gating signals $S_x$, as illustrated in Figure 7. $V_{pc}^{*}$, $V_o^{*}$, and $i_L^{*}$ are the reference signals of the partial voltage, output voltage, and output current, respectively. The reference signal of the inner current PI controller is determined according to the SoC of the battery: when the SoC reaches approximately 80%, the reference value is provided by the output-voltage PI controller; below this threshold, the reference remains a constant value determined by the intrinsic characteristics of the battery. The dashed line in Figure 7 indicates the control mode, which is selected based on the SoC.
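For reference, a minimal discrete-time sketch of this cascaded PI structure is given below. The gains, sample time, and the mapping of the two inner-loop outputs to the modulation commands $\alpha$ and $\delta$ of Equation (5) are simplifying assumptions of this sketch, not the paper's tuning.

```python
class PI:
    """Discrete PI controller (backward-Euler integral)."""
    def __init__(self, kp, ki, ts):
        self.kp, self.ki, self.ts, self.acc = kp, ki, ts, 0.0
    def update(self, error):
        self.acc += self.ki * self.ts * error
        return self.kp * error + self.acc

def clamp(u, lo=0.0, hi=1.0):
    """Saturate a duty-cycle command to its feasible range."""
    return min(max(u, lo), hi)

ts = 1e-5  # control sample time (assumed)
pi_v_out = PI(kp=0.2, ki=50.0, ts=ts)   # outer output-voltage loop (CV stage)
pi_v_pc  = PI(kp=0.1, ki=20.0, ts=ts)   # inner partial-voltage loop
pi_i     = PI(kp=0.05, ki=10.0, ts=ts)  # innermost current loop

def control_step(soc, v_out_ref, v_out, v_pc, i_L, i_cc_ref, v_dc):
    # Mode selection (dashed line in Figure 7): below ~80% SoC the current
    # reference is the fixed BMS value; above it, the outer voltage PI
    # controller supplies the reference.
    i_ref = i_cc_ref if soc < 0.80 else pi_v_out.update(v_out_ref - v_out)
    # Inner loop: partial-voltage and current PI controllers feed the PWM
    # stage, sketched here as the duties alpha and delta of Equation (5).
    alpha = clamp(pi_v_pc.update(v_dc / 2.0 - v_pc))
    delta = clamp(pi_i.update(i_ref - i_L))
    return alpha, delta
```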

3. DRL-Based Control Strategy

The proposed data-driven control system determines the optimal sequence of switching actions to regulate $V_{pc}$ and $i_{out}$ in the PPC topology. To achieve this aim, a two-stage transfer learning framework based on a DQN agent is developed. The main goal is to design an adaptive control strategy capable of handling nonlinear dynamics, parametric uncertainty, and model inaccuracies without overfitting to specific operating conditions. From the perspective of optimal control theory, RL can be interpreted as an approximate method for solving an optimal control problem, in which the optimal policy is derived through the iterative estimation of the action-value function. The control problem is formulated as a discrete-time Markov Decision Process (MDP), where the DQN agent learns the optimal switching policy that minimizes tracking errors while maintaining safe operating conditions.

3.1. Two-Stage Transfer Learning Algorithm

The proposed control methodology consists of two sequential stages: (i) Simulation-Based Training and (ii) Experimental Fine-Tuning. The DQN agent is first trained in a high-fidelity simulation environment of the PPC converter. This stage enables controlled exploration over a wide range of operating conditions, including load variations and reference changes, without risking hardware damage. The control objective is formulated as a path-planning problem in state space, where the agent minimizes the weighted tracking error of V p c and i o u t at each sampling instant. The discrete nature of the PPC switching states enables the use of an action-value-based reinforcement learning algorithm. Therefore, a DQN agent with four discrete outputs is adopted to select among the feasible switching actions.
After simulation convergence, the learned Q-network parameters are transferred to the real PPC system. A limited fine-tuning phase is performed using experimental data to compensate for model mismatch, parasitic effects, and unmodeled nonlinearities. This transfer learning approach significantly reduces training time and ensures safe operation in real-world settings. Figure 8 presents the overall control scheme. The dashed line in Figure 8 indicates the control mode, which is selected based on the SoC.

3.2. Deep Q-Network Architecture

Although the pseudo-PPC presents a discrete action space, its state space is continuous and high-dimensional, as it encompasses electrical state variables together with integrals of the tracking errors. Under these conditions, conventional tabular RL agents become impractical due to the curse of dimensionality and their limited generalization capacity. Therefore, a DQN is used to approximate the action-value function, enabling the controller to learn nonlinear state–action relationships and generalize across varying operating conditions of the pseudo-PPC.
The deep Q-network architecture follows the standard DRL framework, where the agent interacts with the PPC (environment). At each control interval, the measurable electrical variables of the converter are used to construct the state vector $s_t$, which represents the instantaneous working condition of the system. The state vector is provided as input to the Q-network, which estimates the action-value function $Q(s, a)$ for the feasible switching actions of the converter. Based on these estimates, the control action is selected using an $\varepsilon$-greedy policy that balances exploration and exploitation. The selected switching state is then applied to the PPC, producing a new system state and an associated reward determined by the reward function defined in Section 3.3.1.
Each interaction between the agent and the pseudo-PPC generates a transition tuple $(s_t, a_t, r_t, s_{t+1})$ that is stored in an experience replay buffer. During training, random mini-batches of stored transitions are sampled to update the parameters of the Q-network. This experience replay mechanism reduces correlations between consecutive samples and improves the stability and efficiency of the learning process by approximating independently and identically distributed (i.i.d.) training data.
In addition, a target network is used to compute the temporal-difference targets. The parameters of the target network are periodically updated with the weights of the main Q-network. This architecture enables a stable approximation of the action-value function while allowing the agent to learn a control policy that maps PPC states to optimal switching actions. Figure 9 presents the overall architecture of the proposed DQN-based control framework, including the interaction between the PPC environment, the Q-network, the target network, the action-selection policy, and the experience replay mechanism used during training.
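The replay-buffer and target-network mechanics described above can be sketched in a few lines of Python. The capacity and batch size follow the values quoted in Section 3.5; everything else is an illustrative assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay buffer of (s, a, r, s') transitions."""
    def __init__(self, capacity=200_000):
        self.buf = deque(maxlen=capacity)
    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))
    def sample(self, batch_size=512):
        # Uniform random sampling breaks the correlation between
        # consecutive converter samples (approximately i.i.d. batches).
        return random.sample(self.buf, batch_size)
    def __len__(self):
        return len(self.buf)

def sync_target(q_net, target_net):
    """Periodic hard update of the target-network weights (PyTorch modules)."""
    target_net.load_state_dict(q_net.state_dict())
```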

3.3. Training of the DQN Agent

In the proposed DRL framework, the state $s_t$ represents the measurable electrical variables of the PPC at time step $t$. These variables describe the instantaneous operating condition of the converter and are used by the agent to determine the appropriate control action. The action $a_t$ corresponds to the switching command applied to the PPC during the next control interval.
The DQN approximates the action-value function $Q(s, a)$, which represents the expected cumulative return obtained by applying action $a$ in state $s$ and subsequently following the learned control policy. This differs from the state-value function $V(s)$, which depends only on the state and represents the expected return when following a policy from state $s$. The action-value function evaluates the expected return of each possible action in a given state, allowing the agent to choose the optimal action, for example, when identifying the optimal switching state that regulates the PPC during the EV fast-charging process.

3.3.1. Reward Function

The reward function is defined to penalize voltage and current tracking errors:
$r_t = -\theta_1 \left| V_{pc}^{*} - V_{pc} \right| - \theta_2 \left| i_{out}^{*} - i_{out} \right|.$
The selected reward function enables a direct evaluation of the primary objective of the proposed control strategy: accurate regulation of both the partial voltage $V_{pc}$ and the output current of the pseudo-PPC. This aim is pursued by penalizing tracking errors, thereby directing the agent to choose switching actions that minimize deviations of these signals from their respective reference values. The negative formulation of the reward can be interpreted within an optimization framework, wherein the reward corresponds to the negative of an error-based cost function. Under this interpretation, the reward attains its maximum value when the errors converge to zero, and maximizing the reward is equivalent to minimizing a positive-definite cost function, thereby facilitating stable and accurate regulation of the pseudo-PPC. The weights $\theta_1$ and $\theta_2$ determine the relative importance of voltage and current regulation within the learning objective and were selected through empirical tuning to achieve balanced control performance. In addition, all electrical variables are expressed in per-unit values to improve numerical conditioning and maintain consistent reward magnitudes during training. The proposed reward formulation therefore promotes stable converter operation while guiding the agent toward control actions that improve regulation accuracy.
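A direct transcription of the reward of Equation (6); the unit weights shown are placeholders, not the tuned values:

```python
def reward(v_pc_ref, v_pc, i_out_ref, i_out, theta1=1.0, theta2=1.0):
    """Negative weighted L1 tracking error, Equation (6).

    All signals are assumed to be in per-unit; theta1/theta2 are
    hypothetical placeholder weights.
    """
    return -theta1 * abs(v_pc_ref - v_pc) - theta2 * abs(i_out_ref - i_out)
```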

3.3.2. State and Action Spaces

The selected state representation is designed to capture both the instantaneous operating condition of the converter and the dynamic evolution of the regulation errors. In particular, the inclusion of voltage and current tracking errors together with their integral terms provides the agent with information about both transient behavior and steady-state regulation. This allows the reinforcement learning agent to learn control policies that reduce tracking errors while maintaining stable converter operation under varying operating conditions. Similar considerations in the design of state representations for reinforcement learning control problems are discussed in advanced RL test-bed problems, such as the minimum-time balance problem, where the state must describe both the current system condition and its dynamic evolution to enable effective policy learning [50,51]. The state vector is defined as:
$s_t = \left[\, e_{V,t},\ e_{I,t},\ \textstyle\int e_{V}\,dt,\ \textstyle\int e_{I}\,dt,\ V_{pc},\ i_{out} \,\right]^{T},$
where $e_{V,t}$ and $e_{I,t}$ denote the instantaneous voltage and current tracking errors, respectively. It is noteworthy that augmenting the state vector with the error signals and their integrals enables the agent to learn a control policy that minimizes the steady-state error while preserving its capacity to accommodate nonlinear system dynamics and variations in the operating point.
The action space consists of four discrete switching commands corresponding to valid PPC switching states:
$\mathcal{A} = \{ S_a, S_b, S_c, S_d \}.$
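A minimal sketch of the state and action definitions of Equations (7) and (8). The ordering of the state entries follows Equation (7); the action labels are symbolic placeholders for the converter gating patterns, whose exact mapping is topology-specific.

```python
import numpy as np

def build_state(e_v, e_i, int_e_v, int_e_i, v_pc, i_out):
    """Assemble the six-dimensional state vector of Equation (7):
    instantaneous tracking errors, their integrals, and measurements."""
    return np.array([e_v, e_i, int_e_v, int_e_i, v_pc, i_out],
                    dtype=np.float32)

# Four discrete switching commands (Equation (8)); the mapping of each
# index to gate signals S_a..S_d is a placeholder.
ACTIONS = ("S_a", "S_b", "S_c", "S_d")
```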

3.3.3. Q-Network Update

The update of the DQN is based on the Bellman optimality equation, which defines the relationship between the action-value function and the expected cumulative return. For each transition $(s_t, a_t, r_t, s_{t+1})$, a target value $y$ is computed as the sum of the immediate reward and the discounted estimate of the optimal future return (see Equation (9)). In the context of the pseudo-PPC, the reward reflects the tracking performance of the partial voltage and output current, while the future return captures the long-term effect of the selected switching action on the converter dynamics. The future return is estimated using a target network with parameters $\theta^{-}$, which is periodically updated to improve training stability:
$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}),$
where $\theta^{-}$ denotes the parameters of the target network, which are updated periodically to stabilize learning.
The Q-network is trained by minimizing the temporal-difference loss $L(\theta)$, defined as the mean squared error between the predicted action-value $Q(s, a; \theta)$ and the target value $y$. This loss function quantifies the discrepancy between the current estimate of the expected return and the target derived from the Bellman equation. By minimizing $L(\theta)$ through gradient-based optimization, the network iteratively improves its approximation of the optimal action-value function. The temporal-difference loss is computed as follows:
$L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \left( y - Q(s, a; \theta) \right)^{2} \right].$
From a control perspective, this learning process enables the agent to associate each converter state with the switching action that minimizes future tracking errors. From an optimization viewpoint, the DQN update can be interpreted as an approximate solution to a dynamic programming problem, in which the objective is to minimize a cumulative cost associated with partial voltage and current regulation. As the approximation of the action-value function improves, the derived control policy progressively approaches the optimal switching strategy, leading to enhanced regulation performance of the pseudo-PPC.
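The TD target of Equation (9) and the loss of Equation (10) translate almost line-for-line into a PyTorch sketch; the tensor shapes and batch layout are assumptions, not details from the paper.

```python
import torch

def dqn_loss(q_net, target_net, batch, gamma=0.995):
    """Temporal-difference loss of Equations (9)-(10) for one mini-batch."""
    s, a, r, s_next = batch  # shapes: [B,6], [B] (int64), [B], [B,6]
    # Equation (9): TD target from the frozen target network (no gradient).
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values
    # Q(s, a; theta) for the actions actually taken.
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Equation (10): mean squared TD error.
    return torch.nn.functional.mse_loss(q, y)
```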
The training procedure is summarized as follows (a minimal loop sketch follows the list):
1. Measure the converter variables and construct $s_t$.
2. Select action $a_t$ using an $\varepsilon$-greedy strategy.
3. Apply the switching command to the PPC converter.
4. Observe the reward $r_t$ and the next state $s_{t+1}$.
5. Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer.
6. Sample random mini-batches to update the network parameters.
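A minimal loop sketch tying steps 1–6 together, reusing the helpers sketched earlier (`ReplayBuffer`, `sync_target`, `dqn_loss`). The `env` wrapper, its `reset`/`step` interface, and the synchronization period are hypothetical, not an API from the paper.

```python
import random
import torch

def train_episode(env, q_net, target_net, buffer, optimizer,
                  epsilon=0.1, batch_size=512, sync_every=1000):
    s = env.reset()
    for k in range(env.max_steps):
        # Steps 1-2: measure the state, epsilon-greedy action selection.
        if random.random() < epsilon:
            a = random.randrange(4)                     # explore
        else:
            with torch.no_grad():                       # exploit
                a = int(q_net(torch.as_tensor(s).unsqueeze(0)).argmax())
        # Steps 3-5: apply the switching command, observe, store.
        s_next, r = env.step(a)
        buffer.push(s, a, r, s_next)
        s = s_next
        # Step 6: sample a mini-batch and update the Q-network.
        if len(buffer) >= batch_size:
            s_b, a_b, r_b, sn_b = map(list, zip(*buffer.sample(batch_size)))
            batch = (torch.as_tensor(s_b), torch.as_tensor(a_b),
                     torch.as_tensor(r_b, dtype=torch.float32),
                     torch.as_tensor(sn_b))
            loss = dqn_loss(q_net, target_net, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Periodic hard update of the target network.
        if k % sync_every == 0:
            sync_target(q_net, target_net)
```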

3.4. Neural Network Architecture

The Q-network is implemented as a feedforward neural network composed of:
  • Input layer: 6 neurons (state vector).
  • One hidden layer: 100 neurons with hyperbolic tangent activation.
  • Output layer: 4 neurons representing the switching states.
The hidden layer size (100 neurons) was selected empirically after evaluating multiple configurations to balance approximation capability and computational efficiency. Smaller networks exhibited slower convergence, while deeper architectures increased variance and computational burden without significant performance gains.
The shallow architecture facilitates real-time implementation in embedded PPC control platforms.
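The 6-100-4 network described above, written as a PyTorch sketch (the choice of framework is an assumption, not stated in the paper):

```python
import torch.nn as nn

# Feedforward Q-network: 6 state inputs, one 100-neuron tanh hidden
# layer, 4 outputs (Q-values of the four switching states).
q_net = nn.Sequential(
    nn.Linear(6, 100),
    nn.Tanh(),
    nn.Linear(100, 4),
)
target_net = nn.Sequential(nn.Linear(6, 100), nn.Tanh(), nn.Linear(100, 4))
target_net.load_state_dict(q_net.state_dict())  # initialize target = main
```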

3.5. Hyperparameter Configuration

Table 1 summarizes the selected training parameters. The discount factor is set to 0.995 to emphasize long-term performance. An experience replay buffer of 200,000 samples is used, with mini-batches of size 512 randomly selected for parameter updates. Each training episode lasts 2 s to ensure adequate system excitation and convergence.
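Collected as a configuration sketch, using only the values quoted in the text; entries of Table 1 not stated here are omitted rather than guessed.

```python
# Training hyperparameters quoted in Section 3.5.
HYPERPARAMS = {
    "discount_factor": 0.995,       # gamma, emphasizes long-term return
    "replay_buffer_size": 200_000,  # stored transitions
    "mini_batch_size": 512,         # samples per gradient update
    "episode_length_s": 2.0,        # simulated seconds per episode
}
```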

3.6. Computational Requirements

The training phase is performed on a workstation equipped with an Intel(R) Core(TM) i7-10700F processor, an NVIDIA GeForce RTX 4070 GPU, and 64 GB of RAM. GPU acceleration reduces training time significantly.
Once trained, the controller requires only a forward pass through the Q-network, $a_t = \arg\max_{a} Q(s_t, a)$, which involves a limited number of multiply–accumulate operations. Therefore, real-time deployment on embedded PPC hardware is feasible without GPU support.
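To illustrate how light the deployed policy is, a NumPy-only inference sketch of the greedy action selection follows; the weight layout is an assumption. A single pass through the 6-100-4 network costs roughly 6×100 + 100×4 = 1000 multiply–accumulate operations.

```python
import numpy as np

def act(weights, biases, s):
    """Greedy inference through the trained 6-100-4 Q-network.

    weights[0]: (100, 6) hidden-layer matrix; weights[1]: (4, 100)
    output-layer matrix; s: length-6 state vector.
    """
    h = np.tanh(weights[0] @ s + biases[0])   # hidden layer, tanh
    q = weights[1] @ h + biases[1]            # Q-values of the 4 actions
    return int(np.argmax(q))                  # a_t = argmax_a Q(s_t, a)
```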

4. Results

To evaluate the performance of the proposed controller in fast battery charging, a set of simulations was conducted in MATLAB R2024b. The details of the simulated system are presented in Table 2. It is important to note that the equivalent circuits shown in Figure 4, Figure 5 and Figure 6 are formulated under the assumption of ideal components, serving exclusively to illustrate the converter switching states. In contrast, the numerical simulation, whose results are discussed below, accounts for nonlinear effects, including semiconductor forward voltage drops, parasitic winding resistances in inductors, and the equivalent series resistance (ESR) of capacitors.
The first result corresponds to the simulation of a complete charging cycle of an EV battery with a nominal capacity of 60 kWh. The process begins with an initial SoC of 20% and continues until the SoC reaches 100%. Figure 10a shows the temporal evolution of the output current $i_{out}$ during the charging process, illustrating the operation of the CC-CV charging algorithm. In the initial phase, corresponding to the constant-current stage, the output current $i_{out}$ is regulated at its nominal value of 100 A. This current reference is imposed by the battery management system (BMS). Once the battery reaches 80% of its SoC, the BMS switches the system to the constant-voltage stage. In this subsequent phase, an external voltage controller sets the output current reference, which regulates the terminal voltage so that the battery reaches 100% SoC in a controlled manner, thus mitigating the risk of overcharging. The transition between the two operating modes is observed at approximately minute 55. At this point, the current reference ceases to be the nominal value of 100 A and decreases to approximately 80 A, then continues to decrease until it reaches 0 A as the battery completes the charging procedure. The second charging stage extends to approximately 90 min, which is a typical charging time for a battery of this capacity. During the transition between the two operating modes, the current reference undergoes an abrupt variation; the simulation results demonstrate that the proposed controller accurately tracks this change. Figure 10b shows the evolution of the battery terminal voltage during the same charging cycle. The battery initially exhibits a voltage of approximately 420 V, lower than the nominal value due to its initial SoC. During the constant-current phase, the terminal voltage increases progressively until the transition to the constant-voltage phase occurs. At this transition point, as the charging current decreases, the rate of increase in voltage also decreases. However, in the final stage of the charging process, a significant and rapid increase in voltage is observed. This behavior arises from the equivalent-circuit model used by MATLAB to represent the battery: the resulting curve exhibits a marked increase in the voltage slope attributable to the nonlinear characteristics of the lithium-ion battery model, in which the terminal voltage depends explicitly on the SoC, internal polarization phenomena, and the ohmic voltage drop associated with the internal resistance.
Figure 11 illustrates the partial voltage $V_{pc}$ across the capacitor $C_{pc}$. For the correct operation of this DC-DC pseudo-PPC topology, it is essential to regulate this voltage, as it directly determines the degree of partiality of the system. In the presented simulation, the reference voltage was arbitrarily set at 270 V, corresponding to half of the input voltage. The figure illustrates that the partial voltage accurately tracks the reference signal throughout the entire charging process, regardless of the charging stage in which the converter operates. Keeping $V_{pc}$ fixed at 270 V, the partiality of the converter is approximately 36% at the beginning of the charging process; that is, about 36% of the total power delivered to the battery is processed by the converter, while the remainder is transferred directly, without passing through the DC-DC conversion stage.
Figure 12 shows the system response to a step change in the reference current signal, as observed in both the inductor current $i_L$ and the partial voltage $V_{pc}$. In Figure 12a, the applied perturbation corresponds to a step increase in the current reference: the initial value of 100 A is raised to 120 A at t = 10 s. The results indicate that the proposed control strategy enables the converter to track this variation with fast dynamics; the settling time is approximately 0.002 s, and the steady-state error remains below 2% of the nominal current reference. Figure 12b depicts the corresponding response of the partial voltage $V_{pc}$ to a 5 V step change in its reference signal. In this case, the voltage stabilizes within approximately 0.006 s after the reference change. The steady-state error in the $V_{pc}$ signal is approximately 0.75%.
Figure 13 shows the system response to a step decrease in the reference signals, again observed in both the inductor current $i_L$ and the partial voltage $V_{pc}$. In Figure 13a,b, the reference signals decrease by 20 A and 5 V, respectively. Figure 13a shows that the inductor current $i_L$ accurately tracks its reference, exhibiting a settling time of less than 2 ms. This indicates a fast and well-damped transient response, with no appreciable overshoot. Similarly, Figure 13b shows that the partial voltage $V_{pc}$ is effectively regulated, reaching the new operating point with a settling time of approximately 6 ms while maintaining a steady-state error below 2%.
Figure 14 shows the temporal evolution of the inductor current $i_L$ under three distinct control strategies. The first signal, shown in blue, corresponds to a conventional PI controller. The second trajectory, shown in green, represents the response obtained with a sliding mode control (SMC) strategy, whereas the red trajectory illustrates the behavior achieved with the DRL-based controller proposed in this paper. The dynamic performance of each control scheme is assessed in response to a step variation in the current reference applied at t = 10 s.
All controllers are able to regulate the inductor current around the reference value; however, the DRL-based controller exhibits superior performance in both the transient and steady-state phases. In particular, following the disturbance, both the DRL-based controller and the SMC achieve the new operating point with a shorter settling time, while the PI controller is characterized by larger oscillations and slower convergence. Although the SMC exhibits a fast transient response due to its inherent robustness to system nonlinearities, the DRL-based controller achieves the steady-state operating condition with attenuated oscillatory behavior. Another relevant aspect highlighted in the figure is the ripple of the current waveforms: the DRL-based controller yields a smoother current profile, indicating more efficient and refined control compared to both PI and SMC. These results indicate that the proposed DRL-based controller constitutes a more robust solution, demonstrating an enhanced capability to handle system disturbances without requiring retuning of its parameters. In contrast, the performance of the PI controller exhibits a strong dependence on the specific operating point at which it has been tuned, whereas the SMC necessitates a careful selection of the switching gains in order to appropriately balance robustness against the occurrence of chattering phenomena. Table 3 presents a comparative analysis of the performance characteristics of the three controllers under evaluation.
Figure 15 shows the partial-voltage $V_{pc}$ performance under the proposed DRL-based control scheme, compared with the PI and SMC controllers, for the reference change at t = 10 s. The results indicate that the DRL-based controller achieves accurate tracking of the partial voltage with respect to its reference. The steady-state error remains low both before and after the reference change, at approximately 2.11% according to the quantitative results summarized in Table 4. In contrast, the PI-controlled system exhibits a larger steady-state error, approximately 6.03%, and a higher ripple amplitude in the voltage response. The steady-state error of the SMC is likewise approximately 6%, although the SMC is more robust and exhibits a better transient response than the PI controller. The proposed DRL controller achieves a settling time of 0.5 ms, outperforming both the SMC (1.0 ms) and PI (1.5 ms) controllers. These observations demonstrate that the DRL-based controller achieves more accurate and robust voltage regulation under variations in the operating point, and this comparative analysis also underscores the potential of reinforcement learning as an alternative control framework capable of effectively addressing nonlinear converter dynamics without the need for explicit model-based controller design.
Finally, the behavior of the converter equipped with the proposed controller is analyzed for different values of the battery SoC. The corresponding results are presented in Figure 16. The experiment was carried out for SoC values of 10%, 50%, and 80%, which yield steady-state errors of 2.06%, 2.03%, and 1.81%, respectively. The maximum observed variation in the steady-state error is 0.25%, corresponding to the difference between the two most extreme conditions of this test. This simulation validation shows that the converter maintains correct operation in each of these scenarios; settling time, overshoot, and steady-state error remain practically unchanged. As shown in Figure 17, variations in the inductance $L$ of 5%, 10%, and 15% from its nominal value result in steady-state errors of 2.04%, 3.30%, and 3.66%, respectively, under the proposed DRL controller. The controller is thus able to maintain its behavior despite these variations. This robustness is achieved because the agent's state space includes the error and its integral, allowing the neural network to dynamically compensate for parameter variations that would normally degrade the performance of a fixed-gain PI controller. This analysis indicates that the tuned policy is robust and supports consistent operation throughout the battery charging cycle.

5. Conclusions

This work examined the implementation of a DRL-based control strategy for a DC-DC pseudo-PPC designed for high-power EV charging. The simulation validation results provide clear evidence that the proposed control scheme effectively regulates both partial voltage and output current over a wide operating range, including sudden variations in the reference signals and operating modes imposed by a CC-CV charging algorithm. The capability of the controller to maintain accurate tracking during such transients is particularly significant, as these conditions correspond to critical operating points in practical EV charging scenarios.
A detailed analysis of the dynamic responses indicates that the proposed controller achieves substantially shorter settling times and reduced steady-state errors relative to a conventional proportional–integral control strategy. Furthermore, the smoother current and voltage waveforms obtained under DRL-based control suggest a more refined switching behavior, which is expected to result in a lower current ripple and a more favorable distribution of electrical and thermal stress across the power electronic components. Regulation of the partial voltage at a fixed reference level further promotes stable converter operation and preserves the advantages of partial power processing throughout the entire charging cycle.
The assessment of converter performance over a complete battery charging process confirms that the proposed DRL-based control strategy is well-suited for deployment in real-world EV fast-charging applications. The findings demonstrate that advanced data-driven control methodologies can effectively cope with the nonlinear dynamics of the DC-DC pseudo-PPC without requiring controller retuning across different operating conditions, thereby providing a robust and flexible alternative to classical linear control approaches. The stability of the proposed control strategy has been assessed through multiple simulation scenarios and comparative analyses with alternative control schemes. Nevertheless, given that the primary objective of this work is to investigate the feasibility and performance of a DRL-based control strategy for a pseudo-PPC, a more rigorous and formal stability analysis, an analysis of harmonics, and comprehensive experimental validation are deferred to future work, where they will be presented together with comparisons against other control methods.

Author Contributions

Conceptualization, D.P. and O.M.-G.; methodology, D.P., O.M.-G., and M.D.; software, D.P. and O.M.-G.; validation, D.P., O.M.-G., M.D., and J.R.; formal analysis, D.P., O.M.-G., and M.D.; investigation, D.P., O.M.-G., and M.D.; resources, D.P., O.M.-G., and M.D.; data curation, D.P., O.M.-G., and M.D.; writing—original draft preparation, D.P. and O.M.-G.; writing—review and editing, D.P., O.M.-G., M.D., and J.R.; visualization, D.P., O.M.-G., M.D., and J.R.; supervision, M.D. and J.R.; project administration, M.D. and J.R.; funding acquisition, M.D. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by La Agencia Nacional de Investigación y Desarrollo (ANID), Chile Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) de Postdoctorado 2025 under Grant 3250347; in part by ANID, Chile FONDECYT Iniciacion under Grant 11241171. J. Rodriguez acknowledges the support of ANID through projects CIA250006 and ATE250063.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the first and corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Friedlingstein, P.; O’Sullivan, M.; Jones, M.W.; Andrew, R.M.; Hauck, J.; Landschützer, P.; Quéré, C.L.; Li, H.; Luijkx, I.T.; Olsen, A.; et al. Global Carbon Budget 2024. Earth Syst. Sci. Data 2025, 17, 965–1039. [Google Scholar] [CrossRef]
  2. Ge, M.; Friedrich, J.; Vigna, L. Where Do Emissions Come From? 4 Charts Explain Greenhouse Gas Emissions by Sector; World Resources Institute: Washington, DC, USA, 2024. [Google Scholar]
  3. Scarpelli, C.; Ceraolo, M.; Crisostomi, E.; Apicella, V.; Pellegrini, G. Charging Electric Vehicles on Highways: Challenges and Opportunities. IEEE Access 2024, 12, 55814–55823. [Google Scholar] [CrossRef]
  4. Tudor, C.; Sova, R.; Stamatiou, P.; Vlachos, V.; Polychronidou, P. Future-Proofing EU-27 Energy Policies with AI: Analyzing and Forecasting Fossil Fuel Trends. Electronics 2025, 14, 631. [Google Scholar] [CrossRef]
  5. Aphale, S.; Kelani, A.; Nandurdikar, V.; Lulla, S.; Mutha, S. Li-ion Batteries for Electric Vehicles: Requirements, State of Art, Challenges and Future Perspectives. In 2020 IEEE International Conference on Power and Energy (PECon), Penang, Malaysia, 7–8 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 288–292. [Google Scholar] [CrossRef]
  6. Rivera, S.; Kouro, S.; Vazquez, S.; Goetz, S.M.; Lizana, R.; Romero-Cadaval, E. Electric Vehicle Charging Infrastructure: From Grid to Battery. IEEE Ind. Electron. Mag. 2021, 15, 37–51. [Google Scholar] [CrossRef]
  7. Tseng, N.H.; Chen, C.Y.; Chen, F.H.; Shih, Y.T.; Huang, L.J.; Tu, M.C.; Yen, Y.H.; Chen, K.H.; Zhu, X.; Lin, Y.H.; et al. A 97.1Extension and Precision-Adjustable Current Monitoring for 2× Faster Battery Charging. IEEE Trans. Power Electron. 2026, 41, 1731–1742. [Google Scholar] [CrossRef]
  8. Karneddi, H.; Ronanki, D. Universal Bridgeless Nonisolated Battery Charger With Wide-Output Voltage Range. IEEE Trans. Power Electron. 2023, 38, 2816–2820. [Google Scholar] [CrossRef]
  9. Karneddi, H.; Ronanki, D. Reconfigurable Battery Charger With a Wide Voltage Range for Universal Electric Vehicle Charging Applications. IEEE Trans. Power Electron. 2023, 38, 10606–10610. [Google Scholar] [CrossRef]
  10. Mammeri, E.N.; Lopez-Santos, O.; El Aroudi, A.; Domajnko, J.; Prosen, N.; Martinez-Salamero, L. Modeling and Control of a Three-Phase Interleaved Buck Converter as a Battery Charger. IEEE Access 2025, 13, 18325–18345. [Google Scholar] [CrossRef]
  11. Safayatullah, M.; Elrais, M.T.; Ghosh, S.; Rezaii, R.; Batarseh, I. A Comprehensive Review of Power Converter Topologies and Control Methods for Electric Vehicle Fast Charging Applications. IEEE Access 2022, 10, 40753–40793. [Google Scholar] [CrossRef]
  12. Xie, J.; Vorobev, P.; Yang, R.; Nguyen, H.D. Battery Health-Informed and Policy-Aware Deep Reinforcement Learning for EV-Facilitated Distribution Grid Optimal Policy. IEEE Trans. Smart Grid 2025, 16, 704–717. [Google Scholar] [CrossRef]
  13. Tang, W.; Chen, J.; Chen, D. Predicting EV battery state of health using long short term degradation feature extraction and FEA TimeMixer. Sci. Rep. 2025, 15, 2200. [Google Scholar] [CrossRef]
  14. Shen, W.; Vo, T.T.; Kapoor, A. Charging algorithms of lithium-ion batteries: An overview. In 2012 7th IEEE Conference on Industrial Electronics and Applications (ICIEA), Singapore, 18–20 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1567–1572. [Google Scholar] [CrossRef]
  15. Althurthi, S.B.; Rajashekara, K. Evaluation of Pulsed Charging Procedures and Their Impact on Lithium-Ion Battery Lifetime for Electric Vehicle Fast Charging Applications. IEEE Trans. Transp. Electrif. 2025, 11, 14038–14046. [Google Scholar] [CrossRef]
  16. Huang, Q.Y.; Liu, Y.H.; Chen, G.J.; Luo, Y.F.; Liu, C.L. Optimization of the SOC-based multi-stage constant current charging strategy using coyote optimization algorithm. J. Energy Storage 2024, 77, 109867. [Google Scholar] [CrossRef]
  17. Wu, X.; Xia, Y.; Du, J.; Gao, X.; Nikolay, S. Multistage Constant Current Charging Strategy Based on Multiobjective Current Optimization. IEEE Trans. Transp. Electrif. 2023, 9, 4990–5001. [Google Scholar] [CrossRef]
  18. Pesantez, D.; Renaudineau, H.; Rivera, S.; Peralta, A.; Alcaide, A.M.; Kouro, S. Transformerless partial power converter topology for electric vehicle fast charge. IET Power Electron. 2024, 17, 970–982. [Google Scholar] [CrossRef]
  19. Pesantez, D.; Renaudineau, H.; Kouro, S.; Rivera, S.; Rodriguez, J. Reconfigurable Type I and Type II Buck-Boost Partial Power Converter for EV Fast Chargers. In 2024 IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), Mangalore, India, 18–21 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
  20. Arora, S.; Balsara, P.; Bhatia, D. Input–Output Linearization of a Boost Converter With Mixed Load (Constant Voltage Load and Constant Power Load). IEEE Trans. Power Electron. 2019, 34, 815–825. [Google Scholar] [CrossRef]
  21. Prag, K.; Woolway, M.; Celik, T. Data-Driven Model Predictive Control of DC-to-DC Buck-Boost Converter. IEEE Access 2021, 9, 101902–101915. [Google Scholar] [CrossRef]
  22. Subedi, S.; Gui, Y.; Xue, Y. Applications of Data-Driven Dynamic Modeling of Power Converters in Power Systems: An Overview. IEEE Trans. Ind. Appl. 2025, 61, 2434–2456. [Google Scholar] [CrossRef]
  23. Rajamallaiah, A.; Naresh, S.; Raghuvamsi, Y.; Manmadharao, S.; Bingi, K.; R, A.; Guerrero, J.M. Deep Reinforcement Learning for Power Converter Control: A Comprehensive Review of Applications and Challenges. IEEE Open J. Power Electron. 2025, 6, 1769–1802. [Google Scholar] [CrossRef]
  24. Cui, C.; Yan, N.; Huangfu, B.; Yang, T.; Zhang, C. Voltage Regulation of DC-DC Buck Converters Feeding CPLs via Deep Reinforcement Learning. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1777–1781. [Google Scholar] [CrossRef]
  25. Hajihosseini, M.; Andalibi, M.; Gheisarnejad, M.; Farsizadeh, H.; Khooban, M.H. DC/DC Power Converter Control-Based Deep Machine Learning Techniques: Real-Time Implementation. IEEE Trans. Power Electron. 2020, 35, 9971–9977. [Google Scholar] [CrossRef]
  26. Artal-Sevil, J.S.; Coronado-Mendoza, A.; Haro-Falcón, N.; Domínguez-Navarro, J.A. High-Efficiency Partial-Power Converter with Dual-Loop PI-Sliding Mode Control for PV Systems. Electronics 2025, 14, 3622. [Google Scholar] [CrossRef]
  27. Gheisarnejad, M.; Farsizadeh, H.; Khooban, M.H. A Novel Nonlinear Deep Reinforcement Learning Controller for DC–DC Power Buck Converters. IEEE Trans. Ind. Electron. 2021, 68, 6849–6858. [Google Scholar] [CrossRef]
  28. Tang, Y.; Hu, W.; Cao, D.; Hou, N.; Li, Z.; Li, Y.W.; Chen, Z.; Blaabjerg, F. Deep Reinforcement Learning Aided Variable-Frequency Triple-Phase-Shift Control for Dual-Active-Bridge Converter. IEEE Trans. Ind. Electron. 2023, 70, 10506–10515. [Google Scholar] [CrossRef]
  29. Schenke, M.; Kirchgassner, W.; Wallscheid, O. Controller Design for Electrical Drives by Deep Reinforcement Learning: A Proof of Concept. IEEE Trans. Ind. Inform. 2020, 16, 4650–4658. [Google Scholar] [CrossRef]
  30. Bhattacharjee, S.; Halder, S.; Yan, Y.; Balamurali, A.; Iyer, L.V.; Kar, N.C. Real-Time SIL Validation of a Novel PMSM Control Based on Deep Deterministic Policy Gradient Scheme for Electrified Vehicles. IEEE Trans. Power Electron. 2022, 37, 9000–9011. [Google Scholar] [CrossRef]
  31. Wan, Y.; Xu, Q.; Dragičević, T. Reinforcement Learning-Based Predictive Control for Power Electronic Converters. IEEE Trans. Ind. Electron. 2025, 72, 5353–5364. [Google Scholar] [CrossRef]
  32. Ye, J.; Mei, S.; Guo, H.; Hu, Y.; Zhang, X. A DDPG Algorithm Based Reinforcement Learning Controller for Three-Phase DC-AC Inverters. In 2023 International Conference on Power Energy Systems and Applications, ICoPESA 2023, Nanjing, China, 24–26 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 429–434. [Google Scholar] [CrossRef]
  33. Egbomwan, O.E.; Liu, S.; Chaoui, H. Twin Delayed Deep Deterministic Policy Gradient (TD3) Based Virtual Inertia Control for Inverter-Interfacing DGs in Microgrids. IEEE Syst. J. 2023, 17, 2122–2132. [Google Scholar] [CrossRef]
  34. Egbomwan, O.E.; Chaoui, H.; Liu, S. A Physics-Constrained TD3 Algorithm for Simultaneous Virtual Inertia and Damping Control of Grid-Connected Variable Speed DFIG Wind Turbines. IEEE Trans. Autom. Sci. Eng. 2025, 22, 958–969. [Google Scholar] [CrossRef]
  35. Pesantez, D.; Menendez, O.; Renaudineau, H.; Kouro, S.; Rivera, S.; Rodriguez, J. Intelligent Control for Type I Partial Power Converters in EV Charging Systems: Twin-Delayed Deep Deterministic Policy Gradient Approach. In 2024 IEEE International Conference on Automation/XXVI Congress of the Chilean Association of Automatic Control (ICA-ACCA), Santiago, Chile, 20–23 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
  36. Niu, S.; Liu, W.; Liu, C.; Jia, C.; Liserre, M.; Chau, K.T. A non-intrusive integration of wireless chargers into electric vehicles: 95.60% dc-dc efficiency at 0.51 LD-to-CL ratio with on-vehicle demonstration. eTransportation 2026, 27, 100547. [Google Scholar] [CrossRef]
  37. Guler, N.; Bagheri, F.; Komurcugil, H.; Bayhan, S. An MPC-controlled Bidirectional Battery Charger with DC-DC and Three-level F-type Converters. In 2024 4th International Conference on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 8–10 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
  38. Chaudhury, T.; Kastha, D. A Novel Current-Fed Dual-Active-Bridge DC–DC Converter for Ultra-Wide Output Voltage Range Electric Vehicle Battery Charger. IEEE Trans. Power Electron. 2025, 40, 17339–17354. [Google Scholar] [CrossRef]
  39. Maalandish, M.; Hosseini, S.H.; Sabahi, M.; Rostami, N.; Khooban, M.H. High step-up multi input–multi output DC–DC converter with high controllability for battery charger/EV applications. IET Power Electron. 2023, 16, 2606–2623. [Google Scholar] [CrossRef]
  40. Rivera, S.; Rojas, J.; Kouro, S.; Lehn, P.W.; Lizana, R.; Renaudineau, H.; Dragičević, T. Partial-Power Converter Topology of Type II for Efficient Electric Vehicle Fast Charging. IEEE J. Emerg. Sel. Top. Power Electron. 2022, 10, 7839–7848. [Google Scholar] [CrossRef]
  41. Rivera, S.; Goetz, S.M.; Kouro, S.; Lehn, P.W.; Pathmanathan, M.; Bauer, P.; Mastromauro, R.A. Charging Infrastructure and Grid Integration for Electromobility. Proc. IEEE 2023, 111, 371–396. [Google Scholar] [CrossRef]
  42. Anzola, J.; Aizpuru, I.; Arruti, A. Partial Power Processing Based Converter for Electric Vehicle Fast Charging Stations. Electronics 2021, 10, 260. [Google Scholar] [CrossRef]
  43. dos Santos, N.G.F.; da Silva Martins, M.L. A Reconfigurable Partial Power Converter with Adjustable Transformer Turns Ratio for a 6.6-kW Integrated On-Board Charger. In 2023 IEEE 8th Southern Power Electronics Conference and 17th Brazilian Power Electronics Conference (SPEC/COBEP), Florianopolis, Brazil, 26–29 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
  44. Iyer, V.M.; Gulur, S.; Gohil, G.; Bhattacharya, S. An Approach Towards Extreme Fast Charging Station Power Delivery for Electric Vehicles with Partial Power Processing. IEEE Trans. Ind. Electron. 2020, 67, 8076–8087. [Google Scholar] [CrossRef]
  45. dos Santos, N.G.F.; Toebe, A.; Löbler, P.H.B.; Schuch, L.; da Silva Martins, M.L.; Rech, C. A High-Efficient Single-Switch Switched-Capacitor Partial Power Converter for On-Board Chargers. IEEE Trans. Power Electron. 2024, 39, 15269–15280. [Google Scholar] [CrossRef]
  46. Liu, J.; Wu, Z.; Zhao, Q. A Bidirectional Resonant Converter Based on Partial Power Processing. Electronics 2025, 14, 910. [Google Scholar] [CrossRef]
  47. Ejaz, B.; Zamora, R.; Reusser, C.; Lin, X. A Comprehensive Review of Partial Power Converter Topologies and Control Methods for Fast Electric Vehicle Charging Applications. Electronics 2025, 14, 1928. [Google Scholar] [CrossRef]
  48. Anzola, J.; Aizpuru, I.; Romero, A.A.; Loiti, A.A.; Lopez-Erauskin, R.; Artal-Sevil, J.S.; Bernal, C. Review of Architectures Based on Partial Power Processing for DC-DC Applications. IEEE Access 2020, 8, 103405–103418. [Google Scholar] [CrossRef]
  49. Rivera, S.; Pesantez, D.; Kouro, S.; Lehn, P.W. Pseudo-Partial-Power Converter without High Frequency Transformer for Electric Vehicle Fast Charging Stations. In 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Portland, OR, USA, 23–27 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1208–1213. [Google Scholar] [CrossRef]
  50. Tutsoy, O.; Brown, M. Reinforcement learning analysis for a minimum time balance problem. Trans. Inst. Meas. Control 2016, 38, 1186–1200. [Google Scholar] [CrossRef]
  51. Menéndez, O.; López-Caiza, D.; Tarisciotti, L.; Ruiz, F.; Auat-Cheein, F.; Rodríguez, J. Assessment of Deep Reinforcement Learning Algorithms for Three-Phase Inverter Control. In 2023 IEEE 8th Southern Power Electronics Conference and 17th Brazilian Power Electronics Conference (SPEC/COBEP), Florianopolis, Brazil, 26–29 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar] [CrossRef]
Figure 1. Classification of different architectures for EV battery chargers. On-board architectures are located to the right of the red dashed line, whereas off-board architectures are placed to the left of this line.
Figure 2. DC-DC PPC configurations: (a) Type I PPC; (b) Type II PPC.
Figure 3. The pseudo-PPC topology considered in this work.
Figure 4. Switching states of the converter: State I. The red arrows indicate the direction of current flow.
Figure 5. Switching states of the converter: State II. The red arrows indicate the direction of current flow.
Figure 6. Switching states of the converter: State III. The red arrows indicate the direction of current flow.
Figure 7. Classical cascade control architecture based on PI controllers.
Figure 8. Proposed data-driven control system.
Figure 9. Overall architecture of the proposed DQN-based control strategy for the PPC. The agent receives the state vector from the environment, estimates the action-value function using the Q-network, and selects switching actions through an ε-greedy policy. The generated transitions are stored in the experience replay buffer and used to update the Q-network parameters, while a target network provides stable temporal-difference targets during training.
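The loop summarized in Figure 9 follows the standard DQN recipe: ε-greedy action selection over the Q-network, experience replay, and a periodically synchronized target network. The minimal PyTorch sketch below illustrates it under stated assumptions: the state/action dimensions, discount factor, mini-batch size, and buffer capacity follow Table 1, whereas the hidden-layer width, learning rate, and the one-step TD target (the paper uses a two-step return) are illustrative choices, not the authors' implementation.

```python
# Minimal DQN sketch matching the Figure 9 architecture. Values marked
# "Table 1" come from the paper; hidden width and learning rate are
# illustrative assumptions. Transitions are assumed to be stored as
# (state_list, action_int, reward_float, next_state_list, done_float).
import copy
import random
from collections import deque

import torch
import torch.nn as nn

N_OBS, N_ACT, GAMMA = 6, 4, 0.995          # Table 1
BATCH, BUFFER = 512, 200_000               # Table 1

q_net = nn.Sequential(nn.Linear(N_OBS, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, N_ACT))
target_net = copy.deepcopy(q_net)          # provides stable TD targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=BUFFER)              # experience replay buffer

def select_action(state, eps):
    """Epsilon-greedy choice over the four discrete switching actions."""
    if random.random() < eps:
        return random.randrange(N_ACT)     # explore
    with torch.no_grad():
        q = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax())                 # exploit the learned values

def td_update():
    """One gradient step on a random mini-batch of stored transitions."""
    if len(replay) < BATCH:
        return
    s, a, r, s2, done = map(
        lambda x: torch.as_tensor(x, dtype=torch.float32),
        zip(*random.sample(replay, BATCH)))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                  # frozen target network
        y = r + GAMMA * target_net(s2).max(1).values * (1.0 - done)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # target_net.load_state_dict(q_net.state_dict()) is run periodically.
```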
Figure 10. Battery charging process: (a) Battery current. (b) Battery voltage.
Figure 11. Battery charging process: partial voltage V_pc.
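Figures 10 and 11 correspond to a full CC-CV charging cycle. The sketch below outlines how a CC-CV current reference is typically generated for the inner control loop; the 100 A and 457 V limits follow Table 2, while the proportional taper gain kp and the cut-off current i_cutoff are assumed values, not taken from the paper.

```python
# Illustrative CC-CV current-reference generator. The 100 A and 457 V
# limits follow Table 2; kp and i_cutoff are assumptions of this sketch.
def cccv_current_ref(v_batt, i_prev, i_max=100.0, v_max=457.0,
                     kp=2.0, i_cutoff=5.0):
    """Return the next output-current reference.

    CC phase: hold i_max while the battery voltage is below v_max.
    CV phase: taper the current with a simple proportional law until it
    drops below i_cutoff, at which point charging is considered complete.
    """
    if v_batt < v_max:
        return i_max                       # constant-current segment
    i_ref = max(0.0, min(i_max, i_prev - kp * (v_batt - v_max)))
    return i_ref if i_ref > i_cutoff else 0.0
```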
Figure 12. Step-up response of the following signals: (a) Inductor current, i_L. (b) Partial voltage, V_pc.
Figure 13. Step-down response of the following signals: (a) Inductor current, i_L. (b) Partial voltage, V_pc.
Figure 14. Comparative analysis of the DC-DC pseudo-PPC dynamics (inductor current i_L) under the PI and SMC controllers and the proposed DRL-based control strategy.
Figure 15. Comparative analysis of the DC-DC pseudo-PPC dynamics (partial voltage V_pc) under the PI and SMC controllers and the proposed DRL-based control strategy.
Figure 16. Inductor current i_L response under different battery SoC conditions: 20%, 50%, and 80%.
Figure 17. Inductor current i_L response under inductor L variations of 5%, 10%, and 15% of the nominal value.
Table 1. Main parameters for the training of the DQN agent.

PPC topology
  DC-link voltage (V_DC): 600 V
  Phase current reference: 10 to 20 A
DQN training
  Number of observations: 6
  Number of actions: 4
  Discount factor: 0.995
  Mini-batch of random experiences: 512
  Maximum number of episodes: 100
  Number of future rewards used to estimate the policy's value: 2
  Experience buffer size: 200,000
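For reproducibility, the Table 1 settings map one-to-one onto the configuration of a generic DQN trainer. The dataclass below is only an illustrative container; its name and fields are ours, not the authors' code.

```python
# Sketch of the Table 1 settings as a trainer configuration. The
# dataclass and its field names are illustrative, not the authors' code.
from dataclasses import dataclass

@dataclass
class DQNConfig:
    n_observations: int = 6          # length of the state vector
    n_actions: int = 4               # discrete switching actions
    discount_factor: float = 0.995
    mini_batch_size: int = 512       # random experiences per update
    max_episodes: int = 100
    n_step_return: int = 2           # future rewards in the TD target
    buffer_size: int = 200_000       # experience replay capacity
```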
Table 2. Simulation parameters.

  Converter nominal power: 40 kW
  Input voltage: 540 V
  Nominal output current: 100 A
  Nominal output voltage: 457 V
  Nominal V_pc voltage: 270 V
  Output inductor L: 1.2 mH
  Capacitor C_pc: 150 μF
  Battery capacity: 60 kWh
  Initial SoC: 20%
Table 3. Comparative performance analysis of PI, SMC, and DRL controllers under variations of the output current reference.

  Controller   Settling time   Steady-state error
  PI           5 ms            4.41%
  SMC          3 ms            4.31%
  DRL          2 ms            2.05%
Table 4. Comparative performance analysis of PI, SMC, and DRL controllers under variations of the partial voltage.

  Controller   Settling time   Steady-state error
  PI           1.5 ms          6.03%
  SMC          1.0 ms          6.02%
  DRL          0.5 ms          2.11%
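The settling times and steady-state errors reported in Tables 3 and 4 can be extracted from step-response traces as sketched below; the 2% settling band and the tail-averaging window are assumptions of this sketch rather than the paper's stated criteria.

```python
# How the Table 3/4 metrics are typically computed from a step response.
# The 2% band and the 10% averaging tail are assumptions of this sketch.
import numpy as np

def step_metrics(t, y, y_ref, band=0.02):
    """Return (settling time, steady-state error in %) for a trace y(t)."""
    err = np.abs(np.asarray(y, dtype=float) - y_ref) / abs(y_ref)
    outside = np.flatnonzero(err > band)   # samples outside the band
    t_settle = t[outside[-1]] if outside.size else t[0]
    ss_error = 100.0 * err[int(0.9 * len(err)):].mean()
    return t_settle, ss_error
```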