Triple Phase Shift Modulation for Active Bridge Converter: Deep Reinforcement Learning-Based Efficiency Optimization

Huang, Yiqi; Zhao, Qiang; Zhu, Miao; Wen, Shuli; Zhang, Bing

doi:10.3390/electronics15081563


by Yiqi Huang ¹, Qiang Zhao ¹,*, Miao Zhu ², Shuli Wen ² and Bing Zhang

¹ Ocean College, Jiangsu University of Science and Technology, Zhenjiang 212003, China

² School of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(8), 1563; https://doi.org/10.3390/electronics15081563

Submission received: 11 March 2026 / Revised: 31 March 2026 / Accepted: 4 April 2026 / Published: 8 April 2026

(This article belongs to the Special Issue Power Electronics and Multilevel Converters)


Abstract

A triple phase shift (TPS) modulation strategy is proposed for a three-port active bridge (TAB) converter in shipboard zonal DC systems. Unlike traditional multi-port converters, the TAB realizes voltage conversion and bidirectional power conversion under TPS modulation. It reduces control complexity, enhances fault-tolerant capability, and extends the zero-voltage switching (ZVS) region under normal and fault operation modes. To further enhance its conversion efficiency, a deep reinforcement learning optimization approach based on the deep deterministic policy gradient (DDPG) algorithm is introduced to adaptively optimize the TPS control parameters and minimize the overall power losses of the converter. To verify the proposed TPS modulation and DDPG-based optimization strategy, a hardware prototype is built and experimentally tested under different operating conditions. Experimental results demonstrate that the TAB architecture with DDPG optimization effectively reduces current stress and power loss, raising the converter’s maximum efficiency to 96.9% in normal mode and yielding a 3% efficiency gain after fault isolation.

Keywords:

triple-phase-shift modulation strategy; three-port active bridge converter; deep deterministic policy gradient; fault-tolerant control; efficiency optimization

1. Introduction

Driven by the rapid evolution of power electronics technology, the efficient coupling of distributed energy resources and ship DC energy storage systems [1,2,3] has become a strategic frontier for the development of smart microgrids. Unlike traditional AC frameworks, DC systems substantially reduce conversion losses by eliminating redundant DC/AC stages, thereby improving overall energy efficiency along the power path.

According to IEEE Std. 1709 [4], shipboard DC power distribution systems mainly employ three classic topologies: radial, ring, and zonal. The radial topology is simple and suitable for vessels with modest reliability requirements. The ring topology, though theoretically appealing, is rarely used in marine applications due to its limited reliability and flexibility [5,6]. In contrast, the zonal structure, as depicted in Figure 1, forms a zonal network through longitudinal port and starboard DC buses along the ship, enabling a dual-sided redundant power supply. When one side fails, critical loads can still be powered from the healthy opposite-side bus, significantly enhancing system survivability and power continuity. However, this structure requires a large number of DC-DC converters. Multi-port fault-tolerant converters are therefore attractive, since they can reduce volume, cost, and losses while improving system reliability [6].

The dual active bridge (DAB) converter plays a significant role in DC power systems supporting emerging energy applications, including photovoltaic generation and shipboard energy storage. Given the prevalent use of multiple DC–DC converters in DC systems, adopting a multi-port architecture to integrate these functions provides an effective means of significantly decreasing system volume and cost. Multi-port converters have received extensive research attention in applications such as shipboard power systems, satellite platforms, and electric vehicles. Both reliability and high efficiency are critical design requirements in these fields [7,8,9,10,11].

Based on circuit topology, existing multi-port converters can be classified into non-isolated [11] and isolated [12,13] types. Non-isolated multi-port converters feature a simple topology and reduced volume. However, their efficiency and operating voltage range are generally limited. Most existing multi-port converters focus on improving efficiency and reducing cost [14,15], but such integrated designs often compromise system reliability. As shown in Figure 2, representative isolated multi-port converters include the series-DAB and parallel-DAB.

Existing research on multi-port DAB converter control and modulation [16] primarily focuses on optimizing control algorithms for conventional multi-port DAB systems to reduce losses in power switches and magnetic components.

As the conventional modulation scheme for DAB converters, single phase shift (SPS) offers buck–boost capability, reduced filter requirements, and fast dynamic response [17]. However, it suffers from a limited ZVS region, elevated current stress, and increased circulating power, which significantly degrades overall performance. To address these limitations, extensive research focuses on advanced modulation strategies such as extended phase shift (EPS), dual phase shift (DPS), and triple phase shift (TPS) [18,19,20,21]. In practice, SPS, EPS, and DPS are specific operating cases of TPS. TPS employs three independent control variables, providing greater flexibility than the other modulation strategies. However, under heavy load conditions, TPS operation gradually shifts toward DPS or EPS modes, resulting in increased power loss and reduced conversion efficiency [19,20].

To further explore optimal solutions, various iterative algorithms have been proposed, including genetic algorithms (GA) [22] and the Newton–Raphson (NR) method [23]. However, these iterative approaches involve high logical complexity and high computational costs. Particle swarm optimization (PSO) has been adopted to minimize reactive power and expand the ZVS region [24]. Compared with GA and NR, PSO offers faster convergence, fewer tuning parameters, and lower computational costs. In recent years, artificial intelligence techniques have significantly reshaped traditional control methodologies. For instance, reinforcement learning is widely applied to solve optimal control problems [25]. Among these methods, the deep Q-network (DQN) is a representative approach that combines Q-learning with deep neural networks. However, DQN [26,27] is primarily designed for discrete action spaces and is therefore not well suited for control problems involving continuous variables, such as phase shift modulation in power converters. As a result, the control implementation often depends on a lookup table generated from the trained reinforcement learning policy. When the training covers a wide operating range, the lookup table grows excessively large, imposing high computational and memory requirements.

To address this drawback, an RL–ANN hybrid approach has been proposed in [28], which employs an artificial neural network as a fast surrogate model to generate real-time control actions during online operation, thereby substantially cutting down on memory requirements. However, this RL–ANN framework still requires two separate training processes, resulting in high computational costs and prolonged training times.

To tackle these challenges, deep reinforcement learning (DRL) techniques are applied to address cognitive decision-making problems in complex systems [29,30]. Among representative DRL algorithms, the DDPG method uses neural networks to approximate both the policy and the Q-function, enabling efficient and stable control in continuous action spaces, effectively overcoming the limitations of conventional reinforcement learning approaches [31]. Consequently, the trained agent serves as a fast surrogate predictor that directly generates appropriate control decisions in real time. As a model-free DRL approach based on actor–critic architecture, DDPG employs a parameterized policy that is directly optimized via policy gradient methods to update the parameters of deep neural networks [32].

In summary, the DDPG algorithm is well suited for the control of multi-port DAB converters [33,34,35]. It naturally handles continuous control variables and captures nonlinear system dynamics through neural network approximation. In addition, ZVS constraints can be incorporated into the optimization objective via reward design, enabling desired switching performance. The data-driven nature also reduces reliance on accurate system models and improves robustness.

As summarized in Table 1, DDPG is compared with conventional optimization and reinforcement learning methods in terms of control flexibility, nonlinear capability, and real-time performance. Compared with GA, PSO, NR, and DQN, the proposed method provides superior capability in continuous control and nonlinear handling, making it more suitable for the optimization of multi-port converters.

Table 1. Comparison of the DDPG algorithm with other algorithms.

Algorithm	Action Space	Continuous Variable	Nonlinear Handling	Model Dependency	ZVS Constraints
GA [21]	Continuous	Good	Moderate	High	Difficult
PSO [24]	Continuous	Good	Moderate	High	Difficult
NR [22]	Continuous	Good	Weak	High	Difficult
DQN [25,26]	Discrete	Poor	Moderate	Low	Good
RL-ANN [27]	Continuous	Moderate	Strong	Low	Good
DDPG [32,33,34]	Continuous	Good	Strong	Low	Good

The structure of this paper is organized as follows. The circuit working mode analysis of the TAB converter, TPS modulation method under normal and fault operation modes, stability analysis, and modeling are described in Section 2. A detailed loss analysis is presented in Section 3. The modulation method optimized using the DDPG algorithm is analyzed in Section 4, and the current stress and various power losses as functions of output power are evaluated and compared among the traditional parallel, series, and TAB topologies, as well as the DDPG-optimized topology. Experimental results obtained from a prototype employing the proposed topology are presented in Section 5. Finally, the conclusions of this paper are provided in Section 6.

2. TAB Converter Circuit Topology and Modulation Strategy

2.1. TAB Converter Circuit Topology

The circuit structure of a series DC-DC converter is depicted in Figure 3. The TAB converter includes three symmetrical H-bridges, two high-frequency isolation transformers T_ri, and a single series inductor L_r for auxiliary power transfer.

The primary side consists of two H-bridges (HB_i, i = 1~2), each equipped with four MOSFET switches (S_i₁~S_i₄, i = 1~2). The output H-bridge (HB_o) incorporates four MOSFET switches (S_o₁~S_o₄). The parasitic capacitances of the switches are denoted as C_i₁~C_i₄ (HB_i) and C_o₁~C_o₄ (HB_o), respectively. On the DC side, the filter capacitors C_in and C_o are connected in parallel with the input-side H-bridge (HB_i) and the output-side H-bridge (HB_o), respectively. The turns ratio of T_ri is 1:n, and the corresponding voltage conversion ratio for each battery pack is m_i = U_o/(nU_i), with a total voltage conversion ratio of m = U_o/(U₁ + U₂), where m > 1 and m < 1 denote boost and buck scenarios, respectively. U_o is the output voltage and U_i is the input voltage of the i-th battery pack.

2.2. TPS Modulation Strategy

The TAB converter is designed to achieve power balancing and voltage conversion among battery packs, as well as to regulate the output-side power, under both normal and fault conditions. The corresponding converter modulation waveform is illustrated in Figure 4. On the primary side, the H-bridge switches S_i₁(S_i₃) and S_i₂(S_i₄) operate complementarily, with a constant 50% duty cycle. On the secondary side, the switches operate with a constant duty cycle, d_o = 0.5.

As a result, the diagonal switches within each pack are not activated simultaneously, creating an internal phase shift ratio d_i (0 < d_i < 0.5). If a short-circuit fault occurs in battery pack 2, fast isolation is achieved by setting S₂₁ and S₂₃ to 1 and S₂₂ and S₂₄ to 0, with the internal phase shift ratio d_i adjusted to 0. The global phase shift angle γ (–90° < γ < 90°) between the primary and secondary sides determines the direction and magnitude of power conversion throughout the system.

In this paper, only the forward power conversion mode (0° < γ < 90°) is considered. Prior to the derivation of the power expressions, the following assumptions are adopted to simplify the analysis: (i) the switches are assumed to operate under ideal conditions, neglecting switching transients and dead-time effects; (ii) the leakage inductance current is approximated as piecewise linear within each switching interval; (iii) parasitic resistances and magnetic losses are neglected; (iv) the converter is assumed to operate under steady-state conditions with constant input voltages and switching frequency.

From the modulation waveform diagram of the TAB converter, it is evident that the converter operates in six distinct states during half a switching cycle (t₁ to t₇). Based on the volt-second balance principle and Kirchhoff’s voltage law, the inductor current i_Lr over the interval from t_i to t_j can be expressed as follows (1):

i_{L r} (t_{j}) = \int_{t_{i}}^{t_{j}} \frac{\sum_{i = 1}^{2} n U_{p i} - U_{o}}{L_{r}} d t + i_{L r} (t_{i})

(1)

Interval 1 [t₁–t₂]: During this time, on the primary side, switches S₁₁ and S₁₄ in HB₁ are turned on, and S₂₄ in HB₂ is turned on. On the secondary side, switches S_o₂ and S_o₃ are turned on. At this stage, the inductor current is negative.

i_{L r} (t_{2}) = \frac{n U_{p 1} + U_{o}}{L_{r}} (t_{2} - t_{1}) + i_{L r} (t_{1})

(2)

Interval 2 [t₂–t₃]: During this time, on the primary side, switches S₁₁ and S₁₄ in HB₁ are turned on, and switches S₂₁ and S₂₄ in HB₂ are turned on. On the secondary side, switches S_o₂ and S_o₃ are turned on. During the dead time, switch S₂₂ is turned off and its parasitic capacitance is charged, while the parasitic capacitance of switch S₂₁ is discharged until the drain-source voltage of switch S₂₁ is zero, thus achieving ZVS of switch S₂₁. At this stage, the inductor current increases linearly.

i_{L r} (t_{3}) = i_{L r} (t_{2}) + \frac{n U_{p 1} + n U_{p 2} + U_{o}}{L_{r}} (t_{3} - t_{2})

(3)

Interval 3 [t₃–t₄]: During this time, on the primary side, switches S₁₁ and S₁₄ in HB₁ are turned on and switches S₂₁ and S₂₄ in HB₂ are turned on. On the secondary side, switches S_o₁ and S_o₄ are turned on. During the dead time, switches S_o₂ and S_o₃ are turned off, and their parasitic capacitances are charged. At the same time, the parasitic capacitances of switches S_o₁ and S_o₄ are discharged until the drain-source voltages of switches S_o₁ and S_o₄ are zero, thus achieving ZVS of switches S_o₁ and S_o₄. At this stage, the inductor current is positive.

i_{L r} (t_{4}) = \frac{n U_{p 1} + n U_{p 2} - U_{o}}{L_{r}} (t_{4} - t_{3}) + i_{L r} (t_{3})

(4)

Interval 4 [t₄–t₅]: During this time, on the primary side, switches S₁₁ and S₁₄ in HB₁ are turned on and switches S₂₁ and S₂₃ in HB₂ are turned on. On the secondary side, switches S_o₁ and S_o₄ are turned on. During the dead time, switch S₂₄ is turned off, and its parasitic capacitance is charged. At the same time, the parasitic capacitance of switch S₂₃ is discharged until the drain-source voltage of switch S₂₃ is zero, thus achieving ZVS of switch S₂₃. At this stage, the inductor current is positive.

i_{L r} (t_{5}) = \frac{n U_{p 1} - U_{o}}{L_{r}} (t_{5} - t_{4}) + i_{L r} (t_{4})

(5)

Interval 5 [t₅–t₆]: During this time, on the primary side, switches S₁₁ and S₁₄ in HB₁ are turned on and switches S₂₁ and S₂₃ in HB₂ are turned on. On the secondary side, switches S_o₁ and S_o₄ are turned on. At this stage, the inductor current is positive.

i_{L r} (t_{6}) = \frac{n U_{p 1} - U_{o}}{L_{r}} (t_{6} - t_{5}) + i_{L r} (t_{5})

(6)

Interval 6 [t₆–t₇]: During this time, on the primary side, switches S₁₁ and S₁₃ in HB₁ are turned on, and switches S₂₁ and S₂₃ in HB₂ are turned on. On the secondary side, switches S_o₁ and S_o₄ are turned on. During the dead time, switch S₁₄ is turned off, and its parasitic capacitance is charged. At the same time, the parasitic capacitance of switch S₁₃ is discharged until the drain-source voltage of switch S₁₃ is zero, thereby achieving ZVS of switch S₁₃. At this stage, the inductor current is positive.

i_{L r} (t_{7}) = \frac{- U_{o}}{L_{r}} (t_{7} - t_{6}) + i_{L r} (t_{6})

(7)
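The interval-by-interval recursion in Equations (2)–(7) can be sketched in a few lines of Python. This is only an illustration: the function name `propagate_current`, the voltages, the inductance, the equal interval lengths, and the starting current are all hypothetical values chosen for readability, not parameters of the prototype.

```python
def propagate_current(i_start, segments, l_r):
    """Return the inductor current at every interval boundary.

    segments is a list of (applied_voltage, duration) pairs, one per
    interval. Inside an interval the current changes linearly by
    (applied_voltage / l_r) * duration, as in Equations (2)-(7).
    """
    currents = [i_start]
    for voltage, duration in segments:
        currents.append(currents[-1] + voltage / l_r * duration)
    return currents


# Hypothetical values for illustration only (not from the prototype).
n, u_p1, u_p2, u_o = 1.0, 50.0, 50.0, 120.0
l_r = 50e-6   # series inductance in henries (assumed)
dt = 1e-6     # equal interval lengths, assumed for simplicity

segments = [
    (n * u_p1 + u_o, dt),             # Interval 1, Eq. (2)
    (n * u_p1 + n * u_p2 + u_o, dt),  # Interval 2, Eq. (3)
    (n * u_p1 + n * u_p2 - u_o, dt),  # Interval 3, Eq. (4)
    (n * u_p1 - u_o, dt),             # Interval 4, Eq. (5)
    (n * u_p1 - u_o, dt),             # Interval 5, Eq. (6)
    (-u_o, dt),                       # Interval 6, Eq. (7)
]
currents = propagate_current(-5.0, segments, l_r)
```

The list `currents` then holds i_Lr(t₁) through i_Lr(t₇) for the assumed operating point.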

When a short-circuit occurs in the second battery pack, as shown in Figure 5, the faulty battery pack can be quickly isolated by simultaneously turning on switches S₂₁ and S₂₃ and turning off switches S₂₂ and S₂₄ (S₂₁/S₂₃ = 1, S₂₂/S₂₄ = 0), as shown in Figure 4, thereby ensuring reliable system operation.

Based on the modulation waveform diagram of the TAB converter, the TAB converter operates in six states during half a switching cycle (t₁–t₇). Without loss of generality, the operation states of HB₁–HB₂ are given in Figure 6. Each primary side full bridge (HB_i) and secondary side full bridge (HB_o) is regarded as a dual active bridge.

For ease of analysis, the output current of the TAB converter, the total output power, and the output power of each battery pack are normalized using the reference values I_norm, P_norm, and P_i,norm, respectively, as defined in Formula (8).

I_{norm} = \frac{U_{o}}{4 f L_{r}}, P_{norm} = \frac{U_{o}^{2}}{4 f L_{r}}, P_{i,norm} = \frac{U_{o} U_{i}}{4 f L_{r}}

(8)

Without loss of generality, we consider the scenario where m > 1. By combining Equations (2)–(8), the inductor current i_Lr,norm at t₁~t₇ is given as:

\begin{cases} i_{Lr,norm}(t_{1}) = -\frac{2 n d_{1}}{m_{1}} - \frac{2 n d_{1}}{m_{2}} + 4 d_{1} - 1 - \frac{2 γ}{π} \\ i_{Lr,norm}(t_{2}) = -\frac{2 n d_{2}}{m_{2}} - \frac{2 n d_{1}}{m_{1}} + 2 d_{1} + 2 d_{2} - 1 - \frac{2 γ}{π} \\ i_{Lr,norm}(t_{3}) = -\frac{2 n d_{2}}{m_{2}} - \frac{2 n d_{1}}{m_{1}} + 1 - \frac{2 γ}{π} + 2 d_{1} - 2 d_{2} \\ i_{Lr,norm}(t_{4}) = -\frac{2 n d_{1}}{m_{2}} - \frac{2 n d_{1}}{m_{1}} + 1 - \frac{2 γ}{π} \\ i_{Lr,norm}(t_{5}) = \frac{\frac{2 n γ}{π} - 2 n d_{1}}{m_{2}} + \frac{\frac{2 n γ}{π} - 2 n d_{1}}{m_{1}} + 1 \\ i_{Lr,norm}(t_{6}) = \frac{2 n d_{1}}{m_{1}} + \frac{2 n d_{1}}{m_{2}} + 1 - 4 d_{1} + \frac{2 γ}{π} \\ i_{Lr,norm}(t_{7}) = \frac{2 n d_{2}}{m_{2}} + \frac{2 n d_{1}}{m_{1}} - 2 d_{1} + 2 d_{2} - 1 - \frac{2 γ}{π} \end{cases}

(9)

By integrating the inductor current over half a switching cycle (t₁~t₇), the average output DC current I_o-avg can be derived from (10).

I_{o - a v g} = \frac{2}{T} \int_{t_{1}}^{t_{7}} i_{L r} d t

(10)
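Because the inductor current is assumed piecewise linear, the integral in Equation (10) reduces to a sum of trapezoid areas. A minimal sketch follows; the helper name `half_cycle_average` is hypothetical, and the sample points are expected to span exactly half a switching period (t₁ to t₇).

```python
def half_cycle_average(times, currents):
    """Average of a piecewise-linear waveform over [times[0], times[-1]].

    When times[-1] - times[0] equals T/2, dividing the area by that span
    is the same as the (2/T) * integral in Equation (10). The trapezoidal
    rule is exact for piecewise-linear data.
    """
    area = 0.0
    for k in range(len(times) - 1):
        width = times[k + 1] - times[k]
        area += 0.5 * (currents[k] + currents[k + 1]) * width
    return area / (times[-1] - times[0])
```

Multiplying the returned average by U_o or U_i gives the power expressions that follow from this average current.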

According to the average inductor current in half a switching cycle in Formula (10), the total output power P_o of TAB after TPS modulation is the product of the average output current I_o-avg and the output voltage U_o, as shown in Formula (11). The output power of a single battery pack is P_i, which is the product of the average output current I_o-avg and the battery pack voltage U_i, as shown in Formula (12).

P_{o} = U_{o} \cdot I_{o-avg} = \frac{2 U_{o}}{T} \int_{t_{1}}^{t_{7}} i_{L r} d t

(11)

P_{i} = U_{i} \cdot I_{o-avg} = \frac{2 U_{i}}{T} \int_{t_{1}}^{t_{7}} i_{L r} d t

(12)

Thus, the output power of each battery pack and the total output power of the TAB converter can be expressed by Equations (13) and (14).

P_{in,i,norm} = \frac{P_{i}}{P_{i,norm}} = - \frac{4 n d_{i}^{2}}{m_{i}} + \frac{2 n d_{i}}{m_{i}} + \frac{4 n d_{1} γ}{π m_{i}} - \frac{2 n γ^{2}}{π^{2} m_{i}}

(13)

P_{o, n o r m} = \frac{P_{o}}{P_{n o r m}} = \sum_{i = 1}^{2} [- \frac{4 n d_{i}^{2}}{m_{i}} + \frac{2 n d_{i}}{m_{i}} + \frac{4 n d_{1} γ}{π m_{i}} - \frac{2 n γ^{2}}{π^{2} m_{i}}]

(14)
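As a quick numerical check, Equations (13) and (14) can be evaluated directly. The sketch below follows the expressions term by term as printed (including the d₁ factor in the third term); the function names are hypothetical, and γ is taken in radians because it appears divided by π.

```python
import math


def pack_power_norm(d_i, d_1, gamma, m_i, n):
    """Normalized power of one battery pack, Equation (13).

    gamma is the global phase shift in radians.
    """
    return (-4 * n * d_i ** 2 / m_i
            + 2 * n * d_i / m_i
            + 4 * n * d_1 * gamma / (math.pi * m_i)
            - 2 * n * gamma ** 2 / (math.pi ** 2 * m_i))


def total_power_norm(d, gamma, m, n):
    """Normalized total output power, Equation (14): sum over both packs."""
    return sum(pack_power_norm(d[i], d[0], gamma, m[i], n)
               for i in range(len(d)))


# Illustrative operating point (assumed values, not measured data).
p_total = total_power_norm(d=[0.25, 0.25], gamma=math.pi / 4, m=[1.0, 1.0], n=1.0)
```

Sweeping `gamma` or `d` with these functions is one simple way to visualize how the phase shift variables trade off against transferred power.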

2.3. ZVS Constraints

To achieve ZVS for all power switches S_i₁~S_i₄ and S_o₁~S_o₄, the parasitic capacitance of each switch must be fully discharged before turn-on. Therefore, the series inductor L_r must store sufficient energy to charge and discharge these parasitic capacitances through resonance. Subsequently, the anti-parallel diodes of the power switches conduct naturally, bringing the drain-source voltage close to zero. If the power switches are turned on at this point, ZVS is achieved.

To ensure ZVS, the energy stored in the leakage inductor L_r at the switching instant must be sufficient to complete the commutation of the bridge arm. To be specific, this energy must exceed the energy required to simultaneously charge the output capacitance of one power switch and discharge the output capacitance of the other switch within the same bridge arm. Satisfying this condition allows the drain-source voltage of the incoming switch to fall to zero before the gate signal is applied, thereby eliminating turn-on losses. Accordingly, the following inequality must be satisfied:

\frac{1}{2} L_{r} i_{Lr}^{2} \geq \frac{1}{2} C_{in-par} U_{i-DS}^{2} + \frac{1}{2} C_{out-par} U_{o-DS}^{2}

(15)

In this expression, C_in-par and C_out-par represent the parasitic capacitances of the primary side and secondary side switches, respectively, while U_i-DS and U_o-DS denote the drain-to-source voltages. For simplified analysis, ZVS is approximated by ensuring that a current flows from source to drain before turn-on. Under this approximation, the boundary condition for ZVS is defined as i_DS ≤ 0, where i_DS represents the current flowing from drain to source of the MOSFET. Accordingly, based on the waveforms shown in Figure 4, the ZVS constraints for switches S_i₁–S_i₄ and S_o₁–S_o₄ can be summarized as follows:

i_{L r, n o r m} (t_{2}) \leq 0, i_{L r, n o r m} (t_{3}) \geq 0

(16)
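Both forms of the ZVS condition are easy to check numerically. The following sketch tests the energy inequality (15) and the simplified current-sign condition (16); the function names and the example component values are hypothetical.

```python
def zvs_energy_ok(l_r, i_lr, c_in_par, u_i_ds, c_out_par, u_o_ds):
    """Energy-based ZVS check, Equation (15)."""
    stored = 0.5 * l_r * i_lr ** 2
    required = 0.5 * c_in_par * u_i_ds ** 2 + 0.5 * c_out_par * u_o_ds ** 2
    return stored >= required


def zvs_current_ok(i_t2_norm, i_t3_norm):
    """Simplified current-sign ZVS check, Equation (16)."""
    return i_t2_norm <= 0 and i_t3_norm >= 0


# Assumed example: 50 uH inductor, 1 nF parasitic capacitances, 100 V bridges.
ok_at_2a = zvs_energy_ok(50e-6, 2.0, 1e-9, 100.0, 1e-9, 100.0)
ok_at_100ma = zvs_energy_ok(50e-6, 0.1, 1e-9, 100.0, 1e-9, 100.0)
```

With these assumed values the inductor energy is sufficient at 2 A but not at 0.1 A, which illustrates why ZVS tends to be lost at light load.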

It is worth noting that critical ZVS turn-on is achieved at the boundary condition where the current defined in (16) becomes exactly zero. Although this operating point may introduce minor turn-on losses due to incomplete discharge of the parasitic capacitance before gate activation, it provides the advantage of enabling zero current turn-off for the complementary switch within the same bridge arm.

2.4. Stability Modeling and Analysis

To rigorously analyze the dynamic behavior of the TAB converter under TPS modulation, a small-signal model is derived from the power flow characteristics in Equation (14). Each variable is decomposed into a steady-state value plus a small disturbance:

\begin{cases} γ = γ_{o} + \hat{γ} \\ P = P_{o} + \hat{P}_{i} \\ U_{i} = U_{io} + \hat{U}_{i} \end{cases}

(17)

where γ_o, U_io, and P_o are the global phase shift angle, battery pack voltage, and output power in steady state, respectively, and \hat{γ}, \hat{U}_{i}, and \hat{P}_{i} are the corresponding perturbations.

Applying first-order Taylor expansion around the equilibrium point:

\hat{P}_{o,norm} = \left. \frac{\partial P_{o,norm}}{\partial γ} \right|_{γ_{o}} \hat{γ}

(18)

By linearizing the power expression around the equilibrium point, the small-signal power variation can be expressed in compact form as:

\hat{P} = K_{γ} \hat{γ}

(19)

where the power gain coefficient K_γ is:

K_{γ} = \frac{n \sum_{i = 1}^{N} U_{i}}{8 f L_{r}}

(20)

This result indicates that, near the operating point, the TAB converter exhibits a linear proportional relationship between global phase shift perturbation and output power variation. The magnitude of this gain is determined by voltage level, leakage inductance, and switching frequency.

The dynamic response of the output voltage can be described by the relationship between the capacitor current and voltage. Based on Kirchhoff’s current law and Kirchhoff’s voltage law, the dynamic response expression of the output voltage to the modulation signal can be obtained:

C_{o} \frac{d {\hat{U}}_{o}}{d t} = \frac{{\hat{P}}_{o}}{U_{o}} - \frac{{\hat{U}}_{o}}{R_{o}} = \frac{\hat{γ}}{U_{o}} K_{γ} - \frac{{\hat{U}}_{o}}{R_{o}}

(21)

By applying the Laplace transform to the dynamic equation above, the open-loop transfer function is obtained:

G_{p} (s) = \frac{n U_{o} R_{o} (1 - \frac{2 γ}{π}) \sum_{i = 1}^{2} U_{i}}{8 f L_{r} (R_{o} C_{o} s + 1)}

(22)
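Equation (22) has the form of a single-pole plant with time constant R_o·C_o. The sketch below evaluates the frequency response of such a plant, lumping all constant factors of the numerator into one hypothetical DC gain `k_dc`. The numerical values are assumptions for illustration and do not reproduce Figure 7.

```python
import cmath
import math


def plant_response(k_dc, r_o, c_o, freq_hz):
    """Gain (dB) and phase (degrees) of k_dc / (r_o * c_o * s + 1) at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * freq_hz
    g = k_dc / (r_o * c_o * s + 1)
    return 20 * math.log10(abs(g)), math.degrees(cmath.phase(g))


# Assumed load and capacitor values: R_o = 10 ohm, C_o = 100 uF.
r_o, c_o, k_dc = 10.0, 100e-6, 5.0
f_corner = 1 / (2 * math.pi * r_o * c_o)
gain_db, phase_deg = plant_response(k_dc, r_o, c_o, f_corner)
```

At the corner frequency the gain sits 3 dB below its low-frequency value and the phase is −45°, the usual signature of a first-order plant that a PI compensator then shapes.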

Based on the Bode plot shown in Figure 7, the control performance under normal operating conditions was evaluated.

The open-loop function is shown in Equation (22). After closed-loop PI compensation, the low-frequency gain increases, effectively reducing the steady-state error. The cutoff frequency is approximately 270 Hz, achieving a good balance between dynamic response speed and high-frequency noise attenuation. Furthermore, the phase margin is designed to be 48°, indicating that the system possesses sufficient stability and damping characteristics.

In addition, the measured frequency response agrees well with the analytical model, verifying the accuracy of the second-order model. These results confirm that the proposed control design can achieve the required stability and dynamic performance.

3. Losses Distribution

The primary goal of this study is to enhance the efficiency of the TAB converter, necessitating a rigorous quantification of individual loss components. Total power dissipation is primarily categorized into semiconductor losses, magnetic component losses, and residual losses. Magnetic losses encompass conduction losses and excitation losses in the transformer T_ri and the inductor L_r. In addition, semiconductor losses include conduction losses, switching transition losses, and gate drive power consumption. Residual losses are mainly due to temperature-dependent variations in the conduction resistance of MOSFET modules. A detailed breakdown of power loss analysis is provided in the following sections.

3.1. Losses Model of Power Switches

(1) Switching Losses: In general, the evaluation of switching losses depends on the operating conditions of power semiconductors, namely soft switching or hard switching. Accordingly, total switching loss comprises turn-on and turn-off losses. Specifically, the expressions for the turn-on loss and turn-off loss of MOSFETs are as follows:

P_{o n} = f_{s} (\frac{1}{2} V_{d s} I_{o n} (t_{r i} + t_{f - o n}) + \frac{1}{2} C_{d s} V_{d s}^{2})

(23)

P_{o f f} = f_{s} (\frac{1}{2} V_{d s} I_{o f f} (t_{r v} + t_{f - o f f}))

(24)

where P_on denotes the turn-on losses, P_off the turn-off losses, f_s the switching frequency, V_ds the drain-source voltage, C_ds the parasitic capacitance, I_on the turn-on current, and I_off the turn-off current. Moreover, t_{f-on} denotes the turn-on delay time, t_ri the current rise time, t_{f-off} the turn-off delay time, and t_rv the voltage fall time of the MOSFET. Therefore, the total switching loss is as follows:

P_{total_sw} = P_{on} + P_{off}

(25)

Specifically, switching losses are negligible under soft switching conditions, with turn-on losses effectively eliminated under ZVS and turn-off losses under zero current switching.
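Equations (23)–(25) translate directly into code. In the sketch below the function names and the `zvs` / `zcs` flags are hypothetical; the flags simply zero the corresponding term to mirror the soft-switching remark above, which is an idealization.

```python
def turn_on_loss(f_s, v_ds, i_on, t_ri, t_f_on, c_ds):
    """Turn-on loss of one MOSFET, Equation (23)."""
    return f_s * (0.5 * v_ds * i_on * (t_ri + t_f_on) + 0.5 * c_ds * v_ds ** 2)


def turn_off_loss(f_s, v_ds, i_off, t_rv, t_f_off):
    """Turn-off loss of one MOSFET, Equation (24)."""
    return f_s * (0.5 * v_ds * i_off * (t_rv + t_f_off))


def switching_loss(p_on, p_off, zvs=False, zcs=False):
    """Total switching loss, Equation (25).

    zvs=True drops the turn-on term and zcs=True drops the turn-off term
    (idealized soft switching, assumed here for illustration).
    """
    return (0.0 if zvs else p_on) + (0.0 if zcs else p_off)


# Assumed device data: 100 kHz, 100 V, 2 A, 10 ns edges, 100 pF.
p_on = turn_on_loss(100e3, 100.0, 2.0, 10e-9, 10e-9, 100e-12)
p_off = turn_off_loss(100e3, 100.0, 2.0, 10e-9, 10e-9)
```

Comparing `switching_loss(p_on, p_off)` with `switching_loss(p_on, p_off, zvs=True)` shows how much of the budget ZVS removes for a given device.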

(2) Conduction Losses: The conduction losses in power semiconductors are primarily determined by the circulating RMS current and the device’s on-state resistance R_ds(on). It is important to account for the temperature dependency of the on-state resistance, which typically increases with junction temperature (T_j), thereby affecting overall efficiency at high power levels.

P_{con} = I_{rms}^{2} \cdot R_{ds(on)} (T_{j})

(26)

Given that each power switch operates with a 50% duty cycle, it conducts for exactly half of the switching period. Specifically, the total loss corresponds to the sum of power dissipation across the on-state resistance of all active semiconductor devices in both primary and secondary bridges. Accordingly, the cumulative conduction losses P_{total_con} of all switches admit an analytical expression as follows:

P_{total_con} = 8 R_{ds_in} {(\frac{i_{Lr_rms}}{\sqrt{2}})}^{2} + 4 R_{ds_out} {(\frac{n \cdot i_{Lr_rms}}{\sqrt{2}})}^{2}

(27)

where R_{ds_in} represents the on-resistance of the power switches (S_i₁~S_i₄) and R_{ds_out} denotes the on-resistance of the power switches (S_o₁~S_o₄).

(3) Gate Drive Losses: The power dissipation in the gate drive circuitry stems from the repeated charging and discharging of the MOSFET’s internal gate capacitance during each switching cycle. This loss mainly depends on the total gate charge Q_g required for device state transitions and the magnitude of the applied gate source voltage V_gs. Accordingly, the gate driver loss can be calculated as follows:

P_{gate} = Q_{g} \cdot V_{gs} \cdot f_{s}

(28)

where V_gs represents the gate driver voltage and Q_g denotes the total gate charge of the MOSFET.

For the TAB converter with 12 power switches, the total gate drive power consumption (P_{total_gate}) is proportional to the switching frequency and can be analytically expressed as:

P_{total_gate} = 8 Q_{g-in} \cdot V_{gs-in} \cdot f_{s} + 4 Q_{g-out} \cdot V_{gs-out} \cdot f_{s}

(29)

where V_{gs-in} denotes the gate driver voltage and Q_{g-in} represents the total gate charge of the power switches S_i₁~S_i₄. Similarly, V_{gs-out} denotes the gate driver voltage and Q_{g-out} the total gate charge of the power switches S_o₁~S_o₄.

Thus, the losses of the total power switches are as follows:

P_{total_sw_loss} = P_{total_sw} + P_{total_con} + P_{total_gate}

(30)
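The conduction and gate drive terms in Equations (27) and (29), and their sum with the switching loss in Equation (30), can be sketched as follows. Function names and numerical values are hypothetical; the factors 8 and 4 are the primary and secondary switch counts from the equations.

```python
import math


def total_conduction_loss(i_lr_rms, r_ds_in, r_ds_out, n):
    """Conduction loss of all 12 switches, Equation (27)."""
    primary = 8 * r_ds_in * (i_lr_rms / math.sqrt(2)) ** 2
    secondary = 4 * r_ds_out * (n * i_lr_rms / math.sqrt(2)) ** 2
    return primary + secondary


def total_gate_loss(q_g_in, v_gs_in, q_g_out, v_gs_out, f_s):
    """Gate drive loss of all 12 switches, Equation (29)."""
    return 8 * q_g_in * v_gs_in * f_s + 4 * q_g_out * v_gs_out * f_s


def total_switch_loss(p_total_sw, p_total_con, p_total_gate):
    """Sum of the three semiconductor loss terms, Equation (30)."""
    return p_total_sw + p_total_con + p_total_gate


# Assumed values: 2 A RMS, 50 mOhm devices, 50 nC gate charge, 15 V drive.
p_con = total_conduction_loss(2.0, 0.05, 0.05, 1.0)
p_gate = total_gate_loss(50e-9, 15.0, 50e-9, 15.0, 100e3)
```

Note that the temperature dependence of R_ds(on) in Equation (26) is not modeled here; the resistances are passed in as already evaluated at the junction temperature of interest.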

3.2. Power Losses Model in Magnetic Components

The total power dissipation in magnetic components, including the high-frequency transformer T_ri and resonant inductor L_r, comprises winding losses and excitation losses. Under the assumption of constant winding resistance and neglecting frequency-dependent skin and proximity effects, copper losses P_copper of these magnetic components are analytically defined based on Joule heating as follows:

P_{copper} = I_{m_rms}^{2} \cdot R_{m}

(31)

To simplify the electromagnetic analysis of the magnetic cores, the induction waveforms in T_ri and L_r are assumed to be dominated by their fundamental sinusoidal components. Consequently, the classical Steinmetz Equation serves to estimate the core power dissipation, P_core, which admits the following analytical expression (32):

P_{core} = k \cdot f_{s}^{α} \cdot {\hat{B}}^{β} \cdot V_{e}

(32)

where k, α, and β are empirical Steinmetz coefficients that describe the hysteresis and eddy current characteristics of a specific core material. These parameters are typically obtained from manufacturer loss curves or datasheets. \hat{B} is the peak magnetic flux density within the core and depends on the applied voltage excitation and the effective cross-sectional area of the core. V_e is the effective magnetic volume of the core and reflects the total volume of material subjected to magnetic flux. Accordingly, the power loss of the transformer T_ri can be expressed as follows:

P_{Tr_loss} = I_{Lk_rms}^{2} \cdot (R_{Tri_p} + n^{2} \cdot R_{Tri_s}) + k_{Tri} f_{s}^{α_{Tri}} \hat{B}_{Tri}^{β_{Tri}}

(33)

where R_{Tri_p} denotes the winding resistance of T_ri on the primary side and R_{Tri_s} denotes the total winding resistance of the transformer T_ri on the secondary side.

\hat{B}_{Tri} \approx \frac{2 U_{i}}{π^{2} N_{Tri_p} A_{Tri}}

(34)

where N_{Tri_p} denotes the number of winding turns of each transformer on the primary side, and A_Tri represents the effective magnetic cross-sectional area of each transformer.
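As a rough numerical sketch of Equations (31) to (34), the helper functions below evaluate the copper, core, and transformer loss terms. The function names and the sample values for the turns count, core area, resistances, and Steinmetz coefficients are hypothetical. The peak flux density helper assumes the fundamental-component estimate with the switching frequency f_s in the denominator, so that the result is in tesla.

```python
import math


def copper_loss(i_rms, r_winding):
    """Eq. (31): Joule loss with a constant winding resistance."""
    return i_rms ** 2 * r_winding


def steinmetz_core_loss(k, alpha, beta, f_s, b_peak, v_e):
    """Eq. (32): classical Steinmetz estimate of the core loss."""
    return k * f_s ** alpha * b_peak ** beta * v_e


def transformer_peak_flux(u_i, n_primary, a_core, f_s):
    """Peak flux density estimated from the fundamental voltage component.

    Assumption: f_s appears in the denominator for dimensional consistency.
    """
    return 2.0 * u_i / (math.pi ** 2 * f_s * n_primary * a_core)


def transformer_loss(i_rms, r_p, r_s, n, k_tri, alpha_tri, beta_tri, f_s, b_peak):
    """Eq. (33): winding loss referred to the primary plus the core term."""
    winding = copper_loss(i_rms, r_p + n ** 2 * r_s)
    core = k_tri * f_s ** alpha_tri * b_peak ** beta_tri
    return winding + core


# U_i = 48 V and f_s = 20 kHz follow the parameter tables; the turns count,
# core area, resistances, and Steinmetz coefficients are hypothetical.
b_tri = transformer_peak_flux(u_i=48.0, n_primary=20, a_core=1e-4, f_s=20e3)
p_tr = transformer_loss(i_rms=4.0, r_p=0.02, r_s=0.02, n=1.0,
                        k_tri=1e-3, alpha_tri=1.3, beta_tri=2.5,
                        f_s=20e3, b_peak=b_tri)
print(f"B_peak = {b_tri:.3f} T, P_Tr_loss = {p_tr:.3f} W")
```

In practice the coefficients k, α, and β would be fitted to the core manufacturer's loss curves rather than chosen by hand.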

The losses of the series inductor L_r can be calculated as:

P_{Lr_loss} = I_{L_{k_rms}}^{2} \cdot R_{Lr_p} + k_{Lr} f_{s}^{α_{Lr}} {\hat{B}}_{Lr}^{β_{Lr}}

(35)

where R_{Lr_p} denotes the winding resistance of L_r. The peak flux density $\hat{B}_{Lr}$ can be estimated as:

{\hat{B}}_{Lr} \approx \frac{μ_{eff} μ_{0} i_{Lr_max}}{l_{Lr}}

(36)

where μ_eff denotes the effective relative permeability of the magnetic core and accounts for the physical air gap introduced to prevent magnetic saturation; μ₀ represents the permeability of free space; i_{Lr_max} is the peak instantaneous current flowing through the resonant inductor L_r, which determines the maximum magnetic flux density; and l_Lr is the effective magnetic path length of the core used for the series inductor L_r.

Finally, the major power losses can be calculated as:

P_{all_loss} = P_{total_sw} + P_{total_con} + P_{total_gate} + P_{Tr_loss} + P_{Lr_loss}

(37)

The efficiency of the TAB converter can be calculated as:

η = \frac{P_{o}}{P_{o} + P_{all_loss}}

(38)
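To make Equation (38) concrete, the short sketch below evaluates the efficiency and also inverts it to find the total loss implied by a given efficiency. The function names are hypothetical, and the example assumes that the measured efficiencies reported in Table 7 are defined in the same way as Equation (38).

```python
def efficiency(p_out, p_all_loss):
    """Eq. (38): output power divided by output power plus total losses."""
    return p_out / (p_out + p_all_loss)


def loss_from_efficiency(p_out, eta):
    """Eq. (38) solved for the total loss at a given efficiency."""
    return p_out * (1.0 / eta - 1.0)


# Table 7 reports 94.28% (DDPG-TPS) and 91.31% (TPS) at 200 W in normal mode.
loss_ddpg = loss_from_efficiency(200.0, 0.9428)
loss_tps = loss_from_efficiency(200.0, 0.9131)
print(f"implied losses: {loss_ddpg:.2f} W (DDPG-TPS), {loss_tps:.2f} W (TPS)")
```

Under that assumption, an efficiency gain of a few percentage points at 200 W corresponds to several watts of avoided loss.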

4. Algorithm Optimization of TAB Converter Under TPS Modulation

In this section, the DDPG algorithm is applied to enhance the conversion efficiency of the TAB converter under ZVS constraints. The optimization focuses on identifying suitable phase shift ratios d₁, d₂, and γ to minimize total power losses.

4.1. DDPG Algorithm

During the optimization control phase of the TAB converter, the main goal is to select the control action that minimizes power dissipation for the real-time state variables U₁, U₂, and P_o. From a theoretical standpoint, this efficiency optimization problem can be formulated as a Markov decision process.

DDPG, an advanced DRL framework, is employed to derive an optimal control strategy by solving this Markov decision process. In the DDPG algorithm, the policy function maps the state vector s_t = [U₁, U₂, P_o]^T to the corresponding optimal action vector a_t = [d₁, d₂, γ]^T.

Meanwhile, the critic network evaluates each state-action pair to approximate the corresponding action-value function Q(s, a). The operational workflow of the proposed DDPG-based control methodology is depicted in Figure 8.

In the DDPG framework, the actor function $a_{t} = μ (s_{t} | θ^{μ})$ defines the deterministic policy parameterized by $θ^{μ}$ and directly maps the state s_t to a specific control action a_t. Concurrently, the critic function $Q (s_{t}, a_{t} | θ^{Q})$ approximates the action-value function with parameters $θ^{Q}$. Based on the Bellman equation, the action-value function $Q^{θ^{Q}} (s_{t}, a_{t})$ represents the expected cumulative return and admits the following formulation:

Q^{θ^{Q}} (s_{t}, a_{t}) = E_{s_{t + 1} \sim E} [r (s_{t}, a_{t}) + γ_{d} Q^{θ^{Q}} (s_{t + 1}, μ_{θ} (s_{t + 1}))]

(39)

In this formulation, $Q^{θ^{Q}} (s_{t}, μ_{θ} (s_{t}))$ represents the expected cumulative return obtained by selecting the optimal action in state s_t under the given policy π. To ensure accurate value estimation, the critic network is updated through minimization of a loss function $L (θ^{Q})$ that represents the mean squared Bellman error.

The loss function admits the following mathematical expression:

L (θ^{Q}) = E_{μ_{θ}^{'}} [{(Q (s_{t}, a_{t} | θ^{Q}) - y_{t})}^{2}]

(40)

where μ′_θ denotes the target policy introduced to stabilize the learning process, while y_t represents the target value produced by the target critic and target actor networks. It is important to note that y_t depends on the parameters $θ^{Q^{'}}$ of the target critic network. The target action-value y_t can be expressed as follows:

y_{t} = r (s_{t}, a_{t}) + γ_{d} Q^{'} (s_{t + 1}, μ_{θ}^{'} (s_{t + 1} | θ^{μ^{'}}) | θ^{Q^{'}})

(41)
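The critic update in Equations (40) and (41) can be illustrated with a few lines of plain Python. This is only a sketch: the mini-batch values are hypothetical, the function names are not from the paper, and the discount factor of 0.98 is taken from the hyperparameter table.

```python
def td_target(reward, q_next_target, gamma_d=0.98):
    """Eq. (41): reward plus the discounted target-critic value."""
    return reward + gamma_d * q_next_target


def critic_loss(q_values, targets):
    """Eq. (40): mean squared Bellman error over a mini-batch."""
    errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(errors) / len(errors)


# Hypothetical mini-batch of three transitions.
rewards = [-12.0, -15.5, -9.8]
q_next = [-100.0, -120.0, -90.0]   # Q'(s_{t+1}, mu'(s_{t+1})) from the target networks
q_pred = [-110.0, -130.0, -99.0]   # Q(s_t, a_t) from the primary critic
targets = [td_target(r, q) for r, q in zip(rewards, q_next)]
print(critic_loss(q_pred, targets))
```

Because the targets come from the slowly moving target networks, they act as fixed labels within a single gradient step on the primary critic.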

Moreover, the parameters of the actor network are updated using the deterministic policy gradient $\nabla_{θ^{μ}} J$. The policy gradient can be analytically derived using the chain rule as follows:

\nabla_{θ^{μ}} J = E_{μ_{θ}^{'}} [\nabla_{θ^{μ}} Q (s, a | θ^{Q}) |_{s = s_{t}, a = μ_{θ} (s_{t} | θ^{μ})}] = E_{μ_{θ}^{'}} [\nabla_{a} Q (s, a | θ^{Q}) |_{s = s_{t}, a = μ_{θ} (s_{t})} \cdot \nabla_{θ^{μ}} μ_{θ} (s | θ^{μ}) |_{s = s_{t}}]

(42)
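The chain rule in Equation (42) can be checked on a toy problem. The sketch below uses a hypothetical scalar linear actor and a hypothetical quadratic critic (neither is the network used in the paper), computes the policy gradient as the product of the two partial derivatives, and compares it with a finite-difference estimate.

```python
def actor(theta, s):
    """Hypothetical scalar linear actor: a = theta * s."""
    return theta * s


def critic(s, a, c=0.5):
    """Hypothetical critic whose maximum lies at a = c * s."""
    return -(a - c * s) ** 2


def dq_da(s, a, c=0.5):
    """Gradient of the critic with respect to the action."""
    return -2.0 * (a - c * s)


def dmu_dtheta(theta, s):
    """Gradient of the actor output with respect to its parameter."""
    return s


def policy_gradient(theta, states):
    """Eq. (42): batch average of dQ/da times dmu/dtheta."""
    terms = [dq_da(s, actor(theta, s)) * dmu_dtheta(theta, s) for s in states]
    return sum(terms) / len(terms)


def objective(theta, states):
    """J(theta): batch average of Q(s, mu(s))."""
    return sum(critic(s, actor(theta, s)) for s in states) / len(states)


states = [0.5, 1.0, 1.5]
theta = 0.2
grad = policy_gradient(theta, states)
eps = 1e-6
fd = (objective(theta + eps, states) - objective(theta - eps, states)) / (2 * eps)
print(grad, fd)

# Gradient ascent on J drives theta toward the critic's optimum c = 0.5.
for _ in range(200):
    theta += 0.1 * policy_gradient(theta, states)
print(theta)
```

The analytical and finite-difference gradients agree on this example, and repeated ascent steps move the actor parameter to the value that maximizes the critic.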

During the update process of the action-value function $Q (s_{t}, a_{t} | θ^{Q})$, the target value y_t changes continuously, which may undermine the convergence of the critic network. To mitigate this instability, target networks are introduced. Instead of directly copying the primary network weights, a soft update strategy is adopted.

Specifically, a target critic network $Q^{'} (s_{t}, a_{t} | θ^{Q^{'}})$ and a target actor network are maintained, where the target parameters track the primary network weights through a smoothing factor. The gradual transfer of weights from the primary networks to the target networks, as illustrated in Figure 8, can be expressed as follows:

\text{soft update}_{τ = 0.001}: \left\{ \begin{matrix} θ^{Q^{'}} \leftarrow τ θ^{Q} + (1 - τ) θ^{Q^{'}} \\ θ^{μ^{'}} \leftarrow τ θ^{μ} + (1 - τ) θ^{μ^{'}} \end{matrix} \right.

(43)

where τ denotes the soft-update rate, which determines the tracking speed of the target networks. Assigning a small value to this parameter constrains the update speed of the target values y_t and significantly improves the stability and convergence of the training process. This mechanism suppresses the oscillatory behavior commonly associated with frequent parameter updates in DRL.
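A minimal sketch of the soft update in Equation (43) is given below, with network weights represented as plain lists of floats for illustration. The function name and the sample weights are hypothetical; the rate τ = 0.005 is the value listed in the hyperparameter table.

```python
def soft_update(target_params, primary_params, tau):
    """Eq. (43): move each target weight a fraction tau toward the primary weight."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(primary_params, target_params)]


primary = [1.0, -2.0]   # hypothetical primary-network weights, held fixed here
target = [0.0, 0.0]     # hypothetical target-network weights
tau = 0.005             # soft update rate listed in Table 3
for _ in range(1000):
    target = soft_update(target, primary, tau)
print(target)
```

With fixed primary weights, the remaining gap shrinks by a factor of (1 - τ) per update, which is why a small τ yields slowly varying target values.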

In the DDPG algorithm, an exploration strategy supports effective exploration within the continuous action domain. To enhance exploration capability, Gaussian noise $N (μ_{θ} (s_{t} | θ_{t}^{μ}), σ)$ is added to the deterministic action policy μ, thereby forming a new exploratory policy μ′.

This process can be mathematically expressed as follows:

μ_{θ}^{'} (s_{t}) = μ_{θ} (s_{t} |θ_{t}^{μ}) + N (μ_{θ} (s_{t} | θ_{t}^{μ}), σ)

(44)

where $N (μ_{θ} (s_{t} | θ_{t}^{μ}), σ)$ is sampled from the noise process and injected into the practical environment during action execution. The parameter σ remains constant during the initial stage and gradually decreases at a fixed rate once the replay memory reaches its maximum capacity.
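The exploration mechanism can be sketched as follows. Several details here are assumptions rather than the paper's implementation: the noise is taken as zero-mean Gaussian added to the actor output, the decay of σ is multiplicative, and the actions are clipped to a hypothetical range [0, 1]. The initial σ of 0.01 and the memory size of 50,000 follow the hyperparameter table.

```python
import random


def noise_scale(sigma0, decay, buffer_len, capacity, steps_since_full):
    """Keep sigma constant until the replay memory is full, then decay it."""
    if buffer_len < capacity:
        return sigma0
    return sigma0 * decay ** steps_since_full


def explore(action, sigma, rng, low=0.0, high=1.0):
    """Add zero-mean Gaussian noise to each action component and clip it."""
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    return [min(max(x, low), high) for x in noisy]


rng = random.Random(0)
action = [0.30, 0.25, 0.10]  # hypothetical actor output [d1, d2, gamma]
sigma = noise_scale(0.01, 0.999, buffer_len=50_000, capacity=50_000,
                    steps_since_full=100)
print(explore(action, sigma, rng))
```

Clipping keeps the perturbed phase shift ratios inside the admissible range, and the decaying σ shifts the agent from exploration toward exploitation as training proceeds.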

To obtain the minimum operating loss modulation strategy for TAB converters using the DDPG algorithm, the objective function f(x) is defined as follows:

f (x) = \min [P_{all_loss}] = \min [P_{total_sw} + P_{total_con} + P_{total_gate} + P_{Tr_loss} + P_{Lr_loss}]

(45)

4.2. Reward Function for Minimizing the Power Losses

The total power dissipation of the TAB converter is denoted as P_all_loss, as defined in Equation (37). Additionally, the optimization problem includes a nonlinear equality constraint P_o = P_or, where P_or denotes the target output power and P_o represents the actual output power measured during the training process. To guide the operating point toward the desired power level, a penalty function φ (d₁, d₂, γ) is introduced to quantify the deviation between the actual and target output power.

φ (d_{1}, d_{2}, γ) = {(P_{or} - P_{o})}^{2}

(46)

The minimum value of zero for φ (d₁, d₂, γ) is achieved when P_or = P_o. Thus, the reward function F (d₁, d₂, γ) should consist of a fitness function P_loss and a penalty function φ (d₁, d₂, γ), which is given by:

F (d_{1}, d_{2}, γ) = - [ω \cdot P_{loss} (d_{1}, d_{2}, γ) + α \cdot φ (d_{1}, d_{2}, γ)]

(47)

where ω and α denote the penalty coefficients. The value of α is set to 150 to amplify the impact of the power tracking error within the penalty function. Furthermore, to achieve ZVS, the coefficient ω is assigned dynamically: ω = 1.5 when all power switches S_i₁ to S_i₄ and S_o₁ to S_o₄ satisfy the soft switching condition defined in Equation (14); otherwise, ω = 15 to penalize the loss of ZVS. The reward function is constructed so that its value increases as the power losses and the power tracking error decrease. As a result, the objective of minimum power loss and zero power deviation under ZVS constraints is reformulated as the maximization of the reward F (d₁, d₂, γ).
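The reward in Equation (47) translates directly into a small function. The sketch below is illustrative: the function and argument names are hypothetical, and the ZVS check is passed in as a Boolean rather than evaluated from Equation (14).

```python
def reward(p_loss, p_target, p_out, zvs_ok,
           alpha=150.0, omega_zvs=1.5, omega_hard=15.0):
    """Eq. (47): negative weighted sum of power loss and tracking penalty."""
    omega = omega_zvs if zvs_ok else omega_hard   # heavier weight without ZVS
    penalty = (p_target - p_out) ** 2             # Eq. (46)
    return -(omega * p_loss + alpha * penalty)


# Hypothetical operating point: 10 W of loss at a 200 W power target.
print(reward(10.0, 200.0, 200.0, zvs_ok=True))    # ZVS kept, no tracking error
print(reward(10.0, 200.0, 200.0, zvs_ok=False))   # ZVS lost
print(reward(10.0, 200.0, 199.0, zvs_ok=True))    # 1 W tracking error
```

With α = 150, a tracking error of 1 W costs as much reward as 100 W of loss under ZVS, so the agent is pushed to meet the power target before it trims losses.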

4.3. Training of the DDPG Algorithm

During each episode, the states (U₁, U₂, P_o) are randomly sampled from the ranges specified in Table 2. The DDPG framework takes the TAB converter operating states (U₁, U₂, P_o) as inputs, processes them through an actor–critic neural network under a reward-driven optimization mechanism during offline training, and outputs an optimized control policy that maps system states to the optimal TPS control parameters. The selection of critical hyperparameters significantly influences training performance. In this study, the actor and critic networks share the same architecture, each comprising two hidden layers with 256 and 128 neurons, respectively. Hyperparameters are tuned using the grid search method, and the training process spans 10,000 episodes.

For the proposed DDPG-based optimization framework for the TAB converter, the input system states are defined as s_t = [U₁, U₂, P_o]^T, and the action vector to be optimized is a_t = [d₁, d₂, γ]^T. The detailed training process of the DDPG algorithm is presented step-by-step as follows:

Step 1: Randomly initialize the actor network parameters θ^μ and the critic network parameters θ^Q, and initialize their target networks.
Step 2: For each training episode, observe the initial system state s_t.
Step 3: In each training step, select the action $a_{t} = μ (s_{t} | θ^{μ}) + n_{t}$ by adding exploration noise n_t to the actor network output.
Step 4: Execute a_t, obtain the reward r_t and the next state s_t₊₁, then store the experience tuple (s_t, a_t, r_t, s_t₊₁) in the replay buffer.
Step 5: Sample a mini-batch of experience data and compute the target value using Equation (41).
Step 6: Update the critic network by minimizing the loss function shown in Equation (40).
Step 7: Update the actor network using the deterministic policy gradient in Equation (42).
Step 8: Soft update the target networks as shown in Equation (43) and repeat until convergence.
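The data flow of Steps 2 to 5 can be sketched with a fixed-size replay buffer. In the sketch below, the state ranges, the policy, the reward, and the batch size are placeholders introduced only for illustration and do not come from the paper. The memory size of 50,000 and the 20 steps per episode follow the hyperparameter table.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience memory; the oldest tuples are dropped when full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def sample_state(rng):
    """Hypothetical operating ranges for (U1, U2, Po)."""
    return (rng.uniform(40.0, 56.0), rng.uniform(40.0, 56.0),
            rng.uniform(0.0, 216.0))


def placeholder_policy(state, rng):
    """Stand-in for the actor output plus exploration noise (Step 3)."""
    return tuple(rng.uniform(0.0, 1.0) for _ in range(3))


def placeholder_reward(state, action):
    """Stand-in for the converter loss model and reward (Step 4)."""
    return -sum(a ** 2 for a in action)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=50_000)        # memory pool size from Table 3
for episode in range(5):                      # the paper trains for 10,000 episodes
    state = sample_state(rng)                 # Step 2
    for step in range(20):                    # 20 steps per episode (Table 3)
        action = placeholder_policy(state, rng)
        reward = placeholder_reward(state, action)
        next_state = sample_state(rng)
        buffer.store(state, action, reward, next_state)
        state = next_state
batch = buffer.sample(32, rng)                # Step 5; batch size is hypothetical
print(len(buffer), len(batch))
```

Sampling uniformly from the buffer breaks the correlation between consecutive transitions before the critic and actor updates of Steps 6 and 7 are applied.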

Copper losses are calculated based on the winding resistance of the magnetic components. In this paper, the selected switching frequency (20 kHz) has a relatively small impact on the winding resistance. Therefore, the additional resistance introduced by skin and proximity effects is negligible and has a minimal impact on the overall loss analysis. The main goal of the DDPG learning algorithm is to obtain the optimal control strategy with minimal power loss throughout the entire operating range. The key parameters are summarized in Table 3.

4.4. DDPG Algorithm Optimization Results

To ensure a fair and consistent comparison, all configurations utilize switches from Infineon and Microchip Technology. The voltage and current ratings of the selected MOSFETs, as detailed in Table 4, are set to 1.5 to 2 times their rated operating values to provide sufficient safety margins.

Table 5 compares the three configurations in terms of fault tolerance, ZVS performance, number of key components, and control complexity.

In the series configuration, two transformers with a turns ratio of 1:1 are required. Using extended phase shift modulation to control the power of each individual port and the overall power transmission requires four degrees of freedom (DOF). In the parallel configuration, two transformers with a turns ratio of 1:2 are required. Using single phase shift modulation to independently adjust the power of each port requires two DOF, but the output voltage range is limited. Because the parallel topology has no fault isolation capability, a port failure affects its power output.

A comparative analysis is conducted among traditional series architecture (Figure 2a), parallel architecture (Figure 2b), TAB (Figure 3), and TAB topologies optimized using the DDPG algorithm, focusing on current stress, switching losses, and magnetic component losses.

Figure 9 illustrates the comparison of current stress values and different power losses with P_o when U_{i_parallel} = 48 V, U_{i_series} = 48 V, U_{i_TAB} = 48 V, and P_o changes from 0 W to 216 W. Specifically, Figure 9a depicts the changing trend of the current stress I_stress, while Figure 9b,c depict the curves for power switch loss and magnetic component loss, respectively. Figure 9d presents the curves of all power losses P_{all_loss}. As indicated in Figure 9a, the DDPG-optimized TAB converter exhibits the smallest current stress, whereas the traditional parallel topology experiences the largest current stress. Given that the traditional parallel and series topologies utilize more switches and transformers, their total losses, as depicted in Figure 9d, are also the highest. After DDPG optimization for efficiency, the TAB converter achieves the lowest losses and the highest efficiency.

Under fault-tolerant mode, the evolution of current stress and power losses relative to varying output power is illustrated in Figure 10. For this analysis, P_o is increased from 0 W to 200 W while the transformer Tr_{1_TAB} turns ratio is fixed at 1:2. Specifically, Figure 10a illustrates the changing trend of current stress I_stress, while Figure 10b,c show the curves for power switching loss and magnetic component loss, respectively. Figure 10d demonstrates the curves for all power losses P_{all_loss}. As illustrated in Figure 10a,d, the DDPG-optimized TAB converter achieves smaller current stress and total losses.

Figures 9a,d and 10a,d compare the current stress and total power loss of the TAB converter with and without DDPG optimization under normal and fault-tolerant modes, respectively. At the 200 W operating point, the proposed method reduces the current stress by 11.2% under normal operation and 12.1% under fault-tolerant operation. In addition, the efficiency is improved by 2.9% and 3.9%, respectively. These results demonstrate the effectiveness of the proposed DDPG-based method in improving TAB converter performance.

5. Experimental Verifications

To ensure fault tolerance, the prototype remains in TAB topology. The primary side of the experimental setup is equipped with three battery packs and corresponding transformers. In normal mode, the third battery pack does not provide power; in fault-tolerant mode, the third pack provides power. This allows for comparison of the TAB converter’s system efficiency and ZVS region in both normal and fault-tolerant modes. All nominal values and operating ranges of the experimental prototype are detailed in Table 6. The controller models employed are TMS320F28377 and GW1N-UV4PG256C6/I5. Figure 11 displays a photograph of the experimental platform, which verifies the feasibility and performance characteristics of the TAB converter in both normal and fault-tolerant modes.

5.1. Normal Mode

The power change of the battery pack under normal mode is tested, as shown in Figure 12. The input power of bat2 increases at time t, and the internal phase shift ratio (d₂) of the transformer primary-side voltage U_p₂ is increased. Consequently, the leakage inductor current i_Lr and the output DC current I_o also rise.

5.2. Switch from Normal Mode to Fault-Tolerant Mode

To assess fault-tolerant capability, the waveform transition from normal to fault-tolerant mode is illustrated in Figure 13. When a short-circuit fault occurs in bat2, bat2 is isolated at time t. After isolation, the transformer primary-side voltage U_p₂ drops to 0 V, and the leakage inductor current i_Lr declines.

5.3. Fault-Tolerant Mode

To verify the fault-tolerant operational capability, the power change of the battery pack under fault-tolerant mode is shown in Figure 14. The input power of bat1 increases at time t, and the internal phase shift ratio (d₁) of the transformer primary-side voltage U_p₁ increases. Consequently, the leakage inductor current i_Lr also increases.

5.4. ZVS and Converter Efficiency

Under normal mode and fault-tolerant mode, taking switch S₁₁ as an example, the ZVS performance of the converter is tested as shown in Figure 15.

Figure 16 illustrates the efficiency curves of the system under normal and fault-tolerant operating modes. In normal mode, the system achieves a broader ZVS region, allowing operation closer to its optimal point and resulting in higher efficiency compared to the fault-tolerant mode. Under normal operating conditions, DDPG optimization increases the maximum efficiency to 96.9%, an improvement of approximately 3.9 percentage points over the unoptimized TAB converter. Under fault-tolerant conditions, the maximum efficiency is improved by about 4.96 percentage points. Table 7 compares the efficiency of TPS and DDPG-TPS at different power points and operating conditions.
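The efficiency gains quoted above can be recomputed from the Table 7 entries. The short script below is only a bookkeeping sketch with hypothetical variable names; it takes the difference, in percentage points, between the DDPG-TPS and TPS efficiencies at each power point.

```python
powers_w = [40, 80, 120, 160, 200]

# Efficiencies in percent, copied from Table 7.
eff = {
    "TPS normal": [93.0, 92.9, 92.35, 91.56, 91.31],
    "DDPG-TPS normal": [96.9, 95.8, 95.2, 94.44, 94.28],
    "TPS fault": [91.4, 91.31, 90.67, 89.78, 89.25],
    "DDPG-TPS fault": [96.36, 95.2, 94.65, 93.8, 93.18],
}


def gains(baseline, optimized):
    """Efficiency gain in percentage points at each power point."""
    return [round(o - b, 2) for b, o in zip(baseline, optimized)]


normal_gain = gains(eff["TPS normal"], eff["DDPG-TPS normal"])
fault_gain = gains(eff["TPS fault"], eff["DDPG-TPS fault"])
for p, g_n, g_f in zip(powers_w, normal_gain, fault_gain):
    print(f"{p:>3} W: normal +{g_n:.2f}, fault +{g_f:.2f}")
```

The largest gains in both modes occur at the 40 W point, which is also where the maximum efficiencies are reached.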

In summary, the inductor current waveforms obtained from simulation and experiment under different operating conditions are compared, and good agreement in waveform shape is observed. To provide quantitative validation, the peak current values are also evaluated. Under normal operation, a peak current of 6.2 A is measured experimentally and 6.43 A is predicted by the analytical model, a deviation of 3.5%. Under fault-tolerant operation, the measured peak current is 7.15 A and the predicted value is 7.43 A, a deviation of 3.8%. These results demonstrate the accuracy and physical consistency of the proposed analytical model.

6. Conclusions

This paper proposes a TAB converter under triple phase shift (TPS) modulation and uses the deep deterministic policy gradient (DDPG) algorithm to minimize power loss, thereby improving overall operating efficiency. The main contributions include: (i) the proposed TAB converter under TPS modulation can achieve voltage conversion and bidirectional power transmission; (ii) in the event of short-circuit faults, it isolates the faulty port without requiring additional hardware and ensures uninterrupted power delivery to the load through an effective modulation strategy; (iii) within the power range of 0–200 W, the proposed DDPG method improves the conversion efficiency by approximately 2.85–3.97% under normal mode and by 3.89–4.96% under fault-tolerant mode, compared with the conventional TPS strategy; and (iv) experimental validation confirms the achievement of soft switching, the improvement in conversion efficiency based on DDPG, and robust cooperative operation with fault-tolerant capabilities, thereby ensuring reliable power conversion and continuous utilization of renewable energy in shipboard DC system applications.

Although the proposed DDPG-based optimization method demonstrates improved performance, several practical considerations should be noted: (i) the training process requires considerable computational resources; however, it is performed offline, and the trained policy can be efficiently implemented in real time; (ii) the performance may be affected by parameter variations, and significant deviations from the training conditions could lead to degradation; and (iii) the validation is conducted on a low-power prototype, and extension to higher-power systems may introduce additional challenges. These issues will be further investigated in future work to improve the robustness and scalability of the proposed method.

Author Contributions

Conceptualization, Y.H.; methodology, Y.H. and Q.Z.; software, Y.H.; validation, Y.H. and Q.Z.; formal analysis, Y.H. and S.W.; investigation, Y.H. and M.Z.; resources, B.Z.; data curation, Y.H.; writing original draft preparation, Y.H. and Q.Z.; writing review and editing, Y.H. and M.Z.; supervision, S.W.; project administration, Q.Z. and M.Z.; funding acquisition, Y.H. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Postgraduate Research & Practice Innovation Program of Jiangsu Province sponsored by the Department of Education of Jiangsu Province, grant number SJCX25_2508.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no personal, academic, or financial conflicts of interest associated with this paper.

Figure 1. The zonal architecture for a shipboard DC power distribution system.

Figure 2. The representative isolated multi-port converters. (a) Series-DAB topology, (b) parallel-DAB topology.

Figure 3. Circuit topology of the proposed TAB converter.

Figure 4. TPS modulation waveform of the TAB converter.

Figure 5. Equivalent circuit diagram of Bat2 under short-circuit failure.

Figure 6. Circuit diagrams of different operating modes during t₁–t₇. (a) Circuit diagram of HB₁-HB_o, (b) circuit diagram of HB₂-HB_o.

Figure 7. Stability analysis of the TAB converter under TPS modulation using a Bode plot.

Figure 8. Diagram of the DDPG algorithm for optimizing the efficiency of the TAB converter under TPS modulation.

Figure 9. Curves showing the variation of current stress and different types of power losses with P_o. (a) Current stress, (b) power switching losses, (c) magnetic component losses, and (d) all power losses. (U_{i_parallel} = 48 V, U_{i_series} = 48 V, U_{i_TAB} = 48 V).

Figure 10. Current stress and power loss of TAB converter in fault-tolerant mode and TAB converter optimized by DDPG. (a) Current stress, (b) power switching losses, (c) magnetic component losses, and (d) all power losses.

Figure 11. A photograph of the hardware setup.

Figure 12. The waveform of bat2 power change in normal mode.

Figure 13. The waveform of normal mode switching to fault-tolerant mode.

Figure 14. The waveform of bat1 power change under fault mode.

Figure 15. ZVS of switch S₁₁. (a) Normal mode, (b) fault mode.

Figure 16. Measured converter efficiency. (a) Under normal mode, (b) under fault mode.

Table 2. Parameters of different converters under testing.

Parameters	Value
Input voltage in traditional parallel topology (U_{i_parallel})	48 V
Input voltage in traditional series topology (U_{i_series})	48 V
Input voltage of battery pack in TAB topology (U_{i_TAB})	48 V
Transformer turns ratio of parallel topology (n)	1:2
Transformer turns ratio of series topology (n)	1:1
Transformer turns ratio of TAB topology (n)	1:1
Output voltage (U_o)	100 V
Rated power (P_o)	216 W
Capacitor (C_o)	220 μF
Switching frequency (f_s)	20 kHz
Leakage inductance (L_r)	45 μH

Table 3. Key parameters of the DDPG algorithm.

Parameters	Value
Actor network learning rate (θ^μ)	0.0003
Critic network learning rate (θ^Q)	0.003
Soft update rate (τ)	0.005
Penalty coefficient (α)	150
Penalty coefficient (ω)	ω = 1.5 (ZVS), ω = 15 (non-ZVS)
Discount factor (γ_d)	0.98
Noise parameters (σ)	0.01
Memory pool size	50,000
Number of episodes	10,000
Step size of each episode	20
Reward stability	Fluctuation < 1 %

Table 4. Specifications of the converter circuit components.

Items	Figure 2a Architecture	Figure 2b Architecture	TPS Architecture
Input Switches	BSZ440N15NS3G	BSC070N10NS5	BSC070N10NS5
Output Switches	BSZ440N15NS3G	BSC320N20NS3G	BSC320N20NS3G
Input Capacitors	B41858C9227M000	B41858C9227M000	B41858C9227M000
Output Capacitors	B41858C9227M000	B43504A2477M000	B43504A2477M000
Transformer (core)		B66375G0000X187
Inductor (core)		74435581000

Table 5. Structural comparison of representative multi-port converters.

Parameters	Series-EPS	Parallel-SPS	TAB-TPS
Fault-tolerant	×	×	√
ZVS	Partial ZVS	Partial ZVS	Full ZVS
DOF	4	2	3
Capacitors	4	4	3
Switches	16	16	12
Transformers	2	2	2
Inductors	2	2	1

Table 6. Circuit experimental parameters.

Parameters	Value
Input voltage of battery pack i = 1 (U₁)	48 V
Input voltage of battery pack i = 2 (U₂)	48 V
Input voltage of battery pack i = 3 (U₃)	48 V
Transformer turns ratio (n)	1:1
Output voltage (U_o)	100 V
Rated power (P_o)	216 W
Capacitor (C_o)	110 μF
Switching frequency (f_s)	20 kHz
Leakage inductance (L_r)	45 μH
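
A few derived quantities follow directly from the values in Table 6 and may help when reading the experimental waveforms. The sketch below is only worked arithmetic under stated assumptions: the variable names are illustrative, the load is assumed to be purely resistive at rated power, and the leakage reactance is evaluated at the switching frequency only.

```python
import math

# Values transcribed from Table 6.
U_O = 100.0    # output voltage, V
P_O = 216.0    # rated power, W
F_S = 20e3     # switching frequency, Hz
L_R = 45e-6    # leakage inductance, H

# Switching period T_s = 1 / f_s.
switching_period_s = 1.0 / F_S

# Rated output current I_o = P_o / U_o.
rated_output_current_a = P_O / U_O

# Equivalent full-load resistance R = U_o^2 / P_o (resistive load assumed).
full_load_resistance_ohm = U_O ** 2 / P_O

# Leakage reactance at the switching frequency, X = 2 * pi * f_s * L_r.
leakage_reactance_ohm = 2.0 * math.pi * F_S * L_R

print(f"T_s = {switching_period_s * 1e6:.1f} us")
print(f"I_o = {rated_output_current_a:.2f} A")
print(f"R   = {full_load_resistance_ohm:.2f} ohm")
print(f"X   = {leakage_reactance_ohm:.2f} ohm")
```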

Table 7. Efficiency at different power points under different operating conditions.

Mode and Strategy	40 W	80 W	120 W	160 W	200 W
TPS in normal mode	93%	92.9%	92.35%	91.56%	91.31%
DDPG-TPS in normal mode	96.9%	95.8%	95.2%	94.44%	94.28%
TPS in fault mode	91.4%	91.31%	90.67%	89.78%	89.25%
DDPG-TPS in fault mode	96.36%	95.2%	94.65%	93.8%	93.18%
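
The improvement of DDPG-TPS over plain TPS can be read off Table 7 as a difference in percentage points at each power level. The short sketch below does that arithmetic on the tabulated values; the dictionary layout and the function name `gain_points` are illustrative choices, not part of the original work.

```python
# Efficiency values in percent, transcribed from Table 7.
POWER_W = [40, 80, 120, 160, 200]
EFFICIENCY = {
    ("normal", "TPS"): [93.0, 92.9, 92.35, 91.56, 91.31],
    ("normal", "DDPG-TPS"): [96.9, 95.8, 95.2, 94.44, 94.28],
    ("fault", "TPS"): [91.4, 91.31, 90.67, 89.78, 89.25],
    ("fault", "DDPG-TPS"): [96.36, 95.2, 94.65, 93.8, 93.18],
}


def gain_points(mode):
    """Efficiency gain of DDPG-TPS over TPS, in percentage points."""
    base = EFFICIENCY[(mode, "TPS")]
    optimized = EFFICIENCY[(mode, "DDPG-TPS")]
    return [round(o - b, 2) for o, b in zip(optimized, base)]


for mode in ("normal", "fault"):
    for power, gain in zip(POWER_W, gain_points(mode)):
        print(f"{mode:6s} {power:3d} W: +{gain:.2f} points")

peak_normal = max(EFFICIENCY[("normal", "DDPG-TPS")])
print(f"peak DDPG-TPS efficiency in normal mode: {peak_normal}%")
```

On the values as tabulated, the gain is between 2.85 and 3.9 percentage points in normal mode and between 3.89 and 4.96 percentage points in fault mode, and the peak of 96.9% occurs at the 40 W operating point.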


Share and Cite

MDPI and ACS Style

Huang, Y.; Zhao, Q.; Zhu, M.; Wen, S.; Zhang, B. Triple Phase Shift Modulation for Active Bridge Converter: Deep Reinforcement Learning-Based Efficiency Optimization. Electronics 2026, 15, 1563. https://doi.org/10.3390/electronics15081563

