Article

Triple Phase Shift Modulation for Active Bridge Converter: Deep Reinforcement Learning-Based Efficiency Optimization

1
Ocean College, Jiangsu University of Science and Technology, Zhenjiang 212003, China
2
School of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(8), 1563; https://doi.org/10.3390/electronics15081563
Submission received: 11 March 2026 / Revised: 31 March 2026 / Accepted: 4 April 2026 / Published: 8 April 2026
(This article belongs to the Special Issue Power Electronics and Multilevel Converters)

Abstract

A triple phase shift (TPS) modulation strategy is proposed for a three-port active bridge (TAB) converter in shipboard zonal DC systems. Unlike traditional multi-port converters, the TAB realizes voltage conversion and bidirectional power conversion under TPS modulation. It exhibits superior performance in reducing control complexity, enhancing fault-tolerant capability, and extending the zero-voltage switching (ZVS) region under normal and fault operation modes. To further enhance its conversion efficiency, a deep reinforcement learning optimization approach based on the deep deterministic policy gradient (DDPG) algorithm is introduced to adaptively optimize TPS control parameters and minimize the overall power losses of the converter. To verify the proposed TPS modulation and DDPG-based optimization strategy for the TAB converter topology, a corresponding hardware prototype is built and experimentally tested under different operating conditions. Experimental results demonstrate that the TAB architecture with DDPG optimization effectively reduces current stress and power loss, raising the converter’s maximum efficiency to 96.9% in normal mode and yielding a 3% efficiency gain after fault isolation.

1. Introduction

Driven by the rapid evolution of power electronics technology, the efficient coupling of distributed energy resources and ship DC energy storage systems [1,2,3] has become a strategic frontier for the development of smart microgrids. Unlike traditional AC frameworks, DC systems substantially reduce conversion losses by eliminating redundant DC/AC stages, thereby optimizing overall energy efficiency along the power path.
According to IEEE Std. 1709 [4], shipboard DC power distribution systems mainly employ three classic topologies: radial, ring, and zonal. The radial topology is simple and suitable for vessels with modest reliability requirements. The ring topology, though theoretically appealing, is rarely used in marine applications due to its limited reliability and flexibility [5,6]. In contrast, the zonal structure, as depicted in Figure 1, forms a zonal network through longitudinal port and starboard DC buses along the ship, enabling a dual-sided redundant power supply. When one side fails, critical loads can still be powered from the healthy opposite-side bus, significantly enhancing system survivability and power continuity. However, this structure necessitates a large number of DC-DC converters. To address this, multi-port fault-tolerant converters can reduce volume, cost, and losses while improving system reliability [6].
The dual active bridge (DAB) converter plays a significant role in DC power systems supporting emerging energy applications, including photovoltaic generation and shipboard energy storage. Given the prevalent use of multiple DC–DC converters in DC systems, adopting a multi-port architecture to integrate these functions provides an effective means of significantly decreasing system volume and cost. Multi-port converters have received extensive research attention in applications such as shipboard power systems, satellite platforms, and electric vehicles. Both reliability and high efficiency are critical design requirements in these fields [7,8,9,10,11].
Based on circuit topology, existing multi-port converters can be classified into non-isolated [11] and isolated [12,13]. For example, non-isolated multi-port converters feature a simple topology and reduce volume. However, their efficiency and operating voltage range are generally limited. Most existing multi-port converters focus on improving efficiency and reducing cost [14,15]. However, such integrated designs often compromise system reliability. As demonstrated in Figure 2, representative isolated multi-port converters include the series-DAB and parallel-DAB.
Existing research on multi-port DAB converter control and modulation [16] primarily focuses on optimizing control algorithms for conventional multi-port DAB systems to reduce losses in power switches and magnetic components.
As the conventional modulation scheme for DAB converters, single phase shift (SPS) offers buck–boost capability, reduced filter requirements, and fast dynamic response [17]. However, it suffers from a limited ZVS region, elevated current stress, and increased circulating power, which significantly degrades overall performance. To address these limitations, extensive research focuses on advanced modulation strategies such as extended phase shift (EPS), dual phase shift (DPS), and triple phase shift (TPS) [18,19,20,21]. In practice, SPS, EPS, and DPS are specific operating cases of TPS. TPS employs three independent control variables, providing greater flexibility than the other modulation strategies. However, under heavy load conditions, TPS operation gradually shifts toward DPS or EPS modes, resulting in increased power loss and reduced conversion efficiency [19,20].
To further explore optimal solutions, various iterative algorithms have been proposed, including genetic algorithms (GA) [22] and the Newton–Raphson (NR) method [23]. However, these iterative approaches involve high logical complexity and high computational costs. Particle swarm optimization (PSO) has been adopted to minimize reactive power and expand the ZVS region [24]. Compared with GA and the NR method, PSO offers faster convergence, fewer tuning parameters, and lower computational costs. In recent years, artificial intelligence techniques have significantly reshaped traditional control methodologies. For instance, reinforcement learning is widely applied to solve optimal control problems [25]. Among these methods, the deep Q-network (DQN) is a representative approach that combines Q-learning with deep neural networks. However, DQN [26,27] is primarily designed for discrete action spaces and is therefore not well suited for control problems involving continuous variables, such as phase shift modulation in power converters. As a result, the control implementation often depends on a lookup table generated from the trained reinforcement learning policy. Yet, when the training covers a wide operating range, the lookup table grows excessively large, imposing high computational and memory requirements.
To address this drawback, an RL–ANN hybrid approach has been proposed in [28], which employs an artificial neural network as a fast surrogate model to generate real-time control actions during online operation, thereby substantially cutting down on memory requirements. However, this RL–ANN framework still requires two separate training processes, resulting in high computational costs and prolonged training times.
To tackle these challenges, deep reinforcement learning (DRL) techniques are applied to address cognitive decision-making problems in complex systems [29,30]. Among representative DRL algorithms, the DDPG method uses neural networks to approximate both the policy and the Q-function, enabling efficient and stable control in continuous action spaces, effectively overcoming the limitations of conventional reinforcement learning approaches [31]. Consequently, the trained agent serves as a fast surrogate predictor that directly generates appropriate control decisions in real time. As a model-free DRL approach based on actor–critic architecture, DDPG employs a parameterized policy that is directly optimized via policy gradient methods to update the parameters of deep neural networks [32].
In summary, the DDPG algorithm is well suited for the control of multi-port DAB converters [33,34,35]. It naturally handles continuous control variables and captures nonlinear system dynamics through neural network approximation. In addition, ZVS constraints can be incorporated into the optimization objective via reward design, enabling desired switching performance. The data-driven nature also reduces reliance on accurate system models and improves robustness.
As summarized in Table 1, DDPG is compared with conventional optimization and reinforcement learning methods in terms of control flexibility, nonlinear capability, and real-time performance. Compared with GA, PSO, NR, and DQN, the proposed method provides superior capability in continuous control and nonlinear handling, making it more suitable for the optimization of multi-port converters.
Table 1. Comparison of the DDPG algorithm with other algorithms.
Algorithms | Action Space | Continuous Variable | Nonlinear Handling | Model Dependency | ZVS Constraints
--- | --- | --- | --- | --- | ---
GA [21] | Continuous | Good | Moderate | High | Difficult
PSO [24] | Continuous | Good | Moderate | High | Difficult
NR [22] | Continuous | Good | Weak | High | Difficult
DQN [25,26] | Discrete | Poor | Moderate | Low | Good
RL-ANN [27] | Continuous | Moderate | Strong | Low | Good
DDPG [32,33,34] | Continuous | Good | Strong | Low | Good
The structure of this paper is organized as follows. The circuit working mode analysis of the TAB converter, TPS modulation method under normal and fault operation modes, stability analysis, and modeling are described in Section 2. A detailed loss analysis is presented in Section 3. The modulation method optimized using the DDPG algorithm is analyzed in Section 4, and the current stress and various power losses as functions of output power are evaluated and compared among the traditional parallel, series, and TAB topologies, as well as the DDPG-optimized topology. Experimental results obtained from a prototype employing the proposed topology are presented in Section 5. Finally, the conclusions of this paper are provided in Section 6.

2. TAB Converter Circuit Topology and Modulation Strategy

2.1. TAB Converter Circuit Topology

The circuit structure of a series DC-DC converter is depicted in Figure 3. The TAB converter includes three symmetrical H-bridges, two high-frequency isolation transformers Tri, and a single series inductor Lr for auxiliary power transfer.
The primary side consists of two H-bridges (HBi, i = 1~2), each of which is equipped with four MOSFET switches (Si1~Si4, i = 1~2). Moreover, the output H-bridge (HBo) incorporates four switches (So1~So4). All switches are MOSFETs, with parasitic capacitances denoted as Ci1~Ci4 (HBi) and Co1~Co4 (HBo), respectively. On the DC side, the filter capacitors Cin and Co are connected in parallel with the input-side H-bridge (HBi) and the output-side H-bridge (HBo), respectively. Furthermore, the turns ratio of Tri is 1:n, and the corresponding voltage conversion ratio for each battery pack is mi = Uo/(nUi), with a total voltage conversion ratio of m = Uo/(U1 + U2), where m > 1 and m < 1 denote boost and buck scenarios, respectively. Uo is the output voltage and Ui is the input voltage of the i-th battery pack.

2.2. TPS Modulation Strategy

The TAB converter is designed to achieve power balancing and voltage conversion among battery packs, as well as to regulate the output-side power, under both normal and fault conditions. The corresponding converter modulation waveform is illustrated in Figure 4. On the primary side, the H-bridge switches Si1(Si3) and Si2(Si4) operate complementarily, with a constant 50% duty cycle. On the secondary side, the switches operate with a constant duty cycle, do = 0.5.
As a result, the diagonal switches within each pack are not activated simultaneously, creating an internal phase shift cycle di (0 < di < 0.5). If a short-circuit fault occurs in battery pack 2, for fast isolation, S21 and S23 are set to 1, S22 and S24 are set to 0, and the internal phase shift period di is adjusted to 0. The global phase shift angle γ (–90° < γ < 90°) between the primary and secondary sides is used to determine the direction and magnitude of power conversion throughout the system.
In this paper, only the forward power conversion mode (0° < γ < 90°) is considered. Prior to the derivation of the power expressions, the following assumptions are adopted to simplify the analysis: (i) the switches are assumed to operate under ideal conditions, neglecting switching transients and dead-time effects; (ii) the leakage inductance current is approximated as piecewise linear within each switching interval; (iii) parasitic resistances and magnetic losses are neglected; (iv) the converter is assumed to operate under steady-state conditions with constant input voltages and switching frequency.
From the modulation waveform diagram of the TAB converter, it is evident that the converter operates in six distinct states during half a switching cycle (t1 to t7). Based on the volt-second balance principle and Kirchhoff’s voltage law, the inductor current iLr over the interval from ti to tj can be expressed as follows (1):
$$ i_{Lr}(t_j) = \int_{t_i}^{t_j} \frac{\sum_{i=1}^{2} n U_{pi} - U_o}{L_r}\,dt + i_{Lr}(t_i) \tag{1} $$
Interval 1 [t1–t2]: During this time, on the primary side, switches S11 and S14 in HB1 are turned on, and S24 in HB2 is turned on. On the secondary side, switches So2 and So3 are turned on. At this stage, the inductor current is negative.
$$ i_{Lr}(t_2) = \frac{n U_{p1} + U_o}{L_r}\,(t_2 - t_1) + i_{Lr}(t_1) \tag{2} $$
Interval 2 [t2–t3]: During this time, on the primary side, switches S11 and S14 in HB1 are turned on, and switches S21 and S24 in HB2 are turned on. On the secondary side, switches So2 and So3 are turned on. During the dead time, switch S22 is turned off and its parasitic capacitance is charged, while the parasitic capacitance of switch S21 is discharged until the drain-source voltage of switch S21 reaches zero, thus achieving ZVS of switch S21. At this stage, the inductor current increases linearly.
$$ i_{Lr}(t_3) = i_{Lr}(t_2) + \frac{n U_{p1} + n U_{p2} + U_o}{L_r}\,(t_3 - t_2) \tag{3} $$
Interval 3 [t3–t4]: During this time, on the primary side, switches S11 and S14 in HB1 are turned on and switches S21 and S24 in HB2 are turned on. On the secondary side, switches So1 and So4 are turned on. During the dead time, switches So2 and So3 are turned off and their parasitic capacitances are charged. At the same time, the parasitic capacitances of switches So1 and So4 are discharged until their drain-source voltages reach zero, thus achieving ZVS of switches So1 and So4. At this stage, the inductor current is positive.
$$ i_{Lr}(t_4) = \frac{n U_{p1} + n U_{p2} - U_o}{L_r}\,(t_4 - t_3) + i_{Lr}(t_3) \tag{4} $$
Interval 4 [t4–t5]: During this time, on the primary side, switches S11 and S14 in HB1 are turned on and switches S21 and S23 in HB2 are turned on. On the secondary side, switches So1 and So4 are turned on. During the dead time, switch S24 is turned off and its parasitic capacitance is charged. At the same time, the parasitic capacitance of switch S23 is discharged until the drain-source voltage of switch S23 reaches zero, thus achieving ZVS of switch S23. At this stage, the inductor current is positive.
$$ i_{Lr}(t_5) = \frac{n U_{p1} - U_o}{L_r}\,(t_5 - t_4) + i_{Lr}(t_4) \tag{5} $$
Interval 5 [t5–t6]: During this time, on the primary side, switches S11 and S14 in HB1 are turned on and switches S21 and S23 in HB2 are turned on. On the secondary side, switches So1 and So4 are turned on. At this stage, the inductor current is positive.
$$ i_{Lr}(t_6) = \frac{n U_{p1} - U_o}{L_r}\,(t_6 - t_5) + i_{Lr}(t_5) \tag{6} $$
Interval 6 [t6–t7]: During this time, on the primary side, switches S11 and S13 in HB1 are turned on, and switches S21 and S23 in HB2 are turned on. On the secondary side, switches So1 and So4 are turned on. During the dead time, switch S14 is turned off and its parasitic capacitance is charged. At the same time, the parasitic capacitance of switch S13 is discharged until the drain-source voltage of switch S13 reaches zero, thereby achieving ZVS of switch S13. At this stage, the inductor current is positive.
$$ i_{Lr}(t_7) = -\frac{U_o}{L_r}\,(t_7 - t_6) + i_{Lr}(t_6) \tag{7} $$
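The interval-by-interval update in Equations (2)–(7) amounts to adding a slope multiplied by a duration to the previous current value. The following Python sketch illustrates this under the ideal-switch and piecewise-linear assumptions stated above. The function name `propagate_current` and all numerical values are illustrative assumptions and are not taken from the prototype.

```python
def propagate_current(i_start, intervals, L_r):
    """Return the inductor current at the boundaries of each interval.

    intervals: list of (v_L, dt) pairs, where v_L is the voltage applied
    across L_r during the interval and dt is the interval duration.
    """
    currents = [i_start]
    for v_L, dt in intervals:
        # A constant voltage across L_r produces a linear current change.
        currents.append(currents[-1] + v_L / L_r * dt)
    return currents


# Hypothetical values, chosen only to exercise the function.
n, U_p1, U_p2, U_o, L_r = 1.0, 50.0, 50.0, 100.0, 50e-6
intervals = [
    (n * U_p1 + U_o, 1e-6),             # interval 1: HB2 at zero, output bridge reversed
    (n * U_p1 + n * U_p2 + U_o, 1e-6),  # interval 2: both primary bridges active
    (n * U_p1 + n * U_p2 - U_o, 2e-6),  # interval 3: output bridge commutated
]
currents = propagate_current(-5.0, intervals, L_r)
print(currents)
```

With these assumed numbers the current starts negative and crosses zero during interval 2, which matches the qualitative description of the intervals.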
When a short circuit occurs in the second battery pack, as shown in Figure 5, the faulty battery pack can be quickly isolated by simultaneously turning on switches S21 and S23 and turning off switches S22 and S24 (S21/S23 = 1, S22/S24 = 0), as shown in Figure 4, thereby ensuring reliable system operation.
Based on the modulation waveform diagram, the TAB converter operates in six states during half a switching cycle (t1~t7). Without loss of generality, the operation states of HB1–HB2 are given in Figure 6. Each primary side full bridge (HBi), together with the secondary side full bridge (HBo), is regarded as a dual active bridge.
For ease of analysis, in Formula (8), the output current of the TAB converter, the total output power, and the output power of each battery pack are normalized using the reference values Inorm, Pnorm, and Pi,norm, respectively.
$$ I_{norm} = \frac{U_o}{4 f L_r}, \quad P_{norm} = \frac{U_o^2}{4 f L_r}, \quad P_{i,norm} = \frac{U_o U_i}{4 f L_r} \tag{8} $$
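As a quick numerical illustration of the reference values in Formula (8), the sketch below evaluates them in Python. The function name `base_values` and the operating point are assumptions made only for this example.

```python
def base_values(U_o, U_i, f, L_r):
    """Reference values used for normalization, following Formula (8)."""
    denom = 4.0 * f * L_r
    I_norm = U_o / denom
    P_norm = U_o ** 2 / denom
    P_i_norm = U_o * U_i / denom
    return I_norm, P_norm, P_i_norm


# Hypothetical operating point: 100 V output, 50 V pack, 50 kHz, 50 uH.
I_norm, P_norm, P_i_norm = base_values(U_o=100.0, U_i=50.0, f=50e3, L_r=50e-6)
print(I_norm, P_norm, P_i_norm)
```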
Without loss of generality, we consider the scenario where m > 1. By combining Equations (2)–(8), the inductor current iLr,norm at t1~t7 is given as:
i L r , n o r m ( t 1 ) = ( 2 n d 1 ) m 1 + ( 2 n d 1 ) m 2 + 4 d 1 1 2 γ π i L r , m o r m ( t 2 ) = ( 2 n d 2 ) m 2 + ( 2 n d 1 ) m 1 + 2 d 1 + 2 d 2 1 2 γ π i L r , m o r m ( t 3 ) = ( 2 n d 2 ) m 2 + ( 2 n d 1 ) m 1 + 1 2 γ π + 2 d 1 2 d 2 i L r , m o r m ( t 4 ) = ( 2 n d 1 ) m 2 + ( 2 n d 1 ) m 1 + 1 2 γ π i L r , m o r m ( t 5 ) = ( 2 n γ π 2 n d 1 ) m 2 + ( 2 n γ π 2 n d 1 ) m 1 + 1 i L r , m o r m ( t 6 ) = ( 2 n d 1 ) m 1 + ( 2 n d 1 ) m 2 + 1 4 d 1 + 2 γ π i L r , m o r m ( t 7 ) = ( 2 n d 2 ) m 2 + ( 2 n d 1 ) m 1 2 d 1 + 2 d 2 1 2 γ π
By integrating the inductor current over half a switching cycle (t1~t7), the average output DC current Io-avg can be derived from (10).
$$ I_{o\text{-}avg} = \frac{2}{T} \int_{t_1}^{t_7} i_{Lr}\,dt \tag{10} $$
According to the average inductor current in half a switching cycle in Formula (10), the total output power Po of TAB after TPS modulation is the product of the average output current Io-avg and the output voltage Uo, as shown in Formula (11). The output power of a single battery pack is Pi, which is the product of the average output current Io-avg and the battery pack voltage Ui, as shown in Formula (12).
$$ P_o = U_o I_{o\text{-}avg} = \frac{2 U_o}{T} \int_{t_1}^{t_7} i_{Lr}\,dt \tag{11} $$
$$ P_i = U_i I_{o\text{-}avg} = \frac{2 U_i}{T} \int_{t_1}^{t_7} i_{Lr}\,dt \tag{12} $$
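Because the inductor current is assumed piecewise linear, the integral in Formula (10) can be evaluated exactly with the trapezoidal rule over the switching instants. The Python sketch below shows this; the function name `average_output_current` and the sample breakpoints are hypothetical.

```python
def average_output_current(times, currents, T):
    """Evaluate (2/T) * integral of a piecewise-linear current.

    times and currents hold the breakpoints over half a switching cycle;
    T is the full switching period.
    """
    area = 0.0
    for k in range(len(times) - 1):
        # Trapezoid area is exact for a linear segment.
        area += 0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
    return 2.0 / T * area


# Hypothetical breakpoints over a half cycle of a period T = 4 (arbitrary units).
I_avg = average_output_current([0.0, 1.0, 2.0], [0.0, 2.0, 2.0], T=4.0)
P_o = 100.0 * I_avg  # output power for an assumed U_o of 100 V
print(I_avg, P_o)
```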
Thus, the output power of each battery pack and the total output power of the TAB converter can be expressed by Equations (13) and (14).
P i n , i , n o r m P i P i , n o r m = 4 n d i 2 m i + 2 n d i m i + 4 n d 1 γ π m i 2 n γ 2 π 2 m i
P o , n o r m = P o P n o r m = i = 1 2 4 n d i 2 m i + 2 n d i m i + 4 n d 1 γ π m i 2 n γ 2 π 2 m i

2.3. ZVS Constraints

To achieve ZVS for all power switches Si1~Si4 and So1~So4, the parasitic capacitance of each switch must be fully discharged before turn-on. Therefore, the series inductor Lr must store sufficient energy to charge and discharge the parasitic capacitances through resonance. Subsequently, the anti-parallel diodes of the power switches conduct naturally, bringing the drain-source voltage close to zero. If the power switches are turned on at this point, ZVS is achieved.
To ensure ZVS, the energy stored in the leakage inductor Lr at the switching instant must be sufficient to complete the commutation of the bridge arm. To be specific, this energy must exceed the energy required to simultaneously charge the output capacitance of one power switch and discharge the output capacitance of the other switch within the same bridge arm. Satisfying this condition allows the drain-source voltage of the incoming switch to fall to zero before the gate signal is applied, thereby eliminating turn-on losses. Accordingly, the following inequality must be satisfied:
$$ \frac{1}{2} L_r i_{Lr}^2 \ge \frac{1}{2} C_{in\text{-}par} U_{i\text{-}DS}^2 + \frac{1}{2} C_{out\text{-}par} U_{o\text{-}DS}^2 \tag{15} $$
In this expression, Cin-par and Cout-par represent the parasitic capacitances of the primary side and secondary side switches, respectively, while Ui-DS and Uo-DS denote the drain-to-source voltages. For simplified analysis, ZVS is approximated by ensuring that a current flows from source to drain before turn-on. Under this approximation, the boundary condition for ZVS is defined as iDS ≤ 0, where iDS represents the current flowing through the drain and source of the MOSFET. Accordingly, based on the waveforms shown in Figure 4, the ZVS constraints for switches Si1~Si4 and So1~So4 can be summarized as follows:
$$ i_{Lr,norm}(t_2) \le 0, \quad i_{Lr,norm}(t_3) \ge 0 \tag{16} $$
Note that critical ZVS turn-on is achieved at the boundary condition where the current defined in (16) becomes exactly zero. Although this operating point may introduce minor turn-on losses due to incomplete discharge of the parasitic capacitance before gate activation, it provides the advantage of enabling zero current turn-off for the complementary switch within the same bridge arm.
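The energy condition for ZVS can be checked numerically by comparing the energy stored in Lr at the switching instant with the energy needed to commutate the parasitic capacitances. The Python sketch below is a minimal illustration of that comparison; the function name `zvs_energy_ok` and the component values are assumptions, not measured data.

```python
def zvs_energy_ok(L_r, i_Lr, C_in_par, U_i_DS, C_out_par, U_o_DS):
    """Return True if the inductor energy covers the capacitive commutation energy."""
    stored = 0.5 * L_r * i_Lr ** 2
    required = 0.5 * C_in_par * U_i_DS ** 2 + 0.5 * C_out_par * U_o_DS ** 2
    return stored >= required


# Hypothetical values: 50 uH inductor, 200 pF parasitic capacitances, 100 V devices.
ok_at_2A = zvs_energy_ok(50e-6, 2.0, 200e-12, 100.0, 200e-12, 100.0)
ok_at_100mA = zvs_energy_ok(50e-6, 0.1, 200e-12, 100.0, 200e-12, 100.0)
print(ok_at_2A, ok_at_100mA)
```

With these assumed values the condition holds at 2 A and fails at 0.1 A, which illustrates why light-load operation tends to lose ZVS.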

2.4. Stability Modeling and Analysis

To analyze the dynamic behavior of the TAB converter under TPS modulation, a small-signal model is derived from the power flow characteristics in Equation (14). Each variable is decomposed into a steady-state value plus a small perturbation:
$$ \gamma = \gamma_o + \hat{\gamma}, \quad P = P_o + \hat{P}_i, \quad U_i = U_{io} + \hat{U}_i \tag{17} $$
where γo, Uio, and Po are the global phase shift angle, battery pack voltage, and output power in steady state, respectively. $\hat{\gamma}$, $\hat{U}_i$, and $\hat{P}_i$ are the corresponding perturbations.
Applying first-order Taylor expansion around the equilibrium point:
$$ \hat{P}_{o,norm} = \left. \frac{\partial P_{o,norm}}{\partial \gamma} \right|_{\gamma_o} \hat{\gamma} \tag{18} $$
By linearizing the power expression around the equilibrium point, the small-signal power variation can be expressed in compact form as:
$$ \hat{P} = K_{\gamma}\,\hat{\gamma} \tag{19} $$
where the power gain coefficient Kγ is:
$$ K_{\gamma} = \frac{n \sum_{i=1}^{N} U_i}{8 f L_r} \tag{20} $$
This result indicates that, near the operating point, the TAB converter exhibits a linear proportional relationship between global phase shift perturbation and output power variation. The magnitude of this gain is determined by voltage level, leakage inductance, and switching frequency.
The dynamic response of the output voltage can be described by the relationship between the capacitor current and voltage. Based on Kirchhoff’s current law and Kirchhoff’s voltage law, the dynamic response expression of the output voltage to the modulation signal can be obtained:
$$ C_o \frac{d\hat{U}_o}{dt} = \frac{\hat{P}_o}{U_o} - \frac{\hat{U}_o}{R_o} = \frac{\hat{\gamma}}{U_o} K_{\gamma} - \frac{\hat{U}_o}{R_o} \tag{21} $$
By applying the Laplace transform to the dynamic equation obtained above, the open-loop transfer function can be derived:
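The first-order behavior of the output-voltage dynamics can be illustrated with a simple forward-Euler integration. The Python sketch below assumes that the capacitor current equals the injected small-signal current minus the load current, and uses hypothetical parameter values; the function names `k_gamma` and `output_voltage_step` are introduced only for this example.

```python
def k_gamma(n, U_list, f, L_r):
    """Power gain coefficient K_gamma = n * sum(U_i) / (8 * f * L_r)."""
    return n * sum(U_list) / (8.0 * f * L_r)


def output_voltage_step(gamma_hat, K, U_o, R_o, C_o, dt, steps):
    """Forward-Euler integration of C_o * dU/dt = gamma_hat * K / U_o - U / R_o."""
    u_hat = 0.0
    for _ in range(steps):
        du_dt = (gamma_hat * K / U_o - u_hat / R_o) / C_o
        u_hat += du_dt * dt
    return u_hat


# Hypothetical parameters, not taken from the prototype.
K = k_gamma(n=1.0, U_list=[50.0, 50.0], f=50e3, L_r=50e-6)
u_final = output_voltage_step(gamma_hat=0.1, K=K, U_o=100.0, R_o=10.0,
                              C_o=100e-6, dt=1e-6, steps=20000)
print(K, u_final)
```

The simulated perturbation settles at gamma_hat * K * R_o / U_o with time constant R_o * C_o, which is the single-pole response expected from this equation.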
G p ( s ) = n U o R o 1 2 γ π i = 1 2 U i 8 f L r R o C o s + 1
Based on the Bode plot shown in Figure 7, the control performance under normal operating conditions was evaluated.
The open-loop transfer function is given in Equation (22). After closed-loop PI compensation, the low-frequency gain increases, effectively reducing the steady-state error. The cutoff frequency is approximately 270 Hz, achieving a good balance between dynamic response speed and high-frequency noise attenuation. Furthermore, the phase margin is designed to be 48°, indicating that the system possesses sufficient stability and damping characteristics.
In addition, the measured frequency response agrees well with the analytical model, verifying the accuracy of the second-order model. These results confirm that the proposed control design can achieve the required stability and dynamic performance.

3. Losses Distribution

The primary goal of this study is to enhance the efficiency of the TAB converter, necessitating a rigorous quantification of individual loss components. Total power dissipation is primarily categorized into semiconductor losses, magnetic component losses, and residual losses. Magnetic losses encompass conduction losses and excitation losses in the transformer Tri and the inductor Lr. In addition, semiconductor losses include conduction losses, switching transition losses, and gate drive power consumption. Residual losses are mainly due to temperature-dependent variations in the conduction resistance of MOSFET modules. A detailed breakdown of power loss analysis is provided in the following sections.

3.1. Losses Model of Power Switches

(1) Switching Losses: In general, the evaluation of switching losses depends on the operating conditions of power semiconductors, namely soft switching or hard switching. Accordingly, total switching loss comprises turn-on and turn-off losses. Specifically, the expressions for the turn-on loss and turn-off loss of MOSFETs are as follows:
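$$ P_{on} = f_s \left[ \frac{1}{2} V_{ds} I_{on} \left( t_{ri} + t_{f\_on} \right) + \frac{1}{2} C_{ds} V_{ds}^2 \right] \tag{23} $$
$$ P_{off} = f_s \cdot \frac{1}{2} V_{ds} I_{off} \left( t_{rv} + t_{f\_off} \right) \tag{24} $$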
where Pon denotes the turn-on losses, Poff indicates the turn-off losses, fs represents the switching frequency, Vds denotes the drain-source voltage, Cds represents the parasitic capacitance, Ion denotes the turn-on current, and Ioff is the turn-off current. Moreover, tf_on denotes the turn-on delay time, tri represents the current rise time, tf_off signifies the turn-off delay time, and trv denotes the voltage fall time of the MOSFET, respectively. Therefore, the total switching loss is as follows:
$$ P_{total\_sw} = P_{on} + P_{off} \tag{25} $$
Specifically, switching losses are negligible under soft switching conditions, with turn-on losses effectively eliminated under ZVS and turn-off losses under zero current switching.
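The hard-switching loss expressions above translate directly into a few lines of Python. The sketch below assumes that the switching frequency multiplies both the overlap term and the capacitive term of the turn-on loss; the function names and device values are hypothetical.

```python
def turn_on_loss(f_s, V_ds, I_on, t_ri, t_f_on, C_ds):
    """Turn-on loss: voltage-current overlap plus discharge of C_ds."""
    return f_s * (0.5 * V_ds * I_on * (t_ri + t_f_on) + 0.5 * C_ds * V_ds ** 2)


def turn_off_loss(f_s, V_ds, I_off, t_rv, t_f_off):
    """Turn-off loss from the voltage-current overlap."""
    return f_s * 0.5 * V_ds * I_off * (t_rv + t_f_off)


# Hypothetical device data at 50 kHz and 100 V.
P_on = turn_on_loss(50e3, 100.0, 2.0, 10e-9, 10e-9, 100e-12)
P_off = turn_off_loss(50e3, 100.0, 4.0, 10e-9, 15e-9)
P_total_sw = P_on + P_off
print(P_on, P_off, P_total_sw)
```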
(2) Conduction Losses: The conduction losses in power semiconductors are primarily determined by the circulating RMS current and the device’s on-state resistance (Rds(on)). Specifically, it is critical to account for the temperature dependency of on-state resistance, which typically increases with junction temperature (Tj), thereby affecting overall efficiency at high power levels.
$$ P_{con} = I_{rms}^2\, R_{ds(on)}(T_j) \tag{26} $$
Given that each power switch operates with a 50% duty cycle, it conducts for exactly half of the switching period. Specifically, the total loss corresponds to the sum of power dissipation across the on-state resistance of all active semiconductor devices in both primary and secondary bridges. Accordingly, the cumulative conduction losses Ptotal_con of all switches admit an analytical expression as follows:
$$ P_{total\_con} = 8 R_{ds\_in} \left( \frac{i_{Lr\_rms}}{\sqrt{2}} \right)^2 + 4 R_{ds\_out} \left( \frac{n\, i_{Lr\_rms}}{\sqrt{2}} \right)^2 \tag{27} $$
where Rds_in represents the on-resistance of the power switches (Si1~Si4) and Rds_out denotes the on-resistance of the power switches (So1~So4).
(3) Gate Drive Losses: The power dissipation in the gate drive circuitry stems from the repeated charging and discharging of the MOSFET’s internal gate capacitance during each switching cycle. This loss mainly depends on the total gate charge Qg required for device state transitions and the magnitude of the applied gate source voltage Vgs. Accordingly, the gate driver loss can be calculated as follows:
$$ P_{gate} = Q_g V_{gs} f_s \tag{28} $$
where Vgs represents the gate driver voltage and Qg denotes the total gate charge of the MOSFET.
For the TAB converter with 12 power switches, the total gate drive power consumption (Ptotal_gate) is proportional to the switching frequency and can be analytically expressed as:
$$ P_{total\_gate} = 8 Q_{g\_in} V_{gs\_in} f_s + 4 Q_{g\_out} V_{gs\_out} f_s \tag{29} $$
where Vgs_in denotes the gate driver voltage and Qg_in represents the total gate charge of the power switches Si1~Si4. Similarly, Vgs_out denotes the gate driver voltage and Qg_out signifies the total gate charge of the power switches So1~So4.
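The total gate drive loss for the eight primary and four secondary switches can be evaluated as in the Python sketch below. The function name `total_gate_drive_loss` and the gate charge and drive voltage values are assumptions used only for illustration.

```python
def total_gate_drive_loss(Q_g_in, V_gs_in, Q_g_out, V_gs_out, f_s,
                          n_in=8, n_out=4):
    """Gate drive loss summed over n_in primary and n_out secondary switches."""
    return n_in * Q_g_in * V_gs_in * f_s + n_out * Q_g_out * V_gs_out * f_s


# Hypothetical values: 50 nC gate charge, 15 V drive, 50 kHz switching.
P_total_gate = total_gate_drive_loss(50e-9, 15.0, 50e-9, 15.0, 50e3)
print(P_total_gate)
```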
Thus, the losses of the total power switches are as follows:
$$ P_{total\_sw\_loss} = P_{total\_sw} + P_{total\_con} + P_{total\_gate} \tag{30} $$

3.2. Power Losses Model in Magnetic Components

The total power dissipation in magnetic components, including the high-frequency transformer Tri and resonant inductor Lr, comprises winding losses and excitation losses. Under the assumption of constant winding resistance and neglecting frequency-dependent skin and proximity effects, copper losses Pcopper of these magnetic components are analytically defined based on Joule heating as follows:
$$ P_{copper} = I_{m\_rms}^2 R_m \tag{31} $$
To simplify the electromagnetic analysis of the magnetic cores, the induction waveforms in Tri and Lr are assumed to be dominated by their fundamental sinusoidal components. Consequently, the classical Steinmetz Equation serves to estimate the core power dissipation, Pcore, which admits the following analytical expression (32):
$$ P_{core} = k f_s^{\alpha} \hat{B}^{\beta} V_e \tag{32} $$
where k, α, and β are empirical Steinmetz coefficients that describe the hysteresis and eddy current characteristics of a specific core material. These parameters are typically obtained from manufacturer loss curves or datasheets. $\hat{B}$ is the peak magnetic flux density within the core and depends on the applied voltage excitation and the effective cross-sectional area of the core. Ve is the effective magnetic volume of the core and reflects the total volume of material subjected to magnetic flux. Accordingly, the power loss of the transformer Tri can be expressed as follows:
$$ P_{Tr\_loss} = I_{Lk\_rms}^2 \left( R_{Tri\_p} + n^2 R_{Tri\_s} \right) + k_{Tri} f_s^{\alpha_{Tri}} \hat{B}_{Tri}^{\beta_{Tri}} \tag{33} $$
where RTri_p denotes the winding resistance of Tri on the primary side and RTri_s denotes the total winding resistance of the transformer Tri on the secondary side.
$$ \hat{B}_{Tri} \approx \frac{2 U_i}{\pi^2 N_{Tri\_p} A_{Tri}} \tag{34} $$
where NTri_p denotes the number of winding turns of each transformer on the primary side, and ATri represents the effective magnetic cross-sectional area of each transformer.
The losses of the series inductor Lr can be calculated as:
$$ P_{Lr\_loss} = I_{Lk\_rms}^2 R_{Lr\_p} + k_{Lr} f_s^{\alpha_{Lr}} \hat{B}_{Lr}^{\beta_{Lr}} \tag{35} $$
where RLr_p denotes the winding resistance of Lr. The peak flux density $\hat{B}_{Lr}$ can be estimated as:
$$ \hat{B}_{Lr} \approx \frac{\mu_{eff}\, \mu_0\, i_{Lr\_max}}{l_{Lr}} \tag{36} $$
where μeff denotes the effective relative permeability of the magnetic core and accounts for the presence of a physical air gap introduced to prevent magnetic saturation; μ0 represents the permeability of free space; iLr_max signifies the peak instantaneous current flowing through the resonant inductor Lr, which is utilized to determine the maximum magnetic flux density; and lLr refers to the effective magnetic circuit length of the iron core utilized for the series inductor Lr.
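The copper loss and the Steinmetz core loss described in this subsection can be sketched in Python as follows. The function names are introduced for illustration, and the coefficients below are placeholder numbers rather than datasheet values for any real core material.

```python
def copper_loss(I_rms, R_m):
    """Joule heating in a winding with constant resistance R_m."""
    return I_rms ** 2 * R_m


def steinmetz_core_loss(k, f_s, alpha, B_peak, beta, V_e):
    """Classical Steinmetz estimate: k * f_s**alpha * B_peak**beta * V_e."""
    return k * f_s ** alpha * B_peak ** beta * V_e


# Placeholder numbers chosen so the arithmetic is easy to follow.
P_cu = copper_loss(I_rms=2.0, R_m=0.05)
P_core = steinmetz_core_loss(k=1.0, f_s=4.0, alpha=0.5, B_peak=0.1, beta=2.0, V_e=10.0)
print(P_cu, P_core)
```

In practice k, alpha, and beta would be fitted to the loss curves of the chosen core material, so the output of this sketch carries no physical meaning by itself.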
Finally, the major power losses can be calculated as:
$$ P_{all\_loss} = P_{total\_sw} + P_{total\_con} + P_{total\_gate} + P_{Tr\_loss} + P_{Lr\_loss} \tag{37} $$
The efficiency of the TAB converter can be calculated as:
$$ \eta = \frac{P_o}{P_o + P_{all\_loss}} \tag{38} $$
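Once the individual loss components are available, the efficiency follows from the output power and the summed losses. The Python sketch below shows the calculation with a hypothetical loss breakdown; the dictionary keys and numbers are assumptions and do not represent measured results.

```python
def efficiency(P_o, losses):
    """Efficiency from output power and a dict of loss components (all in watts)."""
    P_all_loss = sum(losses.values())
    return P_o / (P_o + P_all_loss)


# Hypothetical loss breakdown for a 970 W output.
losses = {
    "switching": 5.0,
    "conduction": 12.0,
    "gate": 1.0,
    "transformer": 8.0,
    "inductor": 4.0,
}
eta = efficiency(970.0, losses)
print(eta)
```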

4. Algorithm Optimization of TAB Converter Under TPS Modulation

In this section, the DDPG algorithm is applied to enhance the conversion efficiency of the TAB converter under ZVS constraints. The optimization focuses on identifying suitable phase shift ratios d1, d2, and γ to minimize total power losses.

4.1. DDPG Algorithm

During the optimization control phase of the TAB converter, the main goal is to select an optimal control action that enables operation with minimum power dissipation based on real-time state variables U1, U2, and Po. From a theoretical standpoint, this efficiency optimization problem can be formulated as a Markov decision process.
DDPG, an advanced DRL framework, is employed to derive an optimal control strategy by solving this Markov decision process. In the DDPG algorithm, the policy function maps the state vector $s_t = [U_1, U_2, P_o]^T$ to the corresponding optimal action vector $a_t = [d_1, d_2, \gamma]^T$.
Meanwhile, the critic network evaluates each state-action pair to approximate the corresponding action-value function $Q(s, a)$. The operational workflow of the proposed DDPG-based control methodology is depicted in Figure 8.
In the DDPG framework, the actor function α t = μ ( s t | θ μ ) defines the deterministic policy parameterized by θ μ and directly maps the state st to a specific control action αt. Concurrently, the critic function Q θ Q s t , a t θ μ approximates the action-value function with parameters θ Q . Based on the Bellman Equation, the action-value function Q θ Q ( s t , a t ) represents the expected cumulative return and admits the following formulation.
Q θ Q s t , a t = E s t + 1 E r s t , a t + γ Q θ Q s t + 1 , μ θ s t + 1
In this formulation, Q θ Q ( s t , μ θ ( s t ) represents the expected cumulative return obtained by selecting the optimal action in state St under the given policy π . To ensure accurate value estimation, the critic network is updated through minimization of a loss function L θ Q that represents the mean squared Bellman error.
The loss function admits the following mathematical expression:
$$L(\theta^{Q}) = \mathbb{E}_{\mu'}\left[ \big( Q(s_t, a_t \mid \theta^{Q}) - y_t \big)^{2} \right]$$
where $\mu'$ denotes the target policy introduced to stabilize the learning process, while $y_t$ represents the target value produced by the target critic and target actor networks. It is important to note that $y_t$ is inherently dependent on the parameters $\theta^{Q'}$ of the target critic network. The target action-value $y_t$ can be expressed as follows:
$$y_t = r(s_t, a_t) + \gamma_d\, Q'\big(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$$
Moreover, the parameters of the actor network are updated using the deterministic policy gradient $\nabla_{\theta^{\mu}} J$. The policy gradient can be analytically derived using the chain rule as follows:
$$\nabla_{\theta^{\mu}} J = \mathbb{E}_{\mu'}\left[ \nabla_{\theta^{\mu}} Q(s, a \mid \theta^{Q}) \big|_{s = s_t,\, a = \mu(s_t \mid \theta^{\mu})} \right] = \mathbb{E}_{\mu'}\left[ \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_t,\, a = \mu(s_t)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_t} \right]$$
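The chain-rule factorization of the policy gradient can be checked numerically on a toy problem. The sketch below uses a scalar linear actor and a quadratic critic, both hypothetical stand-ins chosen only so that the derivatives are easy to write by hand, and compares the product of the two partial derivatives with a central finite difference of the composed function.

```python
def mu(s, theta):
    """Toy deterministic actor: a = theta * s (hypothetical)."""
    return theta * s


def q(s, a):
    """Toy critic: largest when a = 2 * s (hypothetical)."""
    return -(a - 2.0 * s) ** 2


def policy_gradient(s, theta):
    """Chain rule: dQ/da evaluated at a = mu(s), times dmu/dtheta."""
    a = mu(s, theta)
    dq_da = -2.0 * (a - 2.0 * s)
    dmu_dtheta = s
    return dq_da * dmu_dtheta


def finite_difference(s, theta, eps=1e-6):
    """Central difference of J(theta) = Q(s, mu(s | theta))."""
    upper = q(s, mu(s, theta + eps))
    lower = q(s, mu(s, theta - eps))
    return (upper - lower) / (2.0 * eps)


analytic = policy_gradient(s=1.5, theta=0.5)
numeric = finite_difference(s=1.5, theta=0.5)
print(analytic, numeric)
```

For s = 1.5 and theta = 0.5 the analytic gradient is 6.75, and the finite-difference estimate agrees to within rounding error. In a full DDPG implementation the same two factors are obtained by backpropagating through the critic and actor networks rather than by hand.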
During the update process of the action-value function $Q(s_t, a_t \mid \theta^{Q})$, the target value $y_t$ changes continuously, which may undermine the convergence of the critic network. To mitigate this instability, target networks are introduced. Instead of directly copying the primary network weights, a soft update strategy is adopted.
Specifically, a target critic network $Q'(s_t, a_t \mid \theta^{Q'})$ and a target actor network $\mu'(s_t \mid \theta^{\mu'})$ are maintained, where the target parameters track the primary network weights through a smoothing factor. The gradual transfer of these smoothed weights from the primary networks to the target networks, as illustrated in Figure 8, can be mathematically expressed as follows:
$$\text{soft update } (\tau = 0.001): \quad \begin{cases} \theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'} \\ \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \end{cases}$$
where $\tau$ denotes the soft-update rate, which determines the tracking speed of the target networks. Assigning a small value to this parameter constrains the update speed of the target values $y_t$ and significantly improves the stability and convergence of the training process. This mechanism suppresses the oscillatory behavior commonly associated with frequent parameter updates in DRL.
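The soft-update rule above can be sketched in a few lines. In this illustrative snippet the network parameters are represented as flat lists of floats, which is a simplification; the function name and the example weights are hypothetical.

```python
def soft_update(target_params, source_params, tau):
    """Return tau * source + (1 - tau) * target for each parameter."""
    return [tau * s + (1.0 - tau) * t
            for s, t in zip(source_params, target_params)]


# Hypothetical weights: the target network slowly tracks the primary network.
primary = [1.0, 0.0]
target = [0.0, 1.0]
for _ in range(3):
    target = soft_update(target, primary, tau=0.001)
print(target)
```

With a small tau, each call moves the target weights only a fraction of the way toward the primary weights, which is what keeps the target value from changing abruptly between critic updates.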
In the DDPG algorithm, an exploration strategy supports effective exploration within the continuous action domain. To enhance exploration capability, Gaussian noise $\mathcal{N}(\mu(s_t \mid \theta_t^{\mu}), \sigma)$ is added to the deterministic action policy $\mu$, thereby forming a new exploratory policy $\mu'$.
This process can be mathematically expressed as follows:
$$\mu'(s_t) = \mu(s_t \mid \theta_t^{\mu}) + \mathcal{N}\big(\mu(s_t \mid \theta_t^{\mu}), \sigma\big)$$
where $\mathcal{N}(\mu(s_t \mid \theta_t^{\mu}), \sigma)$ is sampled from the noise process and applied to the environment during action execution. The parameter $\sigma$ remains constant during the initial stage and gradually decreases at a fixed rate once the replay memory reaches its maximum capacity.
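A minimal sketch of this exploration scheme is given below. It makes several assumptions that are not stated in the text: the noise is taken as zero-mean Gaussian added to each action component, the perturbed action is clipped to a hypothetical range of [0, 1], and the "fixed rate" of decrease is modeled as a geometric decay with a placeholder factor of 0.999.

```python
import random


def explore(action, sigma, low=0.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to each component, then clip (assumed bounds)."""
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]


def next_sigma(sigma, buffer_len, capacity, decay=0.999):
    """Keep sigma constant until the replay memory is full, then decay it.

    The geometric decay factor is a hypothetical choice.
    """
    return sigma * decay if buffer_len >= capacity else sigma


rng = random.Random(0)
sigma = 0.01                      # initial noise parameter
noisy = explore([0.40, 0.40, 0.20], sigma, rng=rng)
sigma = next_sigma(sigma, buffer_len=50_000, capacity=50_000)
print(noisy, sigma)
```

Passing the random generator explicitly makes the exploration reproducible, which is convenient when comparing training runs with different hyperparameters.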
To obtain the minimum operating loss modulation strategy for TAB converters using the DDPG algorithm, the objective function f(x) is defined as follows:
$$f(x) = P_{all\_loss} = \min\left( P_{total\_sw} + P_{total\_con} + P_{total\_gate} + P_{Tr\_loss} + P_{Lr\_loss} \right)$$

4.2. Reward Function for Minimizing the Power Losses

The total power dissipation of the TAB converter is denoted as Pall_loss, as defined in Equation (29). Additionally, the optimization problem includes a nonlinear equality constraint Po = Por, where Por denotes the target output power and Po represents the actual output power measured during the training process. To guide the operating point toward the desired power level, a penalty function φ(d1, d2, γ) is introduced to quantify the deviation between the actual and target output power.
$$\varphi(d_1, d_2, \gamma) = (P_{or} - P_o)^{2}$$
The minimum value of zero for φ(d1, d2, γ) is achieved when Por = Po. Thus, the reward function F(d1, d2, γ) consists of a fitness function Ploss and a penalty function φ(d1, d2, γ), and is given by:
$$F(d_1, d_2, \gamma) = \omega P_{loss}(d_1, d_2, \gamma) + \alpha\, \varphi(d_1, d_2, \gamma)$$
where ω and α denote the penalty coefficients. The value of α is set to 150 to amplify the impact of the power tracking error within the penalty function. Furthermore, to achieve ZVS, the coefficient ω is dynamically assigned: ω = 1.5 when all power switches Si1 to Si4 and So1 to So4 satisfy the soft switching condition defined in Equation (14). Otherwise, ω = 15 to penalize the loss of ZVS. The reward function is constructed so that its value increases as power losses and power tracking errors decrease. As a result, the objective of minimum power loss and zero power deviation under ZVS constraints is reformulated as a reward maximization problem F (d1, d2, γ).
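The weighting scheme described above can be sketched as follows. The penalty and the weighted sum follow the expressions in this subsection, with α = 150 and ω switching between 1.5 and 15. The final negation is an assumption introduced here so that a larger reward corresponds to lower loss and smaller tracking error, as the text describes; the function names and the boolean `zvs_ok` flag (standing in for the check of Equation (14)) are hypothetical.

```python
def penalty(p_target, p_actual):
    """Squared power tracking error: phi = (Por - Po)^2."""
    return (p_target - p_actual) ** 2


def weighted_cost(p_loss, p_target, p_actual, zvs_ok, alpha=150.0):
    """F = omega * P_loss + alpha * phi, with omega set by the ZVS condition."""
    omega = 1.5 if zvs_ok else 15.0
    return omega * p_loss + alpha * penalty(p_target, p_actual)


def reward(p_loss, p_target, p_actual, zvs_ok):
    """Assumed sign convention: the reward is the negative weighted cost."""
    return -weighted_cost(p_loss, p_target, p_actual, zvs_ok)


# Same loss and tracking error, with and without ZVS (placeholder numbers).
print(reward(10.0, 200.0, 199.0, zvs_ok=True))
print(reward(10.0, 200.0, 199.0, zvs_ok=False))
```

With these placeholder numbers, losing ZVS raises the loss weight tenfold, so the agent is steered toward phase shift combinations that keep all switches in the soft-switching region before it fine-tunes the loss itself.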

4.3. Training of the DDPG Algorithm

During each episode, the states (U1, U2, Po) are randomly sampled from the ranges specified in Table 2. In summary, the DDPG framework takes the TAB converter operating states (U1, U2, Po) as inputs, processes them through an actor–critic neural network under a reward-driven optimization mechanism during offline training, and outputs an optimized control policy that maps system states to the optimal TPS control parameters. Accordingly, the appropriate selection of critical hyperparameters significantly influences training performance. In this study, both the actor and critic networks employ a symmetric architecture, each comprising two hidden layers with 256 and 128 neurons, respectively. Hyperparameter optimization is carried out using the grid search method, with the training process spanning 10,000 episodes.
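To make the network dimensions concrete, the following sketch builds an actor with two hidden layers of 256 and 128 neurons and runs one forward pass in plain Python. Only the layer sizes come from the text. The ReLU hidden activations, the sigmoid output that bounds each action component to (0, 1), the weight initialization, and the normalization of the state by nominal values (48 V and 216 W) are all assumptions made for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small uniform random weights (assumed init)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def actor_forward(state, layers):
    """Map a normalized state [U1, U2, Po] to an action [d1, d2, gamma]."""
    h = state
    for layer in layers[:-1]:
        h = [max(0.0, v) for v in dense(h, layer)]       # ReLU (assumed)
    out = dense(h, layers[-1])
    return [1.0 / (1.0 + math.exp(-v)) for v in out]     # sigmoid (assumed)


rng = random.Random(0)
sizes = [3, 256, 128, 3]          # state -> 256 -> 128 -> action
actor = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
state = [48.0 / 48.0, 48.0 / 48.0, 100.0 / 216.0]   # normalized (assumed)
action = actor_forward(state, actor)
print(action)
```

A critic with the same hidden sizes would take the concatenated state and action (six inputs) and produce a single scalar value. In practice a deep learning framework would replace these hand-written loops.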
For the proposed DDPG-based optimization framework for the TAB converter, the input system states are defined as $s_t = [U_1, U_2, P_o]^T$, and the action vector to be optimized is $a_t = [d_1, d_2, \gamma]^T$. The detailed training process of the DDPG algorithm is presented step-by-step as follows:
  • Step 1: Randomly initialize the actor network $\theta^{\mu}$ and the critic network $\theta^{Q}$, and initialize their target networks.
  • Step 2: For each training episode, observe the initial system state $s_t$.
  • Step 3: In each training step, select the action $a_t = \mu(s_t \mid \theta^{\mu}) + n_t$ by adding exploration noise $n_t$ to the actor network output.
  • Step 4: Execute $a_t$, obtain the reward $r_t$ and the next state $s_{t+1}$, then store the experience tuple $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer.
  • Step 5: Sample a mini-batch of experience data and compute the target value.
  • Step 6: Update the critic network by minimizing the loss function shown in Equation (39).
  • Step 7: Update the actor network using the deterministic policy gradient in Equation (41).
  • Step 8: Soft update the target networks as shown in Equation (40) and repeat until convergence.
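Steps 4 to 6 above can be sketched with a small replay buffer and two helper functions. This is an illustrative outline rather than the training code used in the study: the networks are replaced by hypothetical constant-valued stand-ins, the batch size is a placeholder (it is not specified in the text), and the gradient-based updates of Steps 6 and 7 are omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)   # oldest entries are dropped first

    def store(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def td_targets(batch, target_actor, target_critic, gamma_d):
    """y = r + gamma_d * Q'(s', mu'(s')) for each sampled transition."""
    return [r + gamma_d * target_critic(s2, target_actor(s2))
            for (_, _, r, s2) in batch]


def critic_loss(batch, targets, critic):
    """Mean squared Bellman error over the mini-batch."""
    errors = [(critic(s, a) - y) ** 2
              for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


# Hypothetical stand-ins for the networks, used only to exercise the helpers.
def target_actor(state):
    return [0.5, 0.5, 0.5]


def target_critic(state, action):
    return -1.0


def critic(state, action):
    return -1.0


buffer = ReplayBuffer(capacity=50_000)
buffer.store([48.0, 48.0, 100.0], [0.4, 0.4, 0.2], -20.0, [48.0, 48.0, 100.0])
batch = buffer.sample(batch_size=1)
targets = td_targets(batch, target_actor, target_critic, gamma_d=0.98)
loss = critic_loss(batch, targets, critic)
print(targets, loss)
```

Sampling uniformly from the buffer breaks the correlation between consecutive transitions, which is the usual reason for storing experience before it is used to update the critic.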
Copper losses are calculated based on the winding resistance of the magnetic components. At the selected switching frequency (20 kHz), frequency-dependent effects such as the skin and proximity effects have a relatively small impact on the winding resistance. The additional resistance they introduce is therefore neglected, with minimal impact on the overall loss analysis. The main goal of the DDPG learning algorithm is to obtain the optimal control strategy with minimal power loss throughout the entire operating range. The key parameters are summarized in Table 3.

4.4. DDPG Algorithm Optimization Results

To ensure a fair and consistent comparison, all configurations utilize switches from Infineon and Microchip Technology. The voltage and current ratings of the selected MOSFETs, as detailed in Table 4, are set to 1.5 to 2 times their rated operating values to provide sufficient safety margins.
Table 5 compares the three configurations in terms of fault tolerance, ZVS performance, control complexity, and the number of key components.
In the series configuration, two transformers with a turns ratio of 1:1 are required. Using extended phase shift modulation to control the power of each individual port and the overall power transmission requires four degrees of freedom (DOF). In the parallel configuration, two transformers with a turns ratio of 1:2 are required. Using single phase shift modulation to independently adjust the power of each port requires two DOF, but the output voltage range is limited. Because the parallel topology lacks fault isolation capability, a port failure affects its power output.
A comparative analysis is conducted among traditional series architecture (Figure 2a), parallel architecture (Figure 2b), TAB (Figure 3), and TAB topologies optimized using the DDPG algorithm, focusing on current stress, switching losses, and magnetic component losses.
Figure 9 illustrates the comparison of current stress values and different power losses with Po when Ui_parallel = 48 V, Ui_series = 48 V, Ui_TAB = 48 V, and Po changes from 0 W to 216 W. Specifically, Figure 9a depicts the changing trend of the current stress Istress, while Figure 9b,c depict the curves for power switch loss and magnetic component loss, respectively. Figure 9d presents the curves of all power losses Pall_loss. As indicated in Figure 9a, the DDPG-optimized TAB converter exhibits the smallest current stress, whereas the traditional parallel topology experiences the largest current stress. Given that the traditional parallel and series topologies utilize more switches and transformers, their total losses, as depicted in Figure 9d, are also the highest. After DDPG optimization for efficiency, the TAB converter achieves the lowest losses and the highest efficiency.
Under fault-tolerant mode, the evolution of current stress and power losses relative to varying output power is illustrated in Figure 10. For this analysis, Po is increased from 0 W to 200 W while the transformer Tr1_TAB turns ratio is fixed at 1:2. Specifically, Figure 10a illustrates the changing trend of current stress Istress, while Figure 10b,c show the curves for power switching loss and magnetic component loss, respectively. Figure 10d demonstrates the curves for all power losses Pall_loss. As illustrated in Figure 10a,d, the DDPG-optimized TAB converter achieves smaller current stress and total losses.
Specifically, Figure 10a,d present the comparison of current stress and total power loss between the TAB converter with and without DDPG optimization under normal mode. At the 200 W operating point, the proposed method reduces the current stress by 11.2% under normal operation and 12.1% under fault-tolerant operation. In addition, the efficiency is improved by 2.9% and 3.9%, respectively. These results clearly demonstrate the effectiveness of the proposed DDPG-based method in improving TAB converter performance.

5. Experimental Verifications

To ensure fault tolerance, the prototype remains in TAB topology. The primary side of the experimental setup is equipped with three battery packs and corresponding transformers. In normal mode, the third battery pack does not provide power; in fault-tolerant mode, the third pack provides power. This allows for comparison of the TAB converter’s system efficiency and ZVS region in both normal and fault-tolerant modes. All nominal values and operating ranges of the experimental prototype are detailed in Table 6. The controller models employed are TMS320F28377 and GW1N-UV4PG256C6/I5. Figure 11 displays a photograph of the experimental platform, which verifies the feasibility and performance characteristics of the TAB converter in both normal and fault-tolerant modes.

5.1. Normal Mode

The power change of the battery pack in normal mode is tested, as shown in Figure 12. The input power of bat2 increases at time t, and the internal phase shift ratio d2 of the transformer primary-side voltage Up2 is increased. Consequently, the leakage inductor current iLr and the output DC current Io also rise.

5.2. Switch from Normal Mode to Fault-Tolerant Mode

To assess fault-tolerant capability, the waveform transition from normal to fault-tolerant mode is illustrated in Figure 13. When a short-circuit fault occurs in bat2, it is isolated at time t. After isolation, the transformer primary-side voltage Up2 drops to 0 V and the leakage inductor current iLr declines.

5.3. Fault-Tolerant Mode

To verify the fault-tolerant operational capability, the power change of the battery pack under fault-tolerant mode is shown in Figure 14. The input power of bat1 increases at time t, and the internal phase shift ratio d1 of the transformer primary-side voltage Up1 increases. Consequently, the leakage inductor current iLr also increases.

5.4. ZVS and Converter Efficiency

Under normal mode and fault-tolerant mode, taking switch S11 as an example, the ZVS performance of the converter is tested as shown in Figure 15.
Figure 16 illustrates the efficiency curves of the system under normal mode and fault-tolerant operating mode. In normal mode, the system achieves a broader ZVS region, allowing operation closer to its optimal point and resulting in higher efficiency compared to the fault-tolerant mode. Under normal operating conditions, DDPG optimization increases the maximum efficiency to 96.9%, representing an improvement of approximately 3.9% compared to the unoptimized TAB converter. Under fault-tolerant conditions, the maximum efficiency is improved by about 4.96%. Table 7 shows a comparison of the efficiency of TPS and DDPG-TPS under different power points and operating conditions.
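The efficiency gains quoted above can be reproduced directly from the entries of Table 7. The short script below subtracts the TPS efficiency from the DDPG-TPS efficiency at each power point; the gains are therefore in percentage points. The data are taken from Table 7, while the variable and function names are illustrative.

```python
POWER_W = [40, 80, 120, 160, 200]

# Efficiencies in percent, copied from Table 7.
EFFICIENCY = {
    ("normal", "TPS"): [93.0, 92.9, 92.35, 91.56, 91.31],
    ("normal", "DDPG-TPS"): [96.9, 95.8, 95.2, 94.44, 94.28],
    ("fault", "TPS"): [91.4, 91.31, 90.67, 89.78, 89.25],
    ("fault", "DDPG-TPS"): [96.36, 95.2, 94.65, 93.8, 93.18],
}


def gains(mode):
    """Efficiency gain of DDPG-TPS over TPS, in percentage points."""
    base = EFFICIENCY[(mode, "TPS")]
    optimized = EFFICIENCY[(mode, "DDPG-TPS")]
    return [round(o - b, 2) for o, b in zip(optimized, base)]


for mode in ("normal", "fault"):
    print(mode, dict(zip(POWER_W, gains(mode))))
```

The normal-mode gains evaluate to 3.9, 2.9, 2.85, 2.88, and 2.97 percentage points, and the fault-mode gains to 4.96, 3.89, 3.98, 4.02, and 3.93. The largest gain in each mode occurs at the 40 W point, matching the 3.9% and 4.96% figures stated in the text.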
In summary, the inductor current waveforms obtained from simulation and experiment are compared under different operating conditions, and good agreement in waveform shape is observed. To provide quantitative validation, the peak current values are also evaluated. Under normal operation, a peak current of 6.2 A is measured experimentally and 6.43 A is predicted by the analytical model, a deviation of 3.5%. Under fault-tolerant operation, the measured peak current is 7.15 A and the predicted value is 7.43 A, a deviation of 3.8%. These results demonstrate the good accuracy and physical consistency of the proposed analytical model.

6. Conclusions

This paper proposes a TAB converter under triple phase shift (TPS) modulation and uses the deep deterministic policy gradient (DDPG) algorithm to minimize power loss, thereby improving overall operating efficiency. The main contributions include: (i) the proposed TAB converter under TPS modulation can achieve voltage conversion and bidirectional power transmission; (ii) in the event of short-circuit faults, it isolates the faulty port without requiring additional hardware and ensures uninterrupted power delivery to the load through an effective modulation strategy; (iii) within the power range of 0–200 W, the proposed DDPG method improves the conversion efficiency by approximately 2.85–3.97% under normal mode and by 3.89–4.96% under fault-tolerant mode, compared with the conventional TPS strategy; and (iv) experimental validation confirms the achievement of soft switching, the improvement in conversion efficiency based on DDPG, and robust cooperative operation with fault-tolerant capabilities, thereby ensuring reliable power conversion and continuous utilization of renewable energy in shipboard DC system applications.
In this paper, although the proposed DDPG-based optimization method demonstrates improved performance, several practical considerations should be noted: (i) the training process requires considerable computational resources; however, it is performed offline, and the trained policy can be efficiently implemented in real time. (ii) The performance may be affected by parameter variations, and significant deviations from the training conditions could lead to degradation. (iii) The validation is conducted on a low-power prototype, and extension to higher-power systems may introduce additional challenges. These issues will be further investigated in future work to improve the robustness and scalability of the proposed method.

Author Contributions

Conceptualization, Y.H.; methodology, Y.H. and Q.Z.; software, Y.H.; validation, Y.H. and Q.Z.; formal analysis, Y.H. and S.W.; investigation, Y.H. and M.Z.; resources, B.Z.; data curation, Y.H.; writing original draft preparation, Y.H. and Q.Z.; writing review and editing, Y.H. and M.Z.; supervision, S.W.; project administration, Q.Z. and M.Z.; funding acquisition, Y.H. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Postgraduate Research & Practice Innovation Program of Jiangsu Province sponsored by the Department of Education of Jiangsu Province, grant number SJCX25_2508.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no personal, academic, or financial conflicts of interest associated with this paper.

References

  1. Aldarsi, E.; Singh, R.; Zhang, J. A Case Study of a Stand-Alone AC and DC Power Network in the Red Sea New City, Kingdom of Saudi Arabia. Electronics 2026, 15, 1077. [Google Scholar] [CrossRef]
  2. Liu, J.; Ming, Z.; Shi, H.; Zhang, H.; Liu, Z. A Hierarchical Distributed Method for Source-Grid-Load-Storage Coordinated Power and Energy Balance in Distribution Networks. Electronics 2026, 15, 1054. [Google Scholar] [CrossRef]
  3. Liu, Y.; Tang, F.; Liu, Z.; Zuo, L. Research on the Optimal Transient Power Angle Control Strategy for New Energy Transmission Systems in Energy Storage Enhancement Areas. Sustainability 2026, 18, 1636. [Google Scholar] [CrossRef]
  4. IEEE Std 1709-2018 (Revision of IEEE Std 1709-2010); IEEE Recommended Practice for 1 kV to 35 kV Medium-Voltage DC Power Systems on Ships. IEEE: Piscataway, NJ, USA, 2018; pp. 1–54.
  5. Baltazar, P.; Barros, J.D.; Gomes, L. A Distributed Electric Vehicles Charging System Powered by Photovoltaic Solar Energy with Enhanced Voltage and Frequency Control in Isolated Microgrids. Electronics 2026, 15, 418. [Google Scholar] [CrossRef]
  6. Chen, Q.; Su, Y.; Hu, B.; Shao, C.; Xu, L.; Huang, C. Dynamic State Estimation for Sustainable Distribution Systems Considering Data Correlation and Noise Adaptiveness. Sustainability 2026, 18, 1693. [Google Scholar] [CrossRef]
  7. Maevsky, D.; Kharchenko, V.; Bardis, N.; Stetsiuk, D.; Maevskaya, E.; Kryvda, V. Dependability Model of Electric Power Systems for Assessing Smart City Energy Sustainability. Sustainability 2026, 18, 1512. [Google Scholar] [CrossRef]
  8. Chen, Y.; Ma, J.; Zhu, M.; Liu, J. Dual-Mode Wide-Voltage-Range Operation of Hybrid Triple Active Bridge Converter for Bipolar DC Distribution Systems. IEEE Trans. Ind. Appl. 2024, 60, 8998–9014. [Google Scholar] [CrossRef]
  9. Cao, T.; Zhu, J.; Guo, Y.; Han, Y.; Wu, B.; Li, D. Utilizing the Intrinsic CC/CV Characteristics of a CLLC Converter for Battery Charging with ZVS Operation. Electronics 2026, 15, 1128. [Google Scholar] [CrossRef]
  10. Kang, X.; Li, S.; Smedley, K.M. Decoupled PWM Plus Phase-Shift Control for a Dual-Half-Bridge Bidirectional DC–DC Converter. IEEE Trans. Power Electron. 2018, 33, 7203–7213. [Google Scholar] [CrossRef]
  11. Zhang, H.; Dong, D.; Liu, W.; Ren, H.; Zheng, F. Systematic Synthesis of Multiple-Input and Multiple-Output DC–DC Converters for Nonisolated Applications. IEEE J. Emerg. Sel. Top. Power Electron. 2022, 10, 6470–6481. [Google Scholar] [CrossRef]
  12. Ma, J.; Chen, Y.; Shen, X.; Qiu, Y. Fault-Tolerant Multiport Active Bridge Converter for Resilient Energy Storage Integration in Zonal Shipboard DC System. J. Mar. Sci. Eng. 2025, 13, 654. [Google Scholar] [CrossRef]
  13. Yang, W.; Ma, J.; Zhu, M.; Hu, C. Open-Circuit Fault Diagnosis and Tolerant Method of Multiport Triple Active-Bridge DC-DC Converter. IEEE Trans. Ind. Appl. 2023, 59, 5473–5487. [Google Scholar] [CrossRef]
  14. Sato, Y.; Uno, M.; Nagata, H. Nonisolated Multiport Converters Based on Integration of PWM Converter and Phase-Shift-Switched Capacitor Converter. IEEE Trans. Power Electron. 2020, 35, 455–470. [Google Scholar] [CrossRef]
  15. Ma, J.; Zhu, M.; Li, Y.; Cai, X. Monopolar Fault Reconfiguration of Bipolar Half Bridge Converter for Reliable Load Supply in DC Distribution System. IEEE Trans. Power Electron. 2022, 37, 11305–11318. [Google Scholar] [CrossRef]
  16. Shih, L.; Liu, Y.; Chiu, H. A novel hybrid mode control for a phase-shift full-bridge converter featuring high efficiency over a full-load range. IEEE Trans. Power Electron. 2019, 34, 2794–2804. [Google Scholar] [CrossRef]
  17. Guo, Z. Modulation scheme of dual active bridge converter for seamless transitions in multiworking modes compromising ZVS and conduction loss. IEEE Trans. Ind. Electron. 2020, 67, 7399–7409. [Google Scholar] [CrossRef]
  18. Tang, Y.; Shen, S.; Mi, C.; Wang, Y.; Chen, S. Reinforcement learning based efficiency optimization scheme for the DAB DC–DC converter with triple-phase-shift modulation. IEEE Trans. Ind. Electron. 2021, 68, 7350–7361. [Google Scholar] [CrossRef]
  19. Bhattacharjee, A.K.; Batarseh, I. Optimum hybrid modulation for improvement of efficiency over wide operating range for triple-phase-shift dual-active-bridge converter. IEEE Trans. Power Electron. 2020, 35, 4804–4818. [Google Scholar] [CrossRef]
  20. Sha, D.; Sun, T.; Zhang, J. Varying switching frequency control for current-fed dual-active bridge DC–DC converter with constant flux density change for transformers. IEEE Trans. Power Electron. 2020, 35, 3766–3777. [Google Scholar] [CrossRef]
  21. Hou, N.; Li, Y. Overview and comparison of modulation and control strategies for non-resonant single-phase dual-active-bridge DC-DC converter. IEEE Trans. Power Electron. 2020, 35, 3148–3172. [Google Scholar] [CrossRef]
  22. Meng, L.; Dragicevic, T.; Vasquez, J.C.; Guerrero, J.M. Tertiary and secondary control levels for efficiency optimization and system damping in droop-controlled DC–DC converters. IEEE Trans. Smart Grid. 2015, 6, 2615–2626. [Google Scholar] [CrossRef]
  23. Du, Z.; Tolbert, L.M.; Chiasson, J.N.; Ozpineci, B. Reduced switchingfrequency active harmonic elimination for multilevel converters. IEEE Trans. Ind. Electron. 2008, 55, 1761–1770. [Google Scholar] [CrossRef]
  24. Cao, D.; Hu, W.; Zhao, J.; Zhang, G.; Zhang, B.; Liu, Z.; Chen, Z.; Blaabjerg, F. Reinforcement learning and its applications in modern power and energy systems: A review. J. Modern Power Syst. Clean Energy. 2020, 8, 1029–1042. [Google Scholar] [CrossRef]
  25. Dilokthanakul, N.; Kaplanis, C.; Pawlowski, N.; Shanahan, M. Feature control as intrinsic motivation for hierarchical reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3409–3418. [Google Scholar] [CrossRef] [PubMed]
  26. Nwachukwu, S.E.; Folly, K.A.; Awodele, K.O. Soft Actor-Critic-Based MPPT Control of Solar PV Systems Under Partial Shading Conditions. IEEE Open Access J. Power Energy 2025, 12, 194–208. [Google Scholar] [CrossRef]
  27. Tang, Y.; Hu, W.; Xiao, J. A Deep Q-Network based optimized modulation scheme for Dual-Active-Bridge converter to reduce the RMS current. Energy Rep. 2020, 6, 1192–1198. [Google Scholar] [CrossRef]
  28. Tang, Y.; Shen, S.; Mi, C.; Wang, Y.; Chen, S. RL-ANN-Based Minimum-Current-Stress Scheme for the Dual-Active-Bridge Converter with Triple-Phase-Shift Control. IEEE J. Emerg. Sel. Top. Power Electron. 2022, 10, 673–689. [Google Scholar] [CrossRef]
  29. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  30. Cao, D.; Hu, W.; Zhao, J.; Huang, Q.; Chen, Z.; Blaabjerg, F. A multi-agent deep reinforcement learning based voltage regulation using coordinated PV inverters. IEEE Trans. Power Syst. 2020, 35, 4120–4123. [Google Scholar] [CrossRef]
  31. Qiu, C.; Hu, Y.; Chen, Y.; Zeng, B. Deep deterministic policy gradient (DDPG)-Based energy harvesting wireless communications. IEEE Internet Things J. 2019, 6, 8577–8588. [Google Scholar] [CrossRef]
  32. Xu, H.; Sun, H.; Nikovski, D.; Kitamura, S.; Mori, K.; Hashimoto, H. Deep reinforcement learning for joint bidding and pricing of load serving entity. IEEE Trans. Smart Grid. 2019, 10, 6366–6375. [Google Scholar] [CrossRef]
  33. Tang, Y.; Shen, S.; Mi, C.; Wang, Y.; Chen, S. Artificial intelligence-aided minimum reactive power control for the DAB converter based on harmonic analysis method. IEEE Trans. Power Electron. 2021, 36, 9704–9710. [Google Scholar] [CrossRef]
  34. Sun, L.; Pan, Y.; Wang, H. Assessing the inertial response time of grid-forming converters: Estimation and optimization. Electr. Power Syst. Res. 2026, 253, 112470. [Google Scholar] [CrossRef]
  35. Tang, Y.; Cao, D.; Xiao, J. AI-aided power electronic converters automatic online real-time efficiency optimization method. Fundam. Res. 2025, 5, 1111–1116. [Google Scholar] [CrossRef]
Figure 1. The zonal architecture for a shipboard DC power distribution system.
Figure 2. The representative isolated multi-port converters. (a) Series-DAB topology, (b) parallel-DAB topology.
Figure 3. Circuit topology of the proposed TAB converter.
Figure 4. TPS modulation waveform of the TAB converter.
Figure 5. Equivalent circuit diagram of Bat2 under short-circuit failure.
Figure 6. Circuit diagrams of different operating modes during t1 to t7. (a) Circuit diagram of HB1-HBo, (b) circuit diagram of HB2-HBo.
Figure 7. Stability analysis of the TAB converter under TPS modulation using a Bode plot.
Figure 8. Diagram of the DDPG algorithm for optimizing the efficiency of the TAB converter under TPS modulation.
Figure 9. Curves showing the variation of current stress and different types of power losses with Po. (a) Current stress, (b) power switching losses, (c) magnetic component losses, and (d) all power losses. (Ui_parallel = 48 V, Ui_series = 48 V, Ui_TAB = 48 V).
Figure 10. Current stress and power loss of TAB converter in fault-tolerant mode and TAB converter optimized by DDPG. (a) Current stress, (b) power switching losses, (c) magnetic component losses, and (d) all power losses.
Figure 11. A photograph of the hardware setup.
Figure 12. The waveform of bat2 power change in normal mode.
Figure 13. The waveform of normal mode switching to fault-tolerant mode.
Figure 14. The waveform of bat1 power change under fault mode.
Figure 15. ZVS of switch S11. (a) Normal mode, (b) fault mode.
Figure 16. Measured converter efficiency. (a) Under normal mode, (b) under fault mode.
Table 2. Parameters of different converters under testing.
| Parameters | Value |
|---|---|
| Input voltage in traditional parallel topology (Ui_parallel) | 48 V |
| Input voltage in traditional series topology (Ui_series) | 48 V |
| Input voltage of battery pack in TAB topology (Ui_TAB) | 48 V |
| Transformer turns ratio of parallel topology (n) | 1:2 |
| Transformer turns ratio of series topology (n) | 1:1 |
| Transformer turns ratio of TAB topology (n) | 1:1 |
| Output voltage (Uo) | 100 V |
| Rated power (Po) | 216 W |
| Capacitor (Co) | 220 μF |
| Switching frequency (fs) | 20 kHz |
| Leakage inductance (Lr) | 45 μH |
Table 3. Key parameters of the DDPG algorithm.
| Parameters | Value |
|---|---|
| Actor network learning rate (θμ) | 0.0003 |
| Critic network learning rate (θQ) | 0.003 |
| Soft update rate (τ) | 0.005 |
| Penalty coefficient (α) | 150 |
| Penalty coefficient (ω) | 1.5 (ZVS), 15 (non-ZVS) |
| Discount factor (γd) | 0.98 |
| Noise parameter (σ) | 0.01 |
| Memory pool size | 50,000 |
| Number of episodes | 10,000 |
| Step size of each episode | 20 |
| Reward stability | Fluctuation < 1% |
Table 4. Specifications of the converter circuit components.
| Items | Figure 2a Architecture | Figure 2b Architecture | TPS Architecture |
|---|---|---|---|
| Input switches | BSZ440N15NS3G | BSC070N10NS5 | BSC070N10NS5 |
| Output switches | BSZ440N15NS3G | BSC320N20NS3G | BSC320N20NS3G |
| Input capacitors | B41858C9227M000 | B41858C9227M000 | B41858C9227M000 |
| Output capacitors | B41858C9227M000 | B43504A2477M000 | B43504A2477M000 |
Transformer (core): B66375G0000X187
Inductor (core): 74435581000
Table 5. Structural comparison of representative multi-port converters.
| Parameters | Series-EPS | Parallel-SPS | TAB-TPS |
|---|---|---|---|
| Fault-tolerant | × | × | √ |
| ZVS | Partial ZVS | Partial ZVS | Full ZVS |
| DOF | 4 | 2 | 3 |
| Capacitors | 4 | 4 | 3 |
| Switches | 16 | 16 | 12 |
| Transformers | 2 | 2 | 2 |
| Inductors | 2 | 2 | 1 |
Table 6. Circuit experimental parameters.
| Parameters | Value |
|---|---|
| Input voltage of battery pack i = 1 (U1) | 48 V |
| Input voltage of battery pack i = 2 (U2) | 48 V |
| Input voltage of battery pack i = 3 (U3) | 48 V |
| Transformer turns ratio (n) | 1:1 |
| Output voltage (Uo) | 100 V |
| Rated power (Po) | 216 W |
| Capacitor (Co) | 110 μF |
| Switching frequency (fs) | 20 kHz |
| Leakage inductance (Lr) | 45 μH |
Table 7. Efficiency at different power points under different operating conditions.
| Mode and Strategy | 40 W | 80 W | 120 W | 160 W | 200 W |
|---|---|---|---|---|---|
| TPS in normal mode | 93% | 92.9% | 92.35% | 91.56% | 91.31% |
| DDPG-TPS in normal mode | 96.9% | 95.8% | 95.2% | 94.44% | 94.28% |
| TPS in fault mode | 91.4% | 91.31% | 90.67% | 89.78% | 89.25% |
| DDPG-TPS in fault mode | 96.36% | 95.2% | 94.65% | 93.8% | 93.18% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Huang, Y.; Zhao, Q.; Zhu, M.; Wen, S.; Zhang, B. Triple Phase Shift Modulation for Active Bridge Converter: Deep Reinforcement Learning-Based Efficiency Optimization. Electronics 2026, 15, 1563. https://doi.org/10.3390/electronics15081563
