Article

PEMFC Thermal Management Control Strategy Based on Dual Deep Deterministic Policy Gradient

1 College of Mechanical and Electrical Engineering, Wenzhou University, Wenzhou 325035, China
2 School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
* Author to whom correspondence should be addressed.
Hydrogen 2025, 6(2), 20; https://doi.org/10.3390/hydrogen6020020
Submission received: 18 February 2025 / Revised: 18 March 2025 / Accepted: 19 March 2025 / Published: 25 March 2025

Abstract

The operational performance of proton exchange membrane fuel cells (PEMFCs) is highly temperature-dependent, making effective thermal management essential. However, the multivariate coupling between the pump and the radiator presents significant control challenges. To address this issue, a dual DDPG-PID control strategy is proposed, integrating temperature and flow-rate variations to enhance system stability and response. Simulation results demonstrate that the proposed method significantly reduces temperature control errors and improves response time compared with conventional PID-based strategies. Specifically, the dual DDPG-PID achieves a temperature error reduction of up to 75.4% and shortens the average tuning time by up to 25.6% compared with PSO-PID. Furthermore, the strategy optimizes cooling system performance, demonstrating its effectiveness in PEMFC thermal management.

1. Introduction

In recent years, with the gradual reduction of global dependence on coal, oil, and natural gas, and in response to the environmental pollution caused by the use of fossil fuels, there has been increasing attention on exploring new alternative energy sources [1,2,3]. Among these, proton exchange membrane fuel cells (PEMFCs) have garnered significant attention due to their numerous advantages, such as low emissions, high efficiency, low operating temperature, rapid start-up capability, and long lifespan [4,5,6]. As an efficient and environmentally friendly energy conversion technology, PEMFC is based on electrochemical principles, and the proton directional migration is achieved through the anodic oxidation of hydrogen and cathodic reduction of oxygen in the membrane electrode module, which ultimately generates clean water and releases electrical energy [7,8]. However, during the electrochemical reaction process, fuel cells generate not only electrical energy but also heat, which leads to an increase in the operating temperature of the PEMFC [9,10]. While a higher temperature can enhance reaction kinetics and conductivity, thereby improving fuel cell efficiency, excessive temperatures may result in the dehydration and degradation of the proton exchange membrane, ultimately compromising the long-term stability and lifespan of the fuel cell [11,12,13]. Therefore, the development of efficient thermal management control strategies for PEMFC systems has garnered significant attention from the academic community, leading to extensive research by scholars in this field.
To address the impact of temperature on fuel cell performance, many researchers have conducted studies on thermal management control strategies for fuel cells, aiming to improve system stability, enhance response speed, and optimize energy consumption. For instance, Zhao et al. [14] developed a thermal management system model and analyzed the impact of temperature on PEMFC performance. Yu et al. [15] proposed controlling the stack temperature by adjusting the coolant flow rate, while considering the parasitic power of the cooling system. Liu et al. [16] introduced a power control strategy to address the strong nonlinearity of PEMFC systems, achieving performance improvements over traditional PID (proportion–integration–differentiation) control. Li et al. [17] developed a multivariable coordinated controller to resolve the coupling issue between the pump and the radiator, thereby enhancing control accuracy. Yin et al. [18] designed a maximum efficiency control strategy (MECS) based on temperature variations in the PEMFC, which outperforms traditional PID methods in temperature-tracking performance. Zhao et al. [19] applied a diploid genetic algorithm to optimize the temperature control of the membrane electrode assembly (MEA). Cheng et al. [20] combined nonlinear feedforward control with a linear quadratic regulator (LQR) to enhance the control performance of the fuel cell’s cooling system. Jia Y. et al. [21] proposed a variable fuzzy PID controller that adjusts control accuracy through the proportional factor. Yu et al. [22] integrated a PID controller with a thermal circuit to reduce parasitic power loss and alleviate the temperature rise in the catalyst layer. Zhi et al. [23] introduced a piecewise predictive feedback control method based on PID to address the limitations of traditional PID control. Ou et al. [24] implemented a feedforward fuzzy PID controller for compressor temperature control, effectively reducing parasitic power loss. Hasheminejad et al. 
[25] proposed a fuzzy control strategy that ensures optimal operation conditions for PEMFCs by modeling the coolant loop. Mousakazemi et al. [26] optimized fuzzy controller parameters using a genetic algorithm, significantly improving overall control performance. Wang et al. [27] developed a fuzzy rule-based controller for PEMFC system temperature control by adjusting the fan speed. Tan et al. [28] applied particle swarm optimization (PSO) to optimize the operating temperature of the fuel stack. Yan et al. [29] introduced a fault-tolerant control method for sensor fault detection, ensuring that the PEMFC temperature remains near the reference value, even during sensor faults. Ahmadi S et al. [30] applied PSO to optimize PID control, achieving significant control performance improvements. Li et al. [31] used a BP neural network-based PID controller to regulate PEMFC system temperature, demonstrating good robustness and practical application effectiveness.
Despite the valuable contributions of these studies, there are still some limitations in their control strategies. Most of the control methods (e.g., PID and fuzzy logic controllers) are based on predefined models and cannot dynamically adapt to the complex nonlinear behavior of PEMFC systems. These methods struggle to provide stability and optimal performance in the face of the high multivariate coupling and system delays inherent to PEMFCs. As a result, temperature overshoots and slow response times often occur, thus affecting the overall efficiency and stability of the system. For example, while fuzzy logic and PID controllers can improve stability to some extent, their performance is still limited by having too many parameters and the need for extensive tuning under different operating conditions. These challenges are further exacerbated by the complex nonlinear dynamics of PEMFCs, which lead to difficulties in model matching, as well as substantial delays in the control process.
In the past few years, with the rapid development of intelligent technology, algorithms based on deep reinforcement learning (DRL) have been widely adopted in the field of new energy, such as the deep Q-network (DQN) [32], deep deterministic policy gradient (DDPG) [33], twin delayed deep deterministic policy gradient (TD3) [34], and other advanced algorithms. These algorithms do not rely on precise mathematical models and can continuously learn and optimize control strategies through interaction with the environment, exhibiting strong adaptability, especially in complex and dynamically changing systems. Currently, deep reinforcement-learning algorithms have been widely applied to energy management in fuel cell hybrid electric vehicle systems [35]. In this context, Huang et al. [36] proposed an energy management solution for fuel cell hybrid systems based on dual deep deterministic policy gradient (Dual DDPG), in which, in pure electric mode, DDPG optimizes the power distribution between the battery and the supercapacitor, reducing the battery's depth of discharge. In extended-range mode, DDPG coordinates the output of the fuel cell, battery, and supercapacitor, improving fuel cell efficiency by 0.94% and significantly reducing power fluctuations. Although deep reinforcement learning has achieved promising results in many fields, research on reinforcement-learning algorithms in fuel cell system control, especially in thermal management control, is still limited, and relevant literature is scarce.
Inspired by the energy management of fuel cell hybrid vehicles based on dual DDPG, and in order to overcome the limitations of traditional control algorithms, the author innovatively introduced reinforcement-learning algorithms into fuel cell thermal management. A dual DDPG-PID control strategy was proposed to address the nonlinear and multivariable coupling characteristics of the fuel cell thermal management system. As an advanced adaptive method for PEMFC temperature regulation, this hybrid strategy combines the DDPG algorithm with a PID controller, allowing for dynamic adjustment of the control strategy based on real-time system feedback. Unlike traditional methods, the DDPG algorithm can learn and continuously improve the control strategy in real time according to the state of the fuel cell system, achieving precise optimization of the PID controller parameters. This effectively solves the latency issue commonly encountered in traditional controllers. Therefore, this hybrid control strategy provides a novel approach to thermal management control for fuel cells.
This paper is organized as follows. Section 2 develops the system-level model of the fuel cell; Section 3 discusses the framework of the proposed dual DDPG control strategy; Section 4 validates the simulation under continuous step-loading conditions and describes and analyzes the results in detail; finally, Section 5 summarizes the full paper and presents conclusions.

2. PEMFC System Model

The schematic representation of the PEMFC system is illustrated in Figure 1. This system comprises two main subsystems for the fuel supply: the air supply system and the hydrogen supply system. The air supply system facilitates the transport of air to the cathode side, while the hydrogen supply system directs hydrogen from a high-pressure cylinder to the anode side through a pressure valve. Additionally, a thermal management system is integrated to ensure effective cooling of the fuel cell during operation.

2.1. Fuel Cell Modeling

The voltage of a fuel cell is influenced by thermodynamic and electrochemical losses. The actual output voltage can be expressed as:
V = E - V_{act} - V_{ohm} - V_{conc}
where $E$ denotes the Nernst electromotive force, $V_{act}$ denotes the voltage loss due to activation polarization, $V_{ohm}$ denotes the voltage drop due to ohmic loss, and $V_{conc}$ denotes the voltage loss due to concentration polarization.
The calculation of Nernst electromotive force is as follows [37]:
E = 1.229 - 8.5 \times 10^{-4}\,(T_{fc} - T_0) + \frac{R T_{fc}}{2F}\left(\ln P_{H_2} + \frac{1}{2}\ln P_{O_2}\right)
where $T_0$ is the ambient temperature, $T_{fc}$ is the fuel cell temperature, $P_{H_2}$ and $P_{O_2}$ are the pressures of hydrogen and oxygen entering the stack, respectively, $R$ is the universal gas constant, and $F$ is the Faraday constant.
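As an illustration, Equation (2) can be evaluated directly. The following Python sketch (the paper itself works in MATLAB/Simulink; the operating point below is illustrative, not taken from the paper) computes the Nernst EMF:

```python
import math

R = 8.314    # universal gas constant, J/(mol K)
F = 96485.0  # Faraday constant, C/mol

def nernst_emf(T_fc, T_0, p_H2, p_O2):
    """Nernst electromotive force per Equation (2).

    T_fc, T_0 in kelvin; p_H2, p_O2 in atm (illustrative units)."""
    return (1.229
            - 8.5e-4 * (T_fc - T_0)
            + R * T_fc / (2.0 * F) * (math.log(p_H2) + 0.5 * math.log(p_O2)))

# Near the validation conditions of Section 2.1 (80 degC stack, 2 bar feeds):
E = nernst_emf(T_fc=353.15, T_0=298.15, p_H2=2.0, p_O2=2.0)
```

Raising the reactant pressures raises the EMF through the logarithmic terms, which is why the validation run at 2 bar sits slightly above the 1-atm open-circuit value.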
Activation polarization is the voltage loss occurring during the electrochemical reaction in a fuel cell due to the limitation of the electrode reaction rate, primarily caused by the kinetic lag of the electrode catalysis reaction [38,39]. The empirical formula for calculating activation polarization is as follows:
V_{act} = V_1 + V_{\alpha}\left(1 - e^{-c_1 i}\right)
The Ohmic polarization is the potential loss in a fuel cell caused by the resistance of the electrolyte and the electrode materials [40]. Its empirical calculation formula is:
V_{ohm} = i \times R_{ohm}
R_{ohm} = \frac{t_m}{\sigma_m}
where $t_m$ is the membrane thickness, and $\sigma_m$ is a function of the membrane's water content and the fuel cell temperature.
Concentration polarization is the potential loss in a fuel cell caused by the rapid consumption of reactants at the electrode surface, leading to a decrease in the concentration of reactants at the electrode surface [41]. The empirical formula for calculating the concentration polarization is as follows:
V_{conc} = i\left(c_2\,\frac{i}{i_{max}}\right)^{c_3}
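Putting Equations (1)–(6) together, the polarization curve can be sketched in a few lines of Python. All coefficients below ($V_1$, $V_{\alpha}$, $c_1$, $c_2$, $c_3$, $i_{max}$, $t_m$, $\sigma_m$) are illustrative placeholders, not the paper's fitted values:

```python
import math

def cell_voltage(i, E, V1, Valpha, c1, t_m, sigma_m, c2, c3, i_max):
    """Cell voltage per Equations (1)-(6) at current density i (A/cm^2)."""
    V_act = V1 + Valpha * (1.0 - math.exp(-c1 * i))   # activation loss, Eq. (3)
    V_ohm = i * (t_m / sigma_m)                       # ohmic loss, Eqs. (4)-(5)
    V_conc = i * (c2 * i / i_max) ** c3               # concentration loss, Eq. (6)
    return E - V_act - V_ohm - V_conc

# Illustrative coefficients only -- not the paper's fitted values:
params = dict(E=1.20, V1=0.20, Valpha=0.10, c1=8.0,
              t_m=1.8e-3, sigma_m=10.0, c2=0.5, c3=2.0, i_max=2.2)
v_low = cell_voltage(0.2, **params)   # light load
v_high = cell_voltage(1.5, **params)  # heavy load
```

As expected from the V-I characteristic in Figure 2, the output voltage falls monotonically with current density as the three loss terms grow.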
To validate the accuracy of the developed model, a simulation was conducted based on the experimental conditions reported in Reference [42], where the stack temperature was set to 80 °C and the inlet pressures of both the cathode and anode were set to 2 bar. The simulated results were then fitted to the experimental data from the literature to analyze the consistency between the predicted V-I characteristics and the reference data. As shown in Figure 2 of the manuscript, the simulation results closely match the experimental data, with a maximum deviation of less than 2% across the entire operating range, confirming the model’s accuracy.
In the fuel cell system model, it is necessary to include the air compressor model, air cooler model, humidifier model, and gas transmission pipeline model. The gas transmission pipeline model represents the mass flow rate of gas transport and the pressure variations during the transmission process, which can be calculated using Equations (A1)–(A35). The auxiliary system models, including the air compressor, air cooler, and humidifier models, primarily serve to supply air with appropriate pressure and humidity to maintain the operation of the fuel cell, and their behavior can be determined using Equations (A37)–(A55). Additionally, the parameters used for modeling are provided in Table A1.

2.2. PEMFC Thermal Management System Modeling

The thermal management system of a PEMFC comprises components including a water pump, radiator, water tank, stack, and bypass valve [43]. Its associated control framework diagram is illustrated in Figure 3. These components are modeled below using mole conservation, energy conservation, and empirical formulas.

2.2.1. PEMFC Thermal Modeling

The energy balance of a PEMFC is primarily maintained through several factors [44]: the total power of electrochemical reactions $Q_{tot}$, the energy brought into the stack by the incoming gases $Q_{in}$, the energy carried away by the gases exiting the stack $Q_{out}$, the power consumed by the load $P_{st}$, the heat carried away by the circulating coolant $Q_{cl}$, and the energy exchange between the PEMFC stack and the surrounding environment $Q_{amb}$. These terms govern the variation of the stack temperature $T_{st}$.
m_{st} C_{p,st} \frac{dT_{st}}{dt} = Q_{tot} + Q_{in} - P_{st} - Q_{out} - Q_{cl} - Q_{amb}
Q_{tot} = \Delta H \times N_{H_2,react}
Q_{in} = \left(W_{H_2,an,in} C_{p,H_2} + W_{v,an,in} C_{p,v}\right)\left(T_{an,in} - T_{atm}\right) + \left(W_{a,ca,in} C_{p,air} + W_{v,ca,in} C_{p,v}\right)\left(T_{ca,in} - T_{atm}\right)
Q_{out} = \left(W_{H_2,an,out} C_{p,H_2} + W_{v,an,out} C_{p,v} + W_{O_2,ca,out} C_{p,O_2} + W_{N_2,ca,out} C_{p,N_2} + W_{v,ca,out} C_{p,v} + W_{l,gen} C_{p,l}\right)\left(T_{st} - T_{atm}\right)
Q_{cl} = W_{cl} C_{p,l1} \left(T_{st} - T_{st,in}\right)
Q_{amb} = \left(T_{st} - T_{atm}\right) / R_t
In the above formulas, $C_{p,air}$ represents the specific heat of air; $C_{p,O_2}$ the specific heat of oxygen; $C_{p,H_2}$ the specific heat of hydrogen; $C_{p,N_2}$ the specific heat of nitrogen; $C_{p,v}$ the specific heat of water vapor; $C_{p,l}$ the specific heat of liquid water; and $C_{p,l1}$ the specific heat of the coolant. $R_t$ represents the equivalent thermal resistance of the fuel cell, and $W_{cl}$ is the coolant flow rate.
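The lumped energy balance of Equation (7) can be integrated numerically. A minimal explicit-Euler sketch follows; the stack mass and specific heat are illustrative placeholders, not the values from Table A1:

```python
def step_stack_temperature(T_st, Q_tot, Q_in, P_st, Q_out, Q_cl, Q_amb,
                           m_st=40.0, Cp_st=1100.0, dt=0.1):
    """One explicit-Euler step of Equation (7).

    All Q terms and P_st in watts; m_st (kg) and Cp_st (J/(kg K)) are
    illustrative, not the paper's parameters."""
    dT_dt = (Q_tot + Q_in - P_st - Q_out - Q_cl - Q_amb) / (m_st * Cp_st)
    return T_st + dt * dT_dt

# When the reaction heat exceeds the combined losses, the stack warms up:
T_next = step_stack_temperature(T_st=353.0, Q_tot=5000.0, Q_in=100.0,
                                P_st=3000.0, Q_out=200.0, Q_cl=1000.0,
                                Q_amb=100.0)
```

This is the plant dynamic the cooling loop acts on: increasing the coolant term $Q_{cl}$ (via the pump) drives the derivative negative and pulls $T_{st}$ back toward the setpoint.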

2.2.2. Radiator Modeling

The radiator controls the coolant temperature at the stack inlet by circulating coolant through forced-air convection from the cooling fan. The heat transfer to the environment depends on factors like the radiator’s surface area, heat transfer coefficient, and ambient temperature [45], and its heat transfer rate is linearly related to airflow and temperature difference.
Q_{rad} = C_{p,air} W_{fan} \left(T_{rad,airout} - T_{atm}\right)
where $W_{fan}$ represents the airflow rate of the radiator fan, and $T_{rad,airout}$ is the air temperature at the outlet of the radiator, taken as the average of the coolant temperatures at the radiator inlet and outlet.
The temperature of the coolant upon exiting the radiator is determined by the following formula.
T_{radout} = T_{radin} - \frac{Q_{rad}}{W_{rad} C_{p,l1}}
where $T_{radin} = T_{st}$ is the temperature of the coolant entering the radiator, and $W_{rad}$ is the flow rate of the coolant passing through the radiator.
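Equations (12) and (13) chain directly: the fan airflow fixes the rejected heat, which fixes the coolant temperature drop. A short Python sketch (generic air/water specific heats, not the paper's Table A1 values):

```python
def radiator_outlet_temp(T_rad_in, W_rad, W_fan,
                         T_air_out, T_atm=298.15,
                         Cp_air=1005.0, Cp_l1=4180.0):
    """Coolant outlet temperature per Equations (12)-(13).

    Cp values are generic air/water constants, not the paper's Table A1."""
    Q_rad = Cp_air * W_fan * (T_air_out - T_atm)   # heat rejected, Eq. (12)
    return T_rad_in - Q_rad / (W_rad * Cp_l1)      # coolant cooling, Eq. (13)

T_out = radiator_outlet_temp(T_rad_in=353.0, W_rad=0.5, W_fan=0.8,
                             T_air_out=330.0)
```

Doubling the fan airflow doubles $Q_{rad}$ and therefore doubles the coolant temperature drop, which is exactly the lever the radiator-side controller uses to hold the stack inlet temperature.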

3. Deep Deterministic Policy Gradient Strategy

3.1. DDPG Algorithm Description

DDPG (deep deterministic policy gradient) is a reinforcement-learning algorithm based on the actor–critic structure and is suited to continuous control problems; since fuel cell thermal management involves the regulation of continuous variables such as temperature, DDPG has a clear advantage in this scenario. Compared with DQN, which mainly targets discrete action spaces and is therefore unsuitable for thermal management control, DDPG adapts well to complex systems with continuous state spaces. Compared with PPO and SAC, DDPG responds faster to dynamic changes in the environment and can make efficient decisions in complex thermal management systems. Its stability and sample efficiency also perform well in reinforcement-learning applications, and it can work with PID controllers to achieve accurate temperature control while reducing over-regulation and oscillation. Therefore, DDPG combined with PID can provide a smoother, more stable, and more efficient thermal management strategy. The remainder of this section describes the DDPG algorithm in detail and presents the framework of the dual DDPG-PID thermal management control strategy.
The DDPG algorithm exhibits strong applicability in addressing the control problem of continuous action spaces in PEMFC systems [46]. It amalgamates principles from both deep learning and deterministic policy into an actor–critic architecture. The actor network’s primary role is to acquire a deterministic policy, mapping environmental states to appropriate actions. Given the environmental state as input, the actor network outputs the action to be executed. Throughout the training phase, the actor network endeavors to maximize the cumulative reward derived from the current policy. In contrast, the critic network’s responsibility lies in estimating the value function, specifically the Q-value, for a given state–action pair. Utilizing the current environmental state and action as input, the critic network computes the corresponding Q-value, which represents the cumulative reward achievable from the subsequent state following action execution. Employing the Q-learning approach, the critic network iteratively refines its parameter values by minimizing the temporal difference (TD) error. This iterative refinement process enhances the Critic network’s accuracy in estimating Q-values, consequently bolstering the DDPG algorithm’s efficacy in navigating continuous action spaces.
The DDPG algorithm utilizes two distinct networks, the actor policy network and the critic value network, each consisting of both an online and a target network [47]. The online actor network generates actions directly, expressed as $a_t = \vartheta(s_t|\theta^{\vartheta})$, while the online critic network evaluates the Q-value, $Q = Q(s_t, a_t|\theta^q)$. After the action interacts with the environment, the target critic network uses the action from the target actor network and the current state observation to compute the target Q-value, thereby adjusting the value function. Here, $\theta^{\vartheta}$ and $\theta^q$ denote the parameters of the online actor and online critic, respectively, and $s_t$ represents the agent's state at time t. When updating the actor and critic network parameters, the process begins by randomly sampling a minibatch of transitions $[s_t, a_t, r_t, s_{t+1}]$ from the experience replay buffer R. This enhances algorithm stability and convergence speed, significantly reduces the variance of the Q-value, and facilitates the computation of the loss function L. The estimated optimal Q-value and loss function are defined as:
y_k = r_k + \gamma Q'\left(s_{k+1}, \vartheta'(s_{k+1}|\theta^{\vartheta'})\,\middle|\,\theta^{q'}\right)
L = \frac{1}{N}\sum_k \left(y_k - Q(s_k, a_k|\theta^q)\right)^2
where $r_k$ denotes the reward at a specific time step, and $\gamma$ represents the discount factor within the range [0, 1]. N indicates the number of state–action pairs randomly sampled from the experience replay buffer R. The primary objective of the optimal policy is to maximize the expected Q-value. The actor network updates its parameters $\theta^{\vartheta}$ by utilizing the Q-value's derivative in the direction of maximum Q. The update process follows:
\nabla_{\theta^{\vartheta}} J \approx \nabla_a Q(s, a|\theta^q)\big|_{s=s_k,\, a=\vartheta(s_k)} \,\nabla_{\theta^{\vartheta}} \vartheta(s|\theta^{\vartheta})\big|_{s_k}
where $\nabla_a Q(s, a|\theta^q)$ represents the gradient of the action–value function with respect to action a, indicating the direction that maximizes the Q-value. The expression $\nabla_{\theta^{\vartheta}} \vartheta(s|\theta^{\vartheta})\big|_{s_k}$ is the gradient of the policy $\vartheta$ with respect to its parameters $\theta^{\vartheta}$, showing how to adjust $\theta^{\vartheta}$ to maintain the efficacy of the control policy. Finally, to bolster learning stability, DDPG incorporates two sets of target networks: the target actor network and the target critic network [48]. The parameter updates for the target networks are given by Formulas (18) and (19), and the pseudocode of the dual DDPG-PID algorithm is given in Algorithm A1.
\theta^{q'} \leftarrow \tau \theta^{q} + (1 - \tau)\,\theta^{q'}
\theta^{\vartheta'} \leftarrow \tau \theta^{\vartheta} + (1 - \tau)\,\theta^{\vartheta'}
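The three numerical pieces of the critic update, the TD target of Equation (14), the minibatch loss of Equation (15), and the soft target updates of Equations (18) and (19), can be sketched with NumPy, treating network parameters as flat vectors for illustration:

```python
import numpy as np

def td_targets(r, next_q, gamma=0.99):
    """y_k = r_k + gamma * Q'(s_{k+1}, theta'(s_{k+1})) -- Equation (14)."""
    return r + gamma * next_q

def critic_loss(y, q):
    """Mean-squared TD error over the minibatch -- Equation (15)."""
    return float(np.mean((y - q) ** 2))

def soft_update(theta_target, theta_online, tau=0.005):
    """Polyak averaging of target parameters -- Equations (18) and (19)."""
    return tau * theta_online + (1.0 - tau) * theta_target
```

With a small $\tau$, the target networks trail the online networks slowly, which is what stabilizes the bootstrapped targets $y_k$ during training.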

3.2. Temperature Control Architecture for PEMFC Based on Dual DDPG-PID

The thermal management system of a PEMFC consists of three main modules, namely the water pump, radiator, and stack, which together aim to maintain a stable operating temperature. Temperature regulation involves controlling the coolant flow via the pump and the airflow through the radiator. The coolant cools the stack, while the radiator cools the coolant and maintains its inlet temperature. This paper proposes a dual DDPG-PID approach to improve the performance of PID controllers in handling strongly coupled systems. The fuel cell thermal management system (FCTMS) framework based on dual DDPG-PID is shown in Figure 4, and the agent’s output action structure is illustrated in Figure 5.

3.3. Details of the DDPG

The agent in the DDPG algorithm primarily consists of the state space, reward function, and action space. The state space is designed to characterize the thermal state and cooling dynamics of the system, enabling the reinforcement-learning agent to optimally regulate the coolant and airflow rates. The reward function serves as a constraint for the agent, penalizing excessive training deviations while providing rewards when the training performance meets the expected criteria. The action space, representing the agent’s output, corresponds to the gain parameters of the PID controller in the DDPG-PID controller. The following sections provide a detailed description of these three components.

3.3.1. State

The state space of the DDPG algorithm includes coolant flow rate, airflow through the radiator, stack temperature error and its derivative, and the coolant temperature difference error from inlet to outlet and its derivative. The state space expression of the fuel cell is as follows:
s_{t1} = \{e_1, \dot{e}_1, e_2, \dot{e}_2, W_{cl1}\}
s_{t2} = \{e_1, \dot{e}_1, e_2, \dot{e}_2, W_{air1}\}
where $e_1$ signifies the disparity between the fuel cell temperature and the target control temperature, $e_2$ captures the deviation of the coolant inlet–outlet temperature differential from the target differential, and $\dot{e}_1$ and $\dot{e}_2$ are the time derivatives of the two errors.

3.3.2. Action

In the DDPG algorithm, the output values of actions are constrained within a specified range and appropriately expanded. During training, the actor within the DDPG algorithm agent further fine-tunes the action parameters in the Simulink environment, restoring the action space. The expression is as follows:
a_1 = \{K_{P1}/100,\ K_{I1}/100,\ K_{D1}/100\}
a_2 = \{K_{P2},\ K_{I2}/10,\ K_{D2}/10\}
where $K_{P1/2}$, $K_{I1/2}$, and $K_{D1/2}$ are the outputs of the DDPG agents within a bounded range. The divisors applied to these outputs differ because, to keep the action limits of both agent1 and agent2 within a two-digit range and to avoid large discrepancies in the magnitudes of the loss parameters during iteration, the raw actions are multiplied by different amplification gains; correspondingly, different coefficients are used when the action parameters are rescaled during fine-tuning in the Simulink environment.
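The rescaling of Equations (20) and (21) amounts to two fixed element-wise maps from raw agent outputs to PID gains. A minimal sketch (the raw action values below are illustrative):

```python
def rescale_pump_action(a1):
    """Map agent1's raw output to pump-side PID gains, per Equation (20)."""
    kp, ki, kd = a1
    return kp / 100.0, ki / 100.0, kd / 100.0

def rescale_fan_action(a2):
    """Map agent2's raw output to radiator-side PID gains, per Equation (21)."""
    kp, ki, kd = a2
    return kp, ki / 10.0, kd / 10.0

gains_pump = rescale_pump_action((50.0, 20.0, 10.0))
gains_fan = rescale_fan_action((5.0, 20.0, 10.0))
```

Keeping both agents' raw actions in a similar two-digit range, then dividing down per side, lets the two actors train on comparably scaled losses while still producing gains of very different magnitudes.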

3.3.3. Reward Function

The agent continuously gains more rewards in the PEMFC system environment to approximate optimal temperature control and ideal temperature difference control. During the experimentation phase, the agent undergoes separate training sessions dedicated to controlling coolant flow and cooling airflow. During training, the agent’s output serves as the input for the PID controller. The PID controller calculates the corresponding control input based on the current state and control error. The reward function for the agent during training is defined as follows:
r_t =
\begin{cases}
\alpha_1\left(C_1 - \xi_1 (e_1 + 1)^2\right) + \alpha_2\left(C_2 - \xi_2 (e_2 + 1)^2\right) + c, & |e_1| \le 0.1,\ |e_2| \le 0.1 \\
\alpha_1\left(C_1 - \xi_1 (e_1 + 1)^2\right) - \alpha_2\left(C_3 + \xi_2 (e_2 + 1)^2\right) + c, & |e_1| \le 0.1,\ |e_2| > 0.1 \\
\alpha_2\left(C_2 - \xi_2 (e_2 + 1)^2\right) - \alpha_1\left(C_3 + \xi_1 (e_1 + 1)^2\right) + c, & |e_1| > 0.1,\ |e_2| \le 0.1 \\
-\alpha_1\left(C_4 + \xi_1 (e_1 + 1)^2\right) - \alpha_2\left(C_3 + \xi_2 (e_2 + 1)^2\right) + c, & |e_1| > 0.1,\ |e_2| > 0.1
\end{cases}
where $e_1$ denotes the stack temperature error ($e_1 = T_{st} - 353\,\mathrm{K}$), and $e_2$ denotes the error of the coolant inlet–outlet temperature difference ($e_2 = (T_{st} - T_{st,in}) - 10$). The first branch corresponds to both errors being small: the system performs very well, and the reward function returns a larger positive value, indicating an ideal state. In the second branch, $e_1$ is small but $e_2$ is large, so a penalty (negative value) is applied to $e_2$, reducing the reward. In the third branch, $e_1$ is large and $e_2$ is small, so a penalty is applied to $e_1$, likewise reducing the reward. In the fourth branch, both errors are large, so the reward function imposes a larger penalty and returns a negative value, indicating that the system does not meet the requirements. In the reward function, $\alpha_1$ and $\alpha_2$ are weighting factors that control the strength of the error penalties; $\xi_1$ and $\xi_2$ are error coefficients that regulate how strongly the errors affect the reward; $C_1$, $C_2$, $C_3$, and $C_4$ are constants that adjust the base value of the reward; and $c$ is a constant offset or compensation term. In this study, their values are shown in Table 1. Additionally, detailed training parameters for the DDPG are listed in Table A2.
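The four-branch reward of Equation (22) maps onto a short conditional. All constants below are illustrative placeholders; the paper's actual values are those of Table 1:

```python
def reward(e1, e2, a1=1.0, a2=1.0, xi1=1.0, xi2=1.0,
           C1=10.0, C2=10.0, C3=5.0, C4=5.0, c=0.0):
    """Piecewise reward of Equation (22).

    Constants are illustrative placeholders, not the paper's Table 1 values."""
    p1 = xi1 * (e1 + 1.0) ** 2   # temperature-error penalty term
    p2 = xi2 * (e2 + 1.0) ** 2   # temperature-difference penalty term
    ok1, ok2 = abs(e1) <= 0.1, abs(e2) <= 0.1
    if ok1 and ok2:                                   # both errors small
        return a1 * (C1 - p1) + a2 * (C2 - p2) + c
    if ok1:                                           # only e2 too large
        return a1 * (C1 - p1) - a2 * (C3 + p2) + c
    if ok2:                                           # only e1 too large
        return a2 * (C2 - p2) - a1 * (C3 + p1) + c
    return -a1 * (C4 + p1) - a2 * (C3 + p2) + c       # both too large
```

With these placeholder constants the ordering matches the text: both errors small yields the largest reward, one large error yields a reduced reward, and two large errors yields a clearly negative one.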

4. The Thermal Management Strategy Based on Dual DDPG-PID

In this study, an integrated control method based on a dual deep deterministic policy gradient is used to regulate the optimal operating temperature of the PEMFC. This approach utilizes two reinforcement-learning agents to dynamically train the parameters of two PID controllers, one for the water pump side and the other for the radiator side. Additionally, this approach effectively addresses the challenges encountered by PID controllers when dealing with strongly coupled systems, thus enhancing control performance.
In accordance with RL theory, an agent’s learning process can be likened to an endeavor. It initiates an action, represented as a t , within the present environmental state s t . This action a t acts as the input for the PID controller. By utilizing the current state and control error, the PID controller computes the relevant control output and engages with the environment. Post-interaction, the environment transitions to a subsequent state s t + 1 , while concurrently producing an immediate reward r t for the agent. By employing this reward r t and the current state s t + 1 , the agent determines the subsequent action a t + 1 . Leveraging its own experience and accumulated knowledge, the agent refines action planning to augment the PID controller’s performance for improved adaptation to the environment.
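One step of the loop just described, the agent's action supplying the gains and the PID turning the control error into an actuator command, can be sketched as follows (the discrete PID form, sampling time, and numeric values are illustrative, not the paper's implementation):

```python
class PID:
    """Discrete PID whose gains can be re-tuned online by the RL agent."""
    def __init__(self, dt=0.1):
        self.dt = dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err, kp, ki, kd):
        """Compute the control output for one sample, with agent-supplied gains."""
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return kp * err + ki * self.integral + kd * deriv

# One control step: the agent's action a_t (placeholder gains here) is fed
# to the PID, which maps the temperature error to a coolant-flow command.
pid = PID()
err = 354.2 - 353.0   # stack temperature above the 353 K target
u = pid.step(err, kp=0.5, ki=0.2, kd=0.1)
```

A positive error (stack too hot) yields a positive command, i.e., more coolant flow; in the full strategy the agent then observes $s_{t+1}$ and $r_t$ and emits the next set of gains.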
The temperature control objectives for the PEMFC in this study are a coolant inlet–outlet temperature difference across the stack of $\Delta T = 10\,\mathrm{K}$ ($\Delta T = T_{st,out} - T_{st,in}$) and an optimal stack operating temperature of $T_{ref} = 353\,\mathrm{K}$ ($T_{st} = T_{ref} = T_{st,out}$).

4.1. Training and Simulation Results

In this section, the simulations were run on a computer with an Intel(R) Core(TM) i5-8250U CPU @ 1.80 GHz and 8 GB of RAM. The models described in Section 2 and Section 3 of the paper, as well as the thermal management control model, were developed on the MATLAB 2021b/Simulink platform.
During training, the numerical values of actions output by the agent are determined based on the sizes of the three parameters, P, I, and D, adjusted in the PID controller within the Simulink environment. The exact adjustment rules are as follows. First, to ensure eventual stabilization of the response curve around the stack’s target temperature of 353 K and a coolant inlet–outlet temperature difference of 10 K, and to mitigate the adverse effects of computational precision on the outcome and reduce the possibility of oscillations, the adjustment range of the PID parameters is restricted and appropriately expanded in the DDPG algorithm code. This allows the agent to search for the most suitable parameters within the largest possible numerical range, rather than determining parameters within a small range. Then, the rules for adjusting the numerical values of the proportional coefficient P, integral coefficient I, and derivative coefficient D are determined to maintain consistency with the normal parameter adjustment sequence. Finally, further fine-tuning of the action parameters is conducted in the Simulink environment to restore the action parameter space, ensuring the performance and stability of the controller in practical environments.
To prevent the DDPG from falling into a local optimum, the authors adopt a diverse exploration mechanism in the initial distribution design: the weights of the policy network are randomly initialized so that the initial action distribution is highly diverse, ensuring that the agent explores the state space extensively in the early training stage and effectively reducing the risk of converging to a local optimum. In addition, Gaussian noise is introduced during training to enhance the agent's exploration ability, allowing it to adapt dynamically to the nonlinear, multivariable coupling characteristics of the PEMFC and further improving the global search capability. In the DDPG-PID framework, the initial parameter setting of the PID controllers is also crucial. According to the stack temperature control requirement and the coolant inlet–outlet temperature difference requirement, the authors weight the PID parameter ranges after initializing the PID parameters, which provides a stable starting point for further DDPG optimization, helps the system achieve smooth temperature control at an early stage, and avoids local-optimum traps caused by an unstable initial strategy. Finally, diverse historical experiences are saved in and randomly sampled from the experience replay buffer, which reduces the learning bias caused by temporal correlation and enables the system to explore the state space more comprehensively, further reducing the possibility of falling into a local optimum.

4.2. Analysis of Training Results of Dual DDPG-PID

In this section, the DDPG employs average reward evaluation during the training iterations to assess training effectiveness. During training, priority is given to controlling the stack temperature, namely the coolant flow rate of the water pump, followed by the coolant temperature difference, i.e., the airflow through the radiator. The training outcomes for both the water pump and radiator for the dual DDPG approach are depicted in Figure 6.
The PID controller on the pump side regulates the fuel cell temperature through coolant flow modulation. Figure 6a illustrates the training process curve of the pump-side agent. Initially, the three parameters of the PID controller are randomly initialized. At the beginning of training, a notable deviation exists between the stack temperature and the target control temperature, leading to escalated penalties throughout training. Consequently, the agent’s average reward declines during episodes 1–35. Subsequently, as training progresses, the deviation between the stack temperature and the target control temperature diminishes, resulting in the accrual of a satisfactory number of rewards during training. Hence, convergence begins after 75 episodes and remains steady between episodes 75 and 350.
The PID controller on the radiator side regulates the coolant temperature through airflow regulation. Figure 6b illustrates the training process curve of the radiator-side agent. As shown, the convergence of the agent begins after 55 episodes. This rapid convergence is attributed to the swift response of the radiator-side controller once the stack temperature stabilizes at the target control temperature. It promptly adjusts airflow to cool the coolant, thereby minimizing the disparity between the temperature differentials of the coolant inlet and outlet and the target temperature differential. Consequently, the radiator-side agent achieves faster convergence during the training process.
As can be seen from Figure 7a,b, the PID parameters fluctuate in the early stage of DDPG-PID training because the algorithm is still in its exploratory phase, trying different parameter combinations in search of the optimal control strategy. At this stage, the algorithm continuously adjusts the PID parameters to cope with changes in the system, producing large fluctuations in the parameter values. As training progresses, the algorithm accumulates experience, predicts and adjusts the parameters more accurately, and finally converges to stable values, indicating that the system has found the optimal control strategy and achieved stable control of the fuel cell temperature.
Table 2 shows the constraint ranges and convergence accuracies of PSO-PID and FUZZY-PID in the optimization process. Based on prior experience, PSO-PID and FUZZY-PID are applied to coolant flow control. In their tuning procedures, the parameter ranges of the PID controllers are set first, and the convergence accuracy is set to 0.001 for both, a setting chosen to improve temperature stability. However, temperature control in a fuel cell system involves multivariable coupled control of the coolant and air flows. PSO-PID and FUZZY-PID each target a single flow variable, which makes it difficult to cope with the complex multivariable coupling in the system and increases the difficulty of temperature control.
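As a minimal sketch of how a PSO-PID baseline of this kind searches the bounded gain space, the following uses a generic scalar cost in place of the paper's simulated tracking error; the swarm hyperparameters (inertia 0.7, acceleration 1.5) are conventional choices, not values from the paper.

```python
import numpy as np

def pso_tune(cost, bounds, n_particles=20, max_iter=100, tol=1e-3, seed=0):
    """Minimal particle swarm search over a bounded PID-gain space.

    cost   : callable mapping a gain vector to a scalar tracking-error cost
    bounds : sequence of (low, high) pairs, one per gain
    tol    : stop once the swarm-best cost improves by less than tol,
             mirroring the 0.001 convergence accuracy quoted in Table 2
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    g = pbest[np.argmin(pbest_cost)].copy()
    g_cost = float(pbest_cost.min())
    for _ in range(max_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Conventional inertia / cognitive / social weights (0.7, 1.5, 1.5).
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        improved = c < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], c[improved]
        best = int(np.argmin(pbest_cost))
        converged = g_cost - pbest_cost[best] < tol
        g, g_cost = pbest[best].copy(), float(pbest_cost[best])
        if converged:
            break
    return g, g_cost
```

In the paper's setting the cost would be evaluated by running the coolant-loop simulation with the candidate gains, which is why each PSO iteration is expensive compared with a single PID step.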

4.3. Temperature Regulation in Response to Continuous Variations in Load Current

To assess the effectiveness of the proposed control strategy, this study uses continuous step changes in load current for validation. The load profile lasts 1200 s, with abrupt changes every 200 s: at 200 s, 400 s, and 600 s, the load current increases progressively from 60 A to 180 A, and at 800 s and 1000 s it decreases from 180 A to 150 A. The step changes in load current are illustrated in Figure 8.
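The profile above can be generated as a simple piecewise-constant function. The paper specifies only the 60 A start, the 180 A peak, and the 150 A final value, so the intermediate step levels (equal 40 A increments on the way up and a 165 A intermediate level on the way down) are assumptions of this sketch.

```python
# Breakpoint times (s) and current levels (A). Levels between the stated
# 60 A start, 180 A peak, and 150 A end are assumed equal steps.
_STEPS = [(0, 60.0), (200, 100.0), (400, 140.0), (600, 180.0),
          (800, 165.0), (1000, 150.0)]

def load_current(t):
    """Piecewise-constant load current (A) at time t (s) over the 1200 s test."""
    level = _STEPS[0][1]
    for t_k, i_k in _STEPS:
        if t >= t_k:
            level = i_k
    return level
```

Sampling this function at the simulation time step reproduces the staircase shown in Figure 8.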
Figure 9 demonstrates the step current temperature variation curves of the fuel cell stack under different control strategies. From Figure 9, it can be seen that the D-DDPG PID control strategy exhibits a significant steady-state control advantage in fuel cell temperature control, which can more effectively maintain the temperature around 353 K and has a lower steady-state error when compared with other control strategies. This indicates that the D-DDPG PID has a significant advantage in maintaining temperature stability and can better meet the requirements of fuel cell systems for temperature control accuracy. In addition, through the zoom-in analysis of the local response (at 200 s, 400 s, and 800 s), it can be observed that the D-DDPG PID possesses faster response speed and shorter regulation time during temperature fluctuation, and its temperature profile returns to the set value quickly when compared with the traditional PID, PSO-PID, and FUZZY-PID, which effectively reduces unnecessary temperature fluctuation. This makes D-DDPG PID more suitable for real-time temperature control under complex working conditions.
The D-DDPG PID also excels in overshoot control and robustness. It exhibits almost no significant overshoot, whereas the conventional PID and PSO-PID show larger overshoots that may adversely affect the lifetime and performance of the fuel cell. In addition, the control curve of the D-DDPG PID remains very stable across all fluctuation intervals and adapts quickly to different perturbations, reflecting stronger robustness. In contrast, the traditional PID and FUZZY-PID adapt poorly, showing hysteresis and oscillation under large perturbations, while the D-DDPG PID improves adaptability to uncertain operating conditions through the adaptive mechanism of deep reinforcement learning, achieving high-precision, high-stability temperature control.
Table 3 shows the control data of the different control strategies over different time periods. The data show that the D-DDPG PID outperforms the other strategies in all performance indicators. In terms of average absolute control error, the D-DDPG PID error in the 0–200 s interval is 0.016 K, which is about 75.4%, 66%, and 51.5% lower than that of the traditional PID (0.065 K), FUZZY-PID (0.047 K), and PSO-PID (0.033 K), respectively. In the 400–600 s and 800–1000 s intervals, its absolute errors are reduced by 79.5%, 76.3%, and 64%, and by 80.9%, 80%, and 69.2%, respectively. The absolute maximum overshoots of the D-DDPG PID in the three intervals are 0.31 K, 0.26 K, and 0.13 K, compared with the PSO-PID's 0.64 K, 0.51 K, and 0.25 K; the overshoot amplitude is reduced by about 49.5% on average, showing stronger overshoot suppression. In addition, the average tuning times of the D-DDPG PID are 33 s, 29 s, and 19 s in the 0–200 s, 400–600 s, and 800–1000 s intervals, about 24.2%, 25.6%, and 13.6% shorter than the PSO-PID's 41 s, 39 s, and 22 s. These data further indicate that the D-DDPG PID significantly outperforms the conventional control algorithms in temperature control stability and dynamic response speed and has stronger robustness and adaptability, especially under complex operating conditions.
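The three indicators reported in Table 3 can be computed from a simulated temperature trace roughly as follows; the ±0.05 K settling band is an illustrative choice, as the paper does not state its tolerance.

```python
import numpy as np

def control_metrics(t, temp, setpoint=353.0, band=0.05):
    """Per-interval indicators: mean absolute error, absolute maximum
    overshoot, and settling time (time after which the response stays
    within +/-band K of the setpoint; None if it never settles)."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(temp, dtype=float) - setpoint
    mae = float(np.mean(np.abs(e)))
    overshoot = float(np.max(np.abs(e)))
    inside = np.abs(e) <= band
    settle = None
    for k in range(len(t)):
        if inside[k:].all():
            settle = float(t[k] - t[0])
            break
    return mae, overshoot, settle
```

Applying this to each 200 s interval of the simulated trace yields numbers directly comparable with the table's columns.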
Figure 10 presents the simulation results of the coolant flow rates under different control strategies. As observed from the figure, at 200 s, 400 s, and 800 s, when the load increases, the operating temperature of the fuel cell rises accordingly. To achieve temperature regulation, the D-DDPG-PID control strategy responds rapidly to system variations, increasing the coolant flow rate required for effective cooling and maintaining stability. Furthermore, at 800 s and 1000 s, as the load decreases, the heat generation of the stack is reduced, leading to a lower coolant demand. Under the regulation of the dual DDPG-PID controller, the coolant flow rate is dynamically adjusted based on system requirements, preventing excessive cooling and enhancing overall system efficiency. In contrast, the compared control algorithms in this study, while also capable of adjusting the coolant flow rate in response to system variations, exhibit a slower response, noticeable overshoot, and oscillations, leading to inferior control stability compared to the D-DDPG-PID control strategy.
Figure 11 presents the simulation results of airflow regulation during the temperature control process of the fuel cell. In the thermal management system of the fuel cell, the adjustment of radiator airflow is primarily aimed at reducing the temperature of the coolant entering the stack, ensuring that the fuel cell operates within its optimal temperature range. When the load increases at 200 s, 400 s, and 800 s, the operating temperature of the fuel cell rises, leading to an increase in both the temperature and flow rate of the coolant exiting the stack. Consequently, the radiator airflow needs to be increased to cool the coolant and enhance its heat dissipation capacity. To achieve the temperature regulation objective, the radiator airflow must be appropriately increased under load increments to meet the cooling demand. From the simulation results, it can be observed that the D-DDPG-PID control algorithm can rapidly respond to changes in the coolant flow rate and effectively regulate the radiator airflow, thereby stabilizing the inlet temperature of the coolant entering the stack. In contrast, at 800 s and 1000 s, when the load decreases, the heat generation of the fuel cell is reduced, leading to a lower cooling demand. Accordingly, the radiator airflow must be decreased to prevent excessive cooling of the coolant, which could otherwise cause temperature non-uniformity within the fuel cell stack. Furthermore, compared to other control algorithms, the D-DDPG-PID algorithm effectively suppresses hysteresis, overshoot, and oscillations in the coolant flow rate adjustment process, resulting in smoother regulation of radiator airflow and improved overall control performance. 
Combined with the simulation results of coolant flow regulation, this control algorithm demonstrates superior stability in the coordinated control of coolant and airflow, enhancing the cooling efficiency of the fuel cell stack while reducing the energy consumption of the cooling system, thereby improving the overall energy efficiency of the system.
Figure 12 shows the simulation results for the coolant temperature difference of the fuel cell. From the result plots, the D-DDPG PID accurately and stably maintains the temperature difference near the 10 K setpoint, while the other control strategies show large fluctuations and oscillations as the temperature difference changes. The local zoomed-in graphs show that the D-DDPG PID responds faster and overshoots less in the fluctuation regions around 0 s, 400 s, and 800 s. In comparison, the traditional PID, PSO-PID, and FUZZY-PID methods exhibit larger overshoots and longer tuning times, with the PSO-PID and FUZZY-PID in particular oscillating across multiple fluctuation intervals. The ability of the D-DDPG PID to quickly and stably hold the coolant temperature difference at around 10 K is mainly attributed to its better coordination of the coolant flow and cooling air control, so that no significant oscillation or fluctuation occurs when the load changes. This further verifies the stability of the D-DDPG PID control algorithm, which effectively suppresses the potential impact of temperature fluctuations on fuel cell performance and ensures efficient operation of the cooling system under complex operating conditions.

5. Conclusions

In this paper, a dual DDPG-PID control strategy is used to enhance the thermal management performance of PEMFCs. The strategy addresses the inherent delay of conventional PID controllers by dynamically matching the temperature and flow rate variations in real time. Simulation results show that the D-DDPG PID reduces the average absolute temperature control error by approximately 75.4%, 66%, and 51.5% compared to conventional PID, FUZZY-PID, and PSO-PID, respectively, in the 0–200 s interval. Additionally, the average tuning time of the D-DDPG PID is shortened by 24.2%, 25.6%, and 13.6% compared to PSO-PID in the corresponding intervals. The strategy effectively alleviates challenges such as multivariable coupling and control delay in PEMFC thermal management. By optimizing the PID parameters through reinforcement learning, it reduces temperature oscillations and minimizes parasitic losses. Although the study is based on simulations, the results provide a solid foundation for future industrial applications, with the potential for broader implementation in other PEMFC subsystems and fuel cell types.

Author Contributions

Methodology, Z.Z.; Software, Z.Z. and D.X.; Validation, Z.Z. and Y.S.; Formal analysis, Z.L.; Investigation, Y.S.; Data curation, K.O.; Writing—original draft, Z.Z.; Writing—review & editing, Z.Z.; Supervision, D.X.; Project administration, K.O. and D.X.; Funding acquisition, D.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by [Major Science and Technology Projects of Wenzhou, China] grant number [ZG2022024] and [Science and Technology Major Project of Fujian Province of China] grant number [2022HZ028024].

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy concerns.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Cathode and Anode Flow Channel Model and Membrane Water Content Model

This appendix primarily introduces the calculation of mass flow rate in gas transmission, gas pressure variations, and membrane water content in the anode and cathode flow channel models. Subsequently, it presents the computational methods for the auxiliary system models of the fuel cell.

Appendix A.1. Cathode and Anode Flow Channel Modeling

The cathode and anode mass flow models reflect the flow characteristics of air and hydrogen within the fuel cell channels. The models employ the ideal gas equation and are expressed through the mass conservation equations for oxygen, nitrogen, hydrogen, and water vapor [42].
$\frac{dm_{O_2,ca}}{dt} = W_{O_2,in\_ca} - W_{O_2,out\_ca} - W_{O_2,react}$
$\frac{dm_{N_2,ca}}{dt} = W_{N_2,in\_ca} - W_{N_2,out\_ca}$
$\frac{dm_{w,ca}}{dt} = W_{v,in\_ca} - W_{v,out\_ca} + W_{v,gen\_ca} + W_{v,mem} - W_{l,out\_ca}$
$\frac{dm_{H_2,an}}{dt} = W_{H_2,in\_an} - W_{H_2,out\_an} - W_{H_2,react\_an}$
$\frac{dm_{w,an}}{dt} = W_{v,in\_an} - W_{v,out\_an} - W_{v,mem} - W_{l,out\_an}$
where $W_{O_2,in\_ca}$ and $W_{O_2,out\_ca}$ are the mass flow rates of oxygen entering and leaving the cathode, $W_{O_2,react}$ is the rate of oxygen consumed by the reaction, $W_{N_2,in\_ca}$ and $W_{N_2,out\_ca}$ are the mass flow rates of nitrogen entering and leaving the cathode, $W_{v,in\_ca}$ and $W_{v,out\_ca}$ are the mass flow rates of vapor entering and leaving the cathode, $W_{v,gen\_ca}$ is the rate of vapor generated by the fuel cell reaction, $W_{v,mem}$ is the mass flow rate of water transferred across the fuel cell membrane, $W_{l,out\_ca}$ is the rate of liquid water leaving the cathode, $W_{H_2,in\_an}$ and $W_{H_2,out\_an}$ are the mass flow rates of hydrogen entering and leaving the anode, $W_{H_2,react\_an}$ is the rate of hydrogen consumed, $W_{v,in\_an}$ and $W_{v,out\_an}$ are the mass flow rates of vapor entering and leaving the anode, $W_{l,out\_an}$ is the rate of liquid water leaving the anode, $m_{O_2,ca}$, $m_{N_2,ca}$, and $m_{H_2,an}$ are the masses of oxygen, nitrogen, and hydrogen, and $m_{w,ca}$ and $m_{w,an}$ are the masses of cathode and anode water, respectively.
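The five mass balances above can be written directly as the right-hand side of an ODE system, as in this sketch:

```python
def channel_mass_balances(W):
    """Right-hand sides of the cathode/anode mass balances.

    W is a dict of boundary mass flow rates in kg/s; returns the time
    derivatives (dm_O2_ca, dm_N2_ca, dm_w_ca, dm_H2_an, dm_w_an).
    """
    dm_O2_ca = W["O2_in_ca"] - W["O2_out_ca"] - W["O2_react"]
    dm_N2_ca = W["N2_in_ca"] - W["N2_out_ca"]
    # Cathode water: generated vapor and membrane crossover add to the balance.
    dm_w_ca = (W["v_in_ca"] - W["v_out_ca"] + W["v_gen_ca"]
               + W["v_mem"] - W["l_out_ca"])
    dm_H2_an = W["H2_in_an"] - W["H2_out_an"] - W["H2_react"]
    # Anode water: crossover to the cathode is a loss term here.
    dm_w_an = W["v_in_an"] - W["v_out_an"] - W["v_mem"] - W["l_out_an"]
    return dm_O2_ca, dm_N2_ca, dm_w_ca, dm_H2_an, dm_w_an
```

Integrating these derivatives over time (e.g., with a fixed-step solver, as Simulink does) tracks the species masses in each flow channel.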
According to the ideal gas law, the empirical formulas for calculating the saturation vapor pressure and the gases in the anode and cathode flow channels are as follows [49]:
$\log_{10} P_{vsat} = -1.69\times10^{-10}T^4 + 3.85\times10^{-7}T^3 - 3.39\times10^{-4}T^2 + 0.143T - 20.92$
$P_{z,k} = \frac{m_{z,k} R_z T_{st}}{V_k}$
$P_{g,in\_ca} = P_{in\_ca} - P_{v,in\_ca}$
$P_{H_2,in\_an} = P_{in\_an} - P_{v,in\_an}$
$P_{v,k\_in} = \phi_{k\_in} P_{vsat}(T_{k\_in})$
where $z = O_2/N_2/H_2/v$, $k = an/ca$, $\phi_{k\_in}$ denotes the gas humidity upon entering the stack, $T_{k\_in}$ signifies the gas temperature upon entering the stack, and $P_{vsat}(T_{k\_in})$ indicates the saturation vapor pressure at temperature $T_{k\_in}$.
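A minimal sketch of the saturation-pressure polynomial and the vapor/dry-gas pressure split above (temperature in K and pressure in kPa, per the usual convention for this empirical fit):

```python
def p_vsat(T):
    """Empirical saturation vapor pressure fit; T in K, result in kPa."""
    log10_p = (-1.69e-10 * T ** 4 + 3.85e-7 * T ** 3
               - 3.39e-4 * T ** 2 + 0.143 * T - 20.92)
    return 10.0 ** log10_p

def inlet_partial_pressures(P_in, phi_in, T_in):
    """Split a humid inlet stream at total pressure P_in (kPa) into the
    vapor partial pressure P_v = phi_in * p_vsat(T_in) and the dry-gas
    remainder P_g = P_in - P_v."""
    P_v = phi_in * p_vsat(T_in)
    return P_v, P_in - P_v
```

Near the 353 K stack target the fit returns a few tens of kPa, consistent with the saturation pressure of water around 80 °C.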
Since fuel cells generate heat during electrochemical reactions and heat is transferred when gases enter and exit the stack, it is necessary to calculate the mass flow rates of gases entering and exiting the fuel cell stack, as well as the mass flow rate of the gases involved in the electrochemical reactions. The detailed calculation equations are as follows:
$W_{g,in\_ca} = \frac{W_{ca,in}}{1 + \frac{M_v}{M_{g,in\_ca}}\frac{P_{v,in\_ca}}{P_{g,in\_ca}}}$
$W_{v,in\_ca} = W_{ca,in} - W_{g,in\_ca}$
$W_{H_2,in\_an} = \frac{W_{an,in}}{1 + \frac{M_v}{M_{H_2}}\frac{P_{v,in\_an}}{P_{H_2,in\_an}}}$
$W_{v,in\_an} = W_{an,in} - W_{H_2,in\_an}$
$M_{g,in\_ca} = y_{O_2,in\_ca} M_{O_2} + (1 - y_{O_2,in\_ca}) M_{N_2}$
The mass flow rates of oxygen and nitrogen can be determined by the following formulas:
$W_{O_2,in\_ca} = x_{O_2,in\_ca} W_{g,in\_ca}$
$W_{N_2,in\_ca} = (1 - x_{O_2,in\_ca}) W_{g,in\_ca}$
In the above equation, x O 2 , i n _ c a represents the mass fraction of oxygen, and its calculation formula is:
$x_{O_2,in\_ca} = \frac{y_{O_2,in\_ca} M_{O_2}}{y_{O_2,in\_ca} M_{O_2} + (1 - y_{O_2,in\_ca}) M_{N_2}}$
where $M_z$ is the molar mass of the corresponding gas, $M_{g,in\_ca}$ is the molar mass of dry air, $k = an/ca$, $W_{k,in}$ represents the total gas mass flow entering the cathode or anode, $W_{v,k\_in}$ is the mass flow rate of water vapor entering the cathode or anode, and $y_{O_2,in\_ca} = 0.21$.
The mass flow rate of gases exiting the cathode can be calculated using empirical Formulas (A19)–(A25), as follows:
$W_{out\_ca} = k_{out\_ca}(P_{ca} - P_{out\_ca})$
$W_{g,out\_ca} = \frac{W_{out\_ca}}{1 + \frac{M_v P_{v,out\_ca}}{M_{g,ca} P_{g,out\_ca}}}$
$W_{v,out\_ca} = W_{out\_ca} - W_{g,out\_ca}$
$W_{O_2,out\_ca} = x_{O_2,ca} W_{g,out\_ca}$
$W_{N_2,out\_ca} = (1 - x_{O_2,ca}) W_{g,out\_ca}$
Similar to the calculation of the mass flow rate of gases exiting the cathode, the mass flow rate of gases exiting the anode can be calculated as follows:
$W_{H_2,an\_out} = \frac{W_{an\_out}}{1 + \frac{M_v P_{v,an}}{M_{H_2} P_{H_2,an}}}$
$W_{v,an\_out} = W_{an\_out} - W_{H_2,an\_out}$
However, to reduce the computational load of the model, this study assumes that hydrogen is completely consumed in the reaction (i.e., 100% hydrogen utilization on the anode side). Therefore, the model does not include a hydrogen recirculation pump.
The mass flow rates of hydrogen, oxygen, and water vapor involved in the fuel cell electrochemical reaction can be calculated using empirical Formulas (A26)–(A28).
$W_{v,ca,gen} = M_v \frac{n I_{st}}{2F}$
$W_{O_2,react} = M_{O_2} \frac{n I_{st}}{4F}$
$W_{H_2,react} = M_{H_2} \frac{n I_{st}}{2F}$
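Formulas (A26)–(A28) follow Faraday's law and can be sketched directly; the cell count of 400 in the test below is an arbitrary illustration, not the paper's stack size.

```python
F = 96485.0                                      # Faraday constant, C/mol
M_H2, M_O2, M_V = 2.016e-3, 32.0e-3, 18.02e-3    # molar masses, kg/mol

def reaction_flows(I_st, n_cells):
    """Electrochemical consumption/production rates in kg/s: hydrogen
    consumed (2 electrons per molecule), oxygen consumed (4 electrons),
    and water vapor generated (one molecule per H2)."""
    W_H2 = M_H2 * n_cells * I_st / (2.0 * F)
    W_O2 = M_O2 * n_cells * I_st / (4.0 * F)
    W_v_gen = M_V * n_cells * I_st / (2.0 * F)
    return W_H2, W_O2, W_v_gen
```

These reaction terms feed directly into the mass balances of the flow channel model and into the stack heat-generation calculation.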

Appendix A.2. Membrane Water Content Modeling

The membrane water content model describes both the membrane's humidity and the rate of water flow through it. Water transport within the membrane occurs primarily through electro-osmotic drag and back diffusion [40]. Hence, the water permeation through the membrane is calculated as follows:
$N_{v,membr} = n_d \frac{i}{F} - D_w \frac{c_{v,ca} - c_{v,an}}{t_m}$
$c_{v,k} = \frac{\rho_{mem,dry}}{M_{m,dry}} \lambda_k$
$\lambda_k = \begin{cases} 0.043 + 17.81\alpha_k - 39.85\alpha_k^2 + 36.0\alpha_k^3, & 0 < \alpha_k \le 1 \\ 14 + 1.4(\alpha_k - 1), & 1 < \alpha_k \le 3 \end{cases}$
$\lambda_{mo} = \frac{\lambda_{an} + \lambda_{ca}}{2}$
where $\alpha_k = P_{v,k}/P_{vsat}(T_{st})$, $D_w$ is the water diffusion coefficient, and $n_d$ is the electro-osmotic drag coefficient. Their calculations are as follows.
$D_w = D_\lambda \exp\left(2416\left(\frac{1}{303} - \frac{1}{T_{fc}}\right)\right)$
$n_d = 0.0029\lambda_{mo}^2 + 0.05\lambda_{mo} - 3.4\times10^{-19}$
$D_\lambda = \begin{cases} 10^{-6}, & \lambda_{mo} \le 2 \\ 10^{-6}(1 + 2(\lambda_{mo} - 2)), & 2 < \lambda_{mo} \le 3 \\ 10^{-6}(3 - 1.67(\lambda_{mo} - 3)), & 3 < \lambda_{mo} \le 4.5 \\ 1.25\times10^{-6}, & 4.5 < \lambda_{mo} \end{cases}$
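The membrane water content relations above can be sketched as follows, assuming the standard value $10^{-6}$ for the first branch of $D_\lambda$:

```python
import math

def water_content(alpha):
    """Membrane water content lambda as a function of water activity alpha."""
    if alpha <= 1.0:
        return 0.043 + 17.81 * alpha - 39.85 * alpha ** 2 + 36.0 * alpha ** 3
    return 14.0 + 1.4 * (alpha - 1.0)

def electro_osmotic_drag(lam):
    """Electro-osmotic drag coefficient n_d(lambda_mo)."""
    return 0.0029 * lam ** 2 + 0.05 * lam - 3.4e-19

def water_diffusivity(lam, T_fc):
    """Water diffusion coefficient D_w = D_lambda * exp(2416(1/303 - 1/T_fc)).
    The first D_lambda branch uses the standard value 1e-6 (assumption)."""
    if lam <= 2.0:
        d_lam = 1e-6
    elif lam <= 3.0:
        d_lam = 1e-6 * (1.0 + 2.0 * (lam - 2.0))
    elif lam <= 4.5:
        d_lam = 1e-6 * (3.0 - 1.67 * (lam - 3.0))
    else:
        d_lam = 1.25e-6
    return d_lam * math.exp(2416.0 * (1.0 / 303.0 - 1.0 / T_fc))
```

With these pieces, the net membrane water flux follows from the drag term $n_d i/F$ minus the diffusion term across the membrane thickness.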

Appendix B. PEMFC Auxiliary System Modeling

Appendix B.1. Air Compressor Modeling

The air compressor model consists of two main sections: the first uses a static compressor MAP chart to govern airflow, followed by thermodynamic equations to calculate the outlet air temperature and power output. The second section considers the rotational inertia of the compressor and motor to determine the compressor speed, which is then used in the compressor map to calculate air mass flow [50].
The inlet conditions of the compressor fluctuate with variations in inlet flow, pressure, and temperature, so the mass flow and speed used in the compressor map must be corrected. The corrected mass flow is $W_{ad} = W_{cp}\sqrt{T_{ad}}/P_{ad}$ and the corrected speed is $N_{ad} = N_{cp}/\sqrt{T_{ad}}$, where the corrected pressure is $P_{ad} = P_{cp,in}/1\,\mathrm{atm}$ and the corrected temperature is $T_{ad} = T_{cp,in}/288\,\mathrm{K}$. Following the methodology outlined in reference [51], $\psi$ is determined according to the following formula:
$\psi = \frac{C_p T_{cp,in}\left[\left(P_{cp,out}/P_{cp,in}\right)^{\frac{\gamma-1}{\gamma}} - 1\right]}{\frac{1}{2}U_c^2}$
$U_c = \frac{\pi}{60} d_c N_{ad}$
Building upon the preceding analysis, the standardized mass flow rate of the air compressor can be determined using the following methodology.
$\Phi = \frac{W_{ad}}{\rho_a \frac{\pi}{4} d_c^2 U_c}$
The normalized compressor flow rate, Φ , is:
$\Phi = \Phi_{max}\left[1 - \exp\left(\alpha\left(\frac{\psi}{\psi_{max}} - 1\right)\right)\right]$
$\Phi_{max} = a_4 M^4 + a_3 M^3 + a_2 M^2 + a_1 M + a_0$
$\psi_{max} = b_5 M^5 + b_4 M^4 + b_3 M^3 + b_2 M^2 + b_1 M + b_0$
$\alpha = c_2 M^2 + c_1 M + c_0$
where M is the Mach number of the inlet duct, defined as follows:
$M = \frac{U_c}{\sqrt{\gamma R_a T_{cp,in}}}$
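The compressor-map fit above can be sketched as follows; the polynomial coefficient lists are passed in explicitly (lowest order first), so the illustrative values in the test below are placeholders, not the Table A1 coefficients.

```python
import math

def blade_speed(d_c, N_ad):
    """Blade tip speed U_c = (pi/60) * d_c * N_ad."""
    return math.pi / 60.0 * d_c * N_ad

def head_parameter(T_in, P_in, P_out, U_c, gamma=1.4, Cp=1004.0):
    """Dimensionless head psi across the compressor."""
    return (Cp * T_in * ((P_out / P_in) ** ((gamma - 1.0) / gamma) - 1.0)
            / (0.5 * U_c ** 2))

def compressor_flow(psi, M, a, b, c):
    """Normalized compressor flow Phi from the head parameter psi and the
    inlet-duct Mach number M. a, b, c are the polynomial coefficient lists
    for Phi_max, psi_max, and alpha (lowest order first)."""
    phi_max = sum(ai * M ** i for i, ai in enumerate(a))
    psi_max = sum(bi * M ** i for i, bi in enumerate(b))
    alpha = sum(ci * M ** i for i, ci in enumerate(c))
    return phi_max * (1.0 - math.exp(alpha * (psi / psi_max - 1.0)))
```

With the fitted coefficients, $\Phi$ is evaluated on the corrected map and the air mass flow recovered from $W_{ad} = \Phi \rho_a \frac{\pi}{4} d_c^2 U_c$.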
The relationship between compressor speed, flow rate, and pressure ratio (PR) obtained using a nonlinear fitting method is shown in Figure A1.
Figure A1. Air compressor speed–flow–pressure ratio chart.

Appendix B.2. Manifold Modeling and Return Manifold Modeling

The intake manifold is used to transport gases to the stack, while the return manifold primarily transmits gases leaving the stack. Based on the energy conservation equation for gases, their expressions can be represented by the following empirical equations.
$\frac{dP_{sm}}{dt} = \frac{\gamma R_a}{V_{sm}}\left(W_{cp} T_{cp,out} - W_{sm,out} T_{sm}\right)$
$W_{sm,out} = K_{sm,out}(P_{sm} - P_{ca})$
$\frac{dP_{rm}}{dt} = \frac{R_a T_{rm}}{V_{rm}}\left(W_{ca,out} - W_{rm,out}\right)$
$W_{rm,out} = \frac{C_{D,rm} A_{T,rm} P_{rm}}{\sqrt{\bar{R} T_{rm}}}\left(\frac{P_{atm}}{P_{rm}}\right)^{\frac{1}{\gamma}}\left\{\frac{2\gamma}{\gamma-1}\left[1 - \left(\frac{P_{atm}}{P_{rm}}\right)^{\frac{\gamma-1}{\gamma}}\right]\right\}^{\frac{1}{2}}, \quad \frac{P_{atm}}{P_{rm}} > 0.528$
$W_{rm,out} = \frac{C_{D,rm} A_{T,rm} P_{rm}}{\sqrt{\bar{R} T_{rm}}}\,\gamma^{\frac{1}{2}}\left(\frac{2}{\gamma+1}\right)^{\frac{\gamma+1}{2(\gamma-1)}}, \quad \frac{P_{atm}}{P_{rm}} \le 0.528$
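The two nozzle-flow branches above can be sketched as a single function that switches at the critical pressure ratio 0.528; the gas constant is taken as the air value from Table A1 (an assumption for this sketch).

```python
import math

def return_manifold_flow(P_rm, T_rm, P_atm=101325.0, C_D=0.0124,
                         A_T=0.002, gamma=1.4, R=286.9):
    """Return-manifold outlet mass flow (kg/s). Defaults for C_D and A_T
    are the Table A1 values; R is assumed to be the air gas constant."""
    ratio = P_atm / P_rm
    k = C_D * A_T * P_rm / math.sqrt(R * T_rm)
    if ratio > 0.528:
        # Subcritical: flow depends on the downstream/upstream pressure ratio.
        term = (2.0 * gamma / (gamma - 1.0)) * (1.0 - ratio ** ((gamma - 1.0) / gamma))
        return k * ratio ** (1.0 / gamma) * math.sqrt(term)
    # Choked: mass flow is independent of the downstream pressure.
    return (k * math.sqrt(gamma)
            * (2.0 / (gamma + 1.0)) ** ((gamma + 1.0) / (2.0 * (gamma - 1.0))))
```

The branch condition matters in practice: once the manifold pressure is high enough that the ratio drops below 0.528, further upstream pressure rises increase the flow only through the linear $P_{rm}$ factor.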

Appendix B.3. Air Cooler Modeling

Since the air exiting the air compressor has a high temperature, it must be cooled before entering the stack. However, temperature variations affect the humidity of the gas, and the humidity of the gas exiting the air cooler can be calculated using the following equation.
$\phi_{cl} = \frac{P_{vsat}(T_{atm})\,\phi_{atm}\,P_{cl}}{P_{vsat}(T_{cl})\,P_{atm}}$

Appendix B.4. Air Humidifier Modeling

The humidifier’s function is to humidify the air entering the stack, which facilitates the electrochemical reactions within the stack. Assuming that the humidifier adjusts the air humidity to the set value, the water vapor partial pressure of the gas at the humidifier outlet can be calculated using the following formula.
$P_{v,cl} = \phi_{cl} P_{vsat}(T_{cl})$
$P_{v,hm} = \phi_{hm} P_{vsat}(T_{hm})$
$P_{g,cl} = P_{cl} - P_{v,cl}$
The mass flow rate of dry air entering and exiting the humidifier remains constant, $W_{g,cl} = W_{g,hm}$, while the vapor flow rate increases with the injected water flow.
$W_{v,hm} = \frac{P_{v,hm} W_{g,cl}}{P_{g,cl}} \frac{M_v}{M_a}$
The water flow rate added can be determined using the subsequent formula.
$W_{v,inj} = W_{v,hm} - W_{v,cl} = \frac{P_{v,hm} W_{g,cl}}{P_{g,cl}} \frac{M_v}{M_a} - \left(W_{cl} - W_{g,cl}\right)$
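The humidifier water-injection balance above can be sketched as follows, assuming for simplicity a single saturation pressure (i.e., equal cooler-outlet and humidifier-outlet temperatures):

```python
def humidifier_injection(W_g_cl, P_g_cl, phi_hm, phi_cl, P_vsat_T):
    """Water mass flow (kg/s) the humidifier must inject: vapor required at
    the target outlet humidity phi_hm minus vapor already carried by the air
    leaving the cooler at humidity phi_cl. A single saturation pressure
    P_vsat_T is used for both states (assumed equal temperatures)."""
    M_V, M_A = 18.02e-3, 28.97e-3  # molar masses of water and dry air, kg/mol
    vapor_flow = lambda phi: (phi * P_vsat_T / P_g_cl) * (M_V / M_A) * W_g_cl
    return vapor_flow(phi_hm) - vapor_flow(phi_cl)
```

When the incoming stream is already at the target humidity, the required injection is zero, and it grows linearly with the humidity deficit.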

Appendix B.5. Hydrogen Supply System

The supply of hydrogen primarily comes from the high-pressure hydrogen cylinders. Therefore, the mass flow rate of hydrogen entering the stack is mainly achieved by adjusting the pressure drop in the supply pipeline, and the empirical equation for its calculation is as follows:
$W_{an,in} = K_1 (K_2 P_{sm} - P_{an})$
The above content primarily pertains to the modeling of the PEMFC system. The parameters associated with the formulas are shown in Table A1. These data are sourced from reference [28].
Table A1. Parameters used in the modeling of the PEMFC system.
Symbol | Variable | Value
$\gamma$ | Ratio of specific heats of air | 1.4
$C_P$ | Constant-pressure specific heat of air | 1004 J/(kg·K)
$R_a$ | Air gas constant | 286.9 J/(kg·K)
$\rho_a$ | Air density | 1.23 kg/m³
$d_c$ | Compressor diameter | 0.2286 m
$a_0$ | Constant | $2.21195\times10^{-3}$
$a_1$ | Constant | $4.63685\times10^{-5}$
$a_2$ | Constant | $5.36235\times10^{-4}$
$a_3$ | Constant | $2.70399\times10^{-4}$
$a_4$ | Constant | $3.69906\times10^{-5}$
$b_0$ | Constant | 0.43331
$b_1$ | Constant | 0.68344
$b_2$ | Constant | 0.80121
$b_3$ | Constant | 0.42937
$b_4$ | Constant | 0.10581
$b_5$ | Constant | $9.78775\times10^{-3}$
$c_0$ | Constant | 2.44419
$c_1$ | Constant | 1.34837
$c_2$ | Constant | 1.76567
$\eta_{cm}$ | Motor efficiency | 98%
$\eta_{cp}$ | Maximum compressor efficiency | 80%
$J_{cp}$ | Combined inertia | $5\times10^{-5}$ kg·m²
$R_{cm}$ | Motor constant | 0.82
$k_t$ | Motor constant | 0.0153 N·m/A
$k_v$ | Motor constant | 0.153 V/(rad/s)
$V_{sm}$ | Supply manifold volume | 0.02 m³
$k_{sm,out}$ | Outlet orifice constant | $0.3629\times10^{-5}$ kg/(s·Pa)
$V_{rm}$ | Return manifold volume | 0.005 m³
$K_1$ | Gain | 2.1 kg/(s·kPa)
$K_2$ | Pressure drop coefficient | 0.94
$C_{D,rm}$ | Return manifold discharge coefficient | 0.0124
$A_{T,rm}$ | Return manifold throttle area | 0.002 m²

Appendix C. DDPG Algorithm Pseudocode and Hyperparameters

Algorithm A1. Dual DDPG PID algorithm
1: Randomly initialize critic network $Q(s, a|\theta^q)$ and actor network $\vartheta(s|\theta^\vartheta)$ with parameters $\theta^q$, $\theta^\vartheta$
2: Initialize the target networks for critic and actor: $\theta^{q'} \leftarrow \theta^q$, $\theta^{\vartheta'} \leftarrow \theta^\vartheta$
3: Initialize replay buffer $R$ and soft target update rate $\tau$
4: for episode = 1 to M do
5:  Initialize a random process $N$ for action exploration
6:  Receive the initial observation state $s_t$ of the PEMFC system
7:  Initialize the state of the PID controller:
8:  integral = 0; prev_error = 0
9:  for t = 1 to T do
10:  Select action $a_t = \vartheta(s_t|\theta^\vartheta) + N_t$ according to the current actor and exploration noise $N_t$
11:  Execute action $a_t$; obtain reward $r_t$ and new state $s_{t+1}$ from the PEMFC system
12:  Calculate the output of the PID controller:
13:  error = target − current_state; integral = integral + error; derivative = error − prev_error
14:  action = P·error + I·integral + D·derivative
15:  Store transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
16:  Sample a random minibatch of K transitions $(s_k, a_k, r_k, s_{k+1})$ from the replay memory $R$
17:  Set $y_k = r_k + \gamma Q'(s_{k+1}, \vartheta'(s_{k+1}|\theta^{\vartheta'})|\theta^{q'})$
18:  Update the critic parameters $\theta^q$ by minimizing the loss $L = \frac{1}{K}\sum_k (y_k - Q(s_k, a_k|\theta^q))^2$
19:  Update the actor policy using the sampled policy gradient: $\nabla_{\theta^\vartheta} J \approx \frac{1}{K}\sum_k \nabla_a Q(s, a|\theta^q)|_{s=s_k, a=\vartheta(s_k)}\, \nabla_{\theta^\vartheta}\vartheta(s|\theta^\vartheta)|_{s_k}$
20:  Soft-update the target networks: $\theta^{q'} \leftarrow \tau\theta^q + (1-\tau)\theta^{q'}$; $\theta^{\vartheta'} \leftarrow \tau\theta^\vartheta + (1-\tau)\theta^{\vartheta'}$
21: end for
22: end for
Table A2. Hyperparameters of DDPG.
Symbol | Variable | Value
$l_a$ | Actor network learning rate | $10^{-4}$
$l_c$ | Critic network learning rate | $5\times10^{-3}$
$N$ | Minibatch size | 32
$\gamma$ | Discount factor | 0.99
$N_R$ | Experience buffer length | $1\times10^{6}$
$v_a$ | Noise variance | 0.3
$\tau$ | Soft target update rate | $10^{-3}$

References

  1. Xia, W.; Apergis, N.; Bashir, M.F.; Ghosh, S.; Doğan, B.; Shahzad, U. Investigating the role of globalization, and energy consumption for environmental externalities: Empirical evidence from developed and developing economies. Renew. Energy 2022, 183, 219–228. [Google Scholar]
  2. Wang, J.; Hussain, S.; Sun, X.; Chen, X.; Ma, Z.; Zhang, Q.; Yu, X.; Zhang, P.; Ren, X.; Saqib, M.; et al. Nitrogen application at a lower rate reduce net field global warming potential and greenhouse gas intensity in winter wheat grown in semi-arid region of the Loess Plateau. Field Crops Res. 2022, 280, 108475. [Google Scholar]
  3. Mao, X.; Liu, S.; Tan, J.; Hu, H.; Lu, C.; Xuan, D. Multi-objective optimization of gradient porosity of gas diffusion layer and operation parameters in PEMFC based on recombination optimization compromise strategy. Int. J. Hydrogen Energy 2023, 48, 13294–13307. [Google Scholar]
  4. Liu, S.; Tan, J.; Hu, H.; Lu, C.; Xuan, D. Multi-objective optimization of proton exchange membrane fuel cell geometry and operating parameters based on three new performance evaluation indexes. Energy Convers. Manag. 2023, 277, 116642. [Google Scholar]
  5. Huang, Y.; Kang, Z.; Mao, X.; Hu, H.; Tan, J.; Xuan, D. Deep reinforcement learning based energy management strategy considering running costs and energy source aging for fuel cell hybrid electric vehicle. Energy 2023, 283, 129177. [Google Scholar]
  6. Mao, X.; Liu, S.; Huang, Y.; Kang, Z.; Xuan, D. Multi-flow channel proton exchange membrane fuel cell mass transfer and performance analysis. Int. J. Heat Mass Transf. 2023, 215, 124497. [Google Scholar]
  7. Jian, Q.; Huang, B.; Luo, L.; Zhao, J.; Cao, S.; Huang, Z. Experimental investigation of the thermal response of open-cathode proton exchange membrane fuel cell stack. Int. J. Hydrogen Energy 2018, 43, 13489–13500. [Google Scholar]
  8. Han, J.; Park, J.; Yu, S. Control strategy of cooling system for the optimization of parasitic power of automotive fuel cell system. Int. J. Hydrogen Energy 2015, 40, 13549–13557. [Google Scholar] [CrossRef]
  9. Yu, X.; Zhou, B.; Sobiesiak, A. Water and thermal management for Ballard PEM fuel cell stack. J. Power Sources 2005, 147, 184–195. [Google Scholar] [CrossRef]
  10. Kim, K.; von Spakovsky, M.R.; Wang, M.; Nelson, D.J. Dynamic optimization under uncertainty of the synthesis/design and operation/control of a proton exchange membrane fuel cell system. J. Power Sources 2012, 205, 252–263. [Google Scholar] [CrossRef]
  11. Ou, K.; Yuan, W.-W.; Choi, M.; Yang, S.; Kim, Y.-B. Performance increase for an open-cathode PEM fuel cell with humidity and temperature control. Int. J. Hydrogen Energy 2017, 42, 29852–29862. [Google Scholar] [CrossRef]
  12. Wang, Y.; Xu, H.; Wang, X.; Gao, Y.; Su, X.; Qin, Y.; Xing, L. Multi-sub-inlets at cathode flow-field plate for current density homogenization and enhancement of PEM fuel cells in low relative humidity. Energy Convers. Manag. 2022, 252, 115069. [Google Scholar]
  13. Liso, V.; Nielsen, M.P.; Kær, S.K.; Mortensen, H.H. Thermal modeling and temperature control of a PEM fuel cell system for forklift applications. Int. J. Hydrogen Energy 2014, 39, 8410–8420. [Google Scholar] [CrossRef]
  14. Zhao, X.; Li, Y.; Liu, Z.; Li, Q.; Chen, W. Thermal management system modeling of a water-cooled proton exchange membrane fuel cell. Int. J. Hydrogen Energy 2015, 40, 3048–3056. [Google Scholar] [CrossRef]
  15. Yu, S.; Jung, D. Thermal management strategy for a proton exchange membrane fuel cell system with a large active cell area. Renew. Energy 2008, 33, 2540–2548. [Google Scholar] [CrossRef]
  16. Liu, Z.; Chen, J.; Kumar, L.; Jin, L.; Huang, L. Model-based decoupling control for the thermal management system of proton exchange membrane fuel cells. Int. J. Hydrogen Energy 2023, 48, 19196–19206. [Google Scholar]
  17. Li, J.; Yang, B.; Yu, T. Distributed deep reinforcement learning-based coordination performance optimization method for proton exchange membrane fuel cell system. Sustain. Energy Technol. Assess. 2022, 50, 101814. [Google Scholar] [CrossRef]
  18. Yin, L.; Li, Q.; Wang, T.; Liu, L.; Chen, W. Real-time thermal Management of Open-Cathode PEMFC system based on maximum efficiency control strategy. Asian J. Control 2019, 21, 1796–1810. [Google Scholar] [CrossRef]
  19. Zhao, R.; Qin, D.; Chen, B.; Wang, T.; Wu, H. Thermal Management of Fuel Cells Based on Diploid Genetic Algorithm and Fuzzy PID. Appl. Sci. 2023, 13, 520. [Google Scholar] [CrossRef]
  20. Cheng, S.; Fang, C.; Xu, L.; Li, J.; Ouyang, M. Model-based temperature regulation of a PEM fuel cell system on a city bus. Int. J. Hydrogen Energy 2015, 40, 13566–13575. [Google Scholar] [CrossRef]
  21. Jia, Y.; Zhang, R.; Lv, X.; Zhang, T.; Fan, Z. Research on Temperature Control of Fuel-Cell Cooling System Based on Variable Domain Fuzzy PID. Processes 2022, 10, 534. [Google Scholar] [CrossRef]
  22. Yu, Y.; Chen, M.; Zaman, S.; Xing, S.; Wang, M.; Wang, H. Thermal management system for liquid-cooling PEMFC stack: From primary configuration to system control strategy. ETransportation 2022, 12, 100165. [Google Scholar]
  23. You, Z.; Xu, T.; Liu, Z.; Peng, Y.; Cheng, W. Study on Air-cooled Self-humidifying PEMFC Control Method Based on Segmented Predict Negative Feedback Control. Electrochim. Acta 2014, 132, 389–396. [Google Scholar] [CrossRef]
  24. Ou, K.; Wang, Y.-X.; Li, Z.-Z.; Shen, Y.-D.; Xuan, D.-J. Feedforward fuzzy-PID control for air flow regulation of PEM fuel cell system. Int. J. Hydrogen Energy 2015, 40, 11686–11695. [Google Scholar] [CrossRef]
  25. Hasheminejad, S.M.; Fallahi, R. Intelligent VIV control of 2DOF sprung cylinder in laminar shear-thinning and shear thickening cross-flow based on self-tuning fuzzy PID algorithm. Mar. Struct. 2023, 89, 103377. [Google Scholar]
  26. Mousakazemi, S.M.H. Control of a PWR nuclear reactor core power using scheduled PID controller with GA, based on two-point kinetics model and adaptive disturbance rejection system. Ann. Nucl. Energy 2019, 129, 487–502. [Google Scholar] [CrossRef]
  27. Wang, Y.-X.; Qin, F.-F.; Ou, K.; Kim, Y.-B. Temperature Control for a Polymer Electrolyte Membrane Fuel Cell by Using Fuzzy Rule. IEEE Trans. Energy Convers. 2016, 31, 667–675. [Google Scholar] [CrossRef]
  28. Tan, J.; Hu, H.; Liu, S.; Chen, C.; Xuan, D. Optimization of PEMFC system operating conditions based on neural network and PSO to achieve the best system performance. Int. J. Hydrogen Energy 2022, 47, 35790–35809. [Google Scholar] [CrossRef]
  29. Yan, C.; Chen, J.; Liu, H.; Lu, H. Model-based Fault Tolerant Control for the Thermal Management of PEMFC Systems. IEEE Trans. Ind. Electron. 2019, 67, 2875–2884. [Google Scholar] [CrossRef]
  30. Ahmadi, S.; Abdi, S.; Kakavand, M. Maximum power point tracking of a proton exchange membrane fuel cell system using PSO-PID controller. Int. J. Hydrogen Energy 2017, 42, 2043020443. [Google Scholar] [CrossRef]
  31. Li, G.; Li, Y. Temperature Control of PEMFC Stack Based on BP Neural Network. In Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology, Xi’an, China, 10–11 December 2016. [Google Scholar] [CrossRef]
  32. Song, C.; Kim, K.; Sung, D.; Kim, K.; Yang, H.; Lee, H.; Cho, G.Y.; Cha, S.W. A Review of Optimal Energy Management Strategies Using Machine Learning Techniques for Hybrid Electric Vehicles. Int. J. Automot. Technol. 2021, 22, 1437–1452. [Google Scholar]
  33. Zhou, Q.; Zhao, D.; Shuai, B.; Li, Y.; Williams, H.; Xu, H. Knowledge implementation and transfer with an adaptive learning network for real-time power management of the plug-in hybrid vehicle. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5298–5308. [Google Scholar] [PubMed]
  34. Inuzuka, S.; Zhang, B.; Shen, T. Real-Time HEV Energy Management Strategy Considering Road Congestion Based on Deep Reinforcement Learning. Energies 2021, 14, 5270. [Google Scholar] [CrossRef]
  35. Li, J.; Qian, T.; Yu, T. Data-driven coordinated control method for multiple systems in proton exchange membrane fuel cells using deep reinforcement learning. Energy Rep. 2022, 8, 290–311. [Google Scholar]
  36. Huang, Y.; Hu, H.; Tan, J.; Lu, C.; Xuan, D. Deep reinforcement learning based energy management strategy for range extend fuel cell hybrid electric vehicle. Energy Convers. Manag. 2023, 277, 116678. [Google Scholar] [CrossRef]
  37. Amphlett, J.C.; Baumert, R.M.; Mann, R.F.; Peppley, B.A.; Roberge, P.R. Performance modeling of the Ballard Mark IV solid polymer electrolyte fuel cell. J. Electrochem. Soc. 1995, 142, 9–15. [Google Scholar]
  38. Lee, J.; Lalk, T.; Appleby, A. Modeling electrochemical performance in large scale proton exchange membrane fuel cell stacks. J. Power Sources 1998, 70, 258–268. [Google Scholar]
  39. Mann, R.F.; Amphlett, J.C.; Hooper, M.A.; Jensen, H.M.; Peppley, B.A.; Roberge, P.R. Development and application of a generalised steady-state electrochemical model for a PEM fuel cell. J. Power Sources 2000, 86, 173–180. [Google Scholar]
  40. Nguyen, T.V.; White, R.E. A Water and Heat Management Model for Proton-Exchange-Membrane Fuel Cells. J. Electrochem. Soc. 1993, 140, 2178–2186. [Google Scholar]
  41. Guzzella, L. Control Oriented Modelling of Fuel-Cell Based Vehicles. In Presentation in NSF Workshop on the Integration of Modeling and Control for Automotive Systems; University of Michigan: Ann Arbor, MI, USA, 1999. [Google Scholar]
  42. Pukrushpan, J.T.; Peng, H.; Stefanopoulou, A.G. Control-Oriented Modeling and Analysis for Automotive Fuel Cell Systems. J. Dyn. Syst. Meas. Control 2004, 126, 14–25. [Google Scholar] [CrossRef]
  43. Xing, L.; Chang, H.; Zhu, R.; Wang, T.; Zou, Q.; Xiang, W.; Tu, Z. Thermal analysis and management of proton exchange membrane fuel cell stacks for automotive vehicle. Int. J. Hydrogen Energy 2021, 46, 32665–32675. [Google Scholar]
  44. Hu, P.; Cao, G.-Y.; Zhu, X.-J.; Hu, M. Coolant circuit modeling and temperature fuzzy control of proton exchange membrane fuel cells. Int. J. Hydrogen Energy 2010, 35, 9110–9123. [Google Scholar] [CrossRef]
  45. Wang, L.; Quan, Z.; Zhao, Y.; Yang, M.; Zhang, J. Experimental investigation on thermal management of proton exchange membrane fuel cell stack using micro heat pipe array. Appl. Therm. Eng. Des. 2022, 214, 118831. [Google Scholar]
  46. Li, W.; Cui, H.; Nemeth, T.; Jansen, J.; Ünlübayir, C.; Wei, Z.; Feng, X.; Han, X.; Ouyang, M.; Dai, H.; et al. Cloud-based health- conscious energy management of hybrid battery systems in electric vehicles with deep reinforcement learning. Appl. Energy 2021, 193, 116977. [Google Scholar]
  47. Liu, Y.; Gao Po Zheng, C.; Tian, L.; Tian, Y. A deep reinforcement learning strategy combining expert experience guidance for a fruit-picking manipulator. Electronics 2022, 11, 311. [Google Scholar] [CrossRef]
  48. Hu, H.; Lu, C.; Tan, J.; Liu, S.; Xuan, D. Effective energy management strategy based on deep reinforcement learning for fuel cell hybrid vehicle considering multiple performance of integrated energy system. Energy Res. 2022, 46, 24254–24272. [Google Scholar]
  49. Pukrushpan, J.T. Modeling and Control of Fuel Cell Systems and Fuel Processors; University of Michigan: Ann Arbor, MI, USA, 2003. [Google Scholar]
  50. Cunningham, J.M.; Hoffman, M.A.; Moore, R.M.; Friedman, D.J. Requirements for a Flexible and Realistic Air Supply Model for Incorporation into a Fuel Cell Vehicle (FCV) System Simulation; SAE International: Warrendale, PA, USA, 1999. [Google Scholar]
  51. Moraal, P.; Kolmanovsky, I. Turbocharger Modeling for Automotive Control Applications; SAE International: Warrendale, PA, USA, 1999. [Google Scholar] [CrossRef]
Figure 1. System architecture diagram of PEMFC.
Figure 2. Comparison of the simulated data with the reference data [42].
Figure 3. Control framework of the PEMFC thermal management system.
Figure 4. Structural framework diagram of the dual DDPG algorithm.
Figure 5. Framework diagram of intelligent agent outputs.
Figure 6. (a) The training process curve of the water pump side agent in dual DDPG, (b) the training process curve of the radiator side agent in dual DDPG.
Figure 7. (a) Pump-side PID parameter training results, (b) radiator-side PID parameter training results.
Figure 8. Continuous step load current curve.
Figure 9. Temperature variation curves of the stack under different control strategies.
Figure 10. Water pump coolant flow curve.
Figure 11. Radiator airflow curve.
Figure 12. Temperature difference curve of coolant outflow and inflow to the stack.
Table 1. Reward function parameters.

Parameter | Description | Value
α1 | Weight factor for the penalty strength of the control error e1 | 0.7
α2 | Weight factor for the penalty strength of the control error e2 | 0.8
ξ1 | Penalty coefficient adjusting the effect of the squared error term e1 on the reward | 0.2
ξ2 | Penalty coefficient adjusting the effect of the squared error term e2 on the reward | 0.3
C1 | Base reward granted when e1 is near its target value | 3
C2 | Base reward granted when e2 is near its target value | 4
C3 | Penalty base value for deviations of e1 or e2 from their target values | −1
C4 | Penalty base value for deviations of e1 or e2 from their target values | −0.9
c | Constant offset adjusting the overall output of the reward function | 0.2
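The paper's exact reward expression is defined in the main text, which this excerpt omits. Purely as an illustration of how the Table 1 parameters could combine, the sketch below assumes a common piecewise shaping of the form r = −α·|e| − ξ·e² + c, plus the base reward (C1/C2) when the error lies inside a tolerance band or the base penalty (C3/C4) when it does not; the tolerance `tol` is a hypothetical value, not taken from the paper.

```python
# Illustrative sketch only: the exact reward form is defined in the paper
# body. This combines the Table 1 parameters in a common piecewise shape.
# The tolerance band `tol` is a hypothetical value, not from Table 1.

def reward(e, alpha, xi, c_base, c_penalty, c=0.2, tol=0.5):
    """Penalize the tracking error e; add a base reward inside the
    tolerance band and a base penalty outside it."""
    shaped = -alpha * abs(e) - xi * e ** 2 + c
    if abs(e) <= tol:
        return shaped + c_base    # near the target: bonus C1 (or C2)
    return shaped + c_penalty     # off target: penalty C3 (or C4)

# Pump-side agent (error e1): alpha1 = 0.7, xi1 = 0.2, C1 = 3, C3 = -1
r_small = reward(0.1, 0.7, 0.2, 3, -1)   # small error: reward is positive
r_large = reward(2.0, 0.7, 0.2, 3, -1)   # large error: reward is negative
print(r_small, r_large)
```

Under this shaping, the weight factors α and ξ from Table 1 control how sharply the reward falls off with error, while the base terms create a discontinuous incentive to stay within the tolerance band.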
Table 2. PSO-PID and FUZZY-PID parameter settings.

Control Algorithm | Parameter | Parameter Range | Convergence Accuracy | Description
PSO-PID | KP | 0.01 to 0.03 | 0.001 | PSO optimizes the PID controller parameters to adjust coolant flow and balance temperature changes
PSO-PID | KI | 0.001 to 0.07 | 0.001 |
PSO-PID | KD | 0 to 6 | 0.001 |
FUZZY-PID | KP | −0.3 to 0.3 | 0.001 | Fuzzy logic adjusts the PID controller parameters to regulate coolant flow and balance temperature changes
FUZZY-PID | KI | −0.06 to 0.06 | 0.001 |
FUZZY-PID | KD | −3 to 3 | 0.001 |
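As context for the PSO-PID baseline, the following sketch tunes (KP, KI, KD) within the Table 2 bounds using a minimal global-best particle swarm. Only the gain bounds come from Table 2; the first-order thermal plant, the integrated-absolute-error cost, and the swarm hyperparameters (swarm size, inertia, acceleration coefficients) are illustrative assumptions, not the paper's setup.

```python
import random

# Table 2 bounds for the PSO-PID baseline; everything else below
# (plant model, cost, swarm hyperparameters) is an illustrative assumption.
BOUNDS = {"KP": (0.01, 0.03), "KI": (0.001, 0.07), "KD": (0.0, 6.0)}

def pid_cost(kp, ki, kd, steps=200, dt=0.1):
    """Integrated absolute error of a toy first-order thermal plant
    tracking a unit step; stands in for the real PEMFC thermal model."""
    y, integ, prev_e, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        e = 1.0 - y
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt
        prev_e = e
        y += dt * (-0.1 * y + 0.1 * u)   # first-order lag plant
        cost += abs(e) * dt
    return cost

def pso(n=15, iters=40, w=0.6, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over (KP, KI, KD), clamped to the Table 2 bounds."""
    rng = random.Random(seed)
    lo = [BOUNDS[k][0] for k in ("KP", "KI", "KD")]
    hi = [BOUNDS[k][1] for k in ("KP", "KI", "KD")]
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(3)] for _ in range(n)]
    vel = [[0.0] * 3 for _ in range(n)]
    pbest = [p[:] for p in pos]
    pcost = [pid_cost(*p) for p in pos]
    g = pbest[pcost.index(min(pcost))][:]
    for _ in range(iters):
        for i in range(n):
            for d in range(3):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (g[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            cst = pid_cost(*pos[i])
            if cst < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], cst
                if cst < pid_cost(*g):
                    g = pos[i][:]
    return g

best = pso()
print("tuned gains:", best)
```

The "Convergence Accuracy" column of Table 2 would correspond to the stopping tolerance on the cost improvement, which this sketch replaces with a fixed iteration budget for simplicity.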
Table 3. Comparison of control data.

Control Method | Load Time (s) | Mean Absolute Control Error (K) | Absolute Maximum Overshoot (K) | Mean Settling Time (s)
D-DDPG-PID | 0–200 | 0.016 | 0.31 | 33
D-DDPG-PID | 400–600 | 0.009 | 0.26 | 29
D-DDPG-PID | 800–1000 | 0.004 | 0.13 | 19
PID | 0–200 | 0.065 | 1.02 | 59
PID | 400–600 | 0.044 | 0.85 | 51
PID | 800–1000 | 0.021 | 0.41 | 39
FUZZY-PID | 0–200 | 0.047 | 0.79 | 53
FUZZY-PID | 400–600 | 0.038 | 0.74 | 47
FUZZY-PID | 800–1000 | 0.020 | 0.38 | 35
PSO-PID | 0–200 | 0.033 | 0.64 | 41
PSO-PID | 400–600 | 0.025 | 0.51 | 39
PSO-PID | 800–1000 | 0.013 | 0.25 | 22
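The Table 3 metrics can be extracted from a simulated stack-temperature trace. The sketch below is a hedged illustration of one way to compute them; the ±0.1 K settling band, the metric definitions, and the synthetic first-order trace are assumptions for demonstration, not the paper's exact criteria.

```python
import math

# Hedged sketch of computing Table 3-style metrics from a temperature
# trace. The settling band (0.1 K) and the synthetic trace are assumed
# for illustration, not taken from the paper.

def control_metrics(times, temps, target, band=0.1):
    """Return (mean absolute error, absolute maximum deviation,
    settling time) for a temperature trace against a setpoint."""
    errors = [t - target for t in temps]
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    max_overshoot = max(abs(e) for e in errors)
    # settling time: first instant after which |error| stays within band
    settle = times[-1]
    for i in range(len(errors) - 1, -1, -1):
        if abs(errors[i]) > band:
            settle = times[i + 1] if i + 1 < len(times) else times[-1]
            break
    else:
        settle = times[0]   # trace never left the band
    return mean_abs_error, max_overshoot, settle

# Synthetic example: temperature relaxing toward a 353.15 K setpoint.
ts = [i * 0.5 for i in range(400)]
trace = [353.15 + 2.0 * math.exp(-0.05 * t) for t in ts]
mae, overshoot, settle = control_metrics(ts, trace, 353.15)
print(mae, overshoot, settle)
```

Applying such a routine per load segment (0–200 s, 400–600 s, 800–1000 s) and averaging over segments would yield entries of the kind reported in Table 3.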
Zhang, Z.; Shen, Y.; Ou, K.; Liu, Z.; Xuan, D. PEMFC Thermal Management Control Strategy Based on Dual Deep Deterministic Policy Gradient. Hydrogen 2025, 6, 20. https://doi.org/10.3390/hydrogen6020020