# Deep Reinforcement Learning Algorithm Based on Fusion Optimization for Fuel Cell Gas Supply System Control


## Abstract


## 1. Introduction

- A simplified hybrid model environment of the fuel cell air and hydrogen circuits is built;
- An optimal flow control strategy based on net power optimization is proposed;
- Deep reinforcement learning controllers based on deterministic policy gradient are proposed to control the oxygen flow and hydrogen flow. The effect of decoupled and coupled controllers is compared;
- A controller that integrates fuzzy PID and DRL algorithms is proposed, which has a faster dynamic response than traditional PID and more stable steady-state performance than DRL algorithms.
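The fusion idea in the last bullet can be sketched as a controller that blends a PID term with a learned policy output. This is a minimal illustration, not the paper's actual controller: the gains, the policy function, and the fixed blending weight `w` are all hypothetical placeholders.

```python
# Sketch of a fused fuzzy-PID/DRL controller: the PID term provides
# steady-state accuracy while a learned policy supplies fast transient
# corrections.  All gains and the blending rule are illustrative assumptions.

class FusionController:
    def __init__(self, kp, ki, kd, policy, dt=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.policy = policy          # callable: state -> control action
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, state, w=0.5):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u_pid = self.kp * error + self.ki * self.integral + self.kd * derivative
        u_drl = self.policy(state)
        return w * u_drl + (1.0 - w) * u_pid  # simple weighted fusion

# Toy usage: a zero policy reduces the fused output to half the PID term.
ctrl = FusionController(kp=2.0, ki=0.5, kd=0.1, policy=lambda s: 0.0)
u = ctrl.step(error=0.2, state=None)
```

In the paper the fusion is between a fuzzy-tuned PID and a DRL agent; the fixed weight here merely shows the combination mechanism.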

## 2. Fuel Cell Gas Supply System

#### 2.1. Fuel Cell Output Voltage Model

${i}_{max}$ is the limit current density. ${i}_{max}$, ${c}_{1}$, ${c}_{2}$, and ${c}_{3}$ are constants related to the gas pressure and temperature. ${t}_{m}$ denotes the thickness of the PEM in m, and ${\sigma}_{m}$ is the conductivity of the PEM, which depends on the temperature and the membrane water content. The specific expressions of the relevant parameters are detailed in Ref. [20] and are not repeated here.

#### 2.2. Fuel Cell Gas Supply System Flow Mathematical Model

- All gases are ideal gases;
- Air flow, hydrogen flow, temperature, humidity, etc., are controlled separately;
- The temperature inside the stack is uniformly distributed and always remains constant;
- The gas pressure inside the stack is uniformly distributed;
- The water vapor inside the stack is saturated, and the volume of liquid water has no effect on the system.
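Under the ideal-gas assumption above, the partial pressure of each gas species in a fixed lumped volume follows directly from the ideal-gas law $pV = \frac{m}{M}RT$. The volume and mass values below are illustrative, not the paper's parameters.

```python
# Partial pressure of an ideal gas species in a fixed control volume:
# p = m * R * T / (M * V).  The cathode volume and gas mass are illustrative.
R = 8.314          # J/(mol*K), universal gas constant
T = 353.15         # K, stack temperature (assumed constant, ~80 degC)
V_ca = 0.01        # m^3, cathode volume (assumed)
M_O2 = 0.032       # kg/mol, molar mass of oxygen

def partial_pressure(mass_kg, molar_mass, volume, temperature):
    """Partial pressure (Pa) of an ideal gas from its mass in a fixed volume."""
    return mass_kg * R * temperature / (molar_mass * volume)

p_O2 = partial_pressure(0.005, M_O2, V_ca, T)   # 5 g of O2 in the cathode
```

The constant-temperature and uniform-pressure assumptions are what allow this algebraic relation to replace a spatially distributed model.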

#### 2.3. Critical Component Models of the Fuel Cell Gas Supply System

#### 2.3.1. Air Compressor Model

${\omega}_{cp}$ is the rotational speed of the air compressor in rpm, and ${\tau}_{cm}$ and ${\tau}_{cp}$ are the motor driving torque and the resistance torque of the air compressor in N·m, respectively.
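The compressor shaft dynamics implied by these torques, $J_{cp}\,\dot{\omega}_{cp} = \tau_{cm} - \tau_{cp}$, can be integrated numerically. The inertia, torques, and step size below are assumed values for illustration only.

```python
import math

# Compressor shaft dynamics: J_cp * d(omega)/dt = tau_cm - tau_cp.
# Inertia and torque values are illustrative, not the paper's parameters.
J_cp = 5e-5       # kg*m^2, combined rotor inertia (assumed)
dt = 0.01         # s, integration step

def step_speed(omega_rad, tau_cm, tau_cp):
    """One forward-Euler step of the compressor speed in rad/s."""
    return omega_rad + dt * (tau_cm - tau_cp) / J_cp

omega = 5000.0                          # rad/s, initial speed
omega = step_speed(omega, 0.05, 0.04)   # net accelerating torque of 0.01 N*m
rpm = omega * 60.0 / (2.0 * math.pi)    # convert to rpm as used in the text
```

A net motor torque surplus accelerates the shaft; the rpm conversion matches the unit convention stated above.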

#### 2.3.2. Recirculating Pump Model

#### 2.4. Fuel Cell Gas Supply System Model

## 3. Deep Reinforcement Learning Algorithm

#### 3.1. Deep Reinforcement Learning

**Algorithm 1.** DQN algorithm

- Initialize the action-value function $Q$ with random parameters ${\theta}_{Q}$
- Initialize the target network ${\theta}_{{Q}^{*}}\leftarrow {\theta}_{Q}$
- Initialize the replay buffer $\mathcal{B}$
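After initialization, each DQN update fits the online network to a bootstrapped target $y = r + \gamma \max_{a'} Q^{*}(s', a')$. The sketch below computes that target with a toy tabular array standing in for the target network; all values are illustrative.

```python
import numpy as np

# One DQN-style target computation: y = r + gamma * max_a' Q_target(s', a').
# A toy tabular Q stands in for the neural network; values are illustrative.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
Q_target = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
gamma = 0.9   # discount factor, matching Table 2

def dqn_target(reward, next_state, done):
    """Bootstrap target for a single transition; no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * Q_target[next_state].max()

y = dqn_target(reward=1.0, next_state=2, done=False)
```

The max over discrete actions is what restricts DQN to discrete action spaces, motivating the policy-gradient methods in the next subsection.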

#### 3.2. Principles of Deep Reinforcement Learning Algorithm Based on Deterministic Policy Gradient

- Dual network: K (usually 2) critic networks are used, and the minimum of their estimates is taken when computing the target value, thus suppressing the overestimation problem;
- Target policy smoothing regularization: when computing the target value, a clipped noise perturbation is added to the next-state action, thus making the value evaluation more robust;
- Delayed update: the actor network is updated only once every d updates of the critic networks, thus ensuring more stable training of the actor network.
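The first two tricks combine in the TD3 target computation; a numpy sketch follows, with toy critic and actor functions standing in for the networks (all values and noise parameters are illustrative).

```python
import numpy as np

# TD3 target with clipped double-Q (min over K critics) and target-policy
# smoothing (clipped Gaussian noise on the next action).  The delayed actor
# update happens outside this function.  Critics/actor are toy stand-ins.
rng = np.random.default_rng(1)
gamma, sigma, noise_clip = 0.9, 0.2, 0.5
a_low, a_high = -1.0, 1.0

def td3_target(reward, next_state, critics, actor_target):
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = np.clip(rng.normal(0.0, sigma), -noise_clip, noise_clip)
    a_next = np.clip(actor_target(next_state) + noise, a_low, a_high)
    # Clipped double-Q: take the minimum over the K critic estimates.
    q_min = min(q(next_state, a_next) for q in critics)
    return reward + gamma * q_min

# Two toy critics (K = 2); the second one is pessimistic and dominates the min.
critics = [lambda s, a: 1.0 + 0.1 * a, lambda s, a: 0.8 - 0.1 * a]
y = td3_target(reward=0.5, next_state=0.0,
               critics=critics, actor_target=lambda s: 0.0)
```

Taking the minimum over critics deliberately biases the target downward, trading a little pessimism for the stability gain described above.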

**Algorithm 2.** DDPG/TD3 algorithm

- Initialize $K$ critic networks ${Q}_{k}\ (k=1,2,\dots,K)$ and the actor network $\pi$ with random parameters ${\theta}_{{Q}_{k}}$, ${\theta}_{\pi}$
- Initialize the target networks ${\theta}_{{Q}_{k}^{*}}\leftarrow {\theta}_{{Q}_{k}}$, ${\theta}_{{\pi}^{*}}\leftarrow {\theta}_{\pi}$
- Initialize the replay buffer $\mathcal{B}$
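After each gradient step, the target networks track the online networks through a soft (Polyak) update ${\theta}^{*} \leftarrow \tau\theta + (1-\tau){\theta}^{*}$. The sketch below uses $\tau = 5\times 10^{-4}$ as listed in Table 2; the parameter vectors themselves are illustrative.

```python
import numpy as np

# Soft (Polyak) target-network update: theta* <- tau*theta + (1-tau)*theta*.
# With tau = 5e-4 the target parameters drift slowly toward the online ones.
tau = 5e-4

def soft_update(theta_online, theta_target):
    return tau * theta_online + (1.0 - tau) * theta_target

theta = np.array([1.0, -2.0])   # online parameters (illustrative)
theta_t = np.zeros(2)           # target parameters, initialized to zero here
for _ in range(1000):           # after many steps the gap shrinks geometrically
    theta_t = soft_update(theta, theta_t)
```

The slowly moving targets are what keep the bootstrapped targets of Algorithms 1 and 2 from chasing themselves during training.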

## 4. Deep Reinforcement Learning Algorithm Based on Fusion Optimization

#### 4.1. Adaptive Expectations

#### 4.2. Deep Reinforcement Learning Controller

#### 4.3. Deep Reinforcement Learning Controller Based on Fusion Optimization

## 5. Simulation Results and Analysis

#### 5.1. DRL Controller Training and Testing Results

#### 5.2. Controller Comparison Test and Result Analysis

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Kojima, K. Recent progress of research and development for fuel cell vehicle application. In Proceedings of the 2008 International Conference on Effects of Hydrogen on Materials, Grand Teton National Park, WY, USA, 7–10 September 2008.
- Wang, Y.; Chen, K.S.; Mishler, J.; Cho, S.C.; Adroher, X.C. A review of polymer electrolyte membrane fuel cells: Technology, applications, and needs on fundamental research. Appl. Energy **2011**, 88, 981.
- Liu, J.; Zhou, Z.; Zhao, X.; Xin, Q.; Sun, G.; Yi, B. Fuel Cell Overview. Phys. Chem. Chem. Phys. **2004**, 6, 134.
- Lachaize, J.; Caux, S.; Fadel, M.; Schott, P.; Nicod, L. Pressure, flow and thermal control of a fuel cell system for electrical rail transport. In Proceedings of the 2004 IEEE International Symposium on Industrial Electronics, Ajaccio, France, 4–7 May 2004.
- Goshtasbi, A.; Ersal, T. Degradation-conscious control for enhanced lifetime of automotive polymer electrolyte membrane fuel cells. J. Power Sources **2020**, 457, 227996.
- Ryu, S.K.; Vinothkannan, M.; Kim, A.R.; Yoo, D.J. Effect of type and stoichiometry of fuels on performance of polybenzimidazole-based proton exchange membrane fuel cells operating at the temperature range of 120–160 °C. Energy **2022**, 238, 121791.
- Matraji, I.; Laghrouche, S.; Wack, M. Pressure control in a PEM fuel cell via second order sliding mode. Int. J. Hydrogen Energy **2012**, 37, 16104.
- Matraji, I.; Laghrouche, S.; Wack, M. Second order sliding mode control for PEM fuel cells. In Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, GA, USA, 15–17 December 2010.
- Tang, X.; Wang, C.S.; Mao, J.H.; Liu, Z.J. Adaptive fuzzy PID for proton exchange membrane fuel cell oxygen excess ratio control. In Proceedings of the 32nd Chinese Control and Decision Conference, Hefei, China, 22–24 August 2020.
- Wei, G.; Quan, S.; Zhu, Z.; Pan, M.; Qi, C. Neural-PID control of air pressure in fuel cells. In Proceedings of the 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Changsha, China, 13–14 March 2010.
- Kim, E.S.; Kim, C.J. Nonlinear State Space Model and Control Strategy for PEMFC Systems. J. Energy Power Eng. **2010**, 4, 8.
- Yang, D.; Pan, R.; Wang, Y.; Chen, Z. Modeling and control of PEMFC air supply system based on TS fuzzy theory and predictive control. Energy **2019**, 188, 116078.
- Sedighizadeh, M.; Rezazadeh, A. Adaptive Self-Tuning Wavelet Neural Network Controller for a Proton Exchange Membrane Fuel Cell. Appl. Neural Netw. High Assur. Syst. **2010**, 268, 221–245.
- Li, C.; Zhu, X.; Sui, S.; Hu, W.; Hu, M. Maximum power point tracking of a photovoltaic energy system using neural fuzzy techniques. J. Shanghai Univ. **2009**, 13, 29–36.
- Hwang, J.J. Effect of hydrogen delivery schemes on fuel cell efficiency. J. Power Sources **2013**, 239, 54.
- He, J.L.; Choe, S.Y.; Hong, C.O. Analysis and control of a hybrid fuel delivery system for a polymer electrolyte membrane fuel cell. J. Power Sources **2008**, 185, 973.
- Quan, S.W.; Chen, J.Z.; Wang, Y.X.; He, H.; Li, J. A hierarchical predictive strategy-based hydrogen stoichiometry control for automotive fuel cell power system. In Proceedings of the 16th IEEE Vehicle Power and Propulsion Conference, Hanoi, Vietnam, 14–17 October 2019.
- Wang, Y.; Quan, S.; Wang, Y.; He, H. Design of adaptive backstepping sliding mode-based proton exchange membrane fuel cell hydrogen circulation pump controller. In Proceedings of the Asia Energy and Electrical Engineering Symposium, Chengdu, China, 28–31 May 2020.
- Lee, J.H.; Lalk, T.R.; Appleby, A.J. Modeling electrochemical performance in large scale proton exchange membrane fuel cell stacks. J. Power Sources **1998**, 70, 2.
- Pukrushpan, J.T.; Stefanopoulou, A.G.; Peng, H. Control of Fuel Cell Power Systems; Springer: London, UK, 2004.
- Silver, D. Deep Reinforcement Learning. Tutorial at ICML 2016, 19 June 2016. Available online: https://www.deepmind.com/learning-resources/introduction-to-reinforcement-learning-with-david-silver (accessed on 16 January 2023).
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science **2006**, 313, 504.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature **2015**, 521, 436.
- Gao, Y.; Chen, S.F.; Lu, X. Research on reinforcement learning technology: A review. Acta Autom. Sin. **2004**, 30, 86.
- Montague, P.R. Reinforcement learning: An introduction. Trends Cogn. Sci. **1999**, 3, 360.
- Boyan, J.A. Least-squares temporal difference learning. In Proceedings of the 16th International Conference on Machine Learning, Bled, Slovenia, 27 June 1999.
- Aissani, N.; Beldjilali, B.; Trentesaux, D. Dynamic scheduling of maintenance tasks in the petroleum industry: A reinforcement approach. Eng. Appl. Artif. Intell. **2009**, 22, 1089.
- Rodatz, P.; Tsukada, A.; Mladek, M.; Guzzella, L. Efficiency improvements by pulsed hydrogen supply in PEM fuel cell systems. In Proceedings of the 15th IFAC Triennial World Congress, Barcelona, Spain, 21–26 July 2002.

**Figure 3.** Optimum OER and air compressor voltage under different stack currents, where the black points indicate the optimal points: (**a**) the relationship between OER and the net power output of the fuel cell system under different current requirements; (**b**) the optimal OER under different currents; (**c**) the optimal air compressor voltage under different currents.
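The optimization behind Figure 3 maximizes the net power $P_{net} = P_{stack} - P_{cm}$ over the OER at each stack current. The sketch below performs such a sweep with toy power curves; the functional forms and coefficients are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Sketch of the net-power optimization behind Figure 3: for a given stack
# current, sweep the oxygen excess ratio (OER) and pick the value that
# maximizes P_net = P_stack - P_cm.  Both power curves are toy models.
def p_stack(oer, i_st):
    # Stack power rises with OER but saturates (illustrative shape).
    return 100.0 * i_st * (1.0 - np.exp(-1.5 * oer))

def p_cm(oer, i_st):
    # Compressor parasitic power grows with the air flow demand.
    return 8.0 * i_st * oer

def optimal_oer(i_st, grid=np.linspace(1.0, 4.0, 301)):
    p_net = p_stack(grid, i_st) - p_cm(grid, i_st)
    return grid[np.argmax(p_net)]

oer_star = optimal_oer(i_st=150.0)
```

Repeating the sweep over a range of currents yields an optimal-OER lookup curve like the one in panel (**b**).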

**Figure 6.** Fuzzy PID parameters: (**a**–**c**) and (**d**–**f**) represent the PID parameters Kp, Ki, and Kd of the fuzzy logic output for OER and HER, respectively.

**Figure 9.** Reward values during DRL controller training: (**a**) reward changes with respect to episode; (**b**) reward changes with respect to time.

**Figure 11.** Test results of hydrogen excess ratio control: (**b**–**e**) are the enlarged views of (**a**) near 4 s, 12 s, 16 s, and 26 s, respectively, when the load current changes.

**Figure 12.** Test results of oxygen excess ratio control: (**b**–**e**) are the enlarged views of (**a**) near 4 s, 12 s, 16 s, and 26 s, respectively, when the load current changes.

| Parameters | 2-DRL (agent_${\mathit{O}}_{2}$) | 2-DRL (agent_${\mathit{H}}_{2}$) | DRL |
|---|---|---|---|
| action | ${V}_{cm}$ (${a}_{1}$) | ${N}_{rcp}$ (${a}_{2}$) | ${V}_{cm}$, ${N}_{rcp}$ |
| state | ${I}_{st}$, ${e}_{OER}$, $\Delta {e}_{OER}$, ${V}_{cm}$, $\Delta {V}_{cm}$ | ${I}_{st}$, ${e}_{HER}$, $\Delta {e}_{HER}$, ${N}_{rcp}$, $\Delta {N}_{rcp}$ | ${I}_{st}$, ${e}_{OER}$, $\Delta {e}_{OER}$, ${V}_{cm}$, $\Delta {V}_{cm}$, ${e}_{HER}$, $\Delta {e}_{HER}$, ${N}_{rcp}$, $\Delta {N}_{rcp}$ |
| reward | ${R}_{OER}+{R}_{{a}_{1}}$ | ${R}_{HER}+{R}_{{a}_{2}}$ | ${R}_{OER}+{R}_{{a}_{1}}+{R}_{HER}+{R}_{{a}_{2}}$ |
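Each reward in the table combines a tracking term ($R_{OER}$ or $R_{HER}$) and an action term ($R_{a_1}$ or $R_{a_2}$). The exact shaping is not reproduced here, so the sketch below uses a plausible quadratic form with assumed weights.

```python
# The per-step reward combines a tracking term and an action-smoothness
# penalty.  The quadratic shaping and the weights below are assumptions,
# not the paper's exact reward functions.
w_e, w_a = 1.0, 0.1

def tracking_reward(error):
    """R_OER or R_HER: penalize deviation from the optimal excess ratio."""
    return -w_e * error ** 2

def action_reward(delta_action):
    """R_a1 or R_a2: penalize large changes in the control input."""
    return -w_a * delta_action ** 2

# Coupled (single-agent DRL) reward: R_OER + R_a1 + R_HER + R_a2
r = (tracking_reward(0.05) + action_reward(0.2)
     + tracking_reward(0.02) + action_reward(0.1))
```

In the 2-DRL configuration each agent sees only its own two terms, which is exactly the decoupling the table expresses.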

| Symbol | Value | Unit | Description |
|---|---|---|---|
| $M$ | 1000 | - | Maximum number of cycles |
| ${T}_{s}$ | 0.01 | s | Sampling period |
| $T$ | 1000 | s | Maximum steps per cycle |
| $\gamma$ | 0.9 | - | Discount factor |
| $\tau$ | $5\times {10}^{-4}$ | - | Learning rate |

**Table 3.** Control effect parameter values of multiple algorithms, where the red values indicate the best value for each metric.

| Algorithm | HER RMSE${}_{+2\mathit{s}}$ | HER RMSE${}_{\mathit{st}}$ | HER ${\mathit{T}}_{\mathit{r}}$ (s) | HER ${\mathit{M}}_{\mathit{d}}$ (%) | OER RMSE${}_{+2\mathit{s}}$ | OER RMSE${}_{\mathit{st}}$ | OER ${\mathit{T}}_{\mathit{r}}$ (s) | OER ${\mathit{M}}_{\mathit{d}}$ (%) |
|---|---|---|---|---|---|---|---|---|
| Feed-Forward | 0.1215 | 0.000046 | 0.89 | 31.22 | 0.0611 | 0.000052 | 0.16 | 2.90 |
| Fuzzy-PID | 0.0699 | 0.000033 | 0.36 | 9.09 | 0.0561 | 0.000014 | 0.12 | 3.20 |
| DQN | 0.0643 | 0.011157 | 0.28 | 8.28 | 0.0485 | 0.004120 | 0.11 | 1.02 |
| DDPG | 0.0386 | 0.005113 | 0.15 | 2.44 | 0.0420 | 0.005646 | 0.11 | 0.64 |
| MDQN | 0.0508 | 0.005309 | 0.26 | 5.64 | 0.0481 | 0.004019 | 0.11 | 1.04 |
| MTD3 | 0.0519 | 0.003132 | 0.71 | 5.95 | 0.0472 | 0.000828 | 0.07 | 0.40 |
| FO-DDPG | 0.0370 | 0.000033 | 0.15 | 4.36 | 0.0441 | 0.000031 | 0.11 | 1.01 |
| FO-MTD3 | 0.0545 | 0.000035 | 0.24 | 7.30 | 0.0487 | 0.000010 | 0.10 | 0.92 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yuan, H.; Sun, Z.; Wang, Y.; Chen, Z.
Deep Reinforcement Learning Algorithm Based on Fusion Optimization for Fuel Cell Gas Supply System Control. *World Electr. Veh. J.* **2023**, *14*, 50.
https://doi.org/10.3390/wevj14020050
