# Optimization of a Redox-Flow Battery Simulation Model Based on a Deep Reinforcement Learning Approach


## Abstract


## 1. Introduction

## 2. Modeling of the Vanadium Redox-Flow Battery

#### 2.1. Determination of the State of Charge (SoC)

- $I\left(t\right)$ refers to the current used for charging or discharging the battery.
- ${I}_{Loss}$ signifies the current losses resulting from internal processes within the VRFB, such as shunt currents or vanadium permeation.
- ${C}_{Stor}$ denotes the practical storage capacity of the battery system, measured in ampere-hours (Ah). It reflects the amount of charge the battery can actually hold, accounting for various factors that affect its performance, and therefore typically deviates from the theoretical capacity ${C}_{Theo}$.
- R is the universal gas constant, 8.314 J mol${}^{-1}$ K${}^{-1}$.
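For reference, the standard Coulomb-counting balance built from these quantities reads as follows (a common form in the VRFB literature; the paper's exact equation may differ):

$$ SoC\left(t\right) = SoC\left({t}_{0}\right) + \frac{1}{{C}_{Stor}} \int_{{t}_{0}}^{t} \left( I\left(\tau\right) - {I}_{Loss} \right) d\tau $$

The loss current ${I}_{Loss}$ is subtracted from the applied current so that internal losses slow down charging and accelerate discharging.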

#### 2.2. Determination of Cell Voltage (U)

- ${U}_{0}^{\prime}$ is the formal cell potential, which applies when the concentrations of all vanadium oxidation states are identical.
- F is the Faraday constant, 96,486 As mol${}^{-1}$.
- z is the number of electrons transferred during the reaction.
- ${N}_{cell}$ is the number of cells used to determine the battery voltage.
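For orientation, a Nernst-type expression commonly used in VRFB models combines these quantities as follows (an illustrative reconstruction, not necessarily the paper's exact formulation):

$$ U = {N}_{cell} \left[ {U}_{0}^{\prime} + \frac{2RT}{zF} \ln\!\left( \frac{SoC}{1 - SoC} \right) + {R}_{i}\, I \right] $$

where the ohmic term ${R}_{i} I$ raises the terminal voltage during charging and lowers it during discharging.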

#### 2.3. Calculation of Power (P)
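In the simplest form, the DC power follows directly from the cell voltage and current (a standard relation; the model may additionally account for inverter losses on the AC side):

$$ P\left(t\right) = U\left(t\right)\, I\left(t\right) $$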

#### 2.4. Dataset Overview

- AC power: alternating-current power, i.e., electrical power whose current direction reverses periodically. It is measured after the inverter and expressed in W.
- DC voltage: recorded by the battery management system (BMS); it reaches a maximum of 60 V when fully charged and drops to a minimum of 38 V at the end of a controlled discharge.
- DC current: direct current, expressed in A.
- State of charge (SoC): the SoC values were read out with the aid of the software and used to control the battery during charging and discharging. They are derived from validation of the open-circuit voltage used in the BMS and expressed in %.
- Temperature: the temperature of the container or the electrolytes, expressed in °C.

We used the **1 kW** power cycle dataset for the training process while reserving the other power cycle datasets for testing purposes.

## 3. Learning VRFB Parameters Using Deep Reinforcement Learning

#### 3.1. Advancing to Deep Reinforcement Learning (Deep RL)

#### 3.2. Markov Decision Process Formulation

- Environment: As our focus is on the VRFB, we consider the VRFB as the environment of our RL system. It includes the essential details about the battery, such as its internal parameters and the extracted and preprocessed raw dataset.
- Agent: The agent uses a neural network to approximate the q-values through interactions with the VRFB environment.
- State: The state s is a tensor that includes the raw data, the simulated data produced by solving the mathematical system, and the battery-specific parameters.
- Action space: The action space reflects the adjustment and variation of the battery-specific parameters and is therefore discrete. We defined two configurations. The first comprises three actions that adjust the parameters ${I}_{loss}$, ${R}_{i}$, ${U}_{0}^{\prime}$, and ${C}_{Stor}$ jointly: increasing, decreasing, or maintaining them. The second expands the action space to nine distinct actions that increase, decrease, or maintain the parameters separately. This allowed us to determine which configuration produced better outcomes.
- Reward function: The agent learns the optimal battery-specific parameters that enhance the simulation model's accuracy. Inspired by the reward function in [36], we defined the following straightforward reward function to encourage faster convergence and facilitate the learning process for the DQN agent.

- $\lambda $ is a hyperparameter;
- $best\_ERROR$ is the lowest error achieved so far during an episode;
- $ERROR$ combines the errors for the different battery state variables ($So{C}_{error}$, ${U}_{error}$, and ${I}_{error}$), each multiplied by its weight (${w}_{SoC}$, ${w}_{U}$, and ${w}_{I}$), as described in Equation (5).
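As a concrete illustration of the two quantities above, the weighted error and an improvement-based reward can be sketched as follows (the function names, the linear reward form, and the default λ are our assumptions, not the paper's exact equations):

```python
def weighted_error(soc_err, u_err, i_err, w_soc=0.3, w_u=0.5, w_i=0.2):
    # ERROR: weighted combination of the state-variable errors,
    # using the weights chosen later in Section 4 (w_SoC = 0.3, w_U = 0.5, w_I = 0.2)
    return w_soc * soc_err + w_u * u_err + w_i * i_err


def reward(error, best_error, lam=1.0):
    # Hypothetical improvement-based reward: positive when the current
    # ERROR beats the best error achieved so far during the episode.
    return lam * (best_error - error)
```

Under this form, the agent is rewarded in proportion to how far it improves on $best\_ERROR$, which encourages steady error reduction within an episode.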

#### 3.3. Training Algorithm for Deep Q-Network

At each time step, the agent selects an action ${a}_{t}$ (to **increase**, **decrease**, or **maintain** the battery-specific parameters), and the environment responds to this action and presents a new state ${s}_{t+1}$ to the agent. Based on the state/action pair, a reward ${r}_{t}$ is provided that reflects the agent's performance in fitting the simulated data to the raw data. In this process, the agent uses the predicted q-value, the target q-value, and the observed reward obtained from the data sample to calculate a loss, which is then employed to train the q-network.
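A minimal sketch of this loss computation, assuming the standard DQN target (helper names are ours; the authors' implementation details are not given):

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    # Target q-value: r + gamma * max_a' Q_target(s', a'),
    # with no bootstrap term at the end of the episode
    bootstrap = 0.0 if done else max(next_q_values)
    return reward + gamma * bootstrap


def td_loss(pred_q, target_q):
    # Squared TD error between the predicted and target q-values;
    # this per-sample loss is what trains the q-network
    return (pred_q - target_q) ** 2
```

In practice the loss is averaged over a mini-batch sampled from the replay buffer before the gradient step.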

## 4. Results and Discussion

#### 4.1. Training DQN Agent

#### 4.1.1. Definition of the Gym Environment

- Initialization method: In this method we create our environment class, establishing the action and state spaces as well as the initial battery parameter values.
- Step method: This method runs every time we take a step within our environment, applying an action ${a}_{t}$. It returns the next state/observation ${s}_{t+1}$; the determined reward ${r}_{t}$; a Boolean variable done, which signals the end of the episode; and the dictionary info, which holds additional information. In our case, we used info to store the battery-specific parameters ${I}_{loss}$, ${R}_{i}$, ${U}_{0}^{\prime}$, and ${C}_{Stor}$.
- Render method: This method visualizes the current state. We used it to track and display the alignment between the simulated data and the raw data during the episode.
- Reset method: Lastly, the reset method resets the environment and returns the initial observations.
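The four methods can be sketched as a Gym-style class. Everything below (the relative step size, the placeholder reward, and the plain-Python observation) is illustrative, not the authors' implementation:

```python
class VRFBEnv:
    """Sketch of the VRFB parameter-tuning environment (Gym-style API)."""

    N_ACTIONS = 3  # increase, decrease, or maintain the parameters jointly

    def __init__(self):
        # Initial battery-specific parameter values (from Table 1)
        self.params = {"I_loss": 10.0, "R_i": 0.75, "U_0": 1.375, "C_Stor": 870_000.0}
        self.delta = 0.01  # illustrative relative adjustment per step
        self.state = self._observe()

    def _observe(self):
        # The real state tensor also contains raw and simulated data;
        # for brevity we return only the parameter vector here.
        return list(self.params.values())

    def step(self, action):
        # 0 = increase, 1 = decrease, 2 = maintain
        scale = {0: 1 + self.delta, 1: 1 - self.delta, 2: 1.0}[action]
        self.params = {k: v * scale for k, v in self.params.items()}
        self.state = self._observe()
        reward, done = 0.0, False  # placeholders for the error-based reward
        return self.state, reward, done, dict(self.params)

    def render(self):
        # Display the alignment between simulated and raw data (stub)
        print("current parameters:", self.params)

    def reset(self):
        self.__init__()
        return self.state
```

A real implementation would subclass `gym.Env` and declare `spaces.Discrete(3)` and a `Box` observation space in the initialization method.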

#### 4.1.2. DQN Parameters

#### 4.1.3. Training Results

We trained the DQN agent on the **charging** and **discharging** cases of a power cycle of **P = 1 kW** for **2500** episodes with both configurations of the action space. This means that only one power cycle at a time was selected to train the model.

a. Optimizing the Learning Rate Selection for Training

A learning rate of **0.001** resulted in superior performance compared to the other configurations in terms of cumulative reward and rapid learning. Therefore, we chose to proceed with this value. The models were trained with the same neural network architecture, discount rate, and learning rate. During training, we created a model checkpoint each time a new highest cumulative reward was achieved within an episode.

b. Choosing the Optimal Action Space

We set the error weights to ${\mathit{w}}_{\mathit{SoC}}$ **= 0.3**, ${\mathit{w}}_{\mathit{U}}$ **= 0.5**, and ${\mathit{w}}_{\mathit{I}}$ **= 0.2**. With fewer actions in the three-action configuration, we aimed for a faster learning process. The cumulative reward produced with fewer actions showed better learning performance than the nine-action configuration for both the charging and discharging processes, as demonstrated in Figure 11 and Figure 12. The average reward increased over the training episodes, signifying effective learning by the DQN agent, and in the three-action case the agent achieved a higher cumulative reward. This increased reward indicates that the agent is making progress in learning the optimal VRFB parameters.

Consequently, the action space with **three valid actions** was used for the subsequent evaluation phase.

#### 4.2. Evaluation of the DQN Agent

#### 4.2.1. DQN Agent Testing Process

**Algorithm 1** General testing loop of our DQN agent

    for episode = 1, M do
        Reset the environment and get the initial state s_t
        while True do
            Predict the action a_t related to s_t
            Execute a step and get the next state s_(t+1), reward r_t, done, and info
            Render the environment
            if done then
                Display the optimal I_loss, R_i, U_0', and C_Stor stored in the variable info
            end if
        end while
    end for

#### 4.2.2. Testing Results and Evaluation

The DQN agent reduced the simulation error by around **10%** compared to the reference work. To provide a more comprehensive illustration, Figure 14 presents the results of fitting the simulated data to the raw data when testing the DQN agent on the **10 kW** power cycle for the state variables (SoC and voltage) during the charging process. Figure 15 presents the same results for the discharging process. Note that these figures show the predicted state variable at each step of the episode.
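The RMSE and MAE values reported in the tables below follow the usual definitions and can be reproduced from the simulated and raw traces (function names are ours):

```python
import math

def rmse(y_sim, y_raw):
    # Root-mean-square error between simulated and raw values
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(y_sim, y_raw)) / len(y_sim))

def mae(y_sim, y_raw):
    # Mean absolute error between simulated and raw values
    return sum(abs(s - r) for s, r in zip(y_sim, y_raw)) / len(y_sim)
```

RMSE penalizes large outliers more strongly than MAE, which is why the two metrics can rank power cycles differently.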

#### 4.2.3. Discussion

#### 4.3. Enhancing the Performance of the DQN Agent

The enhanced agent preserved the improvement of around **10%** over the reference model.

## 5. Conclusions

Our DQN agent achieved an RMSE of **0.111%** for the SoC and **1.114 V** for the voltage. In addition, it outperformed the optimization results of the method used in [26] for the prediction of battery state variables and the determination of optimal battery parameters, especially the voltage. Compared to the reference model, the simulation error was reduced by around **10%**. This optimized simulation model can be generalized to other types of redox-flow batteries. Beyond the strong performance of our learned VRFB parameters, our concept proved simple and easy to adapt when the optimization goals change: it suffices to adapt the reward function and let the RL agent train autonomously again.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Global Energy. Available online: https://www.iea.org/energy-system/transport/electric-vehicles (accessed on 15 July 2023).
- Sutikno, T.; Arsadiando, W.; Wangsupphaphol, A.; Yudhana, A.; Facta, M. A review of recent advances on hybrid energy storage system for solar photovoltaics power generation. IEEE Access **2022**, 10, 42346–42364.
- Ferret, R.; Sánchez-Díez, E.; Ventosa, E.; Guarnieri, M.; Trovò, A.; Flox, C.; Marcilla, R.; Soavi, F.; Mazur, P.; Aranzabe, E. Redox flow batteries: Status and perspective towards sustainable stationary energy storage. J. Power Sources **2021**, 481, 228804.
- Skyllas-Kazacos, M. All-Vanadium Redox Battery; MIT Press: Cambridge, MA, USA, 1986.
- Rizzuti, A.; Carbone, A.; Dassisti, M.; Mastrorilli, P.; Cozzolino, G.; Chimienti, M.; Olabi, A.G.; Matera, F. Vanadium: A Transition Metal for Sustainable Energy Storing in Redox Flow Batteries; Elsevier: Amsterdam, The Netherlands, 2016.
- Di Noto, V.; Sun, C.; Negro, E.; Vezzù, K.; Pagot, G.; Cavinato, G.; Nale, A.; Bang, Y.H. Hybrid Inorganic-Organic Proton-Conducting Membranes Based on SPEEK Doped with WO3 Nanoparticles for Application in Vanadium Redox Flow Batteries; Electrochimica Acta; Elsevier: Amsterdam, The Netherlands, 2019.
- Neves, L.P.; Gonçalves, J.; Martins, A. Methodology for Real Impact Assessment of the Best Location of Distributed Electric Energy Storage; Sustainable Cities and Society; Elsevier: Amsterdam, The Netherlands, 2016.
- Schubert, C.; Hassen, W.F.; Poisl, B.; Seitz, S.; Schubert, J.; Usabiaga, E.O.; Gaudo, P.M.; Pettinger, K.H. Hybrid Energy Storage Systems Based on Redox-Flow Batteries: Recent Developments, Challenges, and Future Perspectives. Batteries **2023**, 9, 211.
- Skyllas-Kazacos, M.; Xiong, B.; Zhao, J.; Wei, Z. Extended Kalman filter method for state of charge estimation of vanadium redox flow battery using thermal-dependent electrical model. J. Power Sources **2014**, 262, 50–61.
- He, F.; Shen, W.X.; Kapoor, A.; Honnery, D.; Dayawansa, D. H infinity observer based state of charge estimation for battery packs in electric vehicles. In Proceedings of the 2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA), Hefei, China, 5–7 June 2016.
- Hannan, M.A.; Lipu, M.H.; Hussain, A.; Mohamed, A. A review of lithium-ion battery state of charge estimation and management system in electric vehicle applications: Challenges and recommendations. Renew. Sustain. Energy Rev. **2017**, 78, 834–854.
- Costa-Castelló, R.; Strahl, S.; Luna, J. Chattering free sliding mode observer estimation of liquid water fraction in proton exchange membrane fuel cells. J. Frankl. Inst. **2017**, 357, 13816–13833.
- Clemente, A.; Cecilia, A.; Costa-Castelló, R. SOC and diffusion rate estimation in redox flow batteries: An I&I-based high-gain observer approach. In Proceedings of the 2021 European Control Conference (ECC), Rotterdam, The Netherlands, 29 June–2 July 2021.
- Battaiotto, P.; Fornaro, P.; Puleston, T.; Puleston, P.; Serra-Prat, M.; Costa-Castelló, R. Redox flow battery time-varying parameter estimation based on high-order sliding mode differentiators. Int. J. Energy Res. **2022**, 46, 16576–16592.
- Clemente, A.; Montiel, M.; Barreras, F.; Lozano, A.; Costa-Castelló, R. Vanadium redox flow battery state of charge estimation using a concentration model and a sliding mode observer. IEEE Access **2021**, 9, 72368–72376.
- Choi, Y.Y.; Kim, S.; Kim, S.; Choi, J.I. Multiple parameter identification using genetic algorithm in vanadium redox flow batteries. J. Power Sources **2020**, 45, 227684.
- Wan, S.; Liang, X.; Jiang, H.; Sun, J.; Djilali, N.; Zhao, T. A coupled machine learning and genetic algorithm approach to the design of porous electrodes for redox flow batteries. Appl. Energy **2021**, 298, 117177.
- Niu, H.; Huang, J.; Wang, C.; Zhao, X.; Zhang, Z.; Wang, W. State of charge prediction study of vanadium redox-flow battery with BP neural network. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; pp. 1289–1293.
- Cao, H.; Zhu, X.; Shen, H.; Shao, M. A neural network based method for real-time measurement of capacity and SOC of vanadium redox flow battery. In Proceedings of the International Conference on Fuel Cell Science, Engineering and Technology, American Society of Mechanical Engineers, San Diego, CA, USA, 28 June–2 July 2015; Volume 56611, p. V001T02A001.
- Heinrich, F.; Klapper, P.; Pruckner, M. A comprehensive study on battery electric modeling approaches based on machine learning. J. Power Sources **2021**, 4, 17.
- Iu, H.; Fernando, T.; Li, R.; Xiong, B.; Zhang, S.; Zhang, X.; Li, Y. A novel one dimensional convolutional neural network based data-driven vanadium redox flow battery modelling algorithm. J. Energy Storage **2023**, 61, 106767.
- Iu, H.; Fernando, T.; Li, R.; Xiong, B.; Zhang, S.; Zhang, X.; Li, Y. A Novel U-Net based Data-driven Vanadium Redox Flow Battery Modelling Approach. Electrochim. Acta **2023**, 444, 141998.
- Hannan, M.A.; How, D.N.; Lipu, M.H.; Mansor, M.; Ker, P.J.; Dong, Z.; Sahari, K.; Tiong, S.K.; Muttaqi, K.M.; Mahlia, T.I.; et al. Deep learning approach towards accurate state of charge estimation for lithium-ion batteries using self-supervised transformer model. Sci. Rep. **2021**, 11, 19541.
- Zhang, C.; Li, T.; Li, X. Machine learning for flow batteries: Opportunities and challenges. Chem. Sci. **2022**, 17, 4740–4752.
- Wei, Z.; Xiong, R.; Lim, T.M.; Meng, S.; Skyllas-Kazacos, M. Online monitoring of state of charge and capacity loss for vanadium redox flow battery based on autoregressive exogenous modeling. J. Power Sources **2018**, 402, 252–262.
- Zugschwert, C.; Dundálek, J.; Leyer, S.; Hadji-Minaglou, J.R.; Kosek, J.; Pettinger, K.H. The Effect of input parameter variation on the accuracy of a Vanadium Redox Flow Battery Simulation Model. Batteries **2021**, 7, 7.
- Weber, R.; Schubert, C.; Poisl, B.; Pettinger, K.H. Analyzing Experimental Design and Input Data Variation of a Vanadium Redox Flow Battery Model. Batteries **2023**, 9, 122.
- He, Q.; Fu, Y.; Stinis, P.; Tartakovsky, A. Enhanced physics-constrained deep neural networks for modeling vanadium redox flow battery. J. Power Sources **2022**, 542, 231807.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2020.
- Plaat, A. Deep Reinforcement Learning; Springer Nature Singapore Pte Ltd.: Singapore, 2022.
- König, S. Model-Based Design and Optimization of Vanadium Redox Flow Batteries; Karlsruher Institut für Technologie: Karlsruhe, Germany, 2017.
- Haisch, T.; Ji, H.; Weidlich, C. Monitoring the state of charge of all-vanadium redox flow batteries to identify crossover of electrolyte. Electrochim. Acta **2020**, 336, 135573.
- Loktionov, P.; Konev, D.; Pichugov, R.; Petrov, M.; Antipov, A. Calibration-free coulometric sensors for operando electrolytes imbalance monitoring of vanadium redox flow battery. J. Power Sources **2023**, 553, 232242.
- Shin, K.H.; Jin, C.S.; So, J.Y.; Park, S.K.; Kim, D.H.; Yeon, S.H. Real-time monitoring of the state of charge (SOC) in vanadium redox-flow batteries using UV–Vis spectroscopy in operando mode. J. Energy Storage **2020**, 27, 101066.
- Cellstrom GmbH. Available online: https://unternehmen.fandom.com/de/wiki/Cellstrom_GmbH (accessed on 15 June 2023).
- Subramanian, S.; Ganapathiraman, V.; El Gamal, A. Learned Learning Rate Schedules for Deep Neural Network Training Using Reinforcement Learning; Amazon Science, ICLR: Los Angeles, CA, USA, 2023.
- Gym Library. Available online: https://www.gymlibrary.dev/content/basic_usage/ (accessed on 5 May 2023).
- DQN Improvement. Available online: https://dl.acm.org/doi/fullHtml/10.1145/3508546.3508598 (accessed on 22 April 2023).
- Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1995–2003.

**Figure 2.** Steps of the simulation model proposed in [26].

**Figure 12.** Cumulative reward for the discharging process for both configurations of the action space.

**Figure 14.** Comparison of simulated and raw data using P = 10 kW for the charging process. (**a**) SoC testing outcome for the charging process. (**b**) Voltage testing outcome for the charging process.

**Figure 15.** Comparison of simulated and raw data using P = 10 kW for the discharging process. (**a**) SoC testing outcome for the discharging process. (**b**) Voltage testing outcome for the discharging process.

| Parameter | Initial Value |
|---|---|
| SoC (%) | Charge: 20; Discharge: 80 |
| ${\mathit{C}}_{\mathit{Stor}}$ (As) | 870,000 |
| ${\mathit{I}}_{\mathit{loss}}$ (A) | 10.0 |
| ${\mathit{U}}_{\mathbf{0}}$ (V) | 1.375 |
| ${\mathit{R}}_{\mathit{i}}$ (m$\mathsf{\Omega}$) | 0.75 |
| No. of stacks | 10 |
| Nominal voltage (V) | 48 |
| Temperature (°C) | 21.85 |
| No. of cells | 40 |

| Parameter | Definition | Value |
|---|---|---|
| num_layers | Number of hidden layers used | 2 |
| num_units | Number of units used to enhance the quality of training and prediction | 256.2 |
| Activation function | The non-linear activation function used for the NN | ReLU |
| Replace | The frequency with which the target network is updated | 1000 |
| Epsilon $\mathit{\u03f5}$ | Initial exploration probability for each iteration | 1.0 |
| eps_min | The final value of Epsilon | 0.1 |
| Decay_rate | Amount by which Epsilon is reduced at each iteration | 1 × ${10}^{-5}$ |
| Gamma $\mathit{\gamma}$ | Discount factor | 0.99 |
| Batch_size | Number of transitions sampled from the replay buffer | 64 |
| Learning_rate | The learning rate of the Adam optimizer | 0.001 |

| Evaluation Power | ${R}_{i}$ (m$\mathbf{\Omega}$) | ${U}_{\mathbf{0}}^{\prime}$ (V) | ${I}_{\mathit{Loss}}$ (A) | ${C}_{\mathit{Stor}}$ (Ah) |
|---|---|---|---|---|
| 2.5 kW | 0.29 | 1.075 | 9.7 | 2415.83 |
| 5 kW | 0.3 | 1.375 | 10.035 | 2416 |
| 7.5 kW | 0.305 | 1.675 | 10.3 | 2417.5 |
| 10 kW | 0.30 | 1.45 | 10.075 | 2416.88 |

| Evaluation Power | VRFB State Variable | RMSE | MAE |
|---|---|---|---|
| 2.5 kW | State of charge SoC (%) | 0.1114 | 0.029 |
| | Voltage U (V) | 1.923 | 1.8369 |
| | Current I (A) | 0.312 | 0.1028 |
| 5 kW | State of charge SoC (%) | 0.2407 | 0.0723 |
| | Voltage U (V) | 1.8311 | 1.7832 |
| | Current I (A) | 0.2231 | 0.1031 |
| 7.5 kW | State of charge SoC (%) | 0.8394 | 0.244 |
| | Voltage U (V) | 1.4741 | 1.468 |
| | Current I (A) | 0.3741 | 0.068 |
| 10 kW | State of charge SoC (%) | 0.5573 | 0.1498 |
| | Voltage U (V) | 1.114 | 1.1204 |
| | Current I (A) | 0.113 | 0.0139 |

| Approach | Evaluation Power | WLSS (%) |
|---|---|---|
| Model [26] | 2.5 kW | 187.58 |
| | 5 kW | 100.14 |
| | 7.5 kW | 7.9 |
| | 10 kW | 120.51 |
| DQN Agent | 2.5 kW | 170.42 |
| | 5 kW | 87.51 |
| | 7.5 kW | 6.07 |
| | 10 kW | 90.41 |

| Metric | DQN | Dueling DQN |
|---|---|---|
| RMSE [V] | 1.114 | 1.1081 |
| MAE [V] | 1.1204 | 1.0914 |

| Metric | DQN | Dueling DQN |
|---|---|---|
| RMSE [%] | 0.5573 | 0.456 |
| MAE [%] | 0.1498 | 0.0932 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ben Ahmed, M.; Fekih Hassen, W.
Optimization of a Redox-Flow Battery Simulation Model Based on a Deep Reinforcement Learning Approach. *Batteries* **2024**, *10*, 8.
https://doi.org/10.3390/batteries10010008
