# An Advanced Multi-Agent Reinforcement Learning Framework of Bridge Maintenance Policy Formulation

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Methodology

#### 2.1. Deterioration Model

**Ρ**, assuming that the component performance deteriorates by at most one level in the natural environment during an inspection cycle, and the matrix

**Ρ**is expressed as follows:

#### 2.2. Reinforcement Learning

#### 2.3. Q-Learning and Deep Q-Learning

#### 2.4. Multiple-Agent in Parallel

## 3. Case Study

#### 3.1. Case I: A Simple Bridge Deck System

Algorithm 1 Pseudocode |

Repeat (for each sample episode): |

Initialize t = start year (random), ${S}_{t}$ (depend on t) |

Preprocessing: ${S}_{t}\to \left[{S}_{t}^{1},\text{}{S}_{t}^{2},\text{}{S}_{t}^{3},\text{}{S}_{t}^{4},\text{}{S}_{t}^{5},{S}_{t}^{6},\text{}{S}_{t}^{7}\right]$ |

Single agent parallelism |

Initialize $\text{}{\theta}^{c}$ arbitrarily |

Repeat (for t < T) |

Choose ${A}_{t}^{c}$ using policy derived from Q (epsilon greedy) |

Take action ${A}_{t}^{c}$, observe ${R}_{t}^{c}$, sample next state ${S}_{t+1}^{c}$=${S}_{t}^{c}$*$P(s|{S}_{t}^{c},{A}_{t}^{c}$) |

and collect state <${S}_{t}^{c}$,$\text{}{A}_{t}^{c}$,$\text{}{R}_{t}^{c}$,$\text{}{S}_{t+1}^{c}$>; |

t = t + 1; |

Util t = T; |

Calculate ${U}_{t}^{c}$ for each step t in the episode, save the tuple |

<${S}_{t}^{c}$, ${A}_{t}^{c}$, ${R}_{t}^{c}$, ${S}_{t+1}^{c}$, ${U}_{t}^{c}$> to memory |

Update ${\theta}^{c}$ using Equation (16) |

Calculate $Rewar{d}_{sum}$ |

Output optimal maintenance policy |

#### 3.2. Case II: Taiping Lake Bridge

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

**Figure A1.**Case II: State transition matrix of the four categories identified in year 5. (

**a**) Stay-cables; (

**b**) Box girders; (

**c**) Towers; (

**d**) Piers.

**Figure A2.**Case II: State transition matrix of the four categories identified in year 10. (

**a**) Stay-cables; (

**b**) Box girders; (

**c**) Towers; (

**d**) Piers.

**Figure A3.**Case II: State transition matrix of the four categories identified in year 15. (

**a**) Stay-cables; (

**b**) Box girders; (

**c**) Towers; (

**d**) Piers.

## References

- Frangopol, D.M.; Dong, Y.; Sabatino, S. Bridge life-cycle performance and cost: Analysis, prediction, optimisation and decision-making. Struct. Infrastruct. Eng.
**2017**, 13, 1239–1257. [Google Scholar] [CrossRef] - Wong, K.Y.; Lau, C.K.; Flint, A.R. Planning and implementation of the structural health monitoring system for cable-supported bridges in Hong Kong. In Proceedings of the SPIE-The International Society for Optical Engineering, San Diego, CA, USA, 31 July–2 August 2000; Volume 3995, pp. 266–275. [Google Scholar]
- Bao, Y.; Chen, Z.; Wei, S.; Xu, Y.; Tang, Z.; Li, H. The state of the art of data science and engineering in structural health monitoring. Engineering
**2019**, 5, 234–242. [Google Scholar] [CrossRef] - Tsuda, Y.; Kaito, K.; Aoki, K.; Kobayashi, K. Estimating markovian transition probabilities for bridge deterioration forecasting. Kagaku Kogaku Ronbun
**2006**, 23, 241–256. [Google Scholar] [CrossRef] - Mirzaei, Z. Overview of Existing Bridge Management Systems-Report by the IABMAS Bridge Management Committee. 2014. Available online: https://www.research-collection.ethz.ch (accessed on 7 February 2022).
- Robert, W.E.; Marshall, A.R.; Shepard, R.W.; Aldayuz, J. The Pontis Bridge Management System: State-of-the-Practice in Implementation and the Pontis Bridge Management System: State-of-the-Practice in Implementation and Development. 2002. Available online: https://trid.trb.org/view/644472 (accessed on 7 February 2022).
- Andersen, N.H. Danbro—A Bridge Management System for Many Levels; Springer: Dordrecht, The Netherlands, 1990. [Google Scholar]
- Kim, S.K. Intelligent Bridge Maintenance System Development for Seo-Hae Grand Bridge//Cable-Supported Bridges: Challenging Technical Limits. In Proceedings of the Cable-Supported Bridges—Challenging Technical Limits, Seoul, Korea, 12–14 June 2001. [Google Scholar]
- Frangopol, D.M. Life-cycle performance, management, and optimisation of structural systems under uncertainty: Accomplishments and challenges. Struct. Infrastruct. Eng.
**2011**, 7, 389–413. [Google Scholar] [CrossRef] - Frangopol, D.M.; Kallen, M.J.; Noortwijk, J.M.V. Probabilistic models for life-cycle performance of deteriorating structures: Review and future directions. Prog. Struct. Eng. Mat.
**2004**, 6, 197–212. [Google Scholar] [CrossRef] - Kuhn, K.D. Network-level infrastructure management using approximate dynamic programming. J. Infrastruct. Syst.
**2010**, 6, 103–111. [Google Scholar] [CrossRef] - Medury, A.; Madanat, S. Simultaneous network optimization approach for pavement management systems. J. Infrastruct. Syst.
**2014**, 20, 04014010.1–04014010.7. [Google Scholar] [CrossRef] - Robelin, C.A.; Madanat, S.M. Dynamic programming based maintenance and replacement optimization for bridge decks using history-dependent deterioration models. Am. Soc. Civil Eng.
**2006**, 13–18. [Google Scholar] [CrossRef] - Mozer, M.C.; Touretzky, D.S.; Hasselmo, M. Reinforcement learning: An introduction. IEEE Trans. Neural Netw.
**2005**, 16, 285–286. [Google Scholar] - Canese, L.; Cardarilli, G.C.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Re, M.; Spanò, S. Multi-agent reinforcement learning: A review of challenges and applications. Appl. Sci.
**2021**, 11, 4948. [Google Scholar] [CrossRef] - Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-Learning algorithms: A comprehensive classification and applications. IEEE Access
**2019**, 7, 133653–133667. [Google Scholar] [CrossRef] - Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction. IEEE Trans. Neural Netw.
**1998**, 9, 1054. [Google Scholar] [CrossRef] - Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature
**2015**, 518, 529–533. [Google Scholar] [CrossRef] [PubMed] - Wei, S.; Bao, Y.; Li, H. Optimal policy for structure maintenance: A deep reinforcement learning framework. Struct. Saf.
**2020**, 83, 101906. [Google Scholar] [CrossRef] - Madanat, S.; Mishalani, R.; Ibrahim, W.H.W. Estimation of infrastructure transition probabilities from condition rating data. J. Infrastruct. Syst.
**1995**, 1, 120–125. [Google Scholar] [CrossRef] - Chung, S.-H.; Hong, T.-H.; Han, S.-W.; Son, J.-H.; Lee, S.-Y. Life cycle cost analysis based optimal maintenance and rehabilitation for underground infrastructure management. KSCE J. Civ. Eng.
**2006**, 10, 243–253. [Google Scholar] [CrossRef] - Wellalage, N.K.W.; Zhang, T.; Dwight, R. Calibrating Markov Chain-based deterioration models for predicting future conditions of railway bridge elements. J. Bridge Eng.
**2015**, 20, 04014060. [Google Scholar] [CrossRef] - Veshosky, D.; Beidleman, C.R.; Buetow, G.W.; Demir, M. Comparative analysis of bridge superstructure deterioration. J. Struct. Eng.
**1994**, 120, 2123–2136. [Google Scholar] [CrossRef] - Park, T.; Saad, W. Kolkata paise restaurant game for resource allocation in the internet of things. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 16 April 2018; pp. 1774–1778. [Google Scholar]
- Kastampolidou, K.; Papalitsas, C.; Andronikos, T. The distributed Kolkata paise restaurant game. Games
**2022**, 13, 33. [Google Scholar] [CrossRef] - Qiu, J.; Yu, C.; Liu, W.; Yang, T.; Yu, J.; Wang, Y.; Yang, H. Low-cost multi-agent navigation via reinforcement learning with multi-fidelity simulator. IEEE Access
**2021**, 9, 84773–84782. [Google Scholar] [CrossRef] - Tham, C.K.; Prager, R.W. A modular Q-Learning architecture for manipulator task decomposition. In Proceedings of the Eleventh International Conference, Rutgers University, New Brunswick, NJ, USA, 10–13 July 1994; pp. 309–317. [Google Scholar]
- Machado, L.; Schirru, R. The Ant-Q algorithm applied to the nuclear reload problem. Ann. Nucl. Energy
**2002**, 29, 1455–1470. [Google Scholar] [CrossRef] - Yang, L.; Sun, Q.; Ma, D.; Wei, Q. Nash Q-learning based equilibrium transfer for integrated energy management game with We-Energy. Neurocomputing
**2020**, 396, 216–223. [Google Scholar] [CrossRef] - Ben-Akiva, M.; Gopinath, D. Modeling infrastructure performance and user costs. J. Infrastruct. Syst.
**1995**, 1, 33–43. [Google Scholar] [CrossRef] - Kobayashi, K.; Do, M.; Han, D. Estimation of Markovian transition probabilities for pavement deterioration forecasting. KSCE J. Civ. Eng.
**2010**, 14, 343–351. [Google Scholar] [CrossRef] - Jiang, Y.; Saito, M.; Sinha, K.C. Bridge Performance Prediction Model Using the Markov Chain. 1988. Available online: https://trid.trb.org/view/300339 (accessed on 7 February 2022).
- Walgama Wellalage, N.K.; Zhang, T.; Dwight, R.; El-Akruti, K. Bridge deterioration modeling by Markov Chain Monte Carlo (MCMC) simulation method. In Proceedings of the 8th World Congress on Engineering Asset Management & 3rd International Conference on Utility Management & Safety, Hong Kong, China, 30 October–1 November 2013; pp. 545–556. [Google Scholar]
- Morcous, G. Performance prediction of bridge deck systems using Markov Chains. J. Perform. Constr. Fac.
**2006**, 20, 146–155. [Google Scholar] [CrossRef] - Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature
**2015**, 521, 436. [Google Scholar] [CrossRef] [PubMed]

**Figure 2.**Case I: Natural deterioration state transition matrix for the seven components. (

**a**) Wearing surfaces (

**b**) Drainage system; (

**c**) Exterior face 1 (

**d**) Exterior face 2; (

**e**) End portion 1 (

**f**) End portion 2; (

**g**) Middle portion.

**Figure 3.**Case I: State transition matrix of maintenance actions. (

**a**) Preventive scenario; (

**b**) Corrective scenario; (

**c**) Rebuild.

**Figure 4.**Deterioration matrix regression analysis of component 0. (

**a**) Component 0 performance index fitting; (

**b**) Comparison of PI and TPM.

**Figure 9.**Case II: State transition matrix of the four categories in year 0. (

**a**) Stay-cables; (

**b**) Box girders; (

**c**) Towers; (

**d**) Piers.

**Figure 10.**Case II: Policy optimization process during agents training. (

**a**) Towers; (

**b**) Stay-cables; (

**c**) Piers; (

**d**) Box girders.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhou, Q.-N.; Yuan, Y.; Yang, D.; Zhang, J.
An Advanced Multi-Agent Reinforcement Learning Framework of Bridge Maintenance Policy Formulation. *Sustainability* **2022**, *14*, 10050.
https://doi.org/10.3390/su141610050

**AMA Style**

Zhou Q-N, Yuan Y, Yang D, Zhang J.
An Advanced Multi-Agent Reinforcement Learning Framework of Bridge Maintenance Policy Formulation. *Sustainability*. 2022; 14(16):10050.
https://doi.org/10.3390/su141610050

**Chicago/Turabian Style**

Zhou, Qi-Neng, Ye Yuan, Dong Yang, and Jing Zhang.
2022. "An Advanced Multi-Agent Reinforcement Learning Framework of Bridge Maintenance Policy Formulation" *Sustainability* 14, no. 16: 10050.
https://doi.org/10.3390/su141610050