EvacuAI: An Analysis of Escape Routes in Indoor Environments with the Aid of Reinforcement Learning
Abstract
1. Introduction
2. Background
3. Related Works
4. Proposed Solutions
4.1. Overview of the Problem and Solution
- Problem: Given the floor plan of an indoor building and a fire emergency, the system must provide quick and accurate escape guidance to the building’s occupants.
- Goals: Find safe paths from different danger positions in the building to safe positions in the plan within at most a few seconds, taking into account the constraints imposed by the situation (e.g., rooms blocked by fire).
- Proposed solution: To achieve this goal, the following steps and tools were developed:
- A modeling tool with which a floor plan can be defined graphically and transformed into a graph, where each room is represented as a node, the edges are the possible passages between rooms (e.g., a door or window), and the safe node is marked as the objective (a minimal example of such a graph is sketched after this list).
- Given the graph mentioned above, our solution uses DRL algorithms to find the best emergency exit routes in the shortest possible time and with the highest possible accuracy. For this, the problem is modeled using the OpenAI Gym interface, and a DRL algorithm based on experience replay and a target network is defined. Three alternatives are investigated: experience replay with a pre-initialized replay buffer, experience replay without buffer pre-initialization, and transfer learning in place of buffer initialization. Once the best DRL strategy is chosen, the algorithm can be applied in real time.
- Outputs: Safe exit routes discovered by the agent, which can be delivered in real time, for instance, to occupants in danger and to rescuers through an app.
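As an illustration of the graph model described above, the following minimal sketch builds a small plan graph and extracts the adjacency matrix consumed by the DRL environment. It assumes the networkx library; the room names, distances, and fire/exit assignments are illustrative only and are not taken from the paper’s projects.

```python
# Illustrative sketch: rooms become nodes, passages (doors/windows) become
# weighted edges, and the safe location is a designated exit node.
import networkx as nx

plan = nx.Graph()
plan.add_nodes_from(["room_1", "room_2", "hallway", "exit"])
plan.add_weighted_edges_from([
    ("room_1", "hallway", 3.5),   # weight = spatial distance between rooms
    ("room_2", "hallway", 2.0),
    ("hallway", "exit", 4.0),
])

exit_nodes = ["exit"]          # objective node(s)
fire_nodes = ["room_2"]        # nodes currently affected by fire

# The DRL environment consumes the adjacency matrix of this graph,
# with edge weights as distances (0 meaning "not adjacent").
adjacency = nx.to_numpy_array(plan, weight="weight")
print(adjacency)
```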
4.2. The Use of Deep Reinforcement Learning for Defining the Best Emergency Exit Routes
- The init method initializes the variables required by the environment. It receives the graph corresponding to the floor plan, together with its associated information, and builds an adjacency matrix from the input graph, the exit nodes, and the nodes affected by fire. The spatial distance between nodes is used as the edge weight. The number of possible actions is defined as the number of nodes in the graph, and the size of the observation space is defined as the number of possible actions × 2 (a minimal sketch of such an environment is given after this list).
- The reset method restores the environment to its initial conditions. It is responsible for defining the initial state of the agent in the environment, either at a pre-defined node or, if no node is specified, at a randomly chosen one. By using random initial states during training, the agent learns the path from every node of the graph to the exit node, although this also makes training considerably longer.
- The step method is responsible for the core of the algorithm: given the current state and the action selected by the neural network, it determines the next state and the reward. The reward is calculated by means of the adjacency matrix, which records both the adjacency and the distance between nodes:
- If the next state is a node that is on the list of nodes affected by fire, the agent incurs a severe penalty (i.e., a negative reward) of −10,000.
- If the next node is not adjacent to the current node (i.e., they are not neighbors), the agent incurs a penalty (negative reward) of −5000.
- If the next node is adjacent to the current node (i.e., they are neighbors) and is not an exit node, the agent incurs a penalty (negative reward) corresponding to the weight of the edge that connects these nodes.
- If the next node is an exit node, the learning agent is granted a positive reward of 10,000, which is a sign that it has successfully completed its goal.
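A minimal sketch of such an environment, following the OpenAI Gym interface, is shown below. The class and attribute names are ours, and the handling of invalid (non-adjacent) moves, of episode termination on fire nodes, and of the observation encoding are assumptions; only the reward values follow the rules listed above.

```python
import gym
import numpy as np
from gym import spaces


class EvacuationEnv(gym.Env):
    """Graph-based evacuation environment (illustrative sketch)."""

    def __init__(self, adjacency, exit_nodes, fire_nodes, start_node=None):
        super().__init__()
        self.adjacency = np.asarray(adjacency)   # weighted adjacency matrix; 0 = not adjacent
        self.exit_nodes = set(exit_nodes)
        self.fire_nodes = set(fire_nodes)
        self.start_node = start_node             # None -> random start node on reset
        n = self.adjacency.shape[0]
        self.action_space = spaces.Discrete(n)   # one action per node of the graph
        # The paper sets the observation-space size to "number of actions x 2";
        # for simplicity this sketch observes only the current node index.
        self.observation_space = spaces.Discrete(n)
        self.state = None

    def reset(self):
        # Pre-defined start node, or a random non-exit, non-fire node otherwise.
        if self.start_node is not None:
            self.state = self.start_node
        else:
            candidates = [i for i in range(self.adjacency.shape[0])
                          if i not in self.exit_nodes and i not in self.fire_nodes]
            self.state = int(np.random.choice(candidates))
        return self.state

    def step(self, action):
        next_node = int(action)
        weight = self.adjacency[self.state, next_node]
        if next_node in self.fire_nodes:          # moved into a node affected by fire
            reward, done = -10_000, True
        elif weight == 0:                          # not adjacent to the current node
            reward, done = -5_000, False
            next_node = self.state                 # assumption: the agent stays in place
        elif next_node in self.exit_nodes:         # reached an exit node
            reward, done = 10_000, True
        else:                                      # valid move to a non-exit neighbor
            reward, done = -float(weight), False
        self.state = next_node
        return self.state, reward, done, {}
```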
Algorithm 1: DRL algorithm based on experience replay and target network
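Algorithm 1 itself is not reproduced here; the following is a generic sketch of a DQN-style training loop with experience replay and a target network in the same spirit. All hyperparameters (gamma, epsilon, batch size, buffer size, synchronization interval, episode cap) and helper names are illustrative, the observation is assumed to be a one-hot encoding of the current node, and q_net can be any network mapping that observation to one Q-value per action.

```python
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


def one_hot(node, n_nodes):
    """One-hot encoding of the current node (the paper's exact observation
    encoding, sized 'number of actions x 2', is not reproduced here)."""
    vec = np.zeros(n_nodes, dtype=np.float32)
    vec[node] = 1.0
    return vec


def train_dqn(env, q_net, n_nodes, episodes, gamma=0.99, epsilon=0.1,
              batch_size=64, buffer_size=10_000, sync_every=100,
              max_steps=200, lr=1e-3):
    """Epsilon-greedy DQN training with experience replay and a target network."""
    target_net = copy.deepcopy(q_net)                        # target network
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=buffer_size)                       # experience replay buffer
    step_count = 0

    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):                           # cap the episode length
            # Epsilon-greedy action selection from the online Q-network.
            if random.random() < epsilon:
                action = random.randrange(n_nodes)
            else:
                with torch.no_grad():
                    obs = torch.from_numpy(one_hot(state, n_nodes)).unsqueeze(0)
                    action = int(q_net(obs).argmax(dim=1).item())

            next_state, reward, done, _ = env.step(action)
            replay.append((state, action, reward, next_state, done))
            state = next_state
            step_count += 1

            # Learn from a random minibatch of stored transitions.
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = zip(*batch)
                s = torch.from_numpy(np.stack([one_hot(x, n_nodes) for x in s]))
                s2 = torch.from_numpy(np.stack([one_hot(x, n_nodes) for x in s2]))
                a = torch.tensor(a, dtype=torch.int64)
                r = torch.tensor(r, dtype=torch.float32)
                d = torch.tensor(d, dtype=torch.float32)

                q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    targets = r + gamma * target_net(s2).max(dim=1).values * (1.0 - d)
                loss = nn.functional.mse_loss(q_sa, targets)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Periodically copy the online weights into the target network.
            if step_count % sync_every == 0:
                target_net.load_state_dict(q_net.state_dict())

            if done:
                break
```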
4.3. Representing and Using Floor Plan Images
4.4. Settings for Training
5. Projects, Results, and Discussion
5.1. Project 1
5.2. Project 2
5.3. Project 3
5.4. Summary of Projects
5.5. Improvements of the System, Discussion, and Challenges
5.5.1. Improvements in the System
5.5.2. Discussion
6. Conclusions and Recommendations for Future Work
Author Contributions
Funding
Informed Consent Statement
Conflicts of Interest
References
- TodayShow. Newer Homes and Furniture Burn Faster, Giving You Less Time to Escape a Fire. 2017. Available online: https://www.today.com/home/newer-homes-furniture-burn-faster-giving-you-less-time-escape-t65826 (accessed on 28 September 2023).
- Emergency Exit Routes. 2018. Available online: https://www.osha.gov/sites/default/files/publications/emergency-exit-routes-factsheet.pdf (accessed on 22 April 2022).
- Brasil É o 3º País com o Maior Número de Mortes por Incêndio (Newsletter nº 5). 2015. Available online: https://sprinklerbrasil.org.br/imprensa/brasil-e-o-3o-pais-com-o-maior-numero-de-mortes-por-incendio-newsletter-no-5/ (accessed on 28 September 2023).
- Conselho Nacional do Ministério Público. Saídas de Emergência em edifíCios—NBR 9077. 2001. Available online: https://www.cnmp.mp.br/portal/images/Comissoes/DireitosFundamentais/Acessibilidade/NBR_9077_Sa%C3%ADdas_de_emerg%C3%AAncia_em_edif%C3%ADcios-2001.pdf (accessed on 28 September 2023).
- USFire. Residential Fire Estimate Summaries; USFire: Emmitsburg, MD, USA, 2022. [Google Scholar]
- Crispim, C.M.R. Proposta de Arquitetura Segura de Centrais de Incêndio em Nuvem. 2021. Available online: http://repositorio2.unb.br/jspui/handle/10482/40580 (accessed on 28 September 2023).
- Sharma, J.; Andersen, P.A.; Granmo, O.C.; Goodwin, M. Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 7363–7381. [Google Scholar] [CrossRef]
- Agnihotri, A.; Fathi-Kazerooni, S.; Kaymak, Y.; Rojas-Cessa, R. Evacuating Routes in Indoor-Fire Scenarios with Selection of Safe Exits on Known and Unknown Buildings Using Machine Learning. In Proceedings of the 2018 IEEE 39th Sarnoff Symposium, Newark, NJ, USA, 24–25 September 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Xu, S.; Gu, Y.; Li, X.; Chen, C.; Hu, Y.; Sang, Y.; Jiang, W. Indoor emergency path planning based on the Q-learning optimization algorithm. ISPRS Int. J. Geo-Inf. 2022, 11, 66. [Google Scholar] [CrossRef]
- Bhatia, S. Survey of shortest path algorithms. SSRG Int. J. Comput. Sci. Eng. 2019, 6, 33–39. [Google Scholar] [CrossRef]
- Thombre, P. Multi-Objective Path Finding Using Reinforcement Learning. Master’s Thesis, San Jose State University, San Jose, CA, USA, 2018. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Chapter 1.7—Introduction, Early History of Reinforcement Learning. In Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018; pp. 13–22. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Chapter 1.1—Introduction, Reinforcement Learning. In Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018; pp. 1–4. [Google Scholar]
- Pacelli Ferreira Dias Junior, E. Aprendizado Por Reforço Sobre o Problema de Revisitação de Páginas Web. Master’s Thesis, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil, 2012; p. 27. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Chapter 1.3—Introduction, Elements of Reinforcement Learning. In Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018; pp. 6–7. [Google Scholar]
- Prestes, E. Capítulo 1.1—Conceitos Básicos. In Introdução à Teoria dos Grafos; Repositório Institucional da UFPB: João Pessoa, Brazil, 2020; pp. 2–7. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Chapter 3—Policies and Value Functions. In Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018; pp. 58–62. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Chapter 6—Temporal-Difference Learning. In Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018; pp. 119–138. [Google Scholar]
- Fedus, W.; Ramachandran, P.; Agarwal, R.; Bengio, Y.; Larochelle, H.; Rowland, M.; Dabney, W. Revisiting Fundamentals of Experience Replay. arXiv 2020, arXiv:2007.06700. [Google Scholar]
- Torres, A.J. Deep Q-Network (DQN)-II. 2020. Available online: https://towardsdatascience.com/deep-q-network-dqn-ii-b6bf911b6b2c (accessed on 28 September 2023).
- Deep Q-Learning: An Introduction to Deep Reinforcement Learning. 2020. Available online: https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/ (accessed on 28 September 2023).
- Deng, H.; Ou, Z.; Zhang, G.; Deng, Y.; Tian, M. BIM and computer vision-based framework for fire emergency evacuation considering local safety performance. Sensors 2021, 21, 3851. [Google Scholar] [CrossRef] [PubMed]
- Selin, J.; Letonsaari, M.; Rossi, M. Emergency exit planning and simulation environment using gamification, artificial intelligence and data analytics. Procedia Comput. Sci. 2019, 156, 283–291. [Google Scholar] [CrossRef]
- Wongsai, P.; Pawgasame, W. A Reinforcement Learning for Criminal’s Escape Path Prediction. In Proceedings of the 2018 5th Asian Conference on Defense Technology (ACDT), Hanoi, Vietnam, 25–26 October 2018; pp. 26–30. [Google Scholar] [CrossRef]
- Schmitt, S.; Zech, L.; Wolter, K.; Willemsen, T.; Sternberg, H.; Kyas, M. Fast routing graph extraction from floor plans. In Proceedings of the 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Sapporo, Japan, 18–21 September 2017; pp. 1–8. [Google Scholar] [CrossRef]
- Lam, O.; Dayoub, F.; Schulz, R.; Corke, P. Automated Topometric Graph Generation from Floor Plan Analysis; Australian Robotics and Automation Association: Sydney, Australia, 2015; pp. 1–8. [Google Scholar]
- Hu, R.; Huang, Z.; Tang, Y.; van Kaick, O.; Zhang, H.; Huang, H. Graph2Plan: Learning Floorplan Generation from Layout Graphs. arXiv 2020, arXiv:2004.13204. [Google Scholar] [CrossRef]
- Kalervo, A.; Ylioinas, J.; Häikiö, M.; Karhu, A.; Kannala, J. CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis. arXiv 2019, arXiv:1904.01920. [Google Scholar]
- Lu, Y.; Tian, R.; Li, A.; Wang, X.; del Castillo Lopez, J.L.G. CubiGraph5K: Organizational Graph Generation for Structured Architectural Floor Plan Dataset; CAADRIA: Hong Kong, China, 2021. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R.B. Mask R-CNN. arXiv 2017, arXiv:1703.06870. [Google Scholar]
- Sandelin, F. Semantic and Instance Segmentation of Room Features in Floor Plans using Mask R-CNN. Master’s Thesis, Uppsala University, Uppsala, Sweden, 2019. [Google Scholar]
- Gym is a Standard API for Reinforcement Learning, and a Diverse Collection of Reference Environments. Available online: https://www.gymlibrary.dev (accessed on 28 September 2023).
- The Progressive Javascript Framework. Available online: https://vuejs.org (accessed on 28 September 2023).
- FastAPI Framework, High Performance, Easy to Learn, Fast to Code, Ready for Production. Available online: https://fastapi.tiangolo.com/ (accessed on 28 September 2023).
- MongoDB: The Developer Data Platform That Provides the Services and Tools Necessary to Build Distributed Applications Fast, at the Performance and Scale Users Demand. Available online: https://www.mongodb.com/ (accessed on 28 September 2023).
- v-Network-Graph: An Interactive Network Graph Visualization Component for Vue 3. Available online: https://dash14.github.io/v-network-graph/ (accessed on 28 September 2023).
- Projetos Arquitetônicos Para Construção—Portal do FNDE. Available online: https://www.fnde.gov.br (accessed on 28 September 2023).
- Huang, B.; Wu, Q.; Zhan, F.B. A shortest path algorithm with novel heuristics for dynamic transportation networks. Int. J. Geogr. Inf. Sci. 2007, 21, 625–644. [Google Scholar] [CrossRef]
- Machado, A.F.d.V.; Santos, U.O.; Vale, H.; Gonçalvez, R.; Neves, T.; Ochi, L.S.; Clua, E.W.G. Real Time Pathfinding with Genetic Algorithm. In Proceedings of the 2011 Brazilian Symposium on Games and Digital Entertainment, Salvador, Brazil, 7–9 November 2011; pp. 215–221. [Google Scholar] [CrossRef]
- Sigurdson, D.; Bulitko, V.; Yeoh, W.; Hernandez, C.; Koenig, S. Multi-agent pathfinding with real-time heuristic search. In Proceedings of the 2018 IEEE Conference on Computational Intelligence and Games (CIG), Maastricht, The Netherlands, 14–17 August 2018. [Google Scholar] [CrossRef]
- Konar, A.; Chakraborty, I.G.; Singh, S.J.; Jain, L.C.; Nagar, A.K. A deterministic improved Q-learning for path planning of a mobile robot. IEEE Trans. Syst. Man Cybern. Syst. 2013, 43, 1141–1153. [Google Scholar] [CrossRef]
Reference | Machine Learning | Reinforcement Learning | Real Time | Technique for Representing the Floor Plan (Graph or Matrix) |
---|---|---|---|---|
[8] | Yes | No | Does not specify | No |
[23] | Yes | No | No | Yes |
[9] | Yes | Yes | Does not specify | No |
[7] | Yes | Yes–DRL | Yes | No |
EvacuAI (This study) | Yes | Yes–DRL–TL | Yes | Yes |
Name | Type | In Features | Out Features | Activation |
---|---|---|---|---|
F1 | Linear | Observation Size | 256 | ReLU |
F2 | Linear | 256 | 128 | ReLU |
F3 | Linear | 128 | 64 | ReLU |
F4 | Linear | 64 | Qty. of actions | Linear |
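The layer table above translates directly into a small fully connected network. A possible PyTorch rendering is shown below; the class name and forward pass are ours, while the layer sizes and activations come from the table.

```python
import torch.nn as nn


class QNetwork(nn.Module):
    """Fully connected Q-network matching the layer table above."""

    def __init__(self, observation_size: int, n_actions: int):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(observation_size, 256), nn.ReLU(),   # F1
            nn.Linear(256, 128), nn.ReLU(),                # F2
            nn.Linear(128, 64), nn.ReLU(),                 # F3
            nn.Linear(64, n_actions),                      # F4 (linear output = one Q-value per action)
        )

    def forward(self, x):
        return self.model(x)
```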
Condition | Number of Episodes |
---|---|
There is one start node | (Number of edges) × 100 |
Transfer learning is used | (Number of edges/3) × 500 |
The number of edges is greater than 40 | (Number of edges) × 900 |
For the other cases | (Number of edges) × 500 |
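A small helper reproducing these episode-count rules is sketched below. The precedence when several conditions hold (transfer learning first, then a single start node, then the edge-count threshold) is our reading, inferred from the episode counts reported for the projects in Section 5, and is not stated explicitly in the text.

```python
def number_of_episodes(num_edges: int, single_start_node: bool = False,
                       transfer_learning: bool = False) -> int:
    """Episode count as a function of the graph size and training setup."""
    if transfer_learning:
        return int(num_edges / 3 * 500)
    if single_start_node:
        return num_edges * 100
    if num_edges > 40:
        return num_edges * 900
    return num_edges * 500

# Examples consistent with the project tables below:
# number_of_episodes(8, transfer_learning=True)   -> 1333   (Project 1, with TL)
# number_of_episodes(46, single_start_node=True)  -> 4600   (Project 3, start node, no TL)
# number_of_episodes(46)                          -> 41400  (Project 3, all nodes, no TL)
```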
| Project 1 | With TL | Without TL |
|---|---|---|
| Number of Nodes | 9 | 9 |
| Number of Edges | 8 | 8 |
| Observation Space Size | 18 | 18 |
| Number of Episodes: Start Node | 1333 | 800 |
| Number of Episodes: All Nodes | 1333 | 4000 |
| Project 2 | With TL | Without TL |
|---|---|---|
| Number of Nodes | 31 | 31 |
| Number of Edges | 34 | 34 |
| Observation Space Size | 62 | 62 |
| Number of Episodes: Start Node | 5666 | 3400 |
| Number of Episodes: All Nodes | 5666 | 17,000 |
| Project 3 | With TL | Without TL |
|---|---|---|
| Number of Nodes | 34 | 34 |
| Number of Edges | 46 | 46 |
| Observation Space Size | 68 | 68 |
| Number of Episodes: Start Node | 7666 | 4600 |
| Number of Episodes: All Nodes | 7666 | 41,400 |
Project 1 | ||||||
---|---|---|---|---|---|---|
Simulation 1 | Simulation 2 | Simulation 3 | ||||
Blocked Nodes | None | Node 7 | Node 4 and Node 5 | |||
Initial Node | None | None | None | |||
Method | ER with replay init buffer | ER without replay init buffer | Transfer Learning | ER with replay init buffer | ER without replay init buffer | Transfer Learning |
Accuracy | 100% | 100% | 87% | 100% | 100% | 100% |
Training Time | 39 s | 33 s | 12 s | 1 min | 51 s | 16 s |
Inference Time | 0.03 s | 0.04 s | 0.04 s | 0.04 s | 0.03 s | 0.03 s |
| Project 2 | Simulation 1 | | Simulation 2 | | | Simulation 3 | | |
|---|---|---|---|---|---|---|---|---|
| Blocked Nodes | None | | Node 2 | | | Nodes 11 to 17 | | |
| Initial Node | None | | None | | | None | | |
| Method | ER with replay init buffer | ER without replay init buffer | ER with replay init buffer | ER without replay init buffer | Transfer Learning | ER with replay init buffer | ER without replay init buffer | Transfer Learning |
| Accuracy | 3.26% | 100% | 0% | 0% | 0% | 7% | 100% | 100% |
| Training Time | 2 min and 29 s | 2 min and 53 s | 2 min and 53 s | 3 min and 32 s | 22 s | 2 min and 38 s | 2 min and 40 s | 52 s |
| Inference Time | 0.84 s | 0.69 s | 0.62 s | 0.84 s | 0.54 s | 0.62 s | 0.21 s | 0.14 s |
| Project 3 | Simulation 1 | Simulation 2 | | Simulation 3 | | Simulation 4 | |
|---|---|---|---|---|---|---|---|
| Blocked Nodes | None | Node 22 | | Node 22 | | Nodes 4, 5, 6, and 10 | |
| Initial Node | None | None | | Node 2 | | None | |
| Method | ER without replay init buffer | ER without replay init buffer | Transfer Learning | ER without replay init buffer | Transfer Learning | ER without replay init buffer | Transfer Learning |
| Accuracy | 100% | 0% | 90% | 100% | 100% | 100% | 100% |
| Training Time | 7 min and 14 s | 6 min and 43 s | 1 min | 21 s | 20 s | 10 min and 21 s | 1 min |
| Inference Time | 0.17 s | 0.61 s | 0.49 s | 0.01 s | 0.02 s | 0.11 s | 0.09 s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).