ESCAPE: Evacuation Strategy through Clustering and Autonomous Operation in Public Safety Systems

Natural disasters and terrorist attacks pose a significant threat to human society and have stressed the urgent need for comprehensive and efficient evacuation strategies. In this paper, a novel evacuation-planning mechanism is introduced to support a distributed and autonomous evacuation process within the operation of a public safety system, where evacuees exploit the capabilities of the proposed ESCAPE service to take the most beneficial actions for themselves. The ESCAPE service is based on the principles of reinforcement learning and game theory, and is executed at two decision-making layers. Initially, evacuees are modeled as stochastic learning automata that select the evacuation route they want to join based on its physical characteristics and on their past decisions during the current evacuation. Consequently, a cluster of evacuees is created per evacuation route, and the evacuees decide whether or not they will finally evacuate through that route at the current time slot. The evacuees' competitive behavior is modeled as a non-co-operative minority game per evacuation route. A distributed and low-complexity evacuation-planning algorithm (i.e., ESCAPE) is introduced to implement both of the aforementioned decision-making layers. Finally, the proposed framework is evaluated through modeling and simulation under several scenarios, and its superiority and benefits are revealed and demonstrated.


Introduction
Public Safety Systems (PSSs) have gained growing interest owing to events such as natural disasters and terrorist attacks that pose a significant threat to human society. Within PSSs, evacuation planning is an issue of major practical importance, as the efficient and effective guidance of crowds can improve evacuees' survivability and save human lives [1]. During the evacuation process, evacuees face incomplete information regarding the surrounding environment, e.g., a bottleneck in evacuation routes, which hampers the decision-making process and prevents fast and accurate evacuation-route selection [2]. Moreover, a disaster area changes dynamically over time due to the evolving catastrophic event. Consequently, human behavior becomes unpredictable due to the uncertainty imposed by the public safety system [3]. Thus, evacuation planning becomes an even more complicated process, especially if a large number of co-evacuees [4] and available evacuation routes are included within humans' decision map [5].
In this paper, we address the aforementioned challenges and, in particular, propose an autonomous evacuation process based on the principles of reinforcement learning and game theory to support distributed and efficient evacuation planning. Humans/evacuees act as rational entities in the public safety system who are able to sense changes in the disaster area via the limited information provided, and make optimal evacuation decisions for themselves.

Contributions and Outline
The key contributions of our research work that differentiate it from the rest of the literature are summarized as follows: (a) A novel paradigm for distributed and autonomous evacuation planning is introduced, allowing evacuees to take the most beneficial actions for themselves by using the proposed ESCAPE service, which can run on their mobile devices. The ESCAPE service is based on the principles of reinforcement learning and game theory, and consists of two decision-making layers. (b) At the first layer, the evacuees, acting as stochastic learning automata [6][7][8], decide which evacuation route they want to join based on their past decisions during the current evacuation and on the (limited) information available from the disaster area through the ESCAPE service, e.g., evacuation rate per route, evacuees already on the route, and capacity of the route. The latter information can easily be made available in a real implementation scenario through sensors deployed at the evacuation routes. Based on the evacuees' decisions, a cluster of evacuees is created at each evacuation route. It is highlighted that this decision expresses the evacuation route where the evacuee desires to go; whether they finally evacuate through that route is determined by the second layer of their decision. (c) At the second layer, the evacuees of each cluster have to determine whether or not they will finally go toward the initially selected evacuation route. Given that evacuees usually exhibit aggressive and non-co-operative behavior when their lives are at risk, the decision-making process at the cluster of each evacuation route is formulated as a non-co-operative game among the evacuees. It is also highlighted that evacuees' decisions are interdependent. The theory of minority games [9,10] is adopted, allowing evacuees to choose dynamically and in a distributed fashion whether they will finally go through the evacuation route, while considering the route's available free space. The formulated minority game is solved using an exponential learning algorithm, namely the Distributed Exponential learning algorithm for Minority Games (DEMG). (d) A distributed and low-complexity evacuation-planning algorithm, named ESCAPE, is introduced to implement both decision-making layers of the evacuees. The complexity of ESCAPE is provided and demonstrates the low time overhead of the proposed evacuation planning, which could therefore be considered for realistic implementation. (e) Detailed numerical and comparative results demonstrate that the proposed holistic framework is a promising solution for realizing distributed and autonomous evacuation planning that meets the needs and requirements of both the evacuees and the Emergency Control Center.
This paper is organized as follows. In Section 2, the overall system model, the ESCAPE methodology, and the related work are described, while, in Section 3, our proposed reinforcement-learning-based evacuation-route selection is presented. Section 4 introduces the non-co-operative minority game among evacuees to finally determine their evacuation route. The distributed and low-complexity ESCAPE algorithm, as well as its complexity, are presented in Section 5. Finally, a detailed numerical evaluation of our approach via modeling and simulation is demonstrated in Section 6, while Section 7 concludes the paper.

System Model and Overview
We consider a disaster area where a catastrophic event is evolving and several evacuation routes are available. Let us denote their set by E = {1, ..., e, ..., |E|}, while each human m residing in the disaster area aims at selecting an evacuation route e to escape from the area. The set of evacuees is denoted by M = {1, ..., m, ..., |M|}. Each evacuation route e is characterized by a specific capacity of evacuees C_e and a corresponding rate of evacuation λ_e, both of which depend on the physical characteristics of the evacuation route, e.g., stairs or highway. At the time that an evacuee considers selecting an evacuation route, a number of humans |M|_e are already in each route e, ∀e ∈ E. The distance of each evacuee m from evacuation route e is denoted by d_{m,e}. Figure 1 provides an illustrative example of the topology and model under consideration [11].

ESCAPE Methodology
A high-level overview of the overall framework proposed in this research work is depicted in Figure 2. In the following sections, its individual components are analyzed in detail. Specifically, as observed in Figure 2, we formulate and address the problem of distributed and autonomous evacuation planning, where evacuees are able to take the most beneficial actions for themselves. In a nutshell, the proposed problem and evacuation-process formulation are as follows. Before evacuees start the evacuation process, they interact with the ESCAPE service through their mobile device to make an optimal decision regarding the evacuation route that they will follow. ESCAPE consists of two main components: (a) evacuation-route selection, during which evacuees decide where they want to go, which is based on reinforcement learning, and (b) the decision-making process of whether or not evacuees will finally go to the initially selected evacuation route, which is formulated via the concept of minority games.
In the following subsection, we summarize some of the most recently proposed evacuation-planning methods in order to better position our work within the existing literature.

Related Work
Several centralized approaches have been proposed in the literature to deal with the evacuation-planning problem, where all information on the disaster area's status (e.g., available evacuation routes, number of evacuees, evacuation routes' congestion levels) is gathered at a central decision-making entity, i.e., an Emergency Control Center (ECC) [12]. The authors in [13] developed an evacuation service named MacroServ to help transportation departments simulate an emergency evacuation by recommending to the evacuees the most preferable routes toward safe locations during a disaster. The authors adopted the Poisson and Weibull distributions to quantify various stochastic factors that influence their model, such as the behavior of evacuees while driving and the rate of evacuees departing from an area. The problem of contraflow for evacuation is studied in [14], where a transportation network with edges of specific capacity and travel time is considered. The authors determined a reconfigured network with the ideal direction for each edge toward minimizing evacuation time by reallocating each edge's capacity. A similar problem was discussed in [15], where a multiple-objective optimization problem was formulated while considering evacuation priorities and setup time for the contraflow operation. Evacuation planning due to fire propagation was examined in [16], where the authors developed a probabilistic model to characterize how the disaster affects crowd behavior and, in turn, evacuation time.
An analytic system-optimal dynamic traffic-assignment model with probabilistic demand and capacity constraints was proposed in [17] to capture the evacuation process, while stochastic programming was used as the solution method. Following a similar philosophy, a distributed-constraint optimization algorithm was introduced in [18] to optimize human evacuation timing by estimating evacuees' location. A centralized optimization problem was formulated in [19], aiming at minimizing the total clearance time of the disaster area and the travel time of each evacuee under the constraints of avoiding traffic congestion and balancing traffic loads between different evacuation routes. Another system-oriented approach was discussed in [20], targeting evacuation traffic management at intersections of the disaster area by presenting a simulation-based framework to maximize evacuation efficiency under uncertain budget constraints of the Emergency Control Center. A heuristic algorithm was introduced in [21], which considered the disaster area as a transportation network and dynamically selected evacuation routes with a maximum flow rate to provide a high-velocity evacuee stream. A human-centric approach was followed in [22] by introducing a quality-of-service routing algorithm to accommodate the needs of different types of evacuees based on age, mobility, and level of resistance to hazards. In [23], the authors proposed a resource-allocation algorithm based on random neural networks to allocate first responders to those evacuees whose health conditions have deteriorated beyond a certain level in the course of an evacuation.

Discussion, Threats, and Solutions
Nevertheless, the aforementioned approaches are mainly centrally oriented evacuation-planning methods, meaning that all available information from the disaster area must be gathered at the Emergency Control Center for optimal planning, which either computes accurate optimal solutions at a high computational cost or executes heuristic algorithms for approximate but fast decisions. The outcomes of evacuation planning are then announced to (or even enforced on) the evacuees to guarantee their safety. However, in real-life evacuation planning, evacuees may not follow the proposed evacuation plan and may retain their distributed and autonomous decision-making. Our research work aims at filling this gap and focuses on the autonomous and distributed operation of the Public Safety System. This is achieved by allowing evacuees to make optimal decisions for themselves regarding the selection of the evacuation route in a dynamically changing disaster area by learning from their past actions and considering the reaction of the surrounding environment.

Evacuation Planning through Reinforcement Learning
Toward realizing distributed and autonomous evacuation planning, we consider evacuees as stochastic learning automata [6]. At each operation time slot t of the reinforcement-learning loop, each evacuee m has a set of actions a_m(t) = {1, ..., e, ..., |E|} in terms of selecting an evacuation route. Each action represents a different choice e of evacuation route. Toward making their decisions, evacuees consider the status of the disaster area, and specifically the output set β(t) = {I(t), S(t)}, where I(t) is the set of information characterizing the disaster area and S(t) = [S_1(t), ..., S_m(t), ..., S_|M|(t)] is the vector of final strategies that evacuees decide on while competing with each other at the evacuation route that they want to join. The latter action is realized through a non-co-operative minority game, analyzed in detail in Section 4. The set of information I(t) includes the evacuation rate λ_e(t) of each evacuation route e at time slot t, which implicitly represents the normalized speed with which evacuees can move within an evacuation route, the capacity C_e of each evacuation route, and the number of humans |M|_e(t) that already aim at evacuating through it. Output β(t) is determined by the second layer of the ESCAPE service, i.e., the minority game among evacuees.
Based on the chosen final action s(t) and the corresponding reaction of the disaster area (i.e., the updated λ_e(t) and |M|_e(t) values), we are able to determine the reward function r_{m,e}(t) for each evacuee m per evacuation route e, which is associated with the evacuee's action a_m(t) (Equation (1)). The reward function of each evacuee is normalized to reflect the reward probability r̂_{m,e}(t), 0 ≤ r̂_{m,e}(t) ≤ 1, of evacuee m per evacuation route e. The reward probability r̂_{m,e}(t) represents the level of satisfaction that evacuee m will have if they select to evacuate via evacuation route e at time slot t.
Given the reward probability r̂_{m,e}(t), each evacuee is considered to act as a stochastic learning automaton and determines their action probability vector Pr_m(t) = [Pr_{m,1}(t), ..., Pr_{m,e}(t), ..., Pr_{m,|E|}(t)], where Pr_{m,e}(t) expresses the probability that evacuee m wants to go to evacuation route e at time slot t. The action probabilities are updated following the action probability updating rule of stochastic learning automata [6] (Equations (2a) and (2b)), where b, 0 < b < 1, is a step-size parameter that controls the convergence time of the stochastic learning automata (i.e., evacuees). The impact of parameter b on the convergence time of the ESCAPE service is numerically studied in Section 6. Equation (2a) represents the probability that evacuee m will select in time slot (t + 1) the same evacuation route as in time slot t, i.e., e_m^(t+1) = e_m^(t), while Equation (2b) reflects the probability of evacuee m selecting a different evacuation route. It is noted that the proposed reinforcement-learning approach, which is based on the concept of stochastic learning automata, needs the set of information I(t). The latter becomes available to the ESCAPE service through sensors established at the evacuation routes, which send the information to the Emergency Control Center that, in turn, broadcasts it to the evacuees. At the initial stage of evacuation planning, evacuees have no prior knowledge of the reward probability r̂_{m,e}(t = 0), ∀m ∈ M, ∀e ∈ E, or of the action probability vector Pr_m(t) = [Pr_{m,1}(t), ..., Pr_{m,e}(t), ..., Pr_{m,|E|}(t)]. Thus, the initial selection of an evacuation route by evacuees is made with equal probability, i.e., Pr_{m,e}(t = 0) = 1/|E|. It is also highlighted that, based on the principle that each evacuee tries to improve their perceived satisfaction from the evacuation process, convergence toward the evacuation route that each evacuee wants to join is finally obtained. The algorithmic description of autonomous evacuation planning based on the reinforcement-learning approach of stochastic learning automata is provided in Section 5.
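The update step of a stochastic learning automaton can be illustrated with a minimal sketch. It assumes the standard linear reward-inaction form (the probability of the chosen route grows in proportion to the reward, and all other probabilities shrink by the same factor), which is taken here as an assumption about the shape of Equations (2a) and (2b). The function name and the example values are hypothetical.

```python
def update_action_probabilities(probs, chosen, reward, b):
    """Linear reward-inaction update (assumed form of Equations (2a)/(2b)).

    probs  : list of Pr_{m,e}(t) over all routes, summing to 1
    chosen : index of the route selected at time slot t
    reward : normalized reward probability in [0, 1] for the chosen route
    b      : step-size parameter, 0 < b < 1
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie in (0, 1)")
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    updated = []
    for e, p in enumerate(probs):
        if e == chosen:
            # Same route as before: move its probability toward 1.
            updated.append(p + b * reward * (1.0 - p))
        else:
            # Any other route: scale its probability down.
            updated.append(p - b * reward * p)
    return updated


# Uniform start over |E| = 4 routes, as in Pr_{m,e}(t = 0) = 1/|E|.
probs = [0.25, 0.25, 0.25, 0.25]
new_probs = update_action_probabilities(probs, chosen=2, reward=0.8, b=0.7)
```

Under this form the vector stays normalized after every step, so no explicit renormalization is needed.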

Evacuate or Not? A Minority Game Approach
Based on the reinforcement-learning component of evacuation planning presented in Section 3, evacuees select the evacuation route that they potentially want to follow in order to be guided to a safe place. Thus, the reinforcement-learning component implicitly acts as a clustering mechanism of evacuees per evacuation route. At this stage, evacuees act in a selfish and competitive manner in terms of finally being able to evacuate or not through the evacuation route that they initially selected. Evacuees that do not finally evacuate the disaster area at the current time slot, due to the created bottleneck and competition, re-enter the reinforcement-learning-based decision process in the next time slot. Evacuees' competitive behavior is captured through non-co-operative game theory and, specifically, a minority game-theoretic approach is introduced. The main concept of the proposed minority game is that a number of evacuees per evacuation route repeatedly compete in order to be in the minority group by selecting one of the two available strategies, i.e., evacuate or not. Each evacuee makes an autonomous decision regarding the strategy that they should follow based only on the history of the game's outcome, without knowing the strategies selected by the rest of the evacuees that joined the specific evacuation route through the reinforcement-learning component (Section 3). The evacuees that belong to the minority group win and promote their winning action for the next iteration of the game. It is shown in [24] that a minority game has a nonempty set of Pure Nash Equilibria.
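The clustering effect of the first layer can be sketched as follows: every evacuee samples one route from their action probability vector, and evacuees that drew the same route form that route's cluster. This is an illustrative sketch only; the function name, the seed, and the group of ten evacuees are assumptions.

```python
import random
from collections import defaultdict


def build_clusters(action_probs, rng):
    """Group evacuee indices by the route each one samples from Pr_m(t)."""
    clusters = defaultdict(list)
    for m, probs in enumerate(action_probs):
        # Draw one route index with probability proportional to Pr_{m,e}(t).
        route = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        clusters[route].append(m)
    return dict(clusters)


rng = random.Random(1)  # fixed seed, for reproducibility of the sketch only
num_routes = 4
action_probs = [[1.0 / num_routes] * num_routes for _ in range(10)]
clusters = build_clusters(action_probs, rng)
```

Each resulting cluster is then the player set of one minority game.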
Let G_e^t = [|M|_e(t), {S_m}_{m∈M}, {f_{S_m}(m)}_{m∈M}] denote the minority game per evacuation route e that is played at each time slot t of the reinforcement-learning component. Game G_e^t consists of the set of evacuees/players |M|_e(t) that have selected in time slot t that they want to evacuate through evacuation route e, their set of strategies S_m = {0, 1}, and the payoff function f_{S_m}(m). The strategy of each evacuee at each iteration ite of minority game G_e^t is denoted as s ∈ S_m. An evacuee can finally select not to evacuate through the desired evacuation route, i.e., s … where κ ≤ C_e − |M|_e(t), and C_e − |M|_e(t) is the threshold number of evacuees that can physically go through evacuation route e at time slot t. The distributed and low-complexity algorithm to determine the Pure Nash Equilibrium of minority game G_e^t is presented and discussed in detail in Section 5.

ESCAPE Algorithm
In this section, the ESCAPE algorithm (Algorithm 1) and the low-complexity Distributed Exponential Learning Algorithm for Minority Games (DEMG) (Algorithm 2) for human evacuation are presented. Specifically, at each timeslot of the stochastic learning automata, each human m ∈ M determines the evacuation route e_m^(t) that they want to join considering their action probabilities Pr_m(t). It is highlighted that this decision expresses the evacuation route where the evacuee desires to go, while whether they finally evacuate through that route is determined by the second layer of their decision, i.e., the DEMG algorithmic component. Therefore, after each human's m ∈ M choice e_m^(t), a cluster of humans N_e is constructed for each evacuation route e ∈ E, and a minority game is played in order to determine the set of humans who finally evacuate, i.e., N_e^(go), and the corresponding set of humans who do not evacuate, i.e., N_e^(ngo). Toward determining the Pure Nash Equilibrium (PNE) of each minority game, a distributed exponential learning algorithm for minority games (DEMG) is introduced, allowing humans to learn from their past actions and make the most beneficial strategy selection. In other words, the humans are considered sophisticated players, exploring potential strategies and converging to a PNE [9]. The exponential learning algorithm enables the humans of each cluster N_e to converge to one of the binom(|N_e|, C_e − |M|_e − 1) + binom(|N_e|, C_e − |M|_e + 1) PNE points [24]. Based on the exponential learning algorithm, each human m ∈ M starts with equal probabilities, i.e., pr_{m,s_m=1} = 0.5, and zero scores for the actions 'go' and 'do not go', i.e., π_{m,s_m=1} = 0, and at each iteration ite of the minority game determines their action s_m^(ite). In order for each human m ∈ M to evaluate payoff f_{S_m}(m) of their chosen action s_m^(ite), the winning action w^(ite) is announced by the ECC as a broadcast bit of information, i.e., w^(ite) ∈ {0, 1}. It should be clarified that the ECC does not make any centralized decision for the evacuation procedure and, accordingly, the decision-making process lies with the evacuees. Furthermore, in case communication between evacuees and the ECC is not feasible, other mechanisms can substitute for the ECC's role, such as clustering mechanisms, where the cluster head of each evacuation route communicates with the rest of the evacuees over device-to-device communication and announces the winning action. After the convergence of each minority game played at each evacuation route, evacuees N_e^(go), ∀e ∈ E, are excluded from the system, and humans that did not evacuate, N_e^(ngo), ∀e ∈ E, acting as stochastic learning automata, determine their reward functions r_{m,e}(t), ∀e ∈ E, and, based on Equations (2a) and (2b), update their action probabilities Pr_{m,e}(t + 1), ∀e ∈ E, for the next timeslot.
Furthermore, as shown in the numerical results (Section 6), both Ite|_t and the number of timeslots T scale very well with respect to the number of humans, i.e., |M|, and the number of evacuation routes, i.e., |E|; consequently, our approach is characterized by low complexity.
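One iteration of an exponential-learning scheme for a threshold minority game can be sketched as follows. This is not a reproduction of Algorithm 2. Several choices are assumptions made for illustration: action 1 is taken to mean 'go', 'go' is taken to win when the number of goers does not exceed the free space C_e − |M|_e, and each player adds or subtracts one score point on their own action depending on the broadcast winning bit. Probabilities are proportional to exp(γ · score); all function names and example values are hypothetical.

```python
import math
import random


def action_probabilities(scores, gamma):
    """Exponential map from accumulated scores to action probabilities."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(gamma * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def winning_action(num_go, free_slots):
    """Assumed rule: 'go' (1) wins when the goers fit in the free space."""
    return 1 if num_go <= free_slots else 0


def demg_round(scores, gamma, free_slots, rng):
    """Sample every player's action, obtain the broadcast bit, update scores."""
    actions = []
    for player_scores in scores:
        p_go = action_probabilities(player_scores, gamma)[1]
        actions.append(1 if rng.random() < p_go else 0)
    w = winning_action(sum(actions), free_slots)
    for player_scores, s in zip(scores, actions):
        # Assumed update: reinforce the own action if it won, weaken it otherwise.
        player_scores[s] += 1.0 if s == w else -1.0
    return actions, w


rng = random.Random(0)
scores = [[0.0, 0.0] for _ in range(20)]  # index 0: do not go, index 1: go
initial = action_probabilities(scores[0], gamma=0.8)  # equal start
for _ in range(200):
    actions, w = demg_round(scores, gamma=0.8, free_slots=8, rng=rng)
```

The only information a player needs from outside is the single bit w, which matches the broadcast role described for the ECC or a cluster head; the sketch makes no claim about convergence speed.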

Numerical Results
In this section, we present indicative numerical results that illustrate and explain the operation, features, and benefits of the proposed ESCAPE service. Initially, in Section 6.1, we focus on the evaluation of the operational characteristics of the proposed framework on its own. Furthermore, in Section 6.2, we provide a scalability and complexity analysis of the proposed framework, while in Section 6.3 we perform a comparative study of our proposed approach against alternative strategies in order to better reveal its benefits.
In our study, unless otherwise explicitly stated, we consider a set of 301 evacuees, i.e., |M| = 301, and four possible evacuation routes, i.e., |E| = 4, with randomly assigned respective capacities C_e ∈ [9, 15], C_e ∈ ℕ, ∀e ∈ E. Moreover, the number of evacuees who were already in a certain evacuation route e was initialized to |M|_e = 0, ∀e ∈ E, and evacuation rates were randomly initialized to λ_e ∈ [0.2, 0.9], ∀e ∈ E. The distance of each evacuee m from each evacuation route e was normalized within the range (0.0, 1.0] and randomly assigned as d_{m,e} ∈ (0.0, 1.0].
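The default scenario described above can be set up with a few lines of code. The sketch below assumes uniform sampling within the stated ranges (the text only says "randomly assigned"); the function name, the dictionary layout, and the seed are hypothetical.

```python
import random


def make_scenario(num_evacuees=301, num_routes=4, seed=0):
    """Draw one random instance of the default simulation setup."""
    rng = random.Random(seed)
    # C_e: integer capacity in [9, 15]
    capacity = [rng.randint(9, 15) for _ in range(num_routes)]
    # |M|_e: nobody is on a route at the start
    occupancy = [0] * num_routes
    # lambda_e: evacuation rate in [0.2, 0.9]
    rate = [rng.uniform(0.2, 0.9) for _ in range(num_routes)]
    # d_{m,e} in (0.0, 1.0]: random() lies in [0, 1), so 1 - random() lies in (0, 1]
    distance = [[1.0 - rng.random() for _ in range(num_routes)]
                for _ in range(num_evacuees)]
    return {"capacity": capacity, "occupancy": occupancy,
            "rate": rate, "distance": distance}


scenario = make_scenario()
```

Passing a different seed yields another random instance with the same ranges.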

Pure Operation of the Framework
Figure 3a presents the evolution of probabilities pr_{m,s_m}, s_m ∈ S_m = {0, 1}, i.e., the probabilities to evacuate and not to evacuate as determined by the minority game (DEMG algorithm), for two indicative evacuees who want to go to the same evacuation route e, as a function of the minority-game iterations required to converge to the PNE point. All probabilities start from a common point (i.e., assumed to be equal to 0.5), but afterward the minority game converges very quickly in terms of iterations (lower horizontal axis) as well as in terms of real execution time (upper horizontal axis), guiding the first evacuee (User1) to evacuate and the second evacuee (User2) not to evacuate. The corresponding probabilities for the aforementioned decisions for both evacuees are equal to 1, while the complementary probabilities are equal to 0, as the non-co-operative minority game converges. Figure 3b illustrates the number of iterations that the minority-game algorithmic component (DEMG algorithm) requires in order to converge as a function of the exponential learning step γ. It is clearly observed that, for larger values of γ, the time it takes the non-co-operative game to converge decreases, while for smaller values it increases. This behavior is observed because, for large values of γ, each evacuee m deterministically selects the strategy s_m (evacuate or not evacuate) with the highest accumulated score, while for small values each evacuee explores the available alternatives more. Therefore, in the rest of our analysis, we consider a relatively small value of exponential learning parameter γ, i.e., γ = 0.8, to allow evacuees to better investigate the available choices.
In Figure 3c, we consider two distinct evacuees (User3 and User10) from the set of evacuees and evaluate their action probability vectors (updated according to Equations (2a) and (2b)) as a function of the iterations of the reinforcement-learning component. Specifically, we observe that, for each evacuee, the action probability of only one evacuation route increases, eventually reaching 1.0, while the probabilities of all remaining evacuation routes decrease until they reach 0.0. Such observations reconfirm that the reinforcement-learning component successfully converges to a feasible solution. In the examined scenario, we notice that User3 wants to go to Evacuation Route 1, while User10 wants to go to Evacuation Route 4. Figure 3d illustrates the reinforcement-learning component iterations necessary to converge (blue line) and the corresponding average reward function with respect to the number of users (red line) as a function of step-size learning parameter b of the reinforcement-learning component. It is clear that, for large values of b, both the convergence time and the average reward function decrease. This occurs because small b values enable evacuees to better explore the disaster area's conditions and eventually make better choices regarding which evacuation route they want to follow, for instance, by identifying less congested evacuation routes as better options. Thus, in the latter case, more time is needed for the reinforcement-learning component to converge, but evacuees experience a superior reward. Considering the tradeoff between evacuees' perceived reward and the corresponding time that the ESCAPE algorithm needs to converge, in the following, unless otherwise stated, we consider the stochastic-learning component's step-size parameter b = 0.7, so that evacuees enjoy a relatively superior reward and, at the same time, make fast decisions. Figure 3e presents the cluster size of humans per evacuation route as a function of the iterations (lower horizontal axis) and time (upper horizontal axis) that the reinforcement-learning component needs to converge. Specifically, we observe how evacuees are distributed to evacuation routes during the reinforcement-learning iterations (until convergence) and which physical characteristics of the routes define this behavior. In particular, we observe that most evacuees choose evacuation routes with the greatest capacity C_e and, among routes with the greatest capacity, they choose the one with the greatest average evacuation rate (AER) λ_e.

Scalability and Complexity Evaluation
In this section, a detailed scalability analysis and complexity evaluation are provided in terms of the framework's performance for an increasing number of evacuees and available evacuation routes. ESCAPE was tested and evaluated on a laptop with an Intel(R) Core(TM) 2 Duo CPU T7500 @ 2.20 GHz and 2.00 GB RAM. For demonstration purposes and without loss of generality, we considered that evacuees have mobile devices of similar computing capabilities, e.g., a commercial smartphone device. Specifically, Figure 4a presents the iterations that the reinforcement-learning component needs to converge to the PNE, the corresponding real execution time, and the average reward probability (discussed in Section 3), as the number of evacuees increases. We observe that the proposed ESCAPE framework scales quite well with respect to the increasing number of evacuees, presenting almost linear behavior. Moreover, the average reward function decreases as the number of evacuees increases because the evacuation routes become more congested and evacuees eventually tend to choose routes where the number of evacuees that are already in, i.e., |M|_e, is high. Thus, based on Equation (1), the average reward function decreases. In Figure 4b, we present the behavior of the proposed framework in terms of the iterations that the reinforcement-learning algorithmic component requires to converge to the PNE and of the average reward function as the number of available evacuation routes increases. We can clearly discern that, as the number of evacuation routes increases, the number of ESCAPE iterations and the real execution time decrease, because more routes are available; as a result, evacuees have the potential of exploring the disaster area more and eventually choosing evacuation routes that are less congested. Consequently, the reinforcement-learning component needs a smaller number of iterations to reach the PNE. For exactly the same reason, the average reward function increases as the number of available evacuation routes increases.

Comparative Evaluation
In this subsection, we focus on comparing the performance of our proposed framework with alternative implementation strategies. Our study was performed over a large set of evacuees, i.e., |M| = 601, and executed in three stages, as follows.

First Scenario: Why Minority Games?
In this scenario, we compare our proposed ESCAPE framework, and specifically the minority-game component, with two different approaches regarding the way that evacuees in the disaster area choose whether or not to evacuate: (a) In the first alternative method, evacuee m chooses whether or not they will evacuate through desired evacuation route e by taking into consideration how far they are from e, i.e., based on distance d_{m,e}. The greater d_{m,e} is, the lower the probability for the evacuee to evacuate is; conversely, the lower d_{m,e} is, the greater their probability to evacuate is. (b) In the second alternative method, the decision depends on the occupancy of the route, meaning that the more evacuees are already in the evacuation route, the greater the probability of not evacuating is, as the evacuation route is congested. If C_e ≤ |M|_e(t), then we consider that Pr_NotEvacuate = 1.0 and Pr_Evacuate = 0.0. In the following, we refer to this strategy as "Capacity".
In particular, Figure 5a shows the average number of evacuees that are already in each evacuation route e, i.e., |M|_e, in combination with the corresponding capacities (Cap). Our approach demonstrates the best performance among all alternatives, as it properly and dynamically distributes evacuees between the available evacuation routes. This is confirmed by the fact that our proposed approach presents the lowest average number of evacuees in all routes, with these numbers remaining below the corresponding capacities for all routes. This means that, with the ESCAPE service, evacuation routes never become too congested, thus leading evacuees away from the disaster area quickly and safely. On the other hand, the two other approaches gather a greater average number of evacuees per evacuation route, which is above the capacity for all routes, making them very congested and unsafe because of potential evacuation delays. Between the two other approaches, the one with the capacity threshold is the worst because the algorithm waits for the number of evacuees already in the routes to drop below the capacity limits, and when this happens many evacuees choose them again in order to evacuate. As a result, the average number of evacuees remains high (and above the capacity threshold). The approach with the distance weights, even though it underperforms compared to our proposed approach, still outperforms the approach with the capacity criterion. This is because the former accomplishes a wider distribution of evacuees over evacuation routes than the latter, as the criterion to choose an evacuation route is distance d_{m,e}, ∀m ∈ M, ∀e ∈ E. Figure 5b agrees with the aforementioned observations, as the total average reward function of our approach is the greatest among the three approaches.
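The two baseline rules can be sketched as follows. The linear forms used here (probability of evacuating 1 − d_{m,e}, and probability of not evacuating |M|_e / C_e) are assumptions chosen only to match the stated monotonic behavior; only the saturation case C_e ≤ |M|_e(t) is taken directly from the description. Function names are hypothetical.

```python
def distance_strategy(d_me):
    """Return (Pr_Evacuate, Pr_NotEvacuate) for a normalized distance in (0, 1].

    Assumed linear form: the farther the route, the less likely the evacuee goes.
    """
    pr_evacuate = 1.0 - d_me
    return pr_evacuate, 1.0 - pr_evacuate


def capacity_strategy(occupancy, capacity):
    """Return (Pr_Evacuate, Pr_NotEvacuate) for the "Capacity" baseline."""
    if capacity <= occupancy:
        # Route is full: the evacuee never goes (stated in the text).
        return 0.0, 1.0
    # Assumed linear form: fuller routes are avoided more often.
    pr_not_evacuate = occupancy / capacity
    return 1.0 - pr_not_evacuate, pr_not_evacuate


full_route = capacity_strategy(occupancy=15, capacity=12)
quiet_route = capacity_strategy(occupancy=3, capacity=12)
near_route = distance_strategy(0.25)
```

Neither rule uses the history of past outcomes, which is the main difference from the learning-based minority-game component.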

Second Scenario: Why Reinforcement Learning?
In this scenario, we compare our proposed reinforcement-learning component with four other approaches by modifying the reward function described in Section 3, Equation (1), as follows:
(a) The first alternative approach defines the reward function as $r_{m,e}(t) = \frac{1}{d_{m,e}}$. The greater the distance $d_{m,e}$ of evacuee $m$ from evacuation route $e$ is, the lower the corresponding reward is (we refer to this strategy as "Distance Reward").
(b) The second alternative approach defines the reward function as $r_{m,e}(t) = C_e$. The greater the capacity $C_e$ of selected evacuation route $e$ is, the greater the reward is, as the route fits more evacuees (we refer to this strategy as "Capacity Reward").
(c) The third alternative approach defines the reward function as $r_{m,e}(t) = \lambda_e$. The greater the evacuation rate $\lambda_e$ is, the greater the reward is (we refer to this strategy as "Evac_Rate Reward").
(d) The last approach defines the reward function as $r_{m,e}(t) = \lambda_e \cdot |AvgDistEvac - d_{m,e}|$, where $AvgDistEvac$ is the average distance of all other evacuees from the evacuation route $e$ that evacuee $m$ selected (we refer to this strategy as "Avg_Dist Reward").
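The four alternative reward functions listed above can be written down directly. The sketch below is illustrative only: the function and parameter names are hypothetical, distances are assumed to be strictly positive, and the list of other evacuees' distances is assumed to be non-empty.

```python
from statistics import mean
from typing import Sequence


def distance_reward(d_me: float) -> float:
    """"Distance Reward": r = 1 / d_{m,e} (assumes d_me > 0)."""
    if d_me <= 0:
        raise ValueError("distance must be strictly positive")
    return 1.0 / d_me


def capacity_reward(c_e: float) -> float:
    """"Capacity Reward": r = C_e, identical for every evacuee."""
    return float(c_e)


def evac_rate_reward(lambda_e: float) -> float:
    """"Evac_Rate Reward": r = lambda_e, the route's evacuation rate."""
    return float(lambda_e)


def avg_dist_reward(lambda_e: float, d_me: float,
                    other_distances: Sequence[float]) -> float:
    """"Avg_Dist Reward": r = lambda_e * |AvgDistEvac - d_{m,e}|.

    other_distances holds the distances of all other evacuees from the
    route that evacuee m selected.
    """
    avg_dist_evac = mean(other_distances)
    return lambda_e * abs(avg_dist_evac - d_me)
```

For example, `avg_dist_reward(2.0, 3.0, [5.0, 7.0])` averages the other distances to 6.0 and returns 2.0 * |6.0 - 3.0| = 6.0.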
Figure 6 presents the average reward function of all aforementioned approaches. Our approach presents the best performance (highest average reward function) because it examines and properly combines several features of the other approaches, including capacity $C_e$, evacuation rate $\lambda_e$, and distances of evacuees from evacuation routes $d_{m,e}$, $\forall m \in M$, $\forall e \in E$. Moreover, the "Capacity Reward" approach had the worst performance, as it took into consideration only capacity $C_e$, which is a static feature common to all evacuees. Among the remaining methods, the one with the evacuation rate as a reward function ("Evac_Rate Reward") is the best because it is dynamic: it is constantly updated while the algorithm runs, implicitly indicating to evacuees how congested the evacuation routes are. The approaches that consider evacuee distances from the evacuation routes (i.e., "Distance Reward" and "Avg_Dist Reward") achieved a better distribution of evacuees across the evacuation routes than the "Capacity Reward" alternative, given that distances are different for each evacuee.
Finally, in order to demonstrate and quantify the benefits of the introduced reinforcement-learning component, we compared our proposed approach with two other approaches that do not implement any learning process; instead, evacuees immediately choose the evacuation route that they want to join, following a specific metric each time. In particular, the alternative approaches considered in this comparison are described as follows:

(a) In the first alternative approach, evacuee $m$ chooses the evacuation route $e$ with the lowest respective distance $d_{m,e}$.
(b) In the second alternative approach, evacuee $m$ chooses the evacuation route $e$ with the greatest evacuation rate $\lambda_e$.
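Both non-learning baselines reduce to a single greedy choice per evacuee. A minimal sketch, assuming hypothetical dictionaries that map route identifiers to the evacuee's distances $d_{m,e}$ and to the routes' evacuation rates $\lambda_e$:

```python
from typing import Mapping


def lowest_distance_route(distances: Mapping[str, float]) -> str:
    """Baseline (a): pick the route with the smallest distance d_{m,e}."""
    return min(distances, key=distances.get)


def greatest_rate_route(rates: Mapping[str, float]) -> str:
    """Baseline (b): pick the route with the largest evacuation rate lambda_e."""
    return max(rates, key=rates.get)


# Hypothetical inputs for one evacuee and three routes.
distances = {"e1": 40.0, "e2": 15.0, "e3": 60.0}
rates = {"e1": 0.8, "e2": 0.3, "e3": 1.2}

chosen_by_distance = lowest_distance_route(distances)
chosen_by_rate = greatest_rate_route(rates)
```

Because every evacuee who applies baseline (b) sees the same $\lambda_e$ values at a given time, all of them pick the same route, whereas baseline (a) spreads evacuees according to their individual positions.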
Figure 7 presents the average reward function of all three approaches (including the proposed ESCAPE framework), and it is clearly observed that our approach significantly outperformed the two alternatives. This observation was expected because, through the reinforcement-learning component, evacuees acting as stochastic learning automata are able to thoroughly examine the disaster area in terms of how congested the evacuation routes are. As a result, evacuees are distributed across the evacuation routes much more efficiently than in the other two approaches, where evacuees make their choice immediately based on simple metrics. As far as the two alternative methods are concerned, the one where evacuee $m$ chooses the evacuation route $e$ with the greatest evacuation rate $\lambda_e$ performed worse than the one where the evacuee chooses the route with the lowest distance $d_{m,e}$.


Conclusions
In this paper, a distributed and autonomous evacuation process based on the key principles of reinforcement learning and game theory was proposed in order to support distributed and efficient evacuation planning in Public Safety Systems. Specifically, two decision-making layers were identified and implemented for evacuees toward selecting the most appropriate and effective evacuation route among the available ones. At the first layer, a reinforcement-learning technique was introduced where evacuees act as stochastic learning automata and select the evacuation route to which they want to go. Consequently, based on evacuees' decisions, a cluster of evacuees is created per evacuation route, where evacuees compete with each other to determine their final evacuation path. Evacuees' competitive behavior is captured via a non-co-operative minority game among evacuees per evacuation route. The existence and uniqueness of a Pure Nash Equilibrium of the minority game is shown, where evacuees decide whether they will finally evacuate the disaster area via the initially selected evacuation route. The overall framework is captured in a distributed and low-complexity algorithm, namely, ESCAPE.
The operation and performance of our proposed framework were extensively evaluated through modeling and simulation, and the presented numerical results demonstrated its superior performance in an evacuation process. Our current and future work includes the consideration of evacuees' mobility during the evacuation process by capturing evacuee speed along a chosen escape route, as well as the formulation of the examined problem under the principles of the Tragedy of the Commons, where evacuation routes act as a common pool of resources that can fail to serve evacuees due to overexploitation. Moreover, we plan to consider evacuees' speed toward reaching and going through an evacuation route as a factor in updating the probabilities of evacuating or not through the most convenient evacuation route.

Figure 1. Topology of the disaster area and system model.

Figure 2. Autonomous evacuation-planning framework as a learning system.
[...] $s_m^{(ite)} = 0$, or to evacuate, i.e., $s_m^{(ite)} = 1$. For each strategy $s_m^{(ite)} \in S_m$, there is a function $f_{s_m}: \{1, \ldots, m, \ldots, |M|\} \to \mathbb{R}$, where, for each $m \in M$, the number $f_{s_m}(m) \in \mathbb{R}$ expresses the payoff that evacuee $m$ receives. Payoff function $f_{s_m}(m)$ is formulated as follows.
[...] based on their action's payoff $f_{s_m}(m)$, as given in Equation (3), human $m$ updates their chosen action's score $\pi^{(ite)}$ [...]

Algorithm 2. Distributed Exponential Learning Algorithm for Minority Games (DEMG)
Input:
  ⇒ Set of humans who selected the route: $N_e$
  ⇒ Capacity of the route: $C_e$
  ⇒ Set of humans inside the route: $|M|_e$
Output:
  ⇒ Set of humans that go and do not go to the route: $N_e^{go}$
  ⇒ Each human $m \in N_e$ has equal probability to go or not go: $pr_{m,s_m}^{(0)} = 0.5$, $\forall m \in N_e$, $\forall s_m \in S_m$
  ⇒ $ite = 0$, number of iterations for minority-game convergence
  ⇒ $\pi_{m,s_m}^{(0)} = 0$, $\forall s_m \in S_m$, the score of each action
  ⇒ $Convergence = 0$
Iterative Procedure:
while $Convergence == 0$ do
  [...]
  [...] $C_e - |M|_e$) then
    $w^{(ite)} = 1$; go is the winning action
  else
    $w^{(ite)} = 0$; do not go is the winning action
  [...] then
  [...]
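Only part of the DEMG listing survives here, so the following is a hedged sketch of an exponential-weights minority game built on the surviving pieces (equal initial probabilities of 0.5, zero initial scores, and a winning action determined by the free space $C_e - |M|_e$), not a reconstruction of the authors' algorithm. Several parts are assumptions made for illustration: "go" is taken to win when the number of humans choosing it does not exceed the free space, the payoff is 1 for choosing the winning action and 0 otherwise, the probabilities are a softmax over the scores with a hypothetical learning rate `gamma`, and a fixed iteration budget replaces the paper's convergence test.

```python
import math
import random
from typing import List


def winning_action(num_go: int, capacity: int, occupancy: int) -> int:
    """Return 1 if "go" wins and 0 if "do not go" wins.

    Assumption: "go" wins when the humans choosing it fit into the
    free space C_e - |M|_e of the route.
    """
    return 1 if num_go <= capacity - occupancy else 0


def demg_sketch(n_humans: int, capacity: int, occupancy: int,
                iterations: int = 200, gamma: float = 0.1,
                seed: int = 0) -> List[float]:
    """Run a fixed number of minority-game rounds; return each Pr(go)."""
    rng = random.Random(seed)
    # scores[m][s]: score of action s for human m (0 = do not go, 1 = go).
    scores = [[0.0, 0.0] for _ in range(n_humans)]
    probs_go = [0.5] * n_humans  # equal initial probabilities

    for _ in range(iterations):
        # Every human draws an action from their current probability.
        actions = [1 if rng.random() < p else 0 for p in probs_go]
        w = winning_action(sum(actions), capacity, occupancy)
        for m, action in enumerate(actions):
            # Assumed payoff: 1 for the winning action, 0 otherwise.
            if action == w:
                scores[m][action] += 1.0
            # Exponential weights over two actions reduce to a logistic
            # function of the score difference; clamp to avoid overflow.
            diff = min(gamma * (scores[m][0] - scores[m][1]), 700.0)
            probs_go[m] = 1.0 / (1.0 + math.exp(diff))
    return probs_go
```

The humans whose final probability of "go" exceeds 0.5 could then be read as the set $N_e^{go}$; that thresholding step is likewise an illustrative choice rather than something stated in the text.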

Figure 7. Comparative analysis (Scenario 3): average reward function of all alternative approaches.
Regarding ESCAPE's complexity, it can be assumed that in each timeslot $t$ of the stochastic learning automata, the minority games for the evacuation routes (ESCAPE algorithm's lines 12 to 16) are played in parallel, since the minority game for each evacuation route $e \in E$ is fully independent, and the only necessary information is the size of the corresponding cluster $N_e$ and the available space inside route $e$, i.e., $C_e - |M|_e$. Therefore, by denoting as $Ite|_t$ the number of iterations needed for the convergence of the minority game that finishes last in timeslot $t$, and since the complexity of the DEMG algorithm is $O(|M|)$, the complexity of all minority games is $O(Ite|_t \cdot |M|)$. Moreover, the rest of the ESCAPE algorithm has complexity $O(|M| + |M| \cdot |E|)$, i.e., $O(|M| \cdot |E|)$, and by denoting as $T$ the number of timeslots needed for the reinforcement-learning component's convergence, the ESCAPE algorithm's complexity is $O(T \cdot (Ite|_t \cdot |M| + |M| \cdot |E|))$.
ESCAPE algorithm (fragment): [...] 22: for $e = 1$ to $|E|$ do [...] if ($\forall m \in M$, $\exists e \in E$: $|Pr_{m,e}(t+1) - 1| \le \varepsilon$, $\varepsilon \to 0$) then [...]
[Figure: action probability vs. iterations]
[...] That is, $Pr_{NotEvacuate} = d_{m,e}$ and $Pr_{Evacuate} = 1 - d_{m,e}$. In the following, we refer to this strategy as "Distance". (b) In the second alternative method, evacuee $m$ chooses whether to evacuate through desired evacuation route $e$ by taking into consideration the threshold number of evacuees that can physically go through evacuation route $e$ at time slot $t$, i.e., $C_e - |M|_e(t)$, as discussed in Section 4. In particular, if $C_e > |M|_e(t)$, then $Pr_{NotEvacuate} = \frac{1}{C_e - |M|_e(t)}$ [...]
Comparative analysis (Scenario 2): average reward function of all alternative strategies.