Multi-Agent Collaborative Path Planning Based on Staying Alive Policy †

: Modern mobile robots tend to be used in numerous exploration and search and rescue applications. Essentially they are coordinated by human operators and collaborate with inspection or rescue teams. Over the time, robots became more advanced and capable for various autonomous collaborative scenarios. Recent advances in the ﬁeld of collaborative exploration and coverage provide different approaches to solve this objective. Thus scope of this article is to present a novel collaborative approach for multi-agent coordination in exploration and coverage of unknown complex indoor environments. Fundamentally, the task of collaborative exploration can be divided into two core components. The principal one is a sensor based exploration scheme that aims to guarantee complete area exploration and coverage. The second core component proposed is a staying alive policy that takes under consideration the battery charge level limitation of the agents. From this perspective the path planner assigns feasible tasks to each of the agents, including the capability of providing reachable, collision free paths. The overall efﬁcacy of the proposed approach was extensively evaluated by multiple simulation results in a complex unknown environments.


Introduction
For the last years, the number of applications where mobile robots collaborate with humans [1] dramatically increased. Mainly, this is due to advances in batteries, powerful computation boards, and miniaturization of sensors. Robust and sophisticated mobile robots became a vital tool in human life, such as smart wheelchairs for disabled people [2], in autism therapy [3], and for tour-guiding [4]. Moreover, they are used in disaster search and rescue missions [5], utility inspection [6], crop monitoring [7], etc. Depending on mission requirements, mobile robots could be equipped with different types of sensors or actuators that depend on the mission objectives. Their assistance in inspection, search, and rescue missions, and the exploration of unknown areas plays crucial role and provides benefits to human operator safety, decision making, fast exploration, three-dimensional (3D) reconstruction of the environment, and providing localization [5]. In these contexts, mobile robots could be equipped with depth cameras or laser range finders that are used for navigation and situational awareness [8,9].
Most of previous works study the deployment of single robot scenarios [10][11][12]; however, in the majority of the real worlds size problems, such as exploration and coverage of large areas [13], there is a need of multiple robots for guaranteeing successful mission [14]. This arises multiple challenges scheme is established in Section 3. The simulation results are presented in Section 4, and finally the article is concluded in Section 5.

Notation and Preliminaries
In the proposed method, each agent covers a squared area with a l ∈ R >0 length, where it is assumed that the agent is placed in the middle of it and as the result of the exploration task a grid map is obtained with the same resolution. Furthermore, and without a loss of generality, it is assumed that the movement of the agent is defined with four directions, specifically North, South, East, and West, and rotation around its geometric centre. In each step, the agents can move in a fixed distance l and rotate by α = {± π 2 , ±π}. The pose of the agent i ∈ N can be defined by the q i,k = [x i,k , y i,k , θ i,k ] T vector, where x, y ∈ R 2 are the coordinates and θ is the heading at the time instant k ∈ N, with the corresponding kinematic equations being denoted by: The workspace W and its coverage model M are in R 2 , while the sets C within the workspace are denoted as R 2×m and C ⊂ W and C f i,k , C o i,k , and C global k denote the free space, obstacles, and global backtracking points sets, respectively. The transpose of a matrix A ∈ R a×b is denoted as A T ∈ R b×a . The superscripts chp , BTP and min , denote a charging point (station), a backtracking point and a minimum value correspondingly. The energy E ∈ R ≥0 is denoted as a scalar value, while E i,k is the energy level of the agent i and, at the k time instant, E tr i,k is the energy spent for the translation from q i,k to q i,k+1 . Furthermore, a lost agent is defined as q i,k = ∅, a blocked position as q i,k = ∞, and an unblocked position as q i,k = ∞. The path from q i,k to q i,k+1 is defined as where p i,k is the position of the agent and P i,k is set of all the agent positions. Furthermore, d i,k denotes the Euler distance between the position q i,k and q i,k+1 . With n ∈ N will be denoted the number of agents. Finally, s j , j ∈ {1, 2, ..., 8} is the sensor model and operator \ denotes set subtraction.

Cooperative Exploration and Coverage Scheme
In the proposed methodology, it is assumed that the environment has only one entrance, which is considered to be the starting point. It is also assumed that the discovered branches/areas have at least the same width as the agent's sweep step. At the starting point, all of the agents are looking towards the unknown area and in each iteration the agent completely covers the next cell. The overall proposed exploration scheme contains three components: (a) an exploration method, (b) a staying alive policy, and (c) a path planner, while these parts will be analysed in the sequel.

Exploration Method
In the proposed methodology, the unknown areas of the workspace W are incrementally discovered through the Boustrophedon motion [28], so that the agent translates forward or backward, exploring the surrounding space, until it reaches a blocked position and rotates to alter the direction of the translation. During exploration, each agent at each step performs sensing of the surrounding environment, denoted by s i,k , according to the sensor model that is defined as: where j ∈ [1, 2, . . . , 8]-corresponds to each adjacent area around the agent. After the completion of the overall unknown area, it will be represented as a grid map (global map, coverage model) M [36] with discovered free space C f k and obstacles C o k , with the overall architecture being depicted in Figure 1. Relying on the s i,k , the agents perform the BM according to the algorithm presented in Algorithm 1 and detect the corresponding backtracking points (BTPs) that generally can be any agent's neighbouring point that satisfies the following statement C BTP , while it should be mentioned that these points are utilized for the detection of the adjacent unknown regions if existing.
Without a proper selection mechanism for the BTPs, the total agent's path can have a lot of unnecessary transitions that will increase the overall distance and exploration time. This article proposes the novel idea in order to reduce the number of BTPs unlike, as appeared in [18,36], where a robot detects BTPs from the accumulated knowledge (covered path) only when it reaches blocked condition and uses all eight neighbouring cells of a tile to detect BTP. In the proposed approach, sensor measurements are used to detect BTPs at each step, but novel detection conditions are proposed, as depicted in Figure 2, where the black tiles denote obstacles, the red tiles correspond to BTPs, and the grey tiles are free areas that are not considered for detection, while the green arrows indicate the agent's motion direction, when the BTP can be detected. The agent is placed in the middle (white tile) and the neighbouring cells are sensor measurements, unlike the approach proposed in [18,36], where the backtracking point is in the middle. For making the overall approach more clear, let us consider left condition in the middle row of Figure 2, which uses three out of the eight measured points to calculate the BTP. In this case, the green arrows indicate that the BTP can be detected, while moving to the North or to the South. After obtaining the sensor measurements s j , it results that s 7 ∈ C o i,k and {s j \ s 7 } ∈ C f i,k . To obtain the motion direction, the s 2 should be considered and, of the forward or backward direction is not blocked, then s 8 ∈ C BTP i,k . Depending on the complexity of the unknown area, the number of backtracking points could be proportional to the number of agent steps. Therefore, in the proposed work, we consider that, while the agent performs BM, it discovers and covers BTPs, so the already visited BTPs are removed from the agent's BTP stack, and this operation can be defined as: At the same time, each agent locally operates based on his own BTP stack C BTP i,k and it globally uses data from the global BTP stack shared among the agents as C global k if C BTP i,k = ∅, and it relies on its sensor measurements s i,k to provide non-overlapping coverage paths. In case there are more agents than BTPs, a free agent is waiting for the first available BTP in the global stack. The global map of discovered area M (coverage model) is updated in each iteration from all agents, unless the agent makes a turn or holds its position. The full coverage of the unknown area is achieved when the C global k = ∅ and presented as map model M.
In the proposed approach, it is considered that, while performing BM, the agents never revisit discovered areas unless they reach a blocked position q i,k = ∞. Such an example is depicted on the Figure 3, where the different blocked conditions are shown and there is a need for generating the path from the actual agent's position to the next available BTP. . The blocked conditions for the agent with forward motion. Green arrow depicts motion direction, white tiles denote agent's discovered path, grey tiles denote free space. The left image describes the case when the agent is blocked by obstacles (black tiles), the middle image depicts the case when the agent is blocked with obstacles on the left and the neighbouring areas are covered by other agents. The image on the right depicts the case when all neighbouring regions were discovered by other agents.

Staying Alive Policy
In the proposed scheme, it is considered that all of the agents are battery powered and have limited operation time [37]. Mission planning without considering this limitation may result in the loss of agents [38] and even result in a mission failure. Thus, in this work, the policy for staying alive for all agents is introduced.
Agent's transition energy is defined as E tr i,k and it is evaluated at each BM step (iteration), as: where e i corresponds to the agent's minimum energy spent per one iteration. The energy consumption for each agent can be defined as: where E i,k ∈ [0, 100] corresponds to the actual battery energy level, where a full charge denotes a value of 100% and the critical charge denotes value that is below 20%. The overall algorithm for staying alive policy (SAP) is shown in Algorithm 2, where the required energies E chp i,k and E BTP i,k are calculated using Equation (4).

Algorithm 2 Staying alive policy algorithm
Moreover, in case the agent requires charging q i,k+1 = q chp i , the path planner that is presented in the Section 3.3 will generate a collision free path to BTP or to the charging station.

Path Planner
In general, the agents perform an exploration task that is based on BM. However, when the battery level of an agent is low, which is denoted from E i,k < E min i or the agent is blocked q i,k = ∞, the agent should navigate to the charging station q i,k+1 = q chp i , or the closest BTP q i,k+1 = C BTP i,k , respectively. Thus, a Probabilistic Roadmap (PRM) planner [39] is implemented in order to generate collision free paths P i,k from q i,k to q i,k+1 based on the discovered map M.
The PRM is based on a non-directed graph that contains all possible trajectories that are derived from the occupancy grid. Generally, the PRM path planner has two phases: learning and query. During first phase it constructs a roadmap of a free space as an non directed graph, while in the second phase the method tries to connect the q i,k and the q i,k+1 positions by a corresponding path. Figure 4 shows the generated non-direct graph by blue lines and corresponding path in the orange. Furthermore, the path planner algorithm is presented in Algorithm 3, where "==" denotes equality. It requires the current position of the agent q i,k , the position of charging station q chp i , and the set of BTPs C BTP i,k . Subsequently, in the case of q i,k = ∞ all paths to BTPs are calculated (Figure 5a),while the BTP with a minimum length of the path is selected for the agent to visit. However, if the agent's battery level is not sufficient, for reaching to q i,k+1 based on Algorithm 2, the agent should navigate to the charging station ( Figure 5b). In this case q i,k+1 = q chp i , only one path from q i,k to the charging station is obtained.  . After a while, the agent reaches its blocked position q i,k = ∞, which means that this BM is finished. From this point, there are four possibilities to continue exploration. The PRM planner will calculate paths and their distances to each BTP and will send an agent to the nearest one to continue exploration. On the right image (corresponds to the second case), while conducting BM agents' charge level became low (signalized from the staying alive policy), so the agent has to visit charging point q i,k = q chp i . The path is generated by a PRM planner. After recharging, PRM planner will provide the shortest path to the agent to continue exploration from three available possibilities (C BTP  Overall, the final cooperative exploration and coverage scheme is summarized in Algorithm 4 and Figure 6 represents block diagram of the suggested approach, where PP and SAP stand for path planner and staying alive policy, respectively. if q i,k ∈ C f k then 8: if C global k = ∅ then 9: End of mission. 10: end if 11: for i ∈ n do 12: if C BTP i,k = ∅ then 13:

Algorithm 3 Path planner
end if 15: Find path according to Algorithm 3 16: end for 17: else if q i,k = ∞ then 18: if C global k = ∅ then 19: End of mission. 20: end if 23: Find path according to Algorithm 3 24: end if 25: else if q i,k == ∅ then 26: Agent is lost. 27: Launch new agent. 28: else if q i,k == q chp i,k then 29: Find path according to Algorithm 3 30: Go to Step 4.

Simulation Results
The proposed methodology is evaluated in simulation studies, where all of the algorithms have been implemented in MATLAB R2018a, (in a single thread) on a platform with Intel Core i5-8250U and 16 GB of RAM. Furthermore, the charging time for each agent is not considered. As will be presented further, in all of the scenarios, the suggested cooperative exploration and coverage algorithm, succeeded in providing a complete area coverage.
In all the simulations, we assume that all agents are modelled by a square with side l = 1 m and have no prior knowledge about the workspace. The workspaces are randomly generated binary images, where pixels marked zero belong to free space and all the remained pixels are marked with one belong to occupied space. All regions within workspace are accessible by agents and are discovered by them in an incremental manner increasing their knowledge of the environment. All of the discovered pixels are added to the grid map with the cell size equal agent's side l. All of the agents always detect obstacles and have communication with the base station.
At first, the multi-agent algorithm was evaluated with teams of one to 11 agents in simulations for three environments with different size and complexity, which were chosen to evaluate the complete coverage of the proposed method in relation to the size of the area, scenarios with obstacles inside the exploration area, and complexity of the environments. In all scenarios, all of the agents were deployed from the same starting location (blue square at the bottom of the figures), which also stands as a charging point, while the discovered obstacles are shown in black. The red squares with white faces correspond to backtracking points. When they are covered with an agent, the face colour changes to the corresponding agent's colour. Moreover, the BTPs in the middle of the exploration area (points that are not adjacent to the obstacles) depict that the agent went to the charging point according to the staying alive policy.
The area in the first scenario has a size of 2068 m 2 and it contains multiple branches. Figure 7 depicts the evolution of the explored area for 25%, 50%, 75%, and 100% of the time from left to right with a team of 5 agents. It can be observed from changing the BTP face colours that each agent uses its own BTP stack until it is empty, and then uses the BTP from the global BTP stack. As an example the agent 4 in 25% of the simulation execution time utilized its own local BTP (red squares); however, in 50% of the simulation time and after completing the local BTP, it moves to the global BTPs which were discovered with another agents. In the second case environment with size of 2029 m 2 with obstacles inside the exploration area is considered. Same as first case the exploration evolution over time with a team of five agents is shown in Figure 8. It can be seen that the proposed method results in the full coverage of the area. Furthermore, in the third case, environment with size of 3631 m 2 , has higher complexity and number of branches is explored with team of five agents. Figure 9 depicts the exploration expansion over time. The simulations for scenarios 1-3 demonstrated that the proposed method succeeds in completing coverage task. In order to analyze how well the work load is distributed between the agents the overall travelled path length of all agents in each scenario was analyzed. Figure 10 depicts the lengths of the coverage paths of the agents for each scenario. The proposed approach provides well balanced path lengths for each agent, as can be seen. Worth noting that the average path length for one agent in scenarios 1 and 2 represents a balanced workload for exploration areas of equivalent size, while the average path length for one agent in scenario 3 is bigger due to the larger size of the exploration area. In order to assess how the exploration area size impacts on the algorithm performance with regards to the number of agents, the average time required to execute one step for one agent was calculated, as depicted in Figure 11. As one can see, the computation time for each scenario could be approximated linearly. Accordingly, the overall method is O(n), which means that the rate of the computation time grows with the number of utilized agents. It can also be noted that the average computation time for exploration areas of the same sizes (Scenario 1 and 2) is around 1 s. This also means that, on average, the number of BTPs utilized to get trajectories by PRM planner in both scenarios is equal. For the scenario 3 with larger exploration area as compared to first two scenarios, agents discover more BTPs, thus path planning requires more computation time. The further analysis will describe how the number of agents will influence the exploration process and the performance of the staying alive policy. In order to avoid data redundancy, the following results will only be presented for the first scenario.
In Figure 12 the relation between number of agents and the explored area over the time is presented. As expected, the larger team of agents results in less exploration time. However, it can be seen that the difference of exploration time between 9 and 11 agents is much less than difference of exploration time between 9 and 5. This shows that larger number of agents is not always necessary due to the size or structure of the environment; nonetheless, future study is required in order to find optimal number of agents.  Figure 13 depicts the variation of agent's charge level over the iteration process. The lower bound of the agent's battery level E min is depicted as solid horizontal line in red colour. As it can be observed at the top and the middle images, the charge level E i,k goes below the stated bound. This means that, due to the large traveling distance from the charging point, the battery level dropped below 20%. Thus, the introduced staying alive policy proved its viability in all conducted simulations by guaranteeing 0% of agent loss. Further analysis revealed that the total number of charges per agent were reduced with a growth of the utilized agents. In Figure 14 a comparison of the total number of charges required to explore the current simulation scenario, where for the case of 1 agent, the total and average number of charges are the same, is depicted. However, for larger teams, we observe fluctuations, e.g. in some cases the utilized team will have more charges than a single agent, but, on average, it will be more efficient and require less time for the entire exploration objective. Additionally, the average number of charges per agent varies, so, for example, in Figure 13 agent 1 made five charges and agent 3 made three charges.  Table 1 shows the summarized results of simulations for scenario 1. In order to evaluate the performance of our approach with respect to other algorithms, two methods were chose [29]. One is the nonbacktracking multi-robot spanning tree coverage (NB-MSTC), an offline method that has proven to provide the smallest total length of the coverage paths and boustrophedon and backtracking mechanism (BoB) algorithm, which an online method developed for multi-robot systems. The criteria for comparison are the complete area coverage and the total length of the coverage paths. The workspace for evaluation was designed, as in Figure 9a in [29]. It has size of 30 × 30 m square cells and each cell has a size of 1 × 1 m and 13 obstacles, where each obstacle has cell width and a random length four to nine times the square cell.
The simulation results are presented in Figure 15, where the figure to the left is replotted Figure 9b from [29] that depicts the coverage path with four agents with NB-MSTC approach, while the figure to the right depicts the coverage path for the same number of agents with the proposed approach.  Table 2 summarizes the coverage paths for four agents by each method. As it can be seen, the NB-MSTC provides the shortest paths with the total coverage length of 617 m; however, as it can be seen from Figure 15, it does not provide a complete coverage and the workload between agents is not equally distributed. The BoB approach provides the total path length of 871 m, while keeping balanced work load distribution between agents. Unlike these two methods, the proposed algorithm created the total exploration path length of 716 m, while providing complete area coverage along with the balanced workload between agents. Overall, the simulation results showed that the proposed approach works well in all simulated scenarios. Additionally, it outperforms NB-MSTC in area coverage and equally distributed work load among agents, and it outperforms BoB in terms of shorter total coverage path length of agents.

Discussions of the Design Choices
This article studied online multi-agent collaborative exploration and coverage of the unknown environment. Due to real-life challenges, such as search and rescue mission where the time of operation is important for situational assessments, this online approach selected and designed components of the presented algorithm.
The exploration method that is presented in Section 3.1 was selected according to conducted analysis of existing online exploration and coverage strategies. Among which, the Boustrophedon motion algorithm showed reasonable performance and capability for collaborative multi-agent work implementation in unknown environments.
The second method in Section 3.2 of the proposed algorithm was intentionally selected in order to bring it closer to the real life applications. Because, presently all platforms have limited operation time. This consideration allowed for us to implement the staying alive policy, which, according to conducted simulations, guarantees that all agents do not run out of energy within entire exploration and coverage mission.
Accordingly, the first two methods allow for conducting online exploration considering limited operation time. However, they cannot handle transitions when the agent reaches its blocked condition depicted at Figure 3 or runs out of charge. To overcome it and continue exploration ( Figure 5), the path planner was used to generate collision free trajectories. Generally, for this, any path planner with such a capability can be used.

Design of Experiments
The transition from simulations to actual experiments will be done by the means of the Robot Operating System (ROS) [40], a framework for developing robotic applications, in which implementation of the designed algorithm together with robot control package will be implemented and tested. The assumptions that were taken in simulation, like robot dynamics, perfect localization, perfect sensor model, and communication, are the limitations that will be considered in planning the experiment. As a robotic platform, we selected turtlebot3, a standard ROS robot that provides full access to its on-board sensors for obtaining information about robot's state and it can be remotely controlled from a laptop. The robot is equipped with 360 Laser Distance Sensor LDS-01 with maximum detection distance that is equal to 3500 mm. The lidar data will be processed in a way to reflect the sensing model that is used in simulation. The communication link between robots and laptop will be established while using WiFi connection. The Ubuntu based laptop will have ROS on-board and it will stand as a base station for controlling robots. The workspace will be designed when considering the turtlebot's dimensions and, generally, will be smaller and less complicated than in the simulations. When considering its size and platform's operating time of about 2 h, will be developed commands that will simulate low battery and full charge events for robot agents. Validation of the coverage and position tracking will be done while using motion capture system Vicon. During the experiment, all sensory information from the turtlebots' and Vicon will be recorded by the means of ROS and processed afterwards. The obtained results from the experiments and simulations will be compared.

Conclusions
In this study, we introduced a novel online coverage approach for multi-agent systems in an unknown environment that can be used in inspection, search and rescue missions, and the exploration of the unknown areas of different complexitym and it considers deployment and coordination of multiple agents in close to real-life scenarios. The additional novelty of the proposed algorithm is that it couples the agent's limited battery operation time with the path planner in order to guarantee a staying alive path planning for the multi-agent team. Moreover, a novel BTP point detection scheme for exploration task, while performing Boustrophedon motion, has been presented. The overall proposed online algorithm was successfully evaluated in simulations in complex environments by providing complete coverage of the exploration area and equally distributing the workload between agents. The overall provided path length of the proposed approach outperformed the BoB algorithm with 716 m against 871 m. Trajectories that were provided by the PRM planner were not always optimal and it requires more investigation towards local optimal path planning and remains to be part of future research. The influence of the exploration area size on the calculation time was demonstrated. This indicates that the larger exploration area, on average, has more BTPs, which, in turn, requires more trajectories to be calculated by the path planner. The implemented staying alive policy in all of the simulations guaranteed achieving a zero agent loss. The observed number of total charges required per agent for the simulated scenario is generally decreasing with a larger team, but there could be no guarantee that the total number of charges will be less than those for the one agent case. Part of the future research includes the enhancement and fusion of the presented staying alive policy by the path planning algorithm that is based on a real discharging model and the full experimental verification of the suggested scheme with mobile and aerial robots. Besides that, it is planned to investigate the feasibility of the agent in order to reach the end-point of the exploration area with a possibility to return safely to the starting point.