Hierarchical Task Assignment and Path Finding with Limited Communication for Robot Swarms

: Complex service robotics scenarios entail unpredictable task appearance both in space and time. This requires robots to continuously relocate and imposes a trade-off between motion costs and efﬁciency in task execution. In such scenarios, multi-robot systems and even swarms of robots can be exploited to service different areas in parallel. An efﬁcient deployment needs to continuously determine the best allocation according to the actual service needs, while also taking relocation costs into account when such allocation must be modiﬁed. For large scale problems, centrally predicting optimal allocations and movement paths for each robot quickly becomes infeasible. Instead, decentralized solutions are needed that allow the robotic system to self-organize and adaptively respond to the task demands. In this paper, we propose a distributed and asynchronous approach to simultaneous task assignment and path planning for robot swarms, which combines a bio-inspired collective decision-making process for the allocation of robots to areas to be serviced, and a search-based path planning approach for the actual routing of robots towards tasks to be executed. Task allocation exploits a hierarchical representation of the workspace, supporting the robot deployment to the areas that mostly require service. We investigate four realistic environments of increasing complexity, where each task requires a robot to reach a location and work for a speciﬁc amount of time. The proposed approach improves over two different baseline algorithms in speciﬁc settings with statistical signiﬁcance, while showing consistently good results overall. Moreover, the proposed solution is robust to limited communication and robot failures.


Introduction
Several service scenarios are characterized by new tasks appearing at any time and in any place, necessitating persistent operation [1]. As an example, consider a large warehouse where orders arrive erratically and diverse items need to be collected from different locations. In such context, besides retrieving goods, other services must be continuously executed on pallets, such as tidying, sorting objects, and controlling and refilling stocks [2,3]. Similarly, in a greenhouse, continuous and localized operation is necessary for pruning plants, variable-rate spraying, and selective harvesting. Finally, in a facility or office environment, cleaning or delivery tasks may be continuously and unpredictably requested by users located in different areas [4]. Automation in such context cannot always rely on centralized, fixed infrastructures, which may be costly or impractical. Instead, wireless sensor networks, IoT, and mobile devices may be exploited to raise the need for tasks to be serviced by a swarm of specialized robots [5,6].
Addressing a persistent and unpredictable service demand with a multi-robot system can provide higher efficiency, thanks to parallel operation, and larger flexibility by modulating the operational costs in terms of robots employed, as well as by deploying robots only where needed. At the same time, the robotic system must prove scalable, fault-tolerant and robust, to ensure continuous and efficient operation despite the number of units deployed, the occurrence of failures, and the variability in the task demands. Concrete solutions can be obtained by addressing two concurrent problems. On the one hand, it is necessary to find a suitable task assignment for each robot, so that costs are minimized in terms of relocation, robot idle time, and task waiting time [3,7,8]. On the other hand, a collision-free trajectory for the robot must be planned in order to safely reach the destination without interfering with other robots' plans [3,9]. This class of problems is often split in Multi-Robot Task Allocation (MRTA) and Multi-Agent Path Finding (MAPF). A common naïve approach consists in solving MRTA and MAPF separately. However, in cluttered environments, such as indoor spaces, those two problems are tightly coupled: the former provides a set of origin-destination pairs to the latter, which, in turns, returns back actual costs and timings associated to the movements planned by the different robots. For both MRTA and MAPF, many centralized and decentralized approaches exist in the literature (see Section 2). Centralized solutions often rely on perfect intra-robot communication and suffer the well-known issues of scaling with group size-especially when the task demand can rapidly change over time-and of fault-tolerance due to the existence of a single point of failure. Decentralized approaches, on the other hand, potentially feature high fault-tolerance and scalability, but few have been shown to operate in environments with limited communication.
Here, we propose a decentralized algorithm for both task allocation and motion planning that is robust to limited communication and robot failures, and scalable to different environments and group size. Our method is referred to as Hierarchical Task Assignment and Path Finding (HTAPF) because the task allocation problem is solved exploiting a hierarchical representation of the locations where tasks must be executed. More precisely, by partitioning the work-area in a quad-tree, our MRTA problem is efficiently solved through the deployment of robots in areas of interest corresponding to the tree leafs where tasks to be executed are available. To this end, we propose a non-trivial extension of a collective decision-making process designed for robot swarms [10,11] to deal with a hierarchy of sequential decisions, allowing a robot swarm to prioritize those areas with highest demand, while minimizing movement costs. This task allocation approach is tightly coupled with a search-based path planning approach, which plans the robot movements and estimates the expected costs [12]. Thanks to this coupling, HTAPF is able to dynamically adjust the robots' density in different areas, granting consistently good performance across different problem instances.
The main contributions of this paper are the following: (i) introduction of a decentralized hierarchical approach to the task assignment problem; (ii) tight coupling of the task assignment and path finding processes based on a novel utility function; (iii) validation of the proposed approach in realistic settings; (iv) statistical analysis and comparison of the performances of the system against two different baselines.
In the following, we first discuss the literature about coupled MRTA and MAPF problems (see Section 2). Next, we provide a formal description of the analyzed problem together with the proposed approach in all its constituent parts (see Section 3). Section 4.1 presents the results obtained from the experimental evaluation, and Section 5 concludes by discussing the benefits and limitations of the proposed approach.

Related Work
Centralized approaches to coupled MRTA and MAPF have been proposed that, given a fixed road-map, compute the shortest path first and then utilize integer linear programming (ILP) to compute the optimal task assignment [13]. Other approaches use search-based methods that can find optimal and bounded sub-optimal schedules with respect to the sum-of-cost [14] or optimal schedules with respect to make-span [15]. In obstacle-free spaces, the concurrent assignment and planning of trajectories (CAPT) finds optimal motions plans for multiple robots by minimizing the squared velocity. This method has been also proposed in a decentralized version, but with no optimality guarantees [16]. Other decentralized solutions for the combined problem include controller synthesis to solve the formation change problem in environments with obstacles and communication constraints [17]. However, this solution scales exponentially with the number of robots. In Reference [18], a decentralized control approach is presented featuring good performance for large robot teams, but-unlike our approach-it does not consider obstacles. Additionally, this approach considers pairwise task-swaps and, therefore, might not optimize the objective over the whole team. Other approaches are based on the estimation of the robot density [19]. The swarm distribution is estimated locally, followed by the distributed solution of an optimal transport problem. However, also this method does not consider obstacles posing constraints on motion and communication.
Some traditional approaches to decentralized task assignment use auctions [20]. A great part of these methods rely on temporarily centralizing the decision-making on a auctioneer that collects offers and defines the winning robot, to which the task gets assigned. In swarm robotics contexts, similar techniques are often unpractical due to the high communication load for large groups, and the limited communication abilities of the individual robots. Instead, in swarm robotics, threshold-based approaches [21] or stochastic task switching [22] are often preferred. These methods are inspired by biological systems where a flexible and stochastic allocation of tasks is observed (e.g., in social insects, individuals engage in tasks on the basis of the urgency of accomplishing it [23,24]). Robot swarms employ similar strategies, which allow to achieve coordination in an implicit way, relying on probabilistic rules that can be modulated by the individual ability to evaluate the utility in engaging in a given task. These methods can be improved by enabling interaction among individuals deciding collectively which tasks to execute or in which area to move, and various task assignment dynamics have been observed from winner take all to a balanced deployment [11,25]. These latter studies are inspired by the decision-making abilities observed in ants [26,27] and honeybees [28]. The latter, when searching for a site where to establish a new nest, engage in repeated interactions trying to recruit other scouts for high-quality sites, at the same time discouraging other individuals to recruit for alternative sites. Theoretical studies revealed that the mechanisms involved in this collective process can lead to optimal decisions in a relatively short time [28], also presenting striking similarities with the neural dynamics observed in human decision-making [29,30].
Previous work deployed a decision-making algorithm based on the behavior of honeybees and tailored to collective decisions in decentralized systems [10], which proved useful for the design of artificial systems in a variety of contexts, from robot swarms [11,25,31,32] to cognitive radio networks [33]. Here, we extend this framework to a hierarchy of sequential decisions, largely increasing the complexity of the resulting collective dynamics. In contrast to previous studies, we account for robot motion and collision costs, robot failures, and limited communication. For path planning, we use a fully distributed version of the priority-based planning paradigm [12] as described in Section 3.2.3. Priorities are assigned implicitly by the order in which data is received from nearby robots. No token passing procedure or specific schedule is in place ensuring robust execution also with limited probabilistic communication and robot failures. Other decentralized path planning methods-not considered here-include push-rotate-update [34] and a method where the path is planned by a sensor network [35].

Hierarchical Task Assignment and Path Finding
We consider scenarios with several tasks to be serviced, e.g., an area needs cleaning or human workers request supplies. We assume that there is no central authority coordinating the task execution or sharing the list of available tasks to all the robots. Additionally, the task distribution is not known in advance, and tasks may appear at any time in any place. Here, we first provide a formal description of the problem, and then discuss our approach to address both MRTA and MAPF in a fully decentralized way.

Problem Description
We consider the class of problems characterized by tasks that can be executed by a single robot at a time (SR), and robots that can execute only a single task at the time (ST). We also consider instantaneous assignment (IA), as the arrival of further tasks is not available and unpredictable [36,37].
Without loss of generality, the work area is defined as a square partitioned in a 4connected grid of size N × N, N ∈ N (see Figure 1, left), which can be configured to represent spaces of different complexity, from an office to a warehouse. A grid cell c i ∈ C is identified by its coordinates x i , y i ∈ {1, . . . , N}, and might represent free space where robots can move and tasks might appear, or an obstacle (e.g., a wall). Each free cell can be occupied by at most one robot and can contain at most one task at any time. Conflicts are resolved by sequential planning, exploiting the knowledge of others' plans for task assignment and collision-free path planning. In the image, each robot is identified by its unique color and number, i.e., r 1 white, r 2 green, r 3 blue, r 4 yellow. Grey regions are walls. Right: Interaction scheme among the different components of HTAPF. The local knowledge of robot r stores the world model, the estimated area utilities obtained upon interaction with other robots, and the received plans from other robots. Both the area assignment process and the motion planner exploit and update the local knowledge and interact with each other with local information passing. Information received from other robots about area utilities (i.e., U r 0 (a) from robot r 0 about area a and the goal region a g ) and individual motion plans are exploited to update the local knowledge.
, which requires any robot to move to the cell at location (x i , y i ) and work at that location. Task T i appears at time-step t s i and is accomplished by a robot at time-step t e i (where t e i = ∞ as long as the task has not been finished). The work time t w i is the time required from a robot to stay at the task location and actually execute the task (e.g., cleaning, delivery, or maintenance). The task set T is the disjoint union of the completed tasks T c = {T i |T i ∈ T , t e i = ∞} and the unfinished tasks T u = {T i |T i ∈ T , t e i = ∞}. At any time t, we can define the set of available tasks T a (t) = {T i |T i ∈ T u , t s i < t}. In this study, the primary objective is to serve the highest number of tasks (max |T c |).
We consider R identical holonomic robots starting in randomly assigned free cells. A robot r i ∈ R is described by tuple r i = (i, x i , y i , s i , a i ) identified by its unique id i, an associated position in the environment given by its coordinates (x i , y i ) a discrete state s i indicating the current behavior (whether exploring or exploiting available knowledge, see below) and an area a i where the robot is currently deployed. At each time-step, a robot can wait at its current position, move to an adjacent free cell, or work on a task located in the cell it is occupying. We assume limited sensing capabilities: robots can only sense adjacent neighbors, and use their sensing abilities to avoid moving in a cell that is already occupied. Additionally, robots can communicate with each other wirelessly. Communication is simulated by means of a realistic model for WiFi networks [38][39][40], whereby the received power (in dBm) from robot r j when robot r i is transmitting can be computed as: where P 0 is the signal strength at 1 m, η is the path loss, and d(i, j) computes the distance in meters between the two robots. Moreover, nW(i, j) computes the number of walls in the direct line between the two robots, W is the wall attenuation, and σ the noise variance. Data is exchanged in packets (or frames), and the frame error rate is defined as: where S is the receiver sensitivity, FER S and γ are hardware specific constants, N b is the background noise, and N th the thermal noise. In this work, we exploit the receiver sensitivity S as a parameter to determine local and realistic communication abilities. The above communication model is relevant also for discovering tasks. Indeed, we consider tasks as being issued by mobile devices equipped with the same WiFi communication system as the robots (e.g., tablets or smartphones). Hence, robots might have limited perception of the available tasks and the existence of a task is not known by all the robots at the same time. We assume, unless otherwise stated, that robots do not share task-related information.

HTAPF: Principles and Methods
We now illustrate the details of HTAPF, our novel distributed approach that couples MRTA and MAPF. The overall approach includes three main components: (i) the local knowledge available to each robot, (ii) the area assignment leveraging interaction with other robots, and (iii) the motion planning and task assignment, which is performed by exploiting the knowledge of other robots' plans (see Figure 1, right).
The local knowledge is available to each robot and consists of a world model, area utility estimations obtained via interaction with other robots, and motion plans of nearby robots. The world model has a map of the environment-the only information known a priori-and a hierarchical description of the work area, that is divided in sub-areas (see Section 3.2.1). As time goes by, the world model is updated including information about the available tasks and about the presence of other robots. Robots exchange two types of information. First, they share their locally-estimated perceived utility of working in the area they are currently located, which is exploited to support the area/task assignment process. Second, they exchange their motion plans, which are useful for path finding.
The area assignment process generates goals for the motion planner in terms of a goal area where the robot wants to service tasks. To determine the robot destination, the area assignment exploits the world model, the information coming from other robots, and knowledge from the motion planner about the costs of reaching a target area. In other words, robots can choose to work in an area only if they are assigned to it, and their motion plan is, therefore, influenced by the area to which they belong. For instance, in Figure 1 (left), all robots can move and work within their current area but r 4 , the yellow robot, which has no task available in its area and needs to get assigned to a different one before selecting a task (in this case, T 4 ).
Task assignment in the target area and motion planning are performed concurrently. The motion planner is fully distributed and does not require synchronization among robots. Given a goal area, it assign tasks and generates collision free paths according to the information previously received and stored in the local knowledge, including the plans of other robots. The current plan is also used internally to enhance the area assignment process and is also broadcast to other robots to support concurrent planning. For instance, in Figure 1, the green robot r 2 first plans its path toward task T 2 , and r 3 , the blue robot, later decides to move to task T 3 even if farther away to avoid interference with r 2 .
From this high-level description, it is evident that there is a very tight coupling between the different components, which constantly share information to increase performance both at the level of the individual robot and of the group (also see Algorithm 1). In the following, we provide a detailed description, starting from the world model and then moving to the main contributions of this study, that is, the task assignment and path finding, giving particular attention to the coupling between the two. Algorithm 1 HTAPF 1: procedure STEP(a n i , D, P a , P d ) 2: Reset transition probabilities 4: r o ← SelectNeighbour() Random choice of a known robot 5: Selection of next transition type 6: if s i = s a then 7: (4) and (5) 10: UpdateTree(a g ) Update all ancestors 12: cost, plan ← PlanMotion(a n i , a g ) Collision Free Path 13: Broadcast(a g , U r (a g ), cost, plan) 14: UpdateUtility(a g , cost) U r (a g ) in Equation (3) 15: ExecuteNextAction(plan)

World Model
As previously stated, we employ a world model that consists of a hierarchical description of the discretized space, which is represented by a generic k-d tree. Such a hierarchical description is general enough to be automatically constructed, without requiring knowledge about the structure of the work area in terms of rooms, corridors, and connecting spaces. In the following, we consider k = 4, that is, a quad-tree where each leaf node relates to a small area where task may require execution, and the root node corresponds to a region representing the whole N × N environment. Leaf and non-root nodes are assigned following the classical quad-tree generation as illustrated in Figure 2 (left and center panels). Every node in the tree is associated with an area a n j characterized by the level in the hierarchy n and an index j ∈ {1, 2, 3, 4}, so that the area is fully described by the tuple a n j = (x n j , y n j , l n ), where (x n j , y n j ) are the coordinates of the top-left corner of the considered region and l n = N/2 n is its length (i.e., nodes closer to the root have greater length). The root node a 0 is identified by the tuple (0, 0, N), while a single cell c j can be identified as a m j = (x m j , y m j , 1), where m = log 2 N. At the beginning, a robot r is provided with the size of the environment N and the depth of the desired quadtree M ≤ log 2 N, which is sufficient to build its own world model. At any time t, each area a n j of any size can contain a set of tasks T a a n j (t) that are available and need to be executed. The corresponding node in the world model of robot r is, therefore, updated with the local knowledge available, and a utility is computed that reflects the expected number of tasks that r can execute in the corresponding area a n j : U r (a n j , t) = ∑ T∈T a a n j (t) where the term C r (T) → [0, 1] is the normalized cost for robot r to reach task T, as estimated by the motion planner, computed as the number of actions needed to move from the robots' current location to its target location, normalized by the maximum lower bound cost for the given environment. The term H FW (r o , T) → [0, 1] is a heuristic function that estimates the normalized lower bound cost for any other robot r o to reach task T or its associated area, computed using the Floyd-Warshall algorithm (also see Section 3.2.3). If, for any reason, Equation (3) is not defined, we set U r (a n j ) = 0. In other words, the utility of an area a n j is higher the more tasks are present, the lower the cost for reaching them (as estimated by the motion planner), and the lower the number of other robots that can potentially execute the same tasks. In case of non-perfect communication, r might not have knowledge of the full range of tasks and other robots, in which case the utility function only considers the set of known robots and tasks. with the x and y coordinates and size l of the corresponding area. For instance, a robot residing in area (4, 2, 2) has the possibility to descend to any of the four corresponding leaf-nodes, such as the cell (5, 3, 1), exploiting its current knowledge of the sub-tree, or to ascend the quad-tree to area (4, 0, 4) for wider exploration of the environment. Right: probabilistic transitions between possible states, where a robot committed to a n i can either move up to the parent node (area a n−1 j ) when in the ascending state s a , or move down to one of the children nodes (area a n+1 m ) when in the descending state s d . Robots first decide whether to switch state s a → s d (s d → s a ) with probability P d (P a ). Then, transitions can take place that allow a robot to move in the quad-tree, either spontaneous (commitment γ n+1 m , abandonment α n i ) or interactive (recruitment ρ n+1 m , cross-inhibition β n i,i and self-inhibition ψ n i ).

Decentralized Area Assignment
The world model is constantly updated upon receipt of information about new tasks or about the presence of other robots in a certain area. In this way, each robot can compute the utility of each area from its own perspective, which is then used to select a goal area. The decision ultimately corresponds to the assignment of the robot to a node of the hierarchical world model. Here, the selection of a non-leaf node corresponds to the robot deciding to move to and remain in a certain area, while the assignment to a leaf node corresponds to the robot deciding to move to the corresponding area and possibly execute one of the available tasks. To determine the assignment to the nodes of the hierarchical word model, we exploit a decentralized algorithm inspired by the collective decisionmaking abilities of honeybees [10,28], which was used in different contexts [25,32,33] and also extended for MRTA [11]. Here, we improve over previous studies by extending the algorithm to a hierarchical representation of the world, and by considering the constraints imposed by both the need for collision-free movements and limited communication.
At any time, a robot r i is considered committed to a node in the quad-tree, i.e., an area a i = a n j in the environment wherein its movements are constrained. Initially, robots start committed to the root node a i = a 0 representing the whole environment. Robots committed to non-leaf nodes in the quad-tree are not assigned to any specific task and are free to explore the corresponding area performing a random walk [41]. Robots committed to leaf nodes are considered allocated for task execution, whenever tasks become available. At each decision step, a robot can either descend the quad-tree towards leaf nodes or move up the hierarchy towards the root node (see Figure 2). When descending the quad-tree, a robot exploits its current knowledge about the children nodes in the quad-tree to choose the area with higher utility (3) for task execution. When ascending the quad-tree, the robot gets committed to the parent node, hence accessing a larger area to explore for servicing tasks elsewhere. Switching between the ascending state s a and the descending state s d is performed according to a probabilistic threshold-based decision, which can be tuned to balance the exploration-exploitation trade-off (see Figure 2, right). Specifically, a robot changes its state s i from ascending (s a ) to descending (s d ) with probability P d per time-step, while it switches from descending to ascending with probability P a per time-step. Overall, the probability that a robot at any time is found in the ascending (descending) state is In this work, we use equal probabilities P a = P d = 0.5 to obtain a fair exploration-exploitation trade-off.
In both the descending and ascending states, a robot decides whether or not to change the current area on the basis of its local knowledge. The proposed decentralized algorithm exploits both the world model and the knowledge from other robots to determine if neighboring areas are worth being considered. The idea is that areas with high utility should be preferred over areas of low utility, following a proven decision-making design pattern [10,28]. At the same time, overcrowding should be avoided, hence balancing exploitation among different areas. To this end, robots interact with each other to determine the best possible allocation. Note that, to move towards areas at the same hierarchical level, a transition through the parent node is required.
The selection of a destination node in the quad-tree is performed according to four stochastic processes: (i) commitment to an area at a lower hierarchical level, (ii) recruitment by a peer robot committed to an area at a lower hierarchical level, (iii) abandonment of the current area to move up the hierarchy, and (iv) inhibition determined by a peer robot-either committed to the same (self-inhibition) or to a same-level area (cross-inhibition)-leading to a move up in the hierarchy (see Figure 2, right panel).
Hence, commitment and recruitment represent the possible descending transitions available when the robot is in s d , allowing a robot to move down the hierarchy towards the leaf nodes. Abandonment and inhibition represent the possible ascending transitionsavailable when the robot is in s a -allowing a robot to move up the hierarchy towards the root node. Specifically, self-inhibition avoids overcrowding, pushing robots to explore other areas when the current one has too many robots working around, while crossinhibition favors the selection of better areas of the same hierarchical levels by promoting exploration at the higher level [10,28]. We refer to commitment and abandonment as spontaneous processes, as they are determined by individual knowledge, while recruitment and inhibition are interactive processes that take place upon interaction with a different robot. All transitions between nodes are stochastic, and the transition probability is computed on the basis of the utility associated to the quad-tree nodes, as defined in Equation (3) (see Figure 2, right). Note that, at any time, there may be a non-null probability of remaining in the current area, depending on the probabilities of the available transitions.

Descending Transitions
The robot r located in area a n i and in state s d can change node by means of commitment or recruitment (see Figure 2, right). Commitment represents the spontaneous decision of r to move to one of the four children nodes a n+1 m according to the perceived utility. For each area a n+1 m , the transition probability is computed proportionally to its utility as follows: where k is a control parameter for spontaneous processes. Conversely, recruitment allows a robot r to get recruited by another robot. At any time, a robot r o is randomly chosen among the robots known by r, and-if it belongs to the sub-tree of a child node a n+1 ma transition probability is computed according to the utility shared by r o , based on the assumption that U r o provides a better estimation for area a n+1 m since r o is located in that region. Moreover, thanks to the coupling with the motion planner, U r o embeds the motion costs as estimated by r o , which serve as an indication of the perceived ease of motion within that area. Additionally, recruitment provides a means to tune the assignment process to the density of robots in a given area, as the probability of selecting a robot assigned to a certain area is proportional to the robot density of that area. Hence, the recruitment transition probability is computed as follows: where h is a control parameter for interactive processes. Note that, at any time, only a single neighboring robot r o is considered, and recruitment takes place only if it belongs to a sub-tree of the children nodes of a n i . In all other cases, no recruitment is considered. In this way, the overall recruitment rate of robots to a given area is proportional to the population of robots already committed to that area [10].

Ascending Transitions
When in state s a and in area a n i , robot r may decide to move to the parent node a n−1 j through abandonment, self-inhibition or cross-inhibition (see Figure 2, right). Spontaneous abandonment is the more probable the lower is the utility of the current area: On the other hand, a robot r residing in area a n i can be inhibited by another robot r o belonging to the sub-tree either from the same area (self-inhibition) or from a different area a n i at the same hierarchical level n (i.e., a n i and a n i share the same parent node, that is, they are sibling nodes). Self-inhibition is used to cope with overcrowding, where several robots share the same region, hence impairing robot motion. It is defined as: ψ n i = h · U r o (a n i ) · H(R(a n i ) − θ ψ C(a n i )), where R(a n i ) represents the number of robots in area a i , and H is the Heaviside step function, which enables the process only if the population in the area exceeds a fraction θ ψ = 3/4 of its capacity C(a n i ), defined as the number of cells that do not contain an obstacle. Overall, self-inhibition balances recruitment to an area when the population density is too high (also see Reference [11]). Conversely, Cross-inhibition forces a robot i to abandon a region with low utility in favor of a region i with higher utility. It is used to focus the assignment to areas of high utility, allowing the robots to abandon their current area (i.e., getting assigned to the parent node) to explore one of the sibling areas. The purpose of this process is to balance poor commitments that might arise from the probabilistic nature of the system and outdated/limited knowledge. Hence, given a robot r o residing in the sub-tree of area a n i , i = i, the probability of cross-inhibition is proportional to the utility U r o (a n i ), as defined in the following: where the Heaviside function H enables the process only if the population in area i does not exceed one fourth of the capacity, i.e., θ β = 0.25, again to limit potential overcrowding. Similar to recruitment, also in this case inhibition happens only if r o belongs to areas at the same hierarchical level or below, otherwise the transition probability is null.

Overall Algorithm and Control Parameters
A pseudo-code of the steps followed by the robot during the area assignment process is provided in Algorithm 1. The robot moves in the hierarchical tree one node at the time based on ascending and descending transition, respectively, selected according to two a-priori fixed probabilities P a and P d (i.e., probabilistic selection of the robot state s i and hence the next transition type, ascending or descending). Next, transitions probabilities are computed according to the currently selected neighboring robot and the considered areas (see Algorithm 1, lines 7 and 9) and are, respectively, associated to the locally computed interactive and spontaneous processes values as in Equation (4) to (8). The assignment process randomly extract a new hierarchical area (line 10) on the basis of computed transition probabilities and propagate this decision up, to all its ancestors in its local quad-tree. We assume the assignment process to be faster than robot motion, so that D decisions are performed while a robot is moving between two adjacent cells. In this study, we heuristically set D to be equal to the depth of the quad-tree M. Increasing or decreasing D can improve or diminish the tree exploration. Last, a new motion plan is made (line 12) only if the assignment process changes the current area after D steps, hence requiring to determine a new destination (either a chosen task or a randomly determined cell).
As originally proposed by Reina et al. [28], we have weighted interactive and spontaneous processes by two free parameters, respectively, h and k, used to define the ratio h/k between interactive and spontaneous processes. As shown by Albani et al. [11], in the simple non-hierarchical case, changing this ratio allows switching between a utilityproportional deployment where robots are allocated to areas proportionally to their utility when the ratio is low, to a utility-responsive collective decision where the optimal number of robots is allocated to the area that currently has the highest utility (higher values for h/k). Indeed, when h = 0, there is no recruitment or inhibition, and only commitment or abandonment are considered, which are driven directly by knowledge of the area utility. Relying more on interactive transitions allows to make a collective decision towards areas of higher utility, while balancing the deployment to avoid overcrowding, thanks to self-inhibition. In this study, we consider h/k = 0.25 (h = 0.2, k = 0.8), which favors spontaneous transitions while still exploiting interactions to collectively choose the best area where to service tasks.

Decentralized Task Assignment and Motion Planning
Task assignment and motion planning are performed concurrently by exploiting a priority-based scheme that assumes robots are ordered and plan sequentially, treating the plans of previous robots as known dynamic obstacles. Such an approach is in general not complete, but scales linearly with the number of robots. We implement this approach in a distributed way, similar to previous work [12], as outlined below.
At the beginning, we pre-compute the all-pair shortest paths in a pre-processing step using the Floyd-Warshall algorithm. This can be computed in a decentralized way by each robot or pre-computed and given to each robot as part of the environment description. We use the all-pair shortest paths in two ways. First, we normalize the all-pair shortest paths and use the resulting values for our heuristic function H FW (·, ·), which informs our utility function (see Equation (3)). Second, the knowledge of the all-pair shortest paths is an admissible heuristic that is used within the path planning algorithm to speed up the computations.
To this end, as the path planning algorithm, we use Safe Interval Path Planning (SIPP) [42], a variant of A* that can plan efficiently in the space-time domain if the trajectories of dynamic obstacles are known a-priori. We make an adjustment to the algorithm to better cooperate with the assignment algorithm. We let the decision-making algorithm reason on goal regions rather than single cells: when a goal region is selected as the next assignment by the decision-making algorithm, we consider any cell in the goal region a g as a possible goal, therefore solving a task assignment and path planning problem simultaneously. Specifically, each cell containing a task within a leaf node of the quad-tree is evaluated and the one with lowest cost is selected as destination. This allows the assignment process to reason about the expected number of tasks in the goal region rather than having to consider low-level congestion information of the path planner. We note that there is no explicit synchronization and that our approach is fully distributed and asynchronous.
We then broadcast our current plan to neighboring robots, that update their local knowledge accordingly using received costs to update their local utility (also see Algorithm 1). Our approach does not rely on a fixed priority (e.g., Reference [12]), or implicit synchronization using tokens (e.g., Reference [43]). However, a disadvantage is that safety can only be guaranteed with the assumed sensing capabilities of neighboring robots. Reactive obstacle avoidance prevents collisions that might be caused by our probabilistic communication model or dynamic ordering of the robots: a movement is performed only if the target cell is free, otherwise re-planning is triggered.

Experimental Evaluation
We evaluate our approach empirically by exploiting a custom 2D simulation environment written in C++ implementing a grid world (Section 4.1). On this setup, we compare results for the proposed algorithm against two baselines: (i) a greedy task selection with task swapping, and (ii) an auction-based mechanism relying on the Contract Net Protocol (CNP) framework [44] (see Section 4.2). To ensure a fair comparison, all algorithms rely on the same motion planner and have access to the same information. The experimental setup is detailed in Section 4.3, and the obtained results are analyzed in Section 4.4. . For the purpose of the experiments, we consider each cell c j in the grid world to be either free or occupied. A free cell represents empty space where a robot can move and a task appear. For the empty32 environment, task locations are uniformly distributed. In the split16, facility and lab scenarios, potential task locations are specified such that they do not block narrow passages or corridors and only appear inside rooms (see Figure 3). An occupied cell is either an empty cell already taken by a robot-and does not allow the presence of a second robot-or represents a wall or any obstacle not allowing the presence of robots and tasks. Walls and obstacles have an impact on the communication range, following the model introduced in Equations (1) and (2). Figure 4 shows how the environment configuration impacts on the communication range for the facility and lab environments.

Service Robotics Simulation
As already introduced in Section 3.2.1, we consider R identical holonomic robots. At each time-step the set of robots is randomly shuffled to remove any bias introduced by ordered sequential decisions. This allows to avoid ties since we use the reshuffled sequential order of robots to prioritize the decisions. At each step, a robot is asked to perform one of the available actions: wait at its current position, move to an adjacent free cell, or work on a task located in the cell it is occupying. Robot actions are performed after the decision phase that constitutes the main contribution of this work and is summarized in Algorithm 1.

Greedy and Auction-Based Strategies
For MRTA, a greedy approach is suitable because "anecdotal evidence suggests that the greedy algorithm works extremely well" [36]. Indeed, with high robot density and low task generation rate, the greedy approach performs close to the optimal solution due to its intrinsic reactive property, provided that conflicts in task assignment can be easily resolved. Similarly, with few robots but many tasks, the robotic system is often fully occupied all the time, resulting in the maximum possible efficiency. For our baseline, each robot r chooses the closest available task T i . We propose a variation of the trivial greedy approach by including a task swapping procedure. If a robot r p committed to task T i receives information related to the commitment of a second robot r q to the same task, a conflict resolution procedure begins. Robot r p compares its utility against r q 's utility and drops its current assignment only if U r p (T i ) < U r q (T i ) =⇒ H FW (r p , T i ) < H FW (r q , T i ). Such greedy variant does not require the concept of areas (see Section 3.2.1); thus, the utility reduces to the heuristic h.
Our proposed auction-based baseline relies on the Contract Net Protocol scheme (CNP) [44], upon which several other systems have been built (e.g., Reference [45]). Our implementation models self-interested robots without the possibility of subcontracts and transfers [37]. During every selection step, if not assigned with any task, a robot starts the announcement phase and becomes the manager for the closest task distance-wise. During the submission phase, any other non-committed robot within the communication range answers with its current bid, equals to the heuristic H FW as defined in above. The selection and contract phases proceed as normal giving priority to lower bids, i.e., robots with a shorter path to the task, and we break ties by following a FIFO selection. Note that, due to the announcement phase, task-related information might be propagated farther than the original task range, potentially increasing the number of robots involved in the transaction. In obstacle-free areas and, in particular, in absence of environmental bottlenecks one can imagine the CNP baseline to act as a improved variation of the greedy approach, but in complex scenarios it has several differences. The assignment process is synchronous and considers all the robots with related utilities at once, thus locally centralizing the system for a small time-frame and generating a bigger communication overhead than the other two proposed approaches.

Experimental Setup
To model our service robot domain, tasks are generated within two macro-areas at a time, chosen from 16 areas that cut the environment in identically-sized regions. Chosen areas generate tasks at a fixed task generation rate, set to 1 task per time-step in all environments, unless the area is completely full. Additionally, the enabled macroareas change during the simulation eight times, resulting in a dynamic demand across the environment. Tasks do not disappear after a region change and can still be served by the robots in the area. We believe this to be an effective and realistic approximation of the service robot problem, in which a region-once served-does not require further attention for a certain amount of time.
Unless otherwise specified, the settings for the experiments are as follows. The work time t w i for all tasks is constant and fixed to 5 steps, i.e., the number of steps a robot has to remain over a task to consider it accomplished. The number of robots involved in the simulations is related to the number of free-cells in the considered environment: 25 for split16 and facility, 50 for empty32, and 150 for lab. We consider nodes associated to areas of size 2 (i.e., a m i = (x i , y i , 2), with m = log 2 N − 1) as leaves. The number of decisions per time-step (i.e., movements in the hierarchical tree) is set to D = M = log 2 (N) − 1, that is, the depth of the quad-tree, and the probability to switch from the ascending to the descending state, and vice versa, are set to P a = P d = 0.5. Each simulation was executed for 300 time-steps. All the parameter values have been determined on the basis of previous work or empirical evaluations and are not specifically tuned based on the environment to be tested (for a discussion about this choice, see Section 5).

Results
The evaluation of performance of the proposed HTAPF approach is compared against the greedy and CNP approaches in different experimental settings. For each setting, we performed 50 independent simulations with different random seeds and random initial robot positions. In each simulation, we measure the number of completed tasks looking for the approach that maximizes the demand satisfaction. For each tested environment and experimental condition, a one-way non-parametric ANOVA (Kruskal-Wallis) test was conducted to examine the differences in the completed tasks according to the different strategies considered (greedy, CNP, and HTAPF). Moreover, the post-hoc Dunn's test was employed to examine the significance of all possible pairwise comparisons (p < 0.05). Figure 5 provides the results obtained in all environments in ideal conditions, that is, when communication among robots is perfect and there is no failure. The former aspect implies that every robot can receive without error the information about (i) tasks available and (ii) motion plans from other robots. This condition is clearly unrealistic for practical scenarios but is useful as a baseline on which to evaluate the impact of limitations, like local communication and possible failures. Generally speaking, we observe good results from HTAPF, which performs on par or better than the greedy and CNP approaches (see Figure 5 and Table 1 for the outcome of statistical tests). Specifically, in split16 HTAPF performs significantly better than both greedy and CNP. Here, the bottleneck constituted by the small opening between the two rooms may produce congestion when the task assignment to different areas is not properly managed, which is something that neither CNP nor the greedy approach properly do. In empty32, instead, CNP is properly assigning tasks to the robots and performs slightly better than HTAPF, while the greedy approach presents a significantly lower performance, due to the inability to quickly adapt to changes in the areas where tasks are spawned. It is not surprising that CNP performs well in this environment, considering that there is no limitation in motion from walls or obstacles, and, therefore, the bidding estimates closely correspond to the actual costs incurred, as motion planning has ample freedom to select a collision-free route. This is not the case for the facility environment, where the motion constraints in relocating from one area to another have a similar impact on all approaches due to the narrow corridors that in some cases force the passage of a single robot at the time. Finally, the most challenging environment in terms of size and structure is lab, where HTAPF again outperforms all other approaches.
Note that, thanks to the perfect communication, CNP is close to a centralized approach for task assignment, as all robots can participate to the bidding process. In addition, the greedy approach can take advantage of the perfect information available, making task switching very efficient. Hence, the competitor strategies for HTAPF are working in the best possible conditions for their characteristics. HTAPF also exploits all the available information well, but the stochastic task assignment maintains some randomness that best suites for scenarios with limited knowledge, as we show later. Additionally, the communication requirements for HTAPF are minimal, while CNP imposes a considerable communication load. Hence, the advantage presented by HTAPF in these baseline experiments is commendable.
A more realistic setting is when communication is limited in range, due to limited emission power and imperfect receiver sensitivity (We use the following parameters for the communication model for Equations (1) and (2): P 0 = −20, η = 5.6, W = 10, σ = 3.1, FER S = 0.08, γ = 1, N b = −100, and N th = −100). We study the influence of limited communication by using three different sensitivities, S ∈ {−120, −100, −80} which substantially limit the range at which broadcast messages can be received (see Figure 4 for some examples). Note that both robots and tasks rely on the same communication model, meaning that tasks are known to a robot only when within range. Figure 6 shows the results for all environments and all approaches considered, while Table 2 reports the results of the statistical tests. Additionally, we also considered failures in robots happening at different rates (0.05 and 0.1 expected failures per time-step, corresponding to an average of 15 and 30 failing robots per simulation, respectively), which turn robots into static obstacles that need to be avoided, therefore influencing the overall task execution abilities of the system (see Figure 7 and Table 3 for the results of the statistical tests).
Generally speaking, the value of HTAPF is confirmed practically everywhere. In the split16 environment, HTAPF proves robust against communication limitations, and always performs significantly better than both greedy and CNP, apart when a sensitivity of S = −120 is considered where no significant difference was found between HTAPF and CNP (see Figure 6, top-left panel, and Table 2). The small dimensions and the limited number of walls entail that limitations in communication are not too evident, substantially confirming the baselines results. In addition, in case of failures, HTAPF presents a robust behavior, only marginally affected by the unavailable robots, but still outperforming the competitor approaches (Figure 7 top-left and Table 3).
In the empty32 environment, limitations in communication have practically no impact, and we observe again that CNP and HTAPF are performing on par and significantly better than the greedy approach, although a higher variability for CNP is observed when the communication sensitivity is worse (S = −100 and S = −80). In the latter case, CNP performs slightly better, thanks to the ease of motion granted to the robots (see Figure 6, top-right, and Table 2). When robot failures are introduced, instead, HTAPF performs better than the other approaches for the highest failure rate, as it proves to be more robust in re-allocating the residual robots when needed. Indeed, there is no significant drop in performance with increasing failure rates (see Figure 7, top-right, and Table 3). The facility environment is the one that presents the harshest communication limitations, with thick walls preventing a proper exchange of information. In such conditions, the performance of CNP and HTAPF gets worse and the greedy approach proves more robust as it does not rely much on communication. Significant differences are observed when S ≥ −100, where the greedy approach outperforms both CNP and HTAPF (see Figure 6, bottom-left, and Table 2). In case of failures, all approaches show a similar drop in performance, due to failed robots blocking corridors or doorways, and no algorithm is more robust than the others (see Figure 7, bottom-left, and Table 3).
Finally, the lab environment confirms the robustness of HTAPF both with respect to limited communication and robot failures. In the former case ( Figure 6 bottom-right and Table 2), HTAPF always outperforms the greedy approach and is significantly better than CNP when S = −80. In the latter case, HTAPF is significantly better than both approaches but for the maximum failure rate, where the advantage is marginal (Figure 7 bottom-right and Table 3). It is worth noting that the performance of HTAPF is rather constant despite the complexity of the environment and the introduced limitations and failures, which suggests that the algorithm properly exploits the available robots, flexibly deploying them to the areas that mostly need work to be performed, quickly adapting to variations in the service demands.

Conclusions
We presented HTAPF, a novel fully decentralized algorithm that couples a bio-inspired task allocation algorithm with a distributed motion planner for persistent robotic service tasks. The task allocation component significantly extends our prior work by supporting multiple hierarchical decision levels and a utility function that directly considers the (estimated) cost of robot motions. The motion planning component is based on a distributed prioritized planning scheme with goal assignment within a specified region. Both approaches are tightly coupled because the motion planner's cost estimates are directly considered for area/task assignment. Tests on different environments show that the proposed solution performs well in all tested cases, validated by statistical tests and compared to two baseline algorithms. We also demonstrate that HTAPF is robust with respect to limited communication and induced robot failures.
Although the HTAPF algorithm is characterized by several free parameters, no specific tuning has been performed for the experiments presented here. Proposed values are chosen according to previous work and reasonable heuristics, and have been kept fixed across different problem instances to demonstrate the wide applicability of the proposed approach. While improvements in performance are possible with a detailed tuning of all the parameters for specific experimental settings, these results demonstrate that the system properly generalizes to different conditions, different number of robots and different introduced limitations. Indeed, we have shown that performance is satisfactory also in those cases where the nature of the environment (e.g., robot failures or communication limitation) is not known in advance. We believe that the proposed solution is a good fit for many realistic service task scenarios or situations where robots with limited power, reliability, or communication are taken into account. Examples of the latter are unstructured environments that lack a central entity, a distributed network that covers the entire area, or scenarios where the system can be deployed without specific infrastructures, such as search and rescue scenarios and delivery systems for large industrial settings.
Future work includes the study of the macroscopic dynamics of the system and the adaptive tuning of the tree exploration. A deep analysis of a macroscopic model representing the decision dynamics over the quad-tree can provide a better understanding of the dynamics characterizing the assignment process, leading to the identification of parameterizations that provide optimal performance and desired system properties, possibly including different parameters for different hierarchical levels in the quad-tree. Adaptive solutions to the tuning of some key parameters are important to dynamically respond to problem requirements. For instance, the possibility of online tuning the probability of descending the quad-tree can provide an important means to optimize the explorationexploitation trade-off.