An Auto-Adaptive Multi-Objective Strategy for Multi-Robot Exploration of Constrained-Communication Environments

Featured Application: In application fields where strong communication requirements do not condition the mission, the present approach represents a proper option for coping with real communication constraints, being more fault tolerant while still achieving good performance.

Abstract: The exploration problem is a fundamental subject in autonomous mobile robotics that deals with achieving the complete coverage of a previously unknown environment. There are several scenarios where completing the exploration of a zone is a main part of the mission. Due to the efficiency and robustness brought by multi-robot systems, exploration is usually done cooperatively. Wireless communication plays an important role in collaborative multi-robot strategies. Unfortunately, the assumption of stable communication and end-to-end connectivity may be easily compromised in real scenarios. In this paper, a novel auto-adaptive multi-objective strategy is followed to support the selection of tasks regarding both exploration performance and connectivity level. Compared with others, the proposed approach shows effectiveness and flexibility in tackling the multi-robot exploration problem, being capable of decreasing the duration of disconnection periods without noticeable degradation of the exploration completion time.


Introduction
The exploration problem is a fundamental subject in autonomous mobile robotics that deals with achieving the complete coverage of a previously unknown environment. There are several scenarios where completing the exploration of a zone is a central part of the mission, e.g., planetary exploration, reconnaissance, search and rescue, agriculture, cleaning, or dangerous places such as mined lands and radioactive zones. Additionally, due to the inherent qualities of multi-robot systems, mainly efficiency and robustness, exploration is usually done cooperatively [1].
Schematically, the exploration of an environment can be seen as the composition of Mapping and Motion Planning tasks. A map is needed in order to plan new motions, and choosing a correct motion requires up-to-date knowledge of the environment. Depending on the application field, the exploration strategy should take this into account to prevent isolation situations.

Communication Issues
Mobile Ad-hoc NETworks (MANETs) constitute a particular example of scenarios where the topology of the robot network varies dynamically over time. This kind of network is recommended when the fixed infrastructure is no longer available, e.g., in disasters to support the communication among rescue team members. In such cases, connectivity is of utmost importance because the loss of communication could imply human losses.
A first critical issue concerns the collective knowledge of the environment. Under communication restrictions, such knowledge cannot be assumed to be always accessible and, depending on the coordination mechanism, could be the cause of significant performance degradation [20]. Therefore, depending on the application, the exploration strategy should take this into account in order to prevent the robots from becoming completely unconnected, that is, so-called isolation situations. Such an isolation situation, as well as its possible effects, is illustrated in the example scene depicted in Figure 1. The yellow portion of the map is only known by robot B. Thus, robot A goes to re-explore the region beyond the red frontier.

Connection Requirements
Three categories are mainly identified [20]:
• None. Robots are not required to communicate.
• Event-based connectivity. The need for regaining connectivity is triggered by particular events, such as the discovery of new information, or just periodically.
• Continuous connectivity. Every robot must be connected at all times to every other fleet member, either directly or in a multi-hop manner.
Concerning how connectivity between two robots is modelled, the following categories can be identified:
• Line-of-sight (LoS). Two robots can communicate if and only if their positions belong to a free-of-obstacle line segment. Usually, the distance is also restricted to a maximum value that is often related to the scope of the communication device.
• Disc or Circle. Communication with any other robot is permitted when its location is within a fixed maximum distance (communication radius), regardless of the presence of obstacles.
• Signal. Communication is available with a certain probability that depends on the estimated signal power between the robot positions. The higher the signal power, the higher the probability.
• Traces. Robots can communicate with each other by dropping messages in the environment.
In addition to these categories, which cover an essential aspect of communications, namely connectivity, there exist other formulations aimed at covering bandwidth or throughput as well. Clear examples of their use are applications with a strong dependence on video streaming, such as search and rescue.

Connectivity-Based Proposals
Despite their well-known inefficiencies, there exist a few approaches without any connection requirements, where robots meet each other by chance. Nevertheless, this section only surveys the proposals that depend on connectivity in one way or another.
In [21] a behaviour-based approach is presented. The architecture is designed to guide the exploration constraining the fleet to keep within the communication range, establishing a mobile network. The well-known disk model and a graph structure are used to model the network connectivity and identify possible disconnections. Frontier cells are evaluated regarding costs (computed utilising a flooding algorithm) and information utility (based on the ideas proposed in [3]). Behaviours are selected according to the network topology conditions.
In [22] a centralised communicative exploration algorithm is proposed. Communicative exploration implies that the team of robots has to maintain connections between each other at all times. The target selection is based on a utility function that weights the benefits of exploring new regions against the goal of keeping connected. While connectivity is valued using the classic disc model, the costs of the shortest paths are computed from the Manhattan distance notion. Due to spatial and movement restrictions, specific behaviours are defined to deal with deadlocks. Also following a centralised approach, [23] presents four fully reactive exploration strategies. They consist in translating the distance to tasks and disconnection situations into artificial forces that pull and push the robot to reach new positions smoothly, preventing the robots from losing connectivity. The radio signal quality is modelled considering both the communication range and the distance attenuation effect. Deadlocks are avoided by assigning tasks to a cluster of robots. This allocation guarantees that robots belonging to the same cluster do not exert conflicting forces upon each other towards different directions.
In [24], the authors propose a decentralised version of the strategy proposed in [22], based on message exchanging and a graph structure where the group always tries to efficiently keep a biconnected network. The communication model is the classic disc model. In consequence, robot mobility is restricted by the communication range. Using the same graph-theoretic tools, in [25] the experimental validation of a distributed algorithm that preserves connectivity is also discussed. Nevertheless, a different coordination mechanism-supported by a market-based negotiation algorithm-is adopted. Unfortunately, only results on connectivity maintenance are shown, lacking exploration metrics reports.
The proposal of [26] aims to maintain and repair the underlying wireless mesh network while the coverage task is being performed, all at once. The system works in a fully asynchronous and distributed way. Unlike previous works, the authors propose a network disconnection detection method that checks the real state of connections without assumptions on communication range or propagation model. On the other hand, all nodes require knowledge about the area to be covered and about global positions.
In [15] the robots can disconnect as long as they regain connectivity periodically, following a distributed but synchronous strategy. The authors address coordination implicitly through the exchange of localisation data. Robots are forced to wait for others before making a decision. The system works as an optimisation method where each variable is optimised one at a time, in a round-robin fashion, while the others remain unchanged. In [27] the authors describe a heterogeneous multi-robot system for exploration tasks. They consider several explorer robots and conceive a particular robot playing the role of relay dispenser. This agent is in charge of placing relays when and where necessary to support the video/audio streaming generated by the explorers.
In [28] the problem of exploration and mapping is addressed by using a Decentralised POMDP. This technique takes advantage of local interaction and coordination from the interaction-oriented resolution of decentralised decision makers. Distributed value functions (DVS) are used by decoupling the multi-agent problem into a set of individual agent problems. In order to address full local observability, limited information sharing, and communication breaks, an extension of the DVS methodology is proposed and applied in multi-robot exploration so that each robot computes locally a strategy that minimises the interaction between fleet members and maximises the coverage achieved by the team, even in communication constrained environments. A decision step consists in building the model, computing the policy from the DVS and producing a trajectory.
Rendezvous-based techniques have also been used to deal with limited communication ranges. In [16] robots are enabled to move out of the communication range but forced to rejoin the group frequently. After moving out of the communication range, robots have to return to a pre-arranged meeting point to exchange the information gathered during the disconnection period, in order to avoid exploration overlaps.
The proposal presented in [29] describes a Particle Swarm Optimisation based approach to achieve fault tolerance by preventing communication network splits. The principal objective is to keep the fleet k-connected. Considering that the application domain defines the fault-tolerance level required of the system, a MANET connectivity algorithm is extended with the concept of k-fault-tolerance.
A multi-robot system for crisis management is described in [30]. The system is composed of mobile sensors (ground robots-UGV) and mobile relays (aerial vehicles-UAV). However, some robots may change roles dynamically during the mission (e.g., UAVs equipped with both wireless routers and cameras). The problem is modelled and solved using constrained-based local search on a communication model based on graph theory.
In [31] a fully distributed approach for multi-robot sweep exploration is introduced. The proposal aims to guarantee full coverage using a minimum number of messages and to maintain connectivity at all times, even under severe restrictions on the communication type, range, and quality. The proposed algorithm uses communication not only to exchange information but also to direct the robot movements. Communication intensity is used to disperse the fleet, while beacons are used to mark locations of interest.
In [32] a multi-robot exploration algorithm based on multiple behaviours is proposed. Quad-rotors are asked to explore and map an indoor zone with unreliable communication and limited battery life. Robots are enabled to change roles both dynamically, according to intrinsic and extrinsic factors (e.g., boundaries/distances and battery level), and hierarchically, in order to explore and avoid collisions with each other. The remaining battery level is considered in order to avoid losing gathered information. Quad-rotors are also able to leave the network, but after a fixed period they seek to regain connectivity. Relay robots are designated to forward information from/to the more distant robots, improving communication between team members. Although no optimal relay placement is computed, the existence of relays is crucial in the proposed scheme.
In [33,34] the relay node dynamic re-positioning problem is tackled. The proposed solution relies on optimisation procedures and evolutionary algorithms to find the best relay locations and how the robots should move to these points. The authors follow a centralised multi-stage approach where one node is in charge of computing the best assignment regarding both connectivity and throughput.
In [35], the problem of how to connect one or more remote units to a base station deploying a limited number of intermediate relay robots in constrained communication environments is investigated.
The authors study the complexity of the optimal relay placement problem and propose methodologies to create chains or trees of relays as required by different static scenarios. By contrast, in changing environments static solutions cannot be successfully applied because the location optimality does not hold over time.
In [36] the exploration problem is addressed ensuring a time-varying connected topology in 3D cluttered environments but following a decentralised control strategy which enables simultaneous multi-task exploration.
Another centralised but asynchronous strategy is followed in [19,37] in order to address the problem of multi-robot exploration under recurrent connectivity. In these works, the authors leverage a variant of the Steiner tree problem that appears as a particular case of different known graph optimisation problems. Robot placement is treated as an optimisation problem through Integer Linear Programming. Exact and approximated algorithms are compared on particular scenarios.

Conclusions
Some conclusions arise from this brief survey of recent works. Firstly, it is remarkable that, despite being the most restrictive class of exploration algorithms, the exploration strategies based on continuous connectivity are prevalent in applications where real-time image streaming is needed (e.g., search and rescue), or simply when human operators at the base station need to enforce timely information updates, or even when a high level of coordination is needed (i.e., when globally shared knowledge between robots is assumed). Additionally, robustness is also highly appreciated in hostile or inaccessible scenarios. In these missions, fault tolerance is typically achieved by adding redundancy (e.g., systems that guarantee k-connected time-varying network topologies) and by employing distributed systems.
Nevertheless, when these strong requirements do not condition the mission, event-based connectivity, which is less restrictive than the former concerning fleet mobility, seems more appropriate. Moving up from essential aspects such as communication to the top of the software architecture stack, there exists a large set of distributed reactive and behaviour-based proposals. Compared to centralised approaches, distributed approaches have the advantage of not presenting a single point of failure. However, in many cases they suffer from deadlocks at the individual or collective level.
Market-based coordination methods represent another popular option. There exists a wide variety of implementations that mainly differ from each other in the way the bids are computed by the robots (e.g., single-item or multiple-item auctions). These differences are not insignificant and typically trade simplicity and computational efficiency for proper coordination and local-optima avoidance. Besides, since each auction involves a period of synchrony between robots, fully asynchronous market-based systems are not feasible. Nevertheless, asynchronous systems may be advantageous over those that periodically ask the robots to wait for others before making a decision.
Finally, in communication-restricted environments, there seems to be a general agreement on the benefits of spreading out the fleet as long as the robots can regain connectivity in case of disconnection. From this, and trying to balance these potentially opposed goals, some multi-objective utility-based approaches have been proposed. Also, defining multiple roles (including communication relays) has proven to be a worthy strategy to address the multi-robot exploration problem when communication restrictions are present.
In conclusion, the survey suggests that in the context of decentralised systems there is room to try new ideas related to connectivity-regaining policies and rendezvous places. On the one hand, the event-based connectivity framework imposes the execution of connectivity-regaining actions in the presence of certain events. On the other hand, rendezvous-based approaches imply the definition of particular meeting points where robots have to meet in order to regain connectivity. Leaving aside the fact that the selection of these places could be a hard issue in itself, once the connectivity-regaining action is triggered and the meeting place is known by the robots, they must interrupt their exploration plans, deviating from their current trajectories in order to accomplish the new goal. This action probably leads to global time performance degradation and increased individual energy consumption. However, what would happen if robots were only influenced to keep or recover connectivity at all times, instead of being required to regain connectivity? Furthermore, what would happen if they were free to meet by chance, having been motivated to stay close but without having to meet at specific places?

Contributions
This work tries to answer these research questions through the development of a novel multi-objective approach where the robots, when selecting their targets, always consider travelling costs and the opportunity cost of keeping connected or regaining connectivity. A simple yet useful model for the signal strength and attenuation effects provides the robots with connectivity awareness. Thereupon, connectivity level measurements and path costs are considered together in a task utility function for finding solutions with a proper balance between the benefit of visiting the closer targets and the usefulness of keeping the team connectivity level as high as possible.
For the sake of robustness, a decentralised approach is followed. Robots make decisions asynchronously, addressing coordination implicitly through the exchange of localisation and mapping data. The human operator is asked to use his application-field expertise to play a part in the task assessment process.
The main contributions of this proposal can be summarised as follows.

Ease to Deploy and Flexibility
In order to establish the task selection criterion, the human operator only needs to choose the extra distance he is willing to ask the robots to travel in order to keep or enlarge the connectivity level of the fleet. From this criterion, and through formal analysis, the weights of these potentially conflicting objectives are derived. This way, the robots can deal with communication constraints by auto-adjusting the weight of each objective in a more intuitive manner. Furthermore, by eliminating the need for training stages, the system becomes more adaptable to different environments.

Good Performance
Asynchronism is taken as a natural way of avoiding waiting times before making a decision, as well as of decreasing the number of robots that are simultaneously making a decision. Since the task allocation computation strongly depends on the number of robots under consideration, asynchronism also makes optimal decisions linearly computable most of the time. As a consequence, robots can compute optimal tasks-to-robots distributions in a short time, achieving high levels of dispersion efficiently. Besides, regarding reconnections, the proposal consists of a rendezvous policy where the locations of the selected tasks become the meeting points themselves, avoiding deviations from the planned paths. Compared with others, the proposed approach is capable of decreasing the duration of disconnection periods without noticeable degradation of the exploration completion time.

Outline
The present document is organised as follows. Section 2 provides the exploration problem formalisation including models and goals. Next, an Auto-Adaptive Multi-Objective (AAMO) task selection approach, as well as the task allocation algorithm and the decentralised coordination mechanism, are thoroughly described in Sections 3-5, respectively. Experimental results related to a baseline and to the AAMO approach itself are discussed in Section 6. Finally, the document is concluded highlighting some future research directions in Section 7.

Problem Formulation
This section defines the instance of the multi-robot exploration problem, which constitutes the basis for the proposal formulated in this work. All particular assumptions are mentioned throughout the following sections. Firstly, the environment, robot and communication models are defined. Namely, some real communication constraints are taken into account and formalised into the model. A task definition is given as well as the task identification method. Finally, the global exploration objectives are stated.

Environment Model
The environment E is defined as a bounded planar workspace E ⊆ R 2, previously unknown. Besides, E is represented by an occupancy grid structure [4] where each cell c can belong to three different probabilistic states S = { f , o, u}, standing for free, occupied and unknown, respectively. Typically, P(state(c) = f ) = 1 − P(state(c) = o) is assumed. When |P(state(c) = f ) − 0.5| < ε, the cell c is labelled as unknown; otherwise it is labelled as free or occupied, accordingly. These states represent all possible theoretical situations in which a point of the environment can be classified over time. The mapping algorithm frequently updates the probability value of each cell on each robot. Despite this, only the current classification of each cell at a given decision time step is considered. Consequently, the representation of E belongs to the domain of matrices S m×n . Furthermore, the region already explored, E known , and the remaining region yet unexplored, E unknown , at time t may be defined from this representation as follows: E known (t) = {c ∈ E | state(c) ≠ u} and E unknown (t) = E \ E known (t).
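The three-state cell classification above can be sketched as follows; the grid values and the threshold ε (here `EPS = 0.05`) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative uncertainty band around P = 0.5 below which a cell stays unknown.
EPS = 0.05

def classify(p_free: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Map a grid of P(state(c) = f) values to the labels 'f', 'o', 'u'."""
    labels = np.full(p_free.shape, "u", dtype="<U1")   # default: unknown
    labels[p_free - 0.5 >= eps] = "f"                  # confidently free
    labels[0.5 - p_free >= eps] = "o"                  # confidently occupied
    return labels

grid = np.array([[0.9, 0.5, 0.1],
                 [0.52, 0.48, 0.05]])
print(classify(grid))
```

Cells whose probability stays within ε of 0.5 (such as 0.52 above) remain labelled unknown, matching the condition |P(state(c) = f) − 0.5| < ε.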

Robot Model
Given a robot team R = {R 1 , R 2 , . . . , R M } consisting of M homogeneous circular rigid mobile robots with wireless communication capabilities, a traditional representation defines each robot as R i = ⟨X i (t), r i , s i , c i ⟩, i ∈ [1..M], where X i (t) = {x i (t), y i (t), θ i (t)} represents the configuration vector of robot i at time t (the position of its centre and its heading with respect to the inertial frame), r i represents the radius of the robot body, and s i , c i represent the sensory capabilities as the maximum sensing radius and the maximum communication range, respectively.

Communication Model
This model aims to support the connectivity awareness ability of robots needed to deal with disconnection situations during the exploration. Given the position of their teammates and obstacles, robots can estimate the connectivity degree of a specific location considering some of the communication constraints that are widely present in real scenarios, mainly indoor (e.g., office-like and buildings).
The signal strength function (Γ i represents a slight adaptation of the signal strength function presented in [38]) Γ i : N × S m×n × R → R is defined as follows: Γ i (j, E known (t), t) = Γ 0 i − d Att − w Att , with d Att = D a f · log(d i (j, t)) and w Att = min(w i (j, E known (t), t), C) · W a f , where d Att and w Att stand for the distance attenuation and wall attenuation terms, respectively. In addition, d i (j, t) represents the Euclidean distance between two robot locations at time t, typically the transmitter (X i (t)) and receiver (X j (t)); w i (j, E known (t), t) represents the number of walls (robots cannot distinguish between different kinds of rigid obstacles, but the term wall is used for simplicity and in order to be consistent with the underlying proposal) present in the known region between transmitter and receiver locations at time t; D a f represents a distance attenuation factor; and W a f represents a wall attenuation factor. Finally, C represents the maximum number of walls up to which the W a f factor causes a significant effect in function Γ i . When w i (j, E known (t), t) ≥ C, the distance attenuation effect dominates. Finally, note that in [38] the independent term Γ 0 i is suggested to be either derived empirically or obtained directly from the wireless network device specification. Nevertheless, in this work the model is adapted in order to become independent from specific deployments (communication devices), deriving the Γ 0 i value so that the signal strength Γ i (j, E known (t), t) = 0 when d i (j, t) = c i . In Figure 2 the shape of the function Γ i , as well as the attenuation effects caused by both distances and walls, are plotted. Unfortunately, due to uncertain and incomplete knowledge, the Γ i function can only either confirm the absence of connectivity or deliver an optimistic estimate of the connectivity level.
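A rough sketch of this signal model follows: distance attenuation plus a wall attenuation term capped at C walls, with Γ0 derived so that the strength reaches zero exactly at the communication range c_i. The parameter values (`d_af = 20.0`, `w_af = 3.0`, `c_max = 4`) are illustrative assumptions, not values from the paper.

```python
import math

def gamma(d: float, walls: int, c_i: float, d_af: float = 20.0,
          w_af: float = 3.0, c_max: int = 4) -> float:
    """Sketch of Gamma_i: Gamma0 - distance attenuation - capped wall attenuation.

    Gamma0 is derived so that gamma(c_i, 0, c_i) == 0, making the model
    independent of specific communication devices, as described above.
    """
    gamma0 = d_af * math.log10(c_i)             # zero signal at d == c_i
    d_att = d_af * math.log10(max(d, 1e-9))     # distance attenuation term
    w_att = min(walls, c_max) * w_af            # wall attenuation, capped at C
    return gamma0 - d_att - w_att
```

Beyond `c_max` walls the wall term saturates, so the distance attenuation effect dominates, consistent with the role of C in the text.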
Although this model represents a valuable improvement in relation to others (e.g., the classic disk or line of sight models [20]), for the sake of simplicity other impairments also common in communication (e.g., bandwidth, information losses, fading, and multi-path propagation phenomenon [39,40]) are not considered in this work.

Task Identification Method
The task identification problem is addressed following a frontier point approach [5], where the free cells (cf. Section 2.1) that belong to a frontier are labelled as frontier points (FP). Besides, the resulting set of FP cells is clustered (using procedures such as K-Means [41] or Affinity Propagation [42]) in order to identify the cells that best represent each frontier, defining a set of tasks (in the remainder of the document, the terms task and target are used interchangeably) T = {T 1 , T 2 , . . . , T N } | T j ∈ R 2 , ∀j ∈ {1 . . . N}. Thus, T represents, at each moment, the smallest set of promising locations that the robots could be interested in visiting to explore all frontiers. In Figure 3 these task cells are coloured in yellow.
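A minimal sketch of this identification step: free cells adjacent to unknown cells are marked as frontier points, and each 4-connected frontier component is reduced to its centroid, standing in for the K-Means / Affinity Propagation clustering mentioned above.

```python
import numpy as np
from collections import deque

def frontier_tasks(grid: np.ndarray) -> list[tuple[float, float]]:
    """Return one task (centroid) per 4-connected frontier component."""
    rows, cols = grid.shape

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

    # Frontier points: free cells with at least one unknown neighbour.
    frontier = {(r, c) for r in range(rows) for c in range(cols)
                if grid[r, c] == "f"
                and any(grid[n] == "u" for n in neighbours(r, c))}

    tasks, seen = [], set()
    for cell in frontier:                       # flood-fill each component
        if cell in seen:
            continue
        comp, queue = [], deque([cell])
        seen.add(cell)
        while queue:
            cur = queue.popleft()
            comp.append(cur)
            for n in neighbours(*cur):
                if n in frontier and n not in seen:
                    seen.add(n)
                    queue.append(n)
        tasks.append(tuple(np.mean(comp, axis=0)))  # component centroid
    return tasks
```

Each returned centroid plays the role of a task T j: a representative location from which the whole frontier component can be approached.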

Multi-Robot Task Allocation Problem-MRTA
Following the classification proposed in [10], the MRTA problem to be tackled is described as a single-task robots (ST), single-robot tasks (SR), instantaneous assignment (IA) problem. ST means that each robot is able to visit at most one task at a time. SR means that each task requires only one robot to be explored. IA means that the available information about the robots, the tasks, and the environment permits only an instantaneous allocation of tasks to robots, preventing the possibility of planning future allocations. Additionally, an ST-SR-IA problem can be formulated as an instance of the well-known Optimal Assignment Problem (OAP) as follows. Given M robots, N tasks, and utility estimates U ij for each of the MN possible robot-task pairs, the goal is to assign tasks to robots so as to maximise the overall expected utility. Finally, from an Integer Linear Programming perspective, the problem can be formalised as: find the MN non-negative integers α ij that maximise ∑ i ∑ j α ij U ij (2), subject to ∑ j α ij ≤ 1 ∀i, ∑ i α ij ≤ 1 ∀j, and α ij ∈ {0, 1}.
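The ST-SR-IA assignment can be sketched by brute force over task permutations; this is only illustrative (a real system would use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`), and for brevity it assumes no more robots than tasks.

```python
from itertools import permutations

def optimal_assignment(U: list[list[float]]) -> tuple[dict[int, int], float]:
    """Maximise total utility with each robot getting at most one task and
    each task at most one robot (the OAP described above).

    U[i][j] is the utility of robot i performing task j.
    Assumes len(U) <= len(U[0]) (no more robots than tasks) for brevity.
    """
    m, n = len(U), len(U[0])
    k = min(m, n)
    best, best_assign = float("-inf"), {}
    for tasks in permutations(range(n), k):     # candidate tasks per robot
        total = sum(U[i][t] for i, t in enumerate(tasks))
        if total > best:
            best, best_assign = total, dict(enumerate(tasks))
    return best_assign, best

U = [[3.0, 1.0],
     [2.0, 4.0]]
print(optimal_assignment(U))   # robot 0 -> task 0, robot 1 -> task 1
```

Note that the greedy choice (robot 0 takes task 0 with utility 3, robot 1 is left with task 1) happens to coincide with the optimum here; in general the exhaustive (or Hungarian) search is what guarantees the maximal sum.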

Global Objectives
The exploration aims for the full coverage of a bounded indoor environment, a priori totally unknown, with a team of terrestrial robots, in minimal time and avoiding isolation situations as much as possible. In this context, isolation refers to the fact of being unconnected from any other fleet member. In this work, the multi-robot system is designed to address these objectives from the following definitions.

Full Coverage
Given the E known and E unknown previously defined in Section 2.1, it is possible to claim that the completion condition is reached when E = E known or, equivalently, E unknown = ∅. Although this condition is straightforward, it is useless in practice. Alternatively, the completion condition is conceived considering the sensing activity of the robots over time. Let sen i (t) = sen(X i (t)) be the information gathered by robot i at time t in the configuration X i (t). From this, E known at completion time T is defined as follows: E known (T) = ∪ i∈[1..M] ∪ t∈[0,T] sen i (t) (3). Finally, the completion condition may be written as in (4), sen(X) ⊆ E known (T) for every reachable configuration X, implying that there are no reachable configurations where any robot can gather new information.

Completion Time Optimisation
In addition to full coverage, the multi-robot system is asked to perform the exploration in minimal time. Therefore, from (4), the minimal completion time condition can be expressed as: T* = min {T | E known (T) satisfies (4)} (5).

Isolation Avoidance
In multi-robot exploration missions, individual isolation situations (when a robot becomes unconnected from every other) are undesirable. The key motivations to avoid them are: (i) when robots are unconnected, they have no possibility of coordinating their actions, and hence could visit the same regions; therefore, keeping the fleet connected is a way to decrease inefficiency; (ii) damage or internal failures during isolation periods can lead to information losses; therefore, keeping the fleet connected is also a way to decrease the risk of re-work and, consequently, to prevent time performance degradation.
Thus, in addition to (5), the duration of possible individual disconnections should be minimised. To this end, concepts from graph theory are borrowed in order to model a time-varying network topology of mobile robots. Such a network is represented by an undirected graph defined as G(t) = (V, E (t)), where the nodes V = {1 . . . M} represent the robots R i | i ∈ [1..M] and the edges E (t) = {i, j | i, j ∈ V, j ∈ N i (E known (t), t)} represent the operative communication links between any pair of robots R i , R j .
The function N i (E known (t), t) = {j | Γ i (j, E known (t), t) > 0} computes the neighbours of a robot i at time t. From this, it is possible to define the isolation situations of any robot i as the periods when the corresponding node i has no incident edges (degree(i) = 0). Furthermore, isolation situations may repeat several times along the exploration.
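The degree-based isolation check can be sketched as follows. For simplicity this sketch uses a plain disc model (an edge exists when two robots are within `comm_range`) as a stand-in for the paper's edge condition Γ i > 0.

```python
def isolated_robots(positions, comm_range):
    """Build the undirected graph G(t) under a simplified disc model and
    return the indices of robots with degree 0 (isolated robots)."""
    m = len(positions)
    degree = [0] * m
    for i in range(m):
        for j in range(i + 1, m):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if (dx * dx + dy * dy) ** 0.5 <= comm_range:  # edge {i, j}
                degree[i] += 1
                degree[j] += 1
    return [i for i in range(m) if degree[i] == 0]

# Robots 0 and 1 are within range of each other; robot 2 is isolated.
print(isolated_robots([(0, 0), (3, 0), (50, 50)], comm_range=5.0))
```

Replacing the distance test with `gamma(...) > 0` would recover the signal-based neighbourhood N i used in the paper.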
In Figure 4 an example of an exploration timeline concerning disconnections is depicted.

From this model, the expression for the disconnection duration optimisation may be obtained as follows: min ∑ k (dE k − dS k ), where k indexes the disconnection events, and dS k and dE k represent the starting and ending times of disconnection event k, respectively.
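The quantity being minimised is simply the sum of disconnection durations over logged events; a minimal sketch, assuming events are recorded as (start, end) pairs:

```python
def total_disconnection_time(events):
    """Sum of disconnection durations over all events k: sum_k (dE_k - dS_k).

    `events` is an iterable of (start, end) time pairs, one per
    disconnection event; minimising this sum is the objective above.
    """
    return sum(end - start for start, end in events)

# Two disconnection events: [2, 5] lasting 3, and [10, 12] lasting 2.
print(total_disconnection_time([(2, 5), (10, 12)]))  # 5
```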

Auto-Adaptive Multi-Objective Task Selection Approach
In this section, a novel multi-objective approach for multi-robot exploration missions is introduced. As mentioned above, in exploration missions the best choice for the robots is to visit the places where the gain of information can be potentially higher. Gaining information is, actually, the only way to conclude the exploration task. Therefore, the connection between path-cost-based target selection strategies and the completion time performance obtained resides in the fact that, this way, the fleet expands its territorial knowledge potentially faster. Besides, when the environment presents communication restrictions, individual failures or coordination errors are more likely to lead to inefficiency.
In order to make the system robust and efficient, a decentralised and asynchronous coordination mechanism is defined. An auto-adaptive multi-objective task utility function is defined in accordance with both the task identification method presented in Section 2.4 and the objectives of the exploration problem defined in Section 2.6. Its primary purpose is to integrate travelling costs and connectivity levels finding solutions with a right balance between the benefit of visiting the closer targets and the usefulness of keeping the team connectivity level as high as possible.
Furthermore, to make the system more flexible, an analytic approach is followed through which the relative importance of each goal is set independently of the scenarios. As a result, an auto-adaptive procedure is developed where the human operator is asked to use their application field expertise to influence the robot decisions, defining a criterion to balance the importance of both objectives. Several proofs of correctness of such a procedure are conducted, demonstrating that the robots are always capable of auto-adapting the objective weights to select the tasks in accordance with the human-operator criterion.

Task Utility Function
This function will guide the optimal task distribution search towards well-balanced solutions where both the travelling cost and the team connectivity level are considered to evaluate the current targets. The objectives are implemented using utility functions: (i) the path utility function takes the travelling costs to deliver a notion of how beneficial, concerning distance, the tasks under consideration are; (ii) the connectivity utility function gives the robots a connectivity awareness ability.
The task utility function Φ_i : T × R^M × S^{m×n} → [0, 1] is defined as follows:

Φ_i(T_j, R, E_known) = α Ψ_i(T_j, E_known) + β Ω_i(T_j, R, E_known)    (7)

s.t. α + β = 1, with α, β ∈ [0, 1].
Given the current state of the fleet R and the current environment knowledge E_known, the function Φ_i estimates the utility obtained by a robot R_i in case of selecting the task T_j. The current fleet state refers to the location of the assigned tasks in case of assigned robots, and to the robot positions otherwise. The terms Ψ and Ω represent the path utility and connectivity utility functions, respectively. The weights α and β work as tuning parameters that permit adjusting the kind of solutions the system will search for. If α = 1 during the whole exploration, then the system would only intend to spread out the fleet. On the contrary, if α = 0 then the system would always search for potentially fully connected solutions. Otherwise, when 0 < α < 1, the system will balance both path utility and connectivity utility. As a result, the robots could sometimes choose tasks other than the closest in order to favour the team connectivity level. This possibility is deeply analysed further below in Section 4.
Although in this double-objective function the symbol β could be substituted by 1 − α, it is preserved for the sake of generality: if the weighted sum had more than two terms, it would not be possible to express all weights as functions of α.
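A minimal sketch of the weighted-sum form of Φ under the two-term case discussed above; the function and argument names are illustrative, not the paper's implementation:

```python
def task_utility(psi, omega, alpha):
    """Phi = alpha * Psi + beta * Omega, with beta = 1 - alpha
    (two-objective case, alpha in [0, 1])."""
    beta = 1.0 - alpha
    return alpha * psi + beta * omega

# alpha = 1: purely path-driven; alpha = 0: purely connectivity-driven.
print(task_utility(0.8, 0.2, 1.0))  # 0.8
print(task_utility(0.8, 0.2, 0.0))  # 0.2
# 0 < alpha < 1 balances both terms.
print(task_utility(0.8, 0.2, 0.5))  # 0.5
```

With Ψ and Ω both mapped to [0, 1] and α + β = 1, the result is guaranteed to stay in [0, 1] as the definition of Φ requires.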

Path Utility
Path utility measures the relative effort needed for a robot to reach a task from its current location. The path utility function Ψ_i : T × S^{m×n} → [0, 1] is defined as follows: Given the current environment knowledge E_known, the function Ψ_i estimates the path utility obtained by a robot R_i in case of selecting the task T_j. The parameter γ works as a shaping factor that could be used to tune the relation between distance and utility. The ordered sequence of waypoints wp_k represents the shortest path between the robot configuration X_i and the target T_j. All segments (wp_k, wp_{k+1}) are safe given that they are always built regarding only the collision-free pathways present in the known region E_known. The wavefront propagation method proposed by [43] is employed to determine the waypoint sequence. The shape and behaviour of the Ψ function are depicted in Figure 5.
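The waypoint distances underlying Ψ can be obtained with a wavefront (breadth-first) propagation over the known occupancy grid. The sketch below assumes a simple 0/1 grid encoding and 4-connectivity; the actual grid representation and the exact shape of Ψ (governed by γ) are not reproduced here:

```python
from collections import deque

def wavefront_distance(grid, start, goal):
    """BFS wavefront over free cells (0 = free, 1 = obstacle);
    returns the shortest path length in cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
               and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    return None  # goal not reachable through collision-free cells

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(wavefront_distance(grid, (0, 0), (2, 0)))  # 6
```

A decreasing map of this distance into [0, 1], e.g. an exponential decay shaped by γ (a common choice, not necessarily the one used for Figure 5), then yields the path utility.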

Connectivity Utility
Connectivity utility computes, optimistically, the connectivity level present in a location at a certain moment. The connectivity utility function Ω_i : T × R^M × S^{m×n} → [0, 1] is defined as follows: Given the current state of the fleet R and the current environment knowledge E_known, the function Ω_i estimates the connectivity utility obtained by a robot R_i in case of selecting the task T_j. Particularly, it is interesting to do so concerning the arrival time to T_j. The current fleet state refers to the location of the assigned tasks in case of assigned robots, and to the robot positions otherwise. The parameter ρ works as a shaping factor that could be used to tune the relation between connectivity level and utility. Note that the utility decreases with the number of robots, which may favour the adoption of MANET-compliant connectivity techniques. In such networks, messages travel from source to destination members in more than one hop, with intermediate nodes forwarding messages until the destination is reached. The shape and usefulness of the Ω function may be appreciated in Figure 6. Therefore, as long as another robot intends to keep connected with the fleet, it will be able to take this perspective into account when deciding where to go.
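Since the exact expression of Ω is not reproduced above, the following is only a hypothetical monotone form consistent with the description: utility grows with the number of operative links predicted at the target, is shaped by ρ, and decreases with fleet size for a fixed link count.

```python
def connectivity_utility(n_links, fleet_size, rho=1.0):
    """Hypothetical Omega: fraction of possible links at the target,
    shaped by rho. For a fixed n_links it decreases as the fleet grows."""
    if fleet_size < 2:
        return 0.0
    return (n_links / (fleet_size - 1)) ** rho

# One link out of two possible teammates:
print(connectivity_utility(1, 3))  # 0.5
# The same single link in a larger fleet yields a lower utility:
print(connectivity_utility(1, 5))  # 0.25
```

Under this form, the single-link utility Ω_1 used later in the α derivation would correspond to `connectivity_utility(1, fleet_size, rho)`.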

General Considerations
The definition of multi-objective weights is usually accomplished as an empirical matter. Typically, a search process is run in order to find, after many trials, values that fit some optimal criteria. These kinds of methods are typically used when the parametric function is planned to be used many times. However, in the exploration context, this assumption, or even the possibility of running trials, is frequently out of the question. It is not possible to assume that all scenarios where the exploration will be conducted will be similar to each other and, for this reason, it is not possible either to assume that the best α and β values can remain unchanged.
Furthermore, when these procedures are followed, at the end of the training stage it is often tough to associate the resultant parameter values with real aspects of the problem (e.g., performance metrics like time, distances, energy, or even connectivity levels). This lack of understanding may, in turn, wrongly influence the fine tuning of such parameters unless a portion of the trials is rerun. Taking those shortcomings into account, an analytic approach, through which the α and β values might be set independently of the scenarios, is explored.

Adaptive α-Value Computation
When a multi-robot exploration process is going to run under communication-constrained conditions, choosing between exploring only and exploring while preserving the connectivity level is a crucial decision. The first choice would be suitable when connectivity is out of the question, or when it is impossible for a robot to keep connected and explore at once. In such a case, connectivity does not play any role in the decision-making process. On the contrary, the second choice is suitable when it is necessary to interleave high-performance exploration (minimising the total exploration time) and an acceptable connectivity level (avoiding robot isolation as much as possible).
To this end, the human operator is allowed to use their application field expertise to influence the robot decisions, defining a criterion to balance the importance of both objectives, by merely setting a parameter before the exploration starts.
Therefore, since α and β parameters determine the behaviour of the robots concerning target selection, two questions come up: (i) How can the value of those parameters be defined in order to ensure the applicability of the human-operator criterion along the exploration process? (ii) Should these values be adapted during the exploration process?
Henceforth, the task selection framework and the human-operator criterion are formalised. Besides, several proofs to demonstrate the existence and correctness of an adaptive α-value that makes the robots behave following the criterion mentioned above are conducted.

Task Selection Framework
This process is always performed iteratively from a list, comparing the currently best task against the rest, one by one. Therefore, without loss of generality, the most relevant aspects can be studied just by analysing all the possible relations between an arbitrary pair of tasks. Regarding the distance to a specific robot location and the connectivity level (number of connections with the rest of the fleet), any task can be classified according to Table 1. Table 1. Task classification.

Connectivity: Connected (Co) | Not Connected (NC)
Distance: Closest (Cl) | Furthest (F)

Therefore, the meaning of these categories is straightforward: regarding the assignment of the fleet, Co means that the task location would offer the robot at least the minimum level of connectivity (i.e., one connection to another fleet member), and NC means the opposite; regarding the spatial distribution of tasks, Cl means that the task under consideration is closer to the robot than any other, and F means that the task is further from the robot than any other task.
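The classification of Table 1 can be sketched as follows; the Euclidean distance and the `link_level` predicate are illustrative stand-ins for the path distance and connectivity estimates used in the paper:

```python
import math

def classify(task, robot, tasks, link_level):
    """Task classes of Table 1: Cl/F by distance rank among all tasks,
    Co/NC by whether the target location offers at least one link."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    ds = [dist(robot, t) for t in tasks]
    d = dist(robot, task)
    pos = "Cl" if d == min(ds) else ("F" if d == max(ds) else "mid")
    con = "Co" if link_level(task) > 0 else "NC"
    return pos + "/" + con

robot = (0.0, 0.0)
tasks = [(1.0, 0.0), (5.0, 0.0)]
links = {(1.0, 0.0): 0, (5.0, 0.0): 1}  # only the far task keeps a link
print(classify(tasks[0], robot, tasks, links.get))  # Cl/NC
print(classify(tasks[1], robot, tasks, links.get))  # F/Co
```

The example reproduces the conflicting [Cl/NC;F/Co] situation analysed below, where the closer task loses connectivity and the connected task is further away.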
Moreover, let R_i be a robot and T_j and T_k two tasks such that class(T) can belong to any class defined in Table 1. In any scenario, these tasks can be related to each other according to Table 2. Given that T_j and T_k are arbitrary tasks, the matrix can be considered symmetric. Thus, taking one of the triangular matrices is enough to study all possible cases.
From the lower triangular, it is possible to identify some cases where one task is better (regarding both path utility and connectivity utility) than the other. Such an example is the [Cl/Co;F/NC] case, where T_j is closer to the robot than T_k and is also the only one that keeps the robot connected. Similarly, in the [Cl/NC;F/NC] case neither task can keep the robot connected and, in consequence, the closest task T_j is more convenient than T_k. Thus, in both previous cases, the criterion to choose a task is clear: the closest task should be selected. However, in the other cases, it is not clear at all which task should be selected. In one case, [Cl/NC;F/Co], either selection implies traversing longer distances or losing connectivity. In the other case, [Cl/Co;F/Co], selecting the closest task T_j ensures traversing the shortest path but could imply losing connectivity. By contrast, selecting the furthest task T_k would be acceptable only when the gain in connectivity outweighs the more significant travelling effort. In other words, the human-operator criterion is determined by setting the distance threshold up to which the targets that preserve or enlarge connectivity are preferred over the rest, for all robots.
For instance, in the [Cl/NC;F/Co] case the selection will be conditioned as follows: T k will be selected if and only if the length of the shortest path between T k and the robot location is less than or equal to HO-Threshold. T j will be selected otherwise.
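As a sketch, the [Cl/NC;F/Co] decision rule reduces to a threshold test on the path length of the connected task (function and argument names are hypothetical):

```python
def select_cl_nc_vs_f_co(d_closest_nc, d_furthest_co, ho_threshold):
    """HO criterion for the [Cl/NC;F/Co] case: prefer the connected
    task T_k iff its shortest-path length is within the HO-Threshold;
    fall back to the closest (disconnected) task T_j otherwise."""
    return "T_k" if d_furthest_co <= ho_threshold else "T_j"

# Connected task 5 units away, threshold 6: travel the extra distance.
print(select_cl_nc_vs_f_co(2, 5, 6))  # T_k
# Beyond the threshold, the closest task wins back.
print(select_cl_nc_vs_f_co(2, 8, 6))  # T_j
```

Note that the distance of the closest task plays no role here: the threshold alone bounds how much extra travel connectivity is worth.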
In order to make the influence of the HO-Threshold clearer, an example scene is depicted in Figure 7. Note that all tasks are within the HO-Threshold, but only T_3 can enlarge the connectivity level of the robot R_1. Thus, applying Definition 1 leads to the selection of task T_3 because it enables the robot R_1 to travel more distance to gain connectivity. On the contrary, if the HO-Threshold ≤ 3, T_3 would no longer be preferred over the rest, and consequently the closest task T_2 would be selected instead.
Hence, in the presence of some specific conditions, it is expected that the application of the HO criterion can make the fleet more cohesive than approaches that do not take communication constraints into account, and less restrictive than those that do not permit disconnections or that force re-connections.
Next, the proofs of correctness and existence of α (and β) values that implement the HO criterion are conducted regarding the cases present in the lower triangular of Table 2.

Figure 7. Two robots are carrying out an exploration mission. The communication and sensory ranges are drawn around the robots with red and green dashed lines, respectively. It is assumed that R_2 has already chosen the task T_4 whereas R_1 is still selecting from T_1, T_2, and T_3. Dotted lines are used to show the sight-line between R_2 and the tasks. The corresponding Euclidean distance is also shown. HO-Threshold is set to 6.

[Cl/Co,F/NC] and [Cl/NC,F/NC] Cases
In Figure 8a,b, two instances of these cases are depicted, respectively.

Proof. This claim can be derived directly from the following facts:
• in the [Cl/Co,F/NC] case, the furthest task T_k makes the robot disconnected, and then applying (7) to T_j and T_k leads to Φ_i(T_j) = αΨ_i(T_j) + βΩ_i(T_j, R) > αΨ_i(T_k) = Φ_i(T_k), since Ω_i(T_k, R) = 0 and Ψ_i(T_j) > Ψ_i(T_k);
• in the [Cl/NC,F/NC] case, both tasks make the robot disconnected, and thus the Φ function value will depend only on the Ψ term: Φ_i(T_j) = αΨ_i(T_j) > αΨ_i(T_k) = Φ_i(T_k).
In conclusion, in any of these cases, the task selection is not affected by α.

[Cl/Co,F/Co] and [F/Co,Cl/NC] Cases
In the [Cl/Co,F/Co] case, both tasks offer the possibility to remain connected. On the contrary, in the [F/Co,Cl/NC] case opposite objectives are present: one task is closer but disconnected, while the other is connected but further. Thus, the latter case is taken to prove the existence of an α that can respect any given HO criterion. The former case is finally used to corroborate the non-existence of any possible unwanted side effect caused by the achieved α expression.

[F/Co,Cl/NC] case.
Based on the human-operator criterion (set by a threshold value) we want an α-value that makes, following the scenario depicted in Figure 9, T 3 preferred over T 2 if and only if T 3 belongs to the circular area defined by the HO-Threshold. Next, the existence of such an α parameter will be demonstrated, and its value will be derived as well.

Proposition 2.
When T_j and T_k belong to [F/Co,Cl/NC], it is always possible to find an α-value that satisfies the following inequality:

Proof. Let Ω_1 be the utility assigned to the fact of being connected with only one teammate. Then, from (9), it is possible to state that this holds whenever T_j belongs to any [*/Co] class. Moreover, if the number of robots does not change, it is also possible to state that Ω_1 remains invariant over time. Applying this result to (12) leads to the inequality presented next in (13). Besides, after substitution, (13) may be rewritten as in (14), from which it is possible to claim the existence of an α-value that obeys any HO-Threshold if and only if the function x/(u+x) presents an absolute minimum on the domain D. This fact can be stated employing the Weierstrass theorem (a function f has an absolute extreme if it is continuous and its domain is compact). Besides, the minimum point might be calculated analysing both: (i) the relative extrema and (ii) the points lying on the border of D. Following this procedure, it is possible to find the absolute extreme of the function x/(u+x) at (x, u) = (c, 1). Moreover, it is remarkable that this extreme represents a place where the most demanding conditions are reached: task T_j presents the lowest positive connectivity utility, and the distance between both tasks is the largest. Hence, the existence of a positive value α ≤ Ω_1/(1+Ω_1) (regardless of how demanding the distance relation between tasks can be) that might alter the task selection in favour of connectivity has been demonstrated.
Nevertheless, in (13) α is independent of the HO-Threshold. Consequently, its direct application would result in a strictly connectivity-guided exploration, where tasks that offer connectivity are always preferred over the rest no matter how far they are. Therefore, to relate it to an HO-Threshold, the value of the term Ψ_i(T_j) in (13) must be substituted by the utility of being HO-Threshold far from the robot, say Ψ_i(T_HO). Next, the value of the term Ψ_i(T_k) is substituted by 1, since Ψ_i(T_k) = 1 represents the necessary condition to reach the extreme coordinate u = 1 that arose from (14). Finally, the expression for an HO-Threshold dependent α, say α_HO, is expressed in (15) as follows:

α_HO = Ω_1 / (Ω_1 + 1 − Ψ_i(T_HO))    (15)

Proposition 3. The applicability of the α_HO referred to in (15) causes any task within the threshold scope that also offers any positive connectivity level to be favoured over the rest of the tasks that do not offer any connectivity level, regardless of how close to the robot they are.
Proof. The selection condition is imposed on any tasks (T_j, T_k) that respect the [F/Co,Cl/NC] conditions. Then, applying (15) leads to (16). Since (i) Ω_1 is constant, (ii) the remaining factor is ≥ 1, and (iii) (Ψ_i(T_HO) − 1) ≤ 0, it is possible to conclude when the upper bound is reached. Please note that case (a) is outside the proposition conditions. Instead, (16) can be rewritten imposing (b), leading to an expression which is true if, and only if, ∆_i(T_j) ≤ HO-Threshold, which is indeed what the human operator would like to get from applying his criterion to tasks within the HO-Threshold. Hence, following (15) under the [F/Co,Cl/NC] conditions, it is always possible to compute an α_HO-value that makes the robots behave following the human-operator criterion.
Likewise, it is important to highlight that the α_HO-value needs to be recalculated every time a robot is ready to make a decision. This need for adaptation arises from Ψ_i(T_HO), which is not constant: its value depends on the relation between the HO-Threshold and the relative distance to the current furthest task. That way, the robots can autonomously adapt the weights of the task utility function according to the changing conditions of the environment in order to remain consistent with the human-operator criterion.
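As an illustration of the per-decision recomputation, assume for α_HO the form Ω_1/(Ω_1 + 1 − Ψ_i(T_HO)) suggested by the surrounding derivation (this exact closed form is an assumption, not a verbatim reproduction of (15)):

```python
def alpha_ho(omega_1, psi_ho):
    """Assumed closed form for the HO-Threshold dependent alpha:
    alpha_HO = Omega_1 / (Omega_1 + 1 - Psi_i(T_HO)).
    Recomputed at every decision, since Psi_i(T_HO) changes."""
    return omega_1 / (omega_1 + 1.0 - psi_ho)

# As Psi_HO -> 1 (the threshold distance carries almost full path
# utility), alpha_HO -> 1 and path cost dominates the selection.
print(alpha_ho(0.5, 1.0))  # 1.0
# As Psi_HO -> 0 (a very permissive threshold), alpha_HO approaches
# the bound Omega_1 / (1 + Omega_1) found in the proof above.
print(alpha_ho(0.5, 0.0))
```

Whatever the exact expression, the dependence on Ψ_i(T_HO) is what forces the weights to be adapted online rather than fixed once before the mission.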

[Cl/Co,F/Co] Case
This analysis is devoted to checking the applicability of the α_HO when the conditions to achieve a good trade-off between path cost and connectivity level are less demanding than in the [F/Co,Cl/NC] case. In the [Cl/Co,F/Co] case, although one task is closer than the other, the differences in the positive connectivity level offered by them could make the furthest task more attractive than the closest. From that, considering the connectivity level offered by the closest, two cases may be identified: (i) When T_j offers a higher level of connectivity than T_k. In such a case, there is no doubt that, independently of the α_HO value, the selection would always favour the task T_j because it is the closest as well; (ii) On the contrary, when T_k offers a higher level of connectivity than T_j, the selection of T_k will depend both on how distant from the robot it is and on how much more connected the robot would be at T_k with respect to T_j.
Finally, to show that the α_HO value does not introduce any unwanted side effect on the task selection process when tasks belong to the [Cl/Co,F/Co] case, it is necessary to prove that it neither contradicts the first case nor restricts the occurrence of the second case.

Proposition 4.
In the presence of two tasks subject to the [Cl/Co,F/Co] case conditions, if T j is the closest and simultaneously the one which provides the highest level of connectivity, then the application of the α HO value will never result in the selection of T k .
Proof. By contradiction, it is assumed that under these conditions the selection could be in favour of T_k, implying that the following inequality holds: α_HO Ψ_i(T_k) + β Ω_i(T_k, R) > α_HO Ψ_i(T_j) + β Ω_i(T_j, R). This implies that, independently of the α_HO value, the terms (Ψ_i(T_j) − Ψ_i(T_k)) and (Ω_i(T_j, R) − Ω_i(T_k, R)) cannot be positive simultaneously. Thus, either (Ψ_i(T_j) ≤ Ψ_i(T_k)) or (Ω_i(T_j, R) ≤ Ω_i(T_k, R)). However, this contradicts the hypothesis where T_j is stated as the closest and, simultaneously, the one which provides the highest level of connectivity; accordingly, the proposition has been demonstrated.

Proposition 5.
In the presence of two tasks subject to the [Cl/Co,F/Co] case conditions, if T j is the closest and T k the one which provides the highest level of connectivity, then the application of the α HO value will never be conclusive concerning the task selection.
Proof. The relation between the utilities of the tasks is written in (18). From it, it is possible to state that, in order to favour the selection of T_j, the inequality (19) must hold; otherwise, (20) holds. On this domain, the function x/(u+x) presents an absolute maximum equal to 1 at the point (x, u) = (1, 0), and absolute minima equal to 0 along the line segment defined by (x, u) = (0, u). Assessing the α_HO expression derived in (15) at these points leads to (21) and (22), respectively. While the condition expressed in (21) is reached when |R| → ∞, the one expressed in (22) is reached when the HO-Threshold tends to 0. The condition (21) is unreachable in practice, implying that no α_HO can make the task T_k always preferred over T_j. Conversely, the condition (22) is reachable if, and only if, the human operator deliberately does not want to care about connectivity. Otherwise, there is no positive α_HO-value that can make the task T_j always preferred over T_k.
Consequently, when α_HO ∈ (0, 1] under the [Cl/Co,F/Co] conditions, it is not possible to hold a single preference over time.

Considerations and Usefulness
In order to establish the task selection criterion, the human operator only needs to choose, according to his expertise and knowledge, the extra distance (HO-Threshold) he is willing to ask the robots to travel in order to keep or enlarge the connectivity level of the fleet. Once the HO-Threshold is set, the robots are capable of selecting tasks consistently with the HO criterion following Equation (15). Furthermore, it is important to note that the HO-Threshold value does not change along the exploration but, as was pointed out, α_HO does, due to its dependency on the Ψ function. This explains the need for auto-adaptive capabilities concerning the multi-objective Φ function.
Additionally, it is also worth noticing that setting HO-Threshold = ∞ is a practical way to implement an event-based connectivity approach where the tasks that provide connectivity will always be preferred over the rest, no matter how close the others are.

Task Allocation Scheme
The allocation scheme is founded on two pillars: the coordination method and the task selection algorithm.

Coordination Method
In order to take advantage of the individual computing power of the robots, to avoid a single point of failure, and to deal better with the presence of real communication constraints during the exploration, a decentralised approach is followed. Typically, estimation of travelling costs and target benefits, as well as mapping and localisation, are the tasks chosen to be performed locally by the robots.
However, to achieve a cooperative behaviour, both the local map and localisation information must be shared among team members.
Additionally, the relation between |T| and |R| can result in two somewhat different behaviours: (i) If |T| < |R|, not all robots are needed to reach all targets, and some robots may choose to remain idle; (ii) When |T| ≥ |R|, all robots are needed in order to reach the maximum number of targets at a time. When robots decide to explore, the task selection is made in a coordinated fashion. Robots coordinate their actions implicitly, sharing specific information (such as locations, eventually already-made selections, and local maps) and running the same selection algorithm. Thus, it is possible for the multi-robot system to compute a coordinated tasks-to-robots distribution in a decentralised way [15,17,44].
To do so properly, the information exchange timing is carefully set up. The system is fully asynchronous, meaning that: (i) Robots do not wait for others; (ii) After selecting a task, the robots exchange their selection in order to prevent future overlappings; (iii) Local maps and, by means of these, the sets of new available tasks are exchanged each time two conditions are met: (1) a waypoint of the planned path is reached; (2) new information has been gathered; (iv) Localisation data is exchanged at a higher rate than maps because its influence on the task selection algorithm is higher too.
While localisation data is exchanged periodically, the rest of the data exchange is triggered by events instead. These policies make the system more efficient and flexible because: (i) No data is transmitted when there is no new information to exchange; (ii) There is no need to set up any rate parameter when exploring different environments. The robot life-cycle algorithm is sketched in Algorithm 1.

Algorithm 1 Robot life-cycle.
function LifeCycle(i, HO-Threshold)
    while exploration is not completed do
        R* ← ∅                                   ▷ Vector of connected robot locations
        for j ∈ R ∧ j ≠ i do
            if Γ_i(j, gMap) > 0 then             ▷ Connected robot
                rcvPose(j)                       ▷ Asking for localisation data
                sndPose(pose, j)                 ▷ Sending own localisation data
                gMap ← mapMerge(gMap, rcvMap(j)) ▷ Asking for local maps
            end if
        end for
        if atT then
            T ← getFrontierTasks(gMap)           ▷ Tasks location vector
            task ← getAssignment(i, R*, T, |R|, HO-Threshold)
            goto(task)
        end if
        [gMap, ni] ← mapMerge(gMap, getMap(pose))  ▷ Mapping
        if atT ∨ ni then                         ▷ R_i arrives at task or new information was gathered
            for j ∈ R* ∧ j ≠ i do
                sndMap(gMap, j)                  ▷ Sending local map
            end for
        end if
    end while
end function

Task Selection Algorithm
The task selection process employs the multi-objective utility function Φ defined in (7), with α_HO values dynamically adapted by (15), to solve the MRTA problem stated in Section 2.5. The corresponding algorithm is sketched in Algorithm 2.
Algorithm 2 Task selection algorithm.
 1: function getAssignment(i, R*, T, N, HO-Threshold)
 2:     [T_a, T_u] ← split(T)                  ▷ Assigned and unassigned tasks
 3:     [R_a, R_u] ← split(R*)                 ▷ Assigned and unassigned robots
 4:     T_HO ← prune(T_u, HO-Threshold)        ▷ Unassigned tasks within the threshold
 5:     for each k in R_u do
 6:         for each j in T_HO do
 7:             PU[k, j] ← Ψ_k(T_j)            ▷ Path utility matrix
 8:         end for
 9:     end for
10:     α_HO ← value computed according to (15)
11:     β ← 1 − α_HO
12:     for each arrangement a of T_HO over R_u do
13:         T2RDist ← T2RDist ∪ {a}            ▷ Set of tasks-to-robots distributions
14:     end for
15:     for each row r in T2RDist do
16:         evaluate r using Φ with α_HO and β
17:     end for
18:     T* ← task assigned to robot i in the best distribution
19:     return T*
20: end function

Firstly, the input parameter R* specifically corresponds to the locations of the teammates currently connected with the robot R_i. Next, in lines 2 and 3, both the task and robot location sets are split into two subsets each (assigned and unassigned items, respectively). Line 4 is in charge of taking only the unassigned tasks that are within the HO-Threshold scope of every robot. Afterwards, from lines 5 to 9, the path utility matrix is computed regarding all possible task-robot combinations. Next, lines 10 and 11 compute the α_HO and β values according to (15). The set of tasks-to-robots distributions is calculated from lines 12 to 14. Finally, from lines 15 to 17 all possible assignments are evaluated using the Φ function, while the task corresponding to robot i in the best assignment is selected in line 18.
Some considerations on Algorithm 2 are hereafter discussed. Concerning the computation of the set of tasks-to-robots distributions (lines 12 to 14), it provides a way to potentially avoid falling in local minima or even taking wrong decisions. Note that the connectivity utility function is subject to locality conditions and thus, it is not possible to compute optimal distributions from the application of iterative polynomial-time assignation algorithms such as the Hungarian method [14].
On the contrary, Algorithm 2 can choose the optimum tasks-to-robots distribution by evaluating all possible T_HO-to-R_u distributions. Nevertheless, this process may be potentially very hard, since the number of such distributions grows as the number of arrangements of |T_HO| elements taken |R_u| at a time. Therefore, the smaller |T_HO| and |R_u|, the faster the algorithm will run. In the first case, |T_HO| is bounded by pruning |T_u| with the help of the HO-Threshold.
On the contrary, even being naturally bounded (|R| ≥ |R*| ≥ |R_u|), the set R could imply a large R_u. Besides, all efforts are devoted to keeping the fleet connected as much as possible, leading to |R*| → |R|. Fortunately, in a fully asynchronous multi-robot system, the probability of two or more robots making a decision simultaneously is negligible.
Finally, note that Algorithm 2 assumes |T_HO| > |R_u|; otherwise, the tasks-to-robots distribution cannot be computed. In such a case, the input parameters are swapped in order to conduct a robots-to-tasks distribution instead. In turn, computing the corresponding number of arrangements does not represent a significant effort, since M_u ≥ N_HO holds only for small values.
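The growth of the exhaustive evaluation can be illustrated by enumerating the injective tasks-to-robots assignments with a toy counter (the function name is illustrative); the count equals the number of arrangements n!/(n−k)!:

```python
from itertools import permutations
from math import factorial

def count_distributions(n_tasks, n_robots):
    """Number of injective robots-to-tasks assignments:
    arrangements of n_tasks taken n_robots at a time."""
    return sum(1 for _ in permutations(range(n_tasks), n_robots))

# Matches n_tasks! / (n_tasks - n_robots)! and grows very fast,
# which is why pruning T_u with the HO-Threshold matters.
assert count_distributions(5, 3) == factorial(5) // factorial(2)  # 60
print(count_distributions(8, 4))  # 1680
```

Even modest values (8 tasks, 4 unassigned robots) already yield 1680 candidate distributions to evaluate with Φ, confirming the benefit of keeping |T_HO| and |R_u| small.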

Baseline Statement and AAMO Approach Results
The aims of this section are: (i) To establish a baseline on the main figures of merit that will be defined to assess the benefits of different approaches; (ii) To assess and analyse the performance of different instances of the Auto-Adaptive Multi-Objective (AAMO) approach (different instances, from now on, refer to different HO-Threshold setup values) under non-ideal communication conditions; (iii) To compare AAMO instances against other approaches under non-ideal communication conditions.
Regarding the first purpose, the baseline is established using two state-of-the-art approaches, so that the simulation runs concern the comparison between a Yamauchi-based algorithm [5] and the minPos algorithm [17] under ideal communication conditions. These algorithms were chosen since they are decentralised, as is the authors' proposal; while Yamauchi is a reference on exploration and typically serves as a comparison baseline, the minPos proposal has demonstrated very good performance, outperforming other important reference algorithms.
On the contrary, regarding the AAMO assessment and the comparison with other approaches, the simulation runs concern exploration missions subject to non-ideal communication conditions. In this case, the primary purpose is to understand how compromised the exploration time performance could be when the connectivity level is prioritised, and to reveal possible improvements with respect to previous techniques. In consequence, some experiments compare only the performance achieved by different instances of AAMO, while other experiments, where relevant, also take the state-of-the-art performance into account.

Simulation Setup
All simulations were conducted with the MORSE physics simulator (www.openrobots.org/morse/doc/stable/morse.html) using ATRV-like robots equipped with laser range sensors. The most relevant simulation parameters are shown in Table 3.
Furthermore, it is important to note that, except for the communication range, which depends on the device, the rest of the communication factors were taken from [38] owing to their strong dependency on the materials present in the environment. The values of HO-Threshold correspond to 66%, 50%, and 33% of the communication range c_i, respectively. In all simulations, localisation and low-level motion control are taken for granted.

Scenarios
Simulations are conducted over synthetic scenarios (see Figure 10) where long distances and the presence of obstacles may offer challenging conditions similar to those expected in the real world. The Loop and Cross scenarios (see Figure 10a,b) were mainly used to confirm the correctness of the implemented solutions and to show the advantages of using a multi-robot approach over a single-robot one. Unfortunately, owing to the shape and size of the free zones, in those scenarios there is nearly no possibility to demonstrate any advantage of the proposed approaches over the others. Finally, the Maze scenario (Figure 10c), which represents the most challenging environment, was used to establish comparative results among the approaches. These results are further analysed below in Sections 6.3 and 6.4, respectively. Due to the large amount of collected data, only the values related to the Maze runs are summarised and discussed here. Even so, all charts and screenshots generated from data concerning all three environments are available online: www.fing.edu.uy/~fbenavid/projects/MuRE/mure.html.

Robotic Agent Architecture
From a software architecture point of view, each robot is organised in three layers. In Figure 11 the main components are roughly depicted. Each layer is responsible for different aspects grouped by abstraction level, so that the higher the layer, the more abstract the issues to which its software components are devoted. Figure 11. Robotic Agent Architecture. The first layer includes the software components that represent the systems or devices through which the agent can interact with the environment. The second layer includes the models and the algorithms that keep those models up to date. The third layer includes the task identification and selection algorithms. Components in the shaded zone were developed during this work.
Going bottom-up in the layer stack, the components in the first layer are in charge of the interaction between the robotic agent and the environment. The Motion Control component is taken from the MORSE repository (www.openrobots.org/morse/doc/stable/morse.html) and is responsible for controlling the motors; in this work, the component follows a way-point-based motion strategy. The Sensory Capabilities component groups all sensory systems in charge of gathering environmental information. The most relevant information comes from the Pose and Laser scanner sensors, also taken from the MORSE repository. From the Pose sensor it is possible to know the robot configuration X_i(t) = {x_i(t), y_i(t), θ_i(t)} at any time, implementing the localisation capability, while the laser gives an array of distance measurements z(t) from which it is possible to build the map of the close surroundings. Finally, the Communication Capabilities component manages every aspect related to communications, receiving/sending information from/to (see the incoming/outgoing arrows) other team members. In this work, since only distance and wall attenuation effects are considered (discarding other sources of perturbation), communication is simulated in a very simple manner by directly applying the communication model introduced in Section 2.3.
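Since only distance and wall attenuation are modelled, the simulated link check reduces to a path-loss computation. A minimal sketch of such a model follows; the parameter values and names below are illustrative placeholders, not the values taken from [38]:

```python
import math

def signal_strength(d, walls, p0=-30.0, n=2.5, waf=3.5, max_walls=4):
    """Received signal strength (dBm) at distance d (m): log-distance
    path loss plus a wall attenuation factor for each wall crossed,
    capped at max_walls (illustrative parameters)."""
    if d <= 0:
        return p0
    return p0 - 10.0 * n * math.log10(d) - waf * min(walls, max_walls)

def connected(d, walls, sensitivity=-90.0):
    """Two robots can communicate if the received strength stays
    above the receiver sensitivity threshold."""
    return signal_strength(d, walls) >= sensitivity
```

In a simulation step, each pair of robots would be tested with the distance between them and the number of walls intersected by the line segment joining them.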
The second layer represents the core of the system, where the models and algorithms that support the highest-level functionalities, namely those related to the exploration purpose of the system, are allocated. On the one hand, the World Model component is in charge of modelling all physical interaction between the robotic agent and its surroundings. By keeping several structures up to date (e.g., the occupancy grid map, the positions of the fleet members, and their task assignments), it is also able to support the foretelling services required by the highest-level algorithms. On the other hand, the Mapping and Path Planning components are also supported by the World Model component, since it gives an access point to the mapping structures as well as the kinematic models. The Mapping component implements a standard occupancy grid approach [46] where the posterior of the map is calculated from a collection of separate problems of estimating p(m_k | z(t), X_i(t)) for each grid cell m_k, where each m_k has attached to it one of the occupancy values S = {f, o, u} (previously defined in Section 2.1). The Path Planning component implements the wave-front propagation approach introduced in [43].
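The wave-front propagation planner of [43] can be sketched as a breadth-first distance transform over the occupancy grid. The cell labels follow the S = {f, o, u} states from Section 2.1; everything else (4-connectivity, function names) is an assumption of this sketch:

```python
from collections import deque

FREE, OBST, UNKN = "f", "o", "u"  # cell states as in Section 2.1

def wavefront(grid, goal):
    """Breadth-first wave-front propagation from the goal cell:
    each reachable free cell receives its step distance to the goal;
    obstacle and unreachable cells remain None."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    gr, gc = goal
    dist[gr][gc] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == FREE and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

A shortest path can then be recovered by descending the distance values from any start cell towards the goal.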
Finally, high-level decisions, such as coordination, are taken in the third layer, where the task allocation scheme is executed by the Task Assignment component. In particular, the arrow between the Task Assignment and Communication Capabilities components represents the exchange of current positions and task assignments from the agent to the fleet and vice versa.

Figures of Merit
The performance of the approaches is assessed with regard to the following figures of merit: total exploration time (TT), coverage ratio (CR), path length (PL), over-sensing ratio (OSR), disconnection last ratio (DLR), and maximum disconnection last ratio (MDLR). The first three are the most popular and represent the strongest quality indicators [20]. The fourth has been taken from [47] and can sometimes be useful to explain the results concerning the first two. The fifth was inspired by [48] in order to measure the connectivity quality, and the sixth is proposed here in order to enable a better qualitative analysis of the connectivity aspects. Moreover, the connected components of the topology along the exploration are also plotted.

Baseline Statement
In this section, a baseline of performance on the main indicators is established from runs of both the Yamauchi and minPos approaches under ideal communication conditions. Since the exploration problem is expected to be more difficult under non-ideal communication conditions than otherwise [20], the obtained results may be considered a baseline for the first four indicators (defined in Section 6.2) with respect to the corresponding performance achieved in runs conducted under non-ideal communication conditions.

Collected Data
In order to conduct the assessment and comparison stated above, at least ten realistic software-in-the-loop simulations were executed on the Maze scenario presented in Figure 10. All collected data are presented in Table 4, organised according to the following scheme. The columns refer to (from left to right): figures of merit (FM); approaches, where Y and MP stand for Yamauchi and MinPos, respectively; and the fleet size |R|. For each fleet size, the average (AVE) and standard deviation (StD) values are recorded.

Baseline Assessment
We start the analysis by highlighting that both approaches can adequately explore all the environments presented above in Section 6.1.1. Coherently, both approaches achieve high levels of CR, as can be seen more clearly in Figure 12. Furthermore, the minPos approach outperforms Yamauchi concerning TT, as was expected. However, the most noticeable differences in performance are observed for fleets whose size is less than or equal to five robots, as can be seen in Figure 13. Going from one location to another is often more difficult in crowded environments than in the presence of fewer robots. Therefore, due to collision avoidance manoeuvres, both approaches show an increasing PL when the fleet size increases. This behaviour may be observed in the corresponding chart in Figure 14. On the one hand, Yamauchi presents a trend with an almost invariant slope across the different fleet sizes. On the other hand, under MinPos, the PL trend presents a positive but minor slope from one- to five-robot fleets, after which it becomes very steep. Hence, the analysis is divided into two cases. Firstly, when the fleet size is less than or equal to five robots, MinPos is more efficient than Yamauchi since both approaches achieve very similar coverage ratios (see Figure 12) even though, in the latter, robots need to traverse longer distances on average than in the former. That is expected since the Yamauchi approach does not take care of the dispersion of the fleet as MinPos does and, consequently, in the former, robots are forced to deal with crowding more frequently than in the latter. This is a remarkable difference given that the energy needed to support an exploration mission is closely related to the distance traversed by the robots.
Contrarily, as the fleet size increases beyond five robots, the shape of the scenario and the peculiar wall distribution altogether seem to make crowding unavoidable for the MinPos approach, causing a severe worsening of its PL performance.
Finally, it is interesting to observe the over-sensing phenomenon because, by measuring the amount of rework done by the fleet during exploration, it also gives a good indication of the system efficiency.
In this case, we start the analysis by pointing out that in an ideal world, with perfect communications, perfect sensing, and instantaneous actions, there would be no place for over-sensing. Nevertheless, in the real world, communication and sensing systems are not perfect and, more importantly, all actions take time. Even those which do not involve motion, such as sensing, computing, and communicating, need some time window to be executed. Therefore, many things can happen simultaneously, e.g., sensing actions conducted on the same objects. In such a case, two or more robots might report the discovery of the same cells.
In conclusion, even under ideal communication conditions, it is possible to register some level of over-sensing, and this level is unavoidable because of the parallel nature of the system. However, it is equally interesting to analyse the over-sensing results: (i) when the fleets obey different policies; and (ii) to establish a baseline against which the results obtained under non-ideal communication conditions may be compared.
Returning to the experiments, during the simulation runs we verified that the most significant over-sensing is mainly generated in the starting steps, when all robots are very close to each other (recall that all robots start from the same corner of the scenario, see Table 3) and, consequently, their sensing scopes overlap significantly. In Figure 15 the robot placement setup at the starting time is shown. Figure 15. Robot placement setup at starting time. Robots are represented by black dots. The sensing scope of the robot placed right in the corner is represented by a grey area, where it is possible to see the laser traces and the obstruction caused by some teammates. Robots are placed from the corner along the x and y axes. As the fleet size increases, new robots are appended to the row of robots on each axis, alternately.
Conversely, after this initial period, the robots overlap each other less frequently, and hence the OSR remains almost unchanged over time in both approaches. Despite this, minor differences may be highlighted. Due to a better fleet distribution over the terrain, which decreases the probability of simultaneous sensing events, the fleet does slightly less rework under the MinPos approach than under the Yamauchi approach (see Figure 16).

Conclusions
Concerning the Maze scenario, the conclusions of this section are: (i) For fleets of at most five robots, the MinPos approach is clearly advantageous (outperforming the Yamauchi approach in all assessed figures of merit); (ii) The benefits of employing the MinPos approach are severely affected when the fleet grows beyond five robots, decreasing quickly or even disappearing at about eight robots. Figure 16. Over-sensing ratio (OSR) under ideal communication conditions. This shows how, as the fleet size increases, the OSR trend is upward as well. This is expected since the more robots sensing the environment, the higher the probability of simultaneously sensing the same cells.

AAMO Assessment
This section aims to study the impact of using different HO-Threshold values on the performance of the proposed AAMO approach when the fleet is asked to explore an environment under non-ideal communication conditions. Moreover, these results are compared with those achieved by other approaches, such as Yamauchi and MinPos, when they are subject to non-ideal communication conditions too, and also with an event-based-connectivity strategy that makes every effort in favour of connectivity (regardless of the total exploration time).
This last comparison is particularly important because the performance of this kind of strategy may serve as an upper bound on both the connectivity level over time and the total exploration time. To that end, two strategies (based on different connection requirements, see Section 1.1.1) are typically considered: those which force the robots to be connected only at task-arrival time (a kind of event-based connectivity), and those which force the robots to remain connected at all times, even during path traversal periods (continuous connectivity). In the former, the robots are forced to select only tasks whose location would not cause isolation on arrival, given the current task assignment of the fleet. Nevertheless, it does not take into account the connectivity level along the path between the current robot location and the location of the task under consideration. Conversely, the latter imposes stronger restrictions on fleet mobility in order to guarantee connectivity at all times. Consequently, depending on the application field, the latter strategy may be recommended, but it is more complex to implement than the former. In contrast, the former allows a simpler implementation but could lead to a lower level of connectivity along the exploration. In this document, a connectivity-at-task-arrival-time strategy is used for comparison purposes.
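The connectivity-at-task-arrival-time filter described above can be sketched as follows. This is a simplification: distances are free-space only (wall attenuation is ignored), and the function names and representation of tasks as coordinates are assumptions of the sketch:

```python
import math

def in_range(p, q, comm_range):
    """Disc communication model: two locations are linked when
    their Euclidean distance is within the communication range."""
    return math.dist(p, q) <= comm_range

def arrival_connected_tasks(tasks, teammate_targets, comm_range):
    """Event-based connectivity filter: keep only candidate task
    locations that would leave the robot connected to at least one
    teammate at task-arrival time, given the teammates' current
    target assignments. Connectivity along the path is NOT checked."""
    return [t for t in tasks
            if any(in_range(t, g, comm_range) for g in teammate_targets)]
```

The continuous-connectivity variant would instead have to verify the constraint along every point of the planned paths, which is what makes it considerably harder to implement.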
Besides, it is also important to highlight that, although Yamauchi and MinPos assume ideal communication conditions, neither approach needs to be modified or adapted in order to run properly under non-ideal communication conditions. Nevertheless, in the MinPos case, some severe degradation is expected because the following working hypotheses are no longer guaranteed: all robots share the same map and know the positions of the other fleet members at all times. This could lead to coordination failures that, in turn, would harm the dispersion strategy on which the approach is strongly based. Conversely, in the Yamauchi case, the expected degradation is smaller since the level of coordination between robots is lower as well: robots only try to avoid going to the same task simultaneously.

Collected Data
In order to conduct the assessment and comparison stated above, at least ten realistic software-in-the-loop simulations were executed on the Maze scenario presented in Figure 10. All collected data are presented in Table 5, organised according to the same scheme as in Table 4.

Effectiveness Assessment
We start the analysis by highlighting that all implemented approaches, and particularly all AAMO instances, can adequately explore all the environments presented above in Section 6.1.1. Coherently, all instances achieve a high level of CR when exploring the Maze environment, as can be seen in Figure 17.

AAMO vs. Baseline Comparison
Concerning TT, as was expected in multi-robot systems, all AAMO instances benefit from adding robots to the fleet. This result can be seen in Figure 18a. Nevertheless, compared to the baseline results, all AAMO instances show performance degradation (see Figure 18b). The evidence indicates that the more effort made in favour of connectivity (bigger HO-Threshold), the worse the TT. In other words, not all HO-Threshold setup values produce the same level of performance degradation. Since the degradation of TT performance could be very problematic in many application fields, this subject is carefully analysed.
At first, the PL indicator can help to explain initially why the fleet spends more time exploring the same environment under the AAMO approach than under the MinPos approach. In Figure 19a it is possible to observe the same behaviour as in the baseline (see Section 6.3): larger fleets imply a bigger PL; Figure 19b shows the difference between the corresponding total lengths of the paths traversed by the fleets. The similarity between Figures 18b and 19b is remarkable and could explain, to a large extent, the origin of the TT degradation. Simply put, under the AAMO approach, the robots are asked to invest some effort (translated into distance by means of the HO-Threshold) in order to keep the fleet connected, and hence it is logical that a bigger PL results. Moreover, the trade-off between path and connectivity utility discussed in Section 3.1 shows up through these results, reflecting that the price of connectivity maintenance is the inability to apply an optimal policy concerning path costs.
Nevertheless, there exists a small portion of the TT degradation that cannot be explained by the PL increase. Therefore, the hypotheses assumed in the tractability analysis made at the end of Section 5.2 are compared here with the simulation results in order to add a complementary explanation of the TT degradation. Furthermore, this TT degradation shows a parabolic trend as the fleet size increases, reaching a maximum at about three-robot fleets, independently of the HO-Threshold values. Thus, the analysis is conducted by observing what happens when the fleet size changes but the HO-Threshold does not (in order to explain the shape of the curve, i.e., the relative values), and then under the opposite conditions (in order to explain the absolute values).
In any case, it is worth noting that the task selection algorithm is the most demanding software component in the robot architecture. Hence, the overall performance of the multi-robot system is highly determined by the performance of this component. In turn, as was pointed out in Section 5.2, its performance is strongly influenced by the number of unassigned thresholded tasks n = |T_HO| and the number of unassigned robots in a connected component m = |R_u| that are making a decision at the same time, in the following way: |A_m^n| = n!/(n − m)! = ∏_{k=0}^{m−1} (n − k), which is O(n^m). Therefore, the smaller |T_HO| and |R_u|, the faster the algorithm will run. Please recall that |T_HO| is upper bounded by the number of unassigned tasks |T_u|. Firstly, from Figures 20 and 21, it is possible to examine how |R_u| and |T_u| change along explorations depending on the fleet size. In all cases, both values show well-defined, easily identifiable patterns. Concerning |R_u| (see Figure 20), in all AAMO instances, which work in a fully asynchronous modality, the probability of two or more robots simultaneously running a decision-making process is negligible. Thus, the majority of the time, either no robot is making a decision or at most one robot is evaluating the available tasks. The results obtained during simulations are summarised in Table 6 and show behaviour consistent with this last statement, independently of the fleet size. The low ratio of robot coincidences is remarkable (e.g., for 3-robot fleets, about 96% of the decision-making moments have only one robot participating in them). In conclusion, in practice, the worsening of TT performance is apparently related only to the incidence of the HO-Threshold on the |T_HO| value. Next, this relation is carefully studied, and some answers are attempted.
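The size of the distribution space, |A_m^n| = n!/(n − m)!, can be checked numerically, which makes the benefit of keeping |T_HO| and |R_u| small concrete (the function name is illustrative):

```python
from math import perm  # perm(n, m) = n! / (n - m)!

def num_distributions(n_tasks, m_robots):
    """Number of ordered tasks-to-robots distributions |A(n, m)|
    when m robots each pick a distinct task out of n candidates."""
    return perm(n_tasks, m_robots)
```

For instance, five tasks and two deciding robots give only 20 candidate distributions, whereas twenty tasks and five robots already give 1,860,480, which is why pruning via the HO-Threshold and the low robot-coincidence ratio matter in practice.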
The parabola described by the TT degradation values in Figure 18 suggests the presence of two factors impacting this behaviour: one pushes the trend upwards and the other in the opposite direction. In the following, two particular factors are analysed: the fleet size and the bounded condition of the environment. (i) As the fleet size increases, robots make progress faster, causing |T_HO| to increase more quickly as well. When |T_HO| rises, the task selection algorithm becomes slower; thus, the increase in fleet size could explain the first, increasing section of the trend; (ii) In bounded environments, multi-robot exploration systems typically show two mobility patterns that characterise, in turn, two different exploration stages: (1) The first is characterised by the dispersion of the fleet over the terrain. In this stage, the newly available tasks appear closer to each other, and their total amount |T_u| increases; (2) On the contrary, the second exploration stage is characterised by the convergence of the fleet to the remaining unexplored zones, starting when it is no longer possible to disperse the fleet, until the end of the exploration. In this stage, the newly available tasks generally appear farther from each other, and their total amount |T_u| decreases. Therefore, since the tasks in T_HO are those closer than the relative distance HO-Threshold, under the AAMO approach it is statistically less demanding for the robots to select a task during the last exploration stage than in the initial one.
Additionally, either when the fleet size increases or when the HO-Threshold decreases, the transition from the first to the second exploration stage is reached faster. This fact can be corroborated in both Figures 21 and 22. For instance, concerning Figure 21, the 3-robot system spends about 410 s to reach the end of the dispersion stage, whereas the 5-robot and 8-robot systems spend about 320 s and 260 s, respectively. Likewise, the effect of the HO-Threshold value can be observed in Figure 22. Hence, although the impact of the fleet size on the exploration stage transition appears to be higher than that caused by the HO-Threshold value, both aspects contribute to reducing the task selection effort, enabling robots to save time in the task allocation procedure earlier.
In conclusion, when AAMO is executed in bounded environments, the addition of robots and the decrease of the HO-Threshold can almost entirely mitigate the worsening of the total exploration time performance. Please note that the performance degradation of the AAMO:10 instance is almost null for eight-robot fleets.
Given these promising results, in the following, all AAMO instances are compared with the other approaches under non-ideal communication conditions.

AAMO Efficiency Assessment
In this section, several statistical analyses were performed on different indicators to demonstrate the efficiency of the proposed AAMO approach. Wilcoxon signed-rank tests were performed (a non-parametric test was chosen since data in each condition do not follow a normal distribution) to compare samples from two populations. More precisely, it tests the indicator differences between approaches for a given fleet size.
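For reference, the signed-rank statistic underlying the test can be sketched as follows. This computes the statistic only (W, the sum of ranks of positive paired differences, with zero differences dropped and ties given average ranks); the p-values reported in this section require the full test, as provided by standard statistics libraries:

```python
def wilcoxon_w(xs, ys):
    """Wilcoxon signed-rank statistic for paired samples xs, ys:
    rank the absolute non-zero differences (ties receive average
    ranks) and sum the ranks of the positive differences."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return sum(r for d, r in zip(diffs, ranks) if d > 0)
```

Comparing, say, the per-run TT of two approaches for a given fleet size would pair the runs and feed the two samples to this statistic.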
Firstly, in relation to TT (see Figure 23), the evidence confirms two expected results: (i) All approaches benefit from adding robots to the fleet. A Wilcoxon difference test was performed regarding TT and the fleet size for each approach; all comparisons present a significant decrease in TT when the fleet size increases (p-value < 0.05); (ii) Since it only takes care of connectivity, the EbC approach shows the worst performance regardless of the fleet size. Wilcoxon tests showed a significant result (p-value < 0.001) for all comparisons between approaches given a fleet size.
Additionally, all AAMO instances show competitive TT results, with AAMO:10 even slightly outperforming the other approaches. In particular, a Wilcoxon difference test showed that AAMO:10 has a smaller TT than MinPos for 2 and 5 robots (resp. W = 169, p-value < 0.01, and W = 159, p-value < 0.05). Secondly, concerning the PL indicator (see Figure 24), the EbC approach presents the worst performance, consistently. Again, the Wilcoxon test showed significant results (p-value < 0.001). Likewise, all AAMO instances show competitive results too.
Besides, as was pointed out above, the TT and PL results show that the lack of ideal communication conditions affects the MinPos approach more negatively than the Yamauchi approach. Wilcoxon tests showed a trend for 4 and 5 robots (p-value < 0.1) concerning TT, and a significant difference in PL for 4 robots (p-value < 0.05).
Up to this point, the AAMO approach has shown results as good as the MinPos approach. Next, the indicators related to connectivity are analysed in order to properly assess the potential advantages of the AAMO approach in the presence of more realistic communication conditions. The DLR indicator trend is shown in Figure 25. As can be seen, while the performances of the MinPos and Yamauchi approaches are the worst, the EbC performance is remarkably the best. These visual results were confirmed by Wilcoxon tests between approaches for each fleet size. The DLR indicator is significantly bigger (p-value < 0.001) for MinPos and Yamauchi than for the AAMO and EbC approaches, except for 8-robot fleets, where these results are significant only when compared to EbC. Moreover, EbC has a significantly smaller DLR indicator (p-value < 0.05) than all the other approaches except in the 5- and 8-robot cases, in which no statistical difference can be found between the AAMO approaches and EbC. Similarly, the AAMO approach results represent a very good improvement with respect to both the MinPos and Yamauchi approaches. The chart in Figure 25 reveals that our approach outperforms both the Yamauchi and minPos approaches on average, independently of the fleet size. Nevertheless, the smaller the fleet, the greater the improvement. The explanation arises from intuition: when the environment is bounded, the probability of being disconnected tends to decrease as the fleet size increases. Therefore, the benefits of our approach tend to be smaller when the fleet size increases. Either way, the improvement is always meaningful. Please note that even in the largest fleet size case, the DLR of AAMO represents an improvement of 20% on average compared to the corresponding Yamauchi or MinPos value.
Furthermore, the relation between TT, DLR, and the HO-Threshold is noticeable: the more effort demanded by the human operator (higher threshold), the slower but more connected the AAMO performs. This claim is confirmed by Wilcoxon tests, which showed a significantly bigger (p-value < 0.05) TT indicator for AAMO:20 than for AAMO:10, and also that the DLR indicator is significantly smaller (p-value < 0.05) for AAMO:20 than for AAMO:10, regardless of the fleet size. Regarding the oscillation registered, it could suggest the existence of the following rational pattern. When the fleet size is even, the easiest way to avoid isolation situations is to keep in pairs (connected with at least one other teammate). Contrarily, when the fleet size is odd, not all robots can keep in pairs: if the fleet divides, at least one subgroup must be composed of three robots. Therefore, this oscillatory behaviour could hint at the fact that odd-sized fleets need to make a little more effort to avoid robot isolation situations and are consequently subject to bigger DLR results as well.
Likewise, it is interesting to analyse the DLR indicator and the network topology together. This way, it is possible to get a closer notion of the interaction between robots along the exploration. Figure 26 shows the number of connected components present in the network, averaged over time.
Please note that for the AAMO:20 instance, run on a 2-robot fleet, the DLR is about 40% (see Figure 25), coinciding with the percentage achieved by the 2CC of the same fleet size in Figure 26. In other words, the fleet holds a network composed of one single connected component during 60% (100% − 40%) of the total exploration time. Consistently, this is equivalent to saying that during this portion of the time no robot has been disconnected.
Additionally, the chart shows that, as the fleet size increases, it is more challenging to keep the whole fleet connected: the 1CC stack decreases in size as the fleet size increases. Nevertheless, it also shows that, with the addition of new robots, the fleet becomes more and more cohesive (in relative terms). This fact may be corroborated by looking at the upper part of the chart, where the stacks corresponding to the greatest numbers of connected components are plotted. The following pattern can be observed: the number of connected components (given by nCC) increases more slowly than the fleet size n. Again, the fact that the Maze scenario is bounded may explain this phenomenon to a large extent. Depending on the number of connected components and the fleet size, it is possible to study the existence of sufficient conditions for falling into isolation situations. For instance, for a 3-robot fleet, the 2CC or 3CC topologies imply having at least one robot isolated, while for a 5-robot fleet this implication is related to the 3CC, 4CC, or 5CC topologies, and so on.
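The nCC indicator discussed above can be computed from the robot positions with a standard union-find pass. This is a sketch under a pure disc communication model (Euclidean distance only, ignoring the wall attenuation of Section 2.3):

```python
import math

def connected_components(positions, comm_range):
    """Number of connected components (nCC) of the fleet network:
    two robots are linked when they lie within communication range
    of each other (disc model; wall attenuation ignored)."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # path-halving find
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= comm_range:
                parent[find(i)] = find(j)  # union the two components
    return len({find(i) for i in range(n)})
```

Logging this value at each simulation step and averaging over time yields the stacked nCC percentages plotted in Figure 26.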
Although all this information gives an approximate notion of how disconnected the fleet is along explorations (a group perspective), it is not enough to hint at what is happening at the individual level. Thus, it is also interesting to study the worst-case duration of individual disconnections. This way, it is easier to evaluate both coordination capabilities (how long a robot is unable to coordinate its actions with any teammate) and risky situations (how long the fleet presents single points of failure). Recall that the key motivations for considering communication constraints are strongly related to rework avoidance: (i) When robots are disconnected, they have fewer possibilities to coordinate their actions, hence they could visit the same regions unnecessarily; keeping them connected is therefore a way to favour efficiency; (ii) In the presence of damage or internal failures, the exploration strategy should take those events into account, preventing the need for re-exploration.
In Figure 27, the trend followed by the Maximum Disconnection Last Ratio (MDLR) indicator is depicted, showing that the bigger the HO-Threshold, the shorter the disconnection periods (Wilcoxon tests showed that the MDLR indicator is significantly smaller (p-value < 0.05) for AAMO:20 than for AAMO:10 for 2 and 3 robots, and tends to be smaller (p-value = 0.09) for 4 robots) and that the duration of isolation situations is at most half of the DLR values for every fleet size and HO-Threshold value. In other words, the isolation situations involve more than one single robot, and this, in turn, reveals that under the AAMO approach the robots often attempt to rejoin each other. Last but not least, it is worth discussing the trend of OSR as the fleet size increases. The results obtained by the different AAMO instances are depicted in Figure 28. In Section 6.3 the OSR levels were reached mostly through simultaneous sensing actions; conversely, in these simulation runs, the OSRs reach higher levels due to the non-ideal communication conditions. As was expected, the more out-of-date the mapping information of the robots is with respect to each other, the higher the OSR. However, under either communication condition, the same upper bound is reached. This suggests that the size and bounded condition of the Maze environment could be limiting the over-sensing phenomenon when the fleet size increases beyond five robots.
To sum up, concerning the Maze scenario and the baseline stated in Section 6.3, the conclusions of this section are: (i) The AAMO approach can be employed as a strategy to coordinate multi-robot systems dedicated to exploration tasks; (ii) As was expected, the HO-Threshold value directly impacts the connectivity level that the fleet is able to hold during the mission; (iii) Likewise, the relation between HO-Threshold values and the TT and DLR/MDLR indicators is as expected: the bigger the HO-Threshold value, the worse the TT performance, but the better the DLR/MDLR ratios; (iv) Although all instances of the AAMO approach present TT degradation with respect to the baseline, in no case is it significantly due to the computation of the proposed tasks-to-robots distribution; (v) All AAMO instances outperform the baseline concerning the DLR and MDLR indicators; (vi) With the exception of DLR/MDLR, all instances of the AAMO approach outperform the EbC approach; (vii) The topology of the fleet networks shown during exploration is consistent with the HO-Threshold values for all AAMO instances. The AAMO approach shows effectiveness and flexibility (through the HO-Threshold setup) to tackle the multi-robot exploration problem. Particularly concerning the efficiency related to both completion time and connectivity level maintenance, the approach appears as an intermediate solution that presents much better TT performance than the most restrictive approach (EbC) and a better connectivity level along the exploration than the approaches that do not take communication issues into account.

Task Assessment Problem
The proposed Auto-Adaptive Multi-Objective (AAMO) approach follows a multi-objective assessment strategy where the tasks under consideration are assessed with regard to two objectives: the cost associated with the corresponding shortest path and the connectivity level each task location can offer to the robot at arrival time. The multi-objective strategy is implemented by means of a weighted sum that trades travelling cost off for connectivity level. Up to this point, all these concepts are quite standard, being present in several state-of-the-art approaches.
Nevertheless, in this work: (i) Connectivity awareness is given to the robots by modelling the attenuation effects that commonly affect communication signal strength; (ii) The weights of these potentially conflicting objectives are derived from formal analysis instead of a training stage, making the system more adaptable to different environments; (iii) The human operator is asked to use their application-field expertise to take part in the task assessment process by setting a distance threshold up to which tasks that preserve or enlarge connectivity are preferred over the rest. All this leads to a more flexible system in which the robots can deal with communication constraints by adjusting the weight of each objective, independently of any scenario and in a more intuitive manner, saving considerable training time.
The existence and correctness proofs conducted on the task selection procedure show that the robots are always capable of auto-adapting the objective weights so as to select tasks in accordance with the human operator's criterion. In conclusion, this task assessment approach may be very advantageous given its ease of deployment.
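The weighted-sum assessment described above can be sketched as follows. This is a minimal illustration only: the names (`path_cost`, `conn`, `alpha`) and the specific scoring form are assumptions for exposition, not the paper's exact formulation.

```python
# Illustrative sketch of a weighted-sum task assessment that trades
# travelling cost off against connectivity level. Lower scores are better:
# a longer shortest path raises the score, higher connectivity lowers it.
# `alpha` in [0, 1] plays the role of the auto-adapted objective weight.

def assess_task(path_cost: float, conn: float, alpha: float) -> float:
    """Weighted sum of the two (conflicting) objectives."""
    return alpha * path_cost - (1.0 - alpha) * conn

def select_task(tasks, alpha):
    """Pick the task with the minimal weighted-sum score."""
    return min(tasks, key=lambda t: assess_task(t["cost"], t["conn"], alpha))

tasks = [
    {"name": "frontier_A", "cost": 4.0, "conn": 0.9},  # farther, good signal
    {"name": "frontier_B", "cost": 2.0, "conn": 0.1},  # closer, poor signal
]
# With a connectivity-leaning weight, the better-connected task wins:
best = select_task(tasks, alpha=0.2)
```

Adapting `alpha` shifts the preference between pure path-cost exploration and connectivity preservation, which is the lever the formal analysis in the approach is meant to tune.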

Task Allocation Problem
Concerning the task-to-robot distribution algorithm, all previous proposals explicitly avoid the combinatorial blow-up of the allocation problem by using different heuristics. Nevertheless, heuristic-based approaches rely on assumptions that cannot be verified at all times; consequently, when the heuristic fails, the robots choose suboptimal distributions.
Taking this limitation into account, and since the number of possible distributions depends on both the number of available tasks and the number of robots making a decision, the proposal presented here computes optimal distributions based on more general assumptions: (i) Robots can implicitly coordinate their actions; (ii) Asynchronism may keep the number of simultaneous decision makers small; (iii) Pruning the furthest tasks (those beyond the scope defined by the human operator through the HO-Threshold) does not prevent the computation of an optimal task-to-robot distribution.
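Under assumptions (ii) and (iii), exhaustive search over the pruned assignment space stays tractable. The sketch below is a hypothetical illustration of this idea (the cost matrix layout and `ho_threshold` name are assumptions, not the paper's implementation):

```python
# Illustrative exhaustive (hence optimal) task-to-robot distribution.
# Feasible because asynchronism keeps the number of simultaneous deciders
# small and the HO-Threshold prunes the furthest tasks beforehand.
from itertools import permutations

def optimal_distribution(cost, ho_threshold=float("inf")):
    """cost[r][t] = assessed cost of task t for robot r.
    Tasks out of every robot's reach (min cost > ho_threshold) are pruned;
    the remaining one-task-per-robot assignments are searched exhaustively.
    Assumes at least as many kept tasks as robots."""
    n_robots = len(cost)
    kept = [t for t in range(len(cost[0]))
            if min(row[t] for row in cost) <= ho_threshold]
    best, best_total = None, float("inf")
    for assign in permutations(kept, n_robots):
        total = sum(cost[r][assign[r]] for r in range(n_robots))
        if total < best_total:
            best, best_total = assign, total
    return best, best_total

cost = [[1.0, 4.0, 9.0],   # robot 0
        [2.0, 1.0, 9.0]]   # robot 1
assign, total = optimal_distribution(cost, ho_threshold=5.0)
```

Here task 2 is pruned by the threshold and the two robots receive tasks 0 and 1, the globally cheapest distinct pairing.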

Connectivity Maintenance Problem
While event-based connectivity approaches consist of executing connectivity-regaining actions in the presence of specific events (typically disconnections, either whenever they happen or periodically after a certain amount of time), the AAMO approach integrates a less restrictive connectivity strategy in which the robots are motivated, but not compelled, to regain connectivity. When selecting their targets, the robots always implicitly consider the opportunity cost of staying connected or regaining connectivity. Furthermore, in reconnection cases, the task location itself becomes the meeting point, eliminating the need to implement a rendezvous policy and, perhaps more importantly, avoiding deviations from natural paths. The policy is thus entirely transparent to an external observer: whenever it is possible to explore while keeping or enlarging the connectivity level, the robots choose that option; when it is not, they simply behave according to pure path-cost exploration.
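The observable behaviour of this policy can be condensed into a small decision rule. The sketch below is a simplified assumption-laden reading of it (the tuple layout and `ho_threshold` name are illustrative only):

```python
# Minimal sketch of the transparent connectivity policy: within the
# HO-Threshold, tasks that keep or enlarge connectivity are preferred;
# otherwise selection degrades gracefully to pure path-cost exploration.

def choose_target(tasks, ho_threshold):
    """Each task is (path_cost, preserves_connectivity)."""
    preferred = [t for t in tasks if t[0] <= ho_threshold and t[1]]
    pool = preferred if preferred else tasks  # fall back to pure path cost
    return min(pool, key=lambda t: t[0])

tasks = [(3.0, True), (1.5, False), (6.0, True)]
# A connectivity-preserving task exists within the threshold, so it beats
# the cheaper but disconnecting alternative:
target = choose_target(tasks, ho_threshold=4.0)
```

With a tighter threshold (e.g., `ho_threshold=2.0`) no connectivity-preserving task qualifies, and the rule reduces to selecting the cheapest task, exactly the "pure path-cost" behaviour described above.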
Regarding the efficiency related to both completion time and connectivity-level maintenance, the approach is capable of decreasing the length of disconnection periods without noticeable degradation of the exploration completion time, standing as an intermediate solution that presents much better completion-time performance than the most restrictive event-based connectivity approaches and a better connectivity level throughout exploration than the approaches that do not account for communication issues.

Future Research Directions
New research questions have arisen throughout this work, leaving several opportunities to improve the developed system. Although the environments employed in the simulations are proposed as benchmarks, it would be beneficial to check the validity and performance of the proposed approach on a broader variety of scenarios. Large office-like environments would put the system into more realistic situations, such as mapping buildings, where larger fleets could also be employed. Since the robot model defined can support robots with different characteristics, exploiting heterogeneity could be a promising research direction. Integrating a fleet with heterogeneous robots (e.g., differing in size, sensing, and motion capabilities) could enhance the skills of the fleet. For instance, given their greater mobility, UAVs could help the fleet stay connected by playing a relay role, while small terrestrial robots could be the key to getting into access-restricted spaces. Last but not least, executions on real systems are also planned. However good a simulator may be, many important details escape its scope. The proposed approach is designed to serve as a solution for real-world applications, so it is imperative to verify its feasibility in real scenarios. In such a case, localisation and mapping errors cannot be ignored. Both Simultaneous Localisation and Mapping (SLAM) algorithms and the sensory and motor devices should be carefully studied to limit the influence of these kinds of errors on the high-level decision components. Given the equipment available in the involved laboratories, the candidate platforms would be either IRobot or KheperaIII units.
Funding: This research was partially funded by the Comisión Sectorial de Investigación (CSIC-UdelaR) through its mobility program MIA.