Intelligent Exploration Approaches Based on Utility Functions Optimization for Multi-Agent Environment Applications

Abstract: In this work, the problem of exploring an unknown environment with a team of agents and searching for different targets in it is considered. The key problem to be solved with multiple agents is choosing appropriate target points for the individual agents so that they simultaneously explore different regions of the environment. An intelligent approach is presented to coordinate several agents using a market-based model to identify the appropriate task for each agent. It is proposed to compare the fitting of the market utility function using neural networks and to optimize this function using genetic algorithms, so as to avoid heavy computation in the NP-hard (nondeterministic polynomial-time hard) path-planning problem. The proposed approach is inspired by an indoor environment with homogeneous physical agents, and its performance is tested in simulations. The results show that the proposed approach allocates agents effectively to the environment and enables them to carry out their mission quickly.


Introduction
Research on autonomous agents can produce a revolutionary effect in exploration schemes like the ones developed by search/rescue teams in accidents, natural disasters, or military tasks. This kind of activity can be automated or considered a mobile-robotic problem where it is needed to search an unknown number of targets in a static environment [1]. There are several applications in this field, such as planetary exploration [2], surveillance [3], rescue [4], or cleaning [5], in which the complete coverage of terrain is part of the inherent goal of a mission.
The goal to tackle the environment exploration problem lies in determining how an agent should move to obtain as much new information as possible [6]. A proper exploration algorithm should have two properties, completeness and effectiveness. Completeness requires that the agents cover most of the environment, while effectiveness means that the agents achieve completeness by minimal efforts (i.e., exploration time, power consumption, etc.). In particular, a good exploration algorithm for the so-called graph-like environment corresponds to finding the shortest round trip through all nodes of the graph, which is the well-known traveling salesman problem (NP-hard for known graph-like environments) [7].
In static environments, since the targets are not moving, the target-searching problem is comparable to the exploration problem. Here, a target represents an unknown situation in the environment that the agents need to find to complete the exploration tasks. As the agents cover more areas of the environment, they can find the targets. Hence, if the agents can achieve complete coverage, they can find all targets.
Typically, there are several benefits of using multiple agents over single-agent systems [8]. First, cooperating agents can complete a single task more quickly than a single agent [9]. In addition, multiple agents may be able to locate targets more efficiently if they exchange certain information such as positions and regions previously explored; and therefore, avoiding re-exploration [10]. Finally, it is expected that redundancy and greater fault-tolerance will be gained by using multiple agents, while keeping the system simpler than a single powerful and advanced agent.
The basic idea in exploration algorithms is to identify the boundary of the covered area in a map, called the frontier, and then select the appropriate frontier for the agent to move to. By crossing frontiers, the known area increases accordingly. However, there are two main related challenges: first, selecting the proper frontier to maximize the information gain when multiple frontiers exist; second, determining how an agent can efficiently and safely approach a frontier.
The present work addresses the first challenge. An intelligent approach is presented for coordinating several agents using a market-based model that considers the three tasks needed to accomplish the agent job: knowing the environment, making decisions, and interacting socially in a multi-agent system. In executing the three tasks, an agent is limited by the available information, time, and hardware-processing specifications. Moreover, three cognitive styles can be identified for an agent. First, the Satisfier simply tries to find a solution that is "good enough". A satisfier selects the first option that solves the problem at hand without analyzing whether such a solution is optimal. Second, the Maximizer tries to make an optimal decision. A maximizer tends to take more time to decide because it needs to carefully maximize the performance for all variables and make the trade-offs. The utility function approach is optimized with genetic algorithms for maximizers. Finally, the Intelligent agent may also learn or use knowledge to achieve its task. An intelligent agent seeks to use little information to make good decisions based on the agent introspection and social requirements. The utility function model using artificial neural networks with low-computational-cost inputs is proposed for intelligent agents.
This work compares different exploration algorithms for agents in a simulated indoor environment to address the challenge of selecting the proper frontier to cross in order to maximize the information gain. Random exploration and nearest frontier-based exploration are implemented for satisfiers. The utility function approach is optimized with genetic algorithms for maximizers. The utility function model using artificial neural networks with low-computational-cost inputs is proposed for intelligent agents. This paper presents several simulations to explore the properties of the proposed approach, and comparisons with related approaches are provided. It can be observed that the intelligent cognitive style significantly reduces the time required to cover an unknown environment with a team of agents. The present paper is structured as follows: Section 2 presents the related work; Section 3 deals with the proposed approach; Section 4 focuses on the implementation; Section 5 describes the system test and experimental validation; and finally, Section 6 addresses conclusions and future work.

Related Work
The design of intelligent exploration algorithms and coordination mechanisms for multi-agent systems applied to target searching includes different multi-agent coordination approaches for environment exploration.
According to the occupancy maps, the most common deliberative technique is the frontier-based exploration algorithm [11]. The technique is based on identifying the boundary, or frontier, of the map-covered area and then selecting the proper frontier to which the next agent should move. When the target frontier is assigned to an agent, the goal is to approach the frontier using the least number of moves while avoiding obstacles. To effectively plan the path to reach the frontier, probability-based algorithms, such as Probabilistic Roadmaps and Rapidly-exploring Random Trees, can be used [12]. This path planning is an NP-hard problem, and it can require heavy computations in cluttered environments. In addition, the development of an exploration algorithm based on a utility function to make effective decisions by a single robot has been explored. For example, the utility function can be designed so that the agent achieves new navigation goals, considering short distances and higher chances to increase the knowledge about the map. Some parameters (factors) used to enhance the implementation of these algorithms, and therefore the agent decision-making, are cluster, distance, clearance, and unreachable points [13].
Single robot exploration has been expanded towards the implementation of cooperative exploration. To enable such exploration using frontier-based algorithms, multiple agents must share their local maps to find together the global frontiers. If the agents can be located within the environment, they can share and merge their findings by summing or multiplying the state values in each local map [14]. However, if the location is not accurate enough, the agents must use probabilistic algorithms to merge the local map information. For instance, particle filters can support a group of agents to merge local maps under the agents uncertainties [15]. Furthermore, the agents can negotiate to allocate frontiers to more suitable agents when the local maps are merged into a global map. In [16], a potential field-based algorithm makes it possible for each agent to select the closest frontier to approach. In addition, to avoid an agent from attempting to move to a nearby but inaccessible frontier, a higher priority is given to a visible frontier. An optimal frontier assignment algorithm allows the agents to select frontiers sequentially. Once an agent has selected its destination frontier, this frontier relative weight will decrease so that the next agent will no longer select this frontier [17].
In the literature, random search is the approach most frequently applied to target searching [18]; however, in multi-agent systems it is key to coordinate the agents so that they search efficiently. For instance, by creating a special pattern and following it, the agents obtain better sensing coverage and can avoid re-exploring areas that have already been covered [19].
Several projects involve multi-agent cooperation architectures. For instance, Grabowski et al. consider teams of miniature agents who overcome the limits imposed by their small scale by exchanging mapping and sensor data [20,21]. As a part of this plan, a team leader incorporates the information collected by the other agents. In addition, it orders the other agents to bypass obstacles or direct them to unknown areas. Mataric and Sukhatme review different task assignment strategies in agent teams and analyze the team performance through extensive experimentations [22]. Burgard et al. consider the problem of a collaborative exploration of an unknown environment by multiple agents [23] but restricting the agents movement so that two agents do not approach the same target position or visit a position in the visibility area of another. The algorithm considers at the same time the utility of frontier cells and the cost for achieving them. Coordination is achieved by trading off the utilities and the cost by reducing the utilities depending on the number of agents already going to a specific zone. Simmons et al. extend the approach presented in [24] by distributing the computation and by using a more sophisticated notion of expected information gain by considering current map knowledge and the individual agent capabilities. Berhault et al. consider coordinating a team of mobile agents visiting several targets in a partially unknown area [25]. Their approach is based on combinatorial auctions, where agents bid on target bundles instead of single targets, as is often the case in auction strategies for exploring unknown environments. The idea is to consider synergies among targets to optimize exploration. Billard et al. study the influence of communication, learning, and the number of agents in the task of mapping the target locations in a dynamic environment [26]. 
Their research relies on a theoretical framework based on probabilistic modeling and is analyzed via simulation and physical implementation. The results of multiple experiments are compared with those anticipated by the probabilistic model, and they agree, indicating that the probabilistic model is a good approximation of a multi-agent system. These results show an effective approach for learning target locations that often change.
Cooperative multi-agent systems are those in which multiple agents interact jointly to solve tasks or optimize utility [27]. For example, Shi, H. et al. [28] investigated a distributed cooperative strategy using pedestrian behavior. As a result of interactions among agents, the complexity of multi-agent problems can increase rapidly with the number of agents or their behavioral sophistication.
Recently, some authors have worked on new strategies to apply frontier-based exploration methods. For example, a frontier-based exploration concept and a multi-agent flood algorithm have been used to obtain a better exploration in an unknown area [29]. In [30], Gomez et al. have worked on combining frontier-based exploration concepts and map-building using semantic information. They propose a semantic frontier classification and selection considering a cost-utility function. Here, the semantic information is used to classify the frontier as a free area or transit area considering the geometric characteristics of the map. On the other hand, N. Mandoui et al. work on a frontier-based approach in which robots share their local frontier points instead of sharing the whole grid map, to save communication bandwidth [31]. In this work, the robots perform dynamic cooperation in which they have some specific information about their mates, such as positions.
Some authors have worked on improving the efficiency of frontier-based exploration methods. For example, Fand and Ding combine a frontier point evaluation with random frontier points optimization (RFPO) and SLAM algorithms. The idea is to work with a multistep exploration, where the algorithms do not plan a global path from the current position, but instead establish a local exploration path size. Once an agent reaches the local exploration path size, the current optimal frontier point is reselected to prevent the robot from making repetitive decisions [32].
Other authors have focused on the integration of deep reinforcement learning techniques with existing frontier integration methods [33–37]. Some authors, such as H. Li, have proposed a deep learning-based decision algorithm that uses a neural network for learning [34]. In [35], Z. Chen et al. propose a deep learning approach for multi-agent exploration. In that work, they use agent multi-channel maps, known regions, unexplored areas, and obstacles as inputs of a convolutional neural network. Other works, such as Hu, J. et al. [36], propose an approach that incorporates a technique called dynamic Voronoi partitions. This method reduces duplicated exploration areas by assigning different target locations to individual robots. In [37], Yu, C. et al. worked on a distributed multi-agent deep reinforcement learning approach to cooperative strategies for multi-robot pursuit. In other cases, such as the work done by Y. Hou et al., a radial basis neural network is used to build a continuous occupancy grid map [38]. Moreover, the research conducted by Shrestha et al. implements a deep generative neural network model to predict unknown regions [39].
In contrast to the approaches discussed above, this work aims to contribute to the performance evaluation of different cognitive levels in the decision-making process, emphasizing the need to develop intelligent algorithms that can make effective decisions with little information based on the modeling of optimal decision-makers.

Proposed Approach
Exploration can be defined as the process of selecting and executing actions to maximize environmental awareness, so that models of the physical environment can be acquired. For a proper model generation, the agent has to interpret its sensor readings to make correct inferences about the surroundings, which corresponds to the map-building problem. The accuracy of the built map depends on how well the agent is localized while mapping and acquiring suitable models of the environment. This fundamental problem in mobile robotics is called Simultaneous Localization and Mapping (SLAM). SLAM is defined as a problem with an iterative solving strategy: while an agent navigates in an unknown environment, it must incrementally build a map of its surroundings and localize itself within the built map. Additionally, as exploration is carried out, the agent must choose its viewpoints to ensure that the sensory measurements contain new and useful data. Thus, the accuracy of the map also depends on the choice of viewpoints during exploration.
In exploration applications, there is a trade-off between the amount of knowledge acquired and its acquisition cost. An explorer agent aims to obtain the maximum environment knowledge at a minimum cost (i.e., minimum time and/or power). This work proposes the design of a hybrid coordination mechanism based on the interaction of three actions for intelligent agents: communication, intelligent task allocation, and negotiation in a multi-agent system.
Communication is important in any multi-agent system. The purpose is to integrate a market-based task allocation algorithm with cooperative negotiation for agent task completion. Figure 1 shows a diagram based on an individual agent analysis to carry out an overall mission considering the environment information provided by itself and its teammates. There are different degrees of cooperation in a multi-agent system. The highest is "global cooperation", which occurs when an agent, while making its own decision, keeps trying to maximize the global utility function that considers the actions of all agents in the system. In this approach, the process of finding an appropriate task for an agent depends on three subprocesses. For any agent, the suitability of a task ($S_{Ti}$) is proposed to be defined in terms of the knowledge acquired from the teammates ($K_{TMi}$) and the environment ($K_{Ei}$), the accuracy of the communications system ($a_c$) and of the sensors ($a_s$), the results obtained with the decision-making algorithm ($DM_{Ri}$), and the constraints given by negotiation within the social interactions ($SI_{Ni}$), as expressed in Equation (1). Each variable may change in time. This behavior makes dynamic task allocation possible in the decision-making algorithm.
$$S_{Ti} = f\left(a_c K_{TMi},\ a_s K_{Ei},\ DM_{Ri},\ SI_{Ni}\right) \quad \text{for each task } i \qquad (1)$$

Knowledge
This process involves both perception and interpretation by the agent, allowing it to know its teammates and the environment. These processes consider location, map building, tasks in progress, and agent performance and capabilities. Here, hardware developments (sensors and communications systems) are necessary to acquire accurate information. In this sense, the more accurate the information obtained in this process, the greater the reduction in computational cost at the next level.
In deliberative exploration, a map of the environment is needed to help the agents find the optimum path to fully and efficiently cover a region. A grid map can be built directly from sensor readings and location information. Hence, communication among agents is required to construct the covered regions in the global map, know the locations of teammates, and memorize the grid cells already visited to avoid re-exploration.

Social Interactions
This process involves two or more agents making a joint decision. First, the agents state their requests and then either agree to a concession process or search for new alternatives. The agreement can be reached through cooperative negotiation, where agents attempt to achieve the maximum global utility considering the worth of all their activities. It is possible to develop effective negotiation rules by exploiting the work with multiple agents.
When a task is assigned to an agent, conflicts or collisions among teammates may arise; therefore, a negotiation algorithm is needed so that the agents involved can reach an agreement. The proposed negotiation algorithm selects the minimum-cost decision from four options: waiting, teammate waiting, task switching, or new path assignment.

Decision Maker
This process is individual for each agent, and it is the intelligent method to obtain an optimal decision with minimal information. High intelligent performance at this level reduces the communication cost. Three cognitive styles characterize a decision-making algorithm: satisfiers, who try to find a "good enough" solution; maximizers, who try to make an optimal decision; and intelligent, who may also learn or use knowledge to achieve their goals. The core point of this process is how to coordinate agents to cover the environment efficiently. The primary purpose of the exploration is to cover the whole environment in a minimum of time. As a result, the agents must be aware of which areas of the environment have been explored previously. The task allocation implicates a new grid cell assignment where each agent should go and the path to get there.
The proposed intelligent approach uses a utility function to calculate the suitability of reaching each point. For this reason, a utility function $U_i$, based on the trade-off between the cost of reaching a grid cell and its gain in coverage, is proposed and shown in Equation (2). The proportional cost $P(C_{Ri})$, as in Equation (3), defines agent suitability as a function of the time $t_{Rj}$ required to move to a particular grid cell, compared to the other agents in terms of the maximum time to get there. This guarantees the best solution for the team and allows the approach to be implemented in a multi-agent system. The proportional gain $P(G_{Ci})$ is given by the size of the unexplored area, in grid cells, that an agent can cover with its sensors when reaching a grid cell.
Here, n is the number of grid cells in the environment and m is the number of agents in the environment.
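The trade-off described above can be illustrated with a minimal sketch. The exact forms used here are assumptions made only for illustration: the proportional cost is taken as each agent's travel time divided by the team maximum, the proportional gain as the fraction of the unexplored area that would be covered, and the utility as their difference. The function names and the example numbers are hypothetical.

```python
def proportional_cost(times_to_cell):
    """Scale each agent's travel time by the team maximum (assumed form of P(C_R))."""
    t_max = max(times_to_cell)
    if t_max == 0:
        return [0.0 for _ in times_to_cell]
    return [t / t_max for t in times_to_cell]


def proportional_gain(new_area, total_unexplored):
    """Fraction of the unexplored area the sensors would cover (assumed form of P(G_C))."""
    if total_unexplored == 0:
        return 0.0
    return new_area / total_unexplored


def utility(gain, cost):
    """Assumed trade-off: coverage gain minus travel cost."""
    return gain - cost


# Three agents (m = 3) need 4, 8 and 2 time units to reach the same grid cell.
costs = proportional_cost([4, 8, 2])
gain = proportional_gain(new_area=3, total_unexplored=12)
utilities = [utility(gain, c) for c in costs]
best_agent = max(range(len(utilities)), key=utilities.__getitem__)
print(costs, gain, best_agent)
```

Under these assumptions, the agent that reaches the cell fastest relative to its teammates obtains the highest utility for that cell, which is the behavior the proportional cost is meant to capture.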

Implementation
The proposed approach is implemented in a simulation platform for multi-agent systems in indoor environments. Figure 2 shows the simulation tool for the indoor environment. The indoor platform has a total area of 180 cm × 150 cm, divided into 30 grid cells of 30 cm × 30 cm. Each cell is identified with a number from 1 to 30. Obstacles and targets are positioned randomly, and up to three homogeneous agents are used.
Here, the modeling approach for the utility function defined in Equation (2), used to allocate the new grid cell where an agent should go and the path to reach it, is presented. In the proposed approach, the shortest path and the cost to reach a particular grid cell are obtained with a modified version of Dijkstra's algorithm, in which the cost of moving from one cell to another is dynamic because it depends on the agent orientation.
Dijkstra's algorithm is a graph-based search algorithm that solves the single-source shortest-path problem for a graph with nonnegative edge costs, producing a shortest-path tree. A basic description of this algorithm is as follows.

1. Assign to every node a tentative distance value: set it to zero for the initial node and to infinity for all other nodes.
2. Mark all nodes unvisited. Set the initial node as current. Create a set of the unvisited nodes consisting of all nodes except the initial node.
3. For the current node, consider all of its unvisited neighbors and calculate their tentative distances through the current node, keeping the smaller of the newly calculated and the previously assigned value. Even though a neighbor has been examined, it is not marked as visited at this time, and it remains in the unvisited set.
4. When all of the neighbors of the current node have been considered, the current node is marked as visited and removed from the unvisited set. A visited node will never be rechecked; its distance recorded now is final and minimal.
5. If the destination node has been marked visited (when planning a route between two specific nodes) or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal), then the process stops. The algorithm has finished.
6. Set the unvisited node marked with the smallest tentative distance as the next "current node" and go back to Step 3.
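The steps above, combined with an orientation-dependent cost, can be sketched as follows. This is one possible way to make the cost depend on the agent orientation, not necessarily the modification used in the simulation platform: the heading is added to the search state, so that a node is a (row, column, heading) triple. The unit costs (1 per 90-degree rotation, 4 per step to the adjacent cell) are the ones given in the Nearest Cell Exploration subsection; the grid layout, the obstacle position, and the function name `cheapest_costs` are hypothetical. A priority queue replaces the explicit scan of the unvisited set in Step 6.

```python
import heapq

# Headings as (d_row, d_col): 0 = north, 1 = east, 2 = south, 3 = west.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
ROTATE_COST = 1   # one 90-degree turn
FORWARD_COST = 4  # one step to the adjacent cell


def cheapest_costs(rows, cols, obstacles, start, heading):
    """Dijkstra over (row, col, heading) states; returns {cell: minimum cost}."""
    start_state = (start[0], start[1], heading)
    dist = {start_state: 0}
    queue = [(0, start_state)]
    while queue:
        cost, state = heapq.heappop(queue)
        if cost > dist.get(state, float("inf")):
            continue  # stale queue entry; this state was already settled
        r, c, h = state
        # Turning left or right keeps the cell and changes the heading.
        neighbors = [(cost + ROTATE_COST, (r, c, (h + 1) % 4)),
                     (cost + ROTATE_COST, (r, c, (h - 1) % 4))]
        # Moving forward keeps the heading and changes the cell.
        nr, nc = r + HEADINGS[h][0], c + HEADINGS[h][1]
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in obstacles:
            neighbors.append((cost + FORWARD_COST, (nr, nc, h)))
        for new_cost, new_state in neighbors:
            if new_cost < dist.get(new_state, float("inf")):
                dist[new_state] = new_cost
                heapq.heappush(queue, (new_cost, new_state))
    # The cost of a cell is the cheapest over all arrival headings.
    best = {}
    for (r, c, _), cost in dist.items():
        best[(r, c)] = min(cost, best.get((r, c), float("inf")))
    return best


# 5 x 6 grid (30 cells), agent in the top-left cell facing east, one obstacle ahead.
costs = cheapest_costs(rows=5, cols=6, obstacles={(0, 1)}, start=(0, 0), heading=1)
print(costs[(1, 0)], costs[(1, 1)])
```

In this example the cell directly below the agent costs one rotation plus one step, and the diagonal cell costs two rotations plus two steps because the obstacle blocks the straight move.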
The following is a brief explanation of the most important exploration algorithms implemented, with an example situation and the evaluation function values. In the graphs presented below (e.g., Figure 2), each agent is identified with a square of a given color, and the grid cell to which it has been assigned is indicated with a star of the same color. The next graphs (Figures 3-13) represent the decision function evaluation of the blue agent, indicating the chosen cell.
The exploration algorithms differ in the knowledge about teammates and the environment that they need in order to run. The Random Exploration algorithm does not use any information about the environment or the teammates. It is the least intelligent algorithm and aims for short response times, as discussed below. Nearest Cell Exploration and Farthest Cell Exploration need to run the shortest path algorithm to obtain the cost to reach each cell. Frontier-based Exploration and Less Covered and Farthest Cell Exploration additionally require the construction of the global environment coverage map to make decisions. Among the proposed exploration algorithms, the Correlation Filter Exploration needs the global environment coverage image, and the Task Allocation based on Utility Function Approach (a neural-network-based approach) adds the positions of the teammates to calculate the Euclidean distance. In addition, a genetic algorithm optimizes the utility function with heavy computation because it calculates, cell by cell, the actual coverage area expected and the proportional cost. In that case, each agent must know (and save in memory) the positions and paths of the teammates, the global environment coverage, the travel cost to each cell, and the actual coverage gain obtained by visiting each cell. Thus, optimal decisions and multiple-agent coordination are ensured.

Random Exploration
A cell is selected randomly from all the cells that a robotic agent can reach and that have not been visited. In random exploration, actions are randomly generated with a uniform probability distribution, independent of exploration costs or expected rewards. Strategies of this group, such as random exploration, Boltzmann-distributed exploration, and semi-uniform distributed exploration, do not use any exploration-specific knowledge and ensure exploration by merging randomness into action selection. In this approach, the assigned cells are drawn pseudorandomly with a uniform distribution using the Mersenne Twister algorithm [40].
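A minimal sketch of this selection rule is given below. It relies on the fact that Python's standard `random` module uses the Mersenne Twister as its core generator; the function name `random_cell`, the seed, and the visited set are hypothetical and serve only to make the example repeatable.

```python
import random


def random_cell(reachable, visited, rng):
    """Pick uniformly among reachable cells that have not been visited."""
    candidates = sorted(set(reachable) - set(visited))
    if not candidates:
        return None  # nothing left to explore
    return rng.choice(candidates)


rng = random.Random(2408)  # hypothetical seed, only to make runs repeatable
target = random_cell(reachable=range(1, 31), visited={1, 2, 7}, rng=rng)
print(target)
```

No cost or coverage information enters the decision, which is why the response time of this strategy is short.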

Nearest Cell Exploration
As the environment has 30 grid cells, the input to this algorithm is the cost to reach each cell that is unvisited but accessible from the initial cell of the agent. This cost is determined by the moves that the robot requires to reach a particular cell. Here, the cost is one unit for a 90-degree rotation and four units for moving from one cell to the next one. This cost can represent the power consumption of the robot to produce these movements. Thus, this algorithm finds the cell that has the minimum cost to reach (see Figure 3).
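The selection can be sketched in a few lines. The cost table below is hypothetical; its values follow the unit costs just described for an agent in cell 1 of a six-column grid facing cell 2 (for instance, cell 7 requires one rotation and one step, giving 1 + 4 = 5). Breaking ties by the lowest cell number is an assumption.

```python
def nearest_cell(costs, visited):
    """Return the unvisited, reachable cell with the lowest cost (ties: lowest id)."""
    candidates = {cell: c for cell, c in costs.items() if cell not in visited}
    if not candidates:
        return None
    return min(sorted(candidates), key=candidates.get)


# Hypothetical costs: 4 units per step plus 1 unit per 90-degree rotation.
costs = {2: 4, 3: 8, 7: 5, 8: 9}
print(nearest_cell(costs, visited={2}))
```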

Farthest Cell Exploration
Similarly, the input to this algorithm is the cost to reach each cell that is unvisited but accessible from the initial cell of the agent; however, this algorithm finds the cell with the maximum cost. Even though it seems contradictory to use the maximum cost, choosing the farthest cell actually improves area coverage. It is important to note that this technique is only valid in known environments (see Figure 4).

Frontier-Based Exploration
The idea of the frontier-based exploration strategy is to detect the borders between the environment regions already explored and those regions about which the agent has not acquired information yet. Hence, the agent looks for regions in the ongoing map construction that are traversable and adjacent to unexplored regions or holes in the map. The input to this algorithm is the sum of the proportional cost to reach, from the origin cell, each cell that the robotic agent can reach and has not visited, and the percentage of that cell that has already been covered. Thus, this algorithm finds the least covered and nearest cell, which minimizes the cost function of Equation (4) (see Figure 5),
for each agent i and for j = 1, 2, ..., n, where n is the number of unvisited cells with no obstacles in them.
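The description above can be turned into a small sketch. The precise form is an assumption based only on the wording of the text: the travel cost is normalized by the largest travel cost among the candidate cells and added to the covered fraction of each cell, and the cell with the smallest sum is selected. The function name `frontier_cell` and the example values are hypothetical.

```python
def frontier_cell(travel_cost, covered_fraction):
    """Pick the cell minimizing proportional travel cost plus covered fraction."""
    max_cost = max(travel_cost.values())
    scores = {}
    for cell, cost in travel_cost.items():
        proportional = cost / max_cost if max_cost else 0.0
        scores[cell] = proportional + covered_fraction[cell]
    best = min(sorted(scores), key=scores.get)
    return best, scores


travel_cost = {8: 5, 9: 10, 14: 20}           # hypothetical costs to reach each cell
covered_fraction = {8: 0.9, 9: 0.2, 14: 0.0}  # 0 = unseen, 1 = fully covered
best, scores = frontier_cell(travel_cost, covered_fraction)
print(best, scores)
```

In this example, cell 8 is the cheapest to reach but is already almost fully covered, and cell 14 is unseen but far away, so the intermediate cell 9 obtains the lowest combined score.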

Less Covered and Farthest Cell Exploration
This algorithm input is the sum of the proportional cost to reach each unvisited cell but accessible from the agent initial cell. The cost assignment goes from 0% to 100% as needed so that each cell is covered in its entirety. This algorithm finds the cell that max-

Less Covered and Farthest Cell Exploration
The input of this algorithm is the sum of the proportional costs to reach each unvisited cell that is accessible from the agent's initial cell. The cost assignment goes from 0% to 100% as needed so that each cell is covered in its entirety. This algorithm finds the cell that maximizes the cost function of Equation (5) (see Figure 6).
for each agent i and for j = 1, 2, ..., n, where n is the number of unvisited cells with no obstacles in them.

Correlation Filter Exploration
This algorithm maximizes the use of the global coverage area constructed by each agent. It is based on a binary image of the environment, where the pixels marked in black are positions that have been covered or that contain obstacles (see Figure 7). Using the region observed by the agent's sensor as a correlation filter, it is possible to find the points where maximum coverage is achieved; among them, the position with the lowest cost to reach is chosen.

The basic idea in this approach is the implementation of correlation in 2D. Given a square filter, the correlation result is computed by aligning the center of the filter with a pixel; all overlapping values are then multiplied together and the products are summed, as shown in Equation (6), where CF is the Correlation Filter of size 2N × 2N and EBI is the Environment Binary Image. Figure 8 presents an example in the simulation environment and the correlation levels between the environment and the sensor area.
Two methods are used as input for obtaining the cost and gain without requiring high computational resources, since neural networks are used to model the utility function. The Euclidean distance from each agent to the specific cell represents the cost. As a new approach, the gain is obtained from the correlation level with the image of the environment at a specific cell, computed by applying the correlation filter described above. High correlation levels point out positions in the environment where the agent can obtain maximum coverage.
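The 2D correlation and the Euclidean cost described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the image uses 1 for uncovered free pixels and 0 for covered or obstacle pixels, so a high correlation means more new coverage; the filter has an odd size so that its center is unambiguous (the paper states 2N × 2N); pixels outside the image count as 0; and the names `correlation_map` and `best_position` are hypothetical.

```python
import math


def correlation_map(ebi, cf):
    """Correlate filter cf with binary image ebi, treating outside pixels as 0."""
    rows, cols = len(ebi), len(ebi[0])
    k_rows, k_cols = len(cf), len(cf[0])
    off_r, off_c = k_rows // 2, k_cols // 2  # filter centre offsets
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - off_r, c + j - off_c
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += cf[i][j] * ebi[rr][cc]  # multiply and add
            out[r][c] = total
    return out


def best_position(ebi, cf, agent):
    """Among the maximum-correlation pixels, pick the one nearest to the agent."""
    corr = correlation_map(ebi, cf)
    peak = max(max(row) for row in corr)
    peaks = [(r, c) for r, row in enumerate(corr) for c, v in enumerate(row) if v == peak]
    return min(peaks, key=lambda cell: math.dist(agent, cell))


# Assumed encoding: 1 = uncovered free pixel, 0 = covered pixel or obstacle.
ebi = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
cf = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3 x 3 sensor footprint (assumption)
corr = correlation_map(ebi, cf)
goal = best_position(ebi, cf, agent=(3, 3))
```

Here the correlation level plays the role of the gain and `math.dist` the role of the cost, so the two inputs of the utility function are obtained without any path planning.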
The utility function with extensive computation is used to train the neural network, obtaining good performance with feed-forward back-propagation network architecture, three hidden layers, and sigmoid tangential transfer functions. On the other hand, it is proposed to optimize the running of the utility function exploiting the parallel computing nature of genetic algorithms (GAs).
The implemented GAs have:

1. A genetic representation of the solution domain: populations or chromosomes are the grid cells not visited by the agents.
2. The fitness function to evaluate the solution domain is the utility function with extensive computation.
3. A standard representation of the solution is an array of bits with the size of the number of grid cells in the environment. The position set to 1 is the chosen cell. The main property that makes these genetic representations practical is that their components are easily aligned due to their fixed size, simplifying crossover operations.
4. The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions (usually randomly); and then, improves it through repetitive application of the selection, crossover, and mutation operators.
5. For this, the objective function to maximize is the utility function proposed in (2).

Next, an example is presented for a task assignment process. Figures 9-12 show the proportional costs, gains, and utility values for the decision in the example case.
Figure 12 shows that the function modeled with neural networks does not exactly reproduce the utility function with extensive computation optimized with the GA, but the decision obtained is the same. Figure 13 shows the need to model and optimize the utility function given the response times of each algorithm. To illustrate this, 100 decision tests are performed in different environments for each algorithm, recording the response time of each decision.
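The GA setup listed above (a fixed-size bit array with a single 1 marking the chosen cell, a fitness equal to the utility of that cell, and repeated selection, crossover, and mutation) can be sketched as follows. Everything specific here is an assumption for illustration: the `utility` list is a hypothetical stand-in for the utility function in (2), and the repair step, truncation selection, elitism, and parameter values are simple choices that the paper does not specify.

```python
import random


def decode(chromosome):
    """Index of the chosen cell: the single position set to 1."""
    return chromosome.index(1)


def repair(bits, rng):
    """Force a bit array back to exactly one 1 (assumed repair strategy)."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    keep = rng.choice(ones) if ones else rng.randrange(len(bits))
    return [1 if i == keep else 0 for i in range(len(bits))]


def crossover(a, b, rng):
    """Single-point crossover; fixed-size arrays align trivially."""
    point = rng.randrange(1, len(a))
    return repair(a[:point] + b[point:], rng)


def mutate(bits, rng, rate):
    """With probability rate, move the 1 to a random position."""
    if rng.random() < rate:
        keep = rng.randrange(len(bits))
        return [1 if i == keep else 0 for i in range(len(bits))]
    return bits


def run_ga(utility, pop_size=8, generations=30, mutation_rate=0.2, seed=0):
    """Maximise utility[cell] over cells; returns (best cell, best-fitness history)."""
    rng = random.Random(seed)
    n_cells = len(utility)

    def fitness(chromosome):
        return utility[decode(chromosome)]

    population = [repair([0] * n_cells, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism keeps the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng, mutation_rate))
        population = children
    return decode(max(population, key=fitness)), history


# Hypothetical utility per grid cell (visited cells could simply be given 0.0).
utility = [0.1, 0.4, 0.0, 0.9, 0.3, 0.2]
cell, history = run_ga(utility)
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases; whether the global optimum is reached still depends on the population size, the number of generations, and the random seed.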

Knowledge Comparison Among Algorithms
To complement the decision-making algorithm implementation stage, based on (1), Table 1 presents the knowledge acquired from teammates and the environment that each algorithm needs to run. In advance, every algorithm knows the map of the environment and the obstacle grid cells.

Negotiation Algorithms
As explained in the proposed hybrid coordination mechanism, the negotiation algorithm is based on choosing the minimum cost decision (time to complete the task) from four approaches: waiting, teammate waiting, task switching, or new path assignment. Next, four example cases are presented to show the operation of each negotiation algorithm, present the collision scene (the collision cell is marked with a yellow star), and report the obtained solution.

Waiting
The Waiting algorithm keeps the robotic agent still for a specific time until no collisions are detected (see Figure 14).

Teammate Waiting
The Teammate Waiting algorithm keeps the collision partner still for a specific time until no further collisions are detected (see Figure 15).

Task Switching
The Task Switching algorithm exchanges the target cells between the colliding robotic agents. With this approach no robotic agent has to stop exploring, which increases the possibility of gaining coverage (see Figure 16).

Figure 16. Collision scene and the best solution implementing the task switching approach to minimize the time to complete the task by the magenta robot.

New Path Assignment
This approach finds a new route to reach the target cell assigned in the task allocation process, treating the collision cell as an obstacle. Since a new route cannot always be found, this approach does not always provide a solution for a collision event (see Figure 17).
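The negotiation rule described in this section, choosing the option with the minimum time to complete the task among the four approaches, can be sketched as follows. The option names, the example times, and the convention of using `None` for an approach that yields no solution (for instance, when no alternative path exists) are assumptions made for illustration.

```python
def negotiate(costs):
    """Return the feasible option with the smallest completion time.

    costs maps an option name to its estimated time to complete the task,
    or to None when that option gives no solution for the collision.
    """
    feasible = {name: time for name, time in costs.items() if time is not None}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical collision scene: no alternative route exists for this agent.
scene = {
    "waiting": 6.0,
    "teammate_waiting": 7.5,
    "task_switching": 4.0,
    "new_path": None,
}
choice = negotiate(scene)
```

In this example the task switching option is selected because it has the lowest estimated completion time among the options that produce a solution.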

Experimental Results
The approach is implemented in the simulation environment presented in Section 4. Each experiment involves exploring the environment with three agents until 97% environment coverage is achieved (29 out of 30 cells). Moreover, each experiment is performed with a different, random configuration of the environment to make data collection independent. This includes different positions of the obstacles, targets, and agents.
Additionally, a series of experiments is performed to obtain a quantitative assessment of the improvements of the proposed approach over non-intelligent techniques.
The experimental design is defined as follows:
Hypothesis: To which grid cell or node in the map should an agent move to minimize the time it takes to fully explore the environment and find the maximum number of possible targets, given its teammates' positions and the areas already explored? What is the exploration algorithm for achieving that objective?
Response Variables:
1. Response Time is the machine time in seconds for each algorithm to do the allocation process for the task.
2. Decision Cost represents the time required to complete the task.
3. Decision Gain is the percentage-gain coverage that is expected when the task is completed.
4. Social Interaction Level determines if a task assignment produces collision or not.
5. Coordination Level measures the capability of the exploration algorithm to use multiple agents. It is performed using the standard deviation of the individual cost for each agent.
6. Re-Exploration Level is the percentage of the re-explored environment when the coverage is 90%.
7. Exploration Time is the time when the agents reach 97% coverage of the environment, and it is measured in seconds.
8. Coverage Effectiveness is the percentage of the environment covered in 60 s.
9. Search Effectiveness is the percentage of targets found in 60 s.
10. Collision Avoidance Algorithm determines which is the algorithm selected when a collision occurs.
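Two of the response variables above reduce to short computations, sketched here in Python. The use of the population standard deviation (rather than the sample one) for the Coordination Level is an assumption, since the text does not say which is used, and the cost values are made up for illustration.

```python
from statistics import pstdev


def coordination_level(individual_costs):
    """Standard deviation of the per-agent costs; lower means more even work."""
    return pstdev(individual_costs)


def coverage_percent(covered_cells, total_cells):
    """Percentage of the environment that has been covered."""
    return 100.0 * covered_cells / total_cells


# Hypothetical task times (s) for three agents.
balanced = coordination_level([10, 10, 10])
unbalanced = coordination_level([4, 10, 16])

# Stopping criterion used in the experiments: 29 out of 30 cells.
stop_coverage = coverage_percent(29, 30)
```

A perfectly even distribution of work gives a coordination level of zero, and 29 covered cells out of 30 correspond to roughly 96.7%, which the text rounds to 97%.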
Factors: exploration algorithms. The proposed approach is compared against previous non-intelligent techniques. The non-intelligent techniques considered are (1) Random exploration, (2) Nearest cell exploration, (3) Farthest cell exploration, (4) Frontier-based exploration, and (5) Less covered and farthest cell exploration; the proposed approach comprises (6) Exploration based on correlation filters, (7) Utility function modeled using neural networks, and (8) Utility function optimized using GAs. Figure 18 shows the legend for each of the following figures, which summarize the results of the tested algorithms.
Three graphs are developed for some response variables: cumulative average, box plots, and a multiple comparison test. In some cases, the cumulative average and the box plots show differences between the exploration algorithms, but a comparison method is necessary since some confidence intervals overlap. In particular, it is necessary to know which pairs of means are significantly different and which are not. A test that gives such information is known as a multiple comparison procedure. If a t-test were applied, the alpha value would apply to each comparison; therefore, the chance of incorrectly finding a significant difference would increase with the number of comparisons. Multiple comparison procedures are designed to provide an upper bound on the probability that any comparison is incorrectly considered significant. To do this, Tukey's honestly significant difference criterion (HSD or Tukey-Kramer) is applied together with an analysis of variance (ANOVA) to identify which means are significantly different from each other. In these graphs, two means are significantly different if their intervals are disjoint, and are not significantly different if their intervals overlap.
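The argument against repeated t-tests can be made concrete with a small calculation. The sketch below assumes, for simplicity, that the pairwise comparisons are independent and that alpha is 0.05; neither value is stated in the text. It also encodes the reading rule for the comparison graphs (disjoint intervals mean a significant difference). The function names and the interval values are hypothetical.

```python
from math import comb


def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the chance of at least one false positive.

    Assumption: comparisons are independent, each tested at level alpha.
    """
    comparisons = comb(n_groups, 2)
    return comparisons, 1 - (1 - alpha) ** comparisons


def intervals_disjoint(a, b):
    """True when intervals a = (low, high) and b = (low, high) do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


# Eight exploration algorithms are compared in the experiments.
n_comparisons, error_rate = familywise_error(8)

# Hypothetical comparison intervals for two algorithms.
different = intervals_disjoint((1.0, 2.0), (2.5, 3.5))
```

With eight algorithms there are 28 pairwise comparisons, and under these assumptions the chance of at least one spurious significant result is roughly 0.76, which is why a procedure that bounds the overall error, such as Tukey's HSD, is preferred.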
Regarding the response time variable, the three proposed exploration algorithms take longer to perform the decision-making process than the non-intelligent approaches, as seen in Figure 19. This supports what was stated regarding the information required for the execution of each algorithm. Based on Figure 19, the non-intelligent approaches do not differ statistically from one another in response time, whereas the correlation filter, neural networks, and genetic algorithm are statistically different.
The decision cost measures the average time needed to complete the task assigned by the exploration algorithm. Based on Figure 20, it is interesting to note that the utility function optimized with a genetic algorithm behaves similarly to nearest cell exploration. Furthermore, farthest cell exploration and less covered and farthest cell exploration are shown to be the costliest decisions in terms of time and power consumption.
The decision gain measures the average coverage area obtained when the robotic agent completes the task assigned by the exploration algorithm. Based on Figure 21, the decisions taken using the correlation filter algorithm obtain the most coverage gain compared to the other algorithms, which shows its capacity to maximize the use of the global coverage area constructed by each robotic agent.
The results for the social interaction level indicate that an algorithm with a high collision rate is not very sociable, because its decisions affect its teammates. The utility function optimized with the genetic algorithm presents the lowest collision frequency, demonstrating that it is a sociable algorithm. The results presented in Figure 22 allow the exploration algorithms to be classified by social level.
The coordination level (individual cost, i.e., time to complete the tasks) measures an algorithm's ability to distribute the work evenly among robotic agents. Figure 23 shows that the algorithms that head for the farthest cell are the ones with lower coordination. Again, the utility function optimized with a genetic algorithm, together with nearest cell and frontier-based exploration, presents the best coordination levels.
One of the most important features of exploration algorithms is avoiding re-exploration, because it is considered a loss of power and time for the robot team. Based on Figure 24, the superiority of the utility function optimized with a genetic algorithm over the other algorithms is clear.
The most important goal in an exploration problem is time. If high coordination with three agents in the environment is required, Figure 25 shows that exploration algorithms that trade off coverage gain against the time cost to reach a specific grid cell perform better. Moreover, the farthest and random cell algorithms re-explore the environment, which increases the exploration time.
Given the physical characteristics of the environment used to obtain these results, the analysis of the next variables aims to determine the coverage and search effectiveness of each algorithm regardless of the environment. As a first step, both variables increase as the number of robotic agents in the environment increases, as shown in Figures 26 and 27.
Additionally, Figure 28 shows how frequently each negotiation algorithm is selected for collision avoidance in the experiments. The results show that the most appropriate algorithm is actually selected according to the collision scene.
Figure 29 shows a radar chart (spider plot) that permits selecting the appropriate exploration algorithm for any restriction or mission goal represented by the response variables analyzed for the indoor environment. It presents normalized values for each response variable and the results for three-agent exploration.
Before starting the analysis of results, it is essential to define an exploration algorithm intelligence, such as its ability to generate important decisions (in terms of environment coverage) in minimum time with the least amount and quality of information from teammates and the environment.
About the response time, the three proposed exploration algorithms take a longer time to perform the decision-making process compared to the non-intelligent approaches. This proves the stated above regarding the information necessary for the execution of each algorithm.  Additionally, Figure 28 shows the frequency of collision avoidance performance for each negotiation algorithm in the experiments. The results show how the algorithms actually select the most appropriate according to the collision scene.  Figure 29 shows a radar chart or spider plot that permits selecting of the appropriate exploration algorithm for any restriction or mission goal represented with the response variables analyzed for the indoor environment. This presents normalized values for each variable response and results for three-agent exploration.
The decision cost measures the average time needed to complete the task assigned by the exploration algorithm. Based on Figure 29, it is interesting to note that the utility function optimized with a GA has a decision cost similar to that of the nearest cell exploration. Furthermore, the farthest cell exploration and the less covered and farthest cell exploration make the most expensive decisions (in time and power consumption).
The decision gain measures the average coverage area obtained when the agent completes the task assigned by the exploration algorithm. Based on Figure 29, it is clear that the decisions of the correlation filter algorithm obtain more coverage gain than those of the other algorithms, which indicates its capacity to maximize the use of global coverage area construction by each agent.
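Since decision cost and decision gain are both averages over completed tasks, they can be sketched as simple means over a task log. The record type `CompletedTask`, its field names, and the sample log are hypothetical and serve only to make the two definitions concrete.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CompletedTask:
    duration_s: float       # time the agent needed to complete the assigned task
    new_cells_covered: int  # coverage obtained when the task was completed


def decision_cost(tasks):
    """Average time needed to complete an assigned task."""
    return mean(task.duration_s for task in tasks)


def decision_gain(tasks):
    """Average coverage obtained per completed task."""
    return mean(task.new_cells_covered for task in tasks)


# Hypothetical log for one agent (not data from the experiments).
task_log = [
    CompletedTask(duration_s=2.0, new_cells_covered=4),
    CompletedTask(duration_s=3.0, new_cells_covered=6),
    CompletedTask(duration_s=4.0, new_cells_covered=8),
]
```

An algorithm with low `decision_cost` and high `decision_gain` makes cheap decisions that still uncover a large area.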
The results obtained in the social interactions section indicate that an algorithm with a high collision rate is not very sociable, since agents' decisions affect their teammates. The utility function optimized with a GA had the lowest collision frequency, which makes it a sociable algorithm.
The coordination level measures the ability to distribute the work evenly among agents. Clearly, the algorithms that travel to the farthest cell show the lowest coordination. Again, the utility function optimized with a GA, together with the nearest cell and frontier-based exploration, shows the best coordination levels.
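One simple way to quantify how evenly work is distributed is the ratio between the smallest and largest amount of area covered by any agent. This particular ratio, the function name `coordination_level`, and the sample inputs are assumptions for illustration; they are not necessarily the measure used in the experiments.

```python
def coordination_level(cells_per_agent):
    """Return a value in [0, 1]; 1.0 means every agent covered the same area."""
    if not cells_per_agent or max(cells_per_agent) == 0:
        return 0.0
    return min(cells_per_agent) / max(cells_per_agent)


# Hypothetical coverage counts for three agents.
balanced_team = coordination_level([40, 40, 40])    # evenly shared work
unbalanced_team = coordination_level([10, 40, 50])  # one agent does little
```

Under this sketch, a team whose agents are sent to far-away cells and overlap each other's regions would tend to show a lower value than a team that splits the environment evenly.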
One of the most important features of an exploration algorithm is re-exploration avoidance, because re-exploration is considered a waste of power and time for the agent team. Figure 29 shows the superiority of the utility function optimized with a GA over the other algorithms.
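Re-exploration can be illustrated as the fraction of cell visits that land on a cell the team has already visited. The function name `re_exploration_ratio` and the grid-cell representation as `(row, column)` tuples are assumptions made for this sketch.

```python
def re_exploration_ratio(visited_cells):
    """Fraction of visits that landed on a cell already visited by the team."""
    seen = set()
    repeats = 0
    for cell in visited_cells:
        if cell in seen:
            repeats += 1  # this visit adds no new coverage
        else:
            seen.add(cell)
    return repeats / len(visited_cells) if visited_cells else 0.0


# Hypothetical visit sequence: cell (0, 0) is visited twice.
visits = [(0, 0), (0, 1), (0, 0), (1, 1)]
ratio = re_exploration_ratio(visits)
```

A lower ratio means less time and power spent on cells that contribute nothing new to the map.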
The most important goal in the exploration problem is the exploration time. In single-agent systems, the nearest cell exploration presents the best performance. The exploration algorithms that consider the trade-off between coverage gain and the time cost of reaching a particular grid cell perform better under a high coordination requirement. Moreover, the farthest and random cell algorithms re-explore the environment, which increases the exploration time.
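The trade-off between coverage gain and time cost can be sketched as choosing the candidate cell with the highest utility. The linear form `gain - weight * cost`, the function name `select_target`, and the candidate values below are assumptions for illustration, not the fitted or GA-optimized utility function evaluated in this work.

```python
def select_target(candidates, weight=1.0):
    """Pick the cell maximizing coverage gain minus weighted time cost.

    candidates: list of (cell, coverage_gain, time_cost) tuples.
    weight: hypothetical factor converting time cost into gain units.
    """
    best = max(candidates, key=lambda c: c[1] - weight * c[2])
    return best[0]


# Hypothetical candidate cells: (cell, expected coverage gain, time to reach).
candidates = [
    ((1, 1), 5.0, 1.0),   # close, moderate gain
    ((9, 9), 8.0, 10.0),  # far, highest gain
    ((3, 2), 7.0, 2.0),   # fairly close, high gain
]
balanced_choice = select_target(candidates)               # weighs gain against cost
gain_only_choice = select_target(candidates, weight=0.0)  # ignores travel time
```

With the cost ignored, the sketch behaves like a farthest-cell strategy whenever the highest gain lies far away, which is the behavior associated above with longer exploration times.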

Conclusions
The proposed approach is inspired by an indoor environment with physical agents, and its performance is tested in simulations. The results show that our proposal effectively allocates the agents over the environment and enables them to carry out their mission quickly. They also demonstrate that our coordination method outperforms other methods developed so far.
The goal, then, is not to conclude which exploration algorithm is better, but to characterize each one in terms of the response variables discussed above. The results show that the exploration algorithms that consider the time cost of the mission perform better. This is reflected in the fact that the farthest and random exploration algorithms re-explore the environment, which increases the exploration time. On the other hand, algorithms that trade off coverage gain against the time cost of reaching a particular grid cell achieve better coverage efficiency. Under a high coordination requirement, with more agents in the environment, the proposed intelligent task allocation algorithm based on optimizing a utility function using genetic algorithms and neural networks is significantly better than the other approaches.
The following future work can be considered: making task allocation dynamic in the decision-making algorithm, implementing intelligent techniques in the negotiation process, and implementing these algorithms in dynamic environments.