Multi-Robot Exploration Based on Multi-Objective Grey Wolf Optimizer

Abstract: In this paper, we apply multi-objective optimization to the exploration of unknown space. Exploration is the process of generating models of environments from sensor data, and its goal here is to create a finite map of an indoor space. It is common practice in mobile robotics to treat exploration as a single-objective problem, namely maximizing the search for uncertainty. In this study, we propose a new exploration methodology with two conflicting objectives: to search for new places and to enhance map accuracy. The proposed multi-objective exploration uses the Multi-Objective Grey Wolf Optimizer (MOGWO) algorithm. It begins with the initialization of the grey wolf population, which consists of the waypoints in our multi-robot exploration. Once the waypoint positions are set, they remain unchanged through all iterations. The role of updating positions belongs to the robots, which select the non-dominated waypoints among them; this selection results from two objective functions. The performance of the multi-objective exploration is presented, the trade-off between the objective functions is unveiled by the Pareto-optimal solutions, and a comparison with other algorithms is carried out at the end.


Introduction
In robotics, exploration pertains to the process of scanning and mapping out an environment to produce a map, which can be used by a robot or a group of robots for further work. Depending on the type of environment, exploration can be outdoor, indoor, or underwater, carried out by a single mobile robot or by a multi-robot system [1,2]. In this study, we focus on indoor exploration by robots equipped with ranging sensors. It can be assumed that such robots with onboard sensors could scan an environment without any difficulty simply by wandering around randomly. However, such motion is not efficient and can result in incomplete map coverage. As a solution to this issue, this paper proposes an algorithm that enhances the efficiency of multi-robot exploration by using a multi-objective optimization strategy.
Naturally, all real-world optimization problems in engineering pursue multiple goals, which may differ across fields; what is common to all is that problems are optimized by maximizing or minimizing functions. In the past, for lack of suitable solution methodologies, a multi-objective optimization problem (MOOP) was often reduced to a single-objective optimization problem and solved with one function. With the development of evolutionary algorithms, new techniques that seek to optimize two or more conflicting objectives in one simulation run have been applied to MOOPs. This research area is named multi-objective optimization (MOO) [3].
In robotics, studies related to optimization have been gaining wide attention [4]. If we consider multi-robot systems [5], optimization is popularly applied in path planning [6], formation [7], exploration [8], and other fields where decision-making control needs to be optimized. Previous research conventionally treated these as separate single-objective tasks: the shortest path, obstacle-free motion, the smoothest route, and the constant search of uncertain terrain. A renewed impact of optimization on robotics has come from metaheuristics and their nature-inspired optimization techniques [9]. Nature-inspired algorithms are not restricted to robotics but also have significant applications in other fields, which is why they have attracted the attention of scholars [10,11].
Metaheuristic algorithms are optimization approaches that emulate the intelligence of various species of animals in nature. By the number of agents, metaheuristic algorithms are classified into single-solution-based and population-based algorithms. In both classes, the solutions improve over the course of iterations, with one single agent or an entire swarm of agents, respectively. The main advantage of population-based approaches is their ability to avoid stagnation in local optima thanks to the number of agents: a swarm can explore the search space more broadly and faster than a single agent. Nevertheless, the benefit of one class over the other depends on the application to a certain problem. It is also important to mention the No-Free-Lunch (NFL) theorem for optimization, which states that no algorithm delivers optimal solutions by all criteria and in all domains [12].
Regardless of the number of agents, metaheuristic algorithms can also be classified into single- and multi-objective optimization techniques according to the number of objective functions. Multi-objective optimization extends the single-objective approach: it allows finding optimal solutions of two conflicting objectives simultaneously. To select just one best solution from the available ones, a trade-off must be considered among them; the Pareto-optimal front helps pick a suitable solution satisfying both objective functions.
Using the MOGWO exploration, we defined two objectives for optimization, namely the maximization of the search for new area and the minimization of the inaccuracy of the explored map. The search process is thus divided into two stages (Figure 1), which switch during the simulation run depending on the value of the GWO parameter a. If the value is greater than one, the algorithm searches occluded space; if it is less than one, it increases the map accuracy by repeated visits to the explored space. It should be emphasized that an occupancy grid map with probabilistic cell values is used in this study [23].
The MOGWO exploration employs static waypoints in the simulation, which promotes the efficient exploration of an indoor simulated environment. Note that the waypoints belong to the programmed logic of the algorithm and are not supposed to be placed in a real environment [24]. The waypoints are the grey wolf agents, each carrying probability-value costs. In each iteration, the robots search for the alpha, beta, and gamma waypoints and save them in an archive, which avoids the selection of the same non-dominated waypoints by several robots. The waypoint selection results from two objective functions. After the selection, each robot computes its next position, which is the frontier cell [25] closest to the average position of the alpha, beta, and gamma waypoints. This paper is organized as follows: in Section 2, we briefly recall different algorithms of multi-robot exploration and evolutionary optimization techniques used in related works. In Section 3, the theory of GWO and MOGWO is presented. Sections 4 and 5 are dedicated to the proposed MOGWO exploration and its performance. Section 6 concludes the present study.

Related Work
In the last two decades, many techniques have been proposed for robot exploration. Among them, there are novel fundamental, hybridized, and modified methods. In this section, studies on the different algorithms and the impact of the evolutionary optimization techniques in exploration are discussed.
Considering exploration as one of the branches of robotics, Yamauchi's frontier-based method is the pioneering work in this field [26]. Since then, many frontier-based studies have appeared, most of which hybridized or modified the method with success.
Coordinated multi-robot exploration (CME) is frontier-based, with an emphasis on the cooperative work of a team of robots [22]. Each robot's mission is to search for maximum utility at minimum cost, which diverges the robots from each other while keeping them directed toward unexplored space. An alternative coordinated method is the randomized graph approach [27,28], which builds a roadmap of the explored area that navigates robots along safe paths. Recently, Alfredo et al. [29] introduced an efficient backtracking concept into the random exploration graph, preventing the same robot from visiting the same place more than once. All of the above-mentioned methods share the idea of frontier-based control.
Another approach to exploration, completely different in theory and practice, comes from artificial intelligence (AI). Reinforcement learning (RL) and convolutional neural networks (CNNs) are such attempts, proposed in previous studies [30][31][32]. Neural-network-based exploration differs considerably from the frontier-based approach in terms of environment perception and control. Visual sensors (cameras) scan a place, and image-processing algorithms carry out the computation [33]; the output of the calculation drives the robot's interaction with the environment. Lei Tai et al. [34] surveyed leading studies in mobile robotics using deep learning, from perception to control systems.
Recently, a novel branch of exploration that employs nature-inspired optimization techniques has appeared. These approaches seek to enhance existing solutions to exploration. Sharma S. et al. [35] applied clustering-based distribution and bio-inspired algorithms such as PSO, Bacteria Foraging Optimization, and the Bat algorithm. The clustering provides the direction of robot motion, while the nature-inspired approaches handle exploring the unknown area. The study in [36] applied a combination of PSO, fractional calculus, and a fuzzy inference system; a comparison with six other PSO variations showed effective multi-robot exploration. A waypoint concept similar to ours was used in [37], where artificial pheromones and fuzzy controllers help the multi-robot system navigate efficiently by distributing the search between robots and avoiding repeated visits to explored regions.
The study in [38] involved more than one optimization problem in the exploration, which is important to highlight here. Its optimal solution seeks to minimize two objective functions: the variance of the path lengths and the sum of the path lengths of all robots. In contrast to our research, it applied the K-Means clustering algorithm instead of a bio-inspired technique.
The study in [39] presented an auto-adaptive multi-objective strategy for multi-robot exploration, where the multi-objective concept consists of two missions: the search for uncertainties and stable communication. This work is closely related to the present one, but its focus is the assessment of communication conditions for providing efficient map coverage, which is a different perspective from ours.
Regarding multi-objective optimization in multi-robot systems, MOPSO [40] and multi-ACO [41] have already been applied to the path planning problem. Broadly speaking, metaheuristic algorithms are applied to path planning more often than to other problems, mainly because optimization is the core of finding a short and smooth path.
To the best of our knowledge, MOGWO has not been applied in mobile robotics studies before.

Single and Multi-Objective Grey Wolf Optimizer
This section briefly describes the theories of GWO and MOGWO. The two techniques are interconnected: one is derived from the other. First, GWO is presented; then, the concept is extended to multi-objective optimization with MOGWO.

Grey Wolf Optimizer
GWO is a population-based metaheuristic algorithm that mimics the hunting process of grey wolves. Population- and single-based optimization algorithms differ from each other in the number of agents used to carry out the search for a global optimum; each agent is a candidate for finding the global optimum. Figure 2 shows the GWO simulation. The search begins when all agents obtain random (x, y) values, where lower bound ≤ x, y ≤ upper bound. Then, the cost function defines the best candidates α, β, γ among them in each iteration (Figure 2a) by the equations:

D = |C · X_p(t) − X(t)|, (1)

X(t + 1) = X_p(t) − A · D, (2)

where X_p is the position of the leader (prey) and X is the position of the current agent. Every single agent computes D_α, D_β, D_γ, and then X_1, X_2, X_3 can be found. The random and adaptive vectors C and A are updated in each iteration.
Finally, the agent's next position is the mean value of X_1, X_2, X_3, as illustrated in Figure 2c.
The same calculation is repeated in each run time for every search agent. Depending on the values of the vectors A and C, also denoted as the GWO parameters, the search makes the transition between divergence (exploration) and convergence (exploitation) toward the optimal solution. The GWO parameters are calculated as follows:

A = 2a · r_1 − a, (3)

C = 2 · r_2, (4)

where the value of a decreases linearly from 2 to 0 using the update equation for iteration t:

a = 2 − t · (2 / T), (5)

with T the maximum number of iterations, and r_1 and r_2 are random vectors with components ranging from 0 to 1. In GWO, the parameter A determines the exploration and exploitation behavior: each agent of the population performs divergence when |A| > 1 and converges toward the α, β, γ agents when |A| < 1. Figure 2 illustrates how an agent largely changes its next position at iterations t and t + 1 through the divergence driven by the parameter A, whose magnitude shrinks as a decreases linearly. The parameter C randomly favors exploration or exploitation without any dependency on the iteration. This stochastic mechanism allows GWO to enhance the search for optimality by reaching different positions around the best solutions.
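The update rules above can be condensed into a short sketch. The following is a minimal, illustrative Python implementation (not the authors' MATLAB code); the third-ranked wolf is labeled gamma to match the paper's notation, and the leaders themselves stand in for the prey position X_p:

```python
import numpy as np

def gwo_step(wolves, cost, a):
    """One GWO iteration: rank the wolves, then move every agent toward
    the alpha, beta, and gamma (three best) leaders.

    wolves : (n, dim) array of agent positions
    cost   : callable mapping a position to a scalar fitness (minimized)
    a      : GWO parameter, decreased linearly from 2 to 0 (Equation (5))
    """
    order = np.argsort([cost(w) for w in wolves])
    leaders = wolves[order[:3]]                  # alpha, beta, gamma
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        candidates = []
        for leader in leaders:
            r1, r2 = np.random.rand(x.size), np.random.rand(x.size)
            A = 2 * a * r1 - a                   # Equation (3)
            C = 2 * r2                           # Equation (4)
            D = np.abs(C * leader - x)           # Equation (1)
            candidates.append(leader - A * D)    # Equation (2)
        new_wolves[i] = np.mean(candidates, axis=0)  # mean of X1, X2, X3
    return new_wolves
```

Running `gwo_step` in a loop while shrinking `a` per Equation (5) reproduces the divergence-then-convergence behavior described above: early iterations (|A| often > 1) scatter the agents, late iterations pull them onto the leaders.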
In recent years, the GWO algorithm has been widely modified in various studies. In study [42], the authors improved the convergence speed of GWO by guiding the population using the alpha solution. In another study [43], the new operator called reflecting learning was introduced in the algorithm. It improves the search ability of GWO by the principle of light reflection in physics. The optimization was enhanced in studies [44][45][46] by random walk strategies and Levy flights distribution as well.

Multi-Objective Grey Wolf Optimizer
Two new components were integrated into MOGWO to perform multi-objective optimization: an archive of the best non-dominated solutions and a leader selection strategy for the alpha, beta, and gamma solutions. The archive stores the non-dominated Pareto solutions over the course of iterations; an archive controller applies dominance-based sorting rules for admitting new solutions and for maintaining the archive state. The objective space covered by the archive is divided into regions named segments or hypercubes. Figure 3a shows an archive of three hypercubes with the non-dominated solutions at iteration t. The second component is the leader selection, which chooses the least crowded hypercube of the search space and offers the non-dominated solutions available there from the archive (Figure 3b). If that hypercube holds only two solutions, the third one is taken from the second least crowded hypercube.
Generally, it can be said that the archive stores the best solutions for each objective function. It saves them not only as alpha, beta, and gamma agents, but also with segment priorities defined by the total number of solutions in each segment. Thus, the globally best solution can be chosen among the local ones in the archive. This selection mechanism in MOGWO prevents the picking of the same leaders; in other words, it avoids stagnation in local optimal points. Figure 4 shows the full algorithm of MOGWO.
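The two mechanisms can be sketched as follows. This illustrative fragment (all names are ours, not the original implementation) maintains a non-dominated archive and draws a leader from the least crowded segment; for brevity, the segmentation is done along one objective only rather than over full hypercubes, and both objectives are minimized:

```python
import random

def dominates(f, g):
    """f dominates g if it is no worse in both objectives and better in one."""
    return f[0] <= g[0] and f[1] <= g[1] and f != g

def update_archive(archive, candidate):
    """Admit a candidate: reject it if dominated, otherwise insert it and
    drop every archived solution that it dominates."""
    if any(dominates(a, candidate) for a in archive):
        return archive
    archive = [a for a in archive if not dominates(candidate, a)]
    archive.append(candidate)
    return archive

def select_leader(archive, n_segments=4):
    """Draw a leader from the least crowded segment of objective space
    (segmented along f1 only here; MOGWO uses full hypercubes)."""
    f1 = [a[0] for a in archive]
    lo, hi = min(f1), max(f1)
    width = (hi - lo) / n_segments or 1.0    # guard against zero width
    segments = {}
    for a in archive:
        idx = min(int((a[0] - lo) / width), n_segments - 1)
        segments.setdefault(idx, []).append(a)
    return random.choice(min(segments.values(), key=len))
```

Favoring sparse segments when picking leaders is what spreads the search along the Pareto front instead of letting it collapse onto one crowded region.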
MOGWO finds application in cloud computing for virtual machine placement [47], medicine for preventing cervical cancer by scanning images [48], wind power for speed forecasting [49], and energy-efficient scheduling [50]. However, it has not been used in the robotic field up to this time.

MOGWO Exploration for Multi-Robot System
In this section, we describe the proposed multi-robot exploration based on MOGWO optimization. First, we define the optimization problems in the exploration. As mentioned above, there are two objective functions for which the study tries to find an optimal solution. Then, the second subsection presents the approach for solving the problems using the MOGWO exploration algorithm.

Mathematical Formulation of MOOPs in the Multi-Robot Exploration
The process of searching for uncertainties by a team of robots can be considered a multi-tasking system. Each robot receives sensor readings and updates the occupancy probabilities of the grid cells in the map. Every robot in the multi-robot system carries the same tasks: scanning the environment using sensors, avoiding obstacles and collisions with other robots, seeking new terrain to explore, and increasing the accuracy of the map. Together, the individual robots should perform well as a multi-robot system, satisfying the multiple objective functions to obtain the best solutions.
In this paper, we formulated the objective functions of the exploration as follows:

Maximize: f_1 = the number of explored cells in the map, (6)

Minimize: f_2 = the sum of the occupancy-probability values of the explored cells, (7)

subject to the waypoints lying within the map bounds, x_min ≤ x ≤ x_max and y_min ≤ y ≤ y_max.
The first objective function, Equation (6), tries to maximize the search space by visiting as many cells of the map as possible. Ideally, robots should avoid already explored cells. The waypoints in the MOGWO exploration preserve the direction toward the unexplored part of the map. However, there are constraints on a successful search; for example, the number of waypoints should be neither too small nor too big. If it is too small, the robots will stay at one point, because they have no next waypoints to drive to. If it is bigger than the total number of cells in the map, a robot will drive around one place longer than needed.
After the map is explored, the second objective function, Equation (7), tries to improve the map accuracy by reducing the probability values of the grid cells. Once a sensor beam touches a grid cell, the cell is marked as explored; however, the signal strength projected onto each cell is not identical. At the robot's position, the probability takes its lowest value, while at the frontier cells the values are higher, according to the strength of the signal.
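Assuming the occupancy-probability convention of Table 1 (unexplored cells keep the prior of 0.5, and explored cells fall below it), the two per-waypoint costs that feed the objective functions can be sketched as follows; function and variable names here are illustrative:

```python
P_UNEXPLORED = 0.5  # prior from Table 1; explored cells fall below it

def waypoint_costs(robot_xy, waypoint_xy, grid):
    """Return the two per-waypoint costs: the Euclidean distance from the
    robot (feeding the search objective f1) and the occupancy probability
    stored at the waypoint's cell (feeding the accuracy objective f2)."""
    (rx, ry), (wx, wy) = robot_xy, waypoint_xy
    distance = ((wx - rx) ** 2 + (wy - ry) ** 2) ** 0.5
    probability = grid[wy][wx]          # grid indexed as [row][column]
    return distance, probability

def is_explored(probability):
    """A cell counts as explored once its probability drops below the prior."""
    return probability < P_UNEXPLORED
```

Comparing a waypoint's probability cost against the prior is what later lets the algorithm split the waypoints into explored and unexplored sets.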
In the subsection below, the MOGWO exploration algorithm is described extensively.

The Proposed MOGWO Exploration Algorithm
Equations (6) and (7) define the objectives of the exploration in this study. For such problems, there is no single solution that satisfies all objectives at one time: it is not possible to explore new cells (f1) and to revisit explored cells (f2) simultaneously. Based on the GWO parameter a in Equation (5), the search process is divided into two parts. When a > 1, the algorithm searches for new waypoints; when a < 1, the process switches to revisiting already explored areas to improve the map accuracy. Thus, the approach serves two MOOPs in a single run-time.
Algorithm 1 demonstrates the MOGWO exploration for the multi-robot system. The process begins with the random initialization of waypoints in the search space. Their positions are set only once, in the first iteration, and are not updated during the exploration. In line 1, the number of waypoints must be high enough for each robot to find at least three best solutions α, β, γ for the search. The proposed algorithm uses the archive for the same purposes as MOGWO does: it stores the non-dominated solutions, which prevents the repeated selection of the same waypoint by different robots. In each iteration of the loop, the robots update their positions, the GWO parameters, and the positions and probability costs of the frontier cells (lines 9, 10), and sensor rays are inserted into the map from each robot's position (line 11).
Two objective functions are used in all stages (lines 12, 13). The first calculates the distances between the waypoints and the robots; the second computes the probability values at the waypoint positions. Thus, each waypoint has a distance cost c_d and a probability cost c_p. Lines 14-24 show the exploration stage for a ≥ 1. First, the algorithm divides the waypoints into explored and unexplored ones according to their probability costs (lines 15-19). Then, it selects the unexplored α, β, and γ waypoints according to the distance costs (Figure 5). In lines 21 and 22, it computes the position X(t + 1). However, a robot cannot physically jump to that position, so the frontier cell closest to X(t + 1) is selected as the next robot position. As Figure 5 illustrates, (b) waypoints located in the explored space have a lower probability of being selected by a robot than waypoints in unknown space, and (c) the best waypoints of one robot should not be duplicated for another robot.
Lines 25-28 illustrate the exploitation stage for a ≤ 1. The algorithm divides the probability cost c_p by the distance cost c_d, finds the maximum value of the result in line 26, and saves it in the archive. The next robot position is the frontier cell closest to the alpha waypoint.
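Putting the two stages together, a simplified, illustrative version of the per-robot decision (a sketch of the idea, not the exact Algorithm 1) might look like this, where `costs` holds each waypoint's distance and probability costs and 0.5 is the unexplored prior of Table 1:

```python
def next_target(a, waypoints, costs, frontier_cells):
    """Choose a robot's next frontier cell under the two MOGWO stages.

    a              : GWO parameter, decreasing from 2 to 0
    waypoints      : list of (x, y) waypoint positions
    costs          : list of (distance_cost, probability_cost) per waypoint
    frontier_cells : candidate next positions for the robot
    """
    if a >= 1:
        # Exploration stage: among unexplored waypoints (probability still
        # at the 0.5 prior), take the three nearest as alpha, beta, gamma
        # and aim at their mean position.
        pairs = [(w, cd) for w, (cd, cp) in zip(waypoints, costs) if cp >= 0.5]
        leaders = [w for w, _ in sorted(pairs, key=lambda p: p[1])[:3]]
        target = (sum(w[0] for w in leaders) / len(leaders),
                  sum(w[1] for w in leaders) / len(leaders))
    else:
        # Exploitation stage: maximize probability cost over distance cost,
        # steering the robot back toward nearby uncertain cells.
        best, _ = max(zip(waypoints, costs), key=lambda wc: wc[1][1] / wc[1][0])
        target = best
    # The robot cannot jump to the target, so move to the closest frontier cell.
    return min(frontier_cells,
               key=lambda f: (f[0] - target[0]) ** 2 + (f[1] - target[1]) ** 2)
```

The sketch omits the shared archive that keeps different robots from claiming the same leaders; in the full algorithm, each robot's selected waypoints are recorded there and excluded for the others.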
In line 30, the parameter a is reduced from 2 to 0 iteratively. Lastly, the map is updated by all robots at the end of each iteration.
The implementation results are presented in the next section. The multi-robot exploration by the MOGWO algorithm obtained notable results, which can be observed under various conditions by adjusting the numbers of waypoints and iterations.

Simulation Results and Analysis
In this section, we implement the proposed MOGWO exploration and analyze the obtained results. The simulation performance is tested by varying parameters such as the number of waypoints and the number of iterations, while other parameters were kept constant throughout all the simulation runs (see Table 1).

Table 1. Experiment parameters.

Parameter | Value
Initial poses | r1 = (5,5), r2 = (7,9), r3 = (4,9)
Map size | 15 × 15
Obstacle width | 0.5
Ray length | 1.5
Probabilities of occupancy cells | P(robot_x,y) = 0.0010; P(obstacle_x,y) = 0.9990; P(unexplored_x,y) = 0.5000; P(unexplored_x,y) > P(explored_x,y) ≥ P(robot_x,y)

The major goal we seek to attain is to know how many iterations and how many waypoints are needed for efficient exploration. If the number of iterations is too low, the robots do not have time to explore the entire map physically, considering that the step size in each iteration is unchanged. The same holds for the number of waypoints: there should be enough of them for free robot driving in the environment. In the next subsection, the experiments with certain constraints are presented, and a Pareto optimal set is proposed for the selection of the optimal solution for the environment.

Simulation Results
The analysis of the MOGWO exploration algorithm considers two aspects of the objective functions: how the algorithm explores and how it improves the accuracy of the map. The experiment constraints influence the performance of the algorithm, and due to the stochastic GWO parameters, the decision-making process can differ in each simulation run; this led us to test the algorithm several times with the same constraints. Based on the experiment parameters in Table 1, it can be calculated using Equation (8) that the number of iterations should be no less than 60 and no more than 120. Similarly, the number of waypoints should range from 60 to 150 for three robots in a map of this size (Equations (9) and (10)). In this study, we selected 60, 80, 100, and 120 iterations and 60, 80, 100, and 150 waypoints. Table 2 shows the results of the map coverage in percentage, which is computed using the following equation:

Coverage (%) = 100 − (number of unexplored cells / total number of cells) × 100. (11)

Table 2 shows, for example, that the maximum map coverage for a certain sequence of decisions under the constraints of 60 iterations and 60 waypoints is 92.36%; the highest result among all sets of constraints is 99.47%, obtained at the maximum allowable set of constraints. Figure 6 shows one of the simulation runs with 120 iterations and 150 waypoints as constraints. In the a ≥ 1 stage, 87.57% of the environment was explored in half of the total number of iterations (61). From the map in Figure 6b, it can be concluded that the robots touched all the waypoints with their sensor rays, meaning that the exploration ability of the algorithm is satisfied in this stage. Figure 6c demonstrates the completed result for the a ≤ 1 stage, with a total of 99.06% map coverage. The trajectories of the robots can be observed through the blue, red, and green lines in Figure 6d. The decision-making process of each robot is presented in Figure 7.
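Under the occupancy convention of Table 1, the coverage metric of Equation (11) amounts to counting the cells that still hold the unexplored prior of 0.5; a small sketch under that assumption:

```python
def map_coverage(grid, p_unexplored=0.5):
    """Map coverage (%) as in Equation (11): 100 minus the fraction of
    cells still holding the unexplored prior probability, times 100."""
    cells = [p for row in grid for p in row]
    unexplored = sum(1 for p in cells if p == p_unexplored)
    return 100 - (unexplored / len(cells)) * 100
```

For a 15 × 15 map, `grid` would be the 225-cell occupancy matrix at the end of a simulation run.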
The values of the alpha solutions in the simulation above (Figure 6) differ between the two stages, exploration and exploitation: when a ≥ 1, the trend goes up toward maximum values, and when a ≤ 1, the simulation tries to achieve minimum values. The simulation of the MOGWO exploration algorithm was implemented in MATLAB using the OccupancyGrid class of the Robotics System Toolbox [51,52]. A video of the simulation can be seen in [53]. Table 3 shows numerous exploration results categorized by constraints. Considering only the percentage of map coverage, 99.47% is obviously the best performance. However, the exploration with 120 iterations and 150 waypoints as constraints takes the longest time, which means it is not the optimal solution.

The Pareto Optimality Analysis for MOGWO Exploration Algorithm
In this study, we take two factors that are important for the exploration: map coverage and time. In Figure 8, we searched for trade-offs between the minimum number of iterations and the maximum map coverage; the plot is based on the data in Table 2. The results for 120 and 60 iterations, which can be considered too long and too short for exploration, respectively, are extreme solutions. The remaining solutions belong to the Pareto optimal front, which lies between the two lines. It can be concluded that the optimal set lies between 80 and 100 iterations with 150 waypoints. In the next subsection, the MOGWO exploration algorithm with 150 waypoints is compared to two other algorithms using the same environment and the map coverage computation of Equation (11).

Comparison
In this subsection, the proposed MOGWO exploration was compared with the original deterministic CME algorithm [22] and the hybrid stochastic exploration algorithm based on the GWO and CME [8]. The same map and experiment parameters (Table 1) were selected for the two algorithms with 60, 80, 100, and 120 iterations as it was implemented in the MOGWO exploration algorithm in Section 5.1. It should be noted that waypoints were not applied for the other two algorithms used in the comparison.
In the experiment, the CME algorithm was run only once for each iteration class (60, 80, 100, and 120) because it does not generate any random values even when tested multiple times. Owing to its deterministic nature, CME behaves differently for every modification of the environment. For instance, during our experiment a simulation run was aborted after the 98th iteration when one of the robots got stuck next to a wall obstacle. To complete the exploration, the initial position of robot r2 (from Table 1) was changed from (7,9) to (6,5). From these results, we can conclude that exploration by the deterministic CME algorithm requires fine-tuning of the map parameters for successful map coverage.
The hybrid stochastic algorithm is a stochastic approach that uses the single-objective GWO algorithm. During our experiments, simulation runs were aborted several times due to a robot selecting inappropriate positions among the frontier cells. This situation occurs when the GWO parameters A and C oblige a robot to move into wrong places, such as obstacles or another robot's position. Fortunately, the A and C parameters vary in each simulation run, which allowed us to obtain successful results. Figure 9 shows the comparison of the results obtained using the original CME, the hybrid stochastic exploration, and the MOGWO exploration with 150 waypoints. The deterministic CME approach has the lowest map coverage among all the algorithms. The proposed MOGWO algorithm does not outperform the hybrid stochastic exploration algorithm in the iteration classes 60, 80, and 100; however, it surpasses the original CME in all iteration categories and the hybrid stochastic exploration algorithm in the 120-iteration category. Additionally, the aborted simulation runs that are a drawback of CME and the hybrid stochastic exploration did not occur in the MOGWO exploration. Thus, the MOGWO exploration proved more efficient and stable than the other algorithms studied in this subsection.

Conclusions
This paper proposed a new method of solving the multi-robot exploration problem as a multi-objective problem. Two objective functions were formed: to search new terrain and to enhance the map accuracy. The use of the MOGWO algorithm enabled us to obtain high map coverage percentages without any aborted simulation runs. The simulation results demonstrated the capability of the MOGWO algorithm to build complete maps within certain constraints: the number of waypoints and the number of iterations. Based on the results, the optimal solution was defined by the Pareto optimal set. Furthermore, the proposed MOGWO exploration algorithm was compared with the deterministic exploration and the hybrid stochastic exploration algorithms. The comparison showed that the proposed MOGWO exploration technique outperforms the deterministic exploration under all sets of constraints and the hybrid stochastic exploration algorithm at 120 iterations and 150 waypoints.