An Improved SOM-Based Method for Multi-Robot Task Assignment and Cooperative Search in Unknown Dynamic Environments

Methods of task assignment and path planning have been reported by many researchers, but they mainly focus on environments with prior information. For unknown dynamic environments, in which the location information of obstacles must be acquired in real time, an integrated multi-robot dynamic task assignment and cooperative search method is proposed by combining an improved self-organizing map (SOM) neural network with the adaptive dynamic window approach (DWA). To avoid the robot oscillation and hovering issue that occurs with SOM-based algorithms, an SOM neural network with a locking mechanism is developed to better realize task assignment. Then, to solve the obstacle avoidance problem and the speed jump problem, the weights of the winner of the SOM are updated by using an adaptive DWA. In addition, the proposed method can search for multiple dynamic targets in unknown dynamic environments; it can reassign tasks and re-plan search paths in real time when the locations of the targets and obstacles change. The simulation results and comparative testing demonstrate the effectiveness and efficiency of the proposed method.


Introduction
Multi-robot systems have become a prominent research area and have attracted increasing attention from researchers. A multi-robot system can be used in many applications, such as in warehousing, search and rescue missions, ground-cleaning, industrial transportation, etc. Task assignment and path planning of a multi-robot system are fundamental issues in the research field of multiple robots. An integrated biologically inspired self-organizing map (BISOM) algorithm was proposed for a 3-D underwater environment [1]. In unknown environments with obstacles, how to assign a group of robots to a target's location and achieve autonomous obstacle avoidance remains a challenging topic in multi-robot systems [2][3][4].
The task assignment of multi-robot systems has attracted the attention of many researchers. Wei et al. [5] studied the multi-robot task allocation problem in the search and retrieval domain. Liu et al. [6] introduced a task allocation mechanism that is based on the ant colony optimization of a swarm intelligence algorithm, which can effectively achieve loosely coupled and tightly coupled task allocation in large-scale multi-robot systems. Yao et al. [7,8] studied task allocation and path planning, such as offline route planning for a coverage search mission in a river region and the time-optimal trajectory generation for aerial coverage of urban buildings. The self-organizing map (SOM) is an unsupervised neural network for data visualization that was first proposed by Kohonen [9]. Some SOM-based methods have been used for energy management, such as battery sorting [10], evaluation of energy distribution [11], efficient utilization of resources [12], etc. Some SOM-based approaches have been proposed to resolve the task assignment for multi-robot systems [13][14][15]. The task assignment of a multi-robot system is acquired based on the competition principle of the SOM, and the path planning of each robot is realized by adjusting the weight updating rules of the SOM. However, these studies have focused on establishing a control model based on prior environmental information at the lowest cost, and seldom consider the path planning problem encountered by each robot en route to its destination in real environments.
The purpose of task assignment and path planning is to arrange the robots so that they visit all of the assigned target locations while avoiding obstacles smoothly. Path planning is required after task assignment, and some improved algorithms have been proposed; however, as the number of robots increases, so does the computational complexity of algorithms such as the artificial bee colony [16], the A* algorithm [17], and the genetic algorithm [18]. A multi-robot exploration algorithm was proposed that outperforms a depth-first search with a single robot [19]. The artificial potential field (APF) [20] is an efficient path planning algorithm [21,22]. This method has the advantage of low computational complexity, and it has long been applied in the field of robot path planning. Closed-loop control of the current path and environmental information are utilized to smooth the trajectory. However, the well-known drawbacks of the APF, such as low efficiency and the local minimum problem, have prevented its widespread application, and the APF is difficult to use in a dynamic environment where the location information of obstacles is unknown.
Most methods of task assignment and path planning are mainly focused on environments with prior information. Since most algorithms use a global search approach, these methods are suitable for static environments, not unknown dynamic environments with obstacles. Local path planning is an important part of the autonomous motion of mobile robots and requires that a robot moves towards the preferred location without collisions in a completely or partially unknown environment. The dynamic window approach (DWA) is an efficient real-time planning algorithm that was first proposed by Fox D et al. [23]. The DWA has been successfully applied in robot operating system (ROS) navigation; the robot's motion trajectory can be adjusted according to the environmental information acquired in real time. In recent years, many obstacle avoidance methods have been proposed; compared with other obstacle avoidance methods, the DWA accounts for the dynamic properties and the constraints of real robots, such as the maximum velocity, acceleration, etc. Therefore, the control instructions obtained by this method appear to be more in line with the actual situation [24].
In this paper, for unknown dynamic environments where the location information of obstacles must be acquired in real time, an improved DWA is introduced into an improved SOM neural network to solve the problem of task assignment and cooperative search for multi-robot systems; the proposed method is called the DSOM method. The proposed DSOM method has three main contributions. First, to avoid the robot oscillations and hovering that appear when using an SOM-based algorithm, an improved SOM neural network with a locking mechanism is developed to assign the robots to visit multiple targets. Second, an adaptive DWA is introduced to adjust the velocities of the robots in order to update the weights of the winning neurons and their neighboring neurons in the SOM in such a way that the robots automatically avoid speed jumps and obstacles and achieve better search paths in unknown environments. Third, the proposed method can reassign tasks and re-plan the search paths in real time when the locations of the targets and obstacles change. Each robot acquires real-time distance information about obstacles through its sensors, and then it adjusts its linear and angular velocities through the adaptive DWA to achieve path planning in unknown dynamic environments. This paper is organized as follows. The problem statement is briefly introduced in Section 2. In Section 3, the traditional approaches are introduced. Section 4 describes the proposed DSOM method. In Section 5, the experiments are described. A brief conclusion is provided in Section 6.

Problem Statement
In this study, the environmental information is initially unknown. We assume that each robot is equipped with a wireless network device and sensors to measure the relative bearing. The proximity information comes from the local sensors, and none of the sensors are faulty. As a robot searches the unknown environment, it can only sense nearby robots and obstacles through its sensors, and it shares this information across the network. We use this information to construct and dynamically update the network topology [25].
The essential issue of task assignment and cooperative search in an unknown dynamic environment is to decompose all the tasks into multiple subtasks according to the perceived environmental information so that all the robots can find their optimal paths to the specified targets while avoiding obstacles. A robot can only perform one task at a time, and each target must be visited by at least one robot; a robot can continue on to another task after completing its current one. There are three primary requirements for the task assignment and cooperative search of multi-robot systems in unknown dynamic environments: (1) each task is assigned to an appropriate robot while minimizing the cost of the multi-robot system; (2) all robots can find optimal, safe, collision-free paths when completing their assigned tasks, and throughout the robots' motion, collisions with obstacles or other robots are not allowed; and (3) the robots are equipped with sensors and have basic capabilities for communication and location recognition.
Although there have been some SOM-based task assignment and cooperative search studies in multi-robot systems, several issues still need to be addressed: (1) the traditional SOM-based method exhibits an output neuron oscillation phenomenon; in some cases, this phenomenon leads to too many iterations and can even cause the system to fall into infinite iterations, resulting in poor robustness and unsmooth path planning; (2) in some unpredictable emergencies, such as when some robots fail, the robustness of traditional algorithms is poor, and robots often encounter the sharp speed jump problem; and (3) in an environment where the locations of the obstacles are unknown, or in a dynamic unknown environment, the obstacle avoidance performance of the robot cooperative search is not ideal.

SOM Structure of Multi-Robot System
The SOM is generally composed of an input layer and an output layer, which are connected by weights. The SOM structure for task assignment and path planning of a multi-robot system is shown in Figure 1. This model only has two layers of neurons; namely, the input layer (targets' locations) and the output layer (robots' positions). The input layer consists of two neurons, where the two-dimensional input vector (x i , y i ) represents the Cartesian coordinates of the ith task. The output layer (robots' positions) is a competitive layer, in which each neuron represents the location of a robot and is connected to the neurons in the input layer (targets' locations). The connection weight between the neurons in the output layer and the input layer is given by a two-dimensional weight vector. The SOM includes the winner selection rules, the definition of the neighborhood function and the weights updating rules. The SOM-based method is used to select winners for the target locations. In addition, the neighborhood function is used to select the neighbors of the winning robot, and the moving speeds of the winner and its neighbors are determined. Finally, by updating the weight vectors of the winner robots, the robots can reach the target locations along collision-free paths. When all of the target positions have been accessed, the task assignment and path planning are completed [1].

Winner Selection Rules
For a given input neuron (target coordinate) T i = (x i , y i ), the competitive layer neurons compete to be the winner. According to the winner selection rules, the robot with the shortest Euclidean distance from the input task is selected. The rules are defined as follows: W = arg min j∈Ω D ij , with D ij = ||T i − R j ||, where D ij represents the Euclidean distance between the ith input neuron (task) and the jth competitive neuron (robot), R j = (w jx , w jy ) represents the coordinates of the jth competitive neuron, and W is the winner neuron selected for the ith input neuron. To prevent a robot from being the winner more than once in each iteration, Ω is a set of nodes that provides all of the possible choices for T i = (x i , y i ). In traditional SOM task assignment for 3D coordinates, the only difference is that the winner selection rules are based on 3D coordinates; the other aspects are consistent with the 2D case [1,14,15].
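As a minimal illustration of the winner selection rule, the following Python sketch picks the robot with the smallest Euclidean distance to the input task while skipping robots that have already won in the current iteration (the candidate set Ω is represented here by its complement, an `excluded` set; all names are illustrative, not from the paper):

```python
import math

def select_winner(task, robots, excluded):
    """Select the winner robot for an input task (target coordinate).

    task: (x, y) of the input neuron; robots: list of (w_jx, w_jy) weight
    vectors; excluded: indices of robots that already won this iteration.
    Returns (j, D_ij) for the robot minimizing the Euclidean distance.
    """
    best_j, best_d = None, math.inf
    for j, (wx, wy) in enumerate(robots):
        if j in excluded:          # a robot may win at most once per iteration
            continue
        d = math.hypot(task[0] - wx, task[1] - wy)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d
```

For example, with robots at (3, 4), (1, 1) and (6, 8) and a task at the origin, the second robot wins; excluding it makes the first robot (distance 5) the winner.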

Definition of the Neighborhood Function
After the winner selection is finished, the next step is to find the neighbors of the winner. A neighborhood function is introduced to define the influence of the input data on the winner and its neighboring neurons. The neighborhood range is defined as a circle with radius r, where the center is the location of the winner robot. The neighborhood function determines the attractive strength of the input target on the winner as well as its neighbors. The attraction force on the selected neurons diminishes as the distance between the neighbors and the winner increases. The neighborhood function is described as: f (d n , G) = exp(−d n 2 /G 2 (t)) if d n < r, and f (d n , G) = 0 otherwise, where d n = ||R n − R j || is the distance between the nth output neuron and the winner R j . The nonlinear function G(t) is calculated by G(t) = (1 − α)^t G 0 , where t is the number of iterations, G 0 is the gain parameter, and α < 1 is a constant update rate determining the calculation time. The calculation time decreases as the parameter α increases.

Weights Updating Rule
The winners and their corresponding neighbors will move along effective paths. A self-organizing neural network not only allocates tasks to suitable robots but also plans a path for each robot to its assigned task by updating the weights. The update rule is described by the following equation: W j (t + 1) = W j (t) + β f (d n , G)(T i − W j (t)), where D min is the termination condition of the iterations (D min can effectively decrease the calculation time). As long as D ij < D min , the weight value of the jth competitive neuron is replaced by the ith input vector, which means that the jth robot has reached the ith task. The learning speed β affects the modification of the weights and the computing time [13].
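A sketch of one update step as described in the text: the weight (robot position) moves toward the input task by a fraction set by the learning speed β and the neighborhood strength, and snaps to the task once within D min (names and defaults are illustrative):

```python
def update_weight(w, target, strength, beta=0.8, d_min=1.0):
    """One weight update step moving robot j toward input task T_i.

    w, target: 2-D points; strength: neighborhood value f(d_n, G);
    beta: learning speed. If the robot is within d_min of the task,
    the weight is replaced by the task (the robot has reached it).
    Returns (new_weight, reached_flag).
    """
    dx, dy = target[0] - w[0], target[1] - w[1]
    if (dx * dx + dy * dy) ** 0.5 < d_min:
        return (target[0], target[1]), True     # D_ij < D_min: task reached
    step = beta * strength
    return (w[0] + step * dx, w[1] + step * dy), False
```

With strength 1 and β = 0.5, a robot at the origin moves halfway to a task at (10, 0) in one step; a robot already within D min jumps onto the task and terminates.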

Motion Model of DWA
The DWA is an obstacle-avoidance approach that accounts for the dynamic and kinematic constraints of a mobile robot. The concept of the DWA is to maximize the objective function and directly obtain an optimal velocity command in the velocity space. The position control of the robot is transformed into speed control, and the obstacle avoidance problem is described as an optimization problem with speed space constraints [23,24]. There are two types of constraints: (1) the velocity and acceleration constraints of the robots; and (2) the environment and obstacle constraints.
The velocity command is obtained from the velocity space of the robot. Three kinds of velocity spaces are defined in the DWA, which can be expressed as V s ,V a and V d . Each velocity in the velocity space is a tuple (v, w), in which v is the translational velocity and w is the rotational velocity. V s is composed of all of the possible velocities the robot can achieve.
Each velocity in V a ensures that, if chosen, the robot can stop before crashing into any obstacle. Here, v̇ b and ẇ b are the decelerations used for braking.
V d is called the dynamic window, which contains the velocities that can be reached in a clock cycle. Here, v c and w c are the actual velocities.
Then, the resultant velocity search space is V r = V s ∩ V a ∩ V d . Each velocity in V r is sampled, and its appropriateness is calculated with the objective function. The objective function is defined as follows: G(v, w) = σ(α · heading(v, w) + β · dist(v, w) + γ · vel(v, w)), where α, β, and γ are constants, usually with α + β + γ = 1, and σ is a smoothing constant. In these functions, heading(v, w) is an azimuth evaluation function, which maintains the course of the robot towards its target location and is maximal when the robot moves directly towards the target; dist(v, w) is the distance to the closest obstacle on the trajectory, for which the nearer the obstacle is, the more the robot will be inclined to move around it; and vel(v, w) is the linear velocity, which forces the robot to move fast. A robot's velocity is a tuple (v, w) that includes the linear velocity and angular velocity, that is, the magnitude and the direction of the velocity. When simulating the robot's trajectory, the robot finds the positions of the obstacles and the other robots, calculates the distance between itself and each obstacle or other robot, and then judges whether the current sampling velocity allows it to stop before hitting anything. If it can stop, the velocity is admissible; otherwise, it is abandoned.
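The dynamic window construction and the weighted objective can be sketched as follows; the admissibility check against V a is omitted for brevity, and all names and parameter values are assumptions of this sketch:

```python
def dynamic_window(v_c, w_c, v_max, w_max, a_v, a_w, dt):
    """Intersection of V_s and V_d: velocities reachable within one clock
    cycle dt from the current (v_c, w_c), clipped to the robot's limits.
    Returns ((v_lo, v_hi), (w_lo, w_hi))."""
    v_lo, v_hi = max(0.0, v_c - a_v * dt), min(v_max, v_c + a_v * dt)
    w_lo, w_hi = max(-w_max, w_c - a_w * dt), min(w_max, w_c + a_w * dt)
    return (v_lo, v_hi), (w_lo, w_hi)

def objective(heading, dist, vel, alpha=0.6, beta=0.2, gamma=0.2):
    """G(v, w) = alpha*heading + beta*dist + gamma*vel, with each term
    already normalized to [0, 1] and the weights summing to 1."""
    return alpha * heading + beta * dist + gamma * vel
```

For example, a robot at v = 1.0 m/s with a 0.5 m/s² acceleration limit and a 1 s cycle can only sample translational velocities in [0.5, 1.5] m/s, however large V s is.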

Adaptive DWA
The proposed algorithm in this paper assumes that the robot is equipped with a ranging sensor, which can measure the distance and bearing from the obstacle within a certain range. The safe distance between the robot and the obstacle or other robots is set as D d . The dynamic weight of the linear velocity is defined as the adaptive dynamic weight γ v .
where γ min is the minimum weight and γ max is the maximum weight. The range of the adaptive dynamic weight is set to [γ min , γ max ]; ψ and κ are constants, with κ = 1.2 in this paper. The adaptive DWA still uses the motion model of the DWA; we only improve Equation (9) and define the objective function as follows: G(v, w) = σ(α · heading(v, w) + β · dist(v, w) + γ v · vel(v, w)). In this paper, α = 0.6, β = 0.2 and σ = 1. To avoid a high proportion of any one of the values, all three components of the objective function are normalized to [0, 1]. By combining all three of these functions, the robot avoids collisions as quickly as possible under the abovementioned constraints while still moving in the direction of the target.
Finally, the velocity command for the next time period is the velocity with the highest objective function value. In other words, we have (v*, w*) = arg max (v,w)∈V r G(v, w).
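The equation defining γ v did not survive extraction, so the sketch below only assumes what the text states: the weight is clamped to [γ min , γ max ] and should shrink as the robot approaches an obstacle relative to the safe distance D d . The linear interpolation used here is a hypothetical stand-in for the paper's formula; the final command selection is the arg max over the sampled velocities:

```python
def adaptive_gamma(d_obs, d_safe, g_min=0.1, g_max=0.4):
    """Hypothetical adaptive linear-velocity weight gamma_v: smallest
    when the nearest obstacle is at distance 0, largest at or beyond
    the safe distance d_safe, clamped to [g_min, g_max]."""
    ratio = min(max(d_obs / d_safe, 0.0), 1.0)
    return g_min + (g_max - g_min) * ratio

def best_velocity(samples, score):
    """Pick the (v, w) command maximizing the objective over the sampled
    admissible velocities: (v*, w*) = argmax G(v, w)."""
    return max(samples, key=lambda vw: score(*vw))
```

Near an obstacle the velocity term is down-weighted, so the heading and clearance terms dominate and the robot prefers turning away over speeding up.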

The Proposed Locking Mechanism
In the process of multi-robot task allocation, the output neuron oscillation phenomenon exists in some SOM-based neural network methods. The definition of the output neuron oscillation phenomenon in our study is as follows: output neuron oscillation refers to when the same output neuron is attracted by different targets in such a way that the output neuron moves back and forth between those targets [26]. When there are more targets than robots, the probability of this phenomenon occurring is relatively high, and the worst result occurs when the output neuron always moves back and forth between the input targets and cannot reach any target. Such a result can lead to too many iterations and even failure of the task allocation. Because of the randomness of the input targets, there is a possibility that the output neuron (robot) could never visit any target.
To illustrate the working process of selecting the winner and neighbors as well as the locking mechanism, an example is shown in Figure 2, where the blue dots represent the initial locations of the robots, the black circles represent the obstacles, and the hexagonal stars denote the positions of the targets. In this study, we assume that there is no difference between any pair of robots, which means that they all have the same parameters and functions. When all of the targets in the workspace have been visited by the robots, all of the tasks are completed. For each robot, the cost required to complete all of its assigned tasks is defined as the time cost (number of iterations) and the path length. The total cost is the sum of the path lengths of all the robots plus the maximum number of iterations among all of the robots.
In Figure 2, it is assumed that the location of target T 1 is one of the input vectors, and robot R 1 becomes the winner according to the winner selection rules. Then, we define the winner's neighborhood domain (with radius r). Obviously, robot R 2 is in the neighborhood area. Finally, according to the weight updating rule in Equation (4), robots R 1 and R 2 will move towards target T 1 at different speeds. Additionally, there are three targets and one robot in the top left corner. Regardless of whether target T 2 , target T 3 or target T 4 is used as the input data for the network, the winner is robot R 3 , and there are no neighbors in the neighborhood of the winner. By analyzing the character of the winner selection rules in Equation (2) and the neighborhood updating rule in Equation (3), the output neuron (robot) will move back and forth between targets T 2 , T 3 and T 4 .
To avoid the oscillation and hovering phenomenon of the output neuron (robot), a locking mechanism is proposed in this paper to improve the SOM-based algorithm. The mechanism is described as follows: each output neuron (robot) can either lock a target or not, and only one target can be locked by an output neuron (robot). Once an output neuron (robot) locks a target, it will not move towards other targets. The time from the current position of the robot to the target location is called the locking cost of output neuron R i and target T i . Two requirements are needed to lock output neuron R i and target T i . First, when T i is the input data for the network, R i is the winner neuron and there are no other neurons in the neighborhood. Second, the locking cost must be minimal. Take Figure 2 as an example to further illustrate the principle of the locking mechanism. After neural network initialization, robot R 3 does not lock any target, so the locking cost of R 3 is infinite. When T 3 is randomly selected as the input neuron, according to the winner selection rules, R 3 becomes the winner neuron, there is no other robot in the neighborhood, and the distance from R 3 to T 3 is less than the infinity in the locking table, which means that the two requirements of the locking mechanism are met; the locking target in the locking table is set to T 3 . After one algorithm iteration, when R 3 moves to point a, the locking cost of R 3 is updated to L 2 . In the second iteration period, T 2 is randomly selected as the input neuron, and only R 3 is in the neighborhood. We assume that the distance from point a to T 2 is less than that to T 3 ; then, the two requirements are met, the locking target of R 3 is modified to T 2 , and the locking cost is updated to L 1 .
In the next iteration period, whether T 3 or T 4 is randomly selected as the input neuron, the locking cost of robot R 3 to T 3 or T 4 is greater than that to T 2 , so the locking target of R 3 remains unchanged until R 3 reaches T 2 , at which point the locking target of R 3 is reset to null. After that, by repeating the above process, T 3 and T 4 will be locked successively.
If a robot has already locked a target, only requirement 1 must be satisfied. When robot R j has locked a target other than the current input target T i , the value of L( j) is set to 0. Otherwise, the value of L( j) is set to 1.
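The two locking requirements can be sketched as a single check; distance stands in for the travel-time locking cost, and the data structures (`locks`, `costs`) are illustrative names for the locking table, not the authors' code:

```python
import math

def try_lock(j, target, robots, locks, costs, neighborhood_r):
    """Attempt to lock `target` for robot j per the two requirements:
    (1) no other robot lies within the winner's neighborhood radius;
    (2) the new locking cost (distance as a proxy for travel time) is
    lower than robot j's current locking cost (infinite if unlocked).
    locks/costs: per-robot locked target and locking cost."""
    rj = robots[j]
    # Requirement 1: no other robot inside the winner's neighborhood.
    for k, rk in enumerate(robots):
        if k != j and math.dist(rj, rk) < neighborhood_r:
            return False
    new_cost = math.dist(rj, target)
    if locks[j] is None or new_cost < costs[j]:   # Requirement 2
        locks[j], costs[j] = target, new_cost
        return True
    return False
```

An unlocked robot (cost infinity) locks the first eligible input, mirroring the R 3 /T 3 example above; a later, more distant input leaves the lock unchanged.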

New Neighborhood Function
After selecting the winning robot, the next step is to find the winner's neighbors. The influence of the input data on the winner and its neighboring neurons is defined by a neighborhood function. Taking the position of the winner robot as the center, the neighborhood range is defined as a circle with radius r. The attractive strength of the input target on the winner robot as well as its neighbors is determined by the neighborhood function. The attraction force on the selected neurons diminishes as the distance between the neighbors and the winner increases. Because of the locking mechanism, the neighborhood function in Equation (3) is redefined as follows: According to the locking mechanism, L( j) is a function in Equation (13) with a value range of {0, 1}.

New Weights Updating Rule with DWA
The tasks are assigned to the appropriate robots via the SOM neural network, and the paths for the assigned tasks are planned by updating the weights. The winner robots and their neighbors will move forward along the optimized paths. To enable each robot to achieve autonomous obstacle avoidance while accomplishing its task, this paper redefines the weight updating rule of the SOM based on the motion model of the DWA. The weight updating rule in Equation (4) is redefined as follows: where |V t | is required to be suitable data in order to satisfy the speed control constraints of the robot, so that the robot does not exhibit a sharp speed jump phenomenon. The learning speed ρ affects the computing time and the number of iterations; the larger the value of ρ, the faster the robot moves.
In the new weights updating rules, the motion model in the DWA is applied to the definition of the new rules. Robots can adjust the linear and angular velocities in real time through the DWA according to the information acquired by sensors in the environment, which thus improves the response ability to the environmental changes and avoids the speed jump problem, as well as improving the cooperative search in the environments where the obstacle locations are unknown. The DSOM for task assignment and cooperative search can be implemented to avoid static or dynamic obstacles.
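The idea of the new rule can be sketched as follows: instead of jumping along the straight line to the target, a winner (or neighbor) advances by integrating the DWA velocity command for one control period, so its speed always respects the robot's constraints. A simple unicycle motion model is assumed here for illustration:

```python
import math

def dwa_weight_update(pos, heading, v, w, dt):
    """Advance a winner (or neighbor) by integrating the DWA velocity
    command (v, w) over one control period dt. Because |v| is bounded by
    the robot's speed constraints, consecutive updates cannot produce a
    speed jump. Unicycle model: turn first, then translate."""
    new_heading = heading + w * dt
    new_pos = (pos[0] + v * math.cos(new_heading) * dt,
               pos[1] + v * math.sin(new_heading) * dt)
    return new_pos, new_heading
```

A robot at the origin facing +x with command (v, w) = (1, 0) over dt = 1 simply advances one unit along x; a nonzero w curves the step toward the new heading.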

Implementation of the DSOM
The flow diagram of the DSOM method is shown in Figure 3. A detailed description of the proposed DSOM method follows; the DSOM method proceeds through the basic steps below.
Step 1: Initialization. Initialize the weights of the output neurons (robots' positions), the input neurons (tasks' positions), and the locking table;
Step 2: A task T i is randomly selected and input into the neural network;
Step 3: According to the winner selection rules and the neighborhood rules, the winner neurons (winner robots) and their neighbors are found;
Step 4: Update the locking table according to the locking mechanism;
Step 5: Compute the velocity |V t | and update the positions of the winner robots and their neighbors by the adaptive DWA;
Step 6: Update the locking cost;
Step 7: Repeat Steps 2 to 6 until all tasks are completed.
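The steps above can be condensed into a minimal self-contained Python skeleton. The locking mechanism and the adaptive DWA step are reduced to their simplest stand-ins (a first-come lock and a bounded straight-line step) so that only the control flow of Steps 1 to 7 is visible; all names and parameter values are illustrative, not the authors' code:

```python
import math, random

def dsom_search(tasks, robots, beta=0.8, d_min=1.0, max_itr=500):
    """Skeleton of the DSOM loop for point robots in 2-D.

    Step 1 initializes the locking table; each iteration then draws a
    random task (Step 2), finds the winner (Step 3), consults the lock
    (Step 4), and moves the winner with a speed-bounded step standing in
    for the adaptive DWA command (Step 5), until all tasks are done."""
    tasks = list(tasks)
    robots = [tuple(p) for p in robots]
    lock = [None] * len(robots)                    # Step 1: locking table
    for _ in range(max_itr):
        if not tasks:
            break                                  # Step 7: all tasks completed
        ti = random.choice(tasks)                  # Step 2: random input task
        # Step 3: winner = nearest robot to the input task
        win = min(range(len(robots)), key=lambda j: math.dist(robots[j], ti))
        # Step 4: simplest locking rule -- lock the first eligible task
        if lock[win] is None:
            lock[win] = ti
        if lock[win] != ti:
            continue                               # locked elsewhere: ignore input
        # Step 5: bounded step toward the task (no speed jump possible)
        x, y = robots[win]
        d = math.dist((x, y), ti)
        if d < d_min:
            robots[win], lock[win] = ti, None      # task reached; unlock
            tasks.remove(ti)
        else:
            step = min(beta, d)
            robots[win] = (x + step * (ti[0] - x) / d,
                           y + step * (ti[1] - y) / d)
    return robots, tasks
```

Running it with one robot and one or two tasks shows the robot locking, reaching, and unlocking each target in turn without oscillating between inputs.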

Validation and Comparison
In this paper, the effectiveness and efficiency of our proposed DSOM method were verified by a series of simulation experiments. All simulations were run on a Windows 7 computer with 4 GB of memory, and the simulation experiments were implemented in MATLAB R2013. In these simulation experiments, the size of the workspace was set to 300 × 300. In addition, the locations of the obstacles were initially unknown to the robots. However, the robots knew the search area and the initial locations of the targets. If all of the tasks were completed or the maximum number of iterations maxItr = 500 was reached, execution of the task ceased.
In this section, four different experiments are presented, including the evaluation of robot performance in unknown environments with dynamic obstacles, multi-robot cooperative search, a case of robot failure, and the problem of robot hovering, as well as the performance in comparative testing.

Unknown Environment with Dynamic Obstacles
In this section, we present simulation experiments in unknown dynamic environments (dynamic targets, dynamic obstacles). As shown in Figure 4, the blue dots represent the initial locations of the mobile robots, the hollow hexagonal stars represent the initial locations of the targets, and a solid hexagonal star indicates that a robot has visited the target position and completed the task. Targets can move randomly in the workspace, where the dotted lines represent the trajectories of the moving targets and the blue hexagonal stars represent the final locations of the moving targets. Figure 4 shows an environment with dynamic obstacles, where the solid black circles represent the initial positions of the obstacles, a hollow circle represents the position of an obstacle after it has moved, and the dotted lines are the trajectories of the moving obstacles. The solid lines are the tracks of the robots to the targets. To better illustrate the effectiveness of the algorithm, pause processing was performed at iterations 50 and 90 during the simulation experiment, and the learning rate of the robot was set to ρ = 1. The number of iterations is used to reflect the simulation time in this paper. To facilitate the analysis and explanation, the names of some robots, targets and obstacles are marked in the experimental results. The simulation results show that the proposed method can achieve better task assignment and cooperative search in these cases.
As shown in Figure 4, the proposed method achieves good task assignment and cooperative search in an environment with both dynamic obstacles and dynamic targets. As shown in Figure 4a, there are 29 obstacles in the environment, three of which, named O1, O2 and O3, can move randomly. There are eight targets, three of which, named T1, T3 and T5, can move randomly. As shown in Figure 4b, in the 50th iteration, obstacles O1, O2 and O3 and targets T1, T3 and T5 have moved to their new positions. Each robot can avoid static and dynamic obstacles in this environment, which requires that each robot adjust its path in time according to the real-time location information of the targets and obstacles. The robots need 140 iterations to complete all of the tasks. The simulation results show that regardless of how the targets and obstacles move, the robots can reassign tasks and re-plan paths in real time and find collision-free paths to complete the tasks. The smoothness of the robots' trajectories shows that no speed jump occurs as the robots move towards the targets. The proposed algorithm thus has good robustness.

Multi-Robot Cooperative Search
To better demonstrate the cooperative search performance of the proposed algorithm, we present the search results for several cases, as shown in Figures 5 and 6. In these experiments, the robots search according to the proposed DSOM algorithm. The artificial potential field (APF) method is used in place of the velocity |V t | in the weight-update formula of Equation (15) to update the position of the target point. The target is regarded as prey in the search process; at this time, the APF does not consider forces between the targets.
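The prey-like motion of a target under the APF can be sketched as a purely repulsive field driven by nearby robots, with inter-target forces omitted as stated above. This is an illustrative sketch; the gains `eta`, `d0` and `step_max` are placeholder values, not parameters from the paper.

```python
import numpy as np

def apf_prey_step(target, robots, d0=30.0, eta=200.0, step_max=2.0):
    """Move a target (prey) one step under repulsive APF forces from robots.

    Repulsion acts only within the influence radius d0, using the
    standard repulsive-field gradient; forces between targets are
    ignored, as in the text. All gains here are illustrative.
    """
    force = np.zeros(2)
    for r in robots:
        diff = np.asarray(target, dtype=float) - np.asarray(r, dtype=float)
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:
            # eta * (1/d - 1/d0) / d^2 along the unit vector away from the robot
            force += eta * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    n = np.linalg.norm(force)
    if n > 1e-9:
        force = force / n * min(n, step_max)   # cap the prey's speed
    return np.asarray(target, dtype=float) + force
```

A target outside every robot's influence radius stays put; a target being approached is pushed directly away, which is the fleeing behavior the search experiments rely on.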
To better compare the experimental search results, we consider more complex environments in which the number of obstacles increases significantly, as shown in Figures 5 and 6. As shown in Figure 5, the initial positions of the six robots are on the diagonal of the search space, and the target points are on both sides of the diagonal. We take the learning rate ρ as 4 and the coefficient of the target (prey) learning rate as 0.7, while the neighborhood coefficient is 1.2. To better see the experimental results, we pause after 20 iterations, as shown in Figure 5a; the target search task is finally completed after 36 iterations. These results show that the search tasks for the two targets are well completed. As shown in Figure 6, there are eight robots and four targets. We take the learning rate of the robots as 3 and the learning rate of the target (prey) as 1, while the neighborhood coefficient is 1.1. To better observe the experimental results, we pause after 30 iterations, as shown in Figure 6a; after 61 iterations, the multiple-target search task is successfully completed. The experiments shown in Figures 5 and 6 verify the effectiveness and efficiency of the proposed DSOM method in completing multi-target dynamic search. After locking onto a target according to the locking mechanism, a robot can find the moving target accurately and quickly and complete the search task.
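The roles of the learning rate ρ and the neighborhood coefficient can be sketched with a generic SOM weight update. This is a hedged illustration of how those two parameters shape the motion of the winner and its neighbors; the Gaussian neighborhood and the overshoot clamp below are common SOM choices, not necessarily the exact form of Equation (15).

```python
import numpy as np

def som_update(weights, winner, target, rho=4.0, g=1.2):
    """One SOM weight update pulling the winner (and neighbors) toward a target.

    `weights` holds the neuron (robot) positions, `winner` is the winning
    neuron's index, `rho` is the learning rate and `g` the neighborhood
    coefficient (rho = 4, g = 1.2 in Figure 5; rho = 3, g = 1.1 in
    Figure 6). The step length is clamped so a neuron never overshoots
    the target.
    """
    d_win = np.linalg.norm(weights - weights[winner], axis=1)  # distance to winner
    h = np.exp(-(d_win**2) / (2.0 * g**2))                     # neighborhood strength
    out = weights.copy()
    for j in range(len(weights)):
        diff = target - weights[j]
        dist = np.linalg.norm(diff)
        if dist > 1e-9:
            step = min(rho * h[j], dist)      # clamp: never overshoot the target
            out[j] = weights[j] + step * diff / dist
    return out
```

With these parameters, the winner takes a step of length ρ toward the target while distant neurons barely move, which is why a larger neighborhood coefficient spreads the cooperative pull over more robots.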

Some Robots Failed
The DSOM presented in this paper has good robustness: even in unpredictable emergencies, it can still obtain a reasonable solution, whereas in some traditional algorithms such unexpected situations often mean fatal errors or system crashes. Figure 7 shows the failure of some robots in the system during the execution of the tasks. Figure 7a shows the initial state of the system. In Figure 7b, the black line represents the trajectory of a robot's movement, and the red cross represents the location where the robot failed. As shown in Figure 7b,c, robots R4 and R5 failed on their way to tasks T2 and T4, respectively, losing their ability to move towards and complete the tasks. In this case, R3 completes task T2 after task T1, and R6 completes task T4 after task T3; finally, all tasks are completed, as shown in Figure 7d. The simulation results show that the proposed method can reassign tasks and re-plan paths when a robot fails, achieving a good effect.


Robot Hovering Problem
The proposed method is also capable of addressing the case where the number of tasks is greater than the number of robots. In this case (six robots and eight targets), as shown in Figure 8, some of the robots have more than one target to reach. When a robot reaches a target, it becomes free and can pursue another target. As seen from the figure, some robots visited only one target, while others visited two. The main challenge in this case is to choose an optimal visit order for the robots that have more than one target to visit. Due to the competitive effect of the competitive layer in the SOM network, the planned paths for the robots are the shortest among all possible routes. As shown in Figure 8d, the proposed method can also adjust the task assignment in real time. Initially, robot R4 was assigned to complete task T6, and robot R5 was assigned to complete task T5. However, when robot R5 completed task T5, it was closer to task T6 than robot R4; task T6 was then reassigned to robot R5, and robot R4 stopped working.
The traditional SOM algorithm suffers from the robot oscillation and hovering problem, and the robot has no obstacle avoidance function in the traditional SOM method. As shown in Figure 8a, under the traditional SOM neural network task assignment and cooperative search algorithm, robots R1, R2, R4 and R5 exhibited the oscillation and hovering phenomenon, and the other robots were not assigned tasks. After 500 iterations, the traditional SOM algorithm still failed to progress from this state, while the proposed method required only 172 iterations to complete all tasks and achieved better obstacle avoidance. As shown in Figure 8c, some robots exhibited the oscillation and hovering phenomenon. Robot R1 completed task T1 and robot R3 completed task T3; robot R3 then continued to move towards the targets. When it moved to the middle of targets T2 and T4, according to the traditional winner selection rules it became the winner of both tasks, and because of the lack of restrictions on the self-organizing behavior and the random input order, it constantly moved back and forth between the two tasks. At the same time, the hovering of robots R2, R4 and R5 made it impossible to complete the tasks in the end.
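The oscillation just described can be prevented by the locking mechanism: once a robot wins a target, the pair is locked and subsequent competitions skip it until the task completes, so a robot midway between two targets cannot alternate between them. The dictionary-based lock below is an illustrative reading of the mechanism, not the paper's exact data structure.

```python
import numpy as np

def select_winner(targets, robot, locks, robot_id):
    """Winner selection with a simple locking mechanism.

    `locks` maps robot id -> locked target id. A locked robot keeps its
    target regardless of later distance changes; unlocked robots compete
    only for targets not already locked by others.
    """
    if robot_id in locks:              # already locked onto a target
        return locks[robot_id]
    taken = set(locks.values())
    best, best_d = None, np.inf
    for t_id, t_pos in enumerate(targets):
        if t_id in taken:
            continue
        d = np.linalg.norm(np.asarray(t_pos, dtype=float) - np.asarray(robot, dtype=float))
        if d < best_d:
            best, best_d = t_id, d
    if best is not None:
        locks[robot_id] = best         # lock until the task completes
    return best
```

In the Figure 8c scenario, a robot that first won one of the two targets would keep it even after drifting closer to the other, eliminating the back-and-forth motion of the traditional winner rule.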
The proposed DSOM method better solves the obstacle avoidance problem and uses a locking mechanism to address oscillation and hovering, thus improving the success rate. As shown in Figure 8b, robot R1 completes tasks T2 and T3 in turn. In Figure 8b,d, the motion trajectories show that the proposed method avoids robot oscillation during task assignment and cooperative search, and the smoothness of the trajectories shows that there is no speed jump when the robots move towards the targets; the entire trajectory is very smooth. In a workspace with obstacles, the improved algorithm enables the robots to reach the task positions along the shortest paths and to avoid obstacles automatically when they are encountered.

Comparative Testing
To verify the effectiveness and superiority of the proposed method in task assignment and cooperative search under an unknown dynamic environment, the algorithms in references [1] and [22] are compared with the proposed method in this section. The algorithm in [1] was proposed for a 3-D underwater environment; only its 2-D model is used in our experiments. To avoid chance results, thirty different environments of the same size were simulated with the comparison methods of references [1] and [22] and the proposed DSOM. It should be noted that the comparison methods in references [1] and [22] do not consider unknown environments, so the location information of the obstacles is provided to them as prior information. The success rates and averages of the iterations for each algorithm are reported in Figure 9. There are eight robots in each experiment, and the robots and targets are randomly distributed. When the number of targets exceeds the number of robots, some robots must visit more than one target; therefore, the possibility of the oscillation and hovering phenomenon occurring is higher. As shown in Figure 9a, the success rate refers to the proportion of experiments in which all of the targets are found. When the number of robots is greater than or equal to the number of targets, the success rate of all algorithms is 100%; the larger the number of targets, the lower the success rate. The success rate of the DSOM algorithm exceeds that of the comparison methods when the number of targets is greater than 14. As shown in Figure 9b, when the number of targets is less than 10, the numbers of iterations of the comparison methods in references [1] and [22] are slightly lower than that of the DSOM, but there is not much difference. By comparison, the number of iterations obtained by using the DSOM is very stable and within a reasonable range.
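The evaluation protocol above can be sketched as a small aggregation loop: thirty randomized trials per configuration, a trial counting as a success only if all targets are found within the iteration cap. The callable `method` is a hypothetical stand-in for one full simulation run, returning the iterations used or None on failure.

```python
def evaluate(method, n_trials=30, max_itr=500):
    """Success rate and average iterations over repeated random trials.

    Mirrors the comparison protocol in the text: thirty environments,
    success = all targets found within max_itr iterations. Averages are
    taken over successful runs only (an assumption; the paper does not
    state how failed runs enter the averages).
    """
    iters = []
    success = 0
    for seed in range(n_trials):
        result = method(seed)          # iterations used, or None on failure
        if result is not None and result <= max_itr:
            success += 1
            iters.append(result)
    rate = success / n_trials
    avg = sum(iters) / len(iters) if iters else float("nan")
    return rate, avg
```

These two quantities are exactly what Figure 9a (success rate) and Figure 9b (average iterations) plot against the number of targets.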
In other words, the larger the number of targets a single robot visits, the more obvious the effectiveness and robustness of the proposed method. To further compare the performance of the comparison methods in references [1] and [22] and the DSOM, simulation experiments were performed for four cases: a dynamic environment in which the number of targets equals the number of robots (Case 1), a dynamic environment in which the number of targets exceeds the number of robots (Case 2), a dynamic environment in which some targets are movable (Case 3), and a dynamic environment in which some obstacles are movable (Case 4). For each case, the numbers and locations of the targets, robots and obstacles are the same, and thirty experiments were repeated to avoid any contingencies. In each experiment, the numbers of robots and targets are fixed, but their positions are randomly generated; the number of iterations and the total path length are recorded. The average number of iterations and the average path length of the proposed method and the comparison methods in references [1] and [22] are compared in Table 1.
Compared with the comparison methods, the proposed method requires fewer iterations and a shorter path length in three cases, especially when the number of targets is greater than the number of robots. This difference occurs because the proposed method uses a locking mechanism to solve the problem of oscillation and hovering, which enables the robots to move to the designated target positions faster in order to complete the tasks. Furthermore, the robots use the DWA to avoid obstacles; the DWA performs better in environments where the obstacle locations are unknown, so the robots can move to the target positions without collisions. The simulation results show that the locking mechanism adopted by the proposed DSOM method improves the robustness of the SOM neural network and reduces the hovering of the robots. Additionally, through the DWA, the robots can effectively avoid obstacles and reach targets in dynamic environments where the location information of the obstacles is unknown, and they can avoid speed jumps. Compared with the other algorithms, the DSOM has the advantages of requiring less time and utilizing a shorter path length. In addition, the algorithm can address unknown dynamic environments, handle cases in which the number of robots is less than or greater than the number of targets, and cope with cases in which some robots fail. The DSOM method can also reassign tasks and re-plan paths in real time when the locations of the targets and obstacles change. The proposed method can thus better accomplish task assignment and cooperative search.
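The DWA-based avoidance described above can be sketched as a velocity-sampling loop: velocities reachable within one control cycle form the dynamic window, and each sampled (v, ω) pair is rolled out as a short arc and scored by goal heading, obstacle clearance and speed. All limits and weights below are illustrative placeholders, not the paper's tuned or adaptive values.

```python
import numpy as np

def dwa_choose(pose, v, w, goal, obstacles,
               v_max=2.0, w_max=1.0, a_v=0.5, a_w=0.5, dt=0.1, horizon=1.0):
    """Pick the best (v, w) from a dynamic window (illustrative DWA sketch).

    `pose` is (x, y, theta); `obstacles` is a list of (x, y) points.
    Arcs that pass within 0.3 of an obstacle are discarded as collisions.
    """
    x, y, th = pose
    best, best_score = (0.0, 0.0), -np.inf
    for vv in np.linspace(max(0.0, v - a_v), min(v_max, v + a_v), 7):
        for ww in np.linspace(max(-w_max, w - a_w), min(w_max, w + a_w), 7):
            # forward-simulate a short constant-velocity arc
            px, py, pth = x, y, th
            clear = np.inf
            for _ in range(int(horizon / dt)):
                pth += ww * dt
                px += vv * np.cos(pth) * dt
                py += vv * np.sin(pth) * dt
                for ox, oy in obstacles:
                    clear = min(clear, np.hypot(px - ox, py - oy))
            if clear < 0.3:                 # would collide: discard this pair
                continue
            heading = -np.hypot(goal[0] - px, goal[1] - py)
            score = heading + 0.1 * min(clear, 3.0) + 0.1 * vv
            if score > best_score:
                best_score, best = score, (vv, ww)
    return best
```

Because candidate velocities are restricted to the window reachable from the current (v, ω) under the acceleration limits, consecutive commands stay close together, which is why the resulting trajectories are smooth and free of the speed jumps discussed above.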

Conclusions
To solve the problems in the task assignment and cooperative search research of multi-robot systems described in this paper, an improved SOM method combined with the adaptive DWA was proposed. The main contributions of the DSOM are as follows. (1) The SOM algorithm is improved by building a locking mechanism on each competing neuron to restrict its self-organizing behavior in order to avoid situations of robot oscillation and hovering. (2) The adaptive DWA is used to update the weights of the SOM's competing neurons for each robot in order to adjust the direction of motion according to environmental information. It can also achieve obstacle avoidance without knowing the location information of the obstacles, optimize the trajectories of robots, and avoid speed jumps. (3) The proposed method can reassign tasks and re-plan paths in real time when the locations of the target and obstacle change. Task assignment and cooperative search in different conditions are well realized in unknown environments in which obstacle location information requires real-time acquisition.
The proposed DSOM method was simulated in unknown dynamic environments, as well as in cases where some robots fail and where the robot hovering problem exists; the method was also compared with other algorithms. The experimental results show that the proposed method can solve the problems that occur in task assignment and cooperative search in unknown dynamic environments. The DSOM requires fewer iterations, achieves better efficiency and a higher success rate in most cases, and has good robustness. In the future, we plan to study task assignment for multi-robot cooperative hunting problems, and we will build a multi-robot experimental platform for testing. The DWA algorithm has been successfully applied to ROS robot systems; therefore, the proposed method is expected to solve real-world engineering problems and more complex scenarios in the field of intelligent mobile robots.
Author Contributions: H.T. and W.S. gave the conceptualization of the paper and designed the methods; H.T., A.L. and W.S. worked on the simulation results and were involved in the paper writing; H.T. and S.S. analyzed the data and were involved in the paper review and editing. All authors have read and agreed to the published version of the manuscript.