A Pheromone-Inspired Monitoring Strategy Using a Swarm of Underwater Robots

The advent of the swarm makes it feasible to dynamically monitor a wide area for maritime applications. The crucial problems of underwater swarm monitoring are communication and behavior coordination. To tackle these problems, we propose a wide area monitoring strategy that searches for static targets of interest simultaneously. Traditionally, an underwater robot adopts either acoustic communication or optical communication. However, the former is low in bandwidth and the latter is short in communication range. Our strategy coordinates underwater robots through indirect communication, which is inspired by social insects that exchange information by pheromone. The indirect communication is established with the help of a set of underwater communication nodes. We adopt a virtual pheromone-based controller and provide a set of rules to integrate the area of interest into the pheromone. Based on the information in the virtual pheromone, behavior laws are developed to guide the swarm to monitor and search with nearby information. In addition, a robot can improve its performance when using additional far-away pheromone information. The monitoring strategy is further improved by adopting a swarm evolution scheme which automatically adjusts the visiting period. Experimental results show that our strategy is superior to the random strategy in most cases.


Introduction
The development of the science and technology of robots makes it feasible to produce large quantities of robots with low cost. Organizing these robots to work together has become a hot topic in the research community and recent years have witnessed rapid progress in robot swarms. Nature is one of the best sources for swarm intelligence, and many communication mechanisms have been developed based on nature and animal behavior [1][2][3]. Social insects usually adopt two communication schemes, i.e., direct communication and indirect communication. Insects can exchange information via direct communication. For example, bees can indicate the positions of nectar source through waggle dance [4]. On the other hand, some insects adopt indirect communication by secreting pheromone into the environment and other insects can get the message by sensing the pheromone.

The main contributions of this paper are as follows:

• We propose a communication network to organize a swarm of underwater robots using indirect communication. The network consists of a set of underwater communication nodes. Various underwater navigation methods, such as Terrain-Referenced Navigation (TRN) [23], Database-Referenced Navigation (DBRN) [24] and Gravity Aided Navigation (GAN) [25], allow an underwater robot to periodically visit the nodes to exchange information and charge batteries if needed.

• We apply a pheromone-based controller to coordinate a swarm to monitor the marine environment and search for static targets on the seafloor. The controller is composed of two layers: the layer of virtual pheromone and the layer of behavior laws. Virtual pheromone indicates the pheromone density in the area of interest (AOI). An algorithm is developed to map an AOI of arbitrary shape to the virtual pheromone in the form of a matrix. Behavior laws are designed on top of the virtual pheromone, such that a swarm continuously monitors the environment. During the monitoring process, the swarm can also search for and report specific static targets, such as hazards or wreckage. Note that the controller is bio-inspired, and thus we do not prove the convergence of the adopted algorithms.

• We introduce a swarm evolution scheme to improve the monitoring strategy by automatically adjusting the robots' visiting period. Experimental results indicate that the choice of a visiting period affects a swarm's performance. After adopting an evolution scheme, a swarm can achieve an acceptable performance by avoiding unfavorable cases.
The rest of this paper is organized as follows. Related work is discussed in Section 2. We describe the problem and introduce the pheromone-based controller in Section 3. The pheromone map is explained in Section 4. The behavior laws to monitor the environment and search for static targets are demonstrated in Section 5. In Section 6, we present the simulation and real-world experimental results. Section 7 concludes this paper.

Comparison with Available Schemes
Monitoring the environment and searching for targets is a main application for multi-robot systems. To realize this, robots involved in the swarm need to work cooperatively. There are two ways to organize the robots: a consensus way and a distributed way.
When organized into a consensus structure, the whole swarm obeys commands from a leader. To achieve this, the leader needs to be able to communicate with all robots in the swarm, so that it can gather data from robots, make decisions based on the information, and send commands to others. As the leader has access to global information of the swarm, it can make optimal decisions, such as path planning and task assignment [70]. The drawback of this strategy is that the leader is a single point of failure (SPOF) of the swarm. The failure of the leader will result in collapse of the whole swarm. To solve this problem, some methods have been proposed to enhance the robustness of the swarm using a dynamic leader, such as electing a leader with swarm decision-making methods, or dynamically changing the roles of the robots in the swarm [4].
In a distributed robot swarm, a leader is not required and robots make decisions based on the information they obtain. At present, most methods are based on direct communication. For example, [71] proposes a power-efficient system where a node communicates with others within a certain communication range. In addition, methods using indirect communication have been extensively studied, which are mainly inspired by the foraging of ant colonies. To build a robot swarm with indirect communication, we assume that a communication network consisting of communication nodes has been deployed into the AOI in advance. Field experiments have verified that robots are able to exchange large volumes of data in a short time by visiting a communication node. It is worth mentioning that, as the positions of communication nodes are fixed, an underwater robot can be navigated to a node by affordable methods, such as TRN [23], DBRN [24] and GAN [25].
Direct communication adopts either acoustic or optical communication, but the former is low in bandwidth and the latter is short in range. This limits the communication capability of the leader in a consensus structure, so that it cannot gather the needed information in time. Therefore, in this paper, we coordinate the swarm in a distributed way via indirect communication.

Problem Statement and Underwater Robot Swarm with Indirect Communication
This paper seeks to develop a method to coordinate a large number of underwater robots into a swarm that works cooperatively to monitor the environment while simultaneously searching for underwater targets. The swarm is deployed over an extensive area spanning kilometers, which means that the distance between a pair of robots is too large for optical communication, while acoustic communication cannot meet the bandwidth requirement. Thus, we abandon direct peer-to-peer communication among robots. The problem can then be reformulated as how to use $N$ robots to monitor the environment and search for underwater targets in a cooperative way without direct communication.
Our solution is to organize an underwater robot swarm with the help of a communication network. We deploy a set of underwater communication nodes into the AOI. These communication nodes are connected through underwater cables or radio (if the communication node is connected to a buoy equipped with antennas). Finally, these nodes form an underwater communication network, and they share data as a whole. When visiting a communication node, an underwater robot can exchange data with the node through optical communication, transferring a large volume of data in a short time. The whole AOI is shown in Figure 1. The relationship between robots and nodes is not one-to-one, and one robot can visit any node.
It is assumed that all robots are equipped with localization devices, and the localization error can be eliminated when visiting a communication node. Thus, we can assume that a robot can get a relatively accurate position. All robots have the information of the AOI, including the shape of the AOI and the positions of the communication nodes. Robots are also equipped with collision avoidance sensors that are able to detect the existence of another robot when the distance between them is smaller than a threshold. This device can be a sonar-based, or an optical light-based sensor [67].

Virtual Pheromone-Based Controller
To achieve cooperative monitoring and search without direct communication, we design a virtual pheromone-based controller. The controller consists of two layers. The bottom layer is a virtual pheromone map, while the top layer is a behavior controller. The structure is shown in Figure 2. The pheromone map is used to mimic the environment into which an individual can secrete pheromone, and from which an individual can sense pheromones. Thus, this map contains the information of the whole swarm. However, as robots in the swarm do not share real-time communication, the information is readily outdated. Each robot maintains a pheromone map that updates at each step. When visiting a communication node, a robot uploads its pheromone map to the communication network, and then the pheromone map is merged with the one that is maintained by the communication network. Finally, the robot downloads a new pheromone map from the communication network. The behavior controller takes the pheromone map as input, making a decision to guide the robot to voyage in the area, monitoring the environment, searching for targets and visiting the communication nodes.
For robot $i$, let $P_i(t)$ be the pheromone map it maintains at time $t$, $B_i(t)$ its behavior at $t$, and $A(\cdot)$ the behavior law. $f_{update}(\cdot)$ is the rule to update the pheromone map, and $f_{merge}(\cdot)$ is the rule to merge the robot's own pheromone map with the map maintained by the communication network, which is denoted by $P_{net}(t)$. The mathematical model of the method can be described with the following functions:

$$B_i(t) = A\big(P_i(t)\big) \tag{1}$$

$$P_i(t+1) = \begin{cases} f_{merge}\big(P_i(t), P_{net}(t)\big), & \text{if robot } i \text{ exchanges data with a communication node at } t \\ f_{update}\big(P_i(t), B_i(t)\big), & \text{otherwise} \end{cases} \tag{2}$$

A robot calculates its new behavior with Equation (1) based on the pheromone map, and the pheromone map is updated using Equation (2). In the following sections, we introduce in detail how to define the pheromone map with a matrix. The designs of $A(\cdot)$, $f_{update}(\cdot)$ and $f_{merge}(\cdot)$ are also presented.
From the bionic perspective, the behavior controller can be treated as a social insect such as an ant, and the pheromone map it maintains is its environment. The main difference between our robot swarm and a real insect swarm is that, in our swarm, the environment is not real-time. In an ideal situation, namely all robots keep communicating with the communication network, the pheromone map maintained by the robots will be the same and it can reflect the current global situation. However, as robots can only exchange data with the network when visiting a communication node, in general, the pheromone maps maintained by robots are different and outdated. This metaphor is also represented in Figure 2.
Figure 2. Structure of the controller. (a) shows a robot swarm monitoring the environment and searching for targets in the AOI, mimicking the foraging behavior of ants in (b). For each robot, the controller consists of two layers, with $A$ being the behavior law and $P$ being the pheromone map. The pheromone map is similar to the environment in which ants deploy and sense pheromones. Just as ants make decisions based on pheromone information, the behavior controller $A$ generates a motion decision based on the pheromone map $P$. All ants in (b) share the same environment, but in (a) the pheromone maps of different robots differ due to the lack of real-time communication.

Pheromone Map
We define a pheromone map to mimic the environment for social insect swarms. Agents in the swarm can write information into the pheromone map and read information out. To make it easy to manipulate the pheromone map with mathematical tools, we represent the pheromone map with an $m \times n$ matrix $P$ as:

$$P = \begin{bmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m,1} & p_{m,2} & \cdots & p_{m,n} \end{bmatrix}$$

where $p_{i,j}$ is the pheromone density in $D_{i,j}$, which is a portion of the AOI. The definition of $D_{i,j}$ is given in Section 4.1.

Mapping the AOI into the Pheromone Matrix
Let the AOI of arbitrary shape be a collection of points. We use a point set $D_{AOI}$ to represent the AOI, where each tuple $(x, y) \in D_{AOI}$ is a point in the AOI. The definition of the coordinate system is shown in Figure 3(2).
We use the mapping $f_{ap}: D_{AOI} \to P$ to get the pheromone matrix $P$. The mapping $f_{ap}$ is represented by Algorithm 1.
The idea of Algorithm 1 is as follows.

STEPS 1 and 2: Build the minimum bounding rectangle (MBR). By finding $X_{MAX}$, $X_{MIN}$, $Y_{MAX}$ and $Y_{MIN}$ from $D_{AOI}$, we can build the collection

$$D_{MBR} = \{(x, y) \mid X_{MIN} \le x \le X_{MAX},\ Y_{MIN} \le y \le Y_{MAX}\}.$$

It is an MBR of $D_{AOI}$. In this step, from an AOI of any shape, we can always obtain a rectangle, and a rectangle is easy to map into a matrix.

STEPS 3 and 4: Expand the MBR. Usually, researchers directly scatter a rectangle to get a matrix. When the rectangle is scattered, we get a set of smaller rectangles, and a matrix can be built with each element representing the pheromone information in one of them. This idea is straightforward and is adopted by various research works [48,50,53]. Its main drawback is that special behavior laws are necessary to prevent robots from going out of the AOI. With our idea, if the AOI is surrounded by repellent pheromone with extremely high density, a robot will not set a point outside of the AOI as its next waypoint. Taking the waypoint as input, a low-level Proportional-Integral-Derivative (PID) controller controlling the thruster and rudder will drive a robot back to the AOI in case it is pushed out by water flow. To achieve this, we expand $D_{MBR}$ into a larger rectangle $D$, whose edges are adjusted into:

$$X'_{MIN} = X_{MIN} - \Delta x, \quad X'_{MAX} = X_{MAX} + \Delta x, \quad Y'_{MIN} = Y_{MIN} - \Delta y, \quad Y'_{MAX} = Y_{MAX} + \Delta y.$$

$\Delta x$ and $\Delta y$ are parameters that should be chosen properly.

STEP 5: Scatter $D$. As we want to obtain an $m \times n$ matrix, we scatter the rectangle $D$ into $m \times n$ cells. We label these cells with $D_{i,j}$ so that $p_{i,j}$ in matrix $P$ represents the pheromone information in $D_{i,j}$. With cell sizes $w_x = (X'_{MAX} - X'_{MIN})/m$ and $w_y = (Y'_{MAX} - Y'_{MIN})/n$, we have

$$D_{i,j} = \{(x, y) \mid X'_{MIN} + (i-1)\,w_x \le x < X'_{MIN} + i\,w_x,\ Y'_{MIN} + (j-1)\,w_y \le y < Y'_{MIN} + j\,w_y\}.$$

The center of $D_{i,j}$ is defined as $(x_c(i), y_c(j))$, which can be calculated with:

$$x_c(i) = X'_{MIN} + \left(i - \tfrac{1}{2}\right) w_x, \qquad y_c(j) = Y'_{MIN} + \left(j - \tfrac{1}{2}\right) w_y.$$

STEP 6: Create the pheromone matrix $P$. Finally, we set the initial value of each $p_{i,j}$ according to whether $D_{i,j}$ is inside $D_{AOI}$. We treat $D_{i,j}$ as inside $D_{AOI}$ if its center $(x_c(i), y_c(j))$ is inside $D_{AOI}$, and then set $p_{i,j}$ to 0, namely, no pheromone exists in this cell. Otherwise, $D_{i,j}$ is outside $D_{AOI}$, which means that we do not want a robot to move into $D_{i,j}$. In this case, we set $p_{i,j}$ to $\infty$, indicating that this cell is filled with repellent pheromone of extremely high density:

$$p_{i,j} = \begin{cases} 0, & (x_c(i), y_c(j)) \in D_{AOI} \\ \infty, & \text{otherwise.} \end{cases}$$

To prevent a robot from moving out of $D_{AOI}$, we want $D_{AOI}$ to be surrounded by repellent pheromone. This means that the centers of the outermost cells must lie outside the MBR:

$$x_c(1) < X_{MIN}, \quad x_c(m) > X_{MAX}, \quad y_c(1) < Y_{MIN}, \quad y_c(n) > Y_{MAX}.$$

We have:

$$\frac{w_x}{2} < \Delta x, \qquad \frac{w_y}{2} < \Delta y.$$

Finally, we can obtain:

$$\Delta x > \frac{X_{MAX} - X_{MIN}}{2(m-1)}, \qquad \Delta y > \frac{Y_{MAX} - Y_{MIN}}{2(n-1)},$$

which is the condition to choose $\Delta x$ and $\Delta y$ in STEPS 3 and 4. The whole process is also shown in Figure 3.

Figure 3. Mapping an AOI into a pheromone matrix. From (1) to (2), we find the MBR of the AOI. The definition of the coordinate system is also shown in (2). Then, the MBR is expanded slightly and scattered into $m \times n$ cells in (3). The cells that overlap with the AOI are set to 0, while other cells are set to $\infty$. This manipulation ensures that, in the pheromone matrix, all inaccessible cells are filled with a repellent pheromone of extremely high density. Finally, we get the pheromone matrix $P$ from these cells.
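The mapping from an AOI to a pheromone matrix can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `build_pheromone_matrix`, the membership predicate `inside`, and the choice of a uniform grid with cell centers at the cell midpoints are assumptions made for the example.

```python
import math

def build_pheromone_matrix(aoi_points, inside, m, n, dx, dy):
    """Map an AOI to an m x n pheromone matrix (0 inside, inf outside).

    aoi_points: iterable of (x, y) samples of the AOI, used only for the MBR.
    inside:     hypothetical predicate inside(x, y) -> bool for AOI membership.
    dx, dy:     margins by which the MBR is expanded on each side.
    """
    xs = [p[0] for p in aoi_points]
    ys = [p[1] for p in aoi_points]
    # Build the MBR and expand it so that a ring of cells surrounds the AOI.
    x_lo, x_hi = min(xs) - dx, max(xs) + dx
    y_lo, y_hi = min(ys) - dy, max(ys) + dy
    # Scatter the expanded rectangle into m x n cells (assumed uniform grid).
    wx, wy = (x_hi - x_lo) / m, (y_hi - y_lo) / n
    xc = [x_lo + (i + 0.5) * wx for i in range(m)]
    yc = [y_lo + (j + 0.5) * wy for j in range(n)]
    # A cell gets 0 if its center is inside the AOI, inf (repellent) otherwise.
    P = [[0.0 if inside(xc[i], yc[j]) else math.inf for j in range(n)]
         for i in range(m)]
    return P, xc, yc

# Usage: a disc-shaped AOI of radius 1 centered at the origin.
boundary = [(math.cos(2 * math.pi * k / 200), math.sin(2 * math.pi * k / 200))
            for k in range(200)]
P, xc, yc = build_pheromone_matrix(
    boundary, lambda x, y: x * x + y * y <= 1.0, m=12, n=12, dx=0.25, dy=0.25)
```

In this usage example the margins are large enough that every outermost cell has its center outside the disc, so the AOI ends up enclosed by a ring of repellent cells.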

Rules to Update Pheromone Matrix
For environment monitoring and static target search applications, we define the density of virtual pheromone as:

$$p_{i,j}(t+1) = \begin{cases} p_{i,j}(t) + k, & r_{pos}(t) \in D_{i,j} \\ p_{i,j}(t), & \text{otherwise} \end{cases}$$

where $r_{pos}(t)$ is the position of the robot at $t$, $D_{i,j}$ is the cell corresponding to $p_{i,j}$, and $k$ is a parameter indicating the density of the pheromone deployed. When a robot visits a communication node, it exchanges data with the communication network, merging the matrices maintained by the robot and the communication network, as $f_{merge}(\cdot)$ in Equation (2).
Assume that one robot has visited a communication node at $T$ and visits the communication network again at $T + t_{back}$. Then, $t_{back}$ is defined as the re-visit time. We define a trajectory matrix $M_{trace} = (m_{i,j})_{m \times n}$, whose dimension is the same as that of $P$, to record the trajectory of the robot during $[T, T + t_{back}]$. In $M_{trace}$, $m_{i,j} = 1$ if $D_{i,j}$ has been visited by the robot during $[T, T + t_{back}]$; otherwise it is set to 0. When visiting a communication node, a robot first uploads the change of the pheromone map caused by its behavior during $[T, T + t_{back}]$, and then downloads the whole pheromone map. The rules are as follows.
Let $P_r$ be the pheromone matrix maintained by the robot before exchanging data, $P_{net}$ the matrix maintained by the communication network before exchanging data, $P'_r$ the matrix maintained by the robot after exchanging data, and $P'_{net}$ the pheromone map maintained by the communication network after exchanging data. $M_{trace}$ is updated to $M'_{trace}$ after exchanging data. Then, we have Algorithm 2.

Algorithm 2 Update pheromone matrix when visiting a communication node
Input: $P_r$, $P_{net}$, $M_{trace}$
Output: $P'_r$, $P'_{net}$, $M'_{trace}$
1: $P'_{net} \leftarrow P_{net} + P_r \mathbin{.*} M_{trace}$
2: $P'_r \leftarrow P'_{net}$
3: $M'_{trace} \leftarrow 0$

In Step 1, the operator $.*$ is element-wise multiplication. It returns a matrix of the same size as the two operands, obtained by multiplying the operands' elements with the same subscript. In this way, we can merge the pheromone map from the robot and that maintained by the communication network.
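The exchange in Algorithm 2 can be sketched as follows. This is a minimal pure-Python illustration under the assumption that matrices are stored as nested lists; the function name `exchange` is hypothetical. One practical detail is handled explicitly: in IEEE floating-point arithmetic, $\infty \cdot 0$ evaluates to NaN, so the sketch selects visited cells with a condition instead of multiplying repellent border cells by a zero in the trajectory matrix.

```python
import math

def exchange(P_r, P_net, M_trace):
    """Merge a robot's map into the network map, then sync the robot."""
    m, n = len(P_net), len(P_net[0])
    # Step 1: add the robot's values only in cells it visited since the last
    # exchange. The conditional avoids inf * 0 = nan in border cells.
    P_net_new = [[P_net[i][j] + (P_r[i][j] if M_trace[i][j] else 0.0)
                  for j in range(n)] for i in range(m)]
    # Step 2: the robot downloads the merged map.
    P_r_new = [row[:] for row in P_net_new]
    # Step 3: the trajectory matrix is reset to all zeros.
    M_trace_new = [[0] * n for _ in range(m)]
    return P_r_new, P_net_new, M_trace_new

# Usage on a 2 x 2 toy map; the robot visited the right-hand column.
P_r = [[math.inf, 2.0], [0.0, 1.0]]
P_net = [[math.inf, 1.0], [3.0, 0.0]]
M_trace = [[0, 1], [0, 1]]
P_r_new, P_net_new, M_trace_new = exchange(P_r, P_net, M_trace)
```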

Environment Monitoring and Target Search
This section introduces the behavior law to monitor the environment and search for a set of static targets with an underwater robot swarm. It can be treated as an abstraction of deploying a swarm of underwater robots to monitor the AOI and search for underwater mine resources or wreckage after a shipwreck. We also explore improving the performance by adjusting $t_{back}$ dynamically using a swarm evolution strategy.

Behavior Law for Environment Monitoring and Target Search
The behavior law is represented with a finite state machine, as illustrated in Figure 4. The state transition conditions are given in Table 1. The finite state machine consists of three states: the search state, the visit state, and the report state.
In the search state, robots move in the AOI while monitoring and searching simultaneously. The behavior law is $A(\cdot)$ in Equation (1), which can be treated as a mapping:

$$A: P \to C_D, \qquad C_D = \{D_{a,b} \mid |a - i| \le 1,\ |b - j| \le 1\}$$

with the current position of the robot $r_{pos}(t) \in D_{i,j}$, namely, the robot is currently in $D_{i,j}$. In the next step, it should move to a neighbor cell in $C_D$, according to the pheromone matrix $P$.
The intuitive behavior law is to directly mimic the behavior of social insects. In the next step, the robot should go directly to the neighbor cell with the lowest pheromone, i.e., the cell $D_{i^*,j^*}$, following the behavior law:

$$r_{pos}(t+1) \in D_{i^*,j^*}$$

where $i^*, j^*$ fulfill:

$$p_{i^*,j^*} = \min_{D_{a,b} \in C_D} p_{a,b}.$$

When a robot has monitored the environment and searched for a period $t_{back}$, it changes to the visit state.
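The local behavior law can be illustrated with a short sketch. It assumes that the candidate set is the 3 × 3 neighborhood of the robot's cell and that ties between equally low cells are broken at random; the tie-breaking rule and the name `next_cell_local` are assumptions made for this example.

```python
import math
import random

def next_cell_local(P, i, j, rng=random):
    """Return the cell in the 3 x 3 neighborhood of (i, j) with the lowest pheromone."""
    candidates = [(a, b)
                  for a in range(i - 1, i + 2)
                  for b in range(j - 1, j + 2)
                  if 0 <= a < len(P) and 0 <= b < len(P[0])]
    lowest = min(P[a][b] for a, b in candidates)
    best = [(a, b) for a, b in candidates if P[a][b] == lowest]
    return rng.choice(best)  # random tie-breaking (an assumption)

# Usage: a 4 x 4 map whose border is repellent; the robot is in cell (1, 1).
INF = math.inf
P = [[INF, INF, INF, INF],
     [INF, 1.0, 0.0, INF],
     [INF, 2.0, 1.0, INF],
     [INF, INF, INF, INF]]
target = next_cell_local(P, 1, 1)  # the unvisited cell (1, 2)
```

Because border cells hold an infinite value, they can never be the minimum while at least one accessible cell is in the neighborhood, which is how the repellent ring keeps waypoints inside the AOI.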
In the visit state, a robot moves directly to the nearest communication node. When the robot reaches the communication node, it changes to the report state, exchanging data with the communication network following Algorithm 2.
The control commands here and those in the following sections are derived from a discrete model, i.e., the robots are required to track a series of waypoints rather than follow continuous velocity commands. This is because the behavior of an underwater robot is disturbed by unpredictable ocean currents, and hydrodynamics needs to be taken into account, making it difficult to build an accurate model. The solution to this problem is to adopt a layered controller in underwater robots. The top layer generates commands such as velocity and waypoints, while the bottom layer, which usually follows a PID law, directly controls the steering and thrust. It is usually more effective for underwater robots to track waypoints than to follow continuous velocity commands because the bottom controller has disturbance rejection capabilities.
Two factors affect the performance of the method. The first factor is the re-visit time t back , and the second factor is the behavior law f behavior . We explore the effects of the two factors in Sections 5.2 and 5.3.

Table 1. State transition conditions of the finite state machine.
Marker    Description
(1)       Monitor the environment and search for a period of $t_{back}$
(2)       Reach the nearest communication node
(3)       Finish exchanging data with the communication node
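The three-state machine can be written down compactly. The sketch below is illustrative only: the `State` enum, the `next_state` function and its boolean inputs are hypothetical names, and it assumes that condition (3) returns the robot to the search state, which is the natural reading of the cycle search, visit, report.

```python
from enum import Enum

class State(Enum):
    SEARCH = "search"
    VISIT = "visit"
    REPORT = "report"

def next_state(state, steps_searched, t_back, at_node, exchange_done):
    """Apply transition conditions (1)-(3) to the current state."""
    if state is State.SEARCH and steps_searched >= t_back:
        return State.VISIT    # (1) searched for a period of t_back
    if state is State.VISIT and at_node:
        return State.REPORT   # (2) reached the nearest communication node
    if state is State.REPORT and exchange_done:
        return State.SEARCH   # (3) finished exchanging data (assumed target)
    return state              # no condition met: stay in the current state

# Usage: one full cycle through the three states.
s = State.SEARCH
s = next_state(s, steps_searched=250, t_back=250, at_node=False, exchange_done=False)
s = next_state(s, steps_searched=0, t_back=250, at_node=True, exchange_done=False)
s = next_state(s, steps_searched=0, t_back=250, at_node=True, exchange_done=True)
```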

The Relationship between t back and Performance
As defined in Section 4.2, t back indicates the re-visit time of a node. We observe that, as t back increases, the performance, namely the coverage rate, for the same swarm first increases and then decreases, as shown by simulation results using different t back in Section 6.
We provide a qualitative explanation for the phenomenon. For a swarm with $N$ robots and operation time $T_{total}$, the coverage rate is $P_c = c_{visited} / c_{total}$, with $c_{visited}$ being the number of cells that have been visited by at least one robot, and $c_{total}$ the total number of cells in the AOI. As $c_{total}$ is constant, the coverage rate is determined by $c_{visited}$:

$$c_{visited} = \sum_{i=1}^{N} T^i_{search}\,\big(1 - P^i_{overlap}\big)$$

with $T^i_{search}$ being the time consumed by robot $i$ to search, and $P^i_{overlap}$ the probability of robot $i$ visiting a cell that has already been visited. To simplify the analysis, it is assumed that a robot can only visit one cell within a unit time.
Assuming that, during $T_{total}$, robot $i$ visits the communication node $N_b$ times, we have

$$T_{total} = \sum_{j=1}^{N_b} \big(t_{back} + t^j_{return}\big).$$

Here, $t^j_{return}$ is the time for robot $i$ to travel to the communication node for the $j$th time. As a robot visits the communication node following the shortest path, $t^j_{return} \ll t_{back}$. To simplify the analysis, we assume that the travel time to the communication node is a constant value $t_{return}$, then we have:

$$T_{total} = N_b\,\big(t_{back} + t_{return}\big).$$

We further have:

$$N_b = \frac{T_{total}}{t_{back} + t_{return}}.$$

For robot $i$,

$$T^i_{search} = T_{total} - N_b\,t_{return} = N_b\,t_{back}.$$

Then, for all robots, $c_{visited}$ is determined by $T_{search}$ and $P^i_{overlap}$. As $t_{back}$ increases, $N_b$ decreases so that $T_{search}$ increases, i.e., a larger $t_{back}$ favors a larger $c_{visited}$. Meanwhile, without real-time data from each other, the chance for a robot to visit a cell that has already been visited by other robots also increases with $t_{back}$, i.e., $P^i_{overlap}$ increases, which is a negative factor for $c_{visited}$. When $t_{back}$ is small, $P^i_{overlap}$ is so small that $T_{search}$ plays the main role. As $t_{back}$ increases, the weight of $P^i_{overlap}$ grows and finally neutralizes the advantage brought by $T_{search}$. Thus, the coverage rate first increases and then decreases with the increase of $t_{back}$, as shown in the experimental section.
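A small numerical sketch makes the search-time side of this trade-off concrete. It relies on the constant-return-time simplification, so each cycle lasts $t_{back} + t_{return}$, and it treats the number of visits as a real number. The value of `t_return` below is an illustrative assumption, not a figure from the experiments.

```python
def search_time(T_total, t_back, t_return):
    """Time left for searching when every cycle costs t_back + t_return."""
    n_b = T_total / (t_back + t_return)  # number of node visits (not rounded)
    return T_total - n_b * t_return

# Usage: total time of 2000 steps with an assumed return time of 25 steps.
for t_back in (50, 250, 1000):
    print(t_back, round(search_time(2000, t_back, 25), 1))
```

The search time grows monotonically with $t_{back}$ but with diminishing returns, which is why the growing overlap probability can eventually dominate.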
As the robot swarm is a complex system affected by multiple contradictory factors, we are unable to analyze the effects of $t_{back}$ quantitatively, i.e., unable to theoretically obtain the $t_{back}$ with the best performance. Therefore, a swarm intelligence method is proposed to adjust the parameter $t_{back}$ online and automatically. For robot $r$, we define $p_{back}$ to measure the performance of $t_{back}$. Based on $p_{back}$, the robots in the swarm can adjust $t_{back}$ automatically according to the performance during the last $t_{back}$ interval, and the method is as follows:

• Set initial values. For each robot, we assign a small initial value to $t_{back}(0)$. We also set $p_{back}(0) = 0$ and $k(0) = 1$.

• When a robot visits the communication network for the $i$th time, calculate $p_{back}(i)$ and update $t_{back}(i)$, where $\Delta t$ is a parameter.
When robots in the swarm adjust their own t back automatically with the method above, we can get an acceptable performance. Even though it is not an optimal solution, we can avoid the risk of choosing an unfavorable t back .

Improve Performance by Using Global Information
The behavior law $f_{behavior}$ in Section 5.1 uses only the information from the nine neighbor cells. This strategy wastes the information from far-away cells. A better strategy should consider both nearby and far-away pheromones. In this section, we propose an improved behavior law as follows. Assume that the current position of the robot is $r_{pos}(t) \in D_{i,j}$. In order to obtain $f_{behavior}$, we define two $3 \times 3$ matrices $P_{local}$ and $P_{global}$. $P_{local}$ is a sub-matrix of $P$ as follows:

$$P_{local} = \begin{bmatrix} p_{i-1,j-1} & p_{i-1,j} & p_{i-1,j+1} \\ p_{i,j-1} & p_{i,j} & p_{i,j+1} \\ p_{i+1,j-1} & p_{i+1,j} & p_{i+1,j+1} \end{bmatrix}$$

$P_{local}$ thus contains the pheromone information of the cells in $C_D$. As cells in the border region are assigned a pheromone whose density is $\infty$, a robot does not set a point at the border as its next waypoint. Only when it reaches one waypoint will a robot plan the next one. Hence, during the whole searching process, waypoints are within the AOI. As a result, we can always get $P_{local}$. $P_{global}$ is a matrix compressed from $P$, which means that $P_{global}$ should contain the pheromone information of the whole pheromone matrix. We get $P_{global}$ with Algorithm 3.
In STEP 1, we replace the $\infty$ elements in $P$ with 0 to get $P'$ so that we can sum elements up and calculate the average value. Then, in STEP 2, we compress $P'$ into a $3 \times 3$ matrix. This is achieved by dividing $P'$ into nine sub-matrices and calculating the average values of these sub-matrices, as shown in Figure 5. After this step, we obtain a matrix called $P'_{global}$. $P'_{global}$ already contains the information of the whole pheromone matrix. However, it does not indicate whether the neighbor cells are accessible. Thus, in STEP 3, we get a matrix $P'_{local}$ that indicates all the inaccessible neighbor cells. In STEP 4, we combine $P'_{local}$ and $P'_{global}$ to get $P_{global}$, which contains the global information while indicating inaccessible neighbor cells with $\infty$. The process is shown in Figure 5.

Algorithm 3 Compress $P$ into $P_{global}$
Input: $P$, $P_{local}$, $r_{pos} \in D_{i,j}$
Output: $P_{global}$
1: Define $P' = (p'_{i,j})_{m \times n}$ by replacing all elements of $P$ equal to $\infty$ with 0
2: Compress $P'$ into the $3 \times 3$ matrix $P'_{global}$ by dividing $P'$ into nine sub-matrices and calculating their average values
3: Map $P_{local}$ to $P'_{local}$ by replacing all elements not equal to $\infty$ with 0
4: $P_{global} = P'_{local} + P'_{global}$

Now, we have two $3 \times 3$ matrices, i.e., the local matrix $P_{local}$ and the global matrix $P_{global}$. Then, we can decide the motion of the robot according to the two matrices. The idea is that, if a neighbor cell has not been visited before, the robot will explore it. If all neighbor cells have already been visited, the robot will move in the direction with the lowest pheromone density. We define the next cell to visit as $D_{i^*,j^*}$, which can be obtained with Algorithm 4.

Algorithm 4 Obtain the next cell $D_{i^*,j^*}$
Input: $C_{local}$ and $C_{global}$, the sets of elements of $P_{local}$ and $P_{global}$
Output: $(i^*, j^*)$
$C \leftarrow \emptyset$
for $p_{a,b}$ in $C_{local}$ do
    if $p_{a,b} = 0$ then
        put $(a, b)$ into $C$
    end if
end for
if $|C| > 0$ then
    $(i^*, j^*) \leftarrow \mathrm{random}(C)$
end if
if $|C| = 0$ then
    for $p_{a,b}$ in $C_{global}$ do
        if $p_{a,b} = \min(C_{global})$ then
            put $(a, b)$ into $C$
        end if
    end for
    $(i^*, j^*) \leftarrow \mathrm{random}(C)$
end if

The behavior law means:
• Check the local matrix $P_{local}$ and go to a random cell whose pheromone density is 0.
• If no element of $P_{local}$ equals 0, move to the cell with the lowest element in $P_{global}$.

Figure 5. Steps of getting $P_{global}$ from $P$. R indicates the current position of the robot, and $p$ is a pheromone value that is not equal to $\infty$. $\infty$ indicates the inaccessible cells. From $P$, we can get $P'$ and $P_{local}$. $P'$ is compressed into $P'_{global}$. $P_{local}$ is transformed into $P'_{local}$ by replacing all elements not equal to $\infty$ with 0. Finally, we get $P_{global}$ by summing up $P'_{local}$ and $P'_{global}$. $P_{global}$ gives the average pheromone density in each direction and indicates the adjacent inaccessible cells with $\infty$.
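The compression and selection steps can be sketched together. Several details in this sketch are assumptions, since the text does not fix them: the nine sub-matrices are taken relative to the robot's cell (rows above, at, and below $i$; columns left of, at, and right of $j$), the robot is assumed to be in an interior cell so that its 3 × 3 neighborhood exists, and the names `compress` and `next_cell_global` are hypothetical.

```python
import math
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def compress(P, i, j):
    """Build the 3 x 3 global matrix for a robot in interior cell (i, j)."""
    m, n = len(P), len(P[0])
    # Step 1: replace inf by 0 so that averages stay finite.
    Pp = [[0.0 if math.isinf(v) else v for v in row] for row in P]
    # Step 2: average nine blocks split around (i, j) (assumed partition).
    rows = [range(0, i), range(i, i + 1), range(i + 1, m)]
    cols = [range(0, j), range(j, j + 1), range(j + 1, n)]
    G = [[mean(Pp[a][b] for a in r for b in c) for c in cols] for r in rows]
    # Steps 3-4: mark directions whose adjacent cell is inaccessible with inf.
    for da in range(3):
        for db in range(3):
            if math.isinf(P[i - 1 + da][j - 1 + db]):
                G[da][db] = math.inf
    return G

def next_cell_global(P, i, j, rng=random):
    """Prefer an unvisited neighbor; otherwise follow the lowest global average."""
    offsets = [(da, db) for da in (-1, 0, 1) for db in (-1, 0, 1)]
    C = [(i + da, j + db) for da, db in offsets if P[i + da][j + db] == 0]
    if not C:
        G = compress(P, i, j)
        lowest = min(G[da + 1][db + 1] for da, db in offsets)
        C = [(i + da, j + db) for da, db in offsets
             if G[da + 1][db + 1] == lowest]
    return rng.choice(C)

# Usage: 5 x 5 map with a repellent border; every neighbor of (2, 2) is visited.
INF = math.inf
P = [[INF, INF, INF, INF, INF],
     [INF, 4.0, 2.0, 4.0, INF],
     [INF, 2.0, 3.0, 2.0, INF],
     [INF, 4.0, 2.0, 1.0, INF],
     [INF, INF, INF, INF, INF]]
G = compress(P, 2, 2)
target = next_cell_global(P, 2, 2)  # lower-right direction has the lowest average
```

In the usage example no neighbor has pheromone 0, so the decision falls back to the compressed matrix, and the robot heads toward the block with the lowest average density.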

Simulation and Real-World Experiment
Simulations are carried out in Matlab (version 2014a) to test the strategy proposed in this paper. We use an underwater robot swarm to monitor the environment and search for static targets. The proposed methods are evaluated, and factors affecting the performance of the methods are analyzed. Finally, we give some recommendations for the application of the methods according to the simulation results.

Simulation
In the simulation of monitoring the environment and searching for static targets, we set the AOI as a rectangle composed of 200 × 200 cells. At each step, a robot can move to one of the adjacent cells, and the total simulation time is set to 2000 steps. We assume that a robot is able to move from one cell to another in one step. A communication network of sixteen communication nodes is deployed into the AOI. These nodes are deployed evenly into the AOI, forming a uniform grid. All robots are deployed from the same position. This is because, in practical application, all robots in the swarm are deployed by the same mothership or from the same base station. In Section 5, we develop two behavior laws. The behavior law f local in Section 5.1 uses only local pheromone information, while f global in Section 5.3 uses global pheromone information. Both methods are simulated. We also test a random search scheme with all robots moving randomly. In Section 5.2, we assume that the re-visit time t back can affect the performance of the method. Thus, we implement simulations with different t back .
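The node layout and the visit-state target can be sketched as below. The exact node coordinates are not given in the text, so this example assumes a 4 × 4 grid with one node at the center of each 50 × 50 block. It also assumes that diagonal moves are allowed, so the number of steps between two cells is their Chebyshev distance. The names `node_grid` and `nearest_node` are hypothetical.

```python
def node_grid(size=200, per_side=4):
    """Place per_side x per_side nodes evenly, one at the center of each block."""
    step = size // per_side
    return [(step // 2 + a * step, step // 2 + b * step)
            for a in range(per_side) for b in range(per_side)]

def nearest_node(cell, nodes):
    """Node reachable in the fewest steps when diagonal moves are allowed."""
    return min(nodes,
               key=lambda q: max(abs(q[0] - cell[0]), abs(q[1] - cell[1])))

# Usage: sixteen nodes in a 200 x 200 AOI and the target of a robot at (10, 180).
nodes = node_grid()
goal = nearest_node((10, 180), nodes)
```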
To eliminate randomness, for the same setting, we repeat the simulation 10 times. For a random search, no $t_{back}$ is used.

Figure 6. Comparison of performance for $f_{global}$, $f_{local}$ and $f_{random}$. In each subfigure, $t_{back}$ is set to the same value, and swarms with 10, 20, 30, 40 and 50 robots are tested with all three behavior laws. The performance of the control law is evaluated with the coverage rate. From each subfigure, we notice that, for a fixed number of robots, no matter how $t_{back}$ is set, the performance of $f_{global}$ is always superior to that of $f_{local}$. Both the $f_{global}$ and $f_{local}$ schemes perform better than $f_{random}$. In addition, as the swarm size increases, the performance also increases no matter which scheme is adopted. However, the increase is obvious for $f_{local}$ and $f_{global}$, while that for $f_{random}$ is rather insignificant. This trend appears in each subfigure, which implies that $f_{global}$ is superior to $f_{local}$ and $f_{random}$ no matter how $t_{back}$ is set.

Figure 7. Pheromone maps shown with a grayscale map, indicating the density of repellent pheromone. The density of the pheromone is higher in the lighter region and lower in the darker region. The pheromone density is 0 in the black region. We want most areas to be visited by robots, but not repeatedly; (a) is not perfect because it has a large black part, representing regions that have never been visited. It also has a large part that is extremely white, indicating that these regions have been visited repeatedly, which is a duplication of labor; (b) is better because almost the whole AOI is covered with a layer of light white, indicating that most parts have been visited, but not over and over again.
As all experiments are carried out in an area of the same size, the number of robots also reflects the density of robots. As the swarm is used to monitor the environment and search for static targets, we assume that a target can be found only when the cell containing it has been visited by a robot. Thus, we use the coverage rate at the final time to measure the performance of the method.
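The coverage rate can be read directly from the merged pheromone matrix. The sketch below assumes that a cell counts as visited when its pheromone value is positive, which holds if every visit deposits a positive amount and pheromone never decays; the function name `coverage_rate` is hypothetical.

```python
import math

def coverage_rate(P):
    """Fraction of accessible cells (finite value) that hold pheromone (> 0)."""
    accessible = [v for row in P for v in row if not math.isinf(v)]
    visited = sum(1 for v in accessible if v > 0)
    return visited / len(accessible)

# Usage: four accessible cells, two of which have been visited.
P = [[math.inf, 0.0, 2.0],
     [1.0, 0.0, math.inf]]
rate = coverage_rate(P)
```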
From the simulation results in Figure 6, we compare the performance of the three methods with the same number of robots and the same $t_{back}$. The method using the global pheromone information is better than the method using only local pheromone information, and both pheromone-based methods are superior to the random search method. We note that, except for $t_{back} = 50$, the coverage rate of 20 robots using the behavior law $f_{global}$ is similar to that of a swarm of 50 robots using $f_{local}$. In addition, for $t_{back} = 50$, a swarm of 10 robots using $f_{global}$ achieves a coverage rate similar to that of an $f_{local}$ swarm with 50 robots. This means that $f_{global}$ is markedly superior to $f_{local}$. The reason is that, by using $P_{global}$, the strategy $f_{global}$ can guide the robots to regions with a lower repellent pheromone density, in which fewer cells have been visited by robots. As a result, with $f_{global}$, the pheromone density is more even and robots can spread out in a short time. Figure 7 provides the pheromone map of a swarm using $f_{local}$ and that of a swarm using $f_{global}$ for comparison. The pheromone maps are represented with a grayscale map, indicating the density of the pheromone. In both cases, the size of the swarm is 50 and $t_{back}$ is set to 250. When using $f_{local}$, the grayscale map is more imbalanced. The light regions correspond to cells that have been visited multiple times, so that repellent pheromones have been deployed again and again, while the large dark regions have never been visited. Meanwhile, the map using $f_{global}$ is well balanced, and the dark region is much smaller. This implies that fewer cells have been visited multiple times by the swarm, enabling robots to explore new regions.
In Section 5.2, we predicted that t back affects the performance of the swarm and that, as t back increases, the coverage rate first increases and then decreases. Figures 8 and 9 show the effect of t back on the performance of the swarm. We can clearly see that the performance first increases and then decreases. Therefore, it is important to choose a proper t back .
To solve this problem, we propose a method in Section 5.2 that dynamically adjusts t back for each robot. Again, we perform simulations with different numbers of robots and the two kinds of behavior controllers. In these cases, t back is not a constant value but keeps changing following the scheme in Section 5.2. The simulation results are shown in Figures 8 and 9. In each subfigure, the red box shows the performance when t back is adjusted automatically. From Figure 8, it is clear that, when using f local , the performance with dynamic t back is better than that with any constant t back . From Figure 9, when using f global , the performance with dynamic t back is mediocre. Even though the dynamic t back scheme is not optimal, its performance is acceptable. More importantly, by letting the swarm evolve t back , we avoid the risk of choosing a bad t back .
In Figure 8, the black line shows that, as t back increases, the performance first increases and then decreases. There exists a best t back which provides the highest coverage rate. However, as analyzed in Section 5.2, we are unable to obtain this value directly. Meanwhile, the red box shows that, with the dynamic t back adjustment scheme, the performance is superior to that of any fixed t back . This phenomenon appears in each subfigure, i.e., in simulations with different swarm sizes, indicating that, when using f local , the dynamic t back adjustment scheme performs better than any fixed t back .
As in Figure 8, in each subfigure of Figure 9 the black line first increases and then decreases, indicating that, as t back increases, the coverage rate first increases and then decreases. In the five subfigures, the red box is slightly lower than the peak of the black line. This indicates that, with f global , the dynamic t back adjustment scheme provides a performance that is not the best but is still acceptable.
Considering that we are unable to get the best t back in advance, the dynamic t back adjustment scheme has practical value because it avoids choosing a poor t back .

Real-World Experiment
The experiment is carried out with a USV swarm in Xiuhu Lake, Shenyang, China. As experiments with a swarm of underwater robots are costly, it is common practice to use USVs to mimic the behavior of underwater robots in real-world experiments, as in [72,73]. As shown in Figure 10, a float ball acting as the communication node is located at the center of the lake (123.653555 E, 41.934899 N). In the lake, we set a square of 150 m × 150 m as the AOI and deploy two USVs. The AOI is first mapped into a matrix representing the pheromone with Algorithm 1. This is achieved by dividing the AOI into a grid of suitably sized cells. The USVs use GPS to obtain their locations, and the positioning error is around 5 m. Both USVs cruise at 1.5 m/s with a turn radius of 10 m. With these parameters, we divide the AOI into 25 cells, each with an edge of 30 m. This value is large enough for the USVs to tolerate the positioning error, and is larger than the turn radius so that a USV can move smoothly from one cell to another without circling around the destination. To prevent the USVs from leaving the AOI, the edge of the AOI is expanded slightly, generating a border region filled with pheromone of infinite (∞) density. In this case, we set a border region whose width is 30 m, as shown in Figure 11. The two USVs are unable to communicate with each other, but each can exchange data with the communication node when it is within five meters of the node.
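The grid layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of Algorithm 1: the function names, the initial density of 0 for interior cells, and the local coordinate frame (meters measured from one corner of the AOI) are all hypothetical choices made for the example.

```python
import math

AOI_SIDE_M = 150.0   # side length of the square AOI (from the text)
CELL_EDGE_M = 30.0   # cell edge (from the text)
BORDER_CELLS = 1     # a 30 m border equals one cell width


def build_pheromone_grid(aoi_side_m, cell_edge_m, border_cells):
    """Return a square matrix: interior cells start at 0 (assumed),
    border cells hold infinite pheromone density."""
    n = int(round(aoi_side_m / cell_edge_m))
    size = n + 2 * border_cells
    grid = [[math.inf] * size for _ in range(size)]
    for r in range(border_cells, border_cells + n):
        for c in range(border_cells, border_cells + n):
            grid[r][c] = 0.0
    return grid


def position_to_cell(x_m, y_m, cell_edge_m, border_cells):
    """Map a local position (meters from the AOI corner) to (row, col)."""
    col = int(math.floor(x_m / cell_edge_m)) + border_cells
    row = int(math.floor(y_m / cell_edge_m)) + border_cells
    return row, col


grid = build_pheromone_grid(AOI_SIDE_M, CELL_EDGE_M, BORDER_CELLS)
```

With these values the matrix is 7 × 7: a 5 × 5 interior of 25 cells surrounded by one ring of infinite-density cells. A position slightly outside the AOI maps to a border cell, which a density-minimizing controller would never choose.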
In the experiment, two behavior laws are tested, which are based on local and global pheromone information, respectively (Section 5.1). Regarding the choice of t back , the simulation results in Section 6.1 show that the scheme in Section 5.2 can adjust t back dynamically and provides an acceptable performance in both cases. Thus, in the test, we do not set a fixed t back but rather adjust it dynamically. The performance is evaluated with the distribution of pheromones in the area, and we are interested in two metrics. The first metric is the coverage rate, which indicates how much of the area has been explored by the swarm. The second metric is the mean square error of the pheromone density. A small mean square error indicates that each cell is visited frequently enough, enabling the robots to find emerging targets in time.
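The two metrics can be computed directly from a pheromone matrix. The sketch below rests on two assumptions that this section does not spell out: a cell counts as visited when its density is greater than zero, and the mean square error is taken as the mean squared deviation of the cell densities from their average. Border cells with infinite density are excluded. The function names and the sample matrix are hypothetical.

```python
import math


def finite_cells(grid):
    """Flatten the matrix, skipping infinite-density border cells."""
    return [v for row in grid for v in row if math.isfinite(v)]


def coverage_rate(grid):
    """Fraction of AOI cells that carry any pheromone (assumed: density > 0)."""
    cells = finite_cells(grid)
    return sum(1 for v in cells if v > 0) / len(cells)


def pheromone_mse(grid):
    """Mean squared deviation of the cell densities from their mean (assumed)."""
    cells = finite_cells(grid)
    mean = sum(cells) / len(cells)
    return sum((v - mean) ** 2 for v in cells) / len(cells)


# Illustrative data only: a 5 x 5 AOI in which 3 cells were never visited.
sample = [[1.0] * 5 for _ in range(5)]
sample[0][0] = sample[0][1] = sample[4][4] = 0.0
print(coverage_rate(sample), pheromone_mse(sample))
```

In a 25-cell AOI, each unvisited cell lowers the coverage rate by four percentage points, so the sample matrix above has a coverage rate of 22/25.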
Both methods are tested for 20 min, with Figure 11 showing the distribution of pheromones for both methods. With the local pheromone-based method, the coverage rate is 88% with a mean square error of 0.3594. With the global pheromone-based method, however, the coverage rate reaches 100% with a smaller mean square error of 0.2899. We conclude that the scheme adopting global pheromone information performs better. This result further supports the discussion in Section 5 and verifies the simulation results.
Figure 10. The lake experiment carried out with two USVs and one communication node. USV #1 is approaching the communication node to update its pheromone map, while USV #2, which has just visited the node, is exploring the lake.
Figure 11. Pheromone distribution with (a) f local and (b) f global . In our test, the AOI is divided into a 5 × 5 grid with Algorithm 1. The color of each square indicates the density of pheromone: the darker a square, the greater the pheromone density. The edge of the AOI is black and the corresponding pheromone density is ∞, preventing the robots from leaving the AOI. The red star indicates a communication node. With f global , pheromone has been deployed over the whole area, which is not the case with f local . In addition, the density with f global is more balanced.

Conclusions
An underwater robot swarm can be applied to maritime monitoring applications. Compared with a single robot, a swarm covers a larger area and can accomplish tasks more rapidly by working collectively. In this paper, we propose a monitoring strategy for an underwater robot swarm. The use of robots makes it possible to monitor a dynamically selected area, rather than a fixed area covered by stationary monitoring sensors. Our strategy deals with two aspects, i.e., communication and swarm monitoring behavior. We build a communication network which contains a set of underwater communication nodes. Robots periodically visit communication nodes to exchange information with each other in an indirect way. To form cooperative swarm monitoring behavior, we apply a pheromone-inspired controller to each robot. The controller uses virtual pheromone to store the information of an AOI. Behavior laws are designed to guide robots to monitor the environment with the help of the virtual pheromone. During monitoring, a search for static targets, such as wreckage or mine resources, can be performed simultaneously. Once targets are found, they can be reported by uploading the information to communication nodes.
Experimental results indicate that, among the three schemes we tested (i.e., f global , f local and random search), the f global scheme performs best, and both the f global and f local schemes work better than the random search scheme. This can be explained by the degree of "cooperation" among robots. With the random search scheme, every robot works independently. The robots do not use information from their peers, which results in duplicate work. With f local , a robot uses nearby information. It can infer the density of nearby robots and visit unexplored areas. With f global , a robot can obtain the information of the whole AOI. The more information it has, the wiser a decision it can make.
There is a trade-off between performance and computation cost. For a robot swarm with a random search scheme, only a few robots are needed and the calculation workload is small. When adopting an f local scheme, an underwater communication network consisting of a set of communication nodes is necessary. To achieve better performance, we apply the f global scheme, which may require a high-performance computer because each robot needs to handle a matrix representing the whole AOI. As the AOI expands, the calculation workload also increases. If the AOI is enormous, robots may need a high-performance computer, which increases both the cost and the energy consumption of the swarm. However, with f local , no matter how large the AOI is, the calculation workload is fixed and small because only a 3 × 3 matrix is processed (or a 3 × 3 × 3 matrix when the third dimension is added). As a result, the robot can carry a lower-performance computer, thus reducing its cost.
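The fixed workload of a local controller can be illustrated with a short sketch that extracts the 3 × 3 neighborhood of a robot from a pheromone matrix. The function name and the handling of cells beyond the matrix edge (treated as infinite density) are assumptions made for this example; the behavior law itself is defined in Section 5.1 and is not reproduced here.

```python
import math


def local_window(grid, row, col):
    """Return the 3 x 3 neighborhood centered on (row, col).

    Cells outside the matrix are reported as infinite density (assumed),
    so the window always holds exactly nine values.
    """
    window = []
    for dr in (-1, 0, 1):
        line = []
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            line.append(grid[r][c] if inside else math.inf)
        window.append(line)
    return window


small = [[0.0] * 5 for _ in range(5)]
large = [[0.0] * 500 for _ in range(500)]
print(len(local_window(small, 2, 2)), len(local_window(large, 250, 250)))
```

Whether the matrix has 25 or 250,000 cells, the window passed to the controller has nine entries, whereas a global controller must read the entire matrix at every decision.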
In addition, the performance of f global can be further improved. The key point is to extract valuable information from the pheromone matrix. Our current scheme uses the mean pheromone density in each direction to determine the behavior of a robot. However, from the global matrix, other information can also be used. A proper choice of the information may improve the performance of the swarm, and that is what we are currently working on.
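As a rough illustration of using the mean pheromone density in each direction, the sketch below averages the finite densities on each side of the robot's cell and picks the side with the lowest mean. How the directions are actually delimited is not restated in this section, so the four-way split (all rows above or below, all columns to the left or right), the direction names, and the function names are hypothetical.

```python
import math


def directional_means(grid, row, col):
    """Mean finite pheromone density on each side of (row, col).

    Assumed split: 'north' is every row with a smaller index, 'south' every
    row with a larger index, and likewise 'west' and 'east' for columns.
    """
    regions = {
        "north": [v for r in range(0, row) for v in grid[r]],
        "south": [v for r in range(row + 1, len(grid)) for v in grid[r]],
        "west": [line[c] for line in grid for c in range(0, col)],
        "east": [line[c] for line in grid for c in range(col + 1, len(line))],
    }
    means = {}
    for name, values in regions.items():
        finite = [v for v in values if math.isfinite(v)]
        # A side with no usable cells is never preferred.
        means[name] = sum(finite) / len(finite) if finite else math.inf
    return means


def choose_direction(grid, row, col):
    """Pick the side with the lowest mean repellent pheromone density."""
    means = directional_means(grid, row, col)
    return min(means, key=means.get)


demo = [[5.0, 5.0, 5.0],
        [1.0, 0.0, 9.0],
        [0.0, 0.0, 0.0]]
print(choose_direction(demo, 1, 1))
```

Other summaries of the same matrix, such as the minimum density or a distance-weighted mean, could replace the plain average without changing the structure of this selection step.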
In order to achieve a reasonable performance, we introduce an evolution scheme that automatically varies the visiting period of the robots in the monitoring process. Simulation results reveal that, for f local , the performance of dynamic t back is superior to that of any fixed t back . However, for f global , the performance of this scheme is not remarkable. This is because a robot adjusts its t back based on its own historical performance. There is a chance that the strategy can be improved by utilizing the historical performance of other robots.
In the future, we plan to enhance the performance of our monitoring strategy by predicting global information based on past information from the network and demonstrate its effectiveness in real-world applications.

Conflicts of Interest:
The authors declare no conflict of interest.