Multibody System-Based Adaptive Formation Scheme for Multiple Under-Actuated AUVs.

Underwater vehicles' coordination and formation have attracted increasing attention because of their great potential for real-world applications. However, such vehicles are usually under-actuated and have very limited communication capabilities. On the basis of the multibody system concept, a formation and communication link framework for multiple autonomous underwater vehicles (MAUVs) has been established with an adaptive radial basis function (RBF) strategy. For acoustic communication, a packet transmission scheme with topology and protocol has been investigated on the basis of an acoustic communication framework and transmission model. Moreover, the cooperative localization errors caused by packet loss are estimated through reinforcement learning RBF neural networks. Furthermore, in order to realize formation cruising, an adaptive RBF formation scheme with magnitude-reduced multi-layered potential energy functions has been designed on the basis of a time-delayed network framework. Finally, simulations and experiments have been performed to validate the effectiveness of the proposed methods.


1. On the basis of the multibody system concept, the MAUVs' formation and communication link framework has been established. The connection between AUVs can be viewed as a spring and damping system. An adaptive control strategy has been set up for the formation of multiple under-actuated AUVs with a desired formation region and a magnitude-reduced artificial potential function.

2. On the basis of the MAUVs' formation and communication link framework, the packet transmission scheme has been designed with a learning-based multi-layered network topology; the cooperative localization errors caused by packet loss are estimated and corrected through reinforcement learning RBF neural networks.

3. On the basis of the MAUVs' formation and communication link framework, an adaptive RBF formation scheme with magnitude-reduced multi-layered potential energy functions has been designed using the time-delayed network framework. Simulations and experiments have verified the performance of the proposed schemes.

Adaptive Communication Protocol
If we take the formation of multiple AUVs as a multibody system, the mobile AUV nodes should be connected and coordinated over network communication. However, the constantly varying distances between nodes and the transmission latency make data transmission and relative distance observation difficult. Moreover, the energy consumption is correlated with the data transmission.
Firstly, after channel contention and selection, the source AUV node and the objective AUV nodes realize time consensus through broadcasting and answering. The source AUV node will send information to the objective AUV node 2 through the objective AUV node 1.
Sensors 2020, 20, 1943
Secondly, the source AUV node will send a request toward the destined AUV node. The format of the data package is {RTS/overtime, node_pos, node_speed, destination_node}, which denotes the data send request, present position, speed, and destined AUV node (in Figure 3, the objective AUV node 2 is taken as the destined AUV node). When the "RTS" message has been received by the objective AUV node 1, it will be sent to the objective AUV node 2 immediately. At the same time, the objective AUV node 1 will wait for the "CTS" message from the objective AUV node 2 or for a return timeout frame. When the "RTS" message has been received by the objective AUV node 2, it is informed about the forthcoming message, enters the "Response adjustment" status, and sends the "CTS" message to the source AUV node through the objective AUV node 1. When the "CTS" message is received by the objective AUV node 1, it will be transmitted to the source AUV node with the data package format {CTS1/overtime, node1_pos, node1_speed, CTS2/overtime, node2_pos, node2_speed}, which denotes the speed and position of the objective nodes. When a timeout happens, the source AUV node will send the request again or select another objective AUV node.
Thirdly, when the "CTS" message is received, "Data" will be sent from the source AUV node to the objective AUV node 2 through the objective AUV node 1. When the objective AUV node 1 receives "Data", it will enter the "Response adjustment" status and send the data package to the objective AUV node 2. After the data has been received, the objective AUV node 2 will return "ACK" to the source node through the objective AUV node 1. The format of the data package is {ACK1/overtime, node1_pos, node1_speed, ACK2/overtime, node2_pos, node2_speed}, which denotes the speed and position of the objective nodes. After "ACK" has been received by the source node, the transmission process will terminate.
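The relayed exchange described above can be traced with a short sketch. It is only an illustration under stated assumptions: the `Packet` class, the `relay_exchange` function, and the node names are hypothetical, positions are reduced to a single number, and timeouts and retransmissions are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Packet:
    kind: str        # "RTS", "CTS", "DATA" or "ACK"
    sender: str
    receiver: str
    # (node, position, speed) entries carried in the package (illustrative layout)
    status: Tuple[Tuple[str, float, float], ...] = ()

def relay_exchange(source: str, relay: str, dest: str,
                   state: Dict[str, Dict[str, float]]) -> List[Packet]:
    """Return the ordered packets of one RTS/CTS/DATA/ACK exchange over a relay."""
    def info(*nodes: str):
        return tuple((n, state[n]["pos"], state[n]["speed"]) for n in nodes)

    trace = [
        # RTS travels source -> relay -> destination with the source status.
        Packet("RTS", source, relay, info(source)),
        Packet("RTS", relay, dest, info(source)),
        # CTS returns destination -> relay -> source; the relay adds its own status.
        Packet("CTS", dest, relay, info(dest)),
        Packet("CTS", relay, source, info(relay, dest)),
        # DATA follows the forward path, ACK the return path.
        Packet("DATA", source, relay),
        Packet("DATA", relay, dest),
        Packet("ACK", dest, relay, info(dest)),
        Packet("ACK", relay, source, info(relay, dest)),
    ]
    return trace

if __name__ == "__main__":
    nodes = {name: {"pos": float(i), "speed": 1.0}
             for i, name in enumerate(["source", "node1", "node2"])}
    for pkt in relay_exchange("source", "node1", "node2", nodes):
        print(f"{pkt.kind:4s} {pkt.sender} -> {pkt.receiver}")
```

The last two "CTS" and "ACK" packets carry two status entries each, mirroring the {CTS1, ..., CTS2, ...} and {ACK1, ..., ACK2, ...} package formats.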

Protocol for One-Many Contending Topology
The protocol includes a four-way handshaking access method for "RTS", "CTS", "Data", and "Acknowledgment for receiving", as well as "Blocked to Send" packets for waiting control. The "Response adjustment" time includes the time of propagation and process delay. Once a source decides to start transmission through one channel, the handshaking process will start and transmit a "Blocked to Send" to other sources (other AUVs) at the same time (see Figure 4).
At the first stage, when the "RTS" frame is received, the destination is notified of the forthcoming transmission. The destination goes to the "Response adjustment" state to receive the packets from its neighbor through the selected channel. A "Blocked to Send" packet is transmitted to the other neighbors to alert potential interferers that this channel will be busy for the whole carrying time, before they can cause a collision.
At the second stage, the source waits until receiving either "CTS" or a timeout frame. When a timeout occurs, the source goes back to the channel contention and selection state. The propagation delay between a frame and its "Response adjustment" is at least equal to the length of the frame to be transmitted or received in it, so that the node responses can be dealt with one after another. Thus, the transmission of an "RTS" frame and the reception of a "CTS" frame are two actions that have the same maximum single-trip propagation delay, $P_{max}$. We define the fixed-length gap between a control frame and its consequent frame as "CML"; the gap at the source between "RTS" and "CTS" is called "CMLRTS", and the gap at the destination between "CTS" and "Data" is called "CMLCTS". Both gaps are defined for the worst propagation scenario. After receiving the "RTS" frame, the destination uses the distance information measured from the "RTS" frame to calculate the time to reply with a "CTS" frame, so that the "CTS" frame reaches the source after a "CML" space has elapsed. During the "CML" gap, potential interferers are held off, which keeps the transmission collision-free. Once the "Response adjustment" state finishes, the source sends the data packets through the corresponding channel and goes to the "ACK" state. In summary, the second stage allows the destination to negotiate with the source, which gives both of them more flexibility and therefore reduces the chance that the destination fails due to channel collision.
The third stage starts as soon as the "CTS" frame is actually received. During this stage, if the destination receives "Data" from the source, it goes to the "Response adjustment" state to verify that the data packet is coming from the source. Otherwise, a timeout occurs.
At the fourth stage, the "ACK" for the corresponding data packets is sent through the selected channel once the "Response adjustment" state finishes. After receiving the first "ACK" packet, the source finishes its transmission process. The "Blocked to Send" (BTS) values are reset, and the node goes to a "Channel request" state if there are packets to transmit.
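The four stages can be summarized, from the source's point of view, as a small state machine. The sketch below is an illustrative reading of the text, not the authors' implementation: the state names, event strings, and the `step` and `run` helpers are hypothetical, and the "CML" timing and the "Blocked to Send" broadcast are only noted in comments.

```python
from enum import Enum, auto

class SourceState(Enum):
    CONTENTION = auto()   # channel contention and selection
    WAIT_CTS = auto()     # RTS sent, Blocked to Send broadcast to other sources
    SEND_DATA = auto()    # CTS received, data packets go out
    WAIT_ACK = auto()     # waiting for the acknowledgment
    DONE = auto()         # first ACK received, transmission finished

# (current state, event) -> next state; unknown events leave the state unchanged
TRANSITIONS = {
    (SourceState.CONTENTION, "channel_selected"): SourceState.WAIT_CTS,
    (SourceState.WAIT_CTS, "cts"): SourceState.SEND_DATA,
    # a timeout sends the source back to contention and selection
    (SourceState.WAIT_CTS, "timeout"): SourceState.CONTENTION,
    (SourceState.SEND_DATA, "data_sent"): SourceState.WAIT_ACK,
    (SourceState.WAIT_ACK, "ack"): SourceState.DONE,
}

def step(state: SourceState, event: str) -> SourceState:
    """Advance the source by one event."""
    return TRANSITIONS.get((state, event), state)

def run(events) -> SourceState:
    """Replay a list of events starting from the contention state."""
    state = SourceState.CONTENTION
    for event in events:
        state = step(state, event)
    return state

if __name__ == "__main__":
    print(run(["channel_selected", "cts", "data_sent", "ack"]))
    print(run(["channel_selected", "timeout"]))
```

Keeping the transitions in a table makes it easy to add further events, for example a timeout while waiting for "ACK", without touching the control flow.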

RBF Learning Network for Localization Errors Estimation
The sound propagation loss is one of the major causes of cooperative localization errors. It is composed mainly of three parts, namely geometrical spreading, attenuation by absorption, and the anomaly of propagation, and is expressed as $10\log A$, where $\alpha$ is the absorption coefficient in dB/km, $k$ represents the geometrical spreading factor, $l$ represents the transmission range, and $f$ represents the signal frequency.
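As a numerical illustration of the spreading and absorption terms, the sketch below assumes the widely used form $10\log A(l, f) = k \cdot 10\log l + l \cdot 10\log\alpha(f)$ and Thorp's empirical absorption formula. Both are assumptions made for illustration and may differ from the expression used in this work; the propagation anomaly term is ignored, and the function names, the units (range in metres for spreading and kilometres for absorption, frequency in kHz), and the default $k = 1.5$ are hypothetical choices.

```python
import math

def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient in dB/km, f in kHz (assumed model)."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def path_loss_db(l_m: float, f_khz: float, k: float = 1.5) -> float:
    """Spreading plus absorption loss in dB over a range of l_m metres."""
    spreading = k * 10 * math.log10(l_m)                               # geometrical spreading
    absorption = (l_m / 1000.0) * thorp_absorption_db_per_km(f_khz)   # absorption along the path
    return spreading + absorption

if __name__ == "__main__":
    for rng_m in (100.0, 1000.0, 5000.0):
        print(rng_m, round(path_loss_db(rng_m, 10.0), 2))
```

With these assumptions the loss grows with both range and frequency, which is the reason longer links between AUVs are more prone to packet loss.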
If we set $N_t$ as the turbulence noise, $N_v$ as the vehicle noise, $N_w$ as the wind-driven wave noise, and $N_{th}$ as the thermal noise, the channel capacity is obtained from these noise terms together with the bandwidth $B$ and the signal transmission power $P_{tx}$.
MAUVs in the formation should not only keep the formation configuration to accomplish their intended missions, but also avoid collision with obstacles. Maintaining the formation shape and the relative distances is therefore important. Let $p_c$ denote the formation center. Each AUV can acquire this geometric center by communicating with its neighbors so as to keep the formation. Hence, the error between $p_c$ and the desired center of the formation region is defined, where an RBF neural network estimates the three-dimensional cooperative localization errors caused by packet loss in data transmission and by measurement noise. $\hat{W} = [w_1, \ldots, w_{N_h}]$ is the weight vector, while $s_i$ represents the input, including the packet loss, delay, current relative distance between the AUVs, throughput, and current AUV speed.
The output of the RBF neural network is expressed in terms of the following quantities: $N_h$, $N_i$, and $N_o$ represent the numbers of hidden, input, and output neurons; $w_{im}$ and $\xi_{mk}$ denote the network weights; $\delta_{\xi j}$ and $\delta_{w j}$ represent the threshold offsets; and $\sigma(\cdot)$ denotes the Gaussian function, in which $r_i$ is the center vector of the receptive field. $w_{im}$ can be obtained through the following reinforcement learning algorithm.
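A forward pass through a Gaussian RBF network of this kind can be sketched as follows. This is a minimal illustration, assuming a single shared width for all hidden units and omitting the threshold offsets; the function name `rbf_forward` and the parameter `width` are hypothetical and are not taken from the text.

```python
import numpy as np

def rbf_forward(s, centers, width, weights):
    """Output of a Gaussian RBF network.

    s:       input vector, shape (N_i,)
    centers: receptive-field centers r_i, shape (N_h, N_i)
    width:   shared Gaussian width (assumed; not specified in the text)
    weights: output weights, shape (N_h, N_o)
    """
    s = np.asarray(s, dtype=float)
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # squared distance from the input to every center
    d2 = np.sum((centers - s) ** 2, axis=1)
    # Gaussian activation of each hidden unit
    phi = np.exp(-d2 / (2.0 * width ** 2))
    # weighted sum of the activations gives the N_o outputs
    return weights.T @ phi

if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(rbf_forward([0.0, 0.0], centers, 1.0, weights))
```

An input that coincides with one center activates that hidden unit fully, so the output is close to the corresponding row of the weight matrix; here the three outputs stand for the three-dimensional localization error estimate.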
$$w(s(t), a_k(t)) = w(s(t), a_k(t)) + \alpha\left[r(t+1) + \gamma\, w^{*}(s(t+1)) - w(s(t), a_k(t))\right]$$
In this algorithm, an action is taken in each packet transmission episode. The actions are chosen through the ε-greedy strategy. If $\varepsilon \gg 0$, the actions are taken randomly, $a(t) \in U(a_{min}, a_{max})$. When $\varepsilon \ll 1$, the system exploits its knowledge when selecting the actions. The actions are selected by comparing a random value $x_\varepsilon \in U[0, 1]$ with $\varepsilon$. The actions represent the power transmission levels. The state is a combination of the transmission energy $E_{trans}$ and the channel transmission error evaluation $P_{error}$, where $B_{error}$ is the bit error rate and $N_{bit}$ is the number of bits in the packet [13]. If each transmission action attempts to transmit all of the packets, the rewards are defined as a combination of packet reception and energy power levels, where $\pi$ is the quantization step size factor between two consecutive quantization levels, $pr(t)$ and $p_{Ediss}(t)$ are the packet reception levels and energy dissipation levels, respectively, and $m_{pr}$ is the number of quantized $pr(t)$ levels.
One then defines a quantity in which $\beta$ is the maximum speed of the desired trajectory $p_d$, $\beta = \max(\dot{p}_d)$, and $\otimes$ is the Kronecker product, and takes the derivative of the error.
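The reinforcement learning step of this section, that is, the weight update and the ε-greedy choice of the transmission power level, can be sketched in tabular form. Several points are assumptions made for illustration: $w^{*}(s(t+1))$ is read as the largest weight over the actions available in the next state, exploration is triggered when $x_\varepsilon < \varepsilon$, and the packet error is taken as $1 - (1 - B_{error})^{N_{bit}}$, the usual form for independent bit errors. The function names and default values are hypothetical.

```python
import random

def packet_error(bit_error_rate: float, n_bits: int) -> float:
    """Assumed form: a packet is lost if any one of its bits is wrong."""
    return 1.0 - (1.0 - bit_error_rate) ** n_bits

def select_action(w_row, epsilon: float, rng=random) -> int:
    """Epsilon-greedy choice of a power level index."""
    if rng.random() < epsilon:                  # x_eps < epsilon: explore
        return rng.randrange(len(w_row))
    # otherwise exploit the action with the largest learned weight
    return max(range(len(w_row)), key=lambda a: w_row[a])

def update(w, s: int, a: int, reward: float, s_next: int,
           alpha: float = 0.1, gamma: float = 0.9) -> float:
    """w(s,a) += alpha * [r + gamma * max_a' w(s_next, a') - w(s,a)]."""
    best_next = max(w[s_next])                  # assumed meaning of w*(s(t+1))
    w[s][a] += alpha * (reward + gamma * best_next - w[s][a])
    return w[s][a]

if __name__ == "__main__":
    table = [[0.0, 0.0], [1.0, 2.0]]            # 2 states x 2 power levels
    action = select_action(table[0], epsilon=0.2)
    print(action, update(table, 0, action, reward=1.0, s_next=1))
```

In the scheme described above the learned values play the role of the network weights $w_{im}$; the table here only keeps the example short.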

Formation Shape Maintenance with Potential Field
Potential functions play an important role in helping AUVs move along the desired gradient directions and finally stabilize at the local minima. The following defines the shape of the layered potential functions with which the AUVs reach the desired region and maintain a formation shape (see Figure 5).
Here, $\delta\eta_{iol} = \eta_i - \eta_{ol}$, $\eta_{ol}$ is a constant reference point of the $l$-th desired region, $l = 1, 2, \ldots, m$, and $m$ is the total number of objective functions. $f_{Sl}(\delta\eta_{iol})$ represents the scalar functions with continuous partial derivatives. From Equation (1), the desired range of AUV motions in the formation is defined as a cylindrical, ring-shaped region. For each AUV $p_i$, the desired region is the ring centered around $p_c^d$ between $R_1$ and $R_2$ with height $h$. Therefore, the scalar attractive forces of the shape function can be defined as follows.
The shape functions are given layer by layer, beginning with Layer 1, from which the center of the desired formation region is obtained. If $k_l$ is set as a positive constant, the traditional potential energy function for the desired formation regions in Figure 5 follows from these shape functions. Considering the under-actuated characteristic of the AUV, the magnitude of the potential energy functions produced from the three-dimensional distances has been reduced to improve the robustness and convergence of the scheme. On the other hand, since the rudder angle is significant for an under-actuated AUV to arrive at the desired positions, the yaw error of the AUV formation appears to be more important.
Figure 5. The layered region for AUV formation and collision avoidance.
Thus, the region error for the i-th AUV is defined as follows.
For the collision avoidance conditions, the repulsive forces between AUVs, or between AUVs and obstacles, are defined with $p_{oi}$ as the position vector of the $i$-th obstacle. The energy functions are defined on the basis of the collision avoidance region, where $\delta\eta_{ij} = \eta_i - \eta_j$; $g_{1ij}, g_{2ij}, \ldots, g_{NLij}$ are the functions for the first layer, second layer, ..., and the innermost layer, respectively, and these functions are continuous and differentiable; $N$ is the number of layers; and $R_{i1} > R_{i2} > \cdots > R_{iN}$ denote the radii of the first, second, ..., and innermost layers, respectively. Similar to the equations shown in (19), the collision avoidance energy functions have been magnitude reduced, with positive constants $k_{Nij} > \cdots > k_{2ij} > k_{1ij}$. The potential energy for collision avoidance between the $i$-th and $j$-th vehicles is built from these layered functions.
If we set $L_i$ as positive definite matrices, the estimated parameter $\hat{\lambda}_i$ is updated by an adaptive law. In order to prove the stability of the RBF-based adaptive formation scheme, we take a Lyapunov-like function for the multiple-AUV system, and its time derivative is obtained from Equations (20), (31), and (32). The last term of Equation (36) can be rewritten by using Equation (25). From Equation (22), we can obtain $g_{hij}(\delta q_{ij}) = g_{hji}(\delta q_{ji})$, together with a corresponding relation for the partial derivatives $\partial g_{hij}(\delta q_{ij})$. Thus, the last term of Equation (35) can be rewritten, where $W^{*}_{k,i}$ denotes the ideal constant weights, and the time derivative of the Lyapunov function in Equation (37) follows.
As $t \to \infty$, summing all the error terms yields a relation among the interactive forces. Since the interactive forces between AUVs are bi-directional, the summation of all the interactive forces in the system is zero. Apart from the trivial solution of Equation (43), a solution means that some AUVs must be on the opposite sides of the desired region. When there are interactions or coupling among the AUVs from different sides of the desired region, a reasonable weightage can be obtained for $\Delta\xi_i$ by adjusting $\alpha_i$. Finally, since $s_i \to 0$ and $\Delta\xi_i \to 0$, we can conclude from Equation (28) that $\Delta\rho_{ij} \to 0$. Hence, all the AUVs are synchronized to the same speed and maintain constant distances among themselves at steady state.
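The ring-shaped desired region and the layered collision-avoidance regions of this section can be illustrated numerically. The sketch below uses the quadratic $\max(0, \cdot)^2$ penalty that is common in region-based formation control; this form is an assumption, it is not the magnitude-reduced function of this work, and the height $h$ of the cylindrical region is ignored. The function names and gains are hypothetical.

```python
import numpy as np

def ring_potential(eta_i, center, r1: float, r2: float, k: float = 1.0) -> float:
    """Zero inside the ring r1 <= d <= r2, quadratic penalty outside it."""
    diff = np.asarray(eta_i, dtype=float) - np.asarray(center, dtype=float)
    d2 = float(np.sum(diff ** 2))
    f_inner = r1 ** 2 - d2      # positive when the AUV is inside the inner radius
    f_outer = d2 - r2 ** 2      # positive when the AUV is beyond the outer radius
    return 0.5 * k * (max(0.0, f_inner) ** 2 + max(0.0, f_outer) ** 2)

def layered_repulsion(eta_i, eta_j, radii, gains) -> float:
    """Sum of layer penalties; radii decrease and gains increase towards the core."""
    diff = np.asarray(eta_i, dtype=float) - np.asarray(eta_j, dtype=float)
    d2 = float(np.sum(diff ** 2))
    # each layer contributes only once the neighbour is inside its radius
    return sum(0.5 * k * max(0.0, r ** 2 - d2) ** 2 for r, k in zip(radii, gains))

if __name__ == "__main__":
    print(ring_potential([1.5, 0.0], [0.0, 0.0], 1.0, 2.0))   # inside the ring
    print(ring_potential([3.0, 0.0], [0.0, 0.0], 1.0, 2.0))   # outside the ring
    print(layered_repulsion([0.0, 0.0], [1.0, 0.0], [3.0, 2.0], [1.0, 2.0]))
```

Because inner layers have larger gains, the repulsion grows faster the closer two vehicles get, which matches the ordering $k_{Nij} > \cdots > k_{1ij}$ stated above.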

Simulations and Experiments
In order to analyze and verify the designed communication link framework and formation scheme, simulations and experiments have been carried out. In the formation simulations of Figures 6 and 7, the proposed adaptive formation scheme is compared with and without the RBF neural network. The disturbance is set as a current with a speed of 0.1 m/s in the west direction. The simulations include formation along a round curve and cruising in a confined channel. The communications are simulated in the NS-2 simulator on the basis of the communication protocol of Section 2. The formation control simulation platform was established on the basis of AUV hydrodynamic equations.
In Figure 6, the three AUVs are planned to follow a round curve in a line shape, i.e., the followers are planned to maintain the same distance one after another. The protocol for linear topology has been applied for the formation communication on the basis of the network framework of Section 2. Since the radius of curvature of the track is greater than the AUVs' radius of gyration, the three AUVs can keep formation cruising precisely. The packet loss and data transmission throughput are illustrated in Figure 6b; the reinforcement learning RBF neural network improves the cooperative localization accuracy and therefore the formation stability. From Figure 6c, the reinforcement learning RBF neural network can compensate for and reduce the cooperative localization errors caused by communication loss through Equations (12)-(14).
Cooperative channel exploration is one of the important applications, and it is very difficult for MAUVs because the size and curvature of the channel change. Through the reinforcement learning RBF neural network, the MAUVs' formation can obtain more accurate cooperative localization information.
The multibody system-based potential field can help MAUVs maintain and change their formation shape according to the environment. The protocols for one-many contending topology and linear topology have been applied and switched according to the shape requirements.
Offshore experiments of MAUVs formation coverage exploration are illustrated in Figure 8. The vehicles were given folding lines with a 90-degree yaw path to test the formation performance of heterogeneous AUVs. The three AUVs can keep their formation while cruising under the strategies proposed in this study.

Conclusions
MAUVs' formation is of great significance for marine surveys and exploration. In order to realize MAUVs' formation, this study has focused on their communication and formation. On the basis of the multibody system concept, the MAUVs' formation and communication link framework has been established with an adaptive RBF strategy. The connection for communication and formation between AUVs can be viewed as a spring and damping system. The packet transmission scheme has been designed with a multi-layered network topology, which reduces the packet loss rate and improves the throughput of the network. Moreover, through the reinforcement learning RBF neural networks, the adaptive RBF formation strategy can be improved with more accurate cooperative localization information. Simulations and offshore experiments with multiple heterogeneous under-actuated AUVs verify the performance of the proposed method.
