Efficient Asynchronous Federated Learning for AUV Swarm

Meng, Zezhao; Li, Zhi; Hou, Xiangwang; Du, Jun; Chen, Jianrui; Wei, Wei

doi:10.3390/s22228727


by Zezhao Meng ¹, Zhi Li ¹,*, Xiangwang Hou ², Jun Du ², Jianrui Chen ² and Wei Wei ³

¹ School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
² Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
³ Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 514231, China
* Author to whom correspondence should be addressed.

Sensors 2022, 22(22), 8727; https://doi.org/10.3390/s22228727

Submission received: 13 October 2022 / Revised: 8 November 2022 / Accepted: 8 November 2022 / Published: 11 November 2022

(This article belongs to the Special Issue Sensors and Underwater Robotics Network)


Abstract: The development of autonomous underwater vehicles (AUVs) has brought unprecedented benefits and opportunities. To extract the valuable information hidden in the data detected by an AUV swarm, the data must be aggregated to train a powerful machine learning model. Traditional centralized machine learning requires a large amount of data exchange and is burdened by enormous training data, large-scale models, and communication overhead. In underwater environments, radio waves are strongly absorbed, and acoustic communication is the only feasible technology. Unlike electromagnetic wave communication on land, the bandwidth of underwater acoustic communication is extremely limited, with a transmission rate of only $1 / 10^{5}$ of that of electromagnetic waves. Therefore, traditional centralized machine learning cannot support underwater AUV swarm training. Federated learning, which exchanges only model parameters rather than raw data, greatly reduces communication costs. Therefore, this paper introduces federated learning into the collaboration of an AUV swarm. To further reduce the constraints that scarce underwater communication resources impose on federated learning and to alleviate the straggler effect, we designed an asynchronous federated learning method. Finally, we constructed the optimization problem of minimizing the weighted sum of delay and energy consumption by jointly optimizing the AUV CPU frequency and signal transmission power. To solve this complex optimization problem of high-dimensional non-convex time series accumulation, we transformed it into a Markov decision process (MDP) and used the proximal policy optimization 2 (PPO2) algorithm to solve it. The simulation results demonstrate the effectiveness and superiority of our method.

Keywords:

federated learning (FL); autonomous underwater vehicle (AUV); gradient compression; communication resource optimization; proximal policy optimization 2 (PPO2)

1. Introduction

An autonomous underwater vehicle (AUV) is a kind of submarine robot that can carry out ocean sampling activities independently; it is widely used in underwater research and the marine industry. For example, AUVs have been used to sample coastal frontiers, monitor coastal areas, measure thermocline turbulence, obtain interdisciplinary data, conduct fishery research, and install submarine cables under the frozen sea [1,2]. However, because the capacity of a single AUV is limited, handling more complex tasks with an AUV swarm has become a hot research topic, and underwater research based on AUV groups has grown in popularity in recent years [3,4]. The advantages of AUV swarm cooperation over a single AUV can be summarized in two points. First, an AUV swarm can obtain more data than a single AUV, for example, for underwater hydrological characteristic modeling. Second, an AUV swarm can perform tasks that a single AUV cannot complete, such as rounding up targets, formation cruising, and collaborative positioning. In both cases, it is often critical to establish a powerful machine learning model, either to mine the value behind the observed data or to improve the effectiveness of the AUVs' own actions. Specifically, to explore the valuable information hidden in the data detected by the AUV swarm, the data must be aggregated to train a powerful machine learning model, such as a hydrological characteristics model of the water area obtained by data fitting. Moreover, to improve the movement of the AUVs, a strong reinforcement learning model must be trained to achieve efficient autonomous action. To support these two goals, traditional centralized machine learning needs to exchange a large amount of data, which requires substantial communication resources.
However, unlike on land, high-bandwidth electromagnetic wave communication cannot be used underwater; data transmission usually relies on underwater acoustic communication. The bandwidth of underwater acoustic communication is very limited, and its transmission rate is only $1 / 10^{5}$ of that of electromagnetic waves. Therefore, traditional centralized machine learning cannot support underwater AUV swarm training and the production of large models. Federated learning, a new machine learning framework, was first proposed by Google in 2016 [5]. Each data holder trains a model on its own data, and only the models are exchanged; a global model is then obtained through model aggregation. Unlike traditional centralized machine learning, federated learning exchanges only model parameters rather than raw data, which greatly reduces communication overhead and the dependence on communication resources, making it suitable for environments where underwater communication resources are scarce. Therefore, this paper introduces federated learning into the collaboration of the AUV swarm. Although exchanging model parameters requires far less traffic than exchanging the original data, the volume is still large, and the traditional federated learning method (i.e., FEDAVG proposed by Google [5]) is synchronous, which makes it prone to the straggler effect, especially under unreliable underwater transmission and large time delays. Therefore, this paper designs an asynchronous federated learning method to further reduce data transmission and alleviate the straggler effect, which plays an important role in the harsh underwater environment. Furthermore, considering the limited energy of the AUV and the time requirement of model training, we constructed an optimization problem of minimizing the weighted sum of delay and energy consumption by jointly optimizing the AUV CPU frequency and signal transmission power. To solve this complex optimization problem of high-dimensional non-convex time series accumulation, we formulated it as an MDP and used the PPO2 algorithm to solve it. The main contributions are as follows:

To the authors’ knowledge, this paper is the first to introduce federated learning into the AUV swarm. Federated learning can help AUV swarm train large-scale machine learning models in an environment where underwater communication resources are scarce.
In order to further reduce the constraints of underwater scarce communication resources on federated learning, we designed an asynchronous federated learning method, which can effectively alleviate the straggler effect and further reduce data interaction.
We constructed the optimization problem of minimizing the weighted sum of delay and energy consumption by jointly optimizing the AUV CPU frequency and signal transmission power.
To solve the optimization problem efficiently, we converted it into an MDP and used the PPO2 algorithm to solve it. The simulation results verify the effectiveness of the proposed algorithm.

The content and organizational structure of this article are as follows:

In the third section, a federated learning model based on the AUV swarm is established. First, we establish the federated learning model. Second, the communication system model is established. Third, the control model is established: the nodes participating in each round of upload are selected locally, and redundant nodes are adaptively skipped. Then, a delay model is established to calculate the time of each phase of federated learning and the total time. Next, the energy consumption model is established as the energy constraint. Finally, the optimization problem is formulated.
In the fourth section, the PPO2 algorithm is used to solve the optimization problem.
The fifth section is the experimental part. By changing the relevant parameters, we observe the communication times and model accuracy. The simulation results also show the convergence of the PPO2 algorithm.
The last section of this paper is the summary, which summarizes the main points and shortcomings of this work.

2. Related Work

The bottleneck limiting the performance of the AUV swarm mainly lies in the difficulties of underwater communications [6]. When AUV groups are used for federated learning tasks, as the scale of task models and AUV groups expand, the pressure on the self-organizing data interaction network increases, which requires designing algorithms to save communication resources as much as possible. However, reducing communication resources excessively may lead to compromised federated learning performance and confusion about the AUV formations, which creates a paradox. How to make a trade-off between resource consumption and model performance is worth studying at present.

Owing to its formation, dynamics, and mobility, an AUV swarm has the advantages of rapid deployment, controllable action, and flexible networking. In recent years, research into underwater exploration systems based on multiple AUVs or AUV groups has become popular, especially in navigation or path planning [7], collaborative data collection [8], and target hunting [9]. For example, in the offshore oil and gas industry, AUVs equipped with underwater environment monitoring sensors cooperate to effectively detect the boundary of oil- and gas-producing areas. Cui et al. [10] proposed an adaptive path planning algorithm based on the random tree star algorithm to estimate the underwater scalar field for the AUV group. Noguchi et al. [11] put forward a dynamic task allocation and path planning algorithm for the AUV group, which gives full play to the advantages of the AUV group. Huang et al. [12] used an AUV group equipped with scanning imaging sonar and proposed using a binary Bayesian filter to process the input signal; a path-planning strategy based on the artificial potential field was then proposed for real-time target tracking. Cao et al. [13] used artificial potential field theory to model the underwater environment information and introduced dispersion, similarity, and difference into the potential field function. On this basis, an intelligent path planning model that can achieve accurate AUV swarm collaborative target search was proposed.

In federated learning aggregation algorithms, each participant must send its complete local parameters to the server for model aggregation. These model parameters are usually large, which incurs a huge communication overhead. In [14], the authors protect the privacy of the federated learning architecture through covert communication technology, minimize the federated learning delay by optimizing the interference power, signal power, and training accuracy, and use an alternating descent algorithm to solve the optimization problem. In [15], the authors use federated learning to build a digital twin model based on the digital twin edge network architecture, adopt an asynchronous model update scheme, and perform device selection locally; the federated learning delay is minimized with channel allocation and CPU frequency as variables, and a DNN (deep neural network) model is used to solve the optimization problem dynamically. In [16], the authors select some clients to send their local model parameters to the server, use a reasonable local selection strategy to keep the training loss and convergence time from being adversely affected, and formulate an optimization problem over the convergence time and training loss, which is solved with an ANN (artificial neural network). In [17], the authors define a new performance evaluation criterion, learning efficiency, and discuss it in scenarios where the device is equipped with a CPU or a GPU; with channel resource allocation and bandwidth allocation as optimization variables, learning efficiency is maximized via the Lagrange multiplier method and KKT (Karush–Kuhn–Tucker) conditions, and a two-dimensional search algorithm is used to solve the optimization problem. Redundant communication rounds can also be detected and skipped locally with adaptive communication rules while retaining a convergence speed comparable to that of the original SGD.

In [18], the authors proposed a communication-saving method for distributed machine learning: a gradient-based selection method that reduces redundant communication rounds. In [19], the authors introduced the method of [18] into federated learning on the basis of the traditional SGD (stochastic gradient descent) method. In [20], the authors built on the above work and used a gradient sparsification method to select redundant nodes locally and reduce the communication volume. The work of this paper continues on this basis. So far, no work has introduced federated learning into an AUV swarm.

3. System Model

This paper considers the following scenario: a fixed-formation swarm consisting of a leader AUV L and a set $M$ of M follower AUVs sails at the same depth with constant speed, collecting seabed data and cooperating through federated learning to perform various machine learning tasks, such as target recognition and path planning. Each AUV maintains the same distance from the other AUVs and from the leader, collects data with its own independent sensors, and trains a local model. The local model parameters are then sent to the leader over the uplink. The leader AUV aggregates the information uploaded by the followers into global model parameters and sends them to the followers over the downlink. Each follower then starts a new round of local training from the received global model parameters. To keep the AUVs in formation, AUV L and AUV m also need to exchange position and speed information. In each round, AUV m uploads its current speed and position data to AUV L, which calculates the subsequent speed and direction from the received information and broadcasts them.

3.1. Federated Learning Model

Assume that the input data collected by the follower AUV $m \in M$ are $X_{m}$, the output data are $Y_{m}$, and the local model parameter is $w_{m}$. The data set of AUV m can then be expressed as $D_{m} = \{(x_{m, 1}, y_{m, 1}), (x_{m, 2}, y_{m, 2}), \dots, (x_{m, N_{m}}, y_{m, N_{m}})\}$, where $N_{m}$ is the number of data samples that AUV m owns. The loss function on its data set $D_{m}$ can be expressed as the average of the sample loss functions:

F_{m} (w_{m}) = \frac{1}{N_{m}} \sum_{i = 1}^{N_{m}} f (w_{m}; x_{m, i}, y_{m, i}), \forall m \in M

(1)

Assuming that the global model parameter is $w$, the local loss functions of the AUVs $m \in M$, each holding $N_{m}$ data samples, are averaged with weights $N_{m} / N$, where $N = \sum_{m = 1}^{M} N_{m}$ is the total number of samples, to define the global loss function:

F (w) ≜ \sum_{m = 1}^{M} \frac{N_{m} F_{m} (w)}{N} = \frac{1}{N} \sum_{m = 1}^{M} \sum_{i = 1}^{N_{m}} f (w; x_{m, i}, y_{m, i}),

(2)

The purpose of federated learning is to find a parametric model that minimizes the global loss function, i.e., the optimal parametric model can be expressed as

w^{*} = \arg \min F (w),

(3)

where $w = w_{1} = \dots = w_{M}$.
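As a minimal sketch of Equation (2), the following Python fragment computes the sample-weighted global loss from the local losses. The `aggregate` function additionally shows a sample-weighted parameter average in the style of FEDAVG; the source does not spell out the leader's aggregation rule, so this rule, like the function names, is an assumption made for illustration.

```python
def global_loss(local_losses, sample_counts):
    """Equation (2): average of the local losses F_m(w), weighted by N_m / N."""
    total = sum(sample_counts)  # N, the total number of samples
    return sum(n * f for n, f in zip(sample_counts, local_losses)) / total


def aggregate(local_params, sample_counts):
    """Assumed FEDAVG-style rule: weighted average of the parameter vectors.

    local_params is a list of equally long lists of floats, one per follower.
    """
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [
        sum(n * w[i] for n, w in zip(sample_counts, local_params)) / total
        for i in range(dim)
    ]
```

With two followers holding 1 and 3 samples, the follower with more data dominates both the loss and the averaged parameters.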

3.2. Communication Model

In the underwater environment, electromagnetic waves and optical signals attenuate quickly and travel only short distances. Therefore, the most commonly used underwater communication technology is underwater acoustic communication. The attenuation of an acoustic signal with frequency f over distance D is given by

A (D, f) = D^{k} a {(f)}^{D}

(4)

where k represents the spreading factor, and $a (f)$ is the absorption coefficient, which can be expressed by the following empirical formula [21]:

10 log a (f) = \frac{0.11 f^{2}}{1 + f^{2}} + \frac{44 f^{2}}{4100 + f^{2}} + 2.75 \cdot 10^{- 4} f^{2} + 0.003 .

(5)
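Equations (4) and (5) are easiest to evaluate in decibels. The sketch below reads the spreading term of Equation (4) as the distance raised to the power k, so that $10 \log A (D, f) = k \cdot 10 \log D + D \cdot 10 \log a (f)$. The units (f in kHz, D in km, absorption in dB/km) and the default k = 1.5 are assumptions, since the text does not state them; the function names are hypothetical.

```python
import math


def absorption_db(f):
    """Equation (5): 10*log10 a(f). Assumes f in kHz and a result in dB/km."""
    f2 = f * f
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def attenuation_db(distance, f, k=1.5):
    """Equation (4) in dB: k * 10*log10(D) + D * 10*log10 a(f).

    distance is assumed to be in km; k = 1.5 is an illustrative default.
    """
    return k * 10 * math.log10(distance) + distance * absorption_db(f)
```

At 10 kHz the absorption term is roughly 1.19 dB/km under these assumptions, and it grows with frequency, which is why the usable acoustic bandwidth shrinks as the distance increases.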

In the marine environment, noise mainly consists of turbulence noise, shipping noise, wave noise, and thermal noise. According to [22], the power spectral density (p.s.d.) of these four main types of noise, in dB re $μ$Pa per Hz, at communication frequency f is given by:

10 log N_{ϑ} (f) = 17 - 30 log f

(6)

10 log N_{s} (f) = 40 + 20 (s - \frac{1}{2}) + 26 log f - 60 log (f + 0.03)

(7)

10 log N_{w} (f) = 50 + 7.5 w^{\frac{1}{2}} + 20 log f - 40 log (f + 0.4)

(8)

10 log N_{t h} (f) = - 15 + 20 log f

(9)

Here, $s \in [0, 1]$ is the shipping activity factor, while w represents the wind velocity ($m / s$). Hence, the combined noise $N (f)$ can be represented as

N (f) = N_{ϑ} (f) + N_{s} (f) + N_{w} (f) + N_{t h} (f)

(10)

Therefore, the normalized SNR of a signal with unit transmitted power and bandwidth can be represented as

γ (D, f) = \frac{1}{A (D, f) N (f)} .

(11)
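The noise model of Equations (6)–(10) and the normalized SNR of Equation (11) can be sketched as follows. Equations (6)–(9) are stated in dB, whereas Equation (10) adds the components themselves, so the sketch converts each component to linear scale before summing. Taking f in kHz and the default values of s and w are assumptions; the function names are hypothetical.

```python
import math


def noise_psd_db(f, s=0.5, w=0.0):
    """Equations (6)-(10): combined noise p.s.d. in dB re uPa per Hz.

    Assumes f in kHz; s is the shipping activity factor, w the wind speed in m/s.
    """
    turbulence = 17 - 30 * math.log10(f)
    shipping = 40 + 20 * (s - 0.5) + 26 * math.log10(f) - 60 * math.log10(f + 0.03)
    waves = 50 + 7.5 * math.sqrt(w) + 20 * math.log10(f) - 40 * math.log10(f + 0.4)
    thermal = -15 + 20 * math.log10(f)
    # Equation (10) sums the components in linear scale, not in dB.
    linear = sum(10 ** (x / 10) for x in (turbulence, shipping, waves, thermal))
    return 10 * math.log10(linear)


def snr_db(attenuation_db, f, s=0.5, w=0.0):
    """Equation (11) in dB: 10*log10 gamma = -(10*log10 A + 10*log10 N)."""
    return -(attenuation_db + noise_psd_db(f, s, w))
```

Because the sum is taken in linear scale, the combined level is always at least as large as the strongest single component.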

The uplink data rate between AUV m and leader AUV L is given by

R_{m}^{U} = B_{m}^{U} {log}_{2} (1 + \frac{p_{m} γ (D, f)}{B_{m}^{U}}),

(12)

where $B_{m}^{U}$ is the allocated uplink bandwidth of AUV m, while $p_{m} \in (0, p_{max})$ is the transmitting power of AUV m.

In the iteration, the leader AUV broadcasts the information to the followers and, hence, the downlink data rate between AUV m and leader AUV L can be represented as

R_{m}^{D} = B^{D} {log}_{2} (1 + \frac{p_{L} γ (D, f)}{B^{D}})

(13)

where $B^{D}$ is the downlink bandwidth and $p_{L} \in (0, p_{max})$ denotes the transmitting power of the leader AUV.
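Equations (12) and (13) share the same Shannon-type form, so a single helper covers both links. The sketch below is illustrative only: `gamma` must be supplied in linear scale, and the helper `db_to_linear` and all names are hypothetical.

```python
import math


def db_to_linear(value_db):
    """Convert a dB quantity (for example an SNR in dB) to linear scale."""
    return 10 ** (value_db / 10)


def data_rate(bandwidth, power, gamma):
    """Equations (12)/(13): R = B * log2(1 + p * gamma / B).

    gamma is the normalized SNR of Equation (11) in linear scale.
    """
    return bandwidth * math.log2(1 + power * gamma / bandwidth)
```

For a fixed bandwidth, raising the transmit power increases the rate only logarithmically, which is one reason the power appears as an optimization variable rather than simply being set to its maximum.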

3.3. Control Model

Underwater communication is costly, and most gradient exchanges in distributed SGD are redundant. Therefore, this paper introduces the concept of lazy nodes, allowing each follower AUV to perform self-detection locally so that some nodes skip some rounds of communication. The definition of a lazy node in this paper follows the work of Chen et al. [19]:

\frac{{∥\nabla_{M_{N}}^{t - 1}∥}^{2}}{M_{N}} ⩽ \frac{{∥\nabla_{M}^{t - 1}∥}^{2}}{M}

(14)

Here, the lazy nodes constitute a set $M_{N}$, whose size is also denoted by $M_{N}$. In this paper, the gradient descent algorithm is used to optimize the global model, where $γ$ represents the learning rate:

w (t) = w (t - 1) - γ \nabla_{M}^{t - 1}

(15)

Substituting Equation (15) into Equation (14) gives

{∥ \nabla_{M_{N}}^{t - 1} ∥}^{2} ⩽ \frac{M_{N}}{γ^{2} M} {∥ w (t) - w (t - 1) ∥}^{2},

(16)

Since the global model tends to converge, the following approximation is used

w (t) - w (t - 1) \approx w (t - 1) - w (t - 2)

(17)

According to the mean inequality, we have

{∥ \nabla_{M_{N}}^{t - 1} ∥}^{2} = {∥ \sum_{m \in M_{N}} \frac{N_{m} \nabla_{m}^{t - 1}}{N} ∥}^{2} ⩽ \frac{M_{N}}{N^{2}} \sum_{m \in M_{N}} {∥N_{m} \nabla_{m}^{t - 1}∥}^{2}

(18)

Letting $M_{N} = β M$ and combining Equations (16) to (18), Equation (14) holds if every lazy node m satisfies

{∥ N_{m} \nabla_{m}^{t - 1} ∥}^{2} ⩽ \frac{N^{2}}{γ^{2} M^{2} β} {∥ w (t - 1) - w (t - 2) ∥}^{2} .

(19)

In round t, AUV m checks locally whether Equation (19) is satisfied. If so, it skips this round of uploading; otherwise, it participates in this round of uploading. In the extreme case where all nodes satisfy Equation (19) in a certain round t, AUV L receives no model information from the follower AUVs. AUV L then randomly selects follower AUVs to participate in the upload after a certain time interval. In this paper, we randomly select one AUV m to upload in the extreme case.

To prevent AUVs from going without communication for a long time because of the lazy-node rule, we require each AUV to communicate at least once within $τ$ rounds. Each AUV locally records whether it uploaded in each of the most recent $τ$ rounds:

Λ_{m} (τ) = {λ_{m} (t - τ), λ_{m} (t + 1 - τ), \dots, λ_{m} (t - 1)}

(20)

where $λ_{m} (t) \in \{0, 1\}$ indicates whether AUV m participates in the gradient upload of round t after applying the control model; $λ_{m} (t) = 1$ indicates that it participates in this round of gradient upload, and $λ_{m} (t) = 0$ that it does not.

In round t, AUV m checks locally whether all entries of Equation (20) are 0. If so, AUV m must upload in this round.
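The local decision made by each follower can be sketched as below, under the assumption that the skip condition is the per-node inequality numbered (19) and that the upload record of Equation (20) is kept as a list of 0/1 flags. Gradients and parameters are plain lists of floats, and all names are hypothetical.

```python
def squared_norm(vec):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in vec)


def should_skip_upload(grad, n_m, n_total, num_followers, beta, lr,
                       w_prev, w_prev2, history, tau):
    """Return True if AUV m may skip uploading in the current round.

    history holds the flags lambda_m of past rounds, most recent last.
    """
    # Forced upload: the last tau recorded rounds were all skipped (Equation (20)).
    recent = history[-tau:]
    if len(recent) == tau and not any(recent):
        return False
    # Lazy-node test: ||N_m * grad||^2 <= N^2 / (lr^2 * M^2 * beta) * ||w(t-1) - w(t-2)||^2
    lhs = squared_norm([n_m * g for g in grad])
    diff = [a - b for a, b in zip(w_prev, w_prev2)]
    rhs = n_total ** 2 / (lr ** 2 * num_followers ** 2 * beta) * squared_norm(diff)
    return lhs <= rhs
```

A smaller `beta` enlarges the right-hand side, so more followers pass the test and skip the round; the forced-upload branch bounds how stale any follower's contribution can become.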

3.4. Latency Model

3.4.1. Local Parameter Calculating Latency

Let $f_{m} \in (f_{\min}, f_{\max})$ and $f_{L} \in (f_{\min}, f_{\max})$ denote the CPU frequencies of AUV m and the leader AUV L, respectively, where $f_{\min}$ and $f_{\max}$ represent the minimum and maximum CPU frequencies, respectively.

The local computation latency on each AUV m at the t-th slot can be calculated as

T_{m}^{LC} (t) = \frac{N_{m} c_{m}}{f_{m} (t)},

(21)

where $c_{m}$ is the number of CPU cycles required to train on one data sample with the backpropagation algorithm [17] at AUV m.

3.4.2. Uploading Latency

Assume that the gradient corresponding to the local parameter consists of $α$ elements, each quantized with an average of $ζ$ bits. The size of the current position and speed information is $ψ$. Hence, the total data size of each local parameter is $|w_{m}| = α ζ$ [17], and the uploading latency is given by

T_{m}^{LU} (t) = \frac{|w_{m}| λ_{m} (t)}{R_{m}^{U}} + \frac{ψ λ_{m} (t)}{R_{m}^{U}}

(22)

3.4.3. Global Parameter Aggregating Latency

At the leader AUV L, the global parameter aggregating latency is calculated by

T_{L}^{GA} (t) = \frac{c_{0} \sum_{m = 1}^{M} |w_{m}|}{f_{L} (t)},

(23)

where $c_{0}$ is the computational complexity [15] of aggregating the newly updated parameters from the follower AUVs, while $|w_{m}|$ is the data size of the local parameter $w_{m}$.

3.4.4. Global Parameter Updating Latency

The global parameter updating latency is given by

T_{L}^{GU} (t) = \frac{c_{L}^{'}}{f_{L} (t)},

(24)

where $c_{L}^{'}$ is the computational complexity of performing the global parameter update [17].

3.4.5. Downloading Latency

The downloading latency can be calculated as

T_{L}^{GD} (t) = \frac{|w|}{R_{m}^{D}} + \frac{ψ}{R_{m}^{D}},

(25)

where $|w|$ is the data size of the global parameter, which is approximately equal to the data size $|w_{m}|$ of the local parameter.

3.4.6. Total Latency

Excluding the extreme case, i.e., when at least one device uploads in the round, the delay of one complete federated learning round is

T (t) = max_{m \in M} \{T_{m}^{LC} + T_{m}^{LU}\} + max_{m \in M} \{T_{L}^{GA} + T_{L}^{GU} + T_{L}^{GD}\} .

(26)

Now consider the extreme case where no device uploads within the round. According to the settings of the control model, when AUV L has not received any gradient upload after the time interval $T^{'}$, a device is randomly selected for uploading. Assuming that the selected device is $m_{0}$, the federated learning delay for this round is

T (t) = T_{m_{0}}^{LC} + T_{m_{0}}^{LU} + max_{m \in M} \{T_{L}^{GA} + T_{L}^{GU} + T_{L}^{GD}\} + T^{'} .

(27)
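The per-round delay of Equations (21)–(27) can be assembled as in the following sketch. It assumes that all local parameters have the same size in bits, so the sum in Equation (23) becomes M times that size; the `Follower` container and every name are hypothetical, and consistent units (bits, bit/s, cycles, cycles/s) are assumed.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    n_samples: int            # N_m
    cycles_per_sample: float  # c_m
    cpu_freq: float           # f_m(t)
    uplink_rate: float        # R_m^U
    downlink_rate: float      # R_m^D
    uploads: bool             # lambda_m(t)


def round_latency(followers, param_bits, psi_bits, c0, c_update, leader_freq,
                  fallback=None, wait=0.0):
    """Delay of one round: Equation (26), or Equation (27) if nobody uploads."""
    def local(f, lam):
        compute = f.n_samples * f.cycles_per_sample / f.cpu_freq  # Equation (21)
        upload = (param_bits + psi_bits) * lam / f.uplink_rate    # Equation (22)
        return compute + upload

    aggregate = c0 * param_bits * len(followers) / leader_freq    # Equation (23)
    update = c_update / leader_freq                               # Equation (24)
    leader_part = max(
        aggregate + update + (param_bits + psi_bits) / f.downlink_rate  # Equation (25)
        for f in followers
    )
    if any(f.uploads for f in followers):
        return max(local(f, int(f.uploads)) for f in followers) + leader_part
    # Extreme case: wait for T', then the follower with index `fallback` uploads.
    return local(followers[fallback], 1) + leader_part + wait
```

In the normal case the slowest follower and the slowest downlink set the round time, which is the straggler effect that the lazy-node rule is meant to soften.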

3.5. Energy Consumption Model

3.5.1. Energy Consumption of Follower AUV

The computational energy consumption of the follower AUV is mainly due to local parameter calculation and local parameter updating, and is given by

E_{m}^{Cp} (t) = k f_{m}^{σ} (T_{m}^{LC} (t)),

(28)

The communication energy consumption is mainly due to local parameter uploading, and it can be calculated by

E_{m}^{C} (t) = p_{m} T_{m}^{LU} (t),

(29)

3.5.2. Energy Consumption on Leader AUV

The computational energy consumption of the leader AUV is mainly caused by global parameter aggregation and updating, and is given as

E_{L}^{Cp} (t) = k f_{L}^{σ} (T_{L}^{GA} (t) + T_{L}^{GU} (t)),

(30)

Similarly, the communication energy consumption of the leader AUV is mainly due to global parameter downloading, which can be calculated by

E_{L}^{C} (t) = p_{L} T_{L}^{GD} (t),

(31)

3.5.3. Total Energy Consumption

The total energy consumption is

E (t) = Φ (E_{m}^{Cp} (t) + E_{L}^{Cp} (t)) + χ (E_{m}^{C} (t) + E_{L}^{C} (t))

(32)
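A minimal sketch of the energy model follows, assuming that the two communication terms in the total are the follower's upload energy and the leader's download energy. The text does not give values for k, σ, Φ, or χ, so they are plain parameters here; the function names are hypothetical.

```python
def compute_energy(k, freq, sigma, duration):
    """Equations (28)/(30): E = k * f^sigma * T for a computation lasting T."""
    return k * freq ** sigma * duration


def comm_energy(power, duration):
    """Equations (29)/(31): E = p * T for a transmission lasting T."""
    return power * duration


def total_energy(e_m_cp, e_l_cp, e_m_c, e_l_c, phi=1.0, chi=1.0):
    """Equation (32): weighted sum of computation and communication energy."""
    return phi * (e_m_cp + e_l_cp) + chi * (e_m_c + e_l_c)
```

Since the computation time of Equation (21) scales as 1/f, a higher CPU frequency shortens the delay but raises the energy whenever σ exceeds 1, which is the trade-off that the optimization problem balances.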

3.6. Problem Formulation

Our goal is to minimize the total cost by optimizing the variables $p_{m}, f_{m}, p_{L}, f_{L}, β$. The problem is formulated as follows.

P1 : \min_{p_{m}, f_{m}, p_{L}, f_{L}, β} Cost = \sum_{t = 1}^{N_{t}} (T (t) + E (t))

(33)

0 ⩽ p_{m} ⩽ p_{max}, m \in M,

(34)

f_{min} ⩽ f_{m} ⩽ f_{max}, m \in M,

(35)

0 ⩽ p_{L} ⩽ p_{max},

(36)

f_{min} ⩽ f_{L} ⩽ f_{max},

(37)

0 ⩽ β ⩽ 1,

(38)

E_{m}^{Cp} + E_{m}^{C} ⩽ E_{m}^{thd}, m \in M,

(39)

E_{L}^{Cp} + E_{L}^{C} ⩽ E_{L}^{thd},

(40)

Here, $N_{t}$ is the total number of iterations. Equation (33) is the optimization objective: the total cost, i.e., the sum of delay and energy consumption over all rounds, is minimized by optimizing the transmission powers $p_{m}, p_{L}$, the CPU frequencies $f_{m}, f_{L}$, and the lazy node scale factor $β$. Equations (34) and (35) are the power and CPU frequency constraints of AUV m, respectively; Equations (36) and (37) are the power and CPU frequency constraints of AUV L, respectively; Equation (38) bounds the lazy node scale factor; and Equations (39) and (40) are the energy constraints of AUV m and AUV L, respectively.

4. Algorithm Design

In this section, we use PPO to solve $P1$, since the problem is non-convex and high-dimensional.

4.1. Modeling of Deep Reinforcement Learning Environment

An MDP consists of the state space, action space, state transition function, reward function, and discount factor. The optimization problem $P1$ can thus be transformed into the following:

State space: For each time slot t, we use $p_{m} (t - 1), f_{m} (t - 1), p_{L} (t - 1), f_{L} (t - 1), β (t - 1)$ from time slot $(t - 1)$ to describe the state. The network state of the agent at time slot t can therefore be expressed as

s (t) = (p_{m} (t - 1), f_{m} (t - 1), p_{L} (t - 1), f_{L} (t - 1), β (t - 1))

(41)

and the state of the agent is given by $S_{t} = \{s (t)\}$.

Action space: At each time slot t, the action $a (t)$ is composed of $p_{m} (t), f_{m} (t), p_{L} (t), f_{L} (t), β (t)$. Formally, the action at time slot t is denoted by

a (t) = (p_{m} (t), f_{m} (t), p_{L} (t), f_{L} (t), β (t))

(42)

and the action of the agent is given by $a_{t} = \{a (t)\}$.

State transition function: The transition probability of the agent at time slot t is denoted by $P_{T} (s_{t + 1} ∣ s_{t}, a_{t})$.

Policy: Let $π$ denote the policy function, which makes decisions and controls the action of the agent based on the observed state: $π (a ∣ s) = P (a ∣ s)$.

Reward function: The reward function in this paper is designed for optimizing $p_{m}, f_{m}, p_{L}, f_{L}, β$; the reward of the agent at time slot t is expressed as

r_{t} (s_{t}, a_{t}) = - (T (t) + E (t))

(43)

The cumulative reward obtained over the sequence $T^{'}$ is the discounted sum of the rewards obtained at each stage, denoted by $R_{n} (T^{'})$. Under policy $π$, it is given by

R_{n} (T^{'}) = \sum_{τ = 0}^{\infty} ξ_{τ} r_{n + τ} (s_{n + τ}, a_{n + τ})

(44)

where $ξ_{τ}$ denotes the discount factor.
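The reward of Equation (43) and the return of Equation (44) can be sketched for a finite trajectory as follows. The sketch assumes a geometric discount, i.e., $ξ_{τ} = ξ^{τ}$ for a constant $ξ$, and truncates the infinite sum at the end of the collected rewards; both choices, like the function names, are assumptions.

```python
def reward(latency, energy):
    """Equation (43): the reward is the negative cost of one round."""
    return -(latency + energy)


def discounted_returns(rewards, xi=0.99):
    """Equation (44) on a finite trajectory, assuming xi_tau = xi ** tau.

    Returns R_n for every step n, computed backwards in a single pass.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for n in reversed(range(len(rewards))):
        running = rewards[n] + xi * running
        returns[n] = running
    return returns
```

The backward pass uses the recursion R_n = r_n + ξ R_{n+1}, which avoids recomputing the sum for every step.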

4.2. Proximal Policy Optimization Algorithm

The proximal policy optimization (PPO) algorithm is an on-policy, model-free reinforcement learning algorithm. OpenAI uses it as its current baseline algorithm; it uses a new class of objective functions and updates parameters in small batches over multiple training steps. To better describe the reward and prevent overfitting, we use the advantage ${\hat{A}}_{n}$ instead of $R_{n} (T^{'})$ to evaluate actions:

{\hat{A}}_{n} = R_{n} (T^{'}) - V_{ϕ} (s_{n})

(45)

where $V_{ϕ} (s_{n})$ is computed by a value network.

Moreover, we aim to update the actor’s policy $π$ to maximize the expected reward. We therefore use gradient ascent to update the network parameters $θ$. The gradient is computed as follows:

G = E [\nabla_{θ} \log π_{θ} (a_{t} ∣ s_{t}) {\hat{A}}_{t}]

(46)

PPO2 was motivated by the same question as TRPO: how to improve the policy as much as possible using only the current data, without causing a sudden decline in the performance of the policy. Whereas TRPO attempts to solve the problem with a complex second-order approach, PPO2 uses a first-order approach with a few additional tricks to keep the new policy close to the old one. The essence of the PPO algorithm is to introduce a ratio that indirectly describes the difference between the new policy and the old policy, denoted by $r_{t} (θ) = \frac{π_{θ} (a_{t} ∣ s_{t})}{π_{θ_{k}} (a_{t} ∣ s_{t})}$; the policy update is defined as follows:

θ_{k + 1} = \arg \max_{θ} \frac{1}{|D_{k}| T} \sum_{τ \in D_{k}} \sum_{t = 0}^{T} \min (\frac{π_{θ} (a_{t} ∣ s_{t})}{π_{θ_{k}} (a_{t} ∣ s_{t})} A^{π_{θ_{k}}} (s_{t}, a_{t}), g (ϵ, A^{π_{θ_{k}}} (s_{t}, a_{t})))

(47)

where $ϵ$ is a (small) hyperparameter that roughly limits the range of variation of $r_{t} (θ)$. In PPO2, we use the following formula to update the value network:

ϕ_{k + 1} = \arg \min_{ϕ} \frac{1}{|D_{k}| T} \sum_{τ \in D_{k}} \sum_{t = 0}^{T} {(V_{ϕ} (s_{t}) - {\hat{R}}_{t})}^{2}

(48)

The algorithm used in this paper is summarized in Algorithm 1.

Algorithm 1 Proximal policy optimization clip.

Input: initial policy parameters $θ_{0}$ , initial value function parameters $ϕ_{0}$
for $k = 1, 2, \dots$ do
Run policy $π_{θ}$ for K timesteps, collecting $s_{n}, a_{n}, r_{n}$
Compute return ${\hat{R}}_{n}$ according to Equation (44)
Compute advantages ${\hat{A}}_{n}$ according to Equation (45)
Update the policy by maximizing the PPO clip objective according to Equation (47)
Fit value function according to Equation (48)
end for
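The two quantities optimized inside the loop of Algorithm 1 can be sketched in plain Python. The text does not define $g (ϵ, A)$; the sketch assumes the usual PPO-clip form, $g (ϵ, A) = (1 + ϵ) A$ for $A ⩾ 0$ and $(1 - ϵ) A$ otherwise. It only evaluates the objectives on given numbers and performs no gradient update; the function names and the default $ϵ = 0.2$ are hypothetical.

```python
import math


def clip_term(ratio, advantage, eps=0.2):
    """One summand of Equation (47): min(r * A, g(eps, A)).

    Assumes g(eps, A) = (1 + eps) * A if A >= 0, else (1 - eps) * A.
    """
    g = (1 + eps) * advantage if advantage >= 0 else (1 - eps) * advantage
    return min(ratio * advantage, g)


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Average clipped surrogate over a batch; the policy update maximizes it."""
    terms = [
        clip_term(math.exp(n - o), a, eps)  # ratio r_t = pi_new / pi_old
        for n, o, a in zip(new_logp, old_logp, advantages)
    ]
    return sum(terms) / len(terms)


def value_loss(values, returns):
    """Equation (48): mean squared error between V(s_t) and the return."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)
```

The clipping removes any incentive to push the ratio beyond 1 + ϵ when the advantage is positive, or below 1 − ϵ when it is negative, which keeps the new policy close to the old one.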

5. Simulation Results

This section uses the TensorFlow framework in Python to build a federated learning model on MNIST (Modified National Institute of Standards and Technology database), using 42,000 digit images and retaining 10% of the data as a test set for the global model. A three-layer MLP (multilayer perceptron) neural network was created as the model for the classification task, with 200, 200, and 10 neurons in its layers, respectively. We used this model to perform the classification task and evaluated the resulting accuracy and loss function. The maximum number of convergence rounds is set to 1000. We assume M = 9; the follower AUVs are 20 m apart from each other and sail in formation at a constant speed in a fixed direction, with the leader AUV 20 m in front of the center of the formation. The remaining simulation parameters are shown in Table 1.

5.1. The Performance of Each Index in the Gradient Compression Test

Figure 1 compares the number of communications of the different schemes versus $β$. As $β$ increases, fewer follower AUVs skip communication within a round, and the total number of communications increases accordingly. This follows from Equation (19): a decrease in $β$ enlarges the right-hand side of the inequality, so more nodes satisfy Equation (19), more AUVs skip communication rounds, and the number of communications decreases. The experimental results are in good agreement with this theoretical analysis. Figures 1 and 2 together show that federated learning still converges after the control model is applied. However, as $β$ decreases, the number of communications decreases. Although communication resources are saved, the global model is affected by the missing gradients of some follower AUVs in certain rounds, which degrades the performance of federated learning. Specifically, the accuracy is reduced, which is consistent with the theoretical analysis.
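
The qualitative effect of $β$ on the number of uploads can be illustrated with a small sketch. It does not reproduce Equation (19): as a hypothetical stand-in, a follower skips its upload when the squared change of its gradient since the last upload is at most `recent_model_change / beta`, a right-hand side chosen only because it grows as $β$ shrinks. All names and numbers are illustrative assumptions.

```python
def squared_norm(vec):
    return sum(v * v for v in vec)

def should_skip_upload(new_grad, last_sent_grad, recent_model_change, beta):
    """Return True when the gradient changed too little to be worth sending.

    Hypothetical stand-in for Equation (19): the right-hand side is taken as
    recent_model_change / beta, so it grows as beta shrinks.
    """
    innovation = squared_norm([n - o for n, o in zip(new_grad, last_sent_grad)])
    return innovation <= recent_model_change / beta

def count_uploads(grad_pairs, recent_model_change, beta):
    """Count the followers that still upload in one round."""
    return sum(
        not should_skip_upload(new, old, recent_model_change, beta)
        for new, old in grad_pairs
    )

# (new gradient, last uploaded gradient) for three hypothetical follower AUVs.
grad_pairs = [
    ([1.0, 0.0], [0.9, 0.0]),   # small change
    ([0.5, 0.5], [0.0, 0.5]),   # moderate change
    ([3.0, 2.0], [1.0, 0.0]),   # large change
]
uploads_small_beta = count_uploads(grad_pairs, recent_model_change=1.0, beta=0.5)
uploads_large_beta = count_uploads(grad_pairs, recent_model_change=1.0, beta=10.0)
```

Under this assumed rule, the smaller $β$ lets more followers skip the round, so `uploads_small_beta` is lower than `uploads_large_beta`, matching the trend described for Figure 1.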

Figure 3 compares the total time of the different schemes versus $β$. As $β$ increases, the total time decreases. This is because the number of communications gradually increases with $β$, and the model converges faster with more communications, resulting in a shorter total time.

5.2. The Performance Analysis of the Scheme Proposed in This Paper

To show that the proposed scheme is the most effective at reducing the training cost, we compare the following schemes.

Scheme 1: the scheme proposed in this paper.
Scheme 2: asynchronous federated learning with dynamically optimized $β$.
Scheme 3: asynchronous federated learning with fixed $β$.
Scheme 4: asynchronous federated learning with LAG.
Scheme 5: traditional asynchronous federated learning.

The two sub-graphs in Figure 4 show the number of communications and the corresponding accuracy of the different experimental schemes for different numbers of follower AUVs. The accuracy increases, and a larger number of communications corresponds to a higher accuracy. When the number of communications is reduced by 710, which is only 21% of that of traditional asynchronous federated learning, the accuracy is reduced by only 0.03%. Thus, the proposed scheme greatly reduces the number of communications with almost no loss of accuracy.

In Figure 5, we show the accuracies of different control models versus the number of follower AUVs. Compared with the work of [19], the control model proposed in this paper increases the accuracy. This shows that the improved control model in this paper is effective.

Figure 6 compares the cost of the different schemes versus the number of follower AUVs. As the number of follower AUVs increases, the cost gradually increases. As the figure shows, the proposed scheme incurs a smaller cost and thus saves more resources.

5.3. The Performance Analysis of the PPO2 Algorithm

In Figure 7, we compare the performance of state-of-the-art algorithms in solving problem $P1$, including GA, PSO, AC, DDPG, and the PPO2 algorithm employed here. PPO2 achieves a smaller cost for the same number of iterations, which shows that it outperforms the other algorithms on this problem.

6. Conclusions

In order to reduce the constraints of underwater scarce communication resources on AUV swarm machine learning, we designed an asynchronous federated learning method. By constructing the optimization problem of minimizing the weighted sum of delay and energy consumption, the AUV CPU frequency and signal transmission power were jointly optimized. In order to solve this complex optimization problem of high-dimensional non-convex time series accumulation, we transformed the problem into an MDP and used the PPO2 algorithm to solve this problem. Finally, we carried out some experiments to verify the effectiveness of the proposed scheme.

Author Contributions

This manuscript was designed and written by Z.M., who conceived the main idea of this study. Z.M. wrote the program and completed all experiments. Z.L. supervised the research and contributed to the proposal and improvement of the algorithms. X.H., J.D., J.C. and W.W. contributed to the work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under project 61971257, in part by the Young Elite Scientist Sponsorship Program by CAST under Grant 2020QNRC001, and in part by the National Natural Science Foundation of China under project 61673310. (Corresponding author: Zhi Li).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FL	federated learning
AUVs	autonomous underwater vehicles
UIoT	underwater Internet of Things
PPO2	proximal policy optimization 2
FEDSGD	federated stochastic gradient descent
DNN	deep neural network
ANN	artificial neural network

References

1. Fang, Z.; Wang, J.; Jiang, C.; Wang, X.; Ren, Y. Average Peak Age of Information in Underwater Information Collection With Sleep-Scheduling. IEEE Trans. Vehicular Technol. 2022, 71, 10132–10136.
2. Guan, S.; Wang, J.; Jiang, C.; Duan, R.; Ren, Y.; Quek, T.Q.S. MagicNet: The Maritime Giant Cellular Network. IEEE Commun. Mag. 2021, 59, 117–123.
3. Fang, Z.; Wang, J.; Jiang, C.; Zhang, Q.; Ren, Y. AoI-Inspired Collaborative Information Collection for AUV-Assisted Internet of Underwater Things. IEEE Internet Things J. 2021, 8, 14559–14571.
4. Fang, Z.; Wang, J.; Du, J.; Hou, X.; Ren, Y.; Han, Z. Stochastic Optimization-Aided Energy-Efficient Information Collection in Internet of Underwater Things Networks. IEEE Internet Things J. 2022, 9, 1775–1789.
5. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 20–22 April 2017.
6. Poudel, S.; Moh, S. Medium Access Control Protocols for Unmanned Aerial Vehicle-Aided Wireless Sensor Networks: A Survey. IEEE Access 2019, 7, 65728–65744.
7. Shen, C.; Shi, Y.; Buckham, B. Integrated Path Planning and Tracking Control of an AUV: A Unified Receding Horizon Optimization Approach. IEEE/ASME Trans. Mechatron. 2017, 22, 1163–1173.
8. Yan, J.; Yang, X.; Luo, X.; Chen, C. Energy-Efficient Data Collection Over AUV-Assisted Underwater Acoustic Sensor Network. IEEE Syst. J. 2018, 12, 3519–3530.
9. Cai, L.; Zhou, G.; Zhang, S. Multi-AUV Collaborative Hunting Method for the Non-cooperative Target in Underwater Environment. In Proceedings of the 2018 3rd International Conference on Advanced Robotics and Mechatronics (ICARM), Singapore, 18–20 July 2018; pp. 1–5.
10. Cui, R.; Li, Y.; Yan, W. Mutual Information-Based Multi-AUV Path Planning for Scalar Field Sampling Using Multidimensional RRT*. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 993–1004.
11. Noguchi, Y.; Maki, T. Path Planning Method Based on Artificial Potential Field and Reinforcement Learning for Intervention the AUVs. In Proceedings of the 2019 IEEE Underwater Technology (UT), Kaohsiung, Taiwan, 16–19 April 2019; pp. 1–6.
12. Huang, H.; Zhu, D.; Yuan, F. Dynamic task assignment and path planning for multi-AUV system in 2D variable ocean current environment. In Proceedings of the 24th Chinese Control and Decision Conference (CCDC), Taiyuan, China, 23–25 May 2012; pp. 3660–3664.
13. Cao, X.; Sun, C. Multi-AUV cooperative target hunting based on improved potential field in underwater environment. In Proceedings of the 33rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing, China, 18–20 May 2018; pp. 118–122.
14. Van, N.T.T.; Luong, N.C.; Nguyen, H.T.; Shaohan, F.; Niyato, D.; Kim, D.I. Latency Minimization in Covert Communication-Enabled Federated Learning Network. IEEE Trans. Vehicular Technol. 2021, 70, 13447–13452.
15. Lu, Y.; Huang, X.; Zhang, K.; Maharjan, S.; Zhang, Y. Communication-Efficient Federated Learning and Permissioned Blockchain for Digital Twin Edge Networks. IEEE Internet Things J. 2021, 8, 2276–2288.
16. Chen, M.; Poor, H.V.; Saad, W.; Cui, S. Convergence Time Optimization for Federated Learning Over Wireless Networks. IEEE Trans. Wirel. Commun. 2021, 20, 2457–2471.
17. Ren, J.; Yu, G.; Ding, G. Accelerating DNN Training in Wireless Federated Edge Learning Systems. IEEE J. Sel. Areas Commun. 2021, 39, 219–232.
18. Zhang, J.; Simeone, O. LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 962–974.
19. Chen, T.; Sun, Y.; Yin, W. Communication-Adaptive Stochastic Gradient Methods for Distributed Learning. IEEE Trans. Signal Process. 2021, 69, 4637–4651.
20. Sun, J.; Chen, T.; Giannakis, G.B.; Yang, Q.; Yang, Z. Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2031–2044.
21. Jensen, F.B.; Kuperman, W.A.; Porter, M.B.; Schmidt, H.; Tolstoy, A. Computational Ocean Acoustics; Springer: Berlin/Heidelberg, Germany, 2011; Volume 2011.
22. Stojanovic, M. On the relationship between capacity and distance in an underwater acoustic communication channel. ACM SIGMOBILE Mob. Comput. Commun. Rev. 2007, 11, 34–43.

Figure 1. Comparison of the communication times of different schemes versus $β$.

Figure 2. Comparison of the accuracy of different schemes versus $β$.

Figure 3. Comparisons of the total times of different schemes versus $β$.

Figure 4. The relationship between communication times and accuracy. (a) Comparison of the communication times of different schemes versus the number of follower AUVs. (b) Comparison of the accuracy of different schemes versus the number of follower AUVs.

Figure 5. Comparison of the accuracies of different control models versus the number of follower AUVs.

Figure 6. Comparison of the costs of different schemes versus $β$.

Figure 7. Comparison of the profits of different algorithms.

Table 1. Values of main parameters.

Parameter	Value	Parameter	Value
k	$1.25 \times 10^{- 26}$	$γ$	0.01
$δ$	3	M	20
$B_{m}^{U}$	10 kHz	f	30 kHz
$B^{D}$	10 kHz	s	0.5
$N_{m}$	4224	$τ$	5
$c_{m}$	10,000	$f_{\max}$	0.4 GHz
$c_{0}$	50	$f_{\min}$	0.2 GHz
$c_{L}^{'}$	50	$E_{m}^{thd}$	0.07 W
$\| w_{m} \|$	1594*64 bit	$E_{L}^{thd}$	0.8 W
$\| w \|$	1594*64 bit	$Φ$	1
$p_{\max}$	0.2 W	$χ$	1

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Meng, Z.; Li, Z.; Hou, X.; Du, J.; Chen, J.; Wei, W. Efficient Asynchronous Federated Learning for AUV Swarm. Sensors 2022, 22, 8727. https://doi.org/10.3390/s22228727

