AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning

Autonomous underwater vehicle (AUV)-assisted mobile data collection in underwater wireless sensor networks (UWSNs) has received significant attention because of the mobility and flexibility of AUVs. Underwater data collection faces increasingly diverse application requirements, such as time-sensitive data freshness, emergency event security and energy efficiency. To satisfy them, we propose a novel multi-modal AUV-assisted data collection scheme that integrates acoustic and optical technologies and exploits their complementary strengths in communication distance and data rate. The scheme considers the age of information (AoI) of the data packets, the transmission energy of the nodes and the energy consumed by AUV movement, and trades them off to retrieve data in a timely and reliable manner. To optimize this trade-off, we leverage a deep reinforcement learning (DRL) approach that finds the motion trajectory of the AUV by selecting a suitable communication option for each node. In addition, we design an optimal angle steering algorithm for AUV navigation under different communication scenarios to further reduce energy consumption. Extensive simulations verify the effectiveness of the proposed scheme, and the results show that it significantly reduces the weighted sum of AoI and energy consumption.


Introduction
With the increasing demand for ocean exploration and protection, underwater wireless sensor networks (UWSNs) have received growing attention, as they play an important role in diverse marine applications such as coastal monitoring and protection, marine resource exploration, disaster warning and military operations [1-4]. However, due to the harsh hydrographic and geographical environment, it is difficult to collect data from underwater sensor devices over a long-range routing path. Even if the monitored data can be transmitted through multi-hop routing, nodes near the sink may carry a heavy workload and consume extra energy [5]. Furthermore, as the battery power of underwater sensor nodes is severely limited and difficult to recharge underwater, uploading a large volume of ocean monitoring data directly to the sink is not energy-efficient. Moreover, in marine security operations, it is preferable to collect confidential data close to the monitoring sensors. Autonomous underwater vehicles (AUVs), whose data storage and signal processing capabilities have developed rapidly in recent years, can address these problems by enabling underwater mobile data collection. In addition, the durability and mobility of AUVs alleviate the unbalanced energy consumption of underwater sensors [6,7].
To collect data in an efficient manner, various underwater communication technologies have been investigated, such as acoustics and optics [8]. Although underwater acoustic communication (UAC) is currently the most widely used technology owing to its long communication range, it suffers from low bandwidth, slow propagation speed, a high bit error rate and large delay [9]. To address these issues, underwater optical communication (UOC) has emerged as an alternative solution, as it offers a higher propagation speed ($2.255 \times 10^{8}$ m/s) and a higher data rate (up to hundreds of Mbit/s) over short- to medium-range transmissions [10,11]. As both acoustic and optical communication have their pros and cons, employing multi-modal underwater communication systems in UWSNs has become a potential approach to improve network performance [12,13].
To facilitate mobile data collection in such multi-modal networks, it is necessary to satisfy the varying requirements of marine applications by exploiting the advantages of AUVs [14]. Multi-modal data collection via AUV falls into two categories: acoustic multi-modal and acoustic-optical multi-modal. In acoustic multi-modal data collection, the sensor node transmits control information using low-frequency acoustic waves to guide the AUV to the designated area and then switches to a high-frequency UAC modem to transmit the data [15]. In this case, the high energy consumption of the UAC shortens the lifetime of the sensor node when a large volume of network data is transmitted. In contrast, in acoustic-optical multi-modal data collection, the UAC allows the AUV to approach the sensor node through long-range guidance and assists with alignment for the optical communication. The subsequent short-range UOC data transmission not only improves transmission efficiency but also saves transmission energy [16]. However, because the UOC range is limited, the AUV must move slowly and close to the sensor node to build a reliable optical link, which increases the traveling time. A promising solution to this problem is to engage both the UAC and the UOC in data collection, for example by transmitting a small volume of data over long distances using the UAC and retrieving and offloading large amounts of data using the UOC [17,18].
Although the aforementioned pioneering studies have laid a solid foundation for multi-modal data collection, some issues remain when they are applied to mobile data collection. Firstly, the AUV should adaptively select the best communication technology according to the data requirements of the specific marine operation (e.g., data importance and packet size). For example, collecting high-quality data (e.g., 4K high-resolution images) with UOC can prolong the lifetime of the UWSN at the cost of higher AUV energy consumption, whereas when the volume of data is relatively small, UAC can be used for remote collection to reduce the energy consumption and travel time of the AUV. Furthermore, the AUV should complete the data collection operation quickly to guarantee the freshness of the data, as the value of data usually decays over time [19]. Generally, the age of information (AoI) can be used to measure data freshness in mobile data collection scenarios [5,20,21]. By optimizing the AoI, the network's requirement for timely data delivery can be better satisfied. Therefore, in a multi-modal AUV-enabled mobile data collection scheme, a critical issue is how to optimize the trajectory of the AUV and select a communication option so as to minimize both AoI and energy consumption based on the size and importance of the packets.
To solve the aforementioned issues, in this paper, we propose an acoustic-optical multimodal mobile data collection scheme. Based on the type and the size of data, the AUV intelligently searches for the optimal trajectory and communication options using the deep reinforcement learning (DRL) approach, thereby minimizing the AoI and extending the lifetime of the sensor network. To the best of our knowledge, this is the first study which focuses on integrating an acoustic-optical multi-modal option with optimal AUV path planning for reliable and timely mobile data collection leveraging the DRL approach. The main contributions of this study are listed as follows.
• We investigate an AUV-assisted underwater trajectory planning problem for data collection by integrating the complementary advantages of both acoustic and optical communication with data diversity to perform reliable and timely mobile data collection.
• We propose a DRL-based AUV-assisted multi-modal mobile data collection scheme in which we consider several key factors, such as data importance, packet size and data collection option, to minimize AoI and reduce energy consumption.
• We propose an optimal angle steering algorithm for AUV navigation to reduce energy consumption, in which the steering angle of the AUV is determined based on the AUV and sensor positions as well as the data collection option.
The rest of the paper is structured as follows. We briefly review the related works in Section 2. In Section 3, we introduce the network model with necessary background. In Section 4, we analyze the problem of the multi-modal data collection. In Section 5, we describe the proposed scheme of DRL-based multi-modal data collection in detail. We evaluate the performance of the proposed scheme in Section 6. Finally, we conclude the paper in Section 7.

Related Works
In recent years, multi-modal communication has become a research topic to improve network performance and optimize data transmission in various marine application scenarios. Commonly adopted multi-modal technologies include acoustic multi-modal communication and acoustic-optical hybrid communication [13,22-24]. Among them, the acoustic multi-modal communication is constructed by a set of UAC modems working on different frequency bands [13]. In [22], the authors proposed a multi-modal underwater routing protocol based on the reinforcement learning technique. In this protocol, the reliability and delay of data transmission are optimized by UAC modems in multiple frequency bands. To explore the advantages of UAC and UOC during data transmission, Shen et al. proposed an acoustic-optical multi-modal routing scheme based on packet size and link adaptation, which reduces packet loss and end-to-end delay [23]. However, the challenge of unbalanced energy consumption still exists in the multi-hop underwater networks.
AUV-assisted data collection can mitigate the energy consumption imbalance that occurs in multi-hop routing. Han et al. [18] explored the characteristics of underwater acoustic and optical communication in AUV-assisted data collection and showed that hybrid acoustic-optical data collection outperforms collection with a single acoustic modem in terms of both throughput and energy consumption. To cope with the impact of the harsh underwater environment, Luo et al. [25] maximized the network throughput by capturing the dynamic characteristics of the channel and the mobility of the AUV. Hu et al. [17] proposed a mobile data collection method for heterogeneous sensor networks that uses multi-hop acoustic communication to build an intra-cluster network, in which CHs collect large-scale data and upload them to a mobile receiver via optical communication. Although the aforementioned schemes improve the efficiency of data collection, they ignore differences in the importance of data and the decay of freshness over time.
To handle the aforementioned issues, Gjanci et al. [16] proposed a greedy adaptive navigation algorithm to guide AUV for data collection, which considers the characteristics of data decay, but it is only applicable to sparse networks due to the unavoidable long paths. To deliver emergency data faster, Liu et al. [26] proposed a hybrid data collection scheme, in which the urgent data are routed using a multi-hop scheme, while the delay-insensitive data are collected by AUVs. Duan et al. [27] studied a hierarchical data collection problem using AUVs to optimize the information quality of the collected data while considering the importance and timeliness of the events.
More recently, researchers have proposed the concept of AoI to model the timeliness of data while considering the quality of experience (QoE) [28]. Khan et al. [29] provided an optimization algorithm to ensure the freshness of the collected data. Fang et al. [5] used a vacation queuing model to improve the data reliability and the peak AoI of the data. In their scheme, the communication link is established using the UAC when the AUV arrives near the node. Al-Habob et al. [30] proposed a framework to optimize the trajectory of the AUV and minimize the normalized weighted sum of the average AoI. Wu et al. [31] studied the AUV transmission scheduling policy by considering both the age and the importance of the message.
Although the aforementioned approaches have promoted the study of underwater data collection, only a single communication technology was considered for the data collection. Moreover, none of them addressed the issue of leveraging multiple data types and multiple communication technologies to improve data freshness and energy efficiency. To address this issue, in this paper, we propose an AUV-assisted acoustic and optical multi-modal data collection scheme, in which we use the DRL method to optimize AUV trajectories, AoI and energy consumption by considering different communication options and packet sizes.

Network Architecture
As shown in Figure 1, we consider an AUV-based multi-modal data collection network in which the deployed nodes are classified, according to their functions, into ordinary nodes $S = \{s_1, s_2, \ldots, s_M\}$, cluster heads $CHs = \{c_1, c_2, \ldots, c_N\}$ and a sink node. The sensor nodes are statically deployed on the seabed using anchor chains, and their locations are assumed to be known. The CHs perform intra-cluster data fusion and data compression [27] and then wait for the AUV to arrive and collect the data. In particular, during the network formation phase, all sensor nodes are divided into multiple clusters based on spatial distance, and only one node in each cluster is selected as the CH, while the other nodes serve as ordinary nodes for data collection [32]. The AUV performs global data collection around all CH nodes and finally reports the data to the sink node. In the multi-modal network, each node is equipped with both UAC and UOC modems, and all nodes have the same initial energy, sensing and communication capabilities. Specifically, each node includes an acoustic modem for exchanging data at a low transmission rate over a long distance and an optical modem with a relatively short transmission distance and a high data rate [33]. The AUV has similar communication capabilities to ensure the data transmission [16]. Without loss of generality, we assume that data arrivals at each sensor node follow a Poisson distribution with parameter $\lambda$. When the AUV visits $c_i$, the CH packages the sampled data block into a packet of length $B_i$ with timestamp $T_i$.
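As a minimal sketch of the arrival assumption above, the following Python snippet draws the arrival instants of a Poisson process by summing exponentially distributed inter-arrival gaps. The function name `poisson_arrival_times` and the idea of a finite observation horizon are illustrative assumptions, not part of the network model.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Arrival instants of a Poisson process with the given rate on [0, horizon).

    `rate` plays the role of the parameter lambda; `horizon` and `rng`
    (a random.Random instance) are hypothetical helpers for this sketch.
    """
    times = []
    t = rng.expovariate(rate)          # first inter-arrival gap
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)     # next exponential gap
    return times
```

On average, the number of arrivals returned is `rate * horizon`, which gives a quick sanity check on the chosen parameter.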

Node Clustering Phase
We assume that the nodes are randomly deployed in the target area to monitor the underwater environment and are then grouped into clusters. In the initial phase, the sink node knows the location of each node and determines the number of clusters based on the network size; the target area is then divided equally into several square sub-regions. The sink node broadcasts the sub-region message to all nodes, and each node determines its own cluster identifier based on its position. Nodes with the same identifier belong to the same cluster [26].
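The position-based cluster assignment can be sketched as follows, assuming a square target area split into a regular grid. The function name `cluster_id`, the row-major numbering and the parameters `area_side` and `cells_per_side` are hypothetical choices made for illustration.

```python
def cluster_id(x, y, area_side, cells_per_side):
    """Return the index of the square sub-region that contains (x, y).

    Assumes the target area is [0, area_side] x [0, area_side] and that
    sub-regions are numbered row by row (an illustrative convention).
    """
    cell = area_side / cells_per_side
    # Clamp so that points on the far boundary fall into the last cell.
    col = min(int(x // cell), cells_per_side - 1)
    row = min(int(y // cell), cells_per_side - 1)
    return row * cells_per_side + col
```

Each node can evaluate this locally from the broadcast sub-region message, so no further exchange with the sink is needed to form the clusters.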
The CHs should be selected for inter-cluster data collection and communication with the AUV. The selection of CH is carried out according to the procedure as follows. The number of optical and acoustic communication neighbors of each node in the sub-region is first obtained, and then, the node with the highest number of optical communication neighbors and the remaining energy satisfying the energy threshold requirement is selected as the CH. Then, the above operation is repeated until all CHs in the target region are determined. Finally, a confirmation packet is sent by the sink to the designated CHs. At the end of the data collection process, all CHs are evaluated, and when the energy of the CH is less than the energy threshold, the network performs a new CH selection round.
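The CH selection rule for a single sub-region can be sketched as below. The node representation (a dictionary with `id`, `pos` and `energy`), the function name `select_cluster_head` and the tie-breaking by list order are assumptions of this sketch; the text only specifies the criterion of the most optical neighbors subject to the energy threshold.

```python
import math


def select_cluster_head(nodes, optical_range, energy_threshold):
    """Pick the CH of one sub-region.

    nodes: list of dicts with keys 'id', 'pos' (x, y) and 'energy'.
    Returns the id of the eligible node with the most optical neighbours,
    or None if no node meets the energy threshold.
    """
    best_id, best_count = None, -1
    for n in nodes:
        if n["energy"] < energy_threshold:
            continue  # not enough remaining energy to serve as CH
        count = sum(
            1
            for m in nodes
            if m is not n and math.dist(n["pos"], m["pos"]) <= optical_range
        )
        if count > best_count:  # ties keep the earlier node (assumption)
            best_id, best_count = n["id"], count
    return best_id
```

Re-running the same function after a collection round, with updated `energy` values, corresponds to the new CH selection round triggered when a CH drops below the threshold.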

Acoustic Data Collection Link
When the AUV passes near a node, a communication link must be constructed for data collection. For the acoustic link, the acoustic wave is affected by absorption in the medium and scattering by impurities in the water. The path loss of the underwater acoustic channel depends on the frequency $f$ and the distance $d_{ac}$. To this end, the total attenuation is given as follows [34].
$$10\log A(d_{ac}, f) = k \cdot 10\log d_{ac} + d_{ac} \cdot 10\log a(f), \tag{1}$$
where $k = 1.5$ represents the propagation loss, and $a(f)$ is the absorption coefficient in dB/km given by the Thorp formula [35], with $f$ in kHz,
$$10\log a(f) = 0.11\frac{f^{2}}{1+f^{2}} + 44\frac{f^{2}}{4100+f^{2}} + 2.75\times 10^{-4} f^{2} + 0.003. \tag{2}$$
Consequently, given the acoustic signal transmit power $P^{ac}_{trans}$ and frequency $f$, the signal-to-noise ratio (SNR) can be expressed as [36]
$$SNR(d_{ac}, f) = \frac{P^{ac}_{trans}}{A(d_{ac}, f)\, N(f)\, \Delta(f)}, \tag{3}$$
where $N(f)$ and $\Delta(f)$ represent the total noise level, including four kinds of interference noise, and the bandwidth of the receiver, respectively. Therefore, the transmission power of acoustic communication satisfying the minimum $SNR^{ac}_{min}$ is expressed as
$$P^{ac}_{trans} = SNR^{ac}_{min}\, A(d_{ac}, f)\, N(f)\, \Delta(f). \tag{4}$$
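The acoustic link budget described above can be sketched numerically in decibels. The sketch assumes the widely used Thorp absorption formula (frequency in kHz, result in dB/km) and a spreading term referenced to 1 m; the function names, the unit conventions and the idea of passing the noise level as a single dB value are assumptions made for illustration.

```python
import math


def thorp_absorption_db_per_km(f_khz):
    """Thorp absorption coefficient 10*log a(f) in dB/km, f in kHz."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def attenuation_db(d_m, f_khz, k=1.5):
    """Total attenuation 10*log A(d, f) in dB.

    Assumption: spreading is referenced to 1 m, absorption uses d in km.
    """
    spreading = k * 10 * math.log10(d_m)
    absorption = (d_m / 1000.0) * thorp_absorption_db_per_km(f_khz)
    return spreading + absorption


def required_tx_level_db(snr_min_db, d_m, f_khz, noise_db, bandwidth_hz, k=1.5):
    """Transmit level (dB) that just meets the minimum SNR at distance d_m."""
    return (snr_min_db + attenuation_db(d_m, f_khz, k)
            + noise_db + 10 * math.log10(bandwidth_hz))
```

Working in decibels turns the product of attenuation, noise and bandwidth into a sum, which makes it easy to see how the required transmit level grows with distance and frequency.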

Optical Data Collection Link
For the optical link, the path loss $PL$ of the underwater wireless optical link can be expressed as [37]
PL ≈ 10 log
where $D_r$ represents the aperture diameter of the receiver, $\theta$ denotes half of the transmitter beamwidth, $d_{op}$ represents the distance between the transceivers, and $c$ and $\zeta$ represent the extinction coefficient and the turbidity of the water, respectively. Subject to the optical-to-electric conversion efficiency of the receiver, a minimum received power per bit is defined as $P^{op}_{rec}$. Then, the transmission power of the UOC is expressed as
To ensure robust optical communication, the minimum SNR requirement $SNR^{op}_{min}$ must be met [38],
where $NEP$ represents the noise equivalent power and $\varphi$ is the offset angle between the transceivers. According to the Lambert W function [39], the maximum underwater optical communication distance that satisfies $SNR^{op}_{min}$ can be obtained by
Since optical modems are usually directional, we assume that an omni-directional optical modem, able to receive optical signals from any direction, can be realized by using multiple LEDs [40].

Multi-Modal Data Collection Analysis
When performing multi-modal data collection via AUV, the freshness of the collected data and the energy consumption of the network nodes need to be fully considered. The choice of communication options fundamentally affects the data collection efficiency. Among the various communication options, the UOC is capable of transmitting a large volume of data rapidly to reduce transmission latency but increases the navigation time and energy consumption of the AUV. Meanwhile, the UAC has a lower bandwidth but can collect small volume of data over long distances to reduce the travel time of the AUV. Consequently, to collect data in a timely and efficient manner, several key factors, such as the data collection option, data type, packet size and AoI requirement, should be fully analyzed and integrated into the optimal path-planning scheme for data collection.

Problem Analysis
The primary goal of the mobile data collection in this paper is to minimize both the weighted average AoI and the energy consumption. The factors that influence the AoI include the AUV trajectory, the data transfer time and the importance of the data. Consequently, to minimize the weighted average AoI, the optimization problem can be expressed as
where $A_i$ denotes the final AoI when the data from $c_i$ reach the sink node, $e_i$ denotes the energy consumed by $c_i$ during data collection, and $x^{ac}_{i,t} = 1$ indicates that the AUV has reached the acoustic communication range of CH $c_i$ and receives data through UAC; otherwise, $x^{ac}_{i,t} = 0$ holds. Similarly, $x^{op}_{i,t}$ is the optical communication indicator. $E_{AUV}$ is the energy consumed by the AUV during data collection, and $\mathrm{energy}_{AUV}$ is the initial total energy of the AUV. Constraint (9c) ensures that each node selects exactly one communication option during data collection. Constraint (9d) guarantees that the AUV does not exhaust its energy. Finally, constraint (9e) fixes the initial position of the AUV.
The optimization problem (9) is a non-linear integer programming problem, which is intractable due to the presence of binary variables and non-convex objective function. In the following section, we model this as a Markov decision process (MDP) to be solved by leveraging the DRL approach.

Definition of AoI
The AoI is an important metric for characterizing the freshness of collected data and is defined as the time elapsed from the moment the AUV collects the data from a CH until the data are delivered to the sink node [41]. We use $\delta_{i,t}$ to denote the AoI of the data collected from $c_i$ along the navigation trajectory at time $t$. When $t < T_i$, CH $c_i$ has not yet been visited and its information has not been sampled, and thus $\delta_{i,t} = 0$ holds; otherwise, $\delta_{i,t} = t - T_i$ holds. The AoI of $c_i$ at the start of time slot $t$ is therefore given by
$$\delta_{i,t} = \begin{cases} 0, & t < T_i, \\ t - T_i, & \text{otherwise}. \end{cases} \tag{10}$$
The primary factors affecting AoI during data collection are the data transmission delay and the AUV sailing time. We use $T^{ac}_i$ and $T^{op}_i$ to denote the data transmission time using UAC and UOC, respectively. The time to transmit $B_i$ bits by the UAC can be written as
$$T^{ac}_i = \frac{B_i}{R_{ac}} + \frac{d_{ac}}{V_{ac}}, \tag{11}$$
where $R_{ac}$ and $V_{ac}$ indicate the data rate and transmission velocity of the underwater acoustic modem, respectively. Similarly, the data transmission time of the UOC at data rate $R_{op}$ and transmission velocity $V_{op}$ can be obtained as follows.
$$T^{op}_i = \frac{B_i}{R_{op}} + \frac{d_{op}}{V_{op}}. \tag{12}$$
To collect the monitored data, the AUV departs from the sink $p_0$, collects data from each of the $N$ CHs along a pre-determined trajectory, and returns to the sink node after completing the task. Let the travel trajectory of the AUV be $P = \{p_0, p_i, \ldots, p_j, p_0\}$; the travel time of the AUV can then be expressed as
$$t^{travel} = \frac{D(P)}{V_{AUV}}, \tag{13}$$
where $D(P)$ and $V_{AUV}$ denote the total distance traveled by the AUV and its velocity, respectively. According to (11)-(13), from the moment $T_i$ when the AUV arrives at CH $c_i$ to the moment $T_{i+1}$ when it finishes collecting data and has moved to the next data collection point, the AoI of CH $c_i$ can be expressed as $T^{m}_i + t^{travel}_{i,i+1}$. Optical communication has a much smaller transmission delay than acoustic communication. However, acoustic communication enables long-range transmission, which significantly reduces the travel time of the AUV. The delay caused by data transmission is mainly determined by the data size $B_i$, so the data collection time and the traveling time need to be considered jointly to limit the loss of data freshness. Then, at moment $t = T_{i+1}$, the AoI collected from CH $c_i$ is
where $b_i = 1$, $i \in \{1, 2, \ldots, N\}$, indicates that the data of CH $c_i$ have been collected; otherwise, $b_i = 0$ holds. When the AUV arrives at the sink node, the AoI of $c_i$ is
where $\eta_i$ denotes the importance weight of the data collected by CH $c_i$, with $\sum_{i=1}^{N} \eta_i = 1$. The higher the value of $\eta_i$, the greater the importance of the data.
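The timing quantities that drive the AoI can be sketched as follows, assuming that each transmission time is the sum of a serialization term (bits over data rate) and a propagation term (distance over signal velocity), and that the travel time is path length over AUV speed. All function names are hypothetical, and any numeric rates or speeds passed in are the caller's assumptions.

```python
import math


def transmission_time(bits, rate_bps, distance_m, speed_mps):
    """Time to deliver `bits` over a link: serialization plus propagation."""
    return bits / rate_bps + distance_m / speed_mps


def travel_time(waypoints, v_auv):
    """Travel time along a polyline of (x, y) waypoints at constant speed."""
    dist = sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
    return dist / v_auv


def aoi(t, t_collect):
    """AoI of a CH at time t: zero before collection, elapsed time after."""
    return 0.0 if t < t_collect else t - t_collect
```

For instance, comparing `transmission_time` for the same packet with an acoustic rate of a few kbit/s and an optical rate of several Mbit/s shows why large packets favor the optical link, while `travel_time` captures the extra sailing needed to get within optical range.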

Energy Consumption Associated With Data Collection
To satisfy the energy constraint (9d) in the optimization problem (9), we analyze the energy consumption of the AUV and of the nodes. In the data collection process, extra costs are incurred if the AUV runs out of energy before returning to the sink node. Therefore, the trajectory of the AUV should be scheduled to minimize energy consumption. The power of the AUV at each time slot mainly consists of the sum of the propulsion power $\Phi_{prop}$ and the hotel load power $\Phi_H$ [42]. The hotel load $\Phi_H$ is the power consumed by all subsystems other than the propulsion mechanism and is typically negligible in comparison with $\Phi_{prop}$ [43]. Therefore, the power of the AUV along its trajectory can be expressed as
where $\|\cdot\|$ denotes the Euclidean vector norm and $\rho$ is the density of water. $\eta_p$, $C_D$ and $A_s$ indicate the efficiency of the AUV's propulsion system, the drag coefficient and the wetted surface area, respectively [7]. Consequently, with the relations in (4), (6), (11)-(13) and (16), the total energy consumption can be expressed as
where the weighting parameter balances the energy consumption of the sensor nodes against that of the AUV.
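A rough sketch of the propulsion energy is given below. It assumes the common drag-power relation $\Phi_{prop} = \rho C_D A_s \|v\|^3 / (2\eta_p)$, which uses the symbols defined above but may differ in detail from (16); the default parameter values and the function names are purely illustrative.

```python
def propulsion_power(speed, rho=1025.0, c_d=0.3, a_s=1.0, eta_p=0.7):
    """Propulsion power in watts under the assumed drag-power relation.

    speed: AUV speed in m/s; rho: water density (kg/m^3); c_d: drag
    coefficient; a_s: wetted surface area (m^2); eta_p: propulsion
    efficiency. Default values are placeholders, not measured data.
    """
    return rho * c_d * a_s * speed ** 3 / (2.0 * eta_p)


def leg_energy(distance, speed, **params):
    """Energy in joules to sail `distance` metres at constant `speed`."""
    return propulsion_power(speed, **params) * distance / speed
```

Because power grows with the cube of speed, the energy per metre grows with its square, so shortening the path is the main lever for saving AUV energy at a fixed cruising speed.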

Proposed DRL-Based Multi-Modal Data Collection Scheme
In this section, we design the AUV multi-modal data collection scheme by leveraging the DRL approach. In this scheme, we first provide the MDP formulation and then present a multi-modal steering angle optimization (MSAO) algorithm for the AUV. Afterwards, we design the AUV path planning using the Deep Q Network (DQN) method for multi-modal mobile data collection.

MDP Formulation
Once the network nodes are clustered, the next goal is to find an optimal data collection strategy over the CHs. The AUV-assisted data collection problem can be formulated as an MDP solvable by the DRL approach, which is represented by the five-tuple $\langle S, A, P, R, \gamma \rangle$. Here, $S$ is the state space of the environment, $A$ is the set of actions of the agent, $P$ is the state transition probability, $R$ is the reward function, and $\gamma$ denotes the discount factor. In particular, at time slot $t$, the agent observes state $s_t$ and chooses an action to perform. Then, the environment moves to state $s_{t+1}$ with probability $p_{s_t, s_{t+1}}$, and the agent obtains a reward $r_t$ from the environment. In this paper, the AUV is considered as the agent that collects data, and the details of each element are defined as follows.
• State space $S$: The state of the AUV during mobile collection is defined as
$$s_t = \{p_{a,t}, \psi_t, \Delta_t, d_{i,t}, x^{m}_{i,t}\}, \tag{18}$$
where $p_{a,t}$ and $\psi_t$ are the coordinates and sailing orientation of the AUV at time slot $t$; the position can be obtained via ultra-short baseline (USBL) [44]. $\Delta_t$ is the difference between the remaining energy of the AUV and the energy it needs to reach its final destination from its current position. $d_{i,t}$ records the Euclidean distance from the AUV to CH $c_i$. $x^{m}_{i,t}$ is the data collection indicator for data collection option $m$. When the AUV arrives at the data collection point of $c_i$, the AoI of node $c_i$ starts to be updated.

• Action space $A$: In state $s_t$, the action of the AUV is characterized by the target point $c_{i,t} \in N_r$ with the transmission option $m_{i,t}$, and the next target point $c_{j,t} \in N_r \setminus c_{i,t}$, where $N_r$ is the set of CHs whose data have not yet been collected. Then, the action performed by the AUV at state $s_t$ can be expressed as
$$a_t = \{c_{i,t}, m_{i,t}, c_{j,t}\}. \tag{19}$$
• State transition probability $P$: $P(s_{t+1} | s_t, a_t)$ defines the transition probability from state $s_t$ to the next state $s_{t+1}$ under the action $a_t$, and $P(s_{t+1} | s_t, a_t) = 1$ holds.
• Reward $R$: Applying action $a_t$ in state $s_t$, the AUV enters state $s_{t+1}$ and obtains an immediate reward $r(s_{t+1} | s_t, a_t)$. In the AUV-assisted multi-modal data collection scenario, the immediate reward $r_t$ can be expressed as
where $k_1$, $k_2$ are constants with $k_1 < k_2$; when the AUV has collected the data of CH $c_i$, the corresponding reward is obtained according to the selected modem. $dis_{p_a, p_i}$ is the Euclidean distance from the current position of the AUV to the target point. $J$ denotes the reward at the end of the data collection process, including rewards for successful data collection and penalties for failure (e.g., exceeding the maximum energy consumption or crossing boundaries),
where $k_3$ is a constant and $\Omega$ is the region within which the AUV can move.
• Discount factor $\gamma$: $\gamma \in [0, 1]$ is the future reward discount factor.
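The state and action elements listed above can be held in simple containers, as in the following sketch. The class and field names are hypothetical, and using `None` as the next target when only one CH remains (the AUV would then head back to the sink) is an assumption of this sketch rather than something the formulation states.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class State:
    position: Tuple[float, float]            # p_{a,t}
    heading: float                           # psi_t (radians)
    energy_margin: float                     # Delta_t
    distances: Tuple[float, ...]             # d_{i,t} for every CH
    collected_by: Tuple[Optional[str], ...]  # x^m_{i,t}: None, "ac" or "op"


@dataclass(frozen=True)
class Action:
    target: int                 # current target CH c_i
    mode: str                   # communication option m: "ac" or "op"
    next_target: Optional[int]  # next target CH c_j (None: assumed sink)


def enumerate_actions(uncollected: Set[int]) -> Iterator[Action]:
    """All (c_i, m, c_j) triples over the set N_r of uncollected CHs."""
    for i in sorted(uncollected):
        rest = sorted(uncollected - {i})
        for mode in ("ac", "op"):
            if rest:
                for j in rest:
                    yield Action(i, mode, j)
            else:
                yield Action(i, mode, None)
```

With three uncollected CHs this yields 3 x 2 x 2 = 12 candidate actions, which illustrates how the action space shrinks as CHs are removed from $N_r$.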

Multi-Modal Steering Angle Optimization Algorithm
In the multi-modal data collection network, since exploiting the communication radius can reduce the navigation time of the AUV, we propose an MSAO algorithm to adjust the AUV heading under the maximum steering angle constraint. In MSAO, the steering angle of the AUV is calculated based on its position, the navigation target and the communication option. As shown in Figure 2, the yellow triangle indicates the position $p_{a,t}$ of the AUV at time slot $t$, the blue pentagrams indicate the CHs from which data must be collected, and the outer circle $C_{ac}$ and inner circle $C_{op}$ indicate the communication ranges of UAC and UOC, respectively. Let $c_i$ be the AUV's current target CH and $c_j$ the next target CH; $p_{r_i,m}$ denotes the target hover point when the AUV selects communication option $m \in \{ac, op\}$, and $\psi_{m,t}$ denotes the angle of the AUV toward the target hover point $p_{r_i,m}$ at time slot $t$. The goal is to find the point $p_{r_i,m}$ within the communication range $C_m$ of option $m$ that minimizes the distance $\|p_{a,t} - p_{r_i,m}\| + \|c_j - p_{r_i,m}\|$. This is the classical castle pilgrimage problem, and hence an approximate solution $p_{r_i,m} = (x_{r_i}, y_{r_i})$ can be obtained by the following equation [45].
where $\varsigma_1 =$ , $y = \frac{y_i - y_{r_i}}{d_m}$, and $d_{ai,t}$ and $d_{ij}$ denote the distance of the AUV from the target at time slot $t$ and the distance of the current target CH $c_i$ from the next target CH $c_j$, respectively. Then, the steering angle of the AUV at time slot $t$ can be expressed as
where $\psi_{max}$ is the maximum steering angle allowed by the AUV. Then, depending on the target location and the communication option, the steering angle of the AUV is adjusted in the following two cases.
• Case 1: The path of the AUV from its current position $p_{a,t}$ to the next target collection point $c_j$ does not pass through the region $C_m$; i.e., the distance $d^{seg}_i$ from point $c_i$ to the segment $p_{a,t}c_j$ is greater than the UAC radius. As shown in Figure 2a, after the communication option is determined, the point $p_{r_i,ac}$ (or $p_{r_i,op}$) is obtained on circle $C_{ac}$ (or $C_{op}$) to minimize the length of the AUV trajectory. For example, when CHs $c_i$ and $c_j$ and the acoustic modem are selected, the AUV hover position $p_{r_i,ac} = (x_{r_i}, y_{r_i})$ for data collection and the steering angle $\Psi_{ac,t}$ can be calculated by (22) and (23), respectively. Similarly, when $m = op$ holds, the data collection hover point $p_{r_i,op}$ and the steering angle $\Psi_{op,t}$ can be obtained using the same approach.
• Case 2: The path of the AUV from the current coordinate $p_{a,t}$ to the next target CH $c_j$ passes through the communication region $C_m$ of $c_i$. If the AUV crosses the UAC area $C_{ac}$ without crossing the UOC area $C_{op}$, $d^{seg}_i$ is shorter than $d_{ac}$ but greater than $d_{op}$. As shown in Figure 2b, if UAC is selected as the communication option, the data collection hover point of the AUV is the perpendicular foot $p_{r_i,ac}$ from $c_i$ to the segment $p_{a,t}c_j$, and the steering angle of the AUV can be obtained by (23). If the selected communication option is UOC, the data collection point and steering angle are calculated following the method in Case 1. Furthermore, if $d^{seg}_i$ is less than $d_{op}$, i.e., the AUV crosses the UOC range of $c_i$, then UOC is selected directly as the communication option, because UOC is superior to UAC in terms of energy consumption and transmission time for the same AUV trajectory. The data collection hover point and steering angle of the AUV are then obtained as in the UAC sub-case above, using the perpendicular foot.
Based on the above discussion, we obtain the MSAO algorithm that is shown in Algorithm 1.

Algorithm 1 Proposed MSAO Algorithm
Require: Coordinates of the AUV $p_{a,t}$, coordinates of the current target CH $c_i$, coordinates of the next target CH $c_j$, UAC communication radius $d_{ac}$ and UOC communication radius $d_{op}$.
1: if $d^{seg}_i > d_{ac}$ and $m = ac$ then
2:     Calculate the data collection hover position $p_{r_i,ac}$ by (22).
3:     Calculate steering angle $\Psi_{ac,t}$ by (23).
4: else if $d^{seg}_i \le d_{ac}$ and $d^{seg}_i > d_{op}$ and $m = ac$ then
5:     The data collection hover position is the perpendicular foot $p_{r_i,ac}$ from $c_i$ to the segment $p_{a,t}c_j$.
6:     Calculate steering angle $\Psi_{ac,t}$ by (23).
7: else if $d^{seg}_i > d_{op}$ and $m = op$ then
8:     Calculate the data collection hover position $p_{r_i,op}$ by (22).
9:     Calculate steering angle $\Psi_{op,t}$ by (23).
10: else if $d^{seg}_i \le d_{op}$ then
11:     The data collection hover position is the perpendicular foot $p_{r_i,op}$ from $c_i$ to the segment $p_{a,t}c_j$.
12:     Calculate steering angle $\Psi_{op,t}$ by (23).
13: end if
Ensure: The steering angle of the AUV: $\Psi_{m,t}$.
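The branching of Algorithm 1 can be sketched in Python as follows. Two parts are stand-ins rather than the expressions in (22) and (23): `hover_on_circle` finds the hover point by a brute-force search over the circle boundary, and the steering angle is taken as the heading change toward the hover point clipped to $\pm\psi_{max}$. All function and parameter names are hypothetical.

```python
import math


def foot_on_segment(p, a, b):
    """Closest point to p on segment a-b (the perpendicular foot if inside)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def hover_on_circle(p_a, c_i, c_j, radius, samples=3600):
    """Stand-in for (22): boundary point minimising |p_a - q| + |q - c_j|."""
    best, best_len = None, math.inf
    for n in range(samples):
        ang = 2 * math.pi * n / samples
        q = (c_i[0] + radius * math.cos(ang), c_i[1] + radius * math.sin(ang))
        length = math.dist(p_a, q) + math.dist(q, c_j)
        if length < best_len:
            best, best_len = q, length
    return best


def msao(p_a, heading, c_i, c_j, mode, d_ac, d_op, psi_max):
    """Return (mode, hover point, clipped turn) following Algorithm 1."""
    foot = foot_on_segment(c_i, p_a, c_j)
    d_seg = math.dist(c_i, foot)
    if d_seg <= d_op:                       # lines 10-12: path crosses UOC range
        mode, hover = "op", foot
    elif mode == "ac" and d_seg <= d_ac:    # lines 4-6: path crosses UAC range
        hover = foot
    else:                                   # lines 1-3 and 7-9
        hover = hover_on_circle(p_a, c_i, c_j, d_ac if mode == "ac" else d_op)
    desired = math.atan2(hover[1] - p_a[1], hover[0] - p_a[0])
    turn = (desired - heading + math.pi) % (2 * math.pi) - math.pi
    turn = max(-psi_max, min(psi_max, turn))  # assumed form of the limit
    return mode, hover, turn
```

The brute-force search is only meant to make the geometry concrete; a closed-form or iterative solution would replace it in practice.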

DRL-Based Multi-Modal Path Planning Scheme
Due to the uncertainty of reference access points and node data arrivals, the locations of AUV and the AoI of collected data are inherently random, which leads to a proliferation of state space dimensions. In comparison, DRL can handle extremely large state space by estimating the Q values of states s and actions a through neural networks [45,46]. The training framework of DQN includes a current Q-network and a target Q-network. In order to balance experience and exploration of the unknown, the agent at state s t selects the action a t to be performed by the -greedy algorithm [47]. a t = random a ∈ A, with probability ε arg max a t Q(s t , a t ; θ), with probability 1 − ε . (24) Immediately after adjusting the navigation angle and execution of action a t , the AUV receives reward r t and the data acquisition network moves to the next state s t+1 . Aiming to reduce the correlation between the online Q-network samples, an experience replay B is used to store historical experience samples (s t , a t , r t , s t+1 ). At each training step, a small batch of randomly selected empirical samples Φ b from the experience replay is used to update the parameters of the online Q-network. In addition, we denote the parameters of DQN as θ, and the parameters of the online Q-network are determined by minimizing the loss function.
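where $\delta(s, a) = y_t - Q(s_t, a_t|\theta)$ is the temporal difference, and $y_t$ is the target Q-value, which can be calculated by

$$y_t = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}|\theta^-), \tag{26}$$

where $\theta^-$ denotes the parameters of the target Q-network and $\gamma$ is the reward discount factor. Then, the weights of the current network $\theta$ are updated with learning rate $l_r$ by the following formula:

$$\theta \leftarrow \theta + l_r\, \delta(s, a)\, \nabla_\theta Q(s_t, a_t, \theta). \tag{27}$$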
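As an illustration of the action selection and target computation just described, the following minimal Python sketch implements an ε-greedy choice over a list of Q-values and the standard one-step DQN target, which is assumed here to be the form intended for the target value $y_t$. The function names and the `done` flag for terminal states are hypothetical and not taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, gamma, done=False):
    """One-step target: r_t + gamma * max over the target-network Q-values.

    `done` (an assumption) drops the bootstrap term at a terminal state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def td_error(target, q_sa):
    """Temporal difference delta = y_t - Q(s_t, a_t)."""
    return target - q_sa
```

With `epsilon=0.0` the selection is purely greedy, and with `epsilon=1.0` it is purely random, which matches the two branches of the selection rule.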
The proposed AUV-assisted data collection algorithm is shown in Algorithm 2. The algorithm starts by initializing all neural networks as well as the replay buffer $B$. The training iterates over $E$ episodes, and the environment is initialized in each episode by observing the distribution of CHs. The action is first obtained according to the $\varepsilon$-greedy policy and is then passed to Algorithm 1 to obtain the steering angle. Then, the AUV moves to the next state $s_{t+1}$ and receives an immediate reward $r_t$. After the transition tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay $B$, a randomly selected mini-batch $\Phi_b$ is used to train the current network $Q$, which updates the weights $\theta$ of the current network and, every $\chi$ steps, the weights $\theta^-$ of the target network. Finally, $c_i$ is removed from $N_r$ if the data of $c_i$ can be collected in the current state, and the current loop terminates when $N_r = \emptyset$ holds.
Algorithm 2 DRL-Based Multi-Modal Data Collection Algorithm
1: Input: Initialize the constants $k_1$, $k_2$ and $k_3$, maximum number of training episodes $E$, reward discount factor $\gamma$, learning rate $l_r$, experience replay $B$, mini-batch $\Phi_b$, exploration probability $\varepsilon$, and update step $\chi$;
2: Initialize the current network $Q(s_t, a_t, \theta)$ with weights $\theta$ and the target network $Q(s_t, a_t, \theta^-)$ with weights $\theta^-$.
3: for episode = 1, ..., $E$ do
4:   Initialize the data collection network environment and observe the initial state $s_t$.
5:   for $t$ = 1, ..., $T$ do
6:     Select an action $a_t$ according to the $\varepsilon$-greedy algorithm.
7:     Determine the AUV steering angle with Algorithm 1.
8:     Execute action $a_t$ and observe the reward $r_t$ and the next state $s_{t+1}$.
9:     Store experience $(s_t, a_t, r_t, s_{t+1})$ in experience replay $B$.
10:     Sample a random mini-batch of $\Phi_b$ experiences from $B$.
11:     Calculate the target value $y_t$ by (26).
12:     Update the current network weights $\theta$ by (27).
13:     Update the weights of the target network $\theta^- = \theta$ every $\chi$ steps.
14:     if $s_{t+1}$ is the collection stop $n_i$ then
15:       Remove the CH $c_i$ from $N_r$.
16:     end if
17:     Terminate the episode if $N_r = \emptyset$ holds.
18:   end for
19: end for
20: Output: The AUV trajectory $p_{a,t}$ and the AoI $A_i$.
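The bookkeeping steps of Algorithm 2 (storing transitions, sampling a mini-batch, and periodically copying weights to the target network) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the class `ReplayBuffer`, its fixed `capacity`, and the helper `should_sync_target` are hypothetical names, and the network update itself is omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience replay B; the oldest transitions are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        """Append one transition (s_t, a_t, r_t, s_{t+1})."""
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def should_sync_target(step, chi):
    """True on the steps where the target weights are overwritten by the current ones."""
    return step % chi == 0
```

Sampling uniformly from a buffer of past transitions, rather than training on consecutive steps, is what breaks the correlation between samples mentioned in the text.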

Results and Discussion
In this section, we conduct extensive simulations to verify the effectiveness of the proposed scheme. The simulation setup and numerical performance results are given as follows.

Simulation Setup
To evaluate the proposed scheme, we assume that there are 50 sensor nodes uniformly distributed in an 800 m × 800 m square target area. After CHs are designated, data fusion and data compression are performed by CHs. It is assumed that the data types collected and transmitted by the normal sensor nodes are text, records and images, and the amount of data pooled by the CHs is set to be between 10 and 300 packets, with the size of each packet being 1024 bits. The AUV starts from the start point $p_0 = (50, 120)$ with an initial orientation angle $\psi_{m,0} = 0^\circ$ and returns to $p_0$ after collecting data from all the CHs.
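A scenario of this kind could be generated along the following lines. This is only a sketch: the function names, the seeded random generator, and the number of CHs passed to `ch_payloads_bits` are assumptions, and the CH designation step is not modelled because it is described elsewhere in the paper.

```python
import random

AREA_SIDE_M = 800.0      # side length of the square target area
NUM_NODES = 50           # number of sensor nodes
PACKET_BITS = 1024       # size of one packet
START_POINT = (50.0, 120.0)  # AUV start and return point p_0


def deploy_nodes(rng, num_nodes=NUM_NODES, side_m=AREA_SIDE_M):
    """Draw sensor node coordinates uniformly over the square area."""
    return [(rng.uniform(0.0, side_m), rng.uniform(0.0, side_m))
            for _ in range(num_nodes)]


def ch_payloads_bits(rng, num_chs, min_packets=10, max_packets=300):
    """Draw the data volume pooled at each CH, in bits (10 to 300 packets)."""
    return [rng.randint(min_packets, max_packets) * PACKET_BITS
            for _ in range(num_chs)]
```

For example, `deploy_nodes(random.Random(0))` returns 50 coordinate pairs, and each payload lies between 10,240 and 307,200 bits, that is, roughly 10 to 300 Kbits.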
The simulations were implemented in Python 3.8. The target Q-network and the current Q-network are two-layer fully connected networks with 256 neurons per layer; the ReLU function is used as the activation function, and training uses the Adam optimizer. Other simulation parameters and their specific values are provided in Table 1. For performance comparison, the following benchmark algorithms are used.

• Single Acoustic: The AUV exchanges data utilizing acoustic waves during data collection, and the hovering positions are determined by the UAC radius during the selection process of the steering angle. The AUV trajectories are learned using the DQN algorithm.
• Single Optical: The AUV can exchange data only by selecting optical waves and calculating the AUV hovering locations by means of the UOC radius. The DQN algorithm is used to learn the AUV trajectory.
• Energy Greedy: The AUV performs steering Algorithm 1 and then greedily selects the nodes with the shortest path length in the data collection sequence.
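The visiting order used by the Energy Greedy benchmark can be read as a nearest-neighbour heuristic. The sketch below is a simplified illustration under that reading: it visits the CH coordinates directly and ignores the hover positions and steering angles produced by Algorithm 1. The function names are hypothetical.

```python
import math


def greedy_order(start, stops):
    """Visit order that always moves to the nearest unvisited stop."""
    remaining = dict(enumerate(stops))
    position = start
    order = []
    while remaining:
        nearest = min(remaining, key=lambda i: math.dist(position, remaining[i]))
        order.append(nearest)
        position = remaining.pop(nearest)
    return order


def tour_length(start, stops, order):
    """Length of the closed tour start -> stops in the given order -> start."""
    path = [start] + [stops[i] for i in order] + [start]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))
```

Because the rule looks only at distance, it takes no account of data importance or AoI, which is the weakness the results section attributes to this benchmark.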

The Convergence Performance
To demonstrate the convergence of the AoI optimization algorithm for AUV data collection, Figure 3 shows the variation of the cumulative reward, where the X-axis represents the number of training iterations and the Y-axis represents the cumulative reward. In the early stages of training, the cumulative reward values are very low due to the high probability of random exploration under the $\varepsilon$-greedy policy. As training continues, the reward value gradually increases and stabilizes.

Impact of the AUV Velocity on Performance
To explore the effect of AUV velocity on the average AoI, we simulated the average AoI performance of the collected data with data arrival rate λ = 20 and 300 Kbits. The experimental results are shown in Figure 4, where it can be observed that the average AoI of the data collected by AUV gradually decreases with increasing AUV velocity. The weighted average AoI of the single UAC is lower than that of the multi-modal and single UOC when the AUV speed is 0.5 m/s. With the increase of the AUV velocity, the average AoI of the multi-modal data collection scheme is better than that of the single communication option.

[Figure: rewards versus training episodes (legend: episode rewards, average rewards)]

The primary reason for this behavior is that at low velocity the AUV needs more time to travel, whereas long-distance data collection via the single UAC reduces the travel time of the AUV, which mitigates the increase in AoI. As the AUV velocity increases, the effect of the AUV travel time on AoI weakens and the data transmission time carries more weight in the AoI; thus, the UOC scheme outperforms the UAC scheme at higher AUV velocities. The multi-modal data collection scheme selects the best communication option according to the data characteristics, AUV navigation time and data transmission time, so its overall performance is better than that of the single-modal schemes.
Furthermore, the average AoI of our proposed multi-modal data collection scheme at the AUV velocity of 0.5 m/s is inferior to that of the single UAC scheme; this is because the multi-modal scheme not only considers AoI but also focuses on data collection energy consumption, so it sacrifices some AoI performance to reduce CHs energy consumption.
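The trade-off between travel time and transmission time described above can be made concrete with a toy model. It is a rough sketch, not the paper's AoI model: collection time is taken as path length divided by speed plus data size divided by data rate, and serves only as a crude proxy for AoI. The function names and every number in the usage note are illustrative assumptions.

```python
def collection_time(path_m, speed_mps, bits, rate_bps):
    """Travel time plus transmission time for one collection stop, in seconds."""
    return path_m / speed_mps + bits / rate_bps


def crossover_speed(extra_path_m, bits, rate_ac_bps, rate_op_bps):
    """AUV speed above which the optical option becomes faster overall.

    extra_path_m is the additional distance the AUV must travel to reach
    optical range; the acoustic link is assumed to be the slower one.
    """
    extra_tx_s = bits / rate_ac_bps - bits / rate_op_bps
    return extra_path_m / extra_tx_s
```

With assumed values of 300,000 bits, a 10 kbit/s acoustic link, a 1 Mbit/s optical link and 20 m of extra travel, the crossover lies near 0.67 m/s: below it the acoustic option finishes sooner, and above it the optical option does, which mirrors the qualitative trend reported for Figure 4.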
Under the parametric conditions of Figure 4, we analyzed the effect of AUV velocity on the energy consumption of data collection. Considering that the energy consumption of CHs is irreversible, we pay more attention to the energy consumption of data transmission, and therefore, we set the corresponding weight to 0.03. As shown in Figure 5, the weighted energy consumption of the single UAC scheme always remains at the highest level owing to the high weight of data transmission energy consumption. The single UOC scheme performs well in terms of CH data transmission energy consumption, and hence, its weighted energy consumption is lower than that of the single UAC scheme. The multi-modal scheme is able to reduce both AUV energy consumption and CH data transmission energy consumption by jointly deciding on the best communication option based on packet size and path length. Furthermore, as the AUV velocity increases, the weighted energy consumption of the three schemes converges, since the AUV power increases geometrically with the travel velocity.

Impact of the Data Arrival Rate on Performance
Figure 6 shows the effect of different data arrival rates on the weighted average AoI. The velocity of the AUV is set to 1 m/s, and the length of each time slot is set to 6 s. At lower data arrival rates, the single UAC is superior to the single UOC since acoustic waves can be used for long-range data transmission, which significantly reduces the travel time of the AUV. As the data arrival rate increases, the data transmission time carries more weight in the AoI, which results in the single UOC scheme being superior to the single UAC scheme in terms of AoI. In addition, the greedy algorithm performs poorly in weighted average AoI, as it greedily selects the closest visit location while ignoring data importance and AoI.
Our proposed multi-modal data collection scheme outperforms the other three schemes for different data arrival rates and is near the single UOC performance when the data size is over 140 Kbits. The main reason for this phenomenon is that the multi-modal scheme selects acoustic communication to reduce the sailing time when the data size is small and optical communication for fast data transmission when the data size is large, and thus, it can adapt to different data conditions and achieve a relatively low weighted average AoI.
To verify the superiority of the proposed multi-modal data collection algorithm in terms of energy consumption for data collection, we compare the transmission energy consumption of CHs and the AUV energy consumption for different data sizes. In this study, we set the AUV velocity to 1 m/s and the data size to 20-200 Kbits, and the experimental results are shown in Table 2 and Figure 7. The average data transmission energy consumption of CHs under the single UOC approach is the smallest, while its AUV energy consumption is the highest. The energy consumption of CHs is the highest for the single UAC and greedy approaches, but their AUV energy consumption is kept at a low level. In the multi-modal scheme, the energy consumption of CHs first increases and then decreases with increasing data size, and the AUV energy consumption gradually increases, but the overall energy consumption remains at a low level.

This behavior can be explained as follows. The single UOC requires the AUV to travel to the immediate vicinity of the node, which increases the energy consumption of the AUV for navigation. However, due to the low energy consumption and high bandwidth of the optical modems, the energy consumed by the CHs for data transmission is low. Conversely, the single UAC and the greedy algorithm allow the AUV to collect data over longer distances using acoustic waves, which greatly reduces AUV energy consumption. However, with increasing data size, the low bandwidth and high energy consumption of the UAC make the energy consumption of the CHs significant. In the multi-modal scheme, the AUV selects the UAC to collect data when the data size is small, keeping the transmission energy consumption of the CHs low while reducing the movement energy consumption.
When the data size is larger than 140 Kbits, the multi-modal scheme switches to strictly optical communication to reduce the excessive energy consumption of the CHs and thereby extend the lifetime of the UWSN. Note that our proposed multi-modal data collection scheme performs well on diversified data, and when the data size of every CH is large (or small), the multi-modal scheme becomes a strict UOC (or UAC) scheme.

In Figure 8, we show the weighted AoI of each CH under the different schemes. The results show that the greedy scheme has the maximum AoI value for CH index = 5 and the lowest AoI value for index = 1. This is because the greedy algorithm ignores the effect of data importance when selecting the nearest nodes to visit, resulting in a large data AoI for the first visited node. The other three schemes use reinforcement learning methods to select the best node access order based on the importance of the data, which avoids extreme AoI values. Furthermore, the multi-modal scheme flexibly selects the communication option based on the data size and importance of the nodes, and hence, it performs better than the two single-modal schemes. It is worth noting that we neglected the specific details of light alignment and the time it consumes during data collection, which results in a seemingly promising AoI performance for the single UOC. In future work, we will consider more details of underwater optical communication.

Conclusions
In this paper, we proposed an AUV-assisted multi-modal data collection scheme which provides timely and reliable data collection by utilizing underwater acoustic and optical communication technologies in an adaptive manner. The trajectory planning problem is formulated as a mixed-integer nonlinear problem to minimize the weighted average AoI and energy consumption, and the data collection problem is formulated as an MDP considering data importance, packet size, and data collection options. We then developed a DQN-based learning algorithm to determine the optimal strategy. In addition, a multi-modal steering angle optimization algorithm for the AUV is proposed to reduce the energy consumption of AUV navigation. Through numerical simulations, we showed that the proposed algorithm converges and that the AUV path-planning algorithm effectively reduces the AoI of the collected data and the energy consumption.
Author Contributions: F.B. proposed the main ideas, wrote the paper, designed the description framework, and conducted the simulations. H.L. provided guidance for the work, discussed and provided ideas, wrote and modified the paper and acquired funding. X.L., R.R. and G.H. provided guidance for the work, and collaborated in discussion on the proposed system model and techniques. S.M. assisted in testing the code and checked the paper. All authors have read and agreed to the published version of the manuscript.