1. Introduction
Recently, interest in marine resources has grown considerably, resulting in increased marine development activities. Autonomous Underwater Vehicles (AUVs) are crucial for tasks such as seabed resource exploration, submarine pipeline maintenance, and marine data collection [1,2]. Because the marine environment is highly complex, precise navigation and positioning are essential for AUVs to complete these tasks successfully and on time. Unlike land robots [3] and aerial robots [4], AUVs cannot receive GPS signals underwater, which rules out the satellite-dependent navigation techniques those platforms use. A range of techniques has therefore been developed for underwater localization and navigation. They fall into four main groups: acoustic navigation [5,6,7], geophysical navigation [8,9,10], Simultaneous Localization and Mapping (SLAM), and inertial navigation and dead reckoning [11,12]. Acoustic waves are the most effective means of transmitting information underwater, making acoustic navigation the primary method for underwater target navigation and localization. However, acoustic beacons must be deployed in advance, so acoustic navigation is ineffective in an unknown environment. Geophysical navigation can be divided into three groups according to the geophysical parameter used: terrain-matching navigation, marine geomagnetic navigation, and gravity navigation. It is likewise limited by the need to obtain the geophysical parameters in advance. SLAM, by contrast, enables AUVs to build a map of their surroundings while determining their position within it. However, SLAM requires environmental information measured by additional sensors and substantial computational capacity.
Inertial navigation is autonomous: it relies on no external information and emits no energy. The Inertial Navigation System (INS) uses triaxial gyroscopes and accelerometers to measure angular rate and acceleration, and integrates these measurements to compute the attitude, velocity, and position of the AUV. However, integration accumulates errors, so over a long navigation period the position can drift considerably. A common remedy is to add a Doppler Velocity Log (DVL), which measures bottom-track velocity, and to fuse the INS and DVL measurements; this partially mitigates error accumulation. Kalman filtering (KF) is a widely applied data fusion method [13]; it yields the optimal estimate for linear systems whose process and measurement noises are Gaussian white noise. The bottom-track velocities measured by the DVL are indispensable in this fusion. However, the DVL is sensitive to the complex marine environment, which may corrupt its velocity measurements. For instance, DVL bottom tracking can be disrupted by steep seafloor slopes or rifts, the AUV's attitude, currents, and fish [14], as shown in Figure 1. When the DVL produces anomalous values only briefly, the last valid bottom-track velocity can be used instead. This is inadequate, however, when the DVL outputs anomalous data for an extended period or stops working; the INS solution error then accumulates and navigation accuracy degrades significantly over time. Investigating navigation methods for extended DVL outages is therefore crucial.
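To make the error accumulation concrete, the sketch below double-integrates a constant accelerometer bias. This is a deliberate simplification that ignores attitude errors, gyroscope drift, and the coupled error dynamics of a real INS, and the bias of 0.01 m/s² is a hypothetical value. The position error grows roughly as ½bt², so navigating ten times longer gives about a hundred times larger error.

```python
def dead_reckoning_drift(bias, duration, dt):
    """Position error (m) from double-integrating a constant accelerometer bias (m/s^2).

    Simplified illustration only: no attitude error, no gyro drift, no aiding.
    """
    vel_err = 0.0
    pos_err = 0.0
    for _ in range(round(duration / dt)):
        vel_err += bias * dt        # velocity error grows linearly with time
        pos_err += vel_err * dt     # position error grows quadratically with time
    return pos_err


# Hypothetical bias of 0.01 m/s^2, integrated at 10 Hz.
for t in (10.0, 100.0, 1000.0):
    numeric = dead_reckoning_drift(bias=0.01, duration=t, dt=0.1)
    closed_form = 0.5 * 0.01 * t ** 2
    print(f"t = {t:6.0f} s  numeric = {numeric:9.2f} m  0.5*b*t^2 = {closed_form:9.2f} m")
```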
Two approaches are commonly used in the existing literature to address invalid bottom tracking [15,16]. The first installs additional sensors that replace the DVL when it fails; however, this increases cost and system complexity. The second replaces the DVL with a mathematical model that generates virtual bottom velocity information, for example by modeling single- and three-degree-of-freedom dynamics [17]. Kinsey et al. developed a single-degree-of-freedom nonlinear dynamic model estimator and verified its feasibility [18]. Zhao et al. introduced an outlier detection mechanism for DVL data and compensated for velocity anomalies using a kinematic model. However, AUV models in challenging marine environments are complex and accurate hydrodynamic parameters are difficult to obtain, so building precise AUV dynamic models is often impractical. Nevertheless, single- and three-degree-of-freedom dynamic models validated through sea trials have produced speeds that closely match those measured by the DVL.
Owing to the recent widespread application of artificial intelligence, machine learning algorithms such as Support Vector Machines (SVMs) [19], Random Forests (RFs) [20], Extreme Learning Machines (ELMs) [21], and Artificial Neural Networks (ANNs) [22] have been employed in diverse fields. Mu et al. [23] applied a time-series learning mechanism to AUV navigation, proposing a neural network framework based on Long Short-Term Memory (LSTM) that processes multi-sensor data to determine the AUV position during navigation. Another study [19] developed a hybrid predictor combining partial least squares regression and support vector regression to estimate the DVL bottom velocity when the DVL fails. Lv et al. used an ELM to model the relationship between the AUV's thruster speed, attitude, rudder information, and bottom velocity to compensate for DVL failures. Li et al. proposed a nonlinear autoregressive network with exogenous inputs (NARX) combined with adaptive Kalman filtering to predict and fuse DVL outputs. Estimation of the water-track velocity and current velocity when the DVL bottom velocity is anomalous has also been investigated [24]. Our study presents a deep learning framework combining LSTM and Self-Attention to address this issue; it uses the DVL water-track velocity, which carries ocean current information, as an additional input for estimating the AUV velocity. The effectiveness of the approach is validated by comparing its estimates with measured data.
This paper proposes a cruise speed model based on the Self-Attention mechanism for estimating AUV speed in complex marine environments. Using acceleration, angle, angular velocity, and propeller speed as inputs, the model estimates the cruise speed, that is, the velocities along the three axes of the AUV carrier coordinate system. As a result, high navigation accuracy is maintained even when bottom-track velocity data remain unavailable for an extended period. The main contributions of this paper are as follows:
(1) To address persistent failure of bottom-track velocity measurements in complex marine environments, a deep learning-based AUV speed estimation model is constructed that predicts the bottom-track velocity, improving AUV navigation accuracy during DVL failures.
(2) Separate LSTM layers extract temporal features from the different data sources, and Self-Attention enhances the encoding of the time-series data. The water-track velocity, which carries ocean current information, is introduced into the network as an input to compensate for the effect of currents, improving the model's generalization capability.
(3) The effectiveness of the proposed Self-Attention-based cruise speed model is validated with sea trial and simulation data. The results show that it achieves better navigation accuracy than compensation with the water-track velocity.
The rest of this paper is organized as follows. Section 2 describes the AUV and the equipment used in the field trials. Section 3 derives the Kalman filtering model for integrated AUV navigation. Section 4 details the network model framework, Section 5 analyzes the results, and Section 6 concludes the study.
2. Introduction to the AUV Platform
Herein, we present the AUV used in our experiments, the XH R300, depicted in Figure 2. It employs a twin main-thruster propulsion system that reaches a maximum speed of 5 knots and a continuous range of up to 10 km. Because the hydrodynamic characteristics of the XH R300 are complex, a three-degree-of-freedom dynamics model is formulated to describe its motion. The model rests on three assumptions: first, the AUV is a rigid body; second, the current is a two-dimensional irrotational flow; and third, the fluid is uniform and unbounded. The kinetic equations governing the AUV's motion are conventionally expressed as

$$M\dot{v} + C(v)\,v = \tau \tag{1}$$

where $v$ denotes the triaxial components of the AUV velocity in the carrier coordinate system; $M$ and $C$ denote the inertia matrix and the Coriolis–centripetal matrix of the rigid body, respectively; and $\tau = [X, Y, N]^{\top}$ collects the external forces and moments, in which $X$ and $Y$ are the axial and lateral forces acting on the AUV and $N$ is the external yaw moment. They are expressed as follows:
$$X = T_p + T_s,\qquad Y = 0,\qquad N = \frac{B}{2}\left(T_p - T_s\right) \tag{2}$$

where $T_p$ and $T_s$ are the thrusts of the port and starboard thrusters, respectively, and $B$ is the distance between the thrusters. The corresponding three-degree-of-freedom nonlinear dynamics model of the AUV, Equation (3), is parameterized by the AUV's hydrodynamic coefficients. According to Equations (2) and (3), the AUV speed is related to the acceleration, angle, angular velocity, and thruster thrust. The thrust, in turn, is related to the thrusters' rotational speed and motor current, which are obtained from sensor measurements and are used later in this study to estimate the AUV speed. The equipment used to obtain these data is described next.
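A minimal sketch of the rigid-body part of the dynamics, assuming the standard form $M\dot{v} + C(v)v = \tau$ with velocity vector $[u, v, r]^\top$ (surge, sway, yaw rate), a diagonal inertia matrix, the centre of gravity at the body origin, and a symmetric twin-thruster layout. It omits the hydrodynamic terms that carry the hydrodynamic coefficients, so it is not the paper's Equation (3); the mass, yaw inertia, and thruster spacing are hypothetical values. It shows that equal thrust drives only surge, while a thrust difference drives the yaw rate.

```python
import numpy as np


def thrust_to_tau(t_port, t_stbd, b):
    """Map port/starboard thrust (N) to [X, Y, N] for two symmetric stern thrusters."""
    x = t_port + t_stbd               # axial (surge) force
    y = 0.0                           # no lateral thruster assumed
    n = (t_port - t_stbd) * b / 2.0   # yaw moment; each thruster sits b/2 off the centreline
    return np.array([x, y, n])


def coriolis_rb(nu, m):
    """3-DOF rigid-body Coriolis-centripetal matrix with the centre of gravity at the origin."""
    u, v, _ = nu
    return np.array([
        [0.0, 0.0, -m * v],
        [0.0, 0.0, m * u],
        [m * v, -m * u, 0.0],
    ])


def step(nu, tau, m, iz, dt):
    """One explicit-Euler step of M nu_dot + C(nu) nu = tau (no hydrodynamic damping)."""
    mass_matrix = np.diag([m, m, iz])
    nu_dot = np.linalg.solve(mass_matrix, tau - coriolis_rb(nu, m) @ nu)
    return nu + dt * nu_dot


# Hypothetical parameters: 80 kg vehicle, 10 kg m^2 yaw inertia, 0.3 m thruster spacing.
nu = np.zeros(3)                                   # [u, v, r] = surge, sway, yaw rate
tau = thrust_to_tau(t_port=20.0, t_stbd=20.0, b=0.3)
for _ in range(100):
    nu = step(nu, tau, m=80.0, iz=10.0, dt=0.1)
print(nu)   # equal thrust: only surge builds up (no damping, so it keeps growing)
```

Because damping is left out, surge grows without bound under constant thrust; the hydrodynamic terms are what make the real model settle at a cruise speed.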
The XH R300 consists of a signal cabin, a control cabin, a power control cabin, and a power operation cabin. Its primary sensors and communication devices include a GPS module, Iridium satellite communication, radio, Wi-Fi, an INS, a DVL, and a depth gauge, which provide the AUV's position, acceleration, angle, and angular velocity. By function, the main control system is divided into a control unit, a navigation and positioning unit, a guidance and planning unit, a perception unit, a fault detection unit, and a data storage unit. The navigation and positioning unit acquires the AUV's pose in real time and underpins the operation of the control unit and the guidance and planning unit. The GPS module provides real-time, precise latitude and longitude while the AUV operates on the water surface, as detailed in Table 1. However, because GPS signals attenuate rapidly in water, the XH R300 incorporates an INS (detailed in Table 2), which derives the AUV's position, velocity, and triaxial attitude angles by integrating the angular rates measured by the gyroscopes and the triaxial accelerations measured by the accelerometers. This integration inevitably accumulates errors in the INS and degrades navigation accuracy. To correct these errors, the XH R300 is equipped with a Teledyne Pathfinder 600 kHz DVL, described in Table 3. The DVL emits acoustic pulses through a phased-array transducer; the pulses reflect off the seabed, and the Doppler frequency shift of the received echoes yields the velocity relative to the seafloor. When GPS signals are unavailable underwater, the difference between the INS velocity and the DVL velocity is fed back to correct the INS output through indirect filtering.
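A minimal sketch of the velocity-difference feedback, assuming a filter whose state is only the three-axis INS velocity error with an identity measurement matrix. The filter derived in Section 3 may have a larger state and includes a prediction step, which is omitted here; the noise covariances and velocities are hypothetical values. During a prolonged DVL outage, the same update would use the model-predicted speed in place of the DVL velocity.

```python
import numpy as np


def velocity_error_update(x, p, v_ins, v_ref, r):
    """Kalman measurement update for a 3-state INS velocity-error filter.

    x: prior velocity-error estimate (3,), p: its covariance (3, 3)
    v_ins: INS velocity (3,), v_ref: reference velocity, e.g. DVL bottom track (3,)
    r: measurement noise covariance (3, 3)
    """
    h = np.eye(3)                      # measurement observes the velocity error directly
    z = v_ins - v_ref                  # INS minus reference velocity
    s = h @ p @ h.T + r                # innovation covariance
    k = p @ h.T @ np.linalg.inv(s)     # Kalman gain
    x_new = x + k @ (z - h @ x)
    p_new = (np.eye(3) - k @ h) @ p
    return x_new, p_new


x0 = np.zeros(3)
p0 = np.eye(3) * 0.25                  # hypothetical prior uncertainty, (m/s)^2
r = np.eye(3) * 0.01                   # hypothetical DVL noise, (m/s)^2
v_ins = np.array([1.60, 0.05, 0.00])
v_dvl = np.array([1.50, 0.00, 0.00])
x1, p1 = velocity_error_update(x0, p0, v_ins, v_dvl, r)
v_corrected = v_ins - x1               # feed the estimated error back into the INS velocity
print(v_corrected)
```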
However, when the water depth exceeds the DVL's operating range or the seabed slope is steep, the DVL's acoustic pulses may fail to reach the seabed or their echoes may go undetected, rendering the bottom-track data invalid and preventing its fusion with the INS for high-precision navigation. The DVL can also provide water-track velocities, but these are measured relative to the surrounding water, which moves with the current, so they are notably less accurate than bottom-track velocities and do not meet stringent navigation accuracy requirements. This paper proposes a solution to these challenges, described in Section 4. Finally, an ISD4000 depth gauge developed by Impact Subsea is integrated into the XH R300 for precise depth measurement.
4. Deep Learning Navigation Architecture
The AUV state data are time series with significant temporal correlations. Previous studies on DVL anomalies often treated the sensor data at each moment in isolation, neglecting these correlations. Furthermore, not all moments are equally important for predicting subsequent states. This section describes a deep learning network architecture designed to account for both considerations.
4.1. Basic LSTM Principles
Deep learning has become a ubiquitous tool across many domains, and new network architectures continue to demonstrate strong performance in practical applications. Among them, Recurrent Neural Networks (RNNs) are widely used for time-series prediction and natural language processing because they handle sequential data well. Because AUV sensor data are inherently time series, RNNs are a natural choice for AUV navigation tasks. However, conventional RNNs struggle to retain long-term dependencies: the influence of information diminishes as it recedes from the current moment. This limitation stems from Backpropagation Through Time (BPTT), used during training, in which the gradients associated with distant moments gradually vanish, leaving conventional RNNs unable to handle long-term dependencies [25].
LSTM networks [26] were introduced to mitigate vanishing gradients and to model long-term dependencies effectively. LSTM is a specialized variant of RNN designed to tackle the gradient instability encountered when training on sequences with long time spans. Gating units control the flow of information within the network, which stabilizes parameter optimization, and the $\tanh$ function serves as the activation when computing the candidate memory cell and the hidden state. The LSTM architecture, depicted in Figure 3, incorporates memory cells and several gating mechanisms that regulate the flow of information. At each time step, the current input $x_t$ and the previous hidden state $h_{t-1}$ are fed into three fully connected layers with sigmoid activations, which compute the input, forget, and output gates $i_t$, $f_t$, and $o_t$:

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi}x_t + W_{hi}h_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_{xf}x_t + W_{hf}h_{t-1} + b_f\right)\\
o_t &= \sigma\left(W_{xo}x_t + W_{ho}h_{t-1} + b_o\right)
\end{aligned}
$$
where $\sigma$ is the sigmoid function, $W_{xi}$, $W_{xf}$, $W_{xo}$ and $W_{hi}$, $W_{hf}$, $W_{ho}$ are weight parameters, and $b_i$, $b_f$, $b_o$ are bias parameters. The candidate memory cell $\tilde{c}_t$ is calculated in the same way as the gates but with the $\tanh$ function as the activation function. At time $t$ it is

$$\tilde{c}_t = \tanh\left(W_{xc}x_t + W_{hc}h_{t-1} + b_c\right)$$

where $W_{xc}$ and $W_{hc}$ are weight parameters and $b_c$ is a bias parameter. Next, the memory cell is computed, using the input and forget gates to determine how much new information from the candidate memory cell is incorporated and how much past information is retained. This additive update mitigates the vanishing gradient problem and helps capture long-term dependencies in the time series. The memory cell is computed as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\odot$ denotes element-wise multiplication.
Finally, the hidden state $h_t$ is computed from the output gate and the memory cell. When the output gate is close to 1, the memorized information is passed on to the hidden state and hence to the prediction; when it is close to 0, the information is retained only within the memory cell and is not passed to the hidden state:

$$h_t = o_t \odot \tanh\left(c_t\right)$$
LSTM has found extensive utility in natural language processing owing to its adeptness in handling long-term dependencies. The proposed model leverages LSTM to process time-series data, with the output of the LSTM layer serving as input to the subsequent attention mechanism layer, as elaborated upon in subsequent sections.
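The gate, candidate, cell, and hidden-state updates above can be sketched in NumPy as follows. This is an illustrative forward pass with random, untrained weights; the input width (three acceleration axes), hidden size, and 50-step window are hypothetical choices, and a practical implementation would use a deep learning framework instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params holds W_x*, W_h*, b_* for gates i, f, o and candidate c."""
    gates = {}
    for g in ("i", "f", "o"):
        gates[g] = sigmoid(params[f"W_x{g}"] @ x_t + params[f"W_h{g}"] @ h_prev + params[f"b_{g}"])
    c_tilde = np.tanh(params["W_xc"] @ x_t + params["W_hc"] @ h_prev + params["b_c"])
    c_t = gates["f"] * c_prev + gates["i"] * c_tilde   # keep part of the old memory, add new
    h_t = gates["o"] * np.tanh(c_t)                    # expose memory through the output gate
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    params = {}
    for g in ("i", "f", "o", "c"):
        params[f"W_x{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        params[f"W_h{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        params[f"b_{g}"] = np.zeros(n_hidden)
    return params


def lstm_encode(sequence, params, n_hidden):
    """Run an LSTM over a (T, n_in) sequence and return all hidden states (T, n_hidden)."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=8, rng=rng)
accel_window = rng.normal(size=(50, 3))    # synthetic stand-in for 5 s of 10 Hz acceleration
hidden_states = lstm_encode(accel_window, params, n_hidden=8)
print(hidden_states.shape)                 # (50, 8)
```

The full sequence of hidden states is returned because the Self-Attention layer described next operates on the encoder outputs rather than only on the last state.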
4.2. Self-Attention Mechanism
The Self-Attention mechanism computes each output as a weighted combination of all positions in the input sequence, so that the network considers the overall context while emphasizing the most salient features. In time-series data, the information at a given moment often depends on preceding moments, but the strength of this dependence varies from moment to moment. Self-Attention therefore learns weights that emphasize the most relevant moments when encoding the sequence.
The computational process of the Self-Attention mechanism is illustrated in Figure 4. The input $X$ is multiplied by three weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$ to obtain $Q$, $K$, and $V$, respectively. $Q$ and $K$ are then used to compute the correlations between the input vectors, typically through dot products, and the result is normalized with the SoftMax function to obtain the attention matrix $A$. Finally, $A$ is multiplied by $V$ to produce the output of the Self-Attention layer:

$$Q = XW^{Q},\qquad K = XW^{K},\qquad V = XW^{V},\qquad A = \operatorname{SoftMax}\left(QK^{\top}\right),\qquad \text{output} = AV$$
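The steps above can be sketched in NumPy as follows. The text does not say whether the dot products are scaled by $\sqrt{d_k}$ as in the Transformer's scaled dot-product attention, so the `scale` flag makes that choice explicit; it is an assumption, as are the sizes and the random weights.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v, scale=True):
    """x: (T, d_model). Returns the layer output (T, d_v) and attention matrix A (T, T)."""
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    scores = q @ k.T                            # pairwise dot products between time steps
    if scale:
        scores = scores / np.sqrt(k.shape[-1])  # optional scaled dot-product variant
    a = softmax(scores, axis=-1)                # each row of A sums to 1
    return a @ v, a


rng = np.random.default_rng(1)
x = rng.normal(size=(6, 8))                     # e.g. 6 LSTM hidden states of size 8
w_q, w_k, w_v = (rng.normal(0.0, 0.3, (8, 8)) for _ in range(3))
out, a = self_attention(x, w_q, w_k, w_v)
print(out.shape, a.sum(axis=1))
```

Row $t$ of $A$ shows how strongly the encoding of moment $t$ draws on every other moment in the window.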
4.3. The Deep Learning Navigation Framework Based on Self-Attention
In the complex marine environment, AUV navigation and localization rely predominantly on the INS and DVL. However, the DVL may produce invalid readings under certain conditions; encountering a school of fish, for example, invalidates the data for a short time. Prolonged DVL invalidity occurs in ultra-deep waters or over steep seabed slopes from which no echo returns. Short-term invalidity can be compensated for using kinetic models, but relying on such models for extended durations causes deviations from the actual velocity that impede high-precision navigation and localization.
This section proposes a deep learning navigation framework based on the Self-Attention mechanism to achieve precise navigation over extended periods. The framework adopts an encoder–decoder architecture: sensor data are organized into time-series sequences and encoded by LSTM layers, the encoded sequences are refined by the Self-Attention mechanism, and fully connected layers decode the result together with the water-track velocity.
According to the AUV dynamics model outlined in Section 2, the velocity of the AUV is related to acceleration, angular velocity, angle, thrust [27], and other factors. Acceleration comprises the triaxial accelerations in the instrument coordinate system, angular velocity comprises the triaxial angular rates from the gyroscopes, and angle comprises the pitch and roll angles from the INS. Thrust is indicated indirectly by the rotational speed and motor current of the twin thrusters. Although all of these data are time series, their sampling frequencies differ between sensors; for example, the INS samples at 10 Hz and the thrusters at 2 Hz. Interpolation could resample all data to a common frequency, but a model built this way may be distorted by the artificial samples that upsampling introduces or by the information that downsampling discards. Processing data of different frequencies separately also simplifies preprocessing. In addition, processing each data source separately lets the encoder concentrate on encoding without handling the relationships between sources, which decouples the network functionally and reduces redundant work. Hence, data from sensors with different frequencies are fed into their own LSTM layers. As depicted in Figure 5, the framework employs five LSTM layers to receive acceleration, angular velocity, angle, thruster speed, and thruster current. After the LSTM layers extract and compress the sensor time series into context vectors, their hidden states serve as the input to the Self-Attention layer, which learns the significance of the data at different moments. Finally, the output of the Self-Attention layer and the DVL water-track velocity are fed into the fully connected layer for decoding.
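Feeding each sensor stream to its own LSTM means that a common time window yields sequences of different lengths rather than a resampled grid. A minimal sketch of that windowing step, assuming integer millisecond timestamps and a 5 s window; the window length, stream names, and sample values are hypothetical, and only the 10 Hz and 2 Hz rates come from the text.

```python
from bisect import bisect_right


def window(timestamps_ms, samples, t_end_ms, length_ms):
    """Samples with t_end - length < t <= t_end; timestamps must be sorted ascending."""
    lo = bisect_right(timestamps_ms, t_end_ms - length_ms)
    hi = bisect_right(timestamps_ms, t_end_ms)
    return samples[lo:hi]


def build_inputs(streams, t_end_ms, length_ms):
    """streams: name -> (timestamps_ms, samples). One variable-length sequence per stream."""
    return {name: window(ts, xs, t_end_ms, length_ms) for name, (ts, xs) in streams.items()}


ins_t = [i * 100 for i in range(100)]        # 10 Hz INS stream (acceleration, rates, angles)
thr_t = [i * 500 for i in range(20)]         # 2 Hz thruster stream (speed, current)
streams = {
    "acceleration": (ins_t, [(0.0, 0.0, 9.8)] * len(ins_t)),
    "thruster_speed": (thr_t, [(1200.0, 1200.0)] * len(thr_t)),
}
inputs = build_inputs(streams, t_end_ms=9900, length_ms=5000)
print({name: len(seq) for name, seq in inputs.items()})
```

Integer timestamps avoid floating-point ties at the window edges, so each 10 Hz branch receives exactly 50 samples and each 2 Hz branch 10 samples without any interpolation.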
The encoder–decoder architecture decouples the network, reducing redundancy while making it straightforward to model the correspondence between inputs and outputs. In the encoder stage, the LSTM compresses the sensor time series into context vectors, inevitably losing some information. To compensate, the Self-Attention mechanism enhances the encoding by learning the correlations between input moments. The decoder input consists of the time-series vectors enhanced by the Self-Attention module. Because the output is only the AUV velocity at the current moment rather than a sequence, a linear layer maps the high-dimensional time-series vectors to the low-dimensional output space. To improve generalization, the water-track velocity is encoded by an LSTM and combined with the time-series input to form the final input to the linear layer, which enables the model to learn the embedded sea current information.
The model thus works in two stages. First, LSTM and Self-Attention encode the temporal information into informative time-series vectors, addressing the long-term dependency problem and projecting the inputs into a high-dimensional space from which useful features are extracted. Second, the linear decoder maps these vectors to a low-dimensional output while learning from the sea current information carried by the water-track velocity, which improves the generalizability of the model.
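The data flow through the two stages can be traced with an untrained NumPy forward pass. This is a sketch under several assumptions not stated in the text: each branch's final LSTM hidden state is treated as one attention token, the attended tokens are mean-pooled, and the hidden size, input widths (2 for the twin-thruster speed and current), window lengths, and random weights are hypothetical. It reproduces shapes and wiring, not trained behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16


def lstm_last_hidden(seq, w, b):
    """Compact LSTM: w stacks the i, f, o and candidate weights over [x_t, h_{t-1}]."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x_t in seq:
        z = w @ np.concatenate([x_t, h]) + b
        i, f, o = (1.0 / (1.0 + np.exp(-z[k * HIDDEN:(k + 1) * HIDDEN])) for k in range(3))
        c = f * c + i * np.tanh(z[3 * HIDDEN:])
        h = o * np.tanh(c)
    return h


def attention(tokens, w_q, w_k, w_v):
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    s = q @ k.T / np.sqrt(k.shape[1])
    a = np.exp(s - s.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return a @ v


def make_lstm(n_in):
    return rng.normal(0.0, 0.1, (4 * HIDDEN, n_in + HIDDEN)), np.zeros(4 * HIDDEN)


# One encoder per data source (hypothetical input widths).
branch_dims = {"acceleration": 3, "angular_velocity": 3, "angle": 2,
               "thruster_speed": 2, "thruster_current": 2}
encoders = {name: make_lstm(d) for name, d in branch_dims.items()}
water_encoder = make_lstm(3)                        # water-track velocity branch
w_q, w_k, w_v = (rng.normal(0.0, 0.3, (HIDDEN, HIDDEN)) for _ in range(3))
w_out = rng.normal(0.0, 0.1, (3, 2 * HIDDEN))
b_out = np.zeros(3)


def predict_velocity(windows, water_track_window):
    # Encoder: one context vector per branch, refined by Self-Attention (token layout assumed).
    tokens = np.stack([lstm_last_hidden(windows[n], *encoders[n]) for n in branch_dims])
    context = attention(tokens, w_q, w_k, w_v).mean(axis=0)
    water = lstm_last_hidden(water_track_window, *water_encoder)
    # Decoder: a linear layer maps [context, water-track code] to a 3-axis velocity.
    return w_out @ np.concatenate([context, water]) + b_out


windows = {n: rng.normal(size=(50 if n in ("acceleration", "angular_velocity", "angle") else 10, d))
           for n, d in branch_dims.items()}
v_hat = predict_velocity(windows, rng.normal(size=(5, 3)))
print(v_hat.shape)   # (3,)
```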
4.4. Integrated Navigation Framework
With the Self-Attention-based deep learning navigation model constructed as described in the previous subsection, the collected INS, DVL, and thruster data are split into two paths while the DVL operates normally: one feeds the integrated navigation filter that computes the AUV position, and the other is used to train and optimize the network parameters for AUV speed estimation. The Pathfinder DVL outputs a validity flag with each bottom-track velocity: a flag of A indicates that the measured bottom-track velocity is valid, and any other flag indicates that it is invalid. Consequently, no further fault detection is needed, and during a short DVL failure the integrated navigation compensates for the AUV velocity using the water-track velocity. To detect prolonged DVL invalidity, the time $\Delta t$ since the last valid flag is tracked; when $\Delta t$ exceeds 10 s, the DVL bottom-velocity measurement is considered to have been invalid for a long time. The data from the corresponding sensors are then fed into the AUV speed estimation model to predict the current AUV speed, and the predicted speed is subtracted from the INS speed to form the measurement for optimal estimation. During training and prediction, the sensor data must be saved at a fixed time interval; this interval follows the DVL's water-track measurement interval, typically 1 s. The specific framework is shown in Figure 6.
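The switching logic described above can be sketched as follows. The flag value "A", the 10 s threshold, and the INS-minus-reference measurement come from the text; the class and function names, the "V" flag used for invalid samples in the example, the stub model, and the choice to treat the period before the first valid bottom lock as a short outage are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]
LONG_OUTAGE_S = 10.0   # threshold given in the text


@dataclass
class DvlSample:
    t: float             # timestamp, s
    bottom_flag: str     # bottom-track validity flag; "A" means valid
    bottom_vel: Vec3     # bottom-track velocity, m/s
    water_vel: Vec3      # water-track velocity, m/s


class ReferenceSelector:
    """Hypothetical helper: picks the reference velocity for the INS/DVL filter."""

    def __init__(self) -> None:
        self.last_valid_t: Optional[float] = None

    def select(self, sample: DvlSample, predict: Callable[[], Vec3]) -> Tuple[str, Vec3]:
        if sample.bottom_flag == "A":
            self.last_valid_t = sample.t
            return "bottom_track", sample.bottom_vel
        # Assumption: before the first valid bottom lock, treat the outage as short.
        outage = 0.0 if self.last_valid_t is None else sample.t - self.last_valid_t
        if outage > LONG_OUTAGE_S:
            return "model", predict()          # speed-estimation network output
        return "water_track", sample.water_vel


def measurement(v_ins: Vec3, v_ref: Vec3) -> Tuple[float, ...]:
    """Filter measurement: INS velocity minus the selected reference velocity."""
    return tuple(a - b for a, b in zip(v_ins, v_ref))


def stub_model() -> Vec3:
    return (1.48, 0.02, 0.0)   # stand-in for the trained speed-estimation model


selector = ReferenceSelector()
for t, flag in [(0.0, "A"), (5.0, "V"), (12.0, "V")]:
    sample = DvlSample(t, flag, (1.50, 0.0, 0.0), (1.30, 0.10, 0.0))
    source, v_ref = selector.select(sample, stub_model)
    print(t, source, measurement((1.60, 0.05, 0.0), v_ref))
```

Keeping the selection separate from the filter means the Kalman update itself is unchanged across the three modes; only the reference velocity that forms the measurement differs.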
6. Conclusions
This study proposes a deep learning model that uses acceleration, angle, angular velocity, and thruster speed as inputs to estimate AUV speed. LSTM extracts features from these time series, and Self-Attention enhances the encoding to address long-term dependencies. Ocean current information, embedded in the water-track velocity, is encoded separately and used to improve the network's generalization. Experimental results on sea trial data show that the deep learning-based speed estimation model outperforms direct compensation with the water-track velocity, achieving higher speed accuracy and meeting the demand for high-precision integrated navigation during persistent DVL failures. This research can also be extended to scenarios with strong ocean currents, sharp turns, or muddy conditions that severely reduce DVL accuracy, further improving the reliability of the integrated navigation system.
Although the proposed method performs better in most cases, the accuracy of the speed estimation model may deteriorate as the accuracy of the bottom-track velocities used to train it declines; this warrants further investigation.