Prediction of Sound Speed Profiles Under Disturbance of Strong Internal Solitary Waves Using Bidirectional Long Short-Term Memory Network

Yin, Hong; Qu, Ke; Wang, Han; Li, Guangming

doi:10.3390/jmse14080735


¹ College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang 524088, China

² Naval Research Institute, No.3 Wanshou Road, Haidian District, Beijing 100841, China

³ National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(8), 735; https://doi.org/10.3390/jmse14080735

Submission received: 19 March 2026 / Revised: 4 April 2026 / Accepted: 13 April 2026 / Published: 15 April 2026

(This article belongs to the Section Ocean Engineering)


Abstract

Time-series machine learning models such as long short-term memory (LSTM) networks offer a fast, low-cost way to obtain high-precision sound speed profiles (SSPs) and can meet the practical requirements of underwater sonar systems. However, in sea areas where strong internal solitary waves occur frequently, the large-amplitude sound speed anomalies they cause act as strong outlier features that seriously interfere with model learning, sharply reducing SSP prediction accuracy and degrading the generalization stability and robustness of the model. To address this problem, this paper proposes a time-series SSP prediction method based on a bidirectional long short-term memory (Bi-LSTM) network. First, Empirical Orthogonal Function (EOF) decomposition is used to obtain a low-dimensional feature representation of the SSPs; the bidirectional time-series feature capture capability of the Bi-LSTM is then used to predict SSP sequences containing large disturbances caused by strong internal solitary waves. Multiple groups of comparative experiments based on measured temperature chain data from the continental slope area of the South China Sea show that the Bi-LSTM model significantly improves prediction accuracy and robustness compared with the classical LSTM model. In particular, the Bi-LSTM model with EOF decomposition achieves a correlation coefficient of 0.995 and an average Root Mean Square Error (RMSE) as low as 0.387 m/s. Under internal solitary wave disturbance, the classical LSTM struggles to capture the large abrupt changes in sound speed, whereas the proposed Bi-LSTM model still predicts the SSP accurately in the disturbed section and provides both feature recognition and evolution prediction capabilities for the strongly nonlinear internal solitary wave process. This method provides effective technical support for the rapid and large-scale reconstruction of the sound speed field under the disturbance of strong internal solitary waves.

Keywords:

sound speed profile (SSP); deep learning; bidirectional long short-term memory network (Bi-LSTM); internal solitary waves

1. Introduction

The sound speed profile (SSP) is a key factor for studying the characteristics of underwater acoustic propagation in the ocean, which is affected by seawater temperature, salinity and pressure [1,2]. In the fields of military anti-submarine operations, marine resource exploration and underwater navigation, real-time and accurate acquisition of SSPs is the core to ensure the operational efficiency of sonar systems. Traditionally, the main approaches to obtain SSPs rely on in situ direct measurements, such as real-time data collection using a Sound Velocity Profiler (SVP) [3], or indirect calculation via empirical formulas based on data measured by a Conductivity-Temperature-Depth (CTD) profiler [4]. Although these methods can provide relatively reliable data, their inherent limitations are evident: measurement operations consume substantial human and material resources, and the ship-borne point-by-point measurement mode has low efficiency and limited coverage range, which is far from meeting the demands of modern marine scientific research for large-scale, long-term and high-resolution marine environmental big data. This fundamental contradiction has driven the paradigm shift in SSP acquisition technology from direct measurement to indirect inversion and temporal prediction.

Accordingly, SSP inversion and reconstruction technologies have developed rapidly over the past few decades. Early researchers used instruments such as CTD and Expendable Bathythermograph (XBT) for direct measurement [5,6], which had high costs, failed to acquire deep-sea data and were incapable of high-density observations over a wide range. Subsequently, He et al. used Empirical Orthogonal Function (EOF) to reduce the dimensionality of historical SSPs and characterize the profiles with a small number of coefficients, thus reducing inversion parameters, but this method required a large volume of historical profile data to construct basis functions [7,8,9]. Qu et al. proposed the Hydrodynamic Mode Basis (HMB) with explicit physical interpretability, which reduced the dependence on in situ measured data and realized the innovation of the HMB method [10]. Hawe et al. conducted pioneering work on tensor dictionary learning, extending dictionary learning to multilinear spaces and providing a new paradigm for the design of basis functions for high-dimensional data [11,12,13]. Sun et al. introduced optimization technologies such as genetic algorithms (GAs) and surrogate models to improve inversion efficiency and accuracy, achieving intelligent optimization and inversion, yet this improvement in efficiency may come at the cost of losing physical interpretability [14,15]. Zhou et al. reconstructed the spatial sound speed field by combining a small amount of in situ measurements with historical data or ocean models, realizing sparse measurement and data assimilation, but the method suffered from high computational costs [16]. Sun first analyzed the historical SVP data in the measurement area using EOF and extracted the modal vectors (basis functions) that characterize the spatiotemporal variation features of sound speed in the region. 
On this basis, a fitness function was constructed to quantify the degree of seafloor topographic distortion caused by sound speed errors in multibeam bathymetric data, and a genetic algorithm was adopted for the global optimization of EOF reconstruction coefficients, ultimately achieving high-precision inversion of SSPs [17,18,19,20]. This method exhibits better practicality compared with some traditional inversion methods with a limited application scope or insufficient accuracy. Despite the ability of traditional methods to perform SSP prediction and inversion, they have certain limitations, such as restricted applicability and inadequate accuracy of the obtained results.

Over the past few decades, ocean sound speed analysis and prediction technologies have always been a research hotspot in the fields of marine science and acoustics. Affected by the coupled effects of multiple factors including temperature, salinity and pressure differences, internal wave disturbance, and ocean current movement, accurate prediction of ocean SSPs has become a major challenge. For a long time, ocean numerical simulation has been the core solution, but it has the limitations of intensive computation and low efficiency. In contrast, the measured ocean time-series data contain abundant information on marine dynamic correlations, and deep learning models can efficiently mine the inherent features and physical laws within such data, providing a reliable path for accurate SSP prediction. With the development of computer technology and the advent of the big data era, neural network models have been gradually applied to the field of SSP prediction and achieved numerous results. Yan et al. constructed an LSTM neural network model based on transfer learning and validated it using long-term observational data from two temperature chains measured in the South China Sea; however, traditional LSTM only considers forward temporal propagation and neglects backward dependencies, while Bi-LSTM can capture both forward and backward dependencies in sound speed evolution simultaneously [21,22,23,24]. Wang et al. used recurrent neural networks (RNNs) to learn historical hydrological data and invert time-varying SSPs in shallow-sea environments, but the model has poor adaptability to the complex structures of the deep sea [25,26]. Yuan et al. innovatively combined Spatio-Temporal LSTM (ST-LSTM) with a self-attention mechanism to construct the ST-LSTM-SA model, realizing accurate and real-time prediction of the sound speed field, yet this model is characterized by extremely high complexity, high computational cost and stringent data requirements [27]. J. Lu et al. 
used a Hierarchical LSTM (H-LSTM) neural network to explore the distribution patterns of sound speed in the temporal dimension, but the model failed to predict extreme data with large fluctuations [28,29,30]. Shen et al. used EOF to represent shallow-water SSPs, but the linear method has limited capability to characterize the strong nonlinear variations in the deep sea [31]. These studies have verified the feasibility of deep learning models for SSP prediction, but there are still three core unsolved problems in existing research, which constitute the core research motivation of this paper:

Most existing SSP prediction models are designed for stable marine environments and fail to address the severe interference from strong internal solitary waves (ISWs). In ISW-frequent areas like the continental slope of the South China Sea, ISWs cause sound speed anomalies up to 82 m/s, which act as strong outliers and drastically degrade the accuracy and generalization of traditional models. Accurate SSP prediction under ISW disturbance remains a critical engineering challenge.
State-of-the-art high-precision models (e.g., attention-based spatiotemporal models) suffer from high complexity, large parameter size and heavy computational cost. They are prone to overfitting with limited ISW observation samples and cannot be easily deployed on low-computing underwater platforms such as buoys and UUVs, limiting their practical application.

To address the SSP prediction problem, this study proposes an innovative method that fuses a Bi-LSTM network with a classical sound speed calculation model [32,33]. Using the time-series SSP data calculated from measured CTD data, the sound speed field is decoupled into spatial modes and temporal coefficients via EOF decomposition, and only the first five principal components are needed to retain the dominant features of the SSP, which significantly reduces the prediction dimensionality. The Bi-LSTM network is innovatively introduced for dynamic modeling of temporal coefficients, and the accuracy and superiority of the Bi-LSTM network are verified by comparing the prediction accuracy of different models. This model solves the vanishing gradient problem in the processing of time-series data, and its performance is significantly better than that of traditional unidirectional and linear models in most tasks. It has stronger long-range dependency modeling capability, high flexibility and generalizability, as well as excellent cost effectiveness. With its bidirectional temporal feature extraction capability, the model effectively captures the historical dependencies in sound speed evolution, breaking through the limitations of traditional linear methods. The lightweight prediction model constructed in this study realizes high-precision reconstruction of the vertical structure of SSPs while ensuring computational efficiency. The SSP data predicted by Bi-LSTM can be input into the acoustic field calculation model to derive the acoustic field, which enables the performance evaluation of sonar systems and combat effectiveness prediction for military anti-submarine operations. It can also assist relevant researchers in selecting the optimal deployment depth, conducting environmental adaptability analysis, and designing underwater communication and network systems. 
In addition, this model can predict internal solitary waves, which can help avoid damage to marine instrumentation and equipment and reduce the risk to operating personnel.

2. Materials and Methods

2.1. Data Source and Preprocessing

The core data source of this study is the observed hydrological data from the target sea area of the South China Sea (21°55.193′ N, 117°35.088′ E), and the study water depth range is limited to 0~135 m. This target water layer is located in the northeastern continental shelf area of the South China Sea, which is significantly regulated by internal waves, internal tides and local circulation processes introduced from the Luzon Strait. The vertical and horizontal distribution of seawater temperature and salinity has strong spatiotemporal heterogeneity, which is the layer with the most intense SSP disturbance, and its complex dynamic environment brings significant challenges to the accurate prediction of SSPs.

Considering the special topographic conditions and hydrodynamic characteristics of the South China Sea, together with the verification requirements of Argo float data [33,34,35], the temperature chain data selected in this study cover the key water layer of 34~134 m, with 11 sampling stations deployed at characteristic depths. The specific depths are 34 m, 44 m, 54 m, 64 m, 74 m, 84 m, 94 m, 104 m, 114 m, 124 m and 134 m, enabling refined capture of the temperature distribution in this water layer. The data dimensions are time, depth, salinity and temperature. Data are collected at equal time intervals with a temporal resolution of one data point per minute, and the total observation duration is approximately 300 h. This dataset can effectively capture the spatiotemporal evolution of temperature driven by mesoscale oceanographic processes in the region, thus providing reliable basic temperature data for the accurate inversion of SSPs. In this paper, the empirical formula proposed by Del Grosso is adopted to calculate the SSP from data measured by a Conductivity-Temperature-Depth (CTD) profiler (Software Versions 1.10 and 1.5), which provides fundamental data support for subsequent marine acoustic research [36,37].
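As a rough illustration of how a sound speed profile can be computed from temperature, salinity and depth, the sketch below uses the nine-term Mackenzie (1981) equation. This is a deliberately swapped-in formula, not the Del Grosso formula adopted in this paper, whose coefficients are not reproduced here. The function name, the temperature values and the constant salinity of 34.5 are hypothetical and serve only to show the shape of the calculation.

```python
def sound_speed_mackenzie(temp_c, salinity_psu, depth_m):
    """Mackenzie (1981) nine-term sound speed equation, result in m/s.

    NOTE: illustrative substitute; the paper itself uses the Del Grosso formula.
    """
    t, s, d = temp_c, salinity_psu, depth_m
    return (1448.96 + 4.591 * t - 5.304e-2 * t**2 + 2.374e-4 * t**3
            + 1.340 * (s - 35.0) + 1.630e-2 * d + 1.675e-7 * d**2
            - 1.025e-2 * t * (s - 35.0) - 7.139e-13 * t * d**3)

# The 11 sampling depths named in the text: 34 m to 134 m in 10 m steps.
depths = list(range(34, 135, 10))

# Hypothetical temperatures (deg C) and salinity, for illustration only.
temps = [26.0, 24.5, 23.0, 21.5, 20.0, 19.0, 18.0, 17.2, 16.5, 16.0, 15.5]
salinity = 34.5

# One sound speed value per depth gives a single profile (one time sample).
profile = [sound_speed_mackenzie(t, salinity, z) for t, z in zip(temps, depths)]
for z, c in zip(depths, profile):
    print(f"{z:4d} m  {c:8.2f} m/s")
```

Repeating this calculation for every one-minute sample would yield the depth-by-time sound speed matrix that the later sections decompose.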

Figure 1 shows the variation in the global SSP with water depth. The sound speed of the entire sequence exhibits a regular distribution, with a distinct stratified water column structure observed in this sea area. Specifically, the sound speed values follow a regular pattern with increasing water depth as follows: low sound speed values below 1500 m/s are mainly concentrated in the water layer deeper than 100 m, while high sound speed values above 1510 m/s are predominantly distributed in the upper ocean layer above 60 m. The global sound speed in the dataset ranges from 1427.2 m/s to 1520.2 m/s, with an average value of 1499.3 m/s.

The sound speed structure in the dataset displays obvious periodic variations, which are mainly governed by tidal movements. Under the combined action of the semidiurnal tidal cycle (approximately 12.4 h) and the diurnal tidal cycle (approximately 24.8 h), the sound speed isopleths exhibit a regular fluctuation pattern, and this feature is particularly pronounced in the upper and middle water layers. Meanwhile, distinct characteristics of internal solitary waves are clearly identifiable in Figure 1: there are obvious inclined sound speed isopleth structures, which serve as direct evidence for the propagation of internal solitary waves. When an internal solitary wave passes through the study area, the sound speed isopleths tilt significantly, and the wave form stretches gradually from shallow to deep water, exhibiting typical characteristics of nonlinear internal waves. The maximum amplitude of sound speed variation in the dataset is approximately 82 m/s at a water depth of 84 m, which indicates the occurrence of an extreme-intensity internal marine disturbance. Such large-amplitude sound speed fluctuations induce outliers in the time series, thereby increasing the difficulty of time-series prediction for SSPs.

In the preprocessing stage, targeted quality optimization is conducted on the raw data to eliminate invalid samples and improve prediction accuracy. After quality control, the dataset is divided into a training set (60%) and a test set (40%) in chronological order (as shown in Figure 2). The training set is used for model construction and parameter learning, while the test set is exclusively utilized for final performance evaluation to ensure the objectivity of verifying the model’s generalization ability.

2.2. EOF Decomposition

First, the mean sound speed profile is computed from the training set only, i.e., the first 60% of the quality-controlled record in chronological order (Section 2.1), as follows:

\bar{S} (z) = \frac{1}{N_{t r a i n}} \sum_{t = 1}^{N_{t r a i n}} S (z, t)

(1)

where

S (z, t)

denotes the sound speed value at depth

z

and time

t

,

N_{t r a i n}

is the number of time points in the training set, and

\bar{S} (z)

is the mean sound speed profile used for the centralization of both the training set and the test set.

Subsequently, centralization processing is performed on both datasets to obtain de-meaned sound speed profiles. To avoid data leakage and maintain data consistency, the mean profile

\bar{S} (z)

calculated from the training set in Equation (1) is applied to both the training set and the test set, rather than computing separate means for each set. The de-meaned operation can be expressed as follows:

{S^{'}}_{train} (z, t) = S_{train} (z, t) - \bar{S} (z)

for the training set, and

{S^{'}}_{test} (z, t) = S_{test} (z, t) - \bar{S} (z)

for the test set.
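The chronological split and the shared-mean centering described above can be sketched as follows. The synthetic matrix `S`, its size and the variable names are assumptions made for illustration; only the 60/40 split, the 11 depths and the use of the training mean for both subsets come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_depth, n_time = 11, 1000                      # hypothetical record length
# Synthetic stand-in for the depth x time sound speed matrix S(z, t).
S = 1500.0 + rng.normal(0.0, 5.0, size=(n_depth, n_time))

# Chronological 60/40 split: no shuffling, the test set is strictly later.
n_train = (6 * n_time) // 10
S_train, S_test = S[:, :n_train], S[:, n_train:]

# Eq. (1): mean profile from the training set only.
S_mean = S_train.mean(axis=1, keepdims=True)

# The same training mean is removed from both subsets to avoid data leakage.
S_train_c = S_train - S_mean
S_test_c = S_test - S_mean
```

Because the test mean is never computed, no statistic from the later part of the record reaches the training stage.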

Then, Singular Value Decomposition (SVD) is conducted on the centralized matrix

S^{'}

:

S^{'} = U Σ V^{T}

(2)

where

U \in R^{11 \times 11}

is the left singular vector matrix, representing the EOF spatial modes;

Σ \in R^{11 \times 11}

is the diagonal singular value matrix; and

V \in R^{N_{t r a i n} \times 11}

is the right singular vector matrix.

For mode selection and coefficient calculation, the variance contribution rate of each EOF mode is computed. The variance contribution rate of the

i

-th mode is given by

η_{i} = \frac{σ_{i}^{2}}{\sum_{j = 1}^{11} σ_{j}^{2}} \times 100\%

(3)

where

σ_{i}

denotes the

i

-th singular value. The first 5 EOF modes are selected based on the cumulative variance contribution rate, which reaches 97.5%, demonstrating that the truncation effectively retains the primary spatial variation characteristics of the sound speed profiles (SSPs).
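A minimal NumPy sketch of Equations (2) and (3) is given below. The centred matrix is synthetic, so the printed percentages will not match the values reported in this paper. Projecting the centred data onto the retained modes to obtain temporal coefficients is the standard EOF convention and is assumed here, since the text does not spell out that step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_depth, n_train = 11, 600                      # hypothetical training length
# Synthetic centred matrix S' (depth x time) with decaying energy per depth.
scale = np.linspace(5.0, 0.5, n_depth)[:, None]
S_c = rng.normal(size=(n_depth, n_train)) * scale
S_c -= S_c.mean(axis=1, keepdims=True)

# Eq. (2): S' = U Sigma V^T. Columns of U are the EOF spatial modes.
U, sigma, Vt = np.linalg.svd(S_c, full_matrices=False)

# Eq. (3): variance contribution rate of each mode, in percent.
eta = sigma**2 / np.sum(sigma**2) * 100.0
cumulative = np.cumsum(eta)
print("cumulative variance of first 5 modes (%):", cumulative[4])

# Keep the first 5 modes and project to get temporal coefficients (assumption).
n_modes = 5
Phi = U[:, :n_modes]                            # 11 x 5 mode matrix
coeffs = Phi.T @ S_c                            # 5 x n_train coefficient series
```

The rows of `coeffs` are the low-dimensional time series that a sequence model would be trained on.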

In the data preprocessing stage, this study adopts the Empirical Orthogonal Function (EOF) decomposition method to achieve dimensionality reduction and feature extraction of sound speed profile (SSP) data. The core principle of EOF decomposition is to deconstruct the high-dimensional SSP data field into a linear combination of spatial modes and temporal coefficients through spatiotemporal separation technology, where the spatial modes, as basis functions characterizing the spatial distribution features of the data, can effectively extract the dominant variation patterns of the SSP field.

Specifically, the spatial modes of each order obtained by EOF decomposition are sorted in descending order of variance contribution rate. Each order of mode corresponds to an independent and orthogonal spatial variation pattern; modes with lower orders capture a higher proportion of energy, representing the most core spatial structural features of the data field. Table 1 presents the first five orders of EOF spatial modes and their variance contributions obtained after SSP decomposition (the five-order EOF reconstruction is based on a comprehensive consideration of “maximizing information coverage, optimizing model complexity, and ensuring the integrity of physical significance”, providing an ideal feature input for subsequent deep learning prediction): the variance contribution rate of the first-order EOF mode is as high as 66.1% (reflecting the overall variation in the sound speed vertical gradient), which alone characterizes the primary spatial distribution features of the data field; the second-order mode has a variance contribution rate of 18.1% (characterizing the spatial variation in thermocline intensity); higher-order modes capture finer structures, and the cumulative variance contribution rate of the first five modes is further increased to 95%.

2.3. LSTM Neural Network

The long short-term memory (LSTM) network is a special recurrent neural network (RNN) structure, whose core lies in the introduction of a gating mechanism to address the vanishing gradient problem of traditional RNNs. It features memory capability and parameter sharing and thus possesses unique advantages in experiments involving the learning and modeling of nonlinear sequence features, excelling at solving the long-term dependency problem in time series. A standard LSTM unit comprises the following three gate structures: the forget gate

f_{t}

, the input gate

i_{t}

, the output gate

o_{t}

, and a cell state

C_{t}

. Here,

σ

denotes the sigmoid function, ⊙ represents the Hadamard product, W is the weight matrix, and b is the bias term.

The forget gate regulates the degree of historical information retention and generates a forgetting coefficient in the range of [0, 1] through the sigmoid function:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(4)

Each element of the forgetting coefficient falls within the range [0, 1], which controls the retention ratio of each component in the historical cell state

C_{t - 1}

. Herein, the weight matrix

W_{f}

learns the coupling relationship between the historical hidden state

h_{t - 1}

and the current input

x_{t}

, and the bias term

b_{f}

modulates the gating activation threshold.

The input gate simultaneously regulates the magnitude and content of state updates and determines the amount of new information to be stored:

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(5)

Generation of the candidate cell state:

{\tilde{C}}_{t} = \tanh (W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c})

(6)

The input gate

i_{t}

governs the amount of new information

{\tilde{C}}_{t}

to be written, and the tanh nonlinear transformation generates the candidate state.

W_{i}

denotes the weight matrix of the input gate, and the bias term

b_{i}

modulates the gating activation threshold. The input gate

i_{t}

and the candidate state

\tilde{C_{t}}

act in synergy: the former determines the intensity of new information writing, while the latter generates the potential memory content to be fused; the two realize fine-grained update through the Hadamard product

⊙

.

As the core memory carrier of the LSTM, the cell state

C_{t}

is updated through an additive, conservation-like rule, with the cell state update equation given as follows:

C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ \tilde{C_{t}}

(7)

The update of the cell state

C_{t}

follows the superposition principle:

f_{t} ⊙ C_{t - 1}

retains historical information, while

i_{t} ⊙ \tilde{C_{t}}

injects new information. This additive, conservation-like update mitigates the vanishing-gradient problem encountered in traditional recurrent neural networks (RNNs).

The output gate regulates the state output:

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(8)

Final hidden state output:

h_{t} = o_{t} ⊙ \tanh (C_{t})

(9)

The output gate

o_{t}

filters the effective output of the cell state

C_{t}

and modulates its output intensity. The hyperbolic tangent transformation in Equation (9) compresses the cell state to the interval [−1, 1], which keeps the hidden state

h_{t}

numerically bounded and avoids numerical explosion. In time-series prediction,

o_{t}

can also suppress the interference of irrelevant memories.
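Equations (4) to (9) can be collected into a single step function. The sketch below is a plain NumPy illustration and not the implementation used in this study; the dictionary layout of the weights, the hidden size of 8 and the random inputs are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b are dicts keyed by 'f', 'i', 'c', 'o'."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # Eq. (4): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # Eq. (5): input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # Eq. (6): candidate state
    c_t = f_t * c_prev + i_t * c_tilde          # Eq. (7): cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])          # Eq. (8): output gate
    h_t = o_t * np.tanh(c_t)                    # Eq. (9): hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 5, 8                           # 5 EOF coefficients in, 8 hidden units
W = {k: rng.normal(0.0, 0.1, size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

# Run the cell over a short synthetic sequence of 20 time steps.
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(20, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Since the output gate lies in (0, 1) and tanh lies in (−1, 1), every element of `h` stays strictly inside (−1, 1) regardless of how large the cell state grows.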

Through the gating mechanism, LSTM can effectively capture long-term dependencies in time series. The dual-path design enables the network to independently learn the importance of information and content features. Meanwhile, by deriving the gradient expression of the Backpropagation Through Time (BPTT) algorithm:

\frac{\partial C_{t}}{\partial C_{t - 1}} = f_{t} + \frac{\partial (i_{t} ⊙ {\tilde{C}}_{t})}{\partial C_{t - 1}}

(10)

It can be seen that the gradient decay

\partial C_{t} / \partial C_{t - 1}

is mainly affected by the forget gate coefficient. The forget gate coefficient directly controls the gradient decay rate; when

f_{t}

approaches 1, the gradient can maintain effective propagation for more than 100 time steps. The higher-order term

\partial (i_{t} ⊙ {\tilde{C}}_{t}) / \partial C_{t - 1}

is usually numerically small (because the derivatives of

σ

and

\tanh

are ≤1), but it introduces nonlinear regulation capability, which improves performance by two orders of magnitude compared with traditional RNNs.
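To give a feel for how strongly the forget gate controls gradient decay, the sketch below keeps only the first term of Equation (10) and assumes a constant scalar forget gate, so the gradient surviving after k steps is simply f to the power k. Both simplifications are assumptions made for illustration and ignore the second term of Equation (10).

```python
def surviving_gradient(f, steps):
    """Fraction of the cell-state gradient left after `steps` steps
    when the forget gate is held at the constant value f."""
    return f ** steps

for f in (0.5, 0.9, 0.99):
    print(f"f = {f:4.2f}: after 100 steps -> {surviving_gradient(f, 100):.3e}")
```

Under these assumptions, a forget gate of 0.99 leaves roughly 0.37 of the gradient after 100 steps, whereas 0.9 leaves a fraction on the order of 1e-5 and 0.5 leaves effectively nothing.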

In sound speed profile (SSP) prediction, this characteristic enables the model to capture the multi-scale variation characteristics of marine environmental parameters. According to the above gradient calculation formula and gradient clipping technique, practical applications show that when the time step length exceeds 50, the prediction accuracy of LSTM is improved by approximately 40% compared with traditional RNNs.

2.4. Bi-LSTM Neural Network

The bidirectional long short-term memory (Bi-LSTM) network is an extension of the LSTM network, which integrates two LSTM sub-networks in the forward and backward directions and is thus able to utilize the contextual information of the past and future simultaneously.

By introducing a backward time stream to expand the network structure, the hidden state calculation of the bidirectional LSTM (Bi-LSTM) network includes the following two components: the forward

\vec{h_{t}}

and the backward

\overset{\leftarrow}{h_{t}}

:

\vec{h_{t}} = LSTM (x_{t}, \vec{h_{t - 1}})

(11)

\overset{\leftarrow}{h_{t}} = LSTM (x_{t}, \overset{\leftarrow}{h_{t + 1}})

(12)

The bidirectional LSTM architecture consists of the following two independent LSTM layers: the forward LSTM layer

\vec{h_{t}}

processes the input sequence in chronological order (

t

= 1 →

T

), while the backward layer

\overset{\leftarrow}{h_{t}}

processes it in reverse order (

t

=

T

→ 1). For each time step t, the forward layer computes the hidden state

\vec{h_{t}}

, and the backward layer computes

\overset{\leftarrow}{h_{t}}

. Ultimately, the hidden states from both directions are concatenated at each time step to form an output that incorporates bidirectional contextual information:

h_{t} = [\vec{h_{t}}, \overset{\leftarrow}{h_{t}}]

(13)

This structure enables the network to capture contextual information in both directions simultaneously. For the sound speed profile time-series prediction task with bidirectional dependencies, its mean square error (MSE) is reduced by approximately 25% compared with the unidirectional LSTM.

The backward LSTM processes the input sequence in reverse order:

X^{reverse} = \{X_{T}, X_{T - 1}, \dots, X_{1}\}

(14)

X_{t}

denotes the input feature vector at time

t

. Equation (14) reorganizes the input data in reverse order, enabling the backward LSTM layer to process future context in the order of

T \to 1

. Through dual-path information fusion, the prediction at each time step simultaneously considers both historical and future contexts:

y_{t} = W_{y} [\vec{h_{t}}, \overset{\leftarrow}{h_{t}}] + b_{y}

(15)

The final output

y_{t}

in Equation (15) is generated by applying a linear transformation (with weight matrix

W_{y}

and bias term

b_{y}

) to the concatenated vector of bidirectional hidden states

[\vec{h_{t}}, \overset{\leftarrow}{h_{t}}]

, thereby realizing collaborative decision-making using both historical and future information. In sound speed profile (SSP) reconstruction, this mechanism effectively reduces the RMSE of thermocline features.
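The bidirectional pass of Equations (11) to (15) can be sketched in NumPy as two independent LSTM passes followed by concatenation and a linear output layer. This is an untrained illustration with random weights, not the network used in this study; the stacked gate layout, the hidden size of 8 and the window length of 30 are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, rng):
    # One stacked weight matrix for the four gates, ordered f, i, c, o.
    W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_hidden + n_in))
    return W, np.zeros(4 * n_hidden)

def run_lstm(X, W, b):
    """Run an LSTM over X (T x n_in); return all hidden states (T x n_hidden)."""
    n = W.shape[0] // 4
    h, c = np.zeros(n), np.zeros(n)
    out = []
    for x_t in X:
        g = W @ np.concatenate([h, x_t]) + b
        f, i, o = (sigmoid(g[k * n:(k + 1) * n]) for k in (0, 1, 3))
        c = f * c + i * np.tanh(g[2 * n:3 * n])
        h = o * np.tanh(c)
        out.append(h)
    return np.array(out)

def bilstm(X, fwd, bwd, W_y, b_y):
    h_fwd = run_lstm(X, *fwd)                   # Eq. (11): t = 1 -> T
    # Eq. (14): reversed input; flip the states back so rows align with t.
    h_bwd = run_lstm(X[::-1], *bwd)[::-1]       # Eq. (12): t = T -> 1
    H = np.concatenate([h_fwd, h_bwd], axis=1)  # Eq. (13): concatenation
    return H @ W_y.T + b_y                      # Eq. (15): linear output

rng = np.random.default_rng(0)
T, n_in, n_hidden = 30, 5, 8
X = rng.normal(size=(T, n_in))                  # synthetic window of 5 coefficients
fwd = init_params(n_in, n_hidden, rng)
bwd = init_params(n_in, n_hidden, rng)
W_y, b_y = rng.normal(0.0, 0.1, size=(n_in, 2 * n_hidden)), np.zeros(n_in)
Y = bilstm(X, fwd, bwd, W_y, b_y)
```

Note that the backward pass reads the whole window before producing its first aligned state, so the "future" context it uses is future only relative to a position inside an already observed input window.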

2.5. Profile Reconstruction

The EOF coefficients predicted by the Bi-LSTM model are used to reconstruct the sound speed profile (SSP) via a linear combination method, realizing the mapping from the low-dimensional coefficient space to the high-dimensional physical space (the complete workflow of the reconstruction process is shown in Figure 3).

The reconstruction of the sound speed profile adopts a linear combination of EOF modes. For an arbitrary time instant

t

, the reconstructed sound speed profile

\hat{S} (z, t)

is expressed as follows:

\hat{S} (z, t) = \bar{S} (z) + \sum_{i = 1}^{5} ϕ_{i} (z) \cdot {\hat{c}}_{i} (t)

(16)

where

\bar{S} (z)

is the mean sound speed profile of the training set,

ϕ_{i} (z)

is the value of the

i

-th EOF spatial mode at depth

z

, and

{\hat{c}}_{i} (t)

is the coefficient value of the

i

-th EOF mode at time

t

predicted by the Bi-LSTM model. The reconstruction process is divided into two steps. First, the de-meaned predicted sound speed profile is calculated using the EOF mode matrix and the predicted coefficient matrix:

{\hat{S}}^{'} = Φ \cdot \hat{C}

(17)

where

Φ \in R^{11 \times 5}

is the EOF mode matrix, and

\hat{C} \in R^{5 \times N_{pred}}

is the EOF coefficient matrix predicted by the Bi-LSTM model. Subsequently, the mean sound speed profile is added back to the de-meaned predicted values to obtain the final predicted sound speed profile:

\hat{S} = {\hat{S}}^{'} + \bar{S} (z)

(18)
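The two reconstruction steps in Equations (17) and (18) amount to one matrix product and one broadcast addition. In the sketch below the mode matrix, the mean profile and the predicted coefficients are all synthetic stand-ins (assumptions), chosen only so that the shapes match the 11 depths and 5 modes described in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n_depth, n_modes, n_pred = 11, 5, 4

# Orthonormal stand-in for the EOF mode matrix Phi (11 x 5).
Phi, _ = np.linalg.qr(rng.normal(size=(n_depth, n_modes)))
# Hypothetical training-set mean profile (11 x 1), in m/s.
S_mean = np.linspace(1520.0, 1495.0, n_depth)[:, None]
# Stand-in for coefficients predicted by the sequence model (5 x N_pred).
C_hat = rng.normal(0.0, 3.0, size=(n_modes, n_pred))

S_hat_c = Phi @ C_hat          # Eq. (17): de-meaned predicted profiles
S_hat = S_hat_c + S_mean       # Eq. (18): add the training mean back
```

Each column of `S_hat` is one reconstructed profile; projecting `S_hat - S_mean` back onto `Phi` recovers `C_hat`, which is a quick consistency check on the mode matrix.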

3. Results and Discussion

3.1. Bi-LSTM Model Training

The core advantage of the bidirectional long short-term memory (Bi-LSTM) network lies in its unique information processing mechanism. By simultaneously capturing the forward and backward dependencies of sequential data, it can achieve a more comprehensive understanding of contextual information and significantly improve the ability to analyze complex sequences. This design also enhances the efficiency of the model in capturing long-range dependencies, enabling effective association of key information points even in long sequences. Furthermore, by fusing bidirectional features, Bi-LSTM effectively suppresses the tendency of overfitting and improves the adaptability of the model to different data distributions. It performs particularly well in tasks such as sequence labeling and sentiment analysis, as it can accurately identify local features while maintaining overall semantic coherence.

Therefore, the Bi-LSTM structure is adopted in this neural network construction to model the temporal dependencies of the principal component (PC) time series. Combined with the nonlinear modeling strength of deep learning, the LSTM units are used to control information flow through the gating mechanism, realizing high-precision learning of the temporal evolution pattern of the principal components of sound speed.

After the experimental data are processed by Empirical Orthogonal Function (EOF) decomposition, the coefficients of the first five modes (5 $PC_{s}$) of the training set are obtained as the input of the neural network. The training set $PC_{s}$ is normalized by Z-score to $PC_{s_{norm}}$, calculated as follows:

PC_{s_{norm}} = \frac{PC_{s} - \mu_{train}}{\sigma_{train}}

(19)

where $\mu_{train}$ and $\sigma_{train}$ represent the mean and standard deviation of the training set, respectively. The same statistics are used to normalize the test set during prediction to ensure the consistency of data distribution. The network takes $PC_{s}$ at time $t - 1$ as the input and outputs the predicted $PC_{s}$ at time $t$, forming an autoregressive prediction framework. After training, the predicted $PC_{s}$ are denormalized and used to reconstruct the sound speed profile (SSP):

SSP_{recon} = \overline{SSP_{train}} + \sum_{k=1}^{5} EOF_{k} \cdot PC_{k}^{pred}

(20)

The reconstructed sound speed profile $SSP_{recon}$ is obtained by superimposing the mean field of $SSP_{train}$ with the products of each spatial mode $EOF_{k}$ and the predicted principal component $PC_{k}^{pred}$ ($k = 1, \ldots, 5$). For the core variables: $EOF_{k}$ denotes the $k$-th Empirical Orthogonal Function (spatial mode); $PC_{k}^{pred}$ is the predicted principal component coefficient (temporal coefficient); and $\overline{SSP_{train}}$ represents the vertical mean profile of the SSP in the training set.
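The normalization in Equation (19) and the reconstruction in Equation (20) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, one PC series is normalized at a time, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def zscore_fit(train_series):
    """Mean and standard deviation of one PC series, taken from the training set only."""
    return mean(train_series), pstdev(train_series)  # population std is an assumption


def zscore(series, mu, sigma):
    """Equation (19): normalize with the training-set statistics."""
    return [(v - mu) / sigma for v in series]


def denormalize(series_norm, mu, sigma):
    """Inverse of Equation (19), applied to the network output."""
    return [v * sigma + mu for v in series_norm]


def reconstruct_ssp(mean_profile, eof_modes, pcs_pred):
    """Equation (20) for one time step.

    mean_profile: training-set mean sound speed at each depth
    eof_modes:    list of modes, each a list over depth
    pcs_pred:     one denormalized predicted coefficient per mode
    """
    return [
        m + sum(mode[j] * pc for mode, pc in zip(eof_modes, pcs_pred))
        for j, m in enumerate(mean_profile)
    ]
```

Test-set coefficients are passed through `zscore` with the `mu` and `sigma` returned by `zscore_fit` on the training set, so that no test-set statistics leak into the normalization.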

Analysis of the experimental results shows that EOF decomposition effectively extracts the dominant variation characteristics of the sound speed profile (SSP). The time-series prediction results of the principal components (PCs) demonstrate that the Bi-LSTM model satisfactorily tracks the dynamic evolution of PCs in the test set, yielding small prediction errors especially for the first-, second-, and third-order components with distinct periodic variations.

To comprehensively and objectively evaluate the prediction performance of different models, this study adopts the root-mean-square error (RMSE) as the evaluation metric. RMSE measures the average deviation between predicted and true values; a smaller RMSE indicates lower prediction error and better model performance. The variation trend of reconstruction error of each model with the sample index is shown in Figure 4.
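As a minimal sketch of this metric, the RMSE of each predicted profile can be computed over depth, giving one error value per sample index. The function names and the list-of-lists layout (one profile per sample) are assumptions made for illustration.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def rmse_per_sample(pred_profiles, obs_profiles):
    """One RMSE per profile (sample index), computed across depth."""
    return [rmse(p, o) for p, o in zip(pred_profiles, obs_profiles)]
```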

All analyses were performed on a Windows 10 64-bit system with an Intel i7 processor (Intel Corporation, Santa Clara, CA, USA), 16 GB RAM and MATLAB R2024b, and cover model prediction accuracy, complexity and computational efficiency. As shown in Table 2, the proposed Bi-LSTM model achieves a mean RMSE of 0.387 m/s, a median RMSE of 0.356 m/s and a standard deviation of 0.195 m/s. Relative to the LSTM model (mean RMSE 0.57 m/s, median RMSE 0.61 m/s, standard deviation 0.2126 m/s), this is a 32% reduction in mean RMSE. The mean RMSEs of GRU, Random Forest and 1D-CNN are 0.5825 m/s, 0.7828 m/s and 1.7459 m/s, respectively, all higher than those of Bi-LSTM and LSTM, although GRU is only marginally above LSTM. In terms of computational performance, Random Forest has the shortest training time (5–8 min), 1D-CNN achieves the fastest inference speed (~4000 samples/s), and GRU strikes a good balance between training duration (8–12 min) and inference performance. The proposed Bi-LSTM requires 10 min for training and, despite its higher computational overhead, delivers significantly better SSP prediction accuracy under strong internal solitary wave disturbance. The memory overhead of the models ranks as Bi-LSTM > LSTM > Random Forest > GRU > 1D-CNN, and the parameter sensitivity analysis further supports the core parameter settings. Because 1D-CNN, GRU and Random Forest are less accurate, the subsequent analysis focuses on the comparison between the two long short-term memory models.
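The mean RMSE values quoted above coincide with the averages of the five per-mode entries in Table 2, which the short check below reproduces. The dictionary simply transcribes Table 2; the helper name `percent_reduction` is introduced here for illustration.

```python
from statistics import mean

# Per-mode RMSE values transcribed from Table 2 (modes 1 to 5).
TABLE2 = {
    "Bi-LSTM": [0.7243, 0.3447, 0.3599, 0.2607, 0.2458],
    "LSTM": [0.9143, 0.5296, 0.6106, 0.3907, 0.4064],
    "1D-CNN": [2.4614, 1.6201, 1.9664, 1.3769, 1.3045],
    "GRU": [1.0186, 0.4868, 0.5574, 0.4336, 0.4163],
    "Random Forest": [1.4868, 0.8495, 0.6976, 0.4751, 0.4052],
}

mean_rmse = {name: mean(values) for name, values in TABLE2.items()}


def percent_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Reduction in mean RMSE of Bi-LSTM relative to LSTM.
bilstm_vs_lstm = percent_reduction(mean_rmse["LSTM"], mean_rmse["Bi-LSTM"])
```

Rounded, `mean_rmse` gives 0.387 for Bi-LSTM and 0.57 for LSTM, and `bilstm_vs_lstm` is about 32%.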

3.2. Parameter Sensitivity Analysis

To further verify the robustness of the proposed EOF-Bi-LSTM model and the rationality of parameter selection, we systematically evaluated the model performance under varied marine conditions and conducted single-factor sensitivity analysis on core parameters.

3.2.1. Performance Under Varied Marine Conditions

We evaluated the model from the following two dimensions to clarify its applicable scenarios and core advantages:

(1): Performance Under Different ISW Disturbance Intensities

The dataset was divided into three typical scenarios according to internal solitary wave (ISW) intensity: no ISW period, weak ISW period, and strong ISW disturbance period. Results show that the RMSE difference between the proposed method and the baseline LSTM is small in the no ISW period (0.312 m/s vs. 0.385 m/s). However, in the strong ISW disturbance period, the mean RMSE of the proposed method is only 0.812 m/s, a 74.8% reduction compared with the baseline LSTM (3.214 m/s). This confirms that the proposed method is particularly effective for SSP prediction under strong ISW disturbance, which is its core advantage over conventional models.

(2): Performance at Different Depth Layers

The water column was divided into the following three layers: surface mixed layer (44–64 m), thermocline (74–104 m, core ISW-affected area), and deep layer (114–124 m). Results show that the most significant accuracy improvement of the proposed method occurs in the thermocline, with a 42.3% RMSE reduction compared with LSTM. This verifies that the proposed method can accurately capture the strong nonlinear sound speed changes in the core ISW disturbance area, which is critical to ensuring the accuracy of the entire SSP vertical structure.
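A layer-wise evaluation of this kind can be sketched as follows. The layer boundaries are those given above; the 10 m depth grid from 34 m to 134 m used in the usage note is an assumption consistent with those boundaries, and the function name and data layout are hypothetical.

```python
import math

# Depth ranges (m) of the three layers named in the text.
LAYERS = {
    "surface mixed layer": (44, 64),
    "thermocline": (74, 104),
    "deep layer": (114, 124),
}


def layer_rmse(depths, errors_by_depth, layers=LAYERS):
    """RMSE per layer.

    depths:          sensor depths in metres
    errors_by_depth: errors_by_depth[j] holds the (predicted - observed)
                     sound speed errors at depths[j] for all test samples
    """
    result = {}
    for name, (top, bottom) in layers.items():
        squared = [
            e * e
            for d, errs in zip(depths, errors_by_depth)
            if top <= d <= bottom
            for e in errs
        ]
        result[name] = math.sqrt(sum(squared) / len(squared)) if squared else float("nan")
    return result
```

For example, with `depths = list(range(34, 135, 10))`, the sensors at 74, 84, 94 and 104 m fall in the thermocline layer, while those at 34 m and 134 m belong to none of the three layers.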

3.2.2. Single-Factor Parameter Sensitivity Analysis

Single-factor sensitivity analysis was conducted on the following three core parameters dominating the model’s prediction performance (all other parameters were fixed during tests), to clarify their influence on SSP prediction and verify the rationality of our parameter selection.

(1): Number of EOF Modes

We tested 3, 5, 7, 9, and 11 modes. The first five modes achieve a 97.5% cumulative variance contribution rate, fully retaining the core vertical features of SSPs. Fewer modes lead to the loss of key structural information and increased reconstruction error, while excessive modes introduce high-frequency noise and redundant information, reducing model robustness.

(2): Number of Bi-LSTM Hidden Layer Neurons

We tested 32, 64, 128, and 256 neurons. The 64-neuron setting achieves the optimal balance between fitting ability and computational complexity. Fewer neurons result in insufficient fitting of the nonlinear evolution of ISWs, while more than 64 neurons cause obvious overfitting on the training set and degraded generalization performance on the test set.

(3): Input Time Step

We tested input time steps of 10, 20, 30, and 60 min. The 30 min step achieves the best prediction accuracy, as it completely captures the evolution cycle of ISWs in the study area. An overly short step cannot capture the complete temporal evolution characteristics of ISWs, while an overly long step introduces redundant information, increases computational cost, and weakens the model’s attention to abrupt sound speed changes.
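For the EOF mode count in item (1), the cumulative variance criterion can be checked directly against Table 1. The sketch below is only an illustration: the per-mode percentages are transcribed from Table 1, while the function names and the example thresholds are assumptions rather than values stated by the authors.

```python
from itertools import accumulate

# Variance contribution rate (%) of the first five EOF modes, from Table 1.
VARIANCE_PCT = [66.1, 18.1, 8.0, 3.5, 1.8]


def cumulative(values):
    """Running total, rounded to one decimal to match the table's precision."""
    return [round(v, 1) for v in accumulate(values)]


def modes_needed(values, threshold_pct):
    """Smallest number of leading modes whose cumulative variance reaches the threshold."""
    for k, total in enumerate(cumulative(values), start=1):
        if total >= threshold_pct:
            return k
    return None  # threshold not reached with the listed modes
```

With these values, a 90% threshold is met by three modes, whereas a 97% threshold requires all five, in line with the 97.5% cumulative contribution reported for the five-mode setting.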

To verify the effectiveness of the proposed method, four prediction scenarios are compared: SSP prediction from EOF coefficients using the Bi-LSTM and LSTM models, respectively, and direct SSP prediction (without EOF decomposition) using the Bi-LSTM and LSTM models, respectively. The variation of the error with depth for the four scenarios is shown in Figure 5, where the Bi-LSTM model with EOF decomposition exhibits the best prediction performance.

Specifically, the average root-mean-square error (RMSE) of Bi-LSTM with EOF decomposition is 1.0 m/s, which is 8.2% lower than the 1.09 m/s of LSTM, 14.6% lower than the 1.1715 m/s of Bi-LSTM without EOF, and 15.9% lower than the 1.1886 m/s of LSTM without EOF. This reflects the important role of EOF decomposition in time-series prediction, as it helps improve the modal prediction accuracy.

From the perspective of error distribution along the depth, the Bi-LSTM curve remains the lowest at all depth layers, indicating that its bidirectional long short-term memory network structure can effectively capture the spatiotemporal dependencies of the sound speed profile. Combined with the dominant modal features extracted by the EOF dimensionality reduction technique, high-precision reconstruction of the marine sound speed field is realized.

In contrast, the unidirectional LSTM model can only utilize historical information, while Bi-LSTM processes sequential data through forward and backward LSTM layers simultaneously, showing stronger feature extraction capability and smaller prediction errors in sound speed prediction tasks. This fully validates the superiority of the bidirectional mechanism in the spatiotemporal prediction of marine sound speed, and these results provide quantitative data support for the previous qualitative evaluation.

In this study, the advantages of Bi-LSTM in predicting a sound speed field containing internal solitary waves (ISWs) are verified by comparing sound speed profiles at four key time points (1863, 4133, 4267, and 5156), as shown in Figure 6. The signatures of internal solitary waves can be clearly identified in the sound speed profiles: they appear mainly as bulge and step-like structures of the sound speed curve in the 60–100 m water layer, with a maximum sound speed variation of up to 25 m/s (jumping from 1485 m/s to 1510 m/s).

These characteristics are caused by the disturbance of water density stratification during the propagation of internal solitary waves and also represent one of the main challenges for traditional prediction models.

Detailed comparative analysis shows that the Bi-LSTM prediction curve is highly consistent with the measured values, with an overall mean prediction bias of only 1.0 m/s, while the mean prediction bias of LSTM reaches 2.5 m/s, a bias reduction of about 60%.

Notably, in the core region of internal solitary waves (depth 70–90 m), the prediction bias of Bi-LSTM is further reduced to below 0.8 m/s, whereas the prediction bias of LSTM in this region is as high as 3.2 m/s, showing a more significant accuracy advantage.

According to the overall distribution comparison between the observed values and the predicted values of the two models (Figure 7), the coefficient of determination R² of the long-term trend between Bi-LSTM predictions and observations is higher than 0.995, while that of LSTM is approximately 0.94 (for Mode 1).
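Because the text uses "correlation coefficient" and R² side by side, it may help to see how the two quantities differ. The sketch below computes the Pearson correlation coefficient and the coefficient of determination with respect to the ideal 1:1 line; which of the two definitions underlies Figure 7 is not stated in the text, so both are shown, and the function names are chosen here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed, predicted):
    """Coefficient of determination relative to the 1:1 line (predicted = observed)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A prediction that is perfectly correlated with the observations but biased in amplitude still has a Pearson coefficient of 1, while its R² relative to the diagonal can be much lower, so scatter points lying on the diagonal are a stricter criterion than high correlation alone.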

From the perspective of scatter concentration, the scatter points of Bi-LSTM are distributed more closely around the red diagonal line (ideal prediction line). In particular, the scatter points of Mode 1 almost completely coincide with the diagonal, indicating extremely strong prediction consistency.

The scatter points of LSTM show obvious dispersion on both sides of the diagonal line, and the prediction bias is more significant especially in the negative coefficient region of Mode 2 (observed values < −20). In contrast, the scatter bias of Bi-LSTM is extremely small, and the overall distribution is more uniform.

In the extreme value regions (observed values > 20 or <−30), Bi-LSTM exhibits more stable prediction performance, with scatter points always close to the diagonal line, while LSTM shows obvious prediction offsets in these regions.

Combined with the variation characteristics of error with depth, the error is relatively large in the surface layer but gradually decreases with increasing depth. Although the error fluctuates at the depth of 80–120 m due to internal solitary wave activities, all error values are controlled within 1.5 m/s, demonstrating the accurate modeling capability of Bi-LSTM for marine dynamic processes.

No obvious trend change is found in the RMSE time-series analysis, which proves that the model does not suffer from error accumulation with the extension of prediction time, verifying the feasibility and stability of the proposed method.
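One simple way to check for error accumulation is to fit a straight line to the RMSE time series and inspect its slope. The following is a minimal sketch of such a check, not necessarily the analysis the authors performed; the function name is hypothetical and the sample index is used as the time axis.

```python
def linear_trend(values):
    """Least-squares slope of `values` against their sample index (0, 1, 2, ...)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator
```

A slope close to zero over the test period is consistent with the absence of error accumulation, whereas a clearly positive slope would indicate that the error grows with prediction time.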

3.3. Underwater Acoustic Field Verification Based on Predicted Sound Speed Profiles

To verify the effectiveness of the neural network-based underwater acoustic parameter prediction model, this study uses measured observation data and neural network prediction data as inputs respectively and constructs the sound field distribution via the RAM (atWin10_2020_11_4) propagation model in the underwater acoustic toolbox.

Figure 8 shows the sound field calculation results based on measured and predicted data under the conditions of 500 Hz source frequency and 50 m source depth, presented as depth-distance heat maps (X-axis: 0–50 km range; Y-axis: 0–140 m depth; and color map: 50–100 physical quantity). The heat map of measured values exhibits a clear wavy periodic structure with uniform waveform amplitude, sharp boundaries, and favorable continuity over the entire spatial domain. The prediction results of Bi-LSTM are highly consistent with the measured values, which not only accurately restore the overall structure and periodic characteristics, but also perform excellently in terms of slight fluctuations, amplitude variations, and waveform boundary sharpness. The predicted and measured values in each depth-distance region are nearly identical. Although the LSTM prediction can capture the basic waveform structure, its detail restoration is obviously insufficient: the waveform edges are relatively blurred, the structural continuity decreases in some regions (25–50 km range, 60–100 m depth), and the restoration accuracy of slight fluctuations is lower than that of Bi-LSTM.

Meanwhile, the sound field reconstructed from neural network-predicted data is highly consistent with that constructed from measured data in the overall structure. In the sound field distribution along the depth direction, the predicted results and measured data show basically the same sound intensity variation trend in the 0–60 m depth interval, indicating that the neural network has sufficient accuracy in predicting the vertical structure of the water body. Although there are slight differences in some local areas, the correlation coefficient between the sound field constructed from neural network-predicted data and the measured sound field reaches above 0.92 on the whole, which fully verifies the reliability of the neural network method proposed in this paper for underwater acoustic parameter prediction. However, comprehensive data and image observation show that the images of Bi-LSTM are more consistent with the actual sound field propagation maps.

To further quantify the model performance, this study selected the sound field at a depth of 90 m and compared the propagation loss within the 0–10 km range (Figure 9). This range was chosen because it features direct near-field propagation paths, significant sound speed gradient effects, and obvious energy attenuation patterns, which can effectively reflect the performance differences between models in basic propagation scenarios. Within this range (propagation loss 40.54–50.20 dB; fluctuation amplitude 9.66 dB), the mean absolute error (MAE) of Bi-LSTM was 0.384 dB, while that of LSTM was 1.576 dB, a difference of 1.192 dB. In terms of stability, the RMSE of Bi-LSTM was 0.451 dB (LSTM 1.682 dB), and the consistency reached 96.0%, significantly higher than LSTM’s 83.7%.
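The transmission-loss metrics above can be sketched as follows. The MAE and RMSE are standard; the definition of "consistency" is not given in the text, so the formula used here, 100 × (1 − MAE / fluctuation amplitude), is an assumption, adopted because it reproduces the reported 96.0% and 83.7% from the stated MAE values and the 40.54–50.20 dB range. The function names are hypothetical.

```python
import math


def tl_metrics(tl_pred, tl_obs):
    """MAE and RMSE (dB) between predicted and reference transmission loss curves."""
    diffs = [p - o for p, o in zip(tl_pred, tl_obs)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


def consistency_pct(mae_db, tl_min_db, tl_max_db):
    """Assumed definition: 100 * (1 - MAE / fluctuation amplitude of the TL curve)."""
    return 100.0 * (1.0 - mae_db / (tl_max_db - tl_min_db))
```

Calling `consistency_pct(0.384, 40.54, 50.20)` gives about 96.0, and `consistency_pct(1.576, 40.54, 50.20)` gives about 83.7.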

The superiority of the model mainly stems from its bidirectional structure, which can utilize both past and future time-series information simultaneously. When predicting the sound speed profile at the current moment, it can comprehensively consider the sound field states at adjacent time points, enhancing the network’s ability to learn deep-seated time-series features and improving the model’s robustness to noise and generalization performance. These data prove that Bi-LSTM can better capture the complex spatiotemporal dependencies in sound field propagation through the bidirectional information flow mechanism, significantly improving the prediction accuracy while maintaining prediction stability. The results show that by learning the underwater acoustic environment characteristics from historical observation data, Bi-LSTM can effectively predict the acoustic parameters in unknown sea areas, thereby providing accurate input conditions for underwater acoustic propagation modeling.

4. Conclusions

Aiming at the problems of high cost, limited coverage in traditional sound speed profile (SSP) measurements, and insufficient accuracy of empirical models in deep-sea areas, this paper proposes a deep-sea sound speed profile prediction method that fuses Empirical Orthogonal Function (EOF) decomposition and bidirectional long short-term memory (Bi-LSTM) network. Taking easily accessible marine surface environmental data as input, this method first reduces the dimensionality of high-dimensional SSP data via EOF decomposition, retaining only the first five principal components (with a cumulative variance contribution rate of 97.5%), and transforms the prediction of complex vertical structures into a sequence prediction problem of a small number of time coefficients. Then, the bidirectional time-series dependency capturing capability of the Bi-LSTM network is used to learn the nonlinear evolution law of the sound speed profile, and the sound speed is calculated combined with the Del Grosso empirical formula, finally realizing the reconstruction of the sound speed profile.

The experimental data adopted in this work are equal-interval time-series data with a sampling interval of ~300 h, obtained from 34 to 134 m depth in the South China Sea (21°55.193′ N, 117°35.088′ E). The dataset is split into a 60% training set and a 40% test set in strict chronological order after preprocessing, where a robust outlier removal strategy is implemented to enhance the model’s prediction accuracy.
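A strictly chronological split means that the data are cut at a single point in time rather than shuffled, so that every test sample is later than every training sample. A minimal sketch, with a hypothetical function name and the 60% fraction taken from the text:

```python
def chronological_split(samples, train_fraction=0.6):
    """Split a time-ordered sequence into leading training and trailing test parts."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]
```

Keeping the order intact avoids leaking future information into training, which matters for an autoregressive predictor evaluated on later ISW events.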

Experiments on SSP decomposition and reconstruction using sound speed data in the South China Sea confirm the effectiveness of the proposed method. The experimental results show that the proposed Bi-LSTM model is significantly superior to the unidirectional LSTM and Back Propagation Neural Network (BPNN) in prediction accuracy and stability: the mean root-mean-square error (RMSE) of Bi-LSTM is 0.387 m/s, which is about 32% lower than that of LSTM (0.57 m/s), and the coefficient of determination R² with the long-term trend of measured values is ≥0.99. This model can not only reconstruct the overall vertical structure of the SSP with high precision, but also effectively capture key features such as the deep sound channel axis and internal solitary waves (ISWs), with a prediction error of only about 1.5 m/s for ISWs, avoiding the phase delay and amplitude information loss problems existing in LSTM.

When the SSP data predicted by Bi-LSTM are input into the RAM acoustic field calculation model, the correlation coefficient between the reconstructed acoustic field and the actual acoustic field reaches above 0.92, most RMSE values are less than 2 dB, and the overall structure is highly consistent, verifying the reliability of this method in underwater acoustic parameter prediction. Meanwhile, the proposed method is significantly better than the traditional BPNN and unidirectional LSTM models in both prediction accuracy and stability and can effectively capture key acoustic features such as the deep sound channel axis (with extremely high accuracy for ISW prediction).

Although errors exist between Bi-LSTM predictions and measured data at locations with large temperature fluctuations, the low error rate of this method provides the possibility for global SSP prediction. Highly accurate prediction results mean that future SSP parameters can be obtained from measured data without on-site measurements, helping to save a lot of manpower and material resources with strong practicability.

Meanwhile, the SSP data predicted by Bi-LSTM can be directly input into acoustic field calculation models (such as ray tracing or parabolic equation models) to calculate the three-dimensional acoustic field distribution, supporting sonar system performance evaluation and operational effectiveness prediction (military anti-submarine warfare). In addition, this model can be fused with multi-source data such as satellite remote sensing (sea surface temperature, sea surface height) and in situ observation of Unmanned Underwater Vehicles (UUVs) to build a space–air–sea integrated observation network, providing core algorithm support for large-scale, real-time dynamic reconstruction of the sound speed field, and directly serving practical applications such as underwater navigation and target detection.

The technical route of “EOF dimensionality reduction + deep learning” verified in this study can be further extended to the profile prediction of other marine elements such as ocean current and salinity, as well as complex sea areas such as polar and deep-sea regions, forming a universal marine environmental parameter inversion method. Future work will focus on the problem that deep-seawater prediction is susceptible to data anomalies and explore the introduction of a multi-scale feature fusion mechanism to improve the model’s adaptability to the variation law of sound speed at different depth layers.

Author Contributions

K.Q. provided the research ideas, proposed the initial research questions and determined the research direction of this paper. H.Y., H.W. and G.L. completed the code writing and algorithm implementation of all models in this study and were responsible for the data analysis, result statistics and chart drawing of all experiments. K.Q. provided support for data acquisition. H.Y. completed the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the open fund of National Key Laboratory of Science and Technology on Underwater Acoustic Antagonizing, grant number JCKY2024207CH07.

Data Availability Statement

The data used in this study (including the original data, the generated data, all the code and processing scripts) are not publicly available due to their use in an ongoing study by the authors but can be made available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Heidemann, J.; Stojanovic, M.; Zorzi, M. Underwater sensor networks: Applications, advances and challenges. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2012, 370, 158–175.
Ahmed, A.; Younis, M. Distributed Real-Time Sound Speed Profiling in Underwater Environments. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France, 21–25 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–7.
Zhang, S.; Xu, X.; Xu, D.; Long, K.; Shen, C.; Tian, C. The Design and Calibration of a Low-Cost Underwater Sound Velocity Profiler. Front. Mar. Sci. 2022, 9, 996299.
North, G.R.; Bell, T.L.; Cahalan, R.F.; Moeng, F.J. Sampling Errors in the Estimation of Empirical Orthogonal Functions. Mon. Weather Rev. 1982, 110, 699–706.
Martinson, D.G. Empirical Orthogonal Function (EOF) Analysis. In Quantitative Methods of Data Analysis for the Physical Sciences and Engineering; Cambridge University Press: Cambridge, UK, 2018; pp. 495–534.
Niu, X.J.; Li, T.; Chen, Z.; Wang, J.J. Study on the Influence of Calibration Point Quantity Distribution on CTD (Temperature) Calibration Results. Acta Metrol. Sin. 2025, 46, 542–547.
Zhang, L.H.; Liu, Y.; Liu, Y.X.; Zhang, X.S. Modeling of Time-Varying Characteristics of Deep-Sea Sound Speed Profile Based on Layered EOF. Coast. Eng. 2022, 41, 209–222.
Björnsson, H.; Venegas, S.A. A Manual for EOF and SVD Analyses of Climatic Data. CCGCR Rep. 1997, 97, 112–134.
Kawamura, R. A Rotated EOF Analysis of Global Sea Surface Temperature Variability with Interannual and Interdecadal Scales. J. Phys. Oceanogr. 1994, 24, 707–715.
Zhong, W.Q.; Qu, K.; Liang, Y. Profile Reconstruction Based on Empirical Orthogonal Function and Analysis of Its Physical Significance. J. Ocean. Technol. 2022, 41, 57–64.
Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322.
Kolda, T.G.; Bader, B.W. Tensor Decompositions and Applications. SIAM Rev. 2009, 51, 455–500.
Hawe, S.; Lucey, S.; Wilcox, L.C. Tensor Dictionary Learning. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA, 16–21 June 2013; pp. 1276–1284.
Sun, D.; Yu, M.; Cai, K. Inversion of ocean sound speed profiles from travel time measurements using a ray-gradient-enhanced surrogate model. Remote Sens. Lett. 2022, 13, 888–897.
Huang, J.; Luo, Y.; Shi, J.; Ma, X.; Li, Q.Q.; Li, Y.Y. Rapid modeling of the sound speed field in the South China Sea based on a comprehensive optimal LM-BP artificial neural network. J. Mar. Sci. Eng. 2021, 9, 488.
Zhou, P.X.; Hu, T.; Wang, Z.; Yang, F.J. Deep-Sea Sound-Speed Profile Estimation and Prediction of Sound Propagation Based on Sparse Depth Sensing Using Towed Temperature-Depth Sensors. Acta Acust. 2025, 50, 622–633.
Sun, W.; Bao, J.; Jin, S.; Xiao, F.; Cui, Y. Inversion of Sound Velocity Profiles by Correcting the Terrain Distortion. Geomat. Inf. Sci. Wuhan Univ. 2016, 41, 349–355.
Yu, Y.; Li, Z.; He, L. Matched-Field Inversion of Sound Speed Profile in Shallow Water Using a Parallel Genetic Algorithm. Chin. J. Oceanol. Limnol. 2010, 28, 1080–1085.
Zhang, W.; Jin, S.; Bian, G.; Peng, C.; Xia, H. A Method for Full-Depth Sound Speed Profile Reconstruction Based on Average Sound Speed Extrapolation. J. Mar. Sci. Eng. 2024, 12, 930.
Li, Q.; Zhu, J.; Luo, Y.; Zhang, R.H. Reconstruction Performance Analysis for Basis Function of the Sound Speed Profile. Haiyang Xuebao 2023, 45, 34–44.
Yan, X.; Li, Q.Q.; Yang, F.L.; Peng, D.D.; Juan, Z.H.; Li, Q. A Depth-Wise Transfer Learning Method for Time-Series Sound Speed Profile Prediction in Shallow Water with Internal Waves. Acta Acust. 2025, 50, 23–31.
Wei, Z.; Shaohua, J.; Gang, B.; Yang, C.; Chengyang, P.; Haixing, X. A Method for Sound Speed Profile Prediction Based on CNN-BiLSTM-Attention Network. J. Mar. Sci. Eng. 2024, 12, 414.
Wang, S.; Wu, Z.; Jia, S.; Zhao, D.; Shang, J.; Wang, M.; Zhou, J.; Qin, X. A Multi-Spatial-Scale Ocean Sound Speed Profile Prediction Model Based on a Spatio-Temporal Attention Mechanism. J. Mar. Sci. Eng. 2025, 13, 722.
Piao, S.; Yan, X.; Li, Q.; Li, Z.; Wang, Z.; Zhu, J. Time Series Prediction of Shallow Water Sound Speed Profile in the Presence of Internal Solitary Wave Trains. Ocean Eng. 2023, 285, 115058.
Yue, B.; Fu, J.; Liang, J. Residual Recurrent Neural Networks for Learning Sequential Representations. Information 2018, 9, 56.
Wang, H.; Li, Z.; Zhang, Q.; Wang, B. Shallow water sound speed profile inversion based on recurrent neural network. IEEE Access 2020, 8, 11409–11417.
Yuan, H.X.; Liu, Y.; Tang, Q.H.; Li, J.; Chen, G.X.; Cai, W.X. ST-LSTM-SA: A New Ocean Sound Velocity Field Prediction Model Based on Deep Learning. Adv. Atmos. Sci. 2024, 41, 1364–1378.
Lu, J.; Zhang, H.; Wu, P.; Li, S.; Huang, W. Predictive Modeling of Future Full-Ocean Depth SSPs Utilizing Hierarchical Long Short-Term Memory Neural Networks. J. Mar. Sci. Eng. 2024, 12, 943.
Lu, J.; Zhang, H.; Li, S.; Wu, P.; Huang, W. Enhancing Few-Shot Prediction of Ocean Sound Speed Profiles through Hierarchical Long Short-Term Memory Transfer Learning. J. Mar. Sci. Eng. 2024, 12, 1041.
Wang, Z.; Ren, H. Global Lunar Christiansen Feature from LRO Diviner Radiometer Observation Data. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5001511.
Shen, Y.; Ma, Y.; Tu, Q. On expression of ocean sound profile by layered empirical orthogonal function. J. Northwest. Polytech. Univ. 2000, 18, 90–93.
Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and Bi-LSTM in Forecasting Time Series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3285–3292.
He, Z.; Hu, W.; Li, L.; Pähtz, T.; Li, J. Thermohaline Dynamics in the Northern Continental Slope of the South China Sea: A Case Study in the Qiongdongnan Slope. J. Mar. Sci. Eng. 2022, 10, 1221.
Shen, H.; Li, L.; Li, J.; He, Z.; Xia, Y. The Seasonal Variation of the Anomalously High Salinity at Subsurface Salinity Maximum in Northern South China Sea from Argo Data. J. Mar. Sci. Eng. 2021, 9, 227.
Liu, Y.; Li, M.; Li, H.; Wang, P.; Liu, K. A Novel Reconstruction Model for the Underwater Sound Speed Field Utilizing Ocean Remote Sensing Observations and Argo Profiles. Water 2025, 17, 539.
Chen, C.-T.; Millero, F.J. Nine-Term Equation for Sound Speed in the Oceans. J. Acoust. Soc. Am. 1981, 70, 807–812.
Del Grosso, V.A. New equation for the speed of sound in natural waters (with comparisons to other equations). J. Acoust. Soc. Am. 1974, 56, 1084–1091.

Figure 1. Temporal evolution and distribution of sound speed profiles.

Figure 2. Test set and training set.

Figure 3. Flowchart of the sound speed profile reconstruction method.

Figure 4. Variation in reconstruction error with samples.

Figure 5. Variation in reconstruction error with depth.

Figure 6. Comparison of mid-level sound speed time series. (a) SSP comparison at time point 1863; (b) SSP comparison at time point 4133; (c) SSP comparison at time point 4267; (d) SSP comparison at time point 5156.

Figure 7. R² values of the first two modes.

Figure 8. Sound field prediction results.

Figure 9. Comparison of transmission loss at 90 m depth.

Table 1. Variance contribution rate and cumulative contribution rate of each EOF order.

EOF Spatial Mode	Variance Contribution Rate (%)	Cumulative Variance Contribution Rate (%)
1st mode	66.1	66.1
2nd mode	18.1	84.2
3rd mode	8.0	92.2
4th mode	3.5	95.7
5th mode	1.8	97.5

Table 2. Root-mean-square error of each mode.

Spatial Mode	Bi-LSTM	LSTM	1D-CNN	GRU	Random Forest
1st mode	0.7243	0.9143	2.4614	1.0186	1.4868
2nd mode	0.3447	0.5296	1.6201	0.4868	0.8495
3rd mode	0.3599	0.6106	1.9664	0.5574	0.6976
4th mode	0.2607	0.3907	1.3769	0.4336	0.4751
5th mode	0.2458	0.4064	1.3045	0.4163	0.4052

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yin, H.; Qu, K.; Wang, H.; Li, G. Prediction of Sound Speed Profiles Under Disturbance of Strong Internal Solitary Waves Using Bidirectional Long Short-Term Memory Network. J. Mar. Sci. Eng. 2026, 14, 735. https://doi.org/10.3390/jmse14080735

