Expendable Conductivity–Temperature–Depth-Assisted Fast Underwater Sound Speed Estimation by Convolutional Neural Network with Reduced Fully Connected Layers

Obtaining accurate sound speed profiles (SSPs) in near-real-time is of great significance for ocean exploration, underwater communication and improving the performance of sonar systems. Traditional sound speed estimation methods either cannot obtain the real-time sound speed distribution or rely too heavily on sonar observation data. To address this problem, we propose an SSP estimation method based on a convolutional neural network with reduced fully connected layers (RFC-CNN). This method utilizes neural networks to extract the complex nonlinear features of various types of data. With the help of historical SSPs and shallow seawater sound speed and temperature data obtained by expendable conductivity–temperature–depth probes (XCTDs), a more accurate estimation of the regional sound speed distribution can be realized quickly. This approach can save observation cost and significantly improve the real-time performance of SSP estimation.


Introduction
The speed of sound is one of the most important acoustic parameters in the ocean and is strongly correlated with temperature, salinity and pressure [1]. Among these, temperature has the most significant effect: a change of 1 °C in seawater temperature changes the speed of sound by approximately 0.35%. The distribution of sound speed has an important effect on the performance of sonar systems, because an uneven distribution affects the propagation of underwater acoustic signals, including their attenuation and propagation paths. Sound speed varies much more with depth than in the horizontal direction at the same scale, so the sound speed profile (SSP) is usually used to describe the sound speed distribution in small-scale areas. Obtaining accurate SSPs is of great significance for ocean environmental detection and monitoring, evaluation of ocean water properties, and national defense and security [2,3].
The most direct way to obtain the SSP is in situ measurement. The advantage of using a sound velocity profiler (SVP) for direct measurement of the SSP lies in its accurate numerical results and high depth resolution. However, the acoustic measurement equipment can only measure point by point, and the measurement requires stopping the ship and using the shipborne towing method. As the measurement cycle is long, large-scale, real-time observations are time-consuming and labor-intensive [4,5]. To overcome these deficiencies, inversion methods that estimate sound speed from sonar information have emerged. Traditional sound speed inversion methods include matched field processing, compressed sensing and neural network inversion, but these methods rely on sonar observation data, which places high requirements on the deployment of sonar observation systems, so their application range is limited. For these reasons, sound velocity field construction based on historical SSP data has become an important research direction.
Current research indicates that sound speed distributions can be obtained rapidly by combining in situ and historical temperature and salinity measurements, historical SSP data, etc. The traditional way to obtain such data is to use sonar equipment. However, it is difficult to deploy and dismantle tomographic imaging systems frequently, resulting in low real-time performance and high cost of data acquisition [6]. In order to obtain real-time data, researchers proposed to obtain sea surface data conveniently and quickly through remote sensing satellites. This approach can obtain large-area sea surface data in real time with high spatial resolution [7], but the temperature and salinity data below the surface are difficult to obtain in real time. Conductivity–temperature–depth (CTD) profilers or expendable conductivity–temperature–depth probes (XCTDs) can be used to obtain the temperature and salinity data below the surface. A CTD can measure a larger range of depths, but it takes a longer time. In contrast, XCTDs can be deployed while the surface platform is kept underway (up to 5 knots), and the data can be transmitted back through the copper wire. The probes are expendable and need not be retrieved, so they do not affect the efficiency of the operation. It takes only about 20 min to measure the SSP to a depth of 2000 m, so the measurement of the SSP is closer to real time [5]. In addition, the portability of the XCTD enables it to be mounted on automated platforms such as unmanned aerial vehicles (UAVs). For the next generation of observations that require a large amount of real-time information, the application of UAVs in ocean observation provides important possibilities. UAVs also perform well in terms of manufacturing cost, operating cost and mobility [8,9]. Therefore, UAVs equipped with the XCTD can be chosen to acquire observation data, which realizes the rapid acquisition of data with high accuracy and reduces the cost of observation.
However, after these data are acquired, the complex coupling relationships within them remain unknown. In recent years, with the continuous maturation of the theory and practice of deep learning, machine learning has been widely applied to regression and classification problems in multiple fields, such as image recognition and numerical prediction. Thanks to its ability to fit complex nonlinear functions, it provides a new solution for extracting the complex nonlinear features of various types of data, and thus for estimating the underwater sound speed. In this paper, we propose a fast SSP estimation method based on a convolutional neural network with reduced fully connected layers (RFC-CNN) that does not need sonar observation data. With the help of historical SSP data and shallow seawater sound speed and temperature data obtained by an XCTD, the regional SSP can be estimated, which reduces the demand for sonar observation systems, saves observation cost, significantly improves real-time performance and facilitates implementation.
To obtain SSPs accurately while reducing the cost of data acquisition, we use a convolutional neural network (CNN) to construct an SSP estimation method and obtain the data with the help of XCTD equipment. This method effectively improves the efficiency of estimating SSPs. The specific work of this paper is as follows:

• Aiming at the problem that traditional sound speed estimation methods cannot obtain real-time SSP estimation results or overly rely on sonar observation data, we propose a fast SSP estimation method based on RFC-CNN. In addition, we set the learning rate to be adjusted dynamically and reduce the number of fully connected layers to further improve the efficiency of the model.

• In view of the high cost of obtaining data using sonar equipment and the difficulty for remote sensing satellites to acquire data below the sea surface in real time, we propose to acquire the shallow seawater measured data with the help of XCTD equipment to reduce the observation cost.

Related Works
Models for describing SSPs are mainly classified into two categories: empirical orthogonal function (EOF) models and analytic function models. The EOF model performs an eigenvalue decomposition of the covariance matrix of the sample sequence of SSPs and reconstructs the SSPs using the eigenvectors of the first few orders [10]. The analytic function model expresses the SSPs in the form of a specific mathematical function (e.g., the Munk model) [11] through numerical fitting.
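As an illustration of the analytic-function family, the canonical Munk profile can be evaluated directly. The sketch below is only an example: the parameter values (axis depth 1300 m, scale depth 1300 m, axis sound speed 1500 m/s, perturbation coefficient 0.00737) are the commonly quoted textbook values and are assumptions here, not values taken from this paper, and the function name is hypothetical.

```python
import math

def munk_sound_speed(depth_m, c_axis=1500.0, z_axis=1300.0, scale=1300.0, eps=0.00737):
    """Canonical Munk profile: c(z) = c_axis * (1 + eps * (eta - 1 + exp(-eta))).

    eta = 2 * (z - z_axis) / scale is the scaled distance from the channel axis.
    All default parameters are assumed textbook values, not fitted to data.
    """
    eta = 2.0 * (depth_m - z_axis) / scale
    return c_axis * (1.0 + eps * (eta - 1.0 + math.exp(-eta)))

# Sample the profile every 500 m from the surface down to 5000 m.
profile = [(z, munk_sound_speed(z)) for z in range(0, 5001, 500)]
```

In practice the parameters would be fitted numerically to measured SSPs of the region, which is the "numerical fitting" step mentioned above.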
SSPs can also be inverted by marine acoustic methods. SSP inversion is an application of acoustic tomography: it uses certain features of the observed signal as observation quantities, calculates replica quantities of the same features through a sound field propagation model, and inverts the equivalent SSPs along the sound propagation paths. Specific methods include matched-field processing, compressive sensing, and neural networks. The essence of acoustic tomography inversion is the optimization of a cost function. The introduction of genetic algorithms [12] and sequential inversion algorithms [13][14][15] has improved the accuracy of the inversion results, but a large number of sound field model calculations are still needed to obtain the replicas, which is computationally intensive.
In 1991, the matched field method based on principal component analysis by EOF was first used for SSP inversion by Tolstoy of the US Naval Laboratory [16], and the lattice point traversal method was used to search for the matching term. The computational complexity of the process was high, and the timeliness of the inversion urgently needed to be improved. In matched field calculation, when the boundary parameters are not sufficiently matched, it is difficult to accurately recover the propagation time of the signal, which reduces the accuracy of the SSP inversion [17].
In order to improve the search speed of matching feature terms in matched field processing, in 2017, Zheng Guangwing et al. proposed an improved perturbation method algorithm [18]. This method transforms the SSP inversion from a nonlinear optimization problem into a system of linear equations, and improves the timeliness of the inversion at the cost of some accuracy. Some researchers have introduced heuristic algorithms in matched-field processing to speed up the inversion process [19], such as particle swarm optimization (PSO), simulated annealing (SA) and the genetic algorithm (GA). The essence of acoustic tomography inversion is the optimization of the cost function, and the introduction of heuristic algorithms in matched-field processing improves the accuracy of the inversion results. However, the core of the heuristic algorithms is based on the Monte Carlo idea, and it is necessary to set a sufficient particle number (e.g., PSO) or population number (e.g., GA) to guarantee the probability of finding the optimal or suboptimal match. Therefore, these methods still have a high computational time complexity.
Bianco [20] at the University of California and Choo [21] at Seoul National University proposed, in 2016 and 2018, respectively, the method of compressed sensing sound speed inversion combined with EOF decomposition. By using the signal propagation intensity and the signal propagation time, respectively, they established a compressed sensing dictionary, and employed the least-squares method to solve the overdetermined problem to describe the impact of sparse sound speed disturbances on the sound field. Compressed sensing models the mapping of the sound field to the sound speed distribution, but the use of a first-order Taylor approximation reduces the inversion accuracy.
Neural networks have good self-learning ability, which allows them to skip complex ocean models and extract intrinsic connections directly from historical and online data. Especially in recent years, the amount of ocean forecast data has increased significantly, and a large amount of monitoring data from submerged buoys has been accumulated, which provides strong support for the application of neural networks in ocean forecasting [22]. More and more scholars are applying neural networks to ocean forecasting problems.
A sound speed field model should be able to predict the SSP in real time, which is of greater practical engineering significance. Jain et al. used an ANN to predict the SSP at 27 depths with sea surface parameters, vertical salinity, and temperature data [23]. In order to improve the real-time estimation of sound speed, Huang et al. proposed an autoencoding feature-mapping neural network (AEFMNN) structure [24]. This method can not only effectively improve the real-time performance of the inversion stage, but also enhance the robustness of the model against noise interference. However, the existing matched-field processing, compressed sensing and neural network models for sound speed inversion rely on sonar observation data, which places high requirements on observation systems, and their application is limited in areas where it is difficult to deploy observation equipment.
With the development of technology, satellite remote sensing is capable of sustained and high-resolution observations of large sea areas, which can meet the needs of large-scale and real-time monitoring. Therefore, it is an inevitable trend to introduce satellite remote sensing technology to meet the needs of large-scale, long-term and timely observation. Satellites have the advantages of wide coverage and high resolution, and have become the main means of ocean observation.
Li Qianqian et al. proposed the use of artificial neural networks to establish the local spatio-temporal ocean sound speed field based on remote sensing satellite observation data and Argo historical data, achieving the real-time inversion of the SSPs over the whole ocean depth [7]. With the advantage of multi-source information fusion, the inversion performance of this method is more stable and accurate.
Based on the sea surface parameters obtained from remote sensing, the SSP can be estimated to meet the needs of large-scale and instantaneous SSP acquisition, so remote sensing data are also favored by researchers. However, satellites can only measure the surface data of the ocean, and the underwater sound speed data are still unknown, so remote sensing data are still subject to certain limitations.
In recent years, unmanned aerial vehicles (UAVs) have been developed as observation platforms with lower manufacturing costs, lower operating costs, and superior maneuverability. In 2022, Japanese scholars proposed a method that uses UAVs to conduct XCTD/XBT observations and considered the use of UAVs as ocean observing platforms to acquire the data [9]. This method can reduce the cost of acquiring the actual sound speed and temperature data and makes them more convenient to acquire.

Model Training and Sound Speed Field Construction Process
The framework of the SSP estimation based on RFC-CNN is shown in Figure 1. Previous studies have shown that the SSP can be quickly obtained by combining various data such as ocean temperature, salinity data, historical SSP data, etc. However, it is difficult to express the complex coupling relationship between them with analytical expressions. Empirical equations are usually used to express the relationship between them. Therefore, neural networks are used in this paper to fit the complex nonlinear relationship between the data to reduce computational complexity. We take the average historical SSPs, the measured temperature and sound speed within a certain depth range of the shallow sea surface, and the longitude and latitude information as the input of the model. The output is an estimation of sound speed.
The process is as follows: first, training samples are obtained from historical data; then, the model is trained using the samples to obtain the SSP estimation model, which realizes the real-time estimation of SSPs based on measured data and historical data. The general process is as follows:
1. Download data related to sound speed from the global Argo ocean real-time observation data, and preprocess the data;
2. Establish the RFC-CNN model for sound speed distribution;
3. Input data for model training and testing, and use RFC-CNN to learn and estimate the SSP, so as to construct the ocean sound speed field;
4. Compare and analyze the obtained results with the actual SSP, and evaluate the accuracy of estimating the SSP.
After the model is established, the RFC-CNN is trained using the prepared training set. In step 3, the model is trained using the back propagation algorithm, which calculates the error of each layer and updates the weights and parameters of the model according to the error on the training data. The weight update operation is performed at the end of each iteration, which can effectively mitigate the impact of changes in the input data on the results of the model. The parameters of the convolutional layers and fully connected layers are updated by gradient descent, while the ReLU layers and pooling layers perform fixed operations and have no trainable parameters. The training process goes through multiple cycles so that the model converges. Then, the trained network is used to predict the output of the test set, and the predicted results are transposed and reverse-normalized to convert the predicted values into a form that matches the actual values.
The prediction model can be evaluated from several aspects to determine the accuracy and stability of its predictions. Comparison curves between the actual values and the predicted values can be plotted to show the performance of the model intuitively. Indicators such as the root mean square error (RMSE) can also be calculated to measure the quality of the regression. The RMSE is the square root of the mean of the squared differences between the predicted data and the actual data. The closer its value is to 0, the better the regression model fits and the closer the predicted data are to the real data. The calculation formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{predict}_i - \mathrm{true}_i\right)^2}$$

where true denotes the actual value, predict denotes the predicted value, the subscript i indexes the samples, and N denotes the number of samples.
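The RMSE described above, together with the mean absolute error (MAE) that is also used later for evaluation, can be sketched in a few lines. The function names and the sample values are illustrative assumptions, not data from the experiments.

```python
import math

def rmse(true, predict):
    """Root mean square error: sqrt(mean((predict - true)^2)) over N samples."""
    if len(true) != len(predict) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for t, p in zip(true, predict))
    return math.sqrt(squared / len(true))

def mae(true, predict):
    """Mean absolute error: mean(|predict - true|) over N samples."""
    if len(true) != len(predict) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(true, predict)) / len(true)

# Hypothetical sound speeds in m/s at two depth layers.
example_rmse = rmse([1500.0, 1510.0], [1503.0, 1506.0])
example_mae = mae([1500.0, 1510.0], [1503.0, 1506.0])
```

Because the RMSE squares the residuals before averaging, it penalizes a few large errors more heavily than the MAE does.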

Data Source
The shallow seawater measured data used in this research are assumed to be acquired with XCTD equipment, which consists of a probe and connecting cables. The probe sinks at a known rate while the data are transmitted through the connecting cable, effectively reducing observation time. When processing data, nested loops are used repeatedly to assist in data extraction, processing, and other operations.
In recent decades, many large-scale ocean observation plans and systems have emerged, among which the Array for Real-time Geostrophic Oceanography (Argo) has the widest coverage, largest scale, and most complete data. This project was proposed by atmospheric and marine scientists from countries such as the United States, the United Kingdom, and France in 1998, and quickly received positive responses from other countries [25]. After where m = 31, n = 21. In order to facilitate the later solving work, each temperature and sound speed matrix is dimensionally transformed and expressed as an h × m × n structure.
The 2021 data are used as the measured data obtained by an XCTD, and the data from 2017 to 2020 are used as the historical data. The temporal mean $\bar{s}(h)$ of the historical SSP at each point is then derived, where $s_k(h)$ denotes the sound speed at moment k and depth h. The average historical SSP at each coordinate point is given by the following equation:

$$\bar{s}(h) = \frac{1}{K}\sum_{k=1}^{K} s_k(h)$$

where K denotes the number of historical moments being averaged.
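The averaging of the historical SSPs at one coordinate point can be sketched as follows. This is a minimal example under the assumption that the historical data are available as a list of equally long profiles, one per historical moment; the function name and the sample values are hypothetical.

```python
def mean_historical_ssp(profiles):
    """Average K historical profiles depth layer by depth layer.

    profiles: list of K profiles, each a list of sound speeds (one per depth layer).
    Returns one profile holding the temporal mean at each depth layer.
    """
    if not profiles:
        raise ValueError("at least one historical profile is required")
    layers = len(profiles[0])
    if any(len(p) != layers for p in profiles):
        raise ValueError("all profiles must cover the same depth layers")
    count = len(profiles)
    return [sum(p[h] for p in profiles) / count for h in range(layers)]

# Two hypothetical historical profiles with three depth layers each (m/s).
mean_profile = mean_historical_ssp([[1540.0, 1530.0, 1500.0],
                                    [1544.0, 1532.0, 1502.0]])
```

The same routine applies unchanged to the historical temperature profiles, whose mean is used later to pad the measured temperature data.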

Measured Data
The temperature measurement data at different depths of seawater are used in the comparative trial, and the data at different depths in 2021 are extracted to simulate the data measured by an XCTD [26]. In order to ensure that the dimension of the temperature matrix corresponds to 58 depth layers, the temperature matrix needs to be supplemented by the mean value of the historical temperature data. The mean of the historical temperature profiles at each point is obtained in the same way as the mean of the historical SSPs. According to the depth range of the input shallow seawater temperature data, the average temperature data corresponding to the remaining depths are added accordingly.
Page j of the h × m × n structured temperature matrix in the final input data, denoted as $T_j$, represents the temperature data at latitude j, where j ∈ [1, n], n = 21. The element of $T_j$ at depth layer h and longitude m is expressed as follows:

$$T_j(h, m) = \begin{cases} t_m(h), & h \le i \\ \bar{t}_m(h), & h > i \end{cases}$$

where m ∈ [1, 31] denotes the longitude index of the point; h denotes the depth, which is divided into 58 depth layers; and i denotes the maximum depth layer corresponding to the input measured temperature data. When h ≤ i, the matrix is filled with the measured temperature data $t_m(h)$; otherwise, it is filled with the corresponding historical temperature mean $\bar{t}_m(h)$.
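The filling rule for a single column of the temperature matrix can be sketched as below: the first i depth layers come from the shallow measurement and the remaining layers come from the historical mean. The function name and the four-layer example are hypothetical; the actual profiles have 58 depth layers.

```python
def fill_profile(measured, historical_mean, i):
    """Combine a shallow measurement with the historical mean profile.

    Depth layers 1..i take the measured values; layers i+1..H take the
    historical mean, where H = len(historical_mean).
    """
    if not 0 <= i <= len(historical_mean):
        raise ValueError("i must lie between 0 and the number of depth layers")
    if len(measured) < i:
        raise ValueError("measured data must cover the first i depth layers")
    return list(measured[:i]) + list(historical_mean[i:])

# Hypothetical example: 4 depth layers, measurements available for the top 2.
column = fill_profile([20.0, 19.0], [18.0, 17.0, 16.0, 15.0], i=2)
```

Setting i = 0 reproduces the pure historical mean, which makes it easy to vary the measured depth range between experiments while keeping everything else fixed.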
Similarly, when comparing and analyzing the effect of inputting shallow sound speed measurements at different depth ranges on the estimation results, a similar method can be used to fill in the sound speed measurements corresponding to 58 depth layers.

Longitude and Latitude Matrix
For the longitude matrix and latitude matrix, based on the corresponding longitude and latitude at each point, the extracted longitude matrix, lon, and latitude matrix, lat, are each expanded into a matrix of size 31 × 21. The distribution of the values within the matrices is matched with the latitude and longitude of each point, and the distributions of specific matrices are shown below. For the 58 depth layers at the same point, the corresponding longitude and latitude are the same, resulting in a longitude matrix and a latitude matrix of size 58 × 31 × 21 each.
At this point, the data extraction required by the input is completed, and the elements that need to be written into the training data are successively written with the value of the fourth dimension as the serial number to form a large matrix containing all elements.

Training Dataset
Finally, the training dataset is formed by sliding a window of nine cells over the matrix elements, that is, using a 3 × 3 mask with a movement step of 1. It is worth noting that taking nine-cell windows from a 21 × 31 matrix yields only 19 × 29 windows, so a total of 551 pieces of data are obtained in the training dataset.
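The window count stated above can be checked with a short sketch that slides a 3 × 3 mask with step 1 over a 21 × 31 grid. The grid here holds placeholder integers rather than real data, and the function name is hypothetical.

```python
def sliding_windows(grid, size=3, step=1):
    """Collect every size x size window of a 2-D grid, flattened row by row."""
    rows, cols = len(grid), len(grid[0])
    windows = []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            windows.append([grid[r + dr][c + dc]
                            for dr in range(size) for dc in range(size)])
    return windows

# Placeholder 21 x 31 grid whose cell value encodes its position.
grid = [[r * 31 + c for c in range(31)] for r in range(21)]
samples = sliding_windows(grid)
sample_count = len(samples)  # (21 - 3 + 1) * (31 - 3 + 1) windows
```

Each window contributes one training sample, with the centre cell corresponding to the point whose SSP is estimated.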
The input of the final training data to the network is the result of dimensionality reduction processing. It contains the mean value of the historical SSP, the measured temperature information in a certain depth range of the shallow sea, the measured sound speed information in a certain depth range of the shallow sea, and the latitude and longitude information corresponding to each data item. The output of the training data is the sound speed value at the 58 depth layers in November of the desired year.
The model training stage also involves data normalization and data format conversion. Data normalization adopts min-max normalization, while the data format conversion is decided by the specific requirements of the model input.

RFC-CNN-Based Model for Estimating SSP
A CNN is a kind of deep feed-forward neural network with the characteristics of local connectivity and weight sharing. The weight sharing and local connectivity of neurons between upper and lower layers not only reduce the total number of network parameters, but also alleviate overfitting of the model during training [11,27]. In practice, CNNs can be used to mine a large amount of historical data to derive the mapping relationship between the input feature quantities and the output values, thus avoiding complex modeling problems and enabling accurate prediction of the results.
The fully connected layer in a CNN is mostly used to unfold the feature map into vectors and send them to the back-end activation function. However, the number of parameters of such layers may become too large, which reduces the speed of training and makes the model prone to overfitting. The dataset used in this paper does not have a high dimensionality, so the use of fully connected layers can be reduced within the network, and only one fully connected layer is added to the final part as the output, so as to further increase the efficiency of the model.
In order to extend the universality of sound speed estimation methods and improve their real-time performance, we propose an XCTD-assisted RFC-CNN SSP estimation method. The structure of the model is shown in Figure 3, with seven convolutional layers, one fully connected layer as the output layer, and four pooling layers set behind the first four convolutional layers, respectively.
The input layer of this model is set as a three-dimensional tensor, with the size of the first dimension determined by the number of input features, and the sizes of the second and third dimensions set to 1. After the input layer, the convolution layer, the batch normalization layer, the Leaky ReLU layer, and the average pooling layer are set sequentially. These four kinds of layers can be regarded as one block, and this block can be repeated to reduce the amount of data operations and improve the efficiency of the model. Since the input is one-dimensional data, the size of the convolution kernel can be set to [2 × 1], and the number of filters can be sequentially set to 4, 8, 16 and 32. After the block with 32 filters, the data are simple enough that only the convolution layer and the Leaky ReLU layer need to be set, with the number of filters set to 64 and 128 sequentially. After all of these operations, the data are passed to the fully connected layer as the output. In this process, the kernel size of the average pooling layer is set to [2 × 1] and the step size is set to 2. The Leaky ReLU function is selected for the activation function layer, which does not require complex exponential operations and can achieve a faster convergence rate. In a network architecture, the number of neurons in a fully connected layer is usually determined based on the complexity of the task and the data. In this paper, only one fully connected layer needs to be set up as the output, and the number of its neurons is set according to the output dimension of the task. If the output is set to the sound speed value of 58 layers, the number of neurons in the fully connected layer is set to 58. Finally, a regression layer is added to calculate the loss value, which helps the model to adjust the weights and parameters based on the feedback of the loss function to improve the performance.
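Two of the fixed, parameter-free operations in the block described above, the Leaky ReLU activation and the [2 × 1] average pooling with step size 2, can be sketched on a one-dimensional feature vector. The negative-side slope of 0.01 is an assumption, since the slope used in the model is not stated, and the function names are hypothetical.

```python
def leaky_relu(values, alpha=0.01):
    """Leaky ReLU: keep positive inputs, scale non-positive inputs by alpha."""
    return [v if v > 0 else alpha * v for v in values]

def average_pool_1d(values, size=2, stride=2):
    """Average pooling over a 1-D sequence; incomplete trailing windows are dropped."""
    return [sum(values[s:s + size]) / size
            for s in range(0, len(values) - size + 1, stride)]

# Hypothetical feature vector after a convolution and batch normalization.
activated = leaky_relu([-2.0, 3.0, 5.0, -1.0, 4.0])
pooled = average_pool_1d([1.0, 3.0, 5.0, 7.0, 9.0])
```

With a pooling size and step of 2, each pooling layer roughly halves the feature length, which is how the repeated block reduces the amount of data passed to later layers.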
When setting up the model, it is necessary to choose an appropriate optimization algorithm. In this paper, the Adam method is used as the optimization algorithm, and appropriate parameters such as the number of epochs, the batch size, and the learning rate are set. The Adam algorithm adaptively adjusts the effective step size of each parameter, and the learning rate is additionally set to be adjusted dynamically during training. This choice improves the estimation accuracy of the model.
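To make the adaptive behaviour of Adam concrete, the standard update rule can be sketched for a single scalar parameter. The hyperparameters (learning rate 0.1, decay rates 0.9 and 0.999) and the quadratic test function are illustrative assumptions and are not the settings used to train the RFC-CNN; the dynamic learning-rate schedule mentioned above is not included.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The division by the square root of the second-moment estimate is what scales each update to the recent gradient magnitude, so large and small gradients produce steps of comparable size.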

Experimental Parameter Setting
In the experiment, a total of 551 data pieces are extracted, of which 80% are used as the training set and 20% as the test set. The first 440 pieces are used as the training set, and the remaining 111 pieces are used as the test set.
When the data are input into the model for training, they should be normalized. Min-max normalization is adopted so that each row of the matrix is normalized to the [−1, 1] interval. The normalization expression is shown below.

$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$$

where x denotes the data to be normalized, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of x, respectively, y denotes the normalized result, and $y_{\max}$ and $y_{\min}$ denote the expected maximum and minimum values of each row. By default, the value of $y_{\max}$ is 1 and the value of $y_{\min}$ is −1.
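The row-wise min-max normalization to [−1, 1] and the reverse normalization applied to the network output can be sketched as a pair of functions. The function names and the sample row are hypothetical; a constant row is rejected here because its range is zero, which is a design choice of this sketch.

```python
def minmax_normalize(row, y_min=-1.0, y_max=1.0):
    """Map a row linearly so that min(row) -> y_min and max(row) -> y_max."""
    x_min, x_max = min(row), max(row)
    if x_max == x_min:
        raise ValueError("a constant row cannot be rescaled")
    scale = (y_max - y_min) / (x_max - x_min)
    return [(x - x_min) * scale + y_min for x in row], (x_min, x_max)

def minmax_reverse(normalized, bounds, y_min=-1.0, y_max=1.0):
    """Undo minmax_normalize using the stored (x_min, x_max) bounds."""
    x_min, x_max = bounds
    scale = (x_max - x_min) / (y_max - y_min)
    return [(y - y_min) * scale + x_min for y in normalized]

# Hypothetical row of sound speeds in m/s.
normalized, bounds = minmax_normalize([1500.0, 1510.0, 1520.0])
restored = minmax_reverse(normalized, bounds)
```

The bounds computed on the training data have to be stored, because the same bounds are needed to convert the estimated values back to physical units.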
It is then necessary to convert the input data of the training and test sets into the four-dimensional input form required for a CNN in MATLAB. The first dimension is the number of features, the second and third dimensions are set to 1, and the last dimension is the number of samples. The output data are kept in their original format.
Finally, the designed RFC-CNN is trained using the training set, and the trained network is used to estimate the output of the test set. The parameters related to network training are set as shown in Table 1. The estimation results obtained at this point need to be transposed and reverse-normalized to convert the estimated values into a form that matches the actual values, which facilitates subsequent comparison and observation. To assess the feasibility of the model, the difference between the actual values and the predicted values can be shown by drawing comparison curves to visualize the performance of the model. The final values of the RMSE and the mean absolute error (MAE) can also be calculated to evaluate the model. Thus, the performance of the estimation model can be evaluated from various aspects to determine the accuracy and stability of the model's estimation results.
After the feasibility of the model is determined, a comparative experiment is first conducted on the measured temperature data from different depths of shallow seawater used in the input data. The trend of the accuracy of the SSP estimation is compared and observed, so as to determine the impact of the depth range of the measured temperature data contained in the input data on the accuracy of the estimation results. Secondly, the accuracy trend of SSP estimation is compared when the input includes the measured sound speed data from different depths of shallow seawater. Thus, the impact of the measured sound speed data from different depths of shallow seawater on the accuracy of SSP estimation can be judged.
During the comparative experiment, the principle of single variable should be ensured, and data within the same time and spatial range should be used.For example, when studying the impact of the depth range of the measured temperature data on the estimation results, the dataset used in each experiment only involves different depth ranges of shallow seawater temperature, while the others remain consistent.

Verification of Model Feasibility
First of all, the experimental data of the Indian Ocean region are selected to verify the feasibility of the model. The average historical SSP, the temperature and the sound speed in the upper 30 m of the shallow sea, and the latitude and longitude are used as inputs, and the final results obtained by the model can be observed. One result is extracted every 20 samples and compared with the corresponding actual values. Six results are extracted, as shown in Figure 4. improving the real-time performance of the model estimation results. However, based on the figures above, the third and sixth extracted results have poor accuracy in shallow seawater. This is probably because the effect of salinity on the actual SSPs was not taken into account.
During the experiment, we selected the average SSP data as part of the input, aiming to weaken the effect of extreme conditions on the SSP. In order to show the necessity of this choice, we estimate the SSP at the center of the nine-cell (3 × 3) grid using only the measured data, and compare the results with those of the method proposed in this paper.
A spatial interpolation algorithm can fit discrete observations into a continuous surface, so it is able to reconstruct the complete sound speed field from a limited number of known values. We choose Inverse Distance Weighting (IDW), a deterministic spatial interpolation algorithm [28], to estimate the SSP at the center point of the nine-cell (3 × 3) grid by using the SSP data of the surrounding eight points and their coordinates. Six results obtained by the IDW method in the selected Indian Ocean region are extracted, as shown in Figure 5. In IDW, the closer a point is to the center point, the greater the weight associated with its SSP data, i.e., the weight is a decreasing function of the distance. This fixed weighting is less able to cope with extreme situations than the weight updating method, so its generalizability is not as good as that of the method proposed in this paper. The IDW method relies on numerical operations, and its algorithmic complexity is higher compared to the method proposed in this paper.
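The IDW baseline for one depth layer can be sketched as follows. The power parameter of 2 and the use of plain Euclidean distance in grid coordinates are assumptions, since the settings used for the comparison are not stated here; the function name and the sample values are hypothetical.

```python
import math

def idw_estimate(points, target, power=2.0):
    """Inverse Distance Weighting estimate at target.

    points: list of ((x, y), value) pairs for the known neighbours.
    Each weight is 1 / distance**power, so nearer points count more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a known point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Eight neighbours of the centre cell (1, 1) in a 3 x 3 grid:
# edge neighbours hold 1500 m/s, corner neighbours hold 1510 m/s.
neighbours = [((x, y), 1500.0 if (x == 1 or y == 1) else 1510.0)
              for x in range(3) for y in range(3) if (x, y) != (1, 1)]
centre_speed = idw_estimate(neighbours, (1, 1))
```

Repeating the estimate for every depth layer yields a full interpolated SSP at the centre point, which can then be compared with the network output using the same error indicators.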

Input Temperature Measured Data from Different Depth Ranges for Comparison
The first comparison experiment uses datasets containing measured temperature data from different depth ranges. Its results are used to determine the depth range of the input temperature measurements that yields the best estimation result. Two predicted profiles are selected from the results of each experiment and compared with the actual values, as shown in Figure 6.
The RMSE results obtained with measured temperature data from different depth ranges are listed in Table 2. The two RMSE columns in the table give the RMSE of each of the two sampled results. The RMSE for temperature inputs from the six different depth ranges is shown in Figure 7.
According to the figures and the table, the RMSE values obtained with temperature measurements in the range of 30 m to 50 m are relatively small. Therefore, to obtain more accurate estimates of SSPs, the optimal depth range for the input shallow seawater temperature measurements is from 30 m to 50 m.

The second comparison experiment uses datasets containing measured sound speed data from different depth ranges. Its results are used to determine the depth range of the input shallow seawater sound speed measurements that yields the best estimation result. Two predicted profiles are selected from the results of each experiment and compared with the actual values, as shown in Figure 8. The RMSE results obtained with measured sound speed data from different depth ranges are listed in Table 3. The two RMSE columns in the table give the RMSE of each of the two sampled results. The RMSE for shallow seawater sound speed inputs from the six different depth ranges is shown in Figure 9.

The comparison shows that the RMSE values obtained with shallow seawater sound speed measurements within a range of about 100 m are relatively small. Therefore, to obtain more accurate estimates of SSPs, the optimal depth range for the input sound speed measurements is about 100 m.
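The RMSE used to compare the depth ranges can be sketched as follows. This is the standard root-mean-square error over the depth layers of one profile. The profile values, the range labels and the per-range RMSE values are invented for illustration and are not taken from Tables 2 or 3.

```python
import math

def rmse(estimated, actual):
    """Root-mean-square error between two profiles on the same depth grid."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical sound speeds at four depth layers.
actual = [1520.0, 1515.0, 1505.0, 1495.0]
estimated = [1520.3, 1514.8, 1505.1, 1494.6]
error = rmse(estimated, actual)

# Picking the input depth range with the smallest RMSE
# (hypothetical labels and values, for illustration only).
rmse_by_range = {"range_a": 0.41, "range_b": 0.25, "range_c": 0.33}
best_range = min(rmse_by_range, key=rmse_by_range.get)
```

Selecting the range by its minimum RMSE mirrors the way the optimal input depth ranges are chosen in the two comparison experiments.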

Conclusions
In this paper, an RFC-CNN-based SSP estimation method assisted by XCTD measured data is investigated. The method estimates the underwater SSP quickly and accurately by learning and extracting features directly from marine environmental data, and it meets the real-time demand; the experiments verify the feasibility of the model. In addition, the use of XCTD equipment reduces the cost of obtaining measured sound speed and temperature data and makes the SSP estimation method more convenient and easier to implement. We also compare how inputting measured data from different depth ranges of seawater affects the model's effectiveness in estimating the SSP. From the experimental results, the depth ranges of the input shallow seawater measurements that achieve the best estimation result can be determined: 30 m to 50 m for temperature and about 100 m for sound speed.
Salinity is also one of the important factors affecting the variation in sound speed. Salinity data can be included in subsequent studies, varying one factor at a time, to explore the effect of measured salinity data on the performance of the model in estimating the SSP. Future research should also cover different waters and complex marine environments. Researchers can adjust and improve the network according to actual needs, thereby improving the efficiency of the model, adapting it to different application scenarios, and expanding its scope of application. Considering the direct effect of global warming, our method has certain limitations for long-term sound speed estimation. Future research should therefore add newly collected data to the training data, so that the effects of objective factors such as global warming are taken into account and the limitations of the method are reduced.

Figure 1. Framework diagram of SSP estimation model based on RFC-CNN.
years of effort and multi-party cooperation, the Argo program has formed a vast global ocean observation system. The historical SSP data are selected from the Global Ocean Argo Dataset (GDCSM_Argo) released by the China Argo Real-Time Data Center. The temporal resolution of the data is monthly, the spatial resolution is 1° × 1° (longitude × latitude), and the data include 58 vertical layers from the sea surface to 1975 m, stratified as follows: 0-10 m, one layer every 5 m; 10-180 m, one layer every 10 m; 180-460 m, one layer every 20 m; 500-1250 m, one layer every 50 m; 1300-1900 m, one layer every 100 m; and the deepest layer at 1975 m. As shown in Figure 2, the region of the Pacific Ocean from 9.5°N to 29.5°N and from 129.5°E to 159.5°E, and the region of the Indian Ocean from 25.5°S to 45.5°S and from 60.5°E to 90.5°E, are selected as the research areas for this paper. The black dots in the figure indicate the locations of the profiling floats of the Argo observation system, through which the system acquires data on the marine environment. The sound speed data are extracted from the Argo dataset for November of each year from 2017 to 2021, giving data at 21 × 31 coordinate points for each November in the 5-year period.
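As a quick consistency check of the stratification quoted above, the following sketch builds the depth levels and counts them. It assumes that a depth shared by two adjacent bands (for example 10 m or 180 m) is counted once; the function name `argo_depth_levels` is hypothetical.

```python
def argo_depth_levels():
    """Depth levels (m) following the stratification quoted in the text."""
    levels = list(range(0, 11, 5))          # 0-10 m, every 5 m
    levels += list(range(20, 181, 10))      # 10-180 m, every 10 m
    levels += list(range(200, 461, 20))     # 180-460 m, every 20 m
    levels += list(range(500, 1251, 50))    # 500-1250 m, every 50 m
    levels += list(range(1300, 1901, 100))  # 1300-1900 m, every 100 m
    levels.append(1975)                     # deepest layer
    return levels

levels = argo_depth_levels()

# Grid size of the Pacific study region at 1-degree resolution.
n_lat = round(29.5 - 9.5) + 1     # 9.5 N to 29.5 N
n_lon = round(159.5 - 129.5) + 1  # 129.5 E to 159.5 E
```

Under this reading the bands yield 58 levels and a 21 × 31 grid, which agrees with the figures given in the text.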

Figure 2. Spatial range of experimental data extraction.

3.2.2. Historical Sound Speed and Temperature Matrix

The data for November of each year at the 21 × 31 coordinate points in the selected area are extracted from the Argo dataset. They form five m × n × h temperature matrices, denoted $T_{2017}$, $T_{2018}$, $T_{2019}$, $T_{2020}$ and $T_{2021}$, and five m × n × h sound speed matrices, denoted $S_{2017}$, $S_{2018}$, $S_{2019}$, $S_{2020}$ and $S_{2021}$, where m corresponds to longitude with a value of 31, n corresponds to latitude with a value of 21, and h corresponds to depth with a value of 58. The temperature matrix $T_k$ and the sound speed matrix $S_k$ ($k \in [2017, 2021]$) are sliced by depth; page i of the matrix holds the data of every coordinate point in the ith depth layer, is denoted $S_k^i$, and is expressed as follows.
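The matrix layout and the slicing into depth pages can be sketched with nested lists as follows. The placeholder values, the helper names `make_matrix` and `depth_page`, the zero-based layer index, and the averaging across the five years are assumptions made for illustration; real values would come from the Argo dataset.

```python
M, N, H = 31, 21, 58  # longitude points, latitude points, depth layers
YEARS = [2017, 2018, 2019, 2020, 2021]

def make_matrix(fill):
    """Build an M x N x H nested list filled with a placeholder value."""
    return [[[fill for _ in range(H)] for _ in range(N)] for _ in range(M)]

def depth_page(matrix, i):
    """Return page i: the M x N values of every grid point at depth layer i."""
    return [[matrix[a][b][i] for b in range(N)] for a in range(M)]

# Placeholder sound speed matrices S_k, one per year.
S = {k: make_matrix(1500.0 + (k - 2017)) for k in YEARS}

# Page 0 of S_2019, i.e. the shallowest layer (zero-based index).
page = depth_page(S[2019], 0)

# Average historical SSP at one grid point across the five years
# (an assumed way of forming the average SSP used as model input).
mean_ssp = [
    sum(S[k][15][10][i] for k in YEARS) / len(YEARS) for i in range(H)
]
```

Each page has one value per coordinate point, while a profile at a fixed grid point runs along the depth axis.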

Figure 3. Network structure of the SSP estimation model based on RFC-CNN.

Figure 4. Estimations of SSP.

The minimum value of RMSE obtained is 0.2021, which was obtained in the sixth result. During the model training and estimation process, the time taken by each stage was recorded separately. Training the model took 31.15 s, estimating the SSP of the target coordinate point took 0.132 s, and the minimum value of RMSE obtained was 0.22, indicating that the designed model can achieve fast and accurate estimation of the SSP. In addition, the model training process can be completed offline in advance.

In Figure 5, the black line in each subfigure shows the result obtained by the IDW algorithm, and the other eight lines show the SSPs at the eight surrounding coordinate points.

Figure 5. SSP results obtained using IDW. The other eight lines of different colors in the subfigures show the SSPs at the eight surrounding coordinate points.

Figure 6. Estimations of SSP.

4.2.3. Input Measured Sound Speed Data from Different Depth Ranges for Comparison

Figure 9. Comparison of RMSE for different depth cases of inputs.

Table 1. Network training parameters.

Table 2. RMSE results of the comparison experiment.

Table 3. RMSE results of the comparison experiment.