An Automatic Internal Wave Recognition Algorithm Based on CNN Applicable to an Ocean Data Buoy System

The application of internal wave recognition to buoy systems is of great significance for improving the understanding of ocean internal waves and for providing more accurate data and information support. This article proposes an automatic internal wave recognition algorithm based on convolutional neural networks (CNNs) for use in the tight-profile intelligent buoy system. The sea profile temperature data were collected using the Bailong buoy system in the Andaman Sea in 2018. The CNN structure is applied to compress the features of the ocean temperature profile data, which reduces the number of input features of the feature recognition network and thereby reduces the overall parameter count and computational complexity of the algorithm. By adjusting the convolution stride and the number of convolution kernels, the original data features are compressed in the time domain and the space domain, respectively. The experimental results show that the identification accuracy and robustness of this method are clearly superior to those of other methods. Additionally, the parameter count and computational cost of this algorithm are very small, which greatly improves the feasibility of deploying it in the buoy system.


Introduction
Internal waves are an ocean phenomenon with short periods and large amplitudes that can usually reach tens to hundreds of meters [1]. Internal waves have been observed in many sea areas [2][3][4][5][6][7][8]. Internal waves usually occur in the deep ocean and can change the thermohaline structure of seawater by affecting the vertical mixing of seawater, which is an important link in the transfer of large-scale and mesoscale motion energy [9,10]. The impact of internal waves on marine ecosystems is also important. One important impact is on the supply of nutrients in the upper ocean [11], which is of great significance for ocean productivity and the construction of food chains. In addition, internal waves can affect the suspension and reaccumulation of seabed sediments, as well as the distribution and transformation of biological and chemical substances in the seabed [12]. Internal waves also affect the species composition, community structure, and productivity of some marine ecosystems. Internal waves are also closely related to ocean utilization and maritime activities. They can affect the navigation of underwater vehicles and the operation of offshore drilling platforms [13], and they may also affect the dynamic response of offshore platforms. Therefore, understanding the characteristics and distribution of internal waves and studying their impact on the ocean and the environment are of great significance for understanding the ocean, protecting the environment, and improving disaster prevention and reduction.
Tides are considered the most common driving force for the generation of internal waves in the ocean. However, there are also several other mechanisms that can promote the generation of internal waves. Among them, the mechanism by which internal waves are generated through the interaction of strong currents with underwater sandbars is well known for producing lee waves [14]. Furthermore, atmospheric disturbances, including wind fields and pressure fields, are important factors contributing to the generation of internal waves in the ocean. Previous studies have found that even a slow-moving pressure field can generate internal waves resembling a moving container, but on a much larger scale [15]. There are also studies on the internal waves induced by wind forces. These studies have demonstrated that wind speed divergence and convergence, as well as spatiotemporal variations in wind fields, can trigger baroclinic instability [15][16][17]. Internal waves can be directly produced by eddies or indirectly through various phenomena associated with eddies, including drained energy, eddy-topography interaction, breaking of eddies, etc. [14]. Fu and Holt were the first to report the coexistence of internal waves and mesoscale vortices observed in SAR (Synthetic Aperture Radar) imagery, but the authors did not directly link the internal waves with the vortices [18]. Subsequently, this type of wave was observed in SAR imagery and pointed out by other researchers [19,20].
The Andaman Sea is located in the northeastern part of the Indian Ocean, between the Andaman Islands, the Malay Peninsula, the Nicobar Islands, and the island of Sumatra [21]. Its tides are predominantly semi-diurnal [22]. The topography and water column structure of the Andaman Sea provide the basic conditions for the generation of internal solitary waves [23,24], making it a natural experimental field for studying internal solitary waves. In addition, the prevailing monsoon and frequent eddies in the Andaman Sea are also important factors contributing to the generation of internal waves.
At present, internal wave recognition methods based on satellite remote sensing images [25][26][27][28][29] and ocean profile data [30][31][32] are commonly used. The satellite remote sensing image method recognizes internal waves by observing irregular light and dark fringes in images. With the rapid development of artificial intelligence, some scholars have carried out research on automatic internal wave recognition algorithms based on satellite remote sensing images. Celona S. et al. [27] used X-band radar to collect remote sensing images and a support vector machine (SVM) model to classify whether the images contained internal solitary waves or tidal internal waves, realizing the automatic detection and classification of internal waves. Bao S. et al. [28] used a target detection method to realize automatic internal wave recognition based on SAR remote sensing images. However, the observation range of satellite remote sensing images is usually large, and the satellite orbit is constantly changing, so it is impossible to observe specific areas for a long time. In addition, the observation of satellite remote sensing images is affected by natural factors such as weather and clouds [29], which also affects the identification and observation of internal waves, and the characteristics of internal waves are easily confused with other features in remote sensing images (vortices, ship wakes, wind, waves, etc.) [28].
In recent years, some scholars have performed related research on internal wave recognition based on ocean profile data. Zhang B. et al. [30], using the physical process of internal waves driving water particles to fluctuate up and down, proposed a method for calculating the amplitude of internal waves. The feasibility of this method was verified using data collected via a temperature chain installed on a moored buoy. However, this algorithm cannot automatically locate the position of internal waves and cannot be directly applied to automatically identify internal waves in the moored buoy system. Suanda S. H. et al. [31] used a buoy equipped with a thermistor to collect offshore ocean temperature profile data for a month, and the collected temperature data were filtered via differential filtering. Then, the filtered data were compared with threshold values, and values greater than the standard threshold value were judged to be internal waves. Liu B. et al. [32] proposed a method of measuring internal waves based on an independently designed mobile temperature chain real-time monitoring system for the mobile real-time monitoring of internal waves, and the method was tested on a monitoring ship. However, through experimental verification, this study found that the recognition effect of the threshold method was not excellent: the recall was 83.33%, the precision was 89.74%, and the delay was 5.2444 min. Deploying the internal wave recognition algorithm to the ocean data buoy system can allow researchers to improve the efficiency of data processing and analysis, reduce the cost of data transmission and processing, improve the real-time performance of observation data, and flexibly respond to different observation situations. However, none of the above methods [30][31][32] can meet the needs of accurate and automatic identification of internal waves in ocean data buoy systems.
In recent years, CNNs have been widely applied in the field of ocean engineering and have changed the way various challenges and tasks are tackled. With their ability to analyze large amounts of data and extract meaningful features [33,34], CNNs have been extensively applied in ocean engineering, including ocean data analysis, ocean environmental monitoring, marine robotics, and autonomous systems [35][36][37][38][39][40][41][42]. Him et al. [35] showed that a statistical forecast model employing a CNN approach produces skillful ENSO forecasts for lead times of up to one and a half years. Jörges et al. [36] developed a novel two-dimensional mixed-data deep CNN for spatial SWH prediction in the nearshore area of Norderney, Germany. Chen Y. et al. [37] proposed a meta-self-attention multi-scale convolution neural network (MSAMS-CNN) for the actuator fault diagnosis of AUVs. Jing Y. et al. [38] applied a CNN with an hourglass configuration to construct the mapping relationship between wind data and wave data. Zhou Z. et al. [39] proposed a framework for ship speed extraction based on deep learning, taking into consideration the application of ship detection and tracking technology in hazy environments. Lu et al. [40] used the CNN-LSTM approach and spatiotemporal information from the CYGNSS observations to establish an innovative model for ocean wind speed inversion.
In this paper, an automatic internal wave recognition algorithm based on CNN is proposed. This algorithm can be deployed directly on buoy systems. By processing and analyzing the ocean profile temperature data collected using the buoy, the internal wave sign is extracted, and internal wave recognition is carried out with the neural network. The algorithm offers real-time performance, high reliability, and automation and can meet the needs of internal wave recognition in intelligent buoys. In addition, considering the strict energy consumption requirements of the buoy system, the algorithm selects a suitable number of convolution kernels and a suitable convolution stride to improve the feature extraction efficiency, reduce the parameter count and computational cost of the algorithm, and reduce the energy consumption of the buoy system.

Methods
In this paper, an internal wave recognition algorithm suitable for tight-profile buoys is designed based on a neural network. The neural network consists of two modules: a feature extraction module and a feature classification module. The algorithm first uses a 1D-CNN [43,44] to extract features from the input data and then uses a fully connected neural network to classify the features.

Feature Extraction Module
CNNs can be divided into 1D-CNNs, 2D-CNNs, and 3D-CNNs according to input data types, and CNNs can extract more effective information from much more data [45]. The network structure is shown in Figure 1. The original data are a temperature sequence of 14 layers, each of length 30, which contains 420 feature quantities. After the feature extraction network consisting of a 1D-CNN, the data are transformed into a feature sequence of 5 layers, each of length 8, which contains 40 feature quantities. This feature compression is achieved by enlarging the sampling step of the convolution operation and reducing the number of convolution kernels.
The correlation between adjacent moments of the ocean temperature profile data is strong, so appropriately increasing the sampling stride of the convolution operation will not affect the algorithm. The convolution kernel calculation formula for the ij-th element is shown in Formula (1), where "a" refers to the original data, "ConvOutput" represents the convolution output, "w" represents the convolution kernel, "n" indicates the number of elements in the convolution output, "l" refers to the sampling step used during the convolution operation, and "length" represents the length calculation.
From Formula (1), the relationship between the input feature number "k", the output feature number "n", and the sampling step "l" of the convolution operation can be obtained, as shown in Formula (2) and Figure 2.
This algorithm identifies internal waves through the temperature data of ocean profiles with multiple depth layers, and the temperature variation trend of adjacent depth layers is similar when internal waves arrive, so the spatial features are partly redundant. To address this, the method improves the efficiency of spatial features by designing a suitable number of convolution kernels. The corresponding relationship between ConvOutput and the original data "a" can be obtained from Formula (1), while the output of the feature extraction network is calculated as the mean value of the convolutional output, as shown in Formula (3), where "$output_{FE}$" denotes the output of the feature extraction network, "N" refers to the number of layers in the raw data, and "M" represents the number of groups of convolutional kernels. Formula (3) shows that the spatial dimension of the feature extraction output is related to the number of convolution kernel groups and has nothing to do with the spatial dimension of the original data. Therefore, this method improves the effectiveness of spatial features by testing different numbers of convolution kernel groups.
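The compression described above can be sketched in plain Python. Formulas (1) to (3) are not reproduced in this text, so the sketch assumes the standard unpadded strided convolution, for which the output length is n = floor((k - length(w)) / l) + 1, and it reads Formula (3) as a mean over the N depth layers for each kernel group. The kernel length of 2, the stride of 4, the kernel weights, and all function names are hypothetical; they were chosen only because they map 14 × 30 inputs to 5 × 8 outputs.

```python
def conv_output_length(k, kernel_len, stride):
    """Number of outputs n of an unpadded strided 1D convolution."""
    return (k - kernel_len) // stride + 1


def strided_conv1d(row, kernel, stride):
    """Slide `kernel` along `row`, advancing `stride` samples per output."""
    n = conv_output_length(len(row), len(kernel), stride)
    return [
        sum(row[j * stride + m] * kernel[m] for m in range(len(kernel)))
        for j in range(n)
    ]


def extract_features(data, kernel_groups, stride):
    """data: N layers x k samples; kernel_groups: M groups x N kernels.

    Returns M x n features. For each kernel group, the per-layer
    convolution outputs are averaged over the N layers (an assumed
    reading of the mean described for Formula (3)).
    """
    n_layers = len(data)
    features = []
    for group in kernel_groups:
        per_layer = [
            strided_conv1d(data[i], group[i], stride) for i in range(n_layers)
        ]
        n_out = len(per_layer[0])
        features.append(
            [sum(layer[j] for layer in per_layer) / n_layers for j in range(n_out)]
        )
    return features


# Hypothetical input: 14 depth layers x 30 one-minute samples.
data = [[20.0 - depth + 0.01 * t for t in range(30)] for depth in range(14)]
# Hypothetical kernels: 5 groups, one length-2 averaging kernel per layer.
kernel_groups = [[[0.5, 0.5] for _ in range(14)] for _ in range(5)]

features = extract_features(data, kernel_groups, stride=4)
print(len(features), len(features[0]))  # 5 8
```

In this sketch the time dimension shrinks because of the stride, while the spatial dimension of the output equals the number of kernel groups regardless of how many depth layers are supplied.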

Feature Classification Module
The feature recognition network of this algorithm is composed of a two-layer fully connected neural network, which does not have feature extraction capability itself but only performs a nonlinear combination of features [46]. After the output layer, a softmax layer is added to calculate the probability of belonging to each category. The softmax expression used in this algorithm is shown in Formula (4).
$$\mathrm{output} = \left[\frac{e^{x_1}}{e^{x_1}+e^{x_2}},\ \frac{e^{x_2}}{e^{x_1}+e^{x_2}}\right] \tag{4}$$
In the formula, output represents the output of the neural network, and $x_1$ and $x_2$ represent the two nodes of the output layer. The output of the classification network is a 1 × 2 matrix, where output(0) and output(1) are the probabilities of no internal wave and of an internal wave, respectively.
Finally, the recognition results are shown in Formula (5). In the formula, $y_{pre}$ represents the prediction result, and P represents the judgment probability.
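As a small illustration of the classification output, the following sketch computes the two-node softmax and then applies a decision rule. Formula (5) is not reproduced in this text, so the rule used here (report an internal wave when output(1) exceeds the judgment probability P) and the default P = 0.5 are assumptions, as are the function names. Subtracting the larger logit before exponentiating is a standard numerical safeguard that leaves the softmax value unchanged.

```python
import math


def softmax2(x1, x2):
    """Two-node softmax; returns (output(0), output(1))."""
    shift = max(x1, x2)          # avoids overflow, does not change the result
    e1 = math.exp(x1 - shift)
    e2 = math.exp(x2 - shift)
    total = e1 + e2
    return (e1 / total, e2 / total)


def predict(output, p=0.5):
    """Assumed decision rule: 1 (internal wave) if output(1) > P, else 0."""
    return 1 if output[1] > p else 0


output = softmax2(-1.2, 2.3)     # hypothetical output-layer node values
y_pre = predict(output)
print(output, y_pre)
```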

Materials

Data Collection
The Bailong buoy [47,48] was independently integrated and developed by the First Institute of Oceanography of the Ministry of Natural Resources. The buoy device consists of a buoy body, an anchor system, a power supply unit, a meteorological sensor, a hydrological sensor, and data acquisition control and communication units, as shown in Figure 3A. Comparison and testing against the ATLAS and TFLEX buoys of the United States show that all the data from the Bailong buoy have excellent performance [49]. The temperature profile data of the Bailong buoy, deployed at 9.6° N and 95.6° E in the Andaman Sea in the Indian Ocean on 14 December 2018, were used in this experiment. The buoy observed continuously for 11 months and 17 days and was recovered on 10 November 2019 (as shown in Figure 3B,C). A total of 18 layers of self-contained RBR sensors are installed on the buoy anchorage (the layout location is shown in Table 1). The sensor types include T, CT, and CTD. The sensor sampling interval is set to 1 min, and the layout depth is 0–600 m, with densely spaced sensors at 0–200 m and sparsely spaced sensors at 200–600 m.

Data Annotation
Data annotation is divided into two parts. First, the start time ($T_s$), extreme time ($T_e$), and end time ($T_f$) of internal waves are annotated based on the vertical sea temperature profile diagram (as shown in Figure 4). Second, the amplitude (H, Formula (6)) of internal waves is determined via the variation in thermocline depth (D). In this study, the depth of the 14-degree isotherm is used as the thermocline depth. The vertical velocity component ($V_p$, Formula (7)) is calculated based on the amplitude and duration of the internal waves. An event is classified as an internal wave when the amplitude is greater than 15 m and the vertical velocity component exceeds 1 m/s [50]. In total, 1641 internal waves have been labeled.

Feature Selection
According to the collected data, the ocean temperature profile is drawn. As shown in Figure 5, when the water depth is less than 200 m, the temperature changes significantly with increasing water depth, while when the water depth is greater than 200 m, the temperature does not change significantly with increasing water depth. This paper detects the presence of internal waves through changes in the vertical distribution of water temperature. Therefore, the temperature data from 14 layers of sensors are selected as the input features for the neural network.

Data Annotation and Splitting
The labeled dataset is divided into three parts: a training dataset, a validation dataset, and a testing dataset. The training dataset and validation dataset are used to train the neural network, and the testing dataset evaluates the performance of the final network model. The buoy data collected from 14 December 2018 to 24 January 2019 are taken as the testing dataset, and the buoy data collected from 24 January 2019 to 9 November 2019 are used as the training dataset and validation dataset. The division ratio of the training dataset and validation dataset is 8:2. Samples are added to the dataset in a loop, with a new sample being added every other minute. The specific partition of the dataset is shown in Table 2.
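The sample construction and the 8:2 division can be sketched as a sliding window over the per-minute record. The 30-sample window matches the input length stated for the network; the step between consecutive windows, the chronological (unshuffled) split, and the function names are assumptions made for illustration. Each sample here is stored time-major (30 records of 14 layer temperatures), which would need transposing to match the 14 × 30 layout described for the network input.

```python
def make_windows(profile, window=30, step=1):
    """profile: per-minute records, each a list of layer temperatures.

    Returns overlapping windows of `window` consecutive records, with
    consecutive windows starting `step` records apart.
    """
    return [
        profile[s:s + window]
        for s in range(0, len(profile) - window + 1, step)
    ]


def split_train_val(samples, train_fraction=0.8):
    """Chronological 8:2 split (assumed; the source gives only the ratio)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical record: 100 minutes of 14-layer temperatures.
profile = [[25.0 - 0.5 * layer for layer in range(14)] for _ in range(100)]

samples = make_windows(profile)
train, val = split_train_val(samples)
print(len(samples), len(train), len(val))  # 71 56 15
```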

Model Training
First, the parameters of the feature extraction network and the feature classification network are initialized. Second, the model is trained on the training set: the overall loss of the model is calculated through forward propagation, and the model parameters are updated through back propagation according to the loss. Finally, the validation dataset is used to evaluate the model and determine whether it has converged. If the model has converged, training is complete and the model parameters are fixed; otherwise, the model has not reached its optimum, and the parameters continue to be adjusted until the model converges. The training process is shown in Figure 6. In the proposed model, the cross-entropy loss function [51,52] is applied, as shown in Formula (8); the optimizer is Adam, and the learning rate is set to 0.001.
In the formula, $y_0$ and $y_1$ indicate that the real label is no internal wave and internal wave, respectively, and output(0) and output(1) are the predicted probabilities of no internal wave and of an internal wave, respectively.
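For reference, the loss can be sketched with the symbols defined above. Formula (8) is not reproduced in this text, so the sketch assumes the standard two-class cross-entropy, loss = -(y_0 · log output(0) + y_1 · log output(1)), for a single sample. The small `eps` clamp is an added numerical guard and not part of the source, and the function name is hypothetical.

```python
import math


def cross_entropy(y, output, eps=1e-12):
    """y: one-hot label (y_0, y_1); output: (output(0), output(1)).

    Standard two-class cross-entropy for one sample; `eps` only guards
    against log(0) and is not taken from the source.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y, output))


# A confident correct prediction gives a small loss,
# a confident wrong prediction gives a large one.
print(cross_entropy((0, 1), (0.1, 0.9)))
print(cross_entropy((0, 1), (0.9, 0.1)))
```

In a PyTorch implementation this role would typically be played by the framework's built-in cross-entropy loss together with the Adam optimizer at the stated learning rate of 0.001.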

Experimental Evaluation
In this paper, accuracy (Formula (9)), recall (Formula (10)), precision (Formula (11)), F1 score (Formula (12)) [53], and delay (Formula (13)) are used as metrics of the internal wave recognition algorithm. Accuracy measures the proportion of all samples that are classified correctly. Recall measures the proportion of actual internal waves that the model identifies. Precision measures the proportion of samples identified as internal waves that are actually internal waves. Generally, recall and precision are both expected to be high, but in some cases, the two indicators conflict. Therefore, the F1 score is used to reconcile recall and precision. In addition, the difference between the time when internal waves are recognized and the marked start time of the internal waves is defined as the delay. As shown in Table 3, the presence of internal waves is defined as a positive object, while the absence of internal waves is defined as a negative object.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{12}$$
$$\mathrm{Delay} = IP - SP \tag{13}$$
In the formulas, TP indicates the number of samples that are correctly classified as having internal waves, TN indicates the number of samples that are correctly classified as having no internal waves, FP indicates the number of samples that are incorrectly classified as having internal waves, and FN represents the number of samples that are incorrectly classified as having no internal waves. IP represents the internal wave-identified position, and SP represents the internal wave-start position marked in the dataset.
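The metrics can be computed directly from the confusion-matrix counts using their standard definitions. The counts and time stamps below are made up for illustration only, and averaging the delay over the detected waves is an assumption about how a single delay figure is reported.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def mean_delay(identified, starts):
    """Mean of IP - SP over waves (averaging is an assumption)."""
    return sum(ip - sp for ip, sp in zip(identified, starts)) / len(starts)


# Illustrative counts, not results from the paper.
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(metrics)

# Illustrative identified and marked start times, in minutes.
print(mean_delay([105, 212], [100, 206]))  # 5.5
```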
To explore the effectiveness of the feature extraction network, this paper also compares the data correlation and the number of features (N) before and after feature extraction. Finally, to study the practicability of the algorithm, we calculate the storage cost and computing cost of different model structures; the storage cost is measured by the number of model parameters, and the computing cost is measured by the number of floating-point operations (FLOPs).
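To make the two cost measures concrete, the sketch below counts parameters and FLOPs for a two-layer fully connected classifier fed with either the 420 raw features or the 40 compressed features. The hidden width of 16 is hypothetical, since the source does not state it here, and the FLOPs convention (one multiply and one add per weight, ignoring biases and activations) is one common choice rather than the paper's stated method. The cost of the convolutional front end itself is not included.

```python
def dense_layer_cost(n_in, n_out):
    """Parameters and approximate FLOPs of one fully connected layer."""
    params = n_in * n_out + n_out   # weights plus biases
    flops = 2 * n_in * n_out        # one multiply and one add per weight
    return params, flops


def classifier_cost(n_features, hidden, n_classes=2):
    """Total cost of a two-layer fully connected classifier."""
    p1, f1 = dense_layer_cost(n_features, hidden)
    p2, f2 = dense_layer_cost(hidden, n_classes)
    return p1 + p2, f1 + f2


HIDDEN = 16  # hypothetical hidden width, not taken from the source

raw = classifier_cost(420, HIDDEN)        # classifier on uncompressed input
compressed = classifier_cost(40, HIDDEN)  # classifier on extracted features
print(raw, compressed)  # (6770, 13504) (690, 1344)
```

Because the first layer dominates both counts, shrinking the input from 420 to 40 features reduces the classifier's parameters and FLOPs by roughly an order of magnitude in this sketch.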

Experimental Environment
In this study, all experimental code is written in Python using the PyTorch neural network framework; the installed software versions are Python 3.8.10, torch 1.11.1, and CUDA 11.3. The computing unit uses an RTX 2080 Ti graphics card with 11 GB of video memory, and the machine has 40 GB of RAM.

Algorithm Recognition Effect
To validate the performance of the algorithm in internal wave recognition and confirm its convergence, the training and testing dataset accuracy curves for internal wave recognition were plotted in this study (Figure 7). The graph shows that the overall recognition performance of the algorithm is satisfactory. The validation dataset accuracy reaches 96.84% after 100 training iterations, and the overall accuracy curve shows that the accuracy converges steadily after around 50 iterations.

Reliability Verification
To compare the artificial intelligence method with the threshold method [19,20], the threshold method is applied to the same test set as the artificial intelligence method to identify internal waves. The threshold method decides whether an internal wave is present by checking whether the range of temperature change within 30 min exceeds a threshold (θ). The effects of different thresholds on the experiment are shown in Table 4. When the threshold is set to 2.5 °C, the recall rate is close to 100%, but the precision is only 45.49%. As θ increases, recall decreases sharply, precision increases sharply, and the delay becomes longer. When θ = 5 °C, the precision is 97.65%, but the corresponding recall is only 65.87% and the delay reaches 8.2759 min. Therefore, the threshold method cannot balance recall against precision, and its reliability cannot be guaranteed in practical applications.
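The threshold rule described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the method of [19,20]: it assumes one temperature sample per minute at a single depth, takes the "range of temperature change" to be the maximum minus the minimum over a trailing 30-sample window, and flags a sample when that range is at least θ. The function name and the synthetic series are hypothetical.

```python
from collections import deque


def threshold_detect(temps, theta, window=30):
    """Flag each sample whose trailing-window temperature range is >= theta.

    Assumptions: `temps` holds one reading per minute, and the range is
    max - min over the last `window` samples (fewer at the series start).
    """
    flags = []
    buf = deque(maxlen=window)
    for t in temps:
        buf.append(t)
        flags.append(max(buf) - min(buf) >= theta)
    return flags


# Synthetic series: steady 20 °C with one dip to 16 °C (hypothetical data).
temps = [20.0] * 30 + [19.0, 17.5, 16.0, 17.0, 19.0] + [20.0] * 30

for theta in (2.5, 3.0, 5.0):
    flags = threshold_detect(temps, theta)
    first = flags.index(True) if any(flags) else None
    print(theta, first)
```

On this synthetic dip, a lower θ fires one sample earlier than a higher one, and θ = 5 °C never fires, which mirrors the trade-off between recall and delay described in the text.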
Compared with the threshold method, the recognition effect of the artificial intelligence method is significantly improved, as shown in Figure 8. The feature extraction network extracts and strengthens the internal wave signs and discards irrelevant features. The feature recognition network is trained on historical data to fit the internal wave signs, so it takes more internal wave sign elements into consideration and achieves a better recognition effect than the threshold method. In the experimental results, the recall rate reached 95.31%, the precision was 97.53%, and the delay was reduced to 5.0862 min. Therefore, precision is improved while the recall rate remains at a high level, which greatly enhances the algorithm's reliability in practical applications.
To further verify the reliability of the artificial intelligence algorithm used in this paper, the artificial intelligence algorithm and the threshold method are compared against the observed vertical temperature structure of actual internal waves. In this comparison, a period in which internal waves are recognized is defined as a low state, and a period in which internal waves are not recognized is defined as a high state. The comparison results are shown in Figure 9. Compared with the threshold method, the artificial intelligence method can identify more internal waves, as shown in Figure 9A,B. The threshold method cannot identify these internal waves because their rate of temperature change is slow. In addition, the artificial intelligence method produces fewer misidentifications, as shown in Figure 9C,D. The threshold method misidentifies periods in which the temperature tends to rise or fall but the change is not sufficient to define the period as an internal wave period. Artificial intelligence algorithms can improve the accuracy and reliability of internal wave recognition by extracting and identifying internal wave signs.

Validity Verification of the Feature Extraction Network
In this paper, the convolution stride and the number of convolution kernels of the feature extraction network are adjusted to improve the effectiveness of the features in the time dimension and the space dimension, respectively. The internal wave recognition effect is best when the convolution stride is 4 and the number of convolution kernels is 5. The feature extraction network has 420 input features and 40 output features, so the feature compression rate reaches 90.48%. The correlation matrices of the input and output data of the feature extraction network are shown in Figure 10. The feature extraction network reduces the correlations present in the original data.
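The 90.48% figure follows directly from the input and output feature counts. The short check below is only arithmetic on the numbers quoted in the text; the function name is hypothetical.

```python
def compression_ratio(n_in, n_out):
    """Fraction of input features removed by the feature extraction network."""
    return 1.0 - n_out / n_in


ratio = compression_ratio(n_in=420, n_out=40)
print(f"{ratio:.2%}")   # 90.48%

# With 5 kernels producing 40 features, each kernel contributes 8 outputs.
print(40 // 5)          # 8
```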

Sampling Step Selection of Convolution Operation
To compare the effects of different convolution sampling steps on the experiment, this study evaluates convolution strides from 1 to 7. Figure 11 shows that when the convolution stride increases from 1 to 4, the changes in recall, precision, F1 score, and delay are not obvious. This is because the collected underwater temperature profile series is a continuously collected time series, so moderately increasing the sampling interval by increasing the convolution stride has little effect on the final internal wave recognition results. However, when the stride is raised to 5 and beyond, the F1 score shows a significant downward trend and the delay becomes noticeably longer. This method therefore selects a convolution stride of 4 to sample the sequence; when the number of convolution kernels is 8, the number of output features is 64, as shown in Table 5.
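How the stride controls the number of output features can be illustrated with the standard output-length formula for a one-dimensional convolution without dilation. The sequence length and kernel size used below are assumptions, not values stated in this section: they are chosen only because they give 8 outputs per kernel at stride 4, which is consistent with 8 kernels yielding 64 output features.

```python
def conv1d_output_length(length, kernel_size, stride, padding=0):
    """Output length of a 1-D convolution (no dilation)."""
    return (length + 2 * padding - kernel_size) // stride + 1


SEQ_LEN = 30      # assumed number of time steps per sample (hypothetical)
KERNEL_SIZE = 2   # assumed kernel width (hypothetical)
N_KERNELS = 8     # kernel count used for the stride comparison in the text

for stride in range(1, 8):
    per_kernel = conv1d_output_length(SEQ_LEN, KERNEL_SIZE, stride)
    print(stride, per_kernel, N_KERNELS * per_kernel)
```

Under these assumed sizes, moving from stride 1 to stride 4 cuts the output from 29 to 8 values per kernel, which is where most of the time-domain compression comes from.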

Selection of the Number of Convolution Kernels
To reduce the spatial information redundancy of the input data, this paper selects an appropriate number of convolution kernels to obtain more effective spatial features for internal wave recognition. The numbers of convolution kernels tested in the experiment range from 1 to 9. With 1 convolution kernel, the algorithm does not converge; the experimental results for 2 to 9 convolution kernels are shown in Figure 12. When the number of convolution kernels increases from 2 to 3, the recognition effect of the algorithm does not change noticeably. When the number of convolution kernels is increased from 3 to 5, the recognition effect of the algorithm is significantly improved. When the number of convolution kernels is raised from 5 to 9, the internal wave recognition effect does not improve further. The final number of convolution kernels selected for this algorithm is therefore 5, and the number of output features is 40 when the convolution stride is 4, as shown in Table 6.

Practical Verification
By comparing the recognition results of different network structures, the one-layer convolutional neural network plus fully connected internal wave recognition network used in this paper achieves the best result. Figure 13 shows the precision-recall curves of the various methods, in which the curve of this method is significantly higher than those of the other network structures. As shown in Table 7, the effect of internal wave recognition is significantly improved after adding the feature extraction network, and the feature extraction network using one-dimensional convolution also performs better than the other feature extraction networks. This method has an F1 score that is 3.4% higher than that of the structure without the feature extraction network, and the delay is reduced by 1.22 min. Compared with the two-layer CNN and three-layer CNN feature extraction network structures, the F1 score is improved by 2.46% and 2.14%, respectively, and the delay is reduced by 0.53 min and 1.09 min, respectively. Compared with LSTM, the delay is shorter by 0.5 min, and the F1 score is higher by 2.12%.
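A precision-recall curve such as the one compared in Figure 13 is obtained by sweeping the decision threshold over the network's output scores. The sketch below shows one common way to compute the curve points; it is an illustration with hypothetical scores and labels, not the evaluation code behind the figure, and it assumes at least one positive label.

```python
def precision_recall_curve(scores, labels):
    """Return [(threshold, precision, recall)] for each distinct score,
    from the highest threshold to the lowest.

    A sample is predicted positive when its score is >= the threshold.
    Assumes `labels` contains at least one positive (1) entry.
    """
    total_pos = sum(labels)
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    curve = []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume every sample that shares this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((thr, tp / (tp + fp), tp / total_pos))
    return curve


# Hypothetical scores and ground-truth labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for thr, precision, recall in precision_recall_curve(scores, labels):
    print(thr, round(precision, 3), round(recall, 3))
```

A method whose curve lies above another keeps higher precision at the same recall, which is the sense in which one curve is "higher" in the comparison.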
In terms of the number of network parameters, this method reduces the input features of the feature recognition network by selecting an appropriate convolution stride and number of convolution kernels, so the number of parameters and the amount of computation of the algorithm are greatly reduced. As shown in Table 8, the number of parameters and the amount of calculation for this method are 1593 and 3024, respectively. Compared with the direct feature recognition method, the number of parameters is reduced by 88.2%, and the amount of calculation is reduced by 77.66%.
According to the analysis of the recognition effect, the number of parameters, and the amount of calculation, the algorithm achieves a good recognition effect with few parameters and little computation. With 1593 parameters and 3024 FLOPs, it requires very little storage capacity and computing power from the equipment. The algorithm can therefore be deployed directly in the controller of the intelligent buoy to meet the need for automatic recognition of internal waves at the buoy end.
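The reported percentage reductions relate the proposed network to the direct feature recognition baseline as reduction = 1 - reduced / baseline. As a rough consistency check, the sketch below back-calculates the baseline sizes implied by the quoted figures. These implied values are an inference from rounded percentages, not numbers quoted from Table 8, and the function names are hypothetical.

```python
def relative_reduction(baseline, reduced):
    """Fractional reduction of `reduced` relative to `baseline`."""
    return 1.0 - reduced / baseline


def implied_baseline(reduced, reduction):
    """Baseline size implied by a reduced size and a fractional reduction."""
    return reduced / (1.0 - reduction)


# Figures quoted in the text: 1593 parameters (88.2% fewer),
# 3024 FLOPs (77.66% fewer).
params_baseline = implied_baseline(1593, 0.882)
flops_baseline = implied_baseline(3024, 0.7766)
print(round(params_baseline))   # 13500
print(round(flops_baseline))    # 13536
```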

Discussion and Future Work
Internal wave detection poses several challenges and technical issues. One significant challenge is the variability and complexity of internal wave patterns, which makes their identification difficult. Additionally, the presence of noise in in situ observations further complicates the detection process. Another challenge is the lack of standardized methods for internal wave detection, which leads to inconsistencies in data analysis and comparison across different studies.
Currently, some researchers have utilized deep learning methods in conjunction with satellite remote sensing images to recognize internal waves [28,54]. The basic principle involves identifying internal waves by observing the bright and dark patterns on the remote sensing images. However, due to specific conditions and time constraints, it is difficult to make continuous observations in specific areas, which limits the continuity and comprehensiveness of internal wave data. In addition, weather conditions, such as cloud cover and atmospheric interference, can also degrade image quality and affect the accuracy of internal wave identification.
In this paper, we propose several innovative methods to address the challenges of internal wave detection. On the one hand, we utilize a CNN algorithm, which takes advantage of its ability to learn complex patterns and features from field measurements. We employ advanced preprocessing techniques to improve the quality of input data and minimize noise interference. Our algorithm combines adaptive threshold and feature extraction techniques to improve the accuracy of internal wave identification.
On the other hand, we carefully select the parameters of the convolutional neural network to reduce the algorithm's parameters and computational complexity without compromising the detection performance. This allows us to deploy the algorithm in buoy systems in the future, which will help buoy systems efficiently process the redundant raw temperature profile data in any weather condition. By compressing some of the data, we can significantly reduce the computational and storage requirements without significantly affecting the detection results.
It should be noted that certain considerations need to be taken into account when applying the algorithm. Fine-tuning of key parameters may be necessary to optimize the algorithm's performance for different datasets and observational conditions. It is essential to use a diverse range of training data types, including observed and modeled data and high- and low-resolution data, to ensure the algorithm's robustness and generalizability.
The potential applications of this technology extend beyond the study area to other marine regions where internal wave phenomena occur. Furthermore, the proposed method can be applied to the observation and analysis of other mesoscale atmospheric and physical oceanic phenomena, such as typhoons and eddies, and to marine ecological studies. By expanding its application, this technology contributes to a better understanding of the ocean environment and its various dynamics.
In summary, this paper addresses the challenges in internal wave detection by introducing an innovative deep learning-based approach. The proposed method has the potential to be widely applied in various marine regions and opens the door to further development in the field of physical and biological oceanography.

Conclusions
This study presents an automated algorithm for recognizing internal waves in oceanographic buoy data based on convolutional neural networks (CNN). By exploiting the local connectivity of CNN, the algorithm effectively compresses the raw data, thereby significantly reducing the input dimension of the feature recognition network. To assess the reliability, practicality, and effectiveness of the feature extraction network, we conducted experiments using training, validation, and testing sets of Bailong buoy data. The results demonstrate that the CNN-based approach markedly improves both the recall and the precision of internal wave recognition, achieving a high level of performance. Moreover, we introduce an efficient feature extraction network that effectively reduces computational complexity and the number of algorithm parameters. This research forms the groundwork for automating the dependable recognition of internal waves in intelligent buoy systems. In future work, we aim to further refine and optimize the algorithm while exploring its application in broader contexts to contribute to the advancement of oceanographic observation and early warning systems.

Figure 1. Internal wave recognition network structure diagram.

was continuously observed for 11 months and 17 days and recovered on 10 November 2019 (as shown in Figure 3B,C).
J. Mar. Sci. Eng. 2023, 11, x FOR PEER REVIEW 6 of 19

Figure 3. (A) Bailong buoy structure diagram; (B) the Bailong buoy located at 9.6° N, 95.6° E in the Andaman Sea, Indian Ocean (indicated by the star in the image); and (C) the observation map of the Bailong buoy from 24 December 2018 to 10 November 2019.

The sensors are positioned at depths of 15 m, 20 m, 30 m, 40 m, 50 m, 60 m, 70 m, 80 m, 100 m, 120 m, 140 m, 160 m, 180 m, 200 m, and 250 m. The selected sensor locations are mainly concentrated in the depth range of 40 m to 200 m, with the sensor coverage depth appropriately expanded.

Figure 5. Diagram of temperature change in ocean profile.

Figure 7. The convergence curve of the algorithm.

Figure 8. Comparison chart between the threshold method and the artificial intelligence method (the threshold method achieves its best F1 score when θ = 3, so the experimental results for θ = 3 are selected for comparison).

Figure 10. Comparison of the input-output correlation matrix of the feature extraction network. (A) Input feature correlation matrix. (B) Output feature correlation matrix.

Figure 11. Influence of the convolution stride of the convolution operation on the result of internal wave recognition.

Figure 12. The influence of the number of convolution kernels on internal wave recognition.

Figure 13. Precision-recall curve comparison of different networks.

Table 3. Confusion matrix for the binary classification of whether internal waves are present.

Table 4. Experimental results of the threshold method.

Table 5. The corresponding relationship between the convolution stride and the output features of the feature extraction network.

Table 6. The corresponding relationship between the number of convolution kernels and the output features of the feature extraction network.

Table 7. Influence of different network structures on internal wave recognition results.

Table 8. Comparison of the number of parameters and computation of different networks.