Denoising Ocean Turbulence Microstructure Signals for Application in Estimating Turbulence Kinetic Energy Dissipation Rates Based on EMD-PCA

When ocean turbulence signals are collected using turbulence observation instruments in real marine environments, the effective signals in the acquired data set are often polluted by noise. In order to eliminate the noise component contained in the non-stationary and nonlinear ocean turbulence signals, a new multi-scale turbulence signal denoising method is proposed by combining the empirical mode decomposition (EMD) and principle component analysis (PCA). First, the time series of turbulence signals are decomposed into a couple of components by EMD algorithm and approximately calculate the noise energy in each intrinsic mode function (IMF). Then, PCA is implemented on each IMF. The appropriate principal components are selected according to the decomposition characteristics of PCA and the noise energy proportion in IMF. Each IMF is reconstructed by the selected principle components. At last, the effective ocean turbulence signals are reconstructed by the corrected IMFs and the residue. Ocean turbulence signals collected in the South China Sea (SCS) are used to evaluate the effectiveness of the proposed method. The results show that the proposed method can effectively eliminate the noise and maintain the characteristics of the effective turbulence signals under high noise. Turbulence kinetic energy (TKE) is also estimated from the denoised signals, which provide a reliable data basis for the analysis of the turbulent characteristics in later stage.


Introduction
Ocean turbulence mixing is one of the most challenging research fields in physical oceanography as it is unsteady and irregular. As one of the main physical processes, turbulence controls the ocean overturning circulation at a wide range of temporal and spatial scales because of its influences on climate. At small scales, turbulence affects the ocean circulation and ecosystems by enhancing the transport of heat and nutrients within the water column. Therefore, the study of ocean turbulence can not only improve people's understanding of energy exchange in the marine environment and global climate change, but also strengthen the prediction system of marine climate and disasters [1][2][3].
Highly accurate turbulence data is the basis to analyze ocean turbulence characteristics. Oceanic turbulence measurements have been taken for more than 50 years since the pioneering work by Grant in 1962 [4]. In the open ocean, turbulence is usually assessed by measuring velocity fluctuations at dissipation scales using shear probes, while temperature fluctuations are measured with thermistors or platinum film thermometers. By mounting on different ocean observation platforms, the shear probe can continuously achieve nents from the original complex signal to represent most of the information of the original signal [13]. The turbulence denoising method based on EMD-PCA is designed to conduct in the frequency domain, which can achieve the optimal estimation of the observation spectrum and the dissipation rate of turbulence kinetic energy (TKE). Through decomposing the original turbulence signal into IMFs and estimating the noise energy using PCA algorithm, the appropriate principal components are selected for reconstruction. The sea data collected in the South China Sea with a moored turbulence measuring instrument (MTMI) is used to evaluate the validity of the improved turbulence denoising method. The outline of this paper is as follows. The turbulence observation instrument and the deployment are introduced in Section 2. Section 3 mainly presents the improved turbulence denoising method based on EMD-PCA in detail and gives the process to calculate the TKE dissipation rate. Sea data results to verify the validity of the improved denoising method are shown and discussed in Section 4, and conclusion remarks are given in Section 5.

Instrument and Deployment
The sea data used to verify the validity of the turbulence denoising method is collected in the South China Sea (21 • 09.900 N 117 • 42.031 E) with a moored turbulence measuring instrument (MTMI) designed by Ocean University of China (OUC). The MTMI is designed to collect abundant turbulence microstructure time series from the deep sea for a long time at a fixed level. In this experiment, the instrument was deployed in about 250-m-deep water, and it successfully collected turbulence microstructure time series for about 110 days.
The main structure of the MTMI is shown in Figure 1. The core component of MTMI is the turbulent observing part (part B in Figure 1), which in the middle of the nose contains two orthogonal shear probes (PNS07) to measure the cross-stream velocity fluctuations. In the design, one of the shear probes is oriented to sense horizontal velocity fluctuations (∂w/∂t), and the other responds to the athwartships velocity fluctuations (∂v/∂t). To monitor the status of the instrument under the seawater, the three orthogonal accelerometers are housed in the instrument cabin by a righthand cartesian coordinate, which can provide information on three directions. All the voltage output of the sensors are digitized by 16-bit A/D conversion and finally stored in the storage card. Meanwhile, part C mounted a current meter to measure the current speed, direction and the ability of the instrument to respond to the ambient currents. Part A is mainly the glass floats designed to provide buoyancy and ensure the stability of the whole system. Part E is the anchor block. Under the effects of part A and part E, the whole system can maintain a vertical state underwater. To recover the whole system, the acoustic release in part D opens and releases the anchor block, then the whole system ascends to the sea surface under the effect of buoyancy. The structure of MTMI and the field test location of it in the South China Sea are shown in Figure 1. The asterisk in Figure 1b represents the field test location and the horizontal and vertical coordinates represent longitude and latitude, respectively.

Multi-Scale Turbulence Signal Denoising Method Based on EMD-PCA
The EMD (empirical mode decomposition) method is a signal processing technique, which was originally developed to process nonstationary and nonlinear data, such as experimental turbulence records. Now, it has been further applied successfully to a variety of physical systems. The key benefit of using EMD is that it is automatic and fully data adaptive.
The EMD method decomposes the original signal ) (t S into a finite number n of intrinsic mode functions (IMFs), which represent different time scales and frequency from high to low [11]: IMFs can be expressed as the amplitude and phase of the j mode, respectively. Each IMF is characterized by a time-dependent ) (t j ω , and a typical time scale can be obtained by averaging over the whole time interval. It can be seen that the essence of EMD decomposition is to decompose the characteristics of different scales in the signal. The signal is gradually decomposed to form a series of IMF components, and each IMF component reflects the intrinsic modal characteristics of the different scales. Here, we analyze the signals in terms of energy. We use the EMD method to extract IMFs from a single mixed signal, which is a combination of a source signal and compound noises. The obtained IMF component can be defined as [14]:  The EMD (empirical mode decomposition) method is a signal processing technique, which was originally developed to process nonstationary and nonlinear data, such as experimental turbulence records. Now, it has been further applied successfully to a variety of physical systems. The key benefit of using EMD is that it is automatic and fully data adaptive.
The EMD method decomposes the original signal S(t) into a finite number n of intrinsic mode functions (IMFs), which represent different time scales and frequency from high to low [11]: IMFs can be expressed as IMF j (t) = A j (t) cos Φ j (t), where A j (t) and Φ j (t) represent the amplitude and phase of the j mode, respectively. Each IMF is characterized by a timedependent ω j (t), and a typical time scale can be obtained by averaging over the whole time interval. It can be seen that the essence of EMD decomposition is to decompose the characteristics of different scales in the signal. The signal is gradually decomposed to form a series of IMF components, and each IMF component reflects the intrinsic modal characteristics of the different scales.
Here, we analyze the signals in terms of energy. We use the EMD method to extract IMFs from a single mixed signal, which is a combination of a source signal and compound noises. The obtained IMF component can be defined as [14]: Let F j = im f j and F j = g j + n j ,where, g j is the effective signal component in im f j and n j is the noise component in im f j . Then, the energy of each IMF component is recorded as ε(F j ), ε(g j ), ε(n j ). Based on the assumption that the effective signals have no correlation with the noise component, the denoising process can be considered as removing the noise energy ε(n j ) from the ε(F j ) [15].
In order to find uncorrelated dominant basis components, the PCA algorithm is used. By employing singular value decomposition, matrix U is referred to as a row basis representing the principal components of Y L×P . The singular values represent the standard deviations proportional to the amount of information contained in the corresponding principal components [16,17]. A reduced set of P basis vectors are selected from U L×P using P first singular values.
According to the decomposition characteristics of the EMD method, im f 1 has the maximum amount of noise [10]. When using PCA to remove the noise component in im f 1 , we only retain the first principal component, and the obtained component can be defined by g 1 = [u 1 0 0 · · · 0]P = u 1 P 1 and the noise removed can defined by We record the energy of noise in im f 1 as W 1 , which is calculated as [18,19]: where, N represents the appropriate number of principal components. For im f j (j ≥ 2), it is necessary to select an appropriate number of principal components for signal reconstruction.
As each IMF component can be defined as im f j = F j = g j + n j , we transform each IMF into zero mean matrix according to PCA algorithm: F j has the same noise component as im f j and E[.] represents the mathematical expectation. After decomposition of the signal F j using PCA, we can obtain the principal component that is expressed as P 1 , P 2 , · · · P K and λ 1 , λ 2 , · · · λ K , respectively. Where P i and λ i represent as eigenvectors and eigenvalues of the covariance matrix, K is the length and reconstruction is conducted. The de-noising signal is then expressed as: The noise component removed from F j can be defined as: According to the noise energy model, the energy of noise component in im f j (j ≥ 2) can be approximately calculated according to W 1 , which can be expressed as: [20].
The energy of signal F j and noise n j are expressed as ε(F j ) and ε(n j ) , respectively. Here, ε(F j ) and ε(n j ) can be calculated as [21,22]: Then, the signal to noise ratio can be obtained as follows: If the appropriate number N of principal components is selected, the energy of removed noise n j will be the same as the noise energy contained in the original signal, that considered that the noise part of the signal F j is completely eliminated. The number of principal components N should meet the following requirements: Once the number of principal components N is determined, the reconstructed signal g j can be obtained.

Turbulence Signal Denoising Algorithm Based on EMD-PCA
Based on the above analysis, the specific steps of the proposed EMD denoising algorithm based on PCA are as follows: Step 1: Preprocessing the observed original turbulent shear voltage signal and converting the voltage signal into physical turbulent shear signal x(t). Decompose the signal by EMD method and obtain the IMF components of each layer, which are recorded as im f 1 (t), im f 2 (t) · · · im f m (t), as well as the remainder r m (t).
Step 2: Calculate the covariance matrix C m×m as well as its eigenvalues λ 1 ≥ λ 2 ≥ · · · ≥ λ m and eigenvectors U = [u 1 , u 2 , · · · , u m ] of each IMF component im f j . Apply the PCA method on every im f j and obtain the principal component Step 3: For the first IMF layer component im f 1 , let the number of principal components N = 1 and reconstruct the signal based on the first principal component. The denoised signal is referred to as g 1 . Calculate the noise energy ε(n 1 ) , and estimate the noise energy for other IMF components im f j (j ≥ 2).
Step 4: For other IMF components im f j (j ≥ 2), choose the appropriate number of principal components to reconstruct the signal and get the denoised signal g j .
Step 5: Cumulate and reconstruct the turbulence shear signal based on the g j (j ≥ 1) of each layer and the remainder r m (t) The denoised turbulence shear signalx(t) is obtained.
Step 6: Based on the obtainedx(t), calculate the wavenumber spectrum and the dissipation rate of TKE (turbulence kinetic energy). Compare the wavenumber spectrum with the standard Nasmyth theoretical spectrum to evaluate the turbulence shear signal denoising effect.

Result and Analysis
In this work, the EMD-PCA algorithm is used to eliminate the noise of the observed ocean turbulence data, and the standard Nasmyth theoretical spectrum is selected as the standard to evaluate the denoising effectiveness. In the filed test, the instrument collected abundant sea data from the deep sea for a long time at a fixed level. The shear signal sample processed in this paper is 3 min long, and the corresponding velocity U = 0.28 m/s, which represents a 250-m velocity profile.
Turbulence shear signals are measured from two orthogonal shear probes equipped on the turbulence observation system. The two shear probes can measure the velocity shear (∂w/∂z and ∂v/∂z) in two orthogonal directions, respectively. The output voltage of the shear probe can be expressed as [23]: where U is the velocity of the shear probe, w is the velocity perpendicular to the probe and S is the sensitivity of the shear probe. Based on the Taylor frozen assumption, we can obtain the shear signals [24]: The unprocessed shear signal sample contains contamination from the instrument motions and vibrations, which obscures the true environmental shear signal in the time domain (shown in Figure 2). The blue wavy lines in Figure 2b represents the raw collected voltage and the black wavy lines represents the turbulence shear calculated using Equations (11) and (12). In Figure 2 Comparison of the raw velocity shear signals and the decomposed IMF modes is shown in Figure 3. It is easy to find that after the resampling, the resampled signals hold the same detail information as the original signals. Here, we only present one of the two perpendicular velocity components that decomposed using the EMD method. All the IMF modes have different mean frequency and time scales. Comparison of the raw velocity shear signals and the decomposed IMF modes is shown in Figure 3. It is easy to find that after the resampling, the resampled signals hold the same detail information as the original signals. Here, we only present one of the two perpendicular velocity components that decomposed using the EMD method. All the IMF modes have different mean frequency and time scales.
Comparison of the raw velocity shear signals and the decomposed IMF modes is shown in Figure 3. It is easy to find that after the resampling, the resampled signals hold the same detail information as the original signals. Here, we only present one of the two perpendicular velocity components that decomposed using the EMD method. All the IMF modes have different mean frequency and time scales. In Figure 3, each IMF modes has its own time scales and frequency. IMF1-IMF9 represent the different parts of the original ocean turbulence shear signal with the frequency from high to low. The time scales of the nine IMF modes become larger as the orders increase, while the frequency and amplitudes become smaller. The varia-  In Figure 3, each IMF modes has its own time scales and frequency. IMF1-IMF9 represent the different parts of the original ocean turbulence shear signal with the frequency from high to low. The time scales of the nine IMF modes become larger as the orders increase, while the frequency and amplitudes become smaller. The variation tendency of IMF1 is the most similar to the original shear signals. Frequency comparison of the raw velocity shear signals and the decomposed IMF modes is shown in Figure 4. The horizontal axis represents the frequency and the vertical axis represents the amplitude.   PCA is used to analyze the principal component of the obtained IMF1 and estimate the noise energy. After the noise energy contained in all IMF components are estimated, the appropriate number of principal components is selected for each IMF component. After the denoising of IMF components in each layer is completed, the reserved IMF components and the remainder are accumulated and reconstructed. So, the turbulence shear signal after denoising is obtained. The comparison between the reconstructed turbulence shear signal and the unprocessed one are shown in Figure 5.  PCA is used to analyze the principal component of the obtained IMF1 and estimate the noise energy. After the noise energy contained in all IMF components are estimated, the appropriate number of principal components is selected for each IMF component. After the denoising of IMF components in each layer is completed, the reserved IMF components and the remainder are accumulated and reconstructed. So, the turbulence shear signal after denoising is obtained. The comparison between the reconstructed turbulence shear signal and the unprocessed one are shown in Figure 5.
cies of IMF modes from IMF5 to IMF9.
PCA is used to analyze the principal component of the obtained IMF1 and estimate the noise energy. After the noise energy contained in all IMF components are estimated, the appropriate number of principal components is selected for each IMF component. After the denoising of IMF components in each layer is completed, the reserved IMF components and the remainder are accumulated and reconstructed. So, the turbulence shear signal after denoising is obtained. The comparison between the reconstructed turbulence shear signal and the unprocessed one are shown in Figure 5. To further verify the effectiveness of the improved algorithm based on EMD-PCA, we compare it with the classical cross spectrum algorithm. Signals processed with the EMD-PCA algorithm are transformed into the frequency domain using the Fast Fourier Transform (FFT). Based on the Taylor freezing theorem, the shear frequency power spectrum is converted to wavenumber domain k = f /U and can be compared with the standard Nasmyth theoretical spectrum, which is regarded as a criterion for evaluating the effectiveness of turbulence signals. The dissipation rate of turbulence kinetic energy is calculated with the corrected shear signals using spectral integration [23]: where ∂w/∂z is the vertical shear, Φ(k) is the wavenumber spectrum of the shear signals, ν is the coefficient of viscosity and ν ≈ 1.64 × 10 −6 m 2 s −1 .
The comparison results are shown in Figure 6. In Figure 6, the three solid curves are the observed spectra, respectively, where the black curve is the raw signal spectrum before denoising, the red one is the spectrum after being corrected with the improved method proposed in this paper and the blue one is the spectrum after being corrected with the cross-spectrum denoising algorithm. The green dashed curve is the Nasmyth theoretical spectrum. The current speed is 0.34 m/s in Figure 6a, 0.28 m/s in Figure 6b and 0.19 m/s in Figure 6c.
It is obvious that compared with the cross-spectrum denoising algorithm, the observed spectrum corrected with the improved denoising algorithm based on EMD-PCA better eliminates the noise spike and the spectrum shape agrees much closer with the Nasmyth theoretical spectrum up to 80 cpm, which means the low wavenumber noise apparent in the raw spectrum is removed more effectively. Furthermore, the dissipation rate of the TKE computed with the corrected shear signals using the improved denoising algorithm drops near an order of magnitude compared to the raw measured data, which provides reliable and effective data for the further analysis of the turbulence characteristics. the observed spectra, respectively, where the black curve is the raw signal spectrum before denoising, the red one is the spectrum after being corrected with the improved method proposed in this paper and the blue one is the spectrum after being corrected with the cross-spectrum denoising algorithm. The green dashed curve is the Nasmyth theoretical spectrum. The current speed is 0.34 m/s in Figure 6a, 0.28 m/s in Figure 6b and 0.19 m/s in Figure 6c. It is obvious that compared with the cross-spectrum denoising algorithm, the observed spectrum corrected with the improved denoising algorithm based on EMD-PCA better eliminates the noise spike and the spectrum shape agrees much closer with the Nasmyth theoretical spectrum up to 80 cpm, which means the low wavenumber noise apparent in the raw spectrum is removed more effectively. Furthermore, the dissipation rate In addition, the improved denoising algorithm based on EMD-PCA is also performed on two days of turbulent velocity shear signals. Here, we also add the variation of the corresponding measured current velocity, shown in Figure 7a. The dissipation rate of TKE before and after denoising are calculated and compared, which are shown in Figure 7b. The variation curve of the turbulence kinetic energy dissipation rate represents the semi-diurnal tide characteristic. When the macroscopic parameters fluctuate with the tide period, the instantaneous local flow characteristics, as well as the turbulent structure, will be affected by a variety of mechanisms. In addition, this work also adopts two criteria to evaluate the denoising effectiveness of the improved denoising algorithm based on EMD-PCA: MSE (mean square error) and SNR (signal-to-noise ratio). MSE is always used to evaluate the change degree of the data and the smaller of MSE, the better of the algorithm performance. SNR represents the ratio between signal and noise and the higher of SNR, the less noise it contains and the better of the algorithm efficiency [25,26]. (15) In this work, the turbulent velocity shear signals that collected at different current  In addition, this work also adopts two criteria to evaluate the denoising effectiveness of the improved denoising algorithm based on EMD-PCA: MSE (mean square error) and SNR (signal-to-noise ratio). MSE is always used to evaluate the change degree of the data and the smaller of MSE, the better of the algorithm performance. SNR represents the ratio between signal and noise and the higher of SNR, the less noise it contains and the better of the algorithm efficiency [25,26].
SNR = 10 × log 10 x(t) (x(t) −x(t)) 2 (15) In this work, the turbulent velocity shear signals that collected at different current speed are processed using the improved denoising algorithm based on EMD-PCA. The calculated results of MSE and SNR at different current speed are shown in Table 1. It can be seen that at different current speed, the values of MSE calculated from the original signal and the denoised signal is relatively small; meanwhile, the SNR is relatively high, which can verify the good denoising effectiveness.

Conclusions
Ocean turbulence collected signals are the basis to analyze the turbulence mixing characteristics. In this work, aiming at eliminating the noise component in the collected turbulence signals, an improved turbulence denoising method combining the EMD and PCA is proposed. First, the raw turbulence signals are decomposed using EMD method and noise energy of each IMF is estimated. Second, the PCA method is implemented on each IMF to select the appropriate number of principal components. Finally, the denoised signal is reconstructed based on the selected effective IMF. Ocean turbulence shear signals collected in the South China Sea with a moored turbulence observation instrument are used to evaluate the performance of the improved denoising method. Samples under different current speed are chosen. By comparing with the cross-spectrum denoising method, the results show that the improved method can effectively and reliably eliminate the vibration noise. Two days of dissipation rates of TKE before and after denoising are also calculated and compared, which proved that the improved method can well eliminate the noise and can supply effective turbulence signals for further analysis of ocean turbulence.
Author Contributions: Conceptualization, X.C. and X.Z.; methodology, X.C.; analyzed the data, X.C. and X.Z.; wrote the paper, X.C.; writing-original draft preparation, X.C.; writing-review and editing, Y.L. and X.L.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.