A Modulation Recognition System for Underwater Acoustic Communication Signals Based on Higher-Order Cumulants and Deep Learning

Abstract: Underwater acoustic channels, influenced by time-varying, space-varying, frequency-varying, and multipath effects, pose significant interference challenges to underwater acoustic communication (UWAC) signals, especially in non-cooperative scenarios. Recognizing the modulation of the resulting distorted signals is therefore very difficult. Although traditional modulation recognition methods can be useful in the radio field, they often prove inadequate in underwater environments. This paper introduces a modulation recognition system for UWAC signals based on higher-order cumulants and deep learning. The system achieves blind recognition of received UWAC signals even under non-cooperative conditions. Higher-order cumulants are employed because of their excellent noise resistance, which enables OFDM signals to be differentiated from PSK and FSK signals. Additionally, the differences among the higher-order spectra of the signals are used for the intra-class recognition of PSK and FSK signals. Both simulation and lake test results substantiate the effectiveness of the proposed method.


Introduction
The ocean contains abundant untapped resources. To safeguard maritime sovereignty and maritime rights and interests, the demand for underwater information transmission is becoming increasingly urgent. However, electromagnetic waves suffer severe attenuation in underwater media, which greatly limits the direct application of wireless technologies to underwater communication. Alternatives such as underwater photonic communication offer higher transmission bandwidth and data transfer rates, lower link delays, and high security because they are difficult to detect. However, losses caused by complex turbulence, bubbles, seawater absorption, and light scattering in the underwater environment restrict their transmission distance and reliability [1,2]. As a result, underwater acoustic communication currently serves as the primary form of underwater information transmission and is gaining importance.
In underwater acoustic communication (UWAC), information is transmitted via sound waves and is loaded onto the signal by mapping it to parameters such as phase, frequency, and amplitude. In practical applications, there are two scenarios in which an underwater communication receiver receives and processes signals. In a cooperative communication scenario, the receiver must identify the modulation method, also known as the information loading method, before it can accurately restore the data carried by the signal. In a non-cooperative communication scenario, the reconnaissance party needs to quickly and accurately classify the modulation methods of intercepted signals, which is a prerequisite for further analyzing and mastering the enemy's communication information. Even if the information cannot be deciphered directly, identifying the modulation method makes it possible to send deceptive signals that interfere with the enemy's communication, which offers a new idea for underwater communication countermeasures. Building a smooth and secure underwater communication network in both civilian and military fields requires the automatic modulation classification (AMC) of UWAC signals. Therefore, AMC holds great significance.
Figure 1.

The likelihood-based hypothesis testing method, which is grounded in decision theory, treats modulation recognition as a probabilistic hypothesis testing problem. It establishes the maximum likelihood discrimination formula of the signal, derives the optimal decision threshold, compares a chosen statistic of the signal with that threshold, and decides the signal modulation type [3]. Despite its robust theoretical foundation, modulation recognition based on the maximum likelihood ratio hypothesis test requires prior knowledge, such as the signal's mean, variance, and covariance, which is hard to acquire in typical non-cooperative communication settings. This results in practical difficulties in application, insufficient generalization and robustness, and high computational complexity. Therefore, this method is not commonly used at present [4].
The feature-based statistical pattern analysis method selects and extracts the modulation-type features of the signal, transferring them from the object space to the feature space. It then analyzes the distribution properties of the various features, identifies their clustering patterns, partitions the feature space, and maps the feature space to the decision space to make classification decisions, ultimately accomplishing modulation recognition [5]. The disadvantage is that the recognition framework of this method currently lacks a unified and complete theoretical basis, and the recognition system is relatively complex. The algorithm usually extracts features from specific signal samples, and the decision threshold is set manually based on experience, so the recognition performance is strongly affected by changes in noise and environment. In unfamiliar waters where the channel is not ideal, the features become blurred or even invalid, and the system's robustness is insufficient [6].
The machine learning-based signal modulation recognition method first extracts specific signal features and then uses machine learning classifiers to differentiate the signal's modulation type based on the differences in those features [7]. Because the features are designed manually according to specific criteria and are selected on that basis, the feature extraction procedure and the learning of the final prediction model are separate. Consequently, the acquired features may not enhance the final model's performance or guarantee its capacity for generalization.
The deep learning-based modulation recognition method employs deeper neural networks and data-driven techniques to solve the modulation recognition problem without human intervention [8,9]. It extracts complex features directly from raw data using a feature extraction network and makes recognition decisions using a classification network, thereby achieving end-to-end feature extraction and recognition. Compared to conventional methods, it eliminates the need for complex manual feature selection and, given sufficient training samples, can optimize the feature extraction network and the classifier jointly. This results in excellent communication signal recognition performance and makes it an important technical method for target feature extraction and recognition [10].
Thus, in comparison to the preceding three approaches, deep learning possesses potent representation learning capabilities, enabling the automatic extraction of various intricate features from raw data. In the face of a complex underwater acoustic channel, a deep learning-based modulation recognition classifier has good robustness and can adapt to changes in the underwater environment. Even under low signal-to-noise ratio (SNR) conditions, it can still effectively complete the signal modulation recognition task.
In 1998, Nandi, A.K. et al. systematically expounded the theory and algorithms of automatic modulation recognition for communication signals [11]. Their simulation results indicate that the artificial neural network method outperforms the decision theory method. In 2016, O'Shea, T.J. et al. applied convolutional neural networks to communication signal recognition for the first time [12]. By constructing an end-to-end convolutional neural network, they successfully identified the modulation methods of 11 different signals. In 2018, the team further demonstrated through experimental research that deep learning (DL) is more capable in radio modulation recognition than computer vision (CV) and machine learning (ML). Their signal recognition models based on the VGG-Net and ResNet networks can identify 24 signal modulation methods [13]. Jeong, S. et al. [14] employed the short-time Fourier transform (STFT) to generate a time-frequency diagram of the communication signal. A CNN was then used to capture the signal characteristics present in the time-frequency diagram, and the FSK, PSK, and QAM modulation methods were recognized. Zhang, Z. et al. proposed a feature fusion scheme for AMC based on convolutional neural networks [15]. By filtering the pseudo-Wigner-Ville distribution and the Born-Jordan distribution, this method transforms the signal into two time-frequency images and employs a fine-tuned CNN model to extract image features. The simulation results showed that the scheme can achieve a classification accuracy of 92.5% when the SNR is 4 dB.
The aforementioned accomplishments are outcomes in the domain of modulation recognition for wireless communication. In the complex time-varying and space-varying underwater acoustic channel environment, these wireless techniques need to be improved to suit modulation recognition in UWAC. Cheng, E. et al. proposed a method for recognizing MPSK-like UWAC signals that contain Gaussian white noise and multipath. Initially, this technique applies the wavelet transform to the signals and uses the variance of the amplitude of the processed signals for inter-class recognition, to ascertain whether they are PSK-like modulation signals. Finally, the fourth-order cumulant of the signals is computed for intra-class recognition of PSK-like signals. The simulation results showed that, at an SNR of −5 dB, the identification accuracy for both BPSK and QPSK was 100% [16]. To address the problem of acquiring high-quality labeled data in time-varying and space-varying underwater channels, Xu, Z. et al. presented a semi-supervised learning-based blind modulation recognition technique called SSLUWA [17]. The technique employs linear interpolation to pseudo-label unlabeled signals and trains the classification system using the interpolation consistency principle to extract classification features from unlabeled signals and test the knowledge learned from labeled signals, improving recognition accuracy when labeled data are scarce. Experimental results showed that, compared with the fully labeled case, when labeled samples accounted for only 10% of the data, the recognition accuracy was 99% at an SNR of 2 dB.
In the context of non-cooperative reception, Wang, B. et al. proposed a deep learning-based blind detection technique for underwater acoustic communication signals [18]. The technique uses an impulse noise preprocessor (INP) and a generative adversarial network (GAN) to denoise the received signals, mitigating the adverse effects of underwater impulse noise. An automatic feature extraction technique based on a convolutional neural network (CNN) is then employed to differentiate underwater acoustic communication signals from underwater noise. In subsequent research [19], they introduced a hybrid neural network (HNN) for modulation classification of UWAC signals. The network uses the same INP as in [18] for noise reduction and then employs a CNN with an attention mechanism to extract signal features for the recognition of 2FSK, 4FSK, 8FSK, PSK, OFDM, and other signals. A sparse autoencoder is then used for the intra-class recognition of PSK-like signals, distinguishing between BPSK and QPSK. Field trial results showed that, except for 8FSK, the recognition accuracy for the other signals was above 84.5%. Lastly, in [20], the authors presented IAFNet, an underwater acoustic communication modulation recognition technique suitable for small-sample conditions. It includes an impulse noise preprocessing module (INP), an attention mechanism (AN), and few-shot learning (FSL). The fundamental concept is to extract similarity features from a handful of labeled signals and unlabeled signals, assign weights to the features using an attention network, and then feed them into a similarity comparison module to ascertain the modulation type of the unlabeled signals. The field trial results indicated that, aside from BPSK and QPSK signals, which had a 30% probability of being classified erroneously as OFDM signals, the recognition accuracy for the other signals exceeded 93%, demonstrating the effectiveness of the proposed approach.
Wang, J. et al. proposed a deep fusion neural network model called R&CNN (recurrent and convolutional neural network) for modulation classification of UWAC signals [21]. The model constructs a recurrent layer using a gated recurrent unit (GRU), which can mitigate the interference of the Doppler effect by memorizing and processing signal sequences. In the Yellow Sea zone of China, they assembled a dataset containing measured signals of seven modulation types, namely BPSK, QPSK, BFSK, QFSK, 16QAM, 64QAM, and OFDM. The classification accuracy on the field trial data reached 99.38%, outperforming traditional methods such as AlexNet8, LSTM, and CNN-LSTM.
This paper presents a modulation recognition system designed for the blind identification of received passband signals in non-cooperative settings, particularly tailored to complex underwater channel environments. The system covers modulation techniques such as binary phase shift keying (BPSK), quadrature phase shift keying (QPSK), 2-frequency shift keying (2FSK), 4-frequency shift keying (4FSK), and orthogonal frequency division multiplexing (OFDM). The primary contributions of this paper are as follows:
(1) An improved sixth-order cumulant is proposed for the identification of OFDM signals.
(2) An improved bispectrum is proposed for the identification of BPSK, QPSK, 2FSK, and 4FSK signals.
(3) Extensive simulations are conducted in a theoretical computer simulation environment and a Bellhop simulation environment. The simulations involve the addition of realistic in-band colored noise and multipath effects to the transmitted signals.
The simulation results validate the effectiveness of the proposed techniques. Moreover, field experiments conducted in an actual lake environment furnish additional proof of the effectiveness and resilience of the proposed system.

Signal Model
Several factors, including specific application scenarios, transmission distances, and bandwidth limitations, need to be considered when selecting modulation schemes in underwater acoustic communication. Different modulation schemes are suitable for different channel conditions. This paper considers common underwater acoustic modulation signals: BPSK, QPSK, 2FSK, 4FSK, and OFDM. BPSK is generally suitable for poor channel conditions, such as high-noise environments or strongly fading channels. Its two phase states provide high tolerance to phase offset and inter-symbol interference, leading to better error rate performance in low-SNR environments. QPSK is suitable for relatively good channel conditions, such as moderate-SNR environments. QPSK divides the signal into four phase states, allowing more information to be transmitted under the same bandwidth and power conditions. Therefore, in good channel conditions, QPSK can provide higher transmission rates. Both 2FSK and 4FSK are frequency shift keying modulations and are suitable for channels with pronounced frequency-selective fading. As underwater acoustic channels are afflicted by frequency-selective fading, FSK modulation schemes adapt to the frequency variations of the channel by changing the carrier frequency, thereby improving communication reliability. Of the two, 4FSK divides the signal into four frequency states and can therefore transmit more information under the same bandwidth and power conditions, providing higher data rates than 2FSK. OFDM is appropriate for channels characterized by frequency-selective fading and multipath propagation. As underwater acoustic channels are susceptible to these issues, OFDM divides the spectrum into multiple subcarriers and transmits them orthogonally, effectively minimizing the inter-symbol interference caused by multipath propagation and thereby enhancing anti-interference performance and spectral efficiency.
When there is no relative movement between the transmitter and receiver, the underwater acoustic channel can be modeled as a coherent multipath UWAC channel. In this scenario, the received signal on a single channel can be expressed as follows:

$$r(t) = h(t) \otimes s(t) + n(t) \tag{1}$$

Here, $r(t)$ denotes the received signal, $h(t)$ represents the impulse response of the UWAC channel, $s(t)$ represents the transmitted passband modulated signal, $\otimes$ denotes the convolution operation, and $n(t)$ represents the ambient noise in the ocean environment.
If the transmitted signal $s(t)$ is an MPSK signal, its expression is parameterized by the magnitude of the signal $A$, the carrier frequency $f_c$, the symbol index $k$, the number of symbols $M$, and the initial phase $\theta$.
If $s(t)$ is an MFSK signal, its expression is parameterized by the signal amplitude $A$, the carrier frequencies $f_1, f_2, \ldots, f_M$, the phase shifts $\phi_1(k), \phi_2(k), \ldots, \phi_M(k)$ corresponding to the $k$-th symbol, and the total number of symbols $M$.
If the transmitted signal $s(t)$ is an OFDM signal, its expression is written in terms of $x_k(t)$, the complex-valued symbol transmitted on the $k$-th subcarrier at time $t$; $f_k$, the frequency of the $k$-th subcarrier; $N$, the total number of subcarriers; and $j$, the imaginary unit.
To obtain $x_k(t)$ in the time domain, the inverse Fourier transform of the corresponding frequency-domain symbol $X_k$ is taken, where $X_k$ denotes the complex-valued symbol transmitted on the $k$-th subcarrier in the spectral domain.
The OFDM signal is formed by concatenating the time-domain symbols for all subcarriers. Usually, the OFDM signal is transmitted through a frequency-selective channel, where each subcarrier undergoes a different channel gain and phase shift. At the receiver, the signal is demodulated by performing a Fourier transform to recover the frequency-domain symbols $X_k$, which are then used to retrieve the original data.
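The signal model above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: the exact waveform expressions are not reproduced in this section, so the textbook passband forms used here (phase $2\pi k/M$ for MPSK, continuous-phase tone switching for MFSK, a sum of subcarriers spaced by `delta_f` for OFDM) are assumptions, as are all function names and the numeric parameters.

```python
import numpy as np

def mpsk(symbols, M, fc, fs, sps, A=1.0, theta=0.0):
    """Passband MPSK (assumed form): symbol k in {0..M-1} sets the phase 2*pi*k/M."""
    k = np.repeat(np.asarray(symbols), sps)        # hold each symbol for sps samples
    t = np.arange(k.size) / fs
    return A * np.cos(2 * np.pi * fc * t + 2 * np.pi * k / M + theta)

def mfsk(symbols, freqs, fs, sps, A=1.0):
    """Passband MFSK (assumed form): symbol k selects freqs[k]; phase stays continuous."""
    f_inst = np.repeat(np.asarray(freqs, dtype=float)[np.asarray(symbols)], sps)
    phase = 2 * np.pi * np.cumsum(f_inst) / fs
    return A * np.cos(phase)

def ofdm_symbol(X, f0, delta_f, fs):
    """One passband OFDM symbol: Re{ sum_k X[k] * exp(j*2*pi*(f0 + k*delta_f)*t) }."""
    X = np.asarray(X, dtype=complex)
    t = np.arange(int(round(fs / delta_f))) / fs   # one symbol period T = 1/delta_f
    f_k = f0 + delta_f * np.arange(X.size)
    return np.real(X @ np.exp(2j * np.pi * np.outer(f_k, t)))

def receive(s, h, noise_std, rng):
    """r = h (*) s + n with white Gaussian n, truncated to the length of s."""
    r = np.convolve(s, h)[: len(s)]
    return r + noise_std * rng.standard_normal(len(s))

rng = np.random.default_rng(0)
fs, fc, sps = 48_000, 12_000, 48                   # hypothetical sample rate, carrier, samples/symbol
bpsk = mpsk(rng.integers(0, 2, 100), 2, fc, fs, sps)
fsk4 = mfsk(rng.integers(0, 4, 100), [10e3, 11e3, 12e3, 13e3], fs, sps)
ofdm = ofdm_symbol(np.exp(1j * np.pi / 2 * rng.integers(0, 4, 64)), 8e3, 100.0, fs)
r = receive(bpsk, h=[1.0, 0.0, 0.0, 0.4], noise_std=0.1, rng=rng)
```

The `receive` helper uses white noise and a fixed two-tap response purely for illustration; in-band colored noise or a Bellhop-derived response could be substituted without changing the structure.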

Underwater Acoustic Channel Analysis
A frequency-selective channel is a channel in which the transmitted signal undergoes frequency-selective fading. This type of channel attenuates the signal to different degrees at different frequencies, which changes the spectrum of the transmitted signal. Multipath propagation refers to the phenomenon in which the signal reaches the receiver through multiple paths, each with a different propagation distance and propagation time. This produces multiple copies of the signal with different time delays and amplitudes at the receiver, and their superposition results in multipath interference. For frequency-selective channels, multipath propagation can cause different degrees of interference at different frequencies, resulting in spectrum expansion and distortion. From the perspective of ray acoustics, the signal is emitted from the sound source and travels along different paths to reach the receiver. At the receiver, the composite received signal arises from the combination and interference of the transmitted signal across all possible acoustic paths. For simplicity, we assume that each path is sufficiently stable over a given communication duration. Under this assumption, the underwater acoustic channel can be approximated as a static time-invariant channel, and its impulse response is a one-dimensional function of the delay $\tau$, denoted as follows:

$$c(\tau) = \sum_{p=1}^{P} \alpha_p e^{j\varphi_p} \delta(\tau - \tau_p) \tag{7}$$

In Equation (7), $\delta(\cdot)$ represents the Dirac impulse function, $P$ represents the number of paths in the channel, and $\alpha_p$, $\varphi_p$, and $\tau_p$ signify the magnitude, phase, and delay of the $p$-th path, respectively. The relevant path parameters can be obtained by solving the eigenray equations as mentioned earlier. The channel output is then obtained by convolving the channel response with the transmitted signal:

$$r(t) = \int_{-\infty}^{\infty} c(\tau)\, s(t - \tau)\, d\tau \tag{8}$$

By substituting $c(\tau)$ into the equation above, we obtain

$$r(t) = \sum_{p=1}^{P} \alpha_p e^{j\varphi_p} s(t - \tau_p) \tag{9}$$

In Equation (9), $s(t)$ and $r(t)$ indicate the emitted signal and the signal received after transmission through the underwater acoustic channel, respectively.

The Doppler shift refers to the change in frequency of a signal when either the source or the receiver is in motion. When the source or the receiver moves toward the other, the signal frequency increases; when it moves away, the signal frequency decreases. In the underwater acoustic channel, the motion of a vessel or sonar equipment can cause a Doppler shift. The Doppler shift affects the spectral composition of the transmitted signal, resulting in changes in the received signal's spectrum. When primarily considering this factor, the Doppler shift $\Delta f$ can be expressed by the following formula:

$$\Delta f = \delta f_c = \frac{v_r}{c} f_c \tag{10}$$

In Equation (10), $v_r$ signifies the relative speed between the sender and receiver and can be expressed as $v_r = v_T \cos\theta_T + v_0 \cos\theta_0$. Here, $v_T$ and $v_0$ denote the velocities of the transmitter and receiver, respectively, while $\theta_T$ and $\theta_0$ represent the angles between the transmitter and receiver velocities and the direction of sound propagation. Moreover, $f_c$ denotes the carrier frequency, $\delta$ signifies the Doppler factor, and $c$ denotes the speed of sound. As the speed of sound in the ocean (approximately 1500 m/s) is very low compared with the propagation speed in terrestrial wireless communication systems, the Doppler effect has a more significant influence on underwater acoustic communication. The Doppler effect introduces time-varying characteristics to the channel response. In underwater communication systems, the channel's time-varying nature can, to some extent, prevent the receiver from recovering the carrier, making it impossible to achieve high-rate communication using coherent transmission.

Fading refers to the amplitude attenuation that signals experience during transmission. In underwater acoustic channels, fading can be caused by various factors such as sound propagation loss, scattering, and the Doppler effect. Fading changes the amplitude of the received signal, thereby affecting signal quality and reliability. In summary, frequency-selective channels in underwater communication are affected by multipath propagation, Doppler shift, and fading. Multipath propagation causes signal spectrum expansion and distortion, the Doppler shift changes the signal spectrum, and fading changes the signal amplitude. These effects have significant impacts on the transmission quality and reliability of signals.
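The static multipath model and the Doppler relation described above can be sketched in a few lines. This is an illustrative sketch only: the tap-sum form with a complex factor $e^{j\varphi_p}$, the rounding of delays to whole samples, the relation $\Delta f = f_c v_r / c$, and the function names `multipath` and `doppler_shift` are assumptions made here, not the authors' code.

```python
import numpy as np

def multipath(s, fs, amps, phases, delays):
    """r(t) = sum_p a_p * exp(j*phi_p) * s(t - tau_p); delays (s) rounded to samples."""
    s = np.asarray(s, dtype=complex)
    shifts = np.round(np.asarray(delays) * fs).astype(int)
    r = np.zeros(s.size + shifts.max(), dtype=complex)
    for a, phi, d in zip(amps, phases, shifts):
        r[d:d + s.size] += a * np.exp(1j * phi) * s   # delayed, scaled, rotated copy
    return r

def doppler_shift(fc, v_t, theta_t, v_0, theta_0, c=1500.0):
    """Doppler shift f_c * v_r / c with v_r = v_T*cos(theta_T) + v_0*cos(theta_0)."""
    v_r = v_t * np.cos(theta_t) + v_0 * np.cos(theta_0)
    return fc * v_r / c

# Hypothetical three-path channel applied to a unit impulse.
fs = 48_000
impulse = np.zeros(64)
impulse[0] = 1.0
r = multipath(impulse, fs, amps=[1.0, 0.6, 0.3],
              phases=[0.0, np.pi / 3, np.pi], delays=[0.0, 0.5e-3, 1.0e-3])

# A 15 kHz carrier with the transmitter closing at 1.5 m/s along the propagation path.
df = doppler_shift(15_000.0, 1.5, 0.0, 0.0, 0.0)
```

With a sound speed of 1500 m/s, even a 1.5 m/s platform speed shifts a 15 kHz carrier by 15 Hz, which illustrates why the Doppler effect matters far more here than in radio links.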

Theoretical Analysis of Higher-Order Cumulant

Higher-Order Cumulant
Higher-order statistical analysis of signals, often termed non-Gaussian signal processing, refers to signal analysis that uses higher-order statistics. Classical signal processing approaches employ second-order statistics as mathematical tools for analysis, which are represented in the time domain as correlation functions and in the spectral domain as power spectra. However, these methods have certain limitations, such as multiplicity or equivalency, and cannot identify non-minimum phase systems. Moreover, they are susceptible to additive noise and can only handle signal data with additive white noise. To address these limitations, higher-order statistics are required. While analysis based on second-order statistics can only capture the primary information of the signal, namely its outline, higher-order statistical analysis can offer more intricate insights into the signal [22]. Consequently, this paper employs higher-order statistics as mathematical tools for modulation recognition.
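Reference [22] provides the following derivation. Given a random variable $x$ with probability density function $f(x)$, the initial characteristic function (also known as the moment generating function) is defined as the inverse Fourier transform of its probability density function:

$$\Phi(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{j\omega x}\, dx = E\{e^{j\omega x}\} \tag{11}$$

The cumulant generating function, alternatively referred to as the second moment-generating function, is defined as follows:

$$\Psi(\omega) = \ln \Phi(\omega) \tag{12}$$

Let the $k$-th derivative of the initial characteristic function be denoted as follows:

$$\Phi^{(k)}(\omega) = \frac{d^k \Phi(\omega)}{d\omega^k} = j^k E\{x^k e^{j\omega x}\} \tag{13}$$

By setting $\omega = 0$, we can derive the $k$-th moment of $x$ as follows:

$$m_k = E\{x^k\} = (-j)^k \Phi^{(k)}(0) \tag{14}$$

The $k$th-order cumulant of $x$ is defined as

$$c_k = (-j)^k \Psi^{(k)}(0) \tag{15}$$

We consider a zero-mean Gaussian random variable $x \sim N(0, \sigma^2)$, whose probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right) \tag{16}$$

Thus, the moment generating function of the Gaussian random variable $x$ can be formulated as follows:

$$\Phi(\omega) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left(-\frac{x^2}{2\sigma^2} + j\omega x\right) dx \tag{17}$$

According to the integral formula

$$\int_{-\infty}^{\infty} \exp\left(-A x^2 \pm 2 B x - C\right) dx = \sqrt{\frac{\pi}{A}}\, \exp\left(-\frac{AC - B^2}{A}\right) \tag{18}$$

with $A = \frac{1}{2\sigma^2}$, $B = \frac{j\omega}{2}$, and $C = 0$, the moment generating function of $x$ can be expressed as follows:

$$\Phi(\omega) = e^{-\sigma^2 \omega^2 / 2} \tag{19}$$

In this case, the derivatives of $\Phi(\omega)$ can be computed as follows:

$$\Phi'(\omega) = -\sigma^2 \omega\, e^{-\sigma^2 \omega^2 / 2} \tag{20}$$
$$\Phi''(\omega) = \left(\sigma^4 \omega^2 - \sigma^2\right) e^{-\sigma^2 \omega^2 / 2} \tag{21}$$
$$\Phi'''(\omega) = \left(3\sigma^4 \omega - \sigma^6 \omega^3\right) e^{-\sigma^2 \omega^2 / 2} \tag{22}$$
$$\Phi^{(4)}(\omega) = \left(3\sigma^4 - 6\sigma^6 \omega^2 + \sigma^8 \omega^4\right) e^{-\sigma^2 \omega^2 / 2} \tag{23}$$

Therefore, the first four moments of the Gaussian random variable $x$ are

$$m_1 = 0 \tag{24}$$
$$m_2 = \sigma^2 \tag{25}$$
$$m_3 = 0 \tag{26}$$
$$m_4 = 3\sigma^4 \tag{27}$$

Consequently, it can be generalized that, for any integer $k$, the $k$-th moment of a zero-mean Gaussian random variable $x$ is given by

$$m_k = \begin{cases} 0, & k \text{ odd} \\ 1 \cdot 3 \cdots (k-1)\, \sigma^k, & k \text{ even} \end{cases} \tag{28}$$

Subsequently, the cumulant generating function of $x$ can be derived as follows:

$$\Psi(\omega) = \ln \Phi(\omega) = -\frac{\sigma^2 \omega^2}{2} \tag{29}$$

Its derivatives are given by

$$\Psi'(\omega) = -\sigma^2 \omega \tag{30}$$
$$\Psi''(\omega) = -\sigma^2 \tag{31}$$
$$\Psi^{(k)}(\omega) = 0, \quad k \geq 3 \tag{32}$$

Therefore, the cumulants of the Gaussian random variable $x$ are given by

$$c_1 = 0 \tag{33}$$
$$c_2 = \sigma^2 \tag{34}$$
$$c_k = 0, \quad k \geq 3 \tag{35}$$

The derivation shows that the second-order moment of any zero-mean Gaussian random process equals its second-order cumulant, and both equal the variance $\sigma^2$. The odd-order moments are always zero, while the even-order moments are non-zero. The higher-order cumulants (third-order and beyond), however, are always zero. This indicates that the second-order cumulant is susceptible to additive noise, whereas the higher-order cumulants (third-order and beyond) suppress Gaussian random processes, such as Gaussian colored noise.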
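The Gaussian results above can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it estimates moments and cumulants from samples using the zero-mean relations $c_2 = m_2$, $c_3 = m_3$, and $c_4 = m_4 - 3 m_2^2$, and the helper name `moments_and_cumulants`, the sample size, and $\sigma = 1$ are arbitrary choices.

```python
import numpy as np

def moments_and_cumulants(x):
    """Sample moments m2..m4 and cumulants c2..c4 of a real sequence (mean removed)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    m2, m3, m4 = (np.mean(x ** k) for k in (2, 3, 4))
    return {"m2": m2, "m3": m3, "m4": m4,
            "c2": m2, "c3": m3, "c4": m4 - 3 * m2 ** 2}

rng = np.random.default_rng(0)
sigma = 1.0
stats = moments_and_cumulants(sigma * rng.standard_normal(200_000))

# The fourth moment should be close to 3*sigma^4, while the third- and
# fourth-order cumulants should be close to zero.
for name in ("m2", "m4", "c3", "c4"):
    print(name, round(float(stats[name]), 3))
```

The estimates fluctuate around the theoretical values with a spread that shrinks as the number of samples grows, so short observation windows will show small non-zero higher-order cumulants even for purely Gaussian noise.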
Higher-order moments are moments of order greater than two, while higher-order cumulants are cumulants of order greater than two. We denote the operation of calculating moments as mom(·) and the operation of calculating cumulants as cum(·). From these definitions, several properties of higher-order moments and higher-order cumulants can be derived. According to reference [22], in the presence of additive Gaussian colored noise, the higher-order cumulants of the observation process equal the higher-order cumulants of the non-Gaussian signal. In other words, higher-order cumulants are insensitive to Gaussian colored noise. Higher-order moments lack this semi-invariance, so the higher-order moments of the observation process may differ from those of the non-Gaussian signal, which means that higher-order moments are susceptible to Gaussian noise. Therefore, in higher-order statistical analysis, higher-order cumulants rather than higher-order moments are commonly used as tools for analyzing and processing non-Gaussian signals. In signal analysis, higher-order cumulants provide more comprehensive information about signal characteristics than higher-order moments, making them significant for the analysis and processing of non-Gaussian signals.
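The contrast between moments and cumulants under additive Gaussian colored noise can be seen in a small experiment. This is a hedged sketch: the real-valued ±1 signal, the three-tap FIR filter used to color the noise, and the function names are assumptions chosen for illustration, and the real-valued relation $c_4 = m_4 - 3 m_2^2$ for zero-mean data stands in for the general cum(·) operator.

```python
import numpy as np

def fourth_moment(x):
    """Sample fourth-order moment m4 = E[x^4]."""
    return np.mean(x ** 4)

def fourth_cumulant(x):
    """Sample fourth-order cumulant c4 = m4 - 3*m2^2 of a zero-mean real sequence."""
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

rng = np.random.default_rng(0)
n = 400_000
s = rng.choice([-1.0, 1.0], size=n)                # non-Gaussian signal (BPSK-like symbols)

# Gaussian colored noise: white noise shaped by a hypothetical FIR filter,
# rescaled to unit variance.
h = np.array([1.0, 0.8, 0.5])
noise = np.convolve(rng.standard_normal(n), h, mode="same") / np.linalg.norm(h)

x = s + noise                                      # observation = signal + colored noise

print("m4 signal / observation:", fourth_moment(s), round(float(fourth_moment(x)), 2))
print("c4 signal / observation:", fourth_cumulant(s), round(float(fourth_cumulant(x)), 2))
```

For this setup the fourth moment of the observation is far from that of the signal (theoretically $1 + 6\sigma^2 + 3\sigma^4 = 10$ versus 1 for unit noise variance), whereas the fourth cumulant stays near the signal's value of −2, which is the semi-invariance property the text refers to.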
Common higher-order cumulants include the third-order cumulant, which is related to skewness, and the fourth-order cumulant, which is related to kurtosis. Skewness reflects the degree of asymmetry of the signal distribution. Kurtosis reflects the peakedness of the signal distribution.
In signal analysis, higher-order cumulants can be used to identify and differentiate types of signals. By calculating the higher-order cumulants of different signals, non-Gaussianity and nonlinearity features can be extracted, enabling signal classification and recognition. Additionally, higher-order cumulants can be used to detect abnormal conditions in signals, such as abnormal pulses or noise, as these often lead to significant changes in the higher-order cumulants.
In conclusion, higher-order cumulants are computed from the higher-order statistics of a signal and provide more detailed and accurate information about signal characteristics. They play a crucial role in the analysis and processing of non-Gaussian and nonlinear signals. Applying higher-order cumulants in signal processing, pattern recognition, and anomaly detection helps to better understand and exploit signal properties.
The $p$th-order mixed moment of the stationary stochastic process $x(t)$ is defined as follows [22]:

$$M_{pq} = E\left[x(t)^{p-q}\, \left(x^{*}(t)\right)^{q}\right]$$

The second- to eighth-order cumulants of $x(t)$ can be obtained from these mixed moments [23]. If $x(t)$ is a Gaussian process, the cumulants beyond the second order are always zero, so higher-order cumulants can efficiently suppress Gaussian noise. The theoretical values of the higher-order cumulants for various modulation techniques of baseband signals are detailed in Table 1 [24,25].
In Table 1, E is the power of the signal.
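As an illustration of how such cumulants are estimated from baseband samples, the sketch below computes sample mixed moments and a few cumulants. The cumulant expressions used here are the forms widely quoted in the modulation recognition literature; they are an assumption, since the paper's own expressions are not reproduced in this section, and the function names `mixed_moment` and `cumulants` are hypothetical.

```python
import numpy as np

def mixed_moment(x, p, q):
    """Sample estimate of M_pq = E[x^(p-q) * conj(x)^q] for zero-mean complex x."""
    return np.mean(x ** (p - q) * np.conj(x) ** q)

def cumulants(x):
    """Selected cumulants of a zero-mean complex baseband sequence (assumed forms)."""
    x = np.asarray(x, dtype=complex)
    M20, M21 = mixed_moment(x, 2, 0), mixed_moment(x, 2, 1)
    M40, M42 = mixed_moment(x, 4, 0), mixed_moment(x, 4, 2)
    M63 = mixed_moment(x, 6, 3)
    C21 = M21
    C40 = M40 - 3 * M20 ** 2
    C42 = M42 - abs(M20) ** 2 - 2 * M21 ** 2
    C63 = M63 - 9 * C42 * C21 - 6 * C21 ** 3   # common literature form, assumed here
    return {"C20": M20, "C21": C21, "C40": C40, "C42": C42, "C63": C63}

rng = np.random.default_rng(0)
bpsk = rng.choice([-1.0, 1.0], size=10_000)                            # unit-power BPSK symbols
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 10_000)))  # unit-power QPSK symbols

for name, sig in (("BPSK", bpsk), ("QPSK", qpsk)):
    c = cumulants(sig)
    print(name, "C42 =", np.round(c["C42"].real, 2), "C63 =", np.round(c["C63"].real, 2))
```

For noise-free unit-power symbols these expressions give $C_{42} = -2$, $C_{63} = 13$ for BPSK and values close to $-1$ and $4$ for QPSK, which agree in magnitude with the theoretical values quoted in this section.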
In OFDM systems, the spectrum is divided into multiple subcarriers, and modulation and transmission are performed on each subcarrier. To simplify system design and implementation, the modulation sequences on the subcarriers are assumed to be independent and identically distributed complex random sequences. Under this assumption, the same modulator and demodulator can be used to process every subcarrier, which simplifies the hardware design and software implementation of the system. The assumption also simplifies the implementation of key techniques, such as channel estimation and equalization. When designing an OFDM system, parameters such as the number of subcarriers, the frequency spacing between subcarriers, and the modulation scheme need to be determined. With the modulation sequences assumed to be independent and identically distributed, system performance analysis and optimization can be conducted to select appropriate parameter settings.
An OFDM signal is composed of multiple orthogonal subcarriers, with each subcarrier transmitting independent data. The frequency spacing is chosen so that the subcarriers are mutually orthogonal, which helps avoid interference between subcarriers. The time-domain waveform of an OFDM signal is formed by adding multiple sinusoidal waves together. According to the central limit theorem, when many independent random variables with finite variances are linearly summed, the distribution of the sum tends to a normal distribution. In an OFDM signal, the data on each subcarrier can be considered independent random variables, so their linear sum can be approximated as Gaussian distributed. Furthermore, OFDM signals are periodic in the time domain, while a Gaussian distribution is a stationary process that has infinite bandwidth in the spectral domain. As a result, the spectrum of an OFDM signal remains flat near the center frequency of each subcarrier and decays rapidly at the frequency spacing between subcarriers. This spectral characteristic contributes to the improved spectral efficiency and resistance to multipath fading of OFDM systems. In conclusion, the orthogonality between subcarriers and the central limit theorem together give OFDM signals their Gaussian distribution characteristics.
At the receiver, because the OFDM signal is a superposition of multiple orthogonal subcarriers, the modulation sequences on the individual subcarriers are assumed to be independent and identically distributed complex random sequences. According to the central limit theorem, the OFDM signal is therefore asymptotically complex Gaussian. The Gaussianity of OFDM depends on the number of subcarriers and becomes stronger as the number of subcarriers increases [23]. This property is independent of the modulation scheme used on the subcarriers, whether BPSK or QPSK. Since cumulants of order higher than two vanish for a Gaussian process, the higher-order cumulants of an OFDM signal tend to zero. Single-carrier modulated signals, by contrast, are not Gaussian distributed, so their C42 and C63 are non-zero.
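The dependence of Gaussianity on the number of subcarriers can be checked numerically. The following is a minimal sketch, not the paper's simulation: it assumes QPSK on every subcarrier, a baseband OFDM symbol formed by a unit-power IFFT without a cyclic prefix, and the common definition C42 = E|x|^4 − |E[x^2]|^2 − 2(E|x|^2)^2 for a zero-mean signal. The function names `c42` and `ofdm_samples` are hypothetical.

```python
import numpy as np

def c42(x):
    """Sample estimate of C42 for a zero-mean complex sequence (assumed definition)."""
    x = np.asarray(x, dtype=complex)
    m20 = np.mean(x ** 2)
    m21 = np.mean(np.abs(x) ** 2)
    m42 = np.mean(np.abs(x) ** 4)
    return float(m42 - np.abs(m20) ** 2 - 2.0 * m21 ** 2)

def ofdm_samples(num_subcarriers, total_samples=200_000, seed=0):
    """Time-domain samples of random QPSK-OFDM symbols with unit average power."""
    rng = np.random.default_rng(seed)
    num_symbols = total_samples // num_subcarriers
    idx = rng.integers(0, 4, size=(num_symbols, num_subcarriers))
    qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * idx))
    # norm="ortho" scales the IFFT by 1/sqrt(N), preserving average power.
    return np.fft.ifft(qpsk, axis=1, norm="ortho").ravel()

for n in (4, 16, 64):
    print(n, round(c42(ofdm_samples(n)), 3))
```

Under these assumptions the magnitude of the estimate shrinks roughly in proportion to 1/N, which is consistent with the statement that Gaussianity strengthens as the number of subcarriers grows.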
This paper constructs the improved cumulants C42 and C63. In the baseband, the theoretical values of C42(x(n)) for BPSK, QPSK, 2FSK, 4FSK, and OFDM signals are 2, 1, 1, 1, and 0, respectively, and the theoretical values of C63(x(n)) are 13, 4, 4, 3, and 0, respectively. The improved cumulants C42 and C63 are designed to enlarge the separation between the higher-order cumulant values of different modulation types, so that the modulation schemes can be distinguished more reliably in the presence of noise. Because of the complex noise in the underwater acoustic channel, the sixth-order cumulant is more robust than the fourth-order cumulants C42(x(n)) and the improved C42; the improved C63 is therefore used in this paper to distinguish OFDM from the other modulation schemes.
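The fourth-order values quoted above for BPSK and QPSK can be reproduced with a few lines of code. This sketch assumes ideal unit-power constellations and the common definition C42 = E|x|^4 − |E[x^2]|^2 − 2(E|x|^2)^2; the sign convention and normalization of the paper's table are an assumption here, so only the magnitudes are compared. The improved cumulants are not reproduced, because their definition is not available in this section.

```python
import cmath

def c42(samples):
    """Sample estimate of C42 for a zero-mean complex sequence (assumed definition)."""
    n = len(samples)
    m20 = sum(s * s for s in samples) / n
    m21 = sum(abs(s) ** 2 for s in samples) / n
    m42 = sum(abs(s) ** 4 for s in samples) / n
    return m42 - abs(m20) ** 2 - 2.0 * m21 ** 2

# Deterministic, equiprobable symbol sequences (hypothetical test data).
bpsk = [complex(1 - 2 * (k % 2), 0.0) for k in range(1000)]
qpsk = [cmath.exp(1j * (cmath.pi / 4 + cmath.pi / 2 * (k % 4))) for k in range(1000)]

print(c42(bpsk))  # approximately -2
print(c42(qpsk))  # approximately -1
```

The magnitudes 2 and 1 agree with the theoretical values listed for BPSK and QPSK.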

High-Order Spectral Analysis
The higher-order spectrum, also known as the polyspectrum, is a spectrum defined over multiple frequency variables. Specifically, the third-order spectrum S3x(ω1, ω2) is known as the bispectrum, and the fourth-order spectrum S4x(ω1, ω2, ω3) is commonly referred to as the trispectrum, because they are functions of two and three frequencies, respectively. It is customary to write the bispectrum as Bx(ω1, ω2) and the trispectrum as Tx(ω1, ω2, ω3). The procedure for estimating the bispectrum is described below [22].
Consider zero-mean observed samples x(0), x(1), ..., x(N − 1) with a sampling frequency of fs. The data are partitioned into K segments of length M, denoted x^(k)(0), x^(k)(1), ..., x^(k)(M − 1), where k = 1, ..., K; adjacent segments are allowed to overlap.
For each segment, the DFT coefficients are computed, followed by the triple correlation of the DFT coefficients, where ∆0 = fs/N0, and N0 and L1 must satisfy M = (2L1 + 1)N0. The final bispectrum estimate of the sample data is the average of the K segment-wise bispectrum estimates [22].
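The segment-and-average procedure can be sketched in code. The version below is an illustration under stated assumptions, not necessarily the exact estimator of [22]: it takes the special case L1 = 0 (no frequency-domain smoothing, so M = N0), normalizes the DFT by 1/M, and forms the per-segment estimate as X(λ1) X(λ2) X*(λ1 + λ2). The function name `bispectrum_direct` and the `step` parameter are hypothetical.

```python
import numpy as np

def bispectrum_direct(x, seg_len, step=None):
    """Average of per-segment triple products X(l1) X(l2) conj(X(l1 + l2))."""
    x = np.asarray(x, dtype=complex)
    step = seg_len if step is None else step  # step < seg_len gives overlapping segments
    idx = np.arange(seg_len)
    sum_idx = (idx[:, None] + idx[None, :]) % seg_len  # index of l1 + l2 (mod M)
    acc = np.zeros((seg_len, seg_len), dtype=complex)
    count = 0
    for start in range(0, len(x) - seg_len + 1, step):
        seg = x[start:start + seg_len]
        seg = seg - seg.mean()                 # enforce zero mean per segment
        coeffs = np.fft.fft(seg) / seg_len     # DFT coefficients (assumed 1/M scaling)
        acc += coeffs[:, None] * coeffs[None, :] * np.conj(coeffs[sum_idx])
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one segment")
    return acc / count

# Three phase-coupled tones at bins 8, 12 and 8 + 12 = 20 of a 64-point segment.
n = np.arange(256)
x = sum(np.cos(2 * np.pi * k * n / 64) for k in (8, 12, 20))
B = bispectrum_direct(x, seg_len=64)
print(abs(B[8, 12]), abs(B[8, 8]))
```

The bispectrum is large at the bin pair (8, 12), where the third tone sits at the sum frequency, and vanishes at (8, 8), where no component exists at bin 16.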

System Framework
The configuration of the modulation recognition system for underwater communication signals, which is based on higher-order cumulant theory and a deep learning network, is illustrated in Figure 2. The system first exploits the fact that the improved higher-order cumulant C63 of OFDM is zero, whereas the C63 values of the phase-modulated and frequency-modulated signals are non-zero. This difference allows OFDM to be distinguished from the other modulation types. The theoretical simulation results without noise are shown in Figure 3. The feature distance obtained with the improved C42 and C63 is larger than that obtained with C42(x(n)) and C63(x(n)), giving better separation, and the improved C63 separates the classes better than the improved C42. It is therefore more appropriate for the system to use the improved C63 to distinguish OFDM from the other modulation types.
Then, based on the higher-order spectral characteristics of the signals, the bispectra of signals with different modulation methods are plotted. Figure 4 shows the bispectra generated at an SNR of 10 dB. Because the directly generated bispectrum lacks prominent features and is easily obscured by noise, this paper squares the signal before estimating its bispectrum. The enhanced bispectrum of the squared signal is presented in Figure 5. The differences between the spectra of the various signals are more pronounced, which makes the squared bispectrum suitable for identifying the modulation type by image recognition.
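The two-stage decision flow described above can be summarized in a short sketch. It is only an illustration: the function name `recognize`, the class ordering, and the callable that supplies the network scores are hypothetical, and the default threshold of 0.7 is the value the paper reports later for the Bellhop and lake-trial data. The sketch assumes that values of the improved C63 below the threshold indicate OFDM, since its theoretical value for OFDM is zero.

```python
from typing import Callable, Sequence

# Hypothetical label order for the image classifier's output scores.
CLASS_NAMES = ("BPSK", "QPSK", "2FSK", "4FSK")

def recognize(c63_value: float,
              bispectrum_scores: Callable[[], Sequence[float]],
              threshold: float = 0.7) -> str:
    """Stage 1: cumulant threshold for OFDM. Stage 2: image classifier for the rest."""
    if c63_value < threshold:
        return "OFDM"
    scores = bispectrum_scores()  # evaluated only when stage 1 rejects OFDM
    best = max(range(len(CLASS_NAMES)), key=lambda i: scores[i])
    return CLASS_NAMES[best]

print(recognize(0.1, lambda: [0.0, 0.0, 0.0, 1.0]))
print(recognize(3.0, lambda: [0.1, 0.7, 0.1, 0.1]))
```

Passing the classifier as a callable keeps the costlier bispectrum and network evaluation off the path taken by signals already identified as OFDM.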

Neural Network Model
Feedforward neural networks that include convolution operations and have a deep structure are commonly known as convolutional neural networks (CNNs). The most representative of these is the LeNet model proposed by Yann LeCun in 1998, which laid the foundation for modern CNNs and earned him the title of "father of convolutional neural networks". Figure 6 displays the LeNet architecture, which consists of convolutional layers, activation layers, pooling layers, and fully connected layers, and finally uses the softmax function to classify the input image [26]. However, because of the complexity of the UWAC channel and the impact of noise, the signal may be distorted, and the key features in the squared bispectrum plot may be obscured by noise. Consequently, classical CNNs exhibit poor recognition performance at low signal-to-noise ratios. One potential solution is to increase the depth of the network so that it captures more of the information present in the images. However, blindly pursuing network depth can lead to vanishing or exploding gradients and makes manual tuning of the network parameters difficult.
Therefore, this paper uses the ResNet model proposed by He et al. in 2015 as the main structure [27]. The fundamental component of the network, the ResBlock, is depicted in Figure 7. It creates "shortcut connections" between the upper and lower layers, which eases gradient backpropagation during training and makes it possible to train deeper CNNs. This structure also has advantages from the perspective of information transmission. In conventional convolutional neural networks, information may be lost after convolution and other operations, and the loss becomes more severe as the network grows deeper. This can result in vanishing or exploding gradients, which prevent the network from training effectively. In the ResBlock, the direct bridge from the upper layer to the lower layer lets the lower layer receive both the features extracted by the convolution layer and the complete information from the upper layer, which effectively reduces the loss of information as it propagates through the network.
In ResNet, the skip connection, also known as the direct bridging operation, connects the input of a block directly to its output, enabling information to be preserved and transmitted across layers. This architecture helps resolve the problems of vanishing and exploding gradients in deep neural networks. With skip connections, the input can bypass certain layers and propagate directly to subsequent layers, thereby retaining more information and gradient.
The convolution operation is a commonly used feature extraction method in deep learning. Through convolution, neural networks can learn to extract local features from data such as images, speech, and text. The convolution kernel slides a window over the input data, combines the local values within each window with the kernel weights, and produces feature maps. Through multiple convolution operations, neural networks gradually extract more abstract, higher-level features.
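The sliding-window operation can be made concrete with a small single-channel example. This is a didactic sketch with a hypothetical function name `conv2d`; like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped), and it uses zero padding.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded 2-D `image` and return the feature map."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    padded = np.pad(image, padding)  # zero padding on every side
    kh, kw = kernel.shape
    out_h = (padded.shape[0] - kh) // stride + 1
    out_w = (padded.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)  # weighted sum over the local window
    return out

image = np.arange(16.0).reshape(4, 4)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
print(conv2d(image, identity, stride=1, padding=1).shape)
```

With a 3 × 3 kernel, a stride of 1, and a padding of 1, the feature map has the same height and width as the input.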
Batch normalization is a technique used to accelerate training and improve the performance of neural networks. It normalizes the input of each mini-batch, making the mean of each feature dimension close to 0 and the variance close to 1. This mitigates vanishing and exploding gradients while improving the stability and convergence rate of the network. In addition, batch normalization has a certain regularization effect, which can reduce the risk of overfitting.
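The normalization step can be sketched as follows. The sketch covers only the training-time computation on a (batch, features) array; the running statistics used at inference time are omitted, and the function name `batch_norm` and the default values of `gamma`, `beta`, and `eps` are assumptions made for illustration.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis, then apply scale and shift."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # 64 samples, 8 features
normalized = batch_norm(batch)
print(normalized.mean(axis=0).round(6), normalized.var(axis=0).round(3))
```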
The ReLU (rectified linear unit) is a commonly used non-linear activation function. It sets negative inputs to 0 and leaves positive inputs unchanged. ReLU is simple to compute and has good non-linear expressive power, which helps neural networks learn more complex features and represent more complex non-linear relationships. Furthermore, ReLU produces sparse activations, which help the network learn sparser feature representations and improve its generalization capability.
In summary, in ResNet the skip connection preserves and transmits information across layers, the convolution operation extracts features, batch normalization accelerates training and improves network performance, and the ReLU activation function adds non-linearity, which together improve the performance and generalization ability of the network.
In the ResBlock, the information is divided into two paths. One path proceeds directly to the subsequent layer, while the other passes through a convolution layer with a kernel size of 3 × 3, a stride of 1, and a padding of 1, followed by a batch normalization layer and a ReLU activation layer. The batch normalization layer speeds up training and improves the model's generalization capability, whereas the ReLU activation function increases the network's non-linear expressive power. The in-place option of the ReLU activation is set to True, so it operates in place and saves memory.
The overall residual network structure is depicted in Figure 8. Because the bispectrum plot is centrally symmetric, the actual network input is the lower-right quarter of the image, obtained by cropping. The image passes through a convolution layer with a 3 × 3 kernel, a stride of 1, and a padding of 1, followed by a batch normalization layer and a ReLU activation layer. It then enters the residual stage, which is composed of three of the residual blocks described above, for feature extraction. This structure is deeper than a conventional CNN. Finally, average pooling is performed, and the recognition result is output through a fully connected layer and the softmax function. The network uses the cross-entropy loss function to measure the difference between the true labels and the model's predictions for the image classification task.
The neural network in this study was trained and evaluated on an Ubuntu system with a 12th Gen Intel(R) Core(TM) i9-12900KF CPU and a GeForce RTX 3090 GPU, with 64 GB of graphics memory. The environment was built on the PyTorch deep learning framework. During training, the Adam optimizer with a learning rate of 0.001 was employed, and the batch size for both the training and testing datasets was set to 64. The network was trained for 100 epochs.
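The architecture and training settings described above could be sketched in PyTorch roughly as follows. This is a hedged illustration, not the authors' code: the class names, the channel width of 16, the three input channels, the use of adaptive average pooling, and the four output classes are assumptions, because the network parameter table is not reproduced in this section.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Skip path plus a 3x3 conv -> batch norm -> ReLU path, summed at the output."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class BispectrumResNet(nn.Module):
    """Stem conv, three residual blocks, average pooling, fully connected layer."""
    def __init__(self, in_channels=3, width=16, num_classes=4):  # assumed sizes
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(3)])
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumption: global average pooling
        self.fc = nn.Linear(width, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.fc(self.pool(x).flatten(1))  # logits; softmax is applied in the loss

def train_step(model, optimizer, images, labels):
    """One optimization step with the cross-entropy loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

model = BispectrumResNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate from the text
```

The model returns raw logits because the cross-entropy loss in PyTorch applies log-softmax internally; an explicit softmax is needed only when class probabilities are reported.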

Experiment and Verification
In this paper, a training set generated under a theoretical simulation environment was constructed, and test sets generated under a theoretical simulation environment, a Bellhop underwater acoustic simulation channel environment, and a real lake trial environment were used for testing. The empirical findings demonstrate the strength of the system in blind recognition of received signals in non-cooperative communication scenarios.

Simulation Verification
In this paper, we constructed training and test sets in a theoretical simulation environment, and the parameters of the transmitted signals are shown in Table 3.
To bring the simulation closer to the real environment and reduce the mismatch between the training data and the actual test data, the generated transmission signals were passed through a Rayleigh fading channel with 7 multipaths, and in-band colored noise in the 4000-8000 Hz band was added instead of simple white Gaussian noise. This better simulates the real underwater acoustic channel, improves the generalization of the training set, and makes the simulation results more convincing. As shown in Figure 9, the signal passed through a channel with a multipath number of 7.
The training and testing processes of the network at an SNR of 20 dB are presented in Figure 13. Figure 13a shows that the testing accuracy of ResNet approaches 100% from the start, while the testing accuracy of LeNet fluctuates significantly. Moreover, the testing loss and training loss in Figure 13b,c show that ResNet converges rapidly and reaches better performance than LeNet.
This paper further demonstrates the effectiveness of the proposed method by comparing it with the methods proposed in references [28,29]. Reference [28] directly estimated the bispectrum of the signal and used this original bispectrum as the dataset for the deep learning neural network. Reference [29] used the smoothed pseudo Wigner-Ville distribution (SPWVD) to plot the time-frequency diagram of the signal as the dataset for the neural network. Figure 14 shows the performance of the three methods in the inter-class recognition of PSK, FSK, and OFDM signals in a computer simulation environment. The original bispectrum and SPWVD methods both perform well in recognizing PSK, FSK, and OFDM signals, with an inter-class recognition accuracy of over 90% when SNR > 0 dB. The proposed squared bispectrum method achieves 100% inter-class recognition accuracy when SNR ≥ −4 dB.
Figure 15 shows the performance of the three methods in intra-class recognition of BPSK, QPSK, 2FSK, and 4FSK signals, highlighting the superiority of the proposed method. The average recognition accuracy of the original bispectrum method is 78.91%, while the average recognition accuracy of the SPWVD time-frequency method is 67.42%.
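The channel model used to build the simulated datasets in this subsection (7 multipaths with Rayleigh fading, plus noise confined to the 4000-8000 Hz band) can be approximated as in the sketch below. It rests on several assumptions that are not taken from the paper: real-valued tap gains with Rayleigh-distributed magnitudes, random tap delays of up to 50 samples, band-limited Gaussian noise as a stand-in for the colored noise, a 48 kHz sampling rate, and a 6 kHz test tone. All function names are hypothetical.

```python
import numpy as np

def rayleigh_multipath(signal, num_paths=7, max_delay=50, seed=0):
    """Convolve `signal` with a sparse impulse response of Rayleigh-distributed taps."""
    rng = np.random.default_rng(seed)
    later = rng.choice(np.arange(1, max_delay), size=num_paths - 1, replace=False)
    delays = np.concatenate(([0], np.sort(later)))  # direct path at delay 0
    taps = np.zeros(max_delay)
    taps[delays] = rng.rayleigh(scale=1.0, size=num_paths)
    return np.convolve(signal, taps)[: len(signal)]

def band_limited_noise(num_samples, fs, f_low, f_high, seed=1):
    """White Gaussian noise with all spectral content outside [f_low, f_high] removed."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.normal(size=num_samples))
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectrum[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(spectrum, n=num_samples)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the signal-to-noise power ratio equals `snr_db`."""
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = np.mean(np.abs(noise) ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

fs = 48_000                                   # assumed sampling rate
t = np.arange(4800) / fs
tx = np.cos(2 * np.pi * 6000 * t)             # placeholder in-band tone
faded = rayleigh_multipath(tx)
noise = band_limited_noise(len(faded), fs, 4000, 8000)
rx = add_noise_at_snr(faded, noise, snr_db=10)
```

Measuring the SNR after the multipath channel, as done here, is one possible convention; defining it relative to the transmitted signal would change the noise scaling.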

Bellhop Dataset Simulation Validation
In this work, a Bellhop simulation model was used to generate underwater acoustic channels. The simulated source depth was set to 100 m, the receiver depth was set to 50 m, and the horizontal communication distance between the transmitter and receiver was 1 km. The sound ray diagram of the simulated water is shown in Figure 16, and the generated channel impulse response is shown in Figure 17. It can be seen that the channel observed at the receiver comprises 7 multipaths. As shown in Figure 18, the signal was passed through the Bellhop channel, and in-band colored noise was added at SNRs from −10 dB to 20 dB in steps of 2 dB. The trends of the estimated C63(x(n)) and the improved C63 with the change of SNR were obtained. The results show that a classification threshold of 0.7 can distinguish OFDM from BPSK, QPSK, 2FSK, 4FSK and other modulated signals at an SNR of 4 dB in the Bellhop simulated underwater acoustic channel.
As shown in Figure 21, the test set generated under the Bellhop environment is more affected by the simulated underwater acoustic channel noise than the test set generated in a theoretical simulation environment. In the range of −10 dB to 0 dB, the recognition accuracies of both ResNet and LeNet decreased, but overall ResNet recognizes the signal modulation type better than LeNet. ResNet achieves 100% recognition accuracy at an SNR of 2 dB, while the recognition accuracy of LeNet still fluctuates at SNRs above 6 dB and does not reach 100%. This indicates that ResNet is more robust than LeNet to changes in SNR.

Lake-Test Dataset Verification
In September 2018, a field experiment was conducted at Danjiangkou Reservoir, Danjiangkou City, Hubei Province, China, to verify the recognition performance of the proposed modulation recognition system under practical underwater channel conditions. The experiment was conducted on the lake, and the communication distance was 1 km. The arrangement of the transmitting and receiving apparatus is illustrated in Figure 24. At the transmitting end, a laptop and a power amplifier generated the signal, which was fed to the transducer through a long cable. The receiving end used a mid-frequency underwater acoustic communication integrated machine developed by our research team, as shown in Figure 25. The parameters of the transmitted signals during the lake trial are shown in Table 4. Operations such as searching for synchronization headers and cropping were performed on the received data. The signal sampling frequency was 48,000 Hz. The length of the BPSK signal was 5 s, the length of the QPSK signal was 10 s, the length of the 2FSK signal was 10 s, the length of the 4FSK signal was 50 s, the length of the OFDM-BPSK signal was 25 s, and the length of the OFDM-QPSK signal was 11 s. A square bispectrogram was generated every 0.3 s, resulting in a total of 370 test set images.
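The image count quoted above is consistent with the signal durations, as the following arithmetic check shows. It assumes non-overlapping 0.3 s windows counted over the total recorded duration, which is an assumption about how the windows were laid out; counting windows separately per signal with rounding would give a slightly different total.

```python
FS_HZ = 48_000
SAMPLES_PER_IMAGE = FS_HZ * 3 // 10  # one 0.3 s window = 14,400 samples

# Signal lengths in seconds, as listed in the text.
durations_s = {
    "BPSK": 5, "QPSK": 10, "2FSK": 10, "4FSK": 50,
    "OFDM-BPSK": 25, "OFDM-QPSK": 11,
}

total_seconds = sum(durations_s.values())                  # 111 s
total_images = total_seconds * FS_HZ // SAMPLES_PER_IMAGE  # 370 windows
print(SAMPLES_PER_IMAGE, total_seconds, total_images)
```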
As depicted in Figure 26, the threshold for the improved sixth-order cumulant C63 was set to 0.7, which was found to be feasible based on the measured data collected during the lake trial. The system was able to effectively distinguish between OFDM and the other modulation types. As shown in Figure 27, the square bispectrum diagrams generated from offline processing of the lake trial data differ little from those of the simulated environment, indicating that the network remains effective in identifying the lake trial data.

Conclusions
This paper proposes a modulation recognition system for UWAC signals using high-order cumulants and deep learning networks. By utilizing the difference in the improved high-order cumulant C63 of signals with different modulation types, the OFDM signal is distinguished from the other modulation types. Subsequently, by leveraging the distinctions in the square bispectrum diagrams of signals with different modulation types, a deep learning network is employed for visual recognition of the images, effectively recognizing the modulation type of UWAC signals.
The simulation results show that with the improved sixth-order cumulant and a multipath number of 7, the system can distinguish between OFDM and BPSK, QPSK, 2FSK, and 4FSK signals under 0 dB conditions with a recognition accuracy of 100%. With the ResNet neural network recognizing the improved bispectrum diagram, the system can distinguish between BPSK, QPSK, 2FSK, and 4FSK modulation signals under 0 dB conditions, with a recognition accuracy of 100% in the simulation environment. In the UWAC simulation channel generated by BELLHOP, the system achieved a recognition accuracy of 100% at an SNR of 2 dB, verifying the feasibility of the simulation results. In the real lake trial environment at Danjiangkou, the recognition accuracy on the lake trial data was 98.44%, proving that the proposed system, which combines high-order cumulant theory with artificial intelligence, can achieve blind modulation recognition of common UWAC signals in non-cooperative scenarios under low SNR and underwater multipath conditions.
Figure 1 displays the two principal classes of modulation identification methods: likelihood-based (LB) hypothesis testing grounded in decision theory, and feature-based (FB) modulation classification grounded in feature extraction. FB modulation classification can be further divided into three types: statistical pattern analysis using feature extraction, modulation classification based on machine learning, and modulation classification based on deep learning.

Figure 2. Block diagram of the modulation recognition system.

Figure 3. Higher-order cumulants under a theoretical simulation environment without noise. (a) The trend of C42(x(n)) with respect to the SNR; (b) the trend of the improved C42 with respect to the SNR; (c) the trend of C63(x(n)) with respect to the SNR; (d) the trend of the improved C63 with respect to the SNR.

Figure 12. Network recognition accuracy curve under theoretical simulation conditions.

Figure 13. The training and testing process of the network at an SNR of 20 dB. (a) Test accuracy curve; (b) test loss curve; (c) train loss curve.

Figure 14. Inter-class recognition comparison curves under theoretical simulation conditions.

Figure 15. Intra-class recognition comparison curves under theoretical simulation conditions.

Figure 16. Sound ray diagram between the transmitter and receiver in the Bellhop.

Figure 18. Higher-order cumulants in the Bellhop-simulated channel environment. (a) The trend of C63(x(n)) with the change of SNR. (b) The trend of the improved C63 with the change of SNR.

Using the training set of 12,800 images generated in a theoretical simulation environment, the test set of 640 images generated by Bellhop was tested. The squared spectrogram generated by the Bellhop channel is shown in Figure 19. The testing recognition outcomes are presented in Figure 20. It is evident that ResNet can precisely discern BPSK, QPSK, 2FSK, and 4FSK signals with 100% accuracy at an SNR of 2 dB. The effectiveness is basically the same as that in the theoretical simulation environment, indicating that the quality of the training set generated by the simulation meets the simulation requirements of complex underwater acoustic channel environments.

Figure 21. Network recognition accuracy curve under the Bellhop-simulated channel conditions.

Figure 22 shows the performances of the three methods in inter-class recognition of PSK, FSK, and OFDM signals in a Bellhop-simulated underwater acoustic channel environment. Figure 23 demonstrates the performances of the three methods in intra-class recognition of BPSK, QPSK, 2FSK, and 4FSK signals. Further validation of the results in the computer simulation environment shows that both the original bispectrum and SPWVD time-frequency methods achieve inter-class recognition accuracy of over 90% for PSK, FSK, and OFDM signals at 2 dB, slightly outperforming the proposed squared bispectrum method. In intra-class recognition, the average recognition accuracy of the original bispectrum method is 75.92%, while the average recognition accuracy of the SPWVD time-frequency method is 65.15%, highlighting the superiority of the proposed method in this paper.

Figure 22. Inter-class recognition comparison curves under Bellhop simulation conditions.

Figure 24. Layout of the transmission and reception devices.

Figure 26. Improved sixth-order cumulant C63 of the lake trial data.

Figure 27. Square bispectrum diagram generated from the lake trial data. (a) BPSK; (b) QPSK; (c) 2FSK; (d) 4FSK.

As illustrated in Figure 28, the recognition outcomes of ResNet and LeNet on the lake trial data demonstrate that ResNet has effectively accomplished modulation identification of BPSK, QPSK, 2FSK, and 4FSK signals with a recognition accuracy of 98.44%, whereas LeNet achieved a recognition accuracy of 90.00%. Therefore, ResNet outperforms LeNet in terms of performance.

Table 1. Theoretical values of higher-order cumulants for various types of digital modulation signals.

Table 2 illustrates the network parameters utilized in this paper.

Table 2. Transmission signal parameters in the simulation environment. The parameter represented by −1 varies with the number of channels in the previous layer's output.

Table 3. Transmission signal parameters in a simulation environment.

Table 4. Modulation parameters of the signal transmitted during the lake trial.