Modulation Recognition Method for Underwater Acoustic Communication Signals Based on Passive Time Reversal-Autoencoder with the Synchronous Signals

In the modulation recognition of underwater acoustic communication signals, the multipath effect seriously interferes with the signal characteristics, reducing modulation recognition accuracy. Existing methods passively improve accuracy by selecting appropriate signal features and lack specialized preprocessing for suppressing multipath effects. As a result, the accuracy improvement of the designed modulation recognition models is limited, and their adaptability to environmental changes is poor. The method proposed in this paper actively utilizes the synchronous signals common in underwater acoustic communication as detection signals to achieve passive time reversal without external signals, and designs a passive time reversal-autoencoder to suppress multipath effects, enhance signal features, and improve modulation recognition accuracy and environmental adaptability. Firstly, synchronous signals are identified and their parameters are estimated. Subsequently, a passive time reversal-autoencoder is designed to enhance power spectrum and square spectrum features. Finally, modulation classification is performed using a convolutional neural network. The model is trained in simulation channels generated by Bellhop and tested in actual channels that differ from those used in training. The average recognition accuracy of the six modulated signals is improved by 10% compared to existing passive modulation recognition methods, indicating good environmental adaptability as well.


Introduction
Communication signal modulation recognition addresses the non-cooperative scenario between senders and receivers and plays an important role in information recovery. In the field of wireless communication, modulation recognition is mostly based on in-phase and quadrature (IQ) samples [1][2][3], high-order cumulant characteristics [4,5], signal instantaneous characteristics, and wavelet transform characteristics [6], after which appropriate classifiers are designed to classify modulation types. However, these recognition methods are not suitable for complex and variable underwater acoustic channels.
With the growing importance of the ocean, more and more researchers are devoted to the modulation recognition of underwater acoustic communication signals. Zhang et al. [7] used machine learning algorithms to recognize modulation based on cumulant, power spectral density, instantaneous phase, and frequency characteristics. Denis Stanescu et al. [8] used phase diagram entropy to characterize and identify various modulation types. Dai et al. [9] carried out wavelet denoising and time-frequency feature extraction for the received signal and used the decision tree model for modulation recognition. Huang et al. [10] extracted entropy features and morphological features, and designed optimized autoencoder (OAE) and evaluation-enhanced k-nearest neighbor (EEKNN) algorithms to recognize modulation types.
Deep learning-based methods have been continuously developed in recent years, gradually improving recognition performance. Existing methods are usually based on two types of features in classification: time-domain features and frequency-domain features.
The first type uses time-domain features as the recognition criteria. Alex et al. [11] designed a CNN for modulation recognition of received time-domain signals. Li et al. [12] designed a feature extraction and recognition network based on Resnet to classify the time-domain signals. Yao et al. [13] trained generative adversarial networks (GANs) based on time-domain waveform features for signal enhancement, feature extraction, and automatic modulation classification. Yu et al. [14] utilized long short-term memory (LSTM) for modulation recognition with the signals' instantaneous characteristics. Zhang et al. [15] trained a cyclic convolutional neural network with a normalized time series. Kong et al. [16] used IQ symbols to train a residual network. Wang et al. [17] proposed a sequence convolutional network to achieve modulation classification based on signals' temporal characteristics. Liu et al. [18] utilized principal component analysis to compress the original time-domain signals and then designed a deep heterogeneous network for modulation recognition. Xiao et al. [19] designed a CNN for classification based on IQ signals. However, time-domain characteristics are easily disturbed by the complex underwater acoustic channel. The above methods can only achieve ideal results when the channel conditions of the test set are the same as those of the training set. Once the conditions are inconsistent, the recognition accuracy is seriously affected.
The second type is to select frequency-domain features as the recognition basis. The power spectrum, time-frequency map, frequency spectrum, and singular spectrum are common features used for modulation recognition [20][21][22][23][24]. Jiang et al. [20] proposed a sparse automatic encoder (SAE) for feature extraction and modulation recognition based on power spectrum features. Wang Bin et al. [21] used a denoising autoencoder (DAE) to denoise signals and then used CNN to classify the modulation types based on power spectral features. Wang et al. [22] proposed a relational network and fed it with power spectrums. Xu et al. [23] trained CNN with time-frequency map features of signals. Kou et al. [24] extracted the real and imaginary parts of the signal through the Fast Fourier transform (FFT) and then designed an artificial neural network (ANN) as a feature classifier.
However, the characteristics of multiple phase shift keying (MPSK) signals are too similar to distinguish easily, so choosing only one feature has certain limitations. Therefore, some methods choose two features to distinguish MPSK. Jiang et al. [25] used principal component analysis to extract effective features from the power spectrums and square spectrums, distinguishing among multiple frequency shift keying (MFSK), binary phase shift keying (BPSK), and quadrature phase shift keying (QPSK). Li et al. [26] combined time-domain and frequency-domain features to classify modulation types: MPSK and other signals were first identified through time-domain waveform features, and then the square spectrum features were selected to identify BPSK and QPSK. Compared to time-domain features, frequency-domain features have a stronger anti-interference ability, but a single frequency-domain feature has limitations. This paper selects two frequency-domain features, power spectrum and square spectrum, to classify modulation types.
The severe multipath effect of underwater acoustic channels can seriously interfere with the time-frequency characteristics, reducing the accuracy of modulation recognition. The existing methods passively focus on the selection of signal features and cannot actively weaken the impact of the environment, resulting in a sharp decline in recognition performance when underwater acoustic channel conditions change. This paper actively utilizes commonly used synchronous signals in communication as the detection signals and designs a passive time reversal-autoencoder to improve accuracy and environmental adaptability. Firstly, we identify and estimate the types and parameters of synchronization signals, and use the recovered synchronization signals as detection signals in passive time reversal. Then, we design a passive time reversal-autoencoder (PTR-AE) for multipath suppression and signal feature enhancement. Finally, modulation recognition is performed by using CNN. The modulation classification network is trained with simulation data and tested in actual underwater acoustic channel environments which are different from the training environments. We compare the proposed model with existing methods to verify its effectiveness.
Sensors 2023, 23, 5997

Signal Model
The underwater acoustic communication signal model can be expressed as:
$$y(t) = x(t) \otimes h(t) + n(t) \tag{1}$$
where $y(t)$ is the received signal; $x(t)$ is the modulated signal sent by the transmitter; $h(t)$ denotes the underwater acoustic channel; "$\otimes$" denotes the convolution operation; $n(t)$ is additive noise. The impulse response function model for the multipath channel can be expressed as [27]:
$$h(t) = A\,\delta(t - \tau_0) + \sum_{i} A_i\,\delta(t - \tau_i) \tag{2}$$
where $A$ and $A_i$ are amplitudes; $\tau_0$ and $\tau_i$ represent time delays. The first term on the right side of the equal sign is the direct sound wave, and the second term comprises the bounded refraction and reflection waves.
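As a rough illustration of the signal model above, the following sketch builds a sparse multipath impulse response and passes a signal through it. The function names, the tap amplitudes and delays, the placeholder carrier, and the white Gaussian noise model are all illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def multipath_channel(fs, amps, delays):
    """Discrete h: one tap per path, with the first entry acting as the direct path."""
    taps = np.round(np.asarray(delays) * fs).astype(int)
    h = np.zeros(taps.max() + 1)
    np.add.at(h, taps, amps)  # accumulate in case two delays share a sample
    return h

def received_signal(x, h, snr_db, rng):
    """y = x convolved with h, plus white Gaussian noise scaled to the requested SNR."""
    clean = np.convolve(x, h)
    noise_power = np.mean(clean**2) / 10 ** (snr_db / 10)
    return clean + rng.normal(0.0, np.sqrt(noise_power), clean.shape)

fs = 96_000                                    # sampling rate in Hz
t = np.arange(int(round(0.02 * fs))) / fs      # 20 ms segment
x = np.cos(2 * np.pi * 25_000 * t)             # placeholder transmitted signal
h = multipath_channel(fs, amps=[1.0, 0.6, 0.3], delays=[0.0, 1.5e-3, 4.0e-3])
y = received_signal(x, h, snr_db=10, rng=np.random.default_rng(0))
```

The received segment is longer than the transmitted one by the channel delay spread, which is the smearing that the later passive time reversal stage is meant to counteract.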

System Model and Proposed Method
The method proposed in this paper is divided into three steps: synchronous signal recognition and parameter estimation, frequency domain feature enhancement based on PTR-AE, and classification recognition. The specific process is shown in Figure 1.
First, a synchronous signal recognition network is designed to identify the type of synchronous signal. Then, its parameters are estimated based on the fractional Fourier transform (FrFT), the Hough transform, and spectral features. Afterward, PTR-AE and CNN are designed for power spectrum feature enhancement and modulation recognition, to classify the signals into 2FSK, 4FSK, 8FSK, PSK, and OFDM.
Due to the similarity of power spectrum features between BPSK and QPSK, the square spectrum is selected as the classification feature for these two signals. Therefore, after the modulation type of the signal has been identified as PSK, PTR-AE and CNN are used for square spectrum feature enhancement and modulation recognition of BPSK and QPSK.

Structure of Synchronous Signal Recognition Network
Due to the significant differences in time-frequency characteristics among hyperbolic frequency modulation (HFM) signals, linear frequency modulation (LFM) signals, and other communication signals, we use the time-frequency features calculated by the short-time Fourier transform (STFT) to recognize them. The specific structure and parameters of the synchronous signal recognition network are shown in Figure 2, where Conv represents the convolutional layer, C represents the sliding step size of the convolutional and pooling kernels, H denotes the number of convolutional kernels, R is the convolutional kernel size and maximum pooling window size, and FC represents the fully connected layer. The synchronous signal recognition network includes convolution layers, pooling layers, a fully connected layer, and a Softmax layer. ReLU and the cross-entropy function are used as the activation function and loss function, respectively.
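To make the STFT input concrete, here is a minimal sketch of a magnitude STFT applied to an LFM segment. The window type, FFT size, hop length, and the LFM start frequency and sweep rate are hypothetical choices for illustration; the paper does not state its STFT settings in this section.

```python
import numpy as np

def stft_magnitude(sig, n_fft=256, hop=64):
    """Magnitude STFT with a Hann window; rows are frames, columns are frequency bins."""
    window = np.hanning(n_fft)
    starts = range(0, len(sig) - n_fft + 1, hop)
    frames = np.stack([sig[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 96_000
t = np.arange(8192) / fs                        # 8192-sample segment at 96 kHz
f0, k = 20_000.0, 1.0e5                         # hypothetical start frequency (Hz) and sweep rate (Hz/s)
lfm = np.cos(2 * np.pi * (f0 * t + 0.5 * k * t**2))
tf_map = stft_magnitude(lfm)                    # 2-D time-frequency map a CNN could take as input
ridge_hz = tf_map.argmax(axis=1) * fs / 256     # dominant frequency in each frame
```

For an LFM signal the ridge rises roughly linearly with time, whereas an HFM signal would trace a curved ridge; this difference in shape is the kind of cue a two-dimensional convolutional network can pick up.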

Training and Testing of Synchronous Signal Recognition Network Models
During the training process, the sampling rate is set to 96 kHz and the number of sampling points for each signal segment is 8192. Other parameters are shown in Table 1, where "/" means that the parameter is not involved, "[]" indicates that the data is randomly selected within the closed range, "{}" indicates random selection among the listed items, and "∪" is the union operator. The underwater acoustic channel data are generated by Bellhop, and the specific parameters are shown in Table 2.
During the training phase, 500 samples are generated for each modulation signal based on the parameters in Table 1. Bellhop is used to generate simulated underwater acoustic channels according to the parameters in Table 2. The signal-to-noise ratio (SNR) is set within the range from 0 to 10 dB. During the testing phase, the underwater acoustic channels of the Haihe River and the Danjiangkou reservoir are used as testing environments, with specific parameters shown in Table 3. The number of test samples for each modulation signal in each channel is 100.
Figure 3 shows the recognition accuracy of the synchronous signal recognition network in the two environments. In both channels, which differ from the training environments, the recognition accuracy exceeds 98%.

Estimation of LFM Parameters
The fractional Fourier transform (FrFT) [30] concentrates the energy of a given LFM signal at a particular order of the fractional Fourier domain. The frequency modulation rate $k_0$ of the LFM signal is related to this optimal order $p_0$ by Formula (3), where $L$ is the length of the discrete signal, $f_s$ is the sampling rate, and $p_0$ is the optimal FrFT order, ranging from 0 to 4. The initial interval of $p_0$ can be determined from the sign of the frequency modulation rate $k_0$ displayed in the spectrogram: if $k_0$ is positive, the initial range is [0, 2], and if $k_0$ is negative, the initial range is [2, 4). With the search step set to $\Delta p = 0.01$, the fractional-order spectrum is calculated and a rough value of $p_0$ is obtained from the point of maximum absolute amplitude. Then, within the range $[p_0 - \Delta p, p_0 + \Delta p]$, the step size is reduced to 0.001 for accurate estimation. Finally, $k_0$ is calculated according to Formula (3).
To estimate the starting position $N_0$ and the length $L$, the frequency spectrum is first calculated through the Fourier transform, and two thresholds are then set to determine the maximum frequency $f_{max}$ and minimum frequency $f_{min}$ of the LFM signal. Threshold 1 and threshold 2 are computed from the $N$ sampled points of the received signal and the threshold parameters $a_1$ and $a_2$, which range upward from 0. $f_{min}$ is taken as the first frequency point where the power is greater than threshold 1, and $f_{max}$ as the last frequency point where the power is greater than threshold 2. The period $T$ and the length $L$ are then calculated from these estimates. Using $\Delta a = 0.1$ as the step size, all values of $a_1$ and $a_2$ are traversed to estimate $T$ and $f_{min}$. For each pair, the LFM signal is recovered and cross-correlated with the received signal. The $a_1$, $a_2$, and corresponding $f_{min}$ and $T$ that give the maximum cross-correlation peak are selected, and this correlation peak is used to determine the starting position $N_0$.
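The coarse-then-fine search can be sketched as follows. Note that this is not an FrFT implementation: as a simpler stand-in it removes a trial chirp rate from the signal and measures how sharply the remaining spectrum peaks, which concentrates energy at the correct rate in a similar way. The function names, the complex noise-free test signal, the grid sizes, and the search bounds are assumptions made for illustration.

```python
import numpy as np

def dechirp_peak(sig, fs, k):
    """Peak spectrum magnitude after removing a trial chirp rate k (Hz/s)."""
    t = np.arange(len(sig)) / fs
    dechirped = sig * np.exp(-1j * np.pi * k * t**2)
    # Zero-pad to four times the length to reduce sensitivity to FFT bin alignment.
    return np.abs(np.fft.fft(dechirped, n=4 * len(sig))).max()

def coarse_to_fine_rate(sig, fs, k_lo, k_hi, n_coarse=201, n_fine=201):
    """Coarse grid search, then a finer grid within one coarse step of the rough estimate."""
    coarse = np.linspace(k_lo, k_hi, n_coarse)
    k_rough = coarse[np.argmax([dechirp_peak(sig, fs, k) for k in coarse])]
    step = coarse[1] - coarse[0]
    fine = np.linspace(k_rough - step, k_rough + step, n_fine)
    return fine[np.argmax([dechirp_peak(sig, fs, k) for k in fine])]

fs = 96_000
t = np.arange(int(round(0.05 * fs))) / fs       # 50 ms LFM segment
true_k = 2.034e5                                # hypothetical chirp rate in Hz/s
lfm = np.exp(2j * np.pi * (20_000 * t + 0.5 * true_k * t**2))
k_hat = coarse_to_fine_rate(lfm, fs, 1.0e5, 3.0e5)
```

The two-stage grid keeps the number of transforms small: a single fine grid over the whole interval would need far more evaluations to reach the same resolution.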
We test the estimation accuracy by using the channels of the Haihe River and the Danjiangkou reservoir. The true and estimated values are summarized in Table 4. $N_0$ is referenced to 0, with delays taking positive values and advances taking negative values. When the sampling rate is 96 kHz, the estimation errors of $N_0$ and $L$ do not exceed 60 sampling points, and the percentage error of $k_0$ is less than 1.2%.

Estimation of HFM Parameters
The Hough transform [31] is commonly used to detect curves in images. It maps between two coordinate spaces so that curves of the same shape in the image space accumulate into peaks at points of the parameter space. This paper estimates $k_0$ of HFM based on the Hough transform and the time-frequency image calculated by the Wigner-Ville distribution (WVD). Because WVD has good energy aggregation and high resolution, it can better characterize the time-frequency characteristics of HFM. The frequency of the HFM signal at each moment is written as a function of time, and this expression is converted into a polar coordinate system [32]. The peak point $(\rho_0, \theta_0)$ is then found in the $\rho$-$\theta$ parameter space, and $k_0$ is calculated from it [32]. After the estimated value of $k_0$ is obtained, $N_0$ and $L$ are estimated using the same method as for LFM. The final results are summarized in Table 5. When the sampling rate is 96 kHz, the estimation errors of $N_0$ and $L$ do not exceed 90 sampling points, and the percentage error of $k_0$ is less than 5%.

Signal Frequency Domain Feature Enhancement Network Based on PTR-AE
To actively alleviate the impact of multipath effects on communication signal modulation recognition, PTR-AE is designed to suppress multipath effects and enhance signal frequency domain features after identifying and estimating specific parameters of the synchronous signal.

Passive Time Reversal Detection Signal Selection
The implementation of PTR requires two parts: detection signal and modulation signal. The detection signal needs to meet the following conditions [33]: (1) Its frequency band must cover all frequency bands of the effective signal data; (2) It must have good autocorrelation characteristics; (3) Its frequency spectrum should be whitened as much as possible within the frequency band.
In the process of underwater acoustic communication, it is necessary to add a synchronization signal to assist in the synchronization and demodulation of modulated signals. Common synchronization signals include LFM and HFM, both of which meet the above conditions and can be used as detection signals for passive time reversal.
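Condition (3), spectral whiteness within the band, can be checked numerically. The sketch below generates an LFM and an HFM up-sweep over the same band and compares the average power in a lower and an upper sub-band. The 20-30 kHz band, the sub-band edges, and the helper names are illustrative assumptions; only the 50 ms duration and 96 kHz sampling rate come from the training settings described in this paper.

```python
import numpy as np

def lfm(t, f0, f1, T):
    """Linear sweep from f0 to f1 over duration T."""
    return np.cos(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / T * t**2))

def hfm(t, f0, f1, T):
    """Hyperbolic sweep: instantaneous frequency f0*f1*T / (f1*T - (f1 - f0)*t)."""
    b = f1 - f0
    return np.cos(-2 * np.pi * f0 * f1 * T / b * np.log(1 - b * t / (f1 * T)))

def band_power(sig, fs, lo, hi):
    """Mean power spectrum value over the bins with lo <= f < hi."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), 1 / fs)
    return spec[(freqs >= lo) & (freqs < hi)].mean()

fs, T = 96_000, 0.05
t = np.arange(int(round(T * fs))) / fs
tilt = {}
for name, make in (("LFM", lfm), ("HFM", hfm)):
    sig = make(t, 20_000, 30_000, T)
    # Ratio of low sub-band power to high sub-band power; 1.0 means a flat spectrum.
    tilt[name] = band_power(sig, fs, 21_000, 24_000) / band_power(sig, fs, 26_000, 29_000)
```

In this sketch the LFM ratio stays close to one, while the HFM up-sweep dwells longer at low frequencies and so carries more energy there. Both remain usable as detection signals, but the HFM spectrum is the less uniform of the two.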

The Principle of Passive Time Reversal
The detection signal p(t) at the receiving end is first time reversed, and then convolved with the received modulated signal y(t) to obtain intermediate data. The intermediate data is convolved with the detection signal p(t) at the sending end to suppress multipath effects. The schematic diagram is shown in Figure 4.
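The two convolutions described above can be sketched directly. In the example, an impulse is sent through a toy three-path channel so that the output shows the effective end-to-end response after passive time reversal. The function name, the channel taps, and the LFM probe parameters are hypothetical, and a single receiver is assumed.

```python
import numpy as np

def passive_time_reversal(y, probe_rx, probe_tx):
    """Convolve y with the time-reversed received probe, then with the transmitted probe."""
    intermediate = np.convolve(y, probe_rx[::-1])
    return np.convolve(intermediate, probe_tx)

fs = 96_000
t = np.arange(int(round(0.05 * fs))) / fs                        # 50 ms probe
probe = np.cos(2 * np.pi * (20_000 * t + 0.5 * 2.0e5 * t**2))    # LFM sweeping 20-30 kHz
h = np.zeros(385)
h[[0, 144, 384]] = [1.0, 0.6, 0.3]                               # toy multipath channel
probe_rx = np.convolve(probe, h)                                 # probe as seen by the receiver

# Feeding h itself as y corresponds to transmitting a unit impulse.
q = passive_time_reversal(h, probe_rx, probe)
centre = len(h) + len(probe) - 2                                 # zero-lag index of q
echo_ratio = abs(q[centre + 144]) / abs(q[centre])
```

The result equals the channel autocorrelation convolved with the probe autocorrelation. The main arrival collects the energy of all paths, so the strongest echo, which was 0.6 of the direct path in the raw channel, becomes relatively weaker. In this single-receiver toy case the gain is modest.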

Feature Spectrum Estimation
After synchronous signal-based PTR has suppressed the multipath effects on the signal, appropriate feature spectra are selected as the input of the AE. Due to the significant differences between 2FSK, 4FSK, 8FSK, PSK, and OFDM, the power spectrum is first selected as the classification feature, and the AE is used to enhance it. The power spectrum $P(\omega)$ can be calculated by:
$$P(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} y_0(n)\, e^{-j\omega n}\right|^2$$
where $y_0(n)$ is the discrete modulated signal after passive time reversal, and $N$ is the number of signal sampling points. The square spectrum of the BPSK signal has an impulse at twice the carrier frequency, whereas the QPSK signal does not have this feature. Therefore, for these two types of signals, we select the square spectrum as the modulation classification feature. The AE is also used to enhance the square spectral features of these two signals. The square spectrum is the spectrum of the squared signal and can be expressed as:
$$S(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} y_0^2(n)\, e^{-j\omega n}\right|^2$$
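The difference between the BPSK and QPSK square spectra can be reproduced with a short numerical sketch. It assumes a periodogram-style definition of both spectra, rectangular pulses, a noise-free channel, and hypothetical carrier and symbol-rate values chosen so that twice the carrier falls exactly on an FFT bin.

```python
import numpy as np

def power_spectrum(y):
    """Periodogram: squared magnitude of the one-sided FFT divided by the length."""
    return np.abs(np.fft.rfft(y)) ** 2 / len(y)

def square_spectrum(y):
    """Power spectrum of the squared signal."""
    return power_spectrum(y**2)

rng = np.random.default_rng(0)
fs, fc, sps, n_sym = 96_000, 12_000, 20, 480    # 20 samples per symbol, 480 symbols
t = np.arange(sps * n_sym) / fs
bpsk_phase = np.pi * np.repeat(rng.integers(0, 2, n_sym), sps)
qpsk_phase = np.pi / 4 + np.pi / 2 * np.repeat(rng.integers(0, 4, n_sym), sps)
bpsk = np.cos(2 * np.pi * fc * t + bpsk_phase)
qpsk = np.cos(2 * np.pi * fc * t + qpsk_phase)

line_bin = round(2 * fc * len(t) / fs)          # FFT bin at twice the carrier frequency
bpsk_line = square_spectrum(bpsk)[line_bin]
qpsk_line = square_spectrum(qpsk)[line_bin]
```

Squaring removes the 0/π phase jumps of BPSK, leaving a pure tone at twice the carrier. For QPSK the squared signal still flips sign from symbol to symbol, so the energy at that bin does not add up coherently.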

Structure of PTR-AE
The PTR-AE consists of two parts: a passive time reversal layer and an autoencoder which consists of seven convolutional layers and eight deconvolution layers. There are some skip connections between convolutional layers and deconvolution layers. The convolutional kernel size is 15, and its sliding step size is 2. Leaky ReLU is used as an activation function. The network structure is shown in Figure 5, in which Conv represents the convolutional layer, Deconv represents the deconvolution layer, and H represents the number of convolutional kernels.
The convolutional layers of the encoder compress the input signal features layer by layer, remove redundant information, and extract high-dimensional features. The deconvolution layers of the decoder realize signal feature decoding and reconstruction. The L1 loss term is used to measure the feature enhancement effect, and the RMSProp optimizer is selected to optimize and adjust the network parameters.
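The layer-by-layer compression in the encoder can be traced with the standard output-length formula for a one-dimensional convolution. The kernel size of 15, the stride of 2, and the seven encoder layers follow the description above; the padding of 7 and the input length of 4096 are assumptions made only for this illustration.

```python
def conv1d_out_len(length, kernel=15, stride=2, padding=7):
    """Output length of a 1-D convolution: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel) // stride + 1

def encoder_lengths(input_len, n_layers=7):
    """Feature length after each of the stacked stride-2 convolutional layers."""
    lengths = [input_len]
    for _ in range(n_layers):
        lengths.append(conv1d_out_len(lengths[-1]))
    return lengths

trace = encoder_lengths(4096)
print(trace)  # [4096, 2048, 1024, 512, 256, 128, 64, 32]
```

Under these assumptions each layer halves the feature length, so the seven encoder layers compress the spectrum by a factor of 128 before the decoder reconstructs it.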

CNN-Based Modulation Classification Network
After feature enhancement, CNN is designed for modulation classification. The network includes five convolution layers, five pooling layers, and one full connection layer. ReLU is selected as the activation function. The cross-entropy is selected as the loss function. CNN extracts high-dimensional features of the signal power spectrum and square spectrum through convolution and finally classifies them using a Softmax layer. The network structure is shown in Figure 6, where Conv represents the convolutional layer, Pool represents the pooling layer, and H represents the number of convolutional kernels. The convolutional kernel size is five, and the sliding step size is one. The maximum pooling size is two. By learning the enhanced signal power spectrum features, CNN can classify 2FSK, 4FSK, 8FSK, PSK, and OFDM. When the modulation type of the signal has been identified as PSK, the enhanced square spectral features and CNN are used to further classify BPSK and QPSK.
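The two-stage decision flow described above can be sketched independently of the network details. The two classifier arguments stand in for the trained CNNs; the toy functions used here are hypothetical placeholders, not the paper's models.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], str]

def classify_modulation(power_spec, square_spec, stage1: Classifier, stage2: Classifier) -> str:
    """Stage 1 chooses among 2FSK/4FSK/8FSK/PSK/OFDM; stage 2 splits PSK into BPSK/QPSK."""
    label = stage1(power_spec)
    if label == "PSK":
        return stage2(square_spec)
    return label

# Hypothetical stand-ins for the trained networks.
def toy_stage1(power_spec):
    return "PSK"

def toy_stage2(square_spec):
    peak = max(square_spec)
    mean = sum(square_spec) / len(square_spec)
    return "BPSK" if peak > 10 * mean else "QPSK"

spiky = [0.1] * 99 + [50.0]     # square spectrum with one strong line
flat = [1.0] * 100              # square spectrum without a line
result_bpsk = classify_modulation(flat, spiky, toy_stage1, toy_stage2)
result_qpsk = classify_modulation(flat, flat, toy_stage1, toy_stage2)
```

Keeping the PSK split in a second stage means the square spectrum only has to be computed and enhanced for signals that the first stage has already labeled as PSK.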

Training of PTR-AE-CNN
During the training process, the sampling rate is set to 96 kHz. Except for OFDM and detection signals, the duration of all other signals is 20 ms, and the duration of detection signals LFM and HFM is 50 ms. Other parameters are shown in Table 6, where "/" indicates that the parameter is not involved, "[]" indicates that the data is randomly selected within the closed set range, and "{}" indicates that it is randomly selected among the listed items. The underwater acoustic channel is generated by Bellhop, and the specific parameters are shown in Channel 3 in Table 2. In total, 500 sending samples are generated for each modulation signal, and 600 channels are generated by Bellhop. Received data are generated according to Formula (1) with SNR set from 0 to 10 dB. The network parameters of PTR-AE and CNN are constantly optimized through training data, and the training is stopped when the loss function becomes stable.

Performance Testing of PTR-AE-CNN
During the testing phase, three underwater acoustic channels are used as testing environments: the Haihe River, the Danjiangkou reservoir, and the BCH1 channel, whose data are provided by Watermark [34]. The corresponding number of test samples for each modulation signal in each environment is 600. The specific parameters of the test channels are shown in Table 3.
Taking 2FSK and BPSK signals as examples, the enhancement effect of PTR-AE on signal frequency-domain features is shown in Figures 7 and 8. After PTR-AE enhancement, the spectral line characteristics of the 2FSK signal's power spectrum in the frequency range of 20-30 kHz are clearer, and the impulse of the BPSK signal's square spectrum at twice the carrier frequency is enhanced. When the detection signal is LFM, the average SNR of the power spectrums of the six signals increases from 1 dB to 7 dB, and the average SNR of the PSK square spectrums increases from 1.5 dB to 11 dB. When the detection signal is HFM, the average SNR of the signal power spectrums increases by 6.14 dB, and the average SNR of the square spectrums increases by 10.5 dB.
Figure 9 shows the recognition accuracy of the proposed method in three different environments. When the detection signal is LFM, in the Haihe River, the OFDM signal recognition accuracy is greater than 80%, and the accuracy of other modulation signals is higher than 90%. In the Danjiangkou reservoir, the accuracy of all modulated signals is higher than 85%. When testing with BCH1 channel data, the recognition accuracy of all signals is above 90%.
When the detection signal is HFM, the accuracy is slightly lower than LFM. This is due to the energy distribution of the HFM spectrum not being as uniform as LFM, which affects the passive time reversal to some extent. But overall, the recognition rate of all signals is above 70%, and it is also adaptable to changes in underwater acoustic channels.
To demonstrate the effectiveness of PTR-AE in improving modulation recognition accuracy, we compared the proposed method with a CNN that performs modulation recognition without PTR-AE feature enhancement, in the Haihe and Danjiangkou reservoir environments. The results show that the modulation recognition accuracy is improved by at least 20% after PTR-AE enhancement. We also compare it with methods based on Resnet [16], DAE-Alexnet [21], Alexnet [35], and R&CNN [15]. Table 7 summarizes the test results and parameter quantities of these methods.
The above results indicate that the method in this paper can effectively suppress multipath effects and significantly enhance the frequency-domain characteristics, and that the model is robust to changes in environmental conditions. Traditional passive modulation recognition methods lack effective signal enhancement processing, so the input data of the classifier is severely disturbed by the underwater acoustic channel, which results in fuzzy features. The recognition accuracies of these models are very sensitive to changes in underwater channels, and the models require training with a small amount of data from the testing environment to achieve the desired effect. In this paper, we actively utilize synchronous signals to suppress multipath effects, improve the input of traditional classifiers, reduce the model's dependence on environmental conditions, and achieve good recognition accuracy without adjusting model parameters. Because PTR-AE and the CNN both use one-dimensional convolutional kernels, and only the synchronous signal recognition network uses a two-dimensional convolutional kernel, the number of neural network parameters is not very large compared with the other methods considered in this article.
This section tests the impact of estimation errors in the synchronous signal parameters $N_0$ and $L$ on the recognition accuracy of the model. The range of $N_0$ is [−100, 100], where a value less than 0 indicates an early synchronization position and a value greater than 0 indicates a delayed synchronization position. The range of $L$ is [4700, 4900], and the sampling rate is set to 96 kHz. Figures 10 and 11 show that, at this sampling rate, when the position and length errors of the synchronous signal are kept within 100 sampling points, the average recognition accuracy of the six modulation signals decreases only to a limited extent, and the overall average recognition rate remains higher than that of existing passive modulation recognition methods.
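To give a feel for why small synchronization errors have a limited effect, the following Python sketch applies a generic single-receiver passive time reversal (the received probe segment is time-reversed and convolved with the channel output) using a probe segment cut with a position error and a length error. This is a textbook-style sketch and not the PTR-AE of this paper: the three-path channel, the 20-30 kHz LFM probe, the true probe length of 4800 samples, the final pulse compression with a perfectly known probe replica, and the helper names `ptr_response` and `sidelobe_ratio` are all assumptions.

```python
import numpy as np

fs = 96_000                    # sampling rate quoted in the paper (Hz)
L0 = 4800                      # assumed true probe length (samples)
f0, f1 = 20_000.0, 30_000.0    # assumed LFM band (Hz)
t = np.arange(L0) / fs
probe = np.cos(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / (L0 / fs) * t ** 2))

# Assumed three-path channel (delays in samples, arbitrary gains).
h = np.zeros(200)
h[[0, 60, 150]] = [1.0, 0.7, -0.5]

guard = 500                    # silence around the probe so mis-cut windows stay in range
tx = np.concatenate([np.zeros(guard), probe, np.zeros(guard)])
rx = np.convolve(tx, h)

def sidelobe_ratio(c, exclude=20):
    """Largest |c| farther than `exclude` samples from the peak, relative to the peak."""
    a = np.abs(c)
    k = int(np.argmax(a))
    mask = np.ones(a.size, dtype=bool)
    mask[max(0, k - exclude):k + exclude + 1] = False
    return a[mask].max() / a[k]

def ptr_response(n0_err, length):
    """Effective response after passive time reversal with a mis-cut probe segment."""
    start = guard + n0_err
    seg = rx[start:start + length]      # received probe, cut with sync errors
    g = np.convolve(h, seg[::-1])       # channel followed by the time-reversed segment
    return np.convolve(g, probe)        # pulse compression with the probe replica

# Reference: matched filtering only, which leaves the multipath arrivals as they are.
raw = sidelobe_ratio(np.convolve(rx, probe[::-1]))
print(f"matched filter only: sidelobe ratio {raw:.2f}")

for n0_err, length in [(0, 4800), (100, 4900), (-100, 4700)]:
    c = ptr_response(n0_err, length)
    peak = int(np.argmax(np.abs(c)))
    print(f"n0_err={n0_err:+4d}, L={length}: peak index {peak}, "
          f"sidelobe ratio {sidelobe_ratio(c):.2f}")
```

In this sketch a mis-cut segment mainly shifts the output peak (to index `n0_err + length - 1`) and discards a small part of the probe energy, while the multipath sidelobes remain lower than with matched filtering alone.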

Significance of the Proposed Method
The method proposed in this article provides a new approach to improving the accuracy of modulation recognition. It actively uses the synchronous signals common in underwater acoustic communication as detection signals and designs a passive time reversal-autoencoder to enhance signal features, which improves both modulation recognition accuracy and environmental adaptability.
The proposed model can also be used for modulation recognition of other types of signals, but suitable features need to be selected based on specific signals to better leverage the advantages of the model itself.

Future Research Direction
In the estimation of synchronous signal parameters, the method of estimating frequency modulation parameters based on FrFT and the Hough transform is relatively mature. However, the method used in this paper to estimate the signal starting position and length is relatively simple. In addition, the performance of passive time reversal mirrors is affected by noise. Under low SNR conditions, noise suppression preprocessing is needed to ensure the feature enhancement effect of PTR-AE.
In the future, more detailed research can be conducted on the parameter estimation problem of synchronous signals to further reduce parameter estimation errors. At the same time, effective denoising methods should also be studied to improve the enhancement effect of PTR-AE on signal features under low SNR conditions.

Conclusions
To reduce the impact of multipath effects on the accuracy of modulation recognition in the ocean, this paper actively utilizes synchronous signals in underwater acoustic communication to suppress the multipath effect. A passive time reversal-autoencoder based on synchronous signals is designed to enhance signal characteristics in underwater acoustic channels. Modulation classification is performed using a convolutional neural network.
The results show that PTR-AE can suppress multipath effects in underwater acoustic channels and enhance power spectrum and square spectrum features. The method also shows good recognition performance in different underwater acoustic channels. Compared with existing methods, the modulation recognition rate of the proposed method is improved by at least 10%.

Data Availability Statement:
The data presented in this paper are available after contacting the corresponding author.