Bayesian Learning-Based Clustered-Sparse Channel Estimation for Time-Varying Underwater Acoustic OFDM Communication

Orthogonal frequency division multiplexing (OFDM) has been widely adopted in underwater acoustic (UWA) communication due to its good anti-multipath performance and high spectral efficiency. For UWA-OFDM systems, channel state information (CSI) is essential for channel equalization and adaptive transmission, which can significantly affect the reliability and throughput. However, the time-varying UWA channel is difficult to estimate because of excessive delay spread and complex noise distribution. To this end, a novel Bayesian learning-based channel estimation architecture is proposed for UWA-OFDM systems. A clustered-sparse channel distribution model and a noise-resistant channel measurement model are constructed, and the model hyperparameters are iteratively optimized to obtain accurate Bayesian channel estimation. Accordingly, to obtain the clustered-sparse distribution, a partition-based clustered-sparse Bayesian learning (PB-CSBL) algorithm was designed. In order to lessen the effect of strong colored noise, a noise-corrected clustered-sparse channel estimation (NC-CSCE) algorithm was proposed to improve the estimation accuracy. Numerical simulations and lake trials are conducted to verify the effectiveness of the algorithms. Results show that the proposed algorithms achieve higher channel estimation accuracy and lower bit error rate (BER).


Introduction
Underwater acoustic (UWA) sensor networks can be applied to a variety of underwater activities, including environment monitoring, resource exploration and equipment navigation. Reliable high-rate communication is significant for massive data transmission between sensor nodes [1]. However, the UWA channel is widely known as one of the most difficult communication media, and high-rate communication with high reliability is quite challenging. Since the sound attenuation rapidly increases with frequency, the bandwidth available for communication is extremely limited. Due to the reflection and refraction of sound in the ocean, the multipath spread in a medium-range shallow water channel can extend over tens or even hundreds of milliseconds [2], resulting in severe frequency-selective signal distortion. Due to the dynamic environment and moving transceivers in ocean, the UWA channel is time-varying. In particular, the normalized frequency offset factor induced by Doppler can be on the order of 10 −3 compared with 10 −7 for high-speed mobile radio channels [2].
Due to high spectral efficiency and good anti-multipath performance, orthogonal frequency division multiplexing (OFDM) and some OFDM-based modulation techniques are widely adopted for UWA communications [3][4][5][6]. In UWA-OFDM systems, accurate channel state information (CSI) is vital for channel equalization and adaptive transmission [7,8], which can significantly affect the reliability and throughput. However, it is difficult to estimate the time-varying channel impulse responses (CIRs) with excessive delay spread. To overcome this problem, some structural information such as sparsity can be exploited to improve channel estimation performance. The paths of the UWA channel tend to be clustered-sparse, which carry comprehensive structural information [9][10][11][12][13]. As shown in Figure 1, there are many weak paths centering around the eigen-paths associated with the medium refractions and surface/bottom bounces, which form the clustered-sparse structure. As different clusters are usually derived from different transmission routes, the channel coherence time usually varies across clusters. Thus, the path variation for different clusters can reflect the channel variation characteristics [14], and can be used for predicting the time-varying UWA channel [15]. However, the fading process of the UWA channel may randomly vary [2], which makes it difficult to accurately obtain the path variation for different clusters between adjacent OFDM blocks. To obtain the clustered-sparse structure of the UWA channel, we proposed a clustered-sparse Bayesian channel estimation method in [16], which is the preliminary version of this work (This work extends the conference paper [16]) by (i) adding a Bayesian learning-based channel estimation architecture; (ii) adding a noise-corrected clustered-sparse channel estimation algorithm to lessen the effect of strong colored noise; (iii) giving the detailed derivation and performance analysis of the proposed algorithms; and (iv) presenting more complete performance results related to modulation order, model hyperparameters and noise distribution). In addition to the time-varying CIRs with excessive delay spread, the complex distribution of noise in the UWA channel also makes the channel estimation difficult. Generally, there are two kinds of channel noise, including the ambient noise and site-specific noise. The former type is presented in the background of the quiet deep sea, and exhibits a colored characteristic when approximated by Gaussian distribution, while for the site-specific noise, significant non-Gaussian components should be carefully considered [2]. Some experiments have confirmed that the noise is non-uniform across different locations, times and frequencies [17]. As a popular noise model in most UWA-OFDM systems, additive white Gaussian noise (AWGN) is straightforward for modeling and solving the noise distribution of the UWA channel, but it deviates from the actual underwater noise. Obviously, the deviation of the noise model means the estimation error of the noise distribution, which will lead to the degradation of channel estimation and signal detection performance.
To tackle the challenge of estimating the time-varying UWA channel with excessive delay spread and complex noise distribution, we adopted Bayesian learning to obtain the multipath and noise distribution characteristics of the UWA channel, based on which Bayesian channel estimation is derived to improve the performance of pilot-based channel estimation. The main contributions of this work are summarized as follows: • To estimate the time-varying multipath channel with colored noise, we propose a novel Bayesian learning-based channel estimation architecture for UWA-OFDM systems. Specifically, a clustered-sparse channel distribution model is constructed to characterize the delay power spectrum and temporal correlation of each cluster in the multipath channel, and a noise-resistant channel measurement model is constructed to reduce the noise disturbance. By learning the model hyperparameters, the Bayesian channel estimation based on the two models can be iteratively optimized. • To obtain the clustered-sparse distribution, we propose a partition-based clusteredsparse Bayesian learning (PB-CSBL) algorithm. Through the cluster partition, different clusters can learn different channel correlation coefficients, and thus the inter-cluster interference of the multipath channel can be suppressed. • To lessen the effect of strong colored noise, we propose a noise-corrected clusteredsparse channel estimation (NC-CSCE) algorithm. Based on the iterative symbol decision and noise correction, the more accurate hyperparameters of the models can be obtained, which can improve the accuracy of the Bayesian channel estimation.
The rest of this paper is organized as follows. Section 2 discusses the related works. In Section 3, the system architecture and channel models are constructed. Then, the Bayesian learning-based clustered-sparse channel estimation method is designed in Section 4. In Section 5, the noise-corrected clustered-sparse channel estimation method is presented. The evaluation and result analysis are shown in Section 6. In Section 7, conclusions are given.

Related Works
The UWA channel is severely band-limited, and exhibits time-varying excessive delay spread. To this end, the sparse structure of the UWA channel can be exploited to improve the estimation performance, as most channel energy of the UWA channel is concentrated on a few paths [18,19]. The paths of the UWA channel also tend to be clustered-sparse [20], which has been frequently observed based on field experiments at different locations [9][10][11][12][13]. A detailed investigation on the variation of the significant paths in each cluster is presented in [21]. Based on the observation of channel sparsity, compressed sensing (CS)-based sparse channel estimation methods [22][23][24] have been studied, such as the greedy matching pursuit (MP) and orthogonal matching pursuit (OMP), which were adopted in [18,25], aiming to estimate the UWA channel impulse response (CIR) and the channel delay-Doppler-spread function, respectively. In order to reduce the convergence error of the OMP algorithm, the compressed sampling matching pursuit (CoSaMP) algorithm is proposed in [26]. However, CoSaMP-based channel estimators require knowledge of the channel sparsity level, which is difficult to obtain accurately in advance in practical applications. The joint sparsity among adjacent OFDM blocks was exploited by the simultaneous OMP (SOMP) [27] for higher estimation accuracy, but only stable path delays were considered and the temporal correlation of path gains was ignored. Typical convex optimization algorithms including basis pursuit (BP) and basis pursuit denoising (BPDN) tackle the NP-hard l 0 norm problem by transforming the l 0 norm to a convex l 1 norm. The BP algorithm was applied to estimate the path delays and Doppler scale factors of the UWA channel in [25], and the BPDN algorithm was proposed to estimate the slowly timevarying UWA channels in [28]. However, the global minimum of the cost function in BP or BPDN is not necessarily consistent with the sparsest solution. An empirical channel variation model, in which different clusters vary independently, was proposed in [14], and demonstrated that adaptation with multiple clusters outperforms that with one cluster for UWA channel estimation.
In recent years, Bayesian learning (BL) has been used to improve the performance of channel estimation. Compared with the CS-based methods that provide point estimation, Bayesian techniques can give the interval estimation of channel response and have a desirable property of preventing structural errors [29]. By using BL-based methods, the high accuracy of channel estimation was achieved in OFDM systems [30][31][32][33]. The temporal multiple sparse Bayesian learning (TMSBL)-based estimation method was further proposed in [32] to estimate the sparse channels by taking advantage of the channel coherence between consecutive OFDM blocks, and it can achieve better performance in strong correlated channels and maintains robustness in weak temporal correlated channels. Due to the high computational complexity of BL, some accelerated algorithms were proposed, such as the approximate message passing (AMP)-based method in [34]. More recently, some researchers utilized deep learning (DL)-based channel estimators for OFDM systems [35][36][37], which showed better channel estimation or equalization performance for nonlinear channel distortion. However, how to go about improving the generalization performance of pre-training DL models and reduce the complexity of training and inferring remain challenging.

System Architecture and Channel Models
In this section, the system architecture of Bayesian learning-based channel estimation is presented first. Then, the noise-resistant channel measurement model and the clusteredsparse channel distribution model are constructed.

Bayesian Learning-Based Channel Estimation Architecture
A typical UWA-OFDM communication scenario was shown at the top of Figure 2. The receiver can obtain the transmitted OFDM signal through dynamic multipaths, and the received signal may be affected by random colored noise. At the bottom of Figure 2, the system architecture of Bayesian learning-based channel estimation is given. To characterize the delay power spectrum and temporal correlation of each cluster in the UWA channel, the clustered-sparse channel distribution model is constructed and the model hyperparameters can be optimized by the iterative partition-based cluster evolution. Demodulated signal and pilot symbols were utilized to measure the instantaneous channel response, and the iterative noise measurement and data detection can improve the anti-noise performance of the channel measurement model. Based on the channel distribution model and the channel measurement model, the Bayesian channel estimation can be obtained. Important notations used in following sections are given by:

Noise-Resistant Channel Measurement Model
Considering the OFDM system with cyclic prefix (CP) and K subcarriers, the nonoverlapping sets of K d data subcarriers x d , K p pilot subcarriers x p and K n null subcarriers x n satisfy K = K d + K p + K n . Let T denotes the OFDM symbol period without CP, T cp denotes the length of CP and T s = T + T cp is the whole OFDM symbol period. The kth subcarrier is at the frequency: where f c is the carrier frequency. Assuming that there are M OFDM blocks in one frame, a passband waveform of the mth OFDM block at time t ∈ [−T cp + mT s , T + mT s ] is then given by where m = 1, 2, . . . , is defined as a transmitted symbol on the kth subcarrier at the mth OFDM block and q(t) is the pulse shaping filter: The UWA channel is a typical sparse multipath channel. For horizontal shallow water multi-path transmission with a range much greater than depth, each path has almost the same Doppler factor [38]. Thus, we assume that all the paths have the equal Doppler scale factor a, and the Doppler effect can be almost eliminated through effective Doppler estimation and compensation. Furthermore, the path delay remains stable across several consecutive OFDM blocks, and the path gains and Doppler scale factors are constant during one OFDM block, but vary from block to block. The time-varying channel response during the mth OFDM block can be expressed as where L is the path number of the multipath channel, which may be distributed in more than one cluster. A l and τ l denote the gain and delay of the lth path. Through the time-varying channel, the received passband signal during the mth OFDM block can be expressed as where w(t) is the additive noise. After synchronization, a popular two-step approach [39] can be adopted to mitigate the Doppler effect, which takes a coarse Doppler estimation and resampling as the first step and then performs the fine Doppler shift compensation using the null subcarriers. Moreover, CP can also be used for Doppler shift estimation [40].
Performing CP-OFDM demodulation, the output K × 1 vector of the mth OFDM block can be expressed as where . For M consecutive OFDM blocks, the transmission model of all K × M symbols can be expressed as where Y = [y (1) , y (2) , · · · , y (M) ], W = [w (1) , w (2) , · · · , w (M) ], and H = [h (1) , h (2) Considering the comb-type pilot arrangement, the transmission model of K p × M pilot symbols can be expressed as where Y P = [y (1) p , y (2) p , · · · , y (M) p ] and W P = [w (1) p , w (2) p , · · · , w (M) p ] are the submatrixs of Y and W at the location of the pilot subcarriers, respectively. Φ P X P F P is the known dictionary matrix where X P is the K p × K p diagonal matrix with the known pilots along its diagonal and F P is the K p × L DFT matrix.
By vectorizing H as h = vec(H T ), a pilot-based channel measurement model can be constructed according to (8), as where y p = vec(Y T P ), w p = vec(W T P ) and U P = Φ P ⊗ I M . AWGN can be adopted to approximate the additive noise w p . However, the attenuation and noise of the UWA channel may vary over the signal bandwidth. When a small number of pilot symbols are used to estimate the UWA channel with strong colored noise, the accuracy of the pilotbased channel measurement will decrease severely. To improve the performance of channel measurement, a noise-resistant channel measurement model is constructed according to (7) and it can be expressed as where y = vec(Y T ), w = vec(W T ) and U is the KM × LM matrix as where Φ (m) [i, j] denotes the entry in the ith row and the jth column of Φ (m) . All subcarriers are used for channel measurement in (10), as the unknown data symbols can be obtained by data detection or decoding. Assuming that the power distribution over all subcarriers is (10) satisfies: where Λ = diag(λ) ⊗ I M .

Clustered-Sparse Channel Distribution Model
According to the measurement equation in (9) or (10), the channel vector h can be estimated. However, the UWA channel usually has an excessive multipath delay spread. With so many undetermined channel taps, the prior clustered-sparse structure of the UWA channel can be exploited to obtain the sparse solution of the channel vector. First, we assume all that the L paths (i.e., L rows) in H are mutually independent, which can form C clusters. There are L d paths in the dth cluster and these clusters do not overlap, each of which occupies dense path delays. With the same time correlation characteristics, the dth cluster satisfies a first-order auto-regressive (AR) model, given by where β d is the temporal correlation coefficient of the dth cluster.
is a positive semi-definite diagonal real-valued matrix. Although a higher-order AR model may express better model approximation performance, it would lead to higher complexity. More importantly, a higher-order AR model means a higher overfitting risk, especially when there are only a few OFDM blocks in one frame. Hence, a first-order AR process was adopted to model the temporal correlation characteristic for each cluster of CIRs.
where γ d = γ d,1 , γ d,2 , · · · , γ d,L d controls the sparsity of the L d paths in the dth cluster. When Tr(Γ d ) → 0, the associated h d → 0. In other words, Γ d reflects the probability density of the dth cluster. B d is a Toeplitz matrix with the following form: and it reflects the variation characteristics of the dth cluster. Γ d and B d jointly determine the cluster distribution of the dth cluster. Accordingly, the prior probability density function of h can be written as

Bayesian Learning-Based Clustered-Sparse Channel Estimation
In this section, the parametric form of Bayesian channel estimation is given first, and followed by the cluster distribution learning. Then, the partition-based clusteredsparse Bayesian learning algorithm is presented to obtain the clustered-sparse distribution and Bayesian channel estimate. At last, the complexity and performance analysis of this algorithm is given.

Bayesian Channel Estimation
To simplify the implementation complexity of Bayesian channel estimation, only known pilot symbols are used for channel measurement. As pilot symbols are sparsely distributed, it is difficult to estimate the non-uniform power spectrum at the location of pilot subcarriers. Therefore, the additive noise in (9) is approximated as white Gaussian noise with noise variance λ in this section. According to (9) and (17), using the Bayes rule, the posterior density of h can be obtained as which is the complex Gaussian distribution with the posterior mean and covariance matrix: Obviously, the maximum posterior (MAP) estimates h MAP µ h and the covariance matrix Σ h are related to the hyperparameters [λ, γ 1 , γ 2 , · · · , γ C , β 1 , β 2 , · · · , β C ]. Using K n × M null subcarriers Y N = [y (1) n , · · · , y (M) n ], λ can be estimated as where i and j are the row and column indexes of Y N , respectively. The remaining hyperparameters [γ 1 , γ 2 , · · · , γ C , β 1 , β 2 , · · · , β C ] are related to the cluster distribution information.

Cluster Distribution Learning
The hyperparameters θ [γ 1 , γ 2 , · · · , γ C , β 1 , β 2 , · · · , β C ] can be estimated by maximizing the marginal likelihood function p(y p ; θ). This is equivalent to minimizing −log p(y p ; θ), giving the effective cost function: where Σ y p (θ) = λI + U P Σ 0 (θ)U H P . The above problem cannot be solved in closed form, and the expectation maximization (EM) method [41] can be employed to solve it iteratively. For unknown values of hyperparameters governing the prior density, h is considered as the nuisance variable and θ is estimated by maximizing: where θ (old) denotes the estimated hyperparameters in the previous iteration. Ignoring the first term unrelated to θ, the Q Function (23) can be simplified to: where and Σ h ∈ C ML×ML , respectively. µ h and Σ h are evaluated according to (19) and (20), given the estimated hyperparameters θ (old) and the noise variance λ. As shown in (14) and (16), Γ d and B d are determined by the hyperparameters γ d and β d , respectively.
The partial derivative of (24) with respect to γ d,i (d = 1, 2, . . . , where µ d,i h ∈ C M×1 denotes the corresponding ith path in µ d h , and Σ d,i h ∈ C M×M is the corresponding principal diagonal block for the ith path in Σ d h . So the learning rule for γ d,i can be given by The gradient of (24) over B d (d = 1, 2, . . . , C) is: Thus, the learning rule for B d can be derived as here, we constrain B d to have the form as shown in (16), and thus β d can be empirically calculated as β d α 1 α 0 where α 0 (resp. α 1 ) is the average of the elements along the main diagonal (resp. the main sub-diagonal) of the matrix B d in (28).

Partition-Based Clustered-Sparse Bayesian Learning Algorithm
When the cluster partition information is known, the hyperparameters θ, the posterior mean µ h and the posterior covariance Σ h can be iteratively obtained through the EM method. In other words, cluster partition and cluster distribution jointly determine the clustered-sparse channel distribution. However, it is usually difficult to obtain the cluster partition in advance. As shown in Figure 1, several significant clusters with high energy in CIRs are scattered over a long time spread, and usually do not overlap each other. As shown in (28), the boundary information of these clusters is important for calculating B d , because different clusters have different coherence times (i.e., different AR coefficients). In particular, some weak clusters that can be ignored will cause the hyperparameter estimation error of significant clusters when these clusters are not distinguished. Therefore, the PB-CSBL algorithm is proposed to update cluster partition by pruning the paths with very small energy (i.e., γ d,i → 0). As shown in Algorithm 1, the PB-CSBL algorithm mainly includes three sub-parts: Channel Estimation, Cluster Partition, Cluster Evolution, and they will loop until the iteration stop condition is reached. In the sub-part of Cluster Partition, only the dense paths with high energy are retained to form clusters and the paths with negligible energy are pruned. For a cluster with some pruned paths, it can further split into more clusters according to the positions of the pruned paths. To reduce the impact of an inaccurate clustering structure at the beginning of the iteration, the maximum possible number of discrete paths in one cluster is given. In the sub-part of Cluster Evolution, the derived cluster distribution learning rules are included. A good initial estimation of the unknown hyperparameters is significant for achieving the global maximum instead of a local maximum. For practical implementation, it was found that an initial estimation given by Γ d = I L d and B d = I M was sufficient for the proposed PB-CSBL algorithm.

Algorithm 1: Partition-Based Clustered-Sparse Bayesian Learning Algorithm
Input: the received pilot signal y p ; the dictionary matrix U P ; the noise variance λ; the length of discrete paths L; the maximum number of iterations r max ; the maximum number of discrete paths in one cluster LC max ; the threshold for prunning small hyperparameters γ th ; the threshold to stop the whole algorithm . 1 using the L d most significant continuous paths. 10: β d ← α 1 α 0 , where α 1 and α 0 can be obtained through B d . 11: end for Cluster Partition: 12: p (0) ← FindIndex(0 < γ list < γ th ). 13: if p (0) is not empty then 14: γ (old) list ← γ list and γ list [p (0) ] ← 0. 15: CLS ← Split(CLS, p (0) ), where CLS is splitted into C clusters according to p (0) . 16:

Complexity and Performance Analysis
Considering the excessive delay spread of the UWA channel and the purpose of saving pilot overhead, we assumed K d ≥ L ≥ K p ≥ M. The time complexity of the proposed PB-CSBL algorithm is compared with the least squares (LS) [39], OMP [25], SOMP [27] and TMSBL [32]. In order to avoid involving matrix inversion, pilot subcarriers are equispaced in the LS method. For fair comparison, the residual Doppler expansion is ignored when evaluating the time complexity of the OMP method. The comparison of time complexity is shown in Table 1, where N denotes the average iteration number. It can be seen that the time complexity of the PB-CSBL algorithm is higher than that of the LS, OMP and SOMP. This is due to the fact that the temporal correlation of CIRs is ignored in the LS and OMP. Although the stability of path delays is exploited in the SOMP, the correlation of path gains is discarded. Therefore, LS, OMP and SOMP may exhibit worse channel estimation performance compared to the PB-CSBL algorithm. As for the TMSBL with O(LK 2 p ) per iteration, it has lower time complexity than the proposed PB-CSBL with O(LK 2 p M 3 ) per iteration. This is mainly because it adopts the measure in [41], assuming that there is no temporal correlation among consecutive CIRs or no additive noise in received signal, but this may bring some performance loss, especially for slow-varying or high-noise channels. Therefore, when M is small, we do not adopt the simplification in [41]. The computational complexity of the PB-CSBL algorithm can be further reduced by implementing a first-order algorithm in E-step [34], but the implementation details are omitted here.
According to (20), the mean square error (MSE) bound of the sparse vector h can be expressed as where h is the estimate of h, and its estimation accuracy is related to the hyperparameter vector θ obtained by maximizing the marginal likelihood function. Thus, the lower bound of MSE is reached (i.e., equality holds) only when the perfect hyperparameters are obtained. Assume that all L paths are in one cluster (i.e., C = 1) and consider two special cases of θ: (1) The temporal correlation coefficient β = 0.
In this case, there is no temporal correlation among CIRs. It is assumed that each transmitted symbol is normalized to unit power. Then, the MSE bound of h (m) can be expressed by where Γ = diag(γ), γ = [γ 1 , γ 2 , · · · , γ L ] and γ l controls the variance of the lth channel coefficient.
As shown in the case of the high temporal correlation (i.e., β = 1), the anti-noise performance of channel estimation can be improved. As for the low temporal correlation among CIRs (i.e., β = 0), the anti-noise performance is not significantly improved. However, with more reference information, the better hyperparameter vector γ can be obtained. Therefore, when multiple OFDM blocks are utilized simultaneously, higher channel estimation accuracy can be achieved.

Noise-Corrected Clustered-Sparse Channel Estimation
In this section, the parametric form of noise-resistant Bayesian channel estimation is given first. Then, the data detection and noise measurement are derived. Then, the noise-corrected clustered-sparse channel estimation algorithm is presented. At last, the complexity and performance analysis of this algorithm is given.

Noise-Resistant Bayesian Channel Estimation
As the number of pilots K p decreases, the precision of channel estimation using the PB-CSBL algorithm will rapidly deteriorate. This is primarily because the algorithm does not utilize all the information available. Specifically, although both pilot and data symbols are transmitted, only the K p × M matrix Y P corresponding to the pilot subcarriers in (8) is used for estimating the CIRs. For the matrix Y in (7), the remaining K d × M observations are typically not used as they contain the unknown data symbols. If the unknown data symbols are perfectly obtained, using the Bayes rule, the posterior mean of CIRs based on the prior probability in (17) and the channel measurement equation in (10) can be written by with the posterior covariance matrix: Moreover, AWGN is assumed to simplify implementation complexity in the PB-CSBL algorithm, as the noise power distribution λ = [λ −K/2 , λ −K/2+1 , · · · , λ K/2−1 ] depends on the observation of all subcarriers. However, in a complex UWA environment, the noise power may vary significantly over the signal bandwidth. Using decided data symbols and corrected colored noise, the high estimation accuracy can be achieved by the MAP estimate in (32).

Data Detection and Noise Measurement
To obtain the MAP estimate in (32), the hyperparameters related to the cluster distribution, θ [γ 1 , γ 2 , · · · , γ C , β 1 , β 2 , · · · , β C ], can be iteratively updated, as shown in the proposed PB-CSBL algorithm, however, the detection of data signal x (m) d and the noise distribution λ are still unresolved. Using the notations , and ζ [θ, ϕ], the unknown hyperparameters set ζ can be estimated by maximizing the marginal likelihood function p(y; ζ). This is equivalent to minimizing −log p(y; ζ), giving the effective cost function: where Σ y (ζ) = Λ(λ) + U(x d )Σ 0 (θ)U H (x d ). Similar to the cluster distribution learning, the expectation maximization (EM) method [41] can be employed to solve (34) where ζ (old) denotes the estimated hyperparameters in the previous iteration. As shown in (35), the joint maximization simplifies into two independent maximizations over θ and ϕ.
On optimizing the latter function in (35) with respect to θ, one obtains the learning rules about θ as (26) and (28). Maximizing the former over x (m) d yields the minimum mean squared error (MMSE) equalizer: where y (m) d and λ d are the subvectors of y (m) and λ at the location of data subcarriers, and Σ Although the noise distribution is non-uniform, the noise powers of adjacent subcarriers are usually close (especially when the number of subcarriers is large). Then, we divide all subcarriers into non-overlapping Q groups, so that K/Q subcarriers can be grouped into the same one. Thus, the average noise power of the qth group can be calculated as where Ξ h is defined as Ξ h µ h (µ h ) H + Σ h . U ∈ C KM×LM can be calculated according to (11), as unknown data symbols are estimated by (36) and (37). U (q) ∈ C KM Q ×LM is the submatrix of U and denotes the qth group. y (q) denotes the qth group of y. After λ (1) g , λ (2) g , . . . , λ (Q) g are calculated according to (38), the non-uniform noise power distribution across all subcarriers, λ = [λ −K/2 , λ −K/2+1 , · · · , λ K/2−1 ], can be obtained by upsampling the Q average noise powers.

Noise-Corrected Clustered-Sparse Channel Estimation Algorithm
Based on the noise-resistant Bayesian channel estimation, noise measurement and data detection, we can develop the PB-CSBL algorithm into the NC-CSCE Algorithm. As shown in Algorithm 2, the sub-parts of Cluster Partition and Cluster Evolution have the similar process to the PB-CSBL algorithm. With the aid of Data Detection and Noise Measurement, the estimation accuracy in Channel Estimation is improved. Conversely, the improvement of channel estimation also ensures the high accuracy of data detection and noise measurement results. To achieve the global maximum instead of a local maximum, the NC-CSCE algorithm requires a good initial estimate with respect to the unknown cluster distribution, data signal and noise distribution. However, it is obviously difficult to initialize so many hyperparameters at the same time. Therefore, the PB-CSBL algorithm is utilized to initialize µ h and Σ h instead of initializing these hyperparameters. Once µ h and Σ h are initialized, these hyperparameters can be calculated as shown in Algorithm 2.

Complexity and Performance Analysis
The NC-CSCE algorithm utilizes all available subcarriers to achieve better channel estimation performance. Therefore, compared with the proposed PB-CSBL algorithm with O(LK 2 p M 3 ) per iteration, the NC-CSCE algorithm has higher time complexity, i.e., O((K d + K p ) 3 M 3 ) per iteration. Similar to the PB-CSBL algorithm, the complexity of the NC-CSCE algorithm can be further simplified by using the techniques in [34,41], however, these simplifications may lead to the performance degradation of channel estimation because of the errors introduced by the approximations. The details of the simplifications are omitted here.

Algorithm 2: Noise-Corrected Clustered-Sparse Channel Estimation Algorithm
Input: the received signal y; the noise variance λ; the length of discreted paths L; the maximum number of iterations r max ; the maximun number of discreted paths in one cluster LC max ; the threshold for prunning small hyperparameters γ th ; the threshold to stop the whole algorithm . Initialize: The PB-CSBL algorithm is utilized to obtain the posterior mean µ h ← µ h and covariance matrix Σ h ← Σ h ; the noise vector λ = λI K ; the list of path power γ list = 1 L ; the iteration counter r = 0. Cluster Evolution: 1: for d = 1, 2, . . . , C do 2: for i = 1, 2, . . . , L d do 3: using the L d most significant continuous paths. 7: where α 1 and α 0 can be obtained through B d . 8: end for Cluster Partition: 9:

12:
CLS ← Split(CLS, p (0) ), where CLS is splitted into C clusters according to p (0) . 13: end if 14: for d = 1, 2, . . . , C do 15: Update R d , Γ d and B d in CLS according to p (0) , γ list and β d , respectively. 16: end for Data Detection: 17: Refer to (36) and (37). Noise Measurement: 18: Update the noise vector λ according to (38). Channel Estimation: 19: 20: Obtain U according to (11). 21: 22: Refer to (32) and (33). Check stopping conditions: 23: r ← r + 1. 24: return the sub-part of Cluster Evolution until r ≥ r max or γ list − γ Based on the noise-resistant Bayesian channel estimation in the NC-CSCE algorithm, the MSE bound of the sparse vector h can be expressed as where h is the estimate of h. The MSE bound is related to the hyperparameters of the cluster distribution, data symbols and noise distribution. Only when the perfect hyperparameters are obtained is the lower bound of MSE reached. Similarly to analyzing the PB-CSBL algorithm, we assume that all L paths are in one cluster (i.e., C = 1) and consider the two special cases, namely the temporal correlation coefficient β = 0 and β = 1. For ease of analysis, the noise power keeps constant λ over all subcarriers. Furthermore, each transmitted symbol is normalized to unit power. Then, the MSE bounds of h (m) can be evaluated by respectively. In addition to what has been discussed when analyzing the PB-CSBL algorithm, Equations (40) and (41) also prove that the NC-CSCE algorithm utilizing pilot symbols and unknown data symbols can achieve a smaller MSE compared with the PB-CSBL algorithm.

Evaluation and Result Analysis
In this section, to evaluate the performance of the proposed algorithms, numerical simulations and a lake trial were conducted. The parameter settings of CP-OFDM in the simulations and lake trial are shown in Table 2. The bandwidth, carrier frequency and sampling frequency of the lake trial are closely related to the transducers used in the trial, and these values are set smaller in the simulation system to increase the simulation speed. Because of the high cost of the lake trial, the CP length, the number of pilot subcarriers and the number of null subcarriers are designed to be larger than the simulation system to ensure the availability of trial data. In the simulation system (the specifications of our computer are as follows: Intel(R) Core(TM) i5-6400 CPU 2.7 GHz (4 cores), 16 GB RAM, 1 TB memory with Windows and MATLAB installed), it is assumed that the UWA channel has 12 randomly generated paths, and that these paths remain fixed within one OFDM frame. The maximum path delay is less than CP length, and the amplitudes of paths are Rayleigh distributed with the average power decreasing exponentially with delay. The Doppler scale factor a ∈ (0, 10 −3 ) is randomly chosen. Moreover, there is no channel code module in the simulation system.

Simulation Results
Firstly, the proposed PB-CSBL and NC-CSCE algorithms were compared with the LS, OMP, SOMP and TMSBL for channel estimation. All paths are distributed in three nonoverlapping clusters, and these clusters are set with three different temporal correlation coefficients ranges over multiple OFDM blocks, which are [0.1, 0.3], [0.5, 0.7], [0.8, 1.0]. The additive noise is assumed to be white Gaussian noise, and thus all subcarriers are divided into the same group (i.e., Q = 1) for the proposed NC-CSCE algorithm. The transmitted data bits are mapped to QPSK or 16-QAM constellation symbols. The performance metrics used in the numerical simulations are the MSE of CIRs and BER of data bits, which are verified in 500 simulations for each value of SNR. Figure 3a,b show the MSE and BER performance of different channel estimation methods, respectively. LS performs worst, because the channel sparsity is not considered for this underdetermined system. Based on the joint estimation of multiple OFDM blocks, the SOMP makes full use of the strong temporal correlation of path delays and achieves better performance than the OMP. The delay and correlation of path gains are analyzed in the TMSBL, which makes it better than the SOMP. However, the proposed PB-CSBL performs even better, as it can learn different temporal correlation coefficients and reduce the interference among clusters. The proposed NC-CSCE algorithm utilizes all received pilot and data symbols for the channel estimation, which achieves the lowest BER close to the perfect CSI. When 16-QAM mapping is adopted, Figure 4 proves that the proposed algorithms still maintain good estimation performance. However, in the low SNR region from 0 to 10 dB, the NC-CSCE algorithm does not perform well. This is mainly because high-order modulation at low SNR is prone to symbol decision errors, which may be exacerbated in the iterative process of the NC-CSCE algorithm.  Then, to verify the estimation performance of the proposed algorithms under extreme temporal correlation coefficients (i.e., β = 0 and β = 1), the proposed algorithms are compared with their theoretical lower bounds. Multipaths are assumed to be sparse and share the same correlation coefficient. Then, under QPSK mapping and white Gaussian noise, the mean square error performance is presented in Figure 5. When β = 0, the MSE of the proposed algorithms is close to the theoretical lower bound in a high SNR region. A significant gap exists in the low SNR region, especially for the NC-CSCE algorithm. This is caused by imperfect hyperparameter estimation, as the lower bound is only reached when the perfect hyperparameters are obtained. When β = 1, the gap is more obvious in the whole SNR region because of the hyperparameter estimation error. However, the MSE performance with β = 1 is better than the MSE performance with β = 0 in a low SNR region. In short, a larger temporal correlation coefficient means stronger resistance to additive noise, however, the hyperparameter estimation error, such as the error of γ, is often larger than that of small temporal correlation coefficients in the high SNR region. Figure 6a,b show the MSE of the γ estimated by the proposed PB-CSBL and NC-CSCE algorithms, respectively. The MSE is evaluated under different SNRs and temporal correlation coefficients. As can be seen, a larger temporal correlation coefficient achieves a smaller MSE in the low SNR region, while a smaller temporal correlation coefficient reduces the MSE value in the high SNR region. Finally, to evaluate the estimation performance of the proposed NC-CSCE algorithm under colored noise, we compared the NC-CSCE algorithm with the TMSBL and PB-CSBL algorithms. The group number of subcarriers in NC-CSCE is adjusted to evaluate the channel estimation performance. In the simulation, all paths are distributed in three non-overlapping clusters. The clusters are set with three different temporal correlation coefficients ranges over multiple OFDM blocks, which are [0.1, 0.3], [0.5, 0.7], [0.8, 1.0]. All subcarriers are divided into eight groups and each group is set with a different SNR. As shown in Figure 7b, the reference SNR ranges are from 5 to 20 dB. With QPSK mapping, the MSE and BER performance curves are shown in Figure 7a, and the results prove that the proposed NC-CSCE algorithm is superior to the TMSBL and PB-CSBL algorithms. The NC-CSCE-8 denotes the NC-CSCE algorithm with eight groups, and it performs better than the NC-CSCE algorithm with other groups. Therefore, the setting of group number is critical for the NC-CSCE algorithm. In one instance, the actual distribution of colored noise can hardly be reflected with a limited group number; yet in another, a small symbol number will result in low measurement accuracy in one group.

Lake Experimental Results
To further verify the proposed algorithms, a real lake experiment was performed at Huating Lake, Anhui Province, China, in May 2018. Huating Lake covers an area of about 70 square kilometers, with an average water depth of about 17 m. We hung the transmitter and receiver on the side of a passenger ship and a speedboat, respectively. The passenger ship has a semi-open roof and is about 10 m long, while the speedboat is about half the size of the ship. During the experiment, the transmitter depth was about 10 m and the receiver depth was about 13 m. Affected by wind and current, their relative speed was about 0.5 m/s, and the distance between them was more than 4.5 km. The parameter settings of the experiment are in Table 2. In addition, the linear frequency modulation (LFM) signal was added at the beginning and end of each frame for time synchronization and Doppler estimation. The transmitted data bits were encoded by a 1/2 rate convolutional code, and the encoded bits were mapped by QPSK constellation. As showed in Figure 1, using the proposed NC-CSCE algorithm, we obtained the estimated CIRs for a 14 frame history with five consecutive OFDM blocks in one frame. The CIRs are estimated after Doppler compensation, and the total delay spread is around 15 ms. From the estimated CIRs, we notice that the paths are distributed in multiple clusters, and the channel shows a clustered-sparse structure. Figure 8 shows the BER performance after convolutional decoding. The BERs of the proposed two algorithms are still the lowest for most frames, followed by the TMSBL and the SOMP, while the OMP has the worst BER performance. Obviously, the performance gap between proposed algorithms and other methods further verify the superior performance of the proposed algorithms.

Conclusions
In this paper, we propose a novel Bayesian learning-based channel estimation architecture to estimate the time-varying multipath channel with colored noise for UWA-OFDM systems. Specifically, a clustered-sparse channel distribution model was constructed to characterize the multipath distribution, and a noise-resistant channel measurement model is constructed to reduce the noise disturbance. To obtain the clustered-sparse distribution, we propose the partition-based clustered-sparse Bayesian learning algorithm. To lessen the effect of colored noise, we proposed a noise-corrected clustered-sparse channel estimation algorithm to improve the estimation performance. Taking advantage of the iterative clustered-sparse distribution learning, symbol decision and noise correction, the accuracy of Bayesian channel estimation can be improved. Experiments proved the effectiveness of the proposed algorithms for channel estimation.
In future work, channel decoding can be incorporated into the Bayesian learning-based channel estimation architecture to explore the possibility of reducing symbol decision errors. In the noise-resistant channel measurement model, the channel gains are approximately constant within one OFDM block, which may be too optimistic in some rapidly timevarying UWA channels. Therefore, some channel estimation models that deal with doubly selective fading channels, such as the popular basis expansion model, can be combined with the clustered-sparse Bayesian learning to improve the estimation performance. At the same time, channel feedback and reconstruction will be analyzed to realize adaptive modulation and coding in UWA-OFDM communication.

Conflicts of Interest:
The authors declare no conflict of interest.