An Enhanced Data-Driven Array Shape Estimation Method Using Passive Underwater Acoustic Data

: Beamforming-based signal enhancement technologies in passive sonar array processing often have poor performance due to array distortion caused by rapid tactical maneuvers of the towed platform, oceanic currents, hydrodynamic effects, etc. In this paper, an enhanced data-driven shape array estimation formulation is proposed using passive underwater acoustic data. Beamforming based on a hypothetically ideal array is ﬁrstly employed to perform the detection of narrow-band components from sources of opportunity, and the corresponding phases of these detected narrow-band components are subsequently extracted to acquire time-delay differences. Then, a weighted outlier-robust Kalman smoother is proposed to acquire enhanced estimates of the time-delay differences, since the underlying properties of slowly changing time-delay differences in the hydrophone array and diverse signal to interference and noise ratios in multiple narrowband components are explored; and its Cramer–Rao Lower Bound is also provided. Finally, the hydrophone array shape is estimated based on the estimated time delay differences. The proposed formulation fully exploits directional radiated noise signals from distant underwater acoustic targets as sources of opportunity for real-time array shape estimation, and thus it requires neither the number nor direction of sources to be known in advance. The effectiveness of the proposed method is validated in simulations and real experimental data.


Introduction
Hydrophone arrays are widely used in underwater acoustics to perform direction-ofarrival (DOA) estimation and beamforming-based signal enhancement in the passive sonar system. Most array signal processing techniques are based on the assumption that the shape of the array remains unchanged. Unfortunately, the shape of a hydrophone array, such as a towed array, is often perturbed due to fluctuations in ship tactical maneuvering, oceanic currents, nonneutral buoyancy, and so on, particularly in the demand of the long-towed arrays for the acquisition of a low-frequency signal source at long range. These would lead to uncertainty in array shape and deviation from the hypothetical one [1][2][3][4]. However, beamforming as a typical signal enhancement technique requires knowledge of accurate array positions to properly process the measurements, which means an accurate estimate of the array shape is required to properly beamform the acoustic arrivals. The uncertainty in array shape inevitably degrades the beamforming performance [2]. A theoretical derivation of the performance loss in a distorted array shape due to the tactical maneuvering is provided [5]. Therefore, it is requisite to implement an array shape estimation to improve the acoustic system performance.

Related Works
A huge amount of effort has been devoted to array shape estimation. In general, these methods can be classified into source-independent methods and source-dependent methods. For source-independent methods, one effective method is to place depth-measurement sensors and compasses at several points within the array. These additional sensors provide localized respectively vertical and horizontal information on the transverse displacements of the array, whereas it is less economical due to the use of additional sensors. Further work is to model the array shape by a low order polynomial and to use the available sensors' information to determine the coefficients of this polynomial [6]. A Kalman filter technique with a small angle approximation is used to estimate the array shape evolution over time [7], and an extended Kalman filter approach is also considered with the predicted state update following a higher-order nonlinear model [8].
As for source-dependent methods, acoustic hydrophone data would be used to estimate the array shape when there is no sensors' information within the array or there are too few working sensors, and the shape can be estimated by measuring the relative time or phase delays among sensors due to one or more sources with known or unknown directions [9][10][11][12][13][14][15][16][17][18][19]. Historically, the array shape model employed for acoustic-based estimation is often considered to be a horizontal axis with perturbations at regular intervals on the vertical axis. Under this formulation, the Cramer-Rao Lower Bound (CRLB) for the narrow-band plane-wave sources is provided for source localization with a single known source direction or a known angle between the first receiver and the second one [9]. The maximum-likelihood (ML) approach is used to jointly estimate both the array and source parameters for multiple narrow-band or broadband sources with the known number of sources through a two-stage iteration procedure [10]. Subspace-based eigenvector techniques form another way to estimate array shapes when a single source at a known frequency and direction is known [11]. The eigenvector method for estimating the positions of the array elements is based on the eigen-decomposition of the array covariance matrix to acquire the phase delays between adjacent array elements. The statistic of phase delays is analyzed in [12]. The CRLB of the subspace-based array shape estimation method is provided in [3]. The eigenvector method is similar to the method [13], and it can be also regarded as a special case of the maximum-likelihood method [14]. Reference [15] proposes an improved array shape estimation method by jointly exploiting the acoustic hydrophone data and sensor data.

Contributions
In this paper, an enhanced data-driven array shape estimation formulation is proposed using passive underwater acoustic data. Instead of using the traditional method with a deterministic signal model and a known number of discrete sources, a novel datadriven approach is proposed by exploiting these directionals radiated noise sources, such as shipping and cargo as sources of opportunity for real-time array shape estimation. The conventional beamforming based on the hypothetical uniform linear array is firstly employed to acquire an improved waveform due to spatial gain from the hydrophone array. Since the radiated noise sources involve the amount of narrow-band components that originated from the vibration of mechanical components like diesel generator and air conditioning equipment, we perform the narrow-band frequency detection and extract their corresponding phases for the acquisition of noisy time-delay differences. A novel weighted outlier-robust Kalman smoother (WORKS) method is then proposed to acquire enhanced time-delay estimates by exploring the properties of slowly changing time-delay difference in hydrophone array as well as the diverse signal to interference and noise ratios (SINR) in multiple narrow-band components, and its CRLB is also analyzed and provided. Finally, the hydrophone array shape is obtained using the estimated time delay differences, and thus beamforming-based improved radiated noise signal is acquired. The proposed formulation fully exploits the directional radiated noise source signals as sources of opportunity for real-time array shape estimation, and thus it requires neither the number nor direction of sources to be known in advance. Furthermore, the proposed WORKS approach has the capability of acquiring the enhanced time-delay difference estimation by treating the weights associated with each observed noisy time-delay difference data in a probabilistic approach.
The contributions of the paper lie in three aspects: (1) An enhanced data-driven array shape estimation formulation using passive underwater acoustic data is provided. The proposed formulation fully exploits the directional radiated noise source signals as sources of opportunity for real-time array shape estimation, and thus it requires neither the number nor direction of sources to be known in advance. (2) A novel weighted outlier-robust Kalman smoother method is proposed to acquire enhanced time-delay estimates by the exploitation of slowly changing time-delay differences in hydrophone array and has the capability of alleviating the effects of the outliers resulted from the diverse noise levels in the multiple frequencies.
(3) The CRLB of the proposed method is analyzed.

Organization and Notation
The paper is organized as follows: The underwater acoustic radiated noise signal model is introduced in Section 2. A novel array shape estimate technique is described in Section 3, and the weighted robust-outlier Kalman smoother method is also proposed in this section. Performance bound is also provided in Section 3. Simulations and experiments demonstrate the robustness of the proposed algorithms compared to conventional methods in Section 4. Concluding remarks are given in Section 5.
Notation: We use lower-case (upper-case) bold characters to denote vectors (matrices). p(·) denotes the probability density function (pdf), and N (x|a, b) denotes that random variable x follows a Gaussian distribution with mean a and variance b. In addition, (·) T and (·) H denote transpose and conjugate transpose, respectively, I N denotes the N × N identity matrix. ∇ x = ∂ ∂x 1 , · · · , ∂ ∂x N denotes the gradient operator respect to the variable vector x = [x 1 , · · · , x N ] T ∈ R N×1 , and E(x) represents the expectation operator of the random variable x.

Underwater Acoustic Radiated Noise Model in the Distorted Array
In this section, the underwater acoustic radiated noise model in a distorted hydrophone array is introduced. We assume that a thin flexible cable with M omnidirectional hydrophone sensors mounted at fixed inter-element spacing d is employed. Due to the transverse motion of ship tactical maneuvering, oceanic currents, and hydrodynamic effects, it would result in transverse displacements of the array from a straight line, as shown in Figure 1. The positions of the first two sensors define a baseline for the coordinate system and consequently are always assumed to reside on the x axis. It is observed that a deformed array occurs when any one of the inner sensor y positions are no longer zero. The deformation is caused by bending at the sensor junctions due to the soft and elastic external shell of hydrophone array, which implies that the inter-sensor distances remain constant, and thus we have, where the location of the mth sensor is denoted by (x m , y m ) in the x − y Cartesian plane. With respect to K far-field sources of opportunity impinging on the hydrophone array and the source with the bearing or direction of θ k , the range difference between the first two sensors becomes, as shown in Figure 1, where r k,1 and r k,0 denote the distances from the kth source to the second and reference sensors, respectively, and the result above is based on the far-field hypothesis. However, the locations of sensors deviate from the nominal ones, when the hydrophone array shape is distorted or deformed. Assuming the deviation angle formed by the line between the mth sensor and (m + 1)th sensor along x axis is denoted as φ m , the range difference becomes, For the first two sensors as a baseline, φ 0 = 0. The range difference between the mth sensor and the reference can be written by, The corresponding time-delay difference is where v is the speed of acoustic wave in water, and ∆τ k,i denotes the time-delay difference between the (i + 1)th and ith sensors with respect to the source of opportunity. It should be noted that the accuracies of the array shape estimates will be confined to the accuracies of each estimated time-delay difference for adjacent sensors, giving rise to error accumulations of the time-delay difference according to Equation (5). Assuming the kth radiated noise source signal as s k (t) with the bearing of θ k , the received signal y m (t) in the mth sensor can be written as where ζ k,m and τ k,m , represent respectively the amplitude and time delay from the source to the mth sensor, and e m (t) is an additive noise.
According to the statistical model of the underwater acoustic radiated noise [20,21], the corresponding time-domain s k (t) is expressed by, where p kl (t) is a narrow-band signal arised from the vibration of machinery, p km (t) represents a modulation function due to the propeller movement, and p kc (t) denotes a stationary hydrodynamic noise.
The narrow-band signal p kl (t), which is often generated from the vibration of mechanical components like diesel generator and air conditioning equipment, can be denoted by the superposition of multiple sinusoid signals as, where α kl is an amplitude of the lth single-tone frequency from the source, L is the number of the sinusoid signals, and f kl is the corresponding frequency.
In the real sonar system, the narrow-band signals in the underwater acoustic radiation noise often have high source levels owing to inevitable mechanical vibration, and thus they are easier to be detected and recognized, compared to the broadband hydrodynamic noise signals [20,21]. Figure 2 shows a typical spectrum example of radiated noise from a ship. It is observed that there exist several narrow-band frequencies that have higher powers and are prominent in their nearby frequency ranges, their phases involve the information of time delays between the sources of opportunity and sensors in the deformed array. It possibly provides a way to estimate the distorted array shape.

Array Shape Estimation
In this section, a novel data-driven array shape estimation formulation will be provided, and the weighted robust-outlier Kalman smoother method is proposed to acquire an enhanced time-delay difference estimation.

Phase Extraction Based on Detected Narrow-Band Frequencies
Assume a passive sonar system with the sample rate of F s and the time interval of T, and thus the total number of sample is N = F s T. In accordance with Equation (6), the observed signal y m (t) at the mth sensor is decomposed into N narrow-band components using Discrete Fourier Transform (DFT). The resulting narrow-band components corresponding to any one of the frequency points may be expressed as, where f n = nF s /N is the nth frequency grid, ζ k,m ( f n ) is the coefficient at the f n frequency with a high value, when a strong sinusoidal noise signal at the frequency of f n exists, and K is the number of the far-field sources of opportunity.
According to Equation (9), the wideband beamforming is turned to be multiple narrowband beamforming through the DFT procedure. The weight vector with the pre-defined ϑ q in the nth frequency point is written by, where ϑ q ∈ (0, 180), q ∈ 1, · · · , Q, and Q is the number of the pre-defined bearing grid. According to weights in Equation (10), beamforming is achieved by where y( f n ) = [y 1 ( f n ), · · · , y M ( f n )] T ∈ C M is a received signal vector in the nth frequency point. According to the result in Equation (11), theθ k is estimated based on the maximumvalue detection criterion. The output signal based on the estimatedθ k is, where M is the number of the hydrophone sensors. The operations above in Equations (9)- (12) are involved in the basic wideband frequency beamforming method. The weight vector in Equation (10) is optimal, and the signal to noise ratio (SNR) of the output signalỹ k = [ỹ k (1), · · · ,ỹ k (N)] T can be improved by M times, when the array is nominally uniform linear in the Gaussian noise case. However, the array should be deformed severely and not remain a straight line due to the transverse motion of the towing vessel, oceanic currents, and hydrodynamic effects, weights in Equation (10) would be wrong. Therefore, the performance of beamforming would worsen dramatically, when the hydrophone array greatly deviates from the nominal one. Taking a uniform linear towed array of M = 128 hydrophone sensors with a spatial interval of 1.6 m for example, the shape of the distorted array approximately becomes a parabola and deviation angle φ m is sin(π/(2M)), the average loss of the narrow-band signals would reach up to about 15 dB.
The output signalỹ k based on the nominal array shape would not be the optimal one in the distorted array cases because the time delays in the weight vector Equation (10) do not match the true ones. Considering the fact that an important component in radiated noise originates from the machinery noise and can be regarded as the superposition of multiple narrowband signals, we would utilize the phases of these narrow-band signals to estimate time delays in the distorted array instead of the time delays in the nominal array. Instead of directly performing the element-space narrow-band detection, we carry out the detection on the spectrum of P yk ( f ), which is acquired by performing the DFT operation on the output signalỹ k . By exploiting the background calibration and threshold-based detection techniques, the detected narrow-band frequencies radiated by the kth source of opportunity can be denoted asf kl with l ∈ 1, · · · , L 0 , where L 0 is the number of detected narrow-band frequencies. The corresponding unwrapped phase of the lth narrow-band frequency from the kth source to the mth sensor can be acquired by, In a similar way, we obtain the phase of ψ m+1 (f kl ) in the (m + 1)th hydrophone,and thus estimate the adjacent phase difference by

Proposed Weighted Outlier-Robust Kalman Smoother
Ideally, the estimated time-delay difference ∆τ k,l,m between the (m + 1)th hydrophone sensor the mth hydrophone sensor expressed by Equation (14) should be independent of narrow-band frequencies, and be identical among L 0 narrow-band frequencies. The reason is that its phase difference is only determined by the spatial interval d between adjacent hydrophones and the bearing of impinging source. However, the time-delay differences ∆τ k,l,m with l ∈ 1, · · · , L 0 are often diverse in the real case. Figure 3 shows an example in a public SWellEx-96 HLA dataset [22]. It is observed that the time-delay differences in the detected narrow-band frequencies are completely different across sensors. The possible reason is that there are diverse SINRs in these detected narrow-band frequencies, and these time-delay differences will deviate from the true values at these frequencies. One way to alleviate the fluctuations of ∆τ k,l,m is average these estimated time-delay differences and acquire [23] However, the average operator being a linear estimator is sensitive to outliers and easily deviates from the true value when there exists a small number of outliers. The ∆τ k,l,m acquired by the adjacent phase differences easily deteriorates in the low SINR cases, and thus become abnormal. In this case, these outliers would lead to a large deviation in the average operator. Therefore, the average operator is not an optimal way to acquire the time-delay estimation in the distorted array. To address this issue, we propose to use a weighted robust-outlier Kalman smoother.
Instead of averaging all the ∆ψ k,l,m in multiple narrow-band frequencies, we stack ∆ψ k,l,m with l = 1, · · · , L 0 into a vector as z k,m = ∆ψ k,1,m , · · · , ∆ψ k,L 0 ,m T , and regard z k,m as the observation data sample. In conformity with the analysis above, the observation equation is expressed by, where α( f ) = 2π f 1 , · · · , 2π f L 0 T ∈ R L 0 ×1 is a vector, ∆τ k,m is the true time-delay difference between the (m + 1)th and the mth sensors, and m ∈ R L 0 ×1 is an additive noise.
According to Equation (17), the observed time-delay differences in multiple narrowband frequencies can be considered as the true time-delay difference plus noise. To alleviate the effects of outliers and diverse noise levels in the multiple frequencies, we introduce a novel Bayesian algorithm that treats the weights associated with each observation data samples probabilistically as proposed by [24,25]. However, unlike [24,25] which use a scalar to weight the observed data sample and all elements in observed data sample are equally treated, we introduce a vector weight γ m for each observed data sample z k,m such that the variances of z k,m are weighted with γ m , owing to diverse SINRs in the corresponding narrow-band frequencies, and have, where R = diag(r), which is a covariance matrix for observation noise, is a diagonal matrix with r = r 1 , · · · , r L 0 It should be noted that each elements in r is weighted by the corresponding γ m . A Gamma prior distribution for the weights in order to ensure they remain positive is imposed on the weight vector γ m as, where Gamma(·) represents a Gamma distribution, and a 0 and b 0 are hyperparameters respectively, and often set to a 0 = b 0 . Considering the fact that external shell of hydrophone array is soft and elastic, it is reasonable to assume the time-delay differences ∆τ k,m are slowly changing across sensors (These ∆τ k,m with m = 0, · · · , M − 2 should be identical, when the array is ideally straight), and thus we have the state equation as where ∆τ k,m+1 is the time-delay difference between the (m + 2)th sensor and the (m + 1)th sensor, and ε m is an additive noise. For the m = 0, the initialization of ∆τ k,0 is assumed to follow the Gaussian distribution of ∆τ k,0 ∼ N (∆τ k,0 |0, ν 0 ). Without loss of generality, a Gaussian prior is considered to model this noise, and can be expressed by, where ν is the variance of state noise.
In accordance with these time-delay differences in the M sensors, we have observed data samples ∆τ k,0:M−2 at one time. To estimate the posterior distributions of the random variables and parameter values, we should consider the log evidence of the data samples observed, i.e., log p z k,0:M−2 , ∆τ k,0:M−2 , γ 0:M−2 . It is a typical Kalman smoother.
We can treat this entire problem as an Expectation Minimization (EM) learning problem [26]. The expectation of the complete data likelihood should be taken with respect to the true posterior distribution of all hidden variables q ∆τ k,0:M−2 , γ 0:M−2 . However, since this is an analytically intractable expression, we make a factorial approximation of the true posterior as follows [27], This factorization of ∆τ k,0:M−2 considers the influence of each ∆τ k,m within its Markov chain. According to this factor approximation, all resulting posterior distributions over hidden variables become analytically tractable. We can derive the final EM update equations from standard manipulations of Gaussian and Gamma distributions and acquire the following, E-step : Forward recursion, Backward recursion, M-step : Once these time-delay differences {µ m } M−1 m=0 have been obtained, we can estimate the array shape. Since the first two sensors define the the baseline, the estimated bearingθ k of the kth noise source can be calculated based on the estimated time-delayμ 0 according to Equation (3) Therefore, the estimated deviation angleφ m can be calculated as according to Equation (3),φ The location (x m+1 ,ŷ m+1 ) of (m + 1)th sensor can be estimated by , where m = 2, · · · , M − 1. Finally, we perform the beamforming and acquire the improved output signal based on the estimated array shape as where n = [1, · · · , N].

Summary of Proposed Distorted Array Shape Estimation Scheme
The proposed distorted array shape estimation scheme is summarized in Table 1. Given the observed array data {y m (t)} M m=1 . (1) Calculateỹ k with respect to the kth source based on the hypothetical uniform linear array.

Performance Bound Analysis
To quantify the estimation performance of the proposed method, we perform the CRLB analysis in this section. Considering the problem of estimating the time-delay difference vector ∆τ k = [∆τ k,0 , · · · , ∆τ k,M−2 ] T from a set of observed data vectors of with z k,m = ∆ψ k,1,m , · · · , ∆ψ k,L 0 ,m T , we can acquire the joint probability distribution function (pdf) p(∆τ k , z k ) from the kth source as based on Equations (18) and (20), The Fisher information matrix (FIM) J(∆τ k ) for the parameter vector ∆τ k is defined as follows, When the derivatives and expectation exist, we acquire an alternative formula for the FIM as, Let ∆τ k = ∆τ k (z k ) be an estimated of the parameter vector ∆τ k . The mean-square error matrix (MSEM) is defined as In fact, the MSEM is bounded by an inverse of the FIM, and we acquire [28], The inverse of J −1 (∆τ k ) is just the well-known posterior or Bayesian Cramer-Rao bound [29], and it will be denoted as C(∆τ k ). According to the proposed generative model in Section 3.2, the logarithm of joint pdf of the state and measurement histories p(z k , ∆τ k ) can be expressed as, Motivated by [28], we follow a similar recursive approach for the smoothing CRLBs, and express CRLBs for the filtering as in the forward recursion, where m = 0, · · · , M − 2. We acquire the CRLBs of the smoother in the backward recursion [28], where, According to Equations (18) and (19) in the generative model, p(z k,m ) would follow a multivariate Student distribution by integrating the weight vector γ m , and can be expressed by, where st(·) denotes the student-t distribution, and the freedom υ = 2a 0 . According to [28], we derive the recursive formula for the CRLB of the proposed WORKS method as in the forward recursion, and we acquire the corresponding recursive formula for the CRLB as in the backward recursion, where m = 0, · · · , M − 2.

Simulations and Experiments
In this section, both simulations and experiments will verify the effectiveness and correctness of the proposed method. Several typical source-dependent approaches, such as subspace-based eigenvector method [11] and average operation [23], are used for comparisons in simulations and experiments.

Simulations
Suppose that a uniform linear towing array has M = 128 hydrophone sensors with a spatial interval of 1.6 m, and the bearing of a radiated noise source is 60 • . The deviation angle φ m is set to be sin(π/(2M)), and thus the shape of the distorted array approximately becomes a parabola, as the black line shown in Figure 4a. The time duration is T = 5 s and the sampling rate F s = 4 kHz. According to the characteristics of radiated noise [20,21], a three parameters model with f c = 500 Hz, f m = 200 Hz and κ = 0 is used to model the continuous spectrum component. In addition, seven narrow-band (sinusoid) signals with frequencies of 59 Hz, 97 Hz, 125 Hz, 163 Hz, 198 Hz, 232 Hz and 280 Hz are considered. Without loss of generality, additive Gaussian noise and additional interferences in three narrow-band frequencies with 59 Hz, 97 Hz, and 163 Hz are considered with the SINR of −10 dB.  Figure 4b shows the beamforming result based on the hypothetical uniform linear array. The maximum value in the beam energy diagram is 55 • , which is deviated from the true bearing of 60 • , due to the time-delay mismatches in the distorted array. Only five narrow-band frequencies with 59 Hz, 97 Hz, 198 Hz, 163 Hz, and 232 Hz are acquired after the narrow-band detection based on the beamforming with the hypothetical array shape, and the narrow-band frequencies with 125 Hz and 280 Hz are missing. The corresponding time-delay differences of these detected narrow-band frequencies with 59 Hz, 97 Hz, 163 Hz, 198 Hz, and 232 Hz are shown in Figure 4d. It is observed that the time-delay differences are generally different, and show significant fluctuations particularly in the frequencies of 59 Hz, 97 Hz, and 163 Hz, due to the diverse SINRs on these narrow-band frequencies.
Several outliers appear in the estimated time-delay differences. The time-delay differences should be independent of the considered narrow-band frequencies and be identical for the five narrow-band frequencies.
The ground truth of time-delay differences is shown in Figure 4c with the black line, while the average of these phase differences is shown in Figure 4c with the blue line. The estimation of time-delay differences in the subspace method is shown in Figure 4c with the green line. It is found that the estimates of time-delay differences in both the subspacebased method and the average operation method exhibit relatively large deviations from the ground truth and lead to quite similar estimates of the time-delay differences. The possible reason is that both the subspace-based method and the average operation approach are based on the narrow-band phase differences between adjacent elements, and both of them utilize the average operators for the estimates of time-delay differences in essence. The estimated time-delay differences in the proposed WORKS method, which significantly inhibits outliers, are quite closer to the true values, as shown in Figure 4c with the red line. According to the detected narrow-band frequencies, the array shape estimation is acquired by Equations (34) and (35), as shown in Figure 4a. The estimated array shape in the proposed method is closer to the real one, compared to those of the other two methods, even if gaps exist between them, especially in the latter part of the array. It is due to the error accumulations of time-delay differences in the array shape estimation, according to Equation (5).
The output spectrum based on the estimated array shape is provided in Figure 4e. It is reasonable that the output spectrum has higher gain, and is closer to the spectrum in the known array shape, compared to these in the subspace method and average operation method. It is also observed that the missing narrow-band frequency of 125 Hz becomes quite clear and can easily be detected in this enhanced spectrum. To quantify the superior performance, we use the amplitude error as the performance index. The amplitude error is defined as the absolute error between the amplitude of the detected narrow-band frequency and the amplitude in the corresponding frequencies in the spectrum based on the known array shape. The amplitude errors of the three methods are presented in Table 2. It is obvious that the proposed method is superior over the average operation and the subspacebased methods, owing to the ability to mitigate the effects of the outliers. It is also observed that the proposed method generally has the highest gain improvements among these three methods, and its amplitude errors in these detected narrow-band frequencies are generally less than 0.3 dB. In this subsection, the variance estimates of time-delay differences are analyzed. All the parameters are the same as the simulations above. Figure 5 shows the variance results of the estimated ∆τ. It is obvious that the proposed approach has the lowest estimated variance across 128 sensors, and its average variance is 0.023, which is quite closer to the CRBL of 0.006, compared to 0.842 and 0.821 in the average operation and the subspacebased methods, respectively. The main reason is that the proposed method is quite robust to these outliers in the estimated time-delay differences possibly caused by the low-SNR phases, whereas both average operation and the subspace-based method are sensitive to these outliers, and would lead to relatively poor performance.

Performance Comparison versus Bearings of the Radiated Noise Source
The effectiveness of the proposed method has been verified thanks to the aforementioned simulations. In the following simulation, the performance versus bearings of the noise source will be analyzed. The source bearings change from 5 • to 175 • by 10 • steps. Other parameters are the same as in the previous simulations. It is observed that all three approaches seem robust to the bearings of targets and their average gain errors of narrowband frequencies in the end-fire bearing are comparable to those in the upright bearing. However, the average gain errors of results in the proposed method are much smaller than in the average operation and the subspace-based methods, as shown in Figure 6.

Performance Comparison versus SINR
In this simulation, we focus on the performance comparison versus diverse SINRs. The narrow-band frequency interferences of 59 Hz, 97 Hz, and 163 Hz with different amplitudes, which would cause diverse outliers in time-delay differences, are considered to form diverse SINRs from −3 dB to −14 dB with the step of −2 dB. Figure 7 shows average amplitude errors versus diverse SINRs. The proposed method has superior performance over the average operation and the subspace-based methods. It is observed that the average amplitude errors of the subspace-based method and the average operation approach gradually increase as the SINR decreases, whereas the proposed method is robust to different SINRs, which means the proposed method could mitigate the effects of different SINRs.

Experiments in Lake Trial Data
In this section, a set of lake trial data based on the distorted array is used to verify the effectiveness of the proposed method. In the experiment, the power amplifier as the acoustic source with 15 m in depth actively transmits the 83 Hz, 104 Hz, and 109 Hz low-frequency narrow-band signals and a chirp signal whose initial frequency is 90 Hz and bandwidth 8 Hz. The other ship equipped with a towing array of 64 hydrophone sensors with a uniform interval of 1.5 m, located 1 km away, is used to acquire the radiated noise data. There is a slight bow in the array. The data duration of T is 300 second, and the sample rate is 4 kHz.
The three narrow-band frequencies of 83 Hz, 104 Hz, and 109 Hz are acquired by using the narrow-band frequency detection. The corresponding time-delay differences are shown in Figure 8a. It is observed that some perturbations of time-delay differences exist among these detected frequencies, and they would result in large derivations from each other and lead to the appearances of outliers. The possible reason is that the SINRs in each detected narrow-band frequencies are quite different. The time-delay differences estimation in the average method, the subspace-based method, and the proposed WORKS method are shown in Figure 8b, respectively. The time-delay differences in the proposed method are quite smooth and slowly change, compared to thees in the average method and the subspace method. The output time-frequency spectrum based on the enhanced signals during 300 s are given in Figure 9a-d. It is found that the spectrum of enhanced signals based on the proposed method has higher amplitudes in both narrow-band signals and chirp signals, compared to these in the other two methods. The proposed method is superior to the average operation and the subspace-based methods. The quantitative performance concerning the average gains of three narrow-band frequencies are provided as shown in Figure 10. Note that the gains are calculated by the narrow-band frequency amplitude in the output signal over that in the conventional beamforming assuming a hypothetical uniform linear array due to the unknown distorted array shape. It is found the gains in the our proposed method are generally above 15 dB, even up to 30 dB, compared to about 5 dB gains in both the average operation and the subspace-based methods.

Conclusions
A robust and data-driven signal enhancement scheme is proposed to take into account the array distortion in this paper. The proposed scheme firstly extracts the phases of narrowband signals from the conventional beamforming result using narrow-band signal detection. Then, we proposed a novel weighted outlier-robust Kalman smoother to estimate timedelay differences to mitigate outliers in the time-delay differences. Finally, an enhanced output signal is achieved by beamforming based on the estimated time-delay differences. The proposed scheme is validated by both simulations and experiments. The proposed method is a data-driven approach that fully exploits the directional radiated noise signal due to distant underwater acoustic targets as sources of opportunity for real-time array shape estimation. It requires neither the number nor direction of sources to be known in advance.

Data Availability Statement:
The data presented in this paper are available after contacting the corresponding author.