Joint Passive Detection and Tracking of Underwater Acoustic Target by Beamforming-Based Bernoulli Filter with Multiple Arrays

In this paper, improved Bernoulli filtering methods are developed to deal with the problem of joint passive detection and tracking of an underwater acoustic target with multiple arrays. Three different likelihood calculation methods based on local beamforming results are proposed for the Bernoulli filter updating. Firstly, multiple peaks, including both mainlobe and sidelobe peaks, are selected to form the direction-of-arrival (DOA) measurement set, and then the Bernoulli filter is used to extract the target track. Secondly, to make full use of the informations in the beamforming output, not only the DOAs but also their intensities, the beam powers are used as the input measurement sets of the filter, and an approach based on Pearson correlation coefficient (PCC) is developed for distinguishing between signal and noise. Lastly, a hybrid method of the former two is proposed in the case of fewer then three arrays. The tracking performances of the three methods are compared in simulations and experiment. The simulations with three distributed arrays show that, compared with the DOA-based method, the beam-based method and the hybrid method can both improve the target tracking accuracy. The processing results of the shallow water experimental data collected by two arrays show that the hybrid method can achieve a better tracking performance.


Introduction
In many target detection and tracking applications, detection and estimation are usually carried out separately under the Bayes framework [1]. Hypothesis tests based on the Neyman-Pearson or Bayes criterion [2] are applied to detect the existence of the target. Sequential probability ratio test (SPRT) which makes full use of all collected data based on the continuity of the target state [3] is also a detector. The estimators start working after the target has been declared to be detected. Examples of this type of sequential estimators include the Kalman filter (KF) [4], the extended Kalman filter (EKF) [5], the unscented Kalman filter (UKF) [6], and the particle filter (PF) [7]. Especially, the PF is a sequential Monte Carlo (SMC) method that uses a random particle system (states and weights) to approximate the relevant probability distribution. PF provides a generalized approximate solution for non-linear, non-Gaussian, and unconventional measurement problems where analytical solutions are difficult to obtain [8]. In PF applications, importance sampling is critical. To better apply the particle filter, researchers proposed appropriate resampling methods to reduce the degeneracy of the particle system [9,10]. Gaussian sum particle filter is a special kind of particle filter easier to implement where the relevant probability distribution is assumed as the sum of Gaussian distributions [11]. For low signal-to-noise (SNR) scenarios, track-before-detect (TBD) algorithms arise. In the TBD algorithm, detection is not declared for every frame. Instead, detection is declared when the estimated track is returned after processing of several data frames [12]. In the multi-target tracking problems, multiple hypothesis on measure-to-track association can be filtered to achieve the tracking result. A comparison between multiple hypothesis Kalman filter and the multi-target particle filter can be found in Reference [13].
In passive acoustic target surveillance applications, however, a non-cooperative target may not always radiate a signal that is observable for detection and tracking. The noisy environment increases the uncertainty of obtaining the exact target existence information from available observations. A filter that can distinguish the existence of a target by noisy observations and can be able to resume track after losing the target for a while is needed.
Recently, target tracking based on random finite set (RFS) theory provides new opportunities for state-space processing which has been widely used in many tracking examples, such as video image recognition [14], infrared target tracking [15], and vehicle tracking detection [16]. Mahler and Vo systematically extended RFS theory to target tracking and made great progress in both theoretical research and applications [17]. Bernoulli filter is one of the RFS target tracking methods where the target existence and location state is assumed to be a Bernoulli RFS. It can be used to handle a variety of problems such as multiple on/off switching systems, detection imperfections, and inaccurate estimates [18]. Target tracking with bearings-only measurements by RFS-based filters has been studied from various aspects [19][20][21][22]. In Reference [23], Bernoulli filter tracking along with fusion is applied in an active multi-static acoustic sensor network. Gune considered the problem of detecting and tracking a high-frequency acoustic target in the frequency-azimuth plane obtained by an acoustic vector using Bernoulli filter [24]. However, to the best of our knowledge, modeling and filtering were conducted directly on the direction-of-arrival (DOA) estimations, and the DOA estimation process was usually neglected in the previous research.
In active radar monitoring, the measurement set of Bernoulli filter is assumed to be composed of a DOA estimation of the true target, whose error is a zero mean Gaussian variable, and several DOAs generated by clutters, which form a Poisson RFS [25]. However, the physical basis is different in the study of passive detection and tracking of an underwater target. There are no clutters, but, in estimating target DOA by passive beamforming in a noisy environment, outputs at the sidelobes may be higher than that at the mainlobe corresponding to the target position, resulting in significant errors. To deal with this problem, we jointly use beamforming and Bernoulli filters. A more general assumption is made for the underwater acoustic signal [26], where Gaussian noise is added directly to the received sound pressure. We conduct conventional beamforming (CBF) on the target signal band to process frequency snapshots at distributed arrays separately, and then employ the Bernoulli filter to track the target.
The main contribution of our work is that three different likelihood calculation methods based on local beamforming results are proposed for the Bernoulli filter. Firstly, we construct a DOA measurement set from the passive beamforming results (the DOA-based method). Similar to the method used in Reference [27], to increase the probability of selecting the target DOA, we select the angles corresponding to multiple peaks of the beamforming results as the measurement set. The set contains the DOA estimation corresponding to the real target (mainlobe) and several spurious peaks caused by sidelobes. Secondly, different from the method where only DOA peaks are conserved, we propose to use the beam power as the measurement set instead, because the beam power has not only DOA peak position information but also peak intensity information. Hence, we can use beam pattern to refine the target position (or DOA, or azimuth) from the data beam power. However, the tracking results can be trapped in the end-fire direction of one array where the beam pattern is flat and ambiguous. Therefore, at last, we propose a hybrid method, aiming to combine the advantages of the DOA-based method and the beam-based method in different situations. We use the practical acoustic model to generate sound pressures in the simulations to compare the proposed methods. Filters are also used to process experimental data collected in the real world.
Throughout this paper, we use uppercase calligraphic letters to denote the RFS variable, and ∅ denotes the emptyset. For any RFS S, the cardinality of S is denoted by |S|. Symbol \ denotes the set-difference operation. Bold lowercase letters are used to denote column vectors, and bold uppercase letters denote matrices. Spaces of variables are denoted by uppercase blackboard bold letters, where N 0 is the space of non-negative integers. We use blkdiag(·) to denote block diagonal matrix, where matrices in the brackets are diagonal blocks. N (m x , C x ) is denoted as a Gaussian distribution with mean m x and covariance matrix C x , and the corresponding probability density function (PDF) with respect to x is denoted as N (x; m x , C x ). We use f as a generic symbol for the finite set statistics (FISST) PDF of the RFS variables, while p is denoted as a generic symbol for the traditional PDF of the vector variables. For any variable or function ζ iterated in the prediction and update process of the Bayesian filtering framework, ζ k|k−1 is used to represent the prediction of ζ from time k − 1 to time k, and ζ k|k represents the update of ζ with all previous states and measurements from the beginning to time k.
The rest of the paper is organized as follows. The Bernoulli filter are briefly reviewed in Section 2, and the prediction step is provided. In Section 3, we establish two kinds of measurement sets derived from beamforming in a multiple-array system, which leads to different update process refers to DOA-based, beam-based, and hybrid Bernoulli filter, correspondingly. Section 4 presents the numerical implementation of the filters for practical usage. The proposed filters are implemented to process data from both simulations and real experiments. The performance results are shown in Section 5, and the paper is finally concluded in Section 6.

Bernoulli RFS Target State Model and Prediction Process
The concept of RFS was introduced by Mahler to describe the state of a randomly present target or multiple targets [17]. RFS-based Bayes filter aims to solve problems such as detection, estimation, and data association of target tracking within a unified paradigm [17,28].
In the RFS-based Bayes filter, target states and sensor measurements are modeled as RFSs instead of traditional vector formulation. Suppose that n k targets present in the surveillance area at time k, and their state vectors are x k,1 , . . . , x k,n k , respectively. The state of all the targets at time k is presented by an RFS on the state vector space, X k = {x k,1 , . . . , x k,n k } ∈ F (X), where X is the state vector space. All the important information about the target, including the target number (|X k | = n k ) and the dynamic states of each target, has been contained. Likewise, the RFSs of sensor measurements are defined as Z k = {z k,1 , . . . , z k,m k } ∈ F (Z), where Z is the measurement vector space.
The posterior probability density of the state variable is propagated in the predicting and updating procedure of the Bayes filter. By using the RFS state and measurement sets, we can extend the traditional Bayes filter from vector space to RFS framework. The predicting and updating process of the RFS posterior densities f k|k (X k |Z 1:k ) can be expressed as follows [18]: where Z 1:k = Z 1 , . . . , Z k is the collection of all the previous measurements and ϕ k (Z k |X k ) is the likelihood function of measurement set Z k , given the state X k . The RFS state is assumed to be Markovian with transitional density φ k|k−1 (X k |X k−1 ). With the updated posterior densities f k|k (X k |Z 1:k ) at every time step k, RFS state X k can be estimated by criteria such as maximum a posteriori probability (MAP). Although the set integrals in the RFS iteration are usually intractable, solutions can be found for joint detection and tracking of a single target. For the case of tracking a target which may randomly be present or disappeared in a surveillance area, Bernoulli RFS can be adopted to model the target state, whose cardinality distribution is Bernoulli. Such RFS is either an empty set (with probability 1 − q) or a singleton (with probability q), whose element x is the state vector of the target distributed following the PDF p(x). Thus, the FISST PDF of a Bernoulli RFS is completely determined by the target existence probability q as Apparently, the Bernoulli RFS X can be completely determined by the corresponding existence probability q and vector PDF q(x). Therefore, predicting and updating the RFS posterior densities f k|k (X k |Z 1:k ) are equivalent to predict and update the corresponding existence probability and vector PDF. The predicted existence probability and vector PDF are denoted by q k|k−1 and p k|k−1 (x), respectively. q k|k and p k|k (x) are denoted as the posterior existence probability and vector PDF after update.
The kinetic characterization of the target is used in the prediction stage. This dynamic is described by the Markov transitional density (from time k − 1 to k): where φ k|k−1 (X |∅) describes the birth probability of target with state x, p b = Pr(|X | = 1||X | = 0) is the probability of event "target birth", and b k|k−1 (x) denotes the distribution of the state of the born target on the vector space, which is defined in advance. When prior knowledge of the target is not available, the distribution of the born target is usually assumed to be a uniform distribution over the vector space. φ k|k−1 (X |{x }) is the survive probability, p s = Pr(|X | = 1||X | = 1) is the probability of "target survive", and π k|k−1 (x|x ) is the transitional density on the vector space, which is related to the motion characteristics of the target. A typical target motion model is the constant velocity (CV) model. In a two-dimensional region, the target state vector is expressed by is the Cartesian coordinate of the target and (ẋ k ,ẏ k ) is the velocity component, correspondingly.
The vector state transition equation of CV model is given by: where F k = blkdiag(G k , G k ) is the state transition matrix, and v k is the zero-mean Gaussian process noise with a covariance matrix Q k = blkdiag(Ξ k , Ξ k ). The diagonal blocks are given by where T k is the sampling interval from time k − 1 to time k, and 1 is the potential acceleration disturbance of the moving target. Therefore, the transitional density is Gaussian, With the definition of the Bernoulli RFS and its Markov transitional density, the prediction of RFS state PDF can be achieved by predicting the target existence probability and vector PDF as follows [18]: Equations (7) and (8) completely specify the prediction stage of the Bernoulli filter.
In the next section, we present the update stage for the multi-array passive beamforming system. Different kinds of measurement sets are used, which lead to different forms of the update procedure.

Measurement Model for Multiple Arrays and Update Process
Using a multi-array system for joint passive detection and tracking of the underwater acoustic target, we first perform beamforming on each local array to obtain the relative DOA information of the target to the local array. Subsequently, in the Bernoulli filter, with known array locations, the tracking of the target state (position and velocity in two-dimensional Cartesian coordinates) is achieved using the beamforming results of all the arrays via triangulation.

Passive Local Array Beamforming
For the convenience of practical implementation, CBF is adopted to process the collected data. We divide the signal band of interest into multiple narrow subbands and perform narrow-band CBF on every subband, and then sum up the subband results to obtain the final beam power.
By fast Fourier transform (FFT), the recorded time domain signal of a given S-sensor array is converted to frequency-domain snapshots on M narrow subbands. The center frequency for the mth subband is denoted by f m , m = 1, . . . , M. For every subband, L snapshots are collected to form one data frame matrix, R m = [r m1 , . . . , r mL ]. The lth column vector in matrix R m is an S × 1 vector. In the plane wave assumption, for signal oriented from a target with azimuth θ, elevation ψ, and range r, the propagation delay difference on the array is described by the steering (or replica) where p s , s = 1, . . . , S, are the Cartesian positions of the array elements, and k = − 2πc f m [sin(ψ) cos(θ), sin(ψ) sin(θ), cos(ψ)] T is the wave number of the mth frequency bin with c being the sound speed. In the shallow water environment, depth is negligible compared to the range between target and array. That is, ψ ≈ π/2, which leads to cos(ψ) ≈ 0 and sin(ψ) ≈ 1. Thus, in the local array beamforming step, we only need to estimate the azimuth (DOA) of the target. Defining the data sample covariance matrix as C m = 1 L R m R H m , where the superscript H indicates the conjugate transpose, the beam power of narrow band CBF is then w H m (θ)Cw m (θ). The weight vector of CBF is exactly the driving vector, w m (θ) = a m (θ). Summation of the beam powers of all the M subbands leads to the basic broadband beamforming of the form When the featured frequencies of the target are included in the summed subbands, the target DOA will be enhanced in the beam power. The estimation of DOA is then given byθ = arg max θ∈(−π,π] P(θ).
In some scenarios, for example, when SNR is low or the array sensor spacing does not meet half-wavelength requirement, strong noise at the sidelobes and grating lobes will degrade the performance of DOA estimation. The correct peak still exists in the beam power but is usually lower than the spurious peaks. In other words, the beam pattern corresponding to the correct azimuth is corrupted but remains in the presence of noises and interferences. For the purpose of making full use of the beamforming results, we propose two kinds of derivative measurements of P(θ) in the following context and use them in the update procedure of Bernoulli filter.

Update Process in Multiple Array Systems
Suppose that a system equipped with N distributed arrays is deployed in the surveillance region. Every array in the system provides a measurement set at frame k. Denote the measurement set of the nth array at frame k by Z (n) k , n = 1, . . . , N. As the arrays are usually spatially separated, it is reasonable to make the assumption that measurements obtained at different arrays are statistically independent. Thus, the entire likelihood can be expressed as the product of local likelihood functions of individual arrays: where η (n) * ,1 (·) and η (n) * ,0 (·) denote the likelihood of the nth array. η (n) * ,1 characterizes the similarity between measurement sets to a target-oriented measurement with vector state x. On the contrary, η (n) * ,0 is the likelihood to noise. Symbol * will be replaced by specific methods in the following context. With the definition of the Bernoulli RFS and the likelihood function of the measurement RFS in Equation (10), the update step can be expressed by updating the target existence probability q k|k and the posterior PDF p k|k (x) on the vector space, respectively [18], where is the measurement likelihood ratio function. The likelihood functions η (n) * ,1 and η (n) * ,0 are important in the update iteration, and will be introduced in the rest part of this section.

DOA-Based Method
In this section, we discuss the construction method of a DOA-based measurement set. In the case of low SNR, the sidelobe outputs may be higher than the mainlobe in the beam power. However, our simulations showed that, if we conserve certain number of peaks, the correct DOA information of the target has a high probability to be retained, which can thus be transferred to the subsequent filter. Therefore, for each local array, we search for λ largest peaks on the beam power curve P(θ). For an individual array, the azimuths corresponding to these peaks compose the measurement set Z = {z 1 , . . . , z λ }. The measurement set can be divided into two parts Z = T ∪ F , where T is the RFS for the true target DOA, and F is the RFS for the spurious peaks. As mentioned before, true set T can be modeled as a Bernoulli RFS. When the true azimuth is not included in the selected peaks, T is an empty set. At each frame, let peak number λ be randomly generated according to the Poisson distribution with the mean of ν, so that the set F of spurious peaks is approximately a Poisson RFS whose PDF is defined as where u(z) is the density of the false DOA peak z. For the target generated true azimuth set, the conditional PDF is given by where p d (x) is the probability of detecting the object in state x; that is, for a given state x, the probability of the corresponding azimuth peak is selected. Moreover, p(z|x) is the conventional likelihood function of true azimuth measurement due to the object x. Similar to other Bernoulli filters, p(z|x) is assumed to follow a Gaussian distribution N (z; θ(x), σ 2 B (x)). The mean value θ(x) is the correct DOA of the target x, and the standard deviation σ B (x) equals to half of the 3-dB mainlobe width of the corresponding beam pattern [29].
In an N-array system, let Z (n) = T (n) ∪ F (n) be the azimuth peak set of the nth array. When the target is absent, X = ∅ and the measured peak set is exactly the set of spurious peaks. Therefore, the likelihood is given by When a target is present, the likelihood is given by The update process of the DOA-based method is completed by substituting the likelihood functions in Equations (16) and (17) into Equations (11)-(13).

Beam-Based Method
In the DOA-based method, we only retain the DOA peak positions from the beamforming outputs for filtering, but the intensity information in the beam power can also be useful. Therefore, we proposed a beam-based method in this section, where the entire beam power for θ ∈ (−π, π] of the arrays are used as the measurements, that is, Z (n) k ≡ P To get a likelihood function of these features, we propose to use the concept of correlation between the data beam power and the beam pattern for some assumed signal arrival direction.
For a fixed array, the element positions in the array are known a priori. Thus, we have known beam pattern (in power) for a given DOA θ x of some target state x: where · 2 is to take the 2 -norm of a vector. The beam pattern of the nth array is denoted by B (n) (θ; θ x ). The correlation between the beam pattern and the noisy beam power can tell the likelihood of measurements to target state. In statistics, the Pearson correlation coefficient (PCC) is a measure of the linear correlation between two variables. The PCC between the measured beam power P (n) k (θ) and beam pattern B (n) (θ; θ x ) is given by whereP θ,k andB θ (θ x ) are the means of P (n) k (θ) and B (n) (θ; θ x ) with respect to θ, respectively. The PCC has a value between −1 and +1, where +1 is the perfect positive linear correlation, 0 is the no-linear correlation, and −1 is the perfect negative linear correlation. While the likelihood functions take values in [0, 1], mapping rules from PCC to likelihood functions are required. In our applications, when the local beamforming result does not match the beam pattern from the assumed direction, the PCC is negative or close to 0, indicating that there is only noise in this direction, so that the likelihood function to noise should be larger. On contrast, when the local beamforming result completely matches the beam pattern, PCC is equal to 1, indicating that there is a target in the assumed direction. The closer the PCC is to 1, the larger the target likelihood function should be set.
Note that negative PCCs are considered as mismatch here. Readers may be confused because negative correlations are usually considered as strong correlations in applications such as image recognition. That is, if the direction of a measured image are opposite to the original one, the PCC between them equals to −1, but they are still the same object (highly correlated). However, in our application, the PCC between the local beamforming result and the beam pattern being negative means that peaks appear at the DOA positions that should have been valleys, while false peaks appear at where they should not have been existed. This is not a match in our expectation. If the negative PCC was still considered as strong correlation, we would get wrong DOA estimation.
The desired mapping rules should satisfy the requirements listed above. For this purpose, we first define a non-negative PCC ρ (n) k (x) ≤ 0. Then, we generate likelihood functions from ρ (n) k+ (x) as follows: where σ ρ is an adjustable probability mapping scale parameter. The update process of the beam-based method is completed by substituting the likelihood functions in Equation (20) into Equations (11)-(13).

Hybrid Method
The beam-based method is beneficial to improve the accuracy when the beamforming result highly matches the beam pattern of the target DOA. However, since there are wider peaks near the end-fire direction in the beam pattern of an individual local array, the tracking results can sometimes be trapped in the end-fire regions. If there are more than three arrays, the effect of such error can be eliminated thanks to the geometric relations between the arrays. In the case of fewer arrays, this problem leads to localization error. Although the DOA-based method may also have ambiguity problems in the end-fire direction, since only one or two DOA peaks are retained for filtering, it is less affected than the beam-based method where the operation of calculating the PCC will aggravate the performance degradation caused by the ambiguity in the end-fire direction. Therefore, the method using DOA measurements only is advantageous for protecting the true value from the effects of the end-fire direction in the filtering procedure, especially when the number of arrays is not enough. Both methods have their own advantages and disadvantages. To complement each other, we propose a hybrid approach combining these two choices. We define the new likelihood functions as the geometric mean of likelihood functions of the DOA-based and beam-based methods as follows: where α ∈ [0, 1] is the portion coefficient to adjust the weights of beam-based and DOA-based likelihood in the hybrid method. When α = 1, the function is the same as the beam-based likelihood, while α = 0, it is exactly the same as that of the DOA-based method.

SMC Implementation of the Bernoulli Filter
The PF/SMC method provides a general framework for the implementation of Bernoulli filter [18].
The PDF p k|k (x) is approximated by a particle system {w k is the state of particle i and w (i) k is the normalized particle weight, that is, ∑ where δ b (·) is the Dirac function at point b. In the prediction stage, the prediction of PDF in Equation (8) is completed by predicting the particle system The particles are divided into two parts: N b newly born particles and N p persistent particles. The states of the birth particles are drawn from a pre-defined birth distribution, for example, uniform distribution. The birth particles are equally weighted by w The persistent particles are the extension of those particles at the previous time. Their states are drawn from the state transitional density The relationship between the weights of the persistent particles and the previous particle weights is given by w Note that we use proportion rather than equality, because the weights will also be normalized in the prediction stage.
In the update equations of target existence probability, the integrals can be approximated using predicted particles as To update PDF p k|k (x), the weights of particles are updated by At the end of the update procedure, sampling importance resampling (N p particles persist) is adopted to regularize the particle system [9]. With all these particles, the MAP estimation of the target state is approximately the weighted summation of the updated particles at time k, that is, ∑

Performance Tests
In this section, performance of the three Bernoulli filters is evaluated using simulational and experimental data.

Simulation Results
In the simulations, we test the performance of the DOA-based, beam-based and hybrid methods in a shallow water environment, as depicted in Figure 1. As shown in Figure 1a   The featured frequency of the target signal is 50, 60, 70 and 80 Hz. The frequency snapshot of the received signal of the nth array at time k is given by r where A k is the complex Gaussian random amplitude of the source signal following the distribution CN (0, σ 2 s ). The Greens' function vector g (n) k represents the transmission loss from target location x k to the sensors in array n. Additive complex Gaussian noise is denoted by n (n) k which follows the distribution CN (0, σ 2 n I), where 0 is a zero mean vector and I is the identity matrix of compatible dimension. Noises at different arrays are assumed to be independent and identically distributed. The source level and noise level are given by SL = 10 log σ 2 s and NL = 10 log σ 2 n , respectively. The received data are generated on the frequency band from 40 Hz to 100 Hz with interval ∆ f = 1 Hz. For the featured frequencies of the target, the simulated frequency snapshots are generated containing the signal component, and for simplicity the level of the source signal of each featured frequency is assumed to be identical, i.e., To get data that better fit the realistic shallow water environment, normal mode model KRAKEN is used to generate the transmission loss g (n) k at signal frequencies for all the arrays with environment parameters shown in Figure 1b. For the other frequency bins in the received bandwidth, noise-only data snapshots are generated. In processing, sliding window is used to pick the snapshots. One single frame k with L = 11 snapshots are used for beamforming at one time with five snapshots overlapping between the adjacent frames. The sample time interval between snapshots are set to be 2.5 s and thus the time interval in between frames equals to 15 s.
There are two kinds of criteria here to evaluate the filter performance. Optimal subpattern assignment (OSPA) is a measure that is used to analyze the estimation error of the target state and target number at the same time. The OSPA distance between two RFS X = {x 1 , . . . , x n x } and Y = {y 1 , . . . , y n y }, n x , n y ∈ N 0 , is given bȳ if n x ≤ n y , where d (c) (x, y) = min(c, d(x, y)) is the distance between x and y, and d(x, y) is the traditional distance between vectors. If n y ≤ n x , thend p (Y, X ). The summation is taken over all permutations π on {1, 2, . . . , n y }.d (c) p is the OSPA metric of order p with cut-off c.
Here, we use the Euclidean distance d(x, y) = x − y 2 . The cut-off distance c represents the upper bound of the estimation error of the state vector.
Another metric used is to measure the probability of target lost in the tracking problem, called circular position error probability (CPEP), defined as where matrix H is used to extract components of state vector for performance analysis, and r is the pre-defined position error radius, outside which the estimation is regarded as target missing. Given the target track in Figure 1a, we set NL = 70 dB, and Bernoulli SMC filters are applied for SL =130 dB and SL = 120 dB, respectively. Based on experience, the Bernoulli filter parameters are set to N p = 5000, N b = 5000, p s = 0.99 and p b = 0.01. When calculating the particle weights for DOA-based method, we set ν = 6 and u(z) = 1/2π (uniform distribution of azimuth). Probability of detection is assumed to be constant p d = 0.2. Probability of target existence, CPEP, and OSPA of both location and velocity tracking are compared for the DOA-based, beam-based and hybrid methods. To eliminate the variation caused by SMC implementation, results are averaged over 500 trails with the same filter parameters. Probability of target existence is the updated probability q k|k in the filter iteration. Target is declared when q k|k > 0.5 and the state vector estimation is accepted. For q k|k <= 0.5, target is not detected and the RFS state is an empty set. When calculating CPEP, we use H = diag(1, 0, 1, 0) and r = 2 km. That is, when the positioning error exceeds 2 km, the target is regarded as lost. At last, we extract the position and velocity components of the state estimate to calculate the OSPA separately so that the location and the speed estimation error can be compared individually. The error cut-off is c = 4 km for location and c = 0.1 m/s for velocity.
Results are shown in Figures 2 and 3. Both beam-based and hybrid methods can better capture the appearance and disappearance of the target signal. The DOA-based method fails to detect the disappearance of the target after k = 185. When the target appears, it takes several frames for the beam-based and hybrid filters to detect the target. After the target is detected, the tracking precision of the beam-based and hybrid methods is better than that of the DOA-based method in the simulations. The target track after appearance can be roughly divided into three parts. In the first part, the target is close to HLA2 and is far away from the other two HLAs. As the target moving away from HLA1, the tracking performance degrades (OSPA increases and CPEP decreases). In the middle of the track where target is not close to any of the three arrays, the tracking performance is relatively poor. When the target comes into the area near HLA1 and HLA3, performance improvement can be observed. The performance change is more pronounced when SL = 120 dB. In Figure 3, a possible reason for the sudden change around k = 150 is that the target enters a relatively high SNR region of HLA3 (the relationship between performance and SNR is not linear), so the target location is corrected. However, while the target location is corrected, the filter actually detects a large change in the received signal, so that the target existence probability decreases in a short time. The sudden correction of the target position increases the estimation error of target velocity. These errors are corrected after a short period of time. When SL = 130 dB, similar but insignificant performance changes can be observed in the same period. Both beam-based and hybrid methods, where the amplitude information of beamforming is used, are more sensitive to degradation of signal quality. However, in most frames, both beam-based and hybrid methods still outperform the DOA-based method.

Experiment Data Processing Results
The experiment dataset used to evaluate the performance was collected in the experiment SWellEx-96 Event S5 [30]. In this event, sound pressure was recorded by multiple arrays, a vertical line array (VLA), a tilted line array (TLA) and two HLAs. The experiment area along with the array locations and ship-recorded target trace are shown in Figure 4. The locations of arrays and target trace with recorded data (in the red box) in latitude and longitude are converted to Cartesian coordinates and a surveillance region of size 2 km × 8 km is formed. The corresponding latitude and longitude of the Cartesian origin are 32 • 37.25 N and 117 • 21.82 W. For the purpose of tracking the target in the XY-plane, only the data collected by 2 HLAs, referred to as HLA north (HLAn) and HLA south (HLAs), are used. Those arrays are deployed at the seabed with the water depth of 213 m (HLAn) and 198 m (HLAs), respectively. The number of effective array elements for processing is 27 (HLAn) and 28 (HLAs), respectively. Two multi-tone sources travel in the region at speed around 5 knots. The respective source depths are 54 m and 9 m, and their transmitted tone frequencies are listed in Table 1. Data from frequency band of 40-150 Hz are used to test our Bernoulli tracking algorithms. Array element locations as provided on the dataset website are used to calculate the beam pattern in advance.  The recorded sound pressure is sampled at 3276.8 Hz. Data flow is transformed to frequency-domain snapshots with 8192-point FFT. The snapshots length is thus 2.5 s (0.4 Hz frequency bin widths). Similar to the simulations in the previous section, in beamforming, one single frame k contains L = 11 snapshots with 5 snapshots overlapping between the adjacent frames. A total of 199 frames were processed. During the experiment, for frames 48∼54, 120∼123 and 131∼134, the deep source stopped projecting continuous wave (CW) tones and started projected chirps that are not on the processed frequency bands. Target is treated as absent during those frames for that only noise is contained in the data for processing.
In the experiment data processing, the Bernoulli filter parameters are set to N p = 10,000, N b = 8000, p s = 0.9 and p b = 0.1. In DOA-based method, ν = 6 and u(z) = 1/2π (uniform distribution of azimuth). Probability of detection is p d = 0.9. Probability of target existence, CPEP, OSPA of both location and velocity tracking are presented in Figure 5. Results are the average of 500 trails with the same filter parameters. Probability of target existence is the updated probability q k|k in the filter iteration. Target is declared when q k|k > 0.5 and the state vector estimation is accepted. For q k|k <= 0.5, target is not detected and the RFS state is an empty set. H = diag(1, 0, 1, 0) and location error radius r = 1 km in CPEP metric. Position and velocity OSPA cut-off are set to c = 2 km for location and c = 0.1 m/s for velocity.  Only two arrays in the experiment led to wrong estimation when the correlation deviated for any of the arrays. This deviation should have been eliminated if we had one more array. Besides, for many frames, the target is in the end-fire direction of HLAs, where the performance of the beam-based method is not satisfactory. In Figure 5, the performance results of the DOA-based method and the hybrid method with α = 0.4 are presented. The performance of the beam-based method decreases dramatically, because of the lack of arrays and the effect of the end-fire direction, and is thus not shown. Similar to the simulation results, compared to the DOA-based algorithm, the hybrid method is better at detection, i.e., capturing the appearance and disappearance of the target signal. This can be seen between frames 48∼54, 120∼123 and 131∼134 when the target signal is not included in the processed data. Correspondingly, when using the hybrid method, the convergence rate for the relatively more precise estimation is slower than that of the DOA-based method (for frames 1∼26 at the beginning). From frames 62∼118, both tracking algorithms achieve good performance. Near the end of the track (frames 135∼199), the hybrid method outperforms the DOA-based method.