Underwater Source Counting with Local-Confidence-Level-Enhanced Density Clustering

Source counting is the key procedure of autonomous detection for underwater unmanned platforms. A source counting method with local-confidence-level-enhanced density clustering using a single acoustic vector sensor (AVS) is proposed in this paper. The short-time Fourier transforms (STFT) of the sound pressure and vibration velocity measured by the AVS are first calculated, and a data set is established with the direction of arrivals (DOAs) estimated from all of the time–frequency points. Then, the density clustering algorithm is used to classify the DOAs in the data set, with which the number of the clusters and the cluster centers are obtained as the source number and the DOA estimations, respectively. In particular, the local confidence level is adopted to weigh the density of each DOA data point to highlight samples with the dominant sources and downplay those without, so that the differences in densities for the cluster centers and sidelobes are increased. Therefore, the performance of the density clustering algorithm is improved, leading to an improved source counting accuracy. Experimental results reveal that the enhanced source counting method achieves a better source counting performance than that of basic density clustering.

The target directing methods based on AVSs include the average sound intensity detector, sound intensity flow DOA histogram, cross-spectral DOA histogram, and so on [11,12].AVSs can also be used in multi-channel arrays [13].Each of these methods has its advantages and disadvantages.The histogram algorithm is widely used in engineering due to its better robustness compared with other algorithms.It can suppress narrowband and strong line spectrum interference and has a certain degree of multi-target resolution ability [14,15].In [11], a DOA histogram based on sound intensity flow was used to estimate the DOA of targets, and the weighted DOA histogram of the line spectrum was used to realize the resolution and DOA estimation of multiple line spectrum targets.In [16], a DOA histogram was used to realize the resolution and DOA estimation of multiple wideband targets in the time-frequency domain and instantaneous frequency domain of the Huang transform.In an AVS sea trial experiment based on the Argo buoy platform in [17], it was also found that the DOA histogram could distinguish two wideband targets, the test ship and the engineering ship, whose adjacent interval was about 80 degrees.In reference [18,19], the windowed-disjoint orthogonality (WDO) of the target signal was further introduced to explain the mechanism of the wideband multi-target resolution ability of the DOA histogram.
The components of ship-radiated noise are very complex, including line spectrum, stationary ergodic random signals, and transient signals [20].Therefore, it is considered to be broadband noise with energy distribution at any time and any frequency from the macro point of view.However, from the micro point of view, the energy intensity at different timefrequency points varies, which leads to a significant difference in the energy of different sources at some time-frequency points when multiple sources are synthesized at the sensor receiver, and a target playing a dominant role.This phenomenon is called the windowdisjoint orthogonality (WDO) of signals and is widely used in blind signal separation for multi-source resolution and separation [21,22].The higher the WDO characteristic of the signal in a certain time frequency point, the greater the energy of the dominant signal is compared to the sum of other signals, and the direction estimation result of the frequency point will be biased to the target direction of the dominant signal.If significant numbers of TF points possess the WDO property, in the DOA histogram the DOA estimates will cluster around the actual DOAs of the sources to achieve multi-target resolution of the histogram.
With multi-target resolution, it is possible to achieve source counting through multiple source detection.Traditional multiple source detection methods such as Minimum Description Length (MDL) [23], Akaike Information Criterion (AIC) [24], and Random Matrix Theory (RMT) [25] are overdetermined.They work well when the sensor channels outnumber the sources.However, for two-dimensional AVSs, these methods will fail when there are more than two sources, in which case the applicability of the AVS will be greatly limited.
As a result, an underdetermined method based on the density clustering algorithm was formed to solve the source counting problem for single AVSs in [18].The density clustering algorithm was used to classify the DOAs of all the time-frequency points, where the cluster centers and the number of the clusters were obtained as the DOA estimations and target number, respectively.However, WDO is the inherent characteristic of the signal.With the increase in targets, the WDO decreases, which limits the multi-target resolution performance of the DOA histogram, and the performance of the density clustering algorithm deteriorated.
For this reason, a multi-source detection based on multimodal fusion was developed in [10].The output of the AVS was decomposed into multiple modes by intrinsic time-scale decomposition (ITD), so that the source number in each mode decreased.For each mode, the WDO increased, leading to a better source counting accuracy.Therefore, the fused source counting performance was improved.However, the counting performance varied with the number of modes employed and it was difficult to determine the optimal number of employed modes.
In this paper, the local confidence level is adopted to weigh the density of each DOA data point to highlight the samples with the dominant source and downplay those without, so that the differences in densities for the cluster centers and sidelobes are increased.Therefore, the performance of the density clustering algorithm is enhanced.An analysis of lake trial data is conducted by comparing the proposed local-confidence-level-enhanced density clustering method with the methods in [10] and [18].The results confirm the availability of the proposed local-confidence-level-enhanced density clustering method in improving source counting performance.The source counting accuracy of the proposed method is better than the methods it is compared with.The probability distribution of the source counting result for the local-confidence-level-enhanced method is more concentrated to the source number.As the SNR decreases, the proposed method undergoes slower degradation.

Model and Cross-Spectral DOA Histogram of Single-Vector Sensor
The output of an AVS contains information on both sound pressure and vibration velocity.The model of two-dimensional AVSs in the horizontal free sound field can be expressed as: where N is the number of sources, P(t) is the sound pressure component of the AVS output, and V x (t) and V y (t) are the components of vibration velocity on two axes, respectively.
x n (t) represents the signal radiated from the nth source to the AVS, α n (t) indicates the angle of the nth sound source relative to the vector hydrophone (x axis is zero-degree orientation), e(t) = [n p (t), n x (t), n y (t)] T is the noise vector, and n p (t), n x (t), and n y (t) are noises of the sound pressure, x-axis, and y-axis vibration velocities, respectively.By taking the short-time Fourier transform (STFT) of each channel of the mixture, the AVS model can be expressed in the time-frequency domain as: where ω and m are the frequency bin and time frame indices, respectively.P(ω, m), V x (ω, m), V y (ω, m), X n (ω, m), and E(ω, m) are the STFTs of P(t), V x (t), V y (t), and e(t), respectively.The DOA estimation at each time-frequency point can be obtained by: where ∠ represents the phase angle, and V * x (ω, m) and V * y (ω, m) represent the complex conjugates of V x (ω, m) and V y (ω, m), respectively.The DOA histogram can be obtained by counting the estimation results of all time-frequency points.Supposing that the azimuth interval of histogram statistics is ∆θ, we count the number of DOA estimation results in the interval [θ − ∆θ/2, θ + ∆θ/2]: when θ − ∆θ/2 < θ(ω, m) ≤ θ + ∆θ/2, the amplitude of the histogram at the corresponding position is increased by 1: (4)

Density-Clustering-Based Source Counting
The density clustering algorithm depends on two important hypotheses: (1) The density for the center of a cluster is higher than that of the neighboring points; (2) The minimum distance between the center of a cluster and a higher-density point is relatively large [26,27].The data set Θ = [θ 1 , θ 2 , . . ., θ K ] is established with the DOA estimations obtained by Equation (3) in the time-frequency domain, where K is the number of the DOA estimations.Since the distance between two DOA estimations does not exceed 180 • , the distance d kl between each two samples in Θ is defined as: where θ k and θ l are two different samples in the data set Θ.They are angles limited within the interval of [−180 Then, the local density of each sample ρ k is calculated as: where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, and d c > 0 represents the cut-off distance used to keep a region for each sample.ρ k is the number of the samples within a limited distance of d c from the sample θ k , i.e., ρ k is equal to the number of points that are closer than d c to the sample θ k .The more samples there are near the sample θ k , the larger the local density of the sample θ k is.d c can be chosen so that the average number of neighbors is around 1 to 2% of the total number of points in the data set [28].A sample is assigned to the same cluster of its nearest sample with a higher density only if their distance is smaller than d c .The minimum distance δ k between the sample θ k and any other sample with a higher density is measured as: For the point with the highest density, we conventionally take δ k as: The product γ k = ρ k δ k , k = 1, 2, . . ., K is used as the feature, and its value for a cluster center sample is obviously larger than that for the other sample.Thus, the differences among the ordered feature sequence are used to find the samples with significantly larger features, whose number corresponds to the number of clusters or targets [29].In this procedure, the features γ k , k = 1, 2, . . ., K are sorted in descending order, i.e., Suppose that there are L targets in the detection range of the AVS.Since the features of the cluster center samples are much larger than the other samples, the first L features {γ 1 , γ 2 , . . . ,γ L } are significantly larger than the other K − L features {γ L+1 , γ L+2 , . . . ,γ K }.Consequently, the difference between γ L and γ L+1 is relatively large.To obtain L, the differences in the ordered features are computed as: And the variance of the sequence {∆γ i } K−1 i=n is calculated as: where n = 1, 2, . . ., K − 2. Further, the second-order statistic of the features is defined as: Finally, the target (or cluster) number is estimated as: And the L DOA samples with the L largest features are obtained as the final multitarget DOA estimation results.

Local-Confidence-Level-Enhanced Density Clustering Algorithm
With the increase in sources, the WDO decreases, which limits the multi-target resolution performance of the DOA histogram.DOA estimations of different time-frequency points have different contributions to density clustering.For a certain time-frequency point, the larger the proportion of the dominant signal energy is, the closer the DOA estimation is to the truth value, and the greater the contribution this sample provides to the clustering.A local-confidence-level-based density enhancement algorithm is proposed in which the contribution of the samples with a high proportion of the dominant signal energy are exaggerated, so that the accuracy of the clustering and the precision of the target number estimation are improved.
For a certain time-frequency point (ω, m) (read circle) in the STFT domain, define the rectangular area Ω ω,m (Dotted Box) around it, as illustrated in Figure 1.The width and length of this area are l m time points and l ω frequency points, respectively.Therefore, there are l ω l m time-frequency points in the rectangular area Ω ω,m .The local confidence Γ(ω, m) of this time-frequency point (ω, m) is estimated by performing a principal component analysis (PCA) [30] on the snapshot vector Y(Ω ω,m ) in the region Ω ω,m , which reflects the strength of the dominant signal at the time-frequency point (ω, m).It is obvious that the local confidence level is proportional to the ratio of the dominant signal.
Then, the local confidence is used to enhance the cross-spectral DOA histogram.In the original orientation histogram statistics, the density of each sample is 1.The localconfidence-weighted DOA histogram with a sample density of ( , ) m Substituting Equation (4) into Equation ( 16) yields: where ( )   is the average local confidence level.
The enhanced local density k   of the k th sample is weighted with the correspond- The minimum distance is calculated according to the weighted local density k   , and the target number is estimated by the ordered feature sequence afterward.The remaining For each time-frequency point, a snapshot Y(Ω ω,m ) is formed with all snapshots Y(ω, m) in the region Ω ω,m .It is a 3 × l ω l m matrix with columns Y(ω, m), (ω, m) ∈ Ω ω,m .A positive semidefinite complex Hermitian matrix is then constructed with Y(Ω ω,m ) for the region Ω ω,m as: For a two-dimensional AVS, the rank of the matrix R(ω, m) is 3. Performing an eigenvalue decomposition on R(ω, m), we obtain three real-valued positive eigenvalues in decreasing order λ 1 (ω, m) ≥ λ 2 (ω, m) ≥ λ 3 (ω, m) of the matrix R(ω, m).The local confidence level of the time-frequency point (ω, m) is expressed as: If there is only one dominant signal whose power is much larger than the noise in the time-frequency point (ω, m), λ 1 is the eigenvalue of the dominant signal which is much larger than λ 2 and λ 3 , which are the eigenvalues of the noise.Otherwise, if there are two similar signals whose power is much larger than the noise, λ 2 will be much larger than λ 3 , and the local confidence will become much smaller than the one dominant signal case.Therefore, it can be used to represent the WDO.
It is obvious that the local confidence level is proportional to the ratio of the dominant signal.
Then, the local confidence is used to enhance the cross-spectral DOA histogram.In the original orientation histogram statistics, the density of each sample is 1.The localconfidence-weighted DOA histogram with a sample density of Γ(ω, m) is: Substituting Equation (4) into Equation ( 16) yields: where Γ(θ) is the average local confidence level.
The enhanced local density ρ k of the kth sample is weighted with the corresponding local confidence level Γ(k) as: The minimum distance is calculated according to the weighted local density ρ k , and the target number is estimated by the ordered feature sequence afterward.The remaining steps are the same as in Section 3.1.
With the enhanced local density, the density of the sample points that are closer to the cluster center will increase while the density of the sample points far away from the cluster center will decrease.Consequently, the performance of the density-clustering-based multi-target detection is improved.To summarize, the proposed source counting method based on the local confidence-enhanced density clustering is depicted in Algorithm 1.

Experimental Settings
The experimental data were collected in FuXian Lake.The experimental scenario is depicted in Figure 2, in which the location of a two-dimensional co-vibration AVS is in the middle of the lake.The sensitivity of the velocity channel decreased with increasing frequency (as the slope of −6 dB/octave), and there was a 90-degree phase difference between the vibration velocity channel and the sound pressure channel.The AVS was rigidly fixed about 4 m underwater on the side of the survey ship.The output signals of the AVS were collected and stored by a multi-channel synchronous data acquisition system.In the experiment, four ships (yachts) acted as moving targets located 1-2 km around the AVS.The ships were distributed in four directions of about 40 • , 140 • , 210 • , and 330 • at the beginning.The sampling rate was 48 kHz, the length of the STFT window was 8192 points, the frequency band was 0.5-8 kHz, the integral length of the azimuth histogram was 1 s (six windows), the sliding step was 8192 points (one window), and the rectangular area was l m = l w = 3.The signals of two vibration velocity channels were compensated in the frequency domain according to the slope and phase of sensitivity after STFT.

Resolution Performance
Figure 3 illustrates the temporal course of the cross-spectral DOA histogram computed with Equation (4) at each time bin.Four targets marked as <1>, <2>, <3>, and <4> are presented in the histogram.The red color stands for the peaks of the histogram, which indicate the locations of the targets.Continuous peaks in time compose the time-bearing course of a target, which is illustrated roughly with straight black, red, green, and blue dash-dot lines, respectively, for each target.As shown in Figure 3, the azimuths of targets 3 and 4 gradually become closer over time, so the spectral peaks of these two targets become hard to distinguish (red color connects in one piece around time (d)).Since the signal from the target ship is weak when the ship decelerates or stops, the spectral peaks of targets 1 and 2 are unapparent or even submerged by the background during 38-40 s in the DOA histogram, as with the time points 44-50 s of target 3 and 50-53 s of target 4 (red color absent).Thus, the target courses have obvious discontinuity.  3 and 4 gradually become closer over time, so the spectral peaks of these two targets become hard to distinguish (red color connects in one piece around time (d)).Since the signal from the target ship is weak when the ship decelerates or stops, the spectral peaks of targets 1 and 2 are unapparent or even submerged by the background during 38-40 s in the DOA histogram, as with the time points 44-50 s of target 3 and 50-53 s of target 4 (red color absent).Thus, the target courses have obvious discontinuity.16) at each time bin.It can be seen that the course of target 3 is significantly enhanced (red color is more continuous in time and peaks are higher than those in Figure 3), and the course of other targets is also enhanced relative to the original ones in Figure 3.It indicates that by using the local confidence level enhancement, the DOA courses of these targets become apparent, and both the resolution of targets and the continuity of the courses are improved (red color areas are thinner in bearing axis direction and more clearly separated compared to Figure 3).16) at each time bin.It can be seen that the course of target 3 is significantly enhanced (red color is more continuous in time and peaks are higher than those in Figure 3), and the course of other targets is also enhanced relative to the original ones in Figure 3.It indicates that by using the local confidence level enhancement, the DOA courses of these targets become apparent, and both the resolution of targets and the continuity of the courses are improved (red color areas are thinner in bearing axis direction and more clearly separated compared to Figure 3).3 and 4, respectively.As shown in Figure 5, both the original and the enhanced histograms present spectral peaks at the directions of the targets, while the enhanced one possesses glaringly obvious peaks.This reveals that the enhanced histogram is able to estimate the azimuths of targets at a higher resolution.In particular, when the signal from the target ship is weak, such as target 3 in Figure 5c, the enhanced cross-spectrum can significantly upgrade its peak, which makes it possible to detect weak targets.Additionally, as depicted in Figure 5d, the distinguishability of adjacent targets (targets 3 and 4) is also promoted by the enhanced cross-spectrum.

Resolution Performance
Figure 6 explains the enhancement performance of the local confidence level by the DOA histogram at time (c).The average local confidences of the directions of the targets are larger than the others.As a result, the peaks of the targets become higher and sharper.3 and 4, respectively.As shown in Figure 5, both the original and the enhanced histograms present spectral peaks at the directions of the targets, while the enhanced one possesses glaringly obvious peaks.This reveals that the enhanced histogram is able to estimate the azimuths of targets at a higher resolution.In particular, when the signal from the target ship is weak, such as target 3 in Figure 5c, the enhanced cross-spectrum can significantly upgrade its peak, which makes it possible to detect weak targets.Additionally, as depicted in Figure 5d, the distinguishability of adjacent targets (targets 3 and 4) is also promoted by the enhanced cross-spectrum.when the signal from the target ship is weak, such as target 3 in Figure 5c, the enhanced cross-spectrum can significantly upgrade its peak, which makes it possible to detect weak targets.Additionally, as depicted in Figure 5d, the distinguishability of adjacent targets (targets 3 and 4) is also promoted by the enhanced cross-spectrum.
Figure 6 explains the enhancement performance of the local confidence level by the DOA histogram at time (c).The average local confidences of the directions of the targets are larger than the others.As a result, the peaks of the targets become higher and sharper.

Source Counting Performance
To explain the advantage of the local-confidence-level-enhanced density clustering algorithm, Figure 7 illustrates the decision graphs of the basic and enhanced density clustering algorithms with the data in Figure 6.Due to the inconspicuousness of the peaks of targets 1 and 3 (i.e., points c and d in Figure 6a), the cluster centers (i.e., decision points c and d in Figure 7a) of the basic density clustering are close to the cluster halos, which are recognized as the background.Nevertheless, in the enhanced density clustering algo-

Source Counting Performance
To explain the advantage of the local-confidence-level-enhanced density clustering algorithm, Figure 7 illustrates the decision graphs of the basic and enhanced density clustering algorithms with the data in Figure 6.Due to the inconspicuousness of the peaks of

Source Counting Performance
To explain the advantage of the local-confidence-level-enhanced density clustering algorithm, Figure 7 illustrates the decision graphs of the basic and enhanced density clustering algorithms with the data in Figure 6.Due to the inconspicuousness of the peaks of targets 1 and 3 (i.e., points c and d in Figure 6a), the cluster centers (i.e., decision points c and d in Figure 7a) of the basic density clustering are close to the cluster halos, which are recognized as the background.Nevertheless, in the enhanced density clustering algorithm, the local density is enhanced with the corresponding local confidence level which is represented by the size of the points in Figure 7.The local confidence level is higher around the cluster center than that in the cluster halo.Therefore, the background becomes more clustered and the cluster centers are far away from the background, especially for targets 1 and 3 (i.e., points C and D in Figure 7b).Therefore, the enhanced local density leads to a clearer distinction between the cluster centers and the background in the decision graph, which makes it easier to achieve correct counting.Figure 8 demonstrates the second-order statistic S n of the basic and enhanced density clustering algorithms with the data in Figure 6.Obviously, the basic density clustering obtains the wrong counting result while the enhanced density clustering obtains the correct one.These results are consistent with the analysis of Figure 7.
Figure 9 illustrates the courses of the multi-target DOA estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.Compared with the courses estimated by the basic density clustering, the outliers in the courses estimated by the enhanced density clustering significantly reduce by weighting the density with the local confidence level, and the continuity of the courses of the multi-target detection is remarkably improved, especially in Regions A and B.Moreover, the distribution of the DOA estimations of targets is more compact for enhanced density clustering.
Figure 10 depicts the results of the source number estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.Obviously, in Regions A, B, and C, the enhanced density clustering achieves the correct estimation results while the basic density clustering does not.In Region D, although both methods obtain incorrect results, the outcomes of the enhanced density clustering are closer to the true value.Targets 3 and 4 stop for a while during the periods of 44-50 s and 50-53 s, respectively.During these periods, there are only three targets in the field.As a result, the source number estimation result in the period before 44 s was used to calculate the source counting accuracy.A comparison was conducted between the proposed local-confidence-level-enhanced density clustering method, the multimodal-fusion-based method in [10], and the basic density-clustering-based method in [18].As shown in Figure 11, the basic densityclustering-based method obtained 48.82% accuracy of target number estimation, the multimodal-fusion-based method obtained an accuracy of 51.15% with four modes employed, A comparison was conducted between the proposed local-confidence-level-enhanced density clustering method, the multimodal-fusion-based method in [10], and the basic density-clustering-based method in [18].As shown in Figure 11, the basic densityclustering-based method obtained 48.82% accuracy of target number estimation, the multimodal-fusion-based method obtained an accuracy of 51.15% with four modes employed, while the local-confidence-level-enhanced method achieved 63.39% accuracy.The probability distribution of the source counting result for the local-confidence-levelenhanced method is more concentrated to the source number.
estimating the errors of one and two sources and worsens the probability of overestimat ing errors of five sources.The WDO increases and sources are more detectable in each mode, so the probability of missed detections is reduced.However, a source signal may be divided into parts and be distributed in more than one mode.These parts may not be fully recognized as the same source during the fusion step, leading to an overestimation As a result, compared to the basic density-clustering-based method, despite an advantage in the estimation accuracy, the probability distribution of estimated results is much better for the multimodal-fusion-based method.In addition, the estimation accuracy varies with the number of modes employed.Therefore, it can be seen that the presetting of the numbe of employed modes is quite important.
In contrast, the local-confidence-level-enhanced density clustering dramatically im proves the accuracy of the source counting.In the enhanced method, the local density o the samples around the cluster centers with high local confidence is increased.Therefore the clusters become more compact, which improves the source counting accuracy.In order to evaluate the effect of the noise on the proposed method, an experimen with different SNRs was performed.The real SNR of the lake trial is unknown.It was assumed, empirically, to be 24 dB.The SNR was adjusted by adding simulated Gaussian white noise to the lake trial signal.The performance of the basic density-clustering-based method and the enhanced density-clustering-based method for 8, 12, 16, 20, and 24 dB (without simulated noise) is displayed in Figure 12 comparatively.As the SNR decreases the performance of both methods degenerates gradually.However, the enhanced density clustering-based method performs better than the basic density-clustering-based method and undergoes slower degradation.The multimodal-fusion-based method significantly reduces the probability of underestimating the errors of one and two sources and worsens the probability of overestimating errors of five sources.The WDO increases and sources are more detectable in each mode, so the probability of missed detections is reduced.However, a source signal may be divided into parts and be distributed in more than one mode.These parts may not be fully recognized as the same source during the fusion step, leading to an overestimation.As a result, compared to the basic density-clustering-based method, despite an advantage in the estimation accuracy, the probability distribution of estimated results is much better for the multimodal-fusion-based method.In addition, the estimation accuracy varies with the number of modes employed.Therefore, it can be seen that the presetting of the number of employed modes is quite important.
In contrast, the local-confidence-level-enhanced density clustering dramatically improves the accuracy of the source counting.In the enhanced method, the local density of the samples around the cluster centers with high local confidence is increased.Therefore, the clusters become more compact, which improves the source counting accuracy.
In order to evaluate the effect of the noise on the proposed method, an experiment with different SNRs was performed.The real SNR of the lake trial is unknown.It was assumed, empirically, to be 24 dB.The SNR was adjusted by adding simulated Gaussian white noise to the lake trial signal.The performance of the basic density-clustering-based method and the enhanced density-clustering-based method for 8, 12, 16, 20, and 24 dB (without simulated noise) is displayed in Figure 12 comparatively.As the SNR decreases, the performance of both methods degenerates gradually.However, the enhanced densityclustering-based method performs better than the basic density-clustering-based method and undergoes slower degradation.To investigate the frequency bands of the targets in the experiment, each TF point was assigned to a target according to its dominant signal.If the DOA estimation of a certain TF point was 2° away from the nearest cluster center, it was assigned to this target.Otherwise, it is considered that there is no dominant signal in this time-frequency point.Figure 13 shows the assignment of the first 1.4 s.The frequencies of all four targets spread over the whole broadband of 500-8000 Hz randomly, and clearly cannot be divided into four different separated sub-bands.This result indicates that the WDO assumption holds in this experiment.

Conclusions
An underwater source counting method with local-confidence-level-enhanced density clustering is proposed.In this method, the STFT of the AVS output signal is calculated, and the DOAs of all time-frequency points consist of a data set.Then, the DOAs in the data set are classified with density clustering, where the cluster centers and the number of clusters are obtained as the DOA estimations and the target number, respectively.Finally, the local confidence level is applied to further enhance the performance of the density clustering algorithm.A lake trial is conducted to verify the local-confidence-level-enhanced source counting method which achieves a better source counting performance than the multimodal-fusion-based method and the enhanced density-clustering-based method.
Author Contributions: All the authors contributed equally.All authors have read and agreed to the published version of the manuscript.To investigate the frequency bands of the targets in the experiment, each TF point was assigned to a target according to its dominant signal.If the DOA estimation of a certain TF point was 2 • away from the nearest cluster center, it was assigned to this target.Otherwise, it is considered that there is no dominant signal in this time-frequency point.Figure 13 shows the assignment of the first 1.4 s.The frequencies of all four targets spread over the whole broadband of 500-8000 Hz randomly, and clearly cannot be divided into four different separated sub-bands.This result indicates that the WDO assumption holds in this experiment.To investigate the frequency bands of the targets in the experiment, each TF point was assigned to a target according to its dominant signal.If the DOA estimation of a certain TF point was 2° away from the nearest cluster center, it was assigned to this target.Otherwise, it is considered that there is no dominant signal in this time-frequency point.Figure 13 shows the assignment of the first 1.4 s.The frequencies of all four targets spread over the whole broadband of 500-8000 Hz randomly, and clearly cannot be divided into four different separated sub-bands.This result indicates that the WDO assumption holds in this experiment.

Conclusions
An underwater source counting method with local-confidence-level-enhanced density clustering is proposed.In this method, the STFT of the AVS output signal is calculated, and the DOAs of all time-frequency points consist of a data set.Then, the DOAs in the data set are classified with density clustering, where the cluster centers and the number of clusters are obtained as the DOA estimations and the target number, respectively.Finally, the local confidence level is applied to further enhance the performance of the density clustering algorithm.A lake trial is conducted to verify the local-confidence-level-enhanced source counting method which achieves a better source counting performance than the multimodal-fusion-based method and the enhanced density-clustering-based method.
Author Contributions: All the authors contributed equally.All authors have read and agreed to the published version of the manuscript.

Conclusions
An underwater source counting method with local-confidence-level-enhanced density clustering is proposed.In this method, the STFT of the AVS output signal is calculated, and the DOAs of all time-frequency points consist of a data set.Then, the DOAs in the data set are classified with density clustering, where the cluster centers and the number of clusters are obtained as the DOA estimations and the target number, respectively.Finally, the local confidence level is applied to further enhance the performance of the density clustering algorithm.A lake trial is conducted to verify the local-confidence-level-enhanced source counting method which achieves a better source counting performance than the multimodal-fusion-based method and the enhanced density-clustering-based method.

Figure 3
Figure3illustrates the temporal course of the cross-spectral DOA histogram computed with Equation (4) at each time bin.Four targets marked as <1>, <2>, <3>, and <4> are presented in the histogram.The red color stands for the peaks of the histogram, which indicate the locations of the targets.Continuous peaks in time compose the time-bearing course of a target, which is illustrated roughly with straight black, red, green, and blue dash-dot lines, respectively, for each target.As shown in Figure3, the azimuths of targets 3 and 4 gradually become closer over time, so the spectral peaks of these two targets become hard to distinguish (red color connects in one piece around time (d)).Since the signal from the target ship is weak when the ship decelerates or stops, the spectral peaks of targets 1 and 2 are unapparent or even submerged by the background during 38-40 s in the DOA histogram, as with the time points 44-50 s of target 3 and 50-53 s of target 4 (red color absent).Thus, the target courses have obvious discontinuity.

Figure 3 .
Figure 3.The temporal course of the cross-spectral DOA histogram of vector hydrophone from lake trial.((a), (b), (c) and (d) denote the time of 6.7 s, 11.6 s, 21.3 s and 36.0 s).

Figure 4
Figure 4 depicts the temporal course of the local-confidence-level-enhanced crossspectral DOA histogram computed with Equation (16) at each time bin.It can be seen that the course of target 3 is significantly enhanced (red color is more continuous in time and peaks are higher than those in Figure3), and the course of other targets is also enhanced relative to the original ones in Figure3.It indicates that by using the local confidence level enhancement, the DOA courses of these targets become apparent, and both the resolution of targets and the continuity of the courses are improved (red color areas are thinner in bearing axis direction and more clearly separated compared to Figure3).

Sensors 2023 , 16 Figure 4
Figure 4 depicts the temporal course of the local-confidence-level-enhanced crossspectral DOA histogram computed with Equation (16) at each time bin.It can be seen that the course of target 3 is significantly enhanced (red color is more continuous in time and peaks are higher than those in Figure3), and the course of other targets is also enhanced relative to the original ones in Figure3.It indicates that by using the local confidence level enhancement, the DOA courses of these targets become apparent, and both the resolution of targets and the continuity of the courses are improved (red color areas are thinner in bearing axis direction and more clearly separated compared to Figure3).

Figure 5
Figure 5 shows the comparative results between the cross-spectral DOA histogram and the enhanced cross-spectral DOA histogram at different time points corresponding to (a), (b), (c), and (d) in Figures3 and 4, respectively.As shown in Figure5, both the original and the enhanced histograms present spectral peaks at the directions of the targets, while the enhanced one possesses glaringly obvious peaks.This reveals that the enhanced histogram is able to estimate the azimuths of targets at a higher resolution.In particular, when the signal from the target ship is weak, such as target 3 in Figure5c, the enhanced cross-spectrum can significantly upgrade its peak, which makes it possible to detect weak targets.Additionally, as depicted in Figure5d, the distinguishability of adjacent targets (targets 3 and 4) is also promoted by the enhanced cross-spectrum.Figure6explains the enhancement performance of the local confidence level by the DOA histogram at time (c).The average local confidences of the directions of the targets are larger than the others.As a result, the peaks of the targets become higher and sharper.

Figure 5
Figure 5 shows the comparative results between the cross-spectral DOA histogram and the enhanced cross-spectral DOA histogram at different time points corresponding to (a), (b), (c), and (d) in Figures3 and 4, respectively.As shown in Figure5, both the original and the enhanced histograms present spectral peaks at the directions of the targets, while the enhanced one possesses glaringly obvious peaks.This reveals that the enhanced histogram is able to estimate the azimuths of targets at a higher resolution.In particular, when the signal from the target ship is weak, such as target 3 in Figure5c, the enhanced cross-spectrum can significantly upgrade its peak, which makes it possible to detect weak targets.Additionally, as depicted in Figure5d, the distinguishability of adjacent targets (targets 3 and 4) is also promoted by the enhanced cross-spectrum.

Figure 6 .
Figure 6.DOA histogram: (a) The cross-spectral DOA histogram and average local confidence level; (b) The enhanced cross-spectral DOA histogram.(The circles a-d and A-D denote the data samples considered as the clusters centers)

Figure 6 Figure 5 .Figure 6 .
Figure 6 explains the enhancement performance of the local confidence level by the DOA histogram at time (c).The average local confidences of the directions of the targets are larger than the others.As a result, the peaks of the targets become higher and sharper.

Figure 6 .
Figure 6.DOA histogram: (a) The cross-spectral DOA histogram and average local confidence level; (b) The enhanced cross-spectral DOA histogram.(The circles a-d and A-D denote the data samples considered as the clusters centers).

Sensors 2023 , 16 Figure 8 Figure 7 .Figure 8 .
Figure8demonstrates the second-order statistic  of the basic and enhanced density clustering algorithms with the data in Figure6.Obviously, the basic density clustering obtains the wrong counting result while the enhanced density clustering obtains the correct one.These results are consistent with the analysis of Figure7.

Figure 9
Figure 9 illustrates the courses of the multi-target DOA estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.Compared with the courses estimated by the basic density clustering, the outliers in the courses estimated by the enhanced density clustering significantly reduce by weighting the density with the local confidence level, and the continuity of the courses of the multi-target detection is remarkably improved, especially in Regions A and B.Moreover, the distribu-

Figure 7 .
Figure 7.The decision graphs of the basic and enhanced density clustering algorithms for the data in Figure 6: (a) Basic density clustering; (b) Enhanced density clustering.(The points a-d and A-D denote the data samples considered as the clusters centers).

Figure 7 .Figure 8 .
Figure 7.The decision graphs of the basic and enhanced density clustering algorithms for the data in Figure 6: (a) Basic density clustering; (b) Enhanced density clustering.(The points a-d and A-D denote the data samples considered as the clusters centers)

Figure 9
Figure 9 illustrates the courses of the multi-target DOA estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.Compared with the courses estimated by the basic density clustering, the outliers in the courses estimated by the enhanced density clustering significantly reduce by weighting the density with the local confidence level, and the continuity of the courses of the multi-target detection is remarkably improved, especially in Regions A and B.Moreover, the distribution of the DOA estimations of targets is more compact for enhanced density clustering.

Figure 8 . 16 Figure 9 .
Figure 8.The second-order statistic S n of the basic and enhanced density clustering algorithms for the data in Figure 6.Sensors 2023, 23, x FOR PEER REVIEW 12 of 16

Figure 9 .
Figure 9. Courses of the multi-target DOA estimations with the basic density clustering, and the local-confidence-level-enhanced density clustering.(Comparing to the basic density clustering, the continuity of the courses of the enhanced density clustering is remarkably improved in Regions A and B).

Sensors 2023 , 16 Figure 9 .
Figure 9. Courses of the multi-target DOA estimations with the basic density clustering, and the local-confidence-level-enhanced density clustering.(Comparing to the basic density clustering , the continuity of the courses of the enhanced density clustering is remarkably improved in Regions A and B ) Figure 10 depicts the results of the source number estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.Obviously, in Regions A, B, and C, the enhanced density clustering achieves the correct estimation results while the basic density clustering does not.In Region D, although both methods obtain incorrect results, the outcomes of the enhanced density clustering are closer to the true value.Targets 3 and 4 stop for a while during the periods of 44-50 s and 50-53 s, respectively.During these periods, there are only three targets in the field.As a result, the source number estimation result in the period before 44 s was used to calculate the source counting accuracy.

Figure 10 .
Figure 10.Courses of the source number estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.(The enhanced density clustering achieves the correct estimation results while the basic density clustering does not in Regions A, B, and C. The outcomes of the enhanced density clustering are closer to the true value in Regions D)

Figure 10 .
Figure 10.Courses of the source number estimations obtained by the basic density clustering and the local-confidence-level-enhanced density clustering.(The enhanced density clustering achieves the correct estimation results while the basic density clustering does not in Regions A, B, and C. The outcomes of the enhanced density clustering are closer to the true value in Regions D).

Figure 11 .
Figure 11.The estimated source number histogram of the basic density-clustering-based method the multimodal-fusion-based method, and the enhanced density-clustering-based method for the experiment.

Figure 11 .
Figure 11.The estimated source number histogram of the basic density-clustering-based method, the multimodal-fusion-based method, and the enhanced density-clustering-based method for the experiment.

Figure 13 .
Figure 13.Frequency distribution of targets in time-frequency domain.

Figure 13 .
Figure 13.Frequency distribution of targets in time-frequency domain.

Figure 13 .
Figure 13.Frequency distribution of targets in time-frequency domain.