Dual-View Entropy-Driven AIS–Sonar Fusion for Surface and Underwater Target Discrimination

Zhang, Xiaoshuang; Che, Jiayi; Xiong, Xiaodan; Zhang, Yucheng; He, Xinbo; Deng, Mengsha; Wang, Dezhi

doi:10.3390/jmse14070675

by Xiaoshuang Zhang, Jiayi Che, Xiaodan Xiong, Yucheng Zhang, Xinbo He, Mengsha Deng and Dezhi Wang *

College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(7), 675; https://doi.org/10.3390/jmse14070675

Submission received: 5 March 2026 / Revised: 30 March 2026 / Accepted: 2 April 2026 / Published: 4 April 2026

(This article belongs to the Special Issue Emerging Computational Methods in Intelligent Marine Vehicles)

Abstract

Distinguishing surface from underwater targets in complex marine environments is challenging when relying solely on physical sonar features. To address the high uncertainty inherent in single-modal features and the conflicts arising from heterogeneous data, we propose a Dual-View Entropy-Driven Negation Dempster–Shafer (DVE-NDS) fusion method that integrates AIS kinematic priors with passive sonar signals. First, a heterogeneous recognition framework is constructed. LOFAR and DEMON features are extracted via convolutional neural networks (CNNs), while a Negation Basic Probability Assignment (Negation BPA) strategy is introduced to transform AIS spatiotemporal mismatches into effective "negation support" for non-cooperative underwater targets. Instead of relying on a single conflict coefficient, the proposed method jointly considers evidence self-information and inter-source consistency. Evidence quality is quantified using improved Deng entropy and negation belief entropy, while mutual trust is evaluated via the Jousselme distance. Heterogeneous evidence is weighted and corrected by the generated coupling weights, which suppresses low-quality evidence and sharpens decision boundaries. Simulation results confirm that DVE-NDS improves macro-F1 over classical fusion, indicating the framework’s potential for handling conflicting evidence, though the current validation remains simulation-based and should be regarded as a methodological proof-of-concept.

Keywords: sonar target detection; AIS information; decision-level fusion; surface and underwater target discrimination

1. Introduction

In underwater target detection, passive sonar systems identify target features by capturing ship-radiated noise [1]. In modern operational scenarios, these acoustic detection methods are increasingly deployed on Autonomous Underwater Vehicles (AUVs), which serve as versatile, reconfigurable carrier platforms equipped with advanced sensor suites for complex marine missions [2,3]. Traditional recognition methods rely primarily on analyzing the physical features of acoustic signals. The Low-Frequency Analysis and Recording (LOFAR) spectrum reflects the time-frequency distribution of target signal energy and line-spectrum features, so it is frequently used for dynamic target detection and feature extraction [4], and it continues to be refined for low signal-to-noise ratio conditions and complex marine environments [5,6,7]. For propeller cavitation noise, Detection of Envelope Modulation on Noise (DEMON) analysis demodulates the envelope from broadband noise to estimate key parameters such as propeller shaft frequency and blade count [8,9]. Recent studies further combine DEMON and LOFAR representations with advanced signal processing and neural network models to improve robustness and recognition performance under complex acoustic conditions [10,11,12].

However, a single acoustic feature rarely provides stable and discriminative representations under low signal-to-noise ratio and complex marine background noise [13,14], which limits the generalization ability of underwater target recognition. To address this issue, researchers have adopted homogeneous feature fusion strategies. Yang et al. [15] proposed an underwater target recognition method based on feature fusion and residual convolutional neural networks. The method fuses four features in parallel (Mel frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), the constant Q transform, and the LOFAR spectrum), reduces their dimensionality using neighborhood component analysis, and feeds the reduced features into a ResNet18-based residual network for classification. Liu et al. [16] extracted dynamic variation information from underwater acoustic signals through a differential method: the Mel spectrogram and its first- and second-order differences were fused into three-dimensional features and input into an RCNN network, and the effectiveness of this feature fusion was verified on the ShipsEar dataset. Additionally, Feng et al. [17] fused the classification results of MFCC, GFCC, and LOFAR features at the decision level using an improved WA-DS evidence theory.

These purely acoustic fusion methods enhance recognition performance, but they share a common bottleneck. The acoustic channel itself may suffer from severe interference, or the acoustic features of surface vessels and underwater targets may be highly similar, as with quiet surface ships and submarines. In these scenarios, information extracted solely within the acoustic domain is often insufficient to resolve classification ambiguities [18,19]. Utilizing heterogeneous non-acoustic information has therefore become a promising direction for improving recognition reliability.

The Automatic Identification System (AIS) is a mandatory cooperative system for surface targets that provides high-precision position, course, and identity information. Currently, the combination of AIS and sonar focuses primarily on data association and target tracking. For example, Zhao et al. [20] proposed a framework for fusing spatiotemporally unaligned AIS and sonar information. The Dynamic Time Warping (DTW) algorithm was first used for time-domain alignment of both data types, and a deep learning model based on a multi-head attention mechanism was then designed to achieve spatial alignment and target matching. Zhang et al. [21] used AIS data as a validation benchmark to evaluate the accuracy of target motion analysis algorithms based on single-vector hydrophone BTR maps. By comparing the algorithm outputs with reference values calculated from AIS and GPS, they verified the effectiveness of the proposed BTR enhancement algorithm and the improved bearing-only target motion analysis algorithm. Regarding classification and recognition, Walker et al. [22] combined passive sonar and AIS information, but their primary focus was regression estimation of the sound speed profile rather than target property discrimination.

Existing AIS and sonar fusion methods focus primarily on target tracking and target matching. They do not exploit the negative evidence value of AIS, and effective semantic alignment mechanisms are lacking because the modal differences of heterogeneous data complicate evidence conflict resolution. In fact, AIS data carries significant negative evidence value in surface and underwater target classification. When passive sonar detects distinct target radiated noise but no corresponding AIS record exists at the same bearing, the likelihood that the target is non-cooperative or underwater increases significantly. A practical challenge in maritime environments is the presence of non-cooperative “dark ships,” i.e., surface vessels with missing or deceptive AIS signals. In such cases, relying only on AIS mismatch may bias the decision toward the underwater hypothesis. To mitigate this risk, the proposed framework reserves uncertainty mass through the focal element Ω, allowing acoustic evidence to counterbalance inconsistent AIS priors. This design does not fully solve the dark ship problem, but it aims to reduce the risk of hard misclassification.

However, this logical inference from heterogeneous sensors is rarely integrated directly into deep learning-based classification decision frameworks, primarily because of the large modality differences between the statistical properties of acoustic features and the spatially discrete properties of AIS.

Mainstream methods for multi-source information fusion include data-level, feature-level, and decision-level fusion [23]. Data-level and feature-level fusion encounter alignment difficulties and the curse of dimensionality because of the heterogeneity between acoustic spectrograms and AIS data. Therefore, this paper models a glider-borne passive sonar scenario and proposes a decision-level fusion method, termed Dual-View Entropy-Driven Negation DS Fusion (DVE-NDS), for this heterogeneous recognition problem. Within this method, deep learning models process the complex non-linear acoustic features separately, AIS information is transformed into mass functions through fuzzy logic, and global inference is finally performed at the decision level.

The main contributions of this paper are summarized as follows:

Construction of a heterogeneous decision-level recognition framework: We construct a heterogeneous recognition framework that maps AIS spatiotemporal association into evidential support and uncertainty for surface/underwater discrimination. Specifically, a fuzzy logic strategy is utilized to transform discrete AIS kinematic information into probabilistic evidence. By applying Negation Basic Probability Assignment (Negation BPA) theory, AIS spatiotemporal mismatches are converted into negation support for non-cooperative underwater targets. This mechanism aligns kinematic priors with acoustic representations (LOFAR and DEMON spectra) at the semantic level.
Proposal of a Dual-View Entropy-Driven Negation D-S fusion algorithm: We propose a DVE-NDS fusion algorithm that jointly considers evidence quality and inter-source consensus to address conflicts between acoustic features and AIS priors. The algorithm establishes a hybrid weighting and dynamic correction mechanism. Modified Deng entropy and negation belief entropy are introduced to quantify the self-information quality of the evidence from both positive and negative perspectives. Simultaneously, the Jousselme distance is utilized to evaluate group consensus, and an adaptive correction strategy is applied to identify and replace abnormal evidence that deviates from group semantics.
Application of an interpretable evaluation paradigm based on Shapley values: We use game-theoretic Shapley-value analysis to quantitatively deconstruct the relative marginal contributions of LOFAR, DEMON, and AIS within the fusion process. The analysis provides a quantitative assessment of the system’s robustness mechanism, demonstrating that AIS information contributes to the upper bound of classification performance (with a contribution rate of 40.2%), while acoustic modalities establish the safety baseline of the system (with a contribution rate of 59.8%). This validates the structural rationale of kinematics-assisted acoustic decision-making.

2. Methods

In this section, CNNs are used to extract acoustic features from LOFAR and DEMON representations, while fuzzy logic is used to transform AIS kinematic information into evidential mass functions for decision-level fusion. The block diagram of the neural network and fusion method is illustrated in Figure 1. The feature extraction methods, neural network models, and fusion algorithm are explained below.

2.1. Feature Extraction

2.1.1. LOFAR

LOFAR retains detailed time-frequency information of targets. The radiated noise of underwater acoustic targets contains rich line-spectrum components. The line spectrum is a key feature of target radiated noise. It is widely used in underwater acoustic target recognition. LOFAR analysis provides the broadband line-spectrum distribution of the signal, as shown in Figure 2. This nonstationary property reflects structural characteristics of underwater acoustic targets. In practical applications, LOFAR analysis can be combined with deep learning methods. For example, the LOFAR spectrum can be used as the input to a CNN. The CNN extracts robust features and improves recognition accuracy [24].

2.1.2. DEMON

LOFAR analysis may be affected by noise in low-frequency bands. Therefore, Detection of Envelope Modulation on Noise analysis was developed to address this limitation. LOFAR and DEMON together characterize the spectral features of the target signal. This paper adopts a square-law demodulation method, wherein the modulated signal is first squared and then passed through a low-pass filter to remove the direct current component and high-order harmonics. The resulting DEMON spectrum contains broadband continuous noise and narrowband line spectra.
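The square-law demodulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the moving-average low-pass filter, its length `smooth_len`, the explicit mean subtraction used to remove the DC term, and the synthetic 15 Hz amplitude-modulated noise are all hypothetical choices made for the example.

```python
import numpy as np

def demon_spectrum(x, fs, smooth_len=5):
    """Square-law DEMON sketch: square, low-pass, remove mean, FFT."""
    envelope = x ** 2                                  # square-law detector
    kernel = np.ones(smooth_len) / smooth_len          # moving-average low-pass (assumed filter)
    envelope = np.convolve(envelope, kernel, mode="same")
    envelope = envelope - envelope.mean()              # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope))           # modulation (DEMON) spectrum
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    return freqs, spectrum

# Synthetic input: white noise amplitude-modulated at 15 Hz (illustrative only).
fs, duration, f_mod = 1000.0, 10.0, 15.0
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(0)
x = (1.0 + 0.5 * np.cos(2 * np.pi * f_mod * t)) * rng.standard_normal(t.size)

freqs, spectrum = demon_spectrum(x, fs)
band = (freqs > 1.0) & (freqs < 100.0)                 # search the low modulation band
peak_freq = freqs[band][np.argmax(spectrum[band])]
print(f"strongest modulation line: {peak_freq:.1f} Hz")
```

In a real system the broadband signal would normally be band-pass filtered before squaring, and the low-pass filter would be designed for the expected shaft and blade rates; both steps are omitted here for brevity.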

The processed DEMON spectrum includes broadband noise and narrowband line spectra at corresponding modulation frequencies. Broadband noise usually has a large dynamic range. Marine environmental noise is irregular and time-varying. It causes continuous interference in the frequency spectrum. This interference affects the detection of weak line spectra. Split-average exclude average background equalization is used to improve line spectrum detection performance [25]. Noise mean estimation and removal are performed at each frequency point of the DEMON spectrum.

The original DEMON spectrum of a simulated underwater target and the spectrum after background equalization are shown in Figure 3. Block interference in the low-frequency region is partially removed after equalization. Background noise is suppressed, and the signal-to-noise ratio is improved. The intensity of line spectra is enhanced to different degrees. The line spectrum near 15 Hz is significantly strengthened. This enhancement increases the prominence of target features.

2.1.3. Spatiotemporal Benchmark Construction and Sequence Feature Extraction of Multi-Source Heterogeneous Data

The primary challenge in underwater acoustic target recognition lies in the physical differences between heterogeneous data sources. The AIS system provides discrete kinematic plots in the Geodetic Coordinate System. The passive sonar system acquires continuous acoustic observations in the array relative coordinate system. To enable effective fusion within a deep learning framework, a unified spatiotemporal benchmark must be established. Heterogeneous data must be reconstructed into high-dimensional spatiotemporal sequence features with physical consistency.

Spatial Domain Unification Based on Geodetic Calculation

Let the geographic coordinates of the passive sonar array center be P_{sonar} = (φ_{s}, λ_{s}, h_{s}), where φ, λ, and h denote latitude, longitude, and depth. At time t, the state vector of the target reported by the AIS system is defined as S_{AIS}(t) = [φ_{t}, λ_{t}, v_{t}, ψ_{t}]^{T}. This vector contains the geodetic position, speed over ground, and course over ground. Direct Euclidean distance calculation introduces significant error due to Earth curvature, so Vincenty’s formulae are adopted [26]. These formulae construct a nonlinear mapping F_{geo} : R^{4} \to R^{4} that transforms geodetic coordinates into the sonar polar coordinate system. The algorithm is based on the World Geodetic System 1984 (WGS 84) ellipsoid model, maintained by the National Geospatial-Intelligence Agency (NGA), Springfield, VA, USA. Distance and azimuth are obtained through iterative computation.

The spatially unified AIS feature vector at time t is defined as

X_{AIS}(t) = F_{geo}(S_{AIS}(t), P_{sonar}) = \begin{bmatrix} θ_{rel}(t) \\ R_{rel}(t) \\ v_{t} \\ Δϕ_{los}(t) \end{bmatrix}

(1)

The relative azimuth θ_{rel}(t) represents the true bearing of the target with respect to true north at the sonar array. The relative distance R_{rel}(t) denotes the geodesic distance from the target to the array center and provides prior information for passive ranging. The speed over ground v_{t} retains the original AIS velocity. The line-of-sight angle deviation Δϕ_{los}(t) is defined as the angle between the target course ψ_{t} and the relative azimuth θ_{rel}(t). This feature indicates whether the target is approaching, receding, or moving laterally.
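The mapping F_{geo} can be illustrated with a short sketch. Note the substitution: instead of Vincenty’s iterative formulae on the WGS 84 ellipsoid, the code below uses the simpler spherical haversine distance and initial-bearing formulas, which are less accurate at long range. The function name `ais_to_sonar_polar`, the mean Earth radius constant, and the wrapping of the line-of-sight deviation to [0°, 180°] are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical assumption, not WGS 84

def ais_to_sonar_polar(lat_s, lon_s, lat_t, lon_t, speed, course_deg):
    """Return [theta_rel (deg), R_rel (m), v_t, delta_phi_los (deg)]."""
    phi_s, phi_t = math.radians(lat_s), math.radians(lat_t)
    d_lambda = math.radians(lon_t - lon_s)

    # Haversine great-circle distance from the array center to the target
    a = (math.sin((phi_t - phi_s) / 2) ** 2
         + math.cos(phi_s) * math.cos(phi_t) * math.sin(d_lambda / 2) ** 2)
    r_rel = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from the array to the target, clockwise from true north
    y = math.sin(d_lambda) * math.cos(phi_t)
    x = (math.cos(phi_s) * math.sin(phi_t)
         - math.sin(phi_s) * math.cos(phi_t) * math.cos(d_lambda))
    theta_rel = math.degrees(math.atan2(y, x)) % 360.0

    # Angle between target course and bearing, wrapped to [0, 180] degrees
    delta = abs(course_deg - theta_rel) % 360.0
    delta_phi_los = min(delta, 360.0 - delta)
    return [theta_rel, r_rel, speed, delta_phi_los]

# Hypothetical target one degree of latitude north of the array, heading south
features = ais_to_sonar_polar(0.0, 0.0, 1.0, 0.0, 5.0, 180.0)
print(features)
```

With the bearing measured from the array to the target, a deviation near 180° corresponds to a target heading toward the array, a deviation near 0° to a receding target, and values near 90° to lateral motion.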

Temporal Alignment and Preprocessing

AIS timestamps are nonuniform. Cubic spline interpolation is therefore applied to resample the discrete AIS plots [27] onto a unified time grid T = \{t_{1}, t_{2}, \dots, t_{N}\} that is aligned with the passive sonar time grid. A time-synchronized kinematic sequence is thus obtained.

Feature dimensions with different scales, such as angle, distance, and speed, affect gradient descent convergence. To eliminate scale differences, z-score standardization is applied to each feature dimension:

\hat{x}^{(i)} = \frac{x^{(i)} - μ_{i}}{σ_{i}}

(2)

Here, μ_{i} and σ_{i} denote the mean and standard deviation of the i-th feature on the training set.

Time Series Tensor Construction with Sliding Window

Target motion exhibits temporal evolution and long-term dependencies. The single frame input mode is therefore not adopted. A sliding window method is used to construct time-series tensors. The window length is set to L time steps. The step size is S.

For the k-th sample, the normalized AIS input tensor X_{AIS}^{(k)} \in R^{L \times D_{ais}} is constructed:

X_{AIS}^{(k)} = [\hat{x}_{AIS}(t - L + 1), \dots, \hat{x}_{AIS}(t)]^{T}

(3)

The passive sonar beamforming output undergoes the same sliding window processing. The broadband azimuth history is converted into the sonar observation tensor X_{sonar}^{(k)} \in R^{L \times D}.

Through spatial transformation and temporal reconstruction, heterogeneous signals are converted into standardized spatiotemporal sequence tensors. This provides a consistent data basis for the deep association model.
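As a rough illustration of Eqs. (2) and (3), the sketch below standardizes an already resampled AIS sequence and cuts it into sliding windows. The random input, the window length L = 4, the step S = 2, and the helper names `zscore_fit` and `sliding_windows` are hypothetical; the cubic spline resampling step is assumed to have been done beforehand.

```python
import numpy as np

def zscore_fit(train):
    """Per-feature mean and standard deviation estimated on the training set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant features
    return mu, sigma

def sliding_windows(seq, L, S):
    """Stack windows of length L taken every S steps: shape (n_windows, L, D)."""
    starts = range(0, seq.shape[0] - L + 1, S)
    return np.stack([seq[s:s + L] for s in starts])

rng = np.random.default_rng(0)
ais_seq = rng.normal(size=(10, 4))             # 10 time steps, D_ais = 4 (placeholder data)
mu, sigma = zscore_fit(ais_seq)
ais_norm = (ais_seq - mu) / sigma              # Eq. (2)
X_ais = sliding_windows(ais_norm, L=4, S=2)    # Eq. (3), one tensor per window
print(X_ais.shape)
```

In practice the statistics would be fitted on the training split only and reused unchanged for validation and test data.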

2.2. Models

2.2.1. Convolutional Neural Network

Ship-radiated noise signals exhibit non-stationary, time-varying characteristics, and subtle features can carry rich information [28]. Traditional signal processing methods have limited capacity to extract and analyze these complex features. Therefore, this study employs a deep convolutional neural network (CNN), a model class designed to process images and similar structured data with strong capabilities in feature abstraction and pattern recognition [29]. The CNNs are implemented with the PyTorch framework (Version 2.3.0; PyTorch Foundation, USA). All numerical computations and visualizations were conducted using Python 3.11 with scientific libraries including NumPy (Version 1.26.4), SciPy (Version 1.14.0), and Matplotlib (Version 3.8.2). By stacking convolutional layers, pooling layers, and fully connected layers, the CNN progressively extracts high-level features from the input data to capture minute variations in the ship-radiated noise signals. Compared to traditional methods, the CNN processes complex time-frequency features and automatically optimizes these representations, which improves recognition accuracy and efficiency. Network parameters such as kernel size, stride, pooling method, and fully connected layer configuration are tuned for classification performance. Finally, the Softmax layer outputs a probability distribution over three nodes, which is used directly as the initial Basic Probability Assignment (BPA) for the three focal elements {S}, {U}, and Ω in the subsequent DVE-NDS fusion.

2.2.2. Fuzzy Association Degree Construction and Correlation Resolution

In practical maritime environments, spatiotemporal reference biases exist between sonar and AIS data, specifically involving sonar array measurement errors, marine environmental noise interference, and asynchronous AIS data. Directly employing threshold-based methods for multi-source target association often results in missed or false matches [30,31]. To address this, the present section utilizes an association degree resolution method based on fuzzy logic [32]. This approach maps the geometric deviations between sonar contacts detected by gliders and AIS prior trajectories into fuzzy memberships to construct an association matrix, thereby dynamically generating high-confidence BPA.

Construction of Fuzzy Association Degree

Gaussian Membership Function for Target Features

Based on the physical imaging mechanisms of passive sonar and AIS, three elements are selected to construct fuzzy associations: bearing deviation, radial distance deviation, and kinematic reliability [33]. Gaussian membership functions are adopted to ensure continuity and smoothness. The membership value reaches 1 when the deviation is zero. It decreases as the deviation increases [34].

The state deviation vector between the i-th AIS trajectory and the j-th sonar contact at time t is defined as

e_{ij}(t) = [Δθ_{ij}(t), Δd_{ij}(t), v_{i}(t)]^{T}

(4)

where Δθ denotes the bearing deviation, Δd denotes the radial distance deviation, and v denotes the AIS trajectory velocity.

Bearing Affinity Membership

The bearing deviation is defined as

Δθ_{ij} = |θ_{sonar} - θ_{AIS}|

(5)

where θ_sonar is the measured sonar bearing and θ_AIS is the theoretical bearing derived from AIS.

The Gaussian bearing membership is defined as

μ_{θ}(i, j) = \exp\left(- \frac{(Δθ_{ij})^{2}}{2 σ_{θ}^{2}}\right)

(6)

where σ_θ controls the sensitivity to bearing deviation.

Distance Affinity Membership

The distance deviation is defined as

Δd_{ij} = |d_{sonar} - d_{AIS}|

(7)

where d_sonar and d_AIS denote the sonar-estimated distance and the AIS-reported distance.

The Gaussian distance membership is defined as

μ_{d}(i, j) = \exp\left(- \frac{(Δd_{ij})^{2}}{2 σ_{d}^{2}}\right)

(8)

where σ_d is the standard deviation parameter for distance deviation. This term reflects spatial consistency between sonar and AIS.

Kinematic Reliability Membership

Static objects such as floating debris may generate false alarms. High-speed moving targets are more likely to correspond to valid vessels. A velocity-based reliability index is therefore introduced.

Let the reference velocity be v_ref. The velocity membership is defined as

μ_{v}(i) = \exp\left(- \frac{(v_{ref} - v_{i})^{2}}{2 σ_{v}^{2}}\right)

(9)

where σ_v controls the sensitivity to velocity magnitude.

To evaluate the association between AIS trajectories and sonar targets, the overall association degree is computed using a weighted average. Let the weight vector be W = [ω_{θ}, ω_{d}, ω_{v}], with ω_{θ} + ω_{d} + ω_{v} = 1. Following the empirical analysis in [33], the weights are set as ω_{θ} = 0.6, ω_{d} = 0.25, ω_{v} = 0.15, ensuring that bearing consistency retains the dominant influence.

The overall association degree between the i-th AIS target and the j-th sonar target is defined as

R_{ij} = ω_{θ} μ_{θ}(i, j) + ω_{d} μ_{d}(i, j) + ω_{v} μ_{v}(i)

(10)

The value R_{ij} quantifies the likelihood that sonar target j corresponds to AIS target i.
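Equations (5) to (10) can be combined into a single function, sketched below. The weights are taken from the text; the reference velocity `v_ref` and the spread parameters `sigma_theta`, `sigma_d`, and `sigma_v` are not given in this section, so the defaults are hypothetical placeholders. Wrapping the bearing deviation to [0°, 180°] is an added safeguard for bearings on either side of north and is not stated in Eq. (5).

```python
import math

WEIGHTS = {"theta": 0.6, "d": 0.25, "v": 0.15}   # weights from the text

def gaussian_membership(deviation, sigma):
    """Gaussian membership: 1 at zero deviation, decaying with |deviation|."""
    return math.exp(-(deviation ** 2) / (2.0 * sigma ** 2))

def association_degree(theta_sonar, theta_ais, d_sonar, d_ais, v_ais,
                       v_ref=10.0, sigma_theta=5.0, sigma_d=500.0, sigma_v=5.0):
    """Overall association degree R_ij (all default parameters are assumed values)."""
    # Bearing deviation in degrees, wrapped so that 359 vs 1 counts as 2
    d_theta = abs(theta_sonar - theta_ais) % 360.0
    d_theta = min(d_theta, 360.0 - d_theta)
    mu_theta = gaussian_membership(d_theta, sigma_theta)               # Eq. (6)
    mu_d = gaussian_membership(abs(d_sonar - d_ais), sigma_d)          # Eq. (8)
    mu_v = gaussian_membership(v_ref - v_ais, sigma_v)                 # Eq. (9)
    return (WEIGHTS["theta"] * mu_theta
            + WEIGHTS["d"] * mu_d
            + WEIGHTS["v"] * mu_v)                                     # Eq. (10)

r_match = association_degree(45.0, 45.0, 2000.0, 2000.0, 10.0)
r_mismatch = association_degree(45.0, 90.0, 2000.0, 6000.0, 10.0)
print(r_match, r_mismatch)
```

Because the velocity term does not depend on the sonar contact, a large bearing and range mismatch still leaves a small residual association contributed by ω_{v} μ_{v}(i).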

Heterogeneous Evidence Reconstruction Strategy Based on Negation Basic Probability Assignment

In multi-source heterogeneous information fusion, the association degree between AIS data and passive sonar data provides asymmetric evidence regarding target identity. A high association degree supports surface targets. A low association degree supports underwater targets. To express this bidirectional logic within a unified mathematical framework, the Negation Basic Probability Assignment theory is introduced [35]. An association-driven dynamic evidence reconstruction mechanism is then established.

Define the recognition framework Θ = {S, U}. Under this framework, we consider three focal elements: the surface target proposition {S}, the underwater target proposition {U}, and the universal uncertainty proposition Ω = {S, U}, which represents the ‘uncertain’ state regarding the target attribute (S or U). For any basic evidence m(A), with n denoting the number of focal elements, its negation BPA \overline{m}(A) is defined as

\overline{m} (A) = \frac{1 - m (A)}{n - 1}

(11)

This operator measures the negative belief toward proposition A. In this study, it is used to convert AIS mismatch information into support for underwater targets.

Let R_{ij} denote the fuzzy association degree between an AIS track and a sonar contact, and let α denote the sensor reliability factor. A piecewise BPA generation model is constructed.

(1): Forward Confirmation Mode

When R_{ij} ≥ ϵ, the association degree reaches the threshold ϵ, and AIS provides positive evidence for surface targets. The association degree is mapped directly to proposition S, and the remaining belief is distributed equally between U and Ω. The forward evidence vector m_{pos} is defined as

\begin{cases} m_{pos}(S) = α R_{ij} \\ m_{pos}(U) = \frac{1 - α R_{ij}}{2} \\ m_{pos}(Ω) = \frac{1 - α R_{ij}}{2} \end{cases}

(12)

(2): Reverse Inference Mode

When R_{ij} < ϵ, the AIS mismatch implies a negation of proposition S. The negation BPA operator is applied to reconstruct the evidence.

The original AIS observation state is defined as m_{org}(S) = α R_{ij} and m_{org}(U) = 0. Because AIS has no physical capability to detect underwater targets, its direct support for U is zero.

The negation BPA of each proposition is then calculated. Using Formula (11) with n = 3, we obtain \overline{m}(S) = \frac{1 - α R_{ij}}{2}, which represents the disbelief in surface targets. Since the sonar has detected a target, this belief is assigned to U.

For proposition U, the original support is zero, so its negation reaches \overline{m}(U) = \frac{1 - 0}{2} = 0.5. However, AIS does not observe underwater targets directly, so it is not possible to determine whether this disbelief corresponds fully to S or to uncertainty. According to the maximum entropy negation principle proposed by Yager [36], this belief is distributed equally between S and Ω (0.25 each), which avoids biased allocation.

An intermediate unnormalized evidence vector \tilde{m}_{neg} is obtained:

\begin{cases} \tilde{m}_{neg}(U) = \frac{1 - α R_{ij}}{2} \\ \tilde{m}_{neg}(S) = 0.25 \\ \tilde{m}_{neg}(Ω) = 0.25 \end{cases}

(13)

To satisfy the normalization condition, the final reverse evidence vector m_{neg} is obtained by dividing each term by the sum \frac{2 - α R_{ij}}{2}:

\begin{cases} m_{neg}(U) = \frac{1 - α R_{ij}}{2 - α R_{ij}} \\ m_{neg}(S) = \frac{1}{4 - 2 α R_{ij}} \\ m_{neg}(Ω) = \frac{1}{4 - 2 α R_{ij}} \end{cases}

(14)

Under this mechanism, as the association degree approaches zero, the beliefs converge to m_{neg}(U) ≈ 0.5 and m_{neg}(S) ≈ 0.25. Mathematically, this preserves the core conclusion that a low association indicates a higher probability of an underwater target. The residual allocation to Ω represents uncertainty under AIS mismatch and reduces the risk of overconfident assignment to the underwater hypothesis when non-cooperative surface vessels may be present.
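The piecewise model of Eqs. (11) to (14) is compact enough to sketch directly. The function names and the default values α = 1.0 and ϵ = 0.5 are hypothetical, since this section does not fix them; the focal elements are keyed as "S", "U", and "Omega".

```python
N_FOCAL = 3  # focal elements: {S}, {U}, Omega

def negate(mass, n=N_FOCAL):
    """Negation BPA of a single focal-element mass, Eq. (11)."""
    return (1.0 - mass) / (n - 1)

def ais_bpa(R, alpha=1.0, eps=0.5):
    """Map an association degree R to AIS evidence (alpha and eps are assumed values)."""
    support = alpha * R
    if R >= eps:
        # Forward confirmation, Eq. (12): the remainder is split between U and Omega
        rest = (1.0 - support) / 2.0
        return {"S": support, "U": rest, "Omega": rest}
    # Reverse inference, Eqs. (13)-(14)
    neg_s = negate(support)   # disbelief in S, reassigned to U
    neg_u = negate(0.0)       # AIS gives no direct support to U, so this equals 0.5
    raw = {"U": neg_s, "S": neg_u / 2.0, "Omega": neg_u / 2.0}
    total = sum(raw.values())
    return {A: v / total for A, v in raw.items()}

print(ais_bpa(0.9))   # strong AIS match
print(ais_bpa(0.0))   # no AIS match at the sonar bearing
```

Note that the two branches are discontinuous at R = ϵ: just above the threshold most mass sits on S, just below it the largest share moves to U, so the choice of ϵ directly controls how aggressively mismatches are read as underwater evidence.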

DVE-NDS: A Dual-Perspective Entropy-Driven Negation D-S Fusion Algorithm

To address the issues of high conflict and asymmetric uncertainty in multi-source heterogeneous underwater acoustic data, such as the strong negative information provided by AIS mismatches, traditional Dempster–Shafer (D-S) fusion methods often yield counter-intuitive results. This is due to the lack of joint modeling for evidence quality and collective consensus. To resolve this, we propose a hybrid weighted fusion algorithm based on dual-perspective entropy and geometric distance, termed DVE-NDS. The algorithm’s core logic consists of three levels: 1. Dual-perspective entropy explicitly models the evidential value of sources, such as AIS, in the negation domain. 2. A collective consensus evaluation mechanism based on Jousselme distance measures semantic conflicts between evidence in geometric space. 3. A parameter-free adaptive correction strategy performs soft replacement on evidence significantly deviating from collective consensus, suppressing anomalies while preserving the original focusing characteristics of high-quality evidence.

Fusion Problem Definition

For the underwater acoustic target recognition task, the frame of discernment is defined as Θ = {S,U}, representing surface and underwater targets, respectively, with a cardinality of N = |Θ| = 2. The system input consists of three independent evidence sources: the CNN classification evidence based on LOFAR spectra m_lofar, the CNN classification evidence based on DEMON spectra m_demon, and the AIS evidence reconstructed via fuzzy association m_ais. The objective of fusion is to synthesize these three pieces of evidence to calculate the global BPA and render a final decision.

Evidence Self-Information Quality Assessment

To precisely quantify the reliability of each evidence source, particularly capturing the negation value of AIS evidence in a mismatch state, this study introduces a quality assessment mechanism based on dual-perspective entropy.

(1): Original Perspective

The uncertainty of evidence in the original observation space is quantified using Deng entropy. For evidence m_i, the Deng entropy E_d(m_i) is defined as [37]

E_{d}(m_{i}) = - \sum_{A \subseteq Θ} m_{i}(A) \log_{2} \frac{m_{i}(A)}{2^{|A|} - 1}

(15)

Here, |A| denotes the cardinality of the focal element A. A higher E_d value indicates a more dispersed belief, suggesting that the evidence is largely uncertain.

(2): Negation Perspective

The negation belief entropy quantifies the information content in negative hypotheses. The negation BPA \bar{m}_{i} is introduced, and the negation belief entropy E_{n}(m_{i}) is calculated as [38]

E_{n}(m_{i}) = - \sum_{A \subseteq Θ} \bar{m}_{i}(A) \log_{2} \frac{\bar{m}_{i}(A)}{2^{|A|} - 1}

(16)

This represents the negative belief in the original evidence. For AIS, even if the original support for S is low during a mismatch, the negation belief toward ¬S clearly indicates the belief distribution, resulting in lower negation entropy.

(3): Comprehensive Uncertainty Measure

A comprehensive uncertainty measure U_i is constructed to evaluate the credibility of the evidence body:

U_{i} = γ \cdot e^{E_{d} (m_{i})} + (1 - γ) \cdot e^{E_{n} (m_{i})}

(17)

Here, γ ∈ [0,1] is an adjustment factor balancing the original and negation perspectives. The exponential form penalizes high-entropy evidence, amplifying the distinction between high- and low-quality evidence.

The self-information quality weight w_{i}^{q} for the i-th evidence source is defined as

w_{i}^{q} = \frac{U_{i}^{-1}}{\sum_{j = 1}^{K} U_{j}^{-1}}

(18)

This ensures that high-quality evidence with low uncertainty in both perspectives receives a higher initial weight.
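A minimal sketch of Eqs. (15) to (18) follows. It assumes that the negation BPA is computed with Eq. (11) over the three focal elements and that zero-mass terms contribute nothing to the entropy sums; γ = 0.5 and the two example evidence bodies are hypothetical.

```python
import math

CARD = {"S": 1, "U": 1, "Omega": 2}   # cardinality |A| of each focal element

def belief_entropy(m):
    """Deng-entropy form used in Eqs. (15) and (16); zero masses are skipped."""
    return -sum(v * math.log2(v / (2 ** CARD[A] - 1)) for A, v in m.items() if v > 0)

def negation(m):
    """Negation BPA over all focal elements, Eq. (11)."""
    n = len(m)
    return {A: (1.0 - v) / (n - 1) for A, v in m.items()}

def quality_weights(evidence, gamma=0.5):
    """Self-information quality weights, Eqs. (17)-(18); gamma is an assumed value."""
    uncertainty = [gamma * math.exp(belief_entropy(m))
                   + (1.0 - gamma) * math.exp(belief_entropy(negation(m)))
                   for m in evidence]
    inverse = [1.0 / u for u in uncertainty]
    total = sum(inverse)
    return [x / total for x in inverse]

sharp = {"S": 0.9, "U": 0.05, "Omega": 0.05}   # concentrated evidence
vague = {"S": 0.4, "U": 0.3, "Omega": 0.3}     # dispersed evidence
w_q = quality_weights([sharp, vague])
print(w_q)
```

Under these assumptions the concentrated evidence body receives the larger weight, which matches the stated intent that low-uncertainty evidence should dominate the initial weighting.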

Group Consensus Evaluation Based on Jousselme Distance

To prevent highly confident but false evidence from influencing the consensus baseline, Jousselme evidence distance is employed to construct consistency weights. This metric is widely used in D-S theory to measure geometric differences between basic probability assignments. It enables the identification of evidence that deviates from group consensus.

The Jousselme distance between two pieces of evidence is defined as

d_{i j} = \sqrt{\frac{1}{2} {(m_{i} - m_{j})}^{⊤} D (m_{i} - m_{j})}

(19)

where D denotes the Jaccard similarity matrix, and its elements are defined as

D (A, B) = \frac{| A \cap B |}{| A \cup B |}

(20)

The consistency score between two pieces of evidence is defined as

s_{i j} = 1 - d_{i j}

(21)

The consensus weight of the i-th evidence source is defined as

w_{i}^{c} = \frac{\sum_{j = 1, j \neq i}^{K} s_{i j}}{\sum_{l = 1}^{K} \sum_{j = 1, j \neq l}^{K} s_{l j}}

(22)

This weight reflects the degree of agreement between the i-th evidence and the remaining sources.
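
Equations (19) to (22) can be sketched as follows for the frame {S, U}. Only the AIS mass on U (0.48) echoes a value quoted later in the case study; the remaining masses and all function names are assumptions made for illustration.

```python
import math

S, U = frozenset({"S"}), frozenset({"U"})
FOCAL = [S, U, S | U]  # focal elements of the frame {S, U}

def jaccard(a, b):
    """Eq. (20): |A ∩ B| / |A ∪ B| for non-empty focal elements."""
    return len(a & b) / len(a | b)

def jousselme(m1, m2):
    """Eq. (19): Jousselme distance between two BPAs ({frozenset: mass})."""
    diff = [m1.get(f, 0.0) - m2.get(f, 0.0) for f in FOCAL]
    quad = sum(diff[r] * jaccard(FOCAL[r], FOCAL[c]) * diff[c]
               for r in range(len(FOCAL)) for c in range(len(FOCAL)))
    return math.sqrt(max(0.5 * quad, 0.0))  # clamp tiny negative rounding errors

def consensus_weights(bpas):
    """Eqs. (21)-(22): normalised sum of similarities to the other sources."""
    k = len(bpas)
    support = [sum(1.0 - jousselme(bpas[i], bpas[j]) for j in range(k) if j != i)
               for i in range(k)]
    total = sum(support)
    return [s / total for s in support]

# Hypothetical evidence: two agreeing acoustic sources and a mismatched AIS source.
lofar = {S: 0.70, U: 0.10, S | U: 0.20}
demon = {S: 0.60, U: 0.20, S | U: 0.20}
ais = {S: 0.20, U: 0.48, S | U: 0.32}

w_c = consensus_weights([lofar, demon, ais])
print([round(w, 3) for w in w_c])
```

In this toy configuration the AIS source lies farthest from the other two and therefore receives the smallest consensus weight.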

Coupled Weight Generation and Dynamic Evidence Correction

To balance individual quality and group consensus, the final coupled weight is defined as

{\hat{w}}_{i} = \frac{w_{i}^{q} \cdot w_{i}^{c}}{\sum_{j = 1}^{K} w_{j}^{q} \cdot w_{j}^{c}}

(23)

The baseline evidence is computed as

m_{avg}(A) = \sum_{i = 1}^{K} \hat{w}_{i} m_{i}(A)

(24)

After generating the baseline through weighted averaging, the Jousselme distance is further applied to measure the residual conflict between each evidence source and the baseline. Writing the baseline of Equation (24) as \overline{m} = m_{avg}, the distance between m_i and \overline{m} is calculated as d_J(m_i, \overline{m}). The corresponding similarity measure is defined as

\mathrm{Sim}(m_{i}, \overline{m}) = 1 - d_{J}(m_{i}, \overline{m})

(25)

To achieve adaptive anomaly control, the mean group distance is defined as a dynamic threshold:

\overline{d} = \frac{1}{K} \sum_{i = 1}^{K} d_{J} (m_{i}, \overline{m})

(26)

The dynamic correction rule is defined as

m_{i}^{*} = \begin{cases} \overline{m}, & d_{J}(m_{i}, \overline{m}) > \overline{d} \\ m_{i}, & d_{J}(m_{i}, \overline{m}) \leq \overline{d} \end{cases}

(27)

If the distance between the evidence and the baseline exceeds the average level, the evidence is regarded as inconsistent and replaced by the baseline. Otherwise, the original distribution is retained. The corrected evidence set m^∗ is treated as a new group of independent sources. These sources are fused using the classical Dempster combination rule to obtain the final decision evidence:
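
The coupling, averaging, and correction steps of Equations (23), (24), (26), and (27) can be sketched as below. BPAs are stored as lists ordered [m(S), m(U), m(Ω)], and the input masses and weights are hypothetical; this is an illustration of the rule under those assumptions, not the authors' implementation.

```python
import math

# Jaccard matrix for focal elements ordered [S, U, Omega] on the frame {S, U}.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5],
     [0.5, 0.5, 1.0]]

def jousselme(m1, m2):
    diff = [a - b for a, b in zip(m1, m2)]
    quad = sum(diff[r] * D[r][c] * diff[c] for r in range(3) for c in range(3))
    return math.sqrt(max(0.5 * quad, 0.0))

def correct_evidence(bpas, w_q, w_c):
    # Eq. (23): coupled weights from quality and consensus weights.
    raw = [q * c for q, c in zip(w_q, w_c)]
    w_hat = [r / sum(raw) for r in raw]
    # Eq. (24): weighted-average baseline evidence.
    baseline = [sum(w * m[a] for w, m in zip(w_hat, bpas)) for a in range(3)]
    # Eqs. (26)-(27): replace sources farther from the baseline than the mean distance.
    dists = [jousselme(m, baseline) for m in bpas]
    threshold = sum(dists) / len(dists)
    corrected = [baseline if d > threshold else m for m, d in zip(bpas, dists)]
    return w_hat, baseline, corrected

# Hypothetical inputs: LOFAR, DEMON, and a mismatched AIS source.
bpas = [[0.70, 0.10, 0.20], [0.60, 0.20, 0.20], [0.20, 0.48, 0.32]]
w_q = [0.40, 0.35, 0.25]
w_c = [0.345, 0.368, 0.287]

w_hat, baseline, corrected = correct_evidence(bpas, w_q, w_c)
print([round(x, 3) for x in baseline])
```

With these inputs only the third source exceeds the mean distance, so it alone is replaced by the baseline while the two acoustic BPAs pass through unchanged.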

m_{final}(A) = \frac{1}{1 - K} \sum_{A_{1} \cap A_{2} \cap A_{3} = A} m_{1}^{*}(A_{1}) m_{2}^{*}(A_{2}) m_{3}^{*}(A_{3})

(28)

K = \sum_{A_{1} \cap A_{2} \cap A_{3} = \emptyset} \prod_{i = 1}^{3} m_{i}^{*}(A_{i})

(29)

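where K here denotes the conflict coefficient, i.e., the total mass assigned to empty intersections, which is used for normalization. It is distinct from the number of evidence sources, also written K, in Equations (18)–(26).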
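
A compact sketch of Dempster's rule for the frame {S, U} is given below. It combines the sources pairwise, which gives the same result as the three-way form of Equations (28) and (29) because Dempster's rule is associative. The example BPAs and the helper names are assumptions for illustration.

```python
from functools import reduce

S, U = frozenset({"S"}), frozenset({"U"})
OMEGA = S | U

def dempster(m1, m2):
    """Combine two BPAs ({frozenset: mass}) with Dempster's rule."""
    joint, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                joint[inter] = joint.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {f: v / (1.0 - conflict) for f, v in joint.items()}

def fuse(bpas):
    """Fold Dempster's rule over a list of BPAs."""
    return reduce(dempster, bpas)

# Hypothetical corrected evidence for three sources.
m_final = fuse([
    {S: 0.70, U: 0.10, OMEGA: 0.20},
    {S: 0.60, U: 0.20, OMEGA: 0.20},
    {S: 0.50, U: 0.20, OMEGA: 0.30},
])
print({"".join(sorted(f)): round(v, 3) for f, v in m_final.items()})
```

Each combination step shrinks the mass left on Ω, which is why the fused result is sharper than any single input.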

2.2.3. Interpretability Assessment Based on Game Theory Shapley Values

To quantify the marginal contributions of LOFAR, DEMON, and AIS within the DVE-NDS decision process, an interpretability analysis based on the Shapley value is introduced. The Shapley value from cooperative game theory allocates contribution by computing the average marginal gain of a source over all possible coalitions. It reduces allocation bias caused by correlation or redundancy among multi-source data.

Let N denote the set of sources, where |N| = 3 corresponds to LOFAR, DEMON, and AIS. The Shapley value ϕ_i of the i-th source is defined as

ϕ_{i} = \frac{1}{|N|!} \sum_{S \subseteq N \setminus \{i\}} |S|! (|N| - |S| - 1)! [v(S \cup \{i\}) - v(S)]

(30)

where S is a subset of sources excluding i, and |S| denotes the number of elements in S. The utility function v(S) is defined as the classification accuracy obtained when only the source subset S is used for fusion. The term v(S∪{i}) − v(S) represents the marginal accuracy gain introduced by source i.

To eliminate the influence of source ordering, the expected marginal contribution is computed over all possible coalitions. The resulting Shapley value is taken as the overall importance of the source. This method provides a quantitative basis for analyzing the collaborative mechanism of heterogeneous sources within the fusion framework.
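
For three sources the Shapley value can be computed exactly by enumerating all coalitions, as in the sketch below. The coalition accuracies in `v`, including the value assumed for the empty coalition, are hypothetical placeholders rather than results from this study.

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley values (Eq. 30); v maps frozenset coalitions to utility."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v[s | {i}] - v[s])  # marginal gain of i
        phi[i] = total
    return phi

# Hypothetical coalition accuracies (illustrative only).
v = {
    frozenset(): 0.50,
    frozenset({"LOFAR"}): 0.70,
    frozenset({"DEMON"}): 0.72,
    frozenset({"AIS"}): 0.78,
    frozenset({"LOFAR", "DEMON"}): 0.80,
    frozenset({"LOFAR", "AIS"}): 0.86,
    frozenset({"DEMON", "AIS"}): 0.88,
    frozenset({"LOFAR", "DEMON", "AIS"}): 0.92,
}

phi = shapley(["LOFAR", "DEMON", "AIS"], v)
print({k: round(x, 4) for k, x in phi.items()})
```

The values sum to the gain of the full coalition over the empty one (the efficiency property), which is what allows them to be normalised into percentage contributions.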

3. Experimental Setup

3.1. Original AIS Dataset

The trajectory plots in Figure 4 illustrate the movement paths of different vessel types. The AIS trajectories used in this study were derived from simulated trajectories based on recorded vessel motion patterns, and were synchronized with the acoustic simulation scenario, as described below. Multiple vessels and gliders are included to increase data diversity. In Figure 4, the horizontal axis represents longitude in degrees east, and the vertical axis represents latitude in degrees north. Markers with different colors and shapes denote three underwater gliders, one submarine, five ships, and one underwater target. This study uses AIS-derived variables related to vessel motion and radiated noise characteristics. These variables are summarized in Table 1.

3.2. Vessel Noise Dataset

This study employs self-simulated data. Vessel-radiated noise is simulated using the BELLHOP underwater acoustic toolbox (HLS Research, West Kingston, RI, USA). Acoustic propagation calculations are performed to model vessel-generated noise in the marine environment. The sound speed profile of the study area is obtained from the World Ocean Atlas 2023 (WOA2023), provided by the National Oceanic and Atmospheric Administration (NOAA), Silver Spring, MD, USA. Bathymetry and sea surface height data are obtained from the Digital Bathymetric Data Base 2-min resolution (NRL DBDB2), developed by the Naval Research Laboratory (NRL), USA. Basic vessel parameters are extracted from the generated AIS data. These parameters include coordinates, velocity, and heading. They determine vessel motion states and radiated noise characteristics.

To simulate a moving array, the motion of an underwater glider is modeled. The hydrophone array consists of four elements representing four directions. The array moves along a vertical profile that approximates sinusoidal motion. The velocity is set to 0.8 knots, and the cutoff frequency is 2000 Hz. Underwater and surface targets differ in tonnage, blade number, radiated center frequency, and depth. To reduce the risk of overfitting to idealized acoustic conditions, additive white Gaussian noise (AWGN) was injected into the simulated acoustic waveforms. The primary experiments were conducted at a baseline SNR of 15 dB. The final simulated signals are stored in WAV format for subsequent processing.

3.3. Training Set

During model training, input images of size 875 × 656 pixels are used to preserve sufficient spatial information. The CNN model, whose detailed architecture is provided in Table 2, employs convolutional layers (denoted as e.g., "Conv1 (3 × 3, 16)" for a 3 × 3 kernel with 16 output channels) and pooling layers. After training, the network outputs a Softmax probability distribution, which is subsequently converted into Basic Probability Assignment values for each category.

The dataset encompasses two physical target classes: surface vessels and underwater targets. To enhance the model’s robustness in discriminating ambiguous acoustic features, an additional set of ‘hybrid’ samples is incorporated only in the training and validation phases. The test set consists solely of 110 surface and 110 underwater target samples to evaluate the core classification performance. The detailed composition of the dataset is summarized in Table 3. Although the ground-truth test set does not contain any genuine hybrid targets, the model may still erroneously predict this category when processing highly ambiguous samples. Consequently, it remains a possible prediction state and is thus included in the final evaluation matrices. The test set was held out as an independent final evaluation set and was not used during hyperparameter tuning.

All neural networks are optimized using the Adam optimizer. The initial learning rate is set to 0.01 and decays by a factor of 0.75 every four epochs. Cross-entropy loss is used as the training objective. Unless otherwise specified, the batch size is 35. Model performance is evaluated in two stages. First, the three independent models are compared. Second, models under decision-level fusion are evaluated. Performance is measured using classification metric tables and confusion matrices on the test set.
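
The step-decay schedule described above can be written as a one-line function. The sketch assumes that epochs are counted from zero and that the decay is applied at epochs 4, 8, 12, and so on; the text does not state the exact indexing convention.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.75, step=4):
    """Step decay: multiply the rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)

for epoch in (0, 3, 4, 8, 12):
    print(epoch, learning_rate(epoch))
```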

3.4. Cross-Validation and Hyperparameter Optimization

Five-fold cross-validation was conducted on the training/validation portion for hyperparameter tuning and model selection. Given the relatively limited scale of the simulated dataset, this scheme was used to assess the model’s generalization capability and to reduce the risk of overfitting. The training/validation data were randomly partitioned into five equal subsets. In each iteration, four folds were used for training and the remaining one for validation. The system parameters were divided into physically constrained variables and data-driven hyperparameters. Specifically, the temporal windowing size (T) and fuzzy association weights were derived analytically from physical acoustic integration times and empirical hardware error margins. Conversely, the evidence switching threshold (ϵ) and the entropy balance coefficient (γ) were optimized via a grid search on the validation folds to balance false alarms against true positive rates. Furthermore, sensitivity analysis showed that the fusion framework remains stable across a broad parameter range (γ in [0.4, 0.6]), suggesting that the conflict-resolution capability stems from the DVE-NDS logic itself rather than from sensitivity to specific scalar values.
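
The fold construction can be sketched with the standard library alone. The sample count of 100, the seed, and the function name `five_fold_indices` are assumptions for illustration; the actual training/validation set size is the one reported in Table 3.

```python
import random

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train, validation) index lists for a shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # near-equal subsets
    for f in range(n_folds):
        validation = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, validation

for fold, (train, validation) in enumerate(five_fold_indices(100)):
    print(fold, len(train), len(validation))
```

A grid search over γ and ϵ would then evaluate each candidate pair on the five validation folds and keep the pair with the best average score.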

4. Results and Discussion

4.1. Performance Benchmarking and Error Correction Traceability

Table 4 presents the quantitative classification performance, evaluated using a rigorous 5-fold cross-validation protocol. The results compare the CNN-based LOFAR and DEMON classifiers, the AIS and sonar matching method, the baseline D-S fusion model, and the proposed DVE-NDS fusion method. The corresponding confusion matrices derived from the test set are shown in Figure 5.

Compared with the baseline model, the DVE-NDS method exhibits stronger diagonal dominance in the confusion matrix. This indicates improved discrimination between surface and underwater targets.

To further analyze feature representation and error correction capability, features from the final fully connected layer are extracted. These features are reduced using t-SNE for visualization. The embedding results are shown in Figure 6. The left panel corresponds to the baseline model, and the right panel corresponds to the DVE-NDS method. In Figure 6, Class 0 denotes surface targets, Class 1 denotes underwater targets, and Class 2 denotes hybrid targets. Colors indicate predicted categories. Shapes indicate the prediction status. Circles represent correct classifications, and crosses represent misclassifications.

In the baseline panel, many cross markers are distributed across clusters. This reflects limited separability between categories. In contrast, the DVE-NDS panel shows a clear reduction in misclassified samples. Most samples are correctly classified and form more compact and better separated clusters. This visualization is consistent with the quantitative results and indicates improved decision boundary separability.

To trace the error correction process, a multi-stage Sankey diagram is constructed, as shown in Figure 7. The diagram illustrates the prediction flow from the baseline model to the proposed DVE-NDS method, and ultimately to the ground truth. Green flows represent samples misclassified or ambiguously classified by the baseline model that are successfully corrected by DVE-NDS. The diagram explicitly shows that five samples initially mispredicted as Class 1 and eight samples mispredicted as Class 0 by the baseline are rectified. More notably, 11 samples that the baseline model failed to decisively identify—assigning them instead to the ambiguous Mixed category (Base: 2)—are effectively resolved by our method, with 6 corrected to Class 0 and 5 corrected to Class 1.

In addition, strong structural alignment between the DVE-NDS predictions and the ground truth is observed. Specifically, 88 Class 0 samples and 91 Class 1 samples maintain stable, correct predictions from the baseline through to the final output (represented by the thick blue flows). Ultimately, the DVE-NDS method achieves highly accurate final results, successfully outputting 99 accurate predictions for Class 0 and 104 for Class 1. This flow analysis confirms that the DVE-NDS fusion strategy significantly improves prediction consistency. It effectively resolves misclassifications in ambiguous cases while securely maintaining the stability of already-correct predictions.

4.2. Analysis of Prediction Uncertainty and Decision Mechanism

Although the previous analysis demonstrates the error correction capability of the DVE-NDS strategy, accuracy alone does not fully describe the internal certainty of the decision process. To evaluate whether the proposed fusion algorithm reduces prediction ambiguity, probability distributions of the test samples are analyzed using statistical and topological methods.

The Shannon entropy of the predicted probability vectors is computed for all test samples to quantitatively measure informational uncertainty. Figure 8 presents the entropy distributions of the baseline D-S method and the proposed DVE-NDS method. The baseline distribution, shown by the gray violin plot, exhibits a highly dispersed and vertically extended shape. Notably, its slightly wider base indicates that the baseline can still confidently classify a small subset of clear-cut samples. However, the vast majority of its probability density concentrates in the upper region, yielding a high mean entropy of 0.751. This structurally demonstrates that traditional D-S fusion frequently falls into severe ambiguity when confronting heterogeneous conflicts under noise-degraded conditions.

In contrast, the DVE-NDS distribution, shown in green, is more compact and concentrated near the lower entropy values. The mean entropy decreases from 0.751 for the baseline method to 0.675 for DVE-NDS. A two-sided independent-sample t-test confirmed that this entropy reduction is statistically significant (p < 0.001). This result indicates that the DVE-NDS fusion strategy improves decision certainty. Rather than being overwhelmed by acoustic noise, it penalizes high-entropy observations and dynamically reconstructs unreliable inputs using kinematic priors, thereby maintaining higher decision confidence even in challenging environments.
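
The per-sample uncertainty measure can be sketched as follows. The logarithm base is not stated in the text, so base 2 is assumed here, and the two probability vectors are hypothetical.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a probability vector; zero entries contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

# Hypothetical predictions over (Class 0, Class 1, Class 2).
ambiguous = [0.40, 0.35, 0.25]
confident = [0.90, 0.07, 0.03]

print(round(shannon_entropy(ambiguous), 3), round(shannon_entropy(confident), 3))
```

Averaging this quantity over all test samples gives a mean entropy of the kind compared between the two fusion methods.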

To further analyze the geometric structure of probability distributions, prediction vectors are projected onto a two-dimensional simplex using ternary plots, as shown in Figure 9. In this representation, the three vertices correspond to Class 0 at the bottom left, Class 1 at the bottom right, and Class 2 at the top. The center of the simplex represents maximum ambiguity. In the baseline panel, the probability density is broadly distributed across the simplex. A substantial portion remains near the center and along the edges. This indicates that the baseline model frequently assigns comparable probabilities to multiple classes.

In contrast, the DVE-NDS panel shows a clear concentration of density near the vertices. The central region contains fewer samples. This shift indicates that DVE-NDS increases class dominance in the predicted distributions and reduces ambiguous assignments.
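
The projection onto the ternary simplex can be sketched with barycentric coordinates. The vertex placement follows the layout described for Figure 9 (Class 0 bottom left, Class 1 bottom right, Class 2 top); the unit side length and the function name are assumptions.

```python
import math

def to_ternary_xy(p0, p1, p2):
    """Map a probability triple to 2-D coordinates inside a unit-side triangle."""
    total = p0 + p1 + p2
    x = (p1 + 0.5 * p2) / total                   # Class 1 pulls right, Class 2 to the middle
    y = (math.sqrt(3.0) / 2.0) * p2 / total       # Class 2 pulls upward
    return x, y

print(to_ternary_xy(1.0, 0.0, 0.0))        # Class 0 vertex
print(to_ternary_xy(1 / 3, 1 / 3, 1 / 3))  # centre of the simplex
```

Points near a vertex correspond to a dominant class, while points near the centre correspond to the ambiguous assignments discussed above.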

To analyze the mechanism of entropy reduction, probability transition paths are visualized in the ternary simplex of Figure 10. Each arrow represents the evolutionary trajectory of a test sample’s probability distribution, where the tail corresponds to the baseline D-S prediction and the head points to the refined DVE-NDS prediction. The three vertices of the simplex explicitly represent the Surface (S), Underwater (U), and Uncertainty (Ω) focal elements. Most long green arrows (Successful Correction) originate from the upper Ω-dominated region or the central ambiguous zone, flowing decisively downward to terminate near the definitive S or U vertices. This geometric pattern intuitively demonstrates that the DVE-NDS actively drains probability mass from highly uncertain states and correctly reassigns it to the definitive physical classes. Gray arrows (Stable Correct) represent samples accurately classified by both models. These arrows exhibit minimal displacement and remain tightly clustered near the basal S and U vertices, proving that the DVE-NDS preserves the stability of already-confident predictions while fine-tuning their probabilities. Conversely, red arrows (Error) indicate newly induced misclassifications. Their sparse distribution confirms that dynamically sharpening the decision boundaries does not substantially compromise the classification of simple, baseline-correct samples.

Overall, Figure 10 illustrates that the performance improvement of the DVE-NDS is achieved through a systematic and directional probability redistribution. This mechanism effectively extracts ambiguous samples from the Ω uncertainty trap and shifts them toward their true physical states, while leaving stable predictions largely undisturbed.

4.3. Analysis of Heterogeneous Source Contribution and Robustness

Based on the Shapley value framework defined in Section 2.2.3, this section evaluates the quantitative contributions of heterogeneous sources within the DVE-NDS fusion process.

Figure 11a illustrates the influence of source inclusion order on system performance. The six cumulative bar charts correspond to the six possible permutations of fusion paths. The height of each colored segment represents the marginal accuracy gain after introducing a specific source. Results show that different fusion sequences produce fluctuations in intermediate accuracy. However, the final fusion accuracy converges to the same level for all paths. This convergence indicates that the DVE-NDS framework is not sensitive to source ordering. Whether acoustic features or AIS data are introduced first, subsequent sources compensate for missing information. This property reflects robustness to modality imbalance.

Figure 11b presents the normalized Shapley values. The height of each segment represents the average marginal contribution of a source across all permutations. The contributions of LOFAR, DEMON, and AIS are approximately 28.1%, 31.7%, and 40.2%, respectively.

AIS exhibits the largest individual contribution. This result is consistent with its provision of direct kinematic information, which is generally less ambiguous than acoustic features. However, the combined contribution of the acoustic modalities is close to 60%. This indicates that LOFAR and DEMON play a substantial role in the fusion process. Although AIS provides strong motion cues, it may suffer from spoofing or signal interruption. The acoustic sources compensate for such limitations. Their contributions demonstrate that the system performance does not rely on a single modality.

Overall, this analysis shows that DVE-NDS distributes importance across heterogeneous sources in a balanced manner. The fusion strategy leverages both kinematic and acoustic information. This design improves reliability under incomplete or degraded input conditions.

4.4. Robustness Analysis Under Simulated Degradations

To address concerns regarding potential overfitting to idealized simulations, an additive white Gaussian noise (AWGN) degradation experiment was conducted. Starting from the baseline dataset at a realistic signal-to-noise ratio (SNR) of 15 dB, we systematically injected AWGN to create progressively degraded environments at SNR levels of 10 dB, 5 dB, and 0 dB. As illustrated in Figure 12, classification performance inevitably declines as acoustic interference intensifies; however, the degradation trajectories clearly highlight the robustness of the proposed framework.
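
Noise injection at a target SNR can be sketched as below. For simplicity the sketch starts from a noise-free test tone, whereas the experiment starts from data already at 15 dB; the tone, the sampling rate, and the function names are assumptions.

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so that signal power / noise power matches snr_db."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, computed from the injected noise."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical 50 Hz tone sampled at 4 kHz for 5 s.
clean = [math.sin(2.0 * math.pi * 50.0 * n / 4000.0) for n in range(20000)]
noisy = add_awgn(clean, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 2))
```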

At the baseline of 15 dB, the traditional D-S method and DVE-NDS achieve F1 scores of 83.18% and 92.27%, respectively. As the SNR drops to 5 dB, the baseline’s performance deteriorates rapidly to 66.20% due to severe semantic conflicts within the corrupted acoustic features. In contrast, the DVE-NDS degrades much more gracefully, maintaining a robust score of 79.50% by effectively utilizing AIS kinematic priors to compensate for the acoustic information loss.

When the environment reaches the extreme physical limit of 0 dB, the acoustic signals become entirely submerged in noise. Under these conditions, both methods experience a sharp, non-linear decline, with the baseline accuracy dropping to 58.40% and the DVE-NDS falling to 67.80%. This expected decrease is consistent with the physical limitations imposed by severe acoustic corruption. Nevertheless, the DVE-NDS maintains a margin of approximately 9.4 percentage points over the baseline. This suggests that its dual-view entropy and dynamic evidence correction mechanisms help prevent catastrophic classification failures, even in highly hostile marine environments.

4.5. Computational Complexity and Real-Time Feasibility

To evaluate the feasibility of deploying the DVE-NDS framework in real-time maritime surveillance systems, we comprehensively analyze its computational complexity and inference latency. It is important to clarify that the Shapley value analysis (presented in Section 4.3) is strictly an offline interpretability tool used for system evaluation and is never computed during the online inference phase.

The online inference pipeline consists of two main stages: CNN feature extraction and decision-level fusion. Let K denote the number of evidence sources (K = 3 for LOFAR, DEMON, and AIS) and |Θ| denote the size of the frame of discernment (|Θ| = 2). The computational complexity of calculating the dual-view entropy is bounded by O(K × 2^{|Θ|}). The pairwise Jousselme distance computation and dynamic correction are bounded by O(K² × 2^{2|Θ|}). Finally, Dempster’s combination rule operates in O(K × 2^{2|Θ|}). Since both K and |Θ| are very small constants in our classification task, the overall complexity of the proposed fusion module, O(K² × 4^{|Θ|}), is effectively constant, i.e., O(1) with respect to the input dimension. Consequently, the computational bottleneck is dominated by the CNN backbone, which operates in O(FLOPs_CNN).

To quantify the practical computational overhead, Table 5 compares the average online inference time per sample, evaluated on a standard hardware environment (NVIDIA RTX 3060 GPU and Intel Core i7 CPU).

As demonstrated in Table 5, the deep learning backbone consumes over 97% of the total inference time. Although the DVE-NDS algorithm introduces additional entropy calculations and geometric distance corrections, it merely adds a negligible overhead of approximately 0.033 s (33 ms) compared to the classical D-S fusion. The overall algorithmic processing speed of approximately 2.12 s per sample comfortably satisfies the real-time processing constraints of Autonomous Underwater Vehicles (AUVs) and modern maritime surveillance systems.

4.6. Case Study: Quantitative Resolution of the “Dark Ship” Conflict and Dynamic Correction Mechanism

To explicitly demonstrate how the proposed DVE-NDS framework resolves the severe system-level challenge posed by spatial un-association (which produces a mathematically analogous effect to the “dark ship” scenario where AIS is absent), a typical moderate-conflict case from the test set is quantitatively analyzed in this section. In this scenario, the physical entity is a surface target (S). However, due to a severe spatial mismatch resulting in an extremely low AIS-sonar association degree, the system triggers the reverse inference mode, causing the initial AIS evidence to generate a highly misleading bias towards the underwater hypothesis (m_ais(U) = 0.48). Concurrently, the acoustic modalities (LOFAR and DEMON) successfully capture the authentic surface features.

4.6.1. Evidence Distribution and Dynamic Weighting Mechanism

As illustrated in Figure 13a, classical D-S theory is highly susceptible to the misleading AIS prior when facing such heterogeneous conflicts. Because the classical rule rigidly fuses the data without evaluating source reliability, the final confidence in the correct surface target (S) is noticeably dragged down by the AIS bias.

However, the DVE-NDS framework effectively intercepts this error through its dual-view metric mechanism. Figure 13b reveals the underlying weight evolution. The system calculates that the mismatched AIS evidence possesses the highest dual-view entropy (reflecting its highly dispersed uncertainty), which translates into the lowest self-information quality weight (w^q). Simultaneously, it exhibits the lowest group consensus (w^c) compared to the mutually agreeing acoustic sensors. Consequently, the final coupled weight (w) dynamically down-weights the influence of the AIS while appropriately rewarding the LOFAR and DEMON modalities. Ultimately, it safely returns the decision-making dominance to the highly reliable acoustic features, yielding a cleaner and more accurate final decision in Figure 13a.

4.6.2. Multidimensional Evidence Health Assessment

To microscopically deconstruct this weighting process, Figure 14 introduces a 1 × 3 multidimensional radar chart to panoramically display the structural “health metrics” of each evidence source across the S, U, and Ω dimensions.

In the Surface (S) dimension, the polygons for LOFAR and DEMON are notably expansive along the “Initial BPA Value” axis, and they also maintain high scores on the quality (w^q) and consensus (w^c) axes. This proves that their high-quality, mutually agreeing acoustic features successfully translate into dominant final weights (w).

The most critical behavior occurs in the Underwater (U) dimension. Although the AIS evidence surges along the “Initial BPA Value” axis (attempting to mislead the system towards U), its relative shrinkage along the quality weight (w^q) and consensus weight (w^c) axes causes its red polygon to noticeably distort. This geometric distortion intuitively illustrates how DVE-NDS immunizes the system against “low-credibility, high-confidence” mismatched priors. By proportionally penalizing the unreliable AIS evidence, the algorithm effectively restricts its decision-making influence without resorting to a rigid veto.

In the Uncertainty (Ω) dimension, all three sources maintain balanced and reasonable residual probabilities. This indicates that the framework’s fault-tolerance mechanism is functioning normally, reserving necessary uncertainty mass to avoid arbitrary absolute vetoes during the fusion process.

4.6.3. Extreme Suppression and Final Decision Enhancement

After computing the weighted baseline consensus (m_avg), the system further activates the dynamic correction mechanism. As shown in Figure 15a, the algorithm calculates the Jousselme distance from each independent evidence source to the group consensus. In this conflict case, DEMON and the mismatched AIS are identified as divergent sources because both of their distances (0.162 and 0.199, respectively) exceed the dynamic mean-distance threshold of Equation (26) (\overline{d} = 0.130). Following the correction rule of Equation (27), these two pieces of evidence are replaced by the group consensus, thereby rectifying the evidence space prior to the final Dempster’s combination.
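
The threshold test in this case can be reproduced from the quoted numbers. Since the threshold of Equation (26) is the mean of the three distances, the LOFAR distance is not quoted but is implied, up to rounding of the reported values; the variable names below are illustrative.

```python
# Distances and threshold quoted in the text for this case.
d_demon, d_ais, d_mean = 0.162, 0.199, 0.130

# Eq. (26) with three sources: d_mean = (d_lofar + d_demon + d_ais) / 3.
d_lofar = 3 * d_mean - d_demon - d_ais  # implied value, subject to rounding

distances = {"LOFAR": d_lofar, "DEMON": d_demon, "AIS": d_ais}
replaced = {name: d > d_mean for name, d in distances.items()}  # Eq. (27)

print(round(d_lofar, 3), replaced)
```

Under these numbers only LOFAR stays below the threshold, so it is the single source that enters the final combination in its original form.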

The decision enhancement brought by this correction is demonstrated in Figure 15b. Faced with such a moderate yet deceptive spatial mismatch, the classical D-S fusion result wavers, yielding a baseline confidence of only 67.8% for the true surface target (S). In contrast, by restricting divergent evidence and favoring the consensus, the proposed DVE-NDS framework raises the final confidence of S to 70.6% (a net gain of 2.8 percentage points). Simultaneously, it reduces the misclassification probability for the underwater hypothesis (U) by 2.0 percentage points. This case illustrates the robustness and self-correcting capability of the DVE-NDS framework under non-ideal, conflicting evidence.

5. Conclusions

To address the challenge of precise surface and underwater target discrimination in complex marine environments, this study proposes a heterogeneous decision-level fusion framework, namely the Dual-View Entropy-Driven Negation Dempster–Shafer (DVE-NDS) algorithm. The proposed methodology integrates three core strategies: a Negation Basic Probability Assignment (BPA) mechanism that maps AIS spatiotemporal mismatches into effective negative evidence for underwater targets; a dual-view entropy metric that quantifies the self-information quality of evidence from both original and negation perspectives; and a dynamic evidence correction strategy based on the Jousselme distance to evaluate inter-source consensus. By aligning discrete kinematic priors with continuous acoustic representations (LOFAR and DEMON spectra), this framework mitigates semantic inconsistencies and resolves conflicts between heterogeneous modalities. When evaluated under a simulation setting (15 dB signal-to-noise ratio), the DVE-NDS framework demonstrates a clear performance improvement over the classical D-S fusion rule. The proposed method achieves a classification accuracy of 92.27%, outperforming the traditional baseline by 9.09 percentage points while reducing prediction uncertainty, as evidenced by lower Shannon entropy and more deterministic ternary probability distributions. Furthermore, under progressive additive white Gaussian noise (AWGN) degradation conditions, the DVE-NDS framework maintains a relative performance advantage, handling severe acoustic corruption more gracefully by leveraging spatial priors. Additionally, game-theoretic Shapley value analysis quantifies the complementary nature of the multi-source fusion, indicating that AIS kinematic priors enhance overall decision performance (contributing 40.2%) while the dual acoustic modalities provide stable structural support (contributing 59.8%).

In summary, this study provides an early-stage methodological proof-of-concept that integrates kinematic and acoustic information within a unified fusion structure, improving classification stability under heterogeneous input conditions. However, an inherent limitation of the present work is that it does not entirely resolve the fundamental ambiguity caused by highly confrontational scenarios, and its validation currently relies solely on simulation. Specifically, non-cooperative “dark ships” utilizing AIS spoofing or radio silence pose a significant challenge. In such cases, the AIS mismatch inevitably introduces a baseline bias towards the underwater hypothesis. Although our structural allocation of the uncertainty focal element Ω effectively prevents absolute vetoes, thus allowing high-confidence acoustic features to actively mitigate this bias, the upper bound of overall classification accuracy remains constrained by severe sensor deception.

Furthermore, a critical bottleneck restricting the operational deployment of heterogeneous maritime fusion is the absence of comprehensive public datasets. Widely used databases (such as ShipsEar and DeepShip) primarily focus on single-modality acoustic signals, lacking the strictly synchronized AIS kinematic trajectories and real-time marine environmental parameters required for multi-modal validation. Consequently, a primary focus of our future work will be tackling the “dark ship” challenge explicitly by developing adaptive evidence discounting strategies to dynamically estimate the density of non-cooperative targets. It is also acknowledged that current simulations utilizing AWGN cannot fully replicate the immense complexities of real-world marine environments, such as highly variable non-Gaussian ambient noise, transient biological interferences, and severe multi-path propagation fading. Concurrently, we aim to collaborate on conducting comprehensive, multi-sensor sea trials once sufficient funding is secured. We ultimately hope to construct and contribute a fully synchronized, multi-modal maritime database (integrating acoustic recordings, AIS logs, and environmental profiles) to the academic community, thereby jointly propelling the evolution of multi-source information fusion in complex ocean environments.

Author Contributions

Methodology, X.Z. and D.W.; Software, X.Z.; Validation, X.Z. and X.H.; Formal Analysis, X.Z.; Investigation, M.D. and D.W.; Resources, Y.Z. and D.W.; Data Curation, X.X. and J.C.; Writing—Original Draft, X.Z.; Writing—Review and Editing, D.W.; Supervision, D.W.; Project Administration, D.W.; Funding Acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The Article Processing Charge (APC) was funded by the authors.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DVE-NDS	Dual-View Entropy-Driven Negation Dempster–Shafer
BPA	Basic Probability Assignment
DTW	Dynamic Time Warping
AIS	Automatic Identification System
LOFAR	Low Frequency Analysis and Recording
DEMON	Detection of Envelope Modulation on Noise
CNNs	Convolutional Neural Networks
AWGN	Additive White Gaussian Noise

References

Urick, R.; Kuperman, W.A. Ambient Noise in the Sea. J. Acoust. Soc. Am. 1989, 86, 1626.
Vangi, M.; Topini, E.; Liverani, G.; Topini, A.; Ridolfi, A.; Allotta, B. Design, Development, and Testing of an Innovative Autonomous Underwater Reconfigurable Vehicle for Versatile Applications. IEEE J. Ocean. Eng. 2025, 50, 509–526.
Vallicrosa, G.; Fumas, M.J.; Huber, F.; Ridao, P. Sparus II AUV as a Sensor Suite for Underwater Archaeology: Falconera Cave Experiments. In Proceedings of the 2020 IEEE/OES Autonomous Underwater Vehicles Symposium, AUV 2020, St Johns, NL, Canada, 30 September–2 October 2020.
Ainslie, M.A. Sonar signal processing. In Principles of Sonar Performance Modelling; Ainslie, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 251–310.
He, T.; Feng, S.; Yang, J.; Yu, K.; Zhou, J.; Chen, D. Underwater Acoustic Signal LOFAR Spectrogram Denoising Based on Enhanced Simulation. Appl. Sci. 2024, 14, 10931.
Wang, J.; Song, C.; Qi, Z. A radon transform-based method for line spectrum enhancement of vector hydrophone LOFAR spectrograms under low SNR conditions. Sci. Rep. 2025, 15, 10679.
Li, Z.; Cheng, Y.; Qiu, J. Adaptive Line Enhancer for Passive Sonars Based on Frequency-Domain Sparsity, Shannon Entropy Criterion and Mixed-Weighted Error. Arab. J. Sci. Eng. 2024, 50, 5899–5920.
Deepa, B.; Anoop, M.; Vijayan Pillai, S.; Sooraj, K.A. Performance Evaluation of the DEMON Processor for Sonar. In Proceedings of the 2022 IEEE Region 10 Symposium (TENSYMP), Mumbai, India, 1–3 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–6.
Chen, L.; Luo, X.; Zhou, H. A ship-radiated noise classification method based on domain knowledge embedding and attention mechanism. Eng. Appl. Artif. Intell. 2024, 127, 107320.
Jamal, S.; Lakziz, J.; Benremdane, Y.; Ouaskit, S. Passive Sonar Detection and Classification Based on Demon-Lofar Analysis and Neural Network Algorithms. Int. J. Artif. Intell. Appl. 2024, 15, 87–98.
Li, L.; Song, S.; Feng, X. Combined LOFAR and DEMON Spectrums for Simultaneous Underwater Acoustic Object Counting and F0 Estimation. J. Mar. Sci. Eng. 2022, 10, 1565.
Filho, E.P.S.; Santos, A.D.; Filho, E.F.S.; Fernandes, A.C.L.; de Seixas, J.M.; Moura, N.N.d. Hilbert–Huang Transform with Intelligent Noise Reduction for Passive SONAR Signal Processing. IEEE J. Ocean. Eng. 2025, 50, 1387–1402.
Jiang, J.; Wu, Z.; Lu, J.; Huang, M.; Xiao, Z. Interpretable features for underwater acoustic target recognition. Measurement 2021, 173, 108586.
Luo, X.; Chen, L.; Zhou, H.; Cao, H. A Survey of Underwater Acoustic Target Recognition Methods Based on Machine Learning. J. Mar. Sci. Eng. 2023, 11, 384.
Yang, Y.; Yao, Q.; Wang, Y. Underwater Acoustic Target Recognition Method Based on Feature Fusion and Residual CNN. IEEE Sens. J. 2024, 24, 37342–37357.
Liu, F.; Shen, T.; Luo, Z.; Zhao, D.; Guo, S. Underwater target recognition using convolutional recurrent neural networks with 3-D Mel-spectrogram and data augmentation. Appl. Acoust. 2021, 178, 107989.
Feng, H.; Chen, X.; Wang, R.; Wang, H.; Yao, H.; Wu, F. Underwater acoustic target recognition method based on WA-DS decision fusion. Appl. Acoust. 2024, 217, 109851.
Feng, S.; Ma, S.; Zhu, X.; Yan, M. Artificial Intelligence-Based Underwater Acoustic Target Recognition: A Survey. Remote Sens. 2024, 16, 3333.
Song, Y.; Mohsin, M.F.M. Comparative Analysis of Deep Learning Techniques for Passive Underwater Acoustic Target Recognition: Overview, Challenges, and Future Directions. Int. J. Adv. Comput. Sci. Appl. 2025, 16, 132–145.
Zhao, W.; Cheng, X.; Wang, D.; Xiong, X.; Zhang, X. Enhancing underwater target detection: Fusion of spatio-temporal incompletely-aligned AIS and sonar information via DTW and multi-head attention mechanism. IET Radar Sonar Navig. 2024, 18, 2521–2540.
Zhang, Y.; Wang, C.; Zhang, Q.; Da, L.; Jiang, Z. Bearing-only motion analysis of target based on low-quality bearing-time recordings map. IET Radar Sonar Navig. 2023, 18, 765–781.
Walker, J.L.; Zeng, Z.; ZoBell, V.M.; Frasier, K.E. Underwater sound speed profile estimation from vessel traffic recordings and multi-view neural networks. J. Acoust. Soc. Am. 2024, 155, 3015–3026.
Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140.
Chen, J.; Han, B.; Ma, X.; Zhang, J. Underwater Target Recognition Based on Multi-Decision LOFAR Spectrum Enhancement: A Deep-Learning Approach. Future Internet 2021, 13, 265.
Zhu, J.; Peng, C.; Zhang, B.; Jia, W.; Xu, G.; Wu, Y.; Hu, Z.; Zhu, M. An Improved Background Normalization Algorithm for Noise Resilience in Low Frequency. J. Mar. Sci. Eng. 2021, 9, 803.
Vincenty, T. Direct and Inverse Solutions of Geodesics on the Ellipsoid with Application of Nested Equations. Surv. Rev. 2013, 23, 88–93.
Liang, M.; Su, J.; Liu, R.W.; Lam, J.S.L. AISClean: AIS data-driven vessel trajectory reconstruction under uncertain conditions. Ocean Eng. 2024, 306, 117987.
Liu, S.; Fu, X.; Xu, H.; Zhang, J.; Zhang, A.; Zhou, Q.; Zhang, H. A Fine-Grained Ship-Radiated Noise Recognition System Using Deep Hybrid Neural Networks with Multi-Scale Features. Remote Sens. 2023, 15, 2068.
Yan, C.; Yan, S.; Yao, T.; Yu, Y.; Pan, G.; Liu, L.; Wang, M.; Bai, J. A Lightweight Network Based on Multi-Scale Asymmetric Convolutional Neural Networks with Attention Mechanism for Ship-Radiated Noise Classification. J. Mar. Sci. Eng. 2024, 12, 130.
Singh, R.N.P.; Bailey, W.H. Fuzzy logic applications to multisensor-multitarget correlation. IEEE Trans. Aerosp. Electron. Syst. 1997, 33, 752–769.
Shi, C.; Tao, J.; Zhang, L. The fuzzy association method for target-tracking association of sonar. Tech. Acoust. 2020, 39, 141–145.
Liu, W.; Liu, Y.; Gunawan, B.A.; Bucknall, R. Practical Moving Target Detection in Maritime Environments Using Fuzzy Multi-sensor Data Fusion. Int. J. Fuzzy Syst. 2020, 23, 1860–1878.
Sun, F.; Qiu, J.; Song, Y. Research on Track Correlation Algorithm of AIS and Passive Sonar Based on Fuzzy Mathematics. Digit. Ocean. Underw. Warf. 2022, 5, 225–229.
Zhang, Y.; Qin, C. A Gaussian-Shaped Fuzzy Inference System for Multi-Source Fuzzy Data. Systems 2022, 10, 258.
Yin, L.; Deng, X.; Deng, Y. The Negation of a Basic Probability Assignment. IEEE Trans. Fuzzy Syst. 2019, 27, 135–143.
Yager, R.R. On the Maximum Entropy Negation of a Probability Distribution. IEEE Trans. Fuzzy Syst. 2015, 23, 1899–1902.
Deng, Y. Deng entropy. Chaos Solitons Fractals 2016, 91, 549–553.
Tang, Y.; Chen, Y.; Zhou, D. Measuring Uncertainty in the Negation Evidence for Multi-Source Information Fusion. Entropy 2022, 24, 1596.

Figure 1. Method framework diagram.

Figure 2. LOFAR feature spectrogram. The color represents the normalized intensity of the signal energy, with blue indicating low intensity and yellow indicating high intensity.

Figure 3. DEMON spectrograms: (a) original spectrogram; (b) background-equalized spectrum. The color represents the power spectral density of the demodulated signal, indicating the intensity of tonals (e.g., shaft rate and its harmonics) extracted from the amplitude modulation. The three vertical red lines highlight the persistent high-energy frequency bands, which are characteristic of the target’s rotating machinery.

Figure 4. Experimental scenario diagram.

Figure 5. Multi-target recognition confusion matrices on the test set: (a) LOFAR spectrogram; (b) DEMON spectrogram; (c) fuzzy mathematical matching; (d) D-S fusion (baseline model); (e) DVE-NDS fusion.

Figure 6. t-SNE dimensionality reduction visualization.

Figure 7. Sankey diagram.

Figure 8. Entropy distribution diagram.

Figure 9. Probability space transformation diagram.

Figure 10. Probability correction and transformation diagram.

Figure 11. Shapley value contribution allocation diagram: (a) sequential accuracy gains under different fusion orders, (b) final Shapley contribution.

Figure 12. Recognition accuracy under different signal-to-noise ratios.

Figure 13. Quantitative resolution of a severe spatial mismatch conflict: (a) evidence belief distribution before and after fusion; (b) the evolution of dynamic weights.

Figure 14. Multidimensional radar charts of evidence health metrics across different focal elements.

Figure 15. Dynamic evidence correction and ultimate decision enhancement: (a) identification and replacement of extreme evidence; (b) final fusion probability comparison.

Table 1. Description of AIS variables.

Variable	Abbreviation	Description
Latitude	Lat	Latitude of a ship
Longitude	Lon	Longitude of a ship
Course over ground	COG	Actual direction of progress of a vessel
Speed over ground	SOG	Speed of a vessel relative to the Earth’s surface

Table 2. Model architecture parameters. The notation “Conv1 (3 × 3, 32)” denotes a convolutional layer with a 3 × 3 kernel and 32 output channels. “MaxPool1 (2 × 2)” denotes a max-pooling layer with a 2 × 2 window. Output shapes are presented as (height, width, number of channels).

Layer Type	Output Shape	Parameters
Conv1 (3 × 3, 32)	(873, 654, 32)	896
MaxPool1 (2 × 2)	(436, 327, 32)	0
Conv2 (3 × 3, 64)	(434, 325, 64)	18,496
MaxPool2 (2 × 2)	(217, 162, 64)	0
Conv3 (3 × 3, 128)	(215, 160, 128)	73,856
MaxPool3 (2 × 2)	(107, 80, 128)	0
Flatten	(1,095,680)	0
Fully Connected	(128)	140,247,168
Softmax	(3)	384
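
The output shapes and parameter counts in Table 2 can be reproduced with a few lines of arithmetic. The sketch below assumes stride-1 convolutions without padding, floor-division pooling, and an 875 × 656 × 3 input; the input size is not stated in the table and is inferred here from the Conv1 output shape and its 896 parameters. The helper names are hypothetical.

```python
def conv_out(h, w, k=3):
    """Spatial size after a stride-1, unpadded k x k convolution (assumed)."""
    return h - k + 1, w - k + 1


def pool_out(h, w, p=2):
    """Spatial size after a p x p max-pooling layer with floor division (assumed)."""
    return h // p, w // p


def conv_params(c_in, c_out, k=3):
    """Weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


h, w, c = 875, 656, 3  # assumed input size, inferred from Table 2
rows = []
for c_out in (32, 64, 128):
    h, w = conv_out(h, w)
    rows.append(((h, w, c_out), conv_params(c, c_out)))
    h, w = pool_out(h, w)
    rows.append(((h, w, c_out), 0))
    c = c_out

flat = h * w * c                      # size of the flattened feature vector
fc_params = flat * 128 + 128          # fully connected layer with bias
out_weights = 128 * 3                 # output layer, weights only
out_with_bias = out_weights + 3       # output layer including bias terms

for shape, n in rows:
    print(shape, n)
print(flat, fc_params, out_weights, out_with_bias)
```

Under these assumptions every row of the table is reproduced. The 384 parameters listed for the softmax layer equal the 128 × 3 weights alone; a layer with bias terms would have 387, so the table may be counting that layer without bias, although the source does not say. The fully connected layer accounts for almost all of the roughly 140 million parameters.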

Table 3. Dataset composition.

Dataset Category	Training Set	Validation Set	Test Set	Total
Hybrid Targets	220	80	0	300
Surface Targets	220	80	110	410
Underwater Targets	220	80	110	410
Total	660	240	220	1120

Table 4. Classification performance metrics.

Method	Accuracy	Precision	Recall	F1
LOFAR	59.55%	83.98%	59.55%	69.69%
DEMON	63.18%	87.52%	63.18%	73.39%
AIS	72.73%	87.19%	72.73%	79.30%
D-S	83.18%	88.84%	83.18%	85.92%
DVE-NDS	92.27%	93.57%	92.27%	92.92%
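
As a consistency check on Table 4, the listed F1 values can be compared with the harmonic mean of the listed precision and recall. The sketch below assumes that this is how F1 was aggregated, which the table does not state; the variable names are illustrative.

```python
# (precision, recall, reported F1) in percent, copied from Table 4
table4 = {
    "LOFAR": (83.98, 59.55, 69.69),
    "DEMON": (87.52, 63.18, 73.39),
    "AIS": (87.19, 72.73, 79.30),
    "D-S": (88.84, 83.18, 85.92),
    "DVE-NDS": (93.57, 92.27, 92.92),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


gaps = {}
for name, (p, r, reported) in table4.items():
    gaps[name] = abs(f1_score(p, r) - reported)
    print(f"{name:8s} recomputed F1 = {f1_score(p, r):.2f}  reported = {reported:.2f}")

max_gap = max(gaps.values())
```

The recomputed values agree with the reported ones to within rounding of the two-decimal inputs. Recall equals accuracy in every row, which is expected for class-averaged recall on a test set with equally sized classes (110 surface and 110 underwater samples in Table 3).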

Table 5. Inference time comparison.

Stage	Baseline D-S System	Proposed DVE-NDS System
CNN Feature Extraction	2.060 s	2.060 s
AIS Fuzzy Association	0.010 s	0.010 s
Decision-Level Fusion	0.017 s	0.050 s
Total Inference Time	2.087 s	2.120 s
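
The relative cost of the proposed fusion stage follows directly from the figures in Table 5. The short sketch below only re-adds the listed stage times; the dictionary keys are illustrative labels.

```python
# Stage times in seconds, copied from Table 5: (baseline D-S, proposed DVE-NDS)
stages = {
    "cnn_feature_extraction": (2.060, 2.060),
    "ais_fuzzy_association": (0.010, 0.010),
    "decision_level_fusion": (0.017, 0.050),
}

total_ds = sum(ds for ds, _ in stages.values())
total_dve = sum(dve for _, dve in stages.values())

extra_seconds = total_dve - total_ds          # added latency of DVE-NDS
relative_overhead = extra_seconds / total_ds  # as a fraction of the baseline total
cnn_share = stages["cnn_feature_extraction"][1] / total_dve

print(f"totals: {total_ds:.3f} s vs {total_dve:.3f} s")
print(f"extra: {extra_seconds:.3f} s ({relative_overhead:.1%} of baseline)")
print(f"CNN share of DVE-NDS total: {cnn_share:.1%}")
```

The stage times sum to the listed totals, and the additional 0.033 s of fusion time is under 2% of the baseline total; CNN feature extraction dominates the latency in both systems.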

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
